Data Mining for Secure Software Engineering – Source Code ...


Nov 20, 2013


OOSE 01/17


Institute of Computer Science and Information Engineering, National Cheng Kung University


Members:
Q76001074 薛弘志
P76014020 蔡文豪
F74982155 周詩御


Reference

Prasad, A.V.K. and Ramakrishna, S. (2010b), ‘Data Mining for Secure Software Engineering – Source Code Management Tool Case Study’, International Journal of Engineering Science and Technology, vol. 2 (7), pp. 2667-2677.


Introduction

To improve software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks.



However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine sequences, graphs, and text from such data.



Using well-established data mining techniques, software engineers can explore the potential of this valuable data in order to better manage their projects and produce higher-quality software systems that are delivered on time and within budget.



Introduction (cont.)

Mining algorithms for software engineering fall into four main categories:

1. Frequent pattern mining: finding commonly occurring patterns.

2. Pattern matching: finding data instances for given patterns.

3. Clustering: grouping data into clusters.

4. Classification: predicting labels of data based on already labeled data.
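
As an illustration of the first category, the following is a minimal sketch of frequent pattern mining over co-changed files. It is not taken from the referenced paper: the function name `frequent_pairs`, the `min_support` threshold, and the commit data are all assumptions made for this example, and it only counts pairs rather than larger itemsets.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return pairs of items that occur together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical commits, each listing the files that changed together.
commits = [
    ["scanner.cpp", "token.h"],
    ["scanner.cpp", "token.h", "parser.cpp"],
    ["parser.cpp", "README"],
    ["token.h", "scanner.cpp"],
]

result = frequent_pairs(commits, min_support=2)
print(result)  # {('scanner.cpp', 'token.h'): 3}
```

Here only `scanner.cpp` and `token.h` change together often enough to pass the threshold, which hints that a change to one may require a change to the other.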

Introduction (cont.)

Software engineering data can be broadly categorized into:

1. Sequences, such as execution traces collected at runtime, static traces extracted from source code, and co-changed code locations.

2. Graphs, such as dynamic call graphs collected at runtime and static call graphs extracted from source code.

3. Text, such as bug reports, e-mails, code comments, and documentation.
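
To make the relationship between sequence data and graph data concrete, here is a small sketch that turns an execution trace into a dynamic call graph. The trace format (a list of caller/callee pairs), the function name `build_call_graph`, and the sample function names are assumptions for illustration only.

```python
from collections import defaultdict


def build_call_graph(trace):
    """Turn a sequence of (caller, callee) events into a weighted call graph."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in trace:
        graph[caller][callee] += 1  # edge weight = number of observed calls
    # Convert to plain dicts so the result prints and compares cleanly.
    return {caller: dict(callees) for caller, callees in graph.items()}


# Hypothetical execution trace: each entry records one call observed at runtime.
trace = [
    ("main", "parse"),
    ("parse", "next_token"),
    ("parse", "next_token"),
    ("parse", "write_token"),
]

call_graph = build_call_graph(trace)
print(call_graph)
```

The same trace can be mined as a sequence (for example, for recurring call orders) or as a graph (for example, for heavily used edges), which is why the choice of mining algorithm depends on the data representation.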


Objectives

The objective of the research work is to propose strategic data mining tools for program source code debugging that improve software reliability and quality.


Software engineers can start with either a problem-driven approach (knowing which SE task to assist) or a data-driven approach (knowing which SE data to mine), but in practice they commonly adopt a mixture of the first two steps: collecting data to mine and determining the SE tasks to assist.


The three remaining steps are, in order, preprocessing data, adopting a mining algorithm, and post-processing and applying the mining results.
Objectives (cont.)

Preprocessing data involves first extracting relevant data from the raw SE data. This data is further processed by cleaning and properly formatting it for the mining algorithm.


The next step produces a mining algorithm and its
supporting tool, based on the mining requirements
derived in the first two steps.


The final step transforms the mining algorithm results into an appropriate format required to assist the SE task.
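
The preprocessing, mining, and post-processing steps can be sketched as three small functions chained together. This is only an illustrative pipeline under stated assumptions: the bug report texts, the stop-word list, and the function names `preprocess`, `mine`, and `postprocess` are hypothetical, and simple word counting stands in for a real mining algorithm.

```python
import re
from collections import Counter

# Assumed stop-word list; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "in", "on", "when", "is", "of", "to"}


def preprocess(raw_reports):
    """Extract and clean: lowercase each report and keep informative words."""
    cleaned = []
    for report in raw_reports:
        words = re.findall(r"[a-z]+", report.lower())
        cleaned.append([w for w in words if w not in STOP_WORDS])
    return cleaned


def mine(cleaned_reports):
    """Mine: count in how many reports each word appears."""
    counts = Counter()
    for words in cleaned_reports:
        counts.update(set(words))
    return counts


def postprocess(counts, top_n):
    """Post-process: format the most frequent words for a human reader."""
    # Sort by descending count, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}: {n} report(s)" for word, n in ranked[:top_n]]


# Hypothetical raw SE data: free-text bug reports.
raw_reports = [
    "Crash in the parser when a file is empty",
    "Parser crash on nested comments",
    "Formatter drops the last token",
]
summary = postprocess(mine(preprocess(raw_reports)), top_n=2)
print(summary)  # ['crash: 2 report(s)', 'parser: 2 report(s)']
```

Keeping the three stages separate means the mining step can be swapped for another algorithm without touching the extraction or reporting code.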



Objectives (cont.)

Further, many such tools are general purpose and should be adapted to assist the particular task at hand.

However, software engineering researchers may lack the expertise to adapt mining algorithms, while data mining researchers may lack the background to understand mining requirements in the software engineering domain.

One promising way to reduce this gap is to foster close collaborations between the software engineering community (requirement providers) and the data mining community (solution providers).


Implementations

The management of source code is one of the greatest challenges facing programmers today. As programs become larger and more complex, the need to organize and manage source code increases.


The authors' motivation is to implement source code maintenance routines that parse tokens from an ANSI C++ file, format the file, extract header files, and colorize the file.

Implementations (cont.)

When files are shared among objects, it is difficult to track which files depend on others.
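
One simple way to see which files depend on others is to extract the header names from `#include` directives. The sketch below is an assumption-laden illustration, not the tool described in the referenced paper: the regular expression, the function names, and the file contents are hypothetical, and the pattern does not try to skip directives that appear inside comments or strings.

```python
import re

# Matches #include <name> or #include "name", allowing spaces after '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def extract_headers(source):
    """Return the header names included by a C++ source string, in order."""
    return INCLUDE_RE.findall(source)


def dependency_map(files):
    """Map each file name to the headers it includes."""
    return {name: extract_headers(text) for name, text in files.items()}


# Hypothetical file contents used only for illustration.
files = {
    "scanner.cpp": '#include <iostream>\n#include "token.h"\nint x;\n',
    "token.h": "// no includes here\n",
}
deps = dependency_map(files)
print(deps)  # {'scanner.cpp': ['iostream', 'token.h'], 'token.h': []}
```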


A source code maintenance program can parse the source code and produce documentation that describes each class, its member variables, and its functions.

Maintaining structured code among team members is extremely difficult and time-consuming because programmers must modify their individual styles.

A source code formatter offers a convenient solution to this problem.

Implementations (cont.)

Code maintenance modules receive source code as input, break the code down into tokens, and then output them in a new format.

The utility is based on three class groups:

A scanner reads the code, breaks it down into tokens, and returns them to the parser. It also identifies the type of each token it returns.

The parser requests successive tokens from the scanner and takes the appropriate action before requesting the next token. In this utility, the parser's action is to write out the token.

The token classes derived from CToken represent the tokens themselves.
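
The scanner and parser roles described above can be sketched as follows. This is a minimal illustration in Python, not the C++ utility from the referenced paper: the class names `Scanner`, `Parser`, and `Token`, the token type names, and the regular expressions are all assumptions, and the patterns cover only a small part of C++ lexical syntax.

```python
import re
from dataclasses import dataclass

# Token types, tried in order; the type names are illustrative assumptions.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("WHITESPACE", r"\s+"),
    ("SYMBOL", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)


@dataclass
class Token:
    kind: str
    text: str


class Scanner:
    """Breaks source code into tokens and reports each token's type."""

    def __init__(self, source):
        self._matches = MASTER_RE.finditer(source)

    def next_token(self):
        """Return the next non-whitespace token, or None at end of input."""
        for match in self._matches:
            if match.lastgroup != "WHITESPACE":
                return Token(match.lastgroup, match.group())
        return None


class Parser:
    """Requests successive tokens from the scanner and writes each one out."""

    def __init__(self, scanner):
        self._scanner = scanner

    def run(self):
        out = []
        token = self._scanner.next_token()
        while token is not None:
            out.append(f"{token.kind}:{token.text}")  # the "write out" action
            token = self._scanner.next_token()
        return out


lines = Parser(Scanner("int x = 42; // answer")).run()
print(lines)
```

Because the parser only ever asks for the next token, a formatter or colorizer could reuse the same scanner and change just the action taken for each token.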

The sequence diagram of the overall code maintenance process

The Sample Class Diagram of the CToken Hierarchy

Token Classes Derived from CToken

Valid Formatting flags

Format Strings for C++

Result


Conclusion

The mining algorithms work on software engineering data such as text, sequences, and graphs, and improve software engineering tasks such as programming, maintenance, bug detection, and debugging.


The authors implemented only the source code management tool, which is useful for code maintenance and programming.


For a programmer, bug detection and debugging are more important, so how to use mining algorithms to assist programmers is left as future work.


Thank you for listening!