Software Process Evaluation: A Machine Learning Approach
Nanyang Technological University, Singapore
November 10, 2011
Ning Chen, Steven C.H. Hoi, Xiaokui Xiao
Software Processes
•
Policies
•
Organizational Structures
•
Technologies
•
Procedures
•
Artifacts
Outline
•
Introduction
•
Problem Statement
•
Related Work
•
Approach
•
Experiments
•
Conclusion & Future Work
Introduction
•
Background
–
The quality of software processes is directly related to productivity and quality.
–
Key challenge faced by software development organizations: software process evaluation.
–
Conventional methods: questionnaires, interviews, and artifact studies.
Introduction (Cont’)
•
Motivation
–
Limitations of conventional methods
•
Time-consuming
•
Authority constraints
•
Subjective evaluation
•
Dependence on experienced evaluation experts
Introduction (Cont’)
•
Contributions
–
We propose a novel machine learning approach that could help practitioners evaluate their software processes.
–
We present a new quantitative indicator to evaluate the quality and performance of software processes objectively.
–
We compare and explore different kinds of sequence classification algorithms for solving this problem.
Problem Statement
•
Scope: We restrict the discussion to a software process understood as a systematic approach to accomplishing some software development task (a procedure).
•
Goal: Evaluate the quality and performance of a software process objectively.
[Figure: Example of a simple requirement change process; the nodes show the status of a requirement change]
Problem Statement (Cont’)
•
Key Idea: Model a set of process executions as sequential instances, and then formulate the evaluation task as a binary classification of the sequential instances into either “normal” or “abnormal”. Given a set of process executions, a quantitative measure, referred to as the “process execution qualification rate”, is calculated.
A sequential instance: <NEW, ASSESSED, ASSIGNED, RESOLVED, VERIFIED, CLOSED>, to be classified as “normal” or “abnormal”.
Related Work
•
Software Process Models and Methodologies
–
High-level, comprehensive, and general frameworks for software process evaluation.
•
SPICE (1998)
•
Agile (2002)
•
CMMI 1.3 (2010); SCAMPI (2011)
•
Software Process Validation
–
Measure the differences between process models and process executions.
•
J. E. Cook et al. Software process validation: quantitatively measuring the correspondence of a process to a model, ACM TOSEM, 1999.
Related Work (Cont’)
•
Software Process Mining
–
Discover explicit software process models using process mining algorithms.
•
J. Samalikova et al. Toward objective software process information: experiences from a case study, Software Quality Control, 2011.
•
Data Mining for Software Engineering
–
Software specification recovery (El-Ramly et al. 2002; Lo et al. 2007)
–
Bug assignment task (Anvik et al. 2006)
–
Non-functional requirements classification (Cleland-Huang et al. 2006)
Approach (Overview)
Approach (Collecting Data)
•
Extract raw data from the related software repositories according to the characteristics of the software process that we intend to evaluate.
Example of a simple requirement change process
Approach (Data Preprocessing)
•
Convert every process execution from the raw data into a sequence-based instance.
•
Determine an “alphabet” that consists of a set of symbolic values, each of which represents a status of some artifact or an action of some task in the software process.
Example alphabet: (N, S, A, R, V, C)
Approach (Data Preprocessing) (Cont’)
•
Convert each process execution from the raw data into a sequence of symbolic values.
•
Obtain a sequence database that contains a set of unlabelled sequences.
Example sequence: <N, S, A, R, V, C>
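The preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: the `(execution_id, timestamp, status)` row layout, the `REQ-1`/`REQ-2` identifiers, and the function name `to_sequences` are all hypothetical, and the status-to-symbol mapping is inferred from the order of the example alphabet and example sequence in the slides.

```python
from collections import defaultdict

# Assumed mapping, inferred from the order of (N, S, A, R, V, C) and
# <NEW, ASSESSED, ASSIGNED, RESOLVED, VERIFIED, CLOSED> in the slides.
ALPHABET = {
    "NEW": "N",
    "ASSESSED": "S",
    "ASSIGNED": "A",
    "RESOLVED": "R",
    "VERIFIED": "V",
    "CLOSED": "C",
}


def to_sequences(raw_events):
    """Group (execution_id, timestamp, status) rows into one symbol sequence per execution."""
    by_execution = defaultdict(list)
    for execution_id, timestamp, status in raw_events:
        by_execution[execution_id].append((timestamp, status))

    sequences = {}
    for execution_id, events in by_execution.items():
        events.sort(key=lambda event: event[0])  # order status changes by time
        sequences[execution_id] = [ALPHABET[status] for _, status in events]
    return sequences


# Hypothetical raw rows, as they might be exported from a repository.
raw_events = [
    ("REQ-1", 1, "NEW"), ("REQ-1", 2, "ASSESSED"), ("REQ-1", 3, "ASSIGNED"),
    ("REQ-1", 4, "RESOLVED"), ("REQ-1", 5, "VERIFIED"), ("REQ-1", 6, "CLOSED"),
    ("REQ-2", 2, "CLOSED"), ("REQ-2", 1, "NEW"),
]
sequence_db = to_sequences(raw_events)
```

Here `sequence_db` plays the role of the unlabelled sequence database; labels are attached later, when a training sample is drawn.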
Approach (Building Sequence Classifiers)
•
Sampling and labeling a training data set
•
Feature representation
•
Training classifiers by machine learning algorithms
Approach (Building Sequence Classifiers) (Cont’)
•
Sampling and labeling a training data set
–
Class labels: {Normal, Abnormal}
–
Training data size
–
Assign an appropriate class label
Approach (Building Sequence Classifiers) (Cont’)
•
Feature representation
–
K-grams feature technique
•
Training classifiers by machine learning algorithms
–
Support Vector Machine (SVM)
–
Naïve Bayes (NB)
–
Decision Tree (C4.5)
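To make the classification step concrete, here is a small sketch of a Naïve Bayes classifier over k-gram counts, written in plain Python. It is an illustration under stated assumptions, not the implementation used in the experiments: the class name `KGramNaiveBayes`, the add-one smoothing, the choice of k = 2, and the tiny labeled training set are all hypothetical. SVM and C4.5 are not shown because they would normally come from a library.

```python
import math
from collections import Counter


def kgrams(sequence, k=2):
    """All runs of k consecutive symbols in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class KGramNaiveBayes:
    """Multinomial Naive Bayes over k-gram counts with add-one smoothing."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, sequences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.gram_counts = {c: Counter() for c in self.classes}
        for seq, label in zip(sequences, labels):
            self.gram_counts[label].update(kgrams(seq, self.k))
        # Vocabulary: every k-gram seen in any training sequence.
        self.vocab = set().union(*self.gram_counts.values())
        return self

    def predict(self, sequence):
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / total)  # log prior
            denom = sum(self.gram_counts[c].values()) + len(self.vocab)
            for gram in kgrams(sequence, self.k):
                if gram in self.vocab:  # k-grams unseen in training are ignored
                    score += math.log((self.gram_counts[c][gram] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical training sample; in practice experts assign the labels.
train_sequences = ["NSARVC", "NSARVC", "NSARVC", "NC", "NARC", "NSC"]
train_labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

model = KGramNaiveBayes(k=2).fit(train_sequences, train_labels)
predictions = {seq: model.predict(seq) for seq in ["NSARVC", "NC"]}
```

With so few training sequences the model is only a toy; the point is the shape of the pipeline: k-gram features in, a “normal” or “abnormal” label out.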
Approach (Quantitative Indicator)
•
A New Quantitative Indicator: Process execution qualification rate
P = (number of actual normal sequences) / (total number of sequences)
Remark: We adopt the precision and recall values on the training set as the estimated precision and recall values to calculate the value of P.
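One way to turn the remark into arithmetic is sketched below. The exact formula from the slides did not survive in this text, so this is a reconstruction from the standard definitions of precision and recall for the “normal” class, and it may differ in form from the authors' formula. Since precision = TP / (predicted normal) and recall = TP / (actual normal), the number of actual normal sequences can be estimated as (predicted normal × precision) / recall. The function name and the example numbers are hypothetical.

```python
def estimate_qualification_rate(predicted_normal, total, precision, recall):
    """Estimate P = actual normal / total from classifier output.

    precision and recall refer to the "normal" class and are assumed to be
    estimated on the labeled training set.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= predicted_normal <= total:
        raise ValueError("predicted_normal must lie between 0 and total")
    if not 0 <= precision <= 1 or not 0 < recall <= 1:
        raise ValueError("precision must be in [0, 1] and recall in (0, 1]")

    true_positives = predicted_normal * precision  # predicted normal that are truly normal
    actual_normal = true_positives / recall        # scale up for normal sequences that were missed
    return actual_normal / total


# Hypothetical numbers: 900 of 1000 sequences predicted normal.
p_estimate = estimate_qualification_rate(900, 1000, precision=0.95, recall=0.90)
```

The estimate is only as good as the assumption that precision and recall on the training set carry over to the unlabeled sequences.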
Experiments
•
Experimental testbed
–
Evaluation scope: Four projects from a large software development center of a commercial bank in China.
–
The target software process under evaluation: defect management process
Experiments (Cont’)
•
Experimental setup
–
Defect reports collected from HP Quality Center.
–
Obtain a sequence database of 2622 examples.
–
Collect the ground truth labels of all the sequences.
Experiment 1 Results
Main Observations:
(1) SVM achieves the best performance.
(2) Very positive result (partly derived from the nature of the data).
Experiment 2 Results
Main Observations:
(1) Increasing the size of the training data improves classification performance.
(2) However, the improvement becomes minor once the size of the training data is larger than 20%.
Experiment 3 Results
Main Observations:
(1) The indicator is able to estimate a fairly accurate value of P when the amount of training data is sufficient.
(2) SVM achieves the best performance, typically when the amount of training data is small.
Case Studies Results
Main Observations:
(1) Estimated P is close to the true value.
(2) The P indicator is able to differentiate process quality among different projects.
Limitation of Validation
•
Lack of empirical studies for our proposed approach.
•
Classification performance for unusual sequences is not systematically analyzed.
Conclusion
•
We propose a novel quantitative machine learning approach for software process evaluation.
•
Preliminary experiments and case studies show that our approach is effective and promising.
Future Work
•
Apply our proposed approach to other, more complicated software process evaluation tasks.
•
Compare conventional approaches with our proposed machine learning approach.
Contact: Chen Ning
E-mail: hzzjucn@gmail.com
Appendix: Criteria for labeling the training set
•
Adhere to the principles of the software process methodologies adopted.
•
Ensure that no ambiguous cases appear in the minimal evaluation unit.
•
Involve experts from the organization.
Appendix: K-grams feature technique
•
Given a long sequence, a short sequence segment of any k consecutive symbols is called a k-gram.
•
Each sequence can be represented as a fixed-dimension vector of the frequencies of the k-grams that appear in the sequence.
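The definition above can be illustrated with a short sketch. It assumes single-character symbols and uses raw counts as the “frequencies”; the function name `kgram_vector` and the fixed feature order (all possible k-grams over the alphabet) are illustrative choices, not taken from the slides.

```python
from collections import Counter
from itertools import product


def kgram_vector(sequence, alphabet, k):
    """Return (features, vector): every possible k-gram and its count in `sequence`.

    The vector has len(alphabet) ** k entries regardless of the sequence length,
    which is what makes it a fixed-dimension representation.
    """
    features = ["".join(symbols) for symbols in product(alphabet, repeat=k)]
    counts = Counter(
        "".join(sequence[i:i + k]) for i in range(len(sequence) - k + 1)
    )
    return features, [counts[feature] for feature in features]


# Example with the requirement-change alphabet and k = 2.
features, vector = kgram_vector("NSARVC", alphabet="NSARVC", k=2)
```

For an alphabet of 6 symbols and k = 2 the vector has 36 dimensions, most of them zero for any single sequence. K-grams containing symbols outside the alphabet are silently dropped in this sketch.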
Appendix: Performance Metrics
•
“Area Under ROC Curve” (AUC), another metric for evaluation.
•
“Root Mean Square Error” (RMSE), a widely used measure of the differences between the predicted values and the actual truth values.
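Both metrics are simple to compute by hand; the sketch below shows one plain-Python way. The AUC here uses the pairwise-ranking interpretation (the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half), which is equivalent to the area under the ROC curve. The function names and the sample scores are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equally long lists of numbers."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(scores, labels):
    """AUC via pairwise comparison of positive (label 1) and negative (label 0) scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for four sequences (1 = normal, 0 = abnormal).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
example_rmse = rmse([1.0, 3.0], [1.0, 1.0])
```

The pairwise AUC is quadratic in the number of examples, which is fine for a few thousand sequences but would call for a rank-based computation on larger sets.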
Appendix: Formula for calculating process execution qualification rate
•
In the above definition, it is important to note that both true “precision” and “recall” on the test set are unknown during the software process evaluation phase for a real-world application (unless we manually label all the test sequences).
Appendix: Defect management process
•
There are two kinds of nodes: a “Status” node, which represents a possible status of a defect and the responsible role, and a “Decision” node, which represents a possible decision that can affect the state change of a defect.