CSC445: A Case Study
Case-Based Reasoning
Case-based reasoning (CBR)
– is akin to the human intuitive thinking process
– makes use of analogies, or cases from previous experience, when solving problems
– is useful in a wide variety of software development domains:
  – software quality estimation
  – software cost estimation
  – software design and reuse
Case-Based Reasoning (cont.)
Working hypothesis for CBR
– modules with similar attributes should belong to the same quality-based group
To obtain a CBR model for a given data set, some parameters have to be assigned
– e.g., n_N and c
To obtain a preferred model, we have to vary the combinations of parameters, build the models, and choose the “best one” manually
Case-Based Reasoning (cont.)
A CBR system comprises three major components:
– a case library
– a similarity function
– a solution algorithm
In a CBR system, program modules from previously developed systems are stored in a case library.
Case-Based Reasoning (cont.)
A similarity function measures the distance between the current case and all the cases in the case library.
Modules with the smallest distances from the module under investigation are considered similar and designated as the nearest neighbors.
Many similarity functions can be used, such as
– city block, Euclidean, and Mahalanobis distance
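The first two similarity functions, and the nearest-neighbor lookup they support, can be sketched in a few lines of Python. This is a minimal illustration rather than the course tool's implementation: the function names, the two-metric case library, and its values are all hypothetical.

```python
import math

def city_block(x, c):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, c))

def euclidean(x, c):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def nearest_neighbors(x, case_library, n_neighbors, distance=euclidean):
    """Return the n_neighbors closest cases as (distance, case) pairs."""
    ranked = sorted(((distance(x, c), c) for c in case_library),
                    key=lambda pair: pair[0])
    return ranked[:n_neighbors]

# Hypothetical case library: each case is a tuple of two software metrics.
library = [(10.0, 2.0), (12.0, 3.0), (40.0, 9.0), (11.0, 2.5)]
current = (11.0, 2.0)

neighbors = nearest_neighbors(current, library, n_neighbors=2)
for dist, case in neighbors:
    print(case, round(dist, 3))
```

Passing `distance=city_block` switches the similarity function without changing the lookup.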
Case-Based Reasoning (cont.)
Mahalanobis distance
$$d_{ij} = (x_i - c_j)'\, S^{-1}\, (x_i - c_j)$$
where
– x_i stands for the current case
– c_j is the jth case in the case library
– the prime (′) implies a transpose
– S is the variance-covariance matrix of the independent variables over the entire case library
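A dependency-free sketch of the Mahalanobis distance follows. It returns the quadratic form (x_i − c_j)′ S⁻¹ (x_i − c_j) without taking a square root, which matches the definition on this slide. Two things are assumptions: the sample covariance uses an n − 1 divisor, and S⁻¹(x_i − c_j) is obtained by solving a linear system rather than inverting S explicitly. All function names and data values are hypothetical.

```python
def covariance_matrix(cases):
    """Sample variance-covariance matrix (n - 1 divisor, an assumption)."""
    n, k = len(cases), len(cases[0])
    means = [sum(row[j] for row in cases) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in cases) / (n - 1)
             for b in range(k)] for a in range(k)]

def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y

def mahalanobis(x, c, S):
    """Quadratic form (x - c)' S^{-1} (x - c)."""
    diff = [a - b for a, b in zip(x, c)]
    y = solve(S, diff)  # y = S^{-1} (x - c)
    return sum(d * v for d, v in zip(diff, y))

# Hypothetical case library with two metrics per module.
library = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]
S = covariance_matrix(library)
distances = [mahalanobis((5.0, 2.0), case, S) for case in library]
```

With S equal to the identity matrix, the result reduces to the squared Euclidean distance, which is a convenient sanity check.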
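Case-Based Reasoning (cont.)
A generalized data clustering classification rule is used as the solution algorithm of the CBR system
$$\mathrm{Class}(x_i) = \begin{cases} \mathit{fp}, & \text{if } \dfrac{d_{\mathit{nfp}}(x_i)}{d_{\mathit{fp}}(x_i)} > c \\ \mathit{nfp}, & \text{otherwise} \end{cases}$$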
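The rule can be sketched in Python under several assumptions the slide does not spell out: d_fp(x_i) and d_nfp(x_i) are taken to be the average distances from x_i to its n_N nearest fp and nfp cases, the comparison is taken to be "greater than", and a module at distance zero from the fp cases is treated as fp. The function names and data are hypothetical.

```python
def city_block(x, c):
    """City block distance, used here only as an example similarity function."""
    return sum(abs(a - b) for a, b in zip(x, c))

def mean_nearest_distance(x, cases, n_neighbors, distance):
    """Average distance from x to its n_neighbors nearest cases (assumed definition)."""
    nearest = sorted(distance(x, c) for c in cases)[:n_neighbors]
    return sum(nearest) / len(nearest)

def classify(x, fp_cases, nfp_cases, n_neighbors, c, distance=city_block):
    """Return 'fp' if d_nfp(x) / d_fp(x) > c, otherwise 'nfp'."""
    d_fp = mean_nearest_distance(x, fp_cases, n_neighbors, distance)
    d_nfp = mean_nearest_distance(x, nfp_cases, n_neighbors, distance)
    if d_fp == 0:
        return "fp"  # identical to known fp cases (assumed tie handling)
    return "fp" if d_nfp / d_fp > c else "nfp"

# Hypothetical case library, split by known class.
fp_cases = [(40.0, 9.0), (35.0, 8.0)]
nfp_cases = [(10.0, 2.0), (12.0, 3.0), (11.0, 2.5)]

label_a = classify((30.0, 7.0), fp_cases, nfp_cases, n_neighbors=2, c=1.0)
label_b = classify((11.0, 2.0), fp_cases, nfp_cases, n_neighbors=2, c=1.0)
```

Under this reading, raising c makes the rule less willing to assign fp, which is the lever used to trade Type I errors against Type II errors.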
Case-Based Reasoning (cont.)
In the context of a two-group classification model, two types of misclassification can occur:
– Type I (a not fault-prone, nfp, module classified as fault-prone, fp)
– Type II (an fp module classified as nfp)
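As a small illustration, the two error rates can be computed from actual and predicted labels. The sketch assumes the usual definitions (Type I rate as the fraction of actual nfp modules predicted fp, Type II rate as the fraction of actual fp modules predicted nfp). The function name and the labels are hypothetical.

```python
def error_rates(actual, predicted):
    """Return Type I, Type II, and overall misclassification rates."""
    n_nfp = actual.count("nfp")
    n_fp = actual.count("fp")
    if n_nfp == 0 or n_fp == 0:
        raise ValueError("both classes must be present")
    type_i = sum(1 for a, p in zip(actual, predicted) if a == "nfp" and p == "fp")
    type_ii = sum(1 for a, p in zip(actual, predicted) if a == "fp" and p == "nfp")
    return {
        "type_i": type_i / n_nfp,
        "type_ii": type_ii / n_fp,
        "overall": (type_i + type_ii) / len(actual),
    }

# Hypothetical labels for six modules.
actual = ["nfp", "nfp", "nfp", "nfp", "fp", "fp"]
predicted = ["nfp", "nfp", "nfp", "fp", "fp", "nfp"]
rates = error_rates(actual, predicted)
```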
Case-Based Reasoning (cont.)
For a given n_N, an inverse relationship between the Type I and Type II error rates is observed when varying the value of c.
The preferred balance is that the two error rates are approximately equal, with the Type II error rate being as low as possible.
An example:
[Figure: Type I and Type II error rates plotted against c (0.8 to 1). Preferred balance: c = 0.95, Type I = 23.16%, Type II = 23.14%]
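One way to turn the preferred-balance strategy into code is sketched below. It is an interpretation rather than the course tool's procedure: among the candidate values of c, it picks the one with the smallest gap between the two error rates and breaks ties by the lower Type II rate. It also assumes the classification rule assigns fp when the ratio d_nfp/d_fp exceeds c. The ratios, labels, and candidate values are hypothetical.

```python
def rates_at(ratios, actual, c):
    """Type I and Type II error rates when 'fp' is predicted for ratio > c."""
    predicted = ["fp" if r > c else "nfp" for r in ratios]
    n_nfp = actual.count("nfp")
    n_fp = actual.count("fp")
    type_i = sum(1 for a, p in zip(actual, predicted) if a == "nfp" and p == "fp")
    type_ii = sum(1 for a, p in zip(actual, predicted) if a == "fp" and p == "nfp")
    return type_i / n_nfp, type_ii / n_fp

def preferred_threshold(ratios, actual, candidates):
    """Pick c with the most balanced error rates; ties go to lower Type II."""
    scored = []
    for c in candidates:
        type_i, type_ii = rates_at(ratios, actual, c)
        scored.append((round(abs(type_i - type_ii), 6), type_ii, c))
    return min(scored)[2]

# Hypothetical d_nfp / d_fp ratios for ten modules and their known classes.
ratios = [0.5, 0.7, 0.85, 0.97, 1.1, 0.82, 0.93, 1.2, 1.5, 2.0]
actual = ["nfp"] * 5 + ["fp"] * 5

best_c = preferred_threshold(ratios, actual, candidates=[0.8, 0.9, 1.0])
```

On this toy data the Type I rate falls and the Type II rate rises as c increases, which is the inverse relationship described above.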
Case-Based Reasoning (cont.)
1. Create a new project
2. Choose the fit data set
Cross validation: In K-fold cross-validation, the original sample is partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate.
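The partitioning described above can be sketched as follows. This is a minimal version that splits indices in their given order; shuffling the data first is common in practice but is left out here. The function names and the `evaluate` callback are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def cross_validate(n_samples, k, evaluate):
    """Average the per-fold results returned by evaluate(train, validation)."""
    scores = [evaluate(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(10, 3))

# Placeholder evaluation: reports the validation fold size instead of a model score.
average_fold_size = cross_validate(10, 3, lambda train, validation: len(validation))
```

Each index appears in exactly one validation fold, so every module is used once for validation and K − 1 times for training.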
Case-Based Reasoning (cont.)
3. Select the metrics (independent variables) and the dependent variable.
4. Choose CBR as the model.
Case-Based Reasoning (cont.)
5. Create a new experiment
6. Choose the similarity function
Note: If you choose Mahalanobis distance, use the “pooled covariance”
7. Model type should be “classification”
Case-Based Reasoning (cont.)
Press the “Execute” button to run the program.
Case-Based Reasoning (cont.)
The results will be displayed in this box.
8. Choose the preferred model based on the model selection strategy.
Case-Based Reasoning (cont.)
Which one is the preferred model?
c = 0.6 and n_N = 14, or c = 0.6 and n_N = 15
Type I error rate = 27.551%
Type II error rate = 28.571%
Case-Based Reasoning (cont.)
9. Once you choose the preferred model, record the parameters you used. For example, c = 0.6 and n_N = 15.
10. Then apply the selected model (the selected parameters) to the test data set.
Case-Based Reasoning (cont.)
This is the prediction result on the test data set.
Case-Based Reasoning (cont.)
12. Calculate the ECM (expected cost of misclassification).
For example:
ECM = (15 × 1 + 14 × 5) / 94 = 0.904255
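The arithmetic in this example can be reproduced with a short function. The reading of the numbers is an assumption, since the slide does not label them: 15 Type I misclassifications at a cost of 1 each, 14 Type II misclassifications at a cost of 5 each, and 94 modules in the test data set. The function and parameter names are hypothetical.

```python
def expected_cost_of_misclassification(n_type_i, n_type_ii, cost_i, cost_ii, n_modules):
    """ECM = (cost_i * n_type_i + cost_ii * n_type_ii) / n_modules."""
    return (cost_i * n_type_i + cost_ii * n_type_ii) / n_modules

# Values from the example, interpreted as counts and unit costs (an assumption).
ecm = expected_cost_of_misclassification(
    n_type_i=15, n_type_ii=14, cost_i=1, cost_ii=5, n_modules=94
)
print(round(ecm, 6))
```

Because the products are symmetric, the result is the same if the counts and costs are read the other way around.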
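Case-Based Reasoning (cont.)
| Threshold | Similarity function | Fit: Type I | Fit: Type II | Fit: Overall | Test: Type I | Test: Type II | Test: Overall |
| 1 | City Block | | | | | | |
| 1 | Euclidean | | | | | | |
| 1 | Mahalanobis | | | | | | |
| 2 | City Block | | | | | | |
| 2 | Euclidean | | | | | | |
| 2 | Mahalanobis | | | | | | |
In terms of Type I and Type II error rates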