Case-Based Reasoning (cont.)

naivenorthAI and Robotics

Nov 8, 2013

CSC445: A Case Study


Case-Based Reasoning

Case-based reasoning

akin to the human intuitive thinking process

make use of analogies or cases of previous experiences when solving problems

useful in a wide variety of software development domains


software quality estimation


software cost estimation


software design and reuse

Case-Based Reasoning (cont.)

Working hypothesis for CBR

modules with similar attributes should belong to the same quality-based group

To obtain a CBR model for a given data set, some parameters have to be assigned

e.g., n_N and c

To obtain a preferred model, we have to vary the combinations of parameters, build the models, and choose the "best one" manually
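
The parameter search described above can be sketched as a plain enumeration of combinations. This is a minimal sketch, assuming n_N is the number of nearest neighbors and c is the threshold used by the classification rule; the candidate values and the name `parameter_grid` are hypothetical and do not come from the slides.

```python
from itertools import product

# Hypothetical candidate values; the slides do not prescribe a grid.
neighbor_counts = [13, 14, 15]   # candidate values of n_N
thresholds = [0.5, 0.6, 0.7]     # candidate values of c


def parameter_grid(n_values, c_values):
    """Return every (n_N, c) pair so that one model can be built per pair."""
    return list(product(n_values, c_values))


grid = parameter_grid(neighbor_counts, thresholds)
for n_n, c in grid:
    # Building and scoring the model is left to the modeling tool.
    print(f"build a CBR model with n_N={n_n}, c={c}")
```

Each pair yields one candidate model; the choice among them is still made by inspecting the error rates.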

Case-Based Reasoning (cont.)

A CBR system comprises three major components:

a case library

a similarity function

a solution algorithm

In a CBR system, program modules related to previously developed systems are stored in a case library

Case-Based Reasoning (cont.)

A similarity function measures the distance between the current case and all the cases in the case library.

Modules with the smallest distances from the module under investigation are considered similar and designated as the nearest neighbors.

Many similarity functions can be used, such as

city block, Euclidean, and Mahalanobis distance
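
The nearest-neighbor lookup can be sketched with the two simpler distances. This is a minimal sketch, assuming each case is a tuple of numeric metrics; the function names and the small library are hypothetical illustrations, not part of the tool used in the slides.

```python
import math


def city_block(x, c):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, c))


def euclidean(x, c):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))


def nearest_neighbors(current, library, distance, n_neighbors):
    """Return the n_neighbors closest cases as (distance, case) pairs."""
    scored = [(distance(current, case), case) for case in library]
    scored.sort(key=lambda pair: pair[0])
    return scored[:n_neighbors]


# Hypothetical case library with two metrics per module.
library = [(1.0, 2.0), (4.0, 6.0), (0.0, 0.0)]
current = (1.0, 1.0)
neighbors = nearest_neighbors(current, library, euclidean, 2)
print(neighbors)
```

Passing `city_block` instead of `euclidean` changes only the similarity function, which mirrors how the choice is presented as a model option.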

Case-Based Reasoning (cont.)

Mahalanobis distance

$$d_{ij} = (c_j - x_i)' \, S^{-1} \, (c_j - x_i)$$

where

x_i stands for the current case

c_j is the jth case in the case library

the prime (′) implies a transpose

S is the variance-covariance matrix of the independent variables over the entire case library
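
The Mahalanobis distance can be computed without external libraries for small examples. This is a minimal sketch, assuming the quadratic form (c_j − x_i)′ S⁻¹ (c_j − x_i) with no square root and a sample covariance (divisor n − 1) over the whole library; the helper names and the data are hypothetical.

```python
def covariance_matrix(cases):
    """Sample variance-covariance matrix of the metrics over all cases."""
    n = len(cases)
    dim = len(cases[0])
    means = [sum(case[k] for case in cases) / n for k in range(dim)]
    return [
        [
            sum((case[a] - means[a]) * (case[b] - means[b]) for case in cases)
            / (n - 1)
            for b in range(dim)
        ]
        for a in range(dim)
    ]


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gauss-Jordan elimination with pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def mahalanobis(x, c, cov):
    """Quadratic form (c - x)' S^-1 (c - x), without a square root."""
    diff = [a - b for a, b in zip(c, x)]
    weighted = solve(cov, diff)  # equals S^-1 (c - x)
    return sum(d * w for d, w in zip(diff, weighted))


# Hypothetical case library with two metrics per module.
library = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]
S = covariance_matrix(library)
current = (5.0, 3.0)
distances = [mahalanobis(current, case, S) for case in library]
print(distances)
```

Solving the linear system instead of inverting S explicitly keeps the sketch short; with an identity covariance matrix the result reduces to the squared Euclidean distance.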




Case-Based Reasoning (cont.)

A generalized data clustering classification rule is used as the solution algorithm of the CBR system

$$\mathrm{Class}(x_i) = \begin{cases} fp, & \text{if } \dfrac{d_{nfp}(x_i)}{d_{fp}(x_i)} > c \\ nfp, & \text{otherwise} \end{cases}$$
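
The solution algorithm can be illustrated with a short sketch. It assumes the rule is read as "assign fp when d_nfp(x_i) / d_fp(x_i) exceeds c, and nfp otherwise", and it further assumes that d_nfp and d_fp are the average distances from the current case to its nearest nfp and fp neighbors, which the slides do not define. The function names are hypothetical.

```python
def average_distance(distances):
    """Mean of a non-empty list of distances."""
    return sum(distances) / len(distances)


def classify(nfp_distances, fp_distances, c):
    """Apply the ratio rule: 'fp' if d_nfp / d_fp > c, else 'nfp'.

    nfp_distances and fp_distances are assumed to hold the distances from
    the current case to its nearest nfp and fp neighbors respectively.
    """
    d_nfp = average_distance(nfp_distances)
    d_fp = average_distance(fp_distances)
    if d_fp == 0.0:
        # The current case coincides with its fp neighbors.
        return "fp"
    return "fp" if d_nfp / d_fp > c else "nfp"


# Hypothetical distances for one module under investigation.
print(classify([2.0, 4.0], [3.0, 3.0], 0.6))
print(classify([1.0, 1.0], [4.0, 4.0], 0.6))
```

Under this reading, lowering c makes the rule assign fp more readily, which trades Type II errors for Type I errors.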

Case-Based Reasoning (cont.)

In the context of a two-group classification model, two types of misclassifications can occur:

Type I (nfp module classified as fp)

Type II (fp module classified as nfp)
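
The two misclassification types can be counted directly from actual and predicted labels. This is a minimal sketch, assuming the Type I rate is taken over the actual nfp modules and the Type II rate over the actual fp modules; the function name and the labels are hypothetical.

```python
def error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate, overall_rate) for fp/nfp labels."""
    n_nfp = sum(1 for a in actual if a == "nfp")
    n_fp = sum(1 for a in actual if a == "fp")
    if n_nfp == 0 or n_fp == 0:
        raise ValueError("both groups must be present to compute both rates")
    # Type I: an nfp module classified as fp.
    type_i = sum(1 for a, p in zip(actual, predicted) if a == "nfp" and p == "fp")
    # Type II: an fp module classified as nfp.
    type_ii = sum(1 for a, p in zip(actual, predicted) if a == "fp" and p == "nfp")
    overall = (type_i + type_ii) / len(actual)
    return type_i / n_nfp, type_ii / n_fp, overall


# Hypothetical labels for six modules.
actual = ["nfp", "nfp", "nfp", "nfp", "fp", "fp"]
predicted = ["nfp", "fp", "nfp", "nfp", "nfp", "fp"]
rates = error_rates(actual, predicted)
print(rates)
```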

Case-Based Reasoning (cont.)

For a given n_N, an inverse relationship between the Type I and Type II error rates is observed when varying the value of c

The preferred balance is that the two error rates are approximately equal with the Type II error rate being as low as possible.

[Figure: Type I and Type II error rates plotted against c]

Preferred balance: c = 0.95, Type I = 23.16%, Type II = 23.14%
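
One way to mechanize the preferred-balance strategy is to pick the candidate whose two error rates are closest and to break ties on the lower Type II rate. This is a minimal sketch of that interpretation, not the procedure used to produce the figure; the function name is hypothetical, and only the c = 0.95 row uses values from the slide, while the other two rows are invented for illustration.

```python
def preferred_balance(results):
    """Pick the (c, type_i, type_ii) row with the most balanced error rates.

    Rows are ordered by the gap between the two rates, then by Type II rate.
    """
    return min(results, key=lambda row: (abs(row[1] - row[2]), row[2]))


results = [
    (0.90, 0.30, 0.18),      # hypothetical row
    (0.95, 0.2316, 0.2314),  # values reported on the slide
    (1.00, 0.18, 0.29),      # hypothetical row
]
best = preferred_balance(results)
print(best)
```

In practice "approximately equal" is a judgment call, so a result like this is best treated as a shortlist to inspect rather than a final answer.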

An Example:

Case-Based Reasoning (cont.)

1. Create a new project

2. Choose the fit data set

Cross validation:

In K-fold cross-validation, the original sample is partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate.
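
The K-fold procedure can be sketched in a few lines. This is a minimal sketch that partitions sample indices in order, without shuffling; the function names and the `evaluate` callback are hypothetical, and the modeling tool in the slides may partition the data differently.

```python
def k_fold_splits(n_samples, k):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(training, validation) over folds."""
    scores = [evaluate(train, val) for train, val in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(10, 3))
for training, validation in splits:
    print("train:", training, "validate:", validation)
```

Each index appears in exactly one validation fold, which is the property the description above relies on.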

Case-Based Reasoning (cont.)

3. Select the metrics (independent variables) and the dependent variable.

4. Choose CBR as the modeling technique.

Case-Based Reasoning (cont.)

5. Create a new experiment

6. Choose the similarity function

Note: If you choose the Mahalanobis distance, use the “pooled covariance”

7. Model type should be “classification”

Case-Based Reasoning (cont.)

Press the “Execute” button to run the program.

Case-Based Reasoning (cont.)

The results will be displayed in this box.

8. Choose the preferred model based on the model selection strategy.

Case-Based Reasoning (cont.)

Which one is the preferred model?

c = 0.6 and n_N = 14, or c = 0.6 and n_N = 15

Type I error rate = 27.551%

Type II error rate = 28.571%

Case-Based Reasoning (cont.)

9. Once you choose the preferred model, record the parameters you used.

For example, c = 0.6 and n_N = 15

10. Then, apply the selected model (the selected parameters) to the test data set.

Case-Based Reasoning (cont.)

This is the prediction result on the test data set

Case-Based Reasoning (cont.)

12. Calculate the ECM.

For example:

ECM = (15 × 1 + 14 × 5) / 94 = 0.904255
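
The ECM arithmetic can be checked with a small helper. This is a minimal sketch, assuming ECM stands for the expected cost of misclassification and that the example reads as 15 Type I errors at cost 1 and 14 Type II errors at cost 5 over 94 modules; the slide does not label the terms, so that assignment and the function name are assumptions.

```python
def expected_cost_of_misclassification(n_type_i, n_type_ii, cost_i, cost_ii, n_modules):
    """Cost-weighted misclassification count divided by the number of modules."""
    return (n_type_i * cost_i + n_type_ii * cost_ii) / n_modules


# Numbers from the slide's example; their roles are assumed as described.
ecm = expected_cost_of_misclassification(15, 14, 1, 5, 94)
print(round(ecm, 6))  # 0.904255
```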


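Case-Based Reasoning (cont.)

Threshold | Similarity function | Fit: Type I | Fit: Type II | Fit: Overall | Test: Type I | Test: Type II | Test: Overall
----------+---------------------+-------------+--------------+--------------+--------------+---------------+--------------
1         | City Block          |             |              |              |              |               |
          | Euclidean           |             |              |              |              |               |
          | Mahalanobis         |             |              |              |              |               |
2         | City Block          |             |              |              |              |               |
          | Euclidean           |             |              |              |              |               |
          | Mahalanobis         |             |              |              |              |               |

In terms of Type I and Type II error rates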