Searching for Credible Relations in Machine Learning
Doctoral Dissertation
Vedrana Vidulin
Supervisor: prof. dr. Matjaž Gams
Co-supervisor: prof. dr. Bogdan Filipič
Ljubljana, 3 February 2012
Introduction
• Task: domain analysis of complex domains
• Problem:
  – When data mining (DM) methods construct models on complex domains, the models often contain parts (relations) that are less credible from the perspective of a human analyst.
  – Less-credible parts can:
    • Lead to wrong conclusions about the most important relations in the domain
    • Undermine the user's trust in DM methods (Stumpf et al., 2009).
• Proposed solution: a new method that algorithmically combines human understanding and raw computer power in order to extract credible relations, that is, relations supported by data and meaningful for the human.
An Example
• A decision-tree model is constructed:
  – With the J48 algorithm in Weka,
  – From a data set that represents the impact of the R&D sector on the economic welfare of a country

Country   GERD per capita (PPP$)   Researchers per million inhabitants (HC)   …   Sector investing the most in R&D   GNI per capita
Armenia   7.6                      1,660                                      …   Government                         low
Latvia    37.1                     2,455                                      …   Government                         middle
Japan     813.7                    6,227                                      …   Business enterprise                high
…         …                        …                                          …   …                                  …

37 attributes: R&D sector
167 examples: countries
Class: economic welfare
An Example (2)
Outline
• Definition of credible relation
• Human-Machine Data Mining (HMDM) method
• Experimental evaluation
• Conclusions and contributions
Credible Relation
• Relation: a pattern that connects a set of attributes that describe the properties of a concept underlying the data and a class/target attribute that represents the concept.
• Credible relation: a relation of great meaning and of high quality:
  – Meaning: a subjective criterion attributed by the human, based on common sense, informal knowledge about the domain, and the observed frequency and stability of the relation.
  – Quality: an objective criterion that indicates support by the selected quality measures.
• Credible model: composed only of credible relations.
How to Establish Credible Relations?
The relation is composed of attributes A1 and A2.
Re-examine the relation's credibility by:
1) Removing attributes A1 and A2 from the data set
2) Adding attributes A1 and A2 to the empty attribute set ∅
If the relation is supported by evidence, add it to the list of candidates for credible relations.
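The two re-examination steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the toy data (class = A1 AND A2, with A3 as noise), the `table_accuracy` helper (a majority-vote lookup table scored on its own training data, standing in for a J48 tree), and the name `reexamine` are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product


def table_accuracy(rows, labels, attrs):
    """Training accuracy of a majority-vote lookup table over `attrs`.

    Hypothetical stand-in for building and evaluating a J48 tree.
    With an empty `attrs` it predicts the majority class.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)][label] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)


def reexamine(rows, labels, all_attrs, relation):
    """Quality change when the relation's attributes are removed and added."""
    rest = [a for a in all_attrs if a not in relation]
    full = table_accuracy(rows, labels, all_attrs)
    return {
        # 1) remove A1 and A2 from the full attribute set
        "drop_when_removed": full - table_accuracy(rows, labels, rest),
        # 2) add A1 and A2 to the empty attribute set
        "gain_when_added": table_accuracy(rows, labels, list(relation))
        - table_accuracy(rows, labels, []),
    }


# Toy data set (assumption): all combinations of three binary attributes.
rows = [{"A1": a, "A2": b, "A3": c} for a, b, c in product((0, 1), repeat=3)]
labels = [r["A1"] & r["A2"] for r in rows]

result = reexamine(rows, labels, ["A1", "A2", "A3"], ("A1", "A2"))
print(result)
```

On this toy data both changes are positive, which is the kind of evidence that would place A1 and A2 on the candidate list.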
The HMDM Algorithm
Until no new interesting relations are found, repeat:
    Create several models (e.g., trees)
    Choose the most interesting models
    For each interesting model
        Examine the credibility of relations in the model by adding and removing attributes from the data set
    Merge candidate relations with the output list of credible relations
The HMDM Algorithm (2)
HMDM(data set)
REPEAT
    Select DM method
    Select parameters and their ranges, define constraints
    Perform INITIAL_DM creating a list of models LM
    FOR each interesting model M from LM, reexamine M:
        REPEAT
            Perform any of the following: {
                ADD_ATTRIBUTES
                REMOVE_ATTRIBUTES
                Expand credibility indicator }
            Evaluate the results with several quality measures and for meaning
        UNTIL no more interesting relations are found in the search space near the initial model
        Store credible relations and integrate conclusions
    END FOR
UNTIL no more new interesting relations are found anywhere in the data set
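The control flow of the outer loop can be sketched in Python. This is only a skeleton under assumptions: the human decisions (which models are interesting, which relations are credible) are modelled as hypothetical callbacks `is_interesting` and `reexamine`, the inner REPEAT is folded into `reexamine`, and `max_rounds` is an added safety limit that the pseudocode does not mention.

```python
def hmdm(build_models, is_interesting, reexamine, max_rounds=10):
    """Collect credible relations until a round yields nothing new."""
    credible = []
    for _ in range(max_rounds):
        models = build_models()  # INITIAL_DM: list of models LM
        found_new = False
        for model in filter(is_interesting, models):
            # ADD_ATTRIBUTES / REMOVE_ATTRIBUTES happen inside reexamine
            for relation in reexamine(model):
                if relation not in credible:
                    credible.append(relation)
                    found_new = True
        if not found_new:  # UNTIL no more new interesting relations
            break
    return credible


# Toy callbacks (assumptions): models are tuples of attribute names.
rounds = iter([[("A1", "A2"), ("A3",)], [("A1", "A2")]])
found = hmdm(
    build_models=lambda: next(rounds, []),
    is_interesting=lambda model: "A3" not in model,  # the analyst's judgement
    reexamine=lambda model: [model],                 # relation = whole model
)
print(found)
```

Passing the human judgements in as functions keeps the loop itself independent of the DM method, which matches the pseudocode's "Select DM method" step being a choice made per iteration.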
HMDM: ADD_ATTRIBUTES
[Figure: add-attributes graph with a 0/1 table over columns A1, A2, A3, C. Model: J48 trees. Quality: accuracy (%).]
Candidates for credible relations
A1 & A2: combination
…
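One way to picture how a combination such as A1 & A2 could surface when attributes are added: score every non-empty attribute subset and flag pairs that score higher than either member alone. The sketch below is an assumption-laden illustration, not the procedure behind the slide's graph: the toy data (C = A1 XOR A2, A3 noise), the lookup-table `accuracy` helper standing in for J48, and the "better than each attribute alone" criterion are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def accuracy(rows, attrs, target="C"):
    """Training accuracy of a majority-vote lookup table (stand-in for J48)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in attrs)][row[target]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


# Toy data (assumption): the class is the XOR of A1 and A2.
rows = [dict(A1=a, A2=b, A3=c, C=a ^ b) for a, b, c in product((0, 1), repeat=3)]
attrs = ("A1", "A2", "A3")

# Score every non-empty subset of attributes added to the empty set.
scores = {
    subset: accuracy(rows, subset)
    for size in range(1, len(attrs) + 1)
    for subset in combinations(attrs, size)
}

# Hypothetical criterion: a pair is a combination if it beats both singletons.
combos = [
    pair for pair in scores
    if len(pair) == 2 and scores[pair] > max(scores[(a,)] for a in pair)
]
print(combos)
```

Here neither A1 nor A2 improves on the majority-class baseline alone, while the pair predicts the class perfectly, so only (A1, A2) is flagged.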
HMDM: REMOVE_ATTRIBUTES
[Figure: remove-attributes graph with a 0/1 table over columns A1, A2, A3, C. Model: J48 trees. Quality: accuracy (%).]
Candidates for credible relations
A1 || A3: redundancy
…
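A redundancy such as A1 || A3 can be pictured as follows: removing either attribute alone leaves quality unchanged, but removing both reduces it. The Python sketch below illustrates that reading under assumptions: the toy data (A3 is a copy of A1, the class equals A1, A2 is noise), the lookup-table `accuracy` helper standing in for J48, and the exact detection rule are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def accuracy(rows, attrs, target="C"):
    """Training accuracy of a majority-vote lookup table (stand-in for J48)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in attrs)][row[target]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


# Toy data (assumption): A3 duplicates A1, and the class equals A1.
rows = [dict(A1=a, A2=b, A3=a, C=a) for a, b in product((0, 1), repeat=2)]
attrs = ("A1", "A2", "A3")


def without(*removed):
    """Accuracy after removing the given attributes from the data set."""
    return accuracy(rows, [a for a in attrs if a not in removed])


full = without()
# Hypothetical rule: each attribute is dispensable alone, but not both at once.
redundant = [
    (x, y)
    for x, y in combinations(attrs, 2)
    if without(x) == full and without(y) == full and without(x, y) < full
]
print(redundant)
```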
Type-Credibility Scheme
• Three levels of credibility:
  1. Frequent and stable relations
     • Often appear in models
     • When added, improve quality
     • When removed, reduce quality
  2. Frequent and less-stable relations
     • Often appear in models
     • When added, sometimes improve quality and sometimes not
     • When removed, sometimes reduce quality and sometimes not
  3. Not supported by evidence
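The three levels can be expressed as a small decision rule. The sketch below is one possible reading, not the dissertation's definition: the function name, the 50% frequency threshold `min_frequency`, and the use of signed quality changes (positive when adding helps, negative when removing hurts) are assumptions.

```python
def credibility_level(appearances, n_models, add_deltas, remove_deltas,
                      min_frequency=0.5):
    """Map observed evidence for a relation to credibility level 1, 2, or 3.

    add_deltas:    quality changes observed when the relation was added
    remove_deltas: quality changes observed when the relation was removed
    """
    frequent = n_models > 0 and appearances / n_models >= min_frequency
    helped = [d > 0 for d in add_deltas]     # adding improved quality
    hurt = [d < 0 for d in remove_deltas]    # removing reduced quality
    if frequent and helped and hurt and all(helped) and all(hurt):
        return 1  # frequent and stable
    if frequent and (any(helped) or any(hurt)):
        return 2  # frequent and less stable
    return 3      # not supported by evidence


level_stable = credibility_level(8, 10, [2.0, 1.5], [-3.0, -0.5])
level_less_stable = credibility_level(8, 10, [2.0, -0.1], [-3.0, 0.0])
level_unsupported = credibility_level(1, 10, [2.0], [-1.0])
print(level_stable, level_less_stable, level_unsupported)
```

In HMDM the final judgement also involves meaning as assessed by the human, which a numeric rule like this does not capture.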
Quality Measures
• Decision trees are evaluated according to:
  – Accuracy
  – Corrected class probability estimate (CCPE)
  – Kappa
• Regression trees are evaluated according to:
  – Correlation coefficient
  – Relative absolute accuracy (RAA)
• In addition, trees are evaluated according to Δ, the total change in quality caused by adding and removing attributes:
  $\Delta = ACC_{\Delta} + CCPE_{\Delta} + Kappa_{\Delta}$
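As a worked illustration of these measures, the Python sketch below computes accuracy and Cohen's kappa from predictions and sums per-measure changes into Δ. Several things are assumptions: the slide's "Kappa" is taken to be Cohen's kappa, CCPE is not computed (its definition is not given here) and is passed in as a number, all measures are assumed to be on a comparable scale, and the example values are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance (assumes p_e < 1)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def total_delta(before, after):
    """Delta = ACC change + CCPE change + Kappa change."""
    return sum(after[m] - before[m] for m in ("ACC", "CCPE", "Kappa"))


y_true = ["low", "low", "high", "high"]
y_pred = ["low", "high", "high", "high"]
kappa = cohen_kappa(y_true, y_pred)

# Made-up measure values before and after adding an attribute.
before = {"ACC": 0.70, "CCPE": 0.60, "Kappa": 0.40}
after = {"ACC": 0.75, "CCPE": 0.62, "Kappa": 0.50}
delta = total_delta(before, after)
print(kappa, delta)
```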
Experimental Evaluation
• Performed on three domains:
  1. Research and development (R&D)
  2. Higher education
  3. Automatic web genre identification
R&D Domain: Remove Attributes Graph
[Figure: remove-attributes graph for the R&D domain. Labeled relations: GERD-PC || GERD-GDP; RES-HC || RES-FTE; APP-NON-RES.]
Domains
• Higher education
  – Goal: an analysis of the impact of the higher education sector on the economic welfare of a country
  – DM methods: J48 and M5P trees
  – Data: 60 attributes; 167 examples: countries; class: GNI per capita
• Automatic web genre identification
  – Goal: improve predictive performance by eliminating less-credible relations from J48 decision-tree models
  – Data: 500 attributes: words; 1,539 examples: web pages; class: 20 genres
R&D and Higher Education Domains: Credible Relations
R&D
• First level: increase the level of investment in the R&D sector
• Second level:
  – Increase the number of patents
  – Increase the number of researchers
  – Develop the business enterprise sector as the key leader in R&D activities
Higher education
• First level: stimulate participation in higher education and improve student exchange programs
• Second level:
  – Increase the level of investment in all levels of education ("low")
  – Increase the number of graduates in science programs ("middle")
  – Attract more foreign students ("middle")
Evaluation
• User study with 22 participants:
  – 64% of participants did not recognize less-credible relations in a single model
  – When presented with credible models, all participants accepted them as better

Accuracy (%), columns: Data, J48, HMDM
HI-EDU   71.86
R&D      63.47

Correlation coefficient, columns: Data, M5P, HMDM
HI-EDU   0.681
R&D      0.722   0.787

Data: Genres
F-Measure   J48     HMDM
Micro-AVG   0.280   0.370
Macro-AVG   0.284   0.377
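For reference, micro- and macro-averaged F-measures differ in where the averaging happens: micro-averaging pools true positives, false positives, and false negatives over all classes before computing F, while macro-averaging computes F per class and then takes the mean. The Python sketch below shows the standard F1 definitions on a single-label toy example; the genre names and predictions are made up, and the single-label setting is a simplifying assumption.

```python
def f_measures(y_true, y_pred):
    """Return (micro-averaged F1, macro-averaged F1) for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = dict.fromkeys(classes, 0)
    fp = dict.fromkeys(classes, 0)
    fn = dict.fromkeys(classes, 0)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Made-up example with three hypothetical genres.
y_true = ["blog", "blog", "faq", "shop"]
y_pred = ["blog", "faq", "faq", "faq"]
micro, macro = f_measures(y_true, y_pred)
print(micro, macro)
```

Macro-averaging weights every genre equally, so a genre that is never predicted correctly pulls the macro score down regardless of how few pages it has.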
Conclusions
• A novel method, Human-Machine Data Mining (HMDM), was designed that combines human understanding and raw computer power to extract credible relations from data.
• The HMDM method was evaluated on three complex domains, showing that:
  – the method is able to find important relations in data
  – credible models are better in quality than the models constructed by automatic DM methods
  – humans accept credible models
Contributions
• The main contributions:
  – A new method, Human-Machine Data Mining (HMDM), was designed for extracting credible relations from data
  – The CCPE statistical measure, originally conceived for classification rules, was extended to decision trees
  – Interactive explanation structures in the form of added-attributes and removed-attributes graphs were designed to facilitate the extraction of credible relations
• Additional contributions:
  – A computer program was developed to support the HMDM method
  – The analysis of three real-life domains