Searching for Credible Relations in Machine Learning

Doctoral Dissertation

Vedrana Vidulin

Supervisor: prof. dr. Matjaž Gams
Co-supervisor: prof. dr. Bogdan Filipič

Ljubljana, 3 February 2012

Introduction


Task: domain analysis of complex domains

Problem:
- When DM methods construct models on complex domains, the models often contain parts (relations) that are less credible from the perspective of a human analyst.
- Less-credible parts can:
  - Lead to wrong conclusions about the most important relations in the domain
  - Undermine the user’s trust in DM methods (Stumpf et al., 2009).

Proposed solution: a new method that algorithmically combines human understanding and raw computer power in order to extract credible relations, that is, relations supported by data and meaningful for the human.

An Example


A decision-tree model is constructed:
- with the J48 algorithm in Weka,
- from a data set that represents the impact of the R&D sector on the economic welfare of a country.

Excerpt of the data set:

Country   GERD per capita   Researchers per million   Sector investing       GNI per
          (PPP$)            inhabitants (HC)          the most in R&D        capita
Armenia   7.6               1,660                     Government             low
Latvia    37.1              2,455                     Government             middle
Japan     813.7             6,227                     Business enterprise    high

Attributes: 37, describing the R&D sector
Examples: 167 countries
Class: economic welfare

An Example (2)

Outline


- Definition of credible relation
- Human-Machine Data Mining (HMDM) method
- Experimental evaluation
- Conclusions and contributions


Credible Relation


- Relation: a pattern that connects a set of attributes, which describe the properties of a concept underlying the data, with a class/target attribute that represents the concept.
- Credible relation: a relation of great meaning and of high quality:
  - Meaning: a subjective criterion attributed by the human, based on common sense, informal knowledge about the domain, and the observed frequency and stability of the relation.
  - Quality: an objective criterion that indicates support by the selected quality measures.
- Credible model: a model composed only of credible relations.
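
The definitions above can be sketched as a small data structure. This is only an illustration: the names `Relation`, `is_credible`, `is_credible_model` and the idea of per-measure minimum values (`min_quality`) are assumptions made here, not part of the HMDM method as presented. Meaning is modeled as a flag set by the analyst, quality as a dictionary of measure values.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A pattern linking a set of attributes to a class/target attribute."""
    attributes: tuple                 # names of the attributes in the relation
    target: str                       # name of the class/target attribute
    meaningful: bool = False          # subjective judgment supplied by the human
    quality: dict = field(default_factory=dict)  # measure name -> value


def is_credible(relation, min_quality):
    """Credible = meaningful to the human AND supported by every quality measure.

    `min_quality` maps measure names to hypothetical minimum values; the
    source does not prescribe thresholds, so they are an assumption.
    """
    supported = all(
        relation.quality.get(name, float("-inf")) >= floor
        for name, floor in min_quality.items()
    )
    return relation.meaningful and supported


def is_credible_model(relations, min_quality):
    """A credible model is composed only of credible relations."""
    return all(is_credible(r, min_quality) for r in relations)
```

Keeping the human judgment (`meaningful`) separate from the measured values mirrors the split between the subjective and the objective criterion.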

How to Establish Credible Relations?

The relation is composed of attributes A1 and A2.

Re-examine the relation’s credibility by:
1) Removing attributes A1 and A2 from the data set
2) Adding attributes A1 and A2 to

If the relation is supported by evidence, add it to the list of candidates for credible relations.

The HMDM Algorithm

Repeat until no new interesting relations are found:
1. Create several models (e.g., trees)
2. Choose the most interesting models
3. For each interesting model:
   - Examine the credibility of relations in the model by adding attributes to and removing attributes from the data set
   - Merge candidate relations with the output list of credible relations

The HMDM Algorithm (2)

HMDM(data set)
  REPEAT
    Select DM method
    Select parameters and their ranges, define constraints
    Perform INITIAL_DM, creating a list of models LM
    FOR each interesting model M from LM, reexamine M:
      REPEAT
        Perform any of the following: {
          ADD_ATTRIBUTES
          REMOVE_ATTRIBUTES
          Expand credibility indicator }
        Evaluate the results with several quality measures and for meaning
      UNTIL no more interesting relations are found in the search space near the initial model
      Store credible relations and integrate conclusions
    END FOR
  UNTIL no more new interesting relations are found anywhere in the data set
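
The control flow of the pseudocode can be sketched in Python as follows. This is a skeleton under stated assumptions, not the author's implementation: the human decisions (choosing interesting models, judging meaning) and the data mining steps are passed in as callables with hypothetical names (`build_models`, `is_interesting`, `reexamine`), the inner REPEAT loop is folded into `reexamine`, and `max_rounds` is a safety cap added here.

```python
def hmdm(data, build_models, is_interesting, reexamine, max_rounds=10):
    """Outer HMDM loop; returns the set of credible relations found.

    build_models(data, round_no) -> list of models (the INITIAL_DM step)
    is_interesting(model)        -> bool, the analyst's choice
    reexamine(data, model)       -> set of credible relations found near
                                    the model (ADD/REMOVE attribute steps)
    """
    credible = set()
    for round_no in range(max_rounds):
        found_new = False
        for model in build_models(data, round_no):
            if not is_interesting(model):
                continue
            # Keep only the relations that are not already in the output list.
            new_relations = reexamine(data, model) - credible
            if new_relations:
                credible |= new_relations
                found_new = True
        if not found_new:
            break  # UNTIL no more new interesting relations are found
    return credible
```

Relations must be hashable (for example `frozenset` objects of attribute names) so that they can be merged into the output set.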

HMDM: ADD_ATTRIBUTES

ATTRIBUTES

A1   A2   A3   C
1    1    0    1
1    1    0    1
1    0    1    0
0    1    1    0
1    1    0    1
0    0    1    0
1    0    0    0

Quality: Accuracy (%)
Model: J48 trees

Candidates for credible relations: A1 & A2 (combination)
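
The effect of adding attributes can be reproduced on the toy table from this slide (seven rows; C equals 1 exactly when A1 and A2 are both 1). The sketch below makes two assumptions that differ from the slide: it uses a simple majority-vote lookup table as a stand-in for J48 trees, and it reports training-set accuracy as a fraction rather than a percentage. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy data read from the slide: columns A1, A2, A3 and class C.
ROWS = [
    (1, 1, 0, 1),
    (1, 1, 0, 1),
    (1, 0, 1, 0),
    (0, 1, 1, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (1, 0, 0, 0),
]
COLUMNS = {"A1": 0, "A2": 1, "A3": 2}
CLASS_INDEX = 3


def accuracy(rows, attributes):
    """Training-set accuracy of a majority-vote lookup table over `attributes`."""
    idx = [COLUMNS[name] for name in attributes]
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in idx)
        groups[key][row[CLASS_INDEX]] += 1
    # Each group predicts its most frequent class, so it gets that many rows right.
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(rows)


def gain_from_adding(rows, base, added):
    """Change in accuracy when the `added` attributes join the `base` set."""
    return accuracy(rows, list(base) + list(added)) - accuracy(rows, base)


if __name__ == "__main__":
    for subset in (["A1"], ["A2"], ["A1", "A2"]):
        print(subset, round(accuracy(ROWS, subset), 3))
```

Neither A1 nor A2 alone classifies every row correctly, but together they do, which is the pattern the slide labels a combination.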



HMDM: REMOVE_ATTRIBUTES

ATTRIBUTES

A1   A2   A3   C
1    0    1    1
0    1    0    0
0    1    0    0
1    0    1    1
1    0    1    1
1    1    1    1
1    1    1    1

Quality: Accuracy (%)
Model: J48 trees

Candidates for credible relations: A1 || A3 (redundancy)
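
The redundancy pattern can be checked on the toy table from this slide, where A1 and A3 carry the same values as the class C. As a hedged sketch, the code below removes every subset of one or two attributes and records the drop in quality. It assumes a majority-vote lookup table in place of J48 trees and training-set accuracy as the quality measure; `removal_drops` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy data read from the slide: columns A1, A2, A3 and class C.
ROWS = [
    (1, 0, 1, 1),
    (0, 1, 0, 0),
    (0, 1, 0, 0),
    (1, 0, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 1),
]
NAMES = ("A1", "A2", "A3")
CLASS_INDEX = 3


def accuracy(rows, attributes):
    """Training-set accuracy of a majority-vote lookup table over `attributes`."""
    idx = [NAMES.index(name) for name in attributes]
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[i] for i in idx)][row[CLASS_INDEX]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


def removal_drops(rows, names, max_size=2):
    """Map each removed attribute subset to the resulting drop in accuracy."""
    full = accuracy(rows, names)
    drops = {}
    for size in range(1, max_size + 1):
        for removed in combinations(names, size):
            kept = [n for n in names if n not in removed]
            drops[removed] = full - accuracy(rows, kept)
    return drops


if __name__ == "__main__":
    for removed, drop in removal_drops(ROWS, NAMES).items():
        print(removed, round(drop, 3))
```

Removing A1 alone or A3 alone costs nothing because the other attribute takes over; quality falls only when both are removed, which is what marks the pair as redundant.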



Type-Credibility Scheme

Three levels of credibility:
1. Frequent and stable relations
   - Often appear in models
   - When added, improve quality
   - When removed, reduce quality
2. Frequent and less-stable relations
   - Often appear in models
   - When added, sometimes improve quality and sometimes not
   - When removed, sometimes reduce quality and sometimes not
3. Not supported by evidence
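
One way to turn the three levels into a rule is sketched below. The slide does not say how often a relation must appear to count as frequent, so the `min_frequency` threshold, the function name, and the use of signed quality changes as inputs are all assumptions made for illustration.

```python
def credibility_level(appearances, n_models, add_deltas, remove_deltas,
                      min_frequency=0.5):
    """Return 1, 2 or 3 following the three-level scheme.

    appearances   -- number of models in which the relation appears
    n_models      -- number of models examined
    add_deltas    -- quality changes observed when the attributes were added
    remove_deltas -- quality changes observed when the attributes were removed
    min_frequency -- hypothetical share of models needed to count as frequent
    """
    frequent = n_models > 0 and appearances / n_models >= min_frequency
    improves = [delta > 0 for delta in add_deltas]
    reduces = [delta < 0 for delta in remove_deltas]

    # Level 1: frequent, and every add improves and every removal reduces quality.
    if frequent and improves and reduces and all(improves) and all(reduces):
        return 1
    # Level 2: frequent, but the effect shows up only some of the time.
    if frequent and (any(improves) or any(reduces)):
        return 2
    # Level 3: not supported by evidence.
    return 3
```

In practice the analyst would still review the assigned level, since meaning is a subjective criterion that no threshold captures.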


Quality Measures


The decision trees are evaluated according to:
- Accuracy
- Corrected class probability estimate (CCPE)
- Kappa

The regression trees are evaluated according to:
- Correlation coefficient
- Relative absolute accuracy (RAA)


In addition, trees are evaluated according to the total change in quality caused by adding and removing attributes:

$$\text{total change in quality} = \Delta ACC + \Delta CCPE + \Delta Kappa$$
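
Two of the listed measures, accuracy and Kappa, have standard definitions and can be computed directly from actual and predicted class labels. The sketch below uses Cohen's kappa, assuming that is the Kappa statistic meant here. The CCPE formula is not given in these slides, so `total_change` simply takes CCPE values supplied by the caller; it also assumes all three measures are expressed on the same scale before their changes are summed.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of examples whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal class frequencies.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention chosen for this sketch
    return (observed - expected) / (1.0 - expected)


def total_change(before, after, measures=("ACC", "CCPE", "Kappa")):
    """Sum of the per-measure changes in quality between two models."""
    return sum(after[m] - before[m] for m in measures)
```

For example, `total_change` can compare a model built before a set of attributes was removed with the model built afterwards; a clearly negative value suggests that the removed attributes mattered.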


Experimental Evaluation


Performed on three domains:
1. Research and development (R&D)
2. Higher education
3. Automatic web genre identification

R&D Domain: Remove Attributes Graph

[Figure: remove-attributes graph for the R&D domain. Recoverable labels: GERD-PC || GERD-GDP; RES-HC || RES-FTE; APP-NON-RES]

Domains


- Higher education
  - Goal: an analysis of the impact of the higher education sector on the economic welfare of a country
  - DM methods: J48 and M5P trees
  - Data: 60 attributes; 167 examples (countries); class: GNI per capita
- Automatic web genre identification
  - Goal: improve predictive performance by eliminating less-credible relations from J48 decision-tree models
  - Data: 500 attributes (words); 1,539 examples (web pages); class: 20 genres


R&D and Higher Education Domains: Credible Relations

R&D
- First level: increase the level of investment in the R&D sector
- Second level:
  - Increase the number of patents
  - Increase the number of researchers
  - Develop the business enterprise sector as the key leader in R&D activities

Higher education
- First level: stimulate participation in higher education and improve student exchange programs
- Second level:
  - Increase the level of investment in all levels of education (“low”)
  - Increase the number of graduates in science programs (“middle”)
  - Attract more foreign students (“middle”)


Evaluation


User study on 22 participants:
- 64% of participants did not recognize less-credible relations in the single model
- When presented with credible models, all participants accepted the credible models as better

Accuracy (%), J48 vs. HMDM
Data     Value
HI-EDU   71.86   [column not recoverable]
R&D      63.47   [column not recoverable]

Correlation coefficient
Data     M5P     HMDM
HI-EDU   0.681   [single value; column not recoverable]
R&D      0.722   0.787

F-measure (data: Genres)
Average     J48     HMDM
Micro-AVG   0.280   0.370
Macro-AVG   0.284   0.377
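
To put the paired values reported above on a common footing, the relative improvement of HMDM over the automatic method can be computed as (improved - baseline) / baseline. The snippet is a small arithmetic sketch: it uses only the rows for which both values appear in the tables, and the helper name `relative_improvement` is introduced here for illustration.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline


# (baseline, HMDM) pairs taken from the rows that report both values.
RESULTS = {
    "Genres, micro-averaged F-measure (J48 vs. HMDM)": (0.280, 0.370),
    "Genres, macro-averaged F-measure (J48 vs. HMDM)": (0.284, 0.377),
    "R&D, correlation coefficient (M5P vs. HMDM)": (0.722, 0.787),
}

if __name__ == "__main__":
    for label, (baseline, improved) in RESULTS.items():
        print(f"{label}: {relative_improvement(baseline, improved):+.1%}")
```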

Conclusions


- A novel method, Human-Machine Data Mining (HMDM), was designed that combines human understanding and raw computer power to extract credible relations from data.
- The HMDM method was evaluated on three complex domains, showing that:
  - the method is able to find important relations in data
  - credible models are better in quality than the models constructed by automatic DM methods
  - humans accept credible models

Contributions


The main contributions:
- A new method, Human-Machine Data Mining (HMDM), was designed for extracting credible relations from data
- The CCPE statistical measure, originally conceived for classification rules, was extended to decision trees
- Interactive explanation structures, in the form of added-attributes and removed-attributes graphs, were designed to facilitate the extraction of credible relations

Additional contributions:
- A computer program was developed to support the HMDM method
- The analysis of three real-life domains