Budgeted Machine Learning of Bayesian Networks

Michael R. Gubbels
Dr. Stephen D. Scott

Department of Computer Science and Engineering
University of Nebraska-Lincoln

McNair Scholars Program
August 2009

Overview


Introduction


Methods


Results


Discussion


Conclusions


Machine Learning

The concern of
machine learning
is to design
algorithms

for learning
general knowledge from a collection of related special cases called
examples
. A collection of related examples forms a data set for the
concept

relating its examples.


Learning algorithms construct a general
model

for relationships
among the characteristics or
attributes

in data set.


Using this model, the attributes of interest or
labels

of an example
can be learned in terms of its other attributes. This allows the label
of a new example or
instance

to be
predicted

in terms of its
attributes.

3

Machine Learning

ATTRIBUTES: Smoker | Bronchitis | Fatigue | Chest X-ray | Lung Cancer

EXAMPLES:
Yes | No  | Yes | Negative | No
No  | No  | No  | Positive | No
∙∙∙
Yes | Yes | No  | Positive | Yes

[Figure: the examples are observed by a LEARNING ALGORITHM, which generates a LEARNED MODEL; a new instance (Yes, No, Yes, Positive, ?) is presented to the model, which predicts its label (Yes).]
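The learn-then-predict loop above can be sketched with a tiny frequency-count model over the lung-cancer table. This is an illustrative stand-in (a naive-Bayes-style counter with add-one smoothing), not the method evaluated in this work; the attribute and label names come from the table.

```python
from collections import defaultdict

# (Smoker, Bronchitis, Fatigue, Chest X-ray) -> Lung Cancer
examples = [
    (("Yes", "No", "Yes", "Negative"), "No"),
    (("No", "No", "No", "Positive"), "No"),
    (("Yes", "Yes", "No", "Positive"), "Yes"),
]

def learn(examples):
    """Count attribute values per label class: a minimal learned model."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[label][(position, value)]
    priors = defaultdict(int)                       # examples per label
    for attrs, label in examples:
        priors[label] += 1
        for i, v in enumerate(attrs):
            counts[label][(i, v)] += 1
    return priors, counts

def predict(model, attrs):
    """Score each label by prior x smoothed per-attribute likelihoods."""
    priors, counts = model
    n = sum(priors.values())
    best, best_score = None, -1.0
    for label, c in priors.items():
        score = c / n
        for i, v in enumerate(attrs):
            # Add-one smoothing; the +2 assumes binary-valued attributes.
            score *= (counts[label][(i, v)] + 1) / (c + 2)
        if score > best_score:
            best, best_score = label, score
    return best

model = learn(examples)
print(predict(model, ("Yes", "Yes", "No", "Positive")))  # -> "Yes"
```

With only three examples the model is crude, but it shows the pipeline the slide depicts: examples are observed, a model is generated, and a new instance's label is predicted from its other attributes.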

Budgeted Machine Learning

In budgeted machine learning, an algorithm is given a budget and must pay for the attributes observed during learning. Using the attributes purchased from examples, the algorithm constructs a general model for a concept.

The labels of examples are free and can be observed without penalty, but each attribute has an associated cost.

Attributes are purchased until the budget is exhausted.
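The purchasing loop described above can be sketched as follows. The costs and attribute names are the illustrative ones from the next slide's table; the attribute-choice rule here (always buy the cheapest) is just a placeholder for a real policy.

```python
costs = {"Smoker": 10, "Bronchitis": 50, "Fatigue": 10, "Chest X-ray": 100}

def purchase_until_exhausted(budget, choose_attribute):
    """Buy one attribute value at a time until the next purchase is unaffordable."""
    purchased = []
    while True:
        a = choose_attribute()
        if costs[a] > budget:
            break  # budget exhausted for this attribute
        budget -= costs[a]
        purchased.append(a)
    return purchased, budget

# Placeholder policy: always buy the cheapest attribute ($10).
cheapest = min(costs, key=costs.get)
bought, left = purchase_until_exhausted(35, lambda: cheapest)
print(len(bought), left)  # 3 purchases, $5 left over
```

Labels never appear in the loop because observing them is free; only attribute values draw down the budget.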

Budgeted Machine Learning

ATTRIBUTES: Smoker | Bronchitis | Fatigue | Chest X-ray | Lung Cancer
COST:       $10    | $50        | $10     | $100        | (label: free)

EXAMPLES:
?   | ?   | ?   | ?   | No
?   | ?   | ?   | ?   | No
∙∙∙
?   | ?   | ?   | ?   | Yes

[Figure: the BUDGETED LEARNING ALGORITHM purchases attribute values from the examples, observes their free labels, and generates a LEARNED MODEL; a new instance (Yes, No, Yes, Positive, ?) is presented to and evaluated by the model, which predicts its label (Yes).]

Budgeted Machine Learning

Attribute selection policies

Round robin
  Purchases one value at a time from each attribute
  e.g., (A1, A2, A3, A4, A1, A2, A3, A4, A1, …)

Biased robin
  Repeatedly purchases values from the same attribute while those purchases improve the model
  e.g., (A1, A1, A2, A3, A3, A3, A4, A1, A1, …)
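The two purchase orders above can be generated directly. In the biased robin sketch, the improvement signal is simulated by a fixed list of outcomes, since in the real policy it comes from evaluating the learned model after each purchase.

```python
from itertools import cycle, islice

def round_robin(attributes, n_purchases):
    """Cycle through the attributes, one purchase each."""
    return list(islice(cycle(attributes), n_purchases))

def biased_robin(attributes, improved, n_purchases):
    """Stay on the current attribute while purchases improve the model;
    advance to the next attribute after a purchase that does not."""
    order, i = [], 0
    for better in islice(improved, n_purchases):
        order.append(attributes[i])
        if not better:
            i = (i + 1) % len(attributes)
    return order

A = ["A1", "A2", "A3", "A4"]
print(round_robin(A, 9))
# ['A1', 'A2', 'A3', 'A4', 'A1', 'A2', 'A3', 'A4', 'A1']
print(biased_robin(A, [True, False, False, True, True, False, False], 7))
# ['A1', 'A1', 'A2', 'A3', 'A3', 'A3', 'A4']
```

The simulated outcome list is chosen so the biased robin output reproduces the example sequence on the slide.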

Budgeted Machine Learning

Naïve Bayes classifier
  Simple to train
  Unrealistic assumption of attribute independence
  Poor estimates of true probabilities

Bayesian network
  Difficult to train
  Can model conditional independence of attributes
  Computes accurate probabilities

Purpose

Evaluate how well existing algorithms learn Bayesian networks for use in classification

Produce more accurate probability estimates
  Should improve efficacy of existing algorithms that depend on such estimates

Explicitly represent attribute independencies
  Should facilitate learning of a more accurate model
  Should improve classification performance of model

Methods

1. Generated data sets
   Used the Asia and ALARM Bayesian network models
   Asia network has 8 attributes; predicted Bronchitis
   ALARM network has 37 attributes; predicted Breathing Pressure
2. Constructed model
3. Evaluated the learned networks

Methods

1. Generated data sets
2. Constructed model
   Used round robin and biased robin policies
   Learned naïve and complex Bayesian networks (structures were given)
   Uniform and noisy prior knowledge
   Uniform attribute cost
   Varied the learning algorithm's total budget
3. Evaluated the learned networks

Methods

1. Generated data sets
2. Constructed model
3. Evaluated the learned networks
   For uniform and noisy prior knowledge
   For many numbers of purchases
   Against the baseline ("best possible") classification performance of the model
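Taken together, the Methods steps imply an experimental grid: every combination of selection policy, network type, prior, and budget is run and evaluated. The names and budget values below are placeholders; the talk does not show its actual experiment code or settings.

```python
from itertools import product

policies = ["round_robin", "biased_robin"]
networks = ["naive_bayes", "bayesian_network"]  # structures given in advance
priors   = ["uniform", "noisy"]
budgets  = [100, 500, 1000, 5000]               # illustrative values only

# One run per configuration; each run's learned network is then
# evaluated against the baseline classification performance.
runs = list(product(policies, networks, priors, budgets))
print(len(runs))  # 2 * 2 * 2 * 4 = 32 configurations
```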

Results

[Results figures: classification performance of the learned networks; images not preserved in the extracted text.]

Discussion

Naïve Bayesian networks
  Converge to baseline faster with uniform priors

Bayesian networks
  Have more accurate baseline than naïve networks
  Converge to baseline faster with noisy priors

Conclusions

Bayesian networks learned using existing algorithms converge to baseline

Bayesian networks may be preferable to naïve networks when learning complex concepts or when prior knowledge is available

Future work
  Evaluate more existing policies with Bayesian networks
  Analysis of models learned for complex concepts
  New algorithms to exploit Bayesian network structure
  Learning from data with different cost models

Acknowledgements

Dr. Stephen D. Scott, Research Mentor
Kun Deng, Graduate Student
Amy Lehman, Graduate Student Mentor
UNL McNair Scholars Program

Bibliography

Lizotte, D. J., Madani, O., & Greiner, R. (2003). Budgeted learning of naïve-Bayes classifiers. Uncertainty in Artificial Intelligence, 378-385.

Tong, S., & Koller, D. (2001). Active learning for parameter estimation in Bayesian networks. International Joint Conferences on Artificial Intelligence.

Deng, K., Bourke, C., Scott, S., Sunderman, J., & Zheng, Y. (2007). Bandit-based algorithms for budgeted learning. Seventh IEEE International Conference on Data Mining, 463-468.

Neapolitan, R. E. (2004). Learning Bayesian networks. New Jersey: Pearson Prentice Hall.

Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.

Pseudocode

ATTRIBUTE SELECTION POLICIES

ROUND ROBIN:
  a ← SELECT(MIN-COST(A))
  UNTIL(BUDGET-EXHAUSTED?):
    e ← SELECT(RANDOM(E))
    v ← PURCHASE(a, e)
    M ← LEARN_M(v)
    a ← SELECT(NEXT(A))

BIASED ROBIN:
  a ← SELECT(MIN-COST(A))
  UNTIL(BUDGET-EXHAUSTED?):
    e ← SELECT(RANDOM(E))
    m_old ← CORRECTNESS(M)
    v ← PURCHASE(a, e)
    M ← LEARN_M(v)
    m_new ← CORRECTNESS(M)
    IF(m_new < m_old):
      a ← SELECT(NEXT(A))

RANDOM:
  a ← SELECT(RANDOM(A))
  UNTIL(BUDGET-EXHAUSTED?):
    e ← SELECT(RANDOM(E))
    v ← PURCHASE(a, e)
    M ← LEARN_M(v)
    a ← SELECT(RANDOM(A))

Figure 4. Pseudocode for the round robin, biased robin, and random data selection policies. A is the set of attributes available to purchase values from, and a is a particular attribute in A. E is the set of examples, and e is a particular example in E. v is an attribute value in an example. M denotes the specific model being learned, and the variables m_old and m_new represent the correctness of model M before and after learning new information.
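The biased robin pseudocode in Figure 4 can be rendered as runnable Python. PURCHASE, LEARN_M, and CORRECTNESS are not specified in the figure, so they are taken here as caller-supplied functions; the explicit per-purchase cost accounting is also an added assumption, since the figure only tests BUDGET-EXHAUSTED?.

```python
import random

def biased_robin(attributes, examples, budget, costs,
                 purchase, learn, correctness):
    """Biased robin (Figure 4): keep buying from attribute a while each
    purchase improves the model's correctness; otherwise advance to the
    next attribute (cyclically). Returns the learned model."""
    model = None
    # a <- SELECT(MIN-COST(A))
    i = min(range(len(attributes)), key=lambda j: costs[attributes[j]])
    while budget >= costs[attributes[i]]:   # UNTIL(BUDGET-EXHAUSTED?)
        a = attributes[i]
        e = random.choice(examples)         # e <- SELECT(RANDOM(E))
        m_old = correctness(model)          # m_old <- CORRECTNESS(M)
        v = purchase(a, e)                  # v <- PURCHASE(a, e)
        budget -= costs[a]
        model = learn(model, v)             # M <- LEARN_M(v)
        m_new = correctness(model)          # m_new <- CORRECTNESS(M)
        if m_new < m_old:                   # purchase did not help:
            i = (i + 1) % len(attributes)   # a <- SELECT(NEXT(A))
    return model

# Toy demo with stand-in functions: every purchase "improves" the model
# (correctness = number of values learned), so the policy never leaves
# the cheapest attribute.
model = biased_robin(
    ["A1", "A2"], ["e1"], budget=3, costs={"A1": 1, "A2": 1},
    purchase=lambda a, e: (a, e),
    learn=lambda m, v: (m or []) + [v],
    correctness=lambda m: 0 if m is None else len(m),
)
print(model)  # three purchases, all from A1
```

Swapping the `if m_new < m_old` advance rule for an unconditional `i = (i + 1) % len(attributes)` yields round robin, and replacing it with a random draw yields the random policy, mirroring the structure shared by the three blocks in Figure 4.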
