Bayesian Networks


1. Probability theory
2. BN as knowledge model
3. Bayes in Court
4. Dazzle examples
5. Conclusions

Reverend Thomas Bayes (1702-1761)

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004.
http://www.few.vu.nl/onderwijs/stage/werkstuk/werkstukken/werkstuk-ijzerman.doc


Thought Experiment: Hypothesis Selection

Imagine two types of bag:
- BagA: 250 balls of one kind + 750 of the other
- BagB: 750 + 250

Take 5 balls from a bag:
- Result: 4 of the first kind + 1 of the other

What is the type of the bag?

Probability of this result from:
- BagA: 0.0144
- BagB: 0.396

Conclusion: The bag is BagB.

But…
- We don’t know how the bag was selected
- We don’t even know that type BagB exists

The experiment is meaningful only in light of the a priori posed hypotheses (BagA, BagB) and their assumed likelihoods.
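The two likelihoods can be checked with a short script. A minimal sketch in Python, assuming the 5 balls are drawn without replacement (which reproduces the numbers above):

```python
from math import comb

def likelihood(first_kind_in_bag, total=1000, drawn_first=4, drawn_other=1):
    """Hypergeometric probability of drawing 4 balls of the first kind
    and 1 of the other, without replacement, from a bag of 1000 balls."""
    other_in_bag = total - first_kind_in_bag
    draws = drawn_first + drawn_other
    return (comb(first_kind_in_bag, drawn_first) *
            comb(other_in_bag, drawn_other) / comb(total, draws))

print(f"BagA (250 + 750): {likelihood(250):.4f}")   # ~0.0144
print(f"BagB (750 + 250): {likelihood(750):.4f}")   # ~0.3963
```

Comparing these two numbers is only a likelihood comparison; as the slide stresses, turning them into a probability for each bag type requires a priori probabilities for the hypotheses.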


Classical and Bayesian statistics

Classical statistics:
- Compute the prob for your data, assuming a hypothesis
- Reject a hypothesis if the data becomes unlikely

Bayesian statistics:
- Compute the prob for a hypothesis, given your data
- Requires a priori prob for each hypothesis; these are extremely important!


Part I: Probability theory

What is a probability?
- Frequentist: relative frequency of occurrence.
- Subjectivist: amount of belief.
- Mathematician: axioms (Kolmogorov); assignment of non-negative numbers to a set of states, with sum 1 (100%).
- A state has several variables: product space. With n binary variables: 2^n states.
- Multi-valued variables.


                 Blond   Not blond
                 30      70

                 Blond   Not blond
  Mother blond   15      15
  Mother n.b.    15      55


Conditional Probability: Using evidence


- First table: probability for any woman to deliver a blond baby
- Second table: describes blond and non-blond mothers separately
- Third table: describes only the blond mothers; the row is rescaled with its weight

Def. conditional probability:
Pr(A|B) = Pr( A & B ) / Pr(B)

Rewrite:
Pr(A & B) = Pr(B) x Pr(A | B)

                 Blond   Not blond
                 30      70

                 Blond   Not blond
  Mother blond   15      15
  Mother n.b.    15      55

                 Blond   Not blond
  Mother blond   50      50
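As a small illustration (not part of the original slides), the second table can be stored as joint counts and the third table recovered from the definition Pr(A|B) = Pr(A & B) / Pr(B):

```python
# Joint counts per 100 mother/child pairs (second table above).
joint = {
    ("mother_blond", "child_blond"): 15,
    ("mother_blond", "child_not_blond"): 15,
    ("mother_nb",    "child_blond"): 15,
    ("mother_nb",    "child_not_blond"): 55,
}
total = sum(joint.values())

def pr(event):
    """Probability of an event, i.e. of the set of cells matching a predicate."""
    return sum(c for k, c in joint.items() if event(k)) / total

def pr_given(event, condition):
    """Pr(event | condition) = Pr(event & condition) / Pr(condition)."""
    return pr(lambda k: event(k) and condition(k)) / pr(condition)

# Third table: the 'mother blond' row rescaled by its weight (30 out of 100).
print(pr_given(lambda k: k[1] == "child_blond",
               lambda k: k[0] == "mother_blond"))   # 0.5
```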


Dependence and Independence


- The prob for a blond child is 30%, but larger for a blond mother and smaller for a non-blond mother.
- The prob for a boy is 50%, also for blond mothers, and also for non-blond mothers.
- Def.: A and B are independent: Pr(A|B) = Pr(A)
- Exercise: Show that Pr(A|B) = Pr(A) is equivalent to Pr(B|A) = Pr(B) (i.e., B and A are independent).


                 Blond   Not blond
  Mother blond   15      15
  Mother n.b.    15      55

                 Boy     Girl
  Mother blond   15      15
  Mother n.b.    35      35

                 Boy     Girl
  Mother blond   50      50


Bayes Rule: from data to hypothesis

          4 + 1     Other
  BagA    0.0144    0.986
  BagB    0.396     0.604
  Other


- Classical Probability Theory: 0.0144 is the relative weight of 4+1 in the ROW of BagA.
- Bayesian Theory describes the distribution over the COLUMN of 4+1.

Classical statistics: ROW distribution
Bayesian statistics: COLUMN distribution

Bayes’ Rule:

Observe that
Pr(A & B) = Pr(A) x Pr(B|A)
          = Pr(B) x Pr(A|B)

Conclude Bayes’ Rule:
Pr(A|B) = Pr(B|A) x Pr(A) / Pr(B)
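Applied to the bag experiment, and assuming purely for illustration equal priors Pr(BagA) = Pr(BagB) = 0.5:
Pr(BagA | 4+1) = 0.0144 x 0.5 / (0.0144 x 0.5 + 0.396 x 0.5) ≈ 0.035, so Pr(BagB | 4+1) ≈ 0.965.
Different priors give different posteriors; the priors have to come from somewhere.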


Reasons for Dependence 1: Causality


- Dependency: P(B|A) ≠ P(B)
- Positive correlation: >
- Negative correlation: <
- Possible explanation: A causes B.

Example:
P(headache) = 6%
P(ha | party) = 10%
P(ha | ¬party) = 2%

             h.a.   no h.a.
  party      5      45
  no party   1      49

Alternative explanation: B causes A.

In the same example:
P(party) = 50%
P(party | h.a.) = 83%
P(party | no h.a.) = 48%

“Headaches make students go to parties.”

In statistics, correlation has no direction.
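Note: the 83% and 48% follow directly from the table: Pr(party | h.a.) = 5 / (5 + 1) ≈ 83%, and Pr(party | no h.a.) = 45 / (45 + 49) ≈ 48%.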




Reasons for Dependence 2: Common cause

1. The student party may lead to headache and is costly (money versus broke):

             h.a.           no h.a.
  party      5  (2 - 3)     45  (18 - 27)
  no party   1  (1 - 0)     49  (49 - 0)

  (each cell: total count, split as money - broke)

2. Table of headache and money:

             h.a.   no h.a.
  money      3      67
  broke      3      27

  Pr(broke) = 30%
  Pr(broke | h.a.) = 50%

3. Table of headache and money for party attendants:

             h.a.   no h.a.
  money      2      18
  broke      3      27

This dependency disappears if the common cause variable is known.
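A quick check of this claim in code (an illustrative sketch using the 100-student counts above):

```python
# Joint counts over (party, headache, money-state) for 100 students,
# read off from the tables above.
counts = {
    ("party", "ha", "money"): 2,      ("party", "ha", "broke"): 3,
    ("party", "no_ha", "money"): 18,  ("party", "no_ha", "broke"): 27,
    ("no_party", "ha", "money"): 1,   ("no_party", "ha", "broke"): 0,
    ("no_party", "no_ha", "money"): 49, ("no_party", "no_ha", "broke"): 0,
}

def pr(pred, given=lambda k: True):
    """Pr(pred | given) estimated from the counts."""
    num = sum(c for k, c in counts.items() if pred(k) and given(k))
    den = sum(c for k, c in counts.items() if given(k))
    return num / den

broke = lambda k: k[2] == "broke"
ha = lambda k: k[1] == "ha"
party = lambda k: k[0] == "party"

print(pr(broke))                                # 0.30  Pr(broke)
print(pr(broke, ha))                            # 0.50  Pr(broke | h.a.)
print(pr(broke, party))                         # 0.60  Pr(broke | party)
print(pr(broke, lambda k: ha(k) and party(k)))  # 0.60  Pr(broke | h.a., party)
```

Unconditionally, a headache raises the probability of being broke (30% to 50%), but among the party-goers the two are independent (60% either way).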


Reasons for Dependence 3: Common effect

A and B are independent:

  (#C)     A         non A
  B        40 (14)   40 (4)
  non B    10 (1)    10 (1)

  Pr(B) = 80%
  Pr(B|A) = 80%
  B and A are independent.

Their combination stimulates C; for the instances satisfying C:

           A    non A
  B        14   4
  non B    1    1

  Pr(B) = 90%
  Pr(B|A) = 93%, Pr(B|¬A) = 80%

This dependency appears if the common effect variable is known.
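The same check for the common-effect case (an illustrative sketch, using the counts above):

```python
# Counts over (A, B) for 100 cases; the second number in each pair is how
# many of those cases also satisfy the common effect C (the "(#C)" table).
counts = {                               # (total, with C)
    ("A", "B"): (40, 14),    ("nonA", "B"): (40, 4),
    ("A", "nonB"): (10, 1),  ("nonA", "nonB"): (10, 1),
}

def pr_b(given_a=None, given_c=False):
    """Pr(B | A=given_a [, C]); given_a=None leaves A unconstrained."""
    sel = lambda k: given_a is None or k[0] == given_a
    idx = 1 if given_c else 0
    num = sum(v[idx] for k, v in counts.items() if sel(k) and k[1] == "B")
    den = sum(v[idx] for k, v in counts.items() if sel(k))
    return num / den

print(pr_b())                       # 0.80  Pr(B)
print(pr_b("A"))                    # 0.80  Pr(B | A): independent
print(pr_b(given_c=True))           # 0.90  Pr(B | C)
print(pr_b("A", given_c=True))      # 0.93  Pr(B | A, C)
print(pr_b("nonA", given_c=True))   # 0.80  Pr(B | nonA, C): dependent given C
```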


Part II: Bayesian Networks


- Probabilistic Graphical Model
- Probabilistic Network
- Bayesian Network
- Belief Network

Consists of:
- Variables (n)
- Domains (here binary)
- Acyclic arc set, modeling the statistical influences
- Per variable V (in-degree k): Pr(V | E), for the 2^k cases of E.
- Information in a node: exponential in the in-degree.

[Network: pa → ha, pa → br]

  Pr    -
  pa    50%

  Pr    pa    ¬pa
  ha    10%   2%

  Pr    pa    ¬pa
  br    40%   0%

[Network: A → C ← B]

  Pr    A,B   A,¬B   ¬A,B   ¬A,¬B
  C     56%   10%    10%    10%
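As a minimal sketch (illustrative Python only, not how any particular tool stores it), such a network can be held as a parent list plus a conditional probability table per node; a node with in-degree k then stores 2^k numbers:

```python
# A tiny Bayesian network: parents per node, and Pr(node = true | parent values).
network = {
    "pa": {"parents": [],     "cpt": {(): 0.50}},
    "ha": {"parents": ["pa"], "cpt": {(True,): 0.10, (False,): 0.02}},
    "br": {"parents": ["pa"], "cpt": {(True,): 0.40, (False,): 0.00}},
    # The common-effect node C with parents A and B would need 2^2 = 4 entries:
    # {(True, True): 0.56, (True, False): 0.10, (False, True): 0.10, (False, False): 0.10}
}

for name, node in network.items():
    print(name, "in-degree", len(node["parents"]), "->", len(node["cpt"]), "CPT entries")
```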


The Bayesian Network Model

Closed World Assumption

Rule based:
  IF x attends party
  THEN x has headache
  WITH cf = .10
What if x didn’t attend?

Bayesian model:

[Network: pa → ha]

  Pr    pa    ¬pa
  ha    10%   2%

  Pr    -
  pa    50%

Pr(ha | ¬pa) is included: the claim is that all relevant info is modeled.

Direction of arcs and correlation

[Network with the arc reversed: ha → pa]

  Pr    ha    ¬ha
  pa    83%   48%

  Pr    -
  ha    6%

1. A BN does not necessarily model causality.
2. It is built upon the Human Expert’s understanding of the relationships, which is often causal.


A little theorem


- A Bayesian network on n binary variables uniquely defines a probability distribution over the associated set of 2^n states.
- The full distribution has 2^n parameters (numbers in [0..1] with sum 1).
- A typical network has in-degree 2 to 3: it is represented by 4n to 8n parameters (PIGLET!!).

Bayesian Networks are an efficient representation
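For example, with n = 20 binary variables the full joint distribution needs 2^20 ≈ 1,000,000 parameters, whereas a network whose nodes have in-degree at most 3 needs at most 8 x 20 = 160 conditional probabilities.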


The Utrecht DSS group


- Initiated by Prof. Linda van der Gaag from ~1990
- Focus: development of BN support tools
- Use experience from building several actual BNs
- Medical applications: Oesoca, ~40 nodes.
- Courses: Probabilistic Reasoning, Network Algorithms (Ma ACS).


How to obtain a BN model

Describe Human Expert knowledge:

  Metastatic Cancer may be detected by an increased level of serum calcium (SC). The Brain Tumor (BT) may be seen on a CT scan (CT). Severe headaches (SH) are indicative of the presence of a brain tumor. Both a brain tumor and an increased level of serum calcium may bring the patient into a coma (Co).

Probabilities: expert guess or statistical study

Learn BN structure automatically from data by means of Data Mining:
- Research of Carsten
- Models not intuitive
- Not considered XS
- Helpful addition to Knowledge Acquisition from the Human Expert
- Master ACS.

[Network figure with nodes: mc, sc, bt, co, sh, ct]
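As an illustration, the elicited structure could be written down as a parent list per node (a sketch only; the mc → bt arc follows the classic version of this example rather than being stated literally in the text above):

```python
# Parents of each node, following the expert description above.
parents = {
    "mc": [],            # metastatic cancer
    "sc": ["mc"],        # increased serum calcium, indicated by mc
    "bt": ["mc"],        # brain tumor (mc -> bt as in the classic example; assumption)
    "ct": ["bt"],        # CT scan result
    "sh": ["bt"],        # severe headaches
    "co": ["bt", "sc"],  # coma, caused by both bt and sc
}
```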


Inference in Bayesian Networks

The probability of a state S = (v1, .. , vn):
multiply the entries Pr(vi | values of vi’s parents in S)

[Network: pa → ha, pa → br]

  Pr    -
  pa    50%

  Pr    pa    ¬pa
  ha    10%   2%

  Pr    pa    ¬pa
  br    40%   0%

Pr(pa, ¬ha, ¬br) = 0.50 * 0.90 * 0.60 = 0.27

The marginal (overall) probability of each variable:
Pr(pa) = 50%
Pr(ha) = 6%
Pr(br) = 20%

Sampling: produce a series of cases, distributed according to the probability distribution implicit in the BN.
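A small illustrative sketch of these computations for the three-node party network (plain Python, not Dazzle):

```python
import random

# Pr(node = true | value of pa); pa itself has no parents.
p_pa = 0.50
p_ha = {True: 0.10, False: 0.02}
p_br = {True: 0.40, False: 0.00}

def pr_state(pa, ha, br):
    """Probability of one complete state: the product of the CPT entries."""
    p = p_pa if pa else 1 - p_pa
    p *= p_ha[pa] if ha else 1 - p_ha[pa]
    p *= p_br[pa] if br else 1 - p_br[pa]
    return p

print(pr_state(True, False, False))   # 0.5 * 0.9 * 0.6 = 0.27

# Marginals, by summing over all 2^3 states.
states = [(pa, ha, br) for pa in (True, False)
                       for ha in (True, False)
                       for br in (True, False)]
print(sum(pr_state(*s) for s in states if s[1]))   # Pr(ha) = 0.06
print(sum(pr_state(*s) for s in states if s[2]))   # Pr(br) = 0.20

# Ancestral sampling: sample parents first, then children.
def sample():
    pa = random.random() < p_pa
    return pa, random.random() < p_ha[pa], random.random() < p_br[pa]

samples = [sample() for _ in range(100_000)]
print(sum(s[1] for s in samples) / len(samples))   # ~0.06
```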


Consultation: Entering Evidence

Consultation applies the BN knowledge to a specific case
- Known variable values can be entered into the network
- Probability tables for all nodes are updated
- Obtain (something like) a new BN modeling the conditional distribution
- Again, show distributions and state probabilities
- Backward and forward propagation
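For this small network the update can be illustrated by brute-force enumeration over all states (a sketch only; real tools use far more efficient propagation algorithms):

```python
# CPTs of the party network: pa has no parents; ha and br depend on pa.
p_pa, p_ha, p_br = 0.50, {True: 0.10, False: 0.02}, {True: 0.40, False: 0.00}

def pr_state(pa, ha, br):
    p = p_pa if pa else 1 - p_pa
    p *= p_ha[pa] if ha else 1 - p_ha[pa]
    p *= p_br[pa] if br else 1 - p_br[pa]
    return p

def posterior(query, evidence):
    """Pr(query | evidence) by summing state probabilities; variables: pa, ha, br."""
    states = [dict(zip(("pa", "ha", "br"), (a, b, c)))
              for a in (True, False) for b in (True, False) for c in (True, False)]
    consistent = [s for s in states if all(s[v] == val for v, val in evidence.items())]
    num = sum(pr_state(s["pa"], s["ha"], s["br"]) for s in consistent if s[query])
    den = sum(pr_state(s["pa"], s["ha"], s["br"]) for s in consistent)
    return num / den

print(posterior("pa", {"ha": True}))   # backward: Pr(party | headache) ≈ 0.83
print(posterior("ha", {"pa": True}))   # forward:  Pr(headache | party) = 0.10
print(posterior("br", {"ha": True}))   # Pr(broke | headache) ≈ 0.33
```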


Test Selection (Danielle)


- In consultation, enter data until the goal variable is known with sufficient probability.
- Data items are obtained at a specific cost.
- Data items influence the distribution of the goal.

Problem: Given the current state of the consultation, find out what is the best variable to test next.
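A common greedy formulation of this problem (stated here only as an illustration, not necessarily the approach taken in the thesis): for goal variable G, current evidence e and each candidate test T, compute the expected remaining uncertainty H(G | e) - sum over outcomes t of Pr(T = t | e) x H(G | e, T = t), weigh this expected gain against the cost of obtaining T, and test the variable with the best trade-off.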

Started CS study 1996,

PhD Thesis defense Oct 2005


Some more work done in Linda’s DSS group


Sensitivity Analysis:
- Numerical parameters in the BN may be inaccurate; how does this influence the consultation outcome?

More efficient inferencing:
- Inferencing is costly, especially in the presence of
  - cycles (NB: there are no directed cycles!)
  - nodes with a high in-degree
- Approximate reasoning, network decompositions, …

Writing a program tool: Dazzle


Part III: In the Courtroom

What happens in a trial?
- Prosecutor and Defense collect information
- Judge decides if there is sufficient evidence that the person is guilty
- Forensic tests are far more conclusive than medical ones, but still probabilistic in nature!
  Pr(symptom | sick) = 80%
  Pr(trace | innocent) = 0.01%
- Tempting to forget statistics.
- Need a priori probabilities.

Pr(A|B) = Pr(B|A) x Pr(A) / Pr(B)

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004.


Prosecutor’s Fallacy

The story:
- A DNA sample was taken from the crime site
- The probability of a match between samples of different people is 1 in 10,000
- 20,000 inhabitants are sampled
- John’s DNA matches the sample
- Prosecutor: the chance that John is innocent is 1 in 10,000
- Judge convicts John

The analysis:
- The prosecutor confuses
  Pr(inn | evid)   (a)
  Pr(evid | inn)   (b)
- Forensic experts can only shed light on (b)
- The Judge must find (a); a priori probabilities are needed!! (Bayes)
- Dangerous to convict on DNA samples alone:
  Pr(innocent match) = 86%
  Pr(1 such match) = 27%
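One way to arrive at such figures: if all 20,000 sampled people are unrelated to the trace, each matches with probability 1/10,000, so Pr(at least one innocent match) = 1 - (1 - 1/10,000)^20,000 ≈ 1 - e^-2 ≈ 86%, and Pr(exactly one such match) ≈ 20,000 x (1/10,000) x (1 - 1/10,000)^19,999 ≈ 2e^-2 ≈ 27%.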


Defender’s Fallacy

The story:
- Town has 100,001 people
- We expect 11 to match (1 guilty plus 10 innocent)
- Probability that John is guilty is 9%
- John must be released
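Note on the 9%: with 100,000 innocent inhabitants each matching with probability 1/10,000, about 10 innocent matches are expected; under the implicit assumptions below, John is just one of roughly 11 equally likely candidates, giving 1/11 ≈ 9%.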


Implicit assumptions:
- Offender is from the town.
- Equal a priori probability for each inhabitant

It is necessary to take other circumstances into account: why was John prosecuted and what other evidence exists?

Conclusions:
- PF: it is necessary to take Bayes and a priori probs into account
- DF: estimating the a prioris is crucial for the outcome


Experts’ and Judge’s task

“Reports of experts, containing their opinion concerning what their science teaches them about that which is submitted to their judgment.” (translated from the Dutch)

IJzerman’s ideas about the trial:
1. The Forensic Expert may not claim a priori or a posteriori probabilities (Dutch Penalty Code, 344-1.4)
2. The Judge must set the a priori
3. The Judge must compute the a posteriori, based on the statements of experts
4. The Judge must have an explicit threshold of probability for ‘beyond reasonable doubt’
5. The threshold should be made explicit in law.

Is this realistic?
1. Avoiding confusion of Pr(G|E) and Pr(E|G) is a good idea
2. A priori’s are extremely important; this almost pre-determines the verdict
3. How is this done? A Bayesian Network designed and controlled by the Judge?
4. No judge will obey a mathematical formula
5. Public agreement and acceptance?


Bayesian Alcoholism Test


- Driving under the influence of alcohol leads to a penalty
- An administrative procedure may void the licence
- The Judge must decide if the subject is an alcohol addict: incidental or regular (harmful) drinking
- Psychiatrists advise the court by determining whether the drinking was incidental or regular
- Goal HHAU: Harmful and Hazardous Alcohol Use
- Probabilistically confirmed or denied by clinical tests
- Bayesian Alcoholism Test: developed 1999-2004 by A. Korzec, Amsterdam.


Variables in Bayesian Alcoholism Test

Hidden variables:
- HHAU: alcoholism
- Liver disease

Observable causes:
- Hepatitis risk
- Social factors
- BMI, diabetes

Observable effects:
- Skin color
- Lab: blood, breath
- Level of Response
- Smoking
- CAGE questionnaire




Knowledge Elicitation for BAT

Knowledge in the Network
- Qualitative
  - What variables are relevant
  - How do they interrelate
- Quantitative
  - A priori probabilities
  - Conditional probabilities for hidden diseases
  - Conditional probabilities for effects
  - Response of lab tests to hidden diseases

How it was obtained
- Network structure?? IJzerman does not report about this
- Probabilities
  - Literature studies: 40% of probabilities
  - Expert opinions: 60% of probabilities


Consultation with BAT

Enter evidence about the subject:
- Clinical signs: skin, smoking, LRA; CAGE.
- Lab results
- Social factors

The network will return:
- Probability that the Subject has HHAU
- Probabilities for liver disease and diabetes

The responsible Human Medical Expert converts this probability to a YES/NO for the judge! (Interpretation phase)

The HME may take other data into account (rare disease).

Knowing what the CAGE is used for may influence the answers that the subject gives.


Part IV: Bayes in the Field

The Dazzle program
- Tool for designing and analysing BNs
- Mouse-click the network; fill in the probabilities
- Consult by evidence submission
- Read posterior probabilities

- Development 2004-2006
- Written in Haskell
- Arjen van IJzendoorn, Martijn Schrage

www.cs.uu.nl/dazzle


Importance of a good model

In 1998, Donna Anthony (31) was convicted of murdering her two children. She was in prison for seven years but claimed her children died of cot death.

Prosecutor:
The probability of two cot deaths in one family is too small, unless the mother is guilty.


The Evidence against Donna Anthony


- A BN with priors eliminates the Prosecutor’s Fallacy
- Enter the evidence: both children died
- The a priori probability is very small (1 in 1,000,000)
- Dazzle establishes a 97.6% probability of guilt

- Name of the expert: Prof. Sir Roy Meadow (1933)
- His testimony brought a dozen mothers to prison in a decade


A More Refined Model

Allow for genetic or social circumstances for which the parent is not liable.


The Evidence against Donna?

Refined model: a genetic defect is the most likely cause of repeated deaths.

Donna Anthony was released in 2005 after 7 years in prison.

6/2005: Struck from the GMC register
7/2005: Appeal by Meadow
2/2006: Granted; otherwise experts would refuse to act as witnesses


Classical Swine Fever, Petra Geenen


- Swine Fever is a costly disease
- Development 2004/5
- 42 variables, 80 arcs
- 2454 probabilities, but many are 0.
- Pig/herd level
- Prior extremely small
- Probability elicitation with a questionnaire



Conclusions


- Mathematically sound model to reason with uncertainty
- Further studied in Probabilistic Reasoning (ACS)
- Applicable to areas where knowledge is highly statistical

- Acquisition: instead of the classical IF a THEN b (WITH c), obtain both Pr(b|a) and Pr(b|¬a)
  - More work, but a more powerful model
- One formalism allows both diagnostic and prognostic reasoning

- Danger: apparent exactness is deceiving
- Disadvantage: lack of explanation facilities (research); the model is quite transparent, but consultations are not.

- Increasing popularity, despite the difficulty of building