Artificial Intelligence – Bayesian Networks: Overview


School of Computer Science and Informatics
Cardiff University
Artificial Intelligence
IV. Uncertain Knowledge and Reasoning
3. Bayesian Networks
F. C. Langbein
1.4
Overview
Bayesian networks
Syntax
Global semantics
Local semantics
Markov blanket
Constructing Bayesian networks
F.C.Langbein,Artificial Intelligence – IV.Uncertain Knowledge and Reasoning;3.Bayesian Networks 1
Bayesian Networks
A simple, graphical notation for conditional independence assertions
Compact specification of full joint distributions
Syntax
a set of nodes, one per variable X_l
a directed, acyclic graph (link ≈ “directly influences”)
a conditional probability distribution (CPD) for each node given its parents: P(X_l | Parents(X_l))
Simplest case: the CPD is a conditional probability table (CPT), giving the distribution over X_l for each combination of parent values
Dentistry Example
Topology of the network encodes conditional independence assertions
Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity
Burglar Alarm Example
I’m at work. Is there a burglary at home?
Neighbour John calls to say my alarm is ringing.
Neighbour Mary does not call.
Sometimes the alarm is set off by minor earthquakes.
Boolean variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Construct the network to reflect causal knowledge:
A burglar may set the alarm off
An earthquake may set the alarm off
The alarm may cause Mary to call
The alarm may cause John to call
Burglar Alarm Example
Less space: max. k parents ⇒ O(n d^k) numbers vs. O(d^n)
Faster to answer queries
Simpler to find CPTs
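The space saving is easy to quantify. A minimal sketch in Python; the concrete numbers (n = 30 Boolean variables, at most k = 5 parents) are illustrative assumptions, not from the slides:

```python
# Compare parameter counts: full joint distribution vs. a Bayesian
# network with at most k parents per node.
# n, d, k below are illustrative values (assumed, not from the slides).
n, d, k = 30, 2, 5           # 30 Boolean variables, at most 5 parents each

full_joint = d ** n - 1      # independent numbers in the full joint
bn_bound = n * d ** k        # O(n d^k): at most d^k CPT rows per node

print(full_joint)            # 1073741823
print(bn_bound)              # 960
```

Roughly a billion numbers shrink to under a thousand, which is why bounded-fan-in networks are tractable to specify.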
Global Semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions
P(X_1, …, X_n) = ∏_{l=1}^n P(X_l | Parents(X_l))
Combines chain rule and independence
Examples
P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(¬b) P(¬e) P(a | ¬b ∧ ¬e) P(j | a) P(m | a)
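The joint-entry example above can be checked numerically. A minimal sketch; the CPT values are the standard textbook numbers for this burglary network, assumed here because the slide’s CPT figure is not reproduced in the text:

```python
# Standard CPT values for the burglary network (assumed; the slide's
# figure with these numbers is not reproduced in the text).
P_b = 0.001                  # P(Burglary)
P_e = 0.002                  # P(Earthquake)
P_a_given_nb_ne = 0.001      # P(Alarm | ¬Burglary, ¬Earthquake)
P_j_given_a = 0.90           # P(JohnCalls | Alarm)
P_m_given_a = 0.70           # P(MaryCalls | Alarm)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(¬b) P(¬e) P(a|¬b,¬e) P(j|a) P(m|a)
p = (1 - P_b) * (1 - P_e) * P_a_given_nb_ne * P_j_given_a * P_m_given_a
print(round(p, 6))           # 0.000628
```

One multiplication per node, exactly as the global semantics prescribes.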
P(J | B, E) = α Σ_a Σ_m P(J, B, E, a, m)
            = α Σ_a Σ_m P(J | a) P(B) P(E) P(a | B, E) P(m | a)
P(B | J) = α Σ_e Σ_a Σ_m P(B, J, e, a, m)
         = α Σ_e Σ_a Σ_m P(B) P(J | a) P(e) P(a | B, e) P(m | a)
Local Semantics
Local semantics: each node is conditionally independent of its non-descendants given its parents
Theorem: local semantics ⇔ global semantics
(Proof: apply the chain rule with an ordering that puts parents before children)
Markov Blanket
Each node is conditionally independent of all others given its Markov blanket
Markov blanket = parents + children + children’s parents
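The definition translates directly into code. A minimal sketch that computes a Markov blanket from a parent list; the graph is the burglary network from the earlier example:

```python
# Markov blanket = parents + children + children's parents,
# computed from a dict mapping each node to its list of parents.
parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

def markov_blanket(x):
    children = [c for c, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:                 # add the children's other parents
        blanket |= set(parents[c])
    blanket.discard(x)                 # a node is not in its own blanket
    return blanket

print(sorted(markov_blanket("Alarm")))
# ['Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls']
```

Note that Burglary’s blanket contains Earthquake (a co-parent of Alarm) even though there is no arc between them: observing the common child couples the two causes.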
Constructing Bayesian Networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
Choose an ordering of variables X_1, …, X_n
For l ← 1 to n
  add X_l to the network
  select parents from X_1, …, X_{l−1} such that
    P(X_l | Parents(X_l)) = P(X_l | X_1, …, X_{l−1})
This choice of parents guarantees the global semantics
P(X_1, …, X_n) = ∏_{l=1}^n P(X_l | X_1, …, X_{l−1}) (chain rule)
               = ∏_{l=1}^n P(X_l | Parents(X_l)) (by construction)
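One consequence of this identity is easy to verify numerically: since the factorisation equals the full joint, the product of local CPDs must sum to 1 over all assignments. A sanity-check sketch, again using the standard textbook CPT values for the burglary network (assumed):

```python
from itertools import product

# The product of local CPDs defines a proper joint distribution:
# summed over all 2^5 assignments it must equal 1.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def cond(table, value, *key):
    """P(value | key) from a table storing P(true | key)."""
    p = table[key[0] if len(key) == 1 else key]
    return p if value else 1 - p

total = sum(
    P_B[b] * P_E[e] * cond(P_A, a, b, e) * cond(P_J, j, a) * cond(P_M, m, a)
    for b, e, a, j, m in product([True, False], repeat=5)
)
print(round(total, 12))   # 1.0
```

This holds for any CPT values, because each local sum Σ_x P(x | parents) is 1 by construction.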
Burglar Alarm Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Burglar Alarm Example
For non-causal (≈ diagnostic) directions:
Deciding conditional independence is hard
(causal models and conditional independence seem hardwired for humans)
Assessing conditional probabilities is hard
The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers instead of the 1 + 1 + 4 + 2 + 2 = 10 numbers needed
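The two parameter counts follow directly from the parent sets each ordering produces. A quick check (each Boolean node with k parents needs 2^k numbers); the parent counts below are read off the two orderings discussed in this example:

```python
# Parameter counts for the two orderings of the burglary network.
# Each Boolean node with k parents needs 2^k numbers in its CPT.
causal = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
          "JohnCalls": 1, "MaryCalls": 1}
diagnostic = {"MaryCalls": 0, "JohnCalls": 1, "Alarm": 2,
              "Burglary": 1, "Earthquake": 2}

def count(net):
    return sum(2 ** k for k in net.values())

print(count(causal), count(diagnostic))   # 10 13
```

The causal ordering wins because each CPT then conditions only on direct causes, which keeps the parent sets small.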
Car Diagnosis Example
Initial evidence: the car does not start
Testable variables (green); “broken, so fix it” variables (orange)
Hidden variables (grey) ensure sparse structure, reduce parameters
Car Insurance Example
Note, arcs do not deny independence
However, the absence of an arc asserts independence