# Artificial Intelligence: Bayesian Networks

School of Computer Science and Informatics, Cardiff University
Artificial Intelligence, IV. Uncertain Knowledge and Reasoning; 3. Bayesian Networks
F.C. Langbein
1.4
## Overview

- Bayesian networks
- Syntax
- Global semantics
- Local semantics
- Markov blanket
- Constructing Bayesian networks
F.C. Langbein, Artificial Intelligence – IV. Uncertain Knowledge and Reasoning; 3. Bayesian Networks
## Bayesian Networks

A simple, graphical notation for conditional independence assertions, and a compact specification of full joint distributions.

Syntax:
- a set of nodes, one per variable X_l
- a directed, acyclic graph (link ≈ "directly influences")
- a conditional probability distribution (CPD) for each node given its parents: P(X_l | Parents(X_l))

Simplest case: the CPD is a conditional probability table (CPT), giving the distribution over X_l for each combination of parent values.
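This syntax can be sketched as a pair of dictionaries: a DAG given by each node's parent list, plus one CPT per node. The sketch below uses the dentistry network from the next slide; the numeric CPT values are illustrative placeholders, not values given in the slides.

```python
# A Bayesian network = directed acyclic graph + one CPD per node.
# Dentistry network; CPT numbers are made up for illustration.
parents = {
    "Weather":   [],
    "Cavity":    [],
    "Toothache": ["Cavity"],
    "Catch":     ["Cavity"],
}

# CPT for a Boolean node: tuple of parent values -> P(node = True | parents)
cpt = {
    "Cavity":    {(): 0.2},
    "Toothache": {(True,): 0.6, (False,): 0.1},
    "Catch":     {(True,): 0.9, (False,): 0.2},
}

def prob(node, value, parent_values=()):
    """P(node = value | Parents(node) = parent_values)."""
    p_true = cpt[node][tuple(parent_values)]
    return p_true if value else 1.0 - p_true

print(prob("Toothache", True, (True,)))   # P(toothache | cavity) -> 0.6
```

Each CPT row sums to one implicitly: only P(node = True | …) is stored, and the False case is derived, which is why a Boolean node with k Boolean parents needs 2^k numbers.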
## Dentistry Example

The topology of the network encodes conditional independence assertions:
- Weather is independent of the other variables
- Toothache and Catch are conditionally independent given Cavity
## Burglar Alarm Example

I am at work. Is there a burglary at home? Neighbour John calls to say my alarm is ringing; neighbour Mary does not call. Sometimes the alarm is set off by minor earthquakes.

Boolean variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Construct the network to reflect causal knowledge:
- A burglar may set the alarm off
- An earthquake may set the alarm off
- The alarm may cause Mary to call
- The alarm may cause John to call
## Burglar Alarm Example

- Less space: at most k parents per node ⇒ O(n d^k) numbers vs. O(d^n)
- Faster to answer queries
- Simpler to find the CPTs
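The space saving can be checked concretely for the alarm network (all variables Boolean, so d = 2; the exact per-node counts 1 + 1 + 4 + 2 + 2 = 10 also appear on the diagnostic-ordering slide later in this section):

```python
# Parameter counts for the burglar-alarm network (all variables Boolean).
d = 2                      # values per variable
n = 5                      # Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
k = 2                      # maximum number of parents (Alarm has 2)

num_parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}

# Each node needs (d - 1) numbers per combination of parent values.
exact = sum((d - 1) * d**p for p in num_parents.values())
bound = n * d**k           # the O(n d^k) upper bound from the slide
full  = d**n - 1           # independent numbers in the full joint

print(exact, bound, full)  # 10 20 31
```

Even for this tiny network the exact count (10) is under a third of the full joint (31); the gap grows exponentially with n.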
## Global Semantics

Global semantics defines the full joint distribution as the product of the local conditional distributions:

P(X_1, ..., X_n) = ∏_{l=1}^{n} P(X_l | Parents(X_l))

This combines the chain rule and the conditional independence assertions.

Examples:

P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(¬b) P(¬e) P(a | ¬b ∧ ¬e) P(j | a) P(m | a)

P(J | B, E) = α ∑_a ∑_m P(J, B, E, a, m)
            = α ∑_a ∑_m P(J | a) P(B) P(E) P(a | B, E) P(m | a)

P(B | J) = α ∑_e ∑_a ∑_m P(B, J, e, a, m)
         = α ∑_e ∑_a ∑_m P(B) P(J | a) P(e) P(a | B, e) P(m | a)
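The first example can be evaluated numerically. The CPT values below are the standard ones from the accompanying alarm-network figure in Russell and Norvig; they are an assumption here, since the figure itself is not reproduced in the text.

```python
# Alarm-network CPTs (standard textbook values, assumed; the figure with
# these numbers is not included in the extracted slides).
P_b = 0.001                              # P(burglary)
P_e = 0.002                              # P(earthquake)
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(alarm | b, e)
P_j = {True: 0.90, False: 0.05}          # P(john calls | alarm)
P_m = {True: 0.70, False: 0.01}          # P(mary calls | alarm)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(¬b) P(¬e) P(a|¬b,¬e) P(j|a) P(m|a)
p = (1 - P_b) * (1 - P_e) * P_a[(False, False)] * P_j[True] * P_m[True]
print(p)                                 # ≈ 0.000628
```

So the specific scenario "both neighbours call, the alarm rang, but there was neither a burglary nor an earthquake" has probability about 0.000628, obtained as a product of five local CPT entries rather than a lookup in a 32-entry joint table.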
## Local Semantics

Local semantics: each node is conditionally independent of its non-descendants given its parents.

Theorem: local semantics ⇔ global semantics
(Proof: apply the chain rule with an ordering that places parents before children.)
## Markov Blanket

Each node is conditionally independent of all others given its Markov blanket.

Markov blanket = parents + children + children's parents
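The definition is a purely structural computation on the DAG. A minimal sketch, using the parent-list encoding of the alarm network (the encoding itself is an illustration, not from the slides):

```python
def markov_blanket(node, parents):
    """Markov blanket = parents + children + children's other parents."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:                    # add the children's parents
        blanket |= set(parents[c])
    blanket.discard(node)                 # the node itself is excluded
    return blanket

# Alarm network from the burglar-alarm example
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

print(sorted(markov_blanket("Alarm", parents)))
# ['Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls']
```

For Alarm the blanket is every other node, but for Burglary it is only {Alarm, Earthquake}: Earthquake enters as a co-parent of Alarm, which is why "children's parents" are needed in the definition.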
## Constructing Bayesian Networks

We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics:

CHOOSE an ordering of variables X_1, ..., X_n
FOR l ← 1 TO n:
    ADD X_l to the network
    SELECT its parents from X_1, ..., X_{l-1} such that
        P(X_l | Parents(X_l)) = P(X_l | X_1, ..., X_{l-1})

This choice of parents guarantees the global semantics:

P(X_1, ..., X_n) = ∏_{l=1}^{n} P(X_l | X_1, ..., X_{l-1})   (chain rule)
                 = ∏_{l=1}^{n} P(X_l | Parents(X_l))        (by construction)
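The construction loop can be sketched against a full joint table, searching for a minimal parent set satisfying the selection condition above. The joint is built from assumed textbook CPT values for the alarm network (not given in the slides); with the causal ordering B, E, A, J, M the loop recovers exactly the causal structure.

```python
import itertools

# Assumed textbook CPTs for the alarm network.
names = ["B", "E", "A", "J", "M"]
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_j = {True: 0.90, False: 0.05}
P_m = {True: 0.70, False: 0.01}

def entry(b, e, a, j, m):
    pa = P_a[(b, e)]
    return ((P_b if b else 1 - P_b) * (P_e if e else 1 - P_e)
            * (pa if a else 1 - pa)
            * (P_j[a] if j else 1 - P_j[a])
            * (P_m[a] if m else 1 - P_m[a]))

joint = {v: entry(*v) for v in itertools.product([False, True], repeat=5)}

def cpd(x, given):
    """P(X_x | X_given) as {(given values..., x value): probability}."""
    num, den = {}, {}
    for v, pr in joint.items():
        g = tuple(v[i] for i in given)
        num[g + (v[x],)] = num.get(g + (v[x],), 0.0) + pr
        den[g] = den.get(g, 0.0) + pr
    return {k: p / den[k[:-1]] for k, p in num.items()}

def minimal_parents(l):
    """Smallest S ⊆ {X_1..X_{l-1}} with P(X_l | S) = P(X_l | X_1..X_{l-1})."""
    preds = tuple(range(l))
    full = cpd(l, preds)
    for size in range(len(preds) + 1):
        for S in itertools.combinations(preds, size):
            sub = cpd(l, S)
            pos = [preds.index(i) for i in S]
            if all(abs(p - sub[tuple(k[i] for i in pos) + (k[-1],)]) < 1e-9
                   for k, p in full.items()):
                return S
    return preds

for l in range(5):
    print(names[l], "<-", [names[i] for i in minimal_parents(l)])
# B <- []
# E <- []
# A <- ['B', 'E']
# J <- ['A']
# M <- ['A']
```

Running the same search with the non-causal ordering M, J, A, B, E instead yields the denser structure discussed on the following slides, where A needs both M and J as parents and E needs both A and B.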
## Burglar Alarm Example

Suppose we choose the ordering M, J, A, B, E:
- P(J | M) = P(J)? No
- P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
- P(B | A, J, M) = P(B | A)? Yes
- P(B | A, J, M) = P(B)? No
- P(E | B, A, J, M) = P(E | A)? No
- P(E | B, A, J, M) = P(E | A, B)? Yes
## Burglar Alarm Example

For non-causal (≈ diagnostic) orderings:
- Deciding conditional independence is hard (causal models and conditional independence seem hardwired for humans)
- Assessing conditional probabilities is hard
- The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers instead of the 1 + 1 + 4 + 2 + 2 = 10 needed
## Car Diagnosis Example

Initial evidence: the car does not start.
Testable variables (green); "broken, so fix it" variables (orange).
Hidden variables (grey) ensure a sparse structure and reduce the number of parameters.
## Car Insurance Example

Note, the presence of an arc does not deny independence; however, the absence of an arc asserts independence.