# Bayesian Networks


Nov 7, 2013


Laxmidhar Behera
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
lbehera@iitk.ac.in

## Probability Theory

Discrete random variables: $x_1$, $x_2$

Sum Rule (marginal pdf):
$$P(x_1) = \sum_{x_2} P(x_1, x_2)$$

Product Rule (joint pdf):
$$P(x_1, x_2) = P(x_1)\,P(x_2 \mid x_1)$$

Independent random variables:
$$P(x_1, x_2) = P(x_1)\,P(x_2)$$
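These three rules can be checked numerically on a small joint distribution. The table below is an illustrative example chosen here, not taken from the slides:

```python
# Illustrative joint distribution P(x1, x2) over two binary variables.
# (These numbers are an example chosen here, not from the slides.)
P = {(0, 0): 0.125, (0, 1): 0.125, (1, 0): 0.375, (1, 1): 0.375}

# Sum rule: marginalize x2 out of the joint.
P_x1 = {v: sum(P[(v, x2)] for x2 in (0, 1)) for v in (0, 1)}

# Product rule: recover the joint as P(x1) * P(x2 | x1).
P_x2_given_x1 = {(x2, x1): P[(x1, x2)] / P_x1[x1]
                 for x1 in (0, 1) for x2 in (0, 1)}
for (x1, x2), p in P.items():
    assert abs(p - P_x1[x1] * P_x2_given_x1[(x2, x1)]) < 1e-12
```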

## Problem Complexity

We shall be concerned with the nth order binary probability distribution $P(x_1, x_2, \ldots, x_n)$.

This nth order distribution consists of $2^n$ variables/elements, each of which is defined as the probability of a specific configuration.

Estimation of each variable requires observation of an exponentially increasing number of binary symbols.

## An Example - Extension of a Lower Order Distribution

| $x_1$ | $P(x_1)$ |
|-------|----------|
| 0     | 1/4      |
| 1     | 3/4      |

| $x_2$ | $P(x_2)$ |
|-------|----------|
| 0     | 1/2      |
| 1     | 1/2      |

| $(x_1, x_2)$ | $P(x_1, x_2)$ (first extension) | $P(x_1, x_2)$ (second extension) |
|--------------|---------------------------------|----------------------------------|
| 0 0          | 1/8                             | 0                                |
| 0 1          | 1/8                             | 1/4                              |
| 1 0          | 3/8                             | 1/2                              |
| 1 1          | 3/8                             | 1/4                              |

The first extension follows the rule $P(x_1, x_2) = P(x_1)\,P(x_2)$, and the second extension follows the rule $P(x_1, x_2) = P(x_1)\,P(x_2 \mid x_1)$, where $P(x_2 \mid x_1)$ has to be known a priori.
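A quick numerical check of both extensions, using the values from the tables above:

```python
# The two extensions from the tables above, keyed by (x1, x2).
first  = {(0, 0): 1/8, (0, 1): 1/8, (1, 0): 3/8, (1, 1): 3/8}
second = {(0, 0): 0.0, (0, 1): 1/4, (1, 0): 1/2, (1, 1): 1/4}
P_x1 = {0: 1/4, 1: 3/4}
P_x2 = {0: 1/2, 1: 1/2}

# The first extension is the independent product P(x1) * P(x2).
for (a, b), p in first.items():
    assert abs(p - P_x1[a] * P_x2[b]) < 1e-12

# Both extensions are consistent with the given marginal P(x1).
for ext in (first, second):
    for a in (0, 1):
        assert abs(sum(ext[(a, b)] for b in (0, 1)) - P_x1[a]) < 1e-12
```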

$$P(x_1, x_2, x_3) = P(x_1, x_2)\,P(x_3 \mid x_2) = \frac{P(x_1, x_2)\,P(x_2, x_3)}{P(x_2)}$$

Here $P(x_1, x_2) = P(x_2, x_3)$, given by:

| $(\cdot, \cdot)$ | $P(\cdot, \cdot)$ |
|------------------|-------------------|
| 0 0 | 0   |
| 0 1 | 1/2 |
| 1 0 | 1/2 |
| 1 1 | 0   |

| $(x_1, x_2, x_3)$ | $P(x_1, x_2, x_3)$ |
|-------------------|--------------------|
| 0 0 0 | 0   |
| 0 0 1 | 0   |
| 0 1 0 | 1/2 |
| 0 1 1 | 0   |
| 1 0 0 | 0   |
| 1 0 1 | 1/2 |
| 1 1 0 | 0   |
| 1 1 1 | 0   |
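The third order table can be reproduced from the pairwise table with a few lines of Python:

```python
# Pairwise table from the slide, used for both P(x1,x2) and P(x2,x3).
pair = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
P_x2 = {b: sum(pair[(a, b)] for a in (0, 1)) for b in (0, 1)}

# P(x1,x2,x3) = P(x1,x2) * P(x2,x3) / P(x2)
joint = {}
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            joint[(a, b, c)] = (pair[(a, b)] * pair[(b, c)] / P_x2[b]
                                if P_x2[b] > 0 else 0.0)
```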

## Product Approximations

The following are the possible product approximations for the third order distribution $P(x_1, x_2, x_3)$:

- $P(x_1)\,P(x_2)\,P(x_3)$
- $P(x_1, x_2)\,P(x_3)$
- $P(x_1, x_3)\,P(x_2)$
- $P(x_2, x_3)\,P(x_1)$
- $P(x_1, x_2)\,P(x_3 \mid x_1)$
- $P(x_1, x_2)\,P(x_3 \mid x_2)$
- $P(x_1, x_3)\,P(x_2 \mid x_1)$
- $P(x_1, x_3)\,P(x_2 \mid x_3)$
- $P(x_2, x_3)\,P(x_1 \mid x_2)$
- $P(x_2, x_3)\,P(x_1 \mid x_3)$

Some cases are not product approximations. Example:
$$P(x_1, x_2, x_3) \neq P(x_1, x_2)\,P(x_2, x_3)\,P(x_1, x_3)$$
This is because $P(x_1, x_2) \neq \sum_{x_3} P(x_1, x_2)\,P(x_2, x_3)\,P(x_1, x_3)$.
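The failure of the pairwise product can be seen even for the uniform third order distribution, where every pairwise marginal equals 1/4:

```python
import itertools

def marg(P, keep):
    """Marginalize a joint dict over the variable positions in `keep`."""
    out = {}
    for c, p in P.items():
        key = tuple(c[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Uniform third order binary distribution: P(x1,x2,x3) = 1/8.
configs = list(itertools.product((0, 1), repeat=3))
P = {c: 1 / 8 for c in configs}
P12, P23, P13 = marg(P, (0, 1)), marg(P, (1, 2)), marg(P, (0, 2))

# The product of the three pairwise marginals does not even sum to 1
# (here it sums to 8 * (1/4)^3 = 1/8), so it is not a distribution at all.
total = sum(P12[(a, b)] * P23[(b, c)] * P13[(a, c)] for a, b, c in configs)
print(total)  # 0.125, not 1
```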

## Approximation Procedure

The product approximation of an Nth order distribution contains at most a product of N terms, since it must be possible to write the terms down in a sequence such that each new term contains at least one variable ($x_j$) not contained in the previous terms.

The unity sum property: if the variables are then summed in reverse order back through the sequence, the unity sum property should be demonstrated.

Example: consider the following product approximation for a 7th order distribution:
$$P(x_1, x_2)\,P(x_5 \mid x_1)\,P(x_6 \mid x_5)\,P(x_3, x_4 \mid x_6)\,P(x_7)$$
This example demonstrates the unity sum property.
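The unity sum property of this 7th order approximation can be verified numerically. The conditional tables below are random placeholders, an assumption made here since the slides give no numbers; the sum comes out to 1 regardless of their values:

```python
import itertools, random

random.seed(1)
B = (0, 1)

def rand_dist(keys):
    """Random probability table over the given keys (sums to 1)."""
    w = {k: random.random() for k in keys}
    s = sum(w.values())
    return {k: v / s for k, v in w.items()}

# Hypothetical random tables for each factor in the approximation.
P12   = rand_dist(list(itertools.product(B, B)))                   # P(x1, x2)
P5_1  = {a: rand_dist(B) for a in B}                               # P(x5 | x1)
P6_5  = {e: rand_dist(B) for e in B}                               # P(x6 | x5)
P34_6 = {f: rand_dist(list(itertools.product(B, B))) for f in B}   # P(x3,x4 | x6)
P7    = rand_dist(B)                                               # P(x7)

# Summing the product over all 2^7 configurations gives 1: each factor
# normalizes out as the variables are summed in reverse order.
total = sum(P12[(x1, x2)] * P5_1[x1][x5] * P6_5[x5][x6]
            * P34_6[x6][(x3, x4)] * P7[x7]
            for x1, x2, x3, x4, x5, x6, x7 in itertools.product(B, repeat=7))
print(round(total, 10))  # 1.0
```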

## Bayes Theorem

The probability of a hypothesis $x_1$, given the observed outcome $x_2$, is given by:
$$P(x_1 \mid x_2) = \frac{P(x_2 \mid x_1)\,P(x_1)}{P(x_2)}$$

- $P(x_1 \mid x_2)$ is the posterior probability of the hypothesis
- $P(x_2 \mid x_1)$ is the likelihood of the observed data
- $P(x_1)$ is the prior probability of the hypothesis

$$P(x_2) = \sum_{x_1} P(x_2, x_1) = \sum_{x_1} P(x_2 \mid x_1)\,P(x_1)$$
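A small worked example of the theorem, with illustrative prior and likelihood values that are not from the slides:

```python
# Illustrative numbers (not from the slides): a binary hypothesis x1 with
# prior P(x1=1) = 0.75 and likelihoods P(x2=1 | x1).
prior      = {0: 0.25, 1: 0.75}          # P(x1)
likelihood = {0: 0.20, 1: 0.90}          # P(x2=1 | x1)

# Evidence via the sum rule: P(x2=1) = sum_x1 P(x2=1 | x1) P(x1)
evidence = sum(likelihood[h] * prior[h] for h in (0, 1))

# Posterior via Bayes theorem: P(x1=1 | x2=1)
posterior = likelihood[1] * prior[1] / evidence
print(round(posterior, 4))  # 0.931
```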

## Bayesian Network

A Bayesian Network is a probabilistic graphical model that represents the random variables and their conditional dependencies, using a Directed Acyclic Graph (DAG).

Example:
$$P(a, b, c) = P(c \mid a, b)\,P(a, b) = P(c \mid a, b)\,P(b \mid a)\,P(a)$$

## Bayesian Network

Conditional independence: $x_7$ is only dependent on $x_4$ and $x_5$.

$$P(x_1, x_2, x_3, x_4, x_5, x_6, x_7) = P(x_1)\,P(x_2)\,P(x_3)\,P(x_4 \mid x_1, x_2, x_3)\,P(x_5 \mid x_1, x_3)\,P(x_6 \mid x_4)\,P(x_7 \mid x_4, x_5)$$
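A numerical check that this factorization defines a valid joint distribution. The CPTs below are random placeholders, an assumption made here because the slides specify only the graph structure:

```python
import itertools, random

random.seed(2)
B = (0, 1)

def rand_cpt(n_parents):
    """Random CPT: for each parent configuration, a distribution over {0,1}."""
    return {pa: (lambda p: {0: p, 1: 1 - p})(random.random())
            for pa in itertools.product(B, repeat=n_parents)}

# Hypothetical random CPTs matching the DAG from the slide.
cpt = {1: rand_cpt(0), 2: rand_cpt(0), 3: rand_cpt(0),
       4: rand_cpt(3),   # parents x1, x2, x3
       5: rand_cpt(2),   # parents x1, x3
       6: rand_cpt(1),   # parent  x4
       7: rand_cpt(2)}   # parents x4, x5

def joint(x):
    x1, x2, x3, x4, x5, x6, x7 = x
    return (cpt[1][()][x1] * cpt[2][()][x2] * cpt[3][()][x3]
            * cpt[4][(x1, x2, x3)][x4] * cpt[5][(x1, x3)][x5]
            * cpt[6][(x4,)][x6] * cpt[7][(x4, x5)][x7])

# The factorized joint is properly normalized over all 2^7 configurations.
total = sum(joint(x) for x in itertools.product(B, repeat=7))
assert abs(total - 1) < 1e-9
```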

## Bayesian Network: Example

We have four binary events, represented by four random variables:

- W = true: Grass is Wet
- S = true: Water Sprinkler is on
- R = true: It is Raining
- C = true: It is Cloudy

## Bayesian Network: Example

By the Chain Rule,
$$P(C, S, R, W) = P(C)\,P(S \mid C)\,P(R \mid C, S)\,P(W \mid C, S, R) \tag{1}$$

The conditional independence relationships allow us to represent the joint more compactly.

## Bayesian Network: Example

By using the conditional independence relationships,
$$P(C, S, R, W) = P(C)\,P(S \mid C)\,P(R \mid C)\,P(W \mid S, R) \tag{2}$$

## Bayesian Network: Sprinkler Example

Question: What is the probability that it is raining, given that the grass is wet?
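One way to answer the query is brute-force enumeration over factorization (2). The slides do not give CPT values; the numbers below are the ones conventionally used with this sprinkler network and are an assumption here:

```python
import itertools

# CPT values are not given in the slides; these are the ones commonly
# used with the sprinkler example, adopted here as an assumption.
P_C = {0: 0.5, 1: 0.5}
P_S_C = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}    # P(S | C)
P_R_C = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}    # P(R | C)
P_W_SR = {(0, 0): {0: 1.0, 1: 0.0},                    # P(W | S, R)
          (0, 1): {0: 0.1, 1: 0.9},
          (1, 0): {0: 0.1, 1: 0.9},
          (1, 1): {0: 0.01, 1: 0.99}}

def joint(c, s, r, w):
    # Factorization (2): P(C) P(S|C) P(R|C) P(W|S,R)
    return P_C[c] * P_S_C[c][s] * P_R_C[c][r] * P_W_SR[(s, r)][w]

# P(R=1 | W=1) = P(R=1, W=1) / P(W=1), by enumeration.
num = sum(joint(c, s, 1, 1) for c, s in itertools.product((0, 1), repeat=2))
den = sum(joint(c, s, r, 1) for c, s, r in itertools.product((0, 1), repeat=3))
print(round(num / den, 4))  # 0.7079 with these assumed CPTs
```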

## Conditional Independence

$$p(a \mid b, c) = p(a \mid c)$$
$a$ is conditionally independent of $b$ given $c$.

$$p(a, b \mid c) = p(a \mid b, c)\,p(b \mid c) = p(a \mid c)\,p(b \mid c)$$
$a$ and $b$ are statistically independent, given $c$.

Note that $p(a, b) \neq p(a)\,p(b)$.

## Conditional Independence: Example 1

In general, if $c$ is unobserved,
$$p(a, b, c) = p(a \mid c)\,p(b \mid c)\,p(c)$$
$a$ and $b$ are not independent in general, as
$$p(a, b) = \sum_{c} p(a \mid c)\,p(b \mid c)\,p(c) \neq p(a)\,p(b)$$

## Conditional Independence: Example 1

But, if $c$ is observed,
$$p(a, b \mid c) = \frac{p(a, b, c)}{p(c)} = p(a \mid c)\,p(b \mid c)$$
But still,
$$p(a, b) \neq p(a)\,p(b)$$

The node $c$ is said to be tail-to-tail w.r.t. the path from node $a$ to node $b$.
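A numeric illustration of the tail-to-tail case, with hand-picked hypothetical tables:

```python
import itertools

# Hypothetical tail-to-tail model a <- c -> b with hand-picked tables.
p_c = {0: 0.5, 1: 0.5}
p_a_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # p(a | c)
p_b_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}   # p(b | c)

joint = {(a, b, c): p_c[c] * p_a_c[c][a] * p_b_c[c][b]
         for a, b, c in itertools.product((0, 1), repeat=3)}

# Marginally, a and b are dependent: p(a,b) != p(a) p(b).
p_ab = {(a, b): joint[(a, b, 0)] + joint[(a, b, 1)]
        for a in (0, 1) for b in (0, 1)}
p_a = {a: p_ab[(a, 0)] + p_ab[(a, 1)] for a in (0, 1)}
p_b = {b: p_ab[(0, b)] + p_ab[(1, b)] for b in (0, 1)}
print(p_ab[(0, 0)], p_a[0] * p_b[0])  # the two values differ

# Conditioned on c, they factorize: p(a,b|c) == p(a|c) p(b|c).
for c in (0, 1):
    pc = sum(joint[(a, b, c)] for a in (0, 1) for b in (0, 1))
    for a in (0, 1):
        for b in (0, 1):
            assert abs(joint[(a, b, c)] / pc - p_a_c[c][a] * p_b_c[c][b]) < 1e-12
```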

## Conditional Independence: Example 2

In general, if $c$ is unobserved,
$$p(a, b, c) = p(a)\,p(c \mid a)\,p(b \mid c)$$
$a$ and $b$ are not independent in general, as
$$p(a, b) = p(a)\,p(b \mid a) \neq p(a)\,p(b)$$

## Conditional Independence: Example 2

But, if $c$ is observed,
$$p(a, b \mid c) = \frac{p(a, b, c)}{p(c)} = \frac{p(a)\,p(c \mid a)\,p(b \mid c)}{p(c)} = p(a \mid c)\,p(b \mid c)$$

Thus, conditional independence holds. But still,
$$p(a, b) \neq p(a)\,p(b)$$

The node $c$ is said to be head-to-tail w.r.t. the path from node $a$ to node $b$.

## Conditional Independence: Example 3

In general, if $c$ is unobserved,
$$p(a, b, c) = p(a)\,p(b)\,p(c \mid a, b)$$
$a$ and $b$ are independent, as
$$p(a, b) = \sum_{c} p(a)\,p(b)\,p(c \mid a, b) = p(a)\,p(b)$$

## Conditional Independence: Example 3

But, if $c$ is observed,
$$p(a, b \mid c) = \frac{p(a, b, c)}{p(c)} = \frac{p(a)\,p(b)\,p(c \mid a, b)}{p(c)} \neq p(a \mid c)\,p(b \mid c)$$

Thus, conditional independence does not hold, but
$$p(a, b) = p(a)\,p(b)$$

The node $c$ is said to be head-to-head w.r.t. the path from node $a$ to node $b$.
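A numeric illustration of the head-to-head case (the "explaining away" effect), again with hypothetical tables:

```python
import itertools

# Hypothetical head-to-head model a -> c <- b.
p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.6, 1: 0.4}
p_c_ab = {(0, 0): {0: 0.99, 1: 0.01}, (0, 1): {0: 0.2, 1: 0.8},
          (1, 0): {0: 0.2, 1: 0.8},   (1, 1): {0: 0.05, 1: 0.95}}

joint = {(a, b, c): p_a[a] * p_b[b] * p_c_ab[(a, b)][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

# Marginally, a and b ARE independent.
p_ab = {(a, b): joint[(a, b, 0)] + joint[(a, b, 1)]
        for a in (0, 1) for b in (0, 1)}
for (a, b), p in p_ab.items():
    assert abs(p - p_a[a] * p_b[b]) < 1e-12

# But observing c couples them: p(a,b|c=1) != p(a|c=1) p(b|c=1).
p_c1 = sum(joint[(a, b, 1)] for a, b in itertools.product((0, 1), repeat=2))
p_ab_c1 = {(a, b): joint[(a, b, 1)] / p_c1
           for a, b in itertools.product((0, 1), repeat=2)}
p_a_c1 = {a: p_ab_c1[(a, 0)] + p_ab_c1[(a, 1)] for a in (0, 1)}
p_b_c1 = {b: p_ab_c1[(0, b)] + p_ab_c1[(1, b)] for b in (0, 1)}
print(p_ab_c1[(1, 1)], p_a_c1[1] * p_b_c1[1])  # the two values differ
```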

## Conditional Independence: Summary

- A tail-to-tail node or a head-to-tail node blocks a path, unless it is observed.
- A head-to-head node blocks a path when unobserved, and unblocks the path, when observed.

## D-separation

Goal: to ascertain whether a particular conditional independence statement is implied by a given DAG.

$A$, $B$, $C$ are non-intersecting sets of nodes. Then, is $A$ conditionally independent of $B$, given $C$?

## D-separation

- Consider all the paths from any node in $A$ to any node in $B$.
- Any path is said to be blocked, if it includes a node such that either
  - the arrows on the path meet head-to-tail or tail-to-tail at the node, and the node is in the set $C$, or
  - the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set $C$.
- If all paths are blocked, then $A$ is said to be d-separated from $B$ by $C$, and the joint distribution over all of the variables in the graph will satisfy conditional independence of $A$ and $B$, given $C$.
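The test above can be mechanized. The sketch below uses the standard moralized-ancestral-graph construction, an equivalent formulation of d-separation rather than the path-by-path procedure in the slides: $A$ and $B$ are d-separated by $C$ iff they are disconnected in the moralized ancestral graph of $A \cup B \cup C$ after removing $C$.

```python
def d_separated(edges, A, B, C):
    """Check d-separation of node sets A and B given C (assumed disjoint).
    edges: list of directed (parent, child) pairs."""
    # 1. Ancestral subgraph of A | B | C: repeatedly pull in parents.
    nodes = set(A) | set(B) | set(C)
    changed = True
    while changed:
        changed = False
        for p, c in edges:
            if c in nodes and p not in nodes:
                nodes.add(p)
                changed = True
    sub = [(p, c) for p, c in edges if p in nodes and c in nodes]
    # 2. Moralize: connect co-parents of every child, drop directions.
    und = {n: set() for n in nodes}
    for p, c in sub:
        und[p].add(c); und[c].add(p)
    for child in nodes:
        parents = [p for p, c in sub if c == child]
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                und[parents[i]].add(parents[j])
                und[parents[j]].add(parents[i])
    # 3. Remove C, then test reachability from A to B.
    frontier, seen = list(A), set(A)
    while frontier:
        n = frontier.pop()
        for m in und[n]:
            if m not in seen and m not in C:
                seen.add(m)
                frontier.append(m)
    return not (seen & set(B))

# Head-to-tail chain a -> c -> b: observing c blocks the path.
print(d_separated([('a', 'c'), ('c', 'b')], {'a'}, {'b'}, {'c'}))   # True
# Head-to-head a -> c <- b: observing c unblocks the path.
print(d_separated([('a', 'c'), ('b', 'c')], {'a'}, {'b'}, {'c'}))   # False
print(d_separated([('a', 'c'), ('b', 'c')], {'a'}, {'b'}, set()))   # True
```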

## D-separation: Exercise

In graph (a), check for conditional independence (c.i.) of $a$ and $b$, given $c$.
In graph (b), check for c.i. of $a$ and $b$, given $f$.

In graph (a):

- the path from $a$ to $b$ is not blocked by $f$
- the path from $a$ to $b$ is not blocked by $e$

Thus graph (a) does not imply c.i. of $a$ and $b$, given $c$.

In graph (b):

- the path from $a$ to $b$ is blocked by $f$ (and by $e$ too)

Thus graph (b) implies c.i. of $a$ and $b$, given $f$. Hence, this c.i. property will be satisfied by any distribution which factorizes according to this graph.

THANK YOU

## An Information Measure for Probability Distributions

Let $P_0, P_1, \ldots, P_j, \ldots, P_{2^n - 1}$ be the variables/elements, where the element $P_j$ is the probability of the configuration given by the binary sequence of the number $j$.

The entropy of the distribution is given by
$$H_s = -\sum_{j=0}^{2^n - 1} P_j \log P_j$$

The information contained in an nth order binary distribution is defined to be
$$I_p = n \log 2 - H_s = n \log 2 + \sum_{j=0}^{2^n - 1} P_j \log P_j$$
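A direct transcription of the two formulas, assuming natural logarithms (consistent with writing $n \log 2$ rather than $n$):

```python
import math

def entropy(P):
    """H_s = -sum_j P_j log P_j (natural log, with 0 log 0 taken as 0)."""
    return -sum(p * math.log(p) for p in P if p > 0)

def information(P, n):
    """I_p = n log 2 - H_s for an nth order binary distribution."""
    return n * math.log(2) - entropy(P)

n = 2
flat   = [0.25, 0.25, 0.25, 0.25]  # perfectly flat: H_s = n log 2
peaked = [1.0, 0.0, 0.0, 0.0]      # perfectly peaked: H_s = 0
print(information(flat, n))        # ~ 0.0
print(information(peaked, n))      # ~ 1.3863 (= n log 2)
```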

## Concept of Entropy

Known distribution: the flatter the distribution, the more the information content; the more peaked the distribution, the less the information content.

Unknown distribution: the flatter the distribution, the less the information.

Maximum information content (Feinstein, 1958): $H_s^{max} = n \log 2$. It follows that $I_p = H_s^{max} - H_s$.

When the distribution is perfectly flat, $H_s = n \log 2$. When the distribution is perfectly peaked, $H_s = 0$. Thus $0 \leq I_p \leq n \log 2$.

The basic premise of this definition is that when sequences are

## A Measure for the Closeness of One Probability Distribution to Another

The distribution $P(x_1, x_2, \ldots, x_n)$ with elements $P_0, P_1, \ldots, P_j, \ldots, P_{2^n - 1}$ is approximated by another distribution whose elements are $P'_0, P'_1, \ldots, P'_j, \ldots, P'_{2^n - 1}$, such that $\sum_j P'_j = 1$.

The corresponding information measure is
$$I_{P'} = n \log 2 + \sum_{j=0}^{2^n - 1} P_j \log P'_j$$

The closeness of the approximation is then defined as follows:
$$I_{P - P'} = I_P - I_{P'} = \sum_{j=0}^{2^n - 1} P_j \log \frac{P_j}{P'_j} \geq 0$$
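The closeness measure $I_{P-P'}$ is the Kullback-Leibler divergence of the approximation from the true distribution. A minimal sketch with illustrative distributions (the numbers are chosen here, not from the slides):

```python
import math

def closeness(P, Q):
    """I_{P-P'} = sum_j P_j log(P_j / P'_j), the closeness measure
    (Kullback-Leibler divergence; terms with P_j = 0 contribute 0)."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

P = [0.0, 0.25, 0.5, 0.25]        # true second order distribution
Q = [0.125, 0.125, 0.375, 0.375]  # an approximation with sum(Q) = 1

print(closeness(P, Q))   # > 0: the approximation loses information
print(closeness(P, P))   # 0.0: a perfect approximation
```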