# Bayesian networks

AI and Robotics

7 Nov. 2013

## Outline

♦ Syntax
♦ Semantics
♦ Parameterized distributions
## Bayesian networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents: $P(X_i \mid \mathrm{Parents}(X_i))$

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.
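## Example

Topology of the network encodes conditional independence assertions:

[Figure: network over Weather, Cavity, Toothache and Catch. Cavity is the parent of Toothache and Catch; Weather is unconnected.]

Weather is independent of the other variables.

Toothache and Catch are conditionally independent given Cavity.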
## Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
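## Example contd.

[Figure: network with Burglary and Earthquake as parents of Alarm, and Alarm as parent of JohnCalls and MaryCalls, annotated with the CPTs below.]

$P(B) = .001$, $P(E) = .002$

| B | E | $P(A \mid B, E)$ |
|---|---|---|
| T | T | .95 |
| T | F | .94 |
| F | T | .29 |
| F | F | .001 |

| A | $P(J \mid A)$ |
|---|---|
| T | .90 |
| F | .05 |

| A | $P(M \mid A)$ |
|---|---|
| T | .70 |
| F | .01 |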
## Compactness

A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values.

Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1 - p$).

If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers.

I.e., it grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.

For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).
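The count for the burglary net can be reproduced mechanically. The following is a minimal sketch, assuming all variables are Boolean and representing the network as a dict from each node to its list of parents; the function names are illustrative and not part of the original slides.

```python
# Count the independent numbers needed by a Boolean Bayesian network.
# Assumption: every variable is Boolean, so a node with k parents
# needs 2**k numbers (one per row of its CPT).

def cpt_parameter_count(parents):
    """Sum 2**k over all nodes, where k is the node's number of parents."""
    return sum(2 ** len(ps) for ps in parents.values())

def full_joint_parameter_count(n):
    """Independent entries in the full joint over n Boolean variables."""
    return 2 ** n - 1

# Burglary network: node -> list of parents.
burglary = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(cpt_parameter_count(burglary))              # 10
print(full_joint_parameter_count(len(burglary)))  # 31
```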
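## Global semantics

"Global" semantics defines the full joint distribution as the product of the local conditional distributions:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$

e.g.,

$$P(j \land m \land a \land \lnot b \land \lnot e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \lnot b, \lnot e)\,P(\lnot b)\,P(\lnot e)$$

$$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$$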
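The product above is easy to check numerically. This is a small sketch using the CPT values of the burglary example; the dictionary layout and the helper names `bernoulli` and `joint` are assumptions made for illustration.

```python
# Joint probability of a full assignment as a product of local CPT entries.
# CPT values are those of the burglary example; the data layout is illustrative.

P_B = 0.001                                           # P(b)
P_E = 0.002                                           # P(e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(a | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(j | A)
P_M = {True: 0.70, False: 0.01}                       # P(m | A)

def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the global semantics."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))

p = joint(b=False, e=False, a=True, j=True, m=True)
print(round(p, 5))  # 0.00063
```

Summing `joint` over all 32 assignments should give 1, which is a quick sanity check that the CPTs define a proper distribution.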
## Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

[Figure: node $X$ with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and nodes $Z_{1j}, \ldots, Z_{nj}$.]

Theorem: Local semantics ⇔ global semantics
## Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

[Figure: node $X$ with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and nodes $Z_{1j}, \ldots, Z_{nj}$.]
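The definition translates directly into a few set operations. The sketch below assumes the graph is stored as a dict from each node to its list of parents; `markov_blanket` is an illustrative name.

```python
# Markov blanket of a node in a DAG: parents + children + children's parents.
# The graph is a dict mapping each node to a list of its parents.

def markov_blanket(parents, node):
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])   # other parents of each child
    blanket.discard(node)                # a node is not in its own blanket
    return blanket

burglary = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

print(sorted(markov_blanket(burglary, "Burglary")))
# ['Alarm', 'Earthquake']
print(sorted(markov_blanket(burglary, "Alarm")))
# ['Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls']
```

Note that Earthquake is in the blanket of Burglary only because the two share the child Alarm.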
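## D-separation

Q: When are nodes X independent of nodes Y given nodes E?

A: When every undirected path from a node in X to a node in Y is d-separated by E.

[Figure: three cases, labeled (1), (2) and (3), of a path between X and Y through an intermediate node Z, with evidence E.]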
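A d-separation query can be answered without enumerating paths. The sketch below does not follow the path-based wording above; it swaps in the standard equivalent test on the moralized ancestral graph (keep only X, Y, E and their ancestors, connect co-parents, drop edge directions, delete E, then check reachability). The function names and the dict-of-parents layout are assumptions for illustration.

```python
# d-separation via the moralized ancestral graph (an equivalent test,
# not the path-by-path check). Graph: dict node -> list of parents.

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def d_separated(parents, xs, ys, es):
    xs, ys, es = set(xs), set(ys), set(es)
    keep = ancestors(parents, xs | ys | es)
    # Moral graph: undirected parent-child links plus links between co-parents.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = list(parents[n])
        for p in ps:
            adj[n].add(p)
            adj[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Search from X, never stepping onto an evidence node.
    frontier = list(xs - es)
    reached = set(frontier)
    while frontier:
        n = frontier.pop()
        for nb in adj[n]:
            if nb not in es and nb not in reached:
                reached.add(nb)
                frontier.append(nb)
    return not (reached & ys)

burglary = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

print(d_separated(burglary, {"Burglary"}, {"Earthquake"}, set()))      # True
print(d_separated(burglary, {"Burglary"}, {"Earthquake"}, {"Alarm"}))  # False
print(d_separated(burglary, {"JohnCalls"}, {"MaryCalls"}, {"Alarm"}))  # True
```

The second query shows the "explaining away" effect: Burglary and Earthquake are independent a priori but become dependent once their common child Alarm is observed.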
## Example

[Figure: network over Battery, Ignition, Gas, Starts and Moves.]
## Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   add $X_i$ to the network;
   select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid \mathrm{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$

This choice of parents guarantees the global semantics:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \quad \text{(chain rule)}$$

$$= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i)) \quad \text{(by construction)}$$
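## Example

Suppose we choose the ordering M, J, A, B, E.

[Figure: resulting network over MaryCalls, JohnCalls, Alarm, Burglary and Earthquake.]

| Question | Answer |
|---|---|
| $P(J \mid M) = P(J)$? | No |
| $P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? | No |
| $P(B \mid A, J, M) = P(B \mid A)$? | Yes |
| $P(B \mid A, J, M) = P(B)$? | No |
| $P(E \mid B, A, J, M) = P(E \mid A)$? | No |
| $P(E \mid B, A, J, M) = P(E \mid A, B)$? | Yes |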
## Example contd.

[Figure: network over MaryCalls, JohnCalls, Alarm, Burglary and Earthquake built with the ordering M, J, A, B, E.]

Deciding conditional independence is hard in noncausal directions.

(Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions.

Network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed.
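The effect of the ordering can be reproduced by brute force. The sketch below is an illustration under stated assumptions, not the procedure a human modeler follows: it builds the full joint from the burglary CPTs and, for each variable in the chosen ordering, searches for the smallest set of predecessors that makes it conditionally independent of the remaining ones (up to a numerical tolerance). All helper names are hypothetical.

```python
from itertools import combinations, product

# Burglary CPTs; variable names abbreviated as B, E, A, J, M.
NAMES = ["B", "E", "A", "J", "M"]
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

# Full joint: assignment tuple (in NAMES order) -> probability.
JOINT = {
    (b, e, a, j, m): pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
                     * pr(P_J[a], j) * pr(P_M[a], m)
    for b, e, a, j, m in product([True, False], repeat=5)
}

def marginal(assignment):
    """P(assignment) for a dict of variable name -> value."""
    idx = {NAMES.index(v): val for v, val in assignment.items()}
    return sum(p for world, p in JOINT.items()
               if all(world[i] == val for i, val in idx.items()))

def conditional(var, given):
    """P(var = true | given)."""
    return marginal({**given, var: True}) / marginal(given)

def minimal_parents(var, preds, tol=1e-9):
    """Smallest subset S of preds with P(var | S) = P(var | preds)."""
    worlds = [dict(zip(preds, vals))
              for vals in product([True, False], repeat=len(preds))]
    for size in range(len(preds) + 1):
        for subset in combinations(preds, size):
            if all(abs(conditional(var, w)
                       - conditional(var, {v: w[v] for v in subset})) <= tol
                   for w in worlds):
                return list(subset)

def build(ordering):
    """Apply the construction procedure for a given variable ordering."""
    return {var: minimal_parents(var, ordering[:i])
            for i, var in enumerate(ordering)}

def parameter_count(net):
    return sum(2 ** len(ps) for ps in net.values())

noncausal = build(["M", "J", "A", "B", "E"])
causal = build(["B", "E", "A", "J", "M"])
print(noncausal, parameter_count(noncausal))  # 13 numbers
print(causal, parameter_count(causal))        # 10 numbers
```

The exhaustive search is only feasible for tiny networks; it is used here because it makes each "Yes/No" answer in the example checkable.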
## Example: Car diagnosis

Initial evidence: car won't start

Testable variables (green), "broken, so fix it" variables (orange)

Hidden variables (gray) ensure sparse structure and reduce parameters

[Figure: network with nodes lights, no oil, no gas, starter broken, battery age, alternator broken, fanbelt broken, battery, no charging, battery flat, gas gauge, fuel line blocked, oil light, battery meter, car won't start, dipstick.]
## Example: Car insurance

[Figure: network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost.]
## Compact conditional distributions

CPT grows exponentially with the number of parents.

CPT becomes infinite with a continuous-valued parent or child.

Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case: $X = f(\mathrm{Parents}(X))$ for some function $f$

E.g., Boolean functions

E.g., numerical relationships among continuous variables

$$\frac{\partial\, Level}{\partial t} = \text{inflow} + \text{precipitation} - \text{outflow} - \text{evaporation}$$
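## Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:

1) Parents $U_1 \ldots U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone

$$\Rightarrow P(X \mid U_1 \ldots U_j, \lnot U_{j+1} \ldots \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$$

| Cold | Flu | Malaria | $P(Fever)$ | $P(\lnot Fever)$ |
|---|---|---|---|---|
| F | F | F | 0.0 | 1.0 |
| F | F | T | 0.9 | 0.1 |
| F | T | F | 0.8 | 0.2 |
| F | T | T | 0.98 | 0.02 = 0.2 × 0.1 |
| T | F | F | 0.4 | 0.6 |
| T | F | T | 0.94 | 0.06 = 0.6 × 0.1 |
| T | T | F | 0.88 | 0.12 = 0.6 × 0.2 |
| T | T | T | 0.988 | 0.012 = 0.6 × 0.2 × 0.1 |

Number of parameters is linear in the number of parents.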
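The whole fever table follows from three numbers. As a minimal sketch, the function below takes the per-cause failure probabilities read off the table ($q_{Cold} = 0.6$, $q_{Flu} = 0.2$, $q_{Malaria} = 0.1$) and regenerates every row; the name `noisy_or` and the dict layout are illustrative, and no leak node is modeled.

```python
from itertools import product

def noisy_or(q, active):
    """P(effect | causes): 1 minus the product of the failure
    probabilities q[name] of the causes that are present."""
    fail = 1.0
    for name in active:
        fail *= q[name]
    return 1.0 - fail

# Failure probabilities for each cause acting alone (from the fever table).
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

# Regenerate the 8-row table from the 3 parameters.
for values in product([False, True], repeat=len(q)):
    active = [name for name, present in zip(q, values) if present]
    p_fever = noisy_or(q, active)
    row = "".join("T" if v else "F" for v in values)
    print(row, round(p_fever, 3), round(1.0 - p_fever, 3))
```

With $k$ parents the function needs only the $k$ entries of `q`, whereas an explicit CPT would need $2^k$ rows.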
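## Hybrid (discrete + continuous) networks

Discrete variables (Subsidy? and the discrete child of Cost); continuous variables (Harvest and Cost)

[Figure: network in which Subsidy? and Harvest are parents of Cost, and Cost is the parent of a discrete variable.]

Option 1: discretization, with possibly large errors and large CPTs

Option 2: finitely parameterized canonical families

1) Continuous variable, discrete + continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., the discrete child of Cost)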
## Continuous child variables

Need one conditional density function for the child variable given continuous parents, for each possible assignment to discrete parents.

Most common is the linear Gaussian model, e.g.:

$$P(Cost = c \mid Harvest = h, Subsidy? = true) = N(a_t h + b_t, \sigma_t)(c)$$

$$= \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{c - (a_t h + b_t)}{\sigma_t} \right)^2 \right)$$

Mean Cost varies linearly with Harvest; variance is fixed.

Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow.
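The density above is straightforward to evaluate. This is a small sketch of the linear Gaussian model for one value of the discrete parent; the parameter values chosen for $a_t$, $b_t$ and $\sigma_t$ are hypothetical, picked only so the example has numbers to work with.

```python
import math

def linear_gaussian_pdf(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h, with mean a*h + b
    and standard deviation sigma."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters for the case Subsidy? = true.
a_t, b_t, sigma_t = -0.5, 8.0, 1.0

# The conditional mean shifts linearly with Harvest; the spread stays fixed.
for h in (2.0, 6.0, 10.0):
    mean = a_t * h + b_t
    print(h, mean, linear_gaussian_pdf(mean, h, a_t, b_t, sigma_t))
```

For a hybrid network one such parameter triple would be stored per assignment of the discrete parents, for example one for Subsidy? = true and another for Subsidy? = false.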
## Continuous child variables contd.

[Figure: surface plot of $P(Cost \mid Harvest, Subsidy? = true)$ over Cost and Harvest, each ranging from 0 to 10.]

All-continuous network with LG distributions ⇒ full joint distribution is a multivariate Gaussian.

Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.
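To see why linear Gaussian conditionals yield a multivariate Gaussian, consider just Harvest and Cost with Subsidy? fixed to true. The derivation below is a sketch that assumes a Gaussian prior on Harvest with mean $\mu_h$ and standard deviation $\sigma_h$; these two symbols are not in the original slides and are introduced only for illustration.

```latex
% Assumed prior (not from the slides):  h ~ N(\mu_h, \sigma_h)
% Linear Gaussian conditional:          c = a_t h + b_t + \epsilon,
%                                       \epsilon ~ N(0, \sigma_t), independent of h
\begin{align*}
  \mathbb{E}[c]          &= a_t \mu_h + b_t \\
  \operatorname{Var}[c]  &= a_t^2 \sigma_h^2 + \sigma_t^2 \\
  \operatorname{Cov}[h,c] &= a_t \sigma_h^2 \\[4pt]
  \begin{pmatrix} h \\ c \end{pmatrix}
  &\sim \mathcal{N}\!\left(
      \begin{pmatrix} \mu_h \\ a_t \mu_h + b_t \end{pmatrix},
      \begin{pmatrix}
        \sigma_h^2       & a_t \sigma_h^2 \\
        a_t \sigma_h^2   & a_t^2 \sigma_h^2 + \sigma_t^2
      \end{pmatrix}
    \right)
\end{align*}
```

Because $c$ is a linear function of the jointly Gaussian pair $(h, \epsilon)$, the pair $(h, c)$ is itself Gaussian; repeating the argument node by node extends it to any all-continuous LG network.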
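## Discrete variable w/ continuous parents

The probability of the discrete child given Cost should be a "soft" threshold:

[Figure: soft threshold curve, probability (0 to 1) against Cost c (0 to 12).]

Probit distribution uses the integral of the Gaussian:

$$\Phi(x) = \int_{-\infty}^{x} N(0, 1)(t)\, dt$$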
## Why the probit?

1. It's sort of the right shape
2. Can view as a hard threshold whose location is subject to noise

[Figure: diagram with nodes Cost and Noise.]
## Discrete variable contd.

Sigmoid (or logit) distribution, also used in neural networks:

$$\frac{1}{1 + \exp\left(-2\,\frac{-c + \mu}{\sigma}\right)}$$

Sigmoid has a similar shape to the probit but much longer tails:

[Figure: plot of the sigmoid curve, probability (0 to 1) over the range 0 to 12.]
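The difference in the tails can be checked numerically. The sketch below evaluates both curves as functions of Cost; the sigmoid follows the formula above, while the probit is parameterized as $\Phi((-c + \mu)/\sigma)$, which is an assumption made here so the two share the same $\mu$ and $\sigma$. The values of `mu` and `sigma` are hypothetical.

```python
import math

def probit(c, mu, sigma):
    """Phi((-c + mu) / sigma), with Phi the standard normal CDF.
    This parameterization is assumed, to mirror the sigmoid below."""
    x = (-c + mu) / sigma
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(c, mu, sigma):
    """1 / (1 + exp(-2 * (-c + mu) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

mu, sigma = 6.0, 1.0   # hypothetical threshold location and width

for c in (2.0, 5.0, 6.0, 7.0, 10.0):
    print(c, probit(c, mu, sigma), sigmoid(c, mu, sigma))
```

Both curves pass through 0.5 at $c = \mu$. Four widths above the threshold the sigmoid is still roughly ten times larger than the probit, which is what "longer tails" means in practice.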