Bayesian networks

lettuceescargatoire, Artificial Intelligence and Robotics

7 Nov 2013

Outline
♦ Syntax
♦ Semantics
♦ Parameterized distributions

Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ “directly influences”)
a conditional distribution for each node given its parents: $P(X_i \mid \mathit{Parents}(X_i))$
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values

Example
Topology of network encodes conditional independence assertions:
[Figure: network with nodes Weather, Cavity, Toothache, Catch]
Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity

Example
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call

Example contd.
[Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls]
P(B) = .001    P(E) = .002

B  E | P(A|B,E)
T  T | .95
T  F | .94
F  T | .29
F  F | .001

A | P(J|A)
T | .90
F | .05

A | P(M|A)
T | .70
F | .01

Compactness
[Figure: the burglary network, nodes B, E, A, J, M]
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values
Each row requires one number $p$ for $X_i = \mathit{true}$ (the number for $X_i = \mathit{false}$ is just $1-p$)
If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers
I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution
For burglary net, $1+1+4+2+2 = 10$ numbers (vs. $2^5 - 1 = 31$)
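
The counting argument can be checked with a short Python sketch that sums $2^k$ over the nodes of a parent map. This is a minimal illustration, assuming all variables are Boolean. `PARENTS`, `cpt_parameters` and `full_joint_parameters` are hypothetical names introduced here, and the topology is the burglary network described above.

```python
# Burglary network topology (B, E, A, J, M): each node's list of parents.
PARENTS = {
    "B": [],
    "E": [],
    "A": ["B", "E"],
    "J": ["A"],
    "M": ["A"],
}


def cpt_parameters(parents):
    """Independent numbers needed when every variable is Boolean.

    A Boolean node with k Boolean parents has 2**k CPT rows, and each row
    needs one number p = P(X = true | row), since P(X = false | row) = 1 - p.
    """
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_parameters(n):
    """Independent numbers in an explicit joint table over n Boolean variables."""
    return 2 ** n - 1


bn_params = cpt_parameters(PARENTS)
joint_params = full_joint_parameters(len(PARENTS))
print(bn_params, joint_params)  # 10 31
```

The sum has one term per node, and each term is at most $2^k$, so the count is bounded by $n \cdot 2^k$.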
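
Global semantics
[Figure: the burglary network, nodes B, E, A, J, M]
“Global” semantics defines the full joint distribution as the product of the local conditional distributions:
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(X_i))$$
e.g.,
$$\begin{aligned}
P(j \land m \land a \land \lnot b \land \lnot e)
&= P(j \mid a)\,P(m \mid a)\,P(a \mid \lnot b, \lnot e)\,P(\lnot b)\,P(\lnot e)\\
&= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998\\
&\approx 0.00063
\end{aligned}$$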
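
The product above can be sketched in Python, assuming Boolean variables and the CPT values from the burglary example. The names `P_B`, `P_A`, `bernoulli` and `joint` are illustrative only. Summing the product over all $2^5$ assignments also checks that the local CPTs define a proper joint distribution.

```python
from itertools import product

# CPT entries from the burglary example: P(variable = true | parents).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # keyed by (b, e)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # keyed by a
P_M = {True: 0.70, False: 0.01}                   # keyed by a


def bernoulli(p_true, value):
    """P(X = value) for a Boolean X with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Global semantics: product of each node's CPT entry given its parents."""
    return (bernoulli(P_B, b)
            * bernoulli(P_E, e)
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j)
            * bernoulli(P_M[a], m))


# P(j, m, a, not b, not e) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998
p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.6f}")  # 0.000628

# The 32 products sum to 1, so the CPTs define a full joint distribution.
total = sum(joint(*world) for world in product([True, False], repeat=5))
print(round(total, 12))  # 1.0
```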

Local semantics
Local semantics: each node is conditionally independent of its nondescendants given its parents
[Figure: node X with parents U_1, …, U_m, children Y_1, …, Y_n, and nodes Z_1j, …, Z_nj]
Theorem: Local semantics ⇔ global semantics
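
One direction of the theorem (global ⇒ local) can be spot-checked numerically. The sketch builds the joint from the burglary CPTs using the global semantics. It then confirms by brute-force enumeration over all 32 worlds that J ignores its nondescendants once A is known, and that B (which has no parents) ignores E. This is a minimal sketch assuming Boolean variables; `VARS`, `PARENTS`, `CPT`, `joint` and `prob` are hypothetical names.

```python
from itertools import product

# Burglary network: parent lists and P(node = true | parent values).
VARS = ["B", "E", "A", "J", "M"]
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}


def joint(world):
    """Global semantics: multiply each node's CPT entry given its parents."""
    p = 1.0
    for v in VARS:
        p_true = CPT[v][tuple(world[u] for u in PARENTS[v])]
        p *= p_true if world[v] else 1.0 - p_true
    return p


def prob(query, evidence):
    """P(query | evidence) by summing the joint over all 2**5 worlds."""
    num = den = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(world)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# Local semantics for J: given its parent A, the nondescendants B, E, M are irrelevant.
worst_j = max(
    abs(prob({"J": True}, {"A": a, "B": b, "E": e, "M": m})
        - prob({"J": True}, {"A": a}))
    for a, b, e, m in product([True, False], repeat=4)
)
# Local semantics for B: it has no parents, so it ignores its nondescendant E.
gap_b = abs(prob({"B": True}, {"E": True}) - prob({"B": True}, {}))
print(worst_j < 1e-12, gap_b < 1e-12)  # True True
```

The B check is the less trivial one. It requires summing out A, J and M, because B and E are only marginally independent.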

Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents
[Figure: node X with parents U_1, …, U_m, children Y_1, …, Y_n, and children’s other parents Z_1j, …, Z_nj]
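
Reading the Markov blanket off the graph is purely structural. The following is a minimal Python sketch, assuming the network is stored as a dictionary mapping each node to its list of parents. `markov_blanket` and `BURGLARY` are illustrative names, and the topology is the burglary network.

```python
def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`.

    `parents` maps every node of a DAG to the list of its parents.
    """
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # co-parents of each child
    blanket.discard(node)            # a node is not in its own blanket
    return blanket


BURGLARY = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(sorted(markov_blanket("B", BURGLARY)))  # ['A', 'E']
print(sorted(markov_blanket("A", BURGLARY)))  # ['B', 'E', 'J', 'M']
```

E enters B's blanket only as a co-parent of the child A. It is neither a parent nor a child of B.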
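
D-separation
Q: When are nodes X independent of nodes Y given nodes E?
A: When every undirected path from a node in X to a node in Y is d-separated by E, i.e., blocked at some node Z on the path:
(1) Z is in E and the path passes through Z as a chain (→ Z →)
(2) Z is in E and the path diverges at Z (← Z →)
(3) the path converges at Z (→ Z ←) and neither Z nor any descendant of Z is in E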

Example
[Figure: car network with nodes Radio, Battery, Ignition, Gas, Starts, Moves]
Are Gas and Radio independent? Given Battery? Ignition? Starts? Moves?
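
These questions can be answered mechanically. This Python sketch does not enumerate undirected paths and check the three blocking cases one by one. Instead it uses the equivalent moralized-ancestral-graph test:

- keep only X, Y, E and their ancestors;
- connect every pair of parents that share a child;
- drop edge directions and delete E;
- test whether X can still reach Y.

The function names and the edge list `CAR` are assumptions. Only the node names survive in the text, so the edges (Battery → Radio, Battery → Ignition, Ignition → Starts, Gas → Starts, Starts → Moves) follow the usual version of this example.

```python
from collections import deque


def ancestors(nodes, parents):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(xs, ys, evidence, parents):
    """True if every path between xs and ys is blocked by `evidence`.

    Moralized ancestral graph test: restrict to ancestors of xs | ys | evidence,
    link each pair of co-parents, drop directions, delete the evidence nodes,
    and check whether any node in xs can still reach a node in ys.
    """
    keep = ancestors(set(xs) | set(ys) | set(evidence), parents)
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]              # all parents of a kept node are kept
        for p in ps:                     # undirected parent-child edges
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(ps):       # "marry" parents of a common child
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    frontier = deque(x for x in xs if x not in evidence)
    reached = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for m in adj[n] - reached - set(evidence):
            reached.add(m)
            frontier.append(m)
    return True


# Assumed edges for the car example (only node names survive in the text).
CAR = {"Battery": [], "Radio": ["Battery"], "Ignition": ["Battery"],
       "Gas": [], "Starts": ["Ignition", "Gas"], "Moves": ["Starts"]}

for given in [set(), {"Battery"}, {"Ignition"}, {"Starts"}, {"Moves"}]:
    print(sorted(given), d_separated({"Gas"}, {"Radio"}, given, CAR))
# Expected: True, True, True, False, False
```

Under this topology, observing Starts (or its descendant Moves) activates the converging connection at Starts. That makes Gas and Radio dependent.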

Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that
   $P(X_i \mid \mathit{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$$\begin{aligned}
P(X_1, \ldots, X_n) &= \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) && \text{(chain rule)}\\
&= \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i)) && \text{(by construction)}
\end{aligned}$$
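
Example
Suppose we choose the ordering M, J, A, B, E
[Figure: network with nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
$P(J \mid M) = P(J)$? No
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
$P(B \mid A, J, M) = P(B \mid A)$? Yes
$P(B \mid A, J, M) = P(B)$? No
$P(E \mid B, A, J, M) = P(E \mid A)$? No
$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes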

Example contd.
[Figure: the network obtained from the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
Deciding conditional independence is hard in noncausal directions
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions
Network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed
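
The construction procedure, and the answers for the ordering M, J, A, B, E, can be reproduced numerically. This Python sketch treats the causal burglary network as the source of the true joint distribution. It then adds variables in the chosen order and gives each one a smallest subset of its predecessors with the same conditional distribution as all of them. The smallest-subset search and the `1e-9` tolerance are choices made for this illustration, not part of the slide's method. All names are hypothetical.

```python
from itertools import combinations, product

# The causal burglary network serves only as the source of the true joint
# distribution; P(node = true | parent values) for each node.
VARS = ["B", "E", "A", "J", "M"]
PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

# Enumerate all 32 worlds with their joint probabilities.
WORLDS = []
for values in product([True, False], repeat=len(VARS)):
    world = dict(zip(VARS, values))
    p = 1.0
    for v in VARS:
        p_true = CPT[v][tuple(world[u] for u in PARENTS[v])]
        p *= p_true if world[v] else 1.0 - p_true
    WORLDS.append((world, p))


def p_true_given(x, evidence):
    """P(x = true | evidence) by summing over the matching worlds."""
    num = den = 0.0
    for w, pw in WORLDS:
        if all(w[k] == val for k, val in evidence.items()):
            den += pw
            if w[x]:
                num += pw
    return num / den


def screens_off(x, subset, preds, tol=1e-9):
    """True if P(x | subset) = P(x | preds) for every assignment to preds."""
    for vals in product([True, False], repeat=len(preds)):
        ev = dict(zip(preds, vals))
        restricted = {u: ev[u] for u in subset}
        if abs(p_true_given(x, restricted) - p_true_given(x, ev)) > tol:
            return False
    return True


def build_network(order):
    """Add variables in `order`; give each a smallest screening-off parent set."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        parents[x] = next(
            list(s)
            for k in range(len(preds) + 1)
            for s in combinations(preds, k)
            if screens_off(x, s, preds)   # the full preds set always passes
        )
    return parents


net = build_network(["M", "J", "A", "B", "E"])
print(net)  # {'M': [], 'J': ['M'], 'A': ['M', 'J'], 'B': ['A'], 'E': ['A', 'B']}
print(sum(2 ** len(ps) for ps in net.values()))  # 13
```

Running the same function on the causal ordering `["B", "E", "A", "J", "M"]` should recover the original topology. The test below checks that.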

Example: Car diagnosis
Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
[Figure: car diagnosis network with nodes lights, no oil, no gas, starter broken, battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, gas gauge, fuel line blocked, oil light, battery meter, car won't start, dipstick]

Example: Car insurance
[Figure: car insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]

Compact conditional distributions
CPT grows exponentially with number of parents
CPT becomes infinite with continuous-valued parent or child
Solution: canonical distributions that are defined compactly
Deterministic nodes are the simplest case: $X = f(\mathit{Parents}(X))$ for some function $f$
E.g., Boolean functions: $\mathit{NorthAmerican} \Leftrightarrow \mathit{Canadian} \lor \mathit{US} \lor \mathit{Mexican}$
E.g., numerical relationships among continuous variables:
$$\frac{\partial \mathit{Level}}{\partial t} = \text{inflow} + \text{precipitation} - \text{outflow} - \text{evaporation}$$
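
A deterministic node still fits the CPT framework, because every row puts all of its probability on the value $f$ selects. Below is a minimal Python sketch for the NorthAmerican example, assuming Boolean parents. `deterministic_cpt` is a hypothetical helper, and its table stores P(NorthAmerican = true | row).

```python
from itertools import product


def deterministic_cpt(n_parents, f):
    """P(X = true | parent values) for a deterministic Boolean node X = f(parents).

    Each entry is 1.0 or 0.0, so the table needs no numbers beyond f itself.
    """
    return {vals: (1.0 if f(*vals) else 0.0)
            for vals in product([True, False], repeat=n_parents)}


# NorthAmerican <=> Canadian or US or Mexican; rows keyed by (canadian, us, mexican).
north_american = deterministic_cpt(
    3, lambda canadian, us, mexican: canadian or us or mexican)
print(north_american[(False, True, False)])   # 1.0
print(north_american[(False, False, False)])  # 0.0
```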

Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes
1) Parents $U_1, \ldots, U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
$$\Rightarrow P(X \mid U_1, \ldots, U_j, \lnot U_{j+1}, \ldots, \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$$

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
F     F    F       | 0.0      | 1.0
F     F    T       | 0.9      | 0.1
F     T    F       | 0.8      | 0.2
F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
T     F    F       | 0.4      | 0.6
T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
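
The whole table follows from three numbers. This minimal Python sketch assumes no leak node and reads the failure probabilities off the single-cause rows ($q_{\mathit{Cold}} = 0.6$, $q_{\mathit{Flu}} = 0.2$, $q_{\mathit{Malaria}} = 0.1$). `noisy_or` and `Q` are illustrative names.

```python
from itertools import product

# Failure probabilities: P(not fever | only that cause present).
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(present, q):
    """P(X = true | the causes in `present` are true, all others false).

    Each present cause independently fails to trigger X with probability
    q[cause], so X is false only if all of them fail. With no cause present
    the result is 0, since this sketch has no leak node.
    """
    fail = 1.0
    for cause in present:
        fail *= q[cause]
    return 1.0 - fail


causes = list(Q)
for row in product([False, True], repeat=len(causes)):   # FFF, FFT, ..., TTT
    present = [c for c, on in zip(causes, row) if on]
    label = "".join("T" if on else "F" for on in row)
    print(label, round(noisy_or(present, Q), 3))
```

The loop prints the rows in the same order as the table. Only $k = 3$ numbers are specified, instead of the $2^3 = 8$ a full CPT would need.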

Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
[Figure: Subsidy? and Harvest are parents of Cost; Cost is the parent of Buys?]
Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

Continuous child variables
Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents
Most common is the linear Gaussian model, e.g.:
$$\begin{aligned}
P(\mathit{Cost} = c \mid \mathit{Harvest} = h, \mathit{Subsidy?} = \mathit{true})
&= N(a_t h + b_t, \sigma_t)(c)\\
&= \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_t h + b_t)}{\sigma_t}\right)^2\right)
\end{aligned}$$
Mean Cost varies linearly with Harvest, variance is fixed
Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow
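
The density formula can be written out directly. This is a minimal Python sketch in which `A_T`, `B_T` and `SIGMA_T` are hypothetical parameter values for the Subsidy? = true case, not taken from the slides. Moving Harvest shifts the peak location, while the peak height $1/(\sigma_t\sqrt{2\pi})$ stays the same.

```python
import math


def linear_gaussian_density(c, h, a_t, b_t, sigma_t):
    """Density of Cost = c given Harvest = h under N(a_t * h + b_t, sigma_t).

    The mean is linear in h; the standard deviation sigma_t does not depend on h.
    """
    z = (c - (a_t * h + b_t)) / sigma_t
    return math.exp(-0.5 * z * z) / (sigma_t * math.sqrt(2 * math.pi))


# Hypothetical parameters for the Subsidy? = true case (not from the slides).
A_T, B_T, SIGMA_T = -0.5, 10.0, 1.0
for h in (2.0, 6.0):
    mean = A_T * h + B_T
    # Peak height is 1 / (sigma_t * sqrt(2 * pi)) for every h.
    print(h, mean, round(linear_gaussian_density(mean, h, A_T, B_T, SIGMA_T), 4))
```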

Continuous child variables
[Figure: 3-D plot of P(Cost | Harvest, Subsidy?=true) over Cost and Harvest]
All-continuous network with LG distributions ⇒ full joint distribution is a multivariate Gaussian
Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
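
A two-variable case shows why linear Gaussian distributions yield a multivariate Gaussian joint. For illustration, assume a root node $H$ (say, Harvest, ignoring Subsidy?) with $H \sim N(\mu_H, \sigma_H)$, and a child $C$ with linear Gaussian distribution $N(aH + b, \sigma)$. The second argument of $N$ is a standard deviation, as above. Writing $C$ as an affine function of $H$ plus independent Gaussian noise gives:

```latex
\begin{aligned}
C &= aH + b + \varepsilon, \qquad \varepsilon \sim N(0, \sigma) \text{ independent of } H\\
\mathbb{E}[C] &= a\mu_H + b, \qquad
\operatorname{Var}(C) = a^2\sigma_H^2 + \sigma^2, \qquad
\operatorname{Cov}(H, C) = a\sigma_H^2\\
\begin{pmatrix} H \\ C \end{pmatrix}
&\sim \text{multivariate Gaussian with mean }
\begin{pmatrix} \mu_H \\ a\mu_H + b \end{pmatrix}
\text{ and covariance }
\begin{pmatrix} \sigma_H^2 & a\sigma_H^2 \\ a\sigma_H^2 & a^2\sigma_H^2 + \sigma^2 \end{pmatrix}
\end{aligned}
```

The pair is jointly Gaussian because $(H, \varepsilon)$ are independent Gaussians and $(H, C)$ is an affine map of them. Each further linear Gaussian node repeats the same step.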

Discrete variable w/ continuous parents
Probability of Buys? given Cost should be a “soft” threshold:
[Figure: P(Buys?=false | Cost=c) plotted against Cost c]
Probit distribution uses integral of Gaussian:
$$\Phi(x) = \int_{-\infty}^{x} N(0,1)(t)\,dt$$
$$P(\mathit{Buys?} = \mathit{true} \mid \mathit{Cost} = c) = \Phi((-c + \mu)/\sigma)$$
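
$\Phi$ has no closed form but is available through the error function, since $\Phi(x) = \tfrac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$. This is a minimal Python sketch in which `MU` and `SIGMA` are hypothetical values for the threshold location and its softness.

```python
import math


def std_normal_cdf(x):
    """Phi(x) = integral of N(0,1) from -inf to x, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)."""
    return std_normal_cdf((-c + mu) / sigma)


MU, SIGMA = 6.0, 1.0  # hypothetical threshold location and softness
for c in (2.0, 5.0, 6.0, 7.0, 10.0):
    # Near 1 well below mu, exactly 0.5 at c = mu, near 0 well above mu.
    print(c, round(probit_buys(c, MU, SIGMA), 4))
```

Smaller `SIGMA` makes the threshold harder. Moving `MU` slides it along the Cost axis.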

Why the probit?
1. It’s sort of the right shape
2. Can view as hard threshold whose location is subject to noise
[Figure: nodes Cost, Noise, Buys?]
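
Point 2 can be checked by simulation. Suppose the customer buys exactly when $c < \mu + \text{noise}$ with noise $\sim N(0, \sigma)$. Then the buying frequency should approach $\Phi((-c + \mu)/\sigma)$. This is a minimal Python sketch; `MU`, `SIGMA`, the sample size and the seed are hypothetical choices.

```python
import math
import random


def std_normal_cdf(x):
    """Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_buys(c, mu, sigma, n=100_000, seed=0):
    """Fraction of trials in which cost c falls below a noisy threshold.

    The threshold is mu + noise with noise ~ N(0, sigma); the customer buys
    exactly when c < threshold (a hard threshold at a noisy location).
    """
    rng = random.Random(seed)
    buys = sum(c < mu + rng.gauss(0.0, sigma) for _ in range(n))
    return buys / n


MU, SIGMA = 6.0, 1.0  # hypothetical threshold location and noise level
for c in (5.0, 6.0, 7.5):
    simulated = simulate_buys(c, MU, SIGMA)
    exact = std_normal_cdf((-c + MU) / SIGMA)
    print(c, round(simulated, 3), round(exact, 3))
```

The agreement holds because $P(c < \mu + \text{noise}) = P(\text{noise}/\sigma > (c - \mu)/\sigma) = \Phi((-c + \mu)/\sigma)$.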

Discrete variable contd.
Sigmoid (or logit) distribution also used in neural networks:
$$P(\mathit{Buys?} = \mathit{true} \mid \mathit{Cost} = c) = \frac{1}{1 + \exp\left(-2\,\frac{-c + \mu}{\sigma}\right)}$$
Sigmoid has similar shape to probit but much longer tails:
[Figure: P(Buys?=false | Cost=c) plotted against Cost c]
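
The tail difference is easy to see numerically. This Python sketch evaluates both models with the same hypothetical `MU` and `SIGMA`. It uses `math.erfc` for the probit so that very small tail values keep their precision.

```python
import math


def probit(c, mu, sigma):
    """Phi((-c + mu) / sigma); erfc keeps precision far out in the tail."""
    return 0.5 * math.erfc((c - mu) / (sigma * math.sqrt(2.0)))


def sigmoid(c, mu, sigma):
    """1 / (1 + exp(-2 * (-c + mu) / sigma)), the logit model from the slide."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))


MU, SIGMA = 6.0, 1.0  # hypothetical parameters
for c in (6.0, 8.0, 10.0, 12.0):
    # P(Buys? = true) under each model as the cost rises above mu.
    print(c, f"probit={probit(c, MU, SIGMA):.2e}", f"sigmoid={sigmoid(c, MU, SIGMA):.2e}")
```

With these parameters both models give 0.5 at $c = \mu$. Close to the threshold the sigmoid is actually slightly lower (at $c = \mu + 2\sigma$), and the curves cross between $2\sigma$ and $3\sigma$ above $\mu$. By $c = 12$ the sigmoid value is several thousand times the probit value.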

Summary
Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy for (non)experts to construct
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian)