Artificial Intelligence Methods

Artificial Intelligence and Robotics

7 Nov 2013
Bayesian networks

In which we explain how to build network models to reason under uncertainty according to the laws of probability theory.

Dr. Igor Trajkovski
Outline

♦ Syntax
♦ Semantics
♦ Parameterized distributions
Bayesian networks

A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents: P(X_i | Parents(X_i))

In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values
Example

Topology of network encodes conditional independence assertions. Nodes: Weather, Cavity, Toothache, Catch; edges: Cavity → Toothache and Cavity → Catch; Weather is unconnected.

Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.
Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call
Example contd.

Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls

P(B) = .001        P(E) = .002

B  E | P(A|B,E)
T  T |   .95
T  F |   .94
F  T |   .29
F  F |   .001

A | P(J|A)        A | P(M|A)
T |  .90          T |  .70
F |  .05          F |  .01
Compactness

A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values.

Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p).

If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers.

I.e., grows linearly with n, vs. O(2^n) for the full joint distribution.

For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
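The parameter count depends only on the graph, not on the CPT entries; a minimal Python sketch that recomputes the burglary-net count:

```python
# Parameter count for a Bayesian net over Boolean variables:
# a node with k parents needs 2**k numbers (one per parent combination).

def num_parameters(parents):
    """parents: dict mapping each variable to its list of parents."""
    return sum(2 ** len(ps) for ps in parents.values())

burglary_net = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
print(num_parameters(burglary_net))   # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** len(burglary_net) - 1)     # full joint table: 2^5 - 1 = 31
```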
Global semantics

"Global" semantics defines the full joint distribution as the product of the local conditional distributions:

P(x_1, ..., x_n) = Π_{i=1}^{n} P(x_i | parents(X_i))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063
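The product above is easy to check numerically; a minimal Python sketch using the CPT numbers from the burglary network:

```python
# Joint probability of one full assignment, computed as the product of
# the local CPT entries (numbers from the burglary-network slide).

def p_alarm(b, e):
    """P(Alarm=true | Burglary=b, Earthquake=e)."""
    return {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}[(b, e)]

P_B, P_E = 0.001, 0.002            # P(Burglary=true), P(Earthquake=true)
P_J_A = {True: 0.90, False: 0.05}  # P(JohnCalls=true | Alarm=a)
P_M_A = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm=a)

# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = P_J_A[True] * P_M_A[True] * p_alarm(False, False) * (1 - P_B) * (1 - P_E)
print(round(p, 8))  # 0.00062811
```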
Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

(Figure: node X with parents U_1, ..., U_m, children Y_1, ..., Y_n, and the children's other parents Z_1j, ..., Z_nj.)

Theorem: Local semantics ⇔ global semantics
Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

(Figure: node X with its Markov blanket shaded: parents U_1, ..., U_m, children Y_1, ..., Y_n, and the children's other parents Z_1j, ..., Z_nj.)
Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n
     add X_i to the network
     select parents from X_1, ..., X_{i−1} such that
     P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i−1})

This choice of parents guarantees the global semantics:

P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})   (chain rule)
                 = Π_{i=1}^{n} P(X_i | Parents(X_i))        (by construction)
Example

Suppose we choose the ordering M, J, A, B, E, adding nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake in turn:

P(J|M) = P(J)?  No
P(A|J,M) = P(A|J)?  No.   P(A|J,M) = P(A)?  No
P(B|A,J,M) = P(B|A)?  Yes.   P(B|A,J,M) = P(B)?  No
P(E|B,A,J,M) = P(E|A)?  No.   P(E|B,A,J,M) = P(E|A,B)?  Yes
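These Yes/No answers can be verified by brute-force enumeration of the full joint of the original burglary network (CPT numbers from the earlier slide); a minimal Python sketch, not part of the slides:

```python
from itertools import product

# P(Alarm=true | Burglary, Earthquake), from the burglary-network slide.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def p(b, e, a, j, m):
    """Joint probability of one full assignment (product of local CPTs)."""
    pb = 0.001 if b else 0.999
    pe = 0.002 if e else 0.998
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = (0.90 if j else 0.10) if a else (0.05 if j else 0.95)
    pm = (0.70 if m else 0.30) if a else (0.01 if m else 0.99)
    return pb * pe * pa * pj * pm

def prob(query, given=lambda **kw: True):
    """P(query | given), by summing the joint over all 32 assignments."""
    top = bot = 0.0
    for b, e, a, j, m in product([True, False], repeat=5):
        kw = dict(b=b, e=e, a=a, j=j, m=m)
        if given(**kw):
            bot += p(**kw)
            if query(**kw):
                top += p(**kw)
    return top / bot

# P(B | A, J, M) = P(B | A)?  Yes: given Alarm, the calls tell us nothing more.
lhs = prob(lambda **kw: kw["b"], lambda **kw: kw["a"] and kw["j"] and kw["m"])
rhs = prob(lambda **kw: kw["b"], lambda **kw: kw["a"])
print(abs(lhs - rhs) < 1e-12)   # True

# P(J | M) = P(J)?  No: Mary's call is evidence about the alarm.
print(abs(prob(lambda **kw: kw["j"], lambda **kw: kw["m"])
          - prob(lambda **kw: kw["j"])) < 1e-12)  # False
```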
Example contd.

(Resulting network for the ordering M, J, A, B, E: MaryCalls → JohnCalls; MaryCalls, JohnCalls → Alarm; Alarm → Burglary; Alarm, Burglary → Earthquake.)

Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions.

Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
Example: Car diagnosis

Initial evidence: car won't start
Testable variables (green), "broken, so fix it" variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters

(Figure: a diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, battery meter, lights, oil light, dipstick, no oil, no gas, gas gauge, fuel line blocked, starter broken, and car won't start.)
Example: Car insurance

(Figure: an insurance network with nodes Age, SocioEcon, GoodStudent, ExtraCar, RiskAversion, SeniorTrain, Mileage, VehicleYear, MakeModel, DrivingSkill, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, Accident, OwnDamage, Cushioning, Ruggedness, OwnCost, OtherCost, PropertyCost, LiabilityCost, MedicalCost.)
Compact conditional distributions

CPT grows exponentially with number of parents.
CPT becomes infinite with continuous-valued parent or child.

Solution: canonical distributions that are defined compactly

Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f

E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican

E.g., numerical relationships among continuous variables:

∂Level/∂t = inflow + precipitation − outflow − evaporation
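A deterministic node needs no CPT numbers at all, since the child's value is a fixed function of its parents; a minimal sketch of the Boolean-function example above:

```python
# Deterministic node: X = f(Parents(X)); here f is Boolean OR, so the
# "CPT" is implicit and contributes zero parameters to the network.

def north_american(canadian, us, mexican):
    """NorthAmerican <=> Canadian v US v Mexican."""
    return canadian or us or mexican

print(north_american(False, True, False))   # True
print(north_american(False, False, False))  # False
```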
Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:
1) Parents U_1, ..., U_k include all causes (can add leak node)
2) Independent failure probability q_i for each cause alone

⇒ P(X | U_1 ... U_j, ¬U_{j+1} ... ¬U_k) = 1 − Π_{i=1}^{j} q_i

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
 F     F     F     |  0.0     |  1.0
 F     F     T     |  0.9     |  0.1
 F     T     F     |  0.8     |  0.2
 F     T     T     |  0.98    |  0.02 = 0.2 × 0.1
 T     F     F     |  0.4     |  0.6
 T     F     T     |  0.94    |  0.06 = 0.6 × 0.1
 T     T     F     |  0.88    |  0.12 = 0.6 × 0.2
 T     T     T     |  0.988   |  0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
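The table above is exactly what the noisy-OR formula generates; a short Python sketch that recomputes it from the three failure probabilities q_Cold = 0.6, q_Flu = 0.2, q_Malaria = 0.1:

```python
from itertools import product

# Noisy-OR: P(¬X | active causes S) = Π_{i in S} q_i, so only one
# failure probability per parent is needed (linear, not exponential).
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(active):
    """P(Fever=true | causes in `active` true, all other causes false)."""
    p_no_fever = 1.0
    for cause in active:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

# Reproduce the whole CPT row by row, e.g. (T,T,T) -> 1 - 0.6*0.2*0.1 = 0.988
for cold, flu, malaria in product([False, True], repeat=3):
    active = [c for c, on in (("Cold", cold), ("Flu", flu),
                              ("Malaria", malaria)) if on]
    print(cold, flu, malaria, round(p_fever(active), 3))
```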
Hybrid (discrete + continuous) networks

Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

(Network: Subsidy? and Harvest are parents of Cost; Cost is the parent of Buys?)

Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families

1) Continuous variable, discrete + continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)
Continuous child variables

Need one conditional density function for the child variable given continuous parents, for each possible assignment to discrete parents.

Most common is the linear Gaussian model, e.g.:

P(Cost = c | Harvest = h, Subsidy? = true)
    = N(a_t h + b_t, σ_t²)(c)
    = (1 / (σ_t √(2π))) exp( −½ ((c − (a_t h + b_t)) / σ_t)² )

Mean Cost varies linearly with Harvest, variance is fixed.
Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow.
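The density is straightforward to evaluate with the standard library; a minimal sketch with hypothetical values for a_t, b_t, σ_t (the slide does not give concrete numbers):

```python
import math

def linear_gaussian(c, h, a, b, sigma):
    """Density N(a*h + b, sigma^2) evaluated at c: the mean of Cost
    varies linearly with Harvest h, the variance is fixed."""
    z = (c - (a * h + b)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical parameters for P(Cost | Harvest, Subsidy?=true):
a_t, b_t, sigma_t = -0.5, 10.0, 1.0
print(linear_gaussian(7.5, 5.0, a_t, b_t, sigma_t))  # density at the mean
```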
Continuous child variables

(Figure: the density P(Cost | Harvest, Subsidy? = true) plotted as a surface over Cost and Harvest, each ranging from 0 to 10.)

All-continuous network with LG distributions ⇒ full joint distribution is a multivariate Gaussian.

Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.
Discrete variable w/ continuous parents

Probability of Buys? given Cost should be a "soft" threshold:

(Figure: P(Buys? = false | Cost = c) rising smoothly from 0 to 1 as Cost c goes from 0 to 12.)

Probit distribution uses the integral of the Gaussian:

Φ(x) = ∫_{−∞}^{x} N(0,1)(t) dt

P(Buys? = true | Cost = c) = Φ((−c + µ)/σ)
Why the probit?

1. It's sort of the right shape
2. Can view as a hard threshold whose location is subject to noise

(Figure: Cost combined with Noise determines Buys?.)
Discrete variable contd.

Sigmoid (or logit) distribution also used in neural networks:

P(Buys? = true | Cost = c) = 1 / (1 + exp(−2 (−c + µ)/σ))

Sigmoid has a similar shape to the probit but much longer tails:

(Figure: the sigmoid P(Buys? = false | Cost = c) for Cost c from 0 to 12.)
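The two soft thresholds are easy to compare numerically; a sketch using math.erf for Φ, with hypothetical µ and σ (not given on the slides):

```python
import math

def probit(c, mu, sigma):
    """P(Buys?=true | Cost=c) = Phi((mu - c) / sigma), Phi via erf."""
    x = (mu - c) / sigma
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def sigmoid(c, mu, sigma):
    """Logit version: 1 / (1 + exp(-2 * (mu - c) / sigma))."""
    return 1 / (1 + math.exp(-2 * (mu - c) / sigma))

mu, sigma = 6.0, 1.0  # hypothetical threshold location and softness
print(probit(6.0, mu, sigma), sigmoid(6.0, mu, sigma))    # both 0.5 at c = mu
print(probit(12.0, mu, sigma), sigmoid(12.0, mu, sigma))  # sigmoid tail is much fatter
```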
Summary

Bayes nets provide a natural representation for (causally induced) conditional independence.
Topology + CPTs = compact representation of joint distribution.
Generally easy for (non)experts to construct.
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs.
Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian).