# Artificial Intelligence Methods


7 Nov 2013


Bayesian networks

In which we explain how to build network models to reason under uncertainty according to the laws of probability theory.

Dr. Igor Trajkovski

## Outline

- Syntax
- Semantics
- Parameterized distributions

## Bayesian networks

A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.

Syntax:

- a set of nodes, one per variable
- a directed, acyclic graph (link ≈ “directly influences”)
- a conditional distribution for each node given its parents: $P(X_i \mid \mathrm{Parents}(X_i))$

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

## Example

The topology of the network encodes conditional independence assertions:

[Figure: network over the nodes Weather, Cavity, Toothache, and Catch]

- Weather is independent of the other variables
- Toothache and Catch are conditionally independent given Cavity

## Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects “causal” knowledge:

- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

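## Example contd.

[Figure: burglary network. Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

| $P(B)$ | $P(E)$ |
|---|---|
| .001 | .002 |

| B | E | $P(A \mid B, E)$ |
|---|---|---|
| T | T | .95 |
| T | F | .94 |
| F | T | .29 |
| F | F | .001 |

| A | $P(J \mid A)$ | $P(M \mid A)$ |
|---|---|---|
| T | .90 | .70 |
| F | .05 | .01 |
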
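The conditional probability tables of the burglary network can be written down directly as data. The following is a minimal sketch, assuming a plain-dictionary encoding; the names `PARENTS`, `CPT`, and `prob` are illustrative choices and not part of the lecture.

```python
# Parents of each Boolean variable in the burglary network.
PARENTS = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}

# CPT[var][parent_values] = P(var = True | parents = parent_values).
# Only the "true" entry is stored; the "false" entry is 1 minus it.
CPT = {
    "Burglary": {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {
        (True, True): 0.95,
        (True, False): 0.94,
        (False, True): 0.29,
        (False, False): 0.001,
    },
    "JohnCalls": {(True,): 0.90, (False,): 0.05},
    "MaryCalls": {(True,): 0.70, (False,): 0.01},
}


def prob(var, value, assignment):
    """Return P(var = value | parent values taken from `assignment`)."""
    parent_values = tuple(assignment[p] for p in PARENTS[var])
    p_true = CPT[var][parent_values]
    return p_true if value else 1.0 - p_true


# Example lookup: P(Alarm = true | Burglary = false, Earthquake = true)
print(prob("Alarm", True, {"Burglary": False, "Earthquake": True}))
```

Storing one number per row mirrors the table layout: each row fixes the parent values, and the complementary probability is derived rather than stored.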
## Compactness

A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values.

Each row requires one number $p$ for $X_i = \mathit{true}$ (the number for $X_i = \mathit{false}$ is just $1 - p$).

If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers.

I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.

For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).

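The counting argument can be checked mechanically. This is a small sketch for Boolean variables only; the function names are illustrative, and the parent counts are read off the burglary network (Burglary and Earthquake have no parents, Alarm has two, JohnCalls and MaryCalls have one each).

```python
def cpt_numbers(num_parents):
    """Independent numbers in the CPT of a Boolean node with Boolean parents."""
    return 2 ** num_parents


def network_numbers(parent_counts):
    """Total numbers needed by a network, given each node's parent count."""
    return sum(cpt_numbers(k) for k in parent_counts)


def full_joint_numbers(num_variables):
    """Independent numbers in the full joint over Boolean variables."""
    return 2 ** num_variables - 1


# Parent counts for B, E, A, J, M in the burglary network.
burglary_parent_counts = [0, 0, 2, 1, 1]
print(network_numbers(burglary_parent_counts))  # 10
print(full_joint_numbers(5))                    # 31
```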
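## Global semantics

“Global” semantics defines the full joint distribution as the product of the local conditional distributions:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$

e.g.,

$$
\begin{aligned}
P(j \land m \land a \land \lnot b \land \lnot e) &= P(j \mid a)\,P(m \mid a)\,P(a \mid \lnot b, \lnot e)\,P(\lnot b)\,P(\lnot e)\\
&= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998\\
&\approx 0.00063
\end{aligned}
$$
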
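The product formula translates almost line for line into code. The sketch below assumes the burglary CPTs stored as dictionaries keyed by parent values; the short variable names and the helper `joint_probability` are illustrative, not the lecture's notation.

```python
from itertools import product

# Parents and CPTs of the burglary network; CPT entries are P(var = True | parents).
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}


def joint_probability(assignment):
    """P(assignment) as the product of the local conditional probabilities."""
    result = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[p] for p in parents)]
        result *= p_true if assignment[var] else 1.0 - p_true
    return result


# P(j, m, a, not b, not e)
event = {"B": False, "E": False, "A": True, "J": True, "M": True}
print(round(joint_probability(event), 6))  # 0.000628

# Sanity check: the 32 joint entries sum to 1.
total = sum(joint_probability(dict(zip(PARENTS, values)))
            for values in product([True, False], repeat=len(PARENTS)))
print(abs(total - 1.0) < 1e-12)  # True
```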
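## Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents.

[Figure: node $X$ with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$]

Theorem: Local semantics ⇔ global semantics
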
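## Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.

[Figure: node $X$ with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$]
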
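Given only the parent lists of a network, the Markov blanket can be read off structurally. A minimal sketch, assuming the graph is stored as a dictionary from each node to its parents (the function name `markov_blanket` is an illustrative choice), applied to the burglary network:

```python
def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    children = [v for v, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket


BURGLARY_PARENTS = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}

print(sorted(markov_blanket("Burglary", BURGLARY_PARENTS)))
# ['Alarm', 'Earthquake']
```

Earthquake appears in the blanket of Burglary because the two share the child Alarm, which is the "children's parents" part of the definition.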
## Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.

1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   - add $X_i$ to the network
   - select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid \mathrm{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$

This choice of parents guarantees the global semantics:

$$
\begin{aligned}
P(X_1, \ldots, X_n) &= \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) && \text{(chain rule)}\\
&= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i)) && \text{(by construction)}
\end{aligned}
$$

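As a worked instance of the two steps, assume the causal ordering $B, E, A, J, M$ for the burglary network introduced earlier. The chain rule gives the first line, and the parent choices of that network (no parents for $B$ and $E$; $B, E$ for $A$; $A$ for $J$ and for $M$) give the second:

$$
\begin{aligned}
P(B, E, A, J, M) &= P(B)\,P(E \mid B)\,P(A \mid B, E)\,P(J \mid B, E, A)\,P(M \mid B, E, A, J) && \text{(chain rule)}\\
&= P(B)\,P(E)\,P(A \mid B, E)\,P(J \mid A)\,P(M \mid A) && \text{(by construction)}
\end{aligned}
$$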
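## Example

Suppose we choose the ordering $M, J, A, B, E$.

[Figure: network built in this order over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]

- $P(J \mid M) = P(J)$? No
- $P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
- $P(B \mid A, J, M) = P(B \mid A)$? Yes
- $P(B \mid A, J, M) = P(B)$? No
- $P(E \mid B, A, J, M) = P(E \mid A)$? No
- $P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes
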
## Example contd.

[Figure: resulting network over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]

Deciding conditional independence is hard in noncausal directions. (Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions.

The network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed.

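The yes/no questions in this example can be answered numerically when the full joint distribution is available. The sketch below is one possible brute-force version of the construction procedure, not the lecture's method: it generates the joint from the causal burglary CPTs, then for each variable in the ordering $M, J, A, B, E$ searches for a smallest set of predecessors that gives the same conditional distribution as all predecessors. The helper names and the tolerance `tol` are assumptions made for illustration.

```python
from itertools import combinations, product

# Causal burglary network, used here only to generate the full joint.
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
VARS = list(PARENTS)


def joint(assignment):
    """Full-joint probability of a complete assignment (dict var -> bool)."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def probability(event):
    """P(event) for a partial assignment, by summing the full joint."""
    free = [v for v in VARS if v not in event]
    return sum(joint({**event, **dict(zip(free, values))})
               for values in product([True, False], repeat=len(free)))


def p_true_given(var, evidence):
    """P(var = True | evidence)."""
    return probability({**evidence, var: True}) / probability(evidence)


def screens_off(var, subset, contexts, tol):
    """True if P(var | subset) matches P(var | all predecessors) everywhere."""
    return all(
        abs(p_true_given(var, ctx)
            - p_true_given(var, {q: ctx[q] for q in subset})) < tol
        for ctx in contexts
    )


def select_parents(ordering, tol=1e-9):
    """Pick, for each variable, a smallest sufficient set of predecessors."""
    chosen = {}
    for i, var in enumerate(ordering):
        preds = ordering[:i]
        contexts = [dict(zip(preds, vals))
                    for vals in product([True, False], repeat=len(preds))]
        # Try candidate parent sets from smallest to largest.
        candidates = (s for size in range(len(preds) + 1)
                      for s in combinations(preds, size))
        chosen[var] = next(s for s in candidates
                           if screens_off(var, s, contexts, tol))
    return chosen


parents = select_parents(["M", "J", "A", "B", "E"])
print(parents)
print(sum(2 ** len(ps) for ps in parents.values()))  # 13
```

The search is exponential in the number of variables and only works here because the joint has 32 entries; it is meant to confirm the answers above (parents M for J; M and J for A; A for B; A and B for E), not to suggest a practical structure-learning method.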
## Example: Car diagnosis

Initial evidence: car won't start

Testable variables (green), “broken, so fix it” variables (orange)

Hidden variables (gray) ensure sparse structure, reduce parameters

[Figure: car diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery, no charging, battery flat, battery meter, lights, oil light, gas gauge, no oil, no gas, fuel line blocked, starter broken, dipstick, car won't start]

## Example: Car insurance

[Figure: car insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]

## Compact conditional distributions

CPT grows exponentially with number of parents.

CPT becomes infinite with continuous-valued parent or child.

Solution: canonical distributions that are defined compactly.

Deterministic nodes are the simplest case: $X = f(\mathrm{Parents}(X))$ for some function $f$.

- E.g., Boolean functions
- E.g., numerical relationships among continuous variables:

$$\frac{\partial \mathit{Level}}{\partial t} = \text{inflow} + \text{precipitation} - \text{outflow} - \text{evaporation}$$

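## Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:

1. Parents $U_1 \ldots U_k$ include all causes (can add a leak node)
2. Independent failure probability $q_i$ for each cause alone

$$\Rightarrow\; P(X \mid U_1 \ldots U_j, \lnot U_{j+1} \ldots \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$$

| Cold | Flu | Malaria | $P(\mathit{Fever})$ | $P(\lnot \mathit{Fever})$ |
|---|---|---|---|---|
| F | F | F | 0.0 | 1.0 |
| F | F | T | 0.9 | 0.1 |
| F | T | F | 0.8 | 0.2 |
| F | T | T | 0.98 | 0.02 = 0.2 × 0.1 |
| T | F | F | 0.4 | 0.6 |
| T | F | T | 0.94 | 0.06 = 0.6 × 0.1 |
| T | T | F | 0.88 | 0.12 = 0.6 × 0.2 |
| T | T | T | 0.988 | 0.012 = 0.6 × 0.2 × 0.1 |

Number of parameters is linear in the number of parents.
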
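The whole fever table follows from the three single-cause failure probabilities (0.6 for Cold, 0.2 for Flu, 0.1 for Malaria). A minimal sketch that regenerates it, with `Q` and `noisy_or` as illustrative names and no leak node:

```python
from itertools import product

# Failure probabilities q_i: chance that a cause, acting alone, fails to
# produce the effect. Values are taken from the single-cause table rows.
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(active_causes, q=Q):
    """P(effect = True) when exactly the causes in `active_causes` are present."""
    p_all_fail = 1.0
    for cause in active_causes:
        p_all_fail *= q[cause]
    return 1.0 - p_all_fail


# Regenerate the table: one row per combination of parent values.
for values in product([False, True], repeat=len(Q)):
    active = [cause for cause, present in zip(Q, values) if present]
    p = noisy_or(active)
    row = " ".join("T" if v else "F" for v in values)
    print(row, round(p, 3), round(1.0 - p, 3))
```

Three numbers determine all eight rows, which is the sense in which the parameter count is linear in the number of parents.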
## Hybrid (discrete + continuous) networks

Discrete (Subsidy? and […]); continuous (Harvest and Cost)

[Figure: network with nodes Subsidy?, Harvest, Cost]

Option 1: discretization (possibly large errors, large CPTs)

Option 2: finitely parameterized canonical families

1. Continuous variable, discrete + continuous parents (e.g., Cost)
2. Discrete variable, continuous parents (e.g., […])

## Continuous child variables

Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents.

Most common is the linear Gaussian model, e.g.:

$$
\begin{aligned}
P(\mathit{Cost} = c \mid \mathit{Harvest} = h, \mathit{Subsidy?} = \mathit{true}) &= N(a_t h + b_t, \sigma_t)(c)\\
&= \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_t h + b_t)}{\sigma_t}\right)^{2}\right)
\end{aligned}
$$

Mean Cost varies linearly with Harvest, variance is fixed.

Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow.

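The linear Gaussian density is straightforward to evaluate. In the sketch below the function name and the parameter values `A_T`, `B_T`, `SIGMA_T` are hypothetical, chosen only so the example runs; the lecture gives no numbers for $a_t$, $b_t$, or $\sigma_t$.

```python
import math


def linear_gaussian_density(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h under N(a*h + b, sigma)."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters for the Subsidy? = true case (not from the lecture).
A_T, B_T, SIGMA_T = -1.0, 10.0, 1.5

# The mean of Cost shifts linearly with Harvest; the spread stays the same.
for harvest in (3.0, 5.0, 7.0):
    mean_cost = A_T * harvest + B_T
    print(harvest, mean_cost,
          linear_gaussian_density(mean_cost, harvest, A_T, B_T, SIGMA_T))
```

With a discrete parent such as Subsidy?, one such parameter triple would be kept per value of that parent.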
## Continuous child variables

[Figure: surface plot of $P(\mathit{Cost} \mid \mathit{Harvest}, \mathit{Subsidy?} = \mathit{true})$ over Cost (0 to 10) and Harvest (0 to 10)]

All-continuous network with LG distributions ⇒ full joint distribution is a multivariate Gaussian.

Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

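For two variables the multivariate Gaussian claim can be checked by hand. Assume, for illustration only, a Gaussian prior $\mathit{Harvest} \sim N(\mu_h, \sigma_h)$ (the symbols $\mu_h$ and $\sigma_h$ are not in the lecture) together with the linear Gaussian model $\mathit{Cost} \mid \mathit{Harvest} = h \sim N(a_t h + b_t, \sigma_t)$, where the second argument of $N$ is a standard deviation as in the density formula. Writing $\mathit{Cost} = a_t\,\mathit{Harvest} + b_t + \varepsilon$ with independent noise $\varepsilon \sim N(0, \sigma_t)$, the pair is jointly Gaussian with

$$
\begin{pmatrix} \mathit{Harvest} \\ \mathit{Cost} \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_h \\ a_t \mu_h + b_t \end{pmatrix},\;
\begin{pmatrix} \sigma_h^2 & a_t \sigma_h^2 \\ a_t \sigma_h^2 & a_t^2 \sigma_h^2 + \sigma_t^2 \end{pmatrix}
\right)
$$

where the second argument of $\mathcal{N}$ is the covariance matrix.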
## Discrete variable w/ continuous parents

Probability of […] given Cost should be a “soft” threshold:

[Figure: probability (0 to 1) plotted against Cost $c$ (0 to 12)]

Probit distribution uses the integral of the Gaussian:

$$\Phi(x) = \int_{-\infty}^{x} N(0, 1)(t)\,dt$$

## Why the probit?

1. It's sort of the right shape
2. Can view as hard threshold whose location is subject to noise

[Figure: diagram with nodes Cost and Noise]

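The second reason can be illustrated with a short simulation. The sketch assumes a threshold at $\mu$ perturbed by Gaussian noise with standard deviation $\sigma$, and assumes the discrete child is true when the cost falls below the perturbed threshold (matching the argument $(-c + \mu)/\sigma$ used in the sigmoid formula). The function names and the values $\mu = 6$, $\sigma = 1$ are hypothetical.

```python
import math
import random


def probit(c, mu, sigma):
    """Phi((mu - c) / sigma): the standard normal CDF at (mu - c) / sigma."""
    return 0.5 * (1.0 + math.erf((mu - c) / (sigma * math.sqrt(2.0))))


def noisy_threshold_rate(c, mu, sigma, samples=50_000, seed=0):
    """Fraction of trials in which cost c falls below a threshold at mu
    perturbed by Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    hits = sum(c < mu + rng.gauss(0.0, sigma) for _ in range(samples))
    return hits / samples


MU, SIGMA = 6.0, 1.0  # hypothetical threshold location and noise level
for cost in (4.0, 6.0, 7.0):
    print(cost, probit(cost, MU, SIGMA), noisy_threshold_rate(cost, MU, SIGMA))
```

The simulated frequencies track the probit values up to sampling noise, since $P(c < \mu + \varepsilon) = \Phi((\mu - c)/\sigma)$ for $\varepsilon \sim N(0, \sigma)$.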
## Discrete variable contd.

Sigmoid (or logit) distribution, also used in neural networks:

$$\frac{1}{1 + \exp\left(-2\,\dfrac{-c + \mu}{\sigma}\right)}$$

Sigmoid has similar shape to probit but much longer tails:

[Figure: plot with vertical axis from 0 to 1 and horizontal axis from 0 to 12]

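The tail behaviour can be compared numerically. This sketch evaluates the sigmoid formula and a probit with the same argument $(-c + \mu)/\sigma$; the function names and the values $\mu = 6$, $\sigma = 1$ are hypothetical, chosen only for illustration.

```python
import math


def probit(c, mu, sigma):
    """Standard normal CDF evaluated at (mu - c) / sigma."""
    return 0.5 * (1.0 + math.erf((mu - c) / (sigma * math.sqrt(2.0))))


def sigmoid(c, mu, sigma):
    """1 / (1 + exp(-2 (mu - c) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - c) / sigma))


MU, SIGMA = 6.0, 1.0  # hypothetical parameters
for cost in (6.0, 8.0, 10.0, 12.0):
    print(cost, probit(cost, MU, SIGMA), sigmoid(cost, MU, SIGMA))
```

Both curves pass through 0.5 at $c = \mu$. Far from the threshold (here at $c = 10$ and $c = 12$) the sigmoid value is larger than the probit value, because the sigmoid decays exponentially while the Gaussian integral decays faster.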