Bayesian Belief Network







The decomposition of large probabilistic domains into
weakly connected subsets via conditional
independence is one of the most important
developments in the recent history of AI



This can work well, even if the assumption is not true!

P(toothache, catch, cavity, Weather = cloudy) = P(Weather = cloudy) P(toothache, catch, cavity)

P(a, b) = P(a) P(b)






Naive Bayes assumption:

P(a1, … , an | vj) = Π_i P(ai | vj)

which gives

vNB = argmax_{vj} P(vj) Π_i P(ai | vj)
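A minimal sketch of this decision rule in Python; the class names echo the slides' cavity example, but all probability values here are made-up illustrative numbers, not taken from the slides:

# Naive Bayes decision rule: v_NB = argmax_v P(v) * prod_i P(a_i | v)
# All probabilities below are hypothetical, for illustration only.
priors = {"cavity": 0.2, "no_cavity": 0.8}           # P(v_j)
cond = {                                             # P(a_i | v_j) per observed attribute
    "cavity":    {"toothache": 0.6, "catch": 0.9},
    "no_cavity": {"toothache": 0.1, "catch": 0.2},
}

def naive_bayes(observed):
    # Score each class by P(v) times the product of the attribute likelihoods.
    scores = {}
    for v, p_v in priors.items():
        score = p_v
        for a in observed:
            score *= cond[v][a]
        scores[v] = score
    return max(scores, key=scores.get), scores

print(naive_bayes(["toothache", "catch"]))   # picks the class with the highest score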


Bayesian networks


Conditional Independence


Inference in Bayesian Networks


Irrelevant variables


Constructing Bayesian Networks


Learning Bayesian Networks (Aprendizagem de Redes Bayesianas)

Examples - Exercises


Naive Bayes assumption of conditional
independence too restrictive


But it's intractable without some such
assumptions...



Bayesian belief networks describe conditional independence among subsets of variables

They allow combining prior knowledge about (in)dependencies among variables with observed training data

Bayesian networks


A simple, graphical notation for conditional independence
assertions and hence for compact specification of full joint
distributions



Syntax:


a set of nodes, one per variable




a directed, acyclic graph (link ≈ "directly influences")


a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values

Bayesian Networks


A Bayesian belief network allows a subset of the variables to be conditionally independent

A graphical model of causal relationships


Represents
dependency

among the variables


Gives a specification of joint probability distribution

[Diagram: nodes X, Y, Z and P; X and Y point to Z, Y points to P]


Nodes: random variables


Links: dependency


X,Y are the parents of Z, and Y is the
parent of P


No dependency between Z and P


Has no loops or cycles
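Given the parent relationships just listed, the joint distribution specified by this network factors as:

P(X, Y, Z, P) = P(X) P(Y) P(Z | X, Y) P(P | Y)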

Conditional Independence


Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of a toothache





P(catch | toothache, cavity) = P(catch | cavity)

P(toothache | catch, cavity) = P(toothache | cavity)

Independence between a and b:

P(a | b) = P(a)

P(b | a) = P(b)


Example


Topology of network encodes conditional independence assertions:











Weather is independent of the other variables

Toothache and Catch are conditionally independent given Cavity

Bayesian Belief Network: An Example

[Network diagram with nodes: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea]

CPT for LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC       0.8       0.5        0.7        0.1
~LC      0.2       0.5        0.3        0.9
Bayesian Belief Networks

The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents:

P(z1, … , zn) = Π_{i=1..n} P(zi | Parents(Zi))
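A minimal sketch of how such a CPT can be stored and queried in Python, using the LungCancer table above (the dictionary layout is an illustrative choice, not part of the slides):

# CPT for LungCancer, indexed by the parent combination (FamilyHistory, Smoker).
# Each entry gives P(LC = true | parents); P(LC = false | parents) is 1 - p.
cpt_lung_cancer = {
    (True,  True):  0.8,   # (FH, S)
    (True,  False): 0.5,   # (FH, ~S)
    (False, True):  0.7,   # (~FH, S)
    (False, False): 0.1,   # (~FH, ~S)
}

def p_lc(lc, fh, s):
    # P(LungCancer = lc | FamilyHistory = fh, Smoker = s)
    p_true = cpt_lung_cancer[(fh, s)]
    return p_true if lc else 1.0 - p_true

print(p_lc(True, True, True))    # 0.8
print(p_lc(False, False, True))  # 0.3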
Example


I'm at work, neighbor John calls to say my alarm is ringing, but neighbor
Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a
burglar?



Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls


Network topology reflects "causal" knowledge:



A burglar can set the alarm off


An earthquake can set the alarm off


The alarm can cause Mary to call


The alarm can cause John to call

Belief Networks

Burglary: P(B) = 0.001

Earthquake: P(E) = 0.002

Alarm:
  Burg.  Earth.  P(A)
  t      t       .95
  t      f       .94
  f      t       .29
  f      f       .001

JohnCalls:
  A   P(J)
  t   .90
  f   .05

MaryCalls:
  A   P(M)
  t   .7
  f   .01

Full Joint Distribution


P(x1, … , xn) = Π_{i=1..n} P(xi | parents(Xi))

P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
                       = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062
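A minimal sketch in Python of evaluating this product from the CPTs in the Belief Networks slide above (the function-per-node representation is just one convenient choice, not something prescribed by the slides):

# CPTs of the burglary network: each helper returns P(variable = value | parents).
def p_b(b): return 0.001 if b else 0.999
def p_e(e): return 0.002 if e else 0.998
def p_a(a, b, e):
    p_true = {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001}[(b, e)]
    return p_true if a else 1.0 - p_true
def p_j(j, a):
    p_true = 0.90 if a else 0.05
    return p_true if j else 1.0 - p_true
def p_m(m, a):
    p_true = 0.7 if a else 0.01
    return p_true if m else 1.0 - p_true

def joint(b, e, a, j, m):
    # Joint probability as the product of the node-given-parents factors.
    return p_b(b) * p_e(e) * p_a(a, b, e) * p_j(j, a) * p_m(m, a)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
print(joint(b=False, e=False, a=True, j=True, m=True))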
Compactness


A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values

Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p)

If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers

I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
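A quick check of this count in Python (parent counts read off the burglary network above):

# One independent number P(X = true | parent row) per row, i.e. 2**(number of parents) per node.
parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2, "JohnCalls": 1, "MaryCalls": 1}
network_numbers = sum(2 ** k for k in parents.values())   # 1 + 1 + 4 + 2 + 2 = 10
full_joint_numbers = 2 ** len(parents) - 1                # 2^5 - 1 = 31
print(network_numbers, full_joint_numbers)                # 10 31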

Inference in Bayesian Networks


How can one infer the (probabilities of)
values of one or more network variables,
given observed values of others?


Bayes net contains all information needed
for this inference


If only one variable has an unknown value, it is easy to infer

In the general case, the problem is NP-hard

Example


In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true

We could ask for the probability that a burglary has occurred

P(Burglary | JohnCalls = true, MaryCalls = true)

Remember

Joint distribution:

P(Y | X) = α P(X | Y) P(Y)

Normalization:

⟨P(y | x), P(¬y | x)⟩ = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
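The normalization step above, as a short Python check (0.12 and 0.08 are the unnormalized values shown on the slide):

unnormalized = [0.12, 0.08]                  # proportional to P(y | x), P(¬y | x)
alpha = 1.0 / sum(unnormalized)              # α makes the two values sum to 1
print([alpha * v for v in unnormalized])     # [0.6, 0.4]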
Normalization





X is the query variable

E are the evidence variables

Y are the remaining unobservable variables

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

Summation over all possible y (all possible values of the unobservable variables Y)

P(Burglary | JohnCalls = true, MaryCalls = true)


The hidden variables of the query are Earthquake and Alarm

For Burglary = true in the Bayesian network:

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)

To compute this we had to add four terms, each computed by multiplying five numbers

In the worst case, where we have to sum out almost all variables, the complexity of inference in a network with n Boolean variables is O(n · 2^n)

P(b) is constant and can be moved outside the summations, and the P(e) term can be moved outside the summation over a

With JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%

Computation for Burglary = true
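A minimal sketch of this inference by enumeration in Python, using the same CPTs as in the earlier joint-distribution sketch; it sums out Earthquake and Alarm for both values of Burglary and then normalizes (the helper layout is an illustrative assumption):

# CPTs of the burglary network, each returning P(variable = value | parents).
def p_b(b): return 0.001 if b else 0.999
def p_e(e): return 0.002 if e else 0.998
def p_a(a, b, e):
    pt = {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001}[(b, e)]
    return pt if a else 1.0 - pt
def p_j(j, a): return (0.90 if a else 0.05) if j else (0.10 if a else 0.95)
def p_m(m, a): return (0.7 if a else 0.01) if m else (0.3 if a else 0.99)

def p_burglary_given_j_m():
    # Sum out the hidden variables Earthquake and Alarm for both values of Burglary,
    # then normalize (the alpha step).
    unnormalized = []
    for b in (True, False):
        total = 0.0
        for e in (True, False):
            for a in (True, False):
                total += p_b(b) * p_e(e) * p_a(a, b, e) * p_j(True, a) * p_m(True, a)
        unnormalized.append(total)
    alpha = 1.0 / sum(unnormalized)
    return [alpha * v for v in unnormalized]   # [P(b | j, m), P(¬b | j, m)]

print(p_burglary_given_j_m())   # roughly [0.284, 0.716], i.e. about 28%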


Variable elimination algorithm


Eliminate repeated calculation


Dynamic programming


Irrelevant variables


(X query variable, E evidence variables)

Complexity of exact inference

The burglary network belongs to a family of networks in which there is at most one undirected path between any two nodes in the network

These are called singly connected networks or polytrees

The time and space complexity of exact inference in polytrees is linear in the size of the network

Size is defined by the number of CPT entries

If the number of parents of each node is bounded by a constant, then the complexity will also be linear in the number of nodes

For multiply connected networks, variable elimination can have exponential time and space complexity

Constructing Bayesian Networks


A Bayesian network is a correct
representation of the domain only if each node
is conditionally independent of its
predecessors in the ordering, given its parents



P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)



Conditional Independence relations in Bayesian networks

The topological semantics is given by either of the specifications of DESCENDANTS or MARKOV BLANKET

Local semantics


Example








JohnCalls is independent of Burglary and Earthquake given the value of Alarm


Example








Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake

Constructing Bayesian networks

1. Choose an ordering of variables X1, … , Xn

2. For i = 1 to n

   add Xi to the network

   select parents from X1, … , Xi-1 such that
   P(Xi | Parents(Xi)) = P(Xi | X1, … , Xi-1)

This choice of parents guarantees:

P(X1, … , Xn) = Π_{i=1..n} P(Xi | X1, … , Xi-1)   (chain rule)
             = Π_{i=1..n} P(Xi | Parents(Xi))     (by construction)
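As a worked instance, with the causal ordering B, E, A, J, M of the burglary network, the chain rule and the independences encoded by the network give:

P(B, E, A, J, M) = P(B) P(E | B) P(A | B, E) P(J | B, E, A) P(M | B, E, A, J)   (chain rule)
                = P(B) P(E) P(A | B, E) P(J | A) P(M | A)                       (by construction)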


The compactness of Bayesian networks is an
example of locally structured systems


Each subcomponent interacts directly with only a bounded number of other components

Constructing Bayesian networks is difficult

Each variable should be directly influenced by only a few others

The network topology reflects these direct influences


Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?



Example


Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?  No

P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?  No

P(B | A, J, M) = P(B | A)?

P(B | A, J, M) = P(B)?

Example


Suppose we choose the ordering M, J, A, B, E









P(J | M) = P(J)?  No

P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?  No

P(B | A, J, M) = P(B | A)?  Yes

P(B | A, J, M) = P(B)?  No

P(E | B, A, J, M) = P(E | A)?

P(E | B, A, J, M) = P(E | A, B)?

Example


Suppose we choose the ordering M, J, A, B, E








P(J | M) = P(J)?  No

P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?  No

P(B | A, J, M) = P(B | A)?  Yes

P(B | A, J, M) = P(B)?  No

P(E | B, A, J, M) = P(E | A)?  No

P(E | B, A, J, M) = P(E | A, B)?  Yes

Example

Example contd.










Deciding conditional independence is hard in noncausal directions




(Causal models and conditional independence seem hardwired for humans!)




Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed


Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm




Learning Bayesian Networks (Aprendizagem de Redes Bayesianas)

How to fill in the entries of a Conditional Probability Table

Case 1: The structure of the Bayesian network is known, and all variables can be observed in the training set.

Then:

Entry (i, j) = P(yi | Predecessors(Yi)), estimated from the values observed in the training set

Case 2: The structure of the Bayesian network is known, and some of the variables cannot be observed in the training set.

Then the gradient ascent method is used

Example of Case 1

Person  FH   S    E    LC   PXRay  D
P1      Yes  Yes  No   Yes  +      Yes
P2      Yes  No   No   Yes  -      Yes
P3      Yes  No   Yes  No   +      No
P4      No   Yes  Yes  Yes  -      Yes
P5      No   Yes  No   No   +      No
P6      Yes  Yes  ?    ?    ?      ?

CPT for LungCancer (one entry filled in so far):

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC       0.5
~LC

Entry (i, j) = P(yi | Predecessors(Yi))

P(LC = Yes | FH = Yes, S = Yes) = 0.5

[Network fragment: FamilyHistory, Smoker, LungCancer, Emphysema]
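A minimal sketch of this counting estimate in Python, restricted to the fully observed rows of the table above (the row layout and the entry being estimated are illustrative choices; P6's missing values are simply skipped):

# Fully observed training rows: (FamilyHistory, Smoker, LungCancer)
rows = [
    ("Yes", "Yes", "Yes"),   # P1
    ("Yes", "No",  "Yes"),   # P2
    ("Yes", "No",  "No"),    # P3
    ("No",  "Yes", "Yes"),   # P4
    ("No",  "Yes", "No"),    # P5
]

def estimate(lc, fh, s):
    # Relative-frequency estimate of P(LC = lc | FH = fh, S = s) from observed rows.
    matching_parents = [r for r in rows if r[0] == fh and r[1] == s]
    if not matching_parents:
        return None                      # no data for this parent combination
    matching = [r for r in matching_parents if r[2] == lc]
    return len(matching) / len(matching_parents)

print(estimate("Yes", "No", "Yes"))      # P(LC = Yes | ~FH, S) from rows P4 and P5 -> 0.5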

Example of Case 2








Suppose structure known, variables partially
observable


Similar to training neural network with hidden units


In fact, can learn network conditional probability
tables using gradient ascent



Person  FH   S    E    LC   PXRay  D
P1      ---  Yes  ---  Yes  +      Yes
P2      ---  No   ---  Yes  -      Yes
P3      ---  No   ---  No   +      No
P4      ---  Yes  ---  Yes  -      Yes
P5      ---  Yes  ---  No   +      No
P6      Yes  Yes  ?    ?    ?      ?

Summary


Bayesian networks provide a natural
representation for (causally induced)
conditional independence


Topology + CPTs = compact
representation of joint distribution


Generally easy for domain experts to
construct







