
The University of Texas at Arlington

Lecture-5

Bayesian Networks

CSE 5301 – Data Modeling and Analysis Techniques

Dr. Gergely Záruba

Modeling Complexity and Dependence

• Modeling distributions and inferring probability values in multi-variable systems is computationally complex
  • Model size and probabilistic inference are exponential in the number of random variables
• Independence relations can reduce this complexity
  • Model size is exponential only in the number of mutually dependent variables
  • Conditional independence limits the number of random variables that have to be considered during inference


Graphical Models

• Graphical models provide an efficient structure to represent dependencies in probabilistic systems.
• There are two main types of graphical models for probabilistic systems:
  • Bayesian networks are directed graphical models
  • Markov networks (Markov random fields) are undirected graphical models (they can model some dependencies Bayes nets cannot; they will not be discussed here)
  • Both types of models can represent different types of dependencies
• Graphical models in probabilistic systems allow the representation of the interdependencies of random variables
  • Structure shows dependency relations
  • Inference can use the structure to control the computations
• Graphical models provide a basis for a number of efficient problem solutions
  • Inference of prior and conditional probabilities
  • Learning of network structure


BAYESIAN NETWORKS


Bayesian Networks

• Bayesian networks are a graphical representation of conditional independence, providing a compact specification of joint probability distributions
• Bayesian networks are directed acyclic graphs $G(V, E)$
  • Vertices represent random variables: $V = \{X_i \mid 1 \le i \le n\}$
  • Edges represent "direct influences": $E = \{(X_i, X_j) \mid X_i \text{ directly influences } X_j\}$
  • Nodes are annotated with the conditional probability distribution of the node given its parents: $P(X_i \mid Parents(X_i))$
• Probabilities in the network represent the joint distribution


A Simple BayesNet Example

[Figure: a five-node example network. SP and HP are root nodes, BD is a child of both, and RS and FL are each children of BD. The CPT values appear in the next slide's calculation.]

Joint Distribution

• Remember: a Bayesian network should be a simple representation of a system with a large number of probabilistic variables with some independencies.
• The joint distribution can be calculated using:
  $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(X_i))$

• E.g., the probability of:
  P(sp, !hp, bd, !rs, !fl) = P(!fl|bd) · P(!rs|bd) · P(bd|sp,!hp) · P(!hp) · P(sp) = 0.7 · 0.8 · 0.5 · 0.99 · 0.1 = 0.02772
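As a sanity check, here is a minimal Python sketch of this chain-rule factorization, using only the CPT entries quoted on this slide; it reproduces the 0.02772 result.

```python
# Sanity check of the chain-rule factorization above, using only the
# CPT entries quoted on this slide.
p_sp     = 0.1    # P(sp)
p_not_hp = 0.99   # P(!hp)
p_bd     = 0.5    # P(bd | sp, !hp)
p_not_rs = 0.8    # P(!rs | bd)
p_not_fl = 0.7    # P(!fl | bd)

joint = p_not_fl * p_not_rs * p_bd * p_not_hp * p_sp
print(joint)      # 0.02772
```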

• If all variables are Boolean, then keeping the full joint probability table would mean maintaining $2^n$ values. If we can limit the parents of each node to no more than $k$, then with a Bayesian network we can reduce that to $O(n \cdot 2^k)$ values.


Conditional Independence

• Bayesian networks are powerful because they capture conditional independence between random variables.
• Both of the following statements are true:
  • A node is conditionally independent of all of its non-descendants given its parents
  • A node is conditionally independent of all other nodes in the network given its parents, its children, and its children's parents. This set is what we call the Markov blanket.


Node Ordering

• Note that the node ordering matters. The best node ordering is usually "causal":
  • Add the root causes first, then add the variables they influence, from top to bottom, until you reach the leaves.
• Fortunately, in most situations this causal relationship is what the researcher requires.
• Note that any ordering is possible, but the number of links may grow significantly if the ordering is not causal.


Discrete and Continuous Variables

• Obviously, any discrete distribution is representable in a node of a Bayesian network.
• An arbitrary continuous distribution cannot be easily represented.
  • One trick is to discretize the distribution, where the precision (bin size) can be balanced against the size of the network (see the sketch after this list).
  • Distributions that can be given by a formula and parameters (e.g., exponential, Gaussian) can be used if attention is paid to their meanings.
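A minimal sketch of the discretization trick, assuming a Gaussian node; the mean, range, and bin count below are illustrative, and the bin count is the knob that trades precision against table size.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discretize(mu, sigma, lo, hi, n_bins):
    """Probability mass per bin; the two open tails are folded into
    the first and last bins so the masses sum to 1."""
    edges = [lo + i * (hi - lo) / n_bins for i in range(n_bins + 1)]
    masses = [gaussian_cdf(b, mu, sigma) - gaussian_cdf(a, mu, sigma)
              for a, b in zip(edges[:-1], edges[1:])]
    masses[0] += gaussian_cdf(lo, mu, sigma)           # left tail
    masses[-1] += 1.0 - gaussian_cdf(hi, mu, sigma)    # right tail
    return masses

# E.g., a 10-bin discrete node standing in for N(5, 2^2) on [0, 10]
print(discretize(5.0, 2.0, 0.0, 10.0, 10))
```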


Child continuous, Parent Discrete

• It is common to change the parameters of the distribution in a continuous child node based on the discrete parent node's value.
• E.g., it is common to use a Gaussian distribution with a fixed variance but a mean that is influenced by the parent node.


Hybrid Example


Example taken from RN2003

$P(c \mid h, \neg s) = N(a_f h + b_f, \sigma_f^2)(c)$

$P(c \mid h, s) = N(a_t h + b_t, \sigma_t^2)(c)$
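A minimal sketch of this pattern: the discrete parent (Subsidy) selects which linear Gaussian governs the continuous child (Cost) given its continuous parent (Harvest). The slope, intercept, and standard-deviation values below are hypothetical placeholders, not the book's numbers.

```python
import random

# Hypothetical parameters: each subsidy value selects its own linear
# Gaussian for Cost given Harvest.
PARAMS = {True:  (-0.5, 10.0, 1.0),   # (a_t, b_t, sigma_t) when s
          False: (-1.0, 20.0, 2.0)}   # (a_f, b_f, sigma_f) when !s

def sample_cost(harvest, subsidy):
    """Draw Cost ~ N(a*h + b, sigma^2) for the parent's value."""
    a, b, sigma = PARAMS[subsidy]
    return random.gauss(a * harvest + b, sigma)

print(sample_cost(harvest=7.0, subsidy=True))
```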


Linear Gaussian Distribution

[Figure: plot of the linear Gaussian conditional distribution (image not recovered).]

Discrete Variable with Continuous Parent

• It is common to use soft threshold functions:
  • probit: if $\Phi(x) = \int_{-\infty}^{x} N(0,1)(t)\,dt$, then $P(Buy{=}true \mid Cost{=}c) = \Phi\!\left(\frac{-c + \mu}{\sigma}\right)$
  • sigmoid (logit): $P(Buy{=}true \mid Cost{=}c) = \dfrac{1}{1 + \exp\!\left(-2\,\frac{-c + \mu}{\sigma}\right)}$
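A small sketch of both soft thresholds; the location $\mu$ and spread $\sigma$ below are illustrative.

```python
import math

# Both soft thresholds for P(Buy = true | Cost = c); mu and sigma
# are illustrative values.
MU, SIGMA = 10.0, 2.0

def probit(c):
    """Phi((-c + mu) / sigma), with Phi the standard normal CDF."""
    z = (-c + MU) / SIGMA
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit(c):
    """1 / (1 + exp(-2 (-c + mu) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + MU) / SIGMA))

for c in (5.0, 10.0, 15.0):
    print(c, round(probit(c), 4), round(logit(c), 4))
```

Both functions cross 0.5 at $c = \mu$; the factor of 2 in the logit makes its slope there match the probit's.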


INFERENCE IN BAYESIAN NETS

So, why did we do all of this?


Inference

• With a nicely constructed Bayesian network we can make diagnoses and thus make well-informed decisions.
• We can fix the value of any one or more of the nodes in the network (to a precise value) and see how that changes the probabilities of the distributions.
• Thus we can observe evidence variables and see what their impact is on some other variables, the query variables. Variables in the network that are used neither for evidence nor for query are called hidden variables.
• So, what is the posterior distribution $P(x_q \mid x_{e_1}, \ldots, x_{e_n})$?


Inference by Enumeration

• Conditional probabilities can be computed from the joint probabilities.
• A query can be answered as a normalized sum of joint probabilities, thus as a normalized sum of products of the conditional probabilities found in the network.
• Recall: $P(X \mid e) = \alpha P(X, e) = \alpha \sum_{y} P(X, e, y)$


Inference by Enumeration – Example

• How much is P(SP | rs, fl)?
  $P(SP \mid rs, fl) = \alpha P(SP, rs, fl) = \alpha \sum_{HP, BD} P(SP, HP, BD, rs, fl)$
• This requires summing 4 terms (the hidden Boolean variables HP and BD have 4 joint outcomes), each a product of $n$ conditional probabilities
• The worst-case complexity is $O(n \cdot 2^n)$
• The example shows "variable elimination," with which the real complexity can be reduced. Complexity also depends on the sparsity of the network and on which variables are used for evidence, which for query, and which are hidden.
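A sketch of this enumeration in Python. The five CPT entries that appear on the joint-distribution slide are used as-is; every entry marked "made up" is a hypothetical placeholder, so the printed numbers are only illustrative.

```python
from itertools import product

# Enumeration sketch for P(SP | rs, fl) on the slide's five-variable
# network.
P_SP = 0.1                        # P(sp)           -- from the slide
P_HP = 0.01                       # P(hp) = 1-0.99  -- from the slide
P_BD = {(True, True):  0.9,       # P(bd | sp, hp)   -- made up
        (True, False): 0.5,       # P(bd | sp, !hp)  -- from the slide
        (False, True): 0.05,      # P(bd | !sp, hp)  -- made up
        (False, False): 0.1}      # P(bd | !sp, !hp) -- made up
P_RS = {True: 0.2, False: 0.01}   # P(rs | bd); bd=False entry made up
P_FL = {True: 0.3, False: 0.02}   # P(fl | bd); bd=False entry made up

def joint(sp, hp, bd, rs, fl):
    """Chain-rule product P(sp)P(hp)P(bd|sp,hp)P(rs|bd)P(fl|bd)."""
    p = (P_SP if sp else 1 - P_SP) * (P_HP if hp else 1 - P_HP)
    p *= P_BD[(sp, hp)] if bd else 1 - P_BD[(sp, hp)]
    p *= P_RS[bd] if rs else 1 - P_RS[bd]
    p *= P_FL[bd] if fl else 1 - P_FL[bd]
    return p

# Sum out the hidden variables HP and BD for each value of SP, then
# normalize: P(SP | rs, fl) = alpha * sum_{hp,bd} P(SP, hp, bd, rs, fl)
unnorm = {sp: sum(joint(sp, hp, bd, True, True)
                  for hp, bd in product([True, False], repeat=2))
          for sp in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({sp: alpha * p for sp, p in unnorm.items()})
```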


Approximate Inference

• So, large, multiply connected Bayesian networks can pose a problem for exact inference.
• We can use Monte Carlo methods to determine interesting conditional probabilities (i.e., to do inference).
• The four basic methods are:
  • Direct sampling, i.e., sampling from an empty network (for joint probabilities)
  • Rejection sampling in Bayesian networks (for inference)
  • Likelihood weighting
  • Markov chain Monte Carlo


Sampling from an Empty Network

• The simplest method.
• Ignore any evidence you may have for nodes.
• Sample each variable in topological order based on the outcomes of the previously sampled variables. Do this many times (let's say M times).
• This results in M N-tuples, e.g., $\{\langle FL_i, RS_i, BD_i, HP_i, SP_i \rangle \mid 1 \le i \le M\}$
• Individual and joint probabilities can now be estimated by how many times out of the M samples something has happened.
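A direct-sampling sketch over the same network shape, with the same caveat that several CPT entries are hypothetical placeholders.

```python
import random
from collections import Counter

P_SP, P_HP = 0.1, 0.01
P_BD = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.05, (False, False): 0.1}
P_RS = {True: 0.2, False: 0.01}
P_FL = {True: 0.3, False: 0.02}

def sample_once():
    """Sample in topological order: roots first, then each child
    conditioned on the outcomes already drawn for its parents."""
    sp = random.random() < P_SP
    hp = random.random() < P_HP
    bd = random.random() < P_BD[(sp, hp)]
    rs = random.random() < P_RS[bd]
    fl = random.random() < P_FL[bd]
    return sp, hp, bd, rs, fl

M = 100_000
counts = Counter(sample_once() for _ in range(M))
# Any joint probability is now a relative frequency, e.g. P(sp, bd):
print(sum(c for (sp, hp, bd, rs, fl), c in counts.items() if sp and bd) / M)
```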


Rejection Sampling in Bayesian Networks

• Recall: rejection sampling was used to sample from a hard-to-sample distribution given an easy one.
• It is used in this context to add evidence and thus to determine conditional probabilities.
• Having the M N-tuples, we can count how many times the evidence happened and, out of those times, how many times the query happened (for Boolean variables). The conditional probability estimate is the ratio of these two counts.
• The problem is that some evidence combinations may have very low probability, so using them as evidence variables will require huge sets of samples.
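A short rejection-sampling sketch estimating P(sp | rs, fl); it assumes sample_once() from the direct-sampling sketch above is in scope, with the same hypothetical CPT caveat.

```python
# Rejection-sampling sketch: estimate P(sp | rs, fl) by keeping only
# the samples where the evidence holds. Assumes sample_once() from
# the direct-sampling sketch above is in scope.
def estimate_by_rejection(M=200_000):
    kept = hits = 0
    for _ in range(M):
        sp, hp, bd, rs, fl = sample_once()
        if rs and fl:          # evidence must match, otherwise reject
            kept += 1
            hits += sp         # count the query outcome among survivors
    return hits / kept if kept else float("nan")

print(estimate_by_rejection())
```

Note how few samples survive when the evidence is rare: that is exactly the inefficiency the slide describes.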


Likelihood Weighting

• Avoids the previous inefficiency by only generating samples that conform to the evidence.
• We sample all variables in order as before. However, when we are about to sample an evidence variable, we set the variable (we do not sample it) and instead modify a weight value for this n-tuple. For each tuple, the weight starts at 1 and gets multiplied by P(e | Parentoutcomes(e)) for each piece of evidence e.
• Thus each n-tuple has the correct evidence plus a weight capturing the likelihood of such an n-tuple.
• Conditional probabilities are then the sums of the weights for the various outcomes of the query variable, normalized by the total of the weights.
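A likelihood-weighting sketch for the same query, with the same hypothetical CPT caveat as the earlier sketches.

```python
import random

# Likelihood-weighting sketch for P(sp | rs, fl).
P_SP, P_HP = 0.1, 0.01
P_BD = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.05, (False, False): 0.1}
P_RS = {True: 0.2, False: 0.01}
P_FL = {True: 0.3, False: 0.02}

def weighted_sample():
    """Sample the non-evidence variables in topological order; fix each
    evidence variable and fold P(e | parent outcomes) into the weight."""
    w = 1.0
    sp = random.random() < P_SP
    hp = random.random() < P_HP
    bd = random.random() < P_BD[(sp, hp)]
    w *= P_RS[bd]     # evidence rs = true: do not sample, weight instead
    w *= P_FL[bd]     # evidence fl = true: likewise
    return sp, w

num = den = 0.0
for _ in range(100_000):
    sp, w = weighted_sample()
    den += w
    if sp:
        num += w
print(num / den)      # weighted estimate of P(sp | rs, fl)
```

Every sample is used, so no work is wasted on rejected tuples; samples that fit the evidence poorly simply carry small weights.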


Markov Chain Monte Carlo

• Think of an n-tuple as the state of a process.
• Evidence variables reduce the size n (as they are fixed and will never change).
• Initialize the non-evidence variables randomly.
• The next state is determined by resampling exactly one non-evidence variable from its distribution conditioned on the current state of its Markov blanket.
• The conditional probability estimate is the normalized count of visited states over the query variable.
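A Gibbs-sampling (MCMC) sketch for the same query, with the same hypothetical CPT caveat. For a network this small, P(X | Markov blanket) can be read off the full joint, since the two are proportional when every other variable is held fixed; a real implementation would multiply only the Markov-blanket factors.

```python
import random

P_SP, P_HP = 0.1, 0.01
P_BD = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.05, (False, False): 0.1}
P_RS = {True: 0.2, False: 0.01}
P_FL = {True: 0.3, False: 0.02}

def joint(sp, hp, bd, rs, fl):
    p = (P_SP if sp else 1 - P_SP) * (P_HP if hp else 1 - P_HP)
    p *= P_BD[(sp, hp)] if bd else 1 - P_BD[(sp, hp)]
    p *= P_RS[bd] if rs else 1 - P_RS[bd]
    p *= P_FL[bd] if fl else 1 - P_FL[bd]
    return p

# Evidence rs = fl = True stays fixed; hidden variables start random.
state = {v: random.random() < 0.5 for v in ("sp", "hp", "bd")}
hits = 0
STEPS = 100_000
for _ in range(STEPS):
    var = random.choice(("sp", "hp", "bd"))       # pick one hidden variable
    on = dict(state, **{var: True})
    off = dict(state, **{var: False})
    pt = joint(on["sp"], on["hp"], on["bd"], True, True)
    pf = joint(off["sp"], off["hp"], off["bd"], True, True)
    state[var] = random.random() < pt / (pt + pf)  # resample from conditional
    hits += state["sp"]
print(hits / STEPS)   # estimate of P(sp | rs, fl)
```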


References

• [RN2003] S. Russell and P. Norvig, "Artificial Intelligence: A Modern Approach," 2nd ed., Prentice Hall, 2003 (Chapter 14)
• E. Charniak, "Bayesian Networks Without Tears," AI Magazine, 12(4), 1991, http://www.aaai.org/ojs/index.php/aimagazine/article/view/918

