A Tutorial on Learning with Bayesian Networks

David Heckerman

Presented by Haimonti Dutta, Department of Computer and Information Science

Outline

Introduction

Bayesian interpretation of probability and a review of methods

Bayesian networks and their construction from prior knowledge

Algorithms for probabilistic inference

Learning probabilities and structure in a Bayesian network

Relationships between Bayesian network techniques and methods for supervised and unsupervised learning

Conclusion




Introduction

A Bayesian network is a graphical model for probabilistic relationships among a set of variables.

What do Bayesian Networks and Bayesian Methods have to offer?

Handling of incomplete data sets

Learning about causal networks

Facilitating the combination of domain knowledge and data

An efficient and principled approach for avoiding the overfitting of data



The Bayesian Approach to Probability and Statistics

Bayesian probability: the degree of belief in an event

Classical probability: the true or physical probability of an event


Some Criticisms of Bayesian Probability

Why should degrees of belief satisfy the rules of probability?

On what scale should probabilities be measured?

What probabilities should be assigned to beliefs that are not at the extremes?


Some Answers……

Researchers have suggested different sets of properties that degrees of belief should satisfy; from these properties the rules of probability can be derived.



Scaling Problem

The probability wheel: a tool for assessing probabilities

What is the probability that the wheel stops in the shaded region?


Probability Assessment

An evident problem: SENSITIVITY

How can we say that the probability of an event is 0.601 and not 0.599?

Another problem: ACCURACY

Methods for improving accuracy are available in the decision analysis literature

Learning with Data

The thumbtack problem

When tossed, a thumbtack comes to rest as either heads or tails.




Problem………

From N observations we want to determine the probability of heads on the (N+1)th toss.



Two Approaches

Classical approach:

Assert some physical probability of heads (unknown)

Estimate this physical probability from the N observations

Use this estimate as the probability of heads on the (N+1)th toss



The Other Approach

Bayesian approach:

Assert some physical probability of heads

Encode the uncertainty about this physical probability using Bayesian probabilities

Use the rules of probability to compute the required probability


Some Basic Probability Formulas

Bayes theorem gives the posterior probability of θ given data D and background knowledge ξ:

p(θ | D, ξ) = p(θ | ξ) p(D | θ, ξ) / p(D | ξ)

where p(D | ξ) = ∫ p(D | θ, ξ) p(θ | ξ) dθ

Note: θ is an uncertain variable whose value corresponds to the possible true values of the physical probability
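Since the slides never show this update numerically, here is a minimal numeric sketch of Bayes theorem for the thumbtack problem. The uniform grid prior and the counts h = 3, t = 2 (matching the H, T, H, T, T sequence used later) are illustrative assumptions, not part of the original slides.

```python
import numpy as np

# Numeric sketch of Bayes theorem for the thumbtack: discretize theta on
# a grid, then posterior = prior * likelihood / evidence.

theta = np.linspace(0.01, 0.99, 99)          # grid of candidate theta values
prior = np.full_like(theta, 1 / len(theta))  # uniform p(theta | xi)

h, t = 3, 2                                  # illustrative data: 3 heads, 2 tails
likelihood = theta**h * (1 - theta)**t       # p(D | theta, xi)

evidence = np.sum(likelihood * prior)        # p(D | xi), the normalizer
posterior = likelihood * prior / evidence    # p(theta | D, xi)

print(theta[np.argmax(posterior)])           # posterior mode, near h/(h+t) = 0.6
```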


Likelihood Function

How good is a particular value of θ?

It depends on how likely that value is to generate the observed data:

L(θ : D) = p(D | θ)

Hence the likelihood of the sequence H, T, H, T, T is

L(θ : D) = θ · (1 − θ) · θ · (1 − θ) · (1 − θ) = θ²(1 − θ)³



Sufficient Statistics

To compute the likelihood in the thumbtack problem we only require h and t (the number of heads and the number of tails)

h and t are called sufficient statistics for the binomial distribution

A sufficient statistic is a function that summarizes, from the data, the information relevant to the likelihood
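A small sketch of these two ideas together: the likelihood depends on the data only through the sufficient statistics h and t, so L(θ : D) = θ^h (1 − θ)^t. The sequence below is the one from the likelihood slide.

```python
# Sufficient statistics and the likelihood function for the thumbtack.
# The likelihood depends on D only through the counts h and t.

def sufficient_stats(tosses):
    """Count heads and tails in a sequence of 'H'/'T' outcomes."""
    h = sum(1 for x in tosses if x == "H")
    return h, len(tosses) - h

def likelihood(theta, h, t):
    """L(theta : D) = theta^h * (1 - theta)^t."""
    return theta**h * (1 - theta)**t

h, t = sufficient_stats("HTHTT")
print(likelihood(0.6, h, t))   # likelihood of theta = 0.6 for H, T, H, T, T
```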




Finally……….

We average over the possible values of θ to determine the probability that the (N+1)th toss of the thumbtack will come up heads:

P(X[N+1] = heads | D, ξ) = ∫ θ p(θ | D, ξ) dθ

This value is also referred to as the expectation of θ with respect to the distribution p(θ | D, ξ)




To Remember

We need a method to assess the prior distribution for θ.

A common approach, usually adopted, is to assume that the distribution is a beta distribution.
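A sketch of why the beta prior is convenient: it is conjugate to the binomial likelihood, so the posterior is again a beta distribution and the predictive probability of heads has a closed form. The hyperparameters and counts below are illustrative assumptions.

```python
# Conjugate beta prior for the thumbtack: prior Beta(a_h, a_t) plus the
# counts (h, t) gives posterior Beta(a_h + h, a_t + t).

a_h, a_t = 1, 1                  # illustrative prior; Beta(1, 1) is uniform
h, t = 3, 2                      # sufficient statistics from H, T, H, T, T

post_h, post_t = a_h + h, a_t + t            # posterior Beta parameters

# P(X[N+1] = heads | D) is the posterior mean of theta:
p_heads_next = post_h / (post_h + post_t)
print(p_heads_next)              # 4/7, approximately 0.571
```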


Maximum Likelihood Estimation

The MLE principle:

We learn the parameter values that maximize the likelihood function

It is one of the most commonly used estimators in statistics and is intuitively appealing
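For the thumbtack, the MLE has a simple closed form: setting the derivative of log L(θ : D) = h log θ + t log(1 − θ) to zero gives θ̂ = h / (h + t). A one-line sketch:

```python
def mle_theta(h, t):
    """Maximum likelihood estimate of theta: argmax of theta^h (1-theta)^t."""
    return h / (h + t)

print(mle_theta(3, 2))   # 0.6 for the sequence H, T, H, T, T
```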





What is a Bayesian Network?

A graphical model that efficiently encodes the joint probability distribution for a large set of variables




Definition

A Bayesian network for a set of variables X = {X1, …, Xn} consists of:

a network structure S encoding conditional independence assertions about X

a set P of local probability distributions

The network structure S is a directed acyclic graph whose nodes are in one-to-one correspondence with the variables in X. Lack of an arc denotes a conditional independence.
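One fact worth making explicit here (it is the central equation of Heckerman's paper): the structure S and the local distributions P together define the joint distribution as a product of local terms,

p(x) = ∏ᵢ p(xᵢ | paᵢ)

where paᵢ denotes the parents of Xᵢ in S. The fraud example on the next slide, and the sketch that follows it, make this concrete.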


Some conventions……….

Variables are depicted as nodes

Arcs represent probabilistic dependence between variables

Conditional probabilities encode the strength of the dependencies




An Example: Detecting Credit-Card Fraud

[Figure: a Bayesian network with nodes Fraud, Age, Sex, Gas, and Jewelry]
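To make the factorization concrete, here is a minimal sketch that evaluates the joint for one complete assignment of this network. The arc structure (Fraud is a parent of Gas and Jewelry; Age and Sex are also parents of Jewelry) follows Heckerman's fraud example, but every numeric CPT value below is invented for illustration.

```python
# Chain-rule factorization for the fraud network:
#   p(f, a, s, g, j) = p(f) p(a) p(s) p(g | f) p(j | f, a, s)
# All CPT numbers are made up for illustration.

p_f = {True: 0.0001, False: 0.9999}                 # P(Fraud)
p_a = {"young": 0.25, "old": 0.75}                  # P(Age)
p_s = {"male": 0.5, "female": 0.5}                  # P(Sex)
p_g = {True: 0.2, False: 0.01}                      # P(Gas=yes | Fraud)
p_j = {(True, "young", "male"): 0.05,               # P(Jewelry=yes | F, A, S)
       (True, "young", "female"): 0.05,
       (True, "old", "male"): 0.05,
       (True, "old", "female"): 0.05,
       (False, "young", "male"): 0.0001,
       (False, "young", "female"): 0.0005,
       (False, "old", "male"): 0.0004,
       (False, "old", "female"): 0.002}

def joint(f, a, s, g, j):
    """Joint probability of one complete assignment."""
    pg = p_g[f] if g else 1 - p_g[f]
    pj = p_j[(f, a, s)] if j else 1 - p_j[(f, a, s)]
    return p_f[f] * p_a[a] * p_s[s] * pg * pj

print(joint(False, "young", "female", True, True))
```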




Tasks

Correctly identify the goals of modeling

Identify many possible observations that may be relevant to the problem

Determine what subset of those observations is worthwhile to model

Organize the observations into variables having mutually exclusive and collectively exhaustive states

Finally, build a directed acyclic graph that encodes the assertions of conditional independence


A Technique for Constructing a Bayesian Network

The approach is based on the following observations:

People can often readily assert causal relationships among variables

Causal relations typically correspond to assertions of conditional dependence

To construct a Bayesian network we simply draw arcs, for a given set of variables, from the cause variables to their immediate effects. In the final step we determine the local probability distributions.



Problems

The steps are often intermingled in practice

Judgments of conditional independence and/or cause and effect can influence problem formulation

Assessments of probability may lead to changes in the network structure



Bayesian Inference

Once a Bayesian network is constructed, we need to determine the various probabilities of interest from the model

[Figure: observed data x1, x2, …, x[m] are used to answer a query about x[m+1]]

Computation of a probability of interest given a model is called probabilistic inference
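As a sketch of probabilistic inference, a query such as P(Fraud | Gas = yes, Jewelry = yes) can be answered by brute-force enumeration: sum the joint over the unobserved variables and normalize. This reuses joint() from the fraud-network sketch above; enumeration is exponential in the number of variables in general, which is why the inference algorithms the tutorial surveys matter.

```python
# Inference by enumeration: P(Fraud | Gas=yes, Jewelry=yes).
# Assumes joint() and the CPTs from the fraud-network sketch above.

def posterior_fraud(gas, jewelry):
    scores = {}
    for f in (True, False):
        scores[f] = sum(joint(f, a, s, gas, jewelry)   # marginalize Age, Sex
                        for a in ("young", "old")
                        for s in ("male", "female"))
    z = sum(scores.values())            # P(Gas=gas, Jewelry=jewelry)
    return {f: p / z for f, p in scores.items()}

print(posterior_fraud(True, True))      # posterior over Fraud given evidence
```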

Learning Probabilities in a Bayesian Network

Problem: use data to update the probabilities of a given network structure

Thumbtack problem: we do not learn the probability of heads; we update the posterior distribution of the variable that represents the physical probability of heads

The problem restated: given a random sample D, compute the posterior probability p(θs | D, Sh)

Assumptions for Computing the Posterior Probability

There is no missing data in the random sample D

The parameters are independent




But……

Data may be missing: how do we then proceed?



Obvious Concerns….

Why was the data missing?

Missing values

Hidden variables

Is the absence of an observation dependent on the actual states of the variables?

Here we deal with missing data that are independent of state




Incomplete Data (contd.)

For any interesting set of local likelihoods and priors, exact computation of the posterior distribution is intractable

We therefore require approximations for incomplete data


Methods of Approximation for Incomplete Data

Monte Carlo sampling methods

Gaussian approximation

MAP and ML approximations and the EM algorithm



Gibbs Sampling

The steps involved:

Start:

Choose an initial state for each of the variables in X at random

Iterate:

Unassign the current state of X1

Compute the probability of this state given the states of the other n−1 variables

Repeat this procedure for all variables in X, creating a new sample of X

After a “burn-in” phase, the possible configurations of X are sampled with probability p(x)
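A minimal Gibbs-sampling sketch for two binary variables with a made-up joint p(x1, x2): the sampler only ever uses the full conditionals p(xi | x−i), yet after burn-in the configurations are visited with the target probabilities. The joint table and the sample counts are illustrative assumptions.

```python
import random

# Target joint over two binary variables (made up for illustration).
p = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.45}

def conditional(i, state):
    """P(x_i = 1 | the other coordinate fixed at its current value)."""
    s1, s0 = list(state), list(state)
    s1[i], s0[i] = 1, 0
    p1, p0 = p[tuple(s1)], p[tuple(s0)]
    return p1 / (p1 + p0)

state = [random.randint(0, 1), random.randint(0, 1)]   # random start
counts = {k: 0 for k in p}
burn_in, n_samples = 1000, 20000

for step in range(burn_in + n_samples):
    for i in range(2):                   # resample each variable in turn
        state[i] = 1 if random.random() < conditional(i, state) else 0
    if step >= burn_in:
        counts[tuple(state)] += 1

for cfg in sorted(p):                    # empirical vs. target frequencies
    print(cfg, round(counts[cfg] / n_samples, 3), "target:", p[cfg])
```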

Problem with the Monte Carlo Method

It is intractable when the sample size is large

Gaussian Approximation

Idea: for large amounts of data, the posterior distribution can be approximated by a multivariate Gaussian distribution


Criteria for Model Selection

Some criterion must be used to measure the degree to which a network structure fits the prior knowledge and data

Such criteria include:

Relative posterior probability

Local criteria


Relative Posterior Probability

A criterion for model selection is the logarithm of the relative posterior probability:

log p(D, Sh) = log p(Sh) + log p(D | Sh)
               (log prior)  (log marginal likelihood)
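For the thumbtack model, the marginal-likelihood term has a closed form under a beta prior: p(D | Sh) = B(αh + h, αt + t) / B(αh, αt), where B is the beta function. A sketch, with illustrative hyperparameters:

```python
from math import lgamma

# Log marginal likelihood for the Beta-Bernoulli (thumbtack) model,
# i.e. the log p(D | Sh) term in the score above.

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(h, t, alpha_h=1.0, alpha_t=1.0):
    """log p(D | Sh) = log B(alpha_h + h, alpha_t + t) - log B(alpha_h, alpha_t)."""
    return log_beta(alpha_h + h, alpha_t + t) - log_beta(alpha_h, alpha_t)

print(log_marginal_likelihood(3, 2))   # log(1/60) for a uniform prior
```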



Local Criteria

An example: a Bayesian network structure for medical diagnosis

[Figure: an Ailment node with arcs to Finding 1, Finding 2, …, Finding n]




Priors

To compute the relative posterior probability we assess:

Structure priors p(Sh)

Parameter priors p(θs | Sh)


Priors on Network Parameters

Key concepts:

Independence equivalence

Distribution equivalence

Illustration of Independence Equivalence

[Figure: three structures over X, Y, Z — X→Y→Z, X←Y→Z, and X←Y←Z — that encode the same assertion]

Independence assertion: X and Z are conditionally independent given Y



Priors on Structures

Various methods….

Assume every hypothesis is equally likely (usually for convenience)

Order the variables and assume the presence or absence of arcs are mutually independent

Use of prior networks

Imaginary data from domain experts


Benefits of Learning Structures

Efficient learning: more accurate models with less data

Compare learning P(A) and P(B) versus learning P(A,B): the former requires less data

Discover structural properties of the domain

Helps to order events that occur sequentially, and in sensitivity analysis and inference

Predict the effects of actions



Search Methods

Problem: find the best network from the set of all networks in which each node has no more than k parents

Search techniques:

Greedy search

Greedy search with restarts

Best-first search

Monte Carlo methods
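A minimal sketch of the first of these techniques, greedy search: repeatedly apply the single-arc change (addition or deletion) that improves the score, respecting acyclicity and the k-parent limit, and stop when no change helps. The score function here is a toy stand-in; a real implementation would use a criterion such as the log relative posterior probability above.

```python
import itertools

def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly remove nodes with no incoming arcs."""
    remaining, active = set(nodes), set(edges)
    while remaining:
        roots = [n for n in remaining if all(v != n for (_, v) in active)]
        if not roots:
            return False          # every remaining node has a parent: cycle
        remaining -= set(roots)
        active = {(u, v) for (u, v) in active if u not in roots}
    return True

def greedy_search(nodes, score, k=2):
    """Hill-climb over single-arc additions and deletions."""
    edges, best = set(), score(set())       # start from the empty graph
    improved = True
    while improved:
        improved = False
        for u, v in itertools.permutations(nodes, 2):
            if (u, v) in edges:
                cand = edges - {(u, v)}      # try deleting the arc
            else:
                cand = edges | {(u, v)}      # try adding the arc
                parents_of_v = sum(1 for (_, b) in cand if b == v)
                if parents_of_v > k or not is_acyclic(nodes, cand):
                    continue
            s = score(cand)
            if s > best:
                best, edges, improved = s, cand, True
    return edges, best

# Toy score: rewards arcs of a hypothetical "true" structure and
# penalizes extras; a real score would be computed from data.
TRUE_ARCS = {("Fraud", "Gas"), ("Fraud", "Jewelry"), ("Age", "Jewelry")}
toy_score = lambda e: len(e & TRUE_ARCS) - 0.5 * len(e - TRUE_ARCS)

print(greedy_search(["Fraud", "Age", "Sex", "Gas", "Jewelry"], toy_score))
```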

Bayesian Networks for Supervised and Unsupervised Learning

Supervised learning: a natural representation in which to encode prior knowledge

Unsupervised learning:

Apply the learning technique to select a model with no hidden variables

Look for sets of mutually dependent variables in the model

Create a new model with a hidden variable

Score the new models, possibly finding one better than the original

What is all this good for, anyway?

Real-life implementations:

Microsoft products (Microsoft Office)

Medical applications and biostatistics (BUGS)

NASA's AutoClass project for data analysis

Collaborative filtering (Microsoft MSBN)

Fraud detection (AT&T)

Speech recognition (UC Berkeley)

Limitations of Bayesian Networks

Typically require initial knowledge of many probabilities; the quality and extent of prior knowledge play an important role

Significant computational cost (the task is NP-hard)

The probability of an unanticipated event is not accounted for




Conclusion

[Figure: data + prior knowledge feed an Inducer, which outputs a Bayesian network]

Some Comments

Cross-fertilization with other techniques? For example, with decision trees, R trees, and neural networks

Improvements in search techniques using classical search methods?

Applications in other areas, such as estimating population birth and death rates, or financial applications?