S3 SEMINAR ON DATA MINING

7 Nov 2013

- BAYESIAN NETWORKS -

A. BASICS

Master Universitario en Inteligencia Artificial

Concha Bielza, Pedro Larrañaga

Computational Intelligence Group
Departamento de Inteligencia Artificial
Universidad Politécnica de Madrid

C.Bielza, P.Larrañaga -UPM-

Basics of Bayesian networks

Reasoning under uncertainty
Conditional independence
D-separation
Bayesian networks: formal definition
Building BNs


Uncertainty…

…because knowledge about the world is incomplete or incorrect (noise, imprecision, unreliability…), or because of limitations in the way knowledge is represented. For instance:

  Medical diagnosis
  Robotics
  Financial forecasting
  Voice/image recognition
  Monitoring/control of industrial processes

Almost all human knowledge involves some kind of uncertainty




Advantages of BNs

  Explicit representation of uncertain knowledge
  Graphical, intuitive, closer to a representation of the world
  Deal with uncertainty for reasoning and decision-making
  Founded on probability theory: they provide clear semantics and a sound theoretical foundation
  Manage many variables
  Both data and experts can be used to construct the model
  Ongoing, extensive development
  Support the expert; do not try to replace them



Modularity

The joint probability distribution (global model) is specified via marginal and conditional distributions (local models), taking into account conditional independence relationships among variables

This modularity:

  Provides easy maintenance
  Reduces the number of parameters needed for the global model
    Estimation/elicitation is easier
    Storage needs are reduced
  Allows efficient reasoning (inference)



The joint probability distribution

Dealing with a joint probability distribution (all variables binary)

  n diseases $D_1,\dots,D_n$
  m symptoms $S_1,\dots,S_m$
  Represent $P(D_1,\dots,D_n,S_1,\dots,S_m)$, with $2^{n+m}-1$ parameters
  E.g.: m=30, n=10 requires $2^{40}-1 \approx 10^{12}$ parameters

That’s complete dependence: intractable in practice
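As a quick check of the arithmetic above, here is a minimal Python sketch, assuming every disease and symptom variable is binary. The function name `full_joint_parameters` is illustrative and does not come from the slides.

```python
def full_joint_parameters(num_binary_vars: int) -> int:
    """Free parameters of an unrestricted joint distribution over binary variables."""
    # 2**k joint states, minus one because the probabilities must sum to 1.
    return 2 ** num_binary_vars - 1


n_diseases, n_symptoms = 10, 30
count = full_joint_parameters(n_diseases + n_symptoms)
print(count)           # 1099511627775
print(f"{count:.2e}")  # 1.10e+12
```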



Independence

With mutual independence, only $P(X_1),\dots,P(X_n)$ need to be specified

  $n$ parameters (linear) instead of $2^n-1$ (exponential)

Unfortunately, mutual independence rarely holds in real domains

Fortunately, some conditional independences usually do hold. Exploit them (for representation and inference)



Chain rule

- Conditional (on a context) probability -
“If y is true and everything else known is irrelevant for x, then the probability of x is p”

Chain rule, 2 variables:

Chain rule, 3 variables (with conditioning):
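For reference, the standard textbook forms of conditional probability and of the chain rule for two and three variables are written out below in LaTeX. The symbols x, y, z are generic values. The last line is the conditioned version, which is an assumption about what "(with conditioning)" refers to on the slide.

```latex
\begin{align*}
P(x \mid y) &= \frac{P(x, y)}{P(y)}, \qquad P(y) > 0 \\
P(x, y) &= P(x \mid y)\, P(y) \\
P(x, y, z) &= P(x \mid y, z)\, P(y \mid z)\, P(z) \\
P(x, y \mid z) &= P(x \mid y, z)\, P(y \mid z)
\end{align*}
```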



Chain rule

Chain rule, n variables:
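In its usual form, the chain rule for n variables writes the joint as a product of conditionals, each variable conditioned on the ones before it: P(x_1,…,x_n) = ∏_i P(x_i | x_1,…,x_{i-1}). The sketch below checks this numerically on a random joint over four binary variables. The table and the helper names are hypothetical, chosen only for illustration.

```python
import itertools
import random

random.seed(0)
n = 4
states = list(itertools.product([0, 1], repeat=n))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {s: w / total for s, w in zip(states, weights)}  # arbitrary joint table


def marginal(prefix):
    """P(x_1..x_k = prefix), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for s, p in joint.items() if s[:k] == prefix)


def chain_rule_product(state):
    """Product over i of P(x_i | x_1..x_{i-1}), each factor a ratio of marginals."""
    product = 1.0
    for i in range(1, len(state) + 1):
        product *= marginal(state[:i]) / marginal(state[:i - 1])
    return product


max_error = max(abs(joint[s] - chain_rule_product(s)) for s in states)
print(max_error < 1e-12)  # True
```

Note that the factors are still as large as the joint itself: the last conditional alone needs a table over all n variables, so the chain rule by itself saves nothing.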


Bayes’ rule

Rev. Thomas Bayes (1701-1761)
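Bayes' rule inverts a conditional probability: P(x | y) = P(y | x) P(x) / P(y), where P(y) is obtained by summing P(y | x') P(x') over all values x'. A minimal sketch for a binary disease and a binary symptom follows. All numbers are hypothetical and the function name `posterior` is an illustrative choice.

```python
def posterior(prior: float, likelihood: float, likelihood_given_not: float) -> float:
    """P(disease | symptom) from P(disease), P(symptom | disease), P(symptom | no disease)."""
    # Denominator: total probability of observing the symptom.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p, 4))  # 0.1538
```

Even with a fairly reliable symptom, the low prior keeps the posterior modest, which is the kind of reasoning a BN automates over many variables.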


Conditional independence

Independence (marginal)

Conditional independence of X and Y given Z

  X, Y, Z: 3 disjoint sets of variables
  Notation:
  Intuitively, whenever Z=z, the information Y=y does not influence the probability of x, for all possible values x, y, z
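The two notions above can be stated as equations. The block below gives the standard definitions, assumed to match the ones on the slide; they hold for all values x, y, z for which the conditioning events have positive probability.

```latex
\begin{align*}
\text{Marginal independence:} \quad & P(x \mid y) = P(x) \\
\text{Conditional independence given } Z\text{:} \quad & P(x \mid y, z) = P(x \mid z) \\
\text{Equivalently:} \quad & P(x, y \mid z) = P(x \mid z)\, P(y \mid z)
\end{align*}
```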


Conditional independence: numerical example

[Example tables not recovered; the slide answers four independence checks with YES, NO, YES, NO]
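A numerical check of conditional independence can be sketched as follows. The two joint tables are hypothetical stand-ins (the slide's own numbers are not available): one is built so that X and Y are independent given Z, the other makes Y a copy of X. The test uses the equivalent product form P(x, y, z) P(z) = P(x, z) P(y, z).

```python
import itertools


def is_cond_independent(joint, tol=1e-9):
    """Check that X is c.i. of Y given Z for a joint given as {(x, y, z): probability}."""
    for x, y, z in joint:
        p_z = sum(p for (_, _, c), p in joint.items() if c == z)
        p_xz = sum(p for (a, _, c), p in joint.items() if a == x and c == z)
        p_yz = sum(p for (_, b, c), p in joint.items() if b == y and c == z)
        # P(x, y | z) = P(x | z) P(y | z)  <=>  P(x, y, z) P(z) = P(x, z) P(y, z)
        if abs(joint[(x, y, z)] * p_z - p_xz * p_yz) > tol:
            return False
    return True


def bern(p_one, value):
    """Probability of a binary value when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: 0.2, 1: 0.7}  # P(X=1 | Z=z)
p_y_given_z = {0: 0.5, 1: 0.1}  # P(Y=1 | Z=z)

independent = {
    (x, y, z): p_z[z] * bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
    for x, y, z in itertools.product([0, 1], repeat=3)
}
# Y is a copy of X, so Y stays informative about X even when Z is known.
dependent = {
    (x, y, z): p_z[z] * bern(p_x_given_z[z], x) * (1.0 if y == x else 0.0)
    for x, y, z in itertools.product([0, 1], repeat=3)
}

print(is_cond_independent(independent))  # True
print(is_cond_independent(dependent))    # False
```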


Chain rule and factorization

Overcome the exponential size of the joint probability distribution (JPD) by exploiting conditional independence

Chain rule alone: no gains yet. The number of parameters required by the factors is still exponential




Chain rule and factorization via conditional independence (c.i.)

Joint distribution factorized


Informal definition: 2 components in a BN

Qualitative part: a directed acyclic graph (DAG)

  Nodes = variables
  Arcs = direct dependence relations (the absence of an arc indicates the absence of direct dependence; there may still be indirect dependences and independences)
  Not necessarily causality

Quantitative part: a set of conditional probabilities that determine a unique JPD




BNs: nodes

[Figure: a target node with its parents, ancestors, children, descendants, family, and the rest of the network]


Independences in a BN

A BN represents a set of independences

Distinguish:

  Basic independences: we must verify them when constructing the net
  Derived independences: obtained from the basic ones by using the properties of the independence relations
    Check them by means of the d-separation criterion



Basic independence: Markov condition

$X_i$ is c.i. of its non-descendants, given its parents $\mathrm{Pa}(X_i)$



Example

Fever is conditionally independent of Jaundice given Malaria and Flu


Example

Send a message M1 through a transmitter. It is received as M2 and then sent through another transmitter. It is finally received as M3.

Transmitters have noise that modifies messages

M1 → M2 → M3

M1 and M3 are dependent when nothing else is known
M1 and M3 are independent given M2 (M1 is a non-descendant of M3, and M2 is its parent)
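The transmitter example can be reproduced numerically. The sketch below assumes binary messages, a hypothetical bit-flip probability of 0.1 for each transmitter, and a uniform prior on M1; none of these numbers come from the slides.

```python
import itertools

NOISE = 0.1     # hypothetical bit-flip probability of each transmitter
P_M1_ONE = 0.5  # hypothetical prior P(M1 = 1)


def channel(sent, received):
    """P(received | sent) for a noisy binary transmitter."""
    return 1.0 - NOISE if sent == received else NOISE


# Joint of the chain M1 -> M2 -> M3: P(m1) P(m2 | m1) P(m3 | m2).
joint = {
    (m1, m2, m3): (P_M1_ONE if m1 == 1 else 1.0 - P_M1_ONE)
    * channel(m1, m2) * channel(m2, m3)
    for m1, m2, m3 in itertools.product([0, 1], repeat=3)
}


def prob(**fixed):
    """Probability of the event that fixes some of m1, m2, m3."""
    names = ("m1", "m2", "m3")
    return sum(p for state, p in joint.items()
               if all(state[names.index(k)] == v for k, v in fixed.items()))


def p_m3_one(**given):
    """P(M3 = 1 | given)."""
    return prob(m3=1, **given) / prob(**given)


# Without M2, knowing M1 changes the belief about M3: dependent.
print(round(p_m3_one(m1=0), 2), round(p_m3_one(m1=1), 2))              # 0.18 0.82
# Once M2 is known, M1 adds nothing: independent given M2.
print(round(p_m3_one(m1=0, m2=1), 2), round(p_m3_one(m1=1, m2=1), 2))  # 0.9 0.9
```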


Factorizing the JPD

…Now the quantitative part of the net, the JPD:

  Specify it intelligently. Use the chain rule and the Markov condition
  Let $X_1,\dots,X_n$ be an ancestral ordering (parents appear before their children in the sequence). One always exists because the graph is a DAG
  Using that ordering in the chain rule, $\{X_{i-1},\dots,X_1\}$ contains only non-descendants of $X_i$ (its parents among them), and we have $P(x_i \mid x_{i-1},\dots,x_1) = P(x_i \mid \mathrm{pa}(x_i))$
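An ancestral ordering is a topological ordering of the DAG. One way to obtain it, sketched here with the standard-library `graphlib` module (Python 3.9+), is to pass a mapping from each node to its parents. The five-node graph is a hypothetical example, not taken from the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG, given as node -> set of parents.
parents = {
    "A": {"B", "E"},
    "W": {"A"},
    "N": {"E"},
    "B": set(),
    "E": set(),
}

# static_order() yields every node after all of its predecessors (its parents here).
order = list(TopologicalSorter(parents).static_order())
position = {node: i for i, node in enumerate(order)}

# Every parent precedes its children, so the ordering is ancestral.
assert all(position[p] < position[child]
           for child, ps in parents.items() for p in ps)
print(order)
```

If the graph contained a directed cycle, `static_order()` would raise `graphlib.CycleError`, which matches the remark that the ordering exists precisely because the graph is a DAG.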


Factorizing the JPD

Therefore, we can recover the JPD by using the following factorization: $P(x_1,\dots,x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i))$

MODEL CONSTRUCTION EASIER:

  Only local distributions are stored at each node
  Fewer parameters to assign, and in a more natural way
  Easier inference
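To illustrate how the JPD is recovered from local distributions only, here is a minimal sketch for a three-node chain A → B → C with binary variables. The structure and all probabilities are hypothetical.

```python
import itertools

# Hypothetical local distributions for the chain A -> B -> C (all binary).
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.9, 0: 0.1}}  # p_b_given_a[a][b]
p_c_given_b = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.6, 0: 0.4}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """P(a, b, c) = P(a) P(b | a) P(c | b): one local factor per node."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
print(abs(total - 1.0) < 1e-12)  # True: the product of local factors is a valid JPD
print(joint(1, 1, 0))            # P(A=1, B=1, C=0)
```

Only 1 + 2 + 2 = 5 numbers are stored here, against 7 for the unrestricted joint over three binary variables.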


With all binary variables (A, E, B, W, N):

  $2^5 - 1 = 31$ probabilities for the JPD
  10 with the factorization in the BN: 1 + 1 + 4 + 2 + 2 (one term per node)
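The count of 10 can be reproduced from a graph structure. The sketch below assumes the arcs B → A ← E, A → W and E → N; the figure is not in the text, so this structure is an assumption chosen to be consistent with the per-node counts 1, 1, 4, 2, 2.

```python
def bn_parameter_count(parents, cardinality):
    """Free parameters of a discrete BN: sum over nodes of (r_i - 1) * product of parent cardinalities."""
    total = 0
    for node, node_parents in parents.items():
        configs = 1
        for p in node_parents:
            configs *= cardinality[p]
        total += (cardinality[node] - 1) * configs
    return total


# Assumed structure (the figure is not in the text): B -> A <- E, A -> W, E -> N.
parents = {"B": [], "E": [], "A": ["B", "E"], "W": ["A"], "N": ["E"]}
cardinality = {v: 2 for v in parents}

per_node = {v: bn_parameter_count({v: parents[v]}, cardinality) for v in parents}
print(per_node)                                  # {'B': 1, 'E': 1, 'A': 4, 'W': 2, 'N': 2}
print(bn_parameter_count(parents, cardinality))  # 10
print(2 ** len(parents) - 1)                     # 31
```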




BN Alarm for monitoring ICU patients

$2^{54}$ probabilities for the JPD vs. 509 in the BN



u-separation

A criterion equivalent to d-separation, sometimes requiring fewer checks

  Obtain the smallest subgraph containing X, Y, Z and their ancestors (ancestral graph)
  Moralize the resulting subgraph (add a link between parents that have a child in common) and remove the direction of the arcs
  Z u-separates X and Y whenever Z is in all paths between X and Y
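The three steps above can be sketched in Python. The DAG is given as a mapping from each node to its parents; function names and the small test graphs are illustrative assumptions (the chain mirrors the transmitter example, the v-structure is hypothetical).

```python
from itertools import combinations


def ancestral_set(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def u_separated(parents, xs, ys, zs):
    """True if zs separates xs from ys in the moralized ancestral graph."""
    keep = ancestral_set(parents, set(xs) | set(ys) | set(zs))
    neighbours = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                      # drop arc directions
            neighbours[child].add(p)
            neighbours[p].add(child)
        for a, b in combinations(ps, 2):  # moralize: link co-parents
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Search from xs without entering zs; reaching ys means an unblocked path.
    blocked = set(zs)
    seen, stack = set(), [x for x in xs if x not in blocked]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in neighbours[node] if n not in blocked)
    return not (seen & set(ys))


chain = {"M1": [], "M2": ["M1"], "M3": ["M2"]}
print(u_separated(chain, ["M1"], ["M3"], ["M2"]))  # True
print(u_separated(chain, ["M1"], ["M3"], []))      # False

collider = {"B": [], "E": [], "A": ["B", "E"]}     # hypothetical v-structure
print(u_separated(collider, ["B"], ["E"], []))     # True
print(u_separated(collider, ["B"], ["E"], ["A"]))  # False
```

The v-structure shows why the ancestral step matters: A is dropped when it is not observed, so B and E stay separated, while observing A brings in the moral link between its parents.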




d-separation theorem [Verma and Pearl’90, Neapolitan’90]

Let P be a probability distribution of the variables in V and G=(V,E) a DAG.

(G,P) satisfies the Markov condition iff, for all disjoint sets X, Y, Z ⊆ V, X and Y being d-separated by Z (d-separation defined by G) implies that X and Y are c.i. given Z (c.i. defined by P)

  Graph G represents all dependences of P
  Some independences of P may not be identified by d-separation in G


Correspondence graph-model

A DAG may be viewed as a map of the (in)dependences of P, of 3 types:

  D-map of P (dependences): independent variables are d-separated in the graph
  I-map of P (independences): d-separated variables in the graph are independent
  P-map of P (perfect): both I-map and D-map

A P-map is not always possible, so we look for I-maps (some c.i. statements of P do not show up as d-separations)



Formal definition

Let P be a JPD over $V=\{X_1,\dots,X_n\}$.
A BN is a tuple (G,P), where G=(V,E) is a DAG such that:

  Each node of G represents a variable of V
  The Markov condition holds
  Each node has an associated local probability distribution such that the JPD factorizes into these local distributions (quantitative part, taking an ancestral ordering)

G is a minimal I-map of P (if some arc is removed, it is no longer an I-map)



A property

Markov blanket of X: the set of nodes that makes X c.i. of the rest of the network

A node is c.i. of all other nodes in the BN, given its parents, children and children’s parents (its Markov blanket)




Malaria is conditionally independent of Aches given ExoticTrip, Jaundice, Fever and Flu
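The Markov blanket can be read off the graph mechanically. The sketch below uses a structure inferred from the independence statements on the slides (ExoticTrip → Malaria, Malaria → Jaundice, Malaria → Fever ← Flu, Flu → Aches). The figure itself is not in the text, so this graph is an assumption.

```python
def markov_blanket(parents, node):
    """Parents, children and children's other parents of a node."""
    children = {c for c, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket


# Assumed structure, inferred from the independence statements on the slides.
parents = {
    "ExoticTrip": [],
    "Flu": [],
    "Malaria": ["ExoticTrip"],
    "Jaundice": ["Malaria"],
    "Fever": ["Malaria", "Flu"],
    "Aches": ["Flu"],
}

print(sorted(markov_blanket(parents, "Malaria")))
# ['ExoticTrip', 'Fever', 'Flu', 'Jaundice']
```

Under this assumed graph, Aches is the only node outside the blanket of Malaria, which agrees with the statement above.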


Expert / from data / both

Manual, with the aid of an expert in the domain:

  Causal mechanisms → (modelling) → Causal graph → (probabilities) → Bayesian net

Learning from a database:

  Database → (algorithm) → Bayesian net

A combination (experts → structure; database → probabilities)

Build it in the causal direction: the resulting BNs are simpler and more efficient
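For the combined approach, where the structure comes from an expert and the probabilities from a database, one simple option is to estimate each local distribution by relative frequencies. This is a minimal sketch of that idea, not the method used in the seminar; the toy database, the arc Flu → Aches and the function name are hypothetical.

```python
from collections import Counter


def estimate_cpt(records, child, parent_names):
    """Relative-frequency estimate of P(child | parents) from a list of dicts."""
    joint_counts = Counter()
    parent_counts = Counter()
    for row in records:
        config = tuple(row[p] for p in parent_names)
        joint_counts[(config, row[child])] += 1
        parent_counts[config] += 1
    # Keys are (parent configuration, child value).
    return {key: n / parent_counts[key[0]] for key, n in joint_counts.items()}


# Hypothetical toy database.
records = [
    {"Flu": 1, "Aches": 1},
    {"Flu": 1, "Aches": 1},
    {"Flu": 1, "Aches": 0},
    {"Flu": 0, "Aches": 0},
]

cpt = estimate_cpt(records, "Aches", ["Flu"])
print(cpt[((1,), 1)])  # estimate of P(Aches=1 | Flu=1)
```

Parent configurations that never occur in the data get no entry at all, so a practical version would need smoothing or a prior.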



Summary



Example: Asia BN [Lauritzen & Spiegelhalter’88]



