Appears in Proceedings of Uncertainty in Artificial Intelligence

Using Causal Information and Local Measures to Learn Bayesian Networks

Wai Lam* and Fahiem Bacchus†
Department of Computer Science, University of Waterloo
Waterloo, Ontario, Canada

* Wai Lam's work was supported by an OGS scholarship. His e-mail address is wlam@math.waterloo.ca.
† Fahiem Bacchus' work was supported by NSERC and by IRIS. His e-mail address is fbacchus@logos.uwaterloo.ca.

Abstract

In previous work we developed a method of learning Bayesian network models from raw data. This method relies on the well-known minimal description length (MDL) principle. The MDL principle is particularly well suited to this task as it allows us to trade off, in a principled way, the accuracy of the learned network against its practical usefulness. In this paper we present some new results that have arisen from our work. In particular, we present a new local way of computing the description length. This allows us to make significant improvements in our search algorithm. In addition, we modify our algorithm so that it can take into account partial domain information that might be provided by a domain expert. The local computation of description length also opens the door for local refinement of an existing network. The feasibility of our approach is demonstrated by experiments involving networks of a practical size.

Introduction

Bayesian networks, advanced by Pearl [Pea], have become an important paradigm for representing and reasoning under uncertainty. Systems based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis [BS] to oil price reasoning [Abr]. Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains: there is a knowledge engineering bottleneck. Clearly, it would be extremely useful if the construction process could be fully or partly automated. A useful approach that has recently been pursued by a number of authors is to attempt to build or learn a network model from raw data. In practice, raw data is often available from databases of records.

We have developed a new approach to learning Bayesian network models [LBb]. Our approach is based on Rissanen's Minimal Description Length (MDL) principle [Ris]. The MDL principle offers a means for trading off model complexity and accuracy, and our experiments have demonstrated its suitability for this task. In this paper we present significant improvements to our original system [LBb] which make it more efficient, allow it to take into consideration domain information about causation and ordering, and allow local refinement of an existing network.

These improvements are mainly based on a new analysis of the description length parameter that shows how we can evaluate the description length of a proposed network via local computations involving only a node and its parents. This localized evaluation of description length allows us to develop an improved searching mechanism that performs well even in fairly large domains. In addition, it allows us to modify our search procedure so that it can take into consideration domain knowledge of direct causes as well as partial orderings among the variables. Such partial information about the structure of the domain is quite common, and in many cases it can reduce the complexity of the searching process during learning.

The localized evaluation of description length also allows us to modify an existing Bayesian network by refining a local part of it. By refining the network we obtain a more accurate model, or adapt an existing model to an environment that has changed over time.

In the sequel we will first describe briefly the key features of our previous work, concentrating in particular on the advantages of the MDL approach. Then we derive a new localized version of the description length computation. Using this we develop an algorithm that

searches for a good network model taking into consideration causal and ordering information about the domain. Finally, we discuss the results of various experiments we have run that demonstrate the effectiveness of our approach. The experimental results of our work on local refinement of an existing network are not yet complete, but we will close with a brief discussion of the method. The experimental results will be reported in our full report [LBa].

Learning Bayesian Networks

Much early work on learning Bayesian networks shares the common disadvantage of relying on assumptions about the underlying distribution being learned. For example, Chow and Liu [CL] developed methods that construct tree-structured networks; hence their method provides no guarantees about the accuracy of the learned structure if the underlying distribution cannot be expressed by a tree structure. The approach of Rebane and Pearl [RP], as well as that of Geiger et al. [GPP], suffers from the same criticism, except that they are able to construct singly connected networks. Spirtes et al. [SS], as well as Verma and Pearl [VP, PV], develop approaches that are able to construct multiply connected networks, but they require the underlying distribution to be dag-isomorphic.¹

¹ A distribution is dag-isomorphic if there is some dag that displays all of its dependencies and independencies [Pea].

The problem with making an assumption about the underlying distribution is that generally we do not have sufficient information to test our assumption. The underlying distribution is unknown; all we have is a collection of records in the form of variable instantiations. Hence, in practice, these methods offer no guarantees about the accuracy of the learned model, except in the rare circumstances where we know something about the underlying distribution.

Our approach can construct an accurate model from an unrestricted range of underlying distributions, and it is capable of constructing networks of arbitrary topology, i.e., it can construct multiply connected networks. The ability to construct multiply connected networks is sometimes essential if the network is to be a sufficiently accurate model of the underlying distribution.

Although multiply connected networks allow us to model the underlying distribution more accurately, they have computational as well as conceptual disadvantages. Exact belief updating procedures are, in the worst case, computationally intractable over multiply connected networks [Coo]. Moreover, even if an approximation algorithm is used, e.g., the stochastic simulation methods of [Pea, SP], highly connected networks still require the storage and estimation of an exponential number of conditional probability parameters.² Hence, even if a highly connected network is more accurate, in practice it might not be as useful a model as a simpler, albeit slightly less accurate, model. In addition to the computational disadvantages, the causal relationships between the variables are conceptually more difficult to understand in a complex network.

² The number of parameters required is exponential in the maximum number of parents of a node.

Hence, we are faced with a tradeoff. More complex networks allow for more accurate models, but at the same time such models may be of less practical use than simpler models. The MDL principle allows us to balance this tradeoff: our method will learn a less complex network if that network is sufficiently accurate, and at the same time it is still capable of learning a complex network if no simpler one is sufficiently accurate. This seems to be a particularly appropriate approach to take in light of the fact that we only have a sample of data points from the underlying distribution. That is, it seems inappropriate to try to learn the most accurate model of the underlying distribution given that the raw data only provides us with an approximate picture of it.

Among other works on learning Bayesian networks, the most closely related is that of Cooper and Herskovits [CH]. They use a Bayesian approach that, like ours, is capable of learning multiply connected networks. However, as with all Bayesian approaches, they must choose some prior distribution over the space of possible networks. One way of viewing the MDL principle is as a mechanism for choosing a reasonable prior that is biased towards simpler models. Cooper and Herskovits [CH] investigate a number of different priors, but it is unclear how any particular choice will influence the end result. The MDL principle, on the other hand, allows the system designer, who can choose different ways of encoding the network, to choose a prior based on principles of computational efficiency. For example, if we prefer to learn networks in which no node has more than a given number of parents, we can choose an encoding scheme that imposes a high penalty on networks that violate this constraint.

Applying the MDL Principle

The MDL principle is based on the idea that the best model representing a collection of data items is the model that minimizes the sum of

1. the length of the encoding of the model, and
2. the length of the encoding of the data given the model,

both of which can be measured in bits. A detailed description of the MDL principle with numerous examples of its application can be found in [Ris].
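As a minimal sketch of the two-part comparison (not the authors' implementation), the Python below scores two candidate structures for a pair of binary variables A and B, "no arc" and "A -> B", as model bits plus data bits, and keeps the smaller total. The function names, the flat cost of `d = 8` bits per stored probability, and the use of ideal code lengths $-\log_2 q$ in place of actual Huffman codeword lengths are illustrative assumptions; the bits needed to list each node's parents are omitted.

```python
import math
from collections import Counter


def data_bits(rows, use_arc):
    """Ideal code length in bits, -sum log2 q(row), with q fitted to the rows themselves."""
    n = len(rows)
    count_a = Counter(a for a, _ in rows)
    count_b = Counter(b for _, b in rows)
    count_ab = Counter(rows)
    bits = 0.0
    for a, b in rows:
        bits -= math.log2(count_a[a] / n)                     # code for A
        if use_arc:
            bits -= math.log2(count_ab[(a, b)] / count_a[a])  # code for B given A
        else:
            bits -= math.log2(count_b[b] / n)                 # code for B on its own
    return bits


def mdl_choice(rows, d=8):
    """Pick the structure with the smaller model bits + data bits (binary A and B assumed)."""
    scores = {
        "no arc": 2 * d + data_bits(rows, use_arc=False),  # stores P(A), P(B)
        "A -> B": 3 * d + data_bits(rows, use_arc=True),   # stores P(A), P(B|A=0), P(B|A=1)
    }
    return min(scores, key=scores.get), scores
```

With perfectly correlated records the extra conditional probability pays for itself, while with independent records the simpler structure wins, which is the tradeoff the MDL principle is meant to balance.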

To apply the MDL principle to the task of learning Bayesian networks, we need to specify how we can perform the two encodings: the network itself (item 1) and the raw data given a network (item 2).

Encoding the Network. Our encoding scheme for the networks has the property that the higher the topological complexity of the network, the longer will be its encoding. To represent the structure of a Bayesian network, we need, for each node, a list of its parents and a list of its conditional probability parameters.

Suppose there are $n$ nodes in the problem domain. For a node with $k$ parents, we need $k \log n$ bits to list its parents. To represent the conditional probabilities, the encoding length will be the product of the number of bits required to store the numerical value of each conditional probability and the total number of conditional probabilities that are required. In a Bayesian network, a conditional probability is needed for every distinct instantiation of the parent nodes and the node itself (except that one of these conditional probabilities can be computed from the others, due to the fact that they all sum to 1). For example, if a node that can take on $s$ distinct values has $k$ parents, each of which can take on $t$ distinct values, we will need $t^k (s-1)$ conditional probabilities.

Hence the total description length of a particular network (item 1) will be

$$\sum_{i=1}^{n} \Big[ k_i \log n + d\,(s_i - 1) \prod_{j \in F_i} s_j \Big] \tag{1}$$

where there are $n$ nodes; for node $i$, $k_i$ is the number of its parent nodes, $s_i$ is the number of values it can take on, and $F_i$ is the set of its parents; and $d$ represents the number of bits required to store a numerical value. For a particular problem domain, $n$ and $d$ will be constants. This is not the only encoding scheme possible, but it is simple and it performs well in our experiments.

By looking at this equation, we see that highly connected networks require longer encodings. First, for many nodes the list of parents will become larger, and second, the list of conditional probabilities we need to store for that node will also increase. In addition, networks in which nodes that have a larger number of values have parents with a large number of values will require longer encodings. Hence the MDL principle, which is trying to minimize the sum of the encoding lengths, will tend to favor networks in which the nodes have a smaller number of parents (i.e., networks that are less connected), and also networks in which nodes taking on a large number of values are not parents of nodes that also take on a large number of values.

In Bayesian networks, the degree of connectivity is closely related to the computational complexity of using the network, in both space and time. Hence our encoding scheme generates a preference for more efficient networks. That is, since the encoding length of the model is included in our evaluation of description length, we are enforcing a preference for networks that require the storage of fewer probability parameters and on which exact algorithms are more efficient.

Encoding the Data Using the Model. The task is to learn the joint distribution of a collection of random variables $\mathbf{X} = \{X_1, \ldots, X_n\}$. Each variable $X_i$ has an associated collection of values $\{x_i^1, \ldots, x_i^{s_i}\}$ that it can take on, where the number of values $s_i$ depends on $i$. Every distinct choice of values for all the variables in $\mathbf{X}$ defines an atomic event in the underlying joint distribution and is assigned a particular probability by that distribution.

We assume that the data points in the raw data are all atomic events. That is, each data point specifies a value for every random variable in $\mathbf{X}$. Furthermore, we assume that the data points are the result of independent random trials. Hence we would expect, via the law of large numbers, that each particular instantiation of the variables would eventually appear in the database with a relative frequency approximately equal to its probability. These are standard assumptions.

Given a collection of $N$ data points, we want to encode or store the data as a binary string. There are various ways in which this encoding can be done, but here we are only interested in using the length of the encoding as a metric, via item 2 in the MDL principle, for comparing the merit of candidate Bayesian networks. Hence we can limit our attention to character codes [LR]. With character codes, each atomic event is assigned a unique binary string. Each of the data points, which are all atomic events, is converted to its character code, and the $N$ points are represented by the string formed by concatenating these character codes together. To minimize the total length of the encoding, we assign shorter codes to events that occur more frequently. This is the basis for Huffman's encoding scheme. It is well known that Huffman's algorithm yields the shortest encoding of the $N$ data points.

Say that in the underlying distribution each atomic event $e_i$ has probability $p_i$, and we construct, via some learning scheme, a particular Bayesian network from the raw data. This Bayesian network acts as a model of the underlying distribution, and it also assigns a probability, say $q_i$, to every atomic event $e_i$. Of course, in general $q_i$ will not be equal to $p_i$, as the learning scheme cannot guarantee that it will construct a perfectly accurate network. Nevertheless, the aim is for $q_i$ to be close to $p_i$, and the closer it is, the more accurate is our model.

The constructed Bayesian network is intended as our best-guess representation of the underlying distribution. Hence, given that the probabilities $q_i$ determined

by the network are our best guess of the true values $p_i$, it makes sense to design our Huffman code using these probabilities. Using the $q_i$ probabilities, the Huffman algorithm will assign event $e_i$ a codeword of length approximately $-\log q_i$. If we had the true probabilities $p_i$, the algorithm would have assigned $e_i$ an optimal codeword of length $-\log p_i$ instead. Despite our use of the values $q_i$ in assigning codewords, the raw data will continue to be determined by the true probabilities $p_i$. That is, we still expect that for large $N$ we will have $N p_i$ occurrences of event $e_i$, as $p_i$ is the true probability of $e_i$ occurring. Therefore, when we use the learned Bayesian network to encode the data, the length of the string encoding the database will be approximately

$$-N \sum_i p_i \log q_i \tag{2}$$

where we are summing over all atomic events. How does this encoding length compare to the encoding length if we had access to the true probabilities $p_i$? An old theorem due originally to Gibbs gives us the answer.

Theorem 1 (Gibbs) Let $p_i$ and $q_i$, $i = 1, \ldots, t$, be nonnegative real numbers that sum to 1. Then

$$-\sum_{i=1}^{t} p_i \log p_i \;\le\; -\sum_{i=1}^{t} p_i \log q_i$$

with equality holding if and only if $p_i = q_i$ for all $i$. In the summation we take $0 \log 0$ to be 0.

In other words, this theorem shows that the encoding using the estimated probabilities $q_i$ will be at least as long as the encoding using the true probabilities $p_i$, and strictly longer unless $q_i = p_i$ for every $i$. It also says that the true probabilities achieve the minimal encoding length possible.

The MDL principle says that we must choose a network that minimizes the sum of its own encoding length, which depends on the complexity of the network, and the encoding length of the data given the model, which depends on the closeness of the probabilities $q_i$ determined by the network to the true probabilities $p_i$, i.e., on the accuracy of the model.

We could use Equation 2 directly to evaluate the encoding length of the data given the model. However, the equation involves a summation over all the atomic events, and the number of atomic events is exponential in the number of variables. Instead of trying to use Equation 2 directly, we investigate the relationship between encoding length and network topology.

Let the underlying joint distribution over the variables $\mathbf{X} = \{X_1, \ldots, X_n\}$ be $P$. Any Bayesian network model will also define a joint distribution $Q$ over these variables. We can express $Q$ as

$$Q(\mathbf{X}) = P(X_1 \mid F_{X_1})\, P(X_2 \mid F_{X_2}) \cdots P(X_n \mid F_{X_n}) \tag{3}$$

where $F_{X_i}$ is the (possibly empty) set of parents of $X_i$ in the network. Note that $P$ appears on the right-hand side instead of $Q$. We obtain the conditional probability parameters on the right from frequency counts taken over the data points. By the law of large numbers, we would expect that these frequency counts will be close to the true probabilities given by $P$.³

³ It might not be the case that $P$ is equal to this decomposition. The approximation introduced by our network model is precisely the assumption of such a decomposition.

We can now prove the following new result, which is the basis for our new localized description length computations.

Theorem 2 The encoding length of the data (Equation 2) can be expressed as

$$-N \sum_{i=1}^{n} W(X_i, F_{X_i}) \;-\; N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log P(X_i) \tag{4}$$

where the second sum is taken over all possible instantiations of $X_i$, and the term $W(X_i, F_{X_i})$ is given by

$$W(X_i, F_{X_i}) = \sum_{X_i,\, F_{X_i}} P(X_i, F_{X_i}) \log \frac{P(X_i, F_{X_i})}{P(X_i)\, P(F_{X_i})} \tag{5}$$

where the sum is taken over all possible instantiations of $X_i$ and its parents $F_{X_i}$, and we take $W(X_i, F_{X_i}) = 0$ if $F_{X_i} = \emptyset$. The proof of this and all other theorems is presented in our full report [LBa].

Given some collection of raw data, the last term in Equation 4 is independent of the structure of the network. Furthermore, the weight measure (the first term in Equation 4) can be calculated locally.

Localization of the Description Length

To make use of the MDL principle, we need to evaluate the total description length (item 1 + item 2) given a Bayesian network. Adding Equations 1 and 4, the total description length is

$$\begin{aligned}
&\sum_{i=1}^{n} \Big[ k_i \log n + d\,(s_i - 1) \prod_{j \in F_i} s_j \Big] - N \sum_{i=1}^{n} W(X_i, F_{X_i}) - N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log P(X_i) \\
&= \sum_{i=1}^{n} \Big[ k_i \log n + d\,(s_i - 1) \prod_{j \in F_i} s_j - N\, W(X_i, F_{X_i}) \Big] - N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log P(X_i)
\end{aligned} \tag{6}$$
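The quantities in Equations 1 to 6 can be computed directly from a table of records. The sketch below is illustrative only and is not the authors' code: it assumes variables are indexed 0 to n-1, each record is a tuple of discrete values, a structure is a dict mapping each node to a tuple of its parents, logarithms are base 2, and empirical frequencies stand in for the true probabilities $P$. All function names are hypothetical. `data_bits_global` evaluates Equation 2 over the observed atomic events, with $Q$ built as in Equation 3, so it can be used to check Theorem 2 numerically against `data_bits_local`.

```python
import math
from collections import Counter


def network_bits(parents, card, d):
    """Equation 1: sum over nodes of k_i log2(n) + d (s_i - 1) prod_{j in F_i} s_j."""
    n = len(card)
    return sum(len(pa) * math.log2(n) + d * (card[i] - 1) * math.prod(card[j] for j in pa)
               for i, pa in parents.items())


def weight(data, i, pa):
    """Equation 5 with empirical frequencies; W = 0 when X_i has no parents."""
    if not pa:
        return 0.0
    n_rows = len(data)
    joint = Counter((r[i], tuple(r[j] for j in pa)) for r in data)
    px = Counter(r[i] for r in data)
    pf = Counter(tuple(r[j] for j in pa) for r in data)
    # (c/N) / ((px/N) (pf/N)) simplifies to c N / (px pf)
    return sum(c / n_rows * math.log2(c * n_rows / (px[x] * pf[f]))
               for (x, f), c in joint.items())


def data_bits_local(data, parents):
    """Theorem 2 / Equation 4: -N sum_i W(X_i, F_Xi) - N sum_i sum_xi P(xi) log2 P(xi)."""
    n_rows = len(data)
    bits = 0.0
    for i, pa in parents.items():
        bits -= n_rows * weight(data, i, pa)
        for c in Counter(r[i] for r in data).values():
            bits -= n_rows * (c / n_rows) * math.log2(c / n_rows)
    return bits


def data_bits_global(data, parents):
    """Equation 2 over observed atomic events: -N sum p log2 q, with q the product of
    empirical conditionals P(x_i | parents) as in Equation 3."""
    n_rows = len(data)
    fam = {i: Counter((r[i], tuple(r[j] for j in pa)) for r in data) for i, pa in parents.items()}
    par = {i: Counter(tuple(r[j] for j in pa) for r in data) for i, pa in parents.items()}
    bits = 0.0
    for event, c in Counter(data).items():
        log_q = 0.0
        for i, pa in parents.items():
            f = tuple(event[j] for j in pa)
            log_q += math.log2(fam[i][(event[i], f)] / par[i][f])
        bits -= n_rows * (c / n_rows) * log_q
    return bits


def total_description_length(data, parents, card, d):
    """Equation 6: network bits (Equation 1) plus data bits (Equation 4)."""
    return network_bits(parents, card, d) + data_bits_local(data, parents)
```

Because `data_bits_local` only ever looks at one node and its parents at a time, its cost grows with the size of each family rather than with the number of atomic events, which is the point of Theorem 2.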

The last term in Equation 6 remains constant for a fixed collection of raw data. Therefore, the first term is sufficient to compare the total description lengths of alternative candidate Bayesian networks.

Definition 1 The node description length $DL_i$ for the node $X_i$ with respect to its parents $F_{X_i}$ is defined as

$$DL_i = k_i \log n + d\,(s_i - 1) \prod_{j \in F_{X_i}} s_j - N\, W(X_i, F_{X_i}) \tag{7}$$

Definition 2 The relative total description length for a Bayesian network, defined as the summation of the node description length of every node in the network, is given by

$$\sum_{i=1}^{n} DL_i \tag{8}$$

As a result, the relative total description length is exactly equivalent to the first term in Equation 6, and thus is sufficient for comparing candidate networks. Moreover, it can be calculated locally, since each $DL_i$ depends only on the set of parent nodes for a given node $X_i$.

Definition 3 Given a collection of raw data, an optimal Bayesian network is a Bayesian network for which the total description length is minimum.

Clearly, one or more optimal Bayesian networks must exist for any collection of raw data. Furthermore, we have the following result.

Theorem 3 Given a collection of raw data, the relative total description length of an optimal Bayesian network is minimum. Also, for a given node $X_i$ in an optimal Bayesian network, $DL_i$ is minimum among those parent sets creating no cycle and not making the network disconnected. That is, we cannot reduce $DL_i$ by modifying the network to change $X_i$'s parents.

This theorem says that in an optimal network no single node can be locally improved. It is possible, however, that a non-optimal network could also possess this property. In such a case, the parent sets of a number of nodes would have to be altered simultaneously in order to reduce its description length.

Incorporating Partial Domain Knowledge

Although we might not know the underlying joint distribution governing the behavior of the domain variables, we could possibly have other partial information about the domain. In particular, our new system can consider two types of domain knowledge: direct causation specifications and partial ordering specifications.

By direct causation information we mean information of the form "$X_i$ is a direct cause of $X_j$". That is, we might know of a direct causal link between two variables even if we do not know the causal relationships between the other variables. This kind of information might be provided by domain experts, and we can use it when generating the network model. In particular, we can require that in the learned model $X_i$ be one of $X_j$'s parents, thus ensuring that the model validates the direct causation. More generally, the domain experts might be able to construct a skeleton of the network involving some but not all of the variables. The arcs in the skeleton can be specified as direct causation specifications to our system, which will then proceed to fill in the skeleton, placing the remaining variables in appropriate positions.

Partial ordering information, on the other hand, specifies ordering relationships between two nodes. Such information might, for example, come from knowledge about the temporal evolution of events in our domain. For instance, if we know that $X_i$ occurs before $X_j$, the network model should not contain a path from $X_j$ to $X_i$, as no causal influence should exist in that direction. Note that a total ordering among the variables, as required by Cooper and Herskovits [CH], is just a special case of our partial ordering specifications.

Subject to the condition that the direct causation and partial ordering specifications not entail any transitivity violations (e.g., we cannot have a circular sequence of direct causations as input to the system), our system can ensure that the constructed network validates these specifications. Furthermore, information of this sort can in fact lead to increased efficiency: it will constrain our search for an appropriate network model.

To incorporate this information, we define a constrained Bayesian network as follows.

Definition 4 A constrained Bayesian network is an ordinary Bayesian network whose topology includes all the arcs specified by the direct causation specifications and does not violate any partial ordering specifications.

It can be shown that Theorem 3 still holds, with obvious modifications, if we consider constrained Bayesian networks instead of ordinary networks.

Searching for the Best Constrained Network

Although our expression for the relative total description length allows us to evaluate the relative merit of candidate network models, we cannot consider all possible networks: there are simply too many of them (an exponential number, in fact). Hence, to apply the MDL principle, we must engage in a heuristic search that tries to find a good, low description length, but not necessarily optimal, network model.
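The quantities such a search evaluates, the node description length (Definition 1), the relative total description length (Definition 2), and the constrained-network test (Definition 4), can be sketched in a few small functions. This is an illustration under stated assumptions, not the authors' code: variables are indexed 0 to n-1, records are tuples of discrete values, a structure is a dict mapping each node to a tuple of its parents, logarithms are base 2, and empirical frequencies stand in for $P$. Representing the specifications as hypothetical `causes` pairs (i, j), meaning "$X_i$ is a direct cause of $X_j$", and `orders` pairs (i, j), meaning "$X_i$ occurs before $X_j$", is our own choice.

```python
import math
from collections import Counter


def weight(data, i, pa):
    """W(X_i, F_Xi) of Equation 5 from empirical frequencies; 0 if there are no parents."""
    if not pa:
        return 0.0
    n_rows = len(data)
    joint = Counter((r[i], tuple(r[j] for j in pa)) for r in data)
    px = Counter(r[i] for r in data)
    pf = Counter(tuple(r[j] for j in pa) for r in data)
    return sum(c / n_rows * math.log2(c * n_rows / (px[x] * pf[f]))
               for (x, f), c in joint.items())


def node_dl(data, i, pa, card, d):
    """Definition 1 / Equation 7: k_i log2 n + d (s_i - 1) prod s_j - N W(X_i, F_Xi)."""
    params = (card[i] - 1) * math.prod(card[j] for j in pa)
    return len(pa) * math.log2(len(card)) + d * params - len(data) * weight(data, i, pa)


def relative_total_dl(data, g, card, d):
    """Definition 2 / Equation 8: the sum of node description lengths."""
    return sum(node_dl(data, i, pa, card, d) for i, pa in g.items())


def is_acyclic(g):
    """Kahn's algorithm: repeatedly remove nodes whose parents have all been removed."""
    remaining = {i: set(pa) for i, pa in g.items()}
    ready = [i for i, pa in remaining.items() if not pa]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for c, pa in remaining.items():
            if u in pa:
                pa.remove(u)
                if not pa:
                    ready.append(c)
    return removed == len(g)


def has_path(g, src, dst):
    """Depth-first search for a directed path src -> ... -> dst."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(c for c, pa in g.items() if u in pa)
    return False


def is_constrained(g, causes=(), orders=()):
    """Definition 4: acyclic, contains every arc i -> j in causes, and has no
    path from X_b to X_a for any (a, b) in orders."""
    if not is_acyclic(g):
        return False
    if any(i not in g[j] for i, j in causes):
        return False
    return not any(has_path(g, b, a) for a, b in orders)
```

Each `node_dl` call touches only one node and its parents, so changing one node's parent set requires re-evaluating only that node's term, which is the locality Theorem 3 is stated in terms of.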

In this section we describe our search algorithm, which attempts to find a good network by building one up arc by arc. The first step is to rank the possible arcs so that better arcs can be added into the candidate networks before others. The arcs are ranked by calculating the node description length for $X_j$ given the arc $X_i \rightarrow X_j$, using Equation 7 and treating $X_i$ as the single parent. This node description length is assigned as the description length of arc $X_i \rightarrow X_j$. A list of arcs, Pairs, is created, sorted so that the first arc on Pairs has the lowest description length. Pairs will contain all arcs except for those violating the direct causation or partial ordering specifications. Looking at Equation 7, we can see that if $X_i$ and $X_j$ are highly correlated, as measured by $W(X_j, X_i)$ (Equation 5), the description length will be lower, and an arc between them will be one of the first that we will try to add to the candidate networks.

Search is performed by a best-first algorithm that maintains Open and Closed lists, each containing search elements. The individual search elements have two components $\langle G, L \rangle$: a candidate network $G$, and an arc $L$ which could be added to the candidate network without causing a cycle or violating the partial ordering and direct causation specifications. Open is ordered by heuristic value, which is calculated as the relative total description length (Equation 8) of the element network plus the description length of the element arc (calculated during the construction of Pairs). Therefore, the lower the heuristic value, the shorter the encoding length. Initially, we construct a network $G_{init}$ containing only those arcs included in the direct causation specifications. Then the initial Open list is generated by generating all of the search elements $\langle G_{init}, L \rangle$ for all arcs $L \in$ Pairs. Best-first search is then executed with the search element at the front of Open expanded as follows:

1. Remove the first element from Open and copy it onto Closed. Let the element network be $G_{old}$ and the element arc be $L$.
2. Invoke the Arc-absorption procedure on $G_{old}$ and $L$ to obtain a new network $G_{new}$ with the arc $L$ added. The Arc-absorption procedure, described below, might also reverse the direction of some other arcs in $G_{old}$.
3. Next, we make a new search element consisting of $G_{new}$ and the first arc from Pairs that appears after the old arc $L$ which could be added to $G_{new}$ without generating a cycle or violating a partial ordering specification. This new element is placed on Open in the correct order according to the heuristic function.
4. Finally, we make another new search element consisting of $G_{old}$ and the first arc from Pairs that appears after $L$ which could be added to $G_{old}$ without generating a cycle or violating a partial ordering specification. Again, this element is placed on Open in the correct order.

Now we describe the Arc-absorption procedure, which finds a locally optimal way to insert a new arc into an existing network. To minimize the description length of the resulting network, the procedure might also decide to reverse the direction of some of the old arcs.

Input:  A network $G_{old}$.
        An arc $X_i \rightarrow X_j$ to be added.
Output: A new network $G_{new}$ with the arc added and some other arcs possibly reversed.

1. Create a new network by adding the arc $X_i \rightarrow X_j$ to $G_{old}$. In the new network, we then search locally to determine if we can decrease the relative total description length by reversing the direction of some of the arcs. This is accomplished via the following steps:
   (a) Determine the optimal directionality of the arcs attached directly to $X_j$ by examining which directions minimize the relative total description length. Some of these arcs may be reversed by this process. Furthermore, we do not consider the reversal of any arcs that would result in a violation of the direct causation or partial ordering specifications.
   (b) If the direction of an existing arc is reversed, then perform the above directionality determination step on the other node affected.

The search procedure is mainly composed of the Arc-absorption procedure, a cycle checking routine, and a partial order checking routine. The cycle checking and partial order checking routines each run in time polynomial in $n$, where $n$ is the number of nodes. We have found that the search can arrive at a very reasonable network model if provided with a resource bound, polynomial in $n$, on the number of search element expansions. Under this resource bound, we have found that in practice the overall complexity of the search mechanism remains polynomial in the number of nodes $n$.

We can further observe that when the amount of domain information increases, the search time to find a good network model decreases. This arises from the fact that such information places constraints on the space of allowable models, making search easier. For example, if a total ordering among the nodes in the domain is given, the search time will be reduced by a factor polynomial in $n$: there is no need to perform the cycle or partial order checking, and the arc reversal step in Arc-absorption is no longer needed.

Note that it is sufficient to compute the node description length (Equation 7) of those nodes whose parents have been changed. The relative total description length (Equation 8) of the whole network need not be computed.
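The search loop can be sketched as follows. This is a simplified illustration, not the authors' system: Arc-absorption is reduced to plain arc insertion (no arc reversal), the Closed list is not kept, the resource bound is a hypothetical fixed `max_expansions`, and the network returned is the lowest-scoring one produced during the search, a choice the text does not specify. Variables are integer-indexed, records are tuples, structures are dicts of parent tuples, `causes` and `orders` are (i, j) pairs as in Definition 4, and all function names are hypothetical.

```python
import heapq
import itertools
import math
from collections import Counter


def weight(data, i, pa):
    """Equation 5 with empirical frequencies; 0 when X_i has no parents."""
    if not pa:
        return 0.0
    n_rows = len(data)
    joint = Counter((r[i], tuple(r[j] for j in pa)) for r in data)
    px = Counter(r[i] for r in data)
    pf = Counter(tuple(r[j] for j in pa) for r in data)
    return sum(c / n_rows * math.log2(c * n_rows / (px[x] * pf[f]))
               for (x, f), c in joint.items())


def node_dl(data, i, pa, card, d):
    """Equation 7: node description length of X_i with parent tuple pa."""
    params = (card[i] - 1) * math.prod(card[j] for j in pa)
    return len(pa) * math.log2(len(card)) + d * params - len(data) * weight(data, i, pa)


def total_dl(data, g, card, d):
    """Equation 8: relative total description length of network g."""
    return sum(node_dl(data, i, pa, card, d) for i, pa in g.items())


def has_path(g, src, dst):
    """Depth-first search for a directed path src -> ... -> dst in g."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(c for c, pa in g.items() if u in pa)
    return False


def can_add(g, arc, orders):
    """Arc i -> j creates no cycle and no path from X_b to X_a for any (a, b) in orders."""
    i, j = arc
    if i in g[j] or has_path(g, j, i):
        return False
    g2 = {**g, j: g[j] + (i,)}
    return not any(has_path(g2, b, a) for a, b in orders)


def learn(data, card, d, causes=(), orders=(), max_expansions=1000):
    n = len(card)
    g_init = {j: tuple(i for i, jj in causes if jj == j) for j in range(n)}
    banned = {(j, i) for i, j in causes} | {(b, a) for a, b in orders}
    arc_dl = {(i, j): node_dl(data, j, (i,), card, d)
              for i in range(n) for j in range(n) if i != j and (i, j) not in banned}
    pairs = sorted(arc_dl, key=arc_dl.get)  # "Pairs": lowest arc description length first
    tie = itertools.count()                 # tie-breaker so the heap never compares dicts
    open_list = []

    def push(g, k):
        h = total_dl(data, g, card, d) + arc_dl[pairs[k]]  # heuristic value
        heapq.heappush(open_list, (h, next(tie), g, k))

    def push_next(g, start):
        """Queue <g, L> for the first arc L at or after position start that g can take."""
        for k in range(start, len(pairs)):
            if can_add(g, pairs[k], orders):
                push(g, k)
                return

    for k in range(len(pairs)):              # initial Open: <G_init, L> for every L
        if can_add(g_init, pairs[k], orders):
            push(g_init, k)
    best, best_dl = g_init, total_dl(data, g_init, card, d)
    for _ in range(max_expansions):
        if not open_list:
            break
        _, _, g_old, k = heapq.heappop(open_list)   # step 1
        i, j = pairs[k]
        g_new = {**g_old, j: g_old[j] + (i,)}        # step 2, without arc reversal
        new_dl = total_dl(data, g_new, card, d)
        if new_dl < best_dl:
            best, best_dl = g_new, new_dl
        push_next(g_new, k + 1)                      # step 3
        push_next(g_old, k + 1)                      # step 4
    return best, best_dl
```

For clarity the sketch recomputes Equation 8 for every network; only the nodes whose parents changed actually need to be re-evaluated.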

[Figure 1 panels: the original network structure over nodes A to E; the four sets of conditional probability parameters used to generate the data, each the same as the first except for some parameters; and the corresponding learned structures.]

Figure 1: The Quality of Learned Networks

Experiments

Following [CH], we test our approach by constructing an original network and using Henrion's logic sampling technique [Hen] to generate a collection of raw data. We then apply our learning mechanism to the raw data to obtain a learned network. By comparing this network with the original, we can determine the performance of our system.

In the first set of experiments, the original Bayesian network G was a small network over the nodes A to E shown in the figure. We varied the conditional probability parameters during the process of generating the raw data, obtaining four different sets of data. Exhaustive search, rather than heuristic search, was then carried out to find the network with minimum total description length for each of these sets of raw data, resulting in a different learned structure in each case. The experiment demonstrates that our algorithm does in fact yield a tradeoff between the accuracy and the complexity of the learned structures: in all cases where the original network was not recovered, a simpler network was learned. The type of structure learned depends on the parameters, as each set of parameters in conjunction with the structure defines a different probability distribution, and some of these distributions can be accurately modeled with simpler structures. In the first case, the distribution defined by the parameters did not have a simpler model of sufficient accuracy, but in the other cases it did. We have also developed measures of the absolute accuracy of the learned structures (see [LBb] for a full description) that indicate in all cases that the learned structure was very accurate, even though it might possess a different topology.

The second experiment consisted of learning a Bayesian network with a fairly large number of variables (37 nodes and 46 arcs). This network was derived from a real-world application in medical diagnosis [BSCC] and is known as the ALARM network (see [LBb] for a diagram of this network). After applying our heuristic search algorithm, we found that the learned network is almost identical to the original structure, with the exception of one different arc and one missing arc. One characteristic of our heuristic search algorithm is that we did not require a user-supplied ordering of variables (cf. Cooper and Herskovits [CH]). This experiment demonstrates the feasibility of our approach for recovering networks of practical size.

Besides being able to use extra domain information, our new search mechanism is faster and more accurate than the mechanism first reported in [LBb], which was developed without the local measure of description length. To investigate how our search mechanism behaves when domain information is supplied, we conducted some further experiments. Using the same set of raw data derived from the ALARM model, in conjunction with varying amounts of domain information, we applied our learning algorithm and recorded the search time required to obtain an accurate network model. The following two tables depict the relative time required by the search algorithm when varying amounts of direct causation and partial ordering specifications are made available. In general, the search time decreases as the amount of causal information increases.

      | no partial ordering | partial orderings | partial orderings | total ordering
time  |                     |                   |                   |


      | no direct causal specification | direct causal specifications | direct causal specifications
time  |                                |                              |

Refinement of Existent Networks

Besides the advantages outlined above, our new local computation of description length also allows for the possibility of refining an existing network by modifying some local part of it. Refinement is based on the following theorem.

Theorem: Let $X = \{X_1, \ldots, X_n\}$ be the nodes in an existent Bayesian network, let $X'$ be a subset of $X$, and let $DL_{X'}$ be the total node description length of all the nodes in $X'$, i.e., $DL_{X'} = \sum_{X_i \in X'} DL_{X_i}$. Suppose we find a new set of parents for every node in $X'$ that does not create any cycles or make the network disconnected, and let the new total node description length of all the nodes in $X'$ be $DL^{new}_{X'}$. Then we can construct a new network in which the parents of the nodes in $X'$ are replaced by their new parent sets, and this new network will have a lower total description length if $DL^{new}_{X'} < DL_{X'}$.

This theorem provides a means to improve a Bayesian network without evaluating the total description length of the whole Bayesian network, a potentially expensive task if the network is large. We can isolate a subset of nodes and try to improve that collection locally, ignoring the rest of the network. Algorithms for performing such a refinement based on this theorem have been developed, and experiments are being performed. We hope to report on this work in the near future [LBa].

References

[Abr] B. Abramson. ARCO1: An application of belief networks to the oil market. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[BBS] C. Berzuini, R. Bellazzi, and D. Spiegelhalter. Bayesian networks applied to therapy monitoring. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[BSCC] I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine.

[CC] R. Chavez and G. F. Cooper. A randomized approximation algorithm for probabilistic inference on Bayesian belief networks. Networks.

[CH] G. F. Cooper and E. Herskovits. A Bayesian method for constructing Bayesian belief networks from databases. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[CL] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory.

[CLR] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, Massachusetts.

[Coo] G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence.

[GPP] D. Geiger, A. Paz, and J. Pearl. Learning causal trees from dependence information. In Proceedings of the AAAI National Conference.

[Hen] M. Henrion. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In L. N. Kanal and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence Vol. II. North-Holland, Amsterdam.

[LBa] W. Lam and F. Bacchus. Learning and refining Bayesian networks using partial domain information. In preparation.

[LBb] W. Lam and F. Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence, to appear.

[LH] D. A. Lelewer and D. S. Hirschberg. Data compression. ACM Computing Surveys.

[Peaa] J. Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence.

[Peab] J. Pearl. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence.

[Peac] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California.

[PV] J. Pearl and T. S. Verma. A theory of inferred causation. In Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning.

[Risa] J. Rissanen. Modeling by shortest data description. Automatica.

[Risb] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific.

[RP] G. Rebane and J. Pearl. The recovery of causal poly-trees from statistical data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[SP] R. D. Shachter and M. A. Peot. Simulation approaches to general probabilistic inference on belief networks. In M. Henrion, R. D. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence. North-Holland, Amsterdam.

[SGS] P. Spirtes, C. Glymour, and R. Scheines. Causality from probability. In Evolving Knowledge in Natural Science and Artificial Intelligence.

[VP] T. Verma and J. Pearl. Causal networks: Semantics and expressiveness. In R. D. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence. North-Holland, Amsterdam.
