Appears in Proceedings of Uncertainty in Artificial Intelligence.
Using Causal Information and Local Measures to
Learn Bayesian Networks

Wai Lam
Department of Computer Science
University of Waterloo
Waterloo, Ontario, Canada

Fahiem Bacchus
Department of Computer Science
University of Waterloo
Waterloo, Ontario, Canada
Wai Lam's work was supported by an OGS scholarship. His e-mail address is wlam@math.uwaterloo.ca.

Fahiem Bacchus' work was supported by NSERC and by IRIS. His e-mail address is fbacchus@logos.uwaterloo.ca.

Abstract

In previous work we developed a method of learning Bayesian network models from raw data. This method relies on the well-known minimal description length (MDL) principle. The MDL principle is particularly well suited to this task, as it allows us to trade off, in a principled way, the accuracy of the learned network against its practical usefulness. In this paper we present some new results that have arisen from our work. In particular, we present a new local way of computing the description length. This allows us to make significant improvements in our search algorithm. In addition, we modify our algorithm so that it can take into account partial domain information that might be provided by a domain expert. The local computation of description length also opens the door for local refinement of an existing network. The feasibility of our approach is demonstrated by experiments involving networks of a practical size.

Introduction

Bayesian networks, advanced by Pearl [Pea], have become an important paradigm for representing and reasoning under uncertainty. Systems based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis, e.g., [BS], to oil price reasoning [Abr]. Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains: there is a knowledge engineering bottleneck. Clearly, it would be extremely useful if the construction process could be fully or partly automated. A useful approach that has recently been pursued by a number of authors is to attempt to build, or learn, a network model from raw data. In practice, raw data is often available from databases.

We have developed a new approach to learning Bayesian network models [LBb]. Our approach is based on Rissanen's Minimal Description Length (MDL) principle [Ris]. The MDL principle offers a means for trading off model complexity and accuracy, and our experiments have demonstrated its suitability for this task. In this paper we present significant improvements to our original system [LBb], which make it more efficient, allow it to take into consideration domain information about causation and ordering, and allow local refinement of an existing network.

These improvements are mainly based on a new analysis of the description length parameter that shows how we can evaluate the description length of a proposed network via local computations involving a node and its parents. This localized evaluation of description length allows us to develop an improved searching mechanism that performs well even in fairly large domains. In addition, it allows us to modify our search procedure so that it can take into consideration domain knowledge of direct causes as well as partial orderings among the variables. Such partial information about the structure of the domain is quite common, and in many cases it can reduce the complexity of the searching process during learning.

The localized evaluation of description length also allows us to modify an existing Bayesian network by refining a local part of it. By refining the network we obtain a more accurate model, or adapt an existing model to an environment that has changed over time.

In the sequel we will first briefly describe the key features of our previous work, concentrating in particular on the advantages of the MDL approach. Then we develop a new localized version of the description length computation. Using this we develop an algorithm that
searches for a good network model, taking into consideration causal and ordering information about the domain. Finally, we discuss the results of various experiments we have run that demonstrate the effectiveness of our approach. The experimental results of our work on local refinement of an existing network are not yet complete, but we will close with a brief discussion of the method. The experimental results will be reported in our full report [LBa].

Learning Bayesian Networks

Much early work on learning Bayesian networks shares the common disadvantage of relying on assumptions about the underlying distribution being learned. For example, Chow and Liu [CL] developed methods that construct tree-structured networks; hence their method provides no guarantees about the accuracy of the learned structure if the underlying distribution cannot be expressed by a tree structure. The approach of Rebane and Pearl [RP], as well as that of Geiger et al. [GPP], suffers from the same criticism, except that they are able to construct singly connected networks. Spirtes et al. [SGS], as well as Verma and Pearl [VP, PV], develop approaches that are able to construct multiply connected networks, but they require the underlying distribution to be dag-isomorphic.

The problem with making an assumption about the underlying distribution is that generally we do not have sufficient information to test our assumption. The underlying distribution is unknown; all we have is a collection of records in the form of variable instantiations. Hence, in practice, these methods offer no guarantees about the accuracy of the learned model, except in the rare circumstances where we know something about the underlying distribution.

Our approach can construct an accurate model from an unrestricted range of underlying distributions, and it is capable of constructing networks of arbitrary topology, i.e., it can construct multiply connected networks. The ability to construct a multiply connected network is sometimes essential if the network is to be a sufficiently accurate model of the underlying distribution.

Although multiply connected networks allow us to model the underlying distribution more accurately, they have computational as well as conceptual disadvantages. Exact belief updating procedures are, in the worst case, computationally intractable over multiply connected networks [Coo]. Moreover, even if an approximation algorithm is used, e.g., the stochastic simulation methods of [CC, Pea, SP], highly connected networks still require the storage and estimation of an exponential number of conditional probability parameters. Hence, even if a highly connected network is more accurate, in practice it might not be as useful a model as a simpler, albeit slightly less accurate, model. In addition to the computational disadvantages, the causal relationships between the variables are conceptually more difficult to understand in a complex network.

Hence we are faced with a tradeoff: more complex networks allow for more accurate models, but at the same time such models may be of less practical use than simpler models. The MDL principle allows us to balance this tradeoff: our method will learn a less complex network if that network is sufficiently accurate, and at the same time it is still capable of learning a complex network if no simpler one is sufficiently accurate. This seems to be a particularly appropriate approach in light of the fact that we only have a sample of data points from the underlying distribution. That is, it seems inappropriate to try to learn the most accurate model of the underlying distribution given that the raw data only provides us with an approximate picture of it.

Among other works on learning Bayesian networks, the most closely related is that of Cooper and Herskovits [CH]. They use a Bayesian approach that, like ours, is capable of learning multiply connected networks. However, as with all Bayesian approaches, they must choose some prior distribution over the space of possible networks. One way of viewing the MDL principle is as a mechanism for choosing a reasonable prior that is biased towards simpler models. Cooper and Herskovits [CH] investigate a number of different priors, but it is unclear how any particular choice will influence the end result. The MDL principle, on the other hand, allows the system designer, who can choose different ways of encoding the network, to choose a prior based on principles of computational efficiency. For example, if we prefer to learn networks in which no node has more than a fixed number of parents, we can choose an encoding scheme that imposes a high penalty on networks that violate this constraint.

Applying the MDL Principle

The MDL principle is based on the idea that the best model representing a collection of data items is the model that minimizes the sum of

1. the length of the encoding of the model, and
2. the length of the encoding of the data given the model,

both of which can be measured in bits. A detailed description of the MDL principle with numerous examples of its application can be found in [Ris].
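As a rough illustration of why highly connected networks are costly, the sketch below counts the free conditional probability parameters that a single node needs: one distribution over the node's values for every instantiation of its parents, each with one entry fixed by the sum-to-1 constraint. The function name `free_params` and the all-binary setting in the loop are assumptions made for illustration only.

```python
from math import prod
from typing import Sequence


def free_params(node_values: int, parent_values: Sequence[int]) -> int:
    """Number of free conditional probabilities for a single node.

    node_values:   number of values the node can take on
    parent_values: number of values of each parent
    One distribution is stored per parent instantiation (prod(parent_values)
    of them, or exactly 1 when there are no parents); each distribution has
    node_values - 1 free entries because its probabilities sum to 1.
    """
    return (node_values - 1) * prod(parent_values)


# A binary node whose k parents are all binary needs 2**k parameters.
for k in (0, 2, 4, 8):
    print(k, free_params(2, [2] * k))
```

The count doubles with every additional binary parent, which is the exponential growth in stored parameters that the tradeoff discussion above refers to.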

A distribution is dag-isomorphic if there is some dag that displays all of its dependencies and independencies [Pea].

The number of parameters required is exponential in the maximum number of parents of a node.


To apply the MDL principle to the task of learning Bayesian networks, we need to specify how we can perform the two encodings: the network itself, and the raw data given a network.

Encoding the Network: Our encoding scheme for the networks has the property that the higher the topological complexity of the network, the longer will be its encoding. To represent the structure of a Bayesian network we need, for each node, a list of its parents and a list of its conditional probability parameters.

Suppose there are $n$ nodes in the problem domain. For a node with $k$ parents, we need $k \log_2 n$ bits to list its parents. To represent the conditional probabilities, the encoding length will be the product of the number of bits required to store the numerical value of each conditional probability and the total number of conditional probabilities that are required. In a Bayesian network, a conditional probability is needed for every distinct instantiation of the parent nodes and the node itself, except that one of these conditional probabilities can be computed from the others, since they all sum to 1. For example, if a node that can take on $s$ distinct values has $k$ parents, each of which can take on $t$ distinct values, we will need $t^k (s-1)$ conditional probabilities. Hence, the total description length for a particular network will be

$$\sum_{i=1}^{n} \Big( k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_i} s_j \Big)$$

where there are $n$ nodes; for node $i$, $k_i$ is the number of its parent nodes, $s_i$ is the number of values it can take on, and $F_i$ is the set of its parents; and $d$ represents the number of bits required to store a numerical value. For a particular problem domain, $n$ and $d$ will be constants. This is not the only encoding scheme possible, but it is simple and it performs well in our experiments.

By looking at this equation we see that highly connected networks require longer encodings. First, for many nodes the list of parents will become larger, and second, the list of conditional probabilities we need to store for that node will also increase. In addition, networks in which nodes that have a larger number of values have parents with a large number of values will require longer encodings. Hence, the MDL principle, which is trying to minimize the sum of the encoding lengths, will tend to favor networks in which the nodes have a smaller number of parents (i.e., networks that are less connected), and also networks in which nodes taking on a large number of values are not parents of nodes that also take on a large number of values.

In Bayesian networks the degree of connectivity is closely related to the computational complexity, in both space and time, of using the network. Hence, our encoding scheme generates a preference for more efficient networks. That is, since the encoding length of the model is included in our evaluation of description length, we are enforcing a preference for networks that require the storage of fewer probability parameters and on which exact algorithms are more efficient.

Encoding the Data Using the Model: Our task is to learn the joint distribution of a collection of random variables $X = \{X_1, \dots, X_n\}$. Each variable $X_i$ has an associated collection of $s_i$ values $\{x_i^1, \dots, x_i^{s_i}\}$ that it can take on. Every distinct choice of values for all the variables in $X$ defines an atomic event in the underlying joint distribution and is assigned a particular probability by that distribution.

We assume that the data points in the raw data are all atomic events. That is, each data point specifies a value for every random variable in $X$. Furthermore, we assume that the data points are the result of independent random trials. Hence we would expect, by the law of large numbers, that each particular instantiation of the variables would eventually appear in the database with a relative frequency approximately equal to its probability. These are standard assumptions.

Given a collection of $N$ data points, we want to encode or store the data as a binary string. There are various ways in which this encoding can be done, but here we are only interested in using the length of the encoding as a metric, via the second term of the MDL sum, for comparing the merit of candidate Bayesian networks. Hence, we can limit our attention to character codes [LR]. With character codes each atomic event is assigned a unique binary string. Each of the data points, which are all atomic events, is converted to its character code, and the $N$ points are represented by the string formed by concatenating these character codes together. To minimize the total length of the encoding, we assign shorter codes to events that occur more frequently. This is the basis for Huffman's encoding scheme. It is well known that Huffman's algorithm yields the shortest such encoding of the $N$ data points [H].

Suppose that in the underlying distribution each atomic event $e_i$ has probability $p_i$, and that we construct, via some learning scheme, a particular Bayesian network from the raw data. This Bayesian network acts as a model of the underlying distribution, and it also assigns a probability, say $q_i$, to every atomic event $e_i$. Of course, in general $q_i$ will not be equal to $p_i$, as the learning scheme cannot guarantee that it will construct a perfectly accurate network. Nevertheless, the aim is for $q_i$ to be close to $p_i$, and the closer it is, the more accurate is our model.

The constructed Bayesian network is intended as our best-guess representation of the underlying distribution. Hence, given that the probabilities $q_i$ determined

by the network are our best guess of the true values $p_i$, it makes sense to design our Huffman code using these probabilities. Using the $q_i$ probabilities, the Huffman algorithm will assign event $e_i$ a codeword of length approximately $-\log_2 q_i$. If we had the true probabilities $p_i$, the algorithm would have assigned $e_i$ an optimal codeword of length $-\log_2 p_i$ instead. Despite our use of the values $q_i$ in assigning codewords, the raw data will continue to be determined by the true probabilities $p_i$. That is, we still expect that for large $N$ we will have $N p_i$ occurrences of event $e_i$, as $p_i$ is the true probability of $e_i$ occurring. Therefore, when we use the learned Bayesian network to encode the data, the length of the string encoding the database will be approximately

$$-N \sum_i p_i \log_2 q_i$$

where we are summing over all atomic events. How does this encoding length compare to the encoding length if we had access to the true probabilities $p_i$? An old theorem due originally to Gibbs gives us the answer.

Theorem (Gibbs). Let $p_i$ and $q_i$, $i = 1, \dots, t$, be nonnegative real numbers such that $\sum_i p_i = \sum_i q_i = 1$. Then

$$-\sum_{i=1}^{t} p_i \log p_i \;\le\; -\sum_{i=1}^{t} p_i \log q_i$$

with equality holding if and only if $p_i = q_i$ for all $i$. In the summation we take $0 \log 0$ to be $0$.

In other words, this theorem shows that the encoding using the estimated probabilities $q_i$ will be at least as long as the encoding using the true probabilities $p_i$, and strictly longer unless the two coincide. It also says that the true probabilities achieve the minimal encoding length possible.

The MDL principle says that we must choose a network that minimizes the sum of its own encoding length, which depends on the complexity of the network, and the encoding length of the data given the model, which depends on the closeness of the probabilities $q_i$ determined by the network to the true probabilities $p_i$, i.e., on the accuracy of the model.

We could use the expression $-N \sum_i p_i \log_2 q_i$ directly to evaluate the encoding length of the data given the model. However, it involves a summation over all the atomic events, and the number of atomic events is exponential in the number of variables. Instead of trying to use this expression directly, we investigate the relationship between encoding length and network topology.

Let the underlying joint distribution over the variables $X = \{X_1, \dots, X_n\}$ be $P$. Any Bayesian network model will also define a joint distribution $Q$ over these variables. We can express $Q$ as

$$Q(X) = P(X_1 \mid F_{X_1})\, P(X_2 \mid F_{X_2}) \cdots P(X_n \mid F_{X_n})$$

where $F_{X_i}$ is the (possibly empty) set of parents of $X_i$ in the network. Note that $P$ appears on the right-hand side instead of $Q$: we obtain the conditional probability parameters on the right from frequency counts taken over the data points, and by the law of large numbers we would expect these frequency counts to be close to the true probabilities $P$. (It might not be the case that $P$ is equal to this decomposition. The approximation introduced by our network model is precisely the assumption of such a decomposition.)

We can now prove the following new result, which is the basis for our new localized description length computation.

Theorem. The encoding length of the data, $-N \sum_i p_i \log_2 q_i$, can be expressed as

$$-N \sum_{i=1}^{n} W(X_i, F_{X_i}) \;-\; N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log_2 P(X_i)$$

where the second sum is taken over all possible instantiations of $X_i$. The term $W(X_i, F_{X_i})$ is given by

$$W(X_i, F_{X_i}) = \sum_{X_i,\, F_{X_i}} P(X_i, F_{X_i}) \log_2 \frac{P(X_i, F_{X_i})}{P(X_i)\, P(F_{X_i})}$$

where the sum is taken over all possible instantiations of $X_i$ and its parents $F_{X_i}$, and we take $W(X_i, F_{X_i}) = 0$ if $F_{X_i} = \emptyset$. The proof of this and all other theorems is presented in our full report [LBa].

Given some collection of raw data, the last term in this expression is independent of the structure of the network. Furthermore, the weight measure (the first term) can be calculated locally.

Localization of the Description Length

To make use of the MDL principle, we need to evaluate the total description length (the sum of the two encoding lengths) given a Bayesian network. Adding the network encoding length to the data encoding length given by the theorem above, the total description length is

$$\sum_{i=1}^{n} \Big( k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_i} s_j \Big) \;-\; N \sum_{i=1}^{n} W(X_i, F_{X_i}) \;-\; N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log_2 P(X_i)$$

$$=\; \sum_{i=1}^{n} \Big( k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_i} s_j - N\, W(X_i, F_{X_i}) \Big) \;-\; N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log_2 P(X_i)$$
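The coding argument in this section can be checked numerically on a toy distribution. The sketch below builds Huffman codeword lengths with Python's standard `heapq` module, encodes data distributed according to assumed true probabilities $p$ with codes designed for assumed model probabilities $q$, and compares the result with $-\sum_i p_i \log_2 q_i$. The distributions are made up and chosen to be dyadic, so that Huffman lengths equal $-\log_2 q_i$ exactly; in general they only approximate it. Function names are illustrative.

```python
import heapq
from math import log2


def huffman_lengths(probs):
    """Codeword length that Huffman's algorithm assigns to each event index."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, event indices in subtree)
    heap = [(pr, i, [i]) for i, pr in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:        # each merge adds one bit to these codewords
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths


def expected_bits(p, lengths):
    """Average codeword length when events occur with true probabilities p."""
    return sum(pi * li for pi, li in zip(p, lengths))


def cross_entropy(p, q):
    """-sum_i p_i log2 q_i, taking 0 log 0 to be 0."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.125, 0.125]   # assumed true probabilities of four events
q = [0.25, 0.25, 0.25, 0.25]    # assumed probabilities from a learned model
print(expected_bits(p, huffman_lengths(p)))  # 1.75 = cross_entropy(p, p)
print(expected_bits(p, huffman_lengths(q)))  # 2.0  = cross_entropy(p, q)
```

Multiplying by $N$ gives the approximate length of the whole database. The extra 0.25 bits per data point is the cost of coding with $q$ instead of $p$, in line with Gibbs' theorem.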

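The decomposition of the data encoding length into per-node terms can be checked on a small data set. In the sketch below, $P$ is taken to be the empirical distribution of a handful of made-up records and $Q$ is the product of empirical conditionals for an assumed parent map. `data_length_direct` sums over the atomic events that occur in the data, while `data_length_local` uses only counts over each node and its parents; for empirical frequencies the two agree up to floating-point rounding, which is what makes the local computation possible. All names and data are illustrative.

```python
from collections import Counter
from math import log2


def data_length_direct(records, parents):
    """-sum over atomic events of N P(x) log2 Q(x), with Q(x) = prod_i P(x_i | F_i)."""
    total = 0.0
    for event, c in Counter(tuple(sorted(r.items())) for r in records).items():
        x = dict(event)            # c equals N * P(x) for the empirical P
        q = 1.0
        for node, pa in parents.items():
            match_pa = [s for s in records if all(s[p] == x[p] for p in pa)]
            match_all = [s for s in match_pa if s[node] == x[node]]
            q *= len(match_all) / len(match_pa)
        total -= c * log2(q)
    return total


def data_length_local(records, parents):
    """-N sum_i W(X_i, F_i) - N sum_i sum_{x_i} P(x_i) log2 P(x_i)."""
    n_rec = len(records)
    total = 0.0
    for node, pa in parents.items():
        child = Counter(r[node] for r in records)
        total -= sum(c * log2(c / n_rec) for c in child.values())
        if pa:  # W is taken to be 0 for a node without parents
            fam = Counter((r[node], tuple(r[p] for p in pa)) for r in records)
            par = Counter(tuple(r[p] for p in pa) for r in records)
            w = sum((c / n_rec) * log2(c * n_rec / (child[x] * par[f]))
                    for (x, f), c in fam.items())
            total -= n_rec * w
    return total


rows = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1),
        (1, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
records = [dict(A=a, B=b, C=c) for a, b, c in rows]
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}
print(data_length_direct(records, parents), data_length_local(records, parents))
```

The direct version has to enumerate joint instantiations of all variables, whereas the local version touches only each node's family, so its cost grows with family sizes rather than with the number of atomic events.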



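The first (network) term of the total description length depends only on the parent sets and the number of values of each node. A minimal sketch, assuming the network is given as a mapping from each node to its list of parents and that each probability is stored in $d = 32$ bits; the names `network_length`, `parents`, and `card`, and the value of $d$, are illustrative assumptions rather than details from the paper.

```python
from math import log2, prod


def network_length(parents, card, d=32):
    """sum_i [ k_i log2 n + d (s_i - 1) prod_{j in F_i} s_j ].

    parents: dict node -> list of parent nodes (F_i)
    card:    dict node -> number of values s_i
    d:       bits per stored probability (assumed)
    """
    n = len(parents)
    total = 0.0
    for node, pa in parents.items():
        total += len(pa) * log2(n)                                  # list the k_i parents
        total += d * (card[node] - 1) * prod(card[p] for p in pa)   # free parameters
    return total


card = {"A": 2, "B": 2, "C": 2}
print(network_length({"A": [], "B": [], "C": []}, card))          # 96.0
print(network_length({"A": [], "B": [], "C": ["A", "B"]}, card))  # 192 + 2*log2(3), about 195.17
```

Giving C two binary parents costs about 99 extra bits here; in the MDL sum that cost has to be repaid by a shorter data encoding.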
The last term in the total description length remains constant for a fixed collection of raw data. Therefore, the first term is sufficient to compare the total description lengths of alternative candidate Bayesian networks.

Definition. The node description length $DL_i$ for the node $X_i$ with respect to its parents $F_{X_i}$ is defined as

$$DL_i = k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_i} s_j - N\, W(X_i, F_{X_i})$$

Definition. The relative total description length for a Bayesian network, defined as the summation of the node description length of every node in the network, is given by

$$\sum_{i=1}^{n} DL_i$$

As a result, the relative total description length is exactly equivalent to the first term of the total description length, and thus is sufficient for comparing candidate networks. Moreover, it can be calculated locally, since each $DL_i$ depends only on the set of parent nodes for a given node $X_i$.

Definition. Given a collection of raw data, an optimal Bayesian network is a Bayesian network for which the total description length is minimal.

Clearly, one or more optimal Bayesian networks must exist for any collection of raw data. Furthermore, we have the following result.

Theorem. Given a collection of raw data, the relative total description length of an optimal Bayesian network is minimum. Also, for a given node $X_i$ in an optimal Bayesian network, $DL_i$ is minimum among those parent sets creating no cycle and not making the network disconnected. That is, we cannot reduce $DL_i$ by modifying the network to change $X_i$'s parents.

This theorem says that in an optimal network no single node can be locally improved. It is possible, however, that a non-optimal network could also possess this property. In such a case, the parent sets of a number of nodes would have to be altered simultaneously in order to reduce its description length.

Incorporating Partial Domain Knowledge

Although we might not know the underlying joint distribution governing the behavior of the domain variables, we could possibly have other partial information about the domain. In particular, our new system can consider two types of domain knowledge: direct causation specifications and partial ordering specifications.

By direct causation information we mean information of the form "$X_i$ is a direct cause of $X_j$". That is, we might know of a direct causal link between two variables even if we do not know the causal relationships between the other variables. This kind of information might be provided by domain experts, and we can use it when generating the network model. In particular, we can require that in the learned model $X_i$ be one of $X_j$'s parents, thus ensuring that the model validates the direct causation. More generally, the domain experts might be able to construct a skeleton of the network involving some but not all of the variables. The arcs in the skeleton can be given as direct causation specifications to our system, which will then proceed to fill in the skeleton, placing the remaining variables in appropriate positions.

Partial ordering information, on the other hand, specifies ordering relationships between two nodes. Such information might, for example, come from knowledge about the temporal evolution of events in our domain. For instance, if we know that $X_i$ occurs before $X_j$, the network model should not contain a path from $X_j$ to $X_i$, as no causal influence should exist in that direction. Note that a total ordering among the variables, as required by Cooper and Herskovits [CH], is just a special case of our partial ordering specifications.

Subject to the condition that the direct causation and partial ordering specifications given as input to the system do not entail any transitivity violations (e.g., we cannot have a circular sequence of direct causations), our system can ensure that the constructed network validates these specifications. Furthermore, information of this sort can in fact lead to increased efficiency: it will constrain our search for an appropriate network model.

To incorporate this information we define a constrained Bayesian network as follows.

Definition. A constrained Bayesian network is an ordinary Bayesian network whose topology includes all the arcs specified by the direct causation specifications and does not violate any partial ordering specification.

It can be shown that the preceding theorem still holds, with obvious modifications, if we consider constrained Bayesian networks instead of ordinary networks.

Searching for the Best Constrained Network

Although our expression for the relative total description length allows us to evaluate the relative merit of candidate network models, we cannot consider all possible networks: there are simply too many of them (an exponential number, in fact). Hence, to apply the MDL principle we must engage in a heuristic search that tries to find a good (low description length), but not necessarily optimal, network model.
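Because $DL_i$ depends only on $X_i$ and its parents, it can be estimated from family counts alone. A minimal sketch, assuming records are dicts mapping variable names to values, probabilities are estimated by relative frequencies in the records, and $d = 32$; the names `weight` and `node_dl` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2, prod


def weight(records, child, parents):
    """Empirical W(X_i, F_Xi): mutual information between the child and the
    joint instantiation of its parents (0 when there are no parents)."""
    if not parents:
        return 0.0
    N = len(records)
    joint = Counter((r[child], tuple(r[p] for p in parents)) for r in records)
    child_c = Counter(r[child] for r in records)
    par_c = Counter(tuple(r[p] for p in parents) for r in records)
    w = 0.0
    for (x, f), c in joint.items():
        # P(x,f) log2 [P(x,f) / (P(x) P(f))] with every P estimated as count / N
        w += (c / N) * log2(c * N / (child_c[x] * par_c[f]))
    return w


def node_dl(records, child, parents, card, n, d=32):
    """DL_i = k_i log2 n + d (s_i - 1) prod_j s_j - N W(X_i, F_Xi)."""
    n_params = (card[child] - 1) * prod(card[p] for p in parents)
    return (len(parents) * log2(n) + d * n_params
            - len(records) * weight(records, child, parents))


records = [{"A": a, "B": a} for a in (0, 1)] * 50   # B always equals A
card = {"A": 2, "B": 2}
print(node_dl(records, "B", [], card, n=2))     # 32.0: no parents, W = 0
print(node_dl(records, "B", ["A"], card, n=2))  # -35.0: 1 + 64 - 100 * 1 bit
```

Making A a parent of B costs 33 extra bits of structure but saves 100 bits in the data term, so the arc lowers $DL_B$; with independent A and B, $W$ would be 0 and the arc would only add cost.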
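Whether a candidate network is a constrained Bayesian network, in the sense of the definition above, can be checked with simple graph searches. The sketch assumes that a network is a dict of parent lists, that a direct causation specification is a `(cause, effect)` pair, and that a partial ordering specification `(before, after)` forbids any directed path from `after` to `before`; the function names are hypothetical.

```python
def has_path(parents, src, dst):
    """True if a directed path src -> ... -> dst exists.

    parents maps each node to its parent list, so we walk backwards from
    dst through parents looking for src (src == dst counts as a path)."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return False


def is_constrained(parents, causes, orderings):
    # Every direct causation arc must be present.
    for cause, effect in causes:
        if cause not in parents.get(effect, []):
            return False
    # Acyclic: an arc p -> node closes a cycle if node already reaches p.
    for node, pa in parents.items():
        for p in pa:
            if has_path(parents, node, p):
                return False
    # "before occurs before after": no directed path from after to before.
    for before, after in orderings:
        if has_path(parents, after, before):
            return False
    return True


net = {"A": [], "B": ["A"], "C": ["B"]}
print(is_constrained(net, causes=[("A", "B")], orderings=[("A", "C")]))  # True
print(is_constrained(net, causes=[("A", "B")], orderings=[("C", "A")]))  # False: path A -> B -> C
```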

In this section we describe our search algorithm, which attempts to find a good network by building one up arc by arc. The first step is to rank the possible arcs, so that better arcs can be added into the candidate networks before others. The arcs are ranked by calculating the node description length for $X_j$ given the arc $X_i \rightarrow X_j$, using the definition of $DL_j$ and treating $X_i$ as the single parent. This node description length is assigned as the description length of arc $X_i \rightarrow X_j$.

A list of arcs, Pairs, is created, sorted so that the first arc on Pairs has the lowest description length. Pairs will contain all arcs except for those violating the direct causation or partial ordering specifications. Looking at the node description length, we can see that if $X_i$ and $X_j$ are highly correlated, as measured by $W(X_j, X_i)$, the description length will be lower, and an arc between them will be one of the first that we will try to add to the candidate networks.

Search is performed by a best-first algorithm that maintains Open and Closed lists, each containing search elements. The individual search elements have two components $\langle G, L \rangle$: a candidate network $G$, and an arc $L$ which could be added to the candidate network without causing a cycle or violating the partial ordering and direct causation specifications. Open is ordered by heuristic value, which is calculated as the relative total description length of the element network plus the description length of the element arc (calculated during the construction of Pairs). Therefore, the lower the heuristic value, the shorter the encoding length. Initially, we construct a network $G_0$ containing only those arcs included in the direct causation specifications. Then the initial Open list is generated by generating all of the search elements $\langle G_0, L \rangle$ for all arcs $L \in$ Pairs. Best-first search is then executed, with the search element at the front of Open expanded as follows:

1. Remove the first element from Open and copy it onto Closed. Let the element network be $G_{old}$ and the element arc be $L$.
2. Invoke the Arc-absorption procedure on $G_{old}$ and $L$ to obtain a new network $G_{new}$ with the arc $L$ added. The Arc-absorption procedure, described below, might also reverse the direction of some other arcs in $G_{old}$.
3. Next, make a new search element consisting of $G_{new}$ and the first arc from Pairs that appears after the old arc $L$ and could be added to $G_{new}$ without generating a cycle or violating a partial ordering specification. This new element is placed on Open in the correct order according to the heuristic function.
4. Finally, make another new search element consisting of $G_{old}$ and the first arc from Pairs that appears after $L$ and could be added to $G_{old}$ without generating a cycle or violating a partial ordering specification. Again, this element is placed on Open in the correct order.

Now we describe the Arc-absorption procedure, which finds a locally optimal way to insert a new arc into an existing network. To minimize the description length of the resulting network, the procedure might also decide to reverse the direction of some of the old arcs.

Input: a network $G_{old}$ and an arc $X_i \rightarrow X_j$ to be added.
Output: a new network $G_{new}$ with the arc added and some other arcs possibly reversed.

1. Create a new network by adding the arc $X_i \rightarrow X_j$ to $G_{old}$. In the new network, we then search locally to determine if we can decrease the relative total description length by reversing the direction of some of the arcs. This is accomplished via the following steps:
   (a) Determine the optimal directionality of the arcs attached directly to $X_j$ by examining which directions minimize the relative total description length. Some of these arcs may be reversed by this process. We do not consider the reversal of any arcs that would result in the violation of the direct causation or partial ordering specifications.
   (b) If the direction of an existing arc is reversed, then perform the above directionality determination step on the other node affected.

The search procedure is mainly composed of the Arc-absorption procedure, a cycle checking routine, and a partial order checking routine. The complexities of cycle checking and of partial order checking are both polynomial in $n$, the number of nodes. We have found that the search can create a very reasonable network model if provided with a resource bound, polynomial in $n$, on the number of search element expansions. Under this resource bound we have found that, in practice, the overall complexity of the search mechanism remains polynomial in the number of nodes $n$.

We can further observe that when the amount of domain information increases, the search time needed to find a good network model decreases. This arises from the fact that such information places constraints on the space of allowable models, making search easier. For example, if a total ordering among the nodes in the domain is given, the search time will be reduced by a factor polynomial in $n$: there is no need to perform the cycle or partial order checking, and the arc reversal step in Arc-absorption is no longer needed.

Note that it is sufficient to compute the node description length only for those nodes whose parents have been changed; the relative total description length of the whole network need not be recomputed.
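The best-first loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: Arc-absorption's arc reversals are omitted (step 2 simply adds the arc), only the cycle check is performed (partial ordering checks would go into `addable`), the search stops after an assumed `max_expansions` bound, and it returns the lowest-scoring network it has seen, which is also an assumption. Networks are frozensets of `(parent, child)` arcs, and `toy_dl` in the usage lines is a made-up scoring function standing in for the relative total description length.

```python
import heapq
from itertools import count


def creates_cycle(arcs, u, v):
    """True if adding arc u -> v to the set `arcs` closes a directed cycle,
    i.e. if u is already reachable from v."""
    children = {}
    for a, b in arcs:
        children.setdefault(a, []).append(b)
    stack, seen = [v], set()
    while stack:
        x = stack.pop()
        if x == u:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(children.get(x, []))
    return False


def best_first_search(pairs, total_dl, start=frozenset(), max_expansions=200):
    """Simplified best-first search over search elements <G, L>.

    pairs:    the Pairs list, [((u, v), arc_dl), ...] sorted by arc_dl
    total_dl: relative total description length of a network (frozenset of arcs)
    start:    G0, the arcs required by the direct causation specifications
    """
    tie = count()            # tie-breaker so the heap never compares networks
    open_heap, closed = [], set()

    def addable(G, arc):
        # Partial-ordering checks would also go here; only cycles are checked.
        return arc not in G and not creates_cycle(G, *arc)

    def push(G, idx):
        h = total_dl(G) + pairs[idx][1]      # heuristic value of <G, L>
        heapq.heappush(open_heap, (h, next(tie), G, idx))

    def push_next(G, after):
        # First arc on Pairs after position `after` that G can accept.
        for idx in range(after + 1, len(pairs)):
            if addable(G, pairs[idx][0]):
                push(G, idx)
                return

    for idx, (arc, _) in enumerate(pairs):
        if addable(start, arc):
            push(start, idx)

    best, best_score = start, total_dl(start)
    for _ in range(max_expansions):
        if not open_heap:
            break
        _, _, g_old, idx = heapq.heappop(open_heap)   # step 1
        if (g_old, idx) in closed:
            continue
        closed.add((g_old, idx))
        g_new = g_old | {pairs[idx][0]}   # step 2, without arc reversals
        score = total_dl(g_new)
        if score < best_score:
            best, best_score = g_new, score
        push_next(g_new, idx)             # step 3
        push_next(g_old, idx)             # step 4
    return best


good = {("A", "B"), ("B", "C")}


def toy_dl(G):
    # Made-up score: each "good" arc saves 4 bits, any other arc costs 3.
    return 10 - 4 * len(G & good) + 3 * len(G - good)


pairs = [(("A", "B"), 1.0), (("B", "C"), 2.0), (("A", "C"), 3.0),
         (("B", "A"), 4.0), (("C", "B"), 5.0)]
print(sorted(best_first_search(pairs, toy_dl)))  # [('A', 'B'), ('B', 'C')]
```

Keeping both $\langle G_{new}, \cdot \rangle$ and $\langle G_{old}, \cdot \rangle$ successors lets the search later skip an arc it tried early, which a purely greedy arc-by-arc construction could not do.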




Figure: The Quality of Learned Networks (panels: the original network with its conditional probability parameters; parameter sets that are the same as the first except for some conditional probabilities; the learned structures).
Experiments

Following [CH], we test our approach by constructing an original network and using Henrion's logic sampling technique [Hen] to generate a collection of raw data. We then apply our learning mechanism to the raw data to obtain a learned network. By comparing this network with the original, we can determine the performance of our system.

In the first set of experiments, the original Bayesian network G consisted of … nodes and … arcs. We varied the conditional probability parameters during the process of generating the raw data, obtaining four different sets of data. Exhaustive searching, instead of heuristic searching, was then carried out to find the network with minimum total description length for each of these sets of raw data, resulting in different learned structures in each case. The experiment demonstrates that our algorithm does in fact yield a tradeoff between the accuracy and the complexity of the learned structures: in all cases where the original network was not recovered, a simpler network was learned. The type of structure learned depends on the parameters, because each set of parameters, in conjunction with the structure, defines a different probability distribution. Some of these distributions can be accurately modeled with simpler structures. In the first case, the distribution defined by the parameters did not have a simpler model of sufficient accuracy, but in the other cases it did. We have also developed measures of the absolute accuracy of the learned structures (see [LBb] for a full description); these indicate that in all cases the learned structure was very accurate, even though it might possess a different topology.

The second experiment consisted of learning a Bayesian network with a fairly large number of variables (… nodes and … arcs). This network was derived from a real-world application in medical diagnosis [BSCC] and is known as the ALARM network (see [LBb] for a diagram of this network). After applying our heuristic search algorithm, we found that the learned network is almost identical to the original structure, with the exception of one different arc and one missing arc. One characteristic of our heuristic search algorithm is that we did not require a user-supplied ordering of the variables (cf. Cooper and Herskovits [CH]). This experiment demonstrates the feasibility of our approach for recovering networks of practical size.

Besides being able to use extra domain information, our new search mechanism is faster and more accurate than the mechanism first reported in [LBb], which was developed without the local measure of description length. To investigate how our search mechanism behaves when domain information is supplied, we conducted some further experiments. Using the same set of raw data derived from the ALARM model, in conjunction with varying amounts of domain information, we applied our learning algorithm and recorded the search time required to obtain an accurate network model. The following two tables depict the relative time required by the search algorithm when varying amounts of direct causation and partial ordering specifications are made available. In general, the search time decreases as the amount of causal information increases.

     | no partial ordering | … partial orderings | … partial orderings | total ordering
time | …                   | …                   | …                   | …

     | no direct causal specification | … direct causal specifications | … direct causal specifications
time | …                              | …                              | …
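The raw data in these experiments are generated from an original network by Henrion's logic sampling [Hen], that is, by drawing each variable in a topological order from its conditional distribution given the already-sampled parent values. The sketch below shows that forward-sampling step under stated assumptions: the dictionary layout for conditional probability tables, the function name `logic_sample`, and the two-node network `a -> b` are hypothetical and are not the networks used in the experiments above.

```python
import random


def logic_sample(order, parents, cpt, n_records, seed=0):
    """Generate complete records by forward (logic) sampling.

    order   -- the nodes in a topological order (parents before children)
    parents -- parents[v] is a tuple of v's parent nodes
    cpt     -- cpt[v][parent_values] is a list of probabilities over v's values
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_records):
        rec = {}
        for v in order:
            # Parents were sampled earlier in this record, so their values exist.
            pa_vals = tuple(rec[p] for p in parents[v])
            probs = cpt[v][pa_vals]
            rec[v] = rng.choices(range(len(probs)), weights=probs)[0]
        records.append(rec)
    return records


# Hypothetical two-node network a -> b with binary variables.
order = ["a", "b"]
parents = {"a": (), "b": ("a",)}
cpt = {
    "a": {(): [0.7, 0.3]},
    "b": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
}
data = logic_sample(order, parents, cpt, n_records=5000, seed=1)
freq_a = sum(r["a"] for r in data) / len(data)
b_given_a1 = [r["b"] for r in data if r["a"] == 1]
print(round(freq_a, 2), round(sum(b_given_a1) / len(b_given_a1), 2))
```

With enough records, the empirical frequencies approach the parameters of the original network (here roughly 0.3 and 0.8), which is what allows a learned network to be compared against the one that generated the data. Varying the entries of `cpt` while keeping `parents` fixed corresponds to producing different data sets from the same structure.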
Refinement of Existent Networks

Besides the advantages outlined above, our new local computation of description length also allows for the possibility of refining an existing network by modifying some local part of it. Refinement is based on the following theorem.

Theorem: Let $X = \{X_1, \ldots, X_n\}$ be the nodes in an existent Bayesian network, $X'$ be a subset of $X$, and $DL_{X'}$ be the total node description length of all the nodes in $X'$, i.e., $DL_{X'} = \sum_{X_i \in X'} DL_i$. Suppose we find a new set of parents for every node in $X'$ that does not create any cycles or make the network disconnected, and let the new total node description length of all the nodes in $X'$ be $DL^{new}_{X'}$. Then we can construct a new network, in which the parents of the nodes in $X'$ are replaced by their new parent sets, that has lower total description length whenever $DL^{new}_{X'} < DL_{X'}$.

This theorem provides a means to improve a Bayesian network without evaluating the total description length of the whole network, a potentially expensive task if the network is large. We can isolate a subset of nodes and try to improve that collection locally, ignoring the rest of the network. Algorithms for performing such a refinement based on this theorem have been developed, and experiments are being performed. We hope to report on this work in the near future [LBa].

References

[Abr] B. Abramson. ARCO: An application of belief networks to the oil market. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[BBS] C. Berzuini, R. Bellazzi, and D. Spiegelhalter. Bayesian networks applied to therapy monitoring. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[BSCC] I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the European Conference on Artificial Intelligence in Medicine.

[CC] R. Chavez and G. F. Cooper. A randomized approximation algorithm for probabilistic inference on Bayesian belief networks. Networks.

[CH] G. F. Cooper and E. Herskovits. A Bayesian method for constructing Bayesian belief networks from databases. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[CL] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory.

[CLR] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, Massachusetts.

[Coo] G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence.

[GPP] D. Geiger, A. Paz, and J. Pearl. Learning causal trees from dependence information. In Proceedings of the AAAI National Conference.

[Hen] M. Henrion. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In L. N. Kanal and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence Vol. II. North-Holland, Amsterdam.

[LBa] W. Lam and F. Bacchus. Learning and refining Bayesian networks using partial domain information. In preparation.

[LBb] W. Lam and F. Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence, to appear.

[LH] D. A. Lelewer and D. S. Hirschberg. Data compression. ACM Computing Surveys.

[Pea] J. Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence.

[Pea] J. Pearl. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence.

[Pea] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California.

[PV] J. Pearl and T. S. Verma. A theory of inferred causation. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning.

[Ris] J. Rissanen. Modeling by shortest data description. Automatica.

[Ris] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific.

[RP] G. Rebane and J. Pearl. The recovery of causal poly-trees from statistical data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[SGS] P. Spirtes, C. Glymour, and R. Scheines. Causality from probability. In Evolving Knowledge in Natural Science and Artificial Intelligence.

[SP] R. D. Shachter and M. A. Peot. Simulation approaches to general probabilistic inference on belief networks. In M. Henrion, R. D. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence. North-Holland, Amsterdam.

[VP] T. Verma and J. Pearl. Causal networks: Semantics and expressiveness. In R. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence. North-Holland, Amsterdam.
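The acceptance test in the theorem of the Refinement of Existent Networks section can be sketched as follows. This is a minimal sketch, not the refinement algorithms of [LBa]. It assumes that a network is a dictionary mapping each node to its list of parents, that "disconnected" means not weakly connected (arc directions ignored), and that a caller supplies a node description-length function. The helper names `is_acyclic`, `is_connected`, `refine_subset`, `toy_dl` and all description-length values in the example are hypothetical.

```python
from collections import deque


def is_acyclic(parents):
    """Kahn's algorithm on the graph parent -> child."""
    indeg = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for c in children[u]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(parents)


def is_connected(parents):
    """Weak connectivity: ignore arc directions."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
    start = next(iter(parents))
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(parents)


def refine_subset(parents, new_parent_sets, node_dl):
    """Swap in new parent sets for a subset X' if DL over X' decreases."""
    candidate = {**parents, **new_parent_sets}
    if not (is_acyclic(candidate) and is_connected(candidate)):
        return parents, False
    # Only the nodes in X' are scored; the rest of the network is ignored.
    old = sum(node_dl(v, parents[v]) for v in new_parent_sets)
    new = sum(node_dl(v, candidate[v]) for v in new_parent_sets)
    return (candidate, True) if new < old else (parents, False)


# Hypothetical node description lengths (bits), keyed by (node, parent set).
table = {
    ("B", frozenset({"A"})): 40.0, ("B", frozenset({"A", "C"})): 33.0,
    ("D", frozenset({"B"})): 25.0, ("D", frozenset({"C"})): 28.0,
}


def toy_dl(v, ps):
    return table[(v, frozenset(ps))]


net = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"]}
refined, accepted = refine_subset(net, {"B": ["A", "C"], "D": ["C"]}, toy_dl)
_, ok = refine_subset(net, {"A": ["C"]}, toy_dl)  # would create A -> C -> A
print(accepted, refined["D"], ok)  # → True ['C'] False
```

In the first call the subset {B, D} drops from 65 to 61 bits, so the new parent sets are accepted without scoring A or C. The second call is rejected by the cycle check before any description length is computed.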