Appears in Proceedings of Uncertainty in Artificial Intelligence.

Using Causal Information and Local Measures to Learn Bayesian Networks

Wai Lam and Fahiem Bacchus
Department of Computer Science
University of Waterloo
Waterloo, Ontario, Canada
Abstract

In previous work we developed a method for learning Bayesian network models from raw data. This method relies on the well-known minimal description length (MDL) principle. The MDL principle is particularly well suited to this task, as it allows us to trade off, in a principled way, the accuracy of the learned network against its practical usefulness. In this paper we present some new results that have arisen from our work. In particular, we present a new local way of computing the description length. This allows us to make significant improvements in our search algorithm. In addition, we modify our algorithm so that it can take into account partial domain information that might be provided by a domain expert. The local computation of description length also opens the door for local refinement of an existent network. The feasibility of our approach is demonstrated by experiments involving networks of a practical size.

Wai Lam's work was supported by an OGS scholarship. His e-mail address is wlam@math.uwaterloo.ca.

Fahiem Bacchus's work was supported by NSERC and by IRIS. His e-mail address is fbacchus@logos.uwaterloo.ca.

Introduction

Bayesian networks, advanced by Pearl [Pea], have become an important paradigm for representing and reasoning under uncertainty. Systems based on Bayesian networks have been constructed in a number of different application areas, ranging from medical diagnosis [BBS] to oil price reasoning [Abr]. Despite these successes, a major obstacle to using Bayesian networks lies in the difficulty of constructing them in complex domains: there is a knowledge engineering bottleneck. Clearly, it would be extremely useful if the construction process could be fully or partly automated. A useful approach that has recently been pursued by a number of authors is to attempt to build, or learn, a network model from raw data. In practice, raw data is often available from databases of records.

We have developed a new approach to learning Bayesian network models [LBb]. Our approach is based on Rissanen's Minimal Description Length (MDL) principle [Ris]. The MDL principle offers a means for trading off model complexity and accuracy, and our experiments have demonstrated its suitability for this task. In this paper we present significant improvements to our original system [LBb] which make it more efficient, allow it to take into consideration domain information about causation and ordering, and allow local refinement of an existing network.

These improvements are mainly based on a new analysis of the description length parameter that shows how we can evaluate the description length of a proposed network via local computations involving only a node and its parents. This localized evaluation of description length allows us to develop an improved searching mechanism that performs well even in fairly large domains. In addition, it allows us to modify our search procedure so that it can take into consideration domain knowledge of direct causes as well as partial orderings among the variables. Such partial information about the structure of the domain is quite common, and in many cases it can reduce the complexity of the searching process during learning.

The localized evaluation of description length also allows us to modify an existing Bayesian network by refining a local part of it. By refining the network we obtain a more accurate model, or adapt an existing model to an environment that has changed over time.

In the sequel we will first describe briefly the key features of our previous work, concentrating in particular on the advantages of the MDL approach. Then we derive a new localized version of the description length computation. Using this we develop an algorithm that
searches for a good network model, taking into consideration causal and ordering information about the domain. Finally, we discuss the results of various experiments we have run that demonstrate the effectiveness of our approach. The experimental results of our work on local refinement of an existing network are not yet complete, but we will close with a brief discussion of the method. The experimental results will be reported in our full report [LBa].

Learning Bayesian Networks

Much early work on learning Bayesian networks shares the common disadvantage of relying on assumptions about the underlying distribution being learned. For example, Chow and Liu [CL] developed methods that construct tree-structured networks; hence their method provides no guarantees about the accuracy of the learned structure if the underlying distribution cannot be expressed by a tree structure. The approach of Rebane and Pearl [RP], as well as that of Geiger et al. [GPP], suffers from the same criticism, except that they are able to construct singly connected networks. Spirtes et al. [SS], as well as Verma and Pearl [VP, PV], develop approaches that are able to construct multiply connected networks, but they require the underlying distribution to be dag-isomorphic.

Note: A distribution is dag-isomorphic if there is some dag that displays all of its dependencies and independencies [Pea].

The problem with making an assumption about the underlying distribution is that generally we do not have sufficient information to test our assumption. The underlying distribution is unknown; all we have is a collection of records in the form of variable instantiations. Hence, in practice, these methods offer no guarantees about the accuracy of the learned model, except in the rare circumstances where we know something about the underlying distribution.

Our approach can construct an accurate model from an unrestricted range of underlying distributions, and it is capable of constructing networks of arbitrary topology, i.e., it can construct multiply connected networks. The ability to construct multiply connected networks is sometimes essential if the network is to be a sufficiently accurate model of the underlying distribution.

Although multiply connected networks allow us to more accurately model the underlying distribution, they have computational as well as conceptual disadvantages. Exact belief updating procedures are, in the worst case, computationally intractable over multiply connected networks [Coo]. Moreover, even if an approximation algorithm is used (e.g., the stochastic simulation methods of [C ea SP]), highly connected networks still require the storage and estimation of an exponential number of conditional probability parameters. Hence, even if a highly connected network is more accurate, in practice it might not be as useful a model as a simpler, albeit slightly less accurate, model. In addition to the computational disadvantages, the causal relationships between the variables are conceptually more difficult to understand in a complex network.

Note: The number of parameters required is exponential in the maximum number of parents of a node.

Hence, we are faced with a tradeoff. More complex networks allow for more accurate models, but at the same time such models may be of less practical use than simpler models. The MDL principle allows us to balance this tradeoff: our method will learn a less complex network if that network is sufficiently accurate, and at the same time it is still capable of learning a complex network if no simpler one is sufficiently accurate. This seems to be a particularly appropriate approach to take in light of the fact that we only have a sample of data points from the underlying distribution. That is, it seems inappropriate to try to learn the most accurate model of the underlying distribution given that the raw data only provides us with an approximate picture of it.

Among other works on learning Bayesian networks, the most closely related is that of Cooper and Herskovits [CH]. They use a Bayesian approach that, like ours, is capable of learning multiply connected networks. However, as with all Bayesian approaches, they must choose some prior distribution over the space of possible networks. One way of viewing the MDL principle is as a mechanism for choosing a reasonable prior that is biased towards simpler models. Cooper and Herskovits [CH] investigate a number of different priors, but it is unclear how any particular choice will influence the end result. The MDL principle, on the other hand, allows the system designer, who can choose different ways of encoding the network, to choose a prior based on principles of computational efficiency. For example, if we prefer to learn networks in which no node has more than a given number of parents, we can choose an encoding scheme that imposes a high penalty on networks that violate this constraint.

Applying the MDL Principle

The MDL principle is based on the idea that the best model representing a collection of data items is the model that minimizes the sum of

1. the length of the encoding of the model, and

2. the length of the encoding of the data given the model,

both of which can be measured in bits. A detailed description of the MDL principle, with numerous examples of its application, can be found in [Ris].
To apply the MDL principle to the task of learning Bayesian networks we need to specify how we can perform the two encodings: the network itself (the first item) and the raw data given a network (the second item).

Encoding the Network. Our encoding scheme for the networks has the property that the higher the topological complexity of the network, the longer will be its encoding. To represent the structure of a Bayesian network we need, for each node, a list of its parents and a list of its conditional probability parameters.

Suppose there are $n$ nodes in the problem domain. For a node with $k$ parents, we need $k \log_2 n$ bits to list its parents. To represent the conditional probabilities, the encoding length will be the product of the number of bits required to store the numerical value of each conditional probability and the total number of conditional probabilities that are required. In a Bayesian network, a conditional probability is needed for every distinct instantiation of the parent nodes and the node itself (except that one of these conditional probabilities can be computed from the others, due to the fact that they all sum to 1). For example, if a node that can take on $s$ distinct values has $k$ parents, each of which can take on $r$ distinct values, we will need $r^k (s - 1)$ conditional probabilities.

Hence, the total description length for a particular network will be

$$\sum_{i=1}^{n} \Bigl[ k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_i} s_j \Bigr],$$

where there are $n$ nodes; for node $i$, $k_i$ is the number of its parent nodes, $s_i$ is the number of values it can take on, and $F_i$ is the set of its parents; and $d$ represents the number of bits required to store a numerical value. For a particular problem domain, $n$ and $d$ will be constants. This is not the only encoding scheme possible, but it is simple and it performs well in our experiments.

By looking at this equation, we see that highly connected networks require longer encodings. First, for many nodes the list of parents will become larger, and second, the list of conditional probabilities we need to store for that node will also increase. In addition, networks in which nodes that have a larger number of values have parents with a large number of values will require longer encodings. Hence, the MDL principle, which is trying to minimize the sum of the encoding lengths, will tend to favor networks in which the nodes have a smaller number of parents (i.e., networks that are less connected) and also networks in which nodes taking on a large number of values are not parents of nodes that also take on a large number of values.

In Bayesian networks the degree of connectivity is closely related to the computational complexity of using the network, both space and time complexity. Hence, our encoding scheme generates a preference for more efficient networks. That is, since the encoding length of the model is included in our evaluation of description length, we are enforcing a preference for networks that require the storage of fewer probability parameters and on which exact algorithms are more efficient.

Encoding the Data Using the Model. The task is to learn the joint distribution of a collection of random variables $\vec{X} = \{X_1, \ldots, X_n\}$. Each variable $X_i$ has an associated collection of values $\{x_i^1, \ldots, x_i^{s_i}\}$ that it can take on, where the number of values $s_i$ will in general depend on $i$. Every distinct choice of values for all the variables in $\vec{X}$ defines an atomic event in the underlying joint distribution and is assigned a particular probability by that distribution.

We assume that the data points in the raw data are all atomic events. That is, each data point specifies a value for every random variable in $\vec{X}$. Furthermore, we assume that the data points are the result of independent random trials. Hence, we would expect, via the central limit theorem, that each particular instantiation of the variables would eventually appear in the database with a relative frequency approximately equal to its probability. These are standard assumptions.

Given a collection of $N$ data points, we want to encode, or store, the data as a binary string. There are various ways in which this encoding can be done, but here we are only interested in using the length of the encoding as a metric, via the second item of the MDL principle, for comparing the merit of candidate Bayesian networks. Hence, we can limit our attention to character codes [CLR]. With character codes, each atomic event is assigned a unique binary string. Each of the data points, which are all atomic events, is converted to its character code, and the $N$ points are represented by the string formed by concatenating these character codes together. To minimize the total length of the encoding we assign shorter codes to events that occur more frequently. This is the basis for Huffman's encoding scheme. It is well known that Huffman's algorithm yields the shortest encoding of the $N$ data points [LH].

Say that in the underlying distribution each atomic event $e_i$ has probability $p_i$, and we construct, via some learning scheme, a particular Bayesian network from the raw data. This Bayesian network acts as a model of the underlying distribution, and it also assigns a probability, say $q_i$, to every atomic event $e_i$. Of course, in general $q_i$ will not be equal to $p_i$, as the learning scheme cannot guarantee that it will construct a perfectly accurate network. Nevertheless, the aim is for $q_i$ to be close to $p_i$, and the closer it is, the more accurate is our model.

The constructed Bayesian network is intended as our best guess representation of the underlying distribution. Hence, given that the probabilities $q_i$ determined
by the network are our best guess of the true values $p_i$, it makes sense to design our Huffman code using these probabilities. Using the $q_i$ probabilities, the Huffman algorithm will assign event $e_i$ a codeword of length approximately $-\log_2 q_i$. If we had the true probabilities $p_i$, the algorithm would have assigned $e_i$ an optimal codeword of length $-\log_2 p_i$ instead. Despite our use of the values $q_i$ in assigning codewords, the raw data will continue to be determined by the true probabilities $p_i$. That is, we still expect that for large $N$ we will have $N p_i$ occurrences of event $e_i$, as $p_i$ is the true probability of $e_i$ occurring. Therefore, when we use the learned Bayesian network to encode the data, the length of the string encoding the database will be approximately

$$-N \sum_{i} p_i \log_2 q_i,$$

where we are summing over all atomic events. How does this encoding length compare to the encoding length if we had access to the true probabilities $p_i$? An old theorem due originally to Gibbs gives us the answer.

Theorem (Gibbs). Let $p_i$ and $q_i$, $i = 1, \ldots, t$, be non-negative real numbers that sum to 1. Then

$$-\sum_{i=1}^{t} p_i \log_2 p_i \;\le\; -\sum_{i=1}^{t} p_i \log_2 q_i,$$

with equality holding if and only if $p_i = q_i$ for all $i$. In the summation we take $0 \log_2 0$ to be 0.

In other words, this theorem shows that the encoding using the estimated probabilities $q_i$ will be longer than the encoding using the true probabilities $p_i$. It also says that the true probabilities achieve the minimal encoding length possible.

The MDL principle says that we must choose a network that minimizes the sum of its own encoding length, which depends on the complexity of the network, and the encoding length of the data given the model, which depends on the closeness of the probabilities $q_i$ determined by the network to the true probabilities $p_i$, i.e., on the accuracy of the model.

We could use the expression $-N \sum_i p_i \log_2 q_i$ directly to evaluate the encoding length of the data given the model. However, it involves a summation over all the atomic events, and the number of atomic events is exponential in the number of variables. Instead of trying to use this expression directly, we investigate the relationship between encoding length and network topology.

Let the underlying joint distribution over the variables $\vec{X} = \{X_1, \ldots, X_n\}$ be $P$. Any Bayesian network model will also define a joint distribution $Q$ over these variables. We can express $Q$ as

$$Q(\vec{X}) = P(X_1 \mid F_{X_1})\, P(X_2 \mid F_{X_2}) \cdots P(X_n \mid F_{X_n}),$$

where $F_{X_i}$ is the (possibly empty) set of parents of $X_i$ in the network. Note that $P$ appears on the right-hand side instead of $Q$: we obtain the conditional probability parameters on the right from frequency counts taken over the data points. By the law of large numbers, we would expect that these frequency counts will be close to the true probabilities over $P$.

Note: It might not be the case that $P$ is equal to this decomposition. The approximation introduced by our network model is precisely the assumption of such a decomposition.

We can now prove the following new result, which is the basis for our new localized description length computations.

Theorem. The encoding length of the data, $-N \sum_i p_i \log_2 q_i$, can be expressed as

$$-N \sum_{i=1}^{n} W(X_i, F_{X_i}) \;-\; N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log_2 P(X_i),$$

where the second sum is taken over all possible instantiations of $X_i$. The term $W(X_i, F_{X_i})$ is given by

$$W(X_i, F_{X_i}) = \sum_{X_i, F_{X_i}} P(X_i, F_{X_i}) \log_2 \frac{P(X_i, F_{X_i})}{P(X_i)\, P(F_{X_i})},$$

where the sum is taken over all possible instantiations of $X_i$ and its parents $F_{X_i}$, and we take $W(X_i, F_{X_i}) = 0$ if $F_{X_i} = \emptyset$. The proof of this and all other theorems is presented in our full report [LBa].

Given some collection of raw data, the last term in this expression is independent of the structure of the network. Furthermore, the weight measure, the first term in this expression, can be calculated locally.

Localization of the Description Length

To make use of the MDL principle, we need to evaluate the total description length (the sum of the two items) given a Bayesian network. Adding the network encoding length and the data encoding length, the total description length is

$$\sum_{i=1}^{n} \Bigl[ k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_{X_i}} s_j \Bigr] - N \sum_{i=1}^{n} W(X_i, F_{X_i}) - N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log_2 P(X_i)$$

$$= \sum_{i=1}^{n} \Bigl[ k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_{X_i}} s_j - N\,W(X_i, F_{X_i}) \Bigr] - N \sum_{i=1}^{n} \sum_{X_i} P(X_i) \log_2 P(X_i).$$
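As a small numerical check of the decomposition theorem above, the following Python sketch compares the data encoding length computed directly, by summing $-\log_2 Q$ over the data points, with the localized form built from the weights $W$ and the marginal terms. It is only an illustration under stated assumptions: $P$ is taken to be the empirical distribution given by frequency counts, and the function names (`empirical`, `weight`, `direct_length`, `local_length`), the toy data, and the parent sets are all hypothetical and not part of the original system.

```python
import math
from collections import Counter

def empirical(data, cols):
    """Relative frequencies of the value combinations taken by columns `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in data)
    return {values: c / len(data) for values, c in counts.items()}

def weight(data, i, parents):
    """W(X_i, F_i) in bits, with P estimated by frequency counts; 0 if no parents."""
    if not parents:
        return 0.0
    joint = empirical(data, [i] + parents)
    p_x, p_f = empirical(data, [i]), empirical(data, parents)
    return sum(p * math.log2(p / (p_x[key[:1]] * p_f[key[1:]]))
               for key, p in joint.items())

def marginal_term(data, i):
    """Sum over the values of X_i of P(X_i) * log2 P(X_i)."""
    return sum(p * math.log2(p) for p in empirical(data, [i]).values())

def direct_length(data, parent_sets):
    """Sum over data points of -log2 Q(point), i.e. -N * sum_e p_e * log2 q_e."""
    total = 0.0
    for i, parents in enumerate(parent_sets):
        joint, p_f = empirical(data, [i] + parents), empirical(data, parents)
        for row in data:
            key = (row[i],) + tuple(row[p] for p in parents)
            total -= math.log2(joint[key] / p_f[key[1:]])  # -log2 P(x_i | parents)
    return total

def local_length(data, parent_sets):
    """Localized form: -N * sum_i W(X_i, F_i) - N * sum_i sum P(X_i) log2 P(X_i)."""
    return -len(data) * sum(weight(data, i, ps) + marginal_term(data, i)
                            for i, ps in enumerate(parent_sets))

# Toy data over three binary variables (made up for illustration).
data = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1),
        (1, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
parent_sets = [[], [0], [0, 1]]  # X0 has no parents, X1 <- X0, X2 <- X0, X1

direct = direct_length(data, parent_sets)
local = local_length(data, parent_sets)
print(direct, local)
```

With empirical frequencies standing in for $P$, the two quantities agree up to floating-point rounding, while the localized form only ever tabulates one node together with its parents.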
The last term in the total description length, $-N \sum_{i} \sum_{X_i} P(X_i) \log_2 P(X_i)$, remains constant for a fixed collection of raw data. Therefore the first term is sufficient to compare the total description lengths of alternative candidate Bayesian networks.

Definition. The node description length $DL_i$ for the node $X_i$ with respect to its parents $F_{X_i}$ is defined as

$$DL_i = k_i \log_2 n + d\,(s_i - 1) \prod_{j \in F_{X_i}} s_j - N\,W(X_i, F_{X_i}).$$

Definition. The relative total description length for a Bayesian network, defined as the summation of the node description length of every node in the network, is given by

$$\sum_{i=1}^{n} DL_i.$$

As a result, the relative total description length is exactly equivalent to the first term in the total description length, and thus is sufficient for comparing candidate networks. Moreover, it can be calculated locally, since each $DL_i$ depends only on the set of parent nodes for a given node $X_i$.

Definition. Given a collection of raw data, an optimal Bayesian network is a Bayesian network for which the total description length is minimum.

Clearly, one or more optimal Bayesian networks must exist for any collection of raw data. Furthermore, we have the following result.

Theorem. Given a collection of raw data, the relative total description length of an optimal Bayesian network is minimum. Also, for a given node $X_i$ in an optimal Bayesian network, $DL_i$ is minimum among those parent sets creating no cycle and not making the network disconnected. That is, we cannot reduce $DL_i$ by modifying the network to change $X_i$'s parents.

This theorem says that in an optimal network no single node can be locally improved. It is possible, however, that a non-optimal network could also possess this property. In such a case the parent sets of a number of nodes would have to be altered simultaneously in order to reduce its description length.

Incorporating Partial Domain Knowledge

Although we might not know the underlying joint distribution governing the behavior of the domain variables, we could possibly have other partial information about the domain. In particular, our new system can consider two types of domain knowledge: direct causation specifications and partial ordering specifications.

By direct causation information we mean information of the form "$X_i$ is a direct cause of $X_j$". That is, we might know of a direct causal link between two variables even if we do not know the causal relationships between the other variables. This kind of information might be provided by, e.g., domain experts, and we can use it when generating the network model. In particular, we can require that in the learned model $X_i$ be one of $X_j$'s parents, thus ensuring that the model validates the direct causation. More generally, the domain experts might be able to construct a skeleton of the network involving some, but not all, of the variables. The arcs in the skeleton can be specified as direct causation specifications to our system, which will then proceed to fill in the skeleton, placing the remaining variables in appropriate positions.

Partial ordering information, on the other hand, specifies ordering relationships between two nodes. Such information might, for example, come from knowledge about the temporal evolution of events in our domain. For instance, if we know that $X_i$ occurs before $X_j$, the network model should not contain a path from $X_j$ to $X_i$, as no causal influence should exist in that direction. Note that a total ordering among the variables, as required by Cooper and Herskovits [CH], is just a special case of our partial ordering specifications.

Subject to the condition that the direct causation and partial ordering specifications do not entail any transitivity violations (e.g., we cannot have a circular sequence of direct causations as input to the system), our system can ensure that the constructed network validates these specifications. Furthermore, information of this sort can in fact lead to increased efficiency: it will constrain our search for an appropriate network model.

To incorporate this information we define a constrained Bayesian network as follows.

Definition. A constrained Bayesian network is an ordinary Bayesian network whose topology includes all the arcs specified by the direct causation specifications and does not violate any partial ordering specifications.

It can be shown that the preceding theorem still holds (with obvious modifications) if we consider constrained Bayesian networks instead of ordinary networks.

Searching for the Best Constrained Bayesian Network

Although our expression for the relative total description length allows us to evaluate the relative merit of candidate network models, we cannot consider all possible networks: there are simply too many of them (an exponential number, in fact). Hence, to apply the MDL principle we must engage in a heuristic search that tries to find a good (low description length), but not necessarily optimal, network model.
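To make the node description length $DL_i$ and the relative total description length concrete, here is a minimal Python sketch that evaluates both from a table of raw data. It is a sketch under assumptions, not the system described in the paper: probabilities are estimated by frequency counts, the value `d = 16` bits per stored probability is an arbitrary choice, and the names `node_dl`, `relative_total_dl`, and the toy data are hypothetical.

```python
import math
from collections import Counter

def empirical(data, cols):
    """Relative frequencies of the value combinations taken by columns `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in data)
    return {values: c / len(data) for values, c in counts.items()}

def weight(data, i, parents):
    """W(X_i, F_i) in bits, estimated from frequency counts; 0 if no parents."""
    if not parents:
        return 0.0
    joint = empirical(data, [i] + parents)
    p_x, p_f = empirical(data, [i]), empirical(data, parents)
    return sum(p * math.log2(p / (p_x[key[:1]] * p_f[key[1:]]))
               for key, p in joint.items())

def node_dl(data, i, parents, arity, d=16):
    """DL_i = k_i*log2(n) + d*(s_i - 1)*prod_j s_j - N*W(X_i, F_i).

    `arity[v]` is the number of values of variable v; `d` (bits per stored
    probability) is an assumed constant, not a value taken from the paper.
    """
    parents = sorted(parents)
    n, n_points = len(arity), len(data)
    table_size = (arity[i] - 1) * math.prod(arity[j] for j in parents)
    return (len(parents) * math.log2(n) + d * table_size
            - n_points * weight(data, i, parents))

def relative_total_dl(data, parent_sets, arity, d=16):
    """Sum of DL_i over all nodes; parent_sets[i] lists the parents of node i."""
    return sum(node_dl(data, i, ps, arity, d) for i, ps in enumerate(parent_sets))

# Toy data: X1 always copies X0, while X2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 50
arity = [2, 2, 2]

without_arc = relative_total_dl(data, [[], [], []], arity)   # no arcs
with_arc = relative_total_dl(data, [[], [0], []], arity)     # X0 -> X1
spurious = relative_total_dl(data, [[], [0], [0]], arity)    # also X0 -> X2
print(without_arc, with_arc, spurious)
```

On this toy data the arc between the two correlated variables lowers the relative total description length, because the gain $N\,W$ outweighs the cost of the extra parameters, whereas the arc into the independent variable only adds encoding cost. Changing one node's parent set changes only that node's $DL_i$, which is what makes local evaluation possible.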



In this section we describe our search algorithm, which attempts to find a good network by building one up arc by arc. The first step is to rank the possible arcs, so that "better" arcs can be added into the candidate networks before others. The arcs are ranked by calculating the node description length for $X_j$ given the arc $X_i \to X_j$, using the definition of the node description length and treating $X_i$ as the single parent. This node description length is assigned as the description length of arc $X_i \to X_j$.

A list of arcs, Pairs, is created, sorted so that the first arc on Pairs has the lowest description length. Pairs will contain all arcs except for those violating the direct causation or partial ordering specifications. Looking at the definition of the node description length, we can see that if $X_i$ and $X_j$ are highly correlated (as measured by the weight $W(X_j, X_i)$), the description length will be lower, and an arc between them will be one of the first that we will try to add to the candidate networks.

Search is performed by a best-first algorithm that maintains Open and Closed lists, each containing search elements. The individual search elements have two components $\langle G, L \rangle$: a candidate network $G$, and an arc $L$ which could be added to the candidate network without causing a cycle or violating the partial ordering and direct causation specifications. Open is ordered by heuristic value, which is calculated as the relative total description length of the element's network plus the description length of the element's arc (calculated during the construction of Pairs). Therefore, the lower the heuristic value, the shorter the encoding length. Initially, we construct a network $G_{init}$ containing only those arcs included in the direct causation specifications. Then the initial Open list is generated by generating all of the search elements $\langle G_{init}, L \rangle$ for all arcs $L \in$ Pairs. Best-first search is then executed, with the search element at the front of Open expanded as follows.

1. Remove the first element from Open and copy it onto Closed. Let the element's network be $G_{old}$ and the element's arc be $L$.

2. Invoke the Arc-Absorption procedure on $G_{old}$ and $L$ to obtain a new network $G_{new}$ with the arc $L$ added. The Arc-Absorption procedure, described below, might also reverse the direction of some other arcs in $G_{old}$.

3. Next, we make a new search element consisting of $G_{new}$ and the first arc from Pairs that appears after the old arc $L$ which could be added to $G_{new}$ without generating a cycle or violating a partial ordering specification. This new element is placed on Open in the correct order according to the heuristic function.

4. Finally, we make another new search element consisting of $G_{old}$ and the first arc from Pairs that appears after $L$ which could be added to $G_{old}$ without generating a cycle or violating a partial ordering specification. Again, this element is placed on Open in the correct order.

Now we describe the Arc-Absorption procedure, which finds a locally optimal way to insert a new arc into an existing network. To minimize the description length of the resulting network, the procedure might also decide to reverse the direction of some of the old arcs.

Input: A network $G_{old}$. An arc $X_i \to X_j$ to be added.

Output: A new network $G_{new}$ with the arc added and some other arcs possibly reversed.

1. Create a new network by adding the arc $X_i \to X_j$ to $G_{old}$. In the new network we then search locally to determine if we can decrease the relative total description length by reversing the direction of some of the arcs. This is accomplished via the following steps.

2. Determine the optimal directionality of the arcs attached directly to $X_j$ by examining which directions minimize the relative total description length. Some of these arcs may be reversed by this process. Furthermore, we do not consider the reversal of any arcs that would result in the violation of the direct causation or partial ordering specifications.

3. If the direction of an existing arc is reversed, then perform the above directionality determination step on the other node affected.

Note: In step 2 it is sufficient to compute the node description length of those nodes whose parents have been changed. The relative total description length of the whole network need not be computed.

The search procedure is mainly composed of the Arc-Absorption procedure, a cycle checking routine, and a partial order checking routine. The complexities of cycle checking and of partial order checking are each polynomial in $n$, where $n$ is the number of nodes. We have found that the search can arrive at a very reasonable network model if provided with a resource bound, polynomial in $n$, on the number of search element expansions. Under this resource bound, we have found that in practice the overall complexity of the search mechanism remains polynomial in the number of nodes $n$.

We can further observe that when the amount of domain information increases, the search time to find a good network model decreases. This arises from the fact that such information places constraints on the space of allowable models, making search easier. For example, if a total ordering among the nodes in the domain is given, the search time will be reduced by a factor polynomial in $n$: there is no need to perform the cycle or partial order checking, and the arc reversal step in Arc-Absorption is no longer needed.
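The overall shape of the search described above (ranking arcs into Pairs, then best-first expansion of elements $\langle G, L \rangle$) can be sketched in Python as follows. This is a deliberately simplified illustration and not the authors' implementation: the Arc-Absorption step is reduced to simply adding the arc (no arc reversal), direct causation and partial ordering specifications are omitted, no Closed list is kept, the resource bound is a plain `max_expansions` argument, and `d = 16` and all function names are assumptions.

```python
import heapq
import itertools
import math
from collections import Counter

def empirical(data, cols):
    counts = Counter(tuple(row[c] for c in cols) for row in data)
    return {values: c / len(data) for values, c in counts.items()}

def weight(data, i, parents):
    if not parents:
        return 0.0
    joint = empirical(data, [i] + parents)
    p_x, p_f = empirical(data, [i]), empirical(data, parents)
    return sum(p * math.log2(p / (p_x[k[:1]] * p_f[k[1:]])) for k, p in joint.items())

def node_dl(data, i, parents, arity, d):
    parents = sorted(parents)
    table_size = (arity[i] - 1) * math.prod(arity[j] for j in parents)
    return (len(parents) * math.log2(len(arity)) + d * table_size
            - len(data) * weight(data, i, parents))

def total_dl(data, net, arity, d):
    """Relative total description length; net[i] is the parent set of node i."""
    return sum(node_dl(data, i, ps, arity, d) for i, ps in enumerate(net))

def creates_cycle(net, i, j):
    """True if adding arc i -> j closes a directed cycle (j is an ancestor of i)."""
    stack, seen = [i], set()
    while stack:
        v = stack.pop()
        if v == j:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(net[v])
    return False

def learn(data, arity, max_expansions, d=16):
    n = len(arity)
    # Pairs: every arc i -> j ranked by the node DL of j with i as its single parent.
    pairs = sorted((node_dl(data, j, [i], arity, d), i, j)
                   for i in range(n) for j in range(n) if i != j)
    g_init = tuple(frozenset() for _ in range(n))
    open_list, tie = [], itertools.count()  # tie-breaker keeps heap entries comparable

    def push(net, start):
        """Put <net, first addable arc at or after position `start`> on Open."""
        for idx in range(start, len(pairs)):
            cost, i, j = pairs[idx]
            if i not in net[j] and not creates_cycle(net, i, j):
                h = total_dl(data, net, arity, d) + cost
                heapq.heappush(open_list, (h, next(tie), net, idx))
                return

    best, best_dl = g_init, total_dl(data, g_init, arity, d)
    for idx, (cost, _, _) in enumerate(pairs):  # initial Open: <G_init, L> for all L
        heapq.heappush(open_list, (best_dl + cost, next(tie), g_init, idx))
    for _ in range(max_expansions):
        if not open_list:
            break
        _, _, g_old, idx = heapq.heappop(open_list)
        _, i, j = pairs[idx]
        # Simplified stand-in for Arc-Absorption: add the arc, never reverse others.
        g_new = tuple(ps | {i} if v == j else ps for v, ps in enumerate(g_old))
        dl_new = total_dl(data, g_new, arity, d)
        if dl_new < best_dl:
            best, best_dl = g_new, dl_new
        push(g_new, idx + 1)  # element built from G_new and the next addable arc
        push(g_old, idx + 1)  # element built from G_old and the next addable arc
    return best, best_dl

# Toy data: X1 always copies X0, while X2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 50
arity = [2, 2, 2]
best, best_dl = learn(data, arity, max_expansions=20)
print([sorted(ps) for ps in best], best_dl)
```

Because only the node whose parent set changed needs its description length recomputed, a fuller implementation could cache the per-node values rather than calling `total_dl` on every expansion; the sketch recomputes everything for brevity.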
[Figure: The Quality of Learned Networks. Panels: the original network structure over the nodes A, B, C, D, E with its conditional probability parameters, and the learned structures.]
Experiments

Following [CH], we test our approach by constructing an original network and using Henrion's logical sampling technique [Hen] to generate a collection of raw data. We then apply our learning mechanism to the raw data to obtain a learned network. By comparing this network with the original we can determine the performance of our system.

In the first set of experiments the original Bayesian network G consisted of [...] nodes and [...] arcs. We varied the conditional probability parameters during the process of generating the raw data, obtaining four different sets of data. Exhaustive searching, instead of heuristic searching, was then carried out to find the network with minimum total description length for each of these sets of raw data, resulting in different learned structures in each case. The experiment demonstrates that our algorithm does in fact yield a tradeoff between accuracy and complexity of the learned structures: in all cases where the original network was not recovered, a simpler network was learned. The type of structure learned depends on the parameters, as each set of parameters, in conjunction with the structure, defines a different probability distribution. Some of these distributions can be accurately modeled with simpler structures. In the first case the distribution defined by the parameters did not have a simpler model of sufficient accuracy, but in the other cases it did. We have also developed measures of the absolute accuracy of the learned structures (see [LBb] for a full description) that indicate in all cases that the learned structure was very accurate even though it might possess a different topology.

The second experiment consisted of learning a Bayesian network with a fairly large number of variables ([...] nodes and [...] arcs). This network was derived from a real-world application in medical diagnosis [BSCC] and is known as the ALARM network (see [LBb] for a diagram of this network). After applying our heuristic search algorithm we found that the learned network is almost identical to the original structure, with the exception of one different arc and one missing arc. One characteristic of our heuristic search algorithm is that we did not require a user-supplied ordering of variables (cf. Cooper and Herskovits [CH]). This experiment demonstrates the feasibility of our approach for recovering networks of practical size.

Besides being able to use extra domain information, our new search mechanism is faster and more accurate than the mechanism first reported in [LBb], which was developed without the local measure of description length. To investigate how our search mechanism behaves when domain information is supplied we conducted some further experiments. Using the same set of raw data derived from the ALARM model, in conjunction with varying amounts of domain information, we applied our learning algorithm and recorded the search time required to obtain an accurate network model. The following two tables depict the relative time required by the search algorithm when varying amounts of direct causation and partial ordering specifications are made available. In general the search time decreases as the amount of causal information increases.

[Table: relative search time with no partial ordering, with two different numbers of partial orderings, and with a total ordering.]

[Table: relative search time with no direct causal specification and with two different numbers of direct causal specifications.]

Refinement of Existent Networks

Besides the advantages outlined above, our new local computation of description length also allows for the possibility of refining an existing network by modifying some local part of it. Refinement is based on the following theorem.

Theorem. Let $X = \{X_1, \ldots, X_n\}$ be the nodes in an existent Bayesian network, $X'$ be a subset of $X$, and $DL_{X'}$ be the total node description length of all the nodes in $X'$, i.e., $DL_{X'} = \sum_{X_i \in X'} DL_i$. Suppose we find a new set of parents for every node in $X'$ that does not create any cycles or make the network disconnected. Let the new total node description length of all the nodes in $X'$ be $DL_{new X'}$. Then we can construct a new network in which the parents of the nodes in $X'$ are replaced by their new parent sets, such that the new network will have lower total description length if $DL_{new X'} < DL_{X'}$.

This theorem provides a means to improve a Bayesian network without evaluating the total description length of the whole Bayesian network, a potentially expensive task if the network is large. We can isolate a subset of nodes and try to improve that collection locally, ignoring the rest of the network. Algorithms for performing such a refinement based on this theorem have been developed and experiments are being performed. We hope to report on this work in the near future [LBa].
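The acceptance test implied by the refinement theorem can be sketched in a few lines. This is an illustrative sketch only, not the refinement algorithm the authors refer to: the function names `is_acyclic`, `is_connected`, and `try_refine` are hypothetical, every node is assumed to appear as a key of the parent dictionary, and `toy_dl` is a made-up placeholder score rather than the paper's node description length.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Optional

Node = Hashable
Parents = Dict[Node, FrozenSet[Node]]


def is_acyclic(parents: Parents) -> bool:
    """Repeatedly strip nodes that have no remaining parents."""
    remaining = {n: set(ps) for n, ps in parents.items()}
    while remaining:
        roots = [n for n, ps in remaining.items() if not ps]
        if not roots:
            return False  # every remaining node has a parent: a cycle exists
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def is_connected(parents: Parents) -> bool:
    """Check connectivity of the underlying undirected graph."""
    nodes = list(parents)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)


def try_refine(
    parents: Parents,
    proposal: Parents,
    dl_fn: Callable[[Node, FrozenSet[Node]], float],
) -> Optional[Parents]:
    """Return the refined network if the theorem's condition holds, else None.

    Only the nodes in the proposal (the subset X') are scored.
    """
    candidate = {**parents, **proposal}
    if not (is_acyclic(candidate) and is_connected(candidate)):
        return None
    old_dl = sum(dl_fn(n, parents[n]) for n in proposal)
    new_dl = sum(dl_fn(n, proposal[n]) for n in proposal)
    return candidate if new_dl < old_dl else None


def toy_dl(node, parents):
    """Placeholder score (assumption): NOT the paper's description length."""
    return 1.0 + 2.0 * len(parents)


# Hypothetical network A -> B, A -> C, B -> C.
net = {"A": frozenset(), "B": frozenset({"A"}), "C": frozenset({"A", "B"})}

accepted = try_refine(net, {"C": frozenset({"B"})}, toy_dl)
rejected_cycle = try_refine(net, {"A": frozenset({"C"})}, toy_dl)
rejected_disconnected = try_refine(
    net, {"B": frozenset(), "C": frozenset()}, toy_dl
)
```

The first proposal lowers the summed score of the subset and keeps the graph acyclic and connected, so it is accepted. The other two are rejected before any scoring because they would create a cycle or disconnect the network.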
References

[Abr] B. Abramson. ARCO: An application of belief networks to the oil market. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[BBS] C. Berzuini, R. Bellazzi, and D. Spiegelhalter. Bayesian networks applied to therapy monitoring. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[BSCC] I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the European Conference on Artificial Intelligence in Medicine.

[CC] R. Chavez and G. F. Cooper. A randomized approximation algorithm for probabilistic inference on Bayesian belief networks. Networks.

[CH] G. F. Cooper and E. Herskovits. A Bayesian method for constructing Bayesian belief networks from databases. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[CL] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory.

[CLR] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, Massachusetts.

[Coo] G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence.

[GPP] D. Geiger, A. Paz, and J. Pearl. Learning causal trees from dependence information. In Proceedings of the AAAI National Conference.

[Hen] M. Henrion. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In L. N. Kanal and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence, Vol. II. North-Holland, Amsterdam.

[LBa] W. Lam and F. Bacchus. Learning and refining Bayesian networks using partial domain information. In preparation.

[LBb] W. Lam and F. Bacchus. Learning Bayesian belief networks: an approach based on the MDL principle. Computational Intelligence. To appear.

[LH] D. A. Lelewer and D. S. Hirschberg. Data compression. ACM Computing Surveys.

[Pea] J. Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence.

[Pea] J. Pearl. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence.

[Pea] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California.

[PV] J. Pearl and T. S. Verma. A theory of inferred causation. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning.

[Ris] J. Rissanen. Modeling by shortest data description. Automatica.

[Ris] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific.

[RP] G. Rebane and J. Pearl. The recovery of causal poly-trees from statistical data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

[SP] R. D. Shachter and M. A. Peot. Simulation approaches to general probabilistic inference on belief networks. In M. Henrion, R. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence. North-Holland, Amsterdam.

[SGS] P. Spirtes, C. Glymour, and R. Scheines. Causality from probability. In Evolving Knowledge in Natural Science and Artificial Intelligence.

[VP] T. Verma and J. Pearl. Causal networks: Semantics and expressiveness. In R. Shachter, T. S. Levitt, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence. North-Holland, Amsterdam.