Relational Bayesian Networks

Manfred Jaeger

Computer Science Department,

Stanford University, Stanford CA 94305

jaeger@robotics.stanford.edu

Abstract

A new method is developed to represent probabilistic relations on multiple random events. Where previously knowledge bases containing probabilistic rules were used for this purpose, here a probability distribution over the relations is directly represented by a Bayesian network. By using a powerful way of specifying conditional probability distributions in these networks, the resulting formalism is more expressive than the previous ones. Particularly, it provides for constraints on equalities of events, and it allows the definition of complex, nested combination functions.

1 INTRODUCTION

In a standard Bayesian network, nodes are labeled with random variables (r.v.s) X that take values in some finite set {x_1, …, x_n}. A network with r.v.s (earth)quake, burglary, and alarm, each with possible values {true, false}, for instance, then defines a joint probability distribution for these r.v.s.

Evidence, E, is a set of instantiations of some of the r.v.s. A query asks for the probability of a specific value x of some r.v. X, given the instantiations in the evidence. The answer to this query is the conditional probability P(X = x | E) in the distribution P defined by the network.

The implicit underlying assumption we here make is that the value assignments in the evidence and the query instantiate the attributes of one single random event, or object, that has been sampled (observed) according to the distribution of the network. If, for instance, E = {quake = true, alarm = true}, then both instantiations are assumed to refer to one single observed state of the world ω, and not the facts that there was an earthquake in 1906, and the alarm bell is ringing right now.

On leave from: Max-Planck-Institut für Informatik, Im Stadtwald, D-66123 Saarbrücken, Germany

In case we indeed have evidence about several observed events, e.g. quake(ω_1) = true, alarm(ω_1) = true, burglary(ω_2) = false, then, for the purpose of answering a query X(ω_i) = x about one of these events, all evidence about other events can be ignored, and only P(X(ω_i) = x | E(ω_i)) needs to be computed. For each of these computations the same Bayesian network can be used.

Things become much different when we also want to model relations that may hold between two different random events. Suppose, for instance, we also want to say something about the probability that one earthquake was stronger than another. For this we use the binary relation stronger, and would like to relate the probability of stronger(ω_1, ω_2) to, say, alarm(ω_1) and alarm(ω_2). Evidence may now contain instantiations of stronger for many different pairs of states: {stronger(ω_1, ω_2), …, stronger(ω_{n−1}, ω_n)}, and a query may be alarm(ω_1). In evaluating this query, we no longer can ignore information about the other events ω_2, …, ω_n. This means, however, that if we do not want to impose an a priori restriction on the number of events we can have evidence for, no single fixed Bayesian network with finite-range r.v.s will be sufficient to evaluate queries for arbitrary evidence sets.

Nevertheless, the probabilistic information that we would like to encode about relations between an arbitrary number of different events may very well be expressible by some finite set of laws, applicable to an arbitrary number of events. One way of expressing such laws, which has been explored in the past ((Breese 1992), (Poole 1993), (Haddawy 1994)), is to use probabilistic rules such as

stronger(u, v) ←_{0.8} quake(u) ∧ quake(v) ∧ alarm(u) ∧ ¬alarm(v).  (1)

The intended meaning here is: for all states of the world ω_1 and ω_2, given that quake(ω_1) ∧ … ∧ ¬alarm(ω_2) is true, the probability that ω_1 is stronger than ω_2 is 0.8. A rulebase containing expressions of this form then can be used to construct, for each specific evidence and query, a Bayesian network over binary r.v.s stronger(ω_1, ω_2), stronger(ω_1, ω_3), …, quake(ω_1), …, in which the answer to the query subsequently is computed using standard Bayesian network inference.

In all the above mentioned approaches, quite strong syntactic and/or semantic restrictions are imposed in the formalism that severely limit its expressiveness. Poole (1993) does not allow the general expressiveness of rules like (1), but only combines deterministic rules with the specification of certain unconditional probabilities. Haddawy (1994) allows only rules in which the antecedent does not contain free variables that do not appear in the consequent. As pointed out by Glesner and Koller (1995), this is a severe limitation. For instance, we can then not express by a rule like aids(x) ←_p contact(x, y) that the probability of person x having aids depends on any other person y with whom x had sexual contact. When we do permit an additional free variable y in this manner, it also has to be defined how the probability of the consequent is affected when there exist multiple instantiations of y that make the antecedent true (this question also arises when several rules with the same consequent are permitted in the rule base). In (Glesner & Koller 1995) and (Ngo, Haddawy & Helwig 1995), therefore, a combination rule is added to the rulebase, which defines how the conditional probabilities arising from different instantiations, or rules, are to be combined. If the different causal relationships described by the rules are understood to be independent, then the combination rule typically will be noisy-or.

The specification of a single combination rule applied to all sets of instantiations of applicable rules, again, does not permit us to describe certain important distinctions. If, for instance, we have a rule that relates aids(x) to the relation contact(x, y), and another rule that relates aids(x) to the relation donor(x, y), standing for the fact that x has received a blood transfusion from donor y, then the probability computed for aids(a), using a simple combination rule, will depend only on the number of instantiations for contact(a, y) and for donor(a, y). Particularly, we are not able to make special provisions for the two rules to be instantiated by the same element b, even though the case contact(a, b) ∧ donor(a, b) clearly has to be distinguished from the case contact(a, b) ∧ donor(a, c), or even contact(a, b) ∧ donor(a, a).

In this paper a representation formalism is developed that incorporates constraints on the equality of instantiating elements, and thereby allows us to define different probabilities in situations only distinguished by equalities between instantiating elements.

Furthermore, our representation method will allow us to specify hierarchical, or nested, combination rules. As an illustration of what this means, consider the unary predicate cancer(x), representing that person x will develop cancer at some time, and the three-place relation exposed(x, y, z), representing that organ y of person x was exposed to radiation at time z (by the taking of an x-ray, intake of radioactively contaminated food, etc.). Suppose, now, that for person x we have evidence E = {exposed(x, y_i, z_j) | i = 1, …, k; j = 1, …, l}, where y_i = y_{i′} for some i ≠ i′, and z_j = z_{j′} for some j ≠ j′. Assume that for any specific organ y, multiple exposures of y to radiation have a cumulative effect on the risk of developing cancer of y, so that noisy-or is not the adequate rule to model the combined effect of instances exposed(x, y, z_j) on the probability of developing cancer of y. On the other hand, developing cancer at any of the various organs y can be viewed as independent causes for developing cancer at all. Thus, a single rule of the form cancer(x) ←_p exposed(x, y, z) together with a flat combination rule is not sufficient to model the true probabilistic relationships. Instead, we need to use one rule to first combine for every fixed y the instances given by different z, and then use another rule (here noisy-or) to combine the effect of the different y's.

To permit constraints on the equality of instantiating elements, and to allow for hierarchical definitions of combination functions, in this paper we depart from the method of representing our information in a knowledge base containing different types of rules. Instead, we here use Bayesian networks with a node for every relation symbol r of some vocabulary S, which is seen as a r.v. whose values are possible interpretations of r in some specific domain D. The state space of these relational Bayesian networks therefore can be identified with the set of all S-structures over D, and its semantics is a probability distribution over S-structures, as were used by Halpern (1990) to interpret first-order probabilistic logic. Halpern and Koller (1996) have used Markov networks labeled with relation symbols for representing conditional independencies in probability distributions over S-structures. This can be seen as a qualitative analog to the quantitative relational Bayesian networks described here.

2 THE BASIC FRAMEWORK

In medical example domains it is often natural to make the domain closure assumption, i.e. to assume that the domain under consideration consists just of those objects mentioned in the knowledge base. The following example highlights a different kind of situation, where a definite domain of objects is given over which the free variables are to range, yet there is no evidence about most of these objects.

Example 2.1 Robot T-Bayes 0.1 moves in an environment consisting of n distinct locations. T-Bayes 0.1 can make direct moves from any location x to any location y unless the (directed) path x → y is blocked. This happens to be the case with probability p_1 for all x ≠ y. At each time step T-Bayes 0.1, as well as a certain number of other robots operating in this domain, make one move along an unblocked path x → y, x ≠ y. T-Bayes 0.1 just has completed the task it was assigned to do, and is now in search of new instructions. It can receive these instructions either by reaching a terminal location from where a central task-assigning computer can be accessed, or by meeting another robot that will assign T-Bayes 0.1 a subtask of its own job. Unfortunately, T-Bayes 0.1 only has the vaguest idea of where the terminal locations are, or where the other robots are headed. The best model of its environment that it can come up with is that every location x is a terminal location with probability p_2, and that any unblocked path x → y is likely to be taken by at least one robot at any given time step with probability p_3. In order to plan its next move, T-Bayes 0.1 tries to evaluate for every location x the probability that going to x leads to success, defined as either getting instructions at x directly, or being able to access a terminal location in one more move from x. Hence, the probability of s(uccess)(x) is 1 if t(erminal)(x) is true, or if t(z) and ¬b(locked)(x, z) holds for some z. Otherwise, there still is a chance of s(x) being true, determined by the number of incoming paths z → x, each of which is likely to be taken by another robot with probability p_3. Assuming a fairly large number of other robots, the event that z → x is taken by some robot can be viewed as independent from z′ → x being taken by a robot, so that the overall probability that another robot will reach location x is given by 1 − (1 − p_3)^k, where k = |{z | z ≠ x, ¬b(z, x)}|, i.e. by combining the individual probabilities via noisy-or.
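The noisy-or combination at the end of the example can be sketched numerically; the values of p_3 and k below are illustrative, not fixed by the example:

```python
def noisy_or(probs):
    """Noisy-or: probability that at least one of several
    independent events (given by their probabilities) occurs."""
    result = 1.0
    for p in probs:
        result *= 1.0 - p
    return 1.0 - result

# k unblocked incoming paths z -> x, each taken with probability p3
p3, k = 0.3, 4
p_reached = noisy_or([p3] * k)   # equals 1 - (1 - p3)**k
```

Note that noisy_or of the empty list is 0, matching the empty-set convention used later in the paper.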

The foregoing example gives an informal description of how the probability of s(x) is evaluated, given the predicates b and t. Also, the probabilities for b and t are given. Piecing all this information together (and assuming independence whenever no dependence has been mentioned explicitly), we obtain for every finite domain D of locations a probability distribution P_D for the {b, t, s}-structures over D.

Our aim now is to represent this class of probability distributions in compact form as a Bayesian network with nodes b, t, and s. Given the description of the dependencies in the example, it is clear that this network should have two edges: one leading from b to s, and one leading from t to s. The more interesting problem is how to specify the conditional probability of the possible values of each node (i.e. the possible interpretations of the symbol at that node), given the values of its parent nodes. For the two parentless nodes in our example this is accomplished very easily: for a given domain D, and for all locations x, y ∈ D we have

P(b(x, y)) = p_1 if x ≠ y, and P(b(x, y)) = 0 if x = y,  (2)

P(t(x)) = p_2.  (3)

Here P(b(x, y)) stands for the probability that (x, y) belongs to the interpretation of b. Similarly for P(t(x)). Since b(x, y) and b(x′, y′) for (x, y) ≠ (x′, y′), respectively t(x) and t(x′) for x ≠ x′, were assumed to be mutually independent, this defines a probability distribution over the possible interpretations in D of the two predicates. For example, the probability that I ⊆ D × D is the interpretation of b is 0 if (x, x) ∈ I for some x ∈ D, and p_1^{|I|} (1 − p_1)^{n(n−1)−|I|} else.
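The closed form above can be checked by brute force on a tiny domain; the domain and p_1 below are illustrative:

```python
from itertools import combinations, product

def prob_interpretation_b(I, domain, p1):
    """Probability that the set of ordered pairs I is the interpretation
    of b: 0 if I contains a loop, else p1^|I| * (1-p1)^(n(n-1)-|I|)."""
    if any((x, x) in I for x in domain):
        return 0.0
    n = len(domain)
    return p1 ** len(I) * (1 - p1) ** (n * (n - 1) - len(I))

domain = ["l1", "l2", "l3"]
p1 = 0.4
pairs = [(x, y) for x, y in product(domain, repeat=2) if x != y]
# summing over all loop-free interpretations recovers total probability 1
total = sum(prob_interpretation_b(set(I), domain, p1)
            for r in range(len(pairs) + 1)
            for I in combinations(pairs, r))
```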

Next, we have to define the probability of interpretations of s. Given interpretations of b and t, the events s(x) and s(x′) are independent for x ≠ x′. Also, example 2.1 contains a high-level description of how the probability of s(x) is to be computed. Our aim now is to formalize this computation rule in such a manner that P(s(x)) can be computed by evaluating a single functional expression, in the same manner as P(b(x, y)) and P(t(x)) are given by (2) and (3).

Since P(s(x)) depends on the interpretations of b and t, we begin with functional expressions that access these interpretations. This is done by using indicator functions I_b(x, y) and I_t(x). I_b(x, y), for example, evaluates to 1 if (x, y) is in the given interpretation I(b) of b, and to 0 otherwise. Though the function I_b(x, y) has to be distinguished from the logical expression b(x, y), for the benefit of greater readability, in the sequel the simpler notation will be used for both. Thus, b(x, y) stands for the function I_b(x, y) whenever it appears within a functional expression.

In order to find a suitable functional expression F_s(x) for P(s(x)), assume first that t(x) is true. Since t(x) implies s(x), in this case we need to obtain F_s(x) = 1. In the case ¬t(x), the probability of s(x) is computed by considering all locations z ≠ x for which either ¬b(x, z) or ¬b(z, x). Any such z that satisfies ¬b(x, z) ∧ t(z) again makes s(x) true with probability 1. If only ¬b(z, x) holds, then the location z merely contributes a probability p_3 to P(s(x)). Thus, for any z, the contribution of z to P(s(x)) is given by max{t(z)(1 − b(x, z)), p_3 (1 − b(z, x))}. Combining all the relevant z via noisy-or, we obtain the formula

F_s(x) = n-o{max{t(z)(1 − b(x, z)), p_3 (1 − b(z, x))} | z; z ≠ x}  (4)

for x with ¬t(x).

Abbreviating the functional expression on the right-hand side of (4) by H(x), we can finally put the two cases t(x) and ¬t(x) together, defining

F_s(x) = t(x) + (1 − t(x)) · H(x).  (5)
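Expressions in the style of (4) and (5) can be evaluated directly once interpretations of b and t are fixed; the tiny domain and the value of p_3 below are illustrative:

```python
def noisy_or(ps):
    out = 1.0
    for p in ps:
        out *= 1.0 - p
    return 1.0 - out

def F_s(x, domain, b, t, p3):
    """F_s(x) = t(x) + (1 - t(x)) * H(x), where H(x) is the noisy-or
    over all z != x of max{t(z)(1 - b(x,z)), p3 * (1 - b(z,x))}."""
    tx = 1.0 if x in t else 0.0
    H = noisy_or([max((1.0 if z in t else 0.0) * (0.0 if (x, z) in b else 1.0),
                      p3 * (0.0 if (z, x) in b else 1.0))
                  for z in domain if z != x])
    return tx + (1.0 - tx) * H

domain = {"l1", "l2", "l3"}
b = {("l1", "l2")}   # the path l1 -> l2 is blocked
t = {"l2"}           # l2 is a terminal location
```

For l1, the terminal l2 is unreachable (the path is blocked), so each of the two other locations contributes only the robot-meeting probability p_3.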

We now give a general definition of a representation language for forming functional expressions in the style of (5). We begin by describing the general class of combination functions, instances of which are the functions n-o (noisy-or) and max used above.

Definition 2.2 A combination function is any function that maps every finite multiset (i.e. a set possibly containing multiple copies of the same element) with elements from [0,1] into [0,1].

Besides n-o and max, examples of combination functions are min, the arithmetic mean of the arguments, etc. Each combination function must include a sensible definition for its result on the empty set. For example, we here use the conventions n-o(∅) = max(∅) = 0, min(∅) = 1.
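As a sketch, the combination functions named here can be realized on multisets (Python lists) with the stated empty-set conventions; the empty-set value chosen for the mean is a modeling choice not fixed by the text:

```python
def n_o(A):
    """Noisy-or, with n-o(empty) = 0."""
    out = 1.0
    for p in A:
        out *= 1.0 - p
    return 1.0 - out

def comb_max(A):
    """max, with the convention max(empty) = 0."""
    return max(A, default=0.0)

def comb_min(A):
    """min, with the convention min(empty) = 1."""
    return min(A, default=1.0)

def comb_mean(A):
    """Arithmetic mean; empty-set value 0 here is an illustrative choice."""
    return sum(A) / len(A) if A else 0.0
```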

In the following, we use bold type to denote tuples of variables: v = (x_1, …, x_n) for some n. The number of elements in tuple v is denoted by |v|. An equality constraint c(v) for v is a quantifier-free formula over the empty vocabulary, i.e., a formula only containing atomic subformulas of the form x_i = x_j.

Definition 2.3 The class of probability formulas over the relational vocabulary S is inductively defined as follows.

(i) (Constants) Each rational number q ∈ [0,1] is a probability formula.

(ii) (Indicator functions) For every n-ary symbol r ∈ S, and every n-tuple v of variables, r(v) is a probability formula.

(iii) (Convex combinations) When F_1, F_2, F_3 are probability formulas, then so is F_1 F_2 + (1 − F_1) F_3.

(iv) (Combination functions) When F_1, …, F_k are probability formulas, comb is any combination function, v, w are tuples of variables, and c(v, w) is an equality constraint, then comb{F_1, …, F_k | w; c(v, w)} is a probability formula.

Note that special cases of (iii) are multiplication (F_3 = 0) and inversion (1 − F_1, obtained with F_2 = 0, F_3 = 1). The set of free variables of a probability formula is defined in the canonical way. The free variables of comb{…} are the union of the free variables of the F_i, minus the variables in w.

A probability formula F over S in the free variables x_1, …, x_n defines for every S-structure 𝔄 over a domain D a mapping D^n → [0,1]. The value F(a) for a ∈ D^n is defined inductively over the structure of F. We here give the details only for case (iv).

Let F be of the form comb{F_1(v, w), …, F_k(v, w) | w; c(v, w)} (where not necessarily all the variables in v and w actually appear in all the F_i and in c). In order to define F(a), we must specify the multiset represented by

{F_1(a, w), …, F_k(a, w) | w; c(a, w)}.  (6)

Let E ⊆ D^{|w|} be the set {b′ | c(a, b′)}. For each b′ ∈ E and each i ∈ {1, …, k}, by induction hypothesis, F_i(a, b′) ∈ [0,1]. The multiset represented by (6) now is defined as containing as many copies of p ∈ [0,1] as there are representations p = F_i(a, b′) with different i or b′. Note that F_i(a, b′) and F_i(a, b″) count as different representations even in the case that the variables for which b′ and b″ substitute different elements do not actually appear in F_i. The multiset {r(v) | z; z = z}, for instance, contains as many copies of the indicator r(v) as there are elements in the domain over which it is evaluated.

For any tautological constraint like z = z, in the sequel we simply write comb{F_1, …, F_k | w}.

Another borderline case that needs clarification is the case where w is empty. Here our definition degenerates to: if c(a) holds, then the multiset {F_1(a), …, F_k(a) | ∅; c(a)} contains as many copies of p ∈ [0,1] as there are representations p = F_i(a); it is empty if c(a) does not hold.
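The multiset construction for case (iv) can be sketched directly: every pair of a formula index and a satisfying tuple contributes one copy, even when the formula ignores the substituted variables. All names below are illustrative:

```python
from itertools import product

def comb_multiset(formulas, domain, w_len, constraint):
    """Collect the multiset {F_1(b'), ..., F_k(b') | b' in D^|w|; c(b')}.
    Each (formula, tuple) pair contributes its own copy."""
    return [F(bs)
            for bs in product(domain, repeat=w_len)
            if constraint(bs)
            for F in formulas]

domain = ["a", "b", "c"]
# the multiset {r(v) | z; z = z}: one copy of the indicator per element,
# here with the indicator value fixed at 1.0 for illustration
A = comb_multiset([lambda bs: 1.0], domain, 1, lambda bs: True)
```

Any combination function (noisy-or, max, min, …) can then be applied to the resulting list.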

By using indicator functions r(v), the value of F(a) is being defined in terms of the validity in 𝔄 of atomic formulas r(a′). A natural generalization of probability formulas might therefore be considered, in which not only the truth values of atomic formulas are used, but indicator functions for arbitrary first-order formulas are allowed. As the following lemma shows, this provides no real generalization.

Lemma 2.4 Let φ(v) be a first-order formula over the relational vocabulary S. Then there exists a probability formula F_φ(v) over S, using max as the only combination function, s.t. for every finite S-structure 𝔄, and every a ∈ D^{|v|}: F_φ(a) = 1 iff φ(a) holds in 𝔄, and F_φ(a) = 0 else.

Proof: By induction on the structure of φ. If φ = r(v) for some r ∈ S, then F_φ = r(v). For φ = (x_1 = x_2), let F_φ(x_1, x_2) = max{1 | ∅; x_1 = x_2}. Conjunction and negation are handled by multiplication and inversion, respectively, of probability formulas. For φ = ∃y ψ(v, y) the corresponding probability formula is F_φ = max{F_ψ(v, y) | y}.

Definition 2.5 A relational Bayesian network for the (relational) vocabulary S is given by a directed acyclic graph containing one node for every r ∈ S. The node for an n-ary r ∈ S is labeled with a probability formula F_r(x_1, …, x_n) over the symbols in the parent nodes of r, denoted by Pa(r).

The definition for the probability of b(x, y) in (2) does not seem to quite match definition 2.5, because it contains a distinction by cases not accounted for in definition 2.5. However, this distinction by cases can be incorporated into a single probability formula. If, for instance, c_1(v) and c_2(v) are two mutually exclusive and exhaustive equality constraints, then

F = max{max{F_1 | ∅; c_1(v)}, max{F_2 | ∅; c_2(v)} | ∅}  (7)

evaluates to F_1 for v with c_1(v), and to F_2 for v with c_2(v).

Let N now be a relational Bayesian network over S. Let r be (the label of) a node in N with arity n, and let 𝔄 be a Pa(r)-structure over domain D. For every a ∈ D^n, F_r(a) ∈ [0,1] then is defined. Thus, for every interpretation I(r) of r in D^n we can define

P(I(r)) = ∏_{a ∈ I(r)} F_r(a) · ∏_{a ∉ I(r)} (1 − F_r(a)),

which gives a probability distribution over interpretations of r, given the interpretations of Pa(r). Given a fixed domain D, a relational Bayesian network thus defines a joint probability distribution P over the interpretations in D of the symbols in S, or, equivalently, a probability measure on S-structures over D. Hence, semantically, relational Bayesian networks are mappings of finite domains D into probability measures on S-structures over D.
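The product defining P(I(r)) is easy to sketch; the domain and the constant node probability below are illustrative:

```python
from itertools import product

def prob_of_interpretation(I, F_r, domain, arity):
    """P(I(r)) = prod over a in I of F_r(a), times
    prod over a not in I of (1 - F_r(a))."""
    p = 1.0
    for a in product(domain, repeat=arity):
        p *= F_r(a) if a in I else 1.0 - F_r(a)
    return p

domain = ["d1", "d2"]
F_t = lambda a: 0.25      # a parentless unary node, cf. (3)
p_full = prob_of_interpretation({("d1",), ("d2",)}, F_t, domain, 1)
```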

Example 2.6 Reconsider the relations cancer and exposed as described in the introduction. Assume that ψ: ℕ → [0,1] is the probability distribution that for any fixed organ y gives the probability that y develops cancer after the n-th exposure to radiation. Let Ψ(n) = ∑_{i=1}^{n} ψ(i) be the corresponding distribution function. Then Ψ can be used to define a combination function comb_Ψ by letting for a multiset A: comb_Ψ(A) = Ψ(n), where n is the number of nonzero elements in A (counting multiplicities). Using comb_Ψ we obtain the probability formula comb_Ψ{exposed(x, y, z) | z} for the contribution of organ y to the cancer risk of x. Combining for all y, then

F_cancer(x) = n-o{comb_Ψ{exposed(x, y, z) | z} | y}

is a probability formula defining the risk of cancer for x, given the relation exposed.
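Since comb_Ψ depends only on the number of nonzero elements of its argument, it is straightforward to sketch; the particular ψ below is purely illustrative, not given in the example:

```python
def make_comb_psi(psi, max_n):
    """Build comb_Psi from a per-exposure distribution psi:
    comb_Psi(A) = Psi(n) = sum of psi(i) for i = 1..n,
    where n is the number of nonzero elements of A."""
    def comb_psi(A):
        n = sum(1 for a in A if a != 0)
        return sum(psi(i) for i in range(1, min(n, max_n) + 1))
    return comb_psi

# hypothetical psi: geometrically decreasing incremental risk
psi = lambda i: 0.1 * (0.5 ** (i - 1))
comb_psi = make_comb_psi(psi, max_n=50)
```

Note how the indicator values of exposed(x, y, z) enter only through being zero or nonzero, exactly as in the definition of comb_Ψ.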

In the preceding example we have tacitly assumed a multi-sorted domain, so that the variables x, y, z range over different sets (people, organs, times, respectively). We here do not introduce an extra formalization for dealing with many-sorted domains. It is clear that this can be done easily, but would introduce an extra load of notation.

3 INFERENCE

The inference problem we would like to solve is: given a relational Bayesian network N for S, a finite domain D = {d_1, …, d_n}, an evidence set of ground literals E = {r_1(a_1), …, r_k(a_k), ¬r_{k+1}(a_{k+1}), …, ¬r_m(a_m)} with r_i ∈ S (not necessarily distinct) and a_i ⊆ D (not necessarily distinct) for i = 1, …, m, and a ground atom r_0(a_0) (r_0 ∈ S, a_0 ⊆ D), what is the probability of r_0(a_0) given r_1(a_1), …, ¬r_m(a_m)? More precisely: in the probability measure P defined by N on the S-structures over D, what is the conditional probability P(r_0(a_0) | E) of a structure satisfying r_0(a_0), given that it satisfies r_1(a_1) ∧ … ∧ ¬r_m(a_m)?

Since for any given finite domain a relational Bayesian network can be seen as an ordinary Bayesian network for variables with finitely many possible values, in principle any inference algorithm for standard Bayesian networks can be used.

Unfortunately, however, direct application of any such algorithm will be inefficient, because they include a summation over all possible values of a node, and the number of possible values here is exponential in the size of the domain. For this reason, it will often be more efficient to follow the approach used in inference from rulebase encodings of probabilistic knowledge, and to construct for every specific inference task an auxiliary Bayesian network whose nodes are ground atoms in the symbols from S, each of which with the two possible values true and false (cf. (Breese 1992), (Ngo et al. 1995)).

The reason why we here can do the same is that in the query r_0(a_0) we do not ask for the probability of any specific interpretation of r_0, but only for the probability of all interpretations containing a_0. For the computation of this probability, in turn, it is irrelevant to know the exact interpretations of parent nodes r′ of r. Instead, we only need to know which of those tuples a′ belong to r′ whose indicator r′(a′) is needed in the computation of F_r(a).

In order to construct such an auxiliary network, we have to compute for some given atom r(a) the list of atoms r′(a′) on whose truth value F_r(a) depends. One way of doing this is to just go through a recursive evaluation of F_r(a), and list all the ground atoms encountered in this evaluation. However, rather than doing this, it is useful to compute for every relation symbol r ∈ S, and each parent relation r′ of r, an explicit description of the tuples (a, a′) such that F_r(a) depends on r′(a′). Such an explicit description can be given in form of a first-order formula pa_{rr′}(v, v′) over the empty vocabulary.

To demonstrate the general method for the computation of these formulas, we show how to obtain pa_{sb}(x, y_1, y_2) for F_s(x) as defined in (5). By induction on the structure of F_s, we compute formulas pa_{Gb}(·, y_1, y_2) that define for a subformula G of F_s the set of (y_1, y_2) s.t. G depends on b(y_1, y_2). In the end, then, pa_{sb}(x, y_1, y_2) = pa_{F_s b}(x, y_1, y_2).

The two subformulas t(x) and 1 − t(x) of F_s do not depend on b at all; therefore we can let pa_{t(x)b}(x, y_1, y_2) = pa_{(1−t(x))b}(x, y_1, y_2) = φ_⊥, where φ_⊥ is some unsatisfiable formula.

To obtain pa_{H(x)b}(x, y_1, y_2) we begin with the atomic subformulas b(x, z) and b(z, x) of H(x), which yield pa_{b(x,z)b}(x, z, y_1, y_2) = (y_1 = x ∧ y_2 = z) and pa_{b(z,x)b}(x, z, y_1, y_2) = (y_1 = z ∧ y_2 = x), respectively. The remaining atomic subformulas t(z), 1, and p_3 appearing within the max combination function again only yield the unsatisfiable φ_⊥. Skipping one trivial step where the formulas for the two arguments of M(x, z) := max{…} are computed, we next obtain the formula

pa_{M(x,z)b}(x, z, y_1, y_2) = (y_1 = x ∧ y_2 = z) ∨ (y_1 = z ∧ y_2 = x)

(after deleting some meaningless disjuncts). H(x) = n-o{M(x, z) | z; z ≠ x} depends on all b(y_1, y_2) for which there exists some z ≠ x s.t. pa_{M(x,z)b}(x, z, y_1, y_2). Hence,

pa_{H(x)b}(x, y_1, y_2) = ∃z (z ≠ x ∧ ((y_1 = x ∧ y_2 = z) ∨ (y_1 = z ∧ y_2 = x))),  (8)

which is already the same as pa_{F_s(x)b}(x, y_1, y_2). Finally, we can simplify (8), and obtain

pa_{sb}(x, y_1, y_2) = (y_1 = x ∧ y_2 ≠ x) ∨ (y_1 ≠ x ∧ y_2 = x).  (9)
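The quantifier-free dependency formula (9) can be expressed directly as a predicate and used to enumerate the b-parents of a ground atom s(x); the domain below is illustrative:

```python
def pa_sb(x, y1, y2):
    """Equation (9): F_s(x) depends on b(y1, y2) iff
    (y1 = x and y2 != x) or (y1 != x and y2 = x)."""
    return (y1 == x and y2 != x) or (y1 != x and y2 == x)

def b_parents(x, domain):
    """All atoms b(y1, y2) that are parents of s(x)
    in the auxiliary network."""
    return [(y1, y2) for y1 in domain for y2 in domain if pa_sb(x, y1, y2)]

domain = ["l1", "l2", "l3"]
parents = b_parents("l1", domain)
```

Checking the predicate takes constant time per tuple, independent of the domain size, as claimed below for pa formulas in general.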

In general, the formulas pa_{rr′}(v, v′) are existential (∃-) formulas. It is not always possible to completely eliminate the existential quantifiers as in the preceding example. However, it is always possible to transform pa_{rr′}(v, v′) into a formula so that quantifiers only appear in subformulas of the form ∃^{≥n} x (x = x), postulating the existence of at least n elements. This means that for every formula pa_{rr′}(v, v′), and tuples a, a′ ⊆ D, it can be checked in time independent of the size of D whether pa_{rr′}(a, a′) holds.

The formula pa_{rr′}(v, v′) enables us to find for every tuple a the parents r′(a′) of r(a) in the auxiliary network. Moreover, we can take this one step further: suppose that in the original network N there is a path of length two from node r″ via a node r′ to r. Then, in the auxiliary network, there is a path of length two from a node r″(a″) via a node r′(a′) to r(a) iff the formula

pa_{r″→r′→r}(v, v″) = ∃z̄ (pa_{rr′}(v, z̄) ∧ pa_{r′r″}(z̄, v″))  (10)

is satisfied for a and a″. Taking the disjunction of all formulas of the form (10) for all paths in N leading from r″ to r then yields a formula pa*_{rr″}(v, v″) defining all predecessors r″(a″) of a node r(a) in the auxiliary network.

Using the pa_{rr′} and pa*_{rr″}, we can for given evidence and query construct the auxiliary network needed to answer the query: we begin with a node r_0(a_0) for the query. For all nodes r(a) added to the network, we add all parents r′(a′) of r(a), as defined by pa_{rr′}. If r(a) is not instantiated in E, using the formulas pa*_{r′r}, we check whether the subgraph rooted at r(a) contains a node instantiated in E. If this is the case, we add all successors of r(a) that lie on a path from r(a) to an instantiated node (these are again given by the formulas pa*_{r′r}). Thus, we can construct directly the minimal network needed to answer the query, without first backward chaining from every atom in E, and pruning afterwards.

Auxiliary networks as described here still encode finer distinctions in the instantiations of the nodes of N than is actually needed to solve our inference problem. Consider, for example, the case where the domain in example 2.1 consists of ten locations {l_1, …, l_10}, there is no evidence, and the query is s(l_1). According to (9), the auxiliary network will contain nodes b(l_1, l_i), b(l_i, l_1) for all i = 2, …, 10. In applying standard inference techniques on this network, we distinguish e.g. the case where b(l_1, l_2), b(l_2, l_1) are true and b(l_1, l_3), b(l_3, l_1) are false from the case where b(l_1, l_2), b(l_2, l_1) are false and b(l_1, l_3), b(l_3, l_1) are true, and all other b(l_1, l_i), b(l_i, l_1) have the same truth value. However, for the given inference problem, this distinction really is unnecessary, because the identity of locations mentioned neither in evidence nor query is immaterial. Future work will therefore be directed towards finding inference techniques for relational Bayesian networks that distinguish instantiations of the relations in the network at a higher level of abstraction than the current auxiliary networks, and thereby reduce the complexity of inference in terms of the size of the underlying domain.

4 RECURSIVE NETWORKS

In the distributions defined by relational Bayesian networks of definition 2.5, the events r(a) and r(a′) with a ≠ a′ are conditionally independent, given the interpretation of the parent relations of r. This is a rather strong limitation of the expressiveness of these networks. For instance, using these networks we can not model a variation of example 2.1 in which the predicate blocked is symmetric: b(x, y) being independent from b(y, x), the equivalence b(x, y) ↔ b(y, x) can not be enforced.

There are other interesting things that we are not able to model so far. Among them are random functions (the main concern of (Haddawy 1994)), and a recursive temporal dependence of a relation on itself (addressed both in (Ngo et al. 1995) and (Glesner & Koller 1995)). In this section we define a straightforward generalization of relational Bayesian networks that allows us to treat all these issues in a uniform way.

We can identify a recursive dependence of a relation on itself as the general underlying mechanism we have to model. In the case of symmetric relations, this is a dependence of r(x, y) on r(y, x). In the case of a temporal development, this is the dependence of a predicate r(t, ·), having a time variable as its first argument, on r(t−1, ·). Functions can be seen as special relations r(·, y), where for every value of the first argument there exists exactly one y s.t. r(·, y) is true. Thus, for every first argument, r(·, y) depends on all r(·, y′) in that exactly one of these atoms must be true.

It is clear that there is no fundamental problem in modeling such recursive dependencies within a Bayesian network framework, as long as the recursive dependency of r(a) on r(a_1), …, r(a_k) does not produce any cycles. Most obviously, in the case of a temporal dependency, the use of r(t−1, ·) in a definition of the probability of r(t, ·) does not pose a problem, as long as a nonrecursive definition of the probability of r at the first time point is provided.

To make the recursive dependency of r(x, y) on r(y, x) in a symmetric relation similarly well-founded, we can use a total order ≤ on the domain. Then we can generate a random symmetric relation by first defining the probability of r(x, y) with x ≤ y, and then the (0,1-valued) probability of r(y, x) given r(x, y). Now consider the case of a random function r(·, y) with possible values y ∈ {v_1, …, v_k}. Here, too, we can make the interdependence of the different r(·, y) acyclic by using a total order on {v_1, …, v_k}, and assigning a truth value to r(·, v_i) by taking into account the already defined truth values of r(·, v_j) for all v_j that precede v_i in that order.

From these examples we see that what we essentially need, in order to extend our framework to cover a great variety of interesting specific forms of probability distributions over S-structures, are well-founded orderings on tuples of domain elements. These well-founded orderings can be supplied via rigid relations on the domain, i.e. fixed, predetermined relations that are not generated probabilistically. Indeed, one such relation we already have used throughout: the equality relation. It is therefore natural to extend our framework by allowing additional relations that are to be used in the same way as the equality predicate has been employed, namely, in constraints for combination functions. Also, fixed constants will be needed as the possible values of random functions.

For the case of a binary symmetric relation r(x,y), assume, as above, that we are given a total (non-strict) order ≤ on the domain. A probability formula that defines a probability distribution concentrated on symmetric relations, and making r(x_1,x_2) true with probability p for all x_1,x_2, then is

    F_r(x_1,x_2) ≡ max{ max{p | ; x_1 ≤ x_2},                    (11)
                        max{r(x_2,x_1) | ; ¬ x_1 ≤ x_2} | ; }.

As in (7), here a nested max{...} function is used in order to model a distinction by cases. The first inner max-function evaluates to p if x_1 ≤ x_2, and to 0 else. The second max-function is equal to r(x_2,x_1) if x_1 > x_2, and to 0 else.

For the temporal example, assume that the domain contains n time points t_1,...,t_n, and a successor relation s = {(t_i,t_{i+1}) | 1 ≤ i < n} on the t_i's. Assume that r(t,x) is a relation with a time parameter as the first argument, and that r(t_1,x) shall hold with probability p_1 for all x, whereas r(t_{i+1},x) has probability p_2 if r(t_i,x) holds, and probability p_3 else. In order to define the probability of r(t,x) by a probability formula, the case t = t_1 must be distinguished from the case t = t_i, i ≥ 2. For this we use the probability formula F(t) ≡ max{1 | t'; s(t',t)}, which evaluates to 0 for t = t_1, and to 1 for t = t_2,...,t_n. We can now use the formula

    F_r(t,x) ≡ (1 - F(t)) p_1 + F(t) max{ r(t',x) p_2 + (1 - r(t',x)) p_3 | t'; s(t',t) }

to define the probability of r(t,x).
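The formula above encodes, for each object x, an ordinary two-state Markov chain over the time points. A small sketch of the sampling process it defines (the function name and the dictionary representation are illustrative only):

```python
import random

def sample_temporal(objects, n, p1, p2, p3, rng=None):
    """Sample r(t_i, x) for time points t_1..t_n: r(t_1, x) holds with
    probability p1; r(t_{i+1}, x) holds with probability p2 if r(t_i, x)
    holds, and with probability p3 otherwise."""
    rng = rng or random.Random(0)
    r = {}
    for x in objects:
        r[(1, x)] = rng.random() < p1      # base case: F(t_1) = 0 selects p1
        for i in range(2, n + 1):          # F(t_i) = 1: condition on predecessor
            p = p2 if r[(i - 1, x)] else p3
            r[(i, x)] = rng.random() < p
    return r
```

The base case corresponds to the (1 - F(t)) p_1 summand, and the loop body to the max over the (unique) successor t' with s(t',t).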

Finally, for a functional relation r(x,y), suppose that we are given a domain, together with the interpretations of n constant symbols v_1,...,v_n, and a strict total order <, s.t. v_1 < v_2 < ... < v_n. Now consider the probability formula

    F_r(x,y) ≡ (1 - max{ r(x,z) | z; z < y })
               · max{ max{p_1 | ; y = v_1}, ..., max{p_n | ; y = v_n} | ; }.

The first factor in this formula tests whether r(x,z) already is true for some possible value v_i < y. If this is the case, then the probability of r(x,y) given by F_r(x,y) is 0. Otherwise, the probability of r(x,y) is p_i iff y = v_i. The probability that by this procedure the argument x is assigned the value v_i then is (1 - p_1)(1 - p_2)···(1 - p_{i-1}) p_i. By a suitable choice of the p_i any probability distribution over the v_i can be generated.
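Since v_i is selected with probability (1 - p_1)···(1 - p_{i-1}) p_i, the p_i realizing a given target distribution q_1,...,q_n can be computed by dividing out the mass not yet consumed. A sketch of this correspondence in both directions (the function names are hypothetical):

```python
def conditional_probs(q):
    """Given a target distribution q = [q1, ..., qn] over v1 < ... < vn,
    compute the p_i of the sequential scheme, so that
    (1-p1)...(1-p_{i-1}) * p_i = q_i."""
    p, remaining = [], 1.0
    for qi in q:
        p.append(qi / remaining if remaining > 0 else 1.0)
        remaining -= qi
    return p

def assigned_probs(p):
    """Invert: probability that v_i is the assigned value, given the p_i."""
    out, free = [], 1.0
    for pi in p:
        out.append(free * pi)      # reach v_i with prob `free`, accept with p_i
        free *= (1.0 - pi)
    return out

p = conditional_probs([0.2, 0.5, 0.3])
# the round trip recovers the target distribution (up to float rounding)
assert all(abs(a - b) < 1e-9 for a, b in zip(assigned_probs(p), [0.2, 0.5, 0.3]))
```

Note that the last p_n always comes out as 1, reflecting that exactly one value must be assigned.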

The given examples motivate a generalization of relational Bayesian networks. For this, let R be a vocabulary containing relation and constant symbols, S a relational vocabulary with R ∩ S = ∅. An R-constraint c(x) for x is a quantifier-free R-formula. Define the class of R-probability formulas over S precisely as in definition 2.3, with "equality constraint" replaced by "R-constraint".

Definition 4.1 Let R, S be as above. A recursive relational Bayesian network for S with R-constraints is given by a directed acyclic graph containing one node for every r ∈ S. The node for an n-ary r ∈ S is labeled with an R-probability formula F_r(x_1,...,x_n) over Pa(r) ∪ {r}.

The semantics of a recursive relational Bayesian network is a bit more complicated than that of relational Bayesian networks. The latter defined a mapping of domains D into probability measures on S-structures over D. Recursive relational Bayesian networks essentially define a mapping of R-structures A into probability measures on S-expansions of A. This mapping, however, is only defined for R-structures whose interpretations of the symbols in R lead to well-founded recursive definitions of the probabilities for the r-atoms (r ∈ S). If, for instance, R = {≤}, and A is an R-structure in which there exist two elements d_1, d_2, s.t. neither d_1 ≤ d_2, nor d_2 ≤ d_1, then (11) does not define a probability measure on {r}-expansions of A, because the probability of r(d_1,d_2) gets defined in terms of r(d_2,d_1), and vice versa.

As in section 3, for every r' ∈ Pa(r) ∪ {r} a formula pa_rr'(x,x') can be computed that defines, for an R-structure A and x ∈ D, the tuples x' ∈ D, s.t. F_r(x) depends on r'(x'). While in section 3 existential formulas over the empty vocabulary were obtained, for recursive relational networks the pa_rr' are existential formulas over R. The definitions of the probabilities F_r are well-founded for A iff the relation pa_rr := {(x,x') | pa_rr(x,x') holds in A} is acyclic. A recursive relational Bayesian network N thus defines a probability measure on S-expansions of those R-structures A, for which the relation pa_rr is acyclic for all r ∈ S.
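On a finite structure, this well-foundedness condition is an ordinary acyclicity test on the ground dependency relation, checkable by a topological sort. A rough sketch (the atom encoding, the `depends_on` callback, and Kahn's algorithm as the test are choices made for this example):

```python
from collections import defaultdict

def is_well_founded(atoms, depends_on):
    """Check that the ground dependency relation
    {(a, a') | a's probability formula depends on a'} is acyclic,
    using Kahn's algorithm: the definitions are well-founded iff
    every atom can be scheduled after its dependencies."""
    indeg = {a: 0 for a in atoms}
    succ = defaultdict(list)
    for a in atoms:
        for b in depends_on(a):
            succ[b].append(a)
            indeg[a] += 1
    queue = [a for a in atoms if indeg[a] == 0]
    seen = 0
    while queue:
        b = queue.pop()
        seen += 1
        for a in succ[b]:
            indeg[a] -= 1
            if indeg[a] == 0:
                queue.append(a)
    return seen == len(atoms)

# The symmetric-relation example: with a total order, r(x1,x2) for x1 > x2
# depends only on r(x2,x1), so the definitions are well-founded ...
dom = [0, 1, 2]
atoms = [('r', x, y) for x in dom for y in dom]
ordered = lambda a: [] if a[1] <= a[2] else [('r', a[2], a[1])]
assert is_well_founded(atoms, ordered)
# ... whereas mutual dependence of r(d1,d2) and r(d2,d1), as with an
# incomparable pair d1, d2, makes the check fail.
cyclic = lambda a: [] if a[1] == a[2] else [('r', a[2], a[1])]
assert not is_well_founded(atoms, cyclic)
```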

The discussion of inference procedures for relational Bayesian networks in section 3 applies with few modifications to recursive networks as well. Again, we can construct an auxiliary network with nodes for ground atoms, using the formulas pa_rr'. The complexity of this construction, however, increases on two accounts: first, the existential quantifications in the pa_rr' can no longer be reduced to mere cardinality constraints. Therefore, the complexity of deciding whether pa_rr'(x,x') holds for given x,x' ∈ D is no longer guaranteed to be independent of the size of the domain D. Second, to obtain the formulas pa_rr' we may have to build much larger disjunctions: it is no longer sufficient to take the disjunction over all possible paths from r' to r in the network structure of N. In addition, for every relation r on these paths, the disjunction over all possible paths within pa_rr has to be taken. This amounts to determining the length l of the longest path in pa_rr, and then taking the disjunction over all formulas

    pa^i_rr(x,x') ≡ ∃x_1,...,x_i ( pa_rr(x,x_1) ∧ ... ∧ pa_rr(x_i,x') )

with 1 ≤ i < l. As a consequence, the formulas pa_rr' are no longer independent of the structure under consideration.
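On a finite structure, the pa^i are just iterated relational compositions of the one-step dependency relation pa, and their disjunction is the union of those compositions. A sketch of this set computation (function names are illustrative; the relation is represented as a set of pairs):

```python
def compose(rel_a, rel_b):
    """Relational composition: {(x, z) | exists y: (x,y) in rel_a, (y,z) in rel_b}."""
    by_first = {}
    for y, z in rel_b:
        by_first.setdefault(y, set()).add(z)
    return {(x, z) for x, y in rel_a for z in by_first.get(y, set())}

def iterated_pa(pa, l):
    """Union of pa^1, ..., pa^{l-1}: all pairs connected by a chain of at
    most l-1 one-step dependencies, mirroring the disjunction over the pa^i."""
    result, power = set(pa), set(pa)
    for _ in range(l - 2):
        power = compose(power, pa)   # one more intermediate tuple
        result |= power
    return result

pa = {(1, 2), (2, 3), (3, 4)}        # a chain of ground dependencies
assert iterated_pa(pa, 4) == {(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)}
```

The bound l, and hence the result, depends on the structure at hand, which is precisely why the pa_rr' are no longer structure-independent.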

5 CONCLUSION

In this paper we have presented a new approach to deal with rule-like probability statements for nondeterministic relations on the elements of some domain of discourse. Deviating from previous proposals for formalizing such rules with a logic programming style framework, we here have associated with every relation symbol r a single probability formula that directly defines the probability distribution over interpretations of r within a Bayesian network. The resulting framework is both more expressive and semantically more transparent than previous ones. It is more expressive, because it introduces the tools to restrict the instantiations of certain rules to tuples satisfying certain equality constraints, and to specify complex combinations and nestings of combination functions. It is semantically more transparent, because a relational Bayesian network directly defines a unique probability distribution over S-structures, whereas the semantics of a probabilistic rule base usually are only implicitly defined through a transformation into an auxiliary Bayesian network.

Inference from relational Bayesian networks by auxiliary network construction is as efficient as inference (by essentially the same method) in rule-based formalisms. It may be hoped that in the case where this inference procedure seems unsatisfactory, namely, for large domains most of whose elements are not mentioned in the evidence, our new representation paradigm will lead to more efficient inference techniques.

Acknowledgments

I have benefited from discussions with Daphne Koller, who also provided the original motivation for this work. This work was funded in part by DARPA contract DACA76-93-C-0025, under subcontract to Information Extraction and Transport, Inc.

References

Breese, J. S. (1992), Construction of belief and decision networks, Computational Intelligence.

Glesner, S. & Koller, D. (1995), Constructing flexible dynamic belief networks from first-order probabilistic knowledge bases, in Proceedings of ECSQARU, Lecture Notes in Artificial Intelligence, Springer Verlag.

Haddawy, P. (1994), Generating Bayesian networks from probability logic knowledge bases, in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence.

Halpern, J. (1990), An analysis of first-order logics of probability, Artificial Intelligence 46, 311-350.

Koller, D. & Halpern, J. Y. (1996), Irrelevance and conditioning in first-order probabilistic logic, in Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), pp. 569-576.

Ngo, L., Haddawy, P. & Helwig, J. (1995), A theoretical framework for context-sensitive temporal probability model construction with application to plan projection, in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 419-426.

Poole, D. (1993), Probabilistic Horn abduction and Bayesian networks, Artificial Intelligence 64, 81-129.
