# Relational Bayesian Networks

Manfred Jaeger

Computer Science Department,
Stanford University, Stanford, CA 94305
jaeger@robotics.stanford.edu

Abstract
A new method is developed to represent probabilistic relations on multiple random events. Where previously knowledge bases containing probabilistic rules were used for this purpose, here a probability distribution over the relations is directly represented by a Bayesian network. By using a powerful way of specifying conditional probability distributions in these networks, the resulting formalism is more expressive than the previous ones. In particular, it provides for constraints on equalities of events, and it allows the definition of complex, nested combination functions.

1 INTRODUCTION
In a standard Bayesian network, nodes are labeled with random variables (r.v.s) $X$ that take values in some finite set $\{x_1,\ldots,x_n\}$. A network with r.v.s (earth)quake, burglary, and alarm, each with possible values $\{\mathit{true},\mathit{false}\}$, for instance, then defines a joint probability distribution for these r.v.s.

Evidence, $E$, is a set of instantiations of some of the r.v.s. A query asks for the probability of a specific value $x$ of some r.v. $X$, given the instantiations in the evidence. The answer to this query is the conditional probability $P(X = x \mid E)$ in the distribution $P$ defined by the network.
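
As a minimal sketch of such a query, the following Python code answers $P(X = x \mid E)$ by brute-force enumeration for the quake/burglary/alarm network. The edge structure (quake and burglary as parents of alarm) and all numbers are illustrative assumptions; the text fixes neither.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
P_QUAKE = 0.01
P_BURGLARY = 0.02
P_ALARM = {  # P(alarm = true | quake, burglary)
    (True, True): 0.97,
    (True, False): 0.30,
    (False, True): 0.95,
    (False, False): 0.001,
}


def joint(quake, burglary, alarm):
    """Joint probability of one complete assignment."""
    pq = P_QUAKE if quake else 1 - P_QUAKE
    pb = P_BURGLARY if burglary else 1 - P_BURGLARY
    pa = P_ALARM[(quake, burglary)]
    return pq * pb * (pa if alarm else 1 - pa)


def query(var, value, evidence):
    """P(var = value | evidence), summing the joint over all assignments."""
    names = ("quake", "burglary", "alarm")
    num = den = 0.0
    for world in product([True, False], repeat=3):
        w = dict(zip(names, world))
        if any(w[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(**w)
        den += p
        if w[var] == value:
            num += p
    return num / den


posterior = query("burglary", True, {"alarm": True})
```

Enumeration is exponential in the number of variables; it is used here only because it mirrors the definition of conditional probability directly.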
The implicit underlying assumption we make here is that the value assignments in the evidence and the query instantiate the attributes of one single random event, or object, that has been sampled (observed) according to the distribution of the network. If, for instance, $E = \{\mathit{quake} = \mathit{true}, \mathit{alarm} = \mathit{true}\}$, then both instantiations are assumed to refer to one single observed state of the world $\omega$, and not to the facts that there was an earthquake in 1906 and that the alarm bell is ringing right now.
* On leave from: Max-Planck-Institut für Informatik, Im
In case we indeed have evidence about several observed events, e.g. $\mathit{quake}(\omega_1) = \mathit{true}$, $\mathit{alarm}(\omega_2) = \mathit{true}$, $\mathit{burglary}(\omega_3) = \mathit{false}$, then, for the purpose of answering a query $X(\omega) = x$ about one of these events, all evidence about other events can be ignored, and only $P(X(\omega) = x \mid E(\omega))$ needs to be computed. For each of these computations the same Bayesian network can be used.
Things become much different when we also want to model relations that may hold between two different random events. Suppose, for instance, we also want to say something about the probability that one earthquake was stronger than another. For this we use the binary relation stronger, and would like to relate the probability of $\mathit{stronger}(\omega_1,\omega_2)$ to, say, $\mathit{alarm}(\omega_1)$ and $\mathit{alarm}(\omega_2)$. Evidence may now contain instantiations of stronger for many different pairs of states: $\{\mathit{stronger}(\omega_1,\omega_2),\ldots,\mathit{stronger}(\omega_1,\omega_n)\}$, and a query may be $\mathit{alarm}(\omega_1)$. In evaluating this query, we can no longer ignore information about the other events $\omega_2,\ldots,\omega_n$. This means, however, that if we do not want to impose an a priori restriction on the number of events we can have evidence for, no single fixed Bayesian network with finite-range r.v.s will be sufficient to evaluate queries for arbitrary evidence sets.
Nevertheless, the probabilistic information that we would like to encode about relations between an arbitrary number of different events may very well be expressible by some finite set of laws, applicable to an arbitrary number of events. One way of expressing such laws, which has been explored in the past ((Breese 1992), (Poole 1993), (Haddawy 1994)), is to use probabilistic rules such as

$$\mathit{stronger}(u,v) \stackrel{0.8}{\longleftarrow} \mathit{quake}(u) \wedge \mathit{quake}(v) \wedge \mathit{alarm}(u) \wedge \neg\mathit{alarm}(v). \tag{1}$$

The intended meaning here is: for all states of the world $\omega_1$ and $\omega_2$, given that $\mathit{quake}(\omega_1) \wedge \ldots \wedge \neg\mathit{alarm}(\omega_2)$ is true, the probability that $\omega_1$ is stronger than $\omega_2$ is 0.8. A rule base containing expressions of this form can then be used to construct, for each specific evidence and query, a Bayesian network over binary r.v.s $\mathit{stronger}(\omega_1,\omega_2)$, $\mathit{stronger}(\omega_1,\omega_3)$, $\mathit{quake}(\omega_1)$, ..., in which the answer to the query is subsequently computed using standard Bayesian network inference.

In all the above-mentioned approaches, quite strong syntactic and/or semantic restrictions are imposed on the formalism that severely limit its expressiveness. Poole (1993) does not allow the general expressiveness of rules like (1), but only combines deterministic rules with the specification of certain unconditional probabilities. Haddawy (1994) allows only rules in which the antecedent does not contain free variables that do not appear in the consequent. As pointed out by Glesner and Koller (1995), this is a severe limitation. For instance, we cannot then express by a rule like $\mathit{aids}(x) \stackrel{p}{\longleftarrow} \mathit{contact}(x,y)$ that the probability of person $x$ having aids depends on any other person $y$ with whom $x$ had sexual contact. When we do permit an additional free variable $y$ in this manner, it also has to be defined how the probability of the consequent is affected when there exist multiple instantiations of $y$ that make the antecedent true (this question also arises when several rules with the same consequent are permitted in the rule base). In (Glesner & Koller 1995) and (Ngo, Haddawy & Helwig 1995) a combination rule is therefore added to the rule base, which defines how the conditional probabilities arising from different instantiations, or rules, are to be combined. If the different causal relationships described by the rules are understood to be independent, then the combination rule typically will be noisy-or.

The specification of a single combination rule applied to all sets of instantiations of applicable rules, again, does not permit us to describe certain important distinctions. If, for instance, we have a rule that relates $\mathit{aids}(x)$ to the relation $\mathit{contact}(x,y)$, and another rule that relates $\mathit{aids}(x)$ to the relation $\mathit{donor}(x,y)$, standing for the fact that $x$ has received a blood transfusion from donor $y$, then the probability computed for $\mathit{aids}(a)$, using a simple combination rule, will depend only on the number of instantiations for $\mathit{contact}(a,y)$ and for $\mathit{donor}(a,y)$. In particular, we are not able to make special provisions for the two rules to be instantiated by the same element $b$, even though the case $\mathit{contact}(a,b) \wedge \mathit{donor}(a,b)$ clearly has to be distinguished from the case $\mathit{contact}(a,b) \wedge \mathit{donor}(a,c)$, or even $\mathit{contact}(a,b) \wedge \mathit{donor}(a,a)$.

In this paper a representation formalism is developed that incorporates constraints on the equality of instantiating elements, and thereby allows us to define different probabilities in situations only distinguished by equalities between instantiating elements.

Furthermore, our representation method will allow us to specify hierarchical, or nested, combination rules. As an illustration of what this means, consider the unary predicate $\mathit{cancer}(x)$, representing that person $x$ will develop cancer at some time, and the three-place relation $\mathit{exposed}(x,y,z)$, representing that organ $y$ of person $x$ was exposed to radiation at time $z$ (by the taking of an x-ray, intake of radioactively contaminated food, etc.). Suppose, now, that for person $x$ we have evidence $E = \{\mathit{exposed}(x,y_i,z_j) \mid i = 1,\ldots,k;\ j = 1,\ldots,l\}$, where $y_i = y_{i'}$ for some $i,i'$, and $z_j = z_{j'}$ for some $j,j'$. Assume that for any specific organ $y$, multiple exposures of $y$ to radiation have a cumulative effect on the risk of developing cancer of $y$, so that noisy-or is not the adequate rule to model the combined effect of instances $\mathit{exposed}(x,y,z_j)$ on the probability of developing cancer of $y$. On the other hand, developing cancer at any of the various organs $y$ can be viewed as independent causes for developing cancer at all. Thus, a single rule of the form $\mathit{cancer}(x) \stackrel{p}{\longleftarrow} \mathit{exposed}(x,y,z)$ together with a flat combination rule is not sufficient to model the true probabilistic relationships. Instead, we need to use one rule to first combine for every fixed $y$ the instances given by different $z$, and then use another rule (here noisy-or) to combine the effect of the different $y$'s.

To permit constraints on the equality of instantiating elements, and to allow for hierarchical definitions of combination functions, in this paper we depart from the method of representing our information in a knowledge base containing different types of rules. Instead, we here use Bayesian networks with a node for every relation symbol $r$ of some vocabulary $S$, which is seen as a r.v. whose values are possible interpretations of $r$ in some specific domain $D$. The state space of these relational Bayesian networks can therefore be identified with the set of all $S$-structures over $D$, and its semantics is a probability distribution over $S$-structures, as were used by Halpern (1990) to interpret first-order probabilistic logic. Halpern and Koller (1996) have used Markov networks labeled with relation symbols for representing conditional independencies in probability distributions over $S$-structures. This can be seen as a qualitative analog to the quantitative relational Bayesian networks described here.

2 THE BASIC FRAMEWORK
In medical example domains it is often natural to make the domain closure assumption, i.e. to assume that the domain under consideration consists just of those objects mentioned in the knowledge base. The following example highlights a different kind of situation, where a definite domain of objects is given over which the free variables are to range, yet there is no evidence about most of these objects.

Example 2.1 Robot TBayes0.1 moves in an environment consisting of $n$ distinct locations. TBayes0.1 can make direct moves from any location $x$ to any location $y$ unless the (directed) path $x \to y$ is blocked. This happens to be the case with probability $p_1$ for all $x \neq y$. At each time step TBayes0.1, as well as a certain number of other robots operating in this domain, makes one move along an unblocked path $x \to y$, $x \neq y$. TBayes0.1 has just completed the task it was assigned to do, and is now in search of new instructions. It can receive these instructions either by reaching a terminal location from where a central task-assigning computer can be accessed, or by meeting another robot that will assign TBayes0.1 a subtask of its own job. Unfortunately, TBayes0.1 only has the vaguest idea of where the terminal locations are, or where the other robots are headed. The best model of its environment that it can come up with is that every location $x$ is a terminal location with probability $p_2$, and that any unblocked path $x \to y$ is taken by at least one robot at any given time step with probability $p_3$. In order to plan its next move, TBayes0.1 tries to evaluate for every location $x$ the probability that going to $x$ leads to success, defined as either getting instructions at $x$ directly, or being able to access a terminal location in one more move from $x$. Hence, the probability of s(uccess)($x$) is 1 if t(erminal)($x$) is true, or if $t(z)$ and $\neg$b(locked)($x,z$) hold for some $z$. Otherwise, there still is a chance of $s(x)$ being true, determined by the number of incoming paths $z \to x$, each of which is taken by another robot with probability $p_3$. Assuming a fairly large number of other robots, the event that $z \to x$ is taken by some robot can be viewed as independent from $z' \to x$ being taken by a robot, so that the overall probability that another robot will reach location $x$ is given by $1-(1-p_3)^k$, where $k = |\{z \mid z \neq x, \neg b(z,x)\}|$, i.e. by combining the individual probabilities via noisy-or.

The foregoing example gives an informal description of how the probability of $s(x)$ is evaluated, given the predicates $b$ and $t$. Also, the probabilities for $b$ and $t$ are given. Piecing all this information together (and assuming independence whenever no dependence has been mentioned explicitly), we obtain for every finite domain $D$ of locations a probability distribution $P$ for the $\{b,t,s\}$-structures over $D$.

Our aim now is to represent this class of probability distributions in compact form as a Bayesian network with nodes $b$, $t$, and $s$. Given the description of the dependencies in the example, it is clear that this network should have two edges: one leading from $b$ to $s$, and one leading from $t$ to $s$.

The more interesting problem is how to specify the conditional probability of the possible values of each node (i.e. the possible interpretations of the symbol at that node), given the values of its parent nodes. For the two parentless nodes in our example this is accomplished very easily: for a given domain $D$, and for all locations $x,y \in D$ we have

$$P(b(x,y)) = \begin{cases} p_1 & \text{if } x \neq y \\ 0 & \text{if } x = y \end{cases} \tag{2}$$

$$P(t(x)) = p_2. \tag{3}$$

Here $P(b(x,y))$ stands for the probability that $(x,y)$ belongs to the interpretation of $b$, and similarly for $P(t(x))$. Since $b(x,y)$ and $b(x',y')$ for $(x,y) \neq (x',y')$, respectively $t(x)$ and $t(x')$ for $x \neq x'$, were assumed to be mutually independent, this defines a probability distribution over the possible interpretations in $D$ of the two predicates. For example, the probability that $I \subseteq D \times D$ is the interpretation of $b$ is 0 if $(x,x) \in I$ for some $x \in D$, and $p_1^{|I|}(1-p_1)^{n(n-1)-|I|}$ else.

Next, we have to define the probability of interpretations of $s$. Given interpretations of $b$ and $t$, the events $s(x)$ and $s(x')$ are independent for $x \neq x'$. Also, example 2.1 contains a high-level description of how the probability of $s(x)$ is to be computed. Our aim now is to formalize this computation rule in such a manner that $P(s(x))$ can be computed by evaluating a single functional expression, in the same manner as $P(b(x,y))$ and $P(t(x))$ are given by (2) and (3).

Since $P(s(x))$ depends on the interpretations of $b$ and $t$, we begin with functional expressions that access these interpretations. This is done by using indicator functions $\mathbf{1}_{I(b)}(x,y)$ and $\mathbf{1}_{I(t)}(x)$. $\mathbf{1}_{I(b)}(x,y)$, for example, evaluates to 1 if $(x,y)$ is in the given interpretation $I(b)$ of $b$, and to 0 otherwise. Though the function $\mathbf{1}_{I(b)}(x,y)$ has to be distinguished from the logical expression $b(x,y)$, for the benefit of greater readability, in the sequel the simpler notation will be used for both. Thus, $b(x,y)$ stands for the function $\mathbf{1}_{I(b)}(x,y)$ whenever it appears within a functional expression.

In order to find a suitable functional expression $F_s(x)$ for $P(s(x))$, assume first that $t(x)$ is true. Since $t(x)$ implies $s(x)$, in this case we need to obtain $F_s(x) = 1$. In the case $\neg t(x)$, the probability of $s(x)$ is computed by considering all locations $z \neq x$ for which either $\neg b(x,z)$ or $\neg b(z,x)$. Any such $z$ that satisfies $\neg b(x,z) \wedge t(z)$ again makes $s(x)$ true with probability 1. If only $\neg b(z,x)$ holds, then the location $z$ merely contributes a probability $p_3$ to $P(s(x))$. Thus, for any $z$, the contribution of $z$ to $P(s(x))$ is given by $\max\{t(z)(1-b(x,z)),\ p_3(1-b(z,x))\}$. Combining all the relevant $z$ via noisy-or, we obtain the formula

$$F_s(x) = \text{n-o}\{\max\{t(z)(1-b(x,z)),\ p_3(1-b(z,x))\} \mid z;\ z \neq x\} \tag{4}$$

for $x$ with $\neg t(x)$.

Abbreviating the functional expression on the right-hand side of (4) by $H(x)$, we can finally put the two cases $t(x)$ and $\neg t(x)$ together, defining

$$F_s(x) = t(x) + (1-t(x))H(x). \tag{5}$$
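
The following Python sketch evaluates the success formula of (4) and (5) for given interpretations of $b$ and $t$. Representing interpretations as Python sets, and the names `F_s`, `blocked`, `terminal`, and `p3`, are assumptions made for illustration only.

```python
def noisy_or(values):
    """1 - prod(1 - v); yields 0 for an empty collection."""
    result = 1.0
    for v in values:
        result *= 1.0 - v
    return 1.0 - result


def F_s(x, domain, blocked, terminal, p3):
    """Probability of s(x), given interpretations of b (blocked) and t (terminal)."""
    def b(u, v):
        return 1.0 if (u, v) in blocked else 0.0

    def t(u):
        return 1.0 if u in terminal else 0.0

    # H(x): noisy-or over all z != x of the per-location contribution in (4)
    H = noisy_or(
        max(t(z) * (1.0 - b(x, z)), p3 * (1.0 - b(z, x)))
        for z in domain
        if z != x
    )
    # Convex combination (5): t(x) * 1 + (1 - t(x)) * H(x)
    return t(x) + (1.0 - t(x)) * H


# Hypothetical three-location world: location 2 is a terminal, path 0 -> 2 is blocked.
domain = [0, 1, 2]
blocked = {(0, 2)}
terminal = {2}
success = {x: F_s(x, domain, blocked, terminal, p3=0.5) for x in domain}
```

In this toy world location 1 reaches the terminal directly, while location 0 can only succeed by meeting another robot arriving from one of its two unblocked incoming paths.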
We now give a general definition of a representation language for forming functional expressions in the style of (5). We begin by describing the general class of combination functions, instances of which are the functions n-o and max used above.

Definition 2.2 A combination function is any function that maps every finite multiset (i.e. a set possibly containing multiple copies of the same element) with elements from [0,1] into [0,1].

Besides n-o and max, examples of combination functions are min, the arithmetic mean of the arguments, etc. Each combination function must include a sensible definition for its result on the empty set. For example, we here use the conventions $\text{n-o}\,\emptyset = \max\emptyset = 0$, $\min\emptyset = 1$.
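
A minimal Python sketch of three combination functions with the empty-multiset conventions just stated. Multisets are represented as plain lists; the function names are illustrative.

```python
def noisy_or(values):
    """n-o: 1 - prod(1 - v); the empty multiset yields 0."""
    result = 1.0
    for v in values:
        result *= 1.0 - v
    return 1.0 - result


def comb_max(values):
    """max, with max of the empty multiset defined as 0."""
    return max(values, default=0.0)


def comb_min(values):
    """min, with min of the empty multiset defined as 1."""
    return min(values, default=1.0)
```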
In the following, we use bold type to denote tuples of variables: $\mathbf{x} = (x_1,\ldots,x_n)$ for some $n$. The number of elements in tuple $\mathbf{x}$ is denoted by $|\mathbf{x}|$. An equality constraint $c(\mathbf{x})$ for $\mathbf{x}$ is a quantifier-free formula over the empty vocabulary, i.e., a formula only containing atomic subformulas of the form $x_i = x_j$.

Definition 2.3 The class of probability formulas over the relational vocabulary $S$ is inductively defined as follows.
(i) (Constants) Each rational number $q \in [0,1]$ is a probability formula.
(ii) (Indicator functions) For every $n$-ary symbol $r \in S$, and every $n$-tuple $\mathbf{x}$ of variables, $r(\mathbf{x})$ is a probability formula.
(iii) (Convex combinations) When $F_1, F_2, F_3$ are probability formulas, then so is $F_1F_2 + (1-F_1)F_3$.
(iv) (Combination functions) When $F_1,\ldots,F_k$ are probability formulas, comb is any combination function, $\mathbf{x}, \mathbf{z}$ are tuples of variables, and $c(\mathbf{x},\mathbf{z})$ is an equality constraint, then $\mathrm{comb}\{F_1,\ldots,F_k \mid \mathbf{z};\ c(\mathbf{x},\mathbf{z})\}$ is a probability formula.

Note that special cases of (iii) are multiplication ($F_3 = 0$) and inversion ($F_2 = 0$, $F_3 = 1$). The set of free variables of a probability formula is defined in the canonical way. The free variables of $\mathrm{comb}\{\ldots\}$ are the union of the free variables of the $F_i$, minus the variables in $\mathbf{z}$.

A probability formula $F$ over $S$ in the free variables $\mathbf{x} = (x_1,\ldots,x_n)$ defines for every $S$-structure $\mathfrak{D}$ over a domain $D$ a mapping $D^n \to [0,1]$. The value $F(\mathbf{a})$ for $\mathbf{a} \in D^n$ is defined inductively over the structure of $F$. We here give the details only for case (iv).

Let $F(\mathbf{x})$ be of the form $\mathrm{comb}\{F_1(\mathbf{x},\mathbf{z}),\ldots,F_k(\mathbf{x},\mathbf{z}) \mid \mathbf{z};\ c(\mathbf{x},\mathbf{z})\}$ (where not necessarily all the variables in $\mathbf{x}$ and $\mathbf{z}$ actually appear in all the $F_i$ and in $c$). In order to define $F(\mathbf{a})$, we must specify the multiset represented by

$$\{F_1(\mathbf{a},\mathbf{z}),\ldots,F_k(\mathbf{a},\mathbf{z}) \mid \mathbf{z};\ c(\mathbf{a},\mathbf{z})\}. \tag{6}$$

Let $E \subseteq D^{|\mathbf{z}|}$ be the set $\{\mathbf{a}' \mid c(\mathbf{a},\mathbf{a}')\}$. For each $\mathbf{a}' \in E$ and each $i \in \{1,\ldots,k\}$, by induction hypothesis, $F_i(\mathbf{a},\mathbf{a}') \in [0,1]$. The multiset represented by (6) now is defined as containing as many copies of $p \in [0,1]$ as there are representations $p = F_i(\mathbf{a},\mathbf{a}')$ with different $i$ or $\mathbf{a}'$. Note that $F_i(\mathbf{a},\mathbf{a}')$ and $F_i(\mathbf{a},\mathbf{a}'')$ count as different representations even in the case that the variables for which $\mathbf{a}'$ and $\mathbf{a}''$ substitute different elements do not actually appear in $F_i$. The multiset $\{r(\mathbf{a}) \mid z;\ z = z\}$, for instance, contains as many copies of the indicator $r(\mathbf{a})$ as there are elements in the domain over which it is evaluated.

For any tautological constraint like $z = z$, in the sequel we simply write $\tau$.

Another borderline case that needs clarification is the case where $\mathbf{z}$ is empty. Here our definition degenerates to: if $c(\mathbf{a})$ holds, then the multiset $\{F_1(\mathbf{a}),\ldots,F_k(\mathbf{a}) \mid \emptyset;\ c(\mathbf{a})\}$ contains as many copies of $p \in [0,1]$ as there are representations $p = F_i(\mathbf{a})$; it is empty if $c(\mathbf{a})$ does not hold.
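
The multiset semantics of case (iv), including both borderline cases, can be sketched in Python as follows. Subformulas and the constraint are passed as callables taking the free-variable tuple and the bound-variable tuple; this calling convention and all names are assumptions of the sketch.

```python
from itertools import product


def multiset(formulas, a, n_bound, domain, constraint):
    """List (with multiplicities) of all F_i(a, a') for a' in D^n_bound with c(a, a')."""
    values = []
    for a_prime in product(domain, repeat=n_bound):
        if constraint(a, a_prime):
            # One entry per pair (i, a'), even if F_i ignores a'.
            values.extend(F(a, a_prime) for F in formulas)
    return values


# Hypothetical unary relation r over a three-element domain.
domain = ["d1", "d2", "d3"]
r = {("d1",)}


def indicator(a, a_prime):
    return 1.0 if a in r else 0.0


def tautology(a, a_prime):
    return True


def unsatisfiable(a, a_prime):
    return False


# {r(a) | z; z = z}: one copy of the indicator per domain element.
copies = multiset([indicator], ("d1",), 1, domain, tautology)
# Empty bound tuple: k values if the constraint holds, none otherwise.
with_empty_bound = multiset([indicator, indicator], ("d2",), 0, domain, tautology)
empty = multiset([indicator], ("d1",), 0, domain, unsatisfiable)
```

With `n_bound = 0`, `itertools.product` yields exactly one empty tuple, which reproduces the degenerate case where the bound tuple is empty.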
By using indicator functions $r(\mathbf{x})$, the value of $F(\mathbf{a})$ is being defined in terms of the validity in $\mathfrak{D}$ of atomic formulas $r(\mathbf{a}')$. A natural generalization of probability formulas might therefore be considered, in which not only the truth values of atomic formulas are used, but indicator functions for arbitrary first-order formulas are allowed. As the following lemma shows, this provides no real generalization.

Lemma 2.4 Let $\phi(\mathbf{x})$ be a first-order formula over the relational vocabulary $S$. Then there exists a probability formula $F_\phi(\mathbf{x})$ over $S$, using max as the only combination function, s.t. for every finite $S$-structure $\mathfrak{D}$, and every $\mathbf{a} \in D^{|\mathbf{x}|}$: $F_\phi(\mathbf{a}) = 1$ iff $\phi(\mathbf{a})$ holds in $\mathfrak{D}$, and $F_\phi(\mathbf{a}) = 0$ else.

Proof: By induction on the structure of $\phi$. If $\phi \equiv r(\mathbf{x})$ for some $r \in S$, then $F_\phi(\mathbf{x}) = r(\mathbf{x})$. For $\phi \equiv x_1 = x_2$, let $F_\phi(x_1,x_2) = \max\{1 \mid \emptyset;\ x_1 = x_2\}$. Conjunction and negation are handled by multiplication and inversion, respectively, of probability formulas. For $\phi \equiv \exists y\, \psi(\mathbf{x},y)$ the corresponding probability formula is $F_\phi(\mathbf{x}) = \max\{F_\psi(\mathbf{x},y) \mid y;\ \tau\}$. $\Box$

Definition 2.5 A relational Bayesian network for the (relational) vocabulary $S$ is given by a directed acyclic graph containing one node for every $r \in S$. The node for an $n$-ary $r \in S$ is labeled with a probability formula $F_r(x_1,\ldots,x_n)$ over the symbols in the parent nodes of $r$, denoted by Pa($r$).

The definition for the probability of $b(x,y)$ in (2) does not seem to quite match definition 2.5, because it contains a distinction by cases not accounted for in definition 2.5. However, this distinction by cases can be incorporated into a single probability formula. If, for instance, $c_1(\mathbf{x})$ and $c_2(\mathbf{x})$ are two mutually exclusive and exhaustive equality constraints, then

$$F(\mathbf{x}) := \max\{\max\{F_1(\mathbf{x}) \mid \emptyset;\ c_1(\mathbf{x})\},\ \max\{F_2(\mathbf{x}) \mid \emptyset;\ c_2(\mathbf{x})\} \mid \emptyset;\ \tau\} \tag{7}$$

evaluates to $F_1(\mathbf{x})$ for $\mathbf{x}$ with $c_1(\mathbf{x})$, and to $F_2(\mathbf{x})$ for $\mathbf{x}$ with $c_2(\mathbf{x})$.

Let $N$ now be a relational Bayesian network over $S$. Let $r$ be (the label of) a node in $N$ with arity $n$, and let $\mathfrak{D}$ be a Pa($r$)-structure over domain $D$. For every $\mathbf{a} \in D^n$, $F_r(\mathbf{a}) \in [0,1]$ then is defined. Thus, for every interpretation $I(r)$ of $r$ in $D^n$ we can define

$$P(I(r)) := \prod_{\mathbf{a} \in I(r)} F_r(\mathbf{a}) \prod_{\mathbf{a} \notin I(r)} (1 - F_r(\mathbf{a})),$$

which gives a probability distribution over interpretations of $r$, given the interpretations of Pa($r$). Given a fixed domain $D$, a relational Bayesian network thus defines a joint probability distribution $P$ over the interpretations in $D$ of the symbols in $S$, or, equivalently, a probability measure on $S$-structures over $D$. Hence, semantically, relational Bayesian networks are mappings of finite domains $D$ into probability measures on $S$-structures over $D$.
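
As a sketch of this product formula, the following Python function computes the probability of one interpretation of $r$ from an already evaluated $F_r$. Passing $F_r$ as a callable on tuples, and the example values, are assumptions made for illustration.

```python
from itertools import product


def prob_interpretation(I, F_r, domain, arity):
    """Product of F_r(a) over tuples in I and (1 - F_r(a)) over tuples not in I."""
    p = 1.0
    for a in product(domain, repeat=arity):
        f = F_r(a)
        p *= f if a in I else 1.0 - f
    return p


# Hypothetical unary relation over three elements with fixed values of F_r.
domain = ["a", "b", "c"]
values = {("a",): 0.2, ("b",): 0.5, ("c",): 0.9}


def F_r(a):
    return values[a]


tuples = list(values)
# Summing over all 2^3 interpretations should give 1.
total = sum(
    prob_interpretation(
        {t for i, t in enumerate(tuples) if (mask >> i) & 1}, F_r, domain, 1
    )
    for mask in range(2 ** len(tuples))
)
```

The sum over all interpretations equals 1 because the formula treats membership of each tuple as an independent Bernoulli event.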
Example 2.6 Reconsider the relations cancer and exposed as described in the introduction. Assume that $\gamma: \mathbb{N} \to [0,1]$ is the probability distribution that for any fixed organ $y$ gives the probability that $y$ develops cancer after the $n$th exposure to radiation. Let $\Gamma(n) := \sum_{i=1}^{n} \gamma(i)$ be the corresponding distribution function. Then $\Gamma$ can be used to define a combination function $\mathrm{comb}_\Gamma$ by letting for a multiset $A$: $\mathrm{comb}_\Gamma A := \Gamma(n)$, where $n$ is the number of nonzero elements in $A$ (counting multiplicities). Using $\mathrm{comb}_\Gamma$ we obtain the probability formula $\mathrm{comb}_\Gamma\{\mathit{exposed}(x,y,z) \mid z;\ \tau\}$ for the contribution of organ $y$ to the cancer risk of $x$. Combining for all $y$, then

$$F_{\mathit{cancer}}(x) = \text{n-o}\{\mathrm{comb}_\Gamma\{\mathit{exposed}(x,y,z) \mid z;\ \tau\} \mid y;\ \tau\}$$

is a probability formula defining the risk of cancer for $x$, given the relation exposed.
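
The nested combination in this example can be sketched in Python as below. The concrete choice of $\gamma$ (here $\gamma(i) = 2^{-(i+1)}$), the sorted domains passed as separate lists, and the sample data are hypothetical and serve only to make the sketch runnable.

```python
def gamma(i):
    """Hypothetical probability of developing cancer after the i-th exposure."""
    return 0.5 ** (i + 1)


def Gamma(n):
    """Distribution function: sum of gamma(i) for i = 1..n."""
    return sum(gamma(i) for i in range(1, n + 1))


def comb_Gamma(values):
    """Combination function: Gamma applied to the number of nonzero elements."""
    return Gamma(sum(1 for v in values if v != 0))


def noisy_or(values):
    result = 1.0
    for v in values:
        result *= 1.0 - v
    return 1.0 - result


def F_cancer(x, organs, times, exposed):
    """Inner comb_Gamma over times z, outer noisy-or over organs y."""
    return noisy_or(
        comb_Gamma([1.0 if (x, y, z) in exposed else 0.0 for z in times])
        for y in organs
    )


organs = ["lung", "skin", "liver"]
times = [1, 2, 3]
exposed = {("ann", "lung", 1), ("ann", "lung", 2), ("ann", "skin", 1)}
risk = F_cancer("ann", organs, times, exposed)
```

Here the lung contributes $\Gamma(2) = 0.375$, the skin $\Gamma(1) = 0.25$, and the liver $\Gamma(0) = 0$, which noisy-or combines to $1 - 0.625 \cdot 0.75 = 0.53125$.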
In the preceding example we have tacitly assumed a multi-sorted domain, so that the variables $x,y,z$ range over different sets: people, organs, and times, respectively. We do not introduce here an extra formalization for dealing with many-sorted domains. It is clear that this can be done easily, but it would introduce an extra load of notation.

3 INFERENCE
The inference problem we would like to solve is: given a relational Bayesian network $N$ for $S$, a finite domain $D = \{d_1,\ldots,d_n\}$, an evidence set of ground literals $E = \{r_1(\mathbf{a}_1),\ldots,r_k(\mathbf{a}_k), \neg r_{k+1}(\mathbf{a}_{k+1}),\ldots,\neg r_m(\mathbf{a}_m)\}$ with $r_i \in S$ (not necessarily distinct), $\mathbf{a}_i \subseteq D$ (not necessarily distinct) for $i = 1,\ldots,m$, and a ground atom $r_0(\mathbf{a}_0)$ ($r_0 \in S$, $\mathbf{a}_0 \subseteq D$), what is the probability of $r_0(\mathbf{a}_0)$ given $r_1(\mathbf{a}_1),\ldots,\neg r_m(\mathbf{a}_m)$? More precisely: in the probability measure $P$ defined by $N$ on the $S$-structures over $D$, what is the conditional probability $P(r_0(\mathbf{a}_0) \mid E)$ of a structure satisfying $r_0(\mathbf{a}_0)$, given that it satisfies $r_1(\mathbf{a}_1),\ldots,\neg r_m(\mathbf{a}_m)$?

Since for any given finite domain a relational Bayesian network can be seen as an ordinary Bayesian network for variables with finitely many possible values, in principle, any inference algorithm for standard Bayesian networks can be used.

Unfortunately, however, direct application of any such algorithm will be inefficient, because these algorithms include a summation over all possible values of a node, and the number of possible values here is exponential in the size of the domain.

For this reason, it will often be more efficient to follow the approach used in inference from rule-base encodings of probabilistic knowledge, and to construct for every specific inference task an auxiliary Bayesian network whose nodes are ground atoms in the symbols from $S$, each of which has the two possible values true and false (cf. (Breese 1992), (Ngo et al. 1995)).

The reason why we can do the same here is that in the query $r_0(\mathbf{a}_0)$ we do not ask for the probability of any specific interpretation of $r_0$, but only for the probability of all interpretations containing $\mathbf{a}_0$. For the computation of this probability, in turn, it is irrelevant to know the exact interpretations of parent nodes $r'$ of $r_0$. Instead, we only need to know which of those tuples $\mathbf{a}'$ belong to $r'$, whose indicator $r'(\mathbf{a}')$ is needed in the computation of $F_{r_0}(\mathbf{a}_0)$.

In order to construct such an auxiliary network, we have to compute for some given atom $r(\mathbf{a})$ the list of atoms $r'(\mathbf{a}')$ on whose truth value $F_r(\mathbf{a})$ depends. One way of doing this is to just go through a recursive evaluation of $F_r(\mathbf{a})$, and list all the ground atoms encountered in this evaluation. However, rather than doing this, it is useful to compute for every relation symbol $r \in S$, and each parent relation $r'$ of $r$, an explicit description of the tuples $\mathbf{y}$ such that $F_r(\mathbf{x})$ depends on $r'(\mathbf{y})$. Such an explicit description can be given in the form of a first-order formula $pa_{rr'}(\mathbf{x},\mathbf{y})$ over the empty vocabulary.

To demonstrate the general method for the computation of these formulas, we show how to obtain $pa_{sb}(x,y_1,y_2)$ for $F_s(x)$ as defined in (5). By induction on the structure of $F_s$, we compute formulas $pa_{Gb}(\mathbf{v},y_1,y_2)$ that define for a subformula $G(\mathbf{v})$ of $F_s$ the set of $(y_1,y_2)$ s.t. $G(\mathbf{v})$ depends on $b(y_1,y_2)$. In the end, then, $pa_{sb}(x,y_1,y_2) :\equiv pa_{F_s b}(x,y_1,y_2)$.

The two subformulas $t(x)$ and $(1-t(x))$ of $F_s$ do not depend on $b$ at all; therefore we can let $pa_{t(x)b}(x,y_1,y_2) \equiv pa_{(1-t(x))b}(x,y_1,y_2) \equiv \bot$, where $\bot$ is some unsatisfiable formula.

To obtain $pa_{H(x)b}(x,y_1,y_2)$ we begin with the atomic subformulas $b(x,z)$ and $b(z,x)$ of $H(x)$, which yield $pa_{b(x,z)b}(x,z,y_1,y_2) \equiv y_1 = x \wedge y_2 = z$ and $pa_{b(z,x)b}(x,z,y_1,y_2) \equiv y_1 = z \wedge y_2 = x$ respectively. The remaining atomic subformulas $t(z)$, 1, and $p_3$ appearing within the max combination function again only yield an unsatisfiable formula. Skipping one trivial step where the formulas for the two arguments of $M(x,z) := \max\{\ldots\}$ are computed, we next obtain the formula

$$pa_{M(x,z)b}(x,z,y_1,y_2) \equiv (y_1 = x \wedge y_2 = z) \vee (y_1 = z \wedge y_2 = x)$$

(after deleting some meaningless unsatisfiable disjuncts). $H(x) = \text{n-o}\{M(x,z) \mid z;\ z \neq x\}$ depends on all $b(y_1,y_2)$ for which there exists some $z \neq x$ s.t. $pa_{M(x,z)b}(x,z,y_1,y_2)$ holds. Hence,

$$pa_{H(x)b}(x,y_1,y_2) \equiv \exists z\,(z \neq x \wedge ((y_1 = x \wedge y_2 = z) \vee (y_1 = z \wedge y_2 = x))), \tag{8}$$

which is already the same as $pa_{F_s(x)b}(x,y_1,y_2)$. Finally, we can simplify (8), and obtain

$$pa_{sb}(x,y_1,y_2) \equiv (y_1 = x \wedge y_2 \neq x) \vee (y_1 \neq x \wedge y_2 = x). \tag{9}$$
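
As a small sanity check, the Python sketch below encodes formula (9) and compares it by brute force with the existential formula (8), reading (8) with $z$ restricted to $z \neq x$ as in the constraint of $H(x)$. It then lists the parents of $s(l_1)$ in a ten-location domain. The function names and the tuple encoding of ground atoms are assumptions of the sketch.

```python
def pa_sb(x, y1, y2):
    """Formula (9): does F_s(x) depend on b(y1, y2)?"""
    return (y1 == x and y2 != x) or (y1 != x and y2 == x)


def pa_H_b(x, y1, y2, domain):
    """Formula (8) with z ranging over the locations different from x."""
    return any(
        (y1 == x and y2 == z) or (y1 == z and y2 == x)
        for z in domain
        if z != x
    )


def parents_of_s(x, domain):
    """Ground atoms b(y1, y2) that are parents of s(x) in the auxiliary network."""
    return [("b", y1, y2) for y1 in domain for y2 in domain if pa_sb(x, y1, y2)]


locations = [f"l{i}" for i in range(1, 11)]
parents = parents_of_s("l1", locations)

# (8) and (9) agree on every triple of locations.
agree = all(
    pa_sb(x, y1, y2) == pa_H_b(x, y1, y2, locations)
    for x in locations
    for y1 in locations
    for y2 in locations
)
```

Note that evaluating `pa_sb` takes constant time, whereas the quantified form `pa_H_b` scans the domain; this is the practical benefit of eliminating the quantifier.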
In general, the formulas $pa_{rr'}(\mathbf{x},\mathbf{y})$ are existential $\emptyset$-formulas, i.e. existential formulas over the empty vocabulary. It is not always possible to completely eliminate the existential quantifiers as in the preceding example. However, it is always possible to transform $pa_{rr'}(\mathbf{x},\mathbf{y})$ into a formula so that quantifiers only appear in subformulas of the form $\exists^{\geq n} x\; x = x$, postulating the existence of at least $n$ elements. This means that for every formula $pa_{rr'}(\mathbf{x},\mathbf{y})$, and tuples $\mathbf{a},\mathbf{a}' \subseteq D$, it can be checked in time independent of the size of $D$ whether $pa_{rr'}(\mathbf{a},\mathbf{a}')$ holds.

The formula $pa_{rr'}(\mathbf{x},\mathbf{y})$ enables us to find for every tuple $\mathbf{a}$ the parents $r'(\mathbf{a}')$ of $r(\mathbf{a})$ in the auxiliary network. Moreover, we can take this one step further: suppose that in the original network $N$ there is a path of length two from node $r''$ via a node $r'$ to $r$. Then, in the auxiliary network, there is a path of length two from a node $r''(\mathbf{a}'')$ via a node $r'(\mathbf{a}')$ to $r(\mathbf{a})$ iff the formula

$$pa_{r'' \to r' \to r}(\mathbf{x},\mathbf{y}) \equiv \exists \mathbf{z}\,(pa_{rr'}(\mathbf{x},\mathbf{z}) \wedge pa_{r'r''}(\mathbf{z},\mathbf{y})) \tag{10}$$

is satisfied for $\mathbf{x} = \mathbf{a}$ and $\mathbf{y} = \mathbf{a}''$. Taking the disjunction of all formulas of the form (10) for all paths in $N$ leading from $r''$ to $r$ then yields a formula $pa^*_{rr''}(\mathbf{x},\mathbf{y})$ defining all predecessors $r''(\mathbf{a}'')$ of a node $r(\mathbf{a})$ in the auxiliary network.

Using the $pa_{rr'}$ and $pa^*_{rr''}$, we can for given evidence and query construct the auxiliary network needed to answer the query: we begin with a node $r_0(\mathbf{a}_0)$ for the query. For all nodes $r(\mathbf{a})$ added to the network, we add all parents $r'(\mathbf{a}')$ of $r(\mathbf{a})$, as defined by $pa_{rr'}$. If $r(\mathbf{a})$ is not instantiated in $E$, using the formulas $pa^*_{r'r}$, we check whether the subgraph rooted at $r(\mathbf{a})$ contains a node instantiated in $E$. If this is the case, we add all successors of $r(\mathbf{a})$ that lie on a path from $r(\mathbf{a})$ to an instantiated node (these are again given by the formulas $pa^*_{r'r}$). Thus, we can construct directly the minimal network needed to answer the query, without first backward chaining from every atom in $E$, and pruning afterwards.

Auxiliary networks as described here still encode finer distinctions in the instantiations of the nodes of $N$ than is actually needed to solve our inference problem. Consider, for example, the case where the domain in example 2.1 consists of ten locations $\{l_1,\ldots,l_{10}\}$, there is no evidence, and the query is $s(l_1)$. According to (9), the auxiliary network will contain nodes $b(l_1,l_i)$, $b(l_i,l_1)$ for all $i = 2,\ldots,10$. In applying standard inference techniques on this network, we distinguish e.g. the case where $b(l_1,l_2)$, $b(l_2,l_1)$ are true and $b(l_1,l_3)$, $b(l_3,l_1)$ are false from the case where $b(l_1,l_2)$, $b(l_2,l_1)$ are false and $b(l_1,l_3)$, $b(l_3,l_1)$ are true, and all other $b(l_1,l_i)$, $b(l_i,l_1)$ have the same truth value. However, for the given inference problem, this distinction really is unnecessary, because the identity of locations mentioned neither in evidence nor query is immaterial. Future work will therefore be directed towards finding inference techniques for relational Bayesian networks that distinguish instantiations of the relations in the network at a higher level of abstraction than the current auxiliary networks, and thereby reduce the complexity of inference in terms of the size of the underlying domain.

4 RECURSIVE NETWORKS
In the distributions defined by relational Bayesian networks of definition 2.5, the events $r(\mathbf{a})$ and $r(\mathbf{a}')$ with $\mathbf{a} \neq \mathbf{a}'$ are conditionally independent, given the interpretation of the parent relations of $r$. This is a rather strong limitation of the expressiveness of these networks. For instance, using these networks, we cannot model a variation of example 2.1 in which the predicate blocked is symmetric: with $b(x,y)$ being independent from $b(y,x)$, $b(x,y) \Leftrightarrow b(y,x)$ cannot be enforced.

There are other interesting things that we are not able to model so far. Among them are random functions (the main concern of (Haddawy 1994)), and a recursive temporal dependence of a relation on itself (addressed both in (Ngo et al. 1995) and (Glesner & Koller 1995)). In this section we define a straightforward generalization of relational Bayesian networks that allows us to treat all these issues in a uniform way.

We can identify a recursive dependence of a relation on itself as the general underlying mechanism we have to model. In the case of symmetric relations, this is a dependence of $r(x,y)$ on $r(y,x)$. In the case of a temporal development, this is the dependence of a predicate $r(t,\mathbf{x})$, having a time variable as its first argument, on $r(t-1,\mathbf{x})$. Functions can be seen as special relations $r(\mathbf{x},y)$, where for every $\mathbf{x}$ there exists exactly one $y$ s.t. $r(\mathbf{x},y)$ is true. Thus, for every $\mathbf{x}$, $r(\mathbf{x},y)$ depends on all $r(\mathbf{x},y')$ in that exactly one of these atoms must be true.

It is clear that there is no fundamental problem in modeling such recursive dependencies within a Bayesian network framework, as long as the recursive dependency of $r(\mathbf{a})$ on $r(\mathbf{a}_1),\ldots,r(\mathbf{a}_k)$ does not produce any cycles. Most obviously, in the case of a temporal dependency, the use of $r(t-1,\mathbf{x})$ in a definition of the probability of $r(t,\mathbf{x})$ does not pose a problem, as long as a non-recursive definition of the probability of $r(0,\mathbf{x})$ is provided.

To make the recursive dependency of $r(x,y)$ on $r(y,x)$ in
a symmetric relation similarly well-founded, we can use
a total order $\le$ on the domain. Then we can generate a
random symmetric relation by first defining the probability
of $r(x,y)$ with $x \le y$, and then the (0,1-valued) probability
of $r(y,x)$ given $r(x,y)$. Now consider the case of a random
function $r(\mathbf{x},y)$ with possible values
$y \in \{v_1,\ldots,v_k\}$. Here, too, we can make the
interdependence of the different $r(\mathbf{x},y)$ acyclic by using
a total order on $\{v_1,\ldots,v_k\}$, and assigning a truth value
to $r(\mathbf{x},v_i)$ by taking into account the already defined
truth values of $r(\mathbf{x},v_j)$ for all $v_j$ that precede
$v_i$ in that order.
From these examples we see that what we essentially need,
in order to extend our framework to cover a great variety
of interesting specific forms of probability distributions
over $S$-structures, are well-founded orderings on tuples of
domain elements. These well-founded orderings can be
supplied via rigid relations on the domain, i.e. fixed,
predetermined relations that are not generated probabilistically.
Indeed, one such relation we already have used throughout:
the equality relation. It is therefore natural to extend our
framework by allowing additional relations that are to be
used in the same way as the equality predicate has been
employed, namely, in constraints for combination functions.
Also, fixed constants will be needed as the possible values
of random functions.
For the case of a binary symmetric relation $r(x,y)$, assume,
as above, that we are given a total (non-strict) order $\le$ on
the domain. A probability formula that defines a probability
distribution concentrated on symmetric relations, and
making $r(x_1,x_2)$ true with probability $p$ for all $(x_1,x_2)$,
then is
$$F_r(x_1,x_2) = \max\{\, \max\{ p \mid \emptyset;\ x_1 \le x_2 \},\ \max\{ r(x_2,x_1) \mid \emptyset;\ \neg\, x_1 \le x_2 \} \mid \emptyset;\ \tau \,\}. \tag{11}$$
As in (7), here a nested $\max\{\ldots\}$-function is used in order
to model a distinction by cases. The first inner max-function
evaluates to $p$ if $x_1 \le x_2$, and to 0 else. The second
max-function is equal to $r(x_2,x_1)$ if $x_1 > x_2$, and 0 else.
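The case distinction in (11) can be illustrated by a small sampling sketch. It is only an illustration under stated assumptions: the helper names `F_r` and `sample_symmetric`, the dictionary representation of the relation, and the use of Python's built-in `<=` as the rigid total order are all hypothetical and not part of the formalism. A max over an empty multiset is taken to be 0, as in the text.

```python
import random

def F_r(x1, x2, r, p, leq):
    """Evaluate formula (11) for the atom r(x1, x2).

    r   : dict with the truth values (0 or 1) of the atoms fixed so far
    leq : the rigid order relation, leq(a, b) meaning a <= b
    The max over an empty multiset is 0.
    """
    first = max([p] if leq(x1, x2) else [], default=0.0)
    second = max([r[(x2, x1)]] if not leq(x1, x2) else [], default=0.0)
    return max(first, second)

def sample_symmetric(domain, p, leq, rng):
    """Sample a symmetric relation by instantiating atoms in a well-founded order."""
    r = {}
    pairs = [(a, b) for a in domain for b in domain]
    # Atoms with x1 <= x2 come first, so r(x2, x1) is already known
    # whenever the second case of (11) applies.
    pairs.sort(key=lambda ab: not leq(*ab))
    for a, b in pairs:
        r[(a, b)] = 1 if rng.random() < F_r(a, b, r, p, leq) else 0
    return r

relation = sample_symmetric(range(4), 0.5, lambda a, b: a <= b, random.Random(0))
```

For pairs with $x_1 > x_2$ the formula evaluates to 0 or 1, so the sampled value is simply a copy of the value already drawn for the mirrored pair.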
For the temporal example, assume that the domain contains
$n+1$ time points $t_0,\ldots,t_n$, and a successor relation
$s = \{(t_i,t_{i+1}) \mid 0 \le i \le n-1\}$ on the $t_i$'s. Assume
that $r(t,\mathbf{x})$ is a relation with a time parameter as the
first argument, and that $r(t_0,\mathbf{x})$ shall hold with
probability $p_0$ for all $\mathbf{x}$, whereas $r(t_{i+1},\mathbf{x})$
has probability $p_1$ if $r(t_i,\mathbf{x})$ holds, and
probability $p_2$ else. In order to define the probability of
$r(t,\mathbf{x})$ by a probability formula, the case $t = t_0$ must be
distinguished from the case $t = t_i$, $i \ge 1$. For this we
use the probability formula $F_0(t) = \max\{1 \mid t';\ s(t',t)\}$,
which evaluates to 0 for $t = t_0$, and to 1 for $t = t_1,\ldots,t_n$.
We can now use the formula
$$F_r(t,\mathbf{x}) = (1 - F_0(t))\,p_0 + F_0(t)\,\max\{\, r(t',\mathbf{x})\,p_1 + (1 - r(t',\mathbf{x}))\,p_2 \mid t';\ s(t',t) \,\}$$
to define the probability of $r(t,\mathbf{x})$.
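A minimal sketch of how this formula evaluates, for one fixed tuple of non-temporal arguments. The function names, the encoding of time points as integers, the encoding of the successor relation as a set of pairs, and the parameter names `p0`, `p1`, `p2` (initial probability, probability after a true predecessor, probability after a false predecessor) are assumptions made for this illustration. The helper `marginals` is not part of the formula; it applies the law of total probability to the two cases.

```python
def F0(t, succ):
    """max{1 | t'; s(t', t)}: 1 if t has a predecessor, 0 otherwise (empty max)."""
    return max((1.0 for (a, b) in succ if b == t), default=0.0)

def F_r(t, r, succ, p0, p1, p2):
    """Probability of the atom at time t, given the 0/1 values r[t'] of its predecessors."""
    f0 = F0(t, succ)
    inner = max(
        (r[a] * p1 + (1 - r[a]) * p2 for (a, b) in succ if b == t),
        default=0.0,
    )
    return (1 - f0) * p0 + f0 * inner

def marginals(n, p0, p1, p2):
    """Marginal probability of the atom at t_0, ..., t_n."""
    m = [p0]
    for _ in range(n):
        # P(r at t_{i+1}) = P(r at t_i) * p1 + (1 - P(r at t_i)) * p2
        m.append(m[-1] * p1 + (1 - m[-1]) * p2)
    return m

succ = {(0, 1), (1, 2)}                    # successor relation on t_0, t_1, t_2
first = F_r(0, {}, succ, 0.5, 0.8, 0.2)    # no predecessor: the p0 case applies
later = F_r(1, {0: 1}, succ, 0.5, 0.8, 0.2)  # predecessor true: the p1 case applies
```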
Finally, for a functional relation $r(\mathbf{x},y)$, suppose that we
are given a domain, together with the interpretations of $n$
constant symbols $v_1,\ldots,v_n$, and a strict total order $<$, s.t.
$v_1 < v_2 < \ldots < v_n$. Now consider the probability formula
$$F_r(\mathbf{x},y) = \bigl(1 - \max\{ r(\mathbf{x},z) \mid z;\ z < y \}\bigr) \cdot \max\{\, \max\{ p_1 \mid \emptyset;\ y = v_1 \}, \ldots, \max\{ p_n \mid \emptyset;\ y = v_n \} \mid \emptyset;\ \tau \,\}$$
The first factor in this formula tests whether $r(\mathbf{x},z)$
already is true for some possible value $v_i < y$. If this is the
case, then the probability of $r(\mathbf{x},y)$ given by
$F_r(\mathbf{x},y)$ is 0. Otherwise, the probability of
$r(\mathbf{x},y)$ is $p_i$ iff $y = v_i$.
The probability that by this procedure the argument $\mathbf{x}$ is
assigned the value $v_i$ then is
$(1-p_1)(1-p_2)\cdots(1-p_{i-1})\,p_i$.
By a suitable choice of the $p_i$ any probability distribution
over the $v_i$ can be generated.
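The last claim can be checked with a short calculation. The sketch below assumes a target distribution $q_1,\ldots,q_n$ over the $v_i$ and chooses $p_i = q_i / (1 - q_1 - \ldots - q_{i-1})$; this particular choice, and the function names, are ours and serve only as an illustration. Exact rational arithmetic is used so that the product formula can be compared without rounding.

```python
from fractions import Fraction

def assignment_probs(p):
    """Probability that v_i is assigned: (1 - p_1) ... (1 - p_{i-1}) * p_i."""
    out, none_so_far = [], Fraction(1)
    for p_i in p:
        out.append(none_so_far * p_i)
        none_so_far *= 1 - p_i
    return out

def params_for(q):
    """Choose p_i = q_i / (1 - q_1 - ... - q_{i-1}) for a target distribution q."""
    p, remaining = [], Fraction(1)
    for q_i in q:
        # If no probability mass remains, the value of p_i is irrelevant.
        p.append(q_i / remaining if remaining > 0 else Fraction(0))
        remaining -= q_i
    return p

q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p = params_for(q)
recovered = assignment_probs(p)   # equals q again
```

Note that the last parameter comes out as 1 whenever the target distribution sums to 1, which is what guarantees that some value is always assigned.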
The given examples motivate a generalization of relational
Bayesian networks. For this, let $R$ be a vocabulary containing
relation and constant symbols, $S$ a relational vocabulary
with $R \cap S = \emptyset$. An $R$-constraint $c(\mathbf{x})$ for
$\mathbf{x}$ is a quantifier-free $R$-formula. Define the class of
$R$-probability formulas over $S$ precisely as in definition 2.3,
with "equality constraint" replaced by "$R$-constraint".
Definition 4.1 Let $R, S$ be as above. A recursive relational
Bayesian network for $S$ with $R$-constraints is given by a
directed acyclic graph containing one node for every $r \in S$.
The node for an $n$-ary $r \in S$ is labeled with an
$R$-probability formula $F_r(x_1,\ldots,x_n)$ over
$\mathrm{Pa}(r) \cup \{r\}$.
The semantics of a recursive relational Bayesian network
is a bit more complicated than that of relational Bayesian
networks. The latter defined a mapping of domains $D$
into probability measures on $S$-structures over $D$.
Recursive relational Bayesian networks essentially define a
mapping of $R$-structures $\mathfrak{D}$ into probability measures on
$S$-expansions of $\mathfrak{D}$. This mapping, however, is only defined
for $R$-structures whose interpretations of the symbols in $R$
lead to well-founded recursive definitions of the probabilities
for the $r$-atoms ($r \in S$). If, for instance, $R = \{\le\}$,
and $\mathfrak{D}$ is an $R$-structure in which there exist two elements
$d_1, d_2$, s.t. neither $d_1 \le d_2$, nor $d_2 \le d_1$, then (11) does
not define a probability measure on $\{r\}$-expansions of $\mathfrak{D}$,
because the probability of $r(d_1,d_2)$ gets defined in terms
of $r(d_2,d_1)$, and vice versa.
As in section 3, for every $r' \in \mathrm{Pa}(r) \cup \{r\}$ a formula
$\mathit{pa}_{rr'}(\mathbf{x},\mathbf{y})$ can be computed that defines
for an $R$-structure $\mathfrak{D}$ and $\mathbf{d} \subseteq D$ the tuples
$\mathbf{d}' \subseteq D$, s.t. $F_r(\mathbf{d})$ depends on
$r'(\mathbf{d}')$. While in section 3 existential formulas over the
empty vocabulary were obtained, for recursive relational
networks the $\mathit{pa}_{rr'}$ are existential formulas over $R$.
The definitions of the probabilities $F_r(\mathbf{d})$ are well-founded
for $\mathbf{d} \subseteq D$ iff the relation
$\mathfrak{D}(\mathit{pa}_{rr}) := \{ (\mathbf{d},\mathbf{d}') \mid \mathit{pa}_{rr}(\mathbf{d},\mathbf{d}') \text{ holds in } \mathfrak{D} \}$
is acyclic. A recursive relational
Bayesian network $N$ thus defines a probability measure on
$S$-expansions of those $R$-structures $\mathfrak{D}$ for which the
relation $\mathfrak{D}(\mathit{pa}_{rr})$ is acyclic for all $r \in S$.
The discussion of inference procedures for relational
Bayesian networks in section 3 applies with few modifications
to recursive networks as well. Again, we can construct
an auxiliary network with nodes for ground atoms,
using formulas $\mathit{pa}_{rr'}$ and $\mathit{pa}^{*}_{rr'}$. The
complexity of this construction, however, increases on two accounts:
first, the existential quantifications in the $\mathit{pa}_{rr'}$,
$\mathit{pa}^{*}_{rr'}$ can no longer
be reduced to mere cardinality constraints. Therefore, the
complexity of deciding whether
$\mathit{pa}^{(*)}_{rr'}(\mathbf{d},\mathbf{d}')$ holds for given
$\mathbf{d},\mathbf{d}' \subseteq D$ is no longer guaranteed to be
independent of the size of the domain $D$. Second, to obtain the
formulas $\mathit{pa}^{*}_{rr'}$
we may have to build much larger disjunctions: it is no
longer sufficient to take the disjunction over all possible
paths from $r'$ to $r$ in the network structure of $N$. In
addition, for every relation $\tilde{r}$ on these paths, the disjunction
over all possible paths within
$\mathfrak{D}(\mathit{pa}_{\tilde{r}\tilde{r}})$ has to be taken. This
amounts to determining the length $l$ of the longest path in
$\mathfrak{D}(\mathit{pa}_{\tilde{r}\tilde{r}})$, and then taking the
disjunction over all formulas
$$\mathit{pa}^{i}_{\tilde{r}\tilde{r}}(\mathbf{x},\mathbf{y}) := \exists \mathbf{z}_1,\ldots,\mathbf{z}_i \bigl( \mathit{pa}_{\tilde{r}\tilde{r}}(\mathbf{x},\mathbf{z}_1) \wedge \ldots \wedge \mathit{pa}_{\tilde{r}\tilde{r}}(\mathbf{z}_i,\mathbf{y}) \bigr)$$
with $i < l$. As a consequence, the formulas $\mathit{pa}^{*}_{rr'}$ are no
longer independent of the structure $\mathfrak{D}$ under consideration.
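Both the well-foundedness condition and the path length $l$ used above can be made concrete for the symmetric-relation formula (11). The sketch below is an illustration under stated assumptions: it hard-codes the dependency pattern of (11) (the atom $r(x_1,x_2)$ depends on $r(x_2,x_1)$ exactly when $x_1 \le x_2$ fails), and the function names and the depth-first search are our own choices, not a construction from section 3.

```python
from itertools import product

def dependency_relation(domain, leq):
    """Pairs (atom, atom') such that the probability of atom in (11) depends on atom'."""
    return {
        ((x1, x2), (x2, x1))
        for x1, x2 in product(domain, repeat=2)
        if not leq(x1, x2)
    }

def longest_path_length(edges):
    """Number of edges on the longest path, or None if the relation contains a cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    state, depth = {}, {}

    def visit(a):
        if state.get(a) == "open":       # reached a node on the current path again
            raise ValueError("cycle")
        if state.get(a) == "done":
            return depth[a]
        state[a] = "open"
        depth[a] = max((1 + visit(b) for b in succ.get(a, [])), default=0)
        state[a] = "done"
        return depth[a]

    try:
        return max((visit(a) for a in list(succ)), default=0)
    except ValueError:
        return None

# Total order on three elements: the dependencies are well-founded.
total = longest_path_length(dependency_relation([0, 1, 2], lambda a, b: a <= b))
# Two incomparable elements: r(d1, d2) and r(d2, d1) depend on each other.
partial = longest_path_length(dependency_relation(["d1", "d2"], lambda a, b: a == b))
```

The second call reproduces the situation described for two incomparable elements, where the dependency relation is cyclic and no probability measure is defined.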
5 CONCLUSION
In this paper we have presented a new approach to deal
with rule-like probability statements for nondeterministic
relations on the elements of some domain of discourse.
Deviating from previous proposals for formalizing such rules
with a logic programming style framework, we here have
associated with every relation symbol $r$ a single probability
formula that directly defines the probability distribution
over interpretations of $r$ within a Bayesian network. The
resulting framework is both more expressive and semantically
more transparent than previous ones. It is more expressive,
because it introduces the tools to restrict the instantiations
of certain rules to tuples satisfying certain equality
constraints, and to specify complex combinations and nestings
of combination functions. It is semantically more transparent,
because a relational Bayesian network directly defines
a unique probability distribution over $S$-structures, whereas
the semantics of a probabilistic rule base usually are only
implicitly defined through a transformation into an auxiliary
Bayesian network.
Inference from relational Bayesian networks by auxiliary
network construction is as efficient as inference (by
essentially the same method) in rule based formalisms. It may
be hoped that in the case where this inference procedure
seems unsatisfactory, namely, for large domains most of
whose elements are not mentioned in the evidence, our new
representation paradigm will lead to more efficient inference
techniques.
Acknowledgments
I have benefited from discussions with Daphne Koller who
also provided the original motivation for this work. This
work was funded in part by DARPA contract DACA76-93-C-0025,
under subcontract to Information Extraction and
Transport, Inc.
References
Breese, J. S. (1992), Construction of belief and decision
networks, Computational Intelligence.
Glesner, S. & Koller, D. (1995), Constructing flexible dynamic
belief networks from first-order probabilistic
knowledge bases, in Proceedings of ECSQARU,
Lecture Notes in Artificial Intelligence, Springer Verlag.
Haddawy, P. (1994), Generating Bayesian networks from
probability logic knowledge bases, in Proceedings
of the Tenth Conference on Uncertainty in Artificial
Intelligence.
Halpern, J. (1990), An analysis of first-order logics of
probability, Artificial Intelligence 46, 311-350.
Koller, D. & Halpern, J. Y. (1996), Irrelevance and conditioning
in first-order probabilistic logic, in Proceedings
of the 13th National Conference on Artificial
Intelligence (AAAI), pp. 569-576.