Relational Bayesian Networks
Manfred Jaeger
Computer Science Department,
Stanford University, Stanford CA 94305
jaeger@robotics.stanford.edu
Abstract

A new method is developed to represent probabilistic relations on multiple random events. Where previously knowledge bases containing probabilistic rules were used for this purpose, here a probability distribution over the relations is directly represented by a Bayesian network. By using a powerful way of specifying conditional probability distributions in these networks, the resulting formalism is more expressive than the previous ones. Particularly, it provides for constraints on equalities of events, and it allows one to define complex, nested combination functions.
1 INTRODUCTION

In a standard Bayesian network, nodes are labeled with random variables (r.v.s) X that take values in some finite set {x1,...,xn}. A network with r.v.s (earth)quake, burglary, and alarm, each with possible values {true, false}, for instance, then defines a joint probability distribution for these r.v.s.

Evidence, E, is a set of instantiations of some of the r.v.s. A query asks for the probability of a specific value x of some r.v. X, given the instantiations in the evidence. The answer to this query is the conditional probability P(X = x | E) in the distribution P defined by the network.
The implicit underlying assumption we make here is that the value assignments in the evidence and the query instantiate the attributes of one single random event, or object, that has been sampled (observed) according to the distribution of the network. If, for instance, E = {quake = true, alarm = true}, then both instantiations are assumed to refer to one single observed state of the world ω, and not to the facts that there was an earthquake in 1906, and the alarm bell is ringing right now.

(On leave from: Max-Planck-Institut für Informatik, Im Stadtwald, D-66123 Saarbrücken, Germany)
In case we indeed have evidence about several observed events, e.g. quake(ω1) = true, alarm(ω1) = true, burglary(ω2) = false, then, for the purpose of answering a query X(ωi) = x about one of these events, all evidence about other events can be ignored, and only P(X(ωi) = x | E(ωi)) needs to be computed. For each of these computations the same Bayesian network can be used.
Things become much different when we also want to model relations that may hold between two different random events. Suppose, for instance, we also want to say something about the probability that one earthquake was stronger than another. For this we use the binary relation stronger, and would like to relate the probability of stronger(ω1, ω2) to, say, alarm(ω1) and alarm(ω2). Evidence may now contain instantiations of stronger for many different pairs of states: {stronger(ω1, ω2),..., stronger(ω_{n−1}, ω_n)}, and a query may be alarm(ω1). In evaluating this query, we no longer can ignore information about the other events ω2,...,ωn. This means, however, that if we do not want to impose an a priori restriction on the number of events we can have evidence for, no single fixed Bayesian network with finite-range r.v.s will be sufficient to evaluate queries for arbitrary evidence sets.
Nevertheless, the probabilistic information that we would like to encode about relations between an arbitrary number of different events may very well be expressible by some finite set of laws, applicable to an arbitrary number of events. One way of expressing such laws, which has been explored in the past ((Breese 1992), (Poole 1993), (Haddawy 1994)), is to use probabilistic rules such as

  stronger(u,v) ←_{0.8} quake(u) ∧ quake(v) ∧ alarm(u) ∧ ¬alarm(v).   (1)

The intended meaning here is: for all states of the world ω1 and ω2, given that quake(ω1) ∧ ... ∧ ¬alarm(ω2) is true, the probability that ω1 is stronger than ω2 is 0.8. A rulebase containing expressions of this form then can be used to construct, for each specific evidence and query, a Bayesian network over binary r.v.s stronger(ωi, ωj), quake(ωi),..., in which the answer to the query subsequently is computed using standard Bayesian network inference.
In all the above mentioned approaches, quite strong syntactic and/or semantic restrictions are imposed in the formalism that severely limit its expressiveness. Poole (1993) does not allow the general expressiveness of rules like (1), but only combines deterministic rules with the specification of certain unconditional probabilities. Haddawy (1994) allows only rules in which the antecedent does not contain free variables that do not appear in the consequent. As pointed out by Glesner and Koller (1995), this is a severe limitation. For instance, we can then not express by a rule like

  aids(x) ←_p contact(x,y)

that the probability of person x having aids depends on any other person y with whom x had sexual contact. When we do permit an additional free variable y in this manner, it also has to be defined how the probability of the consequent is affected when there exist multiple instantiations of y that make the antecedent true (this question also arises when several rules with the same consequent are permitted in the rulebase). In (Glesner & Koller 1995) and (Ngo, Haddawy & Helwig 1995), therefore, a combination rule is added to the rulebase, which defines how the conditional probabilities arising from different instantiations, or rules, are to be combined. If the different causal relationships described by the rules are understood to be independent, then the combination rule typically will be noisy-or.
The specification of a single combination rule applied to all sets of instantiations of applicable rules, again, does not permit us to describe certain important distinctions. If, for instance, we have a rule that relates aids(x) to the relation contact(x,y), and another rule that relates aids(x) to the relation donor(x,y), standing for the fact that x has received a blood transfusion from donor y, then the probability computed for aids(a), using a simple combination rule, will depend only on the number of instantiations for contact(a,y) and for donor(a,y). Particularly, we are not able to make special provisions for the two rules to be instantiated by the same element b, even though the case contact(a,b) ∧ donor(a,b) clearly has to be distinguished from the case contact(a,b) ∧ donor(a,c), or even contact(a,b) ∧ donor(a,a).

In this paper a representation formalism is developed that incorporates constraints on the equality of instantiating elements, and thereby allows us to define different probabilities in situations only distinguished by equalities between instantiating elements.
Furthermore, our representation method will allow us to specify hierarchical, or nested, combination rules. As an illustration of what this means, consider the unary predicate cancer(x), representing that person x will develop cancer at some time, and the three-place relation exposed(x,y,z), representing that organ y of person x was exposed to radiation at time z (by the taking of an x-ray, intake of radioactively contaminated food, etc.). Suppose, now, that for person x we have evidence E = {exposed(x, y_i, z_j) | i = 1,...,k; j = 1,...,l}, where y_i ≠ y_{i′} for some i, i′, and z_j ≠ z_{j′} for some j, j′. Assume that for any specific organ y, multiple exposures of y to radiation have a cumulative effect on the risk of developing cancer of y, so that noisy-or is not the adequate rule to model the combined effect of instances exposed(x, y, z_j) on the probability of developing cancer of y. On the other hand, developing cancer at any of the various organs y can be viewed as independent causes for developing cancer at all. Thus, a single rule of the form

  cancer(x) ←_p exposed(x,y,z)

together with a flat combination rule is not sufficient to model the true probabilistic relationships. Instead, we need to use one rule to first combine, for every fixed y, the instances given by different z, and then use another rule (here noisy-or) to combine the effect of the different y's.
To permit constraints on the equality of instantiating elements, and to allow for hierarchical definitions of combination functions, in this paper we depart from the method of representing our information in a knowledge base containing different types of rules. Instead, we here use Bayesian networks with a node for every relation symbol r of some vocabulary S, which is seen as a r.v. whose values are possible interpretations of r in some specific domain D. The state space of these relational Bayesian networks therefore can be identified with the set of all S-structures over D, and its semantics is a probability distribution over S-structures, as were used by Halpern (1990) to interpret first-order probabilistic logic. Halpern and Koller (1996) have used Markov networks labeled with relation symbols for representing conditional independencies in probability distributions over S-structures. This can be seen as a qualitative analog to the quantitative relational Bayesian networks described here.
2 THE BASIC FRAMEWORK

In medical example domains it is often natural to make the domain closure assumption, i.e. to assume that the domain under consideration consists just of those objects mentioned in the knowledge base. The following example highlights a different kind of situation, where a definite domain of objects is given over which the free variables are to range, yet there is no evidence about most of these objects.
Example 2.1 Robot TBayes0.1 moves in an environment consisting of n distinct locations. TBayes0.1 can make direct moves from any location x to any location y unless the (directed) path x → y is blocked. This happens to be the case with probability p1 for all x ≠ y. At each time step TBayes0.1, as well as a certain number of other robots operating in this domain, make one move along an unblocked path x → y, x ≠ y. TBayes0.1 just has completed the task it was assigned to do, and is now in search of new instructions. It can receive these instructions either by reaching a terminal location from where a central task-assigning computer can be accessed, or by meeting another robot that will assign TBayes0.1 a subtask of its own job. Unfortunately, TBayes0.1 only has the vaguest idea of where the terminal locations are, or where the other robots are headed. The best model of its environment that it can come up with is that every location x is a terminal location with probability p2, and that any unblocked path x → y is likely to be taken by at least one robot at any given time step with probability p3. In order to plan its next move, TBayes0.1 tries to evaluate for every location x the probability that going to x leads to success, defined as either getting instructions at x directly, or being able to access a terminal location in one more move from x. Hence, the probability of s(uccess)(x) is 1 if t(erminal)(x) is true, or if t(z) and ¬b(locked)(x,z) holds for some z. Otherwise, there still is a chance of s(x) being true, determined by the number of incoming paths z → x, each of which is likely to be taken by another robot with probability p3. Assuming a fairly large number of other robots, the event that z → x is taken by some robot can be viewed as independent from z′ → x being taken by a robot, so that the overall probability that another robot will reach location x is given by 1 − (1 − p3)^k, where k = |{z | z ≠ x, ¬b(z,x)}|, i.e. by combining the individual probabilities via noisy-or.
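This noisy-or combination can be sketched in a few lines of Python (a minimal illustration; the function name noisy_or and the concrete numbers are ours, not the paper's):

```python
def noisy_or(probs):
    """Noisy-or: probability that at least one of several independent
    events, with the given individual probabilities, occurs."""
    result = 1.0
    for p in probs:
        result *= 1.0 - p
    return 1.0 - result

# k = 4 unblocked incoming paths, each taken with (assumed) probability p3 = 0.3:
p3, k = 0.3, 4
assert abs(noisy_or([p3] * k) - (1.0 - (1.0 - p3) ** k)) < 1e-12
```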
The foregoing example gives an informal description of how the probability of s(x) is evaluated, given the predicates b and t. Also, the probabilities for b and t are given. Piecing all this information together (and assuming independence whenever no dependence has been mentioned explicitly), we obtain for every finite domain D of locations a probability distribution P for the {b,t,s}-structures over D.

Our aim now is to represent this class of probability distributions in compact form as a Bayesian network with nodes b, t, and s. Given the description of the dependencies in the example, it is clear that this network should have two edges: one leading from b to s, and one leading from t to s. The more interesting problem is how to specify the conditional probability of the possible values of each node (i.e. the possible interpretations of the symbol at that node), given the values of its parent nodes. For the two parentless nodes in our example this is accomplished very easily: for a given domain D, and for all locations x, y ∈ D we have

  P(b(x,y)) = p1 if x ≠ y;  0 if x = y   (2)

  P(t(x)) = p2.   (3)
Here P(b(x,y)) stands for the probability that (x,y) belongs to the interpretation of b. Similarly for P(t(x)). Since b(x,y) and b(x′,y′) for (x,y) ≠ (x′,y′), respectively t(x) and t(x′) for x ≠ x′, were assumed to be mutually independent, this defines a probability distribution over the possible interpretations in D of the two predicates. For example, the probability that I ⊆ D × D is the interpretation of b is 0 if (x,x) ∈ I for some x ∈ D, and p1^{|I|} (1 − p1)^{n(n−1)−|I|} else.
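This closed form can be checked numerically; the following sketch (with a made-up three-location domain and an assumed p1 = 0.4) verifies that the probabilities of all irreflexive interpretations sum to 1:

```python
import itertools

def prob_interpretation_b(I, domain, p1):
    """Probability that the set of pairs I is the interpretation of b:
    0 if I contains a reflexive pair (x, x); otherwise each of the
    n(n-1) ordered pairs x != y lies in I independently with probability p1."""
    if any(x == y for (x, y) in I):
        return 0.0
    n = len(domain)
    return p1 ** len(I) * (1.0 - p1) ** (n * (n - 1) - len(I))

D = ["l1", "l2", "l3"]
pairs = [(x, y) for x in D for y in D if x != y]
total = sum(prob_interpretation_b(set(sub), D, 0.4)
            for r in range(len(pairs) + 1)
            for sub in itertools.combinations(pairs, r))
assert abs(total - 1.0) < 1e-9
```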
Next, we have to define the probability of interpretations of s. Given interpretations of b and t, the events s(x) and s(x′) are independent for x ≠ x′. Also, example 2.1 contains a high-level description of how the probability of s(x) is to be computed. Our aim now is to formalize this computation rule in such a manner that P(s(x)) can be computed by evaluating a single functional expression, in the same manner as P(b(x,y)) and P(t(x)) are given by (2) and (3).

Since P(s(x)) depends on the interpretations of b and t, we begin with functional expressions that access these interpretations. This is done by using indicator functions I_b(x,y) and I_t(x). I_b(x,y), for example, evaluates to 1 if (x,y) is in the given interpretation I(b) of b, and to 0 otherwise. Though the function I_b(x,y) has to be distinguished from the logical expression b(x,y), for the benefit of greater readability, in the sequel the simpler notation will be used for both. Thus, b(x,y) stands for the function I_b(x,y) whenever it appears within a functional expression.
In order to find a suitable functional expression F_s(x) for P(s(x)), assume first that t(x) is true. Since t(x) implies s(x), in this case we need to obtain F_s(x) = 1. In the case ¬t(x), the probability of s(x) is computed by considering all locations z ≠ x for which either ¬b(x,z) or ¬b(z,x). Any such z that satisfies ¬b(x,z) ∧ t(z) again makes s(x) true with probability 1. If only ¬b(z,x) holds, then the location z merely contributes a probability p3 to P(s(x)). Thus, for any z, the contribution of z to P(s(x)) is given by max{t(z)(1 − b(x,z)), p3(1 − b(z,x))}. Combining all the relevant z via noisy-or, we obtain the formula

  F_s(x) = n-o{max{t(z)(1 − b(x,z)), p3(1 − b(z,x))} | z; z ≠ x}   (4)

for x with ¬t(x).

Abbreviating the functional expression on the right-hand side of (4) by H(x), we can finally put the two cases t(x) and ¬t(x) together, defining

  F_s(x) = t(x) + (1 − t(x)) H(x).   (5)
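As a sketch of how (5) is evaluated for concrete interpretations, assuming Python sets as stand-ins for the interpretations of b and t:

```python
def F_s(x, domain, b, t, p3):
    """Evaluate formula (5) for s(x): t is a set of terminal locations,
    b a set of ordered pairs (blocked paths)."""
    def contribution(z):
        t_z = 1.0 if z in t else 0.0
        b_xz = 1.0 if (x, z) in b else 0.0
        b_zx = 1.0 if (z, x) in b else 0.0
        return max(t_z * (1.0 - b_xz), p3 * (1.0 - b_zx))
    # H(x): noisy-or over all z != x
    miss = 1.0
    for z in domain:
        if z != x:
            miss *= 1.0 - contribution(z)
    H = 1.0 - miss
    t_x = 1.0 if x in t else 0.0
    return t_x + (1.0 - t_x) * H

# x itself terminal -> success certain:
assert F_s("l1", ["l1", "l2"], b=set(), t={"l1"}, p3=0.5) == 1.0
# reachable terminal l2 -> success certain:
assert F_s("l1", ["l1", "l2"], b=set(), t={"l2"}, p3=0.5) == 1.0
# no terminals: only the incoming path counts, with probability p3:
assert abs(F_s("l1", ["l1", "l2"], b=set(), t=set(), p3=0.5) - 0.5) < 1e-12
```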
We now give a general definition of a representation language for forming functional expressions in the style of (5). We begin by describing the general class of combination functions, instances of which are the functions n-o and max used above.

Definition 2.2 A combination function is any function that maps every finite multiset (i.e. a set possibly containing multiple copies of the same element) with elements from [0,1] into [0,1].

Except n-o and max, examples of combination functions are min, the arithmetic mean of the arguments, etc. Each combination function must include a sensible definition for its result on the empty set. For example, we here use the conventions n-o(∅) = max(∅) = 0, min(∅) = 1.
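Under these conventions, the combination functions mentioned so far can be sketched as follows (the empty-multiset value for the arithmetic mean is our own assumption; the text fixes conventions only for n-o, max, and min):

```python
def n_o(multiset):
    """Noisy-or on a finite multiset of values from [0,1];
    by convention n_o([]) == 0."""
    result = 1.0
    for p in multiset:
        result *= 1.0 - p
    return 1.0 - result

def comb_max(multiset):
    return max(multiset, default=0.0)   # convention: max of empty multiset is 0

def comb_min(multiset):
    return min(multiset, default=1.0)   # convention: min of empty multiset is 1

def comb_mean(multiset):
    # empty-multiset value 0 is an assumed convention, not from the text
    return sum(multiset) / len(multiset) if multiset else 0.0

assert n_o([]) == 0.0 and comb_max([]) == 0.0 and comb_min([]) == 1.0
```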
In the following, we use bold type to denote tuples of variables: x = (x1,...,xn) for some n. The number of elements in tuple x is denoted by |x|. An equality constraint c(x) for x is a quantifier-free formula over the empty vocabulary, i.e., a formula only containing atomic subformulas of the form xi = xj.
Definition 2.3 The class of probability formulas over the relational vocabulary S is inductively defined as follows.

(i) (Constants) Each rational number q ∈ [0,1] is a probability formula.

(ii) (Indicator functions) For every n-ary symbol r ∈ S, and every n-tuple x of variables, r(x) is a probability formula.

(iii) (Convex combinations) When F1, F2, F3 are probability formulas, then so is F1 F2 + (1 − F1) F3.

(iv) (Combination functions) When F1,...,Fk are probability formulas, comb is any combination function, x, z are tuples of variables, and c(x,z) is an equality constraint, then comb{F1,...,Fk | z; c(x,z)} is a probability formula.
Note that special cases of (iii) are multiplication (F3 ≡ 0) and inversion 1 − F1 (F2 ≡ 0, F3 ≡ 1). The set of free variables of a probability formula is defined in the canonical way. The free variables of comb{...} are the union of the free variables of the Fi, minus the variables in z.

A probability formula F over S in the free variables x = (x1,...,xn) defines for every S-structure 𝔄 over a domain D a mapping D^n → [0,1]. The value F(a) for a ∈ D^n is defined inductively over the structure of F. We here give the details only for case (iv).
Let F be of the form comb{F1(x,z),...,Fk(x,z) | z; c(x,z)} (where not necessarily all the variables in x and z actually appear in all the Fi and in c). In order to define F(a), we must specify the multiset represented by

  {F1(a,z),...,Fk(a,z) | z; c(a,z)}.   (6)

Let E ⊆ D^{|z|} be the set {b | c(a,b)}. For each b ∈ E and each i ∈ {1,...,k}, by induction hypothesis, Fi(a,b) ∈ [0,1]. The multiset represented by (6) now is defined as containing as many copies of p ∈ [0,1] as there are representations p = Fi(a,b) with different i or b. Note that Fi(a,b′) and Fi(a,b″) count as different representations even in the case that the variables for which b′ and b″ substitute different elements do not actually appear in Fi. The multiset {r(x) | z; z = z}, for instance, contains as many copies of the indicator r(x) as there are elements in the domain over which it is evaluated.
For any tautological constraint like z = z, in the sequel we simply write ⊤.

Another borderline case that needs clarification is the case where z is empty. Here our definition degenerates to: if c(a) holds, then the multiset {F1(a),...,Fk(a) | ∅; c(a)} contains as many copies of p ∈ [0,1] as there are representations p = Fi(a); it is empty if c(a) does not hold.
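The inductive semantics, including the multiset construction for case (iv), can be sketched as a small recursive evaluator; the tuple-based formula encoding used here is a hypothetical one chosen for this illustration, not the paper's syntax:

```python
import itertools

def eval_formula(F, a, struct, domain):
    """Evaluate a probability formula F at variable assignment a (a dict),
    in the structure struct (mapping relation symbols to sets of tuples).
    Encoding: ('const', q) | ('ind', r, vars) | ('cvx', F1, F2, F3)
            | ('comb', fn, [F1..Fk], zvars, constraint)."""
    tag = F[0]
    if tag == 'const':
        return F[1]
    if tag == 'ind':
        _, r, vs = F
        return 1.0 if tuple(a[v] for v in vs) in struct[r] else 0.0
    if tag == 'cvx':
        f1 = eval_formula(F[1], a, struct, domain)
        return (f1 * eval_formula(F[2], a, struct, domain)
                + (1.0 - f1) * eval_formula(F[3], a, struct, domain))
    if tag == 'comb':
        _, fn, subs, zvars, c = F
        multiset = []                     # the multiset represented by (6)
        for vals in itertools.product(domain, repeat=len(zvars)):
            b = dict(a)
            b.update(zip(zvars, vals))
            if c(b):                      # the constraint c(x, z)
                multiset.extend(eval_formula(g, b, struct, domain)
                                for g in subs)
        return fn(multiset)
    raise ValueError(tag)

def comb_max(ms):
    return max(ms, default=0.0)           # convention: max of empty multiset is 0

# max{ t(z) | z; z != x }: is some location other than x a terminal?
F = ('comb', comb_max, [('ind', 't', ['z'])], ['z'],
     lambda a: a['z'] != a['x'])
struct = {'t': {('l2',)}}
assert eval_formula(F, {'x': 'l1'}, struct, ['l1', 'l2']) == 1.0
assert eval_formula(F, {'x': 'l2'}, struct, ['l1', 'l2']) == 0.0
```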
By using indicator functions r(x), the value of F(a) is being defined in terms of the validity in 𝔄 of atomic formulas r(a′). A natural generalization of probability formulas might therefore be considered, in which not only the truth values of atomic formulas are used, but indicator functions for arbitrary first-order formulas are allowed. As the following lemma shows, this provides no real generalization.

Lemma 2.4 Let φ(x) be a first-order formula over the relational vocabulary S. Then there exists a probability formula F_φ over S, using max as the only combination function, s.t. for every finite S-structure 𝔄, and every a ∈ D^{|x|}: F_φ(a) = 1 iff φ(a) holds in 𝔄, and F_φ(a) = 0 else.
Proof: By induction on the structure of φ. If φ = r(x) for some r ∈ S, then F_φ = r(x). For φ = (x1 = x2), let F_φ(x1,x2) = max{1 | ∅; x1 = x2}. Conjunction and negation are handled by multiplication and inversion, respectively, of probability formulas. For φ = ∃y ψ(x,y) the corresponding probability formula is F_φ = max{F_ψ(x,y) | y; ⊤}.
Definition 2.5 A relational Bayesian network for the (relational) vocabulary S is given by a directed acyclic graph containing one node for every r ∈ S. The node for an n-ary r ∈ S is labeled with a probability formula F_r(x1,...,xn) over the symbols in the parent nodes of r, denoted by Pa(r).
The definition for the probability of b(x,y) in (2) does not seem to quite match definition 2.5, because it contains a distinction by cases not accounted for in definition 2.5. However, this distinction by cases can be incorporated into a single probability formula. If, for instance, c1(x) and c2(x) are two mutually exclusive and exhaustive equality constraints, then

  F = max{max{F1 | ∅; c1(x)}, max{F2 | ∅; c2(x)} | ∅; ⊤}   (7)

evaluates to F1 for a with c1(a), and to F2 for a with c2(a).
Let N now be a relational Bayesian network over S. Let r be (the label of) a node in N with arity n, and let 𝔄 be a Pa(r)-structure over domain D. For every a ∈ D^n, F_r(a) ∈ [0,1] then is defined. Thus, for every interpretation I(r) of r in D^n we can define

  P(I(r)) = ∏_{a ∈ I(r)} F_r(a) · ∏_{a ∉ I(r)} (1 − F_r(a)),

which gives a probability distribution over interpretations of r, given the interpretations of Pa(r). Given a fixed domain D, a relational Bayesian network thus defines a joint probability distribution P over the interpretations in D of the symbols in S, or, equivalently, a probability measure on S-structures over D. Hence, semantically, relational Bayesian networks are mappings of finite domains D into probability measures on S-structures over D.
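A direct transcription of this product, with F_r supplied as an ordinary function (the interpretation of (2) used at the bottom is just for illustration):

```python
from itertools import product

def prob_of_interpretation(I, F_r, arity, domain):
    """P(I(r)): product of F_r(a) over tuples a in I(r), times
    the product of 1 - F_r(a) over tuples a outside I(r)."""
    p = 1.0
    for a in product(domain, repeat=arity):
        f = F_r(a)
        p *= f if a in I else 1.0 - f
    return p

# binary b with P(b(x,y)) = p1 for x != y, 0 for x = y, as in (2):
p1 = 0.4
F_b = lambda a: p1 if a[0] != a[1] else 0.0
D = ["l1", "l2"]
# exactly one of the two off-diagonal pairs blocked: p1 * (1 - p1)
assert abs(prob_of_interpretation({("l1", "l2")}, F_b, 2, D)
           - p1 * (1 - p1)) < 1e-12
```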
Example 2.6 Reconsider the relations cancer and exposed as described in the introduction. Assume that φ : ℕ → [0,1] is the probability distribution that for any fixed organ y gives the probability that y develops cancer after the n-th exposure to radiation. Let Φ(n) = Σ_{i≤n} φ(i) be the corresponding distribution function. Then Φ can be used to define a combination function comb_Φ by letting, for a multiset A: comb_Φ(A) = Φ(n), where n is the number of nonzero elements in A (counting multiplicities). Using comb_Φ we obtain the probability formula comb_Φ{exposed(x,y,z) | z; ⊤} for the contribution of organ y to the cancer risk of x. Combining for all y, then

  F_cancer(x) = n-o{comb_Φ{exposed(x,y,z) | z; ⊤} | y; ⊤}

is a probability formula defining the risk of cancer for x, given the relation exposed.
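A sketch of comb_Φ built from hypothetical per-exposure probabilities φ(1), φ(2), ... (the concrete values below are invented for illustration):

```python
def make_comb_phi(phi_values):
    """Build comb_Φ from per-exposure probabilities phi(1), phi(2), ...:
    comb_Φ(A) = Φ(n), the sum of the first n values, where n counts
    the nonzero elements of the multiset A."""
    def comb_phi(A):
        n = sum(1 for p in A if p != 0)
        return sum(phi_values[:n])
    return comb_phi

comb_phi = make_comb_phi([0.01, 0.02, 0.04])
# two attested exposures of the same organ -> cumulative risk Φ(2):
assert abs(comb_phi([1.0, 1.0, 0.0]) - (0.01 + 0.02)) < 1e-12
```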
In the preceding example we have tacitly assumed a multi-sorted domain, so that the variables x, y, z range over different sets (people, organs, times, respectively). We here do not introduce an extra formalization for dealing with many-sorted domains. It is clear that this can be done easily, but would introduce an extra load of notation.
3 INFERENCE

The inference problem we would like to solve is: given a relational Bayesian network N for S, a finite domain D = {d1,...,dn}, an evidence set of ground literals

  E = {r1(a1),...,rk(ak), ¬r_{k+1}(a_{k+1}),...,¬r_m(a_m)}

with ri ∈ S (not necessarily distinct), tuples ai over D (not necessarily distinct) for i = 1,...,m, and a ground atom r0(a0) (r0 ∈ S, a0 over D), what is the probability of r0(a0) given r1(a1),...,¬r_m(a_m)? More precisely: in the probability measure P defined by N on the S-structures over D, what is the conditional probability P(r0(a0) | E) of a structure satisfying r0(a0), given that it satisfies r1(a1),...,¬r_m(a_m)?

Since for any given finite domain a relational Bayesian network can be seen as an ordinary Bayesian network for variables with finitely many possible values, in principle, any inference algorithm for standard Bayesian networks can be used.
Unfortunately, however, direct application of any such algorithm will be inefficient, because they include a summation over all possible values of a node, and the number of possible values here is exponential in the size of the domain. For this reason, it will often be more efficient to follow the approach used in inference from rulebase encodings of probabilistic knowledge, and to construct for every specific inference task an auxiliary Bayesian network whose nodes are ground atoms in the symbols from S, each of which with the two possible values true and false (cf. (Breese 1992), (Ngo et al. 1995)).
The reason why we here can do the same is that in the query r0(a0) we do not ask for the probability of any specific interpretation of r0, but only for the probability of all interpretations containing a0. For the computation of this probability, in turn, it is irrelevant to know the exact interpretations of parent nodes r′ of r. Instead, we only need to know which of those tuples a′ belong to r′ whose indicator r′(a′) is needed in the computation of F_r(a0).
In order to construct such an auxiliary network, we have to compute for some given atom r(a) the list of atoms r′(a′) on whose truth value F_r(a) depends. One way of doing this is to just go through a recursive evaluation of F_r(a), and list all the ground atoms encountered in this evaluation. However, rather than doing this, it is useful to compute for every relation symbol r ∈ S, and each parent relation r′ of r, an explicit description of the tuples (a, a′), such that F_r(a) depends on r′(a′). Such an explicit description can be given in form of a first-order formula pa_{r,r′}(x, y) over the empty vocabulary.
To demonstrate the general method for the computation of these formulas, we show how to obtain pa_{s,b}(x, y1, y2) for F_s(x) as defined in (5). By induction on the structure of F_s, we compute formulas pa_{G,b}(·, y1, y2) that define for a subformula G of F_s the set of (y1, y2) s.t. G depends on b(y1, y2). In the end, then, pa_{s,b}(x, y1, y2) = pa_{F_s,b}(x, y1, y2).

The two subformulas t(x) and 1 − t(x) of F_s do not depend on b at all; therefore we can let pa_{t(x),b}(x, y1, y2) = pa_{1−t(x),b}(x, y1, y2) = ⊥, where ⊥ is some unsatisfiable formula.
To obtain pa_{H(x),b}(x, y1, y2) we begin with the atomic subformulas b(x,z) and b(z,x) of H(x), which yield pa_{b(x,z),b}(x, z, y1, y2) = (y1 = x ∧ y2 = z) and pa_{b(z,x),b}(x, z, y1, y2) = (y1 = z ∧ y2 = x), respectively. The remaining atomic subformulas t(z), 1, and p3 appearing within the max combination function again only yield the unsatisfiable ⊥. Skipping one trivial step where the formulas for the two arguments of M(x,z) := max{...} are computed, we next obtain the formula

  pa_{M(x,z),b}(x, z, y1, y2) = (y1 = x ∧ y2 = z) ∨ (y1 = z ∧ y2 = x)

(after deleting some meaningless disjuncts). H(x) = n-o{M(x,z) | z; z ≠ x} depends on all b(y1, y2) for which there exists some z ≠ x s.t. pa_{M(x,z),b}(x, z, y1, y2). Hence,

  pa_{H(x),b}(x, y1, y2) = ∃z (z ≠ x ∧ ((y1 = x ∧ y2 = z) ∨ (y1 = z ∧ y2 = x))),   (8)

which is already the same as pa_{F_s,b}(x, y1, y2). Finally, we can simplify (8), and obtain

  pa_{s,b}(x, y1, y2) = (y1 = x ∧ y2 ≠ x) ∨ (y1 ≠ x ∧ y2 = x).   (9)
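Formula (9) makes the parent computation a simple membership test over pairs of domain elements; as a sketch:

```python
def parents_of_s(x, domain):
    """Ground parent atoms b(y1, y2) of the node s(x) in the auxiliary
    network, read off from the simplified formula (9):
    (y1 = x and y2 != x) or (y1 != x and y2 = x)."""
    return [("b", y1, y2)
            for y1 in domain for y2 in domain
            if (y1 == x and y2 != x) or (y1 != x and y2 == x)]

print(parents_of_s("l1", ["l1", "l2", "l3"]))
# b(l1,l2), b(l1,l3), b(l2,l1), b(l3,l1)
```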
In general, the formulas pa_{r,r′}(x, y) are existential formulas over the empty vocabulary. It is not always possible to completely eliminate the existential quantifiers as in the preceding example. However, it is always possible to transform pa_{r,r′}(x, y) into a formula so that quantifiers only appear in subformulas of the form ∃≥n x (x = x), postulating the existence of at least n elements. This means that for every formula pa_{r,r′}(x, y), and tuples a, a′ over D, it can be checked in time independent of the size of D whether pa_{r,r′}(a, a′) holds.
The formula pa_{r,r′}(x, y) enables us to find for every tuple a the parents r′(a′) of r(a) in the auxiliary network. Moreover, we can take this one step further: suppose that in the original network N there is a path of length two from node r″ via a node r′ to r. Then, in the auxiliary network, there is a path of length two from a node r″(a″) via a node r′(a′) to r(a) iff the formula

  pa_{r″→r′→r}(x, y) = ∃z (pa_{r,r′}(x, z) ∧ pa_{r′,r″}(z, y))   (10)

is satisfied for a and a″. Taking the disjunction of all formulas of the form (10) for all paths in N leading from r″ to r then yields a formula pa*_{r,r″}(x, y) defining all predecessors r″(a″) of a node r(a) in the auxiliary network.
Using the pa_{r,r′} and pa*_{r,r″}, we can for given evidence and query construct the auxiliary network needed to answer the query: we begin with a node r0(a0) for the query. For all nodes r(a) added to the network, we add all parents r′(a′) of r(a), as defined by pa_{r,r′}. If r(a) is not instantiated in E, using the formulas pa*_{r,r″}, we check whether the subgraph rooted at r(a) contains a node instantiated in E. If this is the case, we add all successors of r(a) that lie on a path from r(a) to an instantiated node (these are again given by the formulas pa*_{r,r″}). Thus, we can construct directly the minimal network needed to answer the query, without first backward chaining from every atom in E, and pruning afterwards.
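The core of this construction can be sketched as backward chaining from the query, with a parents oracle standing in for the evaluation of the pa formulas (the evidence-directed pruning described above is omitted from this sketch):

```python
from collections import deque

def build_auxiliary_network(query_atom, parents):
    """Minimal backward-chaining sketch: starting from the query atom,
    repeatedly add the ground parent atoms supplied by the hypothetical
    oracle parents(atom), which stands in for evaluating pa_{r,r'}."""
    nodes, edges = set(), set()
    queue = deque([query_atom])
    while queue:
        atom = queue.popleft()
        if atom in nodes:
            continue
        nodes.add(atom)
        for parent in parents(atom):
            edges.add((parent, atom))
            queue.append(parent)
    return nodes, edges

# toy network: s(l1) has parents b(l1,l2) and b(l2,l1), which are roots
def parents(atom):
    return {("s", "l1"): [("b", "l1", "l2"), ("b", "l2", "l1")]}.get(atom, [])

nodes, edges = build_auxiliary_network(("s", "l1"), parents)
assert len(nodes) == 3 and len(edges) == 2
```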
Auxiliary networks as described here still encode finer distinctions in the instantiations of the nodes of N than is actually needed to solve our inference problem. Consider, for example, the case where the domain in example 2.1 consists of ten locations {l1,...,l10}, there is no evidence, and the query is s(l1). According to (9), the auxiliary network will contain nodes b(l1, li), b(li, l1) for all i = 2,...,10. In applying standard inference techniques on this network, we distinguish e.g. the case where b(l1, l2), b(l2, l1) are true and b(l1, l3), b(l3, l1) are false from the case where b(l1, l2), b(l2, l1) are false and b(l1, l3), b(l3, l1) are true, and all other b(l1, li), b(li, l1) have the same truth value. However, for the given inference problem, this distinction really is unnecessary, because the identity of locations mentioned neither in evidence nor query is immaterial. Future work will therefore be directed towards finding inference techniques for relational Bayesian networks that distinguish instantiations of the relations in the network at a higher level of abstraction than the current auxiliary networks, and thereby reduce the complexity of inference in terms of the size of the underlying domain.
4 RECURSIVE NETWORKS

In the distributions defined by relational Bayesian networks of definition 2.5, the events r(a) and r(a′) with a ≠ a′ are conditionally independent, given the interpretation of the parent relations of r. This is a rather strong limitation of the expressiveness of these networks. For instance, using these networks, we can not model a variation of example 2.1 in which the predicate blocked is symmetric: b(x,y) being independent from b(y,x), the equivalence b(x,y) ↔ b(y,x) can not be enforced.

There are other interesting things that we are not able to model so far. Among them are random functions (the main concern of (Haddawy 1994)), and a recursive temporal dependence of a relation on itself (addressed both in (Ngo et al. 1995) and (Glesner & Koller 1995)). In this section we define a straightforward generalization of relational Bayesian networks that allows us to treat all these issues in a uniform way.
We can identify a recursive dependence of a relation on itself as the general underlying mechanism we have to model. In the case of symmetric relations, this is a dependence of r(x,y) on r(y,x). In the case of a temporal development, this is the dependence of a predicate r(t, ·), having a time variable as its first argument, on r(t−1, ·). Functions can be seen as special relations r(x, y), where for every x there exists exactly one y, s.t. r(x, y) is true. Thus, for every x, r(x, y) depends on all r(x, y′) in that exactly one of these atoms must be true.

It is clear that there is no fundamental problem in modeling such recursive dependencies within a Bayesian network framework, as long as the recursive dependency of r0 on r1,...,rk does not produce any cycles. Most obviously, in the case of a temporal dependency, the use of r(t−1, ·) in a definition of the probability of r(t, ·) does not pose a problem, as long as a non-recursive definition of the probability of r(t1, ·) at the first time point is provided.
To make the recursive dependency of r(x,y) on r(y,x) in a symmetric relation similarly well-founded, we can use a total order ≤ on the domain. Then we can generate a random symmetric relation by first defining the probability of r(x,y) with x ≤ y, and then the (0,1-valued) probability of r(y,x) given r(x,y). Now consider the case of a random function r(x, y) with possible values y ∈ {v1,...,vk}. Here, too, we can make the interdependence of the different r(x, y) acyclic by using a total order on {v1,...,vk}, and assigning a truth value to r(x, vi) by taking into account the already defined truth values of r(x, vj) for all vj that precede vi in that order.
From these examples we see that what we essentially need, in order to extend our framework to cover a great variety of interesting specific forms of probability distributions over S-structures, are well-founded orderings on tuples of domain elements. These well-founded orderings can be supplied via rigid relations on the domain, i.e. fixed, predetermined relations that are not generated probabilistically. Indeed, one such relation we already have used throughout: the equality relation. It is therefore natural to extend our framework by allowing additional relations that are to be used in the same way as the equality predicate has been employed, namely, in constraints for combination functions. Also, fixed constants will be needed as the possible values of random functions.
For the case of a binary symmetric relation r(x,y), assume, as above, that we are given a total (non-strict) order ≤ on the domain. A probability formula that defines a probability distribution concentrated on symmetric relations, and making r(x1, x2) true with probability p for all x1, x2, then is

  F_r(x1, x2) = max{max{p | ∅; x1 ≤ x2}, max{r(x2, x1) | ∅; ¬(x1 ≤ x2)} | ∅; ⊤}.   (11)

As in (7), here a nested max{...} function is used in order to model a distinction by cases. The first inner max function evaluates to p if x1 ≤ x2, and to 0 else. The second max function is equal to r(x2, x1) if x1 > x2, and 0 else.
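A generative reading of (11), as a sketch: sample the r(x1,x2) with x1 ≤ x2 (in a fixed rigid order) independently, then mirror them deterministically for x1 > x2:

```python
import random

def sample_symmetric_relation(domain, p, seed=0):
    """Sample a symmetric relation as in formula (11): for x1 <= x2 in a
    fixed total order, include r(x1, x2) with probability p; for x1 > x2
    copy the already-determined value of r(x2, x1)."""
    rng = random.Random(seed)
    order = {d: i for i, d in enumerate(domain)}  # rigid total order
    rel = set()
    for x1 in domain:
        for x2 in domain:
            if order[x1] <= order[x2] and rng.random() < p:
                rel.add((x1, x2))
                if x1 != x2:
                    rel.add((x2, x1))   # deterministic mirror case
    return rel

rel = sample_symmetric_relation(["a", "b", "c"], 0.5, seed=42)
assert all((y, x) in rel for (x, y) in rel)  # symmetry holds by construction
```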
For the temporal example, assume that the domain contains n time points t_1,…,t_n, and a successor relation s = {(t_i, t_{i+1}) | 1 ≤ i < n} on the t_i's. Assume that r(t,y) is a relation with a time parameter as the first argument, and that r(t_1,y) shall hold with probability p_1 for all y, whereas r(t_{i+1},y) has probability p_2 if r(t_i,y) holds, and probability p_3 else. In order to define the probability of r(t,y) by a probability formula, the case t = t_1 must be distinguished from the case t = t_i, i ≥ 2. For this we use the probability formula F_1(t) = max{1 | t'; s(t',t)}, which evaluates to 0 for t = t_1, and to 1 for t ∈ {t_2,…,t_n}. We can now use the formula

  F_r(t,y) = (1 − F_1(t)) p_1 + F_1(t) max{ r(t',y) p_2 + (1 − r(t',y)) p_3 | t'; s(t',t) }

to define the probability of r(t,y).
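Read generatively, this formula describes, for each fixed object y, a two-state Markov chain over the time points: an initial probability at t_1 and a transition probability depending on the predecessor's value. A small simulation sketch (the function name and parameter names are hypothetical):

```python
import random

def sample_chain(n, p1, p2, p3, seed=0):
    """Sample r(t_1,y),...,r(t_n,y) for one fixed object y:
    r(t_1,y) holds with probability p1; for i > 1, r(t_i,y) holds with
    probability p2 if r(t_{i-1},y) holds, and with probability p3 else."""
    rng = random.Random(seed)
    values = [rng.random() < p1]      # the case t = t_1
    for _ in range(1, n):             # cases reached via the successor relation s
        p = p2 if values[-1] else p3
        values.append(rng.random() < p)
    return values
```

The factor F_1(t) in the formula plays the role of the `if` distinguishing the initial time point from all later ones.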
Finally, for a functional relation r(x,y), suppose that we are given a domain, together with the interpretations of n constant symbols v_1,…,v_n, and a strict total order <, s.t. v_1 < v_2 < … < v_n. Now consider the probability formula

  F_r(x,y) = (1 − max{r(x,z) | z; z < y}) · max{ max{p_1 | ; y = v_1}, …, max{p_n | ; y = v_n} | ; }

The first factor in this formula tests whether r(x,z) already is true for some possible value v_i < y. If this is the case, then the probability of r(x,y) given by F_r(x,y) is 0. Otherwise, the probability of r(x,y) is p_i iff y = v_i. The probability that by this procedure the argument x is assigned the value v_i then is (1 − p_1)(1 − p_2)···(1 − p_{i−1}) p_i. By a suitable choice of the p_i any probability distribution over the v_i can be generated.
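The last claim holds because for a target distribution q_1,…,q_n one can choose p_i = q_i / (q_i + … + q_n), so that the products (1 − p_1)···(1 − p_{i−1}) p_i telescope to q_i. A sketch checking this inversion (function names are illustrative):

```python
def stick_breaking_probs(target):
    """Given a target distribution q_1,...,q_n over v_1,...,v_n, compute
    per-step probabilities p_i = q_i / (q_i + ... + q_n) so that the
    sequential procedure selects v_i with probability q_i."""
    remaining = sum(target)
    ps = []
    for q in target:
        ps.append(q / remaining if remaining > 0 else 0.0)
        remaining -= q
    return ps

def induced_probs(ps):
    """Probability (1-p_1)...(1-p_{i-1}) * p_i that the sequential
    procedure assigns the i-th value."""
    out, reach = [], 1.0
    for p in ps:
        out.append(reach * p)
        reach *= (1.0 - p)
    return out

q = [0.2, 0.5, 0.3]
assert all(abs(a - b) < 1e-9 for a, b in zip(induced_probs(stick_breaking_probs(q)), q))
```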
The given examples motivate a generalization of relational Bayesian networks. For this, let R be a vocabulary containing relation and constant symbols, and S a relational vocabulary with R ∩ S = ∅. An R-constraint c(x) for a tuple of variables x is a quantifier-free R-formula. Define the class of R-probability formulas over S precisely as in definition 2.3, with "equality constraint" replaced by "R-constraint".
Definition 4.1 Let R, S be as above. A recursive relational Bayesian network for S with R-constraints is given by a directed acyclic graph containing one node for every r ∈ S. The node for an n-ary r ∈ S is labeled with an R-probability formula F_r(x_1,…,x_n) over Pa(r) ∪ {r}.
The semantics of a recursive relational Bayesian network is a bit more complicated than that of relational Bayesian networks. The latter defined a mapping of domains D into probability measures on S-structures over D. Recursive relational Bayesian networks essentially define a mapping of R-structures 𝔄 into probability measures on S-expansions of 𝔄. This mapping, however, is only defined for R-structures whose interpretations of the symbols in R lead to well-founded recursive definitions of the probabilities for the r-atoms (r ∈ S). If, for instance, R = {≤}, and 𝔄 is an R-structure in which there exist two elements d_1, d_2, s.t. neither d_1 ≤ d_2, nor d_2 ≤ d_1, then (11) does not define a probability measure on {r}-expansions of 𝔄, because the probability of r(d_1,d_2) gets defined in terms of r(d_2,d_1), and vice versa.
As in section 3, for every r' ∈ Pa(r) ∪ {r} a formula pa_{rr'}(x, x') can be computed that defines for an R-structure 𝔄 and a tuple d over D the tuples d' over D, s.t. F_r(d) depends on r'(d'). While in section 3 existential formulas over the empty vocabulary were obtained, for recursive relational networks the pa_{rr'} are existential formulas over R. The definitions of the probabilities F_r(d) are well-founded for D iff the relation pa_{rr'} := {(d, d') | pa_{rr'}(d, d') holds in 𝔄} is acyclic. A recursive relational Bayesian network N thus defines a probability measure on S-expansions of those R-structures 𝔄, for which the relation pa_{rr'} is acyclic for all r ∈ S.
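The acyclicity condition can be checked directly on the ground-atom dependency graph. A sketch, with a hypothetical encoding of ground atoms as tuples, which also reproduces the two-incomparable-elements situation above, where (11) yields a cyclic dependency:

```python
def is_acyclic(edges):
    """Check that the directed graph {node: set of successors} has no
    cycle, using iterative DFS with three-color marking."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in edges}
    for start in edges:
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(edges[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color.get(nxt, WHITE) == GRAY:
                    return False          # back edge: cycle found
                if color.get(nxt, WHITE) == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(edges.get(nxt, ()))))
                    break
            else:                         # all successors explored
                color[node] = BLACK
                stack.pop()
    return True

# Two <=-incomparable elements d1, d2: under (11), r(d1,d2) depends on
# r(d2,d1) and vice versa, so the dependency relation is cyclic.
deps = {("r", "d1", "d2"): {("r", "d2", "d1")},
        ("r", "d2", "d1"): {("r", "d1", "d2")}}
assert not is_acyclic(deps)
```

In practice, Python's `graphlib.TopologicalSorter` (which raises `CycleError` on cyclic input) could replace the hand-written DFS.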
The discussion of inference procedures for relational Bayesian networks in section 3 applies with few modifications to recursive networks as well. Again, we can construct an auxiliary network with nodes for ground atoms, using formulas pa_{rr'} and pa*_{rr'}. The complexity of this construction, however, increases on two accounts: first, the existential quantifications in the pa_{rr'}, pa*_{rr'} can no longer be reduced to mere cardinality constraints. Therefore, the complexity of deciding whether pa_{rr'}(d, d') holds for given tuples d, d' over D is no longer guaranteed to be independent of the size of the domain D. Second, to obtain the formulas pa*_{rr'} we may have to build much larger disjunctions: it is no longer sufficient to take the disjunction over all possible paths from r' to r in the network structure of N. In addition, for every relation r on these paths, the disjunction over all possible paths within pa_{rr} has to be taken. This amounts to determining the length l of the longest path in pa_{rr}, and then taking the disjunction over all formulas

  pa^i_{rr}(x, x') := ∃ x_1,…,x_i ( pa_{rr}(x, x_1) ∧ … ∧ pa_{rr}(x_i, x') )

with i < l. As a consequence, the formulas pa*_{rr'} are no longer independent of the structure 𝔄 under consideration.
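The formulas pa^i amount to bounded-length relational composition: pa^i relates tuples connected by a path of i+1 steps, and the disjunction over i < l collects all paths of length at most l. A sketch over explicit pair sets (function names are illustrative):

```python
def compose(rel_a, rel_b):
    """Relational composition: {(x, z) | (x, y) in rel_a, (y, z) in rel_b}."""
    return {(x, z) for (x, y) in rel_a for (y2, z) in rel_b if y == y2}

def bounded_paths(rel, l):
    """Union of pa^i for i < l: pairs connected by a path of length
    1..l, where pa^i inserts i intermediate elements."""
    result, power = set(rel), set(rel)
    for _ in range(l - 1):
        power = compose(power, rel)   # one more intermediate element
        result |= power
    return result

pa = {(1, 2), (2, 3), (3, 4)}
assert bounded_paths(pa, 1) == pa
assert (1, 4) in bounded_paths(pa, 3)
```

The growth of these disjunctions with l is what makes the auxiliary-network construction structure-dependent here, in contrast to section 3.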
5 CONCLUSION

In this paper we have presented a new approach to deal with rule-like probability statements for nondeterministic relations on the elements of some domain of discourse. Deviating from previous proposals for formalizing such rules with a logic programming style framework, we here have associated with every relation symbol r a single probability formula that directly defines the probability distribution over interpretations of r within a Bayesian network. The resulting framework is both more expressive and semantically more transparent than previous ones. It is more expressive, because it introduces the tools to restrict the instantiations of certain rules to tuples satisfying certain equality constraints, and to specify complex combinations and nestings of combination functions. It is semantically more transparent, because a relational Bayesian network directly defines a unique probability distribution over S-structures, whereas the semantics of a probabilistic rule base usually are only implicitly defined through a transformation into an auxiliary Bayesian network.

Inference from relational Bayesian networks by auxiliary network construction is as efficient as inference (by essentially the same method) in rule based formalisms. It may be hoped that in the case where this inference procedure seems unsatisfactory, namely, for large domains most of whose elements are not mentioned in the evidence, our new representation paradigm will lead to more efficient inference techniques.
Acknowledgments

I have benefited from discussions with Daphne Koller, who also provided the original motivation for this work. This work was funded in part by DARPA contract DACA76-93-C-0025, under subcontract to Information Extraction and Transport, Inc.