This document describes the file formats used by JavaBayes






1. Introduction
2. BIF version 0.20
3. BIF version 0.15
4. BIF version 0.10
5. XMLBIF version 0.50
6. XMLBIF version 0.40
7. XMLBIF version 0.30
8. XMLBIF-EVIDENCE 0.50

Author: André Hideaki Saheki

andrehide@gmail.com

1. Introduction

Loading and saving data in JavaBayes



Data can be loaded and saved locally when you use JavaBayes as an application. Note that applets cannot load or save data, because browsers forbid it.

Applications and applets can read Bayesian networks through the Internet; this opens the possibility that JavaBayes could be used to help process and organize the huge amounts of data and knowledge on the Internet.

This section contains a detailed description of the formats that can be manipulated by JavaBayes. If you have no interest in this kind of information (if you are not reading or writing files for JavaBayes), you can skip this section entirely.


All the formats


There are six different formats to store networks. JavaBayes is able to write to two of these formats and read from five of them.

The Bayesian Interchange Format version 0.10 (BIF 0.10) is a simple format that has been successfully used to represent a variety of networks. But BIF 0.10 had certain problems, and has been replaced by BIF version 0.15.

Support for BIF 0.10 has been dropped from the current version of JavaBayes.

BIF 0.20 is an improvement over BIF 0.15, and JavaBayes no longer saves files in BIF 0.10 or 0.15. You can choose between XMLBIF 0.50 and BIF 0.20 in the Options menu, with options to save probabilities as a single table or as individual entries.

XMLBIF 0.30 is an experimental format, based on the XML 1.0 specification. It has been superseded by the XMLBIF 0.40 and 0.50 formats.

The best way to understand XMLBIF is to read about BIF 0.20, then read something about XML, then read the description of XMLBIF 0.50.

For files, any extension is possible, but the extension bif is recommended for BIF 0.20, and the extension xml is used for XMLBIF 0.50.

In summary, JavaBayes reads BIF 0.15, BIF 0.20, XMLBIF 0.30, XMLBIF 0.40 and XMLBIF 0.50, and writes to BIF 0.20 and XMLBIF 0.50.

The preferred and most flexible format to use is XMLBIF 0.50.


Note that no format supports Noisy functions (since JavaBayes does not support those functions yet). The BIF formats also use the general concept of a property; implementations of the BIF format can use specific properties. JavaBayes handles some properties, such as observed, explanation and credal-set, which are explained later on.


Representing probability values


It is important to understand how the JavaBayes formats handle the specification of probability values. All distributions are specified as arrays of real numbers, and the meaning of the numbers depends on the definition of the distribution. Note that the same representation is used in internal arrays to store and manipulate probability values.



The dog problem example (the dog-problem network listed at the end of Section 2) is used to show how probabilities are stored.

The distribution p(f), where f is the variable family_out, can be specified as follows:

0.15, 0.85


Let's consider a more complicated example. The function p(d | f, b), where d is dog_out and b is bowel_problem, is given by

0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70

The logic is simple: proceed as if you were filling a table whose indices vary from right to left, so the rightmost index changes fastest (in the example above, this is like binary counting, because all variables have only two values).

A more complicated example would be a function p(A | B, C) where A has 3 values, B has 2 values and C has 4 values. The function is represented as:

p(A_1 | B_1, C_1)  p(A_1 | B_1, C_2)  p(A_1 | B_1, C_3)  p(A_1 | B_1, C_4)
p(A_1 | B_2, C_1)  p(A_1 | B_2, C_2)  p(A_1 | B_2, C_3)  p(A_1 | B_2, C_4)
p(A_2 | B_1, C_1)  p(A_2 | B_1, C_2)  p(A_2 | B_1, C_3)  p(A_2 | B_1, C_4)
p(A_2 | B_2, C_1)  p(A_2 | B_2, C_2)  p(A_2 | B_2, C_3)  p(A_2 | B_2, C_4)
p(A_3 | B_1, C_1)  p(A_3 | B_1, C_2)  p(A_3 | B_1, C_3)  p(A_3 | B_1, C_4)
p(A_3 | B_2, C_1)  p(A_3 | B_2, C_2)  p(A_3 | B_2, C_3)  p(A_3 | B_2, C_4).
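The ordering above is mixed-radix counting: the conditioned variable is the most significant digit and the last parent the least significant. As a minimal sketch, assuming values are referred to by 0-based indices, the position of a probability in the array can be computed as follows (the names `bif_index` and `bif_order` are illustrative, not part of JavaBayes; labels are written as p(A1|B1,C1)):

```python
from itertools import product

def bif_index(value_indices, cardinalities):
    """Flat position of one probability in a BIF-ordered array.

    value_indices: 0-based value of each variable, conditioned variable first,
                   then the parents in declaration order.
    cardinalities: number of values of each variable, in the same order.
    The rightmost variable varies fastest (least significant digit).
    """
    index = 0
    for value, size in zip(value_indices, cardinalities):
        if not 0 <= value < size:
            raise ValueError("value index out of range")
        index = index * size + value
    return index

def bif_order(names, cardinalities):
    """List every entry label, e.g. 'p(A1|B1,C1)', in BIF order."""
    labels = []
    for combo in product(*(range(size) for size in cardinalities)):
        head = f"{names[0]}{combo[0] + 1}"
        parents = ",".join(f"{n}{v + 1}" for n, v in zip(names[1:], combo[1:]))
        labels.append(f"p({head}|{parents})")
    return labels

order = bif_order(["A", "B", "C"], [3, 2, 4])  # 3 * 2 * 4 = 24 entries
print(order[:5])
print(bif_index([1, 0, 2], [3, 2, 4]))  # position of p(A2|B1,C3) → 10
```

Because `itertools.product` varies its last argument fastest, iterating over the value ranges produces exactly the order listed above.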


IMPORTANT:

Notice that there is some redundancy in the values, because each probability distribution must add up to one. Right now the BayesianNetworks package does not attempt to fill in blanks or ensure consistency; the user has to provide the data in the correct format (it has to have the correct number of values, each distribution has to add up to one, etc.).
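Because JavaBayes does not check these conditions, it may be worth validating a table before writing it. The following hypothetical helper is a sketch, not something JavaBayes provides; it assumes the BIF 0.20 order described above (conditioned variable varying slowest) and a numerical tolerance of 1e-6:

```python
import math

def check_bif_table(values, head_size, parent_sizes, tol=1e-6):
    """Return a list of problems found in a BIF-ordered probability table.

    In BIF order the conditioned variable varies slowest, so the values for
    one parent configuration are spaced n_configs positions apart.
    """
    n_configs = math.prod(parent_sizes)  # 1 when there are no parents
    expected = head_size * n_configs
    if len(values) != expected:
        return [f"expected {expected} values, got {len(values)}"]
    problems = []
    for config in range(n_configs):
        column = values[config::n_configs]  # one value per conditioned value
        if any(v < 0 for v in column):
            problems.append(f"parent configuration {config}: negative value")
        total = sum(column)
        if abs(total - 1.0) > tol:
            problems.append(f"parent configuration {config}: sums to {total:.4f}")
    return problems

# p(d | f, b) from the dog problem: every distribution adds up to one.
print(check_bif_table([0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70], 2, [2, 2]))
```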

IMPORTANT:

BIF 0.20 and XMLBIF 0.50 save probabilities in a different order. Using the same concept of filling a table, the BIF format fills one column at a time, while XMLBIF fills one row at a time: in BIF the first (conditioned) variable varies slowest, while in XMLBIF it varies fastest. For example, the function p(d | f, b) is written as

0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70

in the BIF 0.20 format and

0.99, 0.01, 0.90, 0.10, 0.97, 0.03, 0.30, 0.70

in the XMLBIF 0.50 format.
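Converting between the two orders amounts to transposing a matrix with one row per value of the conditioned variable and one column per parent configuration. A minimal sketch, with hypothetical function names:

```python
def bif_to_xmlbif(values, head_size):
    """Reorder a BIF 0.20 table (conditioned variable slowest) into
    XMLBIF 0.50 order (conditioned variable fastest)."""
    if len(values) % head_size:
        raise ValueError("table length is not a multiple of head_size")
    n_configs = len(values) // head_size
    return [values[h * n_configs + c]
            for c in range(n_configs) for h in range(head_size)]

def xmlbif_to_bif(values, head_size):
    """Inverse of bif_to_xmlbif."""
    if len(values) % head_size:
        raise ValueError("table length is not a multiple of head_size")
    n_configs = len(values) // head_size
    return [values[c * head_size + h]
            for h in range(head_size) for c in range(n_configs)]

bif = [0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70]
print(bif_to_xmlbif(bif, 2))  # [0.99, 0.01, 0.9, 0.1, 0.97, 0.03, 0.3, 0.7]
```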


2. BIF version 0.20

The BIF formats follow a syntax using blocks, similar to the way C or Java code is written.

White spaces, tabs and newlines are ignored; the C/C++ style of comments is adopted. The "," character is also ignored when it occurs between tokens.

The basic unit of information is a block: a piece of text which starts with a keyword and ends with the end of an attribute list (to be explained later). Arbitrary characters are allowed between blocks. This allows the user to insert arbitrarily long comments outside the blocks. It also allows user-specific blocks and commands to be placed outside the standard blocks.

Other than blocks, BIF 0.20 refers to three entities: words, non-negative integers and non-negative reals.

A word is a contiguous sequence of ASCII characters, enclosed by double quotes.

A non-negative number is a sequence of numeric characters, containing a decimal point or an exponent or both.
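The lexical rules above can be approximated with a regular-expression tokenizer. This is only an illustrative sketch (JavaBayes itself uses a JavaCC grammar): it assumes that words may contain backslash escapes, as described under Character formatting, skips `//` and `/* */` comments and commas, and the token kinds `word`, `number`, `ident` and `punct` are names chosen for this example.

```python
import re

TOKEN = re.compile(r"""
    (?P<skip>\s+|,|//[^\n]*|/\*.*?\*/)        # whitespace, commas, comments
  | (?P<word>"(?:[^"\\]|\\.)*")               # double-quoted word with escapes
  | (?P<number>(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)         # keywords: network, table, ...
  | (?P<punct>[{}()\[\];|=])
""", re.VERBOSE | re.DOTALL)

def tokenize(text):
    """Split BIF 0.20 text into (kind, value) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        kind, value = match.lastgroup, match.group()
        if kind == "word":
            value = re.sub(r"\\(.)", r"\1", value[1:-1])  # drop quotes, unescape
        if kind != "skip":
            tokens.append((kind, value))
        pos = match.end()
    return tokens

print(tokenize('probability ( "Leg" | "Arm" ) { table 0.1, 0.9 0.9 0.1; } // CPT'))
```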

Blocks

A block is a unit of information. The general format of a block is:

block-type block-name {
    attribute-name attribute-value;
    attribute-name attribute-value;
    attribute-name attribute-value;
}

with as many attributes as necessary. The closing semicolon is mandatory after each attribute.

There are three possible blocks: network, variable and probability blocks.

A network block defines the name of the network and lists the properties. Example:

network "Robot-Planning" {
    property "version 1.1";
    property "author Nobody";
}

Variable blocks define the variables in a network. Example:

variable "Leg" {
    type discrete[2] { "long" "short" };
    property "temporary yes";
}

Probability blocks specify the (conditional) probability tables (CPTs) for these variables, and hence the topology of the network. The block indicates the variables of the probability distribution right after the keyword probability. Example:

probability ( "Leg" | "Arm" ) {
    table 0.1 0.9 0.9 0.1;
}

The blocks must be placed in the following order:

A network declaration block (exactly one, which must come first).

A series of variable declaration blocks and probability definition blocks, possibly intermixed.

Attributes

Several attributes are defined at this point: property, type, table, default and entry attributes (the entry attribute is not associated with any keyword).

The attribute property can appear in all types of blocks. A property is just a string of arbitrary text to be associated with a block. Examples of properties:

property "size 12";
property "name Trial number ten";

Any text is valid in the string following the keyword property. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block. The text of the property must be enclosed by double quotes.



There are attributes that are specific to probability blocks (these attributes are discussed in the Probability Blocks section below):

table lists a sequence of non-negative real numbers.

default lists a sequence of non-negative real numbers.

the entry attribute, which is not associated with any keyword.

The JavaBayes properties

JavaBayes uses a number of properties to load and save information about Bayesian networks:


In the BIF 0.20 format, these properties are used only for informative purposes.

The syntax for properties is

property "<text>";

Network, variable and probability blocks may have any number of properties.


Variable Blocks

A variable block is identified by the keyword variable followed by the name of the variable.

The type attribute is specific to variable blocks. It lists the values of a discrete variable:

type discrete[ number-of-values ] { list-of-values };

The number-of-values token is a non-negative integer which indicates how many different values this variable may assume (the size of the list-of-values). The list-of-values is a sequence of words, each one the name of a variable value.

Position is also an attribute available only for variable blocks. It denotes the position of the node on the screen, with coordinates in pixels starting from the top-left corner of the screen.

Another attribute for variables is mode. It may assume one of four different values in BIF 0.20: [nature|decision|utility|explanation]. Below is an example showing the position and mode attributes:

variable "node1" { //2 categories
    type discrete[2] { "true" "false" };
    position = (69, 99);
    mode nature;
}
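A program that exports networks to BIF 0.20 can emit variable blocks with plain string formatting. The helper below is hypothetical; it mirrors the layout of the JavaBayes-generated files shown in this section and assumes that names are already safe to place between double quotes.

```python
def bif_variable_block(name, values, position=None, mode="nature"):
    """Format one BIF 0.20 variable block (names assumed already escaped)."""
    if mode not in ("nature", "decision", "utility", "explanation"):
        raise ValueError(f"unknown mode: {mode}")
    quoted = " ".join(f'"{v}"' for v in values)
    lines = [f'variable "{name}" {{ //{len(values)} categories',
             f"    type discrete[{len(values)}] {{ {quoted} }};"]
    if position is not None:
        lines.append(f"    position = ({position[0]}, {position[1]});")
    lines.append(f"    mode {mode};")
    lines.append("}")
    return "\n".join(lines)

print(bif_variable_block("node1", ["true", "false"], (69, 99)))
```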


The only meaningful values for JavaBayes are nature and explanation. Nature is a normal Bayesian network node, while explanation indicates that the variable is explanatory. The meaning of an explanatory variable is that you would like to know which value for the variable would produce the highest probability or expectation. It is not necessarily true that you can operate on the variable and change it at will; it is just that you want to know which value would be best in the face of evidence.

If you request JavaBayes to produce the "best" configuration for the explanation variables, JavaBayes will only process the variables that are marked through an explanation mode.


Probability Blocks

Probability blocks are used to define the actual network topology and conditional probability tables.

An example of a standard probability block is:

probability("GasGauge" | "Gas", "BatteryPower") {
    ("yes" "high") 0.999 0.001;
    ("yes" "low") 0.850 0.150;
    ("yes" "medium") 0.000 1.000;
    ("no" "high") 0.000 1.000;
    ("no" "low") 0.000 1.000;
    ("no" "medium") 0.000 1.000;
}

As explained before, the symbol "," is ignored between tokens, so it does not affect the list of variables given after the keyword probability. The variables, however, must be enclosed by parentheses. The following syntax would also be accepted (for each line):

("yes", "high") 0.999 0.001;


The example above uses the entry attribute, which is different from the other attributes in that it has no keyword. It simply starts with an opening parenthesis, and has a list of values for all the conditioning variables. After the closing parenthesis, a list of probability values for the first variable is given (these numbers should add up to 1, although this is not checked).

The probability vectors can be listed in any order, since the names in parentheses uniquely identify the parent instantiation.

In addition to the entry attribute, BIF 0.20 supports the concept of a default entry. So the above CPT could have been specified equivalently as:

probability("GasGauge" | "Gas", "BatteryPower") {
    default 0.000 1.000;
    ("yes" "high") 0.999 0.001;
    ("yes" "low") 0.850 0.150;
}

Note that each number is a separate token, so "," can be used between numbers.

Another way to define a probability distribution is through the table attribute. The body of such an attribute is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). So, for the example above, we could simply say:

probability("GasGauge" | "Gas", "BatteryPower") {
    table 0.999 0.850 0.0 0.0 0.0 0.0 0.001 0.15 1.0 1.0 1.0 1.0;
}

There are some subtle rules that regulate these declarations:

If multiple default declarations exist, only the last one is valid.

If multiple table declarations exist, only the last one is valid.

A table can contain more elements than necessary to specify a distribution; the excess elements are discarded.

A table can contain fewer elements than necessary to specify a distribution; it is then padded with zeros.

Specified entries override conflicting default and table declarations.
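The rules above can be read as a sequence of overrides. The sketch below is one possible interpretation, not the JavaBayes implementation: the relative precedence of default and table is not stated above, so it assumes table, then default, then entries, each overriding the previous one. The table is taken in BIF order and padded with zeros or truncated as described; the function name is hypothetical.

```python
from itertools import product

def build_cpt(head_size, parent_values, table=None, default=None, entries=()):
    """Build {parent configuration: [p(head value) ...]} from BIF 0.20 attributes.

    parent_values: one list of value names per parent, in declared order.
    table:   values of the last table attribute (BIF order), or None.
    default: values of the last default attribute, or None.
    entries: (configuration tuple, probabilities) pairs from entry attributes.
    Assumed precedence: table, then default, then entries.
    """
    configs = list(product(*parent_values))  # rightmost parent varies fastest
    n = len(configs)
    cpt = {config: [0.0] * head_size for config in configs}
    if table is not None:
        # Pad with zeros or discard excess values, as the rules above require.
        flat = (list(table) + [0.0] * (head_size * n))[:head_size * n]
        for c, config in enumerate(configs):
            cpt[config] = [flat[h * n + c] for h in range(head_size)]
    if default is not None:
        for config in configs:
            cpt[config] = list(default)
    for config, probs in entries:
        if tuple(config) not in cpt:
            raise KeyError(f"unknown parent configuration {config}")
        cpt[tuple(config)] = list(probs)
    return cpt

parents = [["yes", "no"], ["high", "low", "medium"]]
from_table = build_cpt(2, parents, table=[0.999, 0.850, 0.0, 0.0, 0.0, 0.0,
                                          0.001, 0.15, 1.0, 1.0, 1.0, 1.0])
from_entries = build_cpt(2, parents, default=[0.0, 1.0],
                         entries=[(("yes", "high"), [0.999, 0.001]),
                                  (("yes", "low"), [0.850, 0.150])])
print(from_table == from_entries)  # True
```

The usage lines assume BatteryPower's values are declared in the order high, low, medium, which is the order the GasGauge table implies; under this interpretation the table form and a default of 0.000 1.000 plus entries for ("yes" "high") and ("yes" "low") give the same CPT.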


Character formatting

All network, variable and category names must be enclosed in double quotes. If a name contains a double quote or a backslash, it must be escaped by a backslash.

All characters are accepted, except newlines, tabs and backspaces.

For JavaBayes 0.4, the encoding of the characters must be ASCII.
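A writer can apply these escaping rules mechanically. A minimal sketch (the name `bif_quote` is hypothetical) that escapes backslashes before double quotes, so that the inserted escape characters are not doubled, and rejects the characters listed above; the ASCII check reflects the JavaBayes 0.4 restriction:

```python
def bif_quote(name):
    """Return name as a quoted BIF 0.20 word, escaping \\ and "."""
    if any(ch in name for ch in "\n\t\b"):
        raise ValueError("newlines, tabs and backspaces are not allowed")
    if not name.isascii():
        raise ValueError("JavaBayes 0.4 expects ASCII names")
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')  # backslash first
    return f'"{escaped}"'

print(bif_quote('say "hi"'))  # "say \"hi\""
```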


Implementation

The implementation of BIF 0.20 is based on a set of rules written and compiled with JavaCC.


Examples

Here are some of the available examples:

dog-problem.bif, a very simple network based on the discussion in Charniak, E., Bayesian Networks without Tears, AI Magazine, 1991.

elimbel2.bif, a simple network based on the second example in the Elimbel system.

car-starts.bif, a somewhat larger network contributed by Sreekanth Nagarajan, based on the automobile belief network that David Heckerman and Jack Breese presented in the March 1995 issue of Communications of the ACM.

alarm.bif, the famous Alarm network.

Here is the dog-problem.bif network:

// Bayesian network in BIF format
// File generated by JavaBayes (http://www.cs.cmu.edu/~javabayes)
// Fri Nov 28 14:00:56 GMT-03:00 2003

network "Dog_Problem" { //5 nodes
}

variable "dog_out" { //2 categories
    type discrete[2] { "true" "false" };
    position = (155, 165);
    mode nature;
}

probability ( "dog_out" | "bowel_problem" "family_out" ) { //3 variable(s) and 8 values
    table
        0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7;
}

variable "bowel_problem" { //2 categories
    type discrete[2] { "true" "false" };
    position = (190, 69);
    mode nature;
}

probability ( "bowel_problem" ) { //1 variable(s) and 2 values
    table
        0.01 0.99;
}

variable "family_out" { //2 categories
    type discrete[2] { "true" "false" };
    position = (112, 69);
    mode nature;
}

probability ( "family_out" ) { //1 variable(s) and 2 values
    table
        0.15 0.85;
}

variable "hear_bark" { //2 categories
    type discrete[2] { "true" "false" };
    position = (154, 241);
    mode nature;
}

probability ( "hear_bark" | "dog_out" ) { //2 variable(s) and 4 values
    table
        0.7 0.01 0.3 0.99;
}

variable "light_on" { //2 categories
    type discrete[2] { "true" "false" };
    position = (73, 165);
    mode nature;
}

probability ( "light_on" | "family_out" ) { //2 variable(s) and 4 values
    table
        0.6 0.05 0.4 0.95;
}

3. BIF version 0.15

BIF version 0.15 is very similar to 0.20.

The only differences are that 0.15 lacks the mode attribute, that the position attribute is written in a different way, and that a variable is marked as explanatory with an optional property explanation.

The position of the node is written inside the variable block as:

property "position = (x coordinate, y coordinate)" ;

Below is a sample variable block with the explanation property:

variable "hear-bark" { //2 values
    type discrete[2] { "true" "false" };
    property "explanation";
    property "position = (296, 268)" ;
}
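Upgrading a BIF 0.15 file to BIF 0.20 mostly means rewriting these two properties. A line-based sketch, assuming each property sits on its own line as in the example above and that the explanation property maps to the BIF 0.20 mode explanation (the function name is hypothetical); it does not add a mode line to variables that lack the explanation property:

```python
import re

POSITION = re.compile(r'\s*property\s+"position\s*=\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)"\s*;\s*')
EXPLANATION = re.compile(r'\s*property\s+"explanation"\s*;\s*')

def upgrade_bif015_line(line):
    """Rewrite one BIF 0.15 property line in BIF 0.20 syntax.

    Lines that are neither a position nor an explanation property are
    returned unchanged.
    """
    indent = line[:len(line) - len(line.lstrip())]
    match = POSITION.fullmatch(line)
    if match:
        return f"{indent}position = ({match.group(1)}, {match.group(2)});"
    if EXPLANATION.fullmatch(line):
        return f"{indent}mode explanation;"
    return line

for old in ['    property "explanation";', '    property "position = (296, 268)" ;']:
    print(upgrade_bif015_line(old))
```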

4. BIF version 0.10

BIF version 0.10 differs from 0.15 because it does not use double quotes around words. This limits the characters in network, variable and category names to numbers, letters, underscore (_) and dash (-).

An additional restriction is that the first character must be a letter.
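These two rules amount to a simple regular expression. A minimal validator sketch, assuming "letters" and "numbers" mean ASCII letters and digits (the function name is hypothetical):

```python
import re

# First character: a letter; then letters, digits, underscore or dash.
BIF010_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

def is_valid_bif010_name(name):
    """Return True if name can be written unquoted in a BIF 0.10 file."""
    return BIF010_NAME.fullmatch(name) is not None

for name in ["light-on", "dog_out", "2nd_node", "has space"]:
    print(name, is_valid_bif010_name(name))
```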

5. XMLBIF version 0.50

The XMLBIF format provides a different perspective for the storage and manipulation of Bayesian networks. Instead of focusing on a readable and simplified description of Bayesian networks, the XMLBIF format emphasizes ease of distribution through wide area networks. The XMLBIF format is defined through XML, a dialect of SGML that is used to specify formats. The advantage of XML is that it has industry-wide support, and many software developers plan to introduce parsers, search engines, and browsers for XML. The power of XML is that it is a standard language for editing formats, and XMLBIF attempts to use XML to reduce to a minimum the burden of distributing graphical models to a large audience.


The XMLBIF format is actually quite similar to BIF 0.15, but it is stated in a manner that is XML-compliant. Note the similarity of XMLBIF to HTML; this happens because both HTML and XML are dialects of SGML.

White spaces, tabs and newlines are ignored outside tags. The XML style of comments and declarations is used to detect text that should be ignored: any character between <! and > is ignored. Note that XML comments should be enclosed by <!-- and -->.

The XMLBIF format is defined by a set of XML-compliant tags. Other than XML tags, XMLBIF 0.50 refers to three entities: words, non-negative integers and non-negative reals.

A word is a contiguous sequence of characters, whose encoding is defined by the encoding attribute of the XML file.

A non-negative number is a sequence of numeric characters, containing a decimal point or an exponent or both.

Note that every XML file starts with the expression <?xml version="1.0"?>, indicating the XML version. Other attributes and directives can be contained within this tag; for example, the tag <?xml version="1.0" encoding="ISO-8859-1"?> specifies the file encoding. This initial tag is followed by any XML definitions and statements that define the DTD for the document (the DTD is always optional in XML).


Specification

The DTD for XMLBIF 0.50 is:

<!DOCTYPE BIF [
    <!ELEMENT BIF ( NETWORK )*>
        <!ATTLIST BIF VERSION CDATA #REQUIRED>
    <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
        <!ATTLIST NETWORK TYPE (discrete|continuous|hybrid) "discrete">
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY | POSITION)* ) >
        <!ATTLIST VARIABLE TYPE (nature|decision|utility|explanation|gaussian) "nature">
    <!ELEMENT OUTCOME (#PCDATA)>
    <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | ENTRY | PROPERTY )* >
    <!ELEMENT FOR (#PCDATA)>
    <!ELEMENT GIVEN (#PCDATA)>
    <!ELEMENT TABLE (#PCDATA)>
    <!ELEMENT ENTRY (CATEGORY, LIST , MEAN , VARIANCE , REGRESSORS)*>
    <!ELEMENT CATEGORY (#PCDATA)>
    <!ELEMENT LIST (#PCDATA)>
    <!ELEMENT MEAN (#PCDATA)>
    <!ELEMENT VARIANCE (#PCDATA)>
    <!ELEMENT REGRESSORS (#PCDATA)>
    <!ELEMENT PROPERTY (#PCDATA)>
    <!ELEMENT POSITION EMPTY>
        <!ATTLIST POSITION X CDATA #REQUIRED Y CDATA #REQUIRED>
]>

The first tag of an XMLBIF 0.50 file (after the XML declaration and the optional DTD) is the <BIF> tag; the last tag is the closing </BIF> tag. All the information about the model is contained between these tags. There are three basic units of information: network, variable and definition.

A network is defined by its name, followed by a list of properties (optional), followed by a list of variables and probability densities. The network tag has an optional attribute TYPE. This attribute defines whether the network is discrete (only discrete variables), continuous (only continuous gaussian variables) or hybrid (both types of variables). In the absence of the attribute, the network is assumed to be discrete.

For example, a network may be defined as:

<BIF VERSION="0.5">
<NETWORK TYPE="discrete">
    <NAME>Dog-Problem</NAME>
    <PROPERTY>date Sunday, 19 July, 1998</PROPERTY>
    <PROPERTY>author John</PROPERTY>

    variables and probabilities go here

</NETWORK>
</BIF>


The VERSION attribute in the BIF tag is mandatory. Variables are defined by their names, types and properties:

<VARIABLE TYPE="nature">
    <NAME>light-on</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="30" Y="30"/>
    <PROPERTY>any text can be used here</PROPERTY>
</VARIABLE>


Conditional probability densities and distributions can be specified in various ways inside the DEFINITION tag. One example for a discrete variable is:

<DEFINITION>
    <FOR>hear-bark</FOR>
    <GIVEN>dog-out</GIVEN>
    <TABLE>0.7 0.3 0.01 0.99</TABLE>
</DEFINITION>

There is no mandatory order of variable and definition blocks.


A property is just a string of arbitrary text to be associated with a block. Examples of properties:

<PROPERTY>size 12</PROPERTY>
<PROPERTY>comment Trial number ten</PROPERTY>

Any text is valid in the string inside the PROPERTY opening and closing tags. The idea is to store information that is specific to a particular system or network in the properties. Any number of PROPERTY tags can appear in a block.


Variables

A variable is defined by a NAME tag and its possible OUTCOME tags, if there are any.

The TYPE attribute of the variable is "nature" for discrete variables, "explanation" for explanatory discrete variables and "gaussian" for gaussian variables. Only discrete variables have OUTCOME tags.

Other possible values for TYPE are "decision" and "utility", even though these are not treated by JavaBayes.

POSITION is a tag without text, with two attributes: the X coordinate and Y coordinate, starting from the top-left point in the drawing area of the network.

The "explanation" TYPE is used to indicate that a variable is to be estimated. For example, to estimate light-on, that variable can be set as an explanation variable, i.e., a variable whose value will be estimated. The meaning of an explanatory variable is that we want to know which value for the variable would produce the highest probability or expectation. It is not necessarily true that the variable can be operated on and changed at will; we just want to know which value would be best in the face of evidence.

If JavaBayes is requested to produce the "best" configuration for the explanation variables, JavaBayes will only process the variables that are marked with the explanation TYPE.

There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal.


Definition

The structure of the DEFINITION tag depends on whether the variable is discrete or gaussian.

Definition of discrete variables

The TABLE tag is specific to the DEFINITION block of discrete variables (note that a definition can be a probability distribution, a set of decision values, a set of utility values or the moment characteristics of gaussian variables, depending on the TYPE attribute of the referred variable). DEFINITION blocks are used to define the actual network topology, by specifying conditional probability tables and distributions.


An example of a standard definition block is:

<DEFINITION>
    <FOR>GasGauge</FOR>
    <GIVEN>BatteryPower</GIVEN>
    <TABLE>1.0 0.0 0.2 0.8</TABLE>
</DEFINITION>

for a variable GasGauge that is defined with TYPE equal to "nature" and has the variable BatteryPower as its only parent. The body of the TABLE tag is a sequence of non-negative real numbers, in the counting order of the GIVEN variables followed by the FOR variable (if all variables were binary, we would say binary counting with the least significant digit on the right); that is, the values of the FOR variable vary fastest, as in the note on probability ordering in the Introduction. If multiple table declarations exist, only the last one is valid.

The same definition could be written as separate entries for each combination of parents:

<DEFINITION>
    <FOR>GasGauge</FOR>
    <GIVEN>BatteryPower</GIVEN>
    <ENTRY>
        <CATEGORY>true</CATEGORY>
        <LIST>1.0 0.0</LIST>
    </ENTRY>
    <ENTRY>
        <CATEGORY>false</CATEGORY>
        <LIST>0.2 0.8</LIST>
    </ENTRY>
</DEFINITION>

In this form, each ENTRY corresponds to a combination of parents. There are one or more CATEGORY tags, one for each parent. The single LIST tag represents the probabilities for the variable for this combination of parents.

Variables with the "explanation" TYPE have exactly the same definition block as "nature" variables.
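Both forms describe the same conditional distribution, so a reader can normalize them to one representation. The sketch below uses Python's standard xml.etree.ElementTree; the function name is hypothetical, the outcome lists must be supplied by the caller (normally from the VARIABLE elements), and the TABLE form is read with the FOR variable varying fastest. The usage lines assume both GasGauge and BatteryPower have outcomes true and false.

```python
import xml.etree.ElementTree as ET
from itertools import product

def read_discrete_definition(definition_xml, outcomes):
    """Return {tuple of parent categories: [probabilities of the FOR variable]}.

    Accepts a DEFINITION that uses either a TABLE or ENTRY elements.
    outcomes maps each variable name to its list of OUTCOME values.
    """
    root = ET.fromstring(definition_xml)
    child = root.findtext("FOR").strip()
    parents = [given.text.strip() for given in root.findall("GIVEN")]
    size = len(outcomes[child])
    configs = list(product(*(outcomes[p] for p in parents)))  # last parent fastest
    tables = root.findall("TABLE")
    if tables:
        values = [float(v) for v in tables[-1].text.split()]  # last TABLE wins
        if len(values) != size * len(configs):
            raise ValueError("TABLE has the wrong number of values")
        # XMLBIF order: consecutive runs of `size` values, one per configuration.
        return {c: values[i * size:(i + 1) * size] for i, c in enumerate(configs)}
    cpt = {}
    for entry in root.findall("ENTRY"):
        key = tuple(cat.text.strip() for cat in entry.findall("CATEGORY"))
        cpt[key] = [float(v) for v in entry.findtext("LIST").split()]
    return cpt

outcomes = {"GasGauge": ["true", "false"], "BatteryPower": ["true", "false"]}
table_form = ("<DEFINITION><FOR>GasGauge</FOR><GIVEN>BatteryPower</GIVEN>"
              "<TABLE>1.0 0.0 0.2 0.8</TABLE></DEFINITION>")
print(read_discrete_definition(table_form, outcomes))
# {('true',): [1.0, 0.0], ('false',): [0.2, 0.8]}
```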


Definition of gaussian variables without discrete parents

Gaussian variables without discrete parents also have their distribution specified in definition blocks.

<DEFINITION>
    <FOR>gaussian_node1</FOR>
    <GIVEN>gaussian_parent1</GIVEN>
    <ENTRY>
        <MEAN>1.0</MEAN>
        <VARIANCE>1.0</VARIANCE>
        <REGRESSORS>4.0</REGRESSORS>
    </ENTRY>
</DEFINITION>

Above is a sample definition for a variable gaussian_node1 with a parent gaussian_parent1. The presentation of parent-child relationships is the same as with discrete variables, with the variable name in the FOR tag and its parents in the GIVEN tags.

There is only one ENTRY tag. Inside the ENTRY there are mandatory MEAN and VARIANCE tags, each one containing a double value, with the restriction that VARIANCE must be strictly positive. A REGRESSORS tag must be present if the node has parents. The REGRESSORS tag is a list of doubles, each value being the coefficient of the linear relation between the variable and one of its parents. For example, for a variable with 5 gaussian parents, the REGRESSORS tag must contain 5 double values.
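The description above suggests a linear Gaussian model. One common reading, which is an assumption here rather than something this document states, is that MEAN acts as the intercept and each REGRESSORS value multiplies the corresponding gaussian parent in GIVEN order, so that the variable given its parents is normal with mean MEAN + b_1 x_1 + ... + b_k x_k and variance VARIANCE. A minimal sketch of that reading, with a hypothetical function name:

```python
def conditional_gaussian(mean, variance, regressors, parent_values):
    """Return (mean, variance) of a gaussian node given its gaussian parents,
    assuming mean + sum(b_i * parent_i) with parents in GIVEN order."""
    if variance <= 0:
        raise ValueError("VARIANCE must be strictly positive")
    if len(regressors) != len(parent_values):
        raise ValueError("REGRESSORS needs one value per gaussian parent")
    shifted = mean + sum(b * x for b, x in zip(regressors, parent_values))
    return shifted, variance

# gaussian_node1 from the example (MEAN 1.0, VARIANCE 1.0, REGRESSORS 4.0),
# with gaussian_parent1 observed at 2.0.
print(conditional_gaussian(1.0, 1.0, [4.0], [2.0]))  # (9.0, 1.0)
```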


Definition of gaussian variables with discrete parents

Gaussian variables with discrete parents differ from the above in the ENTRY block.

Suppose there is a gaussian variable with two gaussian and two discrete binary (true, false) parents. Its definition block would be:

<DEFINITION>
    <FOR>gaussian_node1</FOR>
    <GIVEN>gaussian_parent1</GIVEN>
    <GIVEN>gaussian_parent2</GIVEN>
    <GIVEN>discrete_parent1</GIVEN>
    <GIVEN>discrete_parent2</GIVEN>
    <ENTRY>
        <CATEGORY>true</CATEGORY>
        <CATEGORY>false</CATEGORY>
        <MEAN>1.2</MEAN>
        <VARIANCE>0.5</VARIANCE>
        <REGRESSORS>4.0 2.0</REGRESSORS>
    </ENTRY>

    … Three more entries…

</DEFINITION>


As can be seen above, the structure combines elements from the definition of discrete variables (in the entry style) and from gaussian variables without discrete parents.

There should be one ENTRY for each combination of the discrete parents. Inside each ENTRY there are MEAN, VARIANCE and REGRESSORS tags. Again, the REGRESSORS tag only exists if there are gaussian parents.

The order of the GIVEN tags matters only within each group of parents (discrete and gaussian). This means that the following order represents the same definition as above:

<GIVEN>gaussian_parent1</GIVEN>
<GIVEN>discrete_parent1</GIVEN>
<GIVEN>gaussian_parent2</GIVEN>
<GIVEN>discrete_parent2</GIVEN>

However, the following order is different from the one presented previously, because gaussian_parent1 and gaussian_parent2 are swapped:

<GIVEN>gaussian_parent2</GIVEN>
<GIVEN>gaussian_parent1</GIVEN>
<GIVEN>discrete_parent1</GIVEN>
<GIVEN>discrete_parent2</GIVEN>
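Put differently, a definition is characterized by the ordered list of gaussian parents and the ordered list of discrete parents, and interleaving the two groups does not matter. A small sketch of that rule; the function name is hypothetical, and which parents are gaussian would in practice come from their VARIABLE TYPE attributes:

```python
def parent_groups(givens, gaussian_parents):
    """Split GIVEN names, keeping their order, into (gaussian, discrete) lists.

    Two GIVEN sequences describe the same definition when both lists match.
    """
    gaussian = [name for name in givens if name in gaussian_parents]
    discrete = [name for name in givens if name not in gaussian_parents]
    return gaussian, discrete

gaussian = {"gaussian_parent1", "gaussian_parent2"}
first = ["gaussian_parent1", "gaussian_parent2", "discrete_parent1", "discrete_parent2"]
interleaved = ["gaussian_parent1", "discrete_parent1", "gaussian_parent2", "discrete_parent2"]
swapped = ["gaussian_parent2", "gaussian_parent1", "discrete_parent1", "discrete_parent2"]
print(parent_groups(first, gaussian) == parent_groups(interleaved, gaussian))  # True
print(parent_groups(first, gaussian) == parent_groups(swapped, gaussian))      # False
```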


Implementation

XMLBIF is based on the XML 1.0 specification.

The implementation of the loading and saving functions in JavaBayes 0.4 uses a validating SAX2 parser.

Unless you are using a JRE (Java Runtime Environment) 1.4 or higher, you need to install a SAX2 parser to use XML formats with JavaBayes.

A reference implementation is available at

HTTP://www.saxproject.org


Examples

Here are some of the available examples:

dog-problem.xml, a very simple network based on the discussion in Charniak, E., Bayesian Networks without Tears, AI Magazine, 1991.

Here is the dog-problem.xml network:


<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
    Bayesian network in XMLBIF v0.5 (BayesNet Interchange Format)
    Produced by JavaBayes (http://www.cs.cmu.edu/~javabayes/
-->

<!-- DTD for the XMLBIF 0.5 format -->
<!DOCTYPE BIF [
    <!ELEMENT BIF ( NETWORK )*>
        <!ATTLIST BIF VERSION CDATA #REQUIRED>
    <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
        <!ATTLIST NETWORK TYPE (discrete|continuous|hybrid) "discrete">
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY | POSITION)* ) >
        <!ATTLIST VARIABLE TYPE (nature|decision|utility|explanation|gaussian) "nature">
    <!ELEMENT OUTCOME (#PCDATA)>
    <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | ENTRY | PROPERTY )* >
    <!ELEMENT FOR (#PCDATA)>
    <!ELEMENT GIVEN (#PCDATA)>
    <!ELEMENT TABLE (#PCDATA)>
    <!ELEMENT ENTRY (CATEGORY, LIST , MEAN , VARIANCE , REGRESSORS)*>
    <!ELEMENT CATEGORY (#PCDATA)>
    <!ELEMENT LIST (#PCDATA)>
    <!ELEMENT MEAN (#PCDATA)>
    <!ELEMENT VARIANCE (#PCDATA)>
    <!ELEMENT REGRESSORS (#PCDATA)>
    <!ELEMENT PROPERTY (#PCDATA)>
    <!ELEMENT POSITION EMPTY>
        <!ATTLIST POSITION X CDATA #REQUIRED Y CDATA #REQUIRED>
]>

<BIF VERSION="0.5">
<NETWORK>
<NAME>Dog_Problem</NAME>

<VARIABLE TYPE="nature">
    <NAME>dog_out</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="155" Y="165"/>
</VARIABLE>

<DEFINITION>
    <FOR>dog_out</FOR>
    <GIVEN>bowel_problem</GIVEN>
    <GIVEN>family_out</GIVEN>
    <TABLE>0.99 0.01 0.97 0.03 0.9 0.1 0.3 0.7</TABLE>
</DEFINITION>

<VARIABLE TYPE="nature">
    <NAME>bowel_problem</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="190" Y="69"/>
</VARIABLE>

<DEFINITION>
    <FOR>bowel_problem</FOR>
    <TABLE>0.01 0.99</TABLE>
</DEFINITION>

<VARIABLE TYPE="nature">
    <NAME>family_out</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="112" Y="69"/>
</VARIABLE>

<DEFINITION>
    <FOR>family_out</FOR>
    <TABLE>0.15 0.85</TABLE>
</DEFINITION>

<VARIABLE TYPE="nature">
    <NAME>hear_bark</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="154" Y="241"/>
</VARIABLE>

<DEFINITION>
    <FOR>hear_bark</FOR>
    <GIVEN>dog_out</GIVEN>
    <TABLE>0.7 0.3 0.01 0.99</TABLE>
</DEFINITION>

<VARIABLE TYPE="nature">
    <NAME>light_on</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <POSITION X="73" Y="165"/>
</VARIABLE>

<DEFINITION>
    <FOR>light_on</FOR>
    <GIVEN>family_out</GIVEN>
    <TABLE>0.6 0.4 0.05 0.95</TABLE>
</DEFINITION>

</NETWORK>
</BIF>
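Once the tables are loaded, simple queries can be computed by hand. As a worked example, and only as a sketch with a hypothetical helper (this is not JavaBayes inference), the code below reads the family_out and light_on definitions from the listing above (DTD omitted) with xml.etree.ElementTree and computes p(light_on = true) = 0.6 × 0.15 + 0.05 × 0.85 = 0.1325 by summing over family_out, relying on the XMLBIF order in which the FOR variable varies fastest.

```python
import xml.etree.ElementTree as ET

# The family_out and light_on parts of dog-problem.xml (DTD omitted).
DOG_PROBLEM_PART = """
<BIF VERSION="0.5"><NETWORK><NAME>Dog_Problem</NAME>
  <VARIABLE TYPE="nature"><NAME>family_out</NAME>
    <OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME></VARIABLE>
  <DEFINITION><FOR>family_out</FOR><TABLE>0.15 0.85</TABLE></DEFINITION>
  <VARIABLE TYPE="nature"><NAME>light_on</NAME>
    <OUTCOME>true</OUTCOME><OUTCOME>false</OUTCOME></VARIABLE>
  <DEFINITION><FOR>light_on</FOR><GIVEN>family_out</GIVEN>
    <TABLE>0.6 0.4 0.05 0.95</TABLE></DEFINITION>
</NETWORK></BIF>
"""

def load_tables(xml_text):
    """Map each FOR variable to (list of GIVEN parents, TABLE values)."""
    network = ET.fromstring(xml_text.strip()).find("NETWORK")
    tables = {}
    for definition in network.findall("DEFINITION"):
        child = definition.findtext("FOR").strip()
        parents = [given.text.strip() for given in definition.findall("GIVEN")]
        values = [float(v) for v in definition.findall("TABLE")[-1].text.split()]
        tables[child] = (parents, values)
    return tables

tables = load_tables(DOG_PROBLEM_PART)
_, p_family = tables["family_out"]  # [p(f=true), p(f=false)]
_, p_light = tables["light_on"]     # [p(l=t|f=t), p(l=f|f=t), p(l=t|f=f), p(l=f|f=f)]
# Marginalize over family_out: p(l=true) = sum over f of p(l=true | f) p(f).
p_light_on = p_light[0] * p_family[0] + p_light[2] * p_family[1]
print(round(p_light_on, 4))  # 0.1325
```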

6. XMLBIF version 0.40

XMLBIF version 0.40 is a simplified version of XMLBIF 0.50. It is similar in most aspects to 0.50, with the significant difference that it only works with discrete networks.

The XMLBIF 0.50 format was written with the objective of being backward and forward compatible with 0.40. This means that discrete networks saved in XMLBIF 0.40 can be opened as XMLBIF 0.50, and discrete networks saved in XMLBIF 0.50 can be opened as XMLBIF 0.40. Gaussian and hybrid networks can only be opened with XMLBIF 0.50.


The DTD for XMLBIF 0.40 is posted below:

<!-- DTD for the XMLBIF 0.4 format -->
<!DOCTYPE BIF [
    <!ELEMENT BIF ( NETWORK )*>
        <!ATTLIST BIF VERSION CDATA #REQUIRED>
    <!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT VARIABLE ( NAME, ( OUTCOME | PROPERTY | POSITION)* ) >
        <!ATTLIST VARIABLE TYPE (nature|decision|utility|explanation) "nature">
    <!ELEMENT OUTCOME (#PCDATA)>
    <!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | ENTRY | PROPERTY )* >
    <!ELEMENT FOR (#PCDATA)>
    <!ELEMENT GIVEN (#PCDATA)>
    <!ELEMENT TABLE (#PCDATA)>
    <!ELEMENT ENTRY (CATEGORY*, LIST)>
    <!ELEMENT CATEGORY (#PCDATA)>
    <!ELEMENT LIST (#PCDATA)>
    <!ELEMENT PROPERTY (#PCDATA)>
    <!ELEMENT POSITION EMPTY>
        <!ATTLIST POSITION X CDATA #REQUIRED Y CDATA #REQUIRED>
]>


7. XMLBIF version 0.30

XMLBIF 0.30 was the first XML format used with JavaBayes. The structure of the format is similar in most aspects to 0.40 and 0.50.

The specific differences are:

This version allows the indication of observed variables using a property. For example, to indicate that variable light-on is observed with value true (i.e., light-on = true is the evidence):

<VARIABLE TYPE="chance">
    <NAME>light-on</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <PROPERTY>observed true</PROPERTY>
    <PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>

The same functionality has been moved to a separate file format with XMLBIF 0.40 and 0.50.


The position of the variable is written in a different manner, inside a PROPERTY tag. The fact that a variable is explanatory is also indicated with a property. An example follows below:

<VARIABLE TYPE="chance">
    <NAME>light-on</NAME>
    <OUTCOME>true</OUTCOME>
    <OUTCOME>false</OUTCOME>
    <PROPERTY>explanation</PROPERTY>
    <PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>


The attribute TYPE of variables has the following options: chance, decision and utility. The value "chance" has been renamed to "nature" in XMLBIF 0.40 and 0.50.

In previous JavaBayes versions, the XMLBIF parser was implemented with JavaCC. For this reason, there are some cases where the saved networks cannot be properly opened with newer versions of JavaBayes. This problem normally happens when characters that need escaping in XML are used for names in the network. In these cases, it is necessary to manually correct the files containing the saved networks.
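Upgrading an XMLBIF 0.30 variable to the 0.50 conventions is mechanical. The sketch below, with a hypothetical function name, renames chance to nature, turns an explanation property into TYPE="explanation", turns a "position = (x, y)" property into a POSITION element, and removes an "observed" property, returning its value separately because 0.50 keeps evidence in an XMLBIF-EVIDENCE file. It assumes names that need escaping have already been corrected by hand, as described above, so that the XML parser accepts the input.

```python
import re
import xml.etree.ElementTree as ET

POSITION = re.compile(r"position\s*=\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)")

def upgrade_variable_030(xml_text):
    """Convert one XMLBIF 0.30 VARIABLE element to XMLBIF 0.50 conventions.

    Returns (new VARIABLE element as a string, observed value or None).
    """
    var = ET.fromstring(xml_text)
    var_type = var.get("TYPE", "chance")
    if var_type == "chance":
        var_type = "nature"
    observed = None
    for prop in var.findall("PROPERTY"):  # findall returns a list, safe to modify
        text = (prop.text or "").strip()
        words = text.split()
        position = POSITION.fullmatch(text)
        if text == "explanation":
            var_type = "explanation"
        elif position:
            ET.SubElement(var, "POSITION", X=position.group(1), Y=position.group(2))
        elif words and words[0] == "observed":
            observed = words[1] if len(words) > 1 else None
        else:
            continue  # keep any other property unchanged
        var.remove(prop)
    var.set("TYPE", var_type)
    return ET.tostring(var, encoding="unicode"), observed

old = ('<VARIABLE TYPE="chance"><NAME>light-on</NAME><OUTCOME>true</OUTCOME>'
       '<OUTCOME>false</OUTCOME><PROPERTY>explanation</PROPERTY>'
       '<PROPERTY>position = (73, 165)</PROPERTY></VARIABLE>')
print(upgrade_variable_030(old))
```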





8. XMLBIF-EVIDENCE 0.50

XMLBIF-EVIDENCE is an accompanying format for XMLBIF 0.50. The objective of this specification is to store evidence in a separate file.

The DTD for XMLBIF-EVIDENCE is as follows:


<!-- DTD for the XMLBIF-EVIDENCE 0.5 format -->
<!DOCTYPE BIF-EVIDENCE [
    <!ELEMENT BIF-EVIDENCE (DESCRIPTION?, NETWORK*)>
        <!ATTLIST BIF-EVIDENCE VERSION CDATA #REQUIRED>
    <!ELEMENT DESCRIPTION (#PCDATA)>
    <!ELEMENT NETWORK (NAME, EVIDENCE*)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT EVIDENCE (VARIABLENAME, VALUE)>
    <!ELEMENT VARIABLENAME (#PCDATA)>
    <!ELEMENT VALUE (#PCDATA)>
]>


Only variables with evidence are saved in the file. When an evidence file is loaded, all variables that don't have an entry in the file have their evidence retracted.

The DESCRIPTION tag is optional and denotes any string of text.

The file must contain a NETWORK tag, with its name and a list of EVIDENCE tags. An empty list of EVIDENCE tags retracts all evidence in the network.

Each EVIDENCE tag contains two values: the name of the variable (VARIABLENAME) and the value of its evidence. For discrete variables, the evidence is the name of the observed category. For gaussian variables, the evidence is a double with the mean value of the variable.
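A file in this format can be generated with any XML library. A minimal sketch using Python's xml.etree.ElementTree; the function name is hypothetical, the XML declaration and DOCTYPE are omitted, and the VERSION value "0.5" is an assumption chosen by analogy with the <BIF VERSION="0.5"> tag of XMLBIF files.

```python
import xml.etree.ElementTree as ET

def evidence_xml(network_name, evidence, description=None, version="0.5"):
    """Build an XMLBIF-EVIDENCE document (without DOCTYPE) as a string.

    evidence maps variable names to observed values: category names for
    discrete variables, numbers for gaussian variables.
    """
    root = ET.Element("BIF-EVIDENCE", VERSION=version)  # version value assumed
    if description is not None:
        ET.SubElement(root, "DESCRIPTION").text = description
    network = ET.SubElement(root, "NETWORK")
    ET.SubElement(network, "NAME").text = network_name
    for name, value in evidence.items():
        item = ET.SubElement(network, "EVIDENCE")
        ET.SubElement(item, "VARIABLENAME").text = name
        ET.SubElement(item, "VALUE").text = str(value)
    return ET.tostring(root, encoding="unicode")

print(evidence_xml("Dog_Problem", {"light_on": "true", "hear_bark": "false"}))
```

Calling it with an empty dictionary produces a NETWORK with no EVIDENCE tags, which, according to the rules above, retracts all evidence when the file is loaded.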


Before loading an observation file, be sure to load the corresponding network.