Advantages of Bayesian Networks in Data Mining and Knowledge Discovery
By Petri Myllymäki, Ph.D., Academy Research Fellow,
Complex Systems Computation Group, Helsinki Institute for Information Technology
During the past few years we have witnessed the arrival of a number of complex Data Mining products on the market, but they are usually too expensive for small-scale businesses to use. They are also usually too difficult for the average user. Recent developments in Bayesian network technology provide new opportunities to develop better-performing tools for various Data Mining tasks, as well as a solid, unifying theoretical framework for knowledge discovery in general.
The joint research projects of the Complex Systems Computation research group and Bayes Information Technology Ltd. have studied the task of delivering problem domain models that can be implemented as computer software offering adaptive and intelligent services to the user.
The research has focused on Bayesian modeling. In this approach, all the problems related to building such models are solved within the same theoretical framework, based on probability theory. In order to apply this theoretically elegant approach in practice, the set of possible models has to be constrained by some basic assumptions about the problem domain. It has previously been argued that the required assumptions simplify the models to the point where they become useless for practical applications. However, this argument has been refuted both by recent theoretical developments in applied probability theory, which have led to the use of what are known as Bayesian networks, and by several successful applications based on this technology. In our experience, most competing technologies cannot achieve the same prediction accuracy. This claim is supported by our success in two recent international Data Mining competitions, the COIL competition (2000) and the KDD Cup (2001).
A Bayesian network is a high-level representation of a probability distribution over a set of variables that are used for building a model of the problem domain. The benefit of the Bayesian network representation lies in the fact that such a structure can serve as a compact representation for many naturally occurring, complex problem domains.
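The compactness comes from the standard Bayesian network factorization. For variables X_1, ..., X_n, write pa(X_i) for the set of parents of X_i, that is, its immediate predecessors in the network. The joint distribution is then the product of one local conditional distribution per node:

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

Each factor involves only a node and its parents, so as long as every node has few parents the local tables stay small, even when the number of variables n is large.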
Intuitively speaking, a Bayesian network model is constructed by explicitly determining all the direct dependencies between the random variables of the problem domain. In a Bayesian network, each node represents one of the observable features of the problem domain, and the arcs between the nodes represent the direct dependencies between the corresponding variables. In addition, each node has to be provided with a table of conditional probabilities, in which the variable in question is conditioned on its immediate predecessors (parents) in the network. If machine learning is used, this type of model can be constructed automatically by the computer from empirical data.
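As a minimal sketch of such a structure, the Python snippet below hand-codes a three-node network over hypothetical binary variables Rain, Sprinkler and WetGrass, with arcs Rain -> Sprinkler and Rain, Sprinkler -> WetGrass. All numbers are invented for illustration. Each node carries a table conditioned on its parents, and the probability of a complete assignment is obtained by multiplying one entry from each table.

```python
from itertools import product

# Hypothetical network: Rain -> Sprinkler, and Rain, Sprinkler -> WetGrass.
# All probabilities below are invented for illustration.
p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
p_sprinkler = {True: {True: 0.01, False: 0.99},       # P(Sprinkler | Rain)
               False: {True: 0.4, False: 0.6}}
p_wet = {(True, True): {True: 0.99, False: 0.01},     # P(WetGrass | Rain, Sprinkler)
         (True, False): {True: 0.8, False: 0.2},
         (False, True): {True: 0.9, False: 0.1},
         (False, False): {True: 0.0, False: 1.0}}


def joint(rain, sprinkler, wet):
    """Joint probability: one entry from each node's table, multiplied together."""
    return (p_rain[rain]
            * p_sprinkler[rain][sprinkler]
            * p_wet[(rain, sprinkler)][wet])


# Summing the joint over all 2**3 assignments should give 1.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
print(round(total, 10))  # 1.0
```

Because every row of every table sums to 1, the product automatically defines a valid joint distribution; learning from data would amount to estimating these table entries instead of typing them in.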
Advantages
The Bayesian framework offers several advantages over alternative modeling approaches. The most important of these advantages are:
Decision theory
As Bayesian networks are models of the probability distribution of the problem domain, they can be used for computing the predictive distribution over the outcomes of possible actions. This makes it possible to use decision theory for risk analysis, and to choose in each situation the action that maximizes the expected utility. It can be shown that, in a very natural sense, this is the optimal procedure for making decisions.
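The expected-utility rule can be sketched in a few lines. In this hypothetical marketing example, the predictive distributions per action are assumed to have already been obtained by querying a network, and the utilities (profit minus contact cost) are invented numbers; the action names are placeholders, not part of any real system.

```python
# Hypothetical predictive distributions P(outcome | action); in a real system
# these would come from querying the Bayesian network.
predictive = {
    "send_offer": {"buys": 0.12, "ignores": 0.88},
    "do_nothing": {"buys": 0.03, "ignores": 0.97},
}
# Hypothetical utility of each (action, outcome) pair, e.g. profit minus cost.
utility = {
    ("send_offer", "buys"): 50.0, ("send_offer", "ignores"): -2.0,
    ("do_nothing", "buys"): 50.0, ("do_nothing", "ignores"): 0.0,
}


def expected_utility(action):
    """Utility of each outcome weighted by its predicted probability."""
    return sum(p * utility[(action, outcome)]
               for outcome, p in predictive[action].items())


# Decision rule: pick the action with the highest expected utility.
best = max(predictive, key=expected_utility)
print(best, expected_utility(best))
```

Here sending the offer wins (expected utility about 4.24 versus 1.5), even though most contacted customers are predicted to ignore it.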
Consistent, theoretically solid mechanism for processing uncertain information
Probability theory provides a consistent calculus for uncertain inference, meaning that the output of the system is always unambiguous. Given the input, all alternative exact methods for computing the output with the help of a Bayesian network model produce exactly the same answer.
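To illustrate, the sketch below answers one query, P(Rain = true | WetGrass = true), in two different ways on a small hypothetical network (all numbers invented): once by brute-force enumeration of the joint, and once by first summing out Sprinkler for each value of Rain, as variable elimination would. Both are exact procedures applied to the same model.

```python
from itertools import product

# Hypothetical network: Rain -> Sprinkler, and Rain, Sprinkler -> WetGrass.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: {True: 0.01, False: 0.99},   # P(Sprinkler | Rain)
               False: {True: 0.4, False: 0.6}}
p_wet_true = {(True, True): 0.99, (True, False): 0.8,   # P(WetGrass=True | Rain, Sprinkler)
              (False, True): 0.9, (False, False): 0.0}


def by_enumeration():
    """P(Rain=True | WetGrass=True) by summing terms of the full joint."""
    num = den = 0.0
    for r, s in product([True, False], repeat=2):
        term = p_rain[r] * p_sprinkler[r][s] * p_wet_true[(r, s)]
        den += term
        if r:
            num += term
    return num / den


def by_elimination():
    """Same query, but Sprinkler is summed out first for each value of Rain."""
    message = {r: sum(p_sprinkler[r][s] * p_wet_true[(r, s)] for s in (True, False))
               for r in (True, False)}
    unnormalized = {r: p_rain[r] * message[r] for r in (True, False)}
    return unnormalized[True] / sum(unnormalized.values())


print(by_enumeration(), by_elimination())
```

The two results agree (about 0.358) up to floating-point rounding; approximate inference methods, by contrast, would only approach this value.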
Smoothness properties
Bayesian network models have been found to be very robust in the sense that small alterations in the model do not affect the performance of the system dramatically. This makes maintaining and updating existing models easy, since the behavior of the system changes smoothly as the model is modified. For sales and marketing systems this is a crucial characteristic, as these systems need to follow market changes rapidly without complex and time-consuming remodeling.
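A simple sensitivity check shows what "changes smoothly" means in practice. The sketch below uses a hypothetical two-node network H -> E with invented parameters and perturbs one conditional probability by a few hundredths. It only examines this one toy model; it illustrates the kind of check one might run, not a proof that every network is robust.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H=1 | E=1) in a two-node network H -> E, by Bayes' rule."""
    num = prior * p_e_given_h
    return num / (num + (1 - prior) * p_e_given_not_h)


# Hypothetical parameters: P(H=1) = 0.3, P(E=1 | H=1) = 0.8, P(E=1 | H=0) = 0.2.
base = posterior(0.3, 0.8, 0.2)

# Nudge P(E=1 | H=1) and report how much the posterior moves.
for delta in (-0.02, -0.01, 0.01, 0.02):
    shifted = posterior(0.3, 0.8 + delta, 0.2)
    print(f"delta={delta:+.2f}  posterior change={shifted - base:+.4f}")
```

In this example a parameter change of 0.02 moves the posterior by well under 0.01.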
Flexible applicability
Bayesian networks model the problem domain as a whole by constructing a joint probability distribution over the different combinations of the domain variables. This means that the same Bayesian network model can be used for solving both discriminative tasks (classification) and regression problems (configuration problems and prediction). Besides predictive purposes, Bayesian networks can also be used for exploratory data mining tasks by examining the conditional distributions, dependencies and correlations found by the modeling process.
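Because the model is a full joint distribution, one generic query routine can answer questions in any direction. The sketch below uses a hypothetical network (segment -> owns_car, segment -> buys, all binary, numbers invented) and brute-force enumeration, which is only practical for tiny models.

```python
from itertools import product

VARS = ("segment", "owns_car", "buys")  # hypothetical binary variables

# Hypothetical network: segment -> owns_car, segment -> buys
p_segment = {0: 0.7, 1: 0.3}
p_car = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}   # P(owns_car | segment)
p_buys = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # P(buys | segment)


def joint(a):
    """Probability of a full assignment a (dict of variable -> value)."""
    s = a["segment"]
    return p_segment[s] * p_car[s][a["owns_car"]] * p_buys[s][a["buys"]]


def query(target, evidence):
    """P(target | evidence) by enumeration; any variable may be the target."""
    scores = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if all(a[k] == v for k, v in evidence.items()):
            scores[a[target]] += joint(a)
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


# Classification: which segment, given the observed attributes?
print(query("segment", {"owns_car": 1, "buys": 1}))
# Prediction in another direction: will the customer buy, given car ownership only?
print(query("buys", {"owns_car": 1}))
```

The same tables serve both queries; nothing is retrained when the target variable changes.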
A theoretical framework for handling expert knowledge
In Bayesian modeling, expert domain knowledge can be coded as prior distributions, "prior" meaning that the probability distributions are defined before, and independently of, processing any sample data. This allows expert knowledge to be combined with statistical data in a very practical way. With suitable prior distributions, the priors can be given a semantically clear interpretation in terms of data: the expert knowledge can be interpreted as an unseen data set of the same form as the training data. This means that the experts are also able to estimate the weight, or importance, of their prior knowledge relative to the available training data.
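One common way to realize the "unseen data set" reading, assuming a Dirichlet prior on a discrete variable, is to treat the expert's opinion as pseudo-counts: the expert supplies a probability estimate and a weight expressed as an equivalent number of cases. The sketch below computes the resulting posterior mean; the function name and all numbers are hypothetical.

```python
def posterior_mean(counts, prior_estimate, prior_weight):
    """Combine expert belief with data via Dirichlet pseudo-counts.

    counts:         observed count for each value of the variable
    prior_estimate: expert's probability for each value (sums to 1)
    prior_weight:   how many data cases the expert's opinion is worth
    """
    n = sum(counts.values())
    return {k: (counts[k] + prior_weight * prior_estimate[k]) / (n + prior_weight)
            for k in counts}


data = {"buys": 3, "ignores": 17}         # 20 hypothetical training cases
expert = {"buys": 0.3, "ignores": 0.7}    # hypothetical expert estimate

print(posterior_mean(data, expert, prior_weight=0))   # data only
print(posterior_mean(data, expert, prior_weight=20))  # expert worth 20 cases
```

With weight 0 the estimate is the data frequency (0.15); with weight 20 the expert counts as much as the 20 observed cases and the estimate moves halfway toward 0.3 (0.225).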
A clear semantic interpretation of the model parameters
Unlike neural network models, which usually appear to the user as a "black box", all the parameters in Bayesian networks have an understandable semantic interpretation. For this reason, Bayesian networks can be constructed directly from domain expert knowledge, without a time-consuming learning process. On the other hand, if machine learning techniques are used (with or without expert knowledge) to construct Bayesian network models from sample data, the resulting model can be analyzed and explained in terms that are understandable to domain experts.
Different variable types
Probabilistic models can handle several different types of variables at the same time, whereas many alternative modeling techniques have been designed for a single specific type of variable (continuous, discrete, etc.). With these alternatives, working with several variable types requires some kind of transformation, which in some cases may cause unexpected results. From the probabilistic point of view, all the basic entities are distributions, which means that all the different variable types fall elegantly within the same unifying framework.
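As a sketch of mixing variable types, the snippet below assumes a hypothetical conditional-Gaussian style model: a discrete segment node with a discrete child (region) and a continuous child (age) whose distribution is Gaussian given the segment. All names and parameters are invented. The discrete table entry and the continuous density enter the same product, with no discretization of age.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical network: segment -> region (discrete), segment -> age (continuous)
p_segment = {"A": 0.6, "B": 0.4}
p_region = {"A": {"north": 0.7, "south": 0.3},
            "B": {"north": 0.2, "south": 0.8}}
age_params = {"A": (35.0, 8.0), "B": (55.0, 10.0)}  # (mean, std) of age per segment


def posterior_segment(region, age):
    """P(segment | region, age): table entries and densities multiply alike."""
    scores = {s: p_segment[s] * p_region[s][region] * normal_pdf(age, *age_params[s])
              for s in p_segment}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


print(posterior_segment("south", 50.0))
```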
A theoretical framework for handling missing data
In the Bayesian network framework, missing data is handled by marginalization: the missing values are summed (or, for continuous variables, integrated) out over all their possible values.
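A minimal sketch of marginalizing out a missing value, assuming a hypothetical naive Bayes structure (cls -> f1, cls -> f2, all binary, numbers invented). A feature passed as None is treated as unobserved, and the computation sums over both of its possible values instead of imputing a guess.

```python
# Hypothetical naive Bayes network: cls -> f1, cls -> f2 (all binary)
p_cls = {0: 0.5, 1: 0.5}
p_f1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P(f1 | cls)
p_f2 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # P(f2 | cls)


def posterior(f1, f2):
    """P(cls | f1, f2); a value of None means 'missing' and is summed out."""
    f1_values = [f1] if f1 is not None else [0, 1]
    f2_values = [f2] if f2 is not None else [0, 1]
    scores = {c: sum(p_cls[c] * p_f1[c][a] * p_f2[c][b]
                     for a in f1_values for b in f2_values)
              for c in p_cls}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


print(posterior(1, 1))     # both features observed
print(posterior(1, None))  # f2 missing: summed over both of its values
```

In this structure, summing out f2 gives the same answer as using f1 alone, since P(f2 | cls) sums to 1 for each class.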
Although the advantages of probabilistic modeling have been widely recognized and accepted, the probabilistic approach has often been dismissed in the past as a theoretically correct but computationally infeasible methodology. Perhaps the most common argument against using probabilistic models has been that the number of parameters needed to define the models is too high.
Nevertheless, the theoretical framework of Bayesian network modeling suggests that it is possible to construct quite successful probabilistic models using only a moderate number of parameters. In addition, Bayesian networks appear to be rather insensitive to the accuracy of the parameters, so determining good parameter values is quite feasible in many application areas. For these reasons, the number of Bayesian network models being developed has grown rapidly during the last few years. Several companies have noticed the advantages of using Bayesian networks and are pursuing work in this area. Bayesian network models are currently being applied in, for example, building intelligent agents and adaptive user interfaces (Microsoft, NASA), process control (NASA, General Electric, Lockheed), fault diagnosis (Hewlett Packard, Intel, American Airlines), pattern recognition and data mining (NASA), and medical diagnosis (BiopSys, Microsoft).
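The parameter-count argument can be made concrete with a small calculation. The sketch below compares the number of free parameters in an unrestricted joint probability table with that of a Bayesian network, where each node needs (k - 1) numbers per combination of its parents' values. The 20-variable chain structure is a hypothetical example chosen for simplicity; real networks usually have more parents per node.

```python
import math


def full_joint_params(cardinalities):
    """Free parameters of an unrestricted joint table: product of sizes, minus 1."""
    return math.prod(cardinalities) - 1


def network_params(cardinalities, parents):
    """Free parameters of a Bayesian network: each node needs (k - 1) numbers
    for every combination of its parents' values."""
    return sum((k - 1) * math.prod(cardinalities[p] for p in parents[node])
               for node, k in enumerate(cardinalities))


# Hypothetical example: 20 binary variables in a chain X0 -> X1 -> ... -> X19
n = 20
cards = [2] * n
chain = [[]] + [[i - 1] for i in range(1, n)]
print(full_joint_params(cards))      # 1048575
print(network_params(cards, chain))  # 39
```

The full table grows exponentially with the number of variables, while the network grows with the number of variables times the size of the largest parent configuration.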