Multi-Relational Rule Discovery


Multi-Relational Rule Discovery

Progress Report

November 2002

Mahmut Uludag

Eastern Mediterranean University

Supervisor: Prof. Dr. Mehmet Tolun


Summary

This report presents the current status of the multi-relational rule discovery research. As part of this research, a data mining system is being developed that allows the discovery of multi-relational rules from data in a relational database. The basic assumption of the system is that the objects analysed are stored in a set of tables.

The multi-relational rules discovered would either be used to predict an object attribute value (by looking at the other object attribute values), or they would make it possible to see hidden relationships between the object attribute values. The second way of using the discovered rules can be described as ‘data summarisation’.

The system developed so far is not yet able to use data from every possible ‘connected’ schema; it requires a central table that is directly connected to all other tables. However, this report gives details of the general solution for a ‘connected’ schema, which we plan to implement during December 2002.

The report also gives details of the possible usage of the rule discovery algorithm in SRS, LION bioscience's data integration system [14]. Basically, when a relational library is defined in SRS, most of the input required by the rule discovery algorithm becomes automatically defined. The rule discovery process could be started from a button on the SRS query results page allowing the user to link to “hidden relations in the results”.

1. Introduction

Most current data mining algorithms are designed to use datasets from a single table and require that each object be described by a fixed set of attributes. Compared to a single-table dataset, a relational database containing multiple tables makes it possible to represent more complex and structured data. In addition, a significant amount of scientific data, such as gene expression data, is stored today in relational databases. For these reasons, it is important to have discovery algorithms that run on relational data in its natural form, without requiring the data to be viewed as one single table.



Previous work on multi-relational data mining concentrated on producing decision tree classifiers [11], [5] or on association rule mining [7]. However, in a more recent study [12], multi-relational data mining was applied to a rule discovery algorithm as a test case for a new paradigm.

In [10], a multi-relational data-mining framework was described that can be used to analyse data in relational databases containing multiple relations. In this study, we have applied the described framework to a specific rule induction algorithm, namely ILA (Inductive Learning Algorithm). ILA is a ‘covering’ type machine learning algorithm that takes each class in turn and seeks a way of covering all instances in it, while at the same time excluding instances not in the class [2]. The concepts suggested in the framework, such as selection graphs, the target table and the target attribute, all helped considerably during the process of building the multi-relational version of the ILA algorithm. However, we needed to extend the framework with a notion of the distinction between covered and not-yet-covered objects.

A relational data model consisting of multiple tables may represent several object classes; within a schema, one set of tables may represent one class of objects while a different set of tables represents another class. Before starting the multi-relational data mining process, one should analyse the schema and select the list of tables that represent the kind of objects to be analysed. One of the selected tables is central for the object, and each row in this table should correspond to a single object in the database. In previous multi-relational data mining publications, this central table is called the ‘target table’ in [11] and [5], the ‘primary table’ in [7] and [8], the ‘master relation’ in [13], and the ‘hub table’ in [14]. The parts of the object information stored in the other selected tables can be found by following the associations between the target table and those tables.

The target table in multi-relational data mining terminology has a similar meaning to the fact table in data warehouses, where the database usually consists of a fact table storing all measures, with dimension tables around it. Having one table per dimension leads to a star schema, and by normalising the dimension tables a snowflake schema is obtained [9]. The fact table is related to several dimension tables through a foreign key relationship to each dimension table. That is, each row inserted into the fact table has values in the foreign key columns that correspond to primary key values in the various dimension tables.

In many cases a database may contain several kinds of objects, and multiple target tables can then be selected. However, Relational-ILA can analyse examples of only one kind of object at a time, so target tables should be selected one by one, one for each execution of Relational-ILA.

ILA requires a particular feature of the object under consideration to be used as a dependent attribute for classification. For this reason, one of the columns within the target table should be defined as the target attribute.

The most apparent difference between a standard rule and a multi-relational rule is that, in a multi-relational rule, the attribute names in the conditions are annotated with the name of the relational table to which each attribute belongs. However, this table name detail does not need to be presented to the user; see for example Figure 1.





Figure 1. An example multi-relational rule, from the PharmaDM web site [15].



2. The Algorithm

This section summarises the basic multi-relational ILA algorithm. The input to the algorithm can be listed as follows:

- The list of table names,
- The name and type of each column in each table,
- The list of foreign key relationships between the tables,
- The name of the target table,
- The name of the target attribute.


Given this input, the multi-relational ILA algorithm generates a list of multi-relational rules that cover all objects stored in the given instances of the list of tables. It is assumed that the target table is connected to all other tables through foreign key relations.
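
For concreteness, this input could be gathered into a small metadata object before the algorithm runs. The sketch below is only an illustration of that idea; the class and field names are hypothetical and not part of the implementation described in this report.

import java.util.List;

// Sketch of the algorithm input listed above; all names here are hypothetical.
class MiningInput {
    List<String> tableNames;        // the list of table names
    List<String[]> columns;         // {table, column, type} triples
    List<String[]> foreignKeys;     // {table, fkColumn, referencedTable, pkColumn}
    String targetTable;             // the central table, one row per object
    String targetAttribute;         // the class column used on the rule right-hand side
}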

Relational-ILA starts by selecting the first value of the target attribute, and then attempts to generate rules that will correctly classify objects having this attribute value, in other words objects belonging to the current class. For this purpose, the following steps are repeated until all examples of the current class are covered, i.e. correctly classified. Then the same steps are repeated for the other class values.

2.1 Candidate generation

The first step in the ILA algorithm is to generate the hypotheses; in ILA terminology, ‘hypotheses’ means a set of potentially interesting rules. Relational-ILA first attempts to cover all the class examples by finding hypotheses with only one condition. After all possible size-one hypotheses have been exploited, ILA starts to refine the set of hypotheses by adding further conditions one at a time. The refinement process continues until either all examples are covered or the maximum size for rules is reached.

The following is the current version of the function used to build the initial hypotheses.




Vector buildHypothesis(String classValue) {
    Vector hypothesis = new Vector();
    For each table {
        For each column in the selected table {
            If (column is not the target attribute
                && column is not a primary key
                && column is not a foreign key) {
                Check whether the table is the target table
                Check whether the column is numeric
                Select the proper SQL template and generate SQL
                resultSet = execute generated SQL
                hypothesis += generate hypothesis using resultSet
            }
        }
    }
    return the hypotheses set
}


For the general solution, we need to change this function so that it does not search the table space by simply iterating over the table list, but instead searches by following the relations between tables. We therefore need a table name as a second argument, and the loop that iterates over the table list is replaced by a loop that iterates over the foreign key relations from/to the current table. The general solution is then as follows. When the function is first executed, it should be given the name of the target table.


Vector buildHypothesis(String classValue, String tableName) {
    Vector hypothesis = new Vector();
    For each column in the selected table {
        If (column is not the target attribute
            && column is not a primary key
            && column is not a foreign key) {
            Check whether the table is the target table
            Check whether the column is numeric
            Select the proper SQL template and generate SQL
            resultSet = execute generated SQL
            hypothesis += generate hypothesis using resultSet
        }
    }
    For each table linked by a foreign key relation {
        call buildHypothesis(classValue, linkedTableName)
    }
    return the hypotheses set
}


2.1.1 Candidates in the target table

A list of hypotheses is generated by first searching the target table using the following SQL template. The template is applied for each column other than the class and the primary key column.



Select attr, count(attr) from targetTable, covered
where targetAttr = currentClass
and covered.id = targetTable.pk
and covered.mark = 0
group by attr;

Here,

- attr is one of the column names in the target table, which cannot be the primary key column(s) or the class column, i.e. the target attribute,
- targetTable is the central table, which has one row for each object for which classification rules are being searched,
- targetAttr is the class column, which is used on the right-hand side of the classification rules,
- pk is the name of the primary key column in the target table.
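
For illustration, with a hypothetical target table object (primary key id, target attribute class), a hypothetical attribute column colour, and a current class value 'positive', the template above would expand to something like the query below; the table ‘covered’ is the temporary bookkeeping table described in the next paragraph. This is a sketch only, and none of these names come from the datasets used in this report.

import java.sql.*;

// Minimal sketch of an instantiated candidate generation query for the target
// table, using hypothetical table and column names.
class TargetTableCandidates {
    static ResultSet countsForCurrentClass(Connection con) throws SQLException {
        String sql =
            "SELECT colour, COUNT(colour) " +
            "FROM object, covered " +
            "WHERE class = 'positive' " +        // targetAttr = currentClass
            "AND covered.id = object.id " +      // join with the temporary 'covered' table
            "AND covered.mark = 0 " +            // only not-yet-covered objects
            "GROUP BY colour";
        return con.createStatement().executeQuery(sql);
    }
}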


In the template, the table ‘covered’ is a temporary table generated by Relational-ILA. In order to reduce the complexity of communication between Relational-ILA and the database system, the information about covered objects is stored in the temporary table rather than in an internal data structure. An alternative approach would be to keep this information in the Relational-ILA main memory. This requires the hypothesis generation SQL templates to be extended such that the list of identifiers of the covered elements is included in the SQL. The alternative approach was not selected because it can make the SQL statements too long when the number of marked objects is large.

The temporary table ‘covered’ has two attributes, named ‘id’ and ‘mark’. When ILA starts processing a class, this table is initialised as follows. A new row is inserted for each object belonging to the current class; the ‘id’ field gets the value of the primary key and the ‘mark’ field is set to zero. When a new rule is generated, the ‘mark’ fields of the rows that refer to the covered objects are set to one.
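
A minimal sketch of how this bookkeeping could be done through JDBC is shown below; the statements are illustrative assumptions, and the targetTable, pk and targetAttr names are placeholders, not the exact code used by Relational-ILA.

import java.sql.*;

// Sketch of maintaining the temporary 'covered' table described above.
class CoveredTable {
    // Called when ILA starts processing a class: one row per object of that class,
    // with the primary key in 'id' and 'mark' initialised to zero.
    static void reset(Connection con, String targetTable, String pk,
                      String targetAttr, String classValue) throws SQLException {
        Statement st = con.createStatement();
        st.executeUpdate("DELETE FROM covered");
        st.executeUpdate("INSERT INTO covered (id, mark) " +
                         "SELECT " + pk + ", 0 FROM " + targetTable +
                         " WHERE " + targetAttr + " = '" + classValue + "'");
        st.close();
    }

    // Called after a new rule is generated: mark the objects covered by the rule.
    // coveredIdsSql is assumed to be a query returning the primary keys of those objects.
    static void markCovered(Connection con, String coveredIdsSql) throws SQLException {
        con.createStatement().executeUpdate(
            "UPDATE covered SET mark = 1 WHERE id IN (" + coveredIdsSql + ")");
    }
}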

The above SQL template generates hypotheses together with their frequency values. However, ILA also needs to know the frequency of the hypotheses in the other classes. For this reason, the following SQL template is used to calculate the frequency values of the hypotheses in the other classes.

Select attr, count(attr) from targetTable
where targetAttr <> currentClass
group by attr;


2.1.2 Candidates in the other tables

After generating the list of all possible size-one hypotheses in the target table, the next step is to find the size-one hypotheses using the other tables. The following SQL template is used for this purpose. The SQL queries are repeated for each table and for each column except the foreign key column(s).

Select attr, count(attr) from table, covered, targetTable
where table.fk = targetTable.pk
and targetTable.targetAttr = currentClass
and covered.id = targetTable.pk
and covered.mark = 0
group by attr;

Here,

- table refers to the current table, in which the hypothesis is being searched,
- fk is the foreign key column through which the current table is connected to the target table.




Similar to the hypotheses in the target table case, we need to calculate the frequency of the hypotheses in the other classes. The frequency values are calculated using SQL statements derived from the following SQL template.


Select attr, count(attr) from table, targetTable
where table.fk = targetTable.pk
and targetTable.targetAttr <> currentClass
group by attr;


Each time one of the above queries is executed, ILA uses the returned result set to build an internal list of hypotheses, represented by the Hypothesis class, which has the following attributes:

- The table attribute is used to store the table name information.
- The column attribute is used to store the column name information.
- The value attribute refers to the column value for which the hypothesis stands.
- The trueValues attribute is used to store the number of times the hypothesis is true in the not yet covered elements of the training data.
- The falseValues attribute is used to store the number of times the hypothesis is false.


The Hypothesis class also has an attribute called connectedTo, which relates hypotheses to each other so that it is possible to form multi-relational hypotheses. A multi-relational hypothesis is composed of simple hypotheses all connected by the ‘and’ logical operator. Similar to selection graphs [10], multi-relational hypotheses can be translated into SQL in a straightforward manner. For this purpose we have implemented the toSQL method for the Hypothesis class. The method simply traverses the hypotheses connected to each other and prepares the SQL equivalent of the hypothesis object.
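
A minimal sketch of the Hypothesis class described above is given below. The field names follow the list in the previous section; the toSQL body is a simplification that only renders the condition list, so it should be read as an illustration rather than the actual Relational-ILA implementation.

// Illustrative sketch of the Hypothesis class; field names follow the report,
// the method body is a simplified placeholder.
class Hypothesis {
    String table;            // table the condition refers to
    String column;           // column the condition refers to
    String value;            // column value the hypothesis stands for
    int trueValues;          // frequency in the not-yet-covered examples of the class
    int falseValues;         // frequency in the other classes
    Hypothesis connectedTo;  // next simple hypothesis of a multi-relational hypothesis

    // Render the chain of connected hypotheses as an SQL condition list.
    String toSQL() {
        StringBuilder sql = new StringBuilder(table + "." + column + " = '" + value + "'");
        for (Hypothesis h = connectedTo; h != null; h = h.connectedTo) {
            sql.append(" and ").append(h.table).append(".").append(h.column)
               .append(" = '").append(h.value).append("'");
        }
        return sql.toString();
    }
}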

2.2 Hypothesis Evaluation and Rule Generation

At this stage, the list of hypotheses is ready, with frequency values showing how each hypothesis classifies the training data. This is the typical information needed by many rule discovery algorithms for the selection of rules, which means that the results of this study are not bound to the ILA algorithm but can also be used to build multi-relational versions of other rule discovery algorithms.

The rule selection criterion of ILA is simple: select the hypothesis with the maximum number of trueValues, provided that its falseValues is nil. If there is a hypothesis satisfying this criterion, it is used to generate a new rule. Rules in Relational-ILA are represented using the following attributes (a sketch of the selection step is given after the list).



- The hypothesis attribute refers to the hypothesis object from which the rule was derived. The hypothesis object forms the left-hand side of a rule.
- The targetTable attribute is used to store the name of the target table.
- The targetAttr attribute is used to store the name of the target attribute.


In summary, the hypothesis attribute forms the left-hand side of a multi-relational rule, while the targetTable and targetAttr attributes form its right-hand side. Different from a conventional classification rule, a multi-relational rule includes the name of the relational table to which the target attribute belongs.
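
The selection criterion and the rule representation described above could be sketched as follows; this is an illustration only (it assumes the Hypothesis fields listed earlier), not the actual Relational-ILA code.

import java.util.Vector;

// Illustrative sketch: pick the hypothesis with the largest trueValues among
// those whose falseValues is zero, and turn it into a rule.
class RuleSelection {
    static Rule selectRule(Vector hypotheses, String targetTable, String targetAttr) {
        Hypothesis best = null;
        for (int i = 0; i < hypotheses.size(); i++) {
            Hypothesis h = (Hypothesis) hypotheses.get(i);
            if (h.falseValues == 0 && (best == null || h.trueValues > best.trueValues)) {
                best = h;
            }
        }
        return best == null ? null : new Rule(best, targetTable, targetAttr);
    }
}

// Rule representation following the attribute list above.
class Rule {
    Hypothesis hypothesis;   // left-hand side of the rule
    String targetTable;      // right-hand side: table of the target attribute
    String targetAttr;       // right-hand side: target attribute name

    Rule(Hypothesis h, String table, String attr) {
        hypothesis = h;
        targetTable = table;
        targetAttr = attr;
    }
}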

After a new rule has been generated, the objects classified by the new rule are marked as covered in the table ‘covered’, as described in the previous section. Then, in the further steps of the algorithm, the new set of hypotheses is generated only from the objects that have not yet been covered.

The hypothesis generation and rule generation steps are repeated until no valid rule can be found in the current hypothesis set or all objects of the current class are covered by the already generated rules. If no valid rule can be produced but there are objects not covered, then the active set of hypotheses is refined as described in the next section. Before describing the refinement algorithm, we first describe the basic algorithm as follows.


For each class {
    Reset covered examples table (classValue);
    Repeat {
        Vector hypothesis = Build Hypothesis (classValue);
        Rule rule = Select Rules (hypothesis, classValue);
        If no rule was selected {
            int size = 1;
            Refine Search (hypothesis, classValue, size);
            Stop processing for the current class
        }
        Else {
            Update covered examples information(rule);
            If all examples are covered
                Then stop processing for the current class
        }
    }
}


2.3 Refinement of Hypothesis

Refinement of a multi-relational hypothesis means extending the actual description of the hypothesis. It results in a new selection of objects that is a subset of the selection associated with the original multi-relational hypothesis.

The refine search function simply extends the given list of hypotheses, first using the attributes in the target table and then using the attributes in the other tables. After a list of extended hypotheses is generated, the usual rule selection criterion is applied to the extended set. If a rule is generated from the extended hypothesis set, the hypothesis is deleted from the current hypothesis set and the refinement process is repeated. When no more rules can be generated, the refine search function calls itself recursively, providing the last extended set as its input. Each time the refine search function calls itself recursively, it tries to extend all hypotheses by adding one more condition.

The following is the pseudo code description of the refine search function in the system being developed.


RefineSearch(Vector currentHypothesis, String classValue, int size) {
    If size is larger than the maximum size then return;
    Repeat {
        extHypothesis = extendHypothesis(currentHypothesis, classValue);
        Rule rule = selectRules(extHypothesis, classValue);
        If no rule was selected {
            RefineSearch(extHypothesis, classValue, size + 1);
            Stop processing for the current class;
        } Else {
            Update covered examples information(rule);
            If all examples are covered
                Then stop processing for the current class
        }
    }
}



The following is the pseudo code description of the extend hypothesis function, which is used by the refine search function. It is not the current version but the one for the general solution.


Vector extendHypothesis(String classValue, Vector currentHypotheses, String tableName) {
    Vector extendedHypothesis = new Vector();
    For each column in the input table {
        If (column is not the target attribute
            && column is not a primary key
            && column is not a foreign key) {
            For each hypothesis in currentHypotheses {
                Check whether the table is the target table
                Check whether the column is numeric
                Select the proper SQL template
                generate SQL(selected hypothesis)
                Execute SQL
                Append all returned hypotheses to extendedHypothesis
            }
        }
    }
    For each table linked by a foreign key relation {
        call extendHypothesis(classValue, currentHypotheses, linkedTableName)
    }
    return extendedHypothesis
}




2.3.1 Extending candidates using the target table attributes




The following is the SQL statement template to extend a hypothesis by adding a new
condition from the tar
get table.

Select attr, count(attr) from targetTable, covered, [table_list]
where targetAttr = currentClass
and [condition_list]
and covered.id = targetTable.pk
and covered.mark = 0
group by attr;

Here the table_list variable includes the list of tables to which the hypothesis being extended refers. The condition_list variable represents the conditions covered by the hypothesis being extended.
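
For illustration, extending a hypothesis that already refers to a hypothetical linked table shape (condition shape.shape = 'round') with candidate conditions on a column colour of the hypothetical target table object would produce a query of roughly the following form; all names are examples only.

import java.sql.*;

// Illustrative instantiation of the extension template above, using the same
// hypothetical schema as in the earlier examples.
class ExtensionExample {
    static ResultSet extendedCounts(Connection con) throws SQLException {
        String sql =
            "SELECT colour, COUNT(colour) " +
            "FROM object, covered, shape " +       // [table_list] = shape
            "WHERE class = 'positive' " +
            "AND shape.id = object.id " +          // join implied by the hypothesis
            "AND shape.shape = 'round' " +         // [condition_list] of the hypothesis
            "AND covered.id = object.id " +
            "AND covered.mark = 0 " +
            "GROUP BY colour";
        return con.createStatement().executeQuery(sql);
    }
}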

As before, Relational-ILA needs to know the frequency of the extended hypotheses in the classes other than the current class. The following SQL template is used for this purpose.

Select attr, count(attr) from targetTable, [table_list]
where targetAttr <> currentClass
and [condition_list]
group by attr;

2.3.2 Extending candidates using the other tables’ attributes

The following is the SQL statement template used to extend a hypothesis by adding a new condition from one of the tables other than the target table.

Select attr, count(attr) from targetTable, covered, [table_list]
where targetAttr = currentClass
and table.fk = targetTable.pk
and [condition_list]
and covered.id = targetTable.pk
and covered.mark = 0
group by attr;

The following SQL statement template is used to find the frequency of the hypotheses generated by the previous SQL in the classes other than the current class.

Select attr, count(attr) from targetTable, [table_list]
where targetAttr <> currentClass
and table.fk = targetTable.pk
and [condition_list]
group by attr;

3. Discretization


The ILA algorithm can only handle symbolic attributes. For this reason, it needs to discretize numeric attribute values in a pre-processing step. In the data-mining system being developed, the Weka machine-learning library is used for discretizing the numeric attributes [6]. Weka has a number of discretization methods implemented in its DiscretizeFilter class. Given the list of instances and the index of the attribute to be discretized, the different flavours of the calculateCutPoints methods in the DiscretizeFilter class produce the list of cut points according to the selected discretization method. The generated cut points list can be retrieved by the getCutPoints method of the same class.

When the user starts a rule induction process, Relational-ILA checks each table for numeric attributes. A list of Weka instance objects is then generated for the numeric attributes only. ILA then calls the related Weka functions, first to discretize the instances and then to retrieve the cut points. The cut points are stored in a temporary table, named ‘disc’, in the connected database. The table should have the following attributes:

- table_name: name of the table the numeric attribute is from.
- attribute_name: name of the column the numeric attribute is associated with.
- interval_name: the interval name, given by Weka when the cut points are calculated.
- min_val: minimum value for the current interval.
- max_val: maximum value for the current interval.


Ideally, the table_name, attribute_name and interval_name columns should be indexed to reduce the time ILA needs to retrieve this information.
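
A minimal sketch of how the cut points could be written into the ‘disc’ table is shown below. The cutPoints array is assumed to come from Weka (the getCutPoints step described above), and the interval naming scheme is an assumption for illustration, not the actual implementation.

import java.sql.*;

// Sketch of storing discretization cut points in the temporary 'disc' table.
class DiscTable {
    static void storeCutPoints(Connection con, String tableName, String attrName,
                               double[] cutPoints) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "INSERT INTO disc (table_name, attribute_name, interval_name, min_val, max_val) " +
            "VALUES (?, ?, ?, ?, ?)");
        double min = -Double.MAX_VALUE;               // open lower bound of the first interval
        for (int i = 0; i <= cutPoints.length; i++) {
            double max = (i < cutPoints.length) ? cutPoints[i] : Double.MAX_VALUE;
            ps.setString(1, tableName);
            ps.setString(2, attrName);
            ps.setString(3, "interval_" + i);         // hypothetical interval naming
            ps.setDouble(4, min);
            ps.setDouble(5, max);
            ps.executeUpdate();
            min = max;
        }
        ps.close();
    }
}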

In the previous section all the database primitives were given for symbolic attributes. When they are to be used for a numeric attribute, Relational-ILA slightly modifies them. One more join condition is added, using the following template:

attrname = disc.attribute_name and attr > disc.min_val and attr < disc.max_val

The ‘select’ part of the primitives is also changed such that the attribute column is replaced by the three column names interval_name, min_val and max_val. Accordingly, the ‘group by’ part of the queries should have the same replacement.
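
For illustration, applying these modifications to the first candidate generation template of Section 2.1.1 for a hypothetical numeric column width of the target table object would give a query of roughly the following form; all names are examples, and the filter on disc.table_name is an assumption added for clarity rather than part of the template above.

import java.sql.*;

// Illustrative candidate generation query for a numeric attribute, using
// hypothetical table and column names.
class NumericCandidateExample {
    static ResultSet intervalCounts(Connection con) throws SQLException {
        String sql =
            "SELECT disc.interval_name, disc.min_val, disc.max_val, COUNT(disc.interval_name) " +
            "FROM object, covered, disc " +
            "WHERE class = 'positive' " +
            "AND covered.id = object.id " +
            "AND covered.mark = 0 " +
            "AND disc.table_name = 'object' " +              // assumed filter, see lead-in
            "AND disc.attribute_name = 'width' " +           // attrname = disc.attribute_name
            "AND object.width > disc.min_val AND object.width < disc.max_val " +
            "GROUP BY disc.interval_name, disc.min_val, disc.max_val";
        return con.createStatement().executeQuery(sql);
    }
}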

Comparative studies indicate that the entropy-based MDL heuristic outperforms other methods of discretization in most cases [4]. For this reason we selected the entropy-based MDL heuristic as the default method of discretization for Relational-ILA.

4. Missing Attributes


For the missing attribute values problem we plan to implement two well-known solutions. The first solution, described in [14], simply treats a missing attribute value as just another attribute value. The second solution is to replace missing values by the most common value of the attribute for the class. For a numeric attribute, the discretization procedure should first be applied to the non-null numeric attribute values. Then the null attribute values should all be mapped to the ‘missing’ attribute value for the first solution, and to the most common value for the second solution.
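
A sketch of the second solution, finding the most common non-null value of an attribute within the current class so that it can be substituted for the missing values, is given below; the SQL and the helper name are illustrative assumptions, not code from the system.

import java.sql.*;

// Illustrative sketch for the second missing-value solution; table and column
// names are placeholders.
class MissingValues {
    static String mostCommonValue(Connection con, String table, String attr,
                                  String targetAttr, String classValue) throws SQLException {
        ResultSet rs = con.createStatement().executeQuery(
            "SELECT " + attr + ", COUNT(" + attr + ") AS cnt FROM " + table +
            " WHERE " + targetAttr + " = '" + classValue + "'" +
            " AND " + attr + " IS NOT NULL" +
            " GROUP BY " + attr + " ORDER BY cnt DESC");
        return rs.next() ? rs.getString(1) : null;   // most frequent value, if any
    }
}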

5. Experiments


We have made experiments using data available in MS Access and MySQL databases. In the experiments, the following three datasets, well known to machine-learning researchers, were used:

- objects,
- promoter,
- iris.

Both single-table and multi-table versions of the objects dataset were used. With its small size, the objects dataset was especially useful for checking the correctness of the algorithm during the initial development. In the multi-table version of the objects dataset we had a target table with only two attributes: the ‘id’ column, to uniquely identify each object, and the ‘class’ column, which is the target attribute. All other attributes of the objects dataset were stored in separate tables, each table having two columns, one for the associated attribute and the other the ‘id’ column, which is a foreign key that refers to the primary key in the target table. We made experiments using the promoter dataset, in both single-table and multi-table versions, to see whether the developed system behaves correctly when the number of tables, columns and rows is somewhat larger. We experimented with the single-table version of the iris dataset to test the discretization solution.
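
As an illustration of this layout, the multi-table objects schema could be created roughly as follows; the attribute table colour and the column types are assumptions, since the report does not list the actual objects attributes.

import java.sql.*;

// Sketch of the multi-table objects layout described above; names and types
// are illustrative only.
class ObjectsSchemaExample {
    static void create(Connection con) throws SQLException {
        Statement st = con.createStatement();
        // Target table: one row per object, primary key plus the target attribute.
        st.executeUpdate("CREATE TABLE object (id INTEGER PRIMARY KEY, class VARCHAR(20))");
        // One table per remaining attribute, linked back to the target table by 'id'.
        st.executeUpdate("CREATE TABLE colour (id INTEGER REFERENCES object(id), " +
                         "colour VARCHAR(20))");
        st.close();
    }
}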

We decided to use the genes dataset [1] of KDD Cup 2001 for further testing of the implementation of the basic multi-relational ILA algorithm. The dataset was downloaded from the KDD Cup 2001 web site and imported into a Microsoft Access database. In the first experiments, using the Sun odbc-jdbc bridge driver, the program encountered several errors, so we transferred the dataset to a MySQL database.

There are two tables in the genes dataset.

- One relational table (genes_relation) specifies a variety of properties of individual genes or proteins.
- The other relational table (interaction_relations) specifies which genes (or their protein products) interact with which other genes.

The genes_relation table has information about 862 different genes. There can be more than one row for each gene. The attribute gene_id identifies a gene uniquely. The dataset has one numeric attribute and several attributes with missing values.

The current version of Relational-ILA requires that the target table has a primary key. In order to have a target table with a primary key, we normalized the genes_relation table as shown in Figure 2, in a way similar to that described in [5]. The Interaction table, which has two foreign keys, would be a good test bed for the general solution.





Figure 2. Schema of the KDD Cup 2001 genes data after normalization



6. How can it be used in SRS?


The relational rule discovery algorithm described here can be used in SRS either to analyse all objects in a relational database, or to analyse a selected list of objects after a user query (here objects correspond to what we call entries in SRS). Compared to running the algorithm on the whole data, in the second case the discovery process may need a relatively short time, depending on the size of the query result set; the discovered rules would also have more potential to help users understand the query results. In order to have the discovery algorithm run on partial data, we need to extend the conditions of the generated SQL queries with the conditions in the user query.
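
One simple way to do this, sketched below, would be to AND the conditions of the user query onto the WHERE clause of each generated query before executing it; this is only an assumption about how the integration could work, not the SRS or Relational-ILA implementation.

// Illustrative sketch: restrict a generated candidate query to the objects
// selected by the user query, by appending the user's conditions.
class PartialDataExample {
    static String restrict(String generatedSql, String userConditions) {
        // generatedSql is assumed to contain a GROUP BY clause, e.g.
        // "SELECT colour, COUNT(colour) FROM object, covered WHERE ... GROUP BY colour"
        int groupBy = generatedSql.toUpperCase().indexOf("GROUP BY");
        return generatedSql.substring(0, groupBy)
             + "AND (" + userConditions + ") "
             + generatedSql.substring(groupBy);
    }
}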

As a starting point, to execute the discovery algorithm we may have a button on the SRS query results page, such as “hidden relations in the results”.

We have two possibilities for integrating the rule discovery algorithm into SRS. One alternative would be to define it as an SRS application; the other would be to implement the algorithm in SRS native code. The second approach would certainly give better performance in terms of time.

In SRS-Relational, when a library is defined, most of the inputs necessary for the discovery algorithm become defined automatically. The hub table normally corresponds to the target table. When the relational rule discovery process is started, the user only needs to select the target attribute. The algorithm should then produce a list of rules showing the relation between the target attribute and the other attributes of the object. The generated list of rules can be displayed using an SRS view. As shown in the example in Figure 3, we may use metaphors or other means to display rules in natural language as far as possible.




Figure 3. Another multi
-
relational rule example from PharmaDM[
15
]

7. Discussion


Until now we have addressed the case where the target table has a primary key and all other tables have foreign key relations to the target table. However, we have found a general solution and described it in this report. Basically, the solution changes how we search the table space: instead of visiting tables by simply iterating over the list of tables, the general solution starts from the target table and visits all connected tables in a depth-first manner by following the foreign-key relations.

The implementation of the general solution will be the highest-priority task for December. On the GUI side, one of the high-priority tasks will be to implement a GUI component that allows users to define the foreign key relations between the selected tables.

In the later stages, we should investigate the possibility of having the classification attribute in any of the tables. This was done in [5] but no details were given.

It seems that having a “discover hidden relations in your results” feature in SRS could make SRS a more useful tool for researchers.


References

1. Cheng, J., Krogel, M., Sese, J., Hatzis, C., Morishita, S., Hayashi, H., and Page, D., KDD Cup 2001 Report, ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Explorations, vol. 3, issue 2, January 2002.

2. Tolun, M.R. and Abu-Soud, S.M., ILA: An Inductive Learning Algorithm for Rule Extraction, Expert Systems with Applications, 14(3), April 1998, 361-370.

3. Holte, R.C., Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11:63-91, 1993.

4. Kohavi, R. and Sahami, M., Error-based and entropy-based discretization of continuous features, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996.

5. Leiva, H. and Honavar, V., Experiments with MRDTL - A Multi-relational Decision Tree Learning Algorithm, in S. Dzeroski, L. De Raedt, and S. Wrobel, editors, Proceedings of the Workshop on Multi-Relational Data Mining (MRDM-2002), pages 97-112, University of Alberta, Edmonton, Canada, July 2002.

6. Witten, I.H. and Frank, E., Data Mining: Practical machine learning tools with Java implementations, Morgan Kaufmann, San Francisco.

7. Crestana-Jensen, V. and Soparkar, N., Frequent Itemset Counting Across Multiple Tables, PAKDD 2000, 49-61.

8. Crestana, V. and Soparkar, N., Mining decentralized data repositories, Technical Report CSE-TR-385-99, The University of Michigan, EECS Dept., Ann Arbor, USA, 1999.

9. Palpanas, T., Knowledge Discovery in Data Warehouses, SIGMOD Record, 29(3):88-100, 2000.

10. Knobbe, A.J., Blockeel, H., Siebes, A., and Van der Wallen, D.M.G., Multi-Relational Data Mining, in Proceedings of Benelearn'99, 1999.

11. Knobbe, A.J., Siebes, A., and Van der Wallen, D.M.G., Multi-Relational Decision Tree Induction, in Proceedings of PKDD'99, 1999.

12. Knobbe, A.J., Siebes, A., and Marseille, B., Involving Aggregate Functions in Multi-Relational Search, in Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, Springer-Verlag, August 2002.

13. Wrobel, S., An algorithm for multi-relational discovery of subgroups, in Proceedings of PKDD'97, Springer-Verlag, Berlin, New York, 1997.

14. SRS-Relational White Paper, Working with relational databases using SRS, LION bioscience Ltd. http://www.lionbioscience.com/solutions/products/srs

15. PharmaDM is an integrated drug discovery solutions company. http://www.pharmadm.com/company.asp