Data Mining Concepts and Research Trends


Do-Heon LEE

Database Laboratory

Dept. of Computer Science

Chonnam National University

1998. 5. 21.

KISS-SIGDB Tutorial 1998




Table of Contents


Definition and Motivation of Data Mining


Classification of Data Mining Techniques


Mining Association Rules


Attribute Dependencies


Database Summarization


Data Mining Projects


DBMiner/GeoMiner/WebMiner


MineSet


Data Mining and Data Warehousing


References




Definition of Data Mining

Data mining is the nontrivial extraction of

- implicit (beyond what databases and catalogs state explicitly),
- previously unknown (excluding well-known knowledge), and
- potentially useful (usefulness is application-dependent)

information from a large volume (hence the performance perspective) of actual data (which may include missing or erroneous data).

Some counterexamples

- "The 3rd attribute of table 'EMP' is 'SALARY'."
  Explicit information in the DB catalog
- "Most college students have graduated from high school."
  Well-known information, common sense




Motivation of Data Mining Research

- Growing reliance on database systems
- Fast advance of database system technology
- Increasing volume of data stored in databases

Database = operational data collection + useful resource reflecting domain characteristics

Mining databases for useful knowledge that can be exploited in decision making




Comparison with Machine Learning

    Data Mining                        Machine Learning
    ---------------------------------  ------------------------
    Dynamic data                       Static data
    Erroneous data                     Error-free data
    Uncertain data                     Exact data
    Missing data                       No missing data
    Coexistence of irrelevant data     Only relevant data
    Immense size                       Moderate size
    Structured data                    Flat collection of data

Data mining is an actual application of machine learning methodologies.




Classification of Data Mining Techniques



- On knowledge types to be discovered
  - Characterization : generalized description of data characteristics
  - Classification : description of discriminating characteristics
  - Clustering : grouping data having common properties
  - Association : co-occurrence relationships among multiple events
  - Trend analysis : characterizing the evolution trend of temporal data
  - Pattern analysis : finding specified patterns in large DBs

  The types of mining targets keep evolving as new application demands emerge (cf. the evolution of SQL).

- On database types to be mined
  - relational, transactional, object-oriented, temporal, multimedia, etc.

- On techniques adopted
  - statistics, symbolic learning, neural networks, visualization, etc.




Association Rules : Definition and Applications


- QUEST project at IBM Almaden Research Center

- Association rules (among items)
  - Given a collection of transactions, each of which is a set { item-1, ..., item-n }, an association rule has the form

        { item-11, item-12, ..., item-1m } --> { item-21, item-22, ..., item-2k }
                antecedent items                     consequent items

  - The presence of an item (or items) implies the presence of other item(s) in the same transaction.

- Example : in a POS (Point-Of-Sales) data set,

        10/15/13:01  { coke, bread, hamburger }
        10/15/14:21  { coke, hamburger, juice }
        10/15/14:25  { milk, sandwich, juice }
        10/15/15:13  { sandwich, milk, juice, bread }
        10/15/16:31  { hamburger, juice, coke }
        .....

  association rules such as

        { hamburger } --> { coke }
        { sandwich, juice } --> { milk }

  support decision making for shelf layout design, direct mailing, etc.

- Other applications
  - Customer usage patterns in public communication services
  - Fault co-occurrence analysis in complex systems




Association Rules : Usefulness Measures



Two measures for identifying useful association rules

- support : statistical significance
  - the fraction of all transactions that contain every item of the rule (antecedent and consequent items together)
- confidence : rule strength
  - among the transactions containing the antecedent items, the fraction that also contain the consequent items

Example transactions, with the presence (o) or absence (x) of each item:

     #   Transaction                         hamburger   coke   both
     1   { coke, bread, hamburger }              o         o      o
     2   { coke, hamburger, juice }              o         o      o
     3   { milk, sandwich, juice }               x         x      x
     4   { sandwich, milk, juice, bread }        x         x      x
     5   { hamburger, juice, coke }              o         o      o
     6   { coke, bread, hamburger }              o         o      o
     7   { coke, hamburger, juice }              o         o      o
     8   { hamburger, juice }                    o         x      x
     9   { milk, hamburger, sweater }            o         x      x
    10   { coke, milk, juice }                   x         o      x
         Total                                   7         6      5

For the association rule { coke } --> { hamburger },

    support    : 5 out of 10 = 50 %
    confidence : 5 out of 6  = 83 %
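The two measures can be checked mechanically against the ten transactions on this slide. The following Python sketch is only an illustration; the helper names `count`, `support` and `confidence` are assumptions introduced here, not part of the tutorial.

```python
# The ten point-of-sale transactions shown on the slide.
transactions = [
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"milk", "sandwich", "juice"},
    {"sandwich", "milk", "juice", "bread"},
    {"hamburger", "juice", "coke"},
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"hamburger", "juice"},
    {"milk", "hamburger", "sweater"},
    {"coke", "milk", "juice"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Fraction of all transactions containing antecedent and consequent."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, fraction that also hold the consequent."""
    return (count(antecedent | consequent, transactions)
            / count(antecedent, transactions))

s = support({"coke"}, {"hamburger"}, transactions)
c = confidence({"coke"}, {"hamburger"}, transactions)
print(f"support = {s:.0%}, confidence = {c:.0%}")  # support = 50%, confidence = 83%
```

Note that confidence is not symmetric: the reverse rule { hamburger } --> { coke } has the same support but a confidence of 5 out of 7.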




Association Rules : Mining Procedures

Example transactions (12 in total)

    { coke, bread, hamburger }
    { coke, hamburger, juice }
    { milk, sandwich, juice }
    { sandwich, milk, juice, bread }
    { hamburger, juice, coke }
    { coke, bread, hamburger }
    { coke, hamburger, juice }
    { hamburger, juice }
    { milk, hamburger, sweater }
    { coke, milk, juice }
    { coke, juice }
    { coke, sweater }

The first phase : finding frequent item-sets (high support)

- the threshold value for support is given as 40 %; with 12 transactions, an item-set must occur at least 5 times

      {coke} : 8
      {bread} : 3
      {hamburger} : 7
      {juice} : 8
      {milk} : 4
      {sandwich} : 2
      {sweater} : 2

      {coke, hamburger} : 5
      {coke, juice} : 5
      {hamburger, juice} : 4

      {coke, hamburger, juice} : 3

The second phase : finding strong associations (high confidence)

- the threshold value for confidence is given as 70 %

      {coke} --> {hamburger} : 5 out of 8 = 62.5 %
      {hamburger} --> {coke} : 5 out of 7 = 71 %
      {coke} --> {juice} : 5 out of 8 = 62.5 %
      {juice} --> {coke} : 5 out of 8 = 62.5 %

Algorithms

- Blind search : 2^N candidates
- AIS : basic algorithm
- SETM : sort-merge algorithm
- Apriori : tree-structured candidate sets
- AprioriTid : temporary table generation
- Partition : partitioned mining
- DHP : hash-based algorithm
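The two phases can be sketched in a few lines of Python. This is a simplified level-wise search in the spirit of Apriori (join frequent k-item-sets, prune candidates that have an infrequent subset), not a faithful reimplementation of any of the published algorithms listed above; the function names are assumptions made for illustration.

```python
from itertools import combinations

# The twelve transactions from the slide.
transactions = [
    {"coke", "bread", "hamburger"}, {"coke", "hamburger", "juice"},
    {"milk", "sandwich", "juice"}, {"sandwich", "milk", "juice", "bread"},
    {"hamburger", "juice", "coke"}, {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"}, {"hamburger", "juice"},
    {"milk", "hamburger", "sweater"}, {"coke", "milk", "juice"},
    {"coke", "juice"}, {"coke", "sweater"},
]

def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise search; returns {itemset: occurrence count}."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, k = {}, 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: m for c, m in counts.items() if m / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def strong_rules(frequent, min_confidence):
    """Phase 2: split each frequent item-set into antecedent --> consequent."""
    rules = []
    for itemset, m in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = m / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

frequent = frequent_itemsets(transactions, min_support=0.4)
rules = strong_rules(frequent, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(set(lhs), "-->", set(rhs), f"{conf:.0%}")
```

With the thresholds from the slide, the sketch keeps {coke}, {hamburger}, {juice}, {coke, hamburger} and {coke, juice} as frequent, never counts {coke, hamburger, juice} because {hamburger, juice} is infrequent, and reports {hamburger} --> {coke} as the only strong rule.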




Sequential Patterns

Customer transactions

    CID   Time       Items
    1     95/06/25   30
    1     95/06/30   90
    2     95/06/10   10,20
    2     95/06/15   30
    2     95/06/20   40,60,70
    3     95/06/25   30,50,70
    4     95/06/25   30
    4     95/06/30   40,70
    4     95/07/25   90
    5     95/06/12   90

Customer sequences

    CID   Sequence
    1     <(30) (90)>
    2     <(10,20) (30) (40,60,70)>
    3     <(30,50,70)>
    4     <(30) (40,70) (90)>
    5     <(90)>

Maximal sequential patterns with support > 25%

    <(30) (90)>
    <(30) (40,70)>
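The support of a sequential pattern is the fraction of customers whose sequence contains it. The Python sketch below rebuilds the customer sequences from the transaction table and checks the two patterns above; it only tests containment and support, it does not enumerate patterns, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# (CID, time, items) rows from the slide.
rows = [
    (1, "95/06/25", {30}), (1, "95/06/30", {90}),
    (2, "95/06/10", {10, 20}), (2, "95/06/15", {30}), (2, "95/06/20", {40, 60, 70}),
    (3, "95/06/25", {30, 50, 70}),
    (4, "95/06/25", {30}), (4, "95/06/30", {40, 70}), (4, "95/07/25", {90}),
    (5, "95/06/12", {90}),
]

def customer_sequences(rows):
    """Group rows by customer and order each customer's item-sets by time."""
    seqs = defaultdict(list)
    for cid, _time, items in sorted(rows, key=lambda r: (r[0], r[1])):
        seqs[cid].append(items)
    return dict(seqs)

def contains(sequence, pattern):
    """True if the pattern's item-sets occur, in order, as subsets of
    distinct elements of the customer sequence (greedy left-to-right match)."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(pattern, seqs):
    """Fraction of customers whose sequence contains the pattern."""
    return sum(contains(s, pattern) for s in seqs.values()) / len(seqs)

seqs = customer_sequences(rows)
print(support([{30}, {90}], seqs))      # customers 1 and 4
print(support([{30}, {40, 70}], seqs))  # customers 2 and 4
```

Both patterns are supported by 2 of the 5 customers (40 %), which clears the 25 % threshold. The pattern <(30)> has higher support but is not maximal, because it is contained in both patterns above.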




Telecommunication Network Diagnosis

[Figure: a network of nodes A through I, with alarms (C, 123) and (E, 256) followed by alarm (F, 678); time = 30 min]

“Co-occurrence of alarm 123 in C and alarm 256 in E implies alarm 678 in F within 30 minutes.”




Attribute Dependencies


- Given attributes A1, A2, ..., Am

      f(A1, A2, ..., Am, a set of constants) ==> g(A1, A2, ..., Am, a set of constants)

  where f and g are arbitrary (boolean) functions.

  e.g. if (A1 = c1 and A2 = c2) then (A3 = c3 and A4 = c4)

- In this general form the problem is intractable, because the number of possible functions and constants is potentially infinite.

- Thus, several constraints are imposed to make it tractable in actual domains.

  e.g. LHS is a conjunction of simple predicates and RHS is an assertion of classification --> Classification problem




Classification


- Symbolic classification rules (e.g. decision trees)
  - The most well-studied area among inductive learning problems

- Neural network approach
  - Weight values on edges --> symbolic description of classification rules
  - Still far from a practical solution, because learning time is too costly
  - Suitable for problems where learning is done once and the result is run many times

Example table

    A1   A2   C
    a    d    1
    a    e    2
    b    f    3
    b    g    3

Corresponding decision tree

    A1
     |-- a --> A2
     |          |-- d --> 1
     |          |-- e --> 2
     |-- b --> 3
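The tree above can be written down as a nested structure and read back as classification rules whose left-hand side is a conjunction of simple predicates. The following Python sketch encodes the slide's tree by hand (it does not induce the tree from data); the dictionary layout and the function names are assumptions made for illustration.

```python
# Example table from the slide: (A1, A2, class C).
rows = [("a", "d", 1), ("a", "e", 2), ("b", "f", 3), ("b", "g", 3)]

# The slide's decision tree. An internal node tests one attribute;
# a leaf is simply the class label.
tree = {
    "attr": "A1",
    "branches": {
        "a": {"attr": "A2", "branches": {"d": 1, "e": 2}},
        "b": 3,
    },
}

def classify(tree, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][record[node["attr"]]]
    return node

def to_rules(node, conditions=()):
    """Flatten the tree into (conjunction of attribute tests, class) pairs."""
    if not isinstance(node, dict):
        return [(conditions, node)]
    rules = []
    for value, child in node["branches"].items():
        rules += to_rules(child, conditions + ((node["attr"], value),))
    return rules

for a1, a2, c in rows:
    assert classify(tree, {"A1": a1, "A2": a2}) == c

for conditions, label in to_rules(tree):
    lhs = " and ".join(f"{attr} = {value}" for attr, value in conditions)
    print(f"if {lhs} then C = {label}")
```

Each root-to-leaf path yields one rule, for example "if A1 = b then C = 3", which covers both rows with A1 = b without testing A2.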




Bottom-Up Summarization

- DBLEARN project at J. Han's Lab., Simon Fraser Univ., Canada

Initial table

    Name   Major        Birth_Place   GPA   vote
    Lee    music        Kwangju       3.4   1
    Kim    physics      Sunchon       3.9   1
    Yoon   math         Mokpo         3.7   1
    Park   painting     Yeosu         3.4   1
    Choi   computing    Taegu         3.8   1
    Hong   statistics   Suwon         3.2   1

After attribute-oriented substitution

    Major     Birth_Place   GPA         vote
    art       Chunnam       good        1
    science   Chunnam       excellent   1
    science   Chunnam       excellent   1
    art       Chunnam       good        1
    science   Kyungbuk      excellent   1
    science   Kyonggi       good        1

After merging redundant records

    Major     Birth_Place   GPA         vote
    art       Chunnam       good        2
    science   Chunnam       excellent   2
    science   Kyungbuk      excellent   1
    science   Kyonggi       good        1

Domain Knowledge (concept hierarchies)

    Major       : art = { music, painting, ... }
                  science = { physics, math, computing, ... }
    Birth_Place : Korea = { Chunnam, Kyungbuk, ... }, Foreign
                  Chunnam = { Kwangju, Sunchon, ... }
    GPA         : excellent = [4.0 - 3.5]
                  good = (3.5, 3.0]
                  bad = (3.0, 0.0]
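The two steps named on this slide, attribute-oriented substitution followed by merging of redundant records, can be reproduced on the student table with a short Python sketch. This is an illustration under stated assumptions, not the DBLEARN implementation: the leaf-to-parent mappings are read off the slide's generalized table (for example Taegu under Kyungbuk and statistics under science), the Name attribute is simply dropped, and all identifiers are made up here.

```python
from collections import Counter

# Initial table from the slide: (Name, Major, Birth_Place, GPA).
students = [
    ("Lee", "music", "Kwangju", 3.4),
    ("Kim", "physics", "Sunchon", 3.9),
    ("Yoon", "math", "Mokpo", 3.7),
    ("Park", "painting", "Yeosu", 3.4),
    ("Choi", "computing", "Taegu", 3.8),
    ("Hong", "statistics", "Suwon", 3.2),
]

# One generalization step per attribute, as implied by the slide's tables.
major_parent = {"music": "art", "painting": "art", "physics": "science",
                "math": "science", "computing": "science", "statistics": "science"}
place_parent = {"Kwangju": "Chunnam", "Sunchon": "Chunnam", "Mokpo": "Chunnam",
                "Yeosu": "Chunnam", "Taegu": "Kyungbuk", "Suwon": "Kyonggi"}

def gpa_label(gpa):
    """GPA ranges from the slide: [4.0-3.5] excellent, (3.5,3.0] good, else bad."""
    if gpa >= 3.5:
        return "excellent"
    if gpa >= 3.0:
        return "good"
    return "bad"

def summarize(students):
    """Substitute each value by its higher-level concept, drop Name,
    and merge identical generalized records by adding up their votes."""
    votes = Counter()
    for _name, major, place, gpa in students:
        votes[(major_parent[major], place_parent[place], gpa_label(gpa))] += 1
    return votes

summary = summarize(students)
for (major, place, gpa), vote in summary.items():
    print(f"{major:8} {place:9} {gpa:10} {vote}")
```

The merged result has four records with votes 2, 2, 1 and 1, matching the final table on the slide.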




Top-Down Summarization

- CLEVER system at DB Lab., KAIST

Table to be summarized

    PROGRAM   USER
    vi        John
    emacs     Tom
    word      Lee
    gcc       Park
    tetris    Yang

Fuzzy set hierarchies (user's selection)

    PROG_01 : w, engineering, editor, game, ...
    USR_01  : w, developer, marketer, programmer, ...

Hypotheses < PROGRAM, USER > and their values, with t_SD = 0.4

    < w, w >                       1.000
    < engineering, w >             0.833
    < w, developer >               0.800
    < w, marketer >                0.411
    < w, programmer >              0.589
    < engineering, developer >     0.700
    < editor, developer >          0.489
    < engineering, programmer >    0.522
    < editor, programmer >         0.456




Data Mining Projects


- QUEST : IBM Almaden Research Center
  - a common set of operations in a unified framework
  - classification, association, etc.

- KDW (Knowledge Discovery Workbench) : GTE Laboratory Inc.
  - focus on architectural issues of data mining systems
  - clustering, classification, summarization, deviation detection, etc.

- IMACS (Intelligent Market Analysis and Classification System) : AT&T Bell Lab
  - focus on human interaction in data mining
  - data archaeology

- CoverStory : Information Resources Incorporated
  - summarization of supermarket scanner data

- DBMiner/GeoMiner/WebMiner : Simon Fraser Univ.

- MineSet : Silicon Graphics Inc.




DBMiner


- DBMiner Research Group at Simon Fraser Univ., Canada

- DMQL : an SQL-like Data Mining Query Language

- Data structures : generalized relations, multi-dimensional data cube

[Figure: system architecture showing Graphical User Interface, Discovery Modules, SQL Server, Data (DB), and Concept Hierarchy]




DBMiner(cont’d)


- Functions
  - Characterizer : the general characteristics of a set of user-specified data
    - attribute-oriented induction
    - e.g. Cold(x) => headache(x) and cough(x)
    - e.g. Fever(x) => headache(x) and low-leucocyte-count(x)
  - Discriminator : features that distinguish the target class from contrasting classes
    - e.g. Low-leucocyte-count(x) => Fever(x)
  - Classifier : generalization-based decision tree induction
  - Association rule finder : multi-level association rules
  - Meta-rule guided miner : confines the search to specific forms of rules
    - e.g. Meta-rule : major(s : student, x) and p(s, y) => GPA(s, z)
  - Predictor : predicts the possible values for missing data, after factor analysis
    - e.g. An employee's potential salary can be predicted from the salary distribution of similar employees in the company
  - Data evolution evaluator
    - e.g. Growth patterns of certain stocks
  - Deviation evaluator
    - e.g. A set of stocks whose growth patterns deviate from the major trend




GeoMiner/WebMiner


- GeoMiner with GMQL (Geo-Mining Query Language)
  - An extension of DBMiner for spatial data mining
  - Modules
    - Geo-characterizer
      - e.g. Given spatial hierarchies of Western Canada, discover general weather patterns according to region partitions
    - Geo-comparator (= discriminator)
      - e.g. The differences in weather patterns between British Columbia and Alberta
    - Geo-associator

- WebMiner with WebQL
  - It finds resources on the Internet related to a specific topic
    - e.g. What is the most popular document about data mining in terms of number of accesses?
  - cf. Web traversal pattern discovery (by Chen, Park and Yu, 1996)
    - e.g. If a user visits h1 => h2 => h5, then he/she is apt to visit h8 => h11




MineSet


- Developed by Silicon Graphics Inc.

- Combines intelligent data mining algorithms and multidimensional data visualization techniques
  - Association rule generator/rule visualizer
  - Classification tools
    - MLC++ based classification modules
      - Decision tree inducer
      - Option tree inducer
      - Evidence classifier inducer
      - Decision table inducer
    - Tree/evidence visualizers
  - Map visualizer : spatial data analysis
  - Clustering module
  - Regression tree inducer : predicts unknown values




Rule Visualizer of MineSet

Cited from the Silicon Graphics Inc. Home Pages




Decision Tree Visualizer of MineSet

Cited from the Silicon Graphics Inc. Home Pages




Map Visualizer of MineSet

Cited from the Silicon Graphics Inc. Home Pages




Two Perspectives on Data Mining


- AI practitioner’s perspective
  - Extensions of machine learning technology
  - Focus on sophisticated measures and theories rather than efficiency improvement

- DB practitioner’s perspective
  - Application of machine learning paradigms to massive and actual data management problems

- A suggestion as a DB practitioner
  - First step : Blindly search for possible knowledge ==> “Data Mining”
    - There is no guru who could guide the search directions.
    - No heuristics are available : better to ignore heuristics when the patterns are unknown.
  - Second step : Validate the discovered rough knowledge in detail




Data Mining and Data Warehousing

[Figure: data warehouse architecture with the following elements]

- Operational Data (process-oriented)
  - Relational DB-1, Relational DB-2
  - Object-oriented DB-1, Object-oriented DB-2
  - Legacy DB-1
  - File system-1

- Data warehouse builder/manager, with Metadata

- Data for Decision Support (subject-oriented)
  - Data warehouse
  - Data mart-1, Data mart-2, Data mart-3, Data mart-4, Data mart-5

- Data Mining




Research Issues


- Looking for useful mining targets
  - Associations, characteristic rules, classification, clustering
  - Functional dependency, regression trees
  - Similar sequential patterns/time series

- Variations of association rules
  - Alternatives for simple support and confidence measures
  - Generalized/multilevel association rules
  - Performance enhancement for association rule discovery

- System implementation issues
  - Identify core functions (e.g. a tightly-coupled architecture [MEO98], MLC++)
  - Elicit common DBMS requirements for various data mining tasks
  - Integration with relational databases and/or multi-dimensional databases
  - Data/knowledge visualization
  - Extended query language or extended CLI : e.g. DMQL

- And so on ...




References

[Data Mining General]

[FRW91] W. J. Frawley, G. Piatetsky-Shapiro and C. J. Matheus, “Knowledge Discovery in Databases : An Overview”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. J. Frawley Ed., AAAI Press, 1991, pp. 1-27

[AGR93a] R. Agrawal, T. Imielinski and A. Swami, “Database Mining : A Performance Perspective”, IEEE Trans. on Knowledge and Data Engineering, Vol. 5, No. 6, 1993, pp. 914-925

[MAT93] C. J. Matheus, P. Chan and G. Piatetsky-Shapiro, “Systems for Knowledge Discovery in Databases”, IEEE TKDE, Vol. 5, No. 6, 1993, pp. 903-913

[HOL94a] M. Holsheimer and A. Siebes, “Data Mining : The Search for Knowledge in Databases”, Report CS-R9406, ISSN 0169-118X, CWI (Centrum voor Wiskunde en Informatica), The Netherlands, 1994

[Association Rules]

[AGR93b] R. Agrawal, T. Imielinski and A. Swami, “Mining Associations between Sets of Items in Massive Databases”, Proc. ACM SIGMOD, Washington D.C., May 1993

[AGR94] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases”, Proc. VLDB, Santiago, Sep. 1994, pp. 487-499

[KLE94] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen and A. Verkamo, “Finding Interesting Rules from Large Sets of Discovered Association Rules”, Proc. CIKM, Gaithersburg, Nov. 1994, pp. 401-407




References(Cont’d)


[HOT95] M. Houtsma and A. Swami, “Set-Oriented Mining for Association Rules in Relational Databases”, Proc. ICDE, Taipei, Mar. 1995, pp. 25-33

[SAV95] A. Savasere, E. Omiecinski and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 432-444

[SRI95] R. Srikant and R. Agrawal, “Mining Generalized Association Rules”, Proc. VLDB, Zurich, Sep. 1995, pp. 407-419

[HAN95] J. Han and Y. Fu, “Discovery of Multiple-level Association Rules from Large Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 420-431

[PAR95a] J.-S. Park, M.-S. Chen and P. S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules”, Proc. SIGMOD, 1995, pp. 175-186

[PAR95b] J.-S. Park, M.-S. Chen and P. S. Yu, “Efficient Parallel Data Mining for Association Rules”, Proc. CIKM, 1995

[SRI96] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 1-12

[FUK96] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, “Data Mining Using Two-Dimensional Optimized Association Rules : Scheme, Algorithms, and Visualization”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 13-23

[CHE96] D. Cheung, J. Han, V. Ng and C. Wong, “Maintenance of Discovered Association Rules in Large Databases : An Incremental Updating Technique”, Proc. ICDE, New Orleans, Feb. 1996, pp. 106-114




References(Cont’d)


[BRI97a] S. Brin, R. Motwani, J. Ullman and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data”, Proc. SIGMOD, 1997, pp. 255-264

[BRI97b] S. Brin, R. Motwani and C. Silverstein, “Beyond Market Baskets : Generalizing Association Rules to Correlations”, Proc. SIGMOD, 1997, pp. 265-276

[HAN97] E. H. Han, G. Karypis and V. Kumar, “Scalable Parallel Data Mining for Association Rules”, Proc. SIGMOD, 1997, pp. 277-288

[AGG98] C. C. Aggarwal and P. S. Yu, “Online Generation of Association Rules”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 402-411

[OZD98] B. Özden, S. Ramaswamy and A. Silberschatz, “Cyclic Association Rules”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 412-423

[LIN98] J.-L. Lin and M. H. Dunham, “Mining Association Rules : Anti-Skew Algorithms”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 486-493

[SAV98] A. Savasere, E. Omiecinski and S. Navathe, “Mining for Strong Negative Associations in a Large Database of Customer Transactions”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 494-502

[RAS98] R. Rastogi and K. Shim, “Mining Optimized Association Rules with Categorical and Numeric Attributes”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 503-513





References(Cont’d)

[Characterization]

[HAN91] Y. Cai, N. Cercone and J. Han, “Attribute-Oriented Induction in Relational Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 213-228

[HAN92a] J. Han, Y. Cai and N. Cercone, “Knowledge Discovery in Databases : An Attribute-Oriented Approach”, Proc. VLDB, 1992, pp. 547-559

[HAN92b] J. Han, Y. Cai, N. Cercone and Y. Huang, “DBLEARN : A Knowledge Discovery System for Large Databases”, Proc. CIKM, 1992, pp. 473-481

[HAN93] J. Han, Y. Cai and N. Cercone, “Data-Driven Discovery of Quantitative Rules in Relational Databases”, IEEE TKDE, Vol. 5, No. 1, Feb. 1993, pp. 29-40

[LEE94] D.-H. Lee and M. H. Kim, “Discovering Database Summaries through Refinements of Fuzzy Hypotheses”, Proc. ICDE, Houston, Feb. 1994, pp. 223-230

[LEE97] D.-H. Lee and M. H. Kim, “Database Summarization Using Fuzzy ISA Hierarchies”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 27, No. 4, August 1997, pp. 671-680




References(Cont’d)

[Sequential Patterns]

[AGR93c] R. Agrawal, C. Faloutsos and A. Swami, “Efficient Similarity Search in Sequence Databases”, Proc. the 4th Int’l Conf. on Foundations of Data Organization and Algorithms, Chicago, Oct. 1993

[FAL94] C. Faloutsos, M. Ranganathan and Y. Manolopoulos, “Fast Subsequence Matching in Time-Series Databases”, Proc. SIGMOD, Minneapolis, May 1994, pp. 419-429

[AGR95a] R. Agrawal and R. Srikant, “Mining Sequential Patterns”, Proc. ICDE, Taipei, Mar. 1995, pp. 3-14

[AGR95b] R. Agrawal, K. Lin, H. Sawhney and K. Shim, “Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 490-501

[AGR95c] R. Agrawal, G. Psaila, E. Wimmers and M. Zait, “Querying Shapes of Histories”, Proc. VLDB, Zurich, Sep. 1995, pp. 502-514

[HAT96] K. Hatonen, M. Klemettinen, H. Mannila, P. Ronkainen and H. Toivonen, “Knowledge Discovery from Telecommunication Network Alarm Databases”, Proc. ICDE, New Orleans, Feb. 1996, pp. 115-123

[SHA96] H. Shatkay and S. Zdonik, “Approximate Queries and Representations for Large Data Sequences”, Proc. ICDE, New Orleans, Feb. 1996, pp. 536-545

[LI96] C. Li, P. Yu and V. Castelli, “HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences”, Proc. ICDE, New Orleans, Feb. 1996, pp. 546-555

[CHE96] M.-S. Chen, J. S. Park and P. S. Yu, “Data Mining for Path Traversal Patterns in a Web Environment”, Proc. ICDCS, 1997, pp. 385-392

[SHA97] J. Shafer and R. Agrawal, “Parallel Algorithms for High-Dimensional Proximity Joins”, Proc. VLDB, 1997, pp. 176-185




References(Cont’d)

[Classification/Clustering]

[QUI89] J. Quinlan and R. Rivest, “Inferring Decision Trees Using the Minimum Description Length Principle”, Information and Computation, Vol. 80, 1989, pp. 227-248

[YAS91] R. Yasdi, “Learning Classification Rules from Database in the Context of Knowledge Acquisition and Representation”, IEEE TKDE, Vol. 3, No. 3, Sep. 1991, pp. 293-306

[CHA91] K. Chan and A. Wong, “A Statistical Technique for Extracting Classificatory Knowledge from Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 107-123

[UTH91] R. Uthurusamy, U. Fayyad and S. Spangler, “Learning Useful Rules from Inconclusive Data”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 141-157

[ZIA91] W. Ziarko, “The Discovery, Analysis and Representation of Data Dependencies in Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 195-209

[PIA91] G. Piatetsky-Shapiro, “Discovery, Analysis and Presentation of Strong Rules”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 229-248

[MAN91] M. Manago and Y. Kodratoff, “Induction of Decision Trees from Complex Structured Data”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 289-306




References(Cont’d)


[SMY92] P. Smyth and R. Goodman, “An Information Theoretic Approach to Rule Induction from Databases”, IEEE TKDE, Vol. 4, No. 4, Aug. 1992, pp. 301-316

[WAN92] L. Wang and J. Mendel, “Generating Fuzzy Rules by Learning from Examples”, IEEE TSMC, Vol. 22, No. 6, Nov. 1992, pp. 1414-1427

[AGR92] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer and A. Swami, “An Interval Classifier for Database Mining Applications”, Proc. VLDB, Vancouver, Aug. 1992, pp. 207-216

[LU95] H. Lu, R. Setiono and H. Liu, “NeuroRule : A Connectionist Approach to Data Mining”, Proc. VLDB, Zurich, Sep. 1995, pp. 478-489

[HON91] J. Hong and C. Mao, “Incremental Discovery of Rules and Structure by Hierarchical and Parallel Clustering”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 177-194

[NG94] R. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining”, Proc. VLDB, 1994, pp. 144-155

[XU98] X. Xu, M. Ester, H.-P. Kriegel and J. Sander, “A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 324-333





References(Cont’d)

[System Implementations]

[SEL96] P. Selfridge, D. Srivastava and L. Wilson, “IDEA : Interactive Data Exploration and Analysis”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 24-34

[MEO98] R. Meo, G. Psaila and S. Ceri, “A Tightly-Coupled Architecture for Data Mining”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 316-323

[HAN96] J. Han et al., “DBMiner : A System for Mining Knowledge in Large Relational Databases”, Proc. KDD, 1996

[HAN97] J. Han et al., “GEOMiner : A System Prototype for Spatial Data Mining”, Proc. SIGMOD, 1997

[HAN98] “WebMiner : A Resource and Knowledge Discovery System for the Internet”, http://db.cs.sfu.ca/WebMiner/

[KOH96] R. Kohavi et al., “Data Mining Using MLC++ : A Machine Learning Library in C++”, Proc. Tools with AI, 1996, pp. 234-245

[HAL98] C. Hall ed., “MineSet 2.0 for Data Mining and Multidimensional Data Analysis”, http://www.cgi.com/Products/software/MineSet/DMStrategies/index.html