Dr. Christos Nikolopoulos
Office: BR 197
(309) 677-2456
chris@bradley.edu
Class web site: http://hilltop.bradley.edu/~chris/ and on Sakai
Office hours: T and TH 11:45-12:30 and by appointment
CS 563
FALL Term
KNOWLEDGE DISCOVERY AND DATA MINING
Required Textbook:
Witten I. and Frank E., DATA MINING: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, 2008.
Optional References:
1. Cios et al., DATA MINING: A Knowledge Discovery Approach, Springer, 2007.
2. Chris Nikolopoulos, Expert Systems: An Introduction to First, Second Generation and Hybrid Knowledge Based Systems, Marcel Dekker, 1997.
Description:
Advances in Knowledge Discovery and Data Mining bring together the latest research in the areas of statistics, databases, machine learning, and artificial intelligence, which together contribute to the rapidly growing field of knowledge discovery and data mining. Topics covered include fundamental issues, knowledge representation, cleaning and preprocessing of data sets, classification and clustering, machine learning algorithms, comparing machine learning algorithms and models, and evaluating performance. The complementary topic of Data Warehousing and OLAP is covered in the class CS 572, Advanced Databases.
Learning Outcomes
Upon successful completion of the course students will be able to:
- approach data mining as a process.
- understand the mathematical and statistical foundations of the machine learning algorithms involved and be able to provide a clear and concise description of testing and benchmarking experiments.
- possess a toolbox of techniques that can be immediately applied to real-world knowledge discovery problems, for clustering, estimation, prediction, and classification, including algorithms for k-means clustering, classification and regression trees, the C4.5 algorithm, logistic regression, k-nearest neighbor, multiple regression, and neural networks.
- be proficient in at least one leading data mining software package, for example WEKA.
- reason as to which method needs to be applied in a given situation depending on the specific application domain and additional requirements.
- evaluate and compare different models using various statistical techniques, for example Bernoulli trials, and statistical measures such as the Kappa statistic, confusion matrix, RMS error, etc.
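To make the last outcome concrete, here is a minimal Python sketch of three of the measures named above: the confusion matrix, the Kappa statistic, and RMS error. It is only an illustration and not part of the course materials; the function names and the tiny yes/no example are made up for this sketch. In the course itself WEKA reports these figures for you.

```python
import math

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def kappa(matrix):
    """Cohen's kappa: agreement beyond what chance alone would give.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)

def rms_error(actual, predicted):
    """Root mean squared error for numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: four instances, two classes.
actual = ["yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "no"]
m = confusion_matrix(actual, predicted, ["yes", "no"])
print(m)            # [[1, 1], [0, 2]]
print(accuracy(m))  # 0.75
print(kappa(m))     # 0.5
```

Here three of four instances are classified correctly (accuracy 0.75), but chance agreement alone would already give 0.5, so Kappa is only 0.5. This is why accuracy by itself can overstate how good a model is.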
Time Schedule:
The data set to be analyzed for the final project must be chosen by November 1st. The final project written report has to be written in a research paper format (abstract, introduction, main sections, conclusions, bibliography in PLA format) and is due back by email by December 13th, 11:00 a.m.
The table below gives the reading assignments from the books and online sources. The dates are tentative and may be adjusted.
# Date | Topics for online discussion | Readings/Assignments
Day 1 | What is DM/KD? Overview of what we are going to cover. |
Day 2 | Review of Statistics | Notes emailed to you
Day 3 | Review of Statistics and Introduction to Machine Learning tools and techniques | Witten Part I, Chapter 1, pp. 4-39
Day 4 | Introduction to Machine Learning tools and techniques | Witten Part I, Chapter 1, pp. 4-39
Day 5 | Input: Concepts, instances and attributes | Witten Part I, Chapter 2, pp. 41-60
Day 6 | Output: Knowledge representation | Witten Part I, Chapter 3, pp. 61-82
Day 7 | Output: Knowledge representation | Witten Part I, Chapter 3, pp. 61-82
Day 8 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 83-111, sections 4.1-4.4; watch video 1
Day 9 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 83-111, sections 4.1-4.4; watch video 1
Day 10 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 112-139, sections 4.5-4.9; watch video 2
Day 11 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 112-139, sections 4.5-4.9; watch video 2
Day 12 | Machine Learning: the basic methods | Witten Part I, Chapter 4, pp. 112-139, sections 4.5-4.9; watch video 2
Day 13 | The WEKA machine learning workbench | Witten Part II, Chapter 9, pp. 365-368 and Chapter 10, pp. 369-401
Day 14 | NO CLASS - Bradley on Fall Recess |
Tuesday, October 22nd | MIDTERM EXAM | Test is on Witten's chapters 1, 2, 3, and 4
Day 16 | Discuss exam/answers |
Day 17 | The WEKA machine learning workbench | Witten Part II, Chapter 10, pp. 401-423
Day 18 | Evaluating the discovered knowledge | Witten Part I, Chapter 5, pp. 143-160; watch video 3
Day 19 | Evaluating the discovered knowledge | Witten Part I, Chapter 5, pp. 160-183; watch video 4
Day 20 | Decide on a data set to use for Final Project (could use the University of California Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/); send email to instructor to notify him of which data set you chose | Report which data set you chose and discuss data sets in class
Day 21 | Engineering the input and output, attribute selection, discretizing, automatic data cleansing | Witten Part I, Chapter 7, pp. 285-341
Day 22 | Engineering the input and output, attribute selection, discretizing, automatic data cleansing | Witten Part I, Chapter 7, pp. 285-341
Day 23 | Details on Decision trees, classification rules, extending linear models, neural nets | Witten Part I, Chapter 6, pp. 187-235, sections 6.1-6.3; watch video 5
Day 24 | Details on Decision trees, classification rules, extending linear models, neural nets | Witten Part I, Chapter 6, pp. 187-235, sections 6.1-6.3; watch video 5
Day 25 | Details on Decision trees, classification rules, extending linear models, neural nets | Witten Part I, Chapter 6, pp. 187-235, sections 6.1-6.3; watch video 5
Day 26 | Instance-based learning, numeric prediction, clustering, Bayesian networks | Witten Part I, Chapter 6, pp. 235-283
Day 27 | NO CLASS - Thanksgiving Break |
Day 28 | Instance-based learning, numeric prediction, clustering, Bayesian networks | Witten Part I, Chapter 6, pp. 235-283
Day 29 | Instance-based learning, numeric prediction, clustering, Bayesian networks | Witten Part I, Chapter 6, pp. 235-283
Day 30 | Review, Project report is due | Project report is due by email by 5:00 p.m.
Friday, December 13th | FINAL EXAM 12:00-2:00 | Comprehensive but primarily Witten's chapters 5, 6 and 7
Assessment
400 Points Total
25% Midterm Exam
25% Final Data Mining project report
25% Homework assignments
25% Final Exam
Some Videos on DM/KD to watch:
Video 1: IIT lecture 1: http://www.bing.com/videos/watch/video/lecture-34-data-mining-and-knowledge-discovery/1d0668894dc732fe82b91d0668894dc732fe82b9-83872645560
Video 2: IIT lecture 2: http://www.bing.com/videos/watch/video/lecture-35-data-mining-and-knowledge-discovery-part-ii/f2c1c8cfcc5e319417f6f2c1c8cfcc5e319417f6-29437526744
Video 3: DM and KD: http://videolectures.net/mps07_lavrac_dmkd/
Video 4: Data Mining at NASA: http://videolectures.net/kdd09_srivastava_dmnasata/
Video 5: SQL Know How Video: http://www.microsoft.com/showcase/en/us/details/38b7e057-42d2-4a8c-b4d2-3154bc35d87a
More: http://videolectures.net/Top/Computer_Science/Data_Mining/
Data Mining Project:
The project may be worked on either individually or as a team (teams of at most two members). The project is open ended and involves applying WEKA to analyze a data set. Which algorithms to use and which are the most appropriate, how to clean the data, etc. is entirely up to you. Statistical analysis is to be performed to compare models and find the most appropriate and accurate model. To be of sufficient complexity, the data set should contain both numeric and nominal values and also missing values ("holes"). A possible source of data sets for your project is the Machine Learning Repository at the University of California Irvine: http://archive.ics.uci.edu/ml/. The data mining software you will use for your project is WEKA (see Witten's book and the download link on the main class page).
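As an illustration of the kind of statistical comparison the project asks for, one common choice is a paired t-test on the per-fold results of two models evaluated on the same cross-validation folds. The Python sketch below is only an assumption about how such a comparison might be set up, not a required method; the function name and the fold accuracies are hypothetical, and the critical value 2.262 is the standard two-sided 5% value for 9 degrees of freedom (10 folds).

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean per-fold difference between two models.

    Both lists must come from the same folds, in the same order.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean = sum(diffs) / k
    # Sample variance of the differences (k - 1 in the denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (k - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / math.sqrt(variance / k)

# Hypothetical accuracies of two models on the same 10 folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
model_b = [0.78, 0.80, 0.79, 0.77, 0.80, 0.76, 0.81, 0.80, 0.78, 0.79]

t = paired_t_statistic(model_a, model_b)
T_CRITICAL = 2.262  # two-sided, 5% level, 9 degrees of freedom
if abs(t) > T_CRITICAL:
    print("difference is significant at the 5% level, t =", round(t, 3))
else:
    print("no significant difference at the 5% level, t =", round(t, 3))
```

The test is paired because both models see exactly the same folds, so fold-to-fold variation in difficulty cancels out in the differences. Your report should state which test you used and at what significance level.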