
Nov 20, 2013

CS 591.003

Title: Topics on Data Mining

Instructor: Abdullah Mueen

Time: MWF 10:00 - 10:50 AM

Room: Science Math Learning Center 352

Office Hours: Wed & Thu, 11:00 AM - 1:00 PM


Description:

This course covers a range of topics in data mining. Introductory topics: clustering, classification, outlier detection, and association-rule discovery. Advanced topics: technologies for data mining (Data Cube, MapReduce), algorithms for mining rich data types (time series, graphs, trajectories), and applications of mining algorithms (search result ranking, recommender systems). The course will have lectures on the introductory topics and assigned readings on the advanced topics.


What you will learn:

- Basic data mining algorithms and their applications.
- Hands-on experience in cleaning, managing, and processing large data.
- Some advanced data mining applications in specific domains and the challenges that need to be solved.
- How to write papers for data mining workshops and conferences when you have good results.


Book:

Data Mining: Concepts and Techniques, 3rd ed., by Jiawei Han, Micheline Kamber, and Jian Pei


Grading:

Grading will be based on the project (60%) and the paper presentation (40%), with heavy emphasis on the project. There will be no exam.


Lecture Schedule:

Week 1: Classification: Chapter 8
Week 2: Classification: Chapters 8 and 9
Week 3: Frequent Pattern Mining: Chapter 6; Labor Day
Week 4: Clustering: Chapter 10
Week 5: Clustering: Chapters 10 and 11
Week 6: Outlier Detection: Chapter 12
Week 7: Time Series Mining: Slides
Week 8: Data Mining Tools: Matlab, Weka, VW; Fall Break
Week 9: Mining Other Data Types: Slides
Week 10: Paper Presentation
Week 11: Paper Presentation
Week 12: Paper Presentation
Week 13: Paper Presentation
Weeks 14-16: Project; Thanksgiving


Papers:

The complete list of papers is on the course page. If you want to present a paper of your own choice, feel free to send it to me before the 5th week for approval.


Presentation:

Each student presents one paper selected from the pool. The presentation will last 30 minutes, and the remaining time will be used for discussion. Every student should pick a paper and a day by the 5th week. The schedule will be maintained on the course page.


Project:

Each group will do one project. A group can have at most two students; a two-student group is, of course, expected to do twice as much work as an individual. I prefer individual projects.


A project consists of three phases.


1. Proposal (20%): Pick a dataset from below. Define a problem, pattern, or structure you want to solve, find, or utilize in the data, and discuss the expected results if you succeed. If you don't want to define your own problem, you can propose to reproduce the original paper of the respective dataset, which would be much harder.

2. Implementation (20%): Automatically solve, find, or utilize the problem, pattern, or structure in the data. You can use any language and platform.


3. Report (20%): Write up the method you applied and discuss the findings and results.
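The grading weights fit together as follows: the three project phases (20% each) make up the 60% project share, and the paper presentation carries the remaining 40%. A final score is then a weighted sum of the four components. The sketch below is a hypothetical illustration only; the function name `final_score` and the 0-100 scoring scale are assumptions, not part of the stated course policy.

```python
# Hypothetical sketch: weights come from the syllabus; the 0-100 scale is an assumption.
WEIGHTS = {
    "proposal": 0.20,
    "implementation": 0.20,
    "report": 0.20,
    "presentation": 0.40,
}


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted final score from per-component scores (0-100 scale assumed)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum: 3 x 20% project phases + 40% presentation.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Example: 90 on each project phase and 80 on the presentation.
    print(final_score({"proposal": 90, "implementation": 90, "report": 90, "presentation": 80}))
```

With 90 on each project phase and 80 on the presentation, the project contributes 54 points and the presentation 32, for a total of 86.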