Data Mining in Mobile and Cloud Computing Environments: Course Organization and Survey


Computing & Information Sciences

Kansas State University

CIS 690

Data Mining in Mobile and Cloud Computing Environments

William H. Hsu, Computing and Information Sciences
Shih-Hsiung Chou, Industrial and Manufacturing Systems Engineering
Kansas State University

KSOL course page: http://bit.ly/a68KuL
Course web site: http://www.kddresearch.org/Courses/CIS690
Instructor home page: http://www.cis.ksu.edu/~bhsu

Reading for Next Class:
  Syllabus and Introductory Handouts
  Instructions for Labs 0-1
  Han & Kamber 2e, Sections 1.1-1.4.3 (pp. 1-25), 6.1 (pp. 285-289)


Lecture 0 of 27: Part A - Course Organization


Course Administration

Course Page (KSOL): http://bit.ly/a68KuL
Class Web Page: www.kddresearch.org/Courses/CIS690
Instructional E-Mail Addresses - Best Way to Reach Instructor
  CIS690TA-L@listserv.ksu.edu (always use this to reach instructor and TA)
  CIS690-L@listserv.ksu.edu
Instructor: William Hsu, Nichols 324C
  Office phone: +1 785 532 7905; home phone: +1 785 539 7180
  IM: AIM/MSN/YIM hsuwh/rizanabsith, ICQ 28651394/191317559, Google banazir
  Office hours: after class Mon/Wed/Fri; other times by appointment
Graduate Teaching Assistant: To Be Announced
  Office location: Nichols 124 (CIS Visualization Lab) & Nichols 218
  Office hours: to be announced on class web board
Grading Policy: Overview
  Midterm exam: 15%
  Homework: 15%
  Term project: 50%
  Labs: 20% (1% each; see calendar)


Course Policies

Letter Grades
  15% gradations (85+%: A, 70+%: B, etc.)
  Cutoffs may be more lenient, but a) never higher and b) seldom much lower
Grading Policy
  Exams: midterm (in-class, open-book/notes) 15%
  Homework: 15% (2 written, 2 programming, 2 mixed; drop lowest 2, 3% each)
  Term project (including proposal, interim, final reports): 50%
  Labs (upload solutions to K-State On-Line file dropbox): 20%
Late Homework Policy
  Allowed only in case of medical excusal
  All other late homework: see drop policy
Attendance Policy
  Absence due to travel or personal reasons: e-mail CIS690TA-L in advance
  See instructor, Office of the Dean of Student Life as needed
Honor System Policy: http://www.ksu.edu/honor/
  On plagiarism: cite sources, use quotes if verbatim, includes textbooks
  OK to discuss work, but turn in your own work only
  When in doubt, ask instructor
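
As a rough illustration of how the stated weights and letter-grade cutoffs combine, the following is a minimal sketch and not an official grade calculator. The function and variable names are hypothetical. The C and D cutoffs are an assumption extrapolated from "15% gradations" (the slides only state A and B), and "drop lowest 2" is interpreted here as averaging the remaining homework scores.

```python
# Component weights as stated on the grading policy slide.
WEIGHTS = {"midterm": 0.15, "homework": 0.15, "project": 0.50, "labs": 0.20}

# A and B cutoffs come from the slide; C and D are ASSUMED by continuing
# the 15% gradation downward. Actual cutoffs may be more lenient.
CUTOFFS = [(85.0, "A"), (70.0, "B"), (55.0, "C"), (40.0, "D")]


def homework_average(scores, drop=2):
    """Average homework scores (0-100) after dropping the lowest `drop`."""
    kept = sorted(scores)[drop:]
    return sum(kept) / len(kept)


def course_percentage(component_scores):
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a course percentage to a letter using the cutoffs above."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    hw = homework_average([50, 60, 70, 80, 90, 100])  # lowest two dropped
    total = course_percentage(
        {"midterm": 80, "homework": hw, "project": 88, "labs": 100}
    )
    print(round(total, 2), letter_grade(total))
```

Because the term project carries half of the weight, it dominates the result in this sketch; the other three components together contribute the remaining 50%.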



Class Resources

Course Content Management System (CMS): http://www.kddresearch.org/Courses/CIS690
  Lecture notes (MS PowerPoint 97-2010, PDF)
  Homeworks (MS Word 97-2010, PDF)
  Exam and homework solutions (MS PowerPoint 97-2010, PDF)
  Class announcements (students' responsibility) and grade postings
Course Notes Online and at Copy Center (Required)
Mailing List (Automatic): CIS690-L@listserv.ksu.edu
  Homework/exams (before uploading to CMS, KSOL), sample data, solutions
  Class participation
  Project info, course calendar reminders
  Dated research announcements (seminars, conferences, calls for papers)
LISTSERV Web Archive: http://listserv.ksu.edu/archives/cis690-l.html
  Stores e-mails to class mailing list as browsable/searchable posts


Textbook and Recommended References

Recommended Text
  Witten, I. H. & Frank, E. (2006). Data Mining: Practical Machine Learning Tools and Techniques, second edition. San Francisco, CA, USA: Morgan Kaufmann.
Other References [on Reserve in Main or CIS Library]
  Han, J. & Kamber, M. (2006). Data Mining: Concepts and Techniques, second edition. San Francisco, CA, USA: Morgan Kaufmann.
  Mitchell, T. M. (1997). Machine Learning. New York, NY, USA: McGraw-Hill.
  Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Reading, MA, USA: Addison-Wesley.

[Book cover images: Mitchell (1997); Witten & Frank 2e; Tan et al. (2006); 1st edition (outdated); Han & Kamber 2nd edition]



Background Expected

Both Courses
  Proficiency in high-level programming language (C++/C#, Java, Python, etc.)
  Required: course in data structures
  Recommended: discrete mathematics, probability
  At least 80 hours for semester (up to 120 depending on term project)
  Textbook: Data Mining: Concepts and Techniques, 2e, Han & Kamber (2006)
  Reserve texts: Mitchell's Machine Learning, several other outside references
CIS 690 Data Mining in Mobile and Cloud Computing Environments
  Fresh background in symbolic logic, discrete math (sets, relations, counting)
  Some background assumed in linear algebra, calculus
  New topics: classification/regression, association, optimization, clustering
  "Mathematical maturity": ready to learn more
CIS 798 Topics in Computer Science
  Recommended: two programming courses
  Read up on heuristic search, games, constraints, knowledge representation
  AI programming experience helps (background lectures as needed)
  Watch advanced topics lectures; see list before choosing project topic


Syllabus [1]: First Half of Course


Syllabus [2]: Second Half of Course



Math Background To Be Covered

Basics: First Two Weeks (Hours 2-9 of Course)
  Review of mathematical foundations: set theory, discrete math, probability
  Types of machine learning algorithms
  Combinatorial analysis: mappings and counting
  Bayesian classification
Bayesian Inference
  Hour 3: association rules, statistical evaluation
  Hours 6-10: Naïve Bayes, classification in R
  Hours 15-18: clustering, Expectation-Maximization (EM)
Other Math Topics to be Covered
  Information theory: decision tree induction, rule induction
  Basic statistical hypothesis testing
  Frequent itemsets: association rule mining
  Convex optimization: constraints, linear and quadratic programming (QP)
  Distance measures: clustering
  Logic: propositional, first-order, resolution
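
To make the Naïve Bayes topic listed above more concrete, here is a minimal sketch of a categorical Naïve Bayes classifier with add-one (Laplace) smoothing. It is written in Python purely for illustration (the course itself lists R for classification), and the function names and the toy weather-style data are hypothetical, not taken from the course materials.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Collect the counts needed to estimate P(label) and P(value | label)."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, used for Laplace smoothing.
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label maximizing log P(label) + sum_i log P(value_i | label)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy data: (outlook, temperature) -> play?
    rows = [("sunny", "hot"), ("sunny", "mild"),
            ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = train_naive_bayes(rows, labels)
    print(predict(model, ("sunny", "hot")))
    print(predict(model, ("rainy", "cool")))
```

Working in log space avoids numerical underflow when many conditional probabilities are multiplied, and the "naïve" part is the assumption that features are conditionally independent given the class.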


Computing Platform: Mobile/Cloud Environments

Android
  Operating system: modified Linux
  For mobile devices (Motorola Droid, HTC Incredible, etc.)
  Android, Inc. & Open Handset Alliance
  Software development kit: download from http://developer.android.com/sdk/
Software Environment for the Advancement of Scholarly Research (SEASR)
  Originally developed for compute clusters
  Adapted for cloud computing environments
  SEASR - overall environment: http://seasr.org
  Meandre - data mining flows: http://seasr.org/meandre/
  © 2005 - present, National Center for Supercomputing Applications (NCSA)


Computing Platform: Data Mining Software

Waikato Environment for Knowledge Analysis (WEKA)
  Data mining package
  Most popular machine learning and data mining software at present
  Download from http://www.cs.waikato.ac.nz/ml/weka/
R Interpreter
  R: popular programming language for computational statistics
  Used for data mining implementations
  Comprehensive R Archive Network (CRAN): http://cran.r-project.org
Apache Hadoop
  Java software framework
  Data-intensive distributed applications
  Inspired by Google MapReduce and Google File System (GFS)
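
Since Hadoop is described above as inspired by Google MapReduce, a small in-memory sketch of the map, shuffle, and reduce phases may help fix the idea. This is plain Python for illustration only: it does not use Hadoop's actual Java API, it runs on a single machine, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["data mining", "mining the cloud data"]))
```

In a real distributed setting the map and reduce calls would run in parallel on different nodes and the shuffle would move data across the network; the sketch only shows the data flow between the phases.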


About Project Proposals

Proposals
  About 1-2 pages; due at end of second week of course, one revision allowed
  Team projects: up to 2 people
Contents: at least one paragraph on each of
  1. Problem statement: describe task, objectives, purpose
  2. Background: survey related work and applicable approaches
  3. Methodology: describe planned approach
  4. Evaluation criteria: how will performance be assessed?
  5. Milestones: what will be done, when?
Post Questions and Drafts to Class Mailing List