
Data Mining

Lecture 3

Course Syllabus

Course topics:

  Introduction (Week 1 - Week 2)
    - What is Data Mining?
    - Data Collection and Data Management Fundamentals
    - The Essentials of Learning
    - The Emerging Needs for Different Data Analysis Perspectives

  Data Management and Data Collection Techniques for Data Mining Applications (Week 3 - Week 4)
    - Data Warehouses: Gathering Raw Data from Relational Databases and Transforming It into Information
    - Information Extraction and Data Processing Techniques
    - Data Marts: The Need for Building Highly Specialized Data Stores for Data Mining Applications

Week 3 - Reminder - Data to Knowledge Pyramid


The figure shows the Data to Knowledge pyramid. The potential to support business decisions increases toward the top, and the typical role shifts from DBA at the bottom to end user at the top. From top to bottom:

  Making Decisions (End User)
  Data Presentation: Visualization Techniques (Business Analyst)
  Data Mining: Information Discovery (Data Analyst)
  Data Exploration: Statistical Analysis, Querying and Reporting; OLAP, MDA (Data Analyst)
  Data Warehouses / Data Marts (DBA)
  Data Sources: Paper, Files, Information Providers, Database Systems, OLTP (DBA)

Week 2 - Reminder - Data Mining Perspective to Knowledge Discovery





adapted from: U. Fayyad, et al. (1995), "From Knowledge Discovery to Data Mining: An Overview," Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press

The figure shows the KDD process as a chain of steps:

  Data --(Selection)--> Target Data --(Preprocessing)--> Preprocessed Data --(Data Mining)--> Patterns --(Interpretation/Evaluation)--> Knowledge

Week 3 - Reminder - Essentials of Learning

Learning?

  - Can we formalize it?
  - Is it just a chemical activation?
  - Is it memorization?
  - Is it continuous node connecting/disconnecting on a dynamically changing brain network topology?

Week 3 - Reminder - Essentials of Learning

The Artificial Intelligence View:

  - Learning is central to human knowledge and intelligence, and essential for building intelligent machines.

  - Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial. For example, we humans are not born with the ability to understand language; we learn it, and it makes sense to try to have computers learn language instead of trying to program it all by hand.

Week 3 - Reminder - Essentials of Learning

The Software Engineering View:

  - Machine Learning allows us to program computers by example, which can be easier than writing code the traditional way.

The Stats View:

  - Machine Learning is the marriage of computer science and statistics: computational techniques are applied to statistical problems. Machine Learning has been applied to a vast number of problems in many contexts, beyond the typical statistics problems. Machine Learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).

Week 3 - Essentials of Learning

Informal Learning Problem Definition:

  A computer program that improves its performance at some task through experience.

Formal Learning Problem Definition:

  A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
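As a concrete illustration of the (T, P, E) framing, here is a minimal sketch in Python. Everything in it is a made-up toy (a threshold-learning task invented for this example, not anything from the lecture); the point is only that the measured performance P on task T improves as experience E grows.

    import random

    # A hypothetical instance of the (T, P, E) framing, for illustration only.
    # Task T:  label a number x in [0, 1) as 1 if x >= THETA, else 0.
    # Performance measure P:  accuracy on freshly drawn labeled numbers.
    # Experience E:  a growing set of labeled examples.

    random.seed(1)
    THETA = 0.62  # true threshold, unknown to the learner

    def draw():
        x = random.random()
        return x, int(x >= THETA)

    def fit(examples):
        # Learner: put the estimated threshold midway between the largest
        # negative and the smallest positive example seen so far.
        lo = max((x for x, y in examples if y == 0), default=0.0)
        hi = min((x for x, y in examples if y == 1), default=1.0)
        return (lo + hi) / 2

    def accuracy(theta_hat, trials=10000):
        # P: fraction of fresh draws the learned threshold labels correctly.
        return sum(int(x >= theta_hat) == y
                   for x, y in (draw() for _ in range(trials))) / trials

    examples = []
    for n in (1, 5, 50, 500):
        while len(examples) < n:
            examples.append(draw())
        print(f"|E| = {n:4d}  ->  P ~ {accuracy(fit(examples)):.3f}")

On this toy task, P typically climbs toward 1.0 as |E| grows, which is exactly the relation the formal definition demands.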

Week 3 - Essentials of Learning

A chess learning problem:
  Task T: playing chess
  Performance measure P: percent of games won against opponents
  Training experience E: playing practice games against itself

A handwriting recognition learning problem:
  Task T: recognizing and classifying handwritten words within images
  Performance measure P: percent of words correctly classified
  Training experience E: a database of handwritten words with given classifications

A robot driving learning problem:
  Task T: driving on public four-lane highways using vision sensors
  Performance measure P: average distance traveled before an error (as judged by human overseer)
  Training experience E: a sequence of images and steering commands recorded while observing a human driver
Week 3 - Essentials of Learning

Attributes of Experience

  - Learn from direct training examples consisting of states and the correct move for each
      supervised learning - CHESS PROBLEM: providing individual chess board states and the correct move for each

  - Learn from indirect information consisting of moves and the final outcomes of those moves
      unsupervised learning - CHESS PROBLEM: providing sequences of moves and the final outcomes of various games played
      this raises the problems of causality and credit assignment: deciding how much credit or blame each move in the sequence deserves for the final outcome


Week 3 - Essentials of Learning

Attributes of Experience

  - The degree to which the learner controls the sequence of training examples

    CHESS PROBLEM:
      - rely on the teacher to select informative board states and to provide the correct move for each
      - the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move
      - the learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present


Week 3 - Essentials of Learning

Attributes of Experience

  - How well the experience represents the distribution of examples over which the final system performance P must be measured

    !!!! Most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Despite our need to make this assumption in order to obtain theoretical results, it is important to keep in mind that this assumption must often be violated in practice.
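To make the warning concrete, here is a small hedged sketch in Python (the band-shaped concept and all numbers are invented for illustration): a misspecified threshold learner looks excellent when test data follows the training distribution and fails badly when the distribution shifts.

    import random

    # Hypothetical illustration of the identical-distribution assumption.
    # True concept: y = 1 iff -1 < x < 1 (a band).  The learner is
    # misspecified: it can only express one-sided rules "y = 1 iff x > t".
    # Trained where almost all mass lies left of +1, such a rule looks
    # excellent; on a shifted test distribution it breaks down.

    random.seed(3)

    def sample(mean, n):
        return [(x, int(-1.0 < x < 1.0))
                for x in (random.gauss(mean, 1.0) for _ in range(n))]

    def accuracy(t, data):
        return sum(int(x > t) == y for x, y in data) / len(data)

    def fit(data):
        # Learner: choose the training-accuracy-maximizing threshold.
        return max((x for x, _ in data), key=lambda t: accuracy(t, data))

    train = sample(mean=-1.5, n=500)
    t = fit(train)
    print(f"learned threshold t = {t:+.2f}")
    print(f"P on test data like the training set: {accuracy(t, sample(-1.5, 5000)):.2f}")
    print(f"P on shifted test data:               {accuracy(t, sample(+1.5, 5000)):.2f}")

Measured P is high on the first test set and drops sharply on the second, even though the target concept itself never changed; only the distribution of examples did.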

Week 3 - Essentials of Learning

Central Limit Theorem

The Central Limit Theorem states that the sum of a large number of independent, identically distributed random variables approximately follows a Normal distribution.

Consider a set of independent, identically distributed random variables Y1 ... YN governed by an arbitrary probability distribution with mean mu and finite variance sigma^2. Even if we don't know the distribution of the individual Yi, we can compute the distribution of their sample mean

    Ybar = (Y1 + ... + YN) / N,

which for large N approximately follows the Normal distribution with mean mu and variance sigma^2 / N. A common rule of thumb is that we can use the Normal approximation when N >= 30.
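A quick simulation sketch of the theorem in Python (the uniform distribution and trial counts are arbitrary choices for illustration): averages of N = 30 uniform draws, whose individual distribution is far from Normal, behave like draws from a Normal with mean mu and standard deviation sigma / sqrt(N).

    import random
    import statistics

    # Small CLT simulation with hypothetical parameters.
    # Each Yi is uniform on [0, 1): mu = 0.5, sigma^2 = 1/12, far from Normal.
    # Averages of N = 30 draws should be approximately N(mu, sigma^2 / N).

    random.seed(4)
    N = 30          # the rule-of-thumb sample size from the slide
    TRIALS = 20000  # number of sample means to simulate

    means = [statistics.fmean(random.random() for _ in range(N))
             for _ in range(TRIALS)]

    mu, sigma2 = 0.5, 1.0 / 12.0
    sd = (sigma2 / N) ** 0.5
    print(f"predicted  mean {mu:.4f}, std {sd:.4f}")
    print(f"simulated  mean {statistics.fmean(means):.4f}, "
          f"std {statistics.stdev(means):.4f}")

    # Roughly 68% of the sample means should fall within one predicted std:
    within = sum(abs(m - mu) < sd for m in means) / TRIALS
    print(f"fraction within one std: {within:.3f} (Normal predicts ~0.683)")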

Week 3 - Essentials of Learning

Operational Definition of Learning Function - Target Function

Given the training experience and the target definition, decide on a learning architecture by considering
  - correctness
  - applicability
  - performance

CHESS PROBLEM

  Ideal Target Function: ChooseMove : B -> M, to indicate that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.

  What if only indirect training experience is available to our system?


Week 3 - Essentials of Learning

Operational Definition of Learning Function - Target Function

Given the training experience and the target definition, decide on a learning architecture by considering
  - correctness
  - applicability
  - performance

CHESS PROBLEM

  Operational Target Function: V : B -> R, to denote that V maps any legal board state from the set B to some real value.
    - assign higher scores to better board states
    - then use V to select the best move from any current board position: generate the successor board state produced by every legal move, then use V to choose the best successor state and therefore the best legal move
    - Is it really operational? Not so! A perfect V requires searching all the way down to the end of the game, which is not computationally operational.
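To show the mechanics, here is a minimal sketch in Python of choosing a move with an operational V. The board representation, the move list, and the material-count V are all stand-ins invented for this example; this is not a real chess engine or a learned evaluation.

    from typing import Dict, List, Tuple

    Board = Dict[str, str]  # square -> piece, e.g. {"e1": "wK", "e8": "bK"}
    Move = Tuple[str, str]  # (from_square, to_square)

    PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

    def V(board: Board) -> float:
        # Toy evaluation V : B -> R: white material minus black material.
        score = 0.0
        for piece in board.values():
            value = PIECE_VALUE[piece[1]]
            score += value if piece[0] == "w" else -value
        return score

    def apply_move(board: Board, move: Move) -> Board:
        # Generate the successor board state (captures by overwriting dst).
        src, dst = move
        successor = dict(board)
        successor[dst] = successor.pop(src)
        return successor

    def choose_move(board: Board, legal: List[Move]) -> Move:
        # ChooseMove via V: evaluate the successor of every legal move
        # and return the move whose successor state scores highest.
        return max(legal, key=lambda m: V(apply_move(board, m)))

    # Tiny usage example with made-up positions and moves:
    board = {"e1": "wK", "d4": "wQ", "e8": "bK", "d5": "bR"}
    moves = [("d4", "d5"), ("d4", "a4"), ("e1", "e2")]
    print(choose_move(board, moves))  # ('d4', 'd5'): taking the rook scores best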




Week 3 - Essentials of Learning

Operational Definition of Learning Function - Target Function

Given the training experience and the target definition, decide on a learning architecture by considering
  - correctness
  - applicability
  - performance

CHESS PROBLEM

  Choosing a complex target function brings expressibility, but it also brings a performance bottleneck and an urgent need for extra training examples (a lot more) to learn.

  The real issue is choosing the operational target function -> MODELING -> function approximation


Week 3 - Essentials of Learning

Importance of Target Function

  - The target function determines the size of our hypothesis space (solution space).
  - What if the needed solution cannot be represented in our hypothesis space?
  - Let's posit a perfect hypothetical hypothesis space H that can represent every teachable function.
  - So expressibility is not our problem; are we OK with that H?
  - NO: we are now completely unable to generalize beyond the observed examples.


Week 3 - Essentials of Learning

Importance of Target Function

  - If we need generalization and applicability to unseen instances, we must choose a biased target function (a generalizable target function).
  - A learner that makes no a priori assumptions regarding the identity of the target function has no rational basis for classifying any unseen instances.
  - What we wish to capture here is the policy by which the learner generalizes beyond the observed training data to infer the classification of new instances.


Week 3 - Essentials of Learning

Inductive Bias

  target concept = target function

  Formal Definition (following Mitchell): Consider a concept learning algorithm L over the set of instances X. Let c be an arbitrary concept defined over X, and let D_c = {<x, c(x)>} be an arbitrary set of training examples of c. Let L(x_i, D_c) denote the classification that L assigns to instance x_i after training on D_c. The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples D_c:

      (for all x_i in X)  [ (B and D_c and x_i) |- L(x_i, D_c) ]

  That is, together with the training data, B deductively entails every classification L produces.


Week 3 - Essentials of Learning

Inductive Bias Examples

  ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise, the system refuses to classify the new instance (see the sketch after this list).
    Inductive Bias: bias-free

  CANDIDATE-ELIMINATION ALGORITHM: New instances are classified only in the case where all members of the current version space (the subset of hypotheses consistent with our training examples) agree on the classification. Otherwise, the system refuses to classify the new instance.
    Inductive Bias: the target concept is contained in the hypothesis space

  FIND-S: This algorithm finds the most specific hypothesis consistent with the training examples. It then uses this hypothesis to classify all subsequent instances.
    Inductive Bias: the target concept is contained in the hypothesis space, and the most specific consistent hypothesis represents it
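A tiny sketch of the ROTE-LEARNER in Python (the tuple-of-attributes instance encoding is a hypothetical choice made for illustration): with no inductive bias, it can only answer for instances it has literally memorized.

    # Bias-free rote learning: memorize, look up, otherwise refuse.

    class RoteLearner:
        def __init__(self):
            self.memory = {}  # instance -> stored classification

        def train(self, instance, label):
            self.memory[instance] = label

        def classify(self, instance):
            # Exact-match lookup only: with no inductive bias there is
            # no rational basis for guessing about unseen instances.
            if instance in self.memory:
                return self.memory[instance]
            return None  # refuses to classify

    learner = RoteLearner()
    learner.train(("sunny", "warm"), "play")
    print(learner.classify(("sunny", "warm")))  # 'play' (seen before)
    print(learner.classify(("rainy", "cold")))  # None (refuses: unseen)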


Week 3 - Essentials of Learning

Search Bias vs Restriction Bias

  ID3 searches a complete hypothesis space (i.e., one capable of expressing any finite discrete-valued function). It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met (e.g., until it finds a hypothesis consistent with the data). Its inductive bias is solely a consequence of the ordering of hypotheses by its search strategy; its hypothesis space introduces no additional bias. (SEARCH BIAS, PREFERENCE BIAS)

  CANDIDATE-ELIMINATION searches an incomplete hypothesis space (i.e., one that can express only a subset of the potentially teachable concepts: the version space). It searches this space completely, finding every hypothesis consistent with the training data. Its inductive bias is solely a consequence of the expressive power of its hypothesis representation; its search strategy introduces no additional bias. (RESTRICTION BIAS, LANGUAGE BIAS)

Week 3 - End

Read:
  - Supplementary book "Machine Learning" by Tom Mitchell, Chapter 1 and Chapter 2
  - Course textbook, Chapter 2 (in preparation for next week)