Data Mining
Lecture 3
Course Syllabus
Course topics:
• Introduction (Week 1 - Week 2)
– What is Data Mining?
– Data Collection and Data Management Fundamentals
– The Essentials of Learning
– The Emerging Needs for Different Data Analysis Perspectives
• Data Management and Data Collection Techniques for Data Mining Applications (Week 3 - Week 4)
– Data Warehouses: Gathering Raw Data from Relational Databases and Transforming It into Information
– Information Extraction and Data Processing Techniques
– Data Marts: The Need for Building Highly Specialized Data Stores for Data Mining Applications
Week 3 - Reminder
Data to Knowledge Pyramid
[Figure: a pyramid whose layers have increasing potential to support business decisions toward the top.]
Layers, from top to bottom:
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: OLAP, MDA; Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles shown beside the pyramid, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Week 2 - Reminder
Data Mining Perspective to Knowledge Discovery
[Figure: the knowledge discovery process]
Data -> (Selection) -> Target Data -> (Preprocessing) -> Preprocessed Data -> (Data Mining) -> Patterns -> (Interpretation/Evaluation) -> Knowledge
Adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
Week 3 - Reminder
Essentials of Learning
Learning?
• Can we formalize it?
• Is it just a chemical activation?
• Is it memorization?
• Is it the continuous connecting and disconnecting of nodes on a dynamically changing brain network topology?
The Artificial Intelligence View:
• Learning is central to human knowledge and intelligence, and essential for building intelligent machines.
• Years of effort in AI have shown that building intelligent computers by programming all the rules by hand is not feasible; automatic learning is crucial. For example, we humans are not born with the ability to understand language; we learn it. It therefore makes sense to try to have computers learn language instead of trying to program it all in.
The Software Engineering View:
• Machine Learning allows us to program computers by example, which can be easier than writing code the traditional way.
The Stats View:
• Machine Learning is the marriage of computer science and statistics: computational techniques are applied to statistical problems.
• Machine Learning has been applied to a vast number of problems in many contexts, beyond the typical statistics problems.
• Machine Learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).
Week 3
Essentials of Learning
Informal Learning Problem Definition:
A computer program that improves its performance at some task through experience.
Formal Learning Problem Definition:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
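To make the roles of T, P and E concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than part of the lecture: the task T is classifying integers against a hidden threshold, the experience E is a list of labeled examples, and the performance measure P is accuracy on a fixed test set. The names `true_label`, `learn_threshold` and `accuracy` are hypothetical.

```python
# Illustrative only: a tiny learner whose performance P (accuracy) on task T
# (classifying integers against a hidden threshold) improves with experience E.

HIDDEN_THRESHOLD = 50  # assumed ground truth, unknown to the learner


def true_label(x):
    """Teacher's labeling rule: 1 if x is at or above the hidden threshold."""
    return 1 if x >= HIDDEN_THRESHOLD else 0


def learn_threshold(examples):
    """Estimate the threshold as the midpoint between the largest negative
    and the smallest positive example seen so far."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    low = max(negatives) if negatives else 0
    high = min(positives) if positives else 100
    return (low + high) / 2


def accuracy(threshold, test_points):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_points)
    return correct / len(test_points)


test_points = list(range(0, 101))
# Experience E: labeled examples, arriving in this (arbitrary) order.
experience = [(x, true_label(x)) for x in (5, 75, 20, 70, 40, 56, 48, 52)]

# P measured after 2, 4, 6 and 8 training examples.
scores = [accuracy(learn_threshold(experience[:k]), test_points) for k in (2, 4, 6, 8)]
print(scores)
```

In this toy setting the accuracy never decreases as more examples are seen, which is exactly what the formal definition asks for: performance at T, measured by P, improves with E.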
A chess learning problem:
• Task T: playing chess
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself
A handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications
A robot driving learning problem:
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance traveled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering commands recorded while observing a human driver
Attributes of Experience
Does the training experience provide direct or indirect feedback?
• Learn from direct training examples consisting of states and the correct move for each
– supervised learning
– CHESS PROBLEM: providing individual chess board states and the correct move for each
• Learn from indirect information consisting of the moves and the final outcomes of these moves
– learning from delayed feedback (reinforcement-style learning)
– CHESS PROBLEM: providing sequences of moves and the final outcomes of various games played
– causality: the credit assignment problem, i.e., deciding how much each move contributed to the final outcome
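As a rough illustration of credit assignment, the sketch below spreads the final outcome of a game back over its moves using a simple geometric discount. This discounting heuristic, the function name `assign_credit` and the parameter values are assumptions made for illustration; the lecture does not prescribe a particular scheme.

```python
def assign_credit(num_moves, outcome, discount=0.9):
    """Return one credit value per move.

    The last move receives the full outcome; each earlier move receives
    the credit of the move after it multiplied by `discount`.
    """
    return [outcome * discount ** (num_moves - 1 - i) for i in range(num_moves)]


# A won game (outcome = 1.0) of four moves, with a strong discount of 0.5.
credits = assign_credit(4, outcome=1.0, discount=0.5)
print(credits)
```

The design choice here is only one of many: it encodes the guess that moves closer to the end of the game were more responsible for the result, which may well be wrong for a game decided by an early blunder.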
Attributes of Experience (continued)
The degree to which the learner controls the sequence of training examples.
CHESS PROBLEM
• The learner might rely on the teacher to select informative board states and to provide the correct move for each.
• The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
• The learner may have complete control over both the board states and the (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.
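The effect of letting the learner choose its own training examples can be sketched on a much simpler problem than chess. In the toy below (all names and numbers are assumptions for illustration), a teacher knows a hidden threshold on the integers 0..100. A passive learner receives examples in a fixed, uninformed order, while an active learner queries the point it is most uncertain about.

```python
HIDDEN_THRESHOLD = 37  # assumed; known only to the teacher


def teacher(x):
    """Return True if x is at or above the hidden threshold."""
    return x >= HIDDEN_THRESHOLD


def passive_learner(xs):
    """Consume examples in the given order until the first positive one.
    Returns (estimated threshold, number of examples used)."""
    used = 0
    for x in xs:
        used += 1
        if teacher(x):
            return x, used
    return None, used


def active_learner(low=0, high=100):
    """The learner picks each query itself (binary search for the boundary).
    Returns (estimated threshold, number of queries asked)."""
    queries = 0
    while low < high:
        mid = (low + high) // 2
        queries += 1
        if teacher(mid):
            high = mid
        else:
            low = mid + 1
    return low, queries


passive_result = passive_learner(range(101))
active_result = active_learner()
print(passive_result, active_result)
```

Both learners recover the same threshold, but the active learner needs far fewer labeled examples here. A well-chosen teacher sequence could of course do just as well, which is the first case in the list above.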
Attributes of Experience (continued)
How well the training experience represents the distribution of examples over which the final system performance P must be measured.
Important: most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Despite our need to make this assumption in order to obtain theoretical results, it is important to keep in mind that this assumption must often be violated in practice.
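A small sketch of what can go wrong when the assumption is violated. The problem, the data points and the helper names are all made up for illustration: a threshold classifier is trained on points far from the true boundary, then evaluated once on similar points and once on points concentrated near the boundary.

```python
TRUE_THRESHOLD = 50  # assumed ground truth for this toy problem


def true_label(x):
    return 1 if x >= TRUE_THRESHOLD else 0


def learn_threshold(xs):
    """Midpoint between the largest negative and the smallest positive training point."""
    negatives = [x for x in xs if true_label(x) == 0]
    positives = [x for x in xs if true_label(x) == 1]
    return (max(negatives) + min(positives)) / 2


def accuracy(threshold, xs):
    """Fraction of points in xs that the learned threshold classifies correctly."""
    hits = sum((1 if x >= threshold else 0) == true_label(x) for x in xs)
    return hits / len(xs)


train = [0, 10, 20, 30, 40, 80, 90, 100]   # nothing between 40 and 80
learned = learn_threshold(train)

test_same = [5, 15, 25, 35, 85, 95]        # resembles the training distribution
test_shifted = [50, 52, 54, 56, 58, 62]    # concentrated near the true boundary

acc_same = accuracy(learned, test_same)
acc_shifted = accuracy(learned, test_shifted)
print(learned, acc_same, acc_shifted)
```

The learned threshold looks perfect on test data that resembles the training data and performs poorly on the shifted test data, even though the underlying labeling rule never changed.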
Central Limit Theorem
The Central Limit Theorem states that the sum (and hence the mean) of a large number of independent, identically distributed random variables approximately follows a Normal distribution.
Consider a set of independent, identically distributed random variables $Y_1, \dots, Y_n$ governed by an arbitrary probability distribution with mean $\mu$ and finite variance $\sigma^2$. Even if we do not know the distribution of the individual $Y_i$, we can approximate the distribution of the sample mean
$$\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i .$$
As $n$ grows, the standardized quantity
$$\frac{\bar{Y}_n - \mu}{\sigma / \sqrt{n}}$$
approximately follows the Normal distribution with mean 0 and standard deviation 1.
A common rule of thumb is that we can use the Normal approximation when $n \ge 30$.
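The theorem can be checked empirically with a short simulation. The sketch below is an illustration under assumed settings (Uniform(0, 1) variables, n = 30, 5000 repetitions, a fixed seed); none of these choices come from the lecture.

```python
import random
import statistics

random.seed(0)

n = 30         # sample size, matching the rule of thumb
trials = 5000  # number of sample means to draw

# Y_i ~ Uniform(0, 1): clearly not Normal, with mean 1/2 and variance 1/12.
mu = 0.5
sigma = (1 / 12) ** 0.5

sample_means = [
    statistics.fmean([random.random() for _ in range(n)]) for _ in range(trials)
]

# Standardize each sample mean: (mean - mu) / (sigma / sqrt(n)).
standardized = [(m - mu) / (sigma / n ** 0.5) for m in sample_means]

mean_z = statistics.fmean(standardized)
std_z = statistics.pstdev(standardized)
# For a standard Normal, roughly 68% of the mass lies within one standard deviation.
within_one = sum(abs(z) <= 1 for z in standardized) / trials

print(mean_z, std_z, within_one)
```

If the approximation holds, `mean_z` should be close to 0, `std_z` close to 1, and `within_one` close to 0.68.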
Operational Definition of Learning Function: Target Function
Given the training experience and the target definition, decide on a learning architecture by considering:
• correctness
• applicability
• performance
CHESS PROBLEM
Ideal Target Function: ChooseMove : B -> M, indicating that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
What if only indirect training experience is available to our system?
Operational Definition of Learning Function: Target Function (continued)
CHESS PROBLEM
Operational Target Function: V : B -> R, denoting that V maps any legal board state from the set B to some real value.
• V assigns higher scores to better board states.
• V is then used to select the best move from any current board position.
• This can be accomplished by generating the successor board state produced by every legal move, then using V to choose the best successor state and therefore the best legal move.
Is it really operational? Not quite. Computing an exact value for a board state requires searching all the way down to the end of the game, so such a definition is not computationally operational.
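The move-selection step described above can be sketched generically. The game below is a deliberately trivial stand-in for chess (a "board" is an integer, a move adds an offset, and V prefers boards close to 10); the function names and the toy V are assumptions for illustration only.

```python
def choose_move(board, legal_moves, apply_move, V):
    """Generate the successor of every legal move and return the move
    whose successor board state has the highest value under V."""
    return max(legal_moves(board), key=lambda move: V(apply_move(board, move)))


# --- Toy stand-in for a real game ---------------------------------------
def legal_moves(board):
    return [-1, 1, 2]           # every board allows the same three moves


def apply_move(board, move):
    return board + move         # successor board state


def V(board):
    return -abs(board - 10)     # higher score for boards closer to 10


best = choose_move(7, legal_moves, apply_move, V)
print(best)
```

Note that `choose_move` only looks one move ahead. All the difficulty is hidden inside V, which is why the choice of an operational V matters so much.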
Operational Definition of Learning Function: Target Function (continued)
CHESS PROBLEM
• Choosing a complex target function brings expressiveness, but it also brings a performance bottleneck.
• It also brings an urgent need for many more training examples to learn from.
• The real issue is choosing an operational target function -> MODELING -> function approximation.
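One common way to make the target function operational is to approximate V by a simple parametric form and fit its parameters from examples. The sketch below assumes a linear form and the LMS (least mean squares) weight update; the two features, the target value and the learning rate are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def v_hat(weights, features):
    """Linear approximation of V: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, target, learning_rate=0.1):
    """One LMS step: move each weight in proportion to the current error
    and to the value of its feature (the bias weight uses feature 1)."""
    error = target - v_hat(weights, features)
    updated = [weights[0] + learning_rate * error]
    updated += [w + learning_rate * error * x for w, x in zip(weights[1:], features)]
    return updated


# Hypothetical board described by two numeric features, with training value 10.
features = (1.0, 2.0)
target = 10.0

weights = [0.0, 0.0, 0.0]
errors = []
for _ in range(20):
    errors.append(abs(target - v_hat(weights, features)))
    weights = lms_update(weights, features, target)

final_error = abs(target - v_hat(weights, features))
print(errors[:3], final_error)
```

A linear form is far less expressive than an arbitrary function over board states, but it needs only a handful of weights and correspondingly fewer training examples, which is the trade-off named in the bullets above.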
Importance of Target Function
• The choice of target function determines the size of our hypothesis space (solution space).
• What if the needed solution cannot be represented in our hypothesis space?
• Suppose we have a perfect, hypothetical hypothesis space H that can represent every teachable function.
• Then expressiveness is not our problem. Are we OK with that H?
• NO: we are now completely unable to generalize beyond the observed examples.
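The last point can be verified by brute force on a small instance space. The sketch below, with an arbitrary made-up training set, enumerates every Boolean function over 3-bit instances (an unbiased H), keeps the hypotheses consistent with the training data, and counts how they vote on each unseen instance.

```python
from itertools import product

# All 8 possible instances with three Boolean attributes.
instances = list(product([0, 1], repeat=3))

# Unbiased hypothesis space: every Boolean function over the instances,
# i.e. 2**8 = 256 hypotheses, each stored as a dict instance -> label.
hypotheses = [
    dict(zip(instances, labels))
    for labels in product([0, 1], repeat=len(instances))
]

# Five labeled training examples (labels chosen arbitrarily for illustration).
training = {
    (0, 0, 0): 0,
    (0, 1, 1): 1,
    (1, 0, 1): 1,
    (1, 1, 0): 0,
    (1, 1, 1): 1,
}

# Hypotheses consistent with every training example.
version_space = [h for h in hypotheses if all(h[x] == y for x, y in training.items())]

# For each unseen instance, count how many consistent hypotheses predict 1.
unseen = [x for x in instances if x not in training]
votes = {x: sum(h[x] for h in version_space) for x in unseen}

print(len(version_space), votes)
```

For every unseen instance, exactly half of the consistent hypotheses predict 1 and half predict 0, so the training data give no basis for preferring either label.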
Importance of Target Function (continued)
• If we need generalization and applicability to unseen instances, we must choose a biased target function (a generalizable target function).
• A learner that makes no a priori assumptions regarding the identity of the target function has no rational basis for classifying any unseen instances.
• What we wish to capture here is the policy by which the learner generalizes beyond the observed training data to infer the classification of new instances.
Inductive Bias
target concept = target function
Formal Definition:
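A standard formal statement, following Tom Mitchell's "Machine Learning" (Chapter 2), can be written as follows. The symbols are taken from that formulation and are assumptions with respect to this slide: $L$ is a concept learning algorithm over the instance set $X$, $c$ is an arbitrary target concept, $D_c = \{\langle x, c(x) \rangle\}$ is an arbitrary set of training examples of $c$, and $L(x_i, D_c)$ is the classification that $L$ assigns to instance $x_i$ after training on $D_c$. The inductive bias of $L$ is any minimal set of assertions $B$ such that, for any target concept $c$ and corresponding training examples $D_c$:

```latex
(\forall x_i \in X)\; \big[\, (B \wedge D_c \wedge x_i) \vdash L(x_i, D_c) \,\big]
```

Here $\vdash$ means "deductively entails": once the assumptions $B$ are added, every classification the learner produces follows logically from the training data and the instance itself.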
Inductive Bias Examples
ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned. Otherwise, the system refuses to classify the new instance.
Inductive Bias: none (bias-free)
CANDIDATE-ELIMINATION ALGORITHM: New instances are classified only in the case where all members of the current version space (the subset of hypotheses consistent with our training examples) agree on the classification. Otherwise, the system refuses to classify the new instance.
Inductive Bias: the target concept is contained in the hypothesis space
FIND-S: This algorithm finds the most specific hypothesis consistent with the training examples. It then uses this hypothesis to classify all subsequent instances.
Inductive Bias: the target concept is contained in the hypothesis space, and the most specific consistent hypothesis represents it
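The difference in bias between the rote learner and FIND-S shows up as soon as an unseen instance arrives. The sketch below is a minimal version of both, assuming conjunctive hypotheses over discrete attributes where "?" matches any value. The weather-style training data and the function names are illustrative assumptions, not part of the slides.

```python
def find_s(examples):
    """FIND-S for conjunctive hypotheses: start from the first positive
    example and generalize an attribute to '?' whenever a later positive
    example disagrees on it. Negative examples are ignored."""
    hypothesis = None  # None means "no positive example seen yet"
    for attributes, label in examples:
        if not label:
            continue
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


def classify(hypothesis, attributes):
    """Positive only if every constraint in the hypothesis is satisfied."""
    if hypothesis is None:
        return False
    return all(h == "?" or h == a for h, a in zip(hypothesis, attributes))


def rote_classify(examples, attributes):
    """Rote learner: look the instance up in memory; None means
    'refuses to classify'."""
    memory = {attrs: label for attrs, label in examples}
    return memory.get(attributes)


examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

h = find_s(examples)
new_instance = ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")
print(h, classify(h, new_instance), rote_classify(examples, new_instance))
```

FIND-S commits to a classification of the unseen instance because of its bias, while the rote learner, having no bias, returns nothing for it.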
Search Bias vs Restriction Bias
ID3 searches a complete hypothesis space (i.e., one capable of expressing any finite discrete-valued function). It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met (e.g., until it finds a hypothesis consistent with the data). Its inductive bias is solely a consequence of the ordering of hypotheses by its search strategy. Its hypothesis space introduces no additional bias. (SEARCH BIAS, PREFERENCE BIAS)
The CANDIDATE-ELIMINATION algorithm searches an incomplete hypothesis space (i.e., one that can express only a subset of the potentially teachable concepts). It searches this space completely, finding every hypothesis consistent with the training data (the version space). Its inductive bias is solely a consequence of the expressive power of its hypothesis representation. Its search strategy introduces no additional bias. (RESTRICTION BIAS, LANGUAGE BIAS)
End
• Read:
– Supplementary book: “Machine Learning”, Tom Mitchell, Chapter 1
– Chapter 2
– Course textbook, Chapter 2 (preparation for the next week)