What is learning?

boorishadamant, Artificial Intelligence and Robotics

29 Oct 2013



“Learning denotes changes in a system that ... enable a
system to do the same task more efficiently the next
time.”

Herbert Simon



“Learning is any process by which a system improves
performance from experience.”


Herbert Simon



“Learning is constructing or modifying representations of
what is being experienced.”


Ryszard Michalski



“Learning is making useful changes in our minds.”

Marvin Minsky

Machine Learning - Example


One of my favorite AI/Machine Learning sites:


http://www.20q.net/





Why learn?

Build software agents that can adapt to their users or to other software agents or to changing environments
  Personalized news or mail filter
  Personalized tutoring
  Mars robot

Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task
  Large, complex AI systems cannot be completely derived by hand and require dynamic updating to incorporate new information.

Discover new things or structure that were previously unknown to humans
  Examples: data mining, scientific discovery




Applications


Assign object/event to one of a given finite set of
categories.


Medical diagnosis


Credit card applications or transactions


Fraud detection in e-commerce


Spam filtering in email


Recommended books, movies, music


Financial investments


Spoken words


Handwritten letters


Major paradigms of machine learning

Rote learning
  “Learning by memorization.”
  Employed by the first machine learning systems, in the 1950s
  Samuel’s Checkers program

Supervised learning
  Use specific examples to reach general conclusions or extract general rules
  Classification (Concept learning)
  Regression

Unsupervised learning (Clustering)
  Unsupervised identification of natural groups in data

Reinforcement learning
  Feedback (positive or negative reward) given at the end of a sequence of steps

Analogy
  Determine correspondence between two different representations

Discovery
  Unsupervised, specific goal not given



Rote Learning is Limited

Memorize I/O pairs and perform exact matching with new inputs

If a computer has not seen the precise case before, it cannot apply its experience

We want computers to “generalize” from prior experience
  Generalization is the most important factor in learning
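A minimal sketch of the contrast above, assuming a toy task that maps a fruit's weight in grams to a label. The data, the function names `rote_predict` and `nearest_neighbour_predict`, and the choice of a 1-nearest-neighbour rule as the generalizing learner are illustrative assumptions, not something taken from the slides.

```python
# Toy training set of I/O pairs: weight in grams -> label (made-up values).
training_pairs = {80: "tangerine", 95: "tangerine", 180: "orange", 210: "orange"}


def rote_predict(x):
    """Rote learning: exact lookup; returns None for any input not seen before."""
    return training_pairs.get(x)


def nearest_neighbour_predict(x):
    """One simple way to generalize: use the label of the closest stored input."""
    closest = min(training_pairs, key=lambda seen: abs(seen - x))
    return training_pairs[closest]


print(rote_predict(95))                # seen before, so the lookup succeeds
print(rote_predict(100))               # never seen, so rote learning has no answer
print(nearest_neighbour_predict(100))  # generalizes from the nearest stored case
```

The rote learner fails on 100 g only because that exact input was never stored, while the nearest-neighbour rule still produces an answer from prior experience.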

The inductive learning problem

Extrapolate from a given set of examples to make accurate predictions about future examples

Supervised versus unsupervised learning
  Learn an unknown function f(X) = Y, where X is an input example and Y is the desired output.
  Supervised learning implies we are given a training set of (X, Y) pairs by a “teacher”
  Unsupervised learning means we are only given the Xs.
  Semi-supervised learning: mostly unlabelled data, together with a few labelled examples
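The difference in what the learner is given can be sketched as follows. This is an illustration under assumptions: the one-dimensional data, the labels "small"/"large", the midpoint-threshold learner, and the largest-gap grouping are all made up for the example and are only one of many ways to do either task.

```python
# Supervised: a "teacher" supplies (X, Y) pairs. Values are invented.
labelled = [(1.0, "small"), (1.4, "small"), (2.1, "small"),
            (7.9, "large"), (8.3, "large"), (9.0, "large")]


def learn_threshold(pairs):
    """Supervised: use the labels to place a boundary between the two classes."""
    largest_small = max(x for x, y in pairs if y == "small")
    smallest_large = min(x for x, y in pairs if y == "large")
    boundary = (largest_small + smallest_large) / 2

    def f(x):
        return "small" if x < boundary else "large"
    return f


def split_at_largest_gap(xs):
    """Unsupervised: no labels, so group the Xs on either side of the widest gap."""
    ordered = sorted(xs)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


f = learn_threshold(labelled)
unlabelled = [x for x, _ in labelled]  # the same inputs with the Ys withheld
groups = split_at_largest_gap(unlabelled)
print(f(3.0), f(6.5))
print(groups)
```

The supervised learner returns named outputs for new inputs; the unsupervised one can only report that the inputs fall into two groups, without knowing what the groups mean.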


Types of supervised learning

a) Classification:
  We are given the label of the training objects: {(x1, x2, y=T/O)}
  We are interested in classifying future objects (x1’, x2’) with the correct label.
  I.e. find y’ for given (x1’, x2’).

b) Concept Learning:
  We are given positive and negative samples for the concept we want to learn (e.g. Tangerine): {(x1, x2, y=+/-)}
  We are interested in classifying future objects as member of the class (or positive example for the concept) or not.
  I.e. answer +/- for given (x1’, x2’).




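[Figure: examples plotted with axes x1 = size and x2 = color; labelled Tangerines / Oranges for classification and Tangerines / Not Tangerines for concept learning]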
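Both settings can be sketched with the same simple learner. The block below is an illustration, not the method used in the slides: it assumes a nearest-centroid rule, a size measured in centimetres, a color encoded on a 0 to 1 scale, and invented training values. The names `centroids` and `classify` are hypothetical.

```python
import math

# Labelled training objects (x1 = size in cm, x2 = color on a 0..1 scale, y).
# All numbers are invented; "T" = tangerine, "O" = orange.
training = [(5.0, 0.80, "T"), (5.5, 0.90, "T"), (4.8, 0.85, "T"),
            (7.5, 0.50, "O"), (8.0, 0.60, "O"), (7.8, 0.55, "O")]

# Concept learning uses the same objects, relabelled +/- for "Tangerine".
concept_training = [(x1, x2, "+" if y == "T" else "-") for x1, x2, y in training]


def centroids(examples):
    """Average the feature vectors of each label."""
    sums = {}
    for x1, x2, y in examples:
        total = sums.setdefault(y, [0.0, 0.0, 0])
        total[0] += x1
        total[1] += x2
        total[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}


def classify(x1, x2, model):
    """Return the label whose centroid is closest to the new object (x1', x2')."""
    return min(model, key=lambda y: math.hypot(x1 - model[y][0], x2 - model[y][1]))


model = centroids(training)
concept_model = centroids(concept_training)
print(classify(5.2, 0.9, model), classify(7.9, 0.5, model))                  # T/O
print(classify(5.2, 0.9, concept_model), classify(7.9, 0.5, concept_model))  # +/-
```

The only change between the two cases is the label set: classification answers with one of the given categories, concept learning answers whether the object is a positive example.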

Types of Supervised Learning

Regression
  Target function is continuous rather than class membership
  y = f(x)
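As a small illustration of learning a continuous target, the sketch below fits a straight line y = a*x + b by ordinary least squares. The straight-line model, the function name `fit_line`, and the data points are assumptions made for the example; the slide itself only states y = f(x).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up training data lying close to y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)


def predict(x):
    """The learned continuous function f(x)."""
    return a * x + b


print(a, b, predict(5.0))
```

Unlike a classifier, `predict` returns a real number for any input, including inputs between or beyond the training points.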

Example

[Figure: panels labelled Positive Examples and Negative Examples]

How does this symbol classify?

Concept
  Solid Red Circle in a (regular?) Polygon

What about?
  Figures on left side of page
  Figures drawn before 5pm 2/2/89 <etc>


Inductive learning framework: Feature Space

Raw input data from sensors are typically preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples

Each x is a list of (attribute, value) pairs
  x = [Color=Orange, Shape=Round, Weight=200g]

Each attribute can be discrete or continuous

Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes

Model space M defines the possible hypotheses
  M: X → C, M = {m_1, …, m_n} (possibly infinite)

Training data can be used to direct the search for a good (consistent, complete, simple) hypothesis in the model space
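One way to turn the (attribute, value) list above into a numeric point is sketched below. The `SCHEMA` dictionary, the allowed values for Color and Shape, and the one-hot encoding of discrete attributes are assumptions for illustration; the slides do not prescribe an encoding.

```python
# Hypothetical schema: discrete attributes list their allowed values,
# continuous attributes are marked with None.
SCHEMA = {
    "Color": ["Orange", "Yellow", "Green"],
    "Shape": ["Round", "Oblong"],
    "Weight": None,  # continuous, in grams
}


def to_feature_vector(example):
    """Map (attribute, value) pairs to a numeric point.

    Discrete attributes are one-hot encoded; continuous ones are copied as floats.
    """
    vector = []
    for attribute, allowed in SCHEMA.items():
        value = example[attribute]
        if allowed is None:
            vector.append(float(value))
        else:
            vector.extend(1.0 if value == option else 0.0 for option in allowed)
    return vector


x = {"Color": "Orange", "Shape": "Round", "Weight": 200}
print(to_feature_vector(x))
```

Note that one-hot encoding uses one coordinate per allowed value, so the numeric vector here has six coordinates even though the example has three attributes.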


Feature Space

[Figure: feature space with axes Size, Color, and Weight; marked values Big, 2500, Gray; a query point “?”]

A “concept” is then a (possibly disjoint) volume in this space.

Learning: Key Steps

• data and assumptions
  what data is available for the learning task?
  what can we assume about the problem?

• representation
  how should we represent the examples to be classified?

• method and estimation
  what are the possible hypotheses?
  what learning algorithm to use to infer the most likely hypothesis?
  how do we adjust our predictions based on the feedback?

• evaluation
  how well are we doing?




Evaluation of Learning Systems

Experimental
  Conduct controlled cross-validation experiments to compare various methods on a variety of benchmark datasets.
  Gather data on their performance, e.g. test accuracy, training time, testing time.
  Analyze differences for statistical significance.

Theoretical
  Analyze algorithms mathematically and prove theorems about their:
    Computational complexity
    Ability to fit training data
    Sample complexity (number of training examples needed to learn an accurate function)
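The experimental route can be sketched as a k-fold cross-validation loop. This is a minimal illustration under assumptions: the fold assignment (example i goes to fold i mod k), the 1-nearest-neighbour learner, and the toy data are all made up, and the statistical significance test mentioned above is not shown.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices); example i is held out in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_examples) if i % k == fold]
        train = [i for i in range(n_examples) if i % k != fold]
        yield train, test


def nearest_neighbour(train_xs, train_ys):
    """A deliberately simple learner to evaluate: 1-nearest-neighbour on one feature."""
    def predict(x):
        best = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
        return train_ys[best]
    return predict


def cross_validate(make_learner, xs, ys, k):
    """Return the test accuracy measured on each of the k held-out folds."""
    accuracies = []
    for train, test in k_fold_splits(len(xs), k):
        predict = make_learner([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies


# Invented, well-separated data: even positions are "low", odd positions "high".
xs = [1.0, 6.0, 1.2, 6.3, 1.9, 7.1, 2.1, 7.4, 2.6, 8.0]
ys = ["low", "high"] * 5
fold_accuracies = cross_validate(nearest_neighbour, xs, ys, k=5)
print(fold_accuracies, sum(fold_accuracies) / len(fold_accuracies))
```

Every example is used for testing exactly once and never appears in the training set of its own fold. To compare methods, the same splits would be passed to each learner and the per-fold accuracies compared.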



Measuring Performance

Performance of the learner can be measured in one of the
following ways, as suitable for the application:


Accuracy performance


Number of mistakes


Mean Square Error


Solution quality (length, efficiency)


Speed of performance
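The first three measures in the list can be written down directly. The function names and the sample values below are illustrative assumptions; solution quality and speed depend on the application and are not sketched.

```python
def number_of_mistakes(predicted, actual):
    """Count predictions that differ from the true labels."""
    return sum(p != a for p, a in zip(predicted, actual))


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return 1 - number_of_mistakes(predicted, actual) / len(actual)


def mean_square_error(predicted, actual):
    """Average squared difference, for continuous targets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


labels_predicted = ["spam", "ham", "spam", "spam"]
labels_actual = ["spam", "ham", "ham", "spam"]
print(number_of_mistakes(labels_predicted, labels_actual))
print(accuracy(labels_predicted, labels_actual))
print(mean_square_error([1.0, 3.0, 2.0], [1.0, 2.0, 4.0]))
```

Accuracy and number of mistakes suit classification, while mean square error suits regression, where being close counts for more than being exactly right.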





Curse of Dimensionality

Imagine a learning task, such as recognizing printed characters.

Intuitively, adding more attributes would help the learner, as more information never hurts, right?

In fact, sometimes it does, due to what is called the curse of dimensionality.
  It can be summarized as the situation where the available data may not be sufficient to compensate for the increased number of parameters that comes with increased dimensionality.
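A back-of-the-envelope sketch of why the data runs out: assume, purely for illustration, that each attribute is discretized into 10 bins and that the learner keeps one parameter (for example, a class count) per cell of the resulting grid. The bin count, the 1000-example training set, and the function names are assumptions, not figures from the slides.

```python
def number_of_cells(bins_per_attribute, n_attributes):
    """Cells in a grid over the feature space: grows exponentially with dimension."""
    return bins_per_attribute ** n_attributes


def max_fraction_of_cells_seen(n_examples, bins_per_attribute, n_attributes):
    """Upper bound on the share of cells that contain at least one training example."""
    return min(1.0, n_examples / number_of_cells(bins_per_attribute, n_attributes))


for d in (1, 2, 3, 6, 10):
    print(d, number_of_cells(10, d), max_fraction_of_cells_seen(1000, 10, d))
```

With 1000 examples the grid can be fully covered only up to three attributes; at six attributes at most 0.1% of the cells can contain any data, so most of the per-cell parameters would have to be estimated from nothing.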


Polynomial Curve Fitting


Sum-of-Squares Error Function
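The formula on this slide did not survive as text. Assuming the slide follows the usual textbook setup for polynomial curve fitting (an assumption, since only the title remains), a polynomial of order $M$ with coefficients $\mathbf{w}$ is fitted to $N$ training pairs $(x_n, t_n)$ by minimizing the sum of squared differences between predictions and targets. The symbols $M$, $N$, $\mathbf{w}$, $x_n$, and $t_n$ are introduced here for illustration.

```latex
y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j,
\qquad
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2
```

The factor $\tfrac{1}{2}$ is a common convention that simplifies the derivative; it does not change which $\mathbf{w}$ minimizes the error.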

0th Order Polynomial

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting

Root-Mean-Square (RMS) Error:
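The formula itself is missing from the extracted text. A standard definition, given here as an assumption about what the slide showed, takes the fitted polynomial $y(x, \mathbf{w}^{\star})$, averages its squared errors over the $N$ points $(x_n, t_n)$ of the data set being evaluated, and takes the square root. The symbols are introduced for illustration.

```latex
E_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}^{\star}) - t_n \bigr)^2}
```

Dividing by $N$ makes training and test sets of different sizes comparable, and the square root puts the error on the same scale as the target $t$. Over-fitting shows up as a small $E_{\mathrm{RMS}}$ on the training set together with a large one on the test set.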

Data Set Size: 9th Order Polynomial

[Figures: 9th-order polynomial fits for different data set sizes]
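The polynomial-fitting slides can be reproduced in miniature. This is a sketch under stated assumptions: the target function sin(2πx), the 10 evenly spaced training points, and the fixed alternating ±0.3 offsets (used instead of random noise so the run is reproducible) are all invented for the example. Orders 0, 1, and 3 are fitted by least squares via the normal equations; the 9th-order polynomial through 10 points is the interpolating polynomial, so it is evaluated in Lagrange form to avoid an ill-conditioned linear solve. The effect of data set size is not shown.

```python
import math


def solve_linear_system(a, b):
    """Solve a*w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_least_squares(xs, ts, order):
    """Return y(x) for the least-squares polynomial of the given order."""
    k = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(k)]
    w = solve_linear_system(a, b)

    def y(x):
        return sum(wi * x ** i for i, wi in enumerate(w))
    return y


def interpolate(xs, ts):
    """Return y(x) for the order len(xs)-1 polynomial through every point."""
    def y(x):
        total = 0.0
        for i, (xi, ti) in enumerate(zip(xs, ts)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += ti * basis
        return total
    return y


def rms_error(y, xs, ts):
    """Root-mean-square difference between predictions y(x) and targets t."""
    return math.sqrt(sum((y(x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs))


N = 10
train_xs = [n / (N - 1) for n in range(N)]
train_ts = [math.sin(2 * math.pi * x) + 0.3 * (-1) ** n
            for n, x in enumerate(train_xs)]
test_xs = [i / 100 for i in range(101)]
test_ts = [math.sin(2 * math.pi * x) for x in test_xs]  # noise-free target

results = {}
for order in (0, 1, 3):
    y = fit_least_squares(train_xs, train_ts, order)
    results[order] = (rms_error(y, train_xs, train_ts),
                      rms_error(y, test_xs, test_ts))
y9 = interpolate(train_xs, train_ts)
results[9] = (rms_error(y9, train_xs, train_ts), rms_error(y9, test_xs, test_ts))

for order, (train_rms, test_rms) in results.items():
    print(f"order {order}: train RMS = {train_rms:.3f}, test RMS = {test_rms:.3f}")
```

Training error falls as the order rises and reaches zero at order 9, because that polynomial passes through every training point. Its test error is nevertheless worse than that of the 3rd-order fit, which is the over-fitting pattern the slides describe.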

Issues in Machine Learning

Training Experience
  What can be the training experience (labelled samples, self-play, …)?

Target Function
  What should we aim to learn?
  What should the representation of the target function be (features, hypothesis class, …)?

Learning:
  What learning algorithms exist for learning general target functions from specific training examples?
  Which algorithms can approximate functions well (and when)?
  How does noisy data influence accuracy?
  How does the number of training samples influence accuracy?

Training Data:
  How much training data is sufficient?
  How does the number of training examples influence accuracy?
  What is the best strategy for choosing a useful next training experience? How does it affect complexity?

Prior Knowledge/Domain Knowledge:
  When and how can prior knowledge guide the learning process?

Evaluation:
  What specific target functions/performance measure should the system attempt to learn?