
Machine Learning

Saarland University, SS 2007

Holger Bast

Marjan Celikik

Kevin Chang

Stefan Funke

Joachim Giesen


Max-Planck-Institut für Informatik

Saarbrücken, Germany

Lecture 1, Friday April 19th, 2007

(basics and example applications)

Overview of this Lecture


Machine Learning Basics


Classification


Objects as feature vectors


Regression


Clustering


Example applications


Surface reconstruction


Preference Learning


Netflix challenge (how to earn $1,000,000)


Text search

Classification


Given a set of points, each labeled + or –

learn something from them …

… in order to predict the label of new points (see the code sketch below)

[Figure: labeled points (+ / –) in the plane, with a new unlabeled point marked "?"]



this is an instance of
supervised learning
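As a concrete illustration of the classification task (not from the original slides), here is a minimal nearest-neighbour classifier in Python/NumPy: it stores the labeled points and predicts the label of a new point by majority vote among its k closest neighbours. The data values are made up, and nearest neighbours is just one of many possible classifiers.

    # Minimal k-nearest-neighbour classifier (illustrative sketch, NumPy only).
    import numpy as np

    def knn_predict(X_train, y_train, x_new, k=3):
        """Predict the label (+1/-1) of x_new by majority vote among its k nearest training points."""
        dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]                   # indices of the k closest points
        return 1 if y_train[nearest].sum() > 0 else -1

    # toy data: points in the plane, labeled +1 ("+") or -1 ("-")
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],     # "+" cluster
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.0]])    # "-" cluster
    y = np.array([+1, +1, +1, -1, -1, -1])

    print(knn_predict(X, y, np.array([1.8, 1.2])))        # -> +1
    print(knn_predict(X, y, np.array([5.7, 5.4])))        # -> -1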

Classification


Quality

Which classifier is better?

answer requires a model of where the data comes from

and a measure of quality/accuracy

[Figure: the labeled points and the new point "?" again, with two candidate classifiers]
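One standard way to make "quality" concrete (a common convention, not prescribed by the slide) is to hold out part of the labeled data and measure the accuracy of the learned classifier on those unseen points. A small sketch with synthetic data, using scikit-learn:

    # Estimating classifier quality as accuracy on held-out data (illustrative sketch).
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    # synthetic data: two Gaussian blobs, labels +1 and -1
    X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
                   rng.normal([3, 3], 1.0, (100, 2))])
    y = np.array([+1] * 100 + [-1] * 100)

    # hold out 30% of the points to estimate accuracy on data the classifier has not seen
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    print("accuracy on held-out data:", accuracy_score(y_test, clf.predict(X_test)))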

Classification


Outliers and Overfitting

We have to find a balance between two extremes

oversimplification (large classification error)

overfitting (lack of regularity)

again: requires a model of the data

[Figure: the labeled points, now including an outlier]
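The trade-off can be made visible with any classifier that has a "complexity knob". A sketch (my choice of example, not the slide's) using the number of neighbours k in a k-NN classifier: k = 1 memorizes every outlier (overfitting), while a very large k almost ignores local structure (oversimplification).

    # Oversimplification vs. overfitting, illustrated with k-NN on noisy synthetic data.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal([0, 0], 1.0, (200, 2)),
                   rng.normal([2.5, 2.5], 1.0, (200, 2))])
    y = np.array([+1] * 200 + [-1] * 200)
    flip = rng.choice(len(y), size=20, replace=False)     # a few "outliers": flipped labels
    y[flip] = -y[flip]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)
    for k in (1, 15, 199):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        print(f"k = {k:3d}   train accuracy = {clf.score(X_tr, y_tr):.2f}"
              f"   test accuracy = {clf.score(X_te, y_te):.2f}")

Typically k = 1 scores perfectly on the training points but worse on the held-out points, while a moderate k generalizes best.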







Classification


Point Transformation

If a classifier does not work for the original data

try it on a transformation of the data

typically: make the points linearly separable by a suitable mapping to a higher-dimensional space (as in the sketch below)

[Figure: 1-D points around 0 that are not linearly separable; after mapping x to (x, |x|) they become separable in the plane]

map x to (x, |x|)
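The (x, |x|) mapping can be checked directly in code; this small sketch uses made-up 1-D points where the "+" labels lie far from 0 and the "–" labels lie close to 0:

    # 1-D points that are not linearly separable become separable after mapping x -> (x, |x|).
    import numpy as np

    x = np.array([-3.0, -2.5, -0.5, -0.2, 0.3, 0.4, 2.6, 3.1])   # toy 1-D data
    y = np.array([+1,   +1,   -1,   -1,  -1,  -1,  +1,  +1])     # far from 0 -> +, near 0 -> -

    # no single threshold on x separates + from -, but after the map ...
    phi = np.column_stack([x, np.abs(x)])      # x -> (x, |x|)

    # ... the horizontal line |x| = 1 (a linear separator in the new space) does the job
    predictions = np.where(phi[:, 1] > 1.0, +1, -1)
    print(np.all(predictions == y))            # -> True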

Classification


more labels

[Figure: points with several different labels (+, o, …)]


Typically:


first, basic technique for binary classification


then, extension to more labels
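One common way to extend a binary classifier to more labels (one option among several; one-vs-one is another) is "one vs. rest": train one binary classifier per label and predict with the most confident one. A small scikit-learn sketch with made-up data:

    # "One vs. rest" extension of a binary classifier to three labels (illustrative sketch).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)),
                   rng.normal([3, 0], 0.5, (50, 2)),
                   rng.normal([0, 3], 0.5, (50, 2))])
    y = np.array(["+"] * 50 + ["o"] * 50 + ["x"] * 50)    # three different labels

    clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
    print(clf.predict([[2.8, 0.2], [0.1, 2.9]]))          # -> ['o' 'x']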

Objects as Feature Vectors


But why learn something about points?

General Idea:

represent objects as points in a space of fixed dimension

each dimension corresponds to a so-called feature of the object

Crucial:

selection of features

normalization of the vectors

Objects as Feature Vectors


Example: Objects with attributes

features = attribute values

normalize by a reference value for each feature

             Person 1    Person 2    Person 3    Person 4
height       188 cm      181 cm      190 cm      176 cm
weight        75 kg       90 kg       77 kg       55 kg
age           36          32          34          24

feature vectors (height, weight, age):

(188, 75, 36)   (181, 90, 32)   (190, 77, 34)   (176, 55, 24)

normalized feature vectors (height/180, weight/80, age/40):

(1.04, 0.94, 0.90)   (1.01, 1.13, 0.80)   (1.06, 0.96, 0.85)   (0.98, 0.69, 0.60)
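A sketch of the same normalization in code, assuming the reference values 180 cm, 80 kg and 40 years used in the table above:

    # Objects with attributes -> normalized feature vectors (illustrative sketch).
    import numpy as np

    persons = np.array([[188, 75, 36],     # height [cm], weight [kg], age [years]
                        [181, 90, 32],
                        [190, 77, 34],
                        [176, 55, 24]], dtype=float)

    reference = np.array([180.0, 80.0, 40.0])   # one reference value per feature
    features = persons / reference              # normalized feature vectors, one row per person
    print(np.round(features, 2))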

Objects as Feature Vectors

Example: Images

features = pixels (with grey values)

often fine without further normalization

Image 1 (grey values)      Image 2 (grey values)
2  8  2                    1  6  1
8  5  8                    6  6  6
2  7  2                    1  6  1

feature vectors (pixels (1,1), (1,2), (1,3), (2,1), …, (3,3)):

Image 1: (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2: (1, 6, 1, 6, 6, 6, 1, 6, 1)
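In code, turning an image into a feature vector is just listing its pixel values (a minimal NumPy sketch):

    # Images as feature vectors: list the grey values pixel by pixel.
    import numpy as np

    image1 = np.array([[2, 8, 2],
                       [8, 5, 8],
                       [2, 7, 2]])
    image2 = np.array([[1, 6, 1],
                       [6, 6, 6],
                       [1, 6, 1]])

    v1 = image1.flatten()            # pixels (1,1), (1,2), ..., (3,3) as one 9-dimensional vector
    v2 = image2.flatten()
    print(v1)                        # [2 8 2 8 5 8 2 7 2]
    print(np.linalg.norm(v1 - v2))   # Euclidean distance between the two images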

Objects as Feature Vectors


Example: Text documents

features = words

normalize to unit norm

Doc. 1: Machine Learning, SS 2007
Doc. 2: Statistical Learning Theory, SS 2007
Doc. 3: Statistical Learning Theory, SS 2006

word vectors:

              Doc. 1   Doc. 2   Doc. 3
Learning        1        1        1
Machine         1        0        0
SS              1        1        1
Statistical     0        1        1
Theory          0        1        1
2006            0        0        1
2007            1        1        0

Objects as Feature Vectors


Example: Text documents

features = words

normalize to unit norm

              Doc. 1   Doc. 2   Doc. 3
Learning       0.5      0.4      0.4
Machine        0.5      0        0
SS             0.5      0.4      0.4
Statistical    0        0.4      0.4
Theory         0        0.4      0.4
2006           0        0        0.4
2007           0.5      0.4      0

(Doc. 1 contains 4 distinct words, so each nonzero entry is 1/√4 = 0.5; Docs. 2 and 3 contain 5 distinct words, so each nonzero entry is 1/√5 ≈ 0.45, shown as 0.4.)
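A minimal sketch of the whole construction, building word-count vectors for the three documents and normalizing them to unit Euclidean norm:

    # Documents as word vectors, normalized to unit norm (illustrative sketch).
    import numpy as np

    docs = ["Machine Learning SS 2007",
            "Statistical Learning Theory SS 2007",
            "Statistical Learning Theory SS 2006"]

    vocabulary = sorted({word for doc in docs for word in doc.split()})
    counts = np.array([[doc.split().count(word) for word in vocabulary] for doc in docs],
                      dtype=float)

    # divide each row by its Euclidean length -> unit-norm document vectors
    vectors = counts / np.linalg.norm(counts, axis=1, keepdims=True)
    print(vocabulary)
    print(np.round(vectors, 2))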

Regression


Learn a function that maps objects to values

Similar trade-off as for classification:

risk of oversimplification vs. risk of overfitting

[Figure: data points (x) and a new position "?"; horizontal axis: given value (typically multi-dimensional), vertical axis: value to learn (typically a real number)]
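A minimal regression sketch (a least-squares line fit, one standard choice; all numbers made up):

    # Regression: fit a least-squares line to the points and predict the value at a new position.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])    # given values
    y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9])    # values to learn

    a, b = np.polyfit(x, y, deg=1)        # least-squares fit of y ≈ a*x + b
    x_new = 6.5
    print("predicted value at", x_new, "->", a * x_new + b)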


Clustering


Partition a given set of points into clusters

Similar problems as in classification:

follow the data distribution, but not too closely

a transformation often helps (see the Transformation slide below)

[Figure: a set of unlabeled points (x) to be grouped into clusters]

this is an instance of
unsupervised learning
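As one concrete clustering method (the slide does not prescribe a particular one), k-means partitions the points into k clusters around k centres:

    # Clustering with k-means (illustrative sketch, scikit-learn).
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal([0, 0], 0.7, (30, 2)),    # two made-up groups of points
                   rng.normal([4, 4], 0.7, (30, 2))])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=3).fit(X)
    print(kmeans.labels_)            # cluster index (0 or 1) for every point
    print(kmeans.cluster_centers_)   # the two cluster centres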


Clustering


Transformation

For clustering, typically a dimension reduction helps

whereas in classification, an embedding into a higher-dimensional space typically helps

word-document matrix:

             doc1   doc2   doc3   doc4   doc5
internet       1      0      1      0      0
web            1      1      0      0      0
surfing        1      1      1      1      0
beach          0      0      0      1      1

the vectors for documents 2, 3, and 4 are equally dissimilar

project to 2 dimensions:

             doc1   doc2   doc3   doc4   doc5
dim 1        0.9    0.8    0.8    0.0    0.0
dim 2       -0.1    0.0    0.0    1.1    0.9

a 2-clustering would work fine now
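A dimension reduction of this kind can be computed with a truncated singular value decomposition (as in latent semantic indexing). The sketch below applies a plain SVD to the word-document matrix above; the exact coordinates depend on the method, so they will not match the slide's numbers exactly, but documents 1-3 end up close together and so do documents 4-5.

    # Projecting the 4-dimensional document vectors to 2 dimensions with a truncated SVD.
    import numpy as np

    # word-document matrix from the example (rows: internet, web, surfing, beach)
    A = np.array([[1, 0, 1, 0, 0],
                  [1, 1, 0, 0, 0],
                  [1, 1, 1, 1, 0],
                  [0, 0, 0, 1, 1]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    docs_2d = (np.diag(s[:2]) @ Vt[:2]).T     # each document as a 2-dimensional vector
    print(np.round(docs_2d, 2))               # rows: doc1 ... doc5
    # a 2-clustering of these rows separates the "internet" documents (1-3)
    # from the "beach" documents (4-5)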

Application Example: Text Search


676 abstracts from the Max-Planck-Institute


for example:


We present two theoretically interesting and empirically successful
techniques for improving the linear programming approaches,
namely graph transformation and local cuts, in the context of the
Steiner problem. We show the impact of these techniques on the
solution of the largest benchmark instances ever solved.


3283 words (words like and, or, this, … removed)


abstracts come from 5 working groups: Algorithms, Logic,
Graphics, CompBio, Databases


reduce to 10 concepts

No dictionary, no training, only the plain text itself!
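A sketch of the kind of pipeline this describes (my reconstruction, not the lecture's actual code): build word vectors for the abstracts and reduce them to a small number of "concept" dimensions with a truncated SVD. The variable `abstracts` is a stand-in for the 676 MPI abstracts, which are not included here.

    # Word vectors for a text collection, reduced to a few "concepts" (illustrative sketch).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    abstracts = [
        "We present two techniques for improving linear programming approaches ...",
        "A new algorithm for surface reconstruction from point clouds ...",
        # ... one string per abstract
    ]

    counts = CountVectorizer(stop_words="english").fit_transform(abstracts)   # words as features
    # with the real 676 abstracts one would reduce to n_components=10 concepts;
    # the tiny toy list above only supports 2
    concepts = TruncatedSVD(n_components=2).fit_transform(counts)
    print(concepts.shape)    # (number of abstracts, number of concepts)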