Machine Learning Lecture 1 - Max-Planck-Institut für Informatik


Machine Learning

Saarland University, SS 2007

Holger Bast

Marjan Celikik

Kevin Chang

Stefan Funke

Joachim Giesen

Max-Planck-Institut für Informatik

Saarbrücken, Germany

Lecture 1, Friday, April 19th, 2007

(basics and example applications)

Overview of this Lecture

Machine Learning Basics

Classification

Objects as feature vectors

Regression

Clustering

Example applications

Surface reconstruction

Preference Learning

Netflix challenge (how to earn $1,000,000)

Text search

Classification

Given a set of points, each labeled + or –,

learn something from them …

… in order to predict the label of new points

[Figure: labeled + and – points in the plane; a new, unlabeled point is marked ?]

this is an instance of
supervised learning
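To make this concrete, here is a minimal sketch of one of the simplest classifiers, 1-nearest-neighbor: a new point gets the label of its closest training point. The points and labels are made up for illustration; this is not necessarily the method the lecture develops later.

```python
import numpy as np

# Labeled training points (2-D) with labels +1 / -1; values are made up.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 1.0],   # class +1
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]])  # class -1
y = np.array([+1, +1, +1, -1, -1, -1])

def predict_1nn(x_new):
    """Predict the label of x_new as the label of its nearest training point."""
    dists = np.linalg.norm(X - x_new, axis=1)
    return y[np.argmin(dists)]

print(predict_1nn(np.array([1.8, 1.8])))  # -> 1, since it lies near the + cluster
```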

Classification

Quality

Which classifier is better?

Answering this requires a model of where the data comes from

and a measure of quality/accuracy

[Figure: the same labeled points with a new point marked ?, separated by different candidate classifiers]

Classification

Outliers and Overfitting

We have to find a balance between two extremes

oversimplification (large classification error)

overfitting (zero classification error on the given points, but poor prediction for new ones)

again: this requires a model of the data

[Figure: labeled + and – points including an outlier]
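One way to see this trade-off is through the neighborhood size k in a k-nearest-neighbor classifier (a hypothetical illustration, not how the lecture formalizes it): k = 1 bends around every outlier, while k = n predicts the majority label everywhere.

```python
import numpy as np

def predict_knn(X, y, x_new, k):
    """Majority vote among the k nearest training points (labels are +1/-1).

    k = 1 follows every outlier (overfitting); k = len(X) always predicts
    the overall majority label (oversimplification)."""
    dists = np.linalg.norm(X - x_new, axis=1)
    nearest_labels = y[np.argsort(dists)[:k]]
    return +1 if nearest_labels.sum() >= 0 else -1
```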

Classification

Point Transformation

If a classifier does not work for the original data,

try it on a transformation of the data

typically: make points linearly separable by a suitable mapping to a higher-dimensional space

example: map x to (x, |x|)

[Figure: 1-D labeled points around 0, not separable by a single threshold, become linearly separable in the plane after the mapping]
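A small numpy sketch of this particular mapping, with made-up 1-D points that no single threshold on x can separate:

```python
import numpy as np

# 1-D points: one class sits near 0, the other far from 0 on both sides,
# so no single threshold on x separates them (values are made up).
x = np.array([-5.0, -4.0, 4.0, 5.0, -1.0, -0.5, 0.5, 1.0])
y = np.array([+1, +1, +1, +1, -1, -1, -1, -1])

# Map x to (x, |x|): in the plane, the horizontal line |x| = 2
# separates the two classes.
X2 = np.column_stack([x, np.abs(x)])
print(np.all((X2[:, 1] > 2) == (y == +1)))  # -> True: linearly separable
```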

Classification

More Labels

[Figure: points with several different labels (+, o, …) in the plane]

Typically:

first, a basic technique for binary classification

then, an extension to more labels
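The slides leave open how the extension works; one standard scheme is one-vs-rest: train one binary scorer per label and predict the label whose scorer is most confident. A sketch with hypothetical scorer functions:

```python
def one_vs_rest_predict(scorers, x):
    """scorers maps each label to a function scoring 'this label vs. the rest'.

    Predict the label whose binary scorer is most confident about x."""
    return max(scorers, key=lambda label: scorers[label](x))

# usage with hypothetical binary score functions for labels '+', '-', 'o':
# label = one_vs_rest_predict({'+': f_plus, '-': f_minus, 'o': f_o}, x)
```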

Objects as Feature Vectors

How do objects become points?

General Idea:

represent objects as points in a space of fixed dimension

each dimension corresponds to a so-called feature of the object

Crucial:

the selection of features

the normalization of the vectors

Objects as Feature Vectors

Example: Objects with attributes

features = attribute values

normalize by a reference value for each feature

            height    weight    age
Person 1    188 cm    75 kg     36
Person 2    181 cm    90 kg     32
Person 3    190 cm    77 kg     34
Person 4    176 cm    55 kg     24

feature vectors (height, weight, age):

Person 1: (188, 75, 36)
Person 2: (181, 90, 32)
Person 3: (190, 77, 34)
Person 4: (176, 55, 24)

normalized by height/180, weight/80, age/40:

Person 1: (1.04, 0.94, 0.90)
Person 2: (1.01, 1.13, 0.80)
Person 3: (1.06, 0.96, 0.85)
Person 4: (0.98, 0.69, 0.60)
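A minimal numpy sketch of this normalization step, using the reference values 180 cm, 80 kg, and 40 years from the table above:

```python
import numpy as np

# Rows: persons; columns: height [cm], weight [kg], age [years].
persons = np.array([[188, 75, 36],
                    [181, 90, 32],
                    [190, 77, 34],
                    [176, 55, 24]], dtype=float)

# Divide each feature by its reference value so all features
# have comparable magnitude.
reference = np.array([180.0, 80.0, 40.0])
print(np.round(persons / reference, 2))
```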

Objects as Feature Vectors

Example: Images

features = pixels (with grey values)

often fine without further normalization

Image 1:    Image 2:
2 8 2       1 6 1
8 5 8       6 6 6
2 7 2       1 6 1

feature vectors, reading pixels (1,1), (1,2), (1,3), (2,1), …, (3,3):

Image 1: (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2: (1, 6, 1, 6, 6, 6, 1, 6, 1)
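In code, turning a grey-value image into a feature vector is just flattening the pixel grid row by row (numpy sketch with the values of Image 1):

```python
import numpy as np

# A 3x3 grey-value image becomes a 9-dimensional feature vector
# by reading the pixels row by row: (1,1), (1,2), ..., (3,3).
image1 = np.array([[2, 8, 2],
                   [8, 5, 8],
                   [2, 7, 2]])
print(image1.flatten())  # -> [2 8 2 8 5 8 2 7 2]
```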

Objects as Feature Vectors

Example: Text documents

features = words

normalize to unit norm (next slide)

Doc. 1: Machine Learning SS 2007
Doc. 2: Statistical Learning Theory SS 2007
Doc. 3: Statistical Learning Theory SS 2006

word-count vectors over (Learning, Machine, SS, Statistical, Theory, 2006, 2007):

Doc. 1: (1, 1, 1, 0, 0, 0, 1)
Doc. 2: (1, 0, 1, 1, 1, 0, 1)
Doc. 3: (1, 0, 1, 1, 1, 1, 0)

Objects as Feature Vectors

Example: Text documents

features = words

normalized to unit (Euclidean) norm:

Doc. 1: (0.5, 0.5, 0.5, 0, 0, 0, 0.5)
Doc. 2: (0.45, 0, 0.45, 0.45, 0.45, 0, 0.45)
Doc. 3: (0.45, 0, 0.45, 0.45, 0.45, 0.45, 0)

(Doc. 1 has four nonzero entries, each 1/√4 = 0.5; Docs. 2 and 3 have five, each 1/√5 ≈ 0.45)
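A minimal sketch of both steps, counting words and dividing by the Euclidean norm (the whitespace tokenization is an illustration-only shortcut):

```python
import numpy as np

vocabulary = ["Learning", "Machine", "SS", "Statistical", "Theory", "2006", "2007"]
docs = ["Machine Learning SS 2007",
        "Statistical Learning Theory SS 2007",
        "Statistical Learning Theory SS 2006"]

for doc in docs:
    words = doc.split()
    v = np.array([float(words.count(w)) for w in vocabulary])
    v /= np.linalg.norm(v)   # normalize to unit (Euclidean) norm
    print(np.round(v, 2))
```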

Regression

Learn a function that maps objects to values

same trade-off as for classification:

risk of oversimplification vs. risk of overfitting

[Figure: sample points (x, y) with a fitted curve; the value to predict at a new point is marked ?]

given value: typically multi-dimensional

value to learn: typically a real number
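As an illustration (not the lecture's method), fitting a line by least squares on made-up samples; raising the polynomial degree until the curve interpolates every point would be the overfitting extreme:

```python
import numpy as np

# Made-up 1-D samples (x, y); fit a degree-1 polynomial (a line) by
# least squares and use it to predict the value at a new point.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])
slope, intercept = np.polyfit(x, y, deg=1)
print(slope * 5.0 + intercept)  # predicted value at x = 5
# deg=4 would interpolate all five points exactly -- a typical overfit
```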


Clustering

Partition a given set of points into clusters

Similar problems as for classification:

follow the data distribution, but not too closely

a transformation often helps (next slide)

[Figure: unlabeled points in the plane forming two groups]

this is an instance of unsupervised learning
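A compact sketch of one classic clustering method, k-means (Lloyd's algorithm); the lecture has not committed to a particular method at this point:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternately assign each point to its nearest center
    and move each center to the mean of its assigned points.
    Assumes no cluster ever becomes empty (fine for a sketch)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distances of all points to all centers, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# usage: labels, centers = kmeans(points, k=2) for an (n, 2) array of points
```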


Clustering

Transformation

For clustering, dimension reduction typically helps,

whereas in classification, embedding in a higher-dimensional space typically helps

word-document matrix (rows = words, columns = documents):

            doc1  doc2  doc3  doc4  doc5
internet      1     0     1     0     0
web           1     1     0     0     0
surfing       1     1     1     1     0
beach         0     0     0     1     1

the vectors for documents 2, 3, and 4 are equally dissimilar (each pair differs in exactly two entries)

projected to 2 dimensions:

            doc1  doc2  doc3  doc4  doc5
dim 1       0.9   0.8   0.8   0.0   0.0
dim 2      -0.1   0.0   0.0   1.1   0.9

a 2-clustering would work fine now: documents 1–3 form one cluster, documents 4–5 the other
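One standard way to obtain such a 2-D projection is a rank-2 singular value decomposition of the word-document matrix; whether the slide uses exactly this is an assumption, and the signs and scaling of the coordinates may differ from the numbers above:

```python
import numpy as np

# Word-document matrix from the slide: rows = internet, web, surfing, beach;
# columns = documents 1..5.
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Rank-2 projection via SVD; each document becomes a 2-D point.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = (np.diag(s[:2]) @ Vt[:2]).T   # one row per document
print(np.round(docs_2d, 1))             # docs 1-3 land close together, so do 4-5
```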

Application Example: Text Search

676 abstracts from the Max-Planck-Institute

for example:

We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.

3283 distinct words (stop words like "and", "or", "this", … removed)

the abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases

reduce to 10 concepts

No dictionary, no training, only the plain text itself!
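The slides do not name the technique here; reducing 3283-dimensional word-count vectors to 10 concepts with no dictionary and no training is what latent semantic analysis does via a low-rank SVD. A sketch on random stand-in data (the real word counts are not reproduced here):

```python
import numpy as np

# Hypothetical shapes matching the slide: 3283 words x 676 abstracts.
# A[i, j] = how often word i occurs in abstract j (random stand-in data).
A = np.random.rand(3283, 676)

# Reduce to 10 "concepts" with a rank-10 SVD: each abstract is then
# described by 10 concept weights instead of 3283 word counts.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
concepts = Vt[:10].T          # 676 abstracts x 10 concepts
word_to_concept = U[:, :10]   # 3283 words x 10 concepts
print(concepts.shape)         # -> (676, 10)
```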