# Machine Learning - Computer Science and Engineering

Oct 16, 2013

Classification

Dr Eamonn Keogh

Computer Science & Engineering Department

University of California - Riverside

Riverside, CA 92521

eamonn@cs.ucr.edu

Who is smarter,
Humans or Pigeons?

Section 1.1 (again)

Section 4.1

Section 4.3

Section 4.2.2

Section 4.34

Glance over

[Figure: three practice problems. Each shows examples of class A and examples of class B, numbered 1 to 4, followed by two unlabeled objects.]

1) What class is this object?

2) What class is this object?

The “game” we have just been
playing is Supervised
Classification.

Why is it useful?

Examples of class A

People who contracted
disease X.

Examples of class B

People who are disease free.

1) What class is
this person?

Is this person at risk
of getting the
disease?

2) What class is
this person?

Is this person at risk
of getting the
disease?

Patient temperature 99, Blood count 4214, Weight 167

Patient temperature 98, Blood count 3214, Weight 179

Patient temperature 97, Blood count 2763, Weight 121

Patient temperature 99, Blood count 3234, Weight 117

Patient temperature 97, Blood count 0012, Weight 190

Patient temperature 99, Blood count 0114, Weight 202

Patient temperature 98, Blood count 1014, Weight 345

Patient temperature 99, Blood count 1214, Weight 190

Patient temperature 97, Blood count 0118, Weight 280

Patient temperature 99, Blood count 3452, Weight 99

Examples of class A (pairs of feature values): (3, 4), (1.5, 5), (6, 8), (2.5, 5)

Examples of class B (pairs of feature values): (5, 2.5), (5, 2), (8, 3), (4.5, 3)

1) What class is this object? (8, 1.5)

2) What class is this object? (4.5, 7)

Classification

There are many classification algorithms; in this
class we will consider only:

Simple Linear Classifier.

Nearest Neighbor Classifier.

Decision Tree.

Naïve Bayes.

The classification problem

The classification algorithm is shown a number of labeled examples from the problem domain of interest. (This collection of labeled data is called the training set.)

The algorithm builds a model that “explains” the labeling of the examples. (This model may or may not be accessible to humans, depending on the algorithm.)

At some future time the algorithm is shown an unlabeled example and asked to classify it.
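The three steps above can be sketched in a few lines of Python. This is only an illustration of the workflow: the function names `train` and `classify` are hypothetical, and the "model" is a deliberately trivial placeholder (it remembers the most common label), not one of the classifiers covered in this class. The three labeled points are the ones listed in the 2D feature table in these slides.

```python
from collections import Counter

def train(training_set):
    """Build a 'model' from (features, label) pairs.

    Placeholder model (an assumption for illustration): the most common label.
    """
    labels = [label for _features, label in training_set]
    return Counter(labels).most_common(1)[0][0]

def classify(model, features):
    """Classify an unlabeled example. This baseline ignores the features."""
    return model

# Step 1: labeled examples (the training set).
training_set = [((3, 4), "A"), ((5, 2.5), "B"), ((1.5, 5), "A")]

# Step 2: the algorithm builds a model.
model = train(training_set)

# Step 3: later, an unlabeled example arrives and is classified.
print(classify(model, (4.5, 7)))
```

Any real classifier replaces the body of `train` and `classify` but keeps the same train-then-classify shape.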

Shape Domain

Cat Domain

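| Class | Income | Savings | Num_credit_cards | Is_married |
|-------|---------|---------|------------------|------------|
| A | 123,000 | 34,100 | 0 | N |
| B | 24,000 | -2,000 | 13 | Y |
| A | 45,200 | 12,100 | 3 | N |
| … | … | … | … | … |
| B | 423,020 | 23,440 | 0 | N |
| B | 14,000 | 87,000 | 0 | Y |
| A | 11,200 | -2,000 | 2 | Y |

Sample dataset for a credit worthiness problem

| Class | Income | Savings | Num_credit_cards | Is_married |
|-------|---------|---------|------------------|------------|
| ? | 123,000 | 34,100 | 0 | N |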

What is this instance's class?

The number of rows is the size of the training set, and the number of columns is the dimensionality of the training set. Each row is called an instance (or exemplar), and each column is called a feature.
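As a small sketch of this terminology, the visible rows of the credit worthiness table can be held as a list of (class, feature values) pairs. The rows elided in the slide are simply omitted here, so the size printed below is for the visible rows only; the variable names are illustrative.

```python
# Feature (column) names from the sample credit worthiness dataset.
feature_names = ["Income", "Savings", "Num_credit_cards", "Is_married"]

# Each row is one instance (exemplar): (class label, feature values).
# Only the rows visible in the slide are included; elided rows are omitted.
training_set = [
    ("A", [123000, 34100, 0, "N"]),
    ("B", [24000, -2000, 13, "Y"]),
    ("A", [45200, 12100, 3, "N"]),
    ("B", [423020, 23440, 0, "N"]),
    ("B", [14000, 87000, 0, "Y"]),
    ("A", [11200, -2000, 2, "Y"]),
]

size = len(training_set)             # number of rows (instances)
dimensionality = len(feature_names)  # number of columns (features)

print("size:", size, "dimensionality:", dimensionality)
```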

Visualizing
classification
algorithms

We can visualize some classification
algorithms in 2D…

Warning: This tends to make the
problem look easy...

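| Class | feature 1 (height1) | feature 2 (height2) |
|-------|---------------------|---------------------|
| A | 3 | 4 |
| B | 5 | 2.5 |
| A | 1.5 | 5 |
| ... | ... | ... |

[Figure: 2D scatter plot of the training data; both axes run from 1 to 10.]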

A trivial machine learning example represented in 2D Euclidean Space. The blue
circles and red squares represent the two classes in our training data, and the black
shapes are the objects we are trying to classify.

From now on we will only consider the 2D plots when explaining algorithms and problems.
We should always remember that these plots are representations of real world objects.

Simple Linear Classifier

A dataset which is not
linearly separable

Piecewise Linear Classifier

(or some other function)

This example is one for which we know a perfect rule: “above the diagonal is circle class, below the diagonal is square class”. (Don’t forget that for real world problems we can never know a perfect rule, even if there is one.)
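The perfect rule for this example can be written as a one-line linear classifier. This is a minimal sketch, assuming the diagonal is the line y = x with x on the horizontal axis and y on the vertical axis. The function name `diagonal_rule` is hypothetical, and points exactly on the diagonal are arbitrarily assigned to the square class, since the rule does not say what to do with them.

```python
def diagonal_rule(x, y):
    """Classify a 2D point by which side of the diagonal y = x it falls on.

    Above the diagonal (y > x) is the circle class; otherwise square.
    Points exactly on the diagonal go to 'square' (an arbitrary choice).
    """
    return "circle" if y > x else "square"

for point in [(3, 4), (5, 2.5), (1.5, 5)]:
    print(point, diagonal_rule(*point))
```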

What happens if we learn a piecewise linear classifier or a quadratic classifier on this dataset with a small training set?

This problem is called overfitting.

Piecewise Linear Classifier

The Nearest Neighbor
Algorithm

The nearest neighbor algorithm (NN)
works by projecting the item to be
classified into the same space as the
training data, then finding the labeled
exemplar which is closest. The class of
that nearest neighbor is then
assigned to the item to be classified.

In this example, the item (6, 2)
is correctly classified.

In spite of its amazing simplicity,
Nearest Neighbor is one of the best
algorithms for many problems.

We can use many different distance
measures to measure the distance between
objects. Typically Euclidean distance is
used.
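A minimal sketch of nearest neighbor classification with Euclidean distance follows. The coordinates are the feature pairs listed on the earlier class A / class B slide; assigning the first four pairs to class A and the next four to class B is an assumption based on the order in which they appear. The function name `nearest_neighbor` is illustrative.

```python
import math

# (features, label) pairs; class assignment is assumed from slide order.
training_set = [
    ((3, 4), "A"), ((1.5, 5), "A"), ((6, 8), "A"), ((2.5, 5), "A"),
    ((5, 2.5), "B"), ((5, 2), "B"), ((8, 3), "B"), ((4.5, 3), "B"),
]

def nearest_neighbor(training_set, query):
    """Return the label of the training exemplar closest to the query."""
    _point, label = min(training_set, key=lambda ex: math.dist(ex[0], query))
    return label

# The two unlabeled objects from the slide.
for query in [(8, 1.5), (4.5, 7)]:
    print(query, nearest_neighbor(training_set, query))
```

Swapping `math.dist` for another distance function changes the measure without changing the rest of the algorithm.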

Evaluation of Classification

Leaving one out

Cross fold validation
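Leaving one out can be sketched as follows: each labeled example is held out in turn, classified using all the others, and the fraction classified correctly is reported. The sketch uses a one-nearest-neighbor classifier and the feature pairs from the earlier class A / class B slide (class assignment assumed from slide order); the function names are illustrative. Cross fold validation follows the same pattern but holds out a group of examples at a time instead of a single one.

```python
import math

# (features, label) pairs; class assignment is assumed from slide order.
data = [
    ((3, 4), "A"), ((1.5, 5), "A"), ((6, 8), "A"), ((2.5, 5), "A"),
    ((5, 2.5), "B"), ((5, 2), "B"), ((8, 3), "B"), ((4.5, 3), "B"),
]

def nearest_label(training_set, query):
    """Label of the training exemplar closest to the query (Euclidean)."""
    return min(training_set, key=lambda ex: math.dist(ex[0], query))[1]

def leave_one_out_accuracy(data):
    """Hold out each example once and classify it with the remaining ones."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # everything except example i
        if nearest_label(rest, point) == label:
            correct += 1
    return correct / len(data)

print("leave-one-out accuracy:", leave_one_out_accuracy(data))
```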

Discussion of Nearest Neighbor I

It is sensitive to irrelevant features.
One possible solution is to search for good feature subsets.

It is sensitive to noise.
One possible solution is to use k-nearest neighbors (KNN).
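To illustrate the noise problem, here is a minimal KNN sketch in which the k closest exemplars vote on the class. The data are hypothetical and chosen for illustration: one point labeled B sits inside the class A cluster, standing in for a noisy label. With k = 1 the query inherits the noisy label; with k = 3 the surrounding class A points outvote it.

```python
import math
from collections import Counter

# Hypothetical training data; (2.5, 2.5) is a noisy 'B' inside the A cluster.
training_set = [
    ((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
    ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B"),
    ((2.5, 2.5), "B"),
]

def knn_classify(training_set, query, k):
    """Majority vote among the k training exemplars closest to the query."""
    neighbors = sorted(training_set, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _point, label in neighbors)
    return votes.most_common(1)[0][0]

query = (2.4, 2.4)
print("k=1:", knn_classify(training_set, query, 1))
print("k=3:", knn_classify(training_set, query, 3))
```

An odd k avoids tied votes when there are two classes.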

Suppose there is a disease.
Although we don’t know this, it
happens that if your blood sugar
is over 5.5 you have the disease,
and if it is below 5.5 you don’t.

Discussion of Nearest Neighbor II

It is sensitive to the units in which the features are
measured.
One possible solution is to normalize the features.

[Figure, first plot: X axis measured in feet, Y axis measured in dollars.]

[Figure, second plot: X axis measured in inches, Y axis measured in dollars.]
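The effect of units, and of normalizing, can be sketched with a small hypothetical example: two training points and one query, measured first in feet and then in inches. The numbers are invented for illustration, and z-normalization (subtract the mean, divide by the standard deviation of each feature) is one common choice of normalization, not necessarily the one intended in these slides.

```python
import math
import statistics

def fit_scaler(rows):
    """Per-feature (mean, standard deviation) computed from the training rows.

    Assumes no feature is constant, otherwise the deviation would be zero.
    """
    return [(statistics.mean(c), statistics.pstdev(c)) for c in zip(*rows)]

def apply_scaler(scaler, row):
    """Z-normalize one row using the training statistics."""
    return tuple((v - m) / s for v, (m, s) in zip(row, scaler))

def nearest_index(points, query):
    """Index of the point closest to the query (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

# Hypothetical data: (length, price in dollars).
train_feet = [(5.0, 100.0), (6.0, 101.5)]
query_feet = (5.9, 100.0)

results = {}
for name, factor in [("feet", 1), ("inches", 12)]:
    train = [(x * factor, d) for x, d in train_feet]
    query = (query_feet[0] * factor, query_feet[1])
    raw = nearest_index(train, query)
    scaler = fit_scaler(train)
    normalized = nearest_index(
        [apply_scaler(scaler, row) for row in train],
        apply_scaler(scaler, query),
    )
    results[name] = (raw, normalized)
    print(name, "raw nearest:", raw, "normalized nearest:", normalized)
```

On the raw features the nearest neighbor changes when feet are converted to inches; after normalization it is the same in both unit systems.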

Discussion of Nearest Neighbor III

Scalability

A Famous Problem.

R. A. Fisher’s Iris Dataset.

3 classes

50 of each class

The task is to classify Iris plants
into one of 3 varieties using the
Petal Length and Petal Width.

Iris Setosa

Iris Versicolor

Iris Virginica