# Intro to Machine Learning

AI and Robotics

Oct 19, 2013

Unsupervised Learning

Clustering

K-Means

Recall:

Key Components of Intelligent Agents

Representation Language:
Graph, Bayes Nets, Linear functions

Inference Mechanism:
A*, variable elimination, Gibbs sampling

Learning Mechanism:
Maximum Likelihood, Laplace Smoothing, Nearest Neighbor,
many more: k-means, EM, PCA, …

-------------------------------------

Evaluation Metric:
Likelihood, quadratic loss (a.k.a. squared error), regularized loss, margins,
many more: 0-1 loss, conditional likelihood, precision/recall, …

Supervised vs. Unsupervised Learning

Supervised Learning: “Labeled” Data

X_11  X_12  …  X_1N  |  Y_1
X_21  X_22  …  X_2N  |  Y_2
…
X_M1  X_M2  …  X_MN  |  Y_M

Unsupervised Learning: “Unlabeled” Data

X_11  X_12  …  X_1N  |  ?
X_21  X_22  …  X_2N  |  ?
…
X_M1  X_M2  …  X_MN  |  ?

In supervised learning, the learning algorithm is given training examples that
contain inputs (the X values) and “labels” or “outputs” (the Y values).

In unsupervised learning, the learning algorithm is given training examples that
contain inputs (the X values), but no “labels” or “outputs” (no Y values).

It’s called “unsupervised” because there are no “labels” to help “supervise” the
learning algorithm during the learning process, to get it to the right model.
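The difference between the two settings can be sketched in a few lines of Python. The numbers and label names below are made up for illustration (they are not from the slides); the point is only that the unsupervised learner receives the X rows without the Y column.

```python
# Hypothetical toy data: M = 3 examples, N = 2 input variables each.
X = [
    [1.0, 2.0],   # X_11, X_12
    [1.5, 1.8],   # X_21, X_22
    [8.0, 9.0],   # X_31, X_32
]

# Supervised learning: every row of X comes paired with a label Y_i.
Y = ["small", "small", "large"]
labeled_data = list(zip(X, Y))

# Unsupervised learning: the same inputs, but no Y column at all.
unlabeled_data = X

for inputs, label in labeled_data:
    print(inputs, "->", label)
for inputs in unlabeled_data:
    print(inputs, "-> ?")
```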

Example Unsupervised Problem 1

Are these data points distributed completely randomly, or do you see
some structure in them?

How many clusters do you see? None, 1, 2, 3, 4, or 5?

[Figure: scatter plot of the data points; axes X1 and X2]

Example Unsupervised Problem 1

Are these data points distributed completely randomly, or do you see
some structure in them?

Structured: there are clusters!

How many clusters do you see? None, 1, 2, 3, 4, or 5?

[Figure: the same scatter plot; axes X1 and X2]

Example Unsupervised Problem 2

There are 2 input variables, X1 and X2, in this space. So this is called a
“2-dimensional space”.

How many dimensions are actually needed to describe this data? 0, 1, 2, or 3?

[Figure: scatter plot of the data; axes X1 and X2]

Example Unsupervised Problem 2

There are 2 input variables, X1 and X2, in this space. So this is called a
“2-dimensional space”.

How many dimensions are actually needed to describe this data?

1 dimension captures most of the variation in this data.
2 dimensions will capture everything.

[Figure: the same scatter plot; axes X1 and X2]
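The claim that one dimension captures most of the variation can be checked numerically. The sketch below uses made-up points that lie close to a line (the slide's actual data is not available) and computes the two eigenvalues of the 2x2 covariance matrix, which are the variances along the two principal directions. The function name and the data are assumptions for illustration.

```python
import math

# Hypothetical 2-D points lying close to the line X2 = 2 * X1.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def variance_along_principal_axes(pts):
    """Return the eigenvalues (largest first) of the 2x2 covariance matrix."""
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts) / n
    syy = sum((p[1] - mean_y) ** 2 for p in pts) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return half_trace + spread, half_trace - spread

first, second = variance_along_principal_axes(points)
fraction_first = first / (first + second)
print(f"fraction of variance along the first direction: {fraction_first:.3f}")
```

For data like this, the first direction accounts for nearly all of the variance, and the two directions together account for all of it.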

Types of Unsupervised Learning

Density Estimation

- Clustering (Example 1)

- Dimensionality Reduction (Example 2)

Factor Analysis

- Blind signal separation

Example Open Problem in AI:

Unsupervised Image Segmentation
(and Registration)

Examples taken from (Felzenszwalb and Huttenlocher, Int. Journal of Computer
Vision, 59:2, 2004).
http://cs.brown.edu/~pff/segment/

The K-Means Clustering Algorithm

Inputs:

1) Some unlabeled (no outputs) training data

2) A number K, which must be greater than 1

Output:

A label between 1 and K for each data point, indicating which cluster the
data point belongs to.

Visualization of K-Means

Data

1. Generate K random initial cluster centers, or “means”.

2. Assign each point to the closest “mean” point.

Visually, the mean points divide the space into a Voronoi diagram.

3. Recompute the “mean” (center) of each colored set of data.

Notice: “means” do not have to be at the same position as a data point,
although sometimes they might be.

4. Repeat steps 2 & 3 until the “means” stop moving (convergence).

a. Repeat step 2 (assign each point to the nearest mean)

b. Repeat step 3 (recompute means)

Quiz: Where will the means be after the next iteration?

Answer: [Figure: positions of the means after the next iteration]
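The four steps can be traced on a tiny example. This is a sketch on made-up 2-D points, with the initial means fixed by hand instead of drawn at random so that the trace is repeatable; the helper names `assign` and `recompute` are assumptions, and cluster indices are 0-based.

```python
import math

def assign(points, means):
    """Step 2: index of the closest mean for every point."""
    return [min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            for p in points]

def recompute(points, labels, means):
    """Step 3: centroid of each cluster (an empty cluster keeps its old mean)."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_means.append(old)
    return new_means

points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 4.0)]
means = [(1.0, 1.0), (2.0, 1.0)]   # step 1, fixed here so the trace is repeatable

history = [means]
while True:
    labels = assign(points, means)                 # step 2
    new_means = recompute(points, labels, means)   # step 3
    if new_means == means:                         # step 4: means stopped moving
        break
    means = new_means
    history.append(means)

for i, m in enumerate(history):
    print(i, m)
print(labels)  # [0, 0, 1, 1]
```

In this trace the means move twice and then stop at (1.5, 1.0) and (4.5, 3.5), neither of which coincides with a data point.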

Formal Description of the Algorithm

Input:

1) X_11, …, X_1N; … ; X_M1, …, X_MN

2) K

Output: Y_1; …; Y_M, where each Y_i is in {1, …, K}

1. Init:

   For each k in {1, …, K}, create a random point C_k

2. Repeat until all C_k remain the same:

   Assignment (aka Expectation):

   For each X_i, let C[X_i] ← the k value for the closest C_k to X_i

   Update (aka Maximization):

   For each C_k, let D_k ← {X_i | C[X_i] = k} (set of X_i assigned to cluster k)

   if |D_k| = 0, let C_k ← random new point

   else let $C_k \leftarrow \frac{1}{|D_k|} \sum_{X_i \in D_k} X_i$ (average of points in D_k)

3. Return C[X_i] for each X_i
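The formal description translates fairly directly into code. The following is a minimal sketch, not a reference implementation. It assumes Euclidean distance, draws "random points" uniformly from the bounding box of the data, uses 0-based labels instead of 1 to K, and adds a `max_iters` safety cap; none of these choices are specified on the slides.

```python
import math
import random

def k_means(points, k, seed=None, max_iters=100):
    """Return (labels, centers); labels are 0-based cluster indices."""
    rng = random.Random(seed)
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def random_point():
        # Assumption: a "random point" is uniform over the data's bounding box.
        return tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))

    centers = [random_point() for _ in range(k)]            # 1. Init
    labels = [0] * len(points)
    for _ in range(max_iters):                              # 2. Repeat
        # Assignment: index of the closest center for each point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update: average of each cluster, or a new random point if it is empty.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centers.append(random_point())
            else:
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(dims)))
        if new_centers == centers:                          # all C_k unchanged
            break
        centers = new_centers
    return labels, centers                                  # 3. Return

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, k=2, seed=0)
print(labels)
print(centers)
```

Which of the two groups receives label 0 depends on the random initialization, so only the grouping itself is meaningful.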

Evaluation metric for K-means

Loss function (or objective function) for K-means:

Within-cluster-sum-of-squares loss (WCSS):

$$\mathrm{WCSS}(X_1, \ldots, X_M, C_1, \ldots, C_K) = \sum_{k=1}^{K} \; \sum_{X_i \mid C[X_i] = k} \mathrm{distance}(X_i, C_k)^2$$
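As a small worked example, WCSS can be computed by adding up the squared distance from each point to the center of its assigned cluster. The sketch assumes Euclidean distance and 0-based labels; the data values are made up.

```python
def wcss(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total

points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
centers = [(1.5, 2.0), (9.0, 9.0)]
print(wcss(points, labels, centers))  # 6.5
```

Here the first two points each contribute 1.25 and the last two each contribute 2.0, for a total of 6.5.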

Complexity of K-Means

Finding a globally optimal solution to WCSS is known to be an NP-hard
problem.

K-means is known to converge to a local minimum of WCSS.

K-means is a “heuristic” or “greedy” algorithm, with no guarantee that
it will find the global optimum.

On real datasets, K-means usually converges very quickly. Often,
people run it multiple times with different random initializations, and
choose the best result.

In the worst case, K-means can still take exponentially many iterations,
even to find a local minimum. However, such cases are rare in practice.
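Running K-means several times and keeping the best result can be sketched as follows. This version is an illustration under stated assumptions: it initializes each run from K distinct data points (a common variation, not the bounding-box style "random point" of the formal description), leaves an empty cluster's center where it was, and scores each run by its WCSS with Euclidean distance. The function names are hypothetical.

```python
import math
import random

def k_means_once(points, k, rng, max_iters=100):
    """One K-means run; initial centers are k distinct data points."""
    centers = rng.sample(points, k)
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members)
                                         for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    loss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return loss, labels, centers

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means `restarts` times and return the run with the lowest WCSS."""
    rng = random.Random(seed)
    return min((k_means_once(points, k, rng) for _ in range(restarts)),
               key=lambda run: run[0])

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.3),
          (9.0, 1.0), (9.1, 1.2), (8.8, 0.9)]
loss, labels, centers = best_of_restarts(points, k=3)
print(loss)
print(labels)
```

The best run is only the best among the runs tried, so the result is still a local minimum and not guaranteed to be the global one.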

Quiz

Is K-means

Classification or Regression?

Generative or Discriminative?

Parametric or Nonparametric?

Is K-means

Classification or Regression?

- classification: output is a discrete value (cluster label) for each point

Generative or Discriminative?

- discriminative: it has fixed input variables and output variables.

Parametric or Nonparametric?

- parametric: the number of cluster centers (K) does not change with the
number of training data points
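One way to read the “parametric” answer: once training is done, the model consists only of the K cluster centers, however many training points there were. The sketch below labels a point using nothing but stored centers. The center values and the `label_point` helper are hypothetical, and Euclidean distance is assumed.

```python
import math

# Hypothetical centers from an earlier K-means run with K = 2.
centers = [(1.5, 1.0), (4.5, 3.5)]

def label_point(point, centers):
    """Return a label in {1, ..., K}: the number of the nearest center."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances)) + 1

print(label_point((1.0, 2.0), centers))  # 1
print(label_point((5.0, 5.0), centers))  # 2
```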

Quiz

Is K-means

Supervised or Unsupervised?

Online or batch?

Closed-form or iterative?

Is K-means

Supervised or Unsupervised?

- Unsupervised

Online or batch?

- batch: if you add a new data point, you need to revisit all the training
data to recompute the locally optimal model

Closed-form or iterative?

- iterative: training requires many passes through the data

Quiz

Which of the following problems might be solved using K-Means? Check all
that apply.

For those that work, explain what the inputs and
outputs (X and Y variables) would be.

Segmenting an image

Finding galaxies (dense groups of stars) in a
telescope’s image of the night sky

Identify different species of bacteria from DNA
samples of bacteria in seawater

Which of the following problems might be solved using K-Means? Check all
that apply.

For those that work, explain what the inputs and outputs
(X and Y variables) would be.

Segmenting an image: Yes. Inputs are the pixel
intensities, outputs are segment labels.

Finding galaxies (dense groups of stars) in a telescope’s
image of the night sky: Yes. Inputs are star locations,
outputs are galaxy labels.

Identify different species of bacteria from DNA samples
of bacteria in seawater: Yes. Inputs are gene
sequences, outputs are species labels.
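The image segmentation answer can be illustrated with a tiny made-up grayscale image: the inputs are the pixel intensities and the outputs are segment labels. This is a sketch only. It clusters 1-D intensities, starts from evenly spaced means instead of random ones so that the result is repeatable, and ignores pixel positions; the function name and image values are assumptions.

```python
def segment_intensities(pixels, k=2, max_iters=50):
    """Cluster 1-D pixel intensities; returns one segment label per pixel."""
    lo, hi = min(pixels), max(pixels)
    # Deterministic start (an assumption): spread k means evenly over the range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = []
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: abs(p - means[j])) for p in pixels]
        new_means = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            new_means.append(sum(members) / len(members) if members else means[j])
        if new_means == means:
            break
        means = new_means
    return labels

# Hypothetical 3x4 grayscale image (0 = black, 255 = white).
image = [
    [10, 12, 200, 210],
    [11, 15, 205, 220],
    [9, 14, 198, 215],
]
flat = [value for row in image for value in row]   # inputs: pixel intensities
labels = segment_intensities(flat, k=2)            # outputs: segment labels
segments = [labels[r * 4:(r + 1) * 4] for r in range(3)]
for row in segments:
    print(row)  # each row prints [0, 0, 1, 1]
```

The dark left half and the bright right half end up in separate segments.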