Unsupervised Learning
Clustering
K-Means
Recall:
Key Components of Intelligent Agents
Representation Language:
Graph, Bayes Nets, Linear functions
Inference Mechanism:
A*, variable elimination, Gibbs sampling
Learning Mechanism:
Maximum Likelihood, Laplace Smoothing, gradient descent, perceptron, k-Nearest Neighbor, many more: k-means, EM, PCA, …
Evaluation Metric:
Likelihood, quadratic loss (a.k.a. squared error), regularized loss, margins, many more: 0-1 loss, conditional likelihood, precision/recall, …
Supervised vs. Unsupervised Learning
Supervised Learning: “Labeled” Data
X_11   X_12   …   X_1N   Y_1
X_21   X_22   …   X_2N   Y_2
…      …      …   …      …
X_M1   X_M2   …   X_MN   Y_M
Unsupervised Learning: “Unlabeled” Data
X_11   X_12   …   X_1N   ?
X_21   X_22   …   X_2N   ?
…      …      …   …      …
X_M1   X_M2   …   X_MN   ?
In supervised learning, the learning algorithm is given training examples that
contain inputs (the X values) and “labels” or “outputs” (the Y values).
In unsupervised learning, the learning algorithm is given training examples that
contain inputs (the X values), but no “labels” or “outputs” (no Y values).
It’s called “unsupervised” because there are no “labels” to help “supervise” the
learning algorithm during the learning process, to get it to the right model.
Example Unsupervised Problem 1
Are these data points distributed
completely randomly, or do you see
some structure in them?
How many clusters do you see? None, 1, 2, 3, 4, or 5?
[Figure: scatter plot of the data points; axes X1 and X2]
Example Unsupervised Problem 1
Are these data points distributed completely randomly, or do you see some structure in them?
Structured: there are clusters!
How many clusters do you see? None, 1, 2, 3, 4, or 5?
[Figure: the same scatter plot; axes X1 and X2]
Example Unsupervised Problem 2
There are 2 input variables, X1 and X2, in this space. So this is called a “2-dimensional space”.
How many dimensions are actually needed to describe this data? 0, 1, 2, or 3?
[Figure: scatter plot of the data; axes X1 and X2]
Example Unsupervised Problem 2
There are 2 input variables, X1 and X2, in this space. So this is called a “2-dimensional space”.
How many dimensions are actually needed to describe this data?
1 dimension captures most of the variation in this data. 2 dimensions will capture everything.
[Figure: the same scatter plot; axes X1 and X2]
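One way to make "1 dimension captures most of the variation" concrete is to measure how much of the total variance lies along the best-fitting direction. The sketch below does this for 2-dimensional data using the closed-form eigenvalues of the 2x2 covariance matrix (the idea behind PCA). The function name `variance_explained_2d` and the sample points are hypothetical and are not taken from the slides.

```python
import math

def variance_explained_2d(points):
    """Fraction of total variance captured by the first principal component."""
    m = len(points)
    mean_x = sum(p[0] for p in points) / m
    mean_y = sum(p[1] for p in points) / m
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / m
    syy = sum((p[1] - mean_y) ** 2 for p in points) / m
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / m
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest, smallest = half_trace + root, half_trace - root
    return largest / (largest + smallest)

# Hypothetical data lying close to the line X2 = X1.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
ratio = variance_explained_2d(data)  # close to 1: one direction explains almost everything
```

A ratio near 1 means one dimension is nearly enough; a ratio near 0.5 means the data spreads equally in both directions and both dimensions are needed.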
Types of Unsupervised Learning
• Density Estimation
• Clustering (Example 1)
• Dimensionality Reduction (Example 2)
• Factor Analysis
• Blind signal separation
Example Open Problem in AI:
Unsupervised Image Segmentation (and Registration)
Examples taken from (Felzenszwalb and Huttenlocher, Int. Journal of Computer Vision, 59:2, 2004).
http://cs.brown.edu/~pff/segment/
The K-Means Clustering Algorithm
Inputs:
1) Some unlabeled (no outputs) training data
2) A number K, which must be greater than 1
Output:
A label between 1 and K for each data point, indicating which cluster the data point belongs to.
Visualization of K-Means
Data: [Figure: the unlabeled data points]
1. Generate K random initial cluster centers, or “means”.
2. Assign each point to the closest “mean” point.
Visually, the mean points divide the space into a Voronoi diagram.
3. Recompute the “mean” (center) of each colored set of data.
Notice: “means” do not have to be at the same position as a data point, although sometimes they might be.
4. Repeat steps 2 & 3 until the “means” stop moving (convergence).
a. Repeat step 2 (assign each point to the nearest mean)
b. Repeat step 3 (recompute means)
Quiz: Where will the means be after the next iteration?
Answer: [Figure: the means after the next iteration]
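Since the quiz figures do not survive in text form, a small numeric example may help show where the means go in one iteration. The sketch below applies step 2 (assign) and step 3 (recompute) once to a hypothetical six-point data set with two poorly placed initial means. The helper names `assign` and `recompute`, the data, and the choice to leave an empty cluster's mean where it is are all assumptions made for illustration.

```python
import math

def assign(points, means):
    """Step 2: index of the closest mean for each point."""
    return [min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            for p in points]

def recompute(points, labels, means):
    """Step 3: move each mean to the average of the points assigned to it."""
    new_means = []
    for k, old in enumerate(means):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_means.append(old)  # assumption: an empty cluster keeps its old mean
            continue
        dims = len(members[0])
        new_means.append(tuple(sum(p[d] for p in members) / len(members)
                               for d in range(dims)))
    return new_means

# Hypothetical data: two groups, with both initial means inside the first group.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
means = [(1.0, 1.0), (2.0, 1.0)]

labels = assign(points, means)                   # [0, 0, 1, 1, 1, 1]
means_after = recompute(points, labels, means)   # [(1.0, 1.5), (6.75, 6.5)]
labels_next = assign(points, means_after)        # [0, 0, 0, 1, 1, 1]
```

After one iteration the second mean has been pulled toward the far group, and on the next assignment the point (2.0, 1.0) switches to the first cluster.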
Formal Description of the Algorithm
Input:
1) $X_{11}, \dots, X_{1N}; \dots; X_{M1}, \dots, X_{MN}$
2) $K$
Output: $Y_1; \dots; Y_M$, where each $Y_i$ is in $\{1, \dots, K\}$
1. Init:
   For each $k$ in $\{1, \dots, K\}$, create a random point $C_k$
2. Repeat until all $C_k$ remain the same:
   Assignment (aka Expectation):
      For each $X_i$, let $C[X_i] \leftarrow$ the $k$ value for the closest $C_k$ to $X_i$
   Update (aka Maximization):
      For each $C_k$,
         let $D_k \leftarrow \{X_i \mid C[X_i] = k\}$ (set of $X_i$ assigned to cluster $k$)
         if $|D_k| = 0$, let $C_k \leftarrow$ random new point
         else let $C_k \leftarrow \frac{1}{|D_k|} \sum_{X_i \in D_k} X_i$ (average of points in $D_k$)
3. Return $C[X_i]$ for each $X_i$
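The Init, Assignment, and Update steps can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. Several choices are assumptions the slides do not specify: a "random point" is drawn uniformly from the bounding box of the data, distance is Euclidean, labels run from 0 to K-1 rather than 1 to K, and a `max_iters` cap guards against very long runs.

```python
import math
import random

def k_means(points, k, rng=None, max_iters=100):
    """Return (labels, centers) following the Init / Assignment / Update steps."""
    rng = rng or random.Random()
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def random_point():
        # Assumption: "random point" means uniform within the data's bounding box.
        return tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))

    centers = [random_point() for _ in range(k)]           # 1. Init
    labels = []
    for _ in range(max_iters):                             # 2. Repeat
        # Assignment: index of the closest center for each point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update: average of each cluster, or a new random point if it is empty.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if not members:
                new_centers.append(random_point())
            else:
                new_centers.append(tuple(sum(p[d] for p in members) / len(members)
                                         for d in range(dims)))
        if new_centers == centers:                         # all C_k unchanged
            break
        centers = new_centers
    return labels, centers                                 # 3. Return
```

For example, `k_means([(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)], 2)` returns one label per point and the two final centers. Which group receives label 0 depends on the random initialization.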
Evaluation metric for K-means
LOSS Function (or Objective function) for K-means:
Within-cluster-sum-of-squares loss (WCSS):
$$\mathrm{WCSS}(X_1, \dots, X_M, C_1, \dots, C_K) = \sum_{k=1}^{K} \; \sum_{X_i : C[X_i] = k} \mathrm{distance}(X_i, C_k)^2$$
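Because every point belongs to exactly one cluster, the double sum over clusters and their members is the same as a single sum over all points, using each point's assigned center. The sketch below computes WCSS that way, assuming Euclidean distance and 0-based labels. The function name `wcss` and the sample data are hypothetical.

```python
import math

def wcss(points, centers, labels):
    """Sum over all points of the squared distance to the assigned center."""
    return sum(math.dist(p, centers[label]) ** 2
               for p, label in zip(points, labels))

# Hypothetical data: two pairs of points, each pair centered on its own mean.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

loss = wcss(points, centers, labels)  # each point is 1 unit from its center, so 4.0
```

Swapping the labels so that each point is assigned to the far center gives a much larger loss, which is why the assignment step always picks the closest center.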
Complexity of K-Means
Finding a globally-optimal solution to WCSS is known to be an NP-hard problem.
K-means is known to converge to a local minimum of WCSS.
K-means is a “heuristic” or “greedy” algorithm, with no guarantee that it will find the global optimum.
On real datasets, K-means usually converges very quickly. Often, people run it multiple times with different random initializations, and choose the best result.
In the worst case, K-means can take exponentially many iterations to converge, even to a local minimum. However, such cases are rare in practice.
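The practice of running K-means several times and keeping the best result can be sketched as below, where "best" is taken to mean lowest WCSS. The names `k_means_once` and `best_of_restarts` are hypothetical. This compact variant also makes two assumptions that differ from the formal description: it starts from K randomly chosen data points, and it leaves an empty cluster's center where it is.

```python
import math
import random

def k_means_once(points, k, rng, max_iters=100):
    """One run of K-means from a random start; returns (loss, labels, centers)."""
    centers = rng.sample(points, k)  # assumption: start from k of the data points
    labels = []
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: keep the old center if a cluster ends up empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    loss = sum(math.dist(p, centers[label]) ** 2
               for p, label in zip(points, labels))
    return loss, labels, centers

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means several times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    return min((k_means_once(points, k, rng) for _ in range(restarts)),
               key=lambda run: run[0])
```

Each restart can only match or improve on the best loss found so far, so more restarts trade running time for a better chance of avoiding a poor local minimum.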
Quiz
Is K-means
Classification or Regression?
Generative or Discriminative?
Parametric or Nonparametric?
Answer
Is K-means
Classification or Regression?
  classification: output is a discrete value (cluster label) for each point
Generative or Discriminative?
  discriminative: it has fixed input variables and output variables.
Parametric or Nonparametric?
  parametric: the number of cluster centers (K) does not change with the number of training data points
Quiz
Is K-means
Supervised or Unsupervised?
Online or batch?
Closed-form or iterative?
Answer
Is K-means
Supervised or Unsupervised?
  Unsupervised
Online or batch?
  batch: if you add a new data point, you need to revisit all the training data to recompute the locally-optimal model
Closed-form or iterative?
  iterative: training requires many passes through the data
Quiz
Which of the following problems might be solved using K-Means? Check all that apply.
For those that work, explain what the inputs and outputs (X and Y variables) would be.
• Segmenting an image
• Finding galaxies (dense groups of stars) in a telescope’s image of the night sky
• Identifying different species of bacteria from DNA samples of bacteria in seawater
Answer
Which of the following problems might be solved using K-Means? Check all that apply.
For those that work, explain what the inputs and outputs (X and Y variables) would be.
• Segmenting an image: Yes. Inputs are the pixel intensities, outputs are segment labels.
• Finding galaxies (dense groups of stars) in a telescope’s image of the night sky: Yes. Inputs are star locations, outputs are galaxy labels.
• Identifying different species of bacteria from DNA samples of bacteria in seawater: Yes. Inputs are gene sequences, outputs are species labels.