CS325 Artificial Intelligence

Ch.20 – Unsupervised Machine Learning

Cengiz Günay

Spring 2013

Unsupervised Learning

Missing teacher:
- No labels, y
- Just input data, x

What can you learn with it?
1. Simplifying data (e.g., dimensionality reduction)
2. Organizing data (e.g., clustering)

Works by finding structure in the data; exploits redundancies.

Entry survey: Unsupervised Learning (0.5 points of final grade)
- What is it good for in real life?
- Where would you use it?

The Google PageRank Algorithm

Why is it called Google PageRank®?
- Assigns "importance" to each page based on its incoming links

Before PageRank:
- Manually made online directories (e.g., Yahoo!)
- Bag-of-words maximum likelihood

PageRank improves on the bag-of-words model:
- An iterative algorithm that models a surfer randomly clicking away
- On each page, the probability of reaching a target page is divided by the number of outgoing links

Example: a world with pages A, B, C, and D. Initialize PR(x) = 0.25 for all x.
- If B, C, and D all link to A, then PR(A) = PR(B) + PR(C) + PR(D) = 0.75
- If B had links to pages C and A, while page D had links to all three pages, then PR(A) = PR(B)/2 + PR(C) + PR(D)/3
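As an illustration, this update can be written in a few lines of Python. A minimal sketch of one update step on the slide's second example (assuming C links only to A, consistent with the PR(C) coefficient of 1 in the formula above); the damping factor used in the full algorithm is omitted:

```python
# One update step of the simplified PageRank model from the slide.
# Graph assumption: B links to C and A; C links to A;
# D links to A, B, and C; A has no outgoing links.
links = {"A": [], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}

pr = {page: 0.25 for page in links}      # init: PR(x) = 0.25 for all x

new_pr = {page: 0.0 for page in links}
for page, outgoing in links.items():
    for target in outgoing:
        # a page's rank is split evenly among its outgoing links
        new_pr[target] += pr[page] / len(outgoing)

# PR(A) = PR(B)/2 + PR(C) + PR(D)/3 = 0.125 + 0.25 + 0.0833... ≈ 0.458
print(new_pr["A"])
```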

Other Unsupervised Learning Examples

Dimensionality reduction:
1. Principal/independent component analysis (PCA/ICA)
2. Factor analysis
3. Google PageRank

Clustering:
1. Blind source separation
2. k-Means clustering
3. Competitive learning
4. Expectation maximization (EM)
5. Self-organizing maps (SOM)

k-Means Clustering

Algorithm:
1. Randomly place k cluster centers
2. Assign each point to its closest center
3. Move each center to the center of gravity of its newly assigned points
4. Go back to step 2 until no change

Problems:
- Choosing the appropriate k
- Local minima
- High dimensionality
- Not mathematically principled (no underlying probabilistic model)
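A minimal numpy sketch of the four steps above (the data array `X` and the seed are assumptions; empty clusters and ties are not handled):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. randomly place k cluster centers (here: k random data points)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. assign each point to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. move each center to the center of gravity of its points
        new_centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # 4. go back to step 2 until no change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```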

Improving k-Means with Gaussians

The Gaussian (or normal) distribution function:

N(\mu, \sigma^2) = P(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2 / (2\sigma^2)}

Mean and variance parameters can be approximated from data:

\mu = \frac{1}{M} \sum_i x_i, \qquad \sigma^2 = \frac{1}{M} \sum_i (x_i - \mu)^2

Watch Dr. Thrun use Maximum Likelihood to derive these!
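As a quick sketch, both estimates are one line each in numpy (the sample array `x` here is synthetic, purely for illustration):

```python
import numpy as np

x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)

mu = x.mean()                      # mu      = (1/M) * sum_i x_i
sigma2 = ((x - mu) ** 2).mean()    # sigma^2 = (1/M) * sum_i (x_i - mu)^2

# density of the fitted Gaussian at a test point x0
x0 = 2.0
p = np.exp(-(x0 - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print(mu, sigma2, p)
```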

Multi-variate Gaussians

What would a 2D Gaussian look like?

N(\mu, \Sigma) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2} \exp\!\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),

where

\Sigma = \frac{1}{M} \sum_i (x_i - \mu)(x_i - \mu)^T,

and d is the number of dimensions.
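A sketch of this density and the covariance estimate in numpy (the 2-D data `X` is synthetic, and `mvn_density` is a hypothetical helper name):

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Multi-variate Gaussian N(mu, Sigma) evaluated at the point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

X = np.random.default_rng(0).normal(size=(500, 2))   # 500 points, d = 2
mu = X.mean(axis=0)                                  # per-dimension mean
Sigma = (X - mu).T @ (X - mu) / len(X)               # (1/M) sum (x_i - mu)(x_i - mu)^T
print(mvn_density(np.zeros(2), mu, Sigma))
```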

Fitting Multi-variate Gaussians

Using Gaussians for Clusters

- Assume points belong to clusters with multi-variate Gaussian distributions.
- We could use Maximum Likelihood, but we don't know the Gaussian parameters (mean and variance).
- It's a chicken-and-egg problem!
- Solution: pretend we have centers. Choose them randomly as in k-means, then run Expectation Maximization.

Expectation Maximization

Expectation Maximization (EM): a two-step iterative algorithm.

1. Expectation step: for all i, j, calculate the probability that x_j belongs to cluster i:

   p_{ij} = P(C = i \mid x_j) \propto P(x_j \mid C = i) \, P(C = i)

2. Maximization step: recalculate the parameters:

   \mu_i = \frac{\sum_j p_{ij} x_j}{n_i}, \qquad
   \Sigma_i = \frac{\sum_j p_{ij} (x_j - \mu_i)(x_j - \mu_i)^T}{n_i},
   \quad \text{where } n_i = \sum_j p_{ij}
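A compact sketch of both steps for a Gaussian mixture (assumptions: float-valued data rows in `X`, uniform cluster priors P(C = i) so they cancel in the normalization, and no numerical safeguards):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussians(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mus = X[rng.choice(len(X), size=k, replace=False)]  # random centers, as in k-means
    Sigmas = [np.eye(d) for _ in range(k)]
    for _ in range(n_iter):
        # E-step: p_ij = P(C=i | x_j), proportional to P(x_j | C=i) P(C=i)
        p = np.array([multivariate_normal(mus[i], Sigmas[i]).pdf(X)
                      for i in range(k)])       # shape (k, M)
        p /= p.sum(axis=0, keepdims=True)       # normalize over clusters
        # M-step: recalculate mu_i and Sigma_i using soft counts n_i
        n = p.sum(axis=1)                       # n_i = sum_j p_ij
        for i in range(k):
            mus[i] = (p[i, :, None] * X).sum(axis=0) / n[i]
            diff = X - mus[i]
            Sigmas[i] = (p[i][:, None, None] *
                         np.einsum('jk,jl->jkl', diff, diff)).sum(axis=0) / n[i]
    return mus, Sigmas
```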

We Find the Gaussian Clusters

Unsupervised learning of Gaussians is also used in Radial Basis Function neural networks.

Can Also Use Gaussians for Density Estimation

Summary for Expectation Maximization

Expectation Maximization:
- All points belong to all centers (soft assignment)
- Gives better solutions
- Less susceptible to local minima

What else can Expectation Maximization do?
- It is not limited to learning Gaussians
- Finds hidden variables in a Bayes net when we cannot count them directly, as in the spam example
- Finds hidden (latent) variables in other algorithms, like Hidden Markov Models
- Learns the structure of problems with unknowns (e.g., Bayes nets)

Dimensionality Reduction

Number of dimensions needed to represent these data?

Linear Dimensionality Reduction

How to do this:
1. Find the Gaussian parameters of the data
2. Find the eigenvectors and eigenvalues of the covariance matrix
3. Choose the eigenvectors with the largest eigenvalues
4. Project the data onto the selected eigenvector space
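These four steps are only a few lines of numpy; a sketch, assuming data rows in `X` and a target dimensionality `n`:

```python
import numpy as np

def pca_project(X, n):
    # 1. find the Gaussian parameters of the data
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / len(X)
    # 2. find eigenvectors and eigenvalues of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)    # eigh: Sigma is symmetric
    # 3. choose the eigenvectors with the largest eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n]]
    # 4. project the data onto the selected eigenvector space
    return Xc @ top
```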

Linear Dimensionality Reduction Example

Reducing from Large Dimensional Spaces: Eigenfaces

Face example:
- 50 × 50 = 2,500 pixels (dimensions)
- Reduce to 12 "eigenface" dimensions

Reducing from Large Dimensional Spaces: Bodies

Body example:
- Three dimensions are enough to distinguish height, size, and gender
- The trick is to use piecewise linear projections
- See locally linear embedding and Isomap for more info

Clustering by Affinity

Would EM or k-means work well here?

No.

Spectral Clustering

- Rank-deficient matrix
- Can use Principal Components Analysis to find orthogonal components
- Example with clustering?

Competitive Learning: Neural Gas

Neural gas:
- Growing Neural Gas
- Gesture recognition

Source Separation

- Cocktail party problem
- Blind source separation
- Independent component analysis

Difference between PCA and ICA?
- PCA finds orthogonal components.
- ICA finds statistically independent components.
- Thus, ICA is better suited for signal separation.
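As an illustration of blind source separation, scikit-learn's FastICA can unmix two mixed signals. A sketch under a toy two-source cocktail-party setup (the signals and mixing matrix are made up for the example):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))              # source 2: square wave
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(2000, 2))  # slightly noisy sources

A = np.array([[1.0, 0.5], [0.5, 1.0]])   # mixing matrix ("two microphones")
X = S @ A.T                              # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered sources (up to scale/order)
```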

Summary

- Learning without a teacher is still useful
- Makes sense of hidden structure within data
- Many uses: clustering, source separation, dimensionality reduction, density estimation, ...
- Both iterative algorithms and closed-form mathematical solutions
- Makes sense of natural data: faces, bodies
- Competitive learning can be used to find best-adapted solutions: e.g., find the best on-screen keyboard for typing?
- Use unsupervised learning first to simplify data, then combine it with supervised learning!

Exit survey: Unsupervised Learning
- What changed in your understanding?
- Any new suggestions on where you would use it?
