CS325 Artificial Intelligence
Ch.20 – Unsupervised Machine Learning
Cengiz Günay
Spring 2013
Unsupervised Learning

Missing teacher:
  No labels, y
  Just input data, x

What can you learn with it?
  1. Simplifying data (e.g., dimensionality reduction)
  2. Organizing data (e.g., clustering)

Works by finding structure in data; exploits redundancies.

Entry survey: Unsupervised Learning (0.5 points of final grade)
  What is it good for in real life?
  Where would you use it?
The Google PageRank Algorithm

Why is it called Google PageRank®?
  Assigns “importance” to each page based on incoming links

Before PageRank:
  Manually made online directories (e.g., Yahoo!)
  Bag-of-words maximum likelihood

PageRank improves on the bag-of-words model:
  Iterative algorithm that models a surfer randomly clicking away
  On each page, the probability of reaching a target page is divided
  among the outgoing links

Example: a world with pages A, B, C, and D. Initialize PR(x) = 0.25 for all x.
  If B, C, and D all link to A, then
    PR(A) = PR(B) + PR(C) + PR(D) = 0.75
  If B had links to pages C and A, while page D had links to all
  three other pages, then PR(A) = PR(B)/2 + PR(C) + PR(D)/3
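A minimal Python sketch of the random-surfer update above. The power-iteration loop and the damping factor d are standard PageRank machinery (the slide's version corresponds to d = 1), and A's out-link back to B is my assumption, since the slide leaves A's links unspecified:

```python
import numpy as np

def pagerank(links, d=1.0, iters=50):
    """links[i] = list of page indices that page i links to."""
    n = len(links)
    pr = np.full(n, 1.0 / n)                 # init: PR(x) = 1/n (0.25 here)
    for _ in range(iters):
        new = np.full(n, (1.0 - d) / n)      # damping term (zero when d = 1)
        for i, outs in enumerate(links):
            for j in outs:                   # page i splits its rank over its out-links
                new[j] += d * pr[i] / len(outs)
        pr = new
    return pr

A, B, C, D = 0, 1, 2, 3
links = [[B], [A], [A], [A]]                 # B, C, D -> A; A -> B is assumed
pr = pagerank(links, iters=1)                # a single update step
# After one step, PR(A) = PR(B) + PR(C) + PR(D) = 0.75, as on the slide
```

Running more iterations lets the ranks settle to a fixed point instead of the one-step values shown on the slide.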
Other Unsupervised Learning Examples

Dimensionality reduction:
  1. Principal/independent component analysis (PCA/ICA)
  2. Factor analysis
  3. Google PageRank

Clustering:
  1. Blind source separation
  2. k-means clustering
  3. Competitive learning
  4. Expectation maximization (EM)
  5. Self-organizing maps (SOM)
k-Means Clustering

Algorithm:
  1. Randomly place k cluster centers
  2. Assign each point to its closest center
  3. Move each center to the center of gravity of its newly assigned points
  4. Go back to step 2 until no change

Problems:
  Choosing the appropriate k
  Local minima
  High dimensionality
  Not mathematically grounded
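The four steps above can be sketched in a few lines of numpy. The two-blob test data and the explicit initial centers (one seed point from each blob, standing in for step 1's random placement) are illustrative assumptions:

```python
import numpy as np

def kmeans(X, centers, max_iter=100):
    """Steps 2-4; `centers` holds the initial guesses from step 1."""
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        # Step 2: assign each point to its closest center
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # Step 3: move each center to the center of gravity of its points
        new = np.array([X[labels == i].mean(axis=0) for i in range(len(centers))])
        # Step 4: repeat until no change
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),    # blob near (0, 0)
               rng.normal(5.0, 0.5, (50, 2))])   # blob near (5, 5)
centers, labels = kmeans(X, X[[0, 50]])          # one seed point from each blob
```

With truly random seeds (as in step 1), both centers can land in the same blob and the algorithm converges to a poor local minimum, which is exactly the "local minima" problem listed above.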
Improving k-Means with Gaussians

Gaussian or normal distribution function:

  N(\mu, \sigma^2) = P(x \mid \mu, \sigma^2)
                   = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2 / 2\sigma^2}

Mean and variance parameters can be approximated from data:

  \mu = \frac{1}{M} \sum_i x_i, \qquad
  \sigma^2 = \frac{1}{M} \sum_i (x_i - \mu)^2

Watch Dr. Thrun use Maximum Likelihood to derive these!
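A quick numerical check of these maximum-likelihood estimates; the sample size and the true parameters (mu = 2, sigma = 3) are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)   # data with known mu=2, sigma=3

mu = x.mean()                    # mu = (1/M) * sum_i x_i
var = ((x - mu) ** 2).mean()     # sigma^2 = (1/M) * sum_i (x_i - mu)^2

def gaussian_pdf(t, mu, var):
    """The density N(mu, sigma^2) above, evaluated at t."""
    return np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
```

The estimates recover mu ≈ 2 and sigma^2 ≈ 9 up to sampling noise, which is what the ML derivation promises.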
Multivariate Gaussians

How would a 2-D Gaussian look?

  N(\mu, \Sigma) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2}
                   \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),

  where \Sigma = \frac{1}{M} \sum_i (x_i - \mu)(x_i - \mu)^T
  and d is the number of dimensions.
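The covariance estimate and the density above, as a numpy sketch; the 2-D test distribution and sample size are arbitrary choices of mine:

```python
import numpy as np

def fit_mvn(X):
    """ML estimates: mean vector and covariance matrix Sigma."""
    mu = X.mean(axis=0)
    D = X - mu
    Sigma = D.T @ D / len(X)     # (1/M) * sum_i (x_i - mu)(x_i - mu)^T
    return mu, Sigma

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density at point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-d / 2) / np.sqrt(np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

rng = np.random.default_rng(0)
true_mu = np.array([1.0, -1.0])
true_Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=200_000)
mu, Sigma = fit_mvn(X)           # recovers true_mu and true_Sigma up to noise
```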
Fitting Multivariate Gaussians

(figure slide)
Using Gaussians for Clusters

Assume points belong to clusters with a multivariate Gaussian distribution.
We could use Maximum Likelihood, but we don't know the Gaussian parameters (mean and variance).
It's a chicken-and-egg problem!
Solution: pretend we have centers. Choose them randomly as in k-means, and then run Expectation Maximization.
Expectation Maximization

Expectation Maximization (EM): a two-step iterative algorithm

  1. Expectation step: for all i, j, calculate the probability that x_j
     belongs to cluster i:

       p_{ij} = P(C = i \mid x_j) \propto P(x_j \mid C = i) \, P(C = i)

  2. Maximization step: recalculate the parameters:

       \mu_i = \frac{1}{n_i} \sum_j p_{ij} \, x_j, \qquad
       \Sigma_i = \frac{1}{n_i} \sum_j p_{ij} \, (x_j - \mu_i)(x_j - \mu_i)^T,

     where n_i = \sum_j p_{ij}.
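A minimal sketch of the two steps for two 1-D Gaussian clusters. Holding the priors P(C = i) fixed and equal is my simplification for brevity, not part of the slide; the test data and initial guesses are also mine:

```python
import numpy as np

def em_gmm_1d(x, mu, var, iters=50):
    """EM for two 1-D Gaussian clusters with fixed equal priors."""
    mu, var = np.asarray(mu, float), np.asarray(var, float)
    for _ in range(iters):
        # E-step: p_ij proportional to P(x_j | C=i) P(C=i), normalized over i
        p = np.exp(-(x[None, :] - mu[:, None]) ** 2 / (2 * var[:, None]))
        p /= np.sqrt(2 * np.pi * var[:, None])
        p /= p.sum(axis=0, keepdims=True)
        # M-step: mu_i = sum_j p_ij x_j / n_i, var_i likewise, n_i = sum_j p_ij
        n = p.sum(axis=1)
        mu = (p * x).sum(axis=1) / n
        var = (p * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / n
    return mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(6.0, 1.0, 500)])
mu, var = em_gmm_1d(x, mu=[1.0, 5.0], var=[1.0, 1.0])
# mu converges near the true cluster means, 0 and 6
```

Note how the E-step is the soft version of k-means assignment and the M-step is the soft version of moving each center to its cluster's center of gravity.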
We Find the Gaussian Clusters

Unsupervised learning of Gaussians is also used in Radial Basis Function neural networks.
Can Also Use Gaussians for Density Estimation

(figure slide)
Summary for Expectation Maximization

Expectation Maximization:
  All points belong to all centers (soft assignment)
  Better solutions
  Less susceptible to local minima

What else can Expectation Maximization do?
  Not limited to learning Gaussians
  Find hidden variables in a Bayes net when we cannot count them, as in the spam example
  Find hidden (latent) variables in other algorithms, like Hidden Markov Models
  Learn the structure of problems with unknowns (e.g., Bayes nets)
Dimensionality Reduction

How many dimensions are needed to represent these data?
Linear Dimensionality Reduction

How to do this:
  1. Find the Gaussian parameters of the data
  2. Find the eigenvectors and eigenvalues of the covariance matrix
  3. Choose the eigenvectors with the largest eigenvalues
  4. Project the data onto the selected eigenvector space
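The four steps map directly onto numpy; the stretched-diagonal test data are an illustrative assumption of mine:

```python
import numpy as np

def pca_project(X, n_components):
    mu = X.mean(axis=0)                         # step 1: Gaussian parameters
    Sigma = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(Sigma)          # step 2: eigenvectors and eigenvalues
    order = np.argsort(vals)[::-1]              # step 3: largest eigenvalues first
    W = vecs[:, order[:n_components]]
    return (X - mu) @ W                         # step 4: project onto selected space

rng = np.random.default_rng(0)
t = rng.normal(0.0, 3.0, 1000)                  # data stretched along the diagonal
X = np.column_stack([t, t]) + rng.normal(0.0, 0.3, (1000, 2))
Z = pca_project(X, 1)                           # reduce 2-D -> 1-D
```

The single retained component lines up with the diagonal, so the 1-D projection keeps almost all of the variance of the 2-D data.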
Linear Dimensionality Reduction Example
Reducing from Large Dimensional Spaces: Eigenfaces

Face example:
  50 × 50 = 2,500 pixels (dimensions)
  Reduce to 12 “eigenface” dimensions
Reducing from Large Dimensional Spaces: Bodies

Body example:
  Three dimensions are enough to distinguish: height, size, gender
  The trick is to use piecewise linear projections
  See locally linear embedding and Isomap for more info
Clustering by Affinity

Would EM or k-means work well here?
No.
Spectral Clustering

Rank-deficient matrix
Can use Principal Components Analysis to find orthogonal components
Example with clustering?
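A loose sketch of the spectral idea: build an affinity matrix and let an eigenvector split the points into clusters. The Gaussian affinity, the normalized graph Laplacian, and the sign-split heuristic are my assumptions; the slide only hints at PCA-style eigenanalysis:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 20),   # two well-separated 1-D groups
                    rng.normal(5.0, 0.3, 20)])

# Affinity matrix: near 1 for nearby points, near 0 across the gap
A = np.exp(-(x[:, None] - x[None, :]) ** 2)
# Normalized graph Laplacian L = I - D^(-1/2) A D^(-1/2)
deg = A.sum(axis=1)
L = np.eye(len(x)) - A / np.sqrt(deg[:, None] * deg[None, :])
vals, vecs = np.linalg.eigh(L)
# The eigenvector of the second-smallest eigenvalue changes sign between groups
labels = (vecs[:, 1] > 0).astype(int)
```

Because the cross-group affinities are nearly zero, the affinity matrix is close to block-diagonal (and rank deficient in the limit), which is exactly why its leading eigenstructure exposes the clusters.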
Competitive Learning: Neural Gas

Neural gas:
  Growing Neural Gas
  Gesture recognition
Source Separation

Cocktail party problem
  Blind source separation
  Independent component analysis

Difference between PCA and ICA?
  PCA finds orthogonal components.
  ICA finds statistically independent components.
  Thus, ICA is better suited for signal separation.
Summary

Learning without a teacher is still useful:
  Makes sense of hidden structure within data
  Many uses: clustering, source separation, dimensionality reduction, density estimation, ...
  Both iterative algorithms and mathematical solutions
  Makes sense of natural data: faces, bodies
  Competitive learning can be used to find best-adapted solutions: e.g., find the best on-screen keyboard for typing?
  Use unsupervised learning first to simplify the data, then combine with supervised learning!

Exit survey: Unsupervised Learning
  What changed in your understanding?
  Any new suggestions on where you would use it?