EE 290A: Generalized Principal Component Analysis
Lecture 2 (by Allen Y. Yang): Extensions of PCA
Sastry & Yang © Spring, 2011
EE 290A, University of California, Berkeley
Last time
Challenges in modern data clustering problems.
PCA reduces dimensionality of the data while retaining as much data variation as possible.
Statistical view: The first d PCs are given by the d leading eigenvectors of the covariance.
Geometric view: Fitting a d-dim subspace model via the SVD.
This lecture
Determine an optimal number of PCs: d
Probabilistic PCA
Kernel PCA
Robust PCA will be discussed in a later lecture.
Determine the number of PCs
Choosing the optimal number of PCs in the noise-free case is straightforward:
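In the noise-free case the criterion is simply the rank of the data; written out (the data-matrix symbol X is an assumed notation):

\[ d = \operatorname{rank}(X) = \#\{\, i : \sigma_i(X) > 0 \,\}, \]

i.e., keep exactly the PCs whose singular values (equivalently, covariance eigenvalues) are nonzero.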
Sastry & Yang © Spring, 2011
EE 290A, University of California, Berkeley
4
In the noisy case
[Figure: singular-value spectrum in the noisy case, with the knee point marked]
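A minimal sketch of picking d from the singular-value spectrum by a cumulative-variance threshold, one common way to locate the knee; the threshold 0.95 and the name choose_num_pcs are illustrative assumptions, not from the lecture.

import numpy as np

def choose_num_pcs(X, var_threshold=0.95):
    # Smallest d whose leading PCs explain var_threshold of the total variance.
    Xc = X - X.mean(axis=0)                   # center the data
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    energy = np.cumsum(s**2) / np.sum(s**2)   # cumulative explained variance
    return int(np.searchsorted(energy, var_threshold) + 1)

# Example: 200 noisy samples near a 3-dim subspace of R^10
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))
print(choose_num_pcs(X))                      # typically prints 3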
A Model Selection Problem
With moderate Gaussian noise, to keep 100% fidelity of the data, all D dimensions must be preserved.
However, can we still find a tradeoff between model complexity and data fidelity?
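One standard way to formalize the tradeoff (an illustrative criterion; this particular penalized form and the weight κ are assumptions, not necessarily what the lecture uses) is

\[ d^\star = \arg\min_{d} \; \sum_{j=d+1}^{D} \sigma_j^2 \;+\; \kappa\, d, \]

where the first term is the fidelity lost by truncating to d PCs and the second penalizes model complexity.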
More principled conditions
Probabilistic PCA: A generative approach
Given sample statistics, (*) contains ambiguities.
Assume y is standard normal, and ε is isotropic.
Then each observation is also Gaussian.
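Here (*) presumably denotes the standard probabilistic PCA generative model; in the usual Tipping–Bishop notation (an assumption for this sketch):

\[ x = W y + \mu + \varepsilon, \qquad y \sim \mathcal{N}(0, I_d), \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_D), \]

so that marginally

\[ x \sim \mathcal{N}(\mu, C), \qquad C = W W^\top + \sigma^2 I_D. \]

One ambiguity given only the sample statistics is rotational: W R yields the same C for any orthogonal R.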
Determining principal axes by MLE
Compute the log-likelihood for n samples.
The gradient of L leads to stationary points.
Two nontrivial solutions.
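A sketch of these quantities in standard PPCA notation (an assumption; the lecture's symbols may differ). With sample covariance S, the log-likelihood of n i.i.d. samples is

\[ L = -\frac{n}{2}\left[\, D \ln(2\pi) + \ln |C| + \operatorname{tr}(C^{-1} S) \,\right], \qquad C = W W^\top + \sigma^2 I_D. \]

Setting the gradient with respect to W to zero gives, besides W = 0, the two nontrivial stationary points C = S and the reduced-rank solution

\[ W_{\mathrm{ML}} = U_d \left( \Lambda_d - \sigma^2 I_d \right)^{1/2} R, \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{D-d} \sum_{j=d+1}^{D} \lambda_j, \]

where U_d and Λ_d hold the d leading eigenvectors and eigenvalues of S, and R is an arbitrary orthogonal matrix.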
Kernel PCA: for nonlinear data
Nonlinear embedding
Example
Question: How to recover the coefficients?
Compute the null space of the data matrix.
The special polynomial embedding is called the Veronese map.
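For reference, the Veronese map of degree n stacks all monomials of degree n (the ordering convention is an assumption):

\[ \nu_n(x) = \left( x_1^n,\; x_1^{n-1}x_2,\; \ldots,\; x_D^n \right)^\top, \]

and the coefficient vector c satisfies \( \nu_n(x_i)^\top c = 0 \) for every sample, so it lies in the null space of the embedded data matrix \( [\nu_n(x_1), \ldots, \nu_n(x_n)]^\top \).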
Dimensionality Issue in Embedding
Given D and order n, what is the dimension of the Veronese map? (See the sketch below.)
Often the dimension blows up with large D or n.
Question: Can we find the higher-order nonlinear structures without explicitly calling the embedding function?
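The dimension in question is the number of degree-n monomials in D variables, \( M_n(D) = \binom{n+D-1}{n} \); a small sketch tabulating the blow-up (the helper name veronese_dim is an illustrative choice):

from math import comb

def veronese_dim(D, n):
    # Number of monomials of degree n in D variables = dimension of the degree-n Veronese map.
    return comb(n + D - 1, n)

for D in (3, 10, 50):
    for n in (2, 3, 4):
        print(D, n, veronese_dim(D, n))
# e.g. D = 50, n = 4 already gives 292825 dimensions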
Nonlinear PCA
Nonlinear PCs
In the case M is much larger than n
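In that regime the eigenvectors of the M×M scatter matrix can be recovered from the n×n Gram matrix of inner products; with Φ = [φ(x_1), …, φ(x_n)] (an assumed notation):

\[ (\Phi^\top \Phi)\, v = \lambda v \;\Longrightarrow\; (\Phi \Phi^\top)\, u = \lambda u \quad \text{with} \quad u = \tfrac{1}{\sqrt{\lambda}}\, \Phi v, \]

so only the n×n matrix of inner products ever needs to be formed.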
Kernel PCA
Computations in NLPCA only involve inner products of the embedded samples, not the samples themselves.
Therefore, the mapping relation can be expressed in the computation of PCA without explicitly calling the embedding function.
The inner product of two embedded samples is called the kernel function.
Kernel Function
Computing NLPCs via Kernel Matrix
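A minimal sketch of this computation, assuming a Gaussian (RBF) kernel and the standard double-centering of the kernel matrix; the names rbf_kernel and kernel_pca are conventional choices, not taken from the lecture.

import numpy as np

def rbf_kernel(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def kernel_pca(X, d, sigma=1.0):
    # Coordinates of the training samples along the first d nonlinear PCs.
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one    # center the embedded data in feature space
    lam, V = np.linalg.eigh(Kc)                   # eigenvalues in ascending order
    lam, V = lam[::-1][:d], V[:, ::-1][:, :d]     # keep the d largest
    alphas = V / np.sqrt(np.maximum(lam, 1e-12))  # normalize the expansion coefficients
    return Kc @ alphas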
Examples of Popular Kernels
Polynomial kernel:
Gaussian kernel (Radial Basis Function):
Intersection kernel:
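Their standard forms (the parameters c, p, and σ are conventional choices, stated here for illustration):

\[ k_{\text{poly}}(x, y) = (x^\top y + c)^p, \qquad k_{\text{RBF}}(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right), \qquad k_{\cap}(x, y) = \sum_{j} \min(x_j, y_j). \]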