Presented by Wanchen Lu
2/25/2013
Multi-view Clustering via Canonical Correlation Analysis
Kamalika Chaudhuri et al., ICML 2009.
INTRODUCTION
ASSUMPTION IN MULTI-VIEW PROBLEMS
• The input variable (a real vector) can be partitioned into two different views, where it is assumed that either view of the input is sufficient to make accurate predictions: essentially the co-training assumption.
• e.g.
  • Identity recognition with one view being a video stream and the other an audio stream;
  • Web page classification where one view is the text and the other is the hyperlink structure;
  • Object recognition with pictures from different camera angles;
  • A bilingual parallel corpus, with each view presented in one language.
INTUITION IN MULTI-VIEW PROBLEMS
• Many multi-view learning algorithms force agreement between the predictors based on either view (usually by forcing the predictor on view 1 to equal the predictor on view 2).
• The complexity of the learning problem is reduced by eliminating hypotheses from each view that do not agree with each other.
BACKGROUND
CANONICAL CORRELATION ANALYSIS
• CCA is a way of measuring the linear relationship between two multidimensional variables.
• Find two basis vectors, one for x and one for y, such that the correlations between the projections of the variables onto these basis vectors are maximized.
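The objective in the bullets above can be written compactly. This is the standard textbook form, not a formula taken from the slide; the symbols w_x and w_y (the two basis vectors) and C_xx, C_yy, C_xy (within-set and between-sets covariance matrices of zero-mean x and y) are notation assumed here.

```latex
\rho \;=\; \max_{w_x,\, w_y} \operatorname{corr}\!\left(w_x^{\top} x,\; w_y^{\top} y\right)
     \;=\; \max_{w_x,\, w_y}
       \frac{w_x^{\top} C_{xy}\, w_y}
            {\sqrt{\left(w_x^{\top} C_{xx}\, w_x\right)\left(w_y^{\top} C_{yy}\, w_y\right)}}
```

The ratio is unchanged when w_x or w_y is rescaled, so only the directions of the basis vectors matter.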
CALCULATING CANONICAL CORRELATIONS
• Consider the total covariance matrix of random variables x and y with zero mean:
• The canonical correlations between x and y can be found by solving the eigenvalue equations
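A minimal NumPy sketch of this computation, assuming the standard formulation: the total covariance matrix has blocks C = [[C_xx, C_xy], [C_yx, C_yy]], and the squared canonical correlations are the eigenvalues of C_xx^{-1} C_xy C_yy^{-1} C_yx. The function name `canonical_correlations`, the small ridge term `reg`, and the synthetic data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations of X (n x p) and Y (n x q); rows are samples."""
    # Center each view so the zero-mean assumption holds.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Blocks of the total covariance matrix C = [[Cxx, Cxy], [Cyx, Cyy]].
    # The ridge term reg is an assumption added for numerical stability.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Eigenvalue equation for the x side:
    #   Cxx^{-1} Cxy Cyy^{-1} Cyx w_x = rho^2 w_x
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)
    # Eigenvalues are rho^2; clip tiny numerical excursions outside [0, 1].
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    return rho, eigvecs.real[:, order]

# Synthetic example: the two views share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 2))])
Y = np.hstack([rng.normal(size=(2000, 1)), z + 0.1 * rng.normal(size=(2000, 1))])
rho, Wx = canonical_correlations(X, Y)
```

In this example the first canonical correlation is close to 1 because both views contain the shared signal, while the remaining ones are near 0.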
RELATION TO OTHER LINEAR SUBSPACE METHODS
• Formulate the problems in one single eigenvalue equation
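One common way to write that single equation is the generalized eigenproblem A w = λ B w, where the choice of A and B selects the method. The sketch below assumes the usual choices (PCA: A = C_xx, B = I; PLS: A = [[0, C_xy], [C_yx, 0]], B = I; CCA: the same A with B = blockdiag(C_xx, C_yy)). The helper `generalized_eig` and the synthetic data are hypothetical and only serve the illustration.

```python
import numpy as np

def generalized_eig(A, B):
    """Solve A w = lambda B w for symmetric A and positive-definite B."""
    # With B = L L^T, substituting v = L^T w gives the symmetric problem
    # (L^{-1} A L^{-T}) v = lambda v.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    lam, V = np.linalg.eigh(Linv @ A @ Linv.T)
    W = Linv.T @ V
    order = np.argsort(-lam)
    return lam[order], W[:, order]

rng = np.random.default_rng(1)
n, p, q = 1000, 3, 2
X = rng.normal(size=(n, p))
Y = X[:, :q] + 0.5 * rng.normal(size=(n, q))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)
Cxx = X.T @ X / (n - 1)
Cyy = Y.T @ Y / (n - 1)
Cxy = X.T @ Y / (n - 1)

# Off-diagonal block matrix shared by PLS and CCA.
A_cross = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
B_cca = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])

pca_vals, _ = generalized_eig(Cxx, np.eye(p))          # variances
pls_vals, _ = generalized_eig(A_cross, np.eye(p + q))  # +/- singular values of Cxy
cca_vals, _ = generalized_eig(A_cross, B_cca)          # +/- canonical correlations
```

Only the largest eigenvalues are of interest for PLS and CCA here; the spectrum of the block matrix is symmetric around zero, so the negative eigenvalues mirror the positive ones.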
PRINCIPAL COMPONENT ANALYSIS
• The principal components are the eigenvectors of the covariance matrix.
• The projection of data onto the principal components is an orthogonal transformation that diagonalizes the covariance matrix.
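Both statements can be checked numerically. The sketch below uses synthetic correlated data (an assumption for illustration): it takes the eigenvectors of the sample covariance matrix, projects the data onto them, and confirms that the covariance of the projected data is diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated 3-D data: mix independent sources with a fixed matrix.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing
X = X - X.mean(axis=0)

# Principal components = eigenvectors of the covariance matrix.
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(-eigvals)
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Projecting onto the components is an orthogonal change of basis.
Z = X @ eigvecs
C_proj = np.cov(Z, rowvar=False)

# Off-diagonal entries vanish; the diagonal holds the eigenvalues.
off_diagonal = C_proj - np.diag(np.diag(C_proj))
```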
PARTIAL LEAST SQUARES
• PLS is basically the singular value decomposition (SVD) of a between-sets covariance matrix.
• In PLS regression, the principal vectors corresponding to the largest principal values are used as a basis. A regression of y onto x is then performed in this basis.
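A rough sketch of the two bullets above, assuming a one-shot SVD of the between-sets covariance matrix followed by ordinary least squares in the resulting basis. Practical PLS regression implementations usually iterate with deflation, so this is a simplification; the data, the number of retained vectors `r`, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 5))
B_true = np.zeros((5, 2))
B_true[0, 0] = 2.0
B_true[1, 1] = -1.5
Y = X @ B_true + 0.1 * rng.normal(size=(n, 2))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# SVD of the between-sets covariance matrix.
Cxy = X.T @ Y / (n - 1)
U, s, Vt = np.linalg.svd(Cxy, full_matrices=False)

# Keep the r principal vectors with the largest principal (singular) values.
r = 2
T = X @ U[:, :r]  # x expressed in the reduced basis

# Regress y onto x in this basis.
coef, *_ = np.linalg.lstsq(T, Y, rcond=None)
Y_hat = T @ coef
r2 = 1.0 - ((Y - Y_hat) ** 2).sum() / (Y ** 2).sum()
```

Because y depends on only two directions of x in this example, two principal vectors are enough to explain most of the variance in y.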
ALGORITHM
THE BASIC IDEA
• Use CCA to project the data down to the subspace spanned by the cluster means to get an easier clustering problem, then apply standard clustering algorithms in this space.
• When the data in at least one of the views is well separated, this algorithm clusters correctly with high probability.
ALGORITHM
• Input: a set of samples S, the number of clusters k
1. Randomly partition S into two subsets A and B of equal size.
2. Let C_12(A) be the covariance matrix between views 1 and 2, computed from the set A. Compute the top k-1 left singular vectors of C_12(A), and project the samples in B on the subspace spanned by these vectors.
3. Apply a clustering algorithm (single linkage clustering, K-means) to the projected examples in view 1.
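The three steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `kmeans` and `multiview_cluster` are hypothetical, the k-means routine is a plain Lloyd's iteration with farthest-point initialisation, the data are synthetic, and no whitening of the views is performed (the SVD of the cross-covariance coincides with CCA only when each view has identity covariance).

```python
import numpy as np

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def multiview_cluster(V1, V2, k, seed=0):
    """Cluster the samples in subset B using view 1 projected via C_12(A)."""
    rng = np.random.default_rng(seed)
    n = V1.shape[0]
    # Step 1: random partition of the sample indices into A and B.
    perm = rng.permutation(n)
    A, B = perm[: n // 2], perm[n // 2:]
    # Step 2: cross-covariance between the views, estimated on A only.
    mean1 = V1[A].mean(axis=0)
    X1 = V1[A] - mean1
    X2 = V2[A] - V2[A].mean(axis=0)
    C12 = X1.T @ X2 / (len(A) - 1)
    U, _, _ = np.linalg.svd(C12, full_matrices=False)
    P = U[:, : k - 1]                # top k-1 left singular vectors
    Z = (V1[B] - mean1) @ P          # project view 1 of the samples in B
    # Step 3: standard clustering in the projected space.
    return B, kmeans(Z, k, seed=seed)

# Synthetic two-view mixture with well-separated cluster means.
rng = np.random.default_rng(0)
k, n = 3, 600
true = rng.integers(k, size=n)
M1 = 6.0 * rng.normal(size=(k, 20))
M2 = 6.0 * rng.normal(size=(k, 15))
V1 = M1[true] + rng.normal(size=(n, 20))
V2 = M2[true] + rng.normal(size=(n, 15))
idx_B, labels = multiview_cluster(V1, V2, k)
```

Estimating the projection on A and clustering only B keeps the projection independent of the samples being clustered, which is why the sample is split in step 1.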
EXPERIMENTS
SPEAKER IDENTIFICATION
• Dataset
  • 41 speakers, speaking 10 sentences each
  • Audio features: 1584 dimensions
  • Video features: 2394 dimensions
• Method 1: use PCA to project into 40 dimensions
• Method 2: use CCA (after PCA into 100 dimensions for images and 1000 dimensions for audio)
• Cluster into 82 clusters (2 per speaker) using K-means
SPEAKER IDENTIFICATION
• Evaluation
  • Conditional perplexity = the mean number of speakers corresponding to each cluster
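A small sketch of how such a score could be computed, assuming conditional perplexity is defined as 2 raised to the conditional entropy H(speaker | cluster) in bits. The slide does not state the formula, so this definition, the function name `conditional_perplexity`, and the toy labels are assumptions.

```python
import numpy as np

def conditional_perplexity(speakers, clusters):
    """Return 2 ** H(speaker | cluster), with entropy measured in bits."""
    speakers = np.asarray(speakers)
    clusters = np.asarray(clusters)
    n = len(speakers)
    h = 0.0
    for c in np.unique(clusters):
        members = speakers[clusters == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        # Entropy of the speaker distribution inside cluster c,
        # weighted by the fraction of samples that fall in c.
        h += (len(members) / n) * -(p * np.log2(p)).sum()
    return 2.0 ** h

# Each cluster holds a single speaker: perplexity 1.0.
pure = conditional_perplexity([0, 0, 1, 1], [5, 5, 7, 7])
# Each cluster holds two speakers equally often: perplexity 2.0.
mixed = conditional_perplexity([0, 1, 0, 1], [5, 5, 7, 7])
```

Lower is better under this definition: a value of 1 means every cluster contains exactly one speaker.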
CLUSTERING WIKIPEDIA ARTICLES
• Dataset
  • 128 K Wikipedia articles, evaluated on 73 K articles that belong to the 500 most frequent categories.
  • Link structure feature L is a concatenation of "to" and "from" vectors. L(i) is the number of times the current article links to/from article i.
  • Text feature is a bag-of-words vector.
• Methods: compared PCA and CCA
  • Used a hierarchical clustering procedure: iteratively pick the largest cluster, reduce the dimensionality using PCA or CCA, and use k-means to break the cluster into smaller ones, until reaching the total desired number of clusters.
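The hierarchical procedure can be sketched as a loop that repeatedly splits the largest cluster. The sketch below makes several assumptions the slide leaves open: it uses PCA only (not CCA) for the reduction, splits each cluster into exactly two pieces, and runs on synthetic data. The names `pca_reduce`, `two_means`, and `hierarchical_split` are hypothetical.

```python
import numpy as np

def pca_reduce(X, d):
    """Project centered X onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def two_means(Z, n_iter=30):
    """Lloyd's k-means with k = 2 and a deterministic initialisation."""
    # Start from the point farthest from the mean and the point farthest from it.
    c0 = Z[np.argmax(((Z - Z.mean(axis=0)) ** 2).sum(axis=1))]
    c1 = Z[np.argmax(((Z - c0) ** 2).sum(axis=1))]
    centers = np.array([c0, c1])
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def hierarchical_split(X, n_clusters, d=2):
    """Split the largest cluster until n_clusters clusters exist."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        biggest = clusters.pop()                       # pick the largest cluster
        labels = two_means(pca_reduce(X[biggest], d))  # reduce, then split
        clusters += [biggest[labels == 0], biggest[labels == 1]]
    assignment = np.empty(len(X), dtype=int)
    for cid, members in enumerate(clusters):
        assignment[members] = cid
    return assignment

# Four well-separated synthetic blobs in 10 dimensions.
rng = np.random.default_rng(0)
means = np.zeros((4, 10))
means[:, :2] = [[-30, -8], [-30, 8], [30, -8], [30, 8]]
truth = np.repeat(np.arange(4), 50)
X = means[truth] + rng.normal(size=(200, 10))
assignment = hierarchical_split(X, n_clusters=4)
```

Recomputing the reduction inside each cluster lets the projection adapt to the local structure of that cluster rather than reusing one global subspace.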
RESULTS
THANK YOU
APPENDIX: A NOTE ON CORRELATION
• Correlation between x_i and x_j is the covariance normalized by the geometric mean of the variances of x_i and x_j
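Written as a formula, with c_ij denoting the covariance between x_i and x_j and c_ii the variance of x_i (notation assumed here):

```latex
\rho_{ij} \;=\; \frac{c_{ij}}{\sqrt{c_{ii}\, c_{jj}}},
\qquad -1 \le \rho_{ij} \le 1
```

The square root of the product c_ii c_jj is the geometric mean of the two variances, which is what makes the correlation independent of the scale of either variable.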
AFFINE TRANSFORMATIONS
• An affine transformation is a map
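For reference, the usual textbook definition can be written as below. The symbols F, A, and b are assumed notation and are not taken from the slide.

```latex
F : \mathbb{R}^n \to \mathbb{R}^n, \qquad F(x) = A x + b,
```

where A is a linear transformation (a matrix) and b is a translation vector.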