Archetypal Analysis for Machine Learning - DTU Informatics

Informatics and Mathematical Modelling / Cognitive Systems Group

MLSP 2010 September 1st

Archetypal Analysis for
Machine Learning

Morten Mørup

DTU Informatics

Cognitive Systems Group

Technical University of Denmark

Joint work with

Lars Kai Hansen

DTU Informatics

Cognitive Systems Group

Technical University of Denmark

Archetypal Analysis (AA)

$$X \approx X C S$$

AA is formed by two simplex constraints:

$$|c_k|_1 = 1,\; C \ge 0 \qquad \text{and} \qquad |s_n|_1 = 1,\; S \ge 0$$

Archetype: $X c_k$ is formed by a convex combination of the data points.

Projection: $s_n$ gives the convex combination of archetypes forming each data point.
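To make the two simplex constraints concrete, here is a minimal NumPy sketch of the model X ≈ XCS. The sizes, the random data, and the helper name `random_simplex_columns` are illustrative assumptions and do not come from the talk; C and S are drawn at random rather than fitted.

```python
import numpy as np

def random_simplex_columns(rows, cols, rng):
    """Return a rows x cols matrix whose columns are non-negative and sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_points, n_archetypes = 5, 40, 3      # illustrative sizes only
X = rng.normal(size=(n_features, n_points))        # columns are data points

C = random_simplex_columns(n_points, n_archetypes, rng)  # archetypes as convex combinations of data points
S = random_simplex_columns(n_archetypes, n_points, rng)  # data points as convex combinations of archetypes

archetypes = X @ C                                  # n_features x n_archetypes
X_hat = archetypes @ S                              # reconstruction of X
sse = np.linalg.norm(X - X_hat, "fro") ** 2         # least-squares loss that a fitted AA would minimize
```

Because every column of C lies on the simplex, each archetype stays inside the convex hull of the data, which is what ties the extracted features to actual observations.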

The original paper of Cutler and Breiman considered 3 applications:

Swiss army head shape

Los Angeles Basin air pollution 1976

Tokamak Fusion Data

Other Applications:

Flame dynamics (Stone & Cutler 1996)

End member extraction of Galaxy Spectra (Chan et al. 2003)

Data driven Benchmarking (Porzio et al. 2008)

Archetypal analysis extracts the "principal convex hull" (PCH) of the data cloud

Convex hull: Blue lines and light shaded region
(dots indicate points in convex set)

Dominant convex hull: green lines and gray shaded region
(dots indicate archetypes)

While the convex set can be identified in linear time, O(N) (McCallum & Avis 1979), finding C and S is a non-convex (NP-hard) problem.
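For readers who want to see what "identifying the convex set" means in practice, the following is a small planar sketch. It uses Andrew's monotone chain, a standard textbook method that costs O(N log N) because of the sort; it is not the linear-time algorithm cited above, and the function name is an assumption made for this illustration.

```python
def convex_hull_2d(points):
    """Return the hull vertices of 2-D points in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # pop points that would make a right turn or lie on a straight line
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # the last point is the first point of the other half

    return half_hull(pts) + half_hull(reversed(pts))

# Toy data: a square with one interior point and one point on an edge
hull = convex_hull_2d([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
```

Only the four corners survive; the interior point and the point on the edge are not part of the convex set.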

(Dwyer, 1988)

NB: One might think that AA is highly driven by outliers; however, "outliers"
are only relevant if they reflect representative dynamics in the data!

Our (new) mathematical results:

1: The AA/PCH model is in general unique! See Theorem 1.

2: The AA/PCH model can be efficiently initialized by the proposed FurthestSum algorithm. The proposed FurthestSum algorithm guarantees extraction of points in the convex set, see Theorem 2.

3: The AA/PCH model parameters can be efficiently optimized by normalization invariant projected gradient. For details on the derivation of the updates and their computational complexity, see section 2.3.

Large scale Applications
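The initialization idea in result 2 can be sketched as a greedy selection: start from a seed point, repeatedly add the point whose summed distance to the current selection is largest, then replace the arbitrary seed. This is a minimal reading of the procedure, not the authors' reference code; the function names, the Euclidean distance, and the toy data are assumptions made for illustration.

```python
import numpy as np

def _furthest_unselected(sum_dist, selected):
    """Index with the largest summed distance among points not yet selected."""
    masked = sum_dist.copy()
    if selected:
        masked[selected] = -np.inf
    return int(np.argmax(masked))

def furthest_sum(X, n_archetypes, seed_index=0):
    """Greedy FurthestSum-style choice of column indices of X (features x points)."""
    n_points = X.shape[1]
    if not 1 <= n_archetypes <= n_points:
        raise ValueError("n_archetypes must lie between 1 and the number of points")

    def dist_to(j):
        # Euclidean distance from every point to point j
        return np.linalg.norm(X - X[:, [j]], axis=0)

    selected = [seed_index]
    sum_dist = dist_to(seed_index)
    while len(selected) < n_archetypes:
        j = _furthest_unselected(sum_dist, selected)
        selected.append(j)
        sum_dist += dist_to(j)

    # Drop the arbitrary seed and choose its replacement from the remaining picks only
    sum_dist -= dist_to(seed_index)
    selected.pop(0)
    selected.append(_furthest_unselected(sum_dist, selected))
    return selected

# Toy data: column 0 is the centre of a square, columns 1-4 its corners, columns 5-8 near the centre
X = np.array([[5.0, 0.0, 10.0, 0.0, 10.0, 4.5, 5.5, 5.0, 5.0],
              [5.0, 0.0, 0.0, 10.0, 10.0, 5.0, 5.0, 4.5, 5.5]])
picked = furthest_sum(X, n_archetypes=4, seed_index=0)
```

On this toy set the seed in the middle is discarded and the four corner columns are returned, so the initial archetypes already sit on the convex hull.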

Our Machine Learning Applications


Computer vision


NeuroImaging


TextMining


Collaborative Filtering

Computer Vision: CBCL face database

Face database: K=361 pixels, N=2429 images; all images belong with probability 1 to the convex set

X ≈ XCS

SVD/PCA: Low -> high freq. dynamics

NMF: Part Based Representation

AA: Archetypes/Freaks

K-means: Centroids/Prototypes

Archetypal Analysis naturally
bridges clustering methods with
low rank representations
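One way to read this bridge: a k-means solution already has the form X ≈ XCS, with S restricted to hard one-hot assignments and each column of C averaging the members of one cluster. AA keeps the same factorization but lets the columns of S be soft mixing weights. The toy points and labels below are hypothetical and chosen only to make the arithmetic easy to check.

```python
import numpy as np

X = np.array([[0.0, 1.0, 0.0, 10.0, 11.0],
              [0.0, 0.0, 1.0, 10.0, 10.0]])    # 2 features x 5 points
labels = np.array([0, 0, 0, 1, 1])             # hypothetical hard cluster assignment
n_clusters = 2
n_points = X.shape[1]

S = np.zeros((n_clusters, n_points))
S[labels, np.arange(n_points)] = 1.0           # one-hot columns: each point belongs to one cluster
C = S.T / S.sum(axis=1)                        # column k puts weight 1/|cluster k| on its members

centroids = X @ C                              # cluster means, written as convex combinations of data
X_hat = centroids @ S                          # every point is replaced by its centroid
```

Both C and S satisfy the simplex constraints of AA here, so k-means can be viewed as the special case in which the mixing weights are forced to be binary.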

NeuroImaging: Positron Emission Tomography

Altanserin tracer injected; the recorded signal is in theory a mixture
of 3 underlying binding profiles (archetypes): low binding
regions, high binding regions and artery/veins. Each voxel is a
given concentration fraction of these tissue types.

X ≈ XCS

Figure panels: XC and S, with components labelled Low Binding, High Binding, Artery/Veins

Text Mining: NIPS term-document (bag of words)

X ≈ XCS

XC: Distinct Aspects

Prototypical Aspects

Collaborative filtering: MovieLens

Medium size and large size MovieLens data (www.grouplens.org)

Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users

Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users

Extracts features representing distinct user types; each user is represented as a given concentration fraction of the user types. AA appears to have less tendency to overfit.

Conclusion


Archetypal Analysis is Unique in general (Theorem 1)


Archetypal Analysis can be efficiently initialized by the
proposed FurthestSum algorithm (Theorem 2) and optimized
through normalization invariant projected gradient.


Archetypal Analysis naturally bridges clustering with low rank
approximations


Archetypal Analysis results in easily interpretable features that
are closely related to the actual data


Archetypal Analysis is useful for a large variety of machine
learning problem domains within unsupervised learning
(Computer Vision, NeuroImaging, TextMining, Collaborative Filtering).


Archetypal Analysis can be extended to kernel representations,
finding the principal convex hull in a (potentially infinite-dimensional)
Hilbert space (see section 2.4 of the paper).
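The optimization and kernel points in this conclusion can be illustrated together. The loss ||X - XCS||² depends on X only through the inner products XᵀX, so alternating projected-gradient steps can be run on any Gram (kernel) matrix. The sketch below is one possible reading of "normalization invariant projected gradient", not the paper's exact updates: the gradient is taken with respect to column-normalized parameters, followed by clipping at zero and renormalization. The function names, the step-size rule (grow by 1.2 on success, halve on failure) and the random initialization are assumptions.

```python
import numpy as np

def aa_objective(gram, C, S):
    """||X - XCS||_F^2 expressed through gram = X^T X only."""
    GC = gram @ C
    return np.trace(gram) - 2.0 * np.sum(GC * S.T) + np.sum((C.T @ GC) * (S @ S.T))

def projected_step(P, G, mu):
    """Gradient step that keeps the columns of P non-negative and summing to one."""
    G_inv = G - np.sum(G * P, axis=0, keepdims=True)   # gradient w.r.t. the column-normalized parametrization
    P_new = np.maximum(P - mu * G_inv, 0.0)            # clip negative entries
    col = P_new.sum(axis=0, keepdims=True)
    ok = col > 0
    return np.where(ok, P_new / np.where(ok, col, 1.0), P)  # keep the old column if a step zeroes it out

def line_search(objective, P, G, E_old, mu, max_halvings=30):
    """Accept the first step that does not increase the objective; otherwise shrink mu."""
    for _ in range(max_halvings):
        P_new = projected_step(P, G, mu)
        E_new = objective(P_new)
        if E_new <= E_old:
            return P_new, E_new, 1.2 * mu
        mu *= 0.5
    return P, E_old, mu

def kernel_aa(gram, C, S, n_iter=50):
    """Alternate updates of S and C; returns the final C, S and objective value."""
    mu_C = mu_S = 1.0
    E = aa_objective(gram, C, S)
    for _ in range(n_iter):
        G_S = 2.0 * (C.T @ gram @ C @ S - C.T @ gram)
        S, E, mu_S = line_search(lambda P: aa_objective(gram, C, P), S, G_S, E, mu_S)
        G_C = 2.0 * (gram @ C @ (S @ S.T) - gram @ S.T)
        C, E, mu_C = line_search(lambda P: aa_objective(gram, P, S), C, G_C, E, mu_C)
    return C, S, E

# Usage on random data with a linear kernel; another positive semi-definite kernel matrix could be substituted
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 60))
gram = X.T @ X
C0 = rng.random((60, 3)); C0 /= C0.sum(axis=0)
S0 = rng.random((3, 60)); S0 /= S0.sum(axis=0)
E0 = aa_objective(gram, C0, S0)
C, S, E = kernel_aa(gram, C0, S0)
```

Because every step is accepted only if the loss does not increase, the objective is non-increasing by construction; as the problem is non-convex, the result is a local solution that depends on the initialization.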


Open problems and current research directions:


What is the optimal number of components?

Cross-validation based on missing value prediction
(see also collaborative filtering example in the paper)

Bayesian generative models for AA/PCH that
automatically penalize model complexity.


What if ’pure’ archetypes cannot be well represented
by the data available?

Selected References from the paper

[1] Adele Cutler and Leo Breiman, “Archetypal analysis,” Technometrics, vol. 36, no. 4, pp. 338–347, Nov 1994.

[2] D. S. Hochbaum and D. B. Shmoys, “A best possible heuristic for the k-center problem,” Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.

[7] Emily Stone and Adele Cutler, “Introduction to archetypal analysis of spatio-temporal dynamics,” Phys. D, vol. 96, no. 1-4, pp. 110–131, 1996.

[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, “On the use of archetypes as benchmarks,” Appl. Stoch. Model. Bus. Ind., vol. 24, no. 5, pp. 419–437, 2008.

[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, “Archetypal analysis of galaxy spectra,” Mon. Not. R. Astron. Soc., vol. 338, pp. 790, 2003.

[11] D. McCallum and D. Avis, “A linear algorithm for finding the convex hull of a simple polygon,” Information Processing Letters, vol. 9, pp. 201–206, 1979.

[12] Rex A. Dwyer, “On the convex hull of random points in a polytope,” Journal of Applied Probability, vol. 25, no. 4, pp. 688–699, 1988.