
Some Topics Deserving Concern

Songcan Chen

2013.3.6

Outline

Copula & its applications

Kronecker Decomposition for Matrices

Covariance Descriptors & Metrics on Manifolds

[1] Fabrizio Durante and Carlo Sempi, Copula Theory: An Introduction (Chapt. 1), in P. Jaworski et al. (eds.), Copula Theory and Its Applications, Lecture Notes in Statistics 198, 2010.

[2] Jean-David Fermanian, An overview of the goodness-of-fit test problem for copulas (Chapt. 1), arXiv, 19 Nov. 2012.

Applications

[A1] David Lopez-Paz, Jose Miguel Hernandez-Lobato, Bernhard Scholkopf, Semi-Supervised Domain Adaptation with Non-Parametric Copulas, NIPS 2012 / arXiv, 1 Jan. 2013.

[A2] David Lopez-Paz, et al., Gaussian Process Vine Copulas for Multivariate Dependence, ICML 2013 / arXiv, 16 Feb. 2013.

[A3] Carlos Almeida, et al., Modeling high dimensional time-varying dependence using D-vine SCAR models, arXiv, 9 Feb. 2012.

[A4] Alexander Bauer, et al., Pair-copula Bayesian networks, arXiv, 23 Nov. 2012.

…

Copula & its applications

Kronecker Decomposition for Matrix

[1] C. V. Loan and N. Pitsianis, Approximation with Kronecker products, in Linear Algebra for Large Scale and Real Time Applications, Kluwer Publications, 1993, pp. 293-314.

[2] T. Tsiligkaridis, A. Hero, and S. Zhou, On Convergence of Kronecker Graphical Lasso Algorithms, to appear in IEEE TSP, 2013.

[3] ---, Convergence Properties of Kronecker Graphical Lasso Algorithms, arXiv:1204.0585, July 2012.

[4] ---, Low Separation Rank Covariance Estimation using Kronecker Product Expansions, Google, 2013.

[5] ---, Covariance Estimation in High Dimensions via Kronecker Product Expansions, arXiv, 12 Feb. 2013.

[6] ---, Sparse Covariance Estimation under Kronecker Product Structure, ICASSP 2012, pp. 3633-3636.

[7] Marco F. Duarte, Richard G. Baraniuk, Kronecker Compressive Sensing, IEEE TIP, 21(2):494-504, 2012.

[8] Martin Singull, et al., More on the Kronecker Structured Covariance Matrix, Communications in Statistics - Theory and Methods, 41:2512-2523, 2012.

Covariance Descriptor

[1] Oncel Tuzel, Fatih Porikli, and Peter Meer, Region Covariance: A Fast Descriptor for Detection and Classification, Tech. Report, 2005.

[2] Yanwei Pang, Yuan Yuan, Xuelong Li, Gabor-Based Region Covariance Matrices for Face Recognition, IEEE T. Circuits and Systems for Video Technology, 18(7):989-993, 2008.

[3] Anoop Cherian, et al., Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices, IEEE TPAMI, in press, 2012.

[4] Pedro Cortez Cargill, et al., Object Tracking based on Covariance Descriptors and On-Line Naive Bayes Nearest Neighbor Classifier, 2010 4th Pacific-Rim Symp. on Image and Video Technology, pp. 139-144.

[5] Ravishankar Sivalingam, et al., Positive Definite Dictionary Learning for Region Covariances, ICCV 2011.

[6] Mehrtash T. Harandi, et al., Kernel Analysis over Riemannian Manifolds for Visual Recognition of Actions, Pedestrians and Textures, CVPR 2012.

Copula & its applications

What is Copula?

Definition

Copulas are statistical tools that factorize multivariate distributions into the product of their marginals and a function that captures any possible form of dependence among the marginals. This function is referred to as the copula, and it links the marginals together into the joint multivariate model.

What is Copula?

Mathematical formulation:

$$p(x_1,\dots,x_d)=\Big[\prod_{i=1}^{d}p(x_i)\Big]\,c\big(P(x_1),\dots,P(x_d)\big) \qquad (1)$$

where $P(x_i)$ is the marginal cdf of the random variable $x_i$. Interestingly, the copula density has uniform marginals, since $P(z)\sim U[0,1]$ for any random variable $z$. When $P(x_1),\dots,P(x_d)$ are continuous, the copula $c(\cdot)$ is unique.

Especially, multivariate densities can be factorized into a product of marginal distributions and bivariate copula functions (so-called vines):

$$p(x_1,\dots,x_d)=\prod_{i=1}^{d}p(x_i)\ \prod_{e}c_{j(e),k(e)\mid D(e)} \qquad (2)$$

Each of these factors corresponds to one of the building blocks that are assumed either constant or varying across different learning domains.

→ applicable to DA, TL and MTL!

Characteristics

Infinitely many multivariate models share the same underlying copula function! (main advantage)

This allows modeling separately the marginal distributions and the dependencies linking them together, to produce the multivariate model under study.

Estimate p(x) from given samples

Step 1: Construct estimates of the marginal pdfs and cdfs.

Step 2: Combine them (a code sketch follows).
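A minimal sketch of this two-step recipe, assuming KDE marginals for Step 1 and a Gaussian copula for Step 2; the function name and all implementation choices are illustrative, not prescribed by the slides:

```python
import numpy as np
from scipy import stats

def fit_copula_density(X):
    """Two-step joint density estimate: KDE marginals + Gaussian copula."""
    n, d = X.shape
    # Step 1: marginal pdfs via 1-D KDEs, marginal cdfs via sorted samples
    kdes = [stats.gaussian_kde(X[:, i]) for i in range(d)]
    sorted_cols = np.sort(X, axis=0)
    # Fit the Gaussian copula: correlation of the normal scores z = Phi^{-1}(u)
    U = stats.rankdata(X, axis=0) / (n + 1.0)
    R = np.corrcoef(stats.norm.ppf(U), rowvar=False)
    R_inv = np.linalg.inv(R)
    _, logdet_R = np.linalg.slogdet(R)

    def pdf(x):
        # empirical marginal cdfs, kept strictly inside (0, 1)
        u = np.array([(np.searchsorted(sorted_cols[:, i], x[i]) + 0.5) / (n + 1.0)
                      for i in range(d)])
        z = stats.norm.ppf(u)
        # Step 2: copula density times the product of marginal pdfs, as in (1)
        log_c = -0.5 * logdet_R - 0.5 * z @ (R_inv - np.eye(d)) @ z
        log_marginals = sum(np.log(kdes[i](x[i])[0]) for i in range(d))
        return float(np.exp(log_c + log_marginals))

    return pdf
```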


Estimate marginal pdfs and cdfs

Parametric (copula) manner

Examples: Gaussian, Gumbel, Frank, Clayton or Student copulas, etc.

Weakness: real-world data often exhibit complex dependencies which cannot be correctly described by a fixed parametric family!

Illustration of the Weaknesses

Non-parametric manner

Using unidimensional KDEs.

Illustration of estimation for Bivariate Copulas

Estimate marginal pdfs and cdfs

Non-parametric Bivariate Copulas

Estimation: from the pdfs to the cdfs, form a pseudo-sample from the copula c:

$$(u, v)=\big(P(x),\,Q(y)\big) \qquad (4)$$

where the r.v. pair (u, v) has uniform marginals on [0, 1].

Non-parametric Bivariate Copulas

(u,v)'s joint density is the copula function c(u, v)!

KDE with Gaussian kernels can approximate c(u, v), but such an estimate has support on R^2, whereas (u,v) is supported on [0,1]x[0,1]!

Instead, perform the density estimation in a transformed space: select some continuous distribution with support on R, with strictly positive density $\phi$, cumulative distribution $\Phi$ and quantile function $\Phi^{-1}$, and transform marginal-wise to $(z_1, z_2)=(\Phi^{-1}(u), \Phi^{-1}(v))$. The joint pdf of the transformed pair is

$$p(z_1, z_2)=c\big(\Phi(z_1),\Phi(z_2)\big)\,\phi(z_1)\,\phi(z_2) \qquad (6)$$

Non-parametric Bivariate Copulas

The copula of this new density is identical to the copula of (4), since the performed transformations are marginal-wise, and the support of (6) is now R^2.

Specially, using the Gauss density for $\phi$, a Gaussian-kernel KDE $\hat{p}$ of (6) yields the copula estimate

$$\hat{c}(u,v)=\frac{\hat{p}\big(\Phi^{-1}(u),\Phi^{-1}(v)\big)}{\phi\big(\Phi^{-1}(u)\big)\,\phi\big(\Phi^{-1}(v)\big)}$$

See [A1] for more details of the derivation! (A code sketch follows.)
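A sketch of this transformed-space estimator, using the Gaussian cdf/quantile as the transform, as in [A1]; the function names are illustrative:

```python
import numpy as np
from scipy import stats

def fit_bivariate_copula(x, y):
    """Non-parametric copula density c(u, v) via Gaussian-kernel KDE in a
    transformed space, following the construction around eqs. (4)-(6)."""
    n = len(x)
    # pseudo-sample (4) from the copula via rescaled ranks
    u = stats.rankdata(x) / (n + 1.0)
    v = stats.rankdata(y) / (n + 1.0)
    # marginal-wise map to R^2, where a Gaussian-kernel KDE is well supported
    z = np.vstack([stats.norm.ppf(u), stats.norm.ppf(v)])
    kde = stats.gaussian_kde(z)

    def c(uu, vv):
        zz = np.vstack([stats.norm.ppf(np.atleast_1d(uu)),
                        stats.norm.ppf(np.atleast_1d(vv))])
        # invert the marginal transform: eq. (6) rearranged for c(u, v)
        return kde(zz) / (stats.norm.pdf(zz[0]) * stats.norm.pdf(zz[1]))

    return c
```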

Non-parametric Multivariate Copulas

From bivariate (pair copulas) to multivariate (copula):

Extension trick: introduction of the R-vine.

Domain Adaptation: non-linear regression with continuous data

Given the source pdf $p_s(\mathbf{x}, y)$, we want to solve a target regression task with density $p_t(\mathbf{x}, y)$.

DA of Non-linear regression

Given the data available for both tasks, our objective is to build a good estimate for the target conditional density $p_t(y \mid \mathbf{x})$.

To address this domain adaptation problem, we assume that $p_t$ is a modified version of $p_s$. In particular, we assume that $p_t$ is obtained in two steps from $p_s$.

DA of Non-linear regression

Step 1: $p_s$ is expressed using an R-vine representation of the form (2).

Step 2: Some of the factors included in that representation (marginal distributions or pairwise copulas) are modified to derive $p_t$.

All we need to address the adaptation across domains is to reconstruct the R-vine representation of $p_s$ using data from the source task, and then identify which of the factors have been modified to produce $p_t$. These factors are corrected using data from the target task.



DA of Non-linear regression

A key point:

Changes in these factors across different domains can be detected using two-sample tests (such as MMD), and transferred across domains in order to adapt the target task density model!

See [A1] for more details!

Maximum Mean Discrepancy (MMD) returns low p-values when two samples are unlikely to have been drawn from the same distribution (a code sketch follows).
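A minimal sketch of the squared-MMD statistic with an RBF kernel, plus a permutation p-value; the bandwidth and permutation count are illustrative choices:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples X and Y."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())

def mmd_pvalue(X, Y, n_perm=200, seed=0):
    """Permutation p-value: small when X, Y likely come from different laws."""
    rng = np.random.default_rng(seed)
    stat, Z, count = mmd2(X, Y), np.vstack([X, Y]), 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        if mmd2(Z[perm[:len(X)]], Z[perm[len(X):]]) >= stat:
            count += 1
    return (count + 1) / (n_perm + 1)
```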

Insights

How to extend the copula to image patches?

How to apply it to multiview learning with (semi-)pairing or/and (semi-)supervision?

How to adapt the universum to such a new problem?

How to apply it to zero-data learning?

Tailor it to 2D (even tensor) copulas.




Kronecker Product Decomposition for (Covariance) Matrices

Kronecker Product (KP) Covariance

[1] C. V. Loan and N. Pitsianis, Approximation with Kronecker products, in Linear Algebra for Large Scale and Real Time Applications, Kluwer Publications, 1993, pp. 293-314.

[1] proves that any pq x pq matrix $\Sigma_0$ can be written as an orthogonal expansion of KPs of the form

$$\Sigma_0=\sum_{i=1}^{r}A_i\otimes B_i,\qquad A_i\in\mathbb{R}^{p\times p},\ B_i\in\mathbb{R}^{q\times q} \qquad (1)$$

thus allowing any covariance matrix to be arbitrarily well approximated by a bilinear decomposition of the form (1). (A code sketch follows.)
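The rearrangement idea of [1] is constructive: reshape the matrix so that each KP term becomes a rank-one term, then truncate an SVD. A minimal sketch, assuming S is pq x pq, viewed as a p-by-p grid of q-by-q blocks:

```python
import numpy as np

def nearest_kron_approx(S, p, q, r=1):
    """Rank-r Kronecker-product approximation S ~ sum_i A_i (x) B_i via the
    Van Loan-Pitsianis rearrangement: the KP expansion (1) becomes an
    ordinary low-rank (SVD) approximation of the rearranged matrix R(S)."""
    # R(S): the row-major vec of each q x q block of S becomes one row
    R = np.stack([S[i*q:(i+1)*q, j*q:(j+1)*q].reshape(-1)
                  for i in range(p) for j in range(p)])
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    terms = []
    for i in range(r):
        A = np.sqrt(s[i]) * U[:, i].reshape(p, p)
        B = np.sqrt(s[i]) * Vt[i].reshape(q, q)
        terms.append((A, B))
    return terms  # S is approximated by sum(np.kron(A, B) for A, B in terms)
```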

Estimation of HD Covariance Matrices

Applications

Channel modeling for MIMO wireless communications, geo-statistics, genomics, multi-task learning, face recognition, recommendation systems, collaborative filtering, …

Estimation of HD Covariance Matrices

Main difficulty of estimation via the maximum likelihood principle: the nonconvexity of the optimization problem!

Seeking alternatives:

1) The flip-flop (FF) algorithm [WJS08];

2) Penalized least squares (PLS) [Lou12];

3) Permuted rank-penalized LS (PRLS) [5].

[WJS08] K. Werner, M. Jansson, and P. Stoica, On estimation of covariance matrices with Kronecker product structure, IEEE TSP, 56(2), 2008.

[Lou12] K. Lounici, "High-dimensional covariance matrix estimation with missing observations," arXiv:1201.2577v5, May 2012.

PLS

Sample covariance matrix (SCM) of n zero-mean samples $x_i$ with covariance of the form (1):

$$\hat{S}_n=\frac{1}{n}\sum_{i=1}^{n}x_i x_i^{T} \qquad (2)$$

PLS fits the expansion (1) to the SCM by penalized least squares:

$$\min_{\Sigma}\ \big\|\hat{S}_n-\Sigma\big\|_F^{2}+\lambda\,\mathrm{Pen}(\Sigma) \qquad (3)$$

PRLS

PRLS penalizes the rank of the rearranged covariance: with $\mathcal{R}(\cdot)$ the rearrangement operator of [1],

$$\hat{\Theta}=\arg\min_{\Theta}\ \big\|\mathcal{R}(\hat{S}_n)-\Theta\big\|_F^{2}+\lambda\,\|\Theta\|_{*} \qquad (4)$$

As a result, the closed-form solution of (4) is singular value thresholding of $\mathcal{R}(\hat{S}_n)$:

$$\hat{\Theta}=\sum_i\big(\sigma_i-\tfrac{\lambda}{2}\big)_{+}\,u_i v_i^{T} \qquad (5)$$

(a code sketch follows).

A Theorem (high-dimensional error bounds for PRLS): see [5] for more details!
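Under the PRLS reading sketched above (a nuclear-norm penalized fit to the rearranged SCM), the closed-form step is singular value thresholding; a sketch, with the threshold lam treated as a free parameter:

```python
import numpy as np

def prls(S, p, q, lam):
    """Permuted rank-penalized LS sketch: soft-threshold the singular values
    of the rearranged SCM R(S), then map back to a pq x pq covariance."""
    # rearrangement: the vec of each q x q block of S becomes one row of R
    R = np.stack([S[i*q:(i+1)*q, j*q:(j+1)*q].reshape(-1)
                  for i in range(p) for j in range(p)])
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s = np.maximum(s - lam / 2.0, 0.0)      # singular value thresholding
    Sigma = np.zeros_like(S)
    for i in range(len(s)):
        if s[i] > 0:                        # re-assemble surviving KP terms
            A = U[:, i].reshape(p, p)
            B = Vt[i].reshape(q, q)
            Sigma += s[i] * np.kron(A, B)
    return Sigma
```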

Other Estimators for KP-Structured Covariance

The basic Kronecker model is $\Sigma = A \otimes B$.

The ML objective is minimized jointly over the factors A and B; using the substitution given there, the problem (58) turns into a reduced form.

Hybrid Robust Kronecker Model

The ML objective: solving for $\Sigma > 0$ again via Lemma 4 yields a reduced problem (73), which further reduces to (75). Solve (75) using a fixed point iteration; an arbitrary positive definite matrix can be used as the initial iterate.

(Equation numbers (58), (73), (75) follow the source paper's numbering.)

$$E_{01}(A,B,Q,\Gamma)=\mathrm{tr}\big[(\Gamma-BA^{T}Y)^{T}(\Gamma-BA^{T}Y)\,Q^{-1}\big]+\lambda_{1}\log|Q_{1}|+\lambda_{2}\log|Q_{2}|$$

$$E_{02}(A,B,\{Q_i\},\{X_i\})=\sum_{i=1}^{n}\mathrm{tr}\big[(X_i-BA^{T}Y_i)^{T}(X_i-BA^{T}Y_i)\,Q_i^{-1}\big]$$

$$E_{03}(A,B,Q,\{\lambda_i\})=\sum_{i=1}^{C}\sum_{j=1}^{n_i}\lambda_i\,\mathrm{tr}\big[(X_j^{i}-BA^{T}Y_j^{i})^{T}(X_j^{i}-BA^{T}Y_j^{i})\,Q_i^{-1}\big]$$

…

Insights (1)

1) Metric Learning (ML)

ML&CL, relative distance constraints, LMNN-like, …

A distance between matrix data:

$$d^{2}(X_i,X_j)=\mathrm{tr}\big[(X_i-X_j)^{T}(X_i-X_j)\,Q^{-1}\big]$$

2) Classification learning

Predictive function: $f(X)=\mathrm{tr}(W^{T}X)+b$; the objective:

$$\min\ \sum_{i=1}^{C}\sum_{j=1}^{n_i}\big[\mathrm{tr}(W_i^{T}X_j^{i})+b_i-y_j^{i}\big]^{2}+\lambda\,\mathrm{Pen}_{*}(W_1,\dots,W_C)$$

(A code sketch follows.)
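The two formulas above in code form; since the slide equations were damaged and reconstructed, treat the exact trace forms as assumptions:

```python
import numpy as np

def matrix_predict(X, W, b):
    """Predictive function f(X) = tr(W^T X) + b for 2D (matrix) inputs."""
    return float(np.trace(W.T @ X)) + b

def matrix_distance2(Xi, Xj, Q_inv):
    """Squared distance d^2 = tr[(Xi - Xj)^T (Xi - Xj) Q^{-1}],
    matching the reconstructed metric above."""
    D = Xi - Xj
    return float(np.trace(D.T @ D @ Q_inv))
```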
Insights (2)

ML across heterogeneous domains

2 lines:

1) Line 1: learn domain-specific projections $W_x$, $W_y$ under pairwise distance constraints

$$d^{2}(\mathbf{x}_i,\mathbf{y}_j)=\big\|W_x^{T}\mathbf{x}_i-W_y^{T}\mathbf{y}_j\big\|_2^{2}$$

2) Line 2 (for ML&CL): a bilinear measure on the stacked vector $\mathbf{z}=[\mathbf{x};\mathbf{y}]$

$$f(\mathbf{x},\mathbf{y})=[\mathbf{x}^{T}\ \mathbf{y}^{T}]\begin{bmatrix}0 & W\\ W^{T} & 0\end{bmatrix}\begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix}=\mathbf{z}^{T}U\mathbf{z}',\qquad U=\begin{bmatrix}U_{xx} & U_{xy}\\ U_{xy}^{T} & U_{yy}\end{bmatrix}$$

Symmetry and PSD

An indefinite measure ({U_i} is the base & {α_i} is sparsified):

$$f(\mathbf{x},\mathbf{y})=\mathbf{z}^{T}\Big(\sum_{i=1}^{I}\alpha_i U_i\Big)\mathbf{z}',\qquad \text{with}\ \sum_{i=1}^{I}\alpha_i=1$$

Implying that the 2 lines can be unified into a common indefinite ML!

Noise model

$$\mathbf{x}_i=\mathbf{m}_c+U_c\,\mathbf{y}_i+\mathbf{e}_{ci}+\mathbf{o}_{ci}$$

where c indexes the c-th class or cluster, $\mathbf{e}_{ci}$ is noise, and $\mathbf{o}_{ci}$ is an outlier term with $\|\mathbf{o}_{ci}\|\neq 0$ if $\mathbf{x}_i$ is an outlier, 0 otherwise.

Discuss:

1) $U_c=0$, $\mathbf{o}_{ci}=0$; $\mathbf{e}_{ci}\sim N(0,dI)$ → means; $\mathbf{e}_{ci}\sim \mathrm{Lap}(0,dI)$ → medians; other priors → other statistics.

2) $U_c\neq 0$, $\mathbf{o}_{ci}=0$; $\mathbf{e}_{ci}\sim N(0,dI)$ → PCA; $\mathbf{e}_{ci}\sim \mathrm{Lap}(0,dI)$ → L1-PCA; other priors → other PCAs.

Insights (4)

3) $U_c=0$, $\mathbf{o}_{ci}\neq 0$; $\mathbf{e}_{ci}\sim N(0,dI)$ → robust (k-)means; $\mathbf{e}_{ci}\sim \mathrm{Lap}(0,dI)$ → robust (k-)medians.

4) Subspace: $U_c\neq 0$, $\mathbf{o}_{ci}\neq 0$; $\mathbf{e}_{ci}\sim N(0,dI)$ → robust k-subspaces.

5) $\mathbf{m}_c=0$ ……

6) Robust (Semi-)NMF ……

7) Robust CA ……

where the matrix form of the noise model $\mathbf{x}_i=\mathbf{m}_c+U_c\,\mathbf{y}_i+\mathbf{e}_{ci}+\mathbf{o}_{ci}$ is $\Gamma=BA^{T}Y+E+O$.
Covariance Descriptor (CD)

Applications of CD

Multi-camera object tracking;

Human detection;

Image segmentation;

Texture segmentation;

Robust face recognition;

Emotion recognition;

Human action recognition;

Speech recognition.

[3] Anoop Cherian, et al., Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices, IEEE TPAMI, in press, 2012.

CD for Image and Vision

I: an intensity or color image.

F: a W x H x d feature image extracted from I by

$$F(x,y)=\phi(I,x,y) \qquad (1)$$

where the function $\phi$ can be any mapping such as intensity, color, gradients, filter responses, etc.

E.g., $\phi$ may stack pixel coordinates, intensity, and first- and second-order image derivatives.

CD for Image and Vision

For a given rectangular region R in F, let {z_k}, k = 1, …, n, be the d-dimensional feature points inside R. The CD of R is defined as

$$C_R=\frac{1}{n-1}\sum_{k=1}^{n}(z_k-\mu)(z_k-\mu)^{T},\qquad \mu=\frac{1}{n}\sum_{k=1}^{n}z_k \qquad (2)$$

(A code sketch follows.)
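A direct sketch of (1)-(2) for a grayscale image; the particular feature map phi below (pixel coordinates, intensity, first derivatives) is an illustrative choice in the spirit of [1]:

```python
import numpy as np

def feature_image(I):
    """Eq. (1): map image I to an H x W x d feature image F (illustrative phi)."""
    Iy, Ix = np.gradient(I.astype(float))
    xs, ys = np.meshgrid(np.arange(I.shape[1]), np.arange(I.shape[0]))
    return np.stack([xs, ys, I, np.abs(Ix), np.abs(Iy)], axis=-1)

def region_covariance(F, top, left, h, w):
    """Eq. (2): covariance descriptor of the rectangular region R."""
    Z = F[top:top+h, left:left+w].reshape(-1, F.shape[-1])  # n x d points
    return np.cov(Z, rowvar=False)          # (z - mu)(z - mu)^T / (n - 1)
```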

CD for Face Images

Object representation: construct five covariance matrices from overlapping regions of an object feature image. The covariances are used as the object descriptors!

CD for Textures

Texture representation: there are u images for each texture class; we sample s regions from each image and compute their covariance matrices C.

Advantages

A single covariance matrix extracted from a region is usually enough to match the region in different views and poses;

A natural way of fusing multiple features which might be correlated;

Low-dimensional compared to other region descriptors, and due to the symmetry of C_R it has only d(d+1)/2 distinct values;

A certain scale and rotation invariance over the regions in different images, since the descriptor is independent of the ordering and the number of points;

Fast to compute via integral images (a code sketch follows)!
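Why integral images make this fast: C_R depends only on the region sums of z and of z z^T, and prefix sums deliver both in O(1) per query. A sketch (composing with feature_image above):

```python
import numpy as np

def integral_tensors(F):
    """Prefix sums of z and z z^T over the image, for O(1) region queries."""
    P = F.cumsum(0).cumsum(1)                                 # sums of z
    Q = np.einsum('hwi,hwj->hwij', F, F).cumsum(0).cumsum(1)  # sums of z z^T
    return P, Q

def region_cov_fast(P, Q, top, left, h, w):
    """Covariance of a region from 4 corner lookups per prefix tensor."""
    def box(T):
        total = T[top+h-1, left+w-1].copy()
        if top > 0:
            total -= T[top-1, left+w-1]
        if left > 0:
            total -= T[top+h-1, left-1]
        if top > 0 and left > 0:
            total += T[top-1, left-1]
        return total
    n = h * w
    s, S = box(P), box(Q)
    # C_R = (sum zz^T - (sum z)(sum z)^T / n) / (n - 1)
    return (S - np.outer(s, s) / n) / (n - 1)
```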

Matching

Key: distance measures between SPD matrices!

Known: all SPD matrices of a given size form a Riemannian manifold! Thus the distance between 2 SPDs can be measured using geodesics!

However, computing similarity between covariance matrices is non-trivial.

Metrics between 2 SPD Matrices X and Y

Affine Invariant Riemannian Metric (AIRM):

$$d_{\mathrm{AIRM}}(X,Y)=\big\|\log\!\big(X^{-1/2}\,Y\,X^{-1/2}\big)\big\|_{F}$$

Log-Euclidean Riemannian Metric (LERM):

$$d_{\mathrm{LERM}}(X,Y)=\big\|\log X-\log Y\big\|_{F}$$

Metrics between 2 SPD Matrices X and Y

Symmetrized KL-Divergence Metric (KLDM):

$$d_{\mathrm{KLDM}}^{2}(X,Y)=\tfrac{1}{2}\,\mathrm{tr}\big(X^{-1}Y+Y^{-1}X-2I\big)$$

Jensen-Bregman LogDet Divergence (JBLD):

$$J_{\ell d}(X,Y)=\log\Big|\tfrac{X+Y}{2}\Big|-\tfrac{1}{2}\log|XY|$$

(A code sketch of all four follows.)
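The four metrics in code, via eigendecompositions of the SPD inputs; a minimal sketch, assuming the inputs are symmetric positive definite:

```python
import numpy as np

def _powm_spd(X, p):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * w**p) @ V.T

def _logm_spd(X):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def airm(X, Y):
    """AIRM: ||log(X^{-1/2} Y X^{-1/2})||_F."""
    Xr = _powm_spd(X, -0.5)
    return np.linalg.norm(_logm_spd(Xr @ Y @ Xr), 'fro')

def lerm(X, Y):
    """LERM: ||log X - log Y||_F."""
    return np.linalg.norm(_logm_spd(X) - _logm_spd(Y), 'fro')

def kldm2(X, Y):
    """KLDM (squared form): tr(X^{-1}Y + Y^{-1}X - 2I) / 2."""
    d = X.shape[0]
    return 0.5 * np.trace(np.linalg.solve(X, Y) + np.linalg.solve(Y, X)
                          - 2 * np.eye(d))

def jbld(X, Y):
    """JBLD: log|(X+Y)/2| - (1/2) log|XY|, via slogdet for stability."""
    return (np.linalg.slogdet((X + Y) / 2)[1]
            - 0.5 * (np.linalg.slogdet(X)[1] + np.linalg.slogdet(Y)[1]))
```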

Properties of JBLD

Important Theorems (1)

Important Theorems (2)

Computing time (1)

Computing time (2)

K-means with JBLD

Objective: cluster covariance matrices, assigning each to its nearest centroid under JBLD (a code sketch follows the reference below).

[Figure: isosurface plots for various distance measures: (a) Frobenius distance, (b) AIRM, (c) KLDM, and (d) JBLD.]

[Table 3: a comparison of various metrics on covariances and their computational complexities against JBLD.]

See [3] for more details!

[3] Anoop Cherian, et al., Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices, IEEE TPAMI, in press, 2012.
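A naive K-means under JBLD, reusing jbld() from the sketch above. Note the exact JBLD centroid requires an iterative matrix-mean update (see [3]); the arithmetic mean below is a simplifying assumption:

```python
import numpy as np

def jbld_kmeans(covs, K, iters=20, seed=0):
    """Cluster SPD matrices: JBLD assignments, (simplified) mean updates."""
    rng = np.random.default_rng(seed)
    centroids = [covs[i].copy() for i in rng.choice(len(covs), K, replace=False)]
    labels = np.zeros(len(covs), dtype=int)
    for _ in range(iters):
        # assignment step: nearest centroid in JBLD
        labels = np.array([min(range(K), key=lambda k: jbld(C, centroids[k]))
                           for C in covs])
        # update step: arithmetic mean of the members (stays SPD)
        for k in range(K):
            members = [C for C, l in zip(covs, labels) if l == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids
```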

Insights

How to extend CD to text?

Key: define CD on a general graph with discrete operators on the graph, including

local: derivative, gradient, difference, etc.;

global: centrality, etc.

Tailor CD to 2D classifiers under various scenarios.

KP and pdfs defined on CDs.

Copulas on CDs!

Extend it to multiview learning with heterogeneous sources!

Thanks!

Q&A