

Sparse Coding and Its Extensions for Visual Recognition

Kai Yu

Media Analytics Department
NEC Labs America, Cupertino, CA

Visual Recognition is Hot in Computer Vision


Caltech 101

PASCAL VOC

80 Million Tiny Images

ImageNet

The pipeline of machine visual perception


Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition


Feature extraction and selection:
- Most critical for accuracy
- Account for most of the computation
- Most time-consuming in the development cycle
- Often hand-crafted in practice

Most effort in machine learning has gone into the inference stage.

Computer vision features: SIFT, Spin image, HoG, RIFT, GLOH

Slide credit: Andrew Ng

Learning everything from data


Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

Machine learning is extended from the inference stage to cover feature extraction and selection as well.


BoW + SPM Kernel




Bag-of-visual-words (BoW) representation based on vector quantization (VQ)

Spatial pyramid matching (SPM) kernel

Combining multiple features, this method had been the state of the art on Caltech-101, PASCAL VOC, 15 Scene Categories, and more.

Figure credit: Fei-Fei Li, Svetlana Lazebnik

Winning Method in PASCAL VOC before 2009

Multiple feature sampling methods → Multiple visual descriptors → VQ coding, histogram, SPM → Nonlinear SVM

Convolutional Neural Networks

The architectures of some successful methods are not so different from CNNs.

Conv. filtering → Pooling → Conv. filtering → Pooling → …

BoW+SPM: the same architecture

Local gradients → Pooling (e.g., SIFT, HOG) → VQ coding → Average pooling (obtain histogram) → Nonlinear SVM
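As a toy illustration of the VQ coding and average pooling stages in this diagram, here is a minimal NumPy sketch; the descriptor and codebook sizes are illustrative assumptions, and descriptor extraction (e.g., SIFT) and the SVM are assumed to happen elsewhere.

import numpy as np

def vq_coding(descriptors, codebook):
    # Hard-assign each local descriptor (n x d) to its nearest codeword (k x d).
    dists = ((descriptors ** 2).sum(1)[:, None]
             - 2.0 * descriptors @ codebook.T
             + (codebook ** 2).sum(1)[None, :])
    codes = np.zeros((descriptors.shape[0], codebook.shape[0]))
    codes[np.arange(descriptors.shape[0]), dists.argmin(axis=1)] = 1.0
    return codes

def average_pooling(codes):
    # Average the one-hot codes over the image to obtain a normalized histogram.
    return codes.mean(axis=0)

# Toy usage: 500 SIFT-like descriptors (dim 128), a 1024-word codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))
codebook = rng.normal(size=(1024, 128))
histogram = average_pooling(vq_coding(descriptors, codebook))  # shape (1024,), feeds the SVM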

Observations:
- Nonlinear SVM is not scalable
- VQ coding may be too coarse
- Average pooling is not optimal
- Why not learn the whole thing?

Develop better methods

Better coding → Better pooling → Scalable linear classifier

Sparse Coding

Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection).

Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].

Coding: for a data vector x, solve the LASSO problem to find the sparse coefficient vector a:

min_a  || x − Σ_i a_i Φ_i ||² + λ || a ||₁
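A minimal sketch of these two steps using scikit-learn's DictionaryLearning (dictionary learning plus LASSO-based coding); the random patches and the dictionary size of 64 are illustrative assumptions, not the actual experimental setup.

import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(200, 64))           # 200 flattened 8x8 patches (toy data)

# Training: learn 64 bases; coding solves a LASSO problem per patch.
dico = DictionaryLearning(n_components=64, alpha=1.0,
                          transform_algorithm="lasso_lars", random_state=0)
codes = dico.fit_transform(patches)            # sparse coefficient vectors a
bases = dico.components_                       # the learned dictionary [Φ1, Φ2, ...]

# Each patch is approximated by codes @ bases, with most entries of `codes` being zero.
reconstruction = codes @ bases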




Sparse Coding Example

Natural images → learned bases (f1, …, f64): "edges"

Test example: x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0]  (feature representation)

Compact & easily interpretable

Slide credit: Andrew Ng

Self-taught Learning

[Raina, Lee, Battle, Packer & Ng, ICML 07]

Features are learned from unlabeled images; the classifier is trained on labeled examples (motorcycles vs. not motorcycles) and then asked at test time: what is this?

Slide credit: Andrew Ng

Classification Result on Caltech 101

9K images, 101 classes

SIFT + VQ + nonlinear SVM: 64%
Pixel + sparse coding + linear SVM: 50%

Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]

Local gradients → Pooling (e.g., SIFT, HOG) → Sparse coding → Max pooling → Scalable linear classifier
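To make the coding/pooling change concrete, here is a minimal sketch under the same toy assumptions as the earlier BoW snippet: sparse coding of SIFT-like descriptors via scikit-learn's SparseCoder (a stand-in for the paper's actual solver), followed by max pooling.

import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))     # local descriptors of one image (toy data)
dictionary = rng.normal(size=(1024, 128))     # stand-in for a learned, L2-normalized codebook
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

coder = SparseCoder(dictionary=dictionary, transform_algorithm="lasso_lars",
                    transform_alpha=0.15)
codes = coder.transform(descriptors)          # (200, 1024) sparse codes

# Max pooling: per-dimension maximum of absolute responses over the image
# (in the full method, over each spatial pyramid cell); feeds a linear SVM.
image_feature = np.abs(codes).max(axis=0)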

Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]

Caltech-101:
SIFT + VQ + nonlinear SVM: 64%
SIFT + sparse coding + linear SVM: 73%

What have we learned?

Local gradients → Pooling (e.g., SIFT, HOG) → Sparse coding → Max pooling → Scalable linear classifier

1. Sparse coding is useful (why?)
2. A hierarchical architecture is needed

MNIST Experiments

Error rates of the settings shown in the figure: 4.54%, 3.75%, 2.64%

When SC achieves the best classification accuracy, the learned bases look like digits: each basis has a clear local class association.

Distribution of coefficients (SIFT, Caltech101)

Neighboring bases tend to get nonzero coefficients.

Interpretation 1: discover subspaces
- Each basis is a "direction"
- Sparsity: each datum is a linear combination of only several bases
- Related to topic models

Interpretation 2: geometry of the data manifold
- Each basis is an "anchor point"
- Sparsity is induced by locality: each datum is a linear combination of neighboring anchors

A Function Approximation View of Coding

Setting: f(x) is a nonlinear feature extraction function on image patches x.

Coding: a nonlinear mapping x → a; typically a is high-dimensional and sparse.

Nonlinear learning: f(x) = <w, a>

A coding scheme is good if it helps learning f(x).

A Function Approximation View of Coding: The General Formulation

Function approximation error serves as an unsupervised learning objective.

Local Coordinate Coding (LCC)

Dictionary learning: k-means (or hierarchical k-means)

Coding for x, to obtain its sparse representation a:
Step 1 – ensure locality: find the K nearest bases
Step 2 – ensure low coding error: solve a small least-squares problem restricted to those K bases

Yu, Zhang & Gong, NIPS 09
Wang, Yang, Yu, Lv & Huang, CVPR 10
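The two coding steps can be sketched as follows, using the analytic solution of the locality-constrained variant (Wang et al., CVPR 10); the codebook here is a random placeholder standing in for the k-means dictionary, and K and the regularizer are illustrative choices.

import numpy as np

def llc_code(x, bases, K=5, eps=1e-6):
    # Step 1 (locality): keep only the K nearest bases of x.
    dists = np.linalg.norm(bases - x, axis=1)
    idx = np.argsort(dists)[:K]
    # Step 2 (low coding error): least squares over the K bases,
    # with coefficients constrained to sum to one.
    B = bases[idx] - x
    C = B @ B.T
    C += eps * np.trace(C) * np.eye(K)          # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(K))
    w /= w.sum()
    code = np.zeros(bases.shape[0])
    code[idx] = w                                # sparse: nonzero only on the neighbors
    return code

rng = np.random.default_rng(0)
bases = rng.normal(size=(1024, 128))             # stand-in for the k-means codebook
x = rng.normal(size=128)
a = llc_code(x, bases)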

Super-Vector Coding (SVC)

Dictionary learning: k-means (or hierarchical k-means)

Coding for x, to obtain its sparse representation a:
Step 1 – find the nearest basis of x, obtaining its VQ coding (zero-order part), e.g. [0, 0, 1, 0, …]
Step 2 – form the super-vector coding by adding (x − m) for the selected center m (local tangent part), e.g. [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …]

Zhou, Yu, Zhang, and Huang, ECCV 10
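A minimal sketch of these two steps; the weight s on the zero-order (VQ) part and the toy dimensions are assumptions for illustration, not the values used in the paper.

import numpy as np

def super_vector_code(x, centers, s=1.0):
    k, d = centers.shape
    nearest = np.argmin(np.linalg.norm(centers - x, axis=1))   # Step 1: VQ coding
    code = np.zeros(k * (1 + d))
    block = nearest * (1 + d)
    code[block] = s                                             # zero-order part
    code[block + 1 : block + 1 + d] = x - centers[nearest]      # local tangent part
    return code

rng = np.random.default_rng(0)
centers = rng.normal(size=(256, 128))    # stand-in for k-means centers
x = rng.normal(size=128)
sv = super_vector_code(x, centers)       # length 256 * 129, nonzero only in one block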

Function Approximation based on LCC

Figure: data points and bases; LCC gives a locally linear approximation.

Yu, Zhang & Gong, NIPS 09

Function Approximation based on SVC

Figure: data points and cluster centers; SVC gives a piecewise local linear (first-order) approximation using local tangents.

Zhou, Yu, Zhang, and Huang, ECCV 10

PASCAL VOC Challenge 2009

Per-class results table: ours vs. the best of other teams, and the difference.

No. 1 in 18 of 20 categories

We used only the HOG feature on gray images.

ImageNet Challenge 2010

1.4 million images, 1000 classes; classification accuracy measured as top-5 hit rate

VQ + intersection kernel: ~40%
Various coding methods + linear SVM: 64%–73%

Hierarchical Sparse Coding [Yu, Lin & Lafferty, CVPR 11]

Conv. filtering → Pooling → Conv. filtering → Pooling

Learning from unlabeled data

A two-layer sparse coding formulation
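A rough structural sketch of what a two-layer coding pipeline looks like: sparse coding of patches (layer 1), max pooling over a neighborhood, then sparse coding of the pooled responses (layer 2). The dictionaries here are random placeholders, and the actual HSC model of Yu, Lin & Lafferty learns both layers jointly with its own formulation, so this only conveys the architecture.

import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 25))                   # 16 small patches from one neighborhood (toy)
D1 = rng.normal(size=(64, 25)); D1 /= np.linalg.norm(D1, axis=1, keepdims=True)
D2 = rng.normal(size=(32, 64)); D2 /= np.linalg.norm(D2, axis=1, keepdims=True)

layer1 = SparseCoder(dictionary=D1, transform_algorithm="lasso_lars",
                     transform_alpha=0.1).transform(patches)     # (16, 64) first-layer codes
pooled = np.abs(layer1).max(axis=0, keepdims=True)               # max pool over the neighborhood
layer2 = SparseCoder(dictionary=D2, transform_algorithm="lasso_lars",
                     transform_alpha=0.1).transform(pooled)      # (1, 32) second-layer code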

MNIST Results -- Classification

HSC vs. CNN: HSC provides even better performance than CNN; more amazingly, HSC learns its features in an unsupervised manner.

MNIST Results -- Effect of Hierarchical Learning

Comparing the Fisher scores of HSC and SC: discriminative power is significantly improved by HSC, even though HSC is unsupervised coding.

MNIST Results -- Learned Codebook

One dimension in the second layer shows invariance to translation, rotation, and deformation.

Caltech101 Results -- Classification

The learned descriptor performs slightly better than SIFT + SC.

Conclusion and Future Work

A "function approximation" view to derive novel sparse coding methods.

Locality is one way to achieve sparsity, and it is really useful; but we need a deeper understanding of feature learning methods.

Interesting directions:
- Hierarchical coding
- Deep learning (many papers now!)
- Faster methods for sparse coding (e.g., from LeCun's group)
- Learning features from data with richer structure, e.g., video (learning invariance to out-of-plane rotation)

References

Learning Image Representations from Pixel Level via Hierarchical Sparse Coding. Kai Yu, Yuanqing Lin, John Lafferty. CVPR 2011.

Large-scale Image Classification: Fast Feature Extraction and SVM Training. Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. CVPR 2011.

ECCV 2010 Tutorial, Kai Yu, Andrew Ng (with links to some source codes).

Deep Coding Networks. Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. NIPS 2010.

Image Classification using Super-Vector Coding of Local Image Descriptors. Xi Zhou, Kai Yu, Tong Zhang, Thomas Huang. ECCV 2010.

Efficient Highly Over-Complete Sparse Coding using a Mixture Model. Jianchao Yang, Kai Yu, Thomas Huang. ECCV 2010.

Improved Local Coordinate Coding using Local Tangents. Kai Yu, Tong Zhang. ICML 2010.

Supervised Translation-Invariant Sparse Coding. Jianchao Yang, Kai Yu, Thomas Huang. CVPR 2010.

Learning Locality-Constrained Linear Coding for Image Classification. Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.

Nonlinear Learning using Local Coordinate Coding. Kai Yu, Tong Zhang, Yihong Gong. NIPS 2009.

Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. CVPR 2009.