Sparse Coding and Its Extensions for Visual Recognition

Kai Yu
Media Analytics Department
NEC Labs America, Cupertino, CA

Visual Recognition is HOT in Computer Vision
Caltech 101
PASCAL VOC
80 Million Tiny Images
ImageNet
The pipeline of machine visual perception

Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

The feature stages are:
• Most critical for accuracy
• Account for most of the computation
• Most time-consuming in the development cycle
• Often hand-crafted in practice

Most efforts in machine learning have gone into the inference stage.
Computer vision features: SIFT, Spin image, HoG, RIFT, GLOH, …

Slide credit: Andrew Ng
Learning everything from data

Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

Machine learning is applied across the entire pipeline.
BoW + SPM Kernel

Bag-of-visual-words representation (BoW) based on vector quantization (VQ), combined with the spatial pyramid matching (SPM) kernel (a minimal sketch of the intersection kernel underlying SPM follows below).

• Combining multiple features, this method had been the state-of-the-art on Caltech-101, PASCAL VOC, 15 Scene Categories, …

Figure credit: Fei-Fei Li, Svetlana Lazebnik
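The SPM kernel is built from histogram intersection evaluated over the cells of a spatial pyramid. A minimal sketch of the intersection kernel, assuming BoW histograms stacked as rows of NumPy arrays (the function name is mine; full SPM applies this to weighted, concatenated per-cell histograms):

```python
import numpy as np

def intersection_kernel(H1, H2):
    """Histogram intersection: K(h, h') = sum_j min(h_j, h'_j).

    H1: (n1, d) and H2: (n2, d) arrays of BoW histograms.
    Returns the (n1, n2) Gram matrix, usable as a precomputed
    kernel for a nonlinear SVM.
    """
    return np.minimum(H1[:, None, :], H2[None, :, :]).sum(axis=2)

# e.g. with scikit-learn:
#   from sklearn.svm import SVC
#   clf = SVC(kernel="precomputed").fit(intersection_kernel(Xtr, Xtr), y)
```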
Winning Method in PASCAL VOC before 2009

Multiple feature sampling methods → Multiple visual descriptors → VQ coding, histogram, SPM → Nonlinear SVM
Convolutional Neural Networks

• The architectures of some successful methods are not so much different from CNNs

Conv. filtering → Pooling → Conv. filtering → Pooling
BoW+SPM: the same architecture

Local gradients → Pooling (e.g., SIFT, HOG) → VQ coding → Average pooling (obtain histogram; see the sketch below) → Nonlinear SVM

Observations:
• Nonlinear SVM is not scalable
• VQ coding may be too coarse
• Average pooling is not optimal
• Why not learn the whole thing?
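A minimal sketch of the VQ coding + average pooling step referenced above, assuming descriptors and a k-means codebook as NumPy arrays (the function name is mine, and a real pipeline pools per spatial pyramid cell rather than globally):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard VQ coding followed by average pooling into a histogram.

    descriptors: (n, d) local features (e.g. SIFT) from one image.
    codebook:    (K, d) visual words, e.g. learned by k-means.
    """
    # Squared distance of every descriptor to every codeword: (n, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)  # 1-of-K coding: index of nearest word
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / max(len(descriptors), 1)  # average pooling
```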
Develop better methods

Better coding → Better pooling → Scalable linear classifier
Sparse Coding

Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection).

Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].

Coding: for a data vector x, solve the LASSO to find the sparse coefficient vector a (a minimal sketch of this step follows below).
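Since the coding step is a plain LASSO problem, off-the-shelf solvers apply. A minimal sketch, assuming the dictionary stores bases as columns of a NumPy array (dictionary learning itself alternates this step with a basis update, e.g. via sklearn.decomposition.DictionaryLearning):

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(x, B, lam=0.1):
    """LASSO coding step:  min_a  ||x - B a||^2 + lam * ||a||_1.

    x: (d,) data vector (e.g. a vectorized image patch).
    B: (d, K) dictionary whose columns are the bases [Phi_1, ..., Phi_K].
    """
    # scikit-learn's Lasso solves (1/2n)||y - Xw||^2 + alpha*||w||_1,
    # the same problem up to a rescaling of the penalty weight.
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    lasso.fit(B, x)
    return lasso.coef_  # the sparse coefficient vector a
```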
Sparse Coding Example

Natural images → learned bases (Φ1, …, Φ64): "edges".

Test example: x ≈ 0.8 · Φ36 + 0.3 · Φ42 + 0.5 · Φ63, so the feature representation is

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0]

Compact & easily interpretable.

Slide credit: Andrew Ng
Self-taught Learning [Raina, Lee, Battle, Packer & Ng, ICML 07]

Labeled training images (motorcycles vs. not motorcycles) are supplemented with unlabeled images; testing asks: what is this?

Slide credit: Andrew Ng
Classification Result on Caltech 101 (9K images, 101 classes)

• SIFT VQ + nonlinear SVM: 64%
• Pixel sparse coding + linear SVM: 50%
Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]

Local gradients → Pooling (e.g., SIFT, HOG) → Sparse coding → Max pooling (see the sketch below) → Scalable linear classifier
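A minimal sketch of the max pooling step (names are mine; Yang et al. pool within each spatial pyramid cell and concatenate the results):

```python
import numpy as np

def max_pool(codes):
    """Max pooling over the sparse codes of one image or pyramid cell.

    codes: (n, K) sparse codes of n local descriptors.
    Returns a length-K feature: the largest absolute activation of
    each basis, ready for a scalable linear classifier.
    """
    return np.abs(codes).max(axis=0)
```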
Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09] – Caltech-101 results:

• SIFT VQ + nonlinear SVM: 64%
• SIFT sparse coding + linear SVM: 73%
What have we learned?

Local gradients → Pooling (e.g., SIFT, HOG) → Sparse coding → Max pooling → Scalable linear classifier

1. Sparse coding is useful (why?)
2. A hierarchical architecture is needed
MNIST Experiments

Error rates of the compared sparse coding setups: 4.54%, 3.75%, 2.64%.

• When SC achieves the best classification accuracy, the learned bases look like digits – each basis has a clear local class association.
Distribution of coefficients (SIFT, Caltech101)

Neighboring bases tend to get nonzero coefficients.
Interpretation 1: discover subspaces
• Each basis is a "direction"
• Sparsity: each datum is a linear combination of only several bases
• Related to topic models

Interpretation 2: geometry of the data manifold
• Each basis is an "anchor point"
• Sparsity is induced by locality: each datum is a linear combination of neighboring anchors
A Function Approximation View to Coding

• Setting: f(x) is a nonlinear feature extraction function on image patches x
• Coding: a nonlinear mapping x → a; typically a is high-dimensional & sparse
• Nonlinear learning: f(x) = <w, a>

A coding scheme is good if it helps learning f(x). (A toy illustration follows below.)
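A toy illustration of this view, entirely my own construction (1-D data, hat-shaped local codes over evenly spaced anchors, and ridge regression standing in for a linear SVM): a nonlinear target becomes learnable by a model that is linear in the code a.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (500, 1))
y = np.sin(3 * x[:, 0])  # a nonlinear target f(x)

# Code each x against 20 anchor points; codes are local and mostly zero.
anchors = np.linspace(-1, 1, 20)
codes = np.maximum(0.0, 1.0 - 5.0 * np.abs(x - anchors))  # (500, 20)

# f(x) = <w, a>: linear in the code, nonlinear in x.
model = Ridge(alpha=1e-3).fit(codes, y)
print("train MSE:", np.mean((model.predict(codes) - y) ** 2))
```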
A Function Approximation View to Coding – The General Formulation

Key bound: function approximation error ≤ an unsupervised learning objective.
Local Coordinate Coding (LCC) [Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv, Huang, CVPR 10]

• Dictionary learning: k-means (or hierarchical k-means)
• Coding for x, to obtain its sparse representation a:
– Step 1 – ensure locality: find the K nearest bases
– Step 2 – ensure low coding error: fit x on the selected bases (see the sketch below)
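A minimal sketch of the two steps (the published objectives add a locality regularizer and a normalization constraint that this plain least-squares fit omits; function and variable names are mine):

```python
import numpy as np

def lcc_code(x, B, K=5):
    """Two-step local coordinate coding sketch.

    x: (d,) datum.  B: (M, d) dictionary rows, e.g. k-means centers.
    """
    d2 = ((B - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:K]  # Step 1: locality -- the K nearest bases
    # Step 2: low coding error -- least-squares fit on those bases only.
    a_local, *_ = np.linalg.lstsq(B[nn].T, x, rcond=None)
    a = np.zeros(len(B))
    a[nn] = a_local  # sparse by construction: at most K nonzeros
    return a
```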
Super-Vector Coding (SVC) [Zhou, Yu, Zhang, and Huang, ECCV 10]

• Dictionary learning: k-means (or hierarchical k-means)
• Coding for x, to obtain its sparse representation a:
– Step 1 – find the nearest basis of x, obtain its VQ coding, e.g. [0, 0, 1, 0, …]
– Step 2 – form the super-vector coding, e.g. [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …] (see the sketch below)
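A minimal sketch of the two steps for a single descriptor (this layout interleaves the zero-order scalar and the residual per cluster, which matches the slide's vector up to a permutation; the weight s and all names are illustrative):

```python
import numpy as np

def super_vector(x, centers, s=1.0):
    """Super-vector coding sketch.

    x: (d,) descriptor.  centers: (K, d) k-means centers m_1..m_K.
    Per cluster k the code holds [s if k is nearest else 0,
    (x - m_k) if k is nearest else 0], so only the nearest
    cluster's block is nonzero.
    """
    K, d = centers.shape
    k = ((centers - x) ** 2).sum(axis=1).argmin()  # Step 1: nearest center
    sv = np.zeros(K * (1 + d))
    sv[k * (1 + d)] = s                            # zero-order (VQ) part
    sv[k * (1 + d) + 1 : (k + 1) * (1 + d)] = x - centers[k]  # residual part
    return sv
```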
Function Approximation based on LCC [Yu, Zhang, Gong, NIPS 10]

Figure: data points and bases; LCC yields a locally linear approximation, compared with a zero-order approximation and the local tangent.
Function Approximation based on SVC [Zhou, Yu, Zhang, and Huang, ECCV 10]

Figure: data points and cluster centers; SVC yields a piecewise local linear (first-order) approximation via local tangents.
PASCAL VOC Challenge 2009

Table: per-class results, ours vs. best of other teams, with differences.

• No. 1 in 18 of 20 categories
• We used only the HOG feature on gray images
ImageNet Challenge 2010 (1.4 million images, 1000 classes, top-5 hit rate)

• VQ + intersection kernel: ~40%
• Various coding methods + linear SVM: 64%–73%
Hierarchical Sparse Coding [Yu, Lin, & Lafferty, CVPR 11]

Conv. filtering → Pooling → Conv. filtering → Pooling

Learning from unlabeled data.
A two-layer sparse coding formulation (sketched conceptually below).
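Only a conceptual sketch of the "code, pool, code again" idea, not the paper's exact formulation: the atom counts, the pooling group size, and the use of scikit-learn's DictionaryLearning are all illustrative choices.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def two_layer_codes(patches, n_atoms1=64, n_atoms2=128, group=4):
    """Conceptual two-layer sparse coding sketch.

    patches: (n, p) vectorized pixel patches, n a multiple of `group`.
    Layer 1 codes raw patches; neighboring codes are max-pooled in
    groups so layer 2 effectively sees larger receptive fields.
    """
    dl1 = DictionaryLearning(n_components=n_atoms1,
                             transform_algorithm="lasso_lars")
    codes1 = np.abs(dl1.fit_transform(patches))        # (n, n_atoms1)
    pooled = codes1.reshape(-1, group, n_atoms1).max(axis=1)
    dl2 = DictionaryLearning(n_components=n_atoms2,
                             transform_algorithm="lasso_lars")
    return dl2.fit_transform(pooled)                   # second-layer codes
```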
MNIST results – classification

HSC vs. CNN: HSC provides even better performance than CNN; more amazingly, HSC learns its features in an unsupervised manner!
MNIST results – effect of hierarchical learning

Comparing the Fisher scores of HSC and SC: discriminative power is significantly improved by HSC, although HSC is unsupervised coding.
MNIST results – learned codebook

One dimension in the second layer captures invariance to translation, rotation, and deformation.
Caltech101 results – classification

The learned descriptor performs slightly better than SIFT + SC.
Conclusion and Future Work

• A "function approximation" view to derive novel sparse coding methods.
• Locality – one way to achieve sparsity, and it's really useful. But we need a deeper understanding of feature learning methods.
• Interesting directions:
– Hierarchical coding
– Deep learning (many papers now!)
– Faster methods for sparse coding (e.g., from LeCun's group)
– Learning features from a richer structure of data, e.g., video (learning invariance to out-of-plane rotation)
References

• Learning Image Representations from Pixel Level via Hierarchical Sparse Coding. Kai Yu, Yuanqing Lin, John Lafferty. CVPR 2011.
• Large-scale Image Classification: Fast Feature Extraction and SVM Training. Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. CVPR 2011.
• ECCV 2010 Tutorial. Kai Yu, Andrew Ng (with links to some source codes).
• Deep Coding Networks. Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. NIPS 2010.
• Image Classification using Super-Vector Coding of Local Image Descriptors. Xi Zhou, Kai Yu, Tong Zhang, Thomas Huang. ECCV 2010.
• Efficient Highly Over-Complete Sparse Coding using a Mixture Model. Jianchao Yang, Kai Yu, Thomas Huang. ECCV 2010.
• Improved Local Coordinate Coding using Local Tangents. Kai Yu, Tong Zhang. ICML 2010.
• Supervised Translation-Invariant Sparse Coding. Jianchao Yang, Kai Yu, Thomas Huang. CVPR 2010.
• Learning Locality-Constrained Linear Coding for Image Classification. Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.
• Nonlinear Learning using Local Coordinate Coding. Kai Yu, Tong Zhang, Yihong Gong. NIPS 2009.
• Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. CVPR 2009.