

Training Image Classifiers with Similarity Metrics,
Linear Programming, and Minimal Supervision

Asilomar SSC, November 2012

Karl Ni, Ethan Phelps, Katherine Bouman, Nadya Bliss

Lincoln Laboratory, Massachusetts Institute of Technology

This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.



Applying Semantic Understanding of Images

What can a computer understand? Who? What? When? Where?

[Diagram: training data feeds a classifier that produces a decision; computer vision algorithms support query by example, query by sketch, and statistical models.]

Applications: image retrieval, robotic navigation, semantic labeling, image sketch, Structure from Motion, image localization.

Requires: some prior knowledge.

Processing chain: Feature Extraction, then Matching & Association.


Processing Framework

[Diagram: multi-modal sources (ground imagery and video, aerial imagery and video) flow through the processing framework of feature extraction followed by matching and association; localization algorithms then exploit the matches to produce a location. An offline setup builds the world model (metadata, graphs, point clouds, distributions, terrain, etc.) used by the training framework.]



Outline

Introduction
Feature Pruning Background
Matched Filter Training
Results
Summary



Finding the Features of an Image

Problems in image pattern matching:

Each image = 10 million pixels!
Most dimensions are irrelevant.
Multiple concepts inside the image.

Features are a quantitative way for machines to understand an image.

Image Property | Feature Technique
Local color | Luma + chroma histograms
Object texture | DCT (local and normalized)
Shape | Curvelets, shapelets
Lower-level gradients | DWT: Haar, Daubechies
Higher-level descriptors | SIFT / SURF / HoG, etc.
Scene descriptors | GIST (Torralba et al.)

Typical chain: Feature Extraction, then Training / Classifier.
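As a concrete illustration of the first two rows of the table, the sketch below computes luma and chroma histograms and normalized DCT block energies with NumPy/SciPy. It is a minimal, hypothetical example: the image, bin count, and block size are made up, and this is not the feature pipeline used in the talk.

```python
# Minimal sketch, assuming an RGB image as a float array in [0, 1];
# bin count and block size are illustrative, not from the talk.
import numpy as np
from scipy.fft import dctn

def luma_chroma_histograms(img, bins=16):
    """Local-color feature: histograms of luma and two chroma proxies."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b        # luma
    cb, cr = b - y, r - y                        # crude chroma channels
    feats = [np.histogram(c, bins=bins, range=(-1.0, 1.0), density=True)[0]
             for c in (y, cb, cr)]
    return np.concatenate(feats)

def dct_texture(img, block=8):
    """Object-texture feature: normalized DCT energies of one grayscale block."""
    gray = img.mean(axis=-1)
    coeffs = np.abs(dctn(gray[:block, :block], norm="ortho"))
    return coeffs.ravel() / (coeffs.sum() + 1e-12)

img = np.random.rand(64, 64, 3)                  # stand-in image
x = np.concatenate([luma_chroma_histograms(img), dct_texture(img)])
print(x.shape)                                   # one feature vector
```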


Numerous features: only a subset is relevant

[Three example scenes, each with a different set of salient features:]

Scene 1 features: red bricks on multiple buildings; small hedges; windows of a certain type; the types of buildings present.

Scene 2 features: arches and white buildings; domes and ancient architecture; older, speckled materials (higher-frequency image content).

Scene 3 features: more suburb-like; larger roads; drier vegetation; shorter houses.

The choice of features requires looking at multiple semantic concepts, defined by the entities and attributes inside the images.

Typical chain: Feature Extraction, then Training / Classifier.



Feature Descriptors

Most of the features are irrelevant:

Large dimensionality and algorithmic complexity.
Keep a small number of salient features and discard the large number of nondescriptive features.
Feature invariance to transformations, content, and context holds only to an extent (e.g., SIFT, RIFT, etc.).
Simplify the classifier (both computation and supervision).

Multiple instances of several features describe the same object:

Requires a high level of abstraction.
Visual similarity does not always correlate with "semantic" similarity.

Typical chain: Feature Extraction, then Training / Classifier.

(Brown et al., Lowe et al., Ng et al., Thrun et al.)



Getting the Right Features

Tools to hand-label concepts (2006-2011):

Google Image Labeler
Kobus's Corel Dataset
MIT LabelMe
Yahoo! Games

Problems:

Tedious
Time consuming
Incorrect
Very low throughput

Famous algorithms are parallelizable but, unfortunately, not generalizable.

[Example image with hand labels 1. chair, 2. table, 3. road, 4. road, 5. table, 6. car, 7. keyboard, several of them wrong: people can't be flying or walking on billboards!]



Automatically Learn the Best Features

Segmentation is a difficult manual task:

Multiple semantic concepts per single image.
Considerable amounts of noise, most often irrelevant to any concept.

[Figure: a semantic simplex whose vertices are concepts such as sky, mountain, and river; an image is represented by its mixture weights over the concepts, e.g., 0.2, 0.3, 0.05.] (Kwitt et al., Kitware)



Leveraging Related Work

Lots of work in the 1990s:

Conditional probabilities learned from large training data sets.
Motivated by the query-by-example and query-by-sketch problems (not Zloof's IBM Query by Example for relational databases, but Ballerini et al.).
Primarily based on multiple instance learning and noisy density estimation (Dietterich et al., Keeler et al.).

Learning multiple instances of an object (the no-noise case).

Robustness to noise through the law of large numbers:

Hope to integrate the noise out.
Noise, if uncorrelated, becomes more and more sparse.
Although the area of the red boxes per instance is small, their aggregate over all instances is dominant.



Parallel Calculations through Hierarchies

Feature clustering in the large.

Mixture hierarchies can be incrementally trained (Vasconcelos et al.):

[Figure: lower-level GMMs are fit per image (image 1, image 2, image 3, ...) and can be computed in parallel; a top-level GMM is then built for each image class (Image Class 1 through N, Distribution 1 through N) from the lower-level distributions over the entire image and its training images.]

Automatic feature subselection has been submitted to SSP 2012 (Lincoln Laboratory GRID processing).
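A rough sketch of that parallel structure follows. It is not the Vasconcelos hierarchical-EM algorithm itself: per-image GMMs are fit independently (the parallelizable step), and the class-level mixture is then fit on the pooled lower-level component means as a crude stand-in for merging the hierarchy. The data, component counts, and helper names are all made up.

```python
# Minimal sketch, assuming `images_features` is a list of (n_i, d) arrays,
# one per training image of a class; not the talk's exact algorithm.
import numpy as np
from joblib import Parallel, delayed
from sklearn.mixture import GaussianMixture

def fit_image_gmm(feats, k=4):
    """Lower-level GMM for one image's feature vectors."""
    return GaussianMixture(n_components=k, covariance_type="diag").fit(feats)

def fit_class_gmm(images_features, k_low=4, k_top=8):
    # Lower level: one GMM per image, fit in parallel.
    low = Parallel(n_jobs=-1)(
        delayed(fit_image_gmm)(f, k_low) for f in images_features
    )
    # Top level: pool the per-image component means (a crude stand-in for
    # hierarchical EM) and fit the class-level mixture on them.
    pooled = np.vstack([g.means_ for g in low])
    return GaussianMixture(n_components=min(k_top, len(pooled)),
                           covariance_type="diag").fit(pooled)

rng = np.random.default_rng(0)
images = [rng.normal(size=(200, 16)) for _ in range(10)]   # toy features
class_model = fit_class_gmm(images)
print(class_model.means_.shape)
```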



Outline

Introduction
Feature Pruning Background
Matched Filter Training
Results
Summary



Finding a Sparse Basis Set

Hierarchical Gaussian mixtures as a density estimate:

Small-sample bias is large.
Non-convex / sensitive to initialization.
Extensive computational process to bring the hierarchies together.
Each level requires supervision (number of classes, initialization, etc.).

Think discriminantly:

Instead of generating centroids that represent images,
think of pruning features to eliminate redundancy.

Sparsity optimization:

Solve directly for the features that we want to use.
Reduction of redundancy is intuitive and not generative.

Under normalization, the GMM classifier can be implemented with a matched filter instead:

$$\arg\min_{c \in \{1,\dots,C\}} \|x - y_c\|_2^2 \quad \xrightarrow{\ \text{normalize}\ } \quad \arg\max_{c \in \{1,\dots,C\}} x^{\top} y_c$$



A Note on Notation

Let $x^{(j)}$ be the $j$th feature in the training set, where $x_i^{(j)}$ is the $i$th dimension of that feature:

$$x^{(j)} = \begin{bmatrix} x_1^{(j)} & x_2^{(j)} & \cdots & x_d^{(j)} \end{bmatrix}^{\top}$$

Let $X$ be a $d \times N$ matrix representing the collection of all the features, where the $j$th column of $X$ is the feature vector $x^{(j)}$:

$$X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(N)} \end{bmatrix}$$





Finding Sparsity with Linear Programming

Many optimization problems induce sparsity.

Gaussian Mixture Models, solved via EM (a non-convex optimization problem):

$$\min_{\{\pi_m,\,\mu_m,\,\Sigma_m\}_{m=1}^{M}} \; -\sum_{j=1}^{N} \log \sum_{m=1}^{M} \pi_m \, p\!\left(x^{(j)} \mid \mu_m, \Sigma_m\right)$$

Group Lasso:

$$\arg\min_{\beta} \; \|X - X\beta\|_2^2 + \lambda \sum_{j} \|\beta_{j,:}\|_2$$

Matched filter constraint (max-constraint optimization, not convex):

$$\arg\max_{\beta} \; \operatorname{tr}(\beta^{\top} X^{\top} X) - \lambda \sum_{j} \|\beta_{:,j}\|_2 \quad \text{s.t. } \beta_{ij} \in \{0,1\}, \;\; \mathbf{1}^{\top}\beta_{:,j} = 1$$

Relaxation of the constraints gives the LP optimization problem:

$$\arg\min_{\beta} \; \operatorname{tr}(\beta^{\top} X^{\top} X) + \sum_{i} t_i \quad \text{s.t. } 0 \le \beta_{ij} \le t_i, \;\; \mathbf{1}^{\top}\beta_{:,j} = 1$$

Faster than Group Lasso; independent of dimensionality; convex (unlike the matched filter optimization and GMM/EM); on average, scales with $N^2$.

Typical chain: Feature Extraction, then Training / Classifier.
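As a concrete illustration, the sketch below poses that LP, exactly as written above, to an off-the-shelf solver. It is a minimal sketch under stated assumptions: the toy data, the weight lambda_t on the t terms, and the variable layout are illustrative, and the objective as reconstructed here may differ in detail from the published Asilomar formulation.

```python
# Sketch of the feature-selection LP with scipy's linprog; all data are toys.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 6))          # d x N feature matrix (toy data)
N = X.shape[1]
S = X.T @ X                           # similarity matrix of dot products
lambda_t = 1.0                        # illustrative weight on the t terms

# Variables: beta flattened column-by-column (N*N entries), then t (N entries).
c = np.concatenate([S.T.ravel(), lambda_t * np.ones(N)])

# Inequalities: beta_ij - t_i <= 0 for every (i, j).
A_ub = np.zeros((N * N, N * N + N))
for j in range(N):
    for i in range(N):
        r = j * N + i
        A_ub[r, j * N + i] = 1.0      # + beta_ij
        A_ub[r, N * N + i] = -1.0     # - t_i
b_ub = np.zeros(N * N)

# Equalities: each column of beta sums to one (1^T beta_{:,j} = 1).
A_eq = np.zeros((N, N * N + N))
for j in range(N):
    A_eq[j, j * N:(j + 1) * N] = 1.0
b_eq = np.ones(N)

bounds = [(0, None)] * (N * N) + [(0, 1)] * N
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
t = res.x[N * N:]
print("per-row bounds t:", np.round(t, 3))
```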



Intuition

Relies on the similarity matrix concept. The actual implementation does not form the similarity matrix explicitly, but rather keeps track of the beta indices.

$$X^{\top}X = \begin{bmatrix} 1 & .95 & .1 & 0 \\ .95 & 1 & .2 & .1 \\ .1 & .2 & 1 & .98 \\ 0 & .1 & .98 & 1 \end{bmatrix}$$

[Figure: the entries of $\beta$, shaded according to the per-row bounds $t_1 < t_2 < t_3 < t_4$, the $\infty$-norms of the rows.]

$$\beta^* = \arg\min_{\beta} \; \operatorname{tr}(\beta^{\top} X^{\top} X) + \sum_{i} t_i \quad \text{s.t. } 0 \le \beta_{ij} \le t_i, \;\; \mathbf{1}^{\top}\beta_{:,j} = 1$$
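A tiny, self-contained illustration of the redundancy that the beta indices track: in the example matrix above, features 1 and 2 are near-duplicates, as are features 3 and 4, so one prototype per pair suffices. The 0.9 threshold is made up for the illustration and is not part of the method.

```python
# Redundant-pair check on the example similarity matrix (0-based indices).
import numpy as np

S = np.array([[1.00, 0.95, 0.10, 0.00],
              [0.95, 1.00, 0.20, 0.10],
              [0.10, 0.20, 1.00, 0.98],
              [0.00, 0.10, 0.98, 1.00]])

# Off-diagonal entries close to 1 mark near-duplicate features: one member of
# each such pair can be pruned while the other is kept as a prototype.
i, j = np.where(np.triu(S, k=1) > 0.9)
print(list(zip(i.tolist(), j.tolist())))   # [(0, 1), (2, 3)]
```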



Nonlinear Feature Selection

The optimization problem consists solely of dot products in a similarity function, whose prototypes are drawn from the features that are similar to the rest of the set:

$$\arg\min_{\beta} \; \operatorname{tr}(\beta^{\top} X^{\top} X) + \sum_{i} t_i \quad \text{s.t. } 0 \le \beta_{ij} \le t_i, \;\; \mathbf{1}^{\top}\beta_{:,j} = 1$$

Nonlinearity may be introduced through a kernel function (RKHS) that induces a vector space whose mapping we may not necessarily know. Replacing the dot products $x_i^{\top} x_j$ with a kernel $K(x_i, x_j)$ gives:

$$\arg\min_{\beta} \; \sum_{i} \sum_{j} \beta_{ij} \, K(x_i, x_j) + \sum_{i} t_i \quad \text{s.t. } 0 \le \beta_{ij} \le t_i, \;\; \mathbf{1}^{\top}\beta_{:,j} = 1$$
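As a sketch of that substitution, the snippet below builds a kernel matrix that can stand in for S = X.T @ X in the LP sketch earlier. The RBF kernel and its gamma are illustrative choices, not prescribed by the talk; any positive-definite kernel works.

```python
# Minimal sketch: a kernel matrix over the feature columns replaces the
# dot-product similarity; gamma is an arbitrary illustrative value.
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for columns x_i of X."""
    sq = (X ** 2).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.exp(-gamma * np.clip(d2, 0.0, None))

X = np.random.default_rng(0).normal(size=(16, 6))   # d x N toy features
K = rbf_kernel_matrix(X)          # drop-in replacement for X.T @ X in the LP
print(K.shape, K[0, 0])           # (6, 6) with ones on the diagonal
```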

Application to Classification

[Diagram: TRAINING runs feature extraction and the LP above to select the best features for each class; a QUERY image then runs feature extraction followed by matching and association against those features, classifying the image with a confidence.]

Just a faster way to classify imagery in one-versus-all frameworks.
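A rough sketch of that query path, treating the retained features of each class as matched filters in a one-versus-all scoring loop. The class names, array shapes, and the max-then-mean scoring rule are illustrative assumptions, not the talk's exact procedure.

```python
# One-versus-all matched-filter scoring sketch with made-up data.
import numpy as np

def classify_with_confidence(query_feats, class_prototypes):
    """query_feats: (d, m) features from the query image.
    class_prototypes: dict mapping class name -> (d, k) retained features."""
    q = query_feats / np.linalg.norm(query_feats, axis=0, keepdims=True)
    scores = {}
    for name, P in class_prototypes.items():
        P = P / np.linalg.norm(P, axis=0, keepdims=True)
        # Matched filter: best dot product of each query feature against the
        # class prototypes, averaged over the query features.
        scores[name] = float((P.T @ q).max(axis=0).mean())
    best = max(scores, key=scores.get)
    return best, scores[best]

rng = np.random.default_rng(2)
protos = {"building": rng.normal(size=(16, 5)), "road": rng.normal(size=(16, 5))}
label, conf = classify_with_confidence(rng.normal(size=(16, 8)), protos)
print(label, round(conf, 3))
```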



Outline

Introduction
Feature Pruning Background
Matched Filter Training
Results
Summary


LP Feature Learning versus Group Lasso

More intuitive grouping.
Threshold learning is unnecessary.
Post-processing is unnecessary.
5.452% more accurate in +1/-1 learning classes.


Segmentation and Classification Visual Result

[Figure: original image alongside the per-region classification decisions.]


Interesting automatic semantic learning result


Application to Localization

Testing \ Training | MIT-Kendall | Vienna | Dubrovnik | Lubbock
MIT-Kendall | 0.975 | 0.056 | 0.024 | 0.102
Vienna | 0.050 | 0.896 | 0.035 | 0.060
Dubrovnik | 0.015 | 0.024 | 0.905 | 0.057
Lubbock | 0.097 | 0.002 | 0.053 | 0.901

1400 images per dataset.
Filter reduction to 356 filters per class.
Less than a minute of classification time.
Coverage of cities: entire cities (Vienna, Dubrovnik, Lubbock), portion of Cambridge (MIT-Kendall).




Summary

Accurate modeling must occur before we have any hope of classifying images.

Feature pruning is equivalent to Gaussian centroid determination under normalization.

Sparse optimization enables feature pruning and matched filter creation.

The sparse optimization contains only dot products, so the optimization can occur with an RKHS in the transductive setting.



References

K. Ni, E. Phelps, K. L. Bouman, N. Bliss, "Image Feature Selection via Linear Programming," to appear in presentation at Asilomar SSC, Pacific Grove, CA, October 2012 (Asilomar '12).

S. M. Sawyer, K. Ni, N. T. Bliss, "Cluster-based 3D Reconstruction of Aerial Video," to appear in presentation at the 1st IEEE High Performance Extreme Computing Conference, Waltham, MA, September 2012 (HPEC '12).

H. Viggh and K. Ni, "SIFT Based Localization Using Prior World Model for Robotic Navigation in Urban Environments," to appear in presentation at the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition, Las Vegas, NV, 2012 (IPCV-2012).

K. Ni, Z. Sun, N. Bliss, "Real-time Global Motion Blur Detection," to appear in presentation at the IEEE International Conference on Image Processing, Orlando, FL, 2012 (ICIP-2012).

N. Arcolano, K. Ni, B. Miller, N. Bliss, P. Wolfe, "Moments of Parameter Estimates for Chung-Lu Random Graph Models," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, 2012 (ICASSP-2012).

A. Vasile, L. Skelly, K. Ni, R. Heinrichs, O. Camps, and M. Sznaier, "Efficient City-sized 3D Reconstruction from Ultra-High Resolution Aerial and Ground Video Imagery," Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, 2011 (ISVC-2011), pp. 347-358.

K. Ni, Z. Sun, N. Bliss, "3-D Image Geo-Registration Using Vision-Based Modeling," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Prague, Czech Republic, 2011 (ICASSP-2011), pp. 1573-1576.

K. Ni, T. Q. Nguyen, "Empirical Type-I Filter Design for Image Interpolation," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010 (ICASSP-2010), pp. 866-869.

Z. Sun, N. Bliss, and K. Ni, "A 3-D Feature Model for Image Matching," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010 (ICASSP-2010), pp. 2194-2197.

K. Ni, Z. Sun, N. Bliss, and N. Snavely, "Construction and Exploitation of a 3D Model from 2D Image Features," Proceedings of the SPIE International Conference on Electronic Imaging, Inverse Problems Session, Vol. 7533, San Jose, CA, January 2010 (SPIE-2010).



Contributors and Acknowledgements

MIT Lincoln Laboratory: Karl Ni, Nicholas Armstrong-Crews, Scott Sawyer, Nadya Bliss
MIT: Katherine L. Bouman
Boston University: Zachary Sun
Northeastern University: Alexandru Vasile
Cornell University: Noah Snavely



Questions?



Backup