Computer Vision, Part 2


Object recognition and scene “understanding”

What makes object recognition a hard task for computers?

HMAX

Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”

Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., & Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”


HMAX: A hierarchical neural-network model of object recognition.

Meant to model human vision at the level of the “immediate recognition” capabilities of the ventral visual pathway, independent of attention or other top-down processes.

Also called the “Standard Model” (because it incorporates the “standard model” of visual cortex).

Inspired by the earlier “Neocognitron” model of Fukushima (1980).


General ideas behind the model

“Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected.

Layers of the hierarchy alternate between “sensitivity” (to detecting features) and “invariance” (to position, scale, orientation).

The size of receptive fields increases along the hierarchy.

The degree of invariance increases along the hierarchy.


The HMAX model for object recognition (Riesenhuber, Poggio, Serre, et al.)

Image (gray-scale)
S1 layer: edge detectors
C1 layer: max over local S1 units
S2 layer: prototypes (small image patches)
C2 layer: max activation over each prototype
Classification layer: object or image classification

Layers alternate between “specificity” and “invariance” over position, scale, and orientation.

The job of HMAX is to produce a higher-level representation of an image that will be useful for classification.
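The published HMAX implementations are not reproduced here, but the alternation of “S” (template matching, specificity) and “C” (max pooling, invariance) stages can be illustrated with a toy NumPy sketch on a 1-D signal; the stage functions, templates, and pool size below are purely illustrative, not the model's actual parameters.

import numpy as np

def s_stage(signal, templates):
    """S stage: template matching ("specificity") -- similarity of each
    stored template to the signal at every position."""
    n, k = len(signal), templates.shape[1]
    out = np.zeros((templates.shape[0], n - k + 1))
    for i in range(n - k + 1):
        patch = signal[i:i + k]
        out[:, i] = np.exp(-np.sum((templates - patch) ** 2, axis=1))
    return out

def c_stage(responses, pool_size):
    """C stage: max pooling over local position ("invariance") -- the
    response survives even if the matching patch shifts within a pool."""
    n_pools = responses.shape[1] // pool_size
    trimmed = responses[:, :n_pools * pool_size]
    return trimmed.reshape(responses.shape[0], n_pools, pool_size).max(axis=2)

signal = np.sin(np.linspace(0, 4 * np.pi, 64))
templates = np.stack([signal[5:10], signal[20:25]])  # two toy "prototypes"
s1 = s_stage(signal, templates)                      # (2 templates, 60 positions)
c1 = c_stage(s1, pool_size=4)                        # (2 templates, 15 pooled positions)
print(s1.shape, c1.shape)

The same pattern, applied to 2-D images with oriented filters and stored prototypes, gives the S1/C1/S2/C2 layers described next.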

S1 layer: edge detectors

4 orientations, 16 scales, applied to the gray-scale image.

(Figure: one S1 receptive field, shown at each of the 16 scales.)
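In Serre et al. (2006) the S1 edge detectors are implemented as Gabor filters. Below is a minimal sketch of an S1-style filter bank in NumPy/SciPy; the filter sizes, wavelengths, and sigmas are illustrative stand-ins, not the 16-scale parameter table from the paper.

import numpy as np
from scipy.signal import convolve2d

def gabor_filter(size, wavelength, orientation, sigma, gamma=0.3):
    """One Gabor filter: an oriented edge/bar detector (an S1 receptive field)."""
    xs = np.arange(size) - size // 2
    x, y = np.meshgrid(xs, xs)
    xr = x * np.cos(orientation) + y * np.sin(orientation)
    yr = -x * np.sin(orientation) + y * np.cos(orientation)
    g = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()  # zero mean, so uniform regions give no response

def s1_layer(image, sizes, n_orientations=4):
    """S1: one response map per (scale, orientation); scales here are illustrative."""
    maps = {}
    for s, size in enumerate(sizes):
        for o in range(n_orientations):
            theta = o * np.pi / n_orientations
            f = gabor_filter(size, wavelength=0.8 * size,
                             orientation=theta, sigma=0.35 * size)
            maps[(s, o)] = np.abs(convolve2d(image, f, mode='same'))
    return maps

image = np.random.rand(64, 64)               # stand-in for a gray-scale image
s1 = s1_layer(image, sizes=[7, 9, 11, 13])   # 4 of the 16 scales, for brevity
print(len(s1), s1[(0, 0)].shape)             # 16 maps (4 scales x 4 orientations), 64x64 each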

C1 layer: max activation over local S1 units (local position, scale)

4 orientations, 8 scales.
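A minimal sketch of C1-style pooling: each C1 map takes the max over a local spatial neighborhood and over a pair of adjacent S1 scales, which is how 16 S1 scales become 8 C1 scale bands in spirit; the band grouping and pool size here are made up, and the stand-in S1 maps are random arrays so the snippet runs on its own.

import numpy as np

def local_max_pool(a, cell):
    """Max over non-overlapping cell x cell spatial neighborhoods."""
    h = (a.shape[0] // cell) * cell
    w = (a.shape[1] // cell) * cell
    a = a[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return a.max(axis=(1, 3))

def c1_layer(s1_maps, cell=8):
    """C1: max over local position and over pairs of adjacent S1 scales.
    s1_maps: dict keyed by (scale index, orientation index) -> 2-D map."""
    scales = sorted({s for s, _ in s1_maps})
    orients = sorted({o for _, o in s1_maps})
    c1 = {}
    for band, (sa, sb) in enumerate(zip(scales[::2], scales[1::2])):
        for o in orients:
            c1[(band, o)] = np.maximum(local_max_pool(s1_maps[(sa, o)], cell),
                                       local_max_pool(s1_maps[(sb, o)], cell))
    return c1

# Stand-in S1 maps (4 scales x 4 orientations of random responses) so this runs alone.
s1 = {(s, o): np.random.rand(64, 64) for s in range(4) for o in range(4)}
c1 = c1_layer(s1)
print(len(c1), c1[(0, 0)].shape)   # 8 maps (2 scale bands x 4 orientations), 8x8 each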

S2 layer: calculate similarity to prototype (radial basis function)

4 orientations, 8 scales.

Prototypes: ~1000 small image patches, chosen from an image collection and translated to C1 features.

An S2 unit calculates the similarity to its prototype at each “pooled” position in the C1 layer.

Similarity: radial basis function.
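In Serre et al. (2006) this similarity is a radial basis function of the patch X of C1 units beneath the S2 unit and the stored prototype P, of the form r = exp(−β‖X − P‖²). A minimal sketch, with made-up C1 responses, prototypes sampled directly from them, and an arbitrary β:

import numpy as np

def s2_layer(c1, prototypes, beta=1.0):
    """S2: radial-basis-function similarity to each stored prototype at every
    position of the C1 maps.  c1: (n_orient, H, W);
    prototypes: (n_proto, n_orient, k, k) patches of C1 features."""
    n_orient, H, W = c1.shape
    n_proto, _, k, _ = prototypes.shape
    out = np.zeros((n_proto, H - k + 1, W - k + 1))
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            patch = c1[:, y:y + k, x:x + k]
            d2 = ((prototypes - patch) ** 2).sum(axis=(1, 2, 3))
            out[:, y, x] = np.exp(-beta * d2)     # r = exp(-beta * ||X - P||^2)
    return out

c1 = np.random.rand(4, 16, 16)   # stand-in C1 band: 4 orientations, 16x16 positions
# Real prototypes are ~1000 patches sampled from C1 responses to training images;
# here three patches of this very band stand in for them.
protos = np.stack([c1[:, y:y + 4, x:x + 4] for y, x in [(0, 0), (5, 7), (9, 2)]])
s2 = s2_layer(c1, protos)
print(s2.shape)                  # (3, 13, 13): one similarity map per prototype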

C2 layer: max activation over position, orientation, and scale

Each S2 prototype (S2_1, S2_2, …) is reduced by a MAX to a single value.

(Figure: example C2 values for individual prototypes: .11, .78, …, .32.)
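C2 keeps only the best match per prototype: a global max over position, orientation, and scale bands, yielding one value per prototype. A sketch with stand-in S2 maps:

import numpy as np

def c2_layer(s2_bands):
    """C2: for each prototype, the max S2 response over all positions and all
    scale bands -- one number per prototype, the final HMAX-style feature."""
    per_band = np.stack([band.max(axis=(1, 2)) for band in s2_bands])
    return per_band.max(axis=0)

# Stand-in S2 maps for 5 prototypes over 2 scale bands (different map sizes).
s2_bands = [np.random.rand(5, 13, 13), np.random.rand(5, 6, 6)]
c2 = c2_layer(s2_bands)
print(c2.shape)   # (5,): one C2 value per prototype, e.g. something like [.11 .78 ... .32]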

Support Vector Machine classification (e.g., dog / not dog)
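The classification layer is then an ordinary supervised classifier on the C2 feature vector. A minimal sketch using scikit-learn's SVC on made-up C2 vectors (real training would use C2 features extracted from labeled images, e.g. dog vs. not dog):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_features = 1000   # ~1000 prototypes -> ~1000 C2 features per image

# Made-up C2 feature vectors for two classes, standing in for real images.
X_pos = rng.normal(0.6, 0.1, size=(50, n_features))   # "dog"
X_neg = rng.normal(0.4, 0.1, size=(50, n_features))   # "not dog"
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

clf = SVC(kernel="linear").fit(X, y)
test = rng.normal(0.6, 0.1, size=(1, n_features))
print(clf.predict(test))   # most likely [1], i.e. "dog"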

Streetscenes “scene understanding” system (Bileschi, 2006)

Uses HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree.

How Streetscenes Works (Bileschi, 2006)

1. Densely tile the image with windows of different sizes.

2. Compute C1 and C2 features in each window.

3. Give the features in each window as input to each of five trained support vector machines.

4. If any SVM returns a classification score above a learned threshold, that object is said to be “detected”.
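A sketch of that detection loop, with a hypothetical compute_c2_features function, randomly trained stand-in SVMs, and arbitrary window sizes and stride in place of the real Streetscenes components:

import numpy as np
from sklearn.svm import LinearSVC

def detect_objects(image, classifiers, thresholds, compute_c2_features,
                   window_sizes=(64, 96), stride=16):
    """Slide windows of several sizes over the image (step 1); compute
    C1/C2-style features in each window (step 2); score each window with
    every per-class SVM (step 3); keep scores above that class's learned
    threshold as detections (step 4)."""
    detections = []
    H, W = image.shape[:2]
    for size in window_sizes:
        for y in range(0, H - size + 1, stride):
            for x in range(0, W - size + 1, stride):
                feats = compute_c2_features(image[y:y + size, x:x + size])
                for cls, clf in classifiers.items():
                    score = clf.decision_function([feats])[0]
                    if score > thresholds[cls]:
                        detections.append((cls, float(score), (x, y, size)))
    return detections

# Stand-in pieces so the sketch runs: random "C2" features and randomly trained SVMs.
rng = np.random.default_rng(0)

def fake_c2_features(window):
    return rng.random(10)

X, y = rng.random((40, 10)), rng.integers(0, 2, 40)
classes = ["Car", "Pedestrian", "Bicycle", "Building", "Tree"]
classifiers = {c: LinearSVC().fit(X, y) for c in classes}
thresholds = {c: 0.0 for c in classes}
hits = detect_objects(rng.random((128, 128)), classifiers, thresholds, fake_c2_features)
print(len(hits))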



Object detection (here, “car”) with the HMAX model (Bileschi, 2006)

Sample of results from the HMAX model (Serre et al., 2006)