
Biologically Motivated
Computer Vision

Digital Image Processing

Sumitha Balasuriya

Department of Computing Science, University of Glasgow

General Vision Problem


Machine vision has been very successful in finding solutions to specific, well-constrained problems such as optical character recognition or fingerprint recognition. In fact, machine vision has surpassed human vision in many such closed-domain tasks.


However, it is only in biology that we find systems that can handle unconstrained, diverse vision problems.


How can a biological or machine system which just captures two-dimensional visual information from a view of a cluttered field even attempt to reason with and function in the environment? An accurate, detailed spatial model of the environment is difficult to compute, and the whole problem of scene analysis is ill-posed.

A problem is well posed if (1) a solution exists, (2) the solution is unique, and (3) the solution depends continuously on the initial data (stability property).


Ill-posed problem: several possible solutions exist.

The general vision problem isn't really solved in biology …

For example, I can't build an accurate spatial world model of the scene I look at …



Biological systems have evolved to process visual data to extract just enough information to perform the reasoning for everyday tasks that are part of survival.

Visual information is combined with higher-level knowledge and other sensory modalities, which constrain the reasoning in the solution space and finally make vision possible.

Visual cortex and a bit more …


Lower visual cortex: direct feedback projections to V1 originate from:

V2 (complex features)

V3 (orientation, motion, depth)

V4 (colour, attention)

MT (motion)

MST (motion)

FEF (saccades, spatial memory)

LIP (saccade planning)

IT (recognition)

Feedback from higher cortical areas

[Diagram: feedback pathways from the frontal cortex to V2, V4, FEF and IT, and from these areas down to V1; face features feeding back to V1]

Newborn kittens placed in a carousel

One moves actively, the other is passively towed along

Both receive the same stimulation

The actively moving kitten receives visual stimulation which results from its own movements

Only the active kitten develops sensory-motor coordination

Held and Hein, 1963


Conventional Computer Vision Architecture

[Pipeline: Input, Feature Extraction, Classification / Recognition / Disparity, Output, Action]

The Future: Biologically Motivated Computer Vision Architecture

[Diagram: hierarchical processing from input up to more abstract features / symbols (square "s", triangle "t"), combining feedforward, lateral and feedback processing; annotations include "Is there a square, triangle or circle?", optical illusions, and other modalities]

Biologically Motivated Computer Vision
Architectures in action

http://www.lira.dist.unige.it/babybotvideos.htm

Simple colour cues. Foveated sensors.

Also: learnt arm control, learning how to act on objects.

Biologically Inspired features


Machine vision and biological vision systems process similar information (visual scenes) and perform similar tasks (recognition, targeting).

Not surprisingly, the optimal features extracted by many machine vision systems look surprisingly like those found in biology.

But first …

Why bother with feature extraction?


Why not use the actual image/video itself for reasoning/analysis?

INVARIANCE!

The information we extract (i.e. the features) from the ‘entity’ must be insensitive to changes.

The extracted features might be invariant to rotation and scaling of objects in images, lighting conditions, and partial occlusions.

What features should we extract?


Depends …

Modality (video / image / audio …)

Task (e.g. topic categorisation / face recognition / audio compression)

Dimensionality reduction / sparsification

Invariance vs descriptiveness

If the features are too descriptive they can't generalise to new examples.

If they generalise too much, everything looks just about the same.

As the features we extract become more complex/descriptive they also become less invariant to even minor changes in the entity that we are measuring.

Human visual pathway


Inspiration for feature extraction methodology

Circularly symmetric retinal ganglion receptive fields

Receptive field: the area of the FOV in which stimulation leads to a response in the neuron

Orientated simple-cell cortical receptive fields (similar to Gabor filters)

Gabor filter


A function f(t) can be decomposed into cosine (even) and sine (odd) functions. Good for defining periodic structures, but not localised.

There is an uncertainty relation between a signal's specificity in time and frequency.

Dennis Gabor defined a family of signals that optimised this trade-off.

This enables us to extract local features.

Daugman (1985) defined a 2D filter based on the above, which was called a Gabor filter.


These filters resemble cortical simple cells
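As a reminder of the decomposition and the trade-off referred to above (standard results, not taken from the original slides): a periodic signal can be written as

$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\Big[a_n \cos(n\omega_0 t) + b_n \sin(n\omega_0 t)\Big]$$

and the uncertainty relation bounds the product of the effective widths of a signal in time and frequency,

$$\Delta t \, \Delta f \ge \frac{1}{4\pi},$$

with equality achieved by Gabor's elementary signals (Gaussian-windowed sinusoids).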

Gabor filter


Localise the sine and cosine functions using a Gaussian envelope:

Gaussian envelope, modulating cosine, modulating sine

Even symmetric cosine Gabor wavelet

Odd symmetric sine Gabor wavelet

Assuming a symmetric Gaussian envelope, in the Fourier domain the Gabor is a Gaussian centred about the central frequency (U,V). The orientation of the Gabor in the spatial domain is given by the direction of (U,V).

[Figure: the Fourier-domain Gaussian of width σ centred at (U,V) in the (u,v) frequency plane]
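The slide's equations are not reproduced in this text; a standard formulation consistent with the definitions above (symmetric Gaussian envelope of width σ, central frequency (U,V)) would be

$$g_{\mathrm{even}}(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}} \cos\big(2\pi(Ux+Vy)\big), \qquad g_{\mathrm{odd}}(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}} \sin\big(2\pi(Ux+Vy)\big),$$

whose Fourier transforms are Gaussians of width 1/(2πσ) centred at ±(U,V); combining the pair into a single complex Gabor gives one Gaussian centred at (U,V).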

Spatial Frequency Bandwidth


Bandwidth at half power point

Bandwidth depends on the symmetric Gaussian envelope's sigma. A large sigma results in a narrow bandwidth, so that the Gabor filter filters (almost) exactly at its central frequency. Also, due to the uncertainty relation, a narrow frequency bandwidth results in reduced spatial localisation by the filter.
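A minimal derivation under the symmetric formulation sketched earlier: the filter's power spectrum falls off as exp(−4π²σ²r²) with distance r from the central frequency, so the half-power radius satisfies 4π²σ²r² = ln 2 and the full bandwidth at half power is

$$B = \frac{\sqrt{\ln 2}}{\pi\sigma},$$

which shrinks as σ grows, as stated above.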

[Figure: spatial filter profiles of the even symmetric cosine and odd symmetric sine Gabor wavelets, with their spectral (Fourier) profiles over frequency for a wide and a narrow bandwidth]
Gabor filter with
asymmetric Gaussian


However, the Gabor's Gaussian envelope need not be circularly symmetric! An elliptical spatial Gaussian envelope lets us control orientation bandwidth.

Better formulation for an asymmetric Gaussian envelope:

Spatial domain: the envelope is defined along the direction of wave propagation

Spectral domain: a Gaussian centred at the central frequency, again expressed along the direction of wave propagation

f_o = central frequency

θ = angle of wave propagation

γ = sigma in the direction of propagation

η = sigma perpendicular to the direction of propagation

[Figure: the asymmetric Gabor in the Fourier domain]
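The slide's formulas are not reproduced in this text; the sketch below builds an elliptical-envelope Gabor kernel directly from the parameter definitions above, treating γ and η as spatial sigmas (an assumption about the slide's exact convention):

import numpy as np

def gabor_kernel(size, f0, theta, gamma, eta):
    """Sketch of an elliptical-envelope Gabor kernel.

    f0    : central frequency (cycles per pixel) -- assumed convention
    theta : angle of wave propagation (radians)
    gamma : Gaussian sigma along the direction of propagation
    eta   : Gaussian sigma perpendicular to the direction of propagation
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so x_r lies along the direction of propagation
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((x_r / gamma) ** 2 + (y_r / eta) ** 2))
    even = envelope * np.cos(2 * np.pi * f0 * x_r)  # even symmetric (cosine) component
    odd = envelope * np.sin(2 * np.pi * f0 * x_r)   # odd symmetric (sine) component
    return even, odd

# Example: 31x31 kernel at 0.1 cycles/pixel, oriented at 45 degrees
even, odd = gabor_kernel(31, f0=0.1, theta=np.pi / 4, gamma=5.0, eta=8.0)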

Bandwidth of Gabor with asymmetric
Gaussian

Half power points:

Along the direction of wave propagation

Perpendicular to the direction of wave propagation

Spatial bandwidth in the direction of wave propagation

Spatial bandwidth perpendicular to wave propagation
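Under the same assumptions as the symmetric case (γ and η taken as spatial sigmas), the frequency-domain sigmas are 1/(2πγ) along the propagation direction and 1/(2πη) perpendicular to it, so the half-power bandwidths would be

$$B_{\parallel} = \frac{\sqrt{\ln 2}}{\pi\gamma}, \qquad B_{\perp} = \frac{\sqrt{\ln 2}}{\pi\eta}.$$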

Orientation Bandwidth


Orientation bandwidth is related to the number of orientations we want to extract. The half power points of the filters should coincide in the spectral domain.

[Figure: a filter at radius ω_o in the (u, v) frequency plane, with its orientation bandwidth Δθ and spatial frequency bandwidth marked at the half-power contour]

If the filter bank consists of k orientated filters, with some redundancy in orientation sampling, the orientation bandwidth follows for small θ (see the sketch below).
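A hedged sketch of that relation, under the bandwidth formulas assumed above: if k oriented filters evenly cover 180° of orientation, adjacent central frequencies are separated by Δθ = π/k, and choosing η so that the half-power contours of neighbouring filters meet ties the orientation bandwidth to k:

$$\Delta\theta = \frac{\pi}{k}, \qquad \tan\!\left(\frac{\pi}{2k}\right) = \frac{B_\perp/2}{\omega_o} = \frac{\sqrt{\ln 2}}{2\pi\,\eta\,\omega_o},$$

which can be solved for η given k and ω_o; for small Δθ the tangent can be replaced by its argument. The redundancy factor mentioned on the slide would rescale this spacing and is not reproduced here.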

Orientation Bandwidth

[Figure repeated: orientation bandwidth Δθ and spatial frequency bandwidth at the half-power contour, at radius ω_o in the (u, v) plane]

Filter bank

[Figure: the Gabor filter bank shown in the spatial domain and in the frequency domain]
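A sketch of how such a bank could be constructed, reusing the gabor_kernel sketch above (the frequencies and sigmas are illustrative, not the slide's values):

import numpy as np

def gabor_filter_bank(n_orientations=8, frequencies=(0.05, 0.1, 0.2),
                      size=31, gamma=5.0, eta=5.0):
    """Sketch: multi-scale, multi-orientation bank of Gabor kernels."""
    bank = []
    for f0 in frequencies:                       # one ring of filters per central frequency
        for k in range(n_orientations):          # orientations evenly covering 180 degrees
            theta = k * np.pi / n_orientations
            even, odd = gabor_kernel(size, f0, theta, gamma, eta)  # sketch defined earlier
            bank.append({"f0": f0, "theta": theta, "even": even, "odd": odd})
    return bank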

Hypercolumn


Experiments by Hubel and Wiesel (1962, 1968)

A set of orientation-selective units over a common patch of the FOV

Organised as a vertical column in the visual cortex

In a computational system, use the information in a hypercolumn for higher-level reasoning

Feature vector: only using the even symmetric component in the filter bank
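A sketch of extracting such a feature vector at a single location, again reusing the gabor_kernel sketch above and keeping only the even symmetric responses (the parameter values are illustrative):

import numpy as np
from scipy.signal import fftconvolve

def hypercolumn_vector(image, row, col, n_orientations=8,
                       f0=0.1, gamma=5.0, eta=5.0, size=31):
    """Sketch: even symmetric Gabor responses at one pixel, one per orientation."""
    responses = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        even, _ = gabor_kernel(size, f0, theta, gamma, eta)  # sketch defined earlier
        filtered = fftconvolve(image, even, mode="same")     # filter the whole image
        responses.append(filtered[row, col])                 # sample at the hypercolumn's location
    return np.array(responses)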

Properties of the hypercolumn
feature vector


Invariance to rotation in the image plane: cycle the responses in the feature vector to a canonical orientation.

[Figure: an even symmetric detector's hypercolumn responses to a rotated stimulation, before and after cycling to the canonical orientation]
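A minimal sketch of the cycling step (choosing the strongest response as the canonical orientation is an assumption; any consistent rule would do):

import numpy as np

def to_canonical(responses):
    """Circularly shift the orientation responses so the strongest comes first.

    Rotating the stimulus by a multiple of the orientation spacing circularly
    shifts the hypercolumn vector, so this canonical form is unchanged by
    such rotations.
    """
    shift = int(np.argmax(np.abs(responses)))
    return np.roll(responses, -shift)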

Properties of the hypercolumn
feature vector


Invariance to scaling (i.e. spatial frequency): scaling the stimulation shifts the responses across the central-frequency channels of the filter bank, analogous to cycling across orientations.

[Figure: hypercolumn responses across central frequencies for a scaled stimulation]

Scale Invariant Feature Transform (SIFT)

Pandemonium model (Selfridge, 1959!)

Build ever more complex / abstract features along the hierarchy

Aggregate hypercolumn feature vectors into a complex feature

SIFT features: hypercolumn features are aggregated into a complex feature vector

Rotate the hypercolumn features to the canonical orientation of the large support region

Rotate the descriptor to the canonical orientation of the large support region
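A very rough, SIFT-like sketch of the aggregation and canonical-orientation steps (this is not Lowe's exact algorithm; the grid size, bin count and normalisation are illustrative assumptions):

import numpy as np

def sift_like_descriptor(grad_mag, grad_ori, keypoint_ori, grid=4, bins=8):
    """Sketch: aggregate local orientation histograms over a patch.

    grad_mag, grad_ori : gradient magnitude and orientation for a square patch
                         already centred and scaled on the keypoint
    keypoint_ori       : canonical orientation of the support region (radians)
    """
    h, w = grad_mag.shape
    ori = (grad_ori - keypoint_ori) % (2 * np.pi)         # rotate to canonical orientation
    ch, cw = h // grid, w // grid
    desc = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            m = grad_mag[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            o = ori[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            hist, _ = np.histogram(o, bins=bins, range=(0, 2 * np.pi), weights=m)
            desc[i, j] = hist                              # one orientation histogram per cell
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)           # normalise the complex feature vector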

Recognition


Extract SIFT features at corner locations (Harris corner detector) and at scale-space peaks.

[Figures: training and recognition examples]

Recap


Biologically motivated computer vision architecture



Feedforward, feedback, lateral processing in
architecture



Hierarchical processing



Feature extraction provides information about entities that is (somewhat!) invariant to changes



Gabor filter



Hypercolumn feature vector.



SIFT features







The End