# COMPUTER VISION: SOME CLASSICAL PROBLEMS


19 Oct 2013


COMPUTER VISION: SOME CLASSICAL PROBLEMS

MACHINE LEARNING LABORATORY

COMPUTER SCIENCE AND AUTOMATION

INDIAN INSTITUTE OF SCIENCE

June 24, 2013

WHAT IS COMPUTER VISION AND WHY IS IT DIFFICULT?

Computer Vision, obviously, aims to build computers that can see!

In other words, it deals with analyzing and understanding images and videos by computer.

The aim of analysis is to find known patterns in images (Detection), or to match images with known patterns (Recognition).

To analyze an image, we first need a representation for it.

An image is stored in a computer as a 2- or 3-dimensional matrix, each element of which is a pixel.

A single pixel carries very little, if any, semantic information!
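To make the matrix view concrete, here is a minimal pure-Python sketch (no imaging library assumed). The tiny images and the `shape` helper are hypothetical, invented for illustration: a grayscale image is a 2-D grid of intensities, and a colour image adds a third dimension for the red, green and blue channels.

```python
# Hypothetical 3x4 grayscale image: a 2-D matrix (list of rows) of 8-bit intensities.
gray = [
    [0,   0,   0,   0],
    [0, 255, 255,   0],
    [0,   0,   0,   0],
]

# Hypothetical 3x4 colour image: a 3-D matrix where each element is an (R, G, B) triple.
color = [[(200, 0, 0) for _ in range(4)] for _ in range(3)]

def shape(img):
    """Return (rows, cols) for a 2-D image or (rows, cols, channels) for a 3-D one."""
    rows, cols = len(img), len(img[0])
    first = img[0][0]
    return (rows, cols, len(first)) if isinstance(first, tuple) else (rows, cols)

print(shape(gray))   # (3, 4)
print(shape(color))  # (3, 4, 3)
# A single pixel is just a number (or a triple): on its own it says nothing about
# whether it belongs to a face, a car or the sky.
print(gray[1][1], color[0][0])  # 255 (200, 0, 0)
```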

Representation with Features

For most applications of machine learning, the first and foremost step is to find features.

Features are used to represent the data.

Features should be such that we can define a metric space on them; usually they are vectors.

Very elaborate (high-dimensional) features need to be avoided for computational reasons.

[Figure: a high-dimensional Feature Vector is difficult to process; Dimensionality Reduction yields a Smaller Feature Vector Representation]
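The slides name dimensionality reduction without naming a method. As an illustration only, the sketch below uses Principal Component Analysis (PCA), one widely used technique, to reduce each feature vector to a single coordinate along the direction of greatest variance. The function name `pca_first_component` and the power-iteration approach are our own choices; it assumes the data are not all identical (otherwise the covariance is zero and the normalisation divides by zero).

```python
import math

def pca_first_component(data, iters=100):
    """Project each feature vector onto the first principal component.

    data: list of equal-length feature vectors (lists of floats).
    Returns (component, projections): a unit vector, and the 1-D coordinates
    of the centred data along it.
    """
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    centred = [[x[j] - mean[j] for j in range(d)] for x in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and normalise; this converges to
    # the eigenvector with the largest eigenvalue (direction of maximum variance).
    # Assumes the start vector is not orthogonal to that eigenvector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, projections

points = [[1, 2], [2, 4], [3, 6], [4, 8]]  # 2-D points lying on the line y = 2x
component, coords = pca_first_component(points)
print(component)  # approximately [0.447, 0.894], i.e. (1, 2) / sqrt(5)
print(coords)     # one number per point instead of two
```

Because these toy points lie exactly on a line, one coordinate per point keeps all of their variance; with real features some information is lost, and the number of components kept is a trade-off.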

Features for Computer Vision

Pixel values can serve as features, but are often not very meaningful.

Groups of pixels can carry more meaning, but how should such groups be formed?

Groups of pixels (sub-images) at a large number of scales and positions

…/edges

Various filter outputs have also been explored. They are difficult to interpret semantically, but have been found to work well in certain applications.

Finding concise, semantically meaningful features is still a major open issue in Computer Vision.

SIFT Interest Points

A filter is an operator that processes a signal and removes some undesired components.

The Difference-of-Gaussian (DoG) filter is a popular filter for images.

The positions of local extrema (maxima or minima) of this filter output, taken over both position and scale, are the interest points.

Some interest points, such as those lying on edges, are discarded.

At each interest point, a feature vector is computed using the image gradients and their orientations inside small windows around the interest point.

This feature is invariant to the orientation and scale of the image.

SIFT: Scale-Invariant Feature Transform
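The core DoG idea (blur at two scales, subtract, look for local extrema) can be seen in a deliberately simplified 1-D sketch. This is not SIFT: real SIFT works on 2-D images over a whole pyramid of scales, compares each sample with its neighbours in scale as well as in position, and discards edge responses. The `sigma`, `ratio` and `threshold` values here are illustrative assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalised 1-D Gaussian with radius ceil(3 * sigma)."""
    radius = int(math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur(signal, sigma):
    """Convolve with a Gaussian, clamping indices at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(k[j + r] * signal[min(max(i + j, 0), n - 1)] for j in range(-r, r + 1))
            for i in range(n)]

def dog_extrema(signal, sigma=1.0, ratio=1.6, threshold=1e-3):
    """Positions where the Difference-of-Gaussians response is a local extremum."""
    g1 = blur(signal, sigma)
    g2 = blur(signal, sigma * ratio)
    dog = [a - b for a, b in zip(g2, g1)]
    points = []
    for i in range(1, len(dog) - 1):
        v = dog[i]
        if abs(v) < threshold:
            continue  # ignore weak responses
        if (v > dog[i - 1] and v > dog[i + 1]) or (v < dog[i - 1] and v < dog[i + 1]):
            points.append(i)
    return points

signal = [0.0] * 21
signal[10] = 1.0            # a single bright spot
print(dog_extrema(signal))  # [8, 10, 12]: the spot itself plus its two side lobes
```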

[Figure: SIFT INTEREST POINTS]

FACE DETECTION - PROBLEM

Given an image, find the faces in it.

Used in many places, such as digital cameras and photo-sharing albums.

Given a rectangular region in an image, decide whether or not it is a face.

Repeat this process for every location and every size of the rectangular region.
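The "every location and every size" search can be sketched as a sliding-window generator. The square windows, the 24-pixel minimum size, the 1.25 scale step and the 10% stride are assumptions chosen for illustration, and `is_face` stands for a hypothetical face/non-face classifier.

```python
def sliding_windows(img_w, img_h, min_size=24, scale_step=1.25, stride_frac=0.1):
    """Yield (x, y, size) for square candidate windows at many positions and scales."""
    size = min_size
    while size <= min(img_w, img_h):
        stride = max(1, int(size * stride_frac))  # larger windows move in larger steps
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size
        size = max(size + 1, int(size * scale_step))  # always grow, even for tiny sizes

def detect(img_w, img_h, is_face):
    """Run a (hypothetical) face/non-face classifier on every candidate window."""
    return [win for win in sliding_windows(img_w, img_h) if is_face(*win)]

# Usage with a dummy classifier that "accepts" only the largest window:
print(detect(30, 30, lambda x, y, size: size == 30))  # [(0, 0, 30)]
```

Even this tiny 30x30 image yields 17 windows; real images yield many thousands, which is why the per-window classifier must be fast.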

FACE DETECTION - GENERAL APPROACH

Basically a binary classification problem

Requires building a model for a face

Needs training samples, both positive and negative: positive samples are face images, negative samples are non-face images

A learning algorithm finds the boundary between face and non-face images

[Figure: FACE images, NON-FACE images, and a Candidate region]
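The slides do not name a learning algorithm. As one minimal, hypothetical instance of "a learning algorithm finds the boundary", the classic perceptron below learns a linear boundary from positive (+1, face) and negative (-1, non-face) samples and then labels a candidate. The 2-D "feature vectors" are made up; practical face detectors use far richer features and classifiers.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn a linear boundary w.x + b = 0 separating +1 (face) from -1 (non-face).

    samples: list of feature vectors; labels: list of +1 / -1.
    """
    d = len(samples[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training sample is on the correct side
            break
    return w, b

def classify(w, b, x):
    """+1 if the candidate lies on the 'face' side of the boundary, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical 2-D features: three "face" samples, then three "non-face" samples.
samples = [[2, 3], [3, 3], [3, 2], [-1, -1], [-2, 0], [0, -2]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(samples, labels)
print(w, b)                         # [2.0, 3.0] 1.0
print(classify(w, b, [2.5, 2.5]))   # 1: this candidate falls on the face side
```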

FACE DETECTION - BENCHMARK AND EVALUATION

Standard face-detection benchmark datasets are available.

FDDB: a face detection dataset for the unconstrained setting

Performance is usually measured using Precision and Recall.

Precision: of the reported face detections, how many were actually faces?

Recall: of the faces actually present, how many were detected?

F-score: harmonic mean of precision and recall
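The three measures follow directly from counts of correct detections, false alarms and misses. This sketch assumes each reported box has already been matched (or not) to a real face; the matching rule itself is outside its scope, and the numbers in the example are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Detection metrics from counts.

    true_positives:  reported detections that really are faces
    false_positives: reported detections that are not faces
    false_negatives: faces present in the image that were not detected
    """
    reported = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_score

# Hypothetical example: 10 faces in the image; the detector reports 8 boxes,
# 6 of which are real faces.
p, r, f = precision_recall_f1(true_positives=6, false_positives=2, false_negatives=4)
print(p, r, round(f, 3))  # 0.75 0.6 0.667
```

The harmonic mean is pulled towards the smaller of the two values, so a detector cannot score well by maximizing only one of precision or recall.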

FACE RECOGNITION - PROBLEM

Consists of a training phase and a testing phase.

In the training phase, we are given many face images, each marked with the identity of the person.

In the testing phase, we are given a new face image belonging to one of these persons, and must find out the identity of the person.

In machine-learning terms, this is a simple classification problem; however, suitable features and representations have to be found first.

FACE RECOGNITION - PROBLEM

One approach is to build a model for each person, using the training images provided for that person.

A second approach is to compare the test image to each of the training images and find the closest match.

Not every part of a face image helps in recognition; certain things about faces are common to everyone.

A good strategy is to find the most distinctive features and represent images only by them.

Eigenfaces (1991) uses the last two strategies.

Recognition accuracy is the obvious evaluation criterion.

A good recognition algorithm should work well with few training images.
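The "closest match" strategy amounts to a nearest-neighbour classifier over feature vectors. In this sketch the vectors are assumed to be already computed (for instance, projections onto a small number of principal components, which is what Eigenfaces uses); Euclidean distance, the function name and the toy gallery are illustrative assumptions.

```python
import math

def nearest_neighbour_identity(test_vec, gallery):
    """Return the identity of the training image closest to test_vec.

    gallery: list of (feature_vector, identity) pairs from the training phase.
    Distance is Euclidean; feature extraction is assumed to be done already.
    """
    best_id, best_dist = None, math.inf
    for vec, identity in gallery:
        dist = math.dist(test_vec, vec)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id

# Hypothetical 2-D feature vectors for two people, two training images each.
gallery = [([0.9, 0.1], "alice"), ([1.1, 0.0], "alice"),
           ([0.0, 1.0], "bob"), ([0.2, 1.2], "bob")]
print(nearest_neighbour_identity([0.1, 0.9], gallery))  # bob
```

Because every training image is kept and compared, this needs no training step, but the cost of a query grows with the number of training images.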

FACE RECOGNITION - CURRENT STATUS

Face recognition has traditionally been done with well-cropped, focused face images taken in a controlled environment.

In that setting it is considered a solved problem.

Nowadays face recognition is being revisited for semi-controlled or uncontrolled environments.

LFW (Labeled Faces in the Wild): a dataset of face images taken in such settings, and a new benchmark

OBJECT RECOGNITION - PROBLEM

In practice, much more complex than face recognition

A large number of images is given from many object categories.

Classify a test image into one of these categories.

Intra-class variations

OBJECT RECOGNITION - GENERAL APPROACH

Once again, the idea is to build models for different objects.

No single feature may be enough for classification: some objects may have a distinctive color, others a distinctive shape.

Multiple Kernel Learning: a sophisticated machine-learning formulation, generally considered the best approach for this problem

Caltech-101: a dataset of 101 object categories. Close to 80% accuracy has been obtained by Multiple Kernel Learning.

Caltech-256: a dataset of 256 object categories. An accuracy of 50% is considered good!

Intra-class variations continue to pose a significant challenge, and even scepticism: is it a valid problem at all?
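In one common formulation of Multiple Kernel Learning, each feature type m (for example a colour descriptor or a shape descriptor) gets its own kernel k_m, and the classifier uses a weighted combination of them. The weights are learned together with the classifier, so the training data decide how much each cue matters. The notation below is ours, not the slides', and other variants constrain the weights differently.

```latex
k(\mathbf{x}, \mathbf{x}') = \sum_{m=1}^{M} \beta_m \, k_m(\mathbf{x}, \mathbf{x}'),
\qquad \beta_m \ge 0, \qquad \sum_{m=1}^{M} \beta_m = 1
```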

OBJECT DETECTION

Given an image, find all the birds, trees, and cars in it!

Requires building models for each of these objects

Once again, search the entire image at multiple positions and scales.

Part-based models of objects are considered efficient: instead of modelling the whole object, model its different parts separately.

This helps to handle occlusion, and perhaps intra-class variations.

IMAGE SEGMENTATION

Given an image, divide it such that each segment contains an object.

Basically a clustering problem

Does not require features; it is done purely with pixel values.

Has inspired advanced clustering techniques like spectral clustering

Graph-based method: models the image as a graph, with each pixel representing a node and edges connecting pixels. Each edge is given a weight according to the similarity of the corresponding pixel values.

Requires the number of segments to be specified
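The graph construction can be sketched for a grayscale image. The slides do not fix the neighbourhood or the similarity function, so this sketch assumes 4-connectivity and a Gaussian similarity exp(-(I_p - I_q)^2 / (2 sigma^2)) with a hypothetical `sigma`. A method such as spectral clustering would then partition this weighted graph into the requested number of segments; that step is not shown.

```python
import math

def grid_graph_weights(img, sigma=10.0):
    """Edge weights for a 4-connected pixel graph of a grayscale image.

    img: 2-D list of intensities. Each pixel (r, c) is a node; each pair of
    horizontally or vertically adjacent pixels is joined by an edge whose weight
    is close to 1 for similar intensities and close to 0 for very different ones.
    Returns a dict mapping ((r1, c1), (r2, c2)) -> weight.
    """
    rows, cols = len(img), len(img[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours: each edge once
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    diff = img[r][c] - img[r2][c2]
                    weights[((r, c), (r2, c2))] = math.exp(-diff * diff / (2 * sigma * sigma))
    return weights

# Hypothetical 2x3 image: a dark region on the left, a bright column on the right.
img = [[10, 12, 200],
       [11, 13, 205]]
weights = grid_graph_weights(img)
print(round(weights[((0, 0), (0, 1))], 3))  # 0.98: similar pixels, strong edge
print(weights[((0, 1), (0, 2))] < 1e-6)     # True: dissimilar pixels, near-zero edge
```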

IMAGE SEGMENTATION

Segmentation is evaluated with respect to a gold-standard segmentation.

Every pair of pixels that falls in the same segment in the gold standard should also fall in the same segment in the computed segmentation (and similarly for every pair of pixels that falls in different segments).
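The pairwise criterion above is what is commonly called the Rand index: the fraction of pixel pairs on which the two segmentations agree. The direct O(n^2) loop below is only meant for illustration on tiny inputs; the flattened label lists are a hypothetical representation.

```python
from itertools import combinations

def rand_index(gold, predicted):
    """Fraction of pixel pairs on which two segmentations agree.

    gold, predicted: equal-length lists giving a segment label for each pixel
    (pixels flattened in any fixed order). A pair agrees if both segmentations
    put the two pixels in the same segment, or both put them in different segments.
    """
    if len(gold) != len(predicted):
        raise ValueError("segmentations must cover the same pixels")
    agree = total = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold = gold[i] == gold[j]
        same_pred = predicted[i] == predicted[j]
        agree += same_gold == same_pred
        total += 1
    return agree / total if total else 1.0

print(rand_index([0, 0, 1, 1], [5, 5, 7, 7]))  # 1.0: same grouping, different label names
print(rand_index([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.5
```

Only the grouping matters, not the label values themselves, which is why the first example scores 1.0.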

Video Problems

Videos are collections of images taken over an interval of time; successive images are quite similar.

Having to handle several images rather than one may make video problems tougher.

But the temporal continuity of videos provides a way out: joint modelling of multiple similar images can, in fact, give better performance than modelling a single image.

Motion-based features like optical flow can be used.

The concept of interest points for images is extended to Space-Time Interest Points for videos.

Face recognition, face detection, etc. can also be done in videos, often more effectively than in images.

OBJECT TRACKING - PROBLEM

Given a video showing a person or object moving, we need to find it in each frame.

Naive approach: reduce it to an object detection problem.

However, if the object is at position (x, y) in frame t, it will be very close to that position in frame (t + 1). So if we know the position at time t, we need to search only around that same position. This reduces the search space greatly!

The main idea is to build an appearance model for the object.

The appearance may change over time due to variations in size, illumination, viewpoint, etc.

The appearance model must therefore be updated and recomputed throughout the video.
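The two ideas on this slide, searching only near the previous position and keeping the appearance model up to date, can be combined in one tracking step. This is a minimal sketch under stated assumptions: the appearance model is a raw intensity template, matching uses the sum of squared differences (SSD), and the model is updated by a running average with a hypothetical rate `alpha`. Real trackers use much richer appearance models.

```python
def ssd(frame, template, x, y):
    """Sum of squared differences between template and the frame patch at (x, y)."""
    return sum((frame[y + r][x + c] - template[r][c]) ** 2
               for r in range(len(template)) for c in range(len(template[0])))

def track_step(frame, template, prev_xy, radius=3, alpha=0.1):
    """Find the object near its previous position, then update its appearance model.

    frame: 2-D list of intensities; template: 2-D list (the appearance model);
    prev_xy: top-left (x, y) of the object in the previous frame.
    Only positions within `radius` pixels of prev_xy are searched.
    """
    th, tw = len(template), len(template[0])
    h, w = len(frame), len(frame[0])
    px, py = prev_xy
    best, best_xy = None, prev_xy
    for y in range(max(0, py - radius), min(h - th, py + radius) + 1):
        for x in range(max(0, px - radius), min(w - tw, px + radius) + 1):
            score = ssd(frame, template, x, y)
            if best is None or score < best:
                best, best_xy = score, (x, y)
    bx, by = best_xy
    # Running-average update: blend the newly observed patch into the model so
    # gradual changes in illumination or viewpoint are absorbed over time.
    new_template = [[(1 - alpha) * template[r][c] + alpha * frame[by + r][bx + c]
                     for c in range(tw)] for r in range(th)]
    return best_xy, new_template

# Hypothetical 10x10 frame with a bright 2x2 object whose top-left corner is at (5, 4).
frame = [[0] * 10 for _ in range(10)]
for r in (4, 5):
    for c in (5, 6):
        frame[r][c] = 100
template = [[100, 100], [100, 100]]
xy, template = track_step(frame, template, prev_xy=(4, 3))
print(xy)  # (5, 4)
```

With `radius=3`, only 7 x 7 = 49 candidate positions are examined here instead of every position in the frame.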

OBJECT TRACKING - BENCHMARK AND EVALUATION

Performance is measured with respect to a gold standard in which a bounding box is provided for each frame.

The measure is the proportion of overlap between the areas of the gold-standard and reported bounding boxes.
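One common way to make "proportion of overlapping areas" precise is intersection-over-union (IoU): the area shared by the two boxes divided by the area they cover together. The (x1, y1, x2, y2) box format, the per-video averaging and the example boxes are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # width of the overlap (0 if disjoint)
    ih = max(0, min(ay2, by2) - max(ay1, by1))  # height of the overlap
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def mean_overlap(gold_boxes, reported_boxes):
    """Average per-frame overlap over a whole video."""
    scores = [iou(g, r) for g, r in zip(gold_boxes, reported_boxes)]
    return sum(scores) / len(scores)

gold = (10, 10, 50, 50)      # hypothetical gold-standard box for one frame
reported = (30, 10, 70, 50)  # tracker's box, shifted 20 px to the right
print(iou(gold, reported))   # 0.3333333333333333
```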

OBJECT TRACKING - CURRENT STATUS

Considered a solved problem under controlled illumination and background

Current research aims to handle occlusion of the object, and sudden changes in background and illumination.

Tracking multiple objects at the same time is another important problem.

Tracking is a real-time application; efforts are on to process as many frames per second as possible.

… remains the fundamental problem in vision.

A single miss can make the whole tracking go wrong.

Detecting and correcting such misses is an important problem to solve.

ACTION RECOGNITION IN VIDEOS

Surveillance cameras are nowadays available at many sensitive public locations.

The aim is to record the activities of people.

Requires the use of dynamic features, which make use of the motion in videos.

Some image-based features can be extended to videos, like space-time interest points. These can be used by viewing the video as a space-time volume.

The features can also be in the form of time series.
ACTION RECOGNITION IN VIDEOS

In the presence of a benign background, a static camera and a single actor, the problem is considered solved.

Current research aims to handle complex environments, like crowded places, where people frequently get occluded.

Multi-person interaction recognition is another recent offshoot of the problem.