Oct 19, 2013

Scene Classification: Computational and Cognitive Approaches

Hamed Kiani

OUTLINE

- Introduction
- Background
- Scene Classification
- Conclusion

Introduction

Scene

“A semantically coherent human-scaled view of a real-world environment comprising background elements and multiple discrete objects arranged in a spatially related layout.” [Henderson]

Introduction (Cont.)

Scene Classification Problem

2D image → S.C → Class Label (where?)

Outdoor, City, Man-made
Outdoor, Mountain, Natural
Indoor, Office, Man-made

Introduction (Cont.)

OUTLINE

- Introduction
- Background
  a) Computational Vision: Feature Level
  b) Scene Perception
- Scene Classification
- Conclusion



Background (Cont.)

Features (Level of Information):
- Low-Level Features
- Contextual-Level Features

Background (Cont.)

Low-Level Features

- Color:
  - RGB, LAB, LUV, HSV (HSL), YCrCb, and the hue-min-max-difference (HMMD) space [Liu et al.]
  - Color covariance matrix, color histogram, color moments, etc.
- Texture:
  - Used to describe the content of many natural images, such as fruit skin, clouds, trees, bricks, and fabric.
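To make two of the color descriptors above concrete, here is a minimal Python sketch of color moments (mean, standard deviation and skewness per channel) and a quantized joint color histogram. It assumes the image is a flat list of three-channel pixel tuples with values in [0, 256); the function names and the 4-bins-per-channel default are illustrative choices, not taken from the cited works.

```python
import math


def color_moments(pixels):
    """Mean, standard deviation and skewness of each of the three channels.

    `pixels` is a flat list of (c0, c1, c2) tuples in any three-channel
    color space (RGB, HSV, LAB, ...). Returns 9 numbers.
    """
    n = len(pixels)
    features = []
    for ch in range(3):
        values = [p[ch] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the skewness in the same units as the channel.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, math.sqrt(var), skew])
    return features


def color_histogram(pixels, bins_per_channel=4, max_value=256):
    """Joint color histogram with bins_per_channel**3 bins, normalized to sum to 1."""
    step = max_value / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for c0, c1, c2 in pixels:
        # Quantize each channel, clamping values at the top of the range.
        i0, i1, i2 = (min(int(c // step), bins_per_channel - 1) for c in (c0, c1, c2))
        hist[(i0 * bins_per_channel + i1) * bins_per_channel + i2] += 1
    return [h / len(pixels) for h in hist]
```

Which color space the tuples come from is left to the caller, since the moments and the histogram are computed the same way in each.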

Background (Cont.)

Low-Level Features

- Edge:
  - Edge histogram descriptor (EHD), SIFT, HOG
  - Especially informative for man-made objects
- Shape:
  - Aspect ratio, circularity, Fourier descriptors, moment invariants, object boundary, etc.
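The edge descriptors above all summarize how local gradient orientations are distributed. The sketch below is a simplified gradient-orientation histogram in Python, not the MPEG-7 EHD, SIFT or HOG definition; the image is assumed to be a list of rows of gray values, and the bin count and magnitude threshold are arbitrary illustrative defaults.

```python
import math


def edge_orientation_histogram(gray, n_bins=4, threshold=10.0):
    """Normalized histogram of gradient orientations in a grayscale image.

    `gray` is a list of rows of intensity values. Interior pixels whose
    central-difference gradient magnitude exceeds `threshold` vote for one of
    `n_bins` undirected orientation bins centered on 0, 180/n_bins, ... degrees.
    (The gradient points across an edge, i.e. perpendicular to it.)
    """
    h, w = len(gray), len(gray[0])
    width = 180.0 / n_bins
    hist = [0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            if math.hypot(gx, gy) <= threshold:
                continue  # too weak to count as an edge
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(((angle + width / 2) % 180.0) // width) % n_bins] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else [0.0] * n_bins
```

An image containing only a vertical step edge puts all of its votes in bin 0 (horizontal gradient); strongly oriented, regular structures such as building facades tend to concentrate votes in a few bins.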

Background (Cont.)

Contextual-Level Features

- Context: “Any information that may influence the way a scene and the objects within it are perceived” [Strat].
- Why contextual-level features?
  - To address the “semantic gap”

Background (Cont.)

Feature Extraction: Contextual-Level Features

- “Semantic gap”: the gap between the limited descriptive power of primitive image features and the richness of human semantics [Chen et al.].
- How can the “semantic gap” be bridged?
  - By representing images with high-level features built from different sources of context in the image.

Background (Cont.)

Contextual-Level Features

- Local context
- 2D scene gist
- Semantic context

Background (Cont.)

Contextual-Level Features

Local Context

- Any context represented by:
  - Object boundaries
  - Object shape/contour models
  - Code words (bag-of-features, visual codes)
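Code words are typically obtained by clustering local descriptors and then counting, per image, how many descriptors fall closest to each code word. The Python sketch below assumes descriptors are already extracted as equal-length numeric tuples and uses plain k-means to build the codebook; k-means is a common choice but is an assumption here, not something the slide specifies.

```python
import random


def nearest(p, centers):
    """Index of the center closest to p in squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the resulting centers serve as code words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def bow_histogram(descriptors, codebook):
    """Normalized count of how many descriptors map to each code word."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return [h / len(descriptors) for h in hist]
```

The codebook is learned once from training descriptors; each image is then summarized by its histogram, which discards where the code words occurred.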

Background (Cont.)

Contextual-Level Features

2D Scene Gist

- Global statistics of an image that capture the “gist” or “concept frame” [Oliva and Torralba].
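As a rough illustration of a global, layout-aware statistic, the sketch below pools gradient energy by orientation inside a coarse grid. This is a deliberate simplification: the spatial-envelope (GIST) descriptor of Oliva and Torralba uses a bank of Gabor-like filters at several scales, which is replaced here by a single central-difference gradient; the grid size and number of orientations are illustrative.

```python
import math


def gist_like(gray, grid=4, n_orient=4):
    """Coarse, layout-aware global descriptor in the spirit of scene gist.

    Simplifying assumption: a central-difference gradient replaces the
    multi-scale Gabor filter bank of the original descriptor. Gradient energy
    is pooled into `n_orient` orientation bins within each cell of a
    grid x grid partition, giving grid * grid * n_orient values.
    """
    h, w = len(gray), len(gray[0])
    width = 180.0 / n_orient
    feats = [0.0] * (grid * grid * n_orient)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            energy = gx * gx + gy * gy
            if energy == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            o = int(((angle + width / 2) % 180.0) // width) % n_orient
            cell = (y * grid // h) * grid + (x * grid // w)  # row-major grid cell
            feats[cell * n_orient + o] += energy
    total = sum(feats)
    return [f / total for f in feats] if total else feats
```

Unlike a plain orientation histogram, the grid keeps coarse information about where structure lies (for example, a horizon line in the middle rows), which is the kind of global layout the gist idea relies on.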

Background (Cont.)

Contextual-Level Features

Semantic Context

- Event, activity
- Sub-concept
- Presence and location (spatial context) of objects, parts and regions [Galleguillos et al.].
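One simple way to encode the "presence" part of semantic context is an object co-occurrence table estimated from labeled training images. The sketch below computes the empirical probability that object b appears in an image given that object a appears. The input format (one set of object labels per image) is an assumption, and the model of Galleguillos et al. also uses location and appearance, which are omitted here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(images):
    """Estimate P(b present | a present) from per-image sets of object labels."""
    single = Counter()  # number of images containing each label
    pair = Counter()    # number of images containing both labels of an ordered pair
    for labels in images:
        labels = set(labels)
        single.update(labels)
        for a, b in combinations(sorted(labels), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return {(a, b): n / single[a] for (a, b), n in pair.items()}
```

A detector that is unsure whether a region is a car can, for instance, raise its score when a road has already been found in the image.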

Background (Cont.)

Human Scene Perception

- How does the human brain perceive real-world scenes?
- Object-centered or scene-centered representation?

Background (Cont.)

Human Scene Perception

Object-Centered Representation

- A scene is represented by a set of objects and parts as the atoms (basic elements) [Fergus et al.].

Background (Cont.)

Human Scene Perception

Object-Centered Representation

Background (Cont.)

Human Scene Perception

Object-Centered Representation

- Why not object-centered?
  - The human brain recognizes a scene image very rapidly (70 ms), even in the presence of blurring [Biederman].

Background (Cont.)

Human Scene Perception

Scene-Centered Representation

- A scene is represented by global information (“schemas”, “gist”) based on the overall spatial organization of objects in the early stage of perception:
  - Low-frequency spatial information
  - Diagnostic objects (man-made/indoor images)
  - Color as a key characteristic (natural images)

OUTLINE

- Introduction
- Background
- Scene Classification
  a) S.C based on Computational Vision
  b) S.C based on Visual Cognition
- Conclusion



Literature Review (Cont.)

Scene Classification (S.C) based on Computational Vision

- Local-scale classification
- Global-scale classification
- Multimodal classification systems

Literature Review (Cont.)

S.C based on Computational Vision: Local-Scale Classification

- S.C is performed using features extracted from sub-image elements such as superpixels, blocks, code words (bag-of-features), objects, blobs, parts and regions.

Literature Review (Cont.)

S.C based on Computational Vision: Local-Scale Classification

Bosch et al. [Bosch et al.]:
- Discovering the objects (grass, buildings, roads, etc.) in each image
- Representing them by visual words (color, texture, orientation)
- Using the distribution of visual words to perform scene classification (probabilistic Latent Semantic Analysis, pLSA)

Literature Review (Cont.)

Bosch et al. [Bosch et al., 2006]
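The pLSA step can be sketched as expectation-maximization over an image-by-visual-word count matrix. This minimal pure-Python version assumes the counts are already available (for example from a visual-word histogram per image) and shows only the model fitting; the resulting per-image topic mixtures P(z|d) are what a subsequent classifier would consume. Bosch et al.'s feature extraction and classification stages are not reproduced here, and the helper names are illustrative.

```python
import math
import random


def _normalize(row):
    """Scale a non-negative list to sum to 1 (uniform if it is all zeros)."""
    s = sum(row)
    return [v / s for v in row] if s > 0 else [1.0 / len(row)] * len(row)


def plsa(counts, n_topics, iterations=50, seed=0):
    """Fit pLSA by EM.

    counts[d][w]: occurrences of visual word w in image d.
    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(iterations):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = _normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w) for the M-step.
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over d, w of n(d,w) * log(sum_z P(w|z) P(z|d))."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                total += n * math.log(sum(p_w_z[z][w] * p_z_d[d][z] for z in range(len(p_w_z))))
    return total
```

EM never decreases the log-likelihood, so running more iterations from the same seed can only improve the fit, although it may still settle in a local optimum.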

Literature Review (Cont.)

S.C based on Computational Vision: Local-Scale Classification

Vogel & Schiele [Vogel and Schiele]:
- Dividing images into a grid of 10x10 local blocks
- Classifying each block into one of nine local-concept classes (sky, water, grass, trunks, foliage, field, rocks, flowers, and sand)
- Calculating the occurrence vector of local concepts for each image
- Using the occurrence vectors for learning and image categorization

Literature Review (Cont.)

Vogel & Schiele [Vogel and Schiele, 2007]
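The grid and occurrence-vector steps are simple to write down. In this sketch the block classifier itself is left out: `block_labels` is assumed to be a 10x10 grid of concept names already produced by some local classifier, and `grid_blocks` only computes block boundaries in pixels. Both function names are illustrative.

```python
CONCEPTS = ["sky", "water", "grass", "trunks", "foliage",
            "field", "rocks", "flowers", "sand"]


def grid_blocks(height, width, n=10):
    """Pixel bounds (y0, y1, x0, x1) of an n x n grid of blocks, row-major."""
    ys = [height * i // n for i in range(n + 1)]
    xs = [width * j // n for j in range(n + 1)]
    return [(ys[i], ys[i + 1], xs[j], xs[j + 1]) for i in range(n) for j in range(n)]


def concept_occurrence(block_labels):
    """Fraction of image blocks assigned to each of the nine local concepts."""
    counts = {c: 0 for c in CONCEPTS}
    n = 0
    for row in block_labels:
        for label in row:
            counts[label] += 1
            n += 1
    return [counts[c] / n for c in CONCEPTS]
```

The resulting nine-dimensional vector is what the categorization stage would then be trained on; an image that is mostly sky and grass, for example, yields large values in those two entries.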

Literature Review (Cont.)

S.C based on Computational Vision: Local-Scale Classification

Carson et al. [Carson et al.]:
- Representing the image as a combination of fine-to-coarse blobs described by texture and color (Blobworld)
- Classifying based on the similarity between the Blobworld representations of training images and that of the given input query

Literature Review (Cont.)

S.C based on Computational Vision: Local-Scale Classification

Carson et al. [Carson et al.]

Literature Review (Cont.)

S.C based on Computational Vision: Global-Scale Classification

- S.C is performed using features from the global configuration of the image,
- ignoring details about local concepts and object information.

Literature Review (Cont.)

S.C based on Computational Vision: Global-Scale Classification

Renninger & Malik [Renninger and Malik]:
- Representing image textures as a vocabulary of distinctive patterns
- Encoding the pattern vocabulary by textons
- Constructing a global representation of the image as a frequency histogram of textons
- Classifying an input query by χ² similarity integrated with k-NN

Literature Review (Cont.)

Renninger & Malik [Renninger and Malik]
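The histogram-matching step can be sketched as follows. It assumes each pixel has already been assigned a texton index (in the original work textons come from clustering filter-bank responses, which is not reproduced here); `chi2_distance` uses the common form 0.5 * sum((a - b)^2 / (a + b)), and `k=3` is an arbitrary default.

```python
from collections import Counter


def texton_histogram(texton_ids, n_textons):
    """Normalized frequency histogram of per-pixel texton labels."""
    hist = [0.0] * n_textons
    for t in texton_ids:
        hist[t] += 1
    return [h / len(texton_ids) for h in hist]


def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms; eps avoids division by zero."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def knn_classify(query, train_hists, train_labels, k=3):
    """Majority vote among the k training histograms closest in chi-square distance."""
    ranked = sorted(range(len(train_hists)),
                    key=lambda i: chi2_distance(query, train_hists[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

The χ² distance divides each squared difference by the bin mass, so differences in rarely used textons count relatively more than the same difference in a dominant texton.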

Literature Review (Cont.)

S.C based on Computational Vision: Global-Scale Classification

Vailaya et al. [Vailaya et al.]: City vs. landscape classification
- Representing images globally by a set of salient features based on color (histograms, coherence vectors), texture (moments of the DCT coefficients) and edges (direction histograms and direction coherence vectors)
- Classifying an input query using a k-NN classifier on the five low-level features

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

- Integrating the evidence provided by multiple sources of information at the:
  - Feature level
  - Sub-task level
  - Classifier level

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

Boutell and Luo [Boutell and Luo, 2004]:
- Integrating low-level features (color histograms and wavelet texture features) with camera metadata (exposure time, flash fired, and subject distance) using a Bayesian classifier for indoor vs. outdoor scene classification
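Combining image-based and metadata cues can be illustrated with a naive-Bayes style product of likelihoods, which assumes the cues are conditionally independent given the class. This is a simplification chosen for brevity, not a reproduction of Boutell and Luo's model, and the probability tables in the usage example are made-up placeholder numbers, not values from their experiments.

```python
import math


def fuse_cues(prior, likelihoods, observed):
    """Posterior over classes from independent cues (naive-Bayes assumption).

    prior:        {class: P(class)}
    likelihoods:  {cue: {value: {class: P(value | class)}}}
    observed:     {cue: value} for the cues available for this image
    """
    # Work in log space so many small likelihoods do not underflow.
    log_post = {c: math.log(p) for c, p in prior.items()}
    for cue, value in observed.items():
        for c in log_post:
            log_post[c] += math.log(likelihoods[cue][value][c])
    top = max(log_post.values())
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not taken from Boutell and Luo.
    prior = {"indoor": 0.5, "outdoor": 0.5}
    likelihoods = {
        "flash": {"yes": {"indoor": 0.6, "outdoor": 0.1},
                  "no": {"indoor": 0.4, "outdoor": 0.9}},
        "color_classifier": {"indoor-like": {"indoor": 0.7, "outdoor": 0.3},
                             "outdoor-like": {"indoor": 0.3, "outdoor": 0.7}},
    }
    print(fuse_cues(prior, likelihoods, {"flash": "no", "color_classifier": "indoor-like"}))
```

Because missing cues are simply left out of `observed`, the same function still works for images whose metadata is incomplete.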

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

Sub-task level:
- Integration of a set of computational vision tasks such as occlusion reasoning, surface orientation estimation, object recognition, segmentation and scene categorization.

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

Heitz et al. [Heitz et al.]
- Cascaded Classification Models (CCMs): integrate scene classification, object detection, multi-class segmentation, and 3D reconstruction to improve performance on some or all tasks.

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

Heitz et al. [Heitz et al.]

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

Li et al. [Li et al.]: Feedback Enabled Cascaded Classification Models (FE-CCM)
- Integrating scene classification, depth estimation, event categorization, saliency detection, object detection and geometric labeling.

Literature Review (Cont.)

S.C based on Computational Vision: Multimodal Systems

Li et al. [Li et al.] (FE-CCM)

Literature Review (Cont.)

Scene Classification based on Visual Cognition:
- How do humans perceive surrounding scenes?
- Which scene categories are relevant to humans?
- Which image features are possibly evaluated by humans?
- How do weak and strong objects affect the accuracy of scene perception?
- How can visual cognition be modeled for scene classification?

Literature Review (Cont.)

Scene Classification based on Visual Cognition:

Determining semantic categories of photographs [Rogowitz et al.]:
- Database of 97 images
- Images categorized along two main axes, man-made vs. natural and human vs. non-human, into 4 main categories and 20 subcategories
- Role of color/edges/boundaries/lines in classification: natural vs. man-made

Literature Review (Cont.)

Scene Classification based on Visual Cognition:

Which image features do humans generally use to perform scene recognition? [McCotter et al.]
- Experiments on eight scene categories (highway, street,…)
- Phase spectra of the scene categories
- Category-specific diagnostic regions in the phase spectra
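To show what a phase spectrum is, the sketch below computes amplitude and phase spectra of a tiny grayscale image with a naive 2D DFT in pure Python (an FFT library would be used in practice). It also illustrates why phase is interesting: circularly shifting an image leaves the amplitude spectrum unchanged, so information about where structures lie is carried by the phase. This is a generic illustration, not McCotter et al.'s experimental procedure, and the function names are illustrative.

```python
import cmath
import math


def dft2(img):
    """Naive 2D discrete Fourier transform (adequate only for tiny images)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]


def amplitude_and_phase(img):
    """Return (amplitude, phase) spectra as lists of rows indexed [u][v]."""
    spectrum = dft2(img)
    amplitude = [[abs(c) for c in row] for row in spectrum]
    phase = [[cmath.phase(c) for c in row] for row in spectrum]
    return amplitude, phase
```

For example, single bright pixels at two different positions produce identical (flat) amplitude spectra but different phase spectra.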

OUTLINE

- Introduction
- Background
- Scene Classification
- Summary
  a) Summary/Possible Research Topics
  b) Conclusion




Summary and Conclusion (Cont.)

Summary

Summary and Conclusion (Cont.)

Possible Research Topics

Cognitive Scene Classification
- Integrating visual cognition findings with computational vision/machine learning techniques
- Providing a platform, inspired by human scene perception, to model the scene, space, context and relations among the different elements of a scene

Summary and Conclusion (Cont.)

Possible Research Topics

Multimodal Scene Classification
- Integrating different sources of knowledge provided by different types of features, classifiers and vision tasks
- Overcoming some of the limitations of unimodal classification systems

Summary and Conclusion (Cont.)

Possible Research Topics

Contextual Scene Classification
- Modeling more discriminative and meaningful representations of concepts
- Proposing a comprehensive model of scenes/concepts/objects/parts/regions to convey the different sources of context extracted from an image

Summary and Conclusion

Scene classification
- Bridging the “semantic gap”, from low-level to contextual-level features
- Classification scale (local vs. global)
- Computational vision vs. cognition-based scene classification

References

[Henderson and Hollingworth, 1999b] J. M. Henderson and A. Hollingworth, “High-level scene perception”, Annual Review of Psychology, vol. 50, pp. 243-271, 1999.

[Ballard and Brown, 1982] D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.

[Liu et al., 2004a]

[Strat, 1993] T. M. Strat, “Employing contextual information in computer vision”, In Proc. of the ARPA Image Understanding Workshop, 1993.

[Chen et al., 2003]

[Oliva and Torralba, 2001] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope”, International Journal of Computer Vision, vol. 42, pp. 145-175, 2001.

[Galleguillos et al., 2008] C. Galleguillos, A. Rabinovich, and S. Belongie, “Object categorization using co-occurrence, location and appearance”, In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.

[Paek and Chang, 2000]

[Szummer and Picard, 1998] M. Szummer and R. W. Picard, “Indoor-outdoor image classification”, In Proc. of the IEEE International Workshop on Content-Based Access of Image and Video Databases, in conjunction with ICCV’98, pp. 42-50, 1998.

[Biederman, 1972] I. Biederman, “Perceiving real-world scenes”, Science, vol. 177, pp. 77-80, 1972.

[Bosch et al., 2006] A. Bosch, A. Zisserman, and X. Munoz, “Scene classification via pLSA”, In Proc. of the European Conference on Computer Vision, vol. 4, pp. 517-530, 2006.

[Vogel and Schiele, 2007]

[Carson et al., 1999] C. Carson, M. Thomas, S. Belongie, J. Hellerstein, and J. Malik, “Blobworld: A system for region-based image indexing and retrieval”, In Third Int. Conf. on Visual Information Systems, Springer-Verlag, 1999.

[Renninger and Malik, 2004] L. W. Renninger and J. Malik, “When is scene identification just texture recognition?”, Vision Research, vol. 44, pp. 2301-2311, 2004.

[Vailaya et al., 1998] A. Vailaya, A. Jain, and H. Zhang, “On image classification: City vs. landscape”, Pattern Recognition, vol. 31(12), pp. 1921-1935, 1998.

[Heitz et al., 2008] G. Heitz, S. Gould, A. Saxena, and D. Koller, “Cascaded classification models: Combining models for holistic scene understanding”, In Proc. of Neural Information Processing Systems (NIPS), 2008.

[Li et al., 2010] C. Li, A. Kowdle, A. Saxena, and T. Chen, “Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models”, In Proc. of Neural Information Processing Systems (NIPS), 2010.

[Rogowitz et al., 1997] B. Rogowitz, T. Frese, J. Smith, C. Bouman, and E. Kalin, “Perceptual image similarity experiments”, In SPIE Conference on Human Vision and Electronic Imaging, pp. 576-590, 1997.

[McCotter et al., 2004] M. McCotter, F. Gosselin, P. Sowden, and P. Schyns, “The use of visual information in natural scenes”, Visual Cognition, vol. 12(6), pp. 938-953, 2004.

Q&A