Machine Learning in Computer Vision

by

N. SEBE
University of Amsterdam, The Netherlands

IRA COHEN
HP Research Labs, U.S.A.

ASHUTOSH GARG
Google Inc., U.S.A.

and

THOMAS S. HUANG
University of Illinois at Urbana-Champaign, Urbana, IL, U.S.A.
A C.I.P. Catalogue record for this book is available from the Library of Congress.

Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved
© 2005 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed in the Netherlands.

ISBN-10 1-4020-3274-9 (HB) Springer Dordrecht, Berlin, Heidelberg, New York
ISBN-10 1-4020-3275-7 (e-book) Springer Dordrecht, Berlin, Heidelberg, New York
ISBN-13 978-1-4020-3274-5 (HB) Springer Dordrecht, Berlin, Heidelberg, New York
ISBN-13 978-1-4020-3275-2 (e-book) Springer Dordrecht, Berlin, Heidelberg, New York
To my parents
Nicu

To Merav and Yonatan
Ira

To my parents
Asutosh

To my students: Past, present, and future
Tom
Contents

Foreword
Preface

1. INTRODUCTION
  1 Research Issues on Learning in Computer Vision
  2 Overview of the Book
  3 Contributions

2. THEORY: PROBABILISTIC CLASSIFIERS
  1 Introduction
  2 Preliminaries and Notations
    2.1 Maximum Likelihood Classification
    2.2 Information Theory
    2.3 Inequalities
  3 Bayes Optimal Error and Entropy
  4 Analysis of Classification Error of Estimated (Mismatched) Distribution
    4.1 Hypothesis Testing Framework
    4.2 Classification Framework
  5 Density of Distributions
    5.1 Distributional Density
    5.2 Relating to Classification Error
  6 Complex Probabilistic Models and Small Sample Effects
  7 Summary
3. THEORY: GENERALIZATION BOUNDS
  1 Introduction
  2 Preliminaries
  3 A Margin Distribution Based Bound
    3.1 Proving the Margin Distribution Bound
  4 Analysis
    4.1 Comparison with Existing Bounds
  5 Summary

4. THEORY: SEMI-SUPERVISED LEARNING
  1 Introduction
  2 Properties of Classification
  3 Existing Literature
  4 Semi-supervised Learning Using Maximum Likelihood Estimation
  5 Asymptotic Properties of Maximum Likelihood Estimation with Labeled and Unlabeled Data
    5.1 Model Is Correct
    5.2 Model Is Incorrect
    5.3 Examples: Unlabeled Data Degrading Performance with Discrete and Continuous Variables
    5.4 Generating Examples: Performance Degradation with Univariate Distributions
    5.5 Distribution of Asymptotic Classification Error Bias
    5.6 Short Summary
  6 Learning with Finite Data
    6.1 Experiments with Artificial Data
    6.2 Can Unlabeled Data Help with Incorrect Models? Bias vs. Variance Effects and the Labeled-unlabeled Graphs
    6.3 Detecting When Unlabeled Data Do Not Change the Estimates
    6.4 Using Unlabeled Data to Detect Incorrect Modeling Assumptions
  7 Concluding Remarks
5. ALGORITHM: MAXIMUM LIKELIHOOD MINIMUM ENTROPY HMM
  1 Previous Work
  2 Mutual Information, Bayes Optimal Error, Entropy, and Conditional Probability
  3 Maximum Mutual Information HMMs
    3.1 Discrete Maximum Mutual Information HMMs
    3.2 Continuous Maximum Mutual Information HMMs
    3.3 Unsupervised Case
  4 Discussion
    4.1 Convexity
    4.2 Convergence
    4.3 Maximum A-posteriori View of Maximum Mutual Information HMMs
  5 Experimental Results
    5.1 Synthetic Discrete Supervised Data
    5.2 Speaker Detection
    5.3 Protein Data
    5.4 Real-time Emotion Data
  6 Summary

6. ALGORITHM: MARGIN DISTRIBUTION OPTIMIZATION
  1 Introduction
  2 A Margin Distribution Based Bound
  3 Existing Learning Algorithms
  4 The Margin Distribution Optimization (MDO) Algorithm
    4.1 Comparison with SVM and Boosting
    4.2 Computational Issues
  5 Experimental Evaluation
  6 Conclusions

7. ALGORITHM: LEARNING THE STRUCTURE OF BAYESIAN NETWORK CLASSIFIERS
  1 Introduction
  2 Bayesian Network Classifiers
    2.1 Naive Bayes Classifiers
    2.2 Tree-Augmented Naive Bayes Classifiers
  3 Switching between Models: Naive Bayes and TAN Classifiers
  4 Learning the Structure of Bayesian Network Classifiers: Existing Approaches
    4.1 Independence-based Methods
    4.2 Likelihood and Bayesian Score-based Methods
  5 Classification Driven Stochastic Structure Search
    5.1 Stochastic Structure Search Algorithm
    5.2 Adding VC Bound Factor to the Empirical Error Measure
  6 Experiments
    6.1 Results with Labeled Data
    6.2 Results with Labeled and Unlabeled Data
  7 Should Unlabeled Data Be Weighed Differently?
  8 Active Learning
  9 Concluding Remarks

8. APPLICATION: OFFICE ACTIVITY RECOGNITION
  1 Context-Sensitive Systems
  2 Towards Tractable and Robust Context Sensing
  3 Layered Hidden Markov Models (LHMMs)
    3.1 Approaches
    3.2 Decomposition per Temporal Granularity
  4 Implementation of SEER
    4.1 Feature Extraction and Selection in SEER
    4.2 Architecture of SEER
    4.3 Learning in SEER
    4.4 Classification in SEER
  5 Experiments
    5.1 Discussion
  6 Related Representations
  7 Summary

9. APPLICATION: MULTIMODAL EVENT DETECTION
  1 Fusion Models: A Review
  2 A Hierarchical Fusion Model
    2.1 Working of the Model
    2.2 The Duration Dependent Input Output Markov Model
  3 Experimental Setup, Features, and Results
  4 Summary

10. APPLICATION: FACIAL EXPRESSION RECOGNITION
  1 Introduction
  2 Human Emotion Research
    2.1 Affective Human-computer Interaction
    2.2 Theories of Emotion
    2.3 Facial Expression Recognition Studies
  3 Facial Expression Recognition System
    3.1 Face Tracking and Feature Extraction
    3.2 Bayesian Network Classifiers: Learning the “Structure” of the Facial Features
  4 Experimental Analysis
    4.1 Experimental Results with Labeled Data
      4.1.1 Person-dependent Tests
      4.1.2 Person-independent Tests
    4.2 Experiments with Labeled and Unlabeled Data
  5 Discussion

11. APPLICATION: BAYESIAN NETWORK CLASSIFIERS FOR FACE DETECTION
  1 Introduction
  2 Related Work
  3 Applying Bayesian Network Classifiers to Face Detection
  4 Experiments
  5 Discussion

References
Index
Foreword

It started with image processing in the sixties. Back then, it took ages to digitize a Landsat image and then process it with a mainframe computer. Processing was inspired by the achievements of signal processing and was still very much oriented towards programming.

In the seventies, image analysis spun off, combining image measurement with statistical pattern recognition. Slowly, computational methods detached themselves from the sensor and the goal to become more generally applicable.

In the eighties, model-driven computer vision originated when artificial intelligence and geometric modelling came together with image analysis components. The emphasis was on precise analysis with little or no interaction, still very much an art evaluated by visual appeal. The main bottleneck was in the amount of data, using an average of 5 to 50 pictures to illustrate the point.

At the beginning of the nineties, vision became available to many with the advent of sufficiently fast PCs. The Internet revealed the interest of the general public in images, eventually introducing content-based image retrieval. Combining independent (informal) archives, as the web is, calls for interactive evaluation of approximate results and hence weak algorithms and their combination in weak classifiers.

In the new century, the last analog bastion was taken. In a few years, sensors have become all digital. Archives will soon follow. As a consequence of this change in the basic conditions, datasets will overflow. Computer vision will spin off a new branch to be called something like archive-based or semantic vision, including a role for formal knowledge description in an ontology equipped with detectors. An alternative view is experience-based or cognitive vision. This is mostly a data-driven view on vision and includes the elementary laws of image formation.
This book comes right on time. The general trend is easy to see. The methods of computation went from dedicated to one specific task to more generally applicable building blocks, from detailed attention to one aspect like filtering to a broad variety of topics, from a detailed model design evaluated against a few data to abstract rules tuned to a robust application.

From the source to consumption, images are now all digital. Very soon, archives will be overflowing. This is slightly worrying as it will raise the level of expectations about the accessibility of the pictorial content to a level compatible with what humans can achieve.

There is only one realistic chance to respond. From the trend displayed above, it is best to identify basic laws and then to learn the specifics of the model from a larger dataset. Rather than excluding interaction in the evaluation of the result, it is better to perceive interaction as a valuable source of instant learning for the algorithm.

This book builds on that insight: that the key element in the current revolution is the use of machine learning to capture the variations in visual appearance, rather than having the designer of the model accomplish this. As a bonus, models learned from large datasets are likely to be more robust and more realistic than the brittle all-design models.
This book recognizes that machine learning for computer vision is distinctively different from plain machine learning. Loads of data, spatial coherence, and the large variety of appearances make computer vision a special challenge for the machine learning algorithms. Hence, the book does not waste itself on the complete spectrum of machine learning algorithms. Rather, this book is focussed on machine learning for pictures.

It is amazing so early in a new field that a book appears which connects theory to algorithms and through them to convincing applications.

The authors met one another at Urbana-Champaign and then dispersed over the world, apart from Thomas Huang who has been there forever. This book will surely be with us for quite some time to come.

Arnold Smeulders
University of Amsterdam
The Netherlands
October, 2004
Preface

The goal of computer vision research is to provide computers with human-like perception capabilities so that they can sense the environment, understand the sensed data, take appropriate actions, and learn from this experience in order to enhance future performance. The field has evolved from the application of classical pattern recognition and image processing methods to advanced techniques in image understanding like model-based and knowledge-based vision.

In recent years, there has been an increased demand for computer vision systems to address “real-world” problems. However, many of our current models and methodologies do not seem to scale out of limited “toy” domains. Therefore, the current state of the art in computer vision needs significant advancements to deal with real-world applications, such as navigation, target recognition, manufacturing, photo interpretation, remote sensing, etc. It is widely understood that many of these applications require vision algorithms and systems to work under partial occlusion, possibly under high clutter, low contrast, and changing environmental conditions. This requires that the vision techniques be robust and flexible to optimize performance in a given scenario.

The field of machine learning is driven by the idea that computer algorithms and systems can improve their own performance with time. Machine learning has evolved from the relatively “knowledge-free” general purpose learning system, the “perceptron” [Rosenblatt, 1958], and decision-theoretic approaches for learning [Blockeel and De Raedt, 1998], to symbolic learning of high-level knowledge [Michalski et al., 1986], artificial neural networks [Rowley et al., 1998a], and genetic algorithms [DeJong, 1988]. With the recent advances in hardware and software, a variety of practical applications of machine learning research is emerging [Segre, 1992].
Vision provides interesting and challenging problems and a rich environment to advance the state of the art in machine learning. Machine learning technology has a strong potential to contribute to the development of flexible and robust vision algorithms, thus improving the performance of practical vision systems. Learning-based vision systems are expected to provide a higher level of competence and greater generality. Learning may allow us to transfer the experience gained in creating a vision system for one application domain to a vision system for another domain by developing systems that acquire and maintain knowledge. We claim that learning represents the next challenging frontier for computer vision research.

More specifically, machine learning offers effective methods for computer vision for automating the model/concept acquisition and updating processes, adapting task parameters and representations, and using experience for generating, verifying, and modifying hypotheses. Expanding this list of computer vision problems, we find that some of the applications of machine learning in computer vision are: segmentation and feature extraction; learning rules, relations, features, discriminant functions, and evaluation strategies; learning and refining visual models; indexing and recognition strategies; integration of vision modules and task-level learning; learning shape representation and surface reconstruction strategies; self-organizing algorithms for pattern learning; biologically motivated modeling of vision systems that learn; and parameter adaptation and self-calibration of vision systems. As an eventual goal, machine learning may provide the necessary tools for synthesizing vision algorithms, starting from adaptation of control parameters of vision algorithms and systems.
The goal of this book is to address the use of several important machine learning techniques in computer vision applications. An innovative combination of computer vision and machine learning techniques has the promise of advancing the field of computer vision, which will contribute to better understanding of complex real-world applications. There is another benefit of incorporating a learning paradigm in the computational vision framework. To mature the laboratory-grown vision systems into real-world working systems, it is necessary to evaluate the performance characteristics of these systems using a variety of real, calibrated data. Learning offers this evaluation tool, since no learning can take place without appropriate evaluation of the results.

Generally, learning requires large amounts of data and fast computational resources for its practical use. However, not all learning has to be on-line. Some of the learning can be done off-line, e.g., optimizing parameters, features, and sensors during training to improve performance. Depending upon the domain of application, the large number of training samples needed for inductive learning techniques may not be available. Thus, learning techniques should be able to work with varying amounts of a priori knowledge and data.

The effective usage of machine learning technology in real-world computer vision problems requires understanding the domain of application, abstraction of a learning problem from a given computer vision task, and the selection of appropriate representations for the learnable (input) and learned (internal) entities of the system. To succeed in selecting the most appropriate machine learning technique(s) for the given computer vision task, an adequate understanding of the different machine learning paradigms is necessary.
A learning system has to clearly demonstrate and answer questions such as what is being learned, how it is learned, what data is used to learn, how to represent what has been learned, how well and how efficiently the learning is taking place, and what the evaluation criteria are for the task at hand. Experimental details are essential for demonstrating the learning behavior of algorithms and systems. These experiments need to include scientific experimental design methodology for training/testing, parametric studies, and measures of performance improvement with experience. Experiments that exhibit scalability of learning-based vision systems are also very important.

In this book, we address all these important aspects. In each of the chapters, we show how the literature has introduced the techniques into the particular topic area, we present the background theory, discuss comparative experiments made by us, and conclude with comments and recommendations.
Acknowledgments

This book would not have existed without the assistance of Marcelo Cirelo, Larry Chen, Fabio Cozman, Michael Lew, and Dan Roth whose technical contributions are directly reflected within the chapters. We would like to thank Theo Gevers, Nuria Oliver, Arnold Smeulders, and our colleagues from the Intelligent Sensory Information Systems group at University of Amsterdam and the IFP group at University of Illinois at Urbana-Champaign who gave us valuable suggestions and critical comments. Beyond technical contributions, we would like to thank our families for years of patience, support, and encouragement. Furthermore, we are grateful to our departments for providing an excellent scientific environment.
Chapter 1
INTRODUCTION

Computer vision has grown rapidly within the past decade, producing tools that enable the understanding of visual information, especially for scenes with no accompanying structural, administrative, or descriptive text information. The Internet, more specifically the Web, has become a common channel for the transmission of graphical information, thus moving visual information retrieval rapidly from stand-alone workstations and databases into a networked environment. Practicality has begun to dictate that the indexing of huge collections of images by hand is a task that is both labor intensive and expensive, in many cases more than can be afforded to provide some method of intellectual access to digital image collections. In the world of text retrieval, text “speaks for itself” whereas image analysis requires a combination of high-level concept creation as well as the processing and interpretation of inherent visual features. In the area of intellectual access to visual information, the interplay between human and machine image indexing methods has begun to influence the development of computer vision systems. Research and application by the image understanding (IU) community suggest that the most fruitful approaches to IU involve analysis and learning of the type of information being sought, the domain in which it will be used, and systematic testing to identify optimal methods.

The goal of computer vision research is to provide computers with human-like perception capabilities so that they can sense the environment, understand the sensed data, take appropriate actions, and learn from this experience in order to enhance future performance. The vision field has evolved from the application of classical pattern recognition and image processing techniques to advanced applications of image understanding, model-based vision, knowledge-based vision, and systems that exhibit learning capability. The ability to reason and the ability to learn are the two major capabilities associated with these systems. In recent years, theoretical and practical advances are being made in the field of computer vision and pattern recognition by new techniques and processes of learning, representation, and adaptation. It is probably fair to claim, however, that learning represents the next challenging frontier for computer vision.
1. Research Issues on Learning in Computer Vision

In recent years, there has been a surge of interest in developing machine learning techniques for computer vision based applications. The interest derives both from commercial projects to create working products from computer vision techniques and from a general trend in the computer vision field to incorporate machine learning techniques.

Learning is one of the current frontiers for computer vision research and has been receiving increased attention in recent years. Machine learning technology has strong potential to contribute to:

- the development of flexible and robust vision algorithms that will improve the performance of practical vision systems with a higher level of competence and greater generality, and
- the development of architectures that will speed up system development time and provide better performance.

The goal of improving the performance of computer vision systems has brought new challenges to the field of machine learning, for example, learning from structured descriptions, partial information, incremental learning, focusing attention or learning regions of interest (ROI), learning with many classes, etc. Solving problems in visual domains will result in the development of new, more robust machine learning algorithms that will be able to work in more realistic settings.

From the standpoint of computer vision systems, machine learning can offer effective methods for automating the acquisition of visual models, adapting task parameters and representation, transforming signals to symbols, building trainable image processing systems, focusing attention on the target object, and learning when to apply what algorithm in a vision system.

From the standpoint of machine learning systems, computer vision can provide interesting and challenging problems. As examples consider the following: learning models rather than handcrafting them, learning to transfer experience gained in one application domain to another domain, learning from large sets of images with no annotation, designing evaluation criteria for the quality of learning processes in computer vision systems. Many studies in machine learning assume that a careful trainer provides internal representations of the observed environment, thus paying little attention to the problems of perception. Unfortunately, this assumption leads to the development of brittle systems with noisy, excessively detailed, or quite coarse descriptions of the perceived environment.
Esposito and Malerba [Esposito and Malerba, 2001] listed some of the important research issues that have to be dealt with in order to develop successful applications:

Can we learn the models used by a computer vision system rather than handcrafting them?

In many computer vision applications, handcrafting the visual model of an object is neither easy nor practical. For instance, humans can detect and identify faces in a scene with little or no effort. This skill is quite robust, despite large changes in the visual stimulus. Nevertheless, providing computer vision systems with models of facial landmarks or facial expressions is very difficult [Cohen et al., 2003b]. Even when models have been handcrafted, as in the case of page layout descriptions used by some document image processing systems [Nagy et al., 1992], it has been observed that they limit the use of the system to a specific class of images, which is subject to change in a relatively short time.
How is machine learning used in computer vision systems?

Machine learning algorithms can be applied in at least two different ways in computer vision systems:

- to improve perception of the surrounding environment, that is, to improve the transformation of sensed signals into internal representations, and
- to bridge the gap between the internal representations of the environment and the representation of the knowledge needed by the system to perform its task.

A possible explanation of the marginal attention given to learning internal representations of the perceived environment is that feature extraction has received very little attention in the machine learning community, because it has been considered application-dependent and research on this issue is not of general interest. The identification of required data and domain knowledge requires the collaboration with a domain expert and is an important step of the process of applying machine learning to real-world problems.

Only recently, the related issues of feature selection and, more generally, data preprocessing have been more systematically investigated in machine learning. Data preprocessing is still considered a step of the knowledge discovery process and is confined to data cleaning, simple data transformations (e.g., summarization), and validation. On the contrary, many studies in computer vision and pattern recognition focused on the problems of feature extraction and selection. Hough transform, FFT, and textural features, just to mention some, are all examples of features widely applied in image classification and scene understanding tasks. Their properties have been well investigated and available tools make their use simple and efficient.
How do we represent visual information?
In many computer vision applications, feature vectors are used to represent the perceived environment. However, relational descriptions are deemed to be of crucial importance in high-level vision. Since relations cannot be represented by feature vectors, pattern recognition researchers use graphs to capture the structure of both objects and scenes, while people working in the field of machine learning prefer to use first-order logic formalisms. By mapping one formalism into another, it is possible to find some similarities between research done in pattern recognition and machine learning. An example is the spatio-temporal decision tree proposed by Bischof and Caelli [Bischof and Caelli, 2001], which can be related to logical decision trees induced by some general-purpose inductive learning systems [Blockeel and De Raedt, 1998].
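As a rough illustration of the two kinds of representation contrasted above, the following sketch describes a hypothetical three-region scene first as fixed-length feature vectors and then as a set of relations, which it maps both to a labelled graph and to first-order-logic style ground facts. The region names, attributes, and relation names are invented for illustration and are not taken from the cited works.

```python
# Hypothetical scene with three segmented regions (all values are made up).
regions = {
    "r1": {"area": 120.0, "mean_gray": 0.82},
    "r2": {"area": 45.0, "mean_gray": 0.31},
    "r3": {"area": 300.0, "mean_gray": 0.55},
}

# Feature-vector view: one fixed-length list of numbers per region.
# Relations between regions have no place in this representation.
feature_vectors = {
    name: [attrs["area"], attrs["mean_gray"]] for name, attrs in regions.items()
}

# Relational view: (source, relation, target) triples between regions.
relations = [("r1", "above", "r2"), ("r2", "inside", "r3")]


def to_graph(triples):
    """Map relation triples to a labelled adjacency list (graph formalism)."""
    graph = {}
    for src, rel, dst in triples:
        graph.setdefault(src, []).append((rel, dst))
    return graph


def to_facts(triples):
    """Map the same triples to first-order-logic style ground facts."""
    return [f"{rel}({src}, {dst})" for src, rel, dst in triples]


scene_graph = to_graph(relations)
scene_facts = to_facts(relations)
```

Both `scene_graph` and `scene_facts` are derived from the same triples, which is the sense in which one formalism can be mapped into the other.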
What machine learning paradigms and strategies are appropriate to the computer vision domain?
Inductive learning, both supervised and unsupervised, emerges as the most important learning strategy. There are several important paradigms that are being used: conceptual (decision trees, graph-induction), statistical (support vector machines), and neural networks (Kohonen maps and similar auto-organizing systems). Another emerging paradigm, which is described in detail in this book, is the use of probabilistic models in general and probabilistic graphical models in particular.
What are the criteria for evaluating the quality of the learning processes in computer vision systems?
In benchmarking computer vision systems, estimates of the predictive accuracy, recall, and precision [Huijsman and Sebe, 2004] are considered the main parameters to evaluate the success of a learning algorithm. However, the comprehensibility of learned models is also deemed an important criterion, especially when domain experts have strong expectations on the properties of visual models or when understanding of system failures is important. Comprehensibility is needed by the expert to easily and reliably verify the inductive assertions and relate them to their own domain knowledge. When comprehensibility is an important issue, the conceptual learning paradigm is usually preferred, since it is based on the comprehensibility postulate stated by Michalski [Michalski, 1983]:
The results of computer induction should be symbolic descriptions of given entities, semantically and structurally similar to those a human expert might produce observing the same entities. Components of these descriptions should be comprehensible as single “chunks” of information, directly interpretable in natural language, and should relate quantitative and qualitative concepts in an integrated fashion.
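The quantitative criteria named before the postulate (predictive accuracy, recall, and precision) can be made concrete with a small sketch. It assumes a binary labelling task and uses made-up ground-truth and predicted labels; the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary labelling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against empty denominators when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels: 3 positives and 5 negatives in the ground truth.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = evaluate(y_true, y_pred)
```

None of these numbers says anything about comprehensibility, which is why the text treats it as a separate criterion.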
When is it useful to adopt several representations of the perceived environment with different levels of abstraction?
In complex real-world applications, multi-representations of the perceived environment prove very useful. For instance, a low resolution document image is suitable for the efficient separation of text from graphics, while a finer resolution is required for the subsequent step of interpreting the symbols in a text block (OCR). Analogously, the representation of an aerial view of a cultivated area by means of a vector of textural features can be appropriate to recognize the type of vegetation, but it is too coarse for the recognition of a particular geomorphology. By applying abstraction principles in computer programming, software engineers have managed to develop complex software systems. Similarly, the systematic application of abstraction principles in knowledge representation is the keystone for a long term solution to many problems encountered in computer vision tasks.
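One simple way to obtain several representations of the same image at different resolutions is an image pyramid. The sketch below is only an assumption about how such a multi-resolution representation might be built (2x2 block averaging on a tiny made-up gray-level image); it is not a method described in the text.

```python
def downsample(image):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(image, levels):
    """Return [finest, ..., coarsest] with the requested number of levels."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


# Made-up 4x4 gray-level image.
image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
pyramid = build_pyramid(image, 3)
```

A coarse level such as `pyramid[1]` would serve a task like separating text from graphics, while the finest level `pyramid[0]` would be kept for a step such as symbol interpretation.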
How can mutual dependency of visual concepts be dealt with?
In scene labelling problems, image segments have to be associated with a class name or a label, the number of distinct labels depending on the different types of objects allowed in the perceived world. Typically, image segments cannot be labelled independently of each other, since the interpretation of a part of a scene depends on the understanding of the whole scene (holistic view). Context-dependent labelling rules will take such concept dependencies into account, so as to guarantee that the final result is globally (and not only locally) consistent [Haralick and Shapiro, 1979]. Learning context-dependent labelling rules is another research issue, since most learning algorithms rely on the independence assumption, according to which the solution to a multiclass or multiple concept learning problem is simply the sum of independent solutions to single class or single concept learning problems.
Obviously, the above list cannot be considered complete. Other equally relevant research issues might be proposed, such as the development of noise-tolerant learning techniques, the effective use of large sets of unlabeled images and the identification of suitable criteria for starting/stopping the learning process and/or revising acquired visual models.
2. Overview of the Book
In general, the study of machine learning and computer vision can be divided into three broad categories: Theory leading to Algorithms and Applications built on top of theory and algorithms. In this framework, the applications should form the basis of the theoretical research leading to interesting algorithms. As a consequence, the book was divided into three parts. The first part develops the theoretical understanding of the concepts that are being used in developing algorithms in the second part. The third part focuses on the analysis of computer vision and human-computer interaction applications that use the algorithms and the theory presented in the first two parts.
The theoretical results in this book originate from different practical problems encountered when applying machine learning in general, and probabilistic models in particular, to computer vision and multimedia problems. The first set of questions arises from the high dimensionality of models in computer vision and multimedia. For example, integration of audio and visual information plays a critical role in multimedia analysis. Different media streams (e.g., audio, video, and text) may carry information about the task being performed, and recent results [Brand et al., 1997; Chen and Rao, 1998; Garg et al., 2000b] have shown that improved performance can be obtained by combining information from different sources compared with the situation when a single modality is considered. At times, different streams may carry similar information and in that case, one attempts to use the redundancy to improve the performance of the desired task by cancelling the noise. At other times, two streams may carry complementary information and in that case the system must make use of the information carried in both channels to carry out the task. However, the merits of using multiple streams are overshadowed by the formidable task of learning in high dimensional spaces, which is invariably the case in multi-modal information processing. Although the existing theory supports the task of learning in high dimensional spaces, the data and model complexity requirements posed are typically not met by real life systems. Under such a scenario, the existing results in learning theory fall short of giving any meaningful guarantees for the learned classifiers. This raises a number of interesting questions:
Can we analyze the learning theory for more practical scenarios?
Can the results of such analysis be used to develop better algorithms?
Another set of questions arises from the practical problem of data availability in computer vision, mainly labeled data. In this respect, there are three main paradigms for learning from training data. The first is known as supervised learning, in which all the training data are labeled, i.e., a datum contains both the values of the attributes and the labeling of the attributes to one of the classes. The labeling of the training data is usually done by an external mechanism (usually humans) and thus the name supervised. The second is known as unsupervised learning, in which each datum contains the values of the attributes but does not contain the label. Unsupervised learning tries to find regularities in the unlabeled training data (such as different clusters under some metric space), infer the class labels and sometimes even the number of classes. The third kind is semi-supervised learning, in which some of the data is labeled and some unlabeled. In this book, we are more interested in the latter.
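The difference between the three paradigms lies only in which data carry a label. A minimal sketch, assuming each datum is a pair of attribute values and an optional label (the type alias `Datum` and the function names are hypothetical):

```python
from typing import List, Optional, Tuple

# A datum: (attribute values, class label or None when unlabeled).
Datum = Tuple[List[float], Optional[str]]


def paradigm(data: List[Datum]) -> str:
    """Name the learning paradigm implied by which data are labeled."""
    if not data:
        raise ValueError("training set is empty")
    labeled = sum(1 for _, label in data if label is not None)
    if labeled == len(data):
        return "supervised"
    if labeled == 0:
        return "unsupervised"
    return "semi-supervised"


def split(data: List[Datum]):
    """Separate a training set into its labeled and unlabeled parts."""
    labeled = [(x, y) for x, y in data if y is not None]
    unlabeled = [x for x, y in data if y is None]
    return labeled, unlabeled


# Made-up training set: two labeled and three unlabeled data.
training = [
    ([0.9, 0.1], "face"),
    ([0.2, 0.7], "non-face"),
    ([0.8, 0.2], None),
    ([0.3, 0.6], None),
    ([0.5, 0.5], None),
]
labeled_part, unlabeled_part = split(training)
```

A semi-supervised learner would receive both `labeled_part` and `unlabeled_part`, whereas a supervised learner could only use the former.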
Semi-supervised learning is motivated by the fact that in many computer vision (and other real-world) problems, obtaining unlabeled data is relatively easy (e.g., collecting images of faces and non-faces), while labeling is difficult, expensive, and/or labor intensive. Thus, in many problems, it is very desirable to have learning algorithms that are able to incorporate a large number of unlabeled data with a small number of labeled data when learning classifiers.
Some of the questions raised in semi-supervised learning of classifiers are:
Is it feasible to use unlabeled data in the learning process?
Is the classification performance of the learned classifier guaranteed to improve when adding the unlabeled data to the labeled data?
What is the value of unlabeled data?
The goal of the book is to address all the challenging questions posed so far. We believe that a detailed analysis of the way machine learning theory can be applied through algorithms to real-world applications is very important and extremely relevant to the scientific community.
Chapters 2, 3, and 4 provide the theoretical answers to the questions posed above. Chapter 2 introduces the basics of probabilistic classifiers. We argue that there are two main factors contributing to the error of a classifier. Because of the inherent nature of the data, there is an upper limit on the performance of any classifier and this is typically referred to as Bayes optimal error. We start by analyzing the relationship between the Bayes optimal performance of a classifier and the conditional entropy of the data. The mismatch between the true underlying model (one that generated the data) and the model used for classification contributes to the second factor of error. In this chapter, we develop bounds on the classification error under the hypothesis testing framework when there is a mismatch in the distribution used with respect to the true distribution. Our bounds show that the classification error is closely related to the conditional entropy of the distribution. The additional penalty, because of the mismatched distribution, is a function of the Kullback-Leibler distance between the true and the mismatched distribution. Once these bounds are developed, the next logical step is to see how often the error caused by the mismatch between distributions is large. Our average case analysis for the independence assumptions leads to results that justify the success of the conditional independence assumption (e.g., in naive Bayes architecture). We show that in most cases, almost all distributions are very close to the distribution assuming conditional independence. More formally, we show that the number of distributions for which the additional penalty term is large goes down exponentially fast.
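The three quantities named in this summary (Bayes optimal error, conditional entropy, and Kullback-Leibler distance) have standard definitions for discrete distributions, which the following sketch computes. The joint distribution and the two example distributions are made up, and the sketch does not reproduce the bounds derived in Chapter 2; it only shows the quantities those bounds relate.

```python
import math

# Hypothetical joint distribution p(x, c) over a binary feature x
# and a binary class c (the probabilities sum to one).
joint = {
    (0, "a"): 0.4, (0, "b"): 0.1,
    (1, "a"): 0.1, (1, "b"): 0.4,
}


def bayes_error(p_xc):
    """1 - sum_x max_c p(x, c): the error of the best possible classifier."""
    best = {}
    for (x, _), p in p_xc.items():
        best[x] = max(best.get(x, 0.0), p)
    return 1.0 - sum(best.values())


def conditional_entropy(p_xc):
    """H(C|X) = -sum_{x,c} p(x, c) log2 p(c|x), measured in bits."""
    p_x = {}
    for (x, _), p in p_xc.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return -sum(
        p * math.log2(p / p_x[x]) for (x, _), p in p_xc.items() if p > 0
    )


def kl_divergence(p, q):
    """D(p||q) = sum_i p_i log2(p_i / q_i) in bits; needs q_i > 0 where p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


error = bayes_error(joint)
entropy = conditional_entropy(joint)

# A made-up "true" distribution and a mismatched model of it.
true_dist = [0.5, 0.5]
model_dist = [0.9, 0.1]
penalty = kl_divergence(true_dist, model_dist)
```

In this toy case the class is predictable from `x` only 80% of the time, so both `error` and `entropy` are nonzero, and `penalty` is zero only when the model distribution equals the true one.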
Roth [Roth, 1998] has shown that probabilistic classifiers can always be mapped to linear classifiers and as such, one can analyze the performance of these under the probably approximately correct (PAC) or Vapnik-Chervonenkis (VC)-dimension framework. This viewpoint is important as it allows one to directly study the classification performance by developing the relations between the performance on the training data and the expected performance on the future unseen data. In Chapter 3, we build on these results of Roth [Roth, 1998]. It turns out that although the existing theory argues that one needs large amounts of data to do the learning, we observe that in practice a good generalization is achieved with a much smaller number of examples. The existing VC-dimension based bounds (being the worst case bounds) are too loose and we need to make use of properties of the observed data leading to data dependent bounds. Our observation that, in practice, classification is achieved with good margin motivates us to develop bounds based on margin distribution. We develop a classification version of the random projection theorem [Johnson and Lindenstrauss, 1984] and use it to develop data dependent bounds. Our results show that in most problems of practical interest, data actually reside in a low dimensional space. Comparison with existing bounds on real datasets shows that our boun