Real-Time Vision-Based Gesture Recognition Using Haar-like Features



By: Qing Chen, Nicolas D. Georganas and Emil M. Petriu









IMTC 2007, Warsaw, Poland, May 1-3, 2007

Outline

1. Introduction
2. Two-level Approach
3. Posture Recognition
4. Gesture Recognition
5. Conclusions

1. Introduction

Human-Virtual Environment (H-VE) interaction requires utilizing different modalities (e.g., speech, body position, hand gestures, haptic response) and integrating them for a more immersive user experience.

Hand gestures are an intuitive yet powerful communication modality that has not been fully explored for H-VE interaction.

Recent computer vision and image processing techniques make real-time vision-based hand gesture recognition feasible for human-computer interaction.

A vision-based hand gesture recognition system must meet requirements in terms of real-time performance, robustness and recognition accuracy.




1. Introduction (cont’d)

Vision-based gesture recognition techniques can be divided into two categories:

Appearance-based approaches:
- Pros: simple hand models; efficient implementation; real-time performance is easier to achieve.
- Cons: limited capability to model 3D hand gestures.
- We choose this approach to achieve real-time performance.

3D hand model-based approaches:
- Pros: potential to model more natural hand gestures.
- Cons: complex hand model; real-time performance is difficult to achieve; user-dependent.









2. Two-level Approach

Definition 1 (Posture/Pose): A posture or pose is defined solely by the (static) hand configurations and hand locations.

Definition 2 (Gesture): A gesture is a series of postures over a time span, connected by motions (global hand motion and local finger motion).












2. Two-level Approach (cont’d)

Given the hierarchical nature of these definitions, it is natural to decouple the gesture classification problem into two levels:

Lower level: recognition of primitives (postures);
- Solution: Viola and Jones algorithm.

Higher level: recognition of structure (gestures);
- Solution: grammar-based analysis.

[Figure: two-level architecture. Posture level: Viola & Jones algorithm; gesture level: grammar-based analysis.]

3. Posture Recognition

Viola and Jones algorithm (2001):

A statistical approach originally developed for human face detection and tracking.

About 15 times faster than previous face detection approaches while achieving accuracy equivalent to the best published results.

Employs three techniques:
- Haar-like features
- Integral image
- AdaBoost learning algorithm

Issues for hand postures:
- Applicability
- Classification besides detection
- Selection of posture sets
- Calibration










3. Posture Recognition (cont’d)

Haar-like features:

The value of a Haar-like feature:

$$f(x) = \sum_{\text{black rectangle}} \text{pixel gray level} - \sum_{\text{white rectangle}} \text{pixel gray level}$$

Compared with raw pixels, Haar-like features reduce in-class variability and increase out-of-class variability, thus making classification easier.

Figure 1: The set of basic Haar-like features.
Figure 2: The set of extended Haar-like features.
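As a minimal sketch of the feature value above, the following Python computes a two-rectangle Haar-like feature by summing gray levels directly. The image layout (a list of rows), the function names and the choice of which half is black are assumptions for illustration, not the authors' implementation.

```python
from typing import List

# Gray-level image stored as a list of rows: img[y][x].
Image = List[List[int]]


def rect_sum(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of gray levels in the w x h rectangle whose top-left pixel is (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Hypothetical horizontal two-rectangle Haar-like feature (w must be even).

    Assumption: the left half is the white rectangle and the right half is
    the black rectangle, so f = Sum(black) - Sum(white).
    """
    if w % 2:
        raise ValueError("width must be even for a two-rectangle feature")
    half = w // 2
    white = rect_sum(img, x, y, half, h)
    black = rect_sum(img, x + half, y, half, h)
    return black - white


if __name__ == "__main__":
    # A dark region on the left next to a bright region on the right.
    edge = [[10, 10, 200, 200],
            [10, 10, 200, 200]]
    print(two_rect_feature(edge, 0, 0, 4, 2))  # 760
```

Summing pixels directly costs time proportional to the rectangle area; the integral image removes that cost.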

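Rectangle Haar-like features can be computed rapidly using an “integral image”.

The integral image at location (x, y) contains the sum of the pixel values above and to the left of (x, y), inclusive:

$$ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')$$

where $i(x', y')$ is the gray level of the original image at $(x', y')$.

The sum of pixel values within “D” can be computed by:

$$\text{Sum}(D) = P_1 + P_4 - P_2 - P_3$$

Here the integral image values at the four corner points are $P_1 = A$, $P_2 = A + B$, $P_3 = A + C$ and $P_4 = A + B + C + D$ (each letter standing for the pixel sum of that rectangle), so $P_1 + P_4 - P_2 - P_3 = D$.

[Figure: integral image value at P(x, y); rectangles A, B, C and D with corner points P1, P2, P3 and P4.]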
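The integral image and the four-lookup rectangle sum can be sketched in Python as below. Padding the table with a leading zero row and column (so the formula also works at the image border) and the function names are assumptions of this sketch. Whatever the rectangle size, each rectangle sum costs four array lookups.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Integral image padded with a leading row and column of zeros.

    ii[y + 1][x + 1] holds the sum of img[y'][x'] for all y' <= y and x' <= x,
    i.e. the pixels above and to the left of (x, y), inclusive.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of row y up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum over the w x h rectangle D with top-left pixel (x, y): P1 + P4 - P2 - P3."""
    p1 = ii[y][x]          # A
    p2 = ii[y][x + w]      # A + B
    p3 = ii[y + h][x]      # A + C
    p4 = ii[y + h][x + w]  # A + B + C + D
    return p1 + p4 - p2 - p3


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Hypothetical horizontal two-rectangle feature: right (black) half minus left (white) half."""
    half = w // 2
    return rect_sum(ii, x + half, y, half, h) - rect_sum(ii, x, y, half, h)


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    ii = integral_image(img)
    print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```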

3. Posture Recognition (cont’d)

To detect the hand, the image is scanned by a sub-window containing a Haar-like feature.

Based on each Haar-like feature $f_j$, a weak classifier $h_j(x)$ is defined as:

$$h_j(x) = \begin{cases} 1, & \text{if } p_j f_j(x) < p_j \theta_j \\ 0, & \text{otherwise} \end{cases}$$

where $x$ is a sub-window, $\theta_j$ is a threshold, and $p_j$ is a parity (±1) indicating the direction of the inequality sign.
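The two ideas on this slide, scanning the image with sub-windows at several positions and scales and thresholding a feature value with $h_j$, can be sketched as follows. Square windows, the base size, the scale factor and the step size are hypothetical defaults; the slides do not give these parameters.

```python
from typing import Iterator, Tuple


def sub_windows(img_w: int, img_h: int, base: int = 24,
                scale: float = 1.25, step: int = 4) -> Iterator[Tuple[int, int, int]]:
    """Yield square sub-windows (x, y, size), enlarging the size by `scale` each pass.

    base, scale and step are hypothetical values, not taken from the paper.
    """
    if scale <= 1.0:
        raise ValueError("scale must be > 1 so the window size grows")
    size = float(base)
    while int(size) <= min(img_w, img_h):
        s = int(size)
        for y in range(0, img_h - s + 1, step):
            for x in range(0, img_w - s + 1, step):
                yield x, y, s
        size *= scale


def weak_classifier(feature_value: float, theta: float, parity: int) -> int:
    """h_j(x) = 1 if p_j * f_j(x) < p_j * theta_j, else 0."""
    if parity not in (1, -1):
        raise ValueError("parity must be +1 or -1")
    return 1 if parity * feature_value < parity * theta else 0


if __name__ == "__main__":
    windows = list(sub_windows(320, 240))  # 320 x 240 webcam frame
    print(len(windows), windows[0])
    print(weak_classifier(15.0, 10.0, -1))  # parity -1 flips the test: 1
```

The parity lets a single threshold test express both "feature below threshold" and "feature above threshold".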

3. Posture Recognition (cont’d)

In machine vision:
- It is HARD to find a single accurate classification rule;
- It is EASY to find rules with classification accuracy slightly better than 50% (weak classifiers).

AdaBoost (Adaptive Boosting) is an iterative algorithm that improves accuracy stage by stage, based on a series of weak classifiers.

Adaptive: later classifiers are tuned in favor of the samples misclassified by previous classifiers.






3. Posture Recognition (cont’d)

AdaBoost starts with a uniform distribution of “weights” over the training examples. The weights tell the learning algorithm the importance of each example.
- Obtain a weak classifier $h_j(x)$ from the weak learning algorithm.
- Increase the weights of the training examples that were misclassified.
- (Repeat.)
- At the end, carefully form a weighted linear combination of the weak classifiers obtained at all iterations.
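The loop above can be sketched as discrete AdaBoost over decision stumps. The weight update follows Viola and Jones' paper (correctly classified examples are multiplied by β = ε/(1 - ε), and the final classifier is a vote weighted by α = log(1/β)), combined with the uniform initialization described on the slide. The sketch assumes each training sample is already a vector of Haar-like feature values with label 1 (hand posture) or 0 (background); it is not the authors' training code.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index j, threshold theta_j, parity p_j)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Weak classifier h_j: 1 if p_j * f_j(x) < p_j * theta_j, else 0."""
    j, theta, p = stump
    return 1 if p * x[j] < p * theta else 0


def best_stump(X: List[Sequence[float]], y: List[int],
               w: List[float]) -> Tuple[Stump, float]:
    """Exhaustive weak learner: the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        for theta in sorted({x[j] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, theta, p), xi) != yi)
                if err < best_err:
                    best, best_err = (j, theta, p), err
    return best, best_err


def adaboost(X: List[Sequence[float]], y: List[int],
             rounds: int) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n  # uniform initial weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize to a distribution
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break  # no weak classifier better than chance
        err = max(err, 1e-10)  # guard against division by zero
        beta = err / (1.0 - err)
        # Down-weight correctly classified examples; after renormalization
        # the misclassified ones carry relatively more weight.
        w = [wi * beta if stump_predict(stump, xi) == yi else wi
             for xi, yi, wi in zip(X, y, w)]
        ensemble.append((math.log(1.0 / beta), stump))
    return ensemble


def strong_classify(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Weighted vote: 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t)."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0.5 * sum(alpha for alpha, _ in ensemble) else 0
```

Multiplying the correct examples by β < 1 and renormalizing is equivalent to increasing the relative weight of the misclassified ones, as the slide describes.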

3. Posture Recognition (cont’d)

A series of classifiers is applied to every sub-window.
- The first classifier eliminates a large number of negative sub-windows and passes almost all positive sub-windows (at a high false positive rate) with very little processing.
- Subsequent layers eliminate additional negative sub-windows (those passed by the first classifier) but require more computation.
- After several stages of processing, the number of negative sub-windows has been reduced radically.
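Early rejection in the cascade can be sketched as follows. Each stage is modeled as a hypothetical callable that returns True when a sub-window should be passed on; in a real detector each stage would be a boosted strong classifier with its own threshold. The toy stages in the example are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]          # e.g. precomputed feature values of one sub-window
Stage = Callable[[Window], bool]  # True = "may be a hand", pass to the next stage


def run_cascade(stages: List[Stage],
                windows: List[Window]) -> Tuple[List[Window], int]:
    """Apply the stages in order to each window; reject at the first failing stage.

    Returns the accepted windows and the total number of stage evaluations,
    which shows how early rejection saves computation.
    """
    accepted: List[Window] = []
    evaluations = 0
    for win in windows:
        for stage in stages:
            evaluations += 1
            if not stage(win):
                break  # negative sub-window eliminated at this stage
        else:
            accepted.append(win)  # passed every stage
    return accepted, evaluations


if __name__ == "__main__":
    stages = [lambda w: w[0] > 0.2,          # cheap first test
              lambda w: w[1] > 0.5,
              lambda w: w[0] + w[1] > 1.2]   # most demanding test last
    windows = [(0.1, 0.9), (0.3, 0.4), (0.6, 0.7), (0.9, 0.9)]
    hits, cost = run_cascade(stages, windows)
    print(hits, cost)  # [(0.6, 0.7), (0.9, 0.9)] 9
```

Evaluating every stage on all four windows would cost 12 evaluations; rejecting early brings it down to 9, and the saving grows with the share of negative sub-windows.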


3. Posture Recognition (cont’d)

Four hand postures (“two-finger”, “palm”, “fist” and “little finger”) have been tested with the Viola & Jones algorithm.

Input device: a low-cost Logitech QuickCam webcam with a resolution of 320 × 240 at up to 15 frames per second.









3. Posture Recognition (cont’d)

Training sample collection:
- Negative samples: images that must not contain object representations. We collected 500 random images as negative samples.
- Positive samples: hand posture images collected from human hands or generated with a 3D hand model. For each posture, we collected around 450 positive samples. As an initial test, we used a white wall as the background.

3. Posture Recognition (cont’d)

After training with the AdaBoost learning algorithm, we obtain a cascade classifier for each hand posture once the required accuracy is achieved:
- “Two-finger” posture: 15-stage cascade classifier;
- “Palm” posture: 10-stage cascade classifier;
- “Fist” posture: 15-stage cascade classifier;
- “Little finger” posture: 14-stage cascade classifier.
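A small calculation illustrates how the stage count shapes a cascade's behavior. A window is accepted only if it passes every stage, so the overall detection rate D and false positive rate F are the products of the per-stage rates. With identical per-stage targets d and f over n stages, this gives D = d^n and F = f^n. The targets d = 0.995 and f = 0.5 below are assumptions, not values reported in the slides; only the stage counts come from the list above.

```python
from typing import Tuple


def cascade_rates(n_stages: int, stage_hit_rate: float,
                  stage_false_alarm_rate: float) -> Tuple[float, float]:
    """Overall detection rate D = d**n and false positive rate F = f**n,
    assuming every stage exactly meets the same per-stage targets d and f."""
    return stage_hit_rate ** n_stages, stage_false_alarm_rate ** n_stages


if __name__ == "__main__":
    # Stage counts from the slide; d = 0.995 and f = 0.5 are assumed targets.
    for name, n in [("Two-finger", 15), ("Palm", 10),
                    ("Fist", 15), ("Little finger", 14)]:
        D, F = cascade_rates(n, 0.995, 0.5)
        print(f"{name}: {n} stages, D = {D:.3f}, F = {F:.1e}")
```

Each extra stage lowers the false positive rate geometrically while costing only a small fraction of the detection rate.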


The performance of the trained classifiers on 100 test images:





3. Posture Recognition (cont’d)

To recognize these different hand postures, a parallel structure that includes all of the cascade classifiers is implemented.
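One way to read the parallel structure is that every posture's cascade examines the same sub-window and the posture whose cascade accepts it is reported. The sketch below assumes exactly that. The slides do not state how the authors resolve the case where several cascades (or none) accept, so returning None in those cases is an assumption, as are the toy cascades in the example.

```python
from typing import Callable, Dict, List, Optional, Sequence

Window = Sequence[float]
Stage = Callable[[Window], bool]


def cascade_accepts(stages: List[Stage], window: Window) -> bool:
    # all() stops at the first stage that rejects the window.
    return all(stage(window) for stage in stages)


def classify_posture(cascades: Dict[str, List[Stage]],
                     window: Window) -> Optional[str]:
    """Run every posture cascade on the same window (the parallel structure).

    Assumption: a single accepting cascade names the posture; zero or
    several accepting cascades yield None (no decision).
    """
    accepted = [name for name, stages in cascades.items()
                if cascade_accepts(stages, window)]
    return accepted[0] if len(accepted) == 1 else None


if __name__ == "__main__":
    # Hypothetical toy cascades over a 2-value window (not trained classifiers).
    cascades = {
        "palm": [lambda w: w[0] > 0.5, lambda w: w[1] > 0.5],
        "fist": [lambda w: w[0] <= 0.5, lambda w: w[1] > 0.5],
        "two-finger": [lambda w: w[1] <= 0.5],
        "little finger": [lambda w: w[1] <= 0.2],
    }
    for win in [(0.8, 0.9), (0.2, 0.9), (0.5, 0.4), (0.5, 0.1)]:
        print(win, classify_posture(cascades, win))
```

Because the cascades are independent, they could also be run concurrently on the same frame.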





3. Posture Recognition (cont’d)

The real-time performance of the posture recognition:





4. Gesture Recognition

As a gesture is a series of postures, grammar-based syntactic analysis is suitable for describing composite gestures in terms of postures, and thus enables the system to recognize gestures based on their representations.

For pattern recognition, a grammar G = (N, T, P, S) consists of:
- A finite set N of non-terminal symbols;
- A finite set T of terminal symbols that is disjoint from N;
- A finite set P of production rules;
- A distinguished symbol S ∈ N that is the start symbol.

Issues in modeling the structure of hand gestures:
- Choice of basic primitives;
- Choice of an appropriate grammar type (context-free, stochastic context-free, regular, HMM).
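To make the grammar concrete, here is a minimal sketch of G = (N, T, P, S) in Python, with posture labels as terminal symbols and gestures as non-terminals. The gesture names and production rules are hypothetical (the slides do not list the authors' grammar), and collapsing repeated per-frame labels into one symbol is an assumed preprocessing step. The recognizer handles context-free rules without left recursion.

```python
from functools import lru_cache
from itertools import groupby
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical grammar G = (N, T, P, S); terminals are posture labels.
N = {"S", "Grasp", "Release", "Wave"}
T = {"palm", "fist", "two_finger", "little_finger"}
P: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("Grasp",), ("Release",), ("Wave",)],
    "Grasp": [("palm", "fist")],
    "Release": [("fist", "palm")],
    "Wave": [("palm", "two_finger", "palm"),
             ("palm", "two_finger", "Wave")],  # right recursion: repeated waves
}
S = "S"
assert N.isdisjoint(T) and S in N


def to_symbols(frame_labels: Sequence[Optional[str]]) -> Tuple[str, ...]:
    """Drop frames with no posture (None), then merge consecutive repeats."""
    labels = [label for label in frame_labels if label is not None]
    return tuple(label for label, _ in groupby(labels))


def recognize(symbols: Tuple[str, ...]) -> Optional[str]:
    """Return the gesture (a single-symbol alternative of S) deriving the whole string."""

    @lru_cache(maxsize=None)
    def ends(sym: str, i: int) -> FrozenSet[int]:
        # All positions j such that `sym` derives symbols[i:j].
        if sym in T:
            ok = i < len(symbols) and symbols[i] == sym
            return frozenset({i + 1}) if ok else frozenset()
        result = set()
        for rhs in P[sym]:
            positions = {i}
            for part in rhs:
                positions = {j for p in positions for j in ends(part, p)}
            result |= positions
        return frozenset(result)

    for rhs in P[S]:
        if len(symbols) in ends(rhs[0], 0):
            return rhs[0]
    return None


if __name__ == "__main__":
    frames = ["palm", "palm", None, "palm", "fist", "fist"]
    print(recognize(to_symbols(frames)))  # Grasp
```

This toy grammar happens to be regular; a stochastic context-free grammar would additionally attach probabilities to the production rules.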







5. Conclusions

- The parallel cascade structure based on Haar-like features and the AdaBoost learning algorithm can achieve satisfactory real-time hand posture classification results;
- The experimental results show that the Viola and Jones algorithm is very robust to changes in scale, and has a certain degree of robustness against in-plane rotation (±15°) and out-of-plane rotation;
- The Viola and Jones algorithm also performs well under different illumination conditions, but poorly with different backgrounds;
- A two-level architecture that captures the hierarchical nature of gesture classification is proposed: the lower level focuses on posture recognition, while the higher level focuses on the description of composite gestures using grammar-based syntactic analysis.

Thank you