SOMM: Self Organizing Markov Map for Gesture Recognition
Pattern Recognition, 2010 Spring
Seung-Hyun Lee
G. Caridakis et al., Pattern Recognition, Vol. 31, pp. 52-59, 2010.
SOFT COMPUTING @ YONSEI UNIV. KOREA
Contents
• Introduction
• Related Work
  – Hidden Markov Models
  – Other Method
• Proposed Method
• Experiments
• Conclusion
Introduction
• Gesture
  – A motion of the body that conveys information
• In this paper
  – Focus on hand gestures
Introduction
• Taxonomy of gesture (McNeill, 1992)
  – Gesticulation
  – Speech-linked
  – Pantomime
  – Emblems
  – Sign Languages
• Other (Kendon, 1992) (Quek, 1994)
Introduction
• Taxonomy by functionality

Gestures             Definition
Symbolic gestures    Gestures that, within each culture, have come to have a single meaning.
Deictic gestures     The type of gesture most generally seen in HCI: gestures of pointing to entities or directions.
Iconic gestures      Gestures used to convey information about the size, spatial relations, actions, shape or orientation of the object of discourse.
Pantomimic gestures  Gestures typically used to mimic an action, object or concept.
Related Work: Hidden Markov Model
• Cogan (2006)
  – Discrete HMM that fuses hand shape and position
• Hossain (2005)
  – HMM encoding implicit/explicit temporal information
  – Discriminates attention and non-attention gestures
• Mantyla (2000)
  – On mobile devices
  – Used a combined SOM and HMM method
• Starner (1998)
  – HMM-based American Sign Language (ASL) recognition
  – Sentence-level recognition is possible
Related Work: Other Method
• Black and Jepson (1998)
  – CONDitional dENSity propagATION (CONDENSATION) algorithm
• Wong and Cipolla (2006)
  – Sparse Bayesian classifier
• Hong et al. (2000)
  – Finite State Machines (FSM)
• Su (2000)
  – Fuzzy logic and rule-based approaches, and hyper-rectangular composite neural networks (HRCNNs)
• Juang and Ku (2005)
  – Fuzzified Takagi-Sugeno-Kang (TSK) type recurrent network
• Yang et al. (2002)
  – Time Delay Neural Network
• Huang and Huang (1998)
  – 3D Hopfield Neural Network
Proposed Method: Overview
• Modules
  – Image processing: detection and tracking of hands
  – SOM: quantization of hand location and direction
  – HMM: transition probability matrix
Proposed Method: Feature Extraction
• Video based method
  – Creation of moving skin masks (skin color area)
  – Tracking the centroid of the skin masks
  – Prior knowledge is required
    • It should indicate different body parts (left hand, right hand, and head)
• Environment
  – PC platform
  – OpenCV
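The skin-mask and centroid steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the RGB threshold rule in `is_skin` is a hypothetical placeholder (the slides do not give the skin-color model), and the original work used OpenCV rather than hand-written loops.

```python
def is_skin(pixel):
    """Hypothetical skin test on an (R, G, B) pixel; thresholds are illustrative only."""
    r, g, b = pixel
    return r > 95 and g > 40 and b > 20 and r > g and r > b and (r - min(g, b)) > 15


def skin_mask(frame):
    """Return a binary mask (rows of 0/1) marking pixels that pass the skin test."""
    return [[1 if is_skin(p) else 0 for p in row] for row in frame]


def centroid(mask):
    """Centroid (x, y) of the non-zero mask pixels, or None if the mask is empty."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


# Toy 4x4 frame: a 2x2 skin-colored patch on a dark background.
SKIN, BACKGROUND = (200, 120, 90), (10, 10, 10)
frame = [[BACKGROUND] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = SKIN

hand_position = centroid(skin_mask(frame))  # one tracked point per frame
```

Tracking then amounts to computing one such centroid per frame and per body part. Telling the left hand, right hand, and head apart is the prior knowledge the slide refers to, and it is not modeled in this sketch.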
Proposed Method
• Dataset
• Gesture instances
Proposed Method: Position Model
• cf.) SOM
  (1) Continuous input space
  (2) Discrete output space in the form of a lattice
  (3) Time-varying neighborhood function defined around the winning neuron
  (4) Decreasing learning rate parameter
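The four SOM properties above can be illustrated with a minimal training loop. This is a generic SOM sketch and not the configuration used in the paper: the lattice size, the linear decay schedules, the Gaussian neighborhood, and all parameter names are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Lattice coordinate of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on continuous input vectors (property 1)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Discrete output space: one weight vector per lattice node (property 2).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decreasing learning rate (property 4)
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood (property 3)
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # neighborhood around the winner
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Normalized hand positions from two regions of the image (made-up values).
positions = [(0.10, 0.10), (0.15, 0.05), (0.90, 0.90), (0.85, 0.95)]
som = train_som(positions, rows=2, cols=2, epochs=20, seed=1)
```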
Proposed Method: Position Model
• SOM-based representation of hand position
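To make the SOM-based representation concrete, the sketch below maps a hand trajectory onto a sequence of SOM unit indices. The four-unit `codebook` and the trajectory are invented values, and merging consecutive repeats of the same unit is an assumption of this example rather than something stated on the slide.

```python
def nearest_unit(codebook, point):
    """Index of the codebook vector (SOM unit) closest to the given point."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - p) ** 2 for c, p in zip(codebook[i], point)))


def quantize_trajectory(codebook, trajectory, merge_repeats=True):
    """Replace each hand position by its nearest SOM unit index."""
    units = [nearest_unit(codebook, p) for p in trajectory]
    if not merge_repeats:
        return units
    merged = []
    for u in units:
        # Keep a unit only when the hand moves to a different node.
        if not merged or merged[-1] != u:
            merged.append(u)
    return merged


# Hypothetical trained units at the corners of the normalized image plane.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
trajectory = [(0.1, 0.0), (0.2, 0.1), (0.8, 0.1), (0.9, 0.9), (1.0, 1.0)]

symbols = quantize_trajectory(codebook, trajectory)
```

Nearby positions such as (0.1, 0.0) and (0.2, 0.1) fall on the same unit, so small tracking jitter does not change the symbol sequence.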
Proposed Method: Direction Model
• Additional information: moving direction
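One simple way to turn the moving direction into discrete symbols is to quantize the angle between consecutive centroids. The sketch below assumes 8 equal angular bins and skips frames with no movement. Both choices are illustrative, and the paper may derive direction differently (for example from optical flow).

```python
import math


def direction_codes(centroids, n_bins=8):
    """Quantize the movement between consecutive centroids into n_bins direction codes.

    Code 0 is centered on the +x axis and codes increase with the atan2 angle.
    In image coordinates (y pointing down) this runs clockwise on screen.
    """
    bin_width = 2 * math.pi / n_bins
    codes = []
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # no movement, so no direction symbol
        angle = math.atan2(dy, dx) % (2 * math.pi)
        codes.append(int((angle + bin_width / 2) // bin_width) % n_bins)
    return codes


# A square path: right, +y, left, -y.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = direction_codes(square)
```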
Proposed Method: Generalized Median
• Based on Levenshtein distance (edit distance)
  – Measuring the amount of difference between two sequences
• Generalized median Mj of the data set
• Mean Levenshtein distance between members
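The quantities named above can be sketched as follows. `levenshtein` is the standard dynamic-programming edit distance. `set_median` is a simplification: it searches only among the given sequences, whereas a true generalized median may be a sequence outside the set. The example sequences are invented unit-index strings.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def set_median(sequences):
    """Member with the smallest summed distance to all members (search restricted to the set)."""
    return min(sequences, key=lambda s: sum(levenshtein(s, t) for t in sequences))


def mean_distance(sequences):
    """Mean Levenshtein distance over all unordered pairs of members."""
    n = len(sequences)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(levenshtein(sequences[i], sequences[j]) for i, j in pairs) / len(pairs)


# Hypothetical SOM unit sequences from four repetitions of one gesture.
instances = [[0, 1, 3], [0, 1, 2, 3], [0, 3], [0, 1, 3, 3]]
median = set_median(instances)
spread = mean_distance(instances)
```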
Proposed Method: Gesture Decoding
• Position
  – Probability
  – Calculation of S_som
    • First state: initial probability
    • From the second state: transition probability
  – Unit u
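The scoring scheme outlined above (initial probability for the first state, transition probabilities afterwards) corresponds to a first-order Markov chain over SOM units. The sketch below estimates both tables from training sequences and scores a new sequence in log space. The add-alpha smoothing and the function names are assumptions of this example, and the exact formula for S_som is not reproduced on the slide.

```python
import math


def fit_markov(sequences, n_units, alpha=1.0):
    """Estimate initial and transition probabilities from non-empty unit sequences.

    alpha is an add-alpha smoothing constant (an assumption; it avoids log(0)).
    """
    init = [alpha] * n_units
    trans = [[alpha] * n_units for _ in range(n_units)]
    for seq in sequences:
        init[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    init_total = sum(init)
    init = [c / init_total for c in init]
    trans = [[c / sum(row) for c in row] for row in trans]
    return init, trans


def log_score(seq, init, trans):
    """Log-probability of a sequence: initial term plus one term per transition."""
    score = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        score += math.log(trans[a][b])
    return score


# Hypothetical training sequences for one gesture class over 4 SOM units.
training = [[0, 1, 3], [0, 1, 3], [0, 1, 2, 3]]
init, trans = fit_markov(training, n_units=4)

matching = log_score([0, 1, 3], init, trans)
reversed_path = log_score([3, 1, 0], init, trans)
```

A sequence that follows the training paths scores higher than the reversed one, which is the behavior a per-gesture model needs for classification.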
Proposed Method: Gesture Decoding
• Direction
  – Probability
  – Calculation of S_of
  – Unit u
Proposed Method: Gesture Decoding
• Similarity measurement
  – Problem
    • Shorter gesture instances tend to gain an advantage because they have fewer transitions and thus fewer probability multiplications
  – Measurement
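The length bias described above is easy to reproduce numerically. In the sketch below the raw product of step probabilities favors a short gesture even though every step of the long gesture is more probable. The geometric mean is shown as one possible correction. It is an assumption of this example and not the measurement used in the paper, which the slide does not reproduce.

```python
import math


def raw_probability(step_probs):
    """Plain product of per-step probabilities (biased toward short sequences)."""
    return math.prod(step_probs)


def normalized_score(step_probs):
    """Geometric mean of the step probabilities, i.e. exp of the mean log-probability."""
    return math.exp(sum(math.log(p) for p in step_probs) / len(step_probs))


short_gesture = [0.6, 0.6]  # 2 steps, each moderately likely
long_gesture = [0.8] * 6    # 6 steps, each more likely than any short-gesture step

raw_short, raw_long = raw_probability(short_gesture), raw_probability(long_gesture)
norm_short, norm_long = normalized_score(short_gesture), normalized_score(long_gesture)
```

Here the raw product ranks the short gesture first (0.36 against roughly 0.26), while the per-step normalization reverses the ranking.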
Proposed Method: Error Propagation
• Error definition for function f
• SOM based approach
  – If data containing a small error is mapped to the same node of the SOM
    • No problem
  – Otherwise
    • Consequently, because of the neighboring relation of u, errors are not propagated to the next steps of the recognition process
Experiment: Data Set
• 30 gestures, 10 repetitions each
Experiment: Result
• SOM clustering
  – Blue: close to input vector
  – Red: not close
• Recognition accuracy
  – Test with training data: 100%
  – 10-fold cross validation: 93%
  – 0.843 ms for decoding a gesture
  – Only HMM-based classifier: 86.36%
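For reference, a 10-fold cross-validation split over a data set of 30 gestures with 10 repetitions each (300 instances) could be organized as below. Holding out exactly one repetition of every gesture per fold is an assumption of this sketch. The slides do not say how the folds were formed.

```python
def ten_fold_splits(n_gestures=30, n_repetitions=10):
    """Yield (train, test) lists of (gesture_id, repetition_id) pairs, one per fold.

    Fold k holds out repetition k of every gesture, so each fold tests
    30 instances and trains on the remaining 270.
    """
    samples = [(g, r) for g in range(n_gestures) for r in range(n_repetitions)]
    for fold in range(n_repetitions):
        test = [s for s in samples if s[1] == fold]
        train = [s for s in samples if s[1] != fold]
        yield train, test


folds = list(ten_fold_splits())
fold_sizes = [(len(train), len(test)) for train, test in folds]
```

The reported accuracy would then be the average over the ten held-out test sets.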
Conclusion
• Key features
  – SOM and HMM based automatic recognition architecture
  – ROI
    • Relative hand position
    • Moving direction
    • Similarity of pattern
• Application
  – Sign language
  – Gaming environment
Thank you