Combined Gesture-Speech Analysis and Synthesis


The SIMILAR NoE Summer
Workshop 2005


M. Emre Sargın, Engin Erzin, Yücel Yemez, A. Murat Tekalp

{msargin,eerzin,yyemez,mtekalp}@ku.edu.tr


Multimedia Vision and Graphics Laboratory, Koc University


Outline


Project Objective


Technical Description


Preparation of Gesture-Speech Database


Detection of Gesture Elements


Gesture-Speech Correlation Analysis


Synthesis of Gestures Accompanying Speech


Resources


Work Plan


Team Members


Project Objective


The production of speech and gesture is interactive
throughout the entire communication process.


Computer-human interaction systems should be interactive in
the same way: in an edutainment application, for example, an
animated person's speech should be aided and complemented by
its gestures.


Two main goals of this project:


Analysis and modeling of the correlation between speech and
gestures.


Synthesis of correlated natural gestures accompanying
speech.


Technical Description


Preparation of Gesture-Speech Database


Detection of Gesture Elements


Gesture-Speech Correlation Analysis


Synthesis of Gestures Accompanying Speech


Preparation of Database


Gestures of a specific person will be investigated.


The video database for that person should include the
gestures that he or she uses frequently.


The locations of the head, arms, elbows, etc. should be easy
to detect and track.


Detection of Gesture Elements


In this project, we consider arm and head gestures.


Main tasks included in detection of gesture elements:


Tracking of head region.


Tracking of hand and possibly shoulder and elbow.


Extraction of gesture features.


Recognition and labeling of gestures.



Head Region Tracking


To extract the motion information coming from the head, one
must first extract the head region.


An exhaustive search for the head in every frame is a
possible solution, but it is computationally inefficient.


Tracking is more efficient in terms of computational
complexity.


The motion information calculated for tracking will also be
used as head gesture features.


Tracking Methodology


Exhaustive search for head region in initial frame


Haar-Based Face Detection


Skin Color information


Extraction of motion information from head region


Optical flow vectors


Fitting global motion parameters to the optical flow vectors


Warp search window according to motion information.


Search for head region in the search window.
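
The fitting and warping steps above can be sketched as follows. The slides do not name the global motion model, so this sketch assumes a six-parameter affine model fitted by least squares; the function names and the synthetic flow vectors are hypothetical, and in practice the flow would come from an optical flow estimator.

```python
def solve3(m, b):
    """Solve a 3x3 linear system m x = b by Gauss-Jordan elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine_motion(points, flows):
    """Least-squares fit of u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y."""
    gram = [[0.0] * 3 for _ in range(3)]
    rhs_u = [0.0] * 3
    rhs_v = [0.0] * 3
    for (x, y), (u, v) in zip(points, flows):
        basis = (1.0, x, y)
        for i in range(3):
            rhs_u[i] += basis[i] * u
            rhs_v[i] += basis[i] * v
            for j in range(3):
                gram[i][j] += basis[i] * basis[j]
    return solve3(gram, rhs_u), solve3(gram, rhs_v)


def warp_window(corners, params_u, params_v):
    """Move each window corner by the flow the fitted model predicts there."""
    warped = []
    for x, y in corners:
        u = params_u[0] + params_u[1] * x + params_u[2] * y
        v = params_v[0] + params_v[1] * x + params_v[2] * y
        warped.append((x + u, y + v))
    return warped


# Synthetic flow (assumption): a pure translation of (2, -1) pixels
# observed at four points inside the head region.
points = [(10.0, 10.0), (20.0, 10.0), (10.0, 30.0), (25.0, 35.0)]
flows = [(2.0, -1.0)] * 4
params_u, params_v = fit_affine_motion(points, flows)
window = warp_window([(5.0, 5.0), (30.0, 5.0), (30.0, 40.0), (5.0, 40.0)],
                     params_u, params_v)
```

The six fitted parameters could also serve as the per-frame head gesture features, since the motion information is reused for that purpose.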


Head Tracking Results


Hand Tracking Methodology


Hand region will be extracted using skin color
information.


Robust State-Space Tracking will be applied.


The observations are the position of the hand.


The states are the position, speed and acceleration of the hand.


Kalman filtering removes unwanted noise from the features.


In the regular Kalman filter, the parameters are fixed.


In the Robust Kalman Filter, the parameters are re-adjusted at
each iteration to minimize the MSE and to overcome the effects
of abrupt changes in the motion of the hand.
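
A minimal sketch of the regular (fixed-parameter) filter described above, for one hand coordinate. It assumes a constant-acceleration state model; the noise variances, the initial covariance and the test trajectory are illustrative assumptions. The per-iteration parameter re-adjustment of the robust variant is not specified in these slides and is not reproduced here.

```python
def kalman_track(measurements, dt=1.0, meas_var=1.0, accel_var=1e-3):
    """Track one hand coordinate with a constant-acceleration Kalman filter.

    State: [position, velocity, acceleration]; observation: position only.
    """
    f = [[1.0, dt, 0.5 * dt * dt],
         [0.0, 1.0, dt],
         [0.0, 0.0, 1.0]]
    x = [measurements[0], 0.0, 0.0]
    # Large initial covariance (assumption): the starting state is uncertain.
    p = [[100.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    estimates = []
    for z in measurements:
        # Predict: x <- F x, P <- F P F^T + Q (Q only excites acceleration).
        x = [sum(f[i][k] * x[k] for k in range(3)) for i in range(3)]
        fp = [[sum(f[i][k] * p[k][j] for k in range(3)) for j in range(3)]
              for i in range(3)]
        p = [[sum(fp[i][k] * f[j][k] for k in range(3)) for j in range(3)]
             for i in range(3)]
        p[2][2] += accel_var
        # Update with H = [1, 0, 0]: the innovation variance is a scalar.
        innovation = z - x[0]
        s = p[0][0] + meas_var
        gain = [p[i][0] / s for i in range(3)]
        x = [x[i] + gain[i] * innovation for i in range(3)]
        first_row = p[0][:]
        p = [[p[i][j] - gain[i] * first_row[j] for j in range(3)]
             for i in range(3)]
        estimates.append(tuple(x))
    return estimates


# Illustrative data: a hand moving 2 pixels per frame, with alternating
# measurement noise of +/- 0.5 pixel.
measurements = [2.0 * k + (0.5 if k % 2 == 0 else -0.5) for k in range(50)]
final_position, final_velocity, final_acceleration = kalman_track(measurements)[-1]
```

Because only the position is observed, the gain reduces to the first column of the covariance divided by a scalar, which keeps the update free of matrix inversion.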


Extraction of Gesture Features


Head Gesture Features: the global motion parameters
calculated within the head region will be used.


Hand Gesture Features: the hand center-of-mass position and
its calculated velocity will form the hand gesture features.
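
The hand features could be computed along the following lines. This is a sketch under stated assumptions: the skin test uses fixed chrominance bounds in YCbCr (a common choice, not one given in these slides), the frame rate of 25 fps is assumed, and the velocity is a simple backward difference rather than the tracker's state estimate.

```python
def is_skin(r, g, b):
    """Classify an RGB pixel with fixed chrominance bounds (assumed values)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77.0 <= cb <= 127.0 and 133.0 <= cr <= 173.0


def hand_centroid(frame):
    """Return the (x, y) center of mass of skin pixels, or None if none exist."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if is_skin(r, g, b):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def hand_features(frames, fps=25.0):
    """Per-frame features (x, y, vx, vy); None where no skin was found."""
    features = []
    previous = None
    for frame in frames:
        center = hand_centroid(frame)
        if center is None:
            features.append(None)
        elif previous is None:
            features.append(center + (0.0, 0.0))
        else:
            features.append(center + ((center[0] - previous[0]) * fps,
                                      (center[1] - previous[1]) * fps))
        previous = center
    return features


SKIN, BACKGROUND = (224, 172, 150), (0, 0, 255)


def make_frame(left):
    """4x4 test frame with a 2x2 skin-colored block starting at column `left`."""
    return [[SKIN if 1 <= y <= 2 and left <= x <= left + 1 else BACKGROUND
             for x in range(4)] for y in range(4)]


# The block moves one pixel to the right between the two frames.
features = hand_features([make_frame(0), make_frame(1)])
```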


Gesture-Speech Correlation Analysis


Recognized gestures are labeled w.r.t. time.


Head Gestures: Down, Up, Left, Right, Left-Right, …


Arm Gestures: Abduction, Adduction, Extension, …




Recognized speech patterns are labeled w.r.t. time.


Semantic Info: Approval, Refusal phrases, etc.


Prosodic Info: Intonational phrases, ToBI transcriptions, etc.


Correlation analysis by examining:


Co-occurrence Matrix


Input/Output Hidden Markov Models



Co-occurrence Matrix


Estimation of joint probability distribution function, f(g,s)


For each time sample, cast a vote for the corresponding
gesture-speech label pair.


For a specific speech element the most correlated
gesture feature will be:


$g_i = \arg\max_{x} f(g_x, s_i)$


Relatively easy to compute.


Gives an intuition about what we are examining.
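
The voting scheme above can be sketched as follows, assuming the gesture and speech labels are already aligned sample by sample. The label values in the example are hypothetical and only illustrate the mechanics.

```python
from collections import Counter


def cooccurrence(gesture_labels, speech_labels):
    """Estimate the joint distribution f(g, s) from time-aligned label sequences."""
    if len(gesture_labels) != len(speech_labels):
        raise ValueError("label sequences must cover the same time samples")
    # One vote per time sample for the (gesture, speech) pair seen there.
    votes = Counter(zip(gesture_labels, speech_labels))
    total = sum(votes.values())
    return {pair: count / total for pair, count in votes.items()}


def most_correlated_gesture(joint, speech_label):
    """Return the g maximizing f(g, speech_label), or None if s never occurred."""
    candidates = {g: p for (g, s), p in joint.items() if s == speech_label}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical per-sample labels, for illustration only.
gestures = ["down", "down", "left-right", "down", "left-right", "up"]
speech = ["approval", "approval", "refusal", "approval", "refusal", "refusal"]
joint = cooccurrence(gestures, speech)
best_for_refusal = most_correlated_gesture(joint, "refusal")
```

Storing only the observed pairs keeps the table sparse; unseen pairs implicitly have probability zero.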




Input/Output Hidden Markov Models


An IOHMM is a graphical model that maps input sequences to
output sequences.


It is used in three tasks of sequence processing:


Prediction


Regression


Classification


The model is trained to maximize the conditional
distribution of an output sequence $\{y_1, \dots, y_t\}$ given an
input sequence $\{x_1, \dots, x_t\}$.


In our project:


Input sequence will be speech labels.


Output sequence will be gesture labels.
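
To make the training objective concrete, the sketch below evaluates the conditional probability of a gesture-label sequence given a speech-label sequence for a discrete IOHMM, using the forward recursion. The two-state model and all probability values are hypothetical, chosen only so the example runs; the parameterization (transitions and emissions both conditioned on the current input) is one common formulation and is assumed here.

```python
from itertools import product


def iohmm_likelihood(inputs, outputs, initial, transition, emission):
    """P(outputs | inputs) for a discrete IOHMM via the forward recursion.

    initial[j]          = P(q_1 = j)
    transition[x][i][j] = P(q_t = j | q_{t-1} = i, x_t = x)
    emission[x][j][y]   = P(y_t = y | q_t = j, x_t = x)
    """
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("need non-empty sequences of equal length")
    n_states = len(initial)
    alpha = [initial[j] * emission[inputs[0]][j][outputs[0]]
             for j in range(n_states)]
    for x, y in zip(inputs[1:], outputs[1:]):
        alpha = [sum(alpha[i] * transition[x][i][j] for i in range(n_states))
                 * emission[x][j][y]
                 for j in range(n_states)]
    return sum(alpha)


# Hypothetical two-state model: inputs are speech labels, outputs are
# head gesture labels.
initial = [0.6, 0.4]
transition = {
    "approval": [[0.9, 0.1], [0.5, 0.5]],
    "refusal": [[0.3, 0.7], [0.2, 0.8]],
}
emission = {
    "approval": [{"down": 0.8, "left-right": 0.2},
                 {"down": 0.4, "left-right": 0.6}],
    "refusal": [{"down": 0.3, "left-right": 0.7},
                {"down": 0.1, "left-right": 0.9}],
}

speech_labels = ["approval", "approval", "refusal"]
likelihood = iohmm_likelihood(speech_labels, ["down", "down", "left-right"],
                              initial, transition, emission)

# Sanity check: for a fixed input sequence, the probabilities of all
# possible output sequences sum to one.
total = sum(iohmm_likelihood(speech_labels, list(ys),
                             initial, transition, emission)
            for ys in product(["down", "left-right"], repeat=3))
```

Training would adjust the transition and emission tables to raise this quantity over the labeled database; that step is outside the scope of this sketch.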


Synthesis of Gestures Accompanying Speech


Based on the methodology used in the correlation analysis,
given a speech signal:


Features will be extracted.


The most probable speech label will be assigned to each
speech pattern.


The gesture pattern most correlated with the speech pattern
will be used to animate a stick model of a person.
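
The last step can be sketched as a lookup followed by run-length grouping, so that the animation receives gesture segments rather than per-frame labels. The speech-to-gesture table, the label names and the neutral "rest" pose used for unmapped labels are all hypothetical; in the project the table would come from the correlation analysis.

```python
from itertools import groupby


def gesture_timeline(speech_labels, speech_to_gesture, default="rest"):
    """Map per-frame speech labels to segments (start_frame, end_frame, gesture).

    end_frame is exclusive; labels missing from the table map to `default`.
    """
    gestures = [speech_to_gesture.get(label, default) for label in speech_labels]
    segments = []
    frame = 0
    for gesture, run in groupby(gestures):
        length = len(list(run))
        segments.append((frame, frame + length, gesture))
        frame += length
    return segments


# Hypothetical table and label sequence, for illustration only.
table = {"approval": "down", "refusal": "left-right"}
labels = ["approval"] * 3 + ["silence"] * 2 + ["refusal"] * 5
segments = gesture_timeline(labels, table)
```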


Resources


Database Preparation and Labeling:


VirtualDub


Anvil


Praat


Image Processing and Feature Extraction:


Matlab Image Processing Toolbox


OpenCV Image Processing Library


Gesture-Speech Correlation Analysis:


HTK HMM Toolbox


Torch Machine Learning Library


Work Plan


Timeline of the project:





Schedule of the lectures:


Team Members


Ferda Ofli


Koc University


Image, Video Processing and Feature Extraction


Yelena Yasinnik


Massachusetts Institute of Technology


Audio-Visual Correlation Analysis


Oya Aran


Bogazici University


Gesture-Based Human-Computer Interaction Systems


Team Members (continued)


Alexey Anatolievich Karpov


Saint-Petersburg Institute for Informatics and Automation


Speech-Based Human-Computer Interaction Systems


Stephen Wilson


University College Dublin


Audio-Visual Gesture Annotation


Alexander Refsum Jensenius


Department of Music, Oslo University


Gesture Analysis

