Distinguishing gestures from non-gesture human movements in uncontrolled environments





Larry Davis

Yaser Yacoob

University of Maryland


Behzad Kamgar-Parsi

Naval Research Laboratory



Abstract: We propose to develop an approach to gesture recognition in uncontrolled environments such that non-gesture movements will not be interpreted as legitimate gestures. Our research will address the problems of acquisition of 3D gestural models from multiperspective imagery of people performing natural gestures, compilation of these 3D models into a collection of linked view-based models that can be used to efficiently recognize gestures in real time, and integration of these view-based models into a recognition system that can tolerate large amounts of visual dynamic clutter (i.e., reject most of the motions it sees as non-gestural).



1. Background: The Navy of the future will rely on automated systems to perform a variety of tedious, repetitive, and hazardous tasks. For example, the future generation of carriers (CVX) will reduce the number of sailors from 5000 to 3000. These reductions in personnel will require the use of robots to do certain tasks currently performed by humans.


Interest in automatic visual gesture recognition has increased significantly in recent years. A primary objective is to enable humans to interact in a natural way with robots and computers. While significant progress has been made in visual gesture recognition, current approaches suffer from the following shortcomings:


1. They typically require exaggerated body movements to depict a gesture, rather than natural movements.

2. People must be stationary while gesturing. So, for example, if the gesture is being performed using the movements of the hands and arms, then the person could not be walking or turning while performing the gesture. Another way to express this shortcoming is that there can be no relative rigid-body motion between the person being viewed and the computer vision system during the execution of the gesture.

3. Existing systems employ a closed-world model, interpreting ALL human body movements as one of their known gestures. However, in natural environments the gesture recognition system must be able to reject most human movements as not representing gestures. These irrelevant body movements form a set of dynamic visual clutter that the recognition system must reject.

4. There has been little experimentation done to determine whether existing gesture recognition systems can generalize their recognition models across many users. It is typically the case that they are both trained and tested using the same gesturer, or are tested on only a small number of gesturers.

5. Finally, existing systems have been employed only in indoor environments, with cooperative lighting and large numbers of pixels across the subject being viewed. For Navy applications these systems must be taken outdoors and should operate over a broad range of apparent sizes of the gesturer.


The research we propose builds upon work done at the University of Maryland and the AI Center at NRL to create a real-time gesture recognition system that can recognize arm and body gestures, and allow the human gesturer freedom of movement while performing a gesture. Work at Maryland (described in Section 2) emphasizes representation and recognition of gestures (using recognition algorithms with rejection developed at NRL), and work at NRL (described in Section 3) emphasizes theory and application of pattern recognition algorithms with rejection classes (using the gesture models developed at Maryland for evaluation on real applications).


2. Recognizing gestures: During the past several years, investigators at Maryland have conducted an extensive program of research on the representation and recognition of human activity. The approach developed represents human movements by certain types of parametric motion trajectories, and recognizes gestures and movements through the application of dynamic time warping (or Hidden Markov Models) to those trajectories [7,8,9].
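To make the trajectory-matching step concrete, here is a minimal sketch of dynamic time warping over one-dimensional trajectories. It is an illustration only: the absolute-difference local cost, the toy data, and the function name are assumptions for the sketch, not the implementation used in [7,8,9].

```python
def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance with an
    absolute-difference local cost between trajectory samples."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]

# Toy trajectories: same shape, performed at different speeds.
model = [0.0, 0.5, 1.0, 0.5, 0.0]
observed = [0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 0.0]
print(dtw_distance(model, observed))  # 0.0: warping absorbs the timing change
```

The point of the warp is visible in the example: the two trajectories have different lengths and timing but identical shape, so their DTW distance is zero where a pointwise Euclidean comparison would be undefined or large.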


A model of a gesture is recovered by watching people perform the gesture from a given viewpoint. The sequence is analyzed by a motion estimation algorithm that models the person as a collection of linked planes, and recovers parametric motion trajectories of these planes constrained to agree at the joints between the links. These parametric motion models are then summarized via principal component analysis. A model is generated for each relevant viewpoint and for each gesture of interest.
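As an illustration of the summarization step, the following sketch applies principal component analysis, via the SVD, to a stack of synthetic trajectory parameter vectors. The data, dimensions, and variable names are made up for the example and do not reflect the actual motion parameters used at Maryland.

```python
# Each row is one performance of a gesture, flattened into a vector of
# motion parameters over time; the leading principal components capture
# the dominant modes of variation across performances.
import numpy as np

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, np.pi, 20))                     # prototypical trajectory
performances = base + 0.05 * rng.standard_normal((10, 20))   # 10 noisy repeats

mean = performances.mean(axis=0)
centered = performances - mean
# SVD of the centered data matrix; rows of Vt are the principal components.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2
basis = Vt[:k]                    # top-k components, shape (2, 20)
coeffs = centered @ basis.T       # low-dimensional gesture codes, shape (10, 2)
reconstruction = mean + coeffs @ basis

print(basis.shape, coeffs.shape)  # (2, 20) (10, 2)
```

A stored gesture model then need only keep the mean trajectory and the small basis; a new performance is compared in the low-dimensional coefficient space rather than against raw trajectories.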


Our earlier work on recognition then assumed that a fixed camera watches a person performing one of these gestures from an unknown viewing direction. The best gesture was chosen as the one that maximizes a tracking criterion, i.e., the one that allows us to best track the motion of the person during the execution of the gesture. In more recent work, we allow the camera to move while the gesture is being performed (and recognize the gesture through a decomposition of the observed motion field into a general camera motion and one of the learned gesture models), although we assume that the generic viewpoint from which the gesture is observed is not changed by the camera motion (an unnatural assumption). Our goals for the proposed research project are to:





• Acquire 3D models of gestures by watching people perform prototypical gestures from many calibrated viewpoints simultaneously. The facilities available at the University's Keck Laboratory for the Analysis of Visual Movement (see below) will support this activity.

• Compile these 3D models into a small collection of viewpoint-specific models. This will involve identifying subsets of viewpoints that can be used to smoothly interpolate (predict) what the gesture would look like from other, arbitrary, viewpoints.

• Develop recognition algorithms that can recognize a gesture even if the viewpoint from which the gesture is being observed changes during the execution of the gesture. This would, for example, allow the gesturer to turn while performing an arm gesture, which might be a natural motion if the gesturer is simultaneously trying to communicate a control command while surveying his surroundings.

• Recognize gestures as "rare" events during natural body movements. This will involve both being able to recognize the beginning and end of a gesture, as well as to reject the majority of the gesturer's movements as non-communicative. This will be done using technology developed at NRL and described in Section 3.

• Develop real-time versions of the gesture recognition algorithms so that they can be applied and evaluated on large databases of videos.
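The compilation goal, predicting a gesture's appearance from an arbitrary viewpoint using a few stored viewpoint-specific models, can be illustrated with a deliberately simple sketch. The linear blend between the two nearest stored views, the angle parameterization, and the toy trajectories below are assumptions for illustration, not the interpolation scheme the project would develop.

```python
# Stored viewpoint-specific models: viewing angle (degrees) mapped to a
# few trajectory samples as seen from that angle (illustrative values).
view_models = {
    0.0:  [0.0, 1.0, 0.0],
    90.0: [0.0, 0.5, 0.0],
}

def predict_view(angle, models):
    """Linearly blend the two stored views that bracket the query angle.
    Assumes the angle lies within the range of stored viewpoints."""
    angles = sorted(models)
    lo = max(a for a in angles if a <= angle)
    hi = min(a for a in angles if a >= angle)
    if lo == hi:
        return list(models[lo])
    w = (angle - lo) / (hi - lo)
    return [(1 - w) * x + w * y for x, y in zip(models[lo], models[hi])]

print(predict_view(45.0, view_models))  # halfway blend: [0.0, 0.75, 0.0]
```

Even this crude blend shows why only a small collection of stored views may suffice: intermediate viewpoints are predicted rather than stored.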


Our research will be conducted in the Keck Laboratory for the Analysis of Visual Movement at the University of Maryland. This Laboratory consists of an array of 64 digital progressive-scan cameras that can simultaneously acquire videos of people performing activities from many perspectives. This Laboratory gives us the ability to collect well-controlled images both for training and for testing our gesture recognition algorithms.



3. Recognition with rejection: In visual tasks, determining that an observation is not an item of interest (and thus needs to be ignored or rejected) has always been a difficult problem. Most popular object/pattern recognition classifiers, e.g., nearest neighbor classifiers, commonly used neural networks, statistical classifiers, etc., have no rejection mechanisms. While such approaches can be modified in ad hoc ways to reject patterns, it would be preferable to infer the boundaries of the classes of interest in a more principled way. There is ample evidence in the psychophysics literature that people are far more capable of deciding a correct match (for patterns/objects meaningful to humans) than techniques based on metric similarity measures, such as the Euclidean distance [2]. Our recent work indicates that it may be possible to employ human similarity measures in a recognition system [3].


Previously, under ONR sponsorship, we proposed an approach that can develop rejection capabilities for neural networks [4]. We isolate an acceptable class from the rest of the pattern space by creating barriers (made up of false look-alikes) along its boundaries. Since the acceptable class C is closely surrounded by patterns in Not-C, to obtain the correct decision boundary the network should ideally be trained on a very large number of acceptable and unacceptable instances of C; most of the exemplars in the training set should be borderline, i.e., they should project near the decision boundary, since exemplars projecting in the middle of a class do not contribute to the formation of the class boundary and thus do not carry useful information. Such training sets, however, are generally not available. What is often available are some exemplars belonging to class C. The challenge is then to expand a small set of exemplars to one which (a) is sufficiently large, and (b) is composed of borderline patterns, in particular, borderline counter-exemplars.
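The expansion of a small positive set into borderline counter-exemplars can be sketched schematically. The additive distortion along a random direction and the fixed Euclidean acceptance radius below are assumptions made for the sketch; they stand in for the learned distortion operators and class boundaries of [4].

```python
# Sketch: push distorted copies of a positive exemplar just past an
# acceptance radius, yielding automatically labeled "false look-alikes"
# that sit near the decision boundary.
import random

random.seed(1)

prototype = [0.0, 1.0, 0.0, -1.0]   # a toy positive exemplar of class C

def distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def make_borderline(proto, radius, step=0.06):
    """Distort proto along a random unit direction until it just leaves
    the acceptance region, giving a borderline counter-exemplar."""
    direction = [random.uniform(-1.0, 1.0) for _ in proto]
    norm = sum(d * d for d in direction) ** 0.5
    direction = [d / norm for d in direction]
    scale = step
    candidate = [p + scale * d for p, d in zip(proto, direction)]
    while distance(candidate, proto) <= radius:
        scale += step
        candidate = [p + scale * d for p, d in zip(proto, direction)]
    return candidate

radius = 0.5
negatives = [make_borderline(prototype, radius) for _ in range(100)]
# All generated counter-exemplars lie just beyond the class boundary.
print(all(radius < distance(n, prototype) < radius + 0.1 for n in negatives))
```

Because the negatives are generated rather than collected, the training set can be made as large and as tightly concentrated around the boundary as desired.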


Recently, the idea of including negative exemplars has also been suggested by Rowley, Baluja, and Kanade [6]. The technique in [6] is to collect negative examples, which can be tedious and time consuming, whereas in our approach an arbitrarily large number of them are automatically generated and automatically labeled. The accuracy of the network can thus be increased to the desired degree.


We propose to explore the possibility of extending our object recognition technique to the problems of recognizing specific body gestures, and distinguishing gestures from non-gestures. Here, a major issue will be the development of an appropriate distortion operator to generate realistic variations of a given gesture, as well as its false look-alikes. Previously we have developed and utilized two very different distortion operators: a random deformation operator for the recognition of (distant) aircraft [3], and an image morphing operator for distinguishing one face from other faces [5]. We have achieved excellent results in aircraft recognition, far better than one would get with a Euclidean classifier. A Euclidean similarity measure with the optimum threshold (which is generally unavailable) results in 10% false accept and 3% true reject [3]. These errors are generally not tolerable for an autonomous system.
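For contrast, the Euclidean-threshold baseline criticized above can be sketched as a nearest-prototype classifier that rejects anything farther than a fixed threshold from every prototype. The prototypes, feature vectors, and threshold value here are invented for illustration; they are not the gesture models of Section 2.

```python
def classify_with_rejection(x, prototypes, threshold):
    """Return the label of the nearest prototype, or None to reject.
    Rejection happens when even the best match exceeds the threshold."""
    best_label, best_dist = None, float("inf")
    for label, proto in prototypes.items():
        d = sum((a - b) ** 2 for a, b in zip(x, proto)) ** 0.5
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else None

# Toy 2-D "gesture codes" standing in for real feature vectors.
prototypes = {"wave": [1.0, 0.0], "halt": [0.0, 1.0]}
print(classify_with_rejection([0.9, 0.1], prototypes, 0.5))  # wave
print(classify_with_rejection([5.0, 5.0], prototypes, 0.5))  # None (rejected)
```

The weakness named in the text is visible in the single `threshold` parameter: the error rates hinge entirely on choosing it optimally, and that optimum is generally unavailable in advance.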


Furthermore, under an ONR grant, we are currently developing such operators for different ATR domains. Our experience should be useful in developing appropriate operator(s) for gestures. Once such operator(s) are developed, our approach would allow us to isolate each legitimate gesture by creating a large number of positive and negative exemplars projecting specifically on the two sides of the classification boundary. We will develop a system for recognizing a set of well-defined gestures, such as those in the Army and Marine Manual, and rejecting other movements as meaningless, using models for representing and recognizing those gestures developed at the University of Maryland (Section 2). An outcome of the proposed research is expected to be a fundamental understanding of the limitations of gestural interfaces with robots.




4. Conclusion: This work will make it possible for humans to work and interact with robots in uncontrolled environments, particularly in places where voice communication with the robot is not possible. Examples of these uncontrolled environments include the deck of a carrier, where excessive noise makes voice communication impossible, or covert missions, where maintaining silence is crucial.



Proposed Budget:

We envision a three-year effort with funding of $350K per year:

a) UMD ($150K)
   Larry Davis (1 month)
   Yaser Yacoob (50%)
   Graduate Research Assistant

b) NRL ($175K)
   Behzad Kamgar-Parsi
   Programming/technical support



References:

1. J. Triesch and C. von der Malsburg, "A gesture interface for human-robot interaction," Proc. Int. Conf. Automatic Face and Gesture Recognition, Nara, Japan, April 1998, pp. 546-551.

2. W.R. Uttal, T. Baruch, and L. Allen, "The effects of combinations of image degradations in a discrimination task," Perception & Psychophysics, Vol. 57, No. 5, pp. 668-681, 1995.

3. B. Kamgar-Parsi, B. Kamgar-Parsi, and A.K. Jain, "Automatic aircraft recognition: Toward using human similarity measure in a recognition system," CVPR '99, accepted for presentation, June 1999.

4. B. Kamgar-Parsi and B. Kamgar-Parsi, "Rejection with multilayer neural networks: automatic generation of the training set," Proc. World Congress on Neural Networks (WCNN'95), Washington, DC, Vol. 2, pp. 174-177, 1995.

5. B. Kamgar-Parsi, C. Chandler, B. Kamgar-Parsi, J.E. Dayhoff, and A.K. Jain, "Face recognition for access control," in progress, 1999.

6. H. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. PAMI, Vol. 20, pp. 23-38, 1998.

7. Y. Yacoob and L.S. Davis, "Temporal multi-scale models for flow and acceleration," Int. Journal of Computer Vision, to appear.

8. Y. Yacoob and M.J. Black, "Parameterized modeling and recognition of activities," Journal of Computer Vision and Image Understanding, 73(2), 1999, pp. 232-247.

9. M.J. Black and Y. Yacoob, "Recognizing facial expressions in image sequences using local parameterized models of image motion," Int. Journal of Computer Vision, 25(1), 1997, pp. 23-48.