Statistical Methods for Human Behaviour Recognition

Nov 7, 2013


Geoff West & Svetha Venkatesh

This paper describes a number of projects that can broadly be classed as statistical methods for human behaviour recognition. This has been a major research effort, with applications in human gait analysis, surveillance, video indexing and smart homes. The statistical methods are those based on hidden Markov models (HMMs) because, in all cases, we are dealing with sequences of patterns and signals that vary both in duration and in the features used. This variability arises because we are dealing with humans, whose behaviour is inherently variable.

Much of the work is based in a laboratory environment with a number of ceiling-mounted cameras attached to networked PCs, so that many video streams can be captured simultaneously. We use background subtraction (Stauffer et al), bounding box and blob tracking, and bounding box and blob statistics and features to describe the motion of people in the environment. The cameras are calibrated so that positions on the floor are mapped between the various cameras, giving a reasonably complete picture of movement. Kalman filtering is used extensively to obtain good measurements and to deal with occlusion, both behind objects and where people cross each other.
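A minimal sketch of the Kalman filtering step, assuming a constant-velocity motion model applied independently to each floor coordinate. The text does not specify the motion model, so the model, the class name ConstantVelocityKF and the noise values q and r are all assumptions. The point it illustrates is how occlusion can be handled: every frame runs the prediction, and the measurement update is simply skipped while the person is hidden, so the track coasts on its estimated velocity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter; state is [position, velocity]."""
    q: float = 0.5    # process-noise intensity (hypothetical value)
    r: float = 0.04   # measurement-noise variance in m^2 (hypothetical value)
    x: List[float] = field(default_factory=lambda: [0.0, 0.0])
    p: List[List[float]] = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def predict(self, dt: float) -> None:
        pos, vel = self.x
        self.x = [pos + dt * vel, vel]
        (a, b), (_, d) = self.p
        q = self.q
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white-noise-acceleration Q.
        off = b + dt * d + q * dt ** 2 / 2
        self.p = [[a + 2 * dt * b + dt * dt * d + q * dt ** 3 / 3, off],
                  [off, d + q * dt]]

    def update(self, z: float) -> None:
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r                  # innovation variance (H = [1, 0])
        k0, k1 = p00 / s, p10 / s         # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P <- (I - K H) P
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

    def step(self, z: Optional[float], dt: float = 1.0) -> float:
        """Predict every frame; update only when a measurement exists (None = occluded)."""
        self.predict(dt)
        if z is not None:
            self.update(z)
        return self.x[0]


# Hypothetical floor x-coordinates (metres), one per frame; None marks frames
# where the person is hidden behind an object or another person.
measurements = [0.0, 0.1, 0.2, None, None, 0.5, 0.6]
kf_x = ConstantVelocityKF()
track = [kf_x.step(z) for z in measurements]
```

In practice one such filter would run per floor axis, with dt set to the frame interval. A single 2-D filter with a joint covariance is the more usual formulation; the per-axis split is a simplification to keep the matrix algebra short.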

Smart House

Scene Labelling

In this research, which is part of the smart house project, human activity is used to build up a description of an indoor scene; that is, the work is concerned with scene understanding and object recognition. The traditional methods we investigated have problems because object shape varies widely and it is hard to decide what a particular object is (e.g. what is a chair?). This approach instead uses human interaction and behaviour, and is influenced by the work of Stark et al in the early 1990s, in which the function of different objects was investigated, mainly from CAD descriptions.

Consider the following scenario. Assume that all pixels in an image are initially classified as background, i.e. not labelled as objects. A person walks around a room and occasionally sits on a chair. By using an HMM to track the height of the person's bounding box, the activities walking/standing, sitting down, seated and standing up can be identified. Each pixel in the image representing the scene can then be updated depending on the activity. Pixels near the bottom of the bounding box of a person detected as walking or standing can be labelled as floor, and the more frames in which this occurs, the higher the confidence that they are floor pixels. If a person is detected as sitting down, then pixels inside the bounding box are reinforced as chair pixels; again, the more frames in which this occurs, the higher the confidence that they are chair pixels. Given a long observation period and significant human movement in the scene, a picture of the different objects in the scene is eventually built up. The advantage of this method is that it is the human activity that determines the object class: if people repeatedly sit on a coffee table, it becomes more likely to be regarded as a chair than as a table, which is as expected. Current work is concerned with finer-scale activity recognition, such as eating and carrying, as well as other coarse-scale activities such as lying down. The fine-scale activity will be used to identify objects such as tables and cupboards.
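The scenario above can be sketched in two parts: Viterbi decoding of the bounding-box height, quantised into three bins, into the four activities, followed by per-pixel vote counting in which a label's share of the votes serves as its confidence. Everything numeric here is hypothetical: the transition and emission probabilities, the height bins, the foot_rows margin and the example boxes are invented for illustration and are not taken from the project.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hidden activities; "standing" covers walking/standing, as in the text.
STATES = ["standing", "sitting_down", "seated", "standing_up"]
# Observations: bounding-box height quantised as 0 = tall, 1 = medium, 2 = short.
START = [0.97, 0.01, 0.01, 0.01]
TRANS = [
    [0.9, 0.1, 0.0, 0.0],   # standing -> standing | sitting_down
    [0.0, 0.5, 0.5, 0.0],   # sitting_down -> sitting_down | seated
    [0.0, 0.0, 0.9, 0.1],   # seated -> seated | standing_up
    [0.5, 0.0, 0.0, 0.5],   # standing_up -> standing | standing_up
]
EMIT = [
    [0.9, 0.08, 0.02],      # standing: mostly tall boxes
    [0.1, 0.8, 0.1],        # sitting_down: mostly medium
    [0.02, 0.08, 0.9],      # seated: mostly short
    [0.1, 0.8, 0.1],        # standing_up: mostly medium
]


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int]) -> List[str]:
    """Most likely activity sequence for a sequence of quantised box heights."""
    n = len(STATES)
    delta = [safe_log(START[s]) + safe_log(EMIT[s][obs[0]]) for s in range(n)]
    back: List[List[int]] = []
    for o in obs[1:]:
        best = [max(range(n), key=lambda i: delta[i] + safe_log(TRANS[i][s]))
                for s in range(n)]
        delta = [delta[best[s]] + safe_log(TRANS[best[s]][s]) + safe_log(EMIT[s][o])
                 for s in range(n)]
        back.append(best)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for best in reversed(back):
        state = best[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


Pixel = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive; y grows downward


def vote(votes: Dict[Pixel, Counter], activity: str, box: Box, foot_rows: int = 2) -> None:
    """Add one frame of evidence to the per-pixel label counts."""
    x0, y0, x1, y1 = box
    if activity == "standing":
        # Pixels at the feet of a walking/standing person: evidence for floor.
        rows, label = range(max(y0, y1 - foot_rows + 1), y1 + 1), "floor"
    elif activity == "sitting_down":
        # Pixels inside the box of a person sitting down: evidence for chair.
        rows, label = range(y0, y1 + 1), "chair"
    else:
        return
    for y in rows:
        for x in range(x0, x1 + 1):
            votes.setdefault((x, y), Counter())[label] += 1


def label_of(votes: Dict[Pixel, Counter], pixel: Pixel) -> Tuple[str, float]:
    """Majority label and its share of the votes; unvoted pixels stay background."""
    counts = votes.get(pixel)
    if not counts:
        return "background", 0.0
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


# Hypothetical track: quantised heights and bounding boxes for seven frames.
heights = [0, 0, 1, 2, 2, 1, 0]
boxes = [(0, 0, 2, 9), (1, 0, 3, 9), (4, 3, 6, 9), (4, 5, 6, 9),
         (4, 5, 6, 9), (4, 3, 6, 9), (5, 0, 7, 9)]
votes: Dict[Pixel, Counter] = {}
for activity, box in zip(viterbi(heights), boxes):
    vote(votes, activity, box)
```

With these inputs the decoded sequence runs standing, standing, sitting_down, seated, seated, standing_up, standing. Pixel (1, 9) ends up as floor with a share of 1.0 and pixel (5, 5) as chair, while pixel (5, 9) receives one chair vote and one floor vote, so its share is only 0.5; more frames of consistent activity are what push such shares towards 1.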

Smart House

Describing Normal Behaviour

An important aspect of the smart house project is the need to describe the activities that occur, such as cooking and watching television. Analysis of many typical activities reveals a rich hierarchical structure, with simple tasks at the bottom of the hierarchy and more complex tasks, made up of sequences of lower-level tasks, at the higher levels. For example, a TV dinner could consist of various actions in the kitchen (going to the fridge, going to the oven, etc.) as well as turning on the TV and sitting on the sofa. Hierarchical HMMs (HHMMs) have been investigated to learn and describe such hierarchies of activity. Different approaches have been tried, including allowing the HHMM to discover the low-level behaviours (with the number of layers and the number of hidden states at the lower levels defined in advance), and learning the lower levels first, then fixing them and learning the higher levels. Modifications of the standard HMM training techniques are used. Current research is concerned with adding duration models to the HHMM structure to enable the identification of abnormal behaviour, given that normal behaviour has been used for training.
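Some background on why duration models help, stated as a standard property of HMMs rather than as the authors' own derivation: in an ordinary (H)HMM, a state i with self-transition probability a_ii is occupied for a geometrically distributed number of steps d.

```latex
P(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \quad d = 1, 2, \dots,
\qquad
\mathbb{E}[d] = \frac{1}{1 - a_{ii}} .
```

For example, a_ii = 0.95 at one step per second gives a mean stay of 1/(1 - 0.95) = 20 s, yet the single most probable duration is still d = 1, since P(d) decreases with d. Replacing this implicit distribution with an explicit duration distribution p_i(d) per state (the semi-Markov approach) lets a model treat an activity that lasts far longer or shorter than in the training data as unlikely; whether this matches the duration models being added to the HHMM here is an assumption.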


Detecting Normal Behaviour

In the area of surveillance, there is a need to describe the various normal activities that may occur in a room, on a floor of a building, in the whole building and so on. At any point in time during human activity, it is necessary to have probabilistic measures of the most likely activities that are occurring. We have been exploring the use of Hierarchical Dynamic Bayesian Networks for this. The advantage of using a hierarchical model rather than a flat model is that it explicitly represents the hierarchical structure of the activities. Currently, we are using the EM algorithm to estimate the parameters of the hierarchical probabilistic model, allowing complex activities to be generated from simple activities.
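The authors use hierarchical dynamic Bayesian networks with EM-trained parameters; the sketch below shows only the online inference step, on a hand-specified model. It flattens a two-level hierarchy into (activity, behaviour) states of an ordinary HMM, runs the normalised forward algorithm, and obtains the probability of each top-level activity at time t by summing the filtered probabilities of its child states. The activity and behaviour names, the zone observations and all probabilities are hypothetical.

```python
from typing import Dict, List, Sequence

# Flattened (top-level activity, low-level behaviour) states; all hypothetical.
STATES = [("cooking", "at_fridge"), ("cooking", "at_oven"), ("watching_tv", "on_sofa")]
START = [1 / 3, 1 / 3, 1 / 3]
TRANS = [
    [0.60, 0.35, 0.05],   # fridge <-> oven within cooking, rarely off to the sofa
    [0.35, 0.60, 0.05],
    [0.05, 0.05, 0.90],
]
# Observations: zone the tracked person is in, 0 = fridge, 1 = oven, 2 = sofa.
EMIT = [
    [0.80, 0.15, 0.05],
    [0.15, 0.80, 0.05],
    [0.05, 0.05, 0.90],
]


def filter_activities(obs: Sequence[int]) -> List[Dict[str, float]]:
    """P(top-level activity at t | observations up to t), for every t."""
    n = len(STATES)
    belief = START[:]
    history = []
    for t, o in enumerate(obs):
        if t > 0:  # predict: push the belief through the transition model
            belief = [sum(belief[i] * TRANS[i][j] for i in range(n)) for j in range(n)]
        belief = [belief[j] * EMIT[j][o] for j in range(n)]  # weight by likelihood
        z = sum(belief)
        belief = [b / z for b in belief]                      # normalise
        activity: Dict[str, float] = {}
        for (parent, _child), b in zip(STATES, belief):      # marginalise children
            activity[parent] = activity.get(parent, 0.0) + b
        history.append(activity)
    return history


posteriors = filter_activities([0, 1, 0, 2, 2])
```

With these numbers the cooking posterior stays above 0.9 for the first three observations, is roughly an even split with watching_tv after the first sofa observation, and watching_tv exceeds 0.9 after the second.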

Video Indexing

Using Accelerometers

The final project does not rely on processing video data; instead it uses accelerometers to measure human activity directly from limb and body movement. These movements can be quite complex and, again, we use HHMMs to describe the sequences at different levels. We have been concentrating on the movements of sports officials in Australian Rules Football and cricket. The objective is to link the detected movements to the sports video so that various types of event can be extracted, e.g. “show me all the goals scored” or “show me all the cricketers given out LBW”. Obviously the accelerometers (which are now available in small, unobtrusive packages) need to be worn by the sports officials, but this is not a problem, and much video and accelerometer data has been acquired from various sports. So far we have been able to learn and recognise various cricket and martial arts gestures.
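The project uses HHMMs; as a simpler stand-in, the sketch below uses the common scheme of one flat discrete HMM per gesture and picks the gesture whose model assigns the highest log-likelihood, computed with the scaled forward algorithm. The quantisation rule (acceleration magnitude against a 12 m/s^2 threshold), the gesture names wave and raise_and_hold, and all model probabilities are hypothetical; they are not the cricket or martial arts gestures from the project.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, float, float]
Model = Tuple[List[float], List[List[float]], List[List[float]]]  # (start, trans, emit)


def quantise(samples: Sequence[Vector], threshold: float = 12.0) -> List[int]:
    """0 = near rest, 1 = vigorous movement, from acceleration magnitude in m/s^2."""
    return [int(math.sqrt(x * x + y * y + z * z) > threshold) for x, y, z in samples]


def log_likelihood(model: Model, obs: Sequence[int]) -> float:
    """log P(obs | model) by the forward algorithm, rescaled at every step."""
    start, trans, emit = model
    n = len(start)
    total = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [start[j] * emit[j][o] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                     for j in range(n)]
        c = sum(alpha)
        if c == 0.0:
            return float("-inf")
        total += math.log(c)           # log P(o_t | o_1..o_{t-1})
        alpha = [a / c for a in alpha]
    return total


# One hypothetical two-state HMM per gesture.
MODELS: Dict[str, Model] = {
    # Sustained movement throughout (e.g. an arm waved back and forth).
    "wave": ([1.0, 0.0], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.2, 0.8]]),
    # A burst of movement, then the arm is held still.
    "raise_and_hold": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}


def classify(obs: Sequence[int]) -> str:
    """Pick the gesture whose model gives the sequence the highest likelihood."""
    return max(MODELS, key=lambda name: log_likelihood(MODELS[name], obs))


move, rest = (0.0, 15.0, 0.0), (0.0, 0.0, 9.81)
print(classify(quantise([move] * 6)), classify(quantise([move] * 2 + [rest] * 4)))
```

Real accelerometer streams would usually be windowed and described by richer features than a single magnitude threshold; the two-symbol alphabet here only keeps the arithmetic easy to follow.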