ACTION RECOGNITION AND CLASSIFICATION IN MOVIES USING RULE-BASED SYSTEM


Oct 18, 2013 (3 years and 8 months ago)



Lili Nurliyana Abdullah¹, Shahrul Azman Mohd Noah² & Tengku Mohd Tengku Sembok²


¹ Department of Multimedia, Faculty of Computer Science and Information Technology
University Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
Tel: +603-89216725, Fax: +603-89466577, E-mail: liyana@putra.upm.edu.my


² Department of Information Science, Faculty of Technology and Information Science
University Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
Tel: +603-89216182, Fax: +603-89256732, E-mail: samn, tmts@ftsm.ukm.my



Abstract


Video classification is a broad topic, and some approaches are also termed video interpretation, video analysis, video annotation or video understanding. The ability to classify videos into classes such as sports, news, movies or documentaries increases the efficiency of indexing, browsing and retrieval of video in large databases. An ideal characterization must be rich enough to support convenient, independent user queries and effective browsing and retrieval of information based on the user's content of interest. This paper is concerned with the process of video classification, the system and structure of a fuzzy-logic based framework, and the automatic recognition and generation of syntactic and semantic video annotations. The proposed techniques allow us to characterize scenes, segments and individual frames in video. The research concerns the development of a fuzzy-logic framework for semantic interpretation of video sequences using edge feature extraction, motion, shot activity, colour feature extraction and sound.


Keywords: Video classification, fuzzy-logic based, semantic interpretation.



INTRODUCTION


As digital technology becomes inexpensive and popular, there has been a tremendous increase in the availability of videos through cable and the internet, such as video on demand and online screening. For feasible access to this huge amount of data, there is a great need to annotate and organize the data and to provide efficient tools for browsing and retrieving content of interest. Automatic classification of movies on the basis of their content is an important task. For example, movies containing violence must be put in a separate class, as they are not suitable for children. Similarly, automatic recommendation of movies based on personal preferences will help a person choose a movie of interest. However, classifying a huge collection of video data without human intervention is not an easy task. This has led to abundant research aimed at automatically characterizing and categorizing the content of video streams. Classifying the streams and the scenes they contain into categories such as news, commercials and sports enables efficient cataloguing and speedy retrieval when searching large video archives. Automated classification of video also leads to greater efficiency in indexing, retrieval and browsing of the data in large video archives.


A basic approach to video annotation is to detect the shots and use a set of key frames to represent the shot content. Combining similar shots to form scenes or story units is more meaningful than presenting the shots alone. Ideally, digital video would be annotated automatically as a result of machine interpretation of its semantic content; however, given the state of the art in computer vision, such sophisticated data abstractions may not be feasible in practice. Rather, the computer may offer intelligent assistance in the manual annotation of video, or it may perform automatic annotation with limited semantic interpretation. To create the video data abstraction, it is desirable to identify syntactic and semantic components in the video material and to define data models that concisely describe first the structural video properties and second the semantic video content. The data models for representing video have to be general and broad enough to accommodate the range of formats and lengths of different types of programs, whether the video is a two-hour movie, a one-hour talk show, a five-minute home-video clip or a thirty-second news segment.


Video classification is a broad topic, and some approaches are also termed video interpretation, video analysis, video annotation or video understanding. The ability to classify videos into classes such as sports, news, movies or documentaries increases the efficiency of indexing, browsing and retrieval of video in large databases. Classifying the streams and the scenes they contain into categories such as news, commercials and sports enables efficient cataloguing and speedy retrieval when searching large video archives. An ideal characterization must be rich enough to support convenient, independent user queries and effective browsing and retrieval of information based on the user's content of interest.


VIDEO CATEGORIZATION


Video content summarization and categorization is a necessary tool for efficient access, understanding, browsing and retrieval of videos. A frequently used approach to video characterization is to treat it as an extension of image characterization, which is not very suitable because video has spatio-temporal relationships. Most other research on video classification has not taken advantage of the temporal features in video. A common approach to searching and categorizing video data is to create textual annotations by hand (semi-automatically); a fully automatic system with interactive access ability would be much more practical for DVD. To obtain such a representation, a human observer is required to watch the video sequentially and locate the important boundaries or scene edges. However, manual content analysis is not feasible for huge amounts of data, as it is both slow and expensive.

While significant research effort has been devoted to the analysis of movies, news, reports and sports videos, the analysis of video scenes has been virtually neglected by the research community. Only quite recently have a few works explicitly addressing events investigated the possibility of detecting and extracting advertising content from a stream of video data. While users often desire to retrieve objects and events directly from a video, this higher-level information, summarization and categorization cannot be extracted easily, and little work has been done in this area.

Films constitute a large portion of the entertainment industry. While it is feasible to classify films at the time of production, classification at finer levels, for instance classification of individual scenes, would be a tedious and sizeable task. Currently, there is a need for systems that extract the genre of scenes in films. Such scene-level classification would allow a departure from the prevalent system of movie ratings towards a more flexible system of scene ratings. For instance, a child would be able to watch a film containing a few scenes of excessive violence if a pre-filtering system could prune out the scenes that have been rated as violent. Such semantic labelling of scenes would also allow far more flexibility when searching movie databases. For example, automatic recommendation of movies based on personal preferences could help a person choose a movie by executing a scene-level analysis of previously viewed movies. While the proposed method does not actually achieve scene classification, it provides a suitable framework for such work. Some justification must also be given for the use of previews for the classification of movies.

The aims of this paper are to classify video sequences on the basis of low-level and high-level feature-based representations; to provide a tool capable of event-based video retrieval; to detect, identify and extract the content of the video, placing it into one of a set of previously defined categories; and to annotate and organize the huge amount of data and provide efficient tools for browsing and retrieving content of interest. Our effort was devoted to implementing the idea of scene identification by extracting syntactic properties of a video such as colour distribution, cut detection and motion vectors. We chose movies as our video input because of their relative ease of availability.


RELATED WORKS


Because these previous works chose different sets of genre classes and different collections of video clips, it is difficult to compare the performance of the features and approaches they used. A comparison of their work can be seen in Table 1.


METHODOLOGY


We propose a video classification method with the following characteristics:

• feature extraction is done automatically;
• the method deals with both visual and auditory information, and captures both spatial and temporal characteristics;
• the extracted features are natural, in the sense that they are closely related to human perceptual processing.


In particular, this work is concerned with the process of video classification, the system and structure of the fuzzy-logic based framework, and the automatic recognition and generation of syntactic and semantic video annotations. Through the proposed techniques, we can characterize scenes, segments and individual frames in video. This research concerns the development of a fuzzy-logic framework for semantic interpretation of video sequences using:



• Edge feature extraction
• Motion
• Shot activity, which conveys a large amount of information about the type of video
• Colour feature extraction
• Sound


We will use a low-level feature-based representation that treats video as spatio-temporal chunks and uses the colour histogram difference as a measure of video similarity. We will then integrate these features with the shot length to form fuzzy rules for the extraction of semantic video classes. A learning mechanism will then be used to detect changes in the video (shot) sequences, camera motion and colour distribution in each unit of the video sequence, and finally to perform generic event categorization. To explore the structure of a video we need to understand and analyse its shots, so temporal video segmentation is a prerequisite. For the purpose of event classification, we need to develop a fuzzy-theoretic scheme. This scheme should detect the shots and also categorize the shot transitions as abrupt or gradual.
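
As an illustration of the colour histogram difference and of the abrupt/gradual categorization described above, the following minimal Python sketch compares two frames and maps the difference to fuzzy memberships. It is a sketch under stated assumptions, not the scheme developed in this work: the frame representation (a flat list of RGB tuples), the function names, the number of bins and every membership breakpoint are hypothetical choices made only for the example.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def colour_histogram(frame: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalised joint RGB histogram of a frame given as a flat pixel list."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(rb * bins + gb) * bins + bb] += 1.0
    total = float(len(frame)) or 1.0  # avoid dividing by zero on an empty frame
    return [count / total for count in hist]


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def ramp_up(x: float, low: float, high: float) -> float:
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def triangle(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership that is 1 at `peak` and 0 outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def transition_memberships(diff: float) -> Dict[str, float]:
    """Fuzzy degrees for a frame-to-frame difference (breakpoints are assumed)."""
    return {
        "none": 1.0 - ramp_up(diff, 0.05, 0.25),
        "gradual": triangle(diff, 0.05, 0.30, 0.60),
        "abrupt": ramp_up(diff, 0.30, 0.60),
    }
```

With these assumed breakpoints, a cut between two frames with completely different colours yields a difference of 1.0 and full membership in the abrupt set, while a half-blended frame belongs partly to the gradual set and partly to the abrupt set. A practical detector for gradual transitions would normally accumulate differences over several consecutive frames rather than judge a single pair.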
The output of the classification phase is used for editing-style-based feature extraction. The fuzzy shot detection scheme classifies the shot transitions as fuzzy sets. The motion in video sequences may be due to camera movement, object movement or a combination of the two. Information about camera operation, such as panning and zooming, is very important for the analysis and classification of video shots, since camera operation often reflects the intentions of the director.
Each of these operations induces a specific pattern in the field of motion vectors from one frame to the next. We need to develop a fuzzy-theoretic scheme for qualitative characterization of camera motion in a video sequence.
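
To make the idea of a motion-vector pattern concrete, the sketch below fits a simple model to a motion field: a uniform translation (panning) plus a radial scaling about the centre of the sampled positions (zooming), and then turns the two estimates into fuzzy degrees. This is only one plausible formulation and not the scheme of this work; the model, the function names and the saturation values pan_full and zoom_full are assumptions chosen for the example.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def estimate_pan_zoom(positions: List[Vec], vectors: List[Vec]) -> Tuple[Vec, float]:
    """Least-squares fit of v = t + s * (p - c), with c the centroid of positions.

    Returns the translation t (pan) and the radial scale s (zoom; s > 0 means
    the vectors diverge from the centre, s < 0 means they converge).
    """
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    tx = sum(dx for dx, _ in vectors) / n
    ty = sum(dy for _, dy in vectors) / n
    num = 0.0
    den = 0.0
    for (x, y), (dx, dy) in zip(positions, vectors):
        rx, ry = x - cx, y - cy
        num += (dx - tx) * rx + (dy - ty) * ry  # radial part of residual motion
        den += rx * rx + ry * ry
    zoom = num / den if den > 0.0 else 0.0
    return (tx, ty), zoom


def camera_motion_memberships(pan: Vec, zoom: float,
                              pan_full: float = 8.0,
                              zoom_full: float = 0.05) -> Dict[str, float]:
    """Fuzzy degrees of panning, zooming and static camera (thresholds assumed)."""
    panning = min(1.0, math.hypot(pan[0], pan[1]) / pan_full)
    zooming = min(1.0, abs(zoom) / zoom_full)
    return {
        "panning": panning,
        "zooming": zooming,
        "static": 1.0 - max(panning, zooming),
    }
```

A field in which every vector equals (4, 0) gives a pan estimate of (4, 0) and zero zoom, so with the assumed pan_full of 8 the shot is panning to degree 0.5. A field whose vectors point away from the centre in proportion to their distance gives zero pan and a positive zoom estimate. Object motion is ignored here; separating it from camera motion would need a more robust fit.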
Using this scheme, we characterize all the shots of a video sequence for panning and zooming motion. We need to evolve fuzzy systems that categorize the event sequences in a hierarchical manner. For each level of the hierarchy, a separate fuzzy rule-based system is evolved. See Figure 1.
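
The hierarchical arrangement can be sketched as one small fuzzy rule base per node of a class tree, where the winning class at one level selects the rule base used at the next. The example below is a hand-written illustration only: the class names (action, non_action, loud_action, quiet_action), the feature names, the two membership functions and the rules themselves are hypothetical, whereas the rule bases in this work are meant to be evolved rather than written by hand.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]                # crisp features scaled to [0, 1]
Rule = Tuple[List[Tuple[str, str]], str]   # ([(feature, term), ...], class label)


def low(x: float) -> float:
    """Membership 1 at x = 0, falling to 0 at x >= 0.5."""
    return max(0.0, min(1.0, 1.0 - 2.0 * x))


def high(x: float) -> float:
    """Membership 0 at x <= 0.5, rising to 1 at x = 1."""
    return max(0.0, min(1.0, 2.0 * x - 1.0))


TERMS = {"low": low, "high": high}

# Hypothetical two-level class tree: one rule base per level.
HIERARCHY: Dict[str, List[Rule]] = {
    "root": [
        ([("motion", "high"), ("shot_activity", "high")], "action"),
        ([("motion", "low")], "non_action"),
        ([("shot_activity", "low")], "non_action"),
    ],
    "action": [
        ([("sound", "high")], "loud_action"),
        ([("sound", "low")], "quiet_action"),
    ],
}


def classify(features: Features, rules: List[Rule]) -> Tuple[str, float]:
    """Fire every rule (AND = min), combine per class (OR = max), pick the best."""
    scores: Dict[str, float] = {}
    for antecedents, label in rules:
        strength = min(TERMS[term](features[name]) for name, term in antecedents)
        scores[label] = max(scores.get(label, 0.0), strength)
    best = max(scores, key=lambda label: scores[label])
    return best, scores[best]


def classify_hierarchically(features: Features,
                            hierarchy: Dict[str, List[Rule]],
                            level: str = "root") -> List[str]:
    """Walk down the class tree, using a separate rule base at each level."""
    path: List[str] = []
    while level in hierarchy:
        label, strength = classify(features, hierarchy[level])
        if strength == 0.0:  # no rule fired at this level, so stop descending
            break
        path.append(label)
        level = label
    return path
```

For a shot with high motion, high shot activity and loud sound, the walk returns the path action, loud_action; a shot with low motion stops at non_action because that class has no rule base of its own in this toy tree.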


CONCLUSION


In this research, we will study techniques for extracting meaningful features that can be used to extract higher-level information from video shots. We will use motion, colour, edge, sound and shot activity information to characterize the video data. A learning algorithm based on fuzzy logic will be used, as it eliminates the need to incorporate human reasoning into designing the fuzzy system.

A genetic fuzzy-theoretic approach for video categorization will be used. Fuzzy logic provides an efficient data structure for adequate representation, analysis and optimisation of if-then rule bases, independently of the operations used to evaluate the rules, under the hypothesis of rigid granularity. Moreover, fuzzy logic allows checking for completeness and consistency of the rule base.
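
One simple reading of completeness and consistency can be sketched as follows, assuming each rule names exactly one linguistic term for every input variable. Under that assumption a rule base is complete if every combination of terms is covered by some rule, and consistent if no two rules share an antecedent but disagree on the class. The variable names, terms and function name are hypothetical, and other definitions of both properties exist.

```python
from itertools import product
from typing import Dict, List, Tuple

Antecedent = Tuple[Tuple[str, str], ...]   # sorted ((variable, term), ...)
Rule = Tuple[Dict[str, str], str]          # ({variable: term}, class label)


def check_rule_base(variables: Dict[str, List[str]],
                    rules: List[Rule]) -> Tuple[List[Antecedent], List[Antecedent]]:
    """Return (missing antecedents, conflicting antecedents) for a rule base."""
    seen: Dict[Antecedent, str] = {}
    conflicts: List[Antecedent] = []
    for antecedent, consequent in rules:
        key = tuple(sorted(antecedent.items()))
        if key in seen and seen[key] != consequent:
            conflicts.append(key)          # same condition, different class
        seen.setdefault(key, consequent)

    names = sorted(variables)
    missing: List[Antecedent] = []
    for combo in product(*(variables[name] for name in names)):
        key = tuple(zip(names, combo))
        if key not in seen:
            missing.append(key)            # no rule covers this combination
    return missing, conflicts
```

For two variables with the terms low and high, a rule base that omits the low/low combination and contains two rules for high/high with different classes is reported as both incomplete and inconsistent.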



To summarize, we will use a knowledge base consisting of the predefined semantic class tree and trained rules that define, for each class, if-then rules over low-level feature descriptors. With the rules learned, video classification becomes possible and efficient.

