Semantic Indexing Of Images Using A Web Ontology Language






Gowri Allampalli-Nagaraj





A thesis submitted in partial fulfillment of the
requirements for the degree of

Master of Science





University of Washington


2007





Program Authorized to Offer Degree:
Institute of Technology - Tacoma






University of Washington

Graduate School






This is to certify that I have examined this copy of a master's thesis by






Gowri Allampalli-Nagaraj




and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.






Committee Members:




_____________________________________________________



Isabelle Bichindaritz




_____________________________________________________

George Mobus




Date:__________________________________







In presenting this thesis in partial fulfillment of the requirements for a master's degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this thesis is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S. Copyright Law. Any other reproduction for any purposes or by any means shall not be allowed without my written permission.




Signature ________________________



Date ____________________________



































University Of Washington


Abstract


Semantic Indexing Of Images Using A Web Ontology Language


Gowri Allampalli-Nagaraj


Chair of the Supervisory Committee:

Professor Isabelle Bichindaritz

Computing and Software Systems




This paper presents a system implemented to evaluate the retrieval efficiency of images when they are semantically indexed using a combination of a Web Ontology Language and the low-level features of the image. Finding a similarity measure algorithm to retrieve images based on the semantic metadata can be very challenging due to diverse image content and inadequate domain-specific ontologies describing the content. Existing methods for indexing images are primarily based on text. While this method is widely used due to its simplicity, it is not very efficient, as it requires a domain expert and the textual interpretations of image content vary from person to person. In our approach, we leverage sophisticated image processing techniques to extract image content information and associate it with existing domain ontologies developed by experts, thereby bridging the gap between low-level features and high-level semantics. The work described in this paper shows that a high retrieval accuracy rate is obtained when all the image descriptors are combined with an ontology while building the semantic metadata for indexing images.



TABLE OF CONTENTS

List Of Figures ....................................................... iii
List Of Tables ........................................................ iv
Chapter 1: Introduction ............................................... 1
Chapter 2: Motivation ................................................. 3
Chapter 3: Problem Statement .......................................... 4
Chapter 4: Background ................................................. 6
  4.1 Ontology ........................................................ 6
  4.2 Image Databases ................................................. 6
  4.3 Image Semantic Representation Languages ......................... 7
  4.4 Image Interpretation Software ................................... 8
  4.5 MPEG-7 .......................................................... 8
  4.6 Distance Measure ................................................ 11
Chapter 5: Datasets ................................................... 12
  5.1 Visible Human Image Data Set .................................... 12
  5.2 University Of Washington Digital Anatomist Reference Ontology ... 13
Chapter 6: Preprocessing Tools ........................................ 14
  6.1 MySQL ........................................................... 14
  6.2 Adobe Photo Shop ................................................ 14
  6.3 M-Ontomat Annotizer ............................................. 14
Chapter 7: Preprocessing Methods ...................................... 16
  7.1 Selection Of Images From Visible Human .......................... 16
  7.2 Extraction Of UWDA Ontological Terms From The UMLS Database ..... 16
  7.3 Creation Of UWDA Reference Ontology In DAML (DARPA Agent Markup Language) ... 17
  7.5 Conversion Of Image Format To JPEG .............................. 18
  7.6 Extracting Image Content And Linking To Domain Ontology ......... 18
Chapter 8: Methods .................................................... 20
  8.1 Training And Test Set ........................................... 20
  8.2 Extracting Image Content From XML Files ......................... 21
  8.3 Calculating Distance Measure .................................... 21
  8.4 Calculating Combined Distance Measure ........................... 25
  8.5 Creating Distance Matrix ........................................ 25
  8.6 Calculating Retrieval Accuracy Rate ............................. 25
  8.7 Improving Retrieval Accuracy Rates .............................. 27
Chapter 9: Results, Discussion And Analysis ........................... 29
  9.1 Initial Results ................................................. 29
  9.2 Increased Training To Test Ratio ................................ 30
  9.3 Combined Descriptors ............................................ 32
  9.4 Ensemble Classification ......................................... 33
  9.5 Ten Fold Cross Validation ....................................... 35
  9.6 Excluding Descriptors ........................................... 37
  9.7 Empirical Weight Optimization ................................... 38
Chapter 10: Related Work .............................................. 39
  10.1 Knowledge-Assisted Video Analysis And Object Detection ......... 39
  10.2 Retrieval Of Multimedia Objects By Combining Semantic Information From Visual And Textual Descriptors ... 40
Chapter 11: Educational Statement ..................................... 41
Chapter 12: Conclusion ................................................ 43
Bibliography .......................................................... 44
Appendix A: Presentation Slides ....................................... 48
Appendix B: Installation & User Manual ................................ 98
Appendix C: System Output ............................................. 102
Appendix D: Image Descriptor Files .................................... 103
Appendix E: DAML Ontology File ........................................ 109
Appendix F: Image Annotation Files .................................... 114


LIST OF FIGURES

Figure Number                                                        Page

1: Image of Abdomen from Visible Human Data Set. ...................... 12
2: Image of Thigh from Visible Human Data Set. ........................ 13
3: Screenshot of SQL query used to Extract UWDA terms from UMLS. ..... 17
4: Screenshot of VDE tool in M-Ontomat Annotizer showing the image feature extraction and annotation process. ... 19










LIST OF TABLES

Table Number                                                         Page

1: Accuracy rate for training set. .................................... 29
2: Accuracy rate for test set. ........................................ 30
3: Accuracy rate for 75% images in training set and 25% images in test set. ... 31
4: Accuracy rate for 50% images in training set and 50% images in test set. ... 31
5: Combined accuracy rate for training set = 50% and test set = 50%. ... 32
6: Combined accuracy rate for training set = 75% and test set = 25%. ... 33
7: Accuracy rate for Ensemble Classification for 50% test and 50% training. ... 33
8: Accuracy rate for Ensemble Classification for 75% training and 25% test. ... 34
9: Accuracy rate for Ten Fold Cross Validation for 75% training and 25% test. ... 35
10: Accuracy rate for Ten Fold Cross Validation for 50% training and 50% test. ... 36
11: Accuracy rate excluding Contour Shape and Texture Browsing. ...... 37
12: Accuracy rate excluding Contour Shape descriptor. ................ 38
13: Accuracy rates for Empirical Weight Optimization. ................ 38





ACKNOWLEDGEMENTS

Special thanks to Professor Isabelle Bichindaritz for all her assistance, guidance and feedback during the course of this thesis. Her involvement was essential in the completion of this thesis. I am also very thankful to Professor George Mobus for all his help and valuable feedback. Thanks to the members of the committee for all their valuable input.

DEDICATION

To my husband, family and friends.




Chapter 1




INTRODUCTION




With the advances in medical technology over the years, we have a large number of digital images like Magnetic Resonance Images (MRI), X-Rays, anatomical and pathological images, etc. Medical research has led to the development of valuable knowledge bases consisting of formal domain ontologies, electronic patient records, statistical medical data and results of various medical studies. Analysis of these images is of utmost importance to study the different aspects of a problem. To analyze the information stored in these images, the concerned doctors/scientists should be able to access the image information easily and effectively [15]. Until recently, medical databases mostly used textual information to store and retrieve images, making little use of the rich image content present in the digital images. Handling large collections of images is a growing challenge, and there has been a lot of research in the area of image retrieval systems to efficiently store and retrieve image collections.


The main goal of this thesis work is to aid the ongoing research in the area of semantic indexing of images by evaluating the retrieval effectiveness of image collections when image content information is combined with a formalized ontology to automatically index images by content. Research in this area has raised questions as to whether or not it is possible to develop a semantic indexing system with an efficient rate of image retrieval [34]. The challenge involved is to develop a similarity matching algorithm for analyzing the extracted image content and producing a match.


In the system presented here, we use medical anatomical images from the Visible Human [24] data set and the Digital Anatomist [22] formal medical ontology developed for the human anatomical terms. In our approach, we extract various image features like color, shape, texture, etc. in the MPEG-7 [35] standard image feature description format and associate them with the related anatomical terms, thus building the semantic metadata. An important feature of this system is the similarity matching algorithm developed to calculate the matching between images, thereby determining the retrieval accuracy rate for the system. Various experiments based on different approaches for improving the accuracy rates were performed to evaluate the retrieval efficiency of the system.


Chapter 2 describes the motivation behind this research. A detailed description of the problem being solved and the background information required to understand this research area are given in Chapters 3 and 4. Chapters 5 and 6 illustrate the dataset and the preprocessing tools and resources used to process the data for further analysis. Chapter 7 describes the methods used in preprocessing the data. The architecture of the system and the methodology used to solve the research issue are described in Chapter 8. The experimental results, analysis and discussion are described in Chapter 9. Chapter 10 describes other related work in this area. Chapter 11 contains the Educational Statement. Finally, Chapter 12 contains the conclusions derived from this implementation.



Chapter 2



MOTIVATION




With the number of digital images increasing rapidly, there is a great need to manage digital image repositories. There is a need to store and retrieve images just like text documents. Advances in the field of medical technologies have encouraged hospitals and medical research centers to use various machines like X-Ray, Magnetic Resonance Imaging (MRI), Scan, etc. The use of such machines has resulted in the production of valuable data in the form of digital images on different diseases, physical structures, various organisms, etc. Analysis of these images is of utmost importance to study the different aspects of a problem. To analyze the information stored in these images, the concerned doctors/scientists should be able to access the image information easily and effectively.


By indexing images based on semantic descriptors of low-level features, doctors can submit a query like 'find images with round calcifications' [3]. In such a query, 'calcification' is the textual description representing the semantics of the region of interest and the shape 'round' is the textual annotation representing the low-level shape feature. Executing such a query would avoid the retrieval of images with just a round shape or with just the associated text 'calcification'. Another example query can be of this form: 'find all the images having a blue sky'. Such a query would yield images whose semantic descriptor is 'sky' and whose corresponding feature representation is the color 'blue'. This kind of semantic annotation for images greatly improves the image classification and query mechanisms. There is a growing need for research in the area of attaching semantics to low-level features to improve image retrieval and storage methods [25]. In our implementation, images are indexed based on their semantic content, in order to address the growing need for representing images with meaningful annotations and to improve their retrieval efficiency.
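The combined queries described above can be illustrated with a small sketch. The index structure, field names and file names here are hypothetical examples invented for illustration; they are not part of the system described in this thesis:

```python
# Hypothetical semantic index: each entry pairs an ontology term
# (high-level semantics) with a low-level feature annotation.
index = [
    {"file": "img001.jpg", "terms": {"calcification"}, "shape": "round"},
    {"file": "img002.jpg", "terms": {"calcification"}, "shape": "linear"},
    {"file": "img003.jpg", "terms": {"cyst"}, "shape": "round"},
]

def query(term, shape):
    """Return files whose semantic term AND low-level shape both match."""
    return [e["file"] for e in index
            if term in e["terms"] and e["shape"] == shape]

# 'find images with round calcifications' matches only img001.jpg,
# not images that are merely round or merely annotated 'calcification'.
```

Requiring both conditions to hold is what distinguishes this kind of query from a purely textual or purely feature-based search.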

Chapter 3



PROBLEM STATEMENT



The number of digital images is growing rapidly, driving the need for the development of efficient tools to browse, retrieve and navigate through these large image collections. As the information contained in images is complex, containing different colors, shapes, textures and subjects, indexing methods designed for storing and retrieving textual content will not work effectively. There is a need to explicitly capture a sufficient amount of content information as well as application-specific semantics by means of a variety of metadata like multimedia indexes, attribute-based annotations and intentional descriptions to allow appropriate selection, browsing and retrieval of images from large collections [1].



Potentially, images have many types of attributes that could be used for storage and retrieval. Presence of a particular combination of color, texture or shape features, presence of a specific type of object, depiction of a particular event, presence of individuals/locations, presence of specific emotions, or metadata such as who created the image, where and when, etc., are some image attributes that could be used for indexing images. Images can be indexed based on a single attribute or a combination of attributes to improve the efficiency of the image retrieval system.


Traditionally, images are indexed based on textual annotations. Every image is examined individually and a textual annotation describing the various characteristics of the image is stored along with the image for the purposes of indexing. Given the large number of images being produced, manual annotations tend to be very time consuming and prone to error. Querying images with textual annotations is also not very effective, as images have so much more content in them, making it harder to describe the image with plain text [15, 34].

Another approach to indexing images is to extract the content of images like color, shape and texture and to store the feature representation of such content along with the images for indexing purposes. With this approach of indexing, the images could only be queried on their color, shape and texture but not on the actual subject matter. This approach is not useful in querying images containing a particular subject matter and is said to have many limitations when applied to image databases with a broad content [15].



The most recent approach to indexing images is to use the low-level features of the image as semantic descriptors of the image, thus bridging the gap between the above two approaches. Digital images are composed of pixels arranged in an infinite variety of patterns and, in general, it is difficult to predict the particular pattern that would match the information need. Deciding on the aspects of the image that are appropriate for indexing is very challenging. Interpretation of the semantic content is in itself a challenging task, as every interpretation can be different. Such an indexing would greatly improve the querying capability of images, as they can be queried for both low-level features as well as high-level semantics.



The feature representation and the semantic descriptors of the image thus obtained are mapped onto domain ontologies in order to classify the images for retrieval purposes. Determining the association between semantic descriptors and ontologies is a difficult task. Having a system which indexes images based on the semantic metadata would be very beneficial to retrieve large collections of images more effectively and efficiently. With this approach, one can leverage and combine the research efforts in the areas of domain ontologies and image processing to build an effective image indexing system.

Chapter 4



BACKGROUND



4.1 Ontology




An ontology is a formal, explicit specification of a shared conceptualization. A 'conceptualization' refers to the abstract model of some phenomenon in the world, identifying the relevant concepts of that phenomenon. 'Explicit' means that the type of concepts is explicitly defined, and 'formal' refers to the fact that the ontology can be expressed mathematically. As a result, it is machine readable and understandable. In image retrieval applications, an ontology allows the description of semantics, establishes a common and shared understanding of a domain, and facilitates the implementation of a user-oriented vocabulary of terms and their relationship with objects in images [12].


4.2 Image Databases



Image data such as satellite images, medical images and digital pictures are generated in large numbers every day. The World Wide Web itself is a huge repository of images. As a result of the huge volume of image data, the use of multimedia databases is very essential. Multimedia databases store and retrieve images, texts, videos, sounds and data stored on any media. The analysis of such images is very useful for archival and retrieval purposes in fields like medicine, environmental studies, military purposes, etc. Multimedia databases support querying images based on their content. Images can be queried based on the shape of the objects present in the image, colors of the object, textures, volume, spatial relationships, motion, etc.




4.3 Image Semantic Representation Languages



Searching for images by content implies a first step of extracting features from the images, to be able to search these features. Image mining deals with the extraction of this semantic content from a large collection of images. Associating the semantic content with the images is called annotation. Semantic content of images can be stored with images using standard languages. In image annotation, different objects of the image are attached with textual and spatial information and stored in a database using a standard representation. Images can be queried effectively by indexing the images along with their semantic content. Metadata is the most important part of a data archive, and it provides descriptive data about every stored object. Metadata includes indexing information that can be described using a standardized framework to represent an image along with its semantic content.


Resource Description Framework (RDF) [20] is used to represent information and to exchange knowledge on the Web. Web Ontology Language (OWL) [20] is used to publish and share sets of terms called ontologies, supporting advanced Web search, software agents and knowledge management. The DARPA Agent Markup Language (DAML) [20] is an extension of XML, which provides a rich set of constructs to create ontologies and to mark up information so that it is machine readable and understandable. DAML, RDF and OWL are some of the languages that have been developed to represent the semantic content of images. MPEG-7 [35] offers a comprehensive set of audiovisual description tools to create metadata descriptions, which form the basis for applications enabling effective and efficient access to multimedia content.
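As a rough illustration of how such languages attach semantics to an image, the following RDF/XML fragment links an image file to an anatomical concept. The namespace, property names and URIs here are invented for this example; they are not taken from this thesis or from any published schema:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/annotation#">
  <rdf:Description rdf:about="http://example.org/images/abdomen.jpg">
    <!-- hypothetical property linking the image to an ontology term -->
    <ex:depicts rdf:resource="http://example.org/ontology#Liver"/>
    <!-- hypothetical low-level feature annotation stored alongside -->
    <ex:dominantColor>dark red</ex:dominantColor>
  </rdf:Description>
</rdf:RDF>
```

A retrieval system can then match queries against both the ontology term and the feature annotation, which is the combination this thesis exploits.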




4.4 Image Interpretation Software



Image analysis software provides the tools for segmentation, feature extraction and statistical analysis of content in images. Segmentation deals with the identification of objects of interest within an image. Feature extraction is extracting information from the images by measuring the number, size, shape or color of objects.



4.5 MPEG-7



MPEG-7 [35] is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group). MPEG-7, formally named "Multimedia Content Description Interface", is a standard for describing multimedia content data that supports some degree of interpretation of the information's meaning, which can be passed onto, or accessed by, a device or a computer code. MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible.


The MPEG-7 Visual Description Tools included in the standard consist of basic structures and descriptors that cover the following basic visual features: Color, Texture, Shape, Motion, Localization, and Face Recognition. Each category consists of elementary and sophisticated descriptors. In this implementation, we use only the Color, Texture and Shape descriptors. The following section provides a brief description of the image descriptors used.


Dominant Color. This color descriptor is most suitable for representing local (object or image region) features where a small number of colors are enough to characterize the color information in the region of interest. Whole images are also applicable; for example, flag images or color trademark images. Color quantization is used to extract a small number of representative colors in each region/image. The percentage of each quantized color in the region is calculated correspondingly. A spatial coherency on the entire descriptor is also defined, and is used in similarity retrieval.
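The quantization step can be sketched as follows. This is a toy stand-in for the actual MPEG-7 Dominant Color extraction, which is based on color clustering rather than the fixed grid used here; the sample region is hypothetical:

```python
from collections import Counter

def dominant_colors(pixels, n=3):
    """Quantize RGB pixels to a coarse 4-level-per-channel grid and
    return the n most frequent quantized colors, each paired with the
    percentage of the region it covers."""
    quantized = [tuple(channel // 64 for channel in p) for p in pixels]
    counts = Counter(quantized)
    return [(color, count / len(pixels))
            for color, count in counts.most_common(n)]

# A hypothetical 4-pixel region dominated by reddish pixels.
region = [(250, 10, 10), (240, 20, 5), (12, 200, 30), (255, 0, 0)]
```

Here three of the four pixels quantize to the same reddish cell, so that cell is reported as covering 75% of the region.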


Scalable Color. The Scalable Color Descriptor is a color histogram in HSV color space, encoded by a Haar transform. Its binary representation is scalable in terms of bin numbers and bit representation accuracy over a broad range of data rates. The Scalable Color Descriptor is useful for image-to-image matching and retrieval based on color features. Retrieval accuracy increases with the number of bits used in the representation.
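One level of the Haar transform underlying this encoding can be sketched as below. This is a simplified illustration, not the bit-allocation scheme of the actual descriptor: adjacent histogram bins are combined into pairwise sums and differences, so the sums give a coarser histogram and the differences refine it when more bits are available.

```python
def haar_step(bins):
    """One level of a 1D Haar transform: pairwise sums and differences.
    The sums form a coarser histogram with half the bins; keeping or
    dropping the differences trades accuracy against description size."""
    assert len(bins) % 2 == 0, "needs an even number of bins"
    sums = [bins[i] + bins[i + 1] for i in range(0, len(bins), 2)]
    diffs = [bins[i] - bins[i + 1] for i in range(0, len(bins), 2)]
    return sums, diffs

# A hypothetical 8-bin HSV histogram reduced to 4 coarse bins.
sums, diffs = haar_step([4, 2, 0, 1, 5, 5, 3, 0])
```

Applying the step repeatedly yields the multi-resolution representation that makes the descriptor scalable.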



Color Layout. This descriptor effectively represents the spatial distribution of color of visual signals in a very compact form. This compactness allows visual signal matching functionality with high retrieval efficiency at very small computational cost. It provides image-to-image matching as well as ultra high-speed sequence-to-sequence matching, which requires many repetitions of similarity calculations.


Color Structure. The Color Structure descriptor is a color feature descriptor that captures both color content (similar to a color histogram) and information about the structure of this content. Its main functionality is image-to-image matching and its intended use is for still-image retrieval, where an image may consist of either a single rectangular frame or arbitrarily shaped, possibly disconnected, regions. The extraction method embeds color structure information into the descriptor by taking into account all colors in a structuring element of 8x8 pixels that slides over the image, instead of considering each pixel separately.


Texture Browsing. The Texture Browsing Descriptor is useful for representing homogeneous texture for browsing-type applications, and requires only 12 bits (maximum). It provides a perceptual characterization of texture, similar to a human characterization, in terms of regularity, coarseness and directionality. The computation of this descriptor proceeds similarly to that of the Homogeneous Texture Descriptor. First, the image is filtered with a bank of orientation- and scale-tuned filters (modeled using Gabor functions); from the filtered outputs, two dominant texture orientations are identified. Three bits are used to represent each of the dominant orientations. This is followed by analyzing the filtered image projections along the dominant orientations to determine the regularity (quantified to 2 bits) and coarseness (2 bits x 2). The second dominant orientation and second scale feature are optional.



Edge Histogram. The edge histogram descriptor represents the spatial distribution of five types of edges, namely four directional edges and one non-directional edge. Since edges play an important role in image perception, it can retrieve images with similar semantic meaning. Thus, it primarily targets image-to-image matching (by example or by sketch), especially for natural images with non-uniform edge distribution. In this context, the image retrieval performance can be significantly improved if the edge histogram descriptor is combined with other descriptors such as the color histogram descriptor.


Region Shape. The shape of an object may consist of either a single region or a set of
regions, as well as some holes in the object. Since the Region Shape descriptor makes use of
all pixels constituting the shape within a frame, it can describe any shape, i.e. not only a
simple shape with a single connected region but also a complex shape that consists of holes
in the object or several disjoint regions. The Region Shape descriptor not only can describe
such diverse shapes efficiently in a single descriptor, but is also robust to minor
deformation along the boundary of the object.


Contour Shape. The Contour Shape descriptor captures characteristic shape features of an
object or region based on its contour. It uses the so-called Curvature Scale Space
representation, which captures perceptually meaningful features of the shape.


4.6 Distance Measure

A distance is a numerical description of how far apart objects are at any given
moment in time. In physics or everyday discussion, distance may refer to a physical length,
a period of time, etc. In mathematics, the Euclidean distance or Euclidean metric is the
"ordinary" distance between two points that one would measure with a ruler, which can
be proven by repeated application of the Pythagorean Theorem.
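In n dimensions, the Euclidean distance between two points p = (p_1, ..., p_n) and q = (q_1, ..., q_n) can be written as:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```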



Chapter 5

DATASETS

This chapter illustrates the image data set and the reference ontology used for this
implementation.



5.1 Visible Human Image Data Set

Images from the Visible Human [24] Data Set were used. The Visible Human
dataset contains anatomically detailed, three-dimensional representations of the normal
male and female human bodies. This digital image dataset contains complete human male
and female cadavers in MRI, CT and anatomical modes. The images were obtained via
academic licensing through the National Library of Medicine.

Figure 1: Image of Abdomen from Visible Human Data Set.

Figure 2: Image of Thigh from Visible Human Data Set.


5.2 University Of Washington Digital Anatomist Reference Ontology

The University of Washington Digital Anatomist (UWDA) [22] reference ontology
from the medical domain was chosen. UWDA is an abridged version of the Foundational
Model of Anatomy [27] ontology and is incorporated into the UMLS (Unified Medical
Language System) Metathesaurus. UWDA is a domain ontology that represents knowledge
of the human body. It contains classes and relationships that provide a symbolic model of
the structure of the human body. This domain ontology is computer based and was designed
for bioinformatics. It was developed by the Structural Informatics Group at the University
of Washington. UMLS was obtained through academic licensing in order to access the
UWDA ontology.

Chapter 6

PREPROCESSING TOOLS

This chapter illustrates the tools used to process the image data set and create the
reference ontology.


6.1 MySQL

MySQL is an open source SQL Database Management System. MySQL was used
in this implementation to house the UMLS database containing the University of
Washington Digital Anatomist reference ontology. The ontological terms contained in the
UWDA ontology were retrieved using SQL queries from the MySQL instance of UMLS.


6.2 Adobe Photoshop

Adobe Photoshop is a graphics editor developed by Adobe Systems for image
manipulation. Images obtained from the Visible Human Data Set are in raw format. Adobe
Photoshop was used to convert these images to JPEG format in order to access any
information contained in the images.


6.3 M-Ontomat Annotizer

M-OntoMat-Annotizer (M stands for Multimedia) [26] is a user-friendly tool
developed inside the aceMedia project. It is an extension of the CREAM (CREAting
Metadata for the Semantic Web) framework and its reference implementation,
OntoMat-Annotizer. The M-OntoMat-Annotizer Visual Descriptor Extraction Tool,
developed as a plug-in to OntoMat-Annotizer, presents a graphical interface for loading and
processing visual content (images and videos), extraction of visual features and association
with domain ontology concepts. M-OntoMat-Annotizer is a Java-based application and is
distributed under the GNU Lesser General Public License [R1].



Chapter 7

PREPROCESSING METHODS

The following chapter describes the various steps involved in preparing the image
data set and the reference ontology for this implementation using the tools and data sets
described in the above chapters.

7.1 Selection Of Images From Visible Human

A subset of 90 images from the Visible Human Data Set was chosen. This subset
consisted of both the male and female images spanning from head to toe of the human
body. 15 categories based on different regions of the human body such as Head, Abdomen,
Thigh, Adductor Magnus, Kidney, Eyes, Brain, Gluteal Muscles, Hamstring, Biceps,
Pectoralis Major, Colon, Pelvis, Thorax and Lungs were chosen. The categories were
chosen such that the images range in their content, i.e. they have different colors, shapes and
textures. 90 images were selected by picking 6 images from each of the 15 categories to act
as test and training images for our experiments.


7.2 Extraction Of UWDA Ontological Terms From The UMLS Database

A subset of 15 UWDA ontological terms corresponding to the 15 categories of
images described in the above section was extracted from the UMLS database for our
experiment. MySQL was used to install the UMLS database and SQL queries were
designed to extract the UWDA ontological terms from the UMLS database. The UMLS
database has various tables containing information such as concepts, definitions, terms, etc.
The following SQL query was used to extract the UWDA ontological terms and their
definitions from the UMLS tables.

Figure 3: Screenshot of SQL query used to extract UWDA terms from UMLS.
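The query itself appears only as a screenshot above, so a query of this general shape can serve as an illustration. The table and column names (MRCONSO for concept strings, MRDEF for definitions, SAB for the source vocabulary) follow standard UMLS conventions, but the exact query used in the thesis is not reproduced here — treat this as a hypothetical reconstruction:

```python
def build_uwda_term_query(term_name):
    """Build a parameterized SQL query for pulling a UWDA concept and its
    definition from the UMLS MRCONSO/MRDEF tables. Illustrative only; the
    actual query used in the thesis is shown only as a screenshot."""
    query = (
        "SELECT c.CUI, c.STR, d.DEF "
        "FROM MRCONSO c LEFT JOIN MRDEF d ON c.CUI = d.CUI "
        "WHERE c.SAB = 'UWDA' AND c.STR = %s"
    )
    return query, (term_name,)

# The query string and parameters would be passed to a MySQL cursor:
query, params = build_uwda_term_query("Abdomen")
```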


7.3 Creation Of UWDA Reference Ontology In DAML (DARPA Agent Markup
Language)

An empty ontology file was created in the DAML format. The 15 extracted
ontology terms and definitions were then added to the file in DAML format using the
DAML references and guidelines. This file containing the 15 UWDA ontological terms
was used in M-Ontomat Annotizer as the reference ontology file in DAML format.


7.4 Loading Domain Ontology In M-Ontomat Annotizer

The reference ontology DAML file is loaded into M-Ontomat Annotizer using the
Ontology Explorer. The Ontology Explorer displays all the ontological terms contained in
the domain ontology file created above. The Ontology Explorer provides a way to create
prototype instances for ontology terms to be linked to image feature content.



7.5 Conversion Of Image Format To JPEG

The subset of images chosen for the implementation from the Visible Human Data
Set is in raw format. These images need to be converted to bitmap or JPEG format to
access the image content information. The raw images were opened with Adobe Photoshop
after specifying the width, size and resolution as per the guidelines set by the National
Library of Medicine for this data set. These images were then saved as JPEG files through
Adobe Photoshop. The JPEG image files were then used for image segmentation and
feature extraction as described in the next section.


7.6 Extracting Image Features

Image features were extracted for every image in the data set. Each image will have 8 XML files
containing the image content, 1 RDF file containing the domain ontology and references to
the XML files, and 1 DAML file containing the domain ontology terms. These files form
the core data set and were used to build the semantic retrieval system described in the next
section.


Figure 4: Screenshot of VDE tool in M-Ontomat
Annotizer showing the image feature extraction
and annotation process.


Chapter 8


METHODS


This chapter describes the methodology used in the development of the system to
semantically index images and calculate the retrieval efficiency. The first step in the
implementation involved selecting the test and the training images. Once the test set and
the training set were obtained, every test image was compared to every training image by
extracting all the feature descriptors for each image and calculating the distance measure
for each feature type. Distance matrices were built containing the distance measures for test
versus training images for every feature. The test images were then classified using
similarity matching algorithms and the Ensemble classification approach. The accuracy
rate was determined for every approach. The following sections describe the methods and
approach used to develop the system.

8.1 Training And Test Set


The chosen subset of 90 images is divided into 2 sets. The first set is the training set
and the second set is the test set. 3 approaches were followed for populating the test set and
the training set. In the first approach, one representative image from each of the 15
categories (15 images in total) was used as the training set and the remaining images were
in the test set. Many studies show
that with a larger training set, the accuracy rate results can be improved. Hence, in the
second approach, a training set that contained 50% of the images and a test set that
contained the remaining 50% of the images were used. Also, an algorithm was developed
to randomly populate both the test and the training images. In the third approach, the test
and the training images were randomly populated. However, the training set contained 75%
of images and the remaining 25 % of the images were in the test set.
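The random 50/50 and 75/25 splits described above can be sketched as follows (illustrative only; the integer IDs stand in for the actual Visible Human images):

```python
import random

def split_train_test(image_ids, train_fraction, seed=None):
    """Randomly partition image IDs into training and test sets, with
    train_fraction of the images going to the training set."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 90 placeholder image IDs, split 75% training / 25% test:
train, test = split_train_test(range(90), 0.75, seed=42)
```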

For every image in the test set, the distance measure between the test image and
every other training image for a particular feature descriptor was calculated and stored in a
distance matrix for that feature descriptor. Also, for every training image, the distance
between the training image and every other training image for a particular feature
descriptor was calculated and stored in a distance matrix for that particular feature
descriptor.

8.2 Extracting Image Content From XML Files


Image content information for a particular image is extracted from the descriptor
XML files. Every visual descriptor file has a different format and hence different XPath
expression methods were developed for parsing each type of file. Image content from the
XML files is extracted at run time while calculating the similarity measure for each image.
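A sketch of this kind of run-time extraction using Python's standard library follows. The element names below are placeholders; each MPEG-7 descriptor file in the system has its own structure and would need its own XPath expression:

```python
import xml.etree.ElementTree as ET

def extract_coefficients(xml_text, path):
    """Parse a descriptor XML document and return the numeric values
    found at the given XPath-style location."""
    root = ET.fromstring(xml_text)
    return [float(el.text) for el in root.findall(path)]

# Hypothetical descriptor file with three coefficient values:
doc = """<Descriptor>
  <Coeff>12</Coeff><Coeff>7</Coeff><Coeff>3</Coeff>
</Descriptor>"""
values = extract_coefficients(doc, "./Coeff")
```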

8.3 Calculating Distance Measure


Distance measure calculations require the image content information for the 2
images whose distance needs to be calculated. The image content information is extracted
for the 2 images as described in the above section. Every feature descriptor has a different
formula for calculating the distance as attributes of the descriptor are unique to a particular
descriptor. The distance measure is thus calculated using one of the following formulae
depending on which feature descriptor the distance measure is being calculated for.

Dominant Color. The distance between two dominant color descriptors, F_1 and F_2, is
calculated by the following distance function [28]:

D^2(F_1, F_2) = \sum_{i=1}^{N_1} p_{1i}^2 + \sum_{j=1}^{N_2} p_{2j}^2 - \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} 2 a_{i,j} p_{1i} p_{2j}    (1)

where F is the dominant color, p is the corresponding percentage value, N is the total
number of dominant colors, and a_{k,l} is the similarity coefficient between two colors
c_k and c_l:

a_{k,l} = 1 - d_{k,l} / d_{max}  if d_{k,l} <= T_d,  and  a_{k,l} = 0  otherwise    (2)

d_{k,l}, T_d and d_{max} are defined as follows:

d_{k,l} = || c_k - c_l ||    (3)

where T_d is the maximum distance for two colors to be considered similar, and

d_{max} = \alpha T_d    (4)

where \alpha is the dominant color coefficient between 1 and 1.5 [28].

Color Layout. The distance between two color layout descriptor values [Y, Cb, Cr] and
[Y', Cb', Cr'] can be calculated as follows [28]:

D = \sqrt{\sum_i w_{Yi} (Y_i - Y'_i)^2} + \sqrt{\sum_i w_{Cbi} (Cb_i - Cb'_i)^2} + \sqrt{\sum_i w_{Cri} (Cr_i - Cr'_i)^2}    (6)

where w_{Yi}, w_{Cbi} and w_{Cri} denote weighting values for each coefficient. Y, Cb and Cr are
color layout descriptors also known as YCoeff, CbCoeff and CrCoeff.

Color Structure. The color structure distance measure between two images A and B is
computed from their descriptors as shown in the following formula [28]:

D(A, B) = \sum_{i=1}^{M} | h_A(i) - h_B(i) |    (7)

where h_A and h_B are the color structure descriptor vectors of images A and B and M is the
total number of color structure descriptor bins.
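As a sketch, the L1-style sum of absolute differences in equation (7) (and the analogous sums used for the other histogram-style descriptors) can be computed as:

```python
def l1_distance(h_a, h_b):
    """L1 (city-block) distance between two descriptor vectors, as in
    equation (7) for the Color Structure descriptor."""
    if len(h_a) != len(h_b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(h_a, h_b))

# Example with two small, hypothetical 4-bin histograms:
d = l1_distance([3, 0, 2, 5], [1, 1, 2, 4])  # |3-1|+|0-1|+|2-2|+|5-4| = 4
```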

Texture Browsing. The texture browsing descriptor captures the regularity (v_1), direction
(v_2 and v_4), and scale (v_3 and v_5) in the texture pattern. The distance between two sets of
corresponding coefficients of the TBC vectors is shown in the following formula [28]:

D(TBC, TBC') = \sum_{i=1}^{5} | v_i - v'_i |    (8)

Edge Histogram. The edge histogram distance E between two sets of inverse quantized edge
histograms A and B is shown below [28]:

E(A, B) = \sum_{i=1}^{M} | h_A(i) - h_B(i) |    (9)

where h_A and h_B are Edge Histogram descriptors and M is the total number of Edge
Histogram bins.

Contour Shape. The contour shape distance measure M is computed as a weighted sum of the
distance measure between the global curve parameters and the distance measure between
the Curvature Scale Space (CSS) peaks associated with the object and the semantic entity
[28]:

M = w_E | E_A - E_B | + w_C | C_A - C_B | + w_{css} M_{css}    (10)

where E and C are the absolute values of Eccentricity and Circularity, and M_{css} is the
distance measure value between the CSS matching peaks, with an additional penalty for
each unmatched peak equivalent to the missing peak height [28]:

M_{css} = \sum_i \sqrt{ (xpeak_{A,i} - xpeak_{B,i})^2 + (ypeak_{A,i} - ypeak_{B,i})^2 }    (11)

where xpeak and ypeak are coordinate values in the x and y axes and i runs over the matched
CSS peaks.

Region Shape. The distance function between two region shape descriptors is obtained from
the following formula [28]:

D(p, q) = \sum_{i=1}^{M} | p_i - q_i |    (12)

where p and q are region shape attributes and M is the total number of attributes.

Scalable Color. The distance function between two scalable color descriptors is obtained
from the following formula [28]:

D(p, q) = \sum_{i=1}^{M} | p_i - q_i |    (13)

where p and q are scalable color attributes and M is the total number of attributes.

8.4 Calculating Combined Distance Measure


Combined distance measure is calculated by summing the weighted distances
obtained for all the image descriptors as described in the above section. Different weights
were used while combining all the distances. The process of weight determination is
explained in the Results and Analysis section.
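A minimal sketch of the weighted combination (descriptor names and weights here are illustrative; the weights actually used are discussed in the Results and Analysis section):

```python
def combined_distance(distances, weights):
    """Weighted sum of per-descriptor distances for one image pair.
    `distances` and `weights` map descriptor name -> value."""
    return sum(weights[name] * d for name, d in distances.items())

# Hypothetical per-descriptor distances for one test/training pair:
dists = {"ScalableColor": 0.2, "RegionShape": 0.5, "EdgeHistogram": 0.3}
equal = {name: 1.0 for name in dists}
d = combined_distance(dists, equal)  # 0.2 + 0.5 + 0.3 = 1.0
```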

8.5 Creating Distance Matrix


A distance matrix is created for every feature descriptor. The elements of the matrix
are the distance measures calculated using the methods stated in the above section. The
dimensions of the matrix are Test X Training or Training X Training. In total, 17 distance
matrices are generated for image retrieval calculations. 8 matrices, one for every feature
descriptor, are created for the dimension Test X Training. The remaining 8 matrices, one
for every feature descriptor, are created for the dimension Training X Training. These
distance matrices are used in the image retrieval algorithms to calculate the retrieval
accuracy rate as described in the following sections. The elements of the last distance
matrix contain the combined distances of all image descriptors.
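Building one such Test X Training matrix can be sketched as follows (the distance function and the toy one-dimensional "images" are placeholders):

```python
def build_distance_matrix(test_images, train_images, distance):
    """Return a matrix (list of rows) where entry [i][j] is the
    distance between test image i and training image j."""
    return [[distance(t, r) for r in train_images] for t in test_images]

# Toy 1-D "images" with absolute difference as the distance:
m = build_distance_matrix([0, 10], [1, 4, 9], lambda a, b: abs(a - b))
# m == [[1, 4, 9], [9, 6, 1]]
```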

8.6 Calculating Retrieval Accuracy Rate

Two algorithms based on different classification approaches were developed to
calculate the retrieval accuracy rate. The first algorithm uses a simple classification
technique based on smallest distance matching. The second algorithm follows the
Ensemble Classification technique.

Smallest Distance Classification. The algorithm for smallest distance classification is
based on calculations on the distance matrices. To further explain the algorithm, let us consider
any distance matrix - Test X Training for Scalable Color. The first row of the matrix
containing the distance measure for the test image and all the training images is scanned
containing the distance measure for the test image and all the training images is scanned
and the smallest distance measure is calculated using fundamental sorting techniques.
Once, the smallest distance measure is obtained, the first row is scanned again to find all
the training images that have the same smallest distance measure. A count of all the
matches and the matching training image IDs are stored for calculating the retrieval
accuracy. The ontology term for the test image is retrieved using XPath expression parsing
of the ontology RDF files. The ontology terms are retrieved for all the matching training
images using XPath expressions as well. If any one of the training ontology terms matches
the test ontology term then the algorithm classifies the image to the right category for
identification. Each positive match is reflected in the accuracy count. The algorithm is
repeated for all the rows in the distance matrix. The overall accuracy is obtained once the
algorithm finishes with all the rows. The overall accuracy is a percentage obtained as a
ratio of the number of correctly classified test images over the total number of test images. The
following are the different retrieval efficiencies that were calculated for all the test and
training images using the smallest distance matching algorithm: independent retrieval
efficiency for every feature descriptor, and retrieval efficiency when combining all the
feature descriptors.
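The per-row procedure described above can be sketched as a simplified nearest-neighbour classifier. The ontology-term lookup is reduced to a plain list here, whereas the real system parses the terms out of the RDF files:

```python
def classify_row(row, train_terms):
    """Return the set of ontology terms of all training images tied
    for the smallest distance in one row of a Test X Training matrix."""
    smallest = min(row)
    return {train_terms[j] for j, d in enumerate(row) if d == smallest}

def accuracy(matrix, test_terms, train_terms):
    """Fraction of test images whose ontology term appears among the
    terms of their nearest training images."""
    hits = sum(
        1 for i, row in enumerate(matrix)
        if test_terms[i] in classify_row(row, train_terms)
    )
    return hits / len(matrix)

# Toy example: two test images, three training images.
m = [[0.1, 0.5, 0.9], [0.7, 0.2, 0.2]]
train = ["Head", "Eyes", "Thigh"]
test = ["Head", "Thigh"]
acc = accuracy(m, test, train)  # both rows match their term -> 1.0
```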

Ensemble Classification. The Ensemble technique is a popular and efficient classification
technique. It derives from the concept of voting. Every image descriptor votes for a
particular category. The test image will be classified to the category that has the maximum
number of votes. An algorithm was developed to reflect this method. The algorithm uses
the distance matrices produced for all the image descriptors. The algorithm considers the
distance matrices belonging to a particular image descriptor. The first row of the matrix
containing the distance measure for the test image and all the training images is scanned
and the smallest distance measure is calculated using fundamental sorting techniques.
Once, the smallest distance measure is obtained, the first row is scanned again to find all
the training images that have the same smallest distance measure. A count of all the
matches and the matching training image IDs are stored for calculating the retrieval
accuracy. The ontology term for the test image is retrieved using XPath expression parsing
of the ontology RDF files. The ontology terms are retrieved for all the matching training
images using XPath expressions as well. The training ontology terms retrieved are stored in
an array. These steps are repeated for the first row of every distance matrix belonging to all
the image descriptors. At the end of these steps, the array contains the matched training
image ontology terms. Each set of ontology terms added to this list by the feature
descriptors are analogous to votes added. The frequency of all the ontology terms is
counted and the term with the highest frequency/vote is obtained. This term is then
compared to the ontology term for the test image and classified as positive if they match
and the count of positive matches is tracked for retrieval accuracy rate calculations. The
above procedure is repeated for all the rows in the distance matrices i.e. for all the test
images. The overall retrieval accuracy rate is calculated as described earlier.
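The voting step can be sketched as follows (simplified: each descriptor contributes the terms of its nearest training images and the most frequent term wins; tie-breaking here is arbitrary, which mirrors the skewed-vote problem discussed in the Results chapter):

```python
from collections import Counter

def ensemble_vote(votes_per_descriptor):
    """Given, for one test image, the list of ontology terms voted by
    each descriptor, return the term with the most votes."""
    counts = Counter()
    for terms in votes_per_descriptor:
        counts.update(terms)
    return counts.most_common(1)[0][0]

# Hypothetical votes from three descriptors for one test image:
votes = [["Head"], ["Head", "Eyes"], ["Eyes", "Head"]]
winner = ensemble_vote(votes)  # "Head" wins with 3 votes vs 2 for "Eyes"
```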

8.7 Improving Retrieval Accuracy Rates


Ten Fold Cross Validation and Empirical Weight Optimization techniques were
used to improve the retrieval accuracy rates produced by the system.

Ten Fold Cross Validation. In the Ten Fold Cross Validation approach, all the
calculations performed in the system are repeated 10 times and the results are averaged at
the end of the last iteration. This approach is aimed at generalizing the errors caused by
random operations such as populating the test set and the training set. The whole program
runs in a loop of 10 iterations. In each of the iterations, the training and the test sets are
populated, the distance matrices and accuracy rates are calculated. At the end of each of
the iterations the results are summed. At the end of all the iterations the results are
averaged.
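As described, this amounts to ten repetitions of the random split with averaged results (a repeated random subsampling scheme), which can be sketched as follows (the evaluation function is a placeholder for the full split/distance/accuracy pipeline):

```python
def repeated_evaluation(run_once, n_iterations=10):
    """Run the full split/evaluate cycle n_iterations times and
    average the resulting accuracy rates."""
    total = 0.0
    for i in range(n_iterations):
        total += run_once(i)  # run_once repopulates the sets, returns accuracy
    return total / n_iterations

# Placeholder evaluation returning a fixed sequence of accuracies:
fake_accuracies = [0.8, 0.9, 0.7, 0.8, 0.85, 0.75, 0.9, 0.8, 0.7, 0.8]
avg = repeated_evaluation(lambda i: fake_accuracies[i])
```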


Empirical Weight Optimization. Empirical weight optimization technique was used to
determine the weights while calculating the weighted combined distance measure.
Combined distance measure is calculated as a weighted sum of all the descriptors. To start
with, all the descriptors are assigned equal weights. One of the descriptors is chosen and its
corresponding weight is varied from +1 to -1 in increments of +/- 0.1 each time. For every
weight measure, the difference between the maximum weight and the weight chosen for the
descriptor is calculated and the difference is distributed among all the other descriptors
equally. Combined accuracy rate is calculated for every variation. This technique is then
applied to all the other descriptors.
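One way to sketch the sweep for a single descriptor (the weight grid and redistribution rule follow the description above; the evaluation function is a placeholder for the full combined-accuracy calculation):

```python
def sweep_descriptor_weight(names, target, evaluate, max_weight=1.0):
    """Vary the weight of one descriptor over a grid from +1 to -1 in
    steps of 0.1, redistributing the slack equally among the others,
    and return the best (accuracy, weights) pair found."""
    others = [n for n in names if n != target]
    best = (-1.0, None)
    for step in range(21):                      # +1.0, +0.9, ..., -1.0
        w = round(max_weight - 0.1 * step, 1)
        share = (max_weight - w) / len(others)  # slack split equally
        weights = {n: share for n in others}
        weights[target] = w
        acc = evaluate(weights)
        if acc > best[0]:
            best = (acc, weights)
    return best

# Placeholder evaluator that simply rewards weight on "RegionShape":
names = ["RegionShape", "ScalableColor", "ColorStructure"]
best_acc, best_w = sweep_descriptor_weight(
    names, "RegionShape", lambda w: w["RegionShape"])
```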


Chapter 9


RESULTS, DISCUSSION AND ANALYSIS


The following chapter illustrates the results obtained from the implementation
approach described above. An analysis of the results and the various methods used to
improve the implementation results are described in detail in this section.

9.1 Initial Results


The initial results for the implementation contained 15 images in the training set and 75
images in the test set. The tables below show the results for test vs. training and training vs.
training.
Table 1: Accuracy rate for training set.

Training Set = 15 Images, Training Set = 15 Images
Image Descriptor Accuracy Rate
Color Layout 100%
Color Structure 100%
Contour Shape 100%
Dominant Color 100%
Edge Histogram 100%
Region Shape 100%
Scalable Color 100%
Texture Browsing 100%

From the training vs. training results table we can see that the retrieval accuracy rate for
all training images is 100%. The retrieval rate for training images is calculated to verify
that the algorithm developed is able to correctly classify images in the training set.

Table 2: Accuracy rate for test set.

Training Set = 15 Images, Test Set = 75 Images
Image Descriptor Accuracy Rate
Color Layout 42.67%
Color Structure 50.67%
Contour Shape 14.67%
Dominant Color 37.33%
Edge Histogram 41.33%
Region Shape 68%
Scalable Color 53.33%
Texture Browsing 29.33%

From the test vs. training results table we see that highest accuracy rate is obtained
by indexing images only on the Region Shape descriptor. Scalable Color and Color
Structure provide the second best retrieval rates. This accuracy rate is definitely better
compared to a random classifier accuracy rate of 6.66 %. The random classifier rate is
obtained as the percentage probability of the test image being randomly assigned to one of
the 15 categories.

9.2 Increased Training To Test Ratio


Data mining best practices indicate that the Training to Test ratio should be high for
improved retrieval accuracy rates. In our experiments we selected 2 ratios for training and
test sets. The first ratio was 2/3rd training and 1/3rd test. The second ratio was 1/2 training
and 1/2 test. The training and the test sets were populated randomly based on another data
mining best practice guidelines. The following table indicates the results obtained with the
2 ratios of training and test sets.



Table 3: Accuracy rate for 75% images in
training set and 25% images in test set.

Training Set = 75%, Test Set = 25%
Image Descriptor Accuracy Rate
Color Layout 48%
Color Structure 87%
Contour Shape 26.07%
Dominant Color 47.83%
Edge Histogram 65.22%
Region Shape 65.22%
Scalable Color 78.26%
Texture Browsing 52.17%


With training to test ratio being 2/3rd and 1/3rd, the best retrieval accuracy rates are
obtained for Color Structure descriptor. Scalable Color also gives good results.


Table 4: Accuracy rate for 50% images in
training set and 50% images in test set.

Training Set = 50%, Test Set = 50%
Image Descriptor Accuracy Rate
Color Layout 44.44%
Color Structure 57.77%
Contour Shape 17.77%
Dominant Color 37.77%
Edge Histogram 44.44%
Region Shape 68.88%
Scalable Color 64.44%
Texture Browsing 62.22%



With training to test ratio being 1/2 and 1/2, the best retrieval accuracy rates are
obtained for Region Shape descriptor followed by Scalable Color.

From the results, we can see that the retrieval accuracy rates have significantly
improved with a higher number of images in the training set. By increasing the number of
images in the training set, the maximum value for the retrieval accuracy rate for a
descriptor has increased from 68% to 87%.

9.3 Combined Descriptors


To further improve the accuracy rate, we combined the distance measures for all the
descriptors and calculated the accuracy rate on the combined value. The above-mentioned
ratios for the test and training sets were used. The test and the training sets were also
randomly populated.

Table 5: Combined accuracy rate for training set = 50% and test set = 50%.

Training Set = 50%, Test Set = 50%
Image Descriptors Accuracy Rate
Combined Descriptors (Equal Weights) 73.33%


With test to training ratio being ½ and ½, the combined accuracy rate is shown
above.






Table 6: Combined accuracy rate for training set = 75% and test set = 25%.

Training Set = 75%, Test Set = 25%
Image Descriptors Accuracy Rate
Combined Descriptors (Equal Weights) 86.95%


With test to training ratio being 1/3 and 2/3, the combined accuracy rate is shown
above.

The retrieval accuracy rate obtained by combining all the descriptors is almost
equivalent to the highest retrieval accuracy rate obtained for one of the descriptors in the
previous experiment (Color Structure). Due to the combined retrieval accuracy rates not
being significantly higher compared to accuracy rates obtained by single descriptors, we
experimented with some more methods to improve the combined accuracy rates as
described in the following sections.

9.4 Ensemble Classification


The next approach used to improve the retrieval accuracy rate was Ensemble
Classification.

Table 7: Accuracy rate for Ensemble
Classification for 50% test and 50% training.

Training Set = 50%, Test Set = 50%
Image Descriptors Accuracy Rate
Ensemble 37.77%



With test to training ratio being ½ and ½, the Ensemble accuracy rate is shown
above.

Table 8: Accuracy rate for Ensemble
Classification for 75% training and 25% test.
Training Set = 75%, Test Set = 25%
Image Descriptors Accuracy Rate
Ensemble 43.47%


With test to training ratio being 1/3 and 2/3, the Ensemble accuracy rate is shown
above.

Good results were not obtained using the Ensemble classification approach due to
the votes being distorted for certain descriptors. Due to the nature of the image descriptors,
we found that there was often more than one training image with the smallest distance
measures for a particular test image. The images from the Visible Human Data Set are very
similar in terms of dominant colors and textures in the images. Many training images
having the same smallest distance measure meant that the test images were voted to be in
different training classes thereby skewing the voting calculations for the Ensemble
classification method.

For example, for test image 1, training images 3, 8, and 9 had the same smallest
distance measures. However, training images 3 and 9 voted the test image to be in the
"Head" class whereas training image 8 voted for "Eyes". While predicting the class of the
test images using the Ensemble classification technique, we considered all the votes for a
particular test image across all the descriptors' distance matrices, calculated the vote
with the maximum occurrence, and assigned the test image to the class with the maximum
votes. In the above example, the test image will be assigned to the "Head" class. In fact,
the test image belongs to the "Eyes" class. Hence, the retrieval accuracy rate is reduced due
to incorrect classification.

9.5 Ten Fold Cross Validation


We used Ten Fold Cross Validation method to further improve the accuracy rates
for single descriptors and combined descriptors. With the Ten Fold Cross Validation we
can average out any errors that might occur due to random selection of training and test
images.

From the table below, for the training to test ratio of 2/3rd and 1/3rd, the best results
are obtained when all the descriptors are combined. The Ensemble accuracy rate is also
improved but the results are not as high as the combined accuracy rate. However, Scalable
Color, Edge Histogram and Color Structure provide good results as well.

Table 9: Accuracy rate for Ten Fold Cross
Validation for 75% training and 25 % test.
Training Set = 75%, Test Set = 25%
Ten Fold Cross Validation
Image Descriptor Accuracy Rate
Color Layout 55.65%
Color Structure 71.304%
Contour Shape 30.86%
Dominant Color 52.60%
Edge Histogram 71.73%
Region Shape 66.95%
Scalable Color 75.65%
Texture Browsing 65.65%
Combined Descriptors (Equal Weights) 84.34%
Ensemble 64.78%

From the table below, for the training to test ratio of ½ and ½, the best results are
obtained when all the descriptors are combined. The Ensemble accuracy rate is also
improved but the results are not as high as the combined accuracy rate. However, Scalable
Color and Region Shape descriptors provide good results as well.

Table 10 : Accuracy rate for Ten Fold Cross
Validation for 50% training and 50 % test.

Training Set = 50%, Test Set = 50%
Ten Fold Cross Validation
Image Descriptor Accuracy Rate
Color Layout 52%
Color Structure 64.22%
Contour Shape 26.22%
Dominant Color 46.44%
Edge Histogram 63.55%
Region Shape 68.44%
Scalable Color 70.22%
Texture Browsing 55.11%
Combined Descriptors (Equal Weights) 81.33%
Ensemble 62.22%


From all the above experiments, we observed that Scalable Color, Color Structure,
Region Shape and Edge Histogram provided consistently good results.
However, Contour Shape consistently has the lowest accuracy rates followed by Texture
Browsing and Dominant Color. Color Layout lies in between, with an average of around
50% accuracy rate across all experiments. The next section describes experiments done by
excluding descriptors with low individual retrieval accuracy rates while calculating the
overall combined accuracy rate.




9.6 Excluding Descriptors


Texture Browsing and Contour Shape descriptors were excluded from the
combined accuracy rate calculations. The results obtained from this exclusion are shown
below. There is an increase in the combined accuracy rate (87.39%) compared to previous
experiment results (~84%).

Although Contour Shape has consistently given low accuracy rates, Texture
Browsing did give average results in some of the experiments described above. Hence,
removing both the Texture Browsing descriptor and the Contour Shape descriptor from the
combined descriptor calculations did not significantly improve the accuracy rates.

Table 11: Accuracy rate excluding Contour Shape and Texture Browsing.

Training Set = 50%, Test Set = 50%
Combined Descriptors (Equal Weights, No Contour Shape and Texture Browsing) 84.88%

Training Set = 75%, Test Set = 25%
Combined Descriptors (Equal Weights, No Contour Shape and Texture Browsing) 87.39%


The combined accuracy rate significantly improved when Contour Shape
Descriptor was excluded from the combined accuracy rate calculations. A high accuracy
rate of 90.434% was obtained with the training and test ratio as 2/3rd and 1/3rd.

Table 12: Accuracy rate excluding Contour Shape descriptor.

Training Set = 50%, Test Set = 50%
Combined Descriptors (Equal Weights, No Contour Shape) 84.44%

Training Set = 75%, Test Set = 25%
Combined Descriptors (Equal Weights, No Contour Shape) 90.434%


Accuracy rates obtained for Contour Shape have been consistently lower across all
experiments and hence excluding it from the combined descriptor calculations significantly
improved the retrieval accuracy rates.

9.7 Empirical Weight Optimization


By using the Empirical Weight Optimization Technique, we were able to further
improve the retrieval accuracy rates by combining weighted descriptors and not excluding
any descriptors from the semantic metadata. The highest retrieval accuracy rate obtained
from this approach is 93.48%, with weights for the descriptors as shown in Table 13. These
results also show that by maximizing the weight for Region Shape, the accuracy rates
significantly improve when combining all the descriptors.

Table 13: Accuracy rates for Empirical Weight
Optimization.
Training Set = 75%, Test Set = 25%
Image Descriptor Weights Accuracy Rate
Region Shape = 1.9 93.48%
Other descriptors = 0.0148
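The empirical weight optimization amounts to searching over candidate weight settings and keeping the one that maximizes accuracy on the held-out test split. The following is a hypothetical sketch, not the thesis implementation; the `eval_accuracy` callback and the candidate grid are assumptions.

```python
# Pick the weight setting that maximizes retrieval accuracy on a
# held-out test split; eval_accuracy(weights) -> accuracy in [0, 1].
def optimize_weights(eval_accuracy, candidate_settings):
    return max(candidate_settings, key=eval_accuracy)

# Example: sweep the Region Shape weight while fixing the others,
# mirroring the setting reported in Table 13.
candidates = [{"Region Shape": w, "others": 0.0148}
              for w in (0.5, 1.0, 1.5, 1.9)]

# Toy accuracy surface that peaks near Region Shape = 1.9.
toy_accuracy = lambda w: 1.0 - abs(w["Region Shape"] - 1.9)
best = optimize_weights(toy_accuracy, candidates)
print(best["Region Shape"])  # 1.9
```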


Chapter 10


RELATED WORK


10.1 Knowledge – Assisted Video Analysis And Object Detection


Gabriel Tsechpenakis, Giorgos Akrivas, Giorgos Andreou, Giorgos Stamou and
Stefanos Kollias presented a method for object recognition in video sequences [28]. The
goal of the system is to extract semantics automatically by detecting and tracking moving
objects in video sequences and then using low-level features of each semantic entity, in
order to associate moving objects with them. The proposed algorithm consists of two
main steps: the detection and localization of "regions-of-interest" in a sequence, and the
estimation of the main mobile object contours. Visual descriptors, which are used to
model visual content associated with semantic entities, are categorized according to the
MPEG-7 framework. Visual descriptors extracted were mapped to the conceptual terms
to build the semantic indexing metadata. Similarity matching algorithms were used to
match the moving regions extracted. The simulation of this system was able to identify
moving regions based on the extracted semantics.

A similar approach was used in our implementation, though ours focused on the
content of images rather than videos. The main difference is that in our implementation
the semantics are manually extracted by selecting the region of interest, and a
formalized domain ontology is used for mapping the extracted content to meaningful
terms. Also, the above system used similar videos to build the training and test sets,
whereas in our implementation we used images diverse in their content.





10.2 Retrieval of Multimedia Objects By Combining Semantic Information From
Visual And Textual Descriptors


Mats Sjöberg, Jorma Laaksonen, Matti Pöllä and Timo Honkela proposed a
method of content-based multimedia retrieval of objects with visual, aural and textual
properties [33]. In their method, training examples of objects belonging to a specific
semantic class are associated with their low-level visual descriptors (such as MPEG-7)
and textual features such as frequencies of significant keywords extracted from audio
tracks. A fuzzy mapping of a semantic class in the training set to a class of similar objects
in the test set was created by using Self-Organizing Maps (SOMs) trained from the visual
and textual descriptors. Query by example (QBE) is the system's main operating
principle, meaning that the user provides the system with a set of example objects of
what he or she is looking for, taken from the existing database. The various experiments
they performed on the proposed system showed a promising increase in retrieval performance.
The results also showed that the retrieval performance increased with the use of textual
features.

The approach described above differs somewhat from ours. We classified images
using a similarity matching algorithm based on smallest distances and ensemble
classification, rather than the SOM approach used in their implementation. Also, in our
approach all the training images in a particular class share the same textual descriptor,
whereas their implementation uses a range of words and their frequencies.
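Our smallest-distance matching with ensemble classification can be sketched roughly as follows: each descriptor nominates the label of its nearest training image, and the ensemble takes the majority vote. This is an illustrative reconstruction under assumed data structures, not the exact thesis code.

```python
from collections import Counter

def ensemble_classify(per_descriptor_distances, labels):
    """per_descriptor_distances: {descriptor: [distance to each training image]}
    labels: class label of each training image.
    Each descriptor votes with its nearest neighbour's label; the
    ensemble returns the majority vote."""
    votes = []
    for dists in per_descriptor_distances.values():
        nearest = min(range(len(dists)), key=dists.__getitem__)
        votes.append(labels[nearest])
    return Counter(votes).most_common(1)[0][0]

# Toy example: two descriptors favour training image 0, one favours image 2.
dists = {"Color Layout":  [0.1, 0.9, 0.8],
         "Region Shape":  [0.2, 0.5, 0.9],
         "Edge Histogram": [0.9, 0.8, 0.1]}
labels = ["brain", "brain", "heart"]
print(ensemble_classify(dists, labels))  # brain
```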






Chapter 11


EDUCATIONAL STATEMENT


This research work benefited from the knowledge obtained from many classes
taken as a part of the Graduate curriculum at the Institute of Technology, UW Tacoma.
Strong foundations obtained from the TCSS 543 – Advanced Algorithms class helped in
the mathematical aspects involved in this research. Knowledge obtained from this class was
also useful in selecting and implementing the right data structures needed for this
implementation. Image processing foundations from the TCSS 451 - Digital Media class
were very useful in extracting image features, which was a significant part of this
implementation. Database design basics learned from the TCSS 545 class were extremely
helpful during the data pre-processing phase. The basics of scientific research obtained
from the TCSS 598 - Master's Seminar were extremely helpful while researching this
area. The exposure to formal technical writing in this class was also very helpful while
writing this paper. Concepts of bioinformatics such as data mining and domain ontologies
helped me a lot when trying to understand the concepts related to the medical domain. The
TCSS 588 - Bioinformatics class was very useful in determining areas for future research
that would benefit the medical domain. Apart from these classes, programming knowledge
gained from many other classes was very useful in the design and implementation stages.

Exposure to image processing tools, similarity matching algorithms and techniques
proved to be very valuable, as these skills can be applied to solve indexing problems in
various domains. Many indexing algorithms were studied during the course of this
work. This knowledge will be very useful for building information retrieval applications in
the future. This research also proved to be very beneficial in learning the languages of the
Semantic Web such as RDF and DAML. Working on this thesis has given me the
opportunity to research and learn about various areas of computer science like imaging,

multimedia databases, knowledge representation languages, etc. I thoroughly enjoyed the
learning experience and exposure to various technologies during the course of this research.

Chapter 12


CONCLUSION


The implementation described in this paper has shown that a high retrieval accuracy
rate is obtained by semantically indexing images using a web ontology language and the
visual descriptors of the image. The biggest challenge in this implementation was to
develop a similarity matching algorithm to retrieve matching images by combining all the
visual descriptors and the ontology terms. A retrieval accuracy rate of 93.48% was
obtained using the algorithm developed. The approach proposed in this paper will benefit
the medical community to a large extent as large collections of medical images can be
indexed and retrieved semantically. Future improvements to this implementation include
automating the image segmentation and feature extraction phase and using learning
techniques to improve the similarity matching algorithm developed.



BIBLIOGRAPHY


[1] Boll, S., Klas, W., Sheth, A. (1998). Overview on Using Metadata to Manage
Multimedia Data. In Multimedia Data Management—Using Metadata to Integrate and
Apply Digital Media (1-24).

[2] Chavez-Aragon, A., Starostenko, O. (2004). Ontological Shape-Description, A New
Method for Visual Information Retrieval. Proceedings of the 14th IEEE International
Conference on Electronics, Communications and Computers. Retrieved Nov 27, 2004,
from http://ieee.org

[3] Comaniciu, D., Foran, D., Meer, P. (1998). Shape-Based Image Indexing and
Retrieval for Diagnostic Pathology. Proceedings of the 14th IEEE International
Conference on Pattern Recognition, 1 (902-904). Retrieved Nov 27, 2004, from
http://ieee.org

[4] Fayyad, U.M. (1996). Automating the Analysis and Cataloging of Sky Surveys. In
Advances in Knowledge Discovery and Data Mining (471-493)

[5] Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al. (1995).
Query by Image and Video Content. IEEE Computer, 28(9), (23-31). Retrieved Nov 1,
2004, from http://ieee.org

[6] GIS Images. Retrieved Nov 10, 2004, from http://earth.jsc.nasa.gov/sseop/efs/query.pl

[7] Golbeck, J., Alford, A., Hendler, J. Organization and Structure of Information using
Semantic Web Technologies. Maryland Information and Network Dynamics
Laboratory, University of Maryland. Retrieved Nov 1, 2004, from
http://www.mindswap.org/papers/Handbook.pdf

[8] Hand D., Mannila, H., Smyth, P. (2001). Retrieval by Content. In Principles of Data
Mining (449-484). England: The MIT Press.

[9] Hu, B., Dasmahapatra, S., Lewis, P., Shadbolt, N. (2003). Ontology Based Medical
Image Annotation with Description Logics. Proceedings of the 15th IEEE International
Conference on Tools with Artificial Intelligence. Retrieved Nov 1, 2004, from
http://ieee.org


[10] ImageJ. Retrieved Nov 11, 2004, from http://rsb.info.nih.gov/ij/docs/intro.html


[11] Jorgensen, C. Image Indexing: An Analysis of Selected Classification Systems in
Relation to Image Attributes Named by Naïve Users. Retrieved Nov 8, 2004, from
http://digitalarchive.oclc.org/da/ViewObject.jsp?fileid=0000002655:000000059275&re
qid=8078

[12] Knublauch, H., Olivier, D., Musen M. Weaving the Biomedical Semantic Web with the
Protégé OWL Plug-in. Stanford Medical Informatics, Stanford University: Stanford.
Retrieved Nov 18, 2004, from http://protege.stanford.edu

[13] Maybury, M.T. (Ed.). (1997). Intelligent Multimedia Information Retrieval. Menlo
Park, CA: AAAI Press.

[14] Mejino, J., Rosse, C. Conceptualization of Anatomical Spatial Entities in the Digital
Anatomist foundation Model. Structured Informatics Group, Department of Biological
Structure, University of Washington School of Medicine. Retrieved Nov 4, 2004 from
http://sig.biostr.washington.edu/s/da/

[15] Mojsilovic, A., and Gomes, J. (2002). Semantic Based Categorization, Browsing and
Retrieval in Medical Image Databases. IEEE International Conference on Image
Processing, III (145-148). Retrieved Nov 1, 2004, from http://ieee.org

[16] Ontology Web Language. Retrieved Nov 21, 2004, from http://www.w3.org/TR/owl-
features/

[17] Pentland, A., Picard, R.W., Sclaroff, S. (1994). Photobook: Tools for content-based
manipulation of image databases. International Journal of Computer Vision, 18 (233-
254).

[18] Protégé. Retrieved Nov 3, 2004, from http://protege.stanford.edu/

[19] Rui, Y. Huang, T.S., Ortega, M., Mehrotra, S. (1997). Relevance feedback: a power
tool in interactive content-based image retrieval. Proceedings of the IEEE Transactions
on Circuits and Systems for Video. Maybury, M.T. (Ed.) Intelligent Multimedia
Information Retrieval Technology, 8(5), (644-655). Retrieved Nov 1, 2004, from
http://ieee.org


[20] Semantic Web. Retrieved Oct 17, 2004, from http://www.w3.org/2001/sw

[21] Smith, J.R., Chang, S. (1997). Querying by color regions using VisualSeek content-
based visual query system. Intelligent Multimedia Information Retrieval, In: Maybury,
M.T. (Ed.) (23-41). Menlo Park, CA: AAAI Press.

[22] The Digital Anatomist. Retrieved Oct 17, 2004, from
http://www9.biostr.washington.edu/cgi-bin/DA/imageform

[23] UMLS. Retrieved Oct 17, 2004, from http://www.nlm.nih.gov/research/umls/

[24] Visible Human. Retrieved Oct 17, 2004, from
http://www.nlm.nih.gov/research/visible/visible_human.html

[25] Visser, P., Bench-Capon, T. (1996). On the Reusability of Ontologies in Knowledge-
System Design. Conference Proceedings of the Seventh International Workshop on
Database and Expert Systems Applications, (256-261).

[26] M-OntoMat-Annotizer. Retrieved Jan 30, 2006, from
http://www.acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html

[27] Foundation Model of Anatomy. Retrieved Nov 11, 2005, from
http://sig.biostr.washington.edu/s/fm/AboutFM.html

[28] Tsechpenakis, G., Akrivas, G., Andreou, G., Stamou, G., Kollias, S. Knowledge-
Assisted Video Analysis and Object Detection. Image Video and Multimedia
Laboratory, Department of Electrical and Computer Engineering, National Technical
University of Athens. Retrieved Oct 30, 2006, from
http://www.cbim.rutgers.edu/papers/eunite_2002.pdf

[29] Christopoulos, C., Berg, D., Skodras, A. The Colour In the Upcoming MPEG-7
Standard. Retrieved Jan 5, 2007, from
http://www.eurasip.org/content/Eusipco/2000/sessions/ThuAm/SS2/cr1634.pdf

[30] Eidenberger, H. Evaluation and Analysis of Similarity Measures for Content-Based
Visual Information Retrieval. Interactive Media Systems Group, Institute of Software
Technology and Interactive Systems, Vienna University of Technology. Retrieved
Dec 15, 2006, from
http://www.ims.tuwien.ac.at/media/documents/publications/acmms2004b.pdf


[31] Geradts, Z., Hardy, H., Poortman, A., Bijhold, J. Evaluation of contents based image
retrieval methods for a database of logos on drug tablets. Netherlands Forensic
Institute. Retrieved Nov 21, 2006 from,
http://citeseer.ist.psu.edu/cache/papers/cs/30794/http:zSzzSzgeradts.comzSzhtmlzSzDo
cumentszSzArticleszSzSPIE2001zSzdrugs.pdf/geradts01evaluation.pdf

[32] Papadopoulos, S., Mezaris, V., Kompatsiaris, I., Strintzis, M.G. A Region Based
Approach to Conceptual Image Based Classification. Information Processing
Laboratory, Electrical and Computer Engineering Dept., Aristotle University of
Thessaloniki. Retrieved Jan 5, 2006, from
http://www.iti.gr/~bmezaris/publications/vie05.pdf

[33] Sjöberg, M., Laaksonen, J., Pöllä, M., Honkela, T. Retrieval of Multimedia
Objects by Combining Semantic Information from Visual and Textual Descriptors.
Laboratory of Computer and Information Science, Helsinki University of
Technology. Retrieved Feb 15, 2007, from
http://www.cis.hut.fi/s/cbir/papers/icann2006mats.pdf

[34] Eakins, J., Graham, M. Content Based Image Retrieval. University of Northumbria at
Newcastle. Retrieved Dec 15, 2006, from
http://www.jisc.ac.uk/uploaded_documents/jtap-039.doc

[35] MPEG-7. Retrieved Nov 11, 2005, from
http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm







APPENDIX A

PRESENTATION SLIDES


This appendix contains the PowerPoint slides prepared for the thesis presentation.



APPENDIX B


INSTALLATION & USER MANUAL


Installation:

1