Student Name: Christos Ioannou

15 Nov 2013








University of Sunderland

Department of Computing & Technology


CET300 Project: Literature Review








Student Name: Christos Ioannou

Registration Number: 1073480371

Programme: Computer Science







Table of Contents

1. Introduction
2. The Binary search technique
2.1 Binary search types
2.2 Searching Queries
2.2.1. Searching Intersection Queries
2.3. The search algorithm
3. Image Search Engines
3.1. Web Seer Image Search Engine
3.2. Rise Image Search Engine
3.3. PageRank algorithm
4. Critical Evaluation
5. Conclusion and Recommendations
References







1. Introduction



Over the last few decades, internet-related technology has made a fascinating leap. The internet has expanded its practical uses so widely, and enriched the knowledge it can provide so much, that today it is an absolute necessity and a most useful tool for anyone seeking information on any subject. As a consequence of the ever-increasing number of users searching for all kinds of subjects, various search techniques have naturally been developed to give the user the fastest and most conclusive search that a search engine is capable of.


Search engines have advanced rapidly in the short time that they have been in our lives. From the oldest, most primitive search engine, ‘Archie’, to today's dominant giant ‘Google’ and many other less prominent search engines, there have been more than two decades of evolution in Web navigation, which has made search engines a vital tool for the user who seeks a specific piece of information. Search engines are constantly being changed, updated and enhanced to make more and more information accessible to the public. The theme of the present research project is ‘Image Search Engines and How the Algorithm Works in Different Trees’. It revolves specifically around search engines, with the purpose of acquiring sufficient knowledge for the author’s project, which deals with the university’s image search engine.

In the first chapter, the author will study the most widely used and most advantageous search technique that search engines operate with, the binary search technique. The author will look into the structure of this simple and effective technique and the course of action it follows when elements are entered into the binary tree for comparison. In more detail, the author will examine the types of binary trees and the way each of them functions, with a diagram to aid understanding. The author will then look at search queries, specifically the search intersection query and its two types. Finally, in the last part of the chapter, the author will examine the search algorithm in depth, outlining its nature and presenting the full procedure that it follows.


In the final chapter, the author will take a closer look at the image search engines that he has dealt with in the quest to find information relevant to his project. Two search engines that have proved helpful for this purpose, and whose operational details will be examined separately, are the Web Seer and Rise image search engines. Both of these complex and interesting search engines have aided the author’s understanding of the structure his own project could have. The chapter concludes with a brief examination of the PageRank algorithm, a solution to many issues that can arise in image search engines and a most important tool for them to function efficiently.







2. The Binary search technique


The way binary search works is relatively simple, which is why it can be performed quickly and accurately and why anyone can use it. It is accomplished by creating a binary search tree, which compares the various elements with one another. When a new element is entered for comparison, the search begins at the root of the tree and compares the element with the existing data items until the desired result is reached. The binary search tree is a very important structure for database design, because many different queries can be answered efficiently (Bentley, 1979).

Due to its structure, the binary search tree performs three tasks at the same time, which gives it the benefit of very fast processing. Firstly, it stores the records of a given set. Secondly, it imposes a partition on the data space, dividing the line into different parts. Thirdly, it provides a directory that allows us to rapidly locate the position of a new point in the partition by making a logarithmic number of comparisons (Bentley, 1979).

The binary tree can also be used to organise document collections, with a set of concepts associated with every internal node of the tree. There are buckets at the bottom of the tree which store the documents, and each document is held in exactly one bucket. When a document is inserted into the subtree rooted at a certain node, it is compared with the concept set associated with that node. When the intersection is empty, the document is inserted recursively into the left subtree of that node. When the intersection is non-empty, the document is inserted recursively into the right subtree (Eastman, 1978). These steps, called ‘intersect’ and ‘descend’, continue until a leaf node is reached, at which point the document is placed in the bucket of that leaf node. Consequently, every document in the right subtree of a node shares at least one concept with the node's concept set, whereas the documents in the left subtree contain none of those concepts (Eastman, 1978).
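The ‘intersect’ and ‘descend’ steps can be sketched in Python as follows. This is only an illustrative sketch: the class names `Bucket` and `ConceptNode`, the function `insert_document` and the sample concepts are assumptions made for the example, and the tree is built by hand rather than by the construction procedure of Eastman (1978).

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Leaf of the tree: holds the documents that end up here."""
    documents: list[str] = field(default_factory=list)


@dataclass
class ConceptNode:
    """Internal node with its associated concept set (hypothetical layout)."""
    concepts: frozenset[str]
    left: ConceptNode | Bucket    # documents sharing no concept with this node
    right: ConceptNode | Bucket   # documents sharing at least one concept


def insert_document(root, doc_id, doc_concepts):
    """Walk from the root to a bucket and store the document there."""
    node = root
    while isinstance(node, ConceptNode):
        # 'intersect': compare the document with the node's concept set
        shares_concept = bool(node.concepts & doc_concepts)
        # 'descend': right if the intersection is non-empty, left otherwise
        node = node.right if shares_concept else node.left
    node.documents.append(doc_id)
    return node


# A small hand-built tree with two internal nodes and three buckets.
tree = ConceptNode(
    frozenset({"tree", "search"}),
    Bucket(),
    ConceptNode(frozenset({"image"}), Bucket(), Bucket()),
)
insert_document(tree, "d1", {"image", "search"})
insert_document(tree, "d2", {"robot"})
```

Each insertion visits one node per level, which is why the cost of inserting a document is proportional to the height of the tree.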

The time taken to insert a document into a binary tree of height $L$ is $O(L)$. The maximum reasonable value of $L$ for a collection of size $N$ is $\log_2 N$, since the average bucket size for such a tree is one and any taller tree is guaranteed to have empty buckets. In general, $L$ is less than or equal to $\log_2 N$, and therefore inserting $N$ documents into the tree takes at worst $O(N \log_2 N)$ time (Eastman, 1978).

2.1 Binary search types


Binary search trees are separated into two different types: one-dimensional binary search trees and multidimensional binary search trees. A one-dimensional binary search tree can be searched to see whether a certain value, called y, is stored in the tree. The search begins at the root and compares y with the value of the key stored in the root, which can be called z. If y is equal to z, the value has been found; if y is less than z, the search continues in the left subtree, and otherwise it continues in the right subtree. This procedure continues down the one-dimensional binary tree until the wanted element is found or a null pointer is reached, which shows that the element is not in the tree.

To insert an element, we apply the search procedure until it falls out of the tree, and then change the last null pointer encountered so that it points to the new element. One-dimensional search trees can be used to organise a file of data in which each record holds one key field and possibly other data fields. When there are many key fields in every record, one-dimensional binary search trees are unsuitable, since they use only one of the key fields to form the tree (Bentley, 1979).
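The search and insertion procedures for a one-dimensional binary search tree can be sketched as below. The names `Node`, `search` and `insert` and the sample keys are chosen for illustration only; the logic follows the description above (compare y with the key z at each node, go left if smaller, right otherwise, and attach a new element at the last null pointer).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root, y):
    """Return True if the value y is stored in the tree."""
    node = root
    while node is not None:
        z = node.key
        if y == z:
            return True
        # smaller values are in the left subtree, larger ones in the right
        node = node.left if y < z else node.right
    return False  # fell out of the tree: y is not stored


def insert(root, y):
    """Insert y and return the (possibly new) root of the tree."""
    if root is None:
        return Node(y)
    node = root
    while True:
        if y == node.key:
            return root  # already present, nothing to do
        if y < node.key:
            if node.left is None:      # last null pointer on the search path
                node.left = Node(y)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(y)
                return root
            node = node.right


root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)
```

With these seven keys the tree is perfectly balanced, so any search makes at most three comparisons.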

There are many ways to implement an abstract binary search tree, but one of the most popular representations is called ‘homogeneous’, in which every node serves two different purposes: representation of a record and direction of the search. In a nonhomogeneous tree, these purposes are separated into two distinct types of node: internal and external. There are situations in which nonhomogeneous trees are preferable to homogeneous trees, for example when records are kept on a secondary storage device (Bentley, 1979).










Multidimensional binary search trees generalise standard binary search trees so as to make use of all the key fields in a file F of N records of k keys each. A standard binary search tree directs an insertion to proceed either left or right on the basis of one key field. The multidimensional binary tree (k-d tree) works differently. Suppose that each record in the file has k keys, $K_1, K_2, \ldots, K_k$. At the first level of the tree we decide whether to go left or right when inserting a new record by comparing the first key of the new record with the first key of the record stored at the root of the k-d tree. At the second level the second key is used as the discriminator, and so on down to the k-th level. At the (k + 1)st level of the tree the first key is used as the discriminator once more. In the example there are two keys: $K_1$ for the name and $K_2$ for the age. This implementation is shown in the diagram below (Bentley, 1979).
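The cycling of discriminators can be illustrated with a small two-key example (name and age). This is a minimal sketch: the names `KDNode`, `kd_insert` and `kd_contains` and the sample records are assumptions made for the example, and records with equal discriminator values are sent to the right by convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Record = Tuple[str, int]  # (name, age): key K1 is the name, key K2 is the age


@dataclass
class KDNode:
    record: Record
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def kd_insert(node, record, depth=0):
    """Insert a record, using key number (depth mod k) as the discriminator."""
    if node is None:
        return KDNode(record)
    disc = depth % len(record)  # 0, 1, 0, 1, ... for k = 2
    if record[disc] < node.record[disc]:
        node.left = kd_insert(node.left, record, depth + 1)
    else:
        node.right = kd_insert(node.right, record, depth + 1)
    return node


def kd_contains(node, record):
    """Exact match search: follow the same path an insertion would take."""
    depth = 0
    while node is not None:
        if node.record == record:
            return True
        disc = depth % len(record)
        node = node.left if record[disc] < node.record[disc] else node.right
        depth += 1
    return False


root = None
for rec in [("Lee", 40), ("Adams", 25), ("Baker", 60), ("Smith", 33)]:
    root = kd_insert(root, rec)
```

Here ("Baker", 60) goes left at the root because "Baker" precedes "Lee" by name, and then right at the second level because 60 is greater than 25 by age.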







2.2 Searching Queries



There are different types of search queries, which satisfy the various information needs of users. Search queries are typically categorised as those intended to cover a general topic (for which there may be thousands of results), those intended to find a single website or web page, and those intended to complete a particular act, such as a purchase or a download. In this section we will look at the mechanics of search queries, and in particular we will deal with one of the most frequent types, the search intersection query, and its subclasses, the exact match query and the partial match query.


2.2.1. Searching Intersection Queries


One of the most common queries is the intersection query, so called because it specifies that the records to be retrieved are those that intersect some subset of the set of valid records. A record in $F$ is to be retrieved if it is in a certain subset $q(R)$ of the set $R$ of valid records, so that $q(F) = F \cap q(R)$. Since $F = R$ implies that $q(F) = q(R)$, the notation is consistent. By the intersection formula, the sets $q(R)$ completely characterise the functions $q(F)$ for any file $F$. Intersection queries have the property that whether some record $r \in F$ is in $q(F)$ does not depend upon the rest of the file, that is, on $F - \{r\}$, so that no global dependencies are involved (Rivest, 1976).


The category of search intersection queries has many subclasses, which help in answering the queries. Some of them are explained below:

The exact match query is one of the best known and simplest types of query in a file of k-key records. It asks for a particular record specified by its k keys. The search proceeds down the tree, going either right or left by comparing the desired record's key with the discriminator in the node. The number of comparisons needed for an exact match search is O(lg N) in the worst case if the tree is perfectly balanced, and it is also O(lg N) on average for randomly built trees (Bentley, 1979). When exact match search is the only type of query to be performed, k-d trees should not be used as the data structure to keep the records. Although the keys may appear independent to the user, they should almost always be merged into one superkey, and a well-established data structure designed for one-dimensional storage and retrieval should be employed (Bentley, 1975).

Partial match queries are a more complex kind of query in a multikey file, in which values are specified for t of the k keys, with t < k. An example of such a query in a personnel file might be: report all employees with length-of-service = 5 and classification = manager, ignoring all other keys in the records. In general, we specify values for t of the k keys and ask for all the records that have those t values, independent of the other k − t values (Bentley, 1975).

R. Rivest notes that a partial match query $q$ with $s$ keys specified (for some $s \le k$) is represented by a record $\hat{r} \in R$ with $k - s$ keys replaced by the special symbol *, which means ‘undetermined’. If $\hat{r} = (\hat{r}_1, \ldots, \hat{r}_k)$, then for $k - s$ values of $j$ we must have $\hat{r}_j = *$. The set $q(R)$ is the set of all records agreeing with $\hat{r}$ in the specified positions, therefore:

$$q(R) = \{\, r \in R \mid \hat{r}_j = * \ \text{or}\ \hat{r}_j = r_j \ \text{for all}\ j,\ 1 \le j \le k \,\}$$

An example application might be a crossword puzzle dictionary, where a typical query might require finding all words of the form B*T**R, that is: BATHER, BATTER, BETTER, BETTOR, etc. (Rivest, 1976).
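The crossword example can be expressed directly as a filter over a file of records. The sketch below uses a plain linear scan, not the retrieval algorithms studied by Rivest; the function names `agrees` and `partial_match` and the word list are assumptions made for the illustration.

```python
def agrees(pattern, record, wildcard="*"):
    """True if the record matches the pattern in every specified position."""
    return len(pattern) == len(record) and all(
        p == wildcard or p == r for p, r in zip(pattern, record)
    )


def partial_match(file, pattern):
    """Return q(F): the records of the file that agree with the pattern."""
    return [record for record in file if agrees(pattern, record)]


words = ["BATHER", "BANNER", "BATTER", "BETTER", "BUTTON", "BETTOR"]
result = partial_match(words, "B*T**R")
```

In this example k = 6 (one key per letter) and s = 3, because the letters B, T and R are specified and the remaining three positions are left undetermined.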



2.3. The search algorithm


The search algorithm is recursive in nature. The argument of the recursive procedure is the node under investigation, and the first invocation passes the root of the tree. The domain of that node, that is, the geometric bounds delimiting the subfile which the node represents, is available as a global array (Friedman, 1977).


The domain of the root node is defined as plus and minus infinity on all keys, and the bounds of any other node are determined by the partitions at the nodes above it in the tree. At each node, the partition operates in a twofold manner: it divides the current subfile, and it defines a lower or upper limit on the value of the discriminator key for each record in the two new subfiles. The accumulation of these bounds in the ancestors of any node defines a cell in the multidimensional record-key space that contains its subfile. This cell has a smaller volume for subfiles defined by nodes deeper in the tree (Friedman, 1977).

If the node under investigation is terminal, all the records in its bucket are examined. A list of the m closest records so far encountered, together with their dissimilarity to the query record, is maintained as a priority queue throughout the search, and the list is updated whenever an examined record is found to be closer than the most distant member of this list (Friedman, 1977).

If the node under investigation is not terminal, the recursive procedure is called for the node which represents the subfile on the same side of the partition as the query record. When control returns, a test is conducted to find out whether it is necessary to consider the records on the side of the partition opposite the query record. It is necessary to consider that subfile only if the geometric bounds of those records overlap the ball centred at the query record with radius equal to the dissimilarity to the m-th closest record so far encountered. This is called the ‘bounds-overlap-ball’ test. If this test fails, then no record on the opposite side can be among the m records closest to the query record (Friedman, 1977).


If the bounds do overlap the ball, the records of that subfile must be considered, and the procedure is called recursively for the node which represents that subfile. Before returning, a ‘ball-within-bounds’ test is conducted to find out whether it is necessary to continue the search. This test ascertains whether the ball falls entirely within the geometric domain of the node. If that is the case, the current list of m best matches is correct for the whole file and there is no need to examine more records (Friedman, 1977).

3. Image Search Engines




In the quest to create an image search engine, many questions have surfaced. These questions are examined further below.

In recent years, the development of search engines has been explosive, but image, video and sound search engines made their first appearance only recently (Swain, Frankel, Athitsos, 1997). It should also be mentioned that the major image search engines are currently able to deliver an average precision of only around 42% and an average recall of around 12%, although the best performers are capable of over 70% precision and around 27% recall (Stevenson, Leung, 2005). For this reason it will be interesting to analyse how to start, what problems will occur and what percentages could be achieved.


To identify the problems in the creation of a search engine, it is necessary to analyse several search engines that approach image search in different ways. The image search engines which will be examined are the Web Seer image search engine and the Rise image search engine. The purpose of analysing these two search engines is to collect information which is useful for the establishment of the author’s image search engine.


3.1. Web Seer Image Search Engine


Web Seer is a search engine which can be used to find images on the World Wide Web. This image search engine uses two sources of information: the text associated with the image and the image itself. To find valuable information about a specific image contained in a certain page, one must understand the structure of the document, which can reveal the information on that page. In the document, there are several places which hold important information about image content. The text most likely to be useful in an image search includes alternative text, file names, image captions and HTML titles (Swain, Frankel, Athitsos, 1997).

Furthermore, Web Seer uses data obtained from analysing the image content to supplement the text associated with an image and the data taken from the image header. This information is then used to produce an environment in which the image analysis algorithms can function successfully. The image analysis algorithms are used to classify the image into types such as portraits, photographs and drawings, and to extract useful semantic data (Swain, Frankel, Athitsos, 1996).



World Wide Web image search engines have problems with image similarity. The problem with the existing methods of measuring image similarity is that they operate at a very low level. The user’s notion of image similarity frequently requires a deeper understanding of the image than is provided by typical image similarity metrics such as texture and colour, and the form of similarity looked for can differ widely depending on the goals of the particular user. World Wide Web search engines have to respond quickly to a large number of queries, which is not helped by the complexity of similarity matching in high-dimensional feature spaces. With present similarity retrieval procedures, search time rises dramatically with the dimensionality of the feature space and logarithmically with the number of images in the database (Swain, Frankel, Athitsos, 1997).


Similarity search is an increasingly important capability of present-day image database systems. It is typically carried out by extracting and comparing particular features of the images. The features primarily used to characterise images are the colour distribution of the pixels, the texture of the image and the shape of the objects in the image (Chao, Chengjian, and Jun, 2005). One method used to obtain the image description is global feature extraction. This method extracts particular features from images, commonly texture, colour and shape features. Information obtained globally includes colour histograms for colour features and global texture data on roughness, contrast and direction. Colour and texture features can also be extracted for each pixel belonging to an object, in which case the object is described by the average value of these pixel features (Maheswary, Srivastava, 2009).

One of the algorithms which can work with the information described above is the K-Means clustering algorithm. This algorithm will be useful for solving some particular problems in the author’s search engine. The K-Means algorithm is based on four simple tasks: the selection of the initial K means for the K clusters, the computation of the dissimilarity between an object and the mean of a cluster, the allocation of each object to the cluster whose mean is closest to the object, and the recalculation of the mean of each cluster from the objects allocated to it, so that the intra-cluster variation is decreased. K-Means clustering is well suited to data mining because of its effectiveness in processing large data sets. A cluster is a collection of data objects which are similar to one another within the same cluster and dissimilar to the objects in other clusters.

To segment an image into objects, six main features are extracted from each block, in two groups: three colour features and three texture features. After the six features have been obtained for every pixel of the image and stored in an array, K-Means clustering groups them into K = 3 clusters. The same process is applied to every image. The K-Means algorithm has the great benefit that it works well when clusters are not well separated from each other, which is frequently the case in images. The K-Means algorithm also requires the user to specify appropriate initial cluster centres (Maheswary, Srivastava, 2009).
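The four tasks of K-Means can be sketched in a few lines of Python. This is a minimal illustration with two-dimensional feature vectors instead of the six colour and texture features described above; the function names `assign` and `kmeans`, the sample points and the user-supplied initial centres are all assumptions made for the example.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centres):
    """Allocate each object to the cluster whose mean is closest."""
    return [min(range(len(centres)), key=lambda i: dist2(p, centres[i]))
            for p in points]


def kmeans(points, initial_centres, max_iter=100):
    """Return the final cluster means and the cluster index of each point."""
    centres = [tuple(c) for c in initial_centres]
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for i, old in enumerate(centres):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # recalculate the mean from the objects allocated to the cluster
                new_centres.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centres.append(old)  # keep the centre of an empty cluster
        if new_centres == centres:       # means stopped moving: converged
            break
        centres = new_centres
    return centres, assign(points, centres)


points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
centres, labels = kmeans(points, [(0, 0), (10, 10), (20, 0)])
```

The initial centres are passed in explicitly, which mirrors the requirement that the user specify them; a poor choice can lead to a different final clustering.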


In addition, Web Seer uses multiple decision trees, and its authors explain how they are used. Understanding how multiple decision trees work will be helpful in the future, because the author will be able to use them in his image search engine. A decision tree is an instance-based classification algorithm which derives classification rules, in the form of a tree, from a group of unordered and irregular instances. Each node in the tree tests some property of the instance (Qingyun, 2011). To classify an image into a certain category, Web Seer uses multiple decision trees. Every tree is binary, and each node is either a leaf or holds a test field which defines the next test the image should be submitted to: if the image scores lower than the threshold of that test, it moves to the left child, or else it moves to the right child. The trees differ from each other in the tests they apply to the image and in the order in which those tests occur in the tree. Separate sets of trees are kept for different kinds of files (Swain, Frankel, Athitsos, 1997).
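The traversal of such binary decision trees can be sketched as below. The feature names (`num_colours`, `saturation`), the thresholds and the class labels are invented for the illustration, and combining the trees by majority vote is an assumption of this sketch rather than a description of how Web Seer combines its trees.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Test:
    feature: str                   # which image measurement to test
    threshold: float
    left: Union["Test", Leaf]      # taken when the score is below the threshold
    right: Union["Test", Leaf]     # taken otherwise


def classify(tree, features):
    """Follow the tests from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Test):
        score = features[node.feature]
        node = node.left if score < node.threshold else node.right
    return node.label


def classify_with_trees(trees, features):
    """Combine several trees by majority vote (an assumed combination rule)."""
    votes = Counter(classify(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical trees that apply different tests to the same image.
trees = [
    Test("num_colours", 64, Leaf("drawing"), Leaf("photograph")),
    Test("saturation", 0.3, Leaf("drawing"), Leaf("photograph")),
    Test("num_colours", 8, Leaf("drawing"), Leaf("photograph")),
]
image_features = {"num_colours": 12, "saturation": 0.1}
label = classify_with_trees(trees, image_features)
```

Two of the three trees label this image a drawing, so the combined result is "drawing" even though the third tree disagrees.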

3.2. Rise Image Search Engine


The Rise image search engine uses a very interesting method of image search which is extremely suitable for the author’s image search engine. Rise is based on the average of the colour components of 8x8 blocks. One of the key advantages of using the average is that it can be applied to any image format. Every image in the Rise search engine is represented by a quad tree structure whose leaf nodes contain the average colour data. The quads at any given level of the tree, counting from the root, are of identical size. Every quad at a given level of the tree is stored in a relational database table, with quads at different levels kept in separate relations. Like the images in the database, a query image is divided into the quad tree structure, and the data in its nodes are compared against the data from quads of the same size in the index relations. The comparison uses a distance measure which determines the similarity of the query to the various images in the database. To improve retrieval further, a limit (threshold) is used so that some images are excluded from further comparison. The limit can be adjusted to retain the desired number of images matching the query (Goswami, Bhatia, 2006).

There are three main factors behind the efficiency of the Rise system. Firstly, it uses certain elements of the leaves of the quad tree structure to allow extensive early pruning. Secondly, it uses a relational database to store the data for quicker retrieval. Finally, Rise can handle any image format and is not limited to PNG; it accepts input images in RGB colour space and converts them to L*a*b* colour space so that comparisons are made in a perceptual colour space. Rise then separates the image into 8x8 blocks and calculates the average of the colour properties of every block. Rise uses an efficient colour layout method based on region properties. The query image signature is based on its global features, but the regional properties of the indexed images are nevertheless searched. The method has a good tolerance for scaling and for limited object translations (Goswami, Bhatia, 2006).

In the quad tree data structure, each internal node has up to four children. The quad tree is used in order to represent an image efficiently. A type of quad tree in which every node has either exactly four children or is a leaf with no children is called a point region quad tree. This tree represents a number of data points in two dimensions by recursively decomposing the region which holds the data points into four equal quadrants, sub-quadrants and so on, until no leaf node holds more than a single point (Larabi, Aouat, 2010). The quad tree in the Rise system is a full tree in which every node has either four children or none. Moreover, all the leaf nodes are at the same level of the tree and represent the average values of the corresponding 8x8 blocks in the original uncompressed image. To build the quad tree, Rise scales the image to 512x512 pixels, so that the length of every side of the new square image is a power of two. As illustrated by the image below, the quad tree can then be derived from the square blocks by repeatedly dividing each side by two (Goswami, Bhatia, 2006).
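The construction of the levels of such a full quad tree can be sketched as follows. This is a simplified illustration on a single colour channel and a small 16x16 image; it assumes that the value of an internal quad is the average of its four children, and the function names `block_averages` and `build_levels` are hypothetical.

```python
def block_averages(image, block=8):
    """Average the pixel values of each block x block square (the leaves)."""
    cells = len(image) // block
    return [
        [
            sum(image[r][c]
                for r in range(i * block, (i + 1) * block)
                for c in range(j * block, (j + 1) * block)) / (block * block)
            for j in range(cells)
        ]
        for i in range(cells)
    ]


def build_levels(leaves):
    """Return the grids of quads from the root (1x1) down to the leaves."""
    levels = [leaves]
    while len(levels[0]) > 1:
        child = levels[0]
        half = len(child) // 2
        # each parent quad covers a 2x2 group of child quads
        parent = [
            [
                (child[2 * i][2 * j] + child[2 * i][2 * j + 1]
                 + child[2 * i + 1][2 * j] + child[2 * i + 1][2 * j + 1]) / 4
                for j in range(half)
            ]
            for i in range(half)
        ]
        levels.insert(0, parent)
    return levels


# 16x16 test image: left half black (0), right half bright (100).
image = [[0] * 8 + [100] * 8 for _ in range(16)]
levels = build_levels(block_averages(image))
```

For a 512x512 image with 8x8 blocks the leaf grid would be 64x64, giving seven levels (1, 2, 4, 8, 16, 32 and 64 quads per side), because each side is halved repeatedly.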






The query procedure is fairly simple once the quad tree has been built for each image in the database and the required information has been saved in the database tables. The query itself has to be an image. This image is processed to build the quad tree of the average colour components of its blocks. The remaining stages of the query depend entirely on this quad tree. In the next stage, Rise calculates the distance between the highest level of the query image and that of every image in the database. A predefined limit parameter is used to select only the images whose distance is less than the limit. This process is repeated at every level, and images which have large distances are pruned. The distance between two particular images is defined by an equation over the corresponding pixels, in which ‘m’ and ‘n’ are the number of rows and columns and ‘L’, ‘a’ and ‘b’ are the three colour values in the perceptual colour space (Goswami, Bhatia, 2006).
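The level-by-level pruning can be sketched as below. The distance used here, the mean Euclidean distance between corresponding (L, a, b) values over the m rows and n columns of a level, is an assumption chosen to be consistent with the variables named above; the exact formula and normalisation used by Rise may differ. The names `distance` and `search`, the limit value and the tiny two-level "images" are all hypothetical.

```python
import math


def distance(query_level, image_level):
    """Mean Euclidean distance between corresponding (L, a, b) entries."""
    m, n = len(query_level), len(query_level[0])
    total = sum(math.dist(query_level[i][j], image_level[i][j])
                for i in range(m) for j in range(n))
    return total / (m * n)


def search(query_levels, database, limit):
    """Keep only the images whose distance stays below the limit at every level."""
    candidates = list(database)
    for level, query_level in enumerate(query_levels):   # root level first
        candidates = [name for name in candidates
                      if distance(query_level, database[name][level]) < limit]
    return candidates


# Two levels per image: a 1x1 root and a 2x2 grid of (L, a, b) averages.
query = [
    [[(50, 0, 0)]],
    [[(40, 0, 0), (60, 0, 0)], [(40, 0, 0), (60, 0, 0)]],
]
database = {
    "same": query,
    "coarse_far": [
        [[(90, 0, 0)]],
        [[(90, 0, 0), (90, 0, 0)], [(90, 0, 0), (90, 0, 0)]],
    ],
    "fine_far": [
        [[(50, 0, 0)]],
        [[(60, 0, 0), (40, 0, 0)], [(60, 0, 0), (40, 0, 0)]],
    ],
}
matches = search(query, database, limit=10)
```

The image "coarse_far" is discarded after a single comparison at the root level, while "fine_far" survives the root level and is only discarded at the next level, which illustrates why early pruning saves work.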




3.3. PageRank algorithm

PageRank is an effective algorithm that can be used to rank the images in an image search engine. Ranking is one of the most important problems that image search engines face, and the PageRank algorithm is a reliable solution to it (Baluja, Jing, 2008).

The large number of web pages, repeatedly referenced by other people through hyperlinks, forms an enormous web graph of the internet. A page that is voted for by many authors obtains a high voting score; in simpler terms, it is considered to be one of the best and most interesting pages. If such a page is relevant to the user’s query, it should be ranked near the top of the search results (Rungsawang, Manaskasemsak, 2006).

PageRank is one of the best ways to prioritise the results of web keyword searches. For the most popular topics, a simple text matching search restricted to web page titles performs well when PageRank prioritises the results.


PageRank is defined as follows. Assume that page A has pages T1…Tn which point to it, that is, which are its citations. The parameter d is a damping factor that can be set between 0 and 1, and it is commonly set to 0.85. C(A) is defined as the number of links going out of page A. The PageRank of page A is then given as follows:

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

PageRank, or PR(A), can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalised link matrix of the web. Furthermore, the PageRank of 26 million web pages can be computed in a few hours on a medium-sized workstation (Brin, Page, 1998).
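The iterative calculation can be sketched on a tiny three-page graph. This is a minimal illustration of the formula PR(A) = (1 − d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)); the function name `pagerank`, the fixed number of iterations and the example graph are assumptions, and the sketch assumes that every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    out_count = {page: len(links[page]) for page in pages}   # C(T)
    pr = {page: 1.0 for page in pages}                       # initial guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # sum PR(T) / C(T) over every page T that links to this page
            incoming = sum(pr[t] / out_count[t]
                           for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# A links to B and C, B links to C, and C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

Page C receives links from both A and B and therefore ends with the highest rank, followed by A and then B; with this form of the formula the ranks of all pages sum to the number of pages.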





4. Critical Evaluation


The author decided to research how binary trees work and what their types are in order to
gain new and useful knowledge. By identifying the types that exist, their manner of
operation and their positive and negative aspects, the author will be able to decide which
of them suit the specific purposes of the project and, with the algorithm that proves most
suitable, to reach a satisfying result. The basic types of binary trees, multi-dimensional
and one-dimensional, are examined in more depth in section 2.1. In general, one-dimensional
search trees are effective for building a file of data in which each record holds one key
field alongside other data fields, something which is accomplished very fast. The algorithm
that the author will use for the project will work with many key fields in one file, and for
that reason the author plans on using multi-dimensional binary trees, because they can
efficiently manage many key fields in one file and are an appropriate choice for the purpose.
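
To illustrate how a multi-dimensional binary search tree can manage records with several key
fields, the following is a minimal sketch in the spirit of Bentley's description. The record
layout (width, height, file size in kilobytes) and the names Node, insert and exact_match
are hypothetical and chosen only for this example; it is not the algorithm of the author's
project.

```python
class Node:
    """One tree node holding a record, i.e. a tuple of key fields."""

    def __init__(self, record):
        self.record = record
        self.left = None
        self.right = None


def insert(root, record, depth=0):
    """Insert a record; the key field used for comparison cycles with the depth."""
    if root is None:
        return Node(record)
    axis = depth % len(record)
    if record[axis] < root.record[axis]:
        root.left = insert(root.left, record, depth + 1)
    else:
        root.right = insert(root.right, record, depth + 1)
    return root


def exact_match(root, record, depth=0):
    """Return True if a record with exactly these key fields is stored in the tree."""
    if root is None:
        return False
    if root.record == record:
        return True
    axis = depth % len(record)
    if record[axis] < root.record[axis]:
        return exact_match(root.left, record, depth + 1)
    return exact_match(root.right, record, depth + 1)


# Hypothetical image records: (width, height, file size in kilobytes).
records = [(640, 480, 120), (800, 600, 300), (320, 240, 45), (640, 360, 90)]
root = None
for record in records:
    root = insert(root, record)
```

Each level of the tree compares a different key field, so a single structure can index all
the fields of a record rather than only one.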

Shifting the focus to the search engines that played a key part in this review, the first is
the WebSeer search engine, a World Wide Web engine designed particularly to allow the user
to find images. Basic characteristics of WebSeer are that it uses associated text and markup
to supplement information derived from analyzing image content as well as multiple kinds of
metadata, and that it categorizes images, placing the emphasis on research rather than on
metadata-based search. Recently, World Wide Web search engines have multiplied and advanced
rapidly. A noteworthy positive aspect which led the author to develop this research around
WebSeer was finding and verifying, after reading several articles, that the algorithm this
specific search engine uses enables the speedy and productive search of an image, an aspect
very important for the project the author has to complete. This is achieved through the text
that the user provides as well as the elements that the image carries. All image and
surrounding text analysis is conducted off-line during the creation of the database, which
is why WebSeer can give fast query responses to a possibly huge number of users.
Additionally, through the algorithm it uses, WebSeer finds solutions to a problem that many
search engines are faced with: image similarity.
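
The benefit of performing the analysis off-line can be sketched with a simple inverted
index: the costly pass over the text associated with each image happens once, and each query
is then answered with fast set look-ups. This is only an illustration of the general idea
and not WebSeer's actual implementation; the function names, the example file names and the
example captions are all assumptions.

```python
from collections import defaultdict


def build_index(images):
    """Off-line step: map every word of an image's associated text to the image names."""
    index = defaultdict(set)
    for name, text in images.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """On-line step: return the images whose associated text contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical catalogue of image names and their associated text.
catalogue = {
    "a.jpg": "red sports car",
    "b.jpg": "red apple",
    "c.jpg": "blue car",
}
index = build_index(catalogue)
matches = search(index, "red car")
```

Because the index is built before any query arrives, the work done per query does not depend
on re-reading the text of every image.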

An important positive element of World Wide Web search engines is that they are connected to
each other and derive several answers from each other, which are then used to give the user
the appropriate result. But this otherwise useful characteristic, which is evident in
WebSeer, has a rather negative impact on the author's project, since the search engine
resulting from the present project will only be able to search the one server that the
university has. This, however, does not make WebSeer entirely disadvantageous for the
project at hand, as it was still an important resource for understanding the functions of
the average image search engine.

The Rise search engine, on the other hand, works on specific databases in which it conducts
searches for images, a situation similar to the one that the author's search engine will be
in. For that reason it proves to be of a little more help to the present project. The Rise
image search engine takes the query derived from the information that the user provides and
determines its similarity to the many images in the databases, using the average of the
colours of the pictures to produce results. RISE builds on the JPEG indexing algorithm,
extending it to formats other than JPEG and into colour spaces, in addition to using a
relational database. A very useful aspect of this image search engine is that its algorithm
separates similar pictures and consequently does not show the same results too many times.
The author has not identified any aspect of this image search engine whose study would be
entirely impractical for this project, since the RISE image search engine is designed in a
way very similar to the one in which this project is planned to evolve.
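
The idea of comparing images by the average of their colours can be sketched as follows.
This is a deliberately simplified illustration and not the RISE algorithm itself: according
to the sources reviewed, RISE combines colour averages with quad trees and a relational
database, whereas this sketch assumes one whole-image average, an in-memory dictionary and
Euclidean distance. All names and pixel values are hypothetical.

```python
def average_colour(pixels):
    """Mean (R, G, B) of an image given as a non-empty list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def colour_distance(a, b):
    """Euclidean distance between two average colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def rank_by_average_colour(query_pixels, database):
    """Return image names ordered from most to least similar to the query image."""
    target = average_colour(query_pixels)
    return sorted(
        database,
        key=lambda name: colour_distance(target, average_colour(database[name])),
    )


# Hypothetical database of tiny four-pixel images.
database = {
    "red": [(255, 0, 0)] * 4,
    "blue": [(0, 0, 255)] * 4,
    "dark_red": [(200, 10, 10)] * 4,
}
query = [(250, 5, 5)] * 4
ranking = rank_by_average_colour(query, database)
```

In practice the averages would be computed once when the database is created and stored, so
that a query only needs to compare a few numbers per image.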

Lastly, the author spent a substantial part of the research on the PageRank algorithm, which
is one of the best known and most influential algorithms for computing the relevance of web
pages and a famous trademark of Google. It aids the correct ordering of the results derived
from search engines in accordance with the level of priority that the user has given. Some
argue that the PageRank algorithm is not quite the milestone it is regarded as. PageRank was
from the beginning an important, if not the most important, ranking factor, and the
algorithms of many other search engines operated similarly, so links came to be of high
economic value. Once links were sold, rented or exchanged, numerous companies and services
unexpectedly focused exclusively on the marketing of specialized links. This resulted in
sites being flooded with link entries. Nevertheless, the relevance and correct ordering of
results is a significant advantage that the author will make use of in the course of this
project.



5. Conclusion and Recommendations


During this year, the author has done extensive research on topics related to the assignment
so as to do a good job on the project and deliver a satisfying result to the client. These
research topics included the binary search technique and various binary search types,
searching queries, various image search engines to take as examples, and the study of
algorithms such as the search algorithm and the PageRank algorithm. Working with all the
topics mentioned above has greatly enhanced the author's knowledge of the subject of the
project and has improved the author's ability to work efficiently towards its completion.
The knowledge acquired is now to be applied in the practical part of the assignment, where
the author is called to fix the problematic aspects of the image search engine of the
university and to generally improve its functions and, consequently, the user experience.

But to conclude, it would be constructive to look in further depth at the author's reasoning
for the choice of the specific resources that have been used and, what is more, to look at
the endeavour from a critical point of view. That would serve the purpose of gaining greater
understanding of the gradual formation of the project and of getting an overview of the
process that will be followed in order for it to bring satisfying results.

The reason why the author conducted this careful and detailed research was to resolve
several queries, questions and problems that arose when the author set out to begin creating
an image search engine for the university. In the author's opinion, the best course of
action was to initially examine how different search engines work, what algorithms they use
and how they work through the different trees. For that reason, careful research was
conducted on how one-dimensional and multi-dimensional binary trees work. Subsequently, the
author studied several queries which make for a better, faster and more useful algorithm,
following the demands of the creator of the search engine as well as its future users. The
author also thought it right to study the operating manner of the search algorithm in
detail, because it would be necessary for the initiation of the image search engine project.
In that way, the author would reach the final goal of adapting the results derived from the
study of the algorithm and implementing them in the project to create a user-friendly search
engine which displays results quickly and efficiently. To reach that outcome, it is evident
that the author needed to have sufficient knowledge of the search algorithm.

To further understand the procedure the author followed, it would be constructive at this
point to specifically address the sources from which the material related to this project
was derived and to express a number of findings.

Caroline M. Eastman and S. F. Weiss, with the article “A Tree Algorithm for Nearest Neighbor
Searching in Document Retrieval Systems”, have provided valuable information concerning
binary trees to the author, who made use of the part that explains the general operating
procedure of binary trees, enhancing the author's understanding of binary trees overall.

From the work of Jon L. Bentley in “Multidimensional Binary Search Trees in Database
Applications” the author derived a more comprehensive review of the two types of binary
trees, one-dimensional and multi-dimensional, and examined several positive and negative
aspects of each and how each works. From here the author also learned about exact match
queries and partial match queries. Concerning these, see also the article written by Jon L.
Bentley, “Multidimensional Binary Search Trees Used for Associative Searching”, and the work
of Ronald L. Rivest, “Partial-Match Retrieval Algorithms”.

Through the joint work of Jerome H. Friedman, Jon L. Bentley and Raphael A. Finkel in the
article “An Algorithm for Finding Best Matches in Logarithmic Expected Time” the author
gained greater knowledge of the step-by-step procedure that the search algorithm follows, in
simple terms, and this has greatly aided the author's understanding of the algorithm. The
search algorithm has provided an example for the creation of an adequate image search
algorithm. It is additionally a useful resource because parts of it might eventually be
necessary for the completion of the image search algorithm.

In another joint work, that of Michael J. Swain, Charles Frankel and Vassilis Athitsos for
the article “WebSeer: An Image Search Engine for the World Wide Web”, the author found much
information regarding how the WebSeer image search engine operates. In this work, the three
authors lay out the negative and positive elements of WebSeer, the way in which it uses
multiple decision trees and also how it uses image similarity and how that works. Image
similarity is also the subject of the work of Xie Chao, Wei Chengjian and Xu Jun in their
article “Evolutionary Wavelet-Based Similarity Search in Image Databases”, from which the
author derived the three characteristics of images that support the image similarity search
that WebSeer uses. Additionally, the author found useful the work of Priti Maheshwary and Dr
Namita Srivastava titled “Retrieval of Remote Sensing Images Using Colour & Texture
Attribute” for the further analysis of the aforementioned characteristics and for describing
an effective algorithm which utilizes them.

From the collaboration of Debangshu Goswami and Sanjiv K. Bhatia in the article “RISE: A
Robust Image Search Engine” the author learned how the Rise image search engine works and
examined its use of quad trees. Elements from the algorithms and queries mentioned are
important for the author's project.

Lastly, through the works of Yushi Jing and Shumeet Baluja in “PageRank for Product Image
Search”, Sergey Brin and Lawrence Page in “The Anatomy of a Large-Scale Hypertextual Web
Search Engine” and Arnon Rungsawang and Bundit Manaskasemsak in “Parallel Adaptive Technique
for Computing PageRank”, the author comprehended the workings of the PageRank algorithm, its
operation and its helpfulness in the effective display of the results that the user
requires. Part of this algorithm is also described in this review.

After acquiring the essential knowledge base, the author proceeded to further analyze
specific image search engines from which much useful information would be extracted. The
author learned how they operate, how they were created, the rationale of their creators and
what each of them offers to its respective users. Following this study, the author decided
to evaluate two image search engines, WebSeer and Rise. The primary goal was to enrich the
pool of knowledge available and, in the course of the evaluation, to take some valuable
parts from the algorithms and from the different queries, so as to compose an image search
engine which will comprise all the positive aspects of the two successful search engines
combined.

The author is of the opinion, after looking at several conclusions reached from the
research, that the Rise image search engine will in fact be of more help. That is because,
in effect, the author's image search engine shares more common ground with the Rise image
search engine than it does with the WebSeer image search engine. The image search engine
will be getting elements solely from the university's server, a situation more similar to
the Rise application than to the WebSeer application, which receives elements from across
the World Wide Web, analyzes them against each other and, according to the results that it
gets from them, displays the results that best fit the user's query.

An important element of a good image search engine is the display of results according to
their usefulness to the user's query. The PageRank algorithm is an effective algorithm which
greatly helps in the ordering of all the search results by priority. That is the main reason
why it was also a key part of the author's research.


References


Aouat S., Larabi S., (2010). Indexing Binary Images using quad-tree Decomposition. In
Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on. Istanbul, 10-13
Oct. 2010. Istanbul: IEEE. p. 3074-3080.


Bentley J. L., Burkhard W. A., (1975). Heuristics for Partial-Match Retrieval Data Base
Design. 5 (2), p. 1-11.


Bentley J. L., (1979). Multidimensional Binary Search Trees in Database Applications. IEEE
Transactions on Software Engineering, SE-5, p. 333-340.


Bentley J. L., (1975). Multidimensional Binary Search Trees Used for Associative Searching.
18 (9), p. 509-517.


Brin S., Page L., (1998). The anatomy of a large-scale hypertextual Web search engine. In
Proceedings of the seventh international conference on World Wide Web. Amsterdam, 1 April
1998. Amsterdam: Elsevier Science Publishers. p. 107-117.


Chao X., Chengjian W., Jun X., (2005). Evolutionary wavelet-based similarity search in image
databases. In VLSI Design and Video Technology, 2005. Proceedings of 2005 IEEE International
Workshop. Suzhou, 28-30 May 2005. Suzhou: IEEE. p. 385-388.


Eastman C. M., Weiss S. F., (1978). A tree algorithm for nearest neighbour searching in
document retrieval systems. In Proceedings of the 1st annual international ACM SIGIR
conference on Information storage and retrieval. New York, May 1978. New York: ACM.
p. 131-149.


Friedman J. H., Bentley J. L., Finkel R. A., (1977). An Algorithm for Finding Best Matches
in Logarithmic Expected Time. 3 (3), p. 209-226.



Goswami D., Bhatia S. K., (2006). RISE: A Robust Image Search Engine. In Electro/information
Technology, 2006 IEEE International Conference on. East Lansing, 7-10 May 2006. East
Lansing: IEEE. p. 354-359.


Jing Y., Baluja S., (2008). Pagerank for product image search. In Proceedings of the 17th
international conference on World Wide Web. Beijing, 21-25 April 2008. Beijing: ACM.
p. 307-316.


Priti M., Namita S., (2009). Retrieval of Remote Sensing Images Using Colour & Texture
Attribute. 4 (1-2), p. 1-5.


Qingyun C., (2011). Research on incremental decision tree algorithm. In Electronic and
Mechanical Engineering and Information Technology (EMEIT), 2011 International Conference on.
Zaozhuang, 12-14 Aug. 2011. Zaozhuang: IEEE. p. 303-306.


Rivest R. L., (1976). Partial-Match Retrieval Algorithms. 5 (1), p. 19-50.


Rungsawang A., Manaskasemsak B., (2006). Parallel Adaptive Technique for Computing PageRank.
In PDP '06 Proceedings of the 14th Euromicro International Conference on Parallel,
Distributed, and Network-Based Processing. Washington, 15-17 February 2006. Washington:
IEEE. p. 15-50.


Stevenson K., Leung C., (2005). Comparative evaluation of Web image search engines for
multimedia applications. In Multimedia and Expo, 2005. ICME 2005. IEEE International
Conference on. Amsterdam, 6-8 July. Amsterdam: IEEE. p. 1-4.


Swain M. J., Frankel C., Athitsos V., (1996). WebSeer: An Image Search Engine for the World
Wide Web. Chicago, Illinois: The University of Chicago. p. 1-24.


Swain M. J., Frankel C., Athitsos V., (1997). WebSeer: An Image Search Engine for the World
Wide Web. Chicago, Illinois: The University of Chicago. p. 1-8.