Content based search engine optimization


Content based search engine optimization | Kelvin CHENG







Content based search engine optimization

Research Proposal for PhD









Author: Ran CHENG (Kelvin)

Supervisor: Professor Xiaofang Zhou







Table of Contents

1. Introduction
   1.1 Statement of Research Problem
   1.2 Significance of Study
2. Literature Review
   2.1 Multimedia Information Retrieval: What is it, and why isn’t anyone using it?
      2.1.1 Definition of multimedia
      2.1.2 Difference between MIR and retrieval of non-multimedia information
      2.1.3 Most important technical problems in MIR
      2.1.4 Killer applications in MIR
   2.2 Metadata base system for semantic image search
3. Research Method
   3.1 Research Approach
   3.2 Time-line
4. Bibliography


1. Introduction

1.1 Statement of Research Problem

A search engine (SE) is an information retrieval system designed to help find information stored on a computer system. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.


Current market share

Most popular search engines worldwide, Dec. 2007

Company                      Millions of searches   Relative market share
Google                       28,454                 46.47%
Yahoo!                       10,505                 17.16%
Baidu                        8,428                  13.76%
Microsoft                    7,880                  12.87%
NHN                          2,882                  4.71%
eBay                         2,428                  3.9%
Time Warner (includes AOL)   1,062                  1.6%
Ask.com and related          728                    1.1%
Yandex                       566                    0.9%
Alibaba.com                  531                    0.8%
Total                        61,221                 100.0%

(Wikipedia, 2008)

Currently, the main search method is text based search: when users enter search queries, the SE returns a list of results in text format (links to websites that contain the query keywords, or hints that help users refine their queries). With the development of the Internet and broadband, users have started using search engines to look for more and more multimedia materials, such as music, movies and pictures. At this stage, most search engines use the descriptions or titles of multimedia materials to locate them. This method cannot locate multimedia materials accurately, since it does not check their content. Imagine that somebody uploads a video clip to YouTube without giving an accurate description or title; other people then have no way to locate this clip. As another example, assume the description or title of a video clip is accurate: when somebody comes to YouTube knowing the content of the clip but not how to describe it, the problem arises again.
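The limitation described above can be illustrated with a minimal sketch of a search that looks only at titles and descriptions. The function name `metadata_search`, the clip records and their ids are hypothetical and serve only as an illustration; this is not how any particular search engine is implemented.

```python
def metadata_search(query, clips):
    """Return ids of clips whose title or description contains every query word."""
    words = query.lower().split()
    hits = []
    for clip in clips:
        # Only the textual metadata is inspected, never the video content itself.
        text = (clip["title"] + " " + clip["description"]).lower()
        if all(word in text for word in words):
            hits.append(clip["id"])
    return hits


# Hypothetical catalogue: both clips are assumed to show a cat playing a piano,
# but only the first one says so in its metadata.
clips = [
    {"id": "a1", "title": "Cat playing piano", "description": "My cat on the keys"},
    {"id": "b2", "title": "VID_0042", "description": ""},
]

found = metadata_search("cat piano", clips)          # clip b2 cannot be found
missed = metadata_search("kitten keyboard", clips)   # different wording, same intent
```

In this sketch the second clip is invisible to every query, and the first clip is missed whenever the user describes its content in words that differ from the uploader's wording.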

1.2 Significance of Study

It is hoped that when the work for this project (both research and implementation) is completed, the study will contribute to the field of content based search engine optimization in the following ways:

• A conceptual model which explains how to check the content of multimedia materials when search engines try to locate them.

• A conceptual model which lets the search engine crawl multimedia materials and automatically generate a search engine friendly content description.

• A software application prototype which can be further extended into fully functional software used by the search engine.

We hope that after this research project, search engines will need only a few changes or extensions to create a new content based search engine plug-in, which will increase the chance of accurately locating multimedia materials on the Internet.



2. Literature Review

2.1 Multimedia Information Retrieval: What is it, and why isn’t anyone using it?

In this paper, a few researchers in the Multimedia Information Retrieval (MIR) area answer questions about what multimedia is, how MIR is different from other kinds of retrieval, the most important technical challenges in MIR, killer applications, opportunities, and future directions.

2.1.1 Definition of multimedia

The researchers define multimedia slightly differently, but they all agree that multimedia should be considered as something that contains more than one type of media, such as text, image, video and audio. In a broader sense, a web page with both text and images can be considered multimedia.

2.1.2 Difference between MIR and retrieval of non-multimedia information

Wei-Ying Ma: I think one of the challenges for MIR is a simple but effective way of forming a query.

Mike Christel: MIR when dealing with visual or aural information deals with data that humans directly sense with their eyes and ears, and so can allow for approaches that open up the multimedia data for more intuitive inspection by the user than is possible with non-multimedia information like text.

Sebastien Gilles: The difference between MIR and non-multimedia retrieval is that MIR comprises another layer of complexity: fusion. Multimedia to multimedia search requires data fusion at some point, during either feature, metric or ranking computation.

Ramesh Sarukkai: One of the key differences in multimedia information retrieval is a higher level of “perceptual gap” between end user consumption of the media versus current system analysis of the data.
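As a rough illustration of the fusion at ranking time that Sebastien Gilles mentions, the sketch below merges per-modality relevance scores with a weighted sum. The weighted sum, the function name `fuse_rankings`, the clip ids and the scores are all assumptions chosen for illustration; the reviewed paper does not prescribe a specific fusion rule.

```python
def fuse_rankings(score_lists, weights):
    """Combine per-modality scores into one ranking by weighted sum.

    score_lists: one dict per modality, mapping item id to a relevance score.
    weights: one weight per modality, in the same order.
    """
    fused = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            # Items missing from a modality simply contribute nothing for it.
            fused[item] = fused.get(item, 0.0) + weight * score
    return sorted(fused, key=lambda item: fused[item], reverse=True)


# Hypothetical scores from a text matcher and a visual matcher.
text_scores = {"clip1": 0.9, "clip2": 0.2}
visual_scores = {"clip1": 0.3, "clip2": 0.8, "clip3": 0.5}

balanced = fuse_rankings([text_scores, visual_scores], [0.5, 0.5])
visual_first = fuse_rankings([text_scores, visual_scores], [0.2, 0.8])
```

Changing the weights changes which modality dominates the final order, which is one simple way to see why fusion adds a layer of design decisions that text-only retrieval does not have.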

2.1.3 Most important technical problems in MIR

Mike Christel: First, how to address the semantic gap between low-level features and high-level user information needs for MIR. A second key problem for multimedia retrieval against video is demonstrating that techniques from the computer vision community scale to materials outside of the researchers’ particular test sets. Third, how to best leverage the intelligence and goals of human users in accessing multimedia contents meeting their needs, rather than overwhelming them with exponentially expanding amounts of irrelevant materials.

Wei-Ying Ma: First, large scale image classification. Second, relevance feedbacks. Third, UI for presenting multimedia retrieval results.

2.1.4 Killer applications in MIR

Mike Christel: The killer functionality which crosses domains and applications is transforming our capability to produce and store massive amounts of multimedia materials into a benefit.

Ramesh Sarukkai and Wei-Ying Ma: The killer application is already there: Web based video search.

This is not a research paper that introduces new ideas or new approaches; rather, it answers a few important questions in the MIR area from the points of view of several researchers.

It is an introductory paper and does not go very deeply into each question, but it is very helpful for bringing new researchers into the MIR area.

2.2 Metadata base system for semantic image search

Two major approaches to image retrieval are direct retrieval using partial pattern matching and indirect retrieval using abstract information of images. The metadata base system uses the latter approach for extracting images. It creates a new model for realizing semantic associative search, extracting information from a given keyword and the context words which explain the context of that keyword. This model can be applied to extract images by giving the keyword and its context words, which represent the impression and contents of the images.

This model consists of a few steps:

1. “A set of m words (testing data set) is given, and each word is characterized by n features. The m and n are used to form a data matrix.

2. The correlation matrix with respect to the n features is constructed. Then, the eigenvalue decomposition of the correlation matrix is computed and the eigenvectors are normalized. The orthogonal semantic space is created as the span of the eigenvectors which correspond to nonzero eigenvalues.

3. Images and keywords are characterized by using the specific features (words) and representing them as vectors.

4. The images and keywords are mapped into the orthogonal semantic space by computing the Fourier expansion for the vectors.

5. A set of all the projections from the orthogonal semantic space to the invariant subspaces (eigen spaces) is defined. Each subspace represents a phase of meaning, and it corresponds to a context or situation.

6. A subspace of the orthogonal semantic space is selected according to the user’s impression or the image’s content, which is given as a context represented by a sequence of words.

7. The closest image to the keyword in the user’s impression and the image’s contents is extracted in the selected subspace.” (Yasushi Kiyoki)

Although the researchers use images as the example in this model, the approach (keywords and context words used to represent the users’ impressions and the images’ contents in semantic image retrieval) can still be applied to other multimedia materials.
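The steps of the model described above can be sketched with NumPy as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the unit-length normalisation of feature columns, the threshold rule used to select a subspace, and the Euclidean distance are all choices made here for the sake of a short example, as are the three hypothetical features and the sample vectors.

```python
import numpy as np


def build_semantic_space(data_matrix, tol=1e-10):
    """Steps 1-2: orthonormal axes from the feature correlation matrix."""
    m = np.asarray(data_matrix, dtype=float)       # m words x n features
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0                        # avoid division by zero
    m = m / norms                                  # assumption: unit-length columns
    corr = m.T @ m                                 # n x n, symmetric
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvectors are orthonormal
    return eigvecs[:, eigvals > tol]               # keep nonzero eigenvalues only


def to_space(axes, feature_vector):
    """Step 4: coordinates are inner products with each axis."""
    return axes.T @ np.asarray(feature_vector, dtype=float)


def select_subspace(axes, context_vectors, threshold=0.5):
    """Steps 5-6: keep the axes on which the context words load most strongly.

    Assumption: an axis is kept when its summed context coordinate reaches
    `threshold` times the largest coordinate in absolute value.
    """
    centre = sum(to_space(axes, v) for v in context_vectors)
    peak = np.max(np.abs(centre))
    if peak == 0:
        return np.ones(axes.shape[1], dtype=bool)  # no context: use every axis
    return np.abs(centre) / peak >= threshold


def closest_image(axes, images, keyword, context_vectors, threshold=0.5):
    """Step 7: nearest image to the keyword inside the selected subspace."""
    mask = select_subspace(axes, context_vectors, threshold)
    target = to_space(axes, keyword)[mask]
    distances = {
        name: float(np.linalg.norm(to_space(axes, vec)[mask] - target))
        for name, vec in images.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical data: 4 words described by 3 features (bright, calm, warm).
words = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
axes = build_semantic_space(words)
images = {"sunset": [1, 0, 1], "storm": [0, 1, 0]}   # step 3: images as vectors
best, all_distances = closest_image(axes, images, [1, 0, 1], [[1, 0, 0]])
```

Because images and keywords are both expressed in the same feature vocabulary, the same sketch would apply to any other media type for which such feature vectors can be produced, which is the point made in the paragraph above.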



3. Research Method

3.1 Research Approach

Many research methodologies are available. In this project, I plan to use the scientific method.

“Scientific method is a body of techniques for investigating phenomena and acquiring new knowledge, as well as for correcting and integrating previous knowledge. It is based on gathering observable, empirical, measurable evidence, subject to the principles of reasoning.” (Newton, 1999)

Normally, the scientific method consists of the following components:

• “Observe: collect evidence and make measurements relating to the phenomenon you intend to study

• Hypothesize: invent a hypothesis explaining the phenomenon that you have observed

• Predict: use the hypothesis to predict the results of new observations or measurements

  ◦ Often advanced mathematical and statistical hypothesis testing techniques are used to design experiments that attempt to effectively test the plausibility of hypotheses

• Verify: perform experiments to test those predictions.

  ◦ Attempting to experimentally falsify hypotheses is thought to be a better choice of term here

• Evaluate: if the experiment contradicts your hypothesis, reject it and form another. If the results are compatible with predictions, make more predictions and test it further.

• Publish: Tell other people of your ideas and results, and encourage them to verify the claims themselves, in particular by inviting them to challenge your reasoning and check that your experimental results can be repeated. This process is known as ‘peer review’.” (Gable, 2006)


The scientific method is a general concept, which can be used as a research method in many areas, not only in computer science.

Within the scientific method, there are two possible ways researchers can reason: Deductive Reasoning and Inductive Reasoning. “Deductive reasoning is generally used to predict the results of the hypothesis. That is, in order to predict what measurements one might find if you conduct an experiment, treat the hypothesis as a premise, and reason deductively from that to some not currently obvious conclusion, then test for that conclusion.” (Gable, 2006) This approach starts from the more general and keeps narrowing the topic to make it more specific; it is sometimes called the ‘top down’ approach. The right-hand side picture shows the procedure of Deductive Reasoning. Inductive reasoning works the reverse way: it starts from the more specific, or the bottom, and moves to the more general, or up. The left-hand side picture shows the procedure of Inductive Reasoning. Together, the two ways of scientific reasoning form the cyclical nature of research, which is shown below.



3.2 Time-line

Within 6 months

• Literature Review

• Fully understand the research problem and divide the big problem into small pieces.

Within 12 months

• Understand the characteristics of different multimedia material categories (such as pictures, movies and music)

• Come up with ideas for retrieving the content information of multimedia materials according to their characteristics

Within 30 months

• A conceptual model which explains how to check the content of multimedia materials when search engines try to locate them.

• A conceptual model which lets the search engine crawl multimedia materials and automatically generate a search engine friendly content description.

• A software application prototype which can be further extended into fully functional software used by the search engine.

Within 36 months

• Finish PhD dissertation

4. Bibliography

Alejandro Jaimes, M. C.-Y. (2005). Multimedia Information Retrieval: What is it, and why isn't anyone using it?

Chang, S.-F. m. (1999). Multimedia Access and Retrieval: The State of the Art and Future Directions. Orlando FL.

Christel, M. S. (1998). Evolving Video Skim into Useful Multimedia Abstractions. Los Angeles, CA.

Davis, M. K. (2004). From Context to Content: Leveraging Context to Infer Media Metadata. New York, NY.

Gable, G. (2006). Scientific Method - ITN100 Research Methodology Lecture Note. Brisbane.

Hart, P. P. IEEE Magazine.

Hauptmann, A. a. (2004). Successful Approaches in the TREC Video Retrieval Evaluations. New York, NY.

Hauptmann, A. (2005). Lessons for the Future from a Decade of Informedia Video Analysis Research. Singapore.

Newton, I. (1999). The System of the World (B. Cohen & A. Whitman, Trans., 3rd ed.). University of California Press.

Rowe, L. a. ACM SIGMM Retreat Report on Future Directions in Multimedia Research. http://www.sigmm.org/Events/reports/retreat03/sigmm-retreat03-final.pdf, March, 2004.

Wikipedia. (2008). Current SE market share.

Yasushi Kiyoki, T. K. A Metadatabase System for Semantic Image Search by a Mathematical Model of Meaning. Tsukuba.