
15 Nov 2013





Abstract


Today, the World Wide Web has grown into a distributed information space that spans nearly 100 million workstations and several billion pages. Although a huge amount of information is available on the Web, finding the information one needs is very difficult. Search engines are the most important tool people use to get information from the Internet, but low accuracy and low recall are widespread in current search engines. With the rapid development of the Internet, building effective, accurate and intelligent search engines based on Web Mining technology has become an important research issue. Web search technology is still at an early stage: search engines provide only primitive data query capabilities and require a detailed syntactic specification to retrieve relevant data. Web data can also include PDF documents, images and sound clips, which are difficult to query. This project therefore investigates a new web query method, called the Smart Web Query (SWQ) method, for semantic retrieval of web data. The SWQ method uses domain semantics, represented as context ontologies, to specify and formulate appropriate web queries for the search.


Index Terms


Search Engines, Information Retrieval, Web Mining, Ontology Development and Smart Web Query Engine.


Duygu Celik,

035313

Eastern Mediterranean University

Computer Engineering Department

Gazimagusa, K.K.T.C

duygu.celik@emu.edu.tr

A New Web Query Method and Semantic
Retrieval of Web Data




CONTENTS

I. INTRODUCTION
II. WEB INTELLIGENCE IN INFORMATION RETRIEVAL
2.1 Multimedia Information Retrieval
2.2 Introduction to Search Engines
III. RELATED RESEARCH FOR SEARCH ENGINES
3.1 Syntactic Search
3.2 Metadata Search
3.3 Query-by-Example
3.4 Autonomous Navigational Search
IV. ONTOLOGY DEVELOPMENT AND APPLICATION
4.1 Ontology Development and Problems Encountered
V. SMART WEB QUERY METHOD AND ENGINE
5.1 SWQ Architecture and Process
5.1.1 Web Query Parse
5.1.2 Ontology Determination
5.1.3 Synonym Determination
5.1.4 Web Query Formulation
5.1.5 Determination of Web Page Relevance
5.1.6 Filter Search
5.1.7 Page Ranking
VI. CONTEXT ONTOLOGY
6.1 Flexible structure of Context Ontologies
6.2 Contents and Organization
6.2.1 Organization of Context Ontologies
6.2.2 Ontological and Term Properties
6.2.3 Term Relationship Properties
6.2.4 Relational Scheme of SWQ Ontologies
6.3 Benefits and characteristics of the flexible structure
VII. SEMANTIC SEARCH FILTERS
7.1 Readability Filter
7.2 Document Structure (Layout) Filter
7.3 Word Sense Filter
VIII. Conclusion
IX. References








I.

INTRODUCTIO
N




The World Wide Web is the largest and most accessible source of data, but Web searches are difficult for several reasons.

First, search engines are very simple. A search request is made with keywords or terms, entered by the user and separated by Boolean operators. Obtaining accurate search results with good recall may require many keywords.

Second, context-sensitive web search is not available, because search engines can only retrieve data on a syntactic basis. In other words, domain-specific knowledge cannot be built into search engine queries.

Third, web data can come in heterogeneous formats such as PDF documents and images, and most search engines do not consider such data sources when deriving their results.

As a result, Internet users currently cannot perform semantic retrieval of web data from the WWW. For example, a user may want to search for “bonds” in the sense of financial investments, but the search engine also returns undesired results about adhesives, where the term “bonds” refers to an attachment between two objects.

First, I introduce the subject area. In the second part, I investigate web intelligence in information retrieval and then give an introduction to search engines. The third part covers related research on search engines and some approaches to web search. The fourth part is about ontology development and application. The fifth part describes the SWQ method and engine and the role of context ontologies in the SWQ method. The sixth part covers context ontologies, investigating their contents and organization and showing how terms and relationships can be represented in a flexible manner. The seventh part explains semantic search filters. The last two parts of the project cover the evaluation of performance results and the conclusion.








II. WEB INTELLIGENCE IN INFORMATION RETRIEVAL


As the amount of information on the Web keeps increasing, intelligent systems are required. Web intelligence combines Artificial Intelligence and Information Technology to produce such intelligent systems.


Fig. 1 Web Intelligence (taken from [2]).

The Web Intelligence Consortium is an international organization that studies this area. It identifies 9 key topics, which are divided into 75 subareas; multimedia retrieval is one of the subareas of the web information retrieval category. This project focuses on ontology-based information retrieval and multimedia retrieval. The subareas of web information retrieval are shown in the next figure.


Fig. 2 Web Information Retrieval (taken from [2]).




Intelligent multimedia information retrieval is a multidisciplinary area that lies at the intersection of artificial intelligence, information retrieval and multimedia computing.

2.1 Multimedia Information Retrieval

More and more information is available on the Web, and the efficient and effective retrieval and management of web documents remain very active research issues. With the rapid development of Internet technology, the number of Internet users and the amount of multimedia information on the Internet keep increasing. Many web sites now contain a lot of image information, and to find a specific image in these sources we usually use image database engines or web search engines. Image contents are more complicated to retrieve than, say, textual data stored in traditional databases. Image retrieval techniques should support user queries effectively and efficiently, just as conventional information retrieval does for text.

An agent on the Web can be described as a program that collects information or performs some other service without your immediate presence, on some regular schedule. Usually, an agent program uses parameters provided by the user to search all or part of the Internet, gathers the information you are interested in, and presents it to you on a predefined periodic basis. Examples of agents are Semantic Agents, Information Filtering Agents and Remembrance Agents.

The next part of the project presents a smart search engine based on ontology-based information retrieval, together with its properties.

2.2 Introduction to Search Engines

When people use a search engine through a Web browser, they see only part of it, the “front end”. This interface handles the human-machine interaction. When a user submits a search request through the browser, the browser connects to the web server. The web server finds the matching documents in a large indexed database, lists the indexed documents, and returns the results to the user through the browser [3].

At the “back end”, a spider moves automatically through the Web, summarizes the indexed information of web pages, and saves it together with each page's URL in the indexed database. The indexed database must be updated continuously to keep up with the dynamic changes of the Internet [3].





Fig. 3 The architecture of the Search Engine (taken from [3]).
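The front-end and back-end roles described above can be illustrated with a minimal Python sketch of an indexed database. It is a simplification, not how any production search engine works: the back end adds each fetched page's words to an in-memory inverted index, and the front end returns the URLs that contain every query keyword. The class name, the tokenizer and the example URLs are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Indexed database: maps each term to the set of URLs whose content contains it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add_page(self, url: str, content: str) -> None:
        # Back end: a page fetched by the spider is indexed under its URL.
        for term in tokenize(content):
            self.postings[term].add(url)

    def search(self, query: str) -> set[str]:
        # Front end: return the URLs that contain every keyword of the query.
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results


index = InvertedIndex()
index.add_page("http://example.com/bonds", "High grade bonds for low risk investors")
index.add_page("http://example.com/glue", "Adhesive bonds between two surfaces")
print(sorted(index.search("bonds")))       # → ['http://example.com/bonds', 'http://example.com/glue']
print(sorted(index.search("high bonds")))  # → ['http://example.com/bonds']
```

A search for “bonds” alone returns both pages, which is the low-precision problem described in the introduction; adding a keyword narrows the result purely syntactically, with no notion of context.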

Search engines are divided into two categories according to the way they collect information and provide their service: Directory-Based Search Engines and Crawler-Based Search Engines. The former collects information in advance and forms summaries by manual work after an editor has browsed the information on the Web [3].

Because human intelligence is added, this kind of search engine offers accurate information and high-quality navigation. However, it cannot search the whole Web, and its indexed database cannot be updated in time.

The latter fetches a web site and all the pages it links to, analyzes their content, and adds them to the indexed database automatically, following the hyperlinks from one page to another, by running programs called crawlers, spiders or robots. For this reason, the indexed databases of this kind of search engine are very large, and the information can be updated in time. The disadvantage is that there are so many search results that the information the user wants is not reliably shown on the first pages.

Another type of search engine is the Meta-Search Engine. It has no indexed database of its own; instead, after preprocessing the query, it distributes it to multiple underlying search engines. When the engine receives the query results from the underlying search engines, it ranks these results and returns them to the user. This kind of engine has benefits such as increasing coverage of the Web and improving the scalability of the system, but it cannot fully use the functions of the original search engines and must do more processing on the search results.
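The text does not say how a meta-search engine orders the combined results. As one common option, the Python sketch below merges the ranked lists with a Borda count: a URL at 0-based position i in a list of n results earns n - i points, and a URL returned by several engines collects points from each list. Sending the query to the underlying engines is left out; their answers are given as a hypothetical dictionary of ranked URL lists.

```python
from collections import defaultdict


def borda_merge(result_lists: dict[str, list[str]], top_k: int = 10) -> list[str]:
    """Merge ranked URL lists from several underlying engines with a Borda count.

    A URL at 0-based position i in a list of length n earns n - i points;
    a URL returned by several engines collects points from each list.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranked_urls in result_lists.values():
        n = len(ranked_urls)
        for position, url in enumerate(ranked_urls):
            scores[url] += n - position
    # Highest score first; ties are broken alphabetically so the order is deterministic.
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return merged[:top_k]


# Hypothetical answers from two underlying engines for the same query.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u4"],
}
print(borda_merge(results))  # → ['u2', 'u1', 'u3', 'u4']
```

“u2” comes first because both engines returned it, which shows how a meta-search engine can widen coverage while still promoting results that several engines agree on.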




III. RELATED RESEARCH FOR SEARCH ENGINES


The development of the SWQ method is related to two areas: Web search technology, and the development and use of ontologies in web searches. There are several approaches to web search:

• Syntactic Search
• Metadata Search
• Query-by-Example
• Autonomous Navigational Search

I explain each of them in general terms in the following sections.



3.1 Syntactic Search

Syntactic search is the most widely used search approach for unstructured or semi-structured data, and most existing search engines use it. Documents are discovered, classified and referenced in a search database by human subject experts or by automated agents/spiders. Database instances are accessed either by keywords separated by Boolean operators such as ‘and’ and ‘or’, or by hierarchical searches.



3.2 Metadata Search

Metadata search improves on syntactic search by using relevant aspects of a document's schema. There are two types of metadata search: syntactic and semantic. In syntactic metadata search, syntactic elements of a document such as its title, section headings and links are considered. For example, the metadata search engine Harp integrates queries on multiple search engines into a relational database format, and Harp's collected results are then compared to identify relevant web pages. Google identifies relevant web pages based on the documents that reference them (through links) from other web pages.



Semantic metadata search is usually performed in generalized domains. Domain-specific knowledge is used to increase the accuracy and recall of the query. For example, the CiteSeer system retrieves research publications on specific subject areas. In CiteSeer, the searcher can specify domain-specific knowledge, such as commonly referenced authors, to identify appropriate publications; the system then checks each document's bibliography for the relevant authors.



3.3 Query-by-Example




In Query-by-Example (QBE), the search engine is given a sample document and then retrieves documents that it suspects are relevant. The user can enlarge the pool of example documents and refine the search by selecting the relevant documents among those returned by the search engine. QBE is used for non-textual searches, such as searches for web images.
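QBE can be sketched as similarity ranking against the example pool. The Python sketch below assumes bag-of-words term vectors and cosine similarity, neither of which the text prescribes; for images, the vectors would instead hold image features such as color histograms. Each candidate is scored by its best similarity to any example document, so adding the documents the user marks as relevant to the pool refines the next round. All names and documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies of a document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query_by_example(examples: list[str], candidates: dict[str, str], top_k: int = 3) -> list[str]:
    """Rank candidates by their highest similarity to any document in the example pool."""
    example_vectors = [term_vector(e) for e in examples]
    scored = []
    for doc_id, text in candidates.items():
        vec = term_vector(text)
        best = max((cosine(vec, ev) for ev in example_vectors), default=0.0)
        scored.append((best, doc_id))
    # Most similar first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


candidates = {
    "d1": "bond yield falls",
    "d2": "strong glue bond",
    "d3": "weather today",
}
print(query_by_example(["government bond yield rises"], candidates))  # → ['d1', 'd2', 'd3']
```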



3.4 Autonomous Navigational Search



Navigational search resembles searching the Web by browsing. A robot or autonomous agent is given a sample page that contains links. The robot traverses the links, evaluating each page it encounters for relevance to a search specification.
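The traversal described above is essentially a bounded breadth-first crawl. In the following Python sketch the link graph and page texts are hypothetical in-memory dictionaries standing in for real HTTP fetching, and the search specification is a simple predicate; in practice that specification would come from one of the other search strategies.

```python
from collections import deque
from typing import Callable


def navigational_search(
    links: dict[str, list[str]],
    pages: dict[str, str],
    start: str,
    is_relevant: Callable[[str], bool],
    max_pages: int = 100,
) -> list[str]:
    """Traverse links breadth-first from a sample page, collecting pages judged relevant.

    `links` maps a URL to the URLs it links to and `pages` maps a URL to its text;
    together they stand in for real HTTP fetching, which this sketch leaves out.
    At most `max_pages` pages are evaluated.
    """
    visited = {start}
    queue = deque([start])
    relevant: list[str] = []
    evaluated = 0
    while queue and evaluated < max_pages:
        url = queue.popleft()
        evaluated += 1
        if is_relevant(pages.get(url, "")):
            relevant.append(url)
        for target in links.get(url, []):
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return relevant


def mentions_bond(text: str) -> bool:
    """A toy search specification: the page mentions 'bond'."""
    return "bond" in text.lower()


links = {"start": ["a", "b"], "a": ["c"], "b": ["a"], "c": []}
pages = {
    "start": "bond market news",
    "a": "glue and adhesives",
    "b": "high grade bonds",
    "c": "bond yields",
}
print(navigational_search(links, pages, "start", mentions_bond))  # → ['start', 'b', 'c']
```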

In summary, the semantic metadata search approach appears to be the most suitable one for retrieving relevant web documents:

Syntactic search is poor because it does not consider the context of the search.

QBE is useful in very generalized domains such as querying for images, but it is difficult to apply to a generic search.

Navigational search employs an agent, which typically must have its search specification represented using one of the other search strategies.

Syntactic metadata search is poor as well, because it is purely syntactic.

The most important difficulty for semantic metadata search is the rapid development of metadata representations for multiple contexts. Most metadata search engines limit themselves to particular contexts, because representing a context is a difficult and time-consuming task.







IV. ONTOLOGY DEVELOPMENT AND APPLICATION


4.1 Ontology Development and Problems Encountered


Ontologies can be developed for different areas such as chemistry, law, phone directories and product catalogs. Ontologies are important for capturing the semantics of terms. An ontology can be defined as an explicit specification of a conceptualization, and a conceptualization provides a simplified view of the world. Most ontologies include terms, their definitions and the relationships between them, and they use these relationships to create an organization such as a hierarchy or taxonomy. Some general and some specific problems can occur when an ontology is developed for general-purpose search engines. For example, traditional ontologies are normally developed by a human domain expert, but given the wide range of the Web, it is difficult and expensive to obtain domain experts for the vast number of topics users will search for.

These problems are:

• Difficulty of Obtaining Human Domain Expertise
• Extensibility
• Validation Difficulty
• Heterogeneous Definitions
• Ontology Integration
• Adaptability
• Ontologies Adopted for Web Search


V. SMART WEB QUERY METHOD AND ENGINE


5.1 SWQ Architecture and Process

The SWQ architecture can be divided into three parts: the Internet users and their user interfaces; the SWQ engine and its components; and the underlying search engines. The second part, the SWQ engine, is the main part and includes two components: Context Ontologies and Semantic Search Filters. Context ontologies represent application domains, and search filters improve search precision based on properties in the context ontology. The user interface is similar to that of traditional search engines: the user gives a set of keywords separated by Boolean operators, together with search parameters such as the maximum number of pages to consider for the search operation. The figure below shows the SWQ architecture:




Fig. 4 The System Architecture of Smart Web Query (taken from [1]).


The SWQ engine performs a seven-step process. The steps run one after another, each building on the output of the previous one, until the semantic search results are obtained; these results are then sent to the user. The results are derived from the user's search request, using the keywords the user supplied. An overview of these steps is shown in the figure below:







Fig. 5 Semantic Web Search Process performed by SWQ Engine (taken from [1]).


5.1.1 Web Query Parse

The user enters some keywords and search parameters, and the engine then builds a parse tree for the user's query. For example, a user who wants to search for high grade bonds and wishes to consider only 50 web pages can form such a query through the SWQ user interface, as shown in the figure below:





Fig.6 User Interface for Specifying Web Queries

(taken from [1]).
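The text does not give SWQ's query grammar. As a minimal Python sketch, assuming keywords joined by upper-case AND/OR operators, with AND binding tighter than OR and no parentheses, the parse tree and the page limit could be represented as follows. The Node and WebQuery classes and the parse_query function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Parse-tree node: an operator ('AND' or 'OR') with children, or a leaf holding a term."""
    value: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class WebQuery:
    tree: Node
    max_pages: int  # search parameter: how many web pages to consider


def parse_query(query: str) -> Node:
    """Split on ' OR ' first, then on ' AND ', so AND binds tighter than OR.

    Words not separated by an operator stay together as one phrase, e.g. 'high grade'.
    """
    or_children = []
    for part in query.split(" OR "):
        terms = [Node(t.strip()) for t in part.split(" AND ")]
        or_children.append(terms[0] if len(terms) == 1 else Node("AND", terms))
    return or_children[0] if len(or_children) == 1 else Node("OR", or_children)


query = WebQuery(parse_query("high grade AND bonds"), max_pages=50)
print(query.tree)
```

The example mirrors the query from the text: “high grade” AND “bonds”, limited to 50 pages.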


5.1.2 Ontology Determination

The SWQ engine supports multiple context ontologies, and the user should select the most relevant ontology to help the search operation. To support this ontology determination, the engine matches the user's keywords against the ontologies and presents the appropriate ones, ordered from most to least appropriate. For example, the terms “Bond” and “High Grade” exist in both the Financial Trading and the Adhesives ontologies. At this point the engine asks the user which ontology is most relevant to his/her search, and the user selects the correct one. The next figure shows this step as an example.


Fig.7 User Interface for Identifying the Relevant Ontology

(taken from [1]).
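One simple way to order the candidate ontologies is to count how many of the user's keywords each ontology contains. The Python sketch below assumes that measure (the text does not specify the engine's exact matching rule) and uses small, made-up term sets. A tie, as in the bond example, is exactly the case where the engine has to ask the user.

```python
def rank_ontologies(keywords: list[str], ontologies: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Order ontologies from most to least appropriate by the number of keywords they contain.

    Matching is case-insensitive; ontologies that contain none of the keywords are omitted.
    """
    wanted = {k.lower() for k in keywords}
    ranked = []
    for name, terms in ontologies.items():
        hits = len(wanted & {t.lower() for t in terms})
        if hits:
            ranked.append((name, hits))
    # Most hits first; ties are listed alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical, heavily abbreviated ontologies.
ontologies = {
    "Financial Trading": {"bond", "high grade", "yield", "stock", "share"},
    "Adhesives": {"bond", "high grade", "epoxy", "sanding pad"},
    "Weather": {"rain", "wind"},
}
print(rank_ontologies(["Bond", "High Grade"], ontologies))
# → [('Adhesives', 2), ('Financial Trading', 2)]
```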




5.1.3 Synonym Determination

The engine generates synonyms for the user's key terms. The keywords, together with their synonyms, are then passed to Boolean search engines such as AltaVista and Yahoo. For example, “AAA”, “investment grade” and “low risk” are all synonyms for “high grade”. Similarly, “T-bill”, “municipal” and “note” are each either a synonym or a kind of bond. The next figure shows SWQ enhancing the search terms with these synonyms.


Fig.8 Synonym Determination for SWQ Engine

(taken from [1]).
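The expansion step can be sketched in Python as follows. The synonym table reuses the examples from the text, but the Boolean syntax (OR inside each group, AND between groups, quoted phrases) is an assumption about what the underlying engines accept, and the function names are hypothetical.

```python
def expand_with_synonyms(keywords: list[str], synonyms: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each user keyword to itself followed by its synonyms from the selected ontology."""
    return {kw: [kw] + [s for s in synonyms.get(kw, []) if s != kw] for kw in keywords}


def to_boolean_query(expanded: dict[str, list[str]]) -> str:
    """OR together the alternatives for each keyword, then AND the groups together."""
    groups = []
    for alternatives in expanded.values():
        quoted = [f'"{a}"' for a in alternatives]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Synonyms (and kinds of bond) taken from the Financial Trading example in the text.
synonyms = {
    "high grade": ["AAA", "investment grade", "low risk"],
    "bond": ["T-bill", "municipal", "note"],
}
print(to_boolean_query(expand_with_synonyms(["high grade", "bond"], synonyms)))
```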


5.1.4 Web Query Formulation

First, the SWQ engine uses the selected context ontology to refine and enhance the key terms identified by the user. The enhanced key terms are then used to identify the appropriate search engines to query. For example, some terms in the context ontologies are associated with particular kinds of data formats such as graphics, PDF or sound, and there are specialized search engines that can perform queries on those data formats. If a data format term is used in the search query, or is a synonym of a term used in the search query, SWQ accesses those specialized search engines. The term “technical” in Financial Trading refers to an analysis using charts and graphs that are represented graphically on web pages. Thus, when the term “technical” appears in a request, SWQ sends a query to AltaVista's image query search engine to find chart-like images that match the search specification. Sample queries sent to various search engines are shown in the figure below:




Fig. 9 Web Query Formulation in SWQ (taken from [1]).
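The routing rule in this step (send a query to a specialized engine when a term, or one of its synonyms, is a data-format term) can be sketched in Python as below. Only the link between “technical” and chart images comes from the text; the other format terms, the engine names and the function name are hypothetical placeholders.

```python
# Hypothetical mapping from ontology terms to the data format they imply.
# Only "technical" -> image comes from the text's Financial Trading example.
FORMAT_TERMS = {"technical": "image", "chart": "image", "prospectus": "pdf"}

# Hypothetical specialized engines for each data format.
ENGINES_BY_FORMAT = {"image": ["image_search"], "pdf": ["pdf_search"]}


def formulate_queries(terms: list[str], synonyms: dict[str, list[str]]) -> dict[str, list[str]]:
    """Decide which engines receive which terms.

    All terms go to the general keyword engines. A term that is, or has a synonym that is,
    a data-format term is also sent to the specialized engines for that format.
    """
    targets: dict[str, list[str]] = {"general": list(terms)}
    for term in terms:
        candidates = [term, *synonyms.get(term, [])]
        formats = {FORMAT_TERMS[c] for c in candidates if c in FORMAT_TERMS}
        for fmt in sorted(formats):
            for engine in ENGINES_BY_FORMAT[fmt]:
                targets.setdefault(engine, [])
                if term not in targets[engine]:
                    targets[engine].append(term)
    return targets


print(formulate_queries(["bond", "technical"], synonyms={}))
# → {'general': ['bond', 'technical'], 'image_search': ['technical']}
```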


5.1.5 Determination of Web Page Relevance

Most search engines return URLs together with snippets of text found on those web pages. In these snippets, the search terms are shown in bold, and the terms that belong to the ontology selected by the user but are not among the search terms are shown in italics. These italic terms are the matching terms. To rate the relevance of a web page, the number of matching terms is compared with the total number of terms in its snippet, which gives the page a score. If the score is lower than a threshold value (0.05), the page is rejected.

The next figure illustrates this situation. Three snippets are returned to the SWQ engine for the key terms “bond” and “high grade”. The first page is from Bonds-Online, the second from the Uranium and Nuclear Power Centre in Australia, and the last from a company that deals in sanding pads. The second and third pages are retrieved because the search terms (“bonds” and “high grade”) also appear on them. SWQ then identifies which pages are relevant using the matching terms (terms taken from the selected context ontology, such as synonyms of the search terms): it performs the comparison, computes the score and decides whether each page is relevant according to the threshold value.

Finally, the SWQ engine identifies the Bonds-Online page as relevant and rejects the others, because their snippets do not include enough Financial Trading Ontology terms.





Fig. 10 Determination of Web Page Relevance (taken from [1]).
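The scoring rule described above can be sketched in Python: a page's score is the number of matching terms (ontology terms that are not search terms) divided by the number of words in its snippet, and pages scoring below 0.05 are rejected. To keep the sketch short it handles only single-word terms and does no stemming; the snippets and the abbreviated ontology are made up, loosely modeled on the three pages in the example.

```python
import re

THRESHOLD = 0.05  # threshold value given in the text


def relevance_score(snippet: str, search_terms: set[str], ontology_terms: set[str]) -> float:
    """Number of matching terms divided by the total number of words in the snippet.

    Matching terms are ontology terms that appear in the snippet but are not search terms.
    """
    words = re.findall(r"[a-z0-9-]+", snippet.lower())
    if not words:
        return 0.0
    matching = ontology_terms - search_terms
    hits = sum(1 for word in words if word in matching)
    return hits / len(words)


def is_relevant(snippet: str, search_terms: set[str], ontology_terms: set[str]) -> bool:
    """Keep a page unless its score falls below the threshold."""
    return relevance_score(snippet, search_terms, ontology_terms) >= THRESHOLD


search_terms = {"bond", "bonds", "high", "grade"}
financial_trading = {"bond", "yield", "maturity", "coupon", "investor", "rating"}
snippets = {
    "Bonds-Online": "Bonds Online: high grade bond yields, coupon rates and maturity dates for every investor",
    "Uranium centre": "Uranium ore of high grade bonds tightly with the surrounding rock",
    "Sanding pads": "Sanding pads with a high grade adhesive bond",
}
for page, text in snippets.items():
    print(page, round(relevance_score(text, search_terms, financial_trading), 3),
          is_relevant(text, search_terms, financial_trading))
```

Only the Bonds-Online snippet contains matching terms (“coupon”, “maturity”, “investor”), so it is the only page kept, mirroring the outcome described in the text.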


5.1.6 Filter Search

The final list of relevant web pages is sent to a set of semantic search filters. Each filter draws on a different aspect of domain semantics to evaluate the web pages.

Three filters are currently used: readability, document structure and word sense.

The next figure presents the readability score and the appropriate reader grade level for each web page after the readability filter is applied.


Fig.11 Readability Filter applied to SWQ snippets

(taken from [1]).
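The text does not name the formula behind the readability score. As an assumption, the Python sketch below uses the Flesch-Kincaid grade level, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59, with a rough vowel-group syllable counter, and keeps only pages at or below a chosen grade. The function names and sample texts are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of vowels, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate U.S. school grade level needed to read the text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def readability_filter(pages: dict[str, str], max_grade: float) -> list[str]:
    """Keep the pages whose estimated grade level does not exceed `max_grade`."""
    return [url for url, text in pages.items() if flesch_kincaid_grade(text) <= max_grade]


pages = {
    "simple": "The cat sat. The dog ran.",
    "complex": "Institutional investors systematically diversify international fixed-income portfolios.",
}
for url, text in pages.items():
    print(url, round(flesch_kincaid_grade(text), 2))
print(readability_filter(pages, max_grade=12))  # → ['simple']
```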




5.1.7 Page Ranking

If the number of resulting web pages is greater than the number of pages requested by the user, the user can either see the highest-ranked pages or filter the results with additional matching key terms from the selected ontology. Suppose SWQ identifies 153 web pages as relevant and the user has requested 50 result pages. The engine then reports that this situation is not appropriate for the ranking operation and prepares a list of ontological terms. The user examines the list and selects additional terms to reduce the number of relevant web pages. For example, if “trend” is chosen, the number of relevant web pages decreases to 20.

This situation is shown in the next figure:


Fig. 12 Selection of Additional Terms to Filter the Search (taken from [1]).
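The decision in this step can be sketched in Python. The check and the filter below are a minimal reading of the text, assuming each relevant page is represented by the set of ontology terms found in its snippet; the page data and function names are hypothetical.

```python
def needs_refinement(relevant_count: int, requested: int) -> bool:
    """More pages are relevant than the user asked for, so extra terms are needed."""
    return relevant_count > requested


def filter_by_terms(page_terms: dict[str, set[str]], extra_terms: set[str]) -> list[str]:
    """Keep only pages whose ontology terms include every additional term the user selected."""
    return sorted(url for url, terms in page_terms.items() if extra_terms <= terms)


# Hypothetical relevant pages and the Financial Trading terms found in their snippets.
page_terms = {
    "p1": {"bond", "trend", "yield"},
    "p2": {"bond", "yield"},
    "p3": {"bond", "trend"},
}
requested = 2
if needs_refinement(len(page_terms), requested):
    page_list = filter_by_terms(page_terms, {"trend"})
else:
    page_list = sorted(page_terms)
print(page_list)  # → ['p1', 'p3']
```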


VI. CONTEXT ONTOLOGY

Context ontologies define terms and their relationships. Their aim is to capture and organize domain semantics in a flexible manner, so that they can be reused and shared by search engines. A context ontology has three components: Terms, Term Relationships and Properties.

Terms: the set of basic terms that forms the vocabulary of a domain.

Term Relationships: a set of relationships between these terms.

Properties: a set of properties of the domain (ontology), its terms and their relationships; for example, the readability of the domain, the word sense of a term, or the transitivity of a relationship.

In this section I investigate in detail the contents and the organizational structure of context ontologies.


6.1 Flexible structure of Context Ontologies

Traditional approaches to ontology development represent ontologies as hierarchies of terms called taxonomies. A taxonomy is a hierarchical ordering of terms based on their term relationships. For example, to put terms such as stock, share and bond into an ontology organized as a taxonomy, it is necessary to establish a hierarchical order among them and the other terms before organizing them. This kind of structure is not well suited to semantic web search.

A context ontology does not organize terms in a hierarchical order. Relationships among terms, such as “stock is a synonym of share”, can be added in any sequence, and SWQ also accepts terms that have no relationship with other terms. This flexible structure makes context ontologies more adaptive and extensible, and allows them to be organized rapidly for the SWQ engine.


6.2 Contents and Organization


6.2.1 Organization of Context Ontologies

Some ontologies are related to each other through the terms they have in common. For example, the “Financial Trading” and “Accounting” ontologies are strongly related, since they share terms such as “Annual Report”, “Profit” and “Loss”. For this reason, ontologies can be sub-ontologies of other ontologies (super-ontologies). An ontology is a sub-ontology if and only if all its terms are included in another ontology. For example, “Bond Trading” is a sub-ontology of the Financial Trading ontology, because the Bond Trading ontology consists only of Financial Trading terms.

Properties of sub-ontologies are always restricted by the properties of their super-ontologies. For example, in the Financial Trading ontology “yield” means the interest obtained from a financial instrument, whereas in the Bond Trading ontology “yield” refers only to the guaranteed yearly return obtained from a bond. The next figure shows this simplified organization:


Fig. 13 Ontology Hierarchy for Financial Trading (taken from [1]).

As Fig. 13 shows, SWQ relates the various Trading ontologies hierarchically, as sub-ontologies.

SWQ also allows a non-hierarchical organization of ontologies. For example, Financial Trading can itself become a sub-ontology by adding super-ontologies such as “Finance” and “Financial Law”.
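The sub-ontology rule from this subsection (an ontology is a sub-ontology of another if and only if all of its terms are included in the other) reduces to a subset test. A minimal Python sketch with made-up, heavily abbreviated term sets:

```python
def is_sub_ontology(terms: set[str], other_terms: set[str]) -> bool:
    """True if every term of the first ontology is also a term of the second."""
    return terms <= other_terms


def super_ontologies(name: str, ontologies: dict[str, set[str]]) -> list[str]:
    """All other ontologies that contain every term of `name`."""
    return sorted(
        other
        for other, other_terms in ontologies.items()
        if other != name and is_sub_ontology(ontologies[name], other_terms)
    )


# Hypothetical, heavily abbreviated ontologies.
ontologies = {
    "Financial Trading": {"bond", "yield", "stock", "share", "annual report", "profit", "loss"},
    "Bond Trading": {"bond", "yield"},
    "Accounting": {"annual report", "profit", "loss", "ledger"},
}
print(super_ontologies("Bond Trading", ontologies))  # → ['Financial Trading']
print(super_ontologies("Accounting", ontologies))    # → []
```

Accounting shares several terms with Financial Trading, so the two are strongly related, but in this toy data it is not a sub-ontology because “ledger” is missing from Financial Trading.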

6.2.2 Ontological and Term Properties

Ontologies and terms can have properties, called ontological properties and term properties respectively. Every ontology has a readability property, which measures the amount of formal education a reader requires to understand a document. For example, documents on Financial Trading require more education to read than documents on “Leadership Self-Help”. Similarly, every ontological term has a Definition property and a Word Sense property (e.g. noun, verb, etc.).

6.2.3 Term Relationship Properties

Terms interact with other terms through term relationships. There are two types of term relationship properties: generic and specific.

Generic term relationship properties are available for all term relationships. For example, some term relationships are symmetric, such as “stock is a synonym of share”, while others are asymmetric, such as “bond is a kind of instrument”; this is the direction property of a term relationship.

Term relationships can also be transitive in SWQ. For example, if “T-Bill is a kind of bond” and “bond is a kind of instrument” both exist in an ontology, there is no need to state the relationship “T-Bill is an instrument”.
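Because “kind of” relationships are transitive, the implied relationships never need to be stored; they can be derived on demand. A minimal Python sketch, assuming relationships are held as directed (specific, general) pairs, computes the transitive closure by repeating until no new pair appears:

```python
def kind_of_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the stated 'X is a kind of Y' pairs plus every pair implied by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d together imply a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


stated = {("T-Bill", "bond"), ("bond", "instrument")}
print(sorted(kind_of_closure(stated) - stated))  # → [('T-Bill', 'instrument')]
```

Since the pairs are ordered, the direction property is preserved: the closure never adds “bond is a kind of T-Bill”.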

Specific properties apply only to some term relationships. For example, “stock” and “share” are synonyms in the Financial Trading ontology, and the synonym relationship has a property called Semantic Distance. Semantic distance expresses the degree of synonymy between two terms on a scale from 0 to 7. A score of 7 indicates strong synonymy, such as “stock” and “share”; a score of 4 indicates weak synonymy, such as “market” and “exchange”; and a score of 1 indicates very weak synonymy, such as “high risk” and “high return”.

It is not necessary to specify all term relationship properties. For example, a person who develops a Financial Trading ontology may not be able to identify the semantic distance between synonymous terms, and leaving this relationship property unspecified does not prevent SWQ from working.
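The semantic distance property and its optional nature can be sketched in Python as below. Each symmetric synonym pair is stored once as a frozenset, so it matches from either side; an unspecified distance (None) is treated as acceptable, so leaving the property out does not stop the search from using the synonym. The distances for the first three pairs come from the text; the fourth pair and the function name are hypothetical.

```python
from typing import Optional

# Synonym relationships with an optional Semantic Distance (0-7, where 7 is the strongest).
SYNONYMS: dict[frozenset[str], Optional[int]] = {
    frozenset({"stock", "share"}): 7,
    frozenset({"market", "exchange"}): 4,
    frozenset({"high risk", "high return"}): 1,
    frozenset({"equity", "stock"}): None,  # hypothetical pair; distance left unspecified
}


def synonyms_of(term: str, min_distance: int = 0) -> list[str]:
    """Synonyms of `term` whose semantic distance is at least `min_distance`.

    Pairs without a recorded distance are always included.
    """
    found = []
    for pair, distance in SYNONYMS.items():
        if term in pair and (distance is None or distance >= min_distance):
            (other,) = pair - {term}
            found.append(other)
    return sorted(found)


print(synonyms_of("share"))                   # → ['stock']
print(synonyms_of("market", min_distance=5))  # → []
```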

SWQ ontologies can be extended, enhanced and refined incrementally with additional terms, relationships and properties.

Standard lexicographic relationships such as synonymy, antonymy, homonymy and metonymy can be modeled in this manner. The next figure shows the model of SWQ ontologies.




Fig. 14 Model of SWQ Ontologies (taken from [1]).




6.2.4 Relational Scheme of SWQ Ontologies

SWQ ontologies and relational database schemas share many characteristics; for example, neither imposes a direct ordering on data elements. For this reason, a relational database management system (DBMS) was chosen to organize and store SWQ ontologies. The next table shows the relational schema for organizing SWQ ontologies.



Ontology
    Ontology (Ontology Name)
    Ontology Properties (Ontology Name, Property Name, Value)
    Ontology Relationship (Ontology Name, Super Ontology Name)
Terms
    Terms (Ontology Name, Term, Definition, Source)
Terms Relationship
    Relationships (Ontology Name, Relationship Code, Relationship Type)
    Relationships Properties (Relationship Code, Relationship Property Name, Relationship Property Value)
    Relationships Terms (Relationship Code, Term)
    Relationships Origin (Relationship Code, Term)

Tab. 1 Relational Schema for Organizing SWQ Ontologies (taken from [1]).


Now I will explain each of them below;



First of all
Ontology

relation is storing name of the ontology.



Ontology Properties

relation is storing name of properties of the ontolo
gy and its value

(e.g.

Readability score)
.



Ontology Relationship

relation is storing ontologies’ name
s

which are may be super
-
classes
or
sub
-
classes ontology.



The Terms relation stores the terms used in the ontology (e.g. high grade and bond) and term properties such as word sense.



The Relationships relation stores the relationships among terms.



The Relationships Terms relation stores the intersection data of these two relations, i.e. which terms take part in which relationship.





The Relationships Origin relation identifies, for directional relationships, the term from which the relationship points to the other terms. For example, in the relationship "Bond is an instrument", the term Bond is recorded in the Relationships Origin relation.



The Relationships Properties relation stores the properties of term relationships. These properties are kept separate from the Relationships relation because different kinds of relationships have different properties and values.
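To make the division of labour among the relationship tables concrete, the following sketch records the directional relationship "Bond is an instrument" using only the Relationships, Relationships Terms and Relationships Origin relations. The relationship code R1, the type label is-a and the snake_case table names are hypothetical choices for this example.

```python
import sqlite3

# Minimal, hypothetical subset of Tab. 1: only the relations needed to record
# one directional relationship and recover its direction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (
    ontology_name TEXT, relationship_code TEXT PRIMARY KEY, relationship_type TEXT);
CREATE TABLE relationships_terms (relationship_code TEXT, term TEXT);
CREATE TABLE relationships_origin (relationship_code TEXT, term TEXT);
""")

# "Bond is an instrument": one relationship row, two participating terms,
# and Bond stored as the origin because the relationship is directional.
conn.execute("INSERT INTO relationships VALUES ('Financial Trading', 'R1', 'is-a')")
conn.executemany("INSERT INTO relationships_terms VALUES (?, ?)",
                 [("R1", "Bond"), ("R1", "Instrument")])
conn.execute("INSERT INTO relationships_origin VALUES ('R1', 'Bond')")


def describe(conn: sqlite3.Connection, code: str):
    """Return (origin, relationship_type, targets) for a directional relationship."""
    rel_type = conn.execute(
        "SELECT relationship_type FROM relationships WHERE relationship_code = ?",
        (code,)).fetchone()[0]
    origin = conn.execute(
        "SELECT term FROM relationships_origin WHERE relationship_code = ?",
        (code,)).fetchone()[0]
    # Every participating term other than the origin is a target.
    targets = [row[0] for row in conn.execute(
        "SELECT term FROM relationships_terms "
        "WHERE relationship_code = ? AND term <> ? ORDER BY term",
        (code, origin))]
    return origin, rel_type, targets


print(describe(conn, "R1"))  # ('Bond', 'is-a', ['Instrument'])
```

Without the origin row, the two terms of R1 would be recorded but the direction of the relationship would be lost.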


6.3 Benefits and characteristics of the flexible structure

The first benefit of this flexibility is that an ontology about a subject domain is easy to organize. Minimally, an ontology can be organized by entering a set of ontological terms without any relationships among them. When there are many domains/ontologies, terms can be discovered by consulting a domain/ontology dictionary.
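One reading of the minimal organization above is that each ontology is just a set of terms, and a dictionary built across ontologies shows where a term is used. The sketch below follows that interpretation; the ontology names (including "Construction Materials") and their terms are hypothetical examples, not taken from [1].

```python
from typing import Dict, List, Set

# Hypothetical minimal ontologies: plain term sets with no relationships yet.
ontologies: Dict[str, Set[str]] = {
    "Financial Trading": {"bond", "share", "quote", "broker"},
    "Construction Materials": {"bond", "cement", "surface"},
}


def build_dictionary(ontologies: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map every term to the sorted list of ontologies that contain it."""
    dictionary: Dict[str, List[str]] = {}
    for name, terms in ontologies.items():
        for term in terms:
            dictionary.setdefault(term, []).append(name)
    return {term: sorted(names) for term, names in dictionary.items()}


dictionary = build_dictionary(ontologies)
print(dictionary["bond"])   # ['Construction Materials', 'Financial Trading']
print(dictionary["quote"])  # ['Financial Trading']
```

A term that appears in several ontologies, such as bond here, is exactly the kind of term for which relationships and properties (e.g. word sense) can later be added incrementally.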


Fig. 15 An example of SWQ Ontology (taken from [1]).


A portion of the financial trading ontology is shown in the figure above. Rectangles represent logical terms, ovals represent term relationships, and underlined words represent properties (of terms or of term relationships). Different segments of this ontology can be defined by different people. One user can define bond trading terms, another can define share trading terms, and a third can define relationships between bonds and stocks. Because the terms are organized using a flexible structure, relationships defined by one user will not compromise relationships defined by another. For example, the relationship "municipal is a government bond" does not change the relationship "government bond is a bond". Deleting or modifying one of these relationships does not affect the other.
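The independence of relationships can be sketched by storing each relationship as its own record, so deleting one leaves the others untouched. The Relationship record, its fields and the user names are hypothetical; they only mirror the municipal/government bond example above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Relationship:
    """One term relationship stored as an independent record (hypothetical model)."""
    code: str
    origin: str
    rel_type: str
    target: str
    author: str


# Relationships contributed by different people; each is a separate record.
store: Dict[str, Relationship] = {
    r.code: r
    for r in [
        Relationship("R1", "government bond", "is-a", "bond", "user A"),
        Relationship("R2", "municipal", "is-a", "government bond", "user B"),
    ]
}


def delete_relationship(store: Dict[str, Relationship], code: str) -> Dict[str, Relationship]:
    """Return a new store without one relationship; no other record is changed."""
    return {c: r for c, r in store.items() if c != code}


after = delete_relationship(store, "R2")
print(sorted(after))               # ['R1']
print(after["R1"] == store["R1"])  # True
```

Because no relationship embeds another, there is nothing to cascade when one is removed or modified.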


VII. SEMANTIC SEARCH FILTERS

Search filters increase the accuracy of search in SWQ. Semantic search filters are specific search modules that use one or a few relationships or properties in the ontology to evaluate the relevance of a web page. Currently, SWQ has three search filters: the readability filter, the document structure (layout) filter and the word sense filter. The following sections describe each of these filters in detail.
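Since each semantic search filter evaluates a page on its own, the filters can be sketched as interchangeable predicates that a page must all pass. In this minimal sketch the page features (readability, has_account_option, wrong_sense_ratio) and the 0.5 threshold are hypothetical; only the 30 to 69 readability range comes from the financial trading example in Section 7.1.

```python
from typing import Any, Callable, Dict, List

# Hypothetical page representation: features are assumed to be precomputed.
Page = Dict[str, Any]
Filter = Callable[[Page], bool]


def apply_filters(pages: List[Page], filters: List[Filter]) -> List[Page]:
    """Keep only the pages that every semantic search filter accepts."""
    return [page for page in pages if all(f(page) for f in filters)]


def readability_ok(page: Page) -> bool:
    # Acceptable range from the financial trading example.
    return 30 <= page["readability"] <= 69


def layout_ok(page: Page) -> bool:
    # Assumed boolean feature produced by a separate layout analysis step.
    return bool(page["has_account_option"])


def word_sense_ok(page: Page) -> bool:
    # Assumed threshold: reject pages where half or more term uses have the wrong sense.
    return page["wrong_sense_ratio"] < 0.5


pages = [
    {"url": "a", "readability": 45.0, "has_account_option": True, "wrong_sense_ratio": 0.1},
    {"url": "b", "readability": 85.0, "has_account_option": True, "wrong_sense_ratio": 0.0},
    {"url": "c", "readability": 50.0, "has_account_option": True, "wrong_sense_ratio": 0.8},
]
kept = apply_filters(pages, [readability_ok, layout_ok, word_sense_ok])
print([p["url"] for p in kept])  # ['a']
```

Treating filters as a list makes it straightforward to add further filter types later without changing the pipeline itself.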


7.1 Readability Filter

The readability filter uses the Flesch-Kincaid readability score to identify relevant web pages. The score is computed from the following equation:

$$R = 206.835 - 0.846\,w - 1.015\,s$$

where w is the number of syllables per 100 words and s is the average number of words per sentence. The next table shows the score's interpretation.


Tab. 2 Grade Level of Flesch-Kincaid Readability Score (taken from [1]).
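The readability equation can be evaluated with a short function. As written, with w counted per 100 words and a coefficient of 0.846, it is Flesch's Reading Ease formula (the Flesch-Kincaid grade-level formula is a different equation). The syllable counter below is a crude vowel-group heuristic and the sentence splitting is naive; both are assumptions of this sketch and not the method used in [1], so scores only approximate those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' usually does not form its own syllable ("trade").
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def reading_ease(text: str) -> float:
    """R = 206.835 - 0.846*w - 1.015*s, where w = syllables per 100 words
    and s = average number of words per sentence."""
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(word) for word in words)
    w = 100.0 * syllables / len(words)
    s = len(words) / len(sentences)
    return 206.835 - 0.846 * w - 1.015 * s


print(round(reading_ease("The cat sat on the mat."), 3))  # 116.145
```

In the example, six one-syllable words in one sentence give w = 100 and s = 6, so R = 206.835 - 84.6 - 6.09 = 116.145, a very easy text.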




Each web page returned by the search engines is evaluated for its grade level. This grade level is compared with the Readability ontological property, which contains a range of acceptable readability scores. Web pages whose grade levels do not fall within the appropriate range are rejected.

For example, financial trading reports are typically targeted at readers with a high school or higher education. Thus, taking into account unusually simple or complex financial trading reports, the language level expected in these reports would range from 30 to 69. Web pages that match the key terms of the financial trading ontology must have scores within that range to be considered by SWQ.
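The range check described above can be sketched as a lookup of the ontology's Readability property followed by a comparison. The READABILITY_PROPERTY mapping and the page scores are hypothetical; only the 30 to 69 range for financial trading comes from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical store of the Readability ontological property:
# ontology name -> (lowest, highest) acceptable score.
READABILITY_PROPERTY: Dict[str, Tuple[float, float]] = {
    "Financial Trading": (30.0, 69.0),
}


def readability_filter(scores: Dict[str, float], ontology: str) -> List[str]:
    """Return the URLs whose readability score lies within the ontology's range."""
    low, high = READABILITY_PROPERTY[ontology]
    return [url for url, score in scores.items() if low <= score <= high]


scores = {"page1": 45.2, "page2": 82.0, "page3": 12.5, "page4": 69.0}
print(readability_filter(scores, "Financial Trading"))  # ['page1', 'page4']
```

Storing the range as an ontology property, rather than hard-coding it in the filter, lets each domain ontology declare its own acceptable language level.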


7.2 Document Structure (Layout) Filter

Web pages in a specific domain tend to have similar structures. For example, most financial trading web pages offer services to create trading accounts, recommendation columns and price quotes for trading instruments. The layout filter identifies relevant web pages according to their layout [1].

A layout is identified as a set of ontological terms. For example, a financial trading home page will contain one of the terms connect, sign on, open or source for opening an account. Similarly, the terms schedule, quote and chart are used for presenting stock market graphs, and the terms tips, news and facts are used for links to a web page presenting financial advice. These terms are often partially ordered. For example, on a financial trading page the option for opening an account is normally seen before the option for viewing financial advice.
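A layout check of this kind can be sketched as: find the first occurrence of each group of layout terms, require every group to be present, and then test the partial order. The group labels, the requirement that all three groups appear, and whole-word matching are assumptions of this sketch; the terms themselves and the "account before advice" ordering come from the text.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Layout term groups from the text; the group labels are hypothetical.
LAYOUT_GROUPS: Dict[str, Set[str]] = {
    "account": {"connect", "sign on", "open", "source"},
    "charts": {"schedule", "quote", "chart"},
    "advice": {"tips", "news", "facts"},
}
# Partial order from the text: the account option normally precedes advice.
ORDER: List[Tuple[str, str]] = [("account", "advice")]


def first_position(text: str, terms: Set[str]) -> Optional[int]:
    """Index of the earliest whole-word occurrence of any term, or None."""
    positions = []
    for term in terms:
        match = re.search(r"\b" + re.escape(term) + r"\b", text)
        if match:
            positions.append(match.start())
    return min(positions) if positions else None


def layout_filter(text: str) -> bool:
    """Accept a page if every term group occurs and the ordering constraints hold."""
    text = text.lower()
    first = {group: first_position(text, terms) for group, terms in LAYOUT_GROUPS.items()}
    if any(pos is None for pos in first.values()):
        return False
    return all(first[before] < first[after] for before, after in ORDER)


print(layout_filter("Sign on to open an account. Live quote and chart. Daily tips and news."))  # True
print(layout_filter("Market news first. Then a quote. Connect to your account."))  # False
```

Whole-word matching avoids false hits such as "open" inside "opening" or "news" inside "newsletter".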


7.3 Word Sense Filter

The word sense filter determines whether ontological terms are used as a noun, verb, adjective, etc. The filter parses text pieces (snippets) and identifies the word sense of each ontological term they contain. It then compares that word sense with the Word_Sense term property. If a text snippet of a candidate web page includes many terms used in the wrong word sense, the page is rejected. For example, if a snippet contains an expression such as "quick drying cements bond multiple surfaces", the page is rejected because bond is used there as a verb, whereas bond is a noun in the financial trading ontology.
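The comparison step of the word sense filter can be sketched as follows. The part-of-speech tags are written by hand here; a real system would obtain them from a tagger, which is outside this sketch. The WORD_SENSE entries and the 0.5 rejection threshold are hypothetical stand-ins for the Word_Sense term property and for "many wrong word sense terms".

```python
from typing import Dict, List, Tuple

# Hypothetical Word_Sense term property for a few financial trading terms.
WORD_SENSE: Dict[str, str] = {"bond": "NOUN", "share": "NOUN", "quote": "NOUN"}


def wrong_sense_ratio(tagged: List[Tuple[str, str]]) -> float:
    """Fraction of ontology-term occurrences whose tag differs from Word_Sense."""
    hits = [(word.lower(), pos) for word, pos in tagged if word.lower() in WORD_SENSE]
    if not hits:
        return 0.0
    wrong = sum(1 for word, pos in hits if pos != WORD_SENSE[word])
    return wrong / len(hits)


def word_sense_filter(tagged: List[Tuple[str, str]], threshold: float = 0.5) -> bool:
    """Return False (reject) when too many terms are used in the wrong sense."""
    return wrong_sense_ratio(tagged) < threshold


# Hand-tagged snippets (assumed tags, for illustration only).
cement_snippet = [("quick", "ADJ"), ("drying", "VERB"), ("cements", "NOUN"),
                  ("bond", "VERB"), ("multiple", "ADJ"), ("surfaces", "NOUN")]
trading_snippet = [("the", "DET"), ("bond", "NOUN"), ("price", "NOUN"), ("rose", "VERB")]
print(word_sense_filter(cement_snippet))   # False
print(word_sense_filter(trading_snippet))  # True
```

Using a ratio rather than rejecting on a single mismatch tolerates occasional tagging errors in otherwise relevant snippets.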






VIII. CONCLUSION

In this study I investigated a smart web query method and its engine. I explained in detail how the engine works and what its steps are, supported by figures and tables.
This search engine gains its intelligence from context ontologies and semantic search filters. Traditional ontologies are based on taxonomies, but SWQ context ontologies adopt a more flexible organizational structure. This structure supports more extensibility, flexibility and adaptability than traditional ontologies. The SWQ ontologies capture domain semantics to increase the accuracy and recall of web search. For example, semantic search filters use domain semantics such as readability, document structure, synonyms, word sense and associated media (e.g. sound clips and PDF documents) to measure the relevance of web pages.

In my opinion, if the search engine contained intelligent components or intelligent agents between parts of the SWQ architecture, such as between the smart web query component and the context ontology component, the engine could specify more appropriate ontology queries. Furthermore, if intelligent agents could mediate between the SWQ engine and traditional search engines, the returned pages would be more relevant and the search filters might not be necessary.

For example, in the ranking step (step 7), an agent could choose some terms in the selected ontology and then filter pages according to those matching key terms, so this interface might no longer be needed.

Also, if the filtering step (step 6) included more types of filters, the number of returned web pages would decrease further, which means the accuracy of the remaining relevant pages would be higher.

In conclusion, the SWQ engine is a promising line of research for semantic metadata search engines. This research marks the beginning of the area and may be developed further by using more flexible ontologies or by adding more term properties to the domain semantics. Such work would make searching the WWW more useful.











IX. REFERENCES

[1]. Roger H.L. Chiang, Cecil Eng Huang Chua and Veda C. Storey, A smart web query method for semantic retrieval of web data, Data & Knowledge Engineering, Vol. 38, Issue 1, July 2001, pp. 63-84, http://www.sciencedirect.com/science, Last visited: 04 November 2004.

[2]. Kevin Curran, Cliona Murphy and Stephen Annesly, Web Intelligence in Multimedia Retrieval, IEEE/WIC International Conference on Web Intelligence, 2003, Last visited: 04 November 2004.

[3]. Yan Li, Xin-Zhong Chen and Bing-Ru Yang, Research on Web Mining Based on Intelligent Search Engine, Proceedings of the 2002 International Conference on Machine Learning and Cybernetics, Vol. 1, 4-5 Nov. 2002, pp. 386-390, Last visited: 04 November 2004.

[4]. Craig A.N. Soules and Gregory R. Ganger, Why can't I find my files? New Methods for Automating Attribute Assignment, Carnegie Mellon University, Last visited: 04 November 2004.

[5]. Udo Kruschwitz, Exploiting structure for intelligent Web Search, Proceedings of the 34th Annual Hawaii International Conference on System Sciences, 3-6 Jan. 2001, Last visited: 04 November 2004.

[6]. Giuseppe Amato, Fausto Rabitti and Pasquale Savino, Multimedia document search on the Web, Computer Networks and ISDN Systems, Vol. 30, Issues 1-7, April 1998, pp. 604-606, Last visited: 25 November 2004.