Personalized Ontologies for Web Search and Caching


Susan Gauch

Information and Telecommunications Technology Center

Electrical Engineering and Computer Science

The University of Kansas

Department of Electrical Engineering and Computer Science

I T T C

Professor Susan Gauch

December 1999

Outline

Motivation

User profiles
  - creation and maintenance
  - evaluation

Applications
  - re-ranking (and filtering) search results
  - Web caching

Conclusions

Motivation

Decrease access time for Web pages
  - Server approaches
      - use access logs to decrease access times for popular pages
      - not tailored to individuals
      - do not decrease network traffic
  - Network approaches
      - cache popular pages in multiple places in the network
      - not tailored to individuals

Personalization

Different information needs for different users
  - can we learn a user’s interests?
      - explicitly?
      - implicitly?
  - can we use this information?
      - improved search
      - improved browsing
      - faster Web page access

Intelligent Web Caching

Improved (and faster) search results
  - pre-caching all search results is expensive
      - Internet search engines return 50% irrelevant pages
  - improved knowledge of user’s likely behavior
      - intelligent pre-caching
      - use past behaviors to predict future behaviors
      - pre-cache “best” pages close to individuals


Context

ProFusion: www.profusion.com

OBIWAN: distributed content-based IR
  - Web clustered into regions
  - clustering criteria: content, location, company
  - search: query brokered to “best” regions; within a region, brokered to most promising sites
  - browsing a region means browsing its sites simultaneously
  - www.ittc.ukases.edu/obiwan

User Profiles

Applications
  - Usenet news filtering
  - recommendation services: web browsing, books
  - intelligent pre-caching

Should
  - accurately reflect actual interests
  - require as little feedback as possible
  - be dynamic

User profiles: Creation

Obvious and often used: keywords
  - not structured (ambiguous)
  - static
  - have to be explicitly mentioned

Our approach
  - watch over a user's shoulder while surfing
  - automatically determine documents’ content
  - central: large ontology (concept hierarchy)

Document Classification

Documents as weighted keyword vectors:
  - n different words -> n dimensions
  - weights based on word frequency and rarity

Browsing hierarchy: 10 web pages per node

Concatenate them -> one keyword vector per node

Content of a page: the node with the most similar vector
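
The classification steps above can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: it assumes a tf*idf weighting for "word frequency and rarity" and cosine similarity for "most similar vector", neither of which the slides name explicitly. The function names and the tiny two-node hierarchy are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real system would also
    strip punctuation and stop words)."""
    return text.lower().split()


def build_node_vectors(node_pages):
    """node_pages maps a hierarchy node to its list of sample pages.
    Returns (node -> {word: weight}, word -> idf)."""
    # Concatenate each node's sample pages into one document.
    term_counts = {node: Counter(tokenize(" ".join(pages)))
                   for node, pages in node_pages.items()}
    n_nodes = len(term_counts)
    # Rarity (assumed idf): words found in few nodes weigh more.
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    idf = {word: math.log(n_nodes / df) + 1.0 for word, df in doc_freq.items()}
    vectors = {node: {word: tf * idf[word] for word, tf in counts.items()}
               for node, counts in term_counts.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(page_text, vectors, idf, top_k=5):
    """Return the top_k (node, similarity) pairs for a surfed page."""
    counts = Counter(tokenize(page_text))
    # Words unknown to the hierarchy are ignored.
    page_vec = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    scored = [(cosine(page_vec, vec), node) for node, vec in vectors.items()]
    scored.sort(reverse=True)
    return [(node, score) for score, node in scored[:top_k]]


# Hypothetical two-node hierarchy with two sample pages per node.
sample_hierarchy = {
    "Sports": ["football goal team", "team match score"],
    "Cooking": ["recipe oven flour", "bake recipe sugar"],
}
vectors, idf = build_node_vectors(sample_hierarchy)
top = classify("team football match", vectors, idf)
```

With this toy data the page shares words only with the "Sports" node, so that node ranks first and "Cooking" gets a similarity of zero.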

Updating profiles

Static: document related
  - content: weights of top nodes for surfed document
  - length of page

Dynamic: time spent

Combine them
  - for instance: weight * (time/length)
  - changes in interest in the five categories

User profile: weighted ontology
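
A minimal sketch of the example adjustment weight * (time/length), assuming the profile is stored as a mapping from category to accumulated interest and that time is measured in seconds and length in characters. The units, the function name, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def update_profile(profile, page_categories, seconds_spent, page_length):
    """Add one surfed page's contribution to a weighted-ontology profile.

    profile:         dict mapping category -> accumulated interest
    page_categories: list of (category, match_weight) for the page's top nodes
    """
    if page_length <= 0:
        return profile  # nothing to normalize by; leave the profile unchanged
    attention = seconds_spent / page_length  # the time/length factor
    for category, weight in page_categories:
        profile[category] += weight * attention
    return profile


# Hypothetical page: 60 seconds spent on a 3000-character page.
profile = defaultdict(float)
update_profile(profile, [("Sports", 0.8), ("News", 0.2)],
               seconds_spent=60, page_length=3000)
```

Here the time/length factor is 60/3000 = 0.02, so "Sports" gains 0.8 * 0.02 = 0.016 and "News" gains 0.2 * 0.02 = 0.004.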

Profile evaluation

Accordance with actual user interests
  - 10/20 interest categories describe actual interests
  - describe interests “pretty well”: 3.5/5

Convergence
  - stabilization of # of categories over time?
  - do converge after 320 surfed pages!

Profiles: Summary

Stored as weighted ontologies

Profiles represent actual interests quite well

Up to 150 top categories

Two adjustment functions make profiles converge
  - after 320 pages
  - length of page doesn't really matter, but time spent does


Personalizing Search Results

50% of top 20 results irrelevant

Same search mechanism for 200 million people?

Goal:
  - identify relevant documents and put them on top of the result list
  - (pre-fetch relevant results)

Difficult problem: 10% increase is very good

Re-Ranking

Ranking a function of:
  - search engine's original ranking
  - extents to which top 5 categories describe document's content
  - personal interest in each of these top categories
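
The slides list the three inputs but not how they are combined. The sketch below shows one possible combination, assuming a linear mix of a rank-based score and a profile-based score. The mixing weight alpha, the rank normalization, and the assumption that profile weights lie in [0, 1] are all hypothetical choices, not the author's formula.

```python
def rerank(results, profile, alpha=0.5):
    """Re-order search results using a user profile.

    results: list of (url, original_rank, top_categories), where
             original_rank starts at 1 and top_categories is a list of
             (category, match_weight) pairs for the document
    profile: dict mapping category -> interest, assumed to lie in [0, 1]
    alpha:   hypothetical mixing weight for the original ranking
    """
    n = len(results)
    scored = []
    for url, rank, categories in results:
        # 1.0 for the engine's top hit, down to 1/n for its last hit.
        original = (n - rank + 1) / n
        # Match strength times personal interest, over the top 5 categories.
        personal = sum(match * profile.get(category, 0.0)
                       for category, match in categories[:5])
        scored.append((alpha * original + (1 - alpha) * personal, url))
    # Stable sort: ties keep the search engine's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Hypothetical two-result list and a profile interested only in sports.
results = [
    ("cooking.example/page", 1, [("Cooking", 0.9)]),
    ("sports.example/page", 2, [("Sports", 0.9)]),
]
reranked = rerank(results, {"Sports": 1.0})
unchanged = rerank(results, {})
```

With the sports-only profile the second result moves to the top; with an empty profile the original order is preserved.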

“More relevant items on top of result list”:
  - system’s ability to present all relevant items (recall)
  - system’s ability to present only relevant items (precision)

Recall and Precision

Combination: Recall/Precision graphs

Example: ranked documents 1,…,20
  - relevant 2,5,10,14,19
  - recall points 1/5, 2/5, 3/5, 4/5, 5/5
  - precisions 1/2, 2/5, 3/10, 4/14, 5/19
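
The recall and precision points in the example can be reproduced by walking down the ranked list and recording both values each time a relevant document appears. A minimal sketch; the function name is an assumption.

```python
def recall_precision_points(ranked_ids, relevant_ids):
    """Return one (recall, precision) pair per relevant document found,
    in the order the documents appear in the ranking."""
    relevant = set(relevant_ids)
    points = []
    found = 0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            found += 1
            recall = found / len(relevant)   # share of relevant docs seen so far
            precision = found / position     # share of seen docs that are relevant
            points.append((recall, precision))
    return points


# The example from the slide: 20 ranked documents, 5 of them relevant.
points = recall_precision_points(range(1, 21), [2, 5, 10, 14, 19])
```

For the slide's data this yields the pairs (1/5, 1/2), (2/5, 2/5), (3/5, 3/10), (4/5, 4/14) and (5/5, 5/19), which are the points plotted in a recall/precision graph.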

Re-Ranking: Evaluation

Overall performance increase of up to 8%
  - at each recall cutoff, up to 10% more relevant documents have been retrieved



Browsing Assistance

Analyze current page
  - locate links

Identify which links are most likely to be followed by the user
  - popularity of the link overall
  - relevance of linked page to user’s interests
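
One way to combine the two signals is a weighted sum, sketched below. The slides do not say how popularity and relevance are measured or mixed, so the click-count normalization, the precomputed relevance scores in [0, 1], and the popularity_weight parameter are all assumptions made for illustration.

```python
def rank_links(links, click_counts, relevance, popularity_weight=0.5):
    """Order a page's links by how likely the user is to follow them.

    links:        list of link URLs found on the current page
    click_counts: dict mapping link -> overall number of times followed
    relevance:    dict mapping link -> relevance to the user's profile,
                  assumed to be precomputed and to lie in [0, 1]
    """
    # Avoid dividing by zero when no link has been followed yet.
    total_clicks = sum(click_counts.get(link, 0) for link in links) or 1

    def score(link):
        popularity = click_counts.get(link, 0) / total_clicks
        return (popularity_weight * popularity
                + (1 - popularity_weight) * relevance.get(link, 0.0))

    return sorted(links, key=score, reverse=True)


# Hypothetical page with one popular link and one personally relevant link.
links = ["/popular", "/relevant"]
clicks = {"/popular": 9, "/relevant": 1}
interest = {"/relevant": 1.0}
mixed = rank_links(links, clicks, interest)
popularity_only = rank_links(links, clicks, interest, popularity_weight=1.0)
```

With equal weighting the personally relevant link comes first; with popularity alone the overall favorite wins.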

Problem
  - if you have to download the whole page to analyze it, you’ve increased the network utilization

Privacy

Is the user aware that their behavior is being monitored?

Can users turn it off?

Where are profiles stored?

With whom are profiles shared?

How are profiles protected?

How are profiles used?

Conclusions

Automatic creation of structured user profiles is possible

Profiles are reasonably accurate

Applications in improving search quality and Web page access efficiency

Evaluation of re-ranking search results: performance increase of up to 8%

Future Work

Incorporate profile generator into browser

Connect system to ProFusion, OBIWAN

Personalize structure of ontology

Re-train classifier

More applications: recommendation service, web caching, browsing, ...

Explicit user feedback?