Geographic Web Information Retrieval

Alexander Markowetz, University of Marburg

Thomas Brinkhoff, FH Oldenburg

Bernhard Seeger, University of Marburg

Current Situation in Web-IR


Everybody is online


But never seen

Current Situation in Web-IR


Queries are too short


Result sets are too large


You can effectively block your competitors


Good results get buried


Smaller Results


Ways to drill into the iceberg



Solutions


Personalized Search


Dynamic/Interactive Search

Geographic Web-IR


Location is the most personal property


„All business is local“



People already use the web geographically


„Yoga Brooklyn“


„Linux usergroup Frankfurt“


And get poor results




We are going to make that a lot better

How-Not-To


Semantic Web


„If only everybody included geographic markup in their web pages“


Two problems


Chicken-and-egg


Malicious Webmaster


Meta tags, anyone?




Bottom line


Semantic web is for „B2B“ situations only.

How-To


Modify traditional IR techniques to extract geographic markers


Multigranular approach


Extending basic Web-IR


Map pages to geographic positions


Footprint


Aggregate and Cluster them


Build Applications


Geographic Search


Geographic Web-Mining

Geocoding


Footprint


Geographic position of a web page


Set of points and polygons, each associated with an amplitude
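A footprint as described on this slide could be held in a small data structure like the one below. This is only a sketch: the class names, the (latitude, longitude) convention and the idea that amplitudes are plain floats are assumptions for illustration, not something the slides specify.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # assumed convention: (latitude, longitude) in degrees


@dataclass
class WeightedPoint:
    """A single location with an amplitude (hypothetical name)."""
    location: Point
    amplitude: float


@dataclass
class WeightedPolygon:
    """A region given by its vertices, with an amplitude (hypothetical name)."""
    vertices: List[Point]
    amplitude: float


@dataclass
class Footprint:
    """Set of points and polygons attached to one web page."""
    points: List[WeightedPoint] = field(default_factory=list)
    polygons: List[WeightedPolygon] = field(default_factory=list)

    def total_amplitude(self) -> float:
        """Sum of all amplitudes, useful for normalising a footprint."""
        return sum(p.amplitude for p in self.points) + sum(
            g.amplitude for g in self.polygons
        )


# Example: a page tied mostly to one point (roughly Marburg) and
# more weakly to a surrounding rectangle.
fp = Footprint()
fp.points.append(WeightedPoint((50.81, 8.77), 0.7))
fp.polygons.append(
    WeightedPolygon([(50.7, 8.6), (50.7, 8.9), (50.9, 8.9), (50.9, 8.6)], 0.3)
)
```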

Preliminaries


Basic IR assumptions can easily be extended to „geographic-IR“


Radius-1 Hypothesis


Radius-2 Hypothesis (co-citation)


Intra-Site Hypothesis


Intra-subdomain


Intra-directory

Multigranularity


Information extraction on different levels


Domain


Subdomain


Directory


File


Need to aggregate

[Diagram: hierarchy with nodes Dom, SDom, Dir and File]
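One way to picture the aggregation step is to sum the geographic evidence found on each file into its enclosing directory, subdomain and domain. The sketch below assumes evidence is a mapping from place name to weight, and it naively treats the last two host labels as the domain. Both are illustrative assumptions, as are the function names.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse


def levels(url: str) -> List[str]:
    """Return keys for the file, directory, subdomain and domain of a URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Naive assumption: the registered domain is the last two host labels.
    domain = ".".join(host.split(".")[-2:])
    directory = parsed.path.rsplit("/", 1)[0] + "/"
    return [
        "file:" + host + parsed.path,
        "dir:" + host + directory,
        "sdom:" + host,
        "dom:" + domain,
    ]


def aggregate(evidence: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Sum per-page place weights into every enclosing level."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for url, places in evidence.items():
        for key in levels(url):
            for place, weight in places.items():
                totals[key][place] += weight
    return {key: dict(places) for key, places in totals.items()}


# Example with made-up pages and weights.
evidence = {
    "http://shop.example.de/berlin/a.html": {"Berlin": 1.0},
    "http://shop.example.de/berlin/b.html": {"Berlin": 2.0},
    "http://www.example.de/index.html": {"Hamburg": 1.0},
}
result = aggregate(evidence)
```

Plain summation is the simplest choice here; a real system would likely weight or normalise the levels differently.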

Sources


On all levels


Names of places


Zip codes


Area codes


On Site Level


Whois


Business Directories


Links


Density over a given area


Radius-1 and Radius-2


Geospatial Mapping and Navigation of the Web, Kevin S. McCurley, 10th WWW, 2001


Computing Geographical Scopes of Web Resources, J. Ding, L. Gravano, and N. Shivakumar, VLDB 2000
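The text-based sources listed on this slide (place names and zip codes) could be pulled out of page text roughly as follows. This is a minimal sketch, assuming German five-digit postal codes and a tiny hand-made gazetteer; the table contents, patterns and function name are illustrative only.

```python
import re
from typing import Dict, List

# Tiny illustrative gazetteer; a real system would load a complete table.
ZIP_TO_PLACE: Dict[str, str] = {"35037": "Marburg", "26121": "Oldenburg"}
PLACE_NAMES = {"Marburg", "Oldenburg", "Frankfurt"}

ZIP_PATTERN = re.compile(r"\b\d{5}\b")  # German postal codes have five digits
WORD_PATTERN = re.compile(r"\b[A-ZÄÖÜ][a-zäöüß]+\b")  # capitalised words


def extract_places(text: str) -> List[str]:
    """Return place names found via known zip codes and via the gazetteer."""
    found = []
    for code in ZIP_PATTERN.findall(text):
        if code in ZIP_TO_PLACE:
            found.append(ZIP_TO_PLACE[code])
    for word in WORD_PATTERN.findall(text):
        if word in PLACE_NAMES:
            found.append(word)
    return found


places = extract_places("Kontakt: Hauptstr. 1, 35037 Marburg, Tel. 06421 12345")
```

Note that the five-digit pattern also matches some area codes and phone-number fragments; they are dropped here only because they are missing from the zip table, so a real extractor would need more context.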


Geographic Search


A simple interface


Not so exciting, but...


Key Words


City


Street


State


Area code


SEARCH

Dynamic Geographic-IR


Replacing the „next“ button


Closer


Continue


Wider


Next


Closer


Wider


Next


½ mile


1 mile


2 miles


5 miles


10 miles


25 miles

100 miles
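The Closer / Wider / Next controls over the radius steps listed above can be sketched as a small state holder. The class name, the starting radius and the choice to reset the result page whenever the radius changes are assumptions, not something the slides state.

```python
# Radius steps taken from the slide, in miles.
RADII_MILES = [0.5, 1, 2, 5, 10, 25, 100]


class RadiusBrowser:
    """Tracks the current search radius and result page (hypothetical name)."""

    def __init__(self, start_index: int = 2) -> None:
        self.index = start_index  # index 2 corresponds to 2 miles
        self.page = 0

    @property
    def radius(self) -> float:
        return RADII_MILES[self.index]

    def closer(self) -> None:
        """Shrink the radius by one step; assumed to restart at page 0."""
        self.index = max(0, self.index - 1)
        self.page = 0

    def wider(self) -> None:
        """Grow the radius by one step; assumed to restart at page 0."""
        self.index = min(len(RADII_MILES) - 1, self.index + 1)
        self.page = 0

    def next(self) -> None:
        """Show the next page of results at the same radius."""
        self.page += 1


browser = RadiusBrowser()
browser.wider()   # 2 miles -> 5 miles
browser.next()    # second page of results at 5 miles
```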

Locality


Final ranking is a (linear) combination of importance and geographic distance.


Chances are:


Amazon will still rank first, no matter where you are


Amazon is a „global bully“


Idea:


Eliminate global bullies by computing importance differently


Give less weight to links that span a longer distance
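The two ideas on this slide can be illustrated with a few lines of code: a linear combination of importance and proximity, and an importance measure in which each incoming link counts less the farther it travels. The proximity transform, the exponential decay, the parameter values and all function names are assumptions chosen for the sketch; the slides do not give formulas.

```python
import math
from typing import Iterable


def final_score(importance: float, distance_km: float,
                alpha: float = 0.5, scale_km: float = 50.0) -> float:
    """Linear combination of importance and a proximity term in (0, 1]."""
    proximity = 1.0 / (1.0 + distance_km / scale_km)
    return alpha * importance + (1.0 - alpha) * proximity


def link_weight(distance_km: float, scale_km: float = 100.0) -> float:
    """Weight of one link, decaying with the distance it spans."""
    return math.exp(-distance_km / scale_km)


def local_importance(inlink_distances_km: Iterable[float],
                     scale_km: float = 100.0) -> float:
    """Sum of distance-weighted incoming links (a simple in-degree variant)."""
    return sum(link_weight(d, scale_km) for d in inlink_distances_km)


# A site with three nearby links versus a "global bully" with 1000 links
# that each span 1000 km.
nearby_site = local_importance([0.0, 0.0, 0.0])
global_bully = local_importance([1000.0] * 1000)
```

With these assumed parameters the three local links outweigh the thousand distant ones, which is the effect the slide asks for.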

Evaluation


Evaluating Web-IR is hard


Evaluating geo-search is even harder



Mistakes are hard to find

Impact of geo-IR


Next generation Search Engine


Location based Service


For cellphones under UMTS


Move traffic from A&E


Local companies will get more traffic


Increase Profits from Adwords


Smallest businesses will advertise online


Locally focused


The „Leaflet-industry“ will shrink

Geographic Web-Mining


The web reflects human society.


Distorted


Delayed/Ahead


A lot of interesting social questions can be answered by looking at a large webcrawl


You can save time and money compared to door-to-door surveys


This is widely used




But:


Most of them are of a geographic nature

Example Queries


Where in Germany are vintage sneakers a trend?


Is there a fashion authority that is accepted in all regions of Germany?


Do Britney and Madonna have the same audience?


Draw a map of Germany with all sites about vintage sneakers.


Find all fashion sites that get a minimum of 1000 equally distributed links.


Map the areas in Germany where there are significantly more sites for B. than for M.



Precise Semantics?
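As one possible reading of the last query, the sketch below buckets geocoded sites into grid cells and reports the cells where one topic clearly outnumbers the other. The grid size, the ratio threshold and the minimum count are stand-ins for "significantly more", whose precise semantics the slide leaves open; all names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # assumed convention: (latitude, longitude)
Cell = Tuple[int, int]


def cell_of(point: Point, cell_deg: float = 1.0) -> Cell:
    """Map a coordinate to the grid cell that contains it."""
    lat, lon = point
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cells_favouring(sites_a: Iterable[Point], sites_b: Iterable[Point],
                    ratio: float = 2.0, min_sites: int = 5,
                    cell_deg: float = 1.0) -> List[Cell]:
    """Cells with at least min_sites A-sites and at least ratio times as many as B."""
    count_a = Counter(cell_of(p, cell_deg) for p in sites_a)
    count_b = Counter(cell_of(p, cell_deg) for p in sites_b)
    return sorted(
        cell for cell, n in count_a.items()
        if n >= min_sites and n >= ratio * count_b.get(cell, 0)
    )


# Made-up site locations near Berlin and Munich.
sites_b_topic = [(52.5, 13.4)] * 6 + [(48.1, 11.6)] * 2
sites_m_topic = [(52.5, 13.4)] * 2 + [(48.1, 11.6)] * 6
b_regions = cells_favouring(sites_b_topic, sites_m_topic)
m_regions = cells_favouring(sites_m_topic, sites_b_topic)
```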

Current Work


Older Prototype


Metasearch on top of lycos.de


Screen-scrape & re-order


Whois only


Did very well
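The re-ordering step of such a metasearch prototype might look like the sketch below: results that already carry a location (for instance one looked up from a Whois record) are sorted by great-circle distance to the user, and results without a location keep their original order at the end. The input format, function names and the handling of unlocated results are assumptions; the slides do not describe the prototype's internals.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed convention: (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def reorder(results: List[Tuple[str, Optional[Point]]], user: Point) -> List[str]:
    """Sort located results by distance to the user; unlocated ones go last."""
    located = [r for r in results if r[1] is not None]
    unlocated = [r for r in results if r[1] is None]
    located.sort(key=lambda r: haversine_km(r[1], user))
    return [url for url, _ in located] + [url for url, _ in unlocated]


# Made-up result list; coordinates are roughly Oldenburg and Frankfurt,
# and the user is placed roughly in Marburg.
marburg = (50.81, 8.77)
results = [
    ("a.example", (53.14, 8.21)),
    ("b.example", None),
    ("c.example", (50.11, 8.68)),
]
ordered = reorder(results, marburg)
```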

Current Work


Current Prototype for Geographic Search


Limited to Germany = .de domains


50,000,000 pages


Expected online by late summer


In co-operation with


Yen-Yu Chen


Xiaohui Long


Torsten Suel



Polytechnic University, Brooklyn

Reinventing Web-IR


Nearly no (academic) work in geo-IR


Almost every aspect of Web-IR needs to be looked at again


Interfaces


Query processing


Index distribution


Link analysis


User profile analysis


Spam detection


Even:


Other aspects of personalized search


Changes in the web

Thank you

Any questions?