GI retrieval based on Natural Language Processing

Oct 24, 2013


Adrian Zafiu (1, 2), Tiberiu Boros (1)

(1) Research Institute for Artificial Intelligence, Romanian Academy, Bucuresti, Romania, tibi@racai.ro
(2) University of Pitesti, Faculty of Electronics, Communications and Computers, Pitesti, Romania, adrian_zafiu@yahoo.com



Lately, natural language processing (NLP) has benefited from technological advances and from increased interest by important research groups and large companies. One of the reasons is that NLP analysis can be used to convert unstructured information (e.g. text) into structured information. Running simple queries on search engines shows that the Internet contains a lot of unstructured Geographic Information (GI).

To illustrate our idea, we investigate the data that can be extracted from a simple passage in an article downloaded from Wikitravel:


Bucharest is the primary entry point into Romania.
(…)
Known in the past as "The Little Paris" Bucharest has changed a lot lately and
(…)
Finding a 300 year old church near a steel-and-glass building that both sit next to a communist style building is commonplace in Bucharest.
(…)

Extracted from http://wikitravel.org/en/Bucharest#Understand.


Due to the size restriction of the abstract, our analysis covers only the first sentence of the passage. After running a typical NLP analysis (part-of-speech tagging, chunking, parsing, named entity recognition, etc.), we obtain the following relations between entities (figure 1):


Figure 1 - Rule-based analysis of the first sentence


“Bucharest” and “Romania” are named entities; “the primary entry point”, “Bucharest” and “Romania” are also locations; and the words “is” and “into” express relations between these elements.
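The relation-finding step can be sketched as a single pattern rule over chunks. This is a minimal illustration, not the authors' system. The `CHUNKS` list is a hypothetical stand-in for the output of the upstream tagger, chunker and named entity recognizer. The chunk labels, the `NAMED_ENTITIES`/`LOCATIONS` sets and `extract_relations` are likewise assumptions made for the example. The rule emits a (subject, relation, object) triple whenever a verb or preposition sits between two noun phrases.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]          # (text, chunk label)
Triple = Tuple[str, str, str]    # (subject, relation, object)

# Hypothetical upstream output (tagging + chunking + NER) for:
# "Bucharest is the primary entry point into Romania."
CHUNKS: List[Chunk] = [
    ("Bucharest", "NP"),
    ("is", "VP"),
    ("the primary entry point", "NP"),
    ("into", "PP"),
    ("Romania", "NP"),
]

# Entity labels as described in the text (illustrative sets, not a real NER model).
NAMED_ENTITIES = {"Bucharest", "Romania"}
LOCATIONS = {"Bucharest", "Romania", "the primary entry point"}


def extract_relations(chunks: List[Chunk]) -> List[Triple]:
    """Rule: an NP, then a verb or preposition chunk, then an NP
    yields the triple (left NP, relation word, right NP)."""
    triples: List[Triple] = []
    for i in range(len(chunks) - 2):
        (left, left_label), (rel, rel_label), (right, right_label) = chunks[i:i + 3]
        if left_label == "NP" and rel_label in ("VP", "PP") and right_label == "NP":
            triples.append((left, rel, right))
    return triples


if __name__ == "__main__":
    for subj, rel, obj in extract_relations(CHUNKS):
        tag = "location" if obj in LOCATIONS else "other"
        print(f"({subj}) --{rel}--> ({obj}) [{tag}]")
```

A real system would obtain the chunks from a statistical chunker or a parser. The fixed window of three chunks is only enough for this single sentence.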
From a GIS point of view, this information may not seem useful, but when properly stored and structured it can provide answers to questions such as “what is the primary entry point of Romania?”, “what is Bucharest?”, etc.
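Once the relations are stored as triples, the two example questions reduce to lookups. The sketch below assumes a plain in-memory list of triples and two hypothetical helpers, `what_is` and `entry_point_of`. It only illustrates answering questions from structured data; it is not the question-answering component of the paper.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Triples for "Bucharest is the primary entry point into Romania." (hypothetical store)
FACTS: List[Triple] = [
    ("Bucharest", "is", "the primary entry point"),
    ("the primary entry point", "into", "Romania"),
]


def what_is(entity: str, facts: List[Triple]) -> Optional[str]:
    """Answer 'what is <entity>?' by following its 'is' relation and
    appending any relations attached to the object (e.g. 'into Romania')."""
    for subj, rel, obj in facts:
        if subj == entity and rel == "is":
            modifiers = [f"{r} {o}" for s, r, o in facts if s == obj]
            return " ".join([obj] + modifiers)
    return None


def entry_point_of(country: str, facts: List[Triple]) -> Optional[str]:
    """Answer 'what is the primary entry point into <country>?'."""
    for subj, rel, obj in facts:
        if rel == "into" and obj == country:
            # Find the entity that is described as this entry point.
            for s2, r2, o2 in facts:
                if r2 == "is" and o2 == subj:
                    return s2
    return None


if __name__ == "__main__":
    print(what_is("Bucharest", FACTS))
    print(entry_point_of("Romania", FACTS))
```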
Also, if we have the exact shape and location of the areas represented by Romania and Bucharest, we can infer the answer to “where is the primary entry point into Romania?”. Still, this answer may be ambiguous from a spatial perspective, if we try to locate the exact position of the entry point within the perimeter that defines Romania.
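If polygons for Romania and Bucharest are available, a "where" question can be answered at the level of the whole area. One approach is to take the area's centroid and check that it lies inside the country's perimeter. The sketch below computes the centroid with the shoelace formula and tests containment with even-odd ray casting. The polygons are hypothetical rectangles in arbitrary units, not real boundaries. The function returns only a representative point of Bucharest, which mirrors the ambiguity described above: the exact entry point inside that area remains unknown.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]         # vertices in order, first vertex not repeated

# HYPOTHETICAL toy shapes in arbitrary units; real outlines would come from a GIS source.
ROMANIA: Polygon = [(0.0, 0.0), (10.0, 0.0), (10.0, 6.0), (0.0, 6.0)]
BUCHAREST: Polygon = [(6.0, 1.0), (7.0, 1.0), (7.0, 2.0), (6.0, 2.0)]


def centroid(poly: Polygon) -> Point:
    """Area centroid of a simple (non-self-intersecting) polygon, shoelace formula."""
    area2 = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # area2 is twice the signed area, so 6 * area == 3 * area2.
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def contains(poly: Polygon, p: Point) -> bool:
    """Even-odd ray casting; points exactly on an edge may go either way."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def where_is(area: Polygon, country: Polygon) -> Point:
    """Representative point of the area, after checking it lies in the country."""
    c = centroid(area)
    if not contains(country, c):
        raise ValueError("area does not lie inside the country polygon")
    return c


if __name__ == "__main__":
    print(where_is(BUCHAREST, ROMANIA))
```

A production system would typically delegate these geometric operations to a GIS library rather than hand-written routines.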

However, other sources may contain more accurate information from the GIS point of view, and the task then becomes one of choosing the correct source of information. We consider this to be a hybrid NLP-GI problem.
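One simple way to frame source selection is to attach a positional uncertainty to each candidate answer and prefer the most precise one. This is an assumed heuristic for illustration, not the selection method of the paper. The `Candidate` fields, the source names and the uncertainty values below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    source: str                      # hypothetical source identifier
    answer: str                      # the location this source proposes
    uncertainty_m: Optional[float]   # positional uncertainty in metres; None if unknown


def choose_source(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the candidate with the smallest known positional uncertainty.
    Candidates without an uncertainty estimate serve only as a fallback."""
    known = [c for c in candidates if c.uncertainty_m is not None]
    if known:
        return min(known, key=lambda c: c.uncertainty_m)
    return candidates[0] if candidates else None


# Hypothetical candidates for "where is the primary entry point into Romania?"
CANDIDATES = [
    Candidate("text", "Bucharest (city-level mention)", 15_000.0),
    Candidate("gis_layer", "point geometry from a GIS layer", 50.0),
    Candidate("other_text", "somewhere in Romania", None),
]

if __name__ == "__main__":
    best = choose_source(CANDIDATES)
    print(best.source if best else "no answer")
```

Other criteria, such as source reliability or recency, could replace or complement positional precision.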



Figure 2 - Domain entities

Figure 3 - Domain entities


This paper focuses on using natural language processing (NLP) techniques for retrieving geographic information (GI) from text. We also introduce and evaluate an ontology (figures 2 and 3) that defines the basic entities and relations used to store the information retrieved from plain texts. We are currently developing a system for GI extraction from text sources that combines statistical and rule-based NLP methods with available GIS information.
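The following is only a guess at how the typed entities and relations of such an ontology could be stored. It does not reproduce the ontology in figures 2 and 3. The `GIStore` class and the kind labels `NamedEntity` and `Location` are hypothetical. The example reuses the entities and relations found in the first sentence of the passage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Entity:
    name: str
    kinds: Set[str] = field(default_factory=set)  # hypothetical labels, e.g. "Location"


@dataclass
class GIStore:
    """Hypothetical minimal store: typed entities plus (subject, relation, object) links."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, *kinds: str) -> Entity:
        # Re-adding an entity merges its kinds instead of duplicating it.
        ent = self.entities.setdefault(name, Entity(name))
        ent.kinds.update(kinds)
        return ent

    def relate(self, subj: str, rel: str, obj: str) -> None:
        # Only link entities that are already stored, so every relation is grounded.
        if subj not in self.entities or obj not in self.entities:
            raise KeyError(f"unknown entity in ({subj!r}, {rel!r}, {obj!r})")
        self.relations.append((subj, rel, obj))

    def of_kind(self, kind: str) -> List[str]:
        return sorted(n for n, e in self.entities.items() if kind in e.kinds)


def build_example_store() -> GIStore:
    """Populate the store with what the first sentence of the passage yields."""
    store = GIStore()
    store.add_entity("Bucharest", "NamedEntity", "Location")
    store.add_entity("Romania", "NamedEntity", "Location")
    store.add_entity("the primary entry point", "Location")
    store.relate("Bucharest", "is", "the primary entry point")
    store.relate("the primary entry point", "into", "Romania")
    return store


if __name__ == "__main__":
    print(build_example_store().of_kind("NamedEntity"))
```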

We split our GI entities into referenced entities (known data, e.g. locations that are stored in a database, such as “Bucharest”) and unreferenced entities (unknown data, e.g. a 300-year-old church). NLP analysis enables us to create references for some of the unreferenced data and to determine the usability of the information retrieved from texts (e.g. the data can be used for the spatial representation of objects, or it has a different level of usability, such as question answering).
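The split between referenced and unreferenced entities can be sketched as a gazetteer lookup. The same lookup also covers creating a reference from an NLP-derived relation. Everything here is assumed for illustration. `GAZETTEER` stands in for the location database, and its coordinates for Bucharest are approximate placeholders. The containment triple for the church is a hypothetical output of the NLP analysis. An entity that ends up referenced is marked as usable for spatial representation, but only at the granularity of the place it was tied to. The rest are kept for question answering.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical gazetteer standing in for the database of known locations.
# (lon, lat) values are approximate placeholders for illustration only.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Bucharest": (26.10, 44.43),
}


@dataclass
class GIEntity:
    mention: str
    ref: Optional[str] = None  # gazetteer key, once the entity is referenced

    @property
    def referenced(self) -> bool:
        return self.ref is not None


def resolve(mention: str, relations: List[Triple]) -> GIEntity:
    """Reference a mention directly through the gazetteer, or indirectly through
    a containment relation ("<mention>", "in", "<known place>")."""
    if mention in GAZETTEER:
        return GIEntity(mention, ref=mention)
    for subj, rel, obj in relations:
        if subj == mention and rel == "in" and obj in GAZETTEER:
            # The reference is only as precise as the containing place.
            return GIEntity(mention, ref=obj)
    return GIEntity(mention)


def usability(entity: GIEntity) -> str:
    """Referenced entities can be drawn on a map; the others are kept for QA."""
    return "spatial representation" if entity.referenced else "question answering"


if __name__ == "__main__":
    # Hypothetical relation an NLP step might derive from the passage.
    rels = [("a 300-year-old church", "in", "Bucharest")]
    for m in ("Bucharest", "a 300-year-old church"):
        e = resolve(m, rels)
        print(m, "->", e.ref, "/", usability(e))
```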