
Introduction to Semantic Web Technology and Geodata

Arnulf CHRISTL
Heerstraße 162
53111 Bonn
E-Mail: arnulf.christl@metaspatial.net

Abstract: The Semantic Web is an emerging idea. Currently we can see three phases, the first of which has already started, as a diverse set of independent technologies. Semantics cannot be implemented by one single new technology or software and thus it is not an obvious target for developers or big vendors. We use the Web within our own semantic context without taking much notice because we are blessed with intuition, inference and association. We can visually deduce and coordinate content by simply looking at web sites (images and text). Machines have none of these capabilities. Instead they are really fast but also incredibly dumb. The Semantic Web is about capacitating machines by preparing data in a way that is intelligible to them. Initial efforts to put geographic data on the Web in a semantic context are ongoing. This article gives an introduction to current Internet technology and is aimed at geospatial professionals who want to get a better understanding of how their data can become part of the Semantic Web. The Geoweb is just one aspect of the Semantic Web, albeit a highly interesting one because it ties virtual data to real world locations. The outlook of converging standards, crowd sourcing and semantics is promising.
1 Introduction

The term "Semantic Web" is not formally defined, it is just an idea, albeit
a
very good one [Tim Berners
-
Lee (2008)]. It is used to describe concepts,
formats and standards, some of which have been proposed by the World Wide
Web
Consortium (W3C). The key idea of the Semantic Web is to always
technically associate (link) data with
a meaningful context. This means that the
2


meta data required to fully understand the data always need to be readily
available. Data and meta data have often been considered as separate things
but actually the difference is mostly just a point of view. One
conveys the
semantics of the other, ideally in a clearly defined ontology.

Like any new idea, the Semantic Web is worth nothing unless it is realized and put to real world use. In this article we will look into concepts and technologies that enable the geospatial Semantic Web. We will look into the technologies used to implement the Internet and Web, starting from DNS, TCP/IP, HTTP, HTML and XML. From this basis we move on to resource oriented patterns and RDF.

There are many other technologies, like microformats [Berriman 2011], that have been developed with similar goals even though they are not always described as "Semantic Web" components. All of these technologies, formats and standards intend to provide a formal description of concepts, terms, and relationships within a given domain. Finally we will look into the Resource Oriented Architecture (ROA) pattern to see how it all fits together on the Web [Richardson 2007].

The geospatial aspect is focused on data formats including raster graphics for map images, vector data coded in GML and KML, and spatially extended syndication formats like GeoRSS. This type of data can be created dynamically by web services as specified by the Open Geospatial Consortium (OGC), for example in the WMS, WFS and SWE implementation standards. These standards were created before resource orientation and the Semantic Web and lack some of these aspects.

Currently most data on the Web are only implicitly linked to their corresponding ontology, and these ontologies are oftentimes not defined in technical terms at all. We also lack seamlessly integrated tools that would help us understand the semantic context that the Web can already give us.

One reason why the Semantic Web is so slow to emerge is that it is fairly easy (albeit error prone) for human beings to associate the content of a web site with the correct domain. Machines have not developed this capacity at all. The availability of machine-readable meta data is intended to help remedy this problem and eventually enable software agents to access the Web, perform tasks and locate information automagically. The Semantic Web as a global vision is only slowly emerging and many critics still say that it is not feasible at all.

2 Semantics

Semantics is the study of meaning. In the science of linguistics, semantics describes the meaning of words, terms and phrases used by humans to communicate. This concept can be extended to computer science by describing the meaning of words, terms and phrases in programming languages. The technicality of computers makes this type of definition much narrower and more precise than for human language.

In information science the term semantics describes the technical relation between a datum and its context or ontology. In the context of the Web the term semantics describes the relation between data and its meta data. The goal is to make both intelligible to machines by formatting and structuring them in standard ways.

This paper assumes that a lot of semantic content is already on the Web but that machines are incapable of uncovering or using it coherently. A better understanding of the goals of the Semantic Web will allow geospatial professionals to make much better use of existing Web technology and achieve better interoperability.

2.1 Syntax and Pragmatics

Linguistic science has two more areas of research besides semantics: syntax and pragmatics [Levinson 2000]. Syntax describes the rules by which terms and words can be constructed into sentences and phrases.

But it is possible to create syntactically correct sentences which make no sense semantically. Noam Chomsky coined the phrase "colorless green ideas sleep furiously" [Chomsky 1957] as an example of a syntactically correct sentence with nonsensical semantic meaning. Several interpretations of this sentence have been undertaken to show that it could make sense in special contexts. Especially when speaking figuratively (by adding a context or meta data), "colorless" can be interpreted as "nondescript" and "green" as "new" or "fresh". Given a short introduction, a reader could wrench some meaning from the otherwise nonsensical sentence. This shows that the context of any datum has a thorough influence on the related meaning.

This makes up the third pillar of studies in linguistics: pragmatics. It is the relation between the term or word and the observer. This highly interesting aspect of linguistics has so far not been formally adopted in information technology, which may be one reason why semantics is still irrelevant to many practical aspects of the Web.

2.2 Ontology

Ontology is the study of the nature of being and existence and their relations. It is a branch of philosophy known as metaphysics and analyzes what exists or can be said to exist, how these entities can be grouped and how they are related to each other. Typically relations are grouped and subdivided in hierarchies according to similarities and differences. In computer and information science, ontologies are formal representations of knowledge of different domains [Gruber 2009]. Ontologies can be used to describe a domain in a formal manner. The relationships between domains can also be described in ontologies. Ontologies are formal, explicit specifications of shared concepts providing a vocabulary with defined semantic meaning. The vocabulary can be used to model a domain with a defined syntax by describing the types of objects, their properties and their relations.

Ontologies can be formally described using different standards and languages, for example the Web Ontology Language (OWL). For the context of this article we will not go into further detail but first get an overview of the technologies already in common use.

3 Web Technology

The Web (or World Wide Web) is a complex network of interlinked hypertext documents, typically served through web sites. The Web is accessed through the interlinked computer network known as the Internet. As we will see later, it is important to clearly separate these two concepts: the Internet is a hierarchically organized computer network whereas the Web is a logically organized directed graph of resources residing in the Internet. This means that the Web is an application that runs on the Internet.

We will first explore some of the Internet technologies required to run the Web and then look into patterns and concepts which enable the Web semantically using this same technology. This section is not intended to be comprehensive on either the Internet or the Web. Instead it only highlights specific aspects of the Internet which are relevant to building semantic context on the Web.

3.1 Internet Protocol

The technical foundation of the Web is the Internet Protocol. It was created to connect nodes. A node can be a server hosting a web site and documents, an email server (mail delivery agent), a router, a firewall or even a printer; basically anything that is addressable with an Internet Protocol (IP) address such as 94.23.196.65. The protocol has been designed on the assumption that the underlying physical and logical network infrastructure is inherently unreliable. Nodes may unexpectedly disappear or dynamically move elsewhere. The location of objects and servers can change at any time. Transport can be interrupted and must be failsafe. This assumption reflects very well the current experience of the Web at large, including geospatial services and data. There is no central monitoring which tracks or maintains the state of this network.

The Internet provides the basis for the logical domain naming system of the Web, which typically has a two level naming schema. The Internet top-level domain (TLD) comprises the root level [Iana 2011]. It consists of two letter combinations usually derived from political jurisdictions such as "de" for Deutschland, "fr" for France, "us" for the USA, and so on. Some specially reserved TLDs consist of three letters. These include "com" for commercial, "org" for organization, "gov" for the government of the USA, "mil" for the military of the USA and "edu" for educational institutions of the USA; they reflect the origins of the Internet as a U.S. federal government-sponsored research network.

Top-level domains are not directly addressable, they are empty nodes. To the left of the TLD appears the domain name, as in osgeo.org, w3.org or metaspatial.net. These names are directly addressable. A web browser typically runs the Hyper Text Transfer Protocol, therefore the domain names are normally prefixed by "http://". Between this protocol identifier and to the left of the domain it is possible to add sub domains. Oftentimes this is simply "www" as in http://www.gov.vu/. Other domains are hierarchically broken down into further sub domains as in http://inspire.jrc.ec.europa.eu/. This has no effect on navigation or addressability.

To the right end of the URL, directories can be added. Older sites make a habit of organizing their content in virtual and otherwise empty directories as in the example http://www.osgeo.org/content/sponsorship/sponsors.html. The exactly identical content of that page is also referenced by the URL http://www.osgeo.org/sponsors. The additional structure implemented by adding the virtual directories "content" and "sponsorship" to the URL does not add meaning and is mostly superfluous. The extension .html indicates what type of document the browser should expect but is otherwise also superfluous. Some of the bigger websites like Wikipedia have no directory hierarchy at all, with almost every content item available on exactly the same level.
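
To make the structure of such a URL concrete, the following minimal Python sketch (standard library only) splits one of the example URLs into the parts discussed here: the protocol identifier, the host name with its sub domain, domain and TLD, and the directory path.

    from urllib.parse import urlparse

    url = "http://www.osgeo.org/content/sponsorship/sponsors.html"
    parts = urlparse(url)

    print(parts.scheme)             # 'http': the protocol identifier
    print(parts.netloc)             # 'www.osgeo.org': sub domain, domain, TLD
    print(parts.netloc.split("."))  # the naming hierarchy, TLD last
    print(parts.path)               # '/content/sponsorship/sponsors.html'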

The redundancy and flexibility of Web content appearing through the Internet becomes apparent when we access the very same document through a variety of resources. The front page of the private web site of the author is currently reachable through the following URLs:

- http://arnulf.us
- http://www.arnulf.us
- http://arnulf.us/Main_Page
- http://arnulf.us/Runder_tisch_gis/introduction_to_the_Web
- http://zpatial.org
- http://r32916.ovh.net
- http://94.23.196.65
- http://178.32.100.197/

This will change over time; remember that one of the assumptions of the Internet is that everything is dynamic and in a constant state of flux.

3.2 Transport: Push and Pull; SMTP and HTTP

The Internet is about data transport. Several protocols are used to transmit data and messages across the Internet. Transport can be typified as initiated either by push or by pull.

The Simple Mail Transfer Protocol (SMTP) is a typical example of a push based protocol. It is used by mail delivery agents to send, relay and deliver emails. The work flow of sending and receiving emails is fairly straightforward: One machine is ordered to send an email. To do this it will wrap the message in a package and add a sticker to it that contains the address. Then it sends (pushes) the package to the next node, which will pass the package on until it ends up at the given destination address. If the destination server does not accept the message it returns the mail as undeliverable, including a message with the reason for rejection. If the destination server is unavailable altogether then the last node that accepted the package will return the message stating just that.
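
This workflow can be sketched in a few lines of Python with the standard smtplib module; the host name and addresses below are placeholders, not real servers.

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "sender@example.org"    # return address on the "sticker"
    msg["To"] = "recipient@example.net"   # destination address
    msg["Subject"] = "Push example"
    msg.set_content("Pushed from node to node until delivered.")

    # hand the package to the first relay node; a rejection by the
    # destination surfaces as an exception (e.g. SMTPRecipientsRefused)
    with smtplib.SMTP("mail.example.org") as smtp:
        smtp.send_message(msg)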

The Hyper Text Transfer Protocol (HTTP) is an application layer protocol designed within the framework of the Internet Protocol Suite. It is the foundation of data communication in the Web and it is pull-based. It is important to remember that HTTP is not the transport protocol (which is the Internet through TCP/IP) but the application layer on the Internet.

HTTP implements four well defined operations following the CRUD paradigm, which translates into "Create", "Read", "Update" and "Delete" data. The four main HTTP operations are HTTP PUT, GET, POST and DELETE correspondingly. Each of the four main HTTP operations has a set of error codes to address errors that can occur, either in the underlying Internet network or in the application running HTTP. The protocol includes a special set of codes to deal with changing Internet addresses, broken links and moving information. This is again based on the core assumption that the underlying network is unreliable. The most important aspects of HTTP are that it is simple, fault tolerant and well defined.
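
The mapping can be sketched with the third party Python "requests" library against a placeholder resource URL; the PUT/POST roles follow the CRUD correspondence given above.

    import requests

    url = "http://example.org/resources/42"   # placeholder resource

    requests.put(url, data="initial state")   # Create: store the resource
    response = requests.get(url)              # Read: retrieve a representation
    requests.post(url, data="changed state")  # Update: modify the resource
    requests.delete(url)                      # Delete: remove the resource

    # every response carries one of the well defined status codes,
    # e.g. 200 (OK), 301 (Moved Permanently), 404 (Not Found)
    print(response.status_code)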

HTTP is by definition stateless. This means that it does not rely on a defined status between client and server but handles every request independently of the next. It is a framework that allows access to documents through an otherwise opaque network. The user has no information about the path that the data takes. The client always requests data instead of a server actively sending anything.

These two basic protocols show the difference between the push and pull paradigms. But for many work flows the architecture has to allow for a combination of pushing and pulling data. An example: When users want to read emails that have been sent through SMTP they will typically first have to pull the email from the mail delivery agent, for example by using the Post Office Protocol (POP3). But in other scenarios emails can also be pushed to the user's hand held device by an active server component as soon as they arrive at the mail delivery agent.

Push concepts can also be implemented on top of HTTP by adding another layer of architectural logic. One such concept is called WebHooks (http://www.webhooks.org). It is based on the assumption that a server might be interested in delivering data instead of relying on clients pulling them on their own. To do this the server must have some information about where to push the data. WebHooks does this by allowing clients to register with the server. Once the server has new data to distribute it will simply let the client actively know. The comparable pull oriented version of this process is known as syndication and comes in the flavor of standards like RSS and Atom. These are strictly stateless and pull based, which requires that the clients actively retrieve information. As the client has no information about when the data of interest changes on the server, it must regularly poll for changes, which can be advertised using a syndication protocol.
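
A minimal polling client can be sketched as follows; the feed URL and the polling interval are illustrative, and the Atom namespace is the one defined by the standard.

    import time
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://example.org/news/atom.xml"   # placeholder Atom feed
    NS = {"atom": "http://www.w3.org/2005/Atom"}

    seen = set()
    while True:
        with urllib.request.urlopen(FEED_URL) as response:
            tree = ET.parse(response)
        for entry in tree.findall("atom:entry", NS):
            entry_id = entry.findtext("atom:id", namespaces=NS)
            if entry_id not in seen:                # only report new entries
                seen.add(entry_id)
                print(entry.findtext("atom:title", namespaces=NS))
        time.sleep(600)                             # poll again in ten minutes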

Another prominent combination of active push and pull based data transport is implemented by social network systems like LinkedIn, Facebook, Twitter and the like. These platforms notify the user by sending an email, relying on the fact that the user will poll the mail server (see above) at regular and frequent intervals. In general the mail does not contain all the data, just a short teaser and the link through which the complete data can be accessed.

These are typical methods of combining pull and push systems to implement an (almost) seamless user experience. Unfortunately most machines have no email account or cannot use it properly.

3.3 Content: Web Sites, HTML, Data and XML

For the context of this article a web site can be seen as an arbitrary directory structure containing documents which are made accessible through Internet technology. The directories contain HTML documents which typically contain texts and references to images or other data which can be displayed directly by web browsers.

The primary document format on the Internet is the Hypertext Markup Language (HTML). HTML is a markup language to describe web pages. It allows formatting of text and other multimedia content, mostly images, videos, sound and the like.

HTML syntax was not intended to give semantic meaning to the data it encodes. It was implemented to work well with HTTP, to be displayed on a computer screen and to be consumed by human beings. HTML defines a specific set of tags to add meta data but these are often not used. Meta information can be given in the TITLE and specific meta tags, including authoring information, date of creation, expiration, or an abstract of the content. Inside HTML documents images can have ALT tags which make them intelligible to clients who cannot "see" (this can be a blind person but also a machine or robot).
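
The following sketch shows how a machine can read exactly this explicit meta data (meta tags and ALT attributes) with Python's standard html.parser; the HTML snippet is a made-up example.

    from html.parser import HTMLParser

    html = """<html><head><title>Example</title>
    <meta name="author" content="A. Christl">
    <meta name="date" content="2011-01-24"></head>
    <body><img src="map.png" alt="Street map of Bonn"></body></html>"""

    class MetaExtractor(HTMLParser):
        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and "name" in attrs:
                print("meta:", attrs["name"], "=", attrs.get("content"))
            elif tag == "img" and "alt" in attrs:
                print("image description:", attrs["alt"])

    MetaExtractor().feed(html)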

One of the most important aspects of HTML are links. Links are relations, typically to other web sites or data, sometimes also to references within the same document. Links make up the logical network aspect of the Web. Interestingly, even though links function on the Internet, this logical network is independent of the underlying physical network. Links make up the Web, a directed graph residing on a hierarchical structure.

Data referenced through links in HTML documents typically comes in files of arbitrary formats. A small subset of standard formats has been captured as MIME types [IANA 2011], increasing the chances that interested software will eventually learn how to interpret the data correctly and do something coherent with it. Due to the sheer vastness of incompatible data formats and the very limited number of web capable software packages (besides web browsers), most data simply has to be downloaded before it can be used coherently.

XML (eXtensible Markup Language) is similar to HTML in that it is a markup language, but it is more generic. Whereas HTML has been explicitly designed to encode documents for web sites, XML can be used to encode practically any information. Additionally, XML allows arbitrary semantic context to be added to the text and data it references.

XML is a commonly accepted format to represent trees and hierarchies. This means that web site structures can be represented as XML trees. Added together, the whole Internet could theoretically be represented as one single XML tree resulting in a very flat hierarchy. On the root of this tree (the Internet as a whole) each domain and website represents a branch and every HTML document or single chunk of data a leaf.
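
A tiny illustration of this tree view, with an invented structure loosely following the osgeo.org example above:

    import xml.etree.ElementTree as ET

    xml = """<site domain="osgeo.org">
      <page name="index.html"/>
      <directory name="sponsors">
        <page name="sponsors.html"/>
      </directory>
    </site>"""

    root = ET.fromstring(xml)
    for element in root.iter():      # walks all branches and leaves
        print(element.tag, element.attrib)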

Currently the Internet comprises more than a hundred million active web sites but most of them only have one or at most a very few levels of content "depth". Therefore this representation would render the Internet as a very flat hierarchy. The representation of the Web (as opposed to the Internet) therefore also needs to include the (semantic) relations between web sites and documents. The Internet is too "wide" and too "flat" to be useful as a hierarchy of content.

We have to extend the concept of a leaf on a tree in order to describe the Web. Each leaf can become the node of a network if it has links to other documents or is linked from other documents.

The XML Linking Language (XLink) is designed to create internal and external links within XML documents. These links can also be created with associated meta data. It is a W3C specification but is currently not yet well supported by most software packages. XLink has great potential to become the common data source for tools of a semantically enabled Web, providing meta data together with the associated documents and data. XLink could be an option to combine the hierarchical concept of XML with the network node concept of the Web.

3.4 Relations: The Graph as URL in RDF

To understand the Web as a directed graph we need an appropriate format to represent the relations. One such format is the Resource Description Framework (RDF). It is a family of World Wide Web Consortium (W3C) specifications originally designed as a data model for meta data. It is now used as a method to conceptually describe and model information that is implemented in web resources, using a variety of syntax formats. Concepts can be described in RDF Schema and modeled using the Web Ontology Language (OWL). Special languages such as SPARQL can be used to make rule based queries on RDF structures [Hitzler 2009].

RDF can represent relations between HTML documents in triples. A triple consists of a subject, a relation and an object. The subject can be any HTML document which links (relates) to any other HTML document (the object). Any level, branch and leaf of one tree (data or document on a web site) can relate to any other branch or leaf on any other tree. From this perspective the hierarchy is mostly irrelevant. It is replaced by relations, typically represented through links.
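
A triple can be written down directly, here sketched with the third party Python rdflib package; the two URLs are taken from the examples in this article and the dcterms:references property stands in for an arbitrary relation.

    from rdflib import Graph, Namespace, URIRef

    DCTERMS = Namespace("http://purl.org/dc/terms/")
    g = Graph()

    # subject ----------- relation ----------- object
    g.add((URIRef("http://www.osgeo.org/sponsors"),
           DCTERMS.references,
           URIRef("http://arnulf.us/Main_Page")))

    print(g.serialize(format="turtle"))   # human readable Turtle notation

    # the graph can then be queried with SPARQL
    for s, o in g.query("SELECT ?s ?o WHERE { ?s ?p ?o }"):
        print(s, "->", o)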

Links on the Web are different from trees because they are always directed. A link from one hypertext document can point to any addressable URL but the document at that URL does not necessarily need to link back. On a tree (the Internet) this is different because going up and down does not make much of a difference. This is one of the main differences between the Web, which is a directed graph, and the Internet, which is the hierarchical structure in which the Web resides.

Currently the Resource Description Framework (RDF) is the best technology to explore the graph that represents the Web. The Semantic Web does not need to be reinvented, it is already there. What we are lacking is a common way of representing it in a comprehensible way. Even though RDF is typically formatted in a readable XML format, the content is not immediately visually intelligible to human perception. This type of representation of data on the Web is very much designed to be consumed by machines.

The graph, RDF and triple stores can be represented using XML, which brings us back to the hierarchy of the Internet and working concepts. Currently there is no good way to visualize the graph as a whole [Christl 2010]. All we can currently create are two-dimensional representations, as shown in the example of Linking Open Data [LinkedData 2010], see Image 1.

Image 1: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Simple representations of the Web graph are already in use but we are still a long way off making the intricately linked Web readily intelligible to human perception. Currently the only practical way is to cut the multidimensional graph into planes, creating two-dimensional cross sections. These could be a starting point for human interaction, which can lead to creating new, ever changing two-dimensional hierarchical cross section planes. Basically these are maps of patterns as we use them in geospatial contexts. The potential relationship between geographic data and the Web still requires further research.

3.5 The Resource Oriented Architecture

Currently there are three common architectural styles in use on the Internet: Remote Procedure Call (RPC), Key Value Pair (KVP) and Representational State Transfer (REST). The RPC style architecture has evolved right out of software development. It allows procedures (functions) to be called on a remote machine. This requires intimate prior knowledge of the interface of the software which is called, including parameters, values, and error codes. The remote machine performs the operation and typically sends back a message with the result. This architecture style is message oriented; a commonly used technology is SOAP.

The REST style limits operations to the protocol it is based on, which in the case of the Web is HTTP. As we have learned, HTTP has four well defined (CRUD) functions for persistent storage. No function or operation beyond these can be used in a RESTful interface. All the logic has to be designed into the data model and work flow. This is a very different approach from simply opening up a software package through an API style remote procedure call interface. The architecture paradigm associated with REST is the Resource-Oriented Architecture (ROA). It proposes four concepts:

- Resources
- Their names (URIs)
- Their representations
- The links between them

and four properties:

- Addressability
- Statelessness
- Connectedness
- A uniform interface


These concepts and properties can be implemented perfectly using HTTP and hypermedia, making the ROA the best fit for the requirements of the Semantic Web. The ROA describes the pattern that results from applying REST principles to make best use of HTTP based Internet technology. It also describes a set of best practices and shows how data needs to be designed to be made available on the Web.

In this architecture pattern, software and services are the result of designing data. The focus does not lie on implementing functionality for a user in a software package. The ROA reduces software to a thin and opaque layer around data. This emphasis on the data makes the ROA highly relevant to the emerging Semantic Web. At the same time it is hard to understand and follow for traditional software development, which is used to thinking much more software-centric.

4 Geographic Data on the Web

Geographic data will probably go through at least three phases of the evolution of the Semantic Web. We are already experiencing the first results of phase one. This includes the simple publication of maps and data which is inherently linked to other data.

In the second phase geospatial data will be published in semantically enabled formats like RDF. This requires only little change to the existing infrastructure and some sites are already coming up, for example Ordnance Survey UK with the OS OpenData initiative [Ordnance Survey 2011]. During both of these phases traditional GIS work will still mostly be done on local machines with powerful query languages like SQL and highly specialized tools for geospatial operations in traditional GIS. In the third phase, which is probably still off by many years, spatial operations might become an inherent feature of the Web, which then may become a real GeoWeb.

To make geographical data available on the Web it needs to be formatted in a way that can be browsed (or crawled by agents) just like the Web. Many current catalogs and structured meta data follow ISO standards and the rules and regulations defined by INSPIRE [INSPIRE 2011]. This meta data provides a valuable source of information but it is not yet related (linked) well enough. The technology and the processes around this meta data still fall short of addressing the needs of a spatially enabled Semantic Web. In addition to the highly structured, hierarchical XML meta data we need to add a more relational perception of the data itself. The understanding of how geospatial data relates and links with other data will eventually grow beyond the geospatial expert domain. But to get there the experts first have to make the data accessible in a way that follows semantic paradigms.

4.1 OGC Web Services

The members of the Open Geospatial Consortium (OGC) have created a set of service standards to publish geographical data. Maps are increasingly delivered through the OGC Web Map Service (WMS) standard [OGC 2011], which can be parameterized to deliver dynamically rendered images of maps.
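
Such a parameterized request is just a URL. The sketch below assembles a WMS 1.1.1 GetMap request in Python; the service endpoint and layer name are placeholders, while the parameter names follow the WMS specification.

    from urllib.parse import urlencode

    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": "topographic",        # placeholder layer name
        "STYLES": "",                   # default style
        "SRS": "EPSG:4326",             # geographic lat/lon coordinates
        "BBOX": "7.0,50.6,7.2,50.8",    # minx,miny,maxx,maxy around Bonn
        "WIDTH": "512",
        "HEIGHT": "512",
        "FORMAT": "image/png",
    }

    # fetching this URL returns a dynamically rendered map image
    print("http://example.org/wms?" + urlencode(params))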

Geospatial features can be made available through the OGC Web Feature Service (WFS) standard. The WFS interface standard allows geographic data objects to be accessed individually and implements a query language similar to SQL but less powerful. WFS services can be configured to only serve data or to also store objects. Geographic objects can be modeled using GML (see below).

Both standards are mature, but especially the OGC WFS standard is complex and hard to access without prior knowledge on the side of the client. This currently still prevents wider usage in contexts other than those of geospatial professionals. Even the much simpler OGC WMS standard is considered "difficult" by many Web developers.

4.2 Geographic Data Formats

Raw geographical data is typically made available in the OGC standard formats GML, KML and increasingly also GeoRSS. All have in common that they are designed in XML. Especially GML can be so complex and individually modeled that to date practically no Web software has evolved that can use this data right away. One exception is the Open Source software OpenLayers, which has been extended to be able to dynamically render GML in the EU funded project "European SDI Network" [ESDIN 2011].

In addition to carrying geographic coordinates, KML can also contain rendering instructions. This makes it easier for software to overlay the data visually on top of other maps and has led to a wider adoption in the Web. But even Google, the original designer of the format, does not fully support it in its web map application. Both GML and especially KML make use of XLink and promise interesting future options for the Semantic Web.

Most of the software packages have in common that they only display maps and offer little or no functionality to further process or even link geographical data.

5 Examples

We will finish this short excursion by looking at two very different but, each in their own way, promising examples of leveraging Internet technology to make geographical data accessible on the Semantic Web. One project is OpenStreetMap (OSM), the other is OS OpenData by Ordnance Survey of the UK.

5.1 OpenStreetMap

OpenStreetMap (OSM) [OpenStreetMap 2011] collects, maintains and makes geographic data available in a crowd sourced process. Michael Goodchild coined the term Volunteered Geographic Information (VGI) [Goodchild 2008] to describe the production side of the project and describes how it changes the world of mapping. This is a very good definition but it fails to put an emphasis on the openness aspect of the OSM project, which also allows anyone to access and use the data for whichever purpose. For the Geo enabled Web this is probably an even more important aspect.

Anyone is allowed to use OSM data for any purpose and can download, use and pass it on, similar to the definition of Open Source as it is in common use in the development of Free Software. This differentiates OpenStreetMap from other geographical data producers who collect users' data but do not give back full access to the data. One example is the navigation system provider TomTom, which operates MapShare: users can submit corrections to the data and access changes submitted by others, but are denied access to the underlying original data source [TomTom 2011].

The core data of OpenStreetMap is maintained in a Wiki-style mode. This means that there is no predefined, fixed structure of the data. This allows for a lot of flexibility, but at the cost of a defined structure which would allow access with a priori knowledge of the data. The data can also be stored in traditional object-relational databases like PostgreSQL with PostGIS, allowing more structured access.

OpenStreetMap implements several levels of access to this data, most prominently the OSM Application Programming Interface (API), an interface that over the years has grown to suit the needs of the communities. The OSM API is the server component to which REST requests are addressed. The REST requests take the form of HTTP GET, PUT, POST, and DELETE messages. Any payload is in XML form, using the MIME type "text/xml" and UTF-8 character encoding, and may be compressed on the HTTP layer if the client indicates through the HTTP "Accept-Encoding" header that it can handle compressed messages.
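
A RESTful read is therefore a plain HTTP GET. The sketch below requests a single node following the documented /api/0.6/ URL scheme; the node id is arbitrary and a deleted node would answer with the well defined status code 410 (Gone).

    import urllib.request

    url = "http://api.openstreetmap.org/api/0.6/node/1"   # arbitrary node id
    with urllib.request.urlopen(url) as response:
        print(response.headers.get("Content-Type"))       # text/xml
        print(response.read()[:200])                      # start of XML payload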

Although this API is not an international open standard itself, it does comply with several other standards, including a correct implementation of HTTP. It has a lot of traction due to the momentum of the project itself. The current (January 2011) stable version 0.6 has been in use for more than 20 months.
The OpenStreetMap project can be considered a spatial data infrastructure in a resource oriented architecture pattern. It is easy to include in other web applications, allows access by other software projects and makes use of hypermedia.

The most common representation of the geographical data contained in the OpenStreetMap database is through map images. Several cartographic layouts based on a variety of different rendering engines are available and maintained by specific domain groups (for example for hikers, bikers, street traffic, the environmentally interested, and many more). The most commonly used interface to this map data is the OpenStreetMap tiling system, which breaks down the world into a set of predefined tiles at predefined scale levels in a predefined coordinate system. OpenStreetMap data is based on the geographic coordinate system WGS84 (latitude/longitude, EPSG:4326). The corresponding open standard is the OGC Web Map Tile Service implementation standard.
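
Which tile covers a given coordinate can be computed directly. The sketch below uses the widely documented OpenStreetMap "slippy map" tile formula; the coordinates are those of Bonn.

    import math

    def tile_for(lat, lon, zoom):
        n = 2 ** zoom                     # number of tiles per axis
        x = int((lon + 180.0) / 360.0 * n)
        lat_rad = math.radians(lat)
        y = int((1.0 - math.log(math.tan(lat_rad)
                 + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
        return x, y

    print(tile_for(50.73, 7.10, 12))      # tile containing Bonn at zoom 12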

OSM data can also be downloaded as database dumps to create individual maps with specific content and cartographic layouts, again following an international OGC standard, Well Known Text (OGC WKT).

The LinkedGeoData project [LinkedGeoData 2011] regularly creates RDF dump files for download, probably making OpenStreetMap the most prominent candidate for a geographically enabled Semantic Web. But the sheer size of the data makes it very difficult to handle. RDF is not the right format for mass data processing. How the RDF data can be broken down to be of use in the Semantic Web still means a lot of work for researchers, architects and software developers.

5.2 Ordnance Survey OpenData

Ordnance Survey in Great Britain has a long history of collecting, maintaining and publishing maps. Recently Ordnance Survey has considerably enhanced access to online maps by publishing an API which allows access to the OpenData [Ordnance Survey 2011] tiling map scheme. The main difference to OSM is that access is only allowed to the map images, not the underlying data (similar to TomTom but without the possibility to post updates). The maps can be accessed through a web based framework by using a special API key. All maps are served from servers under the control of Ordnance Survey, in general free of cost.

In addition to providing access to map images, the OpenData initiative also publishes some administrative data in RDF format. This data can be used to easily link documents and data already on the Web with location information, and it gives access to the data, even though in a non GIS-typical format and without including the coordinates of the geographic objects.

There are several options for how to make use of this data. John Goodwin describes a simple case of linking tabular data through the postal code of addresses [Goodwin 2010]. In all cases it is required to find a unique identifier that can be used as linkage between the geography and the dataset. In the example provided by John Goodwin the data comes with addresses and post codes. As post codes are part of the OS OpenData model the data can be readily linked using the RDF datasets.
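
The linking step itself is mechanical once the shared identifier is found. The following hypothetical sketch joins a local table to postcode resources; the table content and the URI template are illustrative, not the actual Ordnance Survey scheme.

    records = [
        {"name": "Shop A", "postcode": "SO16 0AS"},
        {"name": "Shop B", "postcode": "OX1 2AY"},
    ]

    def postcode_uri(postcode):
        # illustrative URI template for a linked data postcode resource
        return "http://data.example.org/postcode/" + postcode.replace(" ", "")

    for record in records:
        record["postcode_uri"] = postcode_uri(record["postcode"])
        print(record["name"], "->", record["postcode_uri"])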

6 Conclusion and Outlook

The Internet provides a functional, highly scalable technological foundation for creating, publishing, maintaining and (to a certain degree) processing geographical data on the Web. Traditional GIS processing and associated work flows are still miles away from leveraging this potential. Interestingly, some OGC standards like WMS and WFS were implemented long before the Semantic Web or Resource Oriented Architecture concepts were laid out, but they already implement some of the Web paradigms presented in this article. It will be interesting to follow the development of the existing standards and the convergence of GML and KML on one hand and Atom, RSS and RDF on the other.

Now is the time to upgrade the existing OGC standards to be able to address these new challenges. This may also include a change in the self perception of the standardization community, which has grown into a highly expert domain and now has trouble integrating with the more general Web. To allow this integration the OGC community needs to embrace the communities living in the Web, which will also require structural changes, some of which are already under way.

OpenStreetMap on the other hand may want to grow its professional background to better interface with existing expert domains. This might include upgrading the OpenStreetMap API to an OGC standard so that it can eventually run through the ISO standardization process. This process must not kill innovation; this eminent danger must be taken seriously. On the positive side it will allow other structures, like public administrations who are bound by ISO, to leverage the power of OpenStreetMap.

Personally I do not foresee this happening anytime soon, but some crossover between communities already does take place, for example in the Open Source Geospatial Foundation (OSGeo). The OSGeo Public Geospatial Data Project lists initiatives, organizations and individuals interested in pursuing this broader and promising perspective [OSGeo 2011]. The Semantic Web will grow right midways between the data producers, the consumers, the crowds and the standards. As we typically belong to one or two of these groups but seldom to all at the same time, progress is hard to perceive for many.

7 Literature

Berners-Lee, Tim (2008): The Time for the Semantic Web is Now. URL: http://www.readwriteweb.com/archives/tbl_calls_for_semweb.php Last accessed on 2010-12-14

Berriman, Frances (2011): Artificial Intelligence. URL: http://fberriman.com/2010/06/16/science-hack-day-turing-tests-and-google/ Last accessed on 2010-12-10

Chomsky, Noam (1957): Syntactic Structures. The Hague: Mouton.

Christl, Arnulf (2010): The Hierarchy and the Graph. URL: http://arnulf.us/sevendipity/archives/35-The-Hierarchy-and-the-Graph.html Last accessed on 2010-11-20

ESDIN (2011): European Spatial Data Infrastructure Network; Support in Action for INSPIRE. URL: http://www.esdin.eu Last accessed on 2011-01-17

Goodchild, Michael F. (2008): Citizens as Sensors: The World of Volunteered Geography. URL: http://www.ncgia.ucsb.edu/projects/vgi/docs/position/Goodchild_VGI2007.pdf Last accessed on 2009-03-21

Goodwin, John (2010): So what can I do with the new Ordnance Survey linked data. URL: http://johngoodwin225.wordpress.com/2010/10/25/so-what-can-i-do-with-the-new-ordnance-survey-linked-data/ Last accessed on 2011-01-17

Gruber, Tom (2009): Ontology. In: The Encyclopedia of Database Systems, Ling Liu and M. Tamer Özsu (eds.), Springer-Verlag.

Hitzler, Pascal; Krötzsch, Markus; Rudolph, Sebastian (2009): Foundations of Semantic Web Technologies. Chapman & Hall/CRC

Iana (2011): Root Zone Database. URL: http://www.iana.org/domains/root/db/ Last accessed on 2011-01-11

INSPIRE (2011): Infrastructure for Spatial Information in the European Community. URL: http://inspire.jrc.ec.europa.eu/ Last accessed on 2011-01-24

Levinson, Stephen C. (2000): Pragmatics. Cambridge Press: Cambridge

LinkedGeoData (2011): LinkedGeoData Data Set. URL: http://linkedgeodata.org/Datasets Last accessed on 2011-01-24

OGC (2011): Web Map Standard. URL: http://www.opengeospatial.org/standards/wms Last accessed on 2011-01-24

OpenStreetMap (2011): OpenStreetMap: Free Maps for the World. URL: http://www.openstreetmap.org Last accessed on 2011-01-21

Ordnance Survey (2011): OS OpenData. URL: http://www.ordnancesurvey.co.uk/oswebsite/opendata/ Last accessed on 2011-01-11

OSGeo (2011): Public Geospatial Data Project. URL: http://wiki.osgeo.org/wiki/Public_Geospatial_Data_Project Last accessed on 2011-01-24

Richardson, Leonard; Ruby, Sam (2007): RESTful Web Services. O'Reilly Media, Inc: USA