An Integrative Approach for Geospatial Data Fusing





Silvija Stankutė, Hartmut Asche

University of Potsdam, Department of Geography, Karl-Liebknecht-Strasse 24/25, 14476 Potsdam, Germany, silvija.stankute@uni-potsdam.de



Abstract

Geodata are used in many branches of industry and academia, e.g. landscape planning, environmental protection, traffic, urbanism, economy and tourism. Depending on the objectives set and the problems to be solved, geodata must fulfil certain requirements in order to be applicable, e.g. concerning geometry and/or semantics. Problems often arise in finding suitable datasets. The geometrical demands are often fulfilled. However, either the semantic requirements are not fulfilled, or the chosen geo dataset exhibits similar geometrical properties but differently distinctive semantics. At this point it makes sense to combine the available geodata in such a manner that the resulting dataset fulfils the geometric as well as the semantic demands of the desired application. This work introduces algorithms for joining linear structures. The method is based on the comparison of coordinates: the coordinates are searched, compared and assigned, and the thematic information is then transferred. The developed DataMerge algorithms allow an automatic merging of user-selected attributes. The algorithms were implemented in the PERL scripting language and are thus not bound to any specific GIS software.


1. Introduction


The integration or merging of heterogeneous data is necessary and/or helpful for a variety of geo applications. Many GIS datasets are available for a wide spectrum of such applications. Nevertheless, they cannot cover all application fields, especially newly emerging applications with new requirements on datasets. Generating GIS datasets is very cost and time intensive, so one endeavours to use already available datasets and to extend them in order to meet the needs of a certain application. These datasets can have suitable and non-suitable features for a new application (see the geometrical features in fig. 1). A combination of two or more different datasets is desirable, taking only those dataset features which are suitable for the application. This way one generates a customized dataset meeting the geometric and semantic requirements. This process can be done by manually merging the input datasets, which is very time consuming and often error-prone. Therefore an automatic extension of one dataset with geometric and semantic features of other datasets is preferable. In the present work the datasets of the following sources were examined:




NAVTEQ - Navigation Technologies

ATKIS - Official Topographic-Cartographic Information System

SIB - Street Information Database




Using these databases, the development of an automatic procedure for the integration of spatial data from different GIS datasets is shown and discussed. The goal of the introduced integration algorithms is to increase the semantic accuracy of a target geospatial dataset. Merging data from different dataset sources is not trivial, as shown in the literature by many groups (Butenuth, 2007; Sester, 2007; Xiong, 2000; Walter, 1999).




Figure 1. Geometrical differences of geospatial datasets used in this work: blue line - NAVTEQ, red line - ATKIS, green line - SIB


Therefore the first step was an in-depth comparison of the geodata used, in order to identify the differences in structure and data consistency that are important for the presented algorithms. After the main workflow has been introduced, the integration procedure is shown and the results are discussed.


2. Input geo datasets

Any spatial object contains geometric and semantic information. Geometric information is described by geometrical parameters (coordinates, spatial extension, length, area, form etc.) and topology. All structures in vector data are based on points, which alone carry all geometrical information; lines and polygons are complex structures built from them. The spatial reference of vector data is given directly by their coordinates. With respect to topology, only one point or node with unique coordinates may occur at the same location. This is important for model consistency. Geometrical primitives in vector data have to fulfil certain consistency conditions, i.e. uniqueness and completeness. Object uniqueness means that for an object in the real world only one object may be stored in a dataset. Completeness requires that all known relations are included in a dataset explicitly. In a topological model the nodes are interrelated by edges (Bartelme, 2005).

The geometrical consistency conditions imply that an edge is defined by a starting and an end point, without any branchings or breaks between these points. A line object from the real world corresponds to exactly one edge in a dataset. Fig. 2 shows an example of an inconsistent vector model.



Figure 2. Edge redundancy leads to inconsistency of datasets


Two objects, "forest" and "field", adjoin at the same location a. During data acquisition it is important that these objects are not described multiple times. This means that the nodes B and C must interrupt the edge AD. Three edges (AB, BC, CD) instead of two edges (AD, BC) are generated this way. This guarantees a consistent and redundancy-free dataset and fulfils the uniqueness condition.
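The interruption of edge AD at the nodes B and C can be sketched in a few lines. This is only an illustrative Python sketch, not the authors' PERL code; the function name `split_edge` and the coordinates are hypothetical, and the nodes are assumed to lie on the straight edge.

```python
def split_edge(start, end, nodes):
    """Split the straight edge start-end at the given nodes lying on it.

    Returns consecutive edges as (point, point) tuples, ordered from start to end.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Order the interior nodes by their projection onto the edge direction.
    ordered = sorted(
        nodes,
        key=lambda p: (p[0] - start[0]) * dx + (p[1] - start[1]) * dy,
    )
    chain = [start, *ordered, end]
    return list(zip(chain[:-1], chain[1:]))


# Hypothetical coordinates for the nodes A, B, C and D of the example.
A, B, C, D = (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)
edges = split_edge(A, D, [C, B])  # edges AB, BC and CD
```

The shared section BC is then stored only once and can be referenced by both adjoining objects.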


Comparison of the vector data

After the short introduction to the requirements on the vector model structure, the geodata used in this work are compared in this section. For the integration approach, a comparison of the datasets concerning their geometrical structure is much more important than a comparison of the semantic structure and the geometrical accuracy (absolute and/or relative), because the approach is based on a direct comparison of geometrical structures (see section 3). The following questions and problems need clarification:



is a direction defined, and does it correspond to the real world?

format of the saved data;

consistency of the datasets;

existence of intermediate points on edges;

geometrical integrity of the datasets.

The SIB and ATKIS geodata are administrated by state departments, while the NAVTEQ datasets are managed and distributed commercially. Their generation is based on partly different concepts, and they are administrated differently. For this reason the examined datasets are not equally suitable for direct use by the algorithms developed here.

Fig. 3 shows the geometrical structures of the datasets used. The edge directions differ and the geometrical reality is modelled in different ways. Therefore all datasets had to be partially preprocessed using external GIS freeware (OpenJump) and/or subroutines programmed in PERL (Practical Extraction and Report Language). The next step was to apply the correction process in order to satisfy the above conditions. The developed approach is presented in section 3.

The final dataset resulting from the data fusion should be consistent and free of redundancy. The edges may contain only linear segments and may not exhibit intermediate points. The NAVTEQ dataset fulfils these conditions. Edges in the SIB and ATKIS datasets contain intermediate points. Thus a conversion to edges without intermediate points must be performed.




Figure 3. Overview of the basic geometrical structure of the geodata used




Figure 4. Problems in datasets: a) SIB data inconsistency; b) ATKIS direction error


Fig. 4a also shows the problem of data inconsistency, i.e. "breaks" in contiguous branches, which had to be fixed manually. The small dimension of the break suggests that it arose during data acquisition. It is very hard to program subroutines that detect such errors. The ATKIS data used reveal another consistency problem. While the direction of edges in the SIB dataset corresponds to reality (compared with aerial images), the direction of ATKIS edges does not (fig. 4b). However, it is necessary that the direction of all lanes is modelled as it is in reality.




Figure 5. Geometrical redundancy in the SIB dataset


The data fusion procedure developed here is not applicable to inconsistent datasets. However, geometrical data redundancy occurs in both datasets (ATKIS and SIB). Thus, multiple edges representing the same real object have to be deleted. Thematic information could be lost in the process. In this case it must be guaranteed that this thematic information is inserted as additional attributes of the corresponding edges in the final dataset. The geodata must be unique concerning the geometry information, otherwise the corresponding edges of two datasets cannot be uniquely assigned. Fig. 5 shows an example of data redundancy in the SIB dataset. The marked segment (fig. 5, left-hand side) contains 13 lines in the dataset, which correspond to 13 edges including their attributes (fig. 5, right-hand side).


3. Methodology

DataMerge is based on the direct comparison of coordinates. Therefore all spatial data used must have the same coordinate system. Each dataset is split into multiple segments, i.e. branches. These branches are assigned to each other. Then semantic information can be transferred from one dataset to another.

For the fusion/extension of two datasets it is absolutely necessary to define a relation between the edges of both datasets. In this case the edge geometry serves as the relation. Thus, a common coordinate system is one of the most important requirements for the data fusion. Furthermore, other assumptions have to be fulfilled before applying the DataMerge approach. The spatial data must:



have the same coordinate system,

represent the same investigation area,

be "piecewise" parallel to each other,

use linear primitives.

Any geo object contains geometrical and semantic information. A spatial object is described by one or several edges without intermediate points, together with its attributes. Therefore, every line in the datasets used, which are stored in ASCII format, corresponds to an edge. Every such line stores information about the semantics and geometry of an edge. Fig. 6 shows the processing order of the DataMerge algorithms. They are explained in detail in the following subsections.


Decomposition of edges in segments without intermediate points

The geodata are converted into a directly readable format (ASCII) for their subsequent processing. To unify the datasets, all intermediate points of each edge were removed, so that every edge contains a starting and an end node only. For this purpose the polylines available in every dataset were split up. A polyline edge with attributes A having n intermediate points is split up into n+1 segments without intermediate points. Every segment is assigned a copy of the original attributes A.
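The decomposition rule (n intermediate points give n+1 segments, each with a copy of the attributes A) can be illustrated as follows. This is a minimal Python sketch rather than the PERL implementation described in the paper; the function name, the dictionary layout and the attribute `road_class` are assumptions made for the example.

```python
def decompose_polyline(points, attributes):
    """Split a polyline into two-point segments, copying the attributes to each."""
    if len(points) < 2:
        raise ValueError("a polyline needs at least a start and an end point")
    return [
        # dict(attributes) makes an independent copy for every segment.
        {"start": p, "end": q, "attributes": dict(attributes)}
        for p, q in zip(points[:-1], points[1:])
    ]


# Polyline with n = 2 intermediate points, (1, 1) and (2, 1).
polyline = [(0, 0), (1, 1), (2, 1), (3, 0)]
segments = decompose_polyline(polyline, {"road_class": "motorway"})
```

Copying the attribute record for each segment means that a later change to one segment cannot silently alter its neighbours.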





Figure 6. DataMerge workflow


Preparing the ATKIS data set

As seen in section 2, the ATKIS dataset used contains some geometry redundancies. In order to fulfil the requirement of redundancy-free data defined above, redundant data entries have to be removed. This is done by performing a search for identical edges. The objects in the ATKIS dataset used are classified by object types. This property is used to simplify the dataset by splitting it into M sub-records, each containing only one object type, where M is the number of object types defined in the main ATKIS dataset.
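A possible shape of this preparation step is sketched below in Python. It is an illustration under stated assumptions, not the authors' code: edges are assumed to be dictionaries with the hypothetical keys `start`, `end`, `object_type` and `attributes`, and two edges count as identical only if their start and end coordinates match exactly. Attributes of removed duplicates are kept as additional attributes, as required in section 2.

```python
from collections import defaultdict


def split_by_object_type(edges):
    """Group edges into sub-records, one per object type."""
    groups = defaultdict(list)
    for edge in edges:
        groups[edge["object_type"]].append(edge)
    return dict(groups)


def remove_identical_edges(edges):
    """Keep one edge per geometry and collect the attributes of the duplicates."""
    unique = {}
    for edge in edges:
        key = (edge["start"], edge["end"])
        if key in unique:
            # Duplicate geometry: keep its thematic information.
            unique[key]["extra_attributes"].append(edge["attributes"])
        else:
            unique[key] = {**edge, "extra_attributes": []}
    return list(unique.values())


edges = [
    {"start": (0, 0), "end": (1, 0), "object_type": "road", "attributes": {"name": "A1"}},
    {"start": (0, 0), "end": (1, 0), "object_type": "road", "attributes": {"lanes": 2}},
    {"start": (1, 0), "end": (2, 0), "object_type": "path", "attributes": {}},
]
groups = split_by_object_type(edges)  # M = 2 sub-records
cleaned = {kind: remove_identical_edges(group) for kind, group in groups.items()}
```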


SIB lane axis generation

Unlike the ATKIS or NAVTEQ datasets, the Street Information Database contains no direct information about the geometry of the road axis, but only information about the median axis. The information about the road axis is stored indirectly in the attribute table as the roadway width, which gives the distance to the median axis. However, the description of the road axis geometry is explicitly needed for further processing. An algorithm was developed in order to generate this geometry from the existing SIB data.
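The paper does not spell out the axis generation algorithm. One plausible building block is a parallel shift of each median-axis segment by the stored distance, sketched here in Python. The function name, the sign convention (positive distance shifts to the right of the segment direction) and the sample width of 3.5 are assumptions for illustration only.

```python
import math


def offset_segment(start, end, distance):
    """Shift a segment parallel to itself.

    A positive distance shifts to the right of the direction start -> end.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("degenerate segment")
    # Unit normal pointing to the right of the segment direction.
    nx, ny = dy / length, -dx / length
    return (
        (start[0] + distance * nx, start[1] + distance * ny),
        (end[0] + distance * nx, end[1] + distance * ny),
    )


# Median-axis segment running east; the generated axis lies 3.5 units to its right.
road_axis = offset_segment((0.0, 0.0), (10.0, 0.0), 3.5)
```

For a full polyline the shifted segments would additionally have to be joined at the vertices, which this sketch leaves out.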


Extraction of sub branches

The data fusion algorithm is based on the search for corresponding geometries in two or more different datasets. The smaller the sub-datasets to be compared, the less effort this search requires. For this reason the geo datasets used are disassembled into different branches with unique starting (SP) and end points (EP). Decomposition of a dataset results in B individual branches, where B is the count of starting or end points, respectively. In general, starting and end points can have global, crossroad, fusion or junction character. Table 1 illustrates these different cases. Using the criteria defined in table 1, each dataset branch was decomposed into its sub branches.







Table 1. Branch intersection types (columns: Starting points, End points, Type, Sketch; row entries: Slip roads, Fusion, Junction, Crossroads, Crossroad)


All characteristic points (global SP/EP, junction, fusion and crossroad points, locations with a direction mistake) were detected by determining the counts $N_{sp}$ and $N_{ep}$ of every starting and end point of each segment, respectively. A point with $N_{sp}=1$ or $N_{ep}=1$ is a global starting or end point, respectively. For $N_{sp}=1$ and $N_{ep}=2$ a fusion point is given, while a junction point results from $N_{sp}=2$ and $N_{ep}=1$. A crossroad is given for $N_{sp}=N_{ep}=2$. Locations exhibiting two SP or two EP represent a direction error (see segments 1 and 10 in fig. 7b).
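The counting rule above can be sketched in Python as follows. This is an illustration, not the authors' implementation. It assumes that a global starting point has the counts (N_sp, N_ep) = (1, 0), that a global end point has (0, 1), and that a direction error is a point with two starts and no end or two ends and no start; every other combination is labelled "ordinary". Points are any hashable values, here simple labels.

```python
from collections import Counter

# (N_sp, N_ep) -> point type; the global and error cases are an interpretation.
POINT_TYPES = {
    (1, 0): "global start",
    (0, 1): "global end",
    (1, 2): "fusion",
    (2, 1): "junction",
    (2, 2): "crossroad",
    (2, 0): "direction error",
    (0, 2): "direction error",
}


def classify_points(segments):
    """Classify every point by how often it starts and ends a segment."""
    n_sp = Counter(start for start, _ in segments)
    n_ep = Counter(end for _, end in segments)
    return {
        point: POINT_TYPES.get((n_sp[point], n_ep[point]), "ordinary")
        for point in set(n_sp) | set(n_ep)
    }


# Two roads merge at c; the segment e -> d is digitised against the flow.
segments = [("a", "c"), ("b", "c"), ("c", "d"), ("e", "d")]
point_types = classify_points(segments)
```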



Figure 7. Extraction of particular segments


The characteristic points of the considered dataset are assumed to be starting and/or end points of the available segments in the dataset. Starting from an SP, a path following is carried out, searching for the next corresponding edge. This process is interrupted at a complementary characteristic point, resulting in an extracted sub branch. Decomposition of a dataset results in $B_S$ separate sub branches, where $B_S$ is the count of starting points (SP).
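The path following can be pictured with the Python sketch below. It is a simplified illustration and not the paper's algorithm in detail: segments are directed (start, end) pairs, the set of characteristic points is assumed to be known already, and the walk stops at a characteristic point, a dead end, an ambiguous continuation or a loop. All names are hypothetical.

```python
def extract_sub_branches(segments, characteristic_points):
    """Follow edges from each characteristic point to the next one."""
    outgoing = {}
    for start, end in segments:
        outgoing.setdefault(start, []).append((start, end))

    branches = []
    for point in sorted(characteristic_points):
        for first_edge in outgoing.get(point, []):
            branch = [first_edge]
            current = first_edge[1]
            while current not in characteristic_points:
                next_edges = outgoing.get(current, [])
                if len(next_edges) != 1 or next_edges[0] in branch:
                    break  # dead end, ambiguous continuation or loop
                branch.append(next_edges[0])
                current = next_edges[0][1]
            branches.append(branch)
    return branches


segments = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
sub_branches = extract_sub_branches(segments, {"a", "b", "c", "e"})
```

In this example three sub branches result, one for each point at which a sub branch starts (a, b and c).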

An example is shown in fig. 5. On the left-hand side a motorway detail with certain driving directions is shown. The right-hand side shows the results of the branch decomposition. Nine particular sub branches with unique starting and end points were generated this way.






Union of the sub branches

The assignment of branches from different datasets is simpler and more reliable when fewer branches have to be compared. Therefore, the number of branches is minimized by unifying interrelated sub branches. The union of contiguous sub branches consists in searching for sub branch interfaces at which a starting and an end point match and fulfil the condition $\alpha < \alpha_{max}$, where $\alpha$ is the contact angle (Stankute, 2007). All sub branches that fulfil this condition are combined into contiguous branches.
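The exact definition of the contact angle is given in Stankute (2007) and is not reproduced here. The Python sketch below assumes, for illustration only, that it is the deflection angle between the last edge of one sub branch and the first edge of the next; the threshold of 20 degrees is a hypothetical value.

```python
import math


def contact_angle(branch_a, branch_b):
    """Deflection angle in degrees between the last edge of branch_a
    and the first edge of branch_b."""
    (p0, p1), (q0, q1) = branch_a[-1], branch_b[0]
    ux, uy = p1[0] - p0[0], p1[1] - p0[1]
    vx, vy = q1[0] - q0[0], q1[1] - q0[1]
    # atan2 of cross and dot product gives the signed angle between u and v.
    return abs(math.degrees(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)))


def can_unite(branch_a, branch_b, alpha_max=20.0):
    """True if branch_b starts where branch_a ends and the angle is below alpha_max."""
    touching = branch_a[-1][1] == branch_b[0][0]
    return touching and contact_angle(branch_a, branch_b) < alpha_max


straight = [((0, 0), (1, 0))]
onward = [((1, 0), (2, 0))]   # continues in the same direction
turn = [((1, 0), (1, 1))]     # turns off at a right angle
```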


Branch assignment between different data sets

After the preprocessing step each dataset used is stored in several files. Each file contains exactly one contiguous branch with a unique global starting and end point. Before merging the datasets (attributes and geometry), the particular branches from different datasets must be assigned to each other. The merging process of different datasets becomes simpler and more reliable this way, because only two branches that correspond in reality are compared. Due to the dataset structures the applied assignment has a 1-to-N character.

During the automatic assignment of branches from different datasets, only two datasets are considered at a time. Taking the first branch of the first dataset, all matching branches are searched for in the second dataset. This process is done for all branches of the first dataset.

In the first step the distance $d_s$ between the starting points of two branches from two different datasets is checked for the condition $d_s > d_{s,max}$. In this case, the following conditions are also checked:

$d_e < d_{e,max}$ (distance between the end points of two branches),

$d_{se} < d_{se,max}$ (distance between starting and end point of two branches) and

$d_{es} < d_{es,max}$ (distance between end point and starting point of two branches).

This algorithm is used to check whether two branches are close enough to each other. However, this is not sufficient for an unambiguous assignment. Additionally, it is necessary to check whether the two considered branches have the same direction. If so, the branch assignment is completed. This process is performed for all available branches.
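The quantities used in these checks can be computed as in the Python sketch below. It is illustrative only: a branch is reduced to its (start point, end point) pair, the function names are hypothetical, and the direction test (positive dot product of the start-to-end vectors) is an assumed criterion, since the paper does not state how direction is compared. How the four distances are combined with their thresholds is left to the caller.

```python
import math


def endpoint_distances(branch_1, branch_2):
    """Return the four endpoint distances between two branches."""
    (s1, e1), (s2, e2) = branch_1, branch_2
    return {
        "d_s": math.dist(s1, s2),    # start point to start point
        "d_e": math.dist(e1, e2),    # end point to end point
        "d_se": math.dist(s1, e2),   # start of first to end of second
        "d_es": math.dist(e1, s2),   # end of first to start of second
    }


def same_direction(branch_1, branch_2):
    """True if the start-to-end vectors enclose an angle below 90 degrees."""
    (s1, e1), (s2, e2) = branch_1, branch_2
    dot = (e1[0] - s1[0]) * (e2[0] - s2[0]) + (e1[1] - s1[1]) * (e2[1] - s2[1])
    return dot > 0


first = ((0.0, 0.0), (10.0, 0.0))
second = ((0.0, 3.0), (10.0, 4.0))
distances = endpoint_distances(first, second)
```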


Integration of the geodata sets

After successful branch assignment a fusion of the data is carried out. The dataset containing the basic geometry and thematic information that is to be extended is called the target dataset. The dataset from which the data are to be read is called the source dataset (cf. fig. 8). Starting from the SP of a branch from the source data, the corresponding starting point of each edge from the target dataset is searched for according to certain criteria. The search conditions are listed in tab. 2. The right column illustrates the corresponding search condition.




Table 2. Conditions used for searching the corresponding edges






Figure 8. Search and transfer of points


As tab. 2 shows, the goal of the algorithm is to find the corresponding target dataset edge, which gets a new starting point taken from a source dataset edge (Wolff & Stankute, 2008). This search leads to a comparison of the coordinate components. Fig. 8 illustrates the geometry transfer process. The datasets contain only linear structures. Thus, each edge corresponds to a straight line $g$. The intersection point of the straight lines $g_h$ and $g_t$ is inserted into the current edge ($g_t$) of the branch from the target dataset. The end point of the current source dataset edge ($g_s$) corresponds to the insertion of the starting point of the next edge from the source dataset. Therefore only starting points have to be transferred. The attributes of the current source edge ($g_s$) are transferred onto all target edges between the two inserted points. This process is repeated for all edges of all branches from the source dataset. Fig. 8 shows the results of the fusion of two branches (the lower branch contains the newly added points).
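The intersection of two straight lines, which yields the point inserted into the target edge, can be computed as in the Python sketch below. The construction of $g_h$ is not described in the text; purely for illustration, it is assumed here to be the line through the source starting point perpendicular to the target edge. The function name and the coordinates are hypothetical.

```python
def line_intersection(p1, p2, q1, q2):
    """Intersection of the infinite lines p1-p2 and q1-q2, or None if parallel."""
    d1x, d1y = p2[0] - p1[0], p2[1] - p1[1]
    d2x, d2y = q2[0] - q1[0], q2[1] - q1[1]
    det = d1x * d2y - d1y * d2x
    if det == 0:
        return None
    # Parameter t along p1-p2 at which both lines meet.
    t = ((q1[0] - p1[0]) * d2y - (q1[1] - p1[1]) * d2x) / det
    return (p1[0] + t * d1x, p1[1] + t * d1y)


# Target edge g_t and the starting point of a source edge g_s.
target_start, target_end = (0.0, 0.0), (10.0, 0.0)
source_start = (4.0, 2.0)

# Assumed helper line g_h: through source_start, perpendicular to g_t.
tx, ty = target_end[0] - target_start[0], target_end[1] - target_start[1]
helper_point = (source_start[0] - ty, source_start[1] + tx)

new_point = line_intersection(target_start, target_end, source_start, helper_point)
# The target edge is split at new_point into two edges.
split_target = [(target_start, new_point), (new_point, target_end)]
```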


4. Results

The first reliability and quality analysis was performed using datasets of a test area defined around southern Berlin. The developed algorithms for merging geodata were tested with multiple attributes of three geo datasets (ATKIS, NAVTEQ and SIB). DataMerge was used to transfer information (bridges and traffic signs) from the SIB dataset to the NAVTEQ and ATKIS datasets, respectively. The test area contained 36 traffic signs and 9 bridges. While 100 % of the bridges were transferred correctly, DataMerge was able to correctly transfer 86.1 % of the traffic signs to the target dataset (NAVTEQ); 5.5 % were transferred incorrectly and 13.8 % were not transferred. Similar results were obtained using the ATKIS dataset as the target dataset. Unfortunately, fig. 9 does not cover all search cases, which is the main reason for the information (traffic signs) that was not transferred. There is a region in the search space where the corresponding edge cannot be found during data integration. Fig. 9 illustrates this problem. Points located in the grey marked area cannot be transferred, because the corresponding edge cannot be found.


Summary and concluding remarks

The developed algorithms are implemented in the scripting language PERL. They can be used on the command line. In addition, a user-friendly graphical user interface allows simple application and control of the DataMerge command line tools. The developed DataMerge algorithms allow an automatic merging of user-selected attributes. The main focus is on data preprocessing (decomposition of polyline-based datasets, extraction of branches) in the first step and on merging of geometrical and semantic information in the second step (branch assignment, integration of geodata). Many data processing problems resulted from the inconsistency (wrong edge direction, topological inaccuracy) of the datasets used. Moreover, the datasets used are based on different ways of modelling reality. This made the comparison of the datasets non-trivial and required considerable effort in the preprocessing step in order to fulfil the criteria required by the DataMerge algorithms.





Figure 9. Restricted search space


A first reliability and quality analysis of the implemented DataMerge algorithms has shown rather good results. About 90 % of the desired information could be transferred. However, there are still restrictions in the presented algorithms (e.g. the restricted search space). Therefore, our future work will focus on improving the presented approach for merging different geo datasets, i.e. increasing the quality and accuracy of the transferred geo information and covering the whole search space. An additional goal will be to extend the approach to more general datasets, including polygon-structured geodata.


References

Bartelme, N., 2005. Geoinformatik: Modelle, Strukturen, Funktionen. Springer, Heidelberg.

Butenuth, M., ed. 2007. Integration of heterogeneous geospatial data in a federated database. ISPRS Journal of Photogrammetry & Remote Sensing, Volume 62, pp. 328-346.

Sester, M. et al., 2007. Semantische und Geometrische Integration von Geodaten. In: Kartographie als Baustein moderner Kommunikation, Symposium 2007 in Königslutter am Elm, Kartographische Schriften, Band 14, pp. 53-58.

Stankute, S., 2007. Entwicklung und Implementierung von Algorithmen für ein automatisiertes Verfahren zur Zusammenführung von Parametern aus Geodatenbanken unter besonderer Beachtung von Unschärfen in der Georeferenzierung. Unpublished master thesis (Universität Potsdam/DLR-Verkehrsstudien).

Walter, V., Fritsch, D., 1999. Matching spatial data sets: a statistical approach. International Journal of Geographical Information Science, Nr. 13, pp. 445-473.

Wolff, M., Stankute, S., Asche, H. & Zenner, C., 2008. Erzeugung von GIS-Datenbeständen für themen-kartographische Anwendungen mittels Datenfusion. In: Strobl, J., Blaschke, T. & Griesebner, G., ed. 2008. Angewandte Geoinformatik 2008. Beiträge zum 20. AGIT-Symposium, Salzburg, pp. 83.

Xiong, D., 2000. A three-stage computational approach to network matching. Transportation Research, part C 8 (1-6), pp. 71-89.