SPIN!, IST-99-10536, 15.06.1999


Part B

B1. Title. Spatial Mining for Data of Public Interest






SPIN!










Proposal No. IST-1999-10536














Proposal for:

IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical methods



B3. OBJECTIVES
B4. CONTRIBUTION TO PROGRAMME/KEY ACTION OBJECTIVES
B5. INNOVATIONS
    State of the Art
    Technological & Scientific Advances
    Distribution of Workload on Work Packages
    Introduction to Workpackages
    Risk Management
    PERT Diagram
    Work Package Description
C2. CONTENTS FOR PART C
C3. COMMUNITY ADDED VALUE AND CONTRIBUTION TO EU POLICIES
C4. CONTRIBUTION TO COMMUNITY SOCIAL OBJECTIVES
C5. PROJECT MANAGEMENT
C6. DESCRIPTION OF THE CONSORTIUM
C7. DESCRIPTION OF THE PARTICIPANTS
    GMD - German National Research Center for Information Technology
    Department of Informatics of the University of Bari
    School of Geography at the University of Leeds
    The Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS)
    Dialogis Software & Services GmbH, St. Augustin, Germany
    Professional GEO Systems B.V. (PGS), Amsterdam
    GeoForschungsZentrum, Potsdam, Germany: Description of the Partner
    Manchester Metropolitan University/MIMAS
C8. ECONOMIC DEVELOPMENT AND SCIENTIFIC AND TECHNOLOGICAL PROSPECTS
APPENDIX: PUBLICATIONS OF PARTNERS CITED IN PART B
    References Partner P1 - GMD
    References Partner P2 - University of Bari
    References Partner P3 - IITP, Russian Academy of Sciences
    References Partner P4 - Leeds
    References Partner P5 - Dialogis
    References Partner P6 - PGS


B3. Objectives


To develop an integrated interactive internet-enabled spatial data mining system. Data mining systems (DMS) and geographical information systems (GIS) are complementary tools for describing, transforming, analysing and modelling data about real world systems. Most contemporary GIS facilitate only very basic spatial analysis and data mining functionality, and many are confined to simplistic analysis that involves comparing maps or descriptive statistical displays like histograms and pie charts. There is growing demand for integrated geographical or spatial data mining systems (SDMS) from public and private sector organisations who need both enhanced decision making capabilities and innovative solutions to a wide range of different problems. An integrated, user-friendly SDMS operable over the internet offers exciting new possibilities for all manner of geographical research and spatial decision making. Thus the overall objective of SPIN! is to develop a state of the art, fully functional, truly integrated, internet-enabled, easily extendable and modifiable GIS-DMS platform, SPIN: a comprehensive and intuitive SDMS for data of public interest. In recent years, a number of project partners have developed the technological components and scientific tools that are needed to develop the kernel of this type of SDMS. During this project these individual efforts and the associated expertise and experience will be united in a joint European effort. SPIN! Consortium partners from statistical offices and seismic research centres will use the system in applied research and provide feedback to direct the development efforts. The applications of SPIN will clearly demonstrate the generic utility and additional benefits that this type of SDMS will have over existing technologies. Industrial partners will develop a business model for web-based information brokering with georeferenced statistical data, and estimate the likely economic impacts of the technology. The following scenarios describe some of the wide ranging potential benefits that statistical analysts, environmental decision makers, seismic data experts, biodiversity researchers and other public and private sector users can expect from such a system and introduce some of the main features that SPIN will include.


To improve knowledge discovery by providing an enhanced capability to visualise data mining results in spatial, temporal and attribute dimensions. Imagine a statistical officer has to prepare a report describing unusual aspects of African demography inter-related with socio-economics and the physical environment. Suppose initially the officer applies a data mining technique to classify all countries based on death rate and life expectancy, and one classified subgroup with unusually high death rate and low life expectancy contains only 51 countries in all, 40 of which are African. Suppose the officer creates a statistical display of all the classified groups (Fig. 1) and then decides to map the geographical distribution of the unusual subgroup, distinguishing between African countries and those elsewhere (Fig. 2). The geographical distribution of the subgroups shown by the map may initiate ideas for further analysis. For instance, the analyst may wish to select sets of countries from the map to take a closer look at their demography and other geographical variables that describe socio-economic and environmental conditions. In addition, the officer may wish to discover what demographic attributes best characterise each continent at different points in time and investigate which groups of demographic attributes have interesting spatio-temporal co-distributions and inter-relationships with other socio-economic and environmental variables. All of this analysis, some of which is quite complex, could clearly be performed more quickly and easily if an integrated SDMS with a linked display component and reporting system were available. It would be a major benefit if the maps and other data displays were automatically generated by a knowledge base of statistical display and thematic data mapping, and if these displays were automatically linked so that the information the officer is focussing on during the analysis is simultaneously highlighted in all the relevant displays. This type of linked GIS-style display will be developed as a fundamental part of the integrated visualisation component of SPIN, which would facilitate this kind of statistical analysis (see partner P1, publication 3).
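The linking of displays described above can be illustrated with a minimal sketch, assuming a shared selection object that notifies every registered display when the analyst selects a set of countries. The class names SelectionModel and Display and the country identifiers are hypothetical and chosen only for illustration; they are not part of the SPIN design.

```python
class SelectionModel:
    """Shared selection that all linked displays observe (hypothetical name)."""

    def __init__(self):
        self.selected = set()
        self._listeners = []

    def subscribe(self, listener):
        # listener is any callable that accepts the new selection set
        self._listeners.append(listener)

    def select(self, ids):
        # Store the selection and notify every linked display
        self.selected = set(ids)
        for listener in self._listeners:
            listener(self.selected)


class Display:
    """Stand-in for a map, histogram or table view (hypothetical name)."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = set()
        model.subscribe(self._on_selection)

    def _on_selection(self, selected):
        # A real display would redraw; here we only record the highlight
        self.highlighted = set(selected)


model = SelectionModel()
map_view = Display("map", model)
chart_view = Display("histogram", model)

# Selecting countries once highlights them in every linked display
model.select({"country_a", "country_b"})
```

Keeping the selection in one shared object, rather than in each display, means that a new display type can be linked by subscribing to the model without changing the existing displays.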



Figure 1. Descriptions of interesting subgroups.

Figure 2. Visualisation of the subgroup.

To develop new and integrated ways of revealing complex patterns in spatio-temporally referenced data that were previously undiscovered using existing methods. Suppose an environmental decision maker is asked to look for relations between lung cancer and environmental pollution. What may be desired initially is some kind of exploratory spatial data analysis (ESDA) technique that automatically detects unusual spatial clustering of lung cancer incidence in the entire data set and for specific time periods. Additional spatial and aspatial analysis methods might then be used to try to explain any unusual spatial clustering patterns observed, using a range of other spatio-temporal and aspatio-temporal variables. In SPIN, exploratory spatio-temporal pattern analysis techniques derived from existing ESDA tools will be integrated with a wide variety of temporal, spatial and aspatial analysis methods. Partner P4 has developed a suite of ESDA tools that detect unusual clusters of incidence and produce mappable output that reveals the clustering pattern. Temporal versions of these tools and outputs will be developed, along with the mechanisms for exporting the results of the analysis into other temporal, spatial and aspatial data mining techniques. Having all the tools available in one integrated SDMS would allow the decision maker to perform an in-depth spatio-temporal analysis quickly and thereby help develop understanding of the geographical processes and inter-relationships that may result in an increased risk of contracting lung cancer. The analytical speed-up will allow the decision maker to generate and test more hypotheses regarding the observed spatial, temporal and spatio-temporal patterns and to investigate even more advanced hypotheses about causal relationships.
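To make the idea of automatically detecting unusual clusters of incidence concrete, the following is a minimal sketch of a generic brute-force circle scan. It is an assumption made for illustration, not a description of partner P4's tools: circles of fixed radius are placed on a regular grid, and a circle is flagged when its observed case count is improbably high under a Poisson model with the overall incidence rate. The function names, the data layout and the significance level are hypothetical, and no correction for multiple testing is attempted.

```python
import math


def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    cdf = sum(math.exp(-expected) * expected ** k / math.factorial(k)
              for k in range(observed))
    return max(0.0, 1.0 - cdf)


def scan_clusters(areas, radius, step, alpha=0.01):
    """Flag circles with unusually many cases.

    areas: list of (x, y, cases, population) tuples (hypothetical layout).
    Returns a list of (x, y, cases, expected, p_value) for flagged circles.
    """
    rate = sum(a[2] for a in areas) / sum(a[3] for a in areas)
    xs = [a[0] for a in areas]
    ys = [a[1] for a in areas]
    nx = int((max(xs) - min(xs)) / step) + 1
    ny = int((max(ys) - min(ys)) / step) + 1
    hits = []
    for i in range(nx):
        for j in range(ny):
            cx, cy = min(xs) + i * step, min(ys) + j * step
            inside = [a for a in areas
                      if math.hypot(a[0] - cx, a[1] - cy) <= radius]
            population = sum(a[3] for a in inside)
            if population == 0:
                continue
            cases = sum(a[2] for a in inside)
            expected = rate * population
            p_value = poisson_tail(cases, expected)
            if p_value < alpha:
                hits.append((cx, cy, cases, expected, p_value))
    return hits
```

The flagged circle centres can be drawn on a map to reveal the clustering pattern; a temporal version could repeat the scan for each time period.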


To enhance decision making capabilities by developing interactive GIS techniques, which provide an integrated exploratory and statistical basis for investigating spatial patterns. Seismic data experts regularly use GIS to help them spot geoenvironmental data patterns related to seismic activity. However, the complexity of geoenvironmental processes and noise in the spatial patterns of these variables make it very difficult to objectively compare seismic maps with other geoenvironmental maps and identify interesting patterns and relationships. To help reduce the likelihood of becoming overly subjective, a seismologist may wish to initially classify and select groups of areas with similar geoenvironmental characteristics and then perform statistical tests to investigate general differences in localised distributions of selected areas belonging to the same geoenvironmental group in the classification. An interactive version of SPIN will clearly aid the seismologist in the process of classifying and selecting these areas and in performing the statistical tests. By simplifying this analysis task, SPIN lets the user focus on looking for interesting patterns and testing a great number of alternative hypotheses.
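As an illustration of the kind of statistical test such a workflow might run, the following sketch compares a measured quantity between two selected sets of areas with an exact permutation test on the difference of means. The choice of a permutation test and the function name are assumptions made for this example; the proposal does not prescribe a particular test. Enumerating all splits is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the share of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    splits = extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        # Small tolerance guards against floating point rounding
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / splits
```

A small p-value suggests that the two sets of areas differ by more than a random regrouping would explain, which gives the seismologist a less subjective basis for comparing them.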


To deepen the understanding of spatio-temporal patterns by visual simulation. Imagine a biodiversity researcher wants to investigate the migratory flight route of a flock of storks travelling from Europe to Africa. Suppose the researcher uses a global positioning system (GPS) to track the progress of these birds and wishes to visually simulate the migration to provide an overview of the migratory route and the speed of different parts of the journey, and to identify areas where the storks rested along the way. SPIN will provide the capability to develop and play back this type of simulation over the internet. The same technique can be applied in many other areas. For example, logistics companies may want to use it to help keep track of orders and optimise transport routes, and transport planners may want it to aid the development of integrated transport networks.
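The quantities such a simulation would display can be derived from the raw GPS fixes. The sketch below, a simple illustration under stated assumptions, computes the speed of each leg of a track from great-circle (haversine) distances and marks legs slower than a threshold as rest periods. The function names, the track layout and the 1 km/h threshold are hypothetical.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def segment_speeds(track):
    """track: list of (datetime, lat, lon) sorted by time.

    Returns a list of (start_time, end_time, km_per_hour) per leg.
    """
    legs = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours <= 0:
            continue  # skip duplicate or out-of-order fixes
        legs.append((t0, t1, haversine_km(la0, lo0, la1, lo1) / hours))
    return legs


def rest_periods(track, max_kmh=1.0):
    """Legs slower than max_kmh are treated as rest (assumed threshold)."""
    return [(t0, t1) for t0, t1, v in segment_speeds(track) if v <= max_kmh]
```

Playing the legs back in time order, coloured by speed, would give the overview of route, pace and resting areas described above.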


To publish and disseminate geographical data mining services over the internet. Suppose the various analysts described above (i.e. the statistical officer, the environmental decision maker, the seismic data expert and the biodiversity researcher) want to distribute their results quickly and cost-effectively to encourage similar applications and promote world-wide scientific exchange of their research. Furthermore, suppose they want to publish both the conclusions and the details of their entire geographical data mining investigation so that other similar research can extend, generalise and build on their analyses. Imagine also that these researchers want to enable others to access and use the same analysis tools that were available to them. To realise all of this, they would probably need a relatively automatic way to plug their specific application into a Java-based internet-enabled SDMS. This would then enable anyone with a standard web browser to replicate and perform similar analyses wherever and whenever desired (see partner P4, publications 2 and 9; partner P1, publications 1 and 2; partner P3, publications 1 and 2). The proposed SDMS, SPIN, will provide this type of capability in an integrated, organised fashion.

B4. Contribution to programme/key action objectives


The proposal contributes to the IST programme objective of building key, user-friendly applications that enable the potential of the information society in several ways:

• Merging data mining and GIS based technology offers exciting new possibilities for spatial data research that is applicable in a wide variety of problem domains. Much expert geographical analysis has been restricted by prescribing in advance and exclusively following either a statistical or a GIS based approach. When both approaches have been applied, error-prone and cumbersome data transfer between different applications has been necessary. Nonetheless, useful information has been extracted from georeferenced data much more effectively by employing both approaches simultaneously. Clearly an integrated SPIN will facilitate such analysis and help to develop understanding of a wide range of geographical processes faster, enhancing research and decision making in diverse application areas.

• SPIN will provide a user-friendly interface to advanced data mining functionality, GIS and exploratory spatial data analysis tools that can be accessed via the internet.

• The system will enable quick and cost-effective dissemination of information via the internet and enhance web-based research capabilities.




The objective of nurturing emergent technologies is supported by the development of an innovative business model. A web-based brokering service is proposed that is designed to add value to the dissemination of data and information, providing a key to the commercialisation of the software and the service it facilitates.

The proposal contributes to CPA4 (New indicators and statistical methods) by developing new tools for extracting information from data by adapting data mining functions specifically for spatial analysis. This includes adapting methods from Bayesian statistics, machine learning and other adaptive techniques so they can be launched from an integrated environment, which assists experimental comparison of their relative strengths and weaknesses.

A further contribution to CPA4 derives from developing technology for the user-friendly dissemination of statistical data. SPIN will enable the dissemination of interactive statistical maps and provide data mining services over the internet, where the users need nothing but a standard web browser such as Netscape or Internet Explorer. Many of the problems relevant to this use of SPIN will be addressed in an application that aims to facilitate the analysis of census data over the internet. The proposed web-based brokering service aims to go even further by enhancing the user-friendly and cost-effective dissemination of data.


The proposed system will be generic and easily adaptable to diverse application areas, and the research is specifically relevant to the following key actions of the cross-programmatic action (CPA) of the IST programme:

• Key Action I.4: Systems and services for citizen administration; systems enhancing the efficiency and user-friendliness of administrations. This is addressed in work package WP9 by the application to develop user-friendly dissemination of statistical data.

• Key Action I.5: Intelligent environmental monitoring and management systems; environmental risk and emergency management systems (in conjunction with hazards and earth observation). These are addressed in work package WP8 by an application of the proposed system to the analysis of seismic and volcano data.

• Key Action II.3.2: New methods of work and electronic commerce. New market mediation systems, to develop innovative market place concepts and technologies. This will be addressed in the web-based brokering application in work package WP9.

• Key Action II.4.3: Digital object transfer. This will be addressed by a specific task within work package WP2 that aims to develop efficient and appropriate means of distributing data and maps over the internet.

• Key Action III.1: The future priority action line concerning geographic information is also clearly addressed.

B5. Innovations

State of the Art

Contemporary GIS are monolithic closed systems that can be difficult to use and are usually very expensive. In the last few years a new generation of GIS has been emerging that enables interactive, dynamic maps to be disseminated via the Internet (see partner P1, publications 1 and 3; partner P4, publication 4; partner P3, publications 10 and 11). So far, most of these systems are confined to projecting descriptive statistical displays, such as histograms or pie charts, onto geographical space (maps). As decision making and inference using these projected map displays is not always straightforward, data mining offers great potential benefits. The range of application areas is huge, with many different types of applications in, for example, statistical analysis, urban planning, environmental decision making and geomarketing.



Largely unconnected to GIS research, a wide range of analysis techniques now commonly referred to as data mining functions have been developed. These data mining functions are extensions of analytical techniques known for decades and have been packaged in various ways to form a large number of essentially very similar data mining systems (DMS). Some DMS provide user-friendly interfaces and visual programming environments that the non-expert can use to help automate the search for hidden patterns in large databases. Interest in DMS has boomed in recent years, partly as a result of the packaged nature of the technology and improving graphical user interfaces, but mainly because of the desperate need for commercial enterprises to make returns on often large investments in data warehouses. Since the GIS revolution in the early 1980s there has been an explosion of geographically referenced information forming a rapidly expanding geocyberspace (see partner P4, publication 1), wherein much of the data is also temporally referenced. Commercial enterprises and government organisations have been swamped by this data explosion, with few tools to extract useful information that can be applied in decision making contexts to solve problems and improve their function. By combining the strengths of GIS and DMS, the proposed SDMS, SPIN, will have even greater functionality and should be a huge help to decision makers and spatial analysts charged with the task of backing up their intuitive insights using real world data. Some of the integrated components not currently present in either GIS or DMS include exploratory spatial data analysis methods that search for geographical patterns and relationships in complex space-time-attribute domains.


Extending and integrating GIS and DMS to develop an internet-enabled geographical data mining system is a logical progression for spatial data analysis technology. This development is poised to play a major role in the proposed terms of reference 1999-2003 of the Commission on Visualisation and Virtual Environments of the International Cartographic Association (MacEachren and Kraak 1999 [1]), and it can be expected that a great deal of research effort will be needed to this end in the coming years. DMS and GIS are quite complex tools with wide ranging functionality and capabilities, so the SPIN! Consortium does not propose to start from scratch, but to build on existing tools. Many of these existing tools have been developed by various partners during 4th framework research, and many have passed the prototype stage and have well established user communities. One major advantage of the SPIN! Consortium is that the software developers will have access to the source code of all the various module components, which facilitates a seamless integration of all the technology in SPIN. (This would not be possible if the system were to be developed on top of third party proprietary products.) The system will be based on open standards such as Java and TCP/IP. The evolutionary prototype development approach proposed has many benefits. Users will be able to provide feedback on SPIN prototype requirements and performance throughout the project (starting from day one), and progressive prototype versions of the system will guide the development effort to fulfil user expectations by the end. The early development of prototypes is known to be one of the most effective counter-measures to limit the risks of such software development.

Technological & Scientific Advances


First system that tightly integrates state of the art GIS and data mining functionality in an open, extensible, internet-enabled plug-in architecture. The system will integrate a rich functionality:

• a data mining platform (see partner P1 and P5, publication 10);

• an internet-enabled tool for interactive manipulation of statistical maps (P1, publications 1 and 2);

• an application for exploratory spatial data analysis (partner P4, publication 2);

• new modules for spatial data mining (see below);

• new modules for visualising temporal data and spatial data mining results; and

• a Java-based GIS (partner P6, publication 1).





[1] See the following URL for details: http://www.geovista.psu.edu/ica/icavis/terms.html


The generic system architecture is easily adaptable to diverse application areas such as seismic data analysis and hazard management, environmental decision making, and census data dissemination.


Adapting machine learning methods to spatial analysis. It is generally accepted that there currently exists no single data mining or machine learning method that is efficacious in every case. Available methods differ in many ways, including complexity, representational power, accuracy, scalability, comprehensibility, their ability to cope with noise and missing values, and many other factors. Methods based on different approaches make different assumptions about the data being analysed, which may not matter in some cases and may be totally inappropriate in others. It is therefore important that users have access to a variety of spatial data mining methods, together with help in choosing and combining whichever methods seem most appropriate for their task. In developing SPIN we will advance the state of the art in spatial data mining in several ways.


Symbolic machine learning methods will be adapted to spatial data analysis, in particular inductive logic programming (ILP) algorithms for the discovery of subgroups and spatial association rules. Efficient methods for the discovery of (non-spatial) association rules have been proposed in the field of data mining, most of which can deal with propositional, or zeroth-order, representations; however, they are unsuitable for expressing higher order spatial relationships. ILP is based on first-order predicate logic, which allows for the representation of relations such as adjacent_to, inside, and close_to. This makes ILP a natural and promising approach to many forms of spatial data mining. Methods for the induction of first-order rules have been extensively investigated within ILP. Some of these methods have already been applied to the automated interpretation of topographic maps (see partner P2, publications 2 and 3). In this case, symbolic first-order descriptions of cells of a map are automatically extracted from a vector representation of maps stored in an object-oriented database. Intelligent map feature extraction is a challenging task. Advances in this field would open new possibilities for enhancing intelligent automated map design; in addition, first-order descriptions of maps could be fed into (future) first-order learning systems as background knowledge, e.g. for topographically informed subgroup discovery.
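The step from a vector representation to symbolic first-order descriptions can be illustrated with a minimal sketch. It assumes, purely for illustration, that each map object is approximated by an axis-aligned bounding box (xmin, ymin, xmax, ymax) and derives ground facts for the relations adjacent_to, inside and close_to named above. The geometric definitions, the distance threshold and the object names are assumptions of this example, not the extraction method used by partner P2.

```python
from itertools import permutations


def gap(a, b):
    """Shortest distance between two boxes (0.0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def inside(a, b):
    """True if box a lies completely within box b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def interiors_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def spatial_facts(objects, close_dist):
    """objects: dict mapping name -> (xmin, ymin, xmax, ymax).

    Returns a sorted list of ground facts as strings.
    """
    facts = []
    for (n1, r1), (n2, r2) in permutations(objects.items(), 2):
        d = gap(r1, r2)
        if inside(r1, r2):
            facts.append(f"inside({n1}, {n2})")
        elif d == 0 and not interiors_overlap(r1, r2):
            # Boxes share a boundary (or a corner) without overlapping
            facts.append(f"adjacent_to({n1}, {n2})")
        elif 0 < d <= close_dist:
            facts.append(f"close_to({n1}, {n2})")
    return sorted(facts)


# Hypothetical map objects
objects = {
    "district_a": (0, 0, 4, 4),
    "district_b": (4, 0, 8, 4),
    "park": (1, 1, 2, 2),
    "lake": (9, 0, 10, 1),
}
facts = spatial_facts(objects, close_dist=1.5)
```

Facts of this form are the kind of background knowledge a first-order learner could combine with attribute data when searching for spatial association rules.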


Combining the expressive power of first-order learning methods with the coherence and scalability of Bayesian statistics. First-order machine learning methods tend to be search intensive, and when dealing with large sets of data and high-dimensional dependencies, scalability might become a problem. To overcome this problem, we will investigate how scalability can be improved by the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory. This will also help to bridge the gap between first-order learning and statistics.


Applies advanced Bayesian classification, prediction, and interpolation to spatial data. In recent years computationally intensive Bayesian methods have been developed that compare favourably with classical approaches. Instead of selecting an "optimal" model, they generate a whole distribution of models, which characterises the uncertainty about the model in the light of the available data. On the one hand, they derive predictive distributions for new inputs that reflect the actual uncertainty and information. On the other hand, they allow a rigorous assessment of the adequacy of different model types. This method has already been successfully applied by partner P1 (see partner P1, publication x13) to credit scoring and will now be adapted to spatial data.
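The difference between selecting one "optimal" model and averaging over a distribution of models can be shown with a deliberately small sketch. It assumes a single unknown event probability with a uniform prior and approximates the posterior on a grid; this stands in for the computationally intensive methods the project plans and is not meant to represent them. The function names are hypothetical.

```python
def plug_in_estimate(successes, trials):
    """Classical approach: predict with the single best-fitting value."""
    return successes / trials


def posterior_predictive(successes, trials, grid_size=1000):
    """Bayesian approach: average the prediction over all candidate values.

    Uses a uniform prior and a midpoint grid over the interval (0, 1).
    """
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Unnormalised posterior weight of each candidate event probability
    weights = [t ** successes * (1 - t) ** (trials - successes)
               for t in thetas]
    norm = sum(weights)
    return sum(w * t for w, t in zip(weights, thetas)) / norm
```

With no events observed in three trials, the plug-in estimate predicts that an event is impossible, whereas the posterior predictive probability stays close to 0.2 and so still reflects the uncertainty left by the small sample.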


Automating the exploratory spatial data analysis of geographical data. Various exploratory spatial data analysis tools have been developed by partner P4 (see partner P4, publication 2) and made available for research via the internet. However, the current format of the application may be criticised in that it is not user-friendly enough, and users are restricted to a select few input and output data formats. The search methods used in it are unintelligent brute force heuristics that could be improved by the application of artificial intelligence methods to direct the search. Early experiments by partner P4 indicate that there is great potential for these heuristics, especially when analysing data in a multi-attribute space-time-attribute tri-space (see partner P4, publication 3). So by improving the quality of the search procedure, the belief is that much larger and more complex data sets can be investigated in a scalable way. To address the need for the system to communicate with other packages, both local and remote, the tool developed will make use of CORBA for data input and results output. Partner P4 also plans to develop improved visualisation tools that allow users to view the outputs of the tools in an easy and obvious way that aids their understanding of the results instead of hampering it, as many current tools do.


Uses knowledge based systems technology to involve the expertise on thematic cartography in supporting visual mining of spatial and temporal data. Currently there is a recognised need to combine cartographic visualisation (meaning building maps to facilitate visual data exploration) with data mining (see, for example, the special issue of Int. J. Geographical Information Science on Visualization for Exploration of Spatial Data, v.13(4), June 1999). Within the project we plan to develop both a cartographical interface for preparing (selecting, preprocessing, etc.) data for data mining and an interactive map presentation of data mining results, dynamically linked with specially designed non-geographic illustrations. Special attention will be paid to the interactivity of maps and other graphical displays and to the visualisation and analysis of the temporal aspect of data.


Use of new techniques for efficient distribution of large maps for low bandwidth networks. Special attention will be given to developing efficient mechanisms that reduce the amount of data that has to be transferred between the client and the server.
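One well-known way of reducing the volume of map data sent over a slow connection is to quantise coordinates to a fixed precision and transmit only the differences between consecutive points. The sketch below illustrates this delta encoding; it is offered as an example of the general idea, under the assumption that five decimal places are sufficient, and is not the mechanism the project will necessarily adopt. The function names are hypothetical.

```python
def encode(points, precision=5):
    """Quantise (x, y) points and return integer deltas between neighbours.

    The first delta is relative to (0, 0), so it carries the full start
    position; later deltas are typically small integers.
    """
    scale = 10 ** precision
    deltas = []
    prev_x = prev_y = 0
    for x, y in points:
        qx, qy = round(x * scale), round(y * scale)
        deltas.append((qx - prev_x, qy - prev_y))
        prev_x, prev_y = qx, qy
    return deltas


def decode(deltas, precision=5):
    """Rebuild the quantised points by accumulating the deltas."""
    scale = 10 ** precision
    x = y = 0
    points = []
    for dx, dy in deltas:
        x += dx
        y += dy
        points.append((x / scale, y / scale))
    return points
```

Because neighbouring vertices of a boundary are usually close together, the deltas need far fewer digits than the absolute coordinates, and they compress well with a general-purpose compressor.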



B1. Workpackage list

Workpackage No [2] | Workpackage title | Lead contractor No [3] | Person-months [4] | Start month [5] | End month [6] | Phase [7] | Deliverable No [8]
WP1   | Coordination | P1 | 34 | 0 | 36 | - | D1.1-1.4
WP2   | Identify user needs, define and realize a generic system architecture that integrates GIS and Data Mining functionality | P1 | 69 | 0 | 36 | - | D2.1-2.6
WP3   | Extend machine-learning methods to spatial mining | P2 | 42 | 0 | 36 | - | D3.1-3.9
WP4   | Generalize Bayesian Markov Chain Monte Carlo to spatial mining | P1 | 40 | 0 | 36 | - | D4.1-4.7
WP5   | Adapt and integrate methods for spatial pattern analysis | P4 | 40 | 0 | 36 | - | D5.1-5.7
WP6   | Develop support of visual analysis of time-dependent spatial data | P1 | 40 | 0 | 36 | - | D6.1-6.6
WP7   | Develop methods for visualization of Data Mining results within GIS | P1 | 40 | 0 | 36 | - | D7.1-7.6
WP8   | Application to seismic and volcano data | P7 | 70 | 0 | 36 | - | D8.1-8.9
WP9   | Application to web-based dissemination of data from statistical offices | P8 | 49 | 0 | 36 | - | D9.1-9.6
WP10  | Develop a business model for web based information and service brokering with geo-referenced data | P6 | 24 | 0 | 36 | - | D10.1-10.5
WP11  | Dissemination | P8 | 38 | 0 | 36 | - | D11.1-11.5
TOTAL |  |  | 482 |  |  |  |

[2] Workpackage number: WP 1 - WP n.
[3] Number of the contractor leading the work in this workpackage.
[4] The total number of person-months allocated to each workpackage.
[5] Relative start date for the work in the specific workpackages, month 0 marking the start of the project, and all other start dates being relative to this start date.
[6] Relative end date, month 0 marking the start of the project, and all end dates being relative to this start date.
[7] Only for combined research and demonstration projects: Please indicate R for research and D for demonstration.
[8] Deliverable number: Number for the deliverable(s)/result(s) mentioned in the workpackage: D1 - Dn.






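Distribution of Workload on work packages

Person-months per partner:

Area | WP | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | Total
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Coord | WP1 | 28 | | | 6 | | | | | 34
Techn. Dev. | WP2 | 30 | | 2 | 9 | 18 | 10 | | | 69
ML | WP3 | 18 | 24 | | | | | | | 42
Bayes | WP4 | 30 | | | 4 | | 6 | | | 40
ESDA | WP5 | | | | 36 | | | | | 36
Vis. Spa-T | WP6 | 28 | | | 12 | | | | | 40
Vis. DM | WP7 | 28 | | | 12 | | | | | 40
Seis. Dat. | WP8 | 3 | | 18 | 3 | 2 | 12 | 32 | | 70
Stat. Off. | WP9 | 3 | | | 6 | 2 | 4 | | 34 | 49
Web-Brok. | WP10 | 2 | | | | 12 | 10 | | | 24
Dissem. | WP11 | 2 | | | 8 | 2 | 14 | 4 | 8 | 38
Total | | 172 | 24 | 20 | 96 | 36 | 56 | 36 | 42 | 482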






B2. Deliverables list

Deliverable No [9] | Deliverable title | Delivery date [10] | Nature [11] | Dissemination level [12]
--- | --- | --- | --- | ---
D1.1 | Project workplan | 3 | R | PU
D1.2 | Reports for EC | period. | R | PU
D1.3 | Project handbook | 6 | R | PU
D1.4 | Project meetings | period. | R | PU
D2.1 | System design document | 8 | R | CO
D2.2 | Prototype 0 (incl. documentation) | 12 | P | CO
D2.3 | Implementation of efficient methods for map transfer | 15 | P | CO
D2.4 | Prototype 1 (incl. documentation) | 18 | P | CO
D2.5 | Prototype 2 (incl. documentation) | 30 | P | CO
D2.6 | Revision Release Prototype 2 (incl. documentation) (Final Release) | 32 | P | CO
D3.1 | Theoretical report on spatio-temporal subgroup discovery | 6 | R | PU
D3.2 | Theoretical report on adaptive sampling | 21 | R | PU
D3.3 | Theoretical report on spatial association rules | 5 | R | PU
D3.4 | Specifications of the descriptions to be automatically extracted from vectorized maps | 15 | R | CO
D3.5 | Implementation of subgroup discovery | 8 | P | CO
D3.6 | Implementation of adaptive sampling for subgroup discovery | 23 | P | CO
D3.7 | Implementation of spatial association rules | 11 | P | CO
D3.8 | Software for the extraction of symbolic descriptions from vectorized maps | 18 | P | CO
D3.9 | Report evaluating the application of first-order learning methods to spatial data | 36 | R | PU
D4.1 | Report reviewing current Bayesian approaches | 6 | R | PU
D4.2 | Software Implementation for bootstrap | 11 | P | CO
D4.3 | Report on advanced spatial models and corresponding Bayesian models | 15 | R | PU
D4.4 | Implementation of MCMC | 18 | P | CO
D4.5 | Implementation of model selection | 28 | P | CO
D4.6 | Performance evaluation and guidelines | 36 | R | PU
D4.7 | Generic software library for spatial data transformations | 6 | P | CO
D5.1 | Theoretical paper on algorithms for handling interaction with spatial location | 5 | R | PU
D5.2 | Software for handling interaction with spatial location | 11 | P | CO
D5.3 | Theoretical paper evaluating statistical clustering tests | 14 | R | PU
D5.4 | Implementation of selected statistical clustering tests | 18 | P | CO
D5.5 | Theoretical paper on algorithms for multiple search | 24 | R | PU
D5.6 | Implementation of algorithms for multiple search | 30 | P | CO
D5.7 | Reports on testing and evaluation of Spatial Analysis software tool | 36 | R | PU
D6.1 | Rule base on application of visualisation and interaction techniques depending on characteristics of data and the type of their time variation | 16 | P | CO
D6.2 | Software library implementing the proposed methods | 26 | P | CO
D6.3 | Expert system engine performing selection of methods according to characteristics of data | 30 | P | CO
D6.4 | Theoretical paper on algorithms for investigation of temporal changes | 18 | R | PU
D6.5 | Implementation of algorithms for investigation of temporal changes | 24 | P | CO
D6.6 | Evaluation report | 36 | R | PU
D7.1 | Description of the presentation methods proposed to apply to results of the considered data mining methods | 6 | R | PU
D7.2 | Implementation of visualization method for subgroup discovery | 11 | P | CO
D7.3 | Implementation of visualization method for spatial association rules | 12 | P | CO
D7.4 | Implementation of visualization method for Bayesian classification | 17 | P | CO
D7.5 | Implementation of best-practice methods for visualisation in ESDA | 17 | P | CO
D7.6 | Report on current & potential application methods in ESDA | 36 | R | PU
D8.1 | Definition of user requirements | 3 | R | PU
D8.2 | Description of the methods of space-time analysis and data mining of seismic data | 10 | R | PU
D8.3 | Description of the methodology for designing seismic hazard information models | 15 | R | PU
D8.4 | Software implementing the proposed methods within the SPIN! architecture | 26 | P | CO
D8.5 | Evaluation report | 24 | R | PU
D8.6 | Application of the software tools to the seismic active Eastern Mediterranean region | 34 | P | CO
D8.7 | Application of the software tools to the high risk Merapi volcano | 36 | P | CO
D8.8 | Integration of continuous monitoring data into the analysis process | 36 | P | CO
D8.9 | Report on the application of Spatial Mining to seismic and volcano data | 36 | R | PU
D9.1 | User requirements document for dissemination of statistical data | 3 | R | PU
D9.2 | Description of data model | 12 | R | CO
D9.3 | A prototype web site with interactive thematic maps that can be accessed over the internet | 16 | P | CO
D9.4 | Prototype web-site based on SPIN prototype 2 | 30 | P | CO
D9.5 | Report about different user acceptance, recommendation for use, etc. | 24 | R | PU
D9.6 | Report: recommendation of use | 36 | R | PU
D10.1 | Define requirements for web-brokering | 3 | R | PU
D10.2 | Report describing existing brokering services, business model and property of rights problematic | 8 | R | PU
D10.3 | Report addressing technical infrastructure | 24 | R | CO
D10.4 | Prototype web-site for web-brokering | 30 | R | PU
D10.5 | Final report on web-brokering | 36 | R | CO
D11.1 | Project web page | 3 | R | PU
D11.2 | Project description for the general public | 2 | P | PU
D11.3 | First dissemination workshop | 24 | O | PU
D11.4 | Second dissemination workshop | 36 | O | PU
D11.5 | Feasibility study about commercialization | 33 | R | PU

[9] Deliverable numbers in order of delivery dates: D1 to Dn.
[10] Month in which the deliverables will be available. Month 0 marking the start of the project, and all delivery dates being relative to this start date.
[11] Please indicate the nature of the deliverable using one of the following codes:
    R = Report
    P = Prototype
    D = Demonstrator
    O = Other
[12] Please indicate the dissemination level using one of the following codes:
    PU = Public
    PP = Restricted to other programme participants (including the Commission Services).
    RE = Restricted to a group specified by the consortium (including the Commission Services).
    CO = Confidential, only for members of the consortium (including the Commission Services).






Introduction to workpackages


The workpackages fall into several categories: technology development, research, application, and exploitation. Figure 3 shows the main dependencies between the workpackages, but does not display the feedback mechanisms that will be set up between all workpackages, as described in the section about project management.

Building a spatial mining system is a demanding task. It requires expertise in many fields, including Geographic Information Systems, Cartography, Statistics, Machine Learning, and Databases, as well as excellent software engineering skills. The consortium has been carefully chosen to ensure uncompromising competence in all these areas. It includes

- two industrial partners active in Data Mining and Geographic Information Systems (partners P5 and P6),
- a university and a national research center active in the areas of Data Mining, Machine Learning, and GIS (partners P2 and P1),
- an institute for geography active in Exploratory Spatial Data Analysis since the 1980s (partner P4),
- a university having a leading role in the dissemination of statistical data (partner P8), and
- two institutes active in seismic data research (partners P3 and P7).

Each partner in the consortium has a unique area of competence not shared by the others, and brings its expertise as well as its technologies into the consortium.




Figure 3. Main dependencies between work packages.


Risk management


Many research and technology development projects fail because their typical risks are not taken into account. The workplan has therefore been designed to prevent typical causes of failure in advance. The main approaches taken towards risk management are:

- software reuse and incremental evolution of existing technology
- modular design of software components (plug-in architecture)
- strong user involvement
- early delivery of prototypes


Involving users at all stages of the system's development is of utmost importance. The development process will proceed by iterative improvement of incremental versions of the system: an initial prototype is delivered for users to evaluate and to suggest generic design modifications. The users will be involved in defining the system analysis requirements and in designing and testing the system right from the start. The users are responsible for providing evaluation reports, which serve as input to specific system design modifications.


Since important modules of the final system already exist in a preliminary and non-integrated form, the users will be trained in using the individual systems at an early stage. This will help to shape their expectations and provide valuable feedback to the software developers. The users in work package WP9 already use the GIS technology developed by partner P1, so they can formulate specific requirements at an early stage, minimising the likelihood that generic system requirements will undergo continuous change.


The base integrating system platform will be an object-oriented, plug-in style architecture to facilitate technological integration. The dependencies between work packages are reduced because plug-in components can be incorporated incrementally as they become available. In this way, revisions to the internal structure of either the client or the server should not affect the other parts. CORBA and RMI will be evaluated as integrating middleware.
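The plug-in idea described above can be illustrated with a minimal sketch. It is written in Python for brevity, although the project's existing technology is in Java, and every name in it (`AnalysisPlugin`, `BaseSystem`, `MeanValuePlugin`) is hypothetical rather than part of the SPIN! design. The point is only that the host system knows modules through one narrow contract, so a module can be added when it becomes available without changing the host.

```python
from abc import ABC, abstractmethod


class AnalysisPlugin(ABC):
    """Hypothetical contract that every mining or visualisation module fulfils."""

    name: str

    @abstractmethod
    def run(self, records):
        """Analyse a list of record dicts and return a result dict."""


class BaseSystem:
    """Hypothetical host that knows plug-ins only through the contract above."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        # Plug-ins can be added one by one, as each work package delivers.
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def available(self):
        return sorted(self._plugins)

    def run(self, name, records):
        return self._plugins[name].run(records)


class MeanValuePlugin(AnalysisPlugin):
    """Toy stand-in for a real mining method: averages one attribute."""

    name = "mean-value"

    def __init__(self, attribute):
        self.attribute = attribute

    def run(self, records):
        values = [r[self.attribute] for r in records]
        return {"attribute": self.attribute, "mean": sum(values) / len(values)}


system = BaseSystem()
system.register(MeanValuePlugin("population"))
result = system.run("mean-value", [{"population": 10}, {"population": 30}])
```

A module that is delivered late is simply absent from `available()` until it is registered; the other modules are unaffected.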


Strong modularization should minimise the dangers of integrating technology developed separately by different groups. If for some reason one module were not delivered on time, this would not necessarily affect the implementation of other modules. Since partners P1, P3, and P4 have already implemented major parts of the existing technology in Java, the risk of technology integration problems is low. The Unified Modelling Language (UML) will be used for documentation and design to ensure product quality.


Potential performance bottlenecks should be easy to spot at an early stage by applying the existing technology to test data provided by the users. The system needs to be interactive, and users should not be made to wait too long for analysis results. Performance issues are addressed in a special task within WP2.


Our approach to risk management has been tightly integrated within the overall technology development cycle of SPIN. Since an evolutionary approach containing several iterations has been chosen, all work packages start at the kick-off meeting and end with the final workshop.



Gantt Chart


Main stages of technology development cycle


Month | Event | Description of Event
--- | --- | ---
1 | Kick-off meeting | A kick-off meeting will be held, where the users are informed in detail about the prospects of developing an SDMS, where alternative approaches will be discussed, and where the users will articulate specific expectations and requirements for the system. There will also be a tutorial session on Spatial Mining based on the existing technology.
3 | User requirements report | The developer teams and the users will jointly define the user requirements report, which is due by month 3 and for which the users are responsible. This will be a major input for the system design.
5 | Test applications | The existing, non-integrated systems will be applied to example data sets to further clarify user needs, to spot performance bottlenecks at an early stage, etc.
8 | Design specification | The design specification is due in month 8. It is located mainly in WP2, but all work packages will contribute from their perspective. The report defines the intended applications on a detailed level. On the basis of this document, the integration of the existing technologies will start and they will be merged into a single, coherent architecture.
12 | Developer version (prototype 0) | A developer version (prototype 0) is due by month 12. This will be used for integrating the modules developed in WP3-7, which will start at month 12. Users will get access to this version as a technology preview.
15 | Revised system design document | Initial feedback from users and developers will be used for making a revised system design document, which is due in month 15.
18 | Prototype 1 | The revised design will be used for developing prototype 1, which is due in month 18. In this prototype, functionality from all work packages WP3-WP7 will be integrated; however, some functionality will still be missing (e.g. adaptive sampling for subgroup discovery in WP3). This prototype will be delivered to the users, who will use it in their experimental applications.
24 | User evaluation report | Users will evaluate whether the system meets the requirements specified in the user requirements report, and whether it meets the system design. The users will write an evaluation report, which is due in month 24. In this month, an external workshop will be held (WP11), where additional user groups and partners for commercial exploitation (WP10) will be targeted. Users will have installed internally, and in part externally, accessible web sites featuring initial applications of the technology.
27 | Final design document revision | The user evaluation of prototype 1 will lead to modifications of the system design; the final design document will be delivered in month 27.
30 | Prototype 2 | The final design document will be the input for the development of prototype 2, which is due in month 30. It will integrate all technology developed in work packages WP3-WP7, and will be delivered to the users. With the full functionality available, the users will work intensely on their applications. The web sites should be publicly accessible, so that feedback from a wider audience can be gathered.
32 | Revision release of prototype 2 | Experience in applications will lead to a revision release of prototype 2 in month 32. The revision will cover the base system as well as the modules from work packages WP3-WP7.
36 | Final user evaluation; Dissemination workshop | At the end of the project, the users will deliver a report describing their applications, and they will give a final evaluation. A workshop for dissemination to a wider audience, for identifying partners for follow-up projects (WP11), and for identifying partners for potential commercialisation (WP10) will be held in this month.


Pert diagram

The diagram shows dependencies between tasks. To give a better overview, we have grouped tasks by category. Task numbers refer to the Gantt chart, which shows the exact start and end dates of the tasks.


Work package description


Co-ordination

The project brings together researchers, software developers, and users from a number of European countries, with different backgrounds and different approaches to spatial analysis and geographical modelling. To manage technology development and research, and to exploit the component tools and system effectively, work package WP1 is devoted to co-ordination. Special attention has been given to defining clear, modular work package responsibilities and deliverables. The SPIN consortium will meet approximately every four months to establish and maintain an effective team. The management plan is based on one applied successfully in an EU project co-ordinated by partner P1, which is detailed in section C5 below.

Technology development

WP2 has the objective of designing an integrated system for Data Mining and GIS. Its overall task is the technological integration of the existing GIS and Data Mining software, and the coherent incorporation of the modules developed in the other work packages. It is the project's technological hub, to which all partners will deliver, and whose deliverables all partners will need to access at some point. It thus serves as the technological basis of the project. We conceptually distinguish a base system and an integrated Spatial Mining system.




Figure 4. The basic architecture of SPIN. Spatial mining and visualization methods can be added as plug-ins to the base system. Clients can access the system over the internet.






The base system contains

- an internet-enabled GIS for automatic generation of interactive thematic maps,
- Data Mining methods for nearest neighbour, decision trees, association rules, subgroup discovery, and inductive logic programming,
- visualisation for these methods,
- data transformation capabilities for discretization, restriction, projection, union, join, and calculated rows,
- access to heterogeneous data sources (JDBC-compliant databases, ODBC, flat files, spatial data interfaces etc.), also over the internet,
- facilities for organising and documenting analysis tasks.
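To make the data transformation capabilities in the list above more concrete, the following minimal sketch shows four of them (restriction, projection, discretization, join) on small in-memory tables. It is an illustration only: the function names, the table layout (lists of dicts), and the example data are assumptions and do not describe the existing platform's API.

```python
import bisect


def restrict(rows, predicate):
    """Restriction: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]


def discretize(rows, column, bounds, labels):
    """Discretization: map a numeric column to class labels.

    bounds are ascending class limits; labels has one more entry than bounds.
    """
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than bounds")
    result = []
    for row in rows:
        idx = bisect.bisect_right(bounds, row[column])
        result.append({**row, column + "_class": labels[idx]})
    return result


def join(left, right, key):
    """Join: combine rows of both tables that agree on the key column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Invented example data: income per district and district names.
districts = [
    {"id": 1, "income": 18000},
    {"id": 2, "income": 42000},
    {"id": 3, "income": 27000},
]
names = [{"id": 1, "name": "North"}, {"id": 2, "name": "South"}]

classed = discretize(districts, "income", [20000, 40000], ["low", "mid", "high"])
not_low = restrict(classed, lambda row: row["income_class"] != "low")
named = join(not_low, names, "id")
summary = project(named, ["name", "income_class"])
```

Chaining the steps this way mirrors how prepared data would be handed on to a mining or mapping step.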


The existing Data Mining methods complement the spatial mining methods in the task of "explaining" spatial patterns in terms of non-spatial attributes. The internet-enabled basis GIS module contains facilities for interactive manipulation of thematic maps. To provide automated visualisation, the GIS incorporates the knowledge of thematic cartography in the form of generic, domain-independent rules. To choose adequate presentation techniques for given data, it takes into account data characteristics and relations among data components or attributes. The automation of map generation releases users from having to think about how to present the data and from the routine work of map building, and allows them to concentrate on the analysis of their data. This work package includes the steps of requirement analysis, design, implementation, testing, and documentation.
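As a rough illustration of how generic, domain-independent rules might select a presentation technique from data characteristics, consider the sketch below. The three rules follow common cartographic practice (relative values as choropleth maps, absolute quantities as proportional symbols, categories as qualitative colours); they are assumptions chosen for illustration and are not the rule base of the existing GIS. The attribute description format and the name `choose_presentation` are likewise hypothetical.

```python
# Each rule pairs a condition on the attribute description with a technique.
# The rules are illustrative assumptions, not the project's actual rule base.
RULES = [
    (lambda a: a["type"] == "nominal", "qualitative colouring"),
    (lambda a: a["type"] == "numeric" and a.get("relative", False), "choropleth map"),
    (lambda a: a["type"] == "numeric" and not a.get("relative", False), "proportional symbols"),
]


def choose_presentation(attribute):
    """Return the technique of the first rule whose condition matches."""
    for condition, technique in RULES:
        if condition(attribute):
            return technique
    raise ValueError(f"no rule applies to attribute type {attribute['type']!r}")


land_use = {"name": "dominant land use", "type": "nominal"}
density = {"name": "population per km2", "type": "numeric", "relative": True}
population = {"name": "population", "type": "numeric", "relative": False}

techniques = [choose_presentation(a) for a in (land_use, density, population)]
```

Because the rules depend only on the characteristics of the data and not on the application domain, the same table can serve census data and seismic data alike.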


Building the base system requires integrating an existing GIS tool and an existing Data Mining platform, both developed by partner P1. For tight integration, a common Task Manager, Data Management Layer, Extension API, and user interface have to be defined and implemented. The integrated system incorporates the Spatial Mining and visualisation methods developed in WP3-7 into the base system.


The main inputs of this work package are the existing Data Mining and GIS systems and the modules developed in WP3-7; the main output will be the integrated system. This integrated system will be developed in three main stages: prototype 0 (developer version), prototype 1, and prototype 2. User feedback will be gathered and evaluated from the first day on and will be used for improving the system.


Research

Work packages WP3, WP4, and WP5 develop methods for Spatial Data Mining that can be added as plug-ins to the base system. A variety of methods has been selected for implementation, partially depending on previous experiences and results of the partners. Each partner has chosen for adaptation a method to whose advancement it has already made a theoretical and practical contribution, so that it is well acquainted with the subtleties of the chosen method. By combining the project partners' expertise, a broad range of advanced Data Mining techniques will be covered, from

- Bayesian Statistics (Partner P1, publication 6,8,9) and
- Neural Networks (Partner P1, publication 7) to
- symbolic approaches from Machine Learning and Inductive Logic Programming (Partner P1, publication 4,10,11; Partner P2, publication 1,2,3) and
- genuine approaches to Spatial Cluster Analysis (Partner P4, publication 2,4).

This gives the project a quite unique blend of depth of expertise with a broad range of methods covered. Since all these methods can be launched within a single, coherent platform, the project can also contribute to a comparison of the relative strengths and weaknesses of the methods and develop guidelines for their use in spatial mining.


All these work packages include a) state of the art review; b) theoretical advances, which will be communicated in a report; c) implementation and validation of the methods; d) integration with the base system; e) application to real-world tasks; f) documentation and final report.

These stages are synchronised with the technology development cycle. These work packages have as their input previous theoretical and practical work of the partners and will have as their main output a theoretical description of the respective methods.


Machine Learning (WP3). This work package is mainly concerned with the adaptation of symbolic machine learning methods to spatial data analysis. In particular, the methods to be adapted are Inductive Logic Programming algorithms for the discovery of subgroups and spatial association rules. They tend to be search-intensive, and when dealing with large data sets and high-dimensional dependencies, scalability might become a problem. Moreover, most have been developed to satisfy the classical properties of consistency and completeness, whereas in spatial data mining one is interested in detecting patterns that satisfy minimum criteria for support and consistency. Adaptation of these machine learning tools will be based on the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory, or on more efficient search strategies. Another contribution of this work package is the definition of appropriate algorithms for the automated extraction of symbolic descriptions of parts (e.g., cells) of a map from vectorised maps.
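The idea of accepting a pattern once it meets minimum thresholds, rather than requiring it to hold without exception, can be sketched for a single spatial association rule. The sketch uses the standard association-rule measures of support and confidence as an assumed stand-in for the criteria named above; the predicates (`near_motorway`, `high_income`), the data, and the thresholds are invented for illustration.

```python
def support(objects, items):
    """Fraction of spatial objects for which all given predicates hold."""
    items = set(items)
    return sum(1 for o in objects if items <= o) / len(objects)


def confidence(objects, antecedent, consequent):
    """Fraction of objects satisfying the antecedent that also satisfy the consequent."""
    ante = support(objects, antecedent)
    both = support(objects, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


def rule_is_interesting(objects, antecedent, consequent, min_support, min_confidence):
    """Accept a rule that meets both minimum thresholds, even if it has exceptions."""
    return (
        support(objects, set(antecedent) | set(consequent)) >= min_support
        and confidence(objects, antecedent, consequent) >= min_confidence
    )


# Each town is described by the set of (spatial and non-spatial) predicates true for it.
towns = [
    {"near_motorway", "high_income"},
    {"near_motorway", "high_income"},
    {"near_motorway"},
    {"high_income"},
]

rule_support = support(towns, {"near_motorway", "high_income"})
rule_confidence = confidence(towns, {"near_motorway"}, {"high_income"})
accepted = rule_is_interesting(towns, {"near_motorway"}, {"high_income"}, 0.4, 0.6)
```

Here the rule "near a motorway implies high income" fails for one town, so a learner demanding strict consistency would reject it, while the threshold-based test accepts it.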


Bayesian Statistics (WP4). A spatial relation may be described by a number of different models, leading to widely varying results. Currently the support for assessing and selecting models in GIS is very limited. Based on the extrapolation of the uncertainty of individual predictions of different models, we will develop methods for a well-founded selection or combination of models. In recent years, computationally intensive Bayesian methods have been developed that compare favourably with classical approaches. Instead of selecting an "optimal" model, they generate a whole distribution of models which characterises the uncertainty in the light of the available data. On the one hand, they derive predictive distributions for new inputs reflecting the actual information. On the other hand, they allow a rigorous assessment of the adequacy of different model types. Partner P1 (publication 8,9) has developed Bayesian classification methods which use a Bayesian ensemble of decision trees or neural networks. These methods have already been successfully applied to credit scoring and will now be adapted to spatial data.
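The step from "a whole distribution of models" to a predictive distribution can be sketched as a weighted average over the models' individual predictions. This is a generic model-averaging illustration under stated assumptions, not partner P1's method: the function name `model_average` is hypothetical, and the three predictions and their weights (standing in for posterior model probabilities, for example estimated from MCMC samples) are invented numbers.

```python
def model_average(predictions, weights):
    """Combine per-model predicted probabilities using (unnormalised) model weights.

    Returns the averaged prediction and the weighted standard deviation across
    models, which indicates how strongly the models disagree for this input.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * q for p, q in zip(probs, predictions))
    variance = sum(p * (q - mean) ** 2 for p, q in zip(probs, predictions))
    return mean, variance ** 0.5


# Three hypothetical models predict the probability of an event at one location.
predicted = [0.2, 0.4, 0.9]
posterior_weights = [0.5, 0.3, 0.2]

mean, spread = model_average(predicted, posterior_weights)
```

A large spread signals that the prediction at this location depends heavily on the choice of model, which is the kind of uncertainty a single "optimal" model would hide.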


Exploratory Spatial Data Analysis (WP5). This work package will explore ways of extending existing methods of spatial pattern detection. Currently ESDA methods tend to be concerned solely with the detection of spatial pattern and often overlook other data attributes. This shortcoming will be addressed by extending existing tools developed by partner P4 to handle attribute interaction with spatial location and to consider how temporal changes in spatial data can be investigated (see partner P4, publications 4 and 2). The tool will be expanded to use multiple search methods in addition to the heuristic search used currently. There is also potential to investigate how different statistical tests of clustering can be used in the tool.
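One simple example of a statistical test of clustering is to compare the number of cases observed inside a circle with the number expected under a Poisson model. The sketch below shows that single test; it is not partner P4's tool, and the function names, the coordinates, the expected count, and the significance level are all assumptions made for illustration.

```python
import math


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


def count_in_circle(points, centre, radius):
    """Number of points lying inside or on the circle."""
    cx, cy = centre
    return sum(1 for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)


def circle_is_unusual(points, centre, radius, expected, alpha=0.01):
    """Flag the circle if its case count is improbably high under the Poisson model."""
    observed = count_in_circle(points, centre, radius)
    return poisson_tail(observed, expected) < alpha


# Invented case locations: four close together, one far away.
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.1, 0.1), (5.0, 5.0)]

observed = count_in_circle(cases, (0.0, 0.0), 0.5)
flagged = circle_is_unusual(cases, (0.0, 0.0), 0.5, expected=0.5)
```

A search method would apply such a test to many candidate circles; applying it repeatedly raises a multiple-testing problem that this sketch does not address.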


Work packages WP6 and WP7 develop methods for visualisation of spatial and temporal information, and for the visualisation of the results of the Data Mining methods developed in WP3-5.


Visualisation of spatial and temporal data (WP6). In most areas, spatially referenced data also refer to different moments or intervals in time. The study of such data is meaningless if their development in time is not taken into account. Analysis of spatially referenced data should be supported by their visual presentation in maps. Spatio-temporal data require substantial advancement of the traditional map form of presentation towards dynamics and high user interactivity. The work package aims at developing methods for the visualisation of spatio-temporal data that can facilitate the analysis of such data. The methods include not only the graphical presentation itself but also various data transformations and interactive manipulation of the displays.


Visualisation of Data Mining results (WP7). The form in which data mining results are presented to the user is crucial for their appropriate interpretation. Large amounts of information or complex concepts can be more easily comprehended when represented graphically. This especially applies to data and concepts having spatial reference or distribution. The objective of this work package is to design appropriate graphical techniques to represent the results of the data mining methods developed within the project. The approach to be taken is a combination of cartographic and non-cartographic displays linked together through simultaneous dynamic highlighting of the corresponding parts (see partner P1, publication 1). The non-cartographic displays will represent the data mining results in summarised, generalised form, while maps will provide the transition from general descriptions to the individual spatial objects and phenomena characterised by them.
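Linking displays through simultaneous highlighting can be sketched with a shared selection object that notifies every attached display. This is a minimal illustration of the linking principle only; the class names `Selection` and `Display`, and the object identifiers, are hypothetical and not taken from the project's software.

```python
class Selection:
    """Shared selection that notifies every attached display of a change."""

    def __init__(self):
        self._displays = []
        self.ids = frozenset()

    def attach(self, display):
        self._displays.append(display)

    def select(self, ids):
        self.ids = frozenset(ids)
        for display in self._displays:
            display.highlight(self.ids)


class Display:
    """Stand-in for a map or a non-cartographic display showing some objects."""

    def __init__(self, name, shown_ids):
        self.name = name
        self.shown = set(shown_ids)
        self.highlighted = set()

    def highlight(self, ids):
        # A display highlights only those selected objects it actually shows.
        self.highlighted = self.shown & ids


selection = Selection()
map_view = Display("thematic map", {1, 2, 3})
rule_view = Display("rule summary", {2, 3, 5})
selection.attach(map_view)
selection.attach(rule_view)

# Selecting objects in any display goes through the shared selection,
# so the map and the summary display update together.
selection.select({2, 5})
```

Because displays communicate only through the shared selection, a new type of display can be linked without changing the existing ones.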

Application

The system will be used in several applications. One criterion for the selection of application areas is that a broad range of problem domains of special importance for the EU is covered, underlining the generality of the approach. A second criterion is that each of these areas should contribute in a unique way to evaluating and validating the adequacy of the chosen approach to Spatial Mining. This makes the evaluation process more focussed. An objective common to all application areas is to explore the applicability of advanced Data Mining methods. Specifically, spatial subgroup discovery, spatial Markov Chain Monte Carlo, and localised Spatial Point Pattern Analysis will be evaluated in each application area.


Application to Seismic Data (WP8). In WP1-7 a generic Spatial Mining System is developed. Such a system has the important advantage of a potentially broad range of application areas, and it promotes technology reuse. However, some application areas will also need to incorporate specialised analysis methods. One of the main risks associated with the development of generic information technology is that an architecture that is not extensible may end up not addressing the real needs of the user. Work package WP8 addresses this problem in an exemplary way. This will ensure that the generic system is designed in a modular and extensible way right from the start. A key component is the plug-in architecture of the existing Data Mining platform developed by partner P1, which allows easy integration of new modules. The application area selected for this task is earthquake prediction. This is a well-established scientific field belonging to physical geography, where a great amount of spatio-temporally referenced data from different sources is available. Research in this area has an obvious and great potential benefit for public health and quality of life. Advances in earthquake prediction could help to prevent massive financial losses. The objective of this work package is to adapt the generic system to the specialised application area of earthquake prediction and hazard assessment by integrating methods for natural hazard assessment that have been developed by partner P3. To achieve this goal, an integration layer between the generic Spatial Mining system and the specialised methods implemented by partner P3 has to be designed.

Partner P7, which has been active in the area of earthquake prediction for a long time, will profit from this technology by getting access to advanced and complementary methods for data analysis and by getting an instrument for the web-based dissemination of research results.


Web-based dissemination of census data from statistical offices. A second application area is the analysis and web-based dissemination of census data from statistical offices. Here the main objective is to put to practical use the timely, cost-effective dissemination of statistical information over the internet. Partner P8 has several years' experience in developing tools for web-based access to large spatial data sets and provides an academic service for access to census data. These tools are primarily for visualising database contents, data browsing, and locating and mapping spatial data, and they can handle spatial and aspatial referencing systems. Partner P8 also has access to a SUN E6500 super-server for academic applications. Additionally, the project will be supported by the national census agency, which is currently planning, together with the partner, the tools and services for public access to the forthcoming national census in 2001.


This work package will allow evaluation of the efficiency of the developed methods and of the responsiveness of the application, as well as of acceptance by customers of statistical offices. Potential problem areas are the availability of bandwidth, the number of concurrent users, and the size of maps and data sets. Especially if Data Mining analysis over the internet is permitted, the performance of the server will be of central importance. Experiences in this application area will be crucial for improving the prototype 1 system for better efficiency (which is a task within WP2).


Dissemination and Exploitation

Web-based brokering. Statistical offices, public agencies, and scientific institutions often face the
problem that their initial efforts to build up a public database are externally funded, but the maintenance
of such a service is not. Funding agencies increasingly require that these institutions develop
business plans for commercialising such a service in the long run (at least for non-scientific use). The
aim of this work package, for which the industrial partners will be responsible, is to develop a detailed
concept for a web-based information brokering service with georeferenced data as a foundation for the
cost-effective dissemination of data.


Web-based, interactive Spatial Mining can add tremendous value to the mere distribution of data.
This added value can be the key to commercialising the distribution of data for statistical offices,
public agencies, and scientific institutions. What is new about this proposal is that the customer does not
need to buy or install any complex and expensive software on his computer, yet is not confined to the
usual printed, non-interactive reports. An interactive thematic map is delivered over the internet using
Java technology. This map can be used by the customer for further exploration as well as for
presentation and decision making. There will be different levels of service, as suggested by the
following example business scenarios. The project will deliver technology to solve tasks 1-4 and
provide the technological basis for task 5. The feasibility of this concept will be tested in a
demonstrator.


Customer needs | Business solution | Customer supplies | Customer gets
1. An institute for ecological studies prepares an environmental report and needs a visualisation for their vegetation data and vegetation maps to make a presentation | Building a thematic map for predefined data and map | Data & maps | Interactive map on the internet
2. A statistical office needs a visualisation of data about land use | Building a thematic map for predefined data | Data | Interactive map on the internet
3. A department for urban development needs a local map showing hazard risks for decision making | Building a map, data & map brokering | Description of data & location | Interactive map with cluster detection, significance testing
4. A company running a power plant needs visualisation of monthly aggregated environmental data for monitoring | Maps periodically updated from a database via the internet | Description, location; data that have to be periodically refreshed | Interactive map with cluster detection, significance testing, periodically updated
5. A consulting company prepares a market study for the chances of sustainable tourism; for this it needs access to data from different sources such as census data and data about nature protection and pollution in this area | Geomarketing consulting | A descriptive task | Interactive map with cluster detection, significance testing, visualisation of data mining results; a summary report about Data Mining results


Dissemination. The technology developed in this project is of a generic nature and has a broad range
of potential applications. Yet potential user groups may be unaware of the existence of the type of
technology the project develops, or they may have false expectations about it. The aim of this work
package is to address the general public, as well as potential users and partners for commercial
exploitation.


Dissemination will be an ongoing activity and will include organising workshops, maintaining a
project web page, systematically identifying additional user groups that could act as partners in
follow-up projects, and providing project descriptions for the general public.


Partner 6 will perform a feasibility study for commercialising the technology developed, especially within
the application to seismic data. To this end they will actively search for a partner in the area of
noise-level zoning. This is expected to become a major issue in the next two to three years in Holland,
because of anticipated new legislation. This third application, in which the partner will not be directly
involved in the project, also demonstrates the potential of the technology for
environmental decision making.


A project sheet will be due in month 3, as well as a project web site. Beginning with month 12, when a
technological preview version will be available, potential additional user groups and potential customers
will be systematically identified and contacted, to spread knowledge about the project.
This activity will increase when prototype 1 becomes available in month 18. A public
workshop will be organised in month 24, bringing together users, developers, potential users, and other
interested people. A second public workshop will be organised in month 36, concluding the
project.



B3. Workpackage description

Workpackage number: WP1 - Coordination
Start date or starting event: 0

Participant number:            P1   P4
Person-months per participant: 28   6







Objectives


Overall and technical management. This will involve:

A) Overall Management
- Ensuring that the various phases of the project are properly coordinated
- Development of the project workplan
- Monitoring and reviewing progress of work
- Handling administrative procedures relating to the European Commission
- Reporting to the European Commission
- Supporting good communication between the partners

B) Technical Management
- Writing a project handbook including a quality management plan
- Responsibility for critical technical decisions which affect the project as a whole
- Definition of quality standards relevant to the project and determining how to satisfy them


Description of work


A) Overall Management

T1. Ensure that the various phases of the project are properly coordinated

T2. Development of project workplan (partners P1, P4)

T3. Monitoring and reviewing progress of work

T4. Handling administrative procedures relating to the European Commission

T5. Reporting to the European Commission

T6. Scheduling of meetings

B) Technical Management

T7. Write a project handbook including a quality management plan (partners P1, P4)

T8. Responsibility for critical technical decisions which affect the project as a whole (partners P1, P4)

T9. Define quality standards relevant to the project and determine how to satisfy them (partners P1, P4)


Deliverables


D1. Project workplan (T2)

D2. Reports for EC (T5)

D3. Project handbook (T7)

D4. Periodic project meetings (T6)


Milestones and expected result

Milestones of this workpackage are synchronized with the milestones of WP2:

M1: System design (8), M2: Prototype 0 (12), M3: Prototype 1 (18), M4: Prototype 2 (30)


B3. Workpackage description

Workpackage number: WP2 Integrate Data Mining and GIS (Technology development)
Start date or starting event: 0

Participant number:            P1   P4   P3   P5   P6
Person-months per participant: 30   9    2    18   10




Objectives


This workpackage has the overall task of technologically integrating the existing GIS and Data Mining software, and
of incorporating the modules developed in the other workpackages in a coherent manner. It is the project's technological
hub, to which all partners will deliver, and whose deliverables all partners will need to have access to at some point. For
tight integration of existing components, a common Task manager, Data Management Layer, Extension API, and user
interface have to be defined and implemented. The base system is designed as an object-oriented plug-in architecture,
facilitating technological integration. Unified Modelling Language (UML) will be used for documentation and design to
ensure product quality. CORBA and RMI will be evaluated as middleware for integration. The integrated system
incorporates the Spatial Mining and visualization methods developed in WP3-7 into the base system.
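
The plug-in idea can be illustrated with a minimal Java sketch. It is an illustration under stated assumptions, not the SPIN! design: the names SpatialMiningPlugin, MeanPlugin, and TaskManager, and all method signatures, are hypothetical and chosen only to show how an extension interface lets a task manager run modules it knows only by name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension point: each analysis or visualization module implements it. */
interface SpatialMiningPlugin {
    /** Unique name under which the module is registered. */
    String name();

    /** Runs the module on one attribute column and returns a short textual result. */
    String run(List<Double> values);
}

/** Illustrative plug-in only: reports the arithmetic mean of the supplied values. */
class MeanPlugin implements SpatialMiningPlugin {
    public String name() {
        return "mean";
    }

    public String run(List<Double> values) {
        if (values.isEmpty()) {
            return "n/a";
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return String.valueOf(sum / values.size());
    }
}

/** Minimal task manager (assumed design): registers plug-ins and dispatches tasks by name. */
class TaskManager {
    private final Map<String, SpatialMiningPlugin> plugins = new LinkedHashMap<>();

    /** Adds a plug-in; a later registration with the same name replaces the earlier one. */
    void register(SpatialMiningPlugin plugin) {
        plugins.put(plugin.name(), plugin);
    }

    /** Returns the registered plug-in names in registration order. */
    List<String> pluginNames() {
        return new ArrayList<>(plugins.keySet());
    }

    /** Looks up the named plug-in and runs it; unknown names are rejected. */
    String execute(String pluginName, List<Double> values) {
        SpatialMiningPlugin plugin = plugins.get(pluginName);
        if (plugin == null) {
            throw new IllegalArgumentException("Unknown plug-in: " + pluginName);
        }
        return plugin.run(values);
    }
}
```

In such a design, a new method from another workpackage would be added by implementing the interface and registering it, without changing the task manager. Whether the calls stay in-process or go through middleware such as RMI or CORBA is left open here, as it is in the text.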


Description of work


T1. Organize kick-off meeting for identification of user needs

T2. Design of the SPIN! system architecture

T3. Develop efficient methods for transfer of data and maps over the internet (partner P6)

T4. Implementation of developer version (prototype 0)

T5. Technological integration of software developed in Tasks 1.3, 1.4 with spatial mining modules and visualization
modules, resulting in prototype 1

T6. Testing and validation, revision of design, getting user input, improving system, resulting in prototype 2

T7. Revision release of second prototype (final release)