# Spatial Data Mining

Διαχείριση Δεδομένων

20 Νοε 2013 (πριν από 4 χρόνια και 7 μήνες)

130 εμφανίσεις

Spatial Data Mining

2

Introduction

Spatial data mining is the process of discovering
interesting, useful, non
-
trivial patterns from large
spatial

datasets

E.g. co
-
location patterns of water pumps and cholera

Determining hotspots: unusual locations

Classification/Prediction

Co
-
location Mining

Clustering

Recap of special properties of Spatial Data

Spatial autocorrelation

Spatial heterogeneity

Implicit Spatial Relations

3

Spatial Relations

Spatial databases do not store spatial
relations explicitly

Additional functionality required to compute them

Three types of spatial relations specified by
the OGC reference model

Distance relations

Euclidean distance between two spatial features

Direction relations

Ordering of spatial features in space

Topological relations

Characterise the type of intersection between spatial
features

4

Distance relations

If dist is a distance
function and c is
some real number

1.
dist(A,B)>c,

2.
dist(A,B)<c and

3.
dist(A,B)=c

A

B

A

B

B

A

5

Direction relations

If directions of B and C
are required with
respect to A

Define a representative
point, rep(A)

rep(A) defines the
origin of a virtual
coordinate system

planes define the
direction relations

B can have two values
{northeast, east}

Exact direction relation
is northeast

A

C

B

rep(A)

C north A

B northeast A

6

Topological Relations

Topological relations describe how geometries
intersect spatially

Simple geometry types

Point, 0
-
dimension

Line, 1
-
dimension

Polygon, 2
-
dimension

Each geometry represented in terms of

boundary (B)

geometry of the lower dimension

interior (I)

points of the geometry when boundary is
removed

exterior (E)

points not in the interior or boundary

Examples for simple geometries

For a point, I = {point}, B={} and E={Points not in I and B}

For a line, I={points except boundary points}, B={two end
points} and E={Points not in I and B}

For a polygon, I={points within the boundary}, B={the
boundary} and E={points not in I and B}

7

DE
-
9IM

Topological relations are defined using any
one of the following models

4IM, four intersection model (only B and E
considered)

9IM, nine intersection models (B, I, and E)

DE
-
9IM, dimensionally extended 9 intersection
model

DE
-
9IM is an OGC complaint model

Dim is the dimension function

8

Example

Consider two
polygons

A
-

POLYGON ((10
10, 15 0, 25 0, 30 10,
25 20, 15 20, 10 10))

B
-

POLYGON ((20
10, 30 0, 40 10, 30
20, 20 10))

9

I(B)

B(B)

E(B)

I(A)

B(A)

E(A)

9
-
Intersection Matrix of example
geometries

10

DE
-
9IM for the example
geometries

I(B)

B(B)

E(B)

I(A)

2

1

2

B(A)

1

0

1

E(A)

2

1

2

11

Relationships using DE
-
9IM

Different geometries may give
rise to different numbers in the
DE
-
9IM

For a specific type of
relationship we are only
interested in certain values in
certain positions

That is, we are interested in
patterns in the matrix than
actual values

Actual values are replaced by
wild cards

T: value is "true"
-

non empty
-

any dimension >= 0

F: value is "false"
-

empty
-

dimension < 0

*: Don't care what the value is

0: value is exactly zero

1: value is exactly one

2: value is exactly two

A
over
laps
B

I(B)

B(B)

E(B)

I(A)

T

*

T

B(A)

*

*

*

E(A)

T

*

*

12

Topological Relations

x.Disjoint(y)

FF*FF****

x.Touches(y)

FT******* Area/Area, Line/Line, Line/Area, Point/Area

F**T***** Not Point/Point

F***T****

x.Crosses(y)

T*T****** Point/Line, Point/Area, Line/Area

0******** Line/Line

x.Within(y)

TF*F*****

x.Overlaps(y)

T*T***T** Point/Point, Area/Area

1*T***T** Line/Line

DE
-
9IM string for example geometries was ‘212101212’ (from earlier
slide)

A crosses B

A overlaps B

13

Approaches to Spatial Data
Mining

Materialize spatial features and use
Weka

attributes to the main feature

To create a flat file of data

Use special data mining techniques that
take spatial dependency into account

14

Materializing features
-

Example

15

Materializing features
-

Example
(2)

16

Spatial Data Mining Architecture

Retrieve data belonging
to multiple themes

Preprocess spatial data
to materialize spatial
features

Select the required
features

Use the methods to
compute spatial relations
to create a flat file of
data

Use Weka like tool to
perform data mining

OGC Complaint Spatial DBMS

Feature Selection &

OGC complaint methods

to compute relations

Weka

Flat File

Multiple Themes

17

Spatial Clustering

Also called spatial segmentation

Input

a table of area names and their corresponding attributes such as population
density, number of adult illiterates etc.

Information about the neighbourhood relationships among the areas

A list of categories/classes of the attributes

Output

Grouped (segmented) areas where each group has areas with similar
attribute values

Census Website has plenty of examples

http://www.statistics.gov.uk/census2001/censusma
ps/index.html

18

Similarity with image
segmentation

Spatial segmentation is
performed in image
processing

Identify regions (areas)
of an image that have
similar colour (or other
image attributes).

Many image segmentation
techniques are available

E.g. region
-
growing
technique

2

2

2

2

2

2

2

2

2

2

2

2

1

1

1

1

2

2

2

2

1

1

1

1

1

1

1

1

1

1

1

1

19

Region Growing Technique

There are many flavours of
this technique

One of them is described
below:

Assign seed areas to each of
the segments (classes of the
attribute)

these segments if the
incoming areas have similar
values of attributes

Repeat the above step until
all the regions are allocated
to one of the segments

Functionality to compute
spatial relations (neighbours)
assumed

1

1

1

1

1

2

2

2

2

2

2

2

1

20

Summary

Spatial data storage available as extensions of
RDBMS

Visualization of Spatial data available in GIS

Spatial Data Mining requires functionality to compute
spatial relations

OGC specifications provide the standards for all the
above resources

MYSQL provides data spatial data storage

But only partially provides the functionality for computing
relations

Several OpenSource systems provide all the above
resources for spatial data

OpenJump, GeoTools