Spatial Data Mining

Introduction
• Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets
  – E.g. co-location patterns of water pumps and cholera
  – Determining hotspots: unusual locations
• Spatial Data Mining Tasks
  – Classification/Prediction
  – Co-location Mining
  – Clustering
• Recap of special properties of Spatial Data
  – Spatial autocorrelation
  – Spatial heterogeneity
  – Implicit Spatial Relations

Spatial Relations
• Spatial databases do not store spatial relations explicitly
  – Additional functionality is required to compute them
• Three types of spatial relations are specified by the OGC reference model
  – Distance relations
    • Euclidean distance between two spatial features
  – Direction relations
    • Ordering of spatial features in space
  – Topological relations
    • Characterise the type of intersection between spatial features
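
Distance relations
• If dist is a distance function and c is some real number, the possible distance relations between A and B are
  1. dist(A,B) > c,
  2. dist(A,B) < c and
  3. dist(A,B) = c
[Figure: three pairs of features A and B, one pair per case]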
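The three cases can be sketched in a few lines. This is a minimal illustration, assuming point features given as coordinate tuples and a Euclidean dist; the function name `distance_relation` and the tolerance `tol` used for the equality case are illustrative choices, not part of the OGC model.

```python
import math

def distance_relation(a, b, c, tol=1e-9):
    """Return which of the three distance relations holds for points a and b."""
    d = math.dist(a, b)  # Euclidean distance between the two points
    # Floating-point distances are rarely exactly equal, so compare with a tolerance.
    if math.isclose(d, c, rel_tol=0.0, abs_tol=tol):
        return "dist(A,B) = c"
    return "dist(A,B) > c" if d > c else "dist(A,B) < c"

print(distance_relation((0, 0), (3, 4), 5))   # dist(A,B) = c
print(distance_relation((0, 0), (3, 4), 2))   # dist(A,B) > c
```

For lines and polygons, dist would normally be the minimum distance between the two geometries, which a spatial DBMS or geometry library computes.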

Direction relations
• Suppose the directions of B and C are required with respect to A
• Define a representative point, rep(A)
• rep(A) defines the origin of a virtual coordinate system
• The quadrants and half planes define the direction relations
• B can have two values {northeast, east}
• Exact direction relation is northeast
[Figure: features A, B and C with origin rep(A); C north A, B northeast A]
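A rough sketch of the idea, under two assumptions that the slide does not fix: rep() is taken to be the mean of a feature's vertices, and the target feature is also reduced to its representative point, so each pair gets a single exact direction. The coordinates for A, B and C are made up to mirror the figure.

```python
def rep(points):
    """Representative point of a feature: here the mean of its vertices (an assumption)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def direction(target, reference):
    """Direction of `target` with respect to `reference`, both given as vertex lists."""
    ox, oy = rep(reference)            # origin of the virtual coordinate system
    tx, ty = rep(target)
    dx, dy = tx - ox, ty - oy
    ns = "north" if dy > 0 else "south" if dy < 0 else ""
    ew = "east" if dx > 0 else "west" if dx < 0 else ""
    return (ns + ew) or "same"

# Hypothetical square features
A = [(0, 0), (2, 0), (2, 2), (0, 2)]   # rep(A) = (1, 1)
B = [(4, 3), (6, 3), (6, 5), (4, 5)]   # rep(B) = (5, 4)
C = [(0, 5), (2, 5), (2, 7), (0, 7)]   # rep(C) = (1, 6)

print(direction(B, A))  # northeast
print(direction(C, A))  # north
```

An extended feature that spans more than one quadrant or half plane could satisfy several relations at once, which this point-based sketch does not capture.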

Topological Relations
• Topological relations describe how geometries intersect spatially
• Simple geometry types
  – Point, 0-dimensional
  – Line, 1-dimensional
  – Polygon, 2-dimensional
• Each geometry is represented in terms of
  – boundary (B)
    – geometry of the lower dimension
  – interior (I)
    – points of the geometry when the boundary is removed
  – exterior (E)
    – points not in the interior or boundary
• Examples for simple geometries
  – For a point, I = {point}, B = {} and E = {points not in I or B}
  – For a line, I = {points except boundary points}, B = {two end points} and E = {points not in I or B}
  – For a polygon, I = {points within the boundary}, B = {the boundary} and E = {points not in I or B}
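To make the line case concrete, here is a small sketch that decides whether a point falls in the interior, boundary or exterior of a straight line segment. The function name `locate_on_line` is illustrative, and integer coordinates are assumed so that the collinearity test is exact.

```python
def locate_on_line(p, start, end):
    """Classify point p against the segment start-end as 'I', 'B' or 'E'."""
    if p == start or p == end:
        return "B"                       # boundary: the two end points
    (px, py), (sx, sy), (ex, ey) = p, start, end
    # Cross product is zero when p is collinear with the segment.
    cross = (ex - sx) * (py - sy) - (ey - sy) * (px - sx)
    within_box = (min(sx, ex) <= px <= max(sx, ex)
                  and min(sy, ey) <= py <= max(sy, ey))
    if cross == 0 and within_box:
        return "I"                       # on the segment, but not an end point
    return "E"                           # everything else

print(locate_on_line((2, 2), (0, 0), (4, 4)))  # I
print(locate_on_line((0, 0), (0, 0), (4, 4)))  # B
print(locate_on_line((5, 5), (0, 0), (4, 4)))  # E
```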

DE-9IM
• Topological relations are defined using any one of the following models
  – 4IM, four intersection model (only B and I considered)
  – 9IM, nine intersection model (B, I, and E)
  – DE-9IM, dimensionally extended 9 intersection model
• DE-9IM is an OGC compliant model
• Each entry of the DE-9IM matrix is Dim(X ∩ Y) for one part X of A and one part Y of B, where Dim is the dimension function
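Written out in the I/B/E notation of these slides, the DE-9IM matrix of two geometries A and B has the following form. Here dim returns 0, 1 or 2 for a non-empty intersection and a negative value (conventionally -1, written F) for an empty one.

```latex
\mathrm{DE9IM}(A,B) =
\begin{bmatrix}
\dim(I(A)\cap I(B)) & \dim(I(A)\cap B(B)) & \dim(I(A)\cap E(B)) \\
\dim(B(A)\cap I(B)) & \dim(B(A)\cap B(B)) & \dim(B(A)\cap E(B)) \\
\dim(E(A)\cap I(B)) & \dim(E(A)\cap B(B)) & \dim(E(A)\cap E(B))
\end{bmatrix}
```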

Example
• Consider two polygons
  – A: POLYGON ((10 10, 15 0, 25 0, 30 10, 25 20, 15 20, 10 10))
  – B: POLYGON ((20 10, 30 0, 40 10, 30 20, 20 10))

9-Intersection Matrix of example geometries
[Figure: 3x3 grid with rows I(A), B(A), E(A) and columns I(B), B(B), E(B), showing each intersection for the example polygons]

DE-9IM for the example geometries
        I(B)  B(B)  E(B)
I(A)     2     1     2
B(A)     1     0     1
E(A)     2     1     2
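The matrix is usually passed around as a 9-character string, read row by row. A minimal sketch of that flattening follows; the helper name `de9im_string` is illustrative, and empty intersections are assumed to be stored as negative numbers and written as F.

```python
def de9im_string(matrix):
    """Flatten a 3x3 matrix of dimensions row by row; negative (empty) becomes 'F'."""
    return "".join("F" if d < 0 else str(d) for row in matrix for d in row)

example = [
    [2, 1, 2],   # I(A) against I(B), B(B), E(B)
    [1, 0, 1],   # B(A)
    [2, 1, 2],   # E(A)
]
print(de9im_string(example))  # 212101212
```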

Relationships using DE-9IM
• Different geometries may give rise to different numbers in the DE-9IM
• For a specific type of relationship we are only interested in certain values in certain positions
  – That is, we are interested in patterns in the matrix rather than actual values
• Actual values are replaced by wild cards
  – T: value is "true", i.e. non-empty, any dimension >= 0
  – F: value is "false", i.e. empty, dimension < 0
  – *: Don't care what the value is
  – 0: value is exactly zero
  – 1: value is exactly one
  – 2: value is exactly two

A overlaps B
        I(B)  B(B)  E(B)
I(A)     T     *     T
B(A)     *     *     *
E(A)     T     *     *
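The wild-card rules translate directly into a cell-by-cell comparison. The sketch below assumes both the matrix and the pattern are given as 9-character strings read row by row; the function names are illustrative.

```python
def cell_matches(value, symbol):
    """Check one DE-9IM cell ('F', '0', '1' or '2') against one pattern symbol."""
    if symbol == "*":
        return True                  # don't care
    if symbol == "T":
        return value in "012"        # non-empty, any dimension >= 0
    return value == symbol           # 'F', '0', '1', '2' must match exactly

def matches(de9im, pattern):
    """True if every cell of the 9-character string fits the 9-character pattern."""
    if len(de9im) != 9 or len(pattern) != 9:
        raise ValueError("both strings must have 9 characters")
    return all(cell_matches(v, s) for v, s in zip(de9im, pattern))

# The overlaps pattern from the table above, read row by row: T*T***T**
print(matches("212101212", "T*T***T**"))  # True
print(matches("212101212", "FF*FF****"))  # False
```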

Topological Relations
• x.Disjoint(y)
  – FF*FF****
• x.Touches(y) (Area/Area, Line/Line, Line/Area, Point/Area; not Point/Point)
  – FT*******
  – F**T*****
  – F***T****
• x.Crosses(y)
  – T*T****** Point/Line, Point/Area, Line/Area
  – 0******** Line/Line
• x.Within(y)
  – T*F**F***
• x.Overlaps(y)
  – T*T***T** Point/Point, Area/Area
  – 1*T***T** Line/Line
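• DE-9IM string for example geometries was ‘212101212’ (from earlier slide)
  – A overlaps B: the string matches the Area/Area pattern T*T***T**
  – The string also fits T*T******, but that Crosses pattern is listed only for Point/Line, Point/Area and Line/Area, so it does not apply to two polygons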
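As a quick check, the sketch below lists which of the patterns above the example string fits. It translates each pattern into a regular expression, which is one possible approach. The dictionary holds only a subset of the relations, and the geometry-type annotations beside each pattern are deliberately not enforced here.

```python
import re

# Patterns from the list above (relation name -> patterns).
# The geometry-type annotations are NOT enforced in this sketch.
PATTERNS = {
    "disjoint": ["FF*FF****"],
    "touches": ["FT*******", "F**T*****", "F***T****"],
    "crosses": ["T*T******", "0********"],
    "overlaps": ["T*T***T**", "1*T***T**"],
}

def to_regex(pattern):
    """Translate a DE-9IM pattern into a compiled regular expression."""
    table = {"T": "[012]", "*": "[F012]"}
    return re.compile("".join(table.get(ch, ch) for ch in pattern))

def fitting_patterns(de9im):
    """Return {relation: [patterns the string fits]} for relations with a match."""
    result = {}
    for relation, patterns in PATTERNS.items():
        hits = [p for p in patterns if to_regex(p).fullmatch(de9im)]
        if hits:
            result[relation] = hits
    return result

print(fitting_patterns("212101212"))
```

A fuller version would also record the geometry types of the two inputs and consult only the patterns listed for that pair of types.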

Approaches to Spatial Data Mining
• Materialize spatial features and use Weka
  – Required features are added as additional attributes to the main feature
  – This creates a flat file of data
• Use special data mining techniques that take spatial dependency into account
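The first approach can be illustrated with a small sketch. Everything in it is hypothetical: the themes (houses as the main feature, water pumps as a second theme), the attribute names and the radius. The point is only that spatial relations are computed once and stored as ordinary columns.

```python
import math

# Hypothetical themes: houses (main feature) and water pumps.
houses = [
    {"id": "h1", "xy": (0.0, 0.0), "cholera_cases": 3},
    {"id": "h2", "xy": (10.0, 0.0), "cholera_cases": 0},
]
pumps = [(1.0, 1.0), (20.0, 5.0)]

def materialize(houses, pumps, radius=5.0):
    """Store spatial relations to the pump theme as plain attributes of each house."""
    rows = []
    for h in houses:
        dists = [math.dist(h["xy"], p) for p in pumps]
        rows.append({
            "id": h["id"],
            "cholera_cases": h["cholera_cases"],
            "dist_nearest_pump": round(min(dists), 3),               # distance relation
            "pumps_within_radius": sum(d <= radius for d in dists),  # neighbourhood count
        })
    return rows

for row in materialize(houses, pumps):
    print(row)
```

The resulting rows no longer contain geometry, so a non-spatial tool can mine them directly.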

Materializing features - Example

Materializing features - Example (2)

Spatial Data Mining Architecture
• Retrieve data belonging to multiple themes
• Preprocess spatial data to materialize spatial features
  – Select the required features
  – Use the methods to compute spatial relations to create a flat file of data
• Use a Weka-like tool to perform data mining
[Figure: architecture diagram with boxes labelled Multiple Themes; OGC Compliant Spatial DBMS; Feature Selection & OGC compliant methods to compute relations; Flat File; Weka]
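For the flat-file step, Weka reads the ARFF text format. The sketch below writes numeric attributes only. The relation name, attribute names and values are hypothetical, and a real export would also need nominal attributes and missing-value handling.

```python
import io

def write_arff(relation, attributes, rows, out):
    """Write rows with numeric attributes in Weka's ARFF text format."""
    out.write(f"@relation {relation}\n\n")
    for name in attributes:
        out.write(f"@attribute {name} numeric\n")
    out.write("\n@data\n")
    for row in rows:
        out.write(",".join(str(row[name]) for name in attributes) + "\n")

# Hypothetical materialized rows (one per main feature)
rows = [
    {"cholera_cases": 3, "dist_nearest_pump": 1.414},
    {"cholera_cases": 0, "dist_nearest_pump": 9.055},
]
buffer = io.StringIO()
write_arff("houses", ["cholera_cases", "dist_nearest_pump"], rows, buffer)
print(buffer.getvalue())
```

Writing to a file object instead of the in-memory buffer produces a file that can be opened in Weka.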

Spatial Clustering
• Also called spatial segmentation
• Input
  – A table of area names and their corresponding attributes such as population density, number of adult illiterates etc.
  – Information about the neighbourhood relationships among the areas
  – A list of categories/classes of the attributes
• Output
  – Grouped (segmented) areas where each group has areas with similar attribute values
• Census Website has plenty of examples
  – http://www.statistics.gov.uk/census2001/censusmaps/index.html

Similarity with image segmentation
• Spatial segmentation is performed in image processing
  – Identify regions (areas) of an image that have similar colour (or other image attributes).
  – Many image segmentation techniques are available
    • E.g. region-growing technique
[Figure: grid of cells labelled 1 or 2, illustrating segments of similar values]

Region Growing Technique
• There are many flavours of this technique
• One of them is described below:
  – Assign seed areas to each of the segments (classes of the attribute)
  – Add neighbouring areas to these segments if the incoming areas have similar values of attributes
  – Repeat the above step until all the regions are allocated to one of the segments
• Functionality to compute spatial relations (neighbours) is assumed
[Figure: areas labelled 1 or 2 showing segment membership]
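The steps above can be sketched as follows. This is one possible flavour, not necessarily the one intended on the slide. It assumes a single numeric attribute per area, measures similarity against the seed's value, and always adds the most similar neighbouring area next. The neighbour relation is taken as given, here as a precomputed dictionary, and all area names and values are made up.

```python
def region_growing(values, neighbours, seeds):
    """Grow one segment per seed area; returns {area: segment label}."""
    label = {area: seg for seg, area in seeds.items()}
    seed_value = {seg: values[area] for seg, area in seeds.items()}
    while len(label) < len(values):
        # Unassigned areas that touch an existing segment, scored by similarity.
        candidates = [
            (abs(values[n] - seed_value[label[a]]), n, label[a])
            for a in label
            for n in neighbours[a]
            if n not in label
        ]
        if not candidates:
            break                      # remaining areas are not connected to any seed
        _, area, seg = min(candidates)
        label[area] = seg              # add the most similar neighbouring area
    return label

# Hypothetical 2x3 grid of areas: top row a b c, bottom row d e f
values = {"a": 10, "b": 12, "c": 11, "d": 30, "e": 29, "f": 31}
neighbours = {
    "a": ["b", "d"], "b": ["a", "c", "e"], "c": ["b", "f"],
    "d": ["a", "e"], "e": ["b", "d", "f"], "f": ["c", "e"],
}
seeds = {1: "a", 2: "f"}

print(region_growing(values, neighbours, seeds))
```

With these values, areas a, b and c end up in segment 1 and areas d, e and f in segment 2. A similarity threshold could be added so that dissimilar areas stay unassigned.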

Summary
• Spatial data storage is available as extensions of RDBMS
• Visualization of spatial data is available in GIS
• Spatial Data Mining requires functionality to compute spatial relations
• OGC specifications provide the standards for all the above resources
• MySQL provides spatial data storage
  – But only partially provides the functionality for computing relations
• Several open-source systems provide all the above resources for spatial data
  – OpenJump, GeoTools