Technical Report


Nov 25, 2013



Contact:
Louis Reymondin
Research Assistant
International Center for Tropical Agriculture (CIAT)
Cali, Colombia, Tel: 57 2 4450000 Ext 3455
Email: louis.reymondin@gmail.com, l.reymondin@cgiar.com
www.terra-i.org





Habitat changes monitoring using neural networks and satellite data MODIS and TRMM.

Clustering improvements





Introduction



Although the approach is based on training a forecasting model on a per-pixel basis, training a separate model for every pixel is not computationally efficient. A clustering analysis of the training dataset is therefore needed to group together pixels with the same NDVI trends.

A good calibration of the number of clusters needed to analyze a given area with Terra-i is important. With too few clusters, the neural network will not be able to fit the NDVI time series, because too many different vegetation types are included within the same cluster. On the other hand, too many clusters will lead the neural networks to overfit anomalies (such as clouds) within the NDVI time series.

Unfortunately, setting the right number of clusters is a difficult task that requires significant trial and error.

Methodology



In this section, a clustering quality index [Liao, 2005] is presented. This index can be calculated for different clustering results in order to compare them. The higher the index, the better the clustering result.



[Equation: definition of the index]

With K being the number of clusters and

[Equations: definitions of the terms of the index]

The remaining terms are: N, the total number of data points; U, a matrix indicating whether each point belongs to a given cluster; the gravity center of the whole dataset; the gravity center of each cluster; and d(x; y), the distance function between two data points. p is a parameter used to increase the contrast between two indices and is usually set to 2. One can note that the values g and c are constant for a given dataset and can therefore be calculated once and reused for all clustering results.
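A minimal sketch of how such an index could be computed, assuming it takes the form ((1/K) * (E_1 / E_K) * D_K)^p. Here E_1 is the sum of distances from every point to the gravity center of the whole dataset, E_K is the sum of distances from every point to the gravity center of its own cluster, and D_K is the largest distance between two cluster gravity centers. This form, the names E_1, E_K and D_K, and the function name validity_index are assumptions made for illustration; they are consistent with the terms described above but are not confirmed by this report.

```python
import math


def dist(x, y):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(points):
    """Gravity center (coordinate-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def validity_index(points, labels, p=2):
    """Assumed index: ((1/K) * (E_1 / E_K) * D_K) ** p. Higher is better."""
    clusters = {}
    for x, k in zip(points, labels):
        clusters.setdefault(k, []).append(x)
    K = len(clusters)
    if K < 2:
        raise ValueError("the index needs at least two non-empty clusters")

    # E_1: spread around the gravity center of the whole dataset.
    # It does not depend on the clustering, so it could be cached per dataset.
    g = centroid(points)
    e_1 = sum(dist(x, g) for x in points)

    # E_K: spread of each point around the gravity center of its own cluster.
    centers = {k: centroid(members) for k, members in clusters.items()}
    e_k = sum(dist(x, centers[k]) for x, k in zip(points, labels))
    if e_k == 0:
        raise ValueError("every point coincides with its cluster center")

    # D_K: largest separation between two cluster gravity centers.
    cs = list(centers.values())
    d_k = max(dist(a, b) for i, a in enumerate(cs) for b in cs[i + 1:])

    return ((1.0 / K) * (e_1 / e_k) * d_k) ** p


# Toy data: two tight, well-separated groups.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = validity_index(pts, [0, 0, 1, 1])  # each group kept together
bad = validity_index(pts, [0, 1, 0, 1])   # groups mixed
print(good > bad)
```

On this toy data, the labelling that keeps each group together scores far higher than the one that mixes the groups, which is the behaviour expected of an index where higher means better.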

Proof of concept in southern Colombia

To test the index, we ran the K-means algorithm over an area in southern Colombia with different values of K.
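Running K-means for several values of K can be sketched as follows. This is a plain K-means (Lloyd's iterations) with a deterministic farthest-first seeding, chosen here only for reproducibility; the report does not state which initialization or implementation was used. The NDVI-like profiles are made-up illustrative values, not data from the study area.

```python
import math


def dist(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point, then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda x: min(dist(x, c) for c in centers))
        centers.append(tuple(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(x, centers[j]))
                      for x in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers


# Hypothetical NDVI-like profiles (4 time steps per pixel), illustrative only.
pixels = [
    (0.80, 0.82, 0.81, 0.80),  # high and flat, forest-like
    (0.79, 0.81, 0.80, 0.82),
    (0.30, 0.60, 0.55, 0.25),  # strongly seasonal, savanna-like
    (0.28, 0.62, 0.57, 0.27),
    (0.15, 0.18, 0.16, 0.15),  # low and flat, sparse vegetation
    (0.14, 0.17, 0.15, 0.16),
]

for k in (2, 3, 4):
    labels, _ = kmeans(pixels, k)
    print(k, labels)
```

Each labelling produced this way can then be scored with the quality index, so that the values of K can be compared on the same dataset.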


Figure 1. The study area in southern Colombia


Figure 1 shows the NDVI values at the beginning of the year 2000 in the study area. It includes the departments of Amazonas, Arauca, Caqueta, Casanare, Guainía, Guaviare, Meta, Putumayo, Vaupes and Vichada.

The area is principally characterized by the tropical and subtropical moist broadleaf forest biome (72.9%). While this biome dominates the southern majority of the area, the biomes located in the northeast of the area are, in order of prevalence, tropical and subtropical grasslands, savannas and shrublands (23.3%), tropical and subtropical dry broadleaf forests (3.6%) and montane grasslands and shrublands (0.2%).


Figure 2. Quality index for different numbers of clusters

The graph in figure 2 shows the quality index for the different values of K. It seems to indicate that good quality results are situated between 3 and 10 clusters. As seen in figure 3, all the clustering results with a number of clusters between 3 and 10 are valid in comparison with figure 1. In contrast, the clustering results between 15 and 50 clusters are much noisier, which is consistent with figure 2.



[Figure 2 plot. Axes: Quality Index, Clusters]






Figure 3. Cluster distribution in southern Colombia

Choosing the best result within the window delimited using the index is still in the hands of the users. Nevertheless, the index can be used again to select the best candidates (here, for example, 3, 4, 5 and 10 clusters). Choosing one solution over another depends on the user and on the need to focus on particular areas or vegetation types.


Conclusion



The index can be a useful tool to help Terra-i users set the optimal number of clusters by reducing the amount of trial and error needed.

Additionally, other clustering algorithms, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Growing Neural Gas and hierarchical clustering, could also be tested in future research, as they do not require the number of clusters as input.