Oct 20, 2013

Optimized Weather Pattern Identification Via Unsupervised Neural Network Techniques

Jeffrey Copeland


Work supported by the U.S. Army National Ground Intelligence Center (NGIC); also DHS, BRGM, & JPEO-CBD


March 9, 2012

Motivation


How can we make better sense of very large data sets?


Traditional “low-order” statistics do not reveal much, and may be misleading.

What we are really interested in is the patterns, their frequencies of occurrence, and their changes with time.

The classification method should not require input from a subject matter expert.




Self-Organizing Maps

Each node is aware of what is happening in its neighborhood and responds to changes in the neighborhood

1. A regular mapping of nodes is initialized and trained

   Randomly selected from training vectors

   Randomly chosen to span range of training data

   Etc.
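The two initialization options listed above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `init_from_samples` and `init_span_range`, the flat list-of-nodes layout, and the toy training data are all assumptions made for the example.

```python
import random

def init_from_samples(training, n_nodes, rng):
    """Option 1: each node starts as a copy of a randomly selected training vector."""
    return [list(rng.choice(training)) for _ in range(n_nodes)]

def init_span_range(training, n_nodes, rng):
    """Option 2: each node component is drawn uniformly from the range of the training data."""
    lows = [min(col) for col in zip(*training)]
    highs = [max(col) for col in zip(*training)]
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
            for _ in range(n_nodes)]

# Hypothetical two-component training vectors, for illustration only.
training = [[0.0, 10.0], [1.0, 12.0], [2.0, 11.0], [3.0, 15.0]]
rng = random.Random(0)
nodes_a = init_from_samples(training, 6, rng)
nodes_b = init_span_range(training, 6, rng)
```

Either choice keeps the initial nodes inside the region of input space that the training data actually occupies.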

[Figure: Training Vectors and SOM]


2. For each training vector, find the best matching node

   Based on a distance measure (Euclidean, etc.)
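A minimal sketch of the best-matching-node search with Euclidean distance is shown here. The function name `best_matching_node` and the sample vectors are assumptions for illustration; any other distance measure could be substituted for `math.dist`.

```python
import math

def best_matching_node(nodes, vector):
    """Return the index of the node closest to `vector` in Euclidean distance."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], vector))

# Hypothetical node weights and one training vector.
nodes = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
bmu = best_matching_node(nodes, [0.9, 1.2])  # index 1: [1.0, 1.0] is nearest
```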


3. Best matching node and neighbors are made to look more like the training vector

   $W_{t+1} = W_t + \Theta_t L_t (V_t - W_t)$

   $\Theta$: neighborhood function

   $L$: learning rate
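The update rule above can be written out directly. In this sketch the neighborhood weights are passed in as a precomputed list `theta` (one value per node), which is an assumption made to keep the example short; the function name and numbers are likewise illustrative.

```python
def update_nodes(nodes, vector, theta, rate):
    """Apply W_{t+1} = W_t + Theta_t * L_t * (V_t - W_t) to every node.

    theta[i] is the neighborhood weight of node i: 1.0 at the best matching
    node, smaller for its neighbors, 0.0 for nodes outside the neighborhood.
    """
    return [[w + th * rate * (v - w) for w, v in zip(node, vector)]
            for node, th in zip(nodes, theta)]

# Hypothetical values: node 0 is the best match, node 1 a neighbor, node 2 far away.
nodes = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
vector = [1.0, 1.0]
theta = [1.0, 0.5, 0.0]
updated = update_nodes(nodes, vector, theta, rate=0.5)
# updated == [[0.5, 0.5], [1.75, 1.75], [10.0, 10.0]]
```

The best matching node moves halfway toward the training vector, its neighbor moves a quarter of the way, and the distant node is unchanged.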


4. Neighborhood decreases with successive iterations

   Exact form of the weights is not critical; examples:

   $L_t = L_0 e^{-t/\lambda}$

   $\rho_t = \rho_0 e^{-t/\lambda}$

   $\Theta_t = \exp(-d^2 / 2\rho_t^2)$
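The example schedules above translate into a few lines of code. The sketch assumes that ρ is the neighborhood radius and d is the distance on the map between a node and the best matching node; the slides do not define either symbol, so this reading is an assumption, as are the parameter values.

```python
import math

def learning_rate(t, l0, lam):
    """L_t = L_0 * exp(-t / lambda)."""
    return l0 * math.exp(-t / lam)

def radius(t, rho0, lam):
    """rho_t = rho_0 * exp(-t / lambda)."""
    return rho0 * math.exp(-t / lam)

def neighborhood(d, rho_t):
    """Theta_t = exp(-d^2 / (2 * rho_t^2)), a Gaussian in map distance d."""
    return math.exp(-d ** 2 / (2.0 * rho_t ** 2))

# Hypothetical settings: initial rate 0.5, initial radius 3 nodes, time constant 100.
early = neighborhood(2.0, radius(0, rho0=3.0, lam=100.0))
late = neighborhood(2.0, radius(200, rho0=3.0, lam=100.0))
```

With these settings a node two grid steps from the best match is pulled noticeably at the start of training (`early`) and almost not at all after 200 iterations (`late`), which is the shrinking neighborhood the step describes.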


$W_{\text{final}} \sim \sum_i w_i V_i$

Similar to, but not, the true average of the members


Neighboring nodes bear a strong similarity to each other.

Difficulty in interpreting the large number of resulting clusters. How many?

   Too many can reduce the importance of individual clusters

   Too few can lead to overestimating within-cluster variance and to errors in selecting a typical day

[Figure: Wind speed]

The goal is to develop an objective method to determine the clusters that the eye can readily identify


Optimize SOM Patterns

Optimize number of patterns by:

   Performing hierarchical clustering for each permutation of SOM patterns

   Computing the Davies-Bouldin metric (mean cluster scatter / cluster separation) for each hierarchical cluster

   Defining the optimal number of clusters by the minimum of the DB curve
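One way the optimization steps above could be set up is sketched below. Several things here are assumptions rather than the author's method: the hierarchy is built by simple centroid-linkage merging, "each permutation" is read as each level of that hierarchy, the Davies-Bouldin index uses its standard textbook definition, and the pattern vectors are made-up two-component examples. The sketch also assumes the patterns are distinct, so that no two centroids coincide.

```python
import math

def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation_ij."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

def agglomerate(points):
    """Yield the partition at each level of a centroid-linkage hierarchy."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        yield [list(c) for c in clusters]
        cents = [centroid(c) for c in clusters]
        _, i, j = min((math.dist(cents[i], cents[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]

def optimal_partition(points):
    """Pick the hierarchy level with the minimum Davies-Bouldin value.

    The all-singletons level is skipped: its scatter is zero, so its DB value
    is trivially zero and carries no information.
    """
    levels = [p for p in agglomerate(points) if 2 <= len(p) < len(points)]
    return min(levels, key=davies_bouldin)

# Hypothetical SOM patterns forming three visually obvious groups.
patterns = [[0, 0], [0, 1], [1, 0],
            [10, 10], [10, 11], [11, 10],
            [20, 0], [20, 1], [21, 0]]
best = optimal_partition(patterns)  # three clusters of three patterns each
```

On this toy input the minimum of the DB curve falls at three clusters, matching the grouping the eye would pick.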

Use of the hierarchical optimization stage more clearly defines the relationship between climate patterns and allows for refinement of the number of cases based upon analyst workload

Looking Forward

Analyzing the climate reanalysis assumes a stationary
climate (but current ≠ historical)

NOAA Climate Forecast System (CFS) provides 4-member ensemble forecasts with up to 9-month lead

Can we use these short-term climate forecasts to re-estimate the frequency of occurrence of the historic patterns?

Is there reasonable stability between forecast leads for use as a planning tool?
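The slides leave the re-estimation as an open question. As a purely illustrative sketch, one way the counting could be set up is to assign each forecast field to its nearest historic pattern and tabulate the fractions. The function name, the nearest-pattern assignment by Euclidean distance, and all numbers below are assumptions, not results from the study.

```python
import math
from collections import Counter

def pattern_frequencies(patterns, vectors):
    """Fraction of `vectors` whose nearest pattern (Euclidean) is each pattern index."""
    counts = Counter(
        min(range(len(patterns)), key=lambda i: math.dist(patterns[i], v))
        for v in vectors
    )
    return [counts[i] / len(vectors) for i in range(len(patterns))]

# Hypothetical historic patterns and made-up field vectors.
patterns = [[0.0, 0.0], [5.0, 5.0]]
historical = [[0.1, 0.2], [0.3, -0.1], [4.8, 5.1], [5.2, 4.9]]
forecast = [[4.9, 5.0], [5.1, 5.3], [0.2, 0.1], [5.0, 4.7]]

hist_freq = pattern_frequencies(patterns, historical)  # [0.5, 0.5]
fcst_freq = pattern_frequencies(patterns, forecast)    # [0.25, 0.75]
```

Repeating the same count for each ensemble member and each lead time would give a direct way to look at the stability question raised above.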



Summary


Ongoing

   NGIC: identification of relevant patterns for T&D case studies

   NGIC: climate forecast of frequency of occurrence of relevant patterns

Future

   FAA: prediction of probability of convection on trans-oceanic air traffic routes

Past

   DHS: identification of relevant patterns for environmental impact assessment

   BRGM: identification of relevant precipitation patterns for surface trafficability

   JPEO-CBD: identification of relevant patterns for instrument siting case studies at DPG


Questions?


How to deal with the volume of data produced by GCAT?



Typically over 20,000 hourly output volumes produced per run (30-year simulation of a 30-day period)

Requirement for some form of intelligent data reduction (i.e., pattern classification, not bulk statistics)

Classification method should not require input from a subject matter expert (i.e., unsupervised learning)

Traditional clustering methods (k-means, hierarchical) can be computationally expensive for large N problems

Classify on model quantities that are relevant to the problem (training vectors)

We make use of Self-Organizing Maps (SOMs) applied to transport and dispersion environmental impact studies using climate reanalysis and forecasts.


Self-Organizing Maps


An artificial neural network technique used for pattern recognition and classification

The SOM consists of components called nodes or neurons

Nodes are usually arranged in a hexagonal or rectangular grid

The SOM describes a mapping from a higher dimensional input space to a lower dimensional map space

SOMs with a small number of nodes behave in a way that is similar to k-means

SOMs may be considered a nonlinear generalization of PCA