Optimized Weather Pattern Identification via Unsupervised Neural Network Techniques
Jeffrey Copeland
Work supported by U.S. Army National Ground Intelligence Center (NGIC); also DHS, BRGM, & JPEO-CBD
March 9, 2012
Motivation
How can we make better sense of very large data sets?
Traditional “low-order” statistics do not reveal much and may be misleading.
What we are really interested in is the patterns, their frequencies of occurrence, and their changes with time.
The classification method should not require input from a subject matter expert.
Self-Organizing Maps
Each node is aware of what is happening in its neighborhood and responds to changes in the neighborhood.
1. A regular grid of nodes is initialized and trained; initial node values can be:
– Randomly selected from the training vectors
– Randomly chosen to span the range of the training data
– Etc.
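Step 1 can be sketched in Python, assuming training vectors are equal-length lists of floats and the map is a rectangular `n_rows` × `n_cols` grid (hexagonal grids are also common). The function names are hypothetical; the two initializers mirror the first two options in the list.

```python
import random

def init_from_samples(training, n_rows, n_cols, seed=0):
    """Seed every node of an n_rows x n_cols grid with a copy of a randomly chosen training vector."""
    rng = random.Random(seed)
    return [[list(rng.choice(training)) for _ in range(n_cols)] for _ in range(n_rows)]

def init_spanning_range(training, n_rows, n_cols, seed=0):
    """Draw each node component uniformly between that component's min and max in the training data."""
    rng = random.Random(seed)
    dim = len(training[0])
    lo = [min(v[k] for v in training) for k in range(dim)]
    hi = [max(v[k] for v in training) for k in range(dim)]
    return [[[rng.uniform(lo[k], hi[k]) for k in range(dim)]
             for _ in range(n_cols)] for _ in range(n_rows)]
```

Copying with `list(...)` keeps later training updates from modifying the original training vectors.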
[Figure: training vectors and SOM]
2. For each training vector, find the best matching node
– Based on a distance measure (Euclidean, etc.)
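A minimal sketch of the best-matching-node search in step 2, assuming Euclidean distance and a grid stored as a list of rows of node vectors; `best_matching_node` is a hypothetical name, and another distance function can be passed in.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matching_node(grid, v, dist=euclidean):
    """Return the (row, col) of the node closest to training vector v under `dist`."""
    best, best_d = None, math.inf
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d = dist(w, v)
            if d < best_d:
                best, best_d = (i, j), d
    return best

grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
print(best_matching_node(grid, [0.9, 0.2]))  # (0, 1)
```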
3. The best matching node and its neighbors are made to look more like the training vector
– $W_{t+1} = W_t + \Theta_t L_t (V_t - W_t)$, where $W$ is a node's weight vector and $V$ the training vector
– $\Theta$: neighborhood function
– $L$: learning rate
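The update rule of step 3 might be applied to a whole rectangular grid as follows. This sketch assumes the Gaussian neighborhood given as an example in step 4, with the distance measured in grid cells between each node and the best matching node; `som_update` is a hypothetical helper.

```python
import math

def som_update(grid, v, bmu, learning_rate, radius):
    """Apply W <- W + Theta * L * (V - W) to every node of a rectangular grid, in place.

    Theta = exp(-d^2 / (2 * radius^2)), where d is the grid distance between
    the node and the best matching node `bmu` = (row, col).
    """
    bi, bj = bmu
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d2 = (i - bi) ** 2 + (j - bj) ** 2           # squared map distance to the BMU
            theta = math.exp(-d2 / (2.0 * radius ** 2))  # neighborhood weight, 1 at the BMU
            row[j] = [wk + theta * learning_rate * (vk - wk) for wk, vk in zip(w, v)]
    return grid
```

The best matching node moves by the full learning rate; nodes farther away on the map move proportionally less.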
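4. The neighborhood decreases with successive iterations
– The exact form of the weighting functions is not critical
– Examples:
• $L_t = L_0 e^{-t/\lambda}$
• $\rho_t = \rho_0 e^{-t/\lambda}$
• $\Theta_t = \exp\left(-\frac{d^2}{2\rho_t^2}\right)$, where $d$ is the map distance from the best matching node and $\rho_t$ is the neighborhood radius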
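The example schedules translate directly into code. In this sketch both decays share the same time constant λ (`lam`), as in the formulas above, and `d` is assumed to be the grid distance from the best matching node; the function names and parameter values are illustrative.

```python
import math

def learning_rate(t, L0, lam):
    """L_t = L_0 * exp(-t / lambda): the learning rate decays with iteration t."""
    return L0 * math.exp(-t / lam)

def neighborhood_radius(t, rho0, lam):
    """rho_t = rho_0 * exp(-t / lambda): the neighborhood radius shrinks with t."""
    return rho0 * math.exp(-t / lam)

def neighborhood(d, rho_t):
    """Theta_t = exp(-d^2 / (2 * rho_t^2)) for a node at map distance d from the best matching node."""
    return math.exp(-d ** 2 / (2.0 * rho_t ** 2))

# A node two grid cells from the best matching node receives less weight as training proceeds.
for t in (0, 50, 100):
    print(t, learning_rate(t, 0.5, 100.0), neighborhood(2.0, neighborhood_radius(t, 3.0, 100.0)))
```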
$W_{\text{final}} \approx \sum_i w_i V_i$
– Similar to, but not exactly, the true average of the members
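One way to see why $W_{\text{final}}$ is a weighted sum rather than a plain average: assume a single node that is the best match for every vector, with $\Theta_t = 1$. Unrolling $W_{t+1} = W_t + L_t (V_t - W_t)$ gives a coefficient $w_t = L_t \prod_{s>t} (1 - L_s)$ on each $V_t$, plus a leftover factor $\prod_t (1 - L_t)$ on the initial value. The sketch below checks this numerically with scalar vectors; the names and numbers are hypothetical.

```python
def train_single_node(w0, vectors, rates):
    """Apply W <- W + L_t * (V_t - W) to one node, assuming Theta_t = 1 at every step."""
    w = w0
    for v, rate in zip(vectors, rates):
        w = w + rate * (v - w)
    return w

def member_weights(rates):
    """Coefficient on each V_t after unrolling the updates: w_t = L_t * prod_{s>t} (1 - L_s)."""
    weights = []
    for t, rate in enumerate(rates):
        tail = 1.0
        for later in rates[t + 1:]:
            tail *= 1.0 - later
        weights.append(rate * tail)
    return weights

vectors = [1.0, 2.0, 3.0, 4.0]
rates = [0.5, 0.4, 0.3, 0.2]
w_final = train_single_node(0.0, vectors, rates)  # starts at 0, so the initial-value term vanishes
weighted = sum(w * v for w, v in zip(member_weights(rates), vectors))
mean = sum(vectors) / len(vectors)
print(w_final, weighted, mean)
```

Here the trained node ends at about 2.136, matching the weighted sum, while the member average is 2.5: the coefficients depend on the learning-rate schedule and presentation order instead of being equal.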
Self-Organizing Maps
Neighboring nodes bear a strong similarity to each other.
It is difficult to interpret the large number of resulting clusters. How many are appropriate?
– Too many can reduce the importance of individual clusters
– Too few can lead to overestimating within-cluster variance and to errors in selecting a typical day
[Figure: wind speed]
The goal is to develop an objective method to determine the clusters that the eye can readily identify.
Optimize SOM Patterns
Optimize the number of patterns by:
– Performing hierarchical clustering for each permutation of SOM patterns
– Computing the Davies-Bouldin metric (mean cluster scatter / cluster separation) for each level of the hierarchical clustering
– Defining the optimal number of clusters as the minimum of the DB curve
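The optimization stage can be sketched in plain Python. The slides do not specify the linkage, so this sketch assumes average linkage; it uses the standard Davies-Bouldin definition (for each cluster, the worst ratio of summed scatter to centroid separation, averaged over all clusters), which "mean cluster scatter / cluster separation" summarizes. Inputs would be the SOM node vectors; all names are hypothetical, centroids are assumed distinct, and the brute-force merging is only practical for small maps.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a list of vectors."""
    return [sum(p[k] for p in points) / len(points) for k in range(len(points[0]))]

def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition (list of clusters of vectors); lower is better."""
    cents = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance of a cluster's members from its centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

def average_linkage(a, b):
    """Mean pairwise distance between the members of two clusters."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points):
    """Bottom-up clustering; returns {k: partition into k clusters} for k = n down to 1."""
    clusters = [[p] for p in points]
    partitions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average linkage.
        _, i, j = min((average_linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        partitions[len(clusters)] = [list(c) for c in clusters]
    return partitions

def optimal_k(points, k_max):
    """Scan the DB curve for k = 2..k_max; return the k at its minimum and all scores."""
    partitions = agglomerate(points)
    scores = {k: davies_bouldin(partitions[k])
              for k in range(2, min(k_max, len(points) - 1) + 1)}
    return min(scores, key=scores.get), scores
```

On three well-separated groups of three points each, `optimal_k(points, 5)` selects k = 3; an analyst could also read neighboring values off the returned `scores` when workload favors fewer cases.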
Using the hierarchical optimization stage more clearly defines the relationship between climate patterns and allows the number of cases to be refined based on analyst workload.
Looking Forward
Analyzing the climate reanalysis assumes a stationary climate (but current ≠ historical).
The NOAA Climate Forecast System (CFS) provides 4-member ensemble forecasts with lead times of up to 9 months.
Can we use these short-term climate forecasts to re-estimate the frequency of occurrence of the historic patterns?
Is there reasonable stability between forecast leads for use as a planning tool?
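Re-estimating pattern frequencies from forecasts could be sketched as follows, assuming forecast output can be converted into the same training-vector form used for the SOM and that each vector is assigned to its nearest historic pattern. Function names and data are hypothetical; computing these frequencies separately for each forecast lead would be one way to examine the stability question.

```python
import math
from collections import Counter

def nearest_pattern(patterns, v):
    """Index of the historic pattern (e.g. a SOM node vector) closest to v."""
    return min(range(len(patterns)), key=lambda i: math.dist(patterns[i], v))

def pattern_frequencies(patterns, vectors):
    """Fraction of the given vectors assigned to each pattern."""
    counts = Counter(nearest_pattern(patterns, v) for v in vectors)
    return [counts[i] / len(vectors) for i in range(len(patterns))]

patterns = [[0.0, 0.0], [10.0, 10.0]]            # hypothetical historic patterns
historical = [[0, 1], [1, 0], [9, 9], [1, 1]]    # hypothetical reanalysis vectors
forecast = [[9, 10], [10, 9], [0, 0], [11, 11]]  # hypothetical forecast vectors
print(pattern_frequencies(patterns, historical), pattern_frequencies(patterns, forecast))
```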
Summary
• Ongoing
– NGIC: identification of relevant patterns for T&D case studies
– NGIC: climate forecast of the frequency of occurrence of relevant patterns
• Future
– FAA: prediction of the probability of convection on transoceanic air traffic routes
• Past
– DHS: identification of relevant patterns for environmental impact assessment
– BRGM: identification of relevant precipitation patterns for surface trafficability
– JPEO-CBD: identification of relevant patterns for instrument-siting case studies at DPG
Questions?
How to deal with the volume of data produced by GCAT?
– Typically over 20,000 hourly output volumes are produced per run (a 30-year simulation of a 30-day period)
– Some form of intelligent data reduction is required (i.e. pattern classification, not bulk statistics)
– The classification method should not require input from a subject matter expert (i.e. unsupervised learning)
– Traditional clustering methods (k-means, hierarchical) can be computationally expensive for large-N problems
– Classify on model quantities that are relevant to the problem (training vectors)
We apply Self-Organizing Maps (SOMs) to transport and dispersion environmental impact studies, using climate reanalysis and forecasts.
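As a rough check on the data volume, assume one output volume per simulated hour over each 30-day period, repeated for 30 years:

$$30\ \text{years} \times 30\ \frac{\text{days}}{\text{year}} \times 24\ \frac{\text{hours}}{\text{day}} = 21{,}600\ \text{hourly output volumes per run}$$

which is consistent with the figure of over 20,000.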
Self-Organizing Maps
• An artificial neural network technique used for pattern recognition and classification
• The SOM consists of components called nodes or neurons
• Nodes are usually arranged in a hexagonal or rectangular grid
• The SOM describes a mapping from a higher-dimensional input space to a lower-dimensional map space
• SOMs with a small number of nodes behave similarly to k-means
• SOMs may be considered a nonlinear generalization of PCA
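The k-means connection can be sketched under one simplifying assumption: once the neighborhood has shrunk so that $\Theta_t = 1$ for the best matching node $c$ and $\Theta_t = 0$ for every other node, the update rule changes only node $c$:

$$W_{c,t+1} = W_{c,t} + L_t \left(V_t - W_{c,t}\right)$$

This is the online (sequential) k-means update. If, in addition, the learning rate is set to $L = 1/n$ when $V_n$ is the $n$-th vector assigned to $c$, the node is exactly the running mean of its members, whatever its initial value:

$$W_{c,n} = W_{c,n-1} + \frac{1}{n}\left(V_n - W_{c,n-1}\right) = \frac{1}{n}\sum_{m=1}^{n} V_m$$

With an exponentially decaying $L_t$ instead, the node is a weighted combination of its members rather than their exact average.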