Mining Weather Data for Decision Support


25 Nov 2013



Roy George

Army High Performance Computing Research Center

Clark Atlanta University

Atlanta, GA 30314



Research


Clustering Algorithms for Data Mining


Spatio-Temporal Domain


Parallelization of Algorithms


Algorithms for Feature Extraction and
Knowledge Discovery




Challenges of Geographical Data


Complexities associated with data volume


Terabyte databases


Domain complexities


Interesting signals hidden by stronger patterns


Complexities caused by local variation


Systems are interconnected


Data gathering and sampling


Interpretation of aggregated data


Formalizing the domain



Background: Issues with Hard
Clustering



Issue: Forces data with imprecision and/or uncertainty into discrete classes


Result: Misses important outliers and boundary patterns


Approach: Use of Approximate Clustering
Technique



Background: K-Means Clustering


Partition the data into K clusters that are homogeneous


Algorithm


Select K time series as initial centroids


Assign all time series to the most similar centroid


Recompute the centroids


Repeat until the centroids do not change


Variations based on different measures of
similarity
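
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: squared Euclidean distance stands in for the similarity measure, the initial centroids are drawn at random, and the names `k_means`, `sq_dist`, and `mean_series` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length time series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_series(group):
    """Point-wise mean of a non-empty list of equal-length time series."""
    return [sum(vals) / len(group) for vals in zip(*group)]

def k_means(series, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: select K time series as initial centroids (random choice is an assumption).
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every series to the most similar (nearest) centroid.
        labels = [min(range(k), key=lambda j: sq_dist(s, centroids[j]))
                  for s in series]
        # Step 3: recompute each centroid; an empty cluster keeps its old centroid.
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            new_centroids.append(mean_series(members) if members else centroids[j])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

series = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [10.0, 10.0, 10.0], [10.2, 9.9, 10.0]]
labels, centroids = k_means(series, k=2)
```

Swapping `sq_dist` for another measure (for example, one minus the correlation) gives the kind of variation mentioned on the slide.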


Unsupervised Fuzzy K-Means (UKFM) Clustering


Choose the initial number of clusters


Develop a clustering using the Fuzzy K-Means


Merge the cluster pair that has maximum correlation


Compute validity measure


Repeat until the termination condition is reached
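
A rough sketch of this loop follows, under several assumptions the slide does not state: the standard fuzzy c-means updates with fuzzifier m = 2 are used for the Fuzzy K-Means step, correlation is the Pearson correlation between cluster centroids, a merged cluster starts from the average of the two centroids, the validity measure is the partition coefficient, and the loop stops at a hypothetical `min_clusters`. All function names are illustrative.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(series, centroids, m=2.0):
    """Fuzzy membership of each series in each cluster; each row sums to 1."""
    u = []
    for s in series:
        inv = [max(dist(s, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centroids]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u

def update_centroids(series, u, m=2.0):
    """Each centroid is the mean of all series weighted by membership ** m."""
    cents = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        cents.append([sum(wi * s[t] for wi, s in zip(w, series)) / total
                      for t in range(len(series[0]))])
    return cents

def fuzzy_k_means(series, centroids, m=2.0, iters=50):
    for _ in range(iters):
        centroids = update_centroids(series, memberships(series, centroids, m), m)
    return memberships(series, centroids, m), centroids

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def partition_coefficient(u):
    """Assumed validity measure: 1/k for a fully fuzzy result, 1 for a crisp one."""
    return sum(v * v for row in u for v in row) / len(u)

def ukfm_sketch(series, init_centroids, min_clusters=2):
    centroids = [list(c) for c in init_centroids]
    history = []  # (number of clusters, validity) for every stage
    while True:
        u, centroids = fuzzy_k_means(series, centroids)
        history.append((len(centroids), partition_coefficient(u)))
        if len(centroids) <= max(min_clusters, 1):
            break
        # Merge the pair of clusters whose centroids are most correlated.
        pairs = [(i, j) for i in range(len(centroids))
                 for j in range(i + 1, len(centroids))]
        i, j = max(pairs, key=lambda p: pearson(centroids[p[0]], centroids[p[1]]))
        merged = [(a + b) / 2 for a, b in zip(centroids[i], centroids[j])]
        centroids = [c for n, c in enumerate(centroids) if n not in (i, j)] + [merged]
    return history

series = [[0, 1, 2, 3], [0, 1, 2, 3.1], [3, 2, 1, 0],
          [3.1, 2, 1, 0], [1, 1, 5, 1], [1, 1, 5.2, 1]]
history = ukfm_sketch(series, [series[0], series[2], series[4]], min_clusters=2)
```

Recording the validity value at every stage is one way to single out an intermediate "optimal" clustering between the initial and final cluster counts.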


UKFM Results

Weather Data Set

Initial: 11 Clusters

Optimal: 8 Clusters

Final: 4 Clusters


Global Earth Science Data


Collaborative Effort with V. Kumar (UMinn)


Test bed for UKFM (comparison with existing
techniques)


Data Set


Global Sea Pressure (1989-1993)


Ocean Climate Indices


Capture Teleconnections


Result


UKFM can capture even weaker OCIs using coarse clusters



Global Climate Data

(Sea Level Pressure)

Intermediate: 60 Clusters


Global Climate Data

(Sea Level Pressure)

Final: 26 Clusters


Relation with SOI


Integrating Multiple Datasets in UKFM Clustering


Motivation: Data-based approach to determining “interesting” clusters


Validate using multiple datasets


Rule: Retain clusters that have supporting data


Applicable in data-rich environments




UKFM Clustering with Multi-Dataset Validation


Choose the initial number of clusters


Develop a clustering using the Fuzzy K-Means


Validate clusters against other datasets D_i, i = 1, ..., n


Merge if the clusters are uncorrelated

Else

Consider next candidate pair to merge


Repeat until the termination condition is reached


UKFM Multi-Dataset Results

Height

Pressure

Temperature

Windspeed


Multi-threading Parallel Algorithm



For each clustering stage

  For each iteration

    Slaves: Calculate M for each cluster

    Master: Normalize M

    Slaves: Calculate C for each cluster

    Master: Normalize C
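
One iteration of this master/worker pattern can be sketched with Python's `concurrent.futures`. The slide does not define M and C; the sketch assumes M is the fuzzy membership matrix and C the set of cluster centers, with workers producing unnormalized per-cluster values and the master doing the normalization. Function names and the fuzzifier m = 2 are hypothetical, and in CPython the global interpreter lock means these threads show the structure of the algorithm rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def raw_membership(series, centroid, m=2.0):
    """Worker: unnormalized membership of every series in ONE cluster."""
    out = []
    for s in series:
        d2 = max(sum((x - y) ** 2 for x, y in zip(s, centroid)), 1e-24)
        out.append(d2 ** (-1.0 / (m - 1.0)))  # equals distance ** (-2 / (m - 1))
    return out

def raw_centroid(series, column, m=2.0):
    """Worker: weighted sum of the series and total weight for ONE cluster."""
    w = [u ** m for u in column]
    sums = [sum(wi * s[t] for wi, s in zip(w, series))
            for t in range(len(series[0]))]
    return sums, sum(w)

def parallel_iteration(series, centroids, workers=4, m=2.0):
    n = len(series)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers: calculate M, one task per cluster.
        cols = list(pool.map(lambda c: raw_membership(series, c, m), centroids))
        # Master: normalize M so each series' memberships sum to 1.
        totals = [sum(col[i] for col in cols) for i in range(n)]
        cols = [[col[i] / totals[i] for i in range(n)] for col in cols]
        # Workers: calculate C, one task per cluster.
        parts = list(pool.map(lambda col: raw_centroid(series, col, m), cols))
    # Master: normalize C by dividing each weighted sum by its total weight.
    new_centroids = [[v / total for v in sums] for sums, total in parts]
    return cols, new_centroids

series = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
cols, cents = parallel_iteration(series, [[0.0, 0.1], [5.0, 5.1]])
```

The per-cluster tasks are independent of one another, which is why the only synchronization points are the two normalization steps.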


Multi-threading Result


Implemented on a Sun Fire workstation with four 900-MHz UltraSPARC® III processors


Near-linear speedup obtained


Relevance to the Army


Directly supports the FBKOF STO (B.
Broome)


Development of the Weather Information and
Tactical Support (WITS) System


Weather Information and
Tactical Support (WITS)


Objective: Extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning


Approach


Development of an OLAP
Weather Repository


GA Weather (1981-2002)


Sources: Nat. Weather
Svc, GA Env. Network


Development of WITS
Modules


Ad-hoc Querying


Real time Analysis and
Planning


Effects on Army Systems


Integration with IWEDA


Abstract Data Representation (diagram labels): YEAR, MONTH, DAY; TEMPERATURE, PRECIPITATION, WIND SPEED, etc.
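
As a small illustration of how such a repository might be queried, the sketch below treats YEAR, MONTH, and DAY as the time hierarchy and the weather variables as the measured values, then rolls daily records up to coarser levels. The records are made-up sample values, not data from the repository, and `roll_up` is a hypothetical helper rather than part of WITS.

```python
from collections import defaultdict

# Hypothetical daily records:
# (year, month, day, temperature, precipitation, wind_speed)
records = [
    (1981, 1, 1, 5.0, 0.0, 3.0),
    (1981, 1, 2, 7.0, 2.0, 5.0),
    (1981, 2, 1, 9.0, 1.0, 4.0),
    (1982, 1, 1, 4.0, 0.0, 6.0),
]

def roll_up(rows, level):
    """Average every measure at 'year' or 'month' granularity."""
    key_len = {"year": 1, "month": 2}[level]
    groups = defaultdict(list)
    for row in rows:
        # The first key_len fields form the group key; fields 3+ are the measures.
        groups[row[:key_len]].append(row[3:])
    return {key: tuple(sum(col) / len(col) for col in zip(*vals))
            for key, vals in groups.items()}

monthly = roll_up(records, "month")
yearly = roll_up(records, "year")
```

Here `monthly[(1981, 1)]` is `(6.0, 1.0, 4.0)` and `yearly[(1981,)]` is `(7.0, 1.0, 4.0)`. Averaging is used for every measure only to keep the sketch short; a real repository would likely sum precipitation instead.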

WITS System Design

Diagram components: User Interface; Data Warehouse; Data Mining Modules; Query Modules; Knowledge Bases (IWEDA); Data Cleaning & Transformation; Data Acquisition Agents; Real Time Module; TAPS Module; IQ Module

WITS/IQ


WITS/IQ


WITS/IWEDA


WITS/Analysis


WITS/Analysis


Work in Progress


Characterization of Analysis Queries



Incorporation of Data Mining Algorithms into WITS


Enhancement of WITS/TAPS


Implementation of WITS/Real



Hybrid Genetic Fuzzy Systems

for Feature Extraction and Knowledge
Discovery


Project Goals


Design and implement hybrid genetic fuzzy
system for knowledge discovery.


Develop API/Tools.


Apply tools to Army-related problems.


Contribution


Hybrid system based on the Simple Genetic
Algorithm (SGA). Enhanced the SGA by adding
three levels of knowledge discovery.



Level 1: Discovers up to k possible rules for a given set of inputs and outputs. It then attempts to minimize the number of rules and tune the knowledge base.


Level 2: Takes the set of rules from Level 1 and further minimizes the rules. In addition, it also tunes the knowledge base.


Level 3: Makes one last attempt to further tune the architecture of the knowledge base.


Rule Discovery


Search for k possible rules from the set of p possible rules. k is an input parameter of the GA application.


Discover the smallest value of k, thereby reducing the number of rules needed.



Example Rules:



If INPUT_1 is low AND INPUT_2 is medium THEN
OUTPUT_1 is high



If INPUT_1 is high THEN OUTPUT_1 is low
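
To make the two example rules concrete, the sketch below evaluates them with triangular membership functions on inputs scaled to the range 0 to 1. The membership shapes, the use of min for AND, and the names `tri`, `TERMS`, `RULES`, and `fire` are assumptions for illustration; the slides do not specify how the knowledge base represents its fuzzy sets.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at or outside a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets for inputs scaled to [0, 1].
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# The two example rules: (antecedent, fuzzy term asserted for OUTPUT_1).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, "high"),
    ({"INPUT_1": "high"}, "low"),
]

def fire(rules, inputs):
    """Firing strength of each output term; AND is taken as the minimum."""
    strengths = {}
    for antecedent, out_term in rules:
        s = min(tri(inputs[name], *TERMS[term])
                for name, term in antecedent.items())
        strengths[out_term] = max(strengths.get(out_term, 0.0), s)
    return strengths

result = fire(RULES, {"INPUT_1": 0.25, "INPUT_2": 0.5})
```

With INPUT_1 = 0.25 and INPUT_2 = 0.5, the first rule fires with strength 0.5 and the second with strength 0.0, so `result` is `{"high": 0.5, "low": 0.0}`.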


Relevance to the Army


Collaborators: Jeff Passner, John Raby
(ARL)


IMETS weather modeling


Post processing used to predict additional
parameters


Visibility, Turbulence, Fog, etc.


Use of Knowledge Discovery to Predict Parameters


Visibility Application


Generate and tune a system that can predict
visibility based on input parameters


Tasks for the fuzzy genetic system


Search for a set of k rules from p possible rules that describe the relationship of the input parameters with the output (visibility)


Concurrently discover the architecture, and optimize the performance of the knowledge bases in relation to the k rules
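
The first task, picking k rules out of p candidates, can be sketched as a small genetic search. This is a simplified stand-in and not the three-level hybrid system described earlier: it uses tournament selection, one-point crossover, single-gene mutation, and elitism, and the `fitness` function is a toy placeholder for the prediction quality that a real visibility classifier would supply. All names and parameter values are hypothetical.

```python
import random

def ga_select_rules(p, k, fitness, pop_size=30, generations=40,
                    mutation=0.2, seed=0):
    """Search for k distinct rule indices (out of p) that maximize `fitness`."""
    rng = random.Random(seed)
    pop = [rng.sample(range(p), k) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best]  # elitism: the best rule set so far always survives
        while len(nxt) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(0, k)
            child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < mutation:
                child[rng.randrange(k)] = rng.randrange(p)
            # Repair: drop duplicate rules, then refill with unused ones.
            child = list(dict.fromkeys(child))
            while len(child) < k:
                r = rng.randrange(p)
                if r not in child:
                    child.append(r)
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return sorted(best)

# Toy fitness: counts how many chosen rules fall in a made-up "useful" set.
useful = {1, 4, 7}
best = ga_select_rules(p=10, k=3, fitness=lambda rules: len(useful & set(rules)))
```

Running the search for decreasing values of k and keeping the smallest k that still meets a quality target is one way to approach the rule-minimization goal stated on the Rule Discovery slide.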



Results for

Low Visibility Classifier


Results for

Medium Visibility Classifier