INTELLIGENT DATA REDUCTION ALGORITHMS FOR REAL-TIME DATA ASSIMILATION



Xiang Li, Rahul Ramachandran, Sara Graves

ITSC/University of Alabama in Huntsville

ESSC/University of Alabama in Huntsville

Steven Lazarus, Mike Splitt, Mike Lueken

Florida Institute of Technology

May 5, 2009

Data Reduction

It is common practice to remove a portion of, or combine, high spatial and temporal resolution observations to reduce data volume in the data assimilation (DA) process, due to:

High computational resources required for large-volume data sets (increasing exponentially with data volume)

Data redundancy in large-volume, high-resolution observations:

local spatial correlation of satellite data

observation resolution exceeding the assimilation grid resolution

Reducing data redundancy may also improve analysis quality (Purser et al., 2000)

Computational Resources Required for Data Assimilation

[Figure: analysis techniques (Successive Corrections, Statistical Interpolation, 3D-Var, 4D-Var) plotted against data volume (little to lot) and horizontal resolution (80 km to 1 km); computational cost grows with both.]

Need for ‘new’ Data Reduction Techniques

Current data thinning approaches:

Sub-sampling

Random sampling

Super-obbing (subsampling with averaging)

Limitations:

All data points are treated equally

The information content of individual observations, and their contribution to analysis quality, can differ
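The three baseline approaches above can be sketched on a 2-D field as follows. This is an illustrative sketch only; the function names, the stride, and the 3x3 averaging block are my own choices, not taken from the talk:

```python
import numpy as np

def subsample(field, stride=3):
    """Regular sub-sampling: keep every stride-th point in each dimension."""
    return field[::stride, ::stride]

def random_sample(field, fraction=0.11, rng=None):
    """Random sampling: keep a random fraction of points as (row, col, value)."""
    rng = np.random.default_rng(rng)
    n_keep = int(fraction * field.size)
    idx = rng.choice(field.size, size=n_keep, replace=False)
    r, c = np.unravel_index(idx, field.shape)
    return np.column_stack([r, c, field[r, c]])

def super_ob(field, block=3):
    """Super-obbing: average non-overlapping block x block boxes."""
    rows = field.shape[0] - field.shape[0] % block
    cols = field.shape[1] - field.shape[1] % block
    trimmed = field[:rows, :cols]
    return trimmed.reshape(rows // block, block, cols // block, block).mean(axis=(1, 3))
```

All three reduce volume but, as noted above, treat every data point equally.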

Intelligent Data Thinning Algorithms

Reduce the number of data points required for an analysis

Maintain the fidelity of the analysis (keep the most important data points)

High-volume satellite data (e.g., infrared-based SST, scatterometer winds) carry redundant information and are computationally expensive to assimilate

Analyses derived from simple subsampling can be inconsistent and are not optimal in efficiency

Example: with the same subsampling interval but a shifted starting point, simple subsampling can miss 'significant' data samples

Intelligent data thinning algorithms

Objective: retain samples in the thinned data set that have high information content and a large impact on the analysis

Assumption: samples with high local variance contain high information content

Approach: use a synthetic test to determine and validate the optimal thinning strategy, then apply it to real satellite observations

Synthetic data test: truncated Gaussian

Real data experiment: Atmospheric Infrared Sounder (AIRS) profiles
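As a minimal illustration of the high-local-variance assumption, a thinning rule might keep the points whose windowed variance is largest. This is a sketch; the window size, selection count, and function names are assumptions, not details from the talk:

```python
import numpy as np

def local_variance(field, half_width=1):
    """Variance of the field in a (2*half_width+1)^2 window around each point."""
    rows, cols = field.shape
    lv = np.zeros_like(field, dtype=float)
    for i in range(rows):
        for j in range(cols):
            win = field[max(i - half_width, 0):i + half_width + 1,
                        max(j - half_width, 0):j + half_width + 1]
            lv[i, j] = win.var()
    return lv

def thin_by_variance(field, n_keep):
    """Retain the n_keep points with the highest local variance."""
    lv = local_variance(field)
    flat = np.argsort(lv, axis=None)[::-1][:n_keep]
    return np.column_stack(np.unravel_index(flat, field.shape))
```

On a field with a sharp step, the retained points cluster along the step, where the information content is highest.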

Synthetic Data Test: Truncated Gaussian

Explicitly defined truth and background fields

Direct thinning method:

35 observations sampled to find the 5 observations yielding the best analysis (1D variational approach)

325,000+ unique spatial combinations

First guess: base of the Gaussian function

Observations: created by adding white noise to the truth

[Figure: first guess, truth, and analysis curves, with the optimal observation locations marked.]
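The direct thinning search can be sketched as a brute-force subset search over a toy 1-D analysis. A single-pass successive-correction scheme stands in for the talk's 1-D variational analysis, and the problem is shrunk to choosing 3 of 12 observations (the talk's 5-of-35 search covers 325,000+ combinations); all names and parameter values here are illustrative:

```python
import itertools
import numpy as np

def analyze(grid, xb, obs_x, obs_y, L):
    """Single-pass successive-correction analysis with Gaussian weights."""
    w = np.exp(-((grid[:, None] - obs_x[None, :]) ** 2) / (2 * L ** 2))
    innov = obs_y - np.interp(obs_x, grid, xb)   # observation-minus-background
    return xb + (w * innov).sum(axis=1) / w.sum(axis=1)

def best_subset(grid, truth, xb, obs_x, obs_y, k, L):
    """Exhaustively search all k-subsets for the lowest-MSE analysis."""
    best, best_mse = None, np.inf
    for idx in itertools.combinations(range(len(obs_x)), k):
        idx = list(idx)
        xa = analyze(grid, xb, obs_x[idx], obs_y[idx], L)
        mse = np.mean((xa - truth) ** 2)
        if mse < best_mse:
            best, best_mse = idx, mse
    return best, best_mse

# Truncated-Gaussian truth, flat first guess (base of the Gaussian), noisy obs.
rng = np.random.default_rng(1)
grid = np.linspace(-3, 3, 61)
truth = np.maximum(np.exp(-grid ** 2) - 0.1, 0.0)
xb = np.zeros_like(grid)
obs_x = np.linspace(-3, 3, 12)
obs_y = np.maximum(np.exp(-obs_x ** 2) - 0.1, 0.0) + 0.02 * rng.standard_normal(12)
subset, mse = best_subset(grid, truth, xb, obs_x, obs_y, k=3, L=0.8)
```

The winning subsets tend to land near the peak and the flanks of the Gaussian, consistent with the peak/anchor-point finding on the next slide.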

Synthetic Data Test: Truncated Gaussian (cnt’d)

Optimal observation configuration retains data at the:

peak

anchor points (where the gradient changes most sharply)

and is dependent on key elements of the analysis itself:

length scale (L)

quality of the background and observations

Lesson learned: thinned data samples should combine homogeneous points, gradient points, and anchor points for optimal performance, and a dynamic length scale should be applied to each thinned data set.

Intelligent Data Reduction Algorithms

Earlier versions of intelligent data thinning algorithms: IDT, DADT

Density-Balanced Data Thinning (DBDT):

Three metrics are calculated for the data samples, and the samples are placed into a priority queue for each metric:

Thermal Front Parameter (TFP): high values indicate rapid change of the gradient (a front)

Local Variance (LV): high values indicate high local variability

Homogeneity: low values indicate homogeneous regions

Data selected from the three metrics: the user determines the proportion of samples drawn from each metric

Minimum separation distance (R): used to control the uniform spatial distribution of the thinned data set; the distance between any two samples must be larger than R

Data selection process: select the top qualified samples from the priority queues, starting with the TFP queue, followed by the LV queue and the homogeneity queue

DBDT performs best among these thinning algorithms
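The DBDT selection step above can be sketched as follows, assuming max-priority queues for TFP and local variance, a min-priority queue for homogeneity, and user-chosen fractions per queue; the real algorithm's metric definitions and quota handling may differ:

```python
import heapq
import numpy as np

def dbdt_select(points, tfp, lv, homog, n_keep, R, fractions=(0.4, 0.4, 0.2)):
    """Pick samples from three priority queues (TFP, LV, homogeneity),
    enforcing a minimum separation distance R between selected samples.
    points: (N, 2) coordinates; tfp/lv rank high first, homog ranks low first."""
    queues = [
        [(-s, i) for i, s in enumerate(tfp)],    # max-heap via negation
        [(-s, i) for i, s in enumerate(lv)],     # max-heap via negation
        [(s, i) for i, s in enumerate(homog)],   # min-heap
    ]
    for q in queues:
        heapq.heapify(q)
    quotas = [int(f * n_keep) for f in fractions]
    selected = []
    for q, quota in zip(queues, quotas):
        taken = 0
        while q and taken < quota:
            _, i = heapq.heappop(q)
            # The distance check keeps the thinned set spatially uniform and
            # also rejects a sample already chosen from an earlier queue
            # (its distance to itself is 0 < R).
            if all(np.hypot(*(points[i] - points[j])) >= R for j in selected):
                selected.append(i)
                taken += 1
    return selected
```

The queues are processed in the TFP, LV, homogeneity order described above, so front and high-variance samples get first claim on the budget.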

Real-World Testing Ground

Atmospheric Infrared Sounder (AIRS)

NASA hyperspectral sounder

generates temperature and moisture profiles at ≈ 50-km resolution

each profile contains a pressure level above which quality data are
found

version 5.2.5; Bratseth scheme

background comes from a short-term Weather Research and Forecasting (WRF) model forecast

error covariances:

background: standard short-term forecast errors cited in ADAS

observation: from Tobin et al. (2006)*, an AIRS validation study

dynamic length scale (L) calculated from the average distance between nearest observation neighbors

*D. C. Tobin, H. E. Revercomb, R. O. Knuteson, B. M. Lesht, L. L. Strow, S. E. Hannon, W. F. Feltz, L. A. Moy, E. J. Fetzer, and T. S. Cress, "ARM site atmospheric state best estimates for AIRS temperature and water vapor retrieval validation," J. Geophys. Res., D09S14, pp. 1-18, 2006.
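The dynamic length scale can be sketched directly from the definition above (the average distance between nearest observation neighbors); the brute-force pairwise computation is for illustration only:

```python
import numpy as np

def dynamic_length_scale(points):
    """Average distance to each observation's nearest neighbor,
    used as the analysis length scale L for a thinned data set."""
    pts = np.asarray(points, dtype=float)
    # Pairwise distances with the self-distance masked out.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()
```

A sparser thinned set yields larger nearest-neighbor distances and hence a longer L, matching the L (km) row in the results table later in the talk.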

Thinning Strategies (11% of full)

Subsample: takes the profile with the most retrieved levels within a 3x3 box

Random: searches the observations and ensures that retained observations are thinned to a user-defined distance; 10 permutations performed to create an ensemble

DBDT: thins on 2-D pressure levels using equivalent potential temperature, then recombines the levels to form the 3-D structure

Thinning uses equivalent potential temperature (θe) to account for both the temperature and moisture profiles
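For reference, a common first-order approximation of equivalent potential temperature is sketched below; the talk does not state which θe formulation was used, so this particular formula is an assumption:

```python
def theta_e(T, p, r):
    """First-order equivalent potential temperature (K).
    T: temperature (K), p: pressure (hPa), r: water-vapor mixing ratio (kg/kg).
    Uses the approximation theta_e ~ (T + Lv*r/cp) * (1000/p)^(Rd/cp)."""
    Lv = 2.501e6    # latent heat of vaporization, J/kg
    cp = 1005.7     # specific heat of dry air at constant pressure, J/(kg K)
    kappa = 0.2854  # Rd/cp for dry air
    return (T + Lv * r / cp) * (1000.0 / p) ** kappa
```

Because θe combines a temperature term and a moisture term, thinning on it responds to gradients in either field, which is why the talk uses it for the 2-D pressure-level thinning.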

Case Study Day: 12 March 2005

700 hPa temperature gradient in the observations and background over the Midwest and northern Gulf of Mexico

Observations and background show similar patterns

[Figure: 700 hPa AIRS temperature observations and 700 hPa WRF forecast temperatures (background), with the Subsample, Random, and DBDT thinned distributions.]

700 hPa Temperature Analysis Comparison

Overall analysis increments are ±1.5 °C over the AIRS swath

Largest differences between the analyses are in the upper Midwest

                 Full    Subsample   Random   DBDT
# OBS            793     99          100      87
ALYS TIME (s)    244     56          56       106
L (km)           80      146         147      152
θe MSE           N/A     0.60        0.56     0.36

Quantitative Results (Full vs. Thinned)

Computation times are 50-70% faster for the thinned data sets

MSEs compare the analyses between the full data set and each thinned set

DBDT produces the best analysis with the fewest observations:

it has a longer computation time (the thinning algorithm is more rigorous)

it cuts the MSE almost in half with 1/10 the observations of the full set

Conclusions

Intelligent data thinning strategies are important to
eliminate redundant observations that may hinder
convergence of DA schemes and reduce computation times

Synthetic data tests have shown that observations must
be retained in gradient, anchor, and homogeneous regions
and that results are dependent on key elements of the
analysis system

Analyses of AIRS thermodynamic profiles using different thinning strategies yield DBDT as the superior thinning technique

Future Work

Manuscript in review with Weather and Forecasting (AMS)

Testing forecasts spawned from the various thinned
analyses to see if superior DBDT analysis produces the best
forecasts

Demonstration of algorithm capabilities with respect to real-time data dissemination

Use of gradient detecting portion of algorithm for
applications in locating cloud edges for radiance
assimilation