INTELLIGENT DATA REDUCTION ALGORITHMS FOR REAL-TIME DATA ASSIMILATION




Xiang Li, Rahul Ramachandran, Sara Graves

ITSC/University of Alabama in Huntsville


Bradley Zavodsky


ESSC/University of Alabama in Huntsville


Steven Lazarus, Mike Splitt, Mike Lueken

Florida Institute of Technology


May 5, 2009

Data Reduction


It is common practice to remove a portion of, or combine, high spatial and temporal resolution observations to reduce data volume in the DA process, because of:

- the high computational resources required for large-volume data sets (computational cost grows much faster than linearly with data volume)
- data redundancy in large-volume, high-resolution observations
  - local spatial correlation of satellite data
  - observation data resolution exceeds the assimilation grid resolution
- reducing data redundancy may improve analysis quality (Purser et al., 2000)



Computational Resources Required for Data Assimilation

[Figure: schematic of computational cost versus data volume (little to lot) and horizontal resolution (80 km to 1 km) for analysis techniques of increasing complexity: Successive Corrections, Statistical Interpolation, 3D-Var, 4D-Var.]

Need for ‘new’ Data Reduction Techniques



Current data thinning approaches (a small sketch of these follows this slide):

- Sub-sampling
- Random sampling
- Super-obbing (subsampling with averaging)




Limitations:

- All data points are treated equally
- The information content of individual observations, and their contribution to analysis quality, may differ

Intelligent Data Thinning Algorithms:

- Reduce the number of data points required for an analysis
- Maintain the fidelity of the analysis (keep the most important data points)


High data volumes from satellite platforms (e.g., infrared-based SST, scatterometer winds) carry redundant information and are computationally expensive to assimilate.

Analyses derived from simple subsampling of the data can be inconsistent and are not optimal in efficiency: with the same subsampling interval but a shifted starting point, simple subsampling is susceptible to missing 'significant' data samples (see example).
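For reference, the sketch below (a minimal NumPy illustration, not code from this work; the observation array and box size k are hypothetical) shows how simple subsampling and super-obbing reduce a 2-D observation field: subsampling keeps every k-th point, while super-obbing averages each k x k box into a single 'super observation'.

import numpy as np

def subsample(field, k=4):
    """Keep every k-th observation in each direction (simple thinning)."""
    return field[::k, ::k]

def super_ob(field, k=4):
    """Average each k x k box of observations into one 'super observation'."""
    ny, nx = field.shape
    ny_t, nx_t = ny - ny % k, nx - nx % k              # trim to a multiple of k
    boxes = field[:ny_t, :nx_t].reshape(ny_t // k, k, nx_t // k, k)
    return boxes.mean(axis=(1, 3))

if __name__ == "__main__":
    obs = np.random.randn(120, 160)                    # hypothetical satellite swath
    print(subsample(obs).shape, super_ob(obs).shape)   # both about 1/16 of the points

Both reduce the data volume by roughly a factor of k squared, but neither considers the information content of the points it discards.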

Intelligent data thinning algorithms

- Objective: retain samples in the thinned data set that have high information content and a large impact on the analysis.
- Assumption: samples with high local variance contain high information content (illustrated in the sketch after this slide).
- Approach: use a synthetic test to determine and validate the optimal thinning strategy, then apply it to real satellite observations.
  - Synthetic Data Test: Truncated Gaussian
  - Real Data Experiment: Atmospheric Infrared Sounder (AIRS) profiles
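The local-variance assumption can be illustrated with a toy example. The sketch below (an illustrative NumPy snippet, not the authors' code; the window half-width and test signal are arbitrary choices) ranks the samples of a 1-D truncated-Gaussian-like signal by the variance within a sliding window, so that gradient regions score high and homogeneous regions score low.

import numpy as np

def local_variance(x, half_width=2):
    """Variance of x within a sliding window centered on each sample."""
    lv = np.empty_like(x, dtype=float)
    for i in range(x.size):
        lo, hi = max(0, i - half_width), min(x.size, i + half_width + 1)
        lv[i] = np.var(x[lo:hi])
    return lv

if __name__ == "__main__":
    s = np.linspace(-3, 3, 81)
    signal = np.exp(-s**2)                 # truncated-Gaussian-like feature
    lv = local_variance(signal)
    top = np.argsort(lv)[::-1][:10]        # highest-information samples
    print(sorted(top))                     # they cluster on the flanks (gradient regions)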

Synthetic Data Test: Truncated Gaussian

- Explicitly defined truth and background fields
- Direct thinning method (a sketch of such a search follows this slide):
  - 35 observations sampled to find the 5 observations yielding the best analysis (1-D variational approach)
  - nearly 325,000 unique spatial combinations (C(35,5) = 324,632)
- First guess: base of the Gaussian function
- Observations: created by adding white noise to the truth

[Figure: truth, first guess, analysis, and optimal observation locations for the truncated-Gaussian test.]
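A direct search of this kind could be coded as sketched below. This is a minimal illustration assuming a simple optimal-interpolation-style 1-D analysis with Gaussian background error correlations; the length scale, error variances, noise level, and grid are illustrative assumptions, not the values used in the study.

import itertools
import numpy as np

def analyze(xb, grid, obs_loc, obs_val, L=0.5, sig_b=1.0, sig_o=0.2):
    """Simple 1-D OI-style analysis with Gaussian background error correlations."""
    B_oo = sig_b**2 * np.exp(-0.5 * ((obs_loc[:, None] - obs_loc[None, :]) / L) ** 2)
    B_go = sig_b**2 * np.exp(-0.5 * ((grid[:, None] - obs_loc[None, :]) / L) ** 2)
    R = sig_o**2 * np.eye(obs_loc.size)
    innov = obs_val - np.interp(obs_loc, grid, xb)          # y - H(xb)
    return xb + B_go @ np.linalg.solve(B_oo + R, innov)

rng = np.random.default_rng(0)
grid = np.linspace(-3, 3, 121)
truth = np.exp(-grid**2)                                    # truncated-Gaussian truth
xb = np.zeros_like(grid)                                    # first guess: base of the Gaussian
obs_loc = np.linspace(-3, 3, 35)
obs_val = np.exp(-obs_loc**2) + 0.05 * rng.standard_normal(35)  # truth + white noise

# exhaustive search over all C(35,5) = 324,632 subsets (takes a while in pure Python)
best = min(itertools.combinations(range(35), 5),
           key=lambda idx: np.mean((analyze(xb, grid, obs_loc[np.array(idx)],
                                            obs_val[np.array(idx)]) - truth) ** 2))
print("best 5 observation locations:", obs_loc[np.array(best)])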

Synthetic Data Test: Truncated Gaussian (cont'd)

- The optimal observation configuration retains data at the:
  - peak
  - gradient
  - anchor points (where the gradient changes most sharply)
- It is dependent on key elements of the analysis itself:
  - length scale (L)
  - quality of the background and observations

Lesson learned: thinned data samples should combine homogeneous points, gradient points, and anchor points for optimal performance, and a dynamic length scale should be applied to each thinned data set.

Intelligent Data Reduction Algorithms


Earlier versions of intelligent data thinning algorithms (IDT, DADT,
mDADT)


Density
-
Balanced Data Thinning (DBDT)


Three metrics

are calculated for data samples and samples are put into priority
queues for the three metrics


Thermal Front Parameter (TFP):

High value of TFP indicates rapid change of
temperature gradient and ‘anchor’ samples


Local Variance (LV):

high values indicate gradient regions


Homogeneity:

low values indicate homogeneous regions


Data selected from the three metrics:

user determines the portions of samples
from these metrics


Radius of impact (R)
: used to control uniform spatial distribution of thinned data
set.
Distance between any two samples needs to be larger than R



Data selection process:

select top qualified samples from priority queues. Start
with TFP queue, followed by LV queue and homogeneity queue


DBDT algorithm performs best in these thinning algorithms
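A minimal sketch of the selection step is given below, assuming the three metric arrays (TFP, LV, homogeneity) have already been computed for each sample; the quotas, radius R, and random placeholder metrics are illustrative assumptions, and this is not the authors' implementation.

import numpy as np

def dbdt_select(coords, tfp, lv, hom, quotas=(40, 40, 20), R=1.0):
    """
    DBDT-style selection sketch: rank samples in three priority queues
    (high TFP, high LV, low homogeneity) and take the top candidates from each
    queue, rejecting any sample closer than R to an already-selected sample.
    """
    queues = [np.argsort(tfp)[::-1],      # anchor points: highest TFP first
              np.argsort(lv)[::-1],       # gradient points: highest LV first
              np.argsort(hom)]            # homogeneous points: lowest homogeneity first
    selected = []
    for queue, quota in zip(queues, quotas):
        taken = 0
        for i in queue:
            if taken >= quota:
                break
            if all(np.linalg.norm(coords[i] - coords[j]) > R for j in selected):
                selected.append(i)        # keeps the thinned set spatially uniform
                taken += 1
    return np.array(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    coords = rng.uniform(0, 50, size=(n, 2))                    # hypothetical 2-D sample locations
    tfp, lv, hom = rng.random(n), rng.random(n), rng.random(n)  # placeholder metric values
    idx = dbdt_select(coords, tfp, lv, hom, quotas=(30, 20, 10), R=2.0)
    print(len(idx), "samples retained")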

AIRS & ADAS: Our Real-World Testing Ground

- Atmospheric Infrared Sounder (AIRS)
  - NASA hyperspectral sounder
  - generates temperature and moisture profiles with ≈50-km resolution at nadir
  - each profile contains a pressure level above which quality data are found
- ARPS Data Assimilation System (ADAS)
  - version 5.2.5; Bratseth scheme
  - background comes from a short-term Weather Research and Forecasting (WRF) model forecast
  - error covariances:
    - background: standard short-term forecast errors cited in ADAS
    - observation: from Tobin et al. (2006)*, an AIRS validation study
  - dynamic length scale (L) calculated from the average distance between nearest observation neighbors (see the sketch after this slide)

*D. C. Tobin, H. E. Revercomb, R. O. Knuteson, B. M. Lesht, L. L. Strow, S. E. Hannon, W. F. Feltz, L. A. Moy, E. J. Fetzer, and T. S. Cress, "ARM site atmospheric state best estimates for AIRS temperature and water vapor retrieval validation," J. Geophys. Res., D09S14, pp. 1-18, 2006.
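One straightforward way to compute such a dynamic length scale is sketched below, assuming the thinned observation locations are available in projected (km) coordinates; the coordinates here are randomly generated for illustration only.

import numpy as np

def dynamic_length_scale(coords):
    """Average distance from each observation to its nearest neighbor (km)."""
    # pairwise distances; assumes coords are already in km (e.g. map-projected)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # ignore self-distances
    return d.min(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    thinned = rng.uniform(0, 1000, size=(87, 2))   # e.g. locations of a thinned obs set
    print(f"L ≈ {dynamic_length_scale(thinned):.0f} km")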

Thinning Strategies (11% of full)

- Subsample:
  - takes the profile with the most retrieved levels within a 3x3 box
- Random:
  - searches the observations and ensures that retained observations are thinned to a user-defined distance
  - 10 permutations performed to create an ensemble
- DBDT:
  - thins on 2-D pressure levels using equivalent potential temperature; the levels are then recombined to form a 3-D structure
  - thinning uses equivalent potential temperature (θe) to account for both temperature and moisture profiles (a simplified θe sketch follows this slide)
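For illustration, θe can be computed from temperature, pressure, and mixing ratio with the common simplified approximation θe ≈ θ·exp(Lv·r / (cp·T)); the sketch below uses this approximation and hypothetical 700 hPa values, and is not necessarily the exact formulation used in the study.

import numpy as np

RD, CP, LV, P0 = 287.0, 1004.0, 2.5e6, 1000.0   # gas constant, specific heat, latent heat, reference pressure (hPa)

def theta_e(T, p, r):
    """
    Simplified equivalent potential temperature (K).
    T: temperature (K), p: pressure (hPa), r: water vapor mixing ratio (kg/kg).
    """
    theta = T * (P0 / p) ** (RD / CP)            # potential temperature
    return theta * np.exp(LV * r / (CP * T))     # common approximation

if __name__ == "__main__":
    # hypothetical 700 hPa retrieval values
    print(f"{theta_e(T=278.0, p=700.0, r=0.004):.1f} K")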


Case Study Day: 12 March 2005

- 700 hPa temperature gradient in the observations and background over the Midwest and the northern Gulf of Mexico
- Observations and background show similar patterns

[Figures: 700 hPa AIRS temperature observations; 700 hPa WRF forecast temperatures (background); Subsample, Random, and DBDT panels.]

700 hPa Temperature Analysis Comparison

- Overall analysis increments are ±1.5 °C over the AIRS swath
- The largest differences between the analyses are in the upper Midwest and over southern Canada

                     Full    Subsample   Random   DBDT
  # obs              793     99          100      87
  Analysis time (s)  244     56          56       106
  L (km)             80      146         147      152
  θe MSE             N/A     0.60        0.56     0.36

Quantitative Results (Full vs. Thinned)

- Computation times are 50-70% faster for the thinned data sets
- MSEs compare each thinned-data analysis against the full-data analysis
- DBDT gives the best analysis with the fewest observations:
  - it has a longer computation time (the thinning algorithm is more rigorous)
  - it cuts the MSE almost in half with about 1/10 the observations of the full data set

Conclusions

- Intelligent data thinning strategies are important for eliminating redundant observations that may hinder convergence of DA schemes, and for reducing computation times
- Synthetic data tests have shown that observations must be retained in gradient, anchor, and homogeneous regions, and that the results depend on key elements of the analysis system
- Analyses of AIRS thermodynamic profiles using different thinning strategies identify DBDT as the superior thinning technique

Future Work

- Manuscript in review with Weather and Forecasting (AMS)
- Testing forecasts spawned from the various thinned analyses to see if the superior DBDT analysis produces the best forecasts
- Demonstration of algorithm capabilities with respect to real-time data dissemination
- Use of the gradient-detecting portion of the algorithm for applications in locating cloud edges for radiance assimilation

Thank you for your attention.

Are there any questions?