MIT Lincoln Laboratory
Stat. Fcst. Models
Wes Wilson
11/8/2013

Building Statistical Forecast Models
Wes Wilson
MIT Lincoln Laboratory
April 2001
Experiential Forecasting
• Idea: Base the forecast on observed outcomes in previous similar situations (training data)
• Possible ways to evaluate and condense the training data
  – Categorization: seek comparable cases, usually expert-based
  – Statistical: correlation and significance analysis
  – Fuzzy Logic: combines expert and statistical analysis
• Belief: Incremental changes in predictors relate to incremental changes in the predictand
• Issues
  – Requirements on the Training Data
  – Development Methodology
  – Automation
Outline
• Regression-based Models
• Predictor Selection
• Data Quality and Clustering
• Measuring Success
• An Example
Statistical Forecast Models
• Multi-Linear Regression (MLR)
      F = w0 + Σ wi Pi
  wi = predictor weighting
  w0 = conditional climatology (at mean predictor values)
• GAM: Generalized Additive Models
      F = w0 + Σ wi fi(Pi)
  fi = structure function, determined during regression
• PGAM: Pre-scaled Generalized Additive Models
      F = w0 + Σ wi fi(Pi)
  fi = structure function, determined prior to regression
• The constant term w0 is conditional climatology less the weighted mean bias of the scaled predictors
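The difference between the model forms is only how the predictors enter the sum. A minimal sketch of evaluating the MLR and PGAM forms, using hypothetical weights, predictor values, and structure functions (none of these numbers come from the deck):

```python
import math

# Hypothetical weights and predictor values for a two-predictor model
w0 = 5.0                      # conditional climatology term
w = [0.4, -1.1]               # predictor weightings
P = [2.0, 0.5]                # predictor values

# MLR: predictors enter linearly
F_mlr = w0 + sum(wi * Pi for wi, Pi in zip(w, P))

# PGAM: each predictor is passed through a structure function fixed
# before the regression (sqrt and log are illustrative choices only)
f = [math.sqrt, math.log]
F_pgam = w0 + sum(wi * fi(Pi) for wi, fi, Pi in zip(w, f, P))

print(F_mlr, F_pgam)
```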
Models Based on Regression
• Training data for one predictor
  – P: vector of predictor values
  – E: vector of observed events
• Residual: R² = ‖F_P – E‖²
• Regression solutions are obtained by adjusting the parametric description of the forecast model (parameters w) until the objective function J(w) = R² is minimized
• Multi-Linear Regression (MLR): J(w) = ‖Aw – E‖²
• MLR is solved by matrix algebra; the most stable solution is provided by the SVD decomposition of A
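The regression step itself is a few lines in practice. A sketch with synthetic training data (predictor values, weights, and noise level are illustrative assumptions), using numpy's SVD-based least-squares solver to minimize J(w) = ‖Aw – E‖²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: m trials, n predictors (illustrative values only)
m, n = 200, 3
P = rng.normal(size=(m, n))                      # predictor values
true_w = np.array([0.5, -1.2, 2.0])
E = 1.0 + P @ true_w + 0.1 * rng.normal(size=m)  # observed events

# Design matrix with a leading column of ones for the constant term w0
A = np.column_stack([np.ones(m), P])

# np.linalg.lstsq minimizes ||Aw - E||^2 via an SVD-based solver
w, res, rank, sv = np.linalg.lstsq(A, E, rcond=None)
print(w)   # w[0] recovers the constant term, w[1:] the predictor weights
```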
Regression and Correlation
• Training data for one predictor
  – P: vector of predictor values
  – E: vector of observed events
  – Error residual: R² = ‖F_P – E‖²
• Correlation Coefficient
      ρ(ΔP, ΔE) = (ΔP • ΔE) / (σ_ΔP σ_ΔE)
• Fundamental Relationship. Let F0 be a forecast equation with error residuals E0 (‖E0‖ = R0). Let W0 + W1 P be a BLUE correction for E0, and let F = F0 + (W0 + W1 P). The error residual R_F of F satisfies
      R_F² = R0² [ 1 – ρ(P, E0)² ]
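The fundamental relationship can be checked numerically. The sketch below builds synthetic residuals E0, fits the BLUE correction by least squares, and compares R_F² against R0²[1 – ρ²]; centering E0 reflects the fact that residuals of a fitted model have zero mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic error residuals E0 of a base forecast, and a candidate predictor P
m = 500
P = rng.normal(size=m)
E0 = 0.8 * P + rng.normal(size=m)
E0 = E0 - E0.mean()              # residuals of a fitted model have zero mean

# BLUE (least-squares) linear correction W0 + W1*P for E0
A = np.column_stack([np.ones(m), P])
W, *_ = np.linalg.lstsq(A, E0, rcond=None)
EF = E0 - A @ W                  # residual after the correction

R0_sq = np.sum(E0 ** 2)
RF_sq = np.sum(EF ** 2)
rho = np.corrcoef(P, E0)[0, 1]

# Fundamental relationship: R_F^2 = R_0^2 * (1 - rho(P, E0)^2)
print(np.isclose(RF_sq, R0_sq * (1 - rho ** 2)))
```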
Model Training Considerations
• Assumption: The training data are representative of what is expected during the implementation period
• Simple models are less likely to capture undesirable (non-stationary) short-term fluctuations in the training data
• The climatology of the training period should match that expected in the intended implementation period (decade scale)
• It is irrational to expect that short training periods can lead to models with long-term skill
  – Plan for repeated model tuning
  – Design self-tuning into the system
• It is desirable to have many more training cases than model parameters

"The only way to prepare for the future is to prepare to be surprised; that doesn't mean we have to be flabbergasted." (Kenneth Boulding)
GAM
• An established statistical technique, which uses the training data to define nonlinear scaling of the predictors
• The standard implementation represents the structure functions as B-splines with many knots, which requires the use of a large set of training data
• The forecast equations are determined by linear regression, including the nonlinear scaling of the predictors
      F = w0 + Σ wi fi(Pi)
• The objective is to minimize the error residual
• The structure functions are influenced by all of the predictors, and may change if the predictor mix is altered
• If a GAM model has p predictors and k knots per structure function, then the regression model has kp + 1 (linear) regression parameters
PGAM: Pre-scaled GAM
• A new statistical technique, which permits the use of training sets that are decidedly smaller than those for GAM
• Once the structure functions are selected, the forecast equations are determined by linear regression of the pre-scaled predictors
      F = w0 + Σ wi fi(Pi)
• Determination of the structure functions is based on enhancing the correlation of the (scaled) predictor with the error residual of conditional climatology:
      Maximize ρ( fi(Pi), ΔE )
• The structure function is determined for each predictor separately
• Composite predictors should be scaled as composites
• The structure functions often have interpretations in terms of scientific principles and forecasting techniques
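As one illustration of the idea (not the deck's actual fitting procedure), a structure function can be chosen from a hypothetical one-parameter family by maximizing its correlation with the climatology residual:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic predictor with a nonlinear relationship to the residual dE
# from conditional climatology (all values illustrative)
m = 400
P = rng.uniform(0.1, 2.0, size=m)
dE = np.sqrt(P) + 0.1 * rng.normal(size=m)

# Hypothetical one-parameter family of structure functions f(P) = P**a;
# choose the exponent whose scaled predictor correlates best with dE
def corr(a):
    return np.corrcoef(P ** a, dE)[0, 1]

exponents = np.linspace(0.1, 3.0, 30)
best = max(exponents, key=corr)
print(best, corr(best))
```

The chosen scaling does at least as well as the raw predictor (exponent 1), which is the point of the pre-scaling step.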
Predictors
• Every Method Involves a Choice of Predictors
• The Great Predictor Set: Everything relevant and available
• Possible Reduction based on Correlation Analysis
• Predictor Selection Strategies
  – Sequential Addition
  – Sequential Deletion
  – Ensemble Decision (SVD)
• Changing the predictor list changes the model weights; for GAM, it also changes the structure functions
Computing Solutions for the Basic Regression Problem
• Setting: predictor list { Pi } (n predictors) and observed outcomes b over the m trials of the training set
• Basic Linear Regression Problem
      A w = b
  where the columns of the m-by-n matrix A are the lists of observed predictor values over the trials
• Normal Equations: Aᵀ A w = Aᵀ b
• Linear Algebra: w = (Aᵀ A)⁻¹ Aᵀ b
• Optimization: Find w to minimize R² = ‖Aw – b‖²
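The normal-equations and SVD routes give the same minimizer on a well-conditioned problem; the SVD route is preferred when AᵀA is nearly singular. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Small synthetic regression problem: m trials, n predictors
m, n = 50, 4
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Normal-equations solution: w = (A^T A)^{-1} A^T b
w_normal = np.linalg.solve(A.T @ A, A.T @ b)

# SVD-based solution (what np.linalg.lstsq uses internally); numerically
# more stable when A is ill-conditioned
w_svd, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(w_normal, w_svd))   # both minimize ||Aw - b||^2
```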
SVD – Singular Value Decomposition
• A = U Σ Vᵀ, where U and V are orthogonal matrices and Σ = [ S | 0 ]ᵀ, with S diagonal with positive diagonal entries
• Then Uᵀ A w = Σ Vᵀ w = Uᵀ b. Set w̃ = Vᵀ w and b̃ = Uᵀ b
• Restatement of the Basic Problem
      Σ Vᵀ w = Uᵀ b   (original problem space)
      S w̃ = b̃   (Vᵀ-transformed problem space)
• Since U is orthogonal, the error residual is not altered by this restatement of the problem

CAUTION: Analysis of residuals can be misleading unless the dynamic ranges of the predictor values have been standardized
Structure of the Error Residual Vector
• The singular values si are usually decreasing
• sn > 0; otherwise, reduce the predictor list
• In the transformed problem S w̃ = b̃, the diagonal block of S decouples the components
• For i ≤ n, w̃i = b̃i / si
• For i > n, there is no solution. This is the portion of the problem that is not resolved by these predictors
• Magnitude of the unresolved portion of the problem:
      R*² = Σ(i=n+1..m) b̃i²
• Truncated Problem: For i > k, set w̃i = 0. This increases the error residual to
      Rk² = Σ(i=k+1..m) b̃i² = R*² + Σ(i=k+1..n) b̃i²
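The truncation bookkeeping can be verified directly: transform b with Uᵀ, solve the first k components, and confirm that the residual equals R*² plus the discarded b̃i² terms. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Overdetermined problem: m trials, n predictors
m, n = 30, 5
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Full SVD: A = U @ Sigma @ V.T, with singular values s in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=True)
b_t = U.T @ b                      # b-tilde, the transformed right-hand side

# Unresolved portion: R*^2 = sum of b_tilde_i^2 for i > n
R_star_sq = np.sum(b_t[n:] ** 2)

# Truncate to k components: w_tilde_i = b_tilde_i / s_i for i <= k, else 0
k = 3
w_t = np.zeros(n)
w_t[:k] = b_t[:k] / s[:k]
w = Vt.T @ w_t                     # back to the original problem space

# Check R_k^2 = R*^2 + sum of b_tilde_i^2 for k < i <= n
Rk_sq = np.sum((A @ w - b) ** 2)
print(np.isclose(Rk_sq, R_star_sq + np.sum(b_t[k:n] ** 2)))
```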
Controlling Predictor Selection
• SVD / PC analysis provides guidance
• Truncation in w̃-space reduces the degrees of freedom
• Truncation does not provide nulling of predictors: zero components of w̃ do not lead to zero components of w = V w̃
• Seek a linear forecast model of the form
      F(a) = aᵀ w = Σ wi ai,   where a is a vector of predictor values
• Predictor Nulling
  – The i-th predictor is eliminated from the problem if wi = 0
• Benefits of predictor nulling
  – Provides simple models
  – Eliminates designated predictors (the missing-data problem)
  – Quantifies the incremental benefit provided by essential predictors (the sensor-benefit problem)
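Predictor nulling amounts to refitting with a predictor's column removed and observing the residual growth, which is how the missing-data and sensor-benefit questions are quantified. A sketch with synthetic data in which one predictor is essential and one is noise:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic problem: predictors 0 and 1 genuinely contribute, predictor 2
# is pure noise (all values illustrative)
m = 200
P = rng.normal(size=(m, 3))
b = 1.5 * P[:, 0] + 0.8 * P[:, 1] + 0.05 * rng.normal(size=m)

def fit_residual(cols):
    """Least-squares residual norm^2 using only the listed predictor columns
    (nulling a predictor = refitting with its column removed)."""
    A = P[:, cols]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.sum((A @ w - b) ** 2)

full = fit_residual([0, 1, 2])
null_noise = fit_residual([0, 1])      # null the noise predictor: little cost
null_essential = fit_residual([0, 2])  # null an essential predictor: large cost
print(full, null_noise, null_essential)
```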
Predictor Selection Process
• Gross Predictor Selection (availability & correlation)
• SVD for problem sizing and gross error estimation
• Truncation and Predictor Nulling: maximal model(s) (there may be more than one good solution)
• Successive Elimination in the Original Problem Space: minimal model (until SD starts to grow rapidly)
• Successive Augmentation in the Original Problem Space
• At this point, the good solutions are bracketed between the maximal and the minimal models; exhaustive searches are probably feasible, and cross-validation is wise.
Creating 15z Satellite Forecast Models (1)
• 149 marine stratus days from 1996 to 2000
• 51 sectors and 3 potential predictors per sector (153 predictors)
• Compute the correlation of each predictor with the residual from conditional climatology
• Retaining only predictors with correlation greater than 0.25 reduces the predictor list to 45 predictors
• Separate analysis for two data sets, Raw and PGAM
• Truncate each when the SD reduction drops below 1.5%

[Figure: SD versus PC number (1–22) for the RAW and PGAM data sets]
Creating 15z Satellite Forecast Models (2)
• SVD: truncate at 6, then predictor nulling
• In the truncation space: null to 7 predictors with acceptable error growth
• Maximal problems (Raw: 8 predictors, PGAM: 7)
• Minimal problems (Raw: 5 predictors, PGAM: 4)
• Neither problem would accept augmentation according to the strict cross-validation test
• Different predictors were selected for the two data sets

[Table: Raw data (SVD PC-6 sigma 1.134; selected-model sigma 1.148) versus PGAM data (SVD PC-6 sigma 0.999; selected-model sigma 0.999), with the selected predictors for each, drawn from 15S2M, 15S8M, 15S1M, 15S8C, 15W11M, 15W8C, 15W1C, 15B3C, 15E5M, 15E4M, 15W12M, 15S6M, 15N4M]
Data Quality and Clustering
• DQA is similar to NWP
  – needs to be applied to the training set
  – probably need to work to tighter standards
• Data Clustering
  – During training: manual ++
  – For implementation: fully automated
• Conditional Climatology based on Clustering
Satellite Statistical Model (MIT/LL)
• 1-km visible channel (brightness)
• Data pre-processing
  – re-mapping to a 2-km grid
  – 3x3 median smoother
  – normalized for sun angle
  – calibrated for lens graying
• Grid points grouped into sectors
  – topography
  – physical forcing
  – operational areas
• Sector statistics
  – Brightness
  – Coverage
  – Texture
• 4-year data archive, 153 predictors
• PGAM Regression Analysis

[Figure: SECTORIZATION]
Consensus Forecast
[Diagram: Satellite SFM, Regional SFM, Local SFM, and COBEL forecasts pass through a forecast weighting function, driven by day characterization (wind direction, inversion height, forcing influences), to produce the consensus forecast]
Measuring Success
STATS:  BIAS 0.02 | SD 0.96 | MAE 0.68 | Correlation 0.76
BAYES SKILL:  Bayes SD 0.82 | Bayes Slope 0.66 | Bayes Skill 1.24 | Num: 148

PDF Skill by category:
Category  1700  1750  1800  1850  1900  1950  2000
POD       0.63  0.74  0.79  0.82  0.93  0.97  0.96
PFP       0.07  0.11  0.20  0.36  0.45  0.47  0.38
PSS       0.55  0.63  0.58  0.45  0.48  0.50  0.59
LR        8.44  6.71  3.87  2.25  2.06  2.05  2.57
ODDS      1.69  2.28  1.93  1.42  1.50  1.67  2.24
CAR       0.63  0.70  0.66  0.59  0.60  0.63  0.69

[Figure: +0.5 hour skill (CAR, PSS, POD, PFP, CC) versus category]
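The tabulated scores are internally consistent: assuming PSS here is the Peirce skill score, it should equal POD minus PFP, which the table values satisfy within rounding:

```python
# PSS check: Peirce skill score = POD - PFP (probability of detection
# minus probability of false positives), against the table values
pod = [0.63, 0.74, 0.79, 0.82, 0.93, 0.97, 0.96]
pfp = [0.07, 0.11, 0.20, 0.36, 0.45, 0.47, 0.38]
pss = [0.55, 0.63, 0.58, 0.45, 0.48, 0.50, 0.59]

for d, f, s in zip(pod, pfp, pss):
    assert abs((d - f) - s) <= 0.015   # agrees within two-decimal rounding
print("PSS = POD - PFP within rounding")
```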
Conclusions
• PGAM, SVD/PC analysis, and Predictor Nulling provide a systematic way to approach the development of linear forecast models via regression
• This methodology provides a way to investigate the elimination of specific predictors, which could be useful in the development of contingency models
• We are investigating full automation