Building Statistical Forecast Models

MIT Lincoln Laboratory

Stat. Fcst. Models

Wes Wilson
11/8/2013

Building Statistical Forecast Models

Wes Wilson

MIT Lincoln Laboratory

April, 2001


Experiential Forecasting


Idea: Base the forecast on observed outcomes in previous similar situations (training data)

Possible ways to evaluate and condense the training data
  Categorization: seek comparable cases, usually expert-based
  Statistical: correlation and significance analysis
  Fuzzy Logic: combines expert and statistical analysis

Belief: Incremental changes in the predictors relate to incremental changes in the predictand

Issues
  Requirements on the Training Data
  Development Methodology
  Automation



Outline


Regression-based Models


Predictor Selection


Data Quality and Clustering


Measuring Success


An Example


Statistical Forecast Models


Multi-Linear Regression
  F = w_0 + Σ w_i P_i
  w_i = Predictor Weighting
  w_0 = Conditional Climatology (at mean predictor values)

GAM: Generalized Additive Models
  F = w_0 + Σ w_i f_i(P_i)
  f_i = Structure Function, determined during regression

PGAM: Pre-scaled Generalized Additive Models
  F = w_0 + Σ w_i f_i(P_i)
  f_i = Structure Function, determined prior to regression

The constant term w_0 is conditional climatology less the weighted mean bias of the scaled predictors
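The three model forms differ only in what enters the linear regression. A minimal NumPy sketch (illustrative, not the original implementation; the toy data and the supplied scalings are assumptions standing in for PGAM structure functions):

```python
import numpy as np

def fit_linear(P, E):
    """Least-squares fit of F = w_0 + sum_i w_i P_i.
    P: (m, n) predictor matrix; E: (m,) observed events."""
    A = np.column_stack([np.ones(len(E)), P])   # constant column yields w_0
    w, *_ = np.linalg.lstsq(A, E, rcond=None)
    return w                                    # w[0] = w_0, w[1:] = weights

def fit_pgam(P, E, structure_funcs):
    """PGAM: apply pre-chosen structure functions, then linear regression."""
    scaled = np.column_stack([f(P[:, i]) for i, f in enumerate(structure_funcs)])
    return fit_linear(scaled, E)

# Toy training set with a cubic dependence on the second predictor
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 2))
E = 1.0 + 2.0 * P[:, 0] - 0.5 * P[:, 1] ** 3 + 0.1 * rng.normal(size=100)

w_mlr = fit_linear(P, E)                                   # plain MLR
w_pgam = fit_pgam(P, E, [lambda x: x, lambda x: x ** 3])   # assumed scalings
```

With the correct scaling supplied up front, the PGAM fit resolves the cubic term that plain MLR cannot.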


Models Based on Regression


Training Data for one predictor
  P = vector of predictor values
  E = vector of observed events

Residual
  R^2 = || F(P) - E ||^2

Regression solutions are obtained by adjusting the parametric description of the forecast model (parameters w) until the objective function J(w) = R^2 is minimized

Multi-Linear Regression (MLR)
  J(w) = || A w - E ||^2

MLR is solved by matrix algebra; the most stable solution is provided by the SVD decomposition of A


Regression and Correlation


Training Data for one predictor
  P = vector of predictor values
  E = vector of observed events

Error Residual: R^2 = || F(P) - E ||^2

Correlation Coefficient
  r(P, E) = <ΔP · ΔE> / (σ_ΔP σ_ΔE)
where Δ denotes departure from the mean

Fundamental Relationship. Let F_0 be a forecast equation with error residuals E_0 (||E_0|| = R_0). Let W_0 + W_1 P be a BLUE correction for E_0, and let F = F_0 + W_0 + W_1 P. The error residual R_F of F satisfies

  R_F^2 = R_0^2 [ 1 - r(P, E_0)^2 ]
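A quick numeric check of this relationship (a sketch; it assumes the baseline forecast F_0 is unbiased, so that E_0 has zero mean):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
P = rng.normal(size=m)                      # predictor values
E0 = 0.7 * P + rng.normal(size=m)           # residuals of a baseline forecast F_0
E0 -= E0.mean()                             # assume F_0 is unbiased

R0 = np.linalg.norm(E0)
r = np.corrcoef(P, E0)[0, 1]

W1, W0 = np.polyfit(P, E0, 1)               # BLUE (least-squares) correction
RF = np.linalg.norm(E0 - (W0 + W1 * P))     # residual of the corrected forecast

# R_F^2 = R_0^2 [ 1 - r^2 ], up to floating-point error
assert np.isclose(RF ** 2, R0 ** 2 * (1 - r ** 2))
```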




Model Training Considerations


Assumption: The training data are representative of what is expected during the implementation period

Simple models are less likely to capture undesirable (non-stationary) short-term fluctuations in the training data

The climatology of the training period should match that expected in the intended implementation period (decade scale)

It is irrational to expect that short training periods can lead to models with long-term skill
  Plan for repeated model tuning
  Design self-tuning into the system

It is desirable to have many more training cases than model parameters

"The only way to prepare for the future is to prepare to be surprised; that doesn't mean we have to be flabbergasted." (Kenneth Boulding)


GAM


An established statistical technique, which uses the training data to define nonlinear scaling of the predictors

The standard implementation represents the structure functions as B-splines with many knots, which requires a large set of training data

The forecast equations are determined by linear regression including the nonlinear scaling of the predictors

  F = w_0 + Σ_i w_i f_i(P_i)

The objective is to minimize the error residual

The structure functions are influenced by all of the predictors, and may change if the predictor mix is altered

If a GAM model has p predictors and k knots per structure function, then the regression model has kp + 1 (linear) regression parameters



PGAM: Pre-scaled GAM

A new statistical technique, which permits the use of training sets that are decidedly smaller than those for GAM

Once the structure functions are selected, the forecast equations are determined by linear regression of the pre-scaled predictors

  F = w_0 + Σ w_i f_i(P_i)

Determination of the structure functions is based on enhancing the correlation of the (scaled) predictor with the error residual of conditional climatology

  Maximize r( f_i(P_i), ΔE )

The structure function is determined for each predictor separately

Composite predictors should be scaled as composites

The structure functions often have interpretations in terms of scientific principles and forecasting techniques
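Selecting a structure function by maximizing correlation with the climatology residual can be sketched as a search over candidate scalings (illustrative; the candidate set and toy data are assumptions, not the original procedure):

```python
import numpy as np

def best_structure_function(P_i, dE, candidates):
    """Pick the scaling f maximizing |corr(f(P_i), dE)|, where dE is the
    error residual of conditional climatology."""
    def corr(x, y):
        return np.corrcoef(x, y)[0, 1]
    return max(candidates.items(), key=lambda kv: abs(corr(kv[1](P_i), dE)))

# Toy predictor whose true relationship to the residual is logarithmic
rng = np.random.default_rng(2)
P_i = rng.uniform(0.1, 3.0, size=150)
dE = np.log(P_i) + 0.05 * rng.normal(size=150)

candidates = {"identity": lambda x: x,
              "log": np.log,
              "square": lambda x: x ** 2}
name, f = best_structure_function(P_i, dE, candidates)
```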


Predictors


Every Method Involves a Choice of Predictors

The Great Predictor Set: everything relevant and available

Possible Reduction based on Correlation Analysis

Predictor Selection Strategies
  Sequential Addition
  Sequential Deletion
  Ensemble Decision (SVD)

Changing the predictor list changes the model weights; for GAM, it also changes the structure functions


Computing Solutions for the
Basic Regression Problem


Setting: Predictor list { P_i } (i = 1..n) and observed outcomes b over the m trials of the training set

Basic Linear Regression Problem

  A w = b

where the columns of the m-by-n matrix A are the lists of observed predictor values over the trials

Normal Equations: A^T A w = A^T b

Linear Algebra: w = (A^T A)^{-1} A^T b

Optimization: Find w to minimize R^2 = || A w - b ||^2
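The two solution routes can be checked against each other in NumPy (a sketch on a well-conditioned toy problem; `np.linalg.lstsq` is itself SVD-based):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 3
A = rng.normal(size=(m, n))                     # predictor values over m trials
b = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=m)

# Normal equations: solve A^T A w = A^T b
w_normal = np.linalg.solve(A.T @ A, A.T @ b)

# SVD-based least squares, numerically preferable for ill-conditioned A
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(w_normal, w_lstsq)
```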


SVD


Singular Value Decomposition

  A = U Σ V^T, where U and V are orthogonal matrices,
  and Σ = [ S | 0 ]^T, where S is diagonal with positive diagonal entries

  U^T A w = Σ V^T w = U^T b

Set w~ = V^T w,  b~ = [U^T b]_n

Restatement of the Basic Problem

  S V^T w = b~                    or    S w~ = b~
  (original problem space)              (V^T-transformed problem space)

Since U is orthogonal, the error residual is not altered by this restatement of the problem

CAUTION: Analysis of residuals can be misleading unless the dynamic ranges of the predictor values have been standardized
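The restatement can be verified numerically: solving S w~ = b~ in the transformed space and mapping back with V reproduces the least-squares solution, and the residual is exactly the unmatched tail of b~ (a sketch with assumed toy data):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 40, 4
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # A = U @ Sigma @ Vt
b_tilde = U.T @ b                                  # transformed outcomes

w_tilde = b_tilde[:n] / s                          # solve S w~ = b~
w = Vt.T @ w_tilde                                 # back to the original space

# The residual is exactly the unmatched part of b~
R2 = np.linalg.norm(A @ w - b) ** 2
assert np.isclose(R2, np.sum(b_tilde[n:] ** 2))

# And w agrees with the direct least-squares solution
w_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(w, w_ref)
```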


Structure of the Error Residual Vector


The singular values s_i are usually decreasing

s_n > 0, or reduce the predictor list

For i <= n,  w~_i = b~_i / s_i

For i > n, there is no solution. This is the portion of the problem that is not resolved by these predictors

Magnitude of the unresolved portion of the problem:
  R*^2 = Σ_{i=n+1}^{m} b~_i^2

[Figure: the transformed system S w~ = b~; diagonal entries s_1 ... s_n map w~_1 ... w~_n onto b~_1 ... b~_n, while components b~_{n+1} ... b~_m are unmatched]

Truncated Problem: For i > k, set w~_i = 0. This increases the error residual to

  R_k^2 = Σ_{i=k+1}^{m} b~_i^2 = R*^2 + Σ_{i=k+1}^{n} b~_i^2
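The truncation formula can be checked directly (a sketch with assumed toy data; k runs over all truncation levels):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 40, 6
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
b_t = U.T @ b
R_star2 = np.sum(b_t[n:] ** 2)          # unresolved portion R*^2

def truncated_residual2(k):
    """R_k^2 after setting w~_i = 0 for components beyond the first k."""
    w_t = np.where(np.arange(n) < k, b_t[:n] / s, 0.0)
    w = Vt.T @ w_t
    return np.linalg.norm(A @ w - b) ** 2

for k in range(n + 1):
    # R_k^2 = R*^2 + sum_{i=k+1}^{n} b~_i^2
    assert np.isclose(truncated_residual2(k), R_star2 + np.sum(b_t[k:n] ** 2))
```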


Controlling Predictor Selection


SVD / PC analysis provides guidance

Truncation in w~ space reduces the degrees of freedom

Truncation does not provide nulling of predictors, since 0 components of w~ do not lead to 0 components of w = V w~

Seek a linear forecast model of the form
  F(a) = a^T w = Σ w_i a_i,  where a is a vector of predictor values

Predictor Nulling: the i-th predictor is eliminated from the problem if w_i = 0

Benefits of predictor nulling
  Provides simple models
  Eliminates designated predictors (missing-data problem)
  Quantifies the incremental benefit provided by essential predictors (sensor-benefit problem)
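Predictor nulling amounts to refitting with the nulled columns removed; the residual growth then quantifies each predictor's incremental benefit (an illustrative sketch with assumed toy data and thresholds):

```python
import numpy as np

rng = np.random.default_rng(6)
m = 60
A = rng.normal(size=(m, 4))
# True weights: predictor 1 is irrelevant, the others are essential
b = A @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.normal(size=m)

def residual2(cols):
    """Best residual using only the listed predictors (the rest nulled)."""
    w, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    return np.linalg.norm(A[:, cols] @ w - b) ** 2

full = residual2([0, 1, 2, 3])
assert residual2([0, 2, 3]) < 1.5 * full   # nulling the irrelevant predictor is cheap
assert residual2([1, 2, 3]) > 10 * full    # nulling an essential one is not
```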



Predictor Selection Process


Gross Predictor Selection (availability & correlation)

SVD for problem sizing and gross error estimation

Truncation and Predictor Nulling
  yields maximal model(s)
  (there may be more than one good solution)

Successive Elimination in the Original Problem Space
  yields the minimal model (null until the SD starts to grow rapidly)

Successive Augmentation in the Original Problem Space

At this point, the good solutions are bracketed between the maximal and the minimal models; exhaustive searches are probably feasible, and cross-validation is wise.
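The elimination stage can be sketched as backward elimination guarded by cross-validation (illustrative; the tolerance `tol` and the fold scheme are assumptions, not the original procedure):

```python
import numpy as np

def cv_rmse(A, b, folds=5):
    """K-fold cross-validated RMSE of a linear least-squares model."""
    idx = np.arange(len(b))
    errs = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        w, *_ = np.linalg.lstsq(A[train], b[train], rcond=None)
        errs.append(A[test] @ w - b[test])
    return np.sqrt(np.mean(np.concatenate(errs) ** 2))

def successive_elimination(A, b, tol=1.1):
    """Drop predictors one at a time while CV error stays below tol * base."""
    cols = list(range(A.shape[1]))
    base = cv_rmse(A, b)
    while len(cols) > 1:
        score, drop = min((cv_rmse(A[:, [c for c in cols if c != d]], b), d)
                          for d in cols)
        if score > tol * base:
            break                 # every remaining predictor is essential
        cols.remove(drop)
    return cols

rng = np.random.default_rng(7)
A = rng.normal(size=(80, 5))
b = 3.0 * A[:, 0] - 2.0 * A[:, 2] + 0.1 * rng.normal(size=80)
kept = successive_elimination(A, b)   # the essential predictors survive
```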


Creating 15z Satellite Forecast Models (1)


149 marine stratus days from 1996 to 2000

51 sectors and 3 potential predictors per sector (153 predictors)

Compute the correlation of each predictor with the residual from conditional climatology

Retaining only predictors whose correlation exceeds 0.25 reduces the predictor list to 45 predictors

Separate analysis for two data sets, Raw and PGAM

Truncate each when the SD reduction drops below 1.5%



[Plots: residual SD vs. PC number for the RAW and PGAM truncation analyses]

Creating 15z Satellite Forecast Models (2)


SVD, then Truncate 6, then Predictor Nulling

In the Truncation space: null to 7 predictors with acceptable error growth

Maximal Problems (R-8, P-7)

Minimal Problems (R-5, P-4)

Neither problem would accept augmentation according to the strict cross-validation test

Different predictors were selected

[Table: model sigmas: Raw Data 1.148; SVD Raw 6 (PC 6) 1.134; PGAM Data 0.999; SVD PGAM 6 (PC 6) 0.999. Selected predictor codes include 15S2M, 15S8M, 15S1M, 15S8C, 15W11M, 15W8C, 15W1C, 15B3C, 15E5M, 15E4M, 15W12M, 15S6M, 15N4M]

Data Quality and Clustering


DQA is similar to NWP
  need to do the training set
  probably need to work to tighter standards

Data Clustering
  During training: manual ++
  For implementation: fully automated

Conditional Climatology based on Clustering


Satellite Statistical Model
(MIT/LL)


1-km visible channel (brightness)

Data pre-processing
  re-mapping to a 2 km grid
  3x3 median smoother
  normalized for sun angle
  calibrated for lens graying

Grid points grouped into sectors
  topography
  physical forcing
  operational areas

Sector statistics
  Brightness
  Coverage
  Texture

4-year data archive, 153 predictors

PGAM Regression Analysis

[Figure: SECTORIZATION]


Consensus Forecast

[Diagram: Satellite SFM, Regional SFM, Local SFM, and COBEL feed a Forecast Weighting Function; Day Characterization (wind direction, inversion height, forcing influences) informs the weighting; the output is the Consensus Forecast]


Measuring Success

STATS: BIAS 0.02; SD 0.96; MAE 0.68; Correlation 0.76
BAYES SKILL: Bayes SD 0.82; Bayes Slope 0.66; Bayes Skill 1.24; Num: 148

PDF Skill (+.5 Hour Skill) by category:

Category  1700  1750  1800  1850  1900  1950  2000
POD       0.63  0.74  0.79  0.82  0.93  0.97  0.96
PFP       0.07  0.11  0.20  0.36  0.45  0.47  0.38
PSS       0.55  0.63  0.58  0.45  0.48  0.50  0.59
LR        8.44  6.71  3.87  2.25  2.06  2.05  2.57
ODDS      1.69  2.28  1.93  1.42  1.50  1.67  2.24
CAR       0.63  0.70  0.66  0.59  0.60  0.63  0.69

[Chart: +.5 Hour Skill; CAR, PSS, POD, PFP, and CC plotted against category 1700 to 2000 on a 0.0 to 1.0 scale]
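For reference, the categorical scores in the table obey simple contingency-table identities; in particular the PSS and LR rows equal POD - PFP and POD / PFP up to rounding. A sketch with hypothetical counts (not the slide's data; the score definitions are assumed standard):

```python
def skill_scores(hits, misses, false_alarms, correct_nulls):
    """Categorical scores from a 2x2 contingency table.
    Assumed definitions: POD = hit rate, PFP = false-positive rate,
    PSS = POD - PFP (Peirce skill score), LR = POD / PFP."""
    pod = hits / (hits + misses)
    pfp = false_alarms / (false_alarms + correct_nulls)
    return pod, pfp, pod - pfp, pod / pfp

# Hypothetical counts for illustration only
pod, pfp, pss, lr = skill_scores(63, 37, 7, 93)
assert (round(pod, 2), round(pfp, 2), round(pss, 2)) == (0.63, 0.07, 0.56)
```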

Conclusions


PGAM, SVD/PC, and Predictor Nulling provide a systematic way to approach the development of linear forecast models via regression

This methodology provides a way to investigate the elimination of specific predictors, which could be useful in the development of contingency models

We are investigating full automation