# Building Statistical Forecast Models


MIT Lincoln Laboratory
Stat. Fcst. Models
Wes Wilson
11/8/2013

Building Statistical Forecast Models

Wes Wilson

MIT Lincoln Laboratory

April 2001

## Experiential Forecasting

Idea: base the forecast on observed outcomes in previous similar situations (training data).

Possible ways to evaluate and condense the training data:

- Categorization: seek comparable cases, usually expert-based
- Statistical: correlation and significance analysis
- Fuzzy Logic: combines expert and statistical analysis

Belief: incremental changes in the predictors relate to incremental changes in the predictand.

Issues:

- Requirements on the training data
- Development methodology
- Automation


## Outline

- Regression-based Models
- Predictor Selection
- Data Quality and Clustering
- Measuring Success
- An Example


## Statistical Forecast Models

Multi-Linear Regression:

$$F = w_0 + \sum_i w_i P_i$$

- $w_i$ = predictor weighting
- $w_0$ = conditional climatology less the weighted mean predictor values

GAM:

$$F = w_0 + \sum_i w_i f_i(P_i)$$

- $f_i$ = structure function, determined during regression

PGAM (Pre-scaled GAM):

$$F = w_0 + \sum_i w_i f_i(P_i)$$

- $f_i$ = structure function, determined prior to regression

The constant term $w_0$ is conditional climatology less the weighted mean bias of the scaled predictors.


## Models Based on Regression

Training data for one predictor:

- $P$: vector of predictor values
- $E$: vector of observed events

Residual:

$$R^2 = \| F_P - E \|^2$$

Regression solutions are obtained by adjusting the parametric description of the forecast model (parameters $w$) until the objective function $J(w) = R^2$ is minimized.

Multi-Linear Regression (MLR):

$$J(w) = \| A w - E \|^2$$

MLR is solved by matrix algebra; the most stable solution is provided by the SVD decomposition of $A$.
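As a minimal sketch of the MLR objective $J(w) = \|Aw - E\|^2$, the following fits weights to synthetic training data with `numpy`; the column of ones carries the constant term $w_0$, and the true weights used to generate the data are illustrative, not from the deck:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: m trials, 2 predictors, known weights
m = 200
P = rng.normal(size=(m, 2))
E = 1.5 + 2.0 * P[:, 0] - 0.5 * P[:, 1] + 0.1 * rng.normal(size=m)

# Design matrix A: a column of ones (for w0) plus the predictor columns
A = np.column_stack([np.ones(m), P])

# Minimize J(w) = ||A w - E||^2; lstsq is SVD-based, matching the
# deck's note that the SVD gives the most stable solution
w, *_ = np.linalg.lstsq(A, E, rcond=None)

print(w)  # approximately [1.5, 2.0, -0.5]
```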


## Regression and Correlation

Training data for one predictor:

- $P$: vector of predictor values
- $E$: vector of observed events

Error residual: $R^2 = \| F_P - E \|^2$

Correlation coefficient:

$$\rho(\Delta P, \Delta E) = \frac{\langle \Delta P, \Delta E \rangle}{\sigma_{\Delta P}\, \sigma_{\Delta E}}$$

Fundamental relationship: let $F_0$ be a forecast equation with error residuals $E_0$ ($\|E_0\| = R_0$). Let $W_0 + W_1 P$ be a BLUE correction for $E_0$, and let $F = F_0 + W_0 + W_1 P$. The error residual $R_F$ of $F$ satisfies

$$R_F^2 = R_0^2 \left[\, 1 - \rho(P, E_0)^2 \,\right]$$
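The fundamental relationship can be checked numerically. A sketch with synthetic data, assuming the baseline residuals $E_0$ are mean-zero (as residuals of a model with an intercept are):

```python
import numpy as np

rng = np.random.default_rng(1)

m = 500
P = rng.normal(size=m)
E0 = 0.8 * P + rng.normal(size=m)   # residuals of some baseline forecast F0
E0 = E0 - E0.mean()                 # baseline residuals are mean-zero

R0_sq = np.sum(E0 ** 2)

# BLUE correction W0 + W1*P for E0 (ordinary least squares)
A = np.column_stack([np.ones(m), P])
W, *_ = np.linalg.lstsq(A, E0, rcond=None)
RF_sq = np.sum((E0 - A @ W) ** 2)   # residual after applying the correction

rho = np.corrcoef(P, E0)[0, 1]

# RF^2 equals R0^2 * (1 - rho^2), up to floating-point error
print(RF_sq, R0_sq * (1 - rho ** 2))
```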


## Model Training Considerations

Assumption: the training data are representative of what is expected during the implementation period.

- Simple models are less likely to capture undesirable (non-stationary) short-term fluctuations in the training data
- The climatology of the training period should match that expected in the intended implementation period (decade scale)
- It is irrational to expect that short training periods can lead to models with long-term skill
- Plan for repeated model tuning
- Design self-tuning into the system
- It is desirable to have many more training cases than model parameters

*The only way to prepare for the future is to prepare to be surprised; that doesn't mean we have to be flabbergasted.* — Kenneth Boulding


## GAM

- An established statistical technique, which uses the training data to define a nonlinear scaling of the predictors
- The standard implementation represents the structure functions as B-splines with many knots, which requires the use of a large set of training data
- The forecast equations are determined by linear regression, including the nonlinear scaling of the predictors:

$$F = w_0 + \sum_i w_i f_i(P_i)$$

- The objective is to minimize the error residual
- The structure functions are influenced by all of the predictors, and may change if the predictor mix is altered
- If a GAM model has $p$ predictors and $k$ knots per structure function, then the regression model has $kp + 1$ (linear) regression parameters
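A toy illustration of the idea, not a full GAM: instead of B-splines with many knots, this sketch expands each predictor in a small piecewise-linear (hinge) basis and fits all basis coefficients jointly by linear regression, so the fitted block per predictor plays the role of a structure function $f_i(P_i)$. The knot locations and synthetic predictand are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

def hinge_basis(x, knots):
    """Piecewise-linear basis for one predictor: x plus one hinge per knot."""
    return np.column_stack([x] + [np.maximum(x - t, 0.0) for t in knots])

m = 400
P1 = rng.uniform(-2, 2, size=m)
P2 = rng.uniform(-2, 2, size=m)
E = np.sin(P1) + P2 ** 2 + 0.05 * rng.normal(size=m)   # nonlinear predictand

knots = [-1.0, 0.0, 1.0]

# Purely linear model, for comparison
A_lin = np.column_stack([np.ones(m), P1, P2])
w_lin, *_ = np.linalg.lstsq(A_lin, E, rcond=None)
rss_lin = np.sum((E - A_lin @ w_lin) ** 2)

# GAM-style model: one basis block per predictor, fit jointly
A_gam = np.column_stack([np.ones(m),
                         hinge_basis(P1, knots),
                         hinge_basis(P2, knots)])
w_gam, *_ = np.linalg.lstsq(A_gam, E, rcond=None)
rss_gam = np.sum((E - A_gam @ w_gam) ** 2)

print(rss_gam < rss_lin)  # the nonlinear scaling reduces the residual
```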


## PGAM: Pre-scaled GAM

- A new statistical technique, which permits the use of training sets that are decidedly smaller than those for GAM
- Once the structure functions are selected, the forecast equations are determined by linear regression of the pre-scaled predictors:

$$F = w_0 + \sum_i w_i f_i(P_i)$$

- Determination of the structure functions is based on enhancing the correlation of the (scaled) predictor with the error residual of conditional climatology: maximize $\rho(\, f_i(P_i),\ \Delta E \,)$
- The structure function is determined for each predictor separately
- Composite predictors should be scaled as composites
- The structure functions often have interpretations in terms of scientific principles and forecasting techniques
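The per-predictor selection step can be sketched as follows: for one predictor, choose from a small candidate family the scaling that maximizes $|\rho(f(P), \Delta E)|$, where $\Delta E$ is taken here as the residual from the climatological mean. The candidate transforms and the synthetic data are assumptions for the example, not the deck's actual structure functions:

```python
import numpy as np

rng = np.random.default_rng(3)

m = 300
P = rng.uniform(0.1, 3.0, size=m)
E = np.log(P) + 0.1 * rng.normal(size=m)   # observed events
dE = E - E.mean()                          # residual of (mean) climatology

# Candidate structure functions; PGAM-style scaling picks, per predictor,
# the one whose scaled values correlate best with the climatology residual
candidates = {"identity": lambda x: x,
              "square":   lambda x: x ** 2,
              "log":      np.log}

best = max(candidates,
           key=lambda name: abs(np.corrcoef(candidates[name](P), dE)[0, 1]))
print(best)  # "log" recovers the underlying scaling
```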


## Predictors

- Every method involves a choice of predictors
- The great predictor set: everything relevant and available
- Possible reduction based on correlation analysis
- Predictor selection strategies:
  - Sequential deletion
  - Ensemble decision (SVD)
- Changing the predictor list changes the model weights; for GAM, it also changes the structure functions


## Computing Solutions for the Basic Regression Problem

Setting: predictor list $\{P_i\}_{i=1}^{n}$ and observed outcomes $b$ over the $m$ trials of the training set.

Basic linear regression problem:

$$A w = b$$

where the columns of the $m \times n$ matrix $A$ are the lists of observed predictor values over the trials.

- Normal equations: $A^T A w = A^T b$
- Linear algebra: $w = (A^T A)^{-1} A^T b$
- Optimization: find $w$ to minimize $R^2 = \| A w - b \|^2$
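The three formulations above give the same minimizer; a quick sketch on random, well-conditioned data comparing the normal-equations route with an SVD-based least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(4)

m, n = 50, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Normal equations: w = (A^T A)^{-1} A^T b
w_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Optimization view: minimize R^2 = ||A w - b||^2 (SVD-based solver)
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(w_normal, w_lstsq))  # True: same minimizer, different routes
```

For ill-conditioned $A$ the normal equations square the condition number, which is why the deck recommends the SVD.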


## SVD

Singular Value Decomposition:

$$A = U \Sigma V^T$$

where $U$ and $V$ are orthogonal matrices and $\Sigma = [\, S \mid 0 \,]^T$, with $S$ diagonal with positive diagonal entries.

$$U^T A w = \Sigma V^T w = U^T b$$

Set $\tilde{w} = V^T w$ and $\tilde{b} = [U^T b]_n$.

Restatement of the basic problem:

$$S V^T w = \tilde{b} \ \text{(original problem space)} \qquad \text{or} \qquad S \tilde{w} = \tilde{b} \ \text{($V^T$-transformed problem space)}$$

Since $U$ is orthogonal, the error residual is not altered by this restatement of the problem.

CAUTION: analysis of residuals can be misleading unless the dynamic ranges of the predictor values have been standardized.
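The restatement can be sketched directly with `numpy`'s SVD: solve the diagonal system $S\tilde{w} = \tilde{b}$ and map back with $w = V\tilde{w}$, then confirm this reproduces the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(5)

m, n = 40, 4
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Thin SVD: A = U S V^T with U (m x n), s the n singular values, Vt (n x n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Transformed problem: S w~ = b~ with w~ = V^T w, b~ = first n entries of U^T b
b_tilde = U.T @ b
w_tilde = b_tilde / s          # solve the diagonal system
w = Vt.T @ w_tilde             # map back to the original problem space

w_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(w, w_ref))   # True: the SVD route gives the LS solution
```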


## Structure of the Error Residual Vector

- The $s_i$'s are usually decreasing
- $s_n > 0$; otherwise, reduce the predictor list
- For $i \le n$: $\tilde{w}_i = \tilde{b}_i / s_i$
- For $i > n$, there is no solution; this is the portion of the problem that is not resolved by these predictors

Magnitude of the unresolved portion of the problem:

$$R_*^2 = \sum_{i=n+1}^{m} \tilde{b}_i^2$$

[Diagram: the diagonal system $S\tilde{w} = \tilde{b}$, with $\mathrm{diag}(s_1, \ldots, s_n)$ acting on $(\tilde{w}_1, \ldots, \tilde{w}_n)$ and the components $\tilde{b}_{n+1}, \ldots, \tilde{b}_m$ left unmatched]

Truncated problem: for $i > k$, set $\tilde{w}_i = 0$. This increases the error residual to

$$R_k^2 = \sum_{i=k+1}^{m} \tilde{b}_i^2 = R_*^2 + \sum_{i=k+1}^{n} \tilde{b}_i^2$$
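The truncation formula can be verified numerically: compute $\tilde{b} = U^T b$ from a full SVD, form $R_k^2$ from its components, and compare against the residual of a truncated-SVD fit (sizes and data are arbitrary for the sketch):

```python
import numpy as np

rng = np.random.default_rng(6)

m, n = 30, 5
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

U, s, Vt = np.linalg.svd(A)   # full SVD: U is m x m
b_tilde = U.T @ b             # b~ in the rotated problem space

# Unresolved portion: components of b~ beyond the n singular values
R_star_sq = np.sum(b_tilde[n:] ** 2)

# Truncate to k components: w~_i = 0 for i > k (0-based: indices k..n-1)
k = 3
R_k_sq = R_star_sq + np.sum(b_tilde[k:n] ** 2)

# Cross-check against the residual of the truncated-SVD solution
w = Vt[:k].T @ (b_tilde[:k] / s[:k])
print(np.isclose(R_k_sq, np.sum((A @ w - b) ** 2)))  # True
```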


## Controlling Predictor Selection

- SVD / PC analysis provides guidance
- Truncation in $\tilde{w}$ space reduces the degrees of freedom
- Truncation does not provide nulling of predictors, since zero components of $\tilde{w}$ do not lead to zero components of $w = V\tilde{w}$

Seek a linear forecast model of the form

$$F(a) = a^T w = \sum_i w_i a_i, \qquad a = \text{predictor values}$$

Predictor nulling: the $i$-th predictor is eliminated from the problem if $w_i = 0$.

Benefits of predictor nulling:

- Provides simple models
- Eliminates designated predictors (*missing data problem*)
- Quantifies the incremental benefit provided by essential predictors (*sensor benefit problem*)
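A simple sketch of the nulling idea: setting $w_i = 0$ is equivalent to refitting with that predictor's column removed, and the residual growth quantifies the predictor's incremental benefit. The true weights below are assumptions chosen so one predictor is genuinely useless:

```python
import numpy as np

rng = np.random.default_rng(7)

m, n = 100, 4
A = rng.normal(size=(m, n))
b = A @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.normal(size=m)

def rss(cols):
    """Residual sum of squares of the LS fit using only the given predictors."""
    w, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    return np.sum((A[:, cols] @ w - b) ** 2)

full = rss([0, 1, 2, 3])

# Null each predictor in turn and report the residual growth; the
# zero-weight predictor (index 1) costs essentially nothing to remove
for i in range(n):
    kept = [j for j in range(n) if j != i]
    print(i, rss(kept) - full)
```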


## Predictor Selection Process

- Gross predictor selection (availability & correlation)
- SVD for problem sizing and gross error estimation
- Truncation and predictor nulling (there may be more than one good solution)
- Successive elimination in the original problem space
- Successive augmentation in the original problem space

At this point, the good solutions are bracketed between the maximal and the minimal models; exhaustive searches are probably feasible, and cross-validation is wise.
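The successive-elimination step can be sketched as a greedy loop: repeatedly drop the predictor whose removal hurts least, while the residual stays within a tolerance of the full fit. The tolerance rule and the synthetic true weights are assumptions for the illustration:

```python
import numpy as np

rng = np.random.default_rng(8)

m, n = 120, 5
A = rng.normal(size=(m, n))
b = A @ np.array([1.5, 0.0, 0.0, -0.8, 0.3]) + 0.1 * rng.normal(size=m)

def rss(cols):
    """Residual sum of squares of the LS fit on the given predictor columns."""
    w, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    return np.sum((A[:, cols] @ w - b) ** 2)

cols = list(range(n))
tol = 1.5 * rss(cols)          # allow modest residual growth over the full fit
while len(cols) > 1:
    # Candidate = predictor whose removal increases the residual least
    cand = min(cols, key=lambda i: rss([j for j in cols if j != i]))
    if rss([j for j in cols if j != cand]) > tol:
        break                  # removing anything more would cost real skill
    cols.remove(cand)

print(sorted(cols))            # the zero-weight predictors have been dropped
```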


## Creating 15z Satellite Forecast Models (1)

- 149 marine stratus days from 1996 to 2000
- 51 sectors and 3 potential predictors per sector (153 predictors)
- Compute the correlation of each predictor with the residual from conditional climatology
- Retaining only predictors whose correlation exceeds 0.25 reduces the list to 45 predictors
- Separate analysis for two data sets, Raw and PGAM
- Truncate each when the SD reduction drops below 1.5%

[Figure: SD reduction vs. PC number (1–22) for the RAW (scale 0.80–1.30) and PGAM (scale 0.80–1.10) data sets]
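The correlation-screening step above can be sketched on synthetic data: compute each candidate predictor's correlation with the climatology residual and keep those above the 0.25 threshold. The data, the number of candidates, and which predictors carry signal are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(9)

m, n = 149, 10                  # trials x candidate predictors
P = rng.normal(size=(m, n))
# Residual from conditional climatology, driven by two of the predictors
resid = P[:, 0] - 0.8 * P[:, 3] + rng.normal(size=m)

# Retain only predictors whose |correlation| with the residual exceeds 0.25
corr = np.array([np.corrcoef(P[:, i], resid)[0, 1] for i in range(n)])
kept = np.flatnonzero(np.abs(corr) > 0.25)
print(kept)                     # includes the signal-bearing predictors 0 and 3
```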

## Creating 15z Satellite Forecast Models (2)

- SVD in the truncation space: null to 7 predictors with acceptable error growth
- Maximal problems (R-8, P-7)
- Minimal problems (R-5, P-4)
- Neither problem would accept augmentation according to the strict cross-validation test
- Different predictors were selected

|             | Raw Data (SVD Raw 6) | PGAM Data (SVD PGAM 6) |
|-------------|----------------------|------------------------|
| Sigma PC 6  | 1.134                | 0.999                  |
| Sigma       | 1.148                | 0.999                  |

Selected predictors (as listed): 15S2M, 15S8M, 15S8M, 15S1M, 15S8C, 15W11M, 15W8C, 15W1C, 15B3C, 15E5M, 15E4M, 15W12M, 15S1M, 15S6M, 15N4M

## Data Quality and Clustering

- DQA is similar to NWP:
  - need to do the training set
  - probably need to work to tighter standards
- Data clustering:
  - during training: manual ++
  - for implementation: fully automated
- Conditional climatology based on clustering


## Satellite Statistical Model (MIT/LL)

- 1-km visible channel (brightness)
- Data pre-processing:
  - re-mapping to 2 km grid
  - 3x3 median smoother
  - normalized for sun angle
  - calibrated for lens graying
- Grid points grouped into sectors (sectorization):
  - topography
  - physical forcing
  - operational areas
- Sector statistics: brightness, coverage, texture
- 4-year data archive, 153 predictors
- PGAM regression analysis
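The 3x3 median smoother in the pre-processing chain can be sketched in a few lines of `numpy`; edge replication is an assumption here, since the deck does not say how borders are handled:

```python
import numpy as np

def median3x3(img):
    """3x3 median smoother with edge replication."""
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views of the image and take the pointwise median
    stacked = np.stack([padded[r:r + img.shape[0], c:c + img.shape[1]]
                        for r in range(3) for c in range(3)])
    return np.median(stacked, axis=0)

# A flat brightness field with one speckle; the median suppresses it
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0
smooth = median3x3(img)
print(smooth[2, 2])  # 10.0: the isolated outlier is removed
```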


## Consensus Forecast

[Diagram: the Satellite SFM, Regional SFM, Local SFM, and COBEL forecasts feed a forecast weighting function, which produces the consensus forecast.]

Day characterization inputs to the weighting function:

- Wind direction
- Inversion height
- Forcing influences
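The combination step amounts to a weight-normalized blend of the component forecasts. The forecast values and weights below are hypothetical; in the deck's system the weights would come from the day-characterization inputs:

```python
# Hypothetical component forecasts (the deck's SFMs and COBEL) and weights
forecasts = {"satellite": 1800.0, "regional": 1750.0, "local": 1820.0, "cobel": 1790.0}
weights   = {"satellite": 0.4,    "regional": 0.2,    "local": 0.1,    "cobel": 0.3}

# Consensus = weight-normalized combination of the component forecasts
total = sum(weights.values())
consensus = sum(weights[k] * forecasts[k] for k in forecasts) / total
print(consensus)  # 1789.0
```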


## Measuring Success

| STATS       |      |
|-------------|------|
| BIAS        | 0.02 |
| SD          | 0.96 |
| MAE         | 0.68 |
| Correlation | 0.76 |

| BAYES SKILL |      |
|-------------|------|
| Bayes SD    | 0.82 |
| Bayes Slope | 0.66 |
| Bayes Skill | 1.24 |
| Num         | 148  |

PDF skill by forecast category:

| Category | 1700 | 1750 | 1800 | 1850 | 1900 | 1950 | 2000 |
|----------|------|------|------|------|------|------|------|
| POD      | 0.63 | 0.74 | 0.79 | 0.82 | 0.93 | 0.97 | 0.96 |
| PFP      | 0.07 | 0.11 | 0.20 | 0.36 | 0.45 | 0.47 | 0.38 |
| PSS      | 0.55 | 0.63 | 0.58 | 0.45 | 0.48 | 0.50 | 0.59 |
| LR       | 8.44 | 6.71 | 3.87 | 2.25 | 2.06 | 2.05 | 2.57 |
| ODDS     | 1.69 | 2.28 | 1.93 | 1.42 | 1.50 | 1.67 | 2.24 |
| CAR      | 0.63 | 0.70 | 0.66 | 0.59 | 0.60 | 0.63 | 0.69 |

[Figure: +0.5 hour skill (CAR, PSS, POD, PFP, CC) by category, 1700–2000, on a 0.0–1.0 scale]
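The categorical scores in the table can be computed from a 2x2 contingency table. A sketch with illustrative counts (not the deck's verification data), assuming PFP denotes the probability of false detection, so that PSS = POD − PFP:

```python
# 2x2 contingency table: hits, misses, false alarms, correct nulls
hits, misses, false_alarms, correct_nulls = 63, 37, 7, 93

pod = hits / (hits + misses)                          # probability of detection
pfp = false_alarms / (false_alarms + correct_nulls)   # probability of false detection
pss = pod - pfp                                       # Peirce (true) skill score

print(round(pod, 2), round(pfp, 2), round(pss, 2))    # 0.63 0.07 0.56
```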

## Conclusions

- PGAM, SVD/PC analysis, and predictor nulling provide a systematic way to approach the development of linear forecast models via regression
- This methodology provides a way to investigate the elimination of specific predictors, which could be useful in the development of contingency models
- We are investigating full automation