# Consequences of the ergodic theorems for classical test theory, factor analysis, and the analysis of developmental processes


Oct 10, 2013 (4 years and 8 months ago)


Peter C.M. Molenaar

The Pennsylvania State University


1. Introduction

The currently dominant approach to statistical analysis in psychology and biomedicine is based on analysis of inter-individual variation. Differences between subjects, drawn from a population of subjects, provide the information for making inferences about states of affairs at the population level (e.g., mean and/or covariance structure). This approach underlies all standard statistical analysis techniques such as analysis of variance, regression analysis, path analysis, factor analysis, cluster analysis, and multilevel modeling techniques. Whether the data are obtained in cross-sectional or longitudinal designs (or more elaborated designs such as sequential designs), the statistical analysis always is focused on the structure of inter-individual variation. Parameters and statistics of interest are estimated by pooling across subjects, where these subjects are assumed to be homogeneous in all relevant respects. This is the hallmark of analysis of inter-individual variation: the sums defining the estimators in statistical analysis are taken over different subjects randomly drawn from a population of presumably homogeneous subjects. In mixed modeling the population is considered to be composed of different subpopulations, but within each subpopulation subjects again are assumed to be homogeneous.

In the next section definitions will be given of inter-individual variation and homogeneity of a population of subjects, but the intuitive content of these concepts is clear. These intuitions would seem to imply that states of affairs at the population level obtained by pooling across subjects constitute general findings that apply to each subject in the homogeneous population. Yet in general this is not the case. That is, in general it is not true that states of affairs at the population level based on analysis of inter-individual variation apply to any of the individual subjects making up the population. This negative result is a direct implication of a set of mathematical-statistical theorems: the so-called classical ergodic theorems (cf. Molenaar, 2004).

A concise heuristic description of the classical ergodic theorems will be given below. The main focus of this chapter, however, will be on some of the implications of these theorems. For instance, it will be shown that classical test theory is based on assumptions that violate the classical ergodic theorems, and hence, in a precise sense to be defined later on, the results of classical test theory do not apply in individual assessments. This, of course, is a serious shortcoming of classical test theory, because many psychological tests have been constructed and standardized according to classical test theory and are applied in the assessment of individual subjects.

Special emphasis will be given to the fact that developmental systems constitute prime examples of non-ergodic systems having age-dependent statistical characteristics (mean trends and sequential dependencies). Therefore the statistical analysis of developmental processes has to be based not on inter-individual variation, as now is the standard approach, but on intra-individual variation (where the latter type of variation will be defined in the next section). It will be indicated that the insistence that developmental processes should be studied at the individual level has a long history in theoretical developmental psychology. The classical ergodic theorems provide a definite vindication of this theoretical line of thought.

At the close of this chapter a new statistical modeling technique will be presented with which it is possible to analyze developmental processes with age-dependent statistical characteristics at the required intra-individual level. This modeling technique is based on advanced engineering methods for the analysis of complex dynamic systems. It will be shown that the new modeling technique allows for the optimal guidance of ongoing developmental processes at the intra-individual level. Evidently, this opens up entirely new possibilities for applied developmental psychological science.

2. Preliminaries

In this section definitions will be given of the main concepts used in this chapter. The given definition of (non-)ergodicity is heuristic; selected references will be given to the vast literature on ergodic theory for more formal elaborations.

2.1 Unit of analysis. Each actually existing human being can be conceived of as a high-dimensional integrated system whose behavior evolves as a function of place and time. In psychology one usually does not consider place, leaving time as the dimension of main interest. The system includes important functional subsystems such as the perceptual, emotional, cognitive and physiological systems, as well as their dynamic interrelationships. The complete set of measurable time-dependent variables characterizing the system’s behavior can be represented as the coordinates of a high-dimensional space (cf. Nayfeh & Balachandran, 1993, Ch. 1), which will be called the behavior space. According to De Groot (1954), the behavior space contains all the scientifically relevant […] The realized values of all measurable variables for a particular individual at consecutive time points constitute a trajectory (life history) in behavior space. This trajectory in behavior space is our basic unit of analysis. Accordingly, the complete set of life histories of a population of human subjects can be represented as an ensemble of trajectories in the same behavior space.

2.2 Inter- and intra-individual variation. A standard dictionary definition of variation is: “The degree to which something differs, for example, from a former state or value, from others of the same type, or from a standard”. The degree to which something differs implies a comparison, either between different replicates of the same type of entity (inter-individual variation) or else between consecutive temporal states of the same individual entity (intra-individual variation). Based on this dictionary definition and using the construct of an ensemble of life trajectories defined in the previous section, it is possible to give appropriate definitions of inter- and intra-individual variation. The following definitions are inspired by Cattell’s (1952) notion of the Data Box.

With respect to an ensemble of trajectories in behavior space, inter-individual variation is defined as follows: (i) select a fixed subset of variables; (ii) select one or more fixed time points as measurement occasions; (iii) determine the variation of the scores on the selected variables at the selected time points by pooling across subjects. Analysis of inter-individual variation thus defined is called R-technique by Cattell (1952). In contrast, intra-individual variation is defined as follows: (i) select a fixed subset of variables; (ii) select a fixed subject; (iii) determine the variation of the scores of the single subject on the selected variables by pooling across time points. Analysis of intra-individual variation thus defined is called P-technique by Cattell (1952).
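The two definitions can be illustrated with a small sketch. It assumes, purely for illustration, that the Data Box is stored as a NumPy array indexed by (subject, occasion, variable); the function names and the simulated scores are hypothetical and not part of the chapter.

```python
import numpy as np


def inter_individual_variance(data_box, time_index):
    """R-technique style: fix one occasion, pool across subjects."""
    # data_box has shape (subjects, occasions, variables)
    return data_box[:, time_index, :].var(axis=0, ddof=1)


def intra_individual_variance(data_box, subject_index):
    """P-technique style: fix one subject, pool across occasions."""
    return data_box[subject_index, :, :].var(axis=0, ddof=1)


# Hypothetical data box: 50 subjects, 200 occasions, 3 variables
rng = np.random.default_rng(0)
data_box = rng.normal(size=(50, 200, 3))

r_var = inter_individual_variance(data_box, time_index=0)
p_var = intra_individual_variance(data_box, subject_index=0)
print("variance across subjects at occasion 0:", r_var)
print("variance across occasions for subject 0:", p_var)
```

Both functions use the same selected variables; they differ only in which dimension of the box is held fixed and which is pooled over, which is the distinction drawn above.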

2.3 Ergodicity. We now can present a heuristic definition of ergodicity in terms of the concepts defined in the previous sections. The foundational question is: Given the same set of selected variables (of Cattell’s Data Box), under which conditions will an analysis of inter-individual variation yield the same results as an analysis of intra-individual variation? To illustrate this question: under which conditions will factor analysis of inter-individual covariation yield a factor solution that is equal to factor analysis of intra-individual covariation? The latter illustration can be rephrased in terms of Cattell’s Data Box in the following way: Under which conditions will R-technique factor analysis of inter-individual covariation yield a solution that equals the analogous P-technique factor solution of intra-individual covariation?

The general answer to this question is provided by the classical ergodic theorems (cf. Molenaar, 2004; Molenaar, 2003, chapter 3). The answer is: Only if the ensemble of time-dependent trajectories in behavior space obeys two rigorous conditions will an analysis of inter-individual variation yield the same results as an analysis of intra-individual variation. The two conditions concerned are the following. Firstly, the trajectory of each subject in the ensemble has to obey exactly the same dynamical laws (homogeneity of the ensemble). Secondly, each trajectory should have constant statistical characteristics in time (stationarity, i.e., constant mean level and serial dependencies). In case either one (or both) of these two conditions is not met, the psychological process concerned is non-ergodic, i.e., its structure of inter-individual variation will differ from its structure of intra-individual variation. For a non-ergodic process, the results obtained in standard analysis of inter-individual variation do not apply at the individual level of intra-individual variation.
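A small simulation can make the homogeneity condition concrete. The sketch below is only an illustration under simplifying assumptions that are not in the chapter: scores are independent normal draws (so every series is stationary), and heterogeneity is introduced through person-specific means and standard deviations. All names and numerical values are hypothetical.

```python
import numpy as np


def simulate_ensemble(means, sds, n_occasions, rng):
    """Return an array (persons x occasions) of independent normal scores.

    Row i is person i's time series, drawn around that person's own mean
    with that person's own standard deviation.
    """
    means = np.asarray(means, dtype=float)[:, None]
    sds = np.asarray(sds, dtype=float)[:, None]
    noise = rng.standard_normal((means.shape[0], n_occasions))
    return means + sds * noise


def inter_and_intra_variance(ensemble, occasion=0, person=0):
    """Variance across persons at one occasion, and across occasions for one person."""
    inter = ensemble[:, occasion].var(ddof=1)
    intra = ensemble[person, :].var(ddof=1)
    return inter, intra


rng = np.random.default_rng(1)
n_persons, n_occasions = 1000, 1000

# Homogeneous ensemble: every person follows the same law
homogeneous = simulate_ensemble(np.zeros(n_persons), np.ones(n_persons), n_occasions, rng)

# Heterogeneous ensemble: person-specific means and standard deviations
heterogeneous = simulate_ensemble(
    rng.normal(0.0, 2.0, n_persons),
    rng.uniform(0.2, 3.0, n_persons),
    n_occasions,
    rng,
)

for label, ensemble in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    inter, intra = inter_and_intra_variance(ensemble)
    print(f"{label}: inter = {inter:.2f}, intra (person 0) = {intra:.2f}")
```

In the homogeneous case both variances estimate the same quantity. In the heterogeneous case the variance across persons mixes between-person differences in mean level with the average within-person variance, so it need not resemble the variance of any single person's series.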

The meaning of the homogeneity and stationarity assumptions will be elaborated more fully in later sections, starting with the section on classical test theory below. The requirement that each subject in the ensemble should obey the same dynamical laws is expressed in the language of ergodic theory, which has its roots in the theoretical foundations of statistical mechanics. Statistical mechanics arose as the attempt by Boltzmann to explain the equilibrium characteristics of a homogeneous gas kept under constant pressure and temperature in a container, where the atoms of the homogeneous gas each obey Newton’s laws of motion. Nowadays ergodic theory is an independent mathematical discipline; standard introductions are Petersen (1983) and Walters (1982). An excellent recent monograph is Choe (2005). The theorem which for the ensuing discussion is the most important one in the set of classical ergodic theorems has been proven by Birkhoff (1931).
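For orientation, Birkhoff's pointwise ergodic theorem can be stated as follows in standard measure-theoretic notation. The symbols $X$, $\mathcal{B}$, $\mu$, $T$ and $f$ are notation introduced here for the statement and are not used elsewhere in this chapter.

```latex
Let $(X, \mathcal{B}, \mu)$ be a probability space, let $T : X \to X$ be a
measure-preserving transformation, and let $f \in L^1(\mu)$. Then for
$\mu$-almost every $x \in X$ the time average
\[
  \bar{f}(x) \;=\; \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f\bigl(T^k x\bigr)
\]
exists. If, in addition, $T$ is ergodic, then
\[
  \bar{f}(x) \;=\; \int_X f \, d\mu
  \qquad \text{for $\mu$-almost every } x \in X .
\]
```

Read heuristically in the terms of this chapter, the left-hand side is an average along a single trajectory (intra-individual), while the right-hand side is an average over the ensemble (inter-individual); the two coincide only under the conditions of the theorem.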

3. The non-ergodicity of classical test theory

Many of the psychological tests currently in use have been constructed according to the principles of classical test theory. The basic concept in classical test theory is the concept of true score: each observed score is conceived of as a linear combination of a true score and an error score. In their authoritative book on classical test theory, Lord & Novick (1968) define the concept of true score as follows. They consider a fixed person P, i.e., P is not randomly drawn from some population but is the given person for whom the true score is to be defined. The true score of P is defined as the expected value of the propensity distribution of P’s observed scores. The propensity distribution is characterized as a “... distribution function defined over repeated statistically independent measurements on the same person” (Lord & Novick, 1968, p. 30). The concept of error score then follows straightforwardly: the error score is the difference between the observed score and the true score.

Several aspects of this definition of true score are noteworthy. The definition is based on the intra-individual variation characterizing a fixed person P. Repeated administration of the same test to P yields a time series of scores of P, the mean level of which is defined to be P’s true score. Hence this definition of true score does not involve any comparison with other persons and therefore is not at all dependent on inter-individual variation. The single-subject repeated measures design used to obtain P’s time series of observed scores is akin to standard psychophysical measurement designs (e.g., Gescheider, 1997). Lord & Novick (1968) require that the repeated measurements are independent. This implies that the time series of P’s scores should lack any sequential dependencies (autocorrelation). At the close of this section we will further discuss the requirement that repeated measurements have to be independent.

Lord & Novick (1968, p. 30) do not further elaborate their original definition of true score in the context of intra-individual variation because: “… it is not possible in psychology to obtain more than a few independent observations”. Instead of considering an arbitrarily large number of replicated measurements of a single fixed person P, Lord & Novick (1968, p. 32) shift attention to an alternative scheme in which an arbitrarily large number of persons is measured at a single fixed time: “Primarily, test theory treats individual differences or, equivalently, the distribution of measurements over people”. Apparently it is expected that using an individual differences approach, valid information can be obtained about the distinct propensity distributions underlying individual true scores. We will see shortly that this expectation is unwarranted.

Before focusing in the remainder of their book solely on the latter definition of true score based on inter-individual variation, Lord & Novick (1968, p. 32) make the following interesting comment about their initial definition of true score based on intra-individual variation: “The true and error scores defined above [based on intra-individual variation; PM] are not those primarily considered in test theory … They are, however, those that would be of interest to a theory that deals with individuals, rather than with groups (counseling rather than selection)”. This is a remarkable, though somewhat oblique statement. What is clear is that Lord & Novick consider a test theory based on their initial concept of true score, defined as the mean of the intra-individual variation characterizing a fixed person P, to be “… of interest to a theory that deals with individuals …”. That is, they consider such a test theory based on intra-individual variation to be important in the context of individual assessment. But what is not clear is whether they also consider the alternative concept of true score based on inter-individual variation (individual differences) to be not of interest to a theory that deals with individuals. That is, do they imply that classical test theory as we know it is only appropriate for the assessment of groups and not for individuals? It will be shown that classical test theory indeed is inappropriate for individual assessment.

To summarize the discussion thus far: Lord & Novick (1968) define the concept of true score as the expected value of the propensity distribution of the observed scores of a given individual person P. This definition of true score based on intra-individual variation then is used in an inter-individual context focused on individual differences, i.e., classical test theory as we know it. This raises the all-important question whether the information provided by individual differences (inter-individual variation) is able to determine the individual propensity distributions to a degree which is sufficient to apply the concept of true score based on intra-individual variation. It is noted that this is exactly the question concerning the ergodicity of the psychological process concerned: for a given test, will an analysis of inter-individual variation of test scores yield the same results as an analysis of intra-individual variation of test scores? To answer this question it has to be established whether the psychological process presumed by classical test theory to underlie the generation of test scores obeys the two criteria for ergodicity.

The psychological process which according to classical test theory underlies the generation of test scores is very simple. It is implicit in the definition of true score given by Lord & Novick (1968). Each individual person P is assumed to generate a time series of independent scores in response to repeated administration of the same test. Each observed score of P’s time series constitutes an independent random sample drawn from P’s propensity distribution. Hence there exists a one-to-one relationship between the time series of P’s observed test scores and P’s propensity distribution. The psychological process underlying P’s time series of observed scores therefore is characterized, according to classical test theory, by P’s propensity distribution. Statistical analysis of P’s intra-individual variation boils down to statistical analysis based on P’s propensity distribution. Classical test theory only considers the first two moments of P’s propensity distribution (its mean and its variance).

According to classical test theory the propensity distributions of different persons have different means and different variances. The true score of person $P_1$ (i.e., the mean of the propensity distribution of $P_1$) will in general differ from the true score of person $P_2$. Also the variance of $P_1$’s observed scores will in general differ from the variance of $P_2$’s observed scores. Hence, given the one-to-one correspondence between individual time series and individual propensity distributions noted above, the ensemble involving persons $P_i$, $i = 1, 2, \ldots$, is populated by time series (propensity distributions) which have different mean levels (means of the propensity distributions) and different variances. Clearly such an ensemble is entirely heterogeneous: the psychological process according to which $P_i$’s time series of observed scores is generated is different from the psychological process according to which $P_k$’s time series of observed scores is generated because, for $i \neq k$, the underlying propensity distribution of $P_i$ has mean and variance different from $P_k$’s propensity distribution. Consequently the ensemble of time series (propensity distributions) violates at least one of the two criteria for ergodicity: the trajectories (time series) in the ensemble do not obey the homogeneity criterion for ergodicity, i.e., trajectories associated with different persons do not obey exactly the same dynamical laws. Stated more specifically, the random motion characterizing time series of observed scores in the ensemble has different mean level and variance for different persons. Consequently, the psychological process which according to classical test theory underlies the generation of test scores is non-ergodic. That is, it follows from the classical ergodic theorems that results obtained in an analysis of inter-individual variation (individual differences) of test scores based on classical test theory do not apply at the individual level of intra-individual variation. In short, the results obtained with classical test theory do not apply in the context of individual assessment.

3.1 Some formal elaborations

We will now present some simple formal elaborations showing the invalidity of classical test theory for individual assessment. In particular we will focus on the concept of reliability as defined in classical test theory, show how estimation of an individual’s true score in classical test theory depends upon the reliability of the test, and indicate why this leads to invalid inferences. In what follows expressions related to classical test theory are based on Lord & Novick (1968).

Consider first the situation with respect to the definition of true score based on intra-individual variation. A particular test has been selected (it will be understood in the rest of this section that the same test is being considered). Also a particular person P is given. Let $y(P,t)$, $t = 1, 2, \ldots$ denote the time series of P’s scores obtained by repeatedly administering the test. The number of repeated measurements is left undefined: it is understood that this number can be taken to be arbitrarily large. Then the true score of P, $\tau(P)$, is defined as the expected value (mean) of $y(P,t)$ across all repeated measurements t. Notice that $\tau(P)$ is a constant. The variance of $y(P,t)$ across all repeated measurements is denoted by $\sigma^2(P)$. The variance $\sigma^2(P)$ is a measure of the reliability of a single score $y(P,t=T)$ which is obtained at the T-th repeated measurement (T arbitrary), conceived of as an indicator of P’s true score $\tau(P)$. If $\sigma^2(P)$ is large, $y(P,t=T)$ can be very different from $\tau(P)$, whereas if $\sigma^2(P)$ is small its value will be close to $\tau(P)$.

To reiterate, in classical test theory one does not consider an arbitrarily large number of repeated measurements of a single person P, but instead considers an arbitrarily large number of persons measured at a single time T. This is the shift from an intra-individual variation perspective underlying the concept of true score to an inter-individual variation perspective underlying classical test theory as we know it. Accordingly we consider an ensemble of time series of test scores associated with different persons $P_i$, $i = 1, 2, \ldots$, where the number of persons can be taken arbitrarily large. Associated with each distinct person $P_i$ is a distinct propensity distribution which has, as explained above, a one-to-one relationship with the psychological process according to which $P_i$ generates his/her time series of observed test scores. The mean (true score) of the propensity distribution of $P_i$ is $\tau(P_i)$ and the observed score of $P_i$ is $y(P_i, t=T)$, where T is arbitrary but fixed. To ease the presentation we will denote $\tau(P_i)$ as $\tau_i$ and $y(P_i, t=T)$ as $y_i$. The error score associated with $y(P_i, t=T) = y_i$ is $\varepsilon(P_i, t=T)$ and will be denoted as $\varepsilon_i$.

We now are ready to express the basic relationships of classical test theory:

(1a) $y_i = \tau_i + \varepsilon_i$, $i = 1, 2, \ldots$

(1b) $\mathrm{var}[y_i] = \mathrm{var}[\tau_i] + \mathrm{var}[\varepsilon_i]$.

According to (1a) the observed score $y_i$ of a randomly selected person $P_i$ is a linear combination of the true score $\tau_i$ and the error score $\varepsilon_i$ of $P_i$. According to (1b) the variance of observed scores across persons consists of a linear combination of the variance of the true scores across persons and the variance of the error scores across persons. The reliability $\rho$ of the test then is defined as:

(1c) $\rho = \mathrm{var}[\tau_i] \,/\, \{\mathrm{var}[\tau_i] + \mathrm{var}[\varepsilon_i]\}$.

Hence the reliability $\rho$ is the proportion of true score variance across persons in the total variance of observed scores across persons.

Now suppose that the reliability $\rho$ of our test is given and that the observed score $y_i$ of person $P_i$ is also given. Then the following so-called Kelley estimator of the true score $\tau_i$ of $P_i$ can be defined (cf. Lord & Novick, 1968, p. 65, formula 3.7.2a):

(2a) $\mathrm{est}[\tau_i \mid y_i] = \rho\, y_i + (1 - \rho)\,\mu$

where $\mu$ is the mean of observed scores across persons. The error variance associated with the Kelley estimator (2a) is (Lord & Novick, 1968, p. 68, formula 3.8.4a):

(2b) $\mathrm{var}\{\mathrm{est}[\tau_i \mid y_i]\} = \mathrm{var}[y_i]\,(1 - \rho)\,\rho$.

Expressions (2a) and (2b) show that the estimate and associated standard error of a person’s true score in classical test theory are a direct function of the test reliability $\rho$. The reliability itself is according to (1c) a direct function of the variance of error scores $\mathrm{var}[\varepsilon_i]$ across persons. Hence the Kelley estimate (2a) of a person’s true score is a direct function of the error variance $\mathrm{var}[\varepsilon_i]$ across persons.
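The dependence on the reliability can be traced in a few lines of code. This is a minimal sketch, assuming that the reliability (1c) is the ratio of true score variance to total variance, that (2a) weights the observed score by the reliability and the group mean by its complement, and that (2b) reads as the observed score variance times (1 - reliability) times reliability. The function names and the numerical values are hypothetical.

```python
def reliability(var_true, var_error):
    """Expression (1c): share of true score variance in observed score variance."""
    return var_true / (var_true + var_error)


def kelley_estimate(y_i, rho, mu):
    """Expression (2a): shrink the observed score toward the group mean."""
    return rho * y_i + (1.0 - rho) * mu


def kelley_error_variance(var_y, rho):
    """Expression (2b), as read here: var[y] * (1 - rho) * rho."""
    return var_y * (1.0 - rho) * rho


# Hypothetical test: true score variance 80, error variance 20 across persons
rho = reliability(var_true=80.0, var_error=20.0)
estimate = kelley_estimate(y_i=130.0, rho=rho, mu=100.0)
error_variance = kelley_error_variance(var_y=100.0, rho=rho)
print(rho, estimate, error_variance)
```

Every quantity entering these functions is defined across persons. Nothing in the computation refers to the variance of the particular person's own propensity distribution.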

We have reached the conclusion that in classical test theory based on analysis of inter-individual variation (individual differences), the estimate of a person’s true score as well as the standard error of this estimated true score depend directly upon the reliability $\rho$ of the test. In contrast, it was indicated at the beginning of this section that the variance $\sigma^2(P)$ of the propensity distribution describing P’s intra-individual variation is a measure of the reliability of a single score $y(P,t=T)$ estimating P’s true score $\tau(P)$.

Hence we have two different concepts of reliability: an intra-individual definition in which the reliability is given by $\sigma^2(P)$ and an inter-individual definition in which the reliability is a direct function of $\mathrm{var}[\varepsilon_i]$. Given that the definition of true score as the mean of a person P’s propensity distribution is the starting point of both concepts of reliability, the definition of reliability in terms of the intra-individual variance $\sigma^2(P)$ is basic. The question then arises whether the classical test theoretical definition of reliability in terms of the inter-individual error variance $\mathrm{var}[\varepsilon_i]$ is a good approximation of $\sigma^2(P)$. The answer to this question is given by the following expression (Lord & Novick, 1968, p. 35, formula 2.6.4):

(3) $\mathrm{var}[\varepsilon_i] = E_i[\sigma^2(P_i)]$

where $E_i$ denotes the expectation taken over persons $P_i$, $i = 1, 2, \ldots$. Expression (3) states that the inter-individual error variance $\mathrm{var}[\varepsilon_i]$ is the mean of the intra-individual variances of individual propensity distributions across persons $P_i$, $i = 1, 2, \ldots$.
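Expression (3) can be checked numerically. The sketch below is an illustration under assumptions not made in the chapter: propensity distributions are taken to be normal, and the person-specific true scores and standard deviations are drawn from arbitrary hypothetical distributions.

```python
import numpy as np


def simulate_single_occasion(n_persons, rng):
    """One observed score per person at a fixed occasion T (hypothetical values)."""
    tau = rng.normal(100.0, 10.0, n_persons)     # person-specific true scores
    sigma = rng.uniform(1.0, 15.0, n_persons)    # person-specific intra-individual SDs
    y = tau + sigma * rng.standard_normal(n_persons)
    return tau, sigma, y


rng = np.random.default_rng(2)
tau, sigma, y = simulate_single_occasion(5000, rng)

var_eps = np.var(y - tau, ddof=1)    # left-hand side of (3): error variance across persons
mean_intra = np.mean(sigma ** 2)     # right-hand side of (3): mean of individual variances
spread = (sigma.min() ** 2, sigma.max() ** 2)

print(f"var[eps_i] = {var_eps:.1f}, mean of sigma^2(P_i) = {mean_intra:.1f}")
print(f"individual sigma^2(P_i) range from {spread[0]:.1f} to {spread[1]:.1f}")
```

The two sides of (3) agree up to sampling error, while the individual variances being averaged spread widely around that common value.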

So, coming to our final verdict, how good an approximation is (3) for each of the individual variances $\sigma^2(P_i)$, $i = 1, 2, \ldots$? Given that the number of persons in the ensemble is taken to be arbitrarily large, and given that the $\sigma^2(P_i)$, $i = 1, 2, \ldots$ can differ arbitrarily according to classical test theory, it is immediately clear that in general (3) bears no relationship to any of the variances of the individual propensity distributions. Hence (3) is a poor approximation to the variances $\sigma^2(P_i)$ of the individual propensity distributions. Suppose that (3) is small, which implies that the (inter-individual) reliability $\rho$ is high. This leaves entirely open the possibility that the variance $\sigma^2(P)$ of a given person P’s propensity distribution is arbitrarily large (the psychological process generating test scores is heterogeneous, hence non-ergodic). Estimation of P’s true score by means of the Kelley estimator (2a) then will yield a severely biased result. Also the standard error (2b) of this estimate will be severely biased, suggesting an illusory high precision of the Kelley estimate. Only the actual value of $\sigma^2(P)$ will provide the correct precision of taking P’s observed score as an estimate of P’s true score. The true value of $\sigma^2(P)$ only can be estimated in an analysis of P’s intra-individual variation. That is, the test should be repeatedly administered to P, yielding a time series of P’s observed scores. The mean of P’s time series of observed scores constitutes an unbiased estimate of P’s true score, and the standard deviation of P’s time series of observed scores provides an unbiased estimate of the precision of P’s estimated true score.

3.2 Fundamental reasons or contingent circumstances

This section presents a critical discussion of the reasons why Lord & Novick (1968), after having defined the concept of true score in terms of intra-individual variation, do not further pursue an intra-individual foundation for test theory and instead adopt an inter-individual perspective. It will be argued that their reasons for doing so are not fundamental, but pertain to contingent circumstances that can be dealt with by means of appropriate statistical-methodological techniques.

The key remark leading up to the rejection of the possibility of a test theory based on intra-individual variation is the following: Characterizing the propensity distribution associated with the time series of a given person P’s observed test scores, Lord & Novick (1968, p. 30) require that the “... distribution function [is] defined over repeated statistically independent measurements on the same person”. The important qualification is that the repeated measurements should be statistically independent. This implies the requirement that P’s time series of observed test scores should lack sequential dependencies (e.g., autocorrelation).

After having postulated the requirement of obtaining statistically independent observed scores, Lord & Novick (1968, p. 30) conclude: “… it is not possible in psychology to obtain more than a few independent observations”. This is the reason why they do not consider the possibility of a test theory based on intra-individual variation to be feasible. In general test scores obtained in a single-subject time series design will be sequentially dependent, i.e., have significant autocorrelation. Moreover, the statistical properties of the psychological process according to which test scores are generated may change in time. For instance, the process concerned may be vulnerable to learning and habituation influences which induce time-dependent changes in the way test scores are being generated.

Before scrutinizing the details of Lord & Novick’s (1968) requirement that repeated measurements of the same person P should be statistically independent, we first consider their reason not to pursue a test theory based on intra-individual variation. Because the basic concept underlying classical test theory, the concept of true score, is defined at the level of intra-individual variation, one would expect that the reason to leave that level and move to a different level of inter-individual variation would have to be a fundamental reason. One would expect to be given an argument involving issues of logical necessity or impossibility. Yet the actual argument given by Lord & Novick (1968) concerns more an issue of contingent character: repeated measurement of the same person P yields test scores that are in general not statistically independent. Indeed, all psychometricians will agree. But the statistical analysis techniques used to determine P’s propensity distribution can accommodate the presence of sequential dependencies, and then we still can have a test theory which is directly based on the concept of true score as defined by Lord & Novick (1968). That is, a test theory based on intra-individual variation which would be of interest for individual assessment and counseling. The reason which Lord & Novick (1968) give for not further pursuing such a test theory is not fundamental and does not prove the impossibility of such a theory.

We now turn to discussion of the requirement that repeatedly measuring the same person P should yield a time series of statistically independent scores. To reiterate, no psychometrician will expect this to occur: repeated measurement of the same person generally will yield a time series of sequentially dependent scores. But is this problematic? The time series of scores provides the information to determine the propensity distribution characterizing person P. In particular, the mean and variance of P’s propensity distribution have to be determined. This is a standard problem in the statistical analysis of time series that has been completely solved in case the time series is stationary (cf. Anderson, 1971). Hence the important requirement is not that P’s time series should consist of statistically independent scores, but that the time series is stationary. Stationarity of a time series implies that the series has constant mean level and that its autocorrelation only depends upon the relative distance (lag) between measurement occasions.
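As an illustration of how sequential dependence can be accommodated, the sketch below estimates the mean and variance of a stationary but autocorrelated series and attaches a standard error to the mean that allows for the autocorrelation. It assumes a first-order autoregressive (AR(1)) process, which is a simplification chosen here and not a model proposed in the chapter; the function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_ar1(n, mean, phi, innovation_sd, rng):
    """Stationary AR(1): y[t] - mean = phi * (y[t-1] - mean) + e[t]."""
    y = np.empty(n)
    # Start from the stationary distribution so there is no initial transient
    y[0] = mean + rng.normal(0.0, innovation_sd / np.sqrt(1.0 - phi ** 2))
    for t in range(1, n):
        y[t] = mean + phi * (y[t - 1] - mean) + rng.normal(0.0, innovation_sd)
    return y


def mean_and_standard_error(y):
    """Sample mean, variance, lag-1 autocorrelation, and an AR(1)-adjusted SE of the mean."""
    n = len(y)
    centered = y - y.mean()
    var = centered @ centered / n
    r1 = (centered[1:] @ centered[:-1]) / (centered @ centered)
    # Large-sample variance of the mean under AR(1): var / n * (1 + r1) / (1 - r1)
    se = np.sqrt(var / n * (1.0 + r1) / (1.0 - r1))
    return y.mean(), var, r1, se


rng = np.random.default_rng(3)
series = simulate_ar1(20000, mean=50.0, phi=0.6, innovation_sd=1.0, rng=rng)
mean, var, r1, se = mean_and_standard_error(series)
naive_se = np.sqrt(var / len(series))   # SE that wrongly assumes independence
print(f"mean = {mean:.2f}, variance = {var:.2f}, lag-1 autocorrelation = {r1:.2f}")
print(f"adjusted SE = {se:.4f}, naive SE = {naive_se:.4f}")
```

With positive autocorrelation the adjusted standard error exceeds the naive one, but the mean and variance of the series remain estimable; what the estimation relies on is stationarity, not independence.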

The alternative requirement that a time series has to be stationary can be tested for in several ways (cf. Priestley, 1988). In case such tests indicate that the series is non-stationary, it can be analyzed by means of special techniques such as evolutionary spectrum analysis (Priestley, 1988) or wavelet analysis (e.g., Hogan & Lakey, 2005; Houtveen & Molenaar, 2001). At the close of this chapter a new modeling technique for multivariate non-stationary time series will be presented. Hence from a statistical analytic point of view non-stationary time series can be handled satisfactorily. Yet from the point of view of a test theory based on intra-individual variation, a person P’s time series of test scores should be stationary in order to allow estimation of the constant mean and constant variance of P’s time-invariant propensity distribution. In case P’s time series of test scores is non-stationary, the mean and/or variance of the series will in general be time-varying. Lord & Novick’s (1968) definition of true score, however, does not pertain to time-varying propensity distributions with time-varying means and/or variances.

Hence either methodological or statistical techniques have to be invoked in
order to guarantee that P’s time series of test scores is stationary. Only then
can the (constant) mean and variance of P’s time series be used as estimates of
the mean and variance of P’s propensity distribution. Methodological techniques
can be used to guarantee that non-stationarity due to learning and habituation
is avoided. For instance, using a common approach in reaction time research,
registration of P’s time series of test scores should begin only once P has
reached a steady state after an initial transient due to novelty effects. This
will require the availability of a pool of many parallel test items in order to
avoid learning effects. Statistical techniques can be used a posteriori to
remove transient effects due to habituation and learning from P’s time series
of test scores (e.g., Molenaar & Roelofs, 1987). Almost certainly new
methodological and statistical techniques will have to be developed in order to
accommodate the intricacies due to non-stationarity and fully exploit the
possibilities of a test theory based on intra-individual variation. Until now
these possibilities have not been pursued systematically, for the wrong
reasons, as has been argued in this section. Given that the psychological
process underlying the generation of test scores is non-ergodic according to
classical test theory based on analysis of inter-individual variation,
psychometricians will have to seriously reconsider their reasons for not
pursuing a test theory based on intra-individual variation.

One promising psychological paradigm which allows for straightforward
determination of person-specific propensity distributions is mental
chronometry. In his excellent monograph on mental chronometry, Jensen (2006,
p. 96) states: “The main reasons for the usefulness of chronometry are not only
the advantages of its absolute scale properties, but also its sensitivity and
precision for measuring small changes in cognitive functioning, the unlimited
repeatability of measurements under identical procedures [...] techniques for
measuring a variety of cognitive processes, and the possibility of obtaining
the same measurements with consistently identical tasks and procedures over an
extremely wide age range” (italics added). The possibility to obtain unlimited
repeated measurements under identical procedures will allow for the
determination of person-specific reaction time propensity distributions with
arbitrary precision. Jensen presents impressive empirical results showing the
importance of not only the intra-individual means of person-specific reaction
time distributions, but also their intra-individual variances in assessing
cognitive status and development (e.g., in the context of the so-called neural
noise hypothesis; Jensen, 2006, p. 122 ff.). Consequently, I conjecture that
mental chronometry provides a very interesting approach to pursue a test theory
based on intra-individual variation.

houghts

The impact of the fact that the ensemble of time series underlying classical
test theory is non-ergodic is enormous. Psychological tests are applied for
individual assessment in all kinds of settings. Using the population average
expressed by formula (3) as estimate of the intra-individual variance
$\sigma^2(P)$ of a given person P can lead to entirely erroneous conclusions.
To give an arbitrary example: suppose that the norm of a test is $\mu = 100$,
that the inter-individual reliability of the test is $\rho = 0.9$, and that the
between-subjects variance of test scores is var[$y_i$] = 25. Suppose also that
a true score which is larger than $y_C = 120$ is considered reason for special
treatment (clinical, educational, or otherwise). Finally, suppose that person P
has observed score $y_P = y(P, t=T) = 126$. Then the Kelley estimate (2a) of
P’s true score $\tau_P$ is: est[$\tau_P \mid y_P$] = 0.9*126 + (1 - 0.9)*100 =
123.4. According to (2b) the error variance of this estimated true score is:
var{est[$\tau_P \mid y_P$]} = 25*(1 - 0.9)*0.9 = 2.25. Hence the standard error
is 1.5 and a commonly used confidence interval about the estimated true score
is 123.4 ± 2*1.5, yielding 120.4 < est[$\tau_P \mid y_P$] < 126.4. This
confidence interval is entirely located above the criterion score $y_C = 120$,
hence it is concluded that P needs special treatment. Suppose, however, that
the intra-individual variance $\sigma^2(P)$ of P’s propensity distribution is
$\sigma^2(P) = 36$. Then the difference between P’s observed score, $y_P = 126$,
and the criterion score for special treatment, $y_C = 120$, is only 1 standard
deviation, which according to standard statistical criteria would not indicate
that P needs special treatment.
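
The arithmetic of this example can be retraced with a few lines of Python. The sketch below only reproduces the numbers given in the text; the function and variable names are hypothetical, and the two formulas are written in the form in which the text applies (2a) and (2b).

```python
import math


def kelley_estimate(observed, reliability, population_mean):
    """Estimated true score as applied in the text for formula (2a)."""
    return reliability * observed + (1.0 - reliability) * population_mean


def kelley_error_variance(between_variance, reliability):
    """Error variance of the estimated true score as applied for formula (2b)."""
    return between_variance * (1.0 - reliability) * reliability


# Values taken from the worked example in the text.
y_p, rho, mu, var_y, y_c = 126.0, 0.9, 100.0, 25.0, 120.0

est = kelley_estimate(y_p, rho, mu)
se = math.sqrt(kelley_error_variance(var_y, rho))
low, high = est - 2.0 * se, est + 2.0 * se

# Person-specific alternative: intra-individual variance of 36 for person P.
intra_sd = math.sqrt(36.0)
z_intra = (y_p - y_c) / intra_sd

print(f"estimate={est:.1f}, SE={se:.2f}, interval=({low:.1f}, {high:.1f})")
print(f"distance to criterion in intra-individual SDs: {z_intra:.1f}")
```

The inter-individual interval lies entirely above the criterion, whereas the same observed score is only one intra-individual standard deviation away from it.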

Numerical exercises such as the one given above can be carried out in a variety
of formats, using Monte Carlo simulation techniques and alternative settings.
We intend to report the results of one such simulation study in a separate
publication. But the overall message should be clear: using the
(inter-individual) population value of the error variance (based on the
inter-individual reliability) as approximation for the intra-individual
variance of a person P’s propensity distribution is liable to lead to erroneous
conclusions about P’s true score, and, consequently, to erroneous decisions
about the necessity to apply special treatment to P. The fundamental reason for
the invalidity of (3) as approximation for $\sigma^2(P)$ is that the ensemble
of time series of observed scores is non-ergodic.

4. Hidden heterogeneity

In the previous section we discussed heterogeneity with respect to the means
and variances of the propensity distributions underlying classical test theory.
That kind of heterogeneity can be considered to be a special instance of a much
wider class of heterogeneous phenomena, including also qualitative
heterogeneity. An important example of qualitative heterogeneity concerns
individual differences in [...] of inter-individual covariation is (using bold
face lower case letters for vectors and bold face upper case letters for
matrices):

(4)
$\mathbf{y}_i = \boldsymbol{\Lambda}\boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i$, i = 1,2,…

where: $\mathbf{y}_i = [y_{1i}, y_{2i}, \ldots, y_{pi}]'$ is the p-variate
vector of observed scores of a randomly drawn subject i (the apostrophe denotes
transposition); $\boldsymbol{\eta}_i = [\eta_{1i}, \eta_{2i}, \ldots, \eta_{qi}]'$
is the q-variate vector of factor scores of subject i;
$\boldsymbol{\epsilon}_i = [\epsilon_{1i}, \epsilon_{2i}, \ldots, \epsilon_{pi}]'$
is the p-variate vector of measurement errors for subject i, and
$\boldsymbol{\Lambda}$ is the (p,q)-dimensional matrix of factor loadings.

The factor model of inter-individual covariation [...] but is of central
importance in much of psychology. The factor model can be heuristically
characterized as follows. In the context of the behavior space [...]
$\mathbf{y}$ which are considered to be indicators of a q-variate latent factor
$\boldsymbol{\eta}$. Then the [...] between the p indicators and the q-variate
$\boldsymbol{\eta}$. It is an essential assumption underlying the factor model
that the factor loadings $\boldsymbol{\Lambda}$ are invariant across subjects:
$\boldsymbol{\Lambda}$ does not depend upon i, where the subscript i stands for
subject i in the population; i = 1,2,… . Hence the assumption is that each
individual person i in the population has a person-specific q-variate factor
score $\boldsymbol{\eta}_i$ and person-specific p-variate error score
$\boldsymbol{\epsilon}_i$, but the factor model for each person in the
population has the same (p,q)-dimensional matrix of factor loadings
$\boldsymbol{\Lambda}$.

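Suppose now that we carry out a simulation experiment in which each person not
only has a person-specific q-variate factor score and person-specific p-variate
error score, but also a person-specific (p,q)-dimensional matrix of factor
loadings $\boldsymbol{\Lambda}_i$, i = 1,2,… . Hence each person has a
person-specific factor model:

(5)
$\mathbf{y}_i = \boldsymbol{\Lambda}_i\boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i$, i = 1,2,…

This heterogeneity of the factor loadings $\boldsymbol{\Lambda}_i$, i = 1,2,…,
constitutes a severe violation of an important assumption underlying the
standard factor model, namely the assumption that the factor loadings are
invariant across subjects.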

The fact that the matrix of factor loadings in (5) is subject-specific implies
that the way in which factors are expressed in the observed scores is
qualitatively different for different subjects. These inter-individual
differences in [...] the values of factor scores are called qualitative because
the substantial interpretation [...]

Despite the fact that (5) involves a severe violation of the qualitative
homogeneity assumption (invariance of factor loadings) underlying the standard
factor model (4), it was shown in a number of simulation studies that factor
analysis of inter-individual covariation appears to be insensitive to this
violation. The typical set-up of these simulation studies was to generate data
according to the person-specific (qualitatively heterogeneous) factor model
(5), and then fit the standard factor model (4) to the simulated data. Although
one would expect the fit of model (4) to be poor due to the fact that the
simulated data violate the assumption of qualitative homogeneity underlying
model (4), it turns out that this is not at all the case. The general finding
in these simulation studies is that (variants of) factor model (4) provide
satisfactory fits to data generated according to (variants of) factor model
(5). Satisfactory fits, that is, according to all usual criteria of
goodness-of-fit, such as the chi-squared likelihood ratio test, standardized
root mean square residual, and root mean square error of approximation (cf.
Brown, 2006, for definitions and discussion of these criteria). Nowhere in the
obtained (maximum likelihood) solutions is a flag waving to indicate that
something is fundamentally wrong. These simulation studies were based on the
cross-sectional factor model (Molenaar, 1997), the longitudinal factor model
(Molenaar, 1999) and the behavior genetical factor model for multivariate
phenotypes of MZ and DZ twins (Molenaar et al., 2003). A
mathematical-statistical proof of the insensitivity of the factor model of
inter-individual covariation to qualitative heterogeneity is given in Kelderman
& Molenaar (2006).
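
The logic of such a simulation can be sketched in a few lines of Python. The sketch below is not a reconstruction of the cited studies: it assumes a single factor, person-specific loadings that deviate independently from common mean loadings, and arbitrary parameter values, and all names are hypothetical. Under these assumptions the population covariance between two different indicators equals the product of their mean loadings, and the loading heterogeneity only inflates the diagonal, so the second-order moments of the pooled data retain an exact one-factor structure.

```python
import random


def simulate_heterogeneous(n_subjects, mean_loadings, loading_sd, error_sd, seed=1):
    """Draw one score vector per subject from y_i = Lambda_i * eta_i + eps_i (q = 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        eta = rng.gauss(0.0, 1.0)  # person-specific factor score
        row = []
        for lam in mean_loadings:
            lam_i = lam + rng.gauss(0.0, loading_sd)  # person-specific loading
            row.append(lam_i * eta + rng.gauss(0.0, error_sd))
        data.append(row)
    return data


def covariance_matrix(data):
    """Inter-individual (between-subjects) sample covariance matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def largest_offdiagonal_gap(cov, mean_loadings):
    """Largest deviation of an off-diagonal covariance from the homogeneous model."""
    p = len(mean_loadings)
    return max(
        abs(cov[j][k] - mean_loadings[j] * mean_loadings[k])
        for j in range(p)
        for k in range(p)
        if j != k
    )


if __name__ == "__main__":
    loadings = [0.8, 0.7, 0.6, 0.5]
    cov = covariance_matrix(simulate_heterogeneous(20000, loadings, 0.3, 0.5))
    print("largest off-diagonal gap:", round(largest_offdiagonal_gap(cov, loadings), 3))
```

The off-diagonal covariances stay close to those implied by a homogeneous model with the mean loadings, which gives an intuition for why fit indices based on the covariance matrix need not signal the heterogeneity.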

Evidently, the finding that the standard factor model of inter-individual
covariation is insensitive to the presence of extreme qualitative heterogeneity
in the population of subjects, created by the person-specific factor loadings
$\boldsymbol{\Lambda}_i$, i = 1,2,…, in (5), raises serious questions. To
reiterate, nothing in the results obtained with the standard factor analyses
based on model (4) indicates that the true state of affairs is in severe
violation of the assumptions underlying this model. The standard factor models
yield satisfactory fits to the data generated according to model (5).
Consequently, the presence of substantial qualitative heterogeneity in the
simulated data remains entirely hidden in the standard factor analyses based on
inter-individual covariation.

Before discussing some of the consequences of this finding, it is noted that
there exist a priori reasons to expect that widespread qualitative
heterogeneity actually exists in human populations. The reasons have to do with
the way in which cortical neural networks grow and adapt during the life span,
namely by means of self-organizing epigenetic processes (cf. Molenaar et al.,
1993). Self-organizing processes give rise to emergent endogenous variation in
neural network connections, even between homologous structures located at the
left and right sides of the brain within the same subject (cf. Edelman, 1987).
In so far as cognitive information processing is associated with cortical
neural activity, one can expect that these endogenously generated differences
in neural network architectures will become discernible as qualitative
heterogeneity of the structure of observed behavior of different subjects (see
Molenaar, 2006, for further elaboration and mathematical-biological modeling of
these epigenetic processes).

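One direct consequence of the fact that standard factor analysis of
inter-individual covariation is insensitive to qualitative heterogeneity is the
following. Suppose that the standard q-factor model (4) yields a satisfactory
fit to the data obtained with a test composed of p subtests (e.g., items). Let
est[$\boldsymbol{\Lambda}$] denote the estimated (p,q)-dimensional matrix of
factor loadings. Suppose also that in reality qualitative heterogeneity is
present in the population of subjects, so that the matrix of factor loadings
$\boldsymbol{\Lambda}_P$ for a given subject P differs substantially from
est[$\boldsymbol{\Lambda}$]. For [...] $\boldsymbol{\Lambda}_P$, whereas
$\boldsymbol{\Lambda}_P$ is unknown in the context of standard factor analysis
of inter-individual variation. The estimate of P’s factor score,
est[$\boldsymbol{\eta}_P$], is based on est[$\boldsymbol{\Lambda}$] and,
because est[$\boldsymbol{\Lambda}$] differs substantially from
$\boldsymbol{\Lambda}_P$, this estimate est[$\boldsymbol{\eta}_P$] will be
substantially biased. For details on this bias the reader is referred to the
publications mentioned above (Molenaar, 1999; Molenaar et al., 2003; Kelderman
& Molenaar, 2006).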

Another consequence of the insensitivity of standard factor analysis of
inter-individual variation to qualitative heterogeneity concerns the fact that
the semantic interpretation of factors thus obtained is inappropriate at the
person-specific level. Suppose that standard factor analysis of personality
test scores yields the expected five-factor structure corresponding to the Big
Five theory (cf. Borkenau & Ostendorf, 1998). Then, if qualitative
heterogeneity is present, the matrix of factor loadings
$\boldsymbol{\Lambda}_P$ for a particular person P may not at all conform to
the Big Five pattern and hence the semantic interpretation of the factors for P
will be different. Stated more specifically, the nominal semantic
interpretation of the five factors obtained in standard factor analysis is
inappropriate for P. The reader is referred to Hamaker, Dolan, & Molenaar
(2005) for an elaborate illustration based on empirical personality test
scores.

5. Heterogeneity in time

To reiterate, a (psychological) process should obey two criteria in order to
qualify as an ergodic process. Firstly, the trajectory of each subject in the
ensemble should conform to exactly the same dynamical laws (homogeneity of the
ensemble). Secondly, each trajectory should have constant statistical
characteristics in time (stationarity, i.e., constant mean level and serial
dependencies which only depend upon relative time differences). In the previous
sections attention has been confined to psychological processes which are
non-ergodic because they violate the first criterion, i.e., heterogeneity of
different trajectories in the ensemble. Whereas the first criterion involves a
comparison between different trajectories, the second stationarity criterion
involves comparison of the same trajectory at different times. In this section
we will consider psychological processes which are non-ergodic because they
violate the second criterion, i.e., they are non-stationary.

In general, non-stationarity implies that parameters of a dynamic system are
time-varying. Prime examples of non-stationary systems are developmental
systems, which typically have time-varying parameters such as waxing and/or
waning [...]. For this reason developmental systems are non-ergodic and their
analysis should be based on intra-individual variation. There exists a long
tradition in theoretical developmental psychology in which it is argued that
developmental processes should be analyzed at the level of intra-individual
variation (time series data). The general denotation for this tradition is
Developmental Systems Theory (DST). Important contributions to DST include
Wohlwill’s (1973) monograph on the concept of developmental functions
describing intra-individual variation, Ford and Lerner’s (1992) integrative
approach based on the interplay between intra-individual variation and
inter-individual variation and change, and Gottlieb’s (1992, 2003) theoretical
work on probabilistic epigenetic development.

Intra-individual analysis of non-stationary multivariate time series requires
the availability of sophisticated statistical modeling techniques. We developed
such a technique based on a systems model with arbitrarily time-varying
parameters (Molenaar, 1994; Molenaar & Newell, 2003). Our model can be
conceived of as a suitably generalized factor model for non-stationary
p-variate time series $\mathbf{y}(t)$, t = 1,2,...,T. Its schematic form is:

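(6a)
$\mathbf{y}(t) = \boldsymbol{\Lambda}[\boldsymbol{\theta}(t)]\,\boldsymbol{\eta}(t) + \boldsymbol{\epsilon}(t)$

(6b)
$\boldsymbol{\eta}(t+1) = \mathbf{B}[\boldsymbol{\theta}(t)]\,\boldsymbol{\eta}(t) + \boldsymbol{\zeta}(t+1)$

(6c)
$\boldsymbol{\theta}(t+1) = \boldsymbol{\theta}(t) + \boldsymbol{\xi}(t+1)$

In (6a) $\mathbf{y}(t)$ denotes the observed p-variate time series,
$\boldsymbol{\eta}(t)$ the q-variate latent state process, and
$\boldsymbol{\epsilon}(t)$ is the p-variate measurement error process. The
factor loadings in $\boldsymbol{\Lambda}[\boldsymbol{\theta}(t)]$ depend upon
the r-variate time-varying parameter vector $\boldsymbol{\theta}(t)$. (6b)
describes the evolution of the latent factor series $\boldsymbol{\eta}(t)$ by
means of a q-variate stochastic difference equation (autoregression) relating
$\boldsymbol{\eta}(t+1)$ to $\boldsymbol{\eta}(t)$, where
$\boldsymbol{\zeta}(t+1)$ denotes the q-variate residual process. The
(q,q)-dimensional matrix of regression weights
$\mathbf{B}[\boldsymbol{\theta}(t)]$ depends upon the r-variate time-varying
parameter vector $\boldsymbol{\theta}(t)$. (6c) describes the time-dependent
variation of the unknown parameters. The r-variate parameter vector process
$\boldsymbol{\theta}(t)$ obeys a random walk driven by the r-variate process
$\boldsymbol{\xi}(t)$.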

The system of equations (6a), (6b) and (6c) allows for the modeling of a large
class of multivariate non-stationary (non-ergodic) processes. Equations (6a)
and (6b) have the same formal structure as the well-known inter-individual
longitudinal q-factor model, which helps in their interpretation. Yet the
system of equations (6a), (6b) and (6c) is applied to analyze the structure of
intra-individual variation underlying the observed p-variate time series
$\mathbf{y}(t)$ obtained with a single subject. Generalization of this model to
accommodate multivariate time series obtained in a replicated time series
design is straightforward. Also extension of the model with arbitrary mean
trend functions and covariate processes having time-varying effects is
straightforward.

The fit of equations (6a), (6b) and (6c) to an observed p-variate time series
$\mathbf{y}(t)$, t = 1,2,...,T, where T is the number of repeated measurements
obtained with a single subject P, is based on advanced statistical analysis
techniques taken from the engineering sciences (Bar-Shalom et al., 2001; Ristic
et al., 2004). It consists of a combination of recursive estimation
(filtering), smoothing, and iteration (EKFIS: Extended Kalman Filter with
Iteration and Smoothing). The EKFIS yields a time series (trajectory) of
estimated values for each of the r parameters in $\boldsymbol{\theta}(t)$:
$\theta_k(t)$, t = 1,2,...,T; k = 1,2,...,r.

To illustrate the performance of the EKFIS, the following small simulation
study has been carried out. A 4-variate (p = 4) time series $\mathbf{y}(t)$ has
been generated by means of the state-space model with time-varying parameters
(6a), (6b) and (6c). The model has a univariate (q = 1) latent state process
$\eta(t)$. The autoregressive coefficient $B[\boldsymbol{\theta}(t)]$ in (6b)
for the latent state increases linearly from 0.0 to 0.9 over the observation
interval comprising T = 100 time points: b(t) = 9t/1000, t = 1,2,…,100. Hence
the sequential dependence (autocorrelation) of the latent state process (latent
factor series) increases from zero to 0.9 across 100 time points and therefore
is highly time-varying (non-stationary, hence non-ergodic). Depicted in Figure
1 is the estimate of this autoregressive weight b(t) obtained by means of the
EKFIS based on a single subject time series $\mathbf{y}(t)$, t = 1,2,...,100.
It is clear that the estimated trajectory closely tracks the true time-varying
path of this parameter.
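
To convey the flavor of recursive tracking of a time-varying weight, the following Python sketch simulates a latent series with b(t) = 9t/1000 and tracks b(t) with an ordinary (linear) Kalman filter. This is a deliberately simplified stand-in and not the EKFIS: it assumes that the latent state is observed directly (no measurement model (6a)), that b(t) follows a random walk as in (6c), and that the residual variance is known. All names and tuning values are hypothetical.

```python
import random


def true_b(t):
    """Autoregressive weight from the text: b(t) = 9t/1000 for t = 1, ..., 100."""
    return 9.0 * t / 1000.0


def simulate_latent_series(n_points=100, residual_sd=1.0, seed=2):
    """Simulate eta(t+1) = b(t) * eta(t) + zeta(t+1) with the time-varying b(t)."""
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, residual_sd)]
    for t in range(1, n_points):
        eta.append(true_b(t) * eta[-1] + rng.gauss(0.0, residual_sd))
    return eta


def track_weight(eta, residual_var=1.0, walk_var=0.001, b0=0.0, p0=1.0):
    """Kalman filter for b(t), assuming b is a random walk and eta is observed."""
    b, p = b0, p0
    estimates = []
    for t in range(len(eta) - 1):
        p += walk_var  # prediction step: random-walk parameter gains uncertainty
        x, y = eta[t], eta[t + 1]
        gain = p * x / (x * x * p + residual_var)
        b += gain * (y - b * x)  # correct with the one-step prediction error
        p *= 1.0 - gain * x  # posterior variance of the weight
        estimates.append(b)
    return estimates


if __name__ == "__main__":
    series = simulate_latent_series()
    path = track_weight(series)
    for t in (10, 50, 99):
        print(f"t={t}: true b={true_b(t):.2f}, filtered b={path[t - 1]:.2f}")
```

With a single series of 100 points the filtered path is noisy; the smoothing and iteration steps of the EKFIS are what a simple filter of this kind lacks.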


6. Discussion and conclusion

In this chapter some of the implications of the classical ergodic theorems have
been considered in the contexts of classical test theory, factor analysis of
inter-individual covariation, and the analysis of non-stationary developmental
processes. In each of these contexts the classical ergodic theorems imply that
instead of using standard statistical approaches based on analysis of
inter-individual variation, it is necessary to use single-subject time series
analysis of intra-individual variation. This conclusion holds for individual
assessment based on classical test theory, for testing the assumption of
homogeneity (fixed factor loadings) in factor analysis of inter-individual
covariation, and for the analysis of non-stationary processes such as learning
and developmental processes.


The consequences of the classical ergodic theorems in these and many other
contexts in psychology imply that time series designs and time series analysis
techniques will have to be assigned a much more prominent place than is
currently the case in psychological methodology. The overall aim of scientific
research in psychology still should be to arrive at general (nomothetic) laws
that hold for all subjects in a well-defined population. But the inductive
tools to arrive at such general laws have to be fundamentally different from
the currently standard approaches for those psychological processes which are
non-ergodic. Only if a psychological process is ergodic, i.e., obeys the two
criteria of homogeneity and stationarity, can results obtained by means of
analysis of inter-individual variation be generalized to the level of
intra-individual variation. But the two criteria for ergodicity are very strict
and many psychological processes of interest will fail to obey these criteria.
Psychologists have to understand that ergodicity is the special case, whereas
non-ergodicity is the rule. For non-ergodic psychological processes analysis of
inter-individual variation yields results that may not apply to any of the
individual subjects in the population of subjects.

In conclusion, the inductive tools which are necessary to arrive at general
(nomothetic) laws for non-ergodic processes involve the search for
commonalities between single-subject process models fitted to time series data
obtained in replicated time series designs. The latter search for commonalities
between single-subject process models can be based on standard mixed modeling
techniques (see the excellent textbook of Demidenko, 2004).

Having available appropriate time series models for each individual subject
opens up possibilities which are entirely new in psychology. These
possibilities involve the optimal control of ongoing psychological processes.
For instance, consider the following special instance of the system of
equations (6a), (6b):

(7a)
$\mathbf{y}(t) = \boldsymbol{\Lambda}\boldsymbol{\eta}(t) + \boldsymbol{\epsilon}(t)$

(7b)
$\boldsymbol{\eta}(t+1) = \mathbf{B}\boldsymbol{\eta}(t) + \boldsymbol{\Gamma}\mathbf{u}(t) + \boldsymbol{\zeta}(t+1)$

Here the same definitions apply as for equations (6a), (6b). Notice that in
(7a) and (7b) the (p,q)-dimensional matrix of factor loadings
$\boldsymbol{\Lambda}$ and the (q,q)-dimensional matrix of regression weights
$\mathbf{B}$ are assumed to be constant in time. This is to ease the
presentation; generalization of what follows to the non-stationary model given
by (6a), (6b) and (6c) is straightforward. Notice also that (7b) contains a new
component: $\boldsymbol{\Gamma}\mathbf{u}(t)$. The s-variate process
$\mathbf{u}(t)$ represents a known process that can be manipulated, for
instance dose of medication or environmental stimulation.
$\boldsymbol{\Gamma}$ is a (q,s)-dimensional matrix of regression weights.

Suppose that (7a) and (7b) provide a faithful description of the p-variate time
series $\mathbf{y}(t)$ for subject P. It then is possible to determine
$\mathbf{u}(t)$ in such a way that the state process $\boldsymbol{\eta}(t)$ is
steered to its desired level $\boldsymbol{\eta}^{\#}$, where
$\boldsymbol{\eta}^{\#}$ is chosen by the controller. The optimal input
$\mathbf{u}^{@}(t)$ is determined according to the following schematic feedback
function:

(8)
$\mathbf{u}^{@}(t) = \mathbf{F}[\mathbf{y}(t), t]$

where $\mathbf{F}[.]$ denotes an (s,p)-dimensional nonlinear feedback function.
Application of $\mathbf{u}^{@}(t)$ at time t guarantees that the state process
$\boldsymbol{\eta}(t+1)$ at the next time point [...] $\boldsymbol{\eta}^{\#}$.
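
The idea of steering a state process by feedback can be illustrated with a heavily simplified Python sketch. It is not the feedback function of (8): it assumes a scalar state that is observed directly, known constant weights (named b and g here), and a one-step-ahead rule that sets the expected next state equal to the target. All names and values are hypothetical.

```python
import random


def one_step_input(eta_now, target, b, g):
    """Input u(t) that makes the expected next state equal to the target level."""
    return (target - b * eta_now) / g


def run_controlled(n_points, eta0, target, b, g, residual_sd=0.0, seed=3):
    """Simulate eta(t+1) = b*eta(t) + g*u(t) + zeta(t+1) under the feedback rule."""
    rng = random.Random(seed)
    eta = [eta0]
    for _ in range(n_points):
        u = one_step_input(eta[-1], target, b, g)
        eta.append(b * eta[-1] + g * u + rng.gauss(0.0, residual_sd))
    return eta


if __name__ == "__main__":
    # With residual noise the state fluctuates around the target instead of
    # drifting back toward zero as the uncontrolled process would.
    path = run_controlled(10, eta0=3.0, target=1.0, b=0.7, g=0.5, residual_sd=0.2)
    print([round(value, 2) for value in path])
```

In the noise-free case this rule reaches the target after a single step; with residual noise the state stays within the residual variation around the target. Practical controllers, such as those described in the optimal control literature, additionally penalize large inputs and work from estimated rather than observed states.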

Optimal control is an important field of research in the engineering sciences.
There exists a vast literature on many different variants of optimal control
(cf. Kwon, 2005, for a thorough explanation of the currently most advanced
approaches). These control techniques can be applied straightforwardly in
analyses of intra-individual variation in order to steer psychological
processes in desired directions (cf. Molenaar, 1987, for an application to the
optimal control of a psychotherapeutic process). This opens up an entirely new
promising field of applied psychology: person-specific modeling and control of
ongoing psychological processes.

References

Anderson, T.W. (1971). The statistical analysis of time series. New York:
Wiley.

Bar-Shalom, Y., Li, X.R., & Kirubarajan, T. (2001). Estimation with
applications to tracking and navigation. New York: Wiley.

Birkhoff, G.D. (1931). Proof of the ergodic theorem. Proceedings of the
National Academy of Sciences, 17, 656-660.

Borkenau, P., & Ostendorf, F. (1998). The Big Five as states: How useful is the
five-factor model to describe intra-individual variations over time? Journal of
Research in Personality, 32, 202-221.

Brown, T.A. (2006). Confirmatory factor analysis for applied research. New
York: Guilford Press.

Cattell, R.B. (1952). The three basic factor-analytic designs: Their
interrelations and derivatives. Psychological Bulletin, 49, 499-520.

Choe, G.H. (2005). Computational ergodic theory. Berlin: Springer.

De Groot, A.D. (1954). Scientific personality diagnosis. Acta Psychologica, 10,
220-241.

Demidenko, E. (2004). Mixed models: Theory and applications. Hoboken, NJ:
Wiley.

Edelman, G.M. (1987). Neural Darwinism: The theory of neuronal group selection.
New York: Basic Books.

Ford, D.H., & Lerner, R.M. (1992). Developmental systems theory. Newbury Park:
Sage.

Gescheider, G.A. (1997). Psychophysics: The fundamentals. Mahwah, NJ: Erlbaum.

Gottlieb, G. (1992). Individual development and evolution: The genesis of novel
behavior. New York: Oxford University Press.

Gottlieb, G. (2003). On making behavioral genetics truly developmental. Human
Development, 46, 337-355.

Hamaker, E.L., Dolan, C.V., & Molenaar, P.C.M. (2005). Statistical modeling of
the individual: Rationale and application of multivariate time series analysis.
Multivariate Behavioral Research, 40, 207-233.

Hogan, J.A., & Lakey, J.D. (2005). Time-frequency and time-scale methods:
Adaptive decompositions, uncertainty principles, and sampling. Boston:
Birkhäuser.

Houtveen, J.H., & Molenaar, P.C.M. (2001). Comparison between the Fourier and
wavelet methods of spectral analysis applied to stationary and non-stationary
heart period data. Psychophysiology, 38, 729-735.

Jensen, A.R. (2006). Clocking the mind: Mental chronometry and individual
differences. Amsterdam: Elsevier.

Kelderman, H., & Molenaar, P.C.M. (2006). The effect of individual differences
in factor loadings on the standard factor model ([...] Multivariate Behavioral
Research).

Kwon, W.H. (2005). Receding horizon control: Model predictive control for state
models. London: Springer.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores.
Reading, MA: Addison-Wesley.

Molenaar, P.C.M. (1987). Dynamic assessment and adaptive optimization of the
therapeutic process. Behavioral Assessment, 9, 389-416.

Molenaar, P.C.M., & Roelofs, J.W. (1987). The analysis of multiple habituation
profiles of single trial evoked potentials. Biological Psychology, 24, 1-21.

Molenaar, P.C.M., Boomsma, D.I., & Dolan, C.V. (1993). A third source of
developmental differences. Behavior Genetics, 23, 519-524.

Molenaar, P.C.M. (1994). Dynamic latent variable models in developmental
psychology. In: A. von Eye & C.C. Clogg (Eds.), Analysis of latent variables in
developmental research. Newbury Park: Sage, pp. 155-180.

Molenaar, P.C.M. (1997). Time series analysis and its relationship with
longitudinal analysis. International Journal of Sports Medicine, 19, 232-237.

Molenaar, P.C.M. (1999). Longitudinal analysis. In: H.J. Ader & G.J.
Mellenbergh (Eds.), Research methodology in the social, behavioral and life
sciences. London: Sage, pp. 143-167.

Molenaar, P.C.M., Huizenga, H.M., & Nesselroade, J.R. (2003). The relationship
between the structure of interindividual and intraindividual variability: A
theoretical and empirical vindication of Developmental Systems Theory. In: U.M.
Staudinger & U. Lindenberger (Eds.), Understanding human development: Dialogues
with life-span psychology. Dordrecht: Kluwer, pp. 339-360.

Molenaar, P.C.M. (2003). State space techniques in structural equation
modeling: Transformation of latent variables in and out of latent variable
models. 111 pages. Website:
http://www.hhdev.psu.edu/hdfs/faculty/molenaar.html

Molenaar, P.C.M., & Newell, K.M. (2003). Direct fit of a theoretical model of
phase transition in oscillatory finger motions. British Journal of Mathematical
and Statistical Psychology, 56, 199-214.

Molenaar, P.C.M. (2004). A manifesto on psychology as idiographic science:
Bringing the person back into scientific psychology, this time forever.
Measurement, 2, 201-218.

Molenaar, P.C.M. (2006). On the implications of the classic ergodic theorems:
Analysis of developmental processes has to focus on intra-individual variation
(submitted).

Nayfeh, A.H., & Balachandran, B. (1995). Applied nonlinear dynamics:
Analytical, computational, and experimental methods. New York: Wiley.

Petersen, K. Ergodic theory. Cambridge: Cambridge University Press.

Priestley, M.B. (1988). Non-linear and non-stationary time series analysis.

Ristic, B., Arulampalam, S., & Gordon, N. (2004). Beyond the Kalman filter:
Particle filters for tracking applications. London: Artech House.

Walters, P. (1982). An introduction to ergodic theory. 2nd edition. New York:
Springer.

Wohlwill, J.F. (1973).
The study of behavioral development