Iterated Random Functions: Convergence Theorems

C. D. Fuh†
Institute of Statistical Science
Academia Sinica, Taipei, Taiwan, ROC
ABSTRACT

Iterated random functions are used to draw pictures, simulate large Ising models, or represent likelihood functions of hidden Markov models, among other applications. They offer a method for studying the steady state distribution of a Markov chain, and there is a simple unifying idea: iterated random Lipschitz functions converge if the functions are contracting on the average. To be more precise, let $(X,d)$ be a complete separable metric space and $(F_n)_{n\ge 0}$ a sequence of i.i.d. random functions from $X$ to $X$ which are uniformly Lipschitz, that is, $L_n = \sup_{x\ne y} d(F_n(x),F_n(y))/d(x,y) < \infty$ a.s. Provided the mean contraction assumption $E\log L_1 < 0$ and $E\log^+ d(F_1(x_0),x_0) < \infty$ for some $x_0\in X$, it is known that the forward iterations $M_n^x = F_n\circ\cdots\circ F_1(x)$, $n\ge 0$, converge weakly to a unique stationary distribution $\pi$ for each $x\in X$. The associated backward iterations $\hat M_n^x = F_1\circ\cdots\circ F_n(x)$ are a.s. convergent to a random variable $\hat M_\infty$ which does not depend on $x$ and has distribution $\pi$. In this paper, we describe the essential results on the asymptotic behavior of the iterated random functions $M_n^x$. To start with, we summarize recent results regarding the stochastic stability of iterated random functions. Then we study limit theorems for additive functionals of a Markov chain that can be constructed as an iterated random function, including the ergodic theorem, the central limit theorem, quick convergence, Edgeworth expansions and renewal theorems. Three prototypical methods are introduced to prove the limit theorems: the regeneration method, the Poisson equation, and spectral theory for the transition operator. Several examples are given for illustration.

AMS 2000 subject classifications. 60J05, 60J15, 60K05, 60G17.

Keywords and phrases: random function, Lipschitz map, Markov chain, Poisson equation, forward iterations, backward iterations, stationary distribution, Prokhorov metric, level $\gamma$ ladder epoch, moment generating function, product of random matrices, Liapunov exponent, Harris recurrence, total variation, $w$-ergodicity, geometric ergodicity, uniform ergodicity, strict contraction, drift condition, central limit theorem, quick convergence, Edgeworth expansion, renewal theorem.

† Research partially supported by NSC 91-2118-M-001-016.
1 Introduction

Iterated random functions (IRF) have a wide range of applications, including perfect simulation, the generation of fractal images, data compression, queueing theory, autoregressive processes and likelihood representations of hidden Markov models, among others. The reader is referred to Duflo (1997) and Diaconis and Freedman (1999) for excellent recent surveys including an extensive list of relevant literature. In this paper, we study the theoretical aspects of iterated random functions and summarize recent limit theorems in the literature. To be more precise, a sequence of the form

$M_n = F(\theta_n, M_{n-1}),\qquad n\ge 0,$    (1.1)

is called an iterated random function (IRF) of i.i.d. Lipschitz maps provided that

1. $M_0, \theta_1, \theta_2, \dots$ are independent random elements on a common probability space $(\Omega, \mathcal{U}, P)$;

2. $\theta_1, \theta_2, \dots$ are identically distributed with common distribution $\Lambda$ and take values in a second countable measurable space $(\Theta, \mathcal{A})$;

3. $M_0, M_1, \dots$ take values in a complete separable metric space $(X,d)$ with Borel $\sigma$-field $\mathcal{B}(X)$;

4. $F: (\Theta\times X, \mathcal{A}\otimes\mathcal{B}(X)) \to (X, \mathcal{B}(X))$ is jointly measurable and Lipschitz continuous in the second argument.
Let $X_0$ be a dense subset of $X$ and $M(X_0,X)$ the space of all mappings $f: X_0\to X$ endowed with the product topology and product $\sigma$-field. Then the space $L_{\mathrm{Lip}}(X,X)$ of all Lipschitz continuous mappings $f: X\to X$, properly embedded, forms a Borel subset of $M(X_0,X)$, and the mappings

$L_{\mathrm{Lip}}(X,X)\times X \ni (f,x) \mapsto f(x) \in X,$
$L_{\mathrm{Lip}}(X,X) \ni f \mapsto l(f) := \sup_{x\ne y} d(f(x),f(y))/d(x,y)$

are Borel; see Lemma 5.1 in Diaconis and Freedman (1999) for details. Hence

$L_n := l(F(\theta_n,\cdot)),\qquad n\ge 0,$

are also measurable and form a sequence of i.i.d. random variables.
In the following, we write $F_n(x)$ for $F(\theta_n,x)$. Let $F_{k:n} := F_k\circ\cdots\circ F_n$, $F_{n:k} := F_n\circ\cdots\circ F_k$, and $F_{n:n+1}$ the identity on $X$, for all $1\le k\le n$. Hence

$M_n = F_n(M_{n-1}) = F_{n:1}(M_0)$    (1.2)

for all $n\ge 0$. Closely related to these forward iterations, and in fact a key tool in their analysis, is the following sequence of backward iterations:

$\hat M_n := F_{1:n}(M_0),\qquad n\ge 0.$    (1.3)

The connection is established by the identity

$P_x(M_n \in \cdot) = P_x(\hat M_n \in \cdot)$

for all $n\ge 0$. Put also $M_n^x := F_{n:1}(x)$ and $\hat M_n^x := F_{1:n}(x)$ for $x\in X$, and note that

$P((M_n^x, \hat M_n^x)_{n\ge 0} \in \cdot) = P_x((M_n, \hat M_n)_{n\ge 0} \in \cdot).$
The reason for introducing these additional sequences is that we will compare $\hat M_n^x$ and $\hat M_n^y$, or $M_n^x$ and $M_n^y$, for different $x, y$. Concerning stochastic stability, it is known that the forward iterations $M_n^x = F_n\circ\cdots\circ F_1(x)$, $n\ge 0$, converge weakly to a unique stationary distribution $\pi$ for each $x\in X$, while the associated backward iterations $\hat M_n^x = F_1\circ\cdots\circ F_n(x)$ are a.s. convergent to a random variable $\hat M_\infty$ which does not depend on $x$ and has distribution $\pi$, provided the mean contraction assumption $E\log L_1 < 0$ and $E\log^+ d(F_1(x_0),x_0) < \infty$ for some $x_0\in X$.
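The distinct behavior of the two iterations is easy to see in simulation. The following minimal Python sketch uses the affine maps $F_n(x) = A_n x + B_n$ of Example 2 in Section 8, with a hypothetical parameter choice satisfying $E\log|A_1| < 0$: the backward values freeze as $n$ grows, while the forward values keep fluctuating although they have the same distribution for each fixed $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.uniform(-0.9, 0.9, size=n)   # Lipschitz constants L_k = |A_k|, E log L_1 < 0
B = rng.normal(size=n)

def forward(x, k):
    # M_k^x = F_k o ... o F_1(x): apply F_1 first
    for a, b in zip(A[:k], B[:k]):
        x = a * x + b
    return x

def backward(x, k):
    # \hat M_k^x = F_1 o ... o F_k(x): apply F_k first
    for a, b in zip(A[:k][::-1], B[:k][::-1]):
        x = a * x + b
    return x

print([round(backward(0.0, k), 6) for k in (10, 20, 50, 200)])  # stabilizes (a.s. limit)
print([round(forward(0.0, k), 6) for k in (10, 20, 50, 200)])   # keeps moving (weak limit)
```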
The theory of additive functionals of iterated random functions gives rise to general results of which typical examples are the ergodic theorem and the central limit theorem; the results described here can be considered as an infinite dimensional extension of this theory. The aspect of the situation which is new is the non-commutativity of the iteration, and thus we are led to study a certain amount of Markov chain theory. Clearly, by definition (1.1), $(M_n)_{n\ge 0}$ constitutes a temporally homogeneous Markov chain with state space $X$ and transition kernel $P$, given by

$P(x,B) = \Lambda(\{\theta\in\Theta : F(\theta,x)\in B\})$

for $x\in X$ and $B\in\mathcal{B}(X)$. The $n$-step transition kernel is denoted $P^n$. For $x\in X$, let $P_x$ be the probability measure on the underlying measurable space under which $M_0 = x$ a.s. The associated expectation is denoted $E_x$, as usual. For an arbitrary distribution $\eta$ on $X$, we put $P_\eta(\cdot) := \int P_x(\cdot)\,\eta(dx)$ with associated expectation $E_\eta$. We use $P$ and $E$ for probabilities and expectations, respectively, that do not depend on the initial distribution.

It is known (cf. Alsmeyer, 2003) that the Markov chain induced by iterated random functions is Harris recurrent on a set $H$, and $w$-ergodic for some weight function $w$ if extra moment conditions are assumed. The results may be derived more easily from related results in Meyn and Tweedie (1993, Chapter 17) if $(M_n)_{n\ge 0}$ is furthermore irreducible (with respect to some measure on $\mathcal{B}(X)$), in which case it is even positive Harris recurrent on some $P$-absorbing set. However, many IRF of i.i.d. Lipschitz functions are not irreducible but only weak Feller chains. It is this fact that complicates the necessary arguments in the general situation.
In this paper, we study limit theorems for additive functionals of a Markov chain that can be constructed as an iterated random function, including stochastic stability, the ergodic theorem, the central limit theorem, quick convergence, Edgeworth expansions and renewal theorems. Three prototypical methods are introduced to prove the limit theorems: the regeneration method, the Poisson equation, and spectral theory for the transition operator. To start with, we introduce the regeneration method to prove rates of convergence and the ergodic theorem for IRF in Section 2. Secondly, without the assumption of irreducibility, we apply the Poisson equation method to prove the central limit theorem and quick convergence in Section 3. To prove Edgeworth expansions and renewal theorems, we need to impose the irreducibility assumption, for which two types of conditions are used here. A density hypothesis on $\Lambda$ leads to a situation in the context of Harris recurrence; another natural hypothesis is the positivity of the functions in the support of $\Lambda$, which yields contraction properties that also lead to precise results. In Section 4, we state the results on Harris recurrence and $w$-ergodicity for iterated random functions, and introduce a sufficient condition, based on the density hypothesis, for irreducibility. In Section 5, we study the hypothesis of positivity for the functions in the support of $\Lambda$, on which basis we develop our spectral theory. Edgeworth expansions and renewal theorems, considered in Sections 6 and 7 respectively, then follow from the established Markov chain theory. Illustrative examples are included in Section 8. The first two satisfy the density assumption, while the third one satisfies the positivity assumption. The fourth example satisfies neither.
2 Stochastic stability and ergodic theorem
In this section, we summarize the results on stochastic stability and rates of convergence for iterations of i.i.d. mean contraction random Lipschitz functions. The ergodic theorem is also given. A central question for an IRF $(M_n)_{n\ge 0}$ is under which conditions it stabilizes, that is, converges to a stationary distribution $\pi$. Elton (1990) showed, in the more general situation of a stationary sequence $(F_n)_{n\ge 1}$, that this holds true whenever $E\log^+ l(F_1)$ and $E\log^+ d(F_1(x_0),x_0)$ are both finite for some (and then all) $x_0\in X$ and the Liapunov exponent

$l^* := \lim_{n\to\infty} n^{-1}\log l(F_{n:1}),$

which exists by Kingman's subadditive ergodic theorem, is a.s. negative. His results for i.i.d. $F_1, F_2,\dots$ under the slightly stronger assumptions $E\log l(F_1) < 0$ and $E\log^+ d(F_1(x_0),x_0) < \infty$ for some $x_0\in X$ are restated in Theorem 2.1.
The basic idea is to consider the backward iterations $\hat M_n^x = F_{1:n}(x)$ and to prove their a.s. convergence to a limit $\hat M_\infty$ which does not depend on $x$ and which has distribution $\pi$. The obvious inequality

$d(\hat M_{n+m}^x, \hat M_n^x) \le \Big(\prod_{k=1}^n l(F_k)\Big)\, d(F_{n+1:n+m}(x), x)$ a.s.,    (2.1)
valid for all $n, m\ge 0$ and $x\in X$, forms a key tool in the necessary analysis. Alsmeyer and Fuh (2001) embark on that same inequality together with the simple observation that

$\log\Big(\prod_{k=1}^n l(F_k)\Big) = \sum_{k=1}^n \log l(F_k),\qquad n\ge 0,$    (2.2)

is an ordinary zero-delayed random walk and thus perfectly amenable to renewal theoretic (regeneration) arguments. Under the mean contraction assumption $E\log l(F_1) < 0$, it has negative drift, whence, for arbitrary $\gamma\in(0,1)$, the level $\log\gamma$ ladder epochs $\sigma_0(\gamma) \equiv 0$,

$\sigma_n(\gamma) := \inf\Big\{k > \sigma_{n-1}(\gamma) : \sum_{j=\sigma_{n-1}(\gamma)+1}^{k} \log l(F_j) \le \log\gamma\Big\},\qquad n\ge 1,$    (2.3)

are all a.s. finite and constitute an ordinary discrete renewal process. As a consequence, the subsequence $(M_{\sigma_n(\gamma)})_{n\ge 0}$ forms again an IRF of i.i.d. Lipschitz maps which is furthermore strictly contractive because, by construction,

$l(F_{1:\sigma_1(\gamma)}) \le \gamma < 1.$
For the associated backward iterations $\hat M_{\sigma_n(\gamma)}^x = F_{1:\sigma_n(\gamma)}(x)$, inequality (2.1) hence takes the very strong form

$d(\hat M_{\sigma_{n+m}(\gamma)}^x, \hat M_{\sigma_n(\gamma)}^x) \le \gamma^n\, d(F_{\sigma_{n+1}(\gamma):\sigma_{n+m}(\gamma)}(x), x)$    (2.4)

for all $n, m\ge 0$ and $x\in X$, and suggests the following procedure to prove convergence results for $(M_n)_{n\ge 0}$ and its associated sequence of backward iterations:

Step 1. Given a set of conditions, find out what kind of results hold true for the strictly contractive IRF $(M_{\sigma_n(\gamma)})_{n\ge 0}$ for any $\gamma\in(0,1)$.

Step 2. Analyze the excursions of $(M_n)_{n\ge 0}$ between two successive ladder epochs $\sigma_k(\gamma)$ and $\sigma_{k+1}(\gamma)$ and adjust the results with respect to $(M_n)_{n\ge 0}$ if necessary.
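The ladder epochs in (2.3) are straightforward to compute along a simulated trajectory. The following Python sketch (with a hypothetical Gaussian choice for $\log l(F_j)$, mean $-0.2$) restarts the partial sums after each epoch; the mean gap between epochs approximates the mean $\mu(\gamma)$ of $\sigma_1(\gamma)$ appearing below.

```python
import numpy as np

rng = np.random.default_rng(2)

def ladder_epochs(log_l, gamma):
    # sigma_n(gamma) of (2.3): restart the random walk of log-Lipschitz
    # constants after each epoch and record the times at which the partial
    # sum first drops to log(gamma) or below
    epochs, s = [], 0.0
    for k, x in enumerate(log_l, start=1):
        s += x
        if s <= np.log(gamma):
            epochs.append(k)
            s = 0.0
    return epochs

log_l = rng.normal(loc=-0.2, scale=1.0, size=100_000)  # toy log l(F_j), negative drift
eps = ladder_epochs(log_l, gamma=0.5)
gaps = np.diff([0] + eps)
print(len(eps), gaps.mean())   # mean gap estimates mu(gamma)
```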
The stability results in this section are taken from Alsmeyer and Fuh (2001). They focus on estimates for $d(\hat M_\infty, \hat M_n)$ under $P_x$, $x\in X$, and $d(M_n^x, M_n^y)$ for $x, y\in X$. The latter distance may be viewed as the coupling rate of the forward iterations at time $n$ when started at different values $x$ and $y$. The two sets of conditions we will consider are that, for some $p > 0$ and some $x_0\in X$, either

$E\log^{p+1}(1+L_1) < \infty$ and $E\log^{p+1}(1+d(F_1(x_0),x_0)) < \infty$    (2.5)

or

$EL_1^p < \infty$ and $Ed(F_1(x_0),x_0)^p < \infty$    (2.6)
holds. Two major conclusions will concern the distance of $P^n(x,\cdot)$, $x\in X$, and $\pi$ in the Prokhorov metric associated with $d$. Following Diaconis and Freedman (1999), the latter is also denoted $d$ and defined, for two probability measures $\lambda_1, \lambda_2$ on $X$, as the infimum over all $\epsilon\ge 0$ such that

$\lambda_1(B) < \lambda_2(B^\epsilon) + \epsilon$ and $\lambda_2(B) < \lambda_1(B^\epsilon) + \epsilon$

for all $B\in\mathcal{B}(X)$, where $B^\epsilon := \{x\in X : d(x,y) < \epsilon$ for some $y\in B\}$. We will show that, for all $x\in X$ and $n\ge 0$,

$d(P^n(x,\cdot), \pi) \le A_x (n+1)^{-p},$    (2.7)

if (2.5) holds, and

$d(P^n(x,\cdot), \pi) \le A_x r^n$    (2.8)

for some $r\in(0,1)$ not depending on $x$ and $n$, if (2.6) is true.
Now let $\sigma_1(\gamma)$ be as defined in (2.3) for $\gamma\in(0,1)$, i.e.

$\sigma_1(\gamma) := \inf\{n\ge 1 : L_{1:n} \le \gamma\} = \inf\Big\{n\ge 1 : \sum_{k=1}^n \log L_k \le \log\gamma\Big\}.$    (2.9)

Provided $E\log L_1 < 0$, a condition which will always be in force throughout, $\sigma_1(\gamma)$ is an a.s. finite first passage time with finite mean $\mu(\gamma)$. It also has finite variance $\sigma(\gamma)^2$, say, if $E\log^2(1+L_1) < \infty$. Let further

$\log\gamma^* := \inf_{\gamma\in(0,1)} \frac{\log\gamma}{\mu(\gamma)}.$    (2.10)

If $E|\log L_1| < \infty$, then it is well known from renewal theory that

$\frac{\log\gamma}{E\log L_1} \le \mu(\gamma) \le \frac{\log\gamma}{E\log L_1}\,(1+o(1)) \qquad (\gamma\to 0).$    (2.11)

It is now easily checked that in this case

$\log\gamma^* = \lim_{\gamma\downarrow 0} \frac{\log\gamma}{\mu(\gamma)} = E\log L_1.$    (2.12)
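Relation (2.12) is easy to check numerically. The sketch below uses the same hypothetical Gaussian model for $\log L_k$ as above, so $E\log L_1 = -0.2$; it estimates $\mu(\gamma)$ by the mean regeneration gap and shows $\log\gamma/\mu(\gamma)$ approaching $E\log L_1$ as $\gamma\downarrow 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
logL = rng.normal(loc=-0.2, scale=1.0, size=200_000)  # log L_k with E log L_1 = -0.2

def mu(gamma):
    # empirical mean of sigma_1(gamma) over successive regeneration cycles
    gaps, s, last = [], 0.0, 0
    for k, x in enumerate(logL, start=1):
        s += x
        if s <= np.log(gamma):
            gaps.append(k - last)
            last, s = k, 0.0
    return np.mean(gaps)

for g in (0.5, 0.1, 0.01, 0.001):
    print(g, np.log(g) / mu(g))   # tends to E log L_1 = -0.2 as gamma -> 0
```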
Theorem 2.1. Given an IRF $(M_n)_{n\ge 0}$ of i.i.d. Lipschitz maps, suppose

$E\log L_1 < 0$ and $E\log^+ d(F_1(x_0),x_0) < \infty$    (2.13)

for some $x_0\in X$. Then the following assertions hold:
(a) $\hat M_n$ converges a.s. to a random element $\hat M_\infty$ with distribution $\pi$ which does not depend on the initial distribution.
(b) For each $\gamma\in(\gamma^*,1)$, $\lim_{n\to\infty} P_x(d(\hat M_\infty, \hat M_n) > \gamma^n) = 0$ for all $x\in X$.
(c) $M_n$ converges in distribution to $\pi$ under every $P_x$, $x\in X$.
(d) $\pi$ is the unique stationary distribution of $(M_n)_{n\ge 0}$, and $(\hat M_n)_{n\ge 0}$ is a stationary sequence under $P_\pi$.
(e) $(M_n)_{n\ge 0}$ is ergodic under $P_\pi$.
Theorem 2.2. Given the situation of Theorem 2.1 and additionally condition (2.5) for some $p > 0$, the following assertions hold:
(a) For each $\gamma\in(\gamma^*,1)$,

$\sum_{n\ge 1} n^{p-1}\, P_x(d(\hat M_\infty, \hat M_n) > \gamma^n) \le c_\gamma\big(1 + \log^p(1+d(x,x_0))\big)$

and

$\lim_{n\to\infty} n^p\, P_x(d(\hat M_\infty, \hat M_n) > \gamma^n) = 0$

for all $x\in X$ and some $c_\gamma\in(0,\infty)$.
(b) For each $\gamma\in(\gamma^*,1)$,

$\limsup_{n\to\infty} n^{(p-1)/p}\Big(\frac 1n \log d(\hat M_\infty, \hat M_n) - \log\gamma\Big) \le 0 \quad P_x$-a.s.

for all $x\in X$. In case $0 < p\le 1$ this remains true for $\gamma = \gamma^*$.
(c) If $p = 1$, then $\lim_{n\to\infty} \gamma^{-n} d(\hat M_\infty, \hat M_n) = 0$ $P_x$-a.s. for all $x\in X$ and all $\gamma\in(\gamma^*,1)$.
(d) $d(P^n(x,\cdot), \pi) \le A_x(n+1)^{-p}$ for all $n\ge 0$, $x\in X$ and a positive constant $A_x$ of the form $\max\{A, 2d(x,x_0)\}$, where $A$ depends neither on $x$ nor on $n$.
(e) $\int_X \log^p(1+d(x,x_0))\,\pi(dx) = \int_0^\infty p\,t^{p-1}\,\pi(x : \log(1+d(x,x_0)) > t)\,dt < \infty$.
Theorem 2.3. Given the situation of Theorem 2.1 and additionally condition (2.6) for some $p > 0$, the following assertions hold:
(a) For each $\gamma\in(\gamma^*,1)$,

$\lim_{n\to\infty} \rho_\gamma^{-n}\, P_x(d(\hat M_\infty, \hat M_n) > \gamma^n) = 0$

for all $x\in X$ and some $\rho_\gamma\in(0,1)$.
(b) There exists $\beta > 0$ such that for each $q\in(0,\beta)$,

$\lim_{n\to\infty} \sup_{x\in X} \rho_q^{-n}\,(1+d(x,x_0))^{-q}\, E_x\, d(\hat M_\infty, \hat M_n)^q = 0$

for some $\rho_q\in(0,1)$. The same holds true for $q = \beta$ with $\rho_q = 1$.
(c) $d(P^n(x,\cdot), \pi) \le A_x r^n$ for all $n\ge 0$, some $r\in(0,1)$ and a constant $A_x$ of the form $\max\{A, d(x,x_0)\}$. The constants $r$ and $A$ depend neither on $x$ nor on $n$.
(d) $\int_X d(x,x_0)^\beta\,\pi(dx) = \int_0^\infty \beta\,t^{\beta-1}\,\pi(x : d(x,x_0) > t)\,dt < \infty$ for some $\beta > 0$.

Let us mention that the constants $c_\gamma, \rho_\gamma, \rho_q, \beta, A_x$ and $r$ in the previous theorems generally further depend on the $p > 0$ of the respective moment condition assumed.
The assertions of the previous two theorems on $d(\hat M_\infty, \hat M_n)$ are easily translated into similar results on $d(M_n^x, M_n^y)$ for the forward iterations started at different values $x$ and $y$. Essentially, this only takes the observation that $(M_n^x, M_n^y)$ and $(\hat M_n^x, \hat M_n^y)$ are identically distributed for all $x, y\in X$ and $n\ge 0$, and that

$d(\hat M_n^x, \hat M_n^y) \le d(\hat M_\infty^{x_0}, \hat M_n^x) + d(\hat M_\infty^{x_0}, \hat M_n^y).$
We summarize the results in the following two corollaries.
Corollary 2.1. Given the situation of Theorem 2.2, the following assertions hold:
(a) For each $\gamma\in(\gamma^*,1)$,

$\sum_{n\ge 1} n^{p-1}\, P(d(M_n^x, M_n^y) > \gamma^n) \le c_\gamma\big(1 + \log^p(1+d(x,x_0)) + \log^p(1+d(y,x_0))\big)$

and

$\lim_{n\to\infty} n^p\, P(d(M_n^x, M_n^y) > \gamma^n) = 0$

for all $x, y\in X$ and some $c_\gamma\in(0,\infty)$.
(b) For each $\gamma\in(\gamma^*,1)$,

$\limsup_{n\to\infty} n^{(p-1)/p}\Big(\frac 1n \log d(M_n^x, M_n^y) - \log\gamma\Big) \le 0$ a.s.

for all $x, y\in X$. In case $0 < p\le 1$ this remains true for $\gamma = \gamma^*$.
(c) If $p = 1$, then $\lim_{n\to\infty} \gamma^{-n} d(M_n^x, M_n^y) = 0$ a.s. for all $x, y\in X$ and all $\gamma\in(\gamma^*,1)$.
Corollary 2.2. Given the situation of Theorem 2.3, the following assertions hold:
(a) For each $\gamma\in(\gamma^*,1)$,

$\lim_{n\to\infty} \rho_\gamma^{-n}\, P(d(M_n^x, M_n^y) > \gamma^n) = 0$

for all $x, y\in X$ and some $\rho_\gamma\in(0,1)$.
(b) There exists $\beta > 0$ such that for each $q\in(0,\beta)$,

$\lim_{n\to\infty} \sup_{x,y\in X} \rho_q^{-n}\,\big(1 + d(x,x_0) + d(y,x_0)\big)^{-q}\, E\, d(M_n^x, M_n^y)^q = 0$

for some $\rho_q\in(0,1)$. The same holds true for $q = \beta$ with $\rho_q = 1$.
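For affine maps the coupling rate of Corollary 2.2 can be seen exactly: with $F_k(x) = A_k x + B_k$ one has $d(M_n^x, M_n^y) = |x-y|\prod_{k\le n}|A_k|$, since the additive parts cancel. A minimal sketch, with the same hypothetical parameters as before:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
A = rng.uniform(-0.9, 0.9, size=n)
B = rng.normal(size=n)

x, y = -5.0, 7.0   # two different starting points, same innovations
for k in range(n):
    x, y = A[k] * x + B[k], A[k] * y + B[k]
    if (k + 1) % 10 == 0:
        print(k + 1, abs(x - y))   # equals |x0 - y0| * prod |A_k|: geometric decay
```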
3 Central limit theorem and quick convergence: Poisson equation approach

In this section, we show that suitably normalized additive functionals of iterated random functions converge to a normal distribution. The machinery which we develop to prove this result rests on the stability theory developed in Section 2. These techniques are extremely appealing as well as powerful, and can lead to much further insight into the asymptotic behavior of iterated random functions. Here we will focus on two results: the central limit theorem and quick convergence.
Let $g\in L_0^2(\pi)$ be a square integrable function with mean 0, i.e.

$\int_X g\,d\pi = 0$ and $\|g\|_2^2 = \int_X g^2\,d\pi < \infty.$    (3.1)

Consider the sequence

$S_n(g) := g(M_1) + \cdots + g(M_n),\qquad n\ge 1,$    (3.2)

which may be viewed as a Markov random walk with driving chain $(M_n)_{n\ge 0}$. By constructing a solution $h\in L^2(\pi)$ to the Poisson equation

$h = g + Ph,$    (3.3)

where $Ph(x) := \int_X h(y)\,P(x,dy)$, and a subsequent decomposition of $S_n(g)$ into a martingale and a stochastically bounded sequence, Benda (1998) showed that $S_n(g)/\sqrt n$ is asymptotically normal as $n\to\infty$ under $P_x$ for $\pi$-almost all $x\in X$, if $g\in L_{\mathrm{Lip}}(X,\mathbb{R})$,

$EL_1^2 < \infty$ and $Ed(F_1(x_0),x_0)^2 < \infty.$    (3.4)
It was observed by Wu and Woodroofe (2000) that these conditions may be relaxed if the integrability assumption on $g$ is slightly strengthened to $g\in L_0^2(\pi)\cap L^r(\pi)$ for some $r > 2$. Their further assumptions are $E\log L_1 < 0$, (2.6), and a $\pi$-square integrability condition on a certain local Lipschitz constant for $g$ with respect to a flattened metric $d_\psi$. The main point is that this allows discontinuous $g$, for instance suitable indicator functions. A main purpose of this section is to summarize Benda's (1998) and Wu and Woodroofe's (2000) results on the asymptotic normality of $S_n(g)/\sqrt n$, and to apply the results from Alsmeyer (1990) and Fuh and Zhang (2000) for quick convergence of $n^{-1}S_n(g)$ to 0. As to the above mentioned local Lipschitz constant for $g$, we will show that its integrability (instead of square integrability) with respect to $\pi$ suffices.

We will further give sufficient conditions for the $\lambda$-quick convergence of $n^{-1}S_n(g)$ to 0. The concept of quick convergence was introduced by Strassen (1967). A sequence $(Z_n)_{n\ge 0}$ is said to converge $\lambda$-quickly ($\lambda > 0$) to a constant $\mu$ if

$E\big(\sup\{n\ge 0 : |Z_n - \mu| \ge \epsilon\}\big)^\lambda < \infty$    (3.5)

for all $\epsilon > 0$. Plainly, $Z_n\to\mu$ $\lambda$-quickly implies $Z_n\to\mu$ a.s. Put $N_\epsilon := \sup\{n\ge 0 : |Z_n - \mu| \ge \epsilon\}$. Since (3.5) then reads $EN_\epsilon^\lambda < \infty$ for all $\epsilon > 0$, the $\lambda$-quick convergence of $Z_n$ to $\mu$ holds if, and only if,

$\sum_{n\ge 1} n^{\lambda-1}\, P(N_\epsilon \ge n) = \sum_{n\ge 1} n^{\lambda-1}\, P\Big(\sup_{j\ge n} |Z_j - \mu| \ge \epsilon\Big) < \infty$    (3.6)

for all $\epsilon > 0$.
Our results will be stated in Theorem 3.1 and Corollaries 3.1 and 3.2. As in Benda (1998) and Wu and Woodroofe (2000), the bulk of the work is to verify the existence of a solution to the Poisson equation (3.3). This is the content of Theorem 3.1. The asymptotic normality of $S_n(g)/\sqrt n$ (Corollary 3.1) then follows as in Benda (1998) by applying a martingale central limit theorem, while the $\lambda$-quick convergence of $S_n(g)/n$ to 0 for suitable $\lambda$ (Corollary 3.2) will be obtained by using a result from Alsmeyer (1990) and Fuh and Zhang (2000).

Some preliminary considerations are needed before presenting our results:

A. Flattening the metric. In order to solve the Poisson equation (3.3) for a given function $g$, the particular complete separable metric $d$ on the space $X$ will not be essential but may rather be altered to our convenience. This has been observed by Wu and Woodroofe (2000), who therefore consider flattened variations of $d$ obtained by composing $d$ with an arbitrary nondecreasing, concave function $\psi: [0,\infty)\to[0,\infty)$ with $\psi(0) = 0$ and $\psi(t) > 0$ for all $t > 0$. Let $\Psi$ be the collection of all such functions. It is easy to see that $d_\psi := \psi\circ d$ is again a complete metric. Possible choices from $\Psi$ include $\psi_p(t) := t^p$ for any $0 < p\le 1$ as well as $\psi_*(t) := \frac{t}{1+t}$. The latter choice leads to a bounded metric $d_{\psi_*}$ satisfying

$d_{\psi_*}(x,y) \le d(x,y) \le 2d_{\psi_*}(x,y)$    (3.7)

for all $(x,y)\in\{(u,v)\in X^2 : d(u,v)\le 1\}$. This shows that the behavior of $d$ and $d_{\psi_*}$ is essentially the same for small values. Notice further that $\psi_*\circ\psi\in\Psi$ with

$\lim_{t\downarrow 0} \frac{\psi_*\circ\psi(t)}{\psi(t)} = 1$    (3.8)

for all $\psi\in\Psi$.
B. Integrable local Lipschitz constant. One can further relax the global Lipschitz continuity of $g$ needed in Benda (1998) and instead be satisfied with a $\pi$-almost sure local Lipschitz continuity (with respect to a flattened metric $d_\psi$) in combination with an integrability condition on the local Lipschitz constant. To make this precise, let $\psi\in\Psi$. For a measurable $g: X\to\mathbb{R}$, define its local Lipschitz constant at $x\in X$ with respect to $d_\psi$ as

$l_\psi(g,x) = \sup_{y:\, 0<d(x,y)\le 1} \frac{|g(x)-g(y)|}{d_\psi(x,y)}$    (3.9)

and, for $r\in[1,\infty]$,

$\|g\|_{r,\psi} = \|l_\psi(g,\cdot)\|_r,$    (3.10)

where $\|\cdot\|_r$ denotes the usual norm on $L^r(\pi)$. It is easily seen that $\|\cdot\|_{r,\psi}$ defines a (pseudo-)norm on the space

$L_{\psi,0}^r(\pi) = \Big\{g\in L^r(\pi) : \int_X g(x)\,\pi(dx) = 0 \text{ and } \|g\|_{r,\psi} < \infty\Big\}$    (3.11)

and that $L_{\psi,0}^r(\pi) = L_{\psi_*\circ\psi,0}^r(\pi)$ with $\frac 12\|\cdot\|_{r,\psi_*\circ\psi} \le \|\cdot\|_{r,\psi} \le \|\cdot\|_{r,\psi_*\circ\psi}$ on this space (use (3.7) and (3.8)). Possibly after replacing $\psi$ with $\psi_*\circ\psi$, we may therefore always assume $\psi$ to be bounded when dealing with elements of $L_{\psi,0}^r(\pi)$.
Plainly, all global Lipschitz functions, i.e. all $g\in L_{\mathrm{Lip}}(X,\mathbb{R})$, are elements of $L_{\psi,0}^r(\pi)$ for any $\psi\in\Psi$. However, $g$ need not be continuous in order to be an element of some $L_{\psi,0}^r(\pi)$. As pointed out in Wu and Woodroofe (2000), if $g = 1_B$ is the indicator function of some $B\in\mathcal{B}(X)$, then

$l_\psi(1_B, x) = \frac{1}{d_\psi(\partial B, x)},$    (3.12)

where $\partial B$ denotes the topological boundary of $B$ and $d_\psi(\partial B, x) := \inf_{y\in\partial B} d_\psi(x,y)$. They further show that, if $B(x,R) = \{y : |x-y|\le R\}$ is the closed $R$-ball with center $x\in X$, $\psi(t) = t^{1/4}$ and $\pi$ denotes Lebesgue measure, then, for each $x\in X$, $1_{B(x,R)} - \pi(B(x,R)) \in L_{\psi,0}^2(\pi)$ for Lebesgue-almost all $R > 0$; see their Theorem 3.
Theorem 3.1. Let $r\in(1,\infty]$ with conjugate number $s\ge 1$, given by $\frac 1r + \frac 1s = 1$. Let also $\psi\in\Psi$ satisfy

$\int_0^1 \frac{\psi(t)}{t}\,dt < \infty.$    (3.13)

If $E\log L_1 < 0$ and (2.6) holds for some $p > s$, then each $g\in L^r(\pi)\cap L_{\psi,0}^1(\pi)$ admits a solution $h\in L_0^r(\pi)$ to the Poisson equation $h = g + Ph$.
We remark that all examples of $\psi\in\Psi$ mentioned above satisfy condition (3.13).

With the help of the Poisson equation, one may write

$S_n(g) = W_n + R_n,\qquad n\ge 1,$    (3.14)

where

$W_n := \sum_{k=1}^n \big(h(M_k) - Ph(M_{k-1})\big),\qquad n\ge 0,$    (3.15)

forms a zero mean martingale under $P_\pi$ with stationary increments from $L^r(\pi)$, and

$R_n := Ph(M_0) - Ph(M_n),\qquad n\ge 1,$    (3.16)

is stochastically $L^r$-bounded under $P_\pi$ in the sense that

$\sup_{n\ge 1} P_\pi(|R_n| > t) \le 2P_\pi(Z > t)$    (3.17)

for all $t > 0$ and some $Z\in L^r(\pi)$; take any random variable $Z\ge 0$ with distribution function $t\mapsto P_\pi(|Ph(M_0)| \le t/2)$ for $t\ge 0$.
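To make the decomposition (3.14) concrete, the sketch below solves the Poisson equation and evaluates the CLT variance $s^2(g) = \int(h^2-(Ph)^2)\,d\pi$ of Corollary 3.1 for a small finite-state chain standing in for the IRF (a hypothetical 3-state example; in the finite case $h$ can be obtained directly from the fundamental matrix rather than via the analytic arguments of Theorem 3.1).

```python
import numpy as np

# hypothetical 3-state ergodic chain in place of (M_n)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# stationary distribution pi: normalized left eigenvector for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi /= pi.sum()

g = np.array([1.0, -1.0, 0.5])
g = g - pi @ g                    # center so that pi(g) = 0, cf. (3.1)

# Poisson equation h = g + P h, solved via the fundamental matrix
h = np.linalg.solve(np.eye(3) - P + np.outer(np.ones(3), pi), g)
assert np.allclose(h - P @ h, g)  # h solves (3.3)

Ph = P @ h
print(pi @ (h**2 - Ph**2))        # CLT variance s^2(g) of Corollary 3.1
```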
In the stationary regime, that is, under $P_\pi$, the following central limit theorem now follows exactly as in Benda (1998) from Theorem 3.1 and a martingale central limit theorem. However, an additional argument is needed to show that the same result holds true under $P_x$ for $\pi$-almost all $x\in X$. While this extension is not considered in Wu and Woodroofe (2000), its proof in Benda (1998) fails to work here because it draws on the continuity of $g$ and a moment condition like (2.5).

Corollary 3.1. Given the assumptions of Theorem 3.1 with $r\ge 2$ and $p > s$, $S_n(g)/\sqrt n$ is asymptotically normal with mean 0 and variance $s^2(g) := \int (h^2 - (Ph)^2)\,d\pi$ under $P_\pi$ as well as under $P_x$ for $\pi$-almost all $x\in X$.
So if $g\in L^2(\pi)\cap L_{\psi,0}^1(\pi)$, we need moment condition (2.6) for some $p > 2$ to conclude asymptotic normality of $S_n(g)/\sqrt n$. By using the existence of a solution to the Poisson equation (3.3), the following corollary is taken from Theorem 2 in Fuh and Zhang (2000).

Corollary 3.2. Given the assumptions of Theorem 3.1 with $p > s > 1$, $S_n(g)/n$ converges $\lambda$-quickly to 0 for $\lambda = r-1$, i.e.

$\sum_{n\ge 1} n^{r-2}\, P_\pi\Big(\sup_{j\ge n} j^{-1}|S_j(g)| \ge \epsilon\Big) < \infty$    (3.18)

for all $\epsilon > 0$.
4 Harris recurrence of iterated random functions

Let $M_n = F(\theta_n, M_{n-1})$, $n\ge 0$, be the iterated random function defined in Section 1. The ergodic theorem, as shown in Theorem 2.1(e), implies for each $B\in\mathcal{B}(X)$

$\lim_{n\to\infty} \frac 1n \sum_{k=1}^n 1_B(M_k) = \pi(B)$    (4.1)

$P_\pi$-a.s. and thus also $P_x$-a.s. for $\pi$-almost all $x\in X$. Hence, if $\pi(B) > 0$, then

$P_x(M_n\in B \text{ i.o.}) = 1$    (4.2)

for $\pi$-almost all $x\in X$, and we would like to conclude that every $\pi$-positive set $B$ is recurrent. Unfortunately, the $\pi$-null set of $x\in X$ for which (4.2) fails to hold in general depends on the set $B$. On the other hand, if it does not, we infer the $\pi$-irreducibility of the chain $(M_n)_{n\ge 0}$ on some $H$ with $\pi(H) = 1$ and then, because of (4.2) for each $\pi$-positive $B$, further its Harris recurrence on $H$. Provided additionally aperiodicity, this in turn implies that $P_x(M_n\in\cdot)$ converges to $\pi$ in total variation for every $x\in H$, which, of course, is a much stronger conclusion than Elton's result stated in Theorem 2.1. With regard to a further analysis of IRF, for instance the rate of convergence towards stationarity (in total variation), it also gives access to the highly developed theory of irreducible and Harris recurrent Markov chains on general state spaces.
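The occupation-time statement (4.1) is the easiest of these to check by simulation. A minimal sketch for the affine recursion $M_n = aM_{n-1}+\epsilon_n$ with standard normal innovations, a hypothetical choice whose stationary law is $N(0, 1/(1-a^2))$, so that $\pi(B)$ is available in closed form for half-lines:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
a, n = 0.5, 500_000
x, hits = 0.0, 0
for _ in range(n):
    x = a * x + rng.normal()
    hits += (x <= 1.0)          # occupation of B = (-inf, 1]

s = sqrt(1.0 / (1.0 - a * a))   # stationary std dev of the AR(1) chain
print(hits / n)                                    # time average, cf. (4.1)
print(0.5 * (1.0 + erf(1.0 / (s * sqrt(2.0)))))    # pi(B) = Phi(1/s)
```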
Given an IRF of i.i.d. Lipschitz maps satisfying the condition of an a.s. negative Liapunov exponent and condition (2.13), two questions will be considered in this section and discussed in various examples in Section 8. First, we state a sufficient condition for $H = X$ in Theorem 4.1. These conditions are quite often easy to check in applications when the stationary distribution is known to some extent; see Section 8 for several examples. Second, we deal with the convergence towards stationarity for Harris recurrent IRF. Under additional moment conditions on $L_1$ and $d(F_1(x_0),x_0)$, we will show $w$-regularity and $w$-ergodicity for suitable functions $w$ in Theorem 4.2, and provide polynomial as well as geometric rates of convergence towards stationarity in Theorem 4.3. Theorems 4.1 to 4.3 are taken from Alsmeyer (2003).

A set $B\in\mathcal{B}(X)$ is called $\pi$-full if $\pi(B) = 1$, and $P$-absorbing if $P(x,B) = 1$ for all $x\in B$. For the definitions of irreducibility, Harris recurrence and related notions for Markov chains on general state spaces not explicitly repeated here, we refer to the standard monograph by Meyn and Tweedie (1993). If $(M_n)_{n\ge 0}$ is a Harris chain on a set $H$, this set is called a Harris set (for $(M_n)_{n\ge 0}$). It is well known that in this case there always exists a maximal absorbing set with this property, called the maximal Harris set. Our next theorem contains some information on when this latter set is the whole space $X$. Let $\mathrm{int}(B)$ denote the interior of a set $B\in\mathcal{B}(X)$.
Theorem 4.1. Suppose $(M_n)_{n\ge 0}$ is an IRF of i.i.d. Lipschitz maps which has a.s. negative Liapunov exponent $l^*$ and satisfies (2.13). Let $\pi$ denote its stationary distribution. Suppose $(M_n)_{n\ge 0}$ is Harris recurrent with maximal Harris set $H$. Then the following assertions hold:
(a) Either $\pi(\mathrm{int}(H)) = 0$, or $H = X$.
(b) Suppose there exist a $\pi$-positive set $X_0$ and a $\sigma$-finite measure $\phi$ on $(X,\mathcal{B}(X))$ such that each $P(x,\cdot)$, $x\in X_0$, possesses a $\phi$-continuous component. If furthermore $\phi(\mathrm{int}(X_0)) > 0$ and $\mathrm{int}(\mathrm{supp}\,\pi)\ne\emptyset$, then $H = X$.
As already mentioned above, Theorem 4.1 implies, by invoking the ergodic theorem for aperiodic, positive Harris chains (see Meyn and Tweedie (1993), Theorem 13.0.1), that

$\lim_{n\to\infty} \|P_x(M_n\in\cdot) - \pi\| = 0$    (4.3)

for all $x\in H$, where $\|\cdot\|$ denotes the total variation distance. A weaker metric considered in Diaconis and Freedman (1999) and Alsmeyer and Fuh (2001) is the Prokhorov metric associated with $d$; see also Theorems 2.2 and 2.3 in Section 2.

If $(M_n)_{n\ge 0}$ is Harris recurrent, it is natural to ask, in view of Theorem 2.2(d) and Theorem 2.3(c), whether or not similar conclusions hold when replacing the Prokhorov distance with the total variation distance. The positive answer is provided in Theorem 4.3 for the case $H = X$ and under the additional assumption that the support of the stationary distribution $\pi$ has nonempty interior.
Weaker conclusions, stated as Theorem 4.2, can be obtained considerably more easily concerning the $w$-regularity of $(M_n)_{n\ge 0}$. Following Meyn and Tweedie (1993), a set $C\in\mathcal{B}(X)$ is called $w$-regular for a function $w: X\to[1,\infty)$ if for each $\pi$-positive $B\in\mathcal{B}(X)$

$\sup_{x\in C} E_x\Big(\sum_{n=0}^{\rho(B)-1} w(M_n)\Big) < \infty,$

where $\rho(B) := \inf\{n\ge 1 : M_n\in B\}$. $(M_n)_{n\ge 0}$ is called $w$-regular on a $P$-absorbing set $H$ if it is irreducible there and $H$ admits a countable cover of $w$-regular sets. Defining the $w$-norm $\|\lambda\|_w$ for a signed measure $\lambda$ as

$\|\lambda\|_w := \sup_{|g|\le w} |\lambda(g)|,\qquad \lambda(g) := \int g\,d\lambda,$

$(M_n)_{n\ge 0}$ is called $w$-ergodic on $H$ if it is positive Harris on $H$ with invariant distribution $\pi$ satisfying $\pi(w) < \infty$ and if

$\lim_{n\to\infty} \|P^n(x,\cdot) - \pi\|_w = 0$

for all $x\in H$. Now put

$w(x) := 1 + \log^p(1+d(x,x_0))$    (4.4)

provided (2.5) holds for $p > 0$, and

$w(x) := 1 + d(x,x_0)^\beta$    (4.5)

provided (2.6) holds for $p > 0$ and $0 < \beta\le p$ is such that $\int_X d(x,x_0)^\beta\,\pi(dx) < \infty$. By using Meyn and Tweedie's main result on $w$-regularity, the following result is now immediate and hence stated without proof.
Theorem 4.2. Let $(M_n)_{n\ge 0}$ be an IRF of i.i.d. Lipschitz maps satisfying Elton's conditions. Suppose further that $(M_n)_{n\ge 0}$ is an aperiodic positive Harris chain on a $\pi$-full, absorbing set $H$ and that either (2.5) or (2.6) holds for some $p > 0$. Then $H$ may be chosen such that $(M_n)_{n\ge 0}$ is $w$-regular and $w$-ergodic on $H$ with $w$ according to (4.4), respectively (4.5).

It is to be understood that the Harris set $H$ on which $(M_n)_{n\ge 0}$ is $w$-regular need not be the maximal Harris set.
Theorem 4.3. Let $(M_n)_{n\ge 0}$ be an IRF of i.i.d. Lipschitz maps with a.s. negative Liapunov exponent $l^*$ and stationary distribution $\pi$. Suppose further that $(M_n)_{n\ge 0}$ is a positive Harris chain on the whole of $X$ and that $\mathrm{int}(\mathrm{supp}\,\pi)\ne\emptyset$. Then the following assertions hold:
(a) If $(M_n)_{n\ge 0}$ satisfies (2.5) for some $p > 0$, then

$\sum_{n\ge 1} n^{p-1}\,\|P_x(M_n\in\cdot) - \pi\| < \infty$    (4.6)

as well as

$\lim_{n\to\infty} n^p\,\|P_x(M_n\in\cdot) - \pi\| = 0$    (4.7)

for all $x\in X$.
(b) If $(M_n)_{n\ge 0}$ satisfies (2.6) for some $p > 0$, then

$\sum_{n\ge 0} r^{-n}\,\|P_x(M_n\in\cdot) - \pi\|_w < \infty$    (4.8)

for all $x\in X$ and some $r\in(0,1)$ not depending on $x\in X$, where $w$ is defined as in (4.5).
5 Spectral decomposition and characteristic functions of Markov random walks

It was shown in Section 4 that the Markov chain $(M_n)_{n\ge 0}$ induced by the iterated random functions is Harris recurrent on a set $H$. Under the assumption $H = X$ and the moment assumption (2.6), $(M_n)_{n\ge 0}$ is $w$-geometrically ergodic with $w$ defined in (4.5). In this section, we introduce the culminating form of the geometric ergodicity theorem and show that such convergence can be viewed as geometric convergence in an operator norm; that is, the convergence is bounded independently of the starting point. In the following, we study the spectral theory for uniformly ergodic Markov chains with respect to a general norm and apply it to iterated random functions in the next two sections. The material of this section is similar to that of Fuh and Lai (2001) and Fuh and Lai (2003); we include it here for completeness.
Let $\{(X_n, S_n),\ n\ge 0\}$ be a Markov random walk on $X\times\mathbb{R}^d$. For ease of notation, denote $P(x,A) = P(x, A\times\mathbb{R}^d)$. For all transition probability kernels $P(x,A)$, $Q(x,A)$, $x\in X$, $A\in\mathcal{A}$, and for all measurable functions $h(x)$, $x\in X$, define $Qh$ and $PQ$ by $Qh(x) = \int Q(x,dy)h(y)$ and $PQ(x,A) = \int P(x,dy)Q(y,A)$, respectively.

Let $\mathcal{N}$ be the Banach space of measurable functions $h: X\to\mathbb{C}$ ($:=$ the set of complex numbers) with norm $\|h\| < \infty$. We introduce the Banach space $\mathcal{B}$ of transition probability kernels $Q$ such that the operator norm $\|Q\| = \sup\{\|Qg\| ; \|g\|\le 1\}$ is finite. Two prototypical norms used in the literature are the sup-norm and the $L^p$-norm for $1 < p < \infty$. Another two commonly used norms in applications are the weighted variation norm and the bounded Lipschitz norm, described as follows:
1. Let $w: X\to[1,\infty)$ be a measurable function. Define, for all measurable functions $h$, the weighted variation norm

$\|h\|_w = \sup_{x\in X} |h(x)|/w(x),$    (5.1)

and set $\mathcal{N}_w = \{h : \|h\|_w < \infty\}$. The corresponding norm in $\mathcal{B}_w$ is of the form $\|Q\|_w = \sup_{x\in X} \int |Q|(x,dy)\,w(y)/w(x)$.

2. Let $(X,d)$ be a metric space. For any continuous function $h$ on $X$, the Lipschitz seminorm is defined by $\|h\|_L := \sup_{x\ne y} |h(x)-h(y)|/d(x,y)$. Call $\|h\|_\infty = \sup_{x\in X}|h(x)|$ the supremum norm. Define the bounded Lipschitz norm

$\|h\|_{BL} := \|h\|_L + \|h\|_\infty$    (5.2)

and $\mathcal{N}_{BL} = \{h : \|h\|_{BL} < \infty\}$. Here BL stands for "bounded Lipschitz".
Denote by $P^n(x,A) = P(X_n\in A\,|\,X_0 = x)$ the transition probabilities over $n$ steps. The kernel $P^n$ is the $n$-fold power of $P$. Define the Cesàro averages $P^{(n)} = \sum_{j=0}^n P^j/n$, where $P^0 = P^{(0)} = I$ and $I$ is the identity operator on $\mathcal{B}$.
16
Denition 1 A Markov chain fX
n
;n  0g is said to be uniformly ergodic (or strongly
stable) with respect to a given norm jj  jj,if there exists a stochastic kernel  such that
P
(n)
! as n!1 in the induced operator norm in B.The Markov chain fX
n
;n  0g
is called w-uniformly ergodic in the case of weighted variation norm.
The Markov chain fX
n
;n  0g is assumed to be irreducible (with respect to some
measure on A),aperiodic and strongly stable.Theorem 1.1.of Kartashov (1996) leads
that P has a unique stationary projector ,in the sense of 
2
=  = P = P,and
(x;A) = (A) for all x 2 X and A 2 A.
The following assumptions will be used in this section. Throughout, $\xi_n := S_n - S_{n-1}$, $n\ge 1$, denotes the increments of the Markov random walk.

C1. There exist a natural number $n$, a measure $\nu$ on $\mathcal{A}$ with $\nu(X) = 1$, and a measurable function $h$ on $X$ such that $\int \pi(dx)\,h(x) > 0$, $\int \nu(dx)\,h(x) > 0$, and the kernel $T(x,A) = P^n(x,A) - h(x)\nu(A)$ is nonnegative.

C2. $\sup_{\|h\|\le 1} \|E[h(X_1)\,|\,X_0 = \cdot\,]\| < \infty$.

C3. $\sup_x E_x|\xi_1|^2 < \infty$ and $\sup_{\|h\|\le 1} \|E[|\xi_1|^r h(X_1)\,|\,X_0 = \cdot\,]\| < \infty$ for some $r\ge 3$.

C4. Let $\eta$ be an initial distribution of the Markov chain $\{X_n,\ n\ge 0\}$; assume that for some $r\ge 1$,

$\|\eta\| := \sup_{\|h\|\le 1} \Big|\int_{x\in X} h(x)\,E_x|\xi_1|^r\,\eta(dx)\Big| < \infty.$
Remarks: 1. Condition C1 is a mixing condition on the Markov chain $\{X_n,\ n\ge 0\}$, and it is satisfied for Harris recurrent Markov chains. An example on page 9 of Kartashov (1996) shows that there exists a uniformly ergodic Markov chain with respect to a given norm which is not Harris recurrent. C2 is a condition which guarantees that the operators defined in (5.4)-(5.5) below are bounded. C3 and C4 are moment conditions. We also note that, by making use of an argument similar to that in Section 3 of Jensen (1987), the $X_1$ and $\xi_1$ appearing in C2-C4 can be relaxed to $X_t$ and $\xi_t$, for some fixed $t > 1$.

2. Theorem 2.2 and Corollary 2.1 of Kartashov (1996) show that under Condition C1, a Markov chain $X$ with transition probability kernel $P$ is uniformly ergodic with respect to a given norm $\|\cdot\|$ if and only if there exist $\kappa > 0$ and $0 < \rho < 1$ such that for all $n\ge 1$

$\|P^n - \Pi\| \le \kappa\rho^n.$    (5.3)

When the Markov chain $\{X_n,\ n\ge 0\}$ is $w$-uniformly ergodic, (5.3) is satisfied without Condition C1.
For d 1 vectors ,dene the linear operators P

,P,

and Q on N by
(P

h)(x) =
Z
h(y)e
is
P(x;dy ds) = E[h(X
1
)e
iS
1
jX
0
= x];(5.4)
(Ph)(x) =
Z
h(y)P(x;dy ds) = E[h(X
1
)jX
0
= x];(5.5)
17


h = E

fh(X
0
)e
iS
1
g;Qh =
Z
h(y)(dy):(5.6)
Condition C2 ensures that $P_\theta$ and $P$ are bounded linear operators on $\mathcal{N}$, and (5.3) implies that

$\|P^n - Q\| = \sup_{h\in\mathcal{N},\,\|h\|\le 1} \|P^n h - Qh\| \le \kappa\rho^n.$    (5.7)

For a bounded linear operator $T: \mathcal{N}\to\mathcal{N}$, the resolvent set is defined as $\{z\in\mathbb{C} : (zI - T)^{-1}$ exists$\}$, and $(zI - T)^{-1}$ is called the resolvent (when the inverse exists). From (5.7), it follows that for $z\ne 1$ and $|z| > \rho$,

$R(z) := Q/(z-1) + \sum_{n=0}^\infty (P^n - Q)/z^{n+1}$    (5.8)

is well defined. Since $R(z)(zI - P) = I = (zI - P)R(z)$, the resolvent of $P$ is $R(z)$.
Moreover, by C3 and an argument similar to the proof of Lemma 2.2 of Jensen (1987), there exist $K > 0$ and $\delta > 0$ such that for $|\theta|\le\delta$, $|z-1| > (1-\rho)/6$ and $|z| > \rho + (1-\rho)/6$,

$\|P_\theta - P\| \le K|\theta|,$    (5.9)

$R_\theta(z) := \sum_{n=0}^\infty R(z)\{(P_\theta - P)R(z)\}^n$ is well defined.    (5.10)

Since $R_\theta(z)(zI - P_\theta) = R_\theta(z)\{(zI - P) - (P_\theta - P)\} = I = (zI - P_\theta)R_\theta(z)$, the resolvent of $P_\theta$ is $R_\theta(z)$.
For jj   the spectrum (which is the complement of the resolvent set) of P

therefore
lies inside the two circles
C
1
= fz:jz 1j = (1 )=3g and C
2
= fz:jzj =  +(1 )=3g:(5.11)
Hence,by the spectral decomposition theorem (cf.Riesz and Sz-Nagy (1955),page 421),
N = N
1
() N
2
() and
Q

:=
1
2i
Z
C
1
R

(z)dz;I Q

:=
1
2i
Z
C
2
R

(z)dz (5.12)
are parallel projections of N onto the subspaces N
1
();N
2
();respectively.Moreover,by
an argument similar to the proof of Lemma 2.3 of Jensen (1987),there exists 0 <   
such that N
1
() is one-dimension for jj   and
sup
jj
kQ

Qk < 1:(5.13)
18
For jj  ,let () be the eigenvalue of P

with corresponding eigenspace N
1
().Since
Q

is the parallel projection onto the subspace N
1
() in the direction of N
2
();
P

Q

h = ()Q

h for h 2 N:(5.14)
Letting  denote the initial distribution of (X
0
;S
0
) and dening the operator 

by (5.6),
we then have for h 2 N;
E

fe
iS
n
h(X
n
)g = 

P
n

h = 

P
n

fQ

+(I Q

)gh (5.15)
= 
n
()

Q

h +

P
n

(I Q

)h:
Suppose C4 also holds.An argument similar to the proof of Lemma 2.4 of Jensen
(1987) shows that there exist K

> 0 and 0 < 

<  such that for jj  

,
k

P
n

(I Q

)hk  K

khkjjf(1 +2)=3g
n
:(5.16)
We next consider the summand $\lambda^n(\theta)\,\eta_\theta Q_\theta h$ in (5.15). Suppose that C3 holds with $r\ge 3$ and let $[r]$ denote the integer part of $r$. Then, analogous to Lemma 2.5 of Jensen (1987), $\lambda(\theta)$ has the Taylor expansion

$\lambda(\theta) = 1 + \sum_{(j_1,\dots,j_d):\,1\le j_1+\cdots+j_d\le[r]} i^{\,j_1+\cdots+j_d}\,\mu_{j_1,\dots,j_d}\,\theta_1^{j_1}\cdots\theta_d^{j_d}/(j_1!\cdots j_d!) + \Delta(\theta)$    (5.17)

in some neighborhood of the origin, where $\Delta(\theta) = O(|\theta|^r)$ as $\theta\to 0$. Assume furthermore that C4 holds. Then, analogous to Lemma 2.6 of Jensen (1987), $\eta_\theta Q_\theta h_1$ has continuous partial derivatives of order $[r]-2$ in some neighborhood of the origin. Moreover, there exist constants $K$ and $0 < \delta_3 < \delta_2$ such that for $|\theta| < \delta_3$ and $l\le r-2$, we have

$\frac{d^l}{d\theta^l}\,\eta_\theta Q_\theta h = \sum_{j=1}^{r-3} \frac{1}{(j-1)!}\,\theta^{j-1} a_j + cK|\theta|^{r-2-l},$    (5.18)

where $|c|\le 1$ and $a_j$, $j = 0,1,\dots,r-3$, are constants with $a_0 = 1$. Furthermore, $\eta_\theta P_\theta^n(I - Q_\theta)h$ has continuous partial derivatives of order $[r]$ in some neighborhood of the origin, and the norm of any such partial derivative converges to 0 geometrically fast. In summary, we have:
Theorem 5.1. Let $\{(X_n, S_n),\ n\ge 0\}$ be the Markov random walk defined above, satisfying Conditions C1-C4. Then there exists $\delta > 0$ such that for all $\theta\in\mathbb{R}^d$ with $|\theta| < \delta$, we have

$P_\theta = \lambda(\theta)Q_\theta + P_\theta(I - Q_\theta),$    (5.19)

and
(i) $\lambda(\theta)$ is the unique eigenvalue of maximal modulus of $P_\theta$;
(ii) $Q_\theta$ is a rank-one projection such that $Q_\theta(I - Q_\theta) = (I - Q_\theta)Q_\theta = 0$;
(iii) the mappings $\lambda(\theta)$, $Q_\theta$ and $I - Q_\theta$ are analytic for $|\theta| < \delta$;
(iv) $|\lambda(\theta)| > \frac{2+\rho}{3}$, and for each $p\in\mathbb{N}$, the set of positive integers, there exists $c > 0$ such that for each $n\in\mathbb{N}$,

$\Big\|\frac{d^p}{d\theta^p}\,P_\theta^n(I - Q_\theta)\Big\| \le c\Big(\frac{1+2\rho}{3}\Big)^n;$

(v) defining $\mu_j = \lim_{n\to\infty}(1/n)E_x\log\|T_n^{(j)}\|$ as the upper Liapunov exponent, it follows that

$\mu_j = \frac{\partial\lambda(\theta)}{\partial\theta_j}\Big|_{\theta=0} = \int E_x\big(\log\|M_1^{(j)}u\|/\|u\|\big)\,dm(x,u).$
Remarks: 1. Under C1-C3 with respect to a norm $\|\cdot\|$, together with the assumption $\|E_x\{|S_1|^r\}\| < \infty$, it can be shown by an argument similar to the proof of Lemma 2.7 of Jensen (1987) that $\eta_\theta P_\theta^n(I - Q_\theta)h$ has continuous partial derivatives of order $[r]$ in some neighborhood of the origin. Moreover, analogous to (5.16), the norm of any such partial derivative converges to 0 geometrically fast, by an argument similar to the proof of Lemma 2.4 of Jensen (1987).

2. For the special case $\xi_t = g(X_t)$ with $g: X\to\mathbb{R}$, the representation (5.15)-(5.17) of the characteristic function $E_\eta(e^{i\theta S_n})$ was first obtained by Nagaev (1957) under the uniform ergodicity condition

$\sup_{A,x,y} |P(X_m\in A\,|\,X_0 = x) - P(X_m\in A\,|\,X_0 = y)| < 1$ for some $m\ge 1$.    (5.20)

As noted by Nagaev, (5.20) implies the existence of a stationary distribution $\pi$ and a uniform geometric rate of convergence to the stationary distribution:

$\sup_{A,x} |P(X_n\in A\,|\,X_0 = x) - \pi(A)| \le \kappa\rho^n$    (5.21)

for some $\kappa > 0$, $0 < \rho < 1$ and all $n\ge 1$. Jensen (1987) first clarified Nagaev's arguments and then considered the more general case $\xi_t = g(X_{t-1}, X_t)$ with $g: X\times X\to\mathbb{R}$. Noting that the moment condition $\sup_x E[|g(x,X_1)|^r\,|\,X_0 = x] < \infty$ required in Nagaev's arguments for such $\xi_t$ is not satisfied in most cases where $g$ depends on both $X_{t-1}$ and $X_t$, he extended Nagaev's representation (5.15)-(5.17) to the case where (5.20) holds (also in the case of the $L^p$-norm for $1 < p < \infty$) and

$\sup_x E[|g(X_m, X_{m+1})|^r\,|\,X_0 = x] < \infty$ for some $m\ge 1$ and $r\ge 3$,    (5.22)

$\int E[|g(X_{t-1}, X_t)|^{r-2}\,|\,X_0 = x]\,\pi(dx) < \infty$ for $1\le t\le m-2$.    (5.23)
Instead of introducing a delay $m$ as in (5.22) and (5.23), we broaden the scope of applicability of Nagaev's representation theory by using a general norm not only in the moment condition C3 but also in the ergodicity condition. In Sections 6, 7 and 8, the usefulness of this idea is discussed further and illustrated with examples of iterated random functions.
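On a finite state space the objects above are plain matrices, which makes the construction easy to inspect. The sketch below (a hypothetical 3-state chain with additive functional $\xi_t = g(X_t)$) builds the Fourier operator $P_\theta$ of (5.4), extracts its eigenvalue $\lambda(\theta)$ of maximal modulus, and checks that $-i\lambda'(0)$ recovers the stationary mean $\pi(g)$, in line with the Taylor expansion (5.17).

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
g = np.array([1.0, -1.0, 0.5])     # increments xi_t = g(X_t)

def lam(theta):
    # (P_theta h)(x) = E[h(X_1) e^{i theta g(X_1)} | X_0 = x], cf. (5.4)
    Pt = P * np.exp(1j * theta * g)[None, :]
    ev = np.linalg.eigvals(Pt)
    return ev[np.argmax(np.abs(ev))]   # eigenvalue of maximal modulus

eps = 1e-6
mu = ((-1j) * (lam(eps) - lam(-eps)) / (2 * eps)).real  # -i lambda'(0)

w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))]); pi /= pi.sum()
print(mu, pi @ g)                   # the two numbers agree
```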
6 Rate of convergence theorems: Asymptotic expansion

We showed in Section 4 that iterated random functions, under some regularity conditions, satisfy the properties of Harris recurrence and $w$-ergodicity. A sufficient condition for irreducibility is also given in Theorem 4.1. Section 5 provides the spectral theory for irreducible Markov operators. In this section, we apply the results on Harris recurrent and strongly stable Markov chains to obtain Edgeworth expansions for iterated random functions. Edgeworth expansions for irreducible Harris recurrent Markov chains can be found in Hipp (1985), Malinovskii (1987) and Jensen (1989), while the Edgeworth expansion for strongly stable Markov chains is in Fuh and Lai (2003).
By the assumption of Harris recurrence, we can assume, without loss of generality, that the state space $X$ has an atom $A_0$, that is, $\pi(A_0) > 0$ and $P(x,\cdot) = P(y,\cdot)$ for all $x, y\in A_0$. We may then define the stopping times $T_0 = \inf\{n\ge 0 : M_n\in A_0\}$, $T_k = \inf\{n > T_{k-1} : M_n\in A_0\}$ for $k\ge 1$, and $\tau_k = T_k - T_{k-1}$. Also, we let 0 denote a fixed point in $A_0$. We adapt the notation of Section 3. For $j = 1,\dots,d$, let $g_j\in L_0^2(\pi)$ be a square integrable function with mean 0, i.e.

$\int_X g_j\,d\pi = 0$ and $\|g_j\|_2^2 = \int_X g_j^2\,d\pi < \infty.$

Denote $g = (g_1,\dots,g_d)$ and consider the sequence

$S_n := S_n(g) := g(M_1) + \cdots + g(M_n),\qquad n\ge 1,$

which may be viewed as a Markov random walk with driving chain $(M_n)_{n\ge 0}$. We want to derive an asymptotic expansion of the distribution of the sum $S_n(g)$. For this we define the random variables
$Z_k = \sum_{j=T_{k-1}+1}^{T_k} g(M_j)$    (6.1)

for $k = 1,2,\dots$. The uniform Cramér condition for $(Z_1, \tau_1)$ under $P_0$ states that for any $c > 0$ there exists a $\delta < 1$ such that

$|E_0\{\exp(iu' Z_1 + iv\tau_1)\}| < \delta$    (6.2)

for all $v\in\mathbb{R}$ and all $u\in\mathbb{R}^d$ with $\|u\| > c$. We define a uniformity class $\mathcal{B}_c$ of Borel sets in the following way:

$\mathcal{B}_c = \{B\in\mathcal{B}^d : \Phi\{(\partial B)^\epsilon\} < c\epsilon$ for all $\epsilon > 0\},$    (6.3)

where $\Phi$ is the standard normal distribution in $\mathbb{R}^d$ and $(\partial B)^\epsilon = \bigcup_{x\in\partial B} B(x,\epsilon)$, with $B(x,\epsilon)$ a ball centered at $x$ and with radius $\epsilon$.
The following theorem is taken from Theorem 1 of Jensen (1989).

Theorem 6.1. Suppose $(M_n)_{n\ge 0}$ is an IRF of i.i.d. Lipschitz maps which has a.s. negative Liapunov exponent $l^*$ and satisfies (2.13). Let $\pi$ denote its stationary distribution. Suppose there exist a $\pi$-positive set $X_0$ and a $\sigma$-finite measure $\phi$ on $(X,\mathcal{B}(X))$ such that each $P(x,\cdot)$, $x\in X_0$, possesses a $\phi$-continuous component, that $\phi(\mathrm{int}(X_0)) > 0$, and that $\mathrm{int}(\mathrm{supp}\,\pi)\ne\emptyset$. Assume further that $A_0$ is a positive Harris recurrent atom and let $0\in A_0$. Let $\eta$ be the initial distribution of $M_0$. Assume for some $s\ge 3$ the moment conditions

$E_\eta(T_0^{s-2}) < \infty,\qquad E_\eta\Big(\sum_{j=0}^{T_0}|g(M_j)|\Big)^{s-2} < \infty,$    (6.4)

$E_0(\tau_1^s) < \infty,\qquad E_0\Big(\sum_{j=1}^{T_1}|g(M_j)|\Big)^s < \infty,$    (6.5)

and assume that under $P$ the covariance of $(Z_1, \tau_1)$ is positive definite and that $(Z_1, \tau_1)$ satisfies the uniform Cramér condition (6.2). Then

$P\Big(\frac{1}{\sqrt n}\sum_{j=1}^n\{g(M_j) - \pi(g)\}\in B\Big) = \int_B \varphi_{\tilde\Sigma}(x)\sum_{r=0}^{s-3} n^{-r/2} q_r(x)\,dx + O(n^{-(s-2)/2})$

uniformly for $B\in\mathcal{B}_c$. Here $\varphi_{\tilde\Sigma}$ is the density of the normal distribution with mean zero and covariance $\tilde\Sigma$, $q_r$ is a polynomial in $x$, and

$\tilde\Sigma = E_\pi\{g(M_1) - \pi(g)\}'\{g(M_1) - \pi(g)\} + C,$

where

$C = \sum_{n=2}^\infty E_\pi\{g(M_1) - \pi(g)\}'\{g(M_n) - \pi(g)\}.$
In the second part of this section, we introduce the Edgeworth expansion for strongly stable Markov chains and then apply it to additive functionals of IRF. As in (5.3), the Markov chain $\{X_n,\ n\ge 0\}$ is geometrically mixing in the sense that there exist $\kappa > 0$ and $0 < \rho < 1$ such that for all $x\in X$, $k\ge 0$ and $n\ge 1$, and for all real-valued measurable functions $g, h$ with $g, h\in\mathcal{N}$,

$\|E_x\{g(X_k)h(X_{k+n})\} - \{E_x g(X_k)\}\{E_x h(X_{k+n})\}\| \le \kappa\rho^n.$    (6.6)
Let $\tilde g, \tilde h$ be real-valued measurable functions on $X\times X$. Since $E_x\tilde h(X_k, X_{k+1}) = E_x h(X_k)$, where $h(z) = E_z\{\tilde h(z, X_1)\}$, the same proof as that of Theorem 2.2 of Kartashov (1996) can be used to show that there exist $\kappa_1 > 0$ and $0 < \rho_1 < 1$ such that for all $x\in X$, $k\ge 0$ and $n\ge 1$, and for all measurable $\tilde g, \tilde h$ with $\sup_y \tilde g^2(\cdot,y)\in\mathcal{N}$ and $\sup_y \tilde h^2(\cdot,y)\in\mathcal{N}$,

$\|E_x\{\tilde g(X_k, X_{k+1})\tilde h(X_{k+n}, X_{k+n+1})\} - \{E_x g(X_k)\}\{E_x h(X_{k+n})\}\| \le \kappa_1\rho_1^{n-1}.$    (6.7)
To establish asymptotic expansions for Markov random walks, we shall make use of (6.7) in conjunction with the following extension of the conditional Cramér (strongly nonlattice) condition (cf. Fuh and Lai (2003)): there exists $m\ge 1$ such that

$\limsup_{|\theta|\to\infty} |E\{\exp(i\theta\cdot S_m)\,|\,X_0, X_m\}| < 1.$    (6.8)

Next, we assume that the strong mixing condition holds:

$|E_\pi\{\tilde g(X_k, X_{k+1})\tilde h(X_{k+n}, X_{k+n+1})\} - \{E_\pi g(X_k)\}\{E_\pi h(X_{k+n})\}| \le \kappa_1\rho_1^{n-1}.$    (6.9)

Remarks: 1. When the norm is the weighted variation norm, we do not need the strong mixing condition (6.9); cf. page 653 of Fuh and Lai (2001).

2. In the special case where $S_1$ is independent of $(X_0, X_1)$, so that the kernel $P(x, A\times B)$ in (1.1) can be factorized as $P_1(x,A)P_2(B)$, (6.8) reduces to the condition that the random variable $S_1$ is strongly nonlattice:

$\limsup_{|\theta|\to\infty} \Big|\int \exp(i\theta\cdot s)\,P_2(ds)\Big| < 1.$
In addition to (6.8) and (6.9), we shall assume that C1-C4 in Section 5 hold for some integer $r\ge 3$. Let

$\mu = \int E_x S_1\,\pi(dx)\ \big(= \nabla_\theta\lambda(0)/i\big),$    (6.10)

and let $V = \big(\partial^2\lambda(\theta)/\partial\theta_i\partial\theta_j\,|_{\theta=0}\big)_{1\le i,j\le d}$ be the Hessian matrix of $\lambda$ at 0. By Theorem 5.1,

$\lim_{n\to\infty} n^{-1} E_\pi\{(S_n - n\mu)(S_n - n\mu)'\} = V,$    (6.11)

where $'$ denotes the transpose.
Let $\Psi_n(\theta) = E_\pi(e^{i\theta' S_n})$ and let $h_1\in\mathcal{N}$ be the constant function $h_1\equiv 1$. Then, by Theorem 5.1 and the fact that $\eta_\theta Q_\theta h_1$ has continuous partial derivatives of order $r-2$ in some neighborhood of $\theta = 0$, we have the Taylor series expansion of $\Psi_n(\theta/\sqrt n)$ for $|\theta/\sqrt n|\le\epsilon$ (some sufficiently small positive number):

$\Psi_n(\theta/\sqrt n) = \Big\{1 + \sum_{j=1}^{r-2} n^{-j/2}\,\tilde\pi_j(i\theta)\Big\}e^{-\theta' V\theta/2} + o(n^{-(r-2)/2}),$    (6.12)

where $\tilde\pi_j(i\theta)$ is a polynomial in $i\theta$ of degree $3j$, whose coefficients are smooth functions of the partial derivatives of $\lambda(\theta)$ at $\theta = 0$ up to order $j+2$ and those of $\eta_\theta Q_\theta h_1$ at $\theta = 0$ up to order $j$. Letting $D$ denote the $d\times 1$ vector whose $k$th component is the partial differentiation operator $D_k$ with respect to the $k$th coordinate, define the differential operator $\tilde\pi_j(D)$. As in the case of sums of i.i.d. zero-mean random vectors (cf. Bhattacharya and Rao, 1976), we obtain an Edgeworth expansion for the "formal density" of the distribution of $S_n$ by replacing the $\tilde\pi_j(i\theta)$ and $e^{-\theta' V\theta/2}$ in (6.12) by $\tilde\pi_j(D)$ and $\phi_V(y)$, respectively, where $\phi_V$ is the density function of the $d$-variate normal distribution with mean 0 and covariance matrix $V$. The following two theorems are taken from Fuh and Lai (2003).
Theorem6.2 Let r  3 be an integer.Assume C1-C4,(6.8) and (6.9) hold (or C1-C4,and
(6.8) hold in the case of w-uniformly ergodic).Let 
j;V
= ~
j
(D)
V
for j = 1;:::;r 2.
For 0 <   1 and c > 0,let B
;c
be the class of all Borel subsets B of R
d
such that
R
(@B)
"

V
(y)dy  c"

for every"> 0,where @B denotes the boundary of B and (@B)
"
denotes its"-neighborhood.Then
sup
B2B
;c
jP

f(S
n
n)=
p
n 2 Bg 
Z
B
f
V
(y) +
r2
X
j=1
n
j=2

j;V
(y)gdyj = o(n
(r2)=2
):(6.13)
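For orientation, it may help to recall the classical i.i.d. benchmark that (6.13) generalizes: in one dimension, for i.i.d. summands with variance $\sigma^2$ and third cumulant $\kappa_3$ (where $\lambda(\theta)$ reduces to the characteristic function of a single increment), the first-order term of the expansion is the familiar skewness correction

$$P\Big(\frac{S_n - n\mu}{\sigma\sqrt n}\le x\Big) = \Phi(x) - \frac{\kappa_3}{6\sigma^3\sqrt n}\,(x^2-1)\,\varphi(x) + O(n^{-1}),$$

with $\Phi$ and $\varphi$ the standard normal distribution function and density. In the Markov case the cumulants are replaced by derivatives of $\lambda(\theta)$ and of $\eta_\theta Q_\theta h_1$ at $\theta = 0$, as encoded in $\tilde\pi_1(D)$.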
Next, we apply Theorem 6.2 to the case of iterated random functions. The following theorem shows that $M_n$ is a strongly stable Markov chain under some moment conditions.

Theorem 6.3. Given an IRF $(M_n)_{n\ge 0}$ of i.i.d. Lipschitz maps, suppose that for some $p > 0$,

$E\log L_1 < 0,\qquad EL_1^p < \infty$ and $Ed(F_1(x_0),x_0)^p < \infty$    (6.14)

for some $x_0\in X$, and that the assumptions of Theorem 6.1 hold. Then $(M_n)_{n\ge 0}$ is ergodic with stationary distribution $\pi$ and uniformly ergodic with respect to the norm

$\|h\|_{wl} := \|h\|_w + \|h\|_{BL} := \sup_{x\in X}\frac{|h(x)|}{1 + d(x_0,x)^p} + \sup_{x\ne y}\frac{|h(x)-h(y)|}{d(x,y)^q},$    (6.15)

for $q\in(0,p)$ and some $x_0\in X$. Here, $wl$ represents a combination of the weighted variation norm with $w(x) = 1 + d(x_0,x)^p$ and the bounded Lipschitz norm. Furthermore, there exist $\kappa > 0$ and $0 < \rho_q < 1$ such that

$\|P^n - Q\|_{wl} = \sup_{\|h\|=1}\|P^n h - Qh\|_{wl} \le \kappa\rho_q^n,$    (6.16)

where $P$ and $Q$ are defined as in (5.4)-(5.6).
Under the negative Liapunov exponent assumption $E\log L_1 < 0$ and the moment conditions $EL_1^2 < \infty$, $Ed(F_1(x_0),x_0)^2 < \infty$ for some $x_0\in X$, Benda (1998) and Wu and Woodroofe (2000) proved the central limit theorem for $S_n(g)/\sqrt n := \sum_{t=1}^n g(M_t)/\sqrt n$ for iterated random functions. In this section, we study the asymptotic expansion of $S_n(g)/\sqrt n$ for a given function $g$. Note that the method used in Benda (1998) and Wu and Woodroofe (2000) is based on the idea of the Poisson equation, and no irreducibility assumption is needed in their argument. Here, we apply Theorem 4.1 for aperiodic, irreducible and uniformly ergodic (with respect to the $\|\cdot\|_{wl}$ norm) Markov chains that can be constructed as iterated random functions.
7 Renewal theorems
In this section, we summarize the results from Fuh and Lai (2001) to state $d$-dimensional renewal theorems, with an estimate on the rate of convergence, for the Markov random walks induced by iterated random functions. Although the norm considered in Fuh and Lai (2001) is the weighted variation norm (5.1), the spectral theory from Section 5 can be used to generalize them to a general norm without any difficulty.
Let $\{(X_n, S_n),\ n\ge 0\}$ be the Markov random walk considered in Section 5. In the one-dimensional case, let $g: X\times\mathbb{R}\to\mathbb{R}$. The classical Markov renewal theorem states that, under certain regularity conditions,

$E_\eta\Big(\sum_{k=0}^\infty g(X_k, b - S_k)\Big) \to \int_X\int_{\mathbb{R}} g(x,s)\,ds\,d\pi(x)\Big/\int_X E_x\xi_1\,d\pi(x)$    (7.1)

as $b\to\infty$. In Theorem 7.4 we establish rates of convergence for (7.1), generalizing Stone's (1965) results in the i.i.d. case, while in Theorems 7.1-7.3 we extend the results to multidimensional Markov renewal theory (the case $d > 1$) with convergence rates, where the Markov random walks are induced by iterated random functions. Our approach uses the Fourier transform of the Markov transition operator and Schwartz's theory of distributions, developed in Section 5.
When the increments $\xi_n$ are i.i.d. and strongly nonlattice, Stone (1965) and Carlsson (1983) derived the rate of convergence of the renewal measure to its limit under moment conditions on $\xi_n$ in the case $d = 1$, while Carlsson and Wainger (1982) and Keener (1990) developed asymptotic expansions of the renewal measure in the case $d > 1$. To generalize these results to Markov random walks, we first recall the conditional Cramér condition (corresponding to conditionally strongly nonlattice random vectors) defined in (6.8): there exists $m\ge 1$ such that

$\limsup_{|\theta|\to\infty} |E\{\exp(i\theta\cdot S_m)\,|\,X_0, X_m\}| < 1.$    (7.2)

Here and in the sequel, we use column vectors to denote $\theta\in\mathbb{R}^d$, $\theta'$ to denote the transpose of $\theta$, and $|\theta|$ to denote its Euclidean norm $(\theta'\theta)^{1/2}$.
Let  = E


1
and V = lim
n!1
n
1
E

f(S
n
 n)(S
n
 n)
0
g,which are well dened
under C2 and C3.Let S
n;j
(or 
n;j
,
j
,
j
) denote the jth component of the d-dimensional
vector S
n
(or 
n
,,).Suppose 
1
> 0.Without loss of generality,it will be assumed that
V is positive denite (i.e.,
n
is strictly d-dimensional under ),because otherwise we can
consider a lower-dimensional subspace instead.In the case d > 1 dene
= E

f(
n;2
=
1
;  ;
n;d
=
1
)
0
g;
~
V = ( ;I
d1
)V


0
I
d1

;(7.3)
where I
k
is the k  k identity matrix.Note that
~
V is the asymptotic covariance matrix
(under P

) of f(S
n;2
;  ;S
n;d
)
0
S
n;1
g=
p
n.For s 2 R
d
,dene ~s = (s
2
;  ;s
d
)
0
s
1
.
First consider the case of i.i.d. $\xi_n$, with $d > 1$ and $S_0 = 0$. The renewal measure is defined by $U(B) = \sum_{n=0}^\infty P\{S_n\in B\}$, and multivariate renewal theory is concerned with approximating $U(s+\cdot)$ by $\Upsilon_k(s+\cdot)$ as $s_1\to\infty$, where $\Upsilon_k$ is a $\sigma$-finite measure on $\mathbb{R}^d$ whose density function (i.e., Radon-Nikodym derivative) with respect to Lebesgue measure is of the form

$\upsilon_k(s) = \frac{1}{\mu_1\sqrt{\det\tilde V}}\Big(\frac{\mu_1}{2\pi s_1}\Big)^{(d-1)/2} e^{-\mu_1\tilde s'\tilde V^{-1}\tilde s/2s_1}\Big\{1 + \sum_{j=1}^k s_1^{-j/2}\,\omega_j(\tilde s/\sqrt{s_1})\Big\}$    (7.4)

for $s_1 > 0$, and $\upsilon_k(s) = 0$ for $s_1\le 0$, where $\omega_j(u) = \sum_{l=0}^{n_j} q_l(u)$ and $q_l(u)$ is a polynomial of degree $l$ in $u$ whose coefficients are associated with the Taylor expansion of $(1 - Ee^{i\theta'\xi_1})^{-1}$ near $\theta = 0$. For Markov random walks, the renewal measure involves not only $\{S_n\}$ but also $\{X_n\}$. For $A\in\mathcal{A}$ and $B\in\mathcal{B}^d$, define

$U_\eta^A(B) = \sum_{n=0}^\infty P_\eta\{X_n\in A,\ S_n\in B\}.$    (7.5)
We can approximate $U_\eta^A(s+\cdot)$ by $\pi(A)\Upsilon_k^{A,\eta}(s+\cdot)$, in which $\Upsilon_k^{A,\eta}$ is a $\sigma$-finite measure on $\mathbb{R}^d$ with density function $\upsilon_k^{A,\eta}$ with respect to Lebesgue measure, where $\upsilon_k^{A,\eta}(s) = 0$ for $s_1\le 0$ and $\upsilon_k^{A,\eta}(s)$ is given by (7.4) for $s_1 > 0$, with the coefficients of the polynomials $\omega_1(\tilde s),\dots,\omega_k(\tilde s)$ depending also on $A$ and $\eta$ via the Taylor expansion of the Fourier transform of $U_\eta^A$ near the origin, assuming that C4 holds for some sufficiently large $r$ (depending on $k$). Note that when $\eta$ is degenerate at $(x,0)$, C4 follows from C2. The precise definition of $\omega_j$ is given in Section 4.1 of Fuh and Lai (2001), where they also proved the following multidimensional Markov renewal theorem with bounds on the remainders in approximating $U_\eta^A(s+\cdot)$ by $\pi(A)\Upsilon_k^{A,\eta}(s+\cdot)$ as $s_1\to\infty$, recalling the assumption $\mu_1 > 0$.
Theorem 7.1.Let k  1 and let f(M
n
;S
n
);n  0g be a strongly nonlattice Markov
random walk satisfying C1-C4 for some r > k +5 +maxf1;(d 1)=2g.Let A 2 A and B
be a d-dimensional rectangle
Q
d
j=1
[
j
;
j
].Then as s
1
!1,
U
A


s +

0
s
1


+B

= (A)
A;
k

s +

0
s
1


+B

+o(s
(d1+k)=2
1
)
uniformly in ~s.
Theorem 7.2. Let $\{(M_n, S_n),\ n\ge 0\}$ be a strongly nonlattice Markov random walk satisfying C1-C4 for some $r > 3$. Let $h > 0$ and $\delta > 0$. Let $\mathcal{B}_\delta$ be the class of all Borel subsets of $\mathbb{R}^{d-1}$ such that $\int_{(\partial B)^\epsilon}\exp(-|y|^2/2)\,dy = O(\epsilon^\delta)$ as $\epsilon\downarrow 0$, where $\partial B$ denotes the boundary of $B$ and $(\partial B)^\epsilon$ denotes its $\epsilon$-neighborhood. Then, as $s_1\to\infty$,

$U_\eta^A\big([s_1, s_1+h]\times\sqrt{s_1}(\Gamma\sqrt{s_1} + C)\big) = \pi(A)\,\Upsilon_1^{A,\eta}\big([s_1, s_1+h]\times\sqrt{s_1}(\Gamma\sqrt{s_1} + C)\big) + o(s_1^{-(1+\epsilon)/2})$

for every $\epsilon < \min(1, r-3)$, uniformly in $A\in\mathcal{A}$ and $C\in\mathcal{B}_\delta$.
For $\epsilon > 0$ and $f:\mathbb{R}^d\to\mathbb{R}$, define the oscillation function $\omega_f(s,\epsilon) = \sup\{|f(s)-f(t)| : |s-t|\le\epsilon\}$. Let $\mathcal{F}_b$ be the set of all Borel functions $f:\mathbb{R}^d\to[0,1]$ such that $f(s) = 0$ whenever $s_1\notin[b, b+h]$, with fixed $h > 0$.

Theorem 7.3. Let $\{(M_n, S_n),\ n\ge 0\}$ be a strongly nonlattice Markov random walk satisfying C1-C4 for some $r > 3$. Let $0 < \epsilon < \min(1, r-3)$. Then, for every $\delta > 0$, as $b\to\infty$,

$\int f(s)\,dU_\eta^A(s) = \pi(A)\int f(s)\,d\Upsilon_1^{A,\eta}(s) + O\Big(\int\omega_f(s, b^{-\delta})\,d\Upsilon_1^{A,\eta}(s)\Big) + o(b^{-(1+\epsilon)/2})$

uniformly in $f\in\mathcal{F}_b$ and $A\in\mathcal{A}$.
In the case $d = 1$, $V$ is a scalar, which will be denoted by $\sigma^2$. The following theorem provides bounds on the difference between $U_\eta^A([b, b+h])$ and its renewal-theoretic approximation as $b\to\infty$.

Theorem 7.4. Suppose $d = 1$ and $\{(M_n, S_n),\ n\ge 0\}$ is a strongly nonlattice Markov random walk satisfying C1-C4 for some $r\ge 2$. Then, as $b\to\infty$,

$U_\eta^A([b, b+h]) = \pi(A)h/\mu + o(b^{-(r-1)})$

uniformly in $A\in\mathcal{A}$.
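In the degenerate i.i.d. case ($\pi(A) = 1$, $d = 1$), the limit $h/\mu$ in Theorem 7.4 is simple to check by Monte Carlo. A minimal sketch with exponential increments, a hypothetical choice with $\mu = 2$:

```python
import numpy as np

rng = np.random.default_rng(7)
# i.i.d. positive increments: a degenerate one-dimensional Markov random walk
xi = rng.exponential(scale=2.0, size=(5000, 400))   # mu = E xi_1 = 2
S = xi.cumsum(axis=1)

b, h = 500.0, 1.0
# renewal measure U([b, b+h]): expected number of visits of (S_n) to [b, b+h]
U = ((S >= b) & (S <= b + h)).sum(axis=1).mean()
print(U, h / 2.0)    # renewal theorem: U([b, b+h]) -> h / mu
```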
Given an IRF $(M_n)_{n\ge 0}$ of i.i.d. Lipschitz maps, suppose (2.13) and (2.6) hold for some $p > 0$. Then $(M_n)_{n\ge 0}$ satisfies assumptions C1-C4 in Section 5. For $j = 1,\dots,d$, let $g_j\in L_0^2(\pi)$ be a square integrable function with mean 0, i.e.

$\int_X g_j\,d\pi = 0$ and $\|g_j\|_2^2 = \int_X g_j^2\,d\pi < \infty.$    (7.6)

Denote $g = (g_1,\dots,g_d)$ and consider the sequence

$S_n := S_n(g) := g(M_1) + \cdots + g(M_n),\qquad n\ge 1,$    (7.7)

which may be viewed as a Markov random walk with driving chain $(M_n)_{n\ge 0}$.

In order to apply the spectral theory to $(M_n)_{n\ge 0}$, we impose the following irreducibility condition. We shall say that the Markovian kernel $P(x,dy)$ is irreducible if the condition $P_\theta h = e^{i\beta}h$ ($\beta\in\mathbb{R}$ and $h\in\mathcal{N}$) implies that $e^{i\beta} = 1$ and $h$ is a constant. By using Theorem 6.3, we can then apply Theorems 7.1-7.4 to obtain the desired conclusions.
8 Examples and Applications
In the following, $F$ always denotes a generic copy of $F_1, F_2,\dots$ and $\lambda$ Lebesgue measure on $\mathbb{R}$ (or some subset). Examples 1, 2 and 4 are similar to those in Alsmeyer (2003), and Example 3 is taken from Fuh (2003). Examples 1 and 2 show irreducibility from the density point of view, while Example 3 proceeds from the positivity point of view.
Example 1. This is the motivating example in Diaconis and Freedman (1999), see 2.1 there. Let $X := [0,1]$, $g_u(x) := ux$, $h_u(x) := x + u(1-x)$ for $u \in [0,1]$ and
$$F(x) = Z\,g_U(x) + (1-Z)\,h_U(x) \eqno(8.1)$$
for independent random variables $U, Z$ with a uniform distribution on $[0,1]$ and a Bernoulli(1/2) distribution, respectively. It is not difficult to verify that $(M_n)_{n\ge0}$ satisfies the assumptions of Theorem 4.1 and has stationary distribution $\pi = \text{Beta}(1/2,1/2)$ with Lebesgue density $f(x) = \frac{1}{\pi\sqrt{x(1-x)}}$ on $(0,1)$, also called the arcsine distribution. (Plainly, $\pi$ in the denominator of $f$ means the constant $3.14\dots$) Now observe that $P(x,\cdot)$ is a mixture of a uniform distribution on $[0,x]$ and a uniform distribution on $[x,1]$. So it possesses a $\lambda$-continuous component for each $x \in [0,1]$. Theorem 4.1 therefore implies the Harris recurrence of $(M_n)_{n\ge0}$ on $H = X = [0,1]$. The conclusion remains true in the biased case where $Z$ has a Bernoulli($p$) distribution for some $p \ne 1/2$. The stationary distribution in this case is a Beta($p,q$) distribution with Lebesgue density $\frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, x^{p-1}(1-x)^{q-1}$ on $(0,1)$, where $q := 1-p$ and $\Gamma$ is the gamma function.
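A quick simulation makes the arcsine limit visible. The Python sketch below (burn-in and sample sizes are arbitrary choices of ours) iterates (8.1) forward and compares the empirical CDF with $F(t) = \frac{2}{\pi}\arcsin\sqrt{t}$, the CDF of Beta(1/2,1/2):

```python
import numpy as np

rng = np.random.default_rng(1)

def step(x):
    """One step of (8.1): x -> Ux with prob. 1/2, else x -> x + U(1 - x)."""
    u = rng.uniform()
    return u * x if rng.uniform() < 0.5 else x + u * (1.0 - x)

x = 0.3
for _ in range(1_000):              # burn-in towards stationarity
    x = step(x)
samples = np.empty(50_000)
for i in range(samples.size):
    x = step(x)
    samples[i] = x

# Compare the empirical CDF with the arcsine CDF F(t) = (2/pi) arcsin(sqrt(t)).
for t in (0.1, 0.25, 0.5, 0.9):
    print(t, np.mean(samples <= t), (2 / np.pi) * np.arcsin(np.sqrt(t)))
```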
The Beta-Walk is a generalization of (8.1), obtained by replacing the uniform variable $U$ in (8.1) by a Beta($\alpha,\alpha$) variable $V$, $\alpha \in [0,\infty]$. Here Beta$(0,0) := \frac12(\delta_0 + \delta_1)$ and Beta$(\infty,\infty) := \delta_{1/2}$, and (8.1) is recovered when $\alpha = 1$. As one can easily see with Theorem 4.1, $(M_n)_{n\ge0}$ is a positive Harris chain on $X = [0,1]$ for $\alpha \in (0,\infty]$, but is not for $\alpha = 0$. Diaconis and Freedman [5, Theorem 6.1] show that $\pi$ equals Beta$\big(\frac{\alpha}{\alpha+1}, \frac{\alpha}{\alpha+1}\big)$ for $\alpha \in \{0, 1, \infty\}$, but differs from it otherwise, although sharing the first three moments. Except for the case $\alpha = 0$, where $\pi = \frac12(\delta_0 + \delta_1)$, $\pi$ is furthermore absolutely continuous, hence int(supp $\pi$) is nonempty. Since $X$ is compact, condition (2.6) with $x_0 = 0$ holds for every $p > 0$, whence Theorem 4.3 implies geometric ergodicity of the chain for every $\alpha \in (0,\infty)$. If $\alpha = 0$, the same conclusion follows by observing that, starting from any $x \in [0,1]$, it takes a geometric time to enter the absorbing closed Harris set supp $\pi = \{0,1\}$ and that Theorem 4.3 gives geometric ergodicity on that set.
Example 2. Let us next take a look at matrix recursions, which have been studied by many authors; see 2.2 in Diaconis and Freedman (1999) and the references given there. The defining equation is
$$M_n = A_n M_{n-1} + B_n, \quad n \ge 1 \eqno(8.2)$$
on $X = \mathbb{R}^m$ for some $m \ge 1$, where $(A_1, B_1), (A_2, B_2), \dots$ are i.i.d., $A_n$ is an $m \times m$ matrix and $B_n$ an $m \times 1$ vector. So the associated random Lipschitz map is
$$F(x) = Ax + B \eqno(8.3)$$
with $(A, B)$ being a generic copy of $(A_1, B_1)$. Let $\|\cdot\|$ be any norm on $\mathbb{R}^m$, define $\|A\| := \sup\{\|Ax\| : x \in \mathbb{R}^m, \|x\| \le 1\}$ for $m \times m$ matrices $A$, and suppose that
$$E\log^+\|A\| < \infty \quad\text{and}\quad E\log^+\|B\| < \infty.$$
Suppose further an a.s. negative Liapunov exponent $l^*$, here given by
$$l^* = \inf\{n^{-1}E\log\|A_1 \cdots A_n\| : n \ge 1\}.$$
Then the conditions of Theorem 2.1 are satisfied (with $x_0 = 0$) whence, by Theorem 2.1, $M_n$ possesses a unique stationary distribution $\pi$, which is the distribution of any solution $M_\infty$ of the stochastic fixed point equation $M_\infty \stackrel{d}{=} AM_\infty + B$, where $(A,B)$ and $M_\infty$ are independent. As one can easily see, we may take
$$M_\infty = \sum_{n\ge1}\Big(\prod_{k=1}^{n-1} A_k\Big) B_n. \eqno(8.4)$$
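For illustration, here is a minimal Python sketch of this construction; the particular $(A,B)$ distribution is a hypothetical choice of ours that is easily contracting on average. It estimates the Liapunov exponent from $n^{-1}\log\|A_1\cdots A_n\|$ and draws an approximate sample of $M_\infty$ by truncating the series (8.4):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 2  # dimension

def draw_AB():
    """Illustrative (A, B): a random matrix that contracts on average, and a vector."""
    A = 0.4 * rng.standard_normal((m, m))
    B = rng.standard_normal(m)
    return A, B

# Crude Liapunov exponent estimate: (1/n) log ||A_1 ... A_n|| for one large n.
n = 2_000
P = np.eye(m)
log_norm = 0.0
for _ in range(n):
    A, _ = draw_AB()
    P = A @ P
    s = np.linalg.norm(P, 2)     # spectral norm; renormalize to avoid underflow
    log_norm += np.log(s)
    P /= s
print("Liapunov exponent estimate:", log_norm / n)   # negative => contraction

# Truncated backward series (8.4): M_inf ~ sum_{n>=1} (A_1 ... A_{n-1}) B_n
def sample_M_inf(n_terms=200):
    total, prod = np.zeros(m), np.eye(m)
    for _ in range(n_terms):
        A, B = draw_AB()
        total += prod @ B
        prod = prod @ A
    return total

print("approximate draw from the stationary law:", sample_M_inf())
```

Renormalizing the running product at each step avoids floating-point underflow while still accumulating $\log\|A_1\cdots A_n\|$ exactly.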
If we now additionally assume that $(A,B)$ is nonsingular with respect to $\lambda^{m\times m} \otimes \lambda^m$, then all $P(x,\cdot)$, $x \in \mathbb{R}^m$, are evidently nonsingular with respect to $\lambda^m$, and Theorem 4.1 shows the positive Harris recurrence of $(M_n)_{n\ge0}$ on the whole of $X = \mathbb{R}^m$. The same conclusion holds true provided that $A$ and $B$ are independent and $B$ is nonsingular with respect to $\lambda^m$.
Next, consider the matrix recursion (8.2) with a.s. negative Liapunov exponent and satisfying
(k1) $E\log^+\|A\| < \infty$ and $E\log^+\|B\| < \infty$.
We recall that its positive Harris recurrence on the whole of $X = \mathbb{R}^m$ follows if, in addition,
(k2) $(A_1, B_1)$ is nonsingular with respect to $\lambda^{m\times m} \otimes \lambda^m$,
or
(k2') $A_1$, $B_1$ are independent and $B_1$ is nonsingular with respect to $\lambda^m$
holds true. Given any $p > 0$, it is then immediate to conclude the assertion of Theorem 4.2(a) and of 4.2(b) with $w(x) = \|x\|^p$, provided additionally
(k3) $E\log^{p+1}(1 + \|A_1\|) < \infty$ and $E\log^{p+1}(1 + \|B_1\|) < \infty$,
respectively
(k4) $E\|A_1\|^p < \infty$ and $E\|B_1\|^p < \infty$.
Example 3. In this example, we consider a statistical inference problem for hidden Markov models. A hidden Markov model is defined as a parameterized Markov chain in a Markovian random environment (cf. Cogburn, 1980), with the underlying environmental Markov chain viewed as missing data. That is, for each $\theta \in \Theta \subseteq \mathbb{R}^q$, the unknown parameter, we consider $X = \{X_n; n \ge 0\}$ as an ergodic (positive recurrent, irreducible and aperiodic) Markov chain on a finite state space $D = \{1,2,\dots,d\}$, with transition probability matrix $P(\theta) = [p_{xy}(\theta)]_{x,y=1,\dots,d}$ and stationary distribution $\pi(\theta) = (\pi_x(\theta))_{x=1,\dots,d}$. Suppose that an additive component $\zeta_n = \sum_{k=0}^{n}\xi_k$, taking values in $\mathbb{R}$, is adjoined to the chain such that $\{(X_n, \zeta_n); n \ge 0\}$ is a Markov chain on $D \times \mathbb{R}$ and, conditionally on the full $X$ sequence, $\{\xi_n\}$ is a Markov chain with transition probability
$$P^{(\theta)}\{\xi_{n+1} \in B \mid X_0, X_1, \dots; \xi_0, \xi_1, \dots, \xi_n\} = P^{(\theta)}(X_{n+1} : \xi_n, B) \quad a.s. \eqno(8.5)$$
for each $n$ and $B \in \mathcal{B}(\mathbb{R})$, the Borel $\sigma$-algebra of $\mathbb{R}$. Furthermore, we assume the existence of a transition probability density for the Markov chain $\{(X_n, \xi_n); n \ge 0\}$ with respect to a $\sigma$-finite measure $\mu$ on $\mathbb{R}$ such that
$$P^{(\theta)}\{X_1 \in A, \xi_1 \in B \mid X_0 = x, \xi_0 = s_0\} = \sum_{y\in A}\int_B p_{xy}(\theta)\, f(s; \varphi_y(\theta) \mid s_0)\,d\mu(s), \eqno(8.6)$$
where $f(\xi_k; \varphi_{X_k}(\theta) \mid \xi_{k-1})$ is the transition probability density of $\xi_k$ given $\xi_{k-1}$ and $X_k$, with respect to $\mu$, $\theta \in \Theta$ is the unknown parameter, and $\varphi_y(\theta)$ is a function defined on the parameter space $\Theta$ for each $y = 1,\dots,d$. Here and in the sequel, we assume that the Markov chain $\{(X_n, \xi_n); n \ge 0\}$ has stationary probability with density $\pi_x(\theta) f(\cdot; \varphi_x(\theta))$ with respect to $\mu$. In this example, we assume that only one parameter is of interest and treat the other parameters as nuisance parameters. That is, for simplicity, we consider $\theta \in \Theta \subseteq \mathbb{R}$ as a one-dimensional unknown parameter. For convenience of notation, we will write $\pi_x$ for $\pi_x(\theta)$ and $p_{xy}$ for $p_{xy}(\theta)$ in the sequel. We give a formal definition of a hidden Markov model as follows:

Definition 2 A process $\{\xi_n; n \ge 0\}$ is called a hidden Markov model if there is a Markov chain $\{X_n; n \ge 0\}$ such that the process $\{(X_n, \xi_n); n \ge 0\}$ satisfies (8.5) and (8.6).

Note that if the $\xi_n$ are conditionally independent given the full sequence $X$, then the Markov chain $\{(X_n, \zeta_n); n \ge 0\}$ is called a Markov random walk, and $\{\xi_n; n \ge 0\}$ is the classical hidden Markov model.
Now, let $\xi_0, \xi_1, \dots, \xi_n$ be the observations from the hidden Markov model $\{\xi_n; n \ge 0\}$ with an unknown parameter $\theta$. Let
$$S_n := \frac{p_n(\xi_0, \xi_1, \dots, \xi_n; \theta_1)}{p_n(\xi_0, \xi_1, \dots, \xi_n; \theta_0)} \eqno(8.7)$$
$$:= \frac{\sum_{x_0=1}^{d}\cdots\sum_{x_n=1}^{d} \pi_{x_0}(\theta_1)\, f(\xi_0; \varphi_{x_0}(\theta_1)) \prod_{k=1}^{n} p_{x_{k-1}x_k}(\theta_1)\, f(\xi_k; \varphi_{x_k}(\theta_1) \mid \xi_{k-1})}{\sum_{x_0=1}^{d}\cdots\sum_{x_n=1}^{d} \pi_{x_0}(\theta_0)\, f(\xi_0; \varphi_{x_0}(\theta_0)) \prod_{k=1}^{n} p_{x_{k-1}x_k}(\theta_0)\, f(\xi_k; \varphi_{x_k}(\theta_0) \mid \xi_{k-1})}$$
for fixed $\theta_0, \theta_1 \in \Theta$.
Let $\theta_0 \in \Theta^0$ (the interior of $\Theta$) and consider the problem of testing the hypothesis $\theta \le \theta_0$. Given $\theta_1 > \theta_0$, we can construct a sequential probability ratio test of $\theta = \theta_0$ versus $\theta = \theta_1$ and use it to test the composite hypothesis $\theta \le \theta_0$. The sequential probability ratio test of $\theta = \theta_0$ versus $\theta = \theta_1$ stops sampling at stage
$$T := \inf\{n : \log S_n \le a \ \text{or} \ \log S_n \ge b\} \eqno(8.8)$$
for $a \le 0 < b$, and accepts the null hypothesis $\theta = \theta_0$ (or the alternative hypothesis $\theta = \theta_1$) according as $\log S_T \le a$ (or $\log S_T \ge b$). When it is regarded as a test of $\theta \le \theta_0$, the SPRT rejects $\theta \le \theta_0$ if and only if $\log S_T \ge b$. The problem of interest here is to approximate the type I error $\alpha = P^{(\theta_0)}\{\log S_T \ge b\}$, the type II error $\beta = P^{(\theta_1)}\{\log S_T \le a\}$ and the expected sample sizes $E^{(\theta_0)}T$ ($E^{(\theta_1)}T$) of the test, where $P^{(\theta)}$ ($E^{(\theta)}$) refers to the probability (expectation) whose initial distribution is the stationary distribution with density $\pi_x(\theta) f(\cdot; \varphi_x(\theta))$.
To analyze the likelihood ratio (8.7), we have the following likelihood representation via products of random matrices. Given a column vector $u = (u_1, \dots, u_d)^t \in \mathbb{R}^d$, where $t$ denotes the transpose of the underlying vector in $\mathbb{R}^d$, define the $L_1$-norm of $u$ as $\|u\| = \sum_{i=1}^{d} |u_i|$. The likelihood ratio (8.7) can then be represented as
$$S_n = \frac{p_n(\xi_0, \dots, \xi_n; \theta_1)}{p_n(\xi_0, \dots, \xi_n; \theta_0)} = \frac{\|M_n(\theta_1)\cdots M_1(\theta_1)M_0(\theta_1)\pi(\theta_1)\|}{\|M_n(\theta_0)\cdots M_1(\theta_0)M_0(\theta_0)\pi(\theta_0)\|}, \eqno(8.9)$$
where
$$M_0 = M_0(\theta) = \begin{bmatrix} f(\xi_0; \varphi_1(\theta)) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & f(\xi_0; \varphi_d(\theta)) \end{bmatrix}, \eqno(8.10)$$
$$M_k = M_k(\theta) = \begin{bmatrix} p_{11}(\theta) f(\xi_k; \varphi_1(\theta)\mid\xi_{k-1}) & \cdots & p_{d1}(\theta) f(\xi_k; \varphi_1(\theta)\mid\xi_{k-1}) \\ \vdots & \ddots & \vdots \\ p_{1d}(\theta) f(\xi_k; \varphi_d(\theta)\mid\xi_{k-1}) & \cdots & p_{dd}(\theta) f(\xi_k; \varphi_d(\theta)\mid\xi_{k-1}) \end{bmatrix} \eqno(8.11)$$
for $k = 1, \dots, n$, and
$$\pi(\theta) = \big(\pi_1(\theta), \dots, \pi_d(\theta)\big)^t. \eqno(8.12)$$
Note that each component $p_{xy}\, f(\xi_k; \varphi_y(\theta) \mid \xi_{k-1})$ in $M_k$ corresponds to $X_{k-1} = x$ and $X_k = y$, and $\{\xi_k\}$ is a Markov chain with transition probability density $f(\xi_k; \varphi_y(\theta) \mid \xi_{k-1})$ for $k = 1,\dots,n$; therefore the $M_k$ are random matrices. Since $\{(X_n, \xi_n); n \ge 0\}$ is a Markov chain by definition (8.5) and (8.6), it follows that $\{M_k; k = 1,\dots,n\}$ is a sequence of Markov random matrices. Hence, via the representation (8.9), $S_n$ is a ratio of $L_1$-norms of products of Markov random matrices. Note that $\theta$ is fixed in (8.9).

Note that we consider i.i.d. random Lipschitz maps in this paper. Although this example exhibits a finite-dimensional (random matrix) linear iteration driven by a Markov chain, the results developed in our setting can still be applied. The irreducibility property comes from the positivity of the densities defined in (8.10) and (8.11). The reader is referred to Fuh (2003) for formal definitions and details.
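The representation (8.9)-(8.12) translates directly into code. The Python sketch below implements the matrix-product likelihood and the stopping rule (8.8) for a hypothetical two-state Gaussian model with conditionally independent observations (so $f(\cdot \mid \xi_{k-1})$ does not depend on $\xi_{k-1}$); the transition matrix, the means $\varphi_y(\theta) = \theta + (y-1)$ and the thresholds are all our own illustrative choices, not the paper's:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 2-state Gaussian HMM: state y emits N(phi_y(theta), 1) with
# phi_y(theta) = theta + (y - 1); P is taken free of theta here.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # p_xy
pi0 = np.array([2 / 3, 1 / 3])        # stationary law of P

def log_lik(obs, theta):
    """log p_n via the matrix products (8.9)-(8.12), normalized to avoid underflow."""
    means = theta + np.array([0.0, 1.0])        # phi_y(theta), illustrative
    v = pi0 * norm.pdf(obs[0], loc=means)       # M_0(theta) @ pi(theta)
    out = np.log(v.sum()); v /= v.sum()
    for xi in obs[1:]:
        v = (norm.pdf(xi, loc=means)[:, None] * P.T) @ v   # apply M_k of (8.11)
        out += np.log(v.sum()); v /= v.sum()
    return out

def sprt(obs, a=-3.0, b=3.0, theta0=0.0, theta1=0.5):
    """Stopping rule (8.8): stop once log S_n leaves (a, b)."""
    for n in range(1, len(obs) + 1):
        log_Sn = log_lik(obs[:n], theta1) - log_lik(obs[:n], theta0)
        if log_Sn <= a or log_Sn >= b:
            return n, log_Sn
    return len(obs), log_Sn

obs = np.random.default_rng(5).normal(0.7, 1.0, size=400)  # placeholder data
print(sprt(obs))   # (stopping stage T of (8.8), log S_T)
```

Renormalizing $v$ after every matrix multiplication keeps the recursion numerically stable; only the accumulated log-norm is needed for $\log S_n$. (Recomputing the likelihood from scratch at each stage keeps the sketch short; an incremental update of both state vectors is the obvious refinement.)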
Example 4. Let us now look at an example, in fact a one-dimensional special case of Example 2 ($A = (a)$ and $B_n = \epsilon_n$), with a negative answer as to Harris recurrence. Put $f_0(x) := ax - 1$, $f_1(x) := ax + 1$ for $x \in \mathbb{R}$ and some $a \in (0,1)$, and consider
$$F(x) = f_\eta(x) \eqno(8.13)$$
where $\eta$ is 0 or 1 with probability 1/2 each. The associated IRF $(M_n)_{n\ge0}$ with state space $X = \mathbb{R}$ thus satisfies the recursive equation
$$M_n = aM_{n-1} + \epsilon_n, \quad n \ge 1 \eqno(8.14)$$
where $\epsilon_1, \epsilon_2, \dots$ are i.i.d. with $P(\epsilon_n = 1) = P(\epsilon_n = -1) = 1/2$. Its unique stationary distribution $\pi$ is the distribution of the infinite series $\sum_{n\ge1} a^{n-1}\epsilon_n$. It is known that $\pi$ is continuous for every $a \in (0,1)$, singular for $a \in (0,1/2) \cup N$, where $N \subseteq (1/2,1)$ is a nonempty $\lambda$-null set, and absolutely continuous otherwise. If $a = 1/2$, $\pi$ is the uniform distribution on $[-2,2]$.
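The stationary law here is the classical Bernoulli convolution with parameter $a$, and a few lines of Python (truncation depth and sample size are our own choices) let one inspect it; for $a = 1/2$ the histogram is flat on $[-2,2]$:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_pi(a, n_draws=100_000, depth=60):
    """Draw from the law of sum_{n>=1} a^(n-1) eps_n, eps_n = +-1 w.p. 1/2,
    truncating the series after `depth` terms (error <= a^depth / (1 - a))."""
    eps = rng.choice([-1.0, 1.0], size=(n_draws, depth))
    weights = a ** np.arange(depth)
    return eps @ weights

x = sample_pi(0.5)
print(x.min(), x.max())                            # close to -2 and 2
print(np.histogram(x, bins=8, range=(-2, 2))[0])   # roughly flat for a = 1/2
```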
We claim that $(M_n)_{n\ge0}$ is never Harris recurrent. If it were, by Theorem 2.1 of Alsmeyer (2003), we could find a $\pi$-positive set $X_0$, necessarily uncountable because $\pi$ is continuous, such that the kernels
$$P(x,\cdot) = \tfrac{1}{2}\,\delta_{ax+1} + \tfrac{1}{2}\,\delta_{ax-1}, \quad x \in X_0,$$
were dominated by some $\sigma$-finite measure. By a well-known result of Halmos and Savage (1949), we could then find a countable subset $X_1$ of $X_0$ such that $(P(x,\cdot))_{x\in X_0}$ and $(P(x,\cdot))_{x\in X_1}$ were equivalent, that is, $P(x, N) = 0$ for all $x \in X_0$ iff $P(x, N) = 0$ for all $x \in X_1$. On the other hand, given any countable $X_1 = \{x_n; n \ge 1\}$, the set of $x$ such that $P(x,\cdot)$ is nonsingular with respect to some $P(x_n,\cdot)$ is easily identified as $X_1 \cup \{x \in X : x = x_n \pm \frac{2}{a} \text{ for some } n\}$, which is again countable. Consequently, the uncountable $X_0$ contains elements $x$ such that $P(x,\cdot)$ is orthogonal to each $P(x_n,\cdot)$, a contradiction to the equivalence of $(P(x,\cdot))_{x\in X_0}$ and $(P(x,\cdot))_{x\in X_1}$.
References
[1] Alsmeyer, G. (1990). Convergence rates in the law of large numbers for martingales. Stoch. Proc. Appl. 36, 181-194.
[2] Alsmeyer, G. (2003). On the Harris recurrence of iterated random Lipschitz functions and related convergence rate results. To appear in J. Theoretical Probab.
[3] Alsmeyer, G. and Fuh, C. D. (2001). Limit theorems for iterated random functions by regenerative methods. Stoch. Proc. Appl. 96, 123-142. Corrigendum (2002), 97, 341-345.
[4] Benda, M. (1998). A central limit theorem for contractive stochastic dynamical systems. J. Appl. Prob. 35, 200-205.
[5] Bhattacharya, R. N. and Ranga Rao, R. (1976). Normal Approximation and Asymptotic Expansions. Krieger, Malabar, FL, 1986 (revised reprint).
[6] Carlsson, H. (1983). Remainder term estimates of the renewal function. Ann. Probab. 11, 143-157.
[7] Carlsson, H. and Wainger, S. (1982). An asymptotic series expansion of the multidimensional renewal measure. Comp. Math. 47, 355-364.
[8] Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Review 41, 45-76.
[9] Duflo, M. (1997). Random Iterative Models. Springer-Verlag, New York.
[10] Elton, J. H. (1990). A multiplicative ergodic theorem for Lipschitz maps. Stoch. Proc. Appl. 34, 39-47.
[11] Fuh, C. D. (2003). SPRT and CUSUM in hidden Markov models. To appear in Ann. Statist. 31.
[12] Fuh, C. D. and Lai, T. L. (2001). Asymptotic expansions in multidimensional Markov renewal theory and first passage times for Markov random walks. Adv. Appl. Prob. 33, 652-673.
[13] Fuh, C. D. and Lai, T. L. (2003). Characteristic function and Edgeworth expansions for Markov random walks with applications to bootstrap methods. Working paper.
[14] Fuh, C. D. and Zhang, C. H. (2000). Poisson equation, moment inequalities and quick convergence for Markov random walks. Stoch. Proc. Appl. 87, 53-67.
[15] Halmos, P. and Savage, L. J. (1949). Application of the Radon-Nikodym theorem to the theory of sufficient statistics. Ann. Math. Statist. 20, 225-241.
[16] Hipp, C. (1985). Asymptotic expansions in the central limit theorem for compound and Markov processes. Z. Wahrsch. Verw. Gebiete 69, 361-385.
[17] Jensen, J. L. (1987). A note on asymptotic expansions for Markov chains using operator theory. Adv. Appl. Math. 8, 377-392.
[18] Jensen, J. L. (1989). Asymptotic expansions for strongly mixing Harris recurrent Markov chains. Scand. J. Statist. 16, 47-63.
[19] Kartashov, N. V. (1996). Strong Stable Markov Chains. VSP, Utrecht.
[20] Keener, R. (1990). Asymptotic expansions in multivariate renewal theory. Stoch. Proc. Appl. 34, 137-143.
[21] Malinovskii, V. K. (1987). Limit theorems for Harris-Markov chains, I. Theory Probab. Appl. 31, 269-285.
[22] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, New York.
[23] Nagaev, S. V. (1957). Some limit theorems for stationary Markov chains. Theory Probab. Appl. 2, 378-406.
[24] Ney, P. and Nummelin, E. (1987). Markov additive processes I. Eigenvalue properties and limit theorems. Ann. Probab. 15, 561-592.
[25] Riesz, F. and Sz.-Nagy, B. (1955). Functional Analysis. Ungar, New York.
[26] Stone, C. (1965). On characteristic functions and renewal theory. Trans. Amer. Math. Soc. 120, 327-342.
[27] Strassen, V. (1967). Almost sure behavior of sums of independent random variables and martingales. Proc. Fifth Berkeley Symp. Math. Statist. and Probability, 315-343.
[28] Wu, W. B. and Woodroofe, M. (2000). A central limit theorem for iterated random functions. J. Appl. Prob. 37, 748-755.