# Slides for lecture 10


ECTA [U06982]

Guy Judge November 2008

Econometric Analysis

Week 10

Pooled and panel data models


Lecture outline

Basics

data sets with both cross-section and time dimensions

Pooled regressions and the use of time period dummies

A note on Chow tests versus interactive dummies in the
presence of heteroskedasticity

Panel data and longitudinal data sets

Attractive features of panel data sets

Fixed effects models

differencing and demeaning

Random effects models

An illustrative example


Wooldridge, J M (2006) Introductory Econometrics: A Modern Approach (Third Edition), Chapters 13 and 14

Kennedy, P (2003) A Guide to Econometrics (Fifth Edition), Chapter 17

Dougherty, C (2007) Introduction to Econometrics (Third Edition), Chapter 14

Gujarati, D N (2003) Basic Econometrics (Fourth Edition), Chapter 16


Basics

many data sets have both a cross-section and a time dimension

two subscripts are required for the variables: $x_{it}$, with $i = 1, \ldots, n$ and $t = 1, \ldots, T$

If n is large but T is quite small (say only 2 or 3) then we may decide just to apply cross-section methods, but with intercept (and possibly even slope) dummies to distinguish observations from different time periods

or we might be able just to pool the data.


Pooled regressions with time period dummies

If the time period dummies are insignificant then the data from the different periods can be brought together to form one pooled data set

We might also think of using Chow tests to check the validity of pooling the data (effectively a test of structural change)

For example

$$y_{it} = \beta_0 + \beta_1 x_{it} + u_{it}, \qquad i = 1, \ldots, n; \; t = 1, 2$$

If the data are pooled then we are effectively incorporating the restriction that $\beta_0$ and $\beta_1$ remain unchanged between periods 1 and 2


More on pooled models with time dummies

Including a standard dummy variable just allows the intercept to change

$$y_{it} = \beta_0 + \delta_0 D_{it} + \beta_1 x_{it} + u_{it}$$

$$D_{it} = 0 \text{ for } t = 1; \qquad D_{it} = 1 \text{ for } t = 2$$

The intercept in period 2 = $\beta_0 + \delta_0$


Chow tests

A Chow test looks for evidence of changes in both intercept and slope parameters

Effectively

$$y_{it} = \beta_0^{[1]} + \beta_1^{[1]} x_{it} + u_{it} \quad \text{for } t = 1, \; i = 1, \ldots, n$$

$$y_{it} = \beta_0^{[2]} + \beta_1^{[2]} x_{it} + u_{it} \quad \text{for } t = 2, \; i = 1, \ldots, n$$

Chow tests the restrictions

$$\beta_0^{[1]} = \beta_0^{[2]}; \qquad \beta_1^{[1]} = \beta_1^{[2]}$$

The pooled data set imposes these restrictions
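A Chow test of these restrictions can be sketched in a few lines. This is a minimal illustration on simulated data, not part of the original slides: the helper names `ols_simple` and `chow_f`, the sample size and the parameter values are all assumptions chosen for the example. The statistic compares the residual sum of squares from the pooled regression with the sum from the two separate period regressions.

```python
import random

def ols_simple(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, residual sum of squares)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, rss

def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for equal intercept and slope in two periods."""
    rss_u = ols_simple(x1, y1)[2] + ols_simple(x2, y2)[2]   # unrestricted
    rss_r = ols_simple(x1 + x2, y1 + y2)[2]                 # pooled (restricted)
    df = len(x1) + len(x2) - 2 * k
    return ((rss_r - rss_u) / k) / (rss_u / df)

# Simulated data (hypothetical values, for illustration only)
random.seed(0)
n = 50
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.uniform(0, 10) for _ in range(n)]
y1 = [1.0 + 2.0 * x + random.gauss(0, 1) for x in x1]
y2_same = [1.0 + 2.0 * x + random.gauss(0, 1) for x in x2]    # no change
y2_shift = [4.0 + 3.0 * x + random.gauss(0, 1) for x in x2]   # both change

f_same = chow_f(x1, y1, x2, y2_same)
f_shift = chow_f(x1, y1, x2, y2_shift)
print("F with stable parameters :", round(f_same, 3))
print("F with shifted parameters:", round(f_shift, 3))
```

Under the null the statistic is compared with an F(k, 2n - 2k) critical value; a large value, as in the shifted case, argues against pooling.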


Interactive slope dummies

The same test could be undertaken using interactive slope dummies in a pooled regression

$$y_{it} = \beta_0 + \delta_0 D_{it} + \beta_1 x_{it} + \delta_1 D_{it} x_{it} + u_{it}$$

Intercept in period 2 = $\beta_0 + \delta_0$

Slope in period 2 = $\beta_1 + \delta_1$

Note: The Chow test requires that the model is free of heteroskedasticity, which may not always be the case. The dummy variable approach could in this case be combined with the use of heteroskedasticity-consistent standard errors for computing t values to assess the significance of $\delta_0$ and $\delta_1$
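The dummy-variable version of the test can be sketched as follows. This is an illustration on simulated data, assuming NumPy is available; the parameter values and variable names are hypothetical. It fits the pooled regression with an intercept dummy and an interactive slope dummy, computes White (HC0) standard errors, and checks that the implied period 2 intercept and slope match a separate period 2 regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                    # units per period (assumed)
x = rng.uniform(0, 10, size=2 * n)
D = np.repeat([0.0, 1.0], n)               # 0 in period 1, 1 in period 2
# Error spread grows with x, so the errors are heteroskedastic
u = rng.normal(0, 1, size=2 * n) * (0.5 + 0.3 * x)
y = 1.0 + 0.5 * D + 2.0 * x + 1.0 * D * x + u

# Columns: constant, D, x, D*x -> coefficients beta0, delta0, beta1, delta1
X = np.column_stack([np.ones(2 * n), D, x, D * x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b

# White (HC0) covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
t_robust = b / se_hc0

# Separate OLS on period 2 only, for comparison
X2 = X[D == 1][:, [0, 2]]
b2 = np.linalg.lstsq(X2, y[D == 1], rcond=None)[0]

print("period 2 intercept:", b[0] + b[1], "vs", b2[0])
print("period 2 slope    :", b[2] + b[3], "vs", b2[1])
print("robust t for delta0, delta1:", t_robust[1], t_robust[3])
```

Because the dummy model is fully interacted, its point estimates reproduce the separate regressions exactly; what the dummy form adds is a single regression in which robust t values for the two dummy coefficients can be read off directly.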


Longitudinal data sets

If T is large, as with so-called longitudinal data sets, then we may need to consider the nature of any trends in the variables.

Unit root tests for mixed cross-section time series data have been developed, but we shall not consider them here. We will mention, though, that simpler models may just incorporate lagged dependent variables in the regression equations.

So, without concerning ourselves about unit roots, we will just examine two popular types of panel data models: fixed effects and random effects models.


More on longitudinal data sets

If the observations in the different time periods relate to exactly the same subjects then we may refer to the data set as panel data. For example, the famous US National Longitudinal Survey of Youth tracks the same individuals over several years. Another well known US panel data set is the Panel Study of Income Dynamics (PSID).

Note: sometimes this might mean that we have to deal with unbalanced panels when some individuals disappear from the panel sample.

The Current Population Survey (CPS) on the other hand extracts a different random sample each year so the cross-section data are not matched to the same individuals. But because they are independently drawn the data can be pooled with the possible addition of time dummies.


Attractive features of panel data sets

Panel data can enable us to examine issues not amenable to study using only cross-section or time series data sets.

For example with production functions we can deal simultaneously with issues of economies of scale and of technological change.

Cross-section labour market data can tell us who is unemployed in any particular year and time series data can tell us how overall unemployment changes from year to year. Panel data enables us to track individuals' unemployment duration, turnover rates etc.

Panel data can help us deal with issues of heterogeneity in the micro units. Unobserved factors affecting different people can cause bias in cross-section studies, but with panel data differencing or demeaning can control for these factors.

We can introduce some allowance for dynamics in models by including lagged variables.


Analysis of the CRIME2 data set in Wooldridge

CRIME2.xls contains data on (amongst other variables) the crime rate (crmrte) and the unemployment rate (unem) for 46 US cities in the years 1982 and 1987

Question: Would you expect cities with a higher unemployment rate to have a higher or lower crime rate?

So if, like Wooldridge p460, you used just the 1987 data and ran a regression of crmrte on unem would you expect the slope coefficient $\beta_1$ to be positive or negative?


Replicating Wooldridge’s results on p460

EQ( 1) Modelling crmrte by OLS (using crime2sorted.xls)

The estimation sample is: 47 - 92

| | Coefficient | Std.Error | t-value | t-prob | Part.R^2 |
|---|---|---|---|---|---|
| Constant | 128.378 | 20.76 | 6.18 | 0.000 | 0.4651 |
| unem | -4.16113 | 3.416 | -1.22 | 0.230 | 0.0326 |

R^2 0.0326151   F(1,44) = 1.483 [0.230]

log-likelihood -227.266   DW 1.11

no. of observations 46   no. of parameters 2

mean(crmrte) 103.873   var(crmrte) 1183.71

Comment: Not only does the coefficient of unem have the “wrong” sign, it is also not significantly different from zero (see the very low t, F and R^2 values).

Note: the figures in brackets on p460 of Wooldridge are standard errors, not t-values.


Wooldridge notes that this simple model is deficient in many ways. To improve it we might

introduce additional explanatory variables, including demographic factors such as the age distribution in each city, gender balance, educational data, law enforcement efforts

try a different functional form

include a lagged dependent variable (the crime rate in the earlier year)

But Wooldridge wants to use this model to demonstrate how, with two periods of data, we can control for individual unobserved fixed effects and thus remove this potential form of bias.


Dealing with individual heterogeneity

Suppose we have an individual unobserved factor, or set of factors, that affects crime in different cities, but remains unchanged (fixed) between the different time periods. Denoting this by $a_i$, following Wooldridge, we can write

$$\mathit{crmrte}_{it} = \beta_0 + \delta_0\, \mathit{d87}_{it} + \beta_1\, \mathit{unem}_{it} + a_i + u_{it}$$

Here $a_i$ picks up all those factors unique to the individual cities that don’t change, or don’t change much, between periods, including the demographic factors noted on the previous slide. Wooldridge calls this the unobserved heterogeneity error term.

The usual error term $u_{it}$ picks up all other factors that disturb the observed values of the dependent variable both across the different cities and between the years; Wooldridge calls this the idiosyncratic error term.

You can see that this model also includes a dummy variable (d87) to allow for intercept shifts (common to all cities) between periods.


Replicating Wooldridge’s results p462

If we were to ignore the heterogeneous effects and just pool the data (but including the time period dummy) we would find

EQ( 2) Modelling crmrte by OLS (using crime2.in7)

The estimation sample is: 1 - 92

| | Coefficient | Std.Error | t-value | t-prob | Part.R^2 |
|---|---|---|---|---|---|
| Constant | 93.4203 | 12.74 | 7.33 | 0.000 | 0.3766 |
| d87 | 7.94041 | 7.975 | 0.996 | 0.322 | 0.0110 |
| unem | 0.426546 | 1.188 | 0.359 | 0.720 | 0.0014 |

R^2 0.0122119   F(2,89) = 0.5501 [0.579]

log-likelihood -441.902   DW 1.16

no. of observations 92   no. of parameters 3

mean(crmrte) 100.791   var(crmrte) 880.929

The coefficient on unem, although positive, is still not significant.

Here pooled OLS has not solved the omitted variables problem.


A first-differenced model

Because $a_i$ is constant over time we can remove its effect by first-differencing the equation. Notice that the initial intercept term $\beta_0$ gets removed too, leaving us with

$$\mathit{ccrmrte}_{it} = \delta_0 + \beta_1\, \mathit{cunem}_{it} + \Delta u_{it}$$

Here I am using Wooldridge’s labels ccrmrte and cunem respectively for $\Delta \mathit{crmrte}$ and $\Delta \mathit{unem}$.

Wooldridge’s results for this regression are replicated in PcGive and shown on the next slide.
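To see why differencing helps, the following sketch simulates a two-period panel in which the unobserved effect is correlated with the regressor. It is an illustration only: the data are simulated rather than taken from CRIME2, and the parameter values and the helper `ols_simple` are assumptions. A single cross-section gives a biased slope, while the first-differenced regression recovers the slope and the period shift.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) from an OLS fit of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

random.seed(1)
n = 500
beta0, delta0, beta1 = 10.0, 1.5, 2.0      # hypothetical true values

a = [random.gauss(0, 3) for _ in range(n)]  # fixed unobserved unit effects
# x is negatively correlated with a, which biases cross-section OLS
x1 = [5 - 0.8 * ai + random.gauss(0, 1) for ai in a]
x2 = [5 - 0.8 * ai + random.gauss(0, 1) for ai in a]
y1 = [beta0 + beta1 * x + ai + random.gauss(0, 1) for x, ai in zip(x1, a)]
y2 = [beta0 + delta0 + beta1 * x + ai + random.gauss(0, 1)
      for x, ai in zip(x2, a)]

# Cross-section regression using period 2 only: a_i sits in the error term
_, slope_cs = ols_simple(x2, y2)

# First differences: a_i and beta0 drop out, delta0 becomes the intercept
dx = [v2 - v1 for v1, v2 in zip(x1, x2)]
dy = [v2 - v1 for v1, v2 in zip(y1, y2)]
intercept_fd, slope_fd = ols_simple(dx, dy)

print("cross-section slope   :", round(slope_cs, 3))
print("first-difference slope:", round(slope_fd, 3))
print("first-difference const:", round(intercept_fd, 3))
```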


Replicating Wooldridge’s results p464

EQ( 3) Modelling ccrmrte by OLS (using ccrime2.xls)

The estimation sample is: 1 - 46

| | Coefficient | Std.Error | t-value | t-prob | Part.R^2 |
|---|---|---|---|---|---|
| Constant | 15.4022 | 4.702 | 3.28 | 0.002 | 0.1960 |
| cunem | 2.21800 | 0.8779 | 2.53 | 0.015 | 0.1267 |

R^2 0.1267   F(1,44) = 6.384 [0.015]*

log-likelihood -202.169   DW 1.15

no. of observations 46   no. of parameters 2

mean(ccrmrte) 6.16375   var(ccrmrte) 440.348

(1) in this regression the estimate of $\beta_1$ is positive and statistically significant

(2) the positive and statistically significant estimate of $\delta_0$ shows evidence of a secular increase in crime across all cities between 1982 and 1987.


The fixed effects model with time demeaned data

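Define the time demeaned data series for y as

$$\ddot{y}_{it} = y_{it} - \bar{y}_i$$

where

$$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$$

(and similarly for x and u)

then if we have

$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$

and

$$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{u}_i$$

then we can estimate

$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$$

Again the unobserved effect has disappeared.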
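The time-demeaning step can be sketched on a tiny made-up panel. The numbers below are hypothetical, with three units, T = 3 and no idiosyncratic error, so that the result is exact. Subtracting each unit's own mean removes both the unit effect and the intercept, and regressing demeaned y on demeaned x returns the slope.

```python
def demean(rows):
    """Subtract each unit's own time mean from its observations."""
    return [[v - sum(row) / len(row) for v in row] for row in rows]

def flatten(rows):
    return [v for row in rows for v in row]

def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def slope_with_intercept(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sum((xi - xbar) ** 2 for xi in x)

# Hypothetical panel: one row per unit, one column per period
x = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0], [5.0, 5.0, 8.0]]
a = [10.0, -3.0, 4.0]                      # unobserved unit effects
# y_it = 1 + 2*x_it + a_i, with no idiosyncratic error
y = [[1.0 + 2.0 * v + ai for v in row] for row, ai in zip(x, a)]

beta1_within = slope_through_origin(flatten(demean(x)), flatten(demean(y)))
beta1_pooled = slope_with_intercept(flatten(x), flatten(y))

print("within (demeaned) slope:", beta1_within)
print("pooled OLS slope       :", round(beta1_pooled, 4))
```

The pooled slope is pulled away from 2 because the unit effects are left in the error term, while the demeaned regression is unaffected by them.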


Some comments on these two approaches

If T=2 the first-differencing approach and the fixed effects demeaned data approaches are equivalent (see Wooldridge, Third Edition, p491)

But with T>2 the choice depends on the relative efficiency of the two methods.

Wooldridge says that the first-differences method is better if the differenced u term is serially uncorrelated, but he advises you to try both approaches and look to explain why the results differ (if they do).
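The T=2 equivalence is easy to check numerically. The sketch below uses simulated data with assumed parameter values and, to keep it short, a model with no period dummy, so the differenced regression has no intercept. With two periods the demeaned values are just plus and minus half the first difference, which is why the two slope estimates coincide.

```python
import random

def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

random.seed(2)
n = 40
a = [random.gauss(0, 2) for _ in range(n)]            # unit effects
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.uniform(0, 10) for _ in range(n)]
y1 = [2.0 * x + ai + random.gauss(0, 1) for x, ai in zip(x1, a)]
y2 = [2.0 * x + ai + random.gauss(0, 1) for x, ai in zip(x2, a)]

# First-difference estimator
dx = [v2 - v1 for v1, v2 in zip(x1, x2)]
dy = [v2 - v1 for v1, v2 in zip(y1, y2)]
b_fd = slope_through_origin(dx, dy)

# Fixed effects estimator on time-demeaned data (two rows per unit)
x_dd, y_dd = [], []
for i in range(n):
    x_mean = (x1[i] + x2[i]) / 2
    y_mean = (y1[i] + y2[i]) / 2
    x_dd += [x1[i] - x_mean, x2[i] - x_mean]
    y_dd += [y1[i] - y_mean, y2[i] - y_mean]
b_fe = slope_through_origin(x_dd, y_dd)

print("first-difference slope:", b_fd)
print("fixed effects slope   :", b_fe)
```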


Random effects models

Again suppose that we have

$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$

But here we assume that

$$\operatorname{cov}(x_{it}, a_i) = 0$$

(Wooldridge allows for k separate x variables, so each must be uncorrelated with a).

Writing $v_{it}$ as the composite error

$$v_{it} = a_i + u_{it}$$

We have

$$y_{it} = \beta_0 + \beta_1 x_{it} + v_{it}$$

The composite error $v_{it}$ is serially correlated

$$\operatorname{corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \qquad t \neq s$$
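The serial correlation of the composite error follows in two steps. The sketch below relies on the usual random effects assumptions, which are stated here as assumptions rather than taken from the slide: $u_{it}$ has constant variance $\sigma_u^2$, is serially uncorrelated and is uncorrelated with $a_i$, and $\operatorname{var}(a_i) = \sigma_a^2$.

$$
\begin{aligned}
\operatorname{cov}(v_{it}, v_{is}) &= \operatorname{cov}(a_i + u_{it},\; a_i + u_{is}) = \operatorname{var}(a_i) = \sigma_a^2, \qquad t \neq s \\
\operatorname{var}(v_{it}) &= \operatorname{var}(a_i) + \operatorname{var}(u_{it}) = \sigma_a^2 + \sigma_u^2 \\
\operatorname{corr}(v_{it}, v_{is}) &= \frac{\operatorname{cov}(v_{it}, v_{is})}{\operatorname{var}(v_{it})} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}
\end{aligned}
$$

The correlation is the same for any two periods, however far apart, because the only thing the two errors share is the time-constant $a_i$.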


More on the random effects model

If we knew these variances ($\sigma_a^2$ and $\sigma_u^2$) we could calculate

$$\lambda = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2}$$

and use Generalised Least Squares based on the quasi-demeaned data

$$y_{it} - \lambda \bar{y}_i \qquad \text{and} \qquad x_{it} - \lambda \bar{x}_i$$

This purges the disturbance term of the serial correlation.

In practice this means that we use Feasible GLS estimation (Kennedy calls it EGLS, estimated GLS) where $\lambda$ is estimated as part of the process. Wooldridge comments (p495) that the algebra is fairly unpleasant, but most econometric software packages will do this for you.
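A small sketch of the quasi-demeaning step, assuming the two variances are known. The weight is written `lam` here, and the function names and input values are hypothetical. It also shows the two limiting cases: a weight of 0 leaves the data untouched (pooled OLS) and a weight close to 1 approaches full time-demeaning (fixed effects).

```python
import math

def quasi_demean_weight(sigma_u2, sigma_a2, T):
    """Weight 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - math.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

def quasi_demean(series, lam):
    """Subtract lam times the unit's time mean from each observation."""
    mean = sum(series) / len(series)
    return [v - lam * mean for v in series]

# Hypothetical variances for illustration
lam = quasi_demean_weight(sigma_u2=1.0, sigma_a2=1.0, T=3)
print("weight:", lam)
print("quasi-demeaned y:", quasi_demean([1.0, 2.0, 3.0], lam))

# Limiting cases
lam_pooled = quasi_demean_weight(sigma_u2=1.0, sigma_a2=0.0, T=3)
lam_near_fe = quasi_demean_weight(sigma_u2=1.0, sigma_a2=1e12, T=3)
print("no unit effect variance  :", lam_pooled)
print("huge unit effect variance:", lam_near_fe)
```

In feasible GLS the two variances are replaced by estimates, so the weight itself is estimated; this sketch skips that step.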


Kennedy proposes a strategy in which you use a Hausman exogeneity test to see if the random effects estimator is unbiased (if this null is not rejected then use the RE model, otherwise use the FE approach).

Wooldridge says the key issue is whether one can plausibly assume that the $a_i$ are uncorrelated with the x variables (in which case one can use the RE model via FGLS estimation). But in that case it is just a question of which estimator is more efficient; the fixed effects estimator would still be unbiased and consistent.
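For a single slope coefficient the Hausman comparison reduces to one line of arithmetic. The sketch below is an illustration under that one-coefficient assumption; the function name and the input numbers are hypothetical and are not estimates from CRIME2. The statistic is compared with a chi-square distribution with one degree of freedom.

```python
import math

def hausman_one_coef(b_fe, b_re, se_fe, se_re):
    """Hausman statistic and p-value for a single slope coefficient."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("var(FE) - var(RE) must be positive for the test")
    h = (b_fe - b_re) ** 2 / var_diff
    # Upper tail of chi-square with 1 d.f.: P(X > h) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical FE and RE estimates with their standard errors
h, p = hausman_one_coef(b_fe=2.0, b_re=1.0, se_fe=0.5, se_re=0.3)
choice = "FE" if p < 0.05 else "RE"
print("Hausman statistic:", round(h, 3))
print("p-value          :", round(p, 4))
print("preferred model  :", choice)
```

With several regressors the statistic becomes a quadratic form in the vector of coefficient differences, which most econometric packages compute directly.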