Panel Data Analysis – Advantages and Challenges Cheng Hsiao



Introduction



Year    Articles listed in SSCI
1986    29
2003    580
2004    687
2005    773



Three factors contributing to the phenomenal growth



(i) Data availability



(ii) Greater capacity for modeling the complexity of
human behavior



(iii) Challenging methodology




Data Availability



US:
    National Longitudinal Surveys of Labor Market Experience (NLS)
    Michigan Panel Study of Income Dynamics (PSID)

Eurostat:
    The European Community Household Panel (ECHP)

Kenya:
    Primary School Deworming Project (PSDP)

China:
    Township & Village Enterprises Survey
    Financial Institutions Survey (1984-1990)

Taiwan:
    Household Demographic Survey


Advantages


Cross-sectional data may reflect inter-individual differences.

Time series data may suffer from multicollinearity and a shortage of degrees of freedom.

Panel data, by blending inter-individual differences with intra-individual dynamics, allow a researcher to specify more complicated behavioral hypotheses than a single cross-section or a single time series.



(i) More degrees of freedom, more sample variability,
less multicollinearity

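$y = x\beta + u$, with $y$: $n \times 1$, $x$: $n \times k$, $\beta$: $k \times 1$

$\mathrm{Var}(\hat{\beta}) = \sigma^2 (x'x)^{-1}$

$y_i = \alpha + \beta x_i + u_i$, $i = 1, \dots, n$

$\mathrm{Var}(\hat{\beta}) = \dfrac{\sigma_u^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, where $\bar{x} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$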
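The variance expression for the simple regression, Var(beta_hat) = sigma_u^2 / sum_i (x_i - x_bar)^2, can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `slope_variance` and the sample values are assumptions chosen for illustration. It suggests why pooling observations across individuals, which adds variation in x, tends to lower the variance of the slope estimator.

```python
def slope_variance(x, sigma2_u):
    """Variance of the OLS slope in y_i = alpha + beta * x_i + u_i.

    Implements Var(beta_hat) = sigma_u^2 / sum_i (x_i - x_bar)^2.
    """
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2_u / sxx


# Hypothetical data: one individual observed over four periods (little variation in x)
single_series = [10.0, 10.5, 11.0, 11.5]
# Three individuals with different levels of x, observed over the same four periods
panel = single_series + [20.0, 20.5, 21.0, 21.5] + [30.0, 30.5, 31.0, 31.5]

var_single = slope_variance(single_series, sigma2_u=1.0)
var_panel = slope_variance(panel, sigma2_u=1.0)
print(var_single, var_panel)
```

With the same error variance, the pooled sample has a much larger sum of squared deviations of x, so `var_panel` comes out well below `var_single`.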


(ii) Greater capacity for capturing the complexity of
human behavior



(a) Constructing and testing more complicated behavioral hypotheses

- Homogeneous vs. heterogeneous population: Ben-Porath (1973)

- Program evaluation: Difference-in-Difference method


$y_{1i} = g_1(x_i) + u_{1i}$ if $d_i = 1$ (treatment)

$y_{0i} = g_0(x_i) + u_{0i}$ if $d_i = 0$ (control)

Treatment Effect = $y_{1i} - y_{0i}$

Average Treatment Effect = $E[y_{1i} - y_{0i}]$

Data: $y_i = d_i y_{1i} + (1 - d_i) y_{0i}$

Confounding treatment effect with differences in covariates $(x_i)$ between control group and treatment group

Bias due to selection on unobservables

$E(u_{1i}) = 0 \neq E(u_{1i} \mid d_i = 1)$

$E(u_{0i}) = 0 \neq E(u_{0i} \mid d_i = 0)$
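Selection on unobservables can be illustrated with a small simulation. This is a sketch under assumed values, not the author's example: the constant treatment effect of 2, the normal distributions, and the selection rule are all hypothetical. An unobserved component enters both potential outcomes and the treatment decision, so a naive comparison of group means overstates the average treatment effect.

```python
import random

random.seed(0)

N = 20000
TRUE_EFFECT = 2.0  # hypothetical constant treatment effect

treated, control, effects = [], [], []
for _ in range(N):
    a = random.gauss(0.0, 1.0)                      # unobservable, part of u_1i and u_0i
    y0 = a + random.gauss(0.0, 0.5)                 # outcome without treatment
    y1 = TRUE_EFFECT + a + random.gauss(0.0, 0.5)   # outcome with treatment
    d = 1 if a + random.gauss(0.0, 1.0) > 0 else 0  # selection depends on the unobservable
    effects.append(y1 - y0)
    # Only one potential outcome is observed: y_i = d_i * y_1i + (1 - d_i) * y_0i
    if d == 1:
        treated.append(y1)
    else:
        control.append(y0)

ate = sum(effects) / N
naive = sum(treated) / len(treated) - sum(control) / len(control)
print(f"ATE = {ate:.2f}, naive difference in means = {naive:.2f}")
```

Because individuals with a high unobservable are more likely to be treated, E(u_1i | d_i = 1) is positive and E(u_0i | d_i = 0) is negative in this setup, and the naive difference is biased upward.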




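Difference-in-Difference method

$E(y_{ta} - y_{tb}) - E(y_{ca} - y_{cb})$

where t and c denote the treatment and control groups, and a and b denote after and before the treatment.

(b) Controlling the impact of omitted variables

$y_{it} = \beta x_{it} + \gamma z_i + u_{it}$, with $z_i$ unobservable

If $\mathrm{Cov}(x_i, z_i) \neq 0$, least squares that ignores $z_i$ is biased:

$E(\hat{\beta}) = \beta + \gamma \dfrac{\mathrm{Cov}(x_i, z_i)}{\mathrm{Var}(x_i)}$

Differencing over time eliminates $z_i$:

$y_{it} - y_{i,t-1} = \beta (x_{it} - x_{i,t-1}) + (u_{it} - u_{i,t-1})$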
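The effect of differencing out a time-invariant omitted variable can be sketched with simulated data. The parameter values (beta = 1, gamma = 2), the two-period design, and the helper `ols_slope` are assumptions made for illustration only. Pooled least squares that ignores z_i picks up part of its effect, while the first-difference regression does not.

```python
import random

random.seed(1)


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


BETA, GAMMA, N = 1.0, 2.0, 5000  # hypothetical parameter values

x_all, y_all, dx, dy = [], [], [], []
for _ in range(N):
    z = random.gauss(0.0, 1.0)                            # unobserved, constant over time
    xs = [z + random.gauss(0.0, 1.0) for _ in range(2)]   # x is correlated with z
    ys = [BETA * x + GAMMA * z + random.gauss(0.0, 1.0) for x in xs]
    x_all += xs
    y_all += ys
    dx.append(xs[1] - xs[0])                              # differencing removes z
    dy.append(ys[1] - ys[0])

beta_pooled = ols_slope(x_all, y_all)  # omits z, so it is biased
beta_fd = ols_slope(dx, dy)            # first-difference estimator
print(f"pooled: {beta_pooled:.2f}, first difference: {beta_fd:.2f}")
```

In this design Cov(x, z) / Var(x) = 1/2, so the pooled slope should settle near beta + gamma / 2 = 2, while the first-difference slope should stay near the true value of 1.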



(c) Uncovering dynamic relationships

$y_t = \sum_j \beta_j x_{t-j} + u_t$

multicollinearity: $x_t \approx x_{t-1} + (x_{t-1} - x_{t-2})$

(d) Generating more accurate predictions for individual outcomes (exchangeability)

(e) Providing micro foundation for aggregate data analysis: “representative agent” vs. heterogeneity













(iii) Simplifying Statistical Inference and Computation



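(a) Time-series inference

$y_t = \rho y_{t-1} + \epsilon_t$, $\epsilon_t$ i.i.d.

$\sqrt{T}\,(\hat{\rho} - \rho) \to N(0,\ 1 - \rho^2)$ if $|\rho| < 1$

$T(\hat{\rho} - 1) \to \dfrac{\frac{1}{2}\left[W(1)^2 - 1\right]}{\int_0^1 W(r)^2\,dr}$ if $\rho = 1$

With panel data ($N$ individuals, $T$ periods):

$\sqrt{N}\,T(\hat{\rho} - 1) \to N(0,\ 2)$ if $\rho = 1$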
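The least-squares estimator of rho used in these results can be sketched as follows. This is an illustrative sketch only: the function names `pooled_ar1` and `random_walk`, the panel dimensions, and the zero initial value are assumptions. It computes the estimator from a single simulated random walk and from a pooled panel of independent random walks, and it does not attempt to verify the limiting distributions.

```python
import random


def pooled_ar1(panel):
    """Pooled least-squares estimate of rho in y_t = rho * y_{t-1} + e_t.

    `panel` is a list of time series, one list per individual.
    """
    num = sum(s[t - 1] * s[t] for s in panel for t in range(1, len(s)))
    den = sum(s[t - 1] ** 2 for s in panel for t in range(1, len(s)))
    return num / den


def random_walk(T, rng):
    """Simulate y_t = y_{t-1} + e_t with y_0 = 0 and standard normal e_t."""
    y = [0.0]
    for _ in range(T):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    return y


rng = random.Random(2)
N, T = 200, 50  # hypothetical panel dimensions
panel = [random_walk(T, rng) for _ in range(N)]

rho_single = pooled_ar1(panel[:1])  # a single time series
rho_panel = pooled_ar1(panel)       # all N series pooled
print(rho_single, rho_panel)
```

Pooling across independent individuals averages many sums of the same form, which is the intuition behind the normal limit in the panel case.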









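(b) Measurement errors

$y_{it} = \beta x_{it} + u_{it}$

$z_{it} = x_{it} + \tau_{it}$

$y_{it} = \beta z_{it} + (u_{it} - \beta \tau_{it})$

$(y_{it} - y_{i,t-j}) = \beta (z_{it} - z_{i,t-j}) + \left[(u_{it} - u_{i,t-j}) - \beta (\tau_{it} - \tau_{i,t-j})\right]$

(c) Dynamic sample selection models

$y^*_{it} = \gamma y^*_{i,t-1} + \beta' x_{it} + u_{it}$

$y_{it} = y^*_{it}$ if $y^*_{it} > 0$, and $y_{it} = 0$ if $y^*_{it} \le 0$

$f(y_{it} \mid y^*_{i,t-1}, x_{it}) = f(u_{it})$

$f(y_{it} \mid y_{i,t-1} = 0, x_{it})$: involves integration over the unobserved $y^*_{i,t-1} \le 0$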
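For the measurement-error model in (b), the bias that arises when the observed z_it replaces the true x_it can be sketched with a simulation. The setup is hypothetical: a single cross-section, beta = 1, and unit variances for the true regressor, the measurement error tau, and the disturbance u. Because z is correlated with the composite error (u - beta * tau), the slope on z is attenuated toward zero.

```python
import random

random.seed(3)


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


BETA, N = 1.0, 20000  # hypothetical values

x = [random.gauss(0.0, 1.0) for _ in range(N)]       # true regressor (unobserved in practice)
z = [xi + random.gauss(0.0, 1.0) for xi in x]        # observed regressor: z = x + tau
y = [BETA * xi + random.gauss(0.0, 1.0) for xi in x]

beta_true_x = ols_slope(x, y)      # infeasible regression on the true x
beta_observed_z = ols_slope(z, y)  # feasible regression on the mismeasured z
print(f"using x: {beta_true_x:.2f}, using z: {beta_observed_z:.2f}")
```

With equal variances for x and tau, the slope on z should settle near beta * Var(x) / (Var(x) + Var(tau)) = 0.5.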
































Methodology Challenges


Panel data also raises the issue of how best to model
unobserved heterogeneity



Standard statistical procedures are developed based
on the assumption that y conditional on x is randomly
distributed with a common mean






$f(y \mid x; \theta)$, e.g.

$y = \beta' x + u$

$E(y \mid x) = \beta' x$

$\mathrm{Var}(y \mid x) = \sigma^2$


Panel data, by its nature, focuses on individual outcomes. Factors affecting individual outcomes are too numerous to assume a common conditional density $f(y_{it} \mid x_{it})$ for all i and t.

One way to restore homogeneity is to add additional conditioning variables, say, $z_{it}, w_{it}, \dots$, so that the conditional density becomes $f(y_{it} \mid x_{it}, z_{it}, w_{it}, \dots)$.

However:

(a) A model is a simplification of reality, not a mimic of reality. Multicollinearity, shortages of degrees of freedom, etc. may confuse the fundamental relationship between $y$ and $x$.

(b) $z, w, \dots$ may not be observable.



Another way is to let the parameters characterizing the conditional density of $y$ given $x$ vary across i and/or over t:

$f(y_{it} \mid x_{it}; \theta_{it})$

Meaningful inference on $\theta_{it}$ can be made only if we assume certain structure on $\theta_{it}$.


Let $\theta_{it}' = (\beta', \alpha_i')$

$\beta$ - structural parameters

$\alpha_i$ - incidental parameters (increase with N)

$\alpha_i$ - individual-specific effects: represent the effects of those variables that vary across individuals but stay constant over time, at least over a short time span, e.g. ability, socio-economic background variables, marginal utility of initial wealth, etc.

$\alpha_i$ - fixed constant: Fixed Effects Model (FE)

$\alpha_i$ - random variable: Random Effects Model (RE)
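One common way to estimate the structural parameter when the individual effects are treated as fixed constants is the within (demeaning) transformation. The sketch below is illustrative and not taken from the slides: the function names, the single-regressor setup, and the noise-free data are assumptions. It compares the within estimator with a pooled regression that ignores the individual effects.

```python
def within_estimator(x, y):
    """Fixed-effects (within) slope: regress y_it - y_bar_i on x_it - x_bar_i.

    x and y are lists of per-individual lists (single regressor).
    """
    sxy = sxx = 0.0
    for xi, yi in zip(x, y):
        x_bar = sum(xi) / len(xi)
        y_bar = sum(yi) / len(yi)
        sxy += sum((a - x_bar) * (b - y_bar) for a, b in zip(xi, yi))
        sxx += sum((a - x_bar) ** 2 for a in xi)
    return sxy / sxx


def pooled_ols(x, y):
    """Pooled OLS slope (with intercept) that ignores the individual effects."""
    flat_x = [a for xi in x for a in xi]
    flat_y = [b for yi in y for b in yi]
    x_bar = sum(flat_x) / len(flat_x)
    y_bar = sum(flat_y) / len(flat_y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(flat_x, flat_y))
    sxx = sum((a - x_bar) ** 2 for a in flat_x)
    return sxy / sxx


# Hypothetical noise-free panel: y_it = alpha_i + 2 * x_it, with alpha_i rising in x
alphas = [0.0, 10.0, 20.0]
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = [[a + 2.0 * v for v in xi] for a, xi in zip(alphas, x)]

print(within_estimator(x, y))  # 2.0
print(pooled_ols(x, y))        # 5.0
```

Because the individual effects are correlated with x in this example, the pooled slope absorbs them, while demeaning within each individual removes them and recovers the slope of 2.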

Concluding Remarks


The power of panel data to isolate the effects of
specific actions, treatments or more general policies
depends on the compatibility of the assumptions of
statistical tools with the data generating process



Factors to consider:


(1) Advantages


(2) Limitations


(3) Compatibility between assumptions and data
generating process


(4) Efficiency