Simultaneous Equations Models in SAS Indirect Least Squares (ILS) and 2-Stage Least Squares (2SLS)


UNC-Wilmington
ECN 422
Department of Economics and Finance
Dr. Chris Dumas





Simultaneous Equations Models (SEMs) are models in which two or more equations share two or more variables that link the equations together in a system. In such models, the variables that appear in two or more equations are said to be “mutually dependent” or “jointly determined,” or they are said to have a “simultaneous” or “two-way” relationship. Such variables affect one another, causing “feedback” relationships between the equations in the system; that is, if something in one equation changes, the change causes a change in another equation which, in turn, “feeds back” to cause a further change in the first equation. Such situations are sometimes referred to as “chicken and the egg” situations, because it is difficult to determine which came first, a change in one equation or a change in a second equation, if the two equations are affecting each other.


The feedback effects in SEMs either spiral out of control or reach an equilibrium of some sort. Usually, they reach an equilibrium (otherwise, our world would be much more explosive than we observe). Many of the standard models of Economics and Finance are SEMs that reach equilibrium (well, usually they reach equilibrium, under typical conditions). Two well-known examples are the Demand and Supply model of Microeconomics and the IS-LM model of Macroeconomics.


The Problem with Simultaneous Equation Models

Simultaneous Equations Bias


Although very common in Economics and Finance, SEMs face a potential problem when it comes to Econometric estimation of the parameters in their equations using OLS regression. Recall that one of the assumptions of OLS regression is that the error term in the regression equation is independent of the X variables in the equation. Well, SEMs typically violate this assumption, with undesirable results, econometrically speaking. (Can I say, “econometrically speaking?” Well, I guess I just did.) To see why this is so, consider as an example the Demand and Supply model from Microeconomics.


Suppose we have data on the quantity Q of a product traded at various locations when the market is in equilibrium, the market price P_Q in each location, average consumer income I in each location, and the price of materials P_M in each location. We want to estimate the parameters (the β's, called the “b’s” in the rest of this handout) in the supply and demand equations:


Demand: $Q_D = \beta_0 + \beta_1 P_Q + \beta_2 I + u_D$, where $u_D$ is an error term in the Demand equation,

Supply: $Q_S = \beta_3 + \beta_4 P_Q + \beta_5 P_M + u_S$, where $u_S$ is an error term in the Supply equation,

Equilibrium Condition: $Q_D = Q_S$

Notice first that Demand and Supply are an SEM, because together they are two equations that share two variables in common, Q and P_Q. Now, suppose that something outside the model changes, and this change affects Demand. This would appear in the model as a change in u_D, the error term in the Demand equation. Let’s say that there is an increase in u_D. All else held constant in the Demand equation, this would result in an increase in Q_D. This, in turn, would result in an increase in Q_S in equilibrium, because, in equilibrium, Q_D = Q_S. Next, in the Supply equation, if Q_S increases, and P_M remains constant, and u_S is a random error unaffected by the change in Q_S, then the only way that the equality in the Supply equation can be maintained is if P_Q increases (assuming an upward-sloping supply curve, so that β_4 > 0). Now, if P_Q in the Supply equation increases, then P_Q in the Demand equation must also increase, because it is the same P_Q in both equations. It is at this point that the Econometric problem occurs: a change in the error term in the Demand equation, u_D, affected an “X” variable in the Demand equation, namely, the variable P_Q. This violates the assumption of the OLS method that the error term in the regression equation must be independent of the X variables in the equation.
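The feedback argument above can be made concrete with a short derivation. As a sketch, assume (these assumptions are not stated in the handout) that u_D and u_S are uncorrelated with each other and with I and P_M, and that β_1 ≠ β_4. Setting Demand equal to Supply, solving for P_Q, and taking the covariance with u_D gives:

```latex
\[
\begin{aligned}
P_Q &= \frac{(\beta_0-\beta_3) + \beta_2 I - \beta_5 P_M + (u_D - u_S)}{\beta_4-\beta_1},\\
\operatorname{Cov}(P_Q,\, u_D) &= \frac{\operatorname{Var}(u_D)}{\beta_4-\beta_1} \neq 0 .
\end{aligned}
\]
```

With a downward-sloping demand curve (β_1 < 0) and an upward-sloping supply curve (β_4 > 0), this covariance is positive: a positive demand shock raises P_Q, which is exactly the link between the error term and an X variable that the OLS assumption rules out.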



As a second example, consider the IS-LM model from Macroeconomics. Suppose we have data on national output Y, money supply M, and the interest rate r, and we want to estimate the parameters in the IS and LM equations below:


IS Equation: $Y_{IS} = \beta_0 + \beta_1 r + u_{IS}$, where $u_{IS}$ is an error term in the IS equation,

LM Equation: $Y_{LM} = \beta_2 + \beta_3 M + \beta_4 r + u_{LM}$, where $u_{LM}$ is an error term in the LM equation,

Equilibrium Condition: $Y_{IS} = Y_{LM}$

Notice first that IS and LM are an SEM, because together they are two equations that share two variables in common, Y and r. Now, suppose that something outside the model changes, and this change affects the IS equation. This would appear in the model as a change in u_IS, the error term in the IS equation. Let’s say that there is an increase in u_IS. All else held constant in the IS equation, this would result in an increase in Y_IS. This, in turn, would result in an increase in Y_LM in equilibrium, because, in equilibrium, Y_IS = Y_LM. Next, in the LM equation, if Y_LM increases, and M remains constant, and u_LM is a random error unaffected by the change in Y_LM, then the only way that the equality in the LM equation can be maintained is if r increases (assuming β_4 > 0, as with the usual upward-sloping LM curve). Now, if r in the LM equation increases, then r in the IS equation must also increase, because it is the same r in both equations. It is at this point that the Econometric problem occurs: a change in the error term in the IS equation, u_IS, affected an “X” variable in the IS equation, namely, the variable r. This violates the OLS assumption that the error term in the regression equation is independent of the X variables in the equation.


Okay, so what? Well, it can be shown that if this assumption of the OLS method is violated, the following negative consequences occur:

Simultaneous Equations Bias:

(1) the estimates of the b’s are biased

(2) the estimates of the b’s are inconsistent (that is, a larger sample size will not diminish the bias)


The Identification Problem


The Identification Problem is another problem that is characteristic of SEMs. The Identification Problem is that it can be difficult to determine (that is, to identify) which equation in an SEM system is being estimated in a regression analysis.

For example, suppose you have data on the market quantity Q and market price P_Q of a product traded in the market, and suppose you want to use regression analysis to estimate a demand curve for product Q. These data are plotted twice in the figures below. The same data are plotted in each figure, but two demand curves are drawn through the data in the figure on the left, and two supply curves are drawn through the (same) data in the figure on the right.















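[Figure: two panels showing the same data points, each with P_Q on the vertical axis and Q on the horizontal axis. Left panel: “Is Demand Shifting?” Right panel: “Or, Is Supply Shifting?”]

(cue spooky music from Halloween)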


Look back at the demand and supply equations we considered earlier in this handout. The data points in the figures above could represent the demand equation, with a change in variable “I” causing the demand curve to shift. On the other hand, the data points could instead represent the supply equation, with a change in “P_M” causing the supply curve to shift.

The key point: If we have data for only Q and P_Q (and no data for I and P_M), then it is difficult to “identify” which equation is represented by the data points. If we nonetheless did a regression analysis with data for only Q and P_Q, then we wouldn’t be sure whether we were actually estimating a demand curve or a supply curve. Although it might be obvious from the situation that you are studying whether the curve in this simple example is demand or supply, in more complicated (i.e., more realistic) situations involving more equations and more variables, it can be very difficult to identify which of the equations you are actually estimating when you do a regression analysis, unless you are careful . . .



Identifying an Equation Depends on the Variables that Are in the System but NOT in the Equation!?!


Reconsider the problem of identifying whether the demand equation or the supply equation is represented by the figures above. If we had data on variable I, then as variable I changed its value, the demand curve would shift along the supply curve, and the data points would show us the location of the supply curve. That is, a change in the value of a variable in the demand curve allows us to “see,” or “identify,” the supply curve, as illustrated in the left-hand figure below.















Notice in the left-hand figure that identifying the supply curve depends on changes in the value of a variable that is NOT in the supply curve, namely, the variable I that is in the demand curve. The variable I is in the demand-supply system of equations, but it is not in the supply curve; this is what allows us to use the variable I to identify the supply curve.


Now consider the problem of identifying the demand curve. If we had data on variable P_M in the supply curve, then as variable P_M changed its value, the supply curve would shift along the demand curve, and the resulting data points would show us the location of the demand curve. That is, a change in the value of a variable in the supply curve allows us to “see,” or “identify,” the demand curve, as illustrated in the right-hand figure. Identifying the demand curve depends on changes in the value of a variable that is NOT in the demand curve, namely, the variable P_M that is in the supply curve. The variable P_M is in the demand-supply system of equations, but it is not in the demand curve; this is what allows us to use the variable P_M to identify the demand curve.


(Tricky, no? And very Zen master-like, wouldn’t you agree, grasshopper?)





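[Figure: two panels, each with P_Q on the vertical axis and Q on the horizontal axis. Left panel: “A shifting Demand Curve reveals the location of the Supply Curve.” Right panel: “A shifting Supply Curve reveals the location of the Demand Curve.”]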


Terminology Involved with Identifying the Equations in SEMs


Endogenous Variables are variables whose values are determined inside the SEM. These are the “Y” variables that you are using the model to “solve for.” For example, in a model of demand and supply, the endogenous variables would be Q and P_Q. There might be other variables in the demand and supply equations, but we would need to be given the values of these other variables to plug into the equations in order to solve for the values of the endogenous variables, Q and P_Q.


In contrast to Endogenous Variables, Predetermined Variables are variables whose values are determined outside the SEM. We must be “given” the values of these variables; we do not solve for them using the model. The point of Predetermined Variables is to act as “controls” on the relationship between the Endogenous Variables in the model; that is, the Predetermined Variables act as shifters, shifting around the relationships among the Endogenous Variables so that we can “see,” or “identify,” the relationships among the Endogenous Variables. There are two sub-types of Predetermined Variables:


1) Exogenous Variables: These are the variables that are not endogenous variables and have never been endogenous variables. These are the “X” variables in the model; the variables that you are using to help explain and predict movements in the endogenous “Y” variables.

2) Lagged Endogenous Variables: These are endogenous variables from earlier time periods. If we know the values of endogenous variables from earlier time periods, sometimes these are helpful in explaining and predicting the movements of the endogenous variables in the current time period. For example, if we are trying to predict the value of Y in period t, we might be able to use the value of Y in period t-1 to help us make a better prediction. If we include the values of Y variables from earlier time periods in our SEM, then these are considered a type of Predetermined Variable, because we know the values from earlier time periods (they are “givens”), and we don’t need to solve for them using the model. These Lagged Endogenous Variables are considered another type of “X” variable in the model, because they help explain and predict the movements of the endogenous “Y” variables in the current time period.


Structural/Behavioral Equations are the original equations in the SEM that represent structural features of the economy or behavioral aspects of individuals in the economy. For example, the LM curve from macroeconomics is a Structural Equation, because it represents the structure of the relationship between output, money supply and interest rates in the economy. The demand curve from microeconomics is an example of a Behavioral Equation, because it represents the behavior of consumers in a market. Structural/Behavioral Equations are constructed from endogenous and predetermined variables. The parameters (the b’s) of Structural/Behavioral Equations are called, perhaps not surprisingly, Structural/Behavioral Parameters.


Reduced Form Equations are derived from Structural/Behavioral Equations and express the endogenous variables solely as functions of the predetermined variables. The Reduced Form Equations are derived by solving for the endogenous variables in the Structural/Behavioral Equations of the SEM. Much to the chagrin of creative people everywhere, the parameters of Reduced Form Equations are named . . . Reduced Form Parameters. (sigh)





Indirect Least Squares (ILS) Regression Analysis


The point of making the distinction between Structural/Behavioral Equations and Reduced Form Equations is that Reduced Form Equations do not suffer from the problem of Simultaneous Equations Bias (yea!). If the Simultaneous Equation Model (SEM) has enough of the right kinds of variables in the right positions, then we can use the Indirect Least Squares (ILS) Regression Analysis method to solve for the b’s in the Structural/Behavioral Equations. The ILS method proceeds as follows:


1. Derive the Reduced Form Equations from the Structural/Behavioral Equations,

2. Estimate the b’s of the Reduced Form Equations using regression analysis (without Simultaneous Equations Bias, yea!), and

3. Calculate the b’s of the Structural/Behavioral Equations based on the b’s from the Reduced Form Equations (and recall that calculating the b’s of the Structural/Behavioral Equations was our original goal; nice!).


For example, suppose we were working with the demand and supply equations described earlier in this handout:

Demand: $Q_D = \beta_0 + \beta_1 P_Q + \beta_2 I + u_D$, where $u_D$ is an error term in the Demand equation,

Supply: $Q_S = \beta_3 + \beta_4 P_Q + \beta_5 P_M + u_S$, where $u_S$ is an error term in the Supply equation,

Equilibrium Condition: $Q_D = Q_S$

These are the Structural/Behavioral Equations of the demand and supply SEM. The endogenous variables are Q and P_Q, and the predetermined variables are I and P_M. In this particular example, both predetermined variables are exogenous variables, and we don’t have any lagged endogenous variables in the system. Now, to derive the Reduced-Form Equations for this demand and supply SEM, we simply solve for the values of the endogenous variables (Q and P_Q) in the system:


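Because of the Equilibrium Condition, we can set Demand equal to Supply . . .

$$\beta_0 + \beta_1 P_Q + \beta_2 I + u_D = \beta_3 + \beta_4 P_Q + \beta_5 P_M + u_S$$

and solve for the endogenous variable P_Q . . .

$$P_Q = \left[\frac{\beta_0-\beta_3}{\beta_4-\beta_1}\right] + \left[\frac{\beta_2}{\beta_4-\beta_1}\right] I + \left[\frac{-\beta_5}{\beta_4-\beta_1}\right] P_M + \left[\frac{u_D-u_S}{\beta_4-\beta_1}\right]$$

this is the Reduced-Form Equation for P_Q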

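Next, plug the Reduced-Form Equation for P_Q back into either Demand or Supply, and solve for Q:

$$Q = \left[\frac{\beta_0\beta_4-\beta_1\beta_3}{\beta_4-\beta_1}\right] + \left[\frac{\beta_2\beta_4}{\beta_4-\beta_1}\right] I + \left[\frac{-\beta_1\beta_5}{\beta_4-\beta_1}\right] P_M + \left[\frac{\beta_4 u_D-\beta_1 u_S}{\beta_4-\beta_1}\right]$$

this is the Reduced-Form Equation for Q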

Now we can do regression analysis on the Reduced-Form Equations, and we won’t have a problem with Simultaneous Equations Bias.
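The two reduced-form regressions are ordinary OLS regressions, so they can be run with a standard regression procedure. A minimal sketch in SAS, assuming a dataset named dataset01 (a hypothetical name) whose columns Q, PQ, I and PM (also assumed names) hold the quantity, price, income and materials-price data:

```sas
/* Reduced-form regressions for the demand-supply SEM.                */
/* dataset01 and the variable names Q, PQ, I, PM are assumed names    */
/* chosen for illustration; they are not taken from the handout.      */
proc reg data=dataset01;
   price_rf:    model PQ = I PM;   /* reduced-form equation for P_Q */
   quantity_rf: model Q  = I PM;   /* reduced-form equation for Q   */
run;
quit;
```

Each right-hand side contains only predetermined variables, which is why OLS can be applied to these two equations without Simultaneous Equations Bias.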


Notice that the terms in brackets are either collections of constants or collections of constants and error terms. Each collection of constants acts as a big constant, so we’ll replace each collection of constants with a “mega-constant” (I just invented that term). The “mega-constants” are the Reduced-Form Coefficients. Also, each collection of constants and error terms acts as a big error term, so we’ll replace each of these collections with a “mega-error term” (I just invented that term, too). I’ll use tildes (squiggles) to denote the mega terms in the Reduced-Form equation for P_Q, and I’ll use hats to denote the mega terms in the Reduced-Form equation for Q, like this:


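$$P_Q = \tilde{b}_0 + \tilde{b}_1 I + \tilde{b}_2 P_M + \tilde{u}$$

$$Q = \hat{b}_0 + \hat{b}_1 I + \hat{b}_2 P_M + \hat{u}$$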



The “tilde-b’s” and “hat-b’s” in the equations above are the Reduced-Form Coefficients. Run a regression analysis separately on each of the Reduced-Form equations above (the tilde and hat equations) to get numbers for the tilde-b’s and hat-b’s. Then, you can set each of the tilde-b’s and hat-b’s equal to the bracketed collection of b’s that it represents, and, with some tedious algebra, solve for the original b’s in the original Structural-Behavioral Equations!! (ta-da!)
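To make the “tedious algebra” concrete, here is a sketch for the demand-supply example. It assumes the Reduced-Form equations are written as P_Q = b̃_0 + b̃_1·I + b̃_2·P_M + ũ and Q = b̂_0 + b̂_1·I + b̂_2·P_M + û (the subscript numbering is an assumption made for illustration), and that b̃_1 and b̃_2 are not zero. Substituting the Reduced-Form equation for P_Q into the structural Demand and Supply equations, and matching coefficients with the Reduced-Form equation for Q, gives:

```latex
\[
\begin{aligned}
\text{Demand:}\quad & \beta_1 = \hat{b}_2/\tilde{b}_2, & \beta_2 &= \hat{b}_1 - \beta_1\tilde{b}_1, & \beta_0 &= \hat{b}_0 - \beta_1\tilde{b}_0,\\
\text{Supply:}\quad & \beta_4 = \hat{b}_1/\tilde{b}_1, & \beta_5 &= \hat{b}_2 - \beta_4\tilde{b}_2, & \beta_3 &= \hat{b}_0 - \beta_4\tilde{b}_0 .
\end{aligned}
\]
```

Each structural b has exactly one solution here. The demand slope β_1 comes from the coefficients on P_M, the variable excluded from Demand, and the supply slope β_4 comes from the coefficients on I, the variable excluded from Supply.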



When Does the ILS Regression Method Actually Work? The Rank and Order Conditions


Sadly, when regression analysis is used to estimate the Reduced-Form Coefficients (the “tilde-b’s” and “hat-b’s”) in the Reduced-Form Equations, it is not always possible to use these values to solve for the original b’s in the original Structural-Behavioral Equations. (Egad!)

The Rank and Order Conditions are a set of rules that determine whether it is possible to solve for the original Structural-Behavioral Coefficients from the Reduced-Form Coefficients. To use the Rank and Order Conditions, we need to define a few more terms:


M = the number of endogenous variables in the SEM system of equations

m = the number of endogenous variables in the equation of interest (the equation for which you want the b’s)

K = the number of predetermined variables in the SEM system of equations

k = the number of predetermined variables in the equation of interest (the equation for which you want the b’s)

A = the matrix of b’s, taken from the other equations in the system, that is constructed from the b’s of the variables excluded from the equation of interest


Okay, with the terms above, we can now give the Rank and Order Conditions (drum roll . . .):




If (K - k < m - 1), then the equation of interest is under-identified

If (K - k = m - 1) AND (rank of matrix A = M - 1), then the equation of interest is exactly-identified

If (K - k = m - 1) AND (rank of matrix A < M - 1), then the equation of interest is under-identified

If (K - k > m - 1) AND (rank of matrix A = M - 1), then the equation of interest is over-identified

If (K - k > m - 1) AND (rank of matrix A < M - 1), then the equation of interest is under-identified


Exactly-identified means that you will be able to use the ILS Regression Method to solve for the b’s in the equation of interest (Yea!).

Under-identified means that there are not enough variables in the SEM that are excluded from the equation of interest to be able to solve for the b’s in the equation of interest. So, you will need to go “back to the drawing board” to change the equations in your SEM or change the variables that are in your SEM until you achieve an exactly-identified or over-identified SEM.

Over-identified means that you will be able to solve for the b’s in the equation of interest, but, ironically, there will be more than one set of solutions for the b’s in the equation of interest, and you don’t know which set is the true set! In this case, there is an extra step, or “stage,” in the analysis that you can do in order to obtain estimates of the “true” set of b’s. Perhaps not surprisingly, the analysis method that involves the extra stage is called (you can’t make this stuff up) . . . Two-Stage Least Squares regression analysis (affectionately abbreviated 2SLS).
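As a worked check, the conditions can be applied to the Demand equation of the demand-supply SEM used earlier in this handout (Demand contains Q, P_Q and I; Supply contains Q, P_Q and P_M). The counts follow the definitions of M, m, K and k above; the rank statement assumes β_5 ≠ 0, which the handout does not state explicitly.

```latex
\[
\begin{aligned}
&M = 2 \ (Q,\ P_Q), \qquad K = 2 \ (I,\ P_M),\\
&\text{Demand equation: } m = 2 \ (Q,\ P_Q), \qquad k = 1 \ (I),\\
&K - k = 1 = m - 1,\\
&A = [\,\beta_5\,] \ \text{(coefficient in Supply of } P_M\text{, the variable excluded from Demand)},\\
&\operatorname{rank}(A) = 1 = M - 1 \quad \text{if } \beta_5 \neq 0 .
\end{aligned}
\]
```

So the Demand equation is exactly-identified, and the ILS method can be used for it. The same count applies to the Supply equation, which excludes the variable I.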




Two-Stage Least Squares (2SLS) Regression Analysis


Two-Stage Least Squares (2SLS) Regression Analysis is a method of estimating the original b’s in the original Structural/Behavioral Equations of an SEM when the SEM is over-identified. The steps of the method are:

1. First (this is the extra stage in 2SLS), regress each endogenous variable on all of the predetermined variables in the system.

2. Use the equations to predict the values of the endogenous variables. These “predicted variables” are called Instrumental Variables.

3. Replace any endogenous variables appearing as right-hand-side “X” variables in the equation of interest with the corresponding Instrumental Variables.

4. Estimate the b’s of the equation of interest (with the Instrumental Variables replacing the endogenous right-hand-side X variables) using regression analysis.


For example, suppose we were working with the demand and supply equations described earlier in this handout, but the supply equation had some additional exogenous variables in it, “R” and “G” (doesn’t matter what they are):

Demand: $Q_D = \beta_0 + \beta_1 P_Q + \beta_2 I + u_D$, where $u_D$ is the error term in Demand

Supply: $Q_S = \beta_3 + \beta_4 P_Q + \beta_5 P_M + \beta_6 R + \beta_7 G + u_S$, where $u_S$ is the error term in Supply

Equilibrium Condition: $Q_D = Q_S$

If we check the Rank and Order Conditions, the demand equation in the system above would be over-identified (K = 4, k = 1 and m = 2, so K - k = 3 > m - 1 = 1), so we could not use the ILS regression method to find its b’s. However, we could use the 2SLS method to find the b’s in the demand equation:

1. First (this is the extra stage in 2SLS), regress P_Q on I, P_M, R and G.

2. Use the equation to predict the value of P_Q. This predicted variable is the Instrumental Variable for P_Q.

3. Replace the P_Q in the demand equation with the Instrumental Variable (the predicted P_Q).

4. Estimate the b’s of the demand equation (with the Instrumental Variable replacing P_Q on the right-hand-side) using regression analysis.
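The four steps above can be sketched by hand in SAS with two ordinary regressions. The dataset name dataset03 and the variable names Q, PQ, I, PM, R, G and PQ_hat are assumptions made for illustration. Note that the standard errors printed by the second regression are not the correct 2SLS standard errors, because that regression treats the predicted price as if it were observed data, so this sketch is only meant to show the mechanics of the two stages.

```sas
/* Stage 1: regress the endogenous price on ALL predetermined variables. */
/* dataset03 and all variable names here are assumed, for illustration.  */
proc reg data=dataset03;
   model PQ = I PM R G;
   output out=stage1 p=PQ_hat;   /* PQ_hat = predicted P_Q (the instrument) */
run;
quit;

/* Stage 2: estimate the demand equation with PQ_hat replacing PQ. */
proc reg data=stage1;
   model Q = PQ_hat I;
run;
quit;
```

In practice, a procedure that performs both stages in one call also reports standard errors that account for the first stage, which the manual second regression above does not.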






Indirect Least Squares (ILS) in SAS


The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS in SAS (for an exactly-identified equation, the 2SLS estimates are the same as the ILS estimates). As an example of ILS from macroeconomics, suppose we have data on aggregate consumption (c), national income (y) and aggregate investment (i), and suppose we believe that these variables are related to one another in the following SEM:


$c = \beta_1 + \beta_2 y + u_c$, where $u_c$ is an error term,

$y = c + i + u_y$, where $u_y$ is an error term,


These two equations together are a simultaneous equation model (SEM) because there are two or more variables (in this case, c and y) that are in both equations. The variables c and y are endogenous, because they are in both equations, but the variable i is exogenous because it is in one equation only. In SAS, you must specify the model equations, which variables are endogenous, and which are exogenous. SAS calls the exogenous variables "instruments."


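proc syslin 2sls data=dataset02;
   /* consumption equation: c = b1 + b2*y + u_c (the intercept is added automatically) */
   model c = y;
   /* income equation: it has no b's to estimate, so it is entered as an identity, */
   /* because a MODEL statement lists regressors separated by spaces, not "+" signs */
   identity y = c + i;
   endogenous c y;
   instruments i;
run;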
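For this small model the ILS algebra can be written out in full. As a sketch, substitute the income equation into the consumption equation and solve for c. The symbols π_0 and π_1 for the reduced-form intercept and slope are introduced here for illustration only, and the derivation assumes β_2 ≠ 1:

```latex
\[
\begin{aligned}
c &= \underbrace{\frac{\beta_1}{1-\beta_2}}_{\pi_0} + \underbrace{\frac{\beta_2}{1-\beta_2}}_{\pi_1}\, i + \frac{u_c + \beta_2 u_y}{1-\beta_2},\\
\beta_2 &= \frac{\pi_1}{1+\pi_1}, \qquad \beta_1 = \frac{\pi_0}{1+\pi_1}.
\end{aligned}
\]
```

Regressing c on i gives estimates of π_0 and π_1, and the two formulas on the second line convert those estimates into estimates of β_1 and β_2.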



Two-Stage Least Squares (2SLS) in SAS


The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS. As an example of 2SLS from microeconomics, let's consider supply and demand for product Q. Suppose we have data on the quantity Q of the product traded in various locations, the price P_Q in each location, average consumer income I in each location, the price of a substitute product P_S in each location, and the price of materials P_M in each location. We want to estimate the supply and demand equations:



Demand: $Q = \beta_0 + \beta_1 P_Q + \beta_2 I + \beta_3 P_S + u_D$, where $u_D$ is an error term in the Demand equation,

Supply: $Q = \beta_4 + \beta_5 P_Q + \beta_6 P_M + u_S$, where $u_S$ is an error term in the Supply equation,


These two equations together are a simultaneous system because there are two or more variables (in this case, Q and P_Q) that are in both equations. The variables Q and P_Q are endogenous, because they are in both equations, but the variables I, P_S and P_M are exogenous because each is in only one equation. In SAS, you must specify the model equations, which variables are endogenous, and which are exogenous. SAS calls the exogenous variables "instruments."


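proc syslin 2sls data=dataset02;
   /* PQ, PS and PM are the dataset columns for P_Q, P_S and P_M */
   /* demand equation: quantity on own price, income and substitute price */
   demand: model Q = PQ I PS;
   /* supply equation: quantity on own price and materials price */
   supply: model Q = PQ PM;
   endogenous Q PQ;
   instruments I PS PM;
run;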
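As a check on the identification status of these two equations, the order condition stated earlier in this handout can be applied to each one. This is only a counting sketch; the rank condition is assumed, not verified, here.

```latex
\[
\begin{aligned}
&M = 2 \ (Q,\ P_Q), \qquad K = 3 \ (I,\ P_S,\ P_M),\\
&\text{Demand: } m = 2,\ k = 2 \ (I,\ P_S) \;\Rightarrow\; K - k = 1 = m - 1 \quad \text{(exactly-identified)},\\
&\text{Supply: } m = 2,\ k = 1 \ (P_M) \;\Rightarrow\; K - k = 2 > m - 1 = 1 \quad \text{(over-identified)}.
\end{aligned}
\]
```

Because the Supply equation is over-identified, ILS would not give a single set of b’s for it, which is why the 2SLS option is used for this system.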