UNC-Wilmington
ECN 422
Department of Economics and Finance
Dr. Chris Dumas
Simultaneous Equations Models in SAS
Indirect Least Squares (ILS) and 2-Stage Least Squares (2SLS)
Simultaneous Equations Models (SEMs) are models in which two or more equations share two or more variables that link the equations together in a system. In such models, the variables that appear in two or more equations are said to be “mutually dependent” or “jointly determined,” or they are said to have a “simultaneous” or “two-way” relationship. Such variables affect one another, causing “feedback” relationships between the equations in the system; that is, if something in one equation changes, the change causes a change in another equation which, in turn, “feeds back” to cause a further change in the first equation. Such situations are sometimes referred to as “chicken and the egg” situations, because it is difficult to determine which came first, a change in one equation or a change in a second equation, if the two equations are affecting each other.
The feedback effects in SEMs either spiral out of control or reach an equilibrium of some sort. Usually, they reach an equilibrium (otherwise, our world would be much more explosive than we observe). Many of the standard models of Economics and Finance are SEMs that reach equilibrium (well, usually they reach equilibrium, under typical conditions). Two well-known examples are the Demand and Supply model of Microeconomics and the IS-LM model of Macroeconomics.
The Problem with Simultaneous Equation Models: Simultaneous Equations Bias
Although very common in Economics and Finance, SEMs face a potential problem when it comes to Econometric estimation of the parameters in their equations using OLS regression. Recall that one of the assumptions of OLS regression is that the error term in the regression equation is independent of the X variables in the equation. Well, SEMs typically violate this assumption, with undesirable results, econometrically speaking. (Can I say, “econometrically speaking?” Well, I guess I just did.) To see why this is so, consider as an example the Demand and Supply model from Microeconomics.
Suppose we have data on the quantity Q of a product traded at various locations when the market is in equilibrium, the market price $P_Q$ in each location, average consumer income I in each location, and the price of materials $P_M$ in each location. We want to estimate the parameters (the β’s, called “b’s” in this handout) in the supply and demand equations:
Demand: $Q_D = \beta_0 + \beta_1 P_Q + \beta_2 I + u_D$, where $u_D$ is an error term in the Demand equation,
Supply: $Q_S = \beta_3 + \beta_4 P_Q + \beta_5 P_M + u_S$, where $u_S$ is an error term in the Supply equation,
Equilibrium Condition: $Q_D = Q_S$
Notice first that Demand and Supply are a SEM, because together they are two equations that share two variables in common, Q and $P_Q$. Now, suppose that something outside the model changes, and this change affects Demand. This would appear in the model as a change in $u_D$, the error term in the Demand equation. Let’s say that there is an increase in $u_D$. All else held constant in the Demand equation, this would result in an increase in $Q_D$. This, in turn, would result in an increase in $Q_S$ in equilibrium, because, in equilibrium, $Q_D = Q_S$. Next, in the Supply equation, if $Q_S$ increases, and $P_M$ remains constant, and $u_S$ is a random error unaffected by the change in $Q_S$, then the only way that the equality in the Supply equation can be maintained is if $P_Q$ increases (assuming the usual upward-sloping supply curve, $\beta_4 > 0$). Now, if $P_Q$ in the Supply equation increases, then $P_Q$ in the Demand equation must also increase, because it is the same $P_Q$ in both equations. It is at this point that the Econometric problem occurs: a change in the error term in the Demand equation, $u_D$, affected an “X” variable in the Demand equation, namely, the variable $P_Q$. This violates the assumption of the OLS method that the error term in the regression equation must be independent of the X variables in the equation.
As a second example, consider the IS-LM model from Macroeconomics. Suppose we have data on national output Y, money supply M, and the interest rate r, and we want to estimate the parameters in the IS and LM equations below:
IS Equation: $Y_{IS} = \beta_0 + \beta_1 r + u_{IS}$, where $u_{IS}$ is an error term in the IS equation,
LM Equation: $Y_{LM} = \beta_2 + \beta_3 M + \beta_4 r + u_{LM}$, where $u_{LM}$ is an error term in the LM equation,
Equilibrium Condition: $Y_{IS} = Y_{LM}$
Notice first that IS and LM are a SEM, because together they are two equations that share two variables in common, Y and r. Now, suppose that something outside the model changes, and this change affects the IS equation. This would appear in the model as a change in $u_{IS}$, the error term in the IS equation. Let’s say that there is an increase in $u_{IS}$. All else held constant in the IS equation, this would result in an increase in $Y_{IS}$. This, in turn, would result in an increase in $Y_{LM}$ in equilibrium, because, in equilibrium, $Y_{IS} = Y_{LM}$. Next, in the LM equation, if $Y_{LM}$ increases, and M remains constant, and $u_{LM}$ is a random error unaffected by the change in $Y_{LM}$, then the only way that the equality in the LM equation can be maintained is if r increases. Now, if r in the LM equation increases, then r in the IS equation must also increase, because it is the same r in both equations. It is at this point that the Econometric problem occurs: a change in the error term in the IS equation, $u_{IS}$, affected an “X” variable in the IS equation, namely, the variable r. This violates the OLS assumption that the error term in the regression equation is independent of the X variables in the equation.
Okay, so what? Well, it can be shown that if this assumption of the OLS method is violated, the following negative consequences occur:
Simultaneous Equations Bias:
(1) the estimates of the b’s are biased
(2) the estimates of the b’s are inconsistent (that is, a larger sample size will not diminish the bias)
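The two consequences above can be illustrated with a small simulation. The sketch below is not part of the original handout: it assumes illustrative parameter values, uniformly distributed I and $P_M$, and normally distributed errors, generates equilibrium data from the Demand and Supply model, and then runs OLS on the Demand equation alone. The helper names (`ols`, `simulate_market`, `ols_demand_slope`) are hypothetical.

```python
import random

# Assumed "true" structural parameters (illustrative values only).
B0, B1, B2 = 100.0, -2.0, 0.5   # Demand: Q = B0 + B1*P + B2*I + uD
B3, B4, B5 = 10.0, 3.0, -1.0    # Supply: Q = B3 + B4*P + B5*PM + uS

def ols(rows, y):
    """OLS coefficients of y on the columns of rows (first column is 1.0 for the intercept)."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [x - f * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def simulate_market(n, seed):
    """Draw n equilibrium observations (Q, P, I, PM) from the assumed model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        inc, pm = rng.uniform(20, 60), rng.uniform(5, 15)
        u_d, u_s = rng.gauss(0, 5), rng.gauss(0, 5)
        # Equilibrium price: set Demand equal to Supply and solve for P.
        p = (B3 - B0 - B2 * inc + B5 * pm + u_s - u_d) / (B1 - B4)
        q = B0 + B1 * p + B2 * inc + u_d
        data.append((q, p, inc, pm))
    return data

def ols_demand_slope(n, seed=0):
    """OLS estimate of B1 from regressing Q on P and I, ignoring simultaneity."""
    data = simulate_market(n, seed)
    x = [[1.0, p, inc] for q, p, inc, pm in data]
    y = [q for q, p, inc, pm in data]
    return ols(x, y)[1]

small_n = ols_demand_slope(500)
large_n = ols_demand_slope(20000)
print("true B1:", B1)
print("OLS slope, n = 500:  ", round(small_n, 2))
print("OLS slope, n = 20000:", round(large_n, 2))
```

Under these assumed values, the OLS slope on $P_Q$ should stay well above the true value of -2 in both samples: the larger sample gives a more precise estimate of the wrong number, which is what “inconsistent” means.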
The Identification Problem
The Identification Problem is another problem that is characteristic of SEMs. The Identification Problem is that it can be difficult to determine (that is, to identify) which equation in an SEM system is being estimated in a regression analysis.
For example, suppose you have data on the market quantity Q and market price $P_Q$ of a product traded in the market, and suppose you want to use regression analysis to estimate a demand curve for product Q. These data are plotted twice in the figures below. The same data are plotted in each figure, but two demand curves are drawn through the data in the figure on the left, and two supply curves are drawn through the (same) data in the figure on the right.
[Figures: two plots of the same data, with axes $P_Q$ and Q. Left panel: “Is Demand Shifting?” Right panel: “Or, Is Supply Shifting?”]
(cue spooky music from Halloween)
Look back at the demand and supply equations we considered earlier in this handout. The data points in the figures above could represent the demand equation, with a change in variable “I” causing the demand curve to shift. On the other hand, the data points could instead represent the supply equation, with a change in “$P_M$” causing the supply curve to shift.
The key point: If we have data for only Q and $P_Q$ (and no data for I and $P_M$), then it is difficult to “identify” which equation is represented by the data points. If we nonetheless did a regression analysis with data for only Q and $P_Q$, then we wouldn’t be sure whether we were actually estimating a demand curve or a supply curve. Although it might be obvious from the situation that you are studying whether the curve in this simple example is demand or supply, in more complicated (i.e., more realistic) situations involving more equations and more variables, it can be very difficult to identify which of the equations you are actually estimating when you do a regression analysis, unless you are careful . . .
Identifying an Equation Depends on the Variables that Are in the System but NOT in the Equation!?!
Reconsider the problem of identifying whether the demand equation or the supply equation is represented by the figures above. If we had data on variable I, then as variable I changed its value, the demand curve would shift along the supply curve, and the data points would show us the location of the supply curve. That is, a change in the value of a variable in the demand curve allows us to “see,” or “identify,” the supply curve, as illustrated in the left-hand figure below.
Notice in the left-hand figure that identifying the supply curve depends on changes in the value of a variable that is NOT in the supply curve, namely, the variable I that is in the demand curve. The variable I is in the demand-supply system of equations, but it is not in the supply curve; this is what allows us to use the variable I to identify the supply curve.
Now consider the problem of identifying the demand curve. If we had data on variable $P_M$ in the supply curve, then as variable $P_M$ changed its value, the supply curve would shift along the demand curve, and the resulting data points would show us the location of the demand curve. That is, a change in the value of a variable in the supply curve allows us to “see,” or “identify,” the demand curve, as illustrated in the right-hand figure below.
Identifying the demand curve depends on changes in the value of a variable that is NOT in the demand curve, namely, the variable $P_M$ that is in the supply curve. The variable $P_M$ is in the demand-supply system of equations, but it is not in the demand curve; this is what allows us to use the variable $P_M$ to identify the demand curve.
(Tricky, no? And very Zen master-like, wouldn’t you agree, grasshopper?)
[Figures: two plots with axes $P_Q$ and Q. Left panel: “A shifting Demand Curve reveals the location of the Supply Curve.” Right panel: “A shifting Supply Curve reveals the location of the Demand Curve.”]
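The idea that a shifting curve reveals the other curve can be checked numerically. The sketch below is an illustration under assumed parameter values (they are not from the handout), with the error terms switched off so the pattern is exact. The function names `equilibrium` and `slope` are hypothetical.

```python
# Assumed illustrative parameters (not from the handout).
B0, B1, B2 = 100.0, -2.0, 0.5   # Demand: Q = B0 + B1*P + B2*I
B3, B4, B5 = 10.0, 3.0, -1.0    # Supply: Q = B3 + B4*P + B5*PM

def equilibrium(inc, pm):
    """Equilibrium (P, Q) with error terms switched off for clarity."""
    p = (B3 - B0 - B2 * inc + B5 * pm) / (B1 - B4)
    q = B0 + B1 * p + B2 * inc
    return p, q

def slope(points):
    """Slope of the least-squares line of Q on P."""
    n = len(points)
    mean_p = sum(p for p, q in points) / n
    mean_q = sum(q for p, q in points) / n
    sxy = sum((p - mean_p) * (q - mean_q) for p, q in points)
    sxx = sum((p - mean_p) ** 2 for p, q in points)
    return sxy / sxx

# Only income I changes: Demand shifts, so the points trace out Supply.
demand_shifts = [equilibrium(inc, 10.0) for inc in range(20, 61, 5)]
# Only the materials price PM changes: Supply shifts, so the points trace out Demand.
supply_shifts = [equilibrium(40.0, pm) for pm in range(5, 16)]

print("slope when only I varies: ", round(slope(demand_shifts), 6))   # compare with B4
print("slope when only PM varies:", round(slope(supply_shifts), 6))   # compare with B1
```

When only I moves, the fitted slope matches the supply slope `B4`; when only `PM` moves, it matches the demand slope `B1`. Each curve is located by a variable that belongs to the other equation.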
Terminology Involved with Identifying the Equations in SEMs
Endogenous Variables are variables whose values are determined inside the SEM. These are the “Y” variables that you are using the model to “solve for.” For example, in a model of demand and supply, the endogenous variables would be Q and $P_Q$. There might be other variables in the demand and supply equations, but we would need to be given the values of these other variables to plug into the equations in order to solve for the values of the endogenous variables, Q and $P_Q$.
In contrast to Endogenous Variables, Predetermined Variables are variables whose values are determined outside the SEM. We must be “given” the values of these variables; we do not solve for them using the model. The point of Predetermined Variables is to act as “controls” on the relationship between the Endogenous Variables in the model; that is, the Predetermined Variables act as shifters, shifting around the relationships among the Endogenous Variables so that we can “see,” or “identify,” the relationships among the Endogenous Variables.
There are two sub-types of Predetermined Variables:
1) Exogenous Variables: These are the variables that are not endogenous variables and have never been endogenous variables. These are the “X” variables in the model, the variables that you are using to help explain and predict movements in the endogenous “Y” variables.
2) Lagged Endogenous Variables: These are endogenous variables from earlier time periods. If we know the values of endogenous variables from earlier time periods, sometimes these are helpful in explaining and predicting the movements of the endogenous variables in the current time period. For example, if we are trying to predict the value of Y in period t, we might be able to use the value of Y in period t-1 to help us make a better prediction. If we include the values of Y variables from earlier time periods in our SEM, then these are considered a type of Predetermined Variable, because we know the values from earlier time periods (they are “givens”), and we don’t need to solve for them using the model. These Lagged Endogenous Variables are considered another type of “X” variable in the model, because they help explain and predict the movements of the endogenous “Y” variables in the current time period.
Structural/Behavioral Equations are the original equations in the SEM that represent structural features of the economy or behavioral aspects of individuals in the economy. For example, the LM curve from macroeconomics is a Structural Equation, because it represents the structure of the relationship between output, money supply and interest rates in the economy. The demand curve from microeconomics is an example of a Behavioral Equation, because it represents the behavior of consumers in a market. Structural/Behavioral Equations are constructed from endogenous and predetermined variables. The parameters (the b’s) of Structural/Behavioral Equations are called, perhaps not surprisingly, Structural/Behavioral Parameters.
Reduced Form Equations are derived from Structural/Behavioral Equations and express the endogenous variables solely as functions of the predetermined variables. The Reduced Form Equations are derived by solving for the endogenous variables in the Structural/Behavioral Equations of the SEM. Much to the chagrin of creative people everywhere, the parameters of Reduced Form Equations are named . . . Reduced Form Parameters. (sigh)
Indirect Least Squares (ILS) Regression Analysis
The point of making the distinction between Structural/Behavioral Equations and Reduced Form Equations is that Reduced Form Equations do not suffer from the problem of Simultaneous Equations Bias (yea!), because their right-hand sides contain only predetermined variables. If the Simultaneous Equation Model (SEM) has enough of the right kinds of variables in the right positions, then we can use the Indirect Least Squares (ILS) Regression Analysis method to solve for the b’s in the Structural/Behavioral Equations. The ILS method proceeds as follows:
1. Derive the Reduced Form Equations from the Structural/Behavioral Equations,
2. Estimate the b’s of the Reduced Form Equations using regression analysis (without Simultaneous Equations Bias, yea!), and
3. Calculate the b’s of the Structural/Behavioral Equations based on the b’s from the Reduced Form Equations (and recall that calculating the b’s of the Structural/Behavioral Equations was our original goal. Nice!).
For example, suppose we were working with the demand and supply equations described earlier in this handout:
Demand: $Q_D = \beta_0 + \beta_1 P_Q + \beta_2 I + u_D$, where $u_D$ is an error term in the Demand equation,
Supply: $Q_S = \beta_3 + \beta_4 P_Q + \beta_5 P_M + u_S$, where $u_S$ is an error term in the Supply equation,
Equilibrium Condition: $Q_D = Q_S$
These are the Structural/Behavioral Equations of the demand and supply SEM. The endogenous variables are Q and $P_Q$, and the predetermined variables are I and $P_M$. In this particular example, both predetermined variables are exogenous variables, and we don’t have any lagged endogenous variables in the system. Now, to derive the Reduced-Form Equations for this demand and supply SEM, we simply solve for the values of the endogenous variables (Q and $P_Q$) in the system:
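Because of the Equilibrium Condition, we can set Demand equal to Supply . . .
$$\beta_0 + \beta_1 P_Q + \beta_2 I + u_D = \beta_3 + \beta_4 P_Q + \beta_5 P_M + u_S$$
and solve for the endogenous variable $P_Q$ . . .
$$P_Q = \left[\frac{\beta_3 - \beta_0}{\beta_1 - \beta_4}\right] + \left[\frac{-\beta_2}{\beta_1 - \beta_4}\right] I + \left[\frac{\beta_5}{\beta_1 - \beta_4}\right] P_M + \left[\frac{u_S - u_D}{\beta_1 - \beta_4}\right]$$
this is the Reduced-Form Equation for $P_Q$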
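Next, plug the Reduced-Form Equation for $P_Q$ back into either Demand or Supply, and solve for Q:
$$Q = \left[\frac{\beta_1 \beta_3 - \beta_0 \beta_4}{\beta_1 - \beta_4}\right] + \left[\frac{-\beta_2 \beta_4}{\beta_1 - \beta_4}\right] I + \left[\frac{\beta_1 \beta_5}{\beta_1 - \beta_4}\right] P_M + \left[\frac{\beta_1 u_S - \beta_4 u_D}{\beta_1 - \beta_4}\right]$$
this is the Reduced-Form Equation for Q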
Now we can do regression analysis on the Reduced-Form Equations, and we won’t have a problem with Simultaneous Equations Bias.
Notice that the terms in brackets are either collections of constants or collections of constants and error terms. Each collection of constants acts as a big constant, so we’ll replace each collection of constants with a “mega-constant” (I just invented that term). The “mega-constants” are the Reduced-Form Coefficients. Also, each collection of constants and error terms acts as a big error term, so we’ll replace each of these collections with a “mega-error term” (I just invented that term, too). I’ll use tildes (squiggles) to denote the mega terms in the Reduced-Form equation for $P_Q$, and I’ll use hats to denote the mega terms in the Reduced-Form equation for Q, like this:
$$P_Q = \tilde{\beta}_0 + \tilde{\beta}_1 I + \tilde{\beta}_2 P_M + \tilde{u}$$
$$Q = \hat{\beta}_0 + \hat{\beta}_1 I + \hat{\beta}_2 P_M + \hat{u}$$
The “tilde-b’s” and “hat-b’s” in the equations above are the Reduced-Form Coefficients. Run a regression analysis separately on each of the Reduced-Form equations above (the tilde and hat equations) to get numbers for the tilde-b’s and hat-b’s. Then, you can set each of the tilde-b’s and hat-b’s equal to the bracketed collection of b’s that it represents, and, with some tedious algebra, solve for the original b’s in the original Structural-Behavioral Equations!! (ta-da!)
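The “tedious algebra” can be sketched for the demand and supply example. Assume (my labeling, not necessarily the handout’s) that the Reduced-Form equations are written $P_Q = \tilde{\beta}_0 + \tilde{\beta}_1 I + \tilde{\beta}_2 P_M + \tilde{u}$ and $Q = \hat{\beta}_0 + \hat{\beta}_1 I + \hat{\beta}_2 P_M + \hat{u}$. Matching these coefficients to the structural equations gives $\beta_4 = \hat{\beta}_1 / \tilde{\beta}_1$, $\beta_1 = \hat{\beta}_2 / \tilde{\beta}_2$, $\beta_0 = \hat{\beta}_0 - \beta_1 \tilde{\beta}_0$, $\beta_3 = \hat{\beta}_0 - \beta_4 \tilde{\beta}_0$, $\beta_2 = -\tilde{\beta}_1 (\beta_1 - \beta_4)$ and $\beta_5 = \tilde{\beta}_2 (\beta_1 - \beta_4)$. The simulation below is a minimal illustration with made-up parameter values and hypothetical helper names, not a reference implementation.

```python
import random

# Assumed "true" structural parameters (illustrative values only).
B0, B1, B2 = 100.0, -2.0, 0.5   # Demand: Q = B0 + B1*P + B2*I + uD
B3, B4, B5 = 10.0, 3.0, -1.0    # Supply: Q = B3 + B4*P + B5*PM + uS

def ols(rows, y):
    """OLS coefficients of y on the columns of rows (first column is 1.0 for the intercept)."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [x - f * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def indirect_least_squares(n=20000, seed=1):
    """Simulate equilibrium data, then recover the structural b's by ILS."""
    rng = random.Random(seed)
    x, price, qty = [], [], []
    for _ in range(n):
        inc, pm = rng.uniform(20, 60), rng.uniform(5, 25)
        u_d, u_s = rng.gauss(0, 2), rng.gauss(0, 2)
        p = (B3 - B0 - B2 * inc + B5 * pm + u_s - u_d) / (B1 - B4)
        q = B0 + B1 * p + B2 * inc + u_d
        x.append([1.0, inc, pm])
        price.append(p)
        qty.append(q)
    # Step 2: estimate both Reduced-Form equations by OLS.
    t0, t1, t2 = ols(x, price)   # "tilde" coefficients (P equation)
    h0, h1, h2 = ols(x, qty)     # "hat" coefficients (Q equation)
    # Step 3: solve back for the structural b's.
    b4 = h1 / t1
    b1 = h2 / t2
    return {"b0": h0 - b1 * t0, "b1": b1, "b2": -t1 * (b1 - b4),
            "b3": h0 - b4 * t0, "b4": b4, "b5": t2 * (b1 - b4)}

estimates = indirect_least_squares()
for name in sorted(estimates):
    print(name, round(estimates[name], 3))
```

With these assumed values, the recovered b’s should land close to the values used to generate the data. This works only because each equation here is exactly identified.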
When Does the ILS Regression Method Actually Work? The Rank and Order Conditions
Sadly, when regression analysis is used to estimate the Reduced-Form Coefficients (the “tilde-b’s” and “hat-b’s”) in the Reduced-Form Equations, it is not always possible to use these values to solve for the original b’s in the original Structural-Behavioral Equations. (Egad!)
The Rank and Order Conditions are a set of rules that determine whether it is possible to solve for the original Structural-Behavioral Coefficients from the Reduced-Form Coefficients. To use the Rank and Order Conditions, we need to define a few more terms:
M = the number of endogenous variables in the SEM system of equations
m = the number of endogenous variables in the equation of interest (the equation for which you want the b’s)
K = the number of predetermined variables in the SEM system of equations
k = the number of predetermined variables in the equation of interest (the equation for which you want the b’s)
A = the matrix constructed from the b’s, in the other equations of the system, of the variables excluded from the equation of interest
Okay, with the terms above, we can now give the Rank and Order Conditions (drum roll . . .):
If (K - k < m - 1), then the equation of interest is under-identified
If (K - k = m - 1) AND (rank of matrix A = M - 1), then the equation of interest is exactly-identified
If (K - k = m - 1) AND (rank of matrix A < M - 1), then the equation of interest is under-identified
If (K - k > m - 1) AND (rank of matrix A = M - 1), then the equation of interest is over-identified
If (K - k > m - 1) AND (rank of matrix A < M - 1), then the equation of interest is under-identified
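The counting part of these rules (the order condition) is easy to mechanize. The function below is a minimal sketch with a hypothetical name, `order_condition`; it deliberately does not check the rank of matrix A, so its “exactly” and “over” answers hold only if the rank condition also holds.

```python
def order_condition(K, k, m):
    """Classify an equation using the order condition only.

    K: predetermined variables in the system
    k: predetermined variables in the equation of interest
    m: endogenous variables in the equation of interest
    The rank condition (rank of A = M - 1) is NOT checked here.
    """
    excluded = K - k   # predetermined variables left out of the equation
    needed = m - 1     # right-hand-side endogenous variables to account for
    if excluded < needed:
        return "under-identified"
    if excluded == needed:
        return "exactly-identified"
    return "over-identified"

# Demand equation from the ILS example: K = 2 (I, P_M), k = 1 (I), m = 2 (Q, P_Q).
print(order_condition(K=2, k=1, m=2))
# A system observed with no predetermined variables at all (only Q and P_Q).
print(order_condition(K=0, k=0, m=2))
```

The second call corresponds to the earlier scatter plot of Q against $P_Q$ alone, where neither curve could be told apart.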
Exactly-identified means that you will be able to use the ILS Regression Method to solve for the b’s in the equation of interest. (Yea!)
Under-identified means that there are not enough variables in the SEM that are excluded from the equation of interest to be able to solve for the b’s in the equation of interest. So, you will need to go “back to the drawing board” to change the equations in your SEM or change the variables that are in your SEM until you achieve an exactly-identified or over-identified SEM.
Over-identified means that you will be able to solve for the b’s in the equation of interest, but, ironically, there will be more than one set of solutions for the b’s in the equation of interest, and you don’t know which set is the true set! In this case, there is an extra step, or “stage,” in the analysis that you can do in order to obtain estimates of the “true” set of b’s. Perhaps not surprisingly, the analysis method that involves the extra stage is called (you can’t make this stuff up) . . . Two-Stage Least Squares regression analysis (affectionately abbreviated 2SLS).
Two-Stage Least Squares (2SLS) Regression Analysis
Two-Stage Least Squares (2SLS) Regression Analysis is a method of estimating the original b’s in the original Structural/Behavioral Equations of an SEM when the SEM is over-identified. The steps of the method are:
1. First (this is the extra stage in 2SLS), regress each endogenous variable on all of the predetermined variables in the system.
2. Use the equations to predict the values of the endogenous variables. These “predicted variables” are called Instrumental Variables.
3. Replace any endogenous variables appearing as right-hand-side “X” variables in the equation of interest with the corresponding Instrumental Variables.
4. Estimate the b’s of the equation of interest (with the Instrumental Variables replacing the endogenous right-hand-side X variables) using regression analysis.
For example, suppose we were working with the demand and supply equations described earlier in this handout, but the supply equation had some additional exogenous variables in it, “R” and “G” (doesn’t matter what they are):
Demand: $Q_D = \beta_0 + \beta_1 P_Q + \beta_2 I + u_D$, where $u_D$ is the error term in Demand
Supply: $Q_S = \beta_3 + \beta_4 P_Q + \beta_5 P_M + \beta_6 R + \beta_7 G + u_S$, where $u_S$ is the error term in Supply
Equilibrium Condition: $Q_D = Q_S$
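If we check the Rank and Order Conditions, the demand equation in the system above would be over-identified (K = 4 predetermined variables in the system, k = 1 in the demand equation, and m = 2 endogenous variables, so K - k = 3 > m - 1 = 1), so we could not use the ILS regression method to find its b’s. However, we could use the 2SLS method to find the b’s in the demand equation: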
1. First (this is the extra stage in 2SLS), regress $P_Q$ on I, $P_M$, R and G.
2. Use the equation to predict the value of $P_Q$. This predicted variable is the Instrumental Variable for $P_Q$.
3. Replace the $P_Q$ in the demand equation with the Instrumental Variable (the predicted $P_Q$).
4. Estimate the b’s of the demand equation (with the Instrumental Variable replacing $P_Q$ on the right-hand-side) using regression analysis.
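The four steps above can be traced by hand in a short simulation. The sketch below assumes illustrative parameter values and error distributions (none of them come from the handout) and uses hypothetical helper names; it compares the 2SLS estimate of the demand slope with the naive OLS estimate.

```python
import random

# Assumed "true" structural parameters (illustrative values only).
B0, B1, B2 = 100.0, -2.0, 0.5                     # Demand: Q = B0 + B1*P + B2*I + uD
B3, B4, B5, B6, B7 = 10.0, 3.0, -1.0, 2.0, -1.5   # Supply: adds B6*R + B7*G

def ols(rows, y):
    """OLS coefficients of y on the columns of rows (first column is 1.0 for the intercept)."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [x - f * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def two_stage_least_squares(n=10000, seed=2):
    """Return (2SLS, naive OLS) coefficient lists for the demand equation."""
    rng = random.Random(seed)
    exog, price, qty, income = [], [], [], []
    for _ in range(n):
        inc, pm = rng.uniform(20, 60), rng.uniform(5, 25)
        r, g = rng.uniform(0, 10), rng.uniform(0, 10)
        u_d, u_s = rng.gauss(0, 5), rng.gauss(0, 5)
        # Equilibrium price from setting Demand equal to Supply.
        p = (B3 - B0 - B2 * inc + B5 * pm + B6 * r + B7 * g + u_s - u_d) / (B1 - B4)
        q = B0 + B1 * p + B2 * inc + u_d
        exog.append([1.0, inc, pm, r, g])
        price.append(p)
        qty.append(q)
        income.append(inc)
    # Steps 1-2: regress P on all predetermined variables and form predicted P.
    first_stage = ols(exog, price)
    p_hat = [sum(c * v for c, v in zip(first_stage, row)) for row in exog]
    # Steps 3-4: replace P with predicted P in the demand equation and run OLS.
    tsls = ols([[1.0, ph, inc] for ph, inc in zip(p_hat, income)], qty)
    # For comparison: OLS on the demand equation with the original P.
    naive = ols([[1.0, p, inc] for p, inc in zip(price, income)], qty)
    return tsls, naive

tsls, naive = two_stage_least_squares()
print("true B1:", B1)
print("2SLS estimate of B1:", round(tsls[1], 3))
print("OLS estimate of B1: ", round(naive[1], 3))
```

One caution on running the second stage by hand like this: the coefficient estimates are the 2SLS estimates, but the standard errors printed by an ordinary second-stage regression are not correct. That is one reason to use a dedicated procedure such as PROC SYSLIN in practice.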
Indirect Least Squares (ILS) in SAS
The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS in SAS. As an example of ILS from macroeconomics, suppose we have data on aggregate consumption (c), national income (y) and aggregate investment (i), and suppose we believe that these variables are related to one another in the following SEM:
$c = \beta_1 + \beta_2 y + u_c$, where $u_c$ is an error term,
$y = c + i + u_y$, where $u_y$ is an error term,
These two equations together are a simultaneous equation model (SEM) because there are two or more variables (in this case, c and y) that are in both equations. The variables c and y are endogenous, because they are in both equations, but the variable i is exogenous because it is in one equation only. In SAS, you must specify the model equations, which variables are endogenous, and which are exogenous. SAS calls the exogenous variables "instruments."
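proc syslin 2sls data=dataset02;
   /* consumption equation: c regressed on y */
   model c = y;
   /* income equation: it has no b's to estimate, so it is entered here as an
      identity (an assumption); a MODEL statement lists regressors separated
      by spaces and does not accept "+" */
   identity y = c + i;
   endogenous c y;
   instruments i;
run;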
Two Stage Least Squares (2SLS) in SAS
The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS. As an example of 2SLS from microeconomics, let's consider supply and demand for product Q. Suppose we have data on the quantity Q of the product traded in various locations, the price $P_Q$ in each location, average consumer income I in each location, the price of a substitute product $P_S$ in each location, and the price of materials $P_M$ in each location. We want to estimate the supply and demand equations:
Demand: $Q = \beta_0 + \beta_1 P_Q + \beta_2 I + \beta_3 P_S + u_D$, where $u_D$ is an error term in the Demand equation,
Supply: $Q = \beta_4 + \beta_5 P_Q + \beta_6 P_M + u_S$, where $u_S$ is an error term in the Supply equation,
These two equations together are a simultaneous system because there are two or more variables (in this case, Q and $P_Q$) that are in both equations. The variables Q and $P_Q$ are endogenous, because they are in both equations, but the variables I, $P_S$ and $P_M$ are exogenous because each is in only one equation. In SAS, you must specify the model equations, which variables are endogenous, and which are exogenous. SAS calls the exogenous variables "instruments."
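proc syslin 2sls data=dataset02;
   /* PQ, PS and PM are assumed to be the dataset names for P_Q, P_S and P_M */
   /* demand equation */
   model Q = PQ I PS;
   /* supply equation */
   model Q = PQ PM;
   endogenous Q PQ;
   instruments I PS PM;
run;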