Determinants of Dying from Coronary Artery Disease
By
Justin Brown
Student
Economics Major
Submitted to:
Dr. Jackie Khorassani
Instructor of Econ 421
2005
Introduction:
In 2001, 700,000 people died from coronary artery disease. Coronary artery disease is the hardening of the arteries near the heart. This hardening can lead to reduced blood flow, heart attacks, and death (“What is Coronary...” 2003). Why are so many people dying from coronary artery disease? Using OLS regression analysis of 49 observations in the year 2000, I will examine thirteen variables that may affect coronary artery disease. The thirteen variables are education level, income level, lack of health insurance, state health expenditures, alcohol consumption, inactivity, state stress levels, tobacco consumption, average age of the population, diabetes, high blood cholesterol, high blood pressure, and obesity.
This study is organized in seven sections. Section one is the introduction. Section two is a description of each variable and the reasons for each variable’s inclusion in my model. Section three discusses the meaning of the raw data. Sections four and five test my regression analysis for common errors. Section six discusses the significance of each variable and the effects of each significant variable. Section seven concludes with a brief wrap-up of this study.
Empirical Model:
To measure the effects of thirteen factors on the death rate from coronary artery disease among the US population in 2001, Equation 1 is estimated with a cross-sectional data set that consists of 49 observations from 49 US states [1]. The method of estimation is OLS, and the estimation software is EViews:
Equation 1: LCAD = F(EDU, INC, LHI, SHE, ALC, INY, SSL, TOB, AAP, DIB, HBC, HBP, OBY) + error term
[1] Florida was excluded for lack of state health expenditure data.
The dependent variable, LCAD, is the number of people per 100,000 population who die from coronary artery disease. Table 1 gives the definition of each independent variable and its expected effect on the dependent variable.
Table 1: Independent Variables

| Variable | Definition | Expected Sign of the Coefficient |
|---|---|---|
| EDU | percentage of the 25 and older population that has a college degree | negative |
| INC | per capita disposable personal income in current dollars | negative |
| LHI | percent of the population without health insurance | positive |
| SHE | per capita dollars spent by the state on health care | negative |
| ALC | average gallons of beer consumed per person | ambiguous |
| INY | percentage of adults with no leisure-time physical activity | positive |
| SSL | percent of the population of each state living in metropolitan areas | ambiguous |
| TOB | percentage of the population that has reported having smoked 100 or more cigarettes during their lifetime and who currently smoke every day or some days | positive |
| AAP | average age of the population | positive |
| DIB | percent of the population that has been diagnosed with diabetes | positive |
| HBC | percent of the population with high cholesterol | positive |
| HBP | percentage of adults with high blood pressure | positive |
| OBY | percentage of adults who were obese | positive |
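As a rough illustration of what estimating an equation like Equation 1 by OLS involves, the sketch below solves the normal equations in plain Python. It is not the EViews procedure used for this paper: the data are randomly generated, only two regressors are used, and the "true" coefficients (200, 150, -2) are hypothetical values chosen for the example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Return OLS coefficients [intercept, b1, ..., bk] from (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: 49 "states", one proportion-type regressor and one age-type regressor.
random.seed(1)
true_beta = [200.0, 150.0, -2.0]
rows = [(random.uniform(0.15, 0.40), random.uniform(27.0, 39.0)) for _ in range(49)]
y = [true_beta[0] + true_beta[1] * a + true_beta[2] * b + random.gauss(0, 0.5)
     for a, b in rows]

beta_hat = ols(rows, y)
print("estimated [intercept, slope 1, slope 2]:", beta_hat)
```

With 49 observations and little noise, the estimated slopes land close to the hypothetical values, which is all the example is meant to show.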
Coronary artery disease is the hardening of the arteries near the heart, which can lead to reduced blood flow, heart attacks, and death (“What is Coronary...” 2003).
The nature of my thirteen independent variables allows me to summarize them in three categories: economic independent variables, lifestyle independent variables, and medical/genetic independent variables. The economic independent variables are EDU (education), INC (income), LHI (lack of health insurance), and SHE (state health expenditures). The lifestyle independent variables are ALC (alcohol), INY (inactivity), SSL (state stress levels), and TOB (tobacco). The medical/genetic independent variables are AAP (average age of the population), DIB (diabetes), HBC (high blood cholesterol), HBP (high blood pressure), and OBY (obesity).
Economic Independent Variables:
The first economic variable is EDU, or education. EDU measures the percentage of the 25 and older population that has a college degree in each state in the year 2000. The effect of education on the number of cases of lethal coronary artery disease overlaps with the effect of many of the other variables in my study. The Tromso Heart Study (1988) found that more educated people are less likely to be overweight, seem to smoke less, are more physically active, and have better diets (“Risk factors for...” 1988). That is why I expect the sign of the coefficient of EDU to be negative.
The second economic variable is INC, or income. More specifically, INC measures the per capita disposable personal income in current dollars in each state in the year 2000. A health study (2001) out of Canada shows that, in general, the higher the income level, the more likely a person is to have an active lifestyle, maintain a healthy weight, not smoke, and not drink dangerous amounts of alcohol (“Health and Wealth...” 2001). Based on this study, I expect the sign of the coefficient for INC to be negative.
The third economic variable is LHI, or lack of health insurance. LHI measures the percentage of the population who did not have health insurance in each state in 2002. According to a 2003 report by the Robert Wood Johnson Foundation, the uninsured are more likely than those who have health coverage to receive second-rate care and to die from health-related problems (Anil Kumar. 2004). Given this finding, I expect the sign of the coefficient of LHI to be positive.
The fourth economic variable is SHE, or state healthcare expenditures. SHE measures the per capita dollars spent by the state on health care in each state in the year 2000. The effect of this variable on the percentage of the population that dies from coronary artery disease is similar to the effects of income and health insurance. State healthcare expenditures can take different forms, such as funding clinics and hospitals, state-funded insurance, and funds given directly to citizens to spend on healthcare. The money spent on clinics and hospitals improves the quality of the care provided, which should decrease lethal coronary artery disease. State-funded health insurance and funds given to citizens increase the quantity of health care a person can afford, meaning fewer cases of lethal coronary artery disease. Due to these effects, I expect the sign of the coefficient for SHE to be negative.
Lifestyle Independent Variables:
The first lifestyle variable is ALC, or alcohol consumption. More specifically, ALC measures the average gallons of beer consumed per person in each state in the year 2000. The effects of alcohol on the heart are uncertain. One study by the Cleveland Clinic Heart Center (2004) found that “moderate alcohol consumption (wine or beer) does offer some protection against heart disease for some people” (“Heart Disease: Alcohol...” 2004). Alcohol’s poisonous effects, however, may be dangerous to the heart. The same article warns that those who already have heart disease should avoid alcohol, and it also warns against starting to drink, because the same benefits can be produced through healthy eating and exercise (“Heart Disease: Alcohol...” 2004). Due to the uncertainty of the effects of alcohol, I treat the expected sign of the coefficient of ALC as ambiguous.
The second lifestyle variable is INY, or inactivity. This variable is measured as the percentage of adults who reported no leisure-time physical activity in each state in 2000. Inactivity prevents the heart from benefiting from exercise. Exercise has numerous benefits for the heart, including strengthening the heart and cardiovascular system, improving circulation and helping the body use oxygen better, improving heart failure symptoms, lowering blood pressure, and helping reduce stress, tension, anxiety, and depression (“Heart Disease: Exercise...” 2004). Because being active is good for the heart, I predict that the sign of the coefficient of INY will be positive.
The third lifestyle variable is SSL, or state stress levels. SSL is measured as the percentage of the population of each state living in metropolitan areas in 2000. This is not a perfect measure of stress. It doesn’t include other sources of stress outside of living in a city, such as the number of children per couple, the nature of people’s jobs, or how well people respond to stressful situations. Data for these and other sources of stress are not available; therefore, I was unable to include them in my measure of stress. I also have to consider that people in metropolitan areas most likely have better access to healthcare, which may affect the results for this variable. The connection between stress and heart health has not been proven. That is because, according to an article by the Texas Heart Institute (2004) on heart disease, people define and respond to stress in different ways (“Causes of Heart Disease.” 2004). It is hard to determine why stress may be damaging to the heart. In general, however, the article points to three effects of stress that would damage the heart: 1) stressful situations increase heart rate and blood pressure, which makes the heart demand additional oxygen; 2) during stress, extra hormones are released, which causes blood pressure to increase; and 3) stress increases the amount of clotting agents flowing in the blood. The need for additional oxygen can cause angina (pain in and around the heart) in persons with preexisting heart disease. Angina can damage the heart and blood vessels further, leading to hardening of the arteries. The increase in blood pressure can damage artery walls. When this damage heals, the arteries may become hard and more prone to collect plaque. Additional clotting agents in the blood make it more likely that a clot will form in arteries that are already partially blocked by plaque (“Causes of Heart Disease.” 2004). Considering these effects of stress on the heart, but keeping in mind the problems of measuring stress in this manner, I treat the expected sign of the coefficient of SSL as ambiguous.
The fourth lifestyle variable is TOB, or tobacco smoke consumption. TOB is measured as the percentage of the population that has reported having smoked 100 or more cigarettes during their lifetime and who currently smoke every day or some days in each state as of the year 2000. There is a strong link between smoking and developing lethal coronary artery disease. According to The Cleveland Clinic, smoking increases the risk of coronary artery disease in four ways: 1) decreased oxygen to the heart, 2) increased blood pressure and heart rate, 3) increased blood clotting, and 4) damage to the cells that line the coronary arteries and other blood vessels (“Heart Disease: Smoking...” 2004). I have already established that these effects are damaging and will lead to more cases of lethal coronary artery disease. That is why I expect the sign of the coefficient of TOB to be positive.
Medical/Genetic Independent Variables:
The first medical/genetic variable is AAP, or average age of the population. AAP is measured exactly as it sounds: the average age of the population in each state in 2000. It is common sense that the older you are, the more health problems you are likely to have. Statistics also show that about 80% of the deaths from coronary artery disease occur in people age 65 and older (“Coronary.” 2005). Therefore, I predict that the sign of the coefficient of AAP is positive.
The second medical/genetic variable is DIB, or diabetes. This is measured as the percentage of the population that has been diagnosed as having diabetes in each state in the year 2001. The reasons why diabetes increases cases of coronary artery disease are not completely understood. However, according to The Cleveland Clinic, the high glucose levels in the blood from diabetes may damage the small blood vessels of the heart and predispose a person to atherosclerosis (hardening) of the large arteries (“Diabetes...” 2004). Since this hardening is the definition of coronary artery disease, I expect the sign of the coefficient of DIB to be positive.
The third medical/genetic variable is HBC, or high blood cholesterol. HBC is measured as the percentage of the adult population in each state that reported having high cholesterol in 2001. According to the National Heart Lung and Blood Institute (2003), too much cholesterol in your blood can build up in the walls of your arteries. This buildup of cholesterol is called plaque. Over time, plaque can cause hardening of the arteries (“What is Coronary...” 2003). Given that coronary artery disease is defined as hardening of the arteries, high cholesterol is a cause of coronary artery disease (“What is Coronary...” 2003). Therefore, I expect the sign of the coefficient of HBC to be positive.
The fourth medical/genetic variable is HBP, or high blood pressure. This is measured as the percentage of adults who have ever been told by a health-care provider that they have high blood pressure. High blood pressure may be caused by smoking, excessive alcohol consumption, inactivity, and obesity, all of which are among my thirteen independent variables (“Heart Disease: Risk ...” 2004). However, to a certain extent, high blood pressure seems to be genetic, and may not be a good indicator of the overall health of the circulatory system. Still, because the long list of possible causes makes high blood pressure the most common coronary artery disease risk factor, it should have a direct impact.
High blood pressure increases cases of lethal coronary artery disease for two reasons: 1) it makes the heart work harder to supply the body with blood and 2) it contributes to the hardening of the arteries.
Why does it make the heart work harder? First, for clarification, high blood pressure causes the heart to work harder, but the heart working harder does not necessarily cause high blood pressure. Blood pressure is determined by two forces: 1) the pumping of the heart, and 2) the force of the arteries resisting the blood flow (“Blood Pressure” 2004). In most cases, it is the increase of resistance to the blood flow from the arteries that causes high blood pressure. As the resistance to blood flow is increased, the heart must work harder to accomplish its job. A harder-working heart has a shorter life. Moreover, the increase in resistance to blood flow from the arteries happens when arteries are damaged and harden. High blood pressure, therefore, can be considered a sign that some coronary artery disease may be present. Therefore, I predict the sign of the coefficient of HBP is positive.
The fifth medical/genetic variable is OBY, or obesity. OBY is measured as the percentage of adults who were obese in each state in 2001. Since inactivity can cause obesity, the two have the same links to coronary artery disease, but there are additional effects from obesity. A study by the American Heart Association (1997) finds that obesity is connected to heart disease both indirectly (through other factors) and directly (“Obesity and Heart Disease” 1997). Considering this evidence, I expect the sign of the coefficient of OBY to be positive.
Data Analysis:
Table 2 shows the lowest values and their corresponding states, the highest values and their corresponding states, and the mean values for each variable.
Table 2: Data Analysis: Maximum, Minimum, and Mean

| Variable | Minimum | Maximum | Mean |
|---|---|---|---|
| LCAD | 171.0 (Minnesota) | 329.0 (Mississippi) | 238.12 |
| EDU | 15.3% (West Virginia) | 34.6% (Colorado) | 25.0% |
| INC | $19,258 (West Virginia) | $32,556 (Connecticut) | $24,076 |
| LHI | 7.9% (Minnesota) | 21.1% (New Mexico) | 13.8% |
| SHE | $504.09 (Nevada) | $2,001.49 (Alaska) | $970.55 |
| ALC | 12.59 gallons (Utah) | 33.09 gallons (Nevada) | 22.87 gallons |
| INY | 15.5% (Utah) | 41.1% (Kentucky) | 26.8% |
| SSL | 27.8% (Vermont) | 100.0% (New Jersey) | 67.2% |
| TOB | 12.9% (Utah) | 29.1% (Nevada) | 22.9% |
| AAP | 27.1 years (Utah) | 38.9 years (West Virginia) | 35.5 years |
| DIB | 2.71% (Alaska) | 6.08% (West Virginia) | 4.37% |
| HBC | 24.8% (New Mexico) | 37.7% (West Virginia) | 30.5% |
| HBP | 14.0% (Arizona) | 31.6% (Alabama) | 24.6% |
| OBY | 13.8% (Colorado) | 24.3% (Mississippi) | 19.5% |
Note: LCAD = lethal coronary artery disease; EDU = education; INC = income; LHI = lack of health insurance; SHE = state health expenditures; ALC = alcohol consumption; INY = inactivity; SSL = state stress levels; TOB = tobacco consumption; AAP = average age of the population; DIB = diabetes; HBC = high blood cholesterol; HBP = high blood pressure; OBY = obesity
According to my data, the worst state to live in when worried about lethal coronary artery
disease is Mississippi, and the best is Minnesota. The difference between these two extremes is
158 per 100,000 people.
The most striking observation involves West Virginia. This state appears five times in Table 2, and each appearance is unfavorable with respect to lethal coronary artery disease. It has the minimum in EDU (education), a variable expected to have a negative effect on LCAD (lethal coronary artery disease). It also has the minimum in INC (income), which is closely related to EDU and is also expected to have a negative effect on LCAD. West Virginia next appears as the maximum for AAP (average age of the population), a variable expected to affect LCAD positively. The state also appears as the maximum for DIB (diabetes) and HBC (high blood cholesterol), which are also expected to affect LCAD positively. Each appearance as a maximum or minimum suggests that West Virginia is more likely to have lethal cases of coronary artery disease. It would seem that with this much working against it, West Virginia would be the maximum for LCAD, but it is not. It does, however, come in second from the maximum at 296 per 100,000 people, only 33 per 100,000 people below Mississippi.
Another state that stands out in Table 2 is Utah. Utah holds the minimum for four variables: ALC (alcohol), INY (inactivity), TOB (tobacco), and AAP (average age of the population). Three of these (INY, TOB, and AAP) are expected to have a positive coefficient, and the expected sign for ALC is ambiguous, so low values of these variables should go with a low LCAD. With Utah holding this many minimums, it is likely that Utah has a very low rate of lethal coronary artery disease deaths. In fact, it is third from the minimum at 185.2 per 100,000 people, just 14.2 per 100,000 people above Minnesota.
Alaska is also interesting. It holds the maximum in state healthcare expenditures and the minimum in diabetes. New Mexico also appears twice in Table 2, first as the maximum for lack of health insurance and second as the minimum for high blood cholesterol. Nevada appears three times: as the minimum for state healthcare expenditures and as the maximum for both alcohol consumption and tobacco use. Colorado also appears twice, holding the minimum for obesity and the maximum for education.
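The gaps quoted in this section can be rechecked with a few lines of arithmetic. The sketch below uses only the four LCAD values stated in the text and Table 2; the dictionary is an illustrative container, not the full 49-state data set.

```python
# LCAD values (deaths per 100,000) quoted in the text and in Table 2.
lcad = {
    "Minnesota": 171.0,
    "Mississippi": 329.0,
    "West Virginia": 296.0,
    "Utah": 185.2,
}

lowest = min(lcad, key=lcad.get)    # state with the smallest LCAD
highest = max(lcad, key=lcad.get)   # state with the largest LCAD

spread = lcad[highest] - lcad[lowest]               # Mississippi minus Minnesota
wv_gap = lcad[highest] - lcad["West Virginia"]      # how far West Virginia is below the maximum
utah_gap = round(lcad["Utah"] - lcad[lowest], 1)    # how far Utah is above the minimum

print(lowest, highest, spread, wv_gap, utah_gap)
```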
Multicollinearity Test:
Any equation must be tested for problems that may affect the results of the estimation. One such problem is multicollinearity. A. H. Studenmund (2001) states that multicollinearity is either perfect or imperfect (A. H. Studenmund. 2001). Perfect multicollinearity is a violation of the classical assumption that no independent variable is a perfect linear function of any other independent variable. With perfect multicollinearity, the coefficients of the affected variables cannot be determined, and the standard errors of those coefficients are infinite. Imperfect multicollinearity occurs when the linear relationship between two or more independent variables is strong enough to affect the estimation results. Imperfect multicollinearity results in increased variances and standard errors of the coefficients and decreased t-statistics. Multicollinearity, however, does not bias the coefficients of the equation, and the overall fit of the equation is not affected.
The test for multicollinearity involves examining the correlation coefficients. A correlation coefficient between two independent variables is not considered a problem unless its absolute value is higher than 0.7 and is higher than the correlation between the dependent variable and each of the corresponding independent variables.
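The screening rule just described can be written as a small function. This is a minimal sketch of the rule as stated in the text, not a formal test; the function name and argument names are hypothetical, and the example inputs are the magnitudes of the correlations reported in Table 3.

```python
def flags_multicollinearity(r_pair, r_dep_a, r_dep_b, threshold=0.7):
    """Apply the two-part screening rule to one pair of independent variables.

    r_pair  : correlation between the two independent variables
    r_dep_a : correlation between the dependent variable and the first variable
    r_dep_b : correlation between the dependent variable and the second variable
    """
    strong = abs(r_pair) > threshold
    exceeds_dependent = abs(r_pair) > abs(r_dep_a) and abs(r_pair) > abs(r_dep_b)
    return strong and exceeds_dependent


# INC and EDU: pair correlation 0.761; correlations with LCAD of 0.260 and 0.513.
inc_edu_flagged = flags_multicollinearity(0.761, 0.260, 0.513)

# SSL and INC: pair correlation 0.682; correlations with LCAD of 0.070 and 0.260.
ssl_inc_flagged = flags_multicollinearity(0.682, 0.070, 0.260)

print(inc_edu_flagged, ssl_inc_flagged)
```

Under the strict 0.7 cutoff only the INC and EDU pair is flagged; a pair such as SSL and INC, which sits just under the cutoff, is a judgment call that the rule itself does not make.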
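Table 3: Correlation Coefficients

|      | LCAD | EDU | INC | LHI | SHE | ALC | INY | SSL | TOB | AAP | DIB | HBC | HBP | OBY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LCAD | 1 | -0.513 | -0.260 | 0.150 | 0.092 | -0.079 | 0.706 | 0.070 | 0.587 | 0.195 | 0.831 | 0.478 | 0.530 | 0.631 |
| EDU |  | 1 | 0.761 | -0.271 | 0.127 | -0.229 | -0.373 | 0.444 | -0.556 | -0.018 | -0.475 | -0.270 | -0.461 | -0.597 |
| INC |  |  | 1 | -0.260 | 0.219 | -0.172 | -0.281 | 0.682 | -0.241 | 0.141 | -0.256 | 0.042 | -0.225 | -0.512 |
| LHI |  |  |  | 1 | -0.064 | 0.116 | 0.059 | -0.002 | 0.050 | -0.429 | 0.044 | 0.010 | 0.064 | 0.189 |
| SHE |  |  |  |  | 1 | -0.153 | 0.069 | 0.181 | 0.122 | 0.193 | 0.069 | -0.134 | 0.010 | 0.026 |
| ALC |  |  |  |  |  | 1 | 0.061 | -0.318 | 0.348 | 0.193 | -0.077 | -0.002 | 0.072 | -0.052 |
| INY |  |  |  |  |  |  | 1 | -0.135 | 0.463 | 0.230 | 0.604 | 0.188 | 0.233 | 0.497 |
| SSL |  |  |  |  |  |  |  | 1 | -0.144 | -0.153 | 0.164 | 0.138 | 0.112 | -0.164 |
| TOB |  |  |  |  |  |  |  |  | 1 | 0.386 | 0.442 | 0.368 | 0.583 | 0.482 |
| AAP |  |  |  |  |  |  |  |  |  | 1 | 0.318 | 0.254 | 0.101 | -0.075 |
| DIB |  |  |  |  |  |  |  |  |  |  | 1 | 0.426 | 0.546 | 0.596 |
| HBC |  |  |  |  |  |  |  |  |  |  |  | 1 | 0.448 | 0.273 |
| HBP |  |  |  |  |  |  |  |  |  |  |  |  | 1 | 0.482 |
| OBY |  |  |  |  |  |  |  |  |  |  |  |  |  | 1 |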
Note: Any correlation coefficient with an absolute value of more than 0.7 is underlined by a thick line and italicized. Any correlation coefficient with an absolute value that is almost 0.7 is underlined by a dotted line and italicized. Any correlation coefficient between two independent variables with an absolute value larger than the correlation coefficients between the dependent variable and those independent variables is underlined by a thin line and italicized.
As Table 3 shows, there is only one clear multicollinearity problem among my thirteen independent variables. The high correlation coefficient of 0.761 is between INC (income) and EDU (education). That is clearly higher than 0.7. This correlation coefficient is also larger than the absolute value of the correlation coefficient between LCAD (lethal coronary artery disease) and EDU (education) and larger than the absolute value of the correlation coefficient between LCAD (lethal coronary artery disease) and INC (income). These results show that there is a severe multicollinearity problem between INC (income) and EDU (education).
The correlation coefficient between SSL (state stress levels) and INC (income) is too near 0.7 to ignore. The issue is amplified since the correlation coefficient between LCAD (lethal coronary artery disease) and SSL (state stress levels) and the correlation coefficient between LCAD (lethal coronary artery disease) and INC (income) are considerably smaller than the correlation coefficient between SSL (state stress levels) and INC (income). These results show there may be a severe multicollinearity problem between SSL (state stress levels) and INC (income).
The pairs ALC (alcohol) and SHE (state healthcare expenditures), SSL (state stress levels) and SHE (state healthcare expenditures), SSL (state stress levels) and ALC (alcohol), and AAP (average age of the population) and LHI (lack of health insurance) all may have a multicollinearity problem. For each pair, the absolute value of the correlation coefficient is larger than the correlation coefficient between each of these independent variables and the dependent variable. These results show there may be a multicollinearity problem between each of these pairs. Considering the very small size of these correlation coefficients, the equations should not be affected significantly; therefore, I am not doing anything to fix these problems.
In order to limit the effects of multicollinearity, Equation 1 will be split into two variations: Equation 1-A and Equation 1-B. Equation 1-A will exclude EDU (education) and SSL (state stress levels), and Equation 1-B will exclude INC (income).
Heteroskedasticity Test:
Heteroskedasticity is another issue to deal with when estimating an equation. As defined by A. H. Studenmund (2001), heteroskedasticity is a violation of the classical assumption that the observations of the error term are drawn from a distribution that has a constant variance (A. H. Studenmund. 2001). There are two types of heteroskedasticity: pure and impure. Pure heteroskedasticity occurs when the assumption is violated even though the equation is correctly specified. Correctly specified means there are no irrelevant or omitted variables, the functional form is correct (linear), and there are no sample errors. In the case of pure heteroskedasticity, the coefficients of the variables are not biased, but the t-statistics are bigger than they should be, which results in a bigger chance that a variable will be considered relevant. Impure heteroskedasticity occurs when the equation is not correctly specified (i.e., irrelevant or omitted variables, wrong functional form, sample errors). The results of impure heteroskedasticity are biased variable coefficients and incorrect standard errors.
In order to test for heteroskedasticity, I am using the White test, named after its creator Halbert White. The White test has three steps. The first step is to obtain the residuals of Equation 1-A. The second step is to use these residuals squared as the dependent variable in a second equation. The independent variables of the second equation are the independent variables of Equation 1-A, the squares of the independent variables of Equation 1-A, and the products of each pair of independent variables of Equation 1-A. However, the White test, although considered the best for cross-sectional equations, does have one flaw. It cannot be used if, in the second equation, there are more variables than observations. I have 49 observations, but the second equation for Equation 1-A has more than 49 variables. The only way for the White test to work here is to use its other form, which drops the products of each pair of independent variables and uses only the independent variables and their squares. Here, I would have the same 49 observations, but only 22 variables. This test is also sufficient to determine if there is a heteroskedasticity problem.
The third step is to multiply the number of observations by the unadjusted R² (n*R²). The decision rule is that if n*R² is greater than the critical chi-squared value, then there is a heteroskedasticity problem. For Equation 1-A, n*R² = 24.26 and the critical chi-squared value with 22 degrees of freedom is 33.92. I repeat the three steps for Equation 1-B, which also uses the simple version of the White test. For Equation 1-B, n*R² = 27.24 and the critical chi-squared value with 24 degrees of freedom is 36.41. By the decision rule, I find that there is no serious problem with heteroskedasticity in Equation 1-A or Equation 1-B.
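The counting argument and the decision rule for the White test can be checked with a short sketch. The function names are hypothetical; the regressor counts exclude the intercept, and the n*R² and critical chi-squared values are the ones reported in the text rather than recomputed from data.

```python
def white_regressor_count(k, include_cross_products):
    """Number of regressors in the White auxiliary equation for k independent variables."""
    levels_and_squares = 2 * k
    cross_products = k * (k - 1) // 2 if include_cross_products else 0
    return levels_and_squares + cross_products


def white_test_rejects(n_r2, critical_chi2):
    """Decision rule: heteroskedasticity is indicated when n*R^2 exceeds the critical value."""
    return n_r2 > critical_chi2


n_obs = 49
k_1a = 11  # Equation 1-A: thirteen variables minus EDU and SSL
k_1b = 12  # Equation 1-B: thirteen variables minus INC

full_1a = white_regressor_count(k_1a, include_cross_products=True)     # 77, more than 49 observations
simple_1a = white_regressor_count(k_1a, include_cross_products=False)  # 22
simple_1b = white_regressor_count(k_1b, include_cross_products=False)  # 24

problem_1a = white_test_rejects(24.26, 33.92)
problem_1b = white_test_rejects(27.24, 36.41)

print(full_1a > n_obs, simple_1a, simple_1b, problem_1a, problem_1b)
```

The full form needs 77 regressors for Equation 1-A, which exceeds the 49 observations, while the simplified form needs 22 and 24, matching the degrees of freedom used above.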
Empirical Estimation Results:
Table 4 reports the results of the estimation of Equation 1-A and Equation 1-B.
Table 4: Estimation Results for Equation 1-A and Equation 1-B

| Independent Variable | Equation 1-A | Equation 1-B | Expected Sign of Coefficient |
|---|---|---|---|
| Intercept | 40.25261 (0.499160) | 56.58221 (0.657541) |  |
| EDU |  | 40.05244 (0.379967) | negative |
| INC | 0.000110 (0.113182) |  | negative |
| LHI | 56.77456 (0.712506) | 55.23382 (0.686585) | positive |
| SHE | 0.004282 (0.430282) | 0.005972 (0.596954) | negative |
| ALC | -1.223578 (-1.595043) | -1.307321 (-1.675342) | ambiguous |
| INY | 199.6210 (2.969297) | 186.9634 (2.654275) | positive |
| SSL |  | -13.09042 (-0.653808) | ambiguous |
| TOB | 347.0400 (2.599307) | 374.4734 (2.549625) | positive |
| AAP | -2.839518 (-1.438962) | -3.610169 (-1.558149) | positive |
| DIB | 2676.302 (4.645088) | 2953.804 (4.038012) | positive |
| HBC | 246.6707 (1.824513) | 269.3550 (2.008087) | positive |
| HBP | -41.07500 (-0.334783) | -38.39052 (-0.310557) | positive |
| OBY | -14.01551 (-0.085363) | -63.05713 (-0.372228) | positive |
| Adjusted R² | 0.797766 | 0.794527 |  |

Note: t-statistics are in parentheses ( ). Thick underline = significant at the 99.5% level of certainty, thin underline = significant at the 99% level of certainty, double underline = significant at the 95% level of certainty, and dotted underline = significant at the 90% level of certainty.
As observed from Table 4, the adjusted R² for Equation 1-A is 0.798, and the adjusted R² for Equation 1-B is 0.795. According to A. H. Studenmund (2001), the closer the adjusted R² is to 1, the better the estimated equation fits the data (A. H. Studenmund. 2001). Therefore, the fit is quite strong for both equations and slightly stronger for Equation 1-A.
In order to test the hypothesis that the coefficients have a significant impact on LCAD (lethal coronary artery disease), the t-test will be used. For the coefficients whose signs are expected to be positive or negative, I will use a one-sided test. The decision rule for a one-sided test is that if the absolute value of the t-statistic (in Table 4) is greater than the absolute value of the critical t, then that coefficient is significant. For coefficients with signs that are expected to be ambiguous, I will use a two-sided test. The decision rule for a two-sided test is that if a positive t-statistic is greater than the positive critical t, or a negative t-statistic is less than the negative critical t, then that coefficient is significant.
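Both decision rules reduce to comparing the magnitude of the t-statistic with the magnitude of the appropriate critical value. The sketch below encodes that comparison; the function name is hypothetical, the t-statistics are taken from Table 4 for Equation 1-A, and the critical values 2.704 (99.5%) and 1.303 (90%) are the one-sided values used in this paper.

```python
def is_significant(t_stat, critical_t):
    """Decision rule as stated in the text: compare magnitudes.

    Pass the one-sided critical value for a one-sided test and the two-sided
    critical value for a two-sided test. Comparing magnitudes alone does not
    check whether the sign of the estimate matches the expected sign.
    """
    return abs(t_stat) > abs(critical_t)


# DIB in Equation 1-A at the 99.5% level (one-sided critical t of 2.704).
dib_significant = is_significant(4.645088, 2.704)

# LHI in Equation 1-A at the 90% level (one-sided critical t of 1.303).
lhi_significant = is_significant(0.712506, 1.303)

print(dib_significant, lhi_significant)
```

Because the rule looks only at magnitudes, a coefficient whose estimated sign is opposite to the expected one can still be reported as significant, so the sign has to be checked separately.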
Out of the thirteen variables, eight are not significant. These eight are LHI (lack of health insurance), HBP (high blood pressure), OBY (obesity), EDU (education), INC (income), SHE (state health expenditures), ALC (alcohol), and SSL (state stress levels). All eight are tested at the 90% level of certainty and are found to be insignificant.
There are five significant variables, all of which are expected to affect lethal coronary artery disease positively. For the 99.5% level of certainty, the critical t is 2.704. Of the five significant independent variables, only DIB (diabetes) is significant at this level in both equations. The percentage variables appear to enter the regression as proportions (this is the scaling under which the estimated equations reproduce the mean of LCAD at the variable means in Table 2), so a change of one percentage point is a change of 0.01 in the variable. With coefficients of 2,676.302 and 2,953.804, this means that, with 99.5% certainty, every additional one percentage point of the population that has been diagnosed with diabetes is associated with a rise of about 26.76 to 29.54 in the number of people per 100,000 dying from coronary artery disease.
For the 99% level of certainty, the critical t is 2.423. At this level, INY (inactivity) and TOB (tobacco) are significant in both equations. For INY, with coefficients of 186.9634 and 199.621, this means that with 99% certainty, every additional one percentage point of the population with no leisure-time activity is associated with an increase of about 1.87 to 2.00 in the number of people per 100,000 dying from coronary artery disease. For TOB, with coefficients of 347.04 and 374.4734, every additional one percentage point of the population that has ever smoked 100 or more cigarettes and currently smokes is associated with an increase of about 3.47 to 3.74 in the number of people per 100,000 dying from coronary artery disease.
For the 95% level of certainty, the critical t is 1.684. At this level, HBC (high blood cholesterol) is significant in both equations. With coefficients of 246.6707 and 269.355, this means that with 95% certainty, every additional one percentage point of the population with high cholesterol is associated with an increase of about 2.47 to 2.69 in the number of people per 100,000 dying from coronary artery disease.
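The size of these effects depends on how the percentage variables are scaled. A hedged check: an OLS equation with an intercept passes through the sample means, so plugging the Table 2 means into Equation 1-A should return roughly the mean of LCAD (238.12). The sketch below assumes the percentage variables are expressed as proportions (13.8% becomes 0.138) and takes the coefficients on ALC, AAP, HBP, and OBY as negative (the text states this only for AAP, so the other three signs are an assumption here).

```python
# Equation 1-A coefficients (Table 4) and variable means (Table 2).
# Assumption: percentage variables are expressed as proportions.
intercept = 40.25261
coef = {
    "INC": 0.000110, "LHI": 56.77456, "SHE": 0.004282, "ALC": -1.223578,
    "INY": 199.6210, "TOB": 347.0400, "AAP": -2.839518, "DIB": 2676.302,
    "HBC": 246.6707, "HBP": -41.07500, "OBY": -14.01551,
}
mean = {
    "INC": 24076.0, "LHI": 0.138, "SHE": 970.55, "ALC": 22.87,
    "INY": 0.268, "TOB": 0.229, "AAP": 35.5, "DIB": 0.0437,
    "HBC": 0.305, "HBP": 0.246, "OBY": 0.195,
}

# Predicted LCAD at the means; this should be close to the reported mean of 238.12.
fitted_at_means = intercept + sum(coef[v] * mean[v] for v in coef)

# Effect of a one percentage point (0.01) rise in the diabetes share.
dib_effect_per_point = coef["DIB"] * 0.01

print(round(fitted_at_means, 2), round(dib_effect_per_point, 2))
```

Under the proportion scaling the fitted value at the means comes out within a fraction of a unit of 238.12. If the percentages were instead entered as whole numbers (13.8 rather than 0.138), the same calculation would give a value in the tens of thousands, which is why a one-point change is read here as 0.01 units.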
For the 90% level of certainty, the critical-t is 1.303. At this level, AAP (average age of the
population) is significant in both equations. This poses a problem. The sign of the coefficient
for AAP is expected to be positive, but the estimated coefficient turns out to be negative. One
explanation is an omitted variable. An omitted variable can bias the estimated coefficient of an
included variable downward if one of the following is true: the correlation coefficient (r)
between the omitted variable and the included variable (here, AAP) is negative and the
coefficient (β) of the omitted variable in the LCAD (lethal coronary artery disease) equation is
positive, or r is positive and β is negative. One possible omitted variable is a measure of the
quality of health care, which would have a positive r and a negative β. Out of my thirteen
variables, only SHE (state healthcare expenditures) measures any of the quality of healthcare,
and that is only a small part of that variable. That is why I believe the incorrect sign is from
the omitted variable of quality of health care.
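The direction of this bias can be written out explicitly. The following is a sketch in standard textbook notation, simplified to a single included regressor. The symbols Q (quality of health care), the α coefficients, and the error terms are assumptions introduced for illustration and are not part of Equation 1. In this treatment the relevant correlation is the one between the omitted variable and the included variable AAP.

```latex
\begin{align*}
\text{True model:}\quad & LCAD_i = \beta_0 + \beta_1\, AAP_i + \beta_2\, Q_i + \epsilon_i \\
\text{Auxiliary relation:}\quad & Q_i = \alpha_0 + \alpha_1\, AAP_i + u_i \\
\text{Estimated without } Q:\quad & E[\hat{\beta}_1] = \beta_1 + \beta_2\,\alpha_1
\end{align*}
```

The slope α_1 has the same sign as the correlation r between Q and AAP. With β_2 negative and r positive, the bias term β_2 α_1 is negative, so the estimated coefficient for AAP can fall below zero even if the true β_1 is positive.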
Conclusion:
In brief, this paper investigates the effects thirteen variables have on lethal coronary
artery disease through OLS regression analysis using 49 observations in the year 2000. Equation
1 was tested for multicollinearity and heteroskedasticity. There was no problem with
heteroskedasticity, but there was a problem with multicollinearity. In order to fix it, Equation 1
was split into two variations: Equation 1-A and Equation 1-B. Each new equation was estimated,
resulting in a very high accuracy (adjusted R²), with Equation 1-A being slightly more accurate.
Out of the thirteen variables, eight were insignificant. These variables are LHI (lack of
health insurance), HBP (high blood pressure), OBY (obesity), EDU (education), INC (income),
SHE (state health expenditures), ALC (alcohol), and SSL (state stress levels). That leaves five
variables that significantly affect lethal coronary artery disease. These five are DIB (diabetes),
INY (inactivity), TOB (tobacco), HBC (high blood cholesterol), and AAP (average age of the
population). AAP is the only variable that is significant in the opposite direction from what was
expected. I determined that this was due to an omitted variable, namely, the quality of health
care. My results show that in order to avoid coronary artery disease, the best strategy is to
prevent diabetes, be more active, smoke less, and control cholesterol levels.
If this study is to be done again, I have two recommendations. First, I recommend including
the omitted variable, quality of healthcare. This would avoid the biased coefficient for
average age and make the equation more accurate. Second, I recommend spending more time finding
data from the same year. Even though a couple of years off should not make a huge difference, the
equation would be more accurate if all of the data were from the same year.
Data Sources
The data for lethal coronary artery disease was obtained from the National Center for
Chronic Disease Prevention and Health Promotion at:
http://www.cdc.gov/nccdphp/burdenbook2004/Section02/heart.htm
The data for education was obtained from the National Census Bureau at:
http://www.census.gov/prod/2003pubs/02statab/educ.pdf
The data for income was obtained from the National Census Bureau at:
http://www.census.gov/prod/2002pubs/01statab/income.pdf
The data for lack of health insurance was obtained from the United Health Foundation at:
http://www.unitedhealthfoundation.org/shr2003/components/lackinsurance.html
The data for state health expenditures was obtained from the Milbank Memorial Fund at:
http://www.milbank.org/reports/2000shcer/nasbotable14.html
The data for alcohol consumption was obtained from the Brewers Association at:
http://65.23.136.214/beerinfo/bystate.shtml
The data for inactivity was obtained from the National Center for Chronic Disease
Prevention and Health Promotion at:
http://www.cdc.gov/nccdphp/burdenbook2002/03_leisureadult.htm
The data for state stress levels was obtained from the National Census Bureau at:
http://www.census.gov/prod/2003pubs/02statab/pop.pdf
The data for tobacco consumption was obtained from the National Census Bureau at:
http://www.census.gov/prod/2003pubs/02statab/health.pdf
The data for average age of the population was obtained from the National Census Bureau at:
http://tinyurl.com/6lez4
The data for diabetes was obtained from the National Center for Chronic Disease Prevention
and Health Promotion at:
http://tinyurl.com/6jkc6
The data for high blood cholesterol was obtained from the National Center for Chronic
Disease Prevention and Health Promotion at:
http://www.cdc.gov/nccdphp/burdenbook2004/Section03/cholesterol.htm
The data for high blood pressure was obtained from the National Center for Chronic Disease
Prevention and Health Promotion at:
http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5121a2.htm
The data for obesity was obtained from the American Obesity Association at:
http://www.obesity.org/subs/fastfacts/obesity_US.shtml
The data for state populations was obtained from the National Census Bureau at:
http://www.census.gov/population/cen2000/phc-t2/tab01.pdf
Works Cited
A. H. Studenmund. “Using Econometrics: A Practical Guide.” 4th edition. Addison Wesley
Longman. 2001. April 18, 2005.
Anil Kumar. “Who Doesn’t Have Health Insurance and Why?” Federal Reserve Bank of Dallas. 2004.
April 18, 2005.
http://www.dallasfed.org/research/swe/2004/swe0406a.html
“Blood Pressure.” American Heart Association. 2004. April 18, 2005.
http://www.americanheart.org/presenter.jhtml?identifier=4473
“Causes of Heart Disease.” Texas Heart Institute. December 8, 2004. April 18, 2005.
http://chinese-school.netfirms.com/heart-disease-causes.html
“Coronary.” Mama’s Health. April 18, 2005. April 18, 2005.
http://www.mamashealth.com/Coronary.asp
“Diabetes: Type 2 Diabetes.” WebMD. June, 2004. April 18, 2005.
http://my.webmd.com/content/article/59/66844.htm
“Health and Wealth – A Fundamental Look.” Region of Peel. 2001. April 18, 2005.
http://www.region.peel.on.ca/health/health-status-report/pdfs/health_wealth.pdf
“Heart Disease: Alcohol and Your Heart.” WebMD. June, 2004. April 18, 2005.
http://my.webmd.com/content/pages/9/1675_57836.htm
“Heart Disease: Exercise for a Healthy Heart.” WebMD. June, 2004. April 18, 2005.
http://my.webmd.com/content/pages/9/1675_57839.htm
“Heart Disease: Risk Factors For Heart Disease.” WebMD. June, 2004. April 18, 2005.
http://my.webmd.com/content/pages/9/1675_57840.htm
“Heart Disease: Smoking and Heart Disease.” WebMD. June, 2004. April 18, 2005.
http://my.webmd.com/content/pages/9/1675_57857.htm
“Obesity and Heart Disease.” American Heart Association. 1997. April 18, 2005.
http://circ.ahajournals.org/cgi/content/full/96/9/3248
“Risk factors for coronary heart disease and level of education.” The Tromso Heart Study.
1988. April 18, 2005.
http://aje.oupjournals.org/cgi/content/abstract/127/5/923
“What is Coronary Artery Disease?” National Heart, Lung, and Blood Institute. August, 2003.
April 18, 2005.
http://www.nhlbi.nih.gov/health/dci/Diseases/Cad/CAD_WhatIs.html