VARIABLES - Personal Page

Urban and Civil

Nov 16, 2013

VARIABLES

Topic #3

Variables and the Unit of Analysis

Variables are characteristics of the “things” that we are studying.

These “things” are commonly called cases or units.

A “case study” focuses on a single “thing.”

The kind of “thing” that is being studied is called the unit of analysis.

Individuals constitute the unit of analysis for much empirical social science research (and almost all survey research in political science).

A particular research project focuses on a particular set or population of cases (individuals or other units), often by studying a sample of cases drawn from the population.

American National Election Studies

ANES focuses on individuals as the units of analysis in the American voting age population (VAP).

ANES variables pertain to these individuals.

ANES variables include

gender, race, education, and other demographic variables;

party identification, voting intention, Presidential approval, ideology, abortion opinion, political trust, and other attitudinal variables;

whether registered/voted, candidate voted for, whether contributed money to a campaign, and other behavioral variables.

These are all variable properties of individuals, not households, elections, nations, etc.

Other Populations of Individuals

Population = All Members of Congress

Additional variables pertaining to this specialized population of individuals include:

number of terms served, campaign expenditure in last election, last re-election margin, party affiliation, committee assignments, roll-call vote on specified bill, ADA (etc.) rating, NOMINATE score, etc.

Annual Survey of Social Security and Medicare Beneficiaries

British [etc.] Election Studies

Other Units of Analysis in Political Research

Presidential elections: variables include

winner’s party, winner’s popular vote %, Dem. candidate’s popular vote %, winner’s electoral vote margin, turnout %, whether the incumbent was running for re-election, total campaign expenditures, etc.

States in a given Presidential election: variables include

number of electoral votes, winning party/candidate, winner’s vote %, Rep. candidate’s vote %, turnout %, etc.

States in all historical Presidential elections: variables include

all of the above for each election year

Nations: variables include

population, GNP, per capita income, literacy rate, military spending as % of GNP, size of army, type of party system, etc.

States, counties, other jurisdictions, precincts, legislatures, political parties, etc.

Households

Households are often the unit of analysis in economic and sociological research.

Variables include:

size (# of persons)

type (single-parent, no children, unrelated, etc.)

type of housing unit

household income

etc.

Current Population Survey (CPS)

Panel Study of Income Dynamics (PSID)

Rotating panel surveys of households

Variables vs. Values

Variables that pertain to a given unit of analysis take on different values from case to case [cross-sectional analysis].

Gender [individuals]: male, female

Education [individuals]: primary school only, # years completed, etc.

Income [individuals or households]: dollar amount (or dollar range), quintile, etc.

Type of dwelling [households]: detached, townhouse, apartment, etc.

Literacy rate [nations]: numerical %

Turnout [elections]: numerical %

Variables can also vary over time in the same case [longitudinal analysis], e.g., state Democratic candidate vote % over time.

Variables are the building blocks of empirical political science research

Researchers have to figure out how to measure the variables they are interested in by designing appropriate survey questions or other kinds of measures.

Researchers next need to actually collect the data, e.g., by carrying out the survey they have designed or other data collecting operations.

With the data at hand, researchers then ask such questions as the following:

What is the average or typical value of a variable in a set of cases?

For example, what is the typical level of interest among voters, or the average rate of turnout in recent elections?

Questions (cont.)

How are the values of a variable distributed in a set of data, i.e., do most of the cases have about the same value (low dispersion) or do different cases have very different values (high dispersion)? For example:

Do all voters have about the same level of interest or are some very interested while others not interested at all?

Do all elections have about the same level of turnout, or do some have very high turnout while others have very low turnout?

Distribution of income or wealth.

How are two variables related or associated in a set of data? For example:

Is the level of interest among voters related to their level of education?

Does the level of turnout in elections depend on how close elections are expected to be?

Does one variable have a (direct) causal impact on another variable? For example:

Does higher education cause people to become more interested in politics?

Does the prospect of a close election cause more voters to turn out and vote?

Does one variable have an (indirect) causal impact on another variable? For example:

Does the prospect of a close election cause greater activity by campaign organizations that in turn causes more voters to turn out and vote?
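As a rough illustration of the first two questions above (typical value and dispersion), the following Python sketch compares two sets of turnout figures. The numbers are invented for illustration and are not actual election data; the population standard deviation is used here as one possible measure of dispersion.

```python
from statistics import mean, pstdev

# Hypothetical turnout percentages for two made-up sets of elections.
similar_turnout = [58.0, 60.0, 61.0, 59.0, 62.0]
varied_turnout = [35.0, 80.0, 60.0, 45.0, 80.0]

for label, values in [("similar", similar_turnout), ("varied", varied_turnout)]:
    # Same average in both sets, but very different dispersion.
    print(label, "mean:", mean(values), "std. dev.:", round(pstdev(values), 2))
```

Both sets have the same average turnout, so the average alone would not reveal how differently the values are distributed.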

Variables and Their Values

To repeat, variables vary: they take on different values from case to case [or from time to time].

Thus, associated with every variable is a list or range of possible values. For example:

PARTY IDENTIFICATION (pertaining to individuals) in the U.S. has values: REPUBLICAN, DEMOCRAT, INDEPENDENT (or perhaps refinements like STRONG REPUBLICAN, WEAK DEMOCRAT, etc., and/or other values like MINOR PARTY).

VOTED IN 2008 ELECTION? is another variable pertaining to individuals, with just two possible values, YES and NO.

HEIGHT is a physical variable pertaining to individuals with values that are real numbers (expressed in units such as inches, centimeters, or feet).

SIZE (# of persons) is a variable pertaining to households with values that are whole numbers ≥ 1 (values are counts).

LEVEL OF TURNOUT is a variable pertaining to elections (or to different jurisdictions in a given election), with values ranging potentially from 0% to 100%.

Naming Variables

As a reminder that any variable must have a range of
two or more possible values, it is useful to give variables
names like

LEVEL OF EDUCATION

WHETHER OR NOT VOTED IN 2000 ELECTION

SIZE OF POPULATION

TYPE OF POLITICAL REGIME

LEVEL OF VOTING TURNOUT

DIRECTION OF IDEOLOGY

ETC.

In quantitative research, variable names are often written
in capital letters (as above).

Observations/Observed Values

The actual value of a variable in a particular case is called an observation (or observed value). For example,

we “observe” [by asking the appropriate question(s) in a survey] that Joe Smith (the case) has the PARTY IDENTIFICATION (the variable) WEAK DEMOCRAT (the observed value), and likewise

we “observe” [by consulting the appropriate records] that the 2008 Presidential election (the case) has a LEVEL OF TURNOUT (the variable) of 61% (the observed value).

Identifying Variables (PS#3A)

Each of the following statements makes an empirical assertion (which may or may not be true); each refers (at least implicitly) to two variables (and asserts that there is some kind of relationship between them). For each statement:

(a) indicate to what unit of analysis (individuals, nations, elections, etc.) and, as appropriate, what particular population the variables pertain;

(b) identify the two variables, with appropriate names (probably “TYPE OF _____”, “LEVEL OF _____”, “DEGREE OF _____”, “AMOUNT OF _____”, “WHETHER OR NOT _____”); and

(c) indicate a range of possible values for each variable (often, but certainly not always, LOW and HIGH will do).

(Note: both variables in each sentence pertain to the same units.)

1. Junior members of Congress are less pragmatic than their senior colleagues.

2. Education tends to undermine religious faith.

3. Capital punishment deters murder.

8. When times are bad, incumbent candidates are punished in elections.

11. If you want to get ahead, stay in school.

CLASS LIST (Data Spreadsheet)

Case ID                      Variable 1   Var2    Var3   Var4
Name           SSN           Class        Major   GPA    Cand?
Jones, R.      215-14-6609   Senior       POLI    3.12   No
Kim, S.        144-56-9231   Sophomore    PSYC    2.78   No
Smith, H.      502-45-2323   Junior       POLI    2.75   No
Williams, R.   212-16-7834   Senior       HIST    3.28   Yes
Etc.
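The same class list can be sketched as a small data structure in Python, where each row is a case, each key is a variable, and each stored entry is an observed value. The field names are adapted from the column headings above; the helper `values_of` is a hypothetical name introduced only for this illustration, and the SSN column is omitted.

```python
# Each dict is one case (a student); each key is a variable.
class_list = [
    {"name": "Jones, R.", "class": "Senior", "major": "POLI", "gpa": 3.12, "cand": "No"},
    {"name": "Kim, S.", "class": "Sophomore", "major": "PSYC", "gpa": 2.78, "cand": "No"},
    {"name": "Smith, H.", "class": "Junior", "major": "POLI", "gpa": 2.75, "cand": "No"},
    {"name": "Williams, R.", "class": "Senior", "major": "HIST", "gpa": 3.28, "cand": "Yes"},
]

def values_of(variable, cases):
    """Return the observed values of one variable, one per case (a column)."""
    return [case[variable] for case in cases]

print(values_of("major", class_list))  # one column: a variable across cases
print(class_list[0]["gpa"])            # one cell: an observed value
```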

What distinctions between different types of variables can
we make?

Types of Variables

Our concern here is with drawing distinctions among variables with respect to their logical properties, not their substantive nature (e.g., demographic, attitudinal, etc.).

Every variable has at least two possible values (otherwise it could not vary).

A variable is dichotomous (also called a dummy variable) if it has exactly two possible values (typically “yes” and “no”), e.g.,

[Students] (Yes/No)

WHETHER VOTED IN 2000 ELECTION [Inds.] (Yes/No)

GENDER [Inds.] (M/F)

However, most variables have three or more possible
values.

Some variables have an infinite number of possible values.

Qualitative Variables

A variable is qualitative if its values are given by words:

MAJOR [Students]: POLI, HIST, BIOL, etc.

TYPE OF REGIME [nations]: Free, Partly Free, Unfree

ABORTION OPINION [Inds.]: Never permit, etc.

In a data spreadsheet [e.g., SPSS], these verbal values are typically recorded in terms of numerical codes, because this

saves space, and

facilitates machine processing.

Moreover, survey data from closed-form questions is often pre-coded (e.g., the Student Survey).

Rows are cases

Columns are variables

Cells are values (varying from case to case)

Values (except V01 YEAR OF SURVEY) in the Student Survey and SETUPS are numerically coded.
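A minimal sketch of numerical coding in Python: verbal values are mapped to integer codes through a codebook, and the codes can be decoded again. The particular code numbers are arbitrary assumptions made for this example, not the codes used in SPSS, SETUPS, or the Student Survey.

```python
# Hypothetical codebook for the qualitative variable MAJOR.
MAJOR_CODES = {"POLI": 1, "HIST": 2, "BIOL": 3}
MAJOR_LABELS = {code: label for label, code in MAJOR_CODES.items()}

observed = ["POLI", "HIST", "POLI", "BIOL"]

# Record the verbal values as numerical codes, then translate them back.
coded = [MAJOR_CODES[value] for value in observed]
decoded = [MAJOR_LABELS[code] for code in coded]

print(coded)
print(decoded == observed)
```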

Quantitative Variables

A variable is quantitative if its (true, not coded) values are given by numbers:

GPA [Students]: 3.12, 2.78, etc.

LITERACY RATE [Nations]: 98%, 55%, etc.

HEIGHT [Inds.]: 72", 62", etc.

SIZE [Households]: 1 person, 2 persons, etc.

LEVEL OF TURNOUT [Elections or jurisdictions]: 51%, etc.

The magnitude of these numbers may depend on the units of measurement used (e.g., is HEIGHT given in inches, feet, centimeters, etc.?).

In a spreadsheet, such values are typically recorded in terms of their actual numerical values.

The SETUPS data contains data pertaining to variables that, while “truly” quantitative in nature, are recoded into class intervals, e.g.,

AGE (V60) [18-24, 25-34, etc.] or

INCOME (V65A) [0-16th percentile, 17-33rd percentile, etc.]
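Recoding a truly quantitative variable such as AGE into a small number of coded categories might look like the following Python sketch. Only the first two categories (18-24 and 25-34) come from the text; the remaining boundaries, the code numbers, and the function name `recode_age` are assumptions made for illustration and do not reproduce the actual SETUPS coding.

```python
import bisect

# Lower bounds of each age category; boundaries beyond 34 are assumed.
LOWER_BOUNDS = [18, 25, 35, 45, 55, 65]
LABELS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]

def recode_age(age):
    """Return (code, label) of the category containing an age in years."""
    if age < LOWER_BOUNDS[0]:
        raise ValueError("age is below the voting age population")
    index = bisect.bisect_right(LOWER_BOUNDS, age) - 1
    return index + 1, LABELS[index]

print(recode_age(24))
print(recode_age(25))
```

After recoding, two respondents aged 25 and 34 receive the same code, so the exact difference between their ages can no longer be recovered from the data.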

Truly Quantitative Data Need Not be Coded

Variables and the Unit of Analysis

Substantively related variables may be of different types depending on the unit of analysis to which they pertain.

TURNOUT pertaining to individuals is a dichotomous variable with values “yes, voted” and “no, did not vote”.

[LEVEL OF] TURNOUT pertaining to elections (or jurisdictions, precincts, etc.) is a quantitative variable with possible values ranging from 0% to 100%.
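The link between the two versions of the variable can be sketched in Python: aggregating the dichotomous individual-level values yields the quantitative election-level value. The list of individuals is invented for illustration, and the function name `level_of_turnout` is hypothetical.

```python
# Individual-level TURNOUT: one dichotomous value per eligible person.
voted = [True, False, True, True, False, True, False, True]

def level_of_turnout(individual_values):
    """Aggregate yes/no values into a percentage between 0 and 100."""
    return 100 * sum(individual_values) / len(individual_values)

print(level_of_turnout(voted))
```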

Types of Variables / Levels of Measurement

It is useful to refine both qualitative and quantitative variables further by distinguishing among four different types of variables, or (equivalently) different levels of measurement pertaining to variables.

Note: these distinctions are relevant only as they pertain to non-dichotomous variables.

Please take note of this with respect to PS #3B, Question 2.

Nominal Variables

A nominal variable (or a variable measured at the nominal level) has values that are unordered categories.

Accordingly, nominal variables are qualitative in nature.

Given two cases and a nominal variable, we can observe that they have the same value or they have different values, but (if they have different values) we cannot say that one has the “higher/bigger” value and the other the “lower/smaller,” etc.

Nominal Variables (cont.)

A nominal variable typically has a name like

NAME OF ____

TYPE OF ____

NATURE OF ____

KIND OF ____

Examples:

(NAME OF) MAJOR: Political Science, Economics, History, etc.

(TYPE OF) RELIGIOUS AFFILIATION: Protestant, Catholic, Jewish, etc.

PREFERENCE FOR REPUBLICAN NOMINATION: Giuliani, McCain, Romney, etc.

In a data spreadsheet, numerical codes must be assigned to values of nominal variables in an essentially arbitrary manner, so it is certainly illegitimate to do arithmetic on the numerical code values.

Typically the numerical codes are consecutive whole numbers.
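To see why arithmetic on nominal codes is meaningless, consider this Python sketch, which codes the same observations under two equally arbitrary coding schemes. Both schemes and the observations are assumptions made for illustration. The mean of the codes changes with the scheme, while the most common category (the mode) does not.

```python
from statistics import mean, mode

observed = ["Protestant", "Catholic", "Catholic", "Jewish", "Catholic"]

# Two arbitrary, equally valid assignments of codes to categories.
scheme_a = {"Protestant": 1, "Catholic": 2, "Jewish": 3}
scheme_b = {"Jewish": 1, "Protestant": 2, "Catholic": 3}

codes_a = [scheme_a[v] for v in observed]
codes_b = [scheme_b[v] for v in observed]

print(mean(codes_a), mean(codes_b))  # differ, so the mean carries no meaning
print(mode(observed))                # the same under any coding scheme
```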

Ordinal Variables

An ordinal variable (or a variable measured at the ordinal level) has values that fall into some kind of natural ordering, often (but not always) running from (in some sense) LOW to HIGH.

Therefore, cases can be ranked or ordered with respect to their values on an ordinal variable.

An ordinal variable is also qualitative in nature.

Given two cases and an ordinal variable, we can observe that they have the same value or they have different values, and also (if they have different values) that one has the “higher/bigger” value and the other “lower/smaller,” etc., but we cannot say how much higher/bigger or lower/smaller.

Given three cases with different values on an ordinal variable, we can identify the case with the observed value between the other two, but we cannot say which of the other values it is closer to.

Ordinal Variables (cont.)

An ordinal variable typically has a name like

DIRECTION OF ___

EXTENT OF ____

LEVEL OF ____

DEGREE OF ____

Examples:

TYPE OF REGIME/DEGREE OF FREEDOM [nations]: Free, Partly Free, Unfree

(LEVEL OF) INTEREST IN THE ELECTION CAMPAIGN [individuals]: from “low” to “high”

(DIRECTION OF) IDEOLOGY [individuals]: from most liberal to most conservative

(DEGREE OF) PRESIDENTIAL APPROVAL [individuals]: from strongly approve to strongly disapprove

DIRECTION OF ABORTION OPINION [individuals]: Never permit, . . . , Always permit

(LEVEL OF) CLASS STANDING [students]: freshman, sophomore, junior, senior

When data is recorded in coded form, numerical codes should be assigned to values in a manner consistent with the natural ordering of the values.

Ordinal Variables (cont.)

If the natural ordering is from LOW to HIGH, the codes should likewise run from lower to higher numbers.

If the natural ordering is not from LOW to HIGH, e.g., DIRECTION OF IDEOLOGY, the two extreme values (or “poles”), e.g., MOST LIBERAL and MOST CONSERVATIVE, should be assigned the minimum and maximum code values (but which gets which is arbitrary), and intermediate values, e.g., MODERATE, should be assigned intermediate codes.

In any event, values are typically assigned numerical codes that are consecutive integers, but this is not a logical necessity (because only their order matters).

It remains illegitimate to do arithmetic on the numerical code values unless we are willing to attribute “interval” status to the code values.

Ordinal Variables (cont.)

Note that DIRECTION OF IDEOLOGY could be renamed DEGREE OF LIBERALISM, which does range from LOW (i.e., “least liberal” [or “most conservative”]) to HIGH (“most liberal” [or “least conservative”]).

We could also reverse the “polarity” of the renamed variable and call it DEGREE OF CONSERVATISM, ranging from LOW (i.e., “least conservative” [or “most liberal”]) to HIGH (“most conservative” [or “least liberal”]).

Ordinal Variables (cont.)

Opinion variables with closed-form values running from (STRONGLY) AGREE (or APPROVE) to (STRONGLY) DISAGREE (or DISAPPROVE) are ordinal in nature.

The value INDEPENDENT is usually deemed to fall “between” DEMOCRAT and REPUBLICAN, so PARTY IDENTIFICATION is usually deemed to be ordinal in nature.

But this works only if we treat cases with “minor party” or DK (don’t know) values as missing data (since these values don’t fall in the natural ordering).

An SPSS spreadsheet normally displays a numerical code (rather than a blank) for missing data (“unobserved” values), which must be understood as not part of the natural ordering.

In the SETUPS and Student Survey data, missing data are coded as (9).

SPSS must be told the “missing data” code(s) for each variable, so that it can set cases so coded aside when it processes data.
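What goes wrong if the missing-data code is not set aside can be sketched in Python. The code 9 for missing data follows the text; the three-point party identification coding and the observations are assumptions made for illustration, and averaging the codes at all presumes that we are willing to attribute “interval” status to them.

```python
from statistics import mean

MISSING = 9  # missing-data code, as in the SETUPS and Student Survey data

# Hypothetical coding: 1 = DEMOCRAT, 2 = INDEPENDENT, 3 = REPUBLICAN.
party_id = [1, 2, 9, 3, 1, 9, 2, 2]

# Set aside the cases coded as missing before processing.
valid = [code for code in party_id if code != MISSING]

print(mean(party_id))  # distorted: falls outside the 1-3 range of real values
print(mean(valid))     # computed over valid cases only
```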

Interval [Scale] Variables

An interval variable (or variable measured at the interval level) has values that are real numbers that can appropriately be added together, subtracted one from another, and averaged.

SPSS refers to these as scale variables.

An interval variable is quantitative in nature.

Given two cases and an interval variable, we can say they have the same value or they have different values, and also (if they have different values) that one has the higher value and the other lower, etc., and also how much higher or lower one value is than the other, because we can subtract one value from another, i.e., we can determine the magnitude of the interval separating them and thus say how “far apart” the cases are with respect to the variable.

Given three cases with different values on an interval variable, we can identify the case with the observed value between the other two and we can also determine which of the two other cases it is closer to.

But we cannot say how many times greater one value is than another.
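These comparisons can be sketched in Python using SAT scores as an example of an interval variable. The three scores are invented for illustration.

```python
# Hypothetical SAT scores for three cases.
scores = {"A": 1100, "B": 1250, "C": 1500}

# Order the cases by their observed values.
low, middle, high = sorted(scores, key=scores.get)

# Differences (intervals) between values are meaningful.
gap_below = scores[middle] - scores[low]
gap_above = scores[high] - scores[middle]

closer_to = low if gap_below < gap_above else high
print(middle, "is between", low, "and", high, "and closer to", closer_to)
```

With a merely ordinal variable, only the ordering in the `sorted` step would be available; the two gaps could not be computed or compared.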

Interval Variables (cont.)

An interval variable typically has a name like

LEVEL OF ____

DEGREE OF ____

NUMBER OF ____

AMOUNT OF ____

In a spreadsheet, actual numerical values (rather than numerical codes) are normally entered into a data array (e.g., Presidential election data).

But sometimes (numerically coded) class intervals are entered instead (e.g., SETUPS V60 [AGE]), as will be discussed later.

Variables like PARTY IDENTIFICATION, IDEOLOGY, and ISSUE OPINIONS are often treated as interval variables (e.g., my Student Survey/ANES longitudinal charts that showed changing average levels of Party ID, Ideology, etc., over time).

A Truly Interval Variable May Be Recoded into An Ordinal One

Ordinal vs. Interval Variables

Example

Baseball Standings

Rank Standing of a team (first place, second place, etc.) is ordinal information.

Winning Percent (or ) is interval information.

For the league playoffs:

the determination of division winners is based on ordinal information only; but

the determination of the “wild card” entry is based on interval information (best winning percent not otherwise in playoffs).

A team that fails to make the playoffs may have a higher winning percent than a team that does make the playoffs.
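The last point can be illustrated with a Python sketch of a two-division league. The teams, records, and the simplified playoff rule (each division winner plus one wild card) are hypothetical.

```python
# Hypothetical standings: team -> (division, wins, losses).
standings = {
    "Ants": ("East", 100, 62), "Bees": ("East", 95, 67), "Cats": ("East", 94, 68),
    "Dogs": ("West", 85, 77), "Eels": ("West", 80, 82),
}

def pct(team):
    """Winning percentage: interval-level information."""
    _, wins, losses = standings[team]
    return wins / (wins + losses)

# Division winners: only first place within each division matters.
winners = {max((t for t in standings if standings[t][0] == d), key=pct)
           for d in ("East", "West")}
# Wild card: best winning percentage among the remaining teams.
wild_card = max((t for t in standings if t not in winners), key=pct)
playoffs = winners | {wild_card}

print(sorted(playoffs))
print(pct("Cats") > pct("Dogs"))  # Cats miss out despite the better record
```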

Ratio Variables

A ratio variable (or a variable measured at the ratio level) is an interval variable (that has values that are real numbers that can appropriately be added together, subtracted one from another, and averaged) but in addition

one can appropriately divide one value by another (i.e., compute their ratio), and

say, for example, that one case has twice the observed value of another.

This requires that the ratio variable have a non-arbitrary zero value, which usually represents in some sense the complete absence of the characteristic or property to which the variable refers.

Even if negative values are possible, the zero value is non-arbitrary, e.g.,

level of profit (of a firm) may have a negative value, or

rate of economic growth (over years) may have a negative value.

Ratio Variables (cont.)

Examples of interval variables that are not ratio:

LEVEL OF SAT (or IQ) SCORE: there is no 0 score

DEGREE OF TEMPERATURE (Fahrenheit or Celsius): while each has a 0° value, 0°F and 0°C represent different temperatures, so 0° has no fundamental significance in either temperature scale

vs. Kelvin temperature scale with absolute 0 K.

IDEOLOGY, PARTY IDENTIFICATION and OPINION variables may perhaps be treated as interval rather than merely ordinal, but they certainly are not ratio.
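The temperature example can be checked with a short Python sketch: when the zero point is arbitrary, the ratio of two temperatures depends on the scale used. The two temperatures are arbitrary illustrative values; the conversion formulas are the standard ones.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0

print(warm_c / cool_c)                  # ratio in Celsius
print(c_to_f(warm_c) / c_to_f(cool_c))  # a different ratio in Fahrenheit
print(c_to_k(warm_c) / c_to_k(cool_c))  # ratio on the Kelvin scale
```

The same two temperatures look “twice as hot” in Celsius but not in Fahrenheit, so neither ratio is meaningful; only the Kelvin scale, with its non-arbitrary zero, supports ratio comparisons.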

Ratio Variables (cont.)

Examples of ratio variables include:

NUMBER OF CHILDREN or AGE (uncoded) [individuals]

SIZE/NUMBER OF MEMBERS [households or legislatures]

SIZE OF POPULATION [nations]

LEVEL OF INCOME [individuals or households]

PER CAPITA INCOME [nations]

LEVEL OF PROFITS [firms]

SIZE OF BUDGET SURPLUS [governments or fiscal years]

NUMBER OF VOTES FOR DEM CAND [elections, states]

PERCENT OF VOTES FOR DEM CAND [elections, states]

Even though LEVEL OF PROFITS or SIZE OF BUDGET
SURPLUS can have negative values, their zero points
are not arbitrary.

However, ratio comparisons can only be made between
observed values with the same [positive or negative] sign.

Freeway Exits and Levels of Measurement

The identification of freeway exits has changed over the
years, progressing from lower to higher levels of
measurement.

Nominal: exits were once only given names (e.g., name of crossroad or town), so you could tell only whether the upcoming exit is your exit or not.

Ordinal: exits then were ordered (e.g., from east to west) and consecutively numbered, so you could tell

whether you have passed your exit or not, and

how many exits there are between your exit and where you are now.

(Otherwise exit numbers are uninformative.)

Interval/Ratio: exits are now usually numbered in terms of their distance in miles from the state line, so you can tell how far you have to go to get to your exit (and also that your exit is X times as far from the state line as where you are now).

Ordinal Information May Not Be Informative

But Ordinal Is Better Than Nominal

Discrete vs. Continuous Variables

Quantitative [interval and ratio] variables may be either discrete or continuous.

[Qualitative variables are pretty much necessarily discrete.]

A discrete variable has a finite (and typically small) number of possible values that usually (if the variable is quantitative) correspond to whole numbers (or integers) only.

NUMBER OF CHILDREN [households]

NUMBER OF MEMBERS [councils or legislatures]

NUMBER OF ELECTORAL VOTES WON BY DEM CANDIDATE [Presidential elections] vs.

PERCENT OF POPULAR VOTE WON BY DEM CANDIDATE [Presidential elections]

Continuous Variables

A continuous variable can have any real number (at least within some range) as a value (i.e., including fractional values between the integers).

So a continuous variable has (at least in principle) an infinite number of possible values, so that given two cases with distinct values of the continuous variable, it is in principle always possible that there is another case with an intermediate value of the variable.

“Discrete” vs. “Continuous” temperature controls on a kitchen
range.

Digital vs. “old fashioned” thermometer

Continuous Variables (cont.)

Examples:

LEVEL OF DAILY HIGH TEMPERATURE [places (cross-sectional), days (longitudinal)]

HEIGHT, WEIGHT, and AGE [individuals]

Because we typically round off the value of such variables to the nearest degree, inch, pound, year, etc., such variables may “look” discrete.

IDEOLOGY might be thought of as a “truly” continuous variable.

Some interval variables are in principle discrete but are “virtually” continuous because they have so many possible (numerical) values, e.g.,

RATE OF TURNOUT [elections]

PERCENT OF VOTE FOR DEMOCRATIC CANDIDATE [elections]