Hard versus Soft Science: Studies in Biometrics and Psychometrics

Nov 29, 2013



Peter H. Westfall

Horn Professor of Statistics

Dept. of ISQS

Goals of this Talk


Characterize “hard” and “soft” science


Biometrics


Psychometrics


Medicine


Differences concern


Measurement


Models


Action orientation


Describe pitfalls


Recommendations

Hard and Soft Measurements


Hard(er) endpoints


Patient genotype


Patient bilirubin level



Soft(er) endpoints


Patient-reported pain level


Patient-reported quality of life

Characterizations


Hard endpoints


Meaningful units (e.g., g/L)


Reliable


Accurate


Soft endpoints


Units not as meaningful (e.g., 1-5 Likert scale)


Less reliable


Accurate?

Measurement Scales

Hard Science

Measurement: 23.2 grams

Soft Science

What do you think?

Disapprove 1 2 3 4 5 Approve

Measurement: “I dunno, … , uh, 4?”

A “Hard Science” Model

Genotype → Phenotype 1, Phenotype 2, Phenotype 3

Data for “Hard” Science Model

Genotype        Phenotypes
Gene1   Gene2   Eye Color   Metabolism   Schizophrenia   Diabetes
AA      AA      Brn         High         Yes             No
AA      AB      Blue        High         Yes             No
AB      AB      Blue        Med          No              No
AA      BB      Brn         High         No              Yes
AC      AA      Blue        Med          No              Yes
CC      AA      Green       Low          No              No
AA      BB      Brn         High         Yes             No
BB      AB      Hzl         High         Yes             Yes
AA      AB      Blue        High         No              Yes
AB      AB      Brn         Low          No              Yes

A “Soft Science” Model

“Intelligence” → Test 1, Test 2, Test 3

Data for “Soft” Science Model

Latent Constructs   Test Scores
Math    Verbal      Test1   Test2   Test3   Test4
?       ?           79      75      73      79
?       ?           79      69      73      86
?       ?           76      82      83      86
?       ?           80      82      84      74
?       ?           81      80      82      76
?       ?           78      83      84      75
?       ?           85      86      83      76
?       ?           84      80      76      78
?       ?           84      78      81      77
?       ?           88      84      81      87

What is “Intelligence”?


“An Intelligent person is one who scores
high on tests.”


Circular: Defined in terms of test scores, and
yet also is used to predict test scores.


Usual psychometric model simply assumes that there is a number “intelligence” existing in
each individual person (like a genotype).


It assumes all people in the universe are
perfectly ordered by their “intelligence.”


This is hogwash.

Assumed Psychometric Data

Latent Constructs   Test Scores
Math    Verbal      Test1   Test2   Test3   Test4
 0.27    0.51       79      75      73      79
 0.18   -1.53       79      69      73      86
-1.19   -0.97       76      82      83      86
-0.15    0.39       80      82      84      74
 0.00   -0.53       81      80      82      76
-1.72   -0.40       78      83      84      75
-0.06    0.21       85      86      83      76
-0.21    1.49       84      80      76      78
-0.29   -0.37       84      78      81      77
 2.76   -0.48       88      84      81      87

These numbers are assumed to exist!

People are perfectly ordered by them.

This is hogwash!

SEM (Structural Equations Model)

Measurement Model
Structural Model
[Equations not recoverable; surviving symbols: y, x, Y, X, B]

Assumptions:
1. Existence of latent variables and …
2. Structural form (linearity, constraints)
3. Independence
4. Homogeneity of subjects
5. Normality (not as crucial as all the others)
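
For reference, one widely used way to write the two parts of an SEM is the LISREL-style notation sketched below. This is an assumption about the form intended here, not a recovery of the slide's equations: of the symbols used, only y, x, Y, X, and B appear in the surviving text, and the Greek letters are supplied from the standard convention.

```latex
% Measurement model: observed indicators y, x load on latent variables eta, xi
\begin{align}
  y &= \Lambda_Y \, \eta + \varepsilon \\
  x &= \Lambda_X \, \xi + \delta
\end{align}

% Structural model: relations among the latent variables
\begin{equation}
  \eta = B \, \eta + \Gamma \, \xi + \zeta
\end{equation}
```

In this notation the loadings, the coefficient matrices B and Γ, and the latent variables themselves are all unobserved, which is why the first assumption in the list carries so much weight.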
The Utility of Better Models

To bring the data into sharper focus:

Clearer focus with SEM model:

When is a Model Good?


Property 1: A good model is one whose
predictions (what comes out of the black box)
match reality well.



Property 2: A good model is one whose
constructs (what is inside the black box) match
reality well.



Property 3: A good model is one that has
prescriptive utility.

Property 1: Outputs


Both models predict data that “looks like”
the data we see:



SEM model predicts generally high test
scores for a person with “high intelligence.”



Genotype/phenotype model predicts certain
physical characteristics for people sharing a
common genotype.

Property 2: Model Constructs


The latent constructs are not real, thus the
model fails on this count.



The genotype/phenotype constructs are
real, and the directional arrows have clear
biological justification (genes code for
proteins that perform biological functions).

Property 3: Prescription


Prescriptive use of the SEM model:


Since latent factors do not exist, we cannot
use the model prescriptively.


But the model is often used for scoring; and
scores might be used prescriptively.


Prescriptive uses of Genotype/Phenotype
model:


Counseling


Saving lives

Is Psychometric Score Construction Helpful?

Many variables
(Multiple variables X1, X2, …, X20)

→ Psychometric score construction
(Cronbach’s alpha, SEM, discriminant and convergent validity; S = X1 + X3 + X17)

→ Use score in future analysis
(Classification, Prediction)

Example 1: Arthritis Pain Measurement


Ask subjects to rate pain in feet, knees, shoulders, and hands,
each in the morning, at midday, and at night.



Psychometric score: “Advancement of Arthritic
Condition” (essentially a sum of all measures).



If used to evaluate a knee therapy, this score will
waste the company’s money and delay the
progress of science.
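
The cost of summing everything can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, 12 uncorrelated ratings with equal variance (4 body sites by 3 times of day) and a therapy that shifts only the 3 knee ratings by the same amount. None of these numbers come from the talk.

```python
import math


def standardized_effect(n_affected, n_items, shift=1.0, sd=1.0):
    """Standardized treatment effect on a sum of n_items ratings when the
    therapy shifts only n_affected of them by `shift`.

    Assumes uncorrelated ratings with a common standard deviation `sd`
    (an illustrative simplification, not a claim about real pain data).
    """
    mean_change = n_affected * shift
    sd_of_sum = sd * math.sqrt(n_items)
    return mean_change / sd_of_sum


# Hypothetical layout: 12 ratings in total, 3 of which are knee ratings.
knee_only = standardized_effect(n_affected=3, n_items=3)
all_sites = standardized_effect(n_affected=3, n_items=12)

# For fixed power, the required sample size scales with 1 / effect**2.
sample_size_ratio = (knee_only / all_sites) ** 2

print(f"knee-only effect:  {knee_only:.3f}")
print(f"all-sites effect:  {all_sites:.3f}")
print(f"sample size ratio: {sample_size_ratio:.1f}")
```

Under these assumptions the all-sites sum halves the standardized effect, so a trial scored on it would need roughly four times as many subjects as one scored on the knee ratings alone.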

Example 2: The essence of Turtle

Measurements: Log(Length), Log(Width), Log(Height)


Reliability of T = Log(Length) + Log(Width) + Log(Height)

as a measure of the “essence of turtle”:


Males: Cronbach’s Alpha = 0.97

Females: Cronbach’s Alpha = 0.98


Exceptional! Alpha > 0.70 often considered “acceptable”.



T is the score we should use in further analysis!
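
For readers who have not computed it, Cronbach's alpha depends only on the item variances and the variance of the summed score. The sketch below uses the standard formula on made-up log-measurements for five turtles; the data are hypothetical and are not the turtle data behind the figures quoted above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of subjects, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical (log length, log width, log height) for five turtles.
turtles = [
    [4.6, 4.4, 3.6],
    [4.7, 4.5, 3.7],
    [4.8, 4.5, 3.8],
    [4.9, 4.6, 3.8],
    [5.0, 4.7, 3.9],
]

alpha = cronbach_alpha(turtles)
print(f"alpha = {alpha:.2f}")
```

Alpha is high whenever the items rise and fall together, as body dimensions do in animals of different sizes. It says nothing about whether the sum is the right score for a given scientific question.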

Example 2 Continued:

Despite its high reliability, T is improper for
characterizing Female vs. Male turtles.


The best classifier is


W = -2.42 Log(Length) - 0.48 Log(Width) + 3.74 Log(Height).


(Female turtles are shaped differently from Males.)


The psychometric scale impedes science.
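
A small numerical sketch shows why a size score and a shape score can disagree. The two shells below are hypothetical, chosen to have the same product of length, width, and height. Natural logarithms are an assumption, since the slide does not state the base. The coefficients of W are the ones quoted above.

```python
import math


def size_score(length, width, height):
    """T: the sum of the three log-measurements."""
    return math.log(length) + math.log(width) + math.log(height)


def classifier_score(length, width, height):
    """W: the linear classifier with the coefficients quoted on the slide."""
    return (-2.42 * math.log(length)
            - 0.48 * math.log(width)
            + 3.74 * math.log(height))


# Hypothetical shells (mm) with equal length * width * height:
# one longer and flatter, one shorter and taller.
flat_shell = (125.0, 80.0, 32.0)
tall_shell = (100.0, 80.0, 40.0)

t_flat, t_tall = size_score(*flat_shell), size_score(*tall_shell)
w_flat, w_tall = classifier_score(*flat_shell), classifier_score(*tall_shell)

print(f"T: flat={t_flat:.3f}  tall={t_tall:.3f}")
print(f"W: flat={w_flat:.3f}  tall={w_tall:.3f}")
```

T cannot tell these two shells apart, while W separates them clearly, because W weights height against length and so responds to shape rather than overall size.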

Example 3: Patient Condition


Measurements (Likert scale): Xi = condition at week i
after start of treatment, i = 1, 2, 3, 4.



Psychometric scale: “Condition” = X1 + X2 + X3 + X4.



But this is inappropriate:


“Improvement” = -1.5 X1 - 0.5 X2 + 0.5 X3 + 1.5 X4
is better.



The psychometric scale will cost the drug company more,
delay approval, and possibly result in lives lost.
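
The “Improvement” weights are the week numbers 1 to 4 with their mean of 2.5 subtracted, so the score is proportional to the least-squares slope of condition over time. The sketch below checks this on two hypothetical patients. The ratings are invented, and treating a higher rating as a better condition is an assumption.

```python
CONDITION_WEIGHTS = [1.0, 1.0, 1.0, 1.0]
IMPROVEMENT_WEIGHTS = [-1.5, -0.5, 0.5, 1.5]


def score(weights, ratings):
    """Weighted sum of the four weekly ratings."""
    return sum(w * x for w, x in zip(weights, ratings))


def ols_slope(ratings):
    """Least-squares slope of rating against week number (1, 2, 3, ...)."""
    weeks = list(range(1, len(ratings) + 1))
    week_mean = sum(weeks) / len(weeks)
    sxy = sum((w - week_mean) * x for w, x in zip(weeks, ratings))
    sxx = sum((w - week_mean) ** 2 for w in weeks)
    return sxy / sxx


# Hypothetical patients: one steadily improving, one steadily worsening.
improving = [1, 2, 3, 4]
worsening = [4, 3, 2, 1]

for name, ratings in [("improving", improving), ("worsening", worsening)]:
    print(name,
          "Condition =", score(CONDITION_WEIGHTS, ratings),
          "Improvement =", score(IMPROVEMENT_WEIGHTS, ratings),
          "slope =", ols_slope(ratings))
```

Both patients get the same “Condition” score, while “Improvement” has opposite signs for them and equals five times the fitted slope.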

Revised Score Construction Model

Many variables
(Multiple variables X1, X2, …, X20)

→ Pilot study or training sample
(Construct score using scientific relevance and statistical predictive ability;
S = (X2 + X5) [operator lost] (X7 + X9))

→ Use score in future analysis
(Classification, Prediction)

Follow the Money


Money talks: “Hard science” approaches
receive the money:


Data mining in business


Expensive customer scoring data


Analyze money spent, not intention to spend



Pharmaceutical company


exploration


genes, chemistry


experimentation: 100’s of millions of dollars
change hands on a single clinical trial

Then why do we do so much soft science?


Inertia, inbreeding


Journals


Universities, “research methods”



Money:


Drug trials: $10,000 per subject


Undergraduate students: $0 per subject

Inbreeding: The Exponential BS (bogus science) Theory

[Diagram: a branching tree along a time axis from 0 to 4. BS0 is published at time 0, followed by two BS1 publications, four BS2, and eight BS3.]

Comparison


Hard Science: Spend a winter collecting
and analyzing fungus from caves in
Northern Alaska



Soft Science: Ask students to pretend
they are fungus in caves in Northern
Alaska

Survey data on undergraduate students

Survey data on undergraduate students

analyzed via complex statistical model

Conclusions

Let’s move towards harder science:


Work harder to get relevant data


Use more real measures, fewer fictional ones


Use more models that


Predict reality


Have real constructs


Are prescriptive


Are falsifiable


Use more external validation