Conditional Stereotype Logistic Regression: A new estimation command


Nov 30, 2013 (3 years and 7 months ago)


Rob Woodruff

Battelle Memorial Institute, Health & Analytics

Email: woodruffr@battelle.org

Cynthia Ferre

Centers for Disease Control and Prevention



Overview

- What is it?
  - Stereotype Logistic Regression
  - Conditional on what?
- What's it good for?
- Syntax and Examples

Constrained Multinomial Logistic Regression

Multinomial Model
- Categorical Outcome Variable
- Vector of Explanatory Variables
- Related through the m logits:
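The equation on this slide was an image and did not survive extraction. As a reconstruction in standard notation (the symbols are assumptions: an outcome Y taking values 0, 1, ..., m with 0 as the reference category, and a p-vector x of explanatory variables), the m logits of a multinomial model can be written as:

```latex
\log \frac{P(Y = j \mid x)}{P(Y = 0 \mid x)} = \alpha_j + \beta_j^{\top} x,
\qquad j = 1, \dots, m
```

Each logit has its own intercept and its own p slopes, which agrees with the m(p+1) parameter count quoted for the full multinomial model.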

Constrained Multinomial (continued)

- The stereotype model imposes the constraints:

Note: The phi's are scalar quantities
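The constraint equation itself was lost with the slide image. In the usual formulation of the stereotype model (a reconstruction, using the assumed notation of one common slope vector beta scaled by a scalar phi per outcome category), the constraints and the resulting logits are:

```latex
\beta_j = \phi_j \, \beta, \qquad j = 1, \dots, m
\quad\Longrightarrow\quad
\log \frac{P(Y = j \mid x)}{P(Y = 0 \mid x)} = \alpha_j + \phi_j \, \beta^{\top} x
```

One of the phi's has to be fixed for identifiability. The normalization phi_m = 1 (with the reference category implicitly at 0) is an assumption here; it is consistent with the example output, where only phi1 and phi2 are estimated for a four-level outcome.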

It's all about the phi's

- Full multinomial has m(p+1) parameters
- Stereotype model has m - 1 + m + p = 2m - 1 + p
- The phi parameters give a way to quantify ordinality of the outcome variable. If the phi's are monotonically ordered across the outcome categories, then we have evidence of an ordinal effect.
- Also allow tests of distinguishability of outcome categories
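The parameter counts above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the `intercepts` switch reflects the assumption that intercepts drop out under conditional estimation.

```python
def multinomial_params(m: int, p: int, intercepts: bool = True) -> int:
    """m logits, each with p slopes and (optionally) one intercept."""
    return m * (p + (1 if intercepts else 0))


def stereotype_params(m: int, p: int, intercepts: bool = True) -> int:
    """(Optionally) m intercepts, m - 1 free phi's, and p common slopes."""
    return (m if intercepts else 0) + (m - 1) + p


# Three case categories and three covariates, as in the deck's example.
m, p = 3, 3
print("multinomial:", multinomial_params(m, p))
print("stereotype: ", stereotype_params(m, p))
print("difference without intercepts:",
      multinomial_params(m, p, intercepts=False)
      - stereotype_params(m, p, intercepts=False))
```

With m = 3 and p = 3 and no intercepts, the two models differ by 4 parameters, which lines up with the 4 degrees of freedom reported when the example compares the stereotype fit with the conditional multinomial model.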

So what's the condition?

- The multinomial and stereotype logistic regression models are implemented in Stata by mlogit and slogit
- Both assume independence of observations, which is not true for matched case-control data
- For a matched case-control study, only the matched groups (strata, panels, clusters, etc.) are independent
- For 1:M matching, condition on the stratum total for the outcome variable and focus instead on the conditional likelihood
- Do I have to?
- Why condition on this particular event?

Conditional vs. Unconditional Likelihood
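The formulas on these slides were images and are not reproduced here. As a rough sketch of the idea only, assume 1:M matching with exactly one case per stratum, controls in the reference category (phi = 0), and a case in category j with scale phi_j. Conditioning on the stratum's outcome totals then gives a contribution of the form exp(phi_j * beta'x_case) divided by the sum of exp(phi_j * beta'x_k) over all stratum members, so stratum-level intercepts cancel. The function below is a hypothetical illustration of that expression, not the cstereo implementation.

```python
import math


def conditional_loglik_stratum(phi_case, beta, x_case, x_controls):
    """Log conditional likelihood of one stratum with one case and M controls.

    phi_case   : scalar phi of the case's outcome category (assumed known here)
    beta       : list of common slope coefficients
    x_case     : covariate list for the case
    x_controls : list of covariate lists, one per matched control
    """
    def score(x):
        # phi_j * beta'x for a stratum member placed in the case's category
        return phi_case * sum(b * xi for b, xi in zip(beta, x))

    scores = [score(x_case)] + [score(x) for x in x_controls]
    # log-sum-exp for numerical stability
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[0] - log_denominator


# With beta = 0 every member of a 1:2 stratum is equally likely to be the case.
print(conditional_loglik_stratum(1.0, [0.0], [2.5], [[1.0], [4.0]]))
```

Summing this quantity over strata would give a conditional log likelihood to maximize over beta and the free phi's. Strata with more than one case need the more general sum over all rearrangements of the observed outcomes.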


CSTEREO

cstereo command

Basic syntax:

    . cstereo depvar indepvars [if] [in], group(varname) [options]

Example with Real Data: Preterm Birth and Vitamin D

- 1:2 (some 1:1) pooled, matched case-control study of 2,583 mothers in 870 matched groups
- A case is defined as gestational age at delivery of <37 weeks
- outcome4=3 (<32 weeks), outcome4=2 (32-35 weeks), outcome4=1 (36 weeks) and outcome4=0 (control: 37+ weeks)
- Primary exposure variable of interest: Vitamin D levels, ohd25_total: blood serum concentration of 25(OH)D in ng/ml
- Sample of other covariates measured:
  - edu = 0/1 indicator of post-high school education
  - vitamin = 0/1 indicator of vitamin use during pregnancy

Example Continued (nolog option):

    . cstereo outcome4 ohd25_total edu vitamin, group(matchgroup) nolog
    note: 77 groups (139 obs) dropped because of all positive or
          all negative outcomes.
    Log-Likelihood from Conditional Multinomial Model: -835.83679
    Chi2 value on 4 degrees of freedom: 10.604861
    P-value: .03138281
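The chi-square value and p-value in this output can be reproduced as a likelihood-ratio comparison between the conditional multinomial model and the stereotype model. The sketch below assumes the stereotype log likelihood of -841.13922 reported in the estimation table; the helper name is hypothetical, and the closed-form tail probability is valid only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, even df only.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


ll_multinomial = -835.83679   # conditional multinomial model (from the output)
ll_stereotype = -841.13922    # conditional stereotype model (estimation table)

lr_stat = 2.0 * (ll_multinomial - ll_stereotype)
p_value = chi2_sf_even_df(lr_stat, df=4)
print(f"chi2 = {lr_stat:.5f}, p = {p_value:.5f}")
```

The statistic is twice the gap between the two log likelihoods, referred to a chi-square distribution with 4 degrees of freedom.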

Example Continued:

                                                    Number of obs   =       2322
                                                    Wald chi2(3)    =       1.85
    Log likelihood = -841.13922                     Prob > chi2     =     0.6048

    ------------------------------------------------------------------------------
        outcome4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    xb           |
     ohd25_total |  -.0073684   .0144916    -0.51   0.611    -.0357714    .0210346
             edu |  -.4010391    .431587    -0.93   0.353    -1.246934    .4448559
         vitamin |   .1301369   .1954516     0.67   0.506    -.2529413    .5132151
    -------------+----------------------------------------------------------------
    phi1         |
           _cons |   .8764578   1.268331     0.69   0.490    -1.609424     3.36234
    -------------+----------------------------------------------------------------
    phi2         |
           _cons |   .9398113   1.206139     0.78   0.436    -1.424178      3.3038
    ------------------------------------------------------------------------------

Interpretation of cstereo output:

- Estimated beta coefficient of ohd25_total = -0.0074 with 95% confidence interval (-0.0358, 0.0210)
- Odds ratio of being in the <32 weeks gestational age category compared to control is exp(-0.0074) = 0.993 (0.965, 1.021)
- For the odds ratios of the 32-35 weeks and 36 week case categories, we need the products of the parameters: phi2 × beta and phi1 × beta, respectively
- For standard errors, use the Delta Method via nlcom
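As a small worked example of the arithmetic, the snippet below exponentiates the ohd25_total coefficient and its interval, then forms the phi × beta products for the other two case categories. The coefficient values are copied from the estimation output. Fixing phi = 1 for the <32 weeks category is an assumption inferred from the way exp(beta) is read as that category's odds ratio. Only point estimates are computed for the products; their standard errors need the Delta Method.

```python
import math

# ohd25_total coefficient and 95% interval from the estimation output
beta, beta_lo, beta_hi = -0.0073684, -0.0357714, 0.0210346

# Estimated scale parameters; phi for "<32 weeks" is assumed fixed at 1
phi = {"36 weeks": 0.8764578, "32-35 weeks": 0.9398113, "<32 weeks": 1.0}

# Odds ratio (with interval) for the <32 weeks category versus controls
or_lt32 = tuple(round(math.exp(v), 3) for v in (beta, beta_lo, beta_hi))
print("<32 weeks OR and 95% CI:", or_lt32)

# Point estimates of the odds ratios for every case category: exp(phi_j * beta)
or_by_category = {name: math.exp(scale * beta) for name, scale in phi.items()}
for name, value in or_by_category.items():
    print(f"{name}: OR = {value:.4f}")
```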

Interpretation continued:

    . nlcom [xb]ohd25_total*[phi2]_cons

           _nl_1:  [xb]ohd25_total*[phi2]_cons

    ------------------------------------------------------------------------------
        outcome4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |  -.0069249   .0072757    -0.95   0.341     -.021185    .0073351
    ------------------------------------------------------------------------------

Exponentiating the nlcom estimate and its interval endpoints gives the odds ratio of being in the 32-35 weeks case category compared to controls: 0.993 with a 95% C.I. of (0.979, 1.007)

Constraints:

- Are the 36 week and 32-35 weeks case categories distinguishable?

    . constraint 1 [phi1]_cons=[phi2]_cons
    . cstereo outcome4 ohd25_total edu vitamin, group(matchgroup) nolog constraints(1)
    note: 77 groups (139 obs) dropped because of all positive or
          all negative outcomes.

Constraint Output

                                                    Number of obs   =       2322
                                                    Wald chi2(3)    =       1.73
    Log likelihood =  -841.1454                     Prob > chi2     =     0.6293

     ( 1)  [phi1]_cons - [phi2]_cons = 0
    ------------------------------------------------------------------------------
        outcome4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    xb           |
     ohd25_total |  -.0068382    .013172    -0.52   0.604    -.0326548    .0189784
             edu |  -.3909924   .4289348    -0.91   0.362    -1.231689    .4497043
         vitamin |   .1294806   .1888154     0.69   0.493    -.2405908    .4995519
    -------------+----------------------------------------------------------------
    phi1         |
           _cons |   .9417836    1.24291     0.76   0.449    -1.494276    3.377843
    -------------+----------------------------------------------------------------
    phi2         |
           _cons |   .9417836    1.24291     0.76   0.449    -1.494276    3.377843
    ------------------------------------------------------------------------------

Constraint Output

- The log-likelihood from the constrained model is -841.145 compared to -841.139 for the unconstrained stereotype model
- Difference of 0.006 gives a chi2 value of 0.012 on 1 degree of freedom
- P-value = 0.91
- The unconstrained stereotype model does not fit significantly better than the constrained model, so the two case categories are indistinguishable
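The test above is a likelihood-ratio test with 1 degree of freedom, and its arithmetic can be checked directly. The sketch uses the two log likelihoods from the estimation outputs and the identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x/2)); the function name is hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


ll_unconstrained = -841.13922  # stereotype model with phi1 and phi2 free
ll_constrained = -841.1454     # stereotype model with phi1 = phi2

lr_stat = 2.0 * (ll_unconstrained - ll_constrained)
p_value = chi2_sf_1df(lr_stat)
print(f"chi2 = {lr_stat:.3f}, p = {p_value:.2f}")
```

A large p-value here means the data give no evidence against phi1 = phi2, which is the sense in which the 36 week and 32-35 weeks categories are called indistinguishable.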

Relationship to Other Models for Ordered/Categorical Outcomes

- Constrained multinomial model
- Not as parsimonious as the proportional odds model (ologit), but the proportional odds model is not valid under outcome-dependent sampling
- The adjacent category model is (basically) a constrained stereotype model, and it is also valid under outcome-dependent sampling

Limitations

- Convergence issues
- Currently only a one-dimensional stereotype model
- Cannot currently force an ordering on the stereotype parameters
- Additional dependence structure


References:

- Ferre C, et al.; Maternal 25-Hydroxyvitamin D Status and the Risk of Preterm Delivery: A Multi-Center Nested Case Control Study; preprint
- Mukherjee B, Liu I, Sinha S; Analysis of Matched Case-Control Data with Multiple Ordered Disease States; Statistics in Medicine 2007
- Ahn J, et al.; Missing Exposure Data in Stereotype Regression Model; Biometrics 2011
- Andersen EB; Asymptotic Properties of Conditional Maximum-Likelihood Estimators; Journal of the Royal Statistical Society 1970
- Liang KY, Stewart WF; Polychotomous Logistic Regression Methods for Matched Case-Control Studies with Multiple Case or Control Groups; American Journal of Epidemiology 1987
- Scott AJ, Wild CJ; Fitting Regression Models to Case-Control Data by Maximum Likelihood; Biometrika 1997
- Anderson JA; Regression and Ordered Categorical Variables; Journal of the Royal Statistical Society 1984
- Greenland S; Alternative Models for Ordinal Logistic Regression; Statistics in Medicine 1994