Neural Networks and Logistic Regression

Lucila Ohno-Machado
Decision Systems Group
Brigham and Women's Hospital
Department of Radiology
Outline
• Examples, neuroscience analogy
• Perceptrons, MLPs: how they work
• How the networks learn from examples
• Backpropagation algorithm
• Learning parameters
• Overfitting
Examples in Medical Pattern Recognition

Diagnosis
• Protein Structure Prediction
• Diagnosis of Giant Cell Arteritis
• Diagnosis of Myocardial Infarction
• Interpretation of ECGs
• Interpretation of PET scans, Chest X-rays

Prognosis
• Prognosis of Breast Cancer
• Outcomes After Spinal Cord Injury
Myocardial Infarction Network

[Figure: a network mapping the inputs Male = 1, Age = 50, Smoker = 1,
ECG: ST Elevation = 1, Pain Intensity = 2, Pain Duration = 4 to an
output of 0.8, the "Probability" of MI.]
Abdominal Pain Perceptron

[Figure: a perceptron with adjustable weights mapping the inputs
Male = 1, Age = 20, Temp = 37, WBC = 10, Pain Intensity = 1,
Pain Duration = 1 to one output unit per diagnosis:
Appendicitis = 0, Diverticulitis = 1, Perforated Duodenal Ulcer = 0,
Non-specific Pain = 0, Cholecystitis = 0, Small Bowel
Obstruction = 0, Pancreatitis = 0.]
Biological Analogy
Perceptrons

[Figure: input units (Cough, Headache) connected by weights to
output units (No disease, Pneumonia, Flu, Meningitis).]

error = what we got − what we wanted
Δ rule: change weights to decrease the error
Perceptrons

Input to unit i: a_i = measured value of variable i
Input to unit j: a_j = Σ_i w_ij a_i
Output of unit j: o_j = 1 / (1 + e^−(a_j + θ_j))

[Figure: input units feeding output units through weights w_ij.]
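The unit equations above can be sketched in Python (a minimal sketch; the function name is mine, not from the slides):

```python
import math

def unit_output(inputs, weights, theta):
    """Output of unit j: o_j = 1 / (1 + e^-(a_j + theta_j)),
    where a_j = sum_i(w_ij * a_i) is the weighted input to the unit."""
    a_j = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(a_j + theta)))
```

With zero net input the unit sits at 0.5, the midpoint of the logistic curve.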
AND

input   output
 00       0
 01       0
 10       0
 11       1

y = f(x1 w1 + x2 w2)

f(a) = 1, for a > θ
       0, for a ≤ θ
θ = 0.5

Some possible values for w1 and w2:

w1      w2
0.20    0.35
0.20    0.40
0.25    0.30
0.40    0.20
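With θ = 0.5 and the step activation, any weight pair from the table reproduces AND; a quick check in Python (function names are mine):

```python
def step(a, theta=0.5):
    """Threshold activation: 1 if a > theta, else 0."""
    return 1 if a > theta else 0

def and_unit(x1, x2, w1=0.20, w2=0.35):
    """Single unit computing AND with one weight pair from the table."""
    return step(x1 * w1 + x2 * w2)
```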
XOR

input   output
 00       0
 01       1
 10       1
 11       0

y = f(x1 w1 + x2 w2)

f(a) = 1, for a > θ
       0, for a ≤ θ
θ = 0.5

Some possible values for w1 and w2:

w1      w2
(none — no single pair of weights can produce XOR)
XOR

input   output
 00       0
 01       1
 10       1
 11       0

f(a) = 1, for a > θ
       0, for a ≤ θ

[Figure: x1 and x2 feed both a hidden unit z and the output unit;
y = f(w1, w2, w3, w4, w5), with θ = 0.5 for both units.]

A possible set of values for the w's:
(w1, w2, w3, w4, w5) = (0.3, 0.3, 1, 1, −2)
XOR

input   output
 00       0
 01       1
 10       1
 11       0

f(a) = 1, for a > θ
       0, for a ≤ θ

[Figure: x1 and x2 feed two hidden units (weights w1..w4), which
feed the output unit (weights w5, w6); y = f(w1, w2, w3, w4, w5, w6),
with θ = 0.5 for all units.]

A possible set of values for the w's:
(w1, w2, w3, w4, w5, w6) = (0.6, −0.6, −0.7, 0.8, 1, 1)
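The single-hidden-unit solution (0.3, 0.3, 1, 1, −2) can be verified in code; the exact wiring of w1..w5 below is my reading of the diagram:

```python
def step(a, theta=0.5):
    """Threshold activation: 1 if a > theta, else 0."""
    return 1 if a > theta else 0

def xor_net(x1, x2):
    """XOR via one hidden unit z, with (w1..w5) = (0.3, 0.3, 1, 1, -2)."""
    z = step(0.3 * x1 + 0.3 * x2)          # fires only for input (1, 1)
    return step(1 * x1 + 1 * x2 - 2 * z)   # z vetoes the (1, 1) case
```

The hidden unit carves out the one corner of the input space that a single linear boundary cannot handle.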
Linear Separation

[Figure: the single-layer abdominal pain perceptron shown earlier;
each output unit can only draw a linear boundary in the input
space.]
Multilayered Perceptrons
Regression vs. Neural Networks

Logistic Regression
• One independent variable:
  f(x) = 1 / (1 + e^−(ax + cte))
• Two:
  f(x) = 1 / (1 + e^−(ax1 + bx2 + cte))

[Figure: the logistic curve f(x), rising from 0 to 1.]
Logistic function

p = 1 / (1 + e^−(ax + cte))
log(p / (1 − p)) = ax + cte

[Figure: log(p / (1 − p)) plotted against x is linear, with slope a.]
Logistic function

p = 1 / (1 + e^−(ax + cte))
log(p / (1 − p)) = ax + cte

The right-hand side is linear: a is the change in log odds (e^a is
the odds ratio) for 1 unit of increase in x.
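A sketch of the logistic function and its logit (function names are mine). Since logit(p) = a·x + cte, increasing x by one unit adds a to the log odds, i.e. multiplies the odds by e^a:

```python
import math

def logistic(x, a, cte):
    """p = 1 / (1 + e^-(a*x + cte))"""
    return 1.0 / (1.0 + math.exp(-(a * x + cte)))

def logit(p):
    """log(p / (1 - p)): linear in x, with slope a."""
    return math.log(p / (1.0 - p))
```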
Jargon Pseudo-Correspondence
• Independent variable = input variable
• Dependent variable = output variable
• Coefficients = "weights"
• Estimates = "targets"
• Cycles = epochs
Logistic Regression Model

[Figure: independent variables x1, x2, x3 (Age = 34, Gender = 1,
Stage = 4) are combined through coefficients a, b, c (.5, .4, .8)
into Σ, which produces the dependent variable p = 0.6, the
prediction: "Probability of being Alive".]

Σ is the sum of inputs * weights
[Figure: the same model, with coefficients .5 (Age), .4 (Gender),
.8 (Stage).]

Σ = 34*.5 + 1*.4 + 4*.8 = 20.6
Logistic function

[Figure: the same model; the output unit passes Σ through the
logistic function to produce p = 0.6, the "Probability of being
Alive".]

p = 1 / (1 + e^−(Σ + cte))
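The worked example, end to end (the intercept cte below is a hypothetical value chosen for illustration; the slide does not give one):

```python
import math

# Slide example: Age = 34, Gender = 1, Stage = 4
inputs = {"age": 34, "gender": 1, "stage": 4}
coefficients = {"age": 0.5, "gender": 0.4, "stage": 0.8}

# Sum of inputs * weights: 34*.5 + 1*.4 + 4*.8 = 20.6
S = sum(inputs[k] * coefficients[k] for k in inputs)

cte = -20.2  # hypothetical intercept, not from the slide
p = 1.0 / (1.0 + math.exp(-(S + cte)))  # ~0.6 with this intercept
```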
Activation Functions...
• Linear
• Threshold or step function
• Logistic, sigmoid, "squash"
• Hyperbolic tangent
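The four activation functions as one-liners (a sketch; `step`'s default θ of 0 is my choice):

```python
import math

def linear(a):
    return a

def step(a, theta=0.0):
    return 1.0 if a > theta else 0.0

def logistic(a):
    """The sigmoid / "squash" function."""
    return 1.0 / (1.0 + math.exp(-a))

def tanh(a):
    return math.tanh(a)
```

The hyperbolic tangent is just a rescaled logistic: tanh(a) = 2·logistic(2a) − 1, mapping to (−1, 1) instead of (0, 1).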
Neural Network Model

[Figure: independent variables (Age = 34, Gender = 2, Stage = 4)
feed a hidden layer through weights (.6, .5, .8, .2, .1, .3, .7,
.2); the hidden units feed the output unit through weights (.4, .2),
producing the dependent variable 0.6, the prediction: "Probability
of being Alive".]
"Combined logistic models"

[Figure: the same network with one hidden unit's path highlighted
(weights .6, .5, .8, .1, .7): each hidden unit is itself a logistic
model of the inputs.]
[Figure: the same network with the other hidden unit's path
highlighted (weights .5, .8, .2, .3, .2).]
[Figure: the full network once more, all weights shown.]

Not really, no target for hidden units...
Hidden Units and Backpropagation
Error Functions
• Mean Squared Error (for most problems)
  Σ (t − o)² / n
• Cross Entropy Error (for dichotomous or binary outcomes)
  − Σ [ t ln o + (1 − t) ln (1 − o) ]
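Both error functions in Python (a minimal sketch; function names are mine):

```python
import math

def mean_squared_error(targets, outputs):
    """sum((t - o)^2) / n"""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def cross_entropy_error(targets, outputs):
    """-sum(t ln o + (1 - t) ln(1 - o)), for binary targets t."""
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for t, o in zip(targets, outputs))
```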
Minimizing the Error

[Figure: an error surface over a weight w. Training moves the weight
from w_initial (initial error) downhill to w_trained (final error);
a positive derivative calls for a negative weight change, and the
search can settle in a local minimum.]
Numerical Methods

[Figure: gradient descent on an error curve with a local minimum and
a global minimum.]
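A minimal sketch of gradient descent for a single logistic output unit minimizing cross-entropy (for that error, the gradient with respect to the net input is simply o − t; backpropagation extends the same chain-rule update to hidden layers). All names here are mine:

```python
import math

def train_logistic_unit(data, lr=0.5, epochs=2000):
    """Gradient descent on the cross-entropy error of one logistic unit.
    data: list of (input_tuple, target) pairs with binary targets."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            o = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            delta = t - o                  # negative gradient w.r.t. the net input
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
    return w, b
```

On a linearly separable problem such as AND, the weights settle where every example falls on the correct side of 0.5.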
Overfitting

[Figure: an overfitted model weaving through the samples of the real
distribution.]

Overfitting

[Figure: error (tss) versus epochs for b = training set and
a = test set; tss_b keeps falling while tss_a turns back up.
Stopping criterion: min(Δ tss) on the test curve.]
Overfitting in Neural Nets

[Figure: CHD versus age; an overfitted model wiggles around the
"real" model. Error versus cycles: training error keeps falling
while holdout error turns upward once the model overfits.]
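The stopping criterion can be sketched as an early-stopping loop (all names are mine; `train_step` and `holdout_error` stand in for a real training routine and holdout-set evaluation):

```python
def train_with_early_stopping(train_step, holdout_error,
                              max_epochs=1000, patience=10):
    """Run train_step() once per epoch; stop when holdout_error()
    has not improved for `patience` consecutive epochs."""
    best_err, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_step()
        err = holdout_error()
        if err < best_err:
            best_err, best_epoch, bad_epochs = err, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                     # holdout error turned upward: stop
    return best_epoch, best_err
```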
Parameter Estimation

Logistic regression
• It models "just" one function
– Maximum likelihood
– Fast
– Optimizations
  • Fisher scoring
  • Newton-Raphson

Neural network
• It models several functions
– Backpropagation
– Iterative
– Slow
– Optimizations
  • Quickprop
  • Scaled conjugate g.d.
  • Adaptive learning rate
What do you want? Insight versus prediction

Insight into the model
• Explain importance of each variable
• Assess model fit to existing data

Accurate predictions
• Make a good estimate of the "real" probability
• Assess model prediction in new data
Model Selection: Finding influential variables

Logistic
• Forward
• Backward
• Stepwise
• Arbitrary
• All combinations
• Relative risk

Neural Network
• Weight elimination
• Automatic Relevance Determination
• "Relevance"
Regression Diagnostics: Finding influential observations

Logistic
• Analysis of residuals
• Cook's distance
• Deviance
• Difference in coefficients when case is left out

Neural Network
• Ad hoc
How accurate are predictions?
• Construct training and test sets or bootstrap to assess
"unbiased" error
• Assess
– Discrimination: how well the model "separates" alive and dead
– Calibration: how close the estimates are to the "real" probability
"Unbiased" Evaluation: Training and Test Sets
• Training set is used to build the model (may include a holdout set
to control for overfitting)
• Test set is left aside for evaluation purposes
• Ideal: yet another validation data set, from a different source, to
test whether the model generalizes to other settings
Small sets: Cross-validation
• Several training and test set pairs are created so that the union
of all test sets corresponds exactly to the original set
• Results from the different models are pooled and overall
performance is estimated
• "Leave-n-out"
• Jackknife
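A sketch of building the k folds (function name is mine); each index appears in exactly one test set, so the union of the test sets is the original set:

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) pairs; the k test folds
    partition the indices 0..n-1."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test
```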
ECG Interpretation
Thyroid Diseases
Time Series
[Figure: a network for time series prediction. Input units
(independent variables): X_n, X_n+1. Hidden units. Output unit
(dependent variable): Y = X_n+2. The weights are the estimated
parameters.]
Time Series

Evaluation

Evaluation: Area Under ROCs

ROC Analysis: Variations
• ROC
• Area under ROC
• Slope and intercept
• Confidence interval
• Wilcoxon statistic
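The area under the ROC curve equals the Wilcoxon (Mann-Whitney) statistic: the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. A direct sketch (function name is mine):

```python
def area_under_roc(pos_scores, neg_scores):
    """AUC as the Wilcoxon statistic; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 1.0 means perfect separation of the two groups; 0.5 means the scores carry no discriminating information.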
Expert Systems and Neural Nets

Model Comparison (personal biases)

                        Modeling   Examples   Explanation
                        Effort     Needed     Provided
Rule-based Exp. Syst.   high       low        high
Bayesian Nets           high       low        moderate
Classification Trees    low        high       "high"
Neural Nets             low        high       low
Regression Models       high       moderate   moderate
Conclusion

Neural Networks are
• mathematical models that resemble nonlinear regression models, but
are also useful to model nonlinearly separable spaces
• "knowledge acquisition tools" that learn from examples

Neural Networks in Medicine are used for:
– pattern recognition (images, diseases, etc.)
– exploratory analysis, control
– predictive models
Conclusion
• No final indication for using either logistic regression or neural
networks
• Try both, select the best
• Make an unbiased evaluation
• Compare statistically
Some References

Introductory Textbooks
• Rumelhart DE, McClelland JL (eds). Parallel Distributed
Processing. MIT Press, Cambridge, 1986.
• Hertz JA, Palmer RG, Krogh AS. Introduction to the Theory of
Neural Computation. Addison-Wesley, Redwood City, 1991.
• Pao YH. Adaptive Pattern Recognition and Neural Networks.
Addison-Wesley, Reading, 1989.
• Reggia JA. Neural computation in medicine. Artificial Intelligence
in Medicine, 1993 Apr, 5(2):143–57.
• Miller AS, Blott BH, Hames TK. Review of neural network
applications in medical imaging and signal processing. Medical and
Biological Engineering and Computing, 1992 Sep, 30(5):449–64.
• Bishop CM. Neural Networks for Pattern Recognition. Clarendon
Press, Oxford, 1995.