# Neural Networks in Medicine - Center for Evidence-Based Imaging ...

Oct 19, 2013

## Neural Networks and Logistic Regression

Lucila Ohno-Machado

Decision Systems Group
Brigham and Women's Hospital

## Outline

- Examples, neuroscience analogy
- Perceptrons, MLPs: how they work
- How the networks learn from examples
- Backpropagation algorithm
- Learning parameters
- Overfitting

## Examples in Medical Pattern Recognition

- Diagnosis
  - Diagnosis of Giant Cell Arteritis
  - Diagnosis of Myocardial Infarction
  - Interpretation of ECGs
  - Interpretation of PET scans, chest X-rays
- Prognosis
  - Prognosis of Breast Cancer
  - Outcomes After Spinal Cord Injury

## Myocardial Infarction Network

[Figure: a network mapping six inputs, Male, Age, Smoker, ECG: ST Elevation, Pain Intensity, and Pain Duration (example values 1, 50, 1, 1, 4, 2), to a single output unit whose activation of 0.8 is read as the "probability" of MI.]

## Abdominal Pain Perceptron

[Figure: a one-layer network. Input units: Male (1), Age (20), Temp (37), WBC (10), Pain Intensity (1), Pain Duration (1). A layer of weights connects them to output units for Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, and Pancreatitis; the output vector shown is 0 1 0 0 0 0 0, so a single diagnosis fires.]

## Biological Analogy

## Perceptrons

[Figure: input units (e.g., Cough) connected by a layer of weights to output units: No disease, Pneumonia, Flu, Meningitis.]

error = what we wanted - what we got

Δ rule: change the weights to decrease the error

## Perceptrons

Input to unit i: a_i = measured value of variable i

Input to unit j: a_j = Σ_i w_ij a_i

Output of unit j: o_j = 1 / (1 + e^-(a_j + θ_j))
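In code, a unit is just a weighted sum pushed through the sigmoid above. A minimal sketch; the input values, weights, and threshold are made up for illustration:

```python
import math

def unit_output(inputs, weights, theta):
    # a_j = sum_i w_ij * a_i ; o_j = 1 / (1 + e^-(a_j + theta_j))
    a_j = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(a_j + theta)))

# Illustrative values only:
o = unit_output([1.0, 0.5], [0.4, 0.2], theta=-0.1)
print(round(o, 4))  # 0.5987
```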

## AND

| x1 x2 | output |
|-------|--------|
| 0 0   | 0      |
| 0 1   | 0      |
| 1 0   | 0      |
| 1 1   | 1      |

y = f(x1 w1 + x2 w2), with θ = 0.5 and

f(a) = 1 for a > θ; 0 for a ≤ θ

The four rows impose four constraints:

f(0w1 + 0w2) = 0
f(0w1 + 1w2) = 0
f(1w1 + 0w2) = 0
f(1w1 + 1w2) = 1

Some possible values for w1 and w2:

| w1   | w2   |
|------|------|
| 0.20 | 0.35 |
| 0.20 | 0.40 |
| 0.25 | 0.30 |
| 0.40 | 0.20 |
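Each candidate weight pair can be checked against the truth table directly. A small sketch, assuming the slide's two weight columns pair up as (w1, w2) = (0.20, 0.35), (0.20, 0.40), (0.25, 0.30), (0.40, 0.20):

```python
def f(a, theta=0.5):
    # Step function from the slide: 1 for a > theta, 0 otherwise
    return 1 if a > theta else 0

pairs = [(0.20, 0.35), (0.20, 0.40), (0.25, 0.30), (0.40, 0.20)]
for w1, w2 in pairs:
    out = [f(x1 * w1 + x2 * w2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
    print((w1, w2), out)  # every pair reproduces AND: [0, 0, 0, 1]
```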

## XOR

| x1 x2 | output |
|-------|--------|
| 0 0   | 0      |
| 0 1   | 1      |
| 1 0   | 1      |
| 1 1   | 0      |

y = f(x1 w1 + x2 w2), with θ = 0.5 and

f(a) = 1 for a > θ; 0 for a ≤ θ

The truth table now demands:

f(0w1 + 0w2) = 0
f(0w1 + 1w2) = 1
f(1w1 + 0w2) = 1
f(1w1 + 1w2) = 0

Some possible values for w1 and w2:

| w1 | w2 | θ |
|----|----|---|
|    |    |   |

The table stays empty: rows 2 and 3 require w1 > 0.5 and w2 > 0.5, but row 4 requires w1 + w2 ≤ 0.5, so no single unit can compute XOR.
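The empty table is no accident. A brute-force search over a grid of weight pairs (an illustrative range of -2 to 2 in steps of 0.01) finds nothing that reproduces XOR with one threshold unit:

```python
def f(a, theta=0.5):
    return 1 if a > theta else 0

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 100 for i in range(-200, 201)]

# Keep every (w1, w2) that matches all four rows of the truth table
solutions = [(w1, w2) for w1 in grid for w2 in grid
             if all(f(x1 * w1 + x2 * w2) == y for (x1, x2), y in xor.items())]
print(len(solutions))  # 0
```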

## XOR

| x1 x2 | output |
|-------|--------|
| 0 0   | 0      |
| 0 1   | 1      |
| 1 0   | 1      |
| 1 1   | 0      |

[Figure: a hidden unit z is added. x1 and x2 feed z through w1 and w2, feed the output y directly through w3 and w4, and z feeds y through w5. θ = 0.5 for all units; f(a) = 1 for a > θ, 0 for a ≤ θ.]

The output is now a function f(w1, w2, w3, w4, w5) of five weights. A possible set of values for the w's:

(w1, w2, w3, w4, w5) = (0.3, 0.3, 1, 1, -2)

## XOR

| x1 x2 | output |
|-------|--------|
| 0 0   | 0      |
| 0 1   | 1      |
| 1 0   | 1      |
| 1 1   | 0      |

[Figure: two hidden units. x1 and x2 feed the first hidden unit through w1 and w2, and the second through w3 and w4; the hidden units feed the output through w5 and w6. θ = 0.5 for all units; f(a) = 1 for a > θ, 0 for a ≤ θ.]

The output is a function f(w1, w2, w3, w4, w5, w6) of six weights. A possible set of values for the w's:

(w1, w2, w3, w4, w5, w6) = (0.6, -0.6, -0.7, 0.8, 1, 1)
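The six weights can be checked in a few lines. A sketch assuming the wiring above, with w1, w2 and w3, w4 connecting the inputs to the two hidden units, and w5, w6 connecting the hidden units to the output:

```python
def f(a, theta=0.5):
    return 1 if a > theta else 0

def xor_net(x1, x2):
    w1, w2, w3, w4, w5, w6 = 0.6, -0.6, -0.7, 0.8, 1, 1
    h1 = f(x1 * w1 + x2 * w2)   # first hidden unit
    h2 = f(x1 * w3 + x2 * w4)   # second hidden unit
    return f(h1 * w5 + h2 * w6)

print([xor_net(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```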

## Linear Separation

[Figure: the abdominal-pain perceptron again, with inputs Male (1), Age (20), Temp (37), WBC (10), Pain Intensity (1), Pain Duration (1) and output vector 0 1 0 0 0 0 0 over Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, and Pancreatitis. A single layer of weights can only draw linear decision boundaries between these classes.]

## Multilayered Perceptrons

## Regression vs. Neural Networks

Logistic Regression

One independent variable:

f(x) = 1 / (1 + e^-(ax + cte))

Two:

f(x) = 1 / (1 + e^-(ax1 + bx2 + cte))

[Figure: the logistic function rises from 0 to 1 as x increases.]

Logistic function:

p = 1 / (1 + e^-(ax + cte))

log(p / (1 - p)) = ax + cte

[Figure: log(p/(1-p)) plotted against x is a straight line with slope a.]

The logit is linear in x: a is the increase in log odds (e^a the odds ratio) for a 1-unit increase in x.
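The logit identity can be checked numerically: log(p/(1-p)) recovers the linear predictor, and e^a is the odds ratio per unit of x. A sketch with illustrative values a = 0.8, cte = -1.5:

```python
import math

a, cte = 0.8, -1.5   # illustrative coefficient and constant

def p(x):
    # p = 1 / (1 + e^-(ax + cte))
    return 1.0 / (1.0 + math.exp(-(a * x + cte)))

def logit(q):
    return math.log(q / (1.0 - q))

print(round(logit(p(2.0)), 6))        # 0.1, i.e. a*2 + cte: the logit is linear

odds = lambda x: p(x) / (1.0 - p(x))
print(round(odds(1.0) / odds(0.0), 4))  # 2.2255, i.e. e^0.8: the odds ratio
```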

## Jargon Pseudo-Correspondence

- Independent variable = input variable
- Dependent variable = output variable
- Coefficients = "weights"
- Estimates = "targets"
- Cycles = epochs

## Logistic Regression Model

[Figure: independent variables (Age = 34, Gender = 1, Stage = 4) are multiplied by coefficients a, b, c (.5, .4, .8) and summed at Σ; the output 0.6 is the dependent variable p, the predicted "probability of being alive".]

Σ is the sum of inputs × weights.

[Figure: the same model with the coefficients filled in.]

Σ = 34*.5 + 1*.4 + 4*.8 = 20.6

## Logistic function

[Figure: inputs Age = 34, Gender = 1, Stage = 4 with coefficients .5, .4, .8 feed the Σ unit; the output 0.6 is the predicted "probability of being alive".]

p = 1 / (1 + e^-(Σ + cte))
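The two steps above in code: form Σ, then squash Σ plus a constant. The slide does not give the intercept; the value below is illustrative, chosen so the sketch reproduces the displayed output of 0.6:

```python
import math

values       = {"age": 34, "gender": 1, "stage": 4}
coefficients = {"age": 0.5, "gender": 0.4, "stage": 0.8}

# Sigma is the sum of inputs * weights
S = sum(values[k] * coefficients[k] for k in values)
print(round(S, 1))  # 20.6

cte = -20.2   # illustrative intercept, not from the slide
p = 1.0 / (1.0 + math.exp(-(S + cte)))
print(round(p, 1))  # 0.6
```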

## Activation Functions...

- Linear
- Threshold or step function
- Logistic, sigmoid, "squash"
- Hyperbolic tangent
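The four activation functions on this list, sketched side by side:

```python
import math

def linear(a):
    return a

def step(a, theta=0.5):
    return 1 if a > theta else 0

def logistic(a):            # sigmoid / "squash", ranges over (0, 1)
    return 1.0 / (1.0 + math.exp(-a))

def tanh_act(a):            # hyperbolic tangent, ranges over (-1, 1)
    return math.tanh(a)

for g in (linear, step, logistic, tanh_act):
    print(g.__name__, round(g(0.4), 4))
```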

## Neural Network Model

[Figure: inputs Age = 34, Gender = 2, Stage = 4 feed a hidden layer of Σ units through a first set of weights (.6, .5, .8, .2, .1, .3, .7, .2); the hidden units feed the output Σ unit through a second set (.4, .2); the output 0.6 is the predicted "probability of being alive".]

## "Combined logistic models"

[Figure: the same network redrawn three times, each time highlighting the weights feeding one unit. Taken alone, each hidden unit, and the output unit, looks like a logistic model of its inputs.]

Not really: there is no target for the hidden units...


## Hidden Units and Backpropagation

## Error Functions

Mean Squared Error (for most problems):

Σ (t - o)² / n

Cross-Entropy Error (for dichotomous or binary outcomes):

-Σ [ t ln(o) + (1 - t) ln(1 - o) ]
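Both error functions in code, over a batch of targets t and outputs o (the batch values are made up for illustration):

```python
import math

def mse(targets, outputs):
    # Sum of (t - o)^2 over the cases, divided by n
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def cross_entropy(targets, outputs):
    # -Sum[ t*ln(o) + (1 - t)*ln(1 - o) ] for binary targets
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for t, o in zip(targets, outputs))

t, o = [1, 0, 1], [0.9, 0.2, 0.8]
print(round(mse(t, o), 4))            # 0.03
print(round(cross_entropy(t, o), 4))  # 0.5516
```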

## Minimizing the Error

[Figure: an error surface over the weight w, with w_initial at the initial error and w_trained at the final error. Where the derivative is negative the weight change is positive, and vice versa; the descent can end in a local minimum.]
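The descent itself is a one-line update: move the weight against the sign of the derivative. A sketch on a made-up one-weight error surface E(w) = (w - 2)²:

```python
def dE(w):
    # Derivative of the illustrative error surface E(w) = (w - 2)^2
    return 2.0 * (w - 2.0)

w = -1.0                 # w_initial
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * dE(w)   # positive derivative -> negative change
print(round(w, 4))  # 2.0, i.e. w_trained sits at the minimum
```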

## Numerical Methods

[Figure: an error curve with both a local minimum and the global minimum; gradient-based methods can get trapped in the former.]

## Overfitting

[Figure: an overfitted model snakes through every data point, while the real distribution is a smooth curve.]

## Overfitting

[Figure: total sum of squares (tss) against training epochs, with b = training set and a = test set. The training curve tss_b keeps falling; the test curve tss_a falls and then rises as the model becomes overfitted. Stopping criterion: min(Δtss).]

## Overfitting in Neural Nets

[Figure, left: CHD against age; the overfitted model wiggles around the smooth "real" model. Figure, right: error against training cycles; training error keeps falling while holdout error turns upward where the model becomes overfitted.]
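The stopping rule implied by those curves is easy to state in code: keep the epoch where the holdout error bottoms out. The two error curves below are fabricated to mimic the figure (training error keeps falling, holdout error is U-shaped):

```python
epochs = range(20)
training_error = [1.0 / (e + 1) for e in epochs]               # keeps decreasing
holdout_error  = [0.5 + (e - 8) ** 2 / 100.0 for e in epochs]  # falls, then rises

# Early stopping: halt where the holdout error is lowest
best_epoch = min(epochs, key=lambda e: holdout_error[e])
print(best_epoch)  # 8
```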

## Parameter Estimation

Logistic regression

- Models "just" one function
- Maximum likelihood
- Fast
- Optimizations: Fisher scoring, Newton-Raphson

Neural network

- Models several functions
- Backpropagation
- Iterative, slow
- Optimizations: Quickprop, scaled conjugate gradient descent

## What do you want? Insight versus prediction

Insight into the model

- Explain the importance of each variable
- Assess model fit to existing data

Accurate predictions

- Make a good estimate of the "real" probability
- Assess model prediction in new data

## Model Selection

Finding influential variables

Logistic

- Forward
- Backward
- Stepwise
- Arbitrary
- All combinations
- Relative risk

Neural Network

- Weight elimination
- Automatic Relevance Determination
- "Relevance"

## Regression Diagnostics

Finding influential observations

Logistic

- Analysis of residuals
- Cook's distance
- Deviance
- Difference in coefficients when a case is left out

Neural Network

- Ad hoc

## How accurate are predictions?

- Construct training and test sets, or bootstrap, to assess "unbiased" error
- Assess:
  - Discrimination: how well the model "separates" alive and dead
  - Calibration: how close the estimates are to the "real" probability

## "Unbiased" Evaluation

Training and test sets

- The training set is used to build the model (it may include a holdout set to control for overfitting)
- The test set is left aside for evaluation purposes
- Ideal: yet another validation data set, from a different source, to test whether the model generalizes to other settings

Small sets: cross-validation

- Several training and test set pairs are created so that the union of all test sets corresponds exactly to the original set
- Results from the different models are pooled and overall performance is estimated
- "Leave-n-out", jackknife
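A sketch of the fold construction, making the union of all test sets exactly the original set:

```python
def k_fold_splits(n_cases, k):
    # Deal the case indices into k disjoint folds; each fold serves once
    # as the test set while the remaining folds form the training set.
    folds = [list(range(i, n_cases, k)) for i in range(k)]
    return [([x for j, f in enumerate(folds) if j != i for x in f], folds[i])
            for i in range(k)]

splits = k_fold_splits(10, 5)
tested = sorted(x for _, test in splits for x in test)
print(tested)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Every case appears in exactly one test fold, so pooling the fold results evaluates the whole original set.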

## ECG Interpretation

[Figure]

## Thyroid Diseases

[Figure]

## Time Series

[Figure: input units X_n and X_n+1 (the independent variables) feed hidden units through weights (the estimated parameters); the output unit (the dependent variable) predicts Y = X_n+2.]
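The figure's setup turns a series into supervised cases by sliding a window over it. A sketch using a short made-up series:

```python
def sliding_windows(series, width=2):
    # Each case: (X_n, ..., X_n+width-1) as inputs, X_n+width as the target
    return [(tuple(series[n:n + width]), series[n + width])
            for n in range(len(series) - width)]

series = [3, 5, 4, 6, 5, 7]
for inputs, target in sliding_windows(series):
    print(inputs, "->", target)
```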

## Evaluation

## Evaluation: Area Under ROCs

[Figure: ROC curves.]

## ROC Analysis: Variations

- ROC
- Area under the ROC
- Slope and intercept
- Confidence interval
- Wilcoxon statistic
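The "Wilcoxon statistic" entry reflects a useful identity: the area under the ROC curve equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counting one half. A sketch with made-up scores:

```python
def auc(pos_scores, neg_scores):
    # Wilcoxon form of the area under the ROC curve
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))  # 8 of 9 pairs ordered correctly
```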

## Expert Systems and Neural Nets

## Model Comparison (personal biases)

| Model                 | Modeling Effort | Examples Needed | Explanation Provided |
|-----------------------|-----------------|-----------------|----------------------|
| Rule-based Exp. Syst. | high            | low             | high                 |
| Bayesian Nets         | high            | low             | moderate             |
| Classification Trees  | low             | high            | "high"               |
| Neural Nets           | low             | high            | low                  |
| Regression Models     | high            | moderate        | moderate             |

## Conclusion

Neural networks are:

- mathematical models that resemble nonlinear regression models, but are also useful for modeling nonlinearly separable spaces
- "knowledge acquisition tools" that learn from examples

Neural networks in medicine are used for:

- pattern recognition (images, diseases, etc.)
- exploratory analysis, control
- predictive models

## Conclusion

- There is no definitive indication for using either logistic regression or neural networks
- Try both, select the best
- Make an unbiased evaluation
- Compare statistically

## Some References

Introductory Textbooks

- Rumelhart DE, McClelland JL (eds). Parallel Distributed Processing. MIT Press, Cambridge, 1986.
- Hertz JA, Palmer RG, Krogh AS. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.
- Pao YH. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, 1989.
- Reggia JA. Neural computation in medicine. Artificial Intelligence in Medicine, 1993 Apr, 5(2):143-157.
- Miller AS, Blott BH, Hames TK. Review of neural network applications in medical imaging and signal processing. Medical and Biological Engineering and Computing, 1992 Sep, 30(5):449-464.
- Bishop CM. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.