Neural Networks and Logistic Regression

Lucila Ohno-Machado

Decision Systems Group
Brigham and Women’s Hospital
Department of Radiology

Outline

- Examples, neuroscience analogy
- Perceptrons, MLPs: How they work
- How the networks learn from examples
- Backpropagation algorithm
- Learning parameters
- Overfitting

Examples in Medical Pattern Recognition

Diagnosis
- Protein Structure Prediction
- Diagnosis of Giant Cell Arteritis
- Diagnosis of Myocardial Infarction
- Interpretation of ECGs
- Interpretation of PET scans, Chest X-rays

Prognosis
- Prognosis of Breast Cancer
- Outcomes After Spinal Cord Injury

Myocardial Infarction Network

[Figure: a network whose input units are Male (1), Age (50), Smoker (1), ECG: ST Elevation (1), Pain Intensity (4), and Pain Duration (2), and whose output unit gives a “probability” of MI of 0.8 for this patient.]

Abdominal Pain Perceptron

[Figure: a perceptron with input units Male (1), Age (20), Temp (37), WBC (10), Pain Intensity (1), and Pain Duration (1), connected by adjustable weights to output units for Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, and Pancreatitis; for this patient one output unit is 1 and the others are 0.]

Biological Analogy

[Figure: biological neurons and their connections, as the inspiration for artificial units and weights.]

Perceptrons

[Figure: a perceptron with input units Cough and Headache connected through weights to output units No disease, Pneumonia, Flu, and Meningitis.]

error: the difference between what we got and what we wanted

Δ rule: change the weights to decrease the error

Perceptrons

[Figure: input units feeding output units through weights w_ij.]

Input to unit i: a_i = the measured value of variable i

Input to unit j: a_j = Σ_i w_ij a_i

Output of unit j: o_j = 1 / (1 + e^-(a_j + θ_j))
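As a concrete illustration (not from the original slides), here is a minimal sketch of one such unit in Python; the names inputs, weights, and theta, and the example values, are illustrative.

import math

def unit_output(inputs, weights, theta):
    """Output of unit j: o_j = 1 / (1 + e^-(a_j + theta_j)),
    where a_j is the weighted sum of the incoming activations."""
    a_j = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(a_j + theta)))

# Example: three inputs with arbitrary weights and bias theta = 0.5
print(unit_output([1.0, 50.0, 4.0], [0.1, 0.01, 0.2], 0.5))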

AND

A single unit computes y = f(x1w1 + x2w2), with threshold θ = 0.5 and
f(a) = 1 for a > θ, 0 for a ≤ θ.

input    output
0 0      0
0 1      0
1 0      0
1 1      1

f(0w1 + 0w2) = 0
f(0w1 + 1w2) = 0
f(1w1 + 0w2) = 0
f(1w1 + 1w2) = 1

Some possible values for w1 and w2 (θ = 0.5):

w1      w2
0.20    0.35
0.20    0.40
0.25    0.30
0.40    0.20
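Reading the weight table column-wise (an assumption about how the slide’s table was laid out), each pair can be checked against the AND truth table with a short sketch:

def step(a, theta=0.5):
    # f(a) = 1 for a > theta, 0 otherwise
    return 1 if a > theta else 0

for w1, w2 in [(0.20, 0.35), (0.20, 0.40), (0.25, 0.30), (0.40, 0.20)]:
    outputs = [step(x1 * w1 + x2 * w2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
    print((w1, w2), outputs)   # each prints [0, 0, 0, 1], i.e. AND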

XOR

A single unit computes y = f(x1w1 + x2w2), with threshold θ = 0.5 and
f(a) = 1 for a > θ, 0 for a ≤ θ.

input    output
0 0      0
0 1      1
1 0      1
1 1      0

f(0w1 + 0w2) = 0
f(0w1 + 1w2) = 1
f(1w1 + 0w2) = 1
f(1w1 + 1w2) = 0

Some possible values for w1 and w2 (θ = 0.5):

w1      w2
(none: no pair of weights satisfies all four constraints, so a single unit cannot compute XOR)

XOR

input    output
0 0      0
0 1      1
1 0      1
1 1      0

f(a) = 1 for a > θ, 0 for a ≤ θ, with θ = 0.5 for every unit.

[Figure: a network for XOR with inputs x1 and x2, a hidden unit z, and an output y; the inputs connect both to z and directly to y, giving five weights in all: y = f(w1, w2, w3, w4, w5).]

A possible set of values for the ws:
(w1, w2, w3, w4, w5) = (0.3, 0.3, 1, 1, -2)
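The figure’s exact wiring is not recoverable from the extracted text, but one assignment consistent with the weight set (an assumption: w1 and w2 feed the hidden unit z, w3 and w4 connect the inputs directly to the output, and w5 connects z to the output, all thresholds 0.5) does reproduce XOR:

def step(a, theta=0.5):
    return 1 if a > theta else 0

w1, w2, w3, w4, w5 = 0.3, 0.3, 1, 1, -2

def xor_net(x1, x2):
    z = step(x1 * w1 + x2 * w2)              # hidden unit fires only when both inputs are on
    return step(x1 * w3 + x2 * w4 + z * w5)  # direct connections, inhibited by z

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))   # 0, 1, 1, 0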

XOR

input    output
0 0      0
0 1      1
1 0      1
1 1      0

f(a) = 1 for a > θ, 0 for a ≤ θ, with θ = 0.5 for all units.

[Figure: a network for XOR with inputs x1 and x2, two hidden units, and an output unit, connected by six weights: y = f(w1, w2, w3, w4, w5, w6).]

A possible set of values for the ws:
(w1, w2, w3, w4, w5, w6) = (0.6, -0.6, -0.7, 0.8, 1, 1)
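Likewise for the two-hidden-unit version; the mapping of w1...w6 to connections below is an assumption, chosen to be consistent with the weight set shown:

def step(a, theta=0.5):
    return 1 if a > theta else 0

w1, w2, w3, w4, w5, w6 = 0.6, -0.6, -0.7, 0.8, 1, 1

def xor_mlp(x1, x2):
    h1 = step(x1 * w1 + x2 * w3)   # responds to x1 alone
    h2 = step(x1 * w2 + x2 * w4)   # responds to x2 alone
    return step(h1 * w5 + h2 * w6)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))   # 0, 1, 1, 0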

Linear Separation

[Figure: a single layer of threshold units separates classes with a straight line (hyperplane) in the input space; XOR is not linearly separable.]

Abdominal Pain

[Figure: the abdominal pain network again: inputs Male (1), Age (20), Temp (37), WBC (10), Pain Intensity (1), and Pain Duration (1), adjustable weights, and output units for Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, and Pancreatitis; one output unit is 1 and the rest are 0.]

Multilayered Perceptrons

Regression vs. Neural Networks

Logistic Regression

One independent variable:
f(x) = 1 / (1 + e^-(ax + cte))

Two independent variables:
f(x) = 1 / (1 + e^-(ax1 + bx2 + cte))

Logistic function:

p = 1 / (1 + e^-(ax + cte))

log(p / (1 - p)) = ax + cte

[Figure: f(x) rises smoothly from 0 to 1 as x increases, while log(p/(1-p)) plotted against x is a straight line with slope a.]

The log-odds is linear in x: a is the change in the log-odds for 1 unit of increase in x, so e^a is the corresponding odds ratio.
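To see the odds-ratio interpretation numerically, here is a small sketch with illustrative values of a and cte (not from the slides):

import math

a, cte = 0.7, -2.0   # illustrative coefficient and intercept

def p(x):
    return 1.0 / (1.0 + math.exp(-(a * x + cte)))

def odds(x):
    return p(x) / (1.0 - p(x))

x = 3.0
print(odds(x + 1) / odds(x), math.exp(a))   # both equal e^a, about 2.01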

Jargon Pseudo-Correspondence

- Independent variable = input variable
- Dependent variable = output variable
- Coefficients = “weights”
- Estimates = “targets”
- Cycles = epochs

Logistic Regression Model

[Figure: the independent variables x1, x2, x3 (Age = 34, Gender = 1, Stage = 4) are multiplied by the coefficients a, b, c (.5, .4, .8) and summed (Σ); the dependent variable p is the output, here a “probability of being alive” of 0.6.]

Σ is the sum of inputs * coefficients (“weights”).

[Figure: the same model, with the weighted sum written out.]

Σ = 34*.5 + 1*.4 + 4*.8 = 20.6

Logistic function

[Figure: the same model; the weighted sum Σ is passed through the logistic function to produce the output, the “probability of being alive” (0.6).]

p = 1 / (1 + e^-(Σ + cte))
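Putting the last two slides together, a minimal sketch that reproduces Σ = 20.6 and then applies the logistic function; the intercept cte is not given on the slide, so the value below is an assumption chosen so the output comes out near the slide’s 0.6:

import math

age, gender, stage = 34, 1, 4
w_age, w_gender, w_stage = 0.5, 0.4, 0.8

S = age * w_age + gender * w_gender + stage * w_stage
print(S)   # 20.6, as on the slide

cte = -20.2   # assumed intercept, not given on the slide
p = 1.0 / (1.0 + math.exp(-(S + cte)))
print(p)   # about 0.6, the "probability of being alive" for this patient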


Activation Functions...

- Linear
- Threshold or step function
- Logistic, sigmoid, “squash”
- Hyperbolic tangent
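For reference, the four activation functions as a small sketch:

import math

def linear(a):
    return a

def step(a, theta=0.5):
    return 1 if a > theta else 0

def logistic(a):            # sigmoid / "squash"
    return 1.0 / (1.0 + math.exp(-a))

def tanh(a):                # hyperbolic tangent, ranges from -1 to 1
    return math.tanh(a)

for a in (-2.0, 0.0, 2.0):
    print(a, linear(a), step(a), logistic(a), tanh(a))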


Neural Network Model

[Figure: the independent variables (Age = 34, Gender = 2, Stage = 4) feed a hidden layer of Σ units through a first set of weights (.6, .5, .8, .2, .1, .3, .7, .2); the hidden units feed the output Σ unit through a second set of weights (.4, .2); the dependent variable is the prediction, a “probability of being alive” of 0.6.]
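A minimal forward pass for a network shaped like the one in the figure (three inputs, two hidden units, one output); the weight values and the absence of bias terms are illustrative assumptions, not the figure’s exact numbers:

import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W_hidden, W_out):
    # x: list of inputs; W_hidden: one weight list per hidden unit; W_out: hidden-to-output weights
    hidden = [logistic(sum(w * xi for w, xi in zip(ws, x))) for ws in W_hidden]
    return logistic(sum(w * h for w, h in zip(W_out, hidden)))

x = [34, 2, 4]                                  # Age, Gender, Stage
W_hidden = [[0.6, 0.1, 0.7], [0.5, 0.3, 0.2]]   # illustrative weights
W_out = [0.4, 0.2]
print(forward(x, W_hidden, W_out))              # "probability of being alive"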

“Combined logistic models”

[Figure: the same network, highlighting the path from the inputs through one hidden unit to the output (weights .6, .5, .8, .1, .7 are shown); each hidden unit is itself a logistic model applied to the inputs.]

[Figure: the same network, now highlighting the path through the other hidden unit (weights .5, .8, .2, .3, .2 are shown).]

[Figure: the full network again, with all weights shown.]

So is a multilayered perceptron just a set of “combined logistic models”? Not really: there is no target value for the hidden units...

[Figure: the full network once more: independent variables (Age = 34, Gender = 2, Stage = 4), two layers of adjustable weights, a hidden layer of Σ units, and the output “probability of being alive” (0.6).]

Perceptrons

[Figure: the earlier perceptron again: input units Cough and Headache, weights, and output units No disease, Pneumonia, Flu, and Meningitis.]

error: the difference between what we got and what we wanted

Δ rule: change the weights to decrease the error

Hidden Units and Backpropagation

Error Functions

- Mean Squared Error (for most problems):
  Σ (t - o)² / n

- Cross-Entropy Error (for dichotomous or binary outcomes):
  -Σ [ t ln(o) + (1 - t) ln(1 - o) ]
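Both error functions as a small sketch (the targets t and outputs o below are illustrative):

import math

def mse(targets, outputs):
    # mean squared error: sum((t - o)^2) / n
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def cross_entropy(targets, outputs):
    # -sum(t*ln(o) + (1 - t)*ln(1 - o)), for binary targets and outputs in (0, 1)
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for t, o in zip(targets, outputs))

t = [1, 0, 1, 1]
o = [0.9, 0.2, 0.7, 0.6]
print(mse(t, o), cross_entropy(t, o))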

Minimizing the Error

[Figure: the error surface plotted against a weight w. Training starts at w_initial with the initial error and follows the slope downhill: where the derivative is negative the weight change is positive, and vice versa, until the weight reaches w_trained at a local minimum with the final error.]

Numerical Methods

Gradient descent

[Figure: an error curve with both a local minimum and the global minimum; gradient descent follows the slope and can settle in the local minimum.]
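A hedged sketch of gradient descent on the simplest case, a single logistic unit trained with the delta rule on the AND examples from earlier; the learning rate and the number of epochs are illustrative:

import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

# Toy data: learn AND with a single logistic unit
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = [0.0, 0.0]
bias = 0.0
lr = 0.5   # learning rate

for epoch in range(5000):
    for x, t in data:
        o = logistic(sum(wi * xi for wi, xi in zip(w, x)) + bias)
        delta = (t - o) * o * (1 - o)         # gradient of the squared error through the sigmoid
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
        bias += lr * delta

print(w, bias)
# outputs move toward the targets 0, 0, 0, 1
print([round(logistic(w[0] * x1 + w[1] * x2 + bias), 2)
       for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])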

Overfitting

[Figure: an overfitted model follows the training points too closely instead of the real distribution.]

Overfitting

[Figure: total sum of squares (tss) plotted against training epochs for the training set (b) and the test set (a). The training-set error (tss_b) keeps falling, while the test-set error (tss_a) reaches a minimum and then rises as the model overfits.]

Stopping criterion: stop at the epoch where the test-set tss is minimal (min Δtss), before the overfitted model appears.

Overfitting in Neural Nets

[Figure: left, CHD versus age, comparing an overfitted model with the “real” model; right, error versus training cycles, where the training error keeps decreasing but the holdout error turns upward once the model overfits.]
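A sketch of the stopping idea: track the error on a holdout set during training and keep the weights from the epoch where that error was lowest. The synthetic data, the single-unit model, and the learning rate are all illustrative assumptions:

import math, random

random.seed(0)

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

# Synthetic data: one noisy input, binary outcome
def make_set(n):
    out = []
    for _ in range(n):
        x = random.uniform(-3, 3)
        t = 1 if logistic(1.5 * x) > random.random() else 0
        out.append((x, t))
    return out

train, holdout = make_set(30), make_set(30)
w, b, lr = 0.0, 0.0, 0.1

def sse(dataset):
    return sum((t - logistic(w * x + b)) ** 2 for x, t in dataset)

best = (float("inf"), w, b)
for epoch in range(200):
    for x, t in train:
        o = logistic(w * x + b)
        delta = (t - o) * o * (1 - o)
        w += lr * delta * x
        b += lr * delta
    err = sse(holdout)
    if err < best[0]:
        best = (err, w, b)   # remember the weights with the lowest holdout error

err, w, b = best             # "stop" at the epoch that minimized the holdout error
print(err, w, b)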

Parameter Estimation

Logistic regression
- Models “just” one function
- Maximum likelihood
- Fast
- Optimizations: Fisher scoring, Newton-Raphson

Neural network
- Models several functions
- Backpropagation
- Iterative
- Slow
- Optimizations: Quickprop, scaled conjugate gradient descent, adaptive learning rate

What do you want? Insight versus prediction

Insight into the model
- Explain the importance of each variable
- Assess model fit to existing data

Accurate predictions
- Make a good estimate of the “real” probability
- Assess model prediction in new data

Model Selection: Finding influential variables

Logistic
- Forward
- Backward
- Stepwise
- Arbitrary
- All combinations
- Relative risk

Neural Network
- Weight elimination
- Automatic Relevance Determination
- “Relevance”

Regression Diagnostics: Finding influential observations

Logistic
- Analysis of residuals
- Cook’s distance
- Deviance
- Difference in coefficients when a case is left out

Neural Network
- Ad hoc methods

How accurate are predictions?

- Construct training and test sets, or bootstrap, to assess “unbiased” error
- Assess
  - Discrimination: how the model “separates” alive and dead
  - Calibration: how close the estimates are to the “real” probability


“Unbiased” Evaluation: Training and Test Sets

- The training set is used to build the model (it may include a holdout set to control for overfitting)
- The test set is left aside for evaluation purposes
- Ideal: yet another validation data set, from a different source, to test whether the model generalizes to other settings

Small sets: Cross-validation

- Several training and test set pairs are created so that the union of all test sets corresponds exactly to the original set
- Results from the different models are pooled and overall performance is estimated
- “Leave-n-out”
- Jackknife
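A minimal k-fold cross-validation sketch: the test folds partition the original set, each model is evaluated on its held-out fold, and the results are pooled. The fit and evaluate functions stand in for whatever model and performance measure are being used (the example below uses a trivial mean predictor and squared error):

def k_fold_indices(n, k):
    # Split indices 0..n-1 into k folds whose union is exactly the original set
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(data, k, fit, evaluate):
    scores = []
    for test_idx in k_fold_indices(len(data), k):
        test = [data[i] for i in test_idx]
        train = [d for i, d in enumerate(data) if i not in set(test_idx)]
        model = fit(train)
        scores.append(evaluate(model, test))
    return sum(scores) / len(scores)   # pooled (averaged) performance estimate

# Example with a trivial "model": predict the mean of the training targets
data = [(x, x * 0.5 + 1) for x in range(20)]
fit = lambda train: sum(t for _, t in train) / len(train)
evaluate = lambda m, test: sum((t - m) ** 2 for _, t in test) / len(test)
print(cross_validate(data, 5, fit, evaluate))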

ECG Interpretation

Thyroid Diseases

Time Series

[Figure: a network for time-series prediction. The input units are past values X_n and X_n+1 (the independent variables), a layer of hidden units sits in between, and the output unit predicts Y = X_n+2 (the dependent variable); the weights are the estimated parameters.]

Time Series

Evaluation

Evaluation: Area Under ROCs

ROC Analysis: Variations

- ROC curve
- Area under the ROC
- Slope and intercept
- Confidence interval
- Wilcoxon statistic
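The area under the ROC curve equals the Wilcoxon (Mann-Whitney) statistic: the probability that a randomly chosen positive case receives a higher predicted value than a randomly chosen negative one. A small sketch with made-up predictions:

def auc_wilcoxon(scores_pos, scores_neg):
    # Count pairs where the positive case outranks the negative one (ties count 1/2)
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Illustrative predicted probabilities for diseased (positive) and healthy (negative) cases
pos = [0.9, 0.8, 0.7, 0.55]
neg = [0.6, 0.4, 0.3, 0.2, 0.1]
print(auc_wilcoxon(pos, neg))   # 0.95: good discrimination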

Expert Systems and Neural Nets

Model Comparison
(personal biases)

                          Modeling    Examples    Explanation
                          Effort      Needed      Provided
Rule-based Exp. Syst.     high        low         high
Bayesian Nets             high        low         moderate
Classification Trees      low         high        “high”
Neural Nets               low         high        low
Regression Models         high        moderate    moderate

Conclusion

Neural Networks are
- mathematical models that resemble nonlinear regression models, but are also useful to model nonlinearly separable spaces
- “knowledge acquisition tools” that learn from examples

Neural Networks in Medicine are used for:
- pattern recognition (images, diseases, etc.)
- exploratory analysis, control
- predictive models

Conclusion

- There is no definitive indication for using either logistic regression or a neural network
- Try both and select the best
- Make an unbiased evaluation
- Compare them statistically

Some References

Introductory Textbooks

- Rumelhart DE, McClelland JL (eds). Parallel Distributed Processing. MIT Press, Cambridge, 1986.
- Hertz JA, Palmer RG, Krogh AS. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.
- Pao YH. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, 1989.
- Reggia JA. Neural computation in medicine. Artificial Intelligence in Medicine, 1993 Apr, 5(2):143-57.
- Miller AS, Blott BH, Hames TK. Review of neural network applications in medical imaging and signal processing. Medical and Biological Engineering and Computing, 1992 Sep, 30(5):449-64.
- Bishop CM. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.