
DSCI 4520/5240 DATA MINING
Lecture 8: Neural Networks

Some slide material taken from: SAS Education

Objectives

- Understand the structure of a neural network
- Understand how the objective function leads to overgeneralization
- Understand how overgeneralization is counteracted by taking into account the validation data




Neural network models (multi-layer perceptrons)

Often regarded as a mysterious and powerful predictive modeling technique, the most typical form of the model is, in fact, a natural extension of a regression model:

- A generalized linear model on a set of derived inputs
- These derived inputs are themselves a generalized linear model on the original inputs

The usual link for the derived inputs' model is the inverse hyperbolic tangent, a shift and rescaling of the logit function.

Neural networks can approximate virtually any continuous association between the inputs and the target; you simply need to specify a sufficiently large number of derived inputs.
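The "shift and rescaling" relationship between the two link functions can be checked numerically: logit(p) = 2 * artanh(2p - 1). A minimal Python/NumPy sketch, for illustration only:

import numpy as np

# The inverse hyperbolic tangent link is the logit, shifted and rescaled:
# logit(p) = 2 * artanh(2p - 1).
p = np.linspace(0.01, 0.99, 9)
logit = np.log(p / (1 - p))
rescaled = 2 * np.arctanh(2 * p - 1)
print(np.allclose(logit, rescaled))   # True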




Neural Network Model

[Figure: scatter plot of the training data over inputs x_1 and x_2, with an inset graph of tanh(x), which ranges from -1 to 1.]

\log\!\left(\frac{p}{1-p}\right) = w_{00} + w_{01} H_1 + w_{02} H_2 + w_{03} H_3

\tanh^{-1}(H_1) = w_{10} + w_{11} x_1 + w_{12} x_2
\tanh^{-1}(H_2) = w_{20} + w_{21} x_1 + w_{22} x_2
\tanh^{-1}(H_3) = w_{30} + w_{31} x_1 + w_{32} x_2
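To make the algebra concrete, here is a minimal NumPy sketch of a forward pass through this model; the weight values are made up for illustration:

import numpy as np

def forward(x1, x2, W_hidden, w_out):
    # Each hidden unit H_i is tanh of a linear combination of the inputs,
    # so tanh^-1(H_i) is linear in x1 and x2, as in the equations above.
    H = np.tanh(W_hidden @ np.array([1.0, x1, x2]))
    # The output layer is a logistic regression on the derived inputs H:
    # log(p / (1 - p)) = w_00 + w_01 H_1 + w_02 H_2 + w_03 H_3.
    logit = w_out @ np.concatenate(([1.0], H))
    return 1.0 / (1.0 + np.exp(-logit))   # p

# Made-up weights: row i of W_hidden holds (w_i0, w_i1, w_i2).
W_hidden = np.array([[ 0.1,  0.5, -0.3],
                     [ 0.2, -0.4,  0.6],
                     [-0.1,  0.3,  0.2]])
w_out = np.array([0.05, 1.2, -0.7, 0.9])   # (w_00, w_01, w_02, w_03)
print(forward(0.4, -1.0, W_hidden, w_out))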



Input layer, hidden layer, output layer

Multi-layer perceptron models were originally inspired by neurophysiology and the interconnections between neurons. The basic model form arranges neurons in layers.

The input layer connects to a layer of neurons called a hidden layer, which, in turn, connects to a final layer called the target, or output, layer.

The structure of a multi-layer perceptron lends itself to a graphical representation called a network diagram.

[Figure: network diagram with inputs x_1 and x_2 feeding hidden units H_1, H_2, and H_3, which feed the output p.]




Neural Network Diagram

[Figure: the network diagram labeled with its parts: the input layer, the hidden layers (one hidden unit highlighted), and the output layer.]



NNs as a Universal Approximator

[Figure: hidden-unit outputs A, B, and C combined linearly at the output, e.g. 6 + A - 2B + 3C.]



An example

Inputs AGE and INCOME (INC) feed three hidden units, whose outputs A, B, and C feed an output unit that predicts RESPONSE TO PROMOTION. Each unit applies a COMBINATION function (a linear combination of its inputs) followed by an ACTIVATION function:

HIDDEN layer:
combination: \beta_1 + \beta_2 AGE + \beta_3 INC; activation: \tanh(\beta_1 + \beta_2 AGE + \beta_3 INC) = A
combination: \beta_4 + \beta_5 AGE + \beta_6 INC; activation: \tanh(\beta_4 + \beta_5 AGE + \beta_6 INC) = B
combination: \beta_7 + \beta_8 AGE + \beta_9 INC; activation: \tanh(\beta_7 + \beta_8 AGE + \beta_9 INC) = C

OUTPUT layer:
combination: \beta_{10} + \beta_{11} A + \beta_{12} B + \beta_{13} C, which yields RESPONSE TO PROMOTION.
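The deck builds this model in SAS; as a rough equivalent, here is a scikit-learn sketch with synthetic (made-up) AGE/INCOME data and three tanh hidden units:

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic customer data: columns are AGE and INCOME (made up).
X = np.column_stack([rng.uniform(18, 80, 500), rng.uniform(10, 150, 500)])
# Made-up response pattern: younger, higher-income customers respond more.
y = ((80 - X[:, 0]) / 62 + X[:, 1] / 150 + rng.normal(0, 0.3, 500) > 1).astype(int)

# Three tanh hidden units, matching the diagram above.
net = MLPClassifier(hidden_layer_sizes=(3,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict_proba([[35, 90]]))   # [P(no response), P(response)]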



Objective Function

Predictions are compared to the actual values of the target via an objective function.

An easy-to-understand example of an objective function is the mean squared error (MSE), given by:

\mathrm{MSE}(w) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i(w) \right)^2

where:

- N is the number of training cases.
- y_i is the target value of the ith case.
- \hat{y}_i(w) is the predicted target value.
- w is the current estimate of the model parameters.
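In code, the MSE is one line; a minimal NumPy sketch:

import numpy as np

def mse(y, y_hat):
    # Mean squared error over the N training cases.
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.mean((y - y_hat) ** 2)

print(mse([1.0, 0.0, 1.0], [0.8, 0.2, 0.6]))   # 0.08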




Neural Network Training

[Figure: the model equations and training data from the "Neural Network Model" slide, with an inset plot tracking the objective function value (scale 0 to 70) as the weights w are updated.]



Neural Network Training

[Figure: the objective function plotted over two model parameters, Parameter 1 and Parameter 2.]



Convergence

Training concludes when small changes in the parameter
values no longer decrease the value of the objective
function.

The network is said to have reached a local minimum in the
objective.
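The slides do not name a particular optimizer, so here is a plain gradient-descent sketch of that stopping rule, with a made-up toy objective to exercise the loop:

import numpy as np

def train(objective, grad, w0, lr=0.1, tol=1e-10, max_iter=100_000):
    # Stop when a small parameter change no longer decreases the
    # objective -- the local-minimum condition described above.
    w = np.asarray(w0, dtype=float)
    prev = objective(w)
    for _ in range(max_iter):
        w_next = w - lr * grad(w)
        cur = objective(w_next)
        if prev - cur < tol:
            break
        w, prev = w_next, cur
    return w

# Toy objective: a bowl with its minimum at (1, -2).
f = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
g = lambda w: np.array([2 * (w[0] - 1), 2 * (w[1] + 2)])
print(train(f, g, [0.0, 0.0]))   # approaches [1, -2]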




Neural Network Training Convergence

[Figure: the same model and training data; the inset objective-function plot (scale 0 to 70) has flattened out, indicating a local minimum.]



Overgeneralization

A small value for the objective function, when calculated on training data, need not imply a small value for the function on validation data.

Typically, improvement in the objective function is observed on both the training and the validation data over the first few iterations of the training process.

At convergence, however, the model is likely to be highly overgeneralized, and the values of the objective function computed on training and validation data may be quite different.




Training Overgeneralization

[Figure: the same model and training data; the inset plot (scale 0 to 70) now shows two objective-function curves, Training and Validation, which diverge as training proceeds.]



Final Model

To compensate for overgeneralization, the overall average profit, computed on validation data, is examined.

The final parameter estimates for the model are taken from the training iteration with the maximum validation profit.
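A minimal sketch of this selection rule, using synthetic data, a single logistic unit as a stand-in for the full network, and a made-up profit structure (+5 for soliciting a responder, -1 for soliciting a non-responder):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic (made-up) training and validation partitions.
X_train, X_valid = rng.normal(size=(200, 2)), rng.normal(size=(100, 2))
y_train = (X_train.sum(axis=1) + rng.normal(0, 0.5, 200) > 0).astype(float)
y_valid = (X_valid.sum(axis=1) + rng.normal(0, 0.5, 100) > 0).astype(float)

def avg_profit(w, X, y):
    # Hypothetical profit rule: solicit when p > 0.5; a responder is
    # worth +5, a non-responder costs -1, and the rest earn 0.
    p = 1 / (1 + np.exp(-(X @ w)))
    return np.mean(np.where(p > 0.5, np.where(y == 1, 5.0, -1.0), 0.0))

w, best_w, best_profit = np.zeros(2), np.zeros(2), -np.inf
for it in range(100):
    # One gradient step stands in for one training iteration.
    p = 1 / (1 + np.exp(-(X_train @ w)))
    w = w - 0.5 * X_train.T @ (p - y_train) / len(y_train)
    vp = avg_profit(w, X_valid, y_valid)
    if vp > best_profit:                       # keep the iteration with
        best_w, best_profit = w.copy(), vp     # maximum validation profit

print(best_w, best_profit)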




Neural Network Final Model

[Figure: the same model and training data; the inset plot now tracks profit over the training iterations, and the final weights come from the iteration with the maximum validation profit.]