Advanced Information Retrieval


Chapter 02: Modeling



Neural Network Model







A neural network is an oversimplified representation
of the neuron interconnections in the human brain:


nodes are processing units


edges are synaptic connections


the strength of a propagating signal is modelled by a
weight assigned to each edge


the state of a node is defined by its
activation level


depending on its activation level, a node might issue
an output signal

Neural Networks




Complex learning systems recognized in animal brains


Single neuron has simple structure


Interconnected sets of neurons perform complex learning tasks


Human brain has 10^15 synaptic connections


Artificial Neural Networks attempt to replicate non-linear learning found in nature

[Figure: biological neuron showing dendrites, cell body, and axon]

Neural Networks (cont’d)


Dendrites gather inputs from other neurons and combine
information


Then generate non-linear response when threshold reached


Signal sent to other neurons via axon







Artificial neuron model is similar


Data inputs (x_i) are collected from upstream neurons and input to the combination function (sigma)






Neural Networks (cont’d)


Activation function reads combined input and produces non-linear response (y)


Response channeled downstream to other neurons


What problems are Neural Networks applicable to?


Quite robust with respect to noisy data


Can learn and work around erroneous data


Results opaque to human interpretation


Often require long training times






Input and Output Encoding


Neural Networks require attribute values encoded to [0, 1]


Numeric


Apply min-max normalization to continuous variables:

X* = (X - min(X)) / (max(X) - min(X))

Works well when Min and Max known


Also assumes new data values occur within Min-Max range


Values outside range may be rejected or mapped to Min or Max
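A minimal sketch of the min-max encoding described above; the function name and the clamp-to-boundary choice are illustrative assumptions, not part of the original slides:

```python
def min_max_normalize(x, x_min, x_max):
    """Map a numeric value into [0, 1] using min-max normalization.

    Values outside the training range are clamped to the nearest
    boundary, one of the options mentioned on the slide.
    """
    if x_max == x_min:                      # degenerate range: avoid division by zero
        return 0.0
    x_star = (x - x_min) / (x_max - x_min)  # X* = (X - min) / (max - min)
    return max(0.0, min(1.0, x_star))       # clamp to [0, 1]

# Example: attribute observed in training data between 18 and 90
print(min_max_normalize(35, 18, 90))   # ~0.236
print(min_max_normalize(120, 18, 90))  # 1.0 (mapped to Max)
```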



Input and Output Encoding (cont’d)


Output



Neural Networks always return continuous values in [0, 1]


Many classification problems have two outcomes


Solution uses threshold established a priori in single output node to separate classes


For example, target variable is “leave” or “stay”


Threshold value is “leave if output >= 0.67”


Single output node value = 0.72 classifies record as “leave”
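A small sketch of the a-priori threshold rule from this example; the class labels and threshold value come from the slide, while the function itself is illustrative:

```python
def classify(output, threshold=0.67):
    """Map the single continuous output node value to a class label."""
    return "leave" if output >= threshold else "stay"

print(classify(0.72))  # 'leave', as in the slide's example
print(classify(0.40))  # 'stay'
```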

Simple Example of a Neural Network








Neural Network consists of a layered, feedforward, completely connected network of nodes


Feedforward restricts network flow to single direction


Flow does not loop or cycle


Network composed of two or more layers

[Figure: completely connected feedforward network. Input Layer: Node 1, Node 2, Node 3; Hidden Layer: Node A, Node B; Output Layer: Node Z. Connection weights W_1A, W_1B, W_2A, W_2B, W_3A, W_3B, W_AZ, W_BZ, plus bias weights W_0A, W_0B, W_0Z.]

Simple Example of a Neural Network (cont’d)


Most networks have Input, Hidden, and Output layers


Network may contain more than one hidden layer


Network is completely connected


Each node in a given layer is connected to every node in the next layer


Every connection has a weight (W_ij) associated with it


Weight values randomly assigned 0 to 1 by algorithm


Number of input nodes dependent on number of predictors


Number of hidden and output nodes configurable


Simple Example of a Neural Network (cont’d)


Combination function produces a linear combination of node inputs and connection weights as a single scalar value:

net_j = Σ_i W_ij x_ij, summed over i = 0, 1, ..., I

For node j, x_ij is the i-th input

W_ij is the weight associated with the i-th input to node j

There are I + 1 inputs to node j

x_1, x_2, ..., x_I are inputs from upstream nodes

x_0 is a constant input, with value = 1.0

Each node j thus has an extra input W_0j x_0j = W_0j, which acts as a bias term

[Figure: network diagram repeated (input nodes 1-3, hidden nodes A and B, output node Z, with the same weights as above)]

Simple Example of a Neural Network (cont’d)





The scalar value computed for hidden layer Node A equals

net_A = Σ_i W_iA x_iA = 0.5(1.0) + 0.6(0.4) + 0.8(0.2) + 0.6(0.7) = 1.32

For Node A, net_A = 1.32 is the input to the activation function


Neurons “fire” in biological organisms


Signals sent between neurons when combination of inputs crosses threshold



x_0 = 1.0    W_0A = 0.5    W_0B = 0.7    W_0Z = 0.5

x_1 = 0.4    W_1A = 0.6    W_1B = 0.9    W_AZ = 0.9

x_2 = 0.2    W_2A = 0.8    W_2B = 0.8    W_BZ = 0.9

x_3 = 0.7    W_3A = 0.6    W_3B = 0.4
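Using the weights and inputs listed above, a short sketch of the combination function for Node A; the variable names are illustrative:

```python
# Inputs (x_0 is the constant bias input) and weights into hidden Node A
x = [1.0, 0.4, 0.2, 0.7]      # x_0, x_1, x_2, x_3
w_A = [0.5, 0.6, 0.8, 0.6]    # W_0A, W_1A, W_2A, W_3A

# Combination function: net_A = sum_i W_iA * x_iA
net_A = sum(w * xi for w, xi in zip(w_A, x))
print(net_A)  # 1.32, matching the slide
```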

Simple Example of a Neural Network (cont’d)


Firing response not necessarily linearly related to increase in input stimulation

Neural Networks model behavior using non-linear activation function


Sigmoid function most commonly used:

y = f(x) = 1 / (1 + e^(-x))

In Node A, the sigmoid function takes net_A = 1.32 as input and produces output f(net_A) = 1 / (1 + e^(-1.32)) ≈ 0.7892




Simple Example of a Neural Network (cont’d)


Node A outputs 0.7892 along connection to Node Z, and becomes component of net_Z


Before net_Z is computed, contribution from Node B required







Node Z combines outputs from Node A and Node B, through net_Z




Simple Example of a Neural Network (cont’d)


Inputs to Node Z are not data attribute values

Rather, they are outputs from the sigmoid function in upstream nodes








Value 0.8750 output from Neural Network on first pass


Represents predicted value for target variable, given first
observation
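Putting the pieces together, a sketch of the full first forward pass using the slide's weights; note that net_B and f(net_B) are not printed on the slides, so those intermediate values are computed here purely for illustration:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

x   = [1.0, 0.4, 0.2, 0.7]        # x_0 (bias), x_1, x_2, x_3
w_A = [0.5, 0.6, 0.8, 0.6]        # weights into Node A
w_B = [0.7, 0.9, 0.8, 0.4]        # weights into Node B

net_A = sum(w * xi for w, xi in zip(w_A, x))   # 1.32
net_B = sum(w * xi for w, xi in zip(w_B, x))   # 1.50
out_A, out_B = sigmoid(net_A), sigmoid(net_B)  # ~0.7892, ~0.8176

# Node Z combines the hidden-layer outputs (plus its bias input)
w_0Z, w_AZ, w_BZ = 0.5, 0.9, 0.9
net_Z = w_0Z * 1.0 + w_AZ * out_A + w_BZ * out_B
print(round(sigmoid(net_Z), 4))   # 0.875, the network's first-pass output
```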





Sigmoid Activation Function





Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior, depending on input value


Function nearly linear for domain values -1 < x < 1


Becomes curvilinear as values move away from center


At extreme values, f(x) is nearly constant


Moderate increments in x produce variable increase in f(x), depending on location of x

Sometimes called “Squashing Function”


Takes real-valued input and returns values in [0, 1]
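A quick illustration of the "squashing" behavior described above; the sample points are arbitrary and the values are rounded:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

for v in (-10, -1, 0, 1, 10):
    print(v, round(sigmoid(v), 4))
# -10 -> 0.0  (nearly constant at the extreme)
# -1  -> 0.2689, 0 -> 0.5, 1 -> 0.7311 (nearly linear near the center)
# 10  -> 1.0  (nearly constant at the extreme)
```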












Back-Propagation


Neural Networks are a supervised learning method


Require target variable


Each observation passed through network results in output
value


Output value compared to actual value of target variable


(Actual - Output) = Error


Prediction error analogous to residuals in regression models


Most networks use the sum of squared errors (SSE) to measure how well predictions fit target values
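A minimal sketch of the SSE measure just described, summed over all records and output nodes; the data values below are made up for illustration:

```python
def sse(actuals, outputs):
    """Sum of squared errors over all records and all output nodes."""
    return sum((a - o) ** 2
               for record_a, record_o in zip(actuals, outputs)
               for a, o in zip(record_a, record_o))

# Two records, one output node each (illustrative numbers)
print(sse([[0.8], [0.3]], [[0.875], [0.25]]))  # 0.005625 + 0.0025 = 0.008125
```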


Back-Propagation (cont’d)


Squared prediction errors summed over all output nodes, and
all records in data set


Model weights constructed that minimize SSE


The weight values that actually minimize SSE are unknown


Weights estimated, given the data set


Back-Propagation Rules


Back-propagation percolates prediction error for record back through network


Partitioned responsibility for prediction error assigned to various
connections


Back-propagation rules defined (Mitchell)


Back-Propagation Rules (cont’d)


Error responsibility computed using partial derivative of the sigmoid function with respect to net_j

Values take one of two forms:

For an output node j: δ_j = output_j (1 - output_j)(actual_j - output_j)

For a hidden node j: δ_j = output_j (1 - output_j) Σ_downstream W_jk δ_k
Rules show why input values require normalization


Large input values x_ij would dominate weight adjustment


Error propagation would be overwhelmed, and learning stifled
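A sketch of the two error-responsibility forms and the weight adjustment, following Mitchell's formulation for sigmoid units; the function names are illustrative and not from the slides:

```python
def delta_output(output, actual):
    """Error responsibility for an output node."""
    return output * (1.0 - output) * (actual - output)

def delta_hidden(output, downstream_weights, downstream_deltas):
    """Error responsibility for a hidden node, given its downstream nodes."""
    return output * (1.0 - output) * sum(
        w * d for w, d in zip(downstream_weights, downstream_deltas))

def updated_weight(w, eta, delta_j, x_ij):
    """Gradient-descent style adjustment: w + eta * delta_j * x_ij."""
    return w + eta * delta_j * x_ij
```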




Example of Back-Propagation


Recall that first pass through network yielded output = 0.8750


Assume actual target value = 0.8, and learning rate η = 0.1


Prediction error = 0.8 - 0.8750 = -0.075


Neural Networks use stochastic back-propagation


Weights updated after each record processed by network


Adjusting the weights using back-propagation shown next



Error responsibility for Node Z, an output node, found first:

δ_Z = output_Z (1 - output_Z)(actual - output_Z) = 0.875 (1 - 0.875)(0.8 - 0.875) ≈ -0.0082

[Figure: network diagram repeated (input nodes 1-3, hidden nodes A and B, output node Z)]

Example of Back-Propagation (cont’d)


Now adjust “constant” weight w_0Z using rules






Move upstream to Node A, a hidden layer node


Only node downstream from Node A is Node Z







Example of Back-Propagation (cont’d)


Adjust weight w_AZ using back-propagation rules





Connection weight between Node A and Node Z adjusted from
0.9 to 0.899353



Next, Node B is hidden layer node


Only node downstream from Node B is Node Z





Example of Back-Propagation (cont’d)


Adjust weight w_BZ using back-propagation rules





Connection weight between Node B and Node Z adjusted from
0.9 to 0.89933



Similarly, application of back-propagation rules continues to input layer nodes


Weights {w_1A, w_2A, w_3A, w_0A} and {w_1B, w_2B, w_3B, w_0B} updated by the process
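A self-contained sketch reproducing the slide's worked adjustments; the intermediate delta values (and the Node B output 0.8176) are computed from the slide's numbers but are not themselves printed on the slides:

```python
eta = 0.1                                    # learning rate used in this example
out_A, out_B, out_Z = 0.7892, 0.8176, 0.8750
actual = 0.8

delta_Z = out_Z * (1 - out_Z) * (actual - out_Z)   # ~ -0.0082

w_0Z = 0.5 + eta * delta_Z * 1.0                   # bias input x_0 = 1.0
w_AZ = 0.9 + eta * delta_Z * out_A                 # ~ 0.899353
w_BZ = 0.9 + eta * delta_Z * out_B                 # ~ 0.89933

# Hidden-node responsibilities, used next to adjust the input-layer weights
delta_A = out_A * (1 - out_A) * (0.9 * delta_Z)
delta_B = out_B * (1 - out_B) * (0.9 * delta_Z)

print(round(w_AZ, 6), round(w_BZ, 5))              # 0.899353 0.89933
```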






Example of Back-Propagation (cont’d)


Now, all network weights in model are updated


Each iteration based on single record from data set


Summary


Network calculated predicted value for target variable


Prediction error derived


Prediction error percolated back through network


Weights adjusted to generate smaller prediction error


Process repeats record by record

Termination Criteria


Many passes through data set performed


Constantly adjusting weights to reduce prediction error


When to terminate?



Stopping criterion may be computational “clock” time?


Short training times likely result in poor model



Terminate when SSE reaches threshold level?


Neural Networks are prone to overfitting


Memorizing patterns rather than generalizing



And …

Learning Rate


Recall Learning Rate η (Greek “eta”) is a constant:

Δw_ij = η δ_j x_ij, and w_ij,new = w_ij,current + Δw_ij

Helps adjust weights toward global minimum for SSE


Small Learning Rate


With small learning rate, weight adjustments small


Network takes an unacceptably long time to converge to a solution


Large Learning Rate


Suppose algorithm close to optimal solution


With large learning rate, network likely to “overshoot” optimal
solution
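A toy illustration of the effect described above, minimizing a one-dimensional squared error by gradient descent with a small versus a large learning rate; the error surface and the rates are made up for illustration:

```python
def descend(eta, steps=20, w=5.0):
    """Gradient descent on E(w) = w**2, whose minimum is at w = 0."""
    for _ in range(steps):
        w = w - eta * 2 * w          # gradient of w**2 is 2w
    return w

print(descend(eta=0.01))   # still far from 0: tiny steps converge slowly
print(descend(eta=0.5))    # reaches 0 quickly
print(descend(eta=1.1))    # diverges: each step overshoots the minimum
```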





Neural Network for IR:


From the work by Wilkinson & Hingston, SIGIR’91

[Figure: three-layer network for IR. Query term nodes (k_a, k_b, k_c) send signals to document term nodes (k_1, ..., k_t), which in turn connect to document nodes (d_1, ..., d_j, d_j+1, ..., d_N).]

Neural Network for IR


Three-layer network


Signals propagate across the network


First level of propagation:


Query terms issue the first signals


These signals propagate across the network to reach the document nodes


Second level of propagation:


Document nodes might themselves generate new
signals which affect the document term nodes


Document term nodes might respond with new
signals of their own

Quantifying Signal Propagation


Normalize signal strength (MAX = 1)


Query terms emit initial signal equal to 1


Weight associated with an edge from a query term node k_i to a document term node k_i:

W_iq = w_iq / sqrt( Σ_i w_iq^2 )

Weight associated with an edge from a document term node k_i to a document node d_j:

W_ij = w_ij / sqrt( Σ_i w_ij^2 )

Quantifying Signal Propagation


After the first level of signal propagation, the activation level of a document node d_j is given by:

Σ_i W_iq W_ij = ( Σ_i w_iq w_ij ) / ( sqrt( Σ_i w_iq^2 ) sqrt( Σ_i w_ij^2 ) )

which is exactly the ranking of the Vector model

New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle

A minimum threshold should be enforced to avoid spurious signal generation
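A small sketch of the first level of signal propagation, assuming w_iq and w_ij are the usual term weights of the vector model; the function and variable names are illustrative:

```python
import math

def first_level_activation(w_q, w_docs):
    """Activation of each document node after the first signal propagation.

    w_q:    list of query term weights w_iq
    w_docs: list of documents, each a list of term weights w_ij
    Returns one activation per document: sum_i W_iq * W_ij, which equals
    the cosine ranking of the vector model.
    """
    norm_q = math.sqrt(sum(w * w for w in w_q))
    activations = []
    for w_d in w_docs:
        norm_d = math.sqrt(sum(w * w for w in w_d))
        dot = sum(wq * wd for wq, wd in zip(w_q, w_d))
        activations.append(dot / (norm_q * norm_d) if norm_q and norm_d else 0.0)
    return activations

# Three index terms, two documents (illustrative weights)
print(first_level_activation([1.0, 0.0, 0.5], [[0.8, 0.2, 0.0], [0.1, 0.9, 0.4]]))
```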

Conclusions


Model provides an interesting formulation of the IR
problem


Model has not been tested extensively


It is not clear what improvements the model might provide