Web Data Mining: Exploring Hyperlinks, Contents and Usage Data
Special topics: STOCK TREND PREDICTION WITH NEURAL NETWORK TECHNIQUES
Instructor: Yu-Chieh Wu
Date: 2010.11.29
http://140.115.112.118/course/99

1MCU

Web/index.htm
Outline
1. Introduction/Research Objective
2. Stock Trend Prediction
3. Neural network
4. Support vector machine
5. Feature selection
6. Experiments and Result
7. Conclusion
Objectives
a) Evaluate the performance of neural network techniques on the task of stock trend prediction. Multilayer Perceptron (MLP), Radial Basis Function (RBF) network and Support Vector Machine (SVM) are evaluated.
b) Stock prediction is formulated and evaluated both as a two-class classification problem and as a regression problem.
c) Study pattern rejection techniques to improve prediction performance.
Stock Prediction
Stock prediction is a difficult task because stock data is very noisy and time varying.
The efficient market hypothesis claims that the future price of a stock is not predictable from publicly available information.
However, this theory has been challenged by many studies, and a few researchers have successfully applied machine learning approaches such as neural networks to perform stock prediction.
Is the Market Predictable?
Efficient Market Hypothesis (EMH) (Fama, 1965)
The stock market is efficient in that current market prices reflect all information available to traders, so future changes cannot be predicted from past prices or publicly available information.
Fama et al. (1988) showed that 25% to 40% of the variance in stock returns over periods of three to five years is predictable from past returns.
Pesaran and Timmerman (1999) conclude that the UK stock market has been predictable for the past 25 years.
Saad (1998) successfully employed different neural network models to predict the trend of various stocks over a short-term range.
Implementation
This paper investigates SVM, MLP and RBF networks for the task of predicting the future trend of three major stock indices:
a) Kuala Lumpur Composite Index (KLCI)
b) Hong Kong Hang Seng Index
c) Nikkei 225 stock index
using inputs based on technical indicators.
The problem is approached as a two-class pattern classification task, formulated specifically to assist investors in making trading decisions.
The classifier is asked to recognise investment opportunities that can give a return of r% or more within the next h days, with r = 3% and h = 10 days.
System Block Diagram
The classifier predicts whether an increase of more than 3% in the stock index can be achieved within the next 10 days.
[Block diagram: daily historical data, converted into technical analysis indicators -> Classifier -> Increment achievable? Yes / No]
Classification vs Forecasting
Forecasting
◦ Predict the actual future value.
Classification
◦ Assign patterns to different class categories.
◦ The predicted class gives the direction of the future trend.
Data Used
a) Kuala Lumpur Composite Index (KLCI), for the period 1992-1997
b) Hang Seng index (20/4/1992 - 1/9/1997)
c) Nikkei 225 stock index (20/4/1982 - 1/9/1987)
Input to Classifier
TABLE 1: DESCRIPTION OF INPUT TO CLASSIFIER
$x_i$, $i = 1, 2, 3, \ldots, 12$; $n = 15$
$$DL_N(t) = \operatorname{sign}[q(t) - q(t-N)] \cdot \ln\left(\frac{q(t)}{q(t-N)} + 1\right) \qquad (1)$$
$q(t)$ is the index level at day $t$ and $DL_N(t)$ is the actual input to the classifier.
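Equation (1) can be evaluated directly from a series of index levels. The sketch below is only illustrative: the function name `dl_input`, the list-based series and the sample values are assumptions, not part of the original material.

```python
import math

def dl_input(q, t, n):
    """DL_N(t) = sign[q(t) - q(t-N)] * ln(q(t)/q(t-N) + 1), following equation (1)."""
    if t - n < 0:
        raise ValueError("need at least n earlier observations")
    diff = q[t] - q[t - n]
    sign = (diff > 0) - (diff < 0)      # -1, 0 or +1
    return sign * math.log(q[t] / q[t - n] + 1.0)

# Hypothetical index levels, for illustration only.
levels = [100.0, 102.0, 101.0, 105.0]
print(dl_input(levels, 3, 1))           # input value for a one-day lag
```

The sign factor carries the direction of the move, while the logarithm compresses its size.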
Prediction Formulation
Consider $y_{max}(t)$ as the maximum upward movement of the stock index value within the period from day $t$ to the end of the prediction horizon ($h$ = 10 days in this study). $y(t)$ represents the stock index level at day $t$.
Prediction Formulation
Classification
The prediction of stock trend is formulated as a two-class classification problem.
$y_r(t) > r\%$: Class 2
$y_r(t) \le r\%$: Class 1
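The class rule can be sketched in a few lines. The definition of the return used here is an assumption: $y_r(t)$ is taken to be the largest percentage rise of the index over the next h days relative to day t. The function names and sample series are hypothetical.

```python
def max_excess_return(y, t, h):
    """Largest percentage rise of the index over days t+1..t+h relative to day t."""
    future = y[t + 1 : t + h + 1]
    if len(future) < h:
        raise ValueError("not enough future data for the horizon")
    return 100.0 * (max(future) - y[t]) / y[t]

def trend_class(y, t, r=3.0, h=10):
    """Class 2 if the maximum excess return exceeds r percent, otherwise Class 1."""
    return 2 if max_excess_return(y, t, h) > r else 1

# Hypothetical index levels, for illustration only.
series = [100.0, 101.0, 104.0, 99.0]
print(trend_class(series, 0, r=3.0, h=3))
```

With r = 3 and h = 10 this reproduces the setting used in the implementation slide, under the assumed definition of the return.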
Prediction Formulation
Classification
Let $(x_i, y_i)$, $1 \le i \le N$, be a set of $N$ training examples. Each input example $x_i \in \mathbb{R}^n$, with $n = 15$ being the dimension of the input space, belongs to a class labelled by $y_i \in \{+1, -1\}$.
[Figure: two groups of points labelled $y_i = -1$ and $y_i = +1$]
Prediction Formulation
Regression
In the regression approach, the target output is a scalar value $y_r$ that represents the predicted maximum excess return within the coming prediction period (h days ahead).
Neural Network
According to Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan, p. 2:
◦ A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use.
◦ Knowledge is acquired by the network through a learning process, either supervised learning or unsupervised learning. This paper uses supervised learning, where each training pattern and its target pattern are presented to the neural network during the learning process.
Neural Network
Advantages of Neural Networks
The advantages of neural networks are due to their adaptive and generalization abilities.
a) Neural networks are adaptive methods that can learn without any prior assumption about the underlying data.
b) Neural networks, namely the feed-forward multilayer perceptron and the radial basis function network, have been proven to be universal function approximators.
c) Neural networks are non-linear models with good generalization ability.
Neural Network
Taxonomy of Neural Network Architecture
The architecture of a neural network refers to the arrangement of the connections between neurons, the processing elements, the number of layers, and the flow of signals in the network. There are two main categories of neural network architecture: feed-forward and feedback (recurrent) neural networks.
Neural Network
[Figure: feed-forward network, Multilayer Perceptron]
Neural Network
[Figure: recurrent network]
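Multilayer Perceptron (MLP)
[Figure: MLP structure. The input vector (x1, x2, x3, x4, ..., xn) enters the input layer, followed by a hidden layer (h1, h2) and an output layer (O1). Inset: neuron processing element with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and the labels y and F(y).]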
Multilayer Perceptron (MLP)
Training MLP Network
The multilayer perceptron (MLP) network uses the back propagation learning algorithm to obtain the weights of the network.
The simple back propagation algorithm uses the steepest gradient descent method to make changes to the weights.
The objective of training is to minimize the training mean square error $E_{mse}$ over all the training patterns.
To speed up training, the faster Levenberg-Marquardt back propagation algorithm is used.
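As a rough illustration of the training objective, the sketch below runs simple steepest-descent back propagation on a tiny network and reports the mean square error before and after. It does not implement the Levenberg-Marquardt variant. The network size, tanh activation, learning rate and toy patterns are all assumptions chosen for the example.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """One tanh hidden layer followed by a single linear output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return hidden, sum(w * h for w, h in zip(W2, hidden)) + b2

def mse(data, params):
    """Mean square error over all (input, target) training patterns."""
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in data) / len(data)

def descent_step(data, params, lr):
    """One full-batch steepest-descent update using back-propagated gradients."""
    W1, b1, W2, b2 = params
    gW1 = [[0.0] * len(row) for row in W1]
    gb1 = [0.0] * len(b1)
    gW2 = [0.0] * len(W2)
    gb2 = 0.0
    for x, t in data:
        hidden, out = forward(x, W1, b1, W2, b2)
        d_out = 2.0 * (out - t) / len(data)          # dE/d(output)
        gb2 += d_out
        for j, h in enumerate(hidden):
            gW2[j] += d_out * h
            d_hid = d_out * W2[j] * (1.0 - h * h)    # back through tanh
            gb1[j] += d_hid
            for i, xi in enumerate(x):
                gW1[j][i] += d_hid * xi
    W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return W1, [0.0] * n_hidden, W2, 0.0

# Toy patterns (hypothetical): the target follows the first input.
toy = [([0.0, 0.0], -1.0), ([0.0, 1.0], -1.0),
       ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
params = init_params(n_in=2, n_hidden=3)
start_error = mse(toy, params)
for _ in range(200):
    params = descent_step(toy, params, lr=0.1)
print(start_error, mse(toy, params))
```

Each update moves every weight against its error gradient, which is the slow baseline that Levenberg-Marquardt is meant to speed up.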
Multilayer Perceptron (MLP)
MLP Network Setup
a) Number of hidden layers
b) Number of hidden neurons
c) Number of input neurons
d) Activation function
RBF Network
The RBF network has a three-layer feed-forward structure consisting of an input layer, a single hidden layer with locally tuned hidden units, and an output layer acting as a linear combiner.
RBF Network
RBF Network Training
The orthogonal least-squares (OLS) method proposed by Chen, S. et al. (1991) is a learning method that provides a systematic selection of the centre nodes in order to reduce the size of the RBF network. The learning task involves finding the appropriate centres and then the corresponding weights. This method is adopted.
RBF centres are selected from the set of training data. The OLS method is employed as a forward regression procedure to select the centres of the RBF nodes from the candidate set. At each step the centre that maximizes the error reduction is selected.
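The forward selection step can be sketched as follows. This is a simplified illustration in the spirit of OLS, not the authors' implementation: Gaussian hidden units with a fixed, assumed width are used, every training point is a candidate centre, and the candidate with the largest error reduction ratio after Gram-Schmidt orthogonalisation is chosen at each step. All function names are hypothetical.

```python
import math

def gaussian(x, c, width):
    """Response of a Gaussian hidden unit centred at c."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-dist2 / (2.0 * width ** 2))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_centres(inputs, targets, n_centres, width=1.0):
    """Greedy forward selection; returns indices of the chosen training points."""
    # Column i holds the responses of a unit centred at inputs[i].
    columns = [[gaussian(x, c, width) for x in inputs] for c in inputs]
    chosen, basis = [], []          # basis: orthogonalised chosen columns
    total = dot(targets, targets)
    for _ in range(n_centres):
        best = None
        for i, col in enumerate(columns):
            if i in chosen:
                continue
            # Remove the part of the column already explained by the basis.
            w = list(col)
            for b in basis:
                coef = dot(b, col) / dot(b, b)
                w = [wi - coef * bi for wi, bi in zip(w, b)]
            ww = dot(w, w)
            if ww < 1e-12:          # numerically redundant candidate
                continue
            err = dot(w, targets) ** 2 / (ww * total)   # error reduction ratio
            if best is None or err > best[0]:
                best = (err, i, w)
        if best is None:
            break
        chosen.append(best[1])
        basis.append(best[2])
    return chosen

# Hypothetical 1-D data whose targets are exactly one Gaussian bump at x = 2.
points = [[0.0], [1.0], [2.0], [3.0], [4.0]]
bump = [gaussian(x, [2.0], 1.0) for x in points]
print(select_centres(points, bump, 1))
```

Once the centres are fixed, the output weights follow from an ordinary linear least-squares fit, which is omitted here.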
Support Vector Machine
The Support Vector Machine is a special neural network technique based on the structural risk minimisation (SRM) principle. In SRM, the capacity of the learning machine is minimized together with the training error.
In empirical risk minimization (ERM), used in conventional neural networks such as the MLP and RBF networks, only the training error is minimized.
SVM in its current form was introduced by Cortes and Vapnik in 1995, building on earlier work by Vapnik and Chervonenkis.
Support Vector Machine
SVM demonstrates good generalization performance.
It has a sparse representation of the solution: the solution depends only on a subset of the training data points, called support vectors.
Training an SVM is equivalent to solving a linearly constrained quadratic programming problem. The solution is always unique, globally optimal and free from the local minima problem.
Support Vector Machine
Many decision boundaries can separate these two classes. Which one should we choose?
[Figure: points of Class 1 and Class 2 with candidate decision boundaries]
Support Vector Machine
[Figure: Class 1 and Class 2 separated by a hyperplane with margin m]
In SVM the optimal separating hyperplane is chosen to maximize the separation margin m and minimize error.
Optimization Problem in SVM
Let $\{x_1, \ldots, x_n\}$ be our data set and let $y_i \in \{1, -1\}$ be the class label of $x_i$.
The decision boundary should classify all points correctly.
This gives a constrained optimization problem.
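For reference, the standard hard-margin form of this problem can be written as below. It is assumed here as the usual textbook formulation, with $w$ and $b$ denoting the weight vector and bias of the separating hyperplane.

```latex
\begin{aligned}
\text{Correct classification of all points:} \quad & y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \ldots, n \\
\text{Constrained optimization problem:} \quad & \min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i \,(w \cdot x_i + b) \ge 1 \ \ \forall i
\end{aligned}
```

Minimizing $\lVert w \rVert^2$ under these constraints is equivalent to maximizing the margin, which equals $2 / \lVert w \rVert$ in this scaling.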
Support Vector Machine
• For a nonlinear boundary, the SVM maps the training data into a higher-dimensional feature space using a kernel function $K(x, x_i)$.
• In this feature space the SVM constructs a separating hyperplane that maximises the margin, that is, the distance from the closest data points to the hyperplane, while minimising the misclassification error.
• The Gaussian radial basis kernel is used, defined as follows:
$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right)$$
The optimum separating hyperplane (OSH) is represented by
$$F(x) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(x, x_i) + b\right)$$
The sign gives the class label.
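A minimal sketch of evaluating such a kernel decision function is given below. The symbols `gamma` (kernel parameter) and `alphas` (multipliers of the support vectors) follow the usual SVM notation and are assumptions here, as are the toy support vectors. In practice the multipliers and bias come out of the training step, which is not shown.

```python
import math

def rbf_kernel(x, xi, gamma):
    """Gaussian radial basis kernel exp(-gamma * ||x - xi||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))

def decide(x, support_vectors, labels, alphas, b, gamma):
    """Class label (+1 or -1) given by the sign of the kernel expansion."""
    s = sum(a * y * rbf_kernel(x, sv, gamma)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

# Hypothetical support vectors and multipliers, for illustration only.
svs = [[0.0, 0.0], [2.0, 2.0]]
print(decide([1.8, 2.1], svs, labels=[-1, 1], alphas=[1.0, 1.0], b=0.0, gamma=1.0))
```

Only the support vectors enter the sum, which is what makes the solution sparse.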
Tolerance to Noise
To allow for misclassification errors, the constraint is relaxed to
$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$
The following expression is minimized in order to obtain the optimum hyperplane:
$$\frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i$$
$\xi_i$ is the slack variable introduced to allow a certain level of misclassified points.
$C$ is the regularisation parameter that trades off misclassification error against margin maximisation.
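For Uneven Class Distribution
$$\frac{1}{2}\lVert w \rVert^2 + C^{+} \sum_{i:\, y_i = +1} \xi_i + C^{-} \sum_{i:\, y_i = -1} \xi_i$$
Different misclassification costs can be applied to data with different class labels.
The receiver operating characteristic (ROC) curve can be obtained by varying $C^{+}$ and $C^{-}$.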
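The effect of class-dependent costs can be illustrated by evaluating the objective for a fixed linear hyperplane. The sketch assumes the usual soft-margin form, with slack computed as max(0, 1 - y(w.x + b)) and a factor of one half on the squared norm. The function name and sample points are hypothetical.

```python
def weighted_objective(w, b, xs, ys, c_pos, c_neg):
    """0.5*||w||^2 plus class-weighted slack penalties for a linear hyperplane."""
    norm2 = sum(wi * wi for wi in w)
    penalty = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slack = max(0.0, 1.0 - margin)              # zero for points beyond the margin
        penalty += (c_pos if y > 0 else c_neg) * slack
    return 0.5 * norm2 + penalty

# Hypothetical points: one safely classified, two inside the margin.
xs = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0]]
ys = [1, 1, -1]
print(weighted_objective([1.0, 0.0], 0.0, xs, ys, c_pos=2.0, c_neg=4.0))
```

Raising one cost relative to the other pushes the boundary away from that class, which is how different operating points on the ROC curve are reached.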

Support Vector Regression
In the regression problem the desired output to be predicted is real valued, whereas in classification problems the desired output is a discrete value representing the class or category.
The output to be predicted is the strength of the trend.
SVM approximates the regression function with the following form.
Parameter for SVM
a) Classifier
Regularisation constant C
Kernel parameter
b) Regressor
Parameter ε for the ε-insensitive loss function
Regularisation constant C
Kernel parameter
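The role of the regressor's loss parameter can be seen from the ε-insensitive loss itself, sketched below in its standard form. The function name and sample values are assumptions for illustration.

```python
def eps_insensitive_loss(target, prediction, eps):
    """Zero inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(target - prediction) - eps)

# Hypothetical target and predictions, for illustration only.
print(eps_insensitive_loss(1.0, 1.05, eps=0.1))   # inside the tube
print(eps_insensitive_loss(1.0, 1.50, eps=0.1))   # outside the tube
```

A larger ε ignores more of the small errors and typically leaves fewer support vectors.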
Feature Selection
Feature selection is a process whereby a subset of the potential predictor variables is selected based on a relevance criterion in order to reduce the input dimension.
Typical feature selection involves the following steps:
Step 1. Search algorithm
Step 2. Evaluation of generated subset
Step 3. Evaluation of generated subset
Steps 1, 2 and 3 are repeated until the stopping criteria are met, such as when the minimum number of features is included or the minimum accepted prediction accuracy is achieved.
Feature Selection
General Approach for Feature Selection
a) Wrapper approach
The wrapper approach makes use of the induction algorithm to evaluate the relevance of the features. The relevance measure is based on solving the related problem, usually the prediction accuracy of the induction algorithm when the features are used.
b) Filter approach
The filter method selects the feature subset independently of the induction algorithm. Feature correlation is usually used.
Feature Selection
Feature Subset Selection
Feature subset selection (FSS) algorithms can be grouped into three categories of search algorithm:
a) exponential
b) randomised
c) sequential: Forward Sequential Selection (FSS) and Backward Sequential Selection (BSS)
Feature Selection
Sequential selection technique
a) Forward Sequential Selection (FSS)
b) Backward Sequential Selection (BSS)
Both BSS and FSS are used. Features are selected based on the subset that gives the best predictor performance when BSS and FSS are used.
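Forward sequential selection can be sketched as a greedy loop around any subset-scoring function. In a wrapper setting the score would be the predictor's validation performance. Here `score` is a hypothetical callback, and the toy scoring rule exists only to make the example runnable.

```python
def forward_selection(n_features, score, max_features=None):
    """Greedily add the feature that most improves score(subset); stop when none helps."""
    selected = []
    best_score = float("-inf")
    remaining = list(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best_score:
            break                       # no candidate improves the subset
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score

# Toy score (assumption): features 1 and 3 are useful, every feature has a small cost.
def toy_score(subset):
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)

print(forward_selection(5, toy_score))
```

Backward sequential selection works the same way in reverse, starting from all features and removing the one whose removal hurts the score least.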
Feature Subset Selection
Sequential selection result
Performance Measure
True Positive (TP) is the number of positive-class patterns correctly predicted as positive.
False Positive (FP) is the number of negative-class patterns wrongly predicted as positive.
False Negative (FN) is the number of positive-class patterns wrongly predicted as negative.
True Negative (TN) is the number of negative-class patterns correctly predicted as negative.
Performance Measure
Accuracy = (TP+TN) / (TP+FP+TN+FN)
Precision = TP/(TP+FP)
Recall rate (sensitivity) = TP/(TP+FN)
F1 = 2 * Precision * Recall/(Precision + Recall)
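These measures can be computed from a list of actual and predicted labels as sketched below. The sum TP+TN is treated as the numerator of the accuracy. The function names, the choice of 1 as the positive label and the sample labels are assumptions for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical labels, for illustration only.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
print(metrics(*confusion_counts(actual, predicted)))
```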
Testing Method
The rolling window method is used to capture training and test data.
[Figure: rolling window divided into a Train segment followed by a Test segment]
Train = 600 data points, Test = 400 data points
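A rolling window split with these sizes might be generated as follows. How far the window advances between runs is not stated on the slide, so the default step of one test segment is an assumption, as are the function and parameter names.

```python
def rolling_windows(n_samples, train_size=600, test_size=400, step=None):
    """Yield (train_range, test_range) index ranges for consecutive windows."""
    step = test_size if step is None else step      # assumed window advance
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

# Hypothetical series length, for illustration only.
for train, test in rolling_windows(1400):
    print(train, test)
```

Each test segment always lies after its training segment in time, so no future data leaks into training.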
Experiment and Result
Experiments are conducted to predict the stock trend of three major stock indices: KLCI, Hang Seng and Nikkei.
SVM, MLP and RBF networks are used to make trend predictions based on the classification and regression approaches.
A hypothetical trading system is simulated to find the annualized profit generated from the given predictions.
Experiment and Result
Trading Performance
A hypothetical trading system is used.
When a positive prediction is made, one unit of money is invested in a portfolio reflecting the stock index.
If the stock index increases by more than r% (r = 3%) within the next h days (h = 10) at day t, then the investment is sold at the index price of day t.
If not, the investment is sold on day t+1 regardless of the price.
A transaction fee of 1% is charged for every transaction made.
The annualised rate of return is used.
Trading Performance
Classifier Evaluation Using Hypothetical
Trading System
Trading Performance
Experiment and Result
Classification Result
Experiment and Result
The results show better performance of the neural network techniques when compared to the K-nearest neighbour classifier. On average, SVM shows better overall performance than the MLP and RBF networks in most of the performance metrics used.
Experiment and Result
Comparison of Receiver Operating Characteristic (ROC) Curves
Experiment and Result
Area under the ROC Curve
Experiment and Result
Error-Reject Trade-off
Experiment and Result
The Accuracy-Reject (AR) curve can be plotted to see how the accuracy of the classifier improves at various rejection rates. The AR curve is a plot of the classifier operating points showing the possible trade-off between the accuracy of the classifier and the rejection rate implemented.
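The points of an AR curve can be computed as sketched below, assuming that each prediction comes with a confidence value and that the least confident patterns are rejected first. The rejection rule, function name and sample values are assumptions, since the slides do not specify them.

```python
def accuracy_reject_curve(confidences, correct):
    """Return (rejection_rate, accuracy) points, rejecting low-confidence patterns first."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n = len(order)
    points = []
    for n_rejected in range(n):             # always keep at least one pattern
        kept = order[n_rejected:]
        accuracy = sum(correct[i] for i in kept) / len(kept)
        points.append((n_rejected / n, accuracy))
    return points

# Hypothetical confidences and correctness flags, for illustration only.
conf = [0.9, 0.2, 0.6, 0.1]
hit = [True, False, True, False]
for rate, acc in accuracy_reject_curve(conf, hit):
    print(rate, acc)
```

If confidence is informative, accuracy on the accepted patterns rises as the rejection rate increases, at the cost of making fewer predictions.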
Accuracy-Reject (AR) curve
Accuracy-Reject (AR) curve
Compare Regression Performance
The SVM, RBF and MLP networks are used as the predictors.
Compare Regression Performance