
Web Data Mining: Exploring Hyperlinks, Contents and Usage Data

Special topic:

STOCK TREND PREDICTION WITH NEURAL NETWORK TECHNIQUES

Instructor: Yu-Chieh Wu
Date: 2010.11.29
http://140.115.112.118/course/99-1MCU-Web/index.htm


Outline

1. Introduction/Research Objective
2. Stock Trend Prediction
3. Neural Network
4. Support Vector Machine
5. Feature Selection
6. Experiments and Results
7. Conclusion


Objectives

a) Evaluate the performance of neural network techniques on the task of
stock trend prediction. The Multilayer Perceptron (MLP), the Radial
Basis Function (RBF) network and the Support Vector Machine (SVM) are
evaluated.

b) Stock prediction is formulated and evaluated both as a two-class
classification problem and as a regression problem.

c) Study pattern rejection techniques to improve prediction
performance.

Stock Prediction

Stock prediction is a difficult task because stock data are very noisy
and time-varying.

The efficient market hypothesis claims that the future price of a stock
is not predictable from publicly available information.

However, this theory has been challenged by many studies, and a few
researchers have successfully applied machine learning approaches such
as neural networks to stock prediction.


Is the Market Predictable?

Efficient Market Hypothesis (EMH) (Fama, 1965): the stock market is
efficient in that current market prices reflect all information
available to traders, so future changes cannot be predicted from past
prices or publicly available information.

Fama et al. (1988) showed that 25% to 40% of the variance in stock
returns over periods of three to five years is predictable from past
returns.

Pesaran and Timmermann (1999) concluded that the UK stock market was
predictable over the past 25 years.

Saad (1998) successfully employed different neural network models to
predict the trend of various stocks over a short-term range.


Implementation

In this paper we investigate SVM, MLP and RBF networks for the task of
predicting the future trend of three major stock indices:

a) Kuala Lumpur Composite Index (KLCI)
b) Hong Kong Hang Seng Index
c) Nikkei 225 stock index

using inputs based on technical indicators.

The paper approaches the problem as a two-class pattern classification
task formulated specifically to assist investors in making trading
decisions.

The classifier is asked to recognise investment opportunities that can
give a return of r% or more within the next h days (r = 3%, h = 10
days).

System Block Diagram

The classifier is to predict whether an increment of more than 3% in
the stock index can be achieved within the next 10 days.

[Block diagram: daily historical data -> converted into technical
analysis indicators -> classifier -> "Increment achievable? Yes / No"]

Classification vs Forecasting

Forecasting: predict the actual future value.

Classification: assign a pattern to one of several class categories;
the predicted class gives the future trend direction.


Data Used

Kuala Lumpur Composite Index (KLCI) for the period 1992-1997.

Hang Seng Index (20/4/1992 - 1/9/1997).

Nikkei 225 stock index (20/4/1982 - 1/9/1987).


Input to Classifier

TABLE 1: DESCRIPTION OF INPUT TO CLASSIFIER
x_i, i = 1, 2, 3, ..., 12; n = 15

DL_N(t) = sign[q(t) - q(t - N)] * ln(q(t)/q(t - N) + 1)      (1)

q(t) is the index level at day t and DL_N(t) is the actual input to the
classifier.
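
As a minimal sketch, Eq. (1) can be computed directly from a daily
price series; the helper name dl_indicator is illustrative, and the
formula is implemented exactly as printed on the slide:

    import numpy as np

    def dl_indicator(q, N):
        # Eq. (1): DL_N(t) = sign[q(t) - q(t-N)] * ln(q(t)/q(t-N) + 1)
        # q: 1-D array of daily index levels; days before N are undefined.
        q = np.asarray(q, dtype=float)
        dl = np.full(len(q), np.nan)
        dl[N:] = np.sign(q[N:] - q[:-N]) * np.log(q[N:] / q[:-N] + 1.0)
        return dl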

Prediction Formulation

Consider y_max(t) as the maximum upward movement of the stock index
value within the period between day t and day t + h. y(t) represents
the stock index level at day t.

Prediction Formulation: Classification

The prediction of the stock trend is formulated as a two-class
classification problem:

y_r(t) > r%   ->  Class 2
y_r(t) <= r%  ->  Class 1







Prediction Formulation: Classification

Let (x_i, y_i), 1 <= i <= N, be a set of N training examples. Each
input example x_i ∈ R^n, with n = 15 being the dimension of the input
space, belongs to a class labelled by y_i ∈ {+1, -1}.

[Figure: training patterns in the input space, labelled y_i = -1 and
y_i = +1]

Prediction Formulation: Regression

In the regression approach, the target output is represented by a
scalar value y_r that represents the predicted maximum excess return
within the period of h days ahead.
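
A minimal sketch of the target construction, assuming y_r(t) is the
percentage gain from y(t) to y_max(t) (the slide defining y_r exactly
was lost in extraction); the function name make_targets is
illustrative:

    import numpy as np

    def make_targets(y, h=10, r=3.0):
        # y_max(t): maximum index level within (t, t + h];
        # y_r(t): assumed maximum upward return in percent;
        # label +1 (Class 2) if y_r(t) > r, else -1 (Class 1).
        y = np.asarray(y, dtype=float)
        n = len(y) - h
        y_r = np.empty(n)
        for t in range(n):
            y_r[t] = 100.0 * (y[t + 1:t + h + 1].max() - y[t]) / y[t]
        labels = np.where(y_r > r, 1, -1)
        return y_r, labels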






Neural Network

According to Haykin, S. (1994), Neural Networks: A Comprehensive
Foundation, NY: Macmillan, p. 2: "A neural network is a massively
parallel distributed processor that has a natural propensity for
storing experiential knowledge and making it available for use."

Knowledge is acquired by the network through a learning process, either
supervised or unsupervised. This paper uses supervised learning, where
each training pattern and its target pattern are presented to the
neural network during the learning process.

Neural Network: Advantages of Neural Networks

The advantages of neural networks are due to their adaptive and
generalization abilities:

a) Neural networks are adaptive methods that can learn without any
prior assumption about the underlying data.

b) Neural networks, namely the feed-forward multilayer perceptron and
the radial basis function network, have been proven to be universal
function approximators.

c) Neural networks are non-linear models with good generalization
ability.




Neural Network: Taxonomy of Neural Network Architecture

The architecture of a neural network refers to the arrangement of the
connections between neurons, the processing elements, the number of
layers, and the flow of signals in the network. There are two main
categories of neural network architecture: feed-forward and feedback
(recurrent) neural networks.

[Figure: feed-forward network (multilayer perceptron)]

[Figure: recurrent network]


Multilayer Perceptron (MLP)

[Figure: MLP structure with an input layer (input vector x1 ... xn), a
hidden layer (h1, h2) and an output layer (O1)]

[Figure: neuron processing element; inputs x1 ... xn are weighted by
w1 ... wn, summed into y, and passed through the activation function
F(y)]

Multilayer Perceptron (MLP): Training the MLP Network

The multilayer perceptron (MLP) network uses the back-propagation
learning algorithm to obtain the weights of the network.

The simple back-propagation algorithm uses the steepest gradient
descent method to make changes to the weights.

The objective of training is to minimize the training mean square error
E_mse over all the training patterns.

To speed up training, the faster Levenberg-Marquardt back-propagation
algorithm is used.
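
As a rough modern sketch, an MLP classifier for the 15-dimensional
indicator inputs might look as follows; scikit-learn does not provide
Levenberg-Marquardt, so the quasi-Newton 'lbfgs' solver stands in for
it, and the data and layer sizes are placeholders:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(600, 15))          # placeholder indicators
    y_train = np.where(X_train[:, 0] > 0, 1, -1)  # placeholder labels

    mlp = MLPClassifier(hidden_layer_sizes=(10,), activation='tanh',
                        solver='lbfgs', max_iter=1000)
    mlp.fit(X_train, y_train)                     # gradient-based training
    print(mlp.score(X_train, y_train))            # training accuracy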


Multilayer Perceptron (MLP): MLP Network Setup

a) Number of hidden layers
b) Number of hidden neurons
c) Number of input neurons
d) Activation function

RBF Network

The RBF network is a three-layer feed-forward structure consisting of
an input layer, a single hidden layer with locally tuned hidden units,
and an output layer acting as a linear combiner.


RBF Network: RBF Network Training

The orthogonal least squares (OLS) method proposed by Chen, S. et al.
(1991) provides a systematic selection of the centre nodes in order to
reduce the size of the RBF network. The learning task involves finding
the appropriate centres and then the corresponding weights. This method
is adopted here.

RBF centres are selected from a set of training data.

The OLS method is employed as a forward regression procedure to select
the centres of the RBF nodes from the candidate set. At each step, the
centre that maximizes the error reduction is selected.
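
A sketch of the centre-selection idea, assuming a Gaussian basis with a
shared width sigma. This plain greedy forward selection re-fits the
weights by least squares at every step; Chen et al.'s OLS performs the
same selection more efficiently via Gram-Schmidt orthogonalisation:

    import numpy as np

    def rbf_design(X, centres, sigma):
        # Design matrix of Gaussian basis responses, one column per centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def greedy_centre_selection(X, y, n_centres, sigma):
        # At each step, add the training point (candidate centre) that
        # most reduces the squared error of a least-squares fit.
        chosen = []
        for _ in range(n_centres):
            best_err, best_i = np.inf, None
            for i in range(len(X)):
                if i in chosen:
                    continue
                Phi = rbf_design(X, X[chosen + [i]], sigma)
                w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
                err = ((Phi @ w - y) ** 2).sum()
                if err < best_err:
                    best_err, best_i = err, i
            chosen.append(best_i)
        Phi = rbf_design(X, X[chosen], sigma)
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return X[chosen], w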


Support Vector Machine

The Support Vector Machine is a special neural network technique based
on the structural risk minimisation (SRM) principle. In SRM, the
capacity of the learning machine is minimized together with the
training error.

In empirical risk minimization (ERM), used in conventional neural
networks such as the MLP and RBF networks, only the training error is
minimized.

The SVM was introduced by Cortes and Vapnik in 1995, building on the
Vapnik-Chervonenkis theory of statistical learning.



Support Vector Machine

The SVM demonstrates good generalization performance.

It has a sparse representation of the solution: the solution depends
only on a subset of the training data points, called the support
vectors.

Training an SVM is equivalent to solving a linearly constrained
quadratic programming problem. The solution is always unique, globally
optimal and free from the local minima problem.



Support Vector Machine

Many decision boundaries can separate these two classes. Which one
should we choose?

[Figure: Class 1 and Class 2 separated by several candidate decision
boundaries]

[Figure: Class 1 and Class 2 separated by the maximum-margin hyperplane
with margin m]

In SVM, the optimal separating hyperplane is chosen to maximize the
separation margin m and minimize the error.

Optimization Problem in SVM

Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, -1} be the class
label of x_i.

The decision boundary should classify all points correctly:

y_i (w · x_i + b) >= 1, for all i

This gives a constrained optimization problem: minimize (1/2)||w||^2
subject to the constraints above.

Support Vector Machine

For a non-linear boundary, the SVM maps the training data into a
higher-dimensional feature space using a kernel function K(x, x_i).

In this feature space, the SVM constructs a separating hyperplane that
maximises the margin (the distance from the closest data points to the
hyperplane) while minimizing the misclassification error at the same
time.

The Gaussian radial basis kernel is used, defined as follows:

K(x, x_i) = exp(-γ ||x - x_i||^2)

The optimal separating hyperplane (OSH) is represented by

F(x) = sign( Σ_i α_i y_i K(x, x_i) + b )

The sign gives the class label.
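
A minimal classification sketch with scikit-learn's SVC, whose 'rbf'
kernel is the Gaussian kernel above; the data and the values of C and
gamma are placeholders:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 15))                # placeholder features
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # placeholder labels

    clf = SVC(kernel='rbf', C=10.0, gamma=0.1)    # Gaussian kernel K(x, x_i)
    clf.fit(X, y)
    print(len(clf.support_))    # sparse solution: number of support vectors
    print(clf.predict(X[:5]))   # sign of F(x) gives the class label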


Tolerance to Noise

To allow misclassification errors, slack variables ξ_i are introduced:

y_i (w · x_i + b) >= 1 - ξ_i,   ξ_i >= 0

The following objective is minimized in order to obtain the optimum
hyperplane:

(1/2)||w||^2 + C Σ_i ξ_i

ξ_i is the slack variable introduced to allow a certain level of
misclassified points. C is the regularisation parameter that trades off
misclassification error against margin maximisation.





For Uneven Class Distribution

(1/2)||w||^2 + C+ Σ_{i: y_i = +1} ξ_i + C- Σ_{i: y_i = -1} ξ_i

Different misclassification costs can be applied to data with different
class labels.

A receiver operating characteristic (ROC) curve can be obtained by
varying C+ and C-.
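
In scikit-learn the C+/C- asymmetry can be expressed through
class_weight, which scales C per class; a sketch that sweeps the
positive-class weight to collect ROC operating points (data and
parameter values are placeholders):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 15))
    y = np.where(X[:, 0] > 0.5, 1, -1)         # imbalanced placeholder labels

    for w in (0.2, 0.5, 1.0, 2.0, 5.0):        # effectively varies C+ vs C-
        clf = SVC(kernel='rbf', C=10.0, gamma=0.1,
                  class_weight={1: w, -1: 1.0})
        pred = clf.fit(X, y).predict(X)
        tpr = ((pred == 1) & (y == 1)).sum() / (y == 1).sum()
        fpr = ((pred == 1) & (y == -1)).sum() / (y == -1).sum()
        print(f"w={w}: TPR={tpr:.2f}, FPR={fpr:.2f}")  # one ROC point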




Support Vector Regression

In the regression problem the desired output to be predicted is real
valued, whereas in classification problems the desired output is a
discrete value representing the class/category.

The output to be predicted is the strength of the trend.

The SVM approximates the regression function in the standard support
vector regression form:

f(x) = Σ_i (α_i - α_i*) K(x, x_i) + b
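
A minimal regression sketch with scikit-learn's SVR (ε-SVR with a
Gaussian kernel); all data and parameter values are placeholders:

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 15))               # placeholder indicators
    y_r = 2.0 * X[:, 0] + rng.normal(size=300)   # placeholder excess returns

    reg = SVR(kernel='rbf', C=10.0, gamma=0.1, epsilon=0.5)
    reg.fit(X, y_r)
    print(reg.predict(X[:3]))                    # predicted trend strength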




Parameters for the SVM

a) Classifier
   - Regularisation constant C
   - Kernel parameter γ

b) Regressor
   - Parameter ε for the ε-insensitive loss function
   - Regularisation constant C
   - Kernel parameter γ


Feature Selection

Feature selection is a process whereby a subset of the potential
predictor variables is selected, based on a relevance criterion, in
order to reduce the input dimension.

Typical feature selection involves the following steps:

Step 1. Search: generate a candidate feature subset.
Step 2. Evaluate the generated subset.
Step 3. Check the stopping criterion.

Steps 1, 2 and 3 are repeated until the stopping criteria are met, for
example when the minimum number of features has been included or the
minimum accepted prediction accuracy has been achieved.


Feature Selection: General Approaches to Feature Selection

a) Wrapper approach

The wrapper approach makes use of the induction algorithm itself to
evaluate the relevance of the features. The relevance measure is based
on solving the related problem, usually the prediction accuracy of the
induction algorithm when the features are used.

b) Filter approach

The filter method selects the feature subset independently of the
induction algorithm. Feature correlation is usually used.







Feature Selection: Feature Subset Selection

Feature subset selection (FSS) algorithms fall into three categories of
search algorithms:

a) exponential
b) randomised
c) sequential


Feature Selection: Sequential Selection Techniques

a) Forward Sequential Selection (FSS)
b) Backward Sequential Selection (BSS)

Both BSS and FSS are used. The features finally selected are those of
the subset that gives the best predictor performance when BSS and FSS
are used.
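
A sketch of forward sequential selection; `evaluate` stands for any
relevance criterion, for example wrapper-style cross-validated accuracy
of the classifier on the candidate subset (the function names are
illustrative):

    import numpy as np

    def forward_sequential_selection(X, y, evaluate, max_features):
        # Greedy FSS: start from the empty set and repeatedly add the one
        # feature whose inclusion gives the best score; stop when no
        # candidate improves the score or max_features is reached.
        selected, remaining = [], list(range(X.shape[1]))
        best_score = -np.inf
        while remaining and len(selected) < max_features:
            score, j = max((evaluate(X[:, selected + [k]], y), k)
                           for k in remaining)
            if score <= best_score:
                break
            best_score = score
            selected.append(j)
            remaining.remove(j)
        return selected

Backward sequential selection works symmetrically, starting from the
full feature set and removing one feature at a time.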




Feature Subset Selection

[Table: sequential selection results]

Performance Measure

True Positive (TP): the number of positive-class patterns correctly
predicted as positive.

False Positive (FP): the number of negative-class patterns wrongly
predicted as positive.

False Negative (FN): the number of positive-class patterns wrongly
predicted as negative.

True Negative (TN): the number of negative-class patterns correctly
predicted as negative.

Performance Measure

Accuracy = (TP + TN) / (TP + FP + TN + FN)

Precision = TP / (TP + FP)

Recall rate (sensitivity) = TP / (TP + FN)

F1 = 2 * Precision * Recall / (Precision + Recall)
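
These four metrics follow directly from the confusion counts; a sketch
assuming +1/-1 labels:

    import numpy as np

    def confusion_metrics(y_true, y_pred):
        # Confusion counts for the two-class (+1 / -1) problem.
        tp = ((y_pred == 1) & (y_true == 1)).sum()
        fp = ((y_pred == 1) & (y_true == -1)).sum()
        fn = ((y_pred == -1) & (y_true == 1)).sum()
        tn = ((y_pred == -1) & (y_true == -1)).sum()
        accuracy = (tp + tn) / (tp + fp + tn + fn)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)          # sensitivity
        f1 = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f1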

Testing Method

A rolling window method is used to capture the training and test data.

[Diagram: rolling train/test windows]

Train = 600 data points, Test = 400 data points.
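
A sketch of the rolling window; the slide gives only the window sizes,
so the assumption here is that each window advances by the test size so
that every point is tested once:

    def rolling_windows(n, train=600, test=400):
        # Yield (train_indices, test_indices) pairs over a series of
        # length n, sliding the whole window forward by `test` each time.
        start = 0
        while start + train + test <= n:
            yield (list(range(start, start + train)),
                   list(range(start + train, start + train + test)))
            start += test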

Experiment and Result

Experiments are conducted to predict the stock trend of three major
stock indices: KLCI, Hang Seng and Nikkei.

SVM, MLP and RBF networks are used to make trend predictions based on
the classification and regression approaches.

A hypothetical trading system is simulated to find the annualized
profit generated from the given predictions.

Experiment and Result: Trading Performance

A hypothetical trading system is used.

When a positive prediction is made, one unit of money is invested in a
portfolio reflecting the stock index. If the stock index increases by
more than r% (r = 3%) within the next h days (h = 10) at day t, the
investment is sold at the index price of day t. If not, the investment
is sold on day t + 1 regardless of the price. A transaction fee of 1%
is charged for every transaction made.

The annualised rate of return is used.
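
A sketch of the simulated trading rule; the exact exit day for
unsuccessful trades is ambiguous on the slide ("day t + 1"), so this
version assumes the position is closed at the end of the h-day window:

    def simulate_trading(prices, signals, r=0.03, h=10, fee=0.01):
        # One unit invested on each positive prediction; sold as soon as
        # the index first gains more than r within h days, otherwise sold
        # at the end of the holding window (assumption, see above). A fee
        # is charged on each of the two transactions (buy and sell).
        profit = 0.0
        for t, s in enumerate(signals):
            if s != 1 or t + h >= len(prices):
                continue
            buy, sell = prices[t], prices[t + h]
            for k in range(1, h + 1):
                if prices[t + k] > buy * (1.0 + r):
                    sell = prices[t + k]   # target reached: sell early
                    break
            profit += (sell - buy) / buy - 2.0 * fee
        return profit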



Trading Performance

[Table: classifier evaluation using the hypothetical trading system]

Experiment and Result: Classification Result

[Table: classification results]

Experiment and Result

The results show better performance of the neural network techniques
compared to the K-nearest-neighbour classifier. The SVM shows better
overall performance on average than the MLP and RBF networks in most of
the performance metrics used.

Experiment and Result: Comparison of Receiver Operating Curves (ROC)

[Figure: ROC curves]

Experiment and Result: Area under the ROC Curve

[Table: AUC values]

Experiment and Result: Error-Reject Trade-off


Experiment and Result

The Accuracy-Reject (AR) curve can be plotted to see the accuracy
improvement of the classifier at various rejection rates. The AR curve
is a plot of the classifier operating points, showing the possible
trade-off between the accuracy of the classifier and the rejection rate
implemented.

[Figure: Accuracy-Reject (AR) curves]
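
An AR curve can be traced by rejecting patterns whose decision value
lies too close to the boundary; a sketch assuming an SVM-style
real-valued score whose sign is the predicted class:

    import numpy as np

    def accuracy_reject_curve(scores, y_true, thresholds):
        # For each threshold, reject patterns with |score| below it and
        # report (rejection rate, accuracy on the accepted patterns).
        points = []
        for thr in thresholds:
            keep = np.abs(scores) >= thr
            if keep.sum() == 0:
                continue
            pred = np.where(scores[keep] > 0, 1, -1)
            points.append((1.0 - keep.mean(),
                           (pred == y_true[keep]).mean()))
        return points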


Compare Regression Performance

The SVM, RBF and MLP networks are used as the predictors.

[Table: regression performance comparison]