Machine Learning for Stock Selection

Oct 16, 2013

Robert J. Yan

Charles X. Ling

University of Western Ontario, Canada

{jyan, cling}@csd.uwo.ca

Outline

- Introduction
- The stock selection task
- The Prototype Ranking method
- Experimental results
- Conclusions

Introduction


- Objective: use machine learning to select a small number of “good” stocks to form a portfolio
- Research questions:
  - Learning from noisy data
  - Learning from imbalanced data
- Our solution: Prototype Ranking, a specially designed machine learning method

Stock Selection Task

Given information prior to week t, predict the performance of stocks in week t.

Training set attributes:
- Stock ID
- Predictor 1: return of week t-1
- Predictor 2: return of week t-2
- Predictor 3: volume ratio of week t-2 to week t-1 (t-2/t-1)
- Goal: return of week t

Learn a ranking function to rank the test data:
- Select the n highest-ranked stocks to buy and the n lowest-ranked stocks to short-sell
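The training set described above has three predictors and one goal attribute per stock. A minimal sketch of turning weekly data into such rows, assuming returns and trading volumes are already held in NumPy arrays of shape (stocks, weeks); the function name `build_examples` and this array layout are hypothetical, not taken from the slides.

```python
import numpy as np


def build_examples(weekly_returns, weekly_volumes, t):
    """Build one row per stock for target week t (hypothetical helper).

    weekly_returns, weekly_volumes: arrays of shape (n_stocks, n_weeks),
    where column j holds week j. Requires t >= 2.
    Returns (X, y): X columns are [return t-1, return t-2, volume t-2 / volume t-1];
    y is the return of week t (the goal attribute).
    """
    r1 = weekly_returns[:, t - 1]                                    # Predictor 1
    r2 = weekly_returns[:, t - 2]                                    # Predictor 2
    vol_ratio = weekly_volumes[:, t - 2] / weekly_volumes[:, t - 1]  # Predictor 3
    X = np.column_stack([r1, r2, vol_ratio])
    y = weekly_returns[:, t]                                         # Goal
    return X, y
```

Rows from many past weeks can be stacked to form the training set; for the week being predicted, only X is known.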

Prototype Ranking


Prototype Ranking (PR): a machine learning method specially designed for noisy and imbalanced stock data

The PR system:
Step 1. Find good “prototypes” in the training data
Step 2. Use k-NN on the prototypes to rank the test data


Step 1: Finding Prototypes

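- Prototypes: representative points of the training data
- Goal: discover the underlying density (clusters) of the training samples by distributing prototypes in the sample space
- Reduce the data size: later steps work with the prototypes instead of all training samples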


[Figure: samples, prototypes, and the neighborhood of a prototype]

Finding prototypes using competitive learning

General competitive learning:
Step 1: Randomly initialize a set of prototypes
Step 2: For each training sample, search for the nearest prototype
Step 3: Adjust that prototype by moving it toward the sample
Step 4: Output the prototypes

The hidden density of the training data is reflected in the prototypes.
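The four steps above are the generic winner-take-all form of competitive learning. A minimal sketch with NumPy, assuming Euclidean distance, a fixed learning rate `lr`, and initialization at randomly chosen training samples; these are illustrative choices, not details from the slides, and `competitive_learning` is a hypothetical name.

```python
import numpy as np


def competitive_learning(samples, n_prototypes, lr=0.1, epochs=20, seed=0):
    """Generic winner-take-all competitive learning (illustrative sketch).

    samples: float array of shape (n_samples, n_features).
    Returns an array of shape (n_prototypes, n_features).
    """
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize the prototypes at distinct training samples
    start = rng.choice(len(samples), size=n_prototypes, replace=False)
    prototypes = samples[start].astype(float)
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            # Step 2: find the nearest prototype (the "winner")
            winner = np.argmin(np.sum((prototypes - x) ** 2, axis=1))
            # Step 3: move only the winner a fraction lr toward the sample
            prototypes[winner] += lr * (x - prototypes[winner])
    # Step 4: output the prototypes
    return prototypes
```

Because each update pulls the winner toward the samples it wins, dense regions of the training data tend to attract prototypes, which is the sense in which the prototypes reflect the hidden density.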

Modifications for Stock Data

- In step 1: organize the initial prototypes in a tree structure
  - Fast nearest-prototype search
- In step 2: search for prototypes in the predictor space
  - Better learning for the prediction task
- In step 3: adjust prototypes in the goal attribute space
  - Better learning on the imbalanced stock data
- In step 4: prune the prototype tree
  - Prune child prototypes that are similar to their parent
  - Combine the leaf prototypes to form the final prototypes
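The slides state the four modifications but not their exact rules, so the sketch below is only one possible reading, and every specific in it is an assumption. Each prototype carries a predictor vector and a goal value (for example a predicted weekly return); step 2 picks the winner by distance in predictor space only; step 3 moves the winner's goal value toward the sample's goal (the winner's predictor vector is also moved, as in plain competitive learning, which the slides do not state); step 4 treats a child as "similar" when its goal value is within a hypothetical tolerance `tol` of its parent's. The names `Prototype`, `train_step`, `prune`, and `leaves` are hypothetical.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Prototype:
    predictors: np.ndarray  # position in predictor space (used for searching)
    goal: float             # value in the goal attribute space, e.g. a return
    children: list = field(default_factory=list)  # sub-prototypes in the tree


def train_step(prototypes, x, y, lr=0.1):
    """Modified steps 2-3 for one training sample (x, y); an assumed reading."""
    # Step 2: the winner is the nearest prototype in predictor space only
    winner = min(prototypes, key=lambda p: float(np.sum((p.predictors - x) ** 2)))
    # Step 3: adjust the winner's goal value toward the sample's goal
    winner.goal += lr * (y - winner.goal)
    # Assumption: also move its position, as in plain competitive learning
    winner.predictors = winner.predictors + lr * (x - winner.predictors)
    return winner


def prune(node, tol):
    """Step 4a: drop children (and their subtrees) whose goal is within tol of the parent's."""
    node.children = [c for c in node.children if abs(c.goal - node.goal) > tol]
    for child in node.children:
        prune(child, tol)
    return node


def leaves(node):
    """Step 4b: the remaining leaf prototypes form the final prototype set."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Calling `prune` and then `leaves` on the root of a trained tree yields the final prototypes; how PR actually builds and trains the tree is not detailed in the slides.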


Step 2: Predicting Test Data


- Predict with the weighted average of the k nearest prototypes
- Update the model online with new data
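A minimal sketch of this prediction step followed by the buy/short-sell selection, assuming Euclidean distance in predictor space and inverse-distance weights (the slides say "weighted average" without giving the weights). The names `knn_predict` and `select_long_short` are hypothetical, and `n` must be at most half the number of test stocks for the two sides not to overlap.

```python
import numpy as np


def knn_predict(x, proto_X, proto_goal, k=3, eps=1e-9):
    """Inverse-distance-weighted average of the goals of the k nearest prototypes."""
    d = np.sqrt(np.sum((proto_X - x) ** 2, axis=1))
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + eps)  # eps avoids division by zero on exact matches
    return float(np.sum(w * proto_goal[nearest]) / np.sum(w))


def select_long_short(test_X, test_ids, proto_X, proto_goal, n, k=3):
    """Rank test stocks by predicted return; buy the n highest, short-sell the n lowest."""
    scores = np.array([knn_predict(x, proto_X, proto_goal, k) for x in test_X])
    order = np.argsort(scores)[::-1]  # highest predicted return first
    longs = [test_ids[i] for i in order[:n]]
    shorts = [test_ids[i] for i in order[-n:]]
    return longs, shorts
```

Online updating would amount to feeding each newly observed week's (predictors, return) pairs back through the training step before predicting the next week; the exact update schedule is not given in the slides.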


Data

CRSP daily stock database
- 300 NYSE and AMEX stocks with the largest market capitalization
- From 1962 to 2004
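The database is daily while the task uses weekly returns. One standard way to obtain a weekly simple return is to compound the daily simple returns within the week, R_week = prod(1 + r_d) - 1. The slides do not say how the authors aggregated; the `week_ids` labels (one per trading day) and the name `weekly_from_daily` are hypothetical.

```python
import numpy as np


def weekly_from_daily(daily_returns, week_ids):
    """Compound simple daily returns into weekly returns: prod(1 + r_d) - 1."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    week_ids = np.asarray(week_ids)
    weeks = np.unique(week_ids)  # sorted distinct week labels
    weekly = np.array([np.prod(1.0 + daily_returns[week_ids == w]) - 1.0 for w in weeks])
    return weeks, weekly
```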

Testing PR


- Experiment 1: does a larger portfolio give a lower average return but also a lower risk?
  - The effect of diversification
- Experiment 2: is PR better than Cooper’s method?

16

Results of Experiment 1

[Charts: Average Return (1978-2004) and Risk (std) (1978-2004)]
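Experiment 1 reports the average return and the risk (standard deviation) of the portfolio over 1978-2004. A minimal sketch of how such figures can be computed for one portfolio size n, assuming an equal-weighted long-short portfolio rebalanced weekly, whose return is the mean return of the n stocks bought minus the mean return of the n stocks short-sold. This construction, the sample standard deviation, and the function names are assumptions, not details from the slides.

```python
import numpy as np


def long_short_return(realized, predicted, n):
    """One week's return: mean realized return of the n highest-predicted stocks
    minus that of the n lowest-predicted stocks (equal weights, assumed)."""
    order = np.argsort(predicted)
    return realized[order[-n:]].mean() - realized[order[:n]].mean()


def return_and_risk(weekly_realized, weekly_predicted, n):
    """Average weekly portfolio return and its sample standard deviation."""
    rets = np.array([long_short_return(r, p, n)
                     for r, p in zip(weekly_realized, weekly_predicted)])
    return rets.mean(), rets.std(ddof=1)
```

Evaluating `return_and_risk` for several values of n traces the return/risk trade-off that Experiment 1 examines.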

Experiment 2: Comparison to Cooper’s Method

- Cooper’s method (CP): a traditional non-ML method for stock selection…
- Compare PR and CP on 10-stock portfolios

Results of Experiment 2

Measures:
- Average return (Ret.)
- Sharpe ratio (SR), a risk-adjusted return: SR = Ret. / Std., where Std. is the standard deviation of the returns
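The textbook Sharpe ratio first subtracts a risk-free rate from the return; the slide's definition omits it. This small sketch follows the slide, using the sample standard deviation (an assumption); `sharpe_ratio` is a hypothetical name.

```python
import numpy as np


def sharpe_ratio(returns):
    """SR = Ret. / Std., as defined on the slide (no risk-free rate subtracted)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1)
```

A higher SR for one method than another on the same 10-stock setting means more average return per unit of risk.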

Conclusions


- PR: modified competitive learning and k-NN for noisy and imbalanced stock data
- PR does well in stock selection
  - Larger portfolio, lower return, lower risk
  - PR outperforms the non-ML method CP
- Future work: use it to invest and make money!