Finding the most relevant topic to the fluctuation of stock price data


Oct 17, 2013


Information Universe, Fall 2011

Project Proposal

2010-23381 YeRim Choi


Finding the most relevant topic to the fluctuation of stock price data


1. Goal



- Find how people's major interests change with the state of the stock market



- Find the relationship between the word counts of search queries and the stock price


2. Model



- Given each query, extract its keywords
- Count how many times each keyword appears per day
- Find the relationship between the daily keyword counts and the stock price
- Select the keywords that have a high correlation with the stock price
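
The four model steps can be sketched end to end in Python. This is a minimal sketch under assumptions not stated in the proposal: keyword extraction is a naive lowercase tokenizer with a hypothetical stop-word list (a stand-in for the SVM extractor named under Method), the relationship measure is Pearson's correlation coefficient, and the 0.7 threshold, the function names, and the toy data are all illustrative.

```python
import math
from collections import Counter, defaultdict
from datetime import date

# Hypothetical stop-word list; the proposal does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "for", "and", "on", "com", "www"}


def extract_keywords(query):
    """Step 1 stand-in for the SVM extractor: lowercase alphabetic tokens minus stop words."""
    return [t for t in query.lower().split() if t.isalpha() and t not in STOP_WORDS]


def daily_keyword_counts(queries):
    """Step 2: map each day to a Counter of keyword -> number of appearances that day."""
    counts = defaultdict(Counter)
    for day, text in queries:
        counts[day].update(extract_keywords(text))
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_keywords(counts, prices, threshold=0.7):
    """Steps 3-4: keep keywords whose daily count has |r| >= threshold with the price."""
    days = sorted(d for d in counts if d in prices)  # days with both queries and a price
    if len(days) < 3:
        return {}
    vocab = set().union(*(counts[d] for d in days))
    ys = [prices[d] for d in days]
    selected = {}
    for word in sorted(vocab):
        r = pearson([counts[d][word] for d in days], ys)
        if abs(r) >= threshold:
            selected[word] = r
    return selected


# Toy, made-up data: four trading days with a falling price.
queries = [
    (date(2006, 3, 1), "crash"), (date(2006, 3, 1), "weather today"),
    (date(2006, 3, 2), "crash"), (date(2006, 3, 2), "crash"),
    (date(2006, 3, 3), "crash crash crash"), (date(2006, 3, 3), "weather"),
    (date(2006, 3, 6), "crash crash crash crash"),
]
prices = {date(2006, 3, 1): 100.0, date(2006, 3, 2): 98.0,
          date(2006, 3, 3): 95.0, date(2006, 3, 6): 90.0}

selected = select_keywords(daily_keyword_counts(queries), prices)
print(selected)  # only "crash" passes; its r is strongly negative
```

Correlating against the raw price level is the simplest reading of "stock price"; the daily percent change in price is an equally plausible target and drops into `prices` unchanged.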


3. Data



- AOL Search Data
- NASDAQ Exchange Daily 1970-2010


4. Data Description



AOL Search Data

i. Overview



The AOL Search Data is a collection of real query logs from real users. The data set consists of about 20M web queries collected from about 650k users over three months. The data is sorted by anonymous user ID and arranged sequentially.

ii. History



In August 2006, AOL released the search data. Within days, the company realized that this was a mistake, withdrew the data, and made a public apology. Many copies of the data set were made before it was withdrawn, and it is still available for download on some sites.

iii. Format

{AnonID, Query, QueryTime, ItemRank, ClickURL}



- AnonID: an anonymous user ID number.
- Query: the query issued by the user, case shifted with most punctuation removed.
- QueryTime: the time at which the query was submitted for search.
- ItemRank: if the user clicked on a search result, the rank of the item on which they clicked is listed.
- ClickURL: if the user clicked on a search result, the domain portion of the URL in the clicked result is listed.
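
Reading these records and grouping queries by day could look like the following. This is a sketch that assumes the distributed files are tab-separated with a header row naming the five fields above, and that QueryTime is formatted as "YYYY-MM-DD HH:MM:SS"; both are assumptions to check against the actual files. The function name `read_aol_log` and the sample lines are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def read_aol_log(lines):
    """Yield (day, query) pairs from AOL log lines.

    Assumes a tab-separated file whose first line is the header
    AnonID, Query, QueryTime, ItemRank, ClickURL, and a QueryTime
    formatted as "YYYY-MM-DD HH:MM:SS".
    """
    # QUOTE_NONE keeps stray quote characters inside queries from confusing the parser.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        query = (row.get("Query") or "").strip()
        if not query:
            continue  # skip blank queries
        when = datetime.strptime(row["QueryTime"], "%Y-%m-%d %H:%M:%S")
        yield when.date(), query


# Made-up sample lines; rows without a click leave ItemRank and ClickURL empty.
sample = io.StringIO(
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1001\tstock market\t2006-03-01 07:17:12\t\t\n"
    "1001\tnasdaq\t2006-03-01 07:20:05\t1\thttp://www.nasdaq.com\n"
    "2002\tstock market crash\t2006-03-02 11:02:45\t\t\n"
)

queries_per_day = Counter(day for day, _ in read_aol_log(sample))
print(queries_per_day)
```

Because the reader is a generator, the full 20M-query log can be streamed one line at a time instead of being loaded into memory.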




NASDAQ Exchange Daily 1970-2010

i. Overview



Historical NASDAQ stock data from 1970 to 2010, including daily open, close, low, high, and trading volume figures. The data is organized alphabetically by ticker symbol.

ii. Format



- exchange
- stock_symbol
- date
- stock_price_open
- stock_price_high
- stock_price_low
- stock_price_close
- stock_volume
- stock_price_adj_close
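
To turn these rows into a daily price series for one ticker, a sketch like the one below could be used. It assumes comma-separated files with a header row using the column names above and dates formatted as "YYYY-MM-DD"; those are assumptions, as are the helper names `load_closes` and `daily_changes` and the ticker "ABCD" in the made-up sample.

```python
import csv
import io
from datetime import datetime


def load_closes(lines, symbol, adjusted=True):
    """Return {date: closing price} for one ticker.

    Assumes a comma-separated file with a header row naming the columns
    listed above and dates formatted as "YYYY-MM-DD".
    """
    column = "stock_price_adj_close" if adjusted else "stock_price_close"
    closes = {}
    for row in csv.DictReader(lines):
        if row["stock_symbol"] != symbol:
            continue
        day = datetime.strptime(row["date"], "%Y-%m-%d").date()
        closes[day] = float(row[column])
    return closes


def daily_changes(closes):
    """Percent change of the close from the previous trading day present in the data."""
    days = sorted(closes)  # rows may not be in date order, so sort first
    return {
        cur: (closes[cur] - closes[prev]) / closes[prev] * 100.0
        for prev, cur in zip(days, days[1:])
    }


# Made-up rows for a hypothetical ticker "ABCD" plus one row for another ticker.
sample = io.StringIO(
    "exchange,stock_symbol,date,stock_price_open,stock_price_high,"
    "stock_price_low,stock_price_close,stock_volume,stock_price_adj_close\n"
    "NASDAQ,ABCD,2006-03-02,10.00,11.10,9.95,11.00,1200,11.00\n"
    "NASDAQ,ABCD,2006-03-01,10.20,10.50,9.90,10.00,1000,10.00\n"
    "NASDAQ,WXYZ,2006-03-01,5.00,5.10,4.90,5.05,800,5.05\n"
)

closes = load_closes(sample, "ABCD")
print(daily_changes(closes))  # 2006-03-02 -> about +10.0 percent
```

Defaulting to the adjusted close avoids treating stock splits and dividends as price fluctuations; `adjusted=False` uses the raw close instead.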


5. Method



- Keyword extraction using a Support Vector Machine (SVM)
- Regression between daily keyword counts and stock price
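
The regression step is not specified further, so the sketch below assumes the simplest form: an ordinary least squares line fitted separately for each keyword, with that keyword's daily count as x and the day's percent price change as y. The function name `fit_line` and the numbers in the example are hypothetical, and the SVM keyword-extraction step is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("keyword count never changes; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: change in y per extra keyword occurrence
    a = my - b * mx        # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    # R^2 is undefined when the price never moves; report NaN in that case.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return a, b, r_squared


# Made-up example: daily count of one keyword vs. that day's percent price change.
counts = [1, 2, 3, 4]
changes = [-0.5, -1.0, -1.5, -2.0]
a, b, r2 = fit_line(counts, changes)
print(a, b, r2)  # 0.0 -0.5 1.0
```

Ranking keywords by R^2 (or by the magnitude of the slope) is one way to carry out the keyword selection step; a multivariate regression over all keywords at once would be a natural extension.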


6. Plan



Milestones: ~11/2, ~11/16, ~11/30, ~11/14

- Keyword extraction
- Keyword counting
- Relationship discovery
- Keyword selection
- Documentation