Information Universe, Fall 2011
Project Proposal
2010-23381 YeRim Choi
Finding the most relevant topic to the fluctuation of stock price data
1. Goal
Find how people's major interests depend on the state of the stock market.
Examine the relationship between the word count of queries and the stock price.
2. Model
① Given each query, extract keywords.
② Count the number of appearances of each keyword over each day.
③ Find the relationship between the word count and the stock price.
④ Select keywords that have a high correlation with the stock price.
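The four steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: keyword extraction is replaced by a simple whitespace split with a hypothetical stop-word list (the proposal plans an SVM for that step), Pearson correlation stands in for "relationship", and the function names and the 0.8 threshold are illustrative choices that do not come from the proposal.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical stop-word list; the proposal does not specify one.
STOP_WORDS = {"the", "a", "of", "to", "in", "for", "and"}


def extract_keywords(query):
    """Step 1 (stand-in): lowercase, split on whitespace, drop stop words."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def daily_keyword_counts(queries):
    """Step 2: queries is an iterable of (day, query) pairs; returns {day: Counter}."""
    counts = defaultdict(Counter)
    for day, query in queries:
        counts[day].update(extract_keywords(query))
    return counts


def pearson(xs, ys):
    """Step 3: Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_keywords(counts, prices, threshold=0.8):
    """Step 4: keep keywords whose absolute correlation with price meets the threshold."""
    days = sorted(d for d in counts if d in prices)
    vocab = set().union(*(counts[d] for d in days))
    selected = {}
    for word in vocab:
        r = pearson([counts[d][word] for d in days], [prices[d] for d in days])
        if abs(r) >= threshold:
            selected[word] = r
    return selected
```

Only days that appear in both the query counts and the price table are compared, so non-trading days are dropped rather than filled in; whether that is the right treatment is an open design choice.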
3. Data
AOL Search Data
NASDAQ Exchange Daily 1970-2010
4. Data Description
AOL Search Data
i. Overview
The AOL Search Data is a collection of real query logs from real users. The data set consists of 20M web queries collected from 650k users over three months. The data is sorted by anonymous user ID and arranged sequentially.
ii. History
In August 2006, AOL released the search data. Within days, the company realized that this was a mistake, withdrew the data, and made a public apology. Many copies of the data set were made before it was withdrawn, and it is still available for download on some sites.
iii. Format
{AnonID, Query, QueryTime, ItemRank, ClickURL}
AnonID – an anonymous user ID number.
Query – the query issued by the user, case-shifted with most punctuation removed.
QueryTime – the time at which the query was submitted for search.
ItemRank – if the user clicked on a search result, the rank of the item on which they clicked is listed.
ClickURL – if the user clicked on a search result, the domain portion of the URL in the clicked result is listed.
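A record in this format could be parsed roughly as shown below. The sketch assumes tab-separated fields and a "YYYY-MM-DD HH:MM:SS" timestamp, neither of which is stated in this proposal; QueryRecord and parse_record are hypothetical names introduced for illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class QueryRecord(NamedTuple):
    anon_id: int
    query: str
    query_time: datetime
    item_rank: Optional[int]   # None when the user did not click a result
    click_url: Optional[str]   # None when the user did not click a result


def parse_record(line):
    """Parse one log line. Assumes tab-separated fields (an assumption)."""
    fields = line.rstrip("\n").split("\t")
    # Rows without a click may omit the last two fields; pad them with blanks.
    fields += [""] * (5 - len(fields))
    anon_id, query, query_time, item_rank, click_url = fields[:5]
    return QueryRecord(
        anon_id=int(anon_id),
        query=query,
        # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS".
        query_time=datetime.strptime(query_time, "%Y-%m-%d %H:%M:%S"),
        item_rank=int(item_rank) if item_rank else None,
        click_url=click_url or None,
    )
```

For the daily counts in the model, only the date part of QueryTime is needed, for example `record.query_time.date()`.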
NASDAQ Exchange Daily 1970-2010
i. Overview
Historical NASDAQ stock data from 1970–2010, including daily open, close, low, high, and trading volume figures. Data is organized alphabetically by ticker symbol.
ii. Format
exchange
stock_symbol
date
stock_price_open
stock_price_high
stock_price_low
stock_price_close
stock_volume
stock_price_adj_close
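To line the price data up with daily keyword counts, the closing price per day for one ticker could be read as in the sketch below. It assumes the data is comma-separated with a header row that uses the field names listed above; the function name daily_close and the sample ticker are hypothetical.

```python
import csv
import io


def daily_close(csv_file, symbol):
    """Return {date: closing price} for one ticker symbol.

    Assumes a comma-separated file whose header row uses the field names
    listed in the Format section (an assumption, not confirmed here).
    """
    reader = csv.DictReader(csv_file)
    return {
        row["date"]: float(row["stock_price_close"])
        for row in reader
        if row["stock_symbol"] == symbol
    }


# Made-up sample rows for illustration only; "ABCD" is not a real record.
sample = io.StringIO(
    "exchange,stock_symbol,date,stock_price_open,stock_price_high,"
    "stock_price_low,stock_price_close,stock_volume,stock_price_adj_close\n"
    "NASDAQ,ABCD,2006-03-01,10.0,10.5,9.8,10.2,1000,10.2\n"
    "NASDAQ,ABCD,2006-03-02,10.2,10.4,9.9,10.0,1200,10.0\n"
)
closes = daily_close(sample, "ABCD")
```

The adjusted close (stock_price_adj_close) may be the better target when the period spans splits or dividends; the proposal does not say which price is used.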
5. Method
Keyword extraction using a Support Vector Machine
Regression between keywords and stock price
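The regression step could look like the following sketch, which fits an ordinary least-squares line of stock price against the daily count of a single keyword. The proposal does not say which regression model is intended, so simple linear regression is an assumption here, and fit_line and r_squared are hypothetical names. The SVM keyword-extraction step is not sketched because the proposal does not specify its features or training labels.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    if ss_tot == 0:
        return 0.0  # ys are constant, so R^2 is undefined; reported as 0.0
    return 1 - ss_res / ss_tot
```

Here xs would be the daily counts of one keyword and ys the closing prices on the same days; a keyword with a high R^2 would be a candidate in the keyword selection step.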
6. Plan
Schedule: ~11/2, ~11/16, ~11/30, ~11/14
Tasks:
Keyword extraction
Number count
Relationship discovery
Keyword selection
Documentation