Tang Chia Ping
HSIEH HSIN JU M9605103
Submission date: December 28, 2007 (ROC year 96)
Assignment: use a dataset derived from KDD Cup 2007 (or KDD Cup 2008), apply the best-performing learning algorithm, and complete the final analysis for Task 1 or Task 2.
The first task in KDD Cup 2007 is to predict
which users rated which movies in 2006, given
the Netflix Prize training data set that contains
more than 100 million ratings from over 480
thousand users on nearly 18 thousand movie
titles collected between 1998 and 2005. In this work,
we cast the task as a link prediction problem and
address it with a simple approach.
1 The Movies Description
The chart of the annual volume of movies shows that the
number of films grows with each passing year, reaching its
highest point in 2004 and then dropping suddenly to about 500
films in 2005. From this we can see that the release year is
relevant, and we treat it as one of the factors affecting the ratings.
2 Training Dataset File Description
MovieIDs range from 1 to 17770 sequentially.
CustomerIDs range from 1 to 2649429, with gaps.
There are 480189 users.
Dates have the format YYYY.
Year Of Release can range from 1890 to 2005 and may
correspond to the DVD release of the movie, not
necessarily its theatrical release.
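The movie file described above can be read record by record. A minimal Python sketch (the original work used MATLAB), assuming the standard movie_titles.txt format "MovieID,YearOfRelease,Title" with "NULL" for a missing year:

```python
# Parse one line of a movie_titles.txt-style file (assumed format:
# "MovieID,YearOfRelease,Title"); the title itself may contain commas,
# so we split on at most two commas.
def parse_movie_line(line):
    movie_id, year, title = line.rstrip("\n").split(",", 2)
    return {
        "movie_id": int(movie_id),                      # 1..17770, sequential
        "year": None if year == "NULL" else int(year),  # 1890..2005, may be DVD release
        "title": title,
    }
```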
3 Feature Selection
The selected features are as follows:
User ID: a unique identifier for a user.
Movie Name: title of the movie.
User Movie Rating: a number between 1 and 5 (1 is the lowest).
Average Rating by User: average rating on all movies rated by the user.
Average Popular Movie Rating by User: average rating on all popular movies rated by the user.
User Ratings: number of ratings by the user.
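The per-user features above can be computed in one pass over the ratings. A Python sketch (the original work used MATLAB), assuming an in-memory list of (user_id, movie_id, rating) tuples; the "popular" threshold is a hypothetical choice, since the report does not define popularity:

```python
from collections import defaultdict

# Compute the per-user features listed above from (user_id, movie_id, rating)
# tuples. A movie is treated as "popular" when its rating count reaches
# popular_threshold (an assumed definition, not from the report).
def user_features(ratings, popular_threshold=2):
    movie_counts = defaultdict(int)
    for _, movie_id, _ in ratings:
        movie_counts[movie_id] += 1
    popular = {m for m, c in movie_counts.items() if c >= popular_threshold}

    sums = defaultdict(lambda: [0.0, 0, 0.0, 0])  # [sum, n, pop_sum, pop_n]
    for user_id, movie_id, rating in ratings:
        s = sums[user_id]
        s[0] += rating
        s[1] += 1
        if movie_id in popular:
            s[2] += rating
            s[3] += 1

    return {
        u: {
            "avg_rating": s[0] / s[1],          # Average Rating by User
            "avg_popular_rating": s[2] / s[3] if s[3] else None,
            "num_ratings": s[1],                # User Ratings
        }
        for u, s in sums.items()
    }
```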
2. Analysis of the results
Fig2. Random distribution
2 Training Parameters of the Network
Fig3. Learning rate = 0.05
Fig4. Network: (1) there are four neurons in the first layer;
(2) there are three neurons in the second layer
4 Weight to layer
Fig5. Weight to layer 1
Fig6. Weight to layer 2
5 Bias to layer
Fig7. Bias to layer 1
Fig8. Bias to layer 2
6 Training with TRAINGDM
Fig9. The Performance is 0.397427
The weights are the most important adjustable parameters of
a neural network; by adjusting them, the network is made to
exhibit the behavior we expect or are interested in. Neural
network operation divides into two phases: training, which
mainly determines the weights and biases of the network, and
simulation, which uses the trained network to predict output
values or to verify the accuracy of the network. The simplest
and most widely used training method is gradient descent.
The KDD Cup 2007 has two main tasks: Task 1,
Who Rated What, and Task 2, How Many Ratings. These
two tasks had us racking our brains: the data set is so
large, with up to 17,770 movies to process, that we had
no choice but to split it into several parts to run in
MATLAB, and we added many factors that we believe
affect the results.
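The splitting step can be sketched in a few lines of Python (the original work used MATLAB); the chunk size here is an arbitrary illustrative choice:

```python
# Split the 17,770 movie IDs into fixed-size chunks so each batch stays
# small enough to process separately; 2000 is an assumed chunk size.
def split_into_chunks(ids, chunk_size):
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]

movie_ids = list(range(1, 17771))  # MovieIDs 1..17770
chunks = split_into_chunks(movie_ids, 2000)
```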
We used MATLAB to process these data, running the
programs according to Year Of Release, and set the
learning rate = 0.05, with four neurons in the first layer
and three neurons in the second layer; the resulting
performance is 0.397427.
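The training setup just described can be sketched in ordinary Python; the original work used MATLAB's TRAINGDM (gradient descent with momentum). The toy data, momentum value, and sigmoid activations below are assumptions, not taken from the report; only the learning rate (0.05) and the layer sizes (4 and 3 neurons) come from it.

```python
import math
import random

# Sketch: a feed-forward network with four neurons in the first layer,
# three in the second, and one linear output, trained by gradient descent
# with momentum (the analogue of MATLAB's TRAINGDM) at learning rate 0.05.
random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

SIZES = [1, 4, 3, 1]      # input, first layer, second layer, output
LR, MOMENTUM = 0.05, 0.7  # learning rate from the report; momentum assumed

# weights[l][j][i]: weight from unit i of layer l to unit j of layer l+1
weights = [[[random.uniform(-0.5, 0.5) for _ in range(SIZES[l])]
            for _ in range(SIZES[l + 1])] for l in range(3)]
biases = [[0.0] * SIZES[l + 1] for l in range(3)]
vel_w = [[[0.0] * SIZES[l] for _ in range(SIZES[l + 1])] for l in range(3)]
vel_b = [[0.0] * SIZES[l + 1] for l in range(3)]

def forward(x):
    """Return the activations of every layer for input x."""
    acts = [[x]]
    for l in range(3):
        layer = []
        for j in range(SIZES[l + 1]):
            z = biases[l][j] + sum(weights[l][j][i] * acts[l][i]
                                   for i in range(SIZES[l]))
            layer.append(z if l == 2 else sigmoid(z))  # linear output unit
        acts.append(layer)
    return acts

def train_step(x, y):
    """One backpropagation step with momentum; returns the squared error."""
    acts = forward(x)
    deltas = [None, None, [acts[3][0] - y]]  # output delta (linear unit)
    for l in (1, 0):  # propagate back through the two sigmoid layers
        deltas[l] = [sum(weights[l + 1][k][j] * deltas[l + 1][k]
                         for k in range(SIZES[l + 2]))
                     * acts[l + 1][j] * (1 - acts[l + 1][j])
                     for j in range(SIZES[l + 1])]
    for l in range(3):
        for j in range(SIZES[l + 1]):
            vel_b[l][j] = MOMENTUM * vel_b[l][j] - LR * deltas[l][j]
            biases[l][j] += vel_b[l][j]
            for i in range(SIZES[l]):
                vel_w[l][j][i] = (MOMENTUM * vel_w[l][j][i]
                                  - LR * deltas[l][j] * acts[l][i])
                weights[l][j][i] += vel_w[l][j][i]
    return (acts[3][0] - y) ** 2

# toy training pairs (assumed): normalized year -> normalized target
data = [(0.0, 0.1), (0.25, 0.3), (0.5, 0.7), (0.75, 0.9), (1.0, 0.6)]
history = [sum(train_step(x, y) for x, y in data) / len(data)
           for _ in range(2000)]
```

The per-epoch mean squared error in `history` plays the role of the "performance" value that MATLAB reports after training.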
The computation gave us headaches, so we consulted many
pieces of reference literature beforehand to help us; the
process is described in detail in the steps above.
References
Saharon Rosset, Claudia Perlich and Yan Liu, "KDD Cup 2007 Task 2"
George S. Davidson, Brian N. Wylie and Kevin W. Boyack, "Cluster Stability and the Use of Noise in Interpretation of Clustering"
Eamonn Keogh and Christian Shelton, "Workshop and Challenge on Time Series Classification"
Yan Liu and Zhenzhen Kou, "Predicting Who Rated What in Large-Scale Datasets"
Miklos Kurucz, Istvan Nagy, Andras A. Benczur, Adrienn Szabo, Tamas Kiss and Balazs Torma, "Who Rated What: a combination of SVD, correlation and frequent sequence mining"
James Malaugh, Sachin Gangaputra and Nikhil Rastogi (Inductis), "KDD Cup 2007: How often will that movie be rated?"