Neural Network homework-2

cartcletchAI and Robotics

Oct 19, 2013



Group: Group 1

Members:

Tang Chia Ping M9615010

HSIEH HSIN JU M9605103


Submission date: December 28, 96

Title:

Using a dataset derived from KDD CUP 2007 (or KDD CUP 2008),
find the best learning algorithm and complete the final
analysis for TASK1 or TASK2.

1. Introduction


The first task in KDD Cup 2007 is to predict
which users rated which movies in 2006, given
the Netflix Prize training data set that contains
more than 100 million ratings from over 480
thousand users on nearly 18 thousand movie
titles collected between 1998 and 2005. In our
work, we cast the task as a link prediction
problem and address it via a simple
classification approach.


1-1 The Movies Description



The chart plots the number of movies by year. It shows that
the number of films increases from year to year and reaches its
highest point in 2004, then drops suddenly in 2005 to roughly
five hundred films. From this we can see that the year is
relevant and is one of the factors that affect the ratings.



1-2 Training Dataset File Description


MovieID1,CustomerID11,Date11,Date11,YearOfRelease11
MovieID1,CustomerID12,Date12,Date12,YearOfRelease12
...
MovieID2,CustomerID21,Date21,Date21,YearOfRelease21
MovieID2,CustomerID22,Date22,Date22,YearOfRelease22
...


MovieIDs range from 1 to 17770 sequentially.




CustomerIDs range from 1 to 2649429, with gaps.


There are 480189 users.




Dates have the format YYYY.





Year Of Release can range from 1890 to 2005 and may
correspond to the release of the corresponding DVD, not
necessarily its theatrical release.
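The record layout above can be checked with a small parser. This is only a sketch in Python, not the MATLAB code used for the homework: it assumes each line holds exactly the five comma-separated fields shown in the listing (with the Date field appearing twice, as listed), and the names Record and parse_record and the sample line in the usage note are hypothetical.

```python
from typing import NamedTuple


class Record(NamedTuple):
    movie_id: int
    customer_id: int
    date: str            # kept as text, in the YYYY form described above
    date_repeat: str     # the listing shows the Date field twice
    year_of_release: int


def parse_record(line: str) -> Record:
    """Parse one comma-separated line and check the documented ranges."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    movie_id = int(parts[0])
    customer_id = int(parts[1])
    year = int(parts[4])
    if not 1 <= movie_id <= 17770:
        raise ValueError(f"MovieID out of range: {movie_id}")
    if not 1 <= customer_id <= 2649429:
        raise ValueError(f"CustomerID out of range: {customer_id}")
    if not 1890 <= year <= 2005:
        raise ValueError(f"Year Of Release out of range: {year}")
    return Record(movie_id, customer_id, parts[2], parts[3], year)
```

For example, parse_record("1,1488844,2005,2005,2003") returns a Record whose movie_id is 1 and whose year_of_release is 2003; the values in that line are made up for illustration.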


1-3 The selected features are as follows:

User ID: a unique identifier for a user.

Movie Name: title of the movie.

User Movie Rating: a number between 1 and 5 (1 is lowest).

Average Rating by User: average rating on all movies rated by the user.

Average Popular Movie Rating by User: average rating on all popular movies rated by the user.

User Ratings: number of ratings by the user.
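The per-user features in this list could be computed roughly as follows. This is a sketch in Python under stated assumptions, not the procedure used for the homework: the report does not define "popular", so here a movie counts as popular when it has at least popular_min_count ratings, and both that threshold and the function name user_features are hypothetical.

```python
from collections import defaultdict


def user_features(ratings, popular_min_count=2):
    """Compute per-user features from (user_id, movie_name, rating) triples.

    Assumption: a movie is "popular" if it has at least
    popular_min_count ratings in the input.
    """
    ratings = list(ratings)

    # Count how many ratings each movie received.
    movie_counts = defaultdict(int)
    for _, movie, _ in ratings:
        movie_counts[movie] += 1
    popular = {m for m, c in movie_counts.items() if c >= popular_min_count}

    # Collect each user's ratings, overall and on popular movies only.
    per_user = defaultdict(list)
    per_user_popular = defaultdict(list)
    for user, movie, rating in ratings:
        per_user[user].append(rating)
        if movie in popular:
            per_user_popular[user].append(rating)

    features = {}
    for user, rs in per_user.items():
        pop = per_user_popular.get(user, [])
        features[user] = {
            "num_ratings": len(rs),
            "avg_rating": sum(rs) / len(rs),
            # None when the user rated no popular movie.
            "avg_popular_rating": sum(pop) / len(pop) if pop else None,
        }
    return features
```

The three values per user correspond to User Ratings, Average Rating by User and Average Popular Movie Rating by User in the list above.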


2. Analysis of the results







2-1 Random Sampling


Fig2. Random distribution
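The report does not say how the random sample was drawn. One plausible way, sketched here in Python, is to sample records without replacement using a fixed seed so the draw can be repeated; the function name random_sample, the seed and the sample size are all assumptions.

```python
import random


def random_sample(records, sample_size, seed=0):
    """Draw sample_size records uniformly at random, without replacement."""
    records = list(records)
    if sample_size > len(records):
        raise ValueError("sample_size exceeds the number of records")
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(records, sample_size)
```

Calling random_sample(range(100), 10, seed=1) twice gives the same ten distinct values both times.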

2-2 Training Parameters of the Network


Fig3. Learning rate = 0.05


2-3 Network:

Fig4. Network: (1) there are four neurons in the first layer;
(2) there are three neurons in the second layer.
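The layer sizes above fix the shapes of the weight and bias arrays. The following Python sketch shows a forward pass through such a network. It makes several assumptions that the report does not state: four inputs, a tanh transfer function in the first layer, a linear second layer, and small random initial values. All function names are hypothetical.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Return (weights, biases); weights has n_out rows of n_in values."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return w, b


def layer_output(w, b, x, transfer):
    """Apply one layer: transfer(w . x + b) for each neuron."""
    return [
        transfer(sum(wi * xi for wi, xi in zip(row, x)) + bi)
        for row, bi in zip(w, b)
    ]


def forward(x, w1, b1, w2, b2):
    """Forward pass: tanh first layer, linear second layer (assumed)."""
    hidden = layer_output(w1, b1, x, math.tanh)
    return layer_output(w2, b2, hidden, lambda v: v)


rng = random.Random(0)
N_INPUTS = 4                           # assumption: four input features
w1, b1 = init_layer(4, N_INPUTS, rng)  # first layer: four neurons
w2, b2 = init_layer(3, 4, rng)         # second layer: three neurons
y = forward([0.1, 0.2, 0.3, 0.4], w1, b1, w2, b2)
```

Here w1 is a 4 x 4 array with a bias vector of length 4, w2 is a 3 x 4 array with a bias vector of length 3, and y holds three output values.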

2-4 Weight to layer


Fig5. Weight to layer 1


Fig6. Weight to layer 2


2-5 Bias to layer


Fig7. Bias to layer 1


Fig8. Bias to layer 2


2-6 Training with TRAINGDM


Fig9. The Performance is 0.397427
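TRAINGDM is MATLAB's gradient descent with momentum training function. The idea can be illustrated with a Python sketch on a much smaller problem: fitting a straight line by minimising mean squared error. It assumes one common form of the momentum update, step = mc * previous_step - lr * (1 - mc) * gradient. The learning rate 0.05 comes from the report, while the momentum constant 0.9, the epoch count and the toy data are assumptions. This is not the network or the data used in the homework.

```python
def mse(w, b, xs, ts):
    """Mean squared error of the line w * x + b against targets ts."""
    return sum((w * x + b - t) ** 2 for x, t in zip(xs, ts)) / len(xs)


def train_gdm(xs, ts, lr=0.05, mc=0.9, epochs=500):
    """Fit w and b by gradient descent with momentum on the MSE."""
    w, b = 0.0, 0.0
    dw_prev, db_prev = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        errors = [w * x + b - t for x, t in zip(xs, ts)]
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Momentum update: blend the previous step with the new gradient step.
        dw = mc * dw_prev - lr * (1.0 - mc) * grad_w
        db = mc * db_prev - lr * (1.0 - mc) * grad_b
        w, b = w + dw, b + db
        dw_prev, db_prev = dw, db
        history.append(mse(w, b, xs, ts))
    return w, b, history


xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]  # toy targets generated by t = 2x + 1
w, b, history = train_gdm(xs, ts)
```

On this toy data the fitted w and b approach 2 and 1, and the last entry of history plays the role of the performance value reported in Fig9, assuming that value is a mean squared error.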

3. Discussion

The most important property of a neural network is that
its parameters can be adjusted. The network is used in the
hope that it will exhibit some expected or interesting
behaviour.


Neural network operation is divided into two parts:
training, which determines the weights and biases of the
network, and simulation, which uses the trained network to
predict the output value or to verify the accuracy of
the network. The simplest and most widely used form is
Supervised Learning.

KDD Cup 2007 has two main tasks: Task 1 - Who Rated What
and Task 2 - How Many Ratings. Both tasks were a struggle
for us because the amount of information is so large, with
up to 17,700 items to deal with. We had no choice but to
split it into several parts to run in MATLAB, and we added
many factors that we believe affect the result.
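Splitting the data into several parts can be sketched in Python as below. The report does not say how the split was made, so this simply cuts a list into consecutive pieces of a fixed size; the function name split_into_chunks and the chunk size of 5000 in the usage note are hypothetical.

```python
def split_into_chunks(items, chunk_size):
    """Split items into consecutive chunks of at most chunk_size elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    items = list(items)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
```

For example, split_into_chunks(range(1, 17771), 5000) divides the 17770 MovieIDs into four parts, the last one holding the remaining 2770 IDs, so each part can be processed separately.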


We used MATLAB to process these data, running our programs
on MovieIDs, CustomerIDs, Dates and Year Of Release. We set
the learning rate to 0.05, with four neurons in the first
layer and three neurons in the second layer, and obtained
a performance of 0.397427.


The implementation gave us a lot of trouble, so we
consulted many references beforehand to help us. The
procedure is detailed in the steps described.


4. References

Saharon Rosset, Claudia Perlich and Yan Liu, "KDD Cup 2007 Task 2
Winner's Report"



George S. Davidson, Brian N. Wylie, Kevin W. Boyack, "Cluster Stability
and the Use of Noise in Interpretation of Clustering"



Eamonn Keogh and Christian Shelton, "Workshop and Challenge on
Time Series Classification"



Yan Liu and Zhenzhen Kou, "Predicting Who Rated What in Large-scale
Datasets"



Miklos Kurucz, Andras A. Benczur, Tamas Kiss, Istvan Nagy, Adrienn Szabo
and Balazs Torma, "Who Rated What: a combination of SVD, correlation
and frequent sequence mining"



James Malaugh, Sachin Gangaputra and Nikhil Rastogi (Inductis),
"KDD Cup 2007 - How often will that movie be rated?"