
Latent Factor Models for Collaborative Filtering

Nate Black (ntblack@calpoly.edu)
Joey Civin (jcivin@calpoly.edu)
Matthew Robertson (mlrobert@calpoly.edu)

Abstract

The recently concluded Netflix Prize competition brought a tremendous amount of attention [1] to the problem of building recommender systems that can return maximally accurate results after being trained on sparse datasets of unprecedented size [2]. While some new algorithms were at the forefront of the solutions that came out on top, most of the accuracy improvements made throughout the competition were brought about by combining finely tuned implementations of existing models that identified different patterns in the dataset. Our objective is to improve the accuracy of a blend of two existing methods for collaborative filtering (CF) that discover and exploit latent factors: the statistical approach of singular value decomposition (SVD) and the artificial intelligence/neural network approach of restricted Boltzmann machines (RBMs).

1 Introduction

In this paper, we will discuss the basics of different methods used for data extraction and specifically the three algorithms implemented for this project: Singular Value Decomposition (SVD), Restricted Boltzmann Machines (RBM), and k-Nearest Neighbors (kNN).


1.1 Motivation

In 2006, Netflix released a data set (outlined in later sections). This data set became one of the largest data sets publicly released for the purposes of data mining.


This project set out by Netflix was ultimately run as a competition known as the Netflix Prize. A $1 million reward was offered for improving upon the existing Netflix system, "Cinematch", by 10% as measured by Root Mean Square Error (RMSE). Even a 10% improvement on the Cinematch system may dramatically improve the quality of the "top" movies recommended to a user, which benefits both users and Netflix's bottom line in the form of satisfied subscribers.

The prize was not the goal for our project, since the competition has already closed. We instead wanted to look at several different approaches used in many solutions on the leaderboard of the Netflix Prize. Each of the algorithms chosen has a very different approach to the data. The ultimate goal was to blend SVD, RBM, and kNN to improve upon the individual results of each algorithm.


1.2 Final Implementation

The final implementation makes use of RBM and kNN. SVD did not make it into the final evaluation system due to limitations in the Java libraries that we picked. The SVD implementation is described in more detail in section 4.3.3, and the limitations found when running the code are described in section 6.2.


2 Background

Since the Netflix Prize was a competition, Netflix kept data on the users that registered. According to Netflix [1], the user base spanned 186 countries and 51,051 contestants. Each contestant or team used different algorithms for their submissions, and the purpose of this project was to see the benefit of blending several techniques together.

2.1 k-Nearest Neighbors

k-Nearest Neighbors is a very simple model. kNN attempts to form clusters from the data set by finding a distance from each object to the next. The clusters can then be used for comparing new inputs to see where the new input values should fall.

kNN differs from the other two algorithms mainly in that kNN is more of a pattern recognition algorithm. During computation, a distance is taken from each point to every other point in the system. Every additional point added to the system requires another n-1 distance calculations, so the total running time grows quadratically with the number of points. In this sense, kNN is a slow-running algorithm, but it extracts much different information about the data than the next two described algorithms.

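The exact distance metric is a tuning choice; one common option for collaborative filtering, shown here only as an illustrative assumption rather than the metric our implementation necessarily uses, is the root mean squared difference between two movies' ratings over the set $U_{ij}$ of users who rated both movies $i$ and $j$:

$$ d(i, j) = \sqrt{ \frac{1}{|U_{ij}|} \sum_{u \in U_{ij}} \left( r_{ui} - r_{uj} \right)^2 } $$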

2.2 Restricted Boltzmann Machines

Where SVD takes a statistical or mathematical approach to the decomposition of the original data set into feature vectors, restricted Boltzmann machines attack the same problem from the perspective of artificial intelligence. RBMs, so named because the original formulation of the Boltzmann machine has unrestricted connectivity (generally making convergence intractable), improve on their predecessor by representing the neural network as a two-layer bipartite graph. Each of the units comprising the visible layer of the RBM represents a single input to the machine, or a single observation from the training set; the units in the hidden layer represent the features we want to discover.

In our particular case, learning of the model is performed by instantiating a unique hidden layer for every user while using global weights between the visible and hidden units. We use softmax units on the visible layer to represent the five possible values each rating can take. Weights and unit biases are learned efficiently over many epochs with the assistance of a technique known as contrastive divergence, first detailed by researchers at the University of Toronto in 2006 and outlined later in this paper.

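Concretely, [14] gives the conditional distributions for this bipartite model: with hidden states $h$, the softmax unit for movie $i$ takes rating value $k$ with probability

$$ p(v_i^k = 1 \mid h) = \frac{\exp\left(b_i^k + \sum_j h_j W_{ij}^k\right)}{\sum_{l=1}^{5} \exp\left(b_i^l + \sum_j h_j W_{ij}^l\right)} $$

and each hidden feature activates with probability $p(h_j = 1 \mid V) = \sigma\left(b_j + \sum_i \sum_{k=1}^{5} v_i^k W_{ij}^k\right)$, where $\sigma$ is the logistic function.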
2.3 Singular Value Decomposition

Singular Value Decomposition is a mathematical model in linear algebra that decomposes a matrix into three factor matrices. SVD is a much more complex approach than kNN; however, once the original matrix has been decomposed, operations on the matrix are rather quick.

SVD is the machinery behind Latent Semantic Analysis (LSA). LSA is a method of analyzing a group of terms and documents to find relationships between the two [7].


SVD takes as input a matrix $A$ of size $m \times n$. This matrix is decomposed into the product of three matrices, $A = U \Sigma V^T$, whose product reconstructs the original matrix.

$U$ and $V$ are both orthogonal matrices and $\Sigma$ is a diagonal matrix. These three matrices are further identified as:

- $U$, the left singular vectors, sized $m \times r$
- $\Sigma$, the diagonal matrix of singular values, sized $r \times r$
- $V^T$, the (transposed) right singular vectors, sized $r \times n$

For SVD, $r$ is considered to be the rank of the matrix, which is at most the minimum of the original matrix dimensions. When the matrix is full rank, $r = \min(m, n)$, and we can exactly reconstruct the original matrix given the three decomposed matrices.


SVD has an interesting property that allows matrices of less than full rank to approximate the original matrix. For the purposes of this project and LSA, we don't want the original data back (a perfect reconstruction of the original matrix); rather, we want the underlying relationships in the movie data.


Figure 1 below shows the breakdown of matrix reduction. Instead of all three matrices having the full rank $r$, we can reduce all three matrices to a common factor $k$. This is called rank reduction. The arrows in the figure show which direction each matrix reduces. Since $\Sigma$ is ordered from the largest value in the first cell to the smallest value in the last cell, rank reduction on the matrices will remove those components that contribute the least to the overall model.

Once the matrices are reduced, we recompose the new matrix $A_k$, which then gives us information about the underlying relations of the cells in matrix $A$, instead of the original data back.

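In symbols, keeping only the first $k$ singular values and their vectors yields

$$ A_k = U_k \Sigma_k V_k^T, \qquad U_k \in \mathbb{R}^{m \times k},\ \Sigma_k \in \mathbb{R}^{k \times k},\ V_k^T \in \mathbb{R}^{k \times n} $$

which is the closest rank-$k$ matrix to $A$ in the least-squares (Frobenius norm) sense.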

Figure 1: SVD matrix rank reduction, from $A$ to $A_k$ [6].

3 Problem

Given 100 million existing ratings, predict future ratings. This may seem like a simple problem; however, there are many different solutions, each built on different algorithms.

3.1 Challenges

The sheer amount of data and the sparseness of the data set render many strategies impractical. For example, a naive implementation of SVD or kNN will be brought down by this problem.


Accuracy of the output is officially the only metric used in consideration of the Netflix Prize; however, methods with faster running times can be tuned faster and are more likely to be of use in real-world applications. For example, some kNN methods may take days to run on a data set as large as the one given by Netflix. While manipulating data, if careful consideration is given to the structure of the data, the algorithm can be tweaked appropriately to significantly increase its running speed. In simplest terms, a faster-running algorithm allows for quicker results and potentially more data analysis in the same amount of time.


For the Netflix problem, we are working with a rather large data set, and in order for the algorithms to run efficiently, we need to store data on disk and access it efficiently without sacrificing accuracy.


4 Proposed Solution

4.1 Solution Breakdown

To solve this problem, we plan on implementing a few different algorithms and then combining them to increase the accuracy of the results. We will implement a version of Singular Value Decomposition (SVD), Restricted Boltzmann Machines (RBM), and, to a lesser extent, k-Nearest Neighbors (kNN). The proposed final solution will focus less on kNN, as it is the simplest of the three algorithms; it is nonetheless a popular algorithm to blend with latent factor models, since it steers in a different direction for pattern recognition.

4.2 Backend

In order to implement the algorithms, we must have a working backend to store the data. For the purposes of this problem, NEUStore will be used to store the data set. NEUStore is a Java package that allows for the storage of data in paginated and buffered index structures. NEUStore was implemented by the Databases Lab at Northeastern University [8].

At the time of writing this paper, the original website for NEUStore is no longer available, but an archive website has been set up for NEUStore with accompanying information about its implementation and structure, as seen at [8].




4.3 Algorithm/Technique Implementation

4.3.1 Simple metrics

Simple metrics is a way of capturing broad patterns in the data (average ratings, etc.) to allow for normalization and rudimentary prediction.

The entire purpose of simple metrics isn't to get an incredibly accurate prediction, but rather to get a broad picture of the data set through pattern recognition. This could include, but is not limited to, average rating patterns, date patterns, and userid rating swing patterns. Rating swing refers to the overall validity of the ratings from a particular userid. In this sense, we can look at a particular userid and see if that user rated all movies only a 1 or 5, which potentially throws off the data set, or if the ratings vary roughly equally from 1 to 5.

Simple metrics doesn't follow a set path, since it could include so many different kinds of analysis. As it applies to the project, simple metrics is any analysis performed on the data that does not involve one of the three algorithms described above in section 2.

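As one illustrative example, a user's rating swing can be summarized by the mean and standard deviation of that user's ratings; the class and method names below are ours, not from the project code. A deviation near 0 (all identical ratings) or near the maximum of 2 (only 1s and 5s) flags the suspect raters described above.

// Illustrative sketch of a "simple metric": per-user rating mean and
// standard deviation over an array of 1-5 star ratings.
public class SimpleMetrics {

    public static double mean(int[] ratings) {
        double sum = 0;
        for (int r : ratings) sum += r;
        return sum / ratings.length;
    }

    // Population standard deviation; near 0 or near 2 marks extreme swing.
    public static double stddev(int[] ratings) {
        double mu = mean(ratings);
        double squaredError = 0;
        for (int r : ratings) squaredError += (r - mu) * (r - mu);
        return Math.sqrt(squaredError / ratings.length);
    }

    public static void main(String[] args) {
        int[] swingUser = {5, 5, 1, 5, 1};  // rates only 1s and 5s
        System.out.printf("mean=%.2f stddev=%.2f%n",
                mean(swingUser), stddev(swingUser));
    }
}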

4.3.2 k-Nearest Neighbors

The intent of k-Nearest Neighbors is to capture direct relationships between the ratings provided for pairs of movies and pairs of users. As described in section 2.1, kNN attempts to cluster objects together.

When visualized in 3-D space, each movie has a distance to every other movie in the data set. For our implementation, kNN clusters movies together based upon the distances from one movie to every other movie in the data set. A "distance table" is created from these distances and written to disk for further analysis. Writing the structures to disk saves us immense amounts of time that would otherwise be wasted recalculating the distances between movies, which never change because the training data set is static.


The first concern for kNN was being able to manage memory without overloading the system during the kNN distance computations, as well as not spending an extreme amount of time running the algorithm. In the first code pass, the full matrix of distances was computed, but it took very long to run. The second rendition computed only a half matrix of the distances at a time; using only half of the distances could potentially be faster and also save on memory. For efficiency, a hash table was used to sort the distances. In the end, it turned out to be faster to simply use the full matrix, due to less object instantiation with the full matrix vs. the half matrix approach. The full matrix solution does require significantly more computation than the half matrix; however, only one row needs to be stored in memory at a time, and memory can be freed as appropriate by writing the distance table to disk.


One of the main goals with kNN was to get it not only to run correctly, but to run efficiently. Running the algorithm to recompute the distances every time we want to predict a rating would be painfully slow and horribly inefficient. Instead, since the distances don't change once we have fed the algorithm the training set, the resulting distance tables from the first run on the training set are stored to disk. The prediction system can then use these stored values for immensely faster predictions.

When running the prediction system, the full matrix needs to be reloaded; however, since the distance tables have been written to disk, we save time by not having to recalculate the unchanged distances. A sketch of this row-at-a-time distance-table construction appears below.

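The following sketch illustrates the row-at-a-time approach under our stated constraints. The interface, file format, and names are hypothetical stand-ins, not the project's actual code.

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch of the row-at-a-time distance table described above:
// compute one full row of movie-to-movie distances, write it to disk, and
// reuse the row buffer before starting the next one.
public class DistanceTableBuilder {

    interface DistanceFn {               // stand-in for the project's metric
        double distance(int movieA, int movieB);
    }

    public static void build(int numMovies, DistanceFn fn, String path)
            throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            double[] row = new double[numMovies];   // only one row in memory
            for (int a = 0; a < numMovies; a++) {
                for (int b = 0; b < numMovies; b++) {
                    row[b] = (a == b) ? 0.0 : fn.distance(a, b);
                }
                for (double d : row) out.writeDouble(d);  // persist, then reuse
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // toy metric purely for demonstration: absolute id difference
        build(4, (a, b) -> Math.abs(a - b), "distances.bin");
    }
}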

4.3.3 Singular Value Decomposition

In order to save time for our final implementation, we opted to use existing libraries to perform analysis based upon the SVD model.

The first rendition of SVD used the JAMA package from NIST [9]. The major problem with this implementation was that the package is not a complete environment for matrices in linear algebra. As a result, JAMA does not have any extra provisions for specialized matrices, such as a sparse matrix. Considering our data set contains very sparse data, this implementation would not suit the final evaluation.

The package chosen for use in the final evaluation system was Colt. Colt is a Java library maintained by CERN, the European Organization for Nuclear Research [10]. The Colt libraries provide easy access to data structures to organize the movieID and userID items. Each movieID/userID pair is given a cell in a DoubleMatrix2D structure. The DoubleMatrix2D structure can then be passed to the Colt SVD implementation.

So we wouldn't have to rerun the SVD code for each input movie and user pair in the final evaluations, the data structure is written to disk once the decomposition portion has finished. Colt also provides function calls that return each of the individual decomposed matrices. With this information, we could reconstruct the original matrix, but as stated in section 2.3, we don't want the original matrix back. What we can do, however, is reduce the rank of each individual matrix with additional function calls provided by Colt. With the new reduced matrices, all we have to do is recompose them (or rather, multiply them back together) to get the matrix we need for movie comparison. This new matrix can then be used for the final evaluation and RMSE calculations.

As described earlier, we wanted to save time for our final implementation of SVD. After choosing Colt as the mechanism for implementing SVD, we used a Carleton College SVD implementation as a reference and ported the code to work with our NEUStore database [12].

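The sketch below shows what this decompose/reduce/recompose pipeline can look like against Colt's public API (SingularValueDecomposition, viewPart, and Algebra); the tiny example matrix is ours for illustration, and viewPart is only one way to take the leading k components, not necessarily the exact calls used in our port.

import cern.colt.matrix.DoubleMatrix2D;
import cern.colt.matrix.impl.SparseDoubleMatrix2D;
import cern.colt.matrix.linalg.Algebra;
import cern.colt.matrix.linalg.SingularValueDecomposition;

// Sketch of rank reduction with Colt: decompose A, keep the leading k
// singular values/vectors, and recompose the approximation A_k.
public class SvdRankReduction {

    public static DoubleMatrix2D rankK(DoubleMatrix2D a, int k) {
        SingularValueDecomposition svd = new SingularValueDecomposition(a);
        DoubleMatrix2D uk = svd.getU().viewPart(0, 0, a.rows(), k);
        DoubleMatrix2D sk = svd.getS().viewPart(0, 0, k, k);
        DoubleMatrix2D vk = svd.getV().viewPart(0, 0, a.columns(), k);
        Algebra alg = Algebra.DEFAULT;
        // A_k = U_k * S_k * V_k^T
        return alg.mult(alg.mult(uk, sk), alg.transpose(vk));
    }

    public static void main(String[] args) {
        // a toy 4 x 3 ratings matrix; the real system fills the sparse
        // matrix with userID x movieID ratings from the NEUStore index
        DoubleMatrix2D ratings = new SparseDoubleMatrix2D(4, 3);
        ratings.setQuick(0, 0, 5); ratings.setQuick(1, 1, 3);
        ratings.setQuick(2, 2, 4); ratings.setQuick(3, 0, 1);
        System.out.println(rankK(ratings, 2));
    }
}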
4.3.4 Restricted Boltzmann Machines

We began this project hoping to use a public implementation of the restricted Boltzmann machine to assist with our ultimate objective of creating an optimally accurate RBM/SVD blend. We eventually wound up with such an implementation, but the road there was quite treacherous.

Our restricted Boltzmann machine was trained using the algorithm outlined by researchers at the University of Toronto in Section 2.2 of their paper on adapting RBMs for collaborative filtering [14]. Specifically, we avoid the inefficiencies of maximum likelihood learning by following the gradient of contrastive divergence, running some number of steps of the alternating Gibbs sampler which gradually increases with the number of prior epochs. The first reconstruction of a user's hidden layer from the real visible inputs for that individual user is treated as the positive learning step, at which each active pair of softmax and hidden unit biases, as well as the weight connecting those units, is increased. All further reconstructions comprise the negative learning phase, over which the biases of active pairs and their connecting weights are decreased. Reconstructions of the visible layer of softmax units are made to determine the RBM's predictions on the probe and training sets. Due to time limitations, we could not test the effects of conditionalizing the RBM or factorizing the weights matrix, extensions that respectively improve the accuracy of the machine by exploiting hidden data and decrease the running time and number of free parameters of the machine. These extensions are also outlined in more detail in the aforementioned paper from the University of Toronto.

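The per-weight update that results, as given in [14], is

$$ \Delta W_{ij}^k = \epsilon \left( \langle v_i^k h_j \rangle_{\text{data}} - \langle v_i^k h_j \rangle_T \right) $$

where $\epsilon$ is the learning rate, the first expectation is taken over the training data (the positive step), and the second over the reconstruction after $T$ steps of alternating Gibbs sampling (the negative phase).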

4.4 Training

Each of the algorithms requires a starting point for prediction. For the Netflix Prize, a training set is given by Netflix. NEUStore (see section 4.2) takes the training data set and creates an index file. Each of the algorithms takes the index file containing movieIDs and userIDs and performs preliminary work. The individual procedures are described above in section 4.3.

Section 5.1 contains a detailed description of the training set, including how the individual structures are stored in NEUStore.

4.5 Evaluation

Since the Netflix Prize has completed, the full qualifying set has been published, which benefits our project: we do not have to rely on the Netflix rating system, which simply delivers a final RMSE value. We can further validate our implementation by seeing where each algorithm falls short.

Final evaluation compares the final values to the actual ratings of the movies in the qualifying set. Calculating the RMSE of the predicted values over the qualifying set gives a measure of the accuracy of our system vs. the original accuracy of Cinematch.

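For a qualifying set $Q$ with predicted ratings $\hat{r}_{um}$ and actual ratings $r_{um}$, the metric is

$$ \text{RMSE} = \sqrt{ \frac{1}{|Q|} \sum_{(u,m) \in Q} \left( \hat{r}_{um} - r_{um} \right)^2 } $$

so lower values indicate more accurate predictions.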

Additional interesting statistics may also be output. This part is not included in the official project, but rather serves as an additional portion of evaluation if time permits.


4.6 Timetable

- Proposal creation: Week 5, Tuesday
- Create the backend (structure for storing the training set and indexes): Week 5, Friday
- Create the infrastructure for running algorithms, combining their predictions and analyzing their results: Week 6, Wednesday
- Implement existing algorithms and/or integrate existing implementations of algorithms: Week 7, Friday
- Explore improvements in line with the objectives outlined in the abstract and introduction: Week 10, Wednesday
- Create final document: Week 11, Monday

Figure 2: Gantt chart of project proposal.

The above Gantt chart was an approximation for project completion.

4.7 Code Repository

To make coding between three members viable, we needed a solution that makes use of some sort of version control. For the purposes of our project, Git proved simple to use and allowed us to host the code on GitHub.

All of the code used in the project, including each individual algorithm implementation and the NEUStore backend, can be found online at [13].


5 Data Sets

All of the full data sets are now freely available online [3]. Each data set is described below, as well as our implementation for storing it. Section 4.2 lightly details the backend for storing the text files in our NEUStore database. The training set is stored using NEUStore.

5.1 Training Set

The training set is the set of data on which the algorithms are trained. This data set consists of over 100 million rating quadruplets.

The training set is initially input as 17,770 flat text files, one for each movie. Each text file contains four different pieces of data: user_id, movie_id, rating_date, and rating.

The user_id is a unique ID for each user from the selected data set given by Netflix. The user_ids start at 1 and end at 480,000, for a total of 480,000 unique users. For our implementation, each user_id is approximately 3 bytes.

The movie_id is a unique ID for each movie from the selected data set given by Netflix. The movie_ids start at 1 and end at 17,770, for a total of 17,770 unique movies. For our implementation, each movie_id is approximately 2 bytes.

The rating_date corresponds to the date a specific rating was created. Each rating_date is stored in MM/DD/YYYY format. This value is not necessarily needed for our three algorithms, although it could be very appropriate for simple metrics. Since rating_date values were not needed for the three main algorithms, these values were ultimately removed from the NEUStore database. The values can still be accessed from the training set text files if needed.

The rating is a value given by a user_id for a specific movie_id. The rating value is an integer between 1 and 5, inclusive. For our implementation, each rating is approximately 1 byte.

In the final implementation, each record consists of a single user_id, movie_id, and rating, for a total of 6 bytes. For approximately 100 million records in the data set, this means our unoptimized storage on disk is approximately 570 megabytes.

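A minimal sketch of such a 6-byte record layout follows; it is ours for illustration, and the actual NEUStore page format differs in its paging and indexing details.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative 6-byte record: 3 bytes of user_id, 2 bytes of movie_id,
// 1 byte of rating.
public class RatingRecord {

    public static void write(DataOutputStream out, int userId, int movieId,
                             int rating) throws IOException {
        out.writeByte((userId >>> 16) & 0xFF);   // user_id fits in 3 bytes
        out.writeByte((userId >>> 8) & 0xFF);    //   (max 480,000 < 2^24)
        out.writeByte(userId & 0xFF);
        out.writeShort(movieId);                 // movie_id fits in 2 bytes
        out.writeByte(rating);                   // rating is 1..5
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), 480000, 17770, 5);
        System.out.println("record size = " + buf.size() + " bytes"); // 6
    }
}

At 6 bytes per record, 100 million records come to about 600 MB (roughly 572 MiB), consistent with the figure above.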

5.2 Qualifying Set

The qualifying set consists of triplets for prediction. Each triplet contains <user_id, movie_id, rating_date>.

The qualifying set is stored as a flat text file for our implementation.

5.3 "Judging" Set

The judging set was released after the conclusion of the competition. None of the information from the judging set will be available to the prediction system; however, a separate evaluation system will have access to the ratings and be able to calculate the RMSE for the predicted ratings. The judging set contains the actual ratings for the qualifying set: <rating, (0|1) membership in the (quiz|test) subset>.

In the final implementation, the judging set is stored as a flat text file.


6 Analysis

After implementation of the three main algorithms described in section 4.3, each of the algorithms needed to be run separately to get a baseline value for the final RMSE calculations, to later compare against the blend.




6.1 kNN Solo Run

First testing of the kNN algorithm used simple predictions to see if the code was operating as expected.

kNN is an algorithm that requires much tweaking to provide accurate results, and this certainly proved true in the version used for this project. We were able to get performance improvements by shrinking down the size of objects as much as possible, reusing objects instead of instantiating new ones, and using arrays instead of collections in some places.

The biggest improvement, but unfortunately one we were unable to capitalize on because it came so late, was having the distances pre-calculated. The most expensive and time-consuming part of kNN is calculating the distances. Once the distances are known, similarity requires only a simple calculation, since it doesn't need any comparisons.

For kNN, we need to start with a baseline estimate of what the rating will be. For all general purposes, this value will be a best guess, since the algorithm will adjust it accordingly based upon biases.

To get our baseline value, we start with the global average. Next, a "user bias" and a "movie bias", the user's and movie's average ratings respectively, are added to the global average. Once a baseline value is obtained, collaborative data is used to adjust the guessed rating up or down accordingly. The collaborative data used in this case is the nearest neighbors.

The effectiveness of kNN relies heavily on the number of neighbors chosen, or rather, the k value. For accurate results, a large k value should be chosen; for the purposes of the movie data, a value of 300 to 500 was used. This value gives us a significant number of neighbors to compare each node with, but doesn't degrade the performance of the algorithm with too many neighbor comparisons.

The next step is to find out which of the neighbors the user has rated and adjust our baseline accordingly. If the user hasn't rated any, or hasn't rated enough (below some threshold), the guess is not modified but rather stays at the baseline; otherwise, the average rating of each neighbor (modified by a voodoo parameter called the support, which is the number of times the movie was rated) is added to a summation. At the end, the average of the summation is taken, and a distance from the baseline guess to that average is obtained. The final distance value is modified by a "confidence" that the movie is similar to the movie for which we want to obtain a guess. The confidence value comes from another voodoo calculation that involves the distances. Essentially, the confidence value correlates with the distance: the lower the distance value, the higher the confidence value should be. A sketch of this prediction rule appears below.

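The following is a hedged sketch of that prediction rule. The names, the treatment of the biases as offsets from the global mean, and the fixed blending factor are our own simplifications of the "voodoo" calculations described above, not the project's exact code.

// Hedged sketch of the kNN prediction rule: baseline = global mean +
// user bias + movie bias, then shift the guess toward the
// confidence-weighted average rating of the rated neighbors.
public class KnnPredictor {

    public static double predict(double globalMean, double userBias,
                                 double movieBias, double[] neighborRatings,
                                 double[] neighborConfidences, int minRated) {
        double baseline = globalMean + userBias + movieBias;
        if (neighborRatings.length < minRated) {
            return baseline;                       // not enough evidence
        }
        double weightedSum = 0, confidenceSum = 0;
        for (int i = 0; i < neighborRatings.length; i++) {
            weightedSum += neighborConfidences[i] * neighborRatings[i];
            confidenceSum += neighborConfidences[i];
        }
        double neighborAvg = weightedSum / confidenceSum;
        // move the baseline part of the way toward the neighbor average;
        // the 0.5 blending factor is an arbitrary placeholder
        return baseline + 0.5 * (neighborAvg - baseline);
    }

    public static void main(String[] args) {
        double[] ratings = {4, 5, 3};
        double[] conf = {0.9, 0.7, 0.4};
        System.out.println(predict(3.6, 0.2, -0.1, ratings, conf, 2));
    }
}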

6.2 SVD Solo Run

Unfortunately, SVD did not run at all on the data set. In theory, the SVD code should correctly instantiate and build a SingularValueDecomposition object. The re-composition code should also be correct, but since the builder will not run, we couldn't test the code on the full training data set.

We originally chose Colt because it contains specific functions and data structures for sparse matrices, specifically SparseDoubleMatrix2D. This class is particularly useful for our data set, as it doesn't use memory for cells that were never assigned a non-zero value. Saving on memory space allows us to pull the entire matrix into memory for work.

Colt has Javadoc-style documentation available online [11]. During implementation, we neglected to notice the size constraints of Colt, assuming it would be able to handle the matrix size we need for our data sets. According to the Javadoc for SparseDoubleMatrix2D, the maximum size of the matrix is 2^31 - 1 cells, as defined by Integer.MAX_VALUE.

In the first implementation, we attempted to construct the SVD object using a double array, as in the implementation by Carleton College; however, this would create a matrix far too large to pull into memory, since we would be instantiating a dense matrix of roughly 8.5 billion doubles (480,000 users x 17,770 movies is about 8.5 x 10^9 cells). In order to exploit the nature of our sparse data set, we adapted a new implementation that uses a call to the SparseDoubleMatrix2D constructor with a set matrix size. According to the Colt Javadocs, the SparseDoubleMatrix2D(int rows, int columns) constructor does not use memory for zero-valued cells and can reclaim memory for values that are set to a non-zero value and later reverted back to zero.


In our final implementation, the SVD portion of the code does not run at all; instead, an IllegalArgumentException is thrown and the program exits. As described above, the maximum size of a SparseDoubleMatrix2D object is 2^31 - 1 cells; however, our matrix requires approximately four times this size for the full training data set.

SVD output would have been very beneficial to the blend; however, due to time constraints, SVD was ultimately removed from the final evaluation system, as we did not have enough time to re-implement the SVD algorithm on another Java package. If time had permitted, we could have focused on coding SVD from the ground up. A solution of this caliber would have allowed us to circumvent limitations imposed by other Java packages, as well as make improvements upon the base SVD algorithm to exploit the nature of our sparse data set. The main reason for ultimately removing SVD from the final evaluation system was the training set: in order to get results from SVD, we would have to cut the data set into approximately four sections and keep a reduced-rank SVD matrix for each section. Note that this is a potential solution, but since RBM and kNN operate on the full training data set, we would need to store a total of potentially five different index files on disk, and the blends including SVD would relate to only a portion of the full training set. In simpler terms, SVD would have no relation between the four sets, and as such, the results could be detrimental to the final blended values.


6.3 RBM Solo Run

training set RMSE    probe set RMSE    epoch run time (secs)
0.904805             0.962995          2255.693885
0.861035             0.915038          2209.648267
0.84708              0.897914          2179.214389
0.838889             0.887303          2172.958197
0.833903             0.88085           2200.963425
0.830776             0.876061          2217.168343
0.826705             0.870054          2225.017503
0.82387              0.865829          2183.128083
0.821614             0.862388          2161.185157
0.819763             0.859485          2155.944228
0.81842              0.857573          3110.416707
0.817113             0.85556           3114.69582
0.815921             0.853675          3119.466467
0.814866             0.852003          3125.4032
0.813904             0.850359          3128.298285
0.813163             0.849252          3595.302253
0.812471             0.848176          3597.782703

Table 1: RBM Results

Table 1 above shows the run results for our implemented RBM. As expected, the RBM fit the training set more closely than the probe set, achieving less accurate RMSEs in its attempts to reconstruct the ratings in the probe set. Appendix A shows a visual representation of Table 1.


6.4 Blended Results

Since SVD didn't run on our data sets, we ultimately decided to shift away from blending the results together, as we would be missing a statistical approach in the blend. Instead, we focused more on the implementation of the other two algorithms and on improvements to the NEUStore database.


7 Conclusion

Each of the algorithms used in this project extracts different information about the data sets provided. The analysis section details the outcome of each algorithm specifically.

The project didn't end up being exactly as proposed, considering SVD was removed due to time constraints; however, both kNN and RBM produced results. As a direct result of SVD being removed, we ended up removing blending entirely and focused more on optimization of the two working algorithms. In theory, a blend of the two working algorithms should produce results with a better RMSE than the individual algorithms alone, but we would be missing a mathematically based approach in our blend.

The three algorithms explored in this paper are very powerful for data extraction, and each has a different approach and time complexity for inferring relations in the data set. We only considered a few specific algorithms of the many used in the Netflix Prize; the grand prize was won by BellKor's Pragmatic Chaos. For further reading, see the individual team solutions, as well as yearly progress papers (as required by Netflix), in section 8.2.

In the end, the solution by the winning team implements at least one (if not multiple) versions of each of the algorithms described in this paper, so we can see the importance of these algorithms in data mining.


8 References

8.1 Cited References

[1] Netflix Prize: Home. Netflix. Web. 14 Oct. 2009. <http://www.netflixprize.com>.

[2] "Netflix Prize: Review Rules." Netflix Prize: Home. Netflix. Web. 14 Oct. 2009. <http://www.netflixprize.com/rules>.

[3] UCI Machine Learning Repository. UC Irvine. Web. 14 Oct. 2009. <http://archive.ics.uci.edu/ml/datasets/Netflix+Prize>.

[4] Hinton, Geoffrey E. "Boltzmann machine." Scholarpedia. 2007. Web. 14 Oct. 2009. <http://www.scholarpedia.org/article/Boltzmann_machine>.

[5] Cueto, M. Angelica, Jason Morton, and Bernd Sturmfels. "Geometry of the Restricted Boltzmann Machine." 30 Aug. 2009. Web. 14 Oct. 2009. <http://arxiv.org/pdf/0908.4425v1>.

[6] Liu, Bing. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications). New York: Springer, 2006. Print.

[7] Landauer, T. K., Foltz, P. W., and Laham, D. (1998). "Introduction to Latent Semantic Analysis." Discourse Processes, 25, 259-284.

[8] Zhang, Donghui. "NEUStore (version 1.4)." 29 Sept. 2008. Web. Oct.-Nov. 2009. <http://zgking.com:8080/home/donghui/research/neustore/NEUStore.pdf>.

[9] Hicklin, Joe, Cleve Moler, and Peter Webb. JAMA: A Java Matrix Package. 13 July 2005. Web. Nov. 2009. <http://math.nist.gov/javanumerics/jama/>.

[10] "Colt." 1999. Web. Nov.-Dec. 2009. <http://acs.lbl.gov/~hoschek/colt/>.

[11] "Colt 1.2.0 - API Specification." Web. Nov.-Dec. 2009. <http://acs.lbl.gov/~hoschek/colt/api/index.html>.

[12] "Algorithms :: Singular Value Decomposition." Carleton College: Computer Science. Web. Nov.-Dec. 2009. <http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/svd.html>.

[13] Project code repository (Tortuga). Autumn 2009. Web. <http://github.com/galfgarion/Tortuga>.

[14] Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann Machines for Collaborative Filtering." In ICML '07: Proceedings of the 24th International Conference on Machine Learning, pages 791-798, New York, NY, USA, 2007. ACM.




8.2 Uncited Further Reading

Koren, Y. "Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model." Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 426-34. 2008. Print.

8.2.1 Progress Prize Papers (2007)

R. Bell, Y. Koren, C. Volinsky, "The BellKor Solution to the Netflix Prize", (2007).

R. Bell, Y. Koren, C. Volinsky, "Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems", Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07), ACM Press (2007).

R. Bell, Y. Koren, "Improved Neighborhood-based Collaborative Filtering", KDD Cup and Workshop (KDD'07), ACM Press (2007).

R. Bell, Y. Koren, "Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights", IEEE Conference on Data Mining (ICDM'07), IEEE (2007).

8.2.2 Progress Prize Papers (2008)

R. Bell, Y. Koren, C. Volinsky, "The BellKor 2008 Solution to the Netflix Prize", (2008).

A. Töscher, M. Jahrer, "The BigChaos Solution to the Netflix Prize 2008", (2008).

A. Töscher, M. Jahrer, R. Legenstein, "Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems", SIGKDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (KDD'08), ACM Press (2008).

8.2.3 Grand Prize Papers (2009)

Y. Koren, "The BellKor Solution to the Netflix Grand Prize", (2009).

A. Töscher, M. Jahrer, R. Bell, "The BigChaos Solution to the Netflix Grand Prize", (2009).

M. Piotte, M. Chabbert, "The Pragmatic Theory Solution to the Netflix Grand Prize", (2009).





9 Appendix A

Graph 1: Training set RBM results
Graph 2: Probe set RBM results
Graph 3: RBM run times