An Innovative Personalized Recommendation System Integrated Collaborative Filtering and Decision Trees





Tien-Chin Wang¹, Hsien-Da Lee²

¹ Department of Information Management, I-Shou University, Taiwan
tcwang@isu.edu.tw

² Department of Information Engineering, I-Shou University, Taiwan
leesd@center.fotech.edu.tw



Abstract


With the explosive growth of information on the Internet, customers may spend more time finding suitable products when they purchase on the web. To boost sales and enhance customer loyalty, developing an intelligent recommendation system is a good way to help customers find suitable products in today's information-overloaded Internet environment. Traditional collaborative filtering algorithms have been widely accepted as the most popular approach for personalized recommendation systems (PRS). However, collaborative filtering approaches face problems such as sparsity and cold-start that limit the applicability of PRS.

This paper proposes a PRS that integrates collaborative filtering and Shannon's entropy theory. The entropy-based collaborative filtering algorithm is intended to improve accuracy and performance. A decision tree induction is developed to identify customer patterns. Three measures (precision, recall and F1-measure) are used to evaluate the performance of the system. The experimental results show the applicability of the proposed system.


1. Introduction


With the boom of the Internet, e-commerce attracts millions of people to buy and sell products online. To enlarge their market shares and create more business opportunities, enterprises have been developing new business portals and providing large amounts of product information; as a result, customers have more opportunities to choose products that meet their needs. However, the explosive growth of information may cause customers to spend more time and effort finding products. On the other hand, companies want to collect customer information in order to provide suitable products that meet customer needs.

To address information overload and identify customer purchase behavior, developing a web-based personalized recommendation system is one of the most feasible approaches to alleviating the information overload burden and providing users with personalized information that meets their different needs. According to [1], “personalization is defined as any action that adapts the information or services provided by a web site to the knowledge gained from the users’ navigational behavior and individual interests, in combination with the content and the structure of the site”.

With personalized recommendation systems, consumers can effectively obtain the information they are interested in and save the effort of reading through enormous numbers of web pages to compare similar product features. In addition, enterprises can classify customers' previous purchasing behaviors and then develop appropriate marketing strategies to enhance customer loyalty.

Most recommendation systems can be divided into two major categories: the content-based approach and the collaborative filtering approach. The content-based approach recommends products or services that are similar to what the user has been interested in in the past [2]. The collaborative filtering approach recommends products or services to customers based on other customers with similar interests [3]. Building on these two approaches, researchers have studied many AI techniques to generate accurate recommendations and improve the efficiency and effectiveness of recommendation systems, such as Bayesian networks [4], clustering technology [5],[6], singular value decomposition [8],[9] and association rule mining [10],[11].


Collaborative filtering, one of the earliest and most successful recommendation technologies, observes the behavior of individuals in a like-minded peer group and makes recommendations to individual users based on the behavior pattern of that group. Examples of systems using this approach include Ringo [3] and GroupLens [12]. Although collaborative filtering approaches have been successfully applied to many domains, they are constrained by three major limitations that remain to be solved: sparsity, cold-start and scalability. These limitations restrict the feasibility and spread of practical personalized recommendation systems (PRS).

This paper proposes an entropy-based collaborative filtering algorithm for implementing a personalized recommendation system in order to improve its performance. The remainder of the paper is organized as follows. Section 2 presents the research background, including an overview of personalized recommendation systems and entropy. Section 3 explains the implementation issues of the proposed method. Section 4 reports the experimental process and the results of the study. Finally, the conclusion is given in Section 5.


2. Background


Each day, as more and more web pages appear in cyberspace, people are overwhelmed by information overload when searching for information or purchasing products on the Internet. In response to this challenge, many researchers devote their efforts to developing effective personalized recommendation systems. Personalization, a special form of differentiation, means that a website can respond to a customer's unique and particular needs. Mobasher et al. defined Web personalization as responding to an individual user's interests and preferences in Internet usage [13]. A personalized recommendation system can provide personal service to customers based on their past purchasing patterns and on inference from other users with similar preferences. The aim of personalization is to offer customers what they want without asking them explicitly and to capture the social component of interpersonal interaction [14].


2.1. Personalized recommendation systems


Personalized recommendation systems can be categorized into two approaches: the content-based approach and the collaborative filtering approach. In the content-based approach, products are described by a set of attributes or by the content of the items. The system analyzes the content of items that a person has selected in the past and recommends items with similar content [15]. Content-based filtering adopts artificial intelligence concepts such as information retrieval and information filtering. The items recommended by content-based filtering are often textual, such as news web pages and documents, and these items are usually described by keywords and their corresponding weights. Clustering techniques are commonly used to analyze the feature content of products and recommend suitable content based on feature characteristics or the customer's preferences. The challenges of this approach include limited content analysis caused by limited keywords, overspecialization, and the new-user problem.
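A minimal sketch of the content-based idea described above, assuming items are keyword-weight dictionaries, the user profile is the sum of the keyword weights of previously selected items, and similarity is cosine similarity. The paper does not specify these choices, and the item names and weights are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(items, selected):
    """User profile: keyword weights summed over the items selected in the past."""
    profile = {}
    for item in selected:
        for keyword, weight in items[item].items():
            profile[keyword] = profile.get(keyword, 0.0) + weight
    return profile


def recommend_by_content(items, selected, top_n=2):
    """Rank unseen items by their similarity to the user profile."""
    profile = build_profile(items, selected)
    candidates = [i for i in items if i not in selected]
    candidates.sort(key=lambda i: cosine(profile, items[i]), reverse=True)
    return candidates[:top_n]


# Hypothetical news documents described by keywords and weights
items = {
    "news1": {"election": 0.9, "vote": 0.4},
    "news2": {"election": 0.7, "poll": 0.6},
    "news3": {"football": 0.8, "goal": 0.5},
    "news4": {"vote": 0.5, "poll": 0.5},
}
print(recommend_by_content(items, selected={"news1"}))  # ['news2', 'news4']
```

news3 shares no keyword with the profile, so it scores 0 and is ranked last. This is a small illustration of the overspecialization problem mentioned above.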

On the other hand, the collaborative filtering (CF) approach builds a rating dataset from customers and presents recommendations using a collaborative algorithm. The CF approach identifies other users who have shown preferences similar to those of a given user and recommends what they liked [16]. It is based on the idea that a target user is likely to rate products similarly to his or her nearest neighbors.

Collaborative filtering approaches usually consist of three steps. First, a user-item rating matrix is constructed to represent users' ratings of items. Second, nearest-neighbor clustering techniques are applied by computing the similarities for all pairs of users. Finally, recommendations are generated by aggregating the ratings of the target item given by the target user's neighbors. These steps can be described as follows:

1. Ratings matrix construction: The users' judgments or preferences are explicitly represented by an m×n user-item ratings matrix R, where m is the number of users and n is the number of items. $R = (r_{ij})$, where $r_{ij}$ is the rating that user i gives to item j. In e-commerce recommendation systems, the entry represents a user's tendency toward the rated item: the higher the value, the more positive the user's preference. However, the user need not have purchased the rated item before.

2. Neighborhood-similarity clustering: Clustering is a form of unsupervised learning, i.e., the available data are not labeled and the output is a set of clusters containing similar points. Based on the clustering concept, the K-nearest-neighbors (KNN) technique is applied: the similarities between the target user and all other users in the system are computed in order to find the set of the K most similar users (the nearest neighbors), which are then sorted by similarity. To a great extent, the efficiency and effectiveness of collaborative filtering recommendation algorithms depend on the efficiency and effectiveness of the K-nearest-neighbors algorithm.

3. Recommendation generation: Based on the nearest-neighbor set, the predicted ratings of the items not yet rated by the target user can be computed, and recommendations are generated by triggering rules whose conditions match the threshold.
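The three steps can be sketched in Python as follows. This is a minimal user-based sketch, and several choices are assumptions rather than details from the paper: unrated entries are stored as 0, similarity is cosine similarity over full rating vectors, the prediction is a similarity-weighted average of the neighbors' ratings, and the rule "predicted rating ≥ threshold" with a hypothetical threshold of 3.5 stands in for the paper's threshold rule. The matrix values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two rating vectors (0 = unrated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def k_nearest(R, target, k):
    """Step 2: the k users most similar to `target`, sorted by similarity."""
    sims = [(cosine(R[target], R[u]), u) for u in range(len(R)) if u != target]
    sims.sort(reverse=True)
    return sims[:k]


def predict(R, item, neighbors):
    """Step 3: similarity-weighted average of the neighbors' ratings of `item`."""
    raters = [(s, u) for s, u in neighbors if R[u][item]]
    den = sum(abs(s) for s, _ in raters)
    return sum(s * R[u][item] for s, u in raters) / den if den else None


def recommend(R, target, k=2, threshold=3.5):
    """Recommend unrated items whose predicted rating reaches the threshold."""
    neighbors = k_nearest(R, target, k)
    recs = []
    for item, rating in enumerate(R[target]):
        if rating:  # already rated by the target user
            continue
        p = predict(R, item, neighbors)
        if p is not None and p >= threshold:
            recs.append((item, round(p, 2)))
    return recs


# Step 1: hypothetical m x n ratings matrix (4 users x 5 items)
R = [
    [5, 4, 0, 0, 1],  # target user 0
    [5, 5, 4, 0, 1],
    [4, 4, 5, 3, 0],
    [1, 0, 1, 5, 5],
]
print(recommend(R, target=0))  # [(2, 4.44)]
```

Users 1 and 2 are the two nearest neighbors of user 0. Item 3 is predicted at 3.0 because only neighbor 2 rated it, so it falls below the threshold and is not recommended.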

Although collaborative filtering technology has been successfully used in many applications, its major limitations, namely data sparsity, cold-start and scalability, have restricted its widespread use in practical e-commerce systems.

The first limitation is the sparsity problem [17]. Conventional collaborative filtering recommendation systems require users to explicitly input preference ratings for many products. In a large e-commerce system, the number of items rated by a user is usually less than one percent of the total items. The percentage of items rated by two or more users is even smaller, which results in a very sparse user-item ratings matrix. With a large-scale matrix and sparse ratings, the computational cost of the similarities between users is high, while the results may not be acceptable. As a result, the accuracy of predicted ratings degrades significantly when the available ratings are sparse.
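Sparsity can be quantified as the fraction of filled cells in the ratings matrix, together with the number of items two users have both rated. The sketch below assumes, as in the sketch of the three CF steps, that unrated entries are stored as 0. The small matrix is hypothetical, and the last line uses the MovieLens sizes reported in Section 4.1.

```python
def density(R):
    """Fraction of cells in the m x n matrix R that hold a rating (0 = unrated)."""
    m, n = len(R), len(R[0])
    rated = sum(1 for row in R for r in row if r)
    return rated / (m * n)


def co_rated(R, a, b):
    """Number of items rated by both user a and user b."""
    return sum(1 for x, y in zip(R[a], R[b]) if x and y)


# Hypothetical 3 x 5 ratings matrix
R = [
    [5, 0, 0, 0, 1],
    [0, 0, 4, 0, 0],
    [0, 3, 0, 0, 2],
]
print(density(R))         # 0.3333333333333333
print(co_rated(R, 0, 2))  # 1
print(co_rated(R, 0, 1))  # 0

# MovieLens sizes from Section 4.1: 100,000 ratings, 943 users, 1681 movies
print(round(100_000 / (943 * 1681), 4))  # 0.0631
```

Users 0 and 1 share no co-rated item, so a similarity between them has nothing to be based on. Even MovieLens, a comparatively dense benchmark, leaves about 94% of its cells empty.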
Scalability is also a common concern for CF [18]. As the numbers of users and items grow, the computational complexity increases rapidly: the user-based collaborative filtering algorithm requires computation that grows with both the number of users and the number of items. An e-commerce site usually has millions of users and items, so a typical web-based recommender system running the CF algorithm will suffer serious scalability problems.
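As a rough worked estimate, assume a naive user-based CF that compares every pair of users over all n items (a brute-force strategy, not necessarily what a particular system does). With the MovieLens sizes from Section 4.1 (m = 943 users, n = 1681 items):

$$
\frac{m(m-1)}{2} = \frac{943 \times 942}{2} = 444{,}153 \text{ user pairs}, \qquad 444{,}153 \times 1681 \approx 7.47 \times 10^{8} \text{ item comparisons}
$$

The cost grows as $O(m^2 n)$. With $m = 10^6$ users, the number of user pairs alone is about $5 \times 10^{11}$, which is why scalability becomes critical for large e-commerce sites.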



2.2. Shannon's entropy


The concept of “entropy” originally comes from thermodynamics, where entropy is defined in terms of heat divided by absolute temperature. In information theory, Shannon's entropy measures the expected information content, or uncertainty, of a random variable. In decision tree induction, this entropy measure is used to calculate the information gain, which reflects the quality of an attribute as the branching attribute [7].

An information-based heuristic selects the attribute providing the highest information gain. A data set with some discrete-valued condition attributes and one discrete-valued decision attribute can be represented as a knowledge representation system $J = (U, C \cup D)$, where $U = \{u_1, u_2, \ldots, u_s\}$ is the set of data samples, $C = \{c_1, c_2, \ldots, c_n\}$ is the set of condition attributes and $D = \{d\}$ is the one-element set containing the decision (class label) attribute. Suppose this class label attribute has m distinct values defining m distinct classes $d_i$ ($i = 1, 2, \ldots, m$), and let $s_i$ be the number of samples of U in class $d_i$. The expected information, or entropy, needed to classify a given sample is given by

$$ I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (1) $$


where $p_i$ is the probability that an arbitrary sample belongs to class $d_i$, estimated by $p_i = s_i / s$ (s is the total number of samples). Let attribute $c_i$ have v distinct values $\{A_1, A_2, \ldots, A_v\}$. Attribute $c_i$ can then be used to partition U into v subsets $\{S_1, S_2, \ldots, S_v\}$, where $S_j$ ($j = 1, 2, \ldots, v$) contains those samples in U that have value $A_j$ of $c_i$. Let $s_{ij}$ be the number of samples of class $d_i$ in subset $S_j$. The entropy of attribute $c_i$ is given by







$$ E(c_i) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} \, I(s_{1j}, s_{2j}, \ldots, s_{mj}) \qquad (2) $$

The term $\frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s}$ acts as the weight of the j-th subset: it is the number of samples in the subset divided by the total number of samples. The smaller the entropy value, the greater the purity of the subset partitions. Since the information gain is the reduction in entropy achieved by the split, the attribute with the smallest expected entropy leads to the largest information gain and is selected as the branching attribute.

For a given subset $S_j$, the expected information (entropy) is expressed as

$$ I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij} \qquad (3) $$


where $p_{ij} = s_{ij} / |S_j|$ ($|S_j|$ is the number of samples in subset $S_j$) is the probability that a sample in $S_j$ belongs to class $d_i$. The information gain of attribute $c_i$ is therefore given by


)
(
)
,
,
,
(
)
(
2
1
i
mj
j
j
i
c
E
s
s
s
I
c
Gain




(4)


We compute the information gain of each condition attribute; the attribute with the highest information gain is the most informative and most discriminating attribute of the given set.
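Equations (1) to (4) translate directly into code. The following sketch computes the information gain of each condition attribute from class counts and picks the attribute with the highest gain. The four-sample data set and its attribute names are hypothetical.

```python
import math
from collections import Counter


def info(counts):
    """Eq. (1)/(3): expected information I(s_1, ..., s_m) from class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def expected_entropy(samples, attr, label):
    """Eq. (2): E(c_i), the weighted entropy of the subsets S_j induced by attr."""
    s = len(samples)
    e = 0.0
    for value in {x[attr] for x in samples}:
        subset = [x[label] for x in samples if x[attr] == value]  # subset S_j
        e += len(subset) / s * info(list(Counter(subset).values()))
    return e


def gain(samples, attr, label):
    """Eq. (4): Gain(c_i) = I(s_1, ..., s_m) - E(c_i)."""
    counts = list(Counter(x[label] for x in samples).values())
    return info(counts) - expected_entropy(samples, attr, label)


def best_attribute(samples, attrs, label):
    """Select the condition attribute with the highest information gain."""
    return max(attrs, key=lambda a: gain(samples, a, label))


samples = [
    {"age": "kid", "gender": "M", "rating": "high"},
    {"age": "kid", "gender": "F", "rating": "high"},
    {"age": "adult", "gender": "M", "rating": "low"},
    {"age": "adult", "gender": "F", "rating": "low"},
]
for a in ("age", "gender"):
    print(a, gain(samples, a, "rating"))  # prints "age 1.0", then "gender 0.0"
print(best_attribute(samples, ["age", "gender"], "rating"))  # age
```

Because “age” splits the samples into pure subsets, E(age) = 0 and its gain equals the full entropy of 1 bit. “gender” leaves each subset evenly mixed and gains nothing.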


2.2.1. Item entropy


Based on Shannon's entropy, we extend the definition to item entropy:

$$ I(x) = -\sum_{i=1}^{n} p_{i,x} \log_2 p_{i,x} \qquad (5) $$

where n is the total number of users and $I(x)$ is the entropy measure of item x. If user i rates item x, the probability $p_{i,x}$ is calculated as


$$ p_{i,x} = \frac{\dfrac{\text{number of items rated by user } i}{\text{total number of items}}}{\sum_{k=1}^{m} \dfrac{\text{number of items rated by user } k}{\text{total number of items}}} \qquad (6) $$


According to the definition of entropy, the more users rate an item, the higher the item's entropy. From the user's perspective, the more items a user rates, the more influence that user has on item entropy. An item with a large entropy value indicates that users are more interested in that item than in other items.
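A minimal sketch of Eqs. (5) and (6). The text does not say which users the normalizing sum in Eq. (6) runs over; this sketch assumes it runs over the users who rated item x, so that the probabilities for one item sum to 1. The ratings are hypothetical.

```python
import math


def item_entropy(ratings, item, total_items):
    """Item entropy I(x) following Eqs. (5)-(6).

    ratings: dict user -> dict item -> rating (only rated items present).
    Assumption: the normalizing sum in Eq. (6) runs over the users who rated `item`.
    """
    raters = [u for u, r in ratings.items() if item in r]
    # Each rater's activity: items rated by the user / total number of items
    activity = {u: len(ratings[u]) / total_items for u in raters}
    norm = sum(activity.values())
    h = 0.0
    for u in raters:
        p = activity[u] / norm  # Eq. (6)
        h -= p * math.log2(p)   # Eq. (5)
    return h


ratings = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 4, "m3": 2},
    "u3": {"m2": 1, "m3": 5},
    "u4": {"m1": 2, "m4": 4},
}
for movie in ("m1", "m2", "m4"):
    print(movie, round(item_entropy(ratings, movie, total_items=4), 3))
# m1 1.585, m2 1.0, m4 0.0
```

Every user here rated the same number of movies, so the item entropy reduces to log2 of the number of raters. This matches the statement that more-rated items have higher entropy. A rater who rated more items would receive a larger $p_{i,x}$ and thus more influence. The “total number of items” factor cancels in Eq. (6), so its value does not affect the result.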


3. Proposed methodology


In this section, we propose a web-based personalized intelligent recommendation system based on Shannon's entropy measure. The objective of our system is to recommend a unique set of objects that satisfies the needs of each active user. The proposed system is composed of the following major parts:

1. Data Representation Module: The data are pre-processed into a structured form.

2. Decision Trees and Similarity Calculation Module: The nearest neighbors of the target user are generated from the ratings matrix. In addition, the entropy of every item attribute is computed, and ID3 is applied to construct a decision tree that identifies user preference patterns.


3. Generation of recommendations: Recommendations are generated by triggering rules whose conditions match the thresholds in customers' inputs. For better performance, a threshold is defined that the requirements must meet. Products in the action parts of the fired rules are the potential candidates for recommendation.
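The interplay of Modules 2 and 3 can be sketched as follows: a compact ID3 that stores the class distribution at each leaf, rule extraction from root-to-leaf paths, and rule firing against a threshold. This is a minimal sketch, not the authors' implementation. The leaf-probability representation, the matching of rule conditions against a user's attributes, the threshold of 0.7 and the sample rows are all assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain(rows, attr, target):
    """Information gain of splitting `rows` on `attr`."""
    expected = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - expected


def id3(rows, attributes, target):
    """Build an ID3 tree; each leaf stores the class distribution of its rows."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return {"leaf": {k: v / len(labels) for k, v in Counter(labels).items()}}
    best = max(attributes, key=lambda a: gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    branches = {
        value: id3([r for r in rows if r[best] == value], rest, target)
        for value in sorted({r[best] for r in rows})
    }
    return {"split": best, "branches": branches}


def extract_rules(tree, conditions=()):
    """Yield (conditions, class distribution) for every root-to-leaf path."""
    if "leaf" in tree:
        yield conditions, tree["leaf"]
        return
    for value, child in tree["branches"].items():
        yield from extract_rules(child, conditions + ((tree["split"], value),))


def fire(rules, user, cls, threshold):
    """Rules whose conditions match `user` and whose P(cls) reaches the threshold."""
    return [
        (conds, dist[cls])
        for conds, dist in rules
        if all(user.get(a) == v for a, v in conds) and dist.get(cls, 0.0) >= threshold
    ]


# Hypothetical rows of a student-drama rating table
rows = [
    {"age": "kid", "gender": "M", "rating": "high"},
    {"age": "kid", "gender": "F", "rating": "high"},
    {"age": "kid", "gender": "F", "rating": "low"},
    {"age": "adult", "gender": "M", "rating": "low"},
    {"age": "adult", "gender": "F", "rating": "low"},
    {"age": "middle-aged", "gender": "M", "rating": "high"},
    {"age": "middle-aged", "gender": "F", "rating": "high"},
]
tree = id3(rows, ["age", "gender"], "rating")
rules = list(extract_rules(tree))
print(tree["split"])  # age
print(fire(rules, {"age": "middle-aged", "gender": "F"}, "high", 0.7))
print(fire(rules, {"age": "kid", "gender": "F"}, "high", 0.7))  # []
```

When a rule fires for a user, the genre of the table (drama in this example) becomes a recommendation candidate. The kid/female leaf holds only a 50% “high” probability, so no rule fires for that user at the 0.7 threshold.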



4. Experiment


4.1. Data sets


We use the well-known MovieLens dataset (available for download from http://movielens.umn.edu), collected by GroupLens Research at the University of Minnesota. MovieLens contains 100,000 ratings from 943 users for 1681 movies [19]. Each user has rated at least 20 movies, and each movie has been rated at least once. We divided the data into an 80% training set and a 20% test set. The training set is used to generate the recommendation model. Our recommendation system is then evaluated by comparing the Top-N recommendations it makes, given the test data, with the set of held-out (deleted) items.
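A random 80/20 split with a fixed seed can be sketched as follows. The seed and the synthetic triples are assumptions, and the paper does not state whether its split was drawn per rating or per user. This sketch splits per rating.

```python
import random


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Randomly split (user, item, rating) triples into training and test sets."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Hypothetical triples standing in for the MovieLens ratings
ratings = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(10)]
train, test = train_test_split(ratings)
print(len(train), len(test))  # 80 20
```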


4.2. Evaluation metrics


Many metrics have been proposed for assessing the accuracy of a collaborative filtering system. To evaluate the effectiveness of the proposed system, we apply the most commonly used metric, the F1 measure, to evaluate recommendation quality [20]. F1 is calculated as follows:


$$ F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (7) $$

Precision is the proportion of correctly identified items within the top-N set. It is computed as the number of relevant recommendations divided by the total number of recommendations that a recommendation system produces:


$$ \text{precision} = \frac{|\text{test} \cap \text{top\_N}|}{N} $$




Recall measures the ability of a recommendation system to recommend all the products that are likely to interest the customers. It is the number of recommendations correctly generated by the recommendation system divided by the size of the test set:


$$ \text{recall} = \frac{|\text{test} \cap \text{top\_N}|}{|\text{test}|} $$



These two measures, precision and recall, often conflict with each other. For example, increasing N tends to increase recall but decrease precision. Because both are critical for judging quality, we use a combination of the two: the standard F1 metric balances precision and recall.
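The two ratios and Eq. (7) can be computed for one user's top-N list as in the sketch below. The item identifiers are hypothetical, and returning 0 for empty inputs is an assumption made to avoid division by zero.

```python
def f1(precision, recall):
    """Eq. (7): harmonic mean of precision and recall (both as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(test, top_n):
    """Precision, recall and F1 of a top-N list against the held-out test items."""
    hits = len(set(test) & set(top_n))  # size of the intersection of test and top_N
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(test) if test else 0.0
    return precision, recall, f1(precision, recall)


p, r, f = precision_recall_f1(test={"a", "b", "c", "d"}, top_n=["a", "b", "x", "y", "z"])
print(p, r, round(f, 3))  # 0.4 0.5 0.444
```

Extending a top-N list can only keep or increase the number of hits, so recall never falls as N grows, while precision typically drops. F1 balances this trade-off.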


4.3. Experiments


As described in Section 3, we present an innovative personalized recommendation system that integrates collaborative filtering and decision trees. In the first phase, data representation, we select the movie-rating table from several database tables according to movie genre. The movie table is separated into “Action”, “Adventure”, …, “Western” tables, and the genre is used as the decision attribute. To reduce computational complexity, we further divide each table into several sub-tables according to user occupation. For instance, the student-drama rating table contains only the users who are students. We intend to explore user preference patterns toward movie genres.
Which user group is more likely to watch drama movies than other user groups? Do male users like sci-fi movies better than romance movies? We also divide users' ages into several groups. For example, a user whose age is less than ten belongs to the group “kid”. There are five groups: “kid”, “teenager”, “young adult”, “adult” and “middle-aged”. The rating values in MovieLens were also classified as “high” or “low”: values of 3, 4 or 5 are classified as “high”, and all other values as “low”.
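The pre-processing described above can be sketched as follows. Only the “kid” boundary (age under 10) and the high/low rating rule come from the text. The remaining age cut-offs (20, 30 and 45) are hypothetical placeholders, and the input record format is assumed.

```python
# Only the "kid" bound (age < 10) is stated in the paper; the other upper
# bounds below are hypothetical placeholders.
AGE_BOUNDS = [(10, "kid"), (20, "teenager"), (30, "young adult"), (45, "adult")]


def age_group(age):
    """Map an age in years to one of the five age groups."""
    for upper, name in AGE_BOUNDS:
        if age < upper:
            return name
    return "middle-aged"


def rating_class(rating):
    """Ratings 3, 4 and 5 are "high"; 1 and 2 are "low"."""
    return "high" if rating >= 3 else "low"


def build_table(records, occupation):
    """Discretized sub-table for one occupation within one genre table.

    records: dicts with "age", "gender", "occupation" and "rating" keys
    (an assumed format) holding the ratings of a single genre.
    """
    return [
        {"age": age_group(r["age"]), "gender": r["gender"], "rating": rating_class(r["rating"])}
        for r in records
        if r["occupation"] == occupation
    ]


# Hypothetical ratings of drama movies
drama = [
    {"age": 8, "gender": "F", "occupation": "student", "rating": 4},
    {"age": 50, "gender": "M", "occupation": "student", "rating": 2},
    {"age": 33, "gender": "M", "occupation": "engineer", "rating": 5},
]
student_drama = build_table(drama, "student")
print(student_drama)
```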

After pre-processing the data, we develop decision trees and conduct data analysis through similarity calculation. Taking the student-drama table as an example, we developed a decision tree by calculating the entropy of every attribute. According to Eq. (5), the entropy of “age” is 0.00329 and that of “gender” is 0.000034, and “age” is selected as the split node. We then calculated the rating probability of each age group according to Eq. (6). The resulting decision tree is shown in Fig. 1.



Figure 1. Decision tree of students rating drama movies


The main advantage of a decision tree is that it is easy to interpret. From Figure 1, we can derive some useful decision rules. Take the group “kid” as an example: if a student's age is less than 10 (kid), the probability that he or she rates drama movies as high, i.e., highly recommends them, is about 75%. The probability that middle-aged students highly recommend drama movies rises to 97.2%, the highest among the five age groups. This may indicate that middle-aged students strongly favor drama movies. Based on the decision tree, we can evaluate the effectiveness of the proposed system using the F1 metric.

We randomly selected 20% of the cases as the test set and then applied Eq. (7) to evaluate the F1 values. Table 1 shows the results:



Table 1. Performance of the entropy-based model on the test dataset

| Movie type | Precision (%) | Recall (%) | F1 |
|---|---|---|---|
| Action | 45 | 3.77 | 0.070 |
| Adventure | 36 | 4.20 | 0.075 |
| Animation | 32 | 2.46 | 0.046 |
| Children | 25 | 4.60 | 0.078 |
| Comedy | 38 | 3.80 | 0.069 |
| Crime | 36 | 4.00 | 0.072 |
| Documentary | 39 | 7.00 | 0.119 |
| Drama | 56 | 5.20 | 0.095 |
| Fantasy | 43 | 6.50 | 0.113 |
| Film-Noir | 37 | 8.60 | 0.140 |
| Horror | 53 | 11.20 | 0.185 |
| Musical | 49 | 10.80 | 0.177 |
| Mystery | 45 | 4.90 | 0.088 |
| Romance | 36 | 5.80 | 0.100 |
| Sci-Fi | 41 | 6.80 | 0.117 |
| Thriller | 38 | 5.60 | 0.098 |
| War | 23 | 7.00 | 0.107 |
| Western | 12 | 3.77 | 0.057 |
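As a consistency check, the F1 column of Table 1 can be recomputed with Eq. (7) after converting the precision and recall percentages to fractions. The sketch below does this for every row of the table, and each recomputed value matches the reported F1 to within rounding at three decimals.

```python
# Rows of Table 1: (movie type, precision %, recall %, reported F1)
TABLE_1 = [
    ("Action", 45, 3.77, 0.070), ("Adventure", 36, 4.20, 0.075),
    ("Animation", 32, 2.46, 0.046), ("Children", 25, 4.60, 0.078),
    ("Comedy", 38, 3.80, 0.069), ("Crime", 36, 4.00, 0.072),
    ("Documentary", 39, 7.00, 0.119), ("Drama", 56, 5.20, 0.095),
    ("Fantasy", 43, 6.50, 0.113), ("Film-Noir", 37, 8.60, 0.140),
    ("Horror", 53, 11.20, 0.185), ("Musical", 49, 10.80, 0.177),
    ("Mystery", 45, 4.90, 0.088), ("Romance", 36, 5.80, 0.100),
    ("Sci-Fi", 41, 6.80, 0.117), ("Thriller", 38, 5.60, 0.098),
    ("War", 23, 7.00, 0.107), ("Western", 12, 3.77, 0.057),
]


def f1(precision, recall):
    """Eq. (7) with precision and recall given as fractions."""
    return 2 * precision * recall / (precision + recall)


for genre, p_pct, r_pct, reported in TABLE_1:
    computed = f1(p_pct / 100, r_pct / 100)
    print(f"{genre:12s} computed={computed:.3f} reported={reported:.3f}")
```

The recomputation also shows that the F1 column is a fraction between 0 and 1, whereas precision and recall are reported in percent.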



5. Conclusions


As the Internet has become an important part of everyone's daily life, recommendation systems have emerged as a powerful new technology for effectively extracting valuable information from the Web. Recommendation systems help customers find suitable products and boost company sales, and they are quickly becoming a crucial tool in e-commerce. In this paper, we presented an innovative personalized recommendation system that integrates collaborative filtering and Shannon's entropy concept. Based on Shannon's entropy, a recommendation decision tree is constructed. Decision trees have the advantage of being easy to comprehend and implement. The experimental results on the MovieLens dataset, with precision ranging from 12% to 56% across movie genres (Table 1), show the applicability of the proposed system.



6. References

[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules”, Proc. of the 20th VLDB, J. Bocca, M. Jarke, and C. Zaniolo (Eds.), Morgan Kaufmann, 1994, pp. 487-499.

[2] K. Lang, “NewsWeeder: Learning to Filter Netnews”, Proceedings of the 12th International Conference on Machine Learning, Tahoe City, California, 1995.

[3] U. Shardanand and P. Maes, “Social Information Filtering: Algorithms for Automating ‘Word of Mouth’”, Proceedings of the Conference on Human Factors in Computing Systems (CHI’95), Denver, CO, May 1995.

[4] D. Chickering and D. Heckerman, “Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables”, Machine Learning, 29, 1997, pp. 181-212.

[5] A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, 1977.

[6] B. Thiesson, C. Meek, D. Chickering, and D. Heckerman, “Learning Mixtures of DAG Models”, Technical Report MSR-TR-97-30, Microsoft Research, Redmond, WA, 1997.

[7] X. Yang, “A Maximum Entropy Model Application on Recognition of Metaphor Phenomena”, Proceedings of the 15th International Conference on Computing (CIC’06), 2006.

[8] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Riedl, “Application of Dimensionality Reduction in Recommender System: A Case Study”, ACM WebKDD 2000 Workshop, 2000.

[9] C.C. Aggarwal, “On the Effects of Dimensionality Reduction on High Dimensional Similarity Search”, ACM PODS Conference, 2001.

[10] A. Zheng, Y.Y. Zhu, and B.L. Shi, “Collaborative Filtering Recommendation Algorithm Based on Item Rating Prediction”, Journal of Software, 13(4), 2002.

[11] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Analysis of Recommendation Algorithms for E-Commerce”, ACM Conference on Electronic Commerce, 2000, pp. 158-167.

[12] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, “GroupLens: Applying Collaborative Filtering to Usenet News”, Communications of the ACM, 40(3), 1997, pp. 77-87.

[13] B. Mobasher, H. Dai, and T. Luo, “Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization”, Data Mining and Knowledge Discovery, 6(1), 2002, pp. 61-82.

[14] B. Mittal and W. Lassar, “The Role of Personalization in Service Encounters”, Journal of Retailing, 72(1), 1996, pp. 95-109.

[15] P.S. Yu, “Data Mining and Personalization Technologies”, Proceedings of the Sixth International Conference on Database Systems for Advanced Applications, Hsinchu, Taiwan, 1999, pp. 6-13.

[16] J.S. Breese, D. Heckerman, and C. Kadie, “Empirical Analysis of Predictive Algorithms for Collaborative Filtering”, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), 1998, pp. 43-52.

[17] K.W. Cheung, J.T. Kwok, M.H. Law, and K.C. Tsui, “Mining Customer Product Ratings for Personalized Marketing”, Journal of Decision Making, 35, 2003, pp. 231-243.

[18] Y.H. Cho, J.K. Kim, and S.H. Kim, “A Personalized Recommender System Based on Web Usage Mining and Decision Tree Induction”, Journal of Expert Systems with Applications, 23(3), 2002, pp. 329-342.

[19] J.K. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl, “An Algorithmic Framework for Performing Collaborative Filtering”, Proceedings of ACM SIGIR’99, 1999, pp. 230-237.