A Survey on Web Recommendation Systems Based on Web Usage Mining


Nov 25, 2013




N. M. Abo El-Yazeed

Demonstrator at High Institute for Management and Computer,

Port Said University, Egypt

no3man_mohamed@himc.psu.edu.eg

Abstract:


Web usage mining has become the subject of extensive research, as its
potential for Web-based personalized services, prediction of users’
near-future intentions, adaptive Web sites, and customer profiling is
recognized. Recently, a variety of recommendation systems that predict
users’ future movements through Web usage mining have been proposed.
However, the quality of the recommendations that current systems
produce when predicting a user’s future requests on a particular Web
site remains unsatisfactory.


Different efforts have been made to address the problem of information
overload on the Internet. Web recommendation systems based on web
usage mining try to mine users’ behavior patterns from web access logs
and recommend pages to the online user by matching the user’s browsing
behavior with the mined historical behavior patterns.


Keywords:


Web Mining, Web Usage Mining, Web-based recommendation systems,
Navigation pattern mining, Web Log, Web Personalization.

1. Introduction:


The volume of information available on the internet is increasing
rapidly with the explosive growth of the World Wide Web and the advent
of e-Commerce. While users are provided with more information and
service options, it has become more difficult for them to find the
“right” or “interesting” information, a problem commonly known as
information overload.




Recommender systems [1] are an alternative, user-centric, and promising
approach to tackling the problem of information overload: they adapt
the content and structure of websites to the needs of users by taking
advantage of the knowledge acquired from analyzing the users’ access
behavior. They can be generally defined as systems that guide users
toward interesting or useful objects in a large space of possible
options [2].


In recent years there has been increasing interest in applying web
usage mining techniques to build web recommender systems [3,4,5]. Web
usage recommender systems take web server access logs as input and use
data mining techniques such as association rule mining and clustering
to extract implicit and potentially useful navigational patterns, which
are then used to provide recommendations. Web server access logs record
users’ browsing history, which contains plenty of hidden information
about users and their navigation. They can therefore be a good
alternative to explicit user ratings or feedback for deriving user
models. Traditional techniques, which recommend a set of items
(referred to as the recommendation set) deemed to be of interest to the
user, mainly base their decisions on user ratings of different items or
on other explicit feedback provided by the user [6,7]. Web usage based
techniques instead discover user preferences from implicit feedback,
namely the web pages that users have visited. Clustering and
collaborative filtering approaches can readily incorporate both binary
and non-binary page weights, although binary weights are usually used
for computational efficiency [8]. Association Rule (AR) mining [9] can
lead to higher recommendation precision [8] and scales easily to large
datasets, but how to incorporate page weights into AR models has not
been explored in previous studies.
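
The association rule idea described above can be sketched in a few
lines of Python. This is only an illustrative sketch, not the method of
any cited paper: sessions are treated as sets of pages (binary weights),
only rules between pairs of pages are mined, and the session data, the
thresholds, and the function names mine_pair_rules and recommend are
hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Mine rules 'page x -> page y' from sessions (hypothetical helper)."""
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)          # binary weight: visited or not
        page_count.update(pages)
        pair_count.update(combinations(sorted(pages), 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:   # support of the page pair
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / page_count[x]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    for consequents in rules.values():
        consequents.sort(key=lambda item: (-item[1], item[0]))
    return rules

def recommend(rules, current_session, top_n=2):
    """Rank unseen pages by the best confidence of any matching rule."""
    seen = set(current_session)
    scores = {}
    for page in seen:
        for candidate, confidence in rules.get(page, []):
            if candidate not in seen:
                scores[candidate] = max(scores.get(candidate, 0.0), confidence)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:top_n]]

# Made-up sessions, standing in for sessions extracted from an access log.
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "about"],
    ["products", "cart"],
    ["home", "products", "cart"],
]
rules = mine_pair_rules(sessions)
print(recommend(rules, ["home"]))      # ['products']
print(recommend(rules, ["products"]))  # ['cart', 'home']
```

Because every page carries the same binary weight here, a page viewed
for a long time counts no more than one viewed briefly, which is the
limitation regarding page weights noted above.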

2. Web Mining:



Web mining is the application of data mining techniques to extract
knowledge from Web data, in which at least one of structure or usage
(Web log) data is used in the mining process. There are three broad
categories of Web mining [10]:








Web content mining


Web content mining is the process of discovering useful information
from text, image, audio, or video data on the web. It is sometimes
called web text mining, because text content is the most widely
researched area. The technologies normally used in web content mining
are NLP (natural language processing) and IR (information retrieval).



Web structure mining


Web structure mining operates on the Web’s hyperlink structure. It is
the process of using graph theory to analyze the node and connection
structure of a web site. This graph structure can provide information
about the ranking or authoritativeness of a page and enhance search
results through filtering. According to the type of web structural
data, web structure mining can be divided into two kinds.


The first kind of web structure mining extracts patterns from
hyperlinks in the web. A hyperlink is a structural component that
connects a web page to a different location. The other kind mines the
document structure: it uses the tree-like structure of a page to
analyze and describe the HTML (HyperText Markup Language) or XML
(eXtensible Markup Language) tags within the web page.
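
Both kinds of structural data can be illustrated with Python's standard
html.parser module. The sketch below collects the hyperlinks of a page
and records the nesting depth of each tag as a simple view of the tag
tree. The class name StructureParser, the list of void tags, and the
sample page are assumptions made for this illustration.

```python
from html.parser import HTMLParser

class StructureParser(HTMLParser):
    """Collects hyperlink targets and the nesting depth of each opening tag."""

    # Tags without a closing tag; a partial list chosen for this sketch.
    VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.links = []      # hyperlink structure: targets of <a href=...>
        self.outline = []    # document structure: (depth, tag) pairs
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        self.outline.append((self._depth, tag))
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in self.VOID_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self._depth > 0:
            self._depth -= 1

# A made-up page used only to exercise the parser.
page = """<html><body>
<h1>Catalog</h1>
<ul><li><a href="/books.html">Books</a></li>
<li><a href="/music.html">Music</a></li></ul>
</body></html>"""

parser = StructureParser()
parser.feed(page)
print(parser.links)
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

Running the parser over every page of a site and joining each page to
the targets in its links list would give the graph that hyperlink
mining operates on.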



Web usage mining




Web usage mining, also known as web log mining, aims to discover
interesting and frequent user access patterns from web browsing data
stored in web server logs, proxy server logs, or browser logs. It
applies data mining to analyze and discover interesting patterns in
users’ usage data on the web. The usage data records the user’s
behavior when the user browses or makes transactions on the web site.
It is an activity that involves the automatic discovery of patterns
from one or more Web servers.


Web usage data includes data from Web server access logs, proxy server
logs, browser logs, user profiles, registration data, user sessions or
transactions, cookies, user queries, bookmark data, mouse clicks and
scrolls, and any other data that results from interactions.
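
As a concrete illustration of how server access logs become user
sessions, the sketch below parses lines in the Common Log Format and
groups the requests of each host into sessions. It is a minimal sketch
under stated assumptions: users are identified by host only, a
30-minute inactivity timeout splits sessions, image, style, and script
requests are dropped, and the log lines and function names are made up.

```python
import re
from datetime import datetime, timedelta

# host ident authuser [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

def parse_line(line):
    """Return (host, timestamp, path, status), or None for a malformed line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("host"), when, match.group("path"), int(match.group("status"))

def build_sessions(lines, timeout=timedelta(minutes=30)):
    """Group successful page requests into per-host sessions."""
    sessions = []
    open_session = {}  # host -> (index into sessions, time of last request)
    records = [r for r in map(parse_line, lines) if r is not None]
    for host, when, path, status in sorted(records, key=lambda r: r[1]):
        if status != 200 or path.endswith(IGNORED_SUFFIXES):
            continue  # data cleaning: skip errors and embedded resources
        previous = open_session.get(host)
        if previous is None or when - previous[1] > timeout:
            sessions.append([])
            index = len(sessions) - 1
        else:
            index = previous[0]
        sessions[index].append(path)
        open_session[host] = (index, when)
    return sessions

# Made-up log lines for illustration.
log_lines = [
    '10.0.0.1 - - [25/Nov/2013:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [25/Nov/2013:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [25/Nov/2013:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [25/Nov/2013:10:05:00 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [25/Nov/2013:11:30:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [25/Nov/2013:10:02:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
]
sessions = build_sessions(log_lines)
print(sessions)
```

Identifying users by host alone is a simplification. Cookies or
registration data, which are listed above as further sources of usage
data, can separate users who share an address.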

3. Recommendation Systems:


In the WWW context, recommender systems are becoming widely used by
users and information retrieval systems for both prefetching and
recommendation. In the literature, most researchers focus on Web usage
mining, which analyzes Web logs through a process of knowledge
discovery in databases. Indeed, Web sites generate large amounts of Web
log data that contain useful information about user behavior. The term
“Web Usage Mining” was introduced by Cooley et al. in 1997, when a
first attempt at a taxonomy of Web mining was made; in particular, they
define Web mining as the “discovery and analysis of useful information
from the World Wide Web”. It is also defined as “the application of
data mining techniques to large Web data repositories”. Citing the
definition that Cooley et al. gave in [11], Web usage mining is the
“automatic discovery of user access patterns from Web servers” [12].



Analyzing web log files to extract useful patterns is called web usage
mining. Web usage mining approaches include clustering, association
rule mining, and sequential pattern mining. To facilitate web page
access by users, a web recommendation model is needed, and these web
usage mining approaches can be applied to predict the next page access.
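
One simple way to predict the next page access from past sessions is a
first-order Markov model, in which the prediction depends only on the
current page. The sketch below is illustrative and is not taken from
any of the surveyed papers; the sessions and function names are
hypothetical.

```python
from collections import Counter, defaultdict

def train_first_order(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions

def predict_next(transitions, current_page):
    """Return (most likely next page, estimated probability), or None."""
    followers = transitions.get(current_page)
    if not followers:
        return None  # page never seen with a successor: no coverage
    total = sum(followers.values())
    # Ties are broken by page name so the result is deterministic.
    page, count = max(followers.items(), key=lambda item: (item[1], item[0]))
    return page, count / total

# Made-up training sessions.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C"],
    ["B", "C"],
]
model = train_first_order(sessions)
print(predict_next(model, "A"))  # page "B" with estimated probability 2/3
print(predict_next(model, "D"))  # None
```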

4. Literature Review:


The importance of Web usage mining has led to a number of research
papers in the area. However, most of these papers were hindered by some
kind of limitation. Different combinations of mining techniques have
already been suggested for web access recommendation:


Dhyani et al. [13] introduced a new model based on Markov processes for
web access prediction. It has the drawback of high complexity, because
all access sequences are considered throughout the prediction process.


Chimphlee et al. [14] introduced a web access prediction model that
integrates rough set clustering with a Markov model. Its major drawback
is a lack of prediction accuracy due to the approximation made while
forming clusters: allowing an object the possibility of belonging to a
cluster can reduce cluster tightness, which in turn affects prediction
accuracy. The sequential mining suggested in that work is an all k-th
order Markov model.


F. Khalil et al. [15] proposed a new framework for predicting the next
web page access that combines a Markov model with association rules.
The Markov model is used for web prediction; if it is not able to
predict the next page, association rules are used to predict the next
web page.


Antonio Maratea and Alfredo Petrosino [16] observe that personalized
Web page recommendation is strictly constrained by the nature of web
logs, the intrinsic complexity of the problem, and high efficiency
requirements. When the problem is handled by existing Web usage mining
methods, the large number of meaningful clusters and profiles for
visitors of a typical highly rated Website means that model-based or
distance-based techniques tend either to make very strong and simple
assumptions or to become highly complex and slow. The authors designed
a heuristic majority intelligence technique that adapts easily to
changing navigational patterns without the cost of explicitly
identifying them ahead of navigation. The proposed technique imitates
human behavior in an unknown environment in the presence of several
individuals acting in parallel, and it is able to predict, with good
accuracy and in real time, the next page group visited by a user. The
technique has been tested on real data from users who browsed a popular
Website of general content. Average accuracy on the test sets is good
on a 17-class problem and, most importantly, remains steady as the Web
navigation goes on.


V. V. R. Maheswara Rao and V. Valli Kumari [17] introduce a new
approach that predicts users’ browsing behavior at two levels to match
the nature of navigation: a category stage and a web page stage. Stage
one predicts the category, so that unnecessary categories can be
excluded and the scope of calculation is massively reduced. Stage two
then uses pruned higher-order Markov models to predict the user’s next
page more effectively and with high operational performance. The
experimental results indicate low state complexity and good predictive
power in both stages.


A. Anitha [18] introduced a new approach for next page access
prediction. It combines a Markov model with a proposed model that finds
highly homogeneous access patterns through pairwise nearest neighbor
based clustering. The resulting patterns are highly relevant, and the
size of the data set used for the sequential mining process is greatly
reduced. The proposed method resulted in good prediction accuracy with
less state space complexity. The drawback of this work is that loosely
connected access sequences are not considered in the mining process.
Hence, it is suggested to extend this work by also considering
noncontiguous access sequences.


M. Jalali et al. [19] developed a recommendation system called WebPUM,
an online prediction system using Web usage mining, and proposed a
novel approach for classifying user navigation patterns to predict
users’ future intentions. The approach is based on a new graph
partitioning algorithm that models user navigation patterns in the
navigation pattern mining phase. Furthermore, the longest common
subsequence algorithm is used to classify current user activities in
order to predict the user’s next movement. The proposed system has been
tested on the CTI and MSNBC datasets. The results show an improvement
in the quality of recommendations. Furthermore, experiments on
scalability indicate that the size of the dataset and the number of
users in the dataset do not significantly affect the accuracy.
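
The role of the longest common subsequence (LCS) in classifying the
current user activity can be sketched as follows. This is not the
WebPUM implementation: the stored patterns below are made-up stand-ins
for the navigation patterns that the graph partitioning phase would
produce, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]

def classify(active_session, patterns):
    """Return the name of the pattern with the longest LCS, or None."""
    best_name, best_score = None, 0
    for name, pages in patterns.items():
        score = lcs_length(active_session, pages)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

def recommend(active_session, patterns, top_n=2):
    """Recommend pages of the matched pattern that were not yet visited."""
    name = classify(active_session, patterns)
    if name is None:
        return []
    seen = set(active_session)
    return [page for page in patterns[name] if page not in seen][:top_n]

# Hypothetical navigation patterns, one page sequence per cluster.
patterns = {
    "shopping": ["home", "products", "cart", "checkout"],
    "support": ["home", "faq", "contact"],
}
active = ["home", "products", "cart"]
print(classify(active, patterns))   # shopping
print(recommend(active, patterns))  # ['checkout']
```

The dynamic programming table is kept one row at a time, so memory
grows with the length of the stored pattern rather than with the
product of both sequence lengths.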


B. Nigam and S. Jain [20] proposed a new way of structuring the Markov
model, named the Dynamic Nested Markov model, for modeling user web
navigation sessions. The Dynamic Nested Markov model uses the nesting
concept: the higher-order Markov model is nested inside the lower-order
Markov model. Through this nesting, the second-order Markov model is
accommodated inside the first-order Markov model. In this way, the
advantages of both the lower-order and the higher-order model are
achieved in one model. The model focuses on time complexity and
coverage of the prediction state. The results show that high coverage
was achieved and time complexity was reduced.


A. Anitha and N. Krishnan [21] focus on providing recommendations to
learners as well as web masters to improve the overall effectiveness of
web based teaching and learning. This work deals with the analysis of
web log data and the development of a recommendation framework using
web usage mining techniques such as upper approximation based rough set
clustering using k nearest neighbors, a dynamic support pruned all k-th
order Markov model, and all k-th order association rule mining by
dynamic frequent (k+1) item set generation using Apriori. The goal of
this integrated approach is to make accurate recommendations for
learning management systems with reduced state space complexity.

5. Conclusion:


The World Wide Web is growing rapidly. To facilitate web browsing, help
users during their surfing sessions, and engage the users of a website
at an early stage of surfing, a system for web access recommendation is
essential. It is therefore necessary to study users’ web navigation
behavior in order to improve the quality of the web services offered to
them. Analysis of user web navigation behavior is achieved by modeling
web navigation history.


Many approaches have been introduced for this task. Most of them are
based on the Markov model, which is the most widely used model of user
web navigation sessions. Lower-order Markov models provide high
coverage, but with low accuracy. Higher-order Markov models give low
coverage but high accuracy, at the price of more time complexity.
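
The trade-off between coverage and accuracy can be seen in a small
sketch that trains first-order and second-order Markov models and
falls back from the higher order to the lower one when a state was
never observed. The sessions and function names are made up for
illustration, and the fallback scheme is one common way of combining
orders rather than the method of a specific surveyed paper.

```python
from collections import Counter, defaultdict

def train_kth_order(sessions, k):
    """Count successors of every run of k consecutive pages."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            state = tuple(session[i:i + k])
            model[state][session[i + k]] += 1
    return model

def predict(models, history):
    """Try the highest order first; return (page, order used)."""
    for k in sorted(models, reverse=True):
        if len(history) < k:
            continue
        followers = models[k].get(tuple(history[-k:]))
        if followers:
            page = max(followers.items(), key=lambda item: (item[1], item[0]))[0]
            return page, k
    return None, 0  # no model covers this history

# Made-up sessions: what follows B depends on the page before it.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["D", "B", "E"],
    ["D", "B", "E"],
    ["D", "B", "E"],
]
models = {k: train_kth_order(sessions, k) for k in (1, 2)}

print(predict(models, ["A", "B"]))  # ('C', 2): second order uses the context
print(predict(models, ["C", "B"]))  # ('E', 1): unseen context, first order used
print(predict(models, ["Z"]))       # (None, 0)
```

For the history A, B the first-order model alone would predict E,
because E follows B most often overall, while the second-order model
predicts C. The second-order model, however, has no entry for the
unseen context C, B, so only the first-order model covers it.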

6. Future Work:


Because of the drawbacks of current web access models, such as high
complexity, low accuracy, and contradictory predictions, it is
necessary to enhance web page recommendation approaches to address
these weaknesses, through improvements that yield high recommendation
accuracy and low complexity and that eliminate the disadvantages of
current approaches.



7. References:

[1] P. Resnick, H. R. Varian, “Recommender Systems”, Communications of
the ACM, Vol. 40, No. 3, pp. 56-58, March 1997.

[2] R. Burke, “Hybrid Recommender Systems: Survey and Experiments”,
User Modeling and User-Adapted Interaction, pp. 331-370, 2002.

[3] X. Fu, J. Budzik, K. J. Hammond, “Mining Navigation History for
Recommendation”, In Intelligent User Interfaces, pp. 106-112, 2000.

[4] W. Lin, S. A. Alvarez, C. Ruiz, “Collaborative recommendation via
adaptive association rule mining”, In Proceedings of the Web Mining for
E-Commerce Workshop (WebKDD'2000), Boston, August 2000.


[5] Y. H. Wu, Y. C. Chen, A. L. P. Chen, “Enabling Personalized
Recommendation on the Web based on User Interests and Behaviors”, In
11th International Workshop on Research Issues in Data Engineering,
2001.

[6] M. Deshpande, G. Karypis, “Item-Based Top-N Recommendation
Algorithms”, ACM Transactions on Information Systems, Vol. 22, No. 1,
pp. 143-177, January 2004.

[7] J. L. Herlocker, J. A. Konstan, A. Borchers, J. Riedl, “An
Algorithmic Framework for Performing Collaborative Filtering”, In SIGIR
99: Proceedings of the 22nd Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval, pp. 230-237,
1999.

[8] B. Mobasher, “Web Usage Mining and Personalization”, In Practical
Handbook of Internet Computing, Munindar P. Singh (ed.), CRC Press,
2005.

[9] M. Nakagawa, B. Mobasher, “A Hybrid Web Personalization Model Based
on Site Connectivity”, In The Fifth International WEBKDD Workshop: Web
Mining as a Premise to Effective and Intelligent Web Applications,
pp. 59-70, 2003.

[10] J. Vellingiri and S. Chenthur Pandian, “A Survey on Web Usage
Mining”, Global Journal of Computer Science and Technology, Vol. 11,
Issue 4, Version 1.0, USA, March 2011.



[11] R. Cooley, J. Srivastava, and B. Mobasher, “Web mining:
Information and pattern discovery on the World Wide Web”, In 9th IEEE
International Conference on Tools with Artificial Intelligence
(ICTAI’97), November 1997.

[12] M. Géry and H. Haddad, “Evaluation of Web Usage Mining Approaches
for User’s Next Request Prediction”, WIDM '03: Proceedings of the 5th
ACM International Workshop on Web Information and Data Management, New
York, NY, USA, pp. 74-81, 2003.

[13] D. Dhyani, S. S. Bhowmick, and W. K. Ng, “Modelling and predicting
web page accesses using Markov Processes”, IEEE Computer Society, 2003.

[14] S. Chimphlee, N. Salim, M. S. B. Ngadiman, W. Chimphlee, and
S. Srinoy, “Rough Sets Clustering and Markov Model for Web Access
Prediction”, Proceedings of Post Graduate Annual Seminar, pp. 470-474,
2006.

[15] F. Khalil, J. Li, and H. Wang, “A framework of combining Markov
model with association rules for predicting web page accesses”, Proc.
Fifth Australasian Data Mining Conference (AusDM2006), Vol. 61,
pp. 177-184, 2006.

[16] A. Maratea and A. Petrosino, “An Heuristic Approach to Page
Recommendation in Web Usage Mining”, Ninth International Conference on
Intelligent Systems Design and Applications, pp. 1043-1048, 2009.

[17] V. V. R. M. Rao and V. V. Kumari, “An Efficient Hybrid Successive
Markov Model for Predicting Web User Usage Behavior using Web Usage
Mining”, International Journal of Data Engineering (IJDE), Vol. 1,
Issue 5, pp. 43-62, 2011.

[18] A. Anitha, “A New Web Usage Mining Approach for Next Page Access
Prediction”, International Journal of Computer Applications, Vol. 8,
No. 11, pp. 7-10, October 2010.

[19] M. Jalali, N. Mustapha, Md. N. Sulaiman, and A. Mamat, “WebPUM: A
Web-based recommendation system to predict user future movements”,
Expert Systems with Applications, Vol. 37, Issue 10, pp. 6201-6212,
2010.

[20] B. Nigam and S. Jain, “Generating a New Model for Predicting the
Next Accessed Web Page in Web Usage Mining”, Third International
Conference on Emerging Trends in Engineering and Technology, Goa,
India, pp. 485-490, 2010.

[21] A. Anitha and N. Krishnan, “A Web Usage Mining based
Recommendation Model for Learning Management Systems”, IEEE
International Conference on Computational Intelligence and Computing
Research (ICCIC), 2010.