Evolving semantic web with social navigation

Ghassan Beydoun a,*, Roman Kultchitsky b, Grace Manasseh b
a School of Information Systems and Economics, University of Wollongong, Australia
b Department of Computer Science, American University of Beirut, Beirut
Abstract
The Semantic Web (SW) is a meta-web built on the existing WWW to facilitate its access. SW expresses and exploits dependencies between web pages to yield focused search results. Manual annotation of web pages towards building a SW is hindered by at least two user-dependent factors: users do not agree on an annotation standard, which can be used to extricate their pages' inter-dependencies; and they are simply too lazy to undertake and maintain annotation of pages. In this paper, we present an alternative way to exploit web page dependencies: as users surf the net, they create a virtual surfing trail which can be shared with other users. This parallels social navigation for knowledge. We capture and use these trails to allow subsequent intelligent search of the web.
People surfing the net with different interests and objectives do not leave similar and mutually beneficial trails. However, individuals in a given interest group produce trails that are of interest to the whole group. Moreover, special interest groups will be more highly motivated than casual users to rate the utility of pages they browse. In this paper, we introduce our system KAPUST1.2 (Keeper And Processor of User Surfing Trails). It captures user trails as they search the internet. It constructs a semantic web structure from the trails. The semantic web structure is expressed as a conceptual lattice guiding future searches. KAPUST is deployed as an E-learning software for an undergraduate class. First results indicate that it is indeed possible to process surfing trails into useful knowledge structures which can later be used to produce intelligent searching.
© 2006 Published by Elsevier Ltd.
Keywords: Cooperative systems; Intelligent searching; Semantic web; E-learning application; Interactive knowledge acquisition; Formal concept analysis application; Machine learning application
1. Introduction
Social navigation (Dieberger, 1997) is an aspect of our daily life and a very efficient social mechanism for acquiring knowledge; examples are: asking for directions on the road, consulting family members for advice, calling a doctor when feeling ill, or meeting with colleagues at university to discuss a research topic. The essence of social navigation is that people keep track of experiences and contexts regardless of whether or not those contexts/experiences have been useful to a past goal of theirs. Harnessing social interactions for information retrieval on the net enables consulting and benefiting from other people's experiences. For example, a shopper looking to buy a new car, searching car advertisements or online motor shops, can greatly benefit from someone else's car shopping experience, even though they both might be looking for cars with different characteristics. A person who has just bought a car has lots of useful advice for a prospective buyer, e.g. the former can at least point out models and their expected prices. If the prospective buyer could go online and see the pages that other people have found useful, it would make his/her search a lot easier. This social navigation inspires our approach, described in this paper, towards building a semantic web incrementally and in a distributed and collective way.
In our view, the web-surfing experience of people is beneficial and should be stored and reused. This requires a user-friendly exposure of the experience that all people can understand. We represent surfing experiences as surfing trails. These are virtual surfing trails created by users as they surf the net, and they are typically lost in current browsers. We introduce our system, KAPUST1.2 (Keeper And Processor of User Surfing Trails). It captures user trails and organizes them according to the browsing topic. KAPUST then processes intersections of trails into a knowledge base (a conceptual lattice). This later allows intelligent search of the web.
* Corresponding author. E-mail addresses: g.beydoun@unsw.edu.au, ghassan.beydoun@unsw.edu.au (G. Beydoun), dk09@aub.edu.lb (R. Kultchitsky).
Expert Systems with Applications 32 (2007) 265–276, www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2006 Published by Elsevier Ltd. doi:10.1016/j.eswa.2005.11.035
We advocate building a SW incrementally, created by the users themselves, where no intermediate expert is needed and ontologies are not predetermined. Users provide their topic of interest, within their interest group, and begin browsing web pages. Submitted topics of interest and trails left behind by users form the raw information that constitutes the SW structure and determines how it evolves. We process this raw information using machine learning. Our approach is inspired by the work of Wexelblat and Maes at the MIT Media Lab, which highlighted the importance of tracking user trails and interactions to benefit from interaction history for information navigation (Wexelblat, 1999). We expand that idea by applying Formal Concept Analysis (FCA) (Ganter & Wille, 1999) to reason about the traces instead of only displaying and browsing trails as in Wexelblat (1999). Our system KAPUST1.2 collects and stores user session names and their trails using a web browser plug-in interface, which also connects the browser to the reasoning component of KAPUST1.2. The reasoning component has a retrieval knowledge base (the actual SW) which integrates users' knowledge scattered in their left-behind surfing traces. We tested our approach for E-learning in a class environment in an undergraduate course at the Political Science and Public Administration Department at the American University of Beirut, PSPA289: Information Technology and Public Administration. Our results in that domain illustrate the utility of user trails in allowing intelligent web searching.
This paper is organized as follows: Section 2 overviews work related to our system KAPUST. Section 3 describes the architecture of KAPUST; it details the steps involved, starting from the interface, which collects user traces, to creating knowledge out of the traces and making use of this knowledge in later searching. Section 4 describes the distributed deployment of KAPUST. Section 5 describes experiments and results of using KAPUST for E-learning in an actual university environment. Section 6 discusses results and concludes with future plans for the work in e-markets.
2. Related work and our approach
The SW enables automated intelligent services such as information brokers, search agents, information filters, etc. The first step to realize the SW is making data smarter using languages such as XML, XAL (Lacher & Decker, 2002) and HTML-A (Benjamins, Fensel, Decker, & Perez, 1999). A further step is creating ontologies to guarantee interoperability between “smart data” and to allow inference over the “smart data”. Towards this, many technologies were developed, for example: OIL (Ontology Inference Layer) (Davies, Fensel, & Harmelen, 2003), DAML (Darpa Agent Markup Language) (Hendler, 2001), Haystack (Quan, Huynh, & Karger, 2003) or OntoPad (Benjamins et al., 1999). Other tools to manage collaborative ontology sharing and creation were also developed, such as OntoEdit (Sure et al., 2002), which combines methodology-based ontology development with capabilities for collaboration, or Annotea, which provides web-based shared annotation for RDF documents (Kahan, Koivune, Prud'Hommeaux, & Swick, 2001). Regardless of whether the tool is collaborative or not, manual marking up is still an essential component of current methods and tools to build the SW. Marking up is a cumbersome, time-consuming process that is prone to errors.
In this paper, we explore a novel approach using a machine learning technique, Formal Concept Analysis (FCA) (Ganter & Wille, 1999), which only requires users to name their browsing session rather than perform high-load annotation of pages. Pages visited are only labeled as good or bad hits (with an optional weight). This imposes very little effort on users. Our system described in this paper is in fact an example of an interaction system. It captures users' behaviors and stores them for analysis and reasoning. In capturing user traces, it is similar to Laus (2001) and to Wexelblat (1999) and Wexelblat and Maes (1999), which store interaction history on a per-user basis.
Interaction systems differ in the assumptions they make about users and in what kind of interactions are logged and used. How users behave while browsing the web is affected by several factors related to the user himself, the tool being used and the domain under study. The kind of information of interest to us in this paper is the internet parallel of social navigation, which is human interaction in pursuit of information gathering (Dieberger, 1997).
In our work, we use user trails to model unintended and indirect social navigation over the web: users are not intentionally helping each other (e.g. following footsteps in a forest) and they do not directly communicate (Forsberg, Hook, & Svensson, 2001). In the process we build a complex information space (the SW), where we analyze traces left by users using a Formal Concept Analysis (Ganter & Wille, 1999) algorithm. This has been found efficient when applied to document retrieval systems for domain-specific browsing (Kim & Compton, 2001). Our approach is similar to Footprints (Wexelblat, 1999; Wexelblat & Maes, 1999), where a theory of interaction history was developed and a series of tools were built to allow navigation of history-rich pages and to contextualize web pages during browsing. However, unlike our approach, Footprints does not use history to make recommendations, nor does it have a reasoning technique embedded in it. Following Forsberg's (Forsberg et al., 2001) characterization of design issues for any electronic social navigation system, our system KAPUST adheres to the following four. First is integration: KAPUST is integrated in an internet browser. Second is presence of other users, which is a necessity for KAPUST; otherwise, social navigation is not applicable. Third is trust in the source of information, which varies in importance according to the domain of study; KAPUST has an authentication step which can become mandatory if required. Fourth and last is privacy of the advice giver, which is handled during the deployment process of KAPUST.
Some more useful properties used by Wexelblat and Maes (1999) to generally characterize the problem space of interaction history systems are also characteristic of KAPUST:
1. Proxemic versus dystemic: A proxemic space is one that the users feel to be transparent, where they do not have to put extra effort into understanding the signs and structures used. Conversely, a nonproxemic (i.e. dystemic) space is opaque. The KAPUST interface is simple to use: a user familiar with an internet browser can immediately use it. It is therefore said to provide a proxemic space (i.e. it is easy to learn how to use it because its interface is familiar).
2. Active versus passive recording of the interaction history (from the point of view of the user): In KAPUST, the recording is passive.
3. Rate/form of change of visited data as interaction history information builds up: KAPUST currently assumes that the URLs do not change or become obsolete after they are visited.
4. Degree of permeation between the history information and the object it represents: History information can be attached tightly to the object by being embedded in it, or stored as a separate document. In KAPUST's database, the traces are stored without the web pages themselves; only URLs to web pages are stored.
5. The kind of information collected, which depends on the domain the users are interested in and what they are trying to accomplish: KAPUST's architecture is domain independent. We later use it in an E-learning environment in an undergraduate political science class. KAPUST has a dictionary module to prohibit entering invalid words (see Section 3.1 later).
In the next section, we present the details of our approach in KAPUST, illustrate its architecture and describe its technical details.
3. Constructing the semantic web with KAPUST1.2
It is of high interest to learn from colleagues of the same community because of common interests and aims. Our system KAPUST1.2 (see Fig. 2) converts surfing traces of users in the same community into a Semantic Web. For example, a group of students sharing an assignment problem usually discuss the assignment topic by meeting face to face every day at university. Using KAPUST1.2, it will be as if those students are discussing their thoughts online, so that not only the students of their class will benefit, but also students who take the same class in following semesters.
User traces are stored as a sequence of URLs of the pages that a user visits in a browsing session, when a user from a particular interest group is searching for a specialized topic. For example, in E-learning, users are students who provide one or more keywords to identify their search domain at the beginning of each session (Fig. 1). Their trails consist of a sequence of URLs annotated by the session title word(s) entered at the beginning. Web page addresses and session title keywords are the building blocks for our SW. Initially, entered words are checked against a dictionary of the existing set of keywords in the database. This minimizes the redundancy of keywords (e.g. synonyms) and corrects any syntactical errors by users. The evolved SW structure gives authenticated users recommendations in the form of categorized web page links, based on session keywords. In addition, they can browse any notes added previously by authorized users, who can also rate a page's relevancy as Poor, Not Bad, Good, or Excellent. Page notes and their ratings provide another level of knowledge sharing between users. As user trails are accumulated, browsing sessions begin to intersect one another to create new knowledge. For example, while a student A searches web course notes for pages about the “Public Sector”, she comes across a web page p1, which has been visited by student X but was related to “IT, E-learning” in his session. This creates new knowledge relating “IT, E-learning” and “Public Sector” due to the intersection in the corresponding trails.
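The trail intersection described above can be illustrated with a minimal Python sketch. The `Trail` class, the two helper functions and the example.org URLs are hypothetical names introduced here for illustration; the paper does not specify how KAPUST represents trails internally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trail:
    """One browsing session: who browsed, under which keywords, and where."""
    user: str
    keywords: frozenset
    urls: tuple


def keywords_by_page(trails):
    """Map each visited URL to the union of keywords of all sessions that reached it."""
    index = defaultdict(set)
    for trail in trails:
        for url in trail.urls:
            index[url] |= trail.keywords
    return dict(index)


def intersecting_pages(trails):
    """Return the URLs that appear in the trails of more than one user."""
    visitors = defaultdict(set)
    for trail in trails:
        for url in trail.urls:
            visitors[url].add(trail.user)
    return {url for url, users in visitors.items() if len(users) > 1}


# Hypothetical URLs; only p1 is shared, as in the example in the text.
trails = [
    Trail("X", frozenset({"IT", "E-learning"}),
          ("http://example.org/p1", "http://example.org/p2")),
    Trail("A", frozenset({"Public Sector"}),
          ("http://example.org/p3", "http://example.org/p1")),
]

shared = intersecting_pages(trails)
page_keywords = keywords_by_page(trails)
for url in sorted(shared):
    print(url, sorted(page_keywords[url]))
```

Under these assumptions, page p1 ends up annotated with the keywords of both sessions, which is the new knowledge relating “IT, E-learning” and “Public Sector”.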
3.1. KAPUST1.2 user interface
The KAPUST architecture (Fig. 2) has two components: an extensive interactive part (visible to the user), and a reasoning/knowledge-creating part (invisible to the user). In this section, the visible user interface is detailed (the reasoning component is detailed in the next section).
The main role of the KAPUST user interface is gathering user trails and providing feedback to the users from the constructed SW. It is implemented as a browser plug-in. It has these three modules:
[Figure 1: Student X, session name = “IT, E-Learning”, browsing Web Page 1, Web Page 2, ..., Web Page n.]
Fig. 1. User trail: A student X searching for web pages related to IT and E-learning: he logs in, enters a session title, and browses for related articles. Web pages 1 to n will be given under any session with title keywords IT and E-learning.
1. The login/logout module (Fig. 3): Tracing does not occur unless the user is validated by this module (see Fig. 4 for the life cycle of a user session). The user then identifies the session by a name containing one or more keywords. For example, in testing our tool in an E-learning environment, students are expected to identify the topic and the question as they do their research assignment in any browsing session. A browsing session is delimited by a login and a logout. Use of session identifiers during reasoning is detailed in Section 3.2. Fig. 5 displays a flowchart of these steps.
2. The “View Recommended Pages” module shows search results as lists of rated and categorized links. Section 3.2 describes later how the addresses of these web pages are retrieved. To visit any of them, users may click on their links. This is where users benefit from the traces of other users in their community. In our E-learning example (detailed later), students get to exploit the views of each other. They might find what they are looking for in those links, or they can go ahead and continue searching through other pages and contribute to the traces database for the benefit of future users.
3. The Dictionary module minimizes syntactic mistakes in session keywords and detects synonymous keywords by applying the SOUNDEX function, a built-in utility in Microsoft SQL Server 2000. SOUNDEX converts an alpha string to a four-character code to find similar-sounding words or names. The first character of the code is the first character of the character expression, and the second through fourth characters of the code are numbers. Vowels in the character expression are ignored unless they are the first letter of the string. The DIFFERENCE function compares the SOUNDEX codes of two strings and returns a number between 0 and 4, where 4 represents the least possible difference between the two strings. We use this number to determine if a similar string exists (see Table 1). For example, applying SOUNDEX to McDonnell gives the code M-235. Suppose a keyword ‘IT projects’ exists in the list of keywords in the database, and a user starts a new search session and enters any of ‘IT’, ‘IT-projects’ or even ‘IT-projects’. He would get a suggestion telling him that a similar keyword ‘IT projects’ already exists. He can then either use the suggested keyword or ignore it.
[Figure 2: block diagram with the components View Recommended, Login / Logout, Enter Session Keywords, Dictionary, FCA, Conceptual Lattice, Query concept, Add Notes and Ranks, Display Notes and Ranks, User Trails, Export Utility, Trigger Lattice Update, Delta & Full XML Files, Inference Engine, FCA Algorithm, Reporting Module.]
Fig. 2. KAPUST1.2 architecture.
Fig. 3. Login module (left frame of the window).
[Figure 4: flowchart. Login screen (get user ID, session name); authenticate user; check dictionary for similar name; propose suggestions if found; initialise trace; open Annotation Explorer Bar; user browses and marks good pages; log out to exit or start a new session.]
Fig. 4. Life cycle of a user's session.
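The dictionary check in item 3 can be sketched in Python using the letter groups of Table 1. This is a minimal sketch of the classic Soundex scheme, not the SQL Server implementation: skipping non-letters, the position-counting `difference` stand-in, the `suggest` helper and its threshold of 4 are all assumptions made for illustration, and the built-in SOUNDEX and DIFFERENCE functions may behave differently in edge cases.

```python
# Letter-to-digit mapping taken from Table 1.
_GROUPS = {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODE = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(text):
    """Classic four-character Soundex code; non-letters are skipped (an assumption)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODE.get(letters[0], "")
    for letter in letters[1:]:
        digit = _CODE.get(letter, "")
        if digit and digit != previous:
            code += digit
        if letter not in "HW":  # H and W do not separate letters with equal codes
            previous = digit
    return (code + "000")[:4]


def difference(a, b):
    """Rough stand-in for DIFFERENCE: positions where the two codes agree (0 to 4)."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))


def suggest(entered, known_keywords, threshold=4):
    """Return the known keywords that sound like the entered one (hypothetical helper)."""
    return [k for k in known_keywords if difference(entered, k) >= threshold]


print(soundex("McDonnell"))                                      # M235
print(suggest("IT-projects", ["IT projects", "Public Sector"]))  # ['IT projects']
```

A lower threshold would produce more suggestions at the cost of more false matches; the paper does not state which value KAPUST uses.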
In our Semantic Web, search ontologies evolve from the keywords that users enter to name their browsing sessions at the start. These keywords relate to users' search domains. How these naming keywords are transformed into a semantic web is described in the next section.
3.2. From user trails to Semantic Web
KAPUST turns user traces into structured knowledge, in the form of a conceptual lattice, using FCA reasoning. This involves two steps: first, a matrix table is constructed showing the keywords that each page satisfies; second, a conceptual lattice is assembled from the matrix table, as detailed next.
3.2.1. Formal Concept Analysis (FCA)
FCA is a mathematical theory (Ganter & Wille, 1999) modeling concepts in terms of lattice theory. FCA starts with a context $K = (G, M, I)$, where $G$ is a set whose elements are called objects, $M$ is a set whose elements are called attributes, and $I$ is a binary relation between $G$ and $M$ [$(g, m) \in I$ is read “object $g$ has attribute $m$”]. Formalized concepts reflect a relation between objects and attributes. A formal concept, $C$, of a formal context $(G, M, I)$ is a pair $(A, B)$ where $A \subseteq G$ is the set of objects (extent of $C$) and $B \subseteq M$ is the set of attributes (intent of $C$). The set of all formal concepts of a context $K$ together with the order relation $\leq$ is a complete lattice, $F(G, M, I)$: for each subset of concepts there is always a unique greatest common subconcept and a unique least common superconcept. In KAPUST, web page URLs form $G$, the set of objects. Keywords of session names form $M$, the set of attributes. A concept in the resulting conceptual lattice is formed of a set of page URLs as the extent and a set of keywords as the intent. Concepts can be the result of either a single user session or multiple sessions that intersect each other.
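For reference, the condition that makes a pair $(A, B)$ a formal concept can be written with the two derivation operators of standard FCA (Ganter & Wille, 1999). The prime notation below is the usual textbook convention and is an addition here; it is not used elsewhere in this paper.

```latex
A' = \{\, m \in M \mid (g, m) \in I \ \text{for all } g \in A \,\}, \qquad
B' = \{\, g \in G \mid (g, m) \in I \ \text{for all } m \in B \,\}

(A, B) \text{ is a formal concept of } (G, M, I) \iff A' = B \ \text{and} \ B' = A

(A_1, B_1) \leq (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1
```

In KAPUST's terms, $A'$ is the set of keywords shared by every page in $A$, and $B'$ is the set of pages annotated with every keyword in $B$.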
Fig. 6 displays an example of three different user sessions that share some common web pages in their trails. For example, “WebPage1” is visited by users A and C, who have different keywords identifying their sessions. This indicates the creation of a new concept in the lattice as a result of the intersection between their sessions. The new concept will have “WebPage1” as its page set and “IT, Political Science, Technology” as its keyword set. The concept having “WebPage1”, “WebPage2”, “WebPage3” as its page set and “IT” as its keyword set is an example of a concept resulting from a single session, that of user A.
[Figure 5: flowchart. Prerequisite: user has opened the explorer bar and started a browsing session. A page is visited; take URL, display and update any info; show a page info summary and a link to any note (on clicking the link the note appears); display the cumulative rating; user adds a note; user rates page.]
Fig. 5. Flow chart for visiting a web page.
Table 1
SOUNDEX coding guide: same numbers are used for similar sounding characters
The number    Represents the letters
1             B, P, F, V
2             C, S, K, G, J, Q, X, Z
3             D, T
4             L
5             M, N
6             R
[Figure 6: three user trails. User A, session = “IT”: WebPage 1, WebPage 2, WebPage 3. User B, session = “IT Projects, Public Sector”: WebPage 4, WebPage 2, WebPage 5. User C, session = “Political Science, Technology”: WebPage 4, WebPage 1, WebPage 6.]
Fig. 6. Example of user trails.
Sets of session names and web page URLs are extracted from user traces and input to the FCA engine as XML documents. Traces are stored in a database together with the relations that exist between each web page URL and its session name. At this stage the input to the FCA engine contains all user traces collected so far. As more traces are collected, they are incrementally fed to the FCA engine to update the existing matrix table with any new web page URLs and keywords (Fig. 7).
On receiving the initial matrix table, and every time the FCA engine is updated with new traces, it reconstructs the conceptual lattice. This is computationally expensive and may take a few minutes to complete. We update the matrix table and the lattice on a weekly basis. This has been sufficient in our E-learning application, since students are using KAPUST to do a weekly assignment. The lattice generation can be scheduled to run daily instead of weekly in case of higher usage of KAPUST. KAPUST's subsequent use of the generated lattice for query management and the intelligent interface is described in the next sub-section.
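The two steps (matrix table, then concepts) can be illustrated on the trails of Fig. 6 with a naive Python sketch. It enumerates every keyword subset and closes it, which is exponential in the number of keywords and is not the algorithm used by the KAPUST FCA engine; the function names are hypothetical, and the assignment of pages to users B and C is an assumption read from the figure.

```python
from itertools import combinations

# Sessions as read from Fig. 6 (the page-to-user assignment is an assumption).
SESSIONS = [
    ({"IT"}, ["WebPage1", "WebPage2", "WebPage3"]),                               # User A
    ({"IT Projects", "Public Sector"}, ["WebPage4", "WebPage2", "WebPage5"]),     # User B
    ({"Political Science", "Technology"}, ["WebPage4", "WebPage1", "WebPage6"]),  # User C
]


def build_matrix(sessions):
    """Step 1: matrix table mapping each page to the keywords it satisfies."""
    matrix = {}
    for keywords, pages in sessions:
        for page in pages:
            matrix.setdefault(page, set()).update(keywords)
    return matrix


def extent(matrix, keywords):
    """All pages annotated with every keyword in the given set."""
    return frozenset(p for p, kws in matrix.items() if keywords <= kws)


def intent(matrix, pages, all_keywords):
    """All keywords shared by every page in the given set."""
    shared = set(all_keywords)
    for page in pages:
        shared &= matrix[page]
    return frozenset(shared)


def formal_concepts(matrix):
    """Step 2 (naive): close every keyword subset into a (pages, keywords) concept."""
    all_keywords = set().union(*matrix.values())
    concepts = set()
    for size in range(len(all_keywords) + 1):
        for subset in combinations(sorted(all_keywords), size):
            pages = extent(matrix, set(subset))
            concepts.add((pages, intent(matrix, pages, all_keywords)))
    return concepts


matrix = build_matrix(SESSIONS)
concepts = formal_concepts(matrix)
for pages, keywords in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(pages), sorted(keywords))
```

On these three sessions the sketch finds eight concepts, including the most general and most specific ones, the single-session concept ({WebPage1, WebPage2, WebPage3}, {IT}) and the intersection concept ({WebPage1}, {IT, Political Science, Technology}) discussed above.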
3.2.2. KAPUST query handling
Our querying algorithm takes a user's query (a set of keywords) and the conceptual lattice as input. It returns as output the web page links that best match the search query (Figs. 7 and 8).
Each concept in the lattice, $L$, is a tuple of two sets, $(P_i, K_i)$, where $P_i$ and $K_i$ are sets of page links and keywords respectively (Fig. 7). To illustrate query processing by KAPUST, we denote the set of potential concepts that match the user request, together with their priorities, as $\mathit{PotentialConcepts} = \{((P_i, K_i), \mathit{Priority})\}$, where $\mathit{Priority}$ determines how relevant the concept is to the user's query. We take it as the depth level of a concept $(P_i, K_i)$ in the conceptual lattice in case a matching concept is found. Otherwise, we take it as a measure of how many keywords from the set of keywords entered by the user at login, $UK$, exist in a concept $(P_i, K_i)$. Referring to Fig. 8, the algorithm has the following steps:
Step 2 prunes the set of keywords entered by the user by removing new keywords that do not exist in the concept lattice. The pruned set is denoted UK′.
Step 3 checks for a concept that has the exact set of pruned keywords.
Step 4 handles the case where no matching concept is found. In this case, all concepts that have one or more keywords in their set of keywords matching any keyword in UK′ are added as potential concepts. The priority is taken as (CountUK − CountK), where CountUK is the number of keywords in UK′ and CountK is the number of matching keywords in the concept, so that a lower value means a higher priority and concepts with more matching keywords rank first. If a concept has 2 matching keywords and the set UK′ has 3 keywords, the priority value will be 1, which ranks higher than a concept that has 1 matching keyword, where the priority value will be 2.
Steps 5 and 6 consider the case where a matching concept is found. They add superconcepts and/or subconcepts of the matching concept. Subconcepts have a higher priority than superconcepts because they are more specialized. The most general and most specific concepts are not considered as super or subconcepts. If no super or subconcepts are found, the matching concept itself is added to the list of potential concepts.
Step 7 orders the potential concepts for display.
Step 8 separates the categories and the page links under each category, then retrieves the average rating for each page link to be displayed to the user.
[Figure 7: conceptual lattice with the concepts
({Hypertext}, {Page1, Page2, Page5, Page6, Page7, Page8})
({IT Projects}, {Page1, Page2, Page3, Page4, Page5, Page8})
({Hypertext, IT Projects}, {Page1, Page2, Page5, Page8})
({Hypertext, IT Projects, E-government}, {Page1})
({Hypertext, IT Projects, Virtual Classroom}, {Page2})
({Hypertext, IT Projects, Virtual Agency}, {Page5})]
Fig. 7. Conceptual lattice: An E-learning example.
The strategy of choosing super and subconcepts gives the user a better perception and a wider range of relevant information. The extents of the subconcepts are contained in the extent of the concept itself. This allows us to categorize the page sets one level deeper. Back to our example in Fig. 8, suppose the student logs in and enters the keywords “Hypertext, IT projects”; there is a matching concept containing exactly those keywords. Instead of getting the following results:
‘Hypertext, IT projects → WebPage1, WebPage2, WebPage5, WebPage8’
KAPUST gives the student a better insight into those web pages with the following recommendations:
Hypertext, IT Projects, E-government: WebPage1
Hypertext, IT Projects, Virtual Classroom: WebPage2
Hypertext, IT Projects, Virtual Agency: WebPage5
Hypertext: WebPage6, WebPage7, WebPage8
IT Projects: WebPage3, WebPage4, WebPage8
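Steps 2 to 8 can be approximated with the following Python sketch, run on the concepts of Fig. 7. It is one reading of the description above rather than the authors' implementation: the `recommend` function is hypothetical, priorities 0 (subconcepts) and 1 (superconcepts) stand in for lattice depth, keyword matching is case sensitive, and the retrieval of average ratings in Step 8 is omitted.

```python
# Concepts as (keywords, pages) pairs, copied from Fig. 7.
LATTICE = [
    (frozenset({"Hypertext"}),
     frozenset({"Page1", "Page2", "Page5", "Page6", "Page7", "Page8"})),
    (frozenset({"IT Projects"}),
     frozenset({"Page1", "Page2", "Page3", "Page4", "Page5", "Page8"})),
    (frozenset({"Hypertext", "IT Projects"}),
     frozenset({"Page1", "Page2", "Page5", "Page8"})),
    (frozenset({"Hypertext", "IT Projects", "E-government"}), frozenset({"Page1"})),
    (frozenset({"Hypertext", "IT Projects", "Virtual Classroom"}), frozenset({"Page2"})),
    (frozenset({"Hypertext", "IT Projects", "Virtual Agency"}), frozenset({"Page5"})),
]


def recommend(lattice, user_keywords):
    """Return (keywords, pages) categories in display order; lower priority value first."""
    known = set().union(*(kws for kws, _ in lattice))
    uk = frozenset(user_keywords) & known                    # Step 2: prune unknown keywords
    match = next((c for c in lattice if c[0] == uk), None)   # Step 3: exact match
    candidates = []
    if match is None:                                        # Step 4: partial matches
        for kws, pages in lattice:
            hits = len(kws & uk)
            if hits:
                candidates.append((len(uk) - hits, kws, pages))
    else:                                                    # Steps 5 and 6
        for kws, pages in lattice:
            if kws > uk and pages:      # subconcept: more keywords, fewer pages
                candidates.append((0, kws, pages))
            elif kws < uk and kws:      # superconcept: fewer keywords, more pages
                candidates.append((1, kws, pages))
        if not candidates:
            candidates.append((0, *match))
    candidates.sort(key=lambda c: (c[0], sorted(c[1])))      # Step 7: order for display
    # Step 8 (partly): group pages per category; a page already shown at a
    # higher-priority level is not repeated at a lower one.
    result, shown_earlier, shown_this_level, level = [], set(), set(), None
    for priority, kws, pages in candidates:
        if priority != level:
            shown_earlier |= shown_this_level
            shown_this_level, level = set(), priority
        fresh = pages - shown_earlier
        shown_this_level |= fresh
        if fresh:
            result.append((sorted(kws), sorted(fresh)))
    return result


recommendations = recommend(LATTICE, ["Hypertext", "IT Projects"])
for keywords, pages in recommendations:
    print(", ".join(keywords) + ": " + ", ".join(pages))
```

With the query “Hypertext, IT Projects” the sketch yields the same five categories as listed above (in a slightly different order within the subconcept level), with Page8 appearing under both superconcepts.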
The degree of generality of a concept containing a page and its priority are inversely related. Recommended pages are shown to the user in order of decreasing priority. A page is displayed once, unless it belongs to n concepts on the same level of generality in the lattice, in which case it is displayed n times. The next section presents results of employing KAPUST in an E-learning environment.
Fig. 8. Processing queries using the lattice.
4. KAPUST1.2 for E-learning
A web server is needed to set up and store the semantic web component of KAPUST. A database server (SQL) is also required to store traces and execute the associated FCA engine. A client is designated as administrator and can access the FCA engine from any machine. Traces between clients and the KAPUST server are transferred in XML. In our E-learning environment, students can do their assignments from home. They are given an installation package for the client side of KAPUST. In this section, we present and discuss our deployment of KAPUST in an E-learning environment. We first describe specific features of the domain of E-learning.
4.1. KAPUST for Learning
E-learning refers to the systematic use of networked information and communications technology in teaching and learning (Horton & Horton, 2003). The emergence of E-learning is directly linked to the development of and access to information and communications technology infrastructure. Distant academic learning is one important application. E-learning techniques in the corporate world are also often used for residential workshops and staff training programs. E-learning is flexible, relatively cheap and supplies “just in time” learning opportunities.
Creating an on-line collaborative learning environment is a necessary aspect of E-learning. Creating a sense of community and understanding the on-line behaviors of the participants are also crucial (Blunt & Ahearn, 2000). Several efforts have been made to create such environments. Notably, at George Mason University, under the Program on Social and Organizational Learning (PSOL), research is being done to create and maintain a Virtual Learning Community for the participants in the program. The purpose of that research is to study the learning of the community within the developed environment and to gain a better understanding of the dynamics of collaborative dialogue, in order to enable more informed and sound decision making (Blunt & Ahearn, 2000).
As an experimental workbench, we deploy KAPUST to provide an E-learning environment for an undergraduate class in the Political Science and Public Administration Department at the American University of Beirut, directed by Professor Roman Kultchitsky. Since this is an experimental setting, the presence of the professor is maintained throughout the semester. Students are free to use KAPUST at home and in some lab sessions.
The class has 12 students from various backgrounds. The course given, PSPA 289: Information Technology and Public Administration, is a senior seminar course open to all majors. It focuses on the impact of IT on various aspects of public administration and policy around the world. PSPA289 students use KAPUST as part of their weekly assignments to answer assigned research questions. The system's database and website are deployed on an online server at the university. A setup package to install the internet browser add-on utility, together with a video manual for the tool, is distributed to each student so that he/she can install and use it from home. For each of the first 12 weeks of the semester, the professor gives a research assignment on a new topic. Students are required to use KAPUST, either from their homes or from the common computer lab at the university, to browse for related articles and answer their assignment questions. For each question, students choose one or more keywords from the domain of the assignment question and enter them as session keywords before browsing any web pages. All web pages visited in this session are later annotated with these keywords. After the student logs in, he searches for articles to answer the question under study. For the students, each assignment question is handled as a distinct browsing session by the annotation tool (refer to Fig. 4 to recall the life cycle of a user's session). At the next session login, the student chooses a new set of keywords representing the next question, e.g. question 2.
4.2. Data collected and observations
Browsing traces are collected on a weekly basis.Table 2
displays information about those traces.Fig.9 explains
how the tool behaves as the E-learning semantic web
matured over 12 weeks of experimentation (much the fall
semester of 2003 at AUB).As seen in Table 2 and Fig.9,
the annotation tool successfully collected user traces over
the 12 weeks of experimentation.A large number of traces
were collected.The rate of collecting traces increased over
the last few weeks.We attribute this increase to the deploy-
ment and use of the lattice structure.Using the conceptual
lattice gave the students a kind of motivation to work
harder and helped in their learning process.It is encourag-
ing to get relevant links directly to the point one is
Table 2
Pages visited per week
                       Week 1   Week 2   Week 3   Week 4   Week 5   Week 6
# of pages visited         77       30       23        2       18        9
# of crossing pages         9        0        0        0        0        0

                       Week 7   Week 8   Week 9   Week 10  Week 11  Week 12
# of pages visited         15        4       11       26       78        0
# of crossing pages         0        0        0        0        6        0
This table shows the number of pages visited by users per week, as well as the number of crossing pages per week (pages that have been visited by two
different users). Crossing pages give us a measure of how much sharing of knowledge is occurring. This table was constructed from a Statistics database
table created particularly for the purpose of analyzing the traces.
272 G.Beydoun et al./Expert Systems with Applications 32 (2007) 265–276
searching for. Moreover, students started to cross one
another’s trails in Period D. This shows the power of the
inference engine of the tool. The inference engine started
taking action during this period to query the conceptual
lattice and present recommendations to the students
according to their search criteria.
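The crossing-page measure reported in Table 2 can be computed along the following lines. This is a sketch under assumptions: the `(user, url)` trace format and the function name are hypothetical, and "two different users" is read here as "at least two".

```python
from collections import defaultdict


def crossing_pages(visits):
    """Return the pages visited by at least two different users.

    `visits` is an iterable of (user, url) pairs; this trace format is assumed.
    """
    users_per_page = defaultdict(set)
    for user, url in visits:
        users_per_page[url].add(user)
    return {url for url, users in users_per_page.items() if len(users) >= 2}


week_visits = [
    ("alice", "http://example.org/a"),
    ("bob", "http://example.org/a"),
    ("alice", "http://example.org/b"),
    ("alice", "http://example.org/b"),  # same user twice: not a crossing
]
crossings = crossing_pages(week_visits)
```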
Throughout the 12-week term, the tool proved itself as an
essential ontology builder application. Ontologies were
automatically collected from users’ trails, and a
database of ontologies related to the domain of information
technology and its impact on the political and public
sectors has been created. The generic creation of
ontologies, however, tends to introduce a lot of noise and
syntactical mistakes in the keyword entry process. In our case,
this problem was overcome by consulting a dictionary
module of ontologies. The ontologies used by the dictionary
module are the same ones that the system has
collected from user traces. Using the dictionary module to
validate students’ search criteria minimizes the possibility
of keyword redundancy and mistakes. These ontologies,
which are in the form of keywords, will be discussed further
in the following section while discussing the lattice
structure.
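The text does not say how the dictionary module matches a typed keyword against the collected ones. One plausible mechanism, shown purely as an assumption, is fuzzy string matching with Python's standard `difflib`; the function name `validate_keyword` and the cutoff value are hypothetical.

```python
import difflib


def validate_keyword(raw, dictionary, cutoff=0.8):
    """Map a typed keyword onto an already collected one when they are close enough.

    `dictionary` is the set of keywords collected so far. An unknown keyword
    with no close match is added to it as a new entry.
    """
    candidate = raw.strip().lower()
    if candidate in dictionary:
        return candidate
    matches = difflib.get_close_matches(candidate, sorted(dictionary),
                                        n=1, cutoff=cutoff)
    if matches:
        return matches[0]  # treat the input as a misspelling of a known keyword
    dictionary.add(candidate)
    return candidate


known_keywords = {"e-government", "privacy", "democracy"}
fixed = validate_keyword("E-Goverment", known_keywords)  # misspelled input
new = validate_keyword("taxation", known_keywords)       # genuinely new keyword
```

Reusing the collected keywords as the dictionary, as the paper describes, means the vocabulary grows only when no near-duplicate already exists.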
4.3.Conceptual lattice construction
Traces (in XML exported files, Delta and Full) are
collected on a weekly basis. The first full exported file is fed to
the inference engine during Period C (Fig. 9) to generate
the first version of the conceptual lattice. Sixty user sessions
are collected, and 112 notes and 156 ratings are made. Fig. 10
shows how the conceptual lattice evolves during Period D
by measuring the number of keywords, pages and concepts;
concepts are scattered across eight levels (including the most
specific and the most general concepts). The final conceptual
lattice is formed of 255 pages, 98 keywords and 109 concepts.
Table 3 displays the characteristics of the concepts at each
level of the final lattice. Recall that a concept is formed of a set
of pages and a set of keywords (characterizing the session).
Level 1 contains the most general concept (see Table 3). This
concept has all the pages in its page set and the empty set as
its keyword set. At level 8 we have the most specialized
concept. This concept has the empty set as its page set and
all the keywords in its keyword set.
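To make the structure of a concept concrete, the sketch below enumerates all formal concepts of a tiny page-by-keyword context. It is a naive textbook enumeration (intents are the intersections of page keyword sets) and is not claimed to be the FCA algorithm used in KAPUST; the page names and keywords are made up.

```python
def formal_concepts(context):
    """Enumerate all (page set, keyword set) concepts of a page -> keywords context."""
    all_keywords = frozenset().union(*context.values())
    # Every keyword set of a concept is an intersection of page keyword sets;
    # the full keyword set belongs to the concept with the empty page set.
    intents = {all_keywords}
    for row in context.values():
        intents |= {frozenset(row) & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The page set holds exactly the pages annotated with all these keywords.
        extent = frozenset(p for p, kws in context.items() if intent <= kws)
        concepts.append((extent, intent))
    # Most general concept (largest page set) first, most specialized last.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


context = {
    "p1": {"privacy"},
    "p2": {"privacy", "e-voting"},
    "p3": {"e-voting", "security"},
}
concepts = formal_concepts(context)
for pages, keywords in concepts:
    print(sorted(pages), sorted(keywords))
```

In this toy context the first concept holds all three pages with an empty keyword set and the last holds no pages with all three keywords, mirroring levels 1 and 8 of Table 3.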
The fact that the lattice has several levels indicates
knowledge sharing by the students. It also means that the
[Fig. 9 graphic: Periods A to E laid out along the week axis (weeks 1 to 12).]
Fig. 9. Behavior of the tool over the 12 weeks: Period A represents the testing phase where students were getting familiar with the tool. In Period B, we
perceive trace collection and building up of the semantic web structure, and knowledge sharing between students through the notes and rating features of
the annotation tool. Period C represents the maturity of the semantic web structure and the construction of the first conceptual lattice. In Period D, the
functionality of the system as a whole is observed. In this period, the semantic web structure is continually updated online, the matrix table and the lattice
structure are incrementally updated and constructed on a weekly basis, students continue to share their experiences, and most importantly students are
given recommendations for each new browsing session they initiate. Period E represents the ending of experimentation and collection of data.
[Fig. 10 graphic: Keywords, Pages & Concepts of the Conceptual Lattice per Week. Plotted values:]
                                  Week 8   Week 9   Week 10  Week 11  Week 12
Accumulated number of keywords        51       72       79       94       98
Accumulated number of pages          116      154      178      240      255
Total number of concepts              59       78       86      104      109
Fig. 10. Analysis of conceptual lattice evolution: the accumulated numbers
of keywords and pages come from the Matrix Table, while the total
number of concepts comes from the Lattice Structure. The updates of the
lattice from week 9 to week 12 came from the exported traces showing
additional browsing experience (so-called ‘Delta’ traces).
Table 3
Analysis of conceptual lattice by level: the number of keywords per
concept increases, in reverse to the number of pages, as we move towards the
more specialized concepts
                     # of concepts   # keywords per concept   # pages per concept
Level 1 (general)                1                        0                   255
Level 2                         59                   1 to 2               1 to 22
Level 3                         21                   2 to 3               1 to 12
Level 4                         16                   3 to 5                1 to 9
Level 5                          6                   5 to 8                1 to 6
Level 6                          4                   8 to 9                1 to 4
Level 7                          1                       12                     1
Level 8 (specific)               1                       98                     0
tool has been successful in mapping concepts to the
corresponding pages from different user trails to generate new
concepts on various levels of the lattice structure. At level
7, the second most specialized level, we have one concept
which refers to the Google search engine in its page
set and which has 12 satisfied keywords. This concept is
not relevant, because the Google search engine is a general
site rather than a valid web page. This concept
resulted from improper usage of the tool and from testing
or noise data during Period A (see Fig. 9). Constructing the
lattice (Fig. 11) becomes time consuming as the number of
concepts increases (around 5 min for 109 concepts). This is
not crucial in our case, since we are performing the updates
on a weekly basis without hindering the students from
using the tool. However, if the system is to be deployed
in another domain where concepts grow fast and the lattice
has to be updated more frequently, then our FCA algorithm
may need to be reconsidered.
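How quickly construction time grows can be estimated from the five measurements plotted in Fig. 11. The sketch below fits a straight line to log(time) against log(number of concepts); with only five points the fitted exponent is indicative at best, and the helper name `loglog_slope` is ours.

```python
import math

# (number of concepts, construction time in seconds), weeks 8 to 12, from Fig. 11.
measurements = [(59, 83), (78, 146), (86, 181), (104, 266), (109, 292)]


def loglog_slope(points):
    """Least-squares slope of log(time) against log(number of concepts)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


seconds_per_concept = [t / n for n, t in measurements]
slope = loglog_slope(measurements)
print([round(s, 2) for s in seconds_per_concept])
print(round(slope, 2))
```

The cost per concept rises from about 1.4 s at 59 concepts to about 2.7 s at 109 concepts, and the fitted exponent is close to 2, i.e. construction time grows roughly quadratically with the number of concepts over this range.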
4.4.Discussion of results
The collected results are evidence for the suitability of reasoning
over user traces using KAPUST. Our FCA algorithm, used
to construct the matrix table and lattice, performed well in
our domain. The method we use to query the lattice provides
recommendations for the students in a categorized
way, and that gave the students a way to share their
knowledge with their fellow students. When a query for a set of
naming keywords finds no concept in the lattice
structure with an exact match, we take each keyword
individually and leave it to the user to judge the results. This is
like an ordinary keyword search and thus does not fully benefit
from the conceptual lattice.
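The exact-match query and its per-keyword fallback can be sketched as follows. The representation of the lattice as a list of `(pages, keywords)` pairs, the function name and the toy data are assumptions made for illustration.

```python
def query_lattice(concepts, query):
    """Return ("exact", pages) or ("per-keyword", {keyword: pages}).

    `concepts` is a list of (pages, keywords) pairs of frozensets.
    """
    query = frozenset(query)
    for pages, keywords in concepts:
        if keywords == query:
            return "exact", set(pages)
    # No concept matches the whole keyword set: one plain lookup per keyword,
    # leaving it to the user to judge the separate result lists.
    per_keyword = {}
    for kw in sorted(query):
        hits = set()
        for pages, keywords in concepts:
            if kw in keywords:
                hits |= pages
        per_keyword[kw] = hits
    return "per-keyword", per_keyword


concepts = [
    (frozenset({"p1", "p2", "p3"}), frozenset()),
    (frozenset({"p1", "p2"}), frozenset({"privacy"})),
    (frozenset({"p2", "p3"}), frozenset({"e-voting"})),
    (frozenset({"p2"}), frozenset({"privacy", "e-voting"})),
    (frozenset({"p3"}), frozenset({"e-voting", "security"})),
    (frozenset(), frozenset({"privacy", "e-voting", "security"})),
]
exact = query_lattice(concepts, {"privacy", "e-voting"})
fallback = query_lattice(concepts, {"privacy", "security"})
```

In the fallback case the two result lists are unrelated to each other, which is why this path resembles an ordinary keyword search.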
If KAPUST is deployed for another semester, we expect a
more complex lattice to be constructed. Moreover, students
in the following semester will get even more benefit from
the tool, as they will be searching an existing lattice
structure. This will further show the benefits of this approach in
an E-learning environment through knowledge sharing
among students who come from different generations.
Most concepts in our lattice are at level 2 (59 of the 109
concepts, Table 3), which is too general. Several factors relating
to the nature of our E-learning environment have led to this:
similarity in keywords and the fact that a new research
topic was introduced each week. Comparative research
assignments produced a deeper lattice. This leads to an
important question: how does the research task assigned to the
students impact the development of the conceptual
structure? We are planning to explore the relationship between
‘kinds’ of research assignments and semantic web
development. As for finding trails for a given “topic”, one can succeed
with relatively high relevance by simply searching with
appropriate keywords. The problem is to find the web
pages that contain the answer to the given “question”
from the given “topic”. A possible extension to KAPUST
towards this, currently explored by one of the authors
(Prof. Kultchitsky), is to recognize a page as a collection
of paragraphs, where each paragraph has its own keyword.
This will be a major extension to KAPUST.
In its current form, KAPUST provided an E-learning
environment which in turn gave the students and the
professor involved a means of sharing information and
experiences without requiring any paperwork. Even during the
early stage of evolution of the semantic web for the class,
the tool still contributed to the learning process. It assisted
in collecting web pages under the domain of IT, public
administration, and political science and created a
structure out of these web pages which accelerated the learning
of the class. At the pedagogical level, use of KAPUST
merges the reading and writing processes; a whole
new dimension of implicit discourse between students
evolves on top of the original documents, which lend
themselves to new multiple reading pathways and to
non-hierarchical and polyarchic structures without ruining the
integrity of any original research texts.
5.Summary and future work
In this paper, we developed and explored a technique to
help people mine the Internet more effectively. Instead of
categorizing text-based search results, as some tools
attempt, our work develops categories based on users from
a similar community with similar interests. Trails of users of similar
interests are processed to create a semantic web which is
then used to provide intelligent search responses. Our
approach to developing the semantic web is incremental, building
meta-data describing the relationships amongst web pages.
We exploited these dependencies in an intelligent search
engine to yield a more focused search result than a
standard text-based search. We developed an add-on tool to
work with existing internet browsers. Our tool collects
users’ traces and generates a conceptual structure from a
collection of similar traces. Our testbed community of users
is a class of undergraduate students. Our tool provided
an infrastructure for an E-learning environment. Our
[Fig. 11 graphic: Time needed per number of concepts to construct the lattice. Plotted values:]
                                                  Week 8   Week 9   Week 10  Week 11  Week 12
Number of concepts                                    59       78       86      104      109
Time to construct the Lattice Structure (in sec)      83      146      181      266      292
Fig. 11. Relative time needed to construct an increasing size lattice.
approach is innovative in that we use information which is
typically ignored by other intelligent browsers. In current
browsers, e.g. Explorer, the virtual surfing trail is not
commonly exploited and its use is limited to allowing backtracking in
a browsing session by a single user. The surfing trail is not
used to improve searching in future browsing sessions by
other users.
We have borrowed several rules from psychological
studies of social navigation on what makes a good navigation
system: we considered the integration of the tool with
the browser, the presence of a community to share
knowledge, and the proxemic space to provide a transparent
environment for the users. This has further made our tool
easier to use. Our tool requires very little training. Anyone
familiar with using an internet browser can directly
recognize its functionalities.
Our approach is non-intrusive to users, incremental,
and adaptive to any changes in the knowledge collected
from the users. We combined ideas from manual ontology
engineering and from automatic approaches based on
machine learning and data mining. We bypassed the need
for manual annotation of web pages. Our hypothesis in this
project has been that individuals in a given interest group
produce surfing trails that are of interest to the whole
group. Our software and our experiments support this
hypothesis: individuals of our interest
group, students studying political science, produced surfing
trails that are of interest to the whole group. Students’
interactions with the course notes have incrementally evolved
into a Semantic Web, describing the relationships amongst
contents of their course notes and ideas. These dependencies
were utilized by our intelligent search engine to yield
a more focused search result, for the students themselves
to use as they learn. The whole class was collectively
learning and evolving as the subject progressed. Relationships
between the pages and students’ notes were automatically
discovered by our intelligent software running in the
background, ensuring that the technology we used remains
non-intrusive and easy to use for non-technical students. Our
approach has bypassed difficulties associated with manual
annotation of web pages in two ways. First, we lowered
the effort required by restricting the input from the user
to only naming the session. This name is then applicable to
all subsequently visited pages. Secondly, we added a
dictionary module to ensure that similar names are associated
with the same surfing trails. In our approach users do
not have to agree on the use of names, and they can remain
focused on the browsing task.
The automatic part of the system which organizes the
trails and the keywords is based on Formal Concept
Analysis. The generated conceptual structure provides the
students with a user-friendly, natural presentation of the evolved
semantic web. The conceptual lattice structures the data from
the most general to the most specific concept. It relates
concepts to each other based on their intents and extents, thus
providing a means for creating new data that was not
directly perceived from the user trails. Moreover, querying
the lattice is an easy task, and it can vary between approaches
to make the most out of the structure. For instance, in our
method, if a user is searching for a certain concept, we provide
the user with a categorized result formed of the upper and
lower levels of the concept itself in order to gain more
insights about the extents constituting the concept.
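One way to read "the upper and lower levels of the concept" is as the concept's direct neighbours in the lattice order. The sketch below implements that reading, which is our assumption; the `(pages, keywords)` representation, the function name and the toy lattice are hypothetical.

```python
def neighbours(concepts, target):
    """Return the direct upper and lower neighbours of `target`.

    Concepts are (pages, keywords) pairs of frozensets, ordered by inclusion
    of their page sets: a more general concept has a larger page set.
    """
    pages = target[0]
    above = [c for c in concepts if pages < c[0]]
    below = [c for c in concepts if c[0] < pages]
    # Keep only direct neighbours: no other concept lies strictly in between.
    upper = [c for c in above if not any(pages < d[0] < c[0] for d in above)]
    lower = [c for c in below if not any(c[0] < d[0] < pages for d in below)]
    return upper, lower


concepts = [
    (frozenset({"p1", "p2", "p3"}), frozenset()),
    (frozenset({"p1", "p2"}), frozenset({"privacy"})),
    (frozenset({"p2", "p3"}), frozenset({"e-voting"})),
    (frozenset({"p2"}), frozenset({"privacy", "e-voting"})),
    (frozenset({"p3"}), frozenset({"e-voting", "security"})),
    (frozenset(), frozenset({"privacy", "e-voting", "security"})),
]
target = (frozenset({"p2"}), frozenset({"privacy", "e-voting"}))
upper, lower = neighbours(concepts, target)
```

Here the upper neighbours are the more general "privacy" and "e-voting" concepts, which offer the user additional pages, while the only lower neighbour is the most specialized concept with an empty page set.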
5.1.Future work
Our system, KAPUST, is a form of open multiagent
system. The external agents are the browsers (students in our
case) and the internal agents are the intelligence building
algorithms and the intelligent interface modules (e.g. the
dictionary). In this light, our framework for enhancing search
results in response to users evolves the behavior of the
electronic medium (info space) to improve its effectiveness for
the users (students in our case), and is in general terms a
form of multiagent system evolution. This form of
evolution in response to external agents’ behavior is of
importance to electronic market agent systems. For example,
keeping track of clients’ preferences in a non-intrusive
manner can be used to evolve market access and maintain
product lines (service products or otherwise). The non-intrusive
nature of our approach solves the problem of getting
feedback from customers who may not have any true interest
or inclination to give it. Towards this, we are currently
investigating what kind of information can be obtained
non-intrusively. In Beydoun, Debenham, and Hoffmann
(2004), we looked at evolving e-markets in response to
external evolutionary factors, the processes of other
electronic institutions. Our work in this paper is a form of
multiagent system evolution, in response to external agents.
How the two external evolutionary factors, external agents’
behavior and external processes in other e-markets, are
integrated into a single framework for evolving e-markets is
also current work in progress in Beydoun et al. (2004).
Recently, in Beydoun et al. (2005), we have devised
mechanisms to evaluate the result of cooperative modeling to
integrate with KAPUST. We are in the process of integrating
those mechanisms to devise a measure of trust users can
invest in the developing conceptual structure.
References
Benjamins, V. R., Fensel, D., Decker, S., & Perez, A. G. (1999). (KA)2:
Building ontologies for the internet: a mid term report. International
Journal of Human–Computer Studies, 51, 687–712.
Beydoun, G., Debenham, J., & Hoffmann, A. (2004). Integrating agents
roles using messaging structure. In Pacific Rim multiagent system
workshop (PRIMA2004), Auckland, Auckland University.
Beydoun, G., Hoffmann, A., Breis, J. T. F., Béjar, R. M., Valencia-Garcia,
R., & Aurum, A. (2005). Cooperative modeling evaluated. International
Journal of Cooperative Information Systems, World Scientific,
14(1), 45–71.
Blunt, R., & Ahearn, C. (2000). Creating a virtual learning community. In
The sixth international conference on asynchronous learning networks
(ALN2000), Maryland.
Davies, J., Fensel, D., & Harmelen, F. V. (Eds.). (2003). Towards the
semantic web: Ontology-driven knowledge management. Wiley: London.
Dieberger, A. (1997). Supporting social navigation on the world wide web.
International Journal of Human–Computer Studies, 46(6), 805–825.
Forsberg, M., Hook, K., & Svensson, M. (2001). Design principles for
social navigation support. In C. Stephanidis (Ed.), User interfaces for
all. Stockholm: Lawrence Erlbaum Assoc.
Ganter, B., & Wille, R. (1999). Formal concept analysis: Mathematical
foundations. Springer-Verlag.
Hendler, J. (2001). Agents and the semantic web. IEEE Intelligent
Systems, 16(2).
Horton, W., & Horton, K. (2003). E-learning tools and technologies. John
Wiley and Sons.
Kahan, J., Koivunen, M.-R., Prud’Hommeaux, E., & Swick, R. R. (2001).
An open RDF infrastructure for shared web annotations. In Tenth
international world wide web conference (WWW10), Hong Kong.
Kim, M. H., & Compton, P. (2001). Formal concept analysis for
domain-specific document retrieval systems. In 14th biennial conference of the
Canadian society for computational studies of intelligence (AI 2001).
Ottawa: Springer.
Lacher, M. S., & Decker, S. (2002). RDF, topic maps, and the semantic
web, markup languages: Theory and practice. MIT Press.
Laus, F. O. (2001). Tracing user interactions on world-wide webpages.
Psychologisches Institut III. Germany, Westfälische
Wilhelms-Universität.
Quan, D., Huynh, D., & Karger, D. R. (2003). Haystack: A platform for
authoring end user semantic web applications. In The twelfth
international world wide web conference, Budapest, Hungary.
Sure, Y., Erdmann, M., Angele, J., Staab, S., Studer, R., & Wenke, D.
(2002). OntoEdit: Collaborative ontology development for the
semantic web. In The first international semantic web conference 2002 (ISWC
2002), Sardinia, Italy.
Wexelblat, A. (1999). History-based tools for navigation. In IEEE’s 32nd
Hawai’i international conference on system sciences (HICSS’99).
Hawaii: IEEE Computer Society Press.
Wexelblat, A., & Maes, P. (1999). Footprints: History-rich tools for
information foraging. In Conference on human factors in computing
systems (CHI’99), Pittsburgh.