Cs-575

Nov 15, 2013

Web Mining


by: Katharotiya Manthan

Overview


Web Mining


Semantic Web


Ontologies


Semantic Web Mining


Future Work


References

Problems with Web Interaction


Finding Relevant Information


Creating New Knowledge Using Existing Resources


Personalization of Information


Learning about Consumers or Individual Users


Web Mining


The term was coined by Oren Etzioni (1996)


Application of data mining techniques to automatically discover and extract information from Web documents and services



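Decomposing Web Mining into Subtasks


Resource finding: retrieving the intended Web documents


Information selection and pre-processing: selecting and pre-processing specific information from the retrieved resources


Generalization: discovering general patterns at individual Web sites and across multiple sites


Analysis: validating and interpreting the mined patterns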
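The four subtasks can be illustrated with a toy text-mining pipeline. This is a sketch under assumptions, not Etzioni's method: the corpus, the URL-prefix filter used for resource finding, the tiny stopword list and the document-frequency threshold used for analysis are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}  # tiny illustrative list


def find_resources(corpus, prefix):
    """Resource finding: retrieve the documents whose URL starts with `prefix`."""
    return {url: text for url, text in corpus.items() if url.startswith(prefix)}


def select_and_preprocess(docs):
    """Selection and pre-processing: lowercase, tokenize, drop stopwords."""
    return {url: [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
            for url, text in docs.items()}


def generalize(tokens_by_doc):
    """Generalization: count how many documents contain each term."""
    return Counter(t for tokens in tokens_by_doc.values() for t in set(tokens))


def analyze(doc_freq, n_docs, min_support=0.5):
    """Analysis: keep terms found in at least `min_support` of the documents."""
    return sorted(t for t, c in doc_freq.items() if c / n_docs >= min_support)


# Hypothetical mini-corpus keyed by URL.
corpus = {
    "http://example.com/company/product1": "A digital camera for the web",
    "http://example.com/company/product2": "Camera lenses and tripods",
    "http://example.com/about": "About the company",
}
docs = find_resources(corpus, "http://example.com/company/")
tokens = select_and_preprocess(docs)
print(analyze(generalize(tokens), len(docs), min_support=1.0))  # ['camera']
```

Each function maps to one subtask, so any stage can be swapped for a more realistic technique without touching the others.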

Types of Web Mining


Web Usage Mining


Web Content Mining


Web Structure Mining

Data Mining vs. Web Mining


Traditional data mining


Data is structured and relational


Well-defined tables, columns, rows, keys, and constraints


Web data


Semi-structured and unstructured


Readily available data


Rich in features and patterns


Web Structure Mining


Generates a structural summary of Web sites and Web pages


Extraction of patterns from hyperlinks


Mining of the internal structure of documents (e.g., HTML or XML tag structure)
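The slide does not name a specific algorithm for extracting patterns from hyperlinks. One well-known link-analysis technique is PageRank; below is a minimal power-iteration sketch, assuming the site's link graph is given as a dict from each page to the pages it links to. The damping factor 0.85 and the fixed iteration count are conventional choices, not values from this presentation, and the site graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical site structure.
site = {
    "/company": ["/company/products", "/company/new"],
    "/company/new": ["/company/products"],
    "/company/products": ["/company/product1", "/company/product2"],
    "/company/product1": ["/company"],
    "/company/product2": ["/company"],
}
for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```

Because each iteration redistributes exactly the total rank, the scores always sum to 1, so they can be read as the long-run share of visits a random surfer pays to each page.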




Web Usage Mining


Discovering user ‘navigation patterns’ from web data


Predicting user behavior while the user interacts with the web


Helps improve large collections of Web resources


Usage Mining Techniques


Data Preparation: data collection, data selection, data cleaning


Data Mining: navigation patterns, sequential patterns
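The data-preparation steps can be sketched for raw Web server logs. This minimal sketch assumes the Apache Common Log Format, treats requests from one host as one user, starts a new session after 30 minutes of inactivity, and discards image, stylesheet and script requests; these are common heuristics chosen for illustration, not steps prescribed in this presentation. The log lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format, e.g.
# 10.0.0.1 - - [15/Nov/2013:10:00:00 +0000] "GET /company HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumption
SESSION_GAP = timedelta(minutes=30)                          # assumption


def collect(lines):
    """Data collection: parse raw log lines into dicts, skipping malformed ones."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            rec = m.groupdict()
            rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
            yield rec


def select_and_clean(records):
    """Data selection and cleaning: keep successful GET requests for pages."""
    for rec in records:
        if rec["method"] != "GET" or rec["status"] != "200":
            continue
        if rec["path"].lower().endswith(IGNORED_SUFFIXES):
            continue
        yield rec


def sessionize(records):
    """Group each host's requests into sessions separated by long idle gaps."""
    by_host = {}
    for rec in sorted(records, key=lambda r: (r["host"], r["time"])):
        by_host.setdefault(rec["host"], []).append(rec)
    sessions = []
    for recs in by_host.values():
        current = [recs[0]["path"]]
        for prev, rec in zip(recs, recs[1:]):
            if rec["time"] - prev["time"] > SESSION_GAP:
                sessions.append(current)
                current = []
            current.append(rec["path"])
        sessions.append(current)
    return sessions


log = [
    '10.0.0.1 - - [15/Nov/2013:10:00:00 +0000] "GET /company HTTP/1.1" 200 512',
    '10.0.0.1 - - [15/Nov/2013:10:00:05 +0000] "GET /company/logo.gif HTTP/1.1" 200 100',
    '10.0.0.1 - - [15/Nov/2013:10:01:00 +0000] "GET /company/products HTTP/1.1" 200 900',
    '10.0.0.1 - - [15/Nov/2013:11:00:00 +0000] "GET /company/product1 HTTP/1.1" 200 700',
    '10.0.0.2 - - [15/Nov/2013:10:02:00 +0000] "GET /company/new HTTP/1.1" 404 0',
    'malformed line',
]
print(sessionize(select_and_clean(collect(log))))
# [['/company', '/company/products'], ['/company/product1']]
```

Keeping the three steps as separate generators makes it easy to change one heuristic (for example the idle timeout) without rewriting the others.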




Data Mining Techniques


Navigation Patterns


Example:


70% of users who accessed /company/product2 did so by starting at /company and proceeding through /company/new, /company/products and /company/product1


80% of users who accessed the site started from /company/products


65% of users left the site after four or fewer page references
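The three navigation-pattern statistics in this example can be computed directly from sessionized usage data. A minimal sketch, assuming each visit is already available as an ordered list of page paths and treating one session as one user (a simplification); the function names and session data are hypothetical.

```python
def start_share(sessions, page):
    """Fraction of sessions whose first page is `page`."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s and s[0] == page) / len(sessions)


def short_session_share(sessions, max_pages=4):
    """Fraction of sessions that ended after at most `max_pages` page references."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) <= max_pages) / len(sessions)


def path_share(sessions, target, path):
    """Among sessions that reach `target`, the fraction that reached it by
    following exactly `path` from the start of the session."""
    reaching = [s for s in sessions if target in s]
    if not reaching:
        return 0.0
    hits = sum(1 for s in reaching if s[:s.index(target)] == path)
    return hits / len(reaching)


# Hypothetical sessions, one ordered list of page paths per visit.
sessions = [
    ["/company", "/company/new", "/company/products", "/company/product1", "/company/product2"],
    ["/company/products", "/company/product2"],
    ["/company/products", "/company/product1"],
    ["/company/products"],
]
print(path_share(sessions, "/company/product2",
                 ["/company", "/company/new", "/company/products", "/company/product1"]))  # 0.5
print(start_share(sessions, "/company/products"))  # 0.75
print(short_session_share(sessions))               # 0.75
```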




Cont…


Sequential Patterns


30% of users who visited /company/product/ had searched Google for ‘camera’ within the past week


60% of users who placed an online order in /company/product1 also placed an order in /company/product4 within 15 days
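The second sequential pattern is a time-constrained "A then B" rule. A minimal sketch of how its percentage could be measured, assuming orders are available as (user, page, date) records; the function name and the order data are hypothetical, and "within 15 days" is taken to include day 15.

```python
from datetime import date, timedelta


def followup_share(orders, first, then, window_days=15):
    """Among users who ordered `first`, the fraction who also ordered `then`
    on the same day or up to `window_days` days after some `first` order."""
    by_user = {}
    for user, item, day in orders:
        by_user.setdefault(user, []).append((item, day))
    buyers = [u for u, items in by_user.items() if any(i == first for i, _ in items)]
    if not buyers:
        return 0.0
    window = timedelta(days=window_days)

    def follows(items):
        first_days = [d for i, d in items if i == first]
        then_days = [d for i, d in items if i == then]
        return any(d0 <= d1 <= d0 + window for d0 in first_days for d1 in then_days)

    return sum(1 for u in buyers if follows(by_user[u])) / len(buyers)


# Hypothetical order records.
orders = [
    ("u1", "/company/product1", date(2013, 11, 1)),
    ("u1", "/company/product4", date(2013, 11, 10)),  # 9 days later: counts
    ("u2", "/company/product1", date(2013, 11, 1)),
    ("u2", "/company/product4", date(2013, 11, 30)),  # 29 days later: too late
    ("u3", "/company/product4", date(2013, 11, 2)),   # never ordered product1
    ("u4", "/company/product1", date(2013, 11, 5)),
    ("u4", "/company/product4", date(2013, 11, 20)),  # exactly 15 days: counts
    ("u5", "/company/product1", date(2013, 11, 5)),
]
print(followup_share(orders, "/company/product1", "/company/product4"))  # 0.5
```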


Web Content Mining


The process of information or resource discovery from the content of millions of sources across the World Wide Web


Web data content includes text, images, audio, video, metadata and hyperlinks


Goes beyond keyword extraction or simple statistics of words and phrases in documents



Semantic Web


The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information and services on the web is defined, making it possible for the web to "understand" and satisfy the requests of people and machines to use the web content.

XML, RDF and Web Data


Structured and Unstructured Data


W3C Standards for RDF


Semantic Web: Different Kinds of Databases


Tight Coupling and Loose Coupling

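RDF - Resource Description Framework


The data model consists of three object types:


Resources: the things described by RDF expressions, each identified by a URI


Properties: specific aspects, attributes or relations used to describe a resource


Statements: a resource together with a named property and the value of that property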


Example


Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila


This sentence has the following parts:


Subject (Resource): http://www.w3.org/Home/Lassila


Predicate (Property): Creator


Object (Literal): "Ora Lassila"


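The Lassila statement maps directly onto an RDF triple. A minimal sketch that models a statement as a (subject, predicate, object) tuple and writes it as one line in the W3C N-Triples format; the property URI http://example.org/schema/Creator is a hypothetical stand-in for "Creator", and the escaping covers only backslashes and double quotes.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # the resource, identified by a URI
    predicate: str  # the property, identified by a URI
    obj: str        # the value; here a plain literal string

    def to_ntriples(self) -> str:
        """Serialize as one N-Triples line (URI subject and predicate, literal object).
        Only backslashes and double quotes are escaped; newlines are not handled."""
        literal = self.obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{self.subject}> <{self.predicate}> "{literal}" .'


# Hypothetical property URI standing in for "Creator".
CREATOR = "http://example.org/schema/Creator"

stmt = Statement("http://www.w3.org/Home/Lassila", CREATOR, "Ora Lassila")
print(stmt.to_ntriples())
# <http://www.w3.org/Home/Lassila> <http://example.org/schema/Creator> "Ora Lassila" .
```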

Ontologies


Ontologies are developed to provide machine-processable semantics of information sources that can be communicated between different agents (software and humans).


Developing an Ontology


Defining classes in the ontology


Arranging the classes in a taxonomic (subclass-superclass) hierarchy


Defining slots and describing allowed values for these slots


Filling in the values of slots for instances
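The four steps can be illustrated with a small in-memory model. This is a sketch under assumptions: the classes OntClass and Instance, the use of Python types as a slot's "allowed values", and the example classes Document and WebPage are all hypothetical and are not part of any ontology language such as RDF Schema or OWL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    """A class in a toy ontology (hypothetical structure, not RDFS/OWL)."""
    name: str
    parent: Optional["OntClass"] = None          # superclass in the taxonomy
    slots: dict = field(default_factory=dict)    # slot name -> allowed Python type

    def all_slots(self) -> dict:
        """Slots declared on this class plus those inherited from superclasses."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}

    def is_a(self, other: "OntClass") -> bool:
        """True if this class is `other` or a (transitive) subclass of it."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False


@dataclass
class Instance:
    """An individual whose slot values are checked against its class."""
    cls: OntClass
    values: dict

    def __post_init__(self):
        allowed = self.cls.all_slots()
        for slot, value in self.values.items():
            if slot not in allowed:
                raise ValueError(f"{self.cls.name} has no slot {slot!r}")
            if not isinstance(value, allowed[slot]):
                raise TypeError(f"slot {slot!r} expects {allowed[slot].__name__}")


# Steps 1-3: define classes, arrange them in a hierarchy, declare slots.
document = OntClass("Document", slots={"title": str, "author": str})
web_page = OntClass("WebPage", parent=document, slots={"url": str})

# Step 4: fill in slot values for an instance.
lassila_page = Instance(web_page, {"author": "Ora Lassila",
                                   "url": "http://www.w3.org/Home/Lassila"})
```

Slot inheritance through the superclass chain is what makes the taxonomy in step 2 useful: WebPage instances may use the title and author slots declared on Document.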



Semantic Web Mining


Closing the gap between the Semantic Web and Web Mining


Use of ontologies

Mining the Semantics in the Web

Evaluation of Semantic Web Mining


Web Mining vs. Semantic Web Mining


A Note on E-Commerce






Research initiatives


Vivísimo proposes a clustering approach for web document organization


Haveliwala et al. propose a methodology for evaluating strategies for similarity search on the Web


Jaccard coefficient
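The Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|, which makes it a natural similarity measure between documents viewed as sets of words or word sequences (shingles). A minimal sketch, assuming documents are reduced to lowercase alphanumeric word shingles; this representation is chosen for illustration and is not necessarily the one used by Haveliwala et al. The two documents are hypothetical.

```python
import re


def shingles(text, k=1):
    """Set of k-word shingles; k=1 gives the set of distinct words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "Web mining finds patterns"
doc2 = "Semantic web mining finds patterns"
print(jaccard(shingles(doc1), shingles(doc2)))  # 4 shared words / 5 distinct words = 0.8
```

Larger shingles (k=2 or more) make the measure sensitive to word order as well as vocabulary.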

Future Work


The utility of web mining can be demonstrated by making exploratory changes to web sites, e.g., adding links from hot (heavily visited) parts of a web site to cold (rarely visited) parts, and then extracting, visualizing and interpreting the resulting changes in access patterns.
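A minimal sketch of the measurement side of such an experiment, assuming per-page hit counts have already been extracted from the logs for one period before and one period after the change. Defining "hot" and "cold" as the k most and least visited pages, and measuring the relative change in hits, are assumptions made here for illustration; the page names and counts are hypothetical.

```python
def hot_and_cold(hits, k=1):
    """Return the k most visited ("hot") and k least visited ("cold") pages."""
    ranked = sorted(hits, key=hits.get, reverse=True)
    return ranked[:k], ranked[-k:]


def access_change(before, after):
    """Relative change in hits per page between two periods (pages with 0 hits skipped)."""
    return {page: (after.get(page, 0) - n) / n for page, n in before.items() if n > 0}


# Hypothetical hit counts before and after adding a link from the hot page
# to the cold page.
before = {"/company": 500, "/company/products": 300, "/company/product4": 20}
after = {"/company": 480, "/company/products": 310, "/company/product4": 60}

hot, cold = hot_and_cold(before)
change = access_change(before, after)
print(hot, cold)                    # ['/company'] ['/company/product4']
print(change["/company/product4"])  # 2.0, i.e. three times as many hits
```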


Cont…


There is often a tension in the design of algorithms between accommodating a wide range of data and customizing the algorithm to capitalize on known constraints or regularities.


Web content mining could also be incorporated into implementations of this architecture.


References


http://en.wikipedia.org/wiki/Web_mining


http://www.engr.sjsu.edu/meirinaki/papers/NEMIS.pdf


http://www.w3.org


http://www.cs.washington.edu/research/projects/WebWare1/www/softbots/papers/agents97.pdf


http://infomesh.net/2001/swintro/


http://www.ksl.stanford.edu/people/dlm/etai/etai-abstract.html