Web Intelligence (WI)

Oct 21, 2013

Definition, Research Challenges, and Major Tools

Yang Chen

UNC Charlotte

Outline


A brief history of Web Intelligence


Motivations for WI


Definition and Perspectives of WI


Research Agenda


Major Web Intelligence Tools


Conclusion

A Brief History of WI


1999: Collaborative research initiatives


Ning Zhong, Data Mining and Knowledge Systems


Jiming Liu, Intelligent agents and multi-agents

Yiyu Yao, Information retrieval and intelligent information systems

Combined research efforts with a common goal: create a new sub-discipline covering theories and techniques related to Web information.



2000: Publication of a two-page position paper on WI (Zhong, Liu, Yao, Ohsuga, COMPSAC 2000)

2001: First Asia-Pacific Conference on Web Intelligence

2002: Publication of first special issue on WI in IEEE Computer


2002: Web Intelligence Consortium


2003: First edited book on WI


2005: The international WIC Institute



Motivation


The sheer size of the Web

Difficulties in the storage, management, and efficient and effective retrieval of Web information

Complexity of the Web

Heterogeneous collection of structured, unstructured, semi-structured, interrelated, and distributed Web documents

Consists of text, images, and sounds


Web Intelligence on the Web

Industrial Interests in WI


Web Intelligence
kis-lab.com/wi01/

Web-Intelligence Home Page
www.web-intelligence.com/

Intelligence on the Web
www.fas.org/irp/intelwww.html

WIN: home WEB INTELLIGENCE NETWORK
smarter.net/

CatchTheWeb - Web Research, Web Intelligence Collaboration
www.catchtheweb.com/

Infonoia: Web Intelligence In Your Hands
www.infonoia.com/myagent/en/baseframe.html

Motivations


Data production on the Web is growing at an exponential rate.

Industrial interest in WI is growing fast.

Only a few academic papers have been published.

We need to narrow the gap between industry needs and academic research.


What is Web Intelligence?

Web Intelligence (WI) exploits the fundamental and practical impact that advanced Information Technology (IT) and innovative Artificial Intelligence (AI) will have on the Web:





Integration of IT with AI


Applications of AI on the Web

Web Intelligence System

Based on Zhong's AWIC03 keynote talk

An Example

Advanced Questions


How does the customer enter the VIP portal, so that we can target products and manage promotions and marketing campaigns?

What is the semantic association between the pages the customer visited?

Is the visitor familiar with the Web structure, or is he or she a new or random user?

Is the visitor a Web robot or a human user?






Advanced WI System


Making a dynamic recommendation to a Web user based on the user profile and usage behavior;

Automatic modification of a website’s contents and organization;

Combining Web usage data with marketing data to give information about how visitors used a website.
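
As a rough illustration of a dynamic recommendation driven by usage behavior, the sketch below counts which pages are visited together in past sessions and suggests the most frequent companions of the pages a user has already seen. The session data, page paths, and function names are hypothetical, and co-visitation counting is only one simple technique among many; this is not the system described in the keynote.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_covisit_counts(sessions):
    """Count how often each pair of pages appears in the same session."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, visited, k=2):
    """Suggest up to k unvisited pages most often co-visited with `visited`."""
    scores = Counter()
    for page in set(visited):
        for other, n in counts[page].items():
            if other not in visited:
                scores[other] += n
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:k]]

# Hypothetical usage log: one list of visited pages per session.
SESSIONS = [
    ["/home", "/laptops", "/laptop-bags"],
    ["/home", "/laptops", "/mice"],
    ["/laptops", "/laptop-bags"],
    ["/home", "/phones"],
]

counts = build_covisit_counts(SESSIONS)
print(recommend(counts, ["/laptops"]))
```

A real system would also fold in the user profile and marketing data mentioned above; here only the usage log is used.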


Perspectives of WI


WI can be classified into four categories (based on Russell & Norvig's scheme)


Research Agenda of WI


Semantic Web mining and automatic construction of ontologies


Social network intelligence

The Semantic Web


The Semantic Web is based on languages that make more of the semantic content of the page available in machine-readable formats for agent-based computing.

A “semantic” language that ties the information on a page to machine-readable semantics (ontology).

Components of the Semantic Web

A unifying data model such as RDF.

Languages with defined semantics, built on RDF, such as OWL (DAML+OIL).

Ontologies of standardized terminology for marking up Web resources.

Tools that assist the generation and processing of semantic markup.
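
To make the RDF data model concrete, here is a minimal sketch that represents statements as (subject, predicate, object) triples and answers simple pattern queries over them. The `http://example.org/` namespace, the resource names, and the helper functions are assumptions made for illustration; only the `rdf:type` URI is the standard one. A real application would more likely use an RDF library than plain tuples.

```python
# EX is a hypothetical example namespace; RDF_TYPE is the standard rdf:type URI.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# An RDF graph is a set of (subject, predicate, object) statements.
TRIPLES = {
    (EX + "page1", RDF_TYPE, EX + "ProductPage"),
    (EX + "page1", EX + "describes", EX + "laptop42"),
    (EX + "laptop42", RDF_TYPE, EX + "Laptop"),
    (EX + "laptop42", EX + "price", "999"),
}

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def types_of_described_things(triples, page):
    """Follow page --describes--> thing --rdf:type--> class."""
    classes = set()
    for _, _, thing in match(triples, s=page, p=EX + "describes"):
        for _, _, cls in match(triples, s=thing, p=RDF_TYPE):
            classes.add(cls)
    return sorted(classes)

print(types_of_described_things(TRIPLES, EX + "page1"))
```

Because every statement has the same three-part shape, an agent can chain such lookups without knowing anything about how the page is laid out.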


Ontologies provide the semantic backbone for Semantic Web applications.

Ontologies offer:

Communication: normative models, networks of relationships

Sharing & Reuse: specifications, reliability

Control: classification; finding, sharing, and discovering relationships


Categories of Ontologies


A domain-specific ontology describes a well-defined technical or business domain.

A task ontology might be either domain-specific or reconstructed from a set of domain-specific ontologies for meeting the requirement of a task.

A universal ontology describes knowledge at higher levels.


The Web as a Graph


We can view the Web as a directed social network that connects people (organizations or social entities).

Research Questions:

How big is the graph? (outdegree and indegree)

Can we browse from any page to any other? (clicks)

Can we exploit the structure of the Web? (searching and mining)

How can we discover and manage Web communities?

What does the Web graph reveal about social dynamics?
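
The first two research questions can be sketched on a toy graph: the code below computes outdegree and indegree for each page and uses breadth-first search to find the minimum number of clicks between two pages. The four-page link structure and the function names are made up for illustration.

```python
from collections import deque

# Hypothetical Web graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def out_degrees(links):
    """Number of distinct pages each page links to."""
    return {page: len(set(targets)) for page, targets in links.items()}

def in_degrees(links):
    """Number of distinct pages linking to each page."""
    degrees = {page: 0 for page in links}
    for targets in links.values():
        for target in set(targets):
            degrees[target] = degrees.get(target, 0) + 1
    return degrees

def min_clicks(links, start, goal):
    """Fewest link clicks from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(in_degrees(LINKS))
print(min_clicks(LINKS, "D", "B"))  # D -> C -> A -> B
print(min_clicks(LINKS, "A", "D"))  # no page links to D
```

Page D shows why the answer to "can we browse from any page to any other?" is generally no: a page with indegree 0 cannot be reached by clicking.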

Social Network Intelligence

Social Network


Major Web Intelligence Tools


I. Collection


Offline Explorer


SpidersRUs (AI Lab)


Google Scholar



II. Analysis (Data and Text Mining)


Google APIs


Google Translation


GATE


Arizona Noun Phraser (AI Lab)


Self-Organizing Map, SOM (AI Lab)


Weka



III. Visualization


NetDraw


JUNG


Analyst’s Notebook and Starlight

Collection: Offline Explorer

[Screenshot: Offline Explorer, showing the project list and the project properties setup window, with callouts for download URLs, download level, file modification check, and file filters, URL filters, and other advanced properties.]

Analysis: Google APIs

Google provides many APIs to help you quickly develop your own applications.

http://code.google.com/more/

Examples of Google APIs:

Google API for Inlink: Discovers what pages link to your website.

Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets and Picasa Web Albums.

Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.

Google Analytics: Allows users to gather, view, and analyze data about their Website traffic. Users can see which content gets the most visits, average page views and time on site for visits.

Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly-updated blacklists of suspected phishing and malware pages.

YouTube Data API: Integrates online videos from YouTube into your applications.


GATE


Information Extraction tasks:

Named Entity Recognition (NE): Finds names, places, dates, etc.

Co-reference Resolution (CO): Identifies identity relations between entities in texts.

Template Element Construction (TE): Adds descriptive information to NE results (using CO).

Template Relation Construction (TR): Finds relations between TE entities.

Scenario Template Production (ST): Fits TE and TR results into specified event scenarios.

GATE also includes:

Parsers, stemmers, and Information Retrieval tools;

Tools for visualizing and manipulating ontologies; and

Evaluation and benchmarking tools.
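
To give a feel for the first task, Named Entity Recognition, the toy sketch below tags entities using a small lookup list (a gazetteer) plus a regular expression for years. It is written in plain Python and does not use GATE or its API; the gazetteer contents, labels, and function name are assumptions chosen for illustration, and real NE components rely on far richer rules and resources.

```python
import re

# Hypothetical gazetteer: known names grouped by entity label.
GAZETTEER = {
    "Person": ["Ning Zhong", "Jiming Liu", "Yiyu Yao"],
    "Organization": ["UNC Charlotte", "University of Arizona"],
}

# Very rough date pattern: four-digit years from 1900 to 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                found.append((m.start(), m.group(), label))
    for m in YEAR.finditer(text):
        found.append((m.start(), m.group(), "Date"))
    return [(entity, label) for _, entity, label in sorted(found)]

print(find_entities("In 1999, Ning Zhong and Yiyu Yao began collaborating."))
```

The later tasks (CO, TE, TR, ST) build on this output, for example by linking "he" back to a recognized person or by filling an event template from the tagged entities.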


[Screenshot: GATE interface, with callouts for results display, attributes, and project information.]

SOM


The multi-level self-organizing map neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona.

Using a 2D map display, similar topics are positioned closer together according to their co-occurrence patterns; more important topics occupy larger regions.
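
The idea of placing similar items close together on a 2D map can be sketched with a basic, single-level self-organizing map. This is the textbook SOM update rule, not the AI Lab's multi-level algorithm; the document vectors, grid size, learning-rate schedule, and function names are all assumptions made for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        decay = 1 - epoch / epochs          # shrinks from 1 toward 0
        lr = lr0 * decay                    # learning rate
        radius = max(radius0 * decay, 0.5)  # neighborhood radius
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                # Pull this cell toward x; cells near the winner move more.
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights

# Hypothetical documents as term-weight vectors (two rough topics).
DOCS = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]

som = train_som(DOCS, rows=3, cols=3)
for doc in DOCS:
    print(doc, "->", best_matching_unit(som, doc))
```

After training, documents with similar vectors typically land on the same or neighboring cells, which is what lets a map display group related topics into regions.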

[Screenshot: SOM topic map. Each region is a topic, labeled with the topic name and the number of documents belonging to it; different regions show different topics; warm colors represent new topics.]

Visualization: JUNG

The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed by the School of Information and Computer Science at the University of California, Irvine.

http://jung.sourceforge.net/index.html

The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis:

Clustering

Decomposition

Optimization

Random Graph Generation

Statistical Analysis

Calculation of Network Distances and Flows and Importance Measures (Centrality, PageRank, HITS, etc.).
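
As an illustration of one of the importance measures listed above, here is a small PageRank computed by power iteration. It is a plain Python sketch, not JUNG code and not JUNG's API; the toy link graph, the damping factor of 0.85, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return {page: rank} for a graph given as page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a base share from the random-jump term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical Web graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects links from three pages and ends up with the highest rank, while D, which nothing links to, keeps only the base share.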


[Figure: JUNG, examples of visualization types.]


Conclusion


The marriage of hypertext and the Internet led to a revolution: the Web.

The marriage of Artificial Intelligence and Advanced Information Technology, on the platform of the Web, will lead to another paradigm shift: the Intelligent and Wisdom Web.

Thank You

Any Questions?