An Empirical Investigation of Learning from the Semantic Web

Pete Edwards

Gunnar AAstrand Grimnes

Alun Preece


Computing Science Department

University of Aberdeen

{pedwards, ggrimnes, apreece}@csd.abdn.ac.uk


Semantic Web Mining Workshop @ ECML 2002



Motivation



The Semantic Web should:


Facilitate learning from the Web.


Facilitate reuse of learning outcomes.


Hypothesis: Learning from semantically marked-up data should outperform learning from plain text.




Methods



Compare the performance of learning from plain text and from semantic metadata.


Use traditional ML algorithms as a baseline approach:


Naïve Bayes


K-Nearest Neighbour


Explore the application of more knowledge-intensive approaches, such as ILP.
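
As an illustration of the kind of baseline listed above, here is a minimal sketch of a multinomial Naïve Bayes classifier over bag-of-words instances, with Laplace smoothing. It is not the implementation used in the experiments; the function names, the toy training data and the class labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count class and word frequencies from (words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in examples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return {"classes": class_counts, "words": word_counts, "vocab": vocabulary}


def classify(model, words):
    """Return the label with the highest smoothed log-probability."""
    total_docs = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["classes"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # class prior
        for word in words:
            # Laplace (add-one) smoothing for unseen words
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from either dataset.
TRAINING = [
    (["agent", "mobile", "kqml"], "agents"),
    (["agent", "bdi"], "agents"),
    (["learning", "case", "based"], "ml"),
    (["learning", "mining"], "ml"),
]
model = train_naive_bayes(TRAINING)
prediction = classify(model, ["mobile", "agent"])
```

The same `classify` call works whether the word list comes from plain text or from metadata, which is what makes it usable as a common baseline across representations.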


Datasets



The Semantic Web is still in its infancy, so available datasets are limited.


Need a dataset with instances represented both in plain text and in some semantic markup language.


Forced to use artificial datasets.


No ontological support.


ITTalks

http://ittalks.org



ITTalks is a real Semantic Web application.


Information about seminars at Universities
in the US.


The plain HTML and DAML+OIL versions of each talk have slightly different, but largely overlapping, content.


The data carries no classification, so we labelled it by personal preference.


ITTalks example

<rdf:RDF>
  <rdf:Description about="http://www.ittalks.org/jsp/Controller.jsp?action=ViewTalk&amp;as=HTML&amp;talkid=20010620141011">
    <Talk rdf:parseType="Resource">
      <Title>PROBABILISTIC OPTIMIZATION TECHNIQUES FOR MULTICAST KEY MANAGEMENT … </Title>
      <Abstract>Multicast is a key technology to support large group communications over the Internet… </Abstract>
      <BeginTime>
        <time:Year>2001</time:Year>
        <time:Month>06</time:Month><time:Day>20</time:Day> ...
      </BeginTime>
      ...
      <Audience>General Public</Audience>
      <DomainName>umbc</DomainName>
      <Location rdf:parseType="Resource">
        <Institution>UMBC</Institution>
      </Location>
      <Speaker rdf:parseType="Resource">
        <Name>Ali Selcuk</Name>
        <Organization>UMBC</Organization>
        <Email>aselcu1@csee.umbc.edu</Email>
      </Speaker>
    </Talk>
  </rdf:Description>
</rdf:RDF>

ResearchIndex

http://citeseer.nj.nec.com



ResearchIndex is a digital library of scientific literature.


Articles from 17 different subject areas within Computing Science.


Full text of each article and its BibTeX entry are provided.


BibTeX converted to RDF.


Full text is typically 6000 words.


BibTeX is typically 10 RDF statements.


BibTeX to RDF mapping


@inproceedings{davies94agentk,
  author = "W. H. E. Davies and P. Edwards",
  title = "Agent-K: An Integration of AOP and KQML",
  booktitle = "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
  address = "Gaithersburg, MD, USA",
  editor = "T. Finin and Y. Labrou",
  year = "1994",
  url = "citeseer.nj.nec.com/15298.html"
}




<inproceedings rdf:about="davies94agentk">
  <author>W. H. E. Davies and P. Edwards</author>
  <title>Agent-K: An Integration of AOP and KQML</title>
  <booktitle>Proceedings of the CIKM'94 Workshop on Intelligent Agents</booktitle>
  <address>Gaithersburg, MD, USA</address>
  <editor>T. Finin and Y. Labrou</editor>
  <year>1994</year>
  <url>citeseer.nj.nec.com/15298.html</url>
</inproceedings>
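
The field-by-field mapping shown in this example could be sketched roughly as follows. This is only an illustration under stated assumptions: the BibTeX entry is taken to be already parsed into a Python dict, and the function name `bibtex_to_rdf` is hypothetical. The slides do not describe the actual conversion tool.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)


def bibtex_to_rdf(entry_type, key, fields):
    """Turn one parsed BibTeX entry into an element with one child per field.

    Assumption: `fields` is an ordered mapping of field name to string value.
    """
    node = ET.Element(entry_type, {f"{{{RDF_NS}}}about": key})
    for name, value in fields.items():
        ET.SubElement(node, name).text = value
    return node


# The entry from the slide, with its fields already parsed.
FIELDS = {
    "author": "W. H. E. Davies and P. Edwards",
    "title": "Agent-K: An Integration of AOP and KQML",
    "booktitle": "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
    "address": "Gaithersburg, MD, USA",
    "editor": "T. Finin and Y. Labrou",
    "year": "1994",
    "url": "citeseer.nj.nec.com/15298.html",
}
node = bibtex_to_rdf("inproceedings", "davies94agentk", FIELDS)
rdf_xml = ET.tostring(node, encoding="unicode")
```

Each BibTeX field becomes one child element, which is consistent with the "typically 10 RDF statements" per entry mentioned for this dataset.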



Knowledge Sparse Learning

Representation



For each algorithm we use 3 instance representations:

1. Conventional plain text

2. Metadata as plain text

3. Metadata tags to feature mapping



Method 3

Metadata tags to feature mapping


Metadata instance:

<xml>
  <rdf>
    <talk id='mlsemweb1'>
      <title>An Empirical Investigation of Learning from the Semantic Web</title>
      <speaker>
        <name>Gunnar AAstrand Grimnes</name>
        <url>http://www.csd.abdn.ac.uk/~ggrimnes</url>
      </speaker>
      ...

Feature tags:

talk, title, speaker, name, url ...

Instance representation:

{}, {empirical, investigation, learning, semantic, web}, {}, {gunnar, aastrand, grimnes}, {csd, abdn, ggrimnes} ...
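
The tag-to-feature mapping on this slide can be approximated with the short sketch below, in which each tag becomes a feature whose value is the set of words found directly inside it. The stop-word list is an assumption chosen so that the output matches the slide; the slides do not say which tokens were actually filtered. The input is a closed, well-formed version of the elided example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stop list: chosen only to reproduce the sets on the slide.
STOPWORDS = {"an", "of", "from", "the", "http", "www", "ac", "uk"}


def tokens(text):
    """Lower-case alphanumeric tokens of `text`, minus stop words."""
    if not text:
        return set()
    found = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in found if t not in STOPWORDS}


def tags_to_features(xml_string):
    """Map each element tag to the set of words appearing directly in it."""
    features = {}
    for element in ET.fromstring(xml_string).iter():
        features.setdefault(element.tag, set()).update(tokens(element.text))
    return features


EXAMPLE = """<talk id='mlsemweb1'>
  <title>An Empirical Investigation of Learning from the Semantic Web</title>
  <speaker>
    <name>Gunnar AAstrand Grimnes</name>
    <url>http://www.csd.abdn.ac.uk/~ggrimnes</url>
  </speaker>
</talk>"""

features = tags_to_features(EXAMPLE)
```

Container tags such as `talk` and `speaker` hold no text of their own, so they map to empty sets, as in the instance representation above.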


Knowledge Sparse Learning

Results


[Charts: results for ResearchIndex and ITTalks]


ITTalks:


Meta 2 performs poorly, caused by redundant features.


Text and Meta 1 perform very similarly, as the corresponding instances in this dataset are almost identical.


ResearchIndex:


KNN performs better for the full text instances, as it is better
at dealing with large numbers of features.



Knowledge Intensive Learning

Representation



Ignore the plain-text representations.


RDF maps to a first-order logic (Prolog) representation.


Use the ILP algorithm Progol4.4 to learn Prolog rules for class descriptions.


Solve binary classification problems.



RDF to Prolog mapping


<inproceedings rdf:about="davies94agentk">
  <author>W. H. E. Davies and P. Edwards</author>
  <title>Agent-K: An Integration of AOP and KQML</title>
  <booktitle>Proceedings of the CIKM'94 Workshop on Intelligent Agents</booktitle>
  <address>Gaithersburg, MD, USA</address>
  <editor>T. Finin and Y. Labrou</editor>
  <year>1994</year>
  <url>citeseer.nj.nec.com/15298.html</url>
</inproceedings>


url( davies94agentk, 'citeseer.nj.nec.com/15298.html' ).
editor( davies94agentk, 'T. Finin' ).
editor( davies94agentk, 'Y. Labrou' ).
titleword( davies94agentk, 'agent' ).
titleword( davies94agentk, 'integration' ).
titleword( davies94agentk, 'aop' ).
titleword( davies94agentk, 'kqml' ).
author( davies94agentk, 'W. Davies' ).
author( davies94agentk, 'P. Edwards' ).
address( davies94agentk, 'Gaithersburg, MD, USA' ).
year( davies94agentk, '1994' ).
type( davies94agentk, '#inproceedings' ).
booktitleword( davies94agentk, 'proceedings' ).
booktitleword( davies94agentk, 'cikm94' ).
booktitleword( davies94agentk, 'workshop' ).
booktitleword( davies94agentk, 'intelligent' ).
booktitleword( davies94agentk, 'agents' ).
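
A rough sketch of how facts like these might be generated is given below. It assumes the RDF properties are available as a Python dict, and every name in it (`to_prolog_facts`, `STOPWORDS`, the field groupings) is hypothetical. It also does not reproduce the author-name normalisation visible on the slide, where 'W. H. E. Davies' becomes 'W. Davies', since the slides do not say how that was done.

```python
import re

# Assumed stop list and field groupings, chosen to match the slide's output.
STOPWORDS = {"an", "and", "of", "on", "the"}
WORD_FIELDS = {"title", "booktitle"}   # split into one fact per word
LIST_FIELDS = {"author", "editor"}     # split on " and " into one fact per name


def quote(atom):
    """Quote a string as a Prolog atom, doubling embedded single quotes."""
    return "'" + atom.replace("'", "''") + "'"


def words(text):
    """Lower-case words, dropping apostrophes, stop words and single letters."""
    cleaned = text.lower().replace("'", "")
    found = re.findall(r"[a-z0-9]+", cleaned)
    return [w for w in found if len(w) > 1 and w not in STOPWORDS]


def to_prolog_facts(key, entry_type, fields):
    """Emit one Prolog fact string per property value of the resource `key`."""
    facts = [f"type( {key}, {quote('#' + entry_type)} )."]
    for name, value in fields.items():
        if name in WORD_FIELDS:
            facts += [f"{name}word( {key}, {quote(w)} )." for w in words(value)]
        elif name in LIST_FIELDS:
            parts = value.split(" and ")
            facts += [f"{name}( {key}, {quote(p.strip())} )." for p in parts]
        else:
            facts.append(f"{name}( {key}, {quote(value)} ).")
    return facts


facts = to_prolog_facts("davies94agentk", "inproceedings", {
    "title": "Agent-K: An Integration of AOP and KQML",
    "booktitle": "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
    "editor": "T. Finin and Y. Labrou",
    "year": "1994",
})
```

Splitting titles into per-word facts gives the ILP learner individual words to test in rule bodies, which is what the `titleword` and `booktitleword` predicates provide.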



Knowledge Intensive Learning

Results


Agents experiment (155 clauses):

inClass(A) :- author(A,'A. Rao').
inClass(A) :- author(A,'D. Lambrinos').
inClass(A) :- titleword(A,agent), titleword(A,mobile).
inClass(A) :- type(A,'http://www.csd.abdn.ac.uk/~ggrimnes/exp/#misc'),
              textword(A,agent), titleword(A,agent).
inClass(A) :- year(A,1999), titleword(A,agents).
inClass(A) :- titleword(A,bdi).


Machine Learning (259 clauses):

inClass(A) :- publisher(A,'Morgan Kaufmann'), booktitleword(A,learning).
inClass(A) :- titleword(A,based), titleword(A,case).


Theory (279 clauses):

inClass(A) :- volume(A,18).
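
To make the reading of these learned clauses concrete, the sketch below evaluates a few of the Agents rules against a small fact base: an article is in the class if every condition of at least one clause holds. The fact tuples and the identifiers `paper1` and `paper2` are hypothetical, and this is only a stand-in for running the rules in Prolog.

```python
def in_class(article, facts, rules):
    """True if some clause has all of its body conditions among the facts.

    `facts` is a set of (predicate, article, value) tuples and `rules` is a
    list of clause bodies, each a list of (predicate, value) conditions.
    """
    return any(
        all((predicate, article, value) in facts for predicate, value in body)
        for body in rules
    )


# A subset of the Agents rules from the slide, one inner list per clause.
AGENT_RULES = [
    [("author", "A. Rao")],
    [("author", "D. Lambrinos")],
    [("titleword", "agent"), ("titleword", "mobile")],
    [("year", 1999), ("titleword", "agents")],
    [("titleword", "bdi")],
]

# Hypothetical fact base for two made-up articles.
FACTS = {
    ("titleword", "paper1", "agent"),
    ("titleword", "paper1", "mobile"),
    ("year", "paper1", 2000),
    ("titleword", "paper2", "learning"),
    ("year", "paper2", 1999),
}

paper1_is_agent = in_class("paper1", FACTS, AGENT_RULES)
paper2_is_agent = in_class("paper2", FACTS, AGENT_RULES)
```

Because each clause is a plain conjunction over metadata properties, rules of this form can be inspected and reused directly, unlike the weights of a statistical classifier.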


Future work

Learning Personal Profiles


Gunnar’s profile.

Based on 200 manually rated articles from the
ResearchIndex dataset.


inClass(A) :- titleword(A,image).
inClass(A) :- type(A,'http://www.csd.abdn.ac.uk/~ggrimnes/exp/#misc'),
              textword(A,learning).
inClass(A) :- booktitleword(A,mining).
inClass(A) :- author(A,'N. Jennings').
inClass(A) :- titleword(A,indexing).
inClass(A) :- pages(A,143).


Conclusion



In terms of accuracy, learning from the Semantic Web was not superior to learning from plain text.


Learning from RDF requires fewer resources.


Datasets have no ontological support.


Learning outcomes from the Semantic
Web can be real, reusable knowledge.