Dong-Kyu Jeon

snufflevoicelessInternet και Εφαρμογές Web

22 Οκτ 2013 (πριν από 3 χρόνια και 9 μήνες)

64 εμφανίσεις

인공지능

특강

프로젝트

-

Development of Decision Tree Algorithm for Semantic Web data
-

2010313148

전동규

2

Agenda

1.
Project Purpose


2.
Motivation


3.
Related Work


4.
Algorithm


5.
Description of Problem

3

Relational Data

-

Semantic Web Data (Linked data)
-

Decision Tree Algorithm

-

C4.5 Algorithm
-

Semantic Decision Tree

1. Project Purpose

4

1. Project Purpose


Goal of project


Development of new kind of Decision Tree algorithm
which supports
decision making based on Semantic Web environmental information


Solve the several problems

which is already solved by other related
researches



Data : Linked Data(Semantic Web ontology)

5

1. Project Purpose


Ontology


Class : Definition of Set


Property : Relations between instances


Instance : Individuals which are belonged in classes

Schema of Example Ontology


Datatype Property


Object Property

Class

Range type

Boolean

Int

String

String

Rich

name

Person

Workplace

Location

Person

Doctor

Teacher

Student

Person

Hospital

School

age

String

name

6



Definition of Semantic Web


The Semantic Web is an evolving development of the World Wide Web in which the
meaning (semantics) of information and services on the web is defined, making it
possible for the web to understand and satisfy the requests of people and machines to
use the web content


Semantic Web is based on
ontologies

which corresponds to Semantic Web data




Linked Data


The term Linked Data is used to describe a method of exposing, sharing, and connecting
data on the Web

What is Semantic Web ?

[1] Berners
-
Lee, T. (2001),


The Semantic Web,


Scientific American
, Vol. 501.

2. Motivation

7



Increase of Semantic Web Data


Appearance of Semantic Web Document Search Engines


Falcons

: Twenty millions over

RDF/XML Documents


Swoogle : Three millions over

Semantic Web Documents







Open data in Semantic Web


LINKINGOPENDATA :


The goal of the W3C SWEO Linking Open Data community project is to
extend the Web with a data commons by publishing various open data
sets as RDF on the Web and by setting RDF links between data items from
different data sources


The data sets consist of over 4.7 billion RDF triples, which are interlinked
by around 142 million RDF links (May 2009).

Development status of Semantic Web

Date

RDF/XML

Quadruple

Falcons

2009
-
09
-
02

21,639,337

2,936,868,638

2009
-
05
-
29

19,919,364

2,177,084,709

Date

Semantic Web Document

Triple

Swoogle

2009
-
10
-
17

3,109,616

1,065,799,526

2. Motivation

8



Increase of Semantic Web Data


Semantic Web based Portal Site


Twine :


Twine is a Semantic Web Portal that making networks of information based
on user’s posts which consist of their own information and favorite things.


Every information composing Twine is written in RDF and OWL format.


Twine have millions of visitors in a month, and they have over millions of
relationships between 3 millions of semantic tags (March 2009)

The necessity of mining useful knowledge from huge size
ontology is highly expected. Therefore, Data Mining
methodology for Semantic Web should be ready for this
necessity.

Development status of Semantic Web

2. Motivation

9


Traditional Decision Tree algorithm is impossible to apply in Semantic Web


Semantic Web based Ontology has special characteristics for mining


Since Semantic Web document has network structure, multi
-
value issue is
occurred


Traditional Decision Tree just uses value of variables. Therefore, additional
information of Semantic Web are can not be applied


Converting Semantic Web data into single table style that used to use in
traditional decision tree algorithm is impossible

Decision Tree in Semantic Web

2. Motivation

10


Arno J.Knobbe[2]

developed decision tree algorithm for Multi
-
relational
database


Selection Graph

is suggested to do decision tree on RDB


Selection Graph is composed of Node, Edge, and condition and it can be expressed
in SQL syntax

3. Related Work

This research suggested partial solution about multi
-
value issue which also
happened in Semantic Web ontology. However, this methodology can not be
applied to Semantic Web which contains a lot of information than RDB


David Jensen[3] suggested methodology that converting social network data
to single table data which can be applied to Traditional Decision Tree
algorithm


‘QGraph' that kind of query language to get the local network from entire social
network is suggested


QGraph is composed of Node, Edge, and condition and it can query many objects at
once

Since ontology information are manually converted to single table form, missing
information will be occurred a lot

[2] Arno J. Knobbe.,


Arno Siebes.,


Danil Van Der Wallen.,


Syllogic B. V. (1999).

Multi
-
Relational Decision Tree Induction,


In Proceedings of PKDD

99,

[3] D. Jensen., and J. Neville.(to appear) (2002).

Data mining in networks,


Papers of the Symposium on Dynamic Social Network Modeling and Analysis.

11

4. Algorithm


Search procedure of algorithm follows C4.5 algorithm


New methodologies are required to learn concepts in ontology



Constructor
’ can be used as similar as attributes in traditional Decision tree


Related works used the terms ‘Refinement’ as an attribute in Decision Tree


12

4. Algorithm


What is a Refinement?


Refinement is a condition for split branches in decision tree. In this algorithm, property
and class from ontology are used as a refinement.


When define a refinement, Role Constructors from Description Logic are applied to
make the best use of information in Semantic Web



Type of Refinements


Concept Constructor Refinement :
Applying type information of instances


Cardinality restriction Refinement :
Applying cardinality information on object property


Domain restriction Refinement :
Applying value of datatype property


Qualification Refinement :
Applying information of quantification restrictions and range
class of object property

Refinements

13

Refinements

Example

Concept Constructor
Refinement

Hospital

not Human

Domain restriction
Refinement

Age.(

21)

Cardinality restriction
Refinement

≥ 3 hasChild

Qualification
Refinement



hasChild.Blond

4. Algorithm


Developed Refinements

14



The list of syntax information which can be expressed in ontology

Language

Syntax

RDF

rdf:type

RDFS

rdfs:domain

rdfs:range

rdfs:subClassOf

rdfs:subPropertyOf

OWL

owl:AllDifferent

owl:allValuesFrom

owl:cardinality

owl:Class

owl:complementOf

owl:DatatypeProperty

owl:DataRange

owl:differentFrom

owl:disjointWith

owl:equivalentClass

Language

Syntax

OWL

owl:equivalentProperty

owl:FunctionalProperty

owl:hasValue

owl:intersectionOf

owl:InverseFunctionalProperty

owl:inverseOf

owl:masCardinality

owl:minCardinality

owl:ObjectProperty

owl:oneOf

owl:sameAs

owl:someValuesFrom

owl:SymmetricProperty

owl:TransitiveProperty

owl:unionOf

4. Algorithm

15

5. Description of Problem

Train problem

Ten trains


After learning, found definition of eastbound train is as follows

16

5. Description of Problem


Artificial task of learning to predict whether a train is headed east or west


Data is consist of relation tuples


Relations


eastbound(T) : train T is eastbound


has
-
car(T,C) : C is a car of T


infront(C,D) : car C is in front of D


long(C) : car C is long


open
-
rectangle(C) : car C is shaped as an open rectangle similar relations for five other shapes


jagged
-
top(C) : C has a jagged top


sloping
-
top(C) : C has a sloping top


open
-
top(C) : C is open


contains
-
load(C,L) : C contains load L


1
-
item(C) : C has one load item similar relations for two and three load items


2
-
wheels(C) : C has two wheels


3
-
wheels(C) : C has three wheels

Train problem