Semantic Web and Machine Learning Tutorial

Oct 14, 2013

Semantic Web and Machine Learning Tutorial
Andreas Hotho
Knowledge and Data Engineering Group
University of Kassel
Germany
Steffen Staab
ISWeb – Information Systems and Semantic Web
University of Koblenz
Germany
2
Agenda
•Introduction
•Foundations of the Semantic Web
•Ontology Learning
•Learning Ontology Mapping
•Semantic Annotation
•Using Ontologies
•Applications
3
Syntax is not enough
Andreas
•Tel
•E-Mail
4
Information Convergence
•Convergence not just in devices, also in “information”
–Your personal information (phone, PDA,…)
Calendar, photo, home page, files…
–Your “professional” life (laptop, desktop, … Grid)
Web site, publications, files, databases, …
–Your “community” contexts (Web)
Hobbies, blogs, fanfic, social networks…
•The Web teaches us that people will work to share
–How do we CREATE, SEARCH, and BROWSE in the non-text
based parts of our lives?
5
CV
name
education
work
private
Meaning of Information:
(or: what it means to be a computer)
6
CV
name
education
work
private
< >
< >
< >
< >
< >
< Χς>
<ναµε>
<εδυχατιον>
<ωορκ>
<πριϖατε>
XML ≠ Meaning, XML = Structure
7
Source of Problems
XML is unspecific:
•No predetermined vocabulary
•No semantics for relationships
⇒ both must be specified upfront
Only possible in close cooperation:
–Small, reasonably stable group
–Common interests or authorities
Not possible on the Web or on a broad scale in general!
8
(One) Layer Model of the Semantic Web
9
Some Principal Ideas
•URI – uniform resource identifiers
•XML – common syntax
•Interlinked
•Layers of semantics – from database to knowledge base to proofs
Design principles of WWW applied to Semantics!!
Tim Berners-Lee, Weaving the Web
10
What is an Ontology?
Gruber 93:
An Ontology is a
formal specification ⇒ Executable
of a shared ⇒ Group of persons
conceptualization ⇒ About concepts
of a domain of interest ⇒ Between application and "unique truth"
11
Taxonomy
Object
Person
Topic
Document
Researcher
Student
Semantics
Ontology
DoctoralStudent
Taxonomy := Segmentation, classification and ordering of elements into a classification system according to their relationships between each other
PhDStudent
F-Logic
Menu
12
Thesaurus
Object
Person
Topic
Document
Researcher
Student
Semantics
PhDStudent
DoctoralStudent
•Terminology for a specific domain
•Graph with primitives, 2 fixed relationships (similar, synonym)
•Originates from bibliography
similar
synonym
Ontology
F-Logic
Menu
13
Topic Map
Object
Person
Topic
Document
Researcher
Student
Semantics
PhDStudent
DoktoralStudent
knows
described_in
writes
Affiliation
Tel
•Topics (nodes), relationships and occurrences (to documents)
•ISO standard
•Typically for navigation and visualisation
Ontology
F-Logic
similar
synonym
Menu
14
Ontology (in our sense)
[Figure: ontology graph. Concepts: Object, Person, Topic, Document, Researcher, Student, PhDStudent, DoctoralStudent, Semantics, Ontology, F-Logic. Links: is_a, instance_of, similar, subTopicOf. Relations: knows, writes, described_in, is_about. Attributes: Affiliation, Tel. Instance: York Sure, Affiliation AIFB, Tel +49 721 608 6592. Rules over variables P, D, T combining writes, is_about, described_in and knows.]
•Representation language: predicate logic (F-Logic)
•Standards: RDF(S); upcoming standard: OWL
15
The Semantic Web
Ontology: classes Person, Employee, PostDoc and Professor linked by rdfs:subClassOf; property cooperatesWith with rdfs:domain and rdfs:range
Metadata (linked to the ontology by rdf:type):
<swrc:Professor rdf:ID="person_sst">
<swrc:name>Steffen Staab</swrc:name>
...
</swrc:Professor>
<swrc:PostDoc rdf:ID="person_sha">
<swrc:name>Siegfried Handschuh</swrc:name>
...
</swrc:PostDoc>
swrc:cooperatesWith link between the two descriptions:
<swrc:cooperatesWith rdf:resource="http://www.uni-koblenz.de/~staab#person_sst"/>
Web pages (URLs):
http://www.aifb.uni-karlsruhe.de/WBS/sst
http://www.aifb.uni-karlsruhe.de/WBS/sha
16
What’s in a link? Formally
W3C recommendations
•RDF: an edge in a graph
•OWL: consistency (+subsumption+classif. + …)
Currently under discussion
•Rules: a deductive database
Currently under intense research
•Proof: worked-out proofs
•Trust: signature & everything working together
17
What’s in a link? Informally
•RDF: pointing to shared data
•OWL: shared terminology
•Rules: if-then-else conditions
•Proof: proof already shown
•Trust: reliability
18
Ontologies and their Relatives (I)
•There are many relatives around:
–Controlled vocabularies, thesauri and classification systems available on the WWW, see http://www.lub.lu.se/metadata/subject-help.html
•Classification Systems (e.g. UNSPSC, Library Science, etc.)
•Thesauri (e.g. Art & Architecture, Agrovoc, etc.)
•DMOZ Open Directory http://www.dmoz.org
–Lexical Semantic Nets
•WordNet, see http://www.cogsci.princeton.edu/~wn/
•EuroWordNet, see http://www.hum.uva.nl/~ewn/
–Topic Maps, http://www.topicmaps.org (e.g. used within knowledge management applications)
•In general it is difficult to find the borderline!
19
Ontologies and their Relatives (II)
[Figure: spectrum of ontology relatives, in slide order: Catalog/ID, Terms/Glossary, Thesauri, Informal Is-a, Formal Is-a, Formal Instance, Frames, Value Restrictions, General logical constraints; Axioms: Disjoint, Inverse, Relations, ...]
20
Ontologies - Some Examples
•General purpose ontologies:
–WordNet / EuroWordNet, http://www.cogsci.princeton.edu/~wn
–The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html
–IEEE Standard Upper Ontology, http://suo.ieee.org/
•Domain and application-specific ontologies:
–RDF Site Summary RSS, http://groups.yahoo.com/group/rss-dev/files/schema.rdf
–UMLS, http://www.nlm.nih.gov/research/umls/
–GALEN
–SWRC – Semantic Web Research Community: http://ontoware.org/projects/swrc/
–RETSINA Calendering Agent, http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf
–Dublin Core, http://dublincore.org/
•Web Services Ontologies
–Core ontology of services http://cos.ontoware.org
–Web Service Modeling ontology http://www.wsmo.org
–DAML-S
•Meta-Ontologies
–Semantic Translation, http://www.ecimf.org/contrib/onto/ST/index.html
–RDFT, http://www.cs.vu.nl/~borys/RDFT/0.27/RDFT.rdfs
–Evolution Ontology, http://kaon.semanticweb.org/examples/Evolution.rdfs
•Ontologies in a wider sense
–Agrovoc, http://www.fao.org/agrovoc/
–Art and Architecture, http://www.getty.edu/research/tools/vocabulary/aat/
–UNSPSC, http://eccma.org/unspsc/
–DTD standardizations, e.g. HR-XML, http://www.hr-xml.org/
21
Tools for markup...
PhotoStuff Demo
22
Not tied to specific domains
23
Not tied to specific domains
M-OntoMat is publicly available
http://acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html
[Screenshot with callouts: Domain Ontology Browser, Draw panel, Selected region, Shape selection, Shape erasure, Shape Color selection, Visual Descriptor selection, Descriptor extraction, VDE plug-in launch, Save Prototype Instances]
24
Shared Workspace (Xarop + Screenshot)
25
Coming sooner than you may think…
26
Social networks:
e.g.
Friend of a Friend (FOAF)
•Say stuff about yourself (or others) in OWL files,
link to who you “know”
Estimates of the number of FOAF users range from 2M to 5M
27
Using FOAF
in other contexts
http://trust.mindswap.org
Jennifer Golbeck
28
Get a B&N price (In Euros)
29
Of a particular book
30
In its German edition?
31
32
The Semantic Wave
(Berners-Lee, 03)
YOU
ARE
HERE
2005
YOU
ARE
HERE
2003
33
Now.
•RDF, RDFS and OWL are ready for prime time
–Designs are stable, implementations maturing
•Major Research investment translating into application
development and commercial spinoffs
–Adobe 6.0 embraces RDF
–IBM releases tools, data and partnering
–HP extending Jena to OWL
–OWL Engines by Ontoprise GmbH, Network Inference, Racer GmbH
–Proprietary OWL ontologies for vertical markets
•c.f. pharmacology, HMO/health care, ... Soft drinks
–Several new starts in SW space
34
The Semantic Web and machine learning
What can machine learning do for the Semantic Web?
1. Learning ontologies (even if not fully automatic)
2. Learning to map between ontologies
3. Deep annotation: reconciling databases and ontologies
4. Annotation by information extraction
5. Duplicate recognition
What can the Semantic Web do for machine learning?
1. Lots and lots of tools to describe and exchange data for later use by machine learning methods in a canonical way!
2. Using ontological structures to improve the machine learning task
3. Provide background knowledge to guide machine learning
35
Foundations of the Semantic Web: References
•Semantic Web Activity at W3C http://www.w3.org/2001/sw/
•www.semanticweb.org
(currently relaunched)
•Journal of Web Semantics
•D. Fensel et al.: Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential, MIT Press 2003
•G. Antoniou, F. van Harmelen. A Semantic Web Primer, MIT Press 2004.
•S. Staab, R. Studer (eds.). Handbook on Ontologies. Springer Verlag, 2004.
•S. Handschuh, S. Staab (eds.). Annotation for the Semantic Web. IOS Press, 2003.
•International Semantic Web Conference series, yearly since 2002, LNCS
•World Wide Web Conference series, ACM Press, first Semantic Web papers since 1999
•York Sure, Pascal Hitzler, Andreas Eberhart, Rudi Studer, The Semantic Web in One Day, IEEE Intelligent Systems, http://www.aifb.uni-karlsruhe.de/WBS/phi/pub/sw_inoneday.pdf
•Some slides have been stolen from various places, from Jim Hendler and Frank van Harmelen in particular.
36
Agenda
•Introduction
•Foundations of the Semantic Web
•Ontology Learning
•Learning Ontology Mapping
•Semantic Annotation
•Using Ontologies
•Applications
37
The OL Layer Cake
(layers from top to bottom)
Rules: $\forall x, y\,(married(x,y) \rightarrow love(x,y))$
Relations: cure(dom:DOCTOR, range:DISEASE)
Concept Hierarchies: is_a(DOCTOR, PERSON)
Concepts: DISEASE := <I,E,L>
Synonyms: {disease, illness}
Terms: disease, illness, hospital
38
How do people acquire taxonomic knowledge?
•I have no idea!
•But people apply taxonomic reasoning!
–"Never do harm to any animal!"
=> "Don't do harm to the cat!"
•More difficult questions:
–representation
–reasoning patterns
•But let's speculate a bit! ;-)
39
How do people acquire taxonomic knowledge?
What is liver cirrhosis?
Mr. Smith died from liver cirrhosis.
Mr. Jagger suffers from liver cirrhosis.
Alcohol abuse can lead to liver cirrhosis.
=> prob(isa(liver cirrhosis, disease))
40
How do people acquire taxonomic knowledge?
What is liver cirrhosis?
Diseases such as liver cirrhosis are difficult to cure. (New York Times)
41
How do people acquire taxonomic knowledge?
What is liver cirrhosis?
Cirrhosis:
noun [uncountable]
serious disease of the liver, often caused by drinking too much alcohol
$\text{liver cirrhosis} \approx \text{cirrhosis} \;\land\; isa(\text{cirrhosis}, \text{disease}) \;\rightarrow\; prob(isa(\text{liver cirrhosis}, \text{disease}))$
42
Evaluation of Ontology Learning
The a priori approach is based on a gold standard ontology:
–Given an ontology modeled by an expert
-> the so-called gold standard
–Compare the learned ontology with the gold standard
•Which methods exist:
–learning accuracy/precision/recall/F-measure
–Count edges in the "ontology graph"
•Counting of direct relations only (Reinberger et al. 2005)
•Least common superconcept
•Semantic cotopy
•…
–Evaluation via application (cf. section Using Ontologies)
43
The Semantic Cotopy
$SC(c, O) = \{\, c' \mid c' \le_O c \;\lor\; c \le_O c' \,\}$
[Maedche & Staab 02]
44
Example for SC
[Figure: two concept hierarchies. O1: bookable, rentable, joinable, driveable, rideable, apartment, car, bike, trip, excursion. O2: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion.]
SC(bike,O1) = {bike, rideable, driveable, rentable, bookable}
SC(bike,O2) = {bike, TWV, vehicle, thing, root}
=> TO(bike,O1,O2)=1/9!!!
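The bike example can be retraced with a small sketch. The two parent maps below are an assumption about how the hierarchies in the figure are wired (only the concept names and the two SC sets are given on the slide), and the function names are illustrative.

```python
def cotopy(concept, parent):
    """Semantic cotopy: the concept plus all its super- and sub-concepts."""
    ancestors = set()
    node = concept
    while node in parent:          # walk up to the root
        node = parent[node]
        ancestors.add(node)
    descendants = set()
    for other in parent:           # every non-root concept is a key
        node = other
        while node in parent:
            node = parent[node]
            if node == concept:
                descendants.add(other)
                break
    return {concept} | ancestors | descendants


def taxonomic_overlap(concept, parent1, parent2):
    """|SC1 intersect SC2| / |SC1 union SC2| for one concept."""
    sc1 = cotopy(concept, parent1)
    sc2 = cotopy(concept, parent2)
    return len(sc1 & sc2) / len(sc1 | sc2)


# Assumed child -> parent edges for the two hierarchies.
O1 = {
    "rentable": "bookable", "joinable": "bookable",
    "driveable": "rentable", "apartment": "rentable",
    "rideable": "driveable", "car": "driveable",
    "bike": "rideable", "trip": "joinable", "excursion": "joinable",
}
O2 = {
    "thing": "root", "activity": "root",
    "vehicle": "thing", "apartment": "thing",
    "TWV": "vehicle", "car": "vehicle", "bike": "TWV",
    "trip": "activity", "excursion": "activity",
}

print(sorted(cotopy("bike", O1)))
print(sorted(cotopy("bike", O2)))
print(taxonomic_overlap("bike", O1, O2))
```

Only "bike" itself is shared between the two cotopies, while their union has nine elements, which is where the 1/9 comes from.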
45
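Common Semantic Cotopy
$SC'(c, O_1, O_2) = \{\, c' \mid c' \in C_1 \cap C_2 \;\land\; (c' \le_{O_1} c \;\lor\; c \le_{O_1} c') \,\}$
46
Example for SC'
[Figure: the same two concept hierarchies as in the SC example. O1: bookable, rentable, joinable, driveable, rideable, apartment, car, bike, trip, excursion. O2: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion.]
SC'(driveable) = {bike, car}
SC'(vehicle) = {bike, car}
=> TO(driveable,O1,O2) = 1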
47
One more Example
[Figure: two concept hierarchies. One: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion. The other, a flat list: car, excursion, bike, trip, apartment.]
SC'(car) = {car}
SC'(vehicle) = {bike, car}
=> TO(driveable,O1,O2) = 1/2
48
Semantic Cotopy Revisited (Once More)
$SC''(c, O_1, O_2) = \{\, c' \mid c' \in C_1 \cap C_2 \;\land\; (c' <_{O_1} c \;\lor\; c' >_{O_1} c) \,\}$
$TO(O_1, O_2) = \frac{1}{|C_1|} \sum_{c \in C_1,\, c \notin C_2} TO(c, O_1, O_2)$
49
Example for Precision/Recall
[Figure: two concept hierarchies compared. First: bookable, rentable, joinable, driveable, rideable, apartment, car, bike, trip, excursion. Second: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion.]
P=100%
R=100%
F=100%
50
Example for Precision/Recall
[Figure: as before, but the first hierarchy has no rideable concept: bookable, rentable, joinable, driveable, apartment, car, bike, trip, excursion. Second: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion.]
P=100%
R=87.5%
F=93.33%
51
Example for Precision/Recall
[Figure: first hierarchy: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion. Second: bookable, rentable, joinable, driveable, rideable, planable, apartment, car, bike, trip, excursion.]
P=90%
R=100%
F=94.74%
52
Another Example
[Figure: first hierarchy: root, thing, activity, vehicle, TWV, apartment, car, bike, trip, excursion. Second, a flat list: car, excursion, bike, trip, apartment.]
P=100%
R=40%
F=57.14%
53
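Evaluation Methodology
$TO(O_1, O_2) = \frac{1}{|C_1|} \sum_{c \in C_1} TO(c, O_1, O_2)$
$TO(c, O_1, O_2) = \begin{cases} TO'(c, O_1, O_2) & \text{if } c \in C_2 \\ TO''(c, O_1, O_2) & \text{if } c \notin C_2 \end{cases}$
$TO'(c, O_1, O_2) := \frac{|SC(c, O_1, O_2) \cap SC(c, O_2, O_1)|}{|SC(c, O_1, O_2) \cup SC(c, O_2, O_1)|}$
$TO''(c, O_1, O_2) := \max_{c' \in C_2} \frac{|SC(c, O_1, O_2) \cap SC(c', O_2, O_1)|}{|SC(c, O_1, O_2) \cup SC(c', O_2, O_1)|}$
$P(O_1, O_2) = TO(O_1, O_2)$
$R(O_1, O_2) = TO(O_2, O_1)$
$F(O_1, O_2) = \frac{2 \cdot P(O_1, O_2) \cdot R(O_1, O_2)}{P(O_1, O_2) + R(O_1, O_2)}$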
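As a quick check of the F-measure as the harmonic mean of P and R, the sketch below recomputes F for the precision/recall pairs quoted on the example slides. The function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs taken from the four example slides.
examples = [(1.0, 1.0), (1.0, 0.875), (0.9, 1.0), (1.0, 0.4)]
for p, r in examples:
    print(f"P={p:.1%}  R={r:.1%}  F={f_measure(p, r):.2%}")
```

The printed F values can be compared directly with the percentages given on the slides.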
54
Lexical Recall and F'
$LR(O_1, O_2) = \frac{|C_{O_1} \cap C_{O_2}|}{|C_{O_2}|}$
$F'(O_1, O_2) = \frac{2 \cdot F(O_1, O_2) \cdot LR(O_1, O_2)}{F(O_1, O_2) + LR(O_1, O_2)}$
55
Evaluation of Ontology Learning
•The a posteriori approach:
–ask a domain expert for a per-concept evaluation of the learned ontology
–Count three categories of concepts:
•Correct: both in the learned and the gold ontology
•New: only in the learned ontology, but relevant and should be in the gold standard as well
•Spurious: useless
–Compute precision = (correct + new) / (correct + new + spurious)
•As the result:
The a priori evaluations are awful – BUT
a posteriori evaluations by domain experts still show very good results, very helpful for the domain expert!
Sabou M., Wroe C., Goble C. and Mishne G., Learning Domain Ontologies for Web Service Descriptions: an Experiment in Bioinformatics, In Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, 10-14 May, 2005.
56
Starting Point in OL from text
•Context-based approaches:
–Distributional Hypothesis [Harris 85]:
"Words are (semantically) similar to the extent to which they appear in similar (syntactic) contexts"
–leads to creation of groups
•Looking for explicit information:
–Texts
–WWW
–Thesauri
57
Looking for explicit information
There are two sources:
•Looking for patterns in texts:
–'is-a' patterns [Hearst 92,98], [Poesio et al. 02], [Ahmid et al. 03]
–'part-of' patterns [Charniak et al. 99]
–'causation' patterns [Girju 02/03]
•Using the Web:
–[Etzioni et al. 04]
–[Cimiano et al. 04]
58
Pattern based approaches (Hearst Patterns)
•Match patterns in corpus:
•NP0 such as NP1 ... NPn-1 (and|or) NPn
•such NP0 as NP1 ... NPn-1 (and|or) NPn
•NP1 ... NPn (and|or) other NP0
•NP0, (including|especially) NP1 ... NPn-1 (and|or) NPn
For all $NP_i$, $1 \le i \le n$: $isa_{Hearst}(head(NP_i), head(NP_0))$
$isa_{Hearst}(t_1, t_2) = \frac{\#HearstPatterns(t_1, t_2)}{\#HearstPatterns(t_1, *)}$
•isaHearst(conference,event)=0.44
•isaHearst(conference,body)=0.22
•isaHearst(conference,meeting)=0.11
•isaHearst(conference,course)=0.11
•isaHearst(conference,activity)=0.11
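A minimal sketch of the pattern counting, under strong simplifying assumptions: noun phrases are single words, only the "such as" and "and/or other" patterns are matched with regular expressions, and plurals are handled by stripping a trailing "s". The toy corpus and all function names are made up for illustration; a real system would match over parsed noun phrases.

```python
import re
from collections import Counter

# NP0 such as NP1, ..., NPn-1 (and|or) NPn  (single-word NPs only)
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")
# NP1, ..., NPn (and|or) other NP0
AND_OTHER = re.compile(r"((?:\w+, )*\w+),? (?:and|or) other (\w+)")


def singular(word):
    """Very naive singularisation: lower-case and strip a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the two patterns."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        for hyponym in re.split(r", | and | or ", m.group(2)):
            pairs.append((singular(hyponym), singular(m.group(1))))
    for m in AND_OTHER.finditer(sentence):
        for hyponym in m.group(1).split(", "):
            pairs.append((singular(hyponym), singular(m.group(2))))
    return pairs


def isa_hearst(corpus):
    """isa(t1, t2) = #patterns(t1, t2) / #patterns(t1, *)."""
    pair_counts, term_counts = Counter(), Counter()
    for sentence in corpus:
        for hyponym, hypernym in extract_pairs(sentence):
            pair_counts[(hyponym, hypernym)] += 1
            term_counts[hyponym] += 1
    return {pair: n / term_counts[pair[0]] for pair, n in pair_counts.items()}


corpus = [  # made-up sentences
    "They organise events such as conferences and workshops.",
    "Conferences and other events are listed here.",
    "We attend meetings such as conferences.",
]
for pair, score in sorted(isa_hearst(corpus).items()):
    print(pair, round(score, 2))
```

In this toy corpus "conference" is matched three times, twice with "event" and once with "meeting", so the scores for the two hypernym candidates sum to one, just like the five values listed on the slide (up to rounding).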
59
WWW Patterns
Generate patterns:
•<t1>s such as <t2>
•such <t1>s as <t2>
•<t1>s, especially <t2>
•<t1>s, including <t2>
•<t2> and other <t1>s
•<t2> or other <t1>s
and query the Web using the Google API:
$isa_{WWW}(t_1, t_2) = \frac{\#Patterns(t_1, t_2)}{\#Patterns(t_1, *)}$
60
The Vector-Space Model
•Idea: collect context information based on the distributional hypothesis and represent it as a vector:
            die_from  suffer_from  enjoy  eat
disease        X          X
cirrhosis      X          X
•compute similarity among vectors w.r.t. some measure
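One common choice for the similarity measure is the cosine of the angle between two context vectors. The sketch below assumes binary vectors over the four contexts die_from, suffer_from, enjoy and eat, with "disease" and "cirrhosis" both marked for die_from and suffer_from; "pizza" is a hypothetical contrast term that is not on the slide.

```python
import math

CONTEXTS = ["die_from", "suffer_from", "enjoy", "eat"]

# Assumed binary context vectors, one position per entry in CONTEXTS.
vectors = {
    "disease":   [1, 1, 0, 0],
    "cirrhosis": [1, 1, 0, 0],
    "pizza":     [0, 0, 1, 1],   # hypothetical contrast term
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


print(cosine(vectors["disease"], vectors["cirrhosis"]))
print(cosine(vectors["disease"], vectors["pizza"]))
```

Terms that share all their contexts come out as maximally similar, terms with no shared context as dissimilar; other measures (Jaccard, L1 distance, and so on) could be swapped in the same way.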
61
Clustering Concept Hierarchies from Text
•Observation: ontology engineers need information about the effectiveness, efficiency and trade-offs of different approaches
•Similarity-based
–agglomerative/bottom-up
–divisive/top-down: Bi-Section-KMeans
•Set-theoretical
–set operations (inclusion)
–FCA, based on Galois lattices
[Cimiano et al. 03-04]
62
Context Extraction
•extract syntactic dependencies from text
–verb/object, verb/subject, verb/PP relations
–car: drive_obj, crash_subj, sit_in, …
•LoPar, a trainable statistical left-corner parser:
Parser → tgrep → Lemmatizer → Smoothing → Weighting → FCA → Lattice Compaction → Pruning
63
Example
•People book hotels. The man drove the bike along the beach.
Extracted dependencies:
book_subj(people)
book_obj(hotels)
drove_subj(man)
drove_obj(bike)
drove_along(beach)
After lemmatization:
book_subj(people)
book_obj(hotel)
drive_subj(man)
drive_obj(bike)
drive_along(beach)
64
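Weighting (threshold t)
•Conditional: $P(n \mid v_{arg})$
•Hindle: $P(n \mid v_{arg}) \log \frac{P(n \mid v_{arg})}{P(n)}$
•Resnik: $S_R(v_{arg}) \cdot P(n \mid v_{arg}) \cdot \log \frac{P(n \mid v_{arg})}{P(n)}$, with $S_R(v_{arg}) = \sum_{n'} P(n' \mid v_{arg}) \cdot \log \frac{P(n' \mid v_{arg})}{P(n')}$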
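The three weighting schemes can be sketched from raw co-occurrence counts. The formulas coded here are an assumption about what the slide shows: Conditional = P(n|v_arg), Hindle = P(n|v_arg) · log(P(n|v_arg)/P(n)), Resnik = S_R(v_arg) · P(n|v_arg) · log(P(n|v_arg)/P(n)), with S_R(v_arg) the sum of P(n'|v_arg) · log(P(n'|v_arg)/P(n')) over all nouns n'. The counts are made up, and the natural logarithm is an arbitrary choice.

```python
import math
from collections import Counter

# Made-up counts of (verb_argument, noun) co-occurrences.
counts = {
    ("drive_obj", "car"): 8, ("drive_obj", "bike"): 2,
    ("book_obj", "hotel"): 6, ("book_obj", "car"): 4,
}
total = sum(counts.values())
verb_totals, noun_totals = Counter(), Counter()
for (verb, noun), freq in counts.items():
    verb_totals[verb] += freq
    noun_totals[noun] += freq


def conditional(noun, verb):
    """P(n | v_arg), estimated by relative frequency."""
    return counts.get((verb, noun), 0) / verb_totals[verb]


def log_ratio(noun, verb):
    """log(P(n | v_arg) / P(n)); only defined for seen pairs."""
    return math.log(conditional(noun, verb) / (noun_totals[noun] / total))


def hindle(noun, verb):
    p = conditional(noun, verb)
    return p * log_ratio(noun, verb) if p > 0 else 0.0


def selectional_strength(verb):
    """S_R(v_arg): sum of P(n'|v) * log(P(n'|v)/P(n')) over seen nouns."""
    return sum(hindle(noun, verb) for noun in noun_totals)


def resnik(noun, verb):
    return selectional_strength(verb) * hindle(noun, verb)


for noun in ("car", "bike"):
    print(noun, conditional(noun, "drive_obj"),
          round(hindle(noun, "drive_obj"), 4),
          round(resnik(noun, "drive_obj"), 4))
```

Only pairs whose weight exceeds the threshold t would then be kept as attributes for the subsequent clustering or FCA step.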
65
Tourism Formal Context
             bookable  rentable  driveable  rideable  joinable
apartment       X         X
car             X         X          X
motor-bike      X         X          X         X
trip            X                                         X
excursion       X                                         X
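To see how a lattice arises from such a context, the sketch below enumerates all formal concepts (extent, intent) by brute force. The object/attribute assignment is an assumption read off the tourism example (apartment: bookable, rentable; car: plus driveable; motor-bike: plus rideable; trip and excursion: bookable, joinable). Enumerating every object subset is exponential and only meant for toy contexts.

```python
from itertools import combinations

# Assumed tourism context: object -> set of attributes.
context = {
    "apartment":  {"bookable", "rentable"},
    "car":        {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "trip":       {"bookable", "joinable"},
    "excursion":  {"bookable", "joinable"},
}
all_attributes = set().union(*context.values())


def intent(objects):
    """Attributes shared by all given objects (all attributes if none given)."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else set(all_attributes)


def extent(attributes):
    """Objects that have all given attributes."""
    return {o for o, attrs in context.items() if attributes <= attrs}


def formal_concepts():
    """All (extent, intent) pairs that are closed under the two operators."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset)
            concepts.add((frozenset(extent(shared)), frozenset(shared)))
    return concepts


for ext, itt in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Ordering the resulting concepts by extent inclusion gives the concept lattice, which is then compacted into a concept hierarchy.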
66
Tourism Lattice
67
Concept Hierarchy
bookable
  rentable
    driveable
      rideable
        bike
      car
    apartment
  joinable
    trip
    excursion
68
Agglomerative/Bottom-Up Clustering
car
bus
trip
excursion
apartment
69
Linkage Strategies
•Complete-Linkage:
–consider the two most dissimilar elements of each of the clusters
=> O(n^2 log(n))
•Average-Linkage:
–consider the average similarity of the elements in the clusters
=> O(n^2 log(n))
•Single-Linkage:
–consider the two most similar elements of each of the clusters
=> O(n^2)
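The three linkage strategies differ only in how the distance between two clusters is derived from the pairwise distances of their elements. The sketch below is a deliberately naive implementation that recomputes all pairwise distances in every step, so it does not reach the complexities quoted above; the function names and the one-dimensional toy data are made up.

```python
def cluster_distance(a, b, distance, linkage):
    """Distance between clusters a and b under the chosen linkage."""
    dists = [distance(x, y) for x in a for y in b]
    if linkage == "single":      # two most similar elements
        return min(dists)
    if linkage == "complete":    # two most dissimilar elements
        return max(dists)
    if linkage == "average":     # average over all pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, distance, linkage="single", k=1):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], distance, linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


points = [0.0, 1.0, 2.5, 4.5]
dist = lambda x, y: abs(x - y)
print(agglomerative(points, dist, "single", k=2))
print(agglomerative(points, dist, "complete", k=2))
```

On this toy data single linkage chains 2.5 onto the cluster {0, 1}, while complete linkage prefers the more compact pair {2.5, 4.5}, which illustrates why the strategies can produce different hierarchies.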
70
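Bi-Section-KMeans
[Figure: top-down splitting of {excursion, trip, apartment, car, bus}: first into {trip, excursion} and {apartment, bus, car}, then {apartment, bus, car} into {apartment} and {bus, car}, down to the single terms excursion, trip, car, bus, apartment]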
71
Data Sets
•Tourism (118 million tokens):
–http://www.all-in-all.de/english
–http://www.lonelyplanet.com
–British National Corpus (BNC)
–handcrafted tourism ontology (289 concepts)
•Finance (185 million tokens):
–Reuters news from 1987
–GETESS finance ontology (1178 concepts)
72
Results Tourism Domain
73
Results in Finance Domain
74
Results Tourism Domain
75
Results in Finance Domain
76
Summary
Method                     Effectiveness   Efficiency       Traceability
FCA                        43.81/41.02%    O(2^n)           Good
Agglomerative Clustering   36.78/33.35%    O(n^2 log(n))    Fair
                           36.55/32.92%    O(n^2 log(n))
                           38.57/32.15%    O(n^2)
Divisive Clustering        36.42/32.77%    O(n^2)           Weak-Fair
77
Other Clustering Approaches
•Bottom-Up/Agglomerative
–(ASIUM System) Faure and Nedellec 1998
–Caraballo 1999
–(Mo'K Workbench) Bisson et al. 2000
•Other:
–Hindle 1990
–Pereira et al. 1993
–Hovy et al. 2000
78
Ontology Learning References
•Reinberger, M.-L., & Spyns, P. (2005). Unsupervised text mining for the learning of dogma-inspired ontologies. In Buitelaar, P., Cimiano, P., & Magnini, B. (Eds.), Ontology Learning from Text: Methods, Evaluation and Applications.
•Philipp Cimiano, Andreas Hotho, Steffen Staab: Comparing Conceptual, Divisive and Agglomerative Clustering for Learning Taxonomies from Text. ECAI 2004: 435-439
•P. Cimiano, A. Pivk, L. Schmidt-Thieme and S. Staab, Learning Taxonomic Relations from Heterogeneous Evidence. In Buitelaar, P., Cimiano, P., & Magnini, B. (Eds.), Ontology Learning from Text: Methods, Evaluation and Applications.
•Sabou M., Wroe C., Goble C. and Mishne G., Learning Domain Ontologies for Web Service Descriptions: an Experiment in Bioinformatics, In Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, 10-14 May, 2005.
•Alexander Maedche, Ontology Learning for the Semantic Web, PhD Thesis, Kluwer, 2001.
•Alexander Maedche, Steffen Staab: Ontology Learning for the Semantic Web. IEEE Intelligent Systems 16(2): 72-79 (2001)
•Alexander Maedche, Steffen Staab: Ontology Learning. Handbook on Ontologies 2004: 173-190
•M. Ciaramita, A. Gangemi, E. Ratsch, J. Saric, I. Rojas. Unsupervised Learning of semantic relations between concepts of a molecular biology ontology. IJCAI, 659ff.
•A. Schutz, P. Buitelaar. RelExt: A Tool for Relation Extraction from Text in Ontology Extension. ISWC 2005.
•Faure, D., & Nédellec, C. (1998). A corpus-based conceptual clustering method for verb frames and ontology. In Velardi, P. (Ed.), Proceedings of the LREC Workshop on Adapting lexical and corpus resources to sublanguages and applications, pp. 5–12.
•Michele Missikoff, Paola Velardi, Paolo Fabriani: Text Mining Techniques to Automatically Enrich a Domain Ontology. Applied Intelligence 18(3): 323-340 (2003).
•Gilles Bisson, Claire Nedellec, Dolores Cañamero: Designing Clustering Methods for Ontology Building - The Mo'K Workbench. ECAI Workshop on Ontology Learning 2000
79
Agenda
•Introduction
•Foundations of the Semantic Web
•Ontology Learning
•Learning Ontology Mapping
•Semantic Annotation
•Using Ontologies
•Applications
80
Lots of Overlapping Ontologies
on the Semantic Web
Search Swoogle
for “publication”
185 matches in
the repository
Different
definitions,
viewpoints,
notions
© Noy
81
82
Creating Correspondences Between Ontologies
© Noy
83
Ontology-level Mismatches
•The same terms describing different concepts
•Different terms describing the same concept
•Different modeling paradigms
–e.g., intervals or points to describe temporal aspects
•Different modeling conventions
•Different levels of granularity
•Different coverage
•Different points of view
•...
© Noy
84
Ontology-to-Ontology Mappings:
Sources of information
•Lexical information: edit distance, …
•Ontology structure: subclassOf, instanceOf,…
•User input: “anchor points”
•External resources: WordNet,…
•Prior matches
© Noy
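As an illustration of the lexical source of information, the sketch below computes the edit (Levenshtein) distance between two labels and turns it into a similarity in [0, 1]. The normalisation by the longer label and the case-folding are assumptions, not something the slide prescribes.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def label_similarity(a, b):
    """1 - normalised edit distance, ignoring case."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


print(label_similarity("Vehicle", "vehicle"))
print(label_similarity("Car", "Automobile"))
```

Labels such as "Car" and "Automobile" score low here even though they denote the same concept, which is why lexical evidence is combined with structure, user input and external resources such as WordNet.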
85
Mapping Methods
•Heuristic and Rule-based methods
•Graph analysis
•Probabilistic approaches
•Reasoning, theorem proving
•Machine-learning
86
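Example
[Figure: two ontologies being aligned. First: Object, Vehicle, Car, Boat, Owner, Speed, relations hasOwner and hasSpeed, instances Porsche KA-123, Marc, 250 km/h. Second: Thing, Vehicle, Automobile, Speed, relation hasSpecification, instances Marc's Porsche, fast. Similarity values on the correspondences: 0.9, 1.0, 0.9, 0.7.]
simLabel = 0.0
simSuper = 1.0
simInstance = 0.9
simRelation = 0.9
simAggregation = 0.7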
87
Mapping Methods
•Heuristic and Rule-based methods
•Graph analysis
•Probabilistic approaches
•Reasoning, theorem proving
•Machine-learning
88
GLUE: Defining Similarity
Multiple similarity measures in terms of the JPD
Joint Probability Distribution: P(A,S), P(¬A,S), P(A,¬S), P(¬A,¬S)
[Figure: hypothetical common marked-up domain, partitioned into the regions A,¬S / A,S / ¬A,S / ¬A,¬S for the concepts Assoc. Prof (A) and Snr. Lecturer (S)]
$Sim(\text{Assoc. Prof.}, \text{Snr. Lect.}) = \frac{P(A \cap S)}{P(A \cup S)} = \frac{P(A,S)}{P(A,\lnot S) + P(A,S) + P(\lnot A,S)}$
[Jaccard, 1908]
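The Jaccard similarity follows directly from three of the four joint probabilities. The sketch below estimates the joint distribution from instance counts and applies the formula; the counts and function names are made up for illustration.

```python
def estimate_jpd(n_a_s, n_a_not_s, n_not_a_s, n_not_a_not_s):
    """Relative frequencies of the four partitions of the instance set."""
    total = n_a_s + n_a_not_s + n_not_a_s + n_not_a_not_s
    return {
        "A,S": n_a_s / total,
        "A,notS": n_a_not_s / total,
        "notA,S": n_not_a_s / total,
        "notA,notS": n_not_a_not_s / total,
    }


def jaccard(jpd):
    """P(A and S) / P(A or S), written in terms of the joint distribution."""
    union = jpd["A,S"] + jpd["A,notS"] + jpd["notA,S"]
    return jpd["A,S"] / union if union else 0.0


# Made-up instance counts for the concepts A and S.
jpd = estimate_jpd(n_a_s=30, n_a_not_s=10, n_not_a_s=20, n_not_a_not_s=40)
print(jpd)
print(jaccard(jpd))
```

Instances belonging to neither concept do not enter the similarity, so two concepts are judged only on how much of their combined extension they share.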
89
GLUE: No common data instances
In practice, it is not easy to find data tagged with both ontologies!
[Figure: instances of the United States taxonomy are partitioned into A and ¬A, instances of the Australian taxonomy into S and ¬S]
Solution: Use Machine Learning
90
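Machine Learning for computing similarities
JPD estimated by counting the sizes of the partitions
[Figure: taxonomies United States and Australia. A classifier CL_S trained on the S / ¬S instances is applied to the A / ¬A instances, and a classifier CL_A trained on the A / ¬A instances is applied to the S / ¬S instances. This yields the partitions A,S / ¬A,S / A,¬S / ¬A,¬S on both sides.]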
91
GLUE: Improve Predictive Accuracy – Use Multi-Strategy Learning
A single classifier cannot exploit all available information
Combine the predictions of multiple classifiers
[Figure: classifiers CLA_1 ... CLA_N each predict A / ¬A; a Meta-Learner combines them into one A / ¬A prediction]
•Content Learner
Frequencies of different words in the text of the data instances
•Name Learner
Words used in the names of concepts in the taxonomy
•Others …
92
GLUE Next Step: Exploit Constraints
•Constraints due to the taxonomy structure
•Domain specific constraints
–Department-Chair can only map to a unique concept
•Numerous constraints of different types
[Figure: two taxonomies. One: People, Staff, Fac, Prof, Assoc. Prof, Asst. Prof. The other: People, Staff, Acad, Tech, Prof, Snr. Lect., Lect. Constraints relate the Parents and Children of matched nodes.]
Extended Relaxation Labeling to ontology matching
93
Putting it all together: GLUE System
Taxonomy O1 and Taxonomy O2 (structure + data instances)
→ Distribution Estimator (Learner CL1 ... Learner CLN, Meta Learner)
→ Joint Distributions: P(A,B), P(A,¬B), …
→ Similarity Estimator (Similarity function)
→ Similarity Matrix
→ Relaxation Labeler (Generic & Domain constraints)
→ Mappings for O1, Mappings for O2
94
APFEL: Similarity Features
            Feature      Similarity Measure
Concepts    label        String Similarity
            subclassOf   Set Similarity
Instances
…
Relations
            instances    Set Similarity
Aggregation example:
$sim(e, f) = \sum_k w_k \, sim_k(e, f)$
Interpretation: $map(e_{1j}) = e_{2j} \leftarrow sim(e_{1j}, e_{2j}) > t$
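Aggregation and interpretation can be sketched in a few lines: the individual feature similarities are combined as a weighted sum and a correspondence is accepted if the result exceeds a threshold t. The feature values below are the ones from the earlier mapping example slide (simLabel = 0.0, simSuper = 1.0, simInstance = 0.9, simRelation = 0.9); the equal weights and the threshold of 0.5 are assumptions for illustration. Learning the weights and the threshold is precisely the part APFEL automates.

```python
def aggregate(similarities, weights):
    """Weighted sum of the individual feature similarities."""
    return sum(weights[name] * value for name, value in similarities.items())


def interpret(score, threshold):
    """Accept the correspondence if the aggregated similarity exceeds t."""
    return score > threshold


similarities = {"label": 0.0, "super": 1.0, "instance": 0.9, "relation": 0.9}
weights = {name: 0.25 for name in similarities}   # assumed: equal weights

score = aggregate(similarities, weights)
print(round(score, 2), interpret(score, threshold=0.5))
```

With equal weights the aggregated value matches the simAggregation = 0.7 shown in that example.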
95
APFEL: Optimize Integration
[Figure: APFEL process. Ontologies → Simple Alignment Method → Generation of Initial Alignments → User Validation; in parallel, Generation of Feature/Similarity Hypotheses; both feed Training: Feature/Similarity Weighting Scheme and Threshold Fixing → Optimized Alignment Method.]
[Figure: steps of an alignment method. Input → Features → Entity Pair Selection → Similarity → Aggregation → Interpretation → Iterations → Output.]
96
Duplicate Recognition
•Do two objects refer to the same entity?
–We know objects have the same type (their types are mapped/merged)
•Examples
–Duplicate removal after merging knowledge bases
–Citation matching
© Noy
97
Using External Sources for Duplicate Recognition
Apollo (USC/ISI)
–Combines information-integration mediator (Prometheus) with a record-linkage system (Active Atlas)
–Uses a domain model of sources and information that they provide
© Noy
98
Duplicate Recognition: Citation Matching
Pasula, Marthi, et al. (UC Berkeley)
–Performs citation matching based on probability
models for
•author names
•titles
•title corruption, etc.
–Extends standard domain model to
incorporate probabilities
–Learns probability models from large
data sets
© Noy
99
References
User input driven - Prompt, Chimaera, ONION
Chimaera (Stanford KSL; D. McGuinness et al.)
AnchorPrompt (Stanford SMI; Noy, Musen et al.)
Similarity Flooding (Melnik, Garcia-Molina, Rahm)
IF-Map (Kalfoglou, Schorlemmer)
Using metrics to compare OWL concepts (Euzenat and Valtchev)
QOM (Ehrig and Staab)
Corpus of Matches (O. Etzioni, A. Halevy, et al.)
APFEL (Ehrig, Staab, Sure)
SAT Reasoning - S-Match (U. Trento; Serafini et al.)
Mapping Composition: Semantic gossiping (Aberer et al.), Piazza (Halevy et al.), Prasenjit Mitra
100
Agenda
•Introduction
•Foundations of the Semantic Web
•Ontology Learning
•Learning Ontology Mapping
•Semantic Annotation
•Using Ontologies
•Applications
101
CREAM – Creating Metadata
[Figure (DAML OntoAgents): annotation generates Class Instances, Attribute Instances and Relationship Instances]
[K-CAP 2001; WWW 2002]
102
Annotation by Markup
Generate: Class Instance, Attribute Instance, Relationship Instance
[K-CAP 2001]
Download of markup-only version of OntoMat from http://annotation.semanticweb.org
103
Annotation by Authoring
Create text and, if possible, links out of a Class Instance, Attribute Instance or Relationship Instance
(generates simple text)
[WWW 2002]
104
Annotation vs. Deep Annotation
Annotation
Input: Ontology
Output: Ontology-based Metadata
Deep Annotation
Input: Ontology, Database (DB)
Output: Mapping Rules (DB)
[WWW 2003]
105
The annotation problem in 4 cartoons
© Cimiano
106
The annotation problem from a scientific point of view
107
The annotation problem in practice
108
The vicious cycle
109
Current State-of-the-art
•ML-based IE (e.g. Amilcare@{OntoMat,MnM})
–start with hand-annotated training corpus
–rule induction
•Standard IE (MUC)
–handcrafted rules
–Wrappers
•Large-scale IE [SemTag&Seeker@WWW'03]
–Large scale system
–disambiguation with TAP
•(C-)PANKOW (Cimiano et al. WWW'04, WWW'05)
•KnowItAll (Etzioni et al. WWW'04)
110
Semi-automatic Annotation
[EKAW 2002]
EU IST Dot-Kom
111
Comparison of CREAM and S-CREAM
[Figure: a document is annotated either manually (M) or by information extraction (IE, A1). IE output: <hotel>Zwei Linden</hotel> <city>Dobbertin</city>. Ontology: Thing, region, accommodation, City, Hotel, relation Located_at. Target metadata: Zwei Linden Located_at Dobbertin. The step from the IE output to the relational metadata is marked "?".]
Core processes: Input, Output
•(M) Manual Annotation (OntoMat): Relational Metadata
•(A1) Information Extraction (Amilcare): XML annotated Document
112
Different Results
Amilcare (IE-Tool)                             OntoMat-Annotizer
<hotel> Zwei Linden </hotel>                   Zwei Linden InstOf Hotel
                                               Zwei Linden Located_At Dobbertin
<city>Dobbertin</city>                         Dobbertin InstOf City
                                               Zwei Linden Has_Room single_room1
<singleroom>Single room</singleroom>           single_room1 InstOf Single_Room
                                               single_room1 Has_Rate rate1
                                               rate1 InstOf Rate
<price>25,66</price>                           rate1 Price 25,66
<currency>EUR</currency>                       rate1 Currency EUR
                                               Zwei Linden Has_Room double_room1
<doubleroom>Double room</doubleroom>           double_room1 InstOf Double_Room
                                               double_room1 Has_Rate rate2
                                               rate2 InstOf Rate
<price>43,66</price>                           rate2 Price 43,46
<currency>EUR</currency>                       rate2 Currency EUR
113
Comparison of CREAM and S-CREAM
[Figure: as before, a document is annotated either manually (M) or by information extraction (IE, A1) producing <hotel>Zwei Linden</hotel> <city>Dobbertin</city>. Additional steps A2 and A3 lead via a discourse representation (DR) with Hotel and City to the relational metadata Zwei Linden Located_at Dobbertin over the ontology Thing, region, accommodation, City, Hotel.]
Currently: simple centering model
Future: learn coherency rules
Core processes: Input, Output
•(M) Manual Annotation (OntoMat): Relational Metadata
•(A1) Information Extraction (Amilcare): XML annotated Document
114
IE and Wrapper Learning
•Boosted wrapper induction
•Exploiting linguistic constraints
•Hidden Markov models
•Data mining and IE
•Bootstrapping
•First-order learning
115
Wrapper
No tutorial about IE and wrapper learning, but…
•IE often focuses on a small number of classes
•Is not easily adaptable to new domains
•Needs a lot of training examples
Needed
•It would be great if IE scaled to a large number of classes (concepts) on a large amount of unlabeled data
116
SemTag
•The goal is to add semantic tags to the existing HTML body of the web.
•SemTag uses TAP, where TAP is a public, broad, shallow knowledge base.
•TAP contains lexical and taxonomical information about popular objects like music, movies, sports, etc.
Example:
"The Chicago Bulls announced that Michael Jordan will…"
Will be:
The <resource ref="http://tap.stanford.edu/BasketballTeam_Bulls">Chicago Bulls</resource> announced yesterday that <resource ref="http://tap.stanford.edu/AthleteJordan_Michael">Michael Jordan</resource> will...
Dill et al., SemTag and Seeker. WWW'03
117
SemTag
•Lookup of all instances from the ontology (TAP) – 65K instances
•Disambiguate the occurrences as:
–One of those in the taxonomy
–Not present in the taxonomy
•Placing labels in the taxonomy is hard
•Use bag-of-words approach for disambiguation
•3 people evaluated 200 labels in context – agreed on only 68.5% - metonymy
•Applied on 264 million pages
•Produced 550 million labels and 434 spots
•Accuracy 82%
Dill et al., SemTag and Seeker. WWW'03
118
The Self-Annotating Web
•There is a huge amount of non-formalized knowledge in the Web
•Use statistics to interpret this non-formalized knowledge and propose formal annotations:
semantics ≈ syntax + statistics?
•Annotation by maximal statistical evidence
119
PANKOW: Pattern-based ANnotation through Knowledge On the Web
•HEARST1: <CONCEPT>s such as <INSTANCE>
•HEARST2: such <CONCEPT>s as <INSTANCE>
•HEARST3: <CONCEPT>s, (especially/including) <INSTANCE>
•HEARST4: <INSTANCE> (and/or) other <CONCEPT>s
•Examples:
–countries such as Niger
–such countries as Niger
–countries, especially Niger
–countries, including Niger
–Niger and other countries
–Niger or other countries
instanceOf(Niger,country)
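The pattern instantiation and the "maximal statistical evidence" decision can be sketched as follows. The hit counts are made up and served from a local dictionary; in the approach described here they would come from a web search engine, which this sketch deliberately does not call. The pluralisation rule and all function names are illustrative.

```python
PATTERNS = [
    "{concepts} such as {instance}",        # HEARST1
    "such {concepts} as {instance}",        # HEARST2
    "{concepts}, especially {instance}",    # HEARST3
    "{concepts}, including {instance}",     # HEARST3
    "{instance} and other {concepts}",      # HEARST4
    "{instance} or other {concepts}",       # HEARST4
]


def plural(concept):
    """Naive English plural: country -> countries, hotel -> hotels."""
    if concept.endswith("y") and concept[-2] not in "aeiou":
        return concept[:-1] + "ies"
    return concept + "s"


def instantiate(instance, concept):
    """All pattern phrases for one (instance, concept) pair."""
    return [p.format(concepts=plural(concept), instance=instance) for p in PATTERNS]


def best_concept(instance, concepts, hit_count):
    """Pick the concept with the highest total hit count over all patterns."""
    scores = {c: sum(hit_count(p) for p in instantiate(instance, c)) for c in concepts}
    return max(scores, key=scores.get), scores


# Made-up hit counts standing in for search-engine results.
FAKE_HITS = {
    "countries such as Niger": 120,
    "Niger and other countries": 40,
    "rivers such as Niger": 15,
}

concept, scores = best_concept("Niger", ["country", "river"], lambda p: FAKE_HITS.get(p, 0))
print(instantiate("Niger", "country"))
print(concept, scores)
```

With these stand-in counts the evidence favours instanceOf(Niger,country); with real counts the same instance could also collect evidence for competing concepts such as river.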
120
Patterns (Cont'd)
•DEFINITE1: the <INSTANCE> <CONCEPT>
•DEFINITE2: the <CONCEPT> <INSTANCE>
•APPOSITION: <INSTANCE>, a <CONCEPT>
•COPULA: <INSTANCE> is a <CONCEPT>
•Examples:
–the Niger country
–the country Niger
–Niger, a country in Africa
–Niger is a country in Africa
instanceOf(Niger,country)
121
PANKOW Process
122
Gimme' The Context: C-PANKOW
•Contextualize the pattern-matching by taking into account the similarity of the Google abstract in which the pattern was matched and the one to be annotated
•Download a fixed number n of Google abstracts matching so-called clues and analyze them linguistically, matching the patterns offline:
–match more complex structures
–more efficient, as the number of Google queries only depends on n
–more offline processing, reducing network traffic
123
Comparison
System          #      Recall/Accuracy   Learning Accuracy
[MUC-7]         3      >> 90%            n.a.
[Fleischman02]  8      70.4%             n.a.
PANKOW          59     24.9%             58.91%
[Hahn98]-TH     325    21%               67%
[Hahn98]-CB     325    26%               73%
[Hahn98]-CB     325    31%               76%
[Alfonseca02]   1200   17.39% (strict)   44%
C-PANKOW        682    29.35%            74.37%
LA based on least common superconcept lcs of two concepts (Hahn et al. 98)
124
Web-scale information extraction
KnowItAll Idea:
–The Web is the largest knowledge base
–The goal is to find all instances corresponding to a given concept on the Web and extract them
The system:
–Is domain-independent
–Uses a bootstrapping technique
–Is based on linguistic patterns
KnowItAll vs. (C-)PANKOW
-PANKOW starts from a Web page and annotates a given term on the page using the Web
-KnowItAll starts from a concept and aims at finding all instances on the Web
(O. Etzioni et al., 2004)
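The opposite direction, starting from a concept and collecting its instances, can be illustrated with a single linguistic pattern applied to text. This is only a sketch of the idea under simplifying assumptions (one "such as" pattern, capitalized single-word instances, a hypothetical `extract_instances` helper); it is not the KnowItAll system and leaves out bootstrapping and assessment.

```python
import re
from typing import List


def extract_instances(plural_concept: str, text: str) -> List[str]:
    """Collect capitalized names that follow '<concept>s such as' in the text."""
    pattern = re.compile(
        rf"\b{re.escape(plural_concept)} such as ((?:[A-Z]\w*(?:, | and | or )?)+)"
    )
    found: List[str] = []
    for match in pattern.finditer(text):
        # Split the enumeration "A, B and C" into its members.
        for name in re.split(r", | and | or ", match.group(1)):
            name = name.strip()
            if name and name not in found:
                found.append(name)
    return found


sentence = "He visited countries such as Niger, Mali and Chad last year."
print(extract_instances("countries", sentence))
```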
125
References Semantic Annotation
•S. Handschuh, S. Staab (eds.). Annotation for the Semantic Web. IOS Press, 2003
•P. Cimiano, S. Handschuh, S. Staab. Towards the Self-annotating Web. 13th International World
Wide Web Conference, WWW 2004, New York, USA, May 17-22, 2004.
•Siegfried Handschuh, Creating Ontology-based Metadata by Annotation for the Semantic Web,
PhD Thesis, 2005.
•O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M. Popescu, T. Shaked, S. Soderland, D.S. Weld,
and A. Yates. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings
of the 13th World Wide Web Conference, pages 100–109, 2004.
•S. Dill, N. Eiron, D. Gibson, D. Gruhl, R. Guha, A. Jhingran, T. Kanungo, S. Rajagopalan, A.
Tomkins, J.A. Tomlin, and J.Y. Zien. SemTag and Seeker: bootstrapping the semantic web via
automated semantic annotation. In Proceedings of the 12th International World Wide Web
Conference, pages 178–186. ACM Press, 2003.
•S. Brin. Extracting patterns and relations from the World Wide Web. In Proceedings of the WebDB
Workshop at EDBT ’98, 1998.
•F. Ciravegna, A. Dingli, D. Guthrie, and Y. Wilks. Integrating Information to Bootstrap Information
Extraction from Web Sites. In Proceedings of the IJCAI Workshop on Information Integration on
the Web, pages 9–14, 2003.
•H. Cui, M.-Y. Kan, and T.-S. Chua. Unsupervised learning of soft patterns for generating
definitions from online news. In Proceedings of the 13th World Wide Web Conference, pages 90–
99, 2004.
•U. Hahn and K. Schnattinger. Towards text knowledge engineering. In AAAI’98/IAAI’98
Proceedings of the 15th National Conference on Artificial Intelligence and the 10th Conference on
Innovative Applications of Artificial Intelligence, 1998
126
Agenda
•Introduction
•Foundations of the Semantic Web
•Ontology Learning
•Learning Ontology Mapping
•Semantic Annotation
•Using Ontologies
•Applications
127
Using Ontologies
Ontologies as:
•background knowledge for text clustering and
classification
•basis for recommender systems
•background knowledge in ILP
•knowledge for models in Statistical Relational
Learning
128
Text Clustering & Classification Approaches
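[Figure: documents (e.g. "Oman has granted …") are turned into a bag-of-words matrix with one row per document (Obj1 to Obj4) and one column per term; this matrix, optionally enriched with background knowledge, is the input of the clustering/classification algorithm]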
129
Text Clustering & Classification Approaches
Documents
Dok17892 crude
=============
Oman has granted term crude oil customers retroactive discounts from
official prices of 30 to 38 cents per barrel on liftings made during
February, March and April, the weekly newsletter Middle East Economic
Survey (MEES) said. MEES said the price adjustments, arrived at through
negotiations between the Omani oil ministry and companies concerned, are
designed to compensate for the difference between market-related prices
and the official price of 17.63 dlrs per barrel adopted by non-OPEC Oman
since February.
REUTER
Bag of Words
Oman         2
has          1
granted      1
term         1
crude        1
oil          2
customers    1
retroactive  1
discounts    1
...          ...
Further preprocessing steps
-Stopwords
-Stemming
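The bag-of-words construction with the two preprocessing steps can be sketched in a few lines. The stopword list is a tiny illustrative sample and `naive_stem` is a deliberately crude stand-in for a real stemmer such as Porter's; both are assumptions, not the preprocessing used in the tutorial.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"has", "the", "of", "to", "on", "and", "by", "for", "at", "from", "are"}


def naive_stem(token: str) -> str:
    """Crude stand-in for a stemmer: strip a plural 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def bag_of_words(text: str) -> Counter:
    """Lowercase, tokenize, drop stopwords, stem, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(t) for t in tokens if t not in STOPWORDS)


bow = bag_of_words(
    "Oman has granted term crude oil customers retroactive discounts. "
    "Non-OPEC Oman, Omani oil ministry."
)
print(bow.most_common(3))
```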
130
WordNet as an example ontology
109377 concepts (synsets)
144684 lexical entries
[Figure: excerpt of the WordNet hierarchy below Root for the word "oil" (EN:oil), showing the nodes entity, something, physical object, artifact, substance, chemical compound, organic compound, lipid, oil; covering, coating, paint, oil paint, oil color; cover, cover with oil; bless, "oil, anoint" (EN:anoint, EN:inunct); crude oil]
Use of superconcepts (hypernyms in WordNet)
•Exploit more generalized concepts
•e.g.: chemical compound is the 3rd superconcept of oil
Strategies: all, first, context
131
Ontology-based representation
strategy: add
[Figure: three successive term vectors for the example document, each with a frequency column.
(1) bag of words: Oman, has, granted, term, crude, oil, customers, retroactive, discounts, ...
(2) after preprocessing, with concepts added: Oman, granted, term, (C) term, crude, (C) crude, oil, (C) oil, customer, (C) customer, ...
(3) with superconcepts added: Oman, granted, term, (C) term, crude, (C) crude, oil, (C) oil, (C) lipid, (C) compound, ...]
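The enrichment of a term vector with concepts and superconcepts can be sketched as below. The strategy names add, repl and only are the ones that appear in the evaluation chart; their exact behaviour here, the toy term-to-concept map (playing the role of the "first" sense) and the toy hypernym chain are assumptions, and a real setup would query WordNet instead.

```python
from collections import Counter
from typing import Dict

# Hypothetical toy lexicon: term -> concept, concept -> direct superconcept.
CONCEPT_OF: Dict[str, str] = {"oil": "oil", "crude": "crude"}
HYPERNYM: Dict[str, str] = {
    "oil": "lipid",
    "lipid": "organic compound",
    "organic compound": "chemical compound",
}


def enrich(term_counts: Dict[str, int], strategy: str = "add", depth: int = 0) -> Counter:
    """Build a term/concept vector; `depth` is how many superconcept levels to add."""
    concepts: Counter = Counter()
    covered = set()
    for term, count in term_counts.items():
        concept = CONCEPT_OF.get(term)
        if concept is None:
            continue
        covered.add(term)
        for _ in range(depth + 1):
            concepts[f"(C) {concept}"] += count
            concept = HYPERNYM.get(concept)
            if concept is None:
                break
    terms = Counter(term_counts)
    if strategy == "add":   # keep all terms and add the concepts
        return terms + concepts
    if strategy == "repl":  # replace terms that have a concept, keep the rest
        kept = Counter({t: n for t, n in terms.items() if t not in covered})
        return kept + concepts
    if strategy == "only":  # concept vector only
        return concepts
    raise ValueError(f"unknown strategy: {strategy}")


vector = enrich({"oman": 1, "oil": 2, "crude": 1}, strategy="add", depth=3)
print(sorted(vector.items()))
```

With depth 3 the vector gains "(C) chemical compound", matching the slide's remark that chemical compound is the 3rd superconcept of oil.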
132
Evaluation of Text Clustering
[Figure: bar chart of mean purity (y-axis from 0.300 to 0.650) for tfidf-weighted and unweighted representations, grouped by background knowledge (false/true), hypernym depth (0/5), disambiguation strategy (context/first/all) and integration strategy (add/repl/only); labelled values: 0.570, 0.616, 0.618]
Evaluation parameters
•min 15, max 100, 2619 documents of the Reuters corpus
•cluster k = 60, with BiSec-KMeans
•measure: avg-purity
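Purity, the measure reported in this evaluation, can be computed as follows. The sketch assumes the usual definition (each cluster is credited with its most frequent class, and the credits are summed and divided by the number of documents); the function name and the toy labels are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def purity(cluster_ids: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of documents that belong to the majority class of their cluster."""
    per_cluster: Dict[Hashable, Counter] = {}
    for cluster, label in zip(cluster_ids, labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(labels)


# Two clusters over five documents with classes "a" and "b".
score = purity([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
print(score)
```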
133
Evaluation: OHSUMED Classification Results
Top 50 classes with WordNet and AdaBoost
134
Combine FCA & Text-clustering
1. preprocess Reuters documents and enrich them with background knowledge (WordNet)
2. calculate a reasonable number k (100) of clusters with BiSec-k-Means using cosine similarity
3. extract a description for all clusters
4. relate clusters (objects) with FCA
5. use the visualization of the concept lattice for better understanding
135
Explaining Clustering Results with FCA
compound, chemical compound
oil
refiner
chain of concepts with
increasing specificity
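How FCA relates clusters through shared descriptions can be shown on a toy formal context. The sketch enumerates all formal concepts by brute force, which is only feasible for a handful of objects; the cluster names and their attribute sets are invented for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(context: Dict[str, Set[str]]) -> Set[Concept]:
    """All formal concepts of a context given as object -> attribute set."""
    objects = list(context)
    all_attributes: Set[str] = set().union(*context.values())
    concepts: Set[Concept] = set()
    for size in range(len(objects) + 1):
        for chosen in combinations(objects, size):
            # Intent: attributes shared by all chosen objects.
            intent = set(all_attributes)
            for obj in chosen:
                intent &= context[obj]
            # Extent: every object that carries the whole intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Invented cluster descriptions forming a chain of increasing specificity.
clusters = {
    "c1": {"compound", "oil"},
    "c2": {"compound", "oil", "refiner"},
    "c3": {"compound"},
}
for extent, intent in sorted(formal_concepts(clusters), key=lambda c: len(c[1])):
    print(sorted(intent), "->", sorted(extent))
```

Each printed line is one node of the concept lattice: the more attributes in the description, the fewer clusters it covers.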
136
Explaining Clustering Results with FCA
Crude oil
barrel
137
Explaining Clustering Results with FCA
resin
palm
•The resulting concept lattice can also be interpreted as a concept hierarchy directly on the documents
•All documents in one cluster obtain exactly the same description
138
Using Ontologies
•WordNet and IR
–Query expansion with WordNet does not really improve the performance
(Ellen M. Voorhees. Query expansion using lexical-semantic relations. Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, p.61-69, July 03-06, 1994, Dublin, Ireland)
•Text Clustering and Ontologies
–WordNet synset chains
(Stephen J. Green. Building hypertext links by computing semantic similarity. IEEE Transactions on Knowledge and Data Engineering (TKDE), 11(5):713–730, 1999.)
–Dave et al.: worse results using an ontology (no word sense disambiguation)
(Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003. ACM, 2003.)
–Part-of-speech attributes and named entities used as features
(Vasileios Hatzivassiloglou, Luis Gravano, and Ankineedu Maganti. An investigation of linguistic features and clustering algorithms for topical document clustering. In SIGIR 2000: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28, 2000, Athens, Greece. ACM, 2000.)
139
Using Ontologies
Statistical concepts
•Calculating a kind of statistical concept and combining it with the classical bag-of-words representation
L. Cai and T. Hofmann. Text Categorization by Boosting Automatically Extracted Concepts. In Proc. of the 26th Annual Int. ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, 2003.
•Clustering words to set up a kind of concept
G. Karypis and E. Han. Fast supervised dimensionality reduction algorithm with applications to document categorization and retrieval. In Proc. of 9th ACM International Conference on Information and Knowledge Management, CIKM-00, pages 12–19, New York, US, 2000. ACM Press.
•Clustering words and documents simultaneously
Inderjit S. Dhillon, Yuqiang Guan, and J. Kogan. Iterative clustering of high dimensional text data augmented by local search. In 2nd SIAM International Conference on Data Mining (Workshop on Clustering High-Dimensional Data and its Applications), 2002.
140
Using Ontologies
Text Classification and Ontologies
•Using hypernyms of WordNet as concept features (no WSD, no significantly better results)
Sam Scott, Stan Matwin. Feature Engineering for Text Classification. Proceedings of the Sixteenth International Conference on Machine Learning, p.379-388, June 27-30, 1999
•Brown Corpus tagged with WordNet senses does not show significantly better results.
A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou. A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms. Journal of Intelligent Information Systems, 21(3):227–247, 2000.
•Map terms to concepts of the UMLS ontology to reduce the size of the feature set, use a search algorithm to find superconcepts; evaluation using KNN on Medline documents shows improvement.
B. B. Wang, R. I. Mckay, H. A. Abbass, and M. Barlow. A comparative study for domain ontology guided feature extraction. In Proceedings of the 26th Australian Computer Science Conference (ACSC-2003), pages 69–78. Australian Computer Society, 2003.
•Generative model consisting of features, concepts and topics, using WordNet to initialize the parameters for concepts; evaluation on the Reuters and Amazon corpora
Georgiana Ifrim, Martin Theobald, Gerhard Weikum. Learning Word-to-Concept Mappings for Automatic Text Classification. Learning in Web Search Workshop 2005.
141
Using Ontologies References
•Stephan Bloehdorn, Andreas Hotho: Text Classification by Boosting Weak Learners based on
Terms and Concepts. ICDM 2004: 331-334
•Andreas Hotho, Steffen Staab, Gerd Stumme: Ontologies Improve Text Document Clustering.
ICDM 2003: 541-544
•Andreas Hotho, Steffen Staab, Gerd Stumme: Explaining Text Clustering Results Using Semantic
Structures. PKDD 2003: 217-228
•Stephan Bloehdorn, Philipp Cimiano, and Andreas Hotho: Learning Ontologies to Improve Text
Clustering and Classification, Proc. of GfKl, to appear.
142
Ontology-based Recommender System
(Middleton, Shadbolt 2004)
143
Inferencing
•Improved recommendation accuracy
•Fewer problems with cold start (user/system)
144
Ontologies and Recommender References
•Middleton, S. E.; De Roure, D.; and Shadbolt, N. R. 2003. Ontology-based recommender systems.
In Staab, S., and Studer, R., eds., Handbook on Ontologies. Springer.
•Peter Haase, Marc Ehrig, Andreas Hotho, Björn Schnizler, Personalized Information Access in a
Bibliographic Peer-to-Peer System, In Proceedings of the AAAI Workshop on Semantic Web
Personalization, 2004, pp. 1-12. AAAI Press, July 2004.
•Peter Haase, Andreas Hotho, Lars Schmidt-Thieme, York Sure: Collaborative and Usage-Driven
Evolution of Personal Ontologies. ESWC 2005: 486-499
145
Agenda
•Introduction
•Foundations of the Semantic Web
•Ontology Learning
•Learning Ontology Mapping
•Semantic Annotation
•Using Ontologies
•Applications
146
Application: Data Integration
•Data integration identified as $100Bs world-wide market
–with significant govt interest creating a user-pull
•Ontology development efforts, in OWL, aimed at information mgt
ongoing in US govt include
–NIST, NLM, EPA, DHS, DoD, DOJ, FDA, NIH, USGS, NOAA
•Huge potential follow-on market -EAI for the small
business
–making external data and info resources integrable
•Could do for integration what Visicalc (excel) did for report
generation
147
Application: Ontoprise SemanticMiner
Company-wide Knowledge Management
Project at Deutsche Telekom
Goals
•Make the company's competences
•context
•visible
•usable
•Increase efficiency in sales and consulting
Result
•Integration of heterogeneous sources
•Guided search
148
Why does KUKA Robotics apply Semantic Technologies?
Background
•65% of all customers in the manufacturing industry change their suppliers because they are not satisfied with the service
•Service engineers spend a lot of time on known problems
Goal
•Capture and use the know-how of engineers and experts
•Decision support for choosing the right solution
•Increase customer satisfaction
Implementation
•Semantic Customer Service Support
149
SemanticGuide: embedded in SAP CS & MAM
150
Application: Web Services
Ultimate Goal: Application building by domain experts rather than by software engineers
–Avoid expensive communication of knowledge
–Faster response to market needs
•Ontology Learning for Web Services:
Creating semantic descriptions from other kinds of structures (Sabou et al. WWW2005)
•Annotating Web Services by semantics
•Usage of both:
Daniel Oberle, "Semantic Management of Web Services", Springer 2005/2006
151
Applications: Bibster
152
Application: Project Halo
•Knowledge acquisition from textbooks
•Wikipedia-like,
•for formal knowledge
153
Application: Project Halo