

Ontologies in Terminology Work


Enabling Controlled Authoring

Dr Jörg Schütz

IAI Saarbrücken

joerg@iai.uni-sb.de

Copyright © 2001 by Dr. Jörg Schütz

What is it about?


Technical information and asserted knowledge in
SGML/XML environments


Acquisition


Production


Translation - localization, internationalization, globalization


Dissemination


Assimilation


Shared corporate (public) knowledge backbone for


guiding and supporting the processes


exchange with suppliers and customers


Application and deployment scenarios


Road map


An introduction to the technology used


Why ontologies as knowledge backbone?


How ontologies support and guide the processes


How to build and constrain an ontology


The ontology life-cycle (steps and rules)


How to represent an ontology (exchange formats)


Application examples


Conclusion


Questions and answers


What is Multidoc Technology?


Controlled Language enabling technology that identifies


spelling mistakes


grammar errors


stylistic weaknesses


terminology misuse and inconsistency


in SGML/XML encoded information objects


Based on IAI’s core language technology and a
prototype designed and built in an EC project with
automotive partners, a translation company and a
translation technology provider

BMW, Jaguar, Renault, Rolls-Royce, Saab, Volvo, and ITR and STAR


What is Language Technology?


“Research, development and production of methods,
methodologies and tools for the processing and the
deployment of human language through computer
systems.


We distinguish between software for the algorithmic
and programmatic implementation, and lingware for
the formalization of the language knowledge
resources (lexicons, terminologies, grammars and
ontologies).”


What is a Controlled Language?


Consists of


well-defined, unambiguous vocabulary (aka
terminology, preferably multilingual)


certain grammatical restrictions


set of writing guidelines for different types of
information objects (aka style rules)


Formalized in rules for computer use (similar to business
rules)


IF conditions THEN actions


Rule engine operating on the results of the linguistic analysis
module (linguistic engine)
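
The IF conditions THEN actions scheme can be sketched roughly as follows. This is only an illustration, not the Multidoc rule formalism: the Token fields, the rule names, the sample term list and the sentence-length limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    """One unit of (hypothetical) linguistic analysis output."""
    text: str
    lemma: str
    pos: str  # part-of-speech tag, e.g. "NOUN"


@dataclass
class Rule:
    name: str
    condition: Callable[[List[Token]], bool]  # the IF part
    action: Callable[[List[Token]], str]      # the THEN part, returns a message


# Hypothetical term base: deprecated term -> preferred term
DEPRECATED_TERMS = {"automobile": "car"}
MAX_WORDS = 20  # illustrative style limit, not taken from the slides


def uses_deprecated_term(sentence: List[Token]) -> bool:
    return any(t.lemma in DEPRECATED_TERMS for t in sentence)


def suggest_preferred_term(sentence: List[Token]) -> str:
    hits = [t.lemma for t in sentence if t.lemma in DEPRECATED_TERMS]
    return "; ".join(f"use '{DEPRECATED_TERMS[h]}' instead of '{h}'" for h in hits)


RULES = [
    Rule("term-check", uses_deprecated_term, suggest_preferred_term),
    Rule("sentence-length",
         lambda s: len(s) > MAX_WORDS,
         lambda s: f"sentence has {len(s)} words (limit {MAX_WORDS})"),
]


def run_rules(sentence: List[Token], rules: List[Rule] = RULES):
    """Fire every rule whose condition holds and collect its message."""
    return [(r.name, r.action(sentence)) for r in rules if r.condition(sentence)]


sentence = [
    Token("Park", "park", "VERB"),
    Token("the", "the", "DET"),
    Token("automobile", "automobile", "NOUN"),
]
messages = run_rules(sentence)
```

Keeping conditions and actions as separate callables means a rule set can be customized without touching the engine loop.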


Why is CL important?


Supports the complete information cycle consisting of
acquisition, production, translation, dissemination and
assimilation of technical information in several
dimensions:


terminological standardization


accuracy


readability and comprehensibility


reusability and maintainability


cost-effective, controllable and benchmarkable translation


reduced lead times and faster time to market



How does it work? (1)

[Diagram: Morphological Analyser+; Spell Checker, Grammar Checker, Term Checker and Style Checker; resources: Lexicons, Grammar Rules, Term DBs, Style Rules, DTD Info]


How does it work? (2)

[Diagram: Language Technology, with the layers Resources, LT Modules and Formats; labels: Grammar & Style, Terminology, Orthography, Information Mapping*, SGML/HTML/XML tags of a DTD or Schema]

* Conformance Checking


Why ontologies?


Create and maintain the vocabulary of the domain


Product data: parts lists, nomenclature, ...


Terminology: different user profiles, views, ...


Translation: styles, cultures, regulations, ...


Support the different processes and the knowledge
workers (writers, translators, workshop employees,...)


Encode knowledge about the domain beyond the capabilities
of term banks, specialized dictionaries, glossaries, indices and
thesauri


Search, retrieval and filtering of technical information


Sharing and exchange of technical information


Enable automation, e.g. controlled language application


How ontologies guide and support


Highly structured repository of knowledge making
explicit the


concepts


attributes of concepts


properties of concepts


relationships between concepts


of a given domain


Encoded (multilingual) term formation rules for


Term mining (identify new terms/concepts)


Term seeding (distribute new terms/concepts)


within a given domain


How to build an ontology: definition


“An explicit formal specification of how to represent
the objects, concepts, and other entities that are
assumed to exist in some area of interest and the
relationships that hold among them.” [Free On-line Dictionary of Computing]




“An explicit specification of some subject field. In the
context of our project, it is a formal and declarative
representation of the automotive service and repair
domain (operating system domain) vocabulary and rules.”




How to build an ontology: concept


... is a language-neutral symbol that is used to
represent meaning corresponding to a distinguished
type of entity.


... is characterized by


a unique label


an associated (intensional) definition


a set of characteristics encoded as attribute-value pairs


Example: CAR

    #_of_doors              2/4/5
    left/right-hand-drive   left/right
    turbo                   yes/no
    catalyst                yes/no
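
A concept with its label, definition and attribute-value pairs could be held in a small data structure like the one below. This is a minimal sketch: the class name, the `allows` helper and the placeholder definition text are assumptions for illustration, since the slide gives only the label and the attribute table.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Concept:
    label: str        # unique, language-neutral label
    definition: str   # intensional definition
    # attribute name -> permitted values
    attributes: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    def allows(self, attribute: str, value: str) -> bool:
        """True if `value` is a permitted value of `attribute`."""
        return value in self.attributes.get(attribute, ())


CAR = Concept(
    label="CAR",
    definition="(placeholder; the slide gives no definition text)",
    attributes={
        "#_of_doors": ("2", "4", "5"),
        "left/right-hand-drive": ("left", "right"),
        "turbo": ("yes", "no"),
        "catalyst": ("yes", "no"),
    },
)

ok = CAR.allows("turbo", "yes")
not_ok = CAR.allows("#_of_doors", "3")
```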


How to build an ontology: relation


... provides information about links between two (or
more) concepts and reflects how the respective
entities are related to each other:


super-concept/sub-concept (hyperonymy/hyponymy)


whole/part (meronymy)


participant role


Example


isa(<subordinate_concept>,<super_ordinate_concept>)


hasa(<whole>,<part>) and partof(<part>,<whole>)
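
The isa, hasa and partof relations above can be sketched as a small in-memory store. The class and method names other than isa/hasa/partof, the sample concepts, and the choice to let sub-concepts inherit the parts of their super-concepts are assumptions made for this illustration.

```python
from collections import defaultdict


class Ontology:
    def __init__(self):
        self._super = defaultdict(set)  # concept -> direct super-concepts
        self._parts = defaultdict(set)  # whole -> direct parts

    def isa(self, sub, sup):
        self._super[sub].add(sup)

    def hasa(self, whole, part):
        self._parts[whole].add(part)

    def partof(self, part, whole):
        # partof is the inverse reading of hasa
        self.hasa(whole, part)

    def ancestors(self, concept):
        """All super-concepts reachable through isa (transitive)."""
        seen, stack = set(), [concept]
        while stack:
            for sup in self._super.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    def parts(self, whole):
        """Direct parts plus parts inherited from super-concepts."""
        result = set(self._parts.get(whole, ()))
        for sup in self.ancestors(whole):
            result |= self._parts.get(sup, set())
        return result


onto = Ontology()
onto.isa("CABRIOLET", "CAR")
onto.isa("CAR", "VEHICLE")
onto.hasa("CAR", "ENGINE")
onto.partof("WHEEL", "VEHICLE")

cabriolet_ancestors = onto.ancestors("CABRIOLET")
cabriolet_parts = onto.parts("CABRIOLET")
```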


How to build an ontology: relation (2)


Example


effector(<activity>,<entity>)


effected(<activity>,<entity>)


affected(<activity>,<entity>)


affector(<activity>,<entity>)


location(<activity>,<entity>)


instrument(<activity>,<entity>)


attribuant(<state>,<entity>)
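
The participant-role relations listed above could be stored as simple facts and queried per activity. The sketch below is illustrative only: the role names come from the slide, but the sample activity, the entities, and the reading of each role (for instance that the effector is the one carrying out the activity) are assumptions.

```python
from typing import NamedTuple

# Role names as listed on the slide
ROLES = {"effector", "effected", "affected", "affector",
         "location", "instrument", "attribuant"}


class RoleFact(NamedTuple):
    role: str
    situation: str  # an activity or a state
    entity: str


def fact(role, situation, entity):
    """Build a role fact, rejecting role names outside the known set."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return RoleFact(role, situation, entity)


# Hypothetical facts about one service activity
FACTS = [
    fact("effector", "REPLACE", "MECHANIC"),
    fact("affected", "REPLACE", "OIL_FILTER"),
    fact("instrument", "REPLACE", "FILTER_WRENCH"),
    fact("location", "REPLACE", "WORKSHOP"),
]


def participants(situation, facts=FACTS):
    """Map each role to the entity filling it for the given situation."""
    return {f.role: f.entity for f in facts if f.situation == situation}


replace_roles = participants("REPLACE")
```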


How to build an ontology: qualia


Need for multidimensional views to account for
different facets, e.g.


[gasoline] as a [liquid]


[gasoline] as a [combustible]


Qualia Theory of Pustejovsky (“mode of explanation”)


<constitutive> the relation between an object and its
constituent parts


[cons] with [whole] and [part]


<formal> that which distinguishes it within a large domain


[form] (isa)


<telic> its purpose and function


[purp] (purpose)


<agentive> factors involved in its origin or “bringing it
about”


[orig] (origin)


Quale meta-entity: qua(<quale>,<concept>)
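
One way to picture the qua(<quale>,<concept>) meta-entity is as a per-concept table of views keyed by the four qualia. In this sketch the class name and the assignment of [liquid] to the formal quale and [combustible] to the telic quale are assumptions; the slide names the two facets of gasoline but does not say which quale each belongs to.

```python
# Short quale names as used on the slide
QUALIA = ("cons", "form", "purp", "orig")


class QualiaStructure:
    """Multidimensional views on one concept, one set of values per quale."""

    def __init__(self, concept):
        self.concept = concept
        self._views = {q: set() for q in QUALIA}

    def qua(self, quale, value):
        """Record that the concept, seen under `quale`, is `value`."""
        if quale not in self._views:
            raise ValueError(f"unknown quale: {quale}")
        self._views[quale].add(value)

    def view(self, quale):
        return frozenset(self._views[quale])


gasoline = QualiaStructure("GASOLINE")
gasoline.qua("form", "LIQUID")        # gasoline as a liquid (assumed: formal)
gasoline.qua("purp", "COMBUSTIBLE")   # gasoline as a combustible (assumed: telic)

formal_view = gasoline.view("form")
telic_view = gasoline.view("purp")
```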


How to build an ontology: schema


Ontology life-cycle


Information gathering and structural design


Definition of concepts


Definition of constraints


Population with instances (terminology integration)


Delivery


Deployment

Development basis of the Multidoc Ontology

automotive documentation; software documentation

automotive terms (13,800 EN; 15,000 DE; 12,000 FR; 6,800 SE);
software terms (9,300 EN)


Ontology snapshot


Exchange formats


XML based including the tools used for creation and
maintenance (ontology and associated population)


Possible exchange formats:


Pure XML (serialized with DTD or schema)


Ontology exchange formats such as OIL, or even KIF (Knowledge Interchange
Format) and CGs (Conceptual Graphs), which are very close in notation and
were both submitted as draft ANSI standards


SALT’s XLT format (ISO 12620 related)


Topic Maps (ISO 13250 and XTM 1.0)
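
For the "pure XML" option, a concept could be serialized along the following lines. This is a sketch using Python's standard xml.etree.ElementTree; the element and attribute names (concept, attribute, value, label, name) are invented for the example and do not reflect the Multidoc DTD or schema.

```python
import xml.etree.ElementTree as ET


def concept_to_xml(label, attributes):
    """Build an XML element for one concept and its attribute-value pairs."""
    concept = ET.Element("concept", {"label": label})
    for name, values in attributes.items():
        attr = ET.SubElement(concept, "attribute", {"name": name})
        for value in values:
            ET.SubElement(attr, "value").text = value
    return concept


car = concept_to_xml("CAR", {
    "turbo": ["yes", "no"],
    "#_of_doors": ["2", "4", "5"],
})

# Serialize to text, then parse it back to show the round trip
xml_text = ET.tostring(car, encoding="unicode")
parsed = ET.fromstring(xml_text)
door_values = [v.text for v in parsed.findall("attribute[@name='#_of_doors']/value")]
```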


Evaluation of Topic Maps


Electronic information repository


Publishing and information/knowledge management


Integration, support and exchange with other resources of
corporate knowledge management systems


Topic Maps (ISO 13250)


Genesis dates back to the early 1990s: interchange of
computer documentation based on SGML; navigation
within this information resource similar to indices


Creation of the SGML DocBook DTD


Work on “Topic Navigation Maps” as a HyTime
application


Specification of Topic Maps evolved as a general
navigation enabling model by using independent (or
out-of-line) linking and addressing mechanisms, and a
basis for querying that extends full-text search


ISO Standard at the end of 1999


XML version at the end of 2000 (XTM)


Topic Maps: ISO 13250 Intro


“This International Standard provides a standardized
notation for interchangeably representing information
about the structure of information resources used to
define topics, and the relationships between topics. A
set of one or more interrelated documents that
employs the notation defined by this International
Standard is called a topic map.”


Topic Maps: main building blocks


Topic


topic types (categorization of a topic)


topic names (base name, display name, sort name)


Occurrence (linked resources)


occurrence role (mnemonic)


occurrence type (link to the topic which further characterizes
the role)


Association (relationship between two or more topics)





association types


association roles



Concept



Instance



Relation
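
The three main building blocks (topic, occurrence, association) can be modelled in a few lines. This is a simplified sketch and not an implementation of ISO 13250 or XTM: all class, field and method names, the topic ids and the resource path are assumptions. Note that the association type is itself the id of a topic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Occurrence:
    role: str      # mnemonic occurrence role, e.g. "manual"
    resource: str  # address of the linked resource


@dataclass
class Topic:
    id: str
    base_name: str
    types: List[str] = field(default_factory=list)  # ids of typing topics
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    type: str                # id of the topic that types the association
    members: Dict[str, str]  # association role -> topic id


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    associations: List[Association] = field(default_factory=list)

    def add(self, topic: Topic) -> Topic:
        self.topics[topic.id] = topic
        return topic

    def associate(self, type_id: str, **members: str) -> None:
        self.associations.append(Association(type_id, dict(members)))

    def related(self, topic_id: str, type_id: str) -> Set[str]:
        """Ids of topics sharing an association of the given type."""
        found: Set[str] = set()
        for a in self.associations:
            if a.type == type_id and topic_id in a.members.values():
                found |= {t for t in a.members.values() if t != topic_id}
        return found


tm = TopicMap()
tm.add(Topic("vehicle-type", "Vehicle type"))
tm.add(Topic("part-whole", "Part-whole relation"))
car = tm.add(Topic("car", "Car", types=["vehicle-type"]))
car.occurrences.append(Occurrence("manual", "manuals/car.xml"))
tm.add(Topic("engine", "Engine"))
tm.associate("part-whole", whole="car", part="engine")

related_to_car = tm.related("car", "part-whole")
```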


Topic Maps: more building blocks


Identity - enables the merging of topic maps: topics that are
semantically equivalent are combined into a single topic
that is the union of the characteristics of the two or more
original topics (cf. public subject, topic map grove)


Facets - enable the assignment of meta-data through
attribute/value pairs (facet/facet value); query filter
creation; not used to qualify topic map elements


facet value name (token)


facet value type (reference to a topic which further qualifies
the relevance of the value)


Scope - the limit of validity of an assignment of a
topic’s characteristic (name, occurrence, association)
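
Identity-based merging and scope can be illustrated together. The sketch below is a simplification under stated assumptions: topics carry only names, a subject identifier string stands in for identity, and the example URL, names and themes are invented. A name with no themes is treated as valid in every scope.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple


@dataclass
class Topic:
    identity: str  # subject identifier used to decide equivalence
    # each name is stored with the set of themes that form its scope
    names: Set[Tuple[str, FrozenSet[str]]] = field(default_factory=set)

    def add_name(self, name, *themes):
        self.names.add((name, frozenset(themes)))

    def names_in_scope(self, theme):
        """Names valid for `theme`; unscoped names are always valid."""
        return {n for n, scope in self.names if not scope or theme in scope}


def merge(map_a, map_b):
    """Merge two topic collections: same identity -> union of names."""
    merged = {}
    for topic in list(map_a) + list(map_b):
        target = merged.setdefault(topic.identity, Topic(topic.identity))
        target.names |= topic.names
    return merged


CAR_ID = "http://example.org/subject/car"  # hypothetical subject identifier

a = Topic(CAR_ID)
a.add_name("car", "en")

b = Topic(CAR_ID)
b.add_name("Auto", "de")
b.add_name("PKW", "de")

merged = merge([a], [b])
german_names = merged[CAR_ID].names_in_scope("de")
english_names = merged[CAR_ID].names_in_scope("en")
```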


Topic Maps: any buts?


Everything in a topic map is a topic


all types are defined as topics


scope is defined in terms of themes which are topics


Powerful model allowing


self-documentation


ontology description (the things the topic map consists of)


efficient navigation and querying


Control information


topic map processing


topic map templates (declarative part of a topic map)


Further research


query language, constraint schemas, user profiles, ...


Applications using ontologies


Controlled language based authoring


Term mining


Classification/indexing


Retrieval and filtering


Customer basis


automotive industry


printing machines industry


software industry


Highlights of the CL Product (1)


Customizable


terminology import based on XML schema (ISO 12620
inspired)


rules according to existing in-house style guidelines


linguistic engine adaptable to specific SGML/XML DTD or
schema elements (these elements trigger processing and thus
determine the performance of the linguistic engine)


Can be integrated into, and supports, existing IT
infrastructure


SGML editors, DMS, IMS and KMS


Available for several languages


DE, EN, FR, SE, ...


Highlights of the CL Product (2)


Increases hit rates of Translation Memory (TM)
systems (can also check TM content)


Enables the setup of quality assurance processes
(quality index and translatability index), e.g. SAE
J2450


Supports B2B and B2C processes and operations


Linguistic Engine has well-defined input/output
behavior (XML based) to allow for different
deployment scenarios


Highlights of the CL Product (3)


Client/Server implementation


GUI with error rendering, navigation and editing capability is
implemented in Java


Client (glue code between GUI and LE) is implemented in C


Linguistic Engine is implemented in C

OS Platforms: Solaris, Linux, HP-UX, Windows NT and 2000


Releases available


Integrated SGML/XML editor (tag-safe, tag-sensitive)


Full navigation within loaded information object
(multithreading capability)


Server architecture with integrated management facility
for user/group, resource and client configuration


Fully-fledged development environment for rule
customization


MS Word (MS Office) integration for mass market


Customer References


BMW AG - German (codename: Multilint)

Several German SMEs (codename: Tetris)

Sun Microsystems - English (codename: SunProof)

Volvo Car and Volvo Truck - Swedish

Saab - Swedish

Rolls-Royce and Bentley Motor Cars - English

Heidelberger Druckmaschinen AG - German

Siemens AG - German

DUDEN (language assistant) - German

Plan Software (electronic catalogues) - DE/EN/FR


Demonstration on request


Conclusion


Important aspects of an ontology as knowledge
backbone


increases productivity


avoids inconsistency


permits efficient exchange


allows for effective merge


Standardized model for language technology that


supports integration


effective deployment


Multilingual repository that guides and supports


(traditional) translation tasks


localization, internationalization and globalization processes


Further information


Ontologies in general


www.cyc.com


www.bestweb.net/~sowa


www.ontology.org


www.ontoknowledge.org


...


Topic map


www.topicmap.org


www.infoloom.com


www.topicmap.com (empolis/Bertelsmann)


www.mondeca.com


www.ontopia.net


...
