Semantic Heterogeneity in Global Information Systems


The Role of Metadata, Context and Ontologies

Vipul Kashyap and Amit Sheth

LSDIS, Dept. of Computer Science, Univ. of Georgia, Athens, GA

Dept. of Computer Science, Rutgers University, New Brunswick, NJ
June
Abstract
Semantic heterogeneity has been identified as one of the most important and toughest problems when dealing with interoperability and cooperation among multiple databases. It was earlier studied in the context of exchanging, sharing and integrating data, especially during the schema/view analysis phase of schema or view integration, or when writing a view or query using a multidatabase language. With the advent of global interconnectivity, we now need to deal with more heterogeneous information resources consisting of a variety of digital data, and the scale of the problem has changed from a few databases to millions of information resources, making it more important than ever to address this problem. It is also recognized that the problem has only become harder, and that simplistic solutions involving only representational or structural components of data will not work beyond a very restricted set of cases.
In this chapter we explore approaches to tackle the semantic heterogeneity problem in the context of Global Information Systems (GIS), which are systems geared to handle information requests on the Global Information Infrastructure (GII). These approaches are based on the capture and representation of metadata, contexts and ontologies. In order to handle information overload, it would be advantageous to abstract out the representational details of the underlying data and capture the information content by using domain specific metadata. The next important step is that of understanding the context of the query, using metadata to construct the context and identifying the relevant data in that context. Another critical issue that arises here is that of different vocabularies used to characterize similar information. We present an approach to deal with this problem at the metadata/context level by using terms from domain specific ontologies to construct metadata/contexts. We deal with semantic heterogeneity at this level and propose an approach using terminological relationships to achieve semantic interoperability.
Introduction
Many organizations face the challenge of interoperating among multiple independently developed database systems to perform critical functions. Three of the best known approaches to deal with multiple databases are tightly-coupled federation, loosely-coupled federation and interdependent data management [SL, She]. A critical task in creating a tightly-coupled federation is that of schema integration [H]. A critical task in accessing data in a loosely-coupled federation [A, HM] is to define a view over multiple databases or to define a query using a multidatabase language. The problem of semantic heterogeneity, which is defined in [KS] as the identification of semantically related objects in different databases and the resolution of schematic differences among them, is a critical issue in any of the above three tasks.
However, with global interconnectivity we now need to deal with more heterogeneous information resources consisting of a variety of digital data. Huge amounts of digital data in a variety of structured (relational databases), semi-structured (e-mail messages) and unstructured (image data) formats have been collected and stored in thousands of autonomous repositories and CD-ROMs. Affordable multimedia systems allow creation of multimedia data and support access and presentation of such data. These digital repositories are increasingly being made available on the fast evolving GII, of which the World Wide Web [L] is an oft-cited and popular example. A GIS now has to deal with millions of information resources, as opposed to a few databases in a multidatabase federation, and simplistic solutions involving only representational or structural components of data will not work beyond a very restricted set of cases.

Appears as a chapter in Cooperative Information Systems: Current Trends and Directions, Papazoglou and G. Schlageter, editors.
In this chapter we explore approaches that use metadata, context and ontologies to handle the semantic heterogeneity problem in a GIS. Two basic components of these approaches are (see the figure below):
- Use of metadata to capture the information content of the data in the underlying repositories. Intensional descriptions constructed from metadata, termed metadata contexts (m-contexts), are used to abstract from the structure and organization of the individual repositories.
- Terms (concepts, roles) in domain specific ontologies are used to characterize contextual descriptions, which are called conceptual contexts (c-contexts). Semantic interoperability is achieved by using terminological relationships between terms across ontologies.
Figure: The basic components of our approaches. [Three layers: VOCABULARY (ontological terms; domain-specific, application-specific) is used by CONTENT (metadata/context; intensional descriptions), into which STRUCTURE (repositories; autonomous, multiple formats) is abstracted.]
The key objective of such approaches should be to reduce the problem of knowing the contents and the structure of each of the huge number of information repositories to the significantly smaller problem of knowing the contents of the domain specific ontologies, which a user familiar with the domain is likely to know or easily understand. In this chapter we demonstrate the need for techniques which go beyond the structural and representational components of data, and focus on the application of those techniques to structured databases.
Different types of metadata may be stored in the system, such as indices and schema information.

The Rufus [LS] and InfoHarness [SKS] systems use automatically generated metadata to access and retrieve heterogeneous information independent of type, representation and location.
In the next section we discuss the different kinds of metadata and present an informal classification. We identify and propose domain specific metadata as the key for solving the semantic heterogeneity problem [SS].
We then discuss the construction of c-contexts from domain specific ontologies and their representation in a formalism that can be easily mapped to a description logic (DL) expression [S]. The issues of language and ontology involved are also discussed. These contexts are used to represent extra knowledge about the information content of the database which may not be represented in the schema of the database. A user query can also be represented as a context. Schema correspondences [S], which capture the associations between contexts and the underlying data, are also discussed.
The key to interoperability is vocabulary sharing among the intensional (m-context and c-context) descriptions associated with the various databases. Different concepts may be used to design contextual descriptions for different databases. We assume the existence of application and domain specific ontologies describing the information content of the various databases, from which contextual expressions may be constructed. In fact, ontologies are viewed in our approach as a special case of domain specific metadata. We then present an approach for semantic interoperability using terminological relationships across ontologies. We discuss the OBSERVER prototype [KSI], which demonstrates the use of synonym relationships to achieve semantic interoperability. Extensions of the above using hyponym and hypernym relationships [KIS] are also discussed. Finally, we present future and ongoing work and our conclusions.
What is Metadata?
The figure of the basic components of our approaches, in the introduction, illustrates the two components of our approach for addressing the information overload problem in the GII. Metadata is the pivotal idea on which both components depend. The function of the metadata descriptions is to abstract out and capture the essential information in the underlying data independent of representational details. This represents the first step in reducing information overload, as metadata descriptions are in general an order of magnitude smaller than the underlying data. In this section we discuss in detail our notion of metadata, the various types of metadata and the information they capture.
Metadata in its most general sense is defined as data or information about data. For structured databases, the most common example of metadata is the schema of the database. However, with the proliferation of various types of multimedia data on the GII, we shall refer to an expanded notion of metadata, of which the schema of structured databases is a small part. We use metadata to store derived properties of media useful in information access or retrieval. They may describe, or be a summary of, the information content of the data described in an intensional manner. They may also be used to represent properties of, or relationships between, individual objects of heterogeneous types and media. We now discuss a classification of the different types of metadata and characterize the amount of information content they capture. We also identify the types of metadata which will play a key role in enabling semantic interoperability.
A Classification of Metadata
We now present a classification of the various types of metadata used by various researchers to capture the information content represented in the various types of digital data (see the table below).
- Content Independent Metadata: This type of metadata captures information that does not depend on the content of the document with which it is associated. Examples of this type of metadata are location, modification-date of a document and type-of-sensor used to record a photographic image. There is no information content captured by these metadata, but they might still be useful for retrieval of documents from their actual physical locations and for checking whether the information is up-to-date or not.
- Content Dependent Metadata: This type of metadata depends on the content of the document it is associated with. Examples of content dependent metadata are size of a document, and max-colors, number-of-rows and number-of-columns of an image. We now present a categorization of content dependent metadata:
  - Direct Content-based Metadata: This type of metadata is based directly on the contents of a document. A popular example of this is full-text indices based on the text of the documents. Inverted trees and document vectors are examples of this type of metadata.
  - Content-descriptive Metadata: This type of metadata describes the contents of a document without direct utilization of the contents of the document. An example of this type of metadata is textual annotations describing the contents of an image. This type of metadata comes in two flavors:
    - Domain Independent Metadata: These metadata capture information present in the document independent of the application or subject domain of the information. Examples of these are the C/C++ parse trees and HTML/SGML document type definitions.
    - Domain Specific Metadata: Metadata of this type is described in a manner specific to the application or subject domain of information. Issues of vocabulary become very important in this case, as the terms have to be chosen in a domain specific manner. Examples of such metadata are relief and land-cover from the geographic (GIS) domain, and area and population from the Census domain. In the case of structured data, the database schema is an example of such metadata. Another interesting example is domain specific ontologies, terms from which may be used as vocabulary to construct metadata specific to that domain.
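The classification above forms a tree: content dependent metadata splits into direct content-based and content-descriptive metadata, and the latter splits again by domain dependence. The Python sketch below encodes that tree under our own assumptions: the enum labels, the `ancestry` and `is_content_dependent` helpers, and the lookup table of examples (taken from the text) are hypothetical and not part of any of the cited systems.

```python
from enum import Enum
from typing import Dict, List, Optional


class MetadataType(Enum):
    # Labels for the categories in the classification (names chosen for this sketch).
    CONTENT_INDEPENDENT = "content independent"
    CONTENT_DEPENDENT = "content dependent"
    DIRECT_CONTENT_BASED = "direct content-based"
    CONTENT_DESCRIPTIVE = "content-descriptive"
    DOMAIN_INDEPENDENT = "domain independent"
    DOMAIN_SPECIFIC = "domain specific"


# Parent of each category in the classification tree; None marks a top-level category.
PARENT: Dict[MetadataType, Optional[MetadataType]] = {
    MetadataType.CONTENT_INDEPENDENT: None,
    MetadataType.CONTENT_DEPENDENT: None,
    MetadataType.DIRECT_CONTENT_BASED: MetadataType.CONTENT_DEPENDENT,
    MetadataType.CONTENT_DESCRIPTIVE: MetadataType.CONTENT_DEPENDENT,
    MetadataType.DOMAIN_INDEPENDENT: MetadataType.CONTENT_DESCRIPTIVE,
    MetadataType.DOMAIN_SPECIFIC: MetadataType.CONTENT_DESCRIPTIVE,
}

# Hypothetical lookup table: examples from the text tagged with their most specific category.
EXAMPLES: Dict[str, MetadataType] = {
    "location": MetadataType.CONTENT_INDEPENDENT,
    "modification-date": MetadataType.CONTENT_INDEPENDENT,
    "type-of-sensor": MetadataType.CONTENT_INDEPENDENT,
    "size": MetadataType.CONTENT_DEPENDENT,
    "full-text index": MetadataType.DIRECT_CONTENT_BASED,
    "document vector": MetadataType.DIRECT_CONTENT_BASED,
    "HTML/SGML DTD": MetadataType.DOMAIN_INDEPENDENT,
    "land-cover": MetadataType.DOMAIN_SPECIFIC,
    "population": MetadataType.DOMAIN_SPECIFIC,
}


def ancestry(kind: MetadataType) -> List[MetadataType]:
    """Return the category followed by its ancestors, most specific first."""
    chain: List[MetadataType] = []
    current: Optional[MetadataType] = kind
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_content_dependent(example: str) -> bool:
    """True if the named example falls anywhere under content dependent metadata."""
    return MetadataType.CONTENT_DEPENDENT in ancestry(EXAMPLES[example])
```

For instance, the ancestry of land-cover runs from domain specific through content-descriptive up to content dependent metadata, while location has no content dependent ancestor at all.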
Metadata                               Media Type                   Metadata Type
Q-Features [H]                         Image, Video                 Domain Specific
R-Features [JH]                        Image, Video                 Domain Independent
R-Features [JH]                        Image, Video                 Content Independent
Impression Vector [KH]                 Image                        Content Descriptive
NDVI, Spatial Registration [S]         Image                        Domain Specific
Speech feature index [SW]              Audio                        Direct Content-based
Topic change indices [HK]              Audio                        Direct Content-based
Document Vectors [DF]                  Text                         Direct Content-based
Inverted Indices [M]                   Text                         Direct Content-based
Content Classification Metadata [R]    MultiMedia                   Domain Specific
Document Composition Metadata [R]      MultiMedia                   Domain Independent
Metadata Templates [M]                 Media Independent            Domain Specific
Land-cover, Relief [SK]                Media Independent            Domain Specific
Parent-Child Relationships [SKS]       Text                         Domain Independent
Contexts [SR, Sa, K]                   Structured Databases         Domain Specific
Concepts from Cyc [HS]                 Structured Databases         Domain Specific
User Data Attributes [SLS]             Text, Structured Databases   Domain Specific
Domain Specific Ontologies [KSI]       Media Independent            Domain Specific
Table: Metadata for Digital Media
Metadata: A means for capturing information content
In this section we discuss the information content captured by the various types of metadata enumerated in the previous section. We shall also identify the level (see the figure of the basic components of our approaches) at which this metadata may be used.
- Content Independent Information: This type of information is captured by content independent metadata and helps in the encapsulation of information into units of interest, which may be represented as objects in a data model.
- Capturing Representational Information: This type of information is typically captured by the content dependent metadata described in the previous section. This, along with domain independent metadata, which primarily captures the structural organization of the data, enables interoperability via navigational and browsing approaches, which depend on representational details of the data.
- Capturing Information Content: Information content is typically captured to various degrees by various types of content dependent metadata. Direct content-based metadata lies in a grey area, in the sense that it is not entirely divorced from the representational details. However, the metadata which helps abstract out representational details and capture information meaningful to a particular application or subject domain is domain specific metadata.
- Vocabulary for Information Content Characterization: Domain specific metadata can be constructed from terms in a domain specific ontology or concept libraries describing information in an application or subject domain. Thus we view ontologies as metadata, which themselves can be viewed as a vocabulary of terms for the construction of more domain specific metadata descriptions. Semantic interoperability at the vocabulary level is achieved with the help of terminological relationships.
The above discussion suggests that domain specific metadata capture information which is more meaningful with respect to a specific application or domain. The information captured by the other types of metadata primarily reflects the format and organization of the underlying data. This leads us to propose domain specific metadata as the most appropriate for dealing with issues related to semantic heterogeneity.
Constructing Intensional Descriptions from Domain Specific Metadata
Domain specific metadata can be used to construct intensional descriptions which capture the information content of the underlying data. We categorize these intensional descriptions as follows:
- Metadata Contexts (m-contexts): These descriptions primarily serve to abstract the representational details in the underlying data and may be viewed as boolean combinations of the individual metadatum. These contexts are typically populated beforehand by processing the underlying data. They may also be computed at run-time by using parameterized routines. Examples of this type of metadata, and of how they may be used to interoperate across multimedia data, are illustrated in [K].
- Conceptual Contexts (c-contexts): These descriptions primarily serve to capture domain knowledge and help impose a conceptual semantic view on the underlying data. C-contexts are constructed from terms (concepts, roles) in domain specific ontologies. The terms used in the construction of c-contexts might be interrelated via relationships, viz. terminological relationships and domain/range constraints on roles.
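Since an m-context is a boolean combination of individual metadata, it can be modelled as a predicate over each document's metadata record, with parameterized routines producing predicates at run-time. The Python sketch below assumes a hypothetical repository in which each document's metadata is a dictionary; the keys `media`, `land-cover` and `size`, the size unit, and all helper names are invented for illustration.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Predicate = Callable[[Metadata], bool]


def equals(key: str, value: Any) -> Predicate:
    """Test on a single metadatum, e.g. land-cover = forest."""
    return lambda md: md.get(key) == value


def at_most(key: str, limit: float) -> Predicate:
    """Parameterized routine: the limit is supplied when the m-context is built."""
    return lambda md: key in md and md[key] <= limit


def all_of(*preds: Predicate) -> Predicate:
    return lambda md: all(p(md) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda md: any(p(md) for p in preds)


def negate(pred: Predicate) -> Predicate:
    return lambda md: not pred(md)


def select(repository: List[Metadata], m_context: Predicate) -> List[str]:
    """Ids of the documents whose metadata satisfy the m-context."""
    return [md["id"] for md in repository if m_context(md)]


# Hypothetical metadata, as if extracted beforehand from the underlying data (size in MB).
REPOSITORY: List[Metadata] = [
    {"id": "img1", "media": "image", "land-cover": "forest", "size": 2.0},
    {"id": "img2", "media": "image", "land-cover": "urban", "size": 1.5},
    {"id": "img3", "media": "image", "land-cover": "forest", "size": 40.0},
    {"id": "doc1", "media": "text", "size": 0.1},
]

# "Small images whose land cover is not urban."
SMALL_NON_URBAN_IMAGES = all_of(
    equals("media", "image"),
    negate(equals("land-cover", "urban")),
    at_most("size", 10.0),
)
```

Selecting with `SMALL_NON_URBAN_IMAGES` keeps only `img1`: `img2` fails the land-cover test and `img3` fails the size limit. The design keeps the repository untouched and only evaluates descriptions over its metadata, which is the sense in which m-contexts abstract away representational details.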
In the rest of the chapter we focus on structured data and the use of c-contexts constructed from domain specific ontologies to capture the information content. The relationships between terms in the ontologies enable the representation of extra knowledge not represented in the database schema. We shall also discuss the cases where c-contexts may be constructed from different domain specific ontologies.
Constructing C-Contexts from Ontological Terms
In the introduction (see the figure of the basic components of our approaches) we identified metadata as the pivotal idea on which our approaches to address the information overload problem in the GII are based. In the previous section we discussed the various types of metadata and identified domain specific metadata as the most appropriate for handling semantic heterogeneity. One approach to constructing metadata which captures meaningful information with respect to an application domain is to use terms from domain specific ontologies as the vocabulary to characterize the information. We identified such metadata descriptions as c-contexts in the previous section; in this section we present a discussion of issues related to their representation and use.
We discuss the inadequacies of purely structural and mapping-based methods in representing object relationships, and discuss the advantages of representing contexts. We shall discuss a partial representation of contexts and equivalent description logic expressions. We shall also discuss operations for automatically comparing and manipulating contexts, and illustrate with the help of examples how they may be used to achieve interoperation across information sources. A brief discussion of issues relating to the language for representing contexts, and the ontologies from which the contexts may be constructed, is also given. In the rest of the chapter we shall refer to c-contexts as contexts unless otherwise specified.


Rationale for Context Representation
In characterizing the similarity between objects based on the semantics associated with them, we have to consider the real world semantics (RWS) of an object. It is not possible to completely define what an object denotes or means in the model world [G]. We propose the context of an object as the primary vehicle to capture the RWS of the object. We argue for the need for representing context by showing the inadequacy of purely structural representations. We also discuss the computational benefits of representing context.
Inadequacy of Purely Structural Representations
It has been suggested by Sheth and Gala and by Sheth and Kashyap [SG, SK], and by Fankhauser et al. [KN], that the ability to represent the structure of an object does not help capture the real world semantics of the object. It is not possible to provide a structural, and hence a mathematical, definition of the complex notion of real world semantics. In [LNE], a one-one mapping is assumed between the attribute definition and the attribute real world semantics. They define an attribute in terms of fixed descriptors such as Uniqueness, Lower/Upper Bound, Domain, Scale, etc., which are used to generate mappings between two attributes. They are also used to determine the equivalence of attributes. However, what they establish is the structural equivalence of these attributes, which is necessary but not sufficient to determine the semantic equivalence of the attributes.
Consider two attributes, person-name and department-name. We may be able to define a mapping between the value domains of these two attributes, but we know that they are not semantically equivalent. In order to capture this lack of equivalence, we propose that the mappings between the domains of the attributes be made with respect to context. We define two objects to be semantically equivalent if it is possible to define mappings with respect to all known and coherent contexts, and if the definition contexts of the objects are coherent with respect to each other. Definition contexts and the notion of coherence are discussed later in this section. Since the definition contexts of person-name and department-name are not coherent (one identifies an animate and the other an inanimate object), they are not defined to be equivalent.
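The person-name/department-name argument can be made concrete with a deliberately simplified check. The Python sketch below assumes that definition contexts are dictionaries from coordinates to finite sets of admissible values, that the structural test only compares value-domain names, and that equivalence reduces to "structurally mappable and coherent definition contexts" rather than the full condition over all known contexts. The coordinate name `denotes` and every function name are hypothetical.

```python
from typing import Dict, FrozenSet

# A definition context: contextual coordinate -> set of admissible values.
Context = Dict[str, FrozenSet[str]]


def coherent(c1: Context, c2: Context) -> bool:
    """Coherent if every coordinate the two contexts share admits a common value."""
    return all(c1[k] & c2[k] for k in c1.keys() & c2.keys())


def structurally_mappable(domain1: str, domain2: str) -> bool:
    """Stand-in for a purely structural test: the value domains have the same type."""
    return domain1 == domain2


def semantically_equivalent(domain1: str, ctx1: Context, domain2: str, ctx2: Context) -> bool:
    # Structural equivalence is necessary but not sufficient: the contexts must also cohere.
    return structurally_mappable(domain1, domain2) and coherent(ctx1, ctx2)


# Hypothetical definition contexts; the coordinate name "denotes" is invented for this sketch.
PERSON_NAME: Context = {"denotes": frozenset({"animate"})}
DEPARTMENT_NAME: Context = {"denotes": frozenset({"inanimate"})}
```

Both attributes have string domains, so the structural test succeeds, but the shared coordinate has no common value and the pair is rejected, which is exactly the gap a purely structural method cannot see.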
Computational Benefits of Representing Context
Shoham [Sho] has discussed the computational benefits that might accrue in modeling and representing context in AI and knowledge-based systems. We believe that some of those reasons are very relevant in the presence of information overload in the GII, and suggest the identification and representation of context:
- Economy of representation: In a manner akin to database views, contexts can act as a focusing mechanism when accessing the component databases on the GII. They can be a semantic summary of the information in a database or group of databases, and may be able to capture semantic information not expressed in the database schema. Thus unnecessary details can be abstracted from the user.
- Economy of reasoning: Instead of reasoning with the information present in the database as a whole, reasoning can be performed with the context associated with a database or a group of databases. This approach has been used in [Sa] for information resource discovery and query processing in multidatabases.
- Managing inconsistent information: In the GII, where databases are designed and developed independently, it is not uncommon to have information in one database inconsistent with information in another. As long as information is consistent within the context of the user's query, inconsistency in information from different databases may be allowed. This has been discussed with the help of an example in [S].
- Flexible semantics: An important consequence of associating abstractions/mappings with the context is that the same two objects can be related to each other differently in two different contexts. This is because two objects might be semantically closer to each other in one context than in the other.
A Partial Context Representation
There have been attempts to represent the similarity between objects in different databases. In the previous section we showed, with the help of an example, how a fixed set of descriptors used in [LNE] does not guarantee semantic similarity. Hence any representation of context which can be described by a fixed set of descriptors is not appropriate.
The descriptors (called meta-attributes or contextual coordinates) are not fixed, but are dynamically chosen to model the characteristics of the application domain in question. It is not possible a priori to determine all possible contextual coordinates which would completely characterize the semantics of the application domain. This leads to a partial representation of context as a collection of contextual coordinates:
$$\mathit{Context} = \langle (C_1, V_1)\,(C_2, V_2) \cdots (C_k, V_k) \rangle$$
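One way to hold this partial representation in a program is as an immutable, ordered tuple of coordinate/value pairs that can be extended whenever a new coordinate is chosen. The Python sketch below is an illustration under our own assumptions: `Var`, `Context` and the method names are hypothetical, and the value kinds (variables, symbol sets, database object or type names) mirror those described later in this section.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A Prolog-style variable such as X."""
    name: str


# A value is a variable, a finite set of symbols, or the name of a database object or type.
Value = Union[Var, FrozenSet[str], str]


@dataclass(frozen=True)
class Context:
    """Partial context: an ordered collection of (contextual coordinate, value) pairs."""
    pairs: Tuple[Tuple[str, Value], ...] = ()

    def coordinates(self) -> Tuple[str, ...]:
        return tuple(c for c, _ in self.pairs)

    def value(self, coordinate: str) -> Value:
        for c, v in self.pairs:
            if c == coordinate:
                return v
        raise KeyError(coordinate)

    def extend(self, coordinate: str, value: Value) -> "Context":
        """Coordinates are chosen dynamically, so a context can always be refined."""
        return Context(self.pairs + ((coordinate, value),))


# The query context <(author, X)(designee, X)> used later in this section.
QUERY_CONTEXT = Context((("author", Var("X")), ("designee", Var("X"))))
```

Because `extend` returns a new context, refining a description never disturbs the original, which fits the view that no fixed set of descriptors is ever complete.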
The table below shows how our context descriptions can be mapped to expressions in CLASSIC [BMR], a DL system. Using CLASSIC, it is possible to define primitive classes and, in addition, specify classes using intensional descriptions phrased in terms of necessary and sufficient properties that must be satisfied by their instances. The intensional descriptions may be used to express the collection of constraints that make up a context. Also, each $C_i$ roughly corresponds to a role and each $V_i$ roughly corresponds to fillers for the role the object might have. We shall also explain the meaning of the symbols $C_i$ and $V_i$ by using examples and by enumerating the corresponding CLASSIC expressions.
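The DL examples later in this section translate a context <(C1, V1) ... (Ck, Vk)> for an object O into a conjunction of the form (AND O (ALL C1 V1) ... (ALL Ck Vk)). The Python sketch below renders that pattern as a string. It is a hypothetical helper, not part of CLASSIC: it does not handle variables, and it writes finite sets as (ONE-OF {...}) to mirror the notation used in this section rather than CLASSIC's concrete syntax.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding of values: a concept name, a finite set of symbols,
# or a (concept, nested context) pair standing for "concept o <...>".
Value = Union[str, FrozenSet[str], Tuple[str, list]]
Context = List[Tuple[str, Value]]


def value_to_classic(value: Value) -> str:
    if isinstance(value, str):
        return value
    if isinstance(value, frozenset):
        return "(ONE-OF {" + ", ".join(sorted(value)) + "})"
    concept, nested = value
    return to_classic(concept, nested)


def to_classic(obj: str, context: Context) -> str:
    """Render <(C1,V1)...(Ck,Vk)> for object obj as (AND obj (ALL C1 V1) ... (ALL Ck Vk))."""
    restrictions = [f"(ALL {coord} {value_to_classic(v)})" for coord, v in context]
    return "(AND " + " ".join([obj] + restrictions) + ")"


# The definition context of HAS-PUBLICATION discussed later in this section.
HAS_PUBLICATION: Context = [
    ("article", "PUBLICATION"),
    ("author", ("EMPLOYEE", [("affiliation", frozenset({"research"}))])),
]
```

Applied to `HAS_PUBLICATION`, the helper nests the association context on `author` as an inner AND, giving the same shape as the DL expression for HAS-PUBLICATION given later.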
$C_i$ ($1 \le i \le k$) is a contextual coordinate denoting an aspect of a context.
- $C_i$ may model some characteristic of the subject domain and may be obtained from a domain specific ontology (discussed later in this section).
- $C_i$ may model an implicit assumption in the design of a database.
- $C_i$ may or may not be associated with an attribute $A_j$ of an object $O$ in the database.
Table: Contextual coordinate-value pairs and the corresponding CLASSIC expressions. [Columns: contextual coordinates and values; CLASSIC descriptions. The CLASSIC descriptions are built from the constructors AND, ALL, SAME-AS and FILLS.]
The value $V_i$ of a contextual coordinate $C_i$ can be represented in the following manner.

$V_i$ can be a variable:
- It can be unified (in the sense of Prolog) with a set of symbols, an object or type defined in the database, or another variable.
- It can be unified with another variable associated with a context.
- It can be used as a placeholder to elicit answers from the databases and impose constraints on them.

We have proposed a minor addition (role-set or classic-expression) to CLASSIC expressions [KSI] to enable retrieval of object properties.
Example
Suppose we are interested in people who are authors and who hold a post. We can represent the query context $C_q$ (discussed later in this section) as follows:
$$C_q = \langle (\mathit{author}, X)\,(\mathit{designee}, X) \rangle$$
The same thing can be expressed in a description logic (DL) as follows:
$$C_q = \texttt{(SAME-AS author designee)}$$
The terms author and designee may be roles chosen from a domain specific ontology.
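The shared variable X in this query context behaves like a Prolog variable: the first coordinate binds it and every later coordinate must agree with that binding. The Python sketch below illustrates this under stated assumptions: the flat records pairing the author of an article with the designee of a post are invented, and `match` is a hypothetical helper that performs only this simple one-sided unification.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Var:
    """A Prolog-style variable."""
    name: str


Bindings = Dict[Var, Any]


def match(context: List[Tuple[str, Any]], record: Dict[str, Any]) -> Optional[Bindings]:
    """Unify each (coordinate, value) pair with the record; None if unification fails."""
    env: Bindings = {}
    for coordinate, value in context:
        if coordinate not in record:
            return None
        actual = record[coordinate]
        if isinstance(value, Var):
            if value in env and env[value] != actual:
                return None  # X is already bound to a different filler
            env[value] = actual
        elif value != actual:
            return None  # a constant must match exactly
    return env


X = Var("X")
QUERY = [("author", X), ("designee", X)]  # <(author, X)(designee, X)>

# Hypothetical rows: each pairs the author of some article with the designee of some post.
ROWS = [
    {"author": "smith", "designee": "smith"},
    {"author": "smith", "designee": "jones"},
    {"author": "lee", "designee": "lee"},
]

# People who are both an author and a designee.
ANSWERS = [env[X] for env in (match(QUERY, row) for row in ROWS) if env is not None]
```

Only rows in which both roles are filled by the same individual survive, which is the effect the SAME-AS expression states declaratively.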
$V_i$ can be a set:
- The set may be an enumeration of symbols from a domain specific ontology.
- The set may be defined as the extension of an object or as elements from the domain of a type defined in the database.
- The set may be defined by posing constraints on pre-existing sets.
Example
Suppose we want to represent the assumptions implicit in the design of the object EMPLOYEE in a database. We can represent this as the definition context of EMPLOYEE, $C_{\mathrm{def}}(\mathrm{EMPLOYEE})$, as follows:
$$C_{\mathrm{def}}(\mathrm{EMPLOYEE}) = \langle (\mathit{employer}, \mathit{DeptTypes} \ldots \{\mathit{restTypes}\})\,(\mathit{article}, \mathrm{PUBLICATION}) \rangle$$
The same thing can be expressed in a DL as follows:
$$C_{\mathrm{def}}(\mathrm{EMPLOYEE}) = \texttt{(AND EMPLOYEE (ALL article PUBLICATION) (ALL employer DeptTypes} \ldots \texttt{\{restTypes\}))}$$
DeptTypes is a type defined in the database. The symbols restTypes, employer and article may be chosen from a domain specific ontology. The symbols employer and article may be related to attributes associated with the underlying database objects. The symbol restTypes acts as a role filler and may be mapped to a data value in the database. The definition context expresses an association between the objects EMPLOYEE and PUBLICATION which may not be captured in the database schema.
$V_i$ can be a variable associated with a context:
- This can be used to express constraints which the result of a query should obey, and is called the constraint context.
- The constraints would apply to the set, type or object the variable $X$ would unify with.
Example
Suppose we want all articles whose titles contain the substring "abortion". This can be expressed in the following query context:
$$C_q = \langle (\mathit{article}, X \circ \langle (\mathit{title}, \{y \mid \mathit{substring}(y, \text{"abortion"})\}) \rangle) \rangle$$
where $\circ$ denotes the association of the context $\langle (\mathit{title}, \{y \mid \mathit{substring}(y, \text{"abortion"})\}) \rangle$ with the variable $X$, and ensures that the answer satisfies the constraints expressed in the context. The same thing can be expressed in a DL as follows:
$$C_q = \texttt{(ALL article (ALL title \{y | substring(y, "abortion")\}))}$$
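A constraint context acts as a filter on whatever X unifies with. The Python sketch below shows that reading under our own assumptions: articles are plain dictionaries, the names `constraint_context`, `answers` and `ARTICLES` are hypothetical, and the substring test is case-sensitive, which the text does not specify.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Check = Callable[[str], bool]


def constraint_context(constraints: Dict[str, Check]) -> Callable[[Record], bool]:
    """Build the test that whatever X unifies with must pass."""
    def satisfied(obj: Record) -> bool:
        return all(coord in obj and check(obj[coord]) for coord, check in constraints.items())
    return satisfied


def answers(candidates: List[Record], context: Callable[[Record], bool]) -> List[Record]:
    """Bind X to each candidate article and keep those that satisfy the attached context."""
    return [obj for obj in candidates if context(obj)]


# <(title, {y | substring(y, "abortion")})>; the substring test is case-sensitive here.
TITLE_MENTIONS_ABORTION = constraint_context({"title": lambda y: "abortion" in y})

# Hypothetical candidate articles.
ARTICLES = [
    {"title": "the politics of abortion"},
    {"title": "state abortion policy"},
    {"title": "tax reform"},
]
```

An article without a title attribute fails the constraint rather than raising an error, matching the idea that only objects satisfying the context are returned.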
$V_i$ can be a set, type or object associated with a context. This is called the association context and may be used to express semantic dependencies between objects which may not be modeled in the database schema.
Example
Suppose we want to represent information relating publications to employees in a database. Let PUBLICATION and EMPLOYEE be objects in a database. The definition context of HAS-PUBLICATION can be defined as:
$$C_{\mathrm{def}}(\mathrm{HAS\text{-}PUBLICATION}) = \langle (\mathit{article}, \mathrm{PUBLICATION})\,(\mathit{author}, \mathrm{EMPLOYEE} \circ \langle (\mathit{affiliation}, \{\mathit{research}\}) \rangle) \rangle$$
where $\circ$ denotes the association of the context $\langle (\mathit{affiliation}, \{\mathit{research}\}) \rangle$ with the object EMPLOYEE. Association of a context with an object is similar to defining a view on the object extensions such that only those instances satisfying the constraints defined in the context are exported to the GIS. The symbols used as contextual coordinates (e.g., article, author, affiliation) are obtained from a domain specific ontology and may be mapped to attributes of database objects. The relationships between the database objects EMPLOYEE, PUBLICATION and HAS-PUBLICATION captured in the contextual description are not modeled in the database schema. The same thing can be expressed in a DL as follows:
$$C_{\mathrm{def}}(\mathrm{HAS\text{-}PUBLICATION}) = \texttt{(AND HAS-PUBLICATION (ALL article PUBLICATION) (ALL author (AND EMPLOYEE (ALL affiliation (ONE-OF \{research\})))))}$$
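The view-like reading of an association context can be sketched directly: associating a context with an object exports only the instances that satisfy its constraints. In the Python sketch below, the EMPLOYEE extension is invented, the helper `associate` is hypothetical, and we assume the ontology term affiliation maps to a database attribute of the same name (in general that mapping would come from a schema correspondence).

```python
from typing import Any, Dict, FrozenSet, List

Instance = Dict[str, Any]
Context = Dict[str, FrozenSet[str]]


def associate(extension: List[Instance], context: Context) -> List[Instance]:
    """Export only the instances whose values satisfy every constraint of the context."""
    return [
        inst for inst in extension
        if all(inst.get(coord) in allowed for coord, allowed in context.items())
    ]


# Hypothetical EMPLOYEE extension; we assume the ontology term "affiliation" maps
# directly to an attribute of the same name.
EMPLOYEE = [
    {"name": "alice", "affiliation": "research"},
    {"name": "bob", "affiliation": "administration"},
    {"name": "carol"},
]

# EMPLOYEE o <(affiliation, {research})>
RESEARCH_STAFF = associate(EMPLOYEE, {"affiliation": frozenset({"research"})})
```

An instance with no value for a constrained coordinate is not exported, and an empty context exports the whole extension, as a view with no selection condition would.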
Reasoning about and Manipulation of Contexts
We have proposed a partial representation of context in the previous sections. This can be used to abstract out the information content of the underlying data and help reduce the information overload in the GII. The next step is to use these representations meaningfully, to enable a GIS to focus on relevant information and to correlate information from the various information sources on the GII. In order to achieve this, the following need to be precisely defined [S]:
- Specificity: The most common relationship between contexts is the specificity relationship. Given two contexts $C_1$ and $C_2$, $C_1 \preceq C_2$ if and only if $C_1$ is at least as specific as $C_2$. This is useful when objects defined in a particular context have to transcend to a more specific or more general context, and is discussed in detail with examples in [S].
- Organization in a Lattice Structure: It is possible that two contexts may not be comparable to each other, i.e., it may not be possible to decide whether one is more specific than the other. Thus the specificity relationship gives us a partial order. The following useful operations on the context lattice can be defined:
  - $\mathit{overlap}(\mathit{Cntxt}_1, \mathit{Cntxt}_2)$: This is the common set of contextual attributes present in the two contextual descriptions.
  - $\mathit{coherent}(\mathit{Cntxt}_1, \mathit{Cntxt}_2)$: This operator determines whether the constraints determined by the values of the contextual coordinates are consistent.
    Example: Let $\mathit{Cntxt}_1 = \langle (\mathit{salary}, \{x \mid x \ldots\}) \rangle$ and $\mathit{Cntxt}_2 = \langle (\mathit{salary}, \{x \mid x \ldots\}) \rangle$ be two contexts whose constraints on salary cannot be satisfied together. Thus $\mathit{coherent}(\mathit{Cntxt}_1, \mathit{Cntxt}_2) = \mathrm{FALSE}$.
  - Greatest lower bound (glb) of two contexts: The contexts can be organized in a special kind of lattice structure called a meet semi-lattice, in which every pair of contexts has a greatest lower bound. Intuitively, the glb computes the conjunction of constraints expressed in the contextual descriptions.
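If every contextual coordinate is restricted to a finite set of admissible values, the operations above have simple set-theoretic readings. The Python sketch below works under that assumption only; real contexts may use intervals, types or variables, which it does not handle, and the salary bands in its example are invented stand-ins for the numeric salary constraints. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# A context restricted to finite value sets: coordinate -> admissible values.
Context = Dict[str, FrozenSet[str]]


def overlap(c1: Context, c2: Context) -> Set[str]:
    """Contextual coordinates present in both descriptions."""
    return set(c1) & set(c2)


def coherent(c1: Context, c2: Context) -> bool:
    """True if the constraints on every shared coordinate can hold together."""
    return all(c1[k] & c2[k] for k in overlap(c1, c2))


def glb(c1: Context, c2: Context) -> Context:
    """Conjunction of both contexts: all coordinates, with shared ones intersected."""
    result = dict(c1)
    for k, values in c2.items():
        result[k] = result[k] & values if k in result else values
    return result


def at_least_as_specific(c1: Context, c2: Context) -> bool:
    """c1 constrains every coordinate of c2 at least as tightly as c2 does."""
    return all(k in c1 and c1[k] <= c2[k] for k in c2)


# Hypothetical example contexts.
CONTEXT_A: Context = {
    "researchArea": frozenset({"socialSciences", "politicalSciences", "physics"}),
    "employer": frozenset({"history"}),
}
CONTEXT_B: Context = {"researchArea": frozenset({"socialSciences", "politicalSciences"})}
LOW_PAY: Context = {"salary": frozenset({"low", "medium"})}
HIGH_PAY: Context = {"salary": frozenset({"high"})}
```

Here `CONTEXT_A` and `CONTEXT_B` are incomparable (each constrains something the other does not), yet their glb is at least as specific as both, illustrating the partial order. For an incoherent pair such as `LOW_PAY` and `HIGH_PAY`, the glb has an empty value set, which plays the role of the bottom of the meet semi-lattice.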
Inferences using Contextual Descriptions
We now illustrate how reasoning with contextual descriptions can help enable semantic interoperability across different databases on the GII. The interoperability is achieved with respect to the query, which is represented as a context and known as the query context $C_Q$. The definition contexts of the various objects in the underlying databases enable the partial capture and representation of the information content in the databases. The query context is compared with the definition contexts, and this can be easily implemented as a combination of the glb and overlap operations discussed above.

he ely

Cn
con he alues


and
AND
andA critical assumption made in the examples illustrated b elo w i s hat t query nd a deition contexts
a re constructed from a common ontology hsi is a v ery unc alable assumption i n the con text of a
GIS One w a y of enhancing the scalabilit y s i to supp ort the use of prexisting and indep nden e tly
elop ed ften ado c domain on tologies This requires mec hanisms for comparing terms a cross
on tologies at runime whic h i s he t sub ject of d iscussion of the n ext ection s Issues of language to
represen t the con textual descriptions and on tologies a re discussed later in this section
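Consider the comparison of the query context $C_Q$ and the definition context $C_{def}(\text{PUBLICATION})$ illustrated in the following figure.

Figure: Comparison of contextual descriptions: identifying the relevant publications

$$C_{def}(\text{PUBLICATION}) = \langle (\mathit{researchArea}, \mathit{Department}) \rangle$$

$$C_Q = \langle (\mathit{author}, X)(\mathit{designee}, X)(\mathit{employer}, \mathit{Department})(\mathit{article}, Y \circ \langle (\mathit{title}, \{x \mid \mathit{contains}(x, \text{"abortion"})\}) \rangle)(\mathit{researchArea}, \{\mathit{socialSciences}, \mathit{politicalSciences}\}) \rangle$$

$$\mathit{compare}(C_{def}(\text{PUBLICATION}), C_Q) = \langle (\mathit{researchArea}, \{\mathit{socialSciences}, \mathit{politicalSciences}\}) \rangle$$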
The instances of the PUBLICATION object identified as belonging to the research areas socialSciences and politicalSciences are determined to be relevant to the user query. This is an example of using contextual expressions for determining information relevant to a query.
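Under the same simplifying assumption (a finite value set per coordinate), the comparison in the figure can be sketched as the glb of the two contexts restricted to their overlapping coordinates. The set of research areas assigned to the type Department below is hypothetical, as is the symbolic encoding of (employer, Department).

```python
from typing import Dict, FrozenSet, Optional

Context = Dict[str, FrozenSet[str]]

# Hypothetical extension of the type Department: the research areas that
# departments in the database cover (illustrative values only).
DEPARTMENT = frozenset({"socialSciences", "politicalSciences", "computerScience"})


def compare(c_def: Context, c_q: Context) -> Optional[Context]:
    """glb restricted to the overlapping coordinates; None if inconsistent."""
    shared = set(c_def) & set(c_q)
    result = {k: c_def[k] & c_q[k] for k in shared}
    if any(not values for values in result.values()):
        return None  # some shared coordinate has no admissible value left
    return result


c_def_publication = {"researchArea": DEPARTMENT}
c_q = {
    "employer": frozenset({"Department"}),  # only in C_Q, so it drops out
    "researchArea": frozenset({"socialSciences", "politicalSciences"}),
}
print(compare(c_def_publication, c_q))
```

Only researchArea survives, narrowed to the two areas requested by the query, which matches the result shown in the figure.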
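In the next example we illustrate how constraints in a query can be applied to information in a database to determine the relevant answers. Consider the query context $C_Q$ and the definition context $C_{def}(\text{HAS-PUBLICATION})$ illustrated in the following figure.

Figure: Comparison of contextual descriptions: incorporating a constraint from the query

$$C_{def}(\text{HAS-PUBLICATION}) = \langle (\mathit{author}, \text{FACULTY} \circ \langle (\mathit{affiliation}, \mathit{researchTypes}) \rangle)(\mathit{article}, \text{PUBLICATION}) \rangle$$

$$C_Q = \langle (\mathit{author}, X)(\mathit{designee}, X)(\mathit{employer}, \mathit{Department})(\mathit{article}, Y \circ \langle (\mathit{title}, \{x \mid \mathit{contains}(x, \text{"abortion"})\}) \rangle)(\mathit{researchArea}, \{\mathit{socialSciences}, \mathit{politics}\}) \rangle$$

$$\mathit{compare}(C_{def}(\text{HAS-PUBLICATION}), C_Q) = \langle (\mathit{author}, \text{FACULTY} \circ \langle (\mathit{affiliation}, \mathit{researchTypes}) \rangle)(\mathit{article}, \text{PUBLICATION} \circ \langle (\mathit{title}, \{x \mid \mathit{contains}(x, \text{"abortion"})\}) \rangle) \rangle$$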
The constraint in the query requiring the article titles to contain the word "abortion" is incorporated in the contextual descriptions describing the information content of the database and propagated to the object PUBLICATION. The modified contextual description thus characterizes only those instances of the object PUBLICATION which contain the word "abortion" in their titles.
Another interesting use of contextual descriptions is to rule out the possibility of a database having information relevant to a query. Suppose we are interested in all authors having a salary in some range $S_Q$, and suppose all the faculty members in the university database are represented as having a salary in a range $S_F$ that does not intersect $S_Q$. Consider the following contextual descriptions:

$$C_{def}(\text{FACULTY}) = \langle (\mathit{salary}, S_F) \rangle$$

$$C_Q = \langle (\mathit{author}, X \circ \langle (\mathit{salary}, S_Q) \rangle) \rangle$$

$$\mathit{compare}(C_{def}(\text{FACULTY}), C_Q) = \mathit{inconsistent}$$

The university database is therefore not relevant for the query Q.
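When the coordinate values are numeric ranges, the inconsistency check reduces to an interval intersection. In the sketch below, the closed-interval representation and the concrete bounds are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Closed interval [low, high] used as the value of a numeric coordinate."""
    low: float
    high: float

    def intersect(self, other: "Range") -> Optional["Range"]:
        lo, hi = max(self.low, other.low), min(self.high, other.high)
        return Range(lo, hi) if lo <= hi else None


def database_may_be_relevant(def_salary: Range, query_salary: Range) -> bool:
    """False means compare(C_def(FACULTY), C_Q) is inconsistent, so the
    database can be ruled out without accessing its data."""
    return def_salary.intersect(query_salary) is not None


# Hypothetical bounds: salaries of all faculty vs. salaries asked for by the query.
faculty_salaries = Range(0, 100_000)
query_salaries = Range(150_000, float("inf"))
print(database_may_be_relevant(faculty_salaries, query_salaries))  # False
```

The check uses only the contextual descriptions, which is what lets a GIS skip an irrelevant source before any data is retrieved.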

Mapping Contextual Descriptions to the Database Schema
As discussed earlier, the contextual descriptions serve to abstract out the underlying representational details and capture the information content. However, once the relevant high-level contextual descriptions have been identified, there is a need to retrieve the relevant data and display it to the user. In [S] we propose a uniform formalism used to map contextual descriptions to underlying data. Work on mapping intensional descriptions to SQL queries is reported in [BB]. Collet et al. [HS] have used articulation axioms to relate object classes in databases to concepts in the Cyc ontology. Our approach is similar to the above, but we have also defined an algebra in [S] to keep track of the changes in the mappings when the associated contextual descriptions change.
Each information system exports to the GIS a global object $O_G$ corresponding to each object $O$ it manages. The objects $O_G$ are obtained by applying the constraints in the definition context $C_{def}$ to the object $O$, and the user sees only the exported objects. The contextual coordinates $C_i$ of $C_{def}$ act as the attributes of $O_G$. The exported objects $O_G$ are associated with the objects and types defined in the database; this association might be implemented in different ways by the various component systems. We use schema correspondences, defined as follows, to express these associations (see the figure below):

$$\mathit{schCor} = \langle O_G,\ O,\ \{C_i \mid C_i \in C_{def}\},\ \mathit{attr}(O),\ M \rangle$$
$O_G$ is the exported GIS object of an object $O$ (or type $T$) defined in the database.
The attributes of the object $O_G$ are the contextual coordinates of the definition context $C_{def}$.
The mapping operation $\mathit{map}_O(C_i, A_i)$ stores the association between the contextual coordinate $C_i$ and the attribute $A_i$ of object $O$, whenever such an association exists.
The mapping $M$ between $O_G$ and $O$ can be evaluated using the projection rules enumerated and illustrated in [S].
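A schema correspondence can be held in a small record. The sketch below is a hypothetical in-memory rendering: `export` applies the constraints of $C_{def}$ to the rows of $O$ and renames attributes to contextual coordinates through $\mathit{map}_O$. It is a much simplified stand-in for the projection rules of [S], which are not reproduced here, and the object, attribute and coordinate names in the usage lines are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass
class SchemaCorrespondence:
    o_g: str                     # exported GIS object O_G
    o: str                       # database object (or type) O
    coordinates: FrozenSet[str]  # {C_i | C_i in C_def}
    attributes: FrozenSet[str]   # attr(O)
    mapping: Dict[str, str] = field(default_factory=dict)  # C_i -> A_i

    def map_o(self, coordinate: str, attribute: str) -> None:
        """Record map_O(C_i, A_i) when such an association exists."""
        if coordinate not in self.coordinates or attribute not in self.attributes:
            raise ValueError(f"no such coordinate/attribute: {coordinate}/{attribute}")
        self.mapping[coordinate] = attribute

    def export(self, rows: List[Row], c_def: Dict[str, FrozenSet[object]]) -> List[Row]:
        """Keep rows of O that satisfy C_def on the mapped coordinates and
        present them under the coordinate names of O_G. Coordinates with no
        mapped attribute are not checked here (simplification)."""
        def satisfies(row: Row) -> bool:
            return all(row[self.mapping[c]] in values
                       for c, values in c_def.items() if c in self.mapping)

        return [{c: row[a] for c, a in self.mapping.items()}
                for row in rows if satisfies(row)]


corr = SchemaCorrespondence("EMPLOYEE_G", "EMPLOYEE",
                            frozenset({"affiliation"}), frozenset({"SS", "Affil"}))
corr.map_o("affiliation", "Affil")
rows = [{"SS": "111", "Affil": "research"}, {"SS": "222", "Affil": "teaching"}]
print(corr.export(rows, {"affiliation": frozenset({"research"})}))
# [{'affiliation': 'research'}]
```

Keeping the mapping in the correspondence rather than in the contextual description means a change to $C_{def}$ only affects the filter, not the coordinate-to-attribute associations.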
Figure: Schema correspondences: mapping contextual expressions to underlying data (federation level: GIS object $O_G$ with attributes $C_1, C_2, \ldots, C_k$ and $C_{def}(O) = \langle (C_1, V_1) \ldots (C_k, V_k) \rangle$; database level: database object $O$ with attributes $A_1, A_2, \ldots, A_k$; the two levels are linked by $\mathit{map}_O(C_i, A_i)$)
We have discussed in [S] a set of projection rules which map a contextual expression to underlying database objects. We now discuss two examples which illustrate how extra information may be represented using contextual expressions.
Representing relationships between objects
We illustrate a case where the definition context of the object HAS-PUBLICATION captures its relationships with another database object, EMPLOYEE, in an intensional manner. These relationships are not stored in the database, and mapping the contextual description results in extra information being associated with the GIS object HAS-PUBLICATION$_G$. A naive user would ordinarily not be aware of this relationship. The detailed mapping of this relationship has been illustrated in [KS].


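Example: Consider the objects EMPLOYEE and PUBLICATION defined earlier and an object HAS-PUBLICATION(SS, Id) in the same database, which represents a relationship between employees and the publications they write.

$$C_{def}(\text{HAS-PUBLICATION}) = \langle (\mathit{author}, \text{EMPLOYEE} \circ \langle (\mathit{affiliation}, \{\mathit{research}\}) \rangle) \rangle$$

$$\text{HAS-PUBLICATION}_G = \mathrm{Join}_{SS = SS}\big(\text{HAS-PUBLICATION},\ \mathrm{Select}_{\mathit{affiliation} \in \{\mathit{research}\}}(\text{EMPLOYEE})\big)$$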
This results in only those objects being exported to the GIS which satisfy the constraints specified in the contextual descriptions. The user thus does not have to keep track of, or even know, the relationships between the various objects in the database.
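The mapping expression above can be mimicked over in-memory relations. The sketch below represents relations as lists of dictionaries and implements only the selection and the single-attribute equi-join needed here; it is not the algebra of [S], and the sample rows are invented for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def select(pred: Callable[[Row], bool], rel: List[Row]) -> List[Row]:
    """Relational selection."""
    return [r for r in rel if pred(r)]


def join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Equi-join on one attribute shared by both relations (nested loops)."""
    return [{**a, **b} for a in left for b in right if a[on] == b[on]]


# Hypothetical sample data.
employee = [{"SS": "111", "affiliation": "research"},
            {"SS": "222", "affiliation": "teaching"}]
has_publication = [{"SS": "111", "Id": "p1"},
                   {"SS": "222", "Id": "p2"},
                   {"SS": "111", "Id": "p3"}]

# HAS-PUBLICATION_G = Join_{SS = SS}(HAS-PUBLICATION,
#                     Select_{affiliation in {research}}(EMPLOYEE))
has_publication_g = join(
    has_publication,
    select(lambda r: r["affiliation"] in {"research"}, employee),
    on="SS",
)
print([r["Id"] for r in has_publication_g])  # ['p1', 'p3']
```

Publication p2 is not exported because its author's affiliation is not research, even though HAS-PUBLICATION itself stores no affiliation.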
Using terminological relationships in an ontology to represent extra information
In this section we illustrate an example in which terminological relationships obtained from an ontology are used to represent extra information. In the example illustrated below, the contextual coordinate researchInfo is a composition of two contextual coordinates, $\mathit{researchArea} \circ \mathit{journalTitle}$, and is obtained from the ontology of the domain. It is then used to correlate information between the objects PUBLICATION and JOURNAL. However, the contextual coordinate researchArea has not been modeled for the object PUBLICATION. This results in extra information about the relevant journals and research areas being associated with the object PUBLICATION, even though no information about research areas is modeled for PUBLICATION.
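Example: Consider a database containing the following objects:
PUBLICATION(Id, Title, Journal), where

$$C_{def}(\text{PUBLICATION}) = \langle (\mathit{researchInfo}, \text{JOURNAL} \circ \langle (\mathit{researchArea}, \mathit{DeptTypes})(\mathit{journalTitle}, \mathit{JournalTypes}) \rangle) \rangle$$

JOURNAL(Title, Area), where $C_{def}(\text{JOURNAL}) = \ldots$
The mapping expression is given as follows (see [S] for details):

$$\text{PUBLICATION}_G = \mathrm{Join}_{\mathit{researchArea} = \mathit{Area} \,\wedge\, \mathit{Title} = \mathit{Journal}}\big(\text{PUBLICATION},\ \mathrm{Select}_{\mathit{Area} \in \mathit{DeptTypes} \,\wedge\, \mathit{Title} \in \mathit{JournalTypes}}(\text{JOURNAL})\big)$$

Only journals belonging to the research areas corresponding to the departments are selected:

$$\mathrm{Select}_{\mathit{Area} \in \mathit{DeptTypes} \,\wedge\, \mathit{Title} \in \mathit{JournalTypes}}(\text{JOURNAL})$$

The join condition $\mathit{Title} = \mathit{Journal}$ ensures that only those articles which are from the research areas corresponding to the departments are exported to the GIS:

$$\mathrm{Join}_{\mathit{researchArea} = \mathit{Area} \,\wedge\, \mathit{Title} = \mathit{Journal}}$$

This is achieved even though the attribute Area is not modeled for PUBLICATION. Thus there is extra information, in terms of the association of DeptTypes with PUBLICATION, through the join condition.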
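In the same list-of-dictionaries style, the PUBLICATION/JOURNAL mapping can be sketched as follows. The journal titles, areas and the contents of DeptTypes and JournalTypes are hypothetical. The Area value attached to each exported publication plays the role of the researchArea coordinate, i.e., the extra information described above.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def export_publications(publication: List[Row], journal: List[Row],
                        dept_types: Set[str], journal_types: Set[str]) -> List[Row]:
    # Select_{Area in DeptTypes and Title in JournalTypes}(JOURNAL)
    selected = [j for j in journal
                if j["Area"] in dept_types and j["Title"] in journal_types]
    # Join on PUBLICATION.Journal = JOURNAL.Title; the journal's Area is
    # attached as researchArea although PUBLICATION has no such attribute.
    return [{**p, "researchArea": j["Area"]}
            for p in publication for j in selected if p["Journal"] == j["Title"]]


journal = [{"Title": "Journal A", "Area": "socialSciences"},
           {"Title": "Journal B", "Area": "physics"}]
publication = [{"Id": 1, "Title": "On policy", "Journal": "Journal A"},
               {"Id": 2, "Title": "On quarks", "Journal": "Journal B"}]
print(export_publications(publication, journal,
                          {"socialSciences", "politicalSciences"},
                          {"Journal A", "Journal B"}))
```

Publication 2 is dropped because its journal's area is not among the department types; publication 1 is exported together with a research area it never stored.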
Issues of language and ontology in context representation
In this section we discuss the issues of a language in which the explicit context representation discussed earlier can best be expressed. Besides, as discussed earlier, we use terms from domain-specific ontologies as vocabulary to characterize domain-specific information. We also discuss in this section issues of ontology, i.e., the vocabulary used by the language to represent the contexts.
Language for context representation
Earlier, we represented context as a collection of contextual coordinates and their values. The values themselves may have contexts associated with them. In this section we enumerate the properties desired of a language to express the context representation:

The language should be declarative in nature, as the context will typically be used to express constraints on objects in an intensional manner. Besides, the declarative nature of the language will make it easier to perform inferences on the context.
The language should be able to express the context as a collection of contextual coordinates, each describing a specific aspect of the information present in the database or requested by a query.
The language should have primitives for determining the subtype of two types, pattern matching, etc., in the model world, which might be useful in comparing and manipulating context representations.
The language should have primitives to perform navigation in the ontology to identify the abstractions related to the ontological objects in the query context or the definition contexts of objects in the databases.
The Ontology Problem
The choice of the contextual coordinates ($C_i$s) and the values assigned to them ($V_i$s) is very important in constructing the contexts. There should be ontological commitments that imply agreements about the ontological objects used between the users and the information system designers. In our case, this corresponds to an agreement on the terms and the values used for the contextual coordinates, both by a user formulating the query context and by a database administrator formulating the definition and association contexts. In an earlier example we defined $C_{def}(\text{EMPLOYEE})$ by making use of symbols like employer, affiliation and reimbursement from the ontology for contextual coordinates, and research, teaching, etc., for the values of the contextual coordinates.
We assume that each database has available to it an ontology corresponding to a specific domain. The definition and association contexts of the objects take their terms and values from this ontology. However, in designing the definition contexts and the query context, the issue of combining the various ontologies arises. We now enumerate various approaches one might take in building ontologies for a GIS comprising numerous information sources. Other than the ontological commitment, a critical issue in designing ontologies is the scalability of the ontology as more information sources enter the federation. Two approaches are discussed next.
The Common Ontology approach: One approach has been to build an extensive global ontology. A notable example of a global ontology is Cyc [G], consisting of around … objects. In Cyc, the mapping between each individual information resource and the global ontology is accomplished by a set of articulation axioms, which are used to map the entities of an information resource to the concepts (such as frames and slots) in Cyc's existing ontology [HS].
Another approach has been to exploit the semantics of a single problem domain (transportation planning) [CHK]. The domain model is a declarative description of the objects and activities possible in the application domain as viewed by a typical user, and the user formulates queries using terms from the application domain.
Re-use of Existing Ontologies/Classifications: We expect that there will be numerous information systems participating in the GIS. In this context, it is unrealistic to expect any one existing ontology or classification to suffice. We believe that the re-use of various existing classifications, such as the ISBN classification for publications and botanical classifications for plants, is a very attractive alternative. Examples of such classifications are illustrated in the following figure. These ontologies can then be combined in different ways and made available to the GIS. A critical issue in combining the various ontologies is determining the overlap between them. One possibility is to define the intersection and mutual exclusion points between the various ontologies.
Figure: Examples of generalization and aggregation hierarchies for ontology construction
Land Use and Land Cover Classification (USGS), a classification using a generalization hierarchy: Urban (Residential, Industrial, Commercial); Forest Land (Evergreen, Deciduous, Mixed); Water (Lakes, Reservoirs, Streams and Canals).
Population Area Classification (US Census Bureau), a classification using an aggregation hierarchy with the levels State, County, City, Rural Area, Tract, Block Group and Block.
Another approach has been adopted in [S]. The types determined to be similar by a sharing advisor are classified into a collection called a concept, and a concept hierarchy is thus generated based on the superconcept/subconcept relationship. These types may come from different databases, and their similarity or dissimilarity is based on heuristics, with user input as required.
Semantic Interoperability using Terminological Relationships
Earlier in this chapter we illustrated how terms from domain-specific ontologies can be used as vocabularies to characterize domain-specific information. This is an essential component of any approach to tackling the semantic heterogeneity problem on the GII. In the previous section we discussed how terms from an ontology may be used to construct contextual expressions, and how terminological relationships result in the representation of extra information not represented in the database schema. However, there was an implicit assumption of a common ontology behind the construction of the contextual expressions. As discussed earlier, this is a very unscalable assumption. In this section we discuss the issues involved when contextual descriptions may be constructed from different domain-specific ontologies, and how semantic interoperability may be achieved by interoperation across these domain-specific ontologies. We discuss approaches to achieve interoperation across ontologies using terminological relationships like synonyms, hyponyms and hypernyms.
Using synonyms to interoperate across ontologies
In this section we propose an approach to interoperate across ontologies which have been expressed using a description logic system like CLASSIC [BMR]. We have illustrated how contextual expressions may be represented using description logic expressions. We now discuss our work on the OBSERVER (Ontology Based System Enhanced with Relationships for Vocabulary heterogeneity Resolution) system, which enables interoperation across various independent, pre-existing ontologies based on synonym relationships between terms in different ontologies.

An architecture for interoperation
In this section we discuss an architecture for interoperation across domain-specific ontologies, shown in the following figure.

Figure: OBSERVER: an architecture to support interoperation across ontologies (a user node, where the user query enters, and component nodes, each with a Query Processor, an Ontology Server, ontologies, mappings and data repositories; an IRM node holds the interontology terminological relationships)
Query Processor: This component takes as input a user query expressed in DLs using terms from a chosen user ontology. It then navigates the other component ontologies of the Global Information System and translates terms in the user query into the component ontologies, preserving the semantics of the user query. This may result in a partial translation of the query at a component ontology. It combines the partial translations at the present ontology with those determined at previous ontologies, such that all constraints in the user query are translated.
Ontology Server: The Ontology Server provides information about ontologies to the Query Processor. It provides the definitions of the terms in the ontology and retrieves the data underlying the ontology. It is responsible for evaluating the mappings of the contextual expressions to the underlying data and retrieving the data which satisfy the constraints in the user query.
Interontologies Relationships Manager (IRM): Synonym relationships relating the terms in various ontologies are represented in a declarative manner in an independent repository. This enables interoperation across the various ontologies.
Ontologies: Each ontology is a set of terms of interest in a particular information domain, expressed using DLs in our work. The terms are organized as a lattice and may be considered as semantically rich metadata capturing the information content of the underlying data repositories. The various ontologies used in OBSERVER are illustrated in the Appendix.
The Interontologies Relationships Manager (IRM)
The IRM is the critical component which supports ontology-based interoperation. It also enhances the scalability of the query processing strategy by avoiding the need to design a common global ontology containing all the relevant terms in the Global Information System, and to invest time and energy in developing an ontology specific to one's needs when similar ontologies are already available. Relationships between terms across ontologies that capture the overlapping of domains are stored in a repository managed by the IRM. The repository also includes information about transformer functions, which can transform values or role-fillers from a domain in one ontology to another. The main assumption behind the IRM is that the number of relationships between terms across ontologies is an order of magnitude smaller than the number of all the terms relevant to the system.
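A minimal sketch of the repository the IRM manages, assuming terms are identified by (ontology, term) pairs: it stores synonym links and optional transformer functions for role-filler values. The class name is hypothetical, only one synonym per target ontology is kept, and the upper-casing transformer between keywords (Stanford-I) and subject (LSDIS) is an invented illustration, chosen only because the LSDIS translation later in this section writes the filler as METADATA.

```python
from typing import Callable, Dict, Optional, Tuple

Term = Tuple[str, str]  # (ontology name, term name)
Transformer = Callable[[str], str]


class InterontologyRelationshipsManager:
    """Stores synonym relationships between terms of different ontologies and
    transformer functions that convert role-filler values between them."""

    def __init__(self) -> None:
        self._synonyms: Dict[Term, Dict[str, str]] = {}
        self._transformers: Dict[Tuple[Term, Term], Transformer] = {}

    def add_synonym(self, a: Term, b: Term,
                    a_to_b: Optional[Transformer] = None,
                    b_to_a: Optional[Transformer] = None) -> None:
        # Synonymy is symmetric, so the link is recorded in both directions.
        self._synonyms.setdefault(a, {})[b[0]] = b[1]
        self._synonyms.setdefault(b, {})[a[0]] = a[1]
        if a_to_b is not None:
            self._transformers[(a, b)] = a_to_b
        if b_to_a is not None:
            self._transformers[(b, a)] = b_to_a

    def translate(self, term: Term, target_ontology: str) -> Optional[str]:
        """Synonym of `term` in the target ontology, or None if none is stored."""
        return self._synonyms.get(term, {}).get(target_ontology)

    def transform(self, src: Term, dst: Term, value: str) -> str:
        """Convert a role-filler value; identity if no transformer is stored."""
        transformer = self._transformers.get((src, dst))
        return transformer(value) if transformer else value


irm = InterontologyRelationshipsManager()
keywords, subject = ("Stanford-I", "keywords"), ("LSDIS", "subject")
irm.add_synonym(keywords, subject, a_to_b=str.upper, b_to_a=str.lower)
print(irm.translate(keywords, "LSDIS"))               # subject
print(irm.transform(keywords, subject, "metadata"))   # METADATA
print(irm.translate(keywords, "WN"))                  # None
```

Only the cross-ontology links are stored, which is consistent with the assumption that they are far fewer than the terms themselves.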
Hammer and McLeod [M] have suggested a set of relationship descriptors to capture relationships between terms across different, locally developed ontologies. A set of terminological relationships has been proposed in [il]. In the OBSERVER system we discuss an approach using synonym relationships. We will discuss extensions to the OBSERVER system for using hyponyms and hypernyms in the next section.
Query Processing in OBSERVER
We now discuss a query processing approach that involves the re-use of pre-existing ontologies and interoperation across them. The query processor performs the following important steps:
Translation of terms in the query into terms in each component ontology. The query processor obtains information from the IRM (discussed in the previous section) and the Ontology Server.
Combining the partial translations in such a way that the semantics of the user query is preserved.
Accessing the Ontology Server to obtain the data under the component ontology that satisfy the translated query. This basically amounts to the evaluation of the mappings of the contextual expressions to the underlying database schema, and has been discussed in the previous section.
Correlation of the objects retrieved from the various data repositories/ontologies.
We illustrate these steps using an example from [KSI], where a detailed discussion of the query processing strategy can also be found. Consider a contextual expression, represented in CLASSIC, used for the following query:
"Get the titles, authors, documents and the number of pages of doctoral theses that deal with metadata and have been published at least once."
Let us assume that there are four ontologies, described in detail in [KSI], as discussed below:
Stanford-I: This ontology is a subset of the Bibliographic Data Ontology developed as a part of the ARPA Knowledge Sharing Effort (http://www-ksl.stanford.edu/knowledge-sharing). It corresponds to the subtree under the concept reference of the Bibliographic Data Ontology and is illustrated in Appendix D.
Stanford-II: This ontology is also a subset of the Bibliographic Data Ontology and corresponds to the rest of that ontology. It is illustrated in Appendix C.
WN: This ontology was built by re-using a part of the WordNet ontology [il]. The concepts in the WN ontology are a subset of the terms in the hyponym tree of the noun "print media". It is illustrated in Appendix B.
LSDIS: This ontology is a local, home-grown ontology which represents our view of our Lab's publications, and is illustrated in Appendix A.
The query can be constructed from the concepts in Stanford-I (denoted as the user ontology) and represented in CLASSIC as follows:
[title author document pages] for (AND doctoral-thesis-ref (FILLS keywords "metadata") (ATLEAST 1 publisher))
We now enumerate the translations of the query into the ontologies discussed above and identify the translated and non-translated parts:
Stanford-I: The query always represents a full translation into the user ontology.
Stanford-II: There is a partial translation of the query at this ontology.
Translated part: [title author NULL number-of-pages] for (AND doctoral-thesis (ATLEAST 1 publisher))
Non-translated part: (FILLS keywords "metadata")
WN: Terms in the query are substituted by their definitions in the ontology from which they are chosen (Stanford-I) to obtain a complete translation into WN:
doctoral-thesis-ref = (AND thesis-ref (FILLS type-of-work "doctoral"))
thesis-ref = (AND publication-ref (FILLS type-of-work "thesis"))
Translated part: [name creator NULL pages] for (AND print-media (FILLS content "thesis" "doctoral") (ATLEAST 1 publisher) (FILLS general-topics "metadata"))
LSDIS: There is a partial translation at this ontology, where the value of the role-filler of the role keywords is transformed by the transformer function between the roles keywords (Stanford-I) and subject (LSDIS).
Translated part: [title authors location-document NULL] for (AND publications (FILLS type "doctoral-thesis") (FILLS subject "METADATA"))
Non-translated part: (ATLEAST 1 publisher)
Consider the partial translations of the user query at the ontologies Stanford-II and LSDIS. As the intersection of the non-translated parts of the partial translations into Stanford-II and LSDIS is empty, the intersection of both partial answers must satisfy all the constraints in the query. Intuitively:
From Stanford-II, doctoral theses about any subject which have been published at least once will be retrieved.
From LSDIS, documents about metadata which may not have been published will be retrieved.
The intersection of the above will be those documents classified as doctoral theses about metadata that have been published at least once, which is exactly the user query.
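The condition used above, an empty intersection of the non-translated parts, is easy to state in code. The sketch treats a query as a set of user-ontology terms and a component ontology's vocabulary as a hypothetical dictionary from those terms to local expressions. Real translations in OBSERVER operate on DL expressions, so this only illustrates the bookkeeping, and both vocabularies below are invented.

```python
from typing import Dict, FrozenSet, List, Tuple


def partial_translation(query_terms: FrozenSet[str],
                        vocabulary: Dict[str, str]) -> Tuple[Dict[str, str], FrozenSet[str]]:
    """Split query terms into translated ones (with their local expression)
    and the non-translated remainder."""
    translated = {t: vocabulary[t] for t in query_terms if t in vocabulary}
    non_translated = frozenset(t for t in query_terms if t not in vocabulary)
    return translated, non_translated


def answers_can_be_intersected(non_translated_parts: List[FrozenSet[str]]) -> bool:
    """True if every query term is translated by at least one component,
    i.e. the non-translated parts have an empty intersection.
    Expects at least one part."""
    return not frozenset.intersection(*non_translated_parts)


query = frozenset({"doctoral-thesis-ref", "keywords", "publisher"})
# Hypothetical vocabularies for two component ontologies.
stanford_ii = {"doctoral-thesis-ref": "doctoral-thesis", "publisher": "publisher"}
lsdis = {"doctoral-thesis-ref": '(AND publications (FILLS type "doctoral-thesis"))',
         "keywords": "subject"}

_, nt_stanford_ii = partial_translation(query, stanford_ii)  # non-translated: keywords
_, nt_lsdis = partial_translation(query, lsdis)              # non-translated: publisher
print(answers_can_be_intersected([nt_stanford_ii, nt_lsdis]))  # True
```

If both components had failed on the same term, intersecting their answers could not enforce that constraint, which is why the check looks for a common untranslated term.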
After obtaining the corresponding data for each ontology involved in the user query, that data must be combined to give an answer to the user. For each answer, represented as a relation, the Query Processor will transform the values into the format of the user ontology by invoking the appropriate transformer functions obtained from the IRM. After this initial step, the different partial answers can be correlated, since all of them are expressed in the language of the user ontology. The correlation plan corresponding to the translations illustrated above is:
User Query Objects = Objects([self title author document pages] for (AND doctoral-thesis-ref (FILLS keywords "metadata") (ATLEAST 1 publisher)))
Stanford-I Objects = Objects([self title author document pages] for (AND doctoral-thesis-ref (FILLS keywords "metadata") (ATLEAST 1 publisher)), Stanford-I)
Stanford-II Objects = Objects([self title author NULL number-of-pages] for (AND doctoral-thesis (ATLEAST 1 publisher)), Stanford-II)
WN Objects = Objects([self name creator NULL pages] for (AND print-media (FILLS content "thesis" "doctoral") (FILLS general-topics "metadata")), WN)
LSDIS Objects = Objects([self title authors location-document NULL] for (AND publications (FILLS type "doctoral-thesis") (FILLS subject "METADATA")), LSDIS)

Based on the combination of partial translations, the data retrieved from the repositories underlying the ontologies can be combined as follows:

$$\text{User Query Objects} = \text{Stanford-I Objects} \cup \text{WN Objects} \cup (\text{Stanford-II Objects} \cap \text{LSDIS Objects})$$
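Once every partial answer has been brought into the format of the user ontology, the combination above is plain set algebra. In this sketch the renaming dictionaries stand in for the IRM transformer functions, answers are reduced to hypothetical (title, author) pairs, and all sample rows are invented.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]  # (title, author) in user-ontology terms


def to_user_format(rows: Iterable[Dict[str, str]],
                   rename: Dict[str, str]) -> FrozenSet[Pair]:
    """Rename component attributes to user-ontology ones, keep (title, author)."""
    out = set()
    for row in rows:
        r = {rename.get(k, k): v for k, v in row.items()}
        out.add((r["title"], r["author"]))
    return frozenset(out)


def correlate(stanford_i: FrozenSet[Pair], stanford_ii: FrozenSet[Pair],
              wn: FrozenSet[Pair], lsdis: FrozenSet[Pair]) -> FrozenSet[Pair]:
    # Full translations contribute directly; the two partial translations
    # only contribute objects that both of them return.
    return stanford_i | wn | (stanford_ii & lsdis)


s1 = to_user_format([{"title": "T1", "author": "A"}], {})
s2 = to_user_format([{"title": "T2", "author": "B"},
                     {"title": "T3", "author": "C"}], {})
wn = to_user_format([{"name": "T4", "creator": "D"}],
                    {"name": "title", "creator": "author"})
ls = to_user_format([{"title": "T3", "authors": "C"},
                     {"title": "T5", "authors": "E"}], {"authors": "author"})
print(sorted(correlate(s1, s2, wn, ls)))  # [('T1', 'A'), ('T3', 'C'), ('T4', 'D')]
```

T2 (published, but not about metadata) and T5 (about metadata, but not known to be published) are each returned by only one partial translation, so neither reaches the user.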
Using hyponyms and hypernyms to interoperate across ontologies
Synonym relationships between terms in independently developed ontologies are very infrequent. On the contrary, as real examples confirm, hierarchical relationships like hyponyms and hypernyms are found more frequently. The substitution of a term by its hypernyms or hyponyms changes the semantics of the query. We try to translate the non-translated terms in the user ontology into terms of a target component ontology which are not their synonyms.
We substitute a non-translated term by the intersection of its immediate parents or the union of its immediate children. The loss of information is measured in both cases, and the translation with the smaller loss of information is chosen. This method is applied recursively until a full translation of the conflicting term is obtained. Using hyponym and hypernym relationships as described above can result in several possible translations of a non-translated term into a target ontology. Very simple, intuitive measures depending on the extensions of the terms in the underlying ontologies may help in choosing among the translations and minimizing the loss of information.
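One substitution step can be sketched as follows. The loss measure used here, the size of the symmetric difference between the term's extension and that of its substitute, is an assumed example of the simple extension-based measures mentioned above, not the measure defined in OBSERVER. The recursion until a full translation is reached is omitted, and the term names and extensions are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Extension = FrozenSet[str]


def substitute(term_ext: Extension,
               parents: Dict[str, Extension],
               children: Dict[str, Extension]) -> Tuple[str, List[str], int]:
    """Choose between AND of immediate parents and OR of immediate children,
    returning (operator, terms, loss) for the option with the smaller loss."""
    candidates = []
    if parents:
        # The parents' intersection contains the term's extension: it may admit extra objects.
        covered = frozenset.intersection(*parents.values())
        candidates.append((len(covered ^ term_ext), "AND", sorted(parents)))
    if children:
        # The children's union lies inside the term's extension: it may miss objects.
        covered = frozenset.union(*children.values())
        candidates.append((len(covered ^ term_ext), "OR", sorted(children)))
    if not candidates:
        raise ValueError("term has neither parents nor children in the target ontology")
    loss, op, terms = min(candidates)
    return op, terms, loss


doctoral = frozenset({"d1", "d2"})
parents = {"thesis": frozenset({"d1", "d2", "m1", "m2"})}
children = {"cs-phd-thesis": frozenset({"d1"})}
print(substitute(doctoral, parents, children))  # ('OR', ['cs-phd-thesis'], 1)
```

Here generalizing to the parent would admit two unwanted objects while specializing to the child loses only one, so the union of children is preferred.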
In order to obtain the immediate parents/children of a term in the target ontology, two different kinds of relationships related to the conflicting term must be used:
Synonyms, hyponyms and hypernyms between terms in the user and target ontologies.
Synonyms, hyponyms and hypernyms in the user ontology.
The three types of relationships in the first group are stored in the IRM repository. The second group are relationships between terms in the same ontology: synonyms are equivalent terms, hyponyms are those terms subsumed by the non-translated term, and hypernyms are those terms that subsume the conflicting term.
The task of getting the immediate parents/children is not easy to perform. To obtain the parents/children within the user ontology, the corresponding functions (subsumption) of the DL systems can be used. But we must combine that answer with the immediate parents/children in the target ontology, taking into account that some relationships stored in the IRM can be redundant (they were independently defined by the administrators of different ontologies), and such a task can be quite difficult. We would need a DL system dealing with distributed ontologies.
In the following figure we show two ontologies with some relationships between them (arrows are hyponym relationships, double arrows are synonyms, and dashed lines are interontology relationships) and, on the right, the integrated ontology, in which synonyms are grouped into one term. We can see that obtaining the immediate parents is not evident: for instance, to get the immediate parents of one of the B terms, we must first deduce that an A term is a child of another B term. There are also redundant relationships, such as one between an A term and a B term.
To work with the above relationships in a homogeneous way, one approach is to integrate the user and the target ontologies and to use the deductive power of the DL system to obtain the immediate parents/children of a term in the target ontology [IGP]. The properties between terms in the different ontologies are exactly the interontology relationships stored in the IRM, so no intervention of the user is needed. Although some of the previous relationships can be redundant, the DL system will classify the terms in the right place in the ontology. To know whether the resulting terms of the integrated ontology are primitive or defined, depending on A and B, the rules described in [IG] can be used.
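The effect of integrating the two ontologies can be approximated without a DL reasoner. The sketch below swaps in a plain graph technique rather than the DL classification the text relies on. It merges each synonym pair into one node (assuming every term has at most one synonym and the hierarchy is acyclic), treats hyponym and interontology links as child-to-parent edges, and takes as immediate parents the ancestors not reachable through another ancestor, which also discards redundant links. The term names loosely follow the figure's labelling, but the edges are invented.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (child, parent): child is a hyponym of parent


def integrate(edges: Iterable[Edge], synonyms: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build parent links of the integrated ontology; each synonym pair
    (a, b) becomes one node named 'a#b' (assumes at most one synonym per term)."""
    canon = {}
    for a, b in synonyms:
        canon[a] = canon[b] = f"{a}#{b}"
    parents: Dict[str, Set[str]] = {}
    for child, parent in edges:
        child, parent = canon.get(child, child), canon.get(parent, parent)
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    return parents


def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def immediate_parents(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Ancestors that are not themselves ancestors of another ancestor."""
    anc = ancestors(node, parents)
    return {a for a in anc
            if not any(a in ancestors(b, parents) for b in anc if b != a)}


edges = [("A2", "A1"), ("A3", "A2"),   # user ontology
         ("B3", "B1"), ("B2", "B3"),   # target ontology
         ("A3", "B1")]                 # interontology link (redundant)
onto = integrate(edges, synonyms=[("A2", "B3")])
print(immediate_parents("A3", onto))   # {'A2#B3'}
print(immediate_parents("B2", onto))   # {'A2#B3'}
```

The redundant link from A3 to B1 disappears because B1 is already reachable through the merged node A2#B3; a DL classifier reaches the same placement by subsumption reasoning instead of graph search.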
Figure: Integrating two ontologies (terms A1 to A6 of one ontology and B1 to B4 of the other; in the integrated ontology the synonyms A2 and B3 are grouped into the single term A2#B3)

Conclusions
We have discussed in this chapter the implications of the exponential growth of information on the GII for the semantic heterogeneity problem, and explored new techniques to enable a solution to it. Information overload, which arises as a consequence of the heterogeneity of the digital data and media types, is identified as the first problem. We explore an approach whereby metadata descriptions are used to abstract out the representational details and characterize the information content. An informal classification of the various types of metadata used to handle the wide variety of digital data was presented earlier in this chapter. The amount of information content captured by each type is identified, and domain-specific metadata are identified as critical to the semantic heterogeneity problem.
We then discussed how approaches dependent on representational or structural components are inadequate, and argued the need for the representation of contextual expressions. We discussed the representation of these expressions using description logics and proposed operations to reason with contextual expressions. We showed how extra information, which may not be represented in the database schema, may be represented using contextual descriptions. We illustrated how contextual expressions may be constructed from domain-specific ontologies, and how terminological relationships between concepts in an ontology enable the representation of extra information.
We have recognized the problem of vocabulary sharing as the most critical problem in the construction of contextual descriptions. We proposed approaches to tackle semantic heterogeneity (as opposed to representational heterogeneity) at this level. Semantic interoperability across ontologies is enabled by utilizing terminological relationships like synonyms, hypernyms and hyponyms.
We have thus explored various approaches based on metadata, context and ontologies, which we believe are important and provide the required capabilities to handle the semantic heterogeneity problem in the context of the GII. This research is a part of the InfoQuilt project, within the theme of Enabling Infocosm [SbF], at the Large Scale Distributed Information Systems Laboratory (http://lsdis.cs.uga.edu) at the University of Georgia. Some of the interesting research topics being investigated further within this theme are as follows:
- Use of domain-specific metadata to enable correlation of information across image and structured data. A future extension of this project will look into the use of metadata standards such as FGDC, together with domain-specific ontologies, to describe multimedia data.
- Extending the OBSERVER system to support hyponyms and hypernyms.
- Measures to characterize the loss of information incurred when a term is replaced by expressions with differing semantics. These measures are being developed and experimented with in the extended OBSERVER system.
- Providing a metadata-based reference link (<A MREF>) as an alternative to the physical reference link (<A HREF>). This is being implemented as an extension to HTML on the WWW. It enables the publisher of an HTML document to specify domain-specific metadata, which the enhanced server then maps to the underlying multimedia data. This would enable a higher-level, metadata-based meta-structure over the current WWW.
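The loss-of-information measures listed among these research topics are still being developed. As a hedged sketch of one plausible approach, borrowed from information retrieval rather than taken from this chapter, precision and recall can compare the items retrieved through a substituted term with the items the original term denotes. The function `substitution_quality` and the item sets below are hypothetical.

```python
from typing import Set, Tuple


def substitution_quality(relevant: Set[str], retrieved: Set[str]) -> Tuple[float, float]:
    """Precision and recall of `retrieved` against `relevant` (sets of item ids).

    By convention here, an empty set scores 1.0 for the corresponding ratio.
    """
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical extensions: items the user asked for (TEXTBOOK) versus the
# items returned when the broader term BOOK is substituted.
textbooks = {"d1", "d2"}
books = {"d1", "d2", "d3", "d4"}
print(substitution_quality(textbooks, books))  # (0.5, 1.0)
```

Replacing TEXTBOOK by the broader BOOK keeps recall at 1.0 but halves precision, which is the kind of loss such a measure would need to capture; substituting narrower terms would tend to lower recall instead.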
References
Y. Arens, C. Chee, C. Hsu, and C. Knoblock. Retrieving and Integrating Data from Multiple Information Sources. International Journal of Intelligent and Cooperative Information Systems, June.
J. Anderson and M. Stonebraker. Sequoia Metadata Schema for Satellite Images. In SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.
A. Borgida and R. Brachman. Loading Data into Description Reasoners. In Proceedings of ACM SIGMOD, May.
A. Borgida, R. Brachman, D. McGuinness, and L. Resnick. CLASSIC: A Structural Data Model for Objects. In Proceedings of ACM SIGMOD.
J. Blanco, A. Illarramendi, and A. Goñi. Building a Federated Database System: An Approach using a Knowledge Based System. International Journal of Intelligent and Cooperative Information Systems, December.
J. Blanco, A. Illarramendi, A. Goñi, and J. Pérez. Using a Terminological System to Integrate Relational Databases. In Information Systems Design and Hypermedia. Cépaduès Éditions.

T. Berners-Lee et al. World-Wide Web: The Information Universe. Electronic Networking: Research, Applications and Policy.
K. Böhm and T. Rakow. Metadata for Multimedia Documents. In SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.
F. Chen, M. Hearst, J. Kupiec, J. Pederson, and L. Wilcox. Metadata for Mixed-Media Access. In SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.
C. Collet, M. Huhns, and W. Shen. Resource Integration using a Large Knowledge Base in Carnot. IEEE Computer, December.
S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science.
U. Dayal and H. Hwang. View Definition and Generalization for Database Integration of a Multidatabase System. IEEE Transactions on Software Engineering, November.
C. Ferguson. Into the Infocosm. Computerworld Leadership Series, July.
P. Fankhauser, M. Kracker, and E. Neuhold. Semantic vs. Structural Resemblance of Classes. SIGMOD Record, special issue on Semantic Issues in Multidatabases, A. Sheth (ed.), December.
T. Gruber. Theory BIBLIOGRAPHIC-DATA. September. Available from the Stanford KSL knowledge-sharing web site.
U. Glavitsch, P. Schäuble, and M. Wechsler. Metadata for Integrating Speech Documents in a Text Retrieval System. In SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.




D. Heimbigner and D. McLeod. A Federated Architecture for Information Systems. ACM Transactions on Office Information Systems.
J. Hammer and D. McLeod. An Approach to Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems. International Journal of Intelligent and Cooperative Information Systems, March.
R. Jain and A. Hampapuram. Representations of Video Databases. In SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.
Y. Kiyoki, T. Kitagawa, and T. Hayama. A Metadatabase System for Semantic Image Search by a Mathematical Model of Meaning. SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.
B. Kahle and A. Medlar. An Information System for Corporate Users: Wide Area Information Servers. ConneXions: The Interoperability Report, November.
V. Kashyap and A. Sheth. Semantics-based Information Brokering. In Proceedings of the Third International Conference on Information and Knowledge Management (CIKM), November.
V. Kashyap and A. Sheth. Semantics-based Information Brokering: A Step towards Realizing the Infocosm. Technical Report DCS-TR, Department of Computer Science, Rutgers University, March.
W. Klaus and A. Sheth. Metadata for Digital Media. SIGMOD Record, special issue on Metadata for Digital Media, W. Klaus and A. Sheth (eds.), December.
V. Kashyap and A. Sheth. Schematic and Semantic Similarities between Database Objects: A Context-based Approach. Technical Report, LSDIS Lab, University of Georgia, January. Available from the LSDIS Lab web site (http://lsdis.cs.uga.edu). An abridged version appears in the VLDB Journal.
V. Kashyap and A. Sheth. Semantic and Schematic Similarities between Database Objects: A Context-based Approach. The VLDB Journal, October. To appear.
V. Kashyap, K. Shah, and A. Sheth. Metadata for Building the MultiMedia Patch Quilt. In S. Jajodia and V. Subrahmanian, editors, Multimedia Database Systems: Issues and Research Directions. Springer-Verlag.
W. Litwin and A. Abdellatif. Multidatabase Interoperability. IEEE Computer, December.
D. Lenat and R. V. Guha. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley Publishing Company Inc.
J. Larson, S. Navathe, and R. Elmasri. A Theory of Attribute Equivalence in Databases with Application to Schema Integration. IEEE Transactions on Software Engineering.
J. McCarthy. Notes on Formalizing Context. In Proceedings of the International Joint Conference on Artificial Intelligence.
G. Miller. WordNet: A Lexical Database for English. Communications of the ACM, November.
E. Mena, V. Kashyap, A. Illarramendi, and A. Sheth. Managing Multiple Information Sources through Ontologies: Relationship between Vocabulary Heterogeneity and Loss of Information. In Proceedings of the Workshop on Knowledge Representation meets Databases, in conjunction with the European Conference on Artificial Intelligence, August.



E. Mena, V. Kashyap, A. Sheth, and A. Illarramendi. OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies. In Proceedings of the First IFCIS International Conference on Cooperative Information Systems (CoopIS), June.
D. McLeod and A. Si. The Design and Experimental Evaluation of an Information Discovery Mechanism for Networks of Autonomous Database Systems. In Proceedings of the IEEE Conference on Data Engineering, February.
J. Ordille and B. Miller. Distributed Active Catalogs and Meta-Data Caching in Descriptive Name Services. In Proceedings of the International Conference on Distributed Computing Systems, May.
A. Sheth and S. Gala. Attribute Relationships: An Impediment in Automating Schema Integration. In Proceedings of the NSF Workshop on Heterogeneous Databases, December.
A. Sheth. Federated Database Systems for Managing Distributed, Heterogeneous and Autonomous Databases. Tutorial Notes, VLDB Conference, September.
Y. Shoham. Varieties of Context.
A. Sheth and V. Kashyap. So Far (Schematically) yet So Near (Semantically). Invited paper in Proceedings of the IFIP TC/WG Conference on Semantics of Interoperable Database Systems (DS), November. In IFIP Transactions A, North Holland.
A. Sheth and V. Kashyap. Media-independent Correlation of Information: What? How? In Proceedings of the First IEEE Metadata Conference, April. Available from the LSDIS Lab web site (http://lsdis.cs.uga.edu).
A. Sheth and J. Larson. Federated Database Systems for Managing Distributed, Heterogeneous and Autonomous Databases. ACM Computing Surveys, September.


K. Shoens, A. Luniewski, P. Schwarz, J. Stamos, and J. Thomas. The Rufus System: Information Organization for Semi-structured Data. In Proceedings of the VLDB Conference, September.
L. Shklar, A. Sheth, V. Kashyap, and K. Shah. InfoHarness: Use of Automatically Generated Metadata for Search and Retrieval of Heterogeneous Information. In Proceedings of CAiSE, June. Lecture Notes in Computer Science.
E. Sciore, M. Siegel, and A. Rosenthal. Context Interchange using Meta-attributes. In Proceedings of CIKM.
G. Wiederhold. Interoperation, Mediation and Ontologies. FGCS Workshop on Heterogeneous Cooperative Knowledge-Bases, December.


A. The LSDIS ontology
[Figure: concept hierarchy containing PUBLICATIONS, SUBJECT-BASED, TYPE-BASED, WORKFLOW-PUB, CONSISTENCY-PUB, INTEGRATION-PUB, INFORMATION-MODELING-PUB, METADATA-PUB, JOURNALS, THESIS, CONFERENCES and TECHNICAL-REPORTS.]
B. WN: A subset of the WordNet ontology
[Figure: concept hierarchy containing PRINT-MEDIA, PUBLICATION, JOURNALISM, PRESS, FLEET-STREET, WIRE-SERVICE, NEWSPAPER, MAGAZINE, PHOTOJOURNALISM, DAILY, PULP-MAGAZINE, COMIC-BOOK, SLICK-MAGAZINE, BOOK, PERIODICAL, PICTORIAL, SERIES, JOURNALS, TEXTBOOK, TRADE-BOOK, BROCHURE, MONTHLY, WEEKLY, QUARTERLY, BEST-SELLER, TICKET-BOOK, CRAMMER, PRIMER, REFERENCE-BOOK, SONGBOOK, PRAYER-BOOK, BREVIARY, MISSAL, DIRECTORY, COOKBOOK, BOOK-OF-PSALMS, ENCYCLOPEDIA, PHONE-BOOK, BLUE-BOOK, INSTRUCTION-BOOK, HANDBOOK, ANNUAL, WORDBOOK, FARMERS-CALENDAR, ALMANAC, BIBLE, GUIDEBOOK, DICTIONARY, THESAURUS, MANUAL, ROADBOOK, TRAVEL-BOOK, BILINGUAL-DICTIONARY, POCKET-DICTIONARY, ETYMOLOGICAL-DICTIONARY, REFERENCE-MANUAL and INSTRUCTIONS.]
C. Stanford: A subset of the Bibliographic-Data ontology
[Figure: concept hierarchy containing BIBLIO-THING, AGENT, CONFERENCE, DOCUMENT, PERSON, AUTHOR, ORGANIZATION, PUBLISHER, UNIVERSITY, TECHNICAL-REPORT, PROCEEDINGS, MISCELLANEOUS-PUBLICATION, BOOK, PERIODICAL-PUBLICATION, TECHNICAL-MANUAL, MULTIMEDIA-DOCUMENT, EDITED-BOOK, JOURNAL, NEWSPAPER, CARTOGRAPHIC-MAP, THESIS, COMPUTER-PROGRAM, MAGAZINE, ARTWORK, MASTER-THESIS and DOCTORAL-THESIS.]
D. Stanford I: A subset of the Bibliographic-Data ontology
[Figure: concept hierarchy containing REFERENCE, PUBLICATION-REF, NON-PUBLICATION-REF, PERSONAL-COMMUNICATION-REF, GENERIC-UNPUBLISHED-REF, BOOK-REF, TECHNICAL-REPORT-REF, EDITED-BOOK-REF, PROCEEDINGS-PAPER-REF, BOOK-SECTION-REF, MISC-PUBLICATION-REF, ARTICLE-REF, TECHNICAL-MANUAL-REF, MULTIMEDIA-DOCUMENT-REF, JOURNAL-ARTICLE-REF, NEWSPAPER-ARTICLE-REF, COMPUTER-PROGRAM-REF, CARTOGRAPHIC-MAP-REF, THESIS-REF, MAGAZINE-ARTICLE-REF, ARTWORK-REF, DOCTORAL-THESIS-REF and MASTER-THESIS-REF.]