Sharing and Browsing
Linguistic Data
EMELD Arizona:
Terry Langendoen
Scott Farrar
Since Santa Barbara
Focus on morpho-syntax
Decided to build ontology (to be
discussed later in this talk)
Decided to build supporting tools
– smart search engine (Hedwig)
– editor
Some work on xml markup
The Problem
Currently there is no general way for
researchers in the endangered
languages community to
electronically share information.
The Web is the most likely tool that
could provide a solution.
The current WWW is not adequate.
An Example from the WWW:
Further Complications
What about other data formats?
– lexicons
– grammatical descriptions
– (comparative) word lists
– paradigms
– etc.
Warumungu Description
'Grammatical case suffixes' are those which
express grammatical relations (subject,
object, indirect object), like /karriny-ji/ in
(4). A noun without a case suffix is
interpreted as having Absolutive case -
/nanttu/ in (4) and /wangarri/ in (5) - or
as being the main predicator, or as
agreeing with some argument with
Absolutive case - /kumppu/ and
/pulyurrulyurru/ in (5).
(from J. Simpson 1998)
(4) Karriny-ji +ajjul nyirri-njina nanttu, ngapa-kajji.
    people-ERG +3pl.S put-PAST.CONT humpy, water-LEST
    'The people were erecting humpies for fear of the rain.' [JS:PND:RS]
(5) Nyirri-nyi +ama wangarri kumppu pulyurrulyurru.
    place-PAST.PUN +he rock ABS big.ABS red.ABS
    'He placed a big red hill.' [JS:PND:RS]
Chichewa Description
Other elements that appear as verbal
prefixes include modals – for
instance, -ngo- 'just, merely' – as
well as directional elements -ka- 'go'
and -dza- 'come'. These are placed
in the immediate pre-OM position,
after the tense. This is shown by the
following:
(from Mchombo 1998)
(8a) Mkângo s-ú-ná-ngo-wá-phwány-a maûngu . . .
     3-lion NEG-3SM-past-just-6OM-smash-fv 6-pumpkins . . .
     'The lion did not just smash them, the pumpkins . . .'
(8b) Mkângo u-ku-ká-phwány-á máûngu.
     3SM-pres.-go-smash-fv 6-pumpkins
     'The lion is going to smash some pumpkins.'
A Solution
Take advantage of new Web
technology
Build a community of practice on the
Semantic Web
What is the Semantic Web?
The Semantic Web
New markup: <xml>, <rdf>, <owl>
New tools: smart search engines,
ontologies, new editors
Meaning is encoded explicitly.
Pages are interpreted by a reasoner.
An Example from the Semantic
Web
New markup adds functionality to
existing <html> documents.
Example:
<rdf:Description rdf:about="#A110604">
  <rdf:type rdf:resource="#State" />
  <NS0:name>Tennessee</NS0:name>
</rdf:Description>
<rdf:Description rdf:about="#876555">
  <rdf:type rdf:resource="#Language" />
  <EMELD:name>Navajo</EMELD:name>
</rdf:Description>
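A minimal Python sketch of how a tool might read the two descriptions above and pick out the resource typed as a language. The enclosing `rdf:RDF` element and the namespace URIs bound to `NS0` and `EMELD` are assumptions made only so the fragment parses; the slide does not give them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: the example.org URIs below are placeholders; the slide does not
# say what the NS0 and EMELD prefixes expand to.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:NS0="http://example.org/ns0#"
         xmlns:EMELD="http://example.org/emeld#">
  <rdf:Description rdf:about="#A110604">
    <rdf:type rdf:resource="#State" />
    <NS0:name>Tennessee</NS0:name>
  </rdf:Description>
  <rdf:Description rdf:about="#876555">
    <rdf:type rdf:resource="#Language" />
    <EMELD:name>Navajo</EMELD:name>
  </rdf:Description>
</rdf:RDF>
"""

def describe(xml_text):
    """Return {resource id: (type, name)} for every rdf:Description."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        rdf_type = desc.find(f"{{{RDF_NS}}}type").get(f"{{{RDF_NS}}}resource")
        # The name element uses a different namespace in each record,
        # so match on the local name only.
        name = next(c.text.strip() for c in desc if c.tag.endswith("}name"))
        result[about] = (rdf_type, name)
    return result

resources = describe(DOC)
# Because the type is explicit, a search for languages skips the state.
languages = [name for rdf_type, name in resources.values() if rdf_type == "#Language"]
print(languages)
```

The point of the sketch is only that the type is stated in the markup, so a program does not have to guess whether "Navajo" or "Tennessee" names a language.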
Aardvark
nocturnal burrowing mammal of the grasslands of Africa that feeds on
termites; sole extant representative of the order Tubulidentata
WordNet for 'aardvark'
Nouns:
1. nocturnal burrowing mammal of the grasslands of Africa that feeds on
termites; sole extant representative of the order Tubulidentata
Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer
Verbs:
Adjectives:
Adverbs:
<html><head>
<rdf:RDF
…
<Word rdf:about="aardvark">
<hasSense rdf:resource="9385"/>
</Word>
<SynSet rdf:about="9385">
<type rdf:resource="noun"/>
<rdfs:comment>nocturnal burrowing mammal of the grasslands of Africa that
feeds on termites; sole extant representative of the order Tubulidentata
</rdfs:comment>
<hasElement rdf:resource="aardvark"/>
<hasElement rdf:resource="ant_bear"/>
<hasElement rdf:resource="anteater"/>
<hasElement rdf:resource="Orycteropus_afer"/>
</SynSet>
</rdf:RDF>
</head><body>
WordNet for 'aardvark'<br><br>
Nouns:<br><br>
1. nocturnal burrowing mammal of the grasslands of Africa that
feeds on termites; sole extant representative of the order Tubulidentata<br>
Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer<br><br>
Verbs:<br><br>
Adjectives:<br><br>
Adverbs:<br><br>
</body></html>
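As a rough illustration of what the embedded markup makes possible, the Python sketch below looks up the synonyms of a word from the Word and SynSet records rather than from the visible page text. It uses only the RDF part of the page, omits the comment element for brevity, and assumes a placeholder default namespace, since the namespace declarations are elided on the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WN = "http://example.org/wordnet#"  # assumption: placeholder default namespace

HEAD_RDF = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{WN}">
  <Word rdf:about="aardvark">
    <hasSense rdf:resource="9385"/>
  </Word>
  <SynSet rdf:about="9385">
    <type rdf:resource="noun"/>
    <hasElement rdf:resource="aardvark"/>
    <hasElement rdf:resource="ant_bear"/>
    <hasElement rdf:resource="anteater"/>
    <hasElement rdf:resource="Orycteropus_afer"/>
  </SynSet>
</rdf:RDF>
"""

def synonyms(xml_text, word):
    """Follow Word -> hasSense -> SynSet -> hasElement for the given word."""
    root = ET.fromstring(xml_text)
    about = f"{{{RDF}}}about"
    resource = f"{{{RDF}}}resource"
    # Collect the sense ids attached to the word.
    senses = {
        sense.get(resource)
        for w in root.findall(f"{{{WN}}}Word")
        if w.get(about) == word
        for sense in w.findall(f"{{{WN}}}hasSense")
    }
    # Gather the members of every synset that matches one of those senses.
    found = []
    for synset in root.findall(f"{{{WN}}}SynSet"):
        if synset.get(about) in senses:
            found.extend(e.get(resource) for e in synset.findall(f"{{{WN}}}hasElement"))
    return found

print(synonyms(HEAD_RDF, "aardvark"))
```

A browser ignores the RDF in the head and shows the same page as before; a Semantic Web tool can answer the synonym query without scraping the body text.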
The Ontology
Crucial component of the Semantic
Web
A resource that explicitly defines
what entities can exist in a domain,
here the endangered languages
community
A resource that defines what
relations hold between entities
demo
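To make the two roles of an ontology concrete, here is a toy Python sketch: one table says which entities exist and how they are classified, another says which relations may hold between them. All concept and relation names are hypothetical illustrations, not the contents of the EMELD ontology.

```python
# Hypothetical fragment: every name below is illustrative only.
# Entities: each concept points to its parent concept.
SUBCLASS_OF = {
    "ErgativeCase": "Case",
    "AbsolutiveCase": "Case",
    "Case": "MorphosyntacticFeature",
    "Suffix": "Morpheme",
}

# Relations: relation name -> (domain, range).
RELATIONS = {"expresses": ("Morpheme", "MorphosyntacticFeature")}

def is_a(concept, ancestor):
    """True if concept equals ancestor or inherits from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

def allowed(subject_type, relation, object_type):
    """Check a statement against the relation's domain and range."""
    domain, range_ = RELATIONS[relation]
    return is_a(subject_type, domain) and is_a(object_type, range_)

print(allowed("Suffix", "expresses", "ErgativeCase"))   # a suffix may express a case
print(allowed("ErgativeCase", "expresses", "Suffix"))   # but not the other way round
```

A real reasoner working over OWL does far more than this, but the same idea applies: statements are checked and extended using the class hierarchy and the declared relations.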
OWL Web Ontology Language
Plays a role analogous to that of
<html> on the WWW
The most current “standard”
Semantic Web language
Under development at the W3C:
www.w3c.org
Facilitating Tools
Search tools for the Semantic Web
Editors for composing Semantic Web
pages
Reasoning engines
An extensible data model
A Search Engine
EMELD Arizona’s prototype (Hedwig)
http://emeld.douglass.arizona.edu:8080/searchindex.html
(temporarily out of service)
demo on Sunday
An Editor
EMELD Arizona’s prototype (name?)
demo on Sunday
A Good Data Model for Creating a
Community of Practice
Language data should be searchable
and comparable: broad access
(centralized).
Authors or communities want control
over their data (local/distributed).
Local control should be balanced with
data interoperability (Semantic
Web).
Centralized Model
[Diagram labels: Warumungu, Wari, Mocovi, Biao Min, Archi, Hopi; Community]
Local Control with Broad Access
[Diagram labels: Semantic Web ontology; Wari <xml>, Hopi <xml>, Archi <xml>; tools; Community]
Community Requirements
No need to standardize your
terminology or abandon tradition.
No need to learn <xml> (though it
doesn’t hurt!)
Use EMELD tools to put your data on
the Semantic Web
Maintain your data
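One way to picture "no need to standardize your terminology" is the Python sketch below: each archive keeps its own gloss labels and only declares how they map to shared concepts, and a search runs over the shared concepts. The archive names, the second archive's label and form, and the concept names are all hypothetical; only the Warumungu forms and their ERG/ABS glosses come from the examples earlier in the talk.

```python
# Each archive keeps its own labels and maps them to shared concepts.
# Assumption: archive and concept names are placeholders for illustration.
LOCAL_TO_SHARED = {
    "archive_a": {"ERG": "ErgativeCase", "ABS": "AbsolutiveCase"},
    "archive_b": {"ergative": "ErgativeCase"},
}

# (archive, form, local label)
RECORDS = [
    ("archive_a", "Karriny-ji", "ERG"),
    ("archive_a", "kumppu", "ABS"),
    ("archive_b", "form-1", "ergative"),  # placeholder form, not real data
]

def search(concept):
    """Find forms in any archive whose local label maps to the shared concept."""
    return [
        (archive, form)
        for archive, form, label in RECORDS
        if LOCAL_TO_SHARED[archive].get(label) == concept
    ]

print(search("ErgativeCase"))
```

Neither archive changes its labels; the mapping is the only thing that has to be maintained alongside the data.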
Contact Info
Terry Langendoen
Scott Farrar
langendt@u.arizona.edu
farrar@u.arizona.edu
See our website:
http://emeld.douglass.arizona.edu:8080