Sharing and Browsing Linguistic Data

Nov 5, 2013

EMELD Arizona:

Terry Langendoen

Scott Farrar

Since Santa Barbara


Focus on morpho-syntax


Decided to build ontology (to be
discussed later in this talk)


Decided to build supporting tools


smart search engine (Hedwig)


editor


Some work on xml markup

The Problem


Currently there is no general way for
researchers in the endangered
languages community to
electronically share information.


The Web is the most likely tool that
could provide a solution.


The current WWW is not adequate.


An Example from the WWW:

Further Complications


What about other data formats?


lexicons


grammatical descriptions


(comparative) word lists


paradigms


etc.

Warumungu Description

'Grammatical case suffixes' are those which express grammatical
relations (subject, object, indirect object), like /karriny-ji/ in (4).
A noun without a case suffix is interpreted as having Absolutive case
- /nanttu/ in (4) and /wangarri/ in (5) - or as being the main
predicator, or as agreeing with some argument with Absolutive case
- /kumppu/ and /pulyurrulyurru/ in (5).

(from J. Simpson 1998)

(4) Karriny-ji +ajjul nyirri-njina nanttu, ngapa-kajji.
    people-ERG +3pl.S put-PAST.CONT humpy, water-LEST
    'The people were erecting humpies for fear of the rain.' [JS:PND:RS]

(5) Nyirri-nyi +ama wangarri kumppu pulyurrulyurru.
    place-PAST.PUN +he rock.ABS big.ABS red.ABS
    'He placed a big red hill.' [JS:PND:RS]
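Examples like (4) and (5) only become searchable once each morpheme is tied to its gloss. The following is a minimal sketch in Python of that alignment, assuming words are separated by spaces and morphemes by hyphens. The function name `align_igt` is hypothetical and is not part of any EMELD tool.

```python
def align_igt(source_line, gloss_line):
    """Pair each morpheme in source_line with the gloss in the same position."""
    src_words = source_line.split()
    gloss_words = gloss_line.split()
    if len(src_words) != len(gloss_words):
        raise ValueError("word count mismatch between source and gloss")

    pairs = []
    for src_word, gloss_word in zip(src_words, gloss_words):
        # Drop trailing punctuation so "nanttu," lines up with "humpy,".
        src_morphs = src_word.strip(",.").split("-")
        gloss_morphs = gloss_word.strip(",.").split("-")
        if len(src_morphs) != len(gloss_morphs):
            raise ValueError(f"morpheme count mismatch in {src_word!r}")
        pairs.extend(zip(src_morphs, gloss_morphs))
    return pairs


# Example (4) from the Warumungu description.
example_4 = align_igt(
    "Karriny-ji +ajjul nyirri-njina nanttu, ngapa-kajji.",
    "people-ERG +3pl.S put-PAST.CONT humpy, water-LEST",
)

# Once aligned, a query such as "which morphemes are glossed ERG?" is a filter.
ergative = [morph for morph, gloss in example_4 if gloss == "ERG"]
```

In a prose grammar this pairing exists only in the page layout, which is why a plain text search cannot answer such a query.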

Chichewa Description

Other elements that appear as verbal prefixes include modals - for
instance, -ngo- 'just, merely' - as well as directional elements
-ka- 'go' and -dza- 'come'. These are placed in the immediate pre-OM
position, after the tense. This is shown by the following:

(from Mchombo 1998)


(8a) Mkângo s-ú--ngo--phwány-a maûngu . . .
     3-lion NEG-3SM-past-just-6OM-smash-fv 6-pumpkins . . .
     'The lion did not just smash them, the pumpkins . . .'

(8b) Mkângo u-ku--phwány-á máûngu.
     3SM-pres.-go-smash-fv 6-pumpkins
     'The lion is going to smash some pumpkins.'

A Solution


Take advantage of new Web
technology



Build a community of practice on the
Semantic Web



What is the Semantic Web?


The Semantic Web


New markup: <xml>, <rdf>, <owl>



New tools: smart search engines, ontologies, new editors



Meaning is encoded explicitly.



Pages are interpreted by a reasoner.

An Example from the Semantic
Web


New markup adds functionality to
existing <html> documents.


Example:

<rdf:Description rdf:about="#A110604">
  <rdf:type rdf:resource="#State" />
  <NS0:name>Tennessee</NS0:name>
</rdf:Description>

<rdf:Description rdf:about="#876555">
  <rdf:type rdf:resource="#Language" />
  <EMELD:name>Navajo</EMELD:name>
</rdf:Description>
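Because each description carries an explicit rdf:type, a program can tell a state from a language without guessing from the surrounding text. The sketch below reads the two descriptions with Python's standard XML parser. The namespace URIs bound to `NS0` and `EMELD` are placeholders (the slide does not show the real ones), and `resources_by_type` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: placeholder namespace URIs, not the ones used by EMELD.
NS0_NS = "http://example.org/ns0#"
EMELD_NS = "http://example.org/emeld#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:NS0="{NS0_NS}" xmlns:EMELD="{EMELD_NS}">
  <rdf:Description rdf:about="#A110604">
    <rdf:type rdf:resource="#State" />
    <NS0:name>Tennessee</NS0:name>
  </rdf:Description>
  <rdf:Description rdf:about="#876555">
    <rdf:type rdf:resource="#Language" />
    <EMELD:name>Navajo</EMELD:name>
  </rdf:Description>
</rdf:RDF>"""


def resources_by_type(rdf_xml):
    """Return {rdf:type value: [name, ...]} for every rdf:Description."""
    result = {}
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        rdf_type = type_el.get(f"{{{RDF_NS}}}resource")
        # Accept any child whose local name is "name", whatever its namespace.
        names = [child.text.strip() for child in desc if child.tag.endswith("}name")]
        result.setdefault(rdf_type, []).extend(names)
    return result


by_type = resources_by_type(DOC)
languages = by_type.get("#Language", [])
```

A search restricted to `#Language` would return Navajo and skip Tennessee, which a keyword search over plain HTML cannot guarantee.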

Aardvark


WordNet for 'aardvark'

Nouns:

  1. nocturnal burrowing mammal of the grasslands of Africa that feeds on
  termites; sole extant representative of the order Tubulidentata

  Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer

Verbs:

Adjectives:

Adverbs:

<html><head>
<rdf:RDF>
  <Word rdf:about="aardvark">
    <hasSense rdf:resource="9385"/>
  </Word>
  <SynSet rdf:about="9385">
    <type rdf:resource="noun"/>
    <rdfs:comment>nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata</rdfs:comment>
    <hasElement rdf:resource="aardvark"/>
    <hasElement rdf:resource="ant_bear"/>
    <hasElement rdf:resource="anteater"/>
    <hasElement rdf:resource="Orycteropus_afer"/>
  </SynSet>
</rdf:RDF>
</head><body>
WordNet for 'aardvark'<br><br>
Nouns:<br><br>
&nbsp;&nbsp;1. nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata<br>
&nbsp;&nbsp;Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer<br><br>
Verbs:<br><br>
Adjectives:<br><br>
Adverbs:<br><br>
</body></html>
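The page above shows the same information twice: once as display text in the body and once as RDF in the head. A tool can ignore the display text and read only the RDF. The sketch below is one hedged way to do that in Python. It assumes the RDF block declares the standard `rdf` namespace (the slide omits the declarations), it leaves out the rdfs:comment element for brevity, and the helper names `embedded_rdf` and `synonyms` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abridged copy of the slide's page, with an assumed xmlns:rdf declaration.
PAGE = f"""<html><head>
<rdf:RDF xmlns:rdf="{RDF}">
  <Word rdf:about="aardvark">
    <hasSense rdf:resource="9385"/>
  </Word>
  <SynSet rdf:about="9385">
    <type rdf:resource="noun"/>
    <hasElement rdf:resource="aardvark"/>
    <hasElement rdf:resource="ant_bear"/>
    <hasElement rdf:resource="anteater"/>
    <hasElement rdf:resource="Orycteropus_afer"/>
  </SynSet>
</rdf:RDF>
</head><body>
WordNet for 'aardvark'<br><br>
</body></html>"""


def embedded_rdf(page):
    """Cut the rdf:RDF block out of the page and parse it as XML."""
    # The HTML around it (unclosed <br>, &nbsp;) is not well-formed XML,
    # so only the RDF block is handed to the XML parser.
    match = re.search(r"<rdf:RDF.*?</rdf:RDF>", page, re.DOTALL)
    if match is None:
        raise ValueError("no embedded RDF found")
    return ET.fromstring(match.group(0))


def synonyms(page, word):
    """Return the other members of the first synset that contains word."""
    root = embedded_rdf(page)
    for synset in root.findall("SynSet"):
        members = [el.get(f"{{{RDF}}}resource") for el in synset.findall("hasElement")]
        if word in members:
            return [m for m in members if m != word]
    return []


anteater_synonyms = synonyms(PAGE, "anteater")
```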

The Ontology


Crucial component of the Semantic
Web


A resource that explicitly defines
what entities can exist in a domain,
i.e., the endangered languages
community


A resource that defines what
relations hold between entities


demo
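To make the idea of entities and relations concrete, here is a toy sketch in Python of an ontology fragment with a single relation, "is a", and the simplest inference a reasoner might draw from it. The concept names are hypothetical illustrations and are not taken from the EMELD ontology.

```python
# Hypothetical fragment: each concept points to its parent concept.
IS_A = {
    "ErgativeCase": "Case",
    "AbsolutiveCase": "Case",
    "Case": "MorphosyntacticFeature",
    "PastTense": "Tense",
    "Tense": "MorphosyntacticFeature",
}


def ancestors(concept):
    """Follow the is-a links upward and return every concept passed."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if concept equals candidate or inherits from it."""
    return concept == candidate or candidate in ancestors(concept)


ergative_chain = ancestors("ErgativeCase")
ergative_is_feature = is_a("ErgativeCase", "MorphosyntacticFeature")
past_is_case = is_a("PastTense", "Case")
```

With such links in place, a search for "Case" can also return data tagged only as ergative or absolutive, even though no page says so explicitly.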

OWL Web Ontology Language


Plays a role analogous to that of <html> on the
WWW


The most current “standard”
Semantic Web language


Under development at the W3C:


www.w3c.org


Facilitating Tools


Search tools for the Semantic Web


Editors for composing Semantic Web
pages


Reasoning engines


An extensible data model

A Search Engine


EMELD Arizona’s prototype (Hedwig)



http://emeld.douglass.arizona.edu:8080/searchindex.html
(temporarily out of service)



demo on Sunday

An Editor


EMELD Arizona’s prototype (name?)




demo on Sunday

A Good Data Model for Creating a
Community of Practice


Language data should be searchable and comparable:
broad access (centralized).


Authors or communities want control
over their data (local/distributed).


Local control should be balanced with
data interoperability (Semantic
Web).
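One way to picture this balance: each collection keeps its own gloss labels, and a separate mapping ties those labels to shared concepts. The sketch below is in Python. The labels come from the Warumungu and Chichewa examples earlier in the talk, while the shared concept names and the function `labels_for` are hypothetical.

```python
# Each collection keeps its own labels; only the mapping is shared.
# Assumption: the concept names on the right are illustrative.
LOCAL_TO_SHARED = {
    "Warumungu": {
        "ERG": "ErgativeCase",
        "ABS": "AbsolutiveCase",
        "PAST.CONT": "PastTense",
        "PAST.PUN": "PastTense",
    },
    "Chichewa": {
        "past": "PastTense",
        "pres.": "PresentTense",
    },
}


def labels_for(concept):
    """List (collection, local label) pairs that map to the shared concept."""
    return sorted(
        (collection, label)
        for collection, mapping in LOCAL_TO_SHARED.items()
        for label, shared in mapping.items()
        if shared == concept
    )


past_labels = labels_for("PastTense")
```

A query for past tense then reaches both collections, and neither author has had to rename anything in the local data.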


Centralized Model

[Diagram labels: Warumungu, Wari, Mocovi, Biao Min, Archi, Hopi; Community]

Local Control with Broad Access

[Diagram labels: Semantic Web; ontology; Wari <xml>, Hopi <xml>, Archi <xml>; tools; Community]

Community Requirements


No need to standardize your
terminology or abandon tradition.


No need to learn <xml> (it doesn’t
hurt!)


Use EMELD tools to put your data on
the Semantic Web


Maintain your data


Contact Info


Terry Langendoen: langendt@u.arizona.edu

Scott Farrar: farrar@u.arizona.edu

See our website: http://emeld.douglass.arizona.edu:8080