NLP Interchange Format

Oct 24, 2013

© Copyright 2008 STI INNSBRUCK
www.sti-innsbruck.at

José M. García

Outline


What is NIF?


Design requirements


URI schemes


NIF ontologies


Use cases


Relationship with ELRA


Roadmap for NIF 2.0


Conclusions


What is NIF?


Natural Language Processing Interchange Format

NIF is an RDF/OWL-based format that aims to achieve interoperability
between Natural Language Processing (NLP) tools, language resources and
annotations.



Building blocks


URI scheme for identifying elements in texts


Ontology for describing common NLP terms



Created and maintained by the AKSW group at the University of Leipzig
during the LOD2 EU project.

Community project: http://persistence.uni-leipzig.org/nlp2rdf/



NIF design requirements

Compatibility with RDF

Coverage

Structural Interoperability

Conceptual Interoperability

Granularity

Provenance and Confidence

Simplicity

Scalability

URI schemes


Text needs to be referenceable by URIs

With URI references, text can be used as a resource in RDF statements



NIF distinguishes:


Documents


Text of the document


Substrings of the text.



A URI scheme is an algorithm for creating IDs for text and substrings



URI elements


Document URI


Separator


Character indices
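
The three URI elements above can be combined mechanically. The following is a minimal sketch, not NIF's reference implementation: the function name `make_text_uri`, the `example.org` document URI, the sample text and the `char=` prefix (in the style of RFC 5147) are assumptions made for illustration.

```python
def make_text_uri(document_uri: str, begin: int, end: int, separator: str = "#") -> str:
    """Build an identifier from document URI + separator + character indices."""
    if begin < 0 or end < begin:
        raise ValueError("expected 0 <= begin <= end")
    return f"{document_uri}{separator}char={begin},{end}"


# Hypothetical document URI and text, used only for illustration.
document_uri = "http://example.org/doc.txt"
text = "NIF makes text addressable."

whole_text_uri = make_text_uri(document_uri, 0, len(text))  # text of the document
substring_uri = make_text_uri(document_uri, 0, 3)           # the substring "NIF"

print(whole_text_uri)
print(substring_uri)
```

Because the identifier is computed from the text itself, two tools that see the same document and agree on the scheme would arrive at the same URI for the same substring without coordinating.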

RFC 5147


Canonical URI scheme for NIF is based on RFC 5147



It standardizes fragment identifiers for the text/plain media type


http://www.w3.org/DesignIssues/LinkedData.html

http://www.w3.org/DesignIssues/LinkedData.html#char=0,26610

http://www.w3.org/DesignIssues/LinkedData.html#char=1206,1218
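
Read against the distinction made on the URI schemes slide, the three identifiers above plausibly name the document, its full text (characters 0 to 26610) and one substring (characters 1206 to 1218). The sketch below shows how such an identifier could be taken apart and resolved against a text. It is an illustration under assumptions: the helper names `split_char_uri` and `select` are hypothetical, and only the `char=<begin>,<end>` form is handled.

```python
import re

# Only the "char=<begin>,<end>" form is handled here; RFC 5147 also defines
# line-based fragments, open-ended ranges and integrity checks.
_CHAR_RANGE = re.compile(r"char=(\d+),(\d+)")


def split_char_uri(uri: str) -> tuple[str, int, int]:
    """Split '<document>#char=<begin>,<end>' into its three parts."""
    document_uri, _, fragment = uri.partition("#")
    match = _CHAR_RANGE.fullmatch(fragment)
    if match is None:
        raise ValueError(f"not a char range fragment: {fragment!r}")
    return document_uri, int(match.group(1)), int(match.group(2))


def select(text: str, uri: str) -> str:
    """Return the substring of `text` that the fragment points at."""
    _, begin, end = split_char_uri(uri)
    # RFC 5147 positions count the gaps between characters, starting at 0,
    # which lines up with Python's half-open slicing.
    return text[begin:end]


uri = "http://www.w3.org/DesignIssues/LinkedData.html#char=1206,1218"
print(split_char_uri(uri))
print(select("Linked Data", "http://example.org/doc.txt#char=0,6"))
```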

NIF Core Ontology


Classes and properties to describe the relations between


Documents


Text


Substrings


Corresponding URI schemes
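
As a rough illustration of what such descriptions could look like, the sketch below emits plain (subject, predicate, object) tuples for a text and one of its substrings. The class and property names (`Context`, `String`, `isString`, `referenceContext`, `beginIndex`, `endIndex`, `anchorOf`) and the namespace are taken from the NIF 2.0 Core ontology as later published; they are assumptions with respect to these slides, and the draft discussed here may have differed. The helper names and the `example.org` URI are hypothetical.

```python
# Assumed namespace of the NIF 2.0 Core ontology.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def char_uri(document_uri, begin, end):
    """Identifier for the characters begin..end of a document's text."""
    return f"{document_uri}#char={begin},{end}"


def describe_substring(document_uri, text, begin, end):
    """Return (subject, predicate, object) triples for a text and one substring."""
    context = char_uri(document_uri, 0, len(text))
    substring = char_uri(document_uri, begin, end)
    return [
        (context, RDF_TYPE, NIF + "Context"),
        (context, NIF + "isString", text),
        (substring, RDF_TYPE, NIF + "String"),
        (substring, NIF + "referenceContext", context),
        (substring, NIF + "beginIndex", begin),
        (substring, NIF + "endIndex", end),
        (substring, NIF + "anchorOf", text[begin:end]),
    ]


for triple in describe_substring("http://example.org/doc.txt", "Linked Data", 0, 6):
    print(triple)
```

Plain tuples are used instead of an RDF library so the sketch stays dependency-free; a real pipeline would hand the same triples to an RDF store or serializer.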




Additional classes and properties (unstable/testing)



More URI schemes



Text structure (words, sentences, paragraphs…)



Part of Speech (POS)



Annotations with Stanbol



Confidence



Workflows, Modularity and Extensibility of NIF


Workflows for NLP integration


Normalization


Tokenization


Merge RDF annotations
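
The three workflow steps can be sketched end to end. This is an illustration under stated assumptions rather than the NIF procedure itself: Unicode NFC stands in for the normalization step, a simple word-character regex stands in for the tokenizer, and the two tool outputs (with their `ex:` predicates) are invented. The slides do not specify any of these.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: Unicode NFC, so all tools count the same characters."""
    return unicodedata.normalize("NFC", text)


def tokenize(text: str) -> list[tuple[int, int]]:
    """Assumed tokenization: runs of word characters, as (begin, end) offsets."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def merge(*annotation_sets: set) -> set:
    """Merge triples from several tools; shared URIs make this a plain set union."""
    merged = set()
    for annotations in annotation_sets:
        merged |= annotations
    return merged


doc = "http://example.org/doc.txt"  # hypothetical document URI
text = normalize("NIF links tools")
uris = [f"{doc}#char={b},{e}" for b, e in tokenize(text)]

# Hypothetical outputs of two independent tools annotating the same substring.
pos_tool = {(uris[0], "ex:pos", "NNP")}
ner_tool = {(uris[0], "ex:entityType", "ex:Format")}

merged = merge(pos_tool, ner_tool)
print(sorted(merged))
```

Normalizing before tokenizing matters in this sketch: character offsets are only comparable across tools if every tool counts over exactly the same string.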




NIF ontology logical modules


Terminological model


Inference model


Validation model



Vocabulary modules


FISE


ITS


OLiA


NERD






Granularity profiles

ITS Use Case


The Internationalization Tag Set (ITS) 2.0 is a W3C Working Draft that is
on track to become a Recommendation.



ITS standardizes HTML and XML attributes which can be used to
annotate nodes with processing information for language service
providers (i18n, l10n)



The ITS 2.0 RDF ontology was developed using NIF, including a round-trip
conversion algorithm from ITS to NIF.



NIF is expected to receive wide adoption by translation & language
service providers



The ITS 2.0 RDF ontology provides properties that can serve as best
practices for NLP annotations.



OLiA Use Case


The Ontologies of Linguistic Annotation (OLiA) provide stable identifiers
for morpho-syntactical annotation tag sets, so that NLP tools can use these
IDs for better interoperability.

OLiA provides Annotation Models and a Reference Model, comprising
more than 110 OWL ontologies for over 34 tag sets in 69 languages



Features


Documentation


Flexible Granularity


Language Independence



NIF provides two properties

nif:oliaIndividual (links a nif:String to an OLiA Annotation Model)

nif:oliaCategory (links to the Reference Model)
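
A small sketch of how a tool's tag might be mapped onto these two properties follows. The lookup table, the function name `olia_triples` and the OLiA URIs for the Penn tag `NNP` are illustrative assumptions, not content from the slides; the property names are written exactly as the slide gives them.

```python
# Hypothetical lookup table: tool-specific tag -> (Annotation Model individual,
# Reference Model categories). The OLiA URIs are assumptions for illustration.
TAG_TABLE = {
    "NNP": (
        "http://purl.org/olia/penn.owl#NNP",
        ["http://purl.org/olia/olia.owl#ProperNoun"],
    ),
}


def olia_triples(string_uri, tag):
    """Return triples linking one nif:String to both OLiA layers."""
    individual, categories = TAG_TABLE[tag]
    # Link to the tag-set-specific Annotation Model.
    triples = [(string_uri, "nif:oliaIndividual", individual)]
    # Link to the tag-set-independent Reference Model.
    triples += [(string_uri, "nif:oliaCategory", c) for c in categories]
    return triples


for triple in olia_triples("http://example.org/doc.txt#char=0,3", "NNP"):
    print(triple)
```

Carrying both links means a consumer that knows only the Reference Model can still interpret the annotation without understanding the producing tool's tag set.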

RDFaCE Use Case


RDFa Content Editor is a rich text editor that supports WYSIWYM
authoring, including various views of the semantically enriched textual
content.

It combines results of different NLP APIs for automatic content
annotation

Heterogeneous API access, URI generation and output data structures

Solution: server-side proxy, hard-coded input and connection of each API.



NIF simplified the integration by adding an interoperability layer

What is ELRA?


European Language Resources Association



http://www.elra.info



Effort to make Language Resources (LR) available for language
engineering and to evaluate language engineering technologies.



LR marketplace



Related organizations


ELDA (ELRA’s operational body)


LREC conferences

Relationship with NIF


Different objectives



Written LRs (esp. corpora) can be annotated with NIF for
further interoperability and integration with NLP tools

ADVANTAGE: Large test data collection to evaluate NLP tools

DISADVANTAGE: Cost of LR (though there are free ones)

Roadmap for NIF 2.0


Release of NIF 1.0: DONE (Nov 2009)

Release of NIF 2.0 Draft: CURRENT effort on solving pending issues

Adoption in ITS 2.0 W3C (soon-to-be) Recommendation

NIF-Core ontology is becoming stable

RLOG - an RDF Logging Ontology

NIF Validator software available



Release of NIF 2.0 Core



Release of NIF 2.0 Extensions


ITS ontology, PROV ontology, Lemon Ontology, NERD, UIMA, MARL opinion
ontology…



Conclusions


NIF allows NLP tools to be integrated using Linked Data



Ongoing effort



Many adopters and supporters


LOD2 EU project


Several W3C working groups


Named Entity Recognition and Disambiguation (NERD)


Ontologies of Linguistic Annotation (OLiA)

27 different implementations and use cases

Some available at http://persistence.uni-leipzig.org/nlp2rdf/



© Copyright 2012 STI INNSBRUCK
www.sti-innsbruck.at

Thanks for your attention

Questions?

References

1. http://persistence.uni-leipzig.org/nlp2rdf/

2. Integrating NLP using Linked Data, by Sebastian Hellmann, Jens Lehmann,
Sören Auer, and Martin Brümmer, in 12th International Semantic Web
Conference, 21-25 October 2013, Sydney, Australia