The Future of KYOTO


Oct 30, 2013 (3 years and 8 months ago)


N. Calzolari

1

2nd KYOTO Workshop, Gifu, Japan, January 2011

Nicoletta Calzolari

Istituto di Linguistica Computazionale, CNR, Pisa

glottolo@ilc.cnr.it

The Future of KYOTO

… with some historical notes to show a path along an evolving vision

Language Resources in today’s EU context: META-SHARE, ...

Why are such needed LRs still lacking after 30 years of R&D in the field?



1) Because the main trend until the mid-’80s was to privilege the processing of so-called “critical” phenomena, studied by the dominating linguistic theories, rather than focusing on the deep analysis of the real uses of a language


As a result, CL was focusing on:

few examples, often artificially built

lexicons made of few entries (toy lexicons)

grammars with poor coverage




2) Because large-scale LRs are costly & their production requires a big organizing effort


Old slide with Antonio Zampolli (’80s/early ‘90s)

Why do we still lack them??

… back from the early ‘80s


It became evident that:

Part of the results of meaning extraction, e.g. many meaning distinctions, which could be generalised over lexicographic definitions and automatically captured, were unmanageable at the formal representation level, and had to be blurred into unique features and values

Unfortunately, it is still difficult today to constrain word-meanings within a rigorously defined organization: by their very nature they tend to evade any strict boundaries


Automatic acquisition of lexical information from MRDs

Was my first research & became central in the Pisa group (ACQUILEX)

And also Amsler, Briscoe, Boguraev, Wilks’ group, IBM, then Japanese groups, …

The trend was: “large-scale computational methods for the transformation of machine readable dictionaries into machine tractable dictionaries”

Instead of relying on linguists’ introspection

Pioneering

Research

Historical notes


Automatic acquisition of info from texts:

This trend has today become a consolidated & pervasive fact

From acquisition of “linguistic information”

To acquisition of “general knowledge”, with more data intensive, robust, reliable methods


… back from the late ‘80s

After acquisition from MRDs,

Historical notes

Lesson learned: need of adequate models to handle actual usage of language

Lesson learned: (in-)adequacy of (current) lexicons

Lesson learned: going from core sets to large coverage has implications not just in quantitative terms, but more interestingly in terms of changes to the models and the strategies of processes



All started with the situation we had in the late ‘80s / early ‘90s

With all the MultiLex, GeneLex, AcquiLex, Xxx-Lex

A. Zampolli: Let’s be coherent: Xxx-Lex

After the “Grosseto Workshop” (1985): a turning point

EAGLES

ISLE

ISO LMF: Lexical Markup Framework


Structural skeleton, with the basic hierarchy of information in a lexical entry + various extensions

Modular framework

LMF specs comply with UML modelling principles

an XML DTD allows implementation

Builds on EAGLES/ISLE

NEDO Asian Languages

NICT Language-Grid Service Ontology

ICT KYOTO

LIRICS

New initiatives

LexInfo


KYOTO

A search environment using semantic technologies

A “compass” for the web2.0

Interdisciplinarity: scientific community (LRT, web technologies, knowledge engineers), companies, domain experts

Multilingualism: 7 languages (2 Asian languages)

needs to share lexical/knowledge bases & tools, both general & domain-related, in the form of lexical/ontological & sw repositories

Kyoto Core System is open & free

The “resource” perspective

Annotation Format (KAF)

Multi-level Annotation Format

stand-off annotation

uniform representation for 7 languages

Shared across the languages

Text: tokenisation, sentences, paragraphs with reference to the sources

Terms: words & multi-words, parts-of-speech, etc.

Chunks: constituents & syntagmatic realization

Dependencies: grammatical functions

L1 Semantic modules: Multiword tagging, Sense Tagging, Named Entity Recognition, OntoTagging

L2 Semantic module: event/fact extraction
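
To make the idea of multi-level stand-off annotation concrete, here is a minimal sketch in Python. It is not the actual KAF schema: the class names, field names and identifiers (`w1`, `t1`, ...) are assumptions made for illustration only. What it shows is the principle listed above: each layer refers to the layer below by identifier, and the text layer keeps offsets back into the source, so the source itself is never modified.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """Text layer: one token with its character offset in the source."""
    id: str
    text: str
    offset: int


@dataclass
class Term:
    """Term layer: a word or multi-word that points at token ids."""
    id: str
    lemma: str
    pos: str
    token_ids: list


@dataclass
class Dependency:
    """Dependency layer: a grammatical function between two term ids."""
    head: str
    dependent: str
    function: str


def tokenize(text):
    """Split on whitespace and record where each token starts."""
    return [Token(f"w{i}", m.group(), m.start())
            for i, m in enumerate(re.finditer(r"\S+", text), start=1)]


def term_text(term, tokens):
    """Resolve a term back to its surface form through the token layer."""
    by_id = {t.id: t for t in tokens}
    return " ".join(by_id[i].text for i in term.token_ids)


# Hypothetical example sentence and hand-written annotations.
source = "Sea levels rise"
tokens = tokenize(source)
terms = [
    Term("t1", "sea level", "N", ["w1", "w2"]),  # a multi-word term
    Term("t2", "rise", "V", ["w3"]),
]
deps = [Dependency(head="t2", dependent="t1", function="subj")]
```

Because every layer only holds references, further layers (chunks, sense tags, facts) could be added in the same way without touching the layers below them.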




from Piek Vossen


KYOTO System & Adoption of Standards

[Figure, from Piek Vossen: the KYOTO system and the standards it adopts. Labels in the figure: Source Documents; Linear (MAF/SYNAF); Linear (SEMAF); Term extraction, Tybot; Generic (TMF); Semantic annotation; Linear, Generic (FACTAF); Fact extraction, Kybot; Domain editing, Wikyoto; Wordnet, Domain Wordnet (LMF API); Ontology, Domain ontology (OWL API); Concept User; Fact User]

Could be at the basis of a new standard?


A common representation format for WordNets

similar but not identical: hampered interoperability

to be accessed both intra- and inter-linguistically

to support easier integration

Wn IT, Wn EN, Wn EU, Wn NL, Wn JP, Wn CH, Wn ES

endow WordNet with a representation format allowing easy access, integration & interoperability among resources

Wn IT, Wn EN, Wn EU, Wn NL, Wn JP, Wn CH, Wn ES


A common representation format: Data Categories

[Figure, from Monica Monachini: class diagram of the common representation format. Classes shown: LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense, Synset, Definition, Statement, SynsetRelations, SynsetRelation, MonolingualExternalRefs, MonolingualExternalRef, SenseAxes, SenseAxis, InterlingualExternalRefs, InterlingualExternalRef, Meta. Cardinalities used in the diagram: 1..1, 1..*, 0..1, 0..*]


Towards a Centralized
A list of 85 sem.rels as a result of a mapping of the KYOTO WordNet grid

Inter-WN

Intra-WN


WordNet-LMF Multilingual level: Cross-lingual Relations

SWN <fuego_3, llama_1> 09686541-n

IWN <fuoco_1, fiamma_1> 00001251-n

WN3.0 <fire_1 flame_1 flaming_1> 13480848-n

<!ELEMENT SenseAxes (SenseAxis+)>
<!ELEMENT SenseAxis (Meta?, Target+, InterlingualExternalRefs?)>
<!ATTLIST SenseAxis
    id ID #REQUIRED
    relType CDATA #REQUIRED>
<!ELEMENT Target EMPTY>
<!ATTLIST Target
    ID CDATA #REQUIRED>
<!ELEMENT InterlingualExternalRefs (InterlingualExternalRef+)>
<!ELEMENT InterlingualExternalRef (Meta?)>
<!ATTLIST InterlingualExternalRef
    externalSystem CDATA #REQUIRED
    externalReference CDATA #REQUIRED
    relType (at|plus|equal) #IMPLIED>

groups monolingual synsets corresponding to each other and sharing the same relations to English

link to ontology/(ies)

specifies the type of correspondence

from Monica Monachini
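
As an illustration of how the SenseAxis fragment of the DTD could be used, the sketch below builds one small instance and reads it back with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the DTD on this slide; everything else is an assumption for the example: the axis id `sa_fire_1`, the `relType` value `eq_synonym`, the `swn-`/`iwn-` prefixes on the synset identifiers and the `externalSystem` value `WN3.0`. The real KYOTO files may use different identifier conventions.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: one axis grouping the Spanish and Italian synsets
# from the slide and pointing at the corresponding WN3.0 synset.
SENSE_AXES_XML = """
<SenseAxes>
  <SenseAxis id="sa_fire_1" relType="eq_synonym">
    <Target ID="swn-09686541-n"/>
    <Target ID="iwn-00001251-n"/>
    <InterlingualExternalRefs>
      <InterlingualExternalRef externalSystem="WN3.0"
                               externalReference="13480848-n"
                               relType="equal"/>
    </InterlingualExternalRefs>
  </SenseAxis>
</SenseAxes>
"""


def read_sense_axes(xml_text):
    """Return one dict per SenseAxis with its targets and external links."""
    root = ET.fromstring(xml_text.strip())
    axes = []
    for axis in root.findall("SenseAxis"):
        refs = axis.findall("InterlingualExternalRefs/InterlingualExternalRef")
        axes.append({
            "id": axis.get("id"),
            "relType": axis.get("relType"),
            # Target carries its synset identifier in the ID attribute.
            "targets": [t.get("ID") for t in axis.findall("Target")],
            "external": [(r.get("externalSystem"),
                          r.get("externalReference"),
                          r.get("relType")) for r in refs],
        })
    return axes


axes = read_sense_axes(SENSE_AXES_XML)
```

Note that `ElementTree` does not validate against a DTD; checking an instance against the declarations above would need a validating parser.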


Complex picture!

Is there anything we need to do for Interoperability?

Work within ISO:

LMF: abstract meta-model for lexical representation

Ontology Group or more Groups?

Language Resource Ontologies: ontology of data categories

Real life:

Lexicons (e.g. WordNets) that are called Ontologies

Lexicons linked to Ontologies: to be used in applications, in multilingual systems, domains, …

Work on “ontologising” Lexicons: to allow exploiting various relations, to make inferences, …

Semantic Lexicons, with many types of relations among semantic units: these are often of “conceptual/world-knowledge” nature. Do we want DCs for these?

ISO SC 4/WG 4

Lexicon-Ontology relations

New work item: PWI 24622

KYOTO can contribute


To explore the need of doing something within ISO about the relations between Lexicon and Ontology

Do we/ISO need to address another (lexical) layer?

How lexicons and ontologies are linked and information mapped from one to the other

The ontological layer in a/connected to a lexicon

Possible issues/questions:

Is LMF enough to represent Ontological links?

How to connect work being done in ISO Lexical group and ISO Ontology groups?

Lexicon and Ontologies: separation? or lexicalised ontologies? or ontologies lexicons?

Lexicon, Ontologies and Domains

On a very different dimension: Ontology of lexical/semantic/conceptual categories? Standardised semantic categories, ontology labels?

Relation to multilinguality

...

KYOTO can contribute


Input to Multilingual Web

http://www.multilingualweb.eu/

The MultilingualWeb project is exploring standards and best practices that support the creation, localization and use of multilingual web-based information

It aims to raise the visibility of existing best practices and standards and identify gaps

The core vehicle for this is a series of four workshops, for networking across communities that span the various aspects involved

Next workshop on best practices aimed at development of Content for the Web, including creation of content ranging from personal authoring for blogs and social networking sites to development of large corporate or organizational enterprises:

“Content on the Multilingual Web”, 4-5 April 2011, Pisa, Italy

KYOTO can contribute


A new paradigm of R&D in LRs & LT

For a few years now:

Open & distributed linguistic infrastructures for LRs & LT

Adopting the paradigm of accumulation of knowledge, so successful in more mature disciplines, based on sharing LRs, tools & results

Ability to build on each other’s achievements, allowing controlled & effective cooperation of many groups on common tasks (see the Human Genome Project)

e.g. initiatives to achieve international consensus on annotation guidelines

Emerging concept of collective intelligence

Emphasize interoperability among LRs & LT

Some steps for a “new generation” of LRs



From huge efforts building static, large-scale, general-purpose LRs

To dynamic LRs rapidly built on-demand, tailored to specific user needs

From closed, locally developed and centralized resources

To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them

From Language Resources

To Language Services

Need of an infrastructure that makes this vision operational

Lexical WEB

As a critical step for semantic mark-up in the Semantic Web


[Figure: lexical resources (ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y, Bio Lexicon) connected in a Global WordNet GRID and a SIMPLE-WEB, with intelligent agents]

Standards for Content Interoperability

Enough??

(Distributed) Language Services


content interoperability standards

supra-national cooperation

architectures enabling accessibility

Collaborative & collective/social development & validation, cross-resource integration & exchange of information

Create new resources on the basis of existing ones

Exchange & integrate information across repositories

Compose new services on demand

Can KYOTO contribute?


Which Communities?

Language Resources, Language Technologies, Standardisation, Content/Ontologies, System developers, Integrators, SSH

EC, National funding agencies, Industry

Many applications/domains: MT, CLIR, e-government, content industry, intelligence, e-culture, e-health, domotics…

core EU Forum with Focus on cooperation

Many LRs & LTs exist, but a global vision, policy & strategy is needed

CLARIN for SSH

FLaReNet Network

META-NET NoE

Need to consider together technical, organisational, strategic, economic, social, cultural, legal, political issues wrt LRs & LTs

Many dimensions

Many dimensions

Fostering Language Resources Network

FLaReNet at a glance

An international Forum to facilitate interaction, to:

Overcome the fragmentation in LR & LT & recreate a community

Anticipate the needs of new types of LR & LT & Language Infrastructures

Create a shared policy for the next years

Foster a European strategy for consolidating the sector

http://www.flarenet.eu


98 Institutional Members

From 33 countries

351 Individual Subscribers

Essential Community mobilisation (also to prepare the ground for a RI)

A “roadmap”: a plan of actions as input to policy development

A (EU) model for the LRs/LTs area of the next years

Ambitious!



Create a shared repository of data formats, annotations, etc. as a major help to achieve standardisation

Common repositories for tools & language data should be established that are universally and easily accessible by everyone

Coordinate input to ISO/W3C standardisation work

Results from Vienna & Barcelona Forum:

Shaping the Future of the Multilingual Digital Europe

Standards, Interoperability & Metadata are topics to be approached in cooperation

Access to LRs is critical & should involve all the community

Need to create the means to plug together different LR & LT, in a web-based resource and technology “grid”

For a new world-wide language infrastructure

2nd Blueprint

Result of a permanent and cyclical consultation

Inside the community it represents

Outside it, through connections with neighbouring projects, associations, initiatives, funding agencies

Organised along three main “directions”:

Infrastructural Aspects

Research and Development

Political and Strategic Issues

Reflect three major development factors that can boost or hinder the growth of the field of LRT


Provide feedback!

http://www.flarenet.eu/sites/default/files/D8.2b.pdf

Sources: many meetings

Operational Interoperability

Asian Collaboration Workshop

FL-SILT Workshop

Lexicon/Ontology

Standards NEERI

2nd FLaReNet Forum

Less-resourced Languages

Automatic Acquisition

Legal Issues

Standards

International Cooperation


3rd FLaReNet Forum

The European Language Resources and Technologies Forum:

Important role in defining recommendations

In Barcelona: 120 Participants from 22 Countries

Define final recommendations

Previous Proceedings & Reports on the web

Blueprint will be discussed

Also for adoption & endorsement by FLaReNet Institutional Members


Infrastructural Aspects

Issue: Metadata

Challenge: Interoperability of Metadata sets
Recommended Actions: Set up a global infrastructure of common and uniform and/or interoperable metadata sets

Challenge: Metadata usable both by humans and by machines
Recommended Actions: Create machine-understandable metadata with formal syntax and clear semantics; Automate the process of metadata creation; Develop structured metadata

Issue: Documentation

Challenge: Reliable documentation of LRs according to common best practices
Recommended Actions: Collect all possible and existing LR documentation; Devise and adopt a widely agreed standard documentation template for all types of resources

Political and Strategic dimensions


Issue: Funding Agencies policies

Challenge: Devise models to allow different types of players easy access to resources
Recommended Actions: Ensure that publicly funded resources are publicly available either free of charge or at a small distribution cost; Encourage/enforce use of best practices or standards in LR production projects; Make sustainability and sharing/distribution plans mandatory in projects concerning LR production

Issue: LR citation

Challenge: Appropriate citation of Language Resources like traditional publications
Recommended Actions: Develop a standard protocol for citing language resources

KYOTO can be an example

LRE Map: Why??

The Map as an answer to start to fill this gap, but also:

To encourage the needed “change in culture”


Problem:

Lack of information & documentation about resources is, in the e-science paradigm, a very critical issue

Non-documented resources don’t exist!!

A collective enterprise: each researcher must become aware of the importance of his/her personal engagement in documenting resources

A task as important as creating new resources and not an accessory to be disregarded

As the necessary service to the whole community

Will become an essential instrument to monitor the field

www.resourcebook.eu



How many LRs & Types at LREC?

Submissions: 1288
LR forms: 1994

Corpora: 785
Lexicons: 289
Tagger/Parser: 181
Annotation tool: 134
Ontology: 73
Evaluation data: 40
Annotation Guidelines: 35
...

How many LRs & Types at COLING?

Submissions: 880
LR forms: 735

Corpora: 359 (50%)
Tagger/Parser: 81 (11%)
Lexicons: 71 (10%)
Evaluation data: 51 (7%)
Ontology, Annotation tool, Evaluation tool, Tokenizer, NER: < 20 (2%)
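
The percentages in the COLING list can be approximately reproduced from the counts. The short sketch below assumes, since the slide does not say so explicitly, that each share is taken over the 735 LR forms and rounded to the nearest whole percent; the variable and function names are only for illustration.

```python
# Counts as listed on the slide; the denominator is an assumption.
LR_FORMS_COLING = 735
counts = {
    "Corpora": 359,
    "Tagger/Parser": 81,
    "Lexicons": 71,
    "Evaluation data": 51,
}


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to an integer."""
    return round(100 * count / total)


shares = {name: share(n, LR_FORMS_COLING) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} of {LR_FORMS_COLING} forms, about {pct}%")
```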

Languages:


But

obviously …


170

!!

image courtesy of Wordle (http://www.wordle.net)

Availability


Freely available!

The vast majority of resources are freely available

[Chart comparing LREC and COLING; values shown: 3%, 15%, 25%]

The Project META-NET



META-NET is a Network of Excellence (coord. Hans Uszkoreit) dedicated to fostering the technological foundations of the European multilingual information society

Objectives:

Prepare the ground for a large-scale concerted effort by building a strategic alliance of national and international research programmes, corporate users and commercial technology providers and language communities

Strengthen the European research community through research networking and by creating new schemes and structures for sharing resources and efforts

Build bridges by approaching open problems in collaboration with other research fields such as machine learning, social computing, cognitive systems, knowledge technologies and multimedia content

Final goal: META, The Multilingual Europe Technology Alliance

[Figure: The META Alliance, bringing together language communities, policy makers and funding bodies, user industries, provider industries, the language technology community, the machine learning community, the semantic technologies community, the cognitive systems community, and multimedia content technologies]


Founding Members


Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Germany

Barcelona Media, Centre d'Innovació, Spain

Consiglio Nazionale delle Ricerche, Istituto di Linguistica Computazionale “Antonio Zampolli”, Italy

Institute for Language and Speech Processing, R.C. “Athena”, Greece

Charles University in Prague, Czech Republic

Centre National de la Recherche Scientifique, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France

Universiteit Utrecht, The Netherlands

Aalto University, Finland

Fondazione Bruno Kessler, Italy

Dublin City University, Ireland

Rheinisch-Westfälische Technische Hochschule Aachen, Germany

Jožef Stefan Institute, Slovenia

Evaluations and Language Resources Distribution Agency, France


Three Lines of Action


The META-NET objectives translate into three lines of action:


The Process


2010, 2011, 2012

communication within META-NET (META-VISION)

communication in the wider LT community and among other stakeholders

communication to policy makers, funding bodies, public



Data has become a key factor in LT R&D

A few indicators:

Increasing size & importance of LREC conference, corpora mailing list, etc.

Citation ranks of publications on language resources

Language research and language technology belong to the Data Intensive Sciences

Expensive data become valuable through sharing

However, the long demanded and well-contemplated instruments for managing and sharing this data are still missing


META-SHARE: Key Features

META-SHARE is an open, integrated, secure, interoperable exchange infrastructure (resp. Stelios Piperidis) for language data & tools for the Human Language Technologies domain

ever-evolving, scalable, including free and for-a-fee LRs/LTs and services

including legacy, contemporary and emerging datasets, tools and technologies

A marketplace where language data & tools are documented, uploaded and stored in repositories, catalogued and announced, downloaded, exchanged, aiming to support a data economy (includes free and for-a-fee LRs/LTs and also services)

Standards-compliant, overcoming format, terminological and semantic differences

Based on distributed networked repositories accessible through common interfaces




What we’re offering

A channel to share and distribute language data and tools

Technical solutions for building your own repositories

Protocols and mechanisms for making the descriptions of your resources (and the actual resources) harvestable

Guidelines and recommendations on standards used in the LR production and documentation processes

Recommendations on data and tools licensing issues

Access to large catalogues of documented, high-quality resources, as well as the actual data and tools


KYOTO can be among the first

Features

Single Sign-On

Easy Administration

Metadata Harvesting

Persistent Identifiers (PIDs)

Intuitive Search



Open Source

Service-Oriented

Distributed

Replication/Backup

Reporting & Statistics


v0 architecture

On the communication/mobilisation side

A change of culture

Convincing arguments that data assets and their value do not necessarily grow if locked in the drawer

Incentives and models that can convince data holders that there is life after the announcement of data existence and/or sharing (sharing does not necessarily mean for free, nor for unbridled use)

Interoperability, common metadata, formats, etc.

In other words we need to create/reinforce a data economy based on widely agreed principles and rules, mutual understanding, sustainable and adaptive models, simplified copyright rules and licensing models

The present time window seems appropriate

Challenges

43

N.Calzolari

Multilingual Web, Madrid,
2010

KYOTO can be a “model” for other projects to follow

Collaborative Resources

LR building as collaborative “common shared task”

New methodology of work

Assemble a comprehensive “map of language data and mechanisms” for the planet’s languages (LRE Map)

Interoperability acquires even more value

Needs consensual planning of common strategies towards shared objectives

Not just the sum of many individual efforts

But an organised, well-structured, collective enterprise

Similar to more mature sciences: physicists’/astronomers’ experiments … of X,000 people working on the same big enterprise



META-SHARE is a big step that needs a real Paradigm shift




We wanted more & more data ...

Have we been too successful?!?

We experience today a sort of statistical “intoxication”!

It started as a new strategy, a revolution maybe? But it has turned to tactics. Stuck with it? In a narrow loop of small advances, not linked to each other

Can we add

Main
Statement

We tend to forget about
& the need to
understand its properties & complexities

Where do we (try to) encode what we know about language properties?

In annotations

Preamble

Vision

Like the big Genome project
, ...
a large

Is there
any theoretical knowledge of or

Any serious methodology of studying and exploiting

the
among the various annotation layers?


BUT


Strategy

A Multilingual Annotation Plan

As a Very Large International Initiative


(parallel)
texts

for

languages

With

possible annotation layers

Similar to more mature sciences, e.g. physics, … of thousands of people working together on the same big experiment

Create
a sustainable infrastructure

for a
Where we
Collaborative Resources :

A new paradigm for a big language map

Means a change of mentality: going beyond “individual” research interests

From “my approach” to some “compromise” allowing to go for big amounts/integration/building on each other/…


From no infrastructure ...

To many infrastructures/networks

We were complaining there was no infrastructure ...

Have we been too successful??

Now many infrastructural/networking initiatives

Very good opportunity

But only if we are able to act in a coordinated & coherent way

Otherwise we spoil & confuse the field
