Improving communication in e-democracy using NLP and semantic tools


ICT FOR SAFE DIGITAL CITIES


Inclusive e-services


Bologna 29.06.07

Improving communication in e-democracy using NLP and semantic tools

Michele Carenini


Summary

Where Does Natural Language Belong?

Natural Language (Processing) in Very Few Words

Why Put Semantics into the Web and…

… How to Do It

EDEN: the Gap between Us and Them

Good and Bad Lessons

What Now?




Where Does Natural Language Belong?

Artificial Language vs. Natural Language


NL in NLP

NL is the theoretical set of well-formed phrases/sentences of human languages.

NLP deals with the possibility of making computers process NL;

By definition, computers can process only computable objects;

There are at least two main features of NL that are (or theoretically can be) computable: morphology and syntax;

Well-formedness is a prerequisite for computing (morphology and) syntax.


Evolution in NLP (by increasing complexity): morphology → syntax → semantics → pragmatics


Well-Formedness

((α → β) → (¬β → ¬α))   vs.   *((α → β) → (ββ))α))
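The artificial-language pair can be checked mechanically because its grammar is explicit. Below is a minimal Python sketch of a recursive-descent recognizer. It assumes a grammar chosen for illustration: single Greek letters as atoms, prefix ¬, and a fully parenthesised binary →. The name `is_wff` is hypothetical.

```python
# Recursive-descent recognizer for propositional formulas.
# Assumed grammar (illustrative only):
#   F := ATOM | "¬" F | "(" F "→" F ")"
ATOMS = set("αβγδ")

def is_wff(text: str) -> bool:
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def formula() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        if tok in ATOMS:                      # F := ATOM
            pos += 1
            return True
        if tok == "¬":                        # F := ¬F
            pos += 1
            return formula()
        if tok == "(":                        # F := (F → F)
            pos += 1
            if not formula():
                return False
            if pos >= len(tokens) or tokens[pos] != "→":
                return False
            pos += 1
            if not formula():
                return False
            if pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        return False

    # Well-formed only if a single formula consumes every token.
    return formula() and pos == len(tokens)

print(is_wff("((α → β) → (¬β → ¬α))"))  # True
print(is_wff("((α → β) → (ββ))α))"))    # False
```

The recognizer rejects the starred string at the point where `ββ` appears inside a parenthesised formula and no `→` follows.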



John eats the cake   vs.   *John are eaten one cakes
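To recognize well-formedness in the natural-language pair, a system needs a lexicon and agreement constraints. The sketch below assumes a toy lexicon and a single sentence pattern (NP V NP) with subject-verb and determiner-noun number agreement. `LEXICON`, `parse_np` and `is_sentence` are illustrative names; a real grammar would be far larger.

```python
# Toy lexicon (illustrative): word -> (category, number); None = unmarked.
LEXICON = {
    "John": ("Name", "sg"),
    "the": ("Det", None),
    "one": ("Det", "sg"),
    "cake": ("N", "sg"),
    "cakes": ("N", "pl"),
    "eats": ("V", "sg"),
    "eat": ("V", "pl"),
    "are": ("Aux", "pl"),
    "eaten": ("Part", None),
}

def parse_np(tokens, i):
    """Parse NP -> Name | Det N starting at i; return (number, next_i) or None."""
    entry = LEXICON.get(tokens[i]) if i < len(tokens) else None
    if entry and entry[0] == "Name":
        return entry[1], i + 1
    if entry and entry[0] == "Det" and i + 1 < len(tokens):
        noun = LEXICON.get(tokens[i + 1])
        # Determiner and noun must agree in number (None agrees with both).
        if noun and noun[0] == "N" and entry[1] in (None, noun[1]):
            return noun[1], i + 2
    return None

def is_sentence(text):
    """Recognize S -> NP V NP with subject-verb number agreement."""
    tokens = text.split()
    subject = parse_np(tokens, 0)
    if subject is None:
        return False
    number, i = subject
    verb = LEXICON.get(tokens[i]) if i < len(tokens) else None
    if verb is None or verb[0] != "V" or verb[1] != number:
        return False
    obj = parse_np(tokens, i + 1)
    # The object NP must consume the rest of the input.
    return obj is not None and obj[1] == len(tokens)

print(is_sentence("John eats the cake"))        # True
print(is_sentence("John are eaten one cakes"))  # False
```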



Putting Some Semantics Into The Web

The Web: a system of interlinked hypertext documents accessed via the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia, and navigates between them using hyperlinks.


Putting Some Semantics Into The Web

WHY:


Putting Some Semantics Into The Web

WHY:

Intelligent search: AND/OR

42,600,000?!?

CAR / TRUCK / MOVING / DRIVING

HOW:
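The CAR/TRUCK/MOVING/DRIVING example points to a gap. A pure keyword search treats these words as unrelated strings, so a query for one of them misses documents about the others. The sketch below shows semantic query expansion: it expands the query with related concepts and ranks documents by weighted overlap. It assumes a small hand-built table of related concepts. The `RELATED` map, its weights, the sample documents and the function names are hypothetical and not part of any system described here.

```python
# Hypothetical concept table: term -> {related term: weight}.
RELATED = {
    "car": {"truck": 0.6, "driving": 0.5},
    "truck": {"car": 0.6, "moving": 0.5},
    "moving": {"truck": 0.5},
    "driving": {"car": 0.5},
}

def expand(query_terms):
    """Return {term: weight}: query terms weigh 1.0, related terms less."""
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for related, w in RELATED.get(term, {}).items():
            # Keep the strongest weight if a term is reached more than once.
            weights[related] = max(weights.get(related, 0.0), w)
    return weights

def rank(docs, query_terms):
    """Order document ids by the summed weights of expanded terms they contain."""
    weights = expand(query_terms)
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = sum(w for term, w in weights.items() if term in words)
        if score > 0:
            scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

DOCS = {
    "d1": "Renting a truck for moving house",
    "d2": "Tips for driving a car in Bologna",
    "d3": "Cake recipes from Emilia",
}
print(rank(DOCS, ["car"]))  # ['d2', 'd1']
```

Plain keyword matching would return only `d2`. The expansion also surfaces `d1` through the related concept `truck`, ranked lower because its weight is smaller. In practice, such a table could be derived from an ontology instead of being hard-coded.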


Putting Some Semantics Into The Web

Web 2.0



The transition of web sites from
isolated information silos to sources of
content and functionality



A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use



Enhanced organization and
categorization of content, emphasizing
deep linking



Putting Some Semantics Into The Web

Semantic Web

Some elements of the Semantic Web are expressed in formal specifications, including:

Resource Description Framework (RDF)

Data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)

Notations such as RDF Schema (RDFS)

The Web Ontology Language (OWL)

All of these are intended to formally describe concepts, terms, and relationships within a given knowledge domain.
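RDF models knowledge as subject-predicate-object statements. The following is a toy in-memory sketch, not an RDF library. `TripleStore`, `match` and `superclasses` are hypothetical names, prefixed names such as `ex:Car` are plain strings, and `None` acts as the variable in a single triple pattern. The `superclasses` helper illustrates one inference RDFS supports: `rdfs:subClassOf` is transitive.

```python
from typing import Optional

Triple = tuple[str, str, str]

class TripleStore:
    """Toy in-memory set of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return the triples matching one pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

def superclasses(store: TripleStore, cls: str) -> list[str]:
    """Collect all ancestors of cls by following rdfs:subClassOf transitively."""
    found: list[str] = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in store.match(current, "rdfs:subClassOf"):
            if parent not in found:       # also guards against cycles
                found.append(parent)
                frontier.append(parent)
    return found

store = TripleStore()
store.add("ex:Car", "rdfs:subClassOf", "ex:MotorVehicle")
store.add("ex:Truck", "rdfs:subClassOf", "ex:MotorVehicle")
store.add("ex:MotorVehicle", "rdfs:subClassOf", "ex:Vehicle")
print(superclasses(store, "ex:Car"))  # ['ex:MotorVehicle', 'ex:Vehicle']
```

Only `ex:Car → ex:MotorVehicle` is stated explicitly, yet the query also returns `ex:Vehicle`. That inferred statement is the kind of added value formal semantics brings over plain hypertext links.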



Putting Some Semantics Into The Web

Web 3.0



Ubiquitous Connectivity, broadband adoption, mobile Internet access and mobile devices

Network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing

Open technologies, Open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons, Open Data License)

Open identity, OpenID, open reputation, roaming portable identity and personal data

The intelligent web, Semantic web technologies such as RDF, OWL, SWRL, SPARQL, Semantic application platforms, and statement-based datastores

Distributed databases, the "World Wide Database"

Intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents


EDEN: Where It All Began (at least some of it)


EDEN: The Gap Between Us And Them

Us: the technicians

Them: the PAs (Public Administrations)

End-Users: the Citizens


General Objective of the NLP Tools

(in the eDemocracy framework)

Interacting with (CHI) or through (CMI) an artificial system...

... in order to get information that makes participation in the decision-making process more effective.


One Overall Problem

Main problem: the mutual understanding of different fields of interest and expertise.

Users: find it difficult to deal with the very notion of Natural Language. Lost on Bad-Language World.

Technicians: find it difficult to deal with a less-than-perfect NL definition. Lost on NLP Planet.


Good Lessons…


… And Bad Ones


Good Lessons (1)

1. Linguistic Resource Re-use: the main purpose of the grammar(s) developed within EDEN is information extraction, not (full) linguistic analysis. The major effort was therefore devoted to covering most “information-bearing” constituents, such as (complex) Noun Phrases and the main Verb-Noun and Verb-Adjective relations.
→ Easy replication to different (Western) languages:

the four linguistic analysers made available to the project (Dutch, English, German and Italian) have been deployed with the same development tool (Yap4NL);

consequently, they all share the same approach to linguistic analysis (rule-based, full-path parsing with a post-parsing procedure that simulates a shallow parser);

finally, from the point of view of software design, no major changes or significant integration work were necessary to localise the modules.
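The “information-bearing constituents” targeted above can be illustrated with a simple shallow chunker. This is not Yap4NL, whose rule format is not described here. It is a minimal sketch under these assumptions: the input is already POS-tagged with an illustrative tagset, DET? ADJ* NOUN+ sequences are grouped into noun phrases, and each verb is paired with the head noun of the next noun phrase. All names are hypothetical.

```python
# Illustrative tagset: DET, ADJ, NOUN, VERB, PREP.
# Input is a list of (word, tag) pairs from an assumed upstream tagger.

def chunk_nps(tagged):
    """Greedy noun-phrase chunker for DET? ADJ* NOUN+; returns (start, end) spans."""
    spans, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":
            j += 1
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:            # at least one noun: accept the chunk
            spans.append((i, k))
            i = k
        else:                # no noun here: skip one token and retry
            i += 1
    return spans

def np_text(tagged, span):
    """Join the words of one chunk."""
    start, end = span
    return " ".join(word for word, _ in tagged[start:end])

def verb_noun_relations(tagged):
    """Pair each verb with the head (last noun) of the next NP to its right."""
    spans = chunk_nps(tagged)
    relations = []
    for v, (word, tag) in enumerate(tagged):
        if tag != "VERB":
            continue
        for start, end in spans:
            if start > v:
                relations.append((word, tagged[end - 1][0]))
                break
    return relations

tagged = [("citizens", "NOUN"), ("report", "VERB"), ("the", "DET"),
          ("broken", "ADJ"), ("street", "NOUN"), ("lamp", "NOUN"),
          ("in", "PREP"), ("the", "DET"), ("park", "NOUN")]
print(verb_noun_relations(tagged))  # [('report', 'lamp')]
```

Because the chunk patterns refer only to tags, adapting the sketch to another language mostly means supplying a tagger and lexicon for that language.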


Good Lessons (2)

2. Fast Prototyping: the Dutch grammar was developed completely from scratch in less than one person-year. Fast prototyping was made possible mainly by:

the availability of an advanced dedicated tool for grammar development;

the simplicity of the approach to linguistic processing.

Interesting outcome: an output format consisting of flat lists of “triples” (no structure, no hierarchy, no explicit internal links).
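The flat “triples” output format can be made concrete as follows. The sketch assumes a hypothetical nested analysis in which each node has a head word and named relations to its dependents, and it flattens that analysis into standalone (head, relation, dependent) triples. The relation labels and the dict layout are assumptions for illustration, not EDEN's actual output schema.

```python
def flatten(node, out=None):
    """Flatten a nested {"head", "relations"} tree into (head, relation, dependent) triples.

    The output keeps no hierarchy: triples are linked only implicitly,
    through head words that happen to repeat.
    """
    if out is None:
        out = []
    for relation, child in node.get("relations", {}).items():
        out.append((node["head"], relation, child["head"]))
        flatten(child, out)
    return out

def to_tsv(triples):
    """Serialise triples as one tab-separated line each."""
    return "\n".join("\t".join(t) for t in triples)

# Hypothetical analysis of "citizens report the broken street lamp".
example = {
    "head": "report",
    "relations": {
        "subject": {"head": "citizens"},
        "object": {
            "head": "lamp",
            "relations": {
                "modifier": {"head": "broken"},
                "compound": {"head": "street"},
            },
        },
    },
}
print(to_tsv(flatten(example)))
```

A flat list is cheap to produce whatever the depth of the underlying parse, and each line can be consumed independently, for example by a keyword index or a triple store.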


Good Lessons (3)

3. Grammar Re-usability: in the grammar format used in EDEN, each grammar rule has a syntagmatic part, which corresponds to the reduction rule, and a set of “actions” which independently build the feature structure of each syntactic phrasal constituent. This led to two interesting results:

the same linguistic analyser has been embedded in several different modules; and

an interesting experiment in grammar re-use (from Dutch to German) has been carried out, with encouraging results.
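Rules become reusable when the syntagmatic part is kept separate from independent actions. The following Python sketch models that split under our own assumptions. The `Rule` class, `reduce_once`, the action functions and the dict-based feature structures are hypothetical and do not reproduce the Yap4NL format. A rule reduces a matching category sequence on a parse stack to a new constituent, and each action fills in one part of that constituent's feature structure.

```python
from dataclasses import dataclass, field
from typing import Callable

Features = dict[str, object]
Action = Callable[[list[Features], Features], None]

@dataclass
class Rule:
    lhs: str                  # category of the new phrasal constituent
    rhs: list[str]            # syntagmatic part: categories to reduce
    actions: list[Action] = field(default_factory=list)

def reduce_once(rule: Rule, stack: list) -> bool:
    """If the top of the stack matches rule.rhs, replace it with rule.lhs.

    Stack items are (category, features). Each action runs independently
    and writes part of the new constituent's feature structure.
    """
    n = len(rule.rhs)
    if n == 0 or len(stack) < n:
        return False
    window = stack[-n:]
    if [cat for cat, _ in window] != rule.rhs:
        return False
    children = [feats for _, feats in window]
    new_feats: Features = {}
    for action in rule.actions:
        action(children, new_feats)
    del stack[-n:]
    stack.append((rule.lhs, new_feats))
    return True

def head_from_last(children, feats):
    feats["head"] = children[-1]["lemma"]

def number_from_last(children, feats):
    feats["number"] = children[-1].get("number")

def det_from_first(children, feats):
    feats["det"] = children[0]["lemma"]

def gender_from_first(children, feats):
    feats["gender"] = children[0].get("gender")

NP_RULE = Rule("NP", ["DET", "NOUN"], [head_from_last, number_from_last, det_from_first])
# Re-use: same syntagmatic part, existing actions plus one extra (e.g. for German).
NP_RULE_DE = Rule("NP", ["DET", "NOUN"], NP_RULE.actions + [gender_from_first])

stack = [("DET", {"lemma": "the"}), ("NOUN", {"lemma": "cake", "number": "sg"})]
reduce_once(NP_RULE, stack)
print(stack)  # [('NP', {'head': 'cake', 'number': 'sg', 'det': 'the'})]
```

The actions do not depend on each other, so `NP_RULE_DE` re-uses the existing ones unchanged and adds only a gender action. This mirrors, in miniature, the kind of cross-language re-use reported above.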


Bad Lessons (1)

1. NLP Exploitation in a Specific Domain: the first problem concerned the very notion of Natural Language:

Traditional NLP definition: “Natural Language is the theoretical set of all well-formed sentences used by humans to communicate”. For instance, “John eats the cake” is a sentence belonging to NL, while “*John are eaten one cakes” is not.
→ Reason: input must meet a “minimum threshold” for an artificial system to behave properly (i.e., to assign a structure).

First EDEN definition by users: “Natural Language is whatever string is expressed by citizens, possibly including misspellings, non-existing words, bad syntactic structures”. Therefore, any juxtaposition of strings, once it has been typed in by a citizen, belongs to NL.
→ Reason: in communication (and especially in e-mail communication) a lot of mistakes occur; the system must be able to deal with them too.


Bad Lessons (2)

2. Final Users’ Expectations: EDEN modules are (of course) aimed at manipulating symbols in order to make some information accessible. Citizens, instead, sometimes expected the system to “understand” what they typed in.

They expected the system to understand trans-phrasal phenomena (such as pronoun resolution: “I need a garage for my car; where can I find one?”);

they even expected the system to handle pragmatic phenomena (such as plan inference, over-answering, etc.: “What time is the train leaving for Rome?”).



What We Learned (1)

Dealing with DNLP (“Dirty NLP”): “dirty” input must be well accepted by the system
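One way to accept “dirty” input instead of rejecting it is to map each unknown token to the nearest lexicon entry by edit distance, while keeping tokens that are too far from every entry. This sketches the idea only and is not presented as EDEN's method. The word list, the `max_distance` threshold of 2 and the function names are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def normalize(text: str, lexicon: set[str], max_distance: int = 2) -> list[str]:
    """Map each unknown token to its nearest lexicon word if close enough.

    Tokens with no lexicon word within max_distance are kept unchanged,
    so no input is rejected outright.
    """
    result = []
    for token in text.lower().split():
        if token in lexicon or not lexicon:
            result.append(token)
            continue
        # Ties on distance are broken alphabetically for determinism.
        best = min(lexicon, key=lambda w: (edit_distance(token, w), w))
        result.append(best if edit_distance(token, best) <= max_distance else token)
    return result

KNOWN_WORDS = {"where", "can", "i", "find", "a", "garage", "parking"}
print(normalize("Wher can I fnd a garrage", KNOWN_WORDS))
# ['where', 'can', 'i', 'find', 'a', 'garage']
```

A fixed threshold is a blunt instrument. A real-word error (e.g. “were” for “where”) passes through untouched whenever the misspelling is itself in the lexicon, and resolving such errors requires context.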



What We Learned (2)

Hiding technology: “Bringing technology to the people” must become “Bringing technology to the people without letting them know”



What Now?

The Future



Standard data interchange formats



Interoperability



Grid Computing



Distributed systems



Standard Notation Schemes



Standard Ontologies Accessible from Different Perspectives



Adaptive Filtering



Advanced Multimodal Interfaces



Remotely Accessible Applications



Privacy and Security Standards and
Tools



Real AI (Knowledge Representation,
Decision Support Systems, Machine
Learning, Autonomous Agents, NLP)