PrepNet: a Framework for Describing Prepositions:
Preliminary Investigation Results

Patrick Saint-Dizier

IRIT-CNRS, France

Long-term objectives

Construct a repository of preposition syntactic and semantic behaviors,

Develop a multi-level approach, from prototypical uses to unexpected ones, that accounts for the diversity of preposition uses and for their polysemic behavior,

Develop a relatively shallow semantic characterization based on frames,

Investigate the verb-preposition-NP relations: restrictions and compositionality,

Develop a multi-lingual approach.



Applications: MT, Knowledge extraction, QA, etc.

This paper: basic elements of a preliminary approach

Introduce a general characterization of preposition senses viewed as abstract notions,

Characterize these abstract notions by means of frames (viewed as linguistic or conceptual macros),

Populate preposition frames via corpus and then validate,

Develop a multi-level characterization of preposition uses, to organize the diversity of their uses in language,

Raise a few questions about multilinguality (prepositions can be realized by other categories or by morphology in some languages),

Investigate evaluation methods, in abstracto, and via applications.

Related work

Very little in CL circles compared to verbs and nouns, in spite of their necessity in a number of applications (MT, IE, QA, …),

Almost nothing in EWN, FrameNet or VerbNet,

Some valuable work in AI: e.g. temporal, spatial reasoning,

A few isolated works in linguistics on a given preposition,

Quite a lot of work in psycho-linguistics.

Other resources: B. Dorr’s large description for English, with MT in view (about 500 entries).

Why is that so?

High polysemy (but maybe no more than adjectives? And a smaller number of items: 95 prepositions in French plus compounds, 32 in Spanish; there is not always agreement on what a preposition is),

Linguistic realizations are very difficult to predict: large number of idiosyncratic uses and cross-linguistic differences,

Syntactic difficulties due to the chain V-Prep-N, e.g. PP-attachment problems, VPC,

Deep level in the semantic-cognitive structure: prepositions are often used in metalanguages as primitives.

Study here only compositional uses of prepositions.

Global architecture of the proposal

Prep. senses: 3-level set of abstract notions

Shallow semantic representation with strata

Uses in language 1, uses in language 2, etc.

General architecture (1): categorizing preposition senses

Preposition categorization on 3 levels:

Family (roughly thematic roles): localization, manner, quantity, etc.

Facets: e.g. for localization: source, position, destination, etc.

Modalities.

Facets are viewed as the abstract notions on which PrepNet is based

12 families defined


Families/ facets

Quantity: numerical/ frequency/ proportion

Accompaniment: adjunction/ simultaneity/ inclusion/ exclusion

Manner: means/ manners and attitudes/ imitation or analogy

Localization: source/ destination/ via/ fixed position

Choice and exchange: exchange/ choice or alternative/ substitution

Causality: cause/ goal or consequence/ intention

Opposition

Ordering: priority/ subordination/ hierarchy/ ranking/ degree of importance

Minor elements: about, in spite of, comparison

(see examples in paper)





Conceptual/ontological status of these distinctions?



Families correspond to ‘superframes’: general principles and restrictions

Facets correspond to frames; strata are subframes, with some general forms of inheritance and property consistency

Whenever appropriate, modalities correspond to subframes

Frames are viewed as linguistic macros, to be interpreted.

They are shallow or coarse-grained representations so far.

Language realizations are a priori associated with the lower level frame nodes.

(2): a conceptual, prelexical structure

[Diagram: the frame of an abstract notion (name + gloss, shallow restrictions, simplified LCS representation) and, below it, the strata of the abstract notion: subframes SF1, SF2, SF3.]


Structure of a frame


Structure:


Number, name, gloss,


Frame with shallow constraints: X <Action> Y [Number] Z


Conceptual representation in simplified LCS (kind of LST)


In the future: inferential patterns (within a frame or among frames)


195 senses/abstract notions described using 65 primitives


Shallow constraints:


(1) generic semantic types


(2) generic verb class types from WordNet


(3) generic semantic fields from the LCS: temp, poss, loc, psy, epist, perc, amount, comm, prop, abs, etc.
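The frame structure listed above (number, name, gloss, a pattern with shallow constraints, and a simplified LCS representation) could be held in a small record type. The following is only a minimal sketch: the class name `Frame`, its field names and its two helper methods are hypothetical and are not part of PrepNet itself. The sample entry reuses the generic VIA frame presented in this deck, and the set of semantic fields is deliberately incomplete because the slide's list ends with "etc.".

```python
import re
from dataclasses import dataclass, field

# Semantic fields named on the slide; the slide's list is open-ended ("etc.").
LCS_FIELDS = {"temp", "poss", "loc", "psy", "epist", "perc",
              "amount", "comm", "prop", "abs"}


@dataclass
class Frame:
    """Hypothetical container for one frame or subframe (not the PrepNet format)."""
    number: str                  # e.g. "1" or "1.2.1"
    name: str
    gloss: str
    pattern: str                 # shallow-constraint pattern, e.g. "X <ACTION> [1] Y"
    constraints: dict = field(default_factory=dict)  # slot name -> generic type
    representation: str = ""     # simplified LCS form, kept as a plain string
    synsets: dict = field(default_factory=dict)      # language code -> prepositions

    def slots(self):
        """Slot names used in the pattern, skipping the bracketed frame number."""
        tokens = self.pattern.split()
        return [t.strip("<>") for t in tokens if not t.startswith("[")]

    def semantic_fields(self):
        """Known LCS semantic fields mentioned in the representation string."""
        words = set(re.findall(r"[a-z]+", self.representation))
        return sorted(words & LCS_FIELDS)


# The generic VIA frame, with the values given in Example 1 of this deck.
via = Frame(
    number="1",
    name="VIA - generic",
    gloss="An entity X moving via a location Y",
    pattern="X <ACTION> [1] Y",
    constraints={"X": "concrete entity", "ACTION": "movement verb", "Y": "location"},
    representation="X : via(loc, Y)",
    synsets={"fr": ["par", "via"]},
)

print(via.slots())
print(via.semantic_fields())
```

Keeping the representation as a plain string avoids committing to any particular LCS parser; a real system would presumably use a structured form.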

Example 1: ‘via’

[1]: VIA - generic.

'An entity X moving via a location Y'

X <ACTION> [1] Y

X: concrete entity, ACTION: movement verb, Y: location

representation: X : via(loc, Y)

French synset: {par, via}

example: Jean rentre par la porte (Jean comes back in through the door)

Stratification 1:

[1.1]: VIA - narrow passage.

'An entity X moving via / an action that uses a narrow passage in an object Y'

X <ACTION> [1.1] Y

X: concrete entity, ACTION: perception verb, Y: location with a narrow passage

representation: X : through(loc or temp, Y)

French synset: {à travers, au travers de, dans}

example: Jean regarde à travers la grille / dans les jumelles (Jean looks through the gate / through the binoculars).

Example 1, cont’:

Stratification 2:

[1.2.1] VIA UNDER, from generic

'An entity X moving via under a location Y'

X <ACTION> [1.2.1] Y

X: concrete entity, ACTION: movement verb, Y: location with a form of passage under it

representation: X : via(loc, under(loc,Y))

French synset: {par dessous}

example: Jean passe par dessous le pont (Jean passes under the bridge).

[1.2.2] VIA ABOVE, from generic

etc.
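The stratification in Example 1 can be read as a small tree in which subframes refine the generic frame. The sketch below stores the three VIA entries by number and looks up a constraint on a subframe, falling back to the nearest ancestor when the subframe does not state it. The fallback is an assumption made for illustration: the slides only mention "some general forms of inheritance" and actually restate every constraint in each entry. The names `FRAMES`, `parent` and `resolve` are hypothetical.

```python
# Entries from Example 1. Only constraints that differ from the generic
# frame are stored on the subframes (an assumption of this sketch).
FRAMES = {
    "1": {
        "name": "VIA - generic",
        "X": "concrete entity",
        "ACTION": "movement verb",
        "Y": "location",
    },
    "1.1": {
        "name": "VIA - narrow passage",
        "ACTION": "perception verb",
        "Y": "location with a narrow passage",
    },
    "1.2.1": {
        "name": "VIA UNDER",
        "Y": "location with a form of passage under it",
    },
}


def parent(number, frames=FRAMES):
    """Nearest ancestor present in `frames`, found by dropping trailing indices.

    "1.2.1" has no stored "1.2" entry, so its parent is "1" ("from generic").
    """
    parts = number.split(".")
    while len(parts) > 1:
        parts.pop()
        candidate = ".".join(parts)
        if candidate in frames:
            return candidate
    return None


def resolve(number, slot, frames=FRAMES):
    """Value of `slot` on a frame, inherited from ancestors when not stated."""
    current = number
    while current is not None:
        if slot in frames[current]:
            return frames[current][slot]
        current = parent(current, frames)
    return None


print(resolve("1.2.1", "ACTION"))  # inherited from the generic frame
print(resolve("1.1", "ACTION"))    # stated on the subframe itself
```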

Example 2: instruments

Stratification requires taking into account 2 relations, characterized by means of primitives (Mari and Saint-Dizier 03):

Actor/instrument: undergo (no control), select (controls another prop.), control,

Instrument/ V+NP object: be (passive, but participates), react (other prop. than the one controlled by the agent), act (full participation)

Contrast: cut the bread with a knife / eat soup with a spoon / John burned himself with boiling oil.

A generic entry for instruments and, potentially, 9 strata (combinations), depending on the language.

4 strata for French

(2) cont’

[5]: MANNER - MEANS - Instrument

'Someone X doing an action Y using instrument Z.'

X <ACTION> Y [5] Z

X: human, ACTION: verb of change, Y: object, Z: instrument

representation: X : by-means-of(_, Z)

Followed by a priori 9 strata.

Example: Application to French:

1. Be(X,Z) ∧ Undergo(Z, Action+Y) : synset: {grâce à}, restrictions…

2. Be(X,Z) ∧ Select(Z, Action+Y) : synset: {par}, restrictions…

3. Select(X,Z) ∧ React(Z, Action+Y) : synset: {avec}, restrictions…

4. Act(X,Z) ∧ Control(Z, Action+Y) : synset: {avec, au moyen de}, …
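The "a priori 9 strata" for instruments come from combining the three actor/instrument primitives with the three instrument/action primitives. The sketch below enumerates those combinations and stores the four French strata as a lookup table. The predicate strings are copied as written on the slide and are not mapped onto the enumerated pairs; the names `FRENCH_STRATA` and `strata_for` are hypothetical.

```python
from itertools import product

# Primitives for the two relations, as listed in Example 2.
ACTOR_INSTRUMENT = ("undergo", "select", "control")
INSTRUMENT_ACTION = ("be", "react", "act")

# All potential strata: one primitive per relation.
potential_strata = list(product(ACTOR_INSTRUMENT, INSTRUMENT_ACTION))

# The four strata given for French: (condition as written, synset).
FRENCH_STRATA = {
    1: ("Be(X,Z) ∧ Undergo(Z, Action+Y)", ["grâce à"]),
    2: ("Be(X,Z) ∧ Select(Z, Action+Y)", ["par"]),
    3: ("Select(X,Z) ∧ React(Z, Action+Y)", ["avec"]),
    4: ("Act(X,Z) ∧ Control(Z, Action+Y)", ["avec", "au moyen de"]),
}


def strata_for(preposition, strata=FRENCH_STRATA):
    """Numbers of the strata whose synset contains the given preposition."""
    return [n for n, (_, synset) in strata.items() if preposition in synset]


print(len(potential_strata))
print(strata_for("avec"))
```

A preposition such as "avec" appearing in more than one stratum is exactly the kind of polysemy the restrictions attached to each synset are meant to separate.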

(3) The language realization level

[Diagram: each subframe SFi (= lower frame level) points to a multi-level partitioning of realizations from usage norms (direct uses, indirect uses, etc.). Each partition has restrictions (restr1, restr2, restr3, derived types, …) leading to synsets (synset1, synset3, synsets ??, …), plus frequency measures.]

Populating preposition frames from corpora

Conceptual frames are associated with shallow constraints

Move on to the language level, elements of a method:

For a given language: associate each frame stratum with corpus and dictionary observations

Manual analysis: identify prototypical uses, promote usage norms

Multi-level partitioning of realizations

Contrast, if possible, direct versus indirect (mainly metaphorical) realization levels

Elaborate conceptual/ontological status of categorizations and related constraints (mainly semantic types)
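As a rough illustration of the first step of the method above, observations could be grouped per frame stratum and turned into simple frequency measures per preposition. Everything in this sketch is hypothetical: the observation tuples are invented toy data, not corpus results, and the function names are placeholders. Dictionary-only attestations are kept with a corpus count of zero, so that uses found only in dictionaries remain visible.

```python
from collections import Counter, defaultdict

# Toy, hand-made observations: (frame number, preposition, source).
# These are illustrative values only, not measurements.
observations = [
    ("1", "par", "corpus"),
    ("1", "par", "corpus"),
    ("1", "via", "dictionary"),
    ("1.1", "à travers", "corpus"),
    ("1.1", "dans", "corpus"),
    ("1.1", "à travers", "corpus"),
]


def populate(observations):
    """Count corpus occurrences of each preposition per frame stratum."""
    counts = defaultdict(Counter)
    for frame, preposition, source in observations:
        # Dictionary entries register the preposition without adding to its count.
        counts[frame][preposition] += 1 if source == "corpus" else 0
    return counts


def relative_frequencies(counter):
    """Share of corpus occurrences per preposition within one stratum."""
    total = sum(counter.values())
    return {p: (c / total if total else 0.0) for p, c in counter.items()}


counts = populate(observations)
for frame in sorted(counts):
    print(frame, relative_frequencies(counts[frame]))
```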

A few notes

Multi-level architecture: helps to account for the large variety of (compositional) behaviors; partitioning strategies need to be investigated in more depth,

Is incremental depth, to get a finer-grained analysis, worth pursuing?

For each synset: develop frequency measures and identify contexts of use (from syntactic context to type of text); frequency rates are very diverse (some uses are only found in dictionaries!)

Populate, but then validate on new corpora: develop several forms of corpus annotations (the frame; the relation with the head, with the NP, etc.)


Looking at other languages

Hypothesis: given an abstract notion (interlingua), translations are constructed on the basis of the restrictions that hold on the corresponding synsets,

BUT:

Large realization variations are in general observed, even for closely related languages: up to what point are these just surface language contrasts? Or are they also conceptual?

Regarder dans le microscope / look through the microscope (durch; a través de)

Some languages do not use pre-/post-positions so much, but rather other categories, incorporation in heads, or just case marks.
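The hypothesis above, that a translation is built from an abstract notion plus the restrictions holding on each language's synset, can be sketched with the microscope example. The table below is an illustration only: the "optical instrument" restriction on French "dans" is an assumption made for this sketch (the deck shows "dans" with binoculars and a microscope but does not state the restriction), the language labels for "durch" and "a través de" are inferred, and the names `REALIZATIONS` and `realize` are hypothetical.

```python
# Realizations of the abstract notion [1.1] (VIA - narrow passage), per language.
# Each entry is (preposition, required type of Y); None means no restriction.
# Entries are tried in order, so restricted realizations come first.
REALIZATIONS = {
    "fr": [("dans", "optical instrument"), ("à travers", None)],
    "en": [("through", None)],
    "de": [("durch", None)],
    "es": [("a través de", None)],
}


def realize(language, y_type, table=REALIZATIONS):
    """First preposition whose restriction is compatible with the type of Y."""
    for preposition, restriction in table[language]:
        if restriction is None or restriction == y_type:
            return preposition
    return None


# Same abstract notion, different surface choices across languages.
for lang in ("fr", "en", "de", "es"):
    print(lang, realize(lang, "optical instrument"))
print("fr", realize("fr", "gate"))
```

Whether such contrasts are purely a matter of surface restrictions, as this table assumes, is precisely the open question raised on the slide.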

Preliminary conclusions

Preliminary investigation to identify difficulties and organize the research,

The global architecture looks like an interesting approach,

Abstract notion definitions seem to be quite stable; the status of strata needs further investigation,

The multi-level approach to language realizations seems a good direction, but needs much larger testing on a number of languages and a clearer method to organize sets of realizations,

Implement an open system on the Web.




Some obvious research directions

Ontological/conceptual status of categorizations and restrictions,

Investigate integration with other frameworks: VerbNet, FrameNet,

Investigate preposition polysemy and derived uses in more depth, and ways to characterize it,

Relations Head-preposition-NP, and compositionality (Head is often a verb, but can be any other kind of predicate): some PPs have wider scope over the proposition,

Inferential patterns associated with prepositions (e.g. for approximation notions, spatial notions, etc.)