The challenge of non-discreteness:


1

Focal structure in language


Stockholm, August 31, 2012


Andrej A. Kibrik


(Institute of Linguistics RAN

and Lomonosov Moscow State University)

aakibrik@gmail.com


2

The problem


We tend to think about language as a
system of discrete, segmental units
(phonemes, morphemes, words,
sentences...)


But this view does not survive an
encounter with reality

3

Simple example: morpheme fusion

Russian adjective детский ‘children’s, childish’

det-sk-ij
child-Attr-M.Nom
Root-Suffix-Ending

[d’eck’-ij]: the segment [c] belongs both to the root and to the suffix

Many human languages have something like that in morphological structure



4

Similar phenomena abound at all linguistic levels


Phonemes


Syllables


Words


Clauses


Sentences


5

Phonemes


Coarticulation: the initial consonant in cat, keep, cool

Engwall (2000): articulographic study of how pronunciation of Swedish fricatives is affected by surrounding vowels

Sequences such as asa, ɪsɪ, ɔsɔ, ʊsʊ, aɕa, ɔʂɔ, ʊfʊ, etc.

For example: the context of labial vowels strongly increases lip protrusion

6

Phonemes (continued)


Also, the tongue is more anterior in the context of the front vowel /ɪ/ compared to back vowels (Engwall 2000: 10)


That is, boundaries between “segments” are not
really segmental


Trying to posit boundaries in the signal
inevitably means a kind of digitalization

7

Syllables


Language speakers often naturally “feel” the syllabic structure

But segmentation into syllables is usually less than clear-cut

For example, speakers of Pulaar confidently segment words into syllables, e.g. gor|ko ‘man’

But cf. the behavior of geminated consonants

On the one hand, when asked to segment a word into syllables, speakers of Pulaar usually posit a boundary between the two copies of a geminated consonant: hok|kam ‘give me’

On the other hand, a Pulaar secret language is reported in which the encrypting sequence lfV is inserted after the first syllable of a word (Gaden 1914, Labouret 1952: 108):

hokkam ndiyam ‘give.me water’

ho-lfo-kkam ndi-lfi-yam

Geminates such as kk are thus inconsistent:

in some way they belong to two different syllables

in some other way they form the onset of a syllable (Koval 2000: 114, 185)
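
The secret-language rule can be sketched in a few lines of Python. This is only an illustration built from the two examples on the slide. It assumes that the first syllable ends right after the first vowel (which is exactly what puts the geminate kk into the onset of the next syllable) and that V in lfV copies that vowel. The function names and the plain five-vowel inventory are hypothetical simplifications; long vowels and other details of Pulaar phonology are not handled.

```python
VOWELS = set("aeiou")  # assumption: simplified vowel inventory, no long vowels


def encrypt_word(word: str) -> str:
    """Insert 'lf' plus a copy of the first vowel right after the first vowel."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            # word[:i + 1] is treated as the first (open) syllable
            return word[: i + 1] + "lf" + ch + word[i + 1:]
    return word  # no vowel found: leave the word unchanged


def encrypt_phrase(phrase: str) -> str:
    """Apply the rule to every word of a space-separated phrase."""
    return " ".join(encrypt_word(w) for w in phrase.split())


print(encrypt_phrase("hokkam ndiyam"))
```

Under this rule the whole geminate kk lands after the inserted sequence, whereas a rule following the speakers' explicit segmentation hok|kam would insert the sequence between the two k's.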

8

Words


Possessive constructions N + N

English is often said to have two kinds of genitives:

synthetic s-genitive: the queen’s retinue

analytic of-genitive: the retinue of the queen

On the one hand, of is a preposition and thus clearly belongs to the possessor rather than to the possessed

the retinue [of the queen], lots [of stuff]

On the other hand, there are indications of reanalysis

Jurafsky et al. 1998: of is so often reduced that one must posit the allomorph [ɔ]

Native users of English feel this and render it in spelling, also altering the affiliation of the clitic

lots of > lotsa, couple of > coupla

“Kinda outta luck” (song by Lana del Rey)

Graphic practices of this kind suggest that language users attach the clitic of to the possessed rather than to the possessor

In terms of Nichols 1986, in examples of this kind English hesitates between dependent-marking and head-marking behavior

Of displays double-faced behavior in two ways

like any clitic, it is a semi-word, that is, something between a word and an affix

it oscillates between two possible hosts


9

Clauses


Widely held view of “syntax from the discourse perspective” (see Chafe 1994):

Local discourse structure consists of quanta, or chunks, or elementary discourse units (EDUs) (Kibrik and Podlesskaya eds. 2009)

EDUs can be defined by a set of prosodic criteria

EDUs identified this way typically coincide with clauses

The level of such coincidence mostly varies within the range between 1/2 and 3/4

10

Clauses (continued)


Language                                        Percentage of clausal EDUs
English (Chafe 1994)                            60%
Mandarin (Iwasaki and Tao 1993)                 39.8%
Sasak (Wouk 2008)                               51.7%
Japanese (Matsumoto 2000)                       68%
Russian (Kibrik and Podlesskaya eds. 2009)      67.7%
Upper Kuskokwim (Kibrik 2012)                   70.8%
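
As a quick check of the "mostly between 1/2 and 3/4" statement against the figures in this table, the following Python sketch sorts the six languages into those inside and outside that range. The percentages are copied from the table; the variable and function names are hypothetical.

```python
# Share of clausal EDUs (percent), as listed in the table above
CLAUSAL_EDU_SHARE = {
    "English": 60.0,
    "Mandarin": 39.8,
    "Sasak": 51.7,
    "Japanese": 68.0,
    "Russian": 67.7,
    "Upper Kuskokwim": 70.8,
}


def within_range(share: float, low: float = 50.0, high: float = 75.0) -> bool:
    """True if the share lies between 1/2 and 3/4 (expressed in percent)."""
    return low <= share <= high


inside = [lang for lang, s in CLAUSAL_EDU_SHARE.items() if within_range(s)]
outside = [lang for lang, s in CLAUSAL_EDU_SHARE.items() if not within_range(s)]

print("inside 1/2..3/4:", inside)
print("outside:", outside)
```

Five of the six languages fall inside the range; Mandarin, at 39.8%, is the one below it, which is why the range holds "mostly" rather than always.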

11

Clauses (continued)


However, there is a significant residue

Non-clausal EDUs

Subclausal EDUs

Increments (translation from a Russian spoken corpus, “Night Dream Stories”, Kibrik and Podlesskaya eds. 2009)

And suddenly I saw a box.

With a ribbon on top.

Increments appear after a clear prosodic boundary

At the same time, they semantically and grammatically fit into the preceding base clause

Such increments simultaneously belong and do not belong to the preceding clause

They are outliers in clause structure

12

Paradigmatics


So far we have only discussed difficulties associated with the syntagmatic identification of units

The same problem applies to paradigmatic boundaries

That is, boundaries between classes, types, or categories in an inventory

Marginal phonemes

“One might consider the voiceless velar fricative /x/ occurring in words such as Bach (the German composer) or loch (a Scottish lake) as a marginal phoneme for some speakers of English” (Brinton and Brinton 2010: 53)

Russian [w] in loan words

Russian has phonemes /v/ and /u/

English William > Russian Вильям (Vil’jam, with [v]) or Уильям (Uil’jam, with [u], recently [w])

English wow > Russian: usually spelled вау (vau), pronounced [wau]


13

Semantics


Semantics provides particularly abundant evidence of non-discrete boundaries

A plethora of examples has been discussed in cognitive semantics

Textbook example from Labov’s 1973 “Boundaries of words and their meanings”: the boundary between cup and bowl

14

Diachronic change


Diachrony provides innumerable examples of non-discrete boundaries between linguistic elements or stages

Hock and Joseph 1996: 237-238

Old English wēod ‘plant’ and wǣd(e) ‘garment’

Both developed into modern English weed

The meaning ‘garment’ only survives in a couple of expressions, such as widow’s weed ‘a widow’s mourning clothes’

Modern speakers tend to connect this usage with the winning weed

The erstwhile meaning of wǣd(e) is echoed in the modern language as a faint trace

15

Language wholeness


Languages are identifiable, but
every language has internal
variation


Consider a very small language,
Upper Kuskokwim Athabaskan


Ethnic group of about 200
individuals in central interior Alaska


About 20 remaining speakers


The members of the group have a
clear feeling of identity, as well as
separateness from other
neighboring Athabaskan languages


Still, striking dialectal variation

In particular, the rendering of the Proto-Athabaskan coronal consonant series

© Michael Krauss, 2011

16

Language wholeness
(continued)

Dialect:                                  Interdental    Dental     Retroflex    As in:
                                          ‘my tongue’    ‘snow’     ‘raven’
Conservative: no merger                   si t s ula’    ts etł'    do tr on'    Tanana
Standard merger: loss of interdentals     si ts ula’     ts etł'    do tr on'    Tsetsaut
Downriver merger: loss of retroflex       si t s ula’    ts etł'    do ts on'    Koyukon
Merger of all three                       si ts ula’     ts etł'    do ts on'    Ahtna

17

Language wholeness
(continued)


Note that the rendering of coronal series is traditionally
used as the basis for classifying the family into branches


This situation can be explained by geographical and
demographic factors


The Upper Kuskokwim traditional territory probably occupied over 50,000 square kilometers

Traditionally, contact between families/bands was seasonal or sporadic


Still, what identifies the language’s wholeness and
boundaries in terms of internal characteristics?



18

Proto-languages

Linguists often speak about proto-languages (Proto-Germanic, Proto-IE, etc.) as if they were fixed, 100% homogeneous communities without any internal variation

Dahl (2001) discusses the status of Old Nordic

He questions the notion of Common Nordic and the assumption that the Scandinavians “changed their language all at the same time and in the same fashion, as if conforming to a EU regulation on the length of cucumbers” (p. 227).

Contrary to the traditional tree-like picture of a proto-language splitting into daughter languages, Dahl suggests that the spread of prestige dialects may have led to a decrease in diversity and to unification


19

Language contact


Trudgill 2011: 56-58

Contact with Low German affected Scandinavian languages significantly

This influence can generally be described as simplification

That was possible because in the 1400s about 1/3 or more of the population of cities such as Bergen and Stockholm was German

When the non-native population approaches 50%, natives accommodate


Boundaries between languages are thus penetrable


20

Other cognitive domains


Studies by the Russian psychologist Yuri
Alexandrov


Alexandrov and Sergienko 2003: psychophysiological experiments demonstrate the non-disjunctive character of mind and behavior

“Continuity is the overarching principle in the organization of living things at various levels” (p. 105)

Alexandrov and Alexandrova 2010: complementary, non-disjunctive character of cultures


Niels Bohr, discussing the relationships between cultures,
emphasized that, “unlike physics <...> there is no mutual
exclusion of properties belonging to different cultures”.

21

Intermediate conclusion


Language (as well as cognition in general) simultaneously


longs for discrete, segmented structure


tries to avoid it


The omnipresence of non-discreteness effects has not yet led to proper recognition in mainstream linguistic thinking

Linguists are often bashful about non-discreteness

But non-discreteness is not just a nuisance

Non-discrete effects permeate every single aspect of language

This problem is at the core of theoretical debates about language

22

Possible reactions


“Digital” linguistics: ignore non-discrete phenomena or dismiss them as minor

Ferdinand de Saussure: language only consists of identities and differences

the discreteness delusion

appeal of scientific rigor but reductionism

More inclusive (“analog”) linguistics: often a mere statement of continuous boundaries and countless intermediate/borderline cases

a bit too simplistic

23

Cognitive science



Wittgenstein: family resemblance


Rosch: prototype theory


Lakoff: radial categories

[Figure: category with members A, B, C, D; picture from Janda and Nesset 2012]

A is the prototypical phoneme/word/clause/meaning...

B, C, and D are less prototypical representatives

We still need a theory for:

boundaries between related categories

boundaries in the syntagmatic structure

24

My main suggestion


In the case of language we see a structure that combines the properties of discrete and non-discrete: focal structure

Focal phenomena are simultaneously distinct and related

Focal structure is a special kind of structure found in linguistic phenomena, alternative to the discrete structure

It is the hallmark of linguistic and, possibly, cognitive phenomena, in contrast to simpler kinds of matter


25

Various kinds of structures



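[Figure: three schematic kinds of structure: discrete structure (elements 1 and 2), continuous structure, and focal structure (elements 1 and 2, labeled “focal point 1”, or anchor point, and “focal point 2”, plus an “outlier” and a “hybrid”)]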

26

A possible analogy:

neuronal structure with synapses

27

Examples



              focal point 1                       focal point 2
Syntagm.      det            [c]                  sk
Paradigm.     v              w                    u
Diachr.       wēod           (widow’s) weed       wǣd(e)
Lg.contact    Old Norse      Norwegian            Low German

etc., etc.

28

Caveat


The claim about non-discrete boundaries should not be overstated

Phonemes, words, clauses, and languages do exist

They are just not as discrete and segmental as we apparently want them to be

We should not replace the discrete structure with the idea of a mere continuum, basically non-structure

Cf. Goddard 2011: 233 defending the discrete character of meaning by dismissing the idea of a continuum or merging

Something like focal structure is in order as the major model of linguistic and cognitive “matter”

29

Peripheral status of non-discrete phenomena in linguistics

Are linguists unaware of the non-discreteness effects?


No, they are aware of them


“distinct but related”


But they tend to ignore them


Why?


I am not sure


But I suspect the answer is related to Kant’s well-known problem

30

Kant’s puzzle


The Critique of Pure Reason:
The role of observer, or
cognizer, crucially affects the knowledge of the world


“The schematism by which our understanding deals with the phenomenal world ... is a skill so deeply hidden in the human soul that we shall hardly guess the secret trick that Nature here employs.”


It is possible that the human analytical mind is digital, and it
wants its object of observation to be digital as well


In addition, standards of scientific thought have developed on
the basis of physical, rather than cognitive, reality


Physical reality is much more prone to the discrete approach


Compared to the physical world, in the case of language and other cognitive processes Kant’s problem is much more acute

because the mind here functions both as an observer and as an object of observation, so making the distinction between the two is difficult


31

A paradoxical state of affairs


Language is full of non-discrete phenomena

But our “digital” mind is biased towards discreteness

Perhaps, partly because of the scientific tradition based on segmentation and categorization (Aristotelian, “rational”, “left-hemispheric”, etc.)

It is like eyeglasses keeping only a part of the reality and filtering out the rest

Addressing the “analog” reality in its entirety is often perceived as pseudo-science, or quasi-science at best

Language is unknowable, a Ding an sich?

32

What to do?


We need to develop a more embracing linguistics and cognitive science that address non-discrete phenomena:

not as exceptions or periphery of language and cognition

but rather as their core

Can we outwit our mind?

Two suggestions towards this goal

1. Object of investigation: concentrate on obviously non-discrete communication channels, not so burdened with the tradition of discrete analysis

2. Methodology: new type of models


33

SUGGESTION 1: Look at communication
channels other than verbal


Explore gesticulation accompanying speech


Michael Tomasello (2009): in order to “understand how humans
communicate with one another using a language <…> we must
first understand how humans communicate with one another
using natural gestures”


I discuss a case study in “Reference in discourse” (2011)


Explore prosody


Sandro Kodzasov (2011): “there is a multitude of prosodic
techniques <...> defining the basic gestalts of our perception of
the world”


These communication channels are obviously less
discrete than the verbal code


So it may be a good idea to develop new theoretical
approaches on the basis of gesticulation and prosody,
then apply them to traditional, “segmental” language

34

Sentences


In written language, sentences are separated from each
other by dedicated punctuation marks


Is the notion of sentence applicable to spoken language?


cf. the “written language bias” (Linell 2005)


written language, inherently digital, hypnotizes people and makes
them think that language is generally discrete


“Is sentence viable?” (Kibrik 2008)


In brief, spoken Russian displays two major prosodic patterns:

“comma intonation” ( / , ): rising on the main accent of EDU

“period intonation” ( \ . ): final falling on the main accent of EDU

But also “falling comma intonation” ( \ , ): non-final falling

similar to comma intonation in terms of discourse semantics

formally similar to period intonation

35

What to do?


It appears that non-final falling is not as low as final falling

But the difference cannot be identified in absolute terms

Great variation (gender, individual)

What is final falling in one person can be non-final in another

Employ the speaker’s “prosodic portrait”

Final falling targets the bottom of the given speaker’s F0 range

Non-final falling targets a level several dozen Hz (several semitones) higher than the final falling in the given speaker
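
The idea of a speaker-relative "prosodic portrait" can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that the portrait is reduced to a single number (the bottom of the speaker's F0 range) and that a fall counts as final when its target lies within a hypothetical threshold of 2 semitones above that bottom. The function names and the threshold value are assumptions; only the semitone conversion (12 semitones per octave) is standard.

```python
import math


def semitones_above(f0_hz: float, reference_hz: float) -> float:
    """Distance of f0_hz above reference_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f0_hz / reference_hz)


def classify_fall(target_hz: float, speaker_floor_hz: float,
                  threshold_st: float = 2.0) -> str:
    """Classify a falling accent relative to the speaker's own F0 floor.

    threshold_st is a hypothetical cut-off, not a value from the study.
    """
    distance = semitones_above(target_hz, speaker_floor_hz)
    return "final falling" if distance < threshold_st else "non-final falling"


# The same absolute target (110 Hz) for two speakers with different F0 floors
print(classify_fall(110.0, speaker_floor_hz=105.0))
print(classify_fall(110.0, speaker_floor_hz=80.0))
```

The same 110 Hz target is less than one semitone above a 105 Hz floor but more than five semitones above an 80 Hz floor, so it is classified differently for the two speakers, which is the point made above about absolute terms.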


36

F0 graph for an example

[Figure: F0 graph of the Russian original; recoverable word labels include \ozero, \malen’koe, \nebol’šoe, \brevno, kakoe-to, takoe, \mosta; numeric labels on the graph: 12, 10, 12, 5, 8]

There was a lake, /

either a river, /

or a lake, /

but I guess a lake, \

because somehow it was small, \

not a big one. \

And across it there was a log, \

like a bridge. \


37

Representation of EDU continuity types (or “phase” types) in corpus

Final falling: 33%
Non-final falling: 23%
(Non-final) rising: 44%

38

Sentences (continued)


There are clearly contrasted, focal patterns:

final falling (end)

rising (non-end)

Speakers and listeners usually “know” when a sentence is completed and when it is not

Spoken sentences are the prototype of written sentences

In addition, the hybrid type must be recognized: non-final falling

It can be identified on the basis of speakers’ prosodic portraits

This helps to deal with tremendous phonetic variation

With this analysis, the notion of spoken sentence remains viable


39

SUGGESTION 2:

Entertain another type of models


Methodological point


1960s: a fashion of “mathematical methods” in linguistics

That did not bear much fruit, primarily because of the non-discreteness effects

Time for another attempt at bringing in more useful kinds of mathematics



40

Ongoing project: Modeling
referential choice in discourse


When we mention a person/object, we choose from a set of options

proper name: Kant

description: the philosopher

reduced form: he

Corpus of Wall Street Journal texts

words: 45016, EDUs: 5497, anaphors: 3994

Annotation for multiple variables, candidate factors of referential choice

distances to antecedent

antecedent’s syntactic role

protagonisthood

animacy

..............

Machine learning algorithms

logical

logistic regression

compositions

Two-way task: full NP vs. pronoun

Three-way task: proper name vs. description vs. pronoun
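
To make the two-way task concrete, here is a small self-contained sketch of logistic regression for the choice between a full NP and a pronoun. It is not the project's model or data: the eight training rows are invented toy examples, the three features (distance to the antecedent in EDUs, whether the antecedent is a subject, animacy) are a hypothetical subset of the annotated factors, and the plain stochastic gradient descent stands in for whatever implementation the project used. The point is only that the model outputs a graded value between 0 and 1 rather than a categorical decision.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by per-sample gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the score
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def pronoun_probability(x, weights, bias) -> float:
    """Model's degree of certainty that the referent is pronominalized."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented toy data: (distance in EDUs, antecedent is subject, animate)
TOY_ROWS = [(1, 1, 1), (1, 1, 0), (2, 1, 1), (1, 0, 1),
            (5, 0, 0), (6, 0, 1), (4, 0, 0), (7, 1, 0)]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = pronoun, 0 = full NP

weights, bias = train_logistic(TOY_ROWS, TOY_LABELS)
for row in [(1, 1, 1), (3, 1, 1), (5, 0, 0)]:
    print(row, round(pronoun_probability(row, weights, bias), 3))
```

Intermediate inputs such as a distance of 3 EDUs, which the toy data never shows, receive values that are neither close to 0 nor close to 1, the kind of output one would expect where both referential options are acceptable.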


41

Results of machine learning
modeling

42

Non-categorical referential choice

100% accuracy cannot be reached

The choice is not always deterministic:

often only one option is appropriate

sometimes both Kant and he are appropriate


Experiment (Mariya Khudyakova)


Nine texts in which the algorithms deviated in their
prediction compared to the original referential choice:
pronoun instead of a proper name


Each text was presented to 60 experiment participants, in
one of the two variations: original (proper name) and
altered (pronoun)


Questions testing the understanding of the referent in
question


43

Non-categorical referential choice (continued)


In seven texts out of nine, accuracy of answers to
pronouns was the same as in answers to proper names


In these instances the algorithm correctly predicted a pronoun,
even though deviating from the original referential choice


In two instances participants showed a significant drop
in their accuracy


In these instances the algorithms erred in their prediction


Logistic regression provides the degree of certainty in prediction, which can be, with due caution, interpreted as probability

In one more instance the algorithm showed too high a certainty of prediction (0.89), which should not be the case given that the original choice was different

We are working on the improvement of the method (Kibrik et al. ms. 2012)



44

New type of models
(continued)


Non-categorical referential choice: a hybrid between the clear, focal instances

Probabilistic modeling and machine learning techniques can be used to simulate human behavior in non-categorical situations


We need to employ mathematical methods
appropriate for the “cognitive matter”


45

Conclusion


As soon as we invoke scientific thinking, we tend to immediately turn to discrete analysis

This may be the reason why discrete linguistics is so popular, in spite of the omnipresence and obviousness of non-discrete effects

This may be our inherent bias, or a habit developed in natural sciences, or a cultural preference

But in the case of language and other cognitive processes we do see the limits of the traditional discrete approach

It remains an open question whether linguists and cognitive scientists will eventually be able to overcome the strong bias towards “pure reason” and discrete analysis, or language will remain a Ding an sich

But it is worth trying to circumvent this bias and to seriously explore the focal, non-discrete structure that is at the very core of language and cognition


46

Thanks for your attention


CONGENIAL QUOTATIONS




Unfortunately, or luckily, no language is tyrannically
consistent. All grammars leak.


(Sapir 1921: 38)



“Words as well as the world itself display the ‘orderly
heterogeneity’ which characterizes language as a whole”
(Labov 1973: 30)



“The mind-brain is both modular and interconnected <...> To insist on one to the exclusion of the other is to short-change the enormous complexity of this quintessentially hybrid system” (Givón 1999: 107-108)


47

References


Alexandrov, Yuri I., and Natalia L. Alexandrova. 2010. Komplementarnost’ kul’tur. In: M.A.Kozlova (ed.) Ot sobytija k bytiju. Moscow: Izd. dom VShE, 298-335.

Alexandrov, Yuri I., and Elena A. Sergienko. 2003. Psixologicheskoe i fiziologicheskoe: kontinual’nost’ i/ili diskretnost’? Psixologicheskij zhurnal 24.6, 98-109.

Brinton, Laurel J., and Donna Brinton. 2010. The linguistic structure of modern English. Amsterdam: Benjamins.

Chafe, W. 1994. Discourse, consciousness, and time. Chicago: University of Chicago Press.

Dahl, Östen. 2001. The origin of the Scandinavian languages. In: Dahl, Östen, and Maria Koptjevskaja-Tamm (eds.) The Circum-Baltic languages. Typology and contact. Vol. 1. Amsterdam: Benjamins, 215-236.

Engwall, Olov. 2000. Dynamical aspects of coarticulation in Swedish fricatives: a combined EMA & EPG study. TMH-QPSR 4/2000.

Givón, T. 1999. Generativity and variation: The notion ‘Rule of grammar’ revisited. In: B.MacWhinney (ed.) The emergence of language. Mahwah: Erlbaum, 81-114.

Goddard, Cliff. 2011. Semantic analysis: A practical introduction. Oxford: OUP.

Hock, Hans Henrich, and Brian Joseph. 1996. Language history, language change, and language relationship. Berlin: Mouton de Gruyter.

Iwasaki, S., and H.-Y. Tao. 1993. A comparative study of the structure of the intonation unit in English, Japanese, and Mandarin Chinese. Paper presented at the annual meeting of LSA.


48

References (continued)


Jurafsky, Daniel, Alan Bell, Eric Fosler-Lussier, Cynthia Girand, and William Raymond. 1998. Reduction of English function words in Switchboard. In: Proceedings of ICSLP-98, Sydney.

Kibrik, A.A. 2008. Est’ li predlozhenie v ustnoj rechi? In: A.V.Arxipov et al. (eds.) Fonetika i nefonetika. Moscow: JaSK, 104-115.

Kibrik, A.A. 2011. Reference in discourse. Oxford.

Kibrik, A.A. 2012. Prosody and local discourse structure in a polysynthetic language.

Kibrik, A.A., and V.I. Podlesskaya (eds.) 2009. Rasskazy o snovidenijax: Korpusnoe issledovanie ustnogo russkogo diskursa [Night Dream Stories: A corpus study of spoken Russian discourse]. Moscow: JaSK.

Koval, A.I. 2000. Morfemika Pulaar-Fulfulde [Formal morphology of Pulaar-Fulfulde]. In: V.A.Vinogradov (ed.) Osnovy afrikanskogo jazykoznanija. Morfemika. Moscow: Vost. literatura, 103-290.

Labouret, Henri. 1952. La langue des Peuls ou Foulbé. Dakar: IFAN.

Labov, William. 1973. The boundaries of words and their meanings. In: R. Fasold (ed.) Variation in the form and use of language. Georgetown University Press, 29-62.

Linell, P. 1982. The written language bias in linguistics. Linköping, Sweden: University of Linköping.

Matsumoto, K. 2000. Japanese intonation units and syntactic structure. Studies in Language 24: 525-564.

Trudgill, Peter. 2011. Sociolinguistic typology: Social determinants of linguistic complexity. Oxford: Oxford University Press.

Wouk, F. 2008. The syntax of intonation units in Sasak. Studies in Language 32: 137-162.



49

Acknowledgements

Yuri Alexandrov

Mira Bergelson

Svetlana Burlak

Olga Fedorova



Vera Podlesskaya

Natalia Slioussar

Valery Solovyev