


MSc Human Language Technology

Representation of Understanding for Machine Translation




Author: James Christian Read

Supervisor: Yorick Wilks

9/2/2008




Signed Declaration

All sentences or passages quoted in this dissertation from other people's work have been specifically acknowledged by clear cross-referencing to author, work and page(s). Any illustrations which are not the work of the author of this dissertation have been used with the explicit permission of the originator and are specifically acknowledged. I understand that failure to do this amounts to plagiarism and will be considered grounds for failure in this dissertation and the degree examination as a whole.


Name: James Christian Read

Signature:

Date:




Abstract

This dissertation investigates the creation of an intermediate representation of understanding suited to the multilingual translation needs of the European Parliament. Machine translation can be reduced to the task of replacing source language strings from the source text with target language strings to produce a target text.

Replacing which strings produces the best translation?

A source text can be tokenised in a number of different ways: into words, n-grams, phrases or sentences. This dissertation investigates which of these tokens represents the optimal multilingual unit of translation, judged by the trade-off between two desirable qualities: low multilingual translation ambiguity and high reusability.

Experimentation showed that sentences are the lowest-level unambiguous unit of translation, but that they offer low coverage. Words and short n-grams were, in general, found to be too ambiguous to be usable. It is shown that the use of sentence templates and phrase templates can increase this coverage while preserving low ambiguity. A language independent intermediate representation is presented which uses a mix of sentences, sentence templates and phrase templates as the units of translation, aiming for maximum coverage without introducing ambiguity. Consideration is given to the feasibility of automatically learning a translation dictionary of these optimal units of translation. The conclusion is optimistic.





Acknowledgements

Many thanks to my mentor and supervisor Professor Yorick Wilks, who has been a constant source of inspiration throughout my journey from casual linguist to computational linguist. Without him this project would never have been possible.

Many thanks also to Professor Robert Gaizauskas and to Dr. Mark Hepple, who have given me a firm educational basis in methods of machine learning, natural language processing and text processing, the very skills I needed to turn my linguistic theories into computational experiments on real data. Learning Perl made text processing much easier, and some background knowledge in machine learning was an inspiration for my understanding of how alignments can be learned automatically.

Finally, many thanks to all the members of the Department of Computer Science of the University of Sheffield for their support and guidance in the art of technical writing.





Table of Contents

Signed Declaration
Abstract
Acknowledgements
Table of Contents
Chapter 1 - Introduction
1.1 Multilingual Machine Translation
1.1.1 Why Multilingual Machine Translation?
1.1.2 The Proceedings of the European Parliament
1.1.3 The Europarl Corpus
1.2 Project Aims & Objectives
1.3 Dissertation Overview
Chapter 2 - Theory
2.1 Introduction
2.2 Multilingual Machine Translation
2.2.1 Advantage of Using an Intermediate Representation
2.2.2 Interlingual Intermediate Representations
2.3 Machine Translation & Units of Translation
2.3.1 Machine Translation as the Operation of String Replacement
2.3.2 Units of Translation
2.3.3 Multilingual Units of Translation
2.4 Theory of Language & Translation
Chapter 3 - Units of Translation Literature Review
3.1 Introduction
3.2 Word Driven Machine Translation
3.2.1 Problems with Using Words
3.2.2 Word for Word Translation
3.2.2 Words & Rules - Rule Based Machine Translation
3.2.3 Statistical Machine Translation & Unigrams
3.2.4 Final Words on the Single Word as the Unit of Translation
3.3 N-Gram Driven Machine Translation
3.3.1 Statistical Machine Translation & N-Grams
3.3.2 N-gram Coverage & The Problem of Data Sparsity
3.3.4 Closing Words on N-grams as the Unit of Translation
3.4 Phrase Driven Machine Translation
3.4.1 Phrases
3.4.2 SYSTRAN & Phrases
3.4.3 Statistical Machine Translation & Phrases
3.4.4 Conclusions on Phrases
3.5 Sentence Driven Machine Translation
3.5.1 Early Observations on Sentence Driven Machine Translation
3.5.2 Sentences and Example Based Machine Translation
3.5.2 Conclusions on Sentences
3.6 Template Driven Machine Translation
3.6.1 Templates & The Stanford System
3.6.2 Templates & Example Based Machine Translation
3.6.3 Conclusions on Templates
3.7 Knowledge & Knowledge Based Machine Translation
3.7.1 The Deficiency of Sentences as Units of Translation
3.7.2 KANT & Knowledge Based Machine Translation
3.8 Conclusions Drawn from Literature Reviewed
Chapter 4 - Experimentation with Units of Translation
4.1 Introduction
4.2 Potential Reusability Experiments
4.2.1 Gathering Word Statistics
4.2.2 Gathering Bigram Statistics
4.2.3 Gathering Trigram Statistics
4.2.4 Gathering 4-gram Statistics
4.2.5 Gathering 5-gram Statistics
4.2.6 Estimating Phrase Statistics
4.2.7 Gathering Sentence Statistics
4.2.8 Some Phrase Template Statistics
4.2.9 Some Sentence/Full Clause Template Statistics
4.2.10 Potential Reusability Comparison of the Various Units
4.3 Translation Ambiguity Experiments
4.3.1 Word Translation Ambiguity Experiments
4.3.2 Bigram Translation Ambiguity Experiments
4.3.3 Trigram Translation Ambiguity Experiments
4.3.4 Phrase Translation Ambiguity Experiments
4.3.5 Phrase Template Translation Ambiguity Experiments
4.3.6 Sentence Translation Ambiguity Experiments
4.3.7 Sentence Template Translation Ambiguity Experiments
4.3.8 Unit Translation Ambiguity Comparison
4.4 Translation Experiments
4.4.1 Word Driven Translation Experiments
4.4.2 Bigram Driven Translation Experiments
4.4.3 Trigram Driven Translation Experiments
4.4.4 Phrase Driven Translation Experiments
4.4.5 Full Clause Driven Translation Experiments
4.4.6 Sentence Driven Translation Experiments
4.4.7 Template Driven Translation Experiments
4.4.8 Comparison of Translation Experiments
Chapter 5 - Analysis
5.1 Introduction
5.2 Analysis of Reusability Experiments
5.3 Analysis of Translation Ambiguity Experiments
5.4 Analysis of Translation Experiments
5.5 Analysis Conclusions
Chapter 6 - Intermediate Representation
6.1 Introduction
6.2 Motivation for Intermediate Representation
6.3 Interlingual Representations
6.4 Intermediate Representation Design
6.4.1 Definition of Language Independent Intermediate Representation
6.4.2 Translation as Encryption / Decryption Operation
6.4.3 Translation (Encryption / Decryption) as Number Conversion Operation
6.4.4 Optimal Representation of Understanding
6.4.5 The Intermediate Representation Design
Chapter 7 - Aligning Units of Translation
7.1 Introduction
7.2 Bilingual Alignment of Europarl Corpus
7.3 Multilingual Alignment by Hand
7.4 Automatic Multilingual Alignment of the Corpus
Chapter 8 - Conclusion
8.1 The Optimal Unit of Translation
8.2 Language Independent Intermediate Representation
8.3 Multilingual Translation Dictionary
References
Appendix A - Glossary




Chapter 1


Introduction

1.1 Multilingual Machine Translation


1.1.1 Why Multilingual Machine Translation?

Bilingual translation is the translation of a document from one language, the source language, into another language, the target language. Multilingual translation is the translation of a document from one language, the source language, into multiple languages, the target languages.

Many organisations need multilingual translation. Product manufacturers want to advertise their wares and provide information about them in many nations. Webmasters want to reach a wider audience by displaying their web content in many languages. Pharmaceutical companies need (often by law) to supply instructions, contraindications and information about possible side effects in the native language of each of the countries where their medicines are distributed.

Translating documents into many languages has an economic cost: professional human translators must be hired and paid to ensure the job is done to a high enough standard. The prospect of translating into multiple languages automatically is therefore an inviting one for many businesses and organisations, because of the savings in cost and time it promises. One organisation with growing needs for multilingual machine translation is the European Parliament.

1.1.2 The Proceedings of the European Parliament

The European Union (EU) has an ‘equality of language’ policy. This means that the Proceedings of the European Parliament must be translated into all 23 (and growing) official languages of the EU’s Member States (Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish). That gives 253 language pairs and, since each pair can be translated in either direction, 506 language pair directions.
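Both counts come from elementary combinatorics. n languages give n(n-1)/2 unordered pairs and n(n-1) ordered directions. As an illustrative check that is not part of the original work, the sketch below counts them directly from the 23 languages listed above using Python's `itertools`:

```python
from itertools import combinations, permutations

# The 23 official EU languages listed in the text.
EU_LANGUAGES = [
    "Bulgarian", "Czech", "Danish", "Dutch", "English", "Estonian",
    "Finnish", "French", "German", "Greek", "Hungarian", "Irish",
    "Italian", "Latvian", "Lithuanian", "Maltese", "Polish", "Portuguese",
    "Romanian", "Slovak", "Slovene", "Spanish", "Swedish",
]


def count_pairs(languages):
    """Return (unordered language pairs, ordered translation directions)."""
    pairs = sum(1 for _ in combinations(languages, 2))
    directions = sum(1 for _ in permutations(languages, 2))
    return pairs, directions


print(count_pairs(EU_LANGUAGES))  # (253, 506)
```

Applied to the 11 languages of the Europarl corpus, the same function gives 55 pairs and 110 directions.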

1.1.3 The Europarl Corpus

The Proceedings of the European Parliament document the discussions held at plenary sessions between Members of the European Parliament. Most of the discussions were originally held in English, with the exception of the contributions of a few Members who prefer to express themselves in their first language. All of the discussions from 15th April 1996 to 12th October 2006 were translated into the then 11 official languages of the European Union (English, Italian, Spanish, Portuguese, French, German, Danish, Swedish, Dutch, Greek and Finnish).

Philipp Koehn, of the Institute for Communicating and Collaborative Systems at the University of Edinburgh, recognised the value of the data in the Proceedings of the European Parliament back in 2002 (Koehn, 2002, 2005). He described the value of this corpus in these words:


In the following, we will describe in detail the acquisition of the Europarl corpus of European Parliament Proceedings for the website of the European Parliament. The proceedings are published in all the 11 official languages of the European Union. This means that we can not only extract a conventional parallel corpus, but a multilingual corpus of 11 languages, or 10 parallel corpora [of] any of the languages.

(Koehn, 2002)

The Europarl corpus, built by Koehn and now in version 3 (herein denoted as ‘the corpus’), can be downloaded from the Europarl corpus website (http://www.statmt.org/europarl/). The multilingual corpus is divided into 11 directories (one for each language) of 658 texts, each text representing the date of a meeting. Each text is marked up with XML to indicate chapter (dividing major discussions) and speaker (who spoke the following words). Most of the original text was uttered in English, and the other ten language versions are largely translations from English.



This dissertation makes use of this multilingual corpus to investigate the possibility of creating an intermediate representation of understanding suited to the European Parliament’s need to translate its proceedings automatically into many languages.

1.2 Project Aims & Objectives

The general aim of this project is to:

- Design an intermediate representation of understanding that encodes enough information for multilingual machine translation purposes

The specific goals of this project are to:

- Investigate the various units of translation exhibited in version 3 of the Europarl corpus
- Investigate which combination of units best captures all the information necessary for the European Parliament to translate its proceedings automatically into many languages
- Design an intermediate representation for the future multilingual translation needs of the European Parliament in automatically translating its proceedings
- Design the intermediate representation so that it requires minimal analysis logic to read and minimal generation logic to produce

1.3 Dissertation Overview

The main focus of the investigation in this dissertation is the unit of translation. Every MT paradigm is limited by the qualities of its chosen units of translation. The structure of this dissertation is therefore built not around any particular MT paradigm but around the search for the optimal unit of translation, one which can be used to build an intermediate representation of understanding well suited to the multilingual machine translation needs of the Proceedings of the European Parliament.

Chapter 2 - Theory provides the theoretical grounding for the entire investigation. The advantages of using an intermediate representation of understanding are discussed, and a theoretical overview of the candidate units of translation is presented.

Chapter 3 - Units of Translation Literature Review presents a literature-based review of the way various units of translation have been used in different MT architectures. As the focus of this chapter is what research shows about the qualities of the various units of translation, the chapter is organised around the units of translation themselves rather than around specific architectures or paradigms. High reusability and low translation ambiguity are considered to be the desirable qualities of a good unit of translation.

Chapter 4 - Experimentation with Units of Translation details experiments conducted with the various possible units of translation (words, n-grams, linguistic phrases, full clauses, sentences and templates) in an attempt to find the optimal linguistic units for creating an intermediate representation of understanding. As it was not possible to test each unit fully, representative experiments were devised to reveal the characteristic qualities of each unit of translation quickly.

To test the quality of reusability, statistical analyses were carried out for each unit of translation to ascertain its potential reusability.

To measure the potential translation ambiguity of each unit, statistical translation alignment information was gathered from samples of each type of unit of translation. To measure the actual translation ambiguity of each unit, multilingual translation experiments were carried out to see how each unit could be used to resolve potential translation ambiguities.


Chapter 5 - Analysis analyses the results of the experiments in Chapter 4, and the experiments themselves.

The conclusion is drawn that, of all the tokens tested, only the sentence and the full clause offer the low level of ambiguity needed to produce high-quality translations. However, as sentences and full clauses are frequently unique and do not recur, a mixture of sentence templates and phrase template variables is presented as the optimal unit of translation.
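Here is a minimal sketch of a sentence template with a phrase variable. It uses one invented template and a two-entry phrase dictionary. The `{X}` slot syntax, the names and the example strings are assumptions for illustration, not the representation used in the dissertation. A sentence is covered only if it matches the template and its slot filler is in the phrase dictionary:

```python
import re

# Hypothetical sentence template with one phrase variable, {X}, in three languages.
SENTENCE_TEMPLATE = {
    "en": "I would like to thank {X}.",
    "fr": "Je voudrais remercier {X}.",
    "de": "Ich möchte {X} danken.",
}

# Hypothetical phrase-template dictionary for fillers of {X}. The German
# fillers are in the dative case that this particular template requires.
PHRASES = {
    "the rapporteur": {"fr": "le rapporteur", "de": "dem Berichterstatter"},
    "the Commission": {"fr": "la Commission", "de": "der Kommission"},
}


def template_pattern(template):
    """Compile a template into a regex with a named group for the {X} slot."""
    before, after = template.split("{X}")
    return re.compile("^" + re.escape(before) + "(?P<X>.+)" + re.escape(after) + "$")


EN_PATTERN = template_pattern(SENTENCE_TEMPLATE["en"])


def translate_with_template(sentence, target):
    """Translate an English sentence covered by the template, else return None."""
    match = EN_PATTERN.match(sentence)
    if match is None:
        return None  # sentence not covered by this template
    filler = PHRASES.get(match.group("X"))
    if filler is None:
        return None  # phrase variable not in the phrase dictionary
    return SENTENCE_TEMPLATE[target].replace("{X}", filler[target])


print(translate_with_template("I would like to thank the rapporteur.", "de"))
# Ich möchte dem Berichterstatter danken.
```

One template plus n phrase entries covers n sentences, which is the coverage gain templates are meant to provide. The German fillers also show that a phrase's translation can depend on the template it fills. This suggests that phrase entries may need to be tied to the templates they fill.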

Chapter 6 discusses methodologies that have been used to form an interlingual intermediate representation of understanding. It gives an in-depth consideration of what a language independent representation is (in terms of technology) and what its design goals should be. Reasoning is presented to demonstrate that a numerical key is the optimal form of intermediate representation, and the design of a language independent intermediate representation is illustrated with examples.
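As a toy illustration of why a numerical key can keep analysis and generation logic minimal, the sketch below stores each unit of meaning under an integer key with its surface form in each language. Analysis is then a reverse lookup and generation a forward lookup. The keys, languages and sentences are invented for illustration, and this is not the design presented in Chapter 6:

```python
# Hypothetical toy IR: each unit of meaning is an integer key mapped to its
# surface form in every supported language (invented examples).
IR_TABLE = {
    1001: {"en": "The sitting is closed.",
           "fr": "La séance est levée.",
           "de": "Die Sitzung ist geschlossen."},
    1002: {"en": "Thank you.", "fr": "Merci.", "de": "Danke."},
}

# Reverse index for analysis: (language, surface string) -> key.
ANALYSIS_INDEX = {
    (lang, text): key
    for key, forms in IR_TABLE.items()
    for lang, text in forms.items()
}


def analyse(text, source):
    """Analysis step: map a source-language unit to its IR key (None if unknown)."""
    return ANALYSIS_INDEX.get((source, text))


def generate(key, target):
    """Generation step: map an IR key to its surface form in the target language."""
    return IR_TABLE[key][target]


def translate(text, source, target):
    """Translate through the IR: one lookup in, one lookup out."""
    key = analyse(text, source)
    return None if key is None else generate(key, target)


print(translate("Thank you.", "en", "de"))  # Danke.
```

In this arrangement, supporting a new language means adding one more surface form per key. No existing analysis or generation code has to change.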


Chapter 7 considers, from a purely theoretical point of view, the building of a multilingual translation dictionary for the Proceedings of the European Parliament. An overview is given of the alignment tools that already exist, including the sentence alignment tools that ship with version 3 of the Europarl corpus and the use of the expectation maximisation (EM) algorithm in the GIZA++ tool to automatically learn bilingual word and phrase alignments. A theoretical consideration is given to extending bilingual alignment technology to multilingual counterparts, and to the plausibility of aligning sentence templates automatically or by hand.

Chapter 8 presents the conclusions drawn in light of the whole body of experimentation with various units of translation and of the investigation into building a multilingual translation dictionary. It was concluded that sentence templates with phrase variables present the optimal unit of translation, capturing the best blend of low translation ambiguity and high reusability. It was further concluded, optimistically, that the optimal mix of sentence templates and phrase templates, offering maximal coverage while maintaining low ambiguity, could possibly be learned automatically in a massively parallel environment.


The key to reading and understanding this dissertation is that it is centred around the unit of translation and not around any particular MT paradigm. The MT paradigm is a consequence of the selection of the unit of translation.

Chapter 2


Theory

2.1 Introduction

The purpose of this chapter is to present the underlying theory that motivates the entire investigation. It includes:

- A discussion of the benefits of using an intermediate representation in a multilingual machine translation architecture
- The early history of proposals for interlingual representations
- A critical consideration of Bar-Hillel’s (1960) critique of interlingua, with reference to the interlingua proposals of his time and to the notion of interlingua perpetuated by Vauquois’ (1968) definition in association with his Machine Translation Pyramid
- A consideration of the benefit of using a language independent intermediate representation of understanding that requires neither heavy generation logic to create nor heavy analysis logic to read, and of how such a representation could invalidate Bar-Hillel’s objections
- A simplified definition of what machine translation is
- A theoretical consideration of the various possible units of translation and their predicted qualities and suitability
- A theory of language and of translation

These discussions, while highly theoretical in nature, lay the foundation for the rest of the dissertation.

2.2 Multilingual Machine Translation

2.2.1 Advantage of Using an Intermediate Representation

As noted in section 1.1.2, the EU has 23 official languages, which gives 253 language pairs and 506 language pair directions. Without an intermediate representation of understanding, 506 analysis modules and 506 generation modules would be needed (one of each for every language pair direction). With an intermediate representation common to all the supported languages, this number falls to only 23 analysis modules and 23 generation modules. Figure 2.1 illustrates the advantage of using an intermediate representation (IR) of understanding.



Figure 2.1

Figure 2.2 illustrates the ease of adding support for a new language when using an intermediate representation of understanding.





Figure 2.2

Booth (1958) generalised this kind of observation more formally by noting that translating between n languages without an intermediate representation requires the creation of n(n-1) programs. This observation was the driving force behind efforts to create interlingual machine translation architectures, starting in the 1950s.
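The sketch below follows the module counting used in section 2.2.1: one analysis and one generation module per translation direction without an IR, and one of each per language with an IR. It compares total module counts and the extra modules needed when a language is added, as in Figure 2.2. The function names are illustrative only:

```python
def modules_direct(n):
    """Analysis plus generation modules when each translation direction has its own pair."""
    directions = n * (n - 1)
    return 2 * directions


def modules_ir(n):
    """Analysis plus generation modules when all n languages share one IR."""
    return 2 * n


def cost_of_adding_language(n):
    """Extra modules needed to grow from n to n + 1 languages: (direct, via IR)."""
    return (modules_direct(n + 1) - modules_direct(n),
            modules_ir(n + 1) - modules_ir(n))


print(modules_direct(23), modules_ir(23))  # 1012 46
print(cost_of_adding_language(23))         # (92, 2)
```

Under this counting, the cost of adding a language without an IR grows with the number of languages already supported (4n extra modules). With an IR it stays constant at two.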

2.2.2 Interlingual Intermediate Representations

The earliest known references to a form of language independent representation date back to the 17th century. Leibniz (1677) suggested that thought is mathematical. He theorised a universal concept language, the ‘characteristica universalis’, which is independent of character representations. A key quotation of his reasoning along these lines is:


It is obvious that if we could find characters or signs suited for expressing all our thoughts as
clearly and as exactly as ar
ithmetic expresses numbers or geometry expresses lines, we could
do in all matters insofar as they are subject to reasoning all that we can do in arithmetic and
geometry. For all investigations which depend on reasoning would be carried out by
transposing
these characters and by a species of calculus.


(Leibniz
,

1677)

In the 1950s theoretical work began on using a universal representation of understanding as an intermediate representation, giving birth to the interlingual approach to machine translation (Gode, 1955; Rhodes, 1956; Richens, 1956; Masterman, 1957). This work was largely fuelled by the observation, already mentioned, that the use of an interlingua reduces the problem of creating n(n-1) programs to the problem of creating 2n. Bar-Hillel (1960) disputed the value of this observation in these words:




The fallaciousness of this argument is immediately obvious, however, as soon as one realizes that using one, any one, of the original n languages as a mediating language would reduce the number of programs even more, namely to 2(n-1)

(Bar-Hillel, 1960)
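Bar-Hillel's figure can be checked in the same way. The sketch below is a hypothetical illustration. It assumes the pivot language needs no program into or out of itself, and it counts both the programs a pivot architecture needs and how many of them are chained for a single translation job. The function names and language codes are placeholders.

```python
def pivot_programs(n: int) -> int:
    """Each of the n - 1 non-pivot languages needs one program into the pivot and one out of it."""
    return 2 * (n - 1)


def chained_steps(source: str, target: str, pivot: str) -> int:
    """Programs run in sequence for one job: none, one (pivot at either end) or two."""
    if source == target:
        return 0
    return 1 if pivot in (source, target) else 2


print(pivot_programs(23))               # 2 x 22 programs
print(chained_steps("it", "de", "en"))  # Italian -> English -> German
```

For 23 languages this gives 44 programs, two fewer than 2n = 46. However, any job between two non-pivot languages now chains two programs, each with its own analysis and generation.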

His underlying implication was that using an intermediate representation further complicates the translation process by requiring two analyses and two generations, rather than translating directly with one analysis and one generation step. Bar-Hillel’s objection may have merit for bilingual machine translation architectures. For multilingual architectures, however, it falls down, because it assumes that no conceivable intermediate representation could be read from and written to with little or no additional analysis and generation processing. In fact, he himself conceded:


This counterargument does not, of course, prove that the idea of using an interlingua for MT purposes is wrong as such, since other arguments might be brought forward in its support

(Bar-Hillel, 1960)

He further admitted the appeal of using an intermediate representation of understanding with reference to Leibniz’s characteristica universalis:


I admit that the idea of a "logical," unambiguous (in every respect, morphologically,
syntactically, and semantically) interlingua has its appeal today as had the related idea of a
characteristica universalis in the 17th and early 18th centuries


(Bar-Hillel, 1960)

His major objection to this appeal, in his own words, was:


Its fallacy lies in the assumption that "translation" from a natural language into a "logical" one is somehow simpler than translation from one natural language into another. This assumption, however, is totally unwarranted, whatever its appeal to someone with little direct experience with symbolic language systems

(Bar-Hillel, 1960)

This objection undeniably had merit, given his own personal experience with the symbolic languages of his time. However, his reasoning falls down in its assumption that no other kind of feasible intermediate representation exists that avoids the kind of complex processing he was referring to.

In order to understand Bar-Hillel’s objections more fully, they need to be put in the context of the (then) contemporary proposals for an interlingua. Perhaps the best visual summary of (then) contemporary ideas of interlingua is the Machine Translation Pyramid presented by Vauquois (1968), depicted in Figure 2.3 below.









[Figure 2.3: The Machine Translation Pyramid, with the levels Lexical Transfer, Syntactic Transfer, Semantic Transfer and Interlingua between the SOURCE and TARGET sides]



The essence of this pyramid is that by the time of Vauquois three distinct and well-defined basic approaches to machine translation had emerged:

1. Direct
2. Transfer
3. Interlingual

Hutchins and Somers (1992) provide a good overview of these three basic approaches in Chapter 4 of their book Introduction to Machine Translation. The following summarises the essence of their more detailed explanation.



The direct approach entailed no form of intermediate representation, merely the lexical transfer of words in the source text to words in the target text.

The transfer approach made use of an intermediate representation, but not a language-independent one. The representation of understanding is dependent on a specific language pair and is the result of a language-pair-specific syntactic analysis of the source language. The driving force of the transfer approach is therefore the process of syntactic transfer.

The interlingual approach involved the generation of a language-independent representation. This representation was the result of both a syntactic and a semantic analysis of the source text. Production of the target text was the result of both semantic and syntactic generation. The driving force of the whole process was therefore seen as one of semantic transfer.

For these reasons the generation of an interlingua was seen as the deepest possible analytical process. Vauquois’ (1968) Machine Translation Pyramid was therefore an apt illustration. It visually conveyed the uphill processing complexity of analysing the source text and generating the interlingua, followed by deep analysis of the interlingua and generation of the target text. It was this understanding of interlingua that Bar-Hillel was arguing against, although he himself admitted that in his time:


the terms "interlingua," "intermediate language," "mediating language" and their counterparts in Russian are being used in many different senses

(Bar-Hillel, 1960)

Having made this necessary concession, however, he went on to assert:


Natural languages, artificial languages of the Esperanto type, symbolic language-systems of the type treated by logicians, "algebraic" languages of various denominations, all have been suggested at one time or other as candidates for mediating languages. I believe that my present criticisms hold equally against each of these interpretations.

(Bar-Hillel, 1960)

Bar-Hillel’s objections to interlingual representation, therefore, were not categorically against the concept of a language-independent intermediate representation. They were against all the candidate representations he had been presented with up to that time. Had he been presented with an intermediate representation that did not require such expensive additional generation to produce it and analysis to read it, he might well have responded more positively.

The visual effect of a computationally inexpensive intermediate representation would be to take Vauquois’ (1969) Machine Translation Pyramid and squash it into an almost direct approach with an intermediate representation in between. See Figure 2.4, below, for a graphical representation.

In order to avoid confusion with Vauquois’ (1969) definition of the term interlingua (the result of syntactic and semantic analysis), the figure uses the term concept text for the intermediate representation.











[Figure 2.4: Flattened Pyramid, with Conceptual Transfer through a Concept Text between the SOURCE and TARGET sides]


Whether an intermediate representation counts as an interlingua should not depend on whether it was created through syntactic and semantic analysis, or on which data structures it contains. The definition is best summed up in the following words from a paper by Farwell, Guthrie and Wilks (1992):


Each individual language system is independent of all other language systems within ULTRA. Corresponding sentences in different languages must produce the same IR and any specific IR must generate corresponding sentences in the five languages.

(Farwell, Guthrie & Wilks, 1992)

As long as an intermediate representation satisfies this definition it is interlingual, irrespective of which data structures it has and how they were derived. However, to avoid confusion with Vauquois’ (1969) definition, an effort is made to avoid the term interlingua in this dissertation.

2.3 Machine Translation & Units of Translation

2.3.1 Machine Translation as the Operation of String Replacement

There are many different machine translation (MT) paradigms of varying complexity. What they all have in common is that they can be reduced to the following process:



The replacement of strings in the source language from the source text with strings in the target language to build the target text.

A source text can be cut up into strings in a seemingly infinite number of ways and combinations. The most fundamental design decision for any MT architecture is which strings, of what length, should be replaced. That is to say, what is the optimal unit of translation?
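The string-replacement view can be made concrete with a minimal sketch. It scans the source text and, at each position, replaces the longest string found in a small bilingual table. The greedy longest-match strategy, the table entries and the function name are illustrative assumptions, not a description of any particular MT system. The point is that choosing the table keys (words, n-grams or whole sentences) is exactly the choice of unit of translation.

```python
def replace_longest_first(source: str, table: dict[str, str]) -> str:
    """Translate by greedily replacing the longest matching source string at each position.

    Source tokens with no table entry are copied through unchanged.
    """
    tokens = source.split()
    max_len = max(len(key.split()) for key in table)
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate unit first, shrinking until a table entry matches.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + length])
            if candidate in table:
                output.append(table[candidate])
                i += length
                break
        else:
            output.append(tokens[i])  # unknown token: copy it unchanged
            i += 1
    return " ".join(output)


# Hypothetical English-Italian entries at two different units of translation.
TABLE = {
    "I like oranges": "Mi piacciono le arance",  # sentence-level unit
    "oranges": "arance",                          # word-level units
    "I": "io",
    "like": "come",
}

print(replace_longest_first("I like oranges", TABLE))
print(replace_longest_first("like oranges", TABLE))
```

With the whole sentence in the table, ‘I like oranges’ becomes ‘Mi piacciono le arance’. With only word-level entries available, ‘like oranges’ becomes ‘come arance’, because the word-level unit ‘like’ has been given the wrong sense.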

The next subsection introduces the subject of units of translation and puts forward a hypothesis about the qualities each unit will display.


2.3.2 Units of Translation

The Europarl corpus exhibits a number of levels of textual tokens, each offering different features and implying different drawbacks. In summary, these units can be modelled as:

1. The entire document collection
2. Individual documents
3. Chapters / Speakers / Paragraphs
4. Sentences / Full clauses
5. Clauses (main clauses, relative clauses)
6. Phrases (linguistic phrases / n-grams)
7. Words
8. Morphemes



These tokens can be visualised as a kind of linguistic ladder, with the document collection at the top and morphemes at the bottom. There are clear advantages and disadvantages implied by moving up or down the ladder.

The major advantage of using the document collection as a unit of translation is that it contains all of its own contextual (pragmatic and semantic) information. As such, it is a token that is unique in meaning and extremely low in ambiguity. The major disadvantage is its extremely low frequency of natural occurrence (only once up until now). This makes it useless as a reusable unit of translation, since it is unlikely ever to occur again in the Proceedings of the European Parliament.

At the other extreme, consider the advantages and disadvantages of using the morpheme as a unit of translation. The major advantage of morphemes is that they recur with great frequency within the corpus. It can therefore be reliably predicted that they will occur again, with similar frequency, in the future translation needs of the European Parliament. Their inherent disadvantage is that they carry very little meaning, and that meaning is largely determined by their immediate context. This makes them the most ambiguous unit in the linguistic ladder modelled.

Moving up the linguistic ladder implies a reduction in ambiguity, making the unit more reliably translatable. Moving down the linguistic ladder, the unit becomes more frequent, implying clear advantages of reusability and large coverage for a machine translation design. Much of the experimentation in this dissertation centres on the search for the unit of translation that exhibits the optimal balance between low translation ambiguity and high reusability.

The level of translation ambiguity refers to the number of possible translations a token can legitimately have. For any given token, this will vary with the number of target languages. Synonymous translations are not considered to contribute to ambiguity: selecting a synonymous translation is a question of style, not of translational accuracy. Reusability refers to how often a translation token can be expected to be reused.
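The two measures can be sketched for a single target language as follows. Ambiguity is counted as the number of distinct translations observed for a unit once synonyms are merged, and reusability as the number of times the unit occurs. The observations and synonym table are hypothetical and built from English-Italian examples used elsewhere in this dissertation. With several target languages, the same counts would be made for each language.

```python
from collections import defaultdict


def unit_statistics(observations, synonyms):
    """Return {unit: (ambiguity, reusability)} from (unit, translation) observations.

    ambiguity   = number of distinct translations after merging synonyms
    reusability = number of times the unit was observed
    """
    translations = defaultdict(set)
    counts = defaultdict(int)
    for unit, translation in observations:
        # Map each translation to a canonical synonym so stylistic variants do not count.
        translations[unit].add(synonyms.get(translation, translation))
        counts[unit] += 1
    return {unit: (len(translations[unit]), counts[unit]) for unit in counts}


# Hypothetical English -> Italian observations.
OBSERVATIONS = [
    ("like", "piacere"), ("like", "come"), ("like", "piacere"),
    ("really", "davvero"), ("really", "veramente"), ("really", "davvero"),
    ("I like oranges", "Mi piacciono le arance"),
]
SYNONYMS = {"veramente": "davvero"}  # treated as stylistic variants of one translation

for unit, (ambiguity, reuse) in unit_statistics(OBSERVATIONS, SYNONYMS).items():
    print(f"{unit!r}: ambiguity={ambiguity}, reusability={reuse}")
```

Here ‘like’ is frequent but ambiguous, ‘really’ is frequent and (once synonyms are merged) unambiguous, and the whole sentence is unambiguous but seen only once. This is the trade-off described above in miniature.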

2.3.3 Multilingual Units of Translation

One important consideration when selecting the optimal unit of translation is whether or not the unit is suitable for an intermediate representation. The acid test for whether a string suits a language-independent intermediate representation is whether it consistently translates into the same string in each supported target language. The answer largely depends on which languages are the intended target languages. A string may translate consistently into closely related languages, such as the Romance languages (Italian, Spanish, Portuguese and French), yet produce a range of different translations in other languages (e.g. the Germanic languages: German, Danish, Swedish and Dutch).

For example, consider the English string ‘I love you’. On the surface, the string may seem like a great candidate for multilingual translation. However, in English the string is used with a number of different senses (platonic love, maternal love, romantic love). When translating this string into Italian, the intended sense is important. If the intended sense is romantic love between a couple, the translation is ‘Ti amo’. However, if the sense is that of maternal love from a mother to her child, the more likely translation is ‘Ti voglio bene’.
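The acid test described above can be expressed as a check over observed translations: a string qualifies only if every supported target language shows exactly one translation for it, once synonyms have been merged. The function name and observation sets are hypothetical. Only the Italian renderings of ‘I love you’ come from the example above, and ‘Grazie’ for ‘Thank you’ is added purely for contrast.

```python
def is_multilingual_unit(observed: dict[str, set[str]], target_languages: list[str]) -> bool:
    """Acid test: exactly one translation in each supported target language.

    A language with no observed translation fails the test, since consistency
    cannot be shown for it.
    """
    return all(len(observed.get(lang, set())) == 1 for lang in target_languages)


# Hypothetical observed translations, keyed by language code.
I_LOVE_YOU = {"it": {"Ti amo", "Ti voglio bene"}}
THANK_YOU = {"it": {"Grazie"}}

print(is_multilingual_unit(I_LOVE_YOU, ["it"]))  # two senses, two translations
print(is_multilingual_unit(THANK_YOU, ["it"]))
```

Passing a target list that includes a language with no observations (for example ["it", "de"]) also returns False. This reflects the point that suitability depends on the set of target languages.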

2.4 Theory of Language & Translation

A surface analysis of language may reduce language to words and grammar. However, the purpose of language is functional, and the structures used to communicate those functions are merely an agreed protocol that facilitates effective communication. A purely grammatical analysis of language therefore falls short of the mark and is not fully suited to the task of translation. Such an analysis can reduce many sentences to subject, verb and object (English order). It may also prompt the observation that this order varies from language to language, or that order matters less in inflected languages, where the subject and object are marked by their inflection rather than by their position in the sentence.

In many cases, satisfactory translations can be produced simply by detecting the subject, verb and object in a source sentence and transforming them into the corresponding subject, verb and object in the accepted order (where order matters) of the target language. But in many cases this does not work. Consider the English sentence ‘I like oranges’ and the equivalent Italian sentence ‘Mi piacciono le arance’, which describes exactly the same concept. While ‘oranges’ is the object in English, ‘le arance’ is the subject in Italian. In fact, the Italian sentence has no direct object, only the indirect object ‘Mi’ (English ‘to me’). Exactly the same concept is expressed with completely different grammatical structures.

The driving force behind the selection of the correct translation is one of functionality. Detecting the English function, expressing a personal food preference, prompts the human translator to select the Italian structure that expresses the same function.

Consider the English sentence ‘I really really like oranges’. The double use of ‘really’ serves to emphasise the personal preference. The literal translation of English ‘really’ is Italian ‘davvero’ or ‘veramente’. However, the Italian translation of this function is ‘Le arance mi piacciono da morire’, where Italian ‘da morire’ literally means ‘to die’. A further example is the English sentence ‘I love oranges’, whose function may be translated into Italian as ‘Le arance mi fanno impazzire’ (literally ‘Oranges make me go crazy’). In these examples the English words and the English grammar do little to help select the correct Italian translation. However, their specific arrangement in a structure indicative of a specific function makes the disambiguation routine.
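The idea that a structure signals a function can be sketched as template matching. Instead of translating word by word, the whole English structure is matched and mapped to the Italian structure for the same function. The templates, the noun lexicon and the function name are hypothetical and cover only the examples above. They also assume plural nouns, so that the Italian verb agreement (piacciono, fanno) holds.

```python
import re

# Hypothetical function templates, most specific first. Each maps an English
# structure signalling a function (a food preference, with or without emphasis)
# to the Italian structure that expresses the same function.
TEMPLATES = [
    (re.compile(r"I really really like (\w+)"), "{np} mi piacciono da morire"),
    (re.compile(r"I love (\w+)"), "{np} mi fanno impazzire"),
    (re.compile(r"I like (\w+)"), "Mi piacciono {np}"),
]

# Hypothetical lexicon of plural noun phrases (the templates assume plural agreement).
NOUNS = {"oranges": "le arance"}


def translate_by_function(sentence: str) -> str:
    """Return the Italian rendering of the first template whose structure matches."""
    for pattern, italian in TEMPLATES:
        match = pattern.fullmatch(sentence)
        if match:
            result = italian.format(np=NOUNS[match.group(1)])
            return result[0].upper() + result[1:]  # sentence-initial capital
    raise ValueError(f"no template matches {sentence!r}")


for english in ["I like oranges", "I really really like oranges", "I love oranges"]:
    print(english, "->", translate_by_function(english))
```

Ordering the templates from most to least specific ensures the emphatic structure is recognised before the plain preference. No individual English word is ever looked up as a unit of translation.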

Language, therefore, is better described as a series of functions than reduced to words and grammar. In fact, the functional quality of language is so strong that the usual use of words and the usual grammatical structures can be morphed by the power of functionality. Consider the following examples of real language in real contexts:

1. Nice one mum. This is sick. (Mum has just bought him the coolest snowboard on the market.)
2. Wait for you I will. (A best friend trying to sound like Yoda.)

MT researchers have observed in the past that language is functional, and that this has implications for machine translation. Consider the following words from Farwell, Guthrie and Wilks (1992):


what is universal about language is that it is used to perform acts of communication: asking questions, describing the world, expressing one's thoughts, getting people to do things, warning them not to do things, promising that things will get done and so on

(Farwell, Guthrie & Wilks, 1992)

It is therefore predicted that individual words will prove to be poor units of translation, while specific combinations of words with structures indicative of functions will prove better.

Chapter 3

Units of Translation Literature Review

3.1 Introduction

In this review, reference is made to a number of machine translation (MT) paradigms, including rule-based machine translation (RBMT), statistical machine translation (SMT) and knowledge-based machine translation (KBMT). However, the purpose of this review is not to show what past research reveals about the nature of each MT paradigm. Its purpose is to establish what past MT research shows about the nature of various units of translation.




The units considered in this literature review are:

1. words
2. n-grams
3. phrases
4. sentences
5. templates

Single words have been used both in RBMT (the base installation of SYSTRAN) and in SMT (early IBM work on CANDIDE). N-grams have been used as the basis of recent work in SMT (Moses). Phrases are used by the Commission MT system (a highly trained and extended version of SYSTRAN) and also, with limited success, in SMT (Moses). Sentence templates were used in interlingual efforts (the Stanford Machine). Items of knowledge have been used as partial units of translation for word sense disambiguation in KBMT (KANT).

At the time of writing, no empirical analysis that treats units of translation independently of architecture has been found in the machine translation research literature. Inevitably, this review presents the various architectures that use the units considered. However, it cannot be emphasised strongly enough that the purpose of this review is to examine the qualities of each unit of translation, not the MT paradigms themselves.

3.2 Word Driven Machine Translation

3.2.1 Problems with Using Words

The main problem with using words as the unit of translation is their high level of ambiguity. Any particular word (especially a frequently occurring one) can exhibit a range of senses dictated by different contexts. While these senses may all be signified by one word in a particular language, they may be signified by different words in other languages. This gives rise to the problem of word sense disambiguation.

Yngve (1955), one of MT’s earliest pioneers, made similar observations in these words:


Some of the most serious difficulties confronting us, if we want to translate, arise from the fact that there is not a one-to-one correspondence between the vocabularies of different languages.

(Yngve, 1955)

It was on the basis of the ambiguous nature of the word as the unit of translation that Bar-Hillel (1960) argued against the feasibility of fully automatic high quality translation (FAHQT), in the following words:


It is an old prejudice, but nevertheless a prejudice, that taking into consideration a sufficiently large linguistic environment as such will suffice to reduce the semantical ambiguity of a given word. Let me quote from the memorandum which Warren Weaver sent on July 15, 1949 to some two hundred of his acquaintances and which became one of the prime movers of MT research in general and directly initiated the well-known researches of Reifler and Kaplan [1]: "... if ... one can see not only the central word in question, but also say N words on either side, then, if N is large enough one can unambiguously [my italics] decide the meaning of the central word. The formal truth of this statement becomes clear when one mentions that the middle word of a whole article or a whole book is unambiguous if one has read the whole article or book, providing of course that the article or book is sufficiently well written to communicate at all." Weaver then goes on to pose the practical question: "What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word," a question which was, we recall, so successfully answered by Kaplan.

But Weaver's seemingly lucid argument is riddled with a fateful fallacy: the argument is doubtless valid (fortified, as it is, by the escape clause beginning with "providing") but only for intelligent readers, for whom the article or book was written to begin with. Weaver himself thought at that time that the argument is valid also for an electronic computer, though he did not say so explicitly in the quoted passage, and on the contrary, used the word "one"; that this is so will be clear to anyone who reads with care the whole section headed "Meaning and Context." In this fallacious transfer Weaver has been followed by almost every author on MT problems, including many Russian ones.

(Bar-Hillel, 1960)

Of course, word sense disambiguation is not the only area that suffers when the word is used as the unit of translation. Word order suffers too. As Yngve (1955) put it:


Another great problem is that the word order


frequently quite different in the two
languages


further obscures the meaning for the reader.


(Yngve, 1955)

The implications of using words as the unit of translation are, therefore:

1. the need to employ reliable word sense disambiguation logic
2. the need to employ reliable word reordering logic

3.2.2 Word for Word Translation

Perhaps the simplest translation paradigm is word-for-word substitution. This is the kind of translation process humans use to create interlinear translations of ancient documents, as an aid for students and scholars who do not have the time to keep consulting a lexicon. Yngve (1955) commented on the quality of translations produced by such a paradigm in the following words:


Word-for-word translation could be handled easily by modern data-handling techniques. For this reason, much of the work that has been done up to this time in the field of mechanical translation has been concerned with the possibilities of word-for-word translation [2, 3]. A word-for-word translation consists of merely substituting for each word of one language a word or words from the other language. The word order is preserved. Of course, the machine would deal only with the written form of the languages, the input being from a keyboard and the output from a printer. Word-for-word translations have been shown to be surprisingly good and they may be quite worth while. But they are far from perfect. Some of the most serious difficulties confronting us, if we want to translate, arise from the fact that there is not a one-to-one correspondence between the vocabularies of different languages.

(Yngve, 1955)
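Yngve's description translates almost directly into code. The sketch below substitutes each word with the first entry in a tiny hypothetical Russian-English lexicon, preserving the source word order. The lexicon is an assumption, built around the ‘My trebuem mira’ example from Hutchins & Somers (1992) quoted later in this chapter.

```python
# Hypothetical Russian-English lexicon. 'mira' (from 'mir') can mean 'world' or 'peace',
# but word-for-word substitution has no context with which to choose between them.
LEXICON = {
    "my": ["we"],
    "trebuem": ["require", "demand"],
    "mira": ["world", "peace"],
}


def word_for_word(sentence: str) -> str:
    """Replace each word with its first dictionary entry, keeping the source word order."""
    words = sentence.rstrip(".").lower().split()
    # Unknown words are passed through untranslated.
    translated = " ".join(LEXICON.get(word, [word])[0] for word in words)
    return translated[:1].upper() + translated[1:]


print(word_for_word("My trebuem mira."))
```

The sketch reproduces the machine output ‘We require world’ rather than ‘We want peace’. The wrong sense of ‘mira’ is chosen because nothing in the method looks beyond the single word.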

It is easy to demonstrate the fundamental truth of these words. Consider the English word ‘like’. ‘Like’ can be translated into Italian as one of (at least) two words: ‘piacere’ (the verb ‘to like’) or ‘come’ (in the sense of ‘as’). This disambiguation can, in many cases, be made by successful part-of-speech parsing. Now consider the reverse case. Italian ‘come’ can be translated into English as ‘like’, ‘as’ or the construction ‘as … as’. Italian ‘È come un leone’ translates as English ‘He is like a lion’ (if the context refers to a male). Italian ‘È veloce come un leone’ translates as English ‘He is as fast as a lion’, and Italian ‘È veloce, come un leone’ translates as English ‘He is fast, like a lion’.

The disambiguation, in this case, cannot be reduced to simple part-of-speech tagging. It relies on specific structural combinations indicative of function.
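The structural cues described above can be sketched as a handful of ordered pattern rules for ‘È … come …’ sentences. The rules, the mini-lexicons and the function name are hypothetical. They cover only the three examples above and, like the text, assume the context refers to a male.

```python
import re

# Hypothetical mini-lexicons covering only the examples in the text.
ADJECTIVES = {"veloce": "fast"}
NOUN_PHRASES = {"un leone": "a lion"}


def translate_come(sentence: str) -> str:
    """Choose the English rendering of 'come' from the surrounding structure."""
    s = sentence.rstrip(".")
    # "È ADJ, come NP": the comma marks a loose comparison -> "He is ADJ, like NP"
    m = re.fullmatch(r"È (\w+), come (.+)", s)
    if m:
        return f"He is {ADJECTIVES[m.group(1)]}, like {NOUN_PHRASES[m.group(2)]}."
    # "È ADJ come NP": an adjective directly before 'come' -> "He is as ADJ as NP"
    m = re.fullmatch(r"È (\w+) come (.+)", s)
    if m and m.group(1) in ADJECTIVES:
        return f"He is as {ADJECTIVES[m.group(1)]} as {NOUN_PHRASES[m.group(2)]}."
    # "È come NP": no adjective -> "He is like NP"
    m = re.fullmatch(r"È come (.+)", s)
    if m:
        return f"He is like {NOUN_PHRASES[m.group(1)]}."
    raise ValueError(f"no structural rule matches {sentence!r}")


for italian in ["È come un leone.", "È veloce come un leone.", "È veloce, come un leone."]:
    print(italian, "->", translate_come(italian))
```

Each rule keys on the arrangement around ‘come’ (a comma, a preceding adjective, or neither) rather than on the part of speech of ‘come’ itself, which is the same in all three sentences.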

3.2.2 Words & Rules - Rule Based Machine Translation

In an attempt to solve the problem of word ordering, a whole generation of rule-based machine translation architectures of differing sophistication was implemented. The earliest rule-based experiments with words as the unit of translation employed the direct method. Hutchins & Somers (1992) describe such early efforts in detail. There was no analysis of the syntactic structure or semantic relationships of the source sentence. Words were identified through morphological analysis, and translations were provided by bilingual dictionary lookups. A little local word reordering was performed to produce more acceptable translations. Hutchins & Somers (1992) provide the following Russian to English translations as examples of the kind of output such experiments produced:

Original Russian: My trebuem mira.
Machine translation: We require world
Human translation: 'We want peace.'

Original Russian: Nam nužno mnogo uglja, železa, elektroenergii.
Machine translation: To us much coal is necessary, gland, electric power.
Human translation: 'We need a lot of coal, iron and electricity.'

Original Russian: On dopisal stranitsu i otložil ručku v storonu.
Machine translation: It wrote a page and put off a knob to the side.
Human translation: 'He finished writing the page and laid his pen aside.'

Original Russian: Včera my tselyi čas katalis' na lodke,
Machine translation: Yesterday we the entire hour rolled themselves on a boat.
Human translation: 'Yesterday we went out boating for a whole hour.'

Original Russian: Ona navarila ščei na neskol'ko dnei.
Machine translation: It welded on cabbage soups on several days.
Human translation: 'She cooked enough cabbage soup for several days.'

Translation examples taken from Hutchins & Somers (1992)

The rule-based paradigm reached its climax in SYSTRAN, perhaps the most successful MT system implemented to date. SYSTRAN can be trained to recognise long strings of text and translate them directly, and much of its success has been due to this functionality (more on this in section 3.4.2).

However, the untrained base system makes use of bilingual single-word dictionary entries. Non-English words are recognised through morphological analysis, and a syntactic analysis of the source text is performed to reveal its structure. A transfer process then maps this structure onto the target syntax. For this reason SYSTRAN has often been described as a transfer system (Hutchins & Somers, 1992; Arnold et al., 1993; Senellart et al., 2001), although Wilks (1992) objects that:




SYSTRAN has been described by its owners as a transfer rather than a direct system, even though, in fact, it has no true separable transfer lexicon for a language pair

(Wilks, 1992)

The mistaken premise of such an architecture is that the act of translation consists of:

translating individual words and reordering them (possibly also inflecting them) in accordance with good grammatical practice in the target language

In practice, such a generic method frequently produces undesirable results. Consider the following example of a translation from a Russian email into English (SYSTRAN’s flagship language pair):

Original Russian email

Как делишки? Как вы добрались? Чего новенького? Разобралась в своих мужиках? Или еще
больше запуталась? Родители успокоились? С того дня как вы уехали, твоя
мама мне ни
разу не позвонила. Как детишки? В школу ходят? Толик не появлялся? У меня все
попрежднему. Хожу на работу. В личной жизни все без изменений! Никого! На мне наверное
какое то проклятье!!!!!!!!!! Напиши мне хоть пару слов! Люблю. Целую.

SYSTRAN-powered Google translation

As delishki? How do you get? What's new? Decompose in their guy? Or even more confusing?
Parents calmed down? Since that day as you left, your mom, I never called. As detishki? In go to
school? Fraction did not appear? I have all
oprezhdnemu. Hoja at work. In private life, all
unchanged! No! For me probably What then curse !!!!!!!!!! Write me at least a few words! I love.
Target.

Human post edited version (with knowledge of the situation and general context)

How are things? How did you guys get on? What’s new? Have you chosen one of your men yet? Or are you still confused? Did your parents calm down? Since the day you left your mum hasn’t called me once. What about the kids? Have they started school yet? Did Tolik turn up? Everything’s pretty much the same here. I work each day. My love life is the same. Nobody! What a drag! Write me at least a few words. Love you. Hugs and kisses.

Automatically generated translation courtesy of the Google Translate facility: http://translate.google.co.uk/translate_t#

The automatically generated translation is so poor in places because this method is based on the incorrect assumption that translation can be performed by reordering translated words into a grammatically acceptable order in the target language. This assumption fails because it rests on the weak foundation of an overgeneralised assumption:

Concepts in one language are expressed with the same grammatical structures in other languages

This assumption is demonstrably false and therefore makes a very weak basis for a machine translation model. Hobbs (1992) comments on this problem in these words:


There are a number of classical problems with the Transfer Approach, principally arising when the two languages express the same concept in very different ways syntactically. For example, what is expressed by the main verb in one language may be expressed adverbially in another. Conjunction reduction may be possible in one language, while lexical factors make it impossible in another.

(Hobbs, 1992)

It is perhaps for these reasons that a purely rule-based system often produces poor-quality translation when using the single word as the unit of translation. Improvements in the quality of SYSTRAN’s translation have largely been due to its flexible design, which allows it to be trained to store translations of large chunks of text (see section 3.4.2).

3.2.3 Statistical Machine Translation & Unigrams

An alternative, and more recent, use of single words as units of translation was the original IBM work on statistical machine translation (SMT). Weaver (1949) was the first to suggest statistical methods and ideas from information theory for the problem of machine translation. However, it has been argued that his suggestions could not have been implemented in his time because of hardware restrictions (Brown et al., 1990). The original SMT system, CANDIDE, is described by Brown et al. (1990) and translates from French to English using single French words as the units of translation.

The statistical method of translation is described by Brown et al. (1990) as follows:


We take the view that every sentence in one language is a possible translation of any sentence
in the other. We assign to every pair of sentences (S, T) a probability, Pr(T|S), to be interpreted
as the probability that a translator will produce T in the target language when presented with S
in the source language. We expect Pr(T|S) to be very small for pairs like (Le matin je me
brosse les dents | President Lincoln was a good lawyer) and relatively large for pairs like
(Le président Lincoln était un bon avocat | President Lincoln was a good lawyer). We view the
problem of machine translation then as follows. Given a sentence T in the target language, we seek
the sentence S from which the translator produced T. We know that our chance of error is minimized
by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to
maximize Pr(S|T). Using Bayes' theorem, we can write

Pr(S|T) = Pr(S) Pr(T|S) / Pr(T)

The denominator on the right of this equation does not depend on S, and so it suffices to choose
the S that maximizes the product Pr(S)Pr(T|S). Call the first factor in this product the language
model probability of S and the second factor the translation probability of T given S. Although
the interaction of these two factors can be quite profound, it may help the reader to think of the
translation probability as suggesting words from the source language that might have
produced the words that we observe in the target sentence and to think of the language model
probability as suggesting an order in which to place these source words.

(Brown et al., 1990)
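The decision rule in the quotation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CANDIDE's decoder, which had to search an enormous space of candidate sentences: the three candidates and all probabilities below are invented toy values, and the function name noisy_channel_decode is hypothetical. Following the quoted notation, T is the observed sentence and S a candidate that may have produced it; since CANDIDE translated French into English, T is French and the candidates S are English.

```python
import math

def noisy_channel_decode(t, candidates, lm_prob, tm_prob):
    """Return the candidate S that maximises Pr(S) * Pr(T|S).

    Pr(T) is dropped because it is the same for every candidate S.
    Log probabilities are added instead of multiplying raw probabilities,
    which avoids numerical underflow on longer sentences.
    """
    def score(s):
        return math.log(lm_prob[s]) + math.log(tm_prob[(t, s)])
    return max(candidates, key=score)

# Toy, invented probabilities for illustration only (hypothetical values).
T = "Le président Lincoln était un bon avocat"
lm_prob = {  # language model Pr(S): how fluent is the English candidate?
    "President Lincoln was a good lawyer": 1e-6,
    "Lawyer good a was Lincoln President": 1e-12,
    "In the morning I brush my teeth": 2e-6,
}
tm_prob = {  # translation model Pr(T|S): do the words correspond?
    (T, "President Lincoln was a good lawyer"): 1e-3,
    (T, "Lawyer good a was Lincoln President"): 1e-3,
    (T, "In the morning I brush my teeth"): 1e-10,
}
best = noisy_channel_decode(T, list(lm_prob), lm_prob, tm_prob)
print(best)  # President Lincoln was a good lawyer
```

The scrambled candidate ties with the correct one on the translation model and loses on the language model, while the fluent but unrelated candidate loses on the translation model, which mirrors the division of labour Brown et al. describe.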

In summary, there are two major driving forces in the CANDIDE system: the translation model and the
language model. The translation model (word-based) provides all the possible English translations of
each single French word, learned automatically by several iterations of EM re-estimation. Each word
may have several possible translations, and each possible translation has a probability estimated from
the (expected) number of times a corresponding alignment has been found by the EM algorithm. The
language model (n-gram based) provides the reordering mechanism of CANDIDE. Translations are chosen
as a trade-off between translation likelihood and good sentence order.
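The claim that an n-gram language model supplies word order can be illustrated with a small bigram model. The sketch below is a simplification under stated assumptions, not CANDIDE's language model: it uses bigrams with add-one smoothing, an invented three-sentence training corpus, and a hypothetical function name, train_bigram_lm. Two permutations of the same six words receive different scores, so the model can prefer one ordering over the other.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Train an add-one-smoothed bigram model; return a sentence log-prob function."""
    history, pairs = Counter(), Counter()
    vocab = set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])               # tokens that can be predicted, incl. </s>
        history.update(tokens[:-1])            # how often each token is a history
        pairs.update(zip(tokens, tokens[1:]))  # bigram counts
    v = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((pairs[(h, w)] + 1) / (history[h] + v))
            for h, w in zip(tokens, tokens[1:])
        )

    return log_prob

# Invented training corpus, for illustration only.
lm = train_bigram_lm([
    "president lincoln was a good lawyer",
    "he was a good man",
    "a good lawyer is rare",
])
fluent = lm("president lincoln was a good lawyer")
scrambled = lm("lawyer good a was lincoln president")
print(fluent > scrambled)  # True
```

Both word orders use the same histories, so the difference comes entirely from which bigrams were observed in training.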

Some view the recent rise of SMT as a triumph over rule-based methods. Fred Jelinek, in the context of
a workshop on the evaluation of NLP systems, said:


Every time I fire a linguist, the performance of our speech recognition system goes up

(Fred Jelinek, 1988)

After 1990 and the publication of IBM’s advances in SMT, this attitude became prevalent at IBM, so
much so that Wilks (1993) wrote:


how much I resent IBMs use of "linguist" to describe everyone and anyone they are against

(Wilks, 1993)

However, in reality, there is no such clear distinction between ‘statistical’ and ‘rule-based’. While it
is convenient for opponents of ‘rule-based’ mechanisms to consider SMT systems to be powered purely
by ‘statistics’, such a view is demonstrably misguided. As already mentioned, CANDIDE has at its core
the machine-learned translation model (TM) and the language model (LM). The combined effect of the
two is that CANDIDE has the finest-grained implied ‘rule’ system for single-word units of translation
available to the MT research community to date. Perhaps the best evidence that the combination of a
translation model and a language model amounts to a rule system is the 2008 paper by Dugast et al.,
‘Can we relearn an RBMT system?’ (Dugast et al., 2008).

The
paper outlines how the output of a base installation of SYSTRAN was used to train the translation
model of an SMT system.

CANDIDE’s rules are so fine-grained that it has rules for every single word it supports. Which words
can each word be translated by? In what contexts? How should the words be reordered? This rule system
is far more complex than any of the high-level generalisations typically seen in transfer
architectures. Perhaps it is for this reason that statistically driven methods have seen the greatest
success when using single words as the unit of translation.

Brown et al. (1990) provide the following example translations produced by CANDIDE during testing:

Exact
French: Ces amendements sont certainement nécessaires.
Hansard: These amendments are certainly necessary.
Decoded: These amendments are certainly necessary.

Alternate
French: C'est pourtant très simple.
Hansard: Yet it is very simple.
Decoded: It is still very simple.

Different
French: J'ai reçu cette demande en effet.
Hansard: Such a request was made.
Decoded: I have received this request in effect.

Wrong
French: Permettez que je donne un exemple à la Chambre.
Hansard: Let me give the House one example.
Decoded: Let me give an example in the House.

Ungrammatical
French: Vous avez besoin de toute l'aide disponible.
Hansard: You need all the help you can get.
Decoded: You need of the whole benefits available.

Example translations taken from Brown et al. (1990)

Clearly, the implied rule set, and therefore the results obtained, can be tweaked by altering the
language model or the translation model, and different methods of weight setting are the subject of a
great deal of experimental research (Och & Weber, 1998; Sato & Nakanishi, 1998; Wang & Waibel, 1998;
Della Pietra et al., 1997).

Perhaps the best example of how the implied rule set was improved is the description of the five IBM
translation models in ‘The Mathematics of Statistical Machine Translation: Parameter Estimation’
(Brown et al., 1993). The purpose of the five models is to produce better word alignments from a
bilingual corpus. Better alignments yield a more reliable translation model, and a more reliable TM has
better-defined fine-grained rules. Improved translation quality is the direct result of this
better-defined fine-grained rule set. Of even more interest to the fine-grained rule debate is how IBM’s
five models are implemented. Their methods are best summed up in Brown et al.’s own words:


In Model 1 we assume all connections for each French position to be equally likely.

In Model 2 we make the more realistic assumption that the probability of a connection depends on
the positions it connects and on the lengths of the two strings.

In Models 3, 4, and 5, we develop the French string by choosing, for each word in the English string,
first the number of words in the French string that will be connected to it, then the identity of these
French words, and finally the actual positions in the French string that these words will occupy. It is
this last step that determines the connections between the English string and the French string and it
is here that these three models differ. In Model 3, as in Model 2, the probability of a connection
depends on the positions that it connects and on the lengths of the English and French strings.

In Model 4 the probability of a connection depends in addition on the identities of the French and
English words connected and on the positions of any other French words that are connected to the
same English word.

Models 3 and 4 are deficient, a technical concept defined and discussed in Section 4.5. Briefly,
this means that they waste some of their probability on objects that are not French strings at
all. Model 5 is very much like Model 4, except that it is not deficient.

(Brown et al., 1993)
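As a rough illustration of how Model 1's "equally likely" connections can still produce a sharp translation model, the following sketch runs Model 1 EM on an invented three-sentence corpus. It is a simplification, not IBM's implementation: there is no NULL word, the corpus is a toy, and the function name train_ibm_model1 is hypothetical. Each French word's count is shared among all English words in its sentence, so it is co-occurrence across different sentence pairs that gradually concentrates probability on the right word pairs.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f|e) with IBM Model 1 EM.

    corpus: list of (english_tokens, french_tokens) sentence pairs.
    Simplification (assumption): no NULL word, so every French word must
    connect to some English word in its own sentence.
    """
    french_vocab = {f for _, fs in corpus for f in fs}
    # Uniform start: every French word equally likely for every English word.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each French word's count among the English words of
        # its sentence, in proportion to the current t(f|e).
        for es, fs in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f|e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [
    ("the house".split(), "la maison".split()),
    ("the flower".split(), "la fleur".split()),
    ("a flower".split(), "une fleur".split()),
]
t = train_ibm_model1(corpus, iterations=10)
for e in ["the", "house", "flower", "a"]:
    options = [f for (f, e2) in t if e2 == e]
    best_f = max(options, key=lambda f: t[(f, e)])
    print(e, "->", best_f, round(t[(best_f, e)], 3))
```

In the first iteration every link is equally likely; from the second onwards, the fact that ‘la’ appears with ‘the’ in two sentences pulls t(la|the) up and, in turn, leaves ‘maison’ as the better explanation for ‘house’.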

As the above quotations show, Model 1 is the simplest, with no constraint on alignment other than that
connections must lie within aligned sentences. Model 2 improves on this by working with the linguistic
assumption that words found at the beginning of a sentence in one language are more likely to align to
words at or near the beginning of the sentence in the target language. The general pattern seems to be
that the more linguistic information is encoded in the models, the better defined the resulting
translation model. As explained in section 3.3.1, the translation model, and thus the implied rule set,
can be further enhanced by the use of n-grams.
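The difference between the Model 1 and Model 2 assumptions can be made concrete by computing link probabilities for one French position j in a French string of length m, connecting to English positions i = 1..l. This is an assumption-laden sketch: the NULL word is omitted, and instead of Model 2's learned table a(i | j, m, l) it uses a hypothetical exponential "diagonal" formula with an invented tension parameter, similar in spirit to later reparameterisations of Model 2 rather than Brown et al.'s own method. Both function names are hypothetical.

```python
import math

def model1_alignment(l):
    """Model 1: every English position i = 1..l is an equally likely link."""
    return [1.0 / l] * l

def positional_alignment(j, m, l, tension=4.0):
    """Position-sensitive link probabilities for French position j (1-based).

    Hypothetical stand-in for Model 2's a(i | j, m, l): English positions
    whose relative position i / l is close to j / m get more weight. Model 2
    itself learns a separate probability for every (i, j, m, l) instead.
    """
    weights = [math.exp(-tension * abs(i / l - j / m)) for i in range(1, l + 1)]
    z = sum(weights)
    return [w / z for w in weights]

print(model1_alignment(5))
print([round(p, 3) for p in positional_alignment(j=1, m=5, l=5)])
```

Under Model 1 the first French word is as likely to link to the last English word as to the first; the positional version puts most of its weight near the start of the English sentence, which is the intuition the paragraph above attributes to Model 2.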

3.2.4 Final Words on the Single Word as the Unit of Translation

Of all the architectures reviewed that use single words as the unit of translation, none, with the
exception of pure word-for-word translation, can really claim to make exclusive use of the word as the
unit of translation. The early direct-approach systems described by Hutchins & Somers (1992) made use of
neighbouring words to make word reordering decisions. Transfer architectures make use of neighbouring
words to make analytical decisions about the syntactic structures exhibited in the source sentence. And
CANDIDE (Brown et al., 1990) makes use of n-grams in its language model to make decisions on word
reordering and word sense disambiguation.

All things considered, the single word in isolation is a pessimal unit of translation: it can only
produce intelligible translation where its default translation is a suitable one in the target language
and the word order of the source text is sufficiently similar to that of the target language. In
conclusion, word sense disambiguation and word reordering decisions cannot be reliably made when the
single word is the sole unit of translation.
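A pure word-for-word translator makes both failure modes concrete. In this minimal sketch the function and its three-entry French-English dictionary are hypothetical: every word receives its single default translation in source order, so the French ‘un avocat mûr’ (‘a ripe avocado’) comes out with the wrong sense of ‘avocat’ and the adjective in the wrong position.

```python
def word_for_word(sentence, dictionary):
    """Replace every source word by its single default translation, in source order.

    Words missing from the dictionary are passed through unchanged.
    """
    return " ".join(dictionary.get(word, word) for word in sentence.split())

# Hypothetical default dictionary: 'avocat' is given only its 'lawyer' sense.
default_translation = {"un": "a", "avocat": "lawyer", "mûr": "ripe"}
print(word_for_word("un avocat mûr", default_translation))  # a lawyer ripe
```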

3.3 N-Gram Driven Machine Translation

3.3.1 Statistical Machine Translation & N-Grams

Great improvements over the performance of the original CANDIDE (Brown et al., 1990) proof of concept
have come from extending the paradigm to use n-grams of varying length as the unit of translation.
Perhaps the most notable pioneer in this field is Philipp Koehn of the Institute for Communicating and
Collaborative Systems (ICCS) at the University of Edinburgh. Koehn was the chief designer and developer
of the open source SMT package Moses (Koehn et al., 2007) and its predecessor Pharaoh (Koehn, 2004).

With CANDIDE, the burden of decisions about word sense disambiguation and reordering fell largely on
the language model. When n-grams are used in the translation model, however, the translation model takes
a more active role in producing the rule set for translation. This is simple to demonstrate with a few
key examples.

In English there is only one definite article, ‘the’. In Italian, there are a variety of different
articles depending on the gender and number of the noun phrase and on the initial sound of the word that
follows the article (a vowel, s followed by a consonant, z, and so on). To further complicate matters,
there are times when English uses the article but Italian does not. To complicate matters even further,
Italian articles merge with prepositions to form compounds (e.g. di + il = del). Together, these factors
give the English ‘the’ a huge variety of possible Italian translations, with a very complex rule system
making the final choice.

When n-grams are used in the translation model, many of these problems are resolved. The English n-gram
‘the boy’ (il ragazzo) tells the translation model the gender and number of the noun phrase. The English
n-gram ‘on the table’ (sul tavolo, from su + il) tells the translation model not only the gender and
number of the noun phrase but also the preposition that will be merged with the article. All of this
results in more stable probabilities in the translation model and takes a huge burden off the language
model. However, these observations only hold true for the kind of short-range dependencies shown in the
examples given.
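The Italian examples above can be illustrated with a toy phrase table. This is a simplification under stated assumptions: the table entries and the function name greedy_phrase_translate are invented, and the left-to-right greedy longest match used here is not how Moses or Pharaoh decode (they search over many segmentations and score them with phrase probabilities and a language model). It only shows how a longer English n-gram can carry the gender and preposition information that a single word lacks.

```python
def greedy_phrase_translate(sentence, phrase_table, max_len=3):
    """Translate left to right, always taking the longest source n-gram
    (up to max_len words) that has an entry in the phrase table.

    A single word with no entry at all is copied through unchanged.
    """
    words = sentence.split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in phrase_table:
                out.append(phrase_table[chunk])
                i += n
                break
        else:  # no entry found, not even for the single word
            out.append(words[i])
            i += 1
    return " ".join(out)

# Hypothetical English-Italian phrase table, invented for illustration.
table = {
    "the": "il", "girl": "ragazza", "is": "è", "at": "a", "door": "porta",
    "the girl": "la ragazza",     # the bigram fixes the article's gender
    "at the door": "alla porta",  # the trigram supplies the merged a + la
}
print(greedy_phrase_translate("the girl is at the door", table, max_len=1))
print(greedy_phrase_translate("the girl is at the door", table))
```

With max_len=1 the sketch behaves like a single-word system and prints ‘il ragazza è a il porta’; allowing longer n-grams gives ‘la ragazza è alla porta’.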

Koehn (2005) experimented extensively with all 110 individual language pairs in the Europarl corpus
and evaluated the results automatically using the NIST implementation of the BLEU automatic evaluation
algorithm (Papineni et al., 2002). He withheld a common 2,000 sentence test