

Masaryk University

Faculty of Arts

Department of English and American Studies

English Language and Literature

Veronika Hradilová

The Use of Software Programmes for Translation Support

Bachelor’s Diploma Thesis

Supervisor: Ing. Lucie Šnytová

2011














I declare that I have worked on this thesis independently, using only the primary and secondary sources listed in the bibliography.



……………………………………………..

Author’s signature
















I would like to thank Ing. Lucie Šnytová for her guidance,
help and valuable advice.

I would also like to express my thanks to the 15 participants who took part in the survey evaluating the quality of machine-translated texts:

Karel Tlusťák

Irena Hradilová

Jana Urbanová

Magda Juráňová

Milan Skácel

Pavel Mikulenka

Tomáš Koláček

Ondřej Pumprla

Petr Švásta

Pavla Matochová

Eva Rathouzská

Katarina Černá

Ondřej Dočkal

Martin Hostinský

Gabriela Řezníčková



Table of Contents

1 Introduction
1.1 Thesis Outline
2 About Translation Programmes
2.1 History
2.2 Use
2.2.1 The Reason Why
2.2.2 Effectiveness of Machine Translation
2.3 Main Principles
2.4 Machine Translation Approaches
2.4.1 Rule-based Machine Translation
2.4.2 Example-based Machine Translation
2.4.3 Statistical Machine Translation
2.4.4 Hybrid Machine Translation
2.5 Future Prospects
3 Google Translate
3.1 History
3.1.1 Chronological Order of Language Options
3.2 Basic Principles of Google Translate
3.2.1 Translation mistakes and oddities
3.3 Statistical Translation
3.3.1 Alignments
3.4 Usage of Google Translate
3.4.1 Browser Integration
3.4.2 Android Version
3.4.3 iPhone Version
3.4.4 Google Translation Toolkit
3.4.5 Google Translate Client for Windows v1.1
3.5 Future Prospects of Google Translate
3.5.1 Google Translate and the Future of Voice
4 Bing Translator
4.1 Main Principles of Microsoft Translator
4.1.1 Microsoft’s Statistical MT Engine
4.2 Usage of Microsoft Translator
4.2.1 Polyglot - an Elegant Windows Phone Translator Application
4.2.2 Kywix Babilia HD - a Highly Functional Application for iPad
4.2.3 Windows Live Messenger Translation Bot
4.2.4 Microsoft Translator Installer for Microsoft Office
5 PC Translator
5.1 PC Translator 2010
5.1.1 Main Advantages
5.1.2 Translator-friendly System
5.1.3 Final Assessment
5.2 Usage
5.2.1 Double-sided and Multidisciplinary Dictionary
5.2.2 Text Translator
5.2.3 Website Translator
6 Comparison of Single Translations
6.1 Measuring Translation Performance
7 Practical Part
7.1 The Survey
7.2 Evaluation of the Survey
7.3 Main Drawbacks
8 Conclusion
Résumé
Resumé
Appendices




1 Introduction

In today’s modern world, the need for automation has become an everyday issue. It is no wonder, then, that it has affected the field of translation to a great extent as well. The translation industry is currently searching for new developments and improvements in this field. It is becoming ever more necessary to make translations as professional as possible and, at the same time, to translate large pieces of text within a reasonable period of time or, better said, in the shortest time possible. The best way to achieve this is to make use of so-called machine translation (hereafter MT), which substantially shortens the time needed.

In this text, MT refers to online translation services, namely Google Translate and Microsoft Bing, which are widely available online, and to the commercial PC Translator. But is it really appropriate to employ MT and still maintain the same quality? This proves to be a difficult question to answer. On the one hand, there are professional translators who do not seem to be convinced of the qualities of MT software; on the other hand, there are people who would rather use MT to have a text translated than not use machine translation software at all.

The thesis tries to describe this issue from the very beginning, providing an overview of the whole field and then examining some of the widely available MT tools operating on the Internet (i.e. Google Translate and Bing Translator) as well as a commercial one (i.e. PC Translator). The thesis argues that, if used skilfully, MT software can become a fully-fledged source of information which provides its user not only with a correct translation equivalent but also with a correct translation of a whole sentence. The paper also discusses the idea that MT tools could be widely used in the translation practice of the future.

The practical part of the thesis focuses on the practical use of MT, which is illustrated by several translations of authentic texts covering different topics. Every single translation is evaluated and its quality is then rated to find out which is the most appropriate and of the best quality.

The main aim of the thesis is to answer the two thesis questions which have been stated. The first question is whether machine translation is able to achieve a quality similar to that of human translation. The second question examines which area each MT service is suitable for.

1.1 Thesis Outline

The theoretical part of the thesis is divided into four main chapters and several subchapters in which it presents the entire topic. It describes translation issues, starting with the historical background, moving on to MT usage, future prospects and visions, and listing the most readily available MT programmes, which can be used by every translator as well as by a layman.

The practical part of the thesis covers several examples of computer-translated texts which were taken from various information sources. I would like to illustrate the limits and highlight the advantages of each MT tool used.


2 About Translation Programmes

This chapter presents an overview of general facts about translation applications and provides a summary of their history, the current usage of translation software and the basic principles on which it operates; finally, future prospects and possible uses are mentioned.

First and foremost, it is vital to explain what the concept of machine translation stands for. Machine translation means that a text is translated by means of a software translation tool which transfers the source text into the target language without the intervention of any human translator. This enables the human translator to translate long passages of text in a much shorter time while maintaining more or less the same quality, which is the main aim of computerized translation. Translation practice, however, works differently, as the translated text needs post-editing in order to be published. The computer software only provides a rough skeleton of the translation, which requires the necessary alterations. In total, however, the time spent on correcting the translated text is considerably shorter than the time a translation from scratch would take.

2.1 History

The history of machine translation, though short, as it covers only approximately 60 years, can be divided into several stages, which allows us to trace the development of translation programmes within each decade. Given that history is not the crucial point of the thesis, it is appropriate to provide only a brief overview of the historical milestones which have changed and shaped the development of machine translation up to the present day.

Tracing the very first ideas of automating the translation process takes us back to the seventeenth century, when the great philosophers Leibniz and Descartes designed a simple code which could be used for linking words across different languages. No further development followed, so their code remained purely theoretical.

In the 1930s, Georges Artsrouni (French-Armenian) and, more significantly, Petr Troyanskii (Russian) came up with an innovative idea, “proposing not only a method for an automatic bilingual dictionary, but also a scheme for coding interlingual grammatical roles and an outline of how analysis and synthesis might work” (Hutchins, 2000-1). Unfortunately, Troyanskii and his ideas remained unknown until the 1950s, when the real development of computers as well as machine translation started.

In the 1950s the computer began to be thought of as a vital aid for translation, as suggested by the American researcher Warren Weaver. He highlighted “proposals based on the wartime successes in code breaking, the developments in information theory and speculations about universal principles underlying natural language” (Hutchins, 2000-1). A few years later, lively collaboration among American universities was established. The most successful demonstration of this cooperation was set up by Georgetown University and IBM. Despite using a fairly “restricted vocabulary of 250 words and just 6 grammar rules” (Hutchins, 1991), it served as an important impulse for projects at other universities and research institutes all over the world.

The early systems basically consisted of dictionaries, mainly bilingual ones. The system provided the user with several equivalents of a word and tried to place the word within a sentence. However, it soon became evident that the system should operate using certain linguistic rules to give more precise outputs. Despite this promising development, researchers became disillusioned as they encountered “semantic barriers” (Hutchins, 2000-1) which were simply impossible to overcome. Moreover, in 1964 the US government sponsors grew sceptical of the slow improvement, and the Automatic Language Processing Advisory Committee, known as ALPAC, was established. The committee issued its (in)famous report stating that “there is no immediate or predictable prospect of useful machine translation” (ALPAC, 1966). For ALPAC, there was no point in continuing with the research, as machine translation was not as accurate as human translation and, furthermore, was twice as expensive. However, the report also proposed that the development of these translation tools should continue.


Development then flourished especially in Germany, France and Canada. With increasing globalisation and high demand for translation in Canada, France and Japan, the need for computerized translation programmes grew.

Easier access to computers in the 1980s made MT applications for commercial use more widely available on the market. New operational and commercial systems were developed and promoted; Systran and Logos were among the most successful. During the research of the 1980s an important strategy was applied. This strategy “was that of indirect translation via intermediary representations, sometimes interlingual in nature, involving semantic as well as morphological and syntactic analysis and non-linguistic knowledge bases” (Hutchins, 2000-1). Significant projects were developed, including two multilingual ones.

In the 1990s, three innovative approaches appeared: firstly, an approach using purely statistical methods (the Candide system developed by IBM); secondly, the method called example-based translation (Japanese groups were the first to use this method); and thirdly, an approach based on research “on speech translation involving the integration of speech recognition, speech synthesis and translation modules” (Hutchins, 2000-1). The most significant aspect of the research in the 1990s was the focus on practical development that would help human translators.

Moving to the 2000s, we can see a great boom in the use of MT applications. Moreover, the fastest improvement is taking place in software localization. The Internet became an influential medium, and new websites such as Babel Fish by AltaVista and Google Language Tools appeared, both of them operating on Systran technology. Major advances can be seen in the field of MT nowadays. Research mainly concerns statistical and example-based MT. One aim is to translate Internet websites automatically. Researchers try to combine morphological and syntactic knowledge with statistical and example-based approaches in order to develop a system which would make the translators’ work as easy and enjoyable as possible.

2.2 Use

2.2.1 The Reason Why

Regarding the use of software systems for translation support, a crucial question has to be asked: why should we use, and be interested in, MT tools? There are several reasons:



Firstly, there is a need to translate great amounts of text within a very short period of time. A translation made by translation software can be completed in a much shorter time.

Secondly, companies tend to invest little money in having a text translated.

Thirdly, the text is often highly technical.

Finally, as John Hutchins states, “computers are consistent, but human translators tend to seek variety” (Hutchins, 2005). In practice this means that software sticks to the same terms all the time, which is required as far as technical texts are concerned. A human translator, on the other hand, tends to use different expressions with the same meaning.

Keeping this list of reasons in mind, why are MT software systems not used more often in common translation practice? “Because computers do not produce good translations, some people think that they are of no use at all to anyone” (Hutchins, 2005). It is true that they do not, but there are certain circumstances in which the best quality is not a crucial factor. Under these circumstances, automated translation is used much more widely.

2.2.2 Effectiveness of Machine Translation

When evaluating the usefulness of MT, we should bear in mind whether the translation will be used personally or publicly. Generally, MT software is very likely to produce a translation of sufficient quality without being adjusted by a human translator. This applies to general and technical texts. However, a more critical look should be taken when translating culture-related texts concerning philosophy, literature, sociology and other fields. When such texts are translated by computer software, the final version needs thorough editing and proofreading. The same, however, applies to human translators: if they do not have proper cultural knowledge, the result is unlikely to be of good quality.

MT tools are often employed by large companies which want their translations to be completed in a very short time and, simultaneously, at a reasonable cost. When a text is translated for one’s own purposes, MT works well and provides a substantial and satisfactory result.

Obviously, MT tools are prone to mistakes and there are certain limits to the translation produced by MT software. Not surprisingly, these mistakes are very similar to those made by human translators. Both MT software and a human translator fail in the same situations: when the human translator has insufficient knowledge of either the source or the target language, or when the translation system has not been programmed to perform well in both languages. Failure may also occur where the cultural background of the translated subject has not been specified, which causes problems in its translation.

On the other hand, there is a chance for improvement for both the human translator and computer-operated software. Obviously, the human translator gains more experience through translation practice, while the software is improved by “updating dictionaries and grammar by users and developers” (Hutchins, 2005).

2.3 Main Principles

Every language is in constant evolution. Given that, how can a computer programme keep up with the changes in language and produce a translation of an acceptable quality? How can it cope with all the aspects of the syntax, morphology, grammar and semantics of a language? In fact, every translation programme adopts knowledge and rules which developers use to model the language. These are usually obtained by analyzing an enormous number of texts from which the rules can be drawn. Throughout the process of building the software, the developers bear in mind that the programme should imitate the translation “behaviour” of a human translator. However, language is a highly complex system, which makes it even more difficult to construct a translation programme which would function accurately and reliably.

“The major problems of all MT systems concern the resolution of lexical and structural ambiguities” (Hutchins, 1994). The ambiguities which cause translation programmes problems are illustrated below. In this case, the different meanings of the word kurz were recognized correctly.


Kurz se konal.      The course took place.
Kurz padl.          Rate fell.
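The way a programme might choose between the two meanings of kurz can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sense inventory SENSES, its cue verbs and the function choose_equivalent are hypothetical and do not describe how any of the tools examined in this thesis actually work.

```python
# Hypothetical sense inventory for the Czech noun "kurz": each English
# equivalent is listed with verbs that typically accompany that sense.
SENSES = {
    "kurz": {
        "course": {"konal", "začal", "skončil"},
        "rate": {"padl", "klesl", "vzrostl"},
    }
}


def choose_equivalent(word, sentence):
    """Pick the equivalent whose cue words overlap most with the sentence."""
    # Lower-case the tokens and strip punctuation so "konal." matches "konal".
    context = {token.strip(".,").lower() for token in sentence.split()}
    candidates = SENSES[word]
    # If no cue word is found, max() simply returns the first listed sense.
    return max(candidates, key=lambda sense: len(candidates[sense] & context))


print(choose_equivalent("kurz", "Kurz se konal."))
print(choose_equivalent("kurz", "Kurz padl."))
```

The sketch relies purely on neighbouring words; a real system would need a far larger inventory and some way of weighting conflicting cues.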

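Here is another example, this time of sentences which are ambiguous with regard to structure.

Jan miluje Lucii.   John loves Lucy.
Lucii miluje Jan.   Lucy loves John.

Both Czech sentences mean that John loves Lucy, because the roles of subject and object are marked by case endings rather than by word order. Relying on the fixed word order of the English sentence, the software is not able to recognize this, and it reverses the meaning of the second sentence.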
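The role which case endings play in these two sentences can be made concrete with a short Python sketch. It assumes a hypothetical mini-lexicon (LEXICON) in which every Czech form is annotated with its case; the function translate_svo is likewise illustrative and is not taken from any of the tools discussed here.

```python
# Hypothetical mini-lexicon: each Czech word form is mapped to its English
# equivalent and, for nouns, to the grammatical case the form expresses.
# (The form "jana" is treated here only as the accusative of "Jan".)
LEXICON = {
    "jan": ("John", "nominative"),
    "jana": ("John", "accusative"),
    "lucie": ("Lucy", "nominative"),
    "lucii": ("Lucy", "accusative"),
    "miluje": ("loves", None),
}


def translate_svo(sentence):
    """Translate a three-word Czech sentence into English subject-verb-object order."""
    subject = verb = obj = None
    for token in sentence.rstrip(".").split():
        english, case = LEXICON[token.lower()]
        if case == "nominative":
            subject = english      # the nominative marks the subject
        elif case == "accusative":
            obj = english          # the accusative marks the object
        else:
            verb = english
    return f"{subject} {verb} {obj}."


print(translate_svo("Jan miluje Lucii."))
print(translate_svo("Lucii miluje Jan."))
```

Because the roles are read from the case annotation and not from the position of the word, both Czech word orders yield the same English sentence.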

Despite these and many other difficulties, translation programmes in general have made great progress since they were first created, and they have now become good helpers when dealing with texts written in foreign languages.

Turning to the main topic of this chapter, it is vital to describe the general principles according to which translation software operates. The chapter deals only with written texts which are transferred into another language. It does not concern speech which would be automatically translated into a text in a foreign language. There are, however, visions of translating speech and transferring it into a written document in the future. Such software will nevertheless take a lot of time and effort to complete.

To make the process more structured and transparent, the individual stages are described here:




Translation software comprises words and rules according to which the words are combined into sentences and paragraphs, which make up the whole text. During the first stage, the source text is decomposed into individual words, for which appropriate equivalents are sought in the target language. The software usually tries to keep the original layout of the text so that it is able to produce the target-language text in an identical layout (of the translators examined in the survey, only PC Translator proved to maintain the layout).






Each translation tool usually contains a dictionary, which is important for analyzing and translating the sentence and its components and for defining gender and semantic classification.

In principle, all word forms should be put into the dictionary, which means that it should include all possible forms of a single word. This may be illustrated by the Czech verb chodit: chodí, chodil, bude chodit, chodíš, chodila, chodíme etc. This process of entering all word forms into the dictionary is very often omitted, and in most cases morphological decomposition is relied on instead. The word is reduced to its canonical form, which is the most basic or standard form of an expression. That basic form is then looked up in the dictionary and grammatical information is added to the actual word form; for example, chodíme is assigned the 1st person plural.




Another fact which needs to be taken into account is sentence structure. At first, the developers of translation programmes thought that the best way to translate whole sentences was to translate them word by word. Unfortunately, this soon proved not to work, because different languages have different structures and a word can convey several meanings. The translation which resulted from this process was actually a set of words which did not relate to each other, and therefore the whole sentence was completely chaotic and lacked meaning. So there was a need for translation software to be equipped with a grammar tool which would determine the role of each word within a sentence. Determining the role of a word is absolutely crucial for knowing which words combine together and which combinations of words are possible or have to be avoided. Only by adopting these rules is it possible to produce quality translations.

Words depend not only on the context but also on the relationships between sentences, which is another necessary aspect to consider. A translation programme needs to know, for example, whether the subject of a translated text is a person or a machine. Then it is much easier to assign a proper pronoun to the particular subject.




After a word has been associated with the meaning it conveys and its grammatical characteristics have been assigned, a suitable translation may be chosen. Judging from the sentence structure of the source language, the translation software is then able to build the sentence in the target language. As the translator observes the grammatical rules of the target language, the resulting sentence may differ from the source sentence.




The layout information should be available throughout the whole translation process. The translated text then looks very much the same as the source text because the layout has been preserved. This means, for example, that text set in bold should appear in bold in the target text as well.
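The dictionary stage described above, in which a word form is reduced to its canonical form before it is looked up, can be sketched as follows. The sketch is a toy illustration only: the suffix table SUFFIX_RULES, the one-entry DICTIONARY and the functions decompose and look_up are hypothetical, and they cover just a few of the forms of chodit quoted in the text.

```python
# Toy suffix table: (ending of the word form, ending of the canonical form,
# grammatical information). Longer endings are listed before shorter ones.
SUFFIX_RULES = [
    ("íme", "it", "1st person plural, present"),
    ("íš", "it", "2nd person singular, present"),
    ("ila", "it", "3rd person singular feminine, past"),
    ("il", "it", "3rd person singular masculine, past"),
    ("í", "it", "3rd person, present"),
]

# Canonical form -> target-language equivalent (one hypothetical entry).
DICTIONARY = {"chodit": "walk"}


def decompose(word_form):
    """Reduce a word form to its canonical form; return (lemma, grammar)."""
    for ending, lemma_ending, grammar in SUFFIX_RULES:
        if word_form.endswith(ending):
            return word_form[: -len(ending)] + lemma_ending, grammar
    return word_form, "unanalysed"


def look_up(word_form):
    """Look up the canonical form and attach the grammatical information."""
    lemma, grammar = decompose(word_form)
    return {
        "form": word_form,
        "lemma": lemma,
        "grammar": grammar,
        "equivalent": DICTIONARY.get(lemma),
    }


print(look_up("chodíme"))
```

Only the canonical form has to be stored in the dictionary; the grammatical information travels with the word form and can be used later when the target sentence is built.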


2.4 Machine Translation Approaches

This chapter deals with the approaches to machine translation. Generally, MT uses a method based on grammatical rules. This means that the translation of a word is governed by linguistic rules. The system finds the most suitable equivalent in the target language, and this equivalent then replaces the word in the source text. The main concern which arises here is that the target text does not, in some cases, sound very natural. There are four main approaches to machine translation, which are briefly described below.

2.4.1 Rule-based Machine Translation

This system involves a set of rules called grammar rules, a lexicon and software programmes which process these rules. Rule-based MT was the first technology used for machine translation. The rules were composed in accordance with linguistic knowledge gathered with a great deal of help from linguists. The translation process comprises several stages: “syntactic processing, semantic interpretation and contextual processing of language” (Robin, 2009).

One of the great advantages is the deep analysis which is carried out at the syntactic and semantic level. The disadvantage, however, is that an extensive number of rules and a great amount of linguistic knowledge need to be applied to cover all aspects of the language.

2.4.1.1 How Is the Translation Done

Translation using the rule-based translation system is done by means of pattern matching of the rules. The system tries to avoid unsuitable rules. General knowledge of the world is needed to avoid misinterpretation and ambiguity. Contextual knowledge may also be applied to identify individual word classes so that the translated sentence makes sense in the particular situation. A knowledge representation covers both the “knowledge base and interference techniques. Interference techniques apply interference rules to derive new sentences from the knowledgebase” (Robin, 2009).
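Pattern matching of rules can be illustrated with a minimal Python sketch. It assumes that the words have already been given target-language equivalents and grammatical tags; the tag names, the two reordering rules in RULES and the function apply_rules are hypothetical and serve only to show the principle.

```python
# Each rule pairs a pattern of tags with the order in which the matched
# items are emitted. Both rules are hypothetical examples.
RULES = [
    (("OBJ", "VERB", "SUBJ"), (2, 1, 0)),  # object-verb-subject -> subject-verb-object
    (("NOUN", "ADJ"), (1, 0)),             # noun + adjective -> adjective + noun
]


def apply_rules(tokens):
    """Scan (word, tag) tokens left to right and apply the first matching rule."""
    output, i = [], 0
    while i < len(tokens):
        for pattern, order in RULES:
            window = tokens[i : i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                output.extend(window[k][0] for k in order)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: copy the word unchanged.
            output.append(tokens[i][0])
            i += 1
    return " ".join(output)


print(apply_rules([("Lucy", "OBJ"), ("loves", "VERB"), ("John", "SUBJ")]))
```

A sentence to which no rule applies is passed through unchanged, which is one reason why a rule-based system needs an extensive number of rules to cover a language.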

2.4.2 Example-based Machine Translation

The basic principle of example-based machine translation is to reuse some of the words which have already been translated and make them the basis for the new translation. The process of translating by means of example-based translation can be divided into three stages:



Matching stage - in this stage suitable words which would contribute to the output are looked for “on the basis of their similarity with the input” (Robin, 2010). The process in which the “input and examples can be matched by comparing character by character is called sequence comparison” (Robin, 2010).

Alignment - this stage identifies which parts of the parallel translation will be reused. The process is automated and is carried out by using bilingual dictionaries or by comparison with other translations.

Recombination - this is the last step in the process. Recombination ensures that the reusable parts identified during the alignment phase are combined in a legitimate manner. It usually “takes source language sentences and a set of translation patterns as inputs and produces target language sentences as output” (Robin, 2010). Recombination is fully dependent on the previous two stages.
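The matching stage can be sketched with the SequenceMatcher class from Python's standard difflib module, which compares two strings character by character. The sketch covers the matching stage only, not alignment or recombination, and the stored examples in EXAMPLES as well as the function closest_example are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical store of sentences which have already been translated.
EXAMPLES = {
    "Kurz se konal.": "The course took place.",
    "Kurz padl.": "The rate fell.",
    "Jan miluje Lucii.": "John loves Lucy.",
}


def closest_example(source):
    """Return (similarity, stored source, stored translation) of the best match."""

    def similarity(example):
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the two character sequences diverge.
        return SequenceMatcher(None, source, example).ratio()

    best = max(EXAMPLES, key=similarity)
    return similarity(best), best, EXAMPLES[best]


score, stored_source, stored_translation = closest_example("Kurz se nekonal.")
print(round(score, 2), stored_source, "->", stored_translation)
```

The retrieved translation is only a starting point: the input here negates the stored sentence, so the later stages would still have to identify and replace the part which differs.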

2.4.3 Statistical Machine Translation

Statistical machine translation produces translations relying on statistical methods, which are based on currently existing multilingual corpora. To make statistical MT work, a corpus containing at least 2 million words is needed for a specific domain. For general language, even more words and sentences need to be supplied. Where such highly specialized corpora are available, the results may be impressive. Unfortunately, however, they are very rare. On the other hand, the accuracy of statistical MT tools is continually being improved.
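How word equivalents can be learned from a corpus alone may be illustrated with IBM Model 1, a classic word-based statistical model. It is used here purely as an illustration and is not taken from the sources cited in this chapter. The three-sentence parallel corpus is hypothetical and vastly smaller than the millions of words mentioned above, and the names train_model1 and best_translation are assumptions made for this sketch.

```python
from collections import defaultdict

# Tiny hypothetical Czech-English parallel corpus.
CORPUS = [
    ("nový kurz", "new course"),
    ("nový dům", "new house"),
    ("starý dům", "old house"),
]


def train_model1(corpus, iterations=10):
    """Estimate word translation probabilities with expectation-maximization."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {e for _, tgt in pairs for e in tgt}
    # t[(e, f)]: probability that source word f is translated as target word e.
    # Every pair starts with the same (uniform) probability.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm  # expected share of e explained by f
                    count[(e, f)] += share
                    total[f] += share
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)


model = train_model1(CORPUS)
for word in ("nový", "kurz", "dům", "starý"):
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule is supplied; the equivalents emerge only because words such as nový and new keep appearing in the same sentence pairs, which is why the size and quality of the corpus matter so much.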

2.4.4 Hybrid Machine Translation

Hybrid MT basically combines the best aspects of both statistical and rule-based translation systems. It works in such a way that the translation is done by means of rule-based MT and statistics are then applied to verify and correct the output.
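One possible reading of this combination is sketched below: a rule-based component proposes several candidate translations and simple word-pair statistics from a target-language corpus decide between them. Everything in the sketch is an assumption made for illustration, including the small corpus TARGET_CORPUS, the stand-in function rule_based_candidates and the scoring by counts of adjacent word pairs; it does not describe any particular hybrid system.

```python
from collections import Counter

# Hypothetical target-language corpus, used only to count adjacent word pairs.
TARGET_CORPUS = [
    "the rate fell sharply",
    "the exchange rate fell",
    "the course took place",
    "the course was cancelled",
]

BIGRAMS = Counter(
    pair
    for sentence in TARGET_CORPUS
    for pair in zip(sentence.split(), sentence.split()[1:])
)


def rule_based_candidates(source):
    """Stand-in for a rule-based engine: every reading its rules would allow."""
    readings = {"kurz padl": ["the course fell", "the rate fell"]}
    return readings[source]


def hybrid_translate(source):
    """Choose the candidate whose adjacent word pairs are most frequent in the corpus."""

    def score(candidate):
        words = candidate.split()
        return sum(BIGRAMS[pair] for pair in zip(words, words[1:]))

    return max(rule_based_candidates(source), key=score)


print(hybrid_translate("kurz padl"))
```

In this toy corpus the pair "rate fell" occurs twice and "course fell" never, so the statistical check favours the second candidate even though both are grammatically possible.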

2.5 Future Prospects

What users expect when using various Internet services is information written in any language. Therefore, what users seem to be interested in is the “integration of information retrieval, extraction and summarization systems with translation” (Hutchins, 2005).

According to John Hutchins, an authority on machine translation systems, the future of MT is clear: there will be “fewer pure MT systems (commercial, on-line or otherwise) and many more computer-based tools and applications” (2005) which will include an automatic translation tool merely as one of their components. Translation software will become available to everyone interested in using it, and anyone will be able to access it from his or her own computer or any other device used in everyday life, such as a TV or a mobile phone. This, of course, does not mean the end of MT systems; rather, it would be “a demand-led expansion of the provision of translation software which is more accessible and usable in the information society” (Hutchins, 2005).

MT as well as human translation will both be used in cases where the translated text should be of sufficient quality to be published. MT is much more cost-effective where translations of (usually boring) technical documentation or manuals for software localization, which frequently follow a repetitive pattern, are needed. On the other hand, human translation remains “unrivalled for non-repetitive linguistically sophisticated texts e.g. literature and law, and even for one-off texts in specific highly specialized technical subjects” (Hutchins, 2005). Additionally, it is quite possible that there will be a great demand for human translators who have not used translation software before, owing to the poorer quality of the translation software output.

In contrast, translation software is used more frequently for texts where translation quality is not crucial to overall understanding. Such texts may be highly technical or scientific and are intended to be read by a small number of people simply because they contain a great amount of facts and background information. Translation software is nowadays sufficiently advanced to translate personal letters while maintaining reasonably high quality. By contrast, there will still be a place for human translators in business correspondence, for instance, where the information is too delicate and sensitive.

As far as the translation of spoken language is concerned, “there can be no prospect of automatic translation replacing the interpreter of diplomatic exchanges” (Hutchins, 2005). It appears very unlikely that automatic translation of speech could one day be extended “to open-ended interpersonal communication” (Hutchins, 2005). It is now apparent that human and machine translation remain in balance and harmony.

MT systems are currently opening up new fields of activity which human translation has never penetrated before. The activities that fall within the scope of software-translated texts are as follows: “the production of draft versions for authors writing in a foreign language, the real-time translation of television subtitles, the translation of databases, the on-line translation of webpages etc.” (Hutchins, 2005).

As global communication networks are expanding very quickly, and as the practicability and effectiveness of less-than-perfect texts become more widely recognized and respected by the public, such new applications will be in demand. It is therefore vital to perceive the field of translation technology as one concerned with bilingual communication aids rather than with translation systems.



3 Google Translate

3.1 History

Google Translate is a free statistical machine translator provided by Google Inc. It can translate single words, sentences, sections of text, whole documents and webpages between any combination of languages supported by the programme. The translator supports 57 languages in total. It is a helpful tool which enables its users to understand various texts regardless of the language they were written in.

The service was first introduced on 28 April 2006. While Google Translate was being developed, a SYSTRAN-based translator was used as an aid. This system is also used by other translation services such as Yahoo, Babel Fish etc.

3.1.1 Chronological Order of Language Options

Google Translate covers 57 languages in total. The chronology of adding new language options to the translation service may be divided into 23 stages. To cut a long story short, only the “milestones” are included in this enumeration.

In the first few stages, mainly European languages and their combinations were introduced: French, German, Spanish, Portuguese, Dutch and Italian. Then the translator started to cover Far Eastern languages such as Japanese, Korean and Simplified Chinese. In the 5th stage, launched in April 2006, translation from English to Arabic and vice versa was introduced; Arabic was followed by Russian in the same year, and Chinese in 2007. This stage is of particular importance as the service includes translation from Simplified Chinese into its traditional form.

In the 8th stage, launched in October 2007, the Google Translate machine translation system was used for translation from and into all 25 language pairs.

In the next stage, Hindi and its translation from and into English appeared. The 10th stage is especially important for Czech speakers of English, as the English-Czech and Czech-English pairs were introduced together with other European languages such as Bulgarian, Croatian, Danish, Finnish, Norwegian, Greek, Polish, Romanian and Swedish. All these were launched in May 2008. In this stage, translation was possible between any two languages but was carried out through English, which might then have altered the sentence structure of the final translation. A few months later, in September, more European languages were added, e.g. Catalan, Latvian, Lithuanian, Slovak, Slovene, Serbian and Ukrainian, as well as languages of more distant countries, such as Vietnamese, Indonesian and Filipino. Hebrew was added as well.

In the 12th stage, which took place in January 2009, Albanian, Estonian, Galician, Hungarian, Maltese, Thai and Turkish joined the by then large number of languages. Persian was included, as were more exotic languages such as Afrikaans, Swahili and Yiddish; other languages also appeared: Belarusian, Icelandic, Irish, Malay and Welsh.

The 15th stage is marked by the end of the Beta stage. From this period on, users could choose to use romanization, i.e. to have a word written in the Latin alphabet even though the word used a different writing system. This applied to languages such as Chinese, Japanese, Russian, Bulgarian, Greek, Thai and many others. When users wanted to translate from Arabic, Persian and Hindi, Latin transliteration was available; the text was then converted into the native script. The text can also be read aloud by a text-to-speech programme, i.e. a programme converting written text into speech.

In January 2010, Haitian Creole extended the language selection, and in April a speech programme in Hindi and Spanish was launched. In May, a speech recognition programme for the languages listed in the 10th, 11th and 12th stages was introduced and implemented.

The 19th stage was marked by the addition of languages such as Armenian, Azerbaijani, Basque, Georgian and Urdu. Later on, romanization of Arabic texts was brought in. Then Google Translate started providing phonetic typing for Arabic, Greek, Persian, Hindi, Russian, Serbian and Urdu, and Latin was introduced. In December 2010, when the 22nd stage was launched, romanization of Arabic was removed and spell check was added to the system. For some languages, text-to-speech synthesizers from eSpeak (a “compact open source software speech synthesizer for English and other languages, for Linux and Windows” (Duddington, 2010)) were replaced by technology from SVOX (a Swiss speech technology company). These languages include, e.g., Czech, Dutch, Greek, Hungarian, Norwegian, Portuguese and Swedish. At the same stage, the speech recognition programme including Arabic, Japanese and Korean was launched.

The latest stage of the development and implementation of other languages
started in June 2011
and is currently in progress.

3.2 Basic Principles of Google Translate

Google Translate works on the basis of statistical machine translation and therefore does not apply grammatical rules, as rule-based translation systems do. The creator of Google Translate is Franz Josef Och, who has always criticized the ineffectiveness of rule-based algorithms. He therefore based his translation system on a statistical approach and on his own research, which was awarded first prize in a DARPA¹ competition. Currently, Och himself is the head of Google's translation group.




¹ DARPA is a United States institution responsible for developing new technology which is then used by the military forces. DARPA also funds the development of technologies which influence the whole world.


When creating a solid base for his translation system, Och decided to use one bilingual text corpus (or a collection of parallel corpora) of more than 1 million words and two monolingual corpora of more than 1 billion words each. A statistical approach is applied to this data to provide the user with a translation between the particular languages. To acquire such an extensive amount of data, Google Translate used United Nations documents, because each document had to be translated and published in the six official languages² of the United Nations. Thanks to this, Och compiled a large corpus containing identical texts in six languages.

In the following statement, Och explains why he decided to use the statistical approach instead of the rule-based model: “Several research systems, including ours, take a different approach: we feed the computer with billions of words of text, both monolingual texts in the target language, and aligned texts consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model” (Och, 2006). He concluded that the results achieved in research evaluations were very good.

Machine translation also processes natural language and tries to define the rules according to which fixed constructions work. The source is encoded in symbols from which the translation into the target language is then derived.

3.2.1 Translation Mistakes and Oddities

Because Google Translate uses the statistical translation model rather than a model based on grammatical rules, it is prone to mistakes, which can take the form of nonsense words or switched meanings. Selecting a wrong equivalent is the most frequent mistake.

² The six official languages of the United Nations are Arabic, Chinese, English, French, Russian and Spanish.

3.3 Statistical Translation

This chapter deals with machine translation from the statistical point of view and presents the equations according to which statistical machine translation works. The equations are valid for all languages and are therefore applicable to the Czech language as well.

A string of English words, denoted e, can be translated into a string of Czech words, denoted c, in several different ways. In statistical MT, it is presumed that every Czech string c is a possible translation of an English string e. Every string pair (e, c) is assigned a number Pr(c|e), which is then interpreted “as the probability that a translator, when presented with e” (Brown, 2003) will produce c as his translation. Pr(e) stands for the probability of the occurrence of the English string e in the given language model. When a Czech string of words c is given, the translation system tries to find the string e which would otherwise be produced by a native speaker. Errors are minimized by selecting the English string ê for which Pr(e|c) is greatest.

This results in the following equation (1):

$$\Pr(e \mid c) = \frac{\Pr(e)\,\Pr(c \mid e)}{\Pr(c)} \qquad (1)$$

“Since the denominator here is independent of e, finding ê is the same as finding e” (Brown, 2003) so as to make the product Pr(e) Pr(c|e) as large as possible. The Fundamental Equation of Machine Translation (2) then follows:

$$\hat{e} = \arg\max_{e} \Pr(e)\,\Pr(c \mid e) \qquad (2)$$

This equation describes the process by which a human translator translates a text from Czech into English. The emphasis is placed on the fact that the translator first understands the Czech text and then expresses its meaning in English. Formally, the equation is appropriate. The conditional distribution can be thought of as a large table “that associates a real number between zero and one with every possible pairing” (Brown, 2003) of an English passage and a Czech passage. If the distribution is proper and adequate, high-quality translation may be produced.


Unfortunately, it is not possible to examine every single pair of Czech and English strings because their number is simply enormous. This is, however, a practical problem, not one of principle.

“Equation (2) summarizes the three computational challenges presented by the practice of statistical translation: estimating the language model probability (the language modelling problem), estimating the translation model probability (the translation modelling problem) and devising an effective and efficient suboptimal search for the English string that maximizes their product” (Brown, 2003).
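The three components named in the quotation can be illustrated with a minimal sketch. The two probability tables below are invented toy values, not estimates from any corpus, and the names `LANGUAGE_MODEL`, `TRANSLATION_MODEL` and `decode` are hypothetical. The search is a plain exhaustive maximum over three candidates, which is feasible only because the example is so small.

```python
import math

# Toy language model Pr(e): invented probabilities for three candidate English strings.
LANGUAGE_MODEL = {
    "he goes to the library": 0.04,
    "he go to library": 0.001,
    "to goes he library the": 0.00001,
}

# Toy translation model Pr(c|e) for one Czech string c; values are invented.
TRANSLATION_MODEL = {
    ("on jde do knihovny", "he goes to the library"): 0.30,
    ("on jde do knihovny", "he go to library"): 0.35,
    ("on jde do knihovny", "to goes he library the"): 0.30,
}


def decode(czech, candidates):
    """Return the candidate e that maximizes Pr(e) * Pr(c|e), as in equation (2)."""

    def log_score(english):
        p_e = LANGUAGE_MODEL.get(english, 0.0)
        p_c_given_e = TRANSLATION_MODEL.get((czech, english), 0.0)
        if p_e == 0.0 or p_c_given_e == 0.0:
            return float("-inf")  # unknown strings can never win
        # Work in log space so that long products do not underflow.
        return math.log(p_e) + math.log(p_c_given_e)

    return max(candidates, key=log_score)


best = decode("on jde do knihovny", list(LANGUAGE_MODEL))
print(best)
```

In this toy setting the translation model alone slightly prefers the ungrammatical candidate, but the language model outweighs it, so the well-formed string is selected.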

Particular attention will be paid to the estimate of Pr(e|c). If Pr(c|e) can be estimated quite easily, why is it not possible to reverse the entire process and estimate Pr(e|c) directly? To illustrate, English and Czech strings of words can be divided into well-formed and badly-formed strings. Strings such as On jde do knihovny or I live in a house, or even more sophisticated sentences, are regarded as well-formed. By contrast, strings such as do jde on knihovny or a I live house in are badly-formed. It is therefore very important that a model for Pr(e|c) concentrates its probability on well-formed English strings. It is not as important, however, for a model for Pr(c|e) to concentrate its probability on well-formed Czech strings: if the probability of all well-formed Czech strings is reduced by the same factor and the probability thus freed is spread over badly-formed Czech strings, the translation is not affected. “The argument that maximizes some function f(x) also maximizes pf(x) for any positive constant p” (Brown, 2003).

The two factors of equation (2) cooperate perfectly. The translation model probability is large for English strings whose words are placed roughly in the right positions to account for the Czech words, regardless of whether the strings are well-formed or badly-formed. The language model probability is large for well-formed English strings, regardless of their connection to the Czech ones. Together, they produce a large probability for well-formed English strings that account well for the Czech.

3.3.1 Alignments

The notion of an alignment between two strings, an object which indicates for each word in the Czech string the word in the English string it corresponds to, was introduced by Brown et al. in 1990. To make Brown's study, which compares English and French, applicable to the Czech language, equivalent sentences in Czech were chosen for illustrative purposes.

The alignments are graphically depicted in Figure 1 by lines which are called connections.

The(1) newspaper(2) is(3) published(4) daily(5)

Ty(1) noviny(2) jsou(3) vydávány(4) každý(5) den(6)

Figure 1. Alignment with independent English words.

The alignment in this figure has six connections: the, ty; newspaper, noviny; is, jsou; etc. The alignment could also be written as Ty noviny jsou vydávány každý den | The(1) newspaper(2) is(3) published(4) daily(5,6). The numbers indicate the positions of the words in the Czech string to which the words in the English string are connected. As may be expected, each alignment is correct with a certain probability, and the translation is judged appropriate accordingly.
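The bracketed notation can be read mechanically. The following sketch is only an illustration of the notation; the function name `parse_alignment` and the regular expression are assumptions made for this example and do not come from Brown's study. It turns the written form into a list of connections and confirms that the Figure 1 alignment has six of them.

```python
import re


def parse_alignment(notation):
    """Parse 'Czech words | English(1) words(2,3)' into Czech words and connections."""
    czech_part, english_part = notation.split("|")
    czech_words = czech_part.split()
    connections = []
    # Each English token looks like word(p1,p2,...), with 1-based Czech positions.
    for word, positions in re.findall(r"(\S+?)\(([\d,]+)\)", english_part):
        for pos in positions.split(","):
            connections.append((word, czech_words[int(pos) - 1]))
    return czech_words, connections


czech, links = parse_alignment(
    "Ty noviny jsou vydávány každý den | "
    "The(1) newspaper(2) is(3) published(4) daily(5,6)"
)
for english_word, czech_word in links:
    print(english_word, "-", czech_word)
```

The word daily contributes two connections (to každý and to den), which is why five English words yield six connections.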

In Figure 1, each Czech word is connected to exactly one English word. However, “more general alignments are possible and may be appropriate for some translations” (Brown, 2003). This is illustrated in Figure 2, where a Czech word is connected to more than one English word.

The(1) land(2) was(3) the(4) territory(5) of(6) the(7) aboriginal(8) people(9)

Půda(1) byla(2) územím(3) původních(4) obyvatel(5)

Figure 2. Alignment with independent Czech words.


This alignment may be written down as follows: Půda byla územím původních obyvatel | The(1) land(1) was(2) the(3) territory(3) of(4) the(4) aboriginal(4) people(5). In general, there may also be sentences where several Czech words are connected to several English words, as shown in Figure 3. The example sentences may be written as Chudí jsou na mizině | The(1) poor(1) don't(2,3,4) have(2,3,4) any(2,3,4) money(2,3,4). It is apparent here that the four English words cooperate to produce the three Czech words.




The poor don't have any money

Chudí jsou na mizině.

Figure 3. A general alignment.


Generally speaking, an English passage can be viewed as a web of concepts which functions according to the rules of English grammar. To express that certain words are related but do not tell the entire story, they are said to form a cept. When the passage is translated into Czech, every cept contributes some Czech words to the final translation.

The set of English words which are connected to a Czech word in a certain alignment is called the cept which produces that Czech word. An alignment thus resolves the English string into a set of cepts which may overlap each other. This is called the conceptual scheme. When one or more Czech words cannot be connected to any of the English words, the conceptual scheme contains an empty cept from which all these words arise.
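The definition of a cept can be made concrete with a small sketch based on the Figure 3 example. The function name `cepts` and the input layout (English words paired with 1-based Czech positions) are assumptions chosen for illustration. A Czech word with no connected English words would receive an empty tuple, standing for the empty cept.

```python
def cepts(czech_words, english_links):
    """For each Czech word, collect the English words connected to it (its cept).

    english_links is a list of (english_word, [czech positions, 1-based]).
    """
    by_position = {pos: [] for pos in range(1, len(czech_words) + 1)}
    for word, positions in english_links:
        for pos in positions:
            by_position[pos].append(word)
    # Keep Czech word order; an empty tuple represents the empty cept.
    return [(czech_words[pos - 1], tuple(words)) for pos, words in by_position.items()]


figure_3 = cepts(
    ["Chudí", "jsou", "na", "mizině"],
    [
        ("The", [1]),
        ("poor", [1]),
        ("don't", [2, 3, 4]),
        ("have", [2, 3, 4]),
        ("any", [2, 3, 4]),
        ("money", [2, 3, 4]),
    ],
)
for czech_word, cept in figure_3:
    print(czech_word, "<-", cept)
```

Here the cept of Chudí consists of The and poor, while jsou, na and mizině share the same four-word cept.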

3.4 Usage of Google Translate

3.4.1 Browser Integration

Google services currently include several Firefox extensions, i.e. software add-ons produced specially for the Mozilla Firefox web browser. Such an extension, of course, exists for Google Translate as well. It enables users to access the service with a single click.

The Google Translate system was built into the latest version of Google's browser, Google Chrome, in 2010. The GoogleTrans gadget has spread to the English Wikipedia website and eleven other wikis, which provide their users with this extensive service.

3.4.2 Android Version

Google Translate may also be used in its Android version, which means that it becomes accessible from users' mobile devices. The first version was released in January 2010, and it operates similarly to the web browser version. The current version supports Conversation Mode between English and Spanish, a mechanism which enables fluent communication between native speakers of the two languages. The application supports all 53 languages and voice input in 15 languages; it can be easily downloaded from Android Market³ and runs on Android 2.1 devices.




³ Android Market is an online software store developed by Google specially for Android devices. It enables Android users to browse and download various application programmes.

3.4.3 iPhone Version

iPhones were not left out as far as the translation application is concerned. Google Translate launched an HTML5 application for iPhone users, which was officially made available in February 2011. It can translate a word or a sentence into more than 50 languages, and voice input is possible in 15 languages. Last but not least, the application can have the translation spoken out loud in 23 distinct languages.

3.4.4 Google Translator Toolkit

Google Translator Toolkit is a web service designed to help translators edit the translations produced by Google Translate. Thanks to this tool, translators are able to “organize their work and use shared translations, glossaries and translation memories” (Ruscoe, 2009). The software translates uploaded texts in various formats, including Microsoft Word, HTML, OpenOffice texts, Wikipedia articles and others. The only limiting factor is the document size, which may not exceed 1 MB.

The Toolkit is supported by the Google Translate service and provides users with an immediate translation of an uploaded text or of a website whose link is supplied. The statistical method is used when translating the text. Google describes the Google Translator Toolkit as an “effort to make information universally accessible through translation” and as a help for translators to “translate better and more quickly through one shared and innovative translation technology” (Google Inc., 2011).

Image 1. The screen showing the Google Translator Toolkit (Google, 2011).

3.4.4.1 Workflow in Google Translator Toolkit

The Google Translator Toolkit workflow may be described as follows. Users upload a file or choose an article from Wikipedia or Knol⁴ which they intend to translate. The Toolkit then makes a draft translation of the document. It divides the whole text into single sentences and searches translation databases available on the Internet for segments previously translated by a human translator. If a translated segment is found, the Toolkit automatically selects the highest-ranked search result and uses that translation to ‘pretranslate’ the sentence. If there is no previously translated segment, the Toolkit uses automatic translation to translate the sentence without human intervention.
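The workflow just described can be sketched as a short routine. This is an illustration only and not Google's implementation: the function names, the dictionary-based translation memory, the numeric ranking scores and the stand-in `fake_mt` function are all hypothetical, and sentence splitting is reduced to a simple punctuation rule.

```python
import re


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' (a deliberately naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pretranslate(text, translation_memory, machine_translate):
    """Build a draft: reuse the best human translation, else fall back to MT.

    translation_memory maps a source sentence to a list of (score, translation).
    machine_translate is any callable taking a sentence and returning a string.
    """
    draft = []
    for sentence in split_sentences(text):
        matches = translation_memory.get(sentence, [])
        if matches:
            # Select the highest-ranked previously translated segment.
            _, translation = max(matches, key=lambda match: match[0])
            draft.append((sentence, translation, "memory"))
        else:
            draft.append((sentence, machine_translate(sentence), "machine"))
    return draft


memory = {"Good morning.": [(0.9, "Dobré ráno."), (0.4, "Dobrý den.")]}


def fake_mt(sentence):
    # Stand-in for a real MT call; it only marks the sentence as machine output.
    return "<MT:" + sentence + ">"


draft = pretranslate("Good morning. How are you?", memory, fake_mt)
for source, target, origin in draft:
    print(origin, "|", source, "->", target)
```

The third element of each tuple records where the segment came from, which is the kind of information a translator would need when deciding which sentences to review first.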











Users may then edit and improve the automatic translation. By means of Translator Toolkit they may also view and check translations made by other users before. Moreover, multilingual glossaries may be used and machine translation suggestions may be viewed. Users can share their translations with their friends by clicking on the share button. When the translation is finished, it can be easily downloaded to their desktop.

⁴ Knol is a large project operated by Google. The project includes articles on various topics written by users.

3.4.5 Google Translate Client for Windows v1.1

Google Translate Client is another tool for making translation practice more convenient. This Windows client is able to translate selected text in most Windows applications. It does not provide its users with any special features; it only makes translation easier and faster by connecting to the Google Translate system. It is not necessary to install the application; it just needs to be downloaded and run. It works on Windows XP, Vista and 7. The only inconvenience is that the system can translate only short paragraphs of 500 characters at most.


3.5 Future Prospects of Google Translate

3.5.1 Google Translate and the Future of Voice

One of the directions in which Google Translate would like to move is the extensive development and improvement of the voice translation application, which has already been introduced on Android smartphones. The technology was presented at the IFA electronics fair in Berlin by Google's chief executive, Eric Schmidt. The technology is expected to be improved within several months, and the process of translation is claimed to be immediate.


It is also pointed out that “technology becomes really exciting when it becomes invisible” (The Telegraph, 2011). The improved technology will enable users to communicate fluently with a person speaking another language. After the user presses the microphone button and begins to speak, Google Translate will automatically translate the utterance and read it out loud to the foreign-language speaker, who will respond in their own language; this speech will in turn be translated and spoken back to the first speaker.


4 Bing Translator

Bing Translator is a service for translating passages of text or even whole websites into foreign languages. The service is provided by Microsoft. Microsoft Translator technology supports all the translation language pairs which have been developed by Microsoft Research⁵. Texts related to computers are translated using Microsoft's own syntax-based statistical technology.


Image 2. The front screen of Bing Translator (Bing, 2011).

⁵ Microsoft Research is a Microsoft research centre established in 1991 for the development of various computing ideas and their integration into Microsoft products.


4.1 Main Principles of Microsoft Translator

The statistical machine translation system was developed for Microsoft's own purposes in 2002, and since then it has mainly been used “in the customer support knowledge base and for post-editing software and documentation” (Wendt, 2010). Five years later, the system became globally accessible at the Microsoft/Bing Translator webpage. The system includes a mechanism for the submission, evaluation and approval of human-quality translations. These are then used both in subsequent automatic translations and in engine customization and optimization of machine translation.

4.1.1 Microsoft’s Statistical MT Engine

The Microsoft statistical translation engine uses two decoding systems: first, a decoder based on syntactically informed trees and second, a string-based decoder. The former uses “a parser building dependency treelets” (Wendt, 2010) and produces better-quality translations between languages whose word order differs. The latter does not require any linguistic knowledge to function well. A statistical model is used in both decoding systems.


The image below shows the probabilities considered by the decoder, which assigns each of them a particular value that is then taken into account in the decision-making process.

Image 3. Description of probabilities taken into account during the decision-making process (visualised in Wendt, 2010).


It is apparent that a proper translator cannot work without continuous development and updating of its services. Not surprisingly, Bing Translator undergoes frequent changes and training as well. The translator is thus able to provide its users with up-to-date terminology and modern language.

Image 4. Continual development of a translation service (Wendt, 2010).

As the model picture suggests, the translator is being continually developed and improved by training.

As regards the translation process, the Microsoft Translator API⁷ includes various methods and user interface controls⁶ for gathering and submitting human or machine translations and their edits by humans. This collaboration produces collaborative translations. People can then assess these suggestions and decide what rating each suggestion should be assigned, so that the edit can be used in subsequent translations. The Microsoft translation service stores all submissions, edits and their ratings online, so that they become accessible to users and a vital part of the translation service. The translation service not only produces a translation; it can also deliver the translation in voice format.

4.1.1.1 Support Knowledge Base

Collaborative translations produced by humans and machines have been implemented in a number of places within Microsoft. One such place is the support knowledge base, which is the documentation of all Microsoft products (manuals, error codes, etc.).

In the knowledge base, a modified version of the Microsoft translation engine is used. The internal copy of any knowledge base may be visited by any member of the support staff anywhere in the world, and support staff are allowed to edit articles translated by machines. To do so, a web user interface is used. The articles are republished immediately once a member of the support staff initiates it. At that moment, “simple automatic translation is triggered, using the human edit for any matching sentences, and the current MT answer for any other sentences” (Wendt, 2010). This procedure occurs every time the support staff find a machine translation unclear.

⁶ UI (User Interface) is the space where the interaction between humans and machines takes place.

⁷ API (Application Programming Interface) is a term in informatics denoting an interface for application programming. It is a collection of procedures, functions or library units which can be used by a programmer.

The graph below shows the success rate of texts translated by both humans and machines. In some cases, the quality of machine-translated texts is comparable with that of human-translated ones.

Since Microsoft possesses an extensive amount of translated texts from the area of information technology, Bing Translator achieves a better success rate thanks to this enormous amount of training data.

Figure 4. A comparison of performance between a human translator and a machine (as visualised in Wendt, 2010).


The Czech language has not been omitted, and the Czech knowledge base is therefore machine translated as well. In October 2009, 2.5% of the English knowledge base content was translated into Czech by humans. This top 2.5% covers about 50% of page views. In January 2010, the remaining part was translated using machine translation.

4.1.1.2 Microsoft Developer Network

The Microsoft Developer Network website enables users to edit machine-translated texts. In general, any user is allowed to submit edits. Dedicated users whose edited translations prove to be of high quality are given trusted status; their edits are approved and immediately accessible to all users.

Collecting the edits produced by human translators is one of several ways of enlarging the supply of data. Apart from this, new data are collected from other sources such as selected websites, manuals compiled by Microsoft, dictionaries, data provided by governments, Wikipedia articles, newspaper articles and others.

4.2 Usage of Microsoft Translator

This chapter covers various Microsoft translation applications used by computer and phone users. Microsoft services are continually being updated so that users obtain the best translations in the languages provided, as well as up-to-date information.

4.2.1 Polyglot: an Elegant Windows Phone Translator Application

Polyglot is a Microsoft phone application which is currently on the market completely free of charge. It is especially remarkable because it enables translation of texts on the go. Its developers themselves also claim it to be “one of the best looking Windows applications in the market” (Dendi, 2011).

4.2.2 Kywix Babilia HD: a Highly Functional Application for iPad

Kywix Babilia HD is another highly functional and convenient application, which can be easily installed on any iPad and quickly optimized for any touchscreen size. Its function is similar to that of the previously mentioned application. Moreover, it connects to Dropbox, Facebook and Twitter, which makes it much easier for the user to share the translated content.

4.2.3 Windows Live Messenger Translation Bot

A new translation bot has recently been introduced by Microsoft. This Messenger Bot can translate all of the user's sentences. The user can either hold a one-to-one conversation with the bot or invite friends speaking different languages, and the bot translates into the user's own language. Colloquial expressions and slang are not recommended, as they could cause the bot problems with the final output.

The translator bot is able to translate into several Asian languages (Japanese, Korean, Chinese etc.), European languages (Danish, Dutch, German, French, Italian etc.), among which Czech has been included as well, and a number of other widely used world languages.

4.2.4 Microsoft Translator Installer for Microsoft Office

Translating entire Microsoft Office documents has never been easier. The version of Microsoft Translator intended for Office documents translates words, phrases, sentences and even whole documents without the user having to leave the application. The process is carried out through the Research task pane and works well in both Microsoft Office 2003 and 2007.


5 PC Translator

As far as machine translation programmes are concerned, it seems reasonable to mention the efforts of Czech programmers and developers. The last computer translator on which the thesis focuses in more detail is the PC Translator, developed by Rostislav Janča.


The PC Translator is the first software tool for translation from and into different foreign languages to have been launched on the Czech market. It does not translate the text as a whole; instead, it divides sentences into separate words, searches the dictionary and provides the user with several suitable translations of each word. The results are displayed in an organized table, which serves as an aid when compiling the final translation.
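
The word-by-word procedure described above can be illustrated with a minimal Python sketch. It is not the PC Translator's actual implementation; the miniature English-Czech dictionary and the function names `lookup_table` and `format_table` are assumptions made purely for illustration.

```python
import re

# Hypothetical miniature English-Czech dictionary (illustrative data only).
DICTIONARY = {
    "the": ["ten", "ta", "to"],
    "dog": ["pes"],
    "runs": ["běží", "utíká"],
    "fast": ["rychle", "rychlý"],
}


def lookup_table(sentence, dictionary=DICTIONARY):
    """Split a sentence into words and list candidate translations per word.

    Returns a list of (word, candidates) rows; unknown words get an empty list.
    """
    words = re.findall(r"\w+", sentence.lower())
    return [(word, dictionary.get(word, [])) for word in words]


def format_table(rows):
    """Render the rows as a simple two-column text table."""
    lines = []
    for word, candidates in rows:
        shown = ", ".join(candidates) if candidates else "(not found)"
        lines.append(f"{word:<10}| {shown}")
    return "\n".join(lines)


print(format_table(lookup_table("The dog runs fast.")))
```

Because every word is looked up in isolation, such a table offers candidates but leaves word order, agreement and the choice between meanings to the user, which is consistent with the table serving only as an aid.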


Despite being a commercial product, the PC Translator is a well-established translation tool on the Czech market and ranks among the most successful ones available there.

5.1 PC Translator 2010

The 2010 version of the PC Translator works offline, which is its main advantage over the two previously described translation services, both of which need Internet access in order to operate. The translator can translate into Czech from other languages and vice versa. The dictionary used by the PC Translator is stored directly on the computer on which the programme has been installed.

The programme is built on a collection of dictionaries. English, German, Russian, Spanish and several other languages rank among the most developed, but the full offer also includes languages such as Polish, Hungarian, Swedish, Esperanto and Latin. The first two languages listed, English and German, have the largest number of phrases (850,000) and collocations (3.5 million). The sizes of the remaining dictionaries differ.

5.1.1 Main Advantages

Among the advantages of the PC Translator is an integrated translator of Internet websites. When translating a text, the user may select the type of dictionary preferred for the particular translation.


5.1.2 Translator-friendly System

The PC Translator seems to be a user-friendly system providing a good range of vocabulary and offering a number of options for searching for and processing separate words and collocations. It is possible to listen to particular words pronounced by native speakers. A database of examples showing how each expression is used in a sentence is also offered.

A vocabulary manager, as this tool could be called, provides users with well-organized editing, import and export of the dictionaries and entered words. As far as whole sentences are concerned, the result (especially when translating into Czech) is more or less insufficient and serves only as a point of orientation, because the translated text needs further editing with regard to spelling and grammar. It is therefore not recommended to use the translated text without further editing. However, when a website is machine-translated, the user is very likely to understand the key information conveyed by a technical or specialized text.

5.1.3 Final Assessment

The PC Translator has evidently been improved over the years and has moved closer to the online translation services used worldwide. It offers the opportunity to work without Internet access. On the other hand, in comparison with online translation services, the system is not updated frequently, which may make it rather outdated in specific fields.


5.2 Usage

This chapter briefly enumerates the most important tools the PC Translator offers to its users.

5.2.1 Double-sided and Multidisciplinary Dictionary

As the title itself suggests, the PC Translator provides users with a fairly wide range of words and collocations. The programme is able to translate texts from and into Czech, producing a translation of reasonable quality. The option of having words read by a native speaker is an advantage. Another advantage is that users may edit the dictionaries and add new vocabulary to them.


5.2.2 Text Translator

The translator uses a word-by-word translation method, so the result is not always perfect. However, it is sufficient for a basic understanding of the text. The user can set how many different meanings of a word should be displayed and according to which priorities. The final translation may be checked with a spellchecker tool in order to minimize spelling mistakes.
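
The setting described above, which limits how many meanings are shown and in what order, might be sketched in Python as follows. The priority scheme (a numeric rank per meaning, where a lower rank is shown first) and all names and data are assumptions made for illustration; they do not reproduce the programme's real options.

```python
# Hypothetical entries: each meaning carries an assumed priority rank
# (lower rank = shown first). The data are illustrative only.
ENTRIES = {
    "bank": [("břeh", 2), ("banka", 1), ("násep", 3)],
    "light": [("světlo", 1), ("lehký", 2), ("světlý", 3), ("zapálit", 4)],
}


def top_meanings(word, max_meanings=2, entries=ENTRIES):
    """Return at most `max_meanings` translations of `word`, best rank first."""
    ranked = sorted(entries.get(word, []), key=lambda pair: pair[1])
    return [meaning for meaning, _rank in ranked[:max_meanings]]


def translate_words(words, max_meanings=2):
    """Word-by-word pass: map each word to the meanings that would be displayed."""
    return {word: top_meanings(word, max_meanings) for word in words}


print(translate_words(["bank", "light"], max_meanings=2))
```

Raising `max_meanings` shows more alternatives per word at the cost of a longer table, while a stricter limit hides rarer senses that may be the correct ones in a specialized text.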


5.2.3 Website Translator

The Website Translator works on a basis similar to that of the Text Translator. It is necessary to bear in mind that the resulting translation does not reach the required quality.



6 Comparison of Single Translations

This section deals with the evaluation of translated texts produced by different translation programmes. Until now, such results have been evaluated by human translators. An innovative approach is to have the resulting texts evaluated not by human translators but by a specially developed tool, a method which can be presented “as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations” (Papineni et al., 2002). Furthermore, this approach is cheaper, faster and language-independent. It is usually chosen because human evaluation of machine-translated texts is often time-consuming and expensive, and therefore not cost-effective.

Human evaluation of machine-translated texts covers a number of translation features such as “lexical, grammatical, semantic, and stylistic accuracy and fluency” (White et al., 1994). Human evaluation is usually more expensive, and the evaluation of a single text can take weeks or even months. This is a real problem because software developers need to observe the effects of daily changes to their translation systems; only then can the good ideas be distinguished from the bad ones. As already suggested, the automated method is able to correspond closely with human evaluation.

6.1 Measuring Translation Performance

As the developers themselves put it, the key idea is that the closer a machine translation is to one produced by a human, the better. Generally speaking, there is a large number of metrics used for the evaluation of texts. They include BLEU⁸ (Papineni et al., 2002), METEOR⁹ (Lavie, Denkowski, 2009), GTM¹⁰ (Melamed et al., 2003), CDER¹¹ (Leusch et al., 2006) and TER¹² (Snover et al., 2006). The evaluation software measures the translation's “closeness to one or more reference human translations according to a numerical metric” (Papineni et al., 2001). Thus, the actual evaluation is done by means of:



• a metric indicating the numerical closeness of the translation
• a corpus of good-quality human reference translations

The closeness metric is modelled on the word error rate metric used by the speech recognition community, modified for multiple reference translations. The metric tolerates moderate differences in the choice of a particular word and in its position within a sentence. The key principle of the method is “to use a weighted average of variable length phrase matches against the reference translations” (Papineni et al., 2001).
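
To make the idea of a weighted average of phrase matches more concrete, the following Python sketch computes a simplified BLEU-style score for a single sentence. It assumes uniform weights over n-grams of length 1 to `max_n`, clipped n-gram counts against several references and a brevity penalty, following the general scheme of Papineni et al. It is a teaching sketch rather than the reference implementation, and the function names and example sentences are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against the references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in any one reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu_like(candidate, references, max_n=2):
    """Geometric mean of precisions for n = 1..max_n times a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n  # uniform weights
    # The brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(log_avg)


cand = "the cat is on the mat".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(bleu_like(cand, refs))
```

Clipping the counts prevents a candidate from being rewarded for repeating a frequent word, and the brevity penalty discourages very short outputs that would otherwise achieve high precision.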

For reasons of space, the thesis will not deal in further detail with any of the metrics mentioned in this section.








8 BLEU stands for BiLingual Evaluation Understudy
9 METEOR stands for Metric for Evaluation of Translation with Explicit ORdering
10 GTM stands for General Text Matcher
11 CDER is a simple variant of WER (Word Error Rate