Machine Translation at DARPA

Approved for Public Release, Distribution Unlimited

Joseph Olive

Program Manager

Agenda


Pre-GALE Programs and Studies


DARPA and the Language Community


GALE Plans


GALE MT Evaluation


GALE Accomplishments


Future Research


Language Research at DARPA


Four Decades of Research


Continuous progress


Limited vocabulary single talker


Speaker-independent speech recognition


Large vocabulary


Machine translation


Natural language processing


TIDES and EARS


Great Accomplishments


Need for a New Program


GALE Program Goal


Enable automated processes and English-speaking soldiers and commanders to absorb and analyze all incoming information in a timely manner

Genres



Newswire


Broadcast news


Newsgroups


Talk Shows

...

Languages



Arabic


Chinese

...

Topics



Unbounded

Planning for GALE


The community offered:


More Data


Evaluations


Word Error Rate (WER)


Bilingual Evaluation Understudy (BLEU)


DARPA Questions:


What are the applications for the research?


When is a technology good enough?


What is new?


How will progress be measured?


Pre-GALE Studies


Main question: how good is good enough?


New MT study


Interpolation between human and machine translation


Analysts as subjects


The birth of Human-Targeted Translation Error Rate (HTER)


HTER is the GALE MT metric


HTER Translation Evaluation


[Diagram: HTER evaluation workflow. Labels: Foreign Language Text & Speech; Translators; Evaluators; Adjudicator; Gold Standard Translation; GALE Machine Translation Engine; GALE Machine Translation; Human Editors who conduct comparison. Questions raised during review: Which is right? Can it be ambiguous? Is it an idiom?]

Accuracy = 1 - (No. of errors / No. of words)
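The accuracy formula on this slide can be written as a small function. This is only a sketch: the function name `hter_accuracy` is a hypothetical helper, and the example counts (11 errors in 33 words) are the ones quoted in the editing example in this deck.

```python
def hter_accuracy(num_errors: int, num_words: int) -> float:
    """Accuracy = 1 - (number of errors / number of words)."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return 1.0 - num_errors / num_words


# Counts quoted in the deck's editing example: 11 errors in 33 words.
print(f"{hter_accuracy(11, 33):.0%}")  # prints 67%
```

How individual edits are counted is decided by the human editors, so the error count is an input here rather than something the function derives.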

HTER Editing Example



Deleted words are shown as [-word-] and inserted words as {+word+}.

Machine translation

The statement said that the brothers in the military wing to regulate Al Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.


Corrected machine translation

The statement said that [-the-] {+your+} brothers in the military wing to regulate Al Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.

1 error


Corrected machine translation

The statement said that [-the-] {+your+} brothers in the military wing [-to regulate-] {+of the+} Al Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.

5 errors


Corrected machine translation

The statement said that [-the-] {+your+} brothers in the military wing [-to regulate-] {+of the+} Al {+Qaeda+} Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.

6 errors


Corrected machine translation

The statement said that [-the-] {+your+} brothers in the military wing [-to regulate-] {+of the+} Al {+Qaeda+} Jihad {+organization+} [-base-] in [-the country-] {+Mesopotamia+} had carried out the assassination of one of the criminal {+tyrant+}s in the city of [-penalty-] {+Baquba+}.

11 errors in 33 words (67% accuracy)

Human-Translated Reference

The statement said that “your brothers in the military wing of the Al-Qaeda Jihad Organization in Mesopotamia carried out an assassination of one of the criminal tyrants in the city of Baquba.”
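As a rough illustration of counting edits between a machine translation and its corrected version, the sketch below computes a word-level edit distance (insertions, deletions, substitutions). It is a simplification and an assumption, not the GALE scoring procedure: HTER edits are made and counted by human editors, and the slide's own counts may follow different conventions. The function name `word_edit_distance` is hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the word list `hyp` into the word list `ref`."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # deleting all i hypothesis words so far
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# First correction from the editing example: "the" is replaced by "your".
mt = "The statement said that the brothers in the military wing".split()
edited = "The statement said that your brothers in the military wing".split()
print(word_edit_distance(mt, edited))  # prints 1
```

Dividing the resulting count by the number of words gives an error rate in the same spirit as the accuracy figure quoted on the slide.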

New Technologies Implemented in GALE


Topic-Dependent Language Modeling


Morphology


Extraction


Syntax Analysis


Hierarchical Classes


Long Distance Language Models


Semantic Analysis


Predicate Argument Analysis


Arabic Translation Targets


Structured Language


[Chart: Arabic translation accuracy targets (%), on a 40 to 90 scale, by phase (Baseline, then Φ1 through Φ5), with separate series for translation from text and translation from speech. Pre-GALE and completed phases are marked. Targets are given as % accuracy / % of documents; values on the chart range from 35 to 90/95.]

Targets include accuracy and consistency
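One way to read a target written as "% accuracy / % of documents" is that at least the stated share of documents must reach the stated accuracy. The sketch below checks such a target under that assumed reading; the function name `meets_target` and the per-document accuracies are hypothetical and are not GALE data.

```python
def meets_target(doc_accuracies, min_accuracy, min_doc_fraction):
    """Return True if at least `min_doc_fraction` of the documents
    reach `min_accuracy` (both given as fractions between 0 and 1)."""
    if not doc_accuracies:
        return False
    passing = sum(1 for acc in doc_accuracies if acc >= min_accuracy)
    return passing / len(doc_accuracies) >= min_doc_fraction


# Hypothetical per-document accuracies, checked against a 75/90 target.
scores = [0.81, 0.78, 0.92, 0.76, 0.88, 0.79, 0.83, 0.77, 0.90, 0.52]
print(meets_target(scores, min_accuracy=0.75, min_doc_fraction=0.90))
```

Pairing an accuracy level with a document share is what lets a target express consistency as well as accuracy.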

Arabic Translation Results


Newswire


[Chart: % accuracy and % of documents, Phase 4 results compared with the target.]

Arabic progress

[Chart: Arabic machine translation, % error, for Formal Text, Semi-Formal Text, Formal Audio, and Semi-Formal Audio.]

Chinese Progress


[Chart: Chinese machine translation progress for Formal Text, Semi-Formal Text, Formal Audio, and Semi-Formal Audio.]

Human vs. Machine

GALE is as good as a single human in Arabic

[Charts: four human vs. machine panels, each with axes Percent Accuracy and Percent of Documents: Arabic Formal Text, Arabic Semi-Formal Text, Chinese Formal Text, Chinese Semi-Formal Text.]


Improving Translation of Chinese Speech


Chinese transcription error rates are extremely low, but increase along with perplexity


Improvement in translation of Chinese speech will require work in lowering perplexity


Evaluation Set    Formal Audio     Semi-Formal Audio    Overall
                  PPL     CER      PPL     CER          PPL     CER
Phase 2           21      2.7      33      14.8         26      8.5
Phase 3           30      4.6      33      18.7         31      11.7

(PPL: perplexity; CER: character error rate, %)
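For readers unfamiliar with the PPL column, perplexity is conventionally the exponential of the average negative log-probability a language model assigns to each token. The sketch below uses that standard definition; the function name `perplexity` and the probabilities are hypothetical, and nothing here reproduces how the GALE evaluation sets were scored.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the tokens."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# A model that is equally unsure among 21 choices at every step
# has a perplexity of about 21.
print(round(perplexity([1 / 21] * 10), 3))
```

Lower perplexity means the language model finds the material more predictable, which is why the slide ties higher perplexity to higher transcription error.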

Phoneme Transcription Experiment, Human Vs. Machine


Overall Goal


Assess the bounds of human phonetic recognition and compare with machines


Previous Work


Human recognition tested on artificial stimuli


Results show that human accuracy is extremely high


Artificial stimuli lack the complexity of natural speech


The Problem


Isolate phonetic recognition from language biases


Human phonetic discrimination abilities are intimately tied with language, phonotactic and prosodic processing, and lexical and semantic familiarity


Solution


Use natural speech for stimuli


Use transcribers who lack prosodic, phonotactic, lexical, and semantic information, but share a phoneme space




Japanese speakers


Italian transcribers


15 Human Subjects


420 phonemes per subject


System            Subst    Del     Ins     PER
ASR (HMM-CI)      19.6     7.9     7.4     34.9
Human, average    15.3     8.6     5.9     29.9
Human, best        9.0     4.0     4.3     17.2
Human, worst      16.6    10.7    10.2     37.5

Phoneme Transcription Experiment, Human Vs. Machine


The difference between human and machine performance was around 10%


Result indicates that progress in STT will require improved language models
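The phoneme error rate (PER) in the table appears to be the sum of the substitution, deletion, and insertion rates. The sketch below checks that reading against the reported figures; the function name `phoneme_error_rate` is hypothetical, and the 0.1-point tolerance is an assumption to allow for rounding in the published values.

```python
def phoneme_error_rate(subst, deletion, insertion):
    """PER as the sum of substitution, deletion, and insertion rates (%)."""
    return round(subst + deletion + insertion, 1)


# (Subst, Del, Ins, reported PER) as given in the table.
rows = {
    "ASR (HMM-CI)":   (19.6, 7.9, 7.4, 34.9),
    "Human, average": (15.3, 8.6, 5.9, 29.9),
    "Human, best":    (9.0, 4.0, 4.3, 17.2),
    "Human, worst":   (16.6, 10.7, 10.2, 37.5),
}

for name, (s, d, i, reported) in rows.items():
    computed = phoneme_error_rate(s, d, i)
    # Allow 0.1 point of slack for rounding in the reported values.
    ok = abs(computed - reported) <= 0.1 + 1e-9
    print(f"{name}: computed {computed}, reported {reported}, consistent={ok}")
```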

Systems in Use Today

Real-time translation of Arabic, Chinese, Spanish*, or Farsi* broadcasts and web text into English

BBN Broadcast Monitoring System & Web Monitoring System

BBN Web Monitoring System

IBM Translingual Automated Language Exploitation System

“The Baghdad system was under extensive operation and the users were very pleased with its capability”

LTC. John Venhaus, commanding officer for Joint PSYOP Group at CENTCOM (Oct. 2007)

* Farsi and Spanish were funded by outside sources.


“We are excited about the upgrades and think the program is a great asset to the Global War on Terror and beyond.”

SFC Douglas Wilderman, 10th Special Forces Group (A) (Nov. 2008)

Broadcast Monitoring System*

Arabic example

1. Real-time streaming video (~5 min delay)

2. Automatic transcription of Arabic speech

3. Automatic translation of Arabic transcript

Sample Fielded Arabic Translation

Although there are no official sources, and accurate numbers of dead, many believe that the number this year is the largest since the American invasion of Iraq and the fall of Saddam Hussein’s regime two thousand three.

The estimated number of civilians killed daily in Iraq at least one hundred and twenty persons as well as the wounded.

DARPA Present Status


Success



GALE


Groundbreaking improvements in machine translation of Arabic and Chinese text and speech, in some cases approaching human performance



TRANSTAC


New state of the art in two-way multilingual communication by speech for tactical use



Deployment


GALE and TRANSTAC technologies have been integrated into operational systems and transitioned to users.



DARPA Present Status (Continued)


Limitations



Lack of Flexibility


No ability to communicate or monitor informal language


Conversations, chat, messaging, etc. are mostly informal


Technology does not exist to cope with informal language models



Lack of Reliability


Error propagation in multiple dialogue turns


To perform multi-turn conversations and chat we need extremely high translation accuracies


Need human machine dialogue to clarify and disambiguate input to reduce probability of error



Lack of Robustness


No capabilities to translate speech signals of less than 25 dB SNR


Conversations to be held or monitored are often not in clean signal conditions.


Transcriptions of degraded signals are unusable



Lack of Generality


Costly and time-consuming methods to develop a new language


Cannot duplicate the GALE effort for each new language and dialect


Huge parallel corpora


$60M-$160M/language


Parallel corpora are insufficient


e.g. Chinese corpora already consist of 200 million words


Requires expensive and time-consuming annotations

Future Language Research Areas


One way translation


Monitoring


Improvement of translation quality in languages very different from English (e.g. Chinese)


Inclusion of informal genres


conversation, e-mail, web chat, messaging


Extension into Arabic dialects


Modern Standard Arabic is seldom used in informal genres


Fast acquisition of new language capabilities


Robustness to noise



Two way translation


Communication


Human-machine dialogue


Human-human and human-computer verbal and text interaction



Information retrieval


linguistically enabled search


Accurate retrieval of relevant, non-redundant information


Natural language query capability



Language Understanding


Grounded language comprehension through experiential learning of objects, actions, and consequences


These four thrusts share many underlying technologies

Future Algorithm Research


Rugged Syntactic, Semantic Role Labeling, and Predicate Argument Analysis


Unconstrained topics and genres


Use semantic equivalences


Analysis of incomplete sentences and/or Analysis of inconclusive acoustic output


Projection of syntax and SRL from known to unknown languages


Powerful Language Models


Modeling non-adjacent words


Utilizing syntactic and semantic information


Using wild cards for incomplete sentences and/or inconclusive acoustic output


Analysis and Translation of Longer Input


discourse threading


Prosodic cues


Coherency of topics


Co-reference resolution


Content analysis




Future Algorithm Research (Continued)


Increasing reliability of two-way communication and natural language query


Human-machine dialogue for clarification and disambiguation


Automatic error detection


Ambiguity resolution


Language generation


Multimodal input


Semantic Role Labeling and Dependency Parsing Analysis in Both Source and Target Languages


Dialects


Translation from one dialect to another (e.g. Modern Standard Arabic to dialectal Arabic)


Dialect detection and identification


New Techniques in Automatic Evaluation of Translation Quality as a Target for Optimization and Automatic Quality Assessment


Language Understanding



www.darpa.mil

Abstract: Defense Advanced Research Projects Agency (DARPA) Program Manager
Joseph Olive will discuss the Chinese and Arabic machine translation work being carried
out under DARPA's Global Autonomous Language Exploitation Program. Topics will
include preparation for the program, the evaluation paradigm, the current status, and
potential future research directions.