matejka_interspeech2012_ICSLP..

muscleblouseAI and Robotics

Oct 19, 2013 (3 years and 10 months ago)

75 views

1

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Patrol Team Language
Identification System

for

DARPA RATS P1 Evaluation

Pavel Matejka
1
, Oldrich Plchot
1
, Mehdi Soufifar
1
, Ondrej Glembek
1
, Luis Fernando D’Haro
1
,
Karel Vesely
1
, Frantisek Grezl
1
, Jeff Ma
2
, Spyros Matsoukas
2
, and Najim Dehak
3


1
Brno University of Technology,
Speech@FIT and IT4I Center of Excellence, Czech

2
Raytheon BBN Technologies
, Cambridge, MA, USA

3
MIT Computer Science and Artificial Intelligence Laboratory
, Cambridge, MA, USA


matejkap@fit.vutbr.cz, smatsouk@bbn.com, najim@csail.mit.edu

2

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Outline


About DARPA RATS program


Datasets and task description


Subsystems with analysis


Fusion and Results


Conclusion

3

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

DAPRA RATS Program


RATS = Robust Automatic Trascription of Speech


Goal :
create algorithms and software for performing the following
tasks on speech
-
containing signals received over communication
channels that are extremely noisy and/or highly distorted
.



Tasks :


Speech Activity Detection


Keyword Spotting


Language Identification


Speaker Identification


Data collector : LDC


Evaluation by SAIC


Performer: PATROL Team led by BBN

4

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Data Specification


Languages:


Dari, Levantine Arabic, Urdu, Pashtu, Farsi


>10 out of set languages


Durations: 120s, 30s, 10s, 3s


T
elephone conversations retransmitted over
8
noisy radio
communication channels

[marked as A
-
H]


Available: collections of 2
-
min audio samples


LDC2011E95


split to train and dev by SAIC


LDC2011E111


split to train and dev by Patrol team


LDC2012E03


supplemental training for non
-
target languages


The amount of audio data for different languages heavily
unbalanced


Added shorter duration samples


Derived from 2
-
min samples, based on our SAD output




5

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Datasets


Train


Main



Only files where VAD detects
>60s of speech


30774 files together


Unbalanced

= 668 files for Dari, 12778 for Leventine Arabic


Balanced


Balanced over files
for each language and channel


7150 files for each duration


673 files for Dari, otherwise ~1300


Extended


Main + all 30sec cuts

from Main set


+ entire LDC2012e03 (only nontarget languages)


~170k segments


Development Set


Corpus was driven by Dari
-

only 679 source files, other languages limited to 1000
files, 2432 files for non target languages


~7120 files for each duration


Evaluation Data


2527 files for each duration

6

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

LID Patrol System Architecture

Audio


CZ Phoneme
Recognition

Phonotactic
iVector LID

iVector LID

JFA LID

BBN SAD

iVector LID

Combined
Score


BUT SAD

Calibration
& Fusion

7

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Speech Activity Detection


One of the most important blocks since the data really difficult


See separate paper about SAD development on Wednesday 16:00 in
Pavilon West


Used both GMM
-
based (BBN) and MLP
-
based (BUT) detectors.

8

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Speech Activity Detection


Comparison of different SAD systems


Robust SAD tuned for noisy telephone speech


Robust SAD tuned for RATS


Results (Cavg) are on DEV set (but scored with SRC
channel)


iVector system (600dim) used for this experiment






25% relative gain

SAD type/ Cavg[%]

120s

Telephone

2,2

RATS

1,6

9

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

iVector LID System (BUT)


Acoustic Features


Dithering, bandwidth 300
-
3200Hz for 25 Mel
-
filters, 6 MFCC+C0


CMN/CVN (based on SAD), RASTA normalization


Shifted Delta Cepstra (SDC) 7
-
1
-
3
-
1


UBM


Language independent, diagonal
-
covariance, 2048 Gaussians


Trained on
balanced

train set


iVector


600 dimensions


Trained on
main

set


Neural network classifier


iVector input, 6 outputs (1 nontarget + 5 target languages)


Hidden layer with 200 nodes


Stochastic Gradient Descent training with L2 regularization


Trained on
extended

set (all data + all 30 sec splits)


10

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Comparison of L
ogistic
R
egression

and N
eural
N
etwork as final classifier


BUT iVector system (600dim)


Results on Development set


Logistic Regression


trained by: Trusted Region Conjugate
-
GD


Results on Development set


Neural Net:


one hidden layer 200


trained by: Stochastic
-
GD with L2 regularization


also experiments with Conjugate
-
GD, but no improvement

11

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

JFA LID System (BUT)


Acoustic Features


Same as for iVector system + Wiener filtering


Universal Background Model (UBM)


Language independent,


Diagonal
-
covariance, 2048 Gaussians


Trained on
balanced

train set


JFA


Trained on
main

train set


μ = m +
D
z +
U
x


Models of languages
D

are MAP adapted from UBM with tau =10


Channel matrix
U

with 200 dimensions


Linear scoring

12

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Importance of Wiener Filter


400dim i
-
vector + logistic regression
experimental
system


Results on Development set

13

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

iVector LID System (BBN)


Acoustic Features


RASTA
-
PLP


Block of 11
-
frame PLPs, projected to 60 dimensions via HLDA


UBM


Language dependent (5 target, 1 “non
-
target”), 1024 Gaussians


iVector


400 dimensions


Group adjacent speech segments into 20s chunks, estimate one
iVector per chunk


improves performance on short duration conditions by 28%


Estimate 6 iVectors (one per UBM)


Apply neural network (NN) to each iVector
-

6 outputs (1
nontarget + 5 target languages)


Combine NN outputs to form 6
-
dimensional score vector


26% relative improvement compared to using language
independent i
-
vectors

14

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Analysis of iVector LID System (BBN)


Analysis of the BBN iVector extractor training and UBM:

1.
Whole audio segments, single UBM

2.
Audio split to 20s segments, single UBM

3.
Audio split to 20s segments, language dependent background
models (LDBM)

15

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Phonotactic iVector LID System (BUT)


Phoneme recognizer


Czech CTS recognizer trained on artificially noised data


Added noise with varying SNR (lowest 10dB) to 30% of the corpus


38 phonemes


3
-
gram counts: sum of posterior probabilities of 3
-
grams from
phone lattices


iVector


Multinomial subspace modeling


600 dimensions, trained on
main

train set


Training a low
-
dimensional subspace in the framework of total
variability model using multinomial distribution


Using point
-
estimate of the model’s latent variable for each
utterance as our new features


Logistic regression as final classifier


Trained on
main

train set


16

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Fusion and Calibration


Regularized logistic regression


Objective: minimize cross
-
entropy on development set


Duration
-
independent


trained on files from 10s, 30s, and 120s
conditions


Procedure


Calibrate (tune) each system individually


Combine calibrated system outputs into a single output vector


Fusion parameters estimated on the same development set



Performance evaluation


Primarily Cavg score


Also computed P
MISS

and P
FA

at Phase 1 target operating points





17

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Overall Results

18

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Robustness


There is channel B completely removed from the training of the
contrastive system (noB) (channel B is unseen channel)


Results on Development set with BUT iVector system (600dim)



Over all results


System/Cavg[%]

120s

30s

10s

3s

iVector NN

3.4

9.7

16.8

25.8

iVector NN noB

8.1

14.7

19.8

29.0

System/Cavg[%]

120s

30s

10s

3s

iVector NN

1.9

6.6

13.0

23.6

iVector NN noB

2.5

7.2

13.4

23.8


Results only for channel B

19

Patrol LID System for DARPA RATS P1 Evaluation

Pavel Matejka

Conclusion


SAD is crucial


De
-
noising helps


Benefit from using Language dependent UBM


Benefit from using NN as final classifier for LID