YEAR 2007


Participation in Interspeech 2007 (Antwerp, August 27-31)

The first day was dedicated to tutorials. I attended one on voice transformation, which is directly linked to my thesis. The presenter was Yannis Stylianou, from Crete. He introduced the basics of voice conversion and voice morphing in both the time domain and the frequency domain. Future prospects for voice transformation were also discussed.

The next four days were devoted to plenary talks (in the morning), oral sessions and poster sessions. I first have to admit that I was impressed by the scale of the event: a complete week devoted solely to speech processing, with 7 sessions running at the same time… All the names we usually cite in our bibliographies were present.


Most of the time, I preferred to follow the poster sessions, because oral sessions inherently require a strong background in the field concerned. Even though all these people work in the same domain (speech processing), it is impossible to know enough to understand everything.

I particularly focused my attention on the oral session on Multimodal Speech Recognition, because two presentations were very close to my master's thesis. This session and the discussions that ensued were very fruitful.

I also followed the sessions dealing with two fields of my PhD:

- The Lombard effect and speech in noise

- Speech synthesis using Hidden Markov Models

For the first subject, three presentations were quite interesting:

- « Lombard Speech Impact on Perceptual Speaker Recognition », Ikeno, Hansen, University of Texas. Hansen is known for having written a famous paper on the importance of the Lombard effect in speech recognition.

- « Two-Stage System for Robust Neutral/Lombard Speech Recognition », Boril, Fousek, Hoge, University of Prague. The authors are one of the rare groups to have built a Lombard database.

- « Speech Synthesis enhancement in noisy environments », Bonardo, Zovato, Loquendo in Torino. The authors use a dynamic range controller to improve speech intelligibility.

As for the second subject, two presentations caught my attention:

- « An HMM-based Speech Synthesis System Applied to German and Its Adaptation to a Limited Set of Expressive Football Announcements », Krstulovic, Hunecke, Schroeder. The authors managed to reach good voice quality even with little training data.

- « Implementation and Evaluation of an HMM-based Thai Speech Synthesis System », Chomphan, Kobayashi, Tokyo Institute of Technology. I had the opportunity to talk at length with the author, which gave me many ideas for implementing such a system in French.



Participation in MMSP 2007 (International Workshop on Multimedia Signal Processing - Chania, Crete, October 1-3)

I was the first author of a paper dealing with feature selection (I wrote it at EPFL, Lausanne, Switzerland). Unfortunately, the date coincided with the beginning of my FNRS grant. One of my colleagues in Switzerland was able to go in my place to give the 20-minute oral presentation (and to enjoy the Greek beaches and sun).

Here are the paper details:


Paper: D2O1.2
Session: Image & Video I
Time: Tuesday, October 2, 10:13 - 10:26
Presentation: Lecture
Title: RELEVANT FEATURE SELECTION FOR AUDIO-VISUAL SPEECH RECOGNITION
Authors: Thomas Drugman; Faculte Polytechnique de Mons
         Mihai Gurban; Ecole Polytechnique Federale de Lausanne (EPFL)
         Jean-Philippe Thiran; Ecole Polytechnique Federale de Lausanne (EPFL)

Abstract: We present a feature selection method based on information theoretic measures, targeted at multimodal signal processing, showing how we can quantitatively assess the relevance of features from different modalities. We are able to find the features with the highest amount of information relevant for the recognition task, and at the same time having minimal redundancy. Our application is audio-visual speech recognition, and in particular selecting relevant visual features. Experimental results show that our method outperforms other feature selection algorithms from the literature by improving recognition accuracy even with a significantly reduced number of features.
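
The idea of keeping features that are informative about the class while being minimally redundant with each other can be illustrated with a small sketch. The code below is not the method of the paper (the abstract does not give its exact criterion); it is a generic greedy "maximum relevance, minimum redundancy" heuristic on discrete features, and the function names and toy data are assumptions made for the illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_features(features, labels, k):
    """Greedy selection (hypothetical helper): at each step, pick the feature
    with the highest relevance I(feature; labels) minus its mean mutual
    information with the features already selected."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (made up): the class is determined by two independent bits a and b;
# "a_copy" is as relevant as "a" but brings no new information.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
toy_features = {"a": a, "a_copy": list(a), "b": b}
toy_labels = [2 * x + y for x, y in zip(a, b)]

chosen = select_features(toy_features, toy_labels, k=2)
print(chosen)
```

On this toy example the duplicated feature is skipped in favour of "b", even though all three features are equally relevant when considered alone.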



Seminar of the Information Technology research center (FPMs, Mons, Belgium, October 11th)

The Lombard effect: analysis and applications: the Lombard effect refers to the speech changes caused by the immersion of the speaker in a noisy environment. These modifications are observed from an acoustic, a phonetic and an articulatory point of view. Through hyper-articulation (unconscious most of the time), a speaker placed in a communicative context aims at maximizing the intelligibility of his utterances. After an analysis of the different changes produced, the difficulties they cause in automatic speech recognition and potential future applications in speech synthesis will be discussed.


Lecture at the Computational Intelligence and Learning doctoral school (Louvain-la-Neuve, November 5th)

Masashi Shimbo, professor at the Computational Linguistics Laboratory in the Nara Institute of Science and Technology (Japan), presented two courses:

Kernels on graph nodes and their application to link analysis: In this elementary tutorial, he presented an interpretation of Kandola et al.'s von Neumann kernels in the context of link analysis, with an emphasis on their relationship to the HITS importance ranking method. He then talked about the effect of 'topic drift', a problem which was first observed with HITS but affects the von Neumann kernels as well. The properties of the von Neumann kernels were also compared with those of kernels based on the Laplacian matrix.
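
The relationship between the von Neumann kernel and HITS can be sketched numerically. The code below is my own illustration, not material from the lecture. It assumes the usual series definition K = B + gamma*B^2 + gamma^2*B^3 + ... applied to the co-citation matrix B = A^T A of a citation graph A, which converges when gamma is below the inverse of the largest eigenvalue of B. The tiny graph and all function names are made up for the example.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def hits_authority(A, iterations=200):
    """Power iteration on B = A^T A. Returns the dominant eigenvalue of B
    and the corresponding unit eigenvector (the HITS authority scores)."""
    B = matmul(transpose(A), A)
    v = [1.0] * len(B)
    for _ in range(iterations):
        w = [sum(b * x for b, x in zip(row, v)) for row in B]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    Bv = [sum(b * x for b, x in zip(row, v)) for row in B]
    rho = sum(x * y for x, y in zip(v, Bv))  # Rayleigh quotient
    return rho, v


def von_neumann_kernel(B, gamma, terms=200):
    """Truncated series sum_{n=1..terms} gamma^(n-1) * B^n."""
    size = len(B)
    K = [[0.0] * size for _ in range(size)]
    term = [row[:] for row in B]  # current term, starts at B
    for _ in range(terms):
        K = [[k + t for k, t in zip(k_row, t_row)] for k_row, t_row in zip(K, term)]
        term = [[gamma * x for x in row] for row in matmul(term, B)]
    return K


# A[i][j] = 1 if document i cites document j (toy graph).
A = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
B = matmul(transpose(A), A)
rho, authority = hits_authority(A)

K_low = von_neumann_kernel(B, 0.0)         # reduces to co-citation counts
K_high = von_neumann_kernel(B, 0.9 / rho)  # close to the convergence limit

print("authority scores:", authority)
print("row of document 4, gamma = 0:      ", K_low[4])
print("row of document 4, gamma = 0.9/rho:", K_high[4])
```

With gamma = 0 the kernel is the plain co-citation matrix, so document 4 is exactly as related to document 3 as to itself. Near the convergence limit, the row of document 4 is dominated by the leading eigenvector of B, so the most cited document (3) comes out on top, as in the HITS authority ranking.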

Introduction to conditional random fields and other discriminative sequence labeling methods: In recent years, the conditional random field (CRF) has become a popular method in natural language processing. It has not only served as an effective alternative to the hidden Markov model in sequence labeling problems, but also provides a generic framework that is applicable to a wide range of applications. This lecture started with a tutorial on the basics of CRFs and of alternative algorithms that are more light-weight. Some natural language processing tasks to which these algorithms have been applied were then described.
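
As a small illustration of what sequence labeling with a linear-chain model involves, the sketch below decodes the best label sequence with the Viterbi algorithm. It is not taken from the lecture. In a real CRF the scores would be weighted sums of features learned from data, whereas here the emission and transition scores are invented numbers, and the function name is an assumption.

```python
def viterbi(emission, transition):
    """Return the label sequence with the highest total score.

    emission[t][y]   : score of label y at position t
    transition[p][y] : score of moving from label p to label y
    """
    n_labels = len(emission[0])
    best = list(emission[0])  # best score of any path ending in each label
    back = []                 # back-pointers, one list per position after the first
    for scores in emission[1:]:
        pointers, new_best = [], []
        for y in range(n_labels):
            prev = max(range(n_labels), key=lambda p: best[p] + transition[p][y])
            pointers.append(prev)
            new_best.append(best[prev] + transition[prev][y] + scores[y])
        back.append(pointers)
        best = new_best
    # Follow the back-pointers from the best final label.
    y = max(range(n_labels), key=lambda label: best[label])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]


# Two labels (0 and 1), three positions; changing label is penalized.
emission = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
transition = [[0.5, -0.5], [-0.5, 0.5]]

decoded = viterbi(emission, transition)
greedy = [scores.index(max(scores)) for scores in emission]
print("Viterbi:", decoded)
print("Per-position argmax:", greedy)
```

The example is built so that labeling each position independently picks label 1 in the middle, while the sequence-level decoding keeps label 0 throughout because of the transition scores. This dependence between neighbouring labels is what sequence models add over a per-token classifier.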


Tutorial on Quartz Composer and Isadora (FPMs, Nov 28th and 29th PM)

Raphaël Sebbe and Celine Mancas-Thillou, both doctors in Image Processing, presented tutorials on well-known visual programming environments.

Quartz Composer is a node-based visual programming language provided as part of the Xcode development environment in Mac OS X v10.5 "Leopard" for processing and rendering graphical data.

Isadora is a proprietary graphic programming environment for Mac OS X and Microsoft Windows, with emphasis on real-time manipulation of digital video. It has support for OpenSound Control.

Tutorial on Max-MSP (FPMs, December 6th PM)

Nicolas D’Alessandro, PhD student in Singing Voice Synthesis, presented a tutorial on Max-MSP, a real-time sound processing program.

Max is a graphical development environment for music and multimedia developed and maintained by San Francisco-based software company Cycling '74. It has been used for over fifteen years by composers, performers, software designers, researchers and artists interested in creating interactive software.


Tutorial on Blender (FPMs, December 19th)

Sebastien Noël, PhD student in Informatics, presented a tutorial on Blender, a powerful 3D animation package.

Blender is a free software 3D animation program. It can be used for modeling, UV unwrapping, texturing, rigging, skinning, animating, rendering, particle and other simulations, non-linear editing, compositing, and creating interactive 3D applications.