AusTalk

Roland Goecke

Trent Lewis

Michael Wagner

Big ASC Meeting 15-16 April 2010

What is Calibration?


Calibration is not so much a data collection process,
although that will happen to some extent as well


Rather it is about ensuring that the hardware components
and black box setup are all correct


Or at least that the settings have been recorded for
subsequent analysis


Occurs before the actual data recording takes place.

What is included?


Equipment checking


Is the audio and video capturing software running?


Are the lights set up correctly?


“Recording” of environmental settings


What is the light level?


What is the acoustic background noise level?


What are the distances between camera(s) and microphone(s)?


“Recording” of subject calibration sequences


Face turning


Lip movements
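
The environmental settings listed above could be noted in a small structured record so that they can be stored alongside the recordings. The following is only a sketch: the class name, field names, units and all values are illustrative assumptions, not part of the protocol.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class EnvironmentRecord:
    """One set of environmental settings noted during calibration.

    All field names and units are hypothetical; adapt them to the
    measuring equipment actually used.
    """
    location: str
    light_level_lux: float        # assumed unit: lux, read from a light meter
    background_noise_db: float    # acoustic background noise level
    # Distances between camera(s) and microphone(s), assumed to be in metres
    camera_mic_distances_m: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise the record so it can be saved next to the recordings."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only, not real measurements
record = EnvironmentRecord(
    location="example-site",
    light_level_lux=450.0,
    background_noise_db=-52.5,
    camera_mic_distances_m={"cam1-mic1": 0.8},
)
print(record.to_json())
```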


Why is this Data Important?


The calibration data is potentially fundamental to everyone who will
use the corpus.


To name just a few research areas that will particularly pay attention to
the calibration data:


Audio (A) and audio-visual (AV) speech recognition


A and AV speaker recognition


Biometrics (face recognition, face-voice recognition)


Speech Perception/Psycho-Acoustics researchers


Speech and Hearing researchers

Hardware and Software Requirements


Normal recording equipment and software


An additional light meter would be useful to measure the
‘global’ level of light in the recording environment


Do we need to do something similar for measuring the
acoustic background noise?


Swivel chair for the subject to sit on


Assists the capturing of the face/head from different angles


We want the subjects to turn with the chair, not just turn their
heads


This is more accurate


Masking tape to mark chair position, angles, etc.


Metronome (AV synchronisation)

Collection Process: Step 1


2-step process


Step 1


Record environment without subject


At the beginning of each session; for sessions that run over longer
periods of time, repeat once every hour in case the environmental
conditions have changed


Audio and video recording of the recording environment without a
subject present (30s)


Audio and video recording of the metronome in the scene (30s)


Measurement of location of light sources and distance to
camera(s) (manual measurement)


Check camera output is being recorded


Check microphone output is being recorded


Time: 5min
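
As a rough illustration of the hourly rule in Step 1, the times at which the environment recording falls due could be worked out as below. This is a sketch only: the function name and the example session times are hypothetical.

```python
from datetime import datetime, timedelta


def environment_calibration_times(start, end, interval=timedelta(hours=1)):
    """Return the times at which the no-subject environment recording is due.

    The first recording is at the start of the session; further ones
    follow every `interval` for as long as the session is still running.
    """
    times = []
    t = start
    while t < end:
        times.append(t)
        t += interval
    return times


# Placeholder session: 09:00 to 11:30 on an arbitrary day
session_start = datetime(2024, 1, 1, 9, 0)
session_end = datetime(2024, 1, 1, 11, 30)
for due in environment_calibration_times(session_start, session_end):
    print(due.strftime("%H:%M"))
```

With the placeholder session above, the environment recording is due three times: at the start and after each full hour.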

Collection Process: Step 2


Step 2


Person specific calibration


At the beginning of each recording session with a subject


Sit subject on swivel chair. Measure the distances from camera(s) to
subject and from microphone(s) to subject (manual measurement)


Turn subject to 90° left. It is important that the subject turns their
entire body on the swivel chair such that the face (nose?) points in
the required direction.


We will need markers both on the floor and on the walls at 15°
intervals to help the subjects turn correctly.


Turn subjects to every 15° starting from -90° (left profile) to +90°
(right profile), take 2s at each position
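
The turning sequence above can be written out as a simple schedule. The sketch below uses the angles and the 2s hold time from the slide; the function name is hypothetical, and the time needed to turn the chair between positions is not modelled.

```python
def turn_schedule(start_deg=-90, end_deg=90, step_deg=15, hold_s=2):
    """Return (angle_deg, hold_start_s) pairs for the face-turning sequence.

    hold_start_s counts hold time only; moving between positions takes
    additional time that is not included here.
    """
    angles = range(start_deg, end_deg + 1, step_deg)
    return [(angle, i * hold_s) for i, angle in enumerate(angles)]


schedule = turn_schedule()
for angle, start_s in schedule:
    print(f"{start_s:>3d}s  {angle:+d} deg")

# 13 positions at 2s each give 26s of pure hold time
total_hold_s = len(schedule) * 2
print(f"{len(schedule)} positions, {total_hold_s}s hold time")
```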

Collection Process: Step 2 (continued)


Let subject face camera frontally.


Participants are to say the following two lip movement
calibration sequences for 5s each:


e o e o e o … (testing lip rounding)


ba ba ba (testing vertical mouth opening)


This is similar to what was done in the AVOZES corpus, where it
proved quite useful for gauging the range of lip movements a
subject makes


Other sequences are possible


Time: 5min


Coding and Annotation


No coding or annotation required as such


Want to take note of the environmental conditions in
which the recordings take place


Light level


Acoustic base level, i.e. the level when no one is talking: can it be
determined sufficiently from the audio stream of the recordings made
without a subject? If so, no extra measurements are required here.


Distance of camera(s) to subject(s)


Distance of microphone(s) to subject(s), e.g. to chin or mouth


Location of light sources and distance to camera(s) or subject(s) (we may
need a sketch of the recording environment for each location)
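
Whether the acoustic base level can be read off the no-subject recordings could be explored with something like the following sketch. It computes an RMS level in dB relative to digital full scale (dBFS) from a 16-bit PCM WAV file using only the Python standard library. The function names are hypothetical and the 16-bit PCM format is an assumption. The result is a relative level, not a calibrated sound pressure level.

```python
import array
import math
import sys
import wave


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of integer PCM samples in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def wav_rms_dbfs(path):
    """Read a 16-bit PCM WAV file and return its RMS level in dBFS.

    Samples from all channels are pooled into one level.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rms_dbfs(samples)
```

Calling `wav_rms_dbfs` on the 30s recording of the empty environment would give one base-level figure per session, which could then be noted with the other environmental conditions.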
