Automatic Speaker Recognition for Series 60 Mobile Devices


University of Joensuu

Dept. of Computer Science

P.O. Box 111

FIN-80101 Joensuu

Tel. +358 13 251 7959

fax +358 13 251 7955

www.cs.joensuu.fi


Speaker Recognition
Research in Joensuu

Speech and Image Processing Unit (SIPU)

http://cs.joensuu.fi/sipu/

Speech Technology Winter Seminar (Puheteknologian talviseminaari)

Pasi Fränti

Joensuu

10.3.2006



Goals for PUMS season 3 (1/2)

1. Usability of automatic speaker identification in forensic applications

2. Compatibility with large databases

3. Automatization of LTAS + fusion with MFCC

4. Voice activity detection



Goals for PUMS season 3 (2/2)

5. Speaker verification in real (noisy) environment

6. Prototype for access control

7. Solving technical requirements for prototype in elevator

8. Usability for detecting sound sources in general

9. Keyword search (using HTK or Lingsoft Recognizer)




Research Group

Pasi Fränti

Professor

Juhani Saastamoinen, PhLic

Tomi Kinnunen, PhD (Singapore)

Ville Hautamäki, MSc

Ismo Kärkkäinen, MSc

PUMS personnel

Marko Tuononen, BSc

Doctoral researchers

Collaborators

Rosa Gonzalez-Hautamäki, MSc

Ilja Sidoroff

Victoria Yanulevskaya

Evgeny Karpov, MSc (NRC)



1. Applicability to forensic applications

A study of automatic speaker recognition has been done.

Results are not reported, but actions were taken within tasks 3 and 4.

Material can be found in Kinnunen’s PhD thesis [4] and Niemi-Laitinen’s presentation.



2. Support for large databases

- Not yet done -



3. LTAS and other features


Automatic calculation of LTAS done. Integration to WinSprofiler in progress. Reporting in progress.

Benefit of LTAS is merely its speed and ease of use: no difficult control parameters.

No additional benefit to recognition accuracy. MFCC includes the same information.

Could be used for preliminary pruning in case of large datasets.
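
For readers unfamiliar with LTAS (long-term average spectrum), the idea can be sketched as averaging the short-time power spectra over the whole recording. The code below is only an illustration under assumptions: the frame length, hop size, Hann window and the synthetic test tone are hypothetical choices, not the settings used in SRLIB or WinSprofiler. A plain DFT keeps the sketch dependency-free; a real implementation would use an FFT.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one real-valued frame via a plain DFT (bins 0..N/2)."""
    n = len(frame)
    # A Hann window reduces leakage caused by cutting the signal into frames.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum


def ltas(signal, frame_len=64, hop=32):
    """Average the per-frame power spectra over the whole signal."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [power_spectrum(signal[s:s + frame_len]) for s in starts]
    if not spectra:
        raise ValueError("signal shorter than one frame")
    # zip(*spectra) groups the values of one frequency bin across all frames.
    return [sum(column) / len(spectra) for column in zip(*spectra)]


# Synthetic test tone: 1 kHz sine sampled at 8 kHz (hypothetical values).
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
profile = ltas(tone)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
print(peak_bin * fs / 64)  # frequency (Hz) of the strongest LTAS bin
```

The result is one fixed-length vector per recording, with frame length as the only real parameter, which is in line with the "no difficult control parameters" remark and makes a cheap first-pass comparison plausible before a costlier MFCC match.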



Noise robustness of F0 feature

Results reported in [3, 5]



4. Voice activity detection

Software for speech segmentation (VoiceGrep).

Command line version for Linux.

Windows version in WinSprofiler.

Testing done in SIPU laboratory.

Labtec® pc mic 333, 44.1 kHz

Recordings were amplified by 24 dB with the Audacity audio editor.
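
The slides do not describe how VoiceGrep decides what is speech. As a point of reference only, the simplest kind of voice activity detector thresholds the short-time energy; the sketch below shows that baseline together with the conversion of a 24 dB gain to a linear amplitude factor. The function names, frame length, threshold and test signal are hypothetical, and this is not VoiceGrep's algorithm.

```python
import math


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def frame_energies_db(samples, frame_len):
    """Mean energy of each non-overlapping frame in dB (floored to avoid log(0))."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(max(mean_sq, 1e-12)))
    return energies


def detect_segments(samples, frame_len, threshold_db):
    """Return (start_frame, end_frame) pairs where energy exceeds the threshold."""
    energies = frame_energies_db(samples, frame_len)
    segments, start = [], None
    for i, energy in enumerate(energies):
        active = energy > threshold_db
        if active and start is None:
            start = i                      # segment begins
        elif not active and start is not None:
            segments.append((start, i))    # segment ends before frame i
            start = None
    if start is not None:                  # signal ended while still active
        segments.append((start, len(energies)))
    return segments


# Synthetic signal: quiet, loud, quiet (hypothetical amplitudes).
signal = [0.001] * 400 + [0.5] * 300 + [0.001] * 300
segments = detect_segments(signal, frame_len=100, threshold_db=-30.0)
print(segments)
print(round(db_to_gain(24), 2))  # amplitude factor for a 24 dB boost
```

A detector based on energy alone reacts to any loud sound, speech or not, so practical systems add spectral or periodicity cues on top of it.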



4a. Test material and results


Material

4 hours in total.

Poor-quality recordings: 11-bit data, of which 4-5 bits are information and the rest noise.

VoiceGrep made 168 detections:

56 speech (33%)

112 non-speech (67%)

Material included 71 real speech segments:

Average segment length 16 s.

VoiceGrep found 25 of these (35%)
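
The percentages above follow directly from the counts. The short calculation below reproduces them and also estimates how much of the 4-hour material is speech; the terms "precision" and "recall" are labels added here, not used on the slides, and the speech-time estimate assumes the 16 s average applies to all 71 segments.

```python
detections = 168          # segments reported by VoiceGrep
speech_detections = 56    # detections that contained speech
true_segments = 71        # real speech segments in the material
found_segments = 25       # real segments that VoiceGrep found
total_minutes = 4 * 60    # length of the test material

precision = speech_detections / detections
recall = found_segments / true_segments
false_share = (detections - speech_detections) / detections
speech_seconds = true_segments * 16  # average segment length 16 s

print(f"precision {precision:.1%}, recall {recall:.1%}")
print(f"false detections {false_share:.1%}")
print(f"speech: about {speech_seconds / 60:.0f} of {total_minutes} minutes")
```

By this estimate speech makes up roughly 19 of the 240 minutes, so the detector is searching for a small fraction of the recording.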






4b. VoiceGrep overall results



4c. VoiceGrep example (correct detection)

Start of the speech is detected correctly

End of the speech is missed



4d. VoiceGrep example (false detections)

[Figure labels: door opening, running water, walking, door]



4e. VoiceGrep example (missed speech segment)

[Figure labels: door, speech and walking, door]



4f. Entire data set (4 hours)

[Figure labels: data, speech segments, result of VoiceGrep]



5. Speaker verification in noisy environment

Systematic testing of the parameters that affect accuracy has been reported in [1].

Applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen’s PhD thesis [4].

Additional testing will be done if time permits.




5a. Text-dependent verification in access control



Utilizing time series information improves recognition.



Best result if everyone has their own password.
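
One common way to use time series information in text-dependent verification is dynamic time warping (DTW), which aligns the frame sequence of a spoken password against an enrolled template. The slides do not name the method that was used, so DTW here is an assumption, and the one-dimensional "feature vectors" are toy values chosen for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    def dist(u, v):
        # Euclidean distance between two frames.
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(a[i - 1], b[j - 1]) + step
    return cost[n][m]


template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
# Same utterance spoken at half speed: every frame repeated.
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [1.0], [0.0], [0.0]]
# Same frames as the template but in a different order.
shuffled = [[2.0], [0.0], [1.0], [0.0], [1.0]]

print(dtw_distance(template, slow))
print(dtw_distance(template, shuffled) > 0)
```

The slower utterance matches the template with zero cost, while the reordered one does not, even though it contains exactly the same frames. A model that ignores frame order cannot tell these two cases apart, which is one way to read the gain from sequence information and from personal passwords.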



6. Prototype for access control

Microphone

Motion detector

Emergency button



7. Calling elevator (technical requirements)

Communication with OPC server:

Implemented with Matrikon server.

Program logic to elevator implemented:

Reads variables from OPC server.

Interprets and shows elevator status.

Includes recording logic.

Speaker and voice related functionality:

Not yet implemented.

Main window does not show anything yet.



8. Usability for detecting sound sources in general

- Not yet done -



9. Keyword search

- Not yet done -



Publications (season 3)

1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 503-506, October 2005.

2. H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 551-554, October 2005.

3. T. Kinnunen, R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 567-570, October 2005.




Theses (season 3)


4. T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.

5. R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.




Application scenarios

Speaker Recognition

Speaker Identification: Whose voice is this?

Speaker Verification: Is this Bob’s voice? (Claim)

[Figure labels: Identification, "?"; Verification, "+", "Imposter!"]
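
The difference between the two scenarios can be made concrete with a small sketch. It assumes each enrolled speaker is represented by a VQ codebook and that a sample is scored by its mean squared error (MSE) against that codebook; VQ and MSE are listed among the SRLIB components in this deck, but the codebooks, the sample and the threshold below are made-up values.

```python
def distortion(frames, codebook):
    """Mean squared error between frames and their nearest codebook vectors."""
    total = 0.0
    for frame in frames:
        total += min(sum((x - c) ** 2 for x, c in zip(frame, code))
                     for code in codebook)
    return total / len(frames)


def identify(frames, models):
    """Identification: which enrolled speaker matches best? (No claim given.)"""
    return min(models, key=lambda name: distortion(frames, models[name]))


def verify(frames, claimed, models, threshold):
    """Verification: accept the claimed identity only if the match is close enough."""
    return distortion(frames, models[claimed]) < threshold


# Hypothetical two-dimensional codebooks for two enrolled speakers.
models = {
    "Alice": [[0.0, 0.0], [1.0, 1.0]],
    "Bob": [[5.0, 5.0], [6.0, 6.0]],
}
sample = [[5.1, 4.9], [6.2, 5.8]]

print(identify(sample, models))
print(verify(sample, "Bob", models, threshold=1.0))
print(verify(sample, "Alice", models, threshold=1.0))
```

Identification compares the sample against every model, so its cost grows with the database, whereas verification scores only the claimed model and depends on a threshold that has to be tuned for the environment.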



Software 1: Console program



Software 2: WinSprofiler



Software 3: Symbian

Port to Symbian OS with
Series 60 UI platform



Software 4: Door SProfiler

Opening laboratory door by speaking



Software 5: Lift SProfiler

(to appear in season 4 perhaps…)



Future development (1)

Software integration

[Diagram labels: SRLIB (MSE, GMM, MFCC, VQ); WinSprofiler, Windows (JoY); Mobile, Series 60 (JoY); VAD; DB support; LTAS; F0 extraction; fusion by weighted MSE; keyword search]
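
"Fusion by weighted MSE" is not specified further on the slide. One plausible reading is a weighted sum of the per-feature match distances for each speaker; the sketch below illustrates that reading only. The feature names, scores and weights are hypothetical, and it assumes the distances from different features have already been normalized to a comparable scale.

```python
def fuse(scores_by_feature, weights):
    """Weighted sum of per-feature distances for each speaker (lower is better)."""
    speakers = next(iter(scores_by_feature.values())).keys()
    return {
        speaker: sum(weights[feature] * scores[speaker]
                     for feature, scores in scores_by_feature.items())
        for speaker in speakers
    }


# Hypothetical normalized distances of one test sample to two speaker models.
scores = {
    "mfcc": {"Alice": 0.40, "Bob": 0.35},
    "f0": {"Alice": 0.10, "Bob": 0.60},
}
weights = {"mfcc": 0.7, "f0": 0.3}

fused = fuse(scores, weights)
best = min(fused, key=fused.get)
print(best, round(fused[best], 3))
```

In this toy case MFCC alone slightly favours Bob, but the fused score favours Alice because the second feature disagrees strongly. Whether fusion helps in practice depends on how independent the features are and how the weights are chosen.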



Future development (2)

Applications

[Diagram labels: srlib; DB; classifier fusion; segmentation; VAD; verification; keyword search; common speaker recognition app. interface; access control; speech analyzer tool; forensic applications; calling elevator; call center]



Future development (3)

Technical development

Implement and integrate F0, maybe also formants (F1, F2).

Automatic voiced/unvoiced segmentation.

User enrollment.

Use of sequence information (triplets).

Development of the WinSprofiler software towards a voice profiler and speech analyzer tool.




Future development (4)

Elevator prototype

[Diagram labels: lift car & hardware; machine room; CAN; GW box; OPC server; Ethernet; TCP/IP; DCOM; our PC; OPC client; LiftCaller; SRLIB 3.0; approach detection; microphone; display]



Vision 1: Teleconferencing

[Diagram labels: participants Alice, Bob, Minna, Paul and Unknown; VPN; Speaker Recognition; results Alice, Bob, Minna, Unknown; "Verified & allowed"; "Not registered"]



Vision 2: Call center

Speech is the main tool for people in a call center.

Voice login of personnel.

Removes the need for manual entry.



Vision 3: Language recognition


A problem related to speaker recognition: the same research groups usually study both.

Not trivial to solve.

Studied a lot for Asian languages, even for rare languages that do not have any "written form".




Vision 4: Medical applications


Doctors use voice to record summaries of patient meetings.


Access by keyword search.


Annotation.


Authentication of speaker.



Thank you for your patience!

Questions?