Lecture I


17 Nov 2013


SPEECH PROCESSING

BINIT MOHANTY

binit.mohanty@gmail.com

Why Speech?


No visual contact required


No special equipment required


Can be done while doing other things



Telephones


AT&T


Mobile Phones (1G and 2G)

Speech Processing


Speech Coding


Speech Synthesis


Speech Recognition


Speaker Recognition/Verification


Dyslexia and Auditory problems



Audio Engineering

Speech Coding


Compress a Speech File


Why not use standard compression techniques?



MP3 Format


Perceptual Coding


Exploits sensory organ biases
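As an illustration of a coder that exploits a bias of the ear, here is a minimal sketch of mu-law companding, the idea behind classic 8-bit telephone speech coding. It is offered as one familiar example, not necessarily the technique the lecture had in mind. The function names are hypothetical, and the continuous formula is used here, whereas the G.711 standard itself uses a piecewise-linear approximation of it.

```python
import math

MU = 255  # mu value used by G.711 mu-law telephony


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y: float, bits: int = 8) -> int:
    """Uniformly quantize a value in [-1, 1] to an integer code."""
    levels = 2 ** bits
    return min(levels - 1, int((y + 1.0) / 2.0 * levels))


def dequantize(q: int, bits: int = 8) -> float:
    """Map an integer code back to the centre of its quantization bin."""
    levels = 2 ** bits
    return (q + 0.5) / levels * 2.0 - 1.0


def encode(x: float) -> int:
    """Compress, then quantize to 8 bits."""
    return quantize(mu_law_compress(x))


def decode(q: int) -> float:
    """Dequantize, then expand back to a linear sample."""
    return mu_law_expand(dequantize(q))


if __name__ == "__main__":
    quiet = 0.01
    companded_error = abs(decode(encode(quiet)) - quiet)
    uniform_error = abs(dequantize(quantize(quiet)) - quiet)
    print(f"mu-law error:  {companded_error:.6f}")
    print(f"uniform error: {uniform_error:.6f}")
```

Because loudness perception is roughly logarithmic, spending more of the 8-bit code space on quiet samples keeps the audible error low. The comparison at the bottom shows the quiet sample surviving companded quantization with less error than plain uniform quantization.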

Speech Synthesis


Construct Speech waveform from words


Speaker Quality and Accent


Prosody?




http://www.research.att.com/~ttsweb/tts/demo.php

Speech Recognition


Convert a sound waveform to words


The most relevant and important task in the industry


About 90% accuracy in lab conditions, much lower in factory conditions



Sphinx by CMU, ViaVoice by IBM, and the Speech SDK by Microsoft
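The slide does not say how the 90% figure is measured. A common convention, assumed here, is word accuracy, computed as 1 minus the word error rate (WER), where WER is the word-level edit distance between the reference transcript and the recognizer output, divided by the reference length. A minimal sketch, with hypothetical function names and a made-up example sentence:

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance over word lists (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("turn the machine off now", "turn machine of now")
    print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")
```

In the example, one word is dropped and one is misrecognized out of five reference words. Note that WER can exceed 1 when the recognizer inserts many extra words, so word accuracy defined this way can be negative.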

Speaker Recognition


Concerned with Biometrics


Acceptable as a verification technique


How would this be different from Speech recognition?


Speaker Quality


Prosody


Pitch, Accent etc.
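Pitch is one of the cues listed above that depends on who is speaking rather than on what is said. As a rough sketch of how it can be measured, the following estimates the fundamental frequency of a short frame by picking the strongest autocorrelation peak. The function name and the 50 to 400 Hz search range are assumptions made for illustration; practical speaker recognition systems rely on richer features than pitch alone.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency of a frame via autocorrelation.

    Returns None if no lag in the search range has positive correlation.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return None if best_lag is None else sample_rate / best_lag


if __name__ == "__main__":
    rate = 8000  # telephone-quality sampling rate
    # Synthetic 50 ms "voiced" frame: a pure 200 Hz tone.
    frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
    print(estimate_pitch_hz(frame, rate))
```

The unnormalized sum shrinks as the lag grows, which biases the search toward the first period rather than its multiples. That bias is convenient for this toy signal but is not a substitute for the octave-error handling a real pitch tracker needs.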

Dyslexia & Auditory Problems


Study Voice and Ear defects


Detect and correct Speech Disfluencies


CMU


Development of better Ear substitutes


Cochlear Implants

Audio Engineering


Adding effects to sound


Clarity of reproduction


A big industry with players like Dolby, Bose, Philips etc.



Voice Morphing!

[Audio samples: SOURCE, TARGET, CONV 1, CONV 2]

Courtesy: Hui Ye & Steve Young, Cambridge

Automatic Speech Recognition


Most Important Task


Hardest Task


Co-articulation: adjacent sounds blend into and alter one another


Overlapping speech: two speakers speaking at the same time


Speaker Variation


Spontaneity


Language Modeling


Noise Robustness

ASR: Problems

© James Glass, MIT

ASR: Method

© James Glass, MIT

ASR: Application

© James Glass, MIT

Automatic Speech Recognition

© James Glass, MIT


Speech Production