The Speech Speech


casey chesnut

brains-N-brawn.com


Madison .NET April 2007


brains-N-brawn.com


Pervasive Computing


Tablet PC (MVP 03)


Compact Framework (MVP 04)


Advanced Web Services (MVP 05)


Media Center (MVP 06)


Speech


Location Based Services


Artificial Intelligence


3D

Outline


Speech Overview


Vista Speech Recognition


SAPI 5.3 / System.Speech


Speech Server 2007

Outline: Speech Overview


Voice User Interface


How does it work?


Synthesis (TTS)


Recognition (SR)

Overview


Speech is just another presentation system


Synthesis = Output to user


Recognition = User input


Voice User Interface (VUI)

VUI Modes


Applications


Multimodal


Voice-only

VUI Tips


Don't replicate the touch-tone-based menu system


Restrict options on the main (opening) menu to 4 or fewer


Make sure your opening greeting is short


Don't design the app solely for the new user


Focus on task completion above all


What can I say?


http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx

Speech Synthesis


Text to Speech


Dynamic


Prompt database
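Dynamic synthesis and a prompt database are often combined: recorded clips where they exist, TTS for the rest. A minimal Python sketch of that idea, assuming a hypothetical dictionary of recorded phrases (the filenames are made up) and a greedy longest-match strategy; neither is taken from SAPI or Speech Server.

```python
def plan_prompt(text: str, recordings: dict[str, str]) -> list[tuple[str, str]]:
    """Cover the text with the longest recorded phrases; synthesize whatever is left.

    Returns segments like ("wav", "balance.wav") or ("tts", "forty two").
    Greedy longest-match is an assumption chosen for simplicity.
    """
    words = text.lower().split()
    plan: list[tuple[str, str]] = []
    pending_tts: list[str] = []  # consecutive words with no recording
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest candidate phrase first
            phrase = " ".join(words[i:j])
            if phrase in recordings:
                if pending_tts:
                    plan.append(("tts", " ".join(pending_tts)))
                    pending_tts = []
                plan.append(("wav", recordings[phrase]))
                i = j
                break
        else:  # no recording starts at word i, so it goes to TTS
            pending_tts.append(words[i])
            i += 1
    if pending_tts:
        plan.append(("tts", " ".join(pending_tts)))
    return plan


# Hypothetical recordings for a banking prompt.
recordings = {"your balance is": "balance.wav", "dollars": "dollars.wav"}
print(plan_prompt("Your balance is forty two dollars", recordings))
# [('wav', 'balance.wav'), ('tts', 'forty two'), ('wav', 'dollars.wav')]
```

Recorded audio sounds better but cannot cover open-ended values such as amounts, which is why the dynamic part falls back to synthesis.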

How Synthesis Works


Text parsing


Sentences, numbers, symbols, pauses


Natural language processing


Part of speech, tense


Phonemes are looked up or sounded out


Diphones are appended together


Post process audio to add emphasis


Play speech audio
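The front half of that pipeline (text parsing, phoneme lookup or sounding out, diphone units) can be sketched in Python. This toy assumes a hypothetical three-word lexicon with ARPAbet-style symbols, spells digits out one at a time, and uses a naive one-symbol-per-letter fallback for unknown words; real engines use large lexicons and trained letter-to-sound rules.

```python
import re

# Hypothetical mini-lexicon (ARPAbet-style phonemes); real engines ship tens of thousands of entries.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "hello": ["HH", "AH", "L", "OW"],
    "two": ["T", "UW"],
}

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text: str) -> list[str]:
    """Text parsing: lower-case, keep words, and spell out each digit separately."""
    tokens = re.findall(r"[a-z]+|\d", text.lower())
    return [DIGITS[int(t)] if t.isdigit() else t for t in tokens]


def letter_to_sound(word: str) -> list[str]:
    """Naive 'sound it out' fallback (assumption): one symbol per letter."""
    return [ch.upper() for ch in word]


def to_phonemes(text: str) -> list[list[str]]:
    """Look each word up in the lexicon, falling back to letter-to-sound."""
    return [LEXICON.get(w) or letter_to_sound(w) for w in normalize(text)]


def diphones(phones: list[str]) -> list[tuple[str, str]]:
    """Diphone units: each adjacent pair of phonemes, padded with silence at both ends."""
    padded = ["sil"] + phones + ["sil"]
    return list(zip(padded, padded[1:]))


print(to_phonemes("Hello 2 speech"))
# [['HH', 'AH', 'L', 'OW'], ['T', 'UW'], ['S', 'P', 'IY', 'CH']]
print(diphones(["T", "UW"]))
# [('sil', 'T'), ('T', 'UW'), ('UW', 'sil')]
```

The remaining steps (concatenating recorded diphone audio, post-processing for emphasis, playback) need an actual voice database and are left out.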

How Synthesis Works


Demo


/xnaSynth app


Article


http://www.brains-N-brawn.com/ttSpeech/


http://www.brains-N-brawn.com/xnaSynth/

(codebase from /ttSpeech)

Speech Recognition


Speech to Text


Dictation


Command and Control

How Recognition Works


Audio signal is processed


Look for signals which might be speech


Phonemes are found in audio signals


Phonemes are mapped to a dictionary of words


Dictation or grammar-based


Apply natural language processing
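The "look for signals which might be speech" step is often approximated with a short-time energy threshold. A minimal Python sketch over integer PCM samples, assuming 10 ms frames at 16 kHz (160 samples) and an arbitrary fixed threshold; real recognizers adapt to the noise floor and use spectral features rather than raw energy.

```python
from typing import Sequence


def frame_energies(samples: Sequence[int], frame_len: int) -> list[float]:
    """Mean squared amplitude of each full, non-overlapping frame (a partial tail is dropped)."""
    return [
        sum(s * s for s in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_segments(samples: Sequence[int], frame_len: int = 160,
                    threshold: float = 1e5) -> list[tuple[int, int]]:
    """Return (first_frame, end_frame) runs, end exclusive, whose energy exceeds threshold.

    frame_len=160 assumes 16 kHz audio; the threshold is an arbitrary assumption.
    """
    energies = frame_energies(samples, frame_len)
    segments: list[tuple[int, int]] = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # possible speech begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))  # back to silence
            start = None
    if start is not None:  # speech ran to the end of the buffer
        segments.append((start, len(energies)))
    return segments


# 20 ms silence, 20 ms loud square wave, 10 ms silence.
audio = [0] * 320 + [1000, -1000] * 160 + [0] * 160
print(speech_segments(audio))  # [(2, 4)]
```

Only the frames flagged here would be passed on to phoneme matching, which keeps the expensive stages from running on silence.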

How Recognition Works


Demo


/wavReader app


Article


http://www.brains-N-brawn.com/noReco/


http://www.brains-N-brawn.com/speakerVerify/

(codebase from /noReco)

Outline: Vista Speech Recognizer


Built into Vista’s shell


Microphone bar


Language support


Can be trained to improve accuracy


Command-and-control, also dictation


Automagic application support


Horrible Office integration


UAC problems

Demo


Say what you see


Show numbers


Correct


Spell it


Mouse grid


http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/

High Risk Demo

Hack

http://news.bbc.co.uk/1/hi/technology/6320865.stm



/micBarExtend


tap and talk

Narrator


Vista’s screen reader

Outline: SAPI 5.3 / System.Speech


Desktop applications


SAPI 5.3


System.Speech


SAPI 5.3


COM based


Native applications


Managed apps which need more control

System.Speech


Part of .NET 3.0 WPF


Managed wrapper built on SAPI 5.3


Simple API


Standards support (SSML, SRGS)


Language support


Vista Speech Recognition integration


Does not work in XBAP

System.Speech.Synthesis


SpeechSynthesizer


SSML


PromptBuilder


Voices

System.Speech.Synthesis


Demo


/speechSamples - /speechSynth

System.Speech.Recognition


SpeechRecognizer / SpeechRecognitionEngine


SRGS


GrammarBuilder


Advanced users


Deep-link functionality


Mixed initiative

System.Speech.Recognition


Demo


/speechSamples - /speechReco

System.Speech


Demo


/micBarExtend


/mceSapiMcpl


Article


http://www.brains-N-brawn.com/speechSamples/


http://www.brains-N-brawn.com/micBarExtend/


http://www.brains-N-brawn.com/mceSapi/

(not updated for Vista yet)

What about Mobile Devices?


OEMs can add VoiceCommand


VoiceCommand is not accessible to developers


Windows Mobile has SAPI, but no engines


Platform Builder is supposed to have engines


There are 3rd-party engines for purchase

Outline: Speech Server 2007

Speech Server 2007


Telephony Applications


Outgoing calls


Speaker Independent

Speech Server 2007


VOIP


Language support


VoiceXML / SALT


Workflow development model


Reports


Still in beta

Speech Server 2007


Speech Synthesis


Inline


PromptBuilder


SSML


Prompt databases


Speech Recognition


Inline


Dynamic Grammar


SRGS


Conversational Grammar Builder


DTMF
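DTMF is the touch-tone input a telephony application can accept alongside speech: each key is the sum of one low-group and one high-group tone from the standard keypad frequency table. An illustrative Python sketch, assuming 8 kHz audio and 50 ms tones, that synthesizes a key and detects it with the Goertzel algorithm (one common detection method). This is not Speech Server code.

```python
import math

LOW = [697, 770, 852, 941]       # row frequencies (Hz)
HIGH = [1209, 1336, 1477, 1633]  # column frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]


def dtmf_tone(key: str, rate: int = 8000, duration: float = 0.05) -> list[float]:
    """Synthesize a key as the sum of its row and column sine waves."""
    row = next(i for i, r in enumerate(KEYS) if key in r)  # StopIteration for unknown keys
    col = KEYS[row].index(key)
    n = int(rate * duration)
    return [math.sin(2 * math.pi * LOW[row] * t / rate) +
            math.sin(2 * math.pi * HIGH[col] * t / rate) for t in range(n)]


def goertzel_power(samples: list[float], freq: float, rate: int) -> float:
    """Squared magnitude of the signal's spectrum at one frequency (Goertzel recurrence)."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples: list[float], rate: int = 8000) -> str:
    """Pick the strongest low-group and high-group frequency and map the pair to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH[i], rate))
    return KEYS[row][col]


print(detect_key(dtmf_tone("7")))  # 7
```

Goertzel suits DTMF because only eight frequencies matter, so it is cheaper than a full FFT.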

VoiceXML


Declarative language


Article


http://www.brains-N-brawn.com/vxml/


http://www.brains-N-brawn.com/myVoices/


http://www.brains-N-brawn.com/voiceBio/
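VoiceXML is declarative because the document only lists form items; the interpreter decides what to ask next by visiting the first item that is not yet done. A toy Python sketch of that loop, assuming a hypothetical two-field form with grammars omitted and recognition results simulated as dicts, so one answer can fill several fields at once (mixed initiative).

```python
import xml.etree.ElementTree as ET

VXML = "{http://www.w3.org/2001/vxml}"

# Hypothetical form; grammars are omitted, so this is not a deployable VoiceXML app.
DOC = """<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form id="order">
    <field name="size"><prompt>What size?</prompt></field>
    <field name="topping"><prompt>Which topping?</prompt></field>
    <block><prompt>Thanks, your order is in.</prompt></block>
  </form>
</vxml>"""


def run_form(doc: str, utterances: list[dict[str, str]]) -> tuple[list[str], dict[str, str]]:
    """Toy form interpretation loop: always visit the first form item not yet done.

    Each utterance is a dict of field name -> value standing in for a recognition
    result; one utterance may fill several fields at once.
    """
    form = ET.fromstring(doc).find(VXML + "form")
    items = [el for el in form if el.tag in (VXML + "field", VXML + "block")]
    spoken: list[str] = []
    filled: dict[str, str] = {}
    blocks_done: set[int] = set()
    heard = iter(utterances)
    while True:
        pending = [i for i, el in enumerate(items)
                   if (el.tag == VXML + "field" and el.get("name") not in filled)
                   or (el.tag == VXML + "block" and i not in blocks_done)]
        if not pending:
            return spoken, filled
        index = pending[0]
        item = items[index]
        spoken.append(item.find(VXML + "prompt").text)
        if item.tag == VXML + "block":
            blocks_done.add(index)
        else:
            filled.update(next(heard))  # raises StopIteration if answers run out


print(run_form(DOC, [{"size": "large", "topping": "mushroom"}]))
```

Because the caller answered both questions at once, the topping prompt is never played. Real VoiceXML adds events such as noinput and nomatch that this loop leaves out.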

SALT


Yet another declarative language


Multimodal support has been dropped


Article


http://www.brains-N-brawn.com/noHands/


http://www.brains-N-brawn.com/speechMulti/


http://www.brains-N-brawn.com/tabletWeb/


http://www.brains-N-brawn.com/mceSalt/

Speech Workflow


Speech Sequence Workflow designer


Speech activities


Statement


QuestionAnswer


Debugging tools
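The QuestionAnswer idea (prompt, listen, re-prompt on failure) can be sketched independently of Speech Server. This Python function is hypothetical and is not the workflow activity's API: the recognizer is a callback that returns None for silence or no recognition, and the retry count and re-prompt wording are assumptions.

```python
from typing import Callable, Optional


def question_answer(prompt: str,
                    recognize: Callable[[], Optional[str]],
                    valid: set[str],
                    max_attempts: int = 3,
                    say: Callable[[str], None] = print) -> Optional[str]:
    """Hypothetical QuestionAnswer step: ask, listen, re-prompt until valid or out of attempts."""
    for attempt in range(1, max_attempts + 1):
        # First attempt plays the plain prompt; retries apologize first.
        say(prompt if attempt == 1 else f"Sorry, I didn't get that. {prompt}")
        answer = recognize()  # None stands for silence or no recognition
        if answer in valid:
            return answer
    return None  # caller decides what happens next, e.g. transfer to an operator


replies = iter([None, "banana", "billing"])
result = question_answer("Say billing or support.", lambda: next(replies),
                         {"billing", "support"})
print(result)  # billing
```

Capping the attempts keeps callers from being stuck in a loop, in line with the "focus on task completion" VUI tip.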

Speech Workflow


Demo


/speechTextAdv


/speakerVerify


/mobileRecord


Article


http://www.brains-N-brawn.com/speechTextAdv/


http://www.brains-N-brawn.com/speakerVerify/

Where


Accessibility


Telephony


Telematics


Home automation


Mobile Devices / Tablets


Gaming


Warehouses




Possible Future


Telematics


Service Pack for Office Support


Exchange Server 2007


Speech Server 2007 release


Rumors that Windows Mobile will get a public API


Dictation has room to improve


Hope that System.Speech will ultimately work in XBAP

Questions