Voice as a User Interface: Tuning Speech


Aug 7, 2012



Baiju D. Mandalia, PhD

Senior Technical Staff Member, IBM Corporation

Voice as a User Interface


Human factors standards


Error handling


Prompt Design


Dialog Design


Grammar Design


Tuning voice applications


ETSI ES 202 077 for spoken commands




Context-independent common commands
(like Main Menu, Operator, Goodbye, etc.)



Context-dependent common commands
(Help, Repeat)



Core commands
(Yes, No, Stop)



Digits
(including zero, Oh, double, etc.)



Name and Digit dialing
(like Home, Work, Mobile)



Basic Call Handling
(Answer, Divert all calls to, Transfer etc.)



Media control
(Play, Pause, Continue etc.)



Browseable List Navigation
(Next, Continue, Details)



Editing Commands
(Delete, Save, Record etc.)



Device settings
(Volume up/ Louder, Volume down/Quieter)



Word-spotting mode
(Wake-up, to activate other modal functions in the list above)
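The Digits entry above mentions "zero", "Oh", and "double". As a rough illustration of what an application has to do with such a recognition result, the sketch below turns a recognized word sequence into a digit string. The word table and the function name `parse_spoken_digits` are assumptions made for this example; they are not taken from the ETSI document.

```python
# Hypothetical spoken-digit vocabulary (an assumption, not quoted from ETSI).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def parse_spoken_digits(utterance: str) -> str:
    """Convert a recognized word sequence such as 'double five oh' to '550'."""
    digits = []
    repeat = 1  # how many times the next digit word should be emitted
    for word in utterance.lower().split():
        if word == "double":
            repeat = 2
            continue
        if word not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(DIGIT_WORDS[word] * repeat)
        repeat = 1
    if repeat != 1:
        raise ValueError("'double' must be followed by a digit word")
    return "".join(digits)


print(parse_spoken_digits("double five oh one zero"))  # 55010
```

Treating "oh" and "zero" as the same digit keeps the application logic independent of which word the caller prefers.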


Dialog Components for handling errors
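One common way to structure such a component is escalating reprompts: each consecutive no-input or no-match event gets a more explicit prompt, and after a fixed number of errors the caller is handed off. The sketch below shows that idea only; the prompt wording, the `MAX_ERRORS` limit, and the function name `next_action` are all assumptions made for illustration, not material from the slides.

```python
# Hypothetical prompts; wording and retry limit are assumptions.
ESCALATING_PROMPTS = {
    "noinput": [
        "Sorry, I didn't hear you. Which account?",
        "Please say checking or savings, or say help.",
    ],
    "nomatch": [
        "Sorry, I didn't understand. Which account?",
        "You can say checking or savings.",
    ],
}
MAX_ERRORS = 3


def next_action(event: str, error_count: int) -> tuple[str, str]:
    """Return (action, text) for the error_count-th consecutive error (1-based)."""
    if error_count >= MAX_ERRORS:
        return ("transfer", "Let me connect you to an operator.")
    prompts = ESCALATING_PROMPTS[event]
    # Reuse the last, most explicit prompt if errors outnumber the prompts.
    index = min(error_count - 1, len(prompts) - 1)
    return ("reprompt", prompts[index])
```

For example, `next_action("nomatch", 2)` returns the more explicit second prompt, and a third error of either kind returns the transfer action.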

Prompt Design


Pre-recorded vs Text to Speech


Guiding the caller


SSML


Dictionaries
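To make the SSML item concrete, here is a minimal sketch that builds a short text-to-speech prompt with Python's standard `xml.etree.ElementTree`. The `speak`, `say-as`, `break`, and `emphasis` elements are part of W3C SSML 1.0; the prompt wording, the function name `build_prompt`, and the `interpret-as="digits"` value are assumptions, and the set of supported `interpret-as` values depends on the speech synthesizer.

```python
import xml.etree.ElementTree as ET


def build_prompt(account_digits: str) -> str:
    """Build a small SSML prompt that reads an account number digit by digit."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    speak.text = "Your account number is "
    # Ask the synthesizer to read the value as separate digits (engine-dependent).
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "digits"})
    say_as.text = account_digits
    # Short pause before the guidance part of the prompt.
    pause = ET.SubElement(speak, "break", {"time": "500ms"})
    pause.tail = "Say "
    emphasis = ET.SubElement(speak, "emphasis")
    emphasis.text = "main menu"
    emphasis.tail = " at any time."
    return ET.tostring(speak, encoding="unicode")


print(build_prompt("4071"))
```

Generating the markup from code rather than by string concatenation keeps the prompt well-formed when the inserted value contains characters such as `&` or `<`.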

Dialog Design


Consistency


Make sure caller experience is uniform during call


Barge-in


Design the dialog so that experienced users can interact more efficiently and barge in


Using N-best


Exploit N-best results to provide disambiguation in complex tasks
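A sketch of how an N-best list might drive disambiguation: if several hypotheses score close to the best one, the application offers them to the caller instead of guessing. The thresholds (`ACCEPT`, `REJECT`, `MARGIN`), the `Hypothesis` type, and the function name `decide` are hypothetical; confidence is assumed to be normalized to the range 0.0 to 1.0, and real values would come out of tuning.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


# Hypothetical thresholds; real values come out of tuning.
ACCEPT = 0.80
REJECT = 0.30
MARGIN = 0.10


def decide(nbest: list[Hypothesis]) -> tuple[str, list[str]]:
    """Return (action, candidates) for an N-best list sorted best-first."""
    viable = [h for h in nbest if h.confidence >= REJECT]
    if not viable:
        return ("reject", [])
    best = viable[0]
    # Hypotheses whose score is within MARGIN of the best one (best included).
    close = [h.text for h in viable if best.confidence - h.confidence <= MARGIN]
    if len(close) > 1:
        return ("disambiguate", close)
    if best.confidence >= ACCEPT:
        return ("accept", [best.text])
    return ("confirm", [best.text])
```

With hypotheses "Austin" at 0.62, "Boston" at 0.58, and "Houston" at 0.35, `decide` returns the `disambiguate` action with Austin and Boston, which the dialog could turn into "Did you say Austin or Boston?".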

Grammar Design


Unknown pronunciations


Acoustic Confusability


Grammar Coverage


Complexity


Dynamic Application Development


Weighting of more common words
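Dynamic grammar generation and weighting can be combined: the sketch below builds a W3C SRGS XML grammar at run time and gives each phrase a `weight` proportional to how often it was observed. The `grammar`, `rule`, `one-of`, and `item` elements and the `weight` attribute come from SRGS 1.0; the usage counts, the rule name, and the function name `build_weighted_grammar` are assumptions, and how strongly a given recognizer honors weights varies.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_weighted_grammar(rule_id: str, counts: dict[str, int]) -> str:
    """Build an SRGS XML grammar whose items are weighted by relative frequency.

    Every count is assumed to be positive so that no item gets a zero weight.
    """
    total = sum(counts.values())
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": "en-US",
        "mode": "voice",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    # Most frequent phrases first, purely for readability of the output.
    for phrase, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        item = ET.SubElement(one_of, "item", {"weight": f"{count / total:.2f}"})
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical usage counts, e.g. taken from call logs.
print(build_weighted_grammar("account", {"checking": 70, "savings": 25, "mortgage": 5}))
```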

Tuning Voice Applications


Timeouts


Lexicons


Weighting


Confidence levels


Speed vs Accuracy


Sensitivity


Acoustic model adaptation
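Confidence-level tuning is usually an empirical exercise. As a hedged sketch, the code below sweeps candidate thresholds over logged results and counts the errors each one would cause. The data layout (pairs of confidence and a correct/incorrect flag for in-grammar utterances), the sample values, and the rule of minimizing total errors are all assumptions; a real deployment may weigh false accepts more heavily than false rejects.

```python
def sweep_thresholds(results, thresholds):
    """For each threshold, count false accepts and false rejects.

    `results` holds (confidence, recognized_correctly) pairs for utterances
    that the caller actually spoke in grammar.
    """
    rows = []
    for threshold in thresholds:
        false_accepts = sum(1 for conf, ok in results if conf >= threshold and not ok)
        false_rejects = sum(1 for conf, ok in results if conf < threshold and ok)
        rows.append((threshold, false_accepts, false_rejects))
    return rows


def best_threshold(results, thresholds):
    """Pick the threshold with the fewest total errors (ties go to the earlier one)."""
    return min(sweep_thresholds(results, thresholds), key=lambda row: row[1] + row[2])[0]


# Hypothetical logged results: (confidence, recognized correctly?)
logged = [(0.95, True), (0.85, True), (0.70, False),
          (0.65, True), (0.40, False), (0.30, False)]
for row in sweep_thresholds(logged, [0.3, 0.5, 0.7, 0.9]):
    print(row)
```

Raising the threshold trades false accepts for false rejects, which is why the value is best chosen from transcribed call data rather than by guesswork.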

WebSphere Voice Server

Tuning Tools

Tuning tools features


Eclipse based


User-friendly graphical interface


Tightly integrated for repetitive testing


Assistive tools like pronunciation builder for tuning


Validating grammars on the MRCP Server

Enumerating a grammar (random)
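The idea behind random enumeration can be sketched independently of any tool: walk the grammar from its root rule and pick one alternative at random at each choice point, which produces sample sentences the grammar accepts. The toy grammar, its dictionary representation, and the function names below are assumptions for illustration; this is not how the WebSphere Voice Server tooling is implemented.

```python
import random

# Toy grammar (hypothetical): each rule maps to a list of alternatives, and
# each alternative is a sequence of words or <rule> references.
GRAMMAR = {
    "<command>": [["<action>", "<target>"], ["<action>", "my", "<target>"]],
    "<action>": [["check"], ["transfer"]],
    "<target>": [["balance"], ["savings"]],
}


def random_sentence(symbol: str, rng: random.Random) -> list[str]:
    """Expand `symbol` by choosing one alternative at random at every rule."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(random_sentence(part, rng))
    return words


def enumerate_random(count: int, seed: int = 0) -> list[str]:
    """Return `count` random sentences; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [" ".join(random_sentence("<command>", rng)) for _ in range(count)]


for sentence in enumerate_random(5):
    print(sentence)
```

Reading through a random sample is a quick way to spot phrases the grammar accepts that no caller would say, or obvious phrasings it is missing.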

Testing grammars with text

Pronunciation Builder

Testing grammars with speech

Voice Trace Analyzer for Tuning

1. Set the Voice Server trace specification.
2. Run your voice application (generate trace data).
3. Run the WVS Collector tool (optional for Integrated Runtime Environment).
4. Import the data into the Voice Trace Analyzer.

Voice Trace Analyzer Views

Transcriptions within tool


In Grammar


Accuracy



Correct Accept (CA)



Correct Reject (CR)



False Accept (FA)



FA-In



FA-Out



False Rejects (FR)

              In Grammar   Out of Grammar
Match         CA
No Match      FR           CR
False Accept  FA-In        FA-Out
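The categories above can be expressed as a small classification routine over transcribed utterances. This is a sketch under assumptions: the field layout (transcript, in-grammar flag, hypothesis, accepted flag) and the function names are hypothetical and do not reflect the Voice Trace Analyzer's actual data format.

```python
from collections import Counter


def classify(transcript, in_grammar, hypothesis, accepted):
    """Label one utterance as CA, CR, FA-In, FA-Out, or FR.

    transcript: what the caller actually said (human transcription)
    in_grammar: whether the transcript is covered by the active grammar
    hypothesis: what the recognizer returned
    accepted:   whether the result was accepted (match) or rejected (no match)
    """
    if in_grammar:
        if not accepted:
            return "FR"  # a valid utterance was rejected
        return "CA" if hypothesis == transcript else "FA-In"
    # Out of grammar: the only correct behavior is to reject.
    return "FA-Out" if accepted else "CR"


def summarize(utterances):
    """Count each outcome for an iterable of classify() argument tuples."""
    return Counter(classify(*u) for u in utterances)


# Hypothetical transcribed utterances.
sample = [
    ("checking", True, "checking", True),    # CA
    ("savings", True, "checking", True),     # FA-In
    ("savings", True, "savings", False),     # FR
    ("um hold on", False, "savings", True),  # FA-Out
    ("um hold on", False, "", False),        # CR
]
print(summarize(sample))
```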

References


WebSphere Voice Server
http://www.ibm.com/software/pervasive/voice_server

WebSphere Voice Server Information Center
http://publib.boulder.ibm.com/infocenter/pvcvoice/51x/index.jsp

WebSphere Voice Zone
http://www.ibm.com/developerworks/websphere/zones/voice

IBM WVS for Multiplatforms V5.1.1/V5.1.2 Handbook
http://www.redbooks.ibm.com/abstracts/sg246447.html?Open

Speech User Interface Guide
http://www.redbooks.ibm.com/redpieces/abstracts/redp4106.html?Open