VoiceXML: History, Background, Language, Tools

Jul 30, 2012





Acknowledgements


Prof. McTear, Natural Language Processing, http://www.infj.ulst.ac.uk/nlp/index.html, University of Ulster.

Why VoiceXML?


VoiceXML provides a way to link with the Internet without the need for a PC

Anytime/anywhere access: phone, wireless phone, wireless PDA, etc.

Can support interesting location-based services.

Standard language enables portability.

High-level domain-specific language simplifies application development.

Why VoiceXML now?


Technological advances in computer hardware

  Miniaturisation

  Dramatic price reductions

Improvements in algorithms for speech technology

  More accurate speech recognition

  Speech recognition in noisy environments

Emergence of dialogue design methods

  Prompt design

  Grammar design

  Call flow design

Benefits


End Customers

  Easier-to-use applications

Network Providers

  New services and revenue streams

  More cellular airtime, more subscription services

  Multiple applications share network resources

Application/Content developers

  Leverage the Internet for application development and delivery.

  Web development model

  Portability of telephony infrastructure

VoiceXML Genealogy

Predecessor languages

  AT&T Bell Labs: PML

  Lucent: PML

  AT&T: PML

  IBM: SpeechML

  Motorola: VoxML

Versions

  VoiceXML 0.9: Aug 1999 (VoiceXML Forum)

  VoiceXML 1.0: March 2000 (VoiceXML Forum)

  VoiceXML 2.0: Oct 2001 (W3C/Voice Browser)

  VoiceXML 2.1: Jan 2004 (W3C/Voice Browser)

VoiceXML Forum


Industry organisation


Founded by AT&T, IBM, Lucent and Motorola in March 1999

“Chartered with establishing and promoting VoiceXML, a new standard essential to making Internet content and information accessible via voice and telephone”

VoiceXML Specification v0.9 released August 1999

Specification v1.0 released March 2000

Submitted to W3C (acknowledged 22 May 2000)

VoiceXML 2.0 released as a W3C Candidate Recommendation January 2003

VoiceXML Developer Certification Exam launched, 1 June 2004


Over 380 member companies


http://www.voicexml.org

World Wide Web Consortium (W3C)


International Standards for the Web

  HTML, XML, XHTML etc.

Voice Browser "Activity" started October 1998

Wide variety of interests:

  Voice-augmented graphical browsers

  "Voice Browsing" web content via phone

  Voice telephony services via domain-specific markup languages

Voice Browser Working Group

  Defining standards for:

    Dialogue Markup

    Grammar formats

    Speech Synthesis Markup

    Natural Language Semantics

http://www.w3.org/Voice/

Spoken Dialog Example: Communicator

AT&T Communicator

C: How can I help you?

H: From Washington DC to Denver roundtrip.

C: Flying from Washington to Denver, roundtrip flight. O.k., and what date do you want to leave?

H: Next Friday, in the morning.

C: Leaving April 27th. There is an outbound flight from 6:35 am, would you like me to hold that flight?

H: No, do you have a later flight?

C: O.k. A later flight is with AA departing at 3:32 pm and arriving Denver at 8:02 pm. Would you like me to hold that flight?

H: Sure.

C: O.k., and what date do you want to return?



The Speech Dialog Circle

[Figure: the speech dialog circle]

Speech -> ASR (Automatic Speech Recognition) -> Words spoken: "I need a flight from Washington DC to Denver roundtrip"

Words spoken -> SLU* (Spoken Language Understanding) -> Meaning: ORIGIN_CITY: WASHINGTON, DESTINATION_CITY: DENVER, FLIGHT_TYPE: ROUNDTRIP

Meaning -> DM* (Dialog Management, using Data and Rules) -> Action: GET DEPARTURE DATE

Action -> RG (Response Generation) -> "Which date do you want to fly from Washington to Denver?" -> Speech

VoiceXML Architectural Model

[Figure: VoiceXML architectural model]

PSTN (Public Switched Telephone Network) <-> VoiceXML Browser <-> Application Web Server (VoiceXML code delivered over HTTP)

VoiceXML Browser: VoiceXML Interpreter, ASR, TTS, Audio, DTMF, Telephony

Application Web Server: Service Logic, Content and Data, Transaction Processing, Database Interface


W3C Speech Interface Framework

[Figure: W3C Speech Interface Framework]

Components: Telephone System, World Wide Web, Dialogue Manager, Context Interpretation, Media Planning, Language Understanding, ASR, DTMF tone recogniser, TTS, Language Generation, Prerecorded audio player, Reusable components

Markup languages: VoiceXML, Speech Synthesis ML, Speech Recognition Grammar ML, N-gram Grammar ML, Natural Language Semantics ML, Lexicon

Key Documents

Voice Extensible Markup Language (VoiceXML) Version 2.0, W3C Recommendation 16 March 2004

  http://www.w3.org/TR/2004/PR-voicexml20-20040203/

Speech Recognition Grammar Specification Version 1.0, W3C Recommendation 16 March 2004

  http://www.w3.org/TR/2003/PR-speech-grammar-20031218/

Speech Synthesis Markup Language Version 1.0, W3C Candidate Recommendation 18 December 2003

  http://www.w3.org/TR/2003/CR-speech-synthesis-20031218/

Voice Browser Call Control: CCXML Version 1.0, W3C Working Draft 12 June 2003

  http://www.w3.org/TR/ccxml/

Semantic Interpretation for Speech Recognition, W3C Working Draft 1 April 2003

  http://www.w3.org/TR/semantic-interpretation/

Voice Applications: Some Examples


Basic interactive voice response (IVR)

  Computer: For stock quotes, press 1. For trading, press 2.

  Human: (presses DTMF “1”)

Basic speech IVR

  Computer: Say the stock name for a price quote.

  Human: SmartSpeech Technologies

Advanced speech IVR

  C: Stock Services, how may I help you?

  H: Uh, what’s SmartSpeech trading at?

“Near-natural language” dialogue

  C: How may I help you?

  H: Um, yeah, I’d like to get the current price of SmartSpeech Technologies

  C: SmartSpeech is up two at sixty eight and a half.

  H: OK. I want to buy one hundred shares at market price.

  C: …


Voice Applications and the Internet


Information retrieval

  News, weather, sports, traffic, stock quotes.

e-Transactions (e-commerce, e-tailing, etc.)

  Customer service: package tracking, account status, call centres.

  Financial: banking, stock trading.

Telephone services

  Auto attendant, call routing, email reader.

  Personal voice activated dialling.

  One-number find-me services.

Travel

  Driving directions, flight information.

Games and Entertainment

  Horoscopes, trivia, music, movies.

Banking demo (Loquendo)

XML


Extensible Markup Language developed by the World Wide Web Consortium

Used for adding annotations to text. The annotations describe the characteristics and properties of the text.

XML annotations are used for two general purposes:

  XML interpreters present the text to the user via standard formats and layouts derived from the XML annotations.

  Users formulate questions using XML annotations to retrieve specific text.


XML Languages

Languages built on XML: SRGS, SSML, SMIL, SVG, XHTML, CCXML, VoiceXML

XML Languages


XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML

SVG (Scalable Vector Graphics) is a language for describing two-dimensional graphics and graphical applications in XML

SMIL: Synchronized Multimedia Integration Language allows integrating a set of independent multimedia objects into a synchronized multimedia presentation. Using SMIL, an author can

  describe the temporal behavior of the presentation

  describe the layout of the presentation on a screen

  associate hyperlinks with media objects




XML Languages


VoiceXML

SSML: The Speech Synthesis Markup Language Specification provides a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms.

SRGS: Speech Recognition Grammar Specification defines syntax for representing grammars for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer.

CCXML: Call Control eXtensible Markup Language provides telephony call control support for VoiceXML
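As a rough illustration of the SSML controls described above (pronunciation, volume, pitch, rate), a standalone SSML document might look like the sketch below. The wording is invented for illustration, and how strongly each attribute changes the voice depends on the synthesis platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-GB">
  <!-- Emphasis on a key phrase (illustrative text) -->
  Welcome to the <emphasis level="strong">Student System</emphasis>.
  <!-- A pause between sentences -->
  <break time="500ms"/>
  <!-- Rate, pitch and volume are controlled with prosody -->
  <prosody rate="slow" pitch="low" volume="loud">
    Please listen carefully to the following options.
  </prosody>
  <!-- Pronunciation can be overridden by substituting spoken text -->
  These are <sub alias="World Wide Web Consortium">W3C</sub> standards.
</speak>
```

The same elements can also appear inside a VoiceXML prompt, since VoiceXML prompts accept SSML markup.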





XML Syntax


Elements

  represented by tags which surround the text being annotated

  <prompt> hello </prompt>

Standalone tag

  <prompt> Welcome to the Student System main menu <break/> The system provides details on … </prompt>

Embedded tag

  <prompt> Welcome to the Student System <emphasis> main menu </emphasis> </prompt>

Attributes

  <grammar type="application/srgs+xml" root="source">

Comment

  begins with “<!--” and ends with “-->”.

Special characters

  < (less than)      &lt;

  > (greater than)   &gt;

  & (ampersand)      &amp;
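A minimal sketch of where the special-character escapes matter in practice: prompt text containing an ampersand, and a condition containing a less-than sign. The company name and the condition are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- An ampersand in prompt text must be written as an entity -->
      <prompt>Welcome to Smith &amp; Jones.</prompt>
      <!-- A less-than sign inside an attribute value must be escaped too -->
      <if cond="1 &lt; 2">
        <prompt>One is less than two.</prompt>
      </if>
    </block>
  </form>
</vxml>
```

Without the escapes an XML parser would reject the document before the voice browser ever interpreted it.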

VoiceXML Example

VoiceXML 2.0 dialog language:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field>
      <!-- the prompt and grammar go here -->
    </field>
  </form>
</vxml>

Speech Synthesis Markup:

<prompt>
  say <emphasis> students </emphasis> or <emphasis> courses </emphasis>
</prompt>

Speech Recognition Grammar, with a Semantic Tag in each item:

<grammar type="application/srgs+xml" root="choice">
  <rule id="choice">
    <one-of>
      <item> students <tag> $="std" </tag> </item>
      <item> courses <tag> $="crs" </tag> </item>
    </one-of>
  </rule>
</grammar>
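The fragments on this slide can be assembled into one document roughly as follows. This is a sketch under assumptions: the form id, the field name `choice` and the `filled` block are additions for illustration, and the tags use the `out` notation of Semantic Interpretation for Speech Recognition 1.0 rather than the `$=` notation on the slide, because the tag syntax a platform accepts varies with the platform and the specification version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <!-- "choice" is a hypothetical field name -->
    <field name="choice">
      <!-- Speech Synthesis Markup inside the prompt -->
      <prompt>
        Say <emphasis>students</emphasis> or <emphasis>courses</emphasis>.
      </prompt>
      <!-- Speech Recognition Grammar with a semantic tag per item -->
      <grammar type="application/srgs+xml" version="1.0"
               xml:lang="en-GB" mode="voice" root="choice"
               tag-format="semantics/1.0">
        <rule id="choice">
          <one-of>
            <item>students <tag>out = "std";</tag></item>
            <item>courses <tag>out = "crs";</tag></item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <!-- The field variable holds the semantic tag value -->
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```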

What is VoiceXML?

VoiceXML is a language for specifying voice dialogues



Output elements

  pre-recorded audio files

  text-to-speech (TTS)

Input elements

  touch-tone keys (DTMF)

  automatic speech recognition (ASR)

  Audio recordings <record>

For a complete list and description, see the VoiceXML v2.0 specification, section 1.4:

http://www.w3.org/TR/voicexml20/#dml1.4
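The output and input elements listed above might be combined as in the following sketch. The audio file name, the form id and the variable names are hypothetical, and it assumes the platform supports the built-in `digits` grammar type, which the specification leaves optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="io_demo">
    <block>
      <!-- Output: pre-recorded audio, with TTS text as the fallback -->
      <prompt>
        <audio src="welcome.wav">Welcome to the demo.</audio>
      </prompt>
    </block>

    <!-- Input: the built-in digits type accepts speech or DTMF keys -->
    <field name="pin" type="digits?length=4">
      <prompt>Please say or key in your four digit number.</prompt>
    </field>

    <!-- Input: record the caller's audio -->
    <record name="message" beep="true" maxtime="10s" finalsilence="3s">
      <prompt>Leave a message after the beep.</prompt>
      <filled>
        <!-- Play the recording back to the caller -->
        <prompt>You said <audio expr="message"/></prompt>
      </filled>
    </record>
  </form>
</vxml>
```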

VoiceXML: main concepts in voice dialogues


Session


Application


Dialog (forms and/or menus)


Subdialog


Grammar


Events


Control logic


Connection control

Application


An application is a set of documents sharing the same application root document.

Whenever the user interacts with a document in an application, its application root document is also loaded.

The application root document may contain variables and grammars that are available to other documents, e.g.

  Global commands such as ‘help’, ‘cancel’, ‘operator’, ‘start over’
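A sketch of how an application root document and a document that refers to it could look. The file name `app_root.vxml`, the variable `caller_name` and the `operator` event name are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- app_root.vxml (hypothetical file name): the application root document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Variable visible to every document in the application -->
  <var name="caller_name" expr="'guest'"/>

  <!-- Global command: saying "operator" in any dialog throws an event -->
  <link event="operator">
    <grammar type="application/srgs+xml" version="1.0"
             xml:lang="en-GB" mode="voice" root="op">
      <rule id="op">operator</rule>
    </grammar>
  </link>

  <!-- Handler shared by all documents of the application -->
  <catch event="operator">
    <prompt>Transferring you to an operator.</prompt>
    <exit/>
  </catch>
</vxml>
```

A document joins the application by naming the root in its `application` attribute, and reads root variables through the `application` scope:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="app_root.vxml">
  <form>
    <block>
      <prompt>Hello <value expr="application.caller_name"/>.</prompt>
    </block>
  </form>
</vxml>
```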

Dialog


The basic unit of interaction in a VoiceXML document is the dialog

Form

  Defines an interaction that collects values for a set of field item variables

  May consist of one or more fields

  Each field contains a prompt and a grammar that defines the set of allowable user inputs for that field

Menu

  Presents the user with a choice of options and then transitions to another dialog based on that choice

Subdialog

  Invokes a new interaction and then returns to the calling document

  Example: to validate the user’s id
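The three kinds of dialog described above could be combined as in this sketch. The dialog ids, the field names and the subdialog document `validate_id.vxml` (assumed to return a variable named `valid`) are hypothetical, and the built-in `number` type is assumed to be supported by the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Menu: offers choices and transitions to another dialog -->
  <menu id="main">
    <prompt>Say one of: <enumerate/></prompt>
    <choice next="#student_details">students</choice>
    <choice next="#course_details">courses</choice>
  </menu>

  <!-- Form: collects a value for each field item variable -->
  <form id="student_details">
    <!-- Subdialog: runs a separate interaction, then returns here -->
    <subdialog name="check" src="validate_id.vxml">
      <filled>
        <if cond="!check.valid">
          <prompt>Sorry, that id was not recognised.</prompt>
          <exit/>
        </if>
      </filled>
    </subdialog>
    <field name="year" type="number">
      <prompt>Which year of study?</prompt>
      <filled>
        <prompt>Year <value expr="year"/>.</prompt>
      </filled>
    </field>
  </form>

  <form id="course_details">
    <block>
      <prompt>Course details are not part of this sketch.</prompt>
    </block>
  </form>
</vxml>
```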

Logic, Events and Call Control


Presentation logic

  Control flow <if>, <else>, etc.

  ECMAScript client-side scripting <script>

  Server-side/dynamic content generation <submit>

Events

  Bad input <noinput>, <nomatch>

  Shorthand <help>

  <catch>, <throw>

Call Control

  <transfer>: Call transfer and bridging

  <disconnect>: Disconnect
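The elements listed above might appear together as in the following sketch. The function `isAdult`, the event name `com.example.underage`, the URL and the telephone number are all invented for illustration, and the built-in `number` type is assumed to be available.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- ECMAScript client-side scripting -->
  <script><![CDATA[
    function isAdult(age) { return age >= 18; }
  ]]></script>

  <form id="age_check">
    <field name="age" type="number">
      <prompt>How old are you?</prompt>
      <!-- Event handlers for bad input, plus the help shorthand -->
      <noinput>I did not hear anything. <reprompt/></noinput>
      <nomatch>Please say a number. <reprompt/></nomatch>
      <help>Say your age in years, for example twenty one.</help>
    </field>

    <block>
      <!-- Control flow -->
      <if cond="isAdult(age)">
        <!-- Server-side processing: send the value to a web server -->
        <submit next="http://www.example.com/register" namelist="age"/>
      <else/>
        <throw event="com.example.underage"/>
      </if>
    </block>

    <!-- Handler for the application-defined event thrown above -->
    <catch event="com.example.underage">
      <prompt>Connecting you to an adviser.</prompt>
      <goto next="#to_adviser"/>
    </catch>
  </form>

  <form id="to_adviser">
    <!-- Call control: bridge the caller to another number -->
    <transfer name="call" dest="tel:+15555550100" bridge="true">
      <filled>
        <if cond="call == 'busy'">
          <prompt>The line is busy.</prompt>
        </if>
      </filled>
    </transfer>
    <block><disconnect/></block>
  </form>
</vxml>
```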



Features outside the scope of VoiceXML

Features handled by traditional Web application programming techniques are outside the scope of VoiceXML


application logic


database operations


interfaces to legacy systems (e.g., "screen scraping")

Directed Dialogue

C: Please say the area for which you want the weather.

H: Midlands.

C: Please say the day for which you want the weather.

H: Thursday

The computer controls the sequence of the dialogue. Fields must be entered in order.

Weather Information

  Area:

  Day:
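A directed dialogue like the one above could be sketched as a form whose fields are visited in document order. The grammar entries beyond "midlands" and "thursday" and the submit URL are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <!-- Fields are visited in document order: area first, then day -->
    <field name="area">
      <prompt>Please say the area for which you want the weather.</prompt>
      <grammar type="application/srgs+xml" version="1.0"
               xml:lang="en-GB" mode="voice" root="area">
        <rule id="area">
          <one-of>
            <item>midlands</item>
            <item>north west</item>
            <item>south east</item>
          </one-of>
        </rule>
      </grammar>
    </field>

    <field name="day">
      <prompt>Please say the day for which you want the weather.</prompt>
      <grammar type="application/srgs+xml" version="1.0"
               xml:lang="en-GB" mode="voice" root="day">
        <rule id="day">
          <one-of>
            <item>thursday</item>
            <item>friday</item>
            <item>saturday</item>
          </one-of>
        </rule>
      </grammar>
    </field>

    <!-- Runs once both fields hold a value -->
    <filled mode="all" namelist="area day">
      <submit next="http://www.example.com/weather" namelist="area day"/>
    </filled>
  </form>
</vxml>
```

Because each grammar is attached to a single field, the caller can only answer the question currently being asked.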

Mixed Initiative Dialogue

Both computer and human control dialogue flow. Fields can be entered in any order; several fields can be entered with one utterance.

C: Please say the area and day for which you would like the weather.

H: Midlands Thursday

Requires more complex grammars, and a flexible dialogue flow
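One way to sketch the mixed-initiative version is a form-level grammar that can fill either or both fields, combined with an `<initial>` item for the opening question. The grammar contents, the item name `start` and the wording of the fallback prompts are assumptions, and the semantic tags follow Semantic Interpretation for Speech Recognition 1.0, which a given platform may or may not support in this form.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <!-- Form-level grammar: one utterance may fill area, day, or both -->
    <grammar type="application/srgs+xml" version="1.0" xml:lang="en-GB"
             mode="voice" root="request" tag-format="semantics/1.0">
      <rule id="request">
        <one-of>
          <item>
            <ruleref uri="#area"/> <tag>out.area = rules.area;</tag>
            <item repeat="0-1">
              <ruleref uri="#day"/> <tag>out.day = rules.day;</tag>
            </item>
          </item>
          <item>
            <ruleref uri="#day"/> <tag>out.day = rules.day;</tag>
            <item repeat="0-1">
              <ruleref uri="#area"/> <tag>out.area = rules.area;</tag>
            </item>
          </item>
        </one-of>
      </rule>
      <rule id="area">
        <one-of><item>midlands</item><item>north west</item></one-of>
      </rule>
      <rule id="day">
        <one-of><item>thursday</item><item>friday</item></one-of>
      </rule>
    </grammar>

    <!-- Opening question; after two failures fall back to the fields -->
    <initial name="start">
      <prompt>
        Please say the area and day for which you would like the weather.
      </prompt>
      <nomatch count="2">
        Let us take it one step at a time.
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <!-- Any slot the first utterance left empty is prompted for here -->
    <field name="area">
      <prompt>Which area?</prompt>
    </field>
    <field name="day">
      <prompt>Which day?</prompt>
    </field>

    <filled mode="all" namelist="area day">
      <prompt>
        Weather for <value expr="area"/> on <value expr="day"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```

The extra grammar rules and the fallback path are what the slide means by more complex grammars and a flexible dialogue flow.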

Tools for Developing VoiceXML applications


Standalone development kits

  no need for telephone or Internet connection

  can run on standard PC

  simulated telephone interface (using microphone)

Voice gateways

  include a voice browser, telephone and Internet capabilities

  application can be hosted on the gateway, developer does not require telephony hardware and software

  developer accounts require sign-up (usually free)

  additional facilities: syntax checking, real-time debugging, documentation, libraries of audio clips, grammars, code samples

  mainly US toll-free numbers

Standalone Development Kits


IBM WebSphere Voice Toolkit 5.1

  Windows 2000, XP

  12 languages supported

Nuance V-Builder 2.0

  Windows 2000

  27 languages supported

Motorola Mobile ADK

  Windows 95, 98, 2000, NT

  simulates VoiceXML and WAP applications

IBM Voice Toolkit


Uses VoiceXML technology to enable developers to create voice-enabled Web applications and test them on a desktop workstation

Easy-to-use development environment for creating VoiceXML applications

Free download


IBM Voice Toolkit: Main features


Code editor


Re-usable dialog components


Debugging tool


Grammar building tool (SRGS XML and ABNF)


Grammar testing and enumeration


Creating and tuning pronunciations


Recording audio prompts


WebSphere Studio 5.1.0 support


Natural Language Understanding modeling tools to import, change, and export data for building models


CCXML 1.0 support


Gateways

A gateway converts the telephone line of the telephone network to the Internet Protocol world of the Internet and vice versa.

A gateway enables users to use a telephone or cell phone to interact with the computer.

Examples:

  BeVocal: http://cafe.bevocal.com

  Tellme: http://studio.tellme.com

  HeyAnita: http://freespeech.heyanita.com

  VoiceGenie: http://developer.voicegenie.com

  Voxeo: http://community.voxeo.com

  Voxpilot: http://www.voxpilot.com

BeVocal Cafe

Online VoiceXML development tool

Free sign-up

Interaction either through text-based Vocal Scripter or by telephone (US number only)

Range of developer’s tools, including VoiceXML checker, log browser, file management, etc.


Extensive documentation


VoiceXML sample code


http://cafe.bevocal.com

Voxpilot


European-based VoiceXML provider

Many European languages supported, including: Czech, Danish, French, German, Italian, Norwegian, Spanish (and others)

Voxbuilder tool

  Free sign-up

  Extensive documentation and code samples

  File manager, code validator, log browser

Access numbers throughout Europe

  e.g. UK: 0870 730 0582

  £0.0673 per minute (Daytime)

  £0.0336 per minute (Evening)

Voxpilot URL: http://www.voxpilot.com/index.php