Speech Products






Fall 2001

Mary L. Manfredi


This paper focuses mainly on the speech products available on the market
today. I will be reviewing products from the following companies: Lernout &
Hauspie (L&H), Nuance, Voice Pilot, Philips, SpeechWorks, AT&T and IBM.
Speech products fall into several categories. Some products can be purchased
and used directly by a customer. Others are used in support of a complete
speech solution. These products are used by integrators and developers who
provide complete speech solutions for businesses. Examples of such products
are speech recognition engines and text-to-speech engines.

SpeechWorks Products [30]

SpeechWorks is one of the prominent companies today in the arena of speech
technologies. Its products are available in several languages including French,
Spanish, and German.

Speech Recognition


OpenSpeech Recognizer 1.0

This product is a speech recognition engine optimized for VoiceXML. It
uses underlying technology created by AT&T.


SpeechWorks 6.5 Second Edition

This is a comprehensive software product for building network-based
speech recognition services. It supports multiple languages including
French (Continental and Canadian), Spanish (Latin American, Anglo and
Castilian), Cantonese (Hong Kong), Mandarin (Taiwan), Dutch, German,
English (US, UK, Australia, Asian and South African), Korean,
Japanese, Portuguese (Brazilian) and Italian.



Speech2Go is an advanced speech recognition engine designed for
embedded applications in mobile devices, handsets and automotive
systems. It can recognize new names and phrases using its grammar
update tool, which eliminates the need for the user to teach it. It
supports several languages including US, UK and Australian English,
German, French and Italian. Its size is relatively small, between 2
and 8 MB depending on the vocabulary size, and the vocabulary is
limited by available hardware resources (memory and CPU speed). It is
available for the Windows CE operating system.




Speechify™ from SpeechWorks is a next-generation text-to-speech (TTS)
engine that produces more natural-sounding synthesized speech than in
the past. It uses AT&T natural language processing technology to
accomplish this.




This product is a text-to-speech system that can be used with any TTS
application. Its low memory requirements are ideal for embedded
devices such as in-car navigation systems, next-generation mobile
phones and hand-held devices. Because of its support for many
different languages it is also appealing to international companies.

VoiceXML and Standards

VoiceXML is the emerging standard for speech services. It is a markup
language used to describe an interaction between a caller on a
telephone and a server. VoiceXML browsers provide an interface between
a caller on a standard telephone and an application running on the web.
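
To make the idea of a dialog described in markup more concrete, the following
is a minimal sketch and not material from any vendor discussed in this paper.
It embeds a small VoiceXML-style document in Python and uses the standard
library XML parser to list what the caller would be asked. The dialog itself,
the field name `account_type`, the prompt wording and the helper
`summarize_dialog` are all assumptions made for illustration. A production
document would also declare the VoiceXML namespace and would be served to a
voice browser rather than parsed locally.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialog: the form id, field name, prompt text and options are
# illustrative only and are not taken from any product in this survey.
VXML_DOC = """
<vxml version="2.0">
  <form id="balance">
    <field name="account_type">
      <prompt>Would you like checking or savings?</prompt>
      <option>checking</option>
      <option>savings</option>
      <filled>
        <prompt>You chose <value expr="account_type"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
""".strip()


def summarize_dialog(vxml_text):
    """Return a (field name, prompt text, allowed answers) tuple per field."""
    root = ET.fromstring(vxml_text)
    summary = []
    for field in root.iter("field"):
        # find() looks only at direct children, so this is the question
        # prompt and not the confirmation prompt nested inside <filled>.
        prompt = field.find("prompt").text.strip()
        options = [opt.text for opt in field.findall("option")]
        summary.append((field.get("name"), prompt, options))
    return summary


if __name__ == "__main__":
    for name, prompt, options in summarize_dialog(VXML_DOC):
        print(name, "|", prompt, "|", options)
```

The point of the sketch is that the whole caller interaction (what is asked,
which answers are accepted, what happens once an answer is heard) lives in the
markup, so the application logic can stay on an ordinary web server.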


OpenVXI

This is a VoiceXML interpreter that allows developers to add VoiceXML
capabilities to their products without developing that technology
themselves.

Speaker Verification



SpeechSecure

This product uses biometric technology to verify a caller's identity
based on the characteristics of his or her unique vocal patterns.
SpeechSecure provides an extremely tight level of security for callers
who access personal information over the telephone. Many commercial
applications with high security requirements, such as financial
services and other self-service sites that access personal
information, will be able to use this service.


Lernout & Hauspie Products [23]


L&H Voice Xpress Professional (Version 5)

This voice recognition package features dictation into most
Windows-based applications. It also works with Windows 2000 and is one
of the leading packages that works with Microsoft Office. A new
feature reads back your text in a very human-sounding voice. Also
available are extra plug-in vocabularies for specific fields such as
Business and Finance, Technology, Leisure and In The


PowerTranslator Pro Version 7

This product translates English to Spanish, German, French, Italian,
Portuguese and Japanese. It can be integrated into the MS Office 2000
products while preserving the document formatting. This is especially
meaningful to a user because the formatting is sometimes what takes
the most time to do. The original and translated text can be displayed
side by side.


Dragon NaturallySpeaking Professional 6.0

This speech recognition software is geared for corporate and
professional use. It lets you dictate memos, reports and other
documents, enter data, fill in forms, send e-mail, and work on the
web, all by voice. This software handles multiple specialty
vocabularies and lets you create custom commands to automate tasks. It
is available in American and British English, French, Italian and
Spanish.


NaturallySpeaking Legal Suite 6.0

This suite contains
specialized terminology used in the legal profession. It also
includes Corel WordPerfect Suite 8.


Dragon NaturallySpeaking Preferred

This edition offers Text-to-Speech and Dictation Playback, which
assist with editing. NaturallyMobile has support for hand-held
recording devices. It is available in American and British English,
French, Italian and


Dragon NaturallySpeaking Medical Suite 6.0

The distinctive
feature of this edition is the
specialized terminology used in the
medical profession.


Dragon NaturallySpeaking Mobile Recorder Option Kit

This kit provides the hand-held digital recorder, which holds up to 40
minutes of recorded speech, equating to about 10 pages.


Dragon NaturallySpeaking M

This is a package containing Dragon NaturallySpeaking Preferred
software, the hand-held Dragon NaturallyMobile digital recorder and a
headset.




This product allows the operation of the PC to be totally hands free.
One is able to create, edit, format and move text by voice in most
Windows applications including MS Word, Corel WordPerfect, MS Excel,
Netscape Navigator and MS IE. Menus and dialog boxes are activated by
speaking the words shown on the screen. It is available in American
and British English, French, Italian and Spanish.

Voice Pilot Products [29]



This product allows you to make your own voice files and compresses
them so they can be easily sent over the internet. The voice file can
be inserted into standard word processing programs like MS Word and
WordPerfect or even into a cell of a standard spreadsheet program.



A product that lets you send pictures along with your voice file. It
also compresses the files well so that they take less time to send.



This product allows the control of a computer by voice. It works with
most popular speech engines including IBM, Dragon and L&H. You can use
PAL to keep your appointment book and your to-do list, and to make
notes on your contacts, without touching the keyboard. You can also
synchronize your Palmtop and other computing tools by voice.

Nuance Products [25]

Nuance is a speech technology company that offers a suite of voice software
products in the categories of server software, voice browser solutions,
application enablers, and developer tools.

Server Software


Nuance 7.0

This is core speech recognition software for voice-driven applications
over the telephone. Some of the features include wireless and
hands-free support, dynamic language detection™ for multi-lingual
systems, hot-swappable grammars, and enhanced barge-in.


Nuance Verifier 3.0

Nuance's voice authentication technology uses voiceprints to deliver
high security and secure access at a low cost without the use of
passwords and PINs.


Voice Browser Solutions


Nuance Voyager

This is a voice browser that enables a user to surf the web over the
phone. It also takes advantage of Nuance's integrated speech
recognition and voice authentication, allowing personal information to
remain secure.

Application Enablers


Nuance SpeechObjects

SpeechObjects are reusable software components. Developers use
SpeechObjects to build speech recognition and voice authentication
applications.

Developer Tools


Nuance V

This is a graphical tool which enables
developers to create voice applications.


Nuance Foundation SpeechObjects

These are 25 pre-built speech application components.


Nuance Grammar Builder

This is also a graphical tool that enables developers to create, view,
edit and manage grammars.


Nuance V

a tool for analyzing and tuning deployed

AT&T Products [20]

Speech Engine

This package comes in three flavors: Server, Server-Lite and Desktop. The
Server edition is targeted at large businesses serving the needs of many users
across an enterprise network. It includes a female and a male U.S. English
voice and supports the creation of unique customized voices. The development
platforms that it supports include Linux, Solaris, and Windows XP, NT and 2000.
The Server-Lite configuration is geared towards small businesses, and the
Desktop edition targets individual end-users who want to add TTS capabilities
to their own desktop applications.

Customized Voice Products

The AT&T Labs Natural Voices customized voice products give those who have
the AT&T TTS engine the ability to create made-to-order voices. Two packages
are available for this: AT&T Labs' Natural Voices fonts and AT&T Labs'
Natural Voices icons.

The fonts give businesses a library of voices to use when adding TTS
capabilities to an application. The icons include custom TTS voices. The
voices are developed closely with the customer in one of two ways. The
customer can supply its own voice talent, in which case AT&T Labs records the
voice talent and then produces the synthesized voice. Alternatively, the
customer specifies the characteristics of the voice and AT&T Labs finds a
voice talent to match the customer's request.

Philips Products [26]

Dictation Software



This is a client/server speech recognition software package used by
developers to create applications.


Speech SDK

A professional software development kit used to speech-enable
software applications.

Digital Dictation


SpeechMike Family

A set of devices that are a combination of
speaker, microphone and mouse.


Digital Dictation Solution

This solution contains several different models of digital recorders.

Telephony Solutions



This is a product family with different components, suitable for a
variety of telephone applications such as directory assistance,
information and customer service, banking applications and name
dialing.



This is a natural speech recognition and language understanding
software platform used to automate telephone-based information and
transaction services.



This is a speech recognition solution that integrates six technologies
under a common API (application programming interface). The
technologies are discrete digits, continuous digit strings,
alphanumeric strings, phonetic vocabularies, speaker-dependent
recognition and speaker verification.

Voice Control

Voice Control is the use of embedded speech technologies.
Applications for these technologies include navigation systems, telematic
applications, car features, car equipment, mobile cellular phones,
handheld devices, television, audio and others.

IBM Products [22]

IBM has a host of voice products broken down into two categories: Home and
Small Business solutions and Enterprise solutions. Below is a listing of these
solutions.


Home and Small Business Solutions

The ViaVoice family of products for home and small business use provides
the speech recognition software a customer needs for dictation, internet use,
and command and control. The ViaVoice vocabularies allow the base vocabulary
to be expanded with specialized terminology, such as medical and legal terms.


ViaVoice Pro


ViaVoice Advanced




ViaVoice for Mac OS X


ViaVoice for Mac




ViaVoice for Mac



ViaVoice Millennium Pro


ViaVoice Vocabularies (Mac)


ViaVoice Vocabularies (Win)

Enterprise Solutions

As environments become more mobile, conventional interfaces are
becoming less usable. Voice technology will become the primary user interface
for accessing information and conducting transactions in the new environment.
IBM provides middleware and component parts for companies to build their own
voice solutions, as well as all-inclusive voice solution packages.


IBM WebSphere Voice Server

This software encompasses both a speech recognition engine and a
text-to-speech engine. It enables developers to develop and deploy
voice-enabled e


WebSphere Voice Response

This is a solution that will allow
businesses to answer and screen a large number of calls


WebSphere Voice Toolkit


IBM Message Center


WebSphere translation server

A useful tool for companies dealing internationally. It enables the
translation of web pages into different languages without the need to
recreate them.


Mobility Suite

It enables PDA functions to respond to voice


Mobile Device Edition


Dictation for Linux

ViaVoice Test

This gives text-to-speech abilities to mobile devices such as PDAs,
smartphones and automobiles.


Many of the products mentioned above are used in developing voice biometrics
applications. The best-known commercialized forms of voice biometrics are
speaker verification and speaker identification [17]. Of the two, speaker
identification is the more difficult because the voice sample taken from the
user must be compared to every voice available in the database. Speaker
verification, on the other hand, takes the user's voice sample together with
the identity the user claims. The sample is then compared only with the voice
stored for that identity to see if they match. The use of voice biometrics is
growing, and it appears that it will continue to be used as a means of
identification for sensitive applications.
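
The difference between the two tasks can be sketched in a few lines of Python.
This is a toy illustration under stated assumptions and not how any product in
this survey works: the three-number "voiceprints", the enrolled names, the
cosine-similarity score and the 0.95 threshold are all hypothetical stand-ins
for the acoustic models a real system would use.

```python
import math

# Hypothetical enrolled voiceprints: real systems store statistical models of
# a speaker's voice, not three-element vectors.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
    "carol": [0.4, 0.4, 0.9],
}


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors, 1.0 meaning same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, claimed_identity, threshold=0.95):
    """Verification: one comparison against the claimed speaker only."""
    voiceprint = ENROLLED.get(claimed_identity)
    if voiceprint is None:
        return False
    return cosine_similarity(sample, voiceprint) >= threshold


def identify(sample, threshold=0.95):
    """Identification: compare against every enrolled speaker, keep the best."""
    best_name, best_score = None, threshold
    for name, voiceprint in ENROLLED.items():
        score = cosine_similarity(sample, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    sample = [0.88, 0.12, 0.31]  # hypothetical features from a new utterance
    print(verify(sample, "alice"), verify(sample, "bob"), identify(sample))
```

`verify` performs a single comparison no matter how many speakers are
enrolled, while `identify` loops over the whole database, which is why
identification becomes harder and slower as the number of enrolled voices
grows.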

An article written back in 1994 entitled "Survey of Current Speech Technology"
concluded that the greatest potential lies in the development of systems that
combine recognition and synthesis to support conversational interaction between
humans and computers in complex task domains [13]. Looking at the variety of
products that exist now, we can see that many speech applications today already
perform rather complex interactions, with more coming in the near future.

[13] Rudnicky, A., Hauptmann, A., Lee, K., "Survey of Current Speech
Technology", Communications of the ACM, March 1994, Vol. 37, No. 3.

[17] Markowitz, J., "Voice Biometrics", Communications of the ACM, September
2000, Vol. 43, No. 9.