Speech Products


DCS860A


Fall 2001

Mary L. Manfredi


Introduction


This paper focuses on the speech products available on the market today. I
review products from the following companies: Lernout & Hauspie (L&H), Nuance,
Voice Pilot, Philips, SpeechWorks, AT&T and IBM. Speech products fall into
several categories. Some products can be purchased and used directly by a
customer. Others are used in support of a complete speech solution; these are
used by integrators and developers who provide complete speech solutions for
businesses. Examples of such products are speech recognition engines and
text-to-speech engines.


SpeechWorks Products [30]


SpeechWorks is one of the prominent companies today in the arena of speech
technologies. Its products are available in several languages including French,
Spanish, and German.



Speech Recognition

- OpenSpeech Recognizer 1.0 - This product is a speech recognition engine
  optimized for VoiceXML. It uses underlying technology created by AT&T.

- SpeechWorks 6.5 Second Edition - A comprehensive software product for
  building network-based speech recognition services. It supports multiple
  languages including French (Continental and Canadian), Spanish (Latin
  American, Anglo and Castilian), Cantonese (Hong Kong), Mandarin (Taiwan),
  Dutch, German, English (US, UK, Australia, Asian and South African), Korean,
  Japanese, Portuguese (Brazilian) and Italian.

- Speech2Go - Speech2Go is an advanced speech recognition engine designed for
  embedded applications in mobile devices, handsets and automotive systems. It
  can recognize new names and phrases using its grammar update tool, which
  eliminates the need for the user to train it. It supports several languages
  including US, UK and Australian English, German, French and Italian. Its
  size is relatively small, between 2 and 8 MB depending on the vocabulary
  size, and the vocabulary is limited by available hardware resources (memory
  and CPU speed). It is available for the Windows CE operating system.


Text-to-Speech

- Speechify - Speechify™ from SpeechWorks is a next-generation text-to-speech
  (TTS) engine that produces more natural-sounding synthesized speech than
  earlier engines. It uses AT&T natural language processing technology to
  accomplish this.

- ETI-Eloquence - A text-to-speech system that can be used with any TTS
  application. Its low memory requirements make it well suited to embedded
  devices such as in-car navigation systems, next-generation mobile phones and
  hand-held devices. Because it supports many different languages, it is also
  appealing to international companies.


VoiceXML and Standards


VoiceXML is the emerging standard for speech services. It is a markup
language used to describe an interaction between a caller on a telephone and
a server. VoiceXML browsers provide an interface between a caller on a
standard telephone and an application running on the web.
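
To make the idea of a markup-described telephone interaction concrete, the following is a minimal sketch in Python that assembles a small VoiceXML document and returns it as text. It is an illustration only and is not taken from any of the products reviewed here. The element names (vxml, form, field, prompt, option, filled) follow the W3C VoiceXML 2.0 vocabulary; the function name build_menu_dialog, the form id, the field name, and the prompt wording are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by W3C VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu_dialog(prompt_text: str, options: List[str]) -> str:
    """Return a small VoiceXML document (as a string) that asks the caller
    one question and accepts one of the given spoken options.

    The form id, field name and closing prompt are illustrative choices.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "main"})

    # A field collects one piece of input from the caller.
    field = ET.SubElement(form, "field", {"name": "choice"})
    ET.SubElement(field, "prompt").text = prompt_text

    # Each option is a phrase the recognizer should accept for this field.
    for phrase in options:
        ET.SubElement(field, "option").text = phrase

    # The filled element runs once the caller has supplied a valid answer.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "prompt").text = "Thank you."

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu_dialog("Say sales or support.", ["sales", "support"]))
```

In a deployment, a web application would serve a document like this to a VoiceXML browser, which plays the prompts to the caller and passes the spoken reply to a speech recognition engine.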


- OpenVXI - A VoiceXML interpreter that allows developers to add VoiceXML
  capabilities to their products without developing that technology
  themselves.


Speaker Verification

- SpeechSecure™ - This product uses biometric technology to verify a caller's
  identity based on the characteristics of his or her unique vocal patterns.
  SpeechSecure provides a high level of security for callers who access
  personal or sensitive information over the telephone. Many commercial
  applications that require high security, such as financial services and
  other self-service sites that access personal information, will be able to
  use this service.


Lernout & Hauspie Products [23]


- L&H Voice Xpress Professional (Version 5) - This voice recognition package
  supports dictation into most Windows-based applications. It also works with
  Windows 2000 and is one of the leading packages that work with Microsoft
  Office. A new feature reads back your text in a very human-sounding voice.
  Extra plug-in vocabularies are also available for specific fields such as
  Business and Finance, Technology, Leisure and In The News.

- PowerTranslator Pro Version 7 - This product translates English to Spanish,
  German, French, Italian, Portuguese and Japanese. It can be integrated into
  the MS Office 2000 products while preserving the document formatting. This
  is especially meaningful to a user because the formatting is sometimes what
  takes the most time to do. The original and translated text can be displayed
  side by side.


- Dragon NaturallySpeaking Professional 6.0 - This speech recognition software
  is geared toward corporate and professional use. It lets you dictate memos,
  reports and other documents, enter data, fill in forms, send e-mail, and
  work on the web, all by voice. The software handles multiple specialty
  vocabularies and lets you create custom commands to automate tasks. It is
  available in American and British English, French, Italian and Spanish.

- Dragon NaturallySpeaking Legal Suite 6.0 - This suite contains specialized
  terminology used in the legal profession. It also includes Corel WordPerfect
  Suite 8.

- Dragon NaturallySpeaking Preferred - This edition offers Text-to-Speech and
  Dictation Playback, which assist with editing. NaturallyMobile support for
  hand-held recording devices is included. It is available in American and
  British English, French, Italian and Spanish.

- Dragon NaturallySpeaking Medical Suite 6.0 - The distinctive feature of this
  edition is the specialized terminology used in the medical profession.


- Dragon NaturallySpeaking Mobile Recorder Option Kit - Contains the hand-held
  digital recorder, which holds up to 40 minutes of recorded speech, equating
  to about 10 pages.

- Dragon NaturallySpeaking Mobile - This is a package containing the Dragon
  NaturallySpeaking Preferred software, the hand-held Dragon NaturallyMobile
  digital recorder and a headset microphone.

- DragonDictate - This product allows the PC to be operated totally hands
  free. One is able to create, edit, format and move text by voice in most
  Windows applications including MS Word, Corel WordPerfect, MS Excel,
  Netscape Navigator and MS IE. Menus and dialog boxes are activated by
  speaking the words on the screen. It is available in American and British
  English, French, Italian and Spanish.


Voice Pilot Products [29]


- Hear-Say - This product allows you to make your own voice files and
  compresses them so they can be easily sent over the Internet. The voice file
  can be inserted into standard word processing programs like MS Word and
  WordPerfect, or even into a cell of a standard spreadsheet program.

- Hear-Look - A product that lets you send pictures along with your voice
  file. It also compresses the files so that they take less time to send.

- PAL - Allows the control of a computer by voice. It works with most popular
  speech engines including IBM, Dragon and L&H. You can use PAL to keep your
  appointment book and your to-do list, and to make notes on your contacts,
  without touching the keyboard. You can also synchronize your Palmtop and
  other computing tools by voice.


Nuance Products [25]


Nuance is a speech technology company that offers a suite of voice software
products in the categories of server software, voice browser solutions,
application enablers, and developer tools.



Server Software

- Nuance 7.0 - Core speech recognition software for voice-driven applications
  over the telephone. Some of the features include wireless and hands-free
  support, dynamic language detection™ for multi-lingual systems,
  hot-swappable grammars, and enhanced barge-in.

- Nuance Verifier 3.0 - Nuance's voice authentication technology. It uses
  voiceprints to deliver high security and secure access at a low cost
  without the use of passwords and PINs.




Voice Browser Solutions

- Nuance Voyager - A voice browser that enables a user to surf the web over
  the phone. It also takes advantage of Nuance's integrated speech recognition
  and voice authentication, allowing personal information to remain secure.



Application Enablers

- Nuance SpeechObjects - SpeechObjects are reusable software components.
  Developers use SpeechObjects to build speech recognition and voice
  authentication applications.


Developer Tools

- Nuance V-Builder - A graphical tool which enables developers to create voice
  applications.

- Nuance Foundation SpeechObjects - A set of 25 pre-built speech application
  components.

- Nuance Grammar Builder - Also a graphical tool, which enables developers to
  create, view, edit and manage grammars.

- Nuance V-Optimizer - A tool for analyzing and tuning deployed applications.


AT&T Products [20]



Text-to-Speech Engine


This package comes in three editions: Server, Server-Lite and Desktop. The
Server edition is targeted at large businesses serving the needs of many users
across an enterprise network. It includes a female and a male U.S. English
voice and supports the creation of unique customized voices. The development
platforms that it supports include Linux, Solaris, and Windows XP, NT and
2000. The Server-Lite configuration is geared towards small businesses, and
the Desktop edition targets individual end-users who want to add TTS
capabilities to their own desktop applications.



Customized Voice Products


The AT&T Labs Natural Voices customized voice products give those who have the
AT&T TTS engine the ability to create made-to-order voices. Two packages are
available for this: AT&T Labs’ Natural Voices fonts and AT&T Labs’ Natural
Voices icons. The fonts give businesses a library of voices to use when adding
TTS capabilities to an application. The icons include custom-developed TTS
voices. These voices are developed closely with the customer in one of two
ways. The customer can supply their own voice talent, in which case AT&T Labs
records the voice talent and then produces the synthesized voice.
Alternatively, the customer specifies the characteristics of the voice and
AT&T Labs finds a voice talent to match the customer's request.





Philips Products [26]



Dictation Software

- SpeechMagic - A client/server speech recognition software package used by
  developers to create applications.

- Speech SDK - A professional software development kit used to speech-enable
  software applications.


Digital Dictation

- SpeechMike Family - A set of devices that combine a speaker, a microphone
  and a mouse.

- Digital Dictation Solution - This solution includes several different models
  of digital recorders.


Telephony Solutions

- SpeechPearl - A product family with different components, suitable for a
  variety of telephone applications such as directory assistance, information
  and customer service, banking applications and name dialing.

- SpeechMania - A natural speech recognition and language understanding
  software platform used to automate telephone-based information and
  transaction services.

- SpeechWave - A speech recognition solution that integrates six technologies
  under a common API (application programming interface). The technologies
  include discrete digits, continuous digit strings, alphanumeric strings,
  phonetic vocabularies, speaker-dependent recognition and speaker
  verification.


Voice Control


Voice Control is the use of embedded speech technologies. Applications for
these technologies include navigation systems, telematics applications, car
features, car equipment, mobile cellular phones, handheld devices, television,
audio and others.


IBM Products [22]


IBM has a host of voice products broken down into two categories: Home and
Small Business solutions and Enterprise solutions. These solutions are listed
below.





Home and Small Business Solutions


The ViaVoice family of products for home and small business use provides the
customer with the speech recognition software needed for dictation, Internet
use, and command and control. The ViaVoice Vocabularies products add
specialized vocabularies, such as medical and legal, to the base product.


- ViaVoice Pro
- ViaVoice Advanced
- ViaVoice Standard
- ViaVoice for Mac OS X
- ViaVoice for Mac - Millennium
- ViaVoice for Mac - Enhanced
- ViaVoice Millennium Pro
- ViaVoice Vocabularies (Mac)
- ViaVoice Vocabularies (Win)



Enterprise Solutions


As environments become more mobile, conventional interfaces are becoming less
usable. Voice technology will become the primary user interface for accessing
information and conducting transactions in the new environment. IBM provides
middleware and component parts for companies to build their own voice
solutions, as well as all-inclusive voice solution packages.



- IBM WebSphere Voice Server - This software encompasses both a speech
  recognition engine and a text-to-speech engine. It enables developers to
  develop and deploy voice-enabled e-business solutions.

- WebSphere Voice Response - A solution that allows businesses to answer and
  screen a large number of calls simultaneously.

- WebSphere Voice Toolkit

- IBM Message Center

- WebSphere Translation Server - A useful tool for companies dealing
  internationally. It enables the translation of web pages into different
  languages without the need to recreate them manually.

- Mobility Suite - Enables PDA functions to respond to voice commands.

- Mobile Device Edition

- Dictation for Linux

- ViaVoice Text-to-Speech - Gives Text-to-Speech abilities to mobile devices
  such as PDAs and SmartPhones, and to automobiles.



Many of the products mentioned above are used in developing voice biometrics
applications. The best-known commercialized forms of voice biometrics are
speaker verification and speaker identification [17]. Of the two, speaker
identification is the more difficult, because the voice sample taken from the
user must be compared to all the voices available in the database. Speaker
verification, on the other hand, takes the user's voice sample together with
the identity they claim. The sample is then compared only with the voice on
file for that claimed identity to see if the two match. The use of voice
biometrics is growing, and it appears that it will continue to be used as a
means of identification for sensitive applications.
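
The difference between the two tasks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method used by any product or by [17]: each voice is represented by a short list of numbers standing in for a voiceprint, samples are compared by cosine similarity, and a match is declared above an arbitrary threshold of 0.9. The names verify, identify and enrolled are hypothetical.

```python
import math
from typing import Dict, List, Optional

Voiceprint = List[float]  # stand-in for real acoustic features (assumption)


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Similarity of two equal-length vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample: Voiceprint, claimed_id: str,
           enrolled: Dict[str, Voiceprint], threshold: float = 0.9) -> bool:
    """Speaker verification: one comparison against the claimed identity."""
    reference = enrolled.get(claimed_id)
    if reference is None:
        return False
    return cosine_similarity(sample, reference) >= threshold


def identify(sample: Voiceprint, enrolled: Dict[str, Voiceprint],
             threshold: float = 0.9) -> Optional[str]:
    """Speaker identification: compare against every enrolled voice and
    return the best match, or None if nothing reaches the threshold."""
    best_id, best_score = None, threshold
    for speaker_id, reference in enrolled.items():
        score = cosine_similarity(sample, reference)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
    sample = [0.9, 0.1, 0.2]
    print(verify(sample, "alice", enrolled))
    print(identify(sample, enrolled))
```

Verification performs a single comparison regardless of how many speakers are enrolled, while identification performs one comparison per enrolled speaker, which is one reason it is the harder of the two tasks.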

An article written in 1994 entitled “Survey of Current Speech Technology”
concluded that the greatest potential lies in the development of systems that
combine recognition and synthesis to support conversational interaction between
humans and computers in complex task domains [13]. Looking at the variety of
products that exist now, we can see that many speech applications already
perform rather complex interactions, with more coming in the near future.



References:


[13] Rudnicky, A., Hauptmann, A., Lee, K., “Survey of Current Speech
Technology”, Communications of the ACM, March 1994, Vol. 37, No. 3.

[17] Markowitz, J., “Voice Biometrics”, Communications of the ACM, September
2000, Vol. 43, No. 9.

[20] http://www.naturalvoices.att.com

[21] http://www.synapseadaptive.com/syn/pro/soft/speech.htm

[22] http://www-3.ibm.com/software/speech/

[23] http://www.lhsl.com/naturallyspeaking/

[25] http://www.nuance.com/

[26] http://www.speech.philips.com/

[29] http://www.voicepilot.com/

[30] http://www.speechworks.com