Speech Recognition—Where are We - College of Education


Nov 17, 2013


Speech Recognition

Where are We?

Ms. Iris C. Ellis, M.Ed.


Adult and Career Education Department

Valdosta State University

Valdosta, GA 31698

FAX: (229) 333

School phone: (229) 333

E-mail: icellis@valdosta.edu


Teachers have been debating the past few years about which grade level is most
appropriate for teaching keyboarding. The State of Georgia does not fund keyboarding
courses at the high school level. The responsibility of teaching keyboarding in Georgia
has been given to the middle schools (grades 6-8). Middle schools have been working
with keyboarding curriculum, new software, and new textbooks since textbook adoption
took place two years ago. High schools have been struggling with handling students
without keyboarding skills enrolled in computer application classes and other computer
lab courses. Speech recognition technology or voice recognition technology (SRT or
VRT) might be a solution for students who have not received keyboarding training and
for those who simply have not developed keyboarding speeds over 20 to 30 words per
minute (wpm).

In a study conducted by Ruthann and Marvin Dirks (1997), 44 college
business communication students were divided into two groups. All but two of these
students reported having had at least a one-semester keyboarding class, and some
had taken an advanced keyboarding class as well. Group A was given a five-minute
timed writing and was allowed to practice it once using WordPerfect software. The next
day, this same group was introduced to the DragonDictate speech recognition (SR) system
and worked through the tutorial, which took approximately 1.5 hours. Group A then
dictated the same timed writing used the day before. Group B completed the SR
training and dictation on day one and did the keyboarding timed writing the following
day (the opposite order of Group A).

Twenty percent of all the students could dictate faster than they could keyboard.
As a group, however, speech recognition dictation speeds had a mean of 27.7 wpm
(SD=5.3), whereas keyboarding speeds tended to be higher, with a mean score of 41.1
wpm (SD=13.8). The SR-dictated material tended to have fewer errors (M=2.3) than did
the keyboarded timed writing (M=5.8).

Since 20 percent of the students in the study
dictated straight copy with SR faster after a brief tutorial than they could keyboard after
a semester or more of keyboarding training and practice, Dirks and Dirks raised the issue
of whether teachers should continue keyboarding training or simply teach SR skills.

Research on SR technology began in the late 1950s. IBM demonstrated an early
version of SRT in 1964 with Shoebox. The 1968 movie 2001: A Space Odyssey, featuring the
HAL 9000 (a fictional computer), brought speech recognition to the attention of the
general public, and the technology has steadily progressed since then (Savitska,
Coleman, Parkin, Pye, & Patel, 1998). With all of the improvements in computer
memory capacities, processor speeds,
and software advancements, it is ironic that SR
technology has not become an integral part of basic computer hardware and software
systems. In researching this question, the Internet was used to find many sources of
information about SR technology. Several local businesses were contacted, including
doctors’ offices, lawyers’ offices, and a hospital. Information was also obtained from a
representative from Technology Assistance Group, an organization geared toward
meeting the needs of physically disabled


Mary Allen, sales representative for Technology Assistance Group, reported that
the general public is simply not aware of this technology. As for educational use,
Ms. Allen stated that physically disabled students in special education classes are using
SR extensively, but the majority of teachers in regular classrooms are not
knowledgeable about its benefits.

Jim Seymour, author of “The Truth about Speech Recognition,” stated that the
disconnect between the technology and its practical use lies in our expectations.
He contends that most people who have tried SR briefly have failed to understand that
how you use the product shapes your expectations and your satisfaction. Seymour
stated that Lernout & Hauspie’s Dragon NaturallySpeaking Preferred 5.0 is the current
leader in SR software. He recommends this software as well as ViaVoice Pro 8.0. He
gave three tips for new users of speech recognition: (1) Take the
time to dictate the paragraphs given in the manual to train the software, and use the
microphone, not the mouse and keyboard, to make corrections. This will fine-tune the
software to your voice. (2) The process works much better if you consciously slow
down and pay a little extra attention to articulation. You do not have to sound like a
zombie; just slow down a little. (3) A good noise-canceling microphone is key. Other
reasons given for SR not being included as basic hardware/software were cost, time
needed to train the user and the software, computer system requirements,
environmental requirements, and technology inefficiencies.

Cost is certainly a factor. Speech recognition software choices typically range
from about $50 to $250. Specialized vocabulary lists can be purchased separately and
installed to add industry-specific vocabulary terms (e.g., medical or legal). A
noise-canceling headset with a microphone is included with the higher-priced programs, and
the priciest program options may include both a headset and an accompanying mobile
recording device for dictation purposes while on the road (Underhill, 2001). The
microphone recommended by several sources was the DSP-300 by Plantronics, a headset
with a digital USB connection. This model costs approximately $100.

Time is also a factor. To make SR work, a user must take the time to train the
software program to learn his/her speech patterns. Using the correcting feature of the
software is a must in order to improve the accuracy of the program. One author
compared getting a speech recognition software program to getting a puppy; the time
that is put into training it will reward you later with an obedient, problem-free program
(Underhill, 2001). Fogg and Wightman (2000) recommended being patient through the
learning process and stated that the system will become more efficient and accurate in
recognizing voice patterns after about five hours of use.

System requirements must also be considered. Because speech recognition
requires searching through very large data structures, the speed
of the recognition will
be directly affected not only by the processor speed, but also by the amount of memory
that is available. A leading SR trainer in 1999 reported that a 333 MHz processor
with 96 MB of RAM was necessary to achieve desirable performance (Fogg & Wightman,
2000). Disk space is another area in which more is better. Digitized speech can
consume approximately a megabyte per minute. If used in a classroom setting, the
computer would need to have enough disk space for four or five different users.
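The disk-space point above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the roughly one megabyte per minute figure quoted in the text; the function name and the example of five students dictating 60 minutes each are hypothetical and are not taken from any of the cited sources.

```python
# Rough estimate of disk space consumed by stored dictation audio.
# MB_PER_MINUTE comes from the figure quoted in the text; everything else
# (function name, example usage numbers) is an illustrative assumption.
MB_PER_MINUTE = 1.0


def audio_storage_mb(users: int, minutes_per_user: float,
                     mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Return the approximate megabytes needed for all users' digitized speech."""
    if users < 0 or minutes_per_user < 0 or mb_per_minute < 0:
        raise ValueError("inputs must be non-negative")
    return users * minutes_per_user * mb_per_minute


if __name__ == "__main__":
    # Hypothetical shared workstation: five students, 60 minutes of dictation each.
    print(audio_storage_mb(users=5, minutes_per_user=60))
```

The estimate covers only the recorded speech; any per-user voice profile the software stores would add to it.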

Environment is another consideration. Authors of several articles stated that it
might be impossible to use a speech recognition program in a noisy environment.
Recordings made with a cheap microphone placed on a table are likely to produce
recognition error rates in excess of 50 percent (Fogg & Wightman, 2000). These
environmental requirements would be very difficult to meet in a high school or middle school
computer lab setting or in many work settings.
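The sources quoted above do not say how their recognition error rates were calculated. One common measure is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the recognized text into the intended text, divided by the number of intended words. The sketch below assumes that definition; the function name and sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word was dropped
                            curr[j - 1] + 1,      # extra word was inserted
                            prev[j - 1] + substitution_cost))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    intended = "the quick brown fox"
    recognized = "the quack brown"
    print(word_error_rate(intended, recognized))
```

In this example one word is misrecognized and one is missing, so two of the four intended words are in error.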

Other common problems associated with SR are punctuation and word
fragments. The single greatest impediment to widespread use of SR is the inability of
the software to generate punctuation automatically (Fogg & Wightman, 2000). Most SR
products must have punctuation dictated (e.g., “period”, “comma”, “question mark”).
Also, filled pauses, such as “ummm” and “err” generally cause recognition errors. The
software will attempt to recognize the sounds as legitimate dictionary words. Users
must be trained not to use filled pauses and to formulate their thoughts in order to speak
fluently. Silence is recognized, so it is far better to dictate in short, fluent bursts
separated by large pauses than to fill the pauses with speech that should not be
transcribed. Another topic of concern is configuration. Allen stated that SR software is
not easily configured for computer networks.
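Dictated punctuation, as described above, amounts to replacing spoken command words with symbols. The sketch below is a simplified illustration of that idea and not the method used by any product named in this paper; the word list and function name are assumptions.

```python
import re

# Spoken commands and the symbols they stand for (an illustrative subset).
SPOKEN_PUNCTUATION = {
    "question mark": "?",
    "period": ".",
    "comma": ",",
}


def format_dictation(transcript: str) -> str:
    """Replace dictated punctuation words with the matching symbols."""
    text = transcript
    # Handle longer commands first so multi-word commands are matched whole.
    for spoken in sorted(SPOKEN_PUNCTUATION, key=len, reverse=True):
        symbol = SPOKEN_PUNCTUATION[spoken]
        # Remove the space before the command so the symbol attaches to the word.
        pattern = r"\s*\b" + re.escape(spoken) + r"\b"
        text = re.sub(pattern, symbol, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(format_dictation("is speech recognition faster question mark"))
```

A substitution this naive cannot tell the command "period" from the ordinary word "period" (as in "a class period"), which hints at why automatic punctuation is a hard problem.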

Perhaps the most important factor to consider when contemplating using SR
software instead of teaching keyboarding is SR’s practicality. Is speech recognition
faster? One author reported that the existing speech recognition/dictation products do
not yield significant productivity gains in comparison to a skilled keyboarder. But, SR
would allow workers or students with good subject knowledge but poor typing skills to
perform
at the high ends of productivity achieved by an average keyboarder (Savitska
et al., 1998). In South Georgia, SR is being used in few legal offices or medical offices,
or in medical records at area hospitals. Allen stated that one reason may be that most
SR packages transcribe into Microsoft Word, and many legal transcriptionists prefer

Where is SR succeeding? Several sources reported that voice recognition
software was being used to offer relief from repetitive stress problems associated with
prolonged periods of keyboarding. Computer users with arthritis, carpal tunnel
syndrome, or a myriad of other debilitating problems that limit the use of the upper body
could make excellent use of SR technology.

Dr. Richard O’Brien, emergency department physician at Moses Taylor Hospital
in Scranton, Pennsylvania, is using SR to improve patient record legibility, accuracy,
and availability (Durlach, 2000). O’Brien uses Lernout & Hauspie’s Clinical Reporter as
his SR software. Also, Duke University’s radiology department has used SR to eliminate
outsourced transcription services. Duke wanted to shrink average report turnaround
times to within four hours of the radiologists’ dictation. The use of SR resulted in a 91
percent reduction. The department used Lernout & Hauspie’s PowerScribe as its SR software.

Another market where speech recognition is making great strides is online

based systems. AT&T has taken a minority stake in SpeechWorks
International, Inc., and MCI WorldCom confirmed technical trials with Nuance, Inc. All
carriers, except MCI, plan to use SR technology for speech-enabled network and hosting
services, voice portals, and call-center packages. Megan Gurley, a director at the
Yankee Group, stated that speech recognition technology, as carriers see it, is a way to
make up for declining long-distance voice revenue. She stated that “the dam is about to
break, and when it does, users will start seeing speech recognition used in many
different ways.”

The software is ready and
the hardware is capable. Several researchers
concluded that SR is easier to use and learn than keyboarding. But, can the
technology realistically be taught in middle schools or high schools? Most teachers
would agree that it is not practical yet. With the requirement of a noise-free setting and
the additional cost per workstation, it is still more practical and cost-effective to teach an
introductory keyboarding unit. Heads up, though; technology groups are hot on this


Dirks, R., & Dirks, M. (1997). Introducing business communication students to
automated speech recognition. Journal of Education for Business, 72(3), 153

Durlach, P. (2000, April). Talking up an emerging technology.
Management Technology,


Fogg, T., & Wightman, C. W. (2000, April). Improving Transcription of Qualitative
Research Interviews with Speech Recognition Technology. Paper presented at the
Annual Meeting of the American Educational Research Association, New Orleans, LA.

Savitska, J., Coleman, P., Parkin, J., Pye, A., & Patel, N. (1998, December).
Voice recognition technology
P’s in a pod.
Voice Recognition Technology Project
[On-line], Available:

Seymour, J. (2001, October). The truth about speech recognition.
PC Magazine.


Underhill, S. (2001, January). Speech recognition software solutions, a mini
[On-line], Available:

Wallace, B. (2000, June). Carriers move to further speech technology.
Information Week

[On-line], Available: