Speech Recognition 1

Oct 25, 2013 (3 years and 8 months ago)

Speech Recognition
1

RUNNING HEAD: Speech Recognition and Keyboarding

Speech Recognition Devices

Is Keyboarding an Endangered Skill?

Iris C. Ellis

Valdosta State University


Abstract


Speech recognition is an input system that consists of a microphone and
speech recognition software. The microphone may be a tabletop model; however,
the most common style used with speech recognition is the headset model.
Speech recognition software is complex. Sounds received via a microphone must
be broken down into phonemes; then, a “best guess” algorithm is used to match
phonemes and syllables to a database of possible words. To resolve speech
inconsistencies such as accent, volume, pitch, inflection, male versus female
voices, sarcasm, and colloquialisms, the software must examine context and
word patterns to determine the correct meaning. This requires an enormous
database, which in turn requires a computer with ample processing speed and
memory. Currently, speech recognition technology is improving but has not been
perfected. Speech recognition will greatly change the way information is
input. Ease, convenience, and speed are three advantages of using a microphone
instead of a keyboard. Most computer experts agree that the keyboard will not
be replaced by the microphone but will simply be enhanced by it. Speech
recognition technology will have an impact on all careers but especially on
education. Business education teachers must stay abreast of this technology
and be willing to adjust their courses to best prepare students.

Speech Recognition Devices

Is Keyboarding an Endangered Skill?



Star Trek fans take for granted that one day we will communicate with our
computers via speech input. The TV series featured a spaceship in which
crew members accessed the computer anywhere on the ship by saying
“computer” and then a command. How far are we from this type of technology?
Not as far as you might think, according to Bill Hills, an analyst at the
Aberdeen Group, a research company in Boston. Mr. Hills believes that speech
recognition will become a primary interface, but not in the next five months
or even a year.

Similar technology has been available for the last three years, but it has
not become popular because it still does not work very well. Up to this point,
speech or voice recognition software has had to be run separately. As we
approach the time when speech will be the primary interface for computers,
four questions come to mind: What is speech recognition? How is speech
recognition currently being used? What is the future outlook of this
technology? Is the keyboard an input device of the past?

What is speech recognition?


Speech or voice recognition is the technology that allows for computer
input via spoken commands. To use speech recognition, your computer needs a
microphone and the proper software. In this duo, the software is the complex
part. Speech recognition software works by disassembling sound into atomic
units (called phonemes) and then piecing them back together. Phonemes can be
thought of as the sounds made by one or more letters in sequence with other
letters, like th, cl, or dr. After speech recognition software has broken
sounds into phonemes and syllables, a "best guess" algorithm is used to map
the phonemes and syllables to actual words (Machowski, 1998).
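
The "best guess" step described above can be illustrated with a minimal sketch. It is not the algorithm used by any product named in this paper. It assumes a tiny, hypothetical word list (LEXICON) with made-up phoneme symbols, and it scores candidates with the general-purpose sequence comparison in Python's standard difflib module rather than with an acoustic model.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: each word maps to one phoneme sequence.
# The phoneme symbols are illustrative, not a standard phoneme set.
LEXICON = {
    "three": ["th", "r", "ee"],
    "tree": ["t", "r", "ee"],
    "clear": ["cl", "ee", "r"],
    "dream": ["dr", "ee", "m"],
}


def similarity(heard, known):
    """Return a score from 0 to 1 for how closely two phoneme sequences match."""
    return SequenceMatcher(None, heard, known).ratio()


def best_guess(heard):
    """Pick the lexicon word whose phonemes are most similar to what was heard."""
    return max(LEXICON, key=lambda word: similarity(heard, LEXICON[word]))
```

With this toy lexicon, best_guess(["th", "r", "ee"]) returns "three", and a slightly misheard sequence such as ["dr", "ee", "n"] still returns "dream" because it shares the most phonemes with that entry.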


These words are then translated into ideas by natural language
processing, which processes the output from speech recognition software and
attempts to understand what the user meant (Machowski, 1998). It does so by
examining context, patterns, phrases, etc. Natural language processing and
speech recognition work together to clear up vague words or homonyms, such as
two, too, and to.
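
One simple way to picture how context can settle a homonym is to look at the word that came just before it. The sketch below is only an illustration under stated assumptions: the counts in FOLLOW_COUNTS are invented for the example, and a real system would learn such statistics from a large body of text and would weigh far more context than one preceding word.

```python
# Hypothetical counts of how often each word follows a given previous word.
# These numbers are invented for illustration only.
FOLLOW_COUNTS = {
    ("the", "two"): 40,
    ("chapter", "two"): 15,
    ("me", "too"): 25,
    ("want", "to"): 90,
    ("going", "to"): 80,
}

HOMOPHONES = ("two", "too", "to")


def pick_homophone(previous_word, candidates=HOMOPHONES):
    """Choose the candidate seen most often after previous_word.

    If no candidate has ever been seen after previous_word, the first
    candidate is returned, which is an arbitrary fallback.
    """
    return max(candidates, key=lambda w: FOLLOW_COUNTS.get((previous_word, w), 0))
```

Here pick_homophone("want") returns "to" and pick_homophone("me") returns "too", which mirrors the way the software is described as using surrounding words to decide among two, too, and to.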


Most speech recognition systems come with a headset-type microphone.
However, a tabletop or handheld microphone may be used. Regardless of the
style, the microphone should be unidirectional and, preferably, sound
canceling (Sobolewski, 1998). Microphones can be interchanged with each other
fairly successfully without adversely affecting the recognition rate.
Background noises cause problems and can interfere with voice input. For
optimum usage, the microphone must be placed close to, but a little to the
side of, the mouth (so that the user does not breathe into it), and the sound
level should be set correctly (Sobolewski, 1998).

Aside from the microphone, the other hardware requirements for most speech
recognition systems are as follows: 32 MB of RAM, a Pentium processor (90 MHz
or better), and a sound card.

How is speech recognition currently being used?


There are generally three categories of speech recognition products in
use: navigation, development, and dictation. Navigation allows the user to
give spoken commands that launch and operate programs. This system acts much
like a mouse, except that a user speaks commands, like "load," "save," or
"spell check," instead of pointing and clicking. Development products are
mainly used by programmers to customize computer programs. Dictation products
allow a user to create text documents using word processing software. Numbers
can also be placed in a spreadsheet with dictation products. Dictation
software is the system in which business education teachers are most
interested because of its implications for keyboarding.
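
The navigation category can be pictured as a lookup from a recognized phrase to an action, much as a mouse click is tied to a menu item. The following sketch assumes the speech software has already produced the text of the command. The three command names come from the examples above, while the functions and their return messages are hypothetical placeholders.

```python
def load():
    return "opening file"


def save():
    return "saving file"


def spell_check():
    return "checking spelling"


# Table of spoken commands and the placeholder action each one triggers.
COMMANDS = {"load": load, "save": save, "spell check": spell_check}


def run_command(recognized_text):
    """Normalize the recognized text and run the matching command, if any."""
    key = " ".join(recognized_text.lower().split())
    action = COMMANDS.get(key)
    if action is None:
        return "unrecognized command: " + key
    return action()
```

For example, run_command("Spell Check") runs the spell_check placeholder, while a phrase that is not in the table is reported as unrecognized instead of being guessed at.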

Two dictation-type input systems are generally in existence:
speaker-dependent and speaker-independent. With speaker-dependent systems,
users must read aloud into the microphone a paragraph that is displayed on the
computer screen. The purpose of reading this specific paragraph is to require
the user to speak words with hard-to-catch sounds. The computer uses the
resulting sound database to match words when a document is dictated. With
speaker-independent systems, the computer needs no training; however, there is
a higher error rate than with speaker-dependent systems.

Speech recognition is currently being used in banking, business,
education, engineering, medicine, insurance, law, manufacturing, and
transportation. Many people have already used speech recognition input when
placing a collect call. Do you recall a recording that says, "Please say
Collect, Calling Card, Person to Person, or Operator now"? Another popular use
of speech recognition is in schools to assist disabled students. Students who
have vision problems or who cannot physically use a keyboard can do nearly
comparable work using speech recognition devices.

Hardware and software companies are frantically experimenting to perfect
speech recognition. In September 1997, IBM introduced ViaVoice, its first
general-purpose, natural-language speech product. Microsoft invested $45
million in Lernout & Hauspie, a speech-technology company whose products
include Kurzweil Voice Commands for Microsoft Word. This package allows
users to edit and format Word documents by speaking voice commands such as
“insert a table with three rows and four columns.” Both of these products
allow a user to quickly speak his or her text and not worry about spelling
errors. The latest products are Kurzweil's Voice 2.0 and IBM's VoiceType
Dictation for Windows 95. The average price of this type of software is $700
(Powell, 1996).

Other hardware and software companies, such as Xybernaut in Fairfax,
Virginia, have developed a computer that is clipped to one’s belt; the user
views the screen on a small display attached to a headpiece. Commands are
given via a tiny microphone also attached to the headpiece. This system gives
users (such as telephone line repair people) hands-free usage (Radosevich,
1997). Charles Schwab Trading Service uses yet another speech recognition
application. Its system allows customers to use the phone to speak the name
of a stock or mutual fund and then receive a quote.

Dragon NaturallySpeaking Legal Suite is another system that can be used
to dictate naturally and directly. Words are immediately transcribed on the
screen. Legal briefs, time and billing records, correspondence, and e-mail can
be quickly created. With NaturallySpeaking Legal Suite, which costs
approximately $1000, a user can work directly with Microsoft Word or Corel
WordPerfect. Shure Voice Recognition and Speech Input Microphones, another
manufacturer of speech recognition software and hardware, offers several
different models of microphones and cables. Some of these microphones are worn
as a headpiece with a transmitter or as a lightweight over-the-ears frame
design. In support of the prediction that speech recognition will be a part of
basic hardware systems, the new version of OS/2 Warp will have voice
recognition built into the operating system (Gilbert, 1996).

So why has speech recognition not caught on yet? Problems such as
understanding different types of voices (male versus female) and different
dialects, and distinguishing between background noise and commands, have been
difficult to resolve. Other voice differences include pitch, volume, accent,
and inflection. If a user has to make corrections in every sentence, time and
efficiency are lost (Highland, 1997). Voice recognition products must be
capable of analyzing the input, comparing it with a model, and deciding what
was said.

Future Outlook?

For voice recognition to become widespread, it must be highly responsive
to different voices. In most cases, the ability of the software to interpret
naturally spoken sentences, rather than carefully pronounced single words, is
the area that needs the most improvement. This improvement in the software can
be accomplished by allowing several different patterns of phonemes to make up
a given word, which will increase the size of the database needed. Computers
are currently being manufactured with such speed and memory capacity that
these phoneme databases will not create a burden on the hardware. For speech
recognition to work smoothly, software must also be able to understand grammar
rules, practices, and structures. Sarcasm, humor, rhetorical questions, etc.
would also need to be interpreted correctly.
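
The idea of allowing several phoneme patterns per word, and the database growth that follows from it, can be sketched as below. The words, the phoneme symbols, and the PRONUNCIATIONS table are hypothetical examples chosen for illustration; they are not taken from any product or source cited in this paper.

```python
# Hypothetical lexicon: each word lists several acceptable phoneme patterns.
PRONUNCIATIONS = {
    "tomato": [("t", "uh", "m", "ay", "t", "oh"), ("t", "uh", "m", "ah", "t", "oh")],
    "either": [("ee", "th", "er"), ("eye", "th", "er")],
    "data": [("d", "ay", "t", "uh"), ("d", "a", "t", "uh")],
}


def build_index(pronunciations):
    """Map every phoneme pattern to its word so each variant can be looked up."""
    index = {}
    for word, patterns in pronunciations.items():
        for pattern in patterns:
            index[pattern] = word
    return index


INDEX = build_index(PRONUNCIATIONS)


def recognize(phonemes):
    """Return the word for an exact phoneme pattern, or None if it is unknown."""
    return INDEX.get(tuple(phonemes))
```

Both recognize(["ee", "th", "er"]) and recognize(["eye", "th", "er"]) return "either". The cost is visible in the index size: three words need six entries here, which is the database growth that the paragraph above describes.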

In Japan, technology has been developed that allows a user to place a
device in his or her mouth and then make guttural sounds. The software then
translates those murmurs into words. Also in the works is the ability to input
data from a device other than a microphone. Still in the experimental stage is
the use of prerecorded digital sound files and analog sound from a tape
recorder (Sobolewski, 1998).


When perfected, voice recognition will be a tremendous help with
security issues. People will be able to speak into a device as opposed to
typing a password. Another benefit will be input speed, since most people can
speak faster than they type. One author stated that he is baffled that he has
a machine that is 300 times faster than the one he had in 1982, but he still
cannot input data any faster. The demand continues for computer manufacturers
to build faster, more capable, and cheaper computers. In the near future,
voice recognition technology will be widely available, and the technical bugs
will be worked out. Most computers will then come with a keyboard and a voice
input device as standard hardware.

Is the keyboard an input device of the past?


Many teachers wonder if keyboarding as a subject is a thing of the past.
Currently, people who lack keyboarding proficiency are at a disadvantage
(Highland, 1997). Some experts have drawn parallels between speech recognition
technology and keyboarding like this: Did the television replace movies? Did
the telephone replace written communication? Did the microwave replace the
oven? Has e-mail replaced the U.S. Postal Service? In each of these instances,
the new technology complemented the old.


Technology students will need fundamental training on the keyboard
before using the computer. Proofreading will take on a more critical role, and
within 12 hours of classroom instruction, most students will be introduced to
speech recognition. Those who suggest anybody can teach keyboarding do not
fully understand the importance of this skill (Highland, 1997). Students will
need to be proficient with the keyboard and with speech recognition
technology. Keyboarding teachers will need to focus on what students will need
to be successful in future educational endeavors and work. Speech recognition
will be a part of the near future, and the job of teaching input or
keyboarding will again be expanded.


In conclusion, speech recognition will revolutionize the entire field of
human-computer interaction. People will be free to move about in relatively
close proximity to a computer rather than being confined to their chair and
desk. Input speeds will increase because of the ability to speak rather than
key. Eventually, the computer could act on our behalf, such as searching for
information on the Internet while we sleep or go to work. After this step,
true artificial intelligence will not be too far behind.

References

Gilbert, H. (1996). Input Devices. [On-line]. Available:
http://talay.psu.ac.th/classes/his/input.htm

Highland, P. (October 1997). Voice recognition technology. Business
Education Forum, 30-32.

Machowski, M. (1998). Speech recognition and Natural Language
Processing as a highly effective means of human-computer interaction.
[On-line]. Available:
http://www.cyber-north.com/voicerecognition/speech.html

Powell, J. E. (1996). Friends, Romans, PC/s: Lend me your ears.
Windows, 88-90.

Radosevich, L. (November 10, 1997). Alternative inputs gain ground. In
Enterprise Computing [On-line]. Available:
http://www.infoworld.com/cgi-bin/displayArchive.pl?/97/45/e02-45.131.htm

Computer Microphone Products and Accessories. Shure Voice
Recognition and Speech Input Microphone. (1998) [On-line]. Available:
http://www.shure.com/computer.html

Smart Practices. Dragon NaturallySpeaking Legal Suite. Lexington, KY
[On-line]. Available:
http://www.iglou.com/vrsky/legsuite.htm

Sobolewski, P. (1998). Frequently Asked Questions. In Speech
Recognition [On-line]. Available:
http://www.lang.duke.edu/edtech/software/spchrec/SRFAQ.htm

Sobolewski, P. (1998). General Minimum Requirements. In Speech
Recognition [On-line]. Available:
http://www.lang.duke.edu/edtech/software/spchrec/minreq.htm