International Journal of Computer Trends and Technology (IJCTT), volume 4, Issue 9, Sep 2013
ISSN: 2231-2803, http://www.ijcttjournal.org, Page 3036



A Survey on Speech Recognition


V. Malarmathi, M.C.A. (1), Dr. E. Chandra, M.Sc., M.Phil., Ph.D. (2)

(1) Research Scholar, Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore, India.
(2) Director, Department of Computer Science, Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore-32, India.


Abstract

Speech is the most prominent and primary mode of communication among human beings. Communication between humans and computers is called the human-computer interface. Speech processing is the study of speech signals and of the methods used to process them. The signals are usually processed in a digital representation. Speech processing is closely tied to Natural Language Processing (NLP); an example is speech-to-text conversion. Since even before the time of Alexander Graham Bell's revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and human-to-machine communication. Our goal is to provide a useful introduction to the wide range of important concepts that comprise the field of digital speech processing. Speech processing seeks a more efficient representation of the speech signal. It is divided into various categories such as speech recognition, speaker recognition, speech coding, voice analysis, speech synthesis and speech enhancement. This paper mainly discusses speech recognition. Speech recognition, the process of speaking words into the computer and having text appear on the screen or having the computer perform various functions, is one of the most exciting and potential-filled technologies available for students with special needs.


Keywords: Speech Recognition, Automatic Speech Recognition (ASR), Voice Analysis, Speech Synthesis


I. INTRODUCTION


Speech recognition is the process of recognizing speech uttered by a speaker, and it has long been an active field of research. Voice communication is the most effective mode of communication used by humans. The significance of speech recognition lies in its simplicity. It can be used in many applications such as security devices, household appliances, cellular phones, ATMs and computers. Speech recognition is the process of translating spoken words into text on the computer. Through a speech recognition program or application, the computer is able to process the words you say and turn them into text on the screen.

In computer science, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition", "ASR", "computer speech recognition", "speech to text", or just "STT". Some SR systems use "training", where an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription. Systems that do not use training are called "speaker independent" systems. Systems that use training are called "speaker dependent" systems. Most speech recognition systems can be classified according to the following categories [1][2].


II. SPEECH RECOGNITION


A. Isolated-Word Recognition

Isolated-word recognition can be implemented as either a speaker-trained or a speaker-independent system. This technology opened up a class of applications called 'command-and-control' applications, in which the system is capable of recognizing a single-word command (from a small vocabulary of single-word commands) and responding appropriately to the recognized command. One key problem with this technology is its sensitivity to background noise (which was often recognized as spurious spoken words) and to extraneous speech inadvertently spoken along with the command word. Various types of 'keyword spotting' algorithms evolved to solve these types of problems [1][2].
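As a rough illustration of the command-and-control idea, the sketch below matches an utterance against a small vocabulary of single-word templates and rejects inputs that are not close enough to any of them, a crude stand-in for the rejection of noise and extraneous speech. The feature vectors, the `TEMPLATES` table, the `recognize_command` function and the threshold value are all hypothetical and chosen only for illustration; they are not taken from the systems surveyed here, which use real acoustic features and far more robust models.

```python
import math

# Hypothetical templates: each command word is represented by one made-up,
# fixed-length feature vector (an assumption for illustration only).
TEMPLATES = {
    "on": [1.0, 0.0, 0.0],
    "off": [0.0, 1.0, 0.0],
    "stop": [0.0, 0.0, 1.0],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_command(features, threshold=0.5):
    """Return the closest command word, or None if nothing is close enough."""
    best_word, best_dist = None, float("inf")
    for word, template in TEMPLATES.items():
        dist = euclidean(features, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    # Reject inputs that resemble no command (e.g. background noise).
    return best_word if best_dist <= threshold else None


print(recognize_command([0.9, 0.1, 0.0]))  # near the "on" template
print(recognize_command([0.5, 0.5, 0.5]))  # far from every template: rejected
```

The rejection threshold is the design choice that matters here: set too loosely, noise is accepted as a spurious command; set too tightly, valid commands are refused.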



B. Connected Word Recognition

Connected word recognition can be implemented as either a speaker-trained or a speaker-independent system. This technology was built on top of word recognition technology, exploiting the word models that were successful in isolated-word recognition and extending the modeling to recognize a concatenated sequence (a string) of such word models as a word string. This technology opened up a class of applications based on recognizing digit strings and alphanumeric strings, and led to a variety of systems for voice dialing, credit card authorization, directory assistance lookups, and catalog ordering [1][2].
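The idea of recognizing a concatenation of word models can be sketched with a small dynamic program. This is a toy under stated assumptions, not the method of any system cited here: the word models in `WORD_MODELS` are made-up strings of frame labels, the cost is a simple count of mismatched positions, and `decode_word_string` is a hypothetical helper name.

```python
# Hypothetical word models: each word is a short made-up sequence of frame
# labels. Real word models are statistical, not literal strings.
WORD_MODELS = {
    "one": "aab",
    "two": "bcc",
    "six": "dd",
}


def mismatch(segment, model):
    """Count positions where two equal-length sequences differ."""
    return sum(1 for s, m in zip(segment, model) if s != m)


def decode_word_string(frames):
    """Find the lowest-cost concatenation of word models covering all frames."""
    n = len(frames)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimum cost of explaining frames[:i]
    back = [None] * (n + 1)  # back[i]: (start index, word) of the last word
    best[0] = 0
    for end in range(1, n + 1):
        for word, model in WORD_MODELS.items():
            start = end - len(model)
            if start < 0 or best[start] == inf:
                continue
            cost = best[start] + mismatch(frames[start:end], model)
            if cost < best[end]:
                best[end], back[end] = cost, (start, word)
    if best[n] == inf:
        return None, inf  # no concatenation of models fits this length
    words, i = [], n
    while i > 0:
        start, word = back[i]
        words.append(word)
        i = start
    return list(reversed(words)), best[n]


print(decode_word_string("aabddbcc"))  # (['one', 'six', 'two'], 0)
```

The same word templates that would serve an isolated-word recognizer are reused unchanged; only the search over word boundaries is new.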


C. Continuous or Fluent Speech Recognition

Continuous or fluent speech recognition can be implemented as either a speaker-trained or a speaker-independent system. This technology led to the first large-vocabulary recognition systems, which were used to access databases (the DARPA Resource Management Task), to do constrained dialogue access to information, to handle very large vocabulary read speech for dictation (the DARPA NAB Task), and eventually for desktop dictation systems in PC environments [1][3].


D. Speech Understanding Systems

Speech understanding systems (so-called unconstrained dialogue systems) are capable of determining the underlying message embedded within the speech, rather than just recognizing the spoken words. Such systems, which have only recently begun to appear, enable services like customer care and intelligent agent systems that provide access to information sources through voice dialogues (the AT&T Maxwell Task) [1][2].


E. Spontaneous Conversation Systems

A spontaneous conversation system is able both to recognize the spoken material accurately and to understand its meaning. Such systems, which are currently beyond the limits of existing technology, will enable new services such as 'Conversation Summarization', 'Business Meeting Notes', 'Topic Spotting' in fluent speech (e.g., from radio or TV broadcasts), and ultimately even language translation services between any pair of existing languages [1][4].


F. Applications

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft control (usually termed Direct Voice Input) [1][2].

Applications also include the automation of complex operator-based tasks, e.g., customer care, dictation, form-filling applications, provisioning of new services, customer help lines, and e-commerce [3][1].


G. Issues in Speech Recognition

The term voice recognition refers to finding the identity of "who" is speaking, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process [6][2].
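A minimal sketch of the verification use case follows, assuming each speaker is summarized by a fixed-length feature vector (a "voiceprint") and that a claimed identity is accepted when a new sample is similar enough to the enrolled one. The vectors, the `verify_speaker` function and the 0.95 threshold are hypothetical illustrations and do not come from the cited references.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, sample, threshold=0.95):
    """Accept the claimed identity only if the sample matches the enrollment."""
    return cosine_similarity(enrolled, sample) >= threshold


# Made-up voiceprint recorded when the user enrolled.
enrolled_voiceprint = [0.2, 0.7, 0.1]

print(verify_speaker(enrolled_voiceprint, [0.22, 0.68, 0.12]))  # similar voice
print(verify_speaker(enrolled_voiceprint, [0.7, 0.1, 0.6]))     # different voice
```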

The goal is to convert a speech signal into a text message accurately and efficiently, independent of the device, the speaker, or the environment. The extracted speech features should be easy to measure and should be stable over time [3][1].
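The text does not name a measure of accuracy. One widely used measure, assumed here for illustration, is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("call home now", "call phone now"))
```

Computing the same measure across different devices, speakers and environments gives a simple check on how independent a recognizer is of those factors.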


III. AUTOMATIC SPEECH RECOGNITION (ASR) FEATURES


A. Advantages

Speech input is easy to perform because it does not require a specialized skill, as typing or push-button operation does. Information can be input even when the user is moving or doing other activities involving the hands, legs, eyes or ears. ASR is divided into two major categories: speaker-dependent and speaker-independent. Speaker-dependent ASR requires speaker training or enrollment prior to use: the primary user trains the speech recognizer with samples of his or her own speech. Speaker-independent ASR does not require speaker training prior to use; the speech recognizer is pre-trained during system development with speech samples from a collection of speakers [5][3].
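The difference between the two categories can be sketched with a toy nearest-centroid recognizer. Everything here is an assumption made for illustration: the two-dimensional feature vectors are invented, and `train` and `recognize` are hypothetical helpers, not part of any ASR toolkit. The same training routine is applied either to pooled samples from many speakers (speaker-independent) or to one user's enrollment samples (speaker-dependent).

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def train(samples):
    """Build a model {word: centroid} from {word: [feature vectors]}."""
    return {word: centroid(vectors) for word, vectors in samples.items()}


def recognize(model, features):
    """Return the word whose centroid is closest to the feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda word: squared_distance(model[word], features))


# Speaker-independent: pre-trained during development on a pool of speakers.
pooled_samples = {
    "yes": [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]],
    "no": [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]],
}
independent_model = train(pooled_samples)

# Speaker-dependent: the primary user enrolls with his or her own samples.
# This made-up user's "yes" is far from the pooled average.
enrollment_samples = {
    "yes": [[0.4, 0.6], [0.45, 0.55]],
    "no": [[0.1, 0.9], [0.0, 1.0]],
}
dependent_model = train(enrollment_samples)

utterance = [0.42, 0.58]  # the enrolled user saying "yes"
print(recognize(independent_model, utterance))  # misrecognized as "no"
print(recognize(dependent_model, utterance))    # recognized as "yes"
```

In this contrived case enrollment pays off because the user's voice is atypical; for a typical speaker the pre-trained model would work without any enrollment step.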


IV. CONCLUSION

In this review, we have discussed the types of speech recognition systems. We also presented the applications of speech recognition and the issues to be considered in such systems. Speech recognition technology has evolved for more than 40 years, spurred on by advances in signal processing, algorithms, architectures, and hardware. Today, high-quality speech recognition technology is available in the form of inexpensive software-only desktop packages (IBM ViaVoice, Dragon NaturallySpeaking, Kurzweil, etc.) and technology engines that run on either the desktop or a workstation [1].


REFERENCES

[1]. B. H. Juang, Ed., "The past, present, and future of speech processing", IEEE Signal Processing Magazine, 24-48, May 1998.

[2]. Murty, K.S.R., Yegnanarayana, B., "Epoch Extraction From Speech Signals", IEEE Transactions on Audio, Speech, and Language Processing, Nov. 2008, Volume: 16, Issue: 8, 1602-1613.

[3]. Kenneth Thomas Schutte, "Parts-based Models and Local Features for Automatic Speech Recognition", B.S., University of Illinois at Urbana-Champaign (2001), S.M., Massachusetts Institute of Technology (2003).

[4]. Bain, K., Paez, D., "Speech Recognition in Lecture Theatres", Proceedings of the Eighth Australian International Conference on Speech Science and Technology, Canberra, Australia (2000).

[5]. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall Inc., 1993.

[6]. H. A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic Publishers, 1994.