Natural Language Processing

and

Speech Enabled Applications


by Pavlovic Nenad

2

Presentation Content

What is natural language processing


Speech synthesis


Speech recognition


Natural language understanding

Basic concepts and terms

Types of speech recognition engines

Hardware requirements

How speech recognition/synthesis works

Speech enabled applications

Applications of speech enabled systems

Commercial & non-commercial software

3

Natural language processing

Natural Language Processing (NLP) or Computational Linguistics (CL) "is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty" [1].

"It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that is aiming at computational models of human cognition" [1].

4

Natural Language Processing

In other words, NLP is a discipline that aims to build computer systems able to analyze, understand, and generate human speech.

The NLP sub-areas of research are therefore:

Speech Recognition (speech analysis),

Speech Synthesis (speech generation), and

Natural Language Understanding (NLU).

5

Speech Recognition & Synthesis


Speech recognition is the process of converting spoken language into written text or some similar form.

Speech synthesis is the process of converting text into spoken language.
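As an illustration (not from the original slides), a minimal Python sketch of both processes, assuming the third-party pyttsx3 and SpeechRecognition packages are installed:

# A minimal sketch, assuming the third-party packages
# pyttsx3 (synthesis) and SpeechRecognition (recognition)
# are installed: pip install pyttsx3 SpeechRecognition
import pyttsx3
import speech_recognition as sr

# Speech synthesis: text in, spoken language out.
engine = pyttsx3.init()
engine.say("Welcome to the speech demo.")
engine.runAndWait()

# Speech recognition: spoken language in, written text out.
recognizer = sr.Recognizer()
with sr.Microphone() as source:          # requires PyAudio
    audio = recognizer.listen(source)
try:
    print(recognizer.recognize_google(audio))  # sends audio to a web service
except sr.UnknownValueError:
    print("Utterance not recognized.")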

6

Natural Language Understanding

Natural Language Understanding (NLU) is the process of analyzing recognized words and transforming them into data meaningful to a computer.

In other words, NLU is a computer-based system that "understands" human language.

NLU is used in combination with speech recognition.

7

Basic Terms and Concepts

Utterance is any stream of speech between two periods of silence.

Pronunciation is what the speech engine thinks a word should sound like.

Grammars define a domain (of words) within which the recognition engine works.

Vocabulary (dictionary) is a list of words (utterances) that can be recognized by the speech recognition engine (a small sketch of these terms follows below).

Training is the process of adapting the recognition system to a speaker.
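To make these terms concrete, a small self-contained Python sketch; the command grammar and vocabulary shown are hypothetical examples:

# Illustrative only: a toy grammar and vocabulary (hypothetical examples).
# The grammar defines the accepted phrases; the vocabulary lists the
# words the engine can recognize at all.
GRAMMAR = {"call george", "transfer money", "account status"}
VOCABULARY = {"call", "george", "transfer", "money", "account", "status"}

def in_vocabulary(utterance: str) -> bool:
    """Every word of the utterance must be in the vocabulary."""
    return all(word in VOCABULARY for word in utterance.lower().split())

def recognize(utterance: str):
    """An utterance (speech between two silences) must match the grammar."""
    text = utterance.lower()
    return text if in_vocabulary(text) and text in GRAMMAR else None

print(recognize("account status"))    # 'account status'
print(recognize("close my account"))  # None: words outside the domain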

8

Basic Terms and Concepts


Accuracy is the measure of a recognizer's ability to correctly recognize utterances (a word error rate sketch follows below).

Speaker Dependence

A speaker dependent system is designed for only one user (at a time).

A speaker independent system is designed for a variety of speakers.
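Accuracy is commonly reported through word error rate (WER); a minimal sketch of the standard edit-distance formulation:

# Illustrative: accuracy is commonly reported as word error rate (WER),
# the word-level edit distance between reference and recognized text.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four: WER = 0.25, i.e. 75% accuracy.
print(word_error_rate("please say your name", "please say her name"))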

9

Types Of Speech Recognition

Speech recognizers are divided into several classes according to the type of utterance that they can recognize:


Isolated words,


Connected words,


Continuous speech (computer dictation)


Spontaneous speech


Voice Verification


Voice Identification

10

Hardware Requirements


Natural Language Processing requires strong systems in order to work accurately and with minimum response time.

The important hardware parts are:


Sound Card


Microphone


Processor/RAM

11

How does speech synthesis work?


There are five major steps in the process of
speech synthesis:


Structure analysis: process the structure of the input text.

Text pre-processing: analyze the input text for special constructs of the language.

Text-to-phoneme conversion: convert each word to phonemes (e.g. "times" = "t ay m s").

Prosody analysis: determine the appropriate prosody for the sentence (e.g. pitch, timing, pausing, etc.).

Waveform production: use the phoneme and prosody information to produce the audio waveform (a sketch of the text-to-phoneme step follows below).
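A minimal sketch of the text-to-phoneme step alone; the tiny hand-written dictionary is hypothetical (real engines use large pronunciation lexicons plus letter-to-sound rules for unknown words):

# Illustrative sketch of text-to-phoneme conversion only.
PHONEME_DICT = {
    "times": ["t", "ay", "m", "s"],   # the slide's own example
    "two":   ["t", "uw"],
}

def text_to_phonemes(text: str) -> list:
    phonemes = []
    for word in text.lower().split():
        if word in PHONEME_DICT:
            phonemes.extend(PHONEME_DICT[word])
        else:
            phonemes.append("<oov>")  # out-of-vocabulary marker
    return phonemes

print(text_to_phonemes("two times"))  # ['t', 'uw', 't', 'ay', 'm', 's']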

12

How does speech recognition work?


The basic characteristics of the most widely used speech recognizers are:

Mono-lingual,

Process a single input at a time,

Can optionally adapt to the voice of the speaker,

Grammars can be dynamically updated, and

Have a small defined set of properties.

13

How does speech recognition work?

1. Grammar design: defines the words that may be spoken by a user and the pattern in which they may be spoken.

2. Signal processing: analyzes the spectrum (frequency) characteristics of the incoming audio, and holds the knowledge of the environment (how the user pronounces phonemes) in a user profile.

3. Phoneme recognition: compares spectrum patterns to the patterns of the phonemes.

4. Word recognition: compares the sequence of likely phonemes against the words and patterns of words specified by the grammar (a sketch of this step follows below).

5. Result generation: provides the information about the words that the recognizer has detected.
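A minimal sketch of the word recognition step; the pronunciations and the overlap scoring are simplified, hypothetical stand-ins for real acoustic matching:

# Illustrative sketch of step 4: match a sequence of likely phonemes
# against the pronunciations of the words allowed by the grammar.
WORD_PRONUNCIATIONS = {
    "transfer": ("t", "r", "ae", "n", "s", "f", "er"),
    "status":   ("s", "t", "ae", "t", "ah", "s"),
}

def recognize_word(phonemes: tuple) -> str:
    """Pick the grammar word whose pronunciation overlaps the most."""
    def overlap(p):
        return sum(a == b for a, b in zip(p, phonemes)) - abs(len(p) - len(phonemes))
    return max(WORD_PRONUNCIATIONS, key=lambda w: overlap(WORD_PRONUNCIATIONS[w]))

# A slightly noisy phoneme sequence still maps to "status".
print(recognize_word(("s", "t", "ae", "d", "ah", "s")))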

14

Speech Enabled Applications - 1


The primary aim of speech enabled applications is to improve the interaction between user and machine.

For this purpose, speech recognition and synthesis are used either together or separately, depending mostly on the type of application and its purpose.

15

Speech Enabled Applications - 2



Speech synthesis is fairly easy to use. After setting up the "type" of voice, the speed of "speaking", the duration of the pause between sentences, and so on, the speech synthesis engine is ready for use (a configuration sketch follows below).
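A configuration sketch, assuming the third-party pyttsx3 package; the property values chosen are arbitrary examples:

# Configuring a synthesis engine before use (pyttsx3; values are examples).
import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty("voices")
engine.setProperty("voice", voices[0].id)  # the "type" of voice
engine.setProperty("rate", 150)            # speed of "speaking" (words/min)
engine.setProperty("volume", 0.9)          # output volume, 0.0-1.0

# Once configured, the engine is ready for use.
engine.say("Your account balance is one hundred euros.")
engine.runAndWait()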

16

Speech Enabled Applications - 3


Applying speech recognition requires careful analysis of what the possible inputs to the system could be, and of the way in which the user provides the input.

The way in which the user provides the input to the system, and the way the application responds to the user, is called Natural Language Dialog.

Choosing the Natural Language Dialog is the first decision that the developer must make.
17

Natural Language Dialog - 1


Three essential types of interaction that
are available to software applications are:



Direct dialog,


Mixed initiative dialog, and


Natural dialog.

18

Natural Language Dialog - 2

Direct Dialog

The interaction directs the user to perform a specific task by asking for information at each turn and expecting specific words or phrases in response (a minimal dialog loop is sketched below).

System: "Welcome to ABC bank customer services system. Please say your name."

User: "Nenad Pavlovic"

System: "Please say your account number."

User: "1234-123-12332-1233"

System: "Would you like to perform a transfer or to see the status on your account?"

User: "Transfer.", etc.
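A minimal sketch of such a direct dialog, with input()/print() standing in for speech recognition and synthesis:

# Illustrative direct dialog: one piece of information per turn,
# with a specific kind of answer expected at each step.
def direct_dialog():
    print("Welcome to ABC bank customer services system.")
    name = input("Please say your name: ")
    account = input("Please say your account number: ")
    choice = ""
    while choice not in ("transfer", "status"):   # accept only these words
        choice = input("Transfer or status? ").strip().lower()
    print(f"{name} ({account}): starting {choice}...")

direct_dialog()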

19

Natural Language Dialog - 3

Mixed initiative dialog

Is similar to the previous interaction dialog, but it gives the speaker some freedom: it allows the user to have as much or as little control as s/he desires (a slot-extraction sketch follows below).

System: "Welcome to ABC bank customer services system. Please say your name."

User: "My name is Nenad Pavlovic, and my account number is: 1234-123-12332-1233"

System: "Would you like to perform a transfer or to see the status on your account?"

User: "Show me the status and then go to transfers.", etc.
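A minimal sketch of the slot filling behind mixed initiative; the regular expressions are hypothetical:

# Illustrative mixed initiative: the user may answer several prompts at
# once, so the system extracts whatever slots are present and only asks
# for the ones still missing.
import re

def extract_slots(utterance: str, slots: dict) -> dict:
    name = re.search(r"my name is ([a-z ]+?)(?:,|$| and)", utterance, re.I)
    account = re.search(r"(\d[\d-]+\d)", utterance)
    if name:
        slots["name"] = name.group(1).strip()
    if account:
        slots["account"] = account.group(1)
    return slots

slots = {}
extract_slots("My name is Nenad Pavlovic, and my account number is: "
              "1234-123-12332-1233", slots)
print(slots)                                              # both slots filled
print([s for s in ("name", "account") if s not in slots]) # nothing left to ask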

20

Natural Language Dialog - 4

Natural dialog

Allows the user to enjoy a more unstructured interaction with an application (as natural as possible).

System: "Welcome to City Directory Dialer, how can I help you?"

User: "I'd like to call Mr. George Eleftherakis in Tsimiski building."

System: "George Eleftherakis, Tsimiski building. Is this correct?"

User: "Yes"

System: "George Eleftherakis is found in the directory. Calling…", etc.

21

Grammars vs. Statistical NLU

The more freedom the user is given to interact with the application, the more complex the processing of the input data becomes.

Depending on the complexity of the possible user inputs and the interaction dialog used, one of two implementation approaches will be chosen:

Grammar-based NLU

Statistical NLU
22

Grammars vs. Statistical NLU

Grammar-based NLU: relies on defining (creating) the grammar, which means constructing the phrases and stating all possible words that can be used (a sketch follows below).

Advantages: fast; allows freedom in constructing phrases.

Disadvantages: usable only for a small set of phrases and words; if a word or phrase is not defined, it will not be recognized.
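A minimal sketch of grammar-based NLU; the phrase patterns and intent names are hypothetical:

# Illustrative grammar-based NLU: every accepted phrase pattern is
# written out in advance and mapped to an intent; anything outside
# the grammar is rejected.
import re

GRAMMAR = [
    (re.compile(r"^(perform|make) a transfer$", re.I), "TRANSFER"),
    (re.compile(r"^(show|see) (me )?the status( on my account)?$", re.I), "STATUS"),
]

def parse(utterance: str):
    for pattern, intent in GRAMMAR:
        if pattern.match(utterance.strip()):
            return intent
    return None  # not in the grammar, so not recognized

print(parse("make a transfer"))      # TRANSFER
print(parse("show me the status"))   # STATUS
print(parse("what's my balance?"))   # None: phrase was never defined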

23

Grammars vs. Statistical NLU

Statistical NLU: relies on the use of a statistical model of utterances derived from actual conversation data (a sketch follows below).

Advantages: handles a huge set of phrases and words.

Disadvantages: slow; difficult to add new phrases.
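A minimal sketch of statistical NLU, assuming the third-party scikit-learn package; the tiny training set is made up for the example:

# Illustrative statistical NLU: an intent classifier is trained on
# example utterances instead of hand-written phrase patterns.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

utterances = [
    "make a transfer", "send money to my savings", "transfer funds",
    "show me the status", "what is my balance", "check my account",
]
intents = ["TRANSFER", "TRANSFER", "TRANSFER",
           "STATUS", "STATUS", "STATUS"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(utterances)
model = MultinomialNB().fit(X, intents)

# Phrases never written down explicitly can still be classified...
print(model.predict(vectorizer.transform(["i want to transfer money"])))
# ...but adding a new intent means collecting data and retraining.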

24

Uses of speech applications

Speech technology is mostly used in the following areas:


Dictation


Command and Control


Telephony


Wearables


Medical Disabilities


Embedded Applications

25

Speech Systems

Commercial


IBM’s ViaVoice (Linux, Windows, MacOS)


Dragon NaturallySpeaking (Windows)


Microsoft’s Speech Engine (Windows)


BaBear (Linux, Windows, MacOS)


SpeechWorks (Linux, Sparc & x86 Solaris, Tru64,
Unixware, Windows)

Non-commercial


OpenMind Speech (Linux)


XVoice (Linux)


CVoiceControl/KVoiceControl (Linux)


GVoice (Linux)

26

Conclusion

Developers' perspective: developing a speech enabled application does not require redesigning a system, or explicitly designing it to support speech. Speech support is treated as an "attached entity" and can be viewed as a separate module. Also, it does not require special linguistic or programming skills.

Business perspective: the use of speech enabled applications can noticeably improve the accuracy and effectiveness of employees who work with large amounts of data, with many people, or both.

Thank you


Pavlovic Nenad

pavlovic@city.academic.gr

28

References

[1] Radev, D. R. (2001), "Natural Language Processing FAQ", Columbia University, Dept. of Computer Science, NYC.