
AI and Robotics

Nov 17, 2013


Lesson 3: Hearing Things


In this module, you will learn:

- What speech recognition is
- How to use speech recognition with the NAO Framework
- English speech recognition: online and offline
- Vietnamese speech recognition: online and offline


Contents:

- Speech Recognition on NAO
- Speech Recognition on NAO Framework
- Task 1: English Speech Recognition
- Task 2: Vietnamese Speech Recognition



Speech Recognition on NAO:

Humans frequently communicate through speech. For example, a common greeting when we meet someone is "hi" or "how are you?" We process speech automatically, and understand the meaning of the words we hear nearly instantaneously. On a robot, this process is more involved. The NAO humanoid robot has microphones on its head that it uses to listen to sounds around it.


However, unlike our ears that listen for sounds all the time, the NAO has to be programmed to listen for sounds at specific times. After it hears human speech, the NAO performs speech recognition with an algorithm to convert what it hears into words that it knows.


To do so, the NAO requires a library of words that it expects to hear. For example, the library can contain two words, "yes" and "no". When the NAO processes the sounds it hears, it will classify them as either "yes", "no", or neither of the two. You may have had experience with a similar system when using automated phone services or voice control on your cellphone, where you are given a list of options that you can speak to select.


Once a word is recognized, the NAO can then be programmed to react in different ways. After hearing "yes", the NAO could reply with "I am happy", and after hearing "no", the NAO could say "I am sad". If the NAO doesn't understand the words (it did not sound like "yes" or "no") then the NAO could reply "I don't know." This is called a conditional on the robot, and we will go into more detail in the tasks below.
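The yes/no conditional described above can be sketched in plain Java. The class and method names here are illustrative, not part of the NAO API:

```java
import java.util.Arrays;
import java.util.List;

public class YesNoConditional {
    // The word library the robot listens for.
    static final List<String> VOCABULARY = Arrays.asList("yes", "no");

    // Map a recognized word to the robot's reply; anything outside
    // the vocabulary falls through to "I don't know".
    static String replyFor(String heard) {
        if (!VOCABULARY.contains(heard)) {
            return "I don't know";
        }
        return heard.equals("yes") ? "I am happy" : "I am sad";
    }

    public static void main(String[] args) {
        System.out.println(replyFor("yes"));   // I am happy
        System.out.println(replyFor("hello")); // I don't know
    }
}
```

On the real robot, `replyFor` would be called from the speech recognition event handler instead of `main`.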


Speech Recognition on NAO Framework:

The NAO Framework provides two choices for speech recognition:

- On Android phone:

If our application uses speech recognition on an Android phone, the picture below shows the programming flow:

Advantage:

- Can use popular speech recognition engines, such as Google's.

Disadvantage:

- On NAO Robot:

If we want to use the speech recognition service on NAO, we first have to register the service on the NAO robot via the NAO Framework.


Task 1: English Speech Recognition

In this lesson, we will learn how to use speech recognition using NAO Framework. We will
program the NAO to recognize the question about robot name and to give response.

- Option 1: Using speech recognition on Android phone

Use the Android Speech Recognition API (http://developer.android.com/reference/android/speech/package-summary.html).

- Option 2: Using speech recognition on NAO robot

Step 1. Create new Android project using Robot Activity template.

Step 2. Open res/layout/activity_main.xml in your Android project and replace its content with the following:

File: res/layout/activity_main.xml
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent" >

    <LinearLayout
        android:layout_width="fill_parent"
        android:layout_height="fill_parent"
        android:orientation="vertical" >

        <LinearLayout
            android:layout_width="fill_parent"
            android:layout_height="wrap_content"
            android:orientation="horizontal" >

            <Button
                android:id="@+id/btSpeechRecognitionStart"
                android:layout_width="fill_parent"
                android:layout_height="fill_parent"
                android:layout_weight="1"
                android:text="Start Recognition" />

            <Button
                android:id="@+id/btSpeechRecognitionStop"
                android:layout_width="fill_parent"
                android:layout_height="fill_parent"
                android:layout_weight="1"
                android:text="Stop Recognition" />

        </LinearLayout>

    </LinearLayout>

</RelativeLayout>


The UI is very simple: one LinearLayout organizes the two buttons. Note the ids of the two buttons, btSpeechRecognitionStart and btSpeechRecognitionStop, which we will use in our Java code.


Step 3: Register RobotEvent to handle the result.

Use RobotEventReceiver.register() to register events from the NAO robot.

Step 4: Setting the language to use.

In this step, we set the language, the language parameters, and the sentence list.

- Use RobotSpeechRecognition.setVisualExpression() to set the visual expression.
- Use RobotSpeechRecognition.setAudioExpression() to set the audio expression.
- Use RobotSpeechRecognition.setVocabulary() to set the vocabulary.
- Use RobotSpeechRecognition.getAvailableLanguages() to get all available languages.
- Use RobotSpeechRecognition.getCurrentLanguage() to get the current language.
- Use RobotSpeechRecognition.setCurrentLanguage() to set the current language.


Step 5: Subscribe the speech recognition event.

We have to subscribe to the speech recognition event to use it on the NAO robot, via the RobotEventSubscriber.subscribeEvent() function.



Step 6: Start the recognition process and handle the result.

After that, we can use speech recognition on NAO: speak a sentence from the sentence list above, and the NAO robot will recognize it and return the result.

Step 7: Unsubscribe the speech recognition event.

To stop the speech recognition module, you have to unsubscribe the robot event which you subscribed above, via the RobotEventSubscriber.unsubscribeEvent() function.
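Steps 3 to 7 amount to a register/subscribe/recognize/unsubscribe lifecycle. The sketch below imitates that flow with a minimal stand-in class; the real RobotEventSubscriber and RobotSpeechRecognition classes ship with the NAO Framework and their exact signatures may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the NAO Framework recognition lifecycle;
// only the call order is meant to match the steps above.
public class RecognitionLifecycle {
    interface WordListener {
        void onWordRecognized(String word);
    }

    private final List<WordListener> listeners = new ArrayList<>();
    private List<String> vocabulary = new ArrayList<>();

    // Step 4: set the vocabulary the robot listens for.
    void setVocabulary(List<String> words) {
        vocabulary = new ArrayList<>(words);
    }

    // Step 5: subscribe to the speech recognition event.
    void subscribeEvent(WordListener l) {
        listeners.add(l);
    }

    // Step 7: unsubscribe when recognition is no longer needed.
    void unsubscribeEvent(WordListener l) {
        listeners.remove(l);
    }

    // Step 6: simulate the robot hearing a sound; subscribers are
    // notified with a word from the vocabulary, or "unknown".
    void hear(String sound) {
        String word = vocabulary.contains(sound) ? sound : "unknown";
        for (WordListener l : listeners) {
            l.onWordRecognized(word);
        }
    }
}
```

A subscribed listener would, for example, make the robot answer "My name is NAO" when the recognized word matches the name question in the vocabulary.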


The State-Machine of NAO Speech Recognition.


Task 2: Vietnamese Speech Recognition

This next exercise is similar to the previous task, but uses Vietnamese speech recognition.

- Option 1: Using online speech recognition:

Step 1: Create new Android project using Robot Activity template.

Step 2: Initiating a Recognition.

Before you use speech recognition, ensure that you have set up the core Speech Kit library with the SpeechKit.initialize() method.

Then create and initialize a Recognizer object:

recognizer = sk.createRecognizer(Recognizer.RecognizerType.Dictation,
        Recognizer.EndOfSpeechDetection.Short,
        "en_US", this, handler);

The SpeechKit.createRecognizer method initializes a recognizer and starts the speech recognition process.

- The type parameter is a String, generally one of the recognition type constants defined in the Speech Kit library and available in the class documentation for Recognizer. Nuance may provide you with a different value for your unique recognition needs, in which case you will enter the raw String.



- The detection parameter determines the end-of-speech detection model and must be one of the Recognizer.EndOfSpeechDetection types.


- The language parameter defines the speech language as a string in the format of the ISO 639 language code, followed by an underscore "_", followed by the ISO 3166-1 country code.
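As an aside, the standard java.util.Locale class produces codes in exactly this ISO 639 + "_" + ISO 3166-1 shape, which can help avoid typos when building the language string:

```java
import java.util.Locale;

public class SpeechLanguage {
    // Build a recognizer language code such as "en_US" or "vi_VN"
    // from an ISO 639 language code and an ISO 3166-1 country code.
    static String code(String iso639, String iso3166) {
        return new Locale(iso639, iso3166).toString();
    }

    public static void main(String[] args) {
        System.out.println(code("en", "US")); // en_US
        System.out.println(code("vi", "VN")); // vi_VN
    }
}
```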



- The this parameter defines the object to receive status, error, and result messages from the recognizer. It can be replaced with any object that implements the RecognizerListener interface.


- handler should be an android.os.Handler object that was created with Handler handler = new Handler();. Handler is a special Android object that processes messages. It is needed to receive call-backs from the Speech Kit library. This object can be created inside an Activity that is associated with the main window of your application, or with the windows or controls where voice recognition will actually be used.

Start the recognition by calling start().

The Recognizer.Listener passed into SpeechKit.createRecognizer receives the recognition results or error messages, as described below.

Step 3: Receiving Recognition Results

To retrieve the recognition results, implement the Recognizer.Listener.onResults method. For example:

public void onResults(Recognizer recognizer, Recognition results) {
    currentRecognizer = null;
    int count = results.getResultCount();
    Recognition.Result[] rs = new Recognition.Result[count];
    for (int i = 0; i < count; i++) {
        rs[i] = results.getResult(i);
    }
    setResults(rs);
}


This method will be called only on successful completion, and the results list will have zero or more results.

Even in the absence of an error, there may be a suggestion, present in the recognition results object, from the speech server. This suggestion should be presented to the user.

Step 4: Using Prompts

Prompts are short audio clips or vibrations that are played during a recognition. Prompts may be played at the following stages of the recognition:

- Recording start: the prompt is played before recording. The moment the prompt completes, recording will begin.
- Recording stop: the prompt is played when the recorder is stopped.
- Result: the prompt is played if a successful result is received.
- Error: the prompt is played if an error occurs.

The SpeechKit.defineAudioPrompt method defines an audio prompt from a raw resource ID packaged with the Android application. Audio prompts may consume significant system resources until release is called, so try to minimize the number of instances. The Prompt.vibrate method defines a vibration prompt. Vibration prompts are inexpensive; they can be created on the fly as they are used, and there is no need to release them.

Call SpeechKit.setDefaultRecognizerPrompts to specify default audio or vibration prompts to play during all recognitions by default. To override the default prompts in a specific recognition, call setPrompt prior to calling start.

Step 5: Handling Errors

To be informed of any recognition errors, implement the onError method of the Recognizer.Listener interface. In the case of errors, only this method will be called; conversely, on success this method will not be called. In addition to the error, a suggestion, as described in the previous section, may or may not be present. Note that both the Recognition and the SpeechError classes have a getSuggestion method that can be used to check for a suggestion from the server.

Example:

public void onError(Recognizer recognizer, SpeechError error) {
    if (recognizer != currentRecognizer) return;
    currentRecognizer = null;

    // Display the error + suggestion in the edit box
    String detail = error.getErrorDetail();
    String suggestion = error.getSuggestion();
    if (suggestion == null) suggestion = "";
    setResult(detail + "\n" + suggestion);
}


Step 6: Managing Recording State Changes

Optionally, to be informed when the recognizer starts or stops recording audio, implement the onRecordingBegin and onRecordingDone methods of the Recognizer.Listener interface. There may be a delay between initialization of the recognizer and the actual start of recording, so the onRecordingBegin message can be used to signal to the user when the system is listening.

public void onRecordingBegin(Recognizer recognizer) {
    // Update the UI to indicate the system is now recording
}


The onRecordingDone message is sent before the speech server has finished receiving and processing the audio, and therefore before the result is available.

public void onRecordingDone(Recognizer recognizer) {
    // Update the UI to indicate that recording has stopped and
    // the speech is still being processed
}


This message is sent both with and without end-of-speech detection models in place. The message is sent regardless of whether recording was stopped by calling the stopRecording method or by detecting end-of-speech.


The state machine:


- Option 2: Using offline speech recognition:

Step 1. Create new Android project using Robot Activity template.

Step 2. Open res/layout/activity_main.xml in your Android project and replace its content with the following:

File: res/layout/activity_main.xml

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    android:background="@color/white" >

    <EditText
        android:id="@+id/EditText01"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content"
        android:layout_weight="1"
        android:contentDescription="Recognition results"
        android:text="Text goes here..."
        android:textColor="@color/black" >
    </EditText>

    <ProgressBar
        android:id="@+id/progressbar_level"
        style="?android:attr/progressBarStyleHorizontal"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_weight="1" />

    <Button
        android:id="@+id/Button01"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_weight="0"
        android:layout_gravity="center_horizontal"
        android:text="Start" >
    </Button>

</LinearLayout>


The UI is very simple: one LinearLayout organizes the button, progress bar, and text view. Note the ids of the button (Button01) and the text view (EditText01), which we will use in our Java code.


Step 3. Create the speech recognizer using:

SphinxSpeechRecognizer createSpeechRecognizer(Context context, ArrayList<String> grammarContent, String modelPath)

@param context: current context
@param grammarContent: sentences used in your application
@param modelPath: path to language models in the Assets folder

Note: We have to create a string list including all the sentences we want to use. Then we have to initialize the recognizer and set the listener to handle results and errors after running recognition.
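As an illustration, the grammarContent sentence list could be built as below. The sentences and the model path are placeholders of our own choosing, not values prescribed by the framework:

```java
import java.util.ArrayList;

public class GrammarContent {
    // Sentences the offline recognizer should listen for; written
    // without diacritics here purely as placeholder examples.
    static ArrayList<String> build() {
        ArrayList<String> grammar = new ArrayList<>();
        grammar.add("xin chao");      // "hello"
        grammar.add("ban ten la gi"); // "what is your name"
        grammar.add("tam biet");      // "goodbye"
        return grammar;
    }
}

// Hypothetical call, following the signature above:
// SphinxSpeechRecognizer recognizer =
//     createSpeechRecognizer(context, GrammarContent.build(), "models/vn");
```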

Step 4. Use prepareSpeechRecognizer() and startSpeechRecognizer() to prepare and start the recognition process.

Step 5. Use stopSpeechRecognizer() to stop the recognition process and handle the result in onResults(Bundle b).


The State-Machine of VN Speech Recognition.