HCI 4 Assessment Exercise

Communication Assistant

Loïc Galland
2058121G

Table of contents

Aims of the system
Utility of the application
Relation with existing HCI systems
Design of the system
Design of the code
Organization of the code
Features description
GUI
Screenshots of the final interface
Menus descriptions
Absolute Layout versus Relative Layout
Desktop design versus mobile design
Simplicity versus simple
How to test this system
Reflection about the design process of working with new modalities
Future improvements
References






Aims of the system

My application should provide a simple interface to help people communicate. This simplicity is required so that users can easily switch between the two modes and respond to a question as quickly as possible. The ultimate goal of my application is to make a fluent conversation possible between two people who don't speak the same language, or when one of the participants has a physical disability.


This simplicity will be enhanced by adding a machine learning feature. The application will store the usual phrases and provide a list of them when the user wants to use the text-to-speech feature. This way, the user doesn't need to type the sentence, which saves time. Once again, it is a way to reduce the latency during a conversation.

Utility of the application

The goal of my application is to ease communication for people who are deaf, mute or foreigners. These people struggle to communicate with others for various reasons.


Deaf people have to read the lips of the people they talk to in order to understand what they say. This requires a lot of learning, and some people are more difficult to understand than others. It is even truer when the interlocutor speaks a foreign language. My application can replace the ears by using speech-to-text to overcome this problem.


Mute people use sign language to communicate with each other. The main disadvantage of this technique is obvious: few people without this disability know sign language, because it is a tough one to learn and its utility in daily life is limited.



Finally, foreigners can have trouble speaking or understanding when interacting with native speakers. So it can be very useful to have a translation application on a device which travels with you all the time. A dictionary application is generally used, but it takes time to type a sentence and the user may have misunderstood some words. My application gets rid of these problems because it interprets the interlocutor's words directly.


Moreover, it takes advantage of the mobile device itself: it gives you the opportunity to carry an interpreter everywhere. My application also has the advantage of unifying several features in one application. Normally, you would have to switch between three applications to do the same thing, which wastes time and quickly becomes frustrating.







Relation with existing HCI systems

This application is a multi-modal one. Indeed, it uses the microphone and the speakers to overcome human disabilities or lack of knowledge. The mobile device is used as a medium to communicate with other people.


There are already several applications on the Google Play store or the Apple App Store which use voice recognition and text-to-speech. Generally, they concentrate their efforts on only one of these features. Moreover, the goal of these applications is different from my project: they try to provide hands-free interaction with the phone. For example, you can easily send a message using voice recognition instead of typing it. The best-known example of an application using this feature is Siri, which is available natively on any iPhone 4S or newer and allows the user to control the phone using only the voice.


My application uses the voice recognition and text-to-speech features embedded in the Android SDK.

Design of the system

Design of the code

Organization of the code

General design of the application


The application is quite simple: it is only composed of three menus. The layouts of these menus are controlled by four layout XML files ("activity_main.xml", "options_menus.xml", "speech2text.xml" and "text2speech.xml"). Each of the last three files is dedicated to one menu, while "activity_main.xml" is used to manage the animation when the application switches between modes.



Concerning the source code, my application is composed of four classes. The first one, "MainActivity", controls nearly everything in the application. The second one, "OptionsActivity", controls the options menu. The third one, "GestureManager", is used to interpret the gestures performed on the screen and to take action accordingly. The last one, "OptionsSingleton", is a class which stores data accessible from any class (such as the language used, or whether the offline mode is activated).
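Since the report doesn't reproduce the source code, here is a minimal sketch of what "OptionsSingleton" might look like; the field and method names are assumptions, not the actual code.

// Hypothetical sketch of "OptionsSingleton": a single shared object that stores
// the options chosen by the user so that any class can read them.
public class OptionsSingleton {

    private static OptionsSingleton instance;

    private String language = "English"; // language selected in the options menu
    private boolean offlineMode = false; // simulate speech recognition when offline

    private OptionsSingleton() {
        // Private constructor: the instance can only be obtained through getInstance().
    }

    public static synchronized OptionsSingleton getInstance() {
        if (instance == null) {
            instance = new OptionsSingleton();
        }
        return instance;
    }

    public String getLanguage() { return language; }

    public void setLanguage(String language) { this.language = language; }

    public boolean isOfflineMode() { return offlineMode; }

    public void setOfflineMode(boolean offlineMode) { this.offlineMode = offlineMode; }
}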


[Figure: the three menus of the application (Speech to text, Text to speech, Options)]



Features description


To implement the text-to-speech feature, I used the "TextToSpeech" engine available natively in the Android SDK. My main activity overrides the method "onInit" to initialise this feature and to test whether the selected language is supported by the device. After these initial steps, the application can say anything with the accent associated with the chosen language. However, it doesn't take into account the different accents within the same language (i.e. you can't specify whether the accent is American or Scottish). If no text is inserted into the "EditText", the application will say "You haven't typed a text". If the application can't translate a sentence, it will say "No translation found".
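As an illustration, the initialisation described above could look roughly like the following sketch. The Android API calls (TextToSpeech, onInit, setLanguage, speak) are standard; everything else (names, the use of French) is an assumption.

import java.util.Locale;

import android.app.Activity;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.util.Log;

public class MainActivity extends Activity implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // The second argument is the OnInitListener: onInit() is called once the engine is ready.
        tts = new TextToSpeech(this, this);
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            // Language chosen in the options menu; French is used here as an example.
            int result = tts.setLanguage(Locale.FRENCH);
            if (result == TextToSpeech.LANG_MISSING_DATA
                    || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "The selected language is not supported on this device");
            }
        }
    }

    // Speaks the given text, or warns the user when the EditText is empty.
    private void speak(String text) {
        if (text == null || text.trim().isEmpty()) {
            tts.speak("You haven't typed a text", TextToSpeech.QUEUE_FLUSH, null);
        } else {
            tts.speak(text, TextToSpeech.QUEUE_FLUSH, null);
        }
    }
}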


To implement the voice recognition, I implemented two methods. The first one, "startVoiceRecognitionActivity", starts a new activity and prompts a window composed of a button and a text field showing the language used. When a sound is detected, the layout of the button changes, and the message is analysed when the user stops talking. This activity sends the raw data (the recorded sound) to Google servers, which respond with a list of possible sentences. After that, the second method ("onActivityResult") is called and the sentence with the best probability is displayed in the "TextView" at the top of the screen. Obviously this method requires an unrestricted internet connection, so I implemented an offline mode for testing purposes. Instead of using the voice recognition, it behaves as if it had heard the sentence "What is your name?".
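A minimal sketch of these two methods, based on the standard RecognizerIntent API, is shown below; the request code value and the name of the output TextView (resultTextView) are assumptions.

import java.util.ArrayList;

import android.content.Intent;
import android.speech.RecognizerIntent;

// Assumed to live inside MainActivity.
private static final int VOICE_RECOGNITION_REQUEST_CODE = 1234; // arbitrary request code

private void startVoiceRecognitionActivity() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now");
    startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == VOICE_RECOGNITION_REQUEST_CODE && resultCode == RESULT_OK) {
        // The recogniser returns the candidate sentences ordered by probability.
        ArrayList<String> matches =
                data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        if (matches != null && !matches.isEmpty()) {
            resultTextView.setText(matches.get(0)); // display the most probable sentence
        }
    }
    super.onActivityResult(requestCode, resultCode, data);
}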


The translation in my application doesn't use any specific feature of the Android SDK. The Google Translate API is quite effective but it isn't free, so I decided to simulate one. If a language other than English is chosen in the options menu, the application browses a HashMap whose keys are the English sentences and whose values are the equivalent French sentences. In the case of text-to-speech, the sentence is said with a French accent if French is chosen in the options menu.
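For illustration, the simulated dictionary could be implemented roughly as follows; the example entries match the sentences mentioned in the testing section, but the exact content and names of the real HashMap may differ.

import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// English sentences as keys, French equivalents as values (illustrative content).
private final Map<String, String> dictionary = new HashMap<String, String>();

private void initDictionary() {
    dictionary.put("what is your name", "comment t'appelles-tu ?");
    dictionary.put("my name is", "je m'appelle");
    dictionary.put("i'm twenty", "j'ai vingt ans");
}

// Returns the French translation, or the fallback message spoken by the application.
private String translate(String english) {
    String french = dictionary.get(english.toLowerCase(Locale.ENGLISH).trim());
    return (french != null) ? french : "No translation found";
}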


A small machine learning feature is used to ease the use of the application. In the text to speech menu, a scroll view can be used to quickly insert a usual sentence by selecting it from the list. The initial list is defined in the code; however, the first four entries are dynamic. They represent the four most used sentences in this mode, ordered by usage count. This feature is made possible by using a HashMap whose keys are the sentences and whose values are the number of times each one has been used.
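One possible way to maintain this counter and to refresh the top of the list, consistent with the HashMap described above, is sketched below; the method names are assumptions.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sentence as key, number of times it has been used as value.
private final Map<String, Integer> usageCount = new HashMap<String, Integer>();

// Called every time a usual sentence is selected in the scroll view.
private void recordUsage(String sentence) {
    Integer count = usageCount.get(sentence);
    usageCount.put(sentence, (count == null) ? 1 : count + 1);
}

// Returns the sentences ordered by usage count so the first four entries of the
// scroll view can be refreshed with the most used ones.
private List<String> sentencesByUsage() {
    List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(usageCount.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
        @Override
        public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
            return b.getValue().compareTo(a.getValue()); // most used first
        }
    });
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : entries) {
        result.add(entry.getKey());
    }
    return result;
}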


The "GestureManager" is used to provide a better interaction. If the user wants to switch between modes, instead of pressing a button (which takes up space on the screen), the user performs a horizontal gesture from right to left or from left to right.
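The "GestureManager" class itself isn't reproduced in this report, but a detector for such a horizontal swipe could be built on Android's GestureDetector.SimpleOnGestureListener as in the following sketch; the distance and velocity thresholds are assumptions.

import android.view.GestureDetector;
import android.view.MotionEvent;
import android.widget.ViewFlipper;

public class GestureManager extends GestureDetector.SimpleOnGestureListener {

    private static final int SWIPE_MIN_DISTANCE = 120;        // pixels
    private static final int SWIPE_THRESHOLD_VELOCITY = 200;  // pixels per second

    private final ViewFlipper flipper;

    public GestureManager(ViewFlipper flipper) {
        this.flipper = flipper;
    }

    @Override
    public boolean onFling(MotionEvent e1, MotionEvent e2, float velocityX, float velocityY) {
        float deltaX = e2.getX() - e1.getX();
        // Only react to a horizontal movement that is long and fast enough.
        if (Math.abs(deltaX) > SWIPE_MIN_DISTANCE
                && Math.abs(velocityX) > SWIPE_THRESHOLD_VELOCITY) {
            if (deltaX < 0) {
                flipper.showNext();      // swipe from right to left: next mode
            } else {
                flipper.showPrevious();  // swipe from left to right: previous mode
            }
            return true;
        }
        return false;
    }
}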


Copying the output text into the clipboard seemed to be a good idea for several purposes. When a text is spoken in the text to speech menu, or a text is heard by the voice recognition, it is automatically inserted into the system clipboard. This way, you can use this sentence in any application on your mobile device (to send a translated message, to record a memo, and so on). This feature is made possible by using the "ClipboardManager" object embedded in the Android SDK.
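A minimal sketch of this copy, assuming the ClipboardManager/ClipData classes available from API level 11 onwards, is given below; the label string is arbitrary.

import android.content.ClipData;
import android.content.ClipboardManager;
import android.content.Context;

// Copies the spoken or recognised sentence into the system clipboard.
private void copyToClipboard(String text) {
    ClipboardManager clipboard =
            (ClipboardManager) getSystemService(Context.CLIPBOARD_SERVICE);
    clipboard.setPrimaryClip(ClipData.newPlainText("Communication Assistant", text));
}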


To provide a better user experience, I used an animation when the user wants to switch between modes. I used a custom ViewFlipper which produces a slide effect to avoid an abrupt visual change.
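The slide effect could be achieved roughly as follows. The report's custom ViewFlipper likely uses its own animation resources; in this sketch the built-in framework animations are used instead, and the view id is hypothetical.

import android.view.animation.AnimationUtils;
import android.widget.ViewFlipper;

// Inside the activity, once the swipe gesture has been recognised:
ViewFlipper flipper = (ViewFlipper) findViewById(R.id.view_flipper); // hypothetical id
flipper.setInAnimation(AnimationUtils.loadAnimation(this, android.R.anim.slide_in_left));
flipper.setOutAnimation(AnimationUtils.loadAnimation(this, android.R.anim.slide_out_right));
flipper.showNext(); // switch to the other mode with a sliding transition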





GUI

Screenshots of the final interface








[Screenshots: Speech to text menu, Text to speech menu, Options menu]

Menus descriptions

Speech to text menu


The speech to text menu is composed of three elements:

- The "TextField" at the top represents the text which has been recorded and possibly translated (it is invisible on the screenshot because no text has been recorded yet).

- The button with the green microphone is used to activate the voice recognition and to prompt the window dedicated to this feature.

- The "Options menu" button switches temporarily to the options menu to modify the parameters of the application.


Text to speech menu

The text to speech menu is composed of five elements:

- The "EditText" at the top is used to type the sentence which will be said by the text to speech engine.

- The button with the eraser icon clears the content of the "EditText".

- The next element is a "ScrollView" composed of usual sentences, which can be scrolled with a vertical movement of your finger. When a usual sentence is selected, its text is automatically inserted into the "EditText", translated if necessary and said by the text to speech engine.





- The button with the talking head icon activates the text to speech engine and says the text currently inserted into the "EditText".

- The "Options menu" button switches temporarily to the options menu to modify the parameters of the application.


Options menu

The options menu is composed of five important elements:

- The first "Spinner" is used to choose whether you want to translate the output of the Speech to text menu.

- The second "Spinner" is used to choose whether you want to translate the output of the Text to speech menu.

- The checkbox can be checked to simulate the speech to text when an internet connection is not available.

- The "Apply" button stores the values into a singleton. This way, the parameters are available anywhere in the source code. The application then returns automatically to the last screen used.

- The "Back" button returns to the last screen used.

Absolute Layout versus Relative Layout


I had several problems with the GUI during the development of my project. Android development is based on Java, and Java GUI toolkits aren't really known for being easy to work with. I began to create my interface with a layout named "AbsoluteLayout". When you use it, you have to define the exact X and Y position of each element. The placing of the different elements is made easier by using Eclipse's layout editor for Android. It took some time to obtain a decent interface, but at least it was exactly how I wanted it.


However, there is an important problem with this approach: the layout doesn't adapt to the screen, so a layout that looks good on a small screen can look ugly on a bigger one. So I decided to modify the layout myself without using the layout editor. To do that, you have to modify the layout file for each "view". This layout is an XML file which defines everything (background, size, etc.) about the visual aspect of your application. Instead of using the "AbsoluteLayout", I chose to combine "RelativeLayout" with "LinearLayout". This combination offers a lot of possibilities, but it is a tricky thing to understand well for an Android beginner like me. Each element (buttons, text fields, pictures, etc.) is placed in relation to another element. For example, instead of specifying that a button is placed at position X/Y, you say that it is above the picture P. This way, the resolution of the screen doesn't matter.
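As an illustration of this relative positioning, here is a hypothetical fragment of layout XML (not taken from the actual project files) in which a button is placed above a picture instead of at fixed coordinates:

<!-- Hypothetical layout fragment: the button is positioned relative to the picture,
     so it adapts to any screen size instead of using fixed X/Y coordinates. -->
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ImageView
        android:id="@+id/picture"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_centerInParent="true"
        android:src="@drawable/ic_launcher" />

    <Button
        android:id="@+id/talk_button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_above="@id/picture"
        android:layout_centerHorizontal="true"
        android:text="Speak" />

</RelativeLayout>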

Desktop design versus mobile design

I haven't developed much for mobile during my studies (only one course of iPhone development), so I didn't have much experience of how to design a good interface for mobile. This explains why my first attempt is more a desktop design than a mobile one.







[Screenshot: early desktop-style design of the interface]


The picture above shows one of the early GUIs that I developed. The main problem with this interface is that it doesn't take into account that a mobile phone is generally held in portrait mode, whereas the norm for desktop monitors nowadays is a 16:9 or 16:10 ratio. This difference is one of the main reasons why most desktop applications need to be adapted before they become usable on mobile.


Another difference between desktop and mobile is how the user controls the device. The touchscreen can be seen as a replacement for the mouse, but this idea is misleading because most mobile applications prefer gestures over buttons. Moreover, you can't use sub-menus in mobile software.


So I decided to change the design of the text to speech menu in order to take this idea into account.

If you compare the previous picture with the screenshots of the "Screenshots of the final interface" section, you can see that I removed the "Mode switch" buttons and modified the list of usual sentences. My first idea was to use a button to switch between Speech to text mode and Text to speech mode, but I realised that I had to reduce the number of buttons as much as possible to obtain an interface which is easy to use. So I decided to detect any horizontal scroll movement and to switch between the modes when this movement is large enough. I also added an animation during the switch to obtain a smoother change. Concerning the list of usual sentences, I decided to use a scroll view and to get rid of the two arrow buttons. So instead of clicking on these buttons, you move your finger vertically on the list.



Simplicity versus simple





My aim with this project was to provide an application which is easy to use but which provides multiple functionalities at the same time. To achieve this, I pushed everything which could trouble the user into the options menu, in order to obtain two modes with as few elements as possible.

How to test this system

This application can be easily tested by typing sentences (in Text-to-Speech) or speaking (in Speech-to-Text).


To test the text to speech, you can choose to type the sentence in the "TextField", or you can press one of the pre-defined sentences. You can press the icon representing a talking head in order to hear the sentence selected in the "TextField" a second time.


To test the speech-to-text, you just have to press the button representing a microphone to prompt a small window which records your voice and transforms it into text. I had a problem testing this feature because it requires internet access. When I tested it at home, it worked perfectly with my private wireless network. However, when I tested it at the university, it didn't work any more: it seems that some ports are blocked and my application couldn't send/receive data from Google's servers. To resolve this problem, I added a checkbox in the options menu to enable/disable an offline mode. When this option is activated, the application doesn't launch the voice recognition when you press the microphone button; instead it behaves as if you had said "what is your name". This way, I can simulate the voice recognition if no internet connection is available, and it works normally if an unrestricted connection is accessible.
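The offline branch could look roughly like the sketch below; the callback name and the helper handleRecognizedText are hypothetical, and OptionsSingleton refers to the sketch given earlier.

// Called when the microphone button is pressed.
private void onMicroButtonPressed() {
    if (OptionsSingleton.getInstance().isOfflineMode()) {
        // Blocked ports or no network: simulate the recogniser output.
        handleRecognizedText("what is your name");
    } else {
        startVoiceRecognitionActivity(); // real, Google-backed voice recognition
    }
}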


To test the translation, you have to change the parameters of the application in the options menu, which is available via a button at the bottom of each screen. Once you have done this, if you are in the speech to text menu, you just have to enter a sentence in English into the "TextField" and it will be automatically translated if the dictionary knows this sentence. In the case of the speech-to-text feature, you just have to speak in English and the recognised text will be translated if it is present in the dictionary. So obviously, if you want to test this functionality, you have to enter one of the few sentences which are translated in the code, like "I'm twenty" or "my name is".

Reflection about the design process of working with new modalities

As Hinckley et al. said in a paper about the use of sensors with a mobile device [1], it can be tricky to mix several modalities without producing clashes. My point of view is that the combination of several modalities can provide several advantages and ease the use of the application if it is done well. To avoid these clashes, I separated the Text to speech and Speech to text menus. But to be sure that a program will run smoothly in any situation, it should be tested by several people with different levels of experience, in order to cover as many behaviours as possible.




I was thinking of providing two types of interaction to move across the menus. This way, the newcomer isn't lost (using classic buttons) and the expert can move quickly between the menus because he knows the shortcuts (gestures used to replace the buttons). This choice aimed to adapt the application to the user's experience. For example, on Windows, you can access Windows Explorer in several ways. A novice will click on the shortcut on the desktop, or search for the "Computer" button in the Start menu. On the other hand, experienced users will simply press the "Windows" key and the "E" key to open Windows Explorer. After some reflection, I came to the decision that mobile users have enough experience to use gestures without bother.


Future improvements

The time allocated for this coursework wasn't very long, so I couldn't build a complex system (which doesn't mean that using it should be complex). This explains why several improvements can be made to this application. Some of them are quite easy to implement with slightly more time, but some could be the subject of studies for many years.


- The voice recognition embedded in the Android SDK isn't really effective. Indeed, it only recognizes some simple phrases, and long sentences need to be said really slowly. Moreover, the voice recognition doesn't take into account the different accents within the same language. For example, you can't configure this tool to refine its recognition by specifying that the interlocutor is Scottish (i.e. you can only select English). These problems are even more important if you try to speak in a language other than English. Moreover, this feature requires an internet connection, as explained in the design section.


- Instead of using a real translator, the application uses a simple HashMap. This feature could be improved by using an XML file composed of entries, where each entry represents a specific sentence with its translation in all supported languages. However, this solution isn't really flexible and the limitations are obvious (not many languages supported and not many words supported). The Google Translate API is quite effective, but it isn't free and it requires an internet connection: if you want to use this API, you have to pay $20 per 10^6 characters of text, and the same additional price per character if you want to use language detection. Moreover, you are limited to 20,000,000 characters per day, and you have to pay additional fees if you want to exceed this limit. So there isn't an ideal solution, which explains why dictionary applications are generally limited to one language.


- Nearly every developer will agree that creating a GUI in Java isn't really easy. This statement can be extended to the Android SDK, which bases its interface on the same programming philosophy. However, the UI creation is slightly facilitated by a drag-and-drop interface to create/modify the general layout of the application. My experience with these tools is quite new, so I think that my interface can be improved (my experience with interface creation in general isn't very good either).


- As I said previously, I chose gestures instead of providing two types of interaction to move across the menus. However, it could be interesting to run some experiments to find out what type of interaction users prefer, and whether buttons are completely useless when gestures are available.




- The machine learning could be improved to be more sophisticated. One improvement would be to identify the context of the conversation and to propose sentences specific to the situation. For example, when the detected context is a restaurant, the suggested sentences should be about food. Instead of detecting the context, the application could also use the GPS location of the user. This machine learning feature could take a long time to become useful and should be trained by the programmer before its launch on the market. On the other hand, it adapts the software to the context, which could produce interesting results.


- The user doesn't always find what he wants in the list of usual sentences. So if the user has to type what he wants to say, it could be a good idea to help him find his words. This feature has been used for a long time on mobile devices and has proven its efficiency (predictive text, also known as T9).


- The application uses the clipboard to copy the sentences heard or said. This small feature can be used to write a memo or send a message in a foreign language. The application could be improved by managing these features natively instead of relying on a third-party application.

References

[1] Ken Hinckley, Jeff Pierce, Mike Sinclair, and Eric Horvitz. 2000. Sensing techniques for mobile interaction. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (UIST '00). ACM, New York, NY, USA, 91-100. DOI=10.1145/354401.354417 http://doi.acm.org/10.1145/354401.354417