Voice Signal Technologies - MMM Home

Experiences with Mobile Device
Multimodal Applications

Jordan Cohen

June 2004

Martigny, Switzerland

Multimodal Communications Thru the Ages

Outline


Notes about Multimodality


Computing and the User Interface Problem


Technical Advances in Mobile Platforms


Mobile Platform Markets


Some Multimodal Applications


Future Trends



Multimodal Questions


Where is the system multimodal?


The User


The Computing Environment


The Application

The Multimodal User - Human



Input


Visual


Auditory


Tactile


Olfactory


Temperature


Motion


Output


Voluntary


Pressure (touch)


Audio (speech)


Involuntary


Lip Position


Conductivity (Galvanic Skin Response)


Electromagnetic Waves (Brain Stem Response)


Gaze (Eyeball Location)





The Computing Environment


Traditional


Screen


Keyboard


Mouse


Microphone


Speakers


Large Memory


Fast Computation


Embedded Platforms


Screen


Microphone


Cursor


Buttons


Speaker/Earphone


Non-Volatile Slow Memory


Fixed Point Slow Computation

The Modern Computing Environment

Traditional Computing

Embedded Platforms

Large Screen

Keyboard

Mouse

Microphone

Speakers

Large Memory

Floating Point Fast Computation

Small Screen

Keypad

Cursor

Microphone

Speaker/Earphone

Non-Volatile Slow Memory

Fixed Point Slow
Computation



Multimodal Applications


Traditional


Word Processing


Transcription via Dictation


Game Playing


Modern


Meeting Analysis


Battle Planning


(Activity) Assistant


Pilot


Driver


Data Analyzer


Forklift Operator


Telephone User

Characteristics of Multimodal Interfaces


Transparent


Easy to Learn


Easy to use


Robust


Flexible


More than one input mode


Mode Choice


Adaptability over Time (Mobile Requirement)


Efficient


Accommodate Adverse Conditions


Stable


Combined Modes often Quicker


Superior Error Handling


Simplified Language


Mode Switching Facilitates Error Recovery


Mutual Disambiguation Possible for Simultaneous Inputs



Sharon Oviatt, “Multimodal Interfaces”, 2002

Outline


Notes about Multimodality


Computing and the User Interface Problem


Mobile Platform Markets


Some Multimodal Applications


Future Trends



The First Computer

The Babbage Difference Engine (1832)
25,000 parts
Cost: £17,470

Famous Predictions



“Computers in the future may weigh no more
than 1.5 tons” (Popular Mechanics, 1949)

Famous Predictions


“I think there is a world market for maybe five
computers”


(Thomas Watson, Chairman of IBM, 1943)

The Mainframe - IBM 3090




The Easy-To-Use PC


Down to the Pocket PC


Transistor Counts

[Chart: transistor count in thousands (K), log scale from 1 to 1,000,000, by year from 1975 to 2010, for the 8086, 80286, i386, i486, Pentium®, Pentium® Pro, Pentium® II, and Pentium® III, with a projection to 1 billion transistors. Source: Intel]

Courtesy, Intel

Power Dissipation

[Chart: power in watts, log scale from 0.1 to 100, by year from 1971 to 2000, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® processor, and P6]

Lead microprocessor power continues to increase

Courtesy, Intel

Power density

[Chart: power density in W/cm², log scale from 1 to 10,000, by year from 1970 to 2010, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, and P6, with reference levels for a hot plate, a nuclear reactor, and a rocket nozzle]

Power density too high to keep junctions at low temperature

Courtesy, Intel

ARM processors

Famous Predictions


“This ‘telephone’ has too many shortcomings to
be seriously considered as a means of
communication. The device is inherently of no
value to us.”




Western Union internal memo, 1876.

The Early Car Phone

And There Were Cell Phones


Motorola Made the Phone Portable!


Martin Cooper, April 3, 1973


First Public Cell Phone Call (non-car phone)

The Ubiquitous Computing Platform

Digital Cellular Market (Phones Shipped)

Year    1996   1998   2000   2004*
Units   48M    162M   435M   600M*

* Projection (data from Texas Instruments)

[Diagram: cell phone blocks: Analog Baseband, Digital Baseband (DSP + MCU), Power Management, Small Signal RF, Power RF]

The Shape of Multimodal Interfaces

Error Rate



Moore’s Law Time Constant:



10x improvement per decade



Limited by R&D Investment



(Not Physics)

Borrowed slide: Audrey Le (NIST)

[Chart time axis: 1990 to 2010]
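
The "10x improvement per decade" rate quoted on this slide can be turned into a rough projection. What follows is a minimal sketch, assuming the improvement is a smooth exponential decline in error rate. The function name and the 40% starting value are hypothetical illustrations, not figures from the slide.

```python
def projected_error_rate(start_rate: float, years: float,
                         factor_per_decade: float = 10.0) -> float:
    """Error rate after `years`, assuming it falls by `factor_per_decade` every 10 years."""
    return start_rate / (factor_per_decade ** (years / 10.0))


# Yearly reduction implied by a 10x improvement per decade (about one fifth per year).
annual_reduction = 1.0 - 10.0 ** (-1.0 / 10.0)

if __name__ == "__main__":
    # Hypothetical starting point: a 40% error rate in year 0.
    for years in (0, 5, 10, 20):
        print(years, round(projected_error_rate(0.40, years), 4))
    print(f"implied yearly reduction: {annual_reduction:.1%}")
```

Under this assumption a 10x gain per decade corresponds to roughly a 20% relative reduction in error every year, which is one way to read the slide's point that progress depends on sustained R&D investment.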

Outline


Notes about Multimodality


Computing and the User Interface Problem


Mobile Platform Markets


Some Multimodal Applications


Future Trends



Features Enhanced by a Multimodal UI

[Chart: Selected Services Requested, % of users on a scale of 0 to 100, Current vs. Requested, for: Full PIM on phone, Voice Dialing, Send/Receive SMS, Send/Receive e-mail, Access the Internet, Coordinate meetings, PDA function]

Technology Opportunities vs. Markets


Opportunity


Make Some Process Better


Easier


Quicker


More Effective


More Robust


Possible


Market


Opportunity


Money


Technology


What is the Mobile Handset Market?


Multimodal Applications make the handset


Easier to use


More robust


More efficient


Environmentally Less Sensitive


Is There a Market?


How Many Handsets?


Where do Applications Fit?


What Existing Applications Can Be Augmented?


What Legal Drivers Exist?


Major Drivers for Embedded Multimodality


Power


The most power hungry function on your handset is
the radio


The second most expensive operation is the digital
channel


Local operations are cheap


The User Interface is Antique!


9 (12) buttons


Small Screen


Timing


Local computation is predictable


Network computation has a large random
component


The Cacophony of Capabilities


Telephone Calls


International


Three Party


(VOIP)


Messaging


SMS


MMS


Instant Messaging


Video


Services


Web Access


Games


Personalization


Ring Tones


Front Covers


Customers? - Nokia prognosis for 3G mobile communication

[Chart: monthly income per user in euros (1 euro ≈ 1 USD), on a scale of 0 to 100, by year from 2000 to 2011, broken down by service]

Location based services
Commercials
Entertainment
Information services
Payment transactions
Music and video
Internet surfing
Download from internet
Chat on internet
Multimedia messages
Text messages
Video conferencing
Normal speech
Fixed subscription fees
Div. telecomm.
Photo messages

A Sign of the Times

“I have always wished that my computer
would be as easy to use as my telephone.
My wish has come true. I no longer know
how to use my telephone.”


Bjarne Stroustrup (originator of C++)



Projected Unit Shipments of Mobile
Electronic Products with Flash
Memory


Inside Mobile, February, 2004


Deployment Timeline for WWAN Standards

Creating and commercializing a global standard takes time

Analog 1G: standard began ~1970 (Country Specific)
Digital 2G: standard began early 1980’s (Regional)
Digital 3G: standard began early 1990’s (International)
Digital “4G”??: early discussions begun (~2002 - )

[Timeline axis: year, 1970 to 2020]

Global Revenue per Wireless Network Type


PAN (Bluetooth & UWB): Around US$1.5B for Bluetooth by 2007 (Source: IDC) and US$1.7B for UWB by 2007 (ABI)

WLAN (Wi-Fi): US$4.1B by 2008 (Source: IDC)

MAN (WiMAX): 2-4 million by 2008, generating up to US$2B in access revenues (Source: Pyramid)

WAN (802.20): 30 million users worldwide by 2009 (Source: Visant Strategies)

WWAN - Mobile (2G, 2.5G, 3G): Around 2.2B users worldwide, generating US$500B by 2008 (Sources: EMC & Telecompetition)

[Chart: Worldwide Revenue by 2008, in US$ billions on a scale of 0 to 500, for PAN, LAN, MAN, WAN, and WWAN. Source: IDC, Pyramid, Visant Strategies, ABI, EMC, Telecompetition]

WWAN: 2G & 3G

Worldwide demand for WWAN mobile telecommunications continues at a rapid pace due to the operators’ ability to offer more value at lower prices

[Chart: Cumulative Number of Subscribers, in millions on a scale of 0 to 1400, by year from 2002 to 2007, for WCDMA, CDMA, and GSM]

Sources: Graph: EMC World Database, March 2003
* Subscriber forecast for 2007 was derived from the average estimate from In-Stat MDR, Strategy Analytics, Ovum, iGilliott
** Strategy Analytics, October 2003

The Global Market for Services

[Chart: global service revenue in US$ billions, on a scale of $0 to $160, by year from 2001 to 2006, for Messaging, Info Svcs, Location Svcs, Entertainment, and Other]

Source, “Global Wireless Device Market,” Strategy Analytics, October 1, 2001

Mobile Device Segmentation: 2006

Handset Sales Volume (1 billion units)

Browser phone         69%
Smartphone            23%
Multimedia Terminal    7%
Non-browser phone      1%

Source, “Global Wireless Device Market,” Strategy Analytics, October 1, 2001

Outline


Notes about Multimodality


Computing and the User Interface Problem


Mobile Platform Markets


Some Multimodal Applications


Future Trends



Rules of Robust Embedded Applications


It must work the first time


It must work the second time


It must work the third time





The instruction book must have only two
sentences


Number of people online by language spoken (http://www.glreach.com/globstats/)



The Issues


Make the Cell Phone Useable


Language - the Global Cell Phone Business


Dialects


Localization


Acoustic Environment


Mostly Unaware


Real Time


Real Time


Real Time


Microphone Type/Location


Software Reliability


Computational Tradeoffs


Getting the Right Application


Ask the right questions


Utilize the multimodality of the platforms

Test, test, test, test…


Automated and live testing
on target hardware


Stress testing, application
verification on target HW


PC test program, artificial
mouth, pneumatic fingers

Voice Activated Phonebook (VAP) Setup

[Screens]
Say a Command: Digit Dial, Name Dial, Quick Dial, Voice Memo, Phone Book, Browser (soft keys: Settings, Help)
Say the Name: (soft keys: Settings, Help)
Which Number? John Smith: Home, Mobile, Work (soft keys: Help, Settings)
Connecting…: John Smith, Mobile, 555-1212 (Cancel)


No Special Contact Entry Required: Contacts can be entered into the device by the keypad, PC sync software, SIM card, or OTA synchronization.

Plug & Play - No User Setup Required: By analyzing the spelling of the contact entries as they are entered, our VAP software automatically voice activates every contact list entry - no user setup or training is required.

VSuite Core Feature Index

How it works….

Thread titled, “Samsung A600 Users”


Posted: Tues July 22, 2003 5:51 pm


Just push and hold TALK and then say NAME DIAL


When "Samsung Lady" says "name please" just say one of the names in your phone book. You'll
hear an "internal lady" repeat it back pronouncing how it's spelled.


"Samsung Lady" will ask if that's right and then dial.


Try it...it worked with my last name Allebach and that's one that no one knows how to say. It
worked with Ben and Zimmer (where I work)


No user training: Unlike old voice dialing, VAP works right out of the box, and voice-activates every name in the phonebook, no matter how large.

Multimodal: User can dial by voice, or key press. Dialogs are presented on the display, and played through the earpiece for total eyes-free dialing.

Eyes-free, helpful voice guide: Intuitive dialog is easy to use the very first time and doesn’t require looking at the display.

Listening indicators: Users can see visual (ear icon) and hear (beep) cues when to speak.

Dial from the entire phonebook: When multiple numbers are stored, the system asks which one to dial.
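
The name-dial dialog described in these slides (command, then name, then "Which Number?" only when several numbers are stored) can be pictured as a small state sequence. The following is a rough sketch under stated assumptions, not VSuite code: it takes already-recognized text as input, and the `PHONEBOOK` contents, the `name_dial` function, and the fallback to "Help" are all hypothetical.

```python
from typing import Dict, List

# Hypothetical phonebook: name -> {label: number}. Illustrative data only.
PHONEBOOK: Dict[str, Dict[str, str]] = {
    "john smith": {"home": "555-0100", "mobile": "555-1212", "work": "555-0199"},
    "ben": {"mobile": "555-0142"},
}


def name_dial(utterances: List[str]) -> List[str]:
    """Walk the name-dial dialog over recognized utterances.

    Returns the prompts the handset would display and speak, in order.
    Falling back to "Help" on unrecognized input is an assumption.
    """
    prompts = ["Say a Command:"]
    inputs = iter(utterances)

    if next(inputs, None) != "name dial":
        return prompts + ["Help"]

    prompts.append("Say the Name:")
    name = next(inputs, None)
    numbers = PHONEBOOK.get(name or "")
    if numbers is None:
        return prompts + ["Help"]

    if len(numbers) == 1:
        # Only one stored number: no need to ask which one.
        label = next(iter(numbers))
    else:
        prompts.append(f"Which Number? {name.title()}")
        label = next(inputs, None)
        if label not in numbers:
            return prompts + ["Help"]

    prompts.append(f"Connecting... {name.title()} {label.title()} {numbers[label]}")
    return prompts


if __name__ == "__main__":
    for prompt in name_dial(["name dial", "john smith", "mobile"]):
        print(prompt)
```

The extra "Which Number?" turn appears only for contacts with more than one stored number, which keeps the common single-number case to two spoken inputs.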

Voice Activated Phonebook (VAP): Dialing with
Discrete Commands

[Screens]
Say a Command: Digit Dial, Name Dial, Quick Dial, Voice Memo, Phone Book, Browser (soft keys: Settings, Help)
Say the Name: (soft keys: Settings, Help)
Which Number? John Smith: Home, Mobile, Work (soft keys: Help, Settings)
Connecting…: John Smith, Mobile, 555-1212 (Cancel)


Comments from Time.com

"...Now there's something infinitely better: it's called "speaker-independent" voice recognition, and the best one out by far is from Voice Signal...this million-dollar technology might already be in your phone. Have a look. If not, trade it in, because this stuff is awesome!..."

WILSON ROTHMAN

Time Magazine, “Gadget of the Week”, Feb. 4, 2004




Voice Activated Phonebook (VAP): Dialing with
Natural Commands

[Screens]
Say a Command: Call [number], Call [name], Go To [Application], Voice Message [Name], Lookup [Name], Send [Message] to [ ] (soft keys: Settings, Help)
Connecting…: John Smith, Mobile, 555-1212 (Cancel)


Freedom, Flexibility, Convenience: Natural Commands enhance the VSuite 1.x experience by allowing users to accomplish tasks with single, natural voice commands.

What - Who - Where Paradigm
What? Call, Send Message?
To whom? Person or Number?
To which device? Mobile? Email? Pager?

Smart Dialog: When VSuite isn’t sure what you said, it will ask, “Did you say?” allowing for instant correction.
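
To make the What - Who - Where paradigm concrete, here is a minimal sketch that fills the three slots from text that a recognizer has already produced, and falls back to a "Did you say?" confirmation when confidence is low. It is an illustration under assumptions, not the VSuite implementation: the regular expressions, the `parse_command` and `respond` functions, the confidence score, and the 0.6 threshold are all hypothetical.

```python
import re
from typing import Dict, Optional

# Command shapes loosely follow the slide's templates; the patterns are assumptions.
PATTERNS = [
    ("call", re.compile(r"^call (?P<who>.+?)(?: (?P<where>home|mobile|work))?$")),
    ("send", re.compile(r"^send (?P<where>message|email|page) to (?P<who>.+)$")),
    ("lookup", re.compile(r"^lookup (?P<who>.+)$")),
]


def parse_command(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Fill the what / who / where slots, or return None if nothing matches."""
    text = text.strip().lower()
    for what, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            slots = match.groupdict()
            return {"what": what, "who": slots.get("who"), "where": slots.get("where")}
    return None


def respond(text: str, confidence: float, threshold: float = 0.6) -> str:
    """Act on a confident parse; otherwise ask the user to confirm."""
    command = parse_command(text)
    if command is None:
        return "Say a Command:"
    if confidence < threshold:
        return f"Did you say: {text.strip().lower()}?"
    return f"{command['what']} -> who={command['who']}, where={command['where']}"


if __name__ == "__main__":
    print(respond("Call John Smith mobile", confidence=0.9))
    print(respond("Send email to John Smith", confidence=0.4))
```

A missing "where" slot is left as None here; a real dialog would presumably ask a follow-up question such as "Which Number?" at that point.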


Voice Activated Message Setup

[Screen: “Say a Command:” with the options Call [number], Call [name], Go To [Application], Voice Message [Name], Lookup [Name], Send [Message] to [ ]; soft keys Settings and Help]


Fastest and Simplest Method to Set Up and Address Messages:

Single key press

Compatible with any message client and protocol (SMS, EMS, MMS, Email, Instant Messenger)

In just a few seconds, the messaging client is open, addressed, and waiting for text input.

Any name or number in phonebook can be addressed by voice


[Screen: John Smith; Send; ABC]

A Messaging Application


Demonstration

Outline


Notes about Multimodality


Computing and the User Interface Problem


Mobile Platform Markets


Some Multimodal Applications


Future Trends



…What’s Next?

Interface To The Digital Future!

The Technical Thrusts


Instant Messaging


AOL


Microsoft


Apple


Bantu


1.2m Army clients


350k Navy clients


Web Interfaces


Oracle


SAS


Salesforce.com


Pervasive Computing


IBM


HP


Microsoft



The 802.11 Revolution


Cell Phone and Wideband


HotSpot Spread

Advanced Features


Speech-to-Text


Camera and Video Integration


Smart Phone Market Expansion


Web Services and Enterprise Opportunities


Ubiquitous Computing

Last thoughts


“Everything that can be invented has been
invented”


Charles Duell, Commissioner, US Office of
Patents, 1899.



“As we say in the computer business, ‘shift happens’”


Tim Romero

Embed for the Future!

Jordan Cohen

jrc@voicesignal.com


Speech recognition market (software, applications, services and hardware)

                          2001    2002    2003      2004      2005      CAGR
End-user revenue (US$m)   468.3   823.1   1,342.6   2,036.2   2,749.4   55.66%
Growth (%)                152.3   75.8    63.1      51.7      35.0

Source: Cahners In-Stat. Note: Table made from bar graph.
Copyright 2002 Emap Business Communications
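
The CAGR and growth figures in this table can be checked from the revenue row alone. The short sketch below does so, assuming the CAGR is measured over the four periods from 2001 to 2005; the variable names are illustrative. The 2001 growth figure (152.3%) depends on a 2000 revenue value that the table does not list, so it is not recomputed here.

```python
# End-user revenue in US$ millions, copied from the revenue row of the table.
revenue = {2001: 468.3, 2002: 823.1, 2003: 1342.6, 2004: 2036.2, 2005: 2749.4}

years = sorted(revenue)
first, last = years[0], years[-1]

# CAGR = (end / start) ** (1 / periods) - 1, expressed as a percentage.
cagr_pct = ((revenue[last] / revenue[first]) ** (1 / (last - first)) - 1) * 100

# Year-over-year growth for each year that has a preceding year in the table.
yoy_pct = {y: (revenue[y] / revenue[y - 1] - 1) * 100 for y in years[1:]}

print(f"CAGR {first}-{last}: {cagr_pct:.2f}%")
for year, growth in yoy_pct.items():
    print(year, f"{growth:.1f}%")
```

The computed values agree with the table's CAGR and with its growth row for 2002 through 2005, which confirms that the growth row lines up with the year columns.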


Opportunities for Power Savings

European mobile subscribers with Internet enabled handsets

Year                               2002   2003   2004   2005   2006   2007
Percentage of mobile subscribers   33%    49%    58%    63%    68%    70%

Source:

Milestones in Speech and Multimodal
Technology Research


[Chart: time axis by year, 1962 to 2002]

Isolated Words
Filter-bank analysis; Time-normalization; Dynamic programming
Isolated Words; Connected Digits; Continuous Speech
Pattern recognition; LPC analysis; Clustering algorithms; Level building
Continuous Speech; Speech Understanding
Stochastic language understanding; Finite-state machines; Statistical learning
Small Vocabulary, Acoustic Phonetics-based
Medium Vocabulary, Template-based
Large Vocabulary; Syntax, Semantics
Connected Words; Continuous Speech
Large Vocabulary, Statistical-based
Hidden Markov models; Stochastic Language modeling
Spoken dialog; Multiple modalities
Very Large Vocabulary; Semantics, Multimodal Dialog, TTS
Concatenative synthesis; Machine learning; Mixed-initiative dialog

Borrowed Slide

Consistent improvement over time, but unlike Moore’s Law, hard to extrapolate (predict future)

Cell Phone Power by Function

Early Multimodal Systems


Speech and Pen


Dataland (Negroponte, 1978)


Map Reading


Speech and Lip Movement


Chalapathy Neti (IBM)


Speech and Manual Gesture


Handicapped Aids


Gaze Tracking and Manual Input


Driving Simulation