Presenter Ivan Chiou

Nov 24, 2013


All authors are from Electrical and Computer Engineering, Carnegie Mellon University

Zheng Sun, PhD student in CyLab Mobility Research Center

Aveek Purohit, Ph.D. candidate

Raja Bose, Microsoft Silicon Valley, KarMode LLC

Pei Zhang, Assistant Research Professor



Spartacus


A mobile system that enables spatially-aware neighboring-device interactions with zero prior configuration.


Uses built-in microphones and speakers.

Uses the Doppler effect to enable an interaction through a pointing gesture.

An audio-based low-power listening mechanism triggers the gesture detection service.



Experiment


90% device selection accuracy within 3m


lower energy consumption


Recent research still requires an initial channel of communication, such as Wi-Fi or Bluetooth.



Spartacus’ key contributions:

A novel acoustic technique based on the Doppler effect

A novel undersampling audio signal processing pipeline

Low-power listening (reduces energy consumption) without any manual actions from users

Experimental validation


How it works:


The user interacts through Spartacus by quickly pointing her mobile phone towards the target device.

Devices perform low-power listening using their built-in microphones.

An audio beacon with a short duration serves as the initiator.

Does not require any extra hardware.

Implemented on the Android mobile platform.


High-Resolution Doppler-Shift Detection

Pointing gestures of average users are usually transient (shorter than 0.5s).

Increases the frequency-domain resolution by 5X compared with traditional FFT-based approaches.

High-Accuracy Device Selection

Accurately estimates the peak frequency shifts.

Implements a bandpass audio signal processing pipeline to mitigate high-frequency acoustic noise.

Energy-Efficient Interaction Trigger

A low-power audio listening protocol triggers incoming interactions.


How does Spartacus detect the maximum peak frequency shift among the candidate target devices?

Since the user makes the gesture directionally towards the target device, the target device observes the maximum Doppler shift and is selected.
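A minimal sketch of this selection rule, assuming the textbook Doppler formula for a moving source and stationary listeners, a speed of sound of 343m/s, the 18KHz tone mentioned later, and the 3.4m/s average peak velocity from Finding 3. The device names and angles are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def observed_frequency(f0, v, theta_deg, c=SPEED_OF_SOUND):
    """Tone frequency heard by a stationary device when the phone moves at speed v
    along a direction that makes angle theta_deg with the line to that device."""
    radial_speed = v * math.cos(math.radians(theta_deg))  # velocity component toward the device
    return f0 * c / (c - radial_speed)


def select_target(f0, v, candidates):
    """candidates maps a device name to its angle (degrees) from the pointing direction.
    Returns the device with the largest Doppler shift, plus all shifts in Hz."""
    shifts = {name: observed_frequency(f0, v, angle) - f0 for name, angle in candidates.items()}
    return max(shifts, key=shifts.get), shifts


if __name__ == "__main__":
    target, shifts = select_target(18_000.0, 3.4, {"D_A": 0.0, "D_B": 30.0, "D_C": 90.0})
    print(target, {name: round(s, 1) for name, s in shifts.items()})
```

Because the shift scales with the cosine of the angle, the device closest to the pointing direction sees the largest shift, which is the selection rule stated above.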


Deriving Angular Resolution

The frequency shift observed at device DA is computed from fA, the observed tone frequency at DA; f0, the frequency of the original tone; Fs, the sampling rate; and NFFT, the number of FFT points. The calculated frequency shift is expressed in terms of FFT points.


Assume the target device is stationary during the course of the gesture.
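Under this assumption, the relationship can be sketched with the textbook moving-source Doppler formula. Here v (the phone's speed), θ_A (the angle between the pointing direction and the line to DA), c (the speed of sound) and Δ_A (the shift in FFT points) are our own symbols, and the slide's equation may be written differently.

```latex
f_A = f_0 \,\frac{c}{c - v\cos\theta_A},
\qquad
\Delta_A = \frac{(f_A - f_0)\,N_{FFT}}{F_s}
         = \frac{f_0\, v\cos\theta_A}{c - v\cos\theta_A}\cdot\frac{N_{FFT}}{F_s}
         \approx \frac{f_0\, v\cos\theta_A}{c}\cdot\frac{N_{FFT}}{F_s}
```

Δ_A grows with f0 and NFFT and shrinks as Fs grows, so improving resolution comes down to raising f0, raising NFFT, or lowering Fs.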



Improving Resolution using Undersampling

Increasing the original tone frequency f0
» stronger energy degradation

Increasing the number of FFT points NFFT
» higher computational burden

Decreasing the sampling rate Fs

Spartacus uses a very high tone frequency (18KHz).

The undersampling technique can significantly reduce the sampling rate.


Determining Undersampling Parameters

A higher n leads to a higher fL.

fL higher than 19KHz is avoided, since it would cause greater energy degradation.

Commodity devices limit the available audio sampling rates, which include 8KHz, 16KHz, 32KHz, 44.1KHz, and 48KHz.

Valid parameters exist only when n = 5, 6, or 7 given Fs = 44.1KHz, or when n = 4 given Fs = 48KHz.
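The parameter search can be sketched as an enumeration. It assumes that undersampling by n keeps the top Nyquist zone [fL, Fs/2], so fL = (n − 1)·Fs/(2n), and it adds a hypothetical 17KHz lower limit on fL; the slides only state the 19KHz upper limit. Under these assumptions the enumeration reproduces the listed combinations.

```python
COMMODITY_RATES = [8_000, 16_000, 32_000, 44_100, 48_000]  # sampling rates listed on the slide


def band_lower_edge(fs, n):
    """Lower edge fL of the top Nyquist zone [fL, fs/2] when undersampling by n
    (new rate fs/n). This band placement is an assumption of the sketch."""
    return (n - 1) * fs / (2 * n)


def valid_factors(fs, f_min=17_000, f_max=19_000, n_max=20):
    """Undersampling factors whose fL falls in [f_min, f_max].
    f_max comes from the slides; f_min is a hypothetical lower limit."""
    return [n for n in range(2, n_max + 1) if f_min <= band_lower_edge(fs, n) <= f_max]


if __name__ == "__main__":
    for fs in COMMODITY_RATES:
        print(fs, valid_factors(fs))
```

Because fL = (n − 1)·Fs/(2n) increases with n, the sketch also matches the observation that a higher n leads to a higher fL.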


Angular resolution improved from 26.7 degrees to 10 degrees.
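A toy model for this improvement: treat the angular resolution as the smallest angle off the pointing axis at which the Doppler shift falls one FFT bin (Fs/NFFT) below the on-axis shift. This model, the 343m/s speed of sound, and v = 3.4m/s are our assumptions, not the authors' derivation. With f0 = 18KHz and NFFT = 2048 it gives roughly 28° at 44.1KHz and roughly 11° at 6.3KHz (n = 7), close to the slide's figures.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def bin_width_hz(fs, n_fft):
    """Frequency spacing between adjacent FFT points."""
    return fs / n_fft


def angular_resolution_deg(f0, v, fs, n_fft, c=SPEED_OF_SOUND):
    """Toy model: smallest angle off the pointing axis whose Doppler shift is at least
    one FFT bin below the on-axis shift f0 * v / c (first-order approximation)."""
    on_axis_shift = f0 * v / c
    cos_theta = 1.0 - bin_width_hz(fs, n_fft) / on_axis_shift
    if cos_theta <= -1.0:
        return 180.0  # the whole shift is at most half a bin: no angular discrimination
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    for fs in (44_100, 44_100 / 7):
        print(f"Fs={fs:.0f} Hz, bin={bin_width_hz(fs, 2048):.2f} Hz, "
              f"resolution={angular_resolution_deg(18_000, 3.4, fs, 2048):.1f} deg")
```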



Bandpass Signal Processing Pipeline

Since the new sampling rate is much lower than the Nyquist rate, aliasing arises in the original sampled audio signals.

We found that M = 1.5 led to robust performance in various indoor environments.
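A minimal standard-library sketch of why filtering must come before undersampling. Assumptions: the band of interest is [18.9KHz, 22.05KHz] (the top Nyquist zone for Fs = 44.1KHz and n = 7), so the bandpass stage reduces to a windowed-sinc high-pass FIR here; the 20KHz tone, the stronger 5KHz interferer and the 101-tap Hamming window are illustrative choices, and the slides' parameter M is not modeled.

```python
import cmath
import math

FS = 44_100                          # original sampling rate (Hz)
N = 7                                # undersampling factor n
FS_LOW = FS / N                      # 6300 Hz after undersampling
F_LOW_EDGE = (N - 1) * FS / (2 * N)  # 18900 Hz: assumed lower edge of the band of interest
TAPS = 101                           # FIR length (illustrative choice)


def alias_frequency(f, fs):
    """Apparent frequency of a real tone f after sampling at rate fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f


def highpass_fir(cutoff, fs, taps):
    """Hamming-windowed sinc low-pass, turned into a high-pass by spectral inversion."""
    m = taps - 1
    fc = cutoff / fs
    lp = []
    for i in range(taps):
        k = i - m / 2
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        lp.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * i / m)))
    gain = sum(lp)
    hp = [-c / gain for c in lp]  # normalize the low-pass to unity DC gain, then negate
    hp[m // 2] += 1.0             # delta minus low-pass = high-pass
    return hp


def fir_valid(x, h):
    """Direct-form FIR filtering, keeping only fully overlapped output samples."""
    return [sum(h[k] * x[i - k] for k in range(len(h))) for i in range(len(h) - 1, len(x))]


def peak_frequency(samples, fs):
    """Frequency of the strongest bin of a plain O(N^2) DFT (fine for a small sketch)."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n


def undersample_peak(tone_hz, interferer_hz, use_filter=True, out_len=630):
    """Tone plus a 2x stronger interferer -> optional high-pass -> keep every N-th sample."""
    total = out_len * N + TAPS - 1
    x = [math.sin(2 * math.pi * tone_hz * t / FS) + 2.0 * math.sin(2 * math.pi * interferer_hz * t / FS)
         for t in range(total)]
    h = highpass_fir(F_LOW_EDGE, FS, TAPS) if use_filter else [1.0] + [0.0] * (TAPS - 1)
    low_rate = fir_valid(x, h)[::N]  # undersampled stream at FS_LOW
    return peak_frequency(low_rate, FS_LOW)


if __name__ == "__main__":
    print(undersample_peak(20_000.0, 5_000.0, use_filter=True))
    print(undersample_peak(20_000.0, 5_000.0, use_filter=False))
```

Without the filter, the stronger 5KHz interferer folds down to 1300Hz and dominates the undersampled spectrum; with the filter, the 20KHz tone, which folds down to 1100Hz, is the strongest component.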


After each device detects its Doppler frequency shift, all the devices report their frequency shifts to the sender device, along with their device ID information.

The sender device then compares all the received Doppler shifts and determines the target device.
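The report-and-compare step can be sketched as below. The message structure and the optional minimum-margin rule (treating the selection as ambiguous when the top two shifts are too close) are hypothetical additions for illustration; the slides only state that the sender picks the largest reported shift.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ShiftReport:
    device_id: str   # ID the candidate device sends along with its measurement
    shift_hz: float  # peak Doppler shift that device detected


def choose_target(reports: Iterable[ShiftReport], min_margin_hz: float = 0.0) -> Optional[str]:
    """Return the ID with the largest reported shift, or None if there are no reports
    or the best two shifts differ by less than min_margin_hz (hypothetical rule)."""
    ranked = sorted(reports, key=lambda r: r.shift_hz, reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0].shift_hz - ranked[1].shift_hz < min_margin_hz:
        return None
    return ranked[0].device_id


if __name__ == "__main__":
    reports = [ShiftReport("laptop", 90.0), ShiftReport("tv", 150.0), ShiftReport("tablet", 40.0)]
    print(choose_target(reports))
```

One natural value for the hypothetical margin would be one FFT bin, about 3Hz for 2048 points at 6.3KHz.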


Angular Gain through Pointing Gestures

With 2048 FFT points, the smallest angular resolution is 10 degrees when the undersampling factor n is equal to 7.

When candidate devices are close to the user (i.e., within 3m), the device selection accuracy is better than the analysis predicts.

This angular change is significant when the candidate devices DA and DB are close to D0. Assuming the user’s arm is 60cm long, the effective angular difference increases to 55°, which makes the two devices much easier to differentiate.


How is Spartacus designed to save energy?

Low-Power Audio Listening

Advantages

Ubiquitous Hardware Support

No extra hardware; only microphones and speakers are needed.

Limited Range

Makes it easy to detect neighboring devices within the same space.

Energy Efficient

Designed for continuous discovery.


Protocol: two major modes

» Periodic Listening

Wake up every Trx.

Record sound for a duration drx.

» Beaconing

After receiving the beacon, switch to continuous listening mode to record the gesture.

A short beacon duration consumes more energy.

Tradeoff between energy consumption and duty cycle.
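The trade-off can be made concrete with a small timing model. It assumes listening windows of length drx start every Trx and that a beacon is detected once it overlaps a window by a minimum time (a hypothetical parameter). The 200ms window matches the listening session length used in the evaluation; the other numbers are illustrative.

```python
import math


def duty_cycle(t_rx, d_rx):
    """Fraction of time the microphone records: d_rx seconds out of every t_rx seconds."""
    return d_rx / t_rx


def min_beacon_duration(t_rx, d_rx, min_overlap):
    """Shortest beacon that overlaps some listening window by at least min_overlap,
    whatever its start time. Assumes min_overlap <= d_rx <= t_rx."""
    # Worst case: the beacon starts just too late to overlap the current window by
    # min_overlap, so it must reach min_overlap into the next window.
    return t_rx - d_rx + 2 * min_overlap


def detected(start, beacon, t_rx, d_rx, min_overlap):
    """True if [start, start + beacon) overlaps a window [k*t_rx, k*t_rx + d_rx)
    by at least min_overlap seconds."""
    end = start + beacon
    k = math.floor(start / t_rx)  # first window that can still overlap the beacon
    while k * t_rx < end:
        overlap = min(end, k * t_rx + d_rx) - max(start, k * t_rx)
        if overlap >= min_overlap:
            return True
        k += 1
    return False


if __name__ == "__main__":
    d_rx, min_overlap = 0.2, 0.05  # 200 ms listening session; overlap value is hypothetical
    for t_rx in (0.5, 1.0, 2.0):
        print(f"Trx={t_rx}s duty={duty_cycle(t_rx, d_rx):.0%} "
              f"beacon>={min_beacon_duration(t_rx, d_rx, min_overlap):.2f}s")
```

A longer Trx lowers the receivers' duty cycle but requires a longer beacon; conversely, a short beacon forces receivers to wake more often, which is why it costs more energy.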


The device ID is encoded using Reed-Solomon coding.

It is transmitted using a 16-FSK (Frequency Shift Keying) scheme with a central frequency of 19KHz.

» Keys are spaced 50Hz apart.

» The device ID transmission is at least 200Hz lower than the gesture tone, so there are no ambiguities.
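A sketch of the 16-FSK mapping, assuming the 16 keys sit 50Hz apart, symmetrically around 19KHz, and that each key carries one 4-bit symbol (high nibble first); these layout details are our assumptions. The Reed-Solomon step is omitted, so the input bytes stand for an already encoded ID.

```python
CENTER_HZ = 19_000.0  # central frequency from the slides
SPACING_HZ = 50.0     # spacing between adjacent keys
NUM_TONES = 16        # 16-FSK: each tone carries one 4-bit symbol


def symbol_frequency(symbol):
    """Tone for a 4-bit symbol, placed symmetrically around CENTER_HZ (assumed layout)."""
    if not 0 <= symbol < NUM_TONES:
        raise ValueError("symbol must be in 0..15")
    return CENTER_HZ + (symbol - (NUM_TONES - 1) / 2) * SPACING_HZ


def encode_id(coded_id: bytes):
    """Split each byte of an (already Reed-Solomon coded) ID into two symbols, high nibble first."""
    tones = []
    for byte in coded_id:
        tones.append(symbol_frequency(byte >> 4))
        tones.append(symbol_frequency(byte & 0x0F))
    return tones


def decode_tones(freqs):
    """Map each detected frequency to the nearest key, then re-pack symbol pairs into bytes."""
    symbols = [min(range(NUM_TONES), key=lambda s: abs(symbol_frequency(s) - f)) for f in freqs]
    return bytes((hi << 4) | lo for hi, lo in zip(symbols[0::2], symbols[1::2]))


def gesture_tone_ok(gesture_hz, separation_hz=200.0):
    """Slides' rule: every ID tone stays at least separation_hz below the gesture tone."""
    return symbol_frequency(NUM_TONES - 1) + separation_hz <= gesture_hz


if __name__ == "__main__":
    tones = encode_id(bytes([0x2A, 0xFF]))
    print(tones, decode_tones(tones).hex())
```

With this layout the highest key is 19375Hz, so the gesture tone would need to be at or above 19575Hz to keep the 200Hz separation.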


Dealing with Wakeup Jitter

Jitter can be observed between when an API call starts recording sound and when the system actually begins recording.

Average jitter: 70ms, standard deviation: 15ms.

Empirical measurements are used to address this problem.




Due to the existence of the wakeup jitter, an additional guard band is used in the beacons.
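One way to size the guard band from the measured jitter statistics is sketched below. It assumes the jitter is roughly normally distributed and covers it up to mean + k·std with a hypothetical k = 3; the slides do not say how the guard band was chosen.

```python
from statistics import NormalDist

JITTER_MEAN_MS = 70.0  # measured average wakeup jitter (slides)
JITTER_STD_MS = 15.0   # measured standard deviation (slides)


def guard_band_ms(mean_ms=JITTER_MEAN_MS, std_ms=JITTER_STD_MS, k=3.0):
    """Extra beacon time that absorbs wakeup jitter up to mean + k * std."""
    return mean_ms + k * std_ms


def coverage(guard_ms, mean_ms=JITTER_MEAN_MS, std_ms=JITTER_STD_MS):
    """Probability that a normally distributed jitter fits inside the guard band."""
    return NormalDist(mean_ms, std_ms).cdf(guard_ms)


if __name__ == "__main__":
    for k in (1.0, 2.0, 3.0):
        g = guard_band_ms(k=k)
        print(f"k={k}: guard={g:.0f} ms, coverage={coverage(g):.4f}")
```

Under the normality assumption, k = 3 gives a 115ms guard band covering about 99.9% of wakeups.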


Hardware


Android platform on Galaxy Tab, Nexus 7, Galaxy Nexus, and HTC One S.


Software implementation


4 components

GestureSensing

» GestureSensing.makeGesture();

» GestureSensing.analyzeGesture();

LowPowerListening

» LPL.start();

AudioModem

GUI


In Spartacus, we use tone frequencies higher than 20KHz, which are inaudible.

We quantify the energy degradation of sound.

Devices:


Sennheiser MKE 2P microphone

Yamaha NX-U10 speaker


Energy degradation is higher above 15KHz.

Mobile phones are usually designed for human conversations and music, which lie below 15KHz.

For every 1KHz increase in tone frequency, the degradation of sound energy on speakers increases by 5dB.

Sound energy decreases by 3.2dB/m on average from 1m to 6m.

These results indicate that, to reduce energy degradation and increase interaction range, audio tones with lower frequencies should be leveraged.
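The two measurements can be combined into a rough loss estimate. Adding the frequency and distance losses linearly, starting the frequency loss at 15KHz, and extrapolating the 3.2dB/m slope outside the measured 1m to 6m range are simplifying assumptions of this sketch.

```python
def tone_loss_db(freq_hz, distance_m,
                 knee_hz=15_000.0,  # degradation grows above about 15 kHz (slides)
                 db_per_khz=5.0,    # about 5 dB per extra kHz on speakers (slides)
                 db_per_m=3.2,      # about 3.2 dB per metre, measured from 1 m to 6 m (slides)
                 ref_m=1.0):
    """Rough extra loss (dB) relative to a tone at or below 15 kHz heard at 1 m.
    Assumes the frequency and distance effects simply add."""
    freq_loss = max(0.0, freq_hz - knee_hz) / 1000.0 * db_per_khz
    dist_loss = max(0.0, distance_m - ref_m) * db_per_m
    return freq_loss + dist_loss


if __name__ == "__main__":
    for f in (16_000.0, 18_000.0, 20_000.0):
        print(f, [round(tone_loss_db(f, d), 1) for d in (1.0, 3.0, 6.0)])
```

Under this model, moving the tone from 18KHz to 20KHz costs about 10dB, roughly the same as three extra metres of distance.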


Challenging questions:

How diversely do users point their phones, and how fast can a user point?

If the user points fast enough, how often does the target device observe the highest frequency shift, and thus the highest velocity, of the gesture?

If we want to estimate the frequency shifts, how much frequency- and time-domain resolution do we need to successfully capture the peak frequency shift inside a gesture?


Participants

12 participants (6 females)

The participants were briefed on the idea of Spartacus before the experiment.

10 gestures towards a target device 2m away from them, using a Galaxy Nexus phone.

Hand trajectories of the participants were detected using image processing techniques.


Finding 1

Three types of gestures

Most of the participants fully stretched out their arms.

The current design of Spartacus focuses on evaluating this vertically downward gesture trajectory.



Finding 2

Participants faced towards the target device, with an average ±7.5° angular bias.

Users can precisely point their phones towards the target device.

This supports selecting the target device using the maximum velocity.

Finding 3

The peak velocity of the gestures of all participants was 3.4m/s on average.

Most of the gestures lasted less than one second, and the peak velocities appeared and diminished within 25ms.

Spartacus therefore needs a high time-domain resolution to locate the peak frequency shifts.


A Galaxy Nexus phone was pointed 25 times towards the target device, with a peak velocity of about 3m/s.

20 of the 25 gestures were selected for analysis.

The audio was captured at the two candidate devices at 44.1KHz and undersampled 7 times to 6.3KHz.


Performance with Distances and Angles

As the distances between devices increase, the device selection accuracy drops gradually, since the tone energy relative to other frequency bands decreases as the distances increase.

As the angle between candidate devices decreases, the accuracy of device selection drops.


Evaluation with metal sounds

Played a piece of rock music (i.e., “Burn It Down” by Linkin Park).

Metal clangs can hardly reach frequencies above 18KHz, so they have limited effect on Spartacus.

Space was limited in these scenarios.

Only tested up to 1.5m at 30 degrees.


As distance increases, the performance decreases slightly due to the stronger multipath effects in the Cubicles and Hallway.

All three cases achieved higher than 85% accuracy.


Spartacus: 2048-point FFT processing

Takes 1.5s to process a 1s gesture.

Traditional FFT: 8192-point FFT processing

Takes 8.7s.
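A back-of-the-envelope check, assuming both pipelines compute the same number of FFT frames per gesture and that one FFT costs on the order of N·log2 N operations (a textbook estimate, not a measurement). The predicted ratio is 52/11, about 4.7, the same order as the measured 8.7s / 1.5s = 5.8.

```python
import math


def fft_ops(n):
    """Textbook operation-count estimate for a radix-2 FFT of size n (up to a constant)."""
    return n * math.log2(n)


def relative_cost(n_big, n_small):
    """How many times more work one n_big-point FFT needs than one n_small-point FFT."""
    return fft_ops(n_big) / fft_ops(n_small)


if __name__ == "__main__":
    predicted = relative_cost(8192, 2048)
    measured = 8.7 / 1.5
    print(f"predicted {predicted:.2f}x, measured {measured:.2f}x")
```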


Compare the performance under different duty cycles.

Each listening session is fixed to 200ms.

Hardware

Galaxy Nexus mobile phones

Each test runs the low-power listening task for 5min.


Result


4X lower energy consumption than WiFi Direct

5.5X lower than the latest Bluetooth 4.0 protocols


Audio Processing in Mobile Sensing

Microphones in mobile sensing


Miluzzo


human conversation snippets for analyzing social activities


SurroundSense


combined with other sensing modalities

» accelerometers, cameras, and magnetometers to detect locations of users for social context inferences


Lu


unknown social events can be automatically identified and easily labeled


Microphones in energy-efficient sensing

JigSaw and Darwin Phones

Enabling energy-efficient continuous sensing and collaborative learning techniques


MoVi


Multiple participants to create integrated social event records


SwordFight


Provides a distance ranging technique using the time difference of sound arrivals


Spatially-Aware Device Interactions


Point & Connect (P&C) proposed an interaction technique based
on time difference of sound arrivals.


Enabling P&C may prevent the users from using their default WiFi networks.

The related service has to be launched and continuously wait for interaction requests, which consumes significant energy.


SoundWave


Single-device interactions

The laptop is both the transmitter and the receiver of the Doppler effect, so the generated frequency shift is doubled.
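The doubling follows from a first-order Doppler approximation. Assume a reflector (the user's hand) moving at speed v, much slower than the speed of sound c, toward a co-located speaker and microphone, and compare it with a one-way path such as a phone moving toward a stationary listener:

```latex
f_{\text{one-way}} = f_0 \,\frac{c}{c - v} \approx f_0\left(1 + \frac{v}{c}\right),
\qquad
f_{\text{reflected}} = f_0 \,\frac{c + v}{c - v} \approx f_0\left(1 + \frac{2v}{c}\right)
```

So for the same v and f_0, the reflected shift is approximately 2vf_0/c, twice the one-way shift vf_0/c.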


PANDAA


No extra infrastructure and no extra effort from users to initiate interactions


only supports devices in stationary placements


Polaris


Supports spatially-aware indoor device interactions

Deals only with absolute directional relationships of devices


Energy-Efficient Interaction Triggers

Could be enabled on demand when the energy constraint is not a major concern.

Could be triggered by other traditional communication schemes, such as Bluetooth or WiFi Direct.

These options aim to solve the problem that the user has to wait for a couple of seconds for a “warmup beacon” before making the gesture in Spartacus.


Security Issues

A malicious device standing close by could pretend to have detected higher Doppler shifts than other devices, deceiving the sender into thinking it is the receiver.

Only trusted and authenticated devices should be allowed to report their Doppler shifts.

After the user’s device determines the potential receiver that has reported the maximal Doppler shift, the name and identity of the receiver’s owner would be shown on the user’s device.
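A minimal sketch of restricting reports to trusted devices, assuming each trusted device shares a secret key with the sender (for example from an earlier pairing) and tags its report with HMAC-SHA256 from Python's standard library. This mechanism is our illustration; the slides do not specify how authentication would work, and it cannot stop a trusted device from lying about its shift.

```python
import hashlib
import hmac
import struct


def _message(device_id: str, shift_hz: float) -> bytes:
    """Canonical byte encoding of a report: UTF-8 ID, separator, big-endian double."""
    return device_id.encode("utf-8") + b"|" + struct.pack("!d", shift_hz)


def sign_report(key: bytes, device_id: str, shift_hz: float) -> bytes:
    """HMAC-SHA256 tag a trusted device attaches to its Doppler-shift report."""
    return hmac.new(key, _message(device_id, shift_hz), hashlib.sha256).digest()


def pick_trusted_target(trusted_keys, reports):
    """trusted_keys: device_id -> shared key. reports: (device_id, shift_hz, tag) tuples.
    Drops reports from unknown devices or with bad tags, then picks the largest shift."""
    valid = []
    for device_id, shift_hz, tag in reports:
        key = trusted_keys.get(device_id)
        if key is not None and hmac.compare_digest(sign_report(key, device_id, shift_hz), tag):
            valid.append((shift_hz, device_id))
    return max(valid)[1] if valid else None


if __name__ == "__main__":
    keys = {"tv": b"k-tv", "laptop": b"k-laptop"}
    reports = [("tv", 150.0, sign_report(b"k-tv", "tv", 150.0)),
               ("intruder", 500.0, sign_report(b"guess", "intruder", 500.0))]
    print(pick_trusted_target(keys, reports))
```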


Contention Among Interaction Sessions

Use in a crowded scenario (e.g., an airport)

Contention could be an issue for device pairing techniques.

A contention coordination mechanism is needed.



Spartacus, a spatially-aware interaction system

High accuracy

Low latency

Low energy consumption

No extra hardware

Zero prior configuration

Works in various conditions.


Experimental evaluations of Spartacus performance


Does this paper only document the initial gesture in its experiments? What about detecting other gestures, so that a receiver can recognize different meanings from senders?

If there are many children and adults of different heights standing close together in a crowded scenario, how could the system separate the tallest and shortest among all selection targets?
