AudioSense: A Simulation

Progress Report

EECS 578


Allan Spale


Background of Concept


Taking the train home and listening to
the sounds around me



How would deaf people be able to perceive
the environment?


What assistance would be useful in helping
people adapt to the environment?

Project Goals


Develop a CAVE application that will
simulate aspects of audio perception


Display the text of “speaking” objects in
space


Display the descriptive text of “non-speaking” objects in space


Display visual cues of multiple sound
sources


Allow the user to selectively listen to
different sound sources


Topics in the Project


Augmented reality


Illustrated by objects in a virtual environment


3D sound


Simulated by an object’s interaction property


Speech recognition


Simulated by text near the object


Will remain static during simulation


Virtual reality / CAVE


Method for presenting the project


Not discussed in this presentation

Augmented Reality


Definition


“…provides means of intuitive information
presentation for enhancing situational
awareness and perception by exploiting
the natural and familiar human interaction
modalities with the environment.”






-- Behringer et al. 1999


Augmented Reality:

Device Diagnostics


Architecture components aid in performing diagnostic tests


Computer vision used to track the object in
space


Speech recognition (command-style) used for the user interface


3D graphics (wireframe and shaded
objects) to illustrate an object’s internal
structure


3D audio emitted from an item allows the user to find its location within the object


[Figure slides: Augmented Reality - device diagnostics]

Augmented Reality:

Device Diagnostics


Summary


Providing 3D graphics and sound helps the
user better diagnose items


Might also want text information on the display


Tracking methodology still needs
improvement



Speech recognition of commands could be
expanded to include annotation


Utilize an IP connection to offload computing from the wearable computer

Augmented Reality:

Multimedia Presentations in the Real World


Mobile Augmented Reality System
(MARS)


Tracking performed by a Global Positioning System (GPS) receiver and an orientation tracker


Display is see-through and head-mounted


Interaction based on location and gaze


Additional interaction provided by hand-held device

Augmented Reality:

Multimedia Presentations in the Real World


System overview


Selection occurs through proximity or gaze
direction followed by a menu system


Information presentation


Video (on the hand-held device) or images accompanied by narration (on the head-mounted display)


Virtual reality (for places that cannot be visited)


Augmented reality (illustrate where items were)

[Figure slides: Augmented Reality - multimedia presentations in the real world]


Augmented Reality:

Multimedia Presentations in the Real World


Conclusions


Current system is too heavy and visually
undesirable


Might want to make the hand-held display a palmtop computer


Permit authoring of content


Create a collaboration between indoor and
outdoor system users

3D Sound:

Audio-only Web Browsing


Must overcome difficulties with utilizing 3D
sound


Sounds along the X axis are identifiable; sounds along the Y and Z axes are not


A need exists to create structure in audio-rendered web pages


Document reading appears spatially from left to right in an adequate amount of time (see the sketch after this list)


Utilize earcons and selective listening


Provide meta-content for quick document overview

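To make the left-to-right reading idea concrete, here is a minimal sketch (my own illustration, not code from Goose and Moller) that spreads the words of a line across the stereo field, from full left (-1.0) to full right (+1.0):

# Minimal sketch: sweep a line's words from left (-1.0) to right (+1.0),
# approximating the left-to-right spatial rendering of the audio browser.
def pan_positions(words):
    """Return (word, pan) pairs evenly spaced across the stereo field."""
    n = len(words)
    if n == 1:
        return [(words[0], 0.0)]
    return [(w, -1.0 + 2.0 * i / (n - 1)) for i, w in enumerate(words)]

line = "Audio only browsing conveys document structure".split()
for word, pan in pan_positions(line):
    print(f"{word:>10}  pan={pan:+.2f}")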


[Figure slide: 3D Sound - audio-only Web browsing]

3D Sound:

Audio-only Web Browsing


Future work


Improve link information to extend beyond web page title and time duration


Benefits of auditory browsing aids


Improved comprehension


Better browsing experience for visually impaired and sighted users


3D Sound:

Interactive 3D Sound Hyperstories


Hyperstories


Story occurring in a hypermedia context


Forms a “nested context model”


World objects can be passive, active,
static, or dynamic



3D Sound:

Interactive 3D Sound Hyperstories


AudioDoom


Similar to the computer game Doom, but entirely audio-based


All world objects represented with sound


Sound represented in a “volume” almost
parallel to the user’s eyes


User interacts with the world objects using
an ultrasonic joystick with haptic
functionality


Organized by partitioned spaces

[Figure slides: 3D Sound - interactive 3D sound hyperstories]


3D Sound:

Interactive 3D Sound Hyperstories


Despite elapsed time between sessions,
users remembered the world structure
well


Authors illustrate the possibility of
“render[ing] a spatial navigable structure
by using only spatialized sound.”


Opens the possibilities for educational
software for the blind within the
hyperstory context

Speech Recognition:

Media retrieval and indexing


Problems with media retrieval and
indexing


Large volumes of media are being generated; manual indexing is too costly and time-consuming


Ideal system design


Speaker independence


Robustness to noisy recording environments


Open vocabulary

Speech Recognition:

Media retrieval and indexing


Using Hidden Markov Models, the system achieved the results in Table 1


To improve results, “using string
matching techniques” will help
overcome recognition stream errors


Speech Recognition:

Media retrieval and indexing


String matching strategy


Develop the search term


Divide the recognition stream into a set of sub-strings


Implement an initial filter process


“Identify edit operations for remaining sub-strings in [the] recognition stream”


Calculate the similarity measure for the search term and matched strings (see the sketch below)

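A minimal sketch of the last two steps (my own illustration, not the authors' implementation; the 0.7 threshold is an assumption), using Python's standard difflib to score candidate sub-strings against the search term:

from difflib import SequenceMatcher

def similarity(term, candidate):
    # Normalized similarity in [0, 1]; 1.0 is an exact match.
    return SequenceMatcher(None, term.lower(), candidate.lower()).ratio()

def match_stream(term, substrings, threshold=0.7):
    # Keep only candidates that survive the filter, best matches first.
    scored = [(s, similarity(term, s)) for s in substrings]
    kept = [p for p in scored if p[1] >= threshold]
    return sorted(kept, key=lambda p: p[1], reverse=True)

# A misrecognized word ("recignition") still matches the search term.
print(match_stream("recognition", ["recignition", "wreck", "a nation"]))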
[Figure slide: Speech Recognition - media retrieval and indexing]


Speech Recognition:

Media retrieval and indexing


Results of implementing the string
matching strategy


Permitting more operations improved recall
performance but degraded precision
performance


Despite low performance rates, a system
performing these tasks will be
commercially viable

Speech Recognition:

Continuous Speech Recognition


Problems with continuous speech
recognition


Has unpredictable errors that are unlike
other “predictable” user input errors


The absence of context aids makes
recognition difficult for the computer


Speech user interfaces are still in a
developmental stage and will improve over
time

Speech Recognition:

Continuous Speech Recognition


Two modes


Keyboard-mouse and speech


Two tasks


Composition and transcription


Results


Keyboard-mouse tasks were faster and more efficient than speech tasks

Speech Recognition:

Continuous Speech Recognition


Correction methods


Two general correction methods


Inline correction, separate proofreading


Speech inline correction methods


Select text and reenter, delete text and reenter, use a correction box, correct errors that arise during correction




[Figure slides: Speech Recognition - continuous speech recognition]


Speech Recognition:

Continuous Speech Recognition


Discussion of errors


Inline correction is preferred by users
regardless of modality


Proofreading had increased usage with
speech because of unpredictable system
errors


Keyboard-mouse correction involved deleting and reentering the word


Despite the ability to correct inline with speech, errors typically occurred during the correction itself


Dialog boxes used as a last resort

Speech Recognition:

Continuous Speech Recognition


Discussion of results


Users still do not feel that they can be
productive using a speech interface for
continuous recognition


More studies must be conducted to
improve the speech interface for users

Project Implementation


Write a CAVE application using YG


3D objects simulate sound producing
objects


No speech recognition will occur since
predefined text will be attached to each object


Objects will move in space


Objects will not always produce sound


Objects may not be in the line of sight

Project Implementation


Write a CAVE application using YG


Sound location


Show directional vectors for each object that
emits a sound


The longer the vector, the farther the object is from the user


X and Y will use arrowheads; Z will use a dot or an "X" symbol


Dot is for an object behind the user, "X"
symbol is for an object in front of the user



Only visible if sound can be “heard” by the user (a sketch of these cues follows this slide)

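A minimal sketch of these rules (my own illustration in Python, not YG code; the audible radius and the assumption that the user faces +Z are mine):

import math

def sound_cue(user, source, audible_radius=20.0):
    # Returns drawing hints for one sound source, or None if out of earshot.
    dx, dy, dz = (s - u for u, s in zip(user, source))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist > audible_radius:  # cue shown only if the sound can be "heard"
        return None
    return {
        "xy_arrow": (dx, dy),                  # arrowheads for X and Y
        "length": dist,                        # longer vector = farther object
        "z_symbol": "X" if dz > 0 else "dot",  # "X" = in front, dot = behind,
                                               # assuming the user faces +Z
    }

print(sound_cue(user=(0.0, 0.0, 0.0), source=(3.0, 4.0, -2.0)))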
Project Implementation


Write a CAVE application using YG


Sound properties


Represented using a square


Size represents volume/amplitude (distance attenuation will probably not be considered)


Color represents pitch/frequency


Only visible if sound can be “heard” by the user (one possible size/color mapping is sketched below)

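One possible mapping, as a minimal sketch (the size range, frequency bounds, and blue-to-red color ramp are my assumptions, not part of the design):

def sound_square(amplitude, frequency, max_amp=1.0, f_lo=100.0, f_hi=4000.0):
    # Map amplitude to the square's side and frequency to an RGB color.
    side = 0.1 + 0.9 * min(amplitude / max_amp, 1.0)  # louder = bigger square
    t = max(0.0, min(1.0, (frequency - f_lo) / (f_hi - f_lo)))
    rgb = (t, 0.0, 1.0 - t)  # blue (low pitch) ramps to red (high pitch)
    return side, rgb

print(sound_square(amplitude=0.5, frequency=440.0))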

Project Implementation


Write a CAVE application using YG


Simulate “cocktail party effect”


Allow user to enlarge text from an object that is
far away


Provide a configuration section to ignore certain sound properties (a sketch follows this list)


Volume/amplitude


Pitch/frequency

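A minimal sketch of such a configuration (the field and source names are hypothetical; the slides do not specify an interface):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ListenFilter:
    ignore_amplitude: bool = False      # hide the volume/amplitude cue
    ignore_pitch: bool = False          # hide the pitch/frequency cue
    focus_source: Optional[str] = None  # attend to one source, mute the rest

    def audible(self, source_name):
        # "Cocktail party effect": with a focus set, only that source is heard.
        return self.focus_source is None or source_name == self.focus_source

f = ListenFilter(ignore_pitch=True, focus_source="train announcement")
print(f.audible("train announcement"), f.audible("crowd chatter"))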

Project Tasks Completed


Basic project design


Have read some documentation about YG


Tested functionality of YG in my account


Established contacts with people who have programmed CAVE applications using YG


They will provide, upon request, 3D models and code that demonstrate some YG features


They will also help answer questions and demonstrate and explain YG features


Project Timeline


Week of March 25


Practice modifying existing YG programs


Collect needed 3D models for program


Week of April 1


Code objects and their accompanying text


Implement movement patterns for objects

Project Timeline


Week of April 8


Attempt to “turn on and off” the sound of objects


Work with interaction properties of objects that will
determine visualizing sound properties


Week of April 15


Continue working on visualizing sound properties


Work on “enlarging/reducing” text of an object


Project Timeline


Week of April 22


Create simple sound filtering menus


Test program in CAVE


EXAM WEEK: Week of April 29


Practice presentation


Present project

Bibliography

Behringer, R., Chen, S., Sundareswaran, V., Wang, K., and Vassiliou, M. (1999). A Novel Interface for Device Diagnostics Using Speech Recognition, Augmented Reality Visualization, and 3D Audio Auralization, in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Vol. I, Institute of Electrical and Electronics Engineers, Inc., 427-432.


Goose, S. and Moller, C. (1999). A 3D Audio Only Interactive Web Browser: Using Spatialization to Convey Hypermedia Document Structure, in Proceedings of the Seventh ACM International Conference on Multimedia (Orlando FL, October 1999), ACM Press, 363-371.

Bibliography

Hollerer, T., Feiner, S., and Pavlik, J. (1999). Situated Documentaries: Embedding Multimedia Presentations in the Real World, in Proceedings of the 3rd International Symposium on Wearable Computers (October 1999, San Francisco CA), Institute of Electrical and Electronics Engineers, Inc., 1-8.


Karat, C.-M., Halverson, C., Horn, D., and Karat, J. (1999). Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems, in CHI '99, Proceedings of the CHI 99 Conference on Human Factors in Computing Systems: The CHI Is the Limit (Pittsburgh PA, May 1999), ACM Press, 568-575.

Bibliography

Lumbreras, M. and Sanchez, J. (1999). Interactive 3D Sound Hyperstories for Blind Children, in CHI '99, Proceedings of the CHI 99 Conference on Human Factors in Computing Systems: The CHI Is the Limit (Pittsburgh PA, May 1999), ACM Press, 318-325.


Robertson, J., Wong, W. Y., Chung, C., and Kim, D. K. (1998). Automatic Speech Recognition for Generalised Time Based Media Retrieval and Indexing, in Proceedings of the Sixth ACM International Conference on Multimedia (Bristol UK, September 1998), ACM Press, 241-246.