ViaScribe - The Illinois Center for Information Technology Accessibility


ViaScribe and CaptionMeNow Solution Scenarios


IBM Confidential





















Solution Scenarios for the Use of ViaScribe™ and CaptionMeNow

A Best-of-Breed Captioning Solution












Release Date: November 17, 2013
















Table of Contents

Executive Summary
Uniqueness of IBM’s Captioning Solution
ViaScribe™
CaptionMeNow
TransformMeNow
Solution Scenarios
Summary
Appendix A - Software Supported
Appendix B - Application Architecture














Executive Summary


The purpose of this document is to provide an overview of the capabilities, and the resulting benefits, that can be realized by implementing the IBM ViaScribe™ and CaptionMeNow solutions. This paper addresses requirements for captioning and transcribing digitized audio/video archives, as well as new live lecture and documentary materials.

Two methods of captioning are provided. In the first, IBM provides tools and services that let an organization create captions more cost-effectively as an in-house capability, using ViaScribe™ technology. In the second, the organization provides its audio/video materials to IBM, which captions them more cost-effectively using a combination of automated tools and human intervention. This document also highlights other benefits of captioning, such as improved access for non-native English speakers and better search capabilities. It also addresses the Best of Breed attributes and capabilities of IBM’s captioning solutions. Packaging scenarios are presented, along with suggestions for managing costs.


IBM has a history of innovation that matters. We spend nearly $6 billion a year on Research and Development, and for each of the past 12 years have received more patents than any other company.

We are leaders in the area of speech recognition, with over 30 years of research and development and the longest track record of any industrial speech recognition effort.

Uniqueness of IBM’s Captioning Solution


ViaScribe™ and CaptionMeNow are one-of-a-kind solutions, with capabilities not available from other sources. IBM created these solutions in response to needs identified by key universities through the Liberated Learning consortium (www.liberatedlearning.com). The universities identified gaps in the speech technology marketplace that made it difficult to caption lecture materials. ViaScribe™ was created to close those gaps. CaptionMeNow is a services extension of ViaScribe™ that has been recognized by a number of government agencies and enterprises as a solution to a previously unsolved problem.


The gaps in the speech technology market that make captioning challenging, hard to use, and expensive are as follows:




• Much of the commercially available speech recognition software is optimized for low-bandwidth, limited-domain telephony applications.

• The large-vocabulary commercially available systems are designed primarily for desktop dictation rather than for captioning. Therefore, their display characteristics are neither optimized for nor flexible enough for captioning.

• Commercial systems generally lack a mechanism for easy, synchronous integration of captions with other media, such as video or Microsoft® PowerPoint® slides.


From a Best of Breed perspective, the capabilities that make ViaScribe™ unique in the market are as follows:




• ViaScribe™ is built on IBM’s speech recognition expertise. IBM Research has been a leader in speech recognition algorithm development for over 30 years, and has pioneered many of the now-accepted commercial advances in this field. IBM’s leadership is supported by consistently high performance in standard competitive fora, such as DARPA and NIST competitions. Trained (enrolled) speakers in the university setting have achieved over 90% speech recognition accuracy. Speakers who have not enrolled can use ViaScribe™ in speaker-independent mode, though this generally brings some reduction in accuracy.















• ViaScribe™ was designed to automatically “bind” and synchronize multiple media tracks. A speaker using ViaScribe™ can immediately create a multimedia presentation that binds their audio, accessible captions, and visual materials (slides and/or video). The resulting multimodal presentation can be posted to the web immediately and automatically, or held for post-processing quality enhancements, depending on customer requirements.




• ViaScribe™ automatically introduces progressive line breaks at natural pauses in the speaker’s delivery. A minimally trained human operator can then quickly convert these line breaks to standard punctuation while post-processing the transcript.




• The output presentation created by ViaScribe™ also supports fuzzy searches on the content, allowing a viewer to locate particular points of interest in a presentation. IBM’s advanced multimedia mining capabilities can be brought to bear as well, enabling even more sophisticated searching of text and images across a range of documents.




• ViaScribe™ offers extensive user control of text presentation parameters. Color, font size, and other display attributes can be adjusted to meet the requirements of many low-vision users, seniors, and users with some reading disabilities.




• Use of a standard SMIL (Synchronized Multimedia Integration Language) synchronization file enables playback on media players such as Real Media Player. Support for the Real Media input format RMA ensures compatibility with LOC digital archives.




• The CaptionMeNow option enables on-demand captioning of existing video archives on an as-needed basis.




• The use of advanced speech recognition technologies and editing capabilities results in considerably lower costs than stenographic transcription, without sacrificing quality or accuracy.
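The accuracy figures cited above (over 90% for enrolled speakers) are given without a definition of how accuracy was measured. A common convention in speech recognition is word accuracy, computed as 1 minus the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into a reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that metric, assuming whitespace tokenization and case-insensitive comparison; the function names are hypothetical and this is not presented as the method used to obtain the figures above.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words).

    The edit distance is the Levenshtein distance over whitespace-separated,
    lower-cased words: the minimum number of substitutions, deletions and
    insertions that turn the hypothesis into the reference.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # reference word deleted
                          curr[j - 1] + 1,     # extra word inserted
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER (can go below zero if there are many insertions)."""
    errors, n = word_errors(reference, hypothesis)
    if n == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - errors / n


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog today"
    hyp = "the quick brown fox jumped over the lazy dog today"
    # One substituted word out of ten reference words.
    print(word_accuracy(ref, hyp))
```

Because insertions count as errors, word accuracy defined this way is stricter than simply counting correctly recognized words, which is one reason published accuracy figures are only comparable when the scoring method is stated.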














ViaScribe™


IBM ViaScribe™ is a speech-recognition-based tool for real-time transcription, display, editing, annotation and presentation. The power and flexibility of its multimodal features make its output applicable and appealing to broad audiences, including seniors, people who are deaf, hard of hearing or blind, some people with reading disabilities, speakers of English as a Second Language (ESL), and any researcher who needs synchronized searching of the material. The multimodal design also makes post-hoc editing and annotation highly cost-effective. IBM ViaScribe™ uses speech-to-text technology to provide real-time captioning of classroom lectures, as well as real-time transcription, alignment and preparation of e-learning materials. ViaScribe™ provides the benefit of multimodal access: users can see and hear the audio/video lecture track, read and search the textual transcription unmodified or magnified, hear the transcription through a user-supplied screen reader (for blind users), or view the whole presentation at even higher magnification through a user-supplied screen magnifier. With ViaScribe™, spoken information such as lectures and webcasts can be captioned onto a screen in real time.


Captioning or transcribing webcasts meets the accessibility requirements of individuals who are deaf or hard of hearing. Furthermore, the synchronized multimodal output and presentation produced by ViaScribe™ enhance information access for all users, including those who are aging or have print disabilities. Many users will benefit from the synchronization of audio, visual and text tracks. All users will benefit from searchable access to the text output files. Users who want to listen to webcasts on a particular topic can more easily find sections of interest by searching for particular keywords.
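Keyword search over a synchronized transcript comes down to mapping matching words back to their time offsets, so that a player can jump to that point. The sketch below assumes the transcript has already been split into caption segments with start times; the Segment type and find_keyword function are hypothetical, not part of ViaScribe. It uses the similarity ratio from Python's standard difflib module as one simple way to approximate fuzzy matching, so that a misrecognized spelling can still be found.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Segment:
    """One caption line (hypothetical structure) and its offset into the recording."""
    start_s: float
    text: str


def find_keyword(segments: list[Segment], keyword: str,
                 cutoff: float = 0.8) -> list[float]:
    """Return the start times of segments containing a word similar to `keyword`.

    Similarity is difflib's SequenceMatcher ratio (0..1); words scoring at or
    above `cutoff` count as matches. Only single-word keywords are handled.
    """
    key = keyword.lower()
    hits = []
    for seg in segments:
        words = re.findall(r"[a-z0-9']+", seg.text.lower())
        if any(SequenceMatcher(None, key, w).ratio() >= cutoff for w in words):
            hits.append(seg.start_s)
    return hits


if __name__ == "__main__":
    transcript = [
        Segment(0.0, "Welcome to the lecture."),
        Segment(4.2, "Today we discuss accessability of web media."),  # recognition error
        Segment(9.8, "Captions help everyone."),
    ]
    print(find_keyword(transcript, "accessibility"))  # [4.2]
```

The 0.8 cutoff is an arbitrary illustrative value: lowering it tolerates more recognition errors at the cost of more false matches.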


ViaScribe™ was originally developed for the academic environment. For years, universities have faced a number of challenges in making classroom lectures accessible to students who are deaf or hard of hearing. Traditional methods, such as sign interpreters, stenographers or student note takers, are often costly, difficult to procure or inconsistent. IBM ViaScribe™ software provides educators and students with one of the first live captioning systems made for use in the classroom.


The tool is relatively simple to use. To achieve the best results, professors create a voice profile that helps ViaScribe™ “learn” their particular speech patterns. Guest speakers and others who are not “enrolled” (for whom no voice profile has been created) use ViaScribe™ in speaker-independent mode. During class, presenters speak into a microphone that transmits the lecture to ViaScribe™’s speech recognition software. Captions are generated in real time, presented on a large classroom screen, and saved for later use by students and educators.


When a presenter speaks into the microphone, ViaScribe™ digitizes the speech and then recognizes it, producing both a transcription and an audio recording. Audiences can obtain a raw, unedited transcript immediately after the presentation. Presenters can then edit the transcript to correct recognition errors or add information, and post the notes on the Web in various accessible formats, including searchable transcripts, synchronous multimedia and digital audio. This enables greater access for all people, including those who are deaf, hard of hearing, mobility impaired or learning disabled, non-native speakers and distance learners. Transcribed text is also easier to process with automatic text summarization or translation programs. As a result, universities, institutions, governments and corporations can make their content more accessible to people worldwide.














Figure 1 presents ViaScribe™ in use for real-time captioning or captioning of pre-recorded materials.


Figure 1




• Speakers use microphones during lectures, OR a trained ViaScribe™ user shadows existing video/audio, OR the media is simply run through the ViaScribe™ tool for automatic transcription

• Software converts the speaker’s voice into electronic text

• Digitizes and transcribes speech to text

• After the lecture, the text is edited for recognition errors and made available with the video, audio and presentation material




ViaScribe™ enhancements to “standard” speech recognition systems:

• When the speaker pauses, the text skips to a new line
• Easy-to-use error correction system for post-hoc editing
• Can work in speaker-independent mode
• Flexible display: change colors, fonts, pause marker
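The first enhancement, starting a new line when the speaker pauses, can be illustrated with word-level timestamps of the kind many recognizers can emit. This is a minimal sketch under assumed inputs: the Word type, the break_on_pauses function and the 0.5-second default threshold are illustrative choices, not ViaScribe’s actual algorithm or settings.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with (assumed) start and end times in seconds."""
    text: str
    start_s: float
    end_s: float


def break_on_pauses(words: list[Word], min_pause_s: float = 0.5) -> list[str]:
    """Group recognized words into caption lines.

    A new line starts whenever the silence between the end of one word and
    the start of the next is at least `min_pause_s` seconds.
    """
    lines: list[list[str]] = []
    prev_end = None
    for w in words:
        if prev_end is None or w.start_s - prev_end >= min_pause_s:
            lines.append([])  # first word, or a pause was detected: new line
        lines[-1].append(w.text)
        prev_end = w.end_s
    return [" ".join(line) for line in lines]


if __name__ == "__main__":
    words = [
        Word("today", 0.0, 0.4), Word("we", 0.45, 0.6), Word("cover", 0.65, 1.0),
        Word("speech", 1.8, 2.2), Word("recognition", 2.25, 3.0),
    ]
    print(break_on_pauses(words))  # ['today we cover', 'speech recognition']
```

Each resulting line is a natural place for an editor to insert a period or comma, which is why pause-based breaks make the later punctuation pass quick.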




















The ViaScribe™ User Interface is presented in Figure 2:



Figure 2: ViaScribe™ User Interface



Menus: File, View, Microphone, Editor, Tools, Help




File Commands:
• Exit: Exits ViaScribe™.

View Commands:
• Most: Hides all toolbars and the tabbed pane to create a larger display area.
• Toolbar: Toggle to display or hide the iconic toolbar.
• Status: Toggle to display or hide the active voice profile indicator.
• Always on Top: When checked, always displays the ViaScribe™ window over other open applications, so that ViaScribe™ text is never hidden when other windows are active.

















Microphone Menu:
• Toggle to turn the microphone on and off. (Useful when the “Most” feature is active and the microphone icon is hidden.)

Edit Commands:
• ViaScribe™ offers the familiar Undo, Cut, Copy, Paste, Select All, and Font functions common to most Windows-based programs, such as MS Word.
• Open Transcription: opens a saved ViaScribe™ .XML transcript file (and associated audio, if available).
• Load Media: opens a saved audio file.
• Save: overwrites the transcript files (ViaScribe™ XML and .RT) with edited changes.
• Save As: saves a ViaScribe™ XML file (without an extension).
• Media Source:
o Microphone: the default (for live display of text).
o Transcribe media: transcribes pre-recorded media (such as an audio file or video).

















CaptionMeNow


CaptionMeNow was developed to address the proliferation of multimedia information on the web without associated transcription. Audio is not always the medium of choice (for example, in noisy environments, most users prefer to read captions rather than listen to the audio).

Without the transcribed text from a video or lecture, much of the multimedia on the web:

• Cannot be searched, indexed, mined or translated
• Is difficult to hear clearly and understand
• Violates accessibility requirements



Figure 3 depicts CaptionMeNow pictographically, from a user’s perspective:




[Figure 3 image: “CaptionMeNow Usage Scenario” (© 2005 IBM Corporation). Labels: web surfer; corporate web server; ViaScribe and transcription server generate the transcription; ViaScribe edits the transcription; roadmap to CMN: corporate website, sample website, administrator.]

Figure 3



A user comes across uncaptioned audio or video on an enterprise website. If the site is enabled with CaptionMeNow, the user presses a button that routes the audio to the most suitable, available, and cost-effective captioning option. (This can be ViaScribe™ speech recognition directly, or a person using ViaScribe™ who “shadows” the original audio using their own trained speech recognition models, subject to availability and cost sensitivity.) The captioned output is then edited to an acceptable level of accuracy. (The skill required for editing, however, is considerably less sophisticated, and therefore less expensive, than the skill required for stenography.)
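The routing decision described above, choosing between direct recognition and a shadowing operator according to suitability, availability and cost, can be expressed as a simple selection rule. The sketch below is a hypothetical illustration only: the CaptionOption fields, the cost and accuracy figures, and the “cheapest available option that meets an accuracy floor” policy are assumptions, since the document does not specify how CaptionMeNow weighs these factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionOption:
    """A hypothetical captioning route and its assumed characteristics."""
    name: str
    cost_per_min: float       # cost per minute of audio (illustrative units)
    expected_accuracy: float  # expected accuracy before editing, 0..1
    available: bool           # e.g. is a shadowing operator free right now?


def route(options: list[CaptionOption], min_accuracy: float) -> Optional[CaptionOption]:
    """Return the cheapest available option meeting the accuracy floor, or None."""
    candidates = [o for o in options
                  if o.available and o.expected_accuracy >= min_accuracy]
    return min(candidates, key=lambda o: o.cost_per_min, default=None)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    options = [
        CaptionOption("ViaScribe recognition, direct", 0.10, 0.80, True),
        CaptionOption("ViaScribe user shadowing the audio", 1.50, 0.92, True),
    ]
    print(route(options, min_accuracy=0.75).name)  # ViaScribe recognition, direct
    print(route(options, min_accuracy=0.90).name)  # ViaScribe user shadowing the audio
```

Returning None when nothing qualifies leaves the fallback (queue the request, or relax the accuracy floor) as an explicit policy decision rather than hiding it inside the selection rule.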

The CaptionMeNow routing options are presented in Figure 4.


















Figure 4:






TransformMeNow


IBM can also leverage the CaptionMeNow outputs for other solutions, referred to as TransformMeNow. Within CaptionMeNow, users can press a button to obtain captions more cost-effectively than with current approaches, which rely entirely on the manual labor of highly trained stenographers, because advanced technology is combined with human editing. TransformMeNow offers additional capabilities, such as translation and summarization. For example, users can press a button to request that an English video be captioned in Spanish. Automated technologies provide the bulk of the solution, supplemented by human editing.
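One common way to combine automated output with human editing is to flag the words the automated system is least sure of, so the editor does not have to review everything with equal care. The following sketch assumes the recognizer (or translator) supplies a per-word confidence score; the Token type, the [[...]] markup and the 0.6 threshold are hypothetical, and the document does not state that ViaScribe, CaptionMeNow or TransformMeNow work this way.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """An automatically produced word with an assumed confidence in [0, 1]."""
    text: str
    confidence: float


def mark_for_review(tokens: list[Token], threshold: float = 0.6) -> str:
    """Render automated output as text, wrapping low-confidence words in
    [[...]] so a human editor can find and check them quickly."""
    out = []
    for t in tokens:
        out.append(f"[[{t.text}]]" if t.confidence < threshold else t.text)
    return " ".join(out)


if __name__ == "__main__":
    tokens = [Token("captions", 0.95), Token("help", 0.9),
              Token("every", 0.4), Token("one", 0.45)]
    print(mark_for_review(tokens))  # captions help [[every]] [[one]]
```

Raising the threshold sends more words to the editor (higher cost, fewer missed errors); lowering it does the opposite, which is one concrete lever for the cost/accuracy trade-off discussed in this document.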















Solution Scenarios


There are several different scenarios for implementing ViaScribe™ and/or CaptionMeNow. For ease of understanding, we have grouped the various options into two scenarios:




Turn-Key Captioning Services


In this scenario, an enterprise contracts with IBM to provide end-to-end captioning services. IBM receives the audio/video and uses ViaScribe™ and supplemental editing to achieve an agreed-upon level of accuracy. This can include existing video as well as any new video.





In-House Captioning Services


In this scenario, an enterprise contracts with IBM to implement ViaScribe™ on site and to train the enterprise’s staff to use ViaScribe™ to caption existing video as well as any new video that is created.


The In-House captioning option is more cost-effective when large volumes of audio data need to be captioned.



The In-House Captioning Services include the following:

o Software Licensing Fees (One-Time Charge)
o Annual Software Maintenance Fees (annual license fee for software maintenance, periodic functional releases, etc.)
o Planning: Solution Requirements, Reviewing Business & IT Environments, User Profiles, Testing, Training and Support Plans
o Installation, Configuration and Enrollment (Training Speakers)
o Training Material Creation & Delivery (Train-the-Trainer Model; assumes training and skills transfer for 5 people): End Users, System Administrators, Client Support Help Desk or On-Line Support Line
o Testing & Customer Acceptance
o Post-installation support (for a 2-year period, up to 80 hours)
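The statement that in-house captioning becomes more cost-effective at large volumes is a break-even argument: the licensing, planning, installation and training items listed above are largely fixed, while a per-hour captioning service charge grows with volume. The sketch below makes that arithmetic explicit. All figures in the example are hypothetical placeholders, not IBM prices, and treating annual maintenance as part of the fixed cost over a chosen planning horizon is a simplifying assumption.

```python
def break_even_hours(fixed_in_house: float,
                     in_house_per_hour: float,
                     service_per_hour: float) -> float:
    """Hours of audio at which in-house captioning costs the same as a
    per-hour captioning service.

    in-house total = fixed_in_house + in_house_per_hour * h
    service total  = service_per_hour * h
    Setting the two equal gives h = fixed_in_house / (service - in_house).
    """
    margin = service_per_hour - in_house_per_hour
    if margin <= 0:
        raise ValueError("in-house never breaks even unless its hourly cost is lower")
    return fixed_in_house / margin


if __name__ == "__main__":
    # Hypothetical: 50,000 fixed, 100/hour in-house effort, 300/hour service.
    print(break_even_hours(50_000, 100, 300))  # 250.0
```

Above the break-even volume the in-house option is cheaper; below it, the turn-key service avoids paying fixed costs that the volume cannot recover.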


Summary


ViaScribe™ and CaptionMeNow offer efficient, cost-effective approaches to managing the accessibility requirements of audio and video materials. These offerings leverage IBM’s longstanding expertise in speech recognition technologies, enabling a semi-automated system that relies primarily on speech automation, supplemented with human assistance (such as editing) where necessary. IBM can offer different models of services and technology provision.


An enterprise can license the ViaScribe™ tool and then transcribe or caption large volumes of audio data in-house. Alternatively, IBM can provide CaptionMeNow services, acting as an on-demand captioning provider to which the enterprise submits its audio and video materials.













Appendix A - Software Supported

ViaScribe is compatible with Microsoft Windows XP or later and Internet Explorer 6.0 or later. It requires 2 GB of disk space, 512 MB of RAM and a 2.0 GHz processor.


File Formats Supported

(Note: ViaScribe, the Web Document, and Real Player share the WAV audio recording and the contents of the media folder. To make the file dependencies clear, these are listed repeatedly below.)




ViaScribe files:

o XML - Editable transcript file.
o WAV - Audio recording of the lecture.
o Media folder: contains Microsoft® PowerPoint® slides captured as JPEGs.





Web Document files:

o filename_frames.html - the launching point for the entire web document (points to the other .html files below).
o filename_media.html - the audio/video frame of the web document.
o filename_text.html - the timed text frame of the web document.
o filename_thumbn.html - the Microsoft® PowerPoint® thumbnails frame of the web document.
o media folder: contains Microsoft® PowerPoint® slides captured as JPEGs.
o WAV - Audio recording of the lecture.





Real Player files:

o SMIL - (Pronounced “smile.”) The launching point for the entire Real Player presentation (points to the RT, RP, and WAV files below).
o RT - RealText captions file.
o WAV - Audio recording of the lecture.
o RP - RealPix file. Tells Real Player which images (screenshots) to display in the synchronous multimedia transcript, and when.
o media folder: contains Microsoft® PowerPoint® slides captured as JPEGs.
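Because the SMIL file is only a pointer file that ties the other tracks together, it can be generated with a few lines of code. The following Python sketch uses the standard library’s xml.etree.ElementTree to write a minimal SMIL 1.0 file for a lecture whose WAV, RT and RP files share one base name. The build_smil function, the region layout and its dimensions are illustrative assumptions; the SMIL files actually produced by ViaScribe may differ, and RealPlayer’s documentation should be checked for the exact elements it expects for RealText and RealPix tracks.

```python
import xml.etree.ElementTree as ET


def build_smil(base: str) -> str:
    """Return a minimal SMIL 1.0 document that plays base.wav in parallel with
    the base.rt (RealText captions) and base.rp (RealPix slides) tracks."""
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    ET.SubElement(layout, "root-layout", width="640", height="480")
    # Illustrative layout: slides on top, captions underneath.
    ET.SubElement(layout, "region", id="slides", left="0", top="0",
                  width="640", height="380")
    ET.SubElement(layout, "region", id="captions", left="0", top="380",
                  width="640", height="100")
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # <par> plays its children simultaneously
    ET.SubElement(par, "audio", src=f"{base}.wav")
    ET.SubElement(par, "img", src=f"{base}.rp", region="slides")
    ET.SubElement(par, "textstream", src=f"{base}.rt", region="captions")
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    print(build_smil("lecture"))
```

Keeping the synchronization in a separate pointer file means the transcript (RT) can be corrected and re-saved without touching the audio or slide tracks.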














Appendix B - Application Architecture




[Figure: application architecture diagram. Inputs: live or pre-recorded media; steno. ViaScribe components: ASR, real-time display, editing/annotation, training. Output formats: XML, SMIL, SMI, HTML. Output user interfaces: ViaScribe (other labels not recoverable). Integration with Microsoft® PowerPoint® slides.]