Into the Blue: Streaming Data Processing in the Cloud

Abstract:

Christopher Alme, Christopher Nunu, Dennis Qian, Stanley Roberts and Stephen Wong

Rice University

(swong@rice.edu)


Most cloud data processing utilizes a “batch-mode” where processing requests are submitted and results returned. The elasticity of the cloud is utilized to scale the system to handle larger or smaller numbers of processing requests. Very few cloud applications are designed to handle continuous streams of data [See, for instance, refs. 1, 2, 3] and none can handle multiple, independent data inputs with multiple independent outputs. Existing systems tend to be based on MapReduce/Hadoop, require custom programming for each application and are difficult to modify, reconfigure, extend or re-use in new situations.
To address these issues, we will present the preliminary implementation of an Azure cloud application that features multiple simultaneous, independent, real-time input and output endpoints. The system is also designed to use drop-in processing modules that the user assembles into a processing graph to perform the desired operations. A processing graph may have multiple input endpoints of differing types to simultaneously gather information from a wide variety of sources. The user assembles the graph by selecting a module from a library and specifying the desired input and output connections. Process allocation, connection mechanics and data synchronization are handled transparently. Multiple independent output endpoints are also supported, enabling the user to simultaneously extract different processing results from a single processing graph. The user can modify, reconfigure and extend the graph without re-deploying or even stopping the system.
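
The graph assembly described above can be illustrated with a small in-process sketch. This is not the authors' Azure implementation: the class `ProcessingGraph`, the `MODULE_LIBRARY` dictionary, and every node name are hypothetical, and the real system's process allocation and data synchronization are replaced here by plain function calls. The sketch only shows the shape of the idea: modules picked from a library, several input and output endpoints, and a graph that is extended while it is in use.

```python
from typing import Callable, Dict, List

# Hypothetical module library: each entry is a drop-in processing step.
MODULE_LIBRARY: Dict[str, Callable[[int], int]] = {
    "add_ten": lambda x: x + 10,
    "double": lambda x: x * 2,
}


class ProcessingGraph:
    """Toy stand-in for a processing graph (assumed to be acyclic)."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[[int], int]] = {}
        self.edges: Dict[str, List[str]] = {}    # node -> downstream nodes
        self.outputs: Dict[str, List[int]] = {}  # output endpoint -> results

    def add_module(self, name: str, library_key: str) -> None:
        # Select a module from the library and give it a node name.
        self.modules[name] = MODULE_LIBRARY[library_key]

    def add_output(self, name: str) -> None:
        self.outputs[name] = []

    def connect(self, source: str, target: str) -> None:
        self.edges.setdefault(source, []).append(target)

    def push(self, node: str, value: int) -> None:
        """Deliver one value to `node` and propagate it downstream."""
        if node in self.outputs:
            self.outputs[node].append(value)
            return
        if node in self.modules:
            value = self.modules[node](value)
        for target in self.edges.get(node, []):
            self.push(target, value)


graph = ProcessingGraph()
graph.add_module("offset", "add_ten")
graph.add_output("offset_out")
graph.add_output("raw_out")
graph.connect("sensor_a", "offset")    # two independent input endpoints
graph.connect("sensor_b", "offset")
graph.connect("offset", "offset_out")
graph.connect("sensor_a", "raw_out")   # a second, independent output endpoint

graph.push("sensor_a", 20)
graph.push("sensor_b", 25)

# Extend the graph while it is in use: no rebuild, earlier results are kept.
graph.add_module("doubler", "double")
graph.add_output("doubled_out")
graph.connect("sensor_b", "doubler")
graph.connect("doubler", "doubled_out")
graph.push("sensor_b", 3)

print(graph.outputs)
```

In this toy version, inputs are simply node names that values are pushed into, so adding a new source needs no registration step; a distributed system would presumably need one.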

System configurations are stored in the cloud and cloud-enabled fault tolerance is supported.
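
The abstract does not say how configurations are stored, so the following is only one plausible sketch. It assumes a graph description is a mapping of node names to library keys plus a list of connections, serializes it to JSON, and uses a plain dictionary (`blob_store`, hypothetical) as a stand-in for cloud storage. The function names `dump_config` and `load_config` are also illustrative.

```python
import json
from typing import Dict, List, Tuple

Connection = Tuple[str, str]


def dump_config(modules: Dict[str, str], connections: List[Connection]) -> str:
    """Serialize a graph description to a JSON string."""
    payload = {
        "modules": modules,
        "connections": [list(pair) for pair in connections],
    }
    return json.dumps(payload, sort_keys=True)


def load_config(text: str) -> Tuple[Dict[str, str], List[Connection]]:
    """Rebuild the graph description from its JSON form."""
    data = json.loads(text)
    connections = [(src, dst) for src, dst in data["connections"]]
    return data["modules"], connections


# Hypothetical stand-in for a cloud blob store: blob name -> contents.
blob_store: Dict[str, str] = {}
blob_store["graphs/demo.json"] = dump_config(
    {"offset": "add_ten"},
    [("sensor_a", "offset"), ("offset", "offset_out")],
)

# After a failure, a fresh worker could restore the same description.
modules, connections = load_config(blob_store["graphs/demo.json"])
print(modules, connections)
```

Keeping the description as data rather than code is what would let a replacement worker rebuild the graph without redeployment, under the assumptions above.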

Applications for this technology include environmental sensor monitoring, real-time aircraft/vehicle tracking, highway monitoring and process monitoring.

1 STREAM project (http://www.streamproject.eu): query-based real-time processing

2 HStreaming (http://hstreaming.com): Hadoop in the cloud

3 Logothetis and Yocum, “Ad-hoc Data Processing in the Cloud” (Proc. VLDB Endowment, 1:2, pp. 1472-1475, 2008, http://cseweb.ucsd.edu/~kyocum/pubs/mortar_vldb08.pdf): MapReduce-based system


Biographies:

Christopher Alme, Christopher Nunu, Dennis Qian and Stanley Roberts are the Fall 2010 COMP410 team at Rice University. COMP410 is an undergraduate software engineering course that utilizes a “discovery mode” pedagogical style where the class works as a team to design and implement a cutting-edge enterprise-class software project.

Alme, Nunu, Qian and Roberts are all CS majors headed off to the Azure Cloud and Identity groups at Microsoft, Mobile Development at Google, and the OpenWorks team at Halliburton, respectively.

Stephen Wong was originally trained in physics (Ph.D. in semiconductor physics and Howard Hughes Fellow, M.I.T., 1998), including a year at Bell Labs with future Nobel laureate and Energy Secretary, Steven Chu. After a stint at Hughes Research Laboratories, in 1993 he switched to academia and software consulting for Eastman Kodak. Since 1998, he has been completely devoted to CS teaching and research, the last 10 years at Rice University.