Large dataset processing in the Cloud - Cyfronet


31 Jan 2013


Large dataset processing in the Cloud

Kevin Glenny and GridwiseTech team

Simplified data-oriented system

Internal or external data sources

Applications working on data

IT systems are constantly growing

Increased number of users

Increased number of applications

Increased amount of data

IT systems are constantly growing

Infrastructure bottleneck

Example

Electronics manufacturer

24/7 production

Report computation takes too long to support decision making

2.5 million transactions daily

4TB data to manage

What is Cloud computing?

"Transparent access to capabilities using a pay-per-use business model"

Benefits:


Dynamic scaling

Pay-for-use

Off-shored administration


What are the delivery models?

SaaS (Software as a Service)

SalesForce.com, 63,00 clients

PaaS (Platform as a Service)

Google App Engine (2008), Microsoft Azure (2008)

IaaS (Infrastructure as a Service)

Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006

Application data processing

Database sharding (MySQL, PostgreSQL etc.)

NoSQL (Google's BigTable, Amazon's Dynamo etc.)

Data-grid (GigaSpaces XAP, Oracle Coherence, Infinispan etc.)
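
As a rough illustration of the database sharding idea listed above, the following Python sketch routes each record key to one of several shards using a stable hash. The names `shard_for` and `ShardedStore` are hypothetical, and plain in-memory dictionaries stand in for separate MySQL or PostgreSQL instances; none of this is taken from the slides.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping
    stays the same across processes and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database shard (assumption)."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Every write for a given key lands on the same shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Reads consult only the shard that owns the key.
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(1000):
        store.put(f"transaction-{i}", {"id": i})
    # Show how the records were spread across the shards.
    print([len(shard) for shard in store.shards])
```

Simple modulo routing like this forces most keys to move when the number of shards changes, which is one reason systems that need dynamic scaling often use consistent hashing or a lookup directory instead.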


Data-grid and sharding in the Cloud

All data processing and persistence in the Cloud

Achievements:


Near real-time

Dynamic scaling (application and resources)

Pay-per-use

Reduced administration

HA (high availability)


Remaining issues

Getting large datasets in and out of the Cloud

Bandwidth is limited on the client side

Resort to mailing hard drives!

Performance - 2 to 50% slowdown

Data security/privacy - trust

SLAs - plan for the worst
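
To put the bandwidth issue above in perspective, the sketch below estimates how long a dataset takes to upload over a client-side link. The 4 TB figure comes from the earlier example slide; the link speeds, the `efficiency` parameter, and the function name `transfer_days` are assumptions chosen for illustration, and 1 TB is taken as 10^12 bytes.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Estimate upload time in days.

    efficiency is the assumed fraction of nominal link speed actually
    achieved (protocol overhead, contention); 1.0 is the ideal case.
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = dataset_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    dataset = 4e12  # 4 TB, as in the electronics manufacturer example
    # Hypothetical uplink speeds in Mbit/s.
    for mbit in (10, 100, 1000):
        days = transfer_days(dataset, mbit * 1e6)
        print(f"{mbit:>5} Mbit/s: {days:.1f} days")
```

Under these assumptions, an ideal 100 Mbit/s uplink needs several days for 4 TB and a 10 Mbit/s uplink needs over a month, which is why shipping physical drives can be competitive.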

Conclusions

Datasets in data-oriented systems grow, causing bottlenecks

Datasets in the Cloud can be processed using scalable technologies

Challenges remain

Main challenge: how to get the data to the Cloud?