Intro to Cloud Computing


Nov 3, 2013 (3 years and 9 months ago)



Source: http://www.free-pictures-photos.com/

Cloud Computing


No longer the next big thing: the current big thing


Began in 2007


IBM and Google “Blue Cloud”


Name cloud inspired by cloud symbol
representing internet in diagrams





What is Cloud Computing?


But what is it?


Everyone has a different opinion on what it is


Is it trendy?


“The computer industry is the only industry that is more fashion-driven than women’s fashion” (Larry Ellison)




Questions to answer


What clouds have you used today (yesterday)?




What is a cloud?

Applications


What does cloud computing actually do?


Consider applications you may currently be
running on laptop, desktop, phone, server


The cloud has them too, or can potentially bring them to you


It brings you applications and lets you view, manipulate, and share data

Cloud Computing


Everyone has an opinion on what to use a
cloud for


Applications on the internet


email, tax prep


Storage for business, personal data


Web services for photos, maps, GPS


Rent a virtual server, load software on it, turn it on/off, clone it if workload demand suddenly rises


Store, secure data for authorized access (really?)


Use a platform including OS, Apache, MySQL,
Python, PHP

Cloud Computing Characteristics


So what are its characteristics?


AKA on-demand computing, pay as you go, software as a service, utility computing


Typically access through the internet


Distributed and highly parallel approach


Usually costs $$$, but cost-effective


Virtualization


Elastic


Replication, replication, replication …





Cloud Components


3 components


Clients


Mobile, thin, thick


Datacenter


Distributed servers

Data Center


Data Center


Collection of servers


In large room in your building, across world


Distributed Servers


Distributed data centers


geographically disparate


Robust if failure


Dynamic datacenter, so capacity can increase as needed




Clouds


Allow access to applications other than on
local computer or internet connected device


Instead, a company hosts your application

Advantages?


No more licenses, service packs, etc.


Less hardware, etc.


Can access anywhere

but



Works only as long as you have an internet connection


Lose control: can’t optimize

Cloud Computing Characteristics


Cost-effective


A start-up company can use a cloud instead of buying computers, hiring IT people, etc.


Elastic Computing


A company with a temporary surge in business can use the cloud instead of investing in new computing equipment

Virtualization


What is virtualization?


Software implementation of a computer that executes
programs like a physical machine


Installation of one machine runs on another


All software in the cloud runs on a server within a virtual machine


AMD Virtualization (AMD-V) and Intel Virtualization Technology (Intel VT) extensions made it practical





Virtualization


Virtual Machine VM


isolated guest OS installation within a normal host OS


Object of deployment


Virtual Machine Image




Static data containing software (OS, apps, data files) the
VM will run once started


Used to create VM instance


Typically stored on disk


Virtual Machine Instance




Running virtual machine


Started from image, runs OS and processes, computes, etc.


Dynamic object you can interact with


Virtualization


Hypervisor


Virtual Machine Manager (VMM)


One level higher than supervisory program


Installed on server hardware


Easily create copies of existing environments


Can exist on same servers or different machines


A single server runs multiple OS instances, minimizing CPU idle time




[Figure: Traditional Stack (Hardware, Operating System, Apps) compared with Virtualized Stack (Hardware, Hypervisor, one OS per virtual machine, Apps)]

Elastic - Cloud Computing Characteristics


Use what you need


Hardware, platform (OS), software


Cloud infrastructure used depends on application


Massive number of servers needed

OR


Only need one server to run small job


Company has a temporary surge in business: use the cloud instead of investing in new computing equipment


Company has a decline in business: no need to maintain unused equipment
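
The "use what you need" idea above can be sketched as a sizing rule. This is a minimal illustration, not any provider's autoscaling algorithm: the function name, the per-server capacity of 500 requests per second, and the sample loads are all assumptions made up for the example.

```python
import math


def servers_needed(requests_per_sec: float, capacity_per_server: float,
                   minimum: int = 1) -> int:
    """Return how many servers to rent for the current load.

    Hypothetical sizing rule: divide the load by what one server can
    handle, round up, and never go below `minimum`.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(minimum, math.ceil(requests_per_sec / capacity_per_server))


# Hypothetical loads: a quiet period, a temporary surge, then a decline.
for load in (80, 4500, 300):
    count = servers_needed(load, capacity_per_server=500)
    print(f"{load} req/s -> {count} server(s)")
```

With rented servers the count can follow the load down again after the surge, which is the point of the last two bullets.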

Cloud Computing Characteristics


Redundancy


Redundancy is the key to the success of clouds


Google approach: cheap components that fail, so replicate all processing and storage
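
Why replication helps with cheap components can be shown with a little arithmetic. The sketch below assumes each replica fails independently with the same probability, which real hardware does not strictly obey; the function name and the sample failure probability are illustrative only.

```python
def availability(failure_prob: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is up.

    Assumes independent failures: all copies are down with
    probability failure_prob ** replicas.
    """
    if not 0.0 <= failure_prob <= 1.0:
        raise ValueError("failure_prob must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - failure_prob ** replicas


# Hypothetical component that is down 10% of the time.
for n in (1, 2, 3):
    print(n, "replica(s):", availability(0.10, n))
```

Under this assumption each extra replica multiplies the chance of total failure by the single-component failure probability.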



What Motivated Cloud Computing

Initial motivation:


Web-scale problems


data intensive

Solutions:


Large data centers

How to access:


Highly interactive Web applications (thin client)

Next Step:


Different models of computing


Data Intensive - How much data?


CERN’s LHC will generate 15 PB a year


Facebook


2.5 PB, growing at 15 TB per day in 2012?



25 TB



1000 times volume of mail delivered by USPS


Sloan Digital Sky Survey


0.5 PB /month in 2015


“all words ever spoken by human beings”



~ 5 EB (1 EB = 10^18 bytes)
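
To get a feel for these volumes, a yearly total can be turned into an average sustained rate. The sketch uses the 15 PB per year figure quoted above for the LHC and assumes decimal units (1 PB = 10^15 bytes) and a 365-day year; the helper name is made up for the example.

```python
PB = 10**15                      # decimal petabyte, in bytes (assumption)
MB = 10**6                       # decimal megabyte, in bytes
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_rate_mb_per_s(bytes_per_year: float) -> float:
    """Average data rate in MB/s for a given yearly volume."""
    return bytes_per_year / SECONDS_PER_YEAR / MB


lhc_rate = average_rate_mb_per_s(15 * PB)
print(f"15 PB/year is about {lhc_rate:.0f} MB/s, around the clock")
```

A sustained rate of several hundred megabytes per second is more than a single ordinary server is built to absorb, which motivates the large data centers on the next slide.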




Solution: Large Data Centers


Although Google is famous for innovating web search, Google’s architecture is as much of a revolution


Instead of a few expensive servers, use many cheap servers ($5000 instead of $100,000)


(1/2M servers in ~ 12 locations)


With thin, wide network


Cloud: robust and self-healing


Uses a lot of power


Need cheaper power solutions


The Result:

Different Computing Model

“Why do it yourself if you can pay someone to do it for you?”

IaaS


Infrastructure as a Service (IaaS), aka Hardware as a Service (HaaS) and utility computing


Why buy machines when you can rent cycles?


Utility computing billing: based on what is used


Provides basic storage and compute capabilities as a service


Servers, storage systems, CPU cycles, switches,
routers, etc.


Ex: Amazon’s EC2
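
"Why buy machines when you can rent cycles?" comes down to a break-even calculation. The sketch below is illustrative only: the hourly rate is a made-up number, not an actual EC2 price, and the purchase price reuses the $5000 cheap-server figure from the data center slide. It ignores power, staff, and depreciation.

```python
def rental_cost(hours_used: float, rate_per_hour: float) -> float:
    """Utility billing: pay only for the hours actually used."""
    return hours_used * rate_per_hour


def break_even_hours(purchase_price: float, rate_per_hour: float) -> float:
    """Hours of use at which renting has cost as much as buying."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return purchase_price / rate_per_hour


RATE = 0.10        # hypothetical dollars per server-hour
PRICE = 5000.0     # purchase price of one cheap server, in dollars

print("100 hours rented:", rental_cost(100, RATE), "dollars")
print("break-even after", break_even_hours(PRICE, RATE), "hours")
```

For a short-lived or bursty workload the rented hours stay far below the break-even point, so renting wins.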


IaaS


Does not provide applications to customers
(SaaS and PaaS do)


Saves cost of purchasing


Infrastructure can be scaled up or down


Multiple tenants can use equipment at the
same time


Device independence: access systems on different hardware


Low barriers to entry, example?


e.g.
Samba



PaaS


Platform as a Service (PaaS) aka cloudware


Supplies all resources needed to build apps and services
without having to download or install software


Provides a computing platform and solution stack


Customer interacts with platform through API


An encapsulated layer of software provided as a service, used to build higher-level services



Ex: Google App Engine





PaaS provides


Ability for development teams across the world to work together


Merging of web services from multiple sources


Cost savings from using built-in security, scalability, and failover


Cost savings from using higher-level programming abstractions

SaaS


Software as a Service (SaaS): web-based applications


Software available on cloud for use


Application hosted as a service to customers who
access via the internet


A single instance runs and serves multiple end users


Ex: salesforce.com, Gmail


SaaS


Pros/Cons


Customer doesn’t have to maintain or support SW


Out of the customer’s hands when the hosting service changes it


Use software out of the box


Instead of paying for it just once, billed on an ongoing basis


Don’t have to pay as much up front; cheaper and more reliable


Security (SSL used); don’t need VPNs (virtual private networks) on the back end



Benefits to SaaS


Everyone knows WWW, little training needed


Smaller IT staff needed


Easier to customize


Better marketing by providers, accommodate more


Security (SSL used); don’t need VPNs (virtual private networks) on the back end


But:


Specific computational needs may not be addressed: may have to buy your own


Lock-in: can’t move to a new vendor without penalty


Future of SaaS


Move all processing power to the cloud and
carry ultralight input device


Already happening?


E-mail


Google Docs


Implications for Microsoft, software as a purchasable local application


Windows Live (Microsoft’s cloud)


Adobe web-based Photoshop

IaaS, PaaS, SaaS

When not to use a Cloud


Legislative Issues


Laws and policy allow freer access to data on a cloud
than private server


FBI can access data without warrant or owner’s consent


Geopolitical concerns


If in Canada, cannot store data on U.S. cloud


Why?


(because of the Patriot Act…)


What about storing your data on clouds outside of
USA?

Types of Clouds


Public, Private, Hybrid Clouds


Names do not necessarily dictate location


Type may depend on whether temporary or
permanent

Databases in Cloud Environments


Based on:

Md. Ashfakul Islam

Department of Computer Science

The University of Alabama


Issues to Consider


Distributed or Centralized application?


How can ACID guarantees be maintained?


CAP theorem


Consistency, Availability, Partition tolerance


Data availability (even under a network partition) is achieved by compromising consistency


Traditional consistency techniques become obsolete


Consistency becomes bottleneck of data
management deployment in cloud


Costly to maintain

Analytical DBs - Data Warehousing


Data Warehousing (DW) - Popular application of Hadoop


Typically DW is relational (OLAP), but can also hold semi-structured or unstructured data


Can also be parallel DBs (Teradata)


column oriented


Can be expensive, e.g. TBs of data


Hadoop for DW


Facebook abandoned Oracle for Hadoop (Hive)


Also Pig, for semi-structured data

Evaluation of Analytical DB


An analytical DB handles historical data with few or no updates, so it does not need ACID properties.


Elasticity


Easier without ACID


E.g. no updates, so locking is not needed


A number of commercial products support elasticity.


Security


requirement of sensitive and detailed data



third party vendor store data



potential risk of data leakage and privacy violation


Replication


Recent snapshot of DB serves purpose.



Strong consistency isn’t required.

Transactional Data Management


Needed because:


Transactional Data Management



heart of database industry



almost all financial transactions are conducted through it



rely on ACID guarantees


ACID properties are main challenge in
transactional DM deployment in Cloud.

Relational Joins


Hadoop is not a DB


Debate between parallel DBs and MapReduce (MR) for OLAP


DeWitt and Stonebraker call MR a “step backwards”


Parallel DBs are faster because they can create indexes


Consistency in Clouds


A consistent database must remain consistent after the execution of successful operations.


Inconsistency may cause problems


Consistency is often sacrificed to achieve availability and scalability.


Strong consistency maintenance in the cloud is very costly.
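
The trade-off above can be made concrete with a toy model of eventual consistency. This is a single-process sketch, not any real cloud database: the class and method names are invented, a write is acknowledged after reaching one replica, and the other replicas only catch up when `sync()` runs.

```python
class EventuallyConsistentStore:
    """Toy replicated key-value store with delayed propagation."""

    def __init__(self, replica_count: int) -> None:
        self.replicas = [dict() for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value) -> None:
        # The write is applied to replica 0 at once (stays available) ...
        self.replicas[0][key] = value
        # ... and only queued for the remaining replicas.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica: int):
        # A read may hit a replica that has not seen the write yet.
        return self.replicas[replica].get(key)

    def sync(self) -> None:
        # Background propagation: afterwards all replicas agree.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("balance", 100)
print("stale read before sync:", store.read("balance", replica=2))
store.sync()
print("read after sync:", store.read("balance", replica=2))
```

Strong consistency would require the write to wait until every replica has applied it, which is the cost the slide refers to.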

DBs in the Cloud


Slow start for DBs


why??


Considered: Scalable Transactions for Web Applications in the Cloud


Two important properties of Web applications


all transactions are short-lived


a data request can be answered with a small set of well-identified data items


Eventual consistency acceptable

Cloud Provider DB Options

Windows Azure

Data Management


Can run SQL Server or another DBMS in a VM
created with Azure Virtual Machines


Free to run NoSQL technologies such as
MongoDB and Cassandra


Running your own database system is straightforward, but it also requires handling the administration of that DBMS

Data Management Options


Figure 3: For data management, Windows Azure provides
relational storage, scalable NoSQL tables, and unstructured
binary storage.

Data Management Options


Each of the three options addresses a different need:


relational storage


fast access to potentially large amounts of simple typed data


unstructured binary storage.


In all cases, data is automatically replicated across three different computers in an Azure datacenter


All three options can be accessed either by Windows Azure applications or by applications running elsewhere, such as an on-premises datacenter, a laptop, or a phone.


Relational Storage


SQL Database


Provides all of the key features of a relational database management system, including atomic transactions, concurrent data access by multiple users with data integrity, ANSI SQL queries, and a familiar programming model.


If you know SQL Server, using SQL Database is straightforward.


Can be accessed using Entity Framework, ADO.NET, JDBC


SQL Database


But SQL Database isn't just a DBMS in the cloud: it's a PaaS service.


You control your data and who can access it, and SQL Database takes care of the administrative grunt work, such as managing the hardware infrastructure and automatically keeping the database and operating system software up to date.


SQL Database provides a federation option that distributes data across multiple servers.


Spreads data access requests across multiple servers for better performance.
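
Distributing data across multiple servers means every key has to map to one server. The sketch below shows one common way to do that, hashing the key. It is only an illustration of the general idea: the slides do not say how the federation option assigns rows, so the hash-based rule, the function name, and the customer keys are all assumptions.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number in a stable, repeatable way.

    Uses SHA-256 rather than Python's built-in hash(), whose value
    for strings changes between interpreter runs.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical customer keys spread over 4 database servers.
for customer in ("customer-1", "customer-2", "customer-3"):
    print(customer, "-> server", shard_for(customer, num_shards=4))
```

Requests for different keys then land on different servers, which is where the performance gain comes from.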

Tables


For an application that needs fast access to lots of typed data but doesn't need to perform complex SQL queries


For storing data and retrieving it in simple ways


NOT relational


Very scalable: a single table can hold as much as a terabyte of data

Blobs


Designed to store unstructured binary data.


Like Tables, Blobs provide inexpensive storage


Single blob can be as large as one terabyte


Application sees ordinary Windows files, but
the contents are stored in a blob

Amazon

Amazon


Simple Storage Service (S3)


Low-level put/get interface


Store items up to 5GB


AWS MySQL


traditional model (non-cloud) on EC2


AWS MySQL/R


durability of the data guaranteed by the replication architecture


Application server maintains a connection to the Master copy and a connection to one DB server


Update transactions handled by Master


Read-only transactions issued to the DB server associated with the application server
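
The routing rule in the bullets above (updates to the Master, reads to the application server's own DB server) can be sketched in a few lines. This is a toy in-memory model, not MySQL replication: the class names are invented, and propagation to the replicas is done synchronously inside `update()` as a simplification.

```python
class ReplicatedDatabase:
    """Toy master copy plus read replicas, all plain dictionaries."""

    def __init__(self, replica_count: int) -> None:
        self.master = {}
        self.replicas = [dict() for _ in range(replica_count)]

    def update(self, key, value) -> None:
        # Every update transaction goes through the master ...
        self.master[key] = value
        # ... and is then copied to each replica (simplified: immediately).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica_index: int):
        return self.replicas[replica_index].get(key)


class AppServer:
    """Application server bound to the master and to one replica."""

    def __init__(self, db: ReplicatedDatabase, replica_index: int) -> None:
        self.db = db
        self.replica_index = replica_index

    def write(self, key, value) -> None:
        self.db.update(key, value)            # routed to the master

    def read(self, key):
        return self.db.read(key, self.replica_index)  # routed to own replica


db = ReplicatedDatabase(replica_count=2)
server_a, server_b = AppServer(db, 0), AppServer(db, 1)
server_a.write("order:1", "paid")
print(server_b.read("order:1"))
```

Read load is spread over the replicas while the master remains the single place where updates are ordered.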



Amazon


AWS RDS


relational database service, implemented the same way as AWS MySQL


RDS is pre-packaged, so users don’t have to worry about managing deployment of VMs, SW upgrades, etc.


AWS SimpleDB


retrieve records based on key values or
ranges on primary and secondary keys


Does not synchronize concurrent read/write access to
different copies of same data


Web service for running queries on structured data


Eventual data consistency is maintained


Does not support SQL


Works with S3 and EC2 to store, process, query



Google

Google - App Engine (Megastore)


Google has PaaS strategy


App Engine uses the data engine Megastore


Scalable structured data store


Built on BigTable


Partitioned into space of small DBs, each with own log


Log stored across Paxos cluster (Paxos: protocol for solving consensus in an unreliable network)


full ACID semantics within partitions


Adopted a combined Partitioning and Replication architecture


Lower consistency across partitions


3B write, 20B read transactions per day as of 1/11


Tables can be arranged hierarchically


Support for secondary indexes

Google - App Engine (Megastore)


3 levels of read consistency


Current: last committed value


Snapshot: value as of start of read transaction


Inconsistent reads: used for cross entity group reads


Updates within entity group


Write updates to the WAL (write-ahead log) of the entity group, then apply them to the data


Limited by log contention: one winner, one loser


Paxos accepts a limited update rate (10**2 per sec)


Across entity groups


2PC (two-phase commit)


Support for Backup and recovery


Synchronous replication, snapshots and incremental log backups
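
The "one winner, one loser" behavior of log contention can be illustrated with an optimistic append. This is a single-process stand-in, not Megastore's Paxos-based protocol: the class and method names are invented, and the only rule modeled is that a writer must name the log position it expects to fill.

```python
class EntityGroupLog:
    """Toy write-ahead log for one entity group."""

    def __init__(self) -> None:
        self.entries = []

    def next_position(self) -> int:
        # Writers read this first, then prepare their mutation.
        return len(self.entries)

    def try_append(self, position: int, mutation) -> bool:
        # Only the first writer to claim a position succeeds; a writer
        # holding a stale position loses and must retry.
        if position != len(self.entries):
            return False
        self.entries.append(mutation)
        return True


log = EntityGroupLog()
position = log.next_position()           # both writers see the same slot
print("writer A:", log.try_append(position, "set x=1"))
print("writer B:", log.try_append(position, "set x=2"))
# The loser re-reads the position and tries again.
print("writer B retry:", log.try_append(log.next_position(), "set x=2"))
```

Because every update to an entity group competes for the same next log slot, the update rate per group is limited, as the slide notes.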




Google - App Engine


AppEngine supports Python, Java with embedded SQL


Used to support simplified SQL dialect, GQL


GQL: no aggregate functions or joins




Service              AWS MySQL        AWS MySQL/R      AWS RDS          AWS SimpleDB          AWS S3                Google AppEng       MS Azure
Business Model       IaaS             IaaS             PaaS             PaaS                  IaaS                  PaaS                PaaS
Cloud Provider       Flexible         Flexible         Amazon           Amazon                Flexible              Google              Microsoft
Web/app server       Tomcat           Tomcat           Tomcat           Tomcat                Tomcat                AppEngine           .Net Azure
Database             MySQL            MySQL Rep        MySQL            SimpleDB              none                  DataStore           SQL Azure
Storage / File Sys.  EBS              EC2 & EBS        -                -                     S3                    GFS                 Windows Azure
Consistency          Repeatable Read  Repeatable Read  Repeatable Read  Eventual Consistency  Eventual Consistency  Snapshot Isolation  Snapshot Isolation
App-Language         Java             Java             Java             Java                  Java                  Java/AppEngine      C#
DB-Language          SQL              SQL              SQL              SimpleDB Queries      low-level API         GQL                 SQL
Architecture         Classic          Replication      Classic          Part.+Repl.           Distr. Control        Part.+Repl.(+C)     Replication
HW Config.           manual           manual           manual           manual/automatic      manual                automatic           manual/automatic

Table 1: Overview of Cloud Services














Link to paper by Kossmann

Cloud SQL


Google Cloud SQL


Available


One of App Engine’s most requested features




Simple way to develop traditional DB driven
applications


Quicker path to jump off App Engine platform


DB import/export so can move existing MySQL DBs to
cloud


Support for both Java JDBC and Python DB-API connections, so less code change is required


No support for PHP on App Engine, but PHP apps can be put in the cloud using Quercus
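
The Python DB-API mentioned above is a standard connection and cursor interface shared by many database drivers, which is why existing code needs little change. The sketch below shows that interface using `sqlite3` from the standard library as a stand-in. It does not connect to Cloud SQL: a real deployment would use a MySQL driver and connection details that are not given in these slides, and such drivers typically use a different parameter placeholder than the `?` used by `sqlite3`. The table and column names are made up.

```python
import sqlite3


def create_and_query(conn):
    """Run a few statements through the DB-API cursor interface."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE guestbook ("
        "id INTEGER PRIMARY KEY, author TEXT, message TEXT)"
    )
    # Parameterized inserts: values are passed separately from the SQL.
    cur.executemany(
        "INSERT INTO guestbook (author, message) VALUES (?, ?)",
        [("ann", "hello"), ("bob", "hi")],
    )
    conn.commit()
    cur.execute("SELECT author, message FROM guestbook ORDER BY id")
    return cur.fetchall()


conn = sqlite3.connect(":memory:")   # stand-in for a cloud database
rows = create_and_query(conn)
conn.close()
print(rows)
```

Only the `connect()` call and the placeholder style are driver-specific; the cursor, execute, commit, and fetch calls follow the same pattern across DB-API drivers.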

Google - Spanner


Previous complaints


no cross row transactions


2PC too expensive to support because of performance or availability
problems


What is a Spanner?


A huge semi-relational database


Built on top of Colossus (GFS2)


Seriously, it's huge!


Scales up to millions of machines


Shards across multiple data centers



Data centers across multiple continents


Lock-free reads


Externally consistent writes (transactions)


Relational schema


SQL-like query language


Reasonable performance

Google - Spanner


A Layered System


Relational


Key-Value


Paxos, TrueTime, Colossus



Google says that the biggest new idea is the TrueTime API


Google - Spanner


A table must have a primary key (ordered set of columns)


A table must be marked as a directory or be interleaved in a parent table


Interleaved data is actually attached to a row in the parent table


Data is actually stored as key-values (heterogeneous/interleaved)


ON DELETE CASCADE means the interleaved rows are deleted when the parent row is deleted
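
Interleaving can be pictured as child rows whose keys start with the parent's key, so that in key order they sit directly after the parent row. The sketch below models that with tuples in a dictionary. It is an illustration of the idea only, not Spanner's storage format: the class, the `users` and `albums` names, and the sample rows are all hypothetical.

```python
class InterleavedStore:
    """Toy key-value store where child keys extend the parent key."""

    def __init__(self) -> None:
        self.rows = {}

    def put(self, key: tuple, value) -> None:
        self.rows[key] = value

    def scan(self) -> list:
        # Sorted by key: each parent is followed by its interleaved rows.
        return sorted(self.rows.items())

    def delete_cascade(self, parent_key: tuple) -> None:
        # ON DELETE CASCADE: remove the parent and every row whose key
        # begins with the parent's key.
        doomed = [k for k in self.rows if k[:len(parent_key)] == parent_key]
        for key in doomed:
            del self.rows[key]


store = InterleavedStore()
store.put(("users", 1), {"name": "ann"})
store.put(("users", 2), {"name": "bob"})
store.put(("users", 1, "albums", 10), {"title": "holiday"})
store.put(("users", 1, "albums", 11), {"title": "work"})

for key, value in store.scan():
    print(key, value)

store.delete_cascade(("users", 1))
print("after cascade:", [key for key, _ in store.scan()])
```

Keeping a parent and its children adjacent in key order is what lets related rows be stored and moved together.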

Google - Spanner


Lock-free Read


Lock-free reads using timestamps


Read Transaction System uses latest non-blocking timestamp


Special non-blocking write transaction
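
Reading by timestamp needs no locks because each write adds a new version instead of overwriting the old one, so a reader simply picks the newest version at or before its timestamp. The sketch below shows that multi-version idea with plain integer timestamps. It is not Spanner's implementation and does not model TrueTime; the class and method names are invented for the example.

```python
import bisect


class VersionedStore:
    """Toy multi-version store: every write keeps its timestamp."""

    def __init__(self) -> None:
        self.versions = {}  # key -> list of (timestamp, value), ascending

    def write(self, key, value, timestamp: int) -> None:
        history = self.versions.setdefault(key, [])
        if history and timestamp <= history[-1][0]:
            raise ValueError("timestamps must increase for each key")
        history.append((timestamp, value))

    def read_at(self, key, timestamp: int):
        """Return the value visible at `timestamp`, or None if none exists."""
        history = self.versions.get(key, [])
        stamps = [ts for ts, _ in history]
        # Index of the first version strictly newer than the read timestamp.
        index = bisect.bisect_right(stamps, timestamp)
        return history[index - 1][1] if index else None


store = VersionedStore()
store.write("x", "old", timestamp=10)
store.write("x", "new", timestamp=20)
print(store.read_at("x", 15))   # sees the version written at 10
print(store.read_at("x", 25))   # sees the version written at 20
```

A reader at timestamp 15 is unaffected by the later write at 20, so readers and writers never have to wait for each other in this model.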