Cloud Computing and Open Source


Dec 4, 2013



With the advent of Web 2.0 and Software as a Service, cloud computing has come into vogue. Cloud computing
has become synonymous with providing services anywhere, anytime, with the basic requirement being access to
the internet. As a new model, cloud computing promises to make any online service available without a large
upfront investment in infrastructure. The economics of running a full infrastructure change dramatically
because you pay only for what you use, and you can provision only the capacity you require at any given time.
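
As a rough illustration of the pay-for-what-you-use argument, the sketch below compares keeping peak capacity
running around the clock with paying only for the servers each hour needs. The hourly rate and the demand
profile are made-up numbers chosen for the example, not real prices or measurements from any provider.

```python
# Hypothetical figures for illustration only; not real provider pricing.
HOURLY_RATE_CENTS = 10  # assumed cost of one server for one hour, in cents

# Assumed demand profile: servers needed during each hour of one day.
# 18 quiet hours need 2 servers, 6 busy hours need 10.
hourly_demand = [2] * 18 + [10] * 6


def fixed_capacity_cost(demand, rate):
    """Cost of keeping enough servers for the peak hour running all day."""
    return max(demand) * len(demand) * rate


def on_demand_cost(demand, rate):
    """Cost of paying only for the servers each hour actually uses."""
    return sum(demand) * rate


fixed = fixed_capacity_cost(hourly_demand, HOURLY_RATE_CENTS)
elastic = on_demand_cost(hourly_demand, HOURLY_RATE_CENTS)
print(f"fixed: ${fixed / 100:.2f}  on demand: ${elastic / 100:.2f}")
```

Under these assumed numbers the on-demand bill is well under half of the fixed one. The gap depends entirely on
how uneven the demand is; a perfectly flat load would cost the same either way.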

Many companies that have supported their own needs for large-scale, distributed computing are beginning to
export their technologies and services to support others. Companies with a lot of experience in large-scale
computing using open source software, such as Amazon and Google, have developed offerings in the cloud
computing space. Other players are beginning to make an impact as well, including Hadoop, IBM Blue Cloud
based on Hadoop, 10gen, and Eucalyptus.

What is Cloud Computing?

Cloud computing blends Internet-based IT infrastructure together with the applications and services that can
be delivered over its resources.

Cloud computing consists of computing, networking, and storage resources used to power services, as well as
combinations or mashups of services, that previously had been expensive, impractical, and even impossible to
provide.

There are three types of cloud computing options available today:

Virtual infrastructure provisioning

Application development and delivery

Building your own cloud from scratch, using your own storage, processing, and networking resources

Infrastructure provisioning is the most flexible option because it provides pure computing resources such as CPU,
bandwidth, and storage. A good example of such a service is Amazon's Elastic Compute Cloud (EC2). The user has
complete control of these resources and what they do with them.

Application development and delivery, the second option, is a little less flexible for the user but is much less
complex to set up and start using right away. A good example of this kind of cloud computing service is Google's
App Engine (GAE), which provides CPUs, limited bandwidth, and limited storage along with a predefined web
application framework. The user doesn't have any control over the security and physical infrastructure but can
run their web applications while the service provider takes care of scaling, performance, and management of the
infrastructure.

The third option involves building and managing your own cloud using open source software and tools such as
Hadoop. You have complete control over what you provision, but you must supply the knowledge and skills
required to optimize your resources yourself.

Open source is important in all aspects of cloud computing. It is used to build the core of the "cloud" and its
services. Linux is the operating system of choice for both physical and virtual machines in the cloud.
Furthermore, open APIs and open source toolkits are available to interface and interact with cloud computing at
all levels. Python, PHP, Ruby, and Java APIs provide web applications with access to the management services
needed to control your resources in the cloud.

Building Your Own Virtual Infrastructure: Amazon Elastic Compute Cloud

Amazon Elastic Compute Cloud (EC2) provides infrastructure and compute resources for web applications. EC2
functionality is accessible through web service interfaces, allowing you to configure and monitor your computing
resources and to provision capacity almost instantaneously. EC2 is built to fully support open source software
and web applications. EC2 is based on Amazon's Xen-enabled Linux kernel, and any operating system that can run
on top of Xen is supported.

EC2 users can load custom application images on as many virtual systems as needed. Security and network access
are set up and configured by the user as needed for their application. EC2 provides APIs for programmatic control
of configuration and provisioning of resources via REST and SOAP protocols. Advanced users can create their own
Amazon Machine Images (AMIs), which package custom environments appropriate for the application to be deployed.
This is especially useful for developers during integration and testing. EC2 is complemented by other Amazon
services such as the Simple Queue Service (SQS), Simple Storage Service (S3), and SimpleDB. All these services
are fully usable by open source software because they implement standard open interfaces.

There are four basic steps to create applications on EC2. First, you create an Amazon Machine Image (AMI), which
packages the operating system, configuration settings, libraries, and applications into one image: everything you
need to boot instances of your application. The AMI can be selected from a library of existing public AMIs or it
can be created from scratch. Second, you upload your image for storage in the Amazon S3 (Amazon Simple Storage
Service) service. Third, you register your AMI with Amazon EC2. Finally, you are ready to use the Amazon EC2 web
service APIs to start, stop, or monitor one or more instances of this AMI.
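
The registration and instance-control steps above map onto EC2's query-style web service interface. The sketch
below only builds the request URLs for the `RegisterImage`, `RunInstances`, `DescribeInstances`, and
`TerminateInstances` actions; it sends nothing. The endpoint, API version string, bucket and manifest names, and
the image and instance IDs are placeholders assumed for illustration. Real requests must also be signed with
account credentials, which is deliberately left out here, so these URLs would be rejected by the service as
written.

```python
from urllib.parse import urlencode

# Assumed values for illustration; adjust for a real account and region.
EC2_ENDPOINT = "https://ec2.amazonaws.com/"
API_VERSION = "2016-11-15"  # assumption: any API version the service accepts


def build_query(action, **params):
    """Return an unsigned EC2 query URL for the given action and parameters."""
    query = {"Action": action, "Version": API_VERSION}
    query.update(params)
    return EC2_ENDPOINT + "?" + urlencode(sorted(query.items()))


# Register an image whose manifest was previously uploaded to S3.
# The bucket and manifest names are hypothetical.
register_url = build_query(
    "RegisterImage", ImageLocation="my-bucket/my-image.manifest.xml"
)

# Start, monitor, and shut down instances of the registered AMI.
# "ami-12345678" stands in for the image ID that registration returns,
# and "i-12345678" for an instance ID returned when instances start.
run_url = build_query("RunInstances", ImageId="ami-12345678", MinCount=1, MaxCount=2)
describe_url = build_query("DescribeInstances")
terminate_url = build_query("TerminateInstances", **{"InstanceId.1": "i-12345678"})

for url in (register_url, run_url, describe_url, terminate_url):
    print(url)
```

In practice most developers would use one of the language toolkits mentioned earlier, which handle request
signing and response parsing, rather than assembling URLs by hand.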

Focusing on Your Application: Google App Engine

Google's App Engine provides a powerful tool for open source developers to build web applications based on
Python. App Engine restricts applications to a secure sandbox with limited access to the underlying operating
system.

App Engine can automatically ramp up compute resources within predefined quotas to handle spikes in traffic. Each
App Engine user account can run up to three applications with 500MB of persistent storage and enough CPU
horsepower and network bandwidth to support about five million page views a month.
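
To put the page-view figure in perspective, the short calculation below converts it to an average request rate.
The only assumption is a 30-day month.

```python
PAGE_VIEWS_PER_MONTH = 5_000_000          # figure quoted in the text
SECONDS_PER_MONTH = 30 * 24 * 60 * 60     # assumes a 30-day month

average_rps = PAGE_VIEWS_PER_MONTH / SECONDS_PER_MONTH
print(f"average load: {average_rps:.2f} page views per second")
```

That works out to roughly two page views per second on average. Real traffic is bursty, so peak rates would be
higher than this average suggests.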

App Engine's Python runtime environment provides API access to an object database, Google Accounts
infrastructure, outbound HTTP requests (URL fetch API), and email services. Developers can also take advantage of
frameworks such as webapp and Django to quickly build web applications running on App Engine.
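
Frameworks such as webapp and Django are built around WSGI, Python's standard web application interface. The
sketch below is a plain WSGI request handler that uses only the standard library. It is not App Engine's own API
and does not touch the datastore or other services; it is meant only to show the general shape of the handler
code such frameworks wrap. The helper `call_app` is a hypothetical test harness invented for this example.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: answer every request with a plain-text greeting."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path):
    """Invoke a WSGI app in-process, without starting a server."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/guestbook")
print(status, body.decode("utf-8").strip())
```

Calling the handler in-process like this is a convenient way to exercise request logic before running it under a
local development server.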

To create an application for App Engine, you first need to download the App Engine software development kit
(SDK). The SDK provides a web server environment that emulates all of the App Engine services locally on your
computer and enforces the restrictions placed on your application by App Engine's secure sandbox. Next, you
create the application code as well as the configuration files and static files necessary for your application.
Using the upload tool included in the SDK, you log in with your Google account to upload your application. You
can manage your application, browse the datastore, and view log files using the web-based Administration Console
provided by App Engine.

Virtualizing Your Own Infrastructure: Open Source Cloud Computing Projects

Several open source initiatives are exploring different implementations of cloud computing. Hadoop, Eucalyptus,
and 10gen look the most promising.


Hadoop is an open source Java software framework for running data-intensive distributed applications on large
clusters of commodity computers. Hadoop was inspired by Google's MapReduce and the Google File System. IBM's
cloud computing project, Blue Cloud, uses Hadoop technologies. The framework's major components include a
distributed file system (HDFS) and the map/reduce engine (JobTracker, TaskTracker). Hadoop is optimized for
highly parallel, data-intensive batch operations.
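
The map/reduce model itself can be illustrated in a few lines of ordinary Python. The word count below is a
single-process sketch of the programming model (map, group by key, reduce), not Hadoop's API; the function names
and sample input are invented for the example. A real Hadoop job would run the same phases in parallel across
the cluster and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


lines = ["the cloud is open", "open source powers the cloud"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over
many machines, which is what makes the model a good fit for batch work on large data sets.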

Hadoop is a top-level Apache Software Foundation project supported by Yahoo. HP, Intel, and Yahoo recently
announced a project that will create a global cloud computing research testbed leveraging Hadoop.


Eucalyptus (Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems) is an open source
cloud computing infrastructure based on Xen, implemented using commonly available Linux tools and web services
technologies. It was developed at the University of California, Santa Barbara to simulate a cloud computing
platform for research and testing. Interfaces to popular commercial clouds such as Amazon EC2 are being developed
to support research.


10gen is an open source web application Platform as a Service (PaaS) technology that helps developers focus on
building application functionality instead of being sidetracked by scalability, management, and infrastructure
concerns. 10gen provides an application server (Appsrv) supporting JavaScript and Ruby, an object database
(MongoDB), a virtual file system (GridFS), JavaScript libraries (CoreJS), and an application and resource
management system.

10gen is a new player that is remixing ideas from Google App Engine. It promises to broaden the appeal of cloud
computing application development to a wider range of developer communities. At the same time, it provides the
tools to craft your own cloud service.


Cloud computing is a resource-sharing model of development and deployment for web applications. A recent market
study by Merrill Lynch predicts a big shift to cloud computing in the next five years, with the global market for
cloud computing growing to $95 billion and representing 12% of worldwide software deployment. In this article we
have provided an overview of some major cloud computing solutions available today and their relationships to open
source. The next article in this series will take a detailed look at how to use open source to deploy web
applications using each of the major cloud computing solutions.