Cloud Computing and an Intro to the Industry/University - CISC

Nov 3, 2013

Director, NSF Planning I/UCRC for Spatiotemporal Thinking, Computing and Applications

Co-Director, Center of Intelligent Spatial Computing for Water/Energy Sciences

Associate Professor, Geography and GeoInformation Science

George Mason Univ., Fairfax, VA, 22030-4444

http://cisc.gmu.edu/

http://cpgis.gmu.edu/homepage/



Outline

What is Cloud Computing

Why Cloud Computing

What are the Issues

Cloud Computing Research

Cloud Computing Future

Background

Background I

Background II

Background III

What if we can


Integrate all geospatial data, information, knowledge, and processing in a few minutes

Generate and send the right information in real time to the people who need it, including decision makers, first responders, and victims

This dream requires a computing platform that


can be ready in a few minutes

can reach all the people needed

costs only for the amount of computing used

won't cost anything to maintain after the emergency response

This requires spatiotemporal thinking and computing, and was to some extent envisioned by cloud computing




Cloud Computing

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics.

NIST 2010

Cloud Computing

Five essential characteristics, which differentiate cloud computing from grid computing and other distributed computing paradigms:

o On-demand self-service. Computing capabilities are provisioned automatically as needed.

o Broad network access. Capabilities are available over the network and accessed through standard mechanisms.

o Resource pooling. Computing resources are pooled with location independence.

o Rapid elasticity. Capabilities can be rapidly and elastically provisioned.

o Measured service. Resource use is automatically controlled and optimized.

NIST 2010

Cloud Computing Service Model


IaaS: On-demand sharing of physical infrastructures

Users: System Administrator

PaaS: Platform for developing and delivering applications, abstracted from infrastructures

Users: Developer

SaaS: Almost any IT services

Users: End-user

Cloud Types

Commercial Clouds

Private/Community Clouds

Hybrid Clouds

Commercial clouds and private clouds: EC2 vs. Eucalyptus, EC2 vs. OpenNebula

Built with commercial or open-source solutions

Framework

Why Cloud Computing



User Perspective

Economics

Flexible price model: pay-as-you-go

No ongoing operational expenses

No upfront capital

Elasticity

On-demand scale up and down

Accessibility

Accessed from anywhere and anytime with any device

Self-Service

Simpler and faster to use cloud service

Minimum interaction with the service provider

Improved Utilization

Economics


Easier for application vendors to reach new customers


Lowest cost way of delivering and supporting applications


Ability to use commodity server and storage hardware


Ability to drive down data center operational cost


Server and storage utilization increased from 10-20% to 70-80%
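
As a rough illustration of what the utilization figures above imply for hardware counts, the following sketch computes how many servers a fixed workload needs at two utilization levels. Only the utilization ranges (10-20% and 70-80%) come from the slide; the workload size, the per-server capacity, and the function name `servers_needed` are hypothetical values chosen for illustration.

```python
def servers_needed(demand_units: int, capacity_per_server: int, utilization_pct: int) -> int:
    """Servers required to carry `demand_units` of load at a target average utilization.

    All inputs are integers so the ceiling division below is exact.
    """
    if not 0 < utilization_pct <= 100:
        raise ValueError("utilization_pct must be in 1..100")
    usable_hundredths = capacity_per_server * utilization_pct  # usable capacity x 100
    return -(-demand_units * 100 // usable_hundredths)  # ceiling division


# Hypothetical workload: 150 units of demand, 10 units per server at full load.
DEMAND = 150
CAPACITY = 10

before = servers_needed(DEMAND, CAPACITY, 15)  # midpoint of the 10-20% range
after = servers_needed(DEMAND, CAPACITY, 75)   # midpoint of the 70-80% range
print(f"servers at 15% utilization: {before}")
print(f"servers at 75% utilization: {after}")
```

Under these assumed numbers the same workload fits on one fifth of the machines, which is the mechanism behind the lower data center operational cost listed above.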



Why Cloud Computing

What are the issues



Many customers don't wish to trust their data to be in the cloud

Data must be locally retained for regulatory reasons




Cannot easily switch from existing legacy applications


Equivalent cloud applications do not exist




Virtualized computing power and network

Not suitable for real-time applications



What if something goes wrong?



What is the true cost of providing SLAs?





Customers want an intuitive GUI and open, standardized, interoperable APIs



Need to continuously add value




SaaS/PaaS models are challenging



Much lower upfront revenue

What are the issues

Cloud Research

Cloud definition, services

Management

Cloud technologies, solutions, issues, cost model

Web application

Big data

HPC applications

General issues

Cloud optimization

Cloud migration

Future Direction

Across-Cloud implementations

Tools and middleware will be available to enable interoperability and portability across different clouds

IaaS

Become standardized and commoditized

Add new utilities and PaaS capabilities

SaaS

Integrate with applications utilizing mobile devices and sensors

PaaS

Battleground for determining the future of Cloud Computing

Virtualization

Web service & SOA, APIs

World-wide distributed storage & file system

Parallel & distributed programming model

Enabling Technology

Architecture

VIM (OpenNebula, Eucalyptus, CloudStack)

Hypervisor

Virtual Machine

Physical Infrastructure

Virtual Infrastructure Middleware (VIM)


VM lifecycle management


Scheduling & monitoring


Networking
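
To make the VIM responsibilities listed above concrete, here is a minimal sketch of a VM lifecycle state machine with first-fit scheduling and a simple utilization monitor. It is not the API of OpenNebula, Eucalyptus, or CloudStack; the class names, the set of states, and the allowed transitions are assumptions chosen for illustration, and networking is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class VMState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    TERMINATED = "terminated"


# Assumed, simplified set of allowed lifecycle transitions.
TRANSITIONS = {
    VMState.PENDING: {VMState.RUNNING, VMState.TERMINATED},
    VMState.RUNNING: {VMState.STOPPED, VMState.TERMINATED},
    VMState.STOPPED: {VMState.RUNNING, VMState.TERMINATED},
    VMState.TERMINATED: set(),
}


@dataclass
class VM:
    name: str
    cores: int
    state: VMState = VMState.PENDING
    host: Optional[str] = None

    def transition(self, new_state: VMState) -> None:
        """Move to `new_state`, rejecting transitions the lifecycle does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state


@dataclass
class Host:
    name: str
    total_cores: int
    used_cores: int = 0

    @property
    def free_cores(self) -> int:
        return self.total_cores - self.used_cores


def schedule(vm: VM, hosts: List[Host]) -> Optional[Host]:
    """First-fit placement: start the VM on the first host with enough free cores."""
    for host in hosts:
        if host.free_cores >= vm.cores:
            host.used_cores += vm.cores
            vm.host = host.name
            vm.transition(VMState.RUNNING)
            return host
    return None  # no capacity: the VM stays PENDING


def monitor(hosts: List[Host]) -> Dict[str, float]:
    """Report the fraction of cores in use on each host."""
    return {h.name: h.used_cores / h.total_cores for h in hosts}


hosts = [Host("node-a", 4), Host("node-b", 8)]
vm1, vm2 = VM("vm1", 3), VM("vm2", 4)
schedule(vm1, hosts)
schedule(vm2, hosts)
print(monitor(hosts))
```

First-fit is only one of many placement policies; it is used here because it keeps the scheduling step to a few lines.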

Cloud Computing for GIScience

Outline

1. Background

2. Case Study 1: Web application

3. Case Study 2: Big data application

4. Conclusion

Background

Many scientific problems are concurrent and both data- and computation-intensive


Case 1: Web application (GEOSS Clearinghouse)

GEOSS Clearinghouse

Metadata catalogue search facility for the intergovernmental Group on Earth Observations (GEO).

EO data, services, and related resources can be discovered and accessed.



GeoCloud I


Governmental cloud initiative


Common operating system and software suites


Deployment and management strategies


Usage and costing of Cloud services


Security (certification and accreditation)


Amazon EC2 Cloud

A Web service that provides resizable compute capacity in the cloud

EC2 Instances

XEN Virtualization

Physical Server

Simple Storage Service (S3)

Elastic Block Store (EBS)

Hosting of virtual machine images (AMI)


Deployment of GEOSS Clearinghouse on EC2 Cloud


Performance in the EC2 Cloud


Lucene (used for indexing while searching) might be the reason for the under-utilization of the virtual CPUs.

0.38 s: 0 records

3 s: 26,130 records



MapReduce for indexing


Spatiotemporal indexing

Only One Core of the VM is utilized
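
The "MapReduce for indexing" idea listed above can be illustrated with a small inverted-index sketch. This is a generic map/reduce pattern in plain Python, not Lucene's API and not the approach actually used for the Clearinghouse; the function names and the sample records are hypothetical. Each `map_chunk` call is independent of the others, which is what would allow the map step to be spread over several cores or nodes instead of a single one.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: build a partial inverted index (term -> record ids) for one chunk."""
    partial = defaultdict(set)
    for record_id, text in chunk:
        for term in text.lower().split():
            partial[term].add(record_id)
    return partial


def reduce_partials(partials):
    """Reduce step: merge the partial indexes into one index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged


def build_index(records, n_chunks=4):
    """Split records into chunks, map each chunk, then reduce the results."""
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    # The built-in map runs sequentially; the calls do not share state,
    # so a process pool or a MapReduce framework could run them in parallel.
    partials = list(map(map_chunk, chunks))
    return reduce_partials(partials)


# Hypothetical metadata records: (record id, title text).
records = [
    (1, "sea surface temperature"),
    (2, "land surface model"),
    (3, "sea ice"),
]
index = build_index(records, n_chunks=2)
print(sorted(index["surface"]))
```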

Usage/Costs in EC2 Clouds

Month (2011)   Total (Dollar)   Amazon EC2 Hours   Amazon EC2 Costs   Amazon EBS   AWS Data Transfer
July           113.73           320                108.80             4.64         0.01
August         278.74           758                257.72             20.99        0.03
September      267.25           720                244.80             22.4         0.06
October        276.82           744                252.96             22.21        1.64

Table 6. Monthly Costs of AWS services
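
The figures in Table 6 can be checked with a few lines of arithmetic. The sketch below transcribes the table rows and derives the effective EC2 price per instance-hour, the share of each month's bill that is EC2 time, and the gap between the stated total and the sum of the three listed components. The variable and key names are illustrative, not from the slides.

```python
# Rows transcribed from Table 6:
# (month, total_usd, ec2_hours, ec2_usd, ebs_usd, transfer_usd)
TABLE_6 = [
    ("July", 113.73, 320, 108.80, 4.64, 0.01),
    ("August", 278.74, 758, 257.72, 20.99, 0.03),
    ("September", 267.25, 720, 244.80, 22.40, 0.06),
    ("October", 276.82, 744, 252.96, 22.21, 1.64),
]


def summarize(rows):
    """Derive per-month ratios from the raw table values."""
    summary = []
    for month, total, hours, ec2, ebs, transfer in rows:
        summary.append({
            "month": month,
            "usd_per_instance_hour": round(ec2 / hours, 2),
            "ec2_share_of_total": round(ec2 / total, 3),
            "total_minus_components": round(total - (ec2 + ebs + transfer), 2),
        })
    return summary


for row in summarize(TABLE_6):
    print(row)
```

In every month the EC2 cost divided by the instance-hours comes to 0.34 dollars per hour, and EC2 time makes up more than 90% of the total, so the bill tracks instance uptime almost linearly.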


Usage chart from July to Nov, 2011

Monthly cost from July to Oct, 2011

Case 2: Big data -> Climate@Home

Input: 150 MB

Output: 2 GB

1 Year, 1 Scenario

100 Years, 1000 Scenarios

10 Years, 100 Scenarios

Input: 15 GB

Output: 750 GB

Computing time per scenario: 45 minutes

Computing time per scenario: 4 days and 16 hours

Run on Community Clouds (NASA Eucalyptus)

Model Simulation Information

Scenarios: 300 model configurations

VMs: 4-8 (20 CPU cores, 64 GB memory)

Start date: Dec 1949

End date: Jan 1961

Cloud Computing Information

Platform: Eucalyptus

VMs: 4-8 (20 CPU cores, 64 GB memory)

Task scheduler: Condor

System CPU Utilization


Conclusion

Provides high-capacity and scalable computing, storage and network connectivity for GIScience applications

Creates new opportunities for national, international, state, and local partners to leverage research easily

Acknowledgements


Collaborators: Doug Nebert, Myra Bambacus, Yan Xu, Daniel Fay, Karl Benedict, Songqing Chen

Team: Qunying Huang, Kai Liu, Jizhe Xia, Zhipeng Gui, Chen Xu, and all CISC members


I/UCRC for Spatiotemporal Thinking, Computing, and Applications (STC)

Chaowei Yang, Director, GMU Site

Keith Clarke, Co-Director, UCSB Site

Peter Bol, Co-Director, Harvard Site

Industry/University Cooperative Research Centers: National Scope, Impact

59 Centers

172 I/UCRC Sites

Plus Participating International Sites

ENG, CISE

Over 760 Member Organizations (2010)

Academic-Industry partnerships meeting industry sector research needs

I/UCRC Planning Process

Purpose: Maximize the potential for a successful Center Proposal.

Planning Grant Meeting with University Partners, Students, Center Evaluator, Prospective Members and NSF I/UCRC Program Directors

LOI

Planning Grant Proposal

Events Pre Meeting

Events Occurring at the Meeting: Day 1, Day 2

Events Post Meeting

LOI, Planning Grant Pending or Awarded, what now?

Planning Meeting Approaching…

Getting the proposal ready to go!

Successful Proposal & 1st IAB Meeting

Objective

1. Capture and advance human intelligence

2. Enable and improve machine processing and applications

3. Start from geographic science and technologies for spatiotemporal issues and solutions

4. Expand to other domains, such as Earth science, political science, economics, biology, public health, energy and environment, K-16 education, and others in the future if things go well

Target

1. Improve the US and international spatiotemporal research infrastructure base;

2. Advance the intellectual capacity of the future science and engineering workforce;

3. Establish national and international leadership in spatiotemporal thinking, computing, and applications.


Approaches

1. Explore new solutions to our 21st century challenges, such as natural disasters, by investigating the spatiotemporal principles within the challenges with national and international leaders.

2. Advance human knowledge and intelligence by combining spatiotemporal principles and computational thinking to form spatiotemporal thinking as a new methodology and innovative thinking process to enable physical and social science discoveries, and to conduct the next generation of computing.

3. Improve interoperability and infrastructure building using the spatiotemporal methods formed to enable the discoverability, accessibility, and usability of big data.

4. Facilitate better understanding of physical and social sciences through phenomena simulation and visualization improved by spatiotemporal thinking.

5. Develop new spatiotemporal computing products in collaboration among the center's members to establish national and international leadership in the field, and transfer the new technologies to companies to improve center members' efficiency and competitiveness.


NSF I/UCRC Typical Organization

Gray 1998

To ensure the success and sustainability of the center.

University Management includes VP for Research, Dean for COS, and GGS Chair

Science Advisory Committee includes internationally renowned scientists from industry, agencies, and academia

Industry advisory board comprises sponsor representatives

Research programs will be dynamic according to progress in the center life cycle

Each project will include a PI, IAB/sponsor member, and students participating in projects

A center director assistant or operational director will be assigned at each site



Membership and Benefits

1. Access to R&D results worth 10+ times the investment by investing $50k+ each year.

2. Increase company and agency competitiveness through deliverable-oriented partnership with academia and agencies.

3. Access to student talent cultivated through the collaborative research and development projects.

4. Collaborate in an academia, government, and industry environment.



The IUCRC Research Portfolio Cycle

The co-operative process rapidly aligns the Center’s Portfolio with Member Needs and University strengths

New Proposals

IAB Portfolio Engagement

Industry/Agency Advisory Board Needs

Center Site Strengths

Biannual IAB Meeting

L.I.F.E.: Review, Discuss, Adapt, Select

L.I.F.E.: Level of Interest and Feedback Evaluation Form

Advancing spatiotemporal computing to enable 21st century geospatial sciences and applications

Experimental Plan, Industrial Relevance and Appropriateness for the center: With the massive amount of spatiotemporal data now available, novel, more efficient approaches for data modeling and management are needed to enable 21st century geospatial sciences and applications. This project aims at developing the theoretical and technical foundations for spatiotemporal computing with a focus on exploiting spatiotemporal principles to build new approaches for data and scientific modeling, indexing, search, and retrieval.

Objectives: Develop a novel approach for spatiotemporal computing. This is a four-step approach covering 1) design and implementation of data structures; 2) algorithms (e.g., indexing methods); 3) spatiotemporally enabled, optimized ontology and reasoning methods; and 4) search strategy.
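
As an illustration of the kind of data structure and indexing method the objectives refer to, the sketch below buckets records by a (latitude cell, longitude cell, time bin) key and answers bounding-box-plus-time-range queries by visiting only the overlapping buckets. This is a generic grid index shown under assumed parameters, not the project's design; the class name `GridTimeIndex`, the cell size, and the sample points are hypothetical.

```python
from collections import defaultdict
from math import floor


class GridTimeIndex:
    """Toy spatiotemporal index: fixed-size lat/lon cells crossed with fixed time bins."""

    def __init__(self, cell_deg=1.0, bin_seconds=3600):
        self.cell_deg = cell_deg
        self.bin_seconds = bin_seconds
        self._buckets = defaultdict(list)

    def _key(self, lat, lon, t):
        return (
            floor(lat / self.cell_deg),
            floor(lon / self.cell_deg),
            floor(t / self.bin_seconds),
        )

    def insert(self, record_id, lat, lon, t):
        self._buckets[self._key(lat, lon, t)].append((record_id, lat, lon, t))

    def query(self, lat_min, lat_max, lon_min, lon_max, t_min, t_max):
        """Return ids of records inside the box and time range (bounds inclusive)."""
        lo = self._key(lat_min, lon_min, t_min)
        hi = self._key(lat_max, lon_max, t_max)
        hits = []
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    # .get avoids creating empty buckets while scanning
                    for rid, lat, lon, t in self._buckets.get((i, j, k), ()):
                        if (lat_min <= lat <= lat_max
                                and lon_min <= lon <= lon_max
                                and t_min <= t <= t_max):
                            hits.append(rid)
        return hits


idx = GridTimeIndex()
idx.insert("a", 38.8, -77.3, 1000)      # hypothetical observation
idx.insert("b", 34.4, -119.8, 1000)     # far away, same time
idx.insert("c", 38.9, -77.2, 100000)    # nearby, much later
print(idx.query(38, 39, -78, -77, 0, 3600))
```

A fixed grid is simple but degrades when data are unevenly distributed; tree-based structures are the usual alternative when cells become too crowded or too sparse.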

Team: PIs: Dr. Yang, Dr. Clarke, Dr. Bol, and interested members from agencies and industry, one graduate student at each site. Dr. Rezgui will work as the manager and integrator at the GMU site.


Sample Projects

Four-Dimensional Space-Time Visualization of Tracked Movement

Objective: Better visualize enormous quantities of tracking data collected through innovative geospatial technology by developing and using a host of new display techniques that have emerged from computer vision, graphics and information visualization and that show promise for space-time data.

Approaches: For this research project, visualization environments (software programs, tools, code libraries and standards) will be combined with display environments (flat, stereo, augmented virtual and immersive virtual) such that moving objects and fields can be explored.

Team: PI: Keith Clarke and Michael Goodchild at UCSB, Phil Yang at GMU, two students with one from each site; Prof. Janowitz will coordinate the research and development from the UCSB site.

Sample Projects

Temporal Gazetteer and Place Name Resolution Service with Temporal Awareness

Objective: Develop a new temporal gazetteer and place name resolution service with temporal awareness that (1) compiles and integrates data stored within existing gazetteer systems; (2) enables new crowd-sourced gazetteer entries through a standardized schema; and (3) provides an Application Programming Interface (API).

Approaches: 1) Design and implement a comprehensive gazetteer structure; 2) Integrate information from multiple existing gazetteers; 3) Build a web-based entry system to allow crowd-sourced contributions; 4) Design and implement a conflation rule-base to resolve duplicated entries; 5) Publish a user interface for crowd-sourced quality assessment, authorized adjustment of gazetteer entries, and iterative improvement of conflation rules; 6) Build a temporal place name resolution service accessible through API and an online user interface.
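
A minimal sketch of what temporally aware place name resolution could look like: each gazetteer entry carries a validity interval, and a lookup takes both a name and a year. The schema, the class names, and the sample entries are hypothetical (the place "Oldport" and its coordinates are made up); they are not the project's schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    lat: float
    lon: float
    valid_from: int           # first year the name applies (inclusive)
    valid_to: Optional[int]   # last year the name applies (inclusive); None = still in use


class TemporalGazetteer:
    def __init__(self) -> None:
        self._by_name: Dict[str, List[GazetteerEntry]] = {}

    def add(self, entry: GazetteerEntry) -> None:
        self._by_name.setdefault(entry.name.lower(), []).append(entry)

    def resolve(self, name: str, year: int) -> List[GazetteerEntry]:
        """Return the entries for `name` whose validity interval contains `year`."""
        return [
            e for e in self._by_name.get(name.lower(), [])
            if e.valid_from <= year and (e.valid_to is None or year <= e.valid_to)
        ]


# Made-up example: the same name refers to two locations in different periods.
gaz = TemporalGazetteer()
gaz.add(GazetteerEntry("Oldport", 10.0, 20.0, 1200, 1500))
gaz.add(GazetteerEntry("Oldport", 11.0, 21.0, 1501, None))

for year in (1300, 1600):
    match = gaz.resolve("oldport", year)[0]
    print(year, match.lat, match.lon)
```

Conflation of duplicate entries and crowd-sourced edits, which the approaches above also call for, would sit on top of a structure like this and are left out of the sketch.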

Team: PI: Peter K. Bol, two technical staff; Dr. Wendy Guan will coordinate the research and development at Harvard University.

Sample Projects

Project objectives: The goal of SCC is to develop a middleware that can best arrange and optimize the computing resources and task scheduling by fully considering the spatiotemporal patterns of data, users, cloud computing resources, and geospatial science phenomena. Such an effort would greatly help to construct a better spatial cloud computing (SCC) platform (Yang et al. 2011b) and geospatial cyberinfrastructure (Yang et al., 2010a).

We will conduct extensive experiments to explore the spatiotemporal patterns involved in the forecasting of land and atmospheric phenomena, e.g., air quality. We will also experiment with spatiotemporal patterns of users and computing resources, including computing nodes, network and storage. These experiments would provide basic guidelines on how to design the computing platform architecture, how to select and arrange the geographically distributed computing resources to handle the computations, and how to organize and store the data for fast model initialization and output delivery.
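
One simple way to picture selecting among geographically distributed computing resources is a score that combines distance to the data with current load. The sketch below is an assumption-laden illustration, not the SCC middleware's algorithm: the `Site` class, the site names and coordinates, and the `load_penalty_km` weighting are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Site:
    name: str
    lat: float
    lon: float
    load: float  # fraction of capacity in use, 0.0 to 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_site(sites, data_lat, data_lon, load_penalty_km=5000.0):
    """Choose the site with the lowest score = distance to data + penalty for load.

    `load_penalty_km` expresses how many kilometres of extra distance a fully
    loaded site is considered to be worth; it is an arbitrary tuning knob.
    """
    def score(site):
        return haversine_km(site.lat, site.lon, data_lat, data_lon) + load_penalty_km * site.load

    return min(sites, key=score)


# Hypothetical sites; the data sits close to "east".
quiet_east = [Site("east", 38.8, -77.3, 0.20), Site("west", 34.4, -119.8, 0.10)]
busy_east = [Site("east", 38.8, -77.3, 0.95), Site("west", 34.4, -119.8, 0.10)]

print(pick_site(quiet_east, 39.0, -77.0).name)
print(pick_site(busy_east, 39.0, -77.0).name)
```

With the nearby site lightly loaded it wins on distance; once it is nearly saturated the more distant but idle site scores better, which is the trade-off a spatiotemporally aware scheduler has to weigh.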

Team: PI: Drs. Yang, Houser, and two students



Spatial Cloud Computing (SCC) Middleware


Sample Projects

Discussion

Relevance

Potential Projects

Collaboration for customized project