Lec23-GridComputing




CSCE 513 Fall 2013

Lecture 23

Grid Computing

Topics


Evolution of Grid computing

November 18, 2013

CSCE 513 Computer Architecture




Overview

Last Time

  Dally's lecture from PDF

  http://www.cs.gsu.edu/~tcpp/resources/India_Denial_1209.pdf

  http://www.csm.ornl.gov/workshops/IAA-IC-Workshop-08/documents/wiki/dally_iaa_workshop_0708.pdf

Readings/Links for today

  CAAQA 5ed

  http://net.educause.edu/ir/library/pdf/DEC0306.pdf

  http://queue.acm.org/detail.cfm?id=1554608

  http://en.wikipedia.org/wiki/0_(number)

  ParLab

  Top 500 - http://www.top500.org/, http://www.graph500.org/






Top500.org


1. K computer, SPARC64 VIIIfx 2.0 GHz, Tofu interconnect
2. NUDT YH MPP, Xeon X5670 6C 2.93 GHz, NVIDIA 2050
3. Cray XT5-HE, Opteron 6-core 2.6 GHz
4. Dawning TC3600 Blade, Intel X5650, NVIDIA Tesla C2050 GPU
5. HP ProLiant SL390s G7, Xeon 6C X5670, NVIDIA GPU, Linux/Windows
6. Cray XE6, Opteron 6136 8C 2.40 GHz, Custom
7.





K computer, SPARC64 VIIIfx 2.0 GHz, Tofu interconnect

  SPARC64 processors

  Tofu interconnect (Fujitsu's 6D mesh/torus network)




The Graph500 List

Data intensive supercomputer applications are increasingly important for HPC workloads, but are ill-suited for ...

Nov 2011 Top - NNSA/SC Blue Gene/Q Prototype II

  4096 nodes

  65,536 cores




Breaks in Computing Paradigms

Invention of zero, decimal notation

Abacus

First special-purpose/hardwired computers

Von Neumann architecture - stored program concept

Multi-tasking / time-sharing

Personal desktop computing

Internet (not so much computing as the new library)

Cluster computing (Beowulf)

GRID Computing (didn't really catch on) - SETI etc.

Cloud Computing





Beowulf clusters


Beowulf is a multi-computer architecture which can be used for parallel computations

  one server node, and one or more client nodes connected together via Ethernet

  Parallel Virtual Machine (PVM) and Message Passing Interface (MPI)

  server controls the whole cluster and is a file server

  client nodes in a Beowulf system are dumb, the dumber the better - "doesn't even know its IP address"

  Specialized operating system

http://en.wikipedia.org/wiki/Beowulf_cluster
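Since Beowulf nodes typically coordinate through MPI, a minimal sketch of how work gets split across the server and client nodes might look like the following (assuming Python with the mpi4py package and a working MPI installation; the program and data here are illustrative, not from the slides):

    # Run with e.g.: mpirun -np 4 python beowulf_hello.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # this node's id (0 = "server"/root node)
    size = comm.Get_size()      # total number of nodes in the cluster

    # Each node computes a partial sum over its share of the data ...
    partial = sum(range(rank, 1000, size))

    # ... and the root node gathers and combines the partial results.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum over all nodes:", total)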







Amdahl's Law - Again

http://en.wikipedia.org/wiki/File:AmdahlsLaw.svg
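The figure plots Amdahl's law: if a fraction f of the work can be parallelized across n processors, speedup = 1 / ((1 - f) + f/n). A quick sketch (illustrative, not from the slides) that reproduces the shape of those curves:

    def amdahl_speedup(f, n):
        # f = parallel fraction of the program, n = number of processors
        return 1.0 / ((1.0 - f) + f / n)

    # Even at 95% parallel, speedup saturates near 20x no matter how many
    # processors are added, which is the point of the Wikipedia figure.
    for f in (0.50, 0.75, 0.90, 0.95):
        print(f, [round(amdahl_speedup(f, n), 1) for n in (16, 256, 4096, 65536)])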




Grid computing overview


Google Scholar:

F. Magoules - ... of grid computing: theory, algorithms and ..., 2009 - books.google.com
  "The term 'the grid' has emerged in the mid 1990s to denote a proposed distributed computing infrastructure which focuses on large-scale resource sharing, innovative applications, and high-performance orientation [Foster et al., 2001]. The grid concept is ..."

Overview of GridRPC: A remote procedure call API for grid computing - [PDF] from utk.edu, K. Seymour - Grid Computing - GRID ...
  "This paper discusses preliminary work on standardizing and implementing a remote procedure call (RPC) mechanism for grid computing. The GridRPC API is designed to address the lack of a standardized, portable, and simple programming interface. Our initial ..."





CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.

Introduction from CAAQA5ed

Warehouse-scale computer (WSC)

  Provides Internet services
    Search, social networking, online maps, video sharing, online shopping, email, cloud computing, etc.

  Differences with HPC "clusters":
    Clusters have higher performance processors and network
    Clusters emphasize thread-level parallelism, WSCs emphasize request-level parallelism

  Differences with datacenters:
    Datacenters consolidate different machines and software into one location
    Datacenters emphasize virtual machines and hardware heterogeneity in order to serve varied customers

Introduction




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.3 Average CPU utilization of more than 5000 servers during a 6-month period at Google. Servers are rarely completely idle or fully utilized, instead operating most of the time at between 10% and 50% of their maximum utilization. (From Figure 1 in Barroso and Hölzle [2007].) The third column from the right in Figure 6.4 calculates percentages plus or minus 5% to come up with the weightings; thus, 1.2% for the 90% row means that 1.2% of servers were between 85% and 95% utilized.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.5 Hierarchy of switches in a WSC. (Based on Figure 1.2 of Barroso and Hölzle [2009].)




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.7 Graph of latency, bandwidth, and capacity of the memory hierarchy of a WSC for data in Figure 6.6 [Barroso and Hölzle 2009].




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.8 The Layer 3 network used to link arrays together and to the Internet [Greenberg et al. 2009]. Some WSCs use a separate border router to connect the Internet to the datacenter Layer 3 switches.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.9 Power distribution and where losses occur. Note that the best improvement is 11%. (From Hamilton [2010].)




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.10 Mechanical design for cooling systems. CWS stands for circulating water system. (From Hamilton [2010].)




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.11 Power utilization efficiency of 19 datacenters in 2006 [Greenberg et al. 2006]. The power for air conditioning (AC) and other uses (such as power distribution) is normalized to the power for the IT equipment in calculating the PUE. Thus, power for IT equipment must be 1.0 and AC varies from about 0.30 to 1.40 times the power of the IT equipment. Power for "other" varies from about 0.05 to 0.60 of the IT equipment.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.18 The best SPECpower results as of July 2010 versus the ideal energy proportional behavior. The system was the HP ProLiant SL2x170z G6, which uses a cluster of four dual-socket Intel Xeon L5640s with each socket having six cores running at 2.27 GHz. The system had 64 GB of DRAM and a tiny 60 GB SSD for secondary storage. (The fact that main memory is larger than disk capacity suggests that this system was tailored to this benchmark.) The software used was IBM Java Virtual Machine version 9 and Windows Server 2008, Enterprise Edition.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.19 Google customizes a standard 1AAA container: 40 x 8 x 9.5 feet (12.2 x 2.4 x 2.9 meters). The servers are stacked up to 20 high in racks that form two long rows of 29 racks each, with one row on each side of the container. The cool aisle goes down the middle of the container, with the hot air return being on the outside. The hanging rack structure makes it easier to repair the cooling system without removing the servers. To allow people inside the container to repair components, it contains safety systems for fire detection and mist-based suppression, emergency egress and lighting, and emergency power shut-off. Containers also have many sensors: temperature, airflow pressure, air leak detection, and motion-sensing lighting. A video tour of the datacenter can be found at http://www.google.com/corporate/green/datacenters/summit.html. Microsoft, Yahoo!, and many others are now building modular datacenters based upon these ideas but they have stopped using ISO standard containers since the size is inconvenient.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.20 Airflow within the container shown in Figure 6.19. This cross-section diagram shows two racks on each side of the container. Cold air blows into the aisle in the middle of the container and is then sucked into the servers. Warm air returns at the edges of the container. This design isolates cold and warm airflows.





Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.21 Server for Google WSC. The power supply is on the left and the two disks are on the top. The two fans below the left disk cover the two sockets of the AMD Barcelona microprocessor, each with two cores, running at 2.2 GHz. The eight DIMMs in the lower right each hold 1 GB, giving a total of 8 GB. There is no extra sheet metal, as the servers are plugged into the battery and a separate plenum is in the rack for each server to help control the airflow. In part because of the height of the batteries, 20 servers fit in a rack.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.22 Power usage effectiveness (PUE) of 10 Google WSCs over time. Google A is the WSC described in this section. It is the highest line in Q3 '07 and Q2 '10. (From www.google.com/corporate/green/datacenters/measuring.htm.) Facebook recently announced a new datacenter that should deliver an impressive PUE of 1.07 (see http://opencompute.org/). The Prineville Oregon Facility has no air conditioning and no chilled water. It relies strictly on outside air, which is brought in one side of the building, filtered, cooled via misters, pumped across the IT equipment, and then sent out the building by exhaust fans. In addition, the servers use a custom power supply that allows the power distribution system to skip one of the voltage conversion steps in Figure 6.9.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.24 Query-response time curve.




Copyright © 2011, Elsevier Inc. All rights reserved.

Figure 6.25 Cumulative distribution function (CDF) of a real datacenter.





Introduction

Important design factors for WSC:

  Cost-performance
    Small savings add up

  Energy efficiency
    Affects power distribution and cooling
    Work per joule

  Dependability via redundancy

  Network I/O

  Interactive and batch processing workloads

  Ample computational parallelism is not important
    Most jobs are totally independent
    "Request-level parallelism"

  Operational costs count
    Power consumption is a primary, not secondary, constraint when designing the system

  Scale and its opportunities and problems
    Can afford to build customized systems since WSCs require volume purchases

Introduction

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Prgrm'g Models and Workloads

Batch processing framework: MapReduce

  Map: applies a programmer-supplied function to each logical input record
    Runs on thousands of computers
    Provides new set of key-value pairs as intermediate values

  Reduce: collapses values using another programmer-supplied function

Programming Models and Workloads for WSCs

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Prgrm'g Models and Workloads

Example (word-count pseudocode):

  map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
          EmitIntermediate(w, "1");    // Produce list of all words

  reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
          result += ParseInt(v);       // get integer from key-value pair
      Emit(AsString(result));

Programming Models and Workloads for WSCs

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




MapReduce

Originates in FP (functional programming) or even earlier

http://en.wikipedia.org/wiki/MapReduce
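To connect MapReduce back to its functional-programming roots, here is a tiny word-count sketch in Python built on the map idea and functools.reduce (purely illustrative; the documents and names are made up, not from the slides):

    from functools import reduce

    documents = ["the quick brown fox", "the lazy dog"]

    # Map phase: turn each document into (word, 1) pairs.
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle/group phase: collect the counts emitted for each word.
    groups = {}
    for word, count in mapped:
        groups.setdefault(word, []).append(count)

    # Reduce phase: collapse each word's list of counts with a reducer.
    counts = {word: reduce(lambda a, b: a + b, vals) for word, vals in groups.items()}
    print(counts)    # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}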




Google map reduce revival

http://research.google.com/archive/mapreduce.html

http://code.google.com/edu/submissions/mapreduce/listing.html


http://code.google.com/edu/parallel/mapreduce-tutorial.html




Hadoop

http://hadoop.apache.org/





Prgrm'g Models and Workloads

MapReduce runtime environment schedules map and reduce tasks to WSC nodes

  Availability:
    Use replicas of data across different servers
    Use relaxed consistency: no need for all replicas to always agree

  Workload demands often vary considerably

Programming Models and Workloads for WSCs

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Computer Architecture of WSC

WSCs often use a hierarchy of networks for interconnection

  Each 19" rack holds 48 1U servers connected to a rack switch

  Rack switches are uplinked to a switch higher in the hierarchy
    Uplink has 48/n times lower bandwidth, where n = # of uplink ports
    "Oversubscription" (see the sketch below)

  Goal is to maximize locality of communication relative to the rack

Computer Architecture of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.
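A quick back-of-the-envelope view of that oversubscription ratio (the uplink counts below are illustrative, not from the slides):

    # 48 servers in the rack share n uplink ports, so traffic leaving the rack
    # sees roughly 48/n times less bandwidth per server than intra-rack traffic.
    servers_per_rack = 48
    for uplink_ports in (2, 4, 8, 16):
        oversubscription = servers_per_rack / uplink_ports
        print(f"{uplink_ports} uplinks -> {oversubscription:.0f}x oversubscribed")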




Storage

Storage options:

  Use disks inside the servers, or

  Network-attached storage through InfiniBand

WSCs generally rely on local disks

  Google File System (GFS) uses local disks and maintains at least three replicas

Computer Architecture of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Array Switch

Switch that connects an array of racks

  Array switch should have 10x the bisection bandwidth of a rack switch

  Cost of an n-port switch grows as n^2 (see the sketch below)

  Often utilize content-addressable memory chips and FPGAs

Computer Architecture of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.
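To see why the n^2 growth matters, a toy cost model (the unit cost and port counts are made up) comparing total cost and cost per port as the switch gets bigger:

    # Toy model: an n-port switch costs about c * n**2, so doubling the port
    # count roughly quadruples total cost and doubles the cost per port.
    c = 1.0   # arbitrary cost unit
    for ports in (48, 96, 192, 384):
        cost = c * ports ** 2
        print(ports, "ports:", cost, "total,", cost / ports, "per port")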




WSC Memory Hierarchy

Servers can access DRAM and disks on other servers using a NUMA-style interface

Computer Architecture of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Infrastructure and Costs of WSC

Location of WSC

  Proximity to Internet backbones, electricity cost, property tax rates, low risk from earthquakes, floods, and hurricanes

Power distribution

Physical Infrastructure and Costs of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Infrastructure and Costs of WSC

Cooling

  Air conditioning used to cool server room
    64 F - 71 F
    Keep temperature higher (closer to 71 F)

  Cooling towers can also be used
    Minimum temperature is the "wet bulb temperature"

Physical Infrastructure and Costs of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.




Infrastructure and Costs of WSC

Cooling system also uses water (evaporation and spills)

  E.g. 70,000 to 200,000 gallons per day for an 8 MW facility

Power cost breakdown:

  Chillers: 30-50% of the power used by the IT equipment

  Air conditioning: 10-20% of the IT power, mostly due to fans

How many servers can a WSC support?

  Each server:
    "Nameplate power rating" gives maximum power consumption
    To get actual power, measure under actual workloads

  Oversubscribe cumulative server power by 40%, but monitor power closely (see the sketch below)

Physical Infrastructure and Costs of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.
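A rough sketch of that sizing argument, with entirely hypothetical numbers for the facility and the servers (none of these figures come from the slides):

    facility_it_budget_w = 8_000_000   # power provisioned for IT equipment
    nameplate_w = 300                  # worst-case rating on the server label
    measured_w = 200                   # draw measured under actual workloads

    # Trusting the nameplate wastes capacity; using measured power and
    # oversubscribing the cumulative budget by 40% packs in far more servers,
    # provided power is monitored closely to catch the rare collective peaks.
    by_nameplate = facility_it_budget_w // nameplate_w
    by_measurement = int(facility_it_budget_w * 1.4) // measured_w
    print(by_nameplate, by_measurement)   # roughly 26,666 vs 56,000 servers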




Measuring Efficiency of a WSC

Power utilization effectiveness (PUE)

  = Total facility power / IT equipment power (see the example below)

  Median PUE in the 2006 study was 1.69

Performance

  Latency is an important metric because it is seen by users

  Bing study: users will use search less as response time increases

  Service Level Objectives (SLOs) / Service Level Agreements (SLAs)
    E.g. 99% of requests must be below 100 ms

Physical Infrastructure and Costs of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.
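A tiny worked example of the PUE ratio (the power figures are invented for illustration):

    total_facility_power_mw = 15.0    # hypothetical total draw of the facility
    it_equipment_power_mw = 10.0      # hypothetical draw of the IT equipment alone
    pue = total_facility_power_mw / it_equipment_power_mw
    print(pue)                        # 1.5; the 2006 median cited above was 1.69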




Cost of a WSC

Capital expenditures (CAPEX)

  Cost to build a WSC

Operational expenditures (OPEX)

  Cost to operate a WSC (see the sketch below)

Physical Infrastructure and Costs of WSC

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.
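CAPEX and OPEX are usually compared by amortizing the one-time build costs into a monthly figure; a sketch with invented numbers (lifetimes and dollar amounts are hypothetical, not from the slides):

    facility_capex = 88_000_000   # facility build-out, amortized over ~10 years
    server_capex = 66_000_000     # servers and networking, amortized over ~3 years
    monthly_opex = 3_800_000      # power, people, and repairs per month

    monthly_cost = facility_capex / (10 * 12) + server_capex / (3 * 12) + monthly_opex
    print(round(monthly_cost))    # total monthly cost of ownership under these assumptions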




Cloud Computing

WSCs offer economies of scale that cannot be achieved with a datacenter:

  5.7 times reduction in storage costs

  7.1 times reduction in administrative costs

  7.3 times reduction in networking costs

This has given rise to cloud services such as Amazon Web Services

  "Utility Computing"

  Based on using open source virtual machine and operating system software

Cloud Computing

CAAQA5ed - Copyright © 2012, Elsevier Inc. All rights reserved.