Taking the Complexity out of Cluster Computing

Vendor Update HPC User Forum

Arend Dittmer

Director Product Management HPC

April 17, 2009

Copyright © 2009 Penguin Computing, Inc. All rights reserved

Penguin Vision and Focus


Founded 1998


One of the HPC industry’s longest track
records of success


Donald Becker, CTO


Inventor of the Beowulf cluster
architecture and a primary contributor to the Linux
kernel


Over 2500 Customers in Enterprise, Academia and
Government


Focus on integrated ‘turnkey’ HPC clusters



Rack Integration


Software Integration


Scyld Clusterware


Schedulers


Development tools


Applications


Solution Testing


System-level burn-in


Full cluster testing


24x7 Support





Software

>
Cluster Management

>
Applications and
Workload Managers

>
Compilers and Tools


Hardware

>
Servers

>
GPU Accelerators

>
Storage

>
Interconnects

>
Racks and PDUs


Penguin Solutions Delivered “Ready-to-Run”

Trends in Cluster Computing

Cluster Management Software



Linux clusters deliver unmatched price/performance


Linux clusters dominate the HPC market (market share >75%)


however…






Delivering compute power through many systems introduces complexity in:

>
Configuration consistency

>
Distributed applications

>
Workload Management


Scyld ClusterWare is designed to make cluster management easy
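Workload management, the last item in the list above, comes down to matching queued jobs to free nodes. The following is a minimal sketch of that bookkeeping. It assumes a hypothetical `Job` record (a name plus a node count) and a strict first-come, first-served policy without backfill. It does not describe how TORQUE or any other real scheduler is implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job description: a name and the number of nodes requested.
    name: str
    nodes: int


def schedule_fifo(jobs, free_nodes):
    """Start queued jobs in strict FIFO order while enough nodes are free.

    Returns (started, waiting, remaining_free). The job at the head of the
    queue blocks everything behind it (no backfill).
    """
    queue = deque(jobs)
    started = []
    free = list(free_nodes)
    while queue and queue[0].nodes <= len(free):
        job = queue.popleft()
        # Allocate the first N free nodes to this job.
        allocation, free = free[:job.nodes], free[job.nodes:]
        started.append((job.name, allocation))
    return started, list(queue), free


# Illustrative job mix on a hypothetical four-node cluster.
jobs = [Job("cfd", 2), Job("fea", 3), Job("post", 1)]
started, waiting, free = schedule_fifo(jobs, ["n0", "n1", "n2", "n3"])
print(started)                     # [('cfd', ['n0', 'n1'])]
print([j.name for j in waiting])   # ['fea', 'post']
```

The head of the queue blocks everything behind it, so `post` waits even though two nodes are idle. Many production schedulers add backfill to reclaim gaps like this.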


The HPC Cluster Management Challenge


Scyld ClusterWare Design


Master node is the single
point of control


Compute nodes are
attached 'stateless'
memory and processor
resources


Scyld maintains
consistency across the
cluster

Designed for Ease-of-Use and Manageability

‘Manage a Cluster like a Single System’
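The design above can be modeled in a few lines: one authoritative master and stateless compute nodes. This is a minimal sketch under stated assumptions and does not reproduce Scyld's actual mechanism. The `MasterNode` and `ComputeNode` classes and the configuration keys are hypothetical. Consistency is checked by comparing a SHA-256 digest of each node's configuration against the master's.

```python
import hashlib
import json


def config_digest(config):
    """SHA-256 of a canonical JSON encoding, so equal configs hash equally."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class ComputeNode:
    """Hypothetical stateless node: holds nothing until the master provisions it."""

    def __init__(self, name):
        self.name = name
        self.config = None


class MasterNode:
    """Hypothetical master node: the single source of truth for configuration."""

    def __init__(self, config):
        self.config = dict(config)

    def provision(self, node):
        # Push a fresh copy; the node keeps no state of its own across boots.
        node.config = dict(self.config)

    def inconsistent(self, nodes):
        """Names of nodes whose configuration differs from the master's."""
        reference = config_digest(self.config)
        return [n.name for n in nodes if config_digest(n.config) != reference]


# Illustrative values only.
master = MasterNode({"kernel": "2.6.18", "mpi": "openmpi"})
nodes = [ComputeNode(f"n{i}") for i in range(3)]
for node in nodes:
    master.provision(node)

nodes[2].config["mpi"] = "mpich"       # simulate local drift on one node
drifted = master.inconsistent(nodes)   # ['n2']
master.provision(nodes[2])             # "reboot": re-provision from the master
repaired = master.inconsistent(nodes)  # []
```

Because the nodes are stateless, the repair for any drift is simply to re-provision from the master. That is why a single point of control makes consistency cheap to maintain.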


Web-Based Monitoring
Framework


One web-based interface to all HPC cluster components


Integrates existing tools, e.g. IPMI, Ganglia, TORQUE


Customizable, extensible framework

>
Based on XML, JavaScript and ExtJS
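A front end that integrates Ganglia has to turn gmond's XML reports into per-node values. gmond is Ganglia's monitoring daemon. The minimal sketch below uses Python's standard `xml.etree.ElementTree` on a hand-written sample in the style of gmond output. The host names, metric values, and trimmed attribute set are invented for illustration; real gmond output carries more attributes per element. The `overloaded` helper and its threshold are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of Ganglia gmond XML; attributes trimmed,
# host names and values are invented for illustration.
SAMPLE = """<GANGLIA_XML VERSION="3.1.1" SOURCE="gmond">
  <CLUSTER NAME="demo">
    <HOST NAME="n0" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.52" TYPE="float" UNITS=" "/>
      <METRIC NAME="mem_free" VAL="8123456" TYPE="uint32" UNITS="KB"/>
    </HOST>
    <HOST NAME="n1" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="7.90" TYPE="float" UNITS=" "/>
      <METRIC NAME="mem_free" VAL="120000" TYPE="uint32" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metrics_by_host(xml_text):
    """Map host name -> {metric name: value} from gmond-style XML."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        metrics = {}
        for metric in host.iter("METRIC"):
            name, raw = metric.get("NAME"), metric.get("VAL")
            kind = metric.get("TYPE", "")
            # Convert numeric types; keep anything else as a string.
            if kind in ("float", "double"):
                metrics[name] = float(raw)
            elif kind.startswith(("int", "uint")):
                metrics[name] = int(raw)
            else:
                metrics[name] = raw
        result[host.get("NAME")] = metrics
    return result


def overloaded(xml_text, threshold):
    """Hosts whose one-minute load average exceeds the threshold."""
    return sorted(h for h, m in metrics_by_host(xml_text).items()
                  if m.get("load_one", 0.0) > threshold)


print(overloaded(SAMPLE, 4.0))  # ['n1']
```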

Trends in Cluster Computing

Hardware


Heterogeneous Computing: GPUs + CPUs


Massive processing power introduces an I/O challenge

>
Getting data to and from the processing units can take as long as
the processing itself

>
Requires careful software design and a deep understanding of the
algorithms and of the architecture of:


Processors (cache effects, memory bandwidth)


GPU accelerators


Interconnects (Ethernet, InfiniBand, 10 Gigabit Ethernet)


Storage (local disks, NFS, parallel file systems)
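The claim that moving data can take as long as processing it can be checked with simple arithmetic. The sketch below is a deliberately naive offload model: data is transferred first, then computed, with no overlap, latency, or caching. The 5 GB/s link and 500 GFLOP/s device rates are hypothetical round numbers, not specifications of any product in this deck.

```python
def transfer_and_compute_seconds(bytes_moved, flops, link_gb_s, device_gflops):
    """Naive offload model: move the data, then compute, with no overlap."""
    transfer = bytes_moved / (link_gb_s * 1e9)
    compute = flops / (device_gflops * 1e9)
    return transfer, compute


def transfer_share(bytes_moved, flops, link_gb_s=5.0, device_gflops=500.0):
    """Fraction of total offload time spent moving data (hypothetical rates)."""
    transfer, compute = transfer_and_compute_seconds(
        bytes_moved, flops, link_gb_s, device_gflops)
    return transfer / (transfer + compute)


# Single-precision vector add c = a + b: 3 arrays of 4-byte floats, 1 flop each.
n = 100_000_000
vector_add = transfer_share(bytes_moved=3 * 4 * n, flops=n)

# Single-precision dense matrix multiply: 3 m x m matrices, about 2 m^3 flops.
m = 4096
matmul = transfer_share(bytes_moved=3 * 4 * m * m, flops=2 * m ** 3)

print(f"vector add: {vector_add:.1%} of time is data movement")
print(f"matrix multiply: {matmul:.1%} of time is data movement")
```

The deciding factor is how many operations are performed per byte moved. The vector add spends almost all of its time on transfers. The matrix multiply reuses each element many times, so computation dominates.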



Application Case Study: ANSYS /
Acceleware


ANSYS Direct Sparse Solver (DSS): Single System Mode


Matrix Decomposition offloaded to
NVIDIA Tesla C1060
GPU Accelerator


ANSYS standard benchmark BM-7


500K-1750K DoF


Overall speedup of up to 3.7X for single-precision runs and 2.7X
for double-precision runs
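Only the matrix decomposition runs on the GPU, so the share of runtime that stays on the CPU limits the overall speedup. Amdahl's law makes that limit explicit; this framing is ours and does not appear in the source. If a fraction p of the original runtime is accelerated s-fold, the overall speedup is 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p). The sketch below inverts that bound to find the smallest offloaded fraction consistent with each reported speedup. It yields a lower bound only; it does not state the actual fraction measured in the benchmark.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is accelerated s-fold."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


def min_offloaded_fraction(overall):
    """Smallest p that could reach `overall` even with an infinitely fast device."""
    return 1.0 - 1.0 / overall


for label, overall in (("single precision", 3.7), ("double precision", 2.7)):
    print(f"{label}: offloaded fraction p >= {min_offloaded_fraction(overall):.2f}")
```

Under this model, a 3.7X overall speedup means that at least about 73% of the original runtime was spent in the offloaded decomposition. For 2.7X, the figure is at least about 63%.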



A Sample of Penguin’s Advanced Compute Offerings


NVIDIA Tesla S1070 GPU Accelerator

>
Four processors, 240 cores each

>
Native double precision floating point support

>
Supports NVIDIA’s CUDA API


Niveus HTX Personal Supercomputer

>
Engineered to support Tesla coprocessors

>
720 GPU cores



Relion

Intel 1702

>
1U chassis housing two independent x86 nodes


>
Two Xeon 5500 Series 'Nehalem' processors per
node

>
Up to 96GB of RAM on each node

Thank You


Application Case Study: ANSYS /
Acceleware


ANSYS

>
Direct Sparse Solver (DSS): SMP/Single System Mode


Acceleware

Plug-In for ANSYS

>
Matrix Decomposition offloaded to
NVIDIA Tesla C1060 GPU Accelerator


Benchmark

>
ANSYS standard benchmark BM-7


500K-1750K Degrees of Freedom (DoF)

>
Intel Xeon E5405


Dual-core runs

>
Overall speedup of up to 3.7X for single-precision runs and 2.7X for double-precision runs



Integrated Management Framework


One web-based interface to all HPC cluster components


Follows Scyld ‘Ease-of-Use’ Philosophy


Integrates existing tools, e.g. IPMI, Ganglia, TORQUE



A Sample of our 2500+ Customers

National Labs

Aerospace/Defense

Universities/Institutions

Enterprise


Hardware Effects: Multicore and Multithreading


Moore’s Law: the number of transistors on an integrated
circuit doubles roughly every 18 to 24 months


However, clock speeds are no longer scaling


Multicore and multithreaded programming is critical for
continued software scalability
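The arithmetic behind this argument is exponential growth with a fixed doubling period T, where N_0 is the transistor count at t = 0 and t is measured in years. The worked example below compares T = 1.5 years (the 18-month figure) with T = 2 years over a six-year span. Suppose the extra transistors become additional cores at a flat clock rate, as the slide suggests. In that case, a single-threaded program sees essentially none of this growth.

```latex
N(t) = N_0 \cdot 2^{t/T}
\qquad
\left.\frac{N(6)}{N_0}\right|_{T=1.5} = 2^{6/1.5} = 2^{4} = 16,
\qquad
\left.\frac{N(6)}{N_0}\right|_{T=2} = 2^{6/2} = 2^{3} = 8
```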


Rather than reinventing the wheel, use existing frameworks and tools

>
OpenMP

>
MPI

>
Threading Building Blocks

>
ATLAS, FFTW, MKL, ACML, etc.
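Frameworks such as OpenMP and MPI package a common pattern: decompose the data, compute partial results concurrently, then reduce them. The sketch below shows that pattern with Python's standard `concurrent.futures` module; the function names are hypothetical. It illustrates the structure only. CPython's global interpreter lock prevents threads from running pure-Python arithmetic in parallel. Real speedups come from the compiled frameworks listed above (in C, for example, `#pragma omp parallel for reduction(+:sum)` or `MPI_Reduce`) or from process-based pools.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous, nearly equal [start, stop) blocks."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(data, start, stop):
    # Each worker reads only its own block: no shared mutable state.
    return sum(x * x for x in data[start:stop])


def parallel_sum_of_squares(data, workers=4):
    """Decompose, compute partial results concurrently, then reduce."""
    bounds = chunk_bounds(len(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: partial_sum_of_squares(data, *b), bounds)
        return sum(partials)  # the final reduction step


print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```

Keeping each worker on a disjoint block and combining results only at the end mirrors what an OpenMP reduction clause or an MPI reduction does. This avoids locks on a shared accumulator.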