Palacios and Memory - The Prognostic Lab

14 Dec 2013

Multi-stack System Software

Jack Lange

Assistant Professor

University of Pittsburgh

Summary


Commodity and HPC systems have been converging


Commodity off the shelf components


Linux based HPC systems


Cloud computing



Problem: Real HPC applications need HPC environments


Tightly coupled, massively parallel, and synchronized


Current services must provide dedicated HPC systems



Can we co-host HPC applications on commodity systems?



Dual Stack Approach


Provision the underlying software stack along with application


Commodity stack should handle commodity applications


HPC stack can provide HPC environment








Current systems do support this, but…


Interference still exists inside the system software


Inherent feature of commodity systems
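
As a rough illustration of the user-space partitioning that current systems already support, the sketch below splits a two-socket, eight-core node into two zones and shows how a process could be pinned to one of them. The helper names, the contiguous core numbering, and the choice of which socket hosts the HPC zone are assumptions made for this example; they are not taken from the talk.

```python
import os


def partition_cores(num_sockets: int, cores_per_socket: int) -> dict[int, set[int]]:
    """Map each socket index to its core IDs, assuming contiguous numbering."""
    return {
        s: set(range(s * cores_per_socket, (s + 1) * cores_per_socket))
        for s in range(num_sockets)
    }


def pin_current_process(cores: set[int]) -> None:
    """Restrict the calling process to the given cores (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not available on this platform")
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process


# Hypothetical layout: socket 0 hosts the HPC zone, socket 1 the commodity zone.
zones = partition_cores(num_sockets=2, cores_per_socket=4)
hpc_cores, commodity_cores = zones[0], zones[1]
print("HPC zone cores:", sorted(hpc_cores))
print("Commodity zone cores:", sorted(commodity_cores))
```

Affinity of this kind only controls where user processes run. Both zones still share one kernel and its internal state, which is the interference the slide refers to.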

[Figure: a two-socket node. Socket 1 holds cores 1-4 and Socket 2 holds cores 5-8, each with its own memory; the node is divided into an HPC partition and a commodity partition.]

User Space Partitioning

HPC vs. Commodity Systems


Commodity systems have a fundamentally different focus than HPC systems


Amdahl’s vs. Gustafson’s laws


Commodity: Optimized for common case



HPC: Common case is not good enough


At large (tightly coupled) scales, percentiles lose meaning


Collective operations must wait for slowest node


1% of nodes can make 99% suffer


HPC systems must optimize outliers (worst case)
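
A small calculation makes the "1% of nodes can make 99% suffer" point concrete. It assumes, purely for illustration, that each node is independently slow with probability 1% during a given collective operation; the function name and the node counts are hypothetical.

```python
def prob_any_straggler(num_nodes: int, p_slow: float) -> float:
    """Probability that at least one of num_nodes independent nodes is slow."""
    return 1.0 - (1.0 - p_slow) ** num_nodes


# A collective operation finishes only when its slowest participant does,
# so one slow node is enough to delay every other node.
for n in (1, 100, 1000, 10000):
    p = prob_any_straggler(n, 0.01)
    print(f"{n:>6} nodes: P(collective delayed) = {p:.4f}")
```

Under this assumption a 100-node collective is delayed roughly 63% of the time, and at a thousand nodes a delay is close to certain, which is why the outliers rather than the common case set the pace.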

Dual Stack Approach


Partition


Segment the underlying hardware resources


Assign them exclusively to specific workloads


Isolate


Prevent interference from other workloads


Hardware: partitions must be coarse grained


Software: eliminate shared state



Implementation


Independent system software running on isolated resources

HPC in the cloud


Clouds are starting to look like supercomputers…


Are we seeing a convergence?


Not yet


Noise issues


Poor isolation


Resource contention


Lack of control over topology



Very bad for tightly coupled parallel apps


Require specialized environments that solve these problems



Approaching convergence


Vision: Dynamically partition cloud resources into HPC and commodity zones


This talk: partitioning compute nodes with performance isolation


Commodity VMMs


Virtualization is considered an “enterprise” technology


Designed for commodity environments


Fundamentally different, but not wrong!



Example: KVM architecture issues


Userspace handlers


Fairly complex memory management


Locking and periodic optimizations


Presence of system noise

Palacios VMM


OS-independent embeddable virtual machine monitor


Established compatibility with Linux, Kitten, and Minix



Specifically targets HPC applications and environments


Consistent performance with very low variance



Deployable on supercomputers, clusters (Infiniband/Ethernet), and servers


0-3% overhead at large scales (thousands of nodes)


VEE 2011, IPDPS 2010, ROSS 2011

http://www.v3vee.org/palacios

Open source and freely available


Palacios/Linux


Palacios/Linux provides lightweight and high performance virtualized environments


Internally manages dedicated resources


Memory and CPU scheduling


Does not bother with “enterprise features”


Page sharing/merging, swapping, overcommitting resources




Palacios enables scalable HPC performance on commodity platforms

VMM Comparison


Primary difference: Consistency


Requirement for tightly coupled performance at large scale



Example: KVM nested paging architecture


Maintains free page caches to optimize performance


Requires cache management


Shares page tables to optimize memory usage


Requires synchronization

VMM        % of exits   Mean    Std Dev   # NPFs
KVM        52%          8804    5232      3,265,156
Palacios   50%          10876   2685      1,872,017
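
One way to read the comparison figures above in terms of consistency is the coefficient of variation (standard deviation divided by mean). The sketch below uses only the Mean and Std Dev values listed for each VMM; the slide does not state their unit, and the metric itself is this example's choice rather than something taken from the talk.

```python
# (mean, standard deviation) per VMM, as listed in the comparison
measurements = {
    "KVM": (8804, 5232),
    "Palacios": (10876, 2685),
}


def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Relative spread: standard deviation as a fraction of the mean."""
    return std_dev / mean


cv = {vmm: coefficient_of_variation(m, s) for vmm, (m, s) in measurements.items()}
for vmm, value in cv.items():
    print(f"{vmm:<9} CV = {value:.2f}")
```

By this measure Palacios has the higher mean but less than half the relative spread of KVM, which matches the slide's claim that the primary difference is consistency.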

[Figure: two software stacks on shared hardware. HPC stack: Palacios VMM, Palacios Resource Managers, HPC Linux, HPC Application. Commodity stack: KVM, Commodity Linux, Commodity Application(s). Also shown: Linux Kernel and Linux Module Interface.]

Dual Stack Architecture



Partitioning at the OS level


Enable cloud to host both commodity and HPC apps


Each zone optimized for the target applications

Evaluation


Goal: Measure VM isolation properties


Partitioned a single node into HPC and commodity zones


Commodity Zone: Parallel Kernel compilation


HPC Zone: Set of standard HPC benchmarks


System:


Dual 6-core AMD Opteron with NUMA topology


Linux guest environments (HPC and commodity)



Important: Local node only


Does not promise good performance at scale


But poor performance will magnify at large scales

Results

Palacios delivers consistent performance

Commodity VMMs degrade with contention

MiniFE: Unstructured implicit finite element solver

Mantevo Project -- https://software.sandia.gov/mantevo/index.html

Discussion


A dual stack approach can provide HPC environments on commodity clouds


HPC and commodity workloads can dynamically share resources


HPC requirements can be met without fully dedicated resources




Networking is still an open issue


Need mechanisms for isolation and partitioning


Need high performance networking architectures


1Gbit is not good enough


10Gbit is good, Infiniband is better


Need control over placement and topologies

Multi-stack Operating Systems


Future Exascale systems are moving towards in situ organization


Applications traditionally have utilized their own platforms


Visualization, storage, analysis, etc.



Everything must now collapse onto a single platform

What this means for the OS


At Petascale we could optimize each environment separately


Each had their own OS and hardware


At Exascale workloads will be co-located


Can a single OS handle all workloads effectively?



Probably Not


Each has different resource requirements and behaviors


Exascale will need to support multiple OS environments on the same hardware

Beyond Virtualization


Virtualization imposes overhead


Power: requires transistors

Performance: small, but present

Interference: Still some dependencies on host OS



Might not be available on exascale hardware



Can we provide native partitioning?


We think so


Linux provides the ability to dynamically remove resources (CPUs, memory, etc.)


These can be taken over by a second OS
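
A minimal sketch of what removing CPUs from a running Linux system can look like, using the kernel's CPU hotplug files under `/sys/devices/system/cpu`. The slides do not say which interface the authors use, so treat this as one plausible route; the helper names and the chosen core numbers are hypothetical. The example only builds and prints the planned writes; `apply_plan` would need root privileges and is deliberately not called.

```python
from pathlib import PurePosixPath

SYSFS_CPU = PurePosixPath("/sys/devices/system/cpu")


def cpu_online_path(cpu: int) -> PurePosixPath:
    """Path of the hotplug control file for one logical CPU."""
    return SYSFS_CPU / f"cpu{cpu}" / "online"


def plan_offline(cpus):
    """Return (path, value) pairs; writing "0" takes a CPU offline."""
    return [(str(cpu_online_path(c)), "0") for c in cpus]


def apply_plan(plan):
    """Perform the writes. Requires root on a hotplug-capable kernel."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


# Hypothetical choice: hand cores 4-7 to the second OS.
plan = plan_offline([4, 5, 6, 7])
for path, value in plan:
    print(f"echo {value} > {path}")
```

Once offlined, the cores are no longer scheduled by Linux, which is the precondition for a second OS taking them over.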

[Figure: two software stacks on shared hardware. HPC side: Palacios VMM, Kitten, HPC Application. Commodity side: Linux, Commodity Application(s).]

Para-native Architecture




Provide LWK environment on a commodity system


Each zone optimized for the target applications

Approach


OS partition created via offlined resources


CPUs, memory, PCI devices


Secondary OS “booted” on offline resources



Issues:


OS initialization


Boot process


Resource discovery


Coordination and communication


Security and safety

Dual Stack Memory


Maybe we don’t need to provide an entirely separate OS


Instead selectively manage some resources



Dual stack memory


Provide a separate memory management layer to Linux



Features


Selectively manage heap per application


Provide applications with direct control over memory layout


Transparently back memory using large pages


Without overhead added by Linux
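
To give a sense of why backing memory with large pages matters, the arithmetic below counts how many pages are needed to cover a heap. It assumes the common x86-64 page sizes of 4 KiB and 2 MiB and a hypothetical 1 GiB heap; neither figure comes from the slides. If each page is faulted in on first touch, the page count is also the number of initial page faults.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages required to cover a region (ceiling division)."""
    return -(-region_bytes // page_bytes)


heap_bytes = 1 * GIB  # hypothetical heap size
small_pages = pages_needed(heap_bytes, 4 * KIB)
large_pages = pages_needed(heap_bytes, 2 * MIB)

print(f"4 KiB pages: {small_pages}")
print(f"2 MiB pages: {large_pages}")
print(f"Reduction factor: {small_pages // large_pages}x")
```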


[Figure: stack diagram with Hardware, Linux (labeled Commodity Linux), Lightweight Memory Management, HPC Application, and Commodity Application(s).]

Dual Stack Architecture

Provide LWK memory manager on a commodity OS

Performance Comparison

[Figure: two panels, Linux Memory Management and Lightweight Memory Management. Annotations: initial on-demand page faults (500,000-600,000 cycles); occasional outliers (large page coalescing); low-level noise.]

Conclusion


Commodity systems are not designed to support HPC workloads


Different requirements and behaviors than commodity applications



A multi-stack approach can provide HPC environments in commodity systems


HPC requirements can be met without separate physical systems


HPC and commodity workloads can dynamically share resources


Isolated system software environments

Thank you

Jack Lange

Assistant Professor

University of Pittsburgh

jacklange@cs.pitt.edu

http://www.cs.pitt.edu/~jacklange