Statement of High-Performance Computing Facilities at Michigan State University


Compute Resources

The High Performance Computing Center (HPCC) presently maintains five clusters, along with additional testing nodes. A sixth cluster (intel13) has been ordered and will be installed by November 2013. Each cluster has a different combination of memory and interconnects; a summary is shown in Table 1.

All nodes run Red Hat Enterprise Linux 6.3 and are binary compatible with each other. The HPCC also manages job scheduling on idle university computers (approximately 6,000 cores) using a Condor scheduler.
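
Users interact with a Condor pool through a short submit description file. The block below is a minimal sketch of a vanilla-universe job; the executable, argument, and output file names are hypothetical placeholders rather than an HPCC-specific recipe.

    # Minimal Condor submit description; all file names are hypothetical.
    universe     = vanilla
    executable   = analyze.sh
    arguments    = input.dat
    output       = job.out
    error        = job.err
    log          = job.log
    request_cpus = 1
    queue 1

Passing this file to condor_submit queues one copy of the job, and Condor matches it to an idle machine that satisfies the resource request.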

Cluster   Nodes  Sockets/node  Processors        Cores/node  Memory/node  Accelerators        Interconnect
intel13   112    2             Xeon E5-2670 v2   20          64 GB        --                  FDR Infiniband
          24     2             Xeon E5-2670 v2   20          256 GB       --                  FDR Infiniband
          8      2             Xeon E5-2670 v2   20          512 GB       --                  FDR Infiniband
          28     2             Xeon E5-2670 v2   20          128 GB       2x Xeon Phi 5110P   FDR Infiniband
          40     2             Xeon E5-2670 v2   20          128 GB       2x Nvidia K20c      FDR Infiniband
intel11   2      8             Xeon E7-8837      64          2 TB         --                  QDR Infiniband
          1      4             Xeon E7-8837      32          1 TB         --                  QDR Infiniband
          2      4             Xeon E7-8837      32          512 GB       --                  QDR Infiniband
intel10   192    2             Xeon E5620        8           24 GB        --                  QDR Infiniband
gfx10     32     2             Xeon E5530        8           18 GB        2x Nvidia M1060     Gigabit Ethernet
amd09     4      8             Opteron 8384      32          256 GB       --                  SDR Infiniband
intel07   126    2             Xeon E5345        8           8 GB         --                  SDR Infiniband

Table 1: The theoretical peak for the entire system (consisting of over 7,400 CPU cores, 200 accelerators, and 37 TB of memory) is approximately 290 TFlops.
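
As a rough consistency check on the quoted peak, the per-cluster contributions can be multiplied out from Table 1. The clock rates and FLOPs-per-cycle values below are assumptions inferred from the listed processor and accelerator models (the document does not state them), so this is a ballpark sketch rather than the HPCC's own calculation.

    # Back-of-the-envelope check of the ~290 TFlops figure in Table 1.
    # Clock rates (GHz) and double-precision FLOPs/cycle are assumed from
    # the listed hardware models; none of them appear in the document.
    cpu_partitions = [              # (total cores, GHz, DP FLOPs/cycle)
        (212 * 20, 2.5,  8),        # intel13: Xeon E5-2670 v2 (AVX)
        (224,      2.67, 4),        # intel11: Xeon E7-8837
        (1536,     2.4,  4),        # intel10: Xeon E5620
        (256,      2.4,  4),        # gfx10:   Xeon E5530
        (128,      2.7,  4),        # amd09:   Opteron 8384
        (1008,     2.33, 4),        # intel07: Xeon E5345
    ]
    accelerators = [                # (count, peak DP TFlops each)
        (56, 1.01),                 # Xeon Phi 5110P
        (80, 1.17),                 # Nvidia K20c
        (64, 0.078),                # Nvidia M1060
    ]
    cpu_tf = sum(c * ghz * f for c, ghz, f in cpu_partitions) / 1000.0
    acc_tf = sum(n * tf for n, tf in accelerators)
    print("CPU ~%.0f TFlops + accelerators ~%.0f TFlops = ~%.0f TFlops"
          % (cpu_tf, acc_tf, cpu_tf + acc_tf))

With nominal clocks this lands near 270 TFlops; turbo frequencies and the additional testing nodes plausibly make up the gap to the quoted 290 TFlops.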

Storage Resources

The HPCC provides secure, high-speed file spaces to users. Research spaces and home directories are replicated offsite nightly, and hourly ZFS snapshots are available.

The replicated file spaces feature 1.8 GB/s writes and 6 GB/s reads. Data integrity and availability from our replicated file spaces are guaranteed for four years. The high-speed parallel Lustre file space features 5 GB/s writes and 8 GB/s reads.
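
Because snapshots are taken hourly, an accidentally deleted or overwritten file can often be recovered without administrator intervention. The sketch below assumes the common ZFS convention of a hidden .zfs/snapshot directory at the mount root; whether the HPCC mounts expose that path, and the user and file names shown, are assumptions for illustration.

    import os

    # Search hourly ZFS snapshots for copies of one file. The ".zfs/snapshot"
    # location is the usual ZFS convention; the mount point, user name, and
    # file name below are hypothetical.
    snap_root = "/mnt/home/.zfs/snapshot"
    relative = os.path.join("someuser", "results.csv")

    for snap in sorted(os.listdir(snap_root)):   # one directory per snapshot
        candidate = os.path.join(snap_root, snap, relative)
        if os.path.exists(candidate):
            print("recoverable copy:", candidate)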

Each compute node also has an attached local disk that can be used within a Hadoop-on-demand configuration for data-intensive jobs. Summaries of file spaces are shown in Table 2.

Class               Path                         Capacity  Backups  Purge
Replicated Storage  /mnt/home and /mnt/research  722 TB    Daily    None
High-Speed Lustre   /mnt/scratch                 364 TB    None     After 45 days
Local Disk          /mnt/local                   210 TB    None     After 14 days

Table 2: A summary of local file systems managed by the HPCC.

Personnel

In addition to managing the university supercomputers through the HPCC, the Institute for Cyber-Enabled Research (iCER) employs several research specialists who are available to provide one-on-one consulting for advanced research computing. The research specialists have extensive experience with complex workflows and numerical libraries, and are familiar with many different parallel programming paradigms. Collectively, their specializations include bioinformatics, image vision, and numerical methods.