Overview – Research Computing - Information Technology Services


Oct 4, 2013


Overview of Research Computing

ITS Research Computing

Mark Reed

Overview


Research Computing


Resources



Services



Projects

ReCo Resources


Computational Resources


compute clusters: Killdevil, Kure

Special purpose servers: galaxy, bioapps, sapientia, ICISS, eruditio


Software


licensed


open source


Data Storage


Virtual Computing Lab (VCL)


Access to National Resources


ReCo Services


Technical Support


Training and Development


Engagement and Collaboration


Research Database Support


Secure Data Exchange


Data Grids


iRODS


Desktop Support - THL



ReCo Projects


EFRC



HTS and Seqware



Digital Humanities


Resources

Compute Cluster Advantages


fast interconnect, tightly coupled


aggregated resources


compute cores


memory


installed software base


high availability


large (scratch) file spaces


scheduling and job management


data backup

Multi-Purpose Killdevil Cluster


High Performance Computing


Large parallel jobs, high speed interconnect


High Throughput Computing (HTC)


high volume serial jobs


Large memory jobs


special nodes for extreme memory


GPGPU computing


computing on Nvidia processors

Killdevil Nodes


Three types of nodes:


compute nodes


large memory nodes


GPGPU nodes

Killdevil Compute Cluster

Heterogeneous Research Cluster

Dell Blades

700+ compute nodes, mostly Xeon 5670 2.93 GHz

9600 cores

Nehalem Microarchitecture

Dual socket, hex core and oct core

48 GB memory

some higher memory nodes

GPGPU Nodes

64 Nvidia Tesla M2070

Extreme Memory Nodes

two 1 TB nodes, 32 cores

Infiniband 4x QDR Interconnect

priority usage for patrons

Buy in is cheap

Storage

large Lustre scratch file system, IB connected

/netscr


Kure

An HPC/HTC research compute cluster in RC

Named after the beach in North Carolina

It’s pronounced like the Nobel prize winning physicist and chemist, Madame Curie

Kure Compute Cluster


Heterogeneous Research Cluster

Hewlett Packard Blades

200+ compute nodes, mostly Xeon 5560 2.8 GHz

Nehalem Microarchitecture

Dual socket, quad core

48 GB memory

over 1800 cores

some higher memory nodes

Infiniband 4x QDR

priority usage for patrons

Buy in is cheap

Storage

/netscr, /proj
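
As a rough cross-check of the Kure figures above, the short Python sketch below derives cores per node and memory per core from the quoted specification. It assumes every node is a standard dual-socket, quad-core blade with 48 GB of memory; the cluster is described as heterogeneous and the node and core counts are lower bounds, so the results are estimates only. All variable names are illustrative.

```python
# Figures quoted on the Kure slide. "200+" and "over 1800" are lower bounds,
# and the cluster is heterogeneous, so these are rough estimates only.
SOCKETS_PER_NODE = 2      # dual socket
CORES_PER_SOCKET = 4      # quad core
MEMORY_PER_NODE_GB = 48   # standard node
MIN_NODES = 200           # "200+ compute nodes"
MIN_CORES = 1800          # "over 1800 cores"

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
memory_per_core_gb = MEMORY_PER_NODE_GB / cores_per_node

# Assumption: every node is a standard dual-socket, quad-core blade.
cores_from_min_nodes = MIN_NODES * cores_per_node
nodes_for_min_cores = MIN_CORES / cores_per_node

print(f"cores per node: {cores_per_node}")
print(f"memory per core (GB): {memory_per_core_gb}")
print(f"cores from {MIN_NODES} nodes: {cores_from_min_nodes}")
print(f"nodes needed for {MIN_CORES} cores: {nodes_for_min_cores}")
```

Under these assumptions a standard node offers 8 cores and 6 GB of memory per core, which is a useful figure when sizing per-process memory requests.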




Getting an account:

For Kure, Killdevil and Mass Storage


http://onyen.unc.edu



Subscribe to Services

Resources: Available Software

Licensed Software


over 20 licensed software applications (some are site or volume licensed, others restricted)

SAS, Matlab, Maple, Mathematica, Gaussian, Accelrys Materials Studio and Discovery Studio modules, Sybyl, Schrodinger, Stata, ArcGIS, NAG, IMSL, Totalview, Envi/IDL, JMP, and JMP Genomics

compilers (licensed and otherwise)

Intel, PGI, GNU, CUDA compiler


Large Installed Software Base


Numerous other packages provided for research and technical computing

including BLAST, PyMol, SOAP, PLINK, NWChem, R, Cambridge Structural Database, Amber, Gromacs, Petsc, Scalapack, Netcdf, Babel, Qt, Ferret, Gnuplot, Grace, iRODS, XCrySDen, and many more.


Mass Storage


long term archival storage


easy to access and use


“limitless” capacity



2 TB free


looks like ordinary disk file system

data is actually stored on tape

data is backed up

“To infinity … and beyond” - Buzz Lightyear

Virtual Computing Lab (VCL)


Collaboration with NC State to establish VCL infrastructure for UNC.

VCL provides on-demand access to high-end computing resources, via highly customized, virtual Windows and Linux machines.


Virtual Computing Lab (VCL)


Users can log on from anywhere at any time to make a reservation to use a machine


Lots of software available!


ArcGIS


SAS


MATLAB


Adobe


MS Office


LaTeX


SigmaPlot


MUCH MORE!


Go to http://vcl.unc.edu to sign on

For help, see the “Getting Started on VCL” webpage: http://help.unc.edu/CCM3_007680


Access to National Resources



XSEDE


NSF funded leadership class infrastructure at 11 partner sites.

Open Science Grid

national shared computing and storage resources in a common grid infrastructure

Services

Services: Training


Courses are offered in the following areas:


Introductions to HPC resources


Research Applications


Linux


General Computing


Parallel Programming


Courses are taught throughout the year by Research Computing. For listings and details, go to:


http://learnit.unc.edu/workshops


http://help.unc.edu/CCM3_008194


Services: Technical Support


Technical support in using RC resources is available

Support in compiling, porting, using tools, submitting jobs, using software packages, storage and data management, …

online web forms

email research@unc.edu

962-HELP (962-4357)


personal consultation


Engagement, Support and Collaboration



Research scientists with experience in computational chemistry, physics, grid computing, environmental modeling, mathematics, parallel computing and the life sciences are available for consultation and collaboration.

Digital Humanities Specialist

Extensive technical support for utilizing research computing resources.


Services: Secure Data Exchange



Capability to share secure and sensitive data using a secure “drop box” mechanism for anonymous or non-Onyen users or full FTP access for trusted Onyen accounts

Computing - challenges of flexibility needed for research and realities of cyber attacks

Networking - maximizing bandwidth for research endeavors vs. IPS/IDS inspection

Data - compliance requirements, data sharing, privacy, etc.


Services: Data Grids

iRODS



Distributed data storage using the integrated Rule-Oriented Data System (iRODS). iRODS provides scientists with a secure, scalable system that can support many aspects of research data management

Enables data grids/repositories whose policies are implemented and enforced through rules

Research Computing is experimenting with hosting iRODS collections as a service.

Collaborating with UNC Libraries, Institute for the Environment, and RENCI.


www.irods.org

Desktop Computing

TarHeel Linux

Desktop/Laptop Campus Machines

Build desktop machines tailored for the RC environment with additional customization by user.

Based on CentOS


Security Approved Build


nightly updates



Onyen



OpenAFS



Customized Applications



Firewall


http://tarheellinux.unc.edu


Kickstart Server for Linux Distribution in ITS Manning Machine Room

Linux Image Pull

Services: Research Database Support



Full time DB admin to support UNC research databases

over 20 UNC Research Databases for research production, training and development

clients include School of Pharmacy, Lineberger Comprehensive Cancer Center (LCCC), Computer Science, SILS, RENCI, Bioinformatics, Institute for the Environment, …




Projects

Energy Frontier Research Centers

http://www.er.doe.gov/bes/EFRC/index.html


Chemical Approaches to Artificial Photosynthesis. Modular Approach

1. Light absorption, sensitization

2. Electron transfer quenching

3. Vectorial electron/proton transfer, redox splitting

4. Catalysis of water oxidation and reduction

Meyer, Accounts of Chemical Research 1989, 22, 163.

Photosystem II

Meyer, et al. Inorg. Chem. 2005, 6802; Acc. Chem. Res. 1989, 163.

High Throughput Sequencing


The High Throughput Sequencing Facility (HTSF) provides core services primarily for

Lineberger Comprehensive Cancer Center (LCCC) and the TCGA (The Cancer Genome Atlas) project

RENCI

NIDA project (National Institute on Drug Abuse)

UNC life sciences

High Throughput Deep Sequencing
Infrastructure


~20 NextGen sequencers

Illumina HiSeq, Ion Torrent, …

RNAseq pipeline

DNAseq pipeline

Whole Genome pipeline

ChIP/FAIREseq pipeline

De novo assembly

Specialized Workflow Engine, Condor, LSF scheduling

High Throughput Deep Sequencing Infrastructure

Aggregation Server

Isilon 1.7 PB

Pipeline Manager

Processing Pipeline

Compute Nodes

Data Collection Infrastructure

MaPSeq meta scheduler running multiple pipelines


TCGA is a project to catalog genetic mutations responsible for cancer. UNC is one of twelve national centers

Processed over 4500 samples in support of TCGA to date

Have processed over 700 samples in a week

Goal is to process 10,000 unique samples total over five years
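
To put the TCGA figures above in perspective, the following Python sketch works out how many samples remain and how the quoted weekly peak compares with the average rate the five-year goal requires. It treats "over 4500" as exactly 4500, assumes the 700-per-week peak could be sustained, and uses 52 weeks per year; none of these assumptions comes from the slides, and the variable names are illustrative.

```python
import math

# Figures quoted on the slide; "over 4500" and "over 700" are lower bounds.
GOAL_SAMPLES = 10_000          # unique samples over five years
PROCESSED_SO_FAR = 4_500       # samples processed to date
PEAK_SAMPLES_PER_WEEK = 700    # best week reported
GOAL_YEARS = 5
WEEKS_PER_YEAR = 52            # assumption for the average-rate estimate

remaining_samples = GOAL_SAMPLES - PROCESSED_SO_FAR

# Whole weeks needed if the peak weekly rate could be sustained (assumption).
weeks_at_peak_rate = math.ceil(remaining_samples / PEAK_SAMPLES_PER_WEEK)

# Average weekly rate needed to reach the goal over the full five years.
average_rate_for_goal = GOAL_SAMPLES / (GOAL_YEARS * WEEKS_PER_YEAR)

print(f"remaining samples: {remaining_samples}")
print(f"weeks at peak rate: {weeks_at_peak_rate}")
print(f"average samples/week for goal: {average_rate_for_goal:.1f}")
```

Under these assumptions the remaining 5500 samples would take about 8 weeks at the peak rate, while the five-year goal needs an average of only about 38 samples per week, so the quoted peak leaves considerable headroom.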


Lumbee Familial Political Factions

Malinda Maynor Lowery, History

Brooklyn Renaissance Social Graph

Melissa Bullard, History

Ancient World Mapping Application

Questions and Comments?


For assistance with any of our services, please contact Research Computing

Email: research@unc.edu

Phone: 919-962-HELP

Submit help ticket at http://help.unc.edu