
Pittsburgh Supercomputing Center

RP Update July 16, 2009

Bob Stock




Associate Director


stock@psc.edu




© 2008 Pittsburgh Supercomputing Center

Center for Analysis & Prediction of Storms


Oklahoma/NOAA Spring Severe Weather
Forecast Experiment for 2009


CAPS used NICS (1 km) and PSC (4 km)


At PSC


from 4/20 to 6/5


Sunday-Thursday: reservations of 2000 cores for 10-12 hours starting at 10:30 a.m. (Eastern)


Large amounts of data generated: e.g., 66 terabytes ingested into the archive during May

2009 CAPS Spring Experiment on PSC BigBen


Data Access and Screening


Create Input Files


Create Job Scripts


Remap Radar Data [800 processors, 20 per radar]


Process Initial and Boundary Conditions


Run Weather Analysis [80 processors]


Create Ensemble Perturbations


Run WRF & ARPS Forecast Models [18 x 80 processors] (core-count arithmetic sketched after this list)


Extraction & reformatting of 2-D output


Archive of 3-D results, over 50 TB of data


Generate derived products


Data display and interrogation


Analysis and verification


Publication
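
The processor counts quoted above lend themselves to a quick check against the 2000-core reservations mentioned earlier. The sketch below is illustrative arithmetic only: the stage names and core counts are taken from this list, but how the stages actually overlapped on BigBen is not described here, so each stage is tallied independently.

    # Illustrative tally of per-stage core usage against the 2000-core reservation.
    # Stage names and core counts come from the workflow list above; how stages
    # overlapped in practice is not specified, so each is tallied on its own.
    RESERVATION_CORES = 2000

    stages = {
        "Remap radar data": 800,                    # 20 cores per radar => 40 radars
        "Weather analysis": 80,
        "WRF & ARPS ensemble forecasts": 18 * 80,   # 18 members x 80 cores = 1440
    }

    for name, cores in stages.items():
        share = 100.0 * cores / RESERVATION_CORES
        print(f"{name}: {cores} cores ({share:.0f}% of the reservation)")

Run as-is, this reports that the 18-member ensemble alone uses 1440 cores, roughly 72% of a 2000-core reservation, which is why the forecast step dominates the daily allocation.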

Sample 4-km Ensemble Forecast Products

18h Forecasts Valid 1800 UTC, May 8, 2009

[Figure panels: predicted probability-matched reflectivity; actual observed radar reflectivity; predicted spaghetti diagram of 35 dBZ reflectivity; predicted probability of reflectivity > 35 dBZ; Midwest zoom; all ensemble forecast members]


Enhancing Operations on Pople


Automatic Performance Measurement


Utilize the Performance Monitoring Unit (PMU)


Backfilling using Predictive Walltimes


Automatic Performance Measurement


Goal: Collect Intel Itanium 2 PMU stats for each job in order to:

Identify underperforming codes (MFLOPS)


Provide users with PMU stats for their runs


Based on open source package: Perfmon2


http://perfmon2.sourceforge.net/


Collection started for each job using pfmon


Counters collected: CPU_OP_CYCLES_ALL,
FP_OPS_RETIRED, L3_REFERENCES, L3_MISSES


Counter detail for each process and thread collected


Report issued from digested stats (a sketch of the digestion step follows this list)


Currently testing and evaluating load on system
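
As a rough illustration of the "digested stats" step, the sketch below turns the four counters named above into a per-job report, assuming the counters have already been summed over a job's processes and the job's wall-clock time is known. The function name, data layout, and example numbers are assumptions for illustration, not PSC's actual reporting tool.

    # Sketch: derive simple metrics (MFLOPS, L3 miss ratio) from Itanium 2 PMU
    # counters. Counter names match the slide; everything else is illustrative.
    def digest(counters, elapsed_seconds):
        """Summarize raw PMU counts for one job."""
        mflops = counters["FP_OPS_RETIRED"] / elapsed_seconds / 1e6
        l3_refs = counters["L3_REFERENCES"]
        l3_miss_ratio = counters["L3_MISSES"] / l3_refs if l3_refs else 0.0
        return {"MFLOPS": mflops, "L3 miss ratio": l3_miss_ratio}

    # Hypothetical job: 7.2e12 floating-point ops retired over 4 hours
    report = digest(
        {
            "CPU_OP_CYCLES_ALL": 2.1e13,
            "FP_OPS_RETIRED": 7.2e12,
            "L3_REFERENCES": 4.0e10,
            "L3_MISSES": 6.0e9,
        },
        elapsed_seconds=4 * 3600,
    )
    print(report)   # ~500 MFLOPS here; a site could flag jobs far below peak

A report like this is enough both to flag underperforming codes and to hand users a summary of their own runs, the two goals listed above.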


Backfilling using Predictive Walltimes


Goal: Maximize backfilling during drain for larger jobs


Problem: While draining for a large job, the machine sits idle because users overestimate
their job run times, so candidate backfill jobs appear not to fit in the drain window


Solution: Store estimated and actual run times for each job and statistically predict
job run times (a minimal sketch follows this list)


Statistically calculated run time is used to optimize backfilling
opportunities


Database stores actual and estimated walltimes for each job


Lightweight database engine, SQLite, used to store data


70,000 jobs in database


Database uses only 87 KB!


Scheduler uses data from database to select jobs for backfill


Still studying impact and benefits; shows promise
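
As a concrete illustration of the approach, the sketch below stores requested and actual walltimes in SQLite and predicts a new job's run time by scaling the request by the user's historical actual/requested ratio. The table layout, column names, and prediction rule are assumptions for illustration; the real scheduler integration on Pople is not shown.

    # Sketch of the walltime database plus a simple per-user predictor.
    # Schema and prediction rule are illustrative assumptions.
    import sqlite3

    conn = sqlite3.connect("walltimes.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id    TEXT PRIMARY KEY,
               user      TEXT,
               requested REAL,   -- requested walltime, seconds
               actual    REAL    -- actual walltime, seconds
           )"""
    )

    def record_job(job_id, user, requested, actual):
        """Store one finished job's requested and actual walltime."""
        conn.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?)",
                     (job_id, user, requested, actual))
        conn.commit()

    def predict_walltime(user, requested):
        """Scale the request by the user's mean actual/requested ratio;
        fall back to the raw request when there is no history."""
        row = conn.execute(
            "SELECT AVG(actual / requested) FROM jobs"
            " WHERE user = ? AND requested > 0",
            (user,),
        ).fetchone()
        ratio = row[0] if row and row[0] is not None else 1.0
        return min(requested, ratio * requested)

    # A backfill pass would then test candidate jobs against the drain window
    # using predict_walltime(...) instead of the raw, usually inflated, request.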



PSC at TG09: Organization


Shawn Brown: Science Track Co-Chair


Pallavi Ishwad: EOT Track Chair


Laura McGinnis: Student Program Chair


Shandra Williams: Communications
Committee Member in charge of signage


Mike Schneider: Wrote news items about
the conference




PSC at TG09: Participation


Phil Blood and Robin Flaus: Presented paper on Computation
Exploration (Comp Ex) program in EOT Track


Greg Foss: Presented visualizations in Visualization Showcase


Ed Hanna and Rob Light with Dave Hart (SDSC): Presented paper
on RDR in Technology Track


Anirban Jana and Sergiu Sanielevici with several people from other institutions:
Presented tutorial "Preparing Your Application for TeraGrid Beyond 2010"


Nick Nystrom with several people from other institutions: Presented tutorial
"Using Tools to Understand Performance Issues on TeraGrid Machines: IPM and the POINT Project"


Josephine Palencia: Presented poster "JWAN: PSC's Secure, Federated, Distributed
Lustre Filesystem on the WAN (TeraGrid)"