
Distributed & Parallel Computing Cluster

Patrick McGuigan

mcguigan@cse.uta.edu

2/12/04

DPCC Background


NSF funded Major Research Instrumentation
(MRI) grant


Goals


Personnel


PI


Co-PIs


Senior Personnel


Systems Administrator


DPCC Goals



Establish a regional Distributed and Parallel Computing Cluster at UTA (DPCC@UTA)


An inter-departmental and inter-institutional facility


Facilitate collaborative research that requires large-scale storage (terabytes to petabytes), high-speed access (gigabit or more) and massive processing (hundreds of processors)



DPCC Research Areas


Data mining / KDD


Association rules, graph mining, stream processing, etc.


High Energy Physics


Simulation, moving towards a regional DØ Center


Dermatology/skin cancer


Image database, lesion detection and monitoring


Distributed computing


Grid computing, PICO


Networking


Non-intrusive network performance evaluation


Software Engineering


Formal specification and verification


Multimedia


Video streaming, scene analysis


Facilitate collaborative efforts that need high-performance computing



DPCC Personnel


PI Dr. Chakravarthy


Co-PIs


Drs. Aslandogan, Das, Holder, Yu


Senior Personnel


Paul Bergstresser, Kaushik De, Farhad Kamangar, David Kung, Mohan Kumar, David Levine, Jung-Hwan Oh, Gergely Zaruba


Systems Administrator


Patrick McGuigan


DPCC Components


Establish a distributed memory cluster
(150+ processors)


Establish a symmetric (shared-memory) multiprocessor system


Establish large, shareable, high-speed storage (100s of terabytes)


DPCC Cluster as of 2/1/2004


Located in 101 GACB


Inauguration 2/23 as part of E-Week


5 racks of equipment + UPS



Photos (scaled for presentation)


DPCC Resources


97 machines


81 worker nodes


2 interactive nodes


10 IDE-based RAID servers


4 nodes support Fibre Channel SAN


50+ TB storage


4.5 TB in each IDE RAID


5.2 TB in FC SAN


DPCC Resources (continued)


1 Gb/s network interconnections


core switch


satellite switches


1 Gb/s SAN network


UPS


DPCC Layout (diagram, summarized): a Foundry FastIron 800 core switch connects worker nodes 1-32, the interactive nodes master.dpcc.uta.edu and grid.dpcc.uta.edu, RAID servers raid3, raid6, raid8, raid9 and raid10, and the GFS machines (gfsnode1, gfsnode2, gfsnode3 and a lockserver). The GFS machines reach the 5.2 TB ArcusII array through a Brocade 3200 Fibre Channel switch that also has open ports. Worker nodes 33-81 are split into five groups of about ten, each group sharing a Linksys 3512 satellite switch with one 6 TB IDE RAID server (raid1, raid2, raid4, raid5, raid7). A 100 Mb/s switch provides the campus connections.


DPCC Resource Details


Worker nodes


Dual Xeon processors


32 machines @ 2.4GHz


49 machines @ 2.6GHz


2 GB RAM


IDE Storage


32 machines @ 60 GB


49 machines @ 80 GB


Red Hat Linux 7.3 (2.4.20 kernel)



DPCC Resource Details (cont.)


RAID Server


Dual Xeon processors (2.4 GHz)


2 GB RAM


4 RAID controllers

2-port controller (qty 1): mirrored OS disks

8-port controller (qty 3): RAID5 with hot spare

24 × 250 GB disks

2 × 40 GB disks


NFS used to export storage to the worker nodes


DPCC Resource Details (cont.)


FC SAN


RAID5 Array


42 × 142 GB FC disks


FC Switch


3 GFS nodes


Dual Xeon (2.4 GHz)


2 GB RAM


Global File System (GFS)


Served to the cluster via NFS


1 GFS Lockserver


Using DPCC


Two nodes available for interactive use


master.dpcc.uta.edu


grid.dpcc.uta.edu


More nodes will likely be added to support other services (Web, DB access)


Access through SSH (version 2 client)


Freeware Windows clients are available (ssh.com)


File transfers through SCP/SFTP
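For example, from a Linux/Unix machine (the ssh.com Windows clients offer equivalent dialogs), a session might look like this; the username and file name are placeholders:

$ ssh -2 yourname@master.dpcc.uta.edu              # interactive login over SSH protocol 2
$ scp results.tar.gz yourname@grid.dpcc.uta.edu:   # copy a file to your home directory
$ sftp yourname@master.dpcc.uta.edu                # interactive file-transfer session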


Using DPCC (continued)


User quotas are not yet implemented on home directories, so be sensible in your usage.


Large data sets will be stored on the RAIDs (requires coordination with the sys admin)


All storage visible to all nodes.



Getting Accounts


Have your supervisor request an account


Account will be created


Bring ID to 101 GACB to receive password


Keep password safe


Login to any interactive machine


master.dpcc.uta.edu


grid.dpcc.uta.edu


Use the yppasswd command to change your password (see the example below)


If you forget your password


See me in my office (bring your ID) and I will reset your password


Call or e-mail me and I will reset your password to the original password
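For reference, a password change with yppasswd looks roughly like the following (the exact prompts depend on the yppasswd version installed):

$ yppasswd
Please enter old password:
Please enter new password:
Please retype new password: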


User environment


Default shell is bash


Change with ypchsh


Customize user environment using startup files


.bash_profile (login session)


.bashrc (non-login)


Customize with statements like:


export <variable>=<value>


source <shell file>


Much more information in the bash man page
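As a small illustration, a ~/.bash_profile might contain lines like the following (the MPICH path is the one used elsewhere in this talk; the editor setting is just an example):

# ~/.bash_profile -- read by bash for login sessions
export PATH=$PATH:/usr/local/mpich-1.2.5/bin   # put the MPICH tools on the PATH
export EDITOR=vi                               # example environment variable
# pull in the non-login settings as well
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi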



Program development tools



GCC 2.96


C


C++


Java (gcj)


Objective C


Chill


Java


Sun J2SDK 1.4.2
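Typical invocations (source file names here are placeholders):

$ gcc -o hello hello.c             # C
$ g++ -o hello hello.cpp           # C++
$ javac Hello.java && java Hello   # Sun J2SDK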


Development tools (cont.)


Python


python = version 1.5.2


python2 = version 2.2.2


Perl


Version 5.6.1


Flex, Bison, gdb…


If your favorite tool is not available, we’ll
consider adding it!


Batch Queue System


OpenPBS


Server runs on master


pbs_mom runs on worker nodes


Scheduler runs on master


Jobs can be submitted from any interactive node


User commands


qsub - submit a job for execution

qstat - determine status of job, queue, server

qdel - delete a job from the queue

qalter - modify attributes of a job


Single queue (workq)


PBS qsub



qsub used to submit jobs to PBS


A job is represented by a shell script


Shell script can alter environment and proceed
with execution


Script may contain embedded PBS directives


The script (not PBS) is responsible for starting parallel jobs



Hello World

[mcguigan@master pbs_examples]$ cat helloworld

echo Hello World from $HOSTNAME


[mcguigan@master pbs_examples]$ qsub helloworld

15795.master.cluster


[mcguigan@master pbs_examples]$ ls

helloworld helloworld.e15795 helloworld.o15795


[mcguigan@master pbs_examples]$ more helloworld.o15795

Hello World from node1.cluster



Hello World (continued)


Job ID is returned from qsub


Default attributes allow the job to run with:


1 Node


1 CPU


36:00:00 CPU time


Standard output and standard error streams are returned as files (e.g. helloworld.o15795 and helloworld.e15795 above)


Hello World (continued)


Environment of job


Defaults to the login shell (override with #! or the -S switch)


Login environment variable list with PBS additions:


PBS_O_HOST


PBS_O_QUEUE


PBS_O_WORKDIR


PBS_ENVIRONMENT


PBS_JOBID


PBS_JOBNAME


PBS_NODEFILE


PBS_QUEUE


Additional environment variables may be transferred using the “-v” switch


PBS Environment Variables


PBS_ENVIRONMENT


PBS_JOBCOOKIE


PBS_JOBID


PBS_JOBNAME


PBS_MOMPORT


PBS_NODENUM


PBS_O_HOME


PBS_O_HOST


PBS_O_LANG


PBS_O_LOGNAME


PBS_O_MAIL


PBS_O_PATH


PBS_O_QUEUE


PBS_O_SHELL


PBS_O_WORKDIR


PBS_QUEUE=workq


PBS_TASKNUM=1
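A quick way to see these for a real job is to submit a script that simply prints them, e.g. this minimal sketch:

#!/bin/sh
#PBS -N showenv
#PBS -j oe
# print a few of the PBS-supplied variables, then the complete set
echo "Job ID:      $PBS_JOBID"
echo "Job name:    $PBS_JOBNAME"
echo "Queue:       $PBS_QUEUE"
echo "Submit host: $PBS_O_HOST"
echo "Work dir:    $PBS_O_WORKDIR"
env | grep '^PBS_'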



qsub options


Output streams:

-e (error output path)

-o (standard output path)

-j (join error + output as either output or error)

Mail options

-m [aben] when to mail (abort, begin, end, none)

-M who to mail

Name of job

-N (15 printable characters max; first must be alphabetic)

Which queue to submit the job to

-q [name] (unimportant for now; there is a single queue)

Environment variables

-v pass specific variables

-V pass all environment variables of qsub to the job

Additional attributes

-W specify dependencies



Qsub options (continued)


-l switch used to specify needed resources

Number of nodes

nodes=x

Number of processors

ncpus=x

CPU time

cput=hh:mm:ss

Walltime

walltime=hh:mm:ss


See man page for pbs_resources


Hello World


qsub -l nodes=1 -l ncpus=1 -l cput=36:00:00 \
     -N helloworld -m a -q workq helloworld


Options can be included in script:


#PBS -l nodes=1
#PBS -l ncpus=1
#PBS -m a
#PBS -N helloworld2
#PBS -l cput=36:00:00

echo Hello World from $HOSTNAME



qstat


Used to determine status of jobs, queues, server


$ qstat


$ qstat <job id>


Switches


-u <user> list jobs of user

-f provides extra output

-n provides nodes given to job

-q status of the queue

-i show idle jobs
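The default listing looks roughly like the following (column widths and job-ID truncation vary with the PBS version; the job shown is illustrative):

[mcguigan@master pbs_examples]$ qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
15795.master     helloworld       mcguigan         00:00:00 R workq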


qdel & qalter


qdel used to remove a job from a queue


qdel <job ID>


qalter used to alter attributes of currently
queued job


qalter <job id> attributes (similar to qsub)
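For example, using the job ID returned by the earlier Hello World submission:

$ qalter -l cput=48:00:00 15795.master.cluster   # change the CPU-time request of a queued job
$ qdel 15795.master.cluster                      # remove the job from the queue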


Processing on a worker node


All RAID storage visible to all nodes


/dataxy, where x is the RAID ID and y is the volume (1-3)

/gfsx, where x is the GFS volume (1-3)


Local storage on each worker node


/scratch


Data-intensive applications should copy input data (when possible) to /scratch for processing and copy results back to RAID storage, as sketched below
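A minimal sketch of that pattern (the /data11 path and the analysis program are placeholders):

#!/bin/sh
#PBS -N scratch_example
#PBS -j oe
#PBS -l cput=01:00:00
# stage input from shared RAID storage to the node-local /scratch disk
cp /data11/myproject/input.dat /scratch/input.$PBS_JOBID
# run against the local copy
cd /scratch
~/bin/analyze input.$PBS_JOBID > output.$PBS_JOBID
# copy results back to RAID storage and clean up /scratch
cp output.$PBS_JOBID /data11/myproject/
rm -f input.$PBS_JOBID output.$PBS_JOBID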


Parallel Processing


MPI installed on interactive + worker nodes


MPICH 1.2.5


Path: /usr/local/mpich-1.2.5


Asking for multiple processors


-l nodes=x

-l ncpus=2x
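For example, to ask for two dual-processor worker nodes under this convention (myjob is a placeholder script):

$ qsub -l nodes=2 -l ncpus=4 myjob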




Parallel Processing (continued)


PBS node file created when job executes


Available to job via $PBS_NODEFILE


Used to start processes on remote nodes


mpirun


rsh


Using node file (example job)

#!/bin/sh
#PBS -m n
#PBS -l nodes=3:ppn=2
#PBS -l walltime=00:30:00
#PBS -j oe
#PBS -o helloworld.out
#PBS -N helloworld_mpi

NN=`cat $PBS_NODEFILE | wc -l`
echo "Processors received = "$NN

echo "script running on host `hostname`"
cd $PBS_O_WORKDIR
echo

echo "PBS NODE FILE"
cat $PBS_NODEFILE
echo

/usr/local/mpich-1.2.5/bin/mpirun -machinefile $PBS_NODEFILE -np $NN ~/mpi-example/helloworld


MPI


Shared Memory vs. Message Passing


MPI


C-based library that allows programs to communicate


Each cooperating execution is running the same program
image


Different images can do different computations based on the notion of rank


MPI primitives allow for construction of more sophisticated
synchronization mechanisms (barrier, mutex)


helloworld.c

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, size;
    char host[256];
    int val;

    val = gethostname(host, 255);
    if ( val != 0 ){
        strcpy(host, "UNKNOWN");
    }

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    printf( "Hello world from node %s: process %d of %d\n", host, rank, size );
    MPI_Finalize();
    return 0;
}


Using MPI programs



Compiling


$ /usr/local/mpich-1.2.5/bin/mpicc -o helloworld helloworld.c


Executing


$ /usr/local/mpich-1.2.5/bin/mpirun <options> \
      helloworld


Common options:


-np number of processes to create

-machinefile list of nodes to run on


Resources for MPI


http://www-hep.uta.edu/~mcguigan/dpcc/mpi


MPI documentation


http://www-unix.mcs.anl.gov/mpi/indexold.html


Links to various tutorials


Parallel programming course