MODULE - National Energy Research Scientific Computing Center


Dec 13, 2013

Douglas Jacobsen
Bioinformatics Computing Consultant, Genepool

Modules
Setting up your environment at NERSC

July 29, 2013

Topics

1. UNIX Environment Basics
2. Constructing a default environment, dotfiles
3. Introduction to Modules
4. Extension to Modules - ModulesReloaded
5. Using modules interactively
6. Using modules in a batch job
7. Constructing basic modules for your software
8. Constructing pipeline modules

Motivation for this training


The most common tickets at NERSC are issues with environment settings

/jgi/tools is being retired; old settings need to be changed!

The modules system on genepool has been updated to ease the transition and future production work

Example modulefiles are in:
  /global/projectb/shared/data/training/modules

The UNIX Environment


What is it? A key/value store for every process

What does the UNIX environment do for you?
  Controls which programs you can easily run
    PATH
    Many Linux systems have a default PATH of:
      PATH=/usr/local/bin:/usr/bin:/bin
  Sets up linking paths to allow your programs to run
    LD_LIBRARY_PATH
  Controls how your programs run
    MANPATH, PKG_CONFIG_PATH, PS1, OMPI_MCA_ras
  Really, the environment is a way for you to communicate with your programs
  Provides useful convenience variables on the command line and in scripts:
    SCRATCH, NERSC_HOST, BOOST_ROOT

The UNIX Environment: The Rules


Each process has its own environment

Each process can manipulate its own environment, but no others

A child process inherits its parent’s environment

A “login” shell reads special “dotfiles” which may reset parts of the environment

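[Figure: example process tree containing init, bash, ls, memtime, blastx, perl, /bin/sh, cat, and sort. The perl statement $data = `cat $file | sort` starts /bin/sh, which in turn starts cat and sort as child processes.]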
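The inheritance rules can be checked directly from a shell. The following is a minimal sketch; the variable names GREETING and LOCAL_ONLY are made up for the demonstration and are not part of any NERSC setup.

```shell
# A parent shell exports one variable and leaves another unexported.
export GREETING="hello"      # exported: part of the environment
LOCAL_ONLY="not exported"    # plain shell variable: not in the environment

# The child (a new sh process) sees only the exported variable.
child_view=$(sh -c 'echo "${GREETING:-unset}/${LOCAL_ONLY:-unset}"')
echo "child sees: $child_view"

# The child changes its own copy; the parent's value is untouched.
sh -c 'GREETING="changed in child"; export GREETING'
echo "parent still has: $GREETING"
```

The child gets a copy of the parent's environment at start-up, so nothing it does afterwards flows back to the parent.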

Looking at the environment

    $ env                   # dump the whole environment
    $ echo $NERSC_HOST      # just see NERSC_HOST
    $ echo $PATH            # view the compound variable PATH
    $ env | grep MODULE     # just variables with 'MODULE'

We’ll be looking at the environment a lot today; env and echo are two easy ways to interrogate the environment from either bash or tcsh

What shell are you using? (Hint: check $SHELL)

Changing the environment


bash (default on genepool)

    export MYVAR="test"           # when writing, don't use '$'
    echo $MYVAR                   # when reading, use '$'
    export PATH=$HOME/bin:$PATH   # prepend to your PATH
    export MYVAR="${MYVAR}2"      # append '2' to MYVAR

tcsh

    setenv MYVAR "test"
    echo $MYVAR
    setenv PATH $HOME/bin:$PATH
    setenv MYVAR "${MYVAR}2"
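Prepending to PATH every time a dotfile is sourced can pile up duplicate entries. One way to avoid that in bash is a small helper; path_prepend below is a hypothetical function written for this sketch, not something provided by NERSC.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH alone
        *) PATH="$1:$PATH" ;;        # otherwise prepend
    esac
    export PATH
}

path_prepend "$HOME/bin"
path_prepend "$HOME/bin"             # second call is a no-op
echo "$PATH"
```

Wrapping PATH in colons before matching means the test compares whole entries, so /home/me/bin does not falsely match /home/me/bin2.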

NERSC Dotfiles - Your default Environment Pt 1

When you first log in (or a batch script runs), a login shell is executed
  A login shell is generated for every job
  Even if you transmit your environment, the login shell environment is overlaid on top of the transmitted environment

A login shell sources special files in your home directory, your dotfiles

bash users (files evaluated in this order):
  $HOME/.profile            (read-only symlink, do not change)
  $HOME/.bash_profile.ext   (user customizable)
  $HOME/.bashrc             (read-only symlink, do not change)
  $HOME/.bashrc.ext         (user customizable)

tcsh users (files evaluated in this order):
  $HOME/.tcshrc             (read-only symlink, do not change)
  $HOME/.tcshrc.ext         (user customizable)
  $HOME/.login              (read-only symlink, do not change)
  $HOME/.login.ext          (user customizable)

zsh and ksh execute some dotfiles, but NERSC support is being phased out

/bin/sh does not properly source the dotfiles (BEWARE!)

Using Software and the UNIX Environment

Providing large-scale installations of software for many different users on an HPC system presents a number of challenges:
  Different users need different software and use different shells
  Some users need specific versions, including older versions
  All users need to access the software quickly and easily from “everywhere” [network-mounted, non-standard paths]

Providing a user interface for accessing that software can be challenging

Example: How would you use software installed in /usr/common/jgi/aligners/blast+/2.2.28

Answer: Add /usr/common/jgi/aligners/blast+/2.2.28/bin to PATH:
  csh:  setenv PATH /usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
  bash: export PATH=/usr/common/jgi/aligners/blast+/2.2.28/bin:$PATH
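Why prepending matters can be seen with two installations of the same program. This is a self-contained sketch: the tool name mytool and its two versions are invented, and both live in a temporary directory rather than under /usr/common.

```shell
# Two hypothetical installations of the same tool, in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/v1/bin" "$demo/v2/bin"
printf '#!/bin/sh\necho "mytool 1.0"\n' > "$demo/v1/bin/mytool"
printf '#!/bin/sh\necho "mytool 2.0"\n' > "$demo/v2/bin/mytool"
chmod +x "$demo/v1/bin/mytool" "$demo/v2/bin/mytool"

# Prepend v1, then v2: the directory added last is searched first.
export PATH="$demo/v1/bin:$PATH"
export PATH="$demo/v2/bin:$PATH"

hash -r 2>/dev/null || true   # forget any cached command locations
found=$(command -v mytool)
version=$(mytool)
echo "$found -> $version"

rm -rf "$demo"                # clean up the demonstration files
```

The shell walks PATH from left to right and runs the first match, so the order of entries decides which version a bare command name resolves to.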

What are Modules?

A “module” is something that can be loaded or unloaded dynamically into the environment.

Modules have a name
Modules have a version
  A module can have many versions
Modules can have a default version

To refer to the default version of a module, use: <name>
  e.g. module load gcc
To refer to a specific version of a module, use: <name>/<version>
  e.g. module load gcc/4.8.1

Modules Interactive Example


Basic Commands:
  module load <module id> [<module id> …]      Load a module
  module unload <module id> [<module id> …]    Remove a module
  module list                                  List all loaded modules
  module show <module id>                      See module effects
  module avail                                 See all modules
  module purge                                 Remove all modules

Try the following:
  Load the default blast+ module
  Load the latest version of the hdf5 module (hint: not default)
  Unload the above modules but leave the rest intact
  What effects does the jgitools module have?
  What versions of RSeQC are available on genepool? (try using grep)

Why didn’t grep work for the last step?
  module avail | grep RSeQC won’t work
  module communicates with you on stderr (stdout is used internally)

    dmj@genepool02:~$ module list
    Currently Loaded Modulefiles:
      1) modules                     7) mysql/5.0.96
      2) nsg/1.2.0                   8) PrgEnv-gnu/4.6
      3) uge/8.0.1                   9) perl/5.16.0
      4) jgitools/1.2.0             10) readline/6.2
      5) oracle_client/11.2.0.3.0   11) python/2.7.4
      6) gcc/4.6.3                  12) usg-default-modules/1.4
    dmj@genepool02:~$ module load blast+
    dmj@genepool02:~$ module load hdf5/1.8.11
    dmj@genepool02:~$ module list
    Currently Loaded Modulefiles:
      1) modules                     8) PrgEnv-gnu/4.6
      2) nsg/1.2.0                   9) perl/5.16.0
      3) uge/8.0.1                  10) readline/6.2
      4) jgitools/1.2.0             11) python/2.7.4
      5) oracle_client/11.2.0.3.0   12) usg-default-modules/1.4
      6) gcc/4.6.3                  13) blast+/2.2.26
      7) mysql/5.0.96               14) hdf5/1.8.11
    dmj@genepool02:~$ module unload blast+ hdf5
    dmj@genepool02:~$ module list
    Currently Loaded Modulefiles:
      1) modules                     7) mysql/5.0.96
      2) nsg/1.2.0                   8) PrgEnv-gnu/4.6
      3) uge/8.0.1                   9) perl/5.16.0
      4) jgitools/1.2.0             10) readline/6.2
      5) oracle_client/11.2.0.3.0   11) python/2.7.4
      6) gcc/4.6.3                  12) usg-default-modules/1.4
    dmj@genepool02:~$ module -t avail 2>&1 | grep RSeQC
    RSeQC/2.3.2
    RSeQC/2.3.6(default)
    dmj@genepool02:~$

More awkward in tcsh, but possible:

    (module -t avail) |& grep RSeQC
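The stderr behaviour can be reproduced without the module command at all. In the sketch below, fake_avail is a made-up stand-in that prints its listing to stderr, the same channel module avail uses for its listing; the module names in it are copied from the session above.

```shell
# Stand-in for a tool that, like 'module avail', writes its listing to stderr.
fake_avail() {
    printf 'RSeQC/2.3.2\nRSeQC/2.3.6(default)\nhdf5/1.8.11\n' >&2
}

# A plain pipe only carries stdout, so grep receives nothing.
# (stderr is discarded here only to keep the demo output quiet.)
plain=$(fake_avail 2>/dev/null | grep -c RSeQC || true)

# Redirecting stderr into stdout first lets grep see the listing.
merged=$(fake_avail 2>&1 | grep -c RSeQC || true)

echo "plain pipe matches: $plain, with 2>&1: $merged"
```

The order matters: 2>&1 has to be applied to the command on the left of the pipe, which is why the tcsh form needs the parentheses and |&.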

Basic Modules Functionality


Modules manipulate the environment

Loading can:
  Set an environment variable (possibly by replacing)
  Append (or prepend) to a compound environment variable
  Unset an environment variable
  *Can* execute a command (not recommended if the command changes the state of the system)

‘module unload’ reverses the effects of the ‘module load’

Which effects of a module might be irreversible?

Answer:
  Unloading a module that used setenv won’t restore the variable to the value it had before the load
  Multiple modules calling ‘setenv’ or ‘unsetenv’ on the same variable might lead to an inconsistent state (those modules should conflict)
  Executing system calls which change system state (e.g. xhost) is not trivially reversible by unloading the module
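The difference between a reversible prepend and an irreversible overwrite can be imitated in plain shell. This is only an illustration of the idea, not how the modules system is implemented; load_demo, unload_demo, DEMO_DIR, and DEMO_VAR are all invented for the sketch.

```shell
# Minimal imitation of what loading and unloading do to a compound variable.
DEMO_DIR=/opt/demo/1.0/bin
original_path=$PATH

load_demo() { PATH="$DEMO_DIR:$PATH"; }
unload_demo() {
    # Strip the one entry that load_demo added, leaving the rest in order.
    PATH=$(printf '%s' ":$PATH:" | sed "s|:$DEMO_DIR:|:|")
    PATH=${PATH#:}
    PATH=${PATH%:}
}

load_demo
unload_demo
if [ "$PATH" = "$original_path" ]; then
    echo "prepend was reversed cleanly"
fi

# A plain overwrite keeps no record of the earlier value.
DEMO_VAR="user value"       # value before the "load"
DEMO_VAR="module value"     # a setenv-style overwrite
unset DEMO_VAR              # the "unload": the user value cannot be recovered
```

Removing one known entry from a list is well defined; restoring an overwritten value would require remembering what was there before.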

Modules: conflicting and swapping


Some modules are incompatible
  E.g. both wublast and blast+ provide different blastn, blastx, etc. executables
  To prevent these modules from being simultaneously loaded, they conflict

    dmj@genepool02:~$ module load wublast
    dmj@genepool02:~$ module load blast+
    blast+/2.2.26(25):ERROR:150: Module 'blast+/2.2.26' conflicts with the currently loaded module(s) 'wublast/20060510'

Most of the time, only a single version of a module should be loaded at a time:
  e.g., it doesn’t make sense to load more than one version of gcc

Try:

    module purge            ## cleans everything out
    module load gcc
    module load gcc/4.8.1

Error? To change from gcc/4.6.3 (the default) to gcc/4.8.1 (the latest), swap!

    module swap gcc gcc/4.8.1
  -or-
    module swap gcc/4.8.1

Setting up your own modules


Modules are described by modulefiles
  One version per modulefile, in a directory named for the module

Collections of modules are found in $MODULEPATH
  Try looking at $MODULEPATH

Add your own modules directory:

    genepool$ mkdir $HOME/modules
    genepool$ mkdir $HOME/modules/my_first_module
    genepool$ module use $HOME/modules

Try looking at $MODULEPATH again

    genepool$ module avail my_first_module

Why doesn’t it show up?
  No modulefiles installed yet… next slide.

Simple modulefile (TOO SIMPLE)

    #%Module1.0
    ##

    ## Required internal variables
    set             name              gcc
    set             version           4.6.3
    set             root              /usr/common/usg/languages/$name/$version\_1

    ## List conflicting modules here
    conflict $name

    ## Software-specific settings exported to user environment
    prepend-path    PATH              $root/bin
    prepend-path    LD_LIBRARY_PATH   $root/lib
    prepend-path    LD_LIBRARY_PATH   $root/lib64
    prepend-path    PKG_CONFIG_PATH   $root/lib/pkgconfig
    setenv          GCC_DIR           $root

Annotations:
  #%Module1.0: module identifier string (REQUIRED)
  ##: comment
  set name / version / root: internal variables
  conflict $name: don’t load more than one gcc!
  prepend-path and setenv lines: the actual environment adjustments

WARNING: This example is simplified, do not use in production on genepool. Refer to later ModulesReloaded examples.

Modulefiles are written in (somewhat overloaded) TCL.
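To make the directory layout concrete, the sketch below creates a private module tree with one modulefile. It uses a temporary directory instead of $HOME/modules so that it has no side effects, and the module's root path is a placeholder that would have to point at a real installation. The module commands run only if module is defined in the current shell.

```shell
# Lay out a private module tree in a scratch directory.
modroot=$(mktemp -d)
mkdir -p "$modroot/my_first_module"

# One modulefile per version, named for the version, inside the module's directory.
cat > "$modroot/my_first_module/1.0" <<'EOF'
#%Module1.0
## Hypothetical module: point 'root' at a real installation before use.
set name    my_first_module
set version 1.0
set root    /path/to/$name/$version

conflict $name
prepend-path PATH $root/bin
setenv MY_FIRST_MODULE_DIR $root
EOF

# If the module command exists in this shell, register the tree and list it.
if command -v module >/dev/null 2>&1; then
    module use "$modroot"
    module avail my_first_module 2>&1 || true
fi

first_line=$(head -n 1 "$modroot/my_first_module/1.0")
echo "$first_line"
```

Like the slide's example, this is the plain form of a modulefile and leaves out the ModulesReloaded include code.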

Common Environment Variables in Modules

Modules for software packages commonly set:
  PATH
  LD_LIBRARY_PATH
  PYTHONPATH
  PERL5LIB
  Be VERY careful about manipulating these environment variables!!!

Every usg/jgi module for software also sets an environment variable pointing to the base of the distribution:
  E.g. BOOST_ROOT, PERL_DIR, PYTHON_DIR, GIT_PATH

Exercise:
  Load the python module first
  Use ‘module info’ to investigate the effects of:
    graphviz
    RSeQC
    smrtanalysis
  Are there commonalities? Differences?

Modules have dependencies

Python needs some of gcc’s libraries
Perl needs some of gcc’s libraries
Python also needs readline’s libraries

For the python module to function, both the gcc and readline modules need to be loaded

For the perl module to function, the gcc module needs to be loaded

Complexity of module dependencies on genepool

Highly interconnected graph of dependencies

The most highly connected nodes:
  gcc
  perl
  python
  oracle-jdk
  openmpi

Many modules are disconnected from the network, possibly because:
  They are statically compiled
  They only rely on base-system functionality
  Their dependencies haven’t been modelled yet


ModulesReloaded


Automatically checks and loads dependencies
Automatically unloads orphaned dependencies
Differentiates between user-loaded modules and auto-loaded modules when manipulating modules
Does more extensive error checking
  Modules failing to load return exit status 1 (echo $?)
Supports “variant” modules
  Single modulefiles for multiple installations of similar software
Enables reporting of upcoming changes to modules system
Enhances logging capabilities of modules system

ModulesReloaded - AutoLoad/Unload

Exercise:
  Start by unloading all modules.
  Load the python module.
    Which modules were loaded?
  Next, load the perl module.
    Which modules are loaded now?
  Now, unload the python module.
    Check module list
  Finally, unload the perl module.
    Check module list
  Look at the details of the perl and python modules.

ModulesReloaded - AutoLoad/Unload

Exercise:
  Start by unloading all modules.       [module purge]
  Load the python module.               [module load python]
    Which modules were loaded?          [gcc, readline, python]
  Next, load the perl module.           [module load perl]
    Which modules are loaded now?       [gcc, readline, python, perl]
  Now, unload the python module.        [module unload python]
    Check module list                   [gcc, perl]
  Finally, unload the perl module.      [module unload perl]
    Check module list                   [None!]
  Look at the details of the perl and python modules.
    module show perl
    module show python

ModulesReloaded - AutoLoad/Unload

In the previous exercise, you should have noticed that the perl and python modules each depended on the gcc module (among others).

The gcc module won’t get unloaded while another loaded module still depends on it.

ModulesReloaded - User’s Choice!

Exercise:
  Load the default hmmer module
  Load the repeatmasker module

Why did that just happen?
  ModulesReloaded tracks which modules the user directly requests (vs. those just loaded as dependencies), and won’t swap or remove them automatically.
  Unload hmmer, then try loading repeatmasker.

ModulesReloaded - Variants

Programming Environments are integrated sets of modules
  Attempt to provide a seamless and coherent build environment regardless of compiler.

Exercise:
  Purge all your modules.
  Load ‘PrgEnv-gnu’
  Load ‘boost’
  Examine the BOOST_ROOT environment variable
  Swap to ‘PrgEnv-gnu/4.8’
  Examine the BOOST_ROOT environment variable again

https://www.nersc.gov/users/computational-systems/genepool/programming/

ModulesReloaded - Variants

The ‘boost’ module is a ‘variant’ module
  When loaded, it detects which programming environment (PrgEnv) is loaded
  When the PrgEnv is swapped, the variant module is also reloaded
  A variant module cannot be loaded without its provider (e.g. boost cannot be loaded without some PrgEnv)

Earlier, we had to load python before we could interrogate RSeQC
  because RSeQC is a variant on ‘python’ (instead of ‘PrgEnv’)

ModulesReloaded - Variants

[Figure: module diagram with two groups, “PrgEnv and Compilers” and “Software Libraries (and Deps)”. Legend: “Normal” Module, PrgEnv-provider Module, PrgEnv-client Module, Default Module, Non-default Module.]

Each programming environment provides the ‘PrgEnv’ attribute which is required by the libraries.

The PrgEnv meta-modules conflict with each other, but the compilers do not.


ModulesReloaded - DefaultChange

Changing default module versions may be disruptive to some users
  To advertise the change, a warning is communicated by modules

Example:
  The default version of blast+ is planned to be changed on August 6.
  Load the default blast+ module
  Unload the blast+ module
  Load blast+/2.2.26 (which is the default)

    dmj@genepool04:~$ module load blast+
    WARNING: The default version of blast+ will be changing from 2.2.26 to 2.2.28 on 2013/08/06. Please try blast+/2.2.28. Please contact consult@nersc.gov with any questions.
    dmj@genepool04:~$ module unload blast+
    WARNING: The default version of blast+ will be changing from 2.2.26 to 2.2.28 on 2013/08/06. Please try blast+/2.2.28. Please contact consult@nersc.gov with any questions.
    dmj@genepool04:~$ module load blast+/2.2.26
    dmj@genepool04:~$

The warning is only sent to users accessing the default without specifying a version.

NERSC Dotfiles - Your default Environment Pt 2

Default modules are loaded in the .bashrc/.tcshrc files
  System files load ‘uge’, ‘nsg’, ‘jgitools’
    uge adds the scheduler
    jgitools puts /jgi/tools/bin into your PATH
  .bashrc loads ‘usg-default-modules’
  usg-default-modules autoloads:
    PrgEnv-gnu
    perl
    python
    oracle-client
    mysql
  Are any additional modules auto-loaded as prerequisites?

You can add your own ‘module load’ commands to .bashrc.ext / .tcshrc.ext
  Do this with care: modules added in the default environment become somewhat infectious

NERSC Dotfiles - Your default Environment Pt 2

What happens if a user does the following in their .bashrc.ext file?

    module load smrtanalysis
    export PERL5LIB=$HOME/perl
    export LD_LIBRARY_PATH=/house/groupdirs/randd/lib:$LD_LIBRARY_PATH

Is something wrong here?
  Answer: PERL5LIB shouldn’t be replaced. This is invalidating the effects of the smrtanalysis module. Instead, use:

    export PERL5LIB=$HOME/perl:$PERL5LIB

What about this:

    export PATH=/jgi/tools/bin:$PATH

Is there something wrong with this?
  Answer: The jgitools module is loaded very early in the environment. The jgitools module already implements this functionality. The many things in /jgi/tools/bin may override other settings you want.
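The effect of replacing versus adding to a compound variable can be seen in a few lines. This sketch pretends a module has already set PERL5LIB; the path /opt/demo/smrt/lib/perl5 is made up and only stands in for whatever a real module would add.

```shell
# Pretend a module has already put its own directory into PERL5LIB.
export PERL5LIB=/opt/demo/smrt/lib/perl5

# Overwriting would discard the module's entry:
overwritten="$HOME/perl"

# Prepending keeps it. The ${VAR:+...} form avoids a stray trailing ':'
# when the variable starts out empty or unset.
export PERL5LIB="$HOME/perl${PERL5LIB:+:$PERL5LIB}"

echo "overwrite would give: $overwritten"
echo "prepend gives:        $PERL5LIB"
```

The same pattern applies to PATH, LD_LIBRARY_PATH, and PYTHONPATH.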

NERSC Dotfiles - Your default Environment Pt 2

Best Practices:
  Do put your settings in a “genepool”-only section of .bashrc.ext / .tcshrc.ext

    if [ "$NERSC_HOST" == "genepool" ]; then
        :   # genepool-specific settings go here
    fi

  Limit the number of modules you load by default; it can complicate handing off batch scripts later
  Do not replicate module functionality
    i.e. don’t set environment variables with paths into /usr/common directly
  Only add to variables like PATH, LD_LIBRARY_PATH, PYTHONPATH, PERL5LIB, as these are commonly set by modules
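A host-specific section of a bash dotfile might look roughly like the following. It is a sketch only: the function wrapper, the variable MY_SCRATCH_WORK, and its value are illustrative choices, not NERSC conventions.

```shell
# Sketch of a host-specific section for a bash dotfile.
host_specific_settings() {
    if [ "${NERSC_HOST:-}" = "genepool" ]; then
        # Settings that should apply only on genepool go here.
        export MY_SCRATCH_WORK="${SCRATCH:-/tmp}/work"
    fi
}

host_specific_settings
echo "MY_SCRATCH_WORK=${MY_SCRATCH_WORK:-<not set on this host>}"
```

Using ${NERSC_HOST:-} keeps the test quiet on machines where the variable is not defined at all, which matters when the same home directory is shared between systems.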

Using Modules in your Work

Using Modules Interactively

Use modules precisely as we have been in the exercises

Modules are great for interactive use!

Using Modules in Batch Scripts

    #!/bin/bash -l
    #$ -l ram.c=10G
    #$ -l h_rt=8:00:00

    set -e

    module purge
    module load PrgEnv-gnu/4.6
    module load uge
    module load blast+/2.2.28
    module load python/2.7.4

    #…. Run your programs here ….

Annotations:
  #!/bin/bash -l: ensures login environment is initialized
  #$ lines: UGE options
  set -e: kill script if any commands give non-zero exit status
  module purge / module load: clear all the modules, and then reload all needed modules by version

Using Modules in Batch Scripts


Using this approach:
  Your batch script will terminate if something goes wrong (non-zero exit status)
  No extraneous modules will be loaded, ensuring that exactly the calculation you want is run, with no surprises
  Using the precise version numbers means your script will work even after new defaults are installed
  Purging the modules first will allow your script to work in other users’ hands without requiring anybody to change their dotfiles.
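The effect of set -e is easy to see in isolation. In this sketch, false stands in for any command that fails, such as a module that cannot be loaded; each test script runs in its own child shell so the demonstration does not affect the calling shell.

```shell
# Run two tiny scripts in child shells: one with 'set -e', one without.
with_e=$(sh -c 'set -e; false; echo "kept going"' || echo "stopped, status $?")
without_e=$(sh -c 'false; echo "kept going"' || echo "stopped, status $?")

echo "with set -e:    $with_e"
echo "without set -e: $without_e"
```

Without set -e the script carries on after the failure and would run the rest of the job in a half-configured environment.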

Using Modules in Production Pipelines

Consider creating a pipeline module
  e.g. jigsaw/5.1
  The pipeline module could be a pure ‘meta-module’ or point to its own relevant scripts (and still be a meta-module)

A meta-module purely loads other modulefiles
  E.g., PrgEnv-gnu

A full-featured modulefile could:
  Load other modulefiles
  Add entries to PATH, PERL5LIB, other parts of the environment

Writing a meta-modulefile

A pure meta-module

    #%Module1.0
    ##

    ## Required internal variables
    set     name        MyPipeline
    set     version     1.0

    ## List conflicting modules here
    set mod_conflict [list $name]

    ## List prerequisite modules here
    set mod_prereq_autoload [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
    set mod_prereq [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]

    ## Source the common modules code-base
    source /usr/common/usg/Modules/include/usgModInclude.tcl

    ## Software-specific settings exported to user environment
    setenv  MYPIPELINE_VER  $version

Annotations:
  mod_conflict replaces the conflict keyword to trap and exit with status 1
  mod_prereq_autoload is the list of modules to autoload
  mod_prereq is the list of modules that must be loaded first. This sets up the automatic load/swap protections.
  usgModInclude.tcl is the ModulesReloaded include code. This should be included before any environment manipulations.

Writing a meta-modulefile

A full-featured pipeline module

    #%Module1.0
    ##

    ## Required internal variables
    set     name        MyPipeline
    set     version     1.0
    set     root        {/path/to/my/group/stuff/$name/$version}

    ## List conflicting modules here
    set mod_conflict [list $name]

    ## List prerequisite modules here
    set mod_prereq_autoload [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]
    set mod_prereq [list blast+/2.2.28 mothur/1.26.0 qiime/1.7.0]

    ## Source the common modules code-base
    source /usr/common/usg/Modules/include/usgModInclude.tcl

    ## Software-specific settings exported to user environment
    setenv          MYPIPELINE_VER      $version
    setenv          MYPIPELINE_ROOT     $root
    prepend-path    PATH                $root/bin

Annotations:
  root should evaluate to the filesystem path for your pipeline. The braces instruct TCL to not evaluate it immediately. The include code will do the evaluation and perform additional error checking.
  Position all your environment manipulations after the include file.
  Do set an environment variable for the version and root of your pipeline.

Using Pipeline Modules in Batch Scripts

    #!/bin/bash -l
    #$ -l ram.c=10G
    #$ -l h_rt=8:00:00

    set -e

    module purge
    module load PrgEnv-gnu/4.6
    module load python/2.7.4

    module use /path/to/my/groups/modulefiles
    module load MyPipeline/1.0

    #…. Run your programs here ….

Annotations:
  #!/bin/bash -l: ensures login environment is initialized
  #$ lines: UGE options
  set -e: kill script if any commands give non-zero exit status
  module purge / module load: clear all the modules, load any needed variant-provider modules
  module use: add your modulefiles to MODULEPATH
  module load MyPipeline/1.0: load your pipeline module

Conclusion and Best Practices

Best Practices - Dotfiles

If you make changes to compound environment variables, make sure to only add to them
  PATH, LD_LIBRARY_PATH, PERL5LIB, PYTHONPATH (many more)

Do not replace modules functionality in your dotfiles:
  Don’t add /jgi/tools/bin to PATH
  Don’t add any absolute paths in /usr/common to your environment

Limit the number of default modules
  Large numbers of default modules complicate giving scripts to others (they need to change their default environment to run your script)
  Instead, set up convenience meta-modules or pipeline modules and load them as needed

Best Practices - Modules

Avoid embedding absolute paths in your scripts
  Instead use the environment variables set in your modules
  This reduces maintenance work on your script and centralizes the work to a single place: the modulefile

In production scripts, purge the modules and load them by version
  This ensures the script runs reproducibly

Unloading modules and reloading is sometimes more reliable than swapping
  ModulesReloaded, for example, can’t unload orphaned dependencies when swapping:

    module swap PrgEnv-gnu PrgEnv-intel
    module swap PrgEnv-intel PrgEnv-gnu

  The above will leave the intel module loaded due to a bug in the underlying modules system (will investigate and fix in the future).
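A script that relies on a module-provided variable instead of a hard-coded path could start like this. It is a sketch under assumptions: MYPIPELINE_ROOT is the variable from the pipeline modulefile example, and the placeholder default exists only so the snippet runs on its own; a production script would leave that line out and rely on the module.

```shell
# MYPIPELINE_ROOT would normally be set by loading the pipeline module;
# it is given a placeholder value here only so the sketch is self-contained.
: "${MYPIPELINE_ROOT:=/path/to/my/group/stuff/MyPipeline/1.0}"

# ${VAR:?message} aborts with the message if the variable is unset or empty,
# which catches a forgotten 'module load' early.
pipeline_bin="${MYPIPELINE_ROOT:?load the MyPipeline module first}/bin"
echo "would run tools from: $pipeline_bin"
```

If the installation moves, only the modulefile changes; scripts written this way keep working unmodified.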

Best Practices - General

Log out (and back in again)
  Seriously, environments do not age like a fine wine
  With consistent use of modules, however, they should be more stable

More Information


The NERSC website has a great deal of information about this:

  Genepool User Environment:
    http://www.nersc.gov/users/computational-systems/genepool/user-environment/

  Running CGI Scripts with Modules:
    https://www.nersc.gov/users/computational-systems/genepool/user-environment/scriptenv-loading-modules-before-starting-a-script/

  Using modules within Python:
    https://www.nersc.gov/users/computational-systems/genepool/user-environment/working-with-modules-within-perl-and-python/

  ModulesReloaded
    Coming soon…

EOF