bwUniCluster
Getting access
Accessing bwUniCluster is a multi-step procedure that is described in detail here: https://wiki.bwhpc.de/e/Registration/bwUniCluster. Please follow these steps as described. When asked why you need access, please mention the “HPC with C++ class at the University of Freiburg”.
Logging in and transferring data
You can then log in to bwUniCluster with the ssh (secure shell) command:
ssh -Y <username>@uc2.scc.kit.edu
The option -Y tells ssh to forward X11 connections. This allows you to display plots while remotely logged in to bwUniCluster. This will not work if you are logging in from a Windows machine, unless you have set up a dedicated X11 server on your Windows computer.
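A quick way to check that X11 forwarding works is to look at the DISPLAY variable after logging in (this is just a sanity check; it assumes an X server is running on your local machine):
echo $DISPLAY   # should print something like localhost:10.0 if forwarding is active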
You can copy files to bwUniCluster with the scp (secure copy) command:
scp my_interesting_file <username>@uc2.scc.kit.edu:
Don’t forget the colon : after the machine name when running the scp command. Note that you can use the same command to get the data back from bwUniCluster. For example, to copy the file result.npy from your home directory on bwUniCluster to the current directory, simply execute (on your local machine):
scp <username>@uc2.scc.kit.edu:result.npy .
The dot . refers to your current directory. You can also specify full paths on either the remote or the local machine when executing scp.
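As a sketch (all paths below are placeholders, not actual directories on the cluster), copying with full paths and copying a whole directory recursively look like this:
# copy a local file into a specific directory on bwUniCluster
scp /path/to/local/input.npy <username>@uc2.scc.kit.edu:/path/on/cluster/
# copy a whole directory from bwUniCluster back to the local machine
scp -r <username>@uc2.scc.kit.edu:/path/on/cluster/results ./results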
Setting up the software environment
After logging in to bwUniCluster at uc2.scc.kit.edu, you will need to set up your local environment to be able to compile and run parallel applications. The module command loads a specific environment:
- You can see all possible modules with module avail. Note that the list changes when you load a module as there are dependencies.
- You can search for a specific module with module spider.
- You can list the presently loaded modules with module list.
- You can remove (unload) all presently loaded modules with module purge.
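For example, to find out which Python modules exist and how to load a specific one (the module name below is the one used later in this section), you could run:
module spider python                          # list all Python modules
module spider devel/python/3.12.3_gnu_13.3    # show details and required dependencies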
Please load the following modules (just execute these commands at the command line):
module load compiler/gnu mpi/openmpi devel/python/3.12.3_gnu_13.3
You can check whether the correct modules were loaded by executing
module list
Now you need to install muFFT. First compile muFFT’s dependencies. This is described in the Getting Started page of muFFT. In short, execute
curl -sSL https://raw.githubusercontent.com/muSpectre/muFFT/main/install_dependencies.sh | sh
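If you prefer not to pipe a downloaded script directly into sh, a cautious variant (same URL as above) is to download it first, inspect it, and then run it:
curl -sSL https://raw.githubusercontent.com/muSpectre/muFFT/main/install_dependencies.sh -o install_dependencies.sh
less install_dependencies.sh    # review what will be installed
sh install_dependencies.sh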
Now set up a virtual Python environment to install muFFT. You need to execute this in your home directory, which is where the venv directory will be located:
python3 -m venv venv
You activate the virtual environment with:
source venv/bin/activate
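To confirm that the virtual environment is active, check which Python interpreter is picked up (a minimal sanity check):
which python3    # should point to ${HOME}/venv/bin/python3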
Upgrade pip in your venv because older versions of pip do not play well with meson, which is used for building muFFT:
python3 -m pip install --upgrade pip
You also need to manually install mpi4py:
python3 -m pip install --force-reinstall --no-cache --no-binary mpi4py mpi4py
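A quick way to check that mpi4py was built against the OpenMPI library you loaded (this only prints the MPI library version and does not start a parallel run) is:
python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"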
Now install muFFT:
PKG_CONFIG_PATH=$HOME/.local/lib/pkgconfig:$PKG_CONFIG_PATH \
LIBRARY_PATH=$HOME/.local/lib:$LIBRARY_PATH \
CPATH=$HOME/.local/include:$CPATH \
pip install -v \
--force-reinstall --no-cache \
--no-binary muGrid --no-binary muFFT \
muGrid muFFT
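If the installation succeeded, importing both packages should work without errors (a minimal check; it does not exercise any FFT functionality):
python3 -c "import muGrid, muFFT; print('muGrid and muFFT imported successfully')"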
Running simulations
bwUniCluster has extensive documentation that can be found here: https://www.bwhpc-c5.de/wiki/index.php/Category:BwUniCluster. Simulations are typically run as batch jobs. They have to be submitted through a batch or queueing system that takes care of assigning the actual hardware (compute node) to your job. A description of the queueing system can be found here. Please make sure you understand the concept of a partition described here.
Run simulations within a dedicated workspace. bwUniCluster provides a parallel file system for such workspaces. Create a workspace with
ws_allocate spectral_methods 30
where the number (30) is the lifetime of the workspace. This workspace will be deleted after 30 days. You can list all your workspaces with ws_list and extend their lifetime with ws_extend.
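To change into the workspace, you can query its path with ws_find (part of the same workspace tools as ws_allocate; the workspace name is the one created above):
cd "$(ws_find spectral_methods)"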
IMPORTANT: never ever run a simulation on the front/login node (i.e. without submitting a job using the sbatch command, see below). Also do not run simulations in your home directory. Use a workspace. Your access to bwUniCluster could be revoked if you do this.
Job scripts
To run your job, you need to write a job script. The job script is executed by the bash command, which is the shell that you are using on Linux systems. (See the first tutorial here for an introduction to the shell.) The job script specifies how many CPUs you require for your job, how to set up the software environment and how to execute your job. An example job script (you can use this one almost as is) looks like this:
#!/bin/bash -x
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=00:40:00
#SBATCH -J SpectralMethods
#SBATCH --mem=6gb
#SBATCH --export=ALL
#SBATCH --partition=multiple
module load compiler/gnu mpi/openmpi devel/python/3.12.3_gnu_13.3
source ${HOME}/venv/bin/activate
echo "Running on ${SLURM_JOB_NUM_NODES} nodes with ${SLURM_JOB_CPUS_PER_NODE} cores each."
echo "Each node has ${SLURM_MEM_PER_NODE} of memory allocated to this job."
time mpirun python3 my_python_program.py
Lines starting with a # are ignored by the shell. Lines starting with #SBATCH are ignored by the shell but are treated like command-line options to the sbatch command that you use to submit your job. You can see all options by running man sbatch.
The --nodes option specifies how many nodes you want to use. --ntasks-per-node specifies how many processors per node you need. The --time option specifies how long your job is allowed to run. Your job is terminated if it exceeds this time! The -J option is simply the name of your job. It determines the names of output files generated by the batch system. Check the documentation linked above to understand the other options.
The time command in front of mpirun measures the execution time of the code and prints it to screen. Note that you do not tell mpirun how many cores to use. The batch system passes this information on automatically.
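Since simulations should run inside a workspace rather than the home directory (see above), here is a sketch of how the last lines of the job script could be adapted, assuming the workspace created earlier, the ws_find helper, and that your Python script lives in your home directory:
# change into the workspace before starting the run
cd "$(ws_find spectral_methods)"
# run the simulation script from the home directory, writing output into the workspace
time mpirun python3 ${HOME}/my_python_program.py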
Submitting jobs
Assuming the filename of the above script is run.job, you can submit this script with
sbatch run.job
Everything that is included via #SBATCH in the script above can also be specified on the command line. For example, to change the number of cores to 320 you can issue the command:
sbatch --nodes=4 --ntasks-per-node=80 run.job
Your job may need to wait for resources and will not run immediately. You can check the status of your job with
squeue
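On a busy cluster the full queue is long, so it helps to restrict the output to your own jobs (standard Slurm options):
squeue -u $USER      # show only your own jobs
squeue -j <JOBID>    # show a specific job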
Once it has run, you will find a file that starts with job_ and has the extension .out. This file contains the output of the simulation, i.e. the output that is normally written to the screen when you run from the command line. Note that bash -x (the first line in the job submission script above) instructs bash to print every command that is executed onto the screen. This makes debugging (of the job script, not your simulation code) easier.
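To follow the output while the job is still running, you can watch the output file grow (the exact file name depends on your job, so the pattern below is only an example):
tail -f job_*.out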
squeue shows the JOBID of your job. You can cancel a job by executing
scancel <JOBID>
More information on a certain job can be obtained with
scontrol show job <JOBID>
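For jobs that have already finished and are no longer listed by squeue, accounting information can be retrieved with the standard Slurm accounting command (provided accounting is enabled on the cluster):
sacct -j <JOBID> --format=JobID,JobName,Elapsed,State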