Compute Clusters

General Information

The following compute clusters are currently available:

Cluster     Nodes  CPUs                  Clock    Cache   Memory  spec. queue
WAP/ITP     43+2   2x6 Xeon E5-2640      2.5 GHz  16 MB   64 GB
WAP/STP     27+2   2x6 Xeon E5-2640      2.5 GHz  16 MB   64 GB
SFB         20+4   2x6 Xeon E5-2640      2.5 GHz  16 MB   64 GB
CRC         9      2x10 Xeon E5-2640 v4  2.4 GHz  25 MB   128 GB  crc
BuildMona   35     2x4 Opteron 2376      2.3 GHz  512 kB  16 GB   mona
For877      7      2x4 Xeon E5430        2.7 GHz  12 MB   16 GB   for877
ITP/grawp   31     2x2 Opteron 2218      2.6 GHz  1 MB    4 GB    grawp
  • The WAP/ITP, WAP/STP and grawp compute clusters can be used by any regular ITP member.
  • The For877 compute cluster is intended primarily for members of the DFG Forschergruppe 877.
  • The BuildMona cluster is restricted to members of the BuildMona research school.
  • For a user account or scratch data space, please contact helpdesk@itp.uni-leipzig.de

See our performance monitoring page.

Usage

The compute cluster resources are managed by SLURM, the Simple Linux Utility for Resource Management. Please take a look at the SLURM cheat sheet for an overview.

Job Queueing System

Jobs can be submitted from the frontend servers

  • kreacher.physik.uni-leipzig.de
  • dobby.physik.uni-leipzig.de (CQT)
  • emmy.physik.uni-leipzig.de (STP)
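
Log in to one of the frontend servers via SSH and submit your jobs from there. A minimal sketch, where jdoe is a placeholder for your own account name:

ssh jdoe@kreacher.physik.uni-leipzig.de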

Partitions

partition  description
default    routing queue to batch
batch      common execution queue, limited to 2 days
gpu        jobs that use GPUs
mona       execution on the BuildMona cluster
grawp      execution on the grawp cluster
for877     execution queue for the former Forschergruppe 877
all        execution queue for single jobs, may (will) run on different architectures

Common Commands

  • Resource Manager

    The following commands are sufficient for basic use of the compute clusters; usage examples are given below.

    sbatch   submit a new job
    scancel  cancel a job
    sinfo    show the status of partitions and nodes
    squeue   show the status of queued and running jobs

    Refer to the SLURM online documentation for detailed information.
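
    A few usage examples; the job file name and the job ID are placeholders:

    sbatch myjob.job       # submit a job file, prints the assigned job ID
    squeue -u $USER        # list your own pending and running jobs
    sinfo -p grawp         # show the node states of the grawp partition
    scancel 123456         # cancel the job with ID 123456 (placeholder ID)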

Job submission

Typically a job file is used to describe a job. These are simply shell scripts with SLURM directives in the form of comments.

Scalar Jobs

To submit a simple scalar job that can run for a day on grawp, create a file scalartest.job:

#!/bin/bash
# execution queue (partition)
#SBATCH -p grawp
# one node:
#SBATCH -N 1
# total number of tasks:
#SBATCH -n 1
# estimated run time (hh:mm:ss):
#SBATCH -t 24:00:00
# file for error messages
#SBATCH -e scalar.stderr
# file for normal output
#SBATCH -o scalar.stdout
cd /home/user/scalar
./scalartest

and submit it using

sbatch scalartest.job

Lines beginning with #SBATCH mark SLURM directives; see man sbatch for a reference of the available options. Multiple options may be given per line. Here the following is done:

  • -p grawp selects nodes from that partition for the job
  • -N 1 or --nodes 1 selects a single node
  • -n 1 or --ntasks 1 requests a single task, which by default is allocated a single CPU
  • -t 24:00:00 or --time 1-00 allocates one day of runtime
  • -e <error file> and -o <output file> redirect the standard error and standard output streams to these files

Without any of these settings the job would still run (on the default partition, though), with the default resources: a single CPU with 2 GB of RAM for 3 hours (check scontrol show partitions).

This script executes the program scalartest on one node of the grawp queue. In this case the resource allocation could also be skipped, as it mostly corresponds to the default allocation that any job gets.

Here, the job needs one node and one processor (per node) on grawp for 24 hours. It is important to change into the correct directory using the cd command. You can also use #SBATCH -D workdir to change into a working directory named workdir, as in the sketch below.
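
A minimal sketch of the -D variant, reusing the directory and program of the example above:

#!/bin/bash
#SBATCH -p grawp
#SBATCH -N 1 -n 1
#SBATCH -t 24:00:00
# -D sets the working directory, replacing the explicit cd
#SBATCH -D /home/user/scalar
./scalartest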

Parallel jobs

To start an MPI job the srun command is used. OpenMP jobs can be run across multiple processors on the same node. For MPI jobs you have to load the OpenMPI environment both when compiling the program and when submitting the job: module load openmpi/1.10.

#!/bin/bash
#SBATCH -p grawp
#SBATCH --ntasks=64
#SBATCH -t 24:00:00
#SBATCH -e scalar.stderr
#SBATCH -o scalar.stdout
cd /home/user/mpi
srun -n 64 ./mpitest

This job distributes a total of 64 tasks. The argument to srun (-n) is the same total number of processes. The actual program executed on the nodes is mpitest. Note that you can also request the number of nodes and the number of tasks per node explicitly, as in the sketch below.
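
A minimal sketch of the explicit variant; the node count assumes the 4-core grawp nodes listed above, and the program name is taken from the previous example:

#!/bin/bash
#SBATCH -p grawp
# 16 nodes with 4 tasks each, i.e. 64 tasks in total
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=4
#SBATCH -t 24:00:00
cd /home/user/mpi
srun ./mpitest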

Long jobs

If you want to submit a job which needs more than 2 days, you have to specify the QOS label long for that job. Just add

#SBATCH --qos=long

to your job file. But be careful, "long" jobs can only access a smaller part of the cluster. This is necessary since allowing "long" jobs to fill the complete cluster could prevent all other users to run jobs for weeks.

GPU jobs

To use GPU resources, please ask someone who has already done so.
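
For orientation only, a hedged sketch using the generic SLURM --gres syntax; the GPU count, type and any local constraints of the gpu partition are assumptions, so check the details with an experienced user:

#!/bin/bash
#SBATCH -p gpu
# request a single generic GPU resource (adjust to the local setup)
#SBATCH --gres=gpu:1
#SBATCH -N 1 -n 1
#SBATCH -t 24:00:00
cd /home/user/gpu
./gputest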

Useful SLURM directives

-e standard error stream
-o standard output stream
-N number of nodes
-p partition of the job (queue)
-a submit a job array; inside each array task the $SLURM_ARRAY_TASK_ID environment variable holds the array index (see the sketch below)
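
A minimal job-array sketch, assuming a serial program arraytest that takes the array index as its only argument (program name and path are placeholders):

#!/bin/bash
#SBATCH -p batch
#SBATCH -N 1 -n 1
#SBATCH -t 01:00:00
# run ten independent array tasks with indices 1 to 10
#SBATCH -a 1-10
cd /home/user/array
./arraytest $SLURM_ARRAY_TASK_ID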

Scratch data space

To avoid high load on the main frontend server and to improve I/O performance, it is highly recommended to write simulation data to scratch space.

In addition to the common workstation scratch spaces, the following scratch spaces are available:

mount point      size   notes
/scratch/dobby   11 TB  RAID6 volume
/scratch/emmy    7 TB   RAID6 volume
/scratch/shodan  11 TB  RAID6 volume
/scratch/grawp   9 TB   RAID6 volume

For I/O-intensive programs, always use cluster-local storage, i.e. the scratch space should be close to the computation in terms of network distance; a sketch of this pattern is given below.
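
A minimal sketch of a per-job scratch directory, assuming /scratch/grawp is the volume closest to the grawp nodes and is writable for your account (program name, paths and the results directory are placeholders):

#!/bin/bash
#SBATCH -p grawp
#SBATCH -N 1 -n 1
#SBATCH -t 24:00:00
# create a per-job directory on the scratch volume and work there
SCRATCHDIR=/scratch/grawp/$USER/job-$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"
cd "$SCRATCHDIR"
/home/user/scalar/scalartest
# copy the results back to the home directory afterwards
mkdir -p /home/user/scalar/results
cp -r "$SCRATCHDIR"/. /home/user/scalar/results/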

Development Tools

Hardware

partition  Nodes  CPU                               CPU cache         CPU arch       Memory
batch      100    2x6 Intel E5-2640 @ 2.50 GHz      2560 kB per core  Core i7 (AVX)  32 / 64 / 128 / 256 / 384 GB
grawp      31     2x2 Opteron 2218 @ 2.6 GHz        1 MB              AMD K8         4 GB
for877     7      2x4 Xeon E5430 @ 2.7 GHz          12 MB             Core2          16 GB
mona       15     2x4 Opteron 2376 @ 2.3 GHz        512 kB            AMD K10        16 GB
crc        9      2x10 Intel E5-2640 v4 @ 2.40 GHz  25 MB                            128 GB

Created: 2017-02-02 Thu 13:14
