The queuing system
The queuing system currently installed on Hagrid is OpenPBS/Maui.
Useful commands
qsub | to submit a job |
qstat | to check the status of the queue |
qdel | to delete a job |
checkjob | to show how the scheduler sees a job |
Less useful commands
qhold | to place a hold on a job |
qrls | to release a hold on a job |
qselect | to list all jobs meeting certain criteria |
qmove | to move a job to a different queue |
qorder | to change the ordering of two jobs in a queue |
qrerun | to terminate a job and return it to the queue |
qsig | to send a signal to a job |
Scalar (single) jobs
The qsub job files are simple shell scripts with PBS directives in the form of shell comments.
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -q grawp
cd /home/user/scalar
./scalartest
This script executes the program scalartest on one node of the queue grawp. Each line beginning with #PBS is a PBS directive. Here, the job requests one node and one processor per node. Note that the script must change into the correct directory with the cd command before it runs the program.
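The life cycle of such a job, using the commands listed above, might look like the following sketch. The file name scalar.sh is a hypothetical name for the saved job script, job_number is an illustrative helper that is not part of PBS, and the assumption that checkjob expects only the numeric part of the job identifier should be verified on the cluster.

```bash
#!/bin/bash
# Hypothetical file name: the job script shown above, saved as scalar.sh.
job_script="scalar.sh"

# qsub prints an identifier of the form <number>.<server>; keep the number.
# Illustrative helper, not a PBS command.
job_number() {
    echo "${1%%.*}"
}

if command -v qsub >/dev/null 2>&1 && [ -f "$job_script" ]; then
    job_id=$(qsub "$job_script")          # submit the job
    qstat "$job_id"                       # status of this job in the queue
    checkjob "$(job_number "$job_id")"    # assumption: checkjob takes the numeric id
    # qdel "$job_id"                      # uncomment to delete the job again
else
    echo "qsub or $job_script not found; nothing submitted"
fi
```

The guard around qsub only keeps the sketch harmless on machines without PBS; on the cluster the three commands can also be typed interactively.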
Parallel (MPI) jobs
To start an MPI job the mpiexec command is used.
#!/bin/bash
#PBS -l nodes=4:ppn=2
#PBS -q grawp
cd /home/user/mpi
mpiexec -n 8 ./mpitest
This job takes four nodes with two processors per node, i.e. a total of eight CPUs. The argument to mpiexec (-n) is the total number of processes (4*2). The actual program executed on the nodes is mpitest.
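To avoid keeping the -n argument in sync with the nodes directive by hand, the process count can be derived at run time. The sketch below assumes that the installation sets the PBS_NODEFILE variable inside a running job and that the file it names holds one line per allocated processor, as standard PBS/Torque setups do. The helper count_slots is an illustrative name, not a PBS command.

```bash
#!/bin/bash
#PBS -l nodes=4:ppn=2
#PBS -q grawp

# Count the processor slots listed in a node file (one non-empty line per slot).
# Illustrative helper, not part of PBS.
count_slots() {
    grep -c . "$1"
}

# PBS_NODEFILE is only set inside a running job, so guard against its absence.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd /home/user/mpi || exit 1
    mpiexec -n "$(count_slots "$PBS_NODEFILE")" ./mpitest
else
    echo "PBS_NODEFILE is not set; run this script through qsub"
fi
```

With nodes=4:ppn=2 the node file should contain eight lines, so mpiexec is started with -n 8 as in the script above.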
For an odd number of CPUs, for example nine, the corresponding PBS directive is
#PBS -l nodes=4:ppn=2+1:ppn=1:grawp
Note: this directive allocates 2 processors on each of 4 nodes and 1 processor on 1 further node. The node property grawp after ppn=1 is necessary; otherwise Torque/Maui may allocate a node of any queue type. To change the node type of the first set of processors, in this case 4:ppn=2, use the -q directive.
Torque/Maui is free to pack the nodes, i.e. it might instead allocate 2 nodes with 4 processors each and 1 node with 1 processor.
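The arithmetic behind such a specification can be checked with a few lines of shell. The function total_procs below is a hypothetical helper, not part of PBS: it sums nodes times ppn over every "+"-separated part of the specification, assumes that each part starts with a node count rather than a host name, and treats a missing ppn as 1.

```bash
#!/bin/bash
# Sum nodes*ppn over every "+"-separated part of a PBS nodes specification.
# Hypothetical helper, not part of PBS. Assumes each part starts with a
# numeric node count; ppn defaults to 1 when it is omitted.
total_procs() {
    local spec="${1#nodes=}"    # drop an optional leading "nodes="
    local total=0 chunk count ppn field
    local -a chunks fields
    IFS='+' read -ra chunks <<< "$spec"
    for chunk in "${chunks[@]}"; do
        IFS=':' read -ra fields <<< "$chunk"
        count="${fields[0]}"
        ppn=1
        for field in "${fields[@]:1}"; do
            case "$field" in
                ppn=*) ppn="${field#ppn=}" ;;   # other fields (e.g. grawp) are ignored
            esac
        done
        total=$(( total + count * ppn ))
    done
    echo "$total"
}

total_procs "nodes=4:ppn=2"                 # prints 8
total_procs "nodes=4:ppn=2+1:ppn=1:grawp"   # prints 9
```

The result is the value to pass to mpiexec -n, regardless of how the scheduler finally packs the processors onto nodes.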
Useful PBS directives
-l | resources (cpu time, memory, ...) required by the job |
-e | standard error stream |
-o | standard output stream |
-N | name of the job |
-p | priority of the job |
-q | destination of the job (queue) |
-S | shell that executes the job script |
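Several of these directives are typically combined in one job file. The following sketch writes such a file and checks its syntax without submitting it. The job name, the output file names, the location of the generated file and the walltime value are illustrative assumptions, and whether a walltime request is accepted depends on the limits of the chosen queue.

```bash
#!/bin/bash
# Write a job script that combines the directives listed above, then check
# its shell syntax. File names and the walltime value are assumptions.
job_file="${TMPDIR:-/tmp}/scalar_job.sh"

cat > "$job_file" <<'EOF'
#!/bin/bash
#PBS -N scalartest
#PBS -q grawp
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
#PBS -o scalartest.out
#PBS -e scalartest.err
#PBS -S /bin/bash
cd /home/user/scalar
./scalartest
EOF

# bash -n parses the script without executing it.
bash -n "$job_file" && echo "wrote $job_file"
```

The generated file can then be submitted with qsub in the usual way.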
Queues
PBS can control several queues, which differ in criteria such as the wall clock time.
To share the resources (CPU time) consumed by the users fairly, the Maui scheduler reorders the execution sequence of the queued jobs.
To get an overview of the available queues, the qstat command can be used:
qstat -Q -f
qstat -q
Jobs can be submitted (-q directive) to the following queues:
queue name | default CPU Time | default no. of CPUs | machine size |
hagrid | 12h | 1 | 32 bit |
hermione | 24h | 1 | 64 bit |
grawp | 2h | 1 | 64 bit |
Obtaining an account
For technical questions or an account on grawp ask the system administrators.