Changes to job submission policies

Beginning on Friday, September 27, 2013, there will be some changes to the way you submit your jobs. We have implemented new policies intended to reduce fragmentation of the cluster and decrease queue times for everyone. The main changes are:

Most jobs will run as they always have, without any changes to how they are submitted. However, if you previously submitted partial-node jobs (fewer than 12 cores) to the hb or lm queues, please read the following advice so that your jobs are not rejected.

Changes for (previously) partial-node hb or lm jobs

If you would like to submit a job that would previously have been a partial-node hb or lm job, you have two options:

Option 1) Submit the job to the sw queue instead of lm or hb. Use multiple cores if more memory is required. For example, ppn=2 on sw will provide a job with approximately the same memory as ppn=1 on lm.

Previous method:

msub -q lm -l nodes=1:ppn=1 myscript

New method:

msub -q sw -l nodes=1:ppn=2 myscript


Note: In this case, you will use two cores of sw instead of one core of lm. If you are concerned about how this affects your Compute Canada accounting, please see Option 2.
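The arithmetic in Option 1 can be scripted if you are converting several lm submissions. The sketch below is a hypothetical helper (`sw_submit_cmd` is not a cluster-provided tool). It assumes the fixed ratio implied by the example above, namely that sw needs about twice as many cores as lm for the same memory; the real ratio is only approximate, and the resulting ppn must still fit on a single sw node. It only prints the msub command, so you can review it before running it.

```bash
# sw_submit_cmd LM_PPN SCRIPT
# Hypothetical helper: print the sw submission that roughly matches the
# memory of an lm request, ASSUMING sw has about half the memory per core
# of lm (ppn=2 on sw ~ ppn=1 on lm). Prints the command; does not run it.
sw_submit_cmd() {
  local lm_ppn=$1 script=$2
  local ratio=2   # assumed number of sw cores per lm core for equal memory
  # Reject anything that is not a positive integer core count.
  if ! [[ "$lm_ppn" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: sw_submit_cmd LM_PPN SCRIPT" >&2
    return 1
  fi
  echo "msub -q sw -l nodes=1:ppn=$(( lm_ppn * ratio )) $script"
}
```

For example, `sw_submit_cmd 1 myscript` prints `msub -q sw -l nodes=1:ppn=2 myscript`, the same command shown in the new method above.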

Option 2) Modify your submission script to allow multiple tasks to run at once on the same node. Submit this multiple-task script as a single whole-node job to lm or hb.

Previous method:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o outputfile
#PBS -e errorfile
#PBS -V
#PBS -N jobname
 
cd /home/username/your_project_name
./your_code arg1

msub -q lm myscript

New method:

#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -l walltime=12:00:00
#PBS -o outputfile
#PBS -e errorfile
#PBS -V
#PBS -N jobname
 
cd /home/username/your_project_name
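# Start 12 independent tasks in the background (&), one per core,
# each writing its output to a separate file.
./your_code arg1 > file1.out &
./your_code arg2 > file2.out &
./your_code arg3 > file3.out &
./your_code arg4 > file4.out &
./your_code arg5 > file5.out &
./your_code arg6 > file6.out &
./your_code arg7 > file7.out &
./your_code arg8 > file8.out &
./your_code arg9 > file9.out &
./your_code arg10 > file10.out &
./your_code arg11 > file11.out &
./your_code arg12 > file12.out &

# Wait until all background tasks have finished; otherwise the job would
# end as soon as the script exits, taking unfinished tasks with it.
wait

msub -q lm myscript12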

Note: With this method, your job uses all 12 cores of an lm node (i.e., the entire node), so please pack your tasks as shown in the example above so that no cores are wasted.
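Writing out twelve nearly identical lines is error-prone. The sketch below is a loop-based variant of the same packing idea. It assumes, as the example does, that task i takes the argument argi and writes filei.out, and it additionally handles more tasks than cores by running them in waves of 12 (a batch of background tasks followed by `wait`). The function name `run_packed` and the `NTASKS` variable are hypothetical. The launch step only runs inside a batch job (when Torque has set `PBS_JOBID`), so you can source the file to try the function interactively. Waves work best when tasks take similar amounts of time; otherwise cores sit idle until the slowest task in each wave finishes.

```bash
#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -l walltime=12:00:00
#PBS -o outputfile
#PBS -e errorfile
#PBS -V
#PBS -N jobname

# run_packed PROGRAM NTASKS [SLOTS]
# Hypothetical helper: runs "PROGRAM argN > fileN.out" for N = 1..NTASKS
# in the background, at most SLOTS (default 12, one per core) at a time.
# After every SLOTS launches it waits for that whole wave to finish.
run_packed() {
  local prog=$1 ntasks=$2 slots=${3:-12} i
  for (( i = 1; i <= ntasks; i++ )); do
    "$prog" "arg$i" > "file$i.out" &
    if (( i % slots == 0 )); then
      wait    # finish this wave before starting the next one
    fi
  done
  wait        # finish the last (possibly partial) wave
}

# Only launch work when running as a batch job (Torque sets PBS_JOBID).
if [[ -n "${PBS_JOBID:-}" ]]; then
  cd /home/username/your_project_name || exit 1
  run_packed ./your_code "${NTASKS:-12}"
fi
```

Submit it with msub -q lm as before. Because the script keeps `#PBS -V`, you could set `NTASKS` in your shell before running msub to change the number of tasks.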

Alternatively, you can use GNU Parallel to launch and pack the tasks for you. To make it available, load its module:

module load gnu-parallel
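GNU Parallel can replace the hand-written list of background tasks: it keeps up to 12 tasks running at once and starts the next one as soon as a core frees up, which also helps when tasks take different amounts of time. The sketch below assumes the same hypothetical `your_code`, argument names, and output files as the Option 2 example; `run_with_parallel` is a hypothetical helper name. In the GNU Parallel command, `-j` sets the number of simultaneous tasks and `{}` is replaced by each value listed after `:::`. The module name is the one given above.

```bash
#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -l walltime=12:00:00
#PBS -o outputfile
#PBS -e errorfile
#PBS -V
#PBS -N jobname

# run_with_parallel PROGRAM NTASKS [SLOTS]
# Hypothetical helper: runs "PROGRAM argN > fileN.out" for N = 1..NTASKS
# with GNU Parallel, keeping at most SLOTS (default 12) tasks running.
run_with_parallel() {
  local prog=$1 ntasks=$2 slots=${3:-12}
  # The command is quoted so that the shell started by GNU Parallel for
  # each task, not this shell, applies the "> file{}.out" redirection.
  parallel -j "$slots" "$prog arg{} > file{}.out" ::: $(seq 1 "$ntasks")
}

# Only launch work when running as a batch job (Torque sets PBS_JOBID).
if [[ -n "${PBS_JOBID:-}" ]]; then
  module load gnu-parallel
  cd /home/username/your_project_name || exit 1
  run_with_parallel ./your_code 12
fi
```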

New Error Messages

You may see some new error messages while submitting jobs, even if the same submission parameters worked previously.

An error message containing "TPN too low for class" means that your job did not request ppn=12. TPN stands for tasks per node. The same error occurs if you use the 'procs' option on hb or lm instead of 'nodes=n:ppn=12'; in that case, TPN is shown as 0, as in this example:

ERROR:    cannot submit job - TPN too low for class 'hb' (0 < 12)
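To catch this before submitting, you could run a quick local check. The sketch below is hypothetical: `check_tpn` only approximates the rule described above (on hb and lm, the request must include ppn=12, and a request with no explicit ppn, such as procs=12, counts as TPN 0). It is not the scheduler's actual logic. It takes the queue name and the same -l resource string you would pass to msub.

```bash
# check_tpn QUEUE RESOURCES
# Hypothetical pre-submission check approximating the "TPN too low" rule:
# on hb and lm, the node request must use ppn=12. A resource string with
# no ppn= (for example procs=12) is treated as TPN 0.
check_tpn() {
  local queue=$1 resources=$2 tpn=0 required=12
  local re='ppn=([0-9]+)'
  if [[ "$resources" =~ $re ]]; then
    tpn=${BASH_REMATCH[1]}
  fi
  case "$queue" in
    hb|lm)
      if (( tpn < required )); then
        echo "TPN too low for class '$queue' ($tpn < $required)" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```

For example, `check_tpn lm nodes=1:ppn=12 && msub -q lm -l nodes=1:ppn=12 myscript12` submits only if the check passes, while `check_tpn hb procs=12` fails with a message similar to the one above.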

Please contact us if you have any questions or concerns about the new changes.