Scheduler Policies

* Compute Canada documentation: https://docs.computecanada.ca/wiki/Job_scheduling_policies

On any cluster, the scheduler imposes policies and limits on accounts and users so that the available resources are used efficiently and fairly. This is implemented through a fairshare policy. The limits vary according to allocations: each group's limits are its fairshare targets. The fairshare target is based on the number of processor cores allocated to a PI and is assigned to the group's account in the scheduler's configuration. It determines the resources available to all users under that account.

MOAB determines when, and to which nodes, jobs are sent for processing. It monitors the entire job queue and assigns a priority to each job. Jobs are prioritized based on requested versus available resources and on current usage versus the fairshare target. The scheduler updates the status of the queue every 2 minutes. Parameters and policy settings can be tuned to handle a wide range of system workloads efficiently.
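
As a rough illustration of the kind of calculation involved (the weights, names, and formula here are illustrative assumptions, not MOAB's actual configuration), a fairshare-aware priority might combine how long a job has waited with how far its group sits below its fairshare target:

```python
def job_priority(wait_hours, group_usage, fairshare_target,
                 wait_weight=1.0, fairshare_weight=100.0):
    """Toy priority function: jobs gain priority the longer they wait,
    and gain (or lose) priority as their group runs below (or above)
    its fairshare target. All weights are illustrative only."""
    fairshare_factor = (fairshare_target - group_usage) / fairshare_target
    return wait_weight * wait_hours + fairshare_weight * fairshare_factor

# A job from an under-target group outranks an equally old job
# from an over-target group:
print(job_priority(wait_hours=5, group_usage=2, fairshare_target=5))  # 65.0
print(job_priority(wait_hours=5, group_usage=8, fairshare_target=5))  # -55.0
```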

Limits Imposed by the Scheduler

The scheduler enforces limits on the number of jobs in the queue. The largest factors in determining these limits are the Maximum Processor Seconds (MAXPS) and the Maximum Processors (MaxPROC) for each account. MAXPS is the total number of processor-core seconds (ps) allocated to each group account. It is the fairshare value times 2,592,000 seconds, which corresponds to 30 days of continuous (24/7) use. The MAXPS window on Guillimin is 30 days.
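
In code form, the MAXPS formula is a single multiplication (the helper below is our own sketch, not a scheduler API):

```python
SECONDS_PER_30_DAYS = 30 * 24 * 3600  # 2,592,000 s: the MAXPS window

def maxps(fairshare_cores):
    """Processor-core seconds a group may have scheduled at once."""
    return fairshare_cores * SECONDS_PER_30_DAYS

print(maxps(5))  # 12960000 ps for a fairshare of 5 cores
```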

As an example, an account with a fairshare value of five processor cores has a MAXPS of 12,960,000 ps (5 cores * 2,592,000 s). Such an account exhausts its MAXPS window if fifteen 10-day single-core jobs are started. Any job submitted after this point is blocked with the message "blocked due to MAXPS limit exceeded" and must wait until the outstanding scheduled processor seconds drop below the MAXPS value.

Note that once jobs start running, the processor seconds counted against MAXPS decrease as the remaining time to completion shortens. Other job combinations that fill the same window include ten 15-day jobs, seventy-five 2-day jobs, and so on. The usual scenario is a mixture of short and long jobs.
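
The sketch below checks whether a given mix of jobs fits within an account's MAXPS; the function and the tuple format are our own illustration of the arithmetic above:

```python
DAY = 24 * 3600  # seconds per day

def fits_maxps(jobs, fairshare_cores):
    """jobs: list of (count, cores_per_job, walltime_days) tuples.
    True if the total scheduled processor seconds fit within MAXPS."""
    total_ps = sum(count * cores * days * DAY for count, cores, days in jobs)
    return total_ps <= fairshare_cores * 30 * DAY

# Each of these exactly fills the 12,960,000 ps window of a 5-core account:
print(fits_maxps([(15, 1, 10)], 5))  # fifteen 10-day single-core jobs -> True
print(fits_maxps([(10, 1, 15)], 5))  # ten 15-day jobs -> True
print(fits_maxps([(75, 1, 2)], 5))   # seventy-five 2-day jobs -> True
print(fits_maxps([(16, 1, 10)], 5))  # one 10-day job too many -> False
```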

Users can submit as many jobs as they like, but jobs cannot be scheduled to run if doing so would exceed their group's MAXPS value; such jobs instead enter a "HOLD" state. If the group's MAXPS is not exceeded but the requested resources are not available, jobs enter an "IDLE" state and run once the resources become available.
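
A minimal sketch of this state decision, assuming the simplified inputs below (the state names match the queue listing; the function itself is illustrative):

```python
def queue_state(job_ps, group_scheduled_ps, group_maxps, resources_free):
    """Decide the queue state of a newly submitted job.
    job_ps: cores * requested walltime in seconds for this job."""
    if group_scheduled_ps + job_ps > group_maxps:
        return "HOLD"  # blocked: group MAXPS would be exceeded
    if not resources_free:
        return "IDLE"  # waits for requested resources to free up
    return "RUN"

# A 10-day single-core job against a nearly full 5-core account:
print(queue_state(job_ps=864_000, group_scheduled_ps=12_500_000,
                  group_maxps=12_960_000, resources_free=True))  # HOLD
```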

Significant Benefits of MAXPS

  • Very long jobs can run for up to 14 days on the cluster, so accounts are able to run jobs that make full use of their fairshare.
  • Accounts can burst (for short durations) onto a number of processor cores larger than their fairshare allocation without negatively impacting other users' access to their own allocations. The shorter the job walltime, the greater the available burst capacity.
  • MAXPS encourages all users to set reasonably accurate job wall-clock times. If a user requests 14 days when they only need one day, their usage will be limited by MAXPS: the scheduler cannot know whether a job will finish early, so it must reserve time based on the requested wall-clock value. Users therefore need a good idea of how long their jobs will take in order to optimize usage of their allocated window. Setting the walltime too short may cause jobs to be killed when they run out of time. A good rule of thumb is to estimate your run time and add 20% to create a workable walltime (see the sketch after this list). If a job is running out of time, you can request that time be added; note, however, that the added time plus the originally submitted walltime cannot exceed 30 days.
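
A small sketch of the 20% rule of thumb, assuming a run-time estimate in hours and the HH:MM:SS walltime format used in job scripts (the function and the 30-day cap handling are our own illustration):

```python
def padded_walltime(estimated_hours, padding=0.20, cap_hours=30 * 24):
    """Add a safety margin (default 20%) to an estimated run time,
    capped at 30 days (the limit on walltime plus extensions)."""
    hours = min(estimated_hours * (1 + padding), cap_hours)
    h = int(hours)
    m = int(round((hours - h) * 60))
    return f"{h:02d}:{m:02d}:00"

print(padded_walltime(10))  # 12:00:00 -- a 10-hour estimate becomes 12 hours
```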

Other Limits

  • The maximum number of running jobs for any single user is set to 1024. Per-user limits within an account are set individually and can be modified at any time upon request through the ticketing system.
  • The maximum number of jobs in the active idle queue is set to 100.

Limits can always be adjusted for deadline-driven work that needs resources beyond the currently assigned maximums. In that case, the PI or user should contact the Guillimin team and request a reservation.