Parallel Matlab on Guillimin

Introduction

IMPORTANT: The license expired on June 30, 2017, and will not be renewed. Any job submitted to Guillimin with the procedure below will fail.

Jobs using the Matlab Distributed Computing Server (MDCS) on Guillimin require two Matlab licenses and two Matlab installations: the MDCS installation/license on Guillimin (provided by us), and a licensed installation of Matlab with the Parallel Computing Toolbox on the user's desktop computer (provided by you or your institution). Matlab installations activated through McGill's Software Centre include the Parallel Computing Toolbox. If you have installed Matlab from another source, please ensure that you have these prerequisites. MDCS allows you to use any toolbox licenses provided they are also licensed on your desktop installation of Matlab.

IMPORTANT: Parallel Matlab on Guillimin works with versions 2012a, 2012b, 2013a, 2013b, 2014a, 2014b, 2015aSP1, 2015b and 2016b. Other versions are not supported. Please upgrade your local installation if necessary.

IMPORTANT: Please do not try to launch the MDCS Matlab binaries on Guillimin directly, as this will give you a license error. The MDCS Matlab installation on Guillimin is not a standard Matlab installation like the one on your desktop. Its only role is to "transfer" the job from your desktop/laptop and to start it on multiple nodes of the cluster.

Setting up your parallel Matlab environment

The MDCS system accepts batch job submissions from a user's desktop computer to be scheduled and run through the Guillimin scheduler. Before submitting a job, the Matlab installed on the user's computer must be configured for submission to Guillimin.

Please follow these steps to set up your parallel Matlab environment correctly:

  1. Download our configuration archive, and unpack it on your local machine ($ tar -xvf guillimin_mdcs_config.tar.gz). It contains "config" and "examples" folders. Put the "examples" somewhere in your project directory on your local machine - you will use these simple examples as test runs and as a reference.
  2. The "config" folder contains all necessary configuration files. Copy all "config/toolbox-local/*" files to the "<your_matlab_install>/toolbox/local" folder on your local machine.
  3. Create a profile for the Guillimin cluster in Matlab
    • Restart Matlab
    • At the Matlab command prompt, run: glmnConfigCluster
      • Warning: This command will delete any previous cluster profiles named guillimin
    • Matlab will prompt you for some information about your computer. Please be careful, as entering incorrect information can make your profile unusable. If you make an error, your cluster profile can be reset by re-running glmnConfigCluster. Please enter:
      • A unique identifier for your local computer (for example, the hostname, or a description like 'lab7' or 'laptop'). Do not use spaces.
      • Your home directory on your local computer (for example, /home/alex on Linux, /Users/alex on Mac, or C:\Users\alex on Windows)
      • Your home directory on Guillimin (for example, /home/alex). You may also specify a folder in your project space or any folder that you have write access to.
      • You may be asked to specify the Matlab path on Guillimin if our script hasn't been configured for your Matlab release. The versions that we support are as follows:
        • 2012a: /software/applications/matlab-2012a-para
        • 2012b: /software/applications/matlab-2012b-para
        • 2013a: /software/applications/matlab-2013a-para
        • 2013b: /software/CentOS-6/applications/matlab-2013b-para
        • 2014a: /software/CentOS-6/applications/matlab-2014a-para
        • 2014b: /software/CentOS-6/applications/matlab-2014b-para
        • 2015aSP1: /software/CentOS-6/applications/matlab-2015a-para
        • 2015b: /software/CentOS-6/applications/matlab-2015b-para
        • 2016b: /software/CentOS-6/applications/matlab-mdcs-2016b
    • You should now have a cluster profile called 'guillimin' in your Matlab 'manage cluster profiles' menu.
    • Log in to Guillimin using ssh and create your Matlab job folder (your details will be different from this example)
      	    mkdir -p /home/username/.matlab/jobs/myLaptop/guillimin/R2014b
      	    
  4. This is the end of the configuration procedure. We advise you to restart Matlab at this point.
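
The job folder path in step 3 can be assembled from the values entered during configuration. The following is a minimal sketch, assuming the layout is <Guillimin home>/.matlab/jobs/<local identifier>/<profile name>/<release>, which is inferred only from the single mkdir example above. All values are placeholders to replace with your own.

```matlab
% Placeholder values; substitute the ones you gave to glmnConfigCluster
remoteHome = '/home/username';   % your home directory on Guillimin
localId    = 'myLaptop';         % unique identifier of your local computer
profile    = 'guillimin';        % cluster profile name
release    = 'R2014b';           % release of your local Matlab installation

% Build the path with '/' explicitly: it is a path on the Linux cluster,
% so the local file separator (e.g. '\' on Windows) must not be used.
% The layout is an assumption inferred from the mkdir example.
jobFolder = sprintf('%s/.matlab/jobs/%s/%s/%s', ...
                    remoteHome, localId, profile, release);

% Print the command to run on Guillimin after logging in with ssh
fprintf('mkdir -p %s\n', jobFolder);
```

The printed command is meant to be run in your ssh session on Guillimin, not on your local machine.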

Validation

Important: Note that you must have a valid glmnPBS.m file in your working directory during validation. The examples/TestParfor/glmnPBS.m is an example of a valid glmnPBS.m file. For more information, please read the next section.

If you use the automatic validation of the Guillimin cluster profile, please expect the final test (MATLAB pool test) to fail. This functionality is not supported on our system. We recommend performing a manual validation for distributed and parallel batch jobs instead of the automatic validation. Please contact us if you experience any problems with the validation tests.

  • Distributed job
      setSchedulerMessageHandler(@disp)
      cluster = parcluster('guillimin');
      cluster.NumWorkers = 3;
      job = createJob(cluster);
      createTask(job, @sum, 1, {[1 1]});
      submit(job);
      wait(job);
      out = fetchOutputs(job)
  • Parallel job
      setSchedulerMessageHandler(@disp)
      cluster = parcluster('guillimin');
      cluster.NumWorkers = 3;
      job = createCommunicatingJob(cluster, 'Type', 'spmd');
      createTask(job, @labindex, 1, {});
      submit(job);
      wait(job);
      out = fetchOutputs(job)

Submitting your parallel Matlab jobs

The jobs are submitted from within the Matlab session on your local machine using the glmnPBS.submitTo(cluster) command after configuring the glmnPBS.m file.

IMPORTANT: You must always have a "glmnPBS.m" file in the same directory as the script you are submitting for execution (see examples). In this file you manually set the mandatory submission parameters for your job. The following is an example of the property definitions at the top of the glmnPBS.m file.

classdef glmnPBS
    %Guillimin PBS submission arguments
    properties
        % Local script, remote working directory (home, by default)
        localScript = 'TestParfor';
        workingDirectory = '.';

        % nodes, ppn, gpus, phis and other attributes
        numberOfNodes = 1;
        procsPerNode = 6;
        gpus = 0;
        phis = 0;
        attributes = '';

        % Specify the memory per process required
        pmem = '1700m'

        % Requested walltime
        walltime = '00:30:00'

        % Please use metaq unless you require a specific node type
        queue = 'metaq'

        % All jobs should specify an account or RAPid:
        % e.g.
        % account = 'xyz-123-aa'
        account = '';

        % You may use otherOptions to append a string to the qsub command
        % e.g.
        % otherOptions = '-M email[at]address.com -m bae'
        otherOptions = ''
    end

Before submitting your job, it is recommended that you review your current submission profile using the getSubmitArgs() function:

 
  test = glmnPBS();
  test.getSubmitArgs()

Also, be aware that in the MDCS model the master process is not used for matlabpool procedures (parfor, spmd). N cores are therefore reserved for the job on Guillimin, but only N-1 Matlab workers execute the parallel code. In the glmnPBS.m example above (1 node with 6 processes per node), there would be 5 workers in the matlabpool.
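
The worker count can be checked by hand before submitting. The following sketch only reproduces the arithmetic described above, using the values from the glmnPBS.m example. It does not show how getNbWorkers() is implemented, and the variables reservedCores and poolWorkers are illustrative names, not part of glmnPBS.

```matlab
% Values copied from the glmnPBS.m example
numberOfNodes = 1;
procsPerNode  = 6;

% Total cores reserved on Guillimin for the job
reservedCores = numberOfNodes * procsPerNode;

% One core hosts the master process, which does not take part in
% parfor/spmd, so the pool has one worker fewer than reserved cores
poolWorkers = reservedCores - 1;

fprintf('Reserved cores: %d, pool workers: %d\n', reservedCores, poolWorkers);
```

Keeping this in mind helps when sizing a request: to get a given number of pool workers, the job needs one more core than that number.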

After reviewing your submission arguments, you may submit your job:

  cluster = parcluster('guillimin');
  glmnPBS.submitTo(cluster);

The glmnPBS.submitTo(cluster) function will submit the localScript script to Guillimin inside a job with properties defined in glmnPBS.m. This function is simply a wrapper for Matlab's batch() function (you may wish to customize it yourself).

    methods(Static)
        function job = submitTo(cluster)
            opt = glmnPBS();
            job = batch(cluster,    opt.localScript,     ...
                'matlabpool',       opt.getNbWorkers(),  ...
                'CurrentDirectory', opt.workingDirectory ...
                );
        end
    end

During the submission process you will be asked for your username on the Guillimin cluster. A pop-up window will also ask whether you want to use an "identity file" for the cluster connection. You may select "Yes" and choose a valid OpenSSH private key file, or you may select "No", in which case you will be asked for your Guillimin password.

You can check the status of your submitted job from the Matlab GUI: Parallel --> Monitor Jobs. However, logging in to Guillimin and using our job management commands is a more direct way of checking.

HINT: You cannot monitor the progress of your calculations from the Matlab GUI, as all processes on remote nodes are "sealed" there, and no STDOUT from workers is possible. The only way to follow the progress of your computations is to write periodic output to a text file with fprintf statements (see examples). You can then log in directly to Guillimin and check the content of that file.
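
A minimal sketch of this kind of progress logging is shown here. The file name progress.log and the step count are hypothetical, and the loop body stands in for the real computation. The scripts in the "examples" folder may do this differently.

```matlab
% Hypothetical log file; choose a path you can write to on Guillimin
logFile = 'progress.log';
nSteps  = 4;

for k = 1:nSteps
    % ... one step of the real computation would go here ...

    % Open in append mode, write one line, and close right away so the
    % line is flushed to the file and can be read from a login node
    fid = fopen(logFile, 'a');
    fprintf(fid, 'step %d of %d done\n', k, nSteps);
    fclose(fid);
end
```

After logging in to Guillimin, the file can be inspected with standard tools such as cat or tail. If several workers write at once (for example inside a parfor loop), lines may interleave, so one log file per worker may be easier to read.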

IMPORTANT: It is possible to transfer data files and additional script files (or whole directories) from your local machine to the cluster during job submission. This is done via special options to the batch command (see the TestSMPD example and the batch help). However, please do not do this for large data files. Instead, copy your data to Guillimin separately before running your calculations. In the same way, please save large outputs on the Guillimin filesystem first, and then transfer them to your local machine.