Running jobs on Compute Canada
A good overview of the way we most often use Compute Canada is in the Parallel Computing in R with WestGrid presentation by Bhagya Karunarathna. (Note: WestGrid is the western-Canada division of Compute Canada.) Bhagya mentions “batch scripts”, “schedulers”, and “R scripts”. Links to more information on these topics are given below.
Batch scripts and schedulers
- See the introduction to schedulers for background on schedulers and compute clusters. The scheduler used by Compute Canada is called Slurm.
- See the running jobs documentation for basic information on running jobs on Compute Canada.
- See the array jobs documentation for how to run the same job multiple times, possibly with different inputs, random seeds, etc. We often use array jobs to run Monte Carlo simulations; e.g., for importance sampling or to get permutation/bootstrap distributions.
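As a rough illustration of an array job, here is a minimal sketch of a Slurm batch script. The resource values, the script name `simulate.R`, and the "base seed plus task ID" scheme are all assumptions made for this example, not settings recommended by the documentation linked above. It also assumes that the R script reads the seed from its command-line arguments (for instance with `commandArgs(trailingOnly = TRUE)`).

```shell
#!/bin/bash
#SBATCH --time=00:10:00     # wall-clock limit per task (placeholder value)
#SBATCH --mem-per-cpu=1G    # memory per CPU (placeholder value)
#SBATCH --array=1-10        # ten array tasks, with IDs 1 through 10

# Hypothetical seeding scheme: a fixed base plus the array task ID,
# so that every task gets a distinct, reproducible seed.
BASE_SEED=1000
seed_for_task() {
    echo $(( BASE_SEED + $1 ))
}

# Slurm sets SLURM_ARRAY_TASK_ID in each array task. Defaulting to 1
# lets the script be dry-run outside the scheduler.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
SEED="$(seed_for_task "$TASK_ID")"

# simulate.R is a placeholder name for your own R script.
R_SCRIPT="simulate.R"

if command -v Rscript >/dev/null 2>&1 && [ -f "$R_SCRIPT" ]; then
    # On the cluster, R usually has to be made available first
    # (for example through the site's module system).
    Rscript "$R_SCRIPT" "$SEED"
else
    echo "Dry run: Rscript $R_SCRIPT $SEED"
fi
```

Submitting a script like this once with `sbatch` queues all ten tasks. The dry-run branch is only there so the script can be checked on a machine without R or the R script; a real batch script would normally just end with the `Rscript` line.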
Running R in batch mode
- For jobs that run R, the last line of the Slurm batch script will be of the form `Rscript <myscript.R>` or `R CMD BATCH <myscript.R>`, where `<myscript.R>` is your R script.
- For reproducibility, please get in the habit of setting the seed and printing the session information in each R script. That is, all of your R scripts should start with the line `set.seed(N)` for some number `N`, and should end with `sessionInfo()`.
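The reproducibility habit described above can be sketched as follows. This shell snippet writes a small R script that starts with `set.seed(N)` and ends with `sessionInfo()`, then runs it in batch mode if `Rscript` is available. The file name `example.R`, the seed value 2024, and the `rnorm` placeholder analysis are assumptions chosen only for illustration.

```shell
#!/bin/bash
# Write a minimal R script that follows the seed/session-info pattern.
WORK_DIR="$(mktemp -d)"
R_SCRIPT="$WORK_DIR/example.R"   # hypothetical file name

cat > "$R_SCRIPT" <<'EOF'
set.seed(2024)
# 2024 is an arbitrary choice; any fixed number N works.
x <- rnorm(5)  # placeholder for the real analysis
print(mean(x))
sessionInfo()
EOF

if command -v Rscript >/dev/null 2>&1; then
    # Batch-mode run; results and session information go to standard output.
    Rscript "$R_SCRIPT"
else
    echo "Rscript not found; script written to $R_SCRIPT"
fi
```

With `Rscript` the output appears on standard output, whereas `R CMD BATCH` writes it to a `.Rout` file, so the session information ends up in whichever of the two your batch script uses.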