It is easy to accidentally waste resources on the HPC clusters. Make sure you read over the Top 10 Mistakes to Avoid on the Research Computing Clusters. Additional guidelines which promote effective usage are given below.
The more resources you request, the longer your job will spend in the queue waiting for the resources to become available. Try to specify your minimum requirements. Here are the key pieces:
- Number of CPU-cores
- Amount of time required to run the job
- Amount of memory (RAM) needed
- Number of GPUs (if any)
For parallel codes, one needs to carry out a scaling analysis as described on Choosing the Number of Nodes, CPU-cores and GPUs.
The longer the requested run time limit, the longer your queue time. Make sure you choose an accurate value, but include some extra time for safety, since the job will be killed if it does not complete before the run time limit.
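As a rough sketch of adding extra time for safety, the helper below pads a measured run time and prints it in the HH:MM:SS form that the Slurm --time option accepts. The function name padded_time_limit and the 20% default margin are assumptions chosen for illustration, not a site recommendation.

```bash
#!/bin/bash
# Hypothetical helper: pad a measured run time (in seconds) by a percentage
# and print the result as HH:MM:SS for use with "#SBATCH --time=...".
padded_time_limit() {
    local measured_seconds=$1
    local margin_percent=${2:-20}   # assumed default margin; adjust to your own jobs
    local total=$(( measured_seconds + measured_seconds * margin_percent / 100 ))
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# A test run took 3 hours (10800 s); with a 20% margin this prints 03:36:00
padded_time_limit 10800 20
```

The measured run time should come from a representative test job, since the margin only helps if the estimate itself is realistic.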
For most jobs the default memory per CPU-core is a reasonable choice (4 GB on all clusters except Stellar, where it is 8 GB). Requesting excess memory can cause your job to spend longer than necessary in the queue. Read more about allocating memory.
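The key pieces listed above map onto Slurm directives roughly as in the following sketch. The job name and all numeric values are placeholders for illustration only; choose your own values based on the needs of your code.

```bash
#!/bin/bash
#SBATCH --job-name=my-job        # hypothetical job name
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --cpus-per-task=4        # CPU-cores per task (illustrative value)
#SBATCH --mem-per-cpu=4G         # memory (RAM) per CPU-core
#SBATCH --time=01:00:00          # run time limit (HH:MM:SS)
#SBATCH --gres=gpu:1             # number of GPUs; remove this line if the code does not use a GPU
```

Requesting a GPU or multiple CPU-cores only helps if the code is written to use them, so remove any directive that does not apply to your job.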
Use the Minimum Number of Nodes
You should make every effort to use all of the CPU-cores on a node before requesting an additional node. You should always specify the number of nodes in your Slurm script since specifying ntasks > 1 without specifying the number of nodes may cause your job to be split across nodes. This can prevent other jobs which require all the CPU-cores of a node from running. So use directives such as the following:
#SBATCH --nodes=1
#SBATCH --ntasks=16
How to Improve Your CPU Utilization?
Common reasons for low CPU efficiency include:
- Running a serial code using multiple CPU-cores. Make sure that your code is written to run in parallel before using multiple CPU-cores. Learn more about parallel computing.
- Using too many CPU-cores for parallel jobs. You can find the optimal number of CPU-cores by performing a scaling analysis.
- Writing job output to the /tigress or /projects storage systems. Actively running jobs should be writing output files to /scratch/gpfs/<YourNetID> which is a much faster filesystem. For more information see Data Storage.
- Using the MPICH library instead of an MPI library that was built for our clusters. Some software installed using "conda" is built against an MPI library that is not optimized for our systems. Run "conda list" after activating the environment and look for "mpich" to see if you are using this library.
- Using "mpirun" instead of "srun" for parallel codes. Please use "srun".
Consult the documentation or write to the mailing list of the software that you are using for additional reasons for low CPU efficiency and for potential solutions.
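The "conda list" check for MPICH mentioned in the list above could be scripted along these lines. The function name report_mpich is hypothetical, and the package listing used in the demonstration is made-up sample text, not real output.

```bash
#!/bin/bash
# Hypothetical helper: reads a package listing (such as the output of
# "conda list") on standard input and reports whether any line mentions mpich.
report_mpich() {
    if grep -qi 'mpich'; then
        echo "mpich found"
    else
        echo "mpich not found"
    fi
}

# On a cluster, after activating the environment, one would run:
#   conda list | report_mpich

# Demonstration with a made-up listing (not real conda output):
printf 'numpy  1.0  example_build\nmpich  1.0  example_build\n' | report_mpich
```

If the check reports that mpich is present, the environment may be using an MPI library that is not optimized for the clusters.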
The scheduler gives preference to the jobs with the highest priority, so you should aim to keep your job priority as high as possible.
Many factors go into the priority calculation, and you can control only a few of them. The most important is the set of resources you request at submission (cores, nodes, memory and time), which was addressed above.
To see what else influences your job priority number, please visit our job priority resource page.
Make sure that you use the local /scratch/gpfs filesystem for job input and output files. Do not use /tigress or /projects for actively running jobs. See the Data Storage page for a complete discussion of this.
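One way to catch a misplaced working directory before a job starts is a small guard in the job script, sketched below. The function name check_job_dir and the NetID aturing are placeholders. The check looks only at the path string, so it does not resolve symbolic links.

```bash
#!/bin/bash
# Hypothetical guard: classify a directory by the storage system named in its path.
# Returns 0 for /scratch/gpfs, 1 for /tigress or /projects, 2 for anything else.
check_job_dir() {
    case "$1" in
        /scratch/gpfs/*)
            echo "ok: $1 is on /scratch/gpfs" ;;
        /tigress/*|/projects/*)
            echo "warning: $1 is on slower storage; use /scratch/gpfs instead"
            return 1 ;;
        *)
            echo "unknown: $1 is not on a filesystem named in this guide"
            return 2 ;;
    esac
}

# "aturing" is a placeholder NetID
check_job_dir "/scratch/gpfs/aturing/run01"
```

In a job script, the function could be called as check_job_dir "$PWD" || exit 1 so that the job stops early instead of writing to the wrong filesystem.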
Number of Files
Our filesystems perform best for megabyte-size files that do not exceed hundreds of thousands in number. Run the checkquota command to see the limits on the number of files you can store. If you have a large number of files then please use a command like tar to combine them into a single archive.
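As a minimal sketch of combining many small files with tar, the script below creates 100 small files in a temporary directory, packs them into one compressed archive, and counts the archived files as a check. The directory and file names are placeholders for illustration.

```bash
#!/bin/bash
set -euo pipefail

# Create a temporary working area with many small files (placeholder data)
workdir=$(mktemp -d)
mkdir "$workdir/results"
for i in $(seq 1 100); do
    echo "sample $i" > "$workdir/results/out_$i.txt"
done

# Combine the whole directory into a single gzip-compressed archive
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive contents and count the files to confirm nothing was missed
count=$(tar -tzf "$workdir/results.tar.gz" | grep -c '\.txt$')
echo "archived $count files"
```

The original files count against the file limit until they are deleted, so remove them only after confirming that the archive lists everything you expect.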