- Once a job has started to run, use squeue -j <jobid> to find the nodes allocated to it (shown in the NODELIST(REASON) column; multi-node allocations appear in compressed form, e.g. node[01-03]).
- Log in to each of these nodes with ssh <node_name>.
- Use top -u <username> to see the memory used by each of your processes on that node (the RES column shows resident memory).
- If your job is multi-threaded, use top -H (capital H) to show each thread on its own line.
- While logged in to each compute node, check how much of that node's local /scratch disk you are using (for example, with du -sh on your directory under /scratch).
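The steps above can be combined into a small shell function. This is a minimal sketch, not a site-provided tool: it assumes you can ssh from the login node to compute nodes without a password, and that your scratch space lives at /scratch/$USER (a hypothetical layout; adjust it to your cluster). The function name check_job_usage is also made up for illustration. It uses squeue -h -o %N to get the job's node list and scontrol show hostnames to expand the compressed form into individual hostnames.

```shell
# Sketch: for each node of a running Slurm job, show a one-shot snapshot of
# your processes (RES = resident memory) and your usage of node-local /scratch.
# Assumes passwordless ssh to compute nodes and a /scratch/$USER directory
# (hypothetical path; change it to match your site).
check_job_usage() {
    jobid="$1"
    # -h drops the header line; %N prints the compressed node list, e.g. node[01-03].
    nodelist=$(squeue -h -j "$jobid" -o %N)
    if [ -z "$nodelist" ]; then
        echo "job $jobid not found or not running yet" >&2
        return 1
    fi
    # Expand node[01-03] into node01, node02, node03 (one per line).
    for node in $(scontrol show hostnames "$nodelist"); do
        echo "== $node =="
        # top -b -n 1: batch mode, single snapshot; add -H to list threads.
        # du -sh: total size of your scratch directory on this node.
        ssh "$node" "top -b -n 1 -u $USER | head -n 20; du -sh /scratch/$USER 2>/dev/null"
    done
}
```

Source or paste the function into your shell on the login node, then call it with your job ID, e.g. check_job_usage 123456. Running top in batch mode with a single iteration gives a static snapshot per node instead of an interactive session, which is easier to compare across nodes.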
Read this post about tuning your memory requirements and understanding memory error messages.