All users of the Research Computing clusters are expected to make proper use of the resources that their jobs allocate. There are three types of underutilization that can result in an account suspension.
In November of 2023, the Research Computing Advisory Group (RCAG) put in place the following policies regarding cluster utilization:
- Zero GPU Utilization: Users who run GPU jobs with zero GPU utilization will have their jobs killed after 2 hours. Two warning emails will be sent before each job is killed.
- Overallocation of CPU Memory: Users who overallocate CPU memory will have their accounts suspended.
- Running Serial Jobs Using Multiple CPU-cores: A serial job can only use 1 CPU-core. When additional cores are allocated, they are wasted. Users who waste CPU time in this manner face account suspension.
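As a rough illustration of the last two items, the sketch below shows how a user might estimate how much of a job's allocated CPU time and memory was actually used. The function names and input values are hypothetical, and the policy above does not state numeric thresholds, so none are assumed here. The inputs would come from whatever job accounting output the cluster provides.

```python
def cpu_efficiency(cpu_seconds_used: float, cores_allocated: int,
                   elapsed_seconds: float) -> float:
    """Return the fraction of allocated CPU time that did useful work.

    Allocated CPU time is cores_allocated * elapsed_seconds.
    (Hypothetical helper, not part of any Research Computing tool.)
    """
    allocated = cores_allocated * elapsed_seconds
    if allocated <= 0:
        raise ValueError("allocated CPU time must be positive")
    return cpu_seconds_used / allocated


def memory_fraction_used(peak_gb: float, allocated_gb: float) -> float:
    """Return the fraction of allocated CPU memory the job used at its peak."""
    if allocated_gb <= 0:
        raise ValueError("allocated memory must be positive")
    return peak_gb / allocated_gb


# A serial job that ran for 1 hour with 8 cores allocated keeps only
# one core busy, so 7 of the 8 cores are wasted.
serial_on_8_cores = cpu_efficiency(cpu_seconds_used=3600,
                                   cores_allocated=8,
                                   elapsed_seconds=3600)

# A job that peaked at 4 GB after requesting 64 GB.
overallocated_memory = memory_fraction_used(peak_gb=4, allocated_gb=64)
```

In this example the serial job uses one eighth of its allocated CPU time, and the second job uses one sixteenth of its allocated memory. Requesting 1 CPU-core for the first job, and less memory for the second, would avoid the waste.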
Email Alerts and Account Suspension
For each of the three types of underutilization described above, the first instance will result in a warning email to the user. The second instance will result in a warning email to the user and their sponsor. At that time, the user and sponsor will have 7 days to resolve the issue. If no action is taken after the second warning and the underutilization continues, or if there is another instance of underutilization at a later time, then the user will be prevented from running additional jobs for a period of at least 7 days. Users are encouraged to work with Research Computing to resolve the underutilization issue.
An account can be reactivated after the suspension period has passed and the user can demonstrate that the underutilization issue has been solved.
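The escalation steps described in this section can be summarized in a short sketch. This is only an illustration of the stated policy, not the system Research Computing uses: the names `Action` and `action_for_instance` are hypothetical, and the sketch simplifies the policy by treating any continued or later underutilization after the second warning as a third instance.

```python
from enum import Enum

# Both values are taken from the policy text above.
RESOLUTION_WINDOW_DAYS = 7   # time to resolve the issue after the second warning
MIN_SUSPENSION_DAYS = 7      # minimum period during which no new jobs can run


class Action(Enum):
    WARN_USER = "warning email to the user"
    WARN_USER_AND_SPONSOR = "warning email to the user and their sponsor"
    SUSPEND = "user prevented from running additional jobs"


def action_for_instance(instance_number: int) -> Action:
    """Map the n-th instance of one type of underutilization to an action.

    Simplifying assumption: continued underutilization after the second
    warning counts as instance 3 or higher.
    """
    if instance_number < 1:
        raise ValueError("instance_number starts at 1")
    if instance_number == 1:
        return Action.WARN_USER
    if instance_number == 2:
        return Action.WARN_USER_AND_SPONSOR
    return Action.SUSPEND


for n in (1, 2, 3):
    print(n, action_for_instance(n).value)
```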
Weekends and Holidays
RCAG has decided that users can receive email alerts about significant instances of resource underutilization (e.g., zero GPU utilization) on weekends and holidays.