Commit 5c26c0e

Update partition time limits to reflect new DefaultTime settings

Jobs without --time now get 4h (general) or 12h (GPU) instead of inheriting the 3-day max. Added guidance for users seeing timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 02a69a0 commit 5c26c0e

1 file changed

Lines changed: 33 additions & 14 deletions

docs/pain-points.md

@@ -218,21 +218,40 @@ squeue -j $SLURM_JOB_ID -h -o "%L"
 
 | Partition | Max wall time | Default wall time | Nodes | Access | Notes |
 |---|---|---|---|---|---|
-| `normal` | 3 days | not set | compute01–04, 06–07, 14 | All accounts | Default partition |
+| `normal` | 3 days | **4 hours** | compute01–04, 06–07, 14 | All accounts | Default partition |
 | `interactive` | 1 day | 8 hours | compute03–04, 06–07 | All accounts | Max 3 jobs/user |
-| `rna` | 3 days | not set | compute07–09, 15–20 | `rbi` | Falls back to `normal` |
-| `jones` | 3 days | not set | compute04–05, 10–12 | `jones` | |
-| `genome` | 3 days | not set | compute06–09 | `genome` | Falls back to `normal` |
-| `gpu` | 3 days | not set | compgpu01, 03 | `gpu_rbi` | 8× NVIDIA A30 |
-| `scb_gpu` | 3 days | not set | compgpu02 | `gpu_scb` | 4× NVIDIA A30 |
-| `scb` | 3 days | not set | compute13 | `scb` | |
-| `cranio` | 3 days | not set | compute21 | `scb` | Falls back to `normal` |
-| `bigmem` | 3 days | not set | compute14 | `bigmem` | ~1.5 TB RAM |
-| `rstudio` | 3 days | not set | compute00 | `bigmem` | Interactive RStudio |
-| `voila` | 3 days | not set | compute00 | `bigmem` | Voilà notebooks |
-
-!!! warning "No default wall time is set"
-If you omit `--time`, your job inherits the partition's `MaxTime` (3 days). **Always specify `--time`** — shorter jobs schedule faster via backfill, and you avoid tying up resources longer than needed.
+| `rna` | 3 days | **4 hours** | compute07–09, 15–20 | `rbi` | Falls back to `normal` |
+| `jones` | 3 days | **4 hours** | compute04–05, 10–12 | `jones` | |
+| `genome` | 3 days | **4 hours** | compute06–09 | `genome` | Falls back to `normal` |
+| `gpu` | 3 days | **12 hours** | compgpu01, 03 | `gpu_rbi` | 8× NVIDIA A30 |
+| `scb_gpu` | 3 days | **12 hours** | compgpu02 | `gpu_scb` | 4× NVIDIA A30 |
+| `scb` | 3 days | **4 hours** | compute13 | `scb` | |
+| `cranio` | 3 days | **4 hours** | compute21 | `scb` | Falls back to `normal` |
+| `bigmem` | 3 days | **4 hours** | compute14 | `bigmem` | ~1.5 TB RAM |
+| `rstudio` | 3 days | **8 hours** | compute00 | `bigmem` | Interactive RStudio |
+| `voila` | 3 days | **4 hours** | compute00 | `bigmem` | Voilà notebooks |
+
+!!! warning "Default wall time changed — jobs may time out"
+If you omit `--time`, your job now gets **4 hours** (general partitions) or **12 hours** (GPU partitions). Previously, jobs without `--time` silently inherited the 3-day maximum.
+
+**If your jobs are timing out**, add `--time` with a realistic estimate:
+
+```bash
+#SBATCH --time=8:00:00 # 8 hours
+#SBATCH --time=1-00:00:00 # 1 day
+```
+
+For jobs that need more than 3 days, use the `long` QoS (up to 7 days):
+
+```bash
+#SBATCH --qos=long
+#SBATCH --time=5-00:00:00 # 5 days
+```
+
+**Why the change?** Shorter default times dramatically improve scheduling. SLURM's backfill scheduler can only fit jobs into gaps if it knows when running jobs will end. A job with no `--time` previously looked like a 3-day job to the scheduler — even if it finished in 20 minutes — blocking other jobs from backfilling into the gap.
+
+!!! tip "Right-size your `--time` requests"
+Request about 20–30% more than your expected runtime. Use `seff <jobid>` to check how long past jobs actually took. Shorter time requests schedule faster via backfill.
 
 !!! note "Check current limits"
 Partition limits can change. Verify the current limits with:
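
As a rough illustration of the right-sizing tip added in this commit (request about 20–30% more than the expected runtime), the padding arithmetic can be sketched in bash. The helper name `pad_walltime`, the 25% default, and the sample inputs are hypothetical and not part of the commit; the output uses the `D-HH:MM:SS` form that `--time` accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the commit): pad an expected runtime,
# given in minutes, by a percentage and print it as D-HH:MM:SS.
pad_walltime() {
    local expected_min=$1
    local pad_pct=${2:-25}   # assumed default: 25% headroom
    # Integer ceiling of expected_min * (100 + pad_pct) / 100.
    local total_min=$(( (expected_min * (100 + pad_pct) + 99) / 100 ))
    local days=$(( total_min / 1440 ))
    local hours=$(( (total_min % 1440) / 60 ))
    local mins=$(( total_min % 60 ))
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

pad_walltime 360        # 6 h expected, 25% padding: prints 0-07:30:00
pad_walltime 2000 20    # ~33 h expected, 20% padding: prints 1-16:00:00
```

The printed value could then be used as the `--time` argument; the expected runtime itself is best taken from `seff <jobid>` output for comparable past jobs, as the tip suggests.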
