Increase picard_markdup maxtime from 30 min to 4 h #48

Open
fkollingiv wants to merge 1 commit into master from fix/picard-markdup-maxtime
@fkollingiv
Contributor

Summary

The picard_markdup rule in Snakefile (line 293) has maxtime="30:00", which is too short for larger BAMs. Jobs that exceed it are killed by SLURM, and because the timeout goes undetected, the pipeline hangs silently.

What we observed

On a 12-sample mouse run (paired-end, HISAT2, ~5–6 GB sorted BAMs), 3 of 12 samples hit the 30-minute wall-time limit and were killed mid-write while MarkDuplicates was still streaming records (last logged positions on chr19 etc., 190M+ records written).

Because cluster_profile/config.yaml does not define a status script, the Snakemake driver never detected the SLURM TIMEOUT states, so restart-times: 5 was never triggered. The pipeline hung silently at 56/63 (89%) steps done, with no log activity, until it was investigated manually.

Change

-    resources: cpus="2", maxtime="30:00", mem_mb="20gb",
+    resources: cpus="2", maxtime="4:00:00", mem_mb="20gb",

4 h is comfortably above what we've seen on real data (picard_markdup jobs that succeeded on this run took 19–28 min) while leaving headroom for larger libraries. Users with small datasets can still override via --set-resources if they want tighter scheduling.
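
To make the headroom concrete, here is a minimal Python sketch that converts the two limits to minutes and compares the new one with the slowest successful job from this run (28 min). It assumes maxtime is interpreted the way SLURM reads a time limit, with "MM:SS" and "H:MM:SS" forms; to_minutes is a hypothetical helper and is not part of the pipeline.

```python
def to_minutes(limit: str) -> float:
    """Convert a 'MM:SS' or 'H:MM:SS' wall-time string to minutes (assumed SLURM-style)."""
    parts = [int(p) for p in limit.split(":")]
    if len(parts) == 2:
        hours = 0
        minutes, seconds = parts
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unsupported time format: {limit!r}")
    return hours * 60 + minutes + seconds / 60


old_limit = to_minutes("30:00")    # previous maxtime
new_limit = to_minutes("4:00:00")  # maxtime proposed in this PR
slowest_ok = 28                    # minutes, slowest successful markdup job reported above

print(f"old limit: {old_limit:.0f} min, new limit: {new_limit:.0f} min")
print(f"headroom over slowest successful job: {new_limit / slowest_ok:.1f}x")
```

Under these assumptions the old limit left only about 2 minutes of margin over the slowest successful job, which is consistent with larger BAMs tipping over it.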

Test plan

  • Re-ran the same dataset with maxtime="4:00:00" — the 3 previously-timed-out markdups complete in ~25–30 min
  • Confirm downstream picard_collectmetrics and multiqc still complete on the rerun
  • (Optional follow-up, separate PR) Add a status script to cluster_profile/ so SLURM TIMEOUT/FAILED states are surfaced to Snakemake and restart-times actually fires
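
For the optional follow-up, a status script could look roughly like the sketch below. It is not part of this PR and makes several assumptions: the file name cluster_profile/status.py is hypothetical, the profile's submit command prints only the SLURM job ID (for example via sbatch --parsable), sacct is available on the submit host, and the Snakemake version in use supports a cluster status script that is called with the job ID and prints success, running, or failed.

```python
#!/usr/bin/env python3
"""Hypothetical cluster_profile/status.py: report a SLURM job's state to Snakemake."""
import subprocess
import sys

# SLURM states treated as "still in progress"
RUNNING_STATES = {
    "PENDING", "CONFIGURING", "RUNNING", "COMPLETING",
    "SUSPENDED", "REQUEUED", "RESIZING",
}


def classify(state: str) -> str:
    """Map one sacct State value to success / running / failed."""
    # sacct can report e.g. "CANCELLED by 1234"; keep only the state name
    words = state.split()
    name = words[0].rstrip("+") if words else ""
    if name == "COMPLETED":
        return "success"
    if name in RUNNING_STATES or name == "":
        # an empty state usually means accounting has not caught up yet
        return "running"
    # TIMEOUT, FAILED, CANCELLED, OUT_OF_MEMORY, NODE_FAIL, ...
    return "failed"


def query_state(job_id: str) -> str:
    """Ask sacct for the job's state; return '' if nothing is recorded yet."""
    result = subprocess.run(
        ["sacct", "-j", job_id, "--format=State", "--noheader", "--parsable2"],
        capture_output=True, text=True, check=False,
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    # the first row is the job allocation itself; later rows are job steps
    return lines[0] if lines else ""


if __name__ == "__main__" and len(sys.argv) == 2:
    print(classify(query_state(sys.argv[1])))
```

With a script like this wired into the profile, a TIMEOUT would be reported as failed, which is what would let restart-times take effect. Note that a plain restart with the same maxtime would time out again, so the limit increase in this PR is still needed.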

🤖 Generated with Claude Code

The picard_markdup rule has maxtime="30:00", which is insufficient for
larger BAMs. In a 12-sample mouse run (paired, HISAT2, ~5-6 GB sorted
BAMs), 3 of 12 samples hit the 30-min wall and were killed mid-write
while MarkDuplicates was still streaming records.

Because cluster_profile/config.yaml does not define a status script,
the Snakemake driver did not detect the SLURM TIMEOUTs and never
exercised restart-times: 5 - the pipeline silently hung at 89% with no
log activity until manually investigated.

Bumping to 4:00:00 covers BAMs we've observed in practice (successful
runs took 19-28 min) with comfortable headroom; users can still tune
lower for small datasets via the resources override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>