
feat: add --output-dir, --skip-plots, and summary.tsv for pipeline integration #31

Merged: marcoreverenna merged 2 commits into main from feat-batch-friendly-cli on Apr 12, 2026


@BioGeek BioGeek commented Apr 10, 2026

Collaborator

Summary

Add three features to make InstaNexus suitable for automated pipeline execution (e.g., Nextflow/INFlow):

1. --output-dir — Deterministic output paths

When set, all outputs go to the specified directory instead of the auto-generated folder-outputs/run_name/assembly_mode_params/ path. The auto-generated path remains the default for interactive use.

# Pipeline use — deterministic path
instanexus --input-csv psms.csv --output-dir assembly/ --assembly-mode dbg_weighted

# Interactive use — auto-generated path (unchanged)
instanexus --input-csv psms.csv --assembly-mode dbg_weighted
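
The path selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `resolve_output_dir`, its parameters, and the `outputs` base folder are all hypothetical stand-ins for whatever InstaNexus uses internally.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(
    output_dir: Optional[str],
    run_name: str,
    assembly_mode: str,
    params_tag: str,
    base: str = "outputs",  # hypothetical base folder, not taken from the PR
) -> Path:
    """Return the explicit directory if one was given, else an auto-generated one."""
    if output_dir is not None:
        # Pipeline use: the caller chooses the path, so it is deterministic.
        return Path(output_dir)
    # Interactive use: assumed layout mirroring base/run_name/assembly_mode_params.
    return Path(base) / run_name / f"{assembly_mode}_{params_tag}"
```

Keeping the decision in one small function means an orchestrator such as Nextflow can declare its expected output path up front, while interactive runs keep the existing layout.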

2. --skip-plots — Headless execution

Skip generating heatmap (seaborn) and logo (logomaker) SVG plots in the consensus step. These are slow and unnecessary for pipeline execution. Available on both the main instanexus CLI and the python -m instanexus.consensus module CLI.
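
One common way to wire such a flag is a boolean `argparse` option that guards the plotting step. The sketch below is an assumption about the general shape only: `build_parser` and `run_consensus` are hypothetical names, and the plotting calls are replaced by placeholder step labels so the example runs without seaborn or logomaker installed.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser exposing only the flag discussed here."""
    parser = argparse.ArgumentParser(prog="consensus-sketch")
    parser.add_argument(
        "--skip-plots",
        action="store_true",
        help="Skip heatmap and logo SVG generation (useful for headless runs).",
    )
    return parser


def run_consensus(skip_plots: bool) -> List[str]:
    """Return the names of the steps that would run (placeholders only)."""
    steps = ["consensus"]
    if not skip_plots:
        # In a real tool the plotting libraries would be imported here,
        # so headless runs never pay their import or rendering cost.
        steps += ["heatmap", "logo"]
    return steps
```

With this shape, `run_consensus(build_parser().parse_args(["--skip-plots"]).skip_plots)` would execute only the consensus step.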

3. summary.tsv — Stable summary file

Written to the experiment output directory at the end of the pipeline. Contains run metadata, output paths, and consensus statistics in a single TSV for easy downstream parsing by pipeline orchestrators.
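
For a downstream process, reading such a file could look like the sketch below. The PR description does not state the column layout of summary.tsv, so the two-column `key`/`value` layout and the helper names `write_summary` and `read_summary` are assumptions made purely for illustration.

```python
import csv
from pathlib import Path
from typing import Dict


def write_summary(path: Path, fields: Dict[str, str]) -> None:
    """Write fields as a two-column TSV (assumed layout: key, value)."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["key", "value"])
        for key, value in fields.items():
            writer.writerow([key, value])


def read_summary(path: Path) -> Dict[str, str]:
    """Read the assumed key/value TSV back into a dictionary."""
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["key"]: row["value"] for row in reader}
```

A single flat file like this lets an orchestrator pick up run metadata and output paths without globbing the output directory.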

Motivation

The INFlow pipeline (nf-core/denovoproteomics) wraps InstaNexus as separate Nextflow processes. It needs:

  • Deterministic output paths for Nextflow's file staging
  • Headless execution without matplotlib/seaborn plot generation
  • A stable summary artifact for downstream process consumption

Test plan

  • Both existing tests pass
  • instanexus --help shows new flags
  • Pre-commit hooks pass (ruff, codespell)

🤖 Generated with Claude Code

…ine use

Add three features for batch/pipeline integration (e.g., Nextflow):

1. --output-dir: Explicit output directory that overrides the auto-generated
   path (folder-outputs/run_name/params). When set, all outputs go to the
   specified directory with deterministic paths. The auto-generated path
   remains the default for interactive use.

2. --skip-plots: Skip generating heatmap and logo SVG plots in the consensus
   step. Plots use matplotlib/seaborn/logomaker, which are slow and unnecessary
   for headless pipeline execution. Available on both `instanexus` (main CLI)
   and `python -m instanexus.consensus` (module CLI).

3. summary.tsv: A stable summary file written to the experiment output directory
   at the end of the pipeline. Contains run metadata, paths, and consensus
   statistics in a single TSV for easy downstream parsing.

All 2 existing tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@BioGeek BioGeek requested a review from marcoreverenna April 10, 2026 14:06
The V2 handler clean_winnow_rescored() already checks multiple column
name candidates, but main() only checked 'preds' and
'prediction_untokenised'. Winnow outputs use 'prediction', which was
missed. Unified the column detection to match the V2 candidates list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
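
The column detection fix in the second commit amounts to checking one shared list of candidate names instead of two hard-coded ones. A minimal sketch follows, assuming only the three names mentioned in the commit message; the actual V2 candidates list, its order, and the helper name `find_prediction_column` are not shown in this PR and are hypothetical here.

```python
from typing import Iterable, Optional, Sequence

# Only the three names mentioned in the commit message; order is an assumption.
PREDICTION_COLUMN_CANDIDATES = ("preds", "prediction_untokenised", "prediction")


def find_prediction_column(
    columns: Iterable[str],
    candidates: Sequence[str] = PREDICTION_COLUMN_CANDIDATES,
) -> Optional[str]:
    """Return the first candidate present in columns, or None if none match."""
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    return None
```

Sharing one candidates tuple between the cleaning handler and main() avoids the two lists drifting apart again.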

@marcoreverenna marcoreverenna left a comment

Collaborator


Everything looks good! Thanks for contributing

@marcoreverenna marcoreverenna merged commit 8c551a6 into main Apr 12, 2026
2 checks passed
@marcoreverenna marcoreverenna deleted the feat-batch-friendly-cli branch April 12, 2026 09:54