
V2 #25

Open
ch99l wants to merge 53 commits into devel from v2

Conversation

ch99l (Collaborator) commented Apr 15, 2026

Major Update of Bambu-Single Cell Pipeline

Summary of Changes Made:

  1. Support for new single cell and spatial 10x kits
  2. Refactor main.nf script (move process-specific code to modules/subworkflows)
  3. Create assets directory (10x_config) to store static configuration files
  4. Update nextflow.config file
  5. Include new processes in pipeline to improve fastq processing (chopper and cutadapt)
  6. Fix Nextflow parallel processing logic
  7. Update README.md file

ClareRobin commented Apr 15, 2026

Code review (from the Claude /code-review plugin, using our best-practices instruction .md file)

Found 5 issues:

  1. All containers use :latest tags — reproducibility is not guaranteed (CLAUDE.md says "Always pin to an explicit version — never use latest")

    Affects every module: bambu-pipe-r:latest, bambu-pipe-preprocess:latest, bambu-pipe-alignment:latest. Replace with pinned digest or immutable tag, e.g. via Seqera/Wave for CLI tools and a digest for GHCR images.

    publishDir "$params.output_dir", mode: 'copy', pattern: '*extended_annotations.gtf'
    container "ghcr.io/ch99l/bambu-pipe-r:latest"
    label "medium_cpu"
    label "high_mem"
    label "medium"
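
A minimal sketch of the requested fix, pinning by digest instead of `:latest` (the digest below is a placeholder, not a real value):

```nextflow
publishDir "$params.output_dir", mode: 'copy', pattern: '*extended_annotations.gtf'
// immutable reference: a digest (or a versioned tag) instead of :latest
container "ghcr.io/ch99l/bambu-pipe-r@sha256:<digest>"
label "medium_cpu"
label "high_mem"
label "medium"
```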

  2. Processes are defined inside subworkflow files rather than imported from separate module files (CLAUDE.md says "Do not define processes inside subworkflows — import from modules only")

    subworkflows/alignment.nf defines MINIMAP_BUILD_INDEX, PAFTOOLS_GFF2BED, and MINIMAP_ALIGNMENT inline. subworkflows/prepare_input_standard.nf defines EXTRACT_10X_BARCODES and EXTRACT_10X_SPATIAL_COORDINATES inline. Each should live in its own modules/<tool>/main.nf.

    process MINIMAP_BUILD_INDEX {
        container "ghcr.io/ch99l/bambu-pipe-alignment:latest"
        label "low_cpu"
        label "medium_mem"
        label "short"

        input:
        path(genome)
        val(fastq_count)

        when: fastq_count > 0 // only build index if there are fastq samples to process

        output:
        path('ref.mmi')

        script:
        """
        minimap2 -k15 -w5 -d ref.mmi $genome # -k and -w flags are used for both splice:hq and splice presets
        """
    }
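
A hedged sketch of the requested split (module paths follow the `modules/<tool>/main.nf` convention named in CLAUDE.md; the exact workflow wiring below is an assumption):

```nextflow
// modules/minimap_build_index/main.nf — the process body above moves here unchanged

// subworkflows/alignment.nf — the subworkflow only imports and wires modules
include { MINIMAP_BUILD_INDEX } from '../modules/minimap_build_index/main.nf'
include { PAFTOOLS_GFF2BED }    from '../modules/paftools_gff2bed/main.nf'
include { MINIMAP_ALIGNMENT }   from '../modules/minimap_alignment/main.nf'

workflow ALIGNMENT {
    take:
    genome
    fastq_count

    main:
    index = MINIMAP_BUILD_INDEX(genome, fastq_count)
}
```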

  3. No process emits versions.yml (CLAUDE.md says "Every process must emit versions.yml alongside its real outputs, using explicit emit: names")

    None of the new or modified modules (BAMBU, BAMBU_EM, BAMBU_CONSTRUCT_READ_CLASS, BAMBU_PREPARE_ANNOTATION, PREPROCESS_FASTQ, SEURAT_CLUSTERING, MINIMAP_ALIGNMENT, etc.) emit a versions.yml file. This blocks version tracking and aggregation.

    output:
    path ('*quantData.rds'), emit: quant_data
    path ('*extended_annotations.rds'), emit: extended_annotations
    path ('*extended_annotations.gtf'), emit: extended_annotations_gtf
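
One common way to satisfy this (nf-core style heredoc; the exact version command is an assumption about what is available in the container):

```nextflow
output:
path('*quantData.rds'), emit: quant_data
path('*extended_annotations.rds'), emit: extended_annotations
path('*extended_annotations.gtf'), emit: extended_annotations_gtf
path('versions.yml'), emit: versions

script:
"""
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bambu: \$(Rscript -e 'cat(as.character(packageVersion("bambu")))')
END_VERSIONS
"""
```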

  4. timeline, report, trace, and dag are not enabled in nextflow.config (CLAUDE.md says "nextflow.config must enable timeline, report, trace, dag")

    The new nextflow.config adds params, process, and profiles blocks but omits the four reporting directives entirely.

    params {
        // Mandatory input
        input = null // Path to samplesheet .csv file
        genome = null // Path to .fa or .fasta file
        annotation = null // Path to .gtf or .gff file
        // Optional: Output directory
        output_dir = "output" // Path to output directory
        /*
        Optional: Samplesheet settings (Non Visium HD samples only)
        Note: Use this if all samples share the same chemistry/technology
        */
        chemistry = null // Examples: "10x3v2", "10x3v3", "10x5v2", "visium-v1"
        technology = null // Options: "ONT", "PacBio"
        // Optional: Early termination
        early_stop_stage = null // Options: "rds", "bam"
        // Optional: Q-score filtering
        qscore_filtering = true // boolean
        // Optional: Bambu parameters
        ndr = null // null or float
        deduplicate_umis = true // boolean
        // Optional: Quantification mode
        quantification_mode = "EM_clusters" // Options: "no_quant", "EM", "EM_clusters"
        // Optional: Seurat clustering
        resolution = 0.8 // float
        // Development parameters (DO NOT EDIT)
        bambu_path = null
        valid_chemistries = ['10x3v2', '10x3v3', '10x3v4', '10x5v2', '10x5v3', 'visium-v1', 'visium-v2', 'visium-v3', 'visium-v4', 'visium-v5']
        valid_technologies = ['ONT', 'PacBio']
        valid_quantification_modes = ['no_quant', 'EM', 'EM_clusters']
        valid_early_stop_stages = ['rds', 'bam', null]
        save_intermediates = false
        qfilter_threshold = 10
        flexiplex_f_5prime = 8
        flexiplex_f_3prime = 13
        flexiplex_e = 1
        process_by_chromosome = true
        fusion_mode = false
        jaffal_ref_dir = null
        jaffal_code_dir = "$projectDir/jaffal"
        cellranger_dir = "/opt/spaceranger-4.0.1/lib/python/cellranger/barcodes"
    }
    process {
        // Retry strategy: up to 3 attempts, doubling memory and adding CPUs each retry
        maxRetries = 3
        errorStrategy = { task.exitStatus in [130, 137, 139, 143] ? 'retry' : 'finish' }
        // CPU Labels (scale up with retries)
        withLabel: 'low_cpu' { cpus = { 4 * (1 + task.attempt) } }
        withLabel: 'medium_cpu' { cpus = { 16 * task.attempt } }
        withLabel: 'high_cpu' { cpus = { Math.min(32 * task.attempt, 128) } }
        // Memory Labels (double each retry)
        withLabel: 'low_mem' { memory = { 16.GB * Math.pow(2, task.attempt - 1) } }
        withLabel: 'medium_mem' { memory = { 64.GB * Math.pow(2, task.attempt - 1) } }
        withLabel: 'high_mem' { memory = { 128.GB * Math.pow(2, task.attempt - 1) } }
        // Time Labels (increase with retries)
        withLabel: 'short' { time = { 1.h * task.attempt } }
        withLabel: 'medium' { time = { 4.h * task.attempt } }
        withLabel: 'long' { time = { 12.h * task.attempt } }
        // CPU, Memory, and Time allocation for Bambu (Modify if necessary, especially when using pipeline for multiple samples)
        withName: 'BAMBU' {
            cpus = { 16 * task.attempt }
            memory = { 128.GB * Math.pow(2, task.attempt - 1) }
            time = { 4.h * task.attempt }
        }
        withName: 'BAMBU_EM' {
            cpus = { 16 * task.attempt }
            memory = { 128.GB * Math.pow(2, task.attempt - 1) }
            time = { 12.h * task.attempt } // increase time allocation if params.quantification_mode = EM
        }
    }
    profiles {
        // Container profiles
        singularity {
            singularity.enabled = true
            singularity.autoMounts = true
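
For reference, the four missing directives could look like this at the top level of nextflow.config (the output paths are suggestions, not the repo's convention):

```nextflow
timeline {
    enabled = true
    file    = "${params.output_dir}/pipeline_info/timeline.html"
}
report {
    enabled = true
    file    = "${params.output_dir}/pipeline_info/report.html"
}
trace {
    enabled = true
    file    = "${params.output_dir}/pipeline_info/trace.txt"
}
dag {
    enabled = true
    file    = "${params.output_dir}/pipeline_info/dag.html"
}
```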

  5. NDR interpolation works only by coincidence, and bambu is installed from a moving branch (reproducibility bug)

    In main.nf, def ndr = params.ndr ?: 'NULL' produces the Groovy string "NULL". When interpolated into the Rscript block as NDR = $ndr, R sees NDR = NULL, which parses as R's NULL only because the bare word NULL is an R keyword; the fallback happens to work, but by coincidence rather than by design. The clearer reproducibility problem is that the R Dockerfile installs bambu from the branch reference GoekeLab/bambu@devel_pre_v4 rather than a pinned commit or tag, so the installed version is not reproducible. CLAUDE.md says "Pin... all package versions".

    # install Seurat Object (v5.3.0), Seurat (v5.4.0), and Bambu
    RUN R -e "install.packages(c('pak', 'devtools', 'BiocManager'), repos='https://cloud.r-project.org')"
    RUN R -e "pak::pkg_install(c('SeuratObject@5.3.0', 'Seurat@5.4.0', 'GoekeLab/bambu@devel_pre_v4'))"
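
A hedged sketch of the pinning fix (`<commit-sha>` is a placeholder; substitute the actual commit or a release tag):

```dockerfile
# pin bambu to an immutable reference instead of the moving devel_pre_v4 branch
RUN R -e "pak::pkg_install(c('SeuratObject@5.3.0', 'Seurat@5.4.0', 'GoekeLab/bambu@<commit-sha>'))"
```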

🤖 Generated with Claude Code


ClareRobin left a comment


My (human) review: (summary)

Biggest issue:

  • example data (as currently shipped) can't easily be run (and all the samplesheets currently in examples/ are out of date) - it looks like the reads_chr9_1_1000000.fastq.gz files are 10x5v2; to be able to run the pipeline I had to:
  1. make a new samplesheet
  2. fix resource limits in nextflow config for profile local
  3. fix the staging clash
  4. fix sampleData (spatial) parameter in process BAMBU {}

Other (more minor issues):

  • Various README comments (inconsistencies between the README & the actual pipeline).

Comment thread .gitignore


Clare (human) comments :)
Suggested having a more comprehensive .gitignore, e.g.:

```
.nextflow.log*
work/
results/
*.command.*
*.pyc
.DS_Store
.vscode/
```

Comment thread README.md
```
nextflow run GoekeLab/bambu-singlecell-spatial \
--reads $PWD/examples/samplesheet_basic.csv \ # See the arguments section for format specifications
nextflow run $PWD/bambu-singlecell-spatial \
```


Clare (human) comment:
This example command in the README has contradictory $PWD references — $PWD/bambu-singlecell-spatial assumes you're in the parent directory, but $PWD/examples/... assumes you're inside the repo.

It should be either:

```
nextflow run $PWD/bambu-singlecell-spatial \
    --input $PWD/bambu-singlecell-spatial/examples/samplesheet.csv \
    --genome $PWD/bambu-singlecell-spatial/examples/Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa \
    --annotation $PWD/bambu-singlecell-spatial/examples/Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf \
    -profile singularity,hpc
```

or

```
nextflow run . \
    --input $PWD/examples/samplesheet.csv \
    --genome $PWD/examples/Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa \
    --annotation $PWD/examples/Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf \
    -profile singularity,hpc
```

(or run main.nf also works)



yup I think run main.nf is the safest to go?

Comment thread subworkflows/alignment.nf Outdated
Comment thread nextflow.config
Comment thread README.md


Clare (human) comment:
Note - all the samplesheets currently in examples/ have incorrect headers for how the pipeline runs now (so they need to be updated)



Another note: the sample sheet that Clare generated for review uses the example data in the repo, which only contains 10x 5' v2 ONT reads. The two files are actually identical, which means Bambu is most likely going to generate the same read class files for both.

Also, this makes it hard to validate whether the pipeline still truly functions as intended when using other chemistries/technologies. Maybe an extra 1 to 2 samples could be sourced and included in the sample sheet to exercise that functionality?

Comment thread main.nf
Comment thread modules/bambu.nf
Comment thread modules/bambu.nf
Comment thread nextflow.config

ClareRobin commented Apr 16, 2026

GitHub Copilot code review (using our Nextflow best-practices GitHub instruction .md file)

Findings:

  1. High: barcode extraction is hard-wired to a path that does not exist in the declared spaceranger container, so the pipeline dies before any real sample processing starts. The failure is in subworkflows/prepare_input_standard.nf, with the path coming from nextflow.config. I reproduced this with a stub-run: EXTRACT_10X_BARCODES failed trying to copy 737K-august-2016.txt from /opt/spaceranger-4.0.1/lib/python/cellranger/barcodes, which is not present in that container. As written, this blocks all chemistries.

  2. High: multi-sample non-visium runs will hit a Nextflow input file collision in BAMBU. For non-visium chemistries, subworkflows/prepare_input_standard.nf creates the same placeholder spatial file name per chemistry and attaches that same path to every sample. Later, main.nf collects all of those paths into one list and passes them to modules/bambu.nf. Nextflow rejects duplicate file names within a single path-list input, so two 10x3v2 samples will fail before BAMBU starts.
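
One possible fix, assuming the placeholder files must stay: have the BAMBU input stage them under unique names so the file names no longer collide (the input name spatial_files is illustrative):

```nextflow
input:
// stageAs with a wildcard renames each incoming file (spatial_01.txt, spatial_02.txt, ...)
path(spatial_files, stageAs: 'spatial_??.txt')
```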

  3. Medium: the branch's own documented and shipped example inputs no longer match the parser contract, so the advertised smoke-test path is broken. The parser in subworkflows/prepare_input_standard.nf now requires a samplesheet with a path column, but the bundled examples still use fastq or bam columns in examples/samplesheet_basic.csv, examples/samplesheet_bam_example.csv, and examples/samplesheet_custom_example.csv. README.md also tells users to run a non-existent sample sheet and points to a non-existent template. I verified that a stub-run against the bundled example data now fails immediately with "Samplesheet is missing a required path column."

Clare's (human) note on this feedback

For issue 1: there are currently docker and singularity profiles; this issue isn't a problem with either of those, but if the user runs the pipeline without a docker or singularity profile (i.e. using locally installed software), the spaceranger file path will fail. If we want to avoid this possible failure and make it unambiguous, we could add a startup validation in main.nf that exits unless docker or singularity is enabled.
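
A minimal sketch of that startup guard (placement and wording are suggestions):

```nextflow
// main.nf — fail fast when no container engine is active
if (!workflow.containerEngine) {
    error "This pipeline expects -profile docker or -profile singularity; local execution lacks the spaceranger barcode files."
}
```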

Comment thread modules/bambu.nf


Clare (human) comment:
As discussed - move all the bambu processes to a modules/bambu folder with .nf files for each command.



Also - related: since bambu is changing, I suggest adding a bambu_container param (or equivalent) so that if there's a new bambu container you don't have to manually go into each .nf file and update the container.
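
A sketch of that parameter (names are suggestions):

```nextflow
// nextflow.config — single source of truth for the bambu container
params.bambu_container = 'ghcr.io/ch99l/bambu-pipe-r:<pinned-tag>'

// modules/bambu/main.nf — every bambu process references the param
process BAMBU {
    container params.bambu_container
    // ...
}
```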

hafiz-ismail left a comment


Mostly general comments


Comment thread README.md

The BambuSC pipeline can be started from raw reads (fastq.gz) or from a demultiplexed .bam file if you have already produced one (from earlier runs of this pipeline or other upstream tools). Therefore either the --reads or the --bams argument is mandatory, depending on your input files.
Executor profiles:
- `hpc`: execute pipeline on an HPC system


Is it good to have a profile for 'local', or a note to 'leave blank for local' as well?


cellMixs = list()
source(file.path(bin_path,"/utilityFunctions.R"))
for(quantData in se){
quantData.gene = transcriptToGeneExpression(quantData)
lingminhao (Collaborator) commented Apr 16, 2026


I am just putting a future comment related to my PR here. After it has been merged, quantData will no longer be a list of se objects. You might need to change the code here for a successful run. I can provide you with the code.

