Conversation
Code review (From Claude /code-review plugin using our best practices instruction md file)Found 5 issues:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
ClareRobin
left a comment
There was a problem hiding this comment.
My (human) review: (summary)
Biggest issue:
- example data (currently as is) can't easily be run (also all the samplesheets currently in examples are out of date) - it looks like the reads_chr9_1_1000000.fastq.gz files are 10x5v2, to be able to run the pipeline I had to:
- make a new samplesheet
- fix resource limits in nextflow config for profile local
- fix the staging clash
- fix sampleData (spatial) parameter in process BAMBU {}
Other (more minor issues):
- Various READMe comments (inconsistencies between README & actual pipline).
There was a problem hiding this comment.
Clare (human) comments :)
suggested to have a more comprehensive gitignore e.g.
.nextflow.log*
work/
results/
*.command.*
*.pyc
.DS_Store
.vscode/```
| ``` | ||
| nextflow run GoekeLab/bambu-singlecell-spatial \ | ||
| --reads $PWD/examples/samplesheet_basic.csv \ # See the arguments section for format specifications | ||
| nextflow run $PWD/bambu-singlecell-spatial \ |
There was a problem hiding this comment.
Clare (human) comment:
This example command in the README has contradictory $PWD references — $PWD/bambu-singlecell-spatial assumes you're in the parent directory, but $PWD/examples/... assumes you're inside the repo.
should either be:
nextflow run $PWD/bambu-singlecell-spatial \ --input $PWD/bambu-singlecell-spatial/examples/samplesheet.csv \ --genome $PWD/bambu-singlecell-spatial/examples/Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa \ --annotation $PWD/bambu-singlecell-spatial/examples/Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf \ -profile singularity,hpc
or
nextflow run . \ --input $PWD/examples/samplesheet.csv \ --genome $PWD/examples/Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa \ --annotation $PWD/examples/Homo_sapiens.GRCh38.91_chr9_1_1000000.gtf \ -profile singularity,hpc
(or run main.nf also works)
There was a problem hiding this comment.
yup I think run main.nf is the safest to go?
There was a problem hiding this comment.
Clare (human) comment:
Note - all the samplesheets currently in examples/ have incorrect headers for how the pipeline runs now (so need to be updated)
There was a problem hiding this comment.
Another note: the sample sheet that Clare generated for review uses the example data in the repo, where it only contains 10x 5' v2 ONT reads. Both files are actually also identical, which means that Bambu is most likely going to generate the same read class files for both.
Also, this makes it hard to validate if the pipeline still truly functions as intended when using other chemistries/technologies. Maybe an extra 1 to 2 samples can be sourced and included in the sample sheet to reflect the functionality?
Github Copilot Code Review (using our nextflow best practices github instruction md file)Findings:
Clare's (human) note on this feedback For issue 1: there are currently a docker and singularity profiles, this issue isn't a problem with either of those profiles, but if the user runs the pipeline without a docker or singularity profile (i.e. using locally installed software) the spaceranger filepath will fail. If we want to avoid the possible failure & make this unambiguous, we could add a startup validation in main.nf that exits unless docker or singularity is enabled. |
There was a problem hiding this comment.
Clare (human) comment:
As discussed - move all the bambu processes to a modules/bambu folder with .nf files for each command.
There was a problem hiding this comment.
Also - related, since bambu is changing, suggest you add a bambu_container param (or equivalent) so if there's a new bambu container you don't have to manually go into each .nf file & update the container
| ``` | ||
| nextflow run GoekeLab/bambu-singlecell-spatial \ | ||
| --reads $PWD/examples/samplesheet_basic.csv \ # See the arguments section for format specifications | ||
| nextflow run $PWD/bambu-singlecell-spatial \ |
There was a problem hiding this comment.
yup I think run main.nf is the safest to go?
|
|
||
| The BambuSC pipeline can be started from from raw reads (fastq.gz) or from a demultiplexed .bam file if you have already produced these (from earlier runs of this pipeline or other upstream tools). Therefore either the --reads or --bams argument is mandatory depending on your input files. | ||
| Executor profiles: | ||
| - `hpc`: execute pipeline on an HPC system |
There was a problem hiding this comment.
is it good to have a profile for 'local', or a note to 'leave blank for local' as well?
There was a problem hiding this comment.
Another note: the sample sheet that Clare generated for review uses the example data in the repo, where it only contains 10x 5' v2 ONT reads. Both files are actually also identical, which means that Bambu is most likely going to generate the same read class files for both.
Also, this makes it hard to validate if the pipeline still truly functions as intended when using other chemistries/technologies. Maybe an extra 1 to 2 samples can be sourced and included in the sample sheet to reflect the functionality?
| cellMixs = list() | ||
| source(file.path(bin_path,"/utilityFunctions.R")) | ||
| for(quantData in se){ | ||
| quantData.gene = transcriptToGeneExpression(quantData) |
There was a problem hiding this comment.
I am just putting a future comment related to my PR here. After it has been merged quantData is no longer a list of se object. You might need to change the code here for successful run. I can provide you the code.
Major Update of Bambu-Single Cell Pipeline
Summary of Changes Made: