Commit 8c72147

update README.md
1 parent debed83 commit 8c72147

1 file changed

Lines changed: 59 additions & 0 deletions

README.md

@@ -209,6 +209,65 @@ This feature is still under development and will be released in a future update.
This feature is still under development and will be released in a future update.

### **Advanced Usage**

**Stopping the Pipeline Early**

The `--early_stop_stage` parameter allows you to stop the pipeline at an intermediate stage and save the outputs for later use. This is useful when you want to inspect intermediate files or when you plan to re-run downstream steps separately.

- `--early_stop_stage bam`: Stops after genome alignment. BAM files are saved to `output/bam/`.
- `--early_stop_stage rds`: Stops after Bambu read class construction. Read class `.rds` files are saved to `output/read_class/`.

```bash
# Stop after read class construction
nextflow run $PWD/bambu-singlecell-spatial \
    --input samplesheet.csv \
    --genome reference.fa \
    --annotation reference.gtf \
    --early_stop_stage rds \
    -profile singularity,hpc
```
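To catch a mistyped stage name before launching a long run, a wrapper script can check the value first. The following is a minimal sketch, not part of the pipeline: `check_early_stop_stage` is a hypothetical helper, and it assumes that `bam` and `rds`, the two values documented above, are the only ones accepted. It would be called just before `nextflow run`, for example as `check_early_stop_stage "$STAGE" || exit 1`.

```bash
# Hypothetical helper: succeed only for a documented --early_stop_stage value.
# Assumption: bam and rds are the only valid stages.
check_early_stop_stage() {
    case "$1" in
        bam|rds)
            return 0
            ;;
        *)
            echo "unsupported --early_stop_stage value: '$1' (expected bam or rds)" >&2
            return 1
            ;;
    esac
}
```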
**Restarting from a Specific Stage**

Because the pipeline accepts FASTQ, BAM, or RDS files as input, you can restart from any intermediate stage by providing the corresponding files in your samplesheet. This avoids re-running expensive preprocessing and alignment steps when they have already been completed.

*Example: Incremental sample addition*

A common use case is to process an initial set of samples through to read class `.rds` files, then re-run the full pipeline once additional samples are available. Transcript discovery and quantification in Bambu are performed jointly across all samples, so when new samples are added, the existing samples only need to be re-run from the `.rds` stage onward.

**Step 1:** Run the first batch of samples from FASTQ to `.rds`:
```bash
nextflow run $PWD/bambu-singlecell-spatial \
    --input samplesheet_batch1.csv \
    --genome reference.fa \
    --annotation reference.gtf \
    --early_stop_stage rds \
    -profile singularity,hpc
```

This produces `output/read_class/sample1_readClassFile.rds`, `output/read_class/sample2_readClassFile.rds`, etc.

251+
**Step 2** — When new samples are ready, run all samples together from `.rds` for transcript discovery and quantification. Point the `path` column at the existing `.rds` files for the original samples and at the new FASTQ/BAM files for the new samples:
252+
253+
```csv
254+
sample,path,chemistry,technology
255+
sample1,output/read_class/sample1_readClassFile.rds,10x3v3,ONT
256+
sample2,output/read_class/sample2_readClassFile.rds,10x3v3,ONT
257+
sample3,path/to/sample3.fastq.gz,10x3v3,ONT
258+
```
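With many samples, the rows for the existing `.rds` files can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline: `rds_samplesheet_rows` is a hypothetical helper, and it assumes that every `*_readClassFile.rds` file in the given directory belongs in the run, that the file prefix is the sample name, and that all of these samples share one chemistry and one technology value. Its output goes under the header line, followed by hand-written rows for the new FASTQ or BAM samples.

```bash
# Hypothetical helper: print one samplesheet row per existing read class file.
# Usage: rds_samplesheet_rows READ_CLASS_DIR CHEMISTRY TECHNOLOGY
# Assumption: sample name is the file name minus the _readClassFile.rds suffix.
rds_samplesheet_rows() {
    rsr_dir="$1"
    rsr_chemistry="$2"
    rsr_technology="$3"
    for rsr_file in "$rsr_dir"/*_readClassFile.rds; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$rsr_file" ] || continue
        rsr_sample=$(basename "$rsr_file" _readClassFile.rds)
        printf '%s,%s,%s,%s\n' "$rsr_sample" "$rsr_file" "$rsr_chemistry" "$rsr_technology"
    done
}

# Example (commented out so the definition can be sourced safely):
# {
#     echo 'sample,path,chemistry,technology'
#     rds_samplesheet_rows output/read_class 10x3v3 ONT
#     echo 'sample3,path/to/sample3.fastq.gz,10x3v3,ONT'
# } > samplesheet_all.csv
```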
```bash
nextflow run $PWD/bambu-singlecell-spatial \
    --input samplesheet_all.csv \
    --genome reference.fa \
    --annotation reference.gtf \
    -profile singularity,hpc
```

The pipeline will skip preprocessing and alignment for `sample1` and `sample2`, process `sample3` from FASTQ through to `.rds`, and then perform transcript discovery and quantification jointly across all three samples.

### **Additional Information**
UMI correction is done at the barcode level. The longest read for each unique barcode-UMI combination is kept for analysis.
