Commit c05e250

Merge pull request #23 from Sydney-Informatics-Hub/dev-update-session-2
Merge small corrections into main for review.
- Included additional content in intro and lessons to tie together the new heading on the [home page](https://sydney-informatics-hub.github.io/customising-nfcore-workshop-2026/)
- Added a section on software versions to further boost the reproducibility theme
- Some minor fixes, e.g. lesson to Lesson
2 parents 5d933c9 + e20e02f commit c05e250

8 files changed

Lines changed: 111 additions & 614 deletions

docs/assets/2.3_github_issues.png

20.5 KB

docs/index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -36,7 +36,7 @@ Our full code of conduct, with incident reporting guidelines, is available [here
 |------------|----------|
 | [Set up your computer](./setup.md)| Follow these instructions to install VS Code and login to your Nectar VM. |
 | [Session 1: Introduction to nf-core](session_1/1.0_intro.md)| Learn fundamental ideas and skills that are essential for using Nextflow and nf-core workflows. |
-| [Session 2: Customising nf-core](session_2/2.0_intro.md)| Write, run, adjust, and re-run an nf-core workflow as we step through various customisation scenarios. |
+| [Session 2: Customising nf-core](session_2/2.0_intro.md)| Write, run, adjust, and re-run an nf-core workflow as we step through various customisation and troubleshooting scenarios. |

 ## Course survey

````

docs/session_2/2.0_intro.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -8,9 +8,9 @@ This session builds on [Session 1](../session_1/1.0_intro.md), where we explored
 - Nextflow `run` command starts the workflow
 - Nextflow `log` can be used to inspect run details

-At the end of session 1, we downloaded the [nf-core/rnaseq](https://nf-co.re/rnaseq/3.23.0) pipeline and submitted our first run.
+At the end of Session 1, we downloaded the [nf-core/rnaseq](https://nf-co.re/rnaseq/3.23.0) pipeline and submitted our first run.

-During Session 2, we will continue with this pipeline, using the same data from a [published mouse RNAseq study](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2801-4). Pipeline outputs and source code will be explored, and various customisations will be applied using parameters and configuration files to provide you with hands-on experience in nf-core pipeline customisation.
+During Session 2, we will continue with this pipeline, using the same data from a [published mouse RNAseq study](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2801-4). Pipeline outputs and source code will be explored, and various customisations will be applied using parameters and configuration files to provide you with hands-on experience in nf-core pipeline customisation. We will also learn about maintaining reproducibility and portability when running custom analyses.



````
````diff
@@ -32,7 +32,7 @@ During Session 2, we will continue with this pipeline, using the same data from
 2. Select the IP address for your VM from the drop-down list
 3. Type in your provided password and hit enter

-Change into the session 2 directory created in the previous session:
+Change into the `session2` directory created in the previous session:

 ```default
 cd session2
````

docs/session_2/2.1_params.md

Lines changed: 57 additions & 40 deletions
Large diffs are not rendered by default.

docs/session_2/2.2_config.md

Lines changed: 47 additions & 23 deletions
````diff
@@ -9,7 +9,7 @@

 ## 2.2.1 Separation of parameters and configurations

-In [lesson 2.1](./2.1_params.md), we explored how to customise a run with **nf-core pipeline parameters** on the command line, within a parameters file, or using a run script. In this lesson we will expand upon the Nextflow **configuration settings** introduced in [lesson 1.3](./2.1_params.md).
+In [Lesson 2.1](./2.1_params.md), we explored how to customise a run with **nf-core pipeline parameters** on the command line, within a parameters file, or using a run script. In this lesson we will expand upon the Nextflow **configuration settings** introduced in [Lesson 1.3](../session_1/1.3_configure.md).

 Pipeline parameters control *what* is run, where configurations control *how* it is run. Customising configurations can be an essential part of getting the workflow to run on your compute system, whether that be your local computer, a remote server or VM, cloud, or High Performance Computer (HPC).

````
````diff
@@ -102,35 +102,59 @@ The below exercise is designed to familiarise you with searching nf-core configu

 What are the default settings for CPU and memory for the STAR_ALIGN module?

-??? hint "Hint 1"
-Find the process label within the STAR_ALIGN `main.nf` file
-
-??? hint "Hint 2"
-Process labels are used in `conf/base.config` to assign default compute resources
-
-
+??? hint "Hint 1: Process labels"
+
+To uncover the default compute resources for the STAR_ALIGN process, we need to find out what **process label** has been assigned to this process.
+
+Process labels are used in `conf/base.config` to assign default compute resources.

-??? success "Solution"
+Find the process label within the STAR_ALIGN `main.nf` file, and check the resources assigned to that label within `conf/base.config`.

-To uncover the default compute resources for the STAR_ALIGN process, we need to find out what **process label** has been assigned to this process. Recall from [lesson 1.1.3](../session1/1.1_nfcore.md/#113-nf-core-workflow-structure) that each process (or 'module') has its own `main.nf` file which includes the Nextflow code to set up the task as well as the actual command to run the analysis.
+??? hint "Hint 2: STAR_ALIGN module script"
+
+Recall from [Lesson 1.1.3](../session1/1.1_nfcore.md/#113-nf-core-workflow-structure) that each process (or 'module') has its own `main.nf` file which includes the Nextflow code to set up the task as well as the actual command to run the analysis. The STAR_ALIGN process label will be included within this script.

 Finding the `main.nf` script for STAR_ALIGN or any other process can be a little tricky, since nf-core pipelines are a collection of workflows, subworkflows, and modules, that can be local (i.e. used only by) the pipeline, or those that are widely used on other nf-core pipelines.

-Applying the knowledge that all module scripts are named `main.nf`, we can't search for the file by name, but we can search for the tool name in the module filepath. nf-core filepaths use lower-case, while the process names themselves use capitals, such as STAR_ALIGN.
+Applying the knowledge that all nf-core module scripts are named `main.nf`, we can't search for the file by name, but we can search for the tool name in the module filepath. nf-core filepaths use lower-case, while the process names themselves use capitals, such as STAR_ALIGN.
+
+The bash command below will help you find the STAR_ALIGN `main.nf` file:

 ```bash
 find ./rnaseq/ -type d -name "*star*" -print
 ```

+??? success "Solution"
+
+Searching for `star` in the nf-core/rnaseq codebase yields the following output:
+
+
 ```console title="Output"
 ./rnaseq/modules/nf-core/sentieon/staralign
 ./rnaseq/modules/nf-core/star
 ./rnaseq/subworkflows/local/align_star
 ```

-The STAR alignment subworkflow script can be found at `./rnaseq/subworkflows/local/align_star/main.nf`, and the process script - which is called by the subworkflow - can be found at `./rnaseq/modules/nf-core/star/align/main.nf`. (We are not using the licensed 'sentieon' tools).
+The ALIGN_STAR subworkflow script can be found at `./rnaseq/subworkflows/local/align_star/main.nf`. This subworkflow calls the STAR_ALIGN module.
+
+We can find the STAR_ALIGN `main.nf` file either by looking into the `./rnaseq/modules/nf-core/star` directory *or* by viewing the subworkflow script:
+
+```bash
+head ./rnaseq/subworkflows/local/align_star/main.nf
+```
+
+```console title="Output"
+//
+// Alignment with STAR
+//
+include { SENTIEON_STARALIGN as SENTIEON_STAR_ALIGN } from '../../../modules/nf-core/sentieon/staralign/main'
+include { PARABRICKS_RNAFQ2BAM as PARABRICKS_RNA_FQ2BAM } from '../../../modules/nf-core/parabricks/rnafq2bam/main'
+include { STAR_ALIGN } from '../../../modules/nf-core/star/align'
+include { STAR_ALIGN as STAR_ALIGN_IGENOMES } from '../../../modules/nf-core/star/align'
+include { BAM_SORT_STATS_SAMTOOLS } from '../../nf-core/bam_sort_stats_samtools'
+```

-Now we have identified the process code, we can discover its label, for example with `more` or `grep:
+The process label can then be extracted from the `modules/nf-core/star/align/main.nf` file, for example with `more` or `grep:

 ```bash
 grep label rnaseq/modules/nf-core/star/align/main.nf
````
````diff
@@ -140,7 +164,7 @@ The below exercise is designed to familiarise you with searching nf-core configu
 label 'process_high'
 ```

-Finally, identify the resources for STAR_ALIGN from the base config:
+The `process_high` label within the `conf/base.config` shows us that the STAR_ALIGN process will receive 12 CPU and 72 GB memory by default:

 ```bash
 more rnaseq/conf/base.config
````
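
The label lookup walked through in these hunks (module script, then label, then `conf/base.config`) can be sketched end to end in bash. This is a minimal sketch that runs against a throwaway stand-in tree, not the real pipeline: the temporary directory, the trimmed `main.nf`, and the two-label `base.config` are assumptions made for illustration, and only the 12 CPU and 72 GB figures come from the lesson text.

```shell
# Build a throwaway stand-in for the pipeline tree (hypothetical, trimmed-down files).
demo=$(mktemp -d)
mkdir -p "$demo/rnaseq/modules/nf-core/star/align" "$demo/rnaseq/conf"

cat > "$demo/rnaseq/modules/nf-core/star/align/main.nf" <<'EOF'
process STAR_ALIGN {
    label 'process_high'
}
EOF

cat > "$demo/rnaseq/conf/base.config" <<'EOF'
process {
    withLabel:process_medium {
        cpus   = { 6     * task.attempt }
        memory = { 36.GB * task.attempt }
    }
    withLabel:process_high {
        cpus   = { 12    * task.attempt }
        memory = { 72.GB * task.attempt }
    }
}
EOF

# Step 1: extract the label name from the module script.
label=$(sed -n "s/^[[:space:]]*label[[:space:]]*'\([^']*\)'.*/\1/p" \
    "$demo/rnaseq/modules/nf-core/star/align/main.nf" | head -n 1)

# Step 2: print the matching withLabel line plus the two lines that follow it.
resources=$(grep -A 2 "withLabel:${label}" "$demo/rnaseq/conf/base.config")

echo "label: $label"
echo "$resources"
```

Against the real checkout, the same two steps would point at `rnaseq/modules/nf-core/star/align/main.nf` and `rnaseq/conf/base.config`. The fixed `-A 2` window is a simplification that only suits the short stand-in block above.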
````diff
@@ -156,7 +180,7 @@ The below exercise is designed to familiarise you with searching nf-core configu

 ## 2.2.3 When to use a custom config file

-In [lesson 1.4.3](../session_1/1.4_rnaseq.md/#143-run-the-pipeline), we applied custom configurations to the rnaseq pipeline to restrict the maximum amount of CPUs and memory each process can use with the custom config we created. As observed when we attempted to run the pipeline before adding that configuration, this customisation was required in order to run the pipeline in our environment.
+In [Lesson 1.4.3](../session_1/1.4_rnaseq.md/#143-run-the-pipeline), we applied custom configurations to the rnaseq pipeline to restrict the maximum amount of CPUs and memory each process can use with the custom config we created. As observed when we attempted to run the pipeline before adding that configuration, this customisation was required in order to run the pipeline in our environment.

 Apart from reducing resources to adapt to a low-resource compute environment, there are other circumstances in which our nf-core pipeline run can benefit from custom configurations:

````
````diff
@@ -168,13 +192,13 @@ Apart from reducing resources to adapt to a low-resource compute environment, th
 - Customise outputs beyond what is possible using the nf-core pipeline parameters


-The rest of lesson 2.2 will explore custom resource configuration files, while lesson 2.3 will focus on customising outputs. We won't be covering customising runs for HPC in this workshop, but please check out our [tips and tricks page](../tips_tricks.md) later if you are interested in this, as well as the section below on institutional configs for HPC and other platforms.
+The rest of Lesson 2.2 will explore custom resource configuration files. We won't be covering customising runs for HPC in this workshop, but please check out our [Tips and Tricks page](../tips_tricks.md) later if you are interested in this, as well as the section below on institutional configs for HPC and other platforms.

 ## 2.2.4 Configuration profiles and shared configs

-In [lesson 1.4](../session_1/1.4_rnaseq.md#setting-resource-limits) we started developing a custom config for our workshop Nectar VMs. We applied this config to our run using the Nextflow `-c <myconfig>` parameter.
+In [Lesson 1.4](../session_1/1.4_rnaseq.md#setting-resource-limits) we started developing a custom config for our workshop Nectar VMs. We applied this config to our run using the Nextflow `-c <myconfig>` parameter.

-Custom configurations can also be included as a `profile`, just as we did for the MultiQC report configuration in [lesson 1.3.6](../session_1/1.3_configure.md/#custom-profiles). Profiles are the way in which nf-core's global community-driven shared institutional configs, introduced in [lesson 1.3.5](../session1/1.3_configure.md#135-shared-configuration-files), can be applied to your pipeline runs on any of the platforms included in the shared config collection.
+Custom configurations can also be included as a `profile`, just as we did for the MultiQC report configuration in [Lesson 1.3.6](../session_1/1.3_configure.md/#custom-profiles). Profiles are the way in which nf-core's global community-driven shared institutional configs, introduced in [Lesson 1.3.5](../session1/1.3_configure.md#135-shared-configuration-files), can be applied to your pipeline runs on any of the platforms included in the shared config collection.

 We recommend you use the [NCI Gadi shared config](https://nf-co.re/configs/nci_gadi) or [Pawsey Setonix shared config](https://nf-co.re/configs/pawsey_setonix) if you run nf-core pipelines on these national HPCs.

````
````diff
@@ -223,7 +247,7 @@ As we continue customising our run on the workshop VMs, it makes sense to **defi
 Edit `nectar_vm.config` to define a profile named 'workshop', including the `resourceLimits` directive previously applied.

 ??? hint "Hint 1"
-Check the [Nextflow profiles scope docs](https://docs.seqera.io/nextflow/config#config-profiles) or revisit [lesson 1.3](../session_1/1.3_configure.md/#custom-profiles) for syntax guidance.
+Check the [Nextflow profiles scope docs](https://docs.seqera.io/nextflow/config#config-profiles) or revisit [Lesson 1.3](../session_1/1.3_configure.md/#custom-profiles) for syntax guidance.

 ??? hint "Hint 2 "
 Check the solution of [Exercise 1.4.2.4](../session_1/1.4_rnaseq.md/#setting-resource-limits)
````
````diff
@@ -264,7 +288,7 @@ We can simplify this slightly by dropping the singularity profile from the run c
 Other commonly observed options are:

 - `autoMounts` to allow Nextflow to automatically mount host paths when a container is executed (default: true since v 23.10.0)
-- `cacheDir` to specify the directory Singularity cache directory. We have set this within our user profile in [lesson 1.2.2](../session_1/1.2_run.md/#122-managing-your-environment) so it is not required here.
+- `cacheDir` to specify the directory Singularity cache directory. We have set this within our user profile in [Lesson 1.2.2](../session_1/1.2_run.md/#122-managing-your-environment) so it is not required here.

 <br>

````
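
The two options listed in this hunk can be illustrated with a short bash sketch that writes a `singularity` scope to a scratch config. It is a sketch under assumptions, not the lesson's own file: the file name `singularity_demo.config` and the cache path are placeholders, and it assumes the "user profile" setting mentioned above is the `NXF_SINGULARITY_CACHEDIR` environment variable, which the source does not confirm.

```shell
demo=$(mktemp -d)
config="$demo/singularity_demo.config"   # hypothetical file name

# Only add cacheDir when the environment does not already provide a cache location.
if [ -n "${NXF_SINGULARITY_CACHEDIR:-}" ]; then
    cache_line="    // cacheDir not needed: NXF_SINGULARITY_CACHEDIR is already exported"
else
    cache_line="    cacheDir   = '$demo/singularity_cache'   // placeholder path"
fi

# Write the scope; the unquoted heredoc lets the shell expand ${cache_line}.
cat > "$config" <<EOF
singularity {
    enabled    = true
    autoMounts = true
${cache_line}
}
EOF

cat "$config"
```

A file like this would be passed to a run with `-c`, in the same way as the custom config described in the lesson.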

````diff
@@ -458,7 +482,7 @@ In addition to `withLabel`, Nextflow also provides the `withName` process select

 <br>

-To utilise `withName`, we first need to ensure we have the correct and specific process name. For utmost specifity, the 'fully qualified name' is safest.
+To utilise `withName`, we first need to ensure we have the correct and specific process name. For utmost specificity, the 'fully qualified name' is safest.

 In nf-core pipelines, the fully qualified process name, also referred to as the **process execution path**, is built from the pipeline name, one or more workflows or subworkflows, and the final process name. For example:

````
````diff
@@ -535,9 +559,9 @@ We now expect to see:

 <br>

-⏲️ It is unlikely that our run will complete much faster, if at all, with our applied customisations. If you review the process execution times within the pipeline_info files, you'll note that even our bottleneck process completed in ~ 1 minute, with most others requiring only seconds.
+⏲️ If you review the process execution times within the `pipeline_info` files, you'll see that even our bottleneck process completed in ~ 1 minute, with most others requiring only seconds. For this demonstration, it is therefore unlikely that our run will complete much faster following our customisations for this small dataset on these workshop VMs.

-💻 The resource customisations we have applied in this lesson are trivial given the limitations of our workshop VMs. Consider how this approach can be really powerful when working on HPC or cloud infrastructures, where the [`executor`](https://docs.seqera.io/nextflow/executor) and [`queue`](https://docs.seqera.io/nextflow/reference/process#queue) directives enable you to take full advantage of the compute resources available on your platform.
+💻 Consider how this approach can be really powerful when working on HPC or cloud infrastructures, where the [`executor`](https://docs.seqera.io/nextflow/executor) and [`queue`](https://docs.seqera.io/nextflow/reference/process#queue) directives enable you to take full advantage of the compute resources available on your platform.

 <br>

````
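
As a rough illustration of the `executor` and `queue` directives mentioned in the final paragraph of this hunk, the bash sketch below writes a scratch config that pairs them with a `withLabel` selector. Everything site-specific is an assumption: the file name `hpc_demo.config`, the `slurm` scheduler choice, and the queue names `work` and `highmem` are placeholders, not values from the workshop.

```shell
demo=$(mktemp -d)
config="$demo/hpc_demo.config"   # hypothetical file name

# Quoted heredoc: the config text is written exactly as shown, with no shell expansion.
cat > "$config" <<'EOF'
process {
    executor = 'slurm'   // placeholder: use your site's scheduler
    queue    = 'work'    // placeholder queue name

    withLabel:process_high {
        queue = 'highmem'   // placeholder queue for large jobs
    }
}
EOF

# List the lines that set a queue, with their line numbers in the config.
grep -n "queue" "$config"
```

Here the process-wide `queue` acts as the default, and the `withLabel` block overrides it only for processes carrying the `process_high` label, which mirrors how the lesson uses labels to assign resources.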

docs/session_2/2.3_custom.md

Lines changed: 1 addition & 9 deletions
````diff
@@ -1,9 +1 @@
-# Further customisations
-
-
-
-
-
-
-
-this is getting a bit long, let's wrap this up into a profile to keep things clean
+# Customisation in action
````
