Merge pull request #23 from Sydney-Informatics-Hub/dev-update-session-2
Merge small corrections into main for review.
- Included additional content in intro and lessons to tie together the new heading on the [home page](https://sydney-informatics-hub.github.io/customising-nfcore-workshop-2026/)
- Added a section on software versions to further boost the reproducibility theme
- Some minor fixes, e.g. "lesson" to "Lesson"
docs/index.md
Lines changed: 1 addition & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -36,7 +36,7 @@ Our full code of conduct, with incident reporting guidelines, is available [here
36
36
|------------|----------|
37
37
|[Set up your computer](./setup.md)| Follow these instructions to install VS Code and login to your Nectar VM. |
38
38
|[Session 1: Introduction to nf-core](session_1/1.0_intro.md)| Learn fundamental ideas and skills that are essential for using Nextflow and nf-core workflows. |
39
-
|[Session 2: Customising nf-core](session_2/2.0_intro.md)| Write, run, adjust, and re-run an nf-core workflow as we step through various customisation scenarios. |
39
+
|[Session 2: Customising nf-core](session_2/2.0_intro.md)| Write, run, adjust, and re-run an nf-core workflow as we step through various customisation and troubleshooting scenarios. |
docs/session_2/2.0_intro.md
Lines changed: 3 additions & 3 deletions
@@ -8,9 +8,9 @@ This session builds on [Session 1](../session_1/1.0_intro.md), where we explored
8
8
- Nextflow `run` command starts the workflow
9
9
- Nextflow `log` can be used to inspect run details
10
10
11
-
At the end of session 1, we downloaded the [nf-core/rnaseq](https://nf-co.re/rnaseq/3.23.0) pipeline and submitted our first run.
11
+
At the end of Session 1, we downloaded the [nf-core/rnaseq](https://nf-co.re/rnaseq/3.23.0) pipeline and submitted our first run.
12
12
13
-
During Session 2, we will continue with this pipeline, using the same data from a [published mouse RNAseq study](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2801-4). Pipeline outputs and source code will be explored, and various customisations will be applied using parameters and configuration files to provide you with hands-on experience in nf-core pipeline customisation.
13
+
During Session 2, we will continue with this pipeline, using the same data from a [published mouse RNAseq study](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2801-4). Pipeline outputs and source code will be explored, and various customisations will be applied using parameters and configuration files to provide you with hands-on experience in nf-core pipeline customisation. We will also learn about maintaining reproducibility and portability when running custom analyses.
14
14
15
15
16
16
@@ -32,7 +32,7 @@ During Session 2, we will continue with this pipeline, using the same data from
32
32
2. Select the IP address for your VM from the drop-down list
33
33
3. Type in your provided password and hit enter
34
34
35
-
Change into the session 2 directory created in the previous session:
35
+
Change into the `session2` directory created in the previous session:
docs/session_2/2.2_config.md
Lines changed: 47 additions & 23 deletions
@@ -9,7 +9,7 @@
9
9
10
10
## 2.2.1 Separation of parameters and configurations
11
11
12
-
In [lesson 2.1](./2.1_params.md), we explored how to customise a run with **nf-core pipeline parameters** on the command line, within a parameters file, or using a run script. In this lesson we will expand upon the Nextflow **configuration settings** introduced in [lesson 1.3](./2.1_params.md).
12
+
In [Lesson 2.1](./2.1_params.md), we explored how to customise a run with **nf-core pipeline parameters** on the command line, within a parameters file, or using a run script. In this lesson we will expand upon the Nextflow **configuration settings** introduced in [Lesson 1.3](../session_1/1.3_configure.md).
13
13
14
14
Pipeline parameters control *what* is run, whereas configurations control *how* it is run. Customising configurations can be an essential part of getting the workflow to run on your compute system, whether that be your local computer, a remote server or VM, cloud, or High Performance Computer (HPC).
15
15
@@ -102,35 +102,59 @@ The below exercise is designed to familiarise you with searching nf-core configu
102
102
103
103
What are the default settings for CPU and memory for the STAR_ALIGN module?
104
104
105
-
??? hint "Hint 1"
106
-
Find the process label within the STAR_ALIGN `main.nf` file
107
-
108
-
??? hint "Hint 2"
109
-
Process labels are used in `conf/base.config` to assign default compute resources
110
-
111
-
105
+
??? hint "Hint 1: Process labels"
106
+
107
+
To uncover the default compute resources for the STAR_ALIGN process, we need to find out what **process label** has been assigned to this process.
108
+
109
+
Process labels are used in `conf/base.config` to assign default compute resources.
112
110
113
-
??? success "Solution"
111
+
Find the process label within the STAR_ALIGN `main.nf` file, and check the resources assigned to that label within `conf/base.config`.
114
112
115
-
To uncover the default compute resources for the STAR_ALIGN process, we need to find out what **process label** has been assigned to this process. Recall from [lesson 1.1.3](../session1/1.1_nfcore.md/#113-nf-core-workflow-structure) that each process (or 'module') has its own `main.nf` file which includes the Nextflow code to set up the task as well as the actual command to run the analysis.
113
+
??? hint "Hint 2: STAR_ALIGN module script"
114
+
115
+
Recall from [Lesson 1.1.3](../session_1/1.1_nfcore.md/#113-nf-core-workflow-structure) that each process (or 'module') has its own `main.nf` file which includes the Nextflow code to set up the task as well as the actual command to run the analysis. The STAR_ALIGN process label will be included within this script.
116
116
117
117
Finding the `main.nf` script for STAR_ALIGN or any other process can be a little tricky, since nf-core pipelines are a collection of workflows, subworkflows, and modules that can be either local to the pipeline (i.e. used only by it) or shared widely across other nf-core pipelines.
118
118
119
-
Applying the knowledge that all module scripts are named `main.nf`, we can't search for the file by name, but we can search for the tool name in the module filepath. nf-core filepaths use lower-case, while the process names themselves use capitals, such as STAR_ALIGN.
119
+
Applying the knowledge that all nf-core module scripts are named `main.nf`, we can't search for the file by name, but we can search for the tool name in the module filepath. nf-core filepaths use lower-case, while the process names themselves use capitals, such as STAR_ALIGN.
120
+
121
+
The bash command below will help you find the STAR_ALIGN `main.nf` file:
120
122
121
123
```bash
122
124
find ./rnaseq/ -type d -name "*star*" -print
123
125
```
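As an alternative sketch, the search can target the module scripts themselves rather than their parent directories. This assumes the pipeline was downloaded to `./rnaseq/` as in the command above; the function name `find_module_scripts` is hypothetical and only used for illustration.

```bash
# List every main.nf file whose path contains a given tool name.
# Assumption: nf-core module paths are lower-case, so the tool name
# is lower-cased before it is used in the path pattern.
find_module_scripts() {
    local search_root="$1"   # e.g. ./rnaseq
    local tool_name
    tool_name="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    find "$search_root" -type f -name "main.nf" -path "*${tool_name}*" -print | sort
}

# Example call (requires the downloaded pipeline):
# find_module_scripts ./rnaseq star
```

Filtering on `-name "main.nf"` and `-path` together returns script files directly, which can be easier to scan than a list of directories when a tool has several submodules.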
124
126
127
+
??? success "Solution"
128
+
129
+
Searching for `star` in the nf-core/rnaseq codebase yields the following output:
130
+
131
+
125
132
```console title="Output"
126
133
./rnaseq/modules/nf-core/sentieon/staralign
127
134
./rnaseq/modules/nf-core/star
128
135
./rnaseq/subworkflows/local/align_star
129
136
```
130
137
131
-
The STAR alignment subworkflow script can be found at `./rnaseq/subworkflows/local/align_star/main.nf`, and the process script - which is called by the subworkflow - can be found at `./rnaseq/modules/nf-core/star/align/main.nf`. (We are not using the licensed 'sentieon' tools).
138
+
The ALIGN_STAR subworkflow script can be found at `./rnaseq/subworkflows/local/align_star/main.nf`. This subworkflow calls the STAR_ALIGN module.
139
+
140
+
We can find the STAR_ALIGN `main.nf` file either by looking into the `./rnaseq/modules/nf-core/star` directory *or* by viewing the subworkflow script:
141
+
142
+
```bash
143
+
head ./rnaseq/subworkflows/local/align_star/main.nf
144
+
```
145
+
146
+
```console title="Output"
147
+
//
148
+
// Alignment with STAR
149
+
//
150
+
include { SENTIEON_STARALIGN as SENTIEON_STAR_ALIGN } from '../../../modules/nf-core/sentieon/staralign/main'
151
+
include { PARABRICKS_RNAFQ2BAM as PARABRICKS_RNA_FQ2BAM } from '../../../modules/nf-core/parabricks/rnafq2bam/main'
152
+
include { STAR_ALIGN } from '../../../modules/nf-core/star/align'
153
+
include { STAR_ALIGN as STAR_ALIGN_IGENOMES } from '../../../modules/nf-core/star/align'
154
+
include { BAM_SORT_STATS_SAMTOOLS } from '../../nf-core/bam_sort_stats_samtools'
155
+
```
132
156
133
-
Now we have identified the process code, we can discover its label, for example with `more` or `grep:
157
+
The process label can then be extracted from the `modules/nf-core/star/align/main.nf` file, for example with `more` or `grep`:
@@ -140,7 +164,7 @@ The below exercise is designed to familiarise you with searching nf-core configu
140
164
label 'process_high'
141
165
```
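The label lookup and the matching resource lookup can also be chained in the shell. The sketch below is illustrative only: the function names are hypothetical, and it assumes the config declares labels in the form `withLabel:process_high { ... }` with the closing brace on its own line.

```bash
# Print the first label declared in a module script.
get_process_label() {
    local module_script="$1"
    grep -E "^[[:space:]]*label[[:space:]]+['\"]" "$module_script" \
        | head -n 1 \
        | sed -E "s/^[[:space:]]*label[[:space:]]+['\"]([^'\"]+)['\"].*/\1/"
}

# Print the withLabel block for that label from a config file.
# The range ends at the first line holding only a closing brace, so
# closure lines such as "cpus = { ... }" do not end the block early.
show_label_resources() {
    local label="$1"
    local base_config="$2"
    sed -n "/withLabel:[[:space:]]*${label}[[:space:]]*{/,/^[[:space:]]*}[[:space:]]*\$/p" "$base_config"
}

# Example calls (require the downloaded pipeline):
# label="$(get_process_label ./rnaseq/modules/nf-core/star/align/main.nf)"
# show_label_resources "$label" ./rnaseq/conf/base.config
```

If a module declares more than one label, only the first is returned here; inspect the file directly in that case.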
142
166
143
-
Finally, identify the resources for STAR_ALIGN from the base config:
167
+
The `process_high` label in `conf/base.config` shows that the STAR_ALIGN process receives 12 CPUs and 72 GB of memory by default:
144
168
145
169
```bash
146
170
more rnaseq/conf/base.config
@@ -156,7 +180,7 @@ The below exercise is designed to familiarise you with searching nf-core configu
156
180
157
181
## 2.2.3 When to use a custom config file
158
182
159
-
In [lesson 1.4.3](../session_1/1.4_rnaseq.md/#143-run-the-pipeline), we applied custom configurations to the rnaseq pipeline to restrict the maximum amount of CPUs and memory each process can use with the custom config we created. As observed when we attempted to run the pipeline before adding that configuration, this customisation was required in order to run the pipeline in our environment.
183
+
In [Lesson 1.4.3](../session_1/1.4_rnaseq.md/#143-run-the-pipeline), we applied custom configurations to the rnaseq pipeline to restrict the maximum amount of CPUs and memory each process can use with the custom config we created. As observed when we attempted to run the pipeline before adding that configuration, this customisation was required in order to run the pipeline in our environment.
160
184
161
185
Apart from reducing resources to adapt to a low-resource compute environment, there are other circumstances in which our nf-core pipeline run can benefit from custom configurations:
162
186
@@ -168,13 +192,13 @@ Apart from reducing resources to adapt to a low-resource compute environment, th
168
192
- Customise outputs beyond what is possible using the nf-core pipeline parameters
169
193
170
194
171
-
The rest of lesson 2.2 will explore custom resource configuration files, while lesson 2.3 will focus on customising outputs. We won't be covering customising runs for HPC in this workshop, but please check out our [tips and tricks page](../tips_tricks.md) later if you are interested in this, as well as the section below on institutional configs for HPC and other platforms.
195
+
The rest of Lesson 2.2 will explore custom resource configuration files. We won't be covering customising runs for HPC in this workshop, but please check out our [Tips and Tricks page](../tips_tricks.md) later if you are interested in this, as well as the section below on institutional configs for HPC and other platforms.
172
196
173
197
## 2.2.4 Configuration profiles and shared configs
174
198
175
-
In [lesson 1.4](../session_1/1.4_rnaseq.md#setting-resource-limits) we started developing a custom config for our workshop Nectar VMs. We applied this config to our run using the Nextflow `-c <myconfig>` parameter.
199
+
In [Lesson 1.4](../session_1/1.4_rnaseq.md#setting-resource-limits) we started developing a custom config for our workshop Nectar VMs. We applied this config to our run using the Nextflow `-c <myconfig>` parameter.
176
200
177
-
Custom configurations can also be included as a `profile`, just as we did for the MultiQC report configuration in [lesson 1.3.6](../session_1/1.3_configure.md/#custom-profiles). Profiles are the way in which nf-core's global community-driven shared institutional configs, introduced in [lesson 1.3.5](../session1/1.3_configure.md#135-shared-configuration-files), can be applied to your pipeline runs on any of the platforms included in the shared config collection.
201
+
Custom configurations can also be included as a `profile`, just as we did for the MultiQC report configuration in [Lesson 1.3.6](../session_1/1.3_configure.md/#custom-profiles). Profiles are the way in which nf-core's global community-driven shared institutional configs, introduced in [Lesson 1.3.5](../session_1/1.3_configure.md#135-shared-configuration-files), can be applied to your pipeline runs on any of the platforms included in the shared config collection.
178
202
179
203
We recommend you use the [NCI Gadi shared config](https://nf-co.re/configs/nci_gadi) or [Pawsey Setonix shared config](https://nf-co.re/configs/pawsey_setonix) if you run nf-core pipelines on these national HPCs.
180
204
@@ -223,7 +247,7 @@ As we continue customising our run on the workshop VMs, it makes sense to **defi
223
247
Edit `nectar_vm.config` to define a profile named 'workshop', including the `resourceLimits` directive previously applied.
224
248
225
249
??? hint "Hint 1"
226
-
Check the [Nextflow profiles scope docs](https://docs.seqera.io/nextflow/config#config-profiles) or revisit [lesson 1.3](../session_1/1.3_configure.md/#custom-profiles) for syntax guidance.
250
+
Check the [Nextflow profiles scope docs](https://docs.seqera.io/nextflow/config#config-profiles) or revisit [Lesson 1.3](../session_1/1.3_configure.md/#custom-profiles) for syntax guidance.
227
251
228
252
??? hint "Hint 2 "
229
253
Check the solution of [Exercise 1.4.2.4](../session_1/1.4_rnaseq.md/#setting-resource-limits)
@@ -264,7 +288,7 @@ We can simplify this slightly by dropping the singularity profile from the run c
264
288
Other commonly observed options are:
265
289
266
290
- `autoMounts` to allow Nextflow to automatically mount host paths when a container is executed (default: true since v 23.10.0)
267
-
- `cacheDir` to specify the directory Singularity cache directory. We have set this within our user profile in [lesson 1.2.2](../session_1/1.2_run.md/#122-managing-your-environment) so it is not required here.
291
+
- `cacheDir` to specify the Singularity cache directory. We have set this within our user profile in [Lesson 1.2.2](../session_1/1.2_run.md/#122-managing-your-environment) so it is not required here.
268
292
269
293
<br>
270
294
@@ -458,7 +482,7 @@ In addition to `withLabel`, Nextflow also provides the `withName` process select
458
482
459
483
<br>
460
484
461
-
To utilise `withName`, we first need to ensure we have the correct and specific process name. For utmost specifity, the 'fully qualified name' is safest.
485
+
To utilise `withName`, we first need to ensure we have the correct and specific process name. For utmost specificity, the 'fully qualified name' is safest.
462
486
463
487
In nf-core pipelines, the fully qualified process name, also referred to as the **process execution path**, is built from the pipeline name, one or more workflows or subworkflows, and the final process name. For example:
464
488
@@ -535,9 +559,9 @@ We now expect to see:
535
559
536
560
<br>
537
561
538
-
⏲️ It is unlikely that our run will complete much faster, if at all, with our applied customisations. If you review the process execution times within the pipeline_info files, you'll note that even our bottleneck process completed in ~ 1 minute, with most others requiring only seconds.
562
+
⏲️ If you review the process execution times within the `pipeline_info` files, you'll see that even our bottleneck process completed in ~ 1 minute, with most others requiring only seconds. For this demonstration, it is therefore unlikely that our run will complete much faster following our customisations for this small dataset on these workshop VMs.
539
563
540
-
💻 The resource customisations we have applied in this lesson are trivial given the limitations of our workshop VMs. Consider how this approach can be really powerful when working on HPC or cloud infrastructures, where the [`executor`](https://docs.seqera.io/nextflow/executor) and [`queue`](https://docs.seqera.io/nextflow/reference/process#queue) directives enable you to take full advantage of the compute resources available on your platform.
564
+
💻 Consider how this approach can be really powerful when working on HPC or cloud infrastructures, where the [`executor`](https://docs.seqera.io/nextflow/executor) and [`queue`](https://docs.seqera.io/nextflow/reference/process#queue) directives enable you to take full advantage of the compute resources available on your platform.
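To make the HPC case more concrete, here is a minimal sketch that writes a small custom config from the shell. It is an illustration under stated assumptions, not a tested site config: `slurm` is one of Nextflow's supported executors, but the queue names `work` and `highmem` and the function name `write_hpc_config_sketch` are hypothetical placeholders that would need replacing with values from your own platform.

```bash
# Write an illustrative Nextflow config to the given path.
# All queue names below are hypothetical; check your HPC documentation.
write_hpc_config_sketch() {
    local config_path="$1"
    cat > "$config_path" <<'EOF'
// Hypothetical example: executor and queue names depend on your platform
process {
    executor = 'slurm'
    queue    = 'work'

    // Send high-resource processes to a different (hypothetical) queue
    withLabel: process_high {
        queue = 'highmem'
    }
}
EOF
}

# Example call:
# write_hpc_config_sketch hpc_sketch.config
```

The resulting file could then be supplied to a run with the same `-c <myconfig>` option used earlier for the workshop VM config. Where a shared institutional config exists for your platform, that is likely a better starting point than a hand-written file.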