Sydney-Informatics-Hub
diff --git a/docs/assets/2.3_github_issues.png b/docs/assets/2.3_github_issues.png
20.5 KB
diff --git a/docs/index.md b/docs/index.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/session_2/2.0_intro.md b/docs/session_2/2.0_intro.md
Lines changed: 3 additions & 3 deletions
diff --git a/docs/session_2/2.1_params.md b/docs/session_2/2.1_params.md
Lines changed: 57 additions & 40 deletions
diff --git a/docs/session_2/2.2_config.md b/docs/session_2/2.2_config.md
Lines changed: 47 additions & 23 deletions
diff --git a/docs/session_2/2.3_custom.md b/docs/session_2/2.3_custom.md
Lines changed: 1 addition & 9 deletions
@@ -36,7 +36,7 @@ Our full code of conduct, with incident reporting guidelines, is available [here
 |------------|----------|
 | [Set up your computer](./setup.md)| Follow these instructions to install VS Code and login to your Nectar VM. |
 | [Session 1: Introduction to nf-core](session_1/1.0_intro.md)| Learn fundamental ideas and skills that are essential for using Nextflow and nf-core workflows. |
-| [Session 2: Customising nf-core](session_2/2.0_intro.md)| Write, run, adjust, and re-run an nf-core workflow as we step through various customisation scenarios. |
+| [Session 2: Customising nf-core](session_2/2.0_intro.md)| Write, run, adjust, and re-run an nf-core workflow as we step through various customisation and troubleshooting scenarios. |
 
 ## Course survey
 
 
@@ -8,9 +8,9 @@ This session builds on [Session 1](../session_1/1.0_intro.md), where we explored
 - Nextflow `run` command starts the workflow
 - Nextflow `log` can be used to inspect run details
 
-At the end of session 1, we downloaded the [nf-core/rnaseq](https://nf-co.re/rnaseq/3.23.0) pipeline and submitted our first run. 
+At the end of Session 1, we downloaded the [nf-core/rnaseq](https://nf-co.re/rnaseq/3.23.0) pipeline and submitted our first run. 
 
-During Session 2, we will continue with this pipeline, using the same data from a [published mouse RNAseq study](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2801-4). Pipeline outputs and source code will be explored, and various customisations will be applied using parameters and configuration files to provide you with hands-on experience in nf-core pipeline customisation.
+During Session 2, we will continue with this pipeline, using the same data from a [published mouse RNAseq study](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2801-4). Pipeline outputs and source code will be explored, and various customisations will be applied using parameters and configuration files to provide you with hands-on experience in nf-core pipeline customisation. We will also learn about maintaining reproducibility and portability when running custom analyses. 
 
 
 
@@ -32,7 +32,7 @@ During Session 2, we will continue with this pipeline, using the same data from
     2. Select the IP address for your VM from the drop-down list
     3. Type in your provided password and hit enter
 
-    Change into the session 2 directory created in the previous session: 
+    Change into the `session2` directory created in the previous session: 
 
     ```default
     cd session2
 
@@ -9,7 +9,7 @@
 
 ## 2.2.1 Separation of parameters and configurations
 
-In [lesson 2.1](./2.1_params.md), we explored how to customise a run with **nf-core pipeline parameters** on the command line, within a parameters file, or using a run script. In this lesson we will expand upon the Nextflow **configuration settings** introduced in [lesson 1.3](./2.1_params.md). 
+In [Lesson 2.1](./2.1_params.md), we explored how to customise a run with **nf-core pipeline parameters** on the command line, within a parameters file, or using a run script. In this lesson we will expand upon the Nextflow **configuration settings** introduced in [Lesson 1.3](../session_1/1.3_configure.md). 
 
 Pipeline parameters control *what* is run, whereas configurations control *how* it is run. Customising configurations can be an essential part of getting the workflow to run on your compute system, whether that be your local computer, a remote server or VM, the cloud, or a High Performance Computer (HPC).
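
As a minimal sketch of the configuration side of this split, the file below changes only *how* tasks are run. The file name and the limit values are illustrative assumptions, not settings taken from the workshop materials. Pipeline parameters would still be supplied on the command line or in a parameters file.

```groovy
// custom.config (hypothetical file name)
// Configuration controls *how* the pipeline runs; it sets no pipeline parameters.
process {
    // Cap the resources any single task may request (illustrative values)
    resourceLimits = [
        cpus: 2,
        memory: 8.GB,
        time: 1.h
    ]
}
```

A file like this would be applied to a run with the Nextflow `-c` option.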
 
@@ -102,35 +102,59 @@ The below exercise is designed to familiarise you with searching nf-core configu
 
     What are the default settings for CPU and memory for the STAR_ALIGN module?
 
-    ??? hint "Hint 1"
-        Find the process label within the STAR_ALIGN `main.nf` file
-
-    ??? hint "Hint 2"
-        Process labels are used in `conf/base.config` to assign default compute resources  
-    
-
+    ??? hint "Hint 1: Process labels"
+        
+        To uncover the default compute resources for the STAR_ALIGN process, we need to find out what **process label** has been assigned to this process. 
+        
+        Process labels are used in `conf/base.config` to assign default compute resources. 
 
-    ??? success "Solution"
+        Find the process label within the STAR_ALIGN `main.nf` file, and check the resources assigned to that label within `conf/base.config`.
 
-        To uncover the default compute resources for the STAR_ALIGN process, we need to find out what **process label** has been assigned to this process. Recall from [lesson 1.1.3](../session1/1.1_nfcore.md/#113-nf-core-workflow-structure) that each process (or 'module') has its own `main.nf` file which includes the Nextflow code to set up the task as well as the actual command to run the analysis. 
+    ??? hint "Hint 2: STAR_ALIGN module script"
+        
+        Recall from [Lesson 1.1.3](../session_1/1.1_nfcore.md/#113-nf-core-workflow-structure) that each process (or 'module') has its own `main.nf` file, which includes the Nextflow code to set up the task as well as the actual command to run the analysis. The STAR_ALIGN process label will be included within this script.
 
         Finding the `main.nf` script for STAR_ALIGN or any other process can be a little tricky, since nf-core pipelines are a collection of workflows, subworkflows, and modules that can be local to the pipeline (i.e. used only by it) or shared across other nf-core pipelines. 
 
-        Applying the knowledge that all module scripts are named `main.nf`, we can't search for the file by name, but we can search for the tool name in the module filepath. nf-core filepaths use lower-case, while the process names themselves use capitals, such as STAR_ALIGN. 
+        Because all nf-core module scripts are named `main.nf`, we can't search for the file by name, but we can search for the tool name in the module filepath. nf-core filepaths use lower case, while the process names themselves use capitals, such as STAR_ALIGN. 
+
+        The bash command below will help you find the STAR_ALIGN `main.nf` file:
 
         ```bash
         find ./rnaseq/ -type d  -name "*star*" -print
         ```
 
+    ??? success "Solution"
+
+        Searching for `star` in the nf-core/rnaseq codebase yields the following output: 
+
+
         ```console title="Output"
         ./rnaseq/modules/nf-core/sentieon/staralign
         ./rnaseq/modules/nf-core/star
         ./rnaseq/subworkflows/local/align_star
         ```
 
-        The STAR alignment subworkflow script can be found at `./rnaseq/subworkflows/local/align_star/main.nf`, and the process script - which is called by the subworkflow - can be found at `./rnaseq/modules/nf-core/star/align/main.nf`. (We are not using the licensed 'sentieon' tools). 
+        The ALIGN_STAR subworkflow script can be found at `./rnaseq/subworkflows/local/align_star/main.nf`. This subworkflow calls the STAR_ALIGN module. 
+        
+        We can find the STAR_ALIGN `main.nf` file either by looking into the `./rnaseq/modules/nf-core/star` directory *or* by viewing the subworkflow script: 
+        
+        ```bash
+        head ./rnaseq/subworkflows/local/align_star/main.nf
+        ```
+
+        ```console title="Output"
+        //
+        // Alignment with STAR
+        //
+        include { SENTIEON_STARALIGN as SENTIEON_STAR_ALIGN } from '../../../modules/nf-core/sentieon/staralign/main'
+        include { PARABRICKS_RNAFQ2BAM as PARABRICKS_RNA_FQ2BAM } from '../../../modules/nf-core/parabricks/rnafq2bam/main'
+        include { STAR_ALIGN                                } from '../../../modules/nf-core/star/align'
+        include { STAR_ALIGN as STAR_ALIGN_IGENOMES          } from '../../../modules/nf-core/star/align'
+        include { BAM_SORT_STATS_SAMTOOLS                   } from '../../nf-core/bam_sort_stats_samtools'
+        ```
 
-        Now we have identified the process code, we can discover its label, for example with `more` or `grep: 
+        The process label can then be extracted from the `modules/nf-core/star/align/main.nf` file, for example with `more` or `grep`: 
 
         ```bash
         grep label rnaseq/modules/nf-core/star/align/main.nf 
@@ -140,7 +164,7 @@ The below exercise is designed to familiarise you with searching nf-core configu
         label 'process_high'
         ```
 
-        Finally, identify the resources for STAR_ALIGN from the base config:
+        The `process_high` label in `conf/base.config` shows us that the STAR_ALIGN process will receive 12 CPUs and 72 GB of memory by default:
 
         ```bash
         more rnaseq/conf/base.config
@@ -156,7 +180,7 @@ The below exercise is designed to familiarise you with searching nf-core configu
 
 ## 2.2.3 When to use a custom config file
 
-In [lesson 1.4.3](../session_1/1.4_rnaseq.md/#143-run-the-pipeline), we applied custom configurations to the rnaseq pipeline to restrict the maximum amount of CPUs and memory each process can use with the custom config we created. As observed when we attempted to run the pipeline before adding that configuration, this customisation was required in order to run the pipeline in our environment. 
+In [Lesson 1.4.3](../session_1/1.4_rnaseq.md/#143-run-the-pipeline), we applied a custom config to the rnaseq pipeline to restrict the maximum number of CPUs and amount of memory each process can use. As observed when we attempted to run the pipeline before adding that configuration, this customisation was required in order to run the pipeline in our environment. 
 
 Apart from reducing resources to adapt to a low-resource compute environment, there are other circumstances in which our nf-core pipeline run can benefit from custom configurations:
 
@@ -168,13 +192,13 @@ Apart from reducing resources to adapt to a low-resource compute environment, th
 - Customise outputs beyond what is possible using the nf-core pipeline parameters
 
 
-The rest of lesson 2.2 will explore custom resource configuration files, while lesson 2.3 will focus on customising outputs. We won't be covering customising runs for HPC in this workshop, but please check out our [tips and tricks page](../tips_tricks.md) later if you are interested in this, as well as the section below on institutional configs for HPC and other platforms. 
+The rest of Lesson 2.2 will explore custom resource configuration files. We won't be covering customising runs for HPC in this workshop, but please check out our [Tips and Tricks page](../tips_tricks.md) later if you are interested in this, as well as the section below on institutional configs for HPC and other platforms. 
 
 ## 2.2.4 Configuration profiles and shared configs
 
-In [lesson 1.4](../session_1/1.4_rnaseq.md#setting-resource-limits) we started developing a custom config for our workshop Nectar VMs. We applied this config to our run using the Nextflow `-c <myconfig>` parameter. 
+In [Lesson 1.4](../session_1/1.4_rnaseq.md#setting-resource-limits) we started developing a custom config for our workshop Nectar VMs. We applied this config to our run using the Nextflow `-c <myconfig>` parameter. 
 
-Custom configurations can also be included as a `profile`, just as we did for the MultiQC report configuration in [lesson 1.3.6](../session_1/1.3_configure.md/#custom-profiles). Profiles are the way in which nf-core's global community-driven shared institutional configs, introduced in [lesson 1.3.5](../session1/1.3_configure.md#135-shared-configuration-files), can be applied to your pipeline runs on any of the platforms included in the shared config collection. 
+Custom configurations can also be included as a `profile`, just as we did for the MultiQC report configuration in [Lesson 1.3.6](../session_1/1.3_configure.md/#custom-profiles). Profiles are the way in which nf-core's global community-driven shared institutional configs, introduced in [Lesson 1.3.5](../session_1/1.3_configure.md#135-shared-configuration-files), can be applied to your pipeline runs on any of the platforms included in the shared config collection. 
 
 We recommend you use the [NCI Gadi shared config](https://nf-co.re/configs/nci_gadi) or [Pawsey Setonix shared config](https://nf-co.re/configs/pawsey_setonix) if you run nf-core pipelines on these national HPCs. 
 
@@ -223,7 +247,7 @@ As we continue customising our run on the workshop VMs, it makes sense to **defi
     Edit `nectar_vm.config` to define a profile named 'workshop', including the `resourceLimits` directive previously applied. 
 
     ??? hint "Hint 1"
-        Check the [Nextflow profiles scope docs](https://docs.seqera.io/nextflow/config#config-profiles) or revisit [lesson 1.3](../session_1/1.3_configure.md/#custom-profiles) for syntax guidance. 
+        Check the [Nextflow profiles scope docs](https://docs.seqera.io/nextflow/config#config-profiles) or revisit [Lesson 1.3](../session_1/1.3_configure.md/#custom-profiles) for syntax guidance. 
 
     ??? hint "Hint 2 "
         Check the solution of [Exercise 1.4.2.4](../session_1/1.4_rnaseq.md/#setting-resource-limits)
@@ -264,7 +288,7 @@ We can simplify this slightly by dropping the singularity profile from the run c
     Other commonly observed options are:  
 
     - `autoMounts` to allow Nextflow to automatically mount host paths when a container is executed (default: true since v 23.10.0)
-    - `cacheDir` to specify the directory Singularity cache directory. We have set this within our user profile in [lesson 1.2.2](../session_1/1.2_run.md/#122-managing-your-environment) so it is not required here. 
+    - `cacheDir` to specify the Singularity cache directory. We have set this within our user profile in [Lesson 1.2.2](../session_1/1.2_run.md/#122-managing-your-environment) so it is not required here. 
 
 <br>
 
@@ -458,7 +482,7 @@ In addition to `withLabel`, Nextflow also provides the `withName` process select
 
 <br>
 
-To utilise `withName`, we first need to ensure we have the correct and specific process name. For utmost specifity, the 'fully qualified name' is safest.  
+To utilise `withName`, we first need to ensure we have the correct and specific process name. For utmost specificity, the 'fully qualified name' is safest.  
 
 In nf-core pipelines, the fully qualified process name, also referred to as the **process execution path**, is built from the pipeline name, one or more workflows or subworkflows, and the final process name. For example:
 
@@ -535,9 +559,9 @@ We now expect to see:
 
 <br>
 
-⏲️ It is unlikely that our run will complete much faster, if at all, with our applied customisations. If you review the process execution times within the pipeline_info files, you'll note that even our bottleneck process completed in ~ 1 minute, with most others requiring only seconds. 
+⏲️ If you review the process execution times within the `pipeline_info` files, you'll see that even our bottleneck process completed in ~1 minute, with most others requiring only seconds. With such a small dataset on these workshop VMs, our run is therefore unlikely to complete much faster after our customisations. 
 
-💻 The resource customisations we have applied in this lesson are trivial given the limitations of our workshop VMs. Consider how this approach can be really powerful when working on HPC or cloud infrastructures, where the [`executor`](https://docs.seqera.io/nextflow/executor) and [`queue`](https://docs.seqera.io/nextflow/reference/process#queue) directives enable you to take full advantage of the compute resources available on your platform.
+💻 Consider how this approach can be really powerful when working on HPC or cloud infrastructures, where the [`executor`](https://docs.seqera.io/nextflow/executor) and [`queue`](https://docs.seqera.io/nextflow/reference/process#queue) directives enable you to take full advantage of the compute resources available on your platform.
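
A minimal sketch of what such a configuration might look like on an HPC, assuming a Slurm scheduler and a hypothetical queue named `work`. Both values are placeholders rather than settings for any particular system, and the `process_high` resources simply repeat the defaults quoted earlier in this lesson.

```groovy
// hpc.config (hypothetical file name)
process {
    executor = 'slurm'   // assumption: jobs are submitted to a Slurm scheduler
    queue    = 'work'    // hypothetical queue (partition) name

    // Resources for processes carrying the 'process_high' label
    withLabel: 'process_high' {
        cpus   = 12
        memory = 72.GB
    }
}
```

On a real system, the executor and queue names would come from the platform's own documentation or from an existing shared institutional config.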
 
 <br>
 
 
@@ -1,9 +1 @@
-# Further customisations
-
-
-
-
-
-
-
-this is getting a bit long, let's wrap this up into a profile to keep things clean
+# Customisation in action