WIP finish addressing feedback for 1.2

mgeaghan · mgeaghan · commit 6881fd350f4e · 2026-04-24T12:56:26.000+10:00
diff --git a/docs/session_1/1.2_run.md b/docs/session_1/1.2_run.md
@@ -132,7 +132,7 @@ nf-core --version
 
 nf-core tools are for everyone, with commands intended to help both **users** and **developers**. For users, the tools make it easier to execute workflows. For developers, the tools make it easier to develop and test your workflows using best practices. You can read about the nf-core commands on the [tools page](https://nf-co.re/tools/) of the nf-core website or using the command line.
 
-!!! example "Exercise 1.2.3"
+!!! example "Exercise 1.2.3.1"
 
     Find out what nf-core tools commands and options are available using the `--help` option:
 
@@ -152,8 +152,6 @@ nf-core tools is updated with new features and fixes regularly so it's best to k
 
 One very useful nf-core tools command is `nf-core pipelines download`. Sometimes you may need to execute an nf-core workflow on a computer with no internet connection, for example if you have highly protected data. In this case, you will need to fetch the workflow files and manually transfer them to your offline system. The `nf-core pipelines download` command makes this process easier and ensures accurate retrieval of correctly versioned code and software containers.
 
-The `nf-core pipelines download` command will download both the workflow code and the institutional nf-core/configs files. It can also optionally download singularity image file.
-
 ```bash
 nf-core pipelines download
 ```
@@ -176,9 +174,126 @@ Alternatively, you could build your own execution command with the command line
 
 ![](../assets/1.2_downloadhelp.png){width=100%}
 
-## 1.2.4 Executing a workflow
+The command line method also gives you a few additional options, including the ability to download all of the [nf-core institutional configs](https://nf-co.re/configs). This lets you run a workflow completely offline while still having access to these community-created configurations. **Note** that you must use the command line argument `--download-configuration yes` to do this; the interactive mode doesn't support this option yet.
+
+!!! example "Exercise 1.2.3.2"
+
+    Have a go at using `nf-core pipelines download` to download an nf-core pipeline along with the nf-core institutional configs. Tell the tool to:
+
+    - Download the `nf-core/rnaseq` pipeline
+    - Pull the `3.23.0` version of the pipeline
+    - Download the institutional configs
+    - **Not** download the singularity images for the pipeline (doing so might take a while!)
+    - **Not** compress the downloaded data
+
+    Consult `nf-core pipelines download --help` to help you find the right arguments.
+
+    ??? success "Solution"
+
+        The arguments we want are:
+
+        - `--revision 3.23.0`: this pulls the specific version we want
+        - `--download-configuration yes`: this pulls the institutional configs
+        - `--container-system none`: this tells the tool to not download any images
+        - `--compress none`: this tells the tool to not compress the data
+
+        The final command will look like:
+
+        ```bash
+        nf-core pipelines download rnaseq --revision 3.23.0 --download-configuration yes --container-system none --compress none
+        ```
+
+        You should see a new directory where you ran the command:
+
+        ```bash
+        ls
+        ```
+
+        ```console title="Output"
+        nf-core-rnaseq_3.23.0
+        ```
+
+        Let's look at what is inside:
+
+        ```bash
+        ls nf-core-rnaseq_3.23.0/
+        ```
+
+        ```console title="Output"
+        3_23_0  configs
+        ```
+
+        Looking one level deeper, we can see what each folder contains:
+
+        ```bash
+        ls -lh nf-core-rnaseq_3.23.0/*
+        ```
+
+        ```console title="Output"
+        nf-core-rnaseq_3.23.0/3_23_0:
+        total 340K
+        drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 assets
+        drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 bin
+        -rwxrwxr-x 1 user3 user3 113K Apr 24 01:54 CHANGELOG.md
+        -rwxrwxr-x 1 user3 user3  11K Apr 24 01:54 CITATIONS.md
+        -rwxrwxr-x 1 user3 user3  14K Apr 24 01:54 CODE_OF_CONDUCT.md
+        drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 conf
+        drwxrwxr-x 5 user3 user3 4.0K Apr 24 01:54 docs
+        -rwxrwxr-x 1 user3 user3 1.1K Apr 24 01:54 LICENSE
+        -rwxrwxr-x 1 user3 user3 7.0K Apr 24 01:54 main.nf
+        drwxrwxr-x 4 user3 user3 4.0K Apr 24 01:54 modules
+        -rwxrwxr-x 1 user3 user3  24K Apr 24 01:54 modules.json
+        -rwxrwxr-x 1 user3 user3  17K Apr 24 01:54 nextflow.config
+        -rwxrwxr-x 1 user3 user3  57K Apr 24 01:54 nextflow_schema.json
+        -rwxrwxr-x 1 user3 user3 1.5K Apr 24 01:54 nf-test.config
+        -rwxrwxr-x 1 user3 user3  13K Apr 24 01:54 README.md
+        -rwxrwxr-x 1 user3 user3  23K Apr 24 01:54 ro-crate-metadata.json
+        drwxrwxr-x 4 user3 user3 4.0K Apr 24 01:54 subworkflows
+        drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 tests
+        -rwxrwxr-x 1 user3 user3 3.0K Apr 24 01:54 tower.yml
+        drwxrwxr-x 3 user3 user3 4.0K Apr 24 01:54 workflows
+
+        nf-core-rnaseq_3.23.0/configs:
+        total 68K
+        drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 bin
+        -rwxrwxr-x 1 user3 user3 1.6K Apr 24 01:54 CITATION.cff
+        drwxrwxr-x 5 user3 user3 4.0K Apr 24 01:54 conf
+        -rwxrwxr-x 1 user3 user3  273 Apr 24 01:54 configtest.nf
+        drwxrwxr-x 4 user3 user3 4.0K Apr 24 01:54 docs
+        -rwxrwxr-x 1 user3 user3 1.1K Apr 24 01:54 LICENSE
+        -rwxrwxr-x 1 user3 user3   69 Apr 24 01:54 nextflow.config
+        -rwxrwxr-x 1 user3 user3  15K Apr 24 01:54 nfcore_custom.config
+        drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 pipeline
+        -rwxrwxr-x 1 user3 user3  17K Apr 24 01:54 README.md
+        ```
+
+        We can see that the `3_23_0` folder contains the pipeline code, including its `main.nf` file, `nextflow.config` file, its `modules` and `subworkflows`, along with its configuration folder `conf`.
+
+        Meanwhile the `configs` folder is where the institutional configs were downloaded. The config files themselves are under the `conf` directory:
+
+        ```bash
+        ls nf-core-rnaseq_3.23.0/configs/conf
+        ```
+
+        ```console title="Output"
+        abims.config
+        adcra.config
+        alice.config
+        alliance_canada.config
+        apollo.config
+        arcc.config
+        awsbatch.config
+        aws_tower.config
+        azurebatch.config
+        azurebatchdev.config
+        ...
+        ```
+
+        The pipeline code is set up to find these and include them when you request the appropriate profile; for example, if you run the pipeline with `-profile nci_gadi`, it will find the config file stored at `nf-core-rnaseq_3.23.0/configs/conf/nci_gadi.config` and include it in the pipeline's configuration.
+
+## 1.2.4 Downloading and executing workflows with `nextflow`
 
-Nextflow seamlessly integrates with code repositories such as [GitHub](https://github.com/). This feature allows you to manage your project code and use public Nextflow workflows &mdash; including nf-core workflows &mdash; quickly, consistently, and transparently.
+The `nextflow` command itself can also be used to download pipelines. Nextflow seamlessly integrates with code repositories such as [GitHub](https://github.com/), allowing you to use public Nextflow workflows &mdash; including nf-core workflows &mdash; quickly, consistently, and transparently.
 
 The Nextflow `pull` command will download a workflow from a hosting platform into your global cache `$HOME/.nextflow/assets` folder.
 
@@ -202,19 +317,17 @@ nextflow clone foo/bar
 
 This is equivalent to pulling the GitHub repository directly with `git clone https://github.com/foo/bar`. The `nextflow clone` syntax simply shortens and cleans up the command.
 
-The Nextflow `run` command is used to initiate the execution of a workflow:
+Once the workflow is downloaded, the Nextflow `run` command is used to initiate its execution:
 
 ```bash
 nextflow run foo/bar
 ```
 
-If you `run` a workflow, it will look for a local file with the workflow name you’ve specified. If that file does not exist, it will next look in your `$HOME/.nextflow/assets` folder to see if you have previously `pull`ed the pipeline. Failing that, it will look for a public repository with the same name on GitHub (unless otherwise specified). If it is found, Nextflow will automatically `pull` the workflow to your global cache and execute it.
-
 !!! warning "Warning"
 
     Be aware of what is already in your current working directory where you launch your workflow. If there are other workflows (or configuration files) within the directory, you may encounter unexpected results.
 
-!!! example "Exercise 1.2.4.1"
+!!! example "Exercise 1.2.4"
 
     Use the `nextflow` command line tool to clone the [`nextflow-io/hello` Nextflow repository](https://github.com/nextflow-io/hello) to your local directory, then execute it.
 
@@ -260,48 +373,19 @@ If you `run` a workflow, it will look for a local file with the workflow name yo
 
         Note that the second line says ``Launching `hello/main.nf` ...``, which indicates that it was launched from the local directory.
 
-!!! example "Exercise 1.2.4.2"
-
-    Try executing the workflow directly from `nextflow-io` [GitHub](https://github.com/nextflow-io/hello) repository.
-
-    ??? success "Solution"
-
-        Use the `run` command again, but this time use include the `nextflow-io/` prefix in the workflow name:
+### 1.2.4.1 More on `nextflow run`
 
-        ```bash
-        nextflow run nextflow-io/hello
-        ```
-
-        Since there is no local directory called `nextflow-io/hello`, the workflow will be automatically pulled from GitHub and executed. You should see:
-
-        ```console title="Output"
+When you run a pipeline with `nextflow run some_pipeline`, Nextflow first looks for a local folder with the workflow name you’ve specified and a `main.nf` file within it. If that file does not exist, it next looks in your `$HOME/.nextflow/assets` folder to see if you have previously `pull`ed the pipeline. Failing that, it looks for a public repository with the same name on GitHub (unless otherwise specified). If the repository is found, Nextflow will automatically `pull` the workflow to your global cache and execute it.
 
-        N E X T F L O W   ~  version 25.10.4
-
-        Pulling nextflow-io/hello ...
-        downloaded from https://github.com/nextflow-io/hello.git
-        Launching `https://github.com/nextflow-io/hello` [festering_engelbart] DSL2 - revision: d828daeef7 [master]
-
-        executor >  local (4)
-        [7f/f69f01] sayHello (4) [100%] 4 of 4 ✔
-        Bonjour world!
+This means it is possible to seamlessly `run` public Nextflow pipelines without having to manually download them first.
 
-        Ciao world!
-
-        Hello world!
-
-        Hola world!
-
-        ```
+!!! note "Our recommendation"
 
-        Note that now the output reads:
+    As you can see, there are a few different ways you can go about running a Nextflow or nf-core pipeline. Generally, **we recommend always downloading the code to your working directory** and **not** using `nextflow pull` (or, by extension, `nextflow run` with pipelines you haven't already downloaded). This means using either the `nextflow clone` command or directly cloning the repository with `git clone` to make sure the code is in your working directory first.
 
-        ```console title="Output"
-        Pulling nextflow-io/hello ...
-        downloaded from https://github.com/nextflow-io/hello.git
-        ```
-
-        This indicates that the pipeline was pulled from the repository rather than executed from the local directory.
+    If you need to execute an **nf-core pipeline** in an environment **without an internet connection**, you can use the `nf-core pipelines download` method [mentioned above](#nf-core-pipelines-download) on a computer with internet access, then transfer the downloaded files to the system where the pipeline will run. Again, this method ensures that you localise the pipeline code to your working directory first, and makes sure you have all of the required configuration files and singularity images ready for offline use.
+
+    We recommend this approach because it is the most flexible and gives you control over exactly what version of the workflow is being downloaded and where it is being downloaded to (instead of all pipelines going to `$HOME/.nextflow/assets` as with `nextflow pull`/`nextflow run`). In addition, it makes it easier to modify configuration files to suit the needs of your data and infrastructure, for example process CPU and memory resources.
 
 More information about the Nextflow `run`, `pull`, and `clone` commands can be found in the Nextflow documentation:
 
@@ -323,11 +407,6 @@ More information about the Nextflow `run`, `pull`, and `clone` commands can be f
     nextflow run -r 1.2.0 foo/bar
     ```
 
-!!! note "Our recommendation"
-
-    As you can see, there are a few different ways you can go about running a nextflow or nf-core pipeline. We recommend using either the `nextflow clone` command or directly cloning the repository with `git clone`. This is because it is the most flexible approach and gives you control over exacly what version of the workflow is being downloaded and where it is being downloaded to (instead of all pipelines going to `$HOME/.nextflow/assets` as with `nextflow pull`/`nextflow run`). In addition, it provides greater flexibility in modifying configuration files to suit the needs of your data and infrastructure, for example process CPU and memory resources.
-
-    The one exception to this is when you need to execute an **nf-core pipeline** in an environment **without an internet connection**. In this case, the recommendation is to use the `nf-core pipelines download` method [mentioned above](#nf-core-pipelines-download), as this tool allows you to localise all of the required configuration files and singularity images for offline use.
 
 ## 1.2.5 Nextflow log
 
@@ -389,6 +468,19 @@ When running large (and possibly expensive) workflows, we want to be sure that i
 
 In Nextflow, we can utilise this cache by using the `-resume` option. The cache works keeping track of the file paths, file sizes, and modification times of all input files to a process. It also keeps track of the process definition itself. If these are unchanged between runs, the **cached** outputs are re-used. If any of these values have changed, the process will be re-run.
 
+!!! note "The Nextflow cache can be sensitive!"
+
+    It's important to note that the Nextflow cache looks for any change that might affect the output of each process. This includes:
+
+    - Input file modification times
+    - Changes to the script
+    - Changes to the container or conda environment used to run the process
+    - Changes to the `ext` properties, e.g. `ext.args`
+
+    Sometimes, you can run into issues where the cache is **invalidated** and a re-run of a process is forced even when nothing seems to have changed. This often happens on HPC systems where file modification times aren't synchronised perfectly across parallel file systems. In these cases, it can help to apply the `process.cache = 'lenient'` configuration option, which tells Nextflow to use only the file path and size, but not the modification time, to determine whether the cache is valid.
+
+    See the [Nextflow cache documentation](https://docs.seqera.io/nextflow/cache-and-resume) for further information on the cache and how to configure it.
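+
+    As a minimal sketch, assuming you keep your own settings in a separate config file (the name `custom.config` is only an illustration), lenient caching could be enabled for all processes like this:
+
+    ```groovy
+    // custom.config (hypothetical file name)
+    // Build cache keys from input file path and size only, ignoring modification times
+    process.cache = 'lenient'
+    ```
+
+    You could then pass this file to a run with `-c custom.config` alongside `-resume`.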
+
 !!! note "Key points"
 
     - Environment variables can be used to control your Nextflow runtime