Commit 6881fd3

WIP finish addressing feedback for 1.2
1 parent fa49d73 commit 6881fd3

1 file changed

Lines changed: 143 additions & 51 deletions

docs/session_1/1.2_run.md

@@ -132,7 +132,7 @@ nf-core --version
132132

133133
nf-core tools are for everyone, with commands intended to help both **users** and **developers**. For users, the tools make it easier to execute workflows. For developers, the tools make it easier to develop and test your workflows using best practices. You can read about the nf-core commands on the [tools page](https://nf-co.re/tools/) of the nf-core website or using the command line.
134134

135-
!!! example "Exercise 1.2.3"
135+
!!! example "Exercise 1.2.3.1"
136136

137137
Find out what nf-core tools commands and options are available using the `--help` option:
138138

@@ -152,8 +152,6 @@ nf-core tools is updated with new features and fixes regularly so it's best to k
152152

153153
One very useful nf-core tools command is `nf-core pipelines download`. Sometimes you may need to execute an nf-core workflow on a computer with no internet connection, for example if you have highly protected data. In this case, you will need to fetch the workflow files and manually transfer them to your offline system. The `nf-core pipelines download` command makes this process easier and ensures accurate retrieval of correctly versioned code and software containers.
154154

155-
The `nf-core pipelines download` command will download both the workflow code and the institutional nf-core/configs files. It can also optionally download singularity image file.
156-
157155
```bash
158156
nf-core pipelines download
159157
```
@@ -176,9 +174,126 @@ Alternatively, you could build your own execution command with the command line
176174

177175
![](../assets/1.2_downloadhelp.png){width=100%}
178176

179-
## 1.2.4 Executing a workflow
177+
The command line method also gives you a few additional options, including the ability to download all of the [nf-core institutional configs](https://nf-co.re/configs). This lets you run a workflow completely offline while still having access to these community-created configurations. **Note** that you must use the command line argument `--download-configuration yes` to do this; the interactive mode doesn't support this option yet.
178+
179+
!!! example "Exercise 1.2.3.2"
180+
181+
Have a go at using `nf-core pipelines download` to download an nf-core pipeline along with the nf-core institutional configs. Tell the tool to:
182+
183+
- Download the `nf-core/rnaseq` pipeline
184+
- Pull the `3.23.0` version of the pipeline
185+
- Download the institutional configs
186+
- **Not** download the singularity images for the pipeline (doing so might take a while!)
187+
- **Not** compress the downloaded data
188+
189+
Consult `nf-core pipelines download --help` to help you find the right arguments.
190+
191+
??? success "Solution"
192+
193+
The arguments we want are:
194+
195+
- `--revision 3.23.0`: this pulls the specific version we want
196+
- `--download-configuration yes`: this pulls the institutional configs
197+
- `--container-system none`: this tells the tool to not download any images
198+
- `--compress none`: this tells the tool to not compress the data
199+
200+
The final command will look like:
201+
202+
```bash
203+
nf-core pipelines download rnaseq --revision 3.23.0 --download-configuration yes --container-system none --compress none
204+
```
205+
206+
You should see a new directory where you ran the command:
207+
208+
```bash
209+
ls
210+
```
211+
212+
```console title="Output"
213+
nf-core-rnaseq_3.23.0
214+
```
215+
216+
Let's look at what is inside:
217+
218+
```bash
219+
ls nf-core-rnaseq_3.23.0/
220+
```
221+
222+
```console title="Output"
223+
3_23_0 configs
224+
```
225+
226+
Looking one level deeper, we can see what each folder contains:
227+
228+
```bash
229+
ls -lh nf-core-rnaseq_3.23.0/*
230+
```
231+
232+
```console title="Output"
233+
nf-core-rnaseq_3.23.0/3_23_0:
234+
total 340K
235+
drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 assets
236+
drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 bin
237+
-rwxrwxr-x 1 user3 user3 113K Apr 24 01:54 CHANGELOG.md
238+
-rwxrwxr-x 1 user3 user3 11K Apr 24 01:54 CITATIONS.md
239+
-rwxrwxr-x 1 user3 user3 14K Apr 24 01:54 CODE_OF_CONDUCT.md
240+
drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 conf
241+
drwxrwxr-x 5 user3 user3 4.0K Apr 24 01:54 docs
242+
-rwxrwxr-x 1 user3 user3 1.1K Apr 24 01:54 LICENSE
243+
-rwxrwxr-x 1 user3 user3 7.0K Apr 24 01:54 main.nf
244+
drwxrwxr-x 4 user3 user3 4.0K Apr 24 01:54 modules
245+
-rwxrwxr-x 1 user3 user3 24K Apr 24 01:54 modules.json
246+
-rwxrwxr-x 1 user3 user3 17K Apr 24 01:54 nextflow.config
247+
-rwxrwxr-x 1 user3 user3 57K Apr 24 01:54 nextflow_schema.json
248+
-rwxrwxr-x 1 user3 user3 1.5K Apr 24 01:54 nf-test.config
249+
-rwxrwxr-x 1 user3 user3 13K Apr 24 01:54 README.md
250+
-rwxrwxr-x 1 user3 user3 23K Apr 24 01:54 ro-crate-metadata.json
251+
drwxrwxr-x 4 user3 user3 4.0K Apr 24 01:54 subworkflows
252+
drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 tests
253+
-rwxrwxr-x 1 user3 user3 3.0K Apr 24 01:54 tower.yml
254+
drwxrwxr-x 3 user3 user3 4.0K Apr 24 01:54 workflows
255+
256+
nf-core-rnaseq_3.23.0/configs:
257+
total 68K
258+
drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 bin
259+
-rwxrwxr-x 1 user3 user3 1.6K Apr 24 01:54 CITATION.cff
260+
drwxrwxr-x 5 user3 user3 4.0K Apr 24 01:54 conf
261+
-rwxrwxr-x 1 user3 user3 273 Apr 24 01:54 configtest.nf
262+
drwxrwxr-x 4 user3 user3 4.0K Apr 24 01:54 docs
263+
-rwxrwxr-x 1 user3 user3 1.1K Apr 24 01:54 LICENSE
264+
-rwxrwxr-x 1 user3 user3 69 Apr 24 01:54 nextflow.config
265+
-rwxrwxr-x 1 user3 user3 15K Apr 24 01:54 nfcore_custom.config
266+
drwxrwxr-x 2 user3 user3 4.0K Apr 24 01:54 pipeline
267+
-rwxrwxr-x 1 user3 user3 17K Apr 24 01:54 README.md
268+
```
269+
270+
We can see that the `3_23_0` folder contains the pipeline code, including its `main.nf` file, `nextflow.config` file, its `modules` and `subworkflows`, along with its configuration folder `conf`.
271+
272+
Meanwhile the `configs` folder is where the institutional configs were downloaded. The config files themselves are under the `conf` directory:
273+
274+
```bash
275+
ls nf-core-rnaseq_3.23.0/configs/conf
276+
```
277+
278+
```console title="Output"
279+
abims.config
280+
adcra.config
281+
alice.config
282+
alliance_canada.config
283+
apollo.config
284+
arcc.config
285+
awsbatch.config
286+
aws_tower.config
287+
azurebatch.config
288+
azurebatchdev.config
289+
...
290+
```
291+
292+
The pipeline code is set up to find these and include them when you request the appropriate profile; for example, if you run the pipeline with `-profile nci_gadi`, it will find the config file stored at `nf-core-rnaseq_3.23.0/configs/conf/nci_gadi.config` and include it in the pipeline's configuration.
293+
294+
## 1.2.4 Downloading and executing workflows with `nextflow`
180295

181-
Nextflow seamlessly integrates with code repositories such as [GitHub](https://github.com/). This feature allows you to manage your project code and use public Nextflow workflows — including nf-core workflows — quickly, consistently, and transparently.
296+
The `nextflow` command itself can also be used to download pipelines. Nextflow seamlessly integrates with code repositories such as [GitHub](https://github.com/), allowing you to use public Nextflow workflows, including nf-core workflows, quickly, consistently, and transparently.
182297

183298
The Nextflow `pull` command will download a workflow from a hosting platform into your global cache `$HOME/.nextflow/assets` folder.
184299

@@ -202,19 +317,17 @@ nextflow clone foo/bar
202317

203318
This is equivalent to pulling the GitHub repository directly with `git clone https://github.com/foo/bar`. The `nextflow clone` syntax simply shortens and cleans up the command.
204319

205-
The Nextflow `run` command is used to initiate the execution of a workflow:
320+
Once the workflow is downloaded, the Nextflow `run` command is used to initiate its execution:
206321

207322
```bash
208323
nextflow run foo/bar
209324
```
210325

211-
If you `run` a workflow, it will look for a local file with the workflow name you’ve specified. If that file does not exist, it will next look in your `$HOME/.nextflow/assets` folder to see if you have previously `pull`ed the pipeline. Failing that, it will look for a public repository with the same name on GitHub (unless otherwise specified). If it is found, Nextflow will automatically `pull` the workflow to your global cache and execute it.
212-
213326
!!! warning "Warning"
214327

215328
Be aware of what is already in your current working directory where you launch your workflow. If there are other workflows (or configuration files) within the directory, you may encounter unexpected results.
216329

217-
!!! example "Exercise 1.2.4.1"
330+
!!! example "Exercise 1.2.4"
218331

219332
Use the `nextflow` command line tool to clone the [`nextflow-io/hello` Nextflow repository](https://github.com/nextflow-io/hello) to your local directory, then execute it.
220333

@@ -260,48 +373,19 @@ If you `run` a workflow, it will look for a local file with the workflow name yo
260373

261374
Note that the second line says ``Launching `hello/main.nf` ...``, which indicates that it was launched from the local directory.
262375

263-
!!! example "Exercise 1.2.4.2"
264-
265-
Try executing the workflow directly from `nextflow-io` [GitHub](https://github.com/nextflow-io/hello) repository.
266-
267-
??? success "Solution"
268-
269-
Use the `run` command again, but this time use include the `nextflow-io/` prefix in the workflow name:
376+
### 1.2.4.1 More on `nextflow run`
270377

271-
```bash
272-
nextflow run nextflow-io/hello
273-
```
274-
275-
Since there is no local directory called `nextflow-io/hello`, the workflow will be automatically pulled from GitHub and executed. You should see:
276-
277-
```console title="Output"
378+
When you run a pipeline with `nextflow run some_pipeline`, Nextflow first looks for a local folder with the workflow name you've specified that contains a `main.nf` file. If that does not exist, it next looks in your `$HOME/.nextflow/assets` folder to see if you have previously `pull`ed the pipeline. Failing that, it looks for a public repository with the same name on GitHub (unless otherwise specified). If one is found, Nextflow automatically `pull`s the workflow to your global cache and executes it.
278379

279-
N E X T F L O W ~ version 25.10.4
280-
281-
Pulling nextflow-io/hello ...
282-
downloaded from https://github.com/nextflow-io/hello.git
283-
Launching `https://github.com/nextflow-io/hello` [festering_engelbart] DSL2 - revision: d828daeef7 [master]
284-
285-
executor > local (4)
286-
[7f/f69f01] sayHello (4) [100%] 4 of 4 ✔
287-
Bonjour world!
380+
This means it is possible to seamlessly `run` public Nextflow pipelines without having to manually download them first.
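
The lookup order described above can be sketched as a small shell function. This is only an illustration of the order, not how Nextflow is implemented: `where_pipeline` and `ASSETS_DIR` are hypothetical names introduced here, and the sketch assumes the cache lives at `$HOME/.nextflow/assets` as stated above.

```bash
# Hypothetical helper (not part of Nextflow): report where
# `nextflow run NAME` would be expected to find the pipeline,
# following the lookup order described in the text.
where_pipeline() {
    local name="$1"
    # ASSETS_DIR is an assumed variable for this sketch only;
    # it defaults to the global cache folder named in the text.
    local assets="${ASSETS_DIR:-$HOME/.nextflow/assets}"

    if [ -f "$name/main.nf" ]; then
        echo "local"    # a local folder containing main.nf
    elif [ -d "$assets/$name" ]; then
        echo "cache"    # previously pulled into the global cache
    else
        echo "remote"   # would be looked up on GitHub and pulled
    fi
}

# Example call; the result depends on what is on your system.
where_pipeline nextflow-io/hello
```

Running this from your working directory before `nextflow run` is a quick way to check whether a local copy or a cached copy would be picked up.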
288381

289-
Ciao world!
290-
291-
Hello world!
292-
293-
Hola world!
294-
295-
```
382+
!!! note "Our recommendation"
296383

297-
Note that now the output reads:
384+
As you can see, there are a few different ways to run a Nextflow or nf-core pipeline. Generally, **we recommend always downloading the code to your working directory** and **not** using `nextflow pull` (and, by extension, `nextflow run` with pipelines you haven't already downloaded). This means using either the `nextflow clone` command or cloning the repository directly with `git clone` so that the code is in your working directory first.
298385

299-
```console title="Output"
300-
Pulling nextflow-io/hello ...
301-
downloaded from https://github.com/nextflow-io/hello.git
302-
```
303-
304-
This indicates that the pipeline was pulled from the repository rather than executed from the local directory.
386+
If you need to execute an **nf-core pipeline** in an environment **without an internet connection**, you can use the `nf-core pipelines download` method [mentioned above](#nf-core-pipelines-download) on a computer with internet and transfer it to where it will run. Again, this method ensures that you localise the pipeline code to your working directory first, and makes sure you have all of the required configuration files and singularity images ready for offline use.
387+
388+
We recommend this approach because it is the most flexible and gives you control over exactly which version of the workflow is downloaded and where it is downloaded to (instead of all pipelines going to `$HOME/.nextflow/assets`, as with `nextflow pull`/`nextflow run`). It also makes it easier to modify configuration files to suit the needs of your data and infrastructure, for example process CPU and memory resources.
305389

306390
More information about the Nextflow `run`, `pull`, and `clone` commands can be found in the Nextflow documentation:
307391

@@ -323,11 +407,6 @@ More information about the Nextflow `run`, `pull`, and `clone` commands can be f
323407
nextflow run -r 1.2.0 foo/bar
324408
```
325409

326-
!!! note "Our recommendation"
327-
328-
As you can see, there are a few different ways you can go about running a nextflow or nf-core pipeline. We recommend using either the `nextflow clone` command or directly cloning the repository with `git clone`. This is because it is the most flexible approach and gives you control over exacly what version of the workflow is being downloaded and where it is being downloaded to (instead of all pipelines going to `$HOME/.nextflow/assets` as with `nextflow pull`/`nextflow run`). In addition, it provides greater flexibility in modifying configuration files to suit the needs of your data and infrastructure, for example process CPU and memory resources.
329-
330-
The one exception to this is when you need to execute an **nf-core pipeline** in an environment **without an internet connection**. In this case, the recommendation is to use the `nf-core pipelines download` method [mentioned above](#nf-core-pipelines-download), as this tool allows you to localise all of the required configuration files and singularity images for offline use.
331410

332411
## 1.2.5 Nextflow log
333412

@@ -389,6 +468,19 @@ When running large (and possibly expensive) workflows, we want to be sure that i
389468

390469
In Nextflow, we can utilise this cache by using the `-resume` option. The cache works by keeping track of the file paths, file sizes, and modification times of all input files to a process. It also keeps track of the process definition itself. If these are unchanged between runs, the **cached** outputs are re-used. If any of these values have changed, the process will be re-run.
391470

471+
!!! note "The Nextflow cache can be sensitive!"
472+
473+
It's important to note that the Nextflow cache looks for any change that might affect the output of each process. This includes:
474+
475+
- Input file modification times
476+
- Changes to the script
477+
- Changes to the container or conda environment used to run the process
478+
- Changes to the `ext` properties, e.g. `ext.args`
479+
480+
Sometimes, you can run into issues where the cache is **invalidated** and a re-run of a process is forced even when nothing seems to have changed. This often happens on HPC systems where file modification times aren't synchronised perfectly across parallel file systems. In these cases, it can help to apply the `process.cache = 'lenient'` configuration option, which tells Nextflow to use only the file name and size, and not the modification time, to determine whether the cache is valid.
481+
482+
See the [Nextflow cache documentation](https://docs.seqera.io/nextflow/cache-and-resume) for further information on the cache and how to configure it.
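
As a minimal sketch of applying the lenient option, you could write it to a small config file and pass that file to your run. The file name `lenient.config` and the `main.nf` placeholder are assumptions for illustration only.

```bash
# Write a small config file containing the option named in the note above.
# "lenient.config" is an arbitrary file name chosen for this sketch.
cat > lenient.config <<'EOF'
// Ignore input file modification times when validating the cache
process.cache = 'lenient'
EOF

# Then supply it to a resumed run (not executed here; main.nf is a placeholder):
#   nextflow run main.nf -c lenient.config -resume
```

Keeping the setting in its own file makes it easy to drop once it is no longer needed, without editing the pipeline's own `nextflow.config`.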
483+
392484
!!! note "Key points"
393485

394486
- Environment variables can be used to control your Nextflow runtime
