The pipeline requires us to define both an input samplesheet and an output directory to place our results. We supply these with the `--input` and `--outdir` parameters, respectively. We've already looked at our input samplesheet: `~/data/samplesheet.csv`. Our output directory can be named anything we want, and will be automatically created by Nextflow if it doesn't already exist.
253
-
254
-
!!! example "Exercise 1.4.2.1"
255
-
256
-
Create a new file called `run.sh` and start writing a run command for the rnaseq pipeline. Start by providing the samplesheet as input. Also define an output directory called `lesson-1.4`.
257
-
258
-
??? success "Solution"
259
-
260
-
First, create the new run script:
261
-
262
-
```bash
263
-
touch run.sh
264
-
```
265
-
266
-
Additionally, make sure it is executable:
267
-
268
-
```bash
269
-
chmod +x run.sh
270
-
```
271
-
272
-
Open the file within VSCode so you can easily edit it. Remember you can do this via the graphical interface or with the `code` command in the terminal:
273
-
274
-
```bash
275
-
code run.sh
276
-
```
277
-
278
-
Next, start by writing out the basic `nextflow run` command:
279
-
280
-
```bash title="run.sh"
281
-
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
282
-
```
283
-
284
-
**Note** that we have added a space and a backslash (` \`) to the end of the line so we may continue writing the full command over multiple lines for legibility.
285
-
286
-
Next, add the `--input` parameter and pass it the path to the samplesheet. Be sure to replace `<USERNAME>` with your provided user name:
287
-
288
-
```bash title="run.sh" hl_lines="2"
289
-
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
290
-
--input /home/<USERNAME>/data/samplesheet.csv \
291
-
```
292
-
293
-
Finally, add the `--outdir` parameter and give it the name `lesson-1.4`:
294
-
295
-
```bash title="run.sh" hl_lines="3"
296
-
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
297
-
--input /home/<USERNAME>/data/samplesheet.csv \
298
-
--outdir lesson-1.4 \
299
-
```
300
-
301
-
### Required input: reference data
290
+
### Reference data
302
291
303
292
Many nf-core pipelines have a minimum requirement for reference data inputs. The input reference data requirements for this pipeline are provided in the [usage documentation](https://nf-co.re/rnaseq/3.11.1/usage#reference-genome-files). To see what reference files we can specify using parameters, rerun the pipeline's help command to view all the available parameters.
304
293
@@ -355,15 +344,52 @@ For each of these parameters, we have the following files that we can use:
355
344
356
345
**Note** that we are just using chr18 as it is a relatively small chromosome, so this should help to keep the run time for our exercises nice and short.
357
346
347
+
### Writing the run command: required `--input` and `--outdir` parameters
348
+
349
+
The pipeline requires us to define both an input samplesheet and an output directory to place our results. We supply these with the `--input` and `--outdir` parameters, respectively. We've already looked at our input samplesheet: `~/data/samplesheet.csv`. Our output directory can be named anything we want, and will be automatically created by Nextflow if it doesn't already exist.
350
+
351
+
!!! example "Exercise 1.4.2.1"
352
+
353
+
Start writing a run command for the rnaseq pipeline. Start by providing the samplesheet as input. Also define an output directory called `lesson-1.4`.
354
+
355
+
??? success "Solution"
356
+
357
+
Start by writing out the basic `nextflow run` command:
358
+
359
+
```bash
360
+
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
361
+
```
362
+
363
+
**Note** that we have added a space and a backslash (` \`) to the end of the line so we may continue writing the full command over multiple lines for legibility. If you hit `Enter` now, the command won't run yet, but you will be provided a new line to continue writing.
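To see how the trailing backslash behaves without launching anything, here is a minimal sketch that uses `echo` to print the joined command instead of running it. The shortened `samplesheet.csv` path is a placeholder for illustration only, not the path used in the exercise.

```bash
# A backslash at the very end of a line tells the shell to join it with
# the next line, so the three lines below form a single command.
# `echo` only prints its arguments; Nextflow is not actually invoked here.
cmd=$(echo nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
    --input samplesheet.csv \
    --outdir lesson-1.4)

# Show the single-line command the shell actually saw.
echo "$cmd"
```

The backslash must be the last character on the line: a space after it stops the line from being continued.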
364
+
365
+
Next, add the `--input` parameter and pass it the path to the samplesheet. Be sure to replace `<USERNAME>` with your provided user name:
366
+
367
+
```bash hl_lines="2"
368
+
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
369
+
--input /home/<USERNAME>/data/samplesheet.csv \
370
+
```
371
+
372
+
Finally, add the `--outdir` parameter and give it the name `lesson-1.4`:
373
+
374
+
```bash hl_lines="3"
375
+
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
376
+
--input /home/<USERNAME>/data/samplesheet.csv \
377
+
--outdir lesson-1.4 \
378
+
```
379
+
380
+
### Writing the run command: reference data
381
+
382
+
With the inputs and outputs defined, we next need to tell the pipeline where to find the necessary reference data. We have already determined the parameters and files we need to pass to the pipeline, so let's add them to the command now.
383
+
358
384
!!! example "Exercise 1.4.2.2"
359
385
360
-
Add the reference file parameters and their respective file paths to the `run.sh` script.
386
+
Continue writing your run command by passing the reference files to their respective parameters.
361
387
362
388
??? success "Solution"
363
389
364
-
Add the following lines to the end of `run.sh`:
390
+
Following on from the last line from Exercise 1.4.2.1, add the `--fasta`, `--gtf`, `--star_index`, and `--salmon_index` parameters, and pass them the files we determined above in [Reference data](#reference-data):
365
391
366
-
```bash title="run.sh" hl_lines="4-7"
392
+
```bash hl_lines="4-7"
367
393
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
368
394
--input /home/<USERNAME>/data/samplesheet.csv \
369
395
--outdir lesson-1.4 \
@@ -375,43 +401,36 @@ For each of these parameters, we have the following files that we can use:
375
401
376
402
### Optional parameters
377
403
378
-
Now that we have prepared our input and reference data, we will customise the typical run command by:
379
-
380
-
1. Using Nextflow's `-profile` parameter to specify that we will be running the Singularity profile instead of the Docker profile
381
-
2. Adding additional process-specific flags to [skip duplicate read marking](https://nf-co.re/rnaseq/3.23.0/parameters#skip_markduplicates), [save trimmed reads](https://nf-co.re/rnaseq/3.23.0/parameters#save_trimmed) and [save unaligned reads](https://nf-co.re/rnaseq/3.23.0/parameters#save_unaligned)
382
-
383
-
The parameters we will use are:
404
+
Now that we have prepared our input and reference data, we have defined all the required parameters for the pipeline. However, Nextflow still needs to be configured to use Singularity, and we will add an additional workflow parameter to help speed up the pipeline run for the sake of this workshop. The parameters we will use are:
384
405
385
406
- `-profile singularity`
407
+
- Recall that this is a **Nextflow** parameter. It tells Nextflow to use nf-core's Singularity profile, rather than the default Docker profile, so that each process runs in a Singularity container.
386
408
- `--skip_markduplicates true`
387
-
- `--save_trimmed true`
388
-
- `--save_unaligned true`
409
+
- This is a pipeline parameter that tells the `rnaseq` pipeline to [skip duplicate read marking](https://nf-co.re/rnaseq/3.23.0/parameters#skip_markduplicates). Ordinarily we would want to include this step, but for the sake of the workshop and in the interest of time we will skip it.
389
410
390
411
!!! example "Exercise 1.4.2.3"
391
412
392
413
Add the optional parameters and the singularity profile to the run command.
393
414
394
415
??? success "Solution"
395
416
396
-
Add the following lines to the end of `run.sh`:
417
+
Finish writing the run command by adding the `-profile` and `--skip_markduplicates` parameters:
**Remember** that `-profile` is a *Nextflow parameter* and therefore only uses a **single hyphen**. The remaining parameters are *workflow parameters* and use a **double hyphen**.
413
432
414
-
**Note** also that we have left off the trailing space and backslash from the final line (`--save_unaligned true`) since this line concludes our initial run command.
433
+
**Note** also that we have left off the trailing space and backslash from the final line (`--skip_markduplicates true`) since this line concludes our initial run command.
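The single versus double hyphen rule can be checked mechanically. The sketch below is only an illustration of the convention; `classify_flag` is a hypothetical helper written for this example and is not part of Nextflow or nf-core.

```bash
# classify_flag ARG: report which kind of argument ARG looks like,
# based purely on its leading hyphens.
classify_flag() {
    case "$1" in
        --*) echo "pipeline parameter" ;;  # double hyphen: passed to the workflow
        -*)  echo "Nextflow option" ;;     # single hyphen: consumed by Nextflow itself
        *)   echo "value" ;;               # no hyphen: a value for the preceding flag
    esac
}

# Classify the arguments added in this exercise.
for arg in -profile singularity --skip_markduplicates true; do
    printf '%s\t%s\n' "$arg" "$(classify_flag "$arg")"
done
```

The `--*` pattern is tested first because a double-hyphen argument would otherwise also match `-*`.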
415
434
416
435
[hi](./1.3_configure.md#configuring-processes)
417
436
@@ -421,9 +440,23 @@ The parameters we will use are:
421
440
422
441
The inclusion of `ext.args` is currently best practice for all DSL2 nf-core modules where additional parameters may be required to run a process. However, this may not be implemented for all modules in all nf-core pipelines. Depending on the pipeline, some process modules may not define the `ext.args` variable in their script blocks, in which case it is not available for customisation. If that is the case, consider submitting a feature request or making a pull request on the pipeline's GitHub repository to implement this!
423
442
424
-
### Setting resource limits
443
+
## 1.4.3 Run the pipeline
444
+
445
+
You should now have a multi-line command in your terminal waiting to run. Now if you hit `Enter`, Nextflow should launch and the pipeline will start to run. It will take a few seconds to start up, and then you should start seeing processes spawning and running.
446
+
447
+

448
+
449
+

450
+
451
+
However, very quickly, we run into an error!
452
+
453
+

425
454
426
-
There is one thing left to do with our basic run command, and that is to set some resource limits. The `nf-core/rnaseq` pipeline is designed to run on large datasets and therefore expects to need lots of CPU and memory resources. However, we're using a small test dataset that doesn't need a lot of computing power, and as such we're also using low-resource VMs. Running the workflow with its default settings will cause it to crash because its processes request more CPU and memory than our VMs can provide.
455
+
What happened?
456
+
457
+
## 1.4.4 Setting resource limits
458
+
459
+
It turns out that there is one thing left to do in order to run the pipeline: set some **resource limits**. The `nf-core/rnaseq` pipeline is designed to run on large datasets and therefore expects to need lots of CPU and memory resources. However, we're using a small test dataset that doesn't need a lot of computing power, and as such we're also using low-resource VMs. Running the workflow with its default settings causes some of the processes to crash because they request more CPU and memory than our VMs can provide.
427
460
428
461
We can fix this by telling Nextflow that we want to limit the resource requests from each process to an upper bound of 2 CPUs and 6GB of memory. We do this within a custom configuration file using the `process.resourceLimits` directive. This takes a list of upper resource limits like so:
429
462
@@ -435,22 +468,22 @@ process.resourceLimits = [
435
468
]
436
469
```
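To make the effect of an upper bound concrete, here is a minimal shell sketch of the capping idea. It only illustrates the arithmetic; Nextflow applies the limits internally, and the requested values of 12 CPUs and 36 GB are hypothetical numbers chosen for this example.

```bash
# Limits matching the values used in this lesson.
limit_cpus=2
limit_mem_gb=6

# cap REQUEST LIMIT: print REQUEST, reduced to LIMIT if it exceeds it.
cap() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Hypothetical request from a resource-hungry process.
req_cpus=12
req_mem_gb=36

echo "cpus:   $(cap "$req_cpus" "$limit_cpus")"
echo "memory: $(cap "$req_mem_gb" "$limit_mem_gb") GB"
```

A request that is already below the limit is left unchanged, so small processes are not affected.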
437
470
438
-
!!! example "Exercise 1.4.2.4"
471
+
!!! example "Exercise 1.4.4"
439
472
440
-
Create a file called `nextflow.config` within your current working directory (`~/session2`) and add the `resourceLimits` directive, giving our workflow a limit of 2 CPUs and 6GB of memory.
473
+
Create a configuration file called `nectar_vm.config` within your current working directory (`~/session2`) and add the `resourceLimits` directive, giving our workflow a limit of 2 CPUs and 6GB of memory.
441
474
442
475
??? success "Solution"
443
476
444
-
First, create the `nextflow.config` file:
477
+
First, create the `nectar_vm.config` file:
445
478
446
479
```bash
447
-
touch nextflow.config
448
-
code nextflow.config
480
+
touch nectar_vm.config
481
+
code nectar_vm.config
449
482
```
450
483
451
484
Next, add the `resourceLimits` directive. You can do this in one of two ways. You can use the `process.resourceLimits` form as shown above:
452
485
453
-
```groovy title="nextflow.config"
486
+
```groovy title="nectar_vm.config"
454
487
process.resourceLimits = [
455
488
cpus: 2,
456
489
memory: 6.GB
@@ -459,7 +492,7 @@ process.resourceLimits = [
459
492
460
493
Alternatively, you can use the expanded version by nesting `resourceLimits` within a `process` scope:
461
494
462
-
```groovy title="nextflow.config"
495
+
```groovy title="nectar_vm.config"
463
496
process {
464
497
resourceLimits = [
465
498
cpus: 2,
@@ -470,11 +503,11 @@ process.resourceLimits = [
470
503
471
504
The second form is preferable since we will need the `process` scope for configuring processes further in the second session.
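If you prefer to stay in the terminal, one possible alternative to editing the file in VSCode is to write it with a here-document. This is a sketch under the assumption that you are in the directory where the config file should live; note that it overwrites any existing `nectar_vm.config`.

```bash
# Write the expanded form of the config in one step.
# The quoted 'EOF' stops the shell from expanding anything inside the block.
cat > nectar_vm.config <<'EOF'
process {
    resourceLimits = [
        cpus: 2,
        memory: 6.GB
    ]
}
EOF

# Print the file back to confirm its contents.
cat nectar_vm.config
```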
472
505
473
-
We now have a finished initial run command. Note how we didn't update `run.sh` after creating the new `nextflow.config` file. Recall from the [previous session](./1.3_configure.md#131-introduction-to-nextflow-configuration) that Nextflow will automatically include a `nextflow.config` file in the launch directory in its configuration. So, if we have configuration options we want to include for every run, we can add them to `nextflow.config` and they will be automatically loaded without having to specify the file in our run command.
506
+
We now have a finished configuration file. All that remains is to update our run command to include it with `-c`, and to tell Nextflow to resume from where it left off with `-resume`; there's no sense re-running jobs that already succeeded!
474
507
475
508
Our final run command and custom config file look like:
476
509
477
-
```bash title="run.sh"
510
+
```bash hl_lines="11-13"
478
511
nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
479
512
--input /home/<USERNAME>/data/samplesheet.csv \
480
513
--outdir lesson-1.4 \
@@ -485,10 +518,12 @@ nextflow run nf-core-rnaseq-3.23.0/3_23_0 \
485
518
-profile singularity \
486
519
--skip_markduplicates true \
487
520
--save_trimmed true \
488
-
--save_unaligned true
521
+
--save_unaligned true \
522
+
-c nectar_vm.config \
523
+
-resume
489
524
```
490
525
491
-
```groovy title="nextflow.config"
526
+
```groovy title="nectar_vm.config"
492
527
process {
493
528
resourceLimits = [
494
529
cpus: 2,
@@ -497,17 +532,9 @@ process {
497
532
}
498
533
```
499
534
500
-
## 1.4.3 Run the pipeline
501
-
502
-
Now all that is left to do is to run the pipeline!
535
+
Go ahead and re-run the workflow. It should now run successfully to completion!
503
536
504
-
!!! example "Run the pipeline"
505
-
506
-
Simply run the `run.sh` script to execute the pipeline:
507
-
508
-
```bash
509
-
./run.sh
510
-
```
537
+
## 1.4.5 Examine the outputs
511
538
512
539
:eyes: Take a look at the stdout printed to the screen. Your workflow configuration and parameter customisations are all documented here. You can use this to confirm if your parameters have been correctly passed to the run command:
513
540
@@ -527,8 +554,6 @@ To understand how this is coordinated, consider the STAR_ALIGN process that is b
527
554
- Once a TRIMGALORE task is completed for a sample, the STAR_ALIGN task for that sample begins
528
555
- When the STAR_ALIGN process starts, it spawns 2 tasks.
529
556
530
-
## 1.4.4 Examine the outputs
531
-
532
557
Once your pipeline has completed, you should see this message printed to your terminal:
533
558
534
559
```console title="Output"
@@ -556,10 +581,9 @@ In the meantime, list the contents of your directory. You will see a few new dir