docs/side_quests/workflow_management_fundamentals.md
Lines changed: 10 additions & 8 deletions
@@ -350,7 +350,7 @@ Check the results:
 ls results/
 ```
 
-It worked! But you have 3 samples (and 50 more coming).
+This works for one sample, but you have 3 samples (and 50 more coming).
 Running this command manually for each one isn't practical.
 
 ---
@@ -512,7 +512,7 @@ Notice the interleaved output - all samples running at once:
 [WT_REP2] Complete!
 ```
 
-Much faster!
+Faster, because all samples run concurrently.
 
 #### The Hidden Problem
 
@@ -677,6 +677,8 @@ Nextflow is a workflow manager. Instead of writing imperative scripts that say "
 
 The key difference: in bash, you explicitly manage data flow with variables and file paths. In Nextflow, you declare what each process needs, and Nextflow figures out the execution order automatically.
 
+#### Software Management
+
 In Part 1, you installed FastQC, fastp, and Salmon with conda - hoping dependencies wouldn't conflict, documenting versions manually.
 
 With Nextflow, each process declares its own software requirements. The tools are configured automatically at runtime with support for whichever software packaging tool you prefer. Your colleague runs the same pipeline and gets the exact same software environment based on the process definition, not their system.
@@ -720,11 +722,11 @@ Each part has a purpose:
 - **`output`** - Declares produced files with named channels (`emit: html`) for downstream processes
 - **`script`** - The actual command, nearly identical to your bash version
 
-Notice you're not writing any loop logic, file existence checks, or error handling - Nextflow handles all of that.
+You're not writing any loop logic, file existence checks, or error handling.
 
 !!! tip "Contrast with scripts"
 
-    The **one line** `container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'` handles all software dependencies. The version is locked forever. Your colleague, your cluster, your cloud - all get the exact same FastQC.
+    The **one line** `container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'` handles all software dependencies. The version is locked forever. Your colleague, your cluster, and your cloud all get the exact same FastQC.
 
 #### Call FASTQC in main.nf
 
@@ -756,7 +758,7 @@ executor > local (3)
 [a1/b2c3d4] FASTQC (WT_REP1) [100%] 3 of 3 ✔
 ```
 
-**All 3 samples ran in parallel automatically.** In Part 1, you wrote `&` and `wait` and worried about resource limits. Nextflow figures out optimal parallelization from your process definition alone - no infrastructure code required.
+**All 3 samples ran in parallel automatically.** In Part 1, you wrote `&` and `wait` and worried about resource limits. Nextflow figures out optimal parallelization from your process definition alone.
 
 ---
 
@@ -961,7 +963,7 @@ Watch what happens - Nextflow automatically determines the execution order from
 
 !!! tip "Contrast with scripts"
 
-    In Part 1, you implemented `&` and `wait`, then worried about memory limits with 500 samples. Nextflow infers parallelization from the data flow and respects resource declarations - optimal scheduling without infrastructure code.
+    In Part 1, you implemented `&` and `wait`, then worried about memory limits with 500 samples. Nextflow infers parallelization from the data flow and respects resource declarations.
 
 ---
 
@@ -1010,7 +1012,7 @@ ch_multiqc = FASTQC.out.zip
 MULTIQC(ch_multiqc)
 ```
 
-The `.collect()` operator waits for all upstream processes to complete, then passes everything to MultiQC as a single batch. Nextflow tracks which files to collect automatically - you don't write any "wait for all jobs" logic.
+The `.collect()` operator waits for all upstream processes to complete, then passes everything to MultiQC as a single batch. Nextflow tracks which files to collect automatically.
 
 !!! tip "Contrast with scripts"
 
@@ -1124,7 +1126,7 @@ Building production-quality pipelines with scripts means writing significant inf
 
 Workflow managers like Nextflow handle that infrastructure for you. You declare what each process needs and produces; the framework figures out the rest. The result is code that's almost entirely focused on your science, with production-quality features built in.
 
-There's another benefit worth mentioning: **standardization**. Workflow managers are established tools with communities, documentation, and shared best practices. When you join a new team or project using one, you're working with familiar concepts rather than deciphering someone's homegrown scripting solution. Skills transfer. You're not maintaining custom infrastructure - you're using battle-tested tools that thousands of others rely on.
+There's also **standardization**. Workflow managers are established tools with communities, documentation, and shared best practices. When you join a new team or project using one, you're working with familiar concepts rather than deciphering someone's homegrown scripting solution. Skills transfer. You're not maintaining custom infrastructure - you're using battle-tested tools that thousands of others rely on.