paper/paper.md: 7 additions & 7 deletions
@@ -48,21 +48,21 @@ bibliography: paper.bib
# Summary
SciWIn-Client (`s4n`) is a command-line tool developed as part of the Scientific Workflow Infrastructure (SciWIn) of the FAIRagro consortium [@Ewert2023Proposal]. It is designed to streamline the creation, execution and management of reproducible computational workflows using the _Common Workflow Language (CWL)_[@Crusoe2022MethodsIncluded]. By wrapping ordinary command-line commands with a thin layer of tooling, SciWIn-Client automatically generates CWL definitions, allowing scientists to write CWL using the well-known commands rather than hand-authoring verbose specifications.
-Implemented in Rust for high performance and reliability, SciWIn-Client integrates nativly with Git for version control and provenance tracking. It supports both local and remote workflow execution and is interoperable with the Workflow RO-Crate[@??] and Workflow Run RO-Crate[@Leo2024WRRC] standards. Furthermore SciWIn-Client is interoperable with research data management frameworks such as DataPLANT's ARC format [@dataplant2025ARCSpec;@Weil2023PLANTdataHUB].
+Implemented in Rust for high performance and reliability, SciWIn-Client integrates natively with Git for version control and provenance tracking. It supports both local and remote workflow execution and is interoperable with the Workflow RO-Crate[@??] and Workflow Run RO-Crate[@Leo2024WRRC] standards. Furthermore SciWIn-Client is interoperable with research data management frameworks such as DataPLANT's ARC format [@dataplant2025ARCSpec;@Weil2023PLANTdataHUB].
# Statement of Need
Automated computational workflows are essential for managing complex, multi-step data analysis across various scientific disciplines. Significant effort has been invested into domain-specific languages that formalize and standardize computational scientific processes, thereby enhancing reproducibility, scalability and efficiency.
-To harmonize this wild growth of languages, the Common Workflow Language (CWL) was introducted as universal standard [@Crusoe2022MethodsIncluded]. Its design emphasizes flexibility and machine readability but its verbose YAML-based syntax poses a substantial barrier to adoption among researchers unfamiliar with structured data formats.
+To harmonize this wild growth of languages, the Common Workflow Language (CWL) was introduced as universal standard [@Crusoe2022MethodsIncluded]. Its design emphasizes flexibility and machine readability but its verbose YAML-based syntax poses a substantial barrier to adoption among researchers unfamiliar with structured data formats.
CWL therefore is predestined to be written by machines rather than humans, which ultimately motivated the conception of SciWIn-Client.
SciWIn-Client provides an intuitive command-line interface that automates CWL generation and management. It translates typical research computing tasks into structured, version-controlled workflow definitions, effectively allowing scientists to “write CWL by doing science.”
# State of the field
-The landscape of scientific workflow management is broad and fragmented. Numerous patforms and languages have emerged to adress the need for reproducible, automated data analysis pipeline. Tools such as Nextflow[@di_tommaso_nextflow_2017] and Galaxy[@giardine_galaxy_2005] have achieved significant adoption within the scientific community. Both offer powerful execution environments and rich graphical or scripting environments. Both platforms put significant effort in providing a broad set of scripts especially for the OMICS-community (e.g. nf-core), however lacking in the agro-community where individual scripting plays a key part.
-Bringing individual scripts into the platform in both cases has a hurdle to overcome. For nextflow researchers need to learn the Groovy-based DSL, for Galaxy a curation process needs to be passed to get tools onto the platform. Workflows authored for Galaxy are typically bound to a specific Galaxy instance, and portability across infrastructures can require substantial re-engineering effort.
-CWL was introduced as a vendor-neutral, platform agnostic standard to address fragmentation. CWL workflows are portable by design as they in principle can run on any compliant execution engine. They are even efforts to make Galaxy and Nextflow compliant to this standard [@ref]. One big downside however is the lack of tooling especially in the creation process. CWLs adoption is comparable smaller than Nextflow and Galaxy. Its verbose, YAML-based syntax demands familiarity with structured data formats and workflow abstractions that many domain researchers lack. The result is a paradox: a universal standard that remains inaccessible to a large share of its intended users.
+The landscape of scientific workflow management is broad and fragmented. Numerous platforms and languages have emerged to address the need for reproducible, automated data analysis pipeline. Tools such as Nextflow[@di_tommaso_nextflow_2017] and Galaxy[@giardine_galaxy_2005] have achieved significant adoption within the scientific community. Both offer powerful execution environments and rich graphical or scripting environments. Both platforms put significant effort in providing a broad set of scripts especially for the OMICS-community (e.g. nf-core), however lacking in the agro-community where individual scripting plays a key part.
+Bringing individual scripts into the platform in both cases has a hurdle to overcome. For Nextflow researchers need to learn the Groovy-based DSL, for Galaxy a curation process needs to be passed to get tools onto the platform. Workflows authored for Galaxy are typically bound to a specific Galaxy instance, and portability across infrastructures can require substantial re-engineering effort.
+CWL was introduced as a vendor-neutral, platform agnostic standard to address fragmentation. CWL workflows are portable by design as they in principle can run on any compliant execution engine. There are even efforts to make Galaxy and Nextflow compliant to this standard [@ref]. One big downside however is the lack of tooling especially in the creation process. CWLs adoption is comparable smaller than Nextflow and Galaxy. Its verbose, YAML-based syntax demands familiarity with structured data formats and workflow abstractions that many domain researchers lack. The result is a paradox: a universal standard that remains inaccessible to a large share of its intended users.
The CWL ecosystem further compounds this problem. While a number of great runner implementations exist (e.g. cwltool, Toil, REANA, ........), the space of authoring tools is sparse. Rabix offered a graphical editor (Rabix Composer) which was made closed-source and moved into the seven-bridges Platform. The open-sourced version has been unmaintained for over 5 years and is significantly outdated. Many generators are outdated as well, meaning there is no actively developed open and lightweight CWL generator that integrates naturally into a researcher's existing command-line-driven "workflow". (Incorporate Planemo??!?)
-SciWIn-Client adresses this gap removing the need for researchers to write CWL by hand. Second it works fully offline without dependencies to any platform and is Git-native.
+SciWIn-Client addresses this gap removing the need for researchers to write CWL by hand. Second it works fully offline without dependencies to any platform and is Git-native.
- Fragmented landscape
@@ -84,7 +84,7 @@ A central concept of the tool is the automation of CWL generation. When users in
Once individual CWL CommandLineTools have been created, the next step is to combine them into a CWL Workflow. This is achieved using the `s4n connect` command, which allows the user to specify a source (starting tool or node) and a target (a subsequent tool or node). By linking the output of one tool to the input of another, the user defines the workflow's execution sequence.
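As a rough illustration of what linking an output to an input means at the CWL level, the following Python sketch edits a plain dictionary shaped like a CWL `Workflow` document. It is not SciWIn-Client's implementation and does not use its API: the `connect` helper, the tool files `clean.cwl` and `plot.cwl`, and all step, input and output names are hypothetical. The only part taken from the CWL standard is the convention that a step input refers to an upstream output as `<step>/<output>`.

```python
import json


def connect(workflow, source, target):
    """Link `source` ("step/output") to `target` ("step/input").

    Hypothetical helper for illustration only; it mutates and returns
    a dictionary shaped like a CWL Workflow document.
    """
    target_step, target_input = target.split("/", 1)
    # In CWL, a step's `in` map points each input at its source,
    # written as "<upstream step id>/<output id>".
    workflow["steps"][target_step]["in"][target_input] = source
    return workflow


# Two hypothetical tools: `clean` produces `cleaned`, `plot` consumes `table`.
workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"raw": "File"},
    "outputs": {"plot": {"type": "File", "outputSource": "plot/figure"}},
    "steps": {
        "clean": {"run": "clean.cwl", "in": {"data": "raw"}, "out": ["cleaned"]},
        "plot": {"run": "plot.cwl", "in": {}, "out": ["figure"]},
    },
}

connect(workflow, "clean/cleaned", "plot/table")
print(json.dumps(workflow["steps"]["plot"], indent=2))
```

After the call, the `plot` step depends on the `clean` step, so a CWL runner would schedule `clean` first. The execution order follows from these data dependencies rather than from the order in which the steps are listed.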
-In order to expand the possible sources for connecting complex workflows, there is the option to `install`exisiting workflows using SciWIn-Client which internally uses Git's submodule feature.
+In order to expand the possible sources for connecting complex workflows, there is the option to `install`existing workflows using SciWIn-Client which internally uses Git's submodule feature.
## Workflow Execution
The simplest way to execute a workflow is to run it directly on the machine where the workflow is defined by using the `s4n execute local` command (or `cwltool` which however does not support Windows).