paper/paper.md
Lines changed: 6 additions & 20 deletions
Original file line number
Diff line number
Diff line change
@@ -61,20 +61,7 @@ SciWIn-Client provides an intuitive command-line interface that automates CWL ge
61
61
The landscape of scientific workflow management is broad and fragmented. Numerous platforms and languages have emerged to address the need for reproducible, automated data analysis pipelines. Tools such as Nextflow [@di_tommaso_nextflow_2017] and Galaxy [@giardine_galaxy_2005] have achieved significant adoption within the scientific community. Both offer powerful execution environments and rich graphical or scripting interfaces. Both platforms invest significant effort in providing a broad set of scripts, especially for the omics community (e.g. nf-core), but comparable collections are lacking for the agricultural research community, where individual scripting plays a key part.
62
62
Bringing individual scripts onto either platform involves a hurdle. For Nextflow, researchers need to learn the Groovy-based DSL; for Galaxy, tools must pass a curation process before they become available on the platform. Workflows authored for Galaxy are typically bound to a specific Galaxy instance, and porting them across infrastructures can require substantial re-engineering effort.
63
63
CWL was introduced as a vendor-neutral, platform-agnostic standard to address this fragmentation. CWL workflows are portable by design, as they can in principle run on any compliant execution engine. There are even efforts to make Galaxy and Nextflow compliant with this standard [@ref]. One major downside, however, is the lack of tooling, especially for workflow creation. CWL's adoption is comparatively smaller than that of Nextflow and Galaxy. Its verbose, YAML-based syntax demands familiarity with structured data formats and workflow abstractions that many domain researchers lack. The result is a paradox: a universal standard that remains inaccessible to a large share of its intended users.
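To give a sense of the structure a researcher has to produce by hand, the following is a minimal sketch of a CWL v1.2 `CommandLineTool` description, built as a plain Python dictionary and serialized as JSON (which CWL accepts, since JSON is a subset of YAML). The helper `make_command_line_tool`, the wrapped command `wc -l`, and the names `text_file`, `counts` and `counts.txt` are assumptions chosen for illustration; they are not taken from SciWIn-Client.

```python
import json


def make_command_line_tool(base_command, input_name):
    """Build a minimal CWL v1.2 CommandLineTool description as a plain dict.

    Hypothetical helper for illustration only; not part of SciWIn-Client.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        # The program to run and its fixed arguments.
        "baseCommand": base_command,
        # One File input, passed as the first positional argument.
        "inputs": {
            input_name: {
                "type": "File",
                "inputBinding": {"position": 1},
            }
        },
        # Capture standard output into a file and expose it as an output.
        "outputs": {
            "counts": {"type": "stdout"},
        },
        "stdout": "counts.txt",
    }


# Describe `wc -l <text_file>` as a CWL tool (illustrative command).
tool = make_command_line_tool(["wc", "-l"], "text_file")
cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

Even for a one-line shell command, the description needs a version, a class, typed inputs with bindings, and typed outputs, which illustrates the overhead that manual authoring imposes.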
64
-
The CWL ecosystem further compounds this problem. While a number of great runner implementations exist (e.g. cwltool, Toil, REANA, ........), the space of authoring tools is sparse. Rabix offered a graphical editor (Rabix Composer) which was made closed-source and moved into the seven-bridges Platform. The open-sourced version has been unmaintained for over 5 years and is significantly outdated. Many generators are outdated as well meaning there is no actively developed open and lightweight CWL generator that integrates naturally into a researchers existing command-line-driven "workflow". (Include Planemo?)
65
-
SciWIn-Client addresses this gap removing the need for researchers to write CWL by hand. Second it works fully offline without dependencies to any platform and is Git-native.
66
-
67
-
68
-
- Fragmented landscape
69
-
- Platforms like Nextflow, Galaxy -> significant adoption,
- CWL lacking ecosystem/tooling- lots of runners but only outdated "Generators" like Rabix (now behind vendor lock)
72
-
- Unique selling points of SciWIn:
73
-
- CWL, offline, lower barrier to entry, more independent, connection to the ARC ecosystem, more independent with regard to (especially local) compute instances. Confidential data. Easier sharing of scripts. More universal through Git-based representation (use of arbitrary forges).
74
-
- Is the working process limited to **one** GX instance?
75
-
- Interop with Galaxy (wrapping in order to use execution engines),
76
-
- Provenance capture, & versioning via git native
77
-
64
+
The CWL ecosystem further compounds this problem. While a number of capable runner implementations exist (e.g. cwltool, Toil, REANA, Arvados), authoring tools are sparse. Rabix offered a graphical editor (Rabix Composer), which was made closed-source and moved into the Seven Bridges Platform. The open-source version has been unmaintained for over 5 years and is significantly outdated. Many generators are outdated as well, meaning there is no actively developed, open, and lightweight CWL generator that integrates naturally into a researcher's existing command-line-driven "workflow". SciWIn-Client addresses this gap in two ways. First, it removes the need for researchers to write CWL by hand. Second, it works fully offline without dependencies on any platform and is Git-native.
78
65
79
66
# Software design
80
67
SciWIn-Client (short: `s4n`) is implemented in the Rust programming language, chosen for its high performance, strong type safety, and robust error handling, qualities that are essential in scientific software. Git integration provides built-in version control and interoperability with research data management frameworks such as DataPLANT's ARC format [@dataplant2025ARCSpec; @Weil2023PLANTdataHUB], which can be viewed as a Git-based implementation of the RO-Crate standard [@SoilandReyes2022ROCrate].
@@ -87,16 +74,15 @@ Once individual CWL CommandLineTools have been created, the next step is to comb
87
74
To broaden the sources from which complex workflows can be assembled, SciWIn-Client offers the option to `install` existing workflows, which internally uses Git's submodule feature.
88
75
89
76
## Workflow Execution
90
-
The simplest way to execute a workflow is to run it directly on the machine where the workflow is defined by using the `s4n execute local` command (or `cwltool` which however does not support Windows).
77
+
The simplest way to execute a workflow is to run it directly on the machine where the workflow is defined, using the `s4n execute local` command (or `cwltool`, which, however, does not support Windows).
91
78
For computationally demanding calculations, workflows often need to be dispatched to large compute clusters. For execution on compute clusters, SciWIn-Client can communicate with the REST API of REANA instances [@Simko2019Reana]. REANA is a reproducible research data analysis platform provided by CERN. FAIRagro operates its own REANA installation in the de.NBI Cloud.
92
79
Structured execution results can be exported in the form of RO-Crates [@SoilandReyes2022ROCrate], more specifically as Workflow Run RO-Crates [@Leo2024WRRC] using the Provenance Run Crate profile.
93
80
94
81
# Research impact statement
95
-
- lowering technical barrier
96
-
- FAIRagro goals
97
-
- By automating CWL generation from everyday research computing tasks, it enables domain scientists — regardless of their software engineering background — to participate in open, collaborative, and reproducible science.
98
-
- transparent versioning, FAIR, ARC format, DataPLANT, WRRC
99
-
- Future Development: WorkflowHub? DockerGen
82
+
SciWIn-Client addresses a critical gap in open and reproducible science: the gap between the complexity of formal workflow standards and the practical capabilities of researchers. By automating CWL generation directly from command-line interactions, it enables scientists, regardless of their software engineering background, to produce structured, version-controlled, and portable workflow definitions without manual authoring of verbose specifications.
83
+
Within the FAIRagro consortium [@Ewert2023Proposal], SciWIn-Client directly supports the FAIR principles by making computational processes in agricultural research FAIR-compliant through the use of a defined standard. Workflows produced by SciWIn-Client are natively compatible with the ARC format [@dataplant2025ARCSpec; @Weil2023PLANTdataHUB], enabling seamless integration with DataPLANT's research data management infrastructure, and can be exported as Workflow Run RO-Crates [@Leo2024WRRC], providing machine-readable provenance records for every execution.
84
+
The tool's Git-native design ensures transparent versioning of both data and workflow definitions, making the full computational history of a study usable and shareable. By supporting both local and remote execution, SciWIn-Client accommodates the full spectrum of research computing needs, from exploratory analysis on a laptop to large-scale runs on institutional infrastructure.
85
+
100
86
The source code is openly available at https://github.com/fairagro/sciwin under a permissive license, and the project welcomes community contributions.