Commit 3673db0

updated paper.md
The Software design section may need more work, some references are missing, and affiliation 5 is not valid
1 parent fc029b6 commit 3673db0

1 file changed

Lines changed: 6 additions & 20 deletions

paper/paper.md

@@ -61,20 +61,7 @@ SciWIn-Client provides an intuitive command-line interface that automates CWL ge
6161
The landscape of scientific workflow management is broad and fragmented. Numerous platforms and languages have emerged to address the need for reproducible, automated data analysis pipelines. Tools such as Nextflow [@di_tommaso_nextflow_2017] and Galaxy [@giardine_galaxy_2005] have achieved significant adoption within the scientific community. Both offer powerful execution environments and rich graphical or scripting environments. Both platforms have put significant effort into providing a broad set of scripts, especially for the omics community (e.g. nf-core), but coverage is lacking in the agricultural research community, where individual scripting plays a key part.
6262
In both cases, bringing individual scripts onto the platform is a hurdle. For Nextflow, researchers need to learn the Groovy-based DSL; for Galaxy, a tool must pass a curation process before it becomes available on the platform. Workflows authored for Galaxy are typically bound to a specific Galaxy instance, and portability across infrastructures can require substantial re-engineering effort.
6363
CWL was introduced as a vendor-neutral, platform-agnostic standard to address this fragmentation. CWL workflows are portable by design, as they can in principle run on any compliant execution engine. There are even efforts to make Galaxy and Nextflow compliant with this standard [@ref]. One major downside, however, is the lack of tooling, especially for the creation process. CWL's adoption is comparatively smaller than that of Nextflow and Galaxy. Its verbose, YAML-based syntax demands familiarity with structured data formats and workflow abstractions that many domain researchers lack. The result is a paradox: a universal standard that remains inaccessible to a large share of its intended users.
64-
The CWL ecosystem further compounds this problem. While a number of great runner implementations exist (e.g. cwltool, Toil, REANA, ........), the space of authoring tools is sparse. Rabix offered a graphical editor (Rabix Composer), which was made closed-source and moved into the Seven Bridges Platform. The open-source version has been unmaintained for over 5 years and is significantly outdated. Many generators are outdated as well, meaning there is no actively developed, open, and lightweight CWL generator that integrates naturally into a researcher's existing command-line-driven "workflow". (Include Planemo?)
65-
SciWIn-Client addresses this gap by removing the need for researchers to write CWL by hand. In addition, it works fully offline without dependencies on any platform and is Git-native.
66-
67-
68-
- Fragmented landscape
69-
- Platforms like Nextflow, Galaxy -> significant adoption,
70-
- CWL vendor-neutral, platform agnostic, portable, runs everywhere
71-
- CWL lacking ecosystem/tooling: lots of runners but only outdated "Generators" like Rabix (now behind vendor lock)
72-
- Unique selling points of SciWIn:
73-
- CWL, offline, lower barrier to entry, more independent, connection to the ARC ecosystem, more independent with regard to (especially local) compute instances. Confidential data. Easier sharing of scripts. More universal through Git-based representation (use of arbitrary forges).
74-
- Is the working process restricted to **one** GX instance?
75-
- Interop with Galaxy (wrapping in order to use execution engines),
76-
- Provenance capture and versioning, Git-native
77-
64+
The CWL ecosystem further compounds this problem. While a number of great runner implementations exist (e.g. cwltool, Toil, REANA, Arvados), the space of authoring tools is sparse. Rabix offered a graphical editor (Rabix Composer), which was made closed-source and moved into the Seven Bridges Platform. The open-source version has been unmaintained for over 5 years and is significantly outdated. Many generators are outdated as well, meaning there is no actively developed, open, and lightweight CWL generator that integrates naturally into a researcher's existing command-line-driven "workflow". SciWIn-Client addresses this gap by removing the need for researchers to write CWL by hand. In addition, it works fully offline without dependencies on any platform and is Git-native.
7865

7966
# Software design
8067
SciWIn-Client (short: `s4n`) is implemented in the Rust programming language, chosen for its high performance, strong type safety, and robust error handling, qualities essential in scientific software. Git integration provides built-in version control and interoperability with research data management frameworks such as DataPLANT's ARC format [@dataplant2025ARCSpec; @Weil2023PLANTdataHUB], which can be viewed as a Git-based implementation of the RO-Crate standard [@SoilandReyes2022ROCrate].
@@ -87,16 +74,15 @@ Once individual CWL CommandLineTools have been created, the next step is to comb
8774
To broaden the sources from which complex workflows can be connected, SciWIn-Client offers the option to `install` existing workflows, which internally uses Git's submodule feature.
8875

8976
## Workflow Execution
90-
The simplest way to execute a workflow is to run it directly on the machine where the workflow is defined, using the `s4n execute local` command (or `cwltool`, which, however, does not support Windows).
77+
The simplest way to execute a workflow is to run it directly on the machine where the workflow is defined, using the `s4n execute local` command (or `cwltool`, which, however, does not support Windows).
9178
Computationally demanding workflows often need to be dispatched to large compute clusters. For execution on compute clusters, SciWIn-Client is able to communicate with the REST API of REANA instances [@Simko2019Reana]. REANA is a reproducible research data analysis platform provided by CERN. FAIRagro operates its own REANA installation in the de.NBI Cloud.
9279
Structured execution results can be exported as RO-Crates [@SoilandReyes2022ROCrate], more specifically as Workflow Run RO-Crates [@Leo2024WRRC] using the Provenance Run Crate profile.
9380

9481
# Research impact statement
95-
- lowering technical barrier
96-
- FAIRagro goals
97-
- By automating CWL generation from everyday research computing tasks, it enables domain scientists — regardless of their software engineering background — to participate in open, collaborative, and reproducible science.
98-
- transparent versioning, FAIR, ARC format, DataPLANT, WRRC
99-
- Future Development: WorkflowHub? DockerGen
82+
SciWIn-Client addresses a critical gap in open and reproducible science: the gap between the complexity of formal workflow standards and the practical capabilities of researchers. By automating CWL generation directly from command-line interactions, it enables scientists, regardless of their software engineering background, to produce structured, version-controlled, and portable workflow definitions without manual authoring of verbose specifications.
83+
Within the FAIRagro consortium [@Ewert2023Proposal], SciWIn-Client directly supports the FAIR principles by ensuring that computational processes in agricultural research are described using a defined standard and are thereby FAIR-compliant. Workflows produced by SciWIn-Client are natively compatible with the ARC format [@dataplant2025ARCSpec; @Weil2023PLANTdataHUB], enabling seamless integration with DataPLANT's research data management infrastructure, and can be exported as Workflow Run RO-Crates [@Leo2024WRRC], providing machine-readable provenance records for every execution.
84+
The tool's Git-native design ensures transparent versioning of both data and workflow definitions, making the full computational history of a study usable and shareable. By supporting both local and remote execution, SciWIn-Client accommodates the full spectrum of research computing needs, from exploratory analysis on a laptop to large-scale runs on institutional infrastructure.
85+
10086
The source code is openly available at https://github.com/fairagro/sciwin under a permissive license, and the project welcomes community contributions.
10187

10288
# Acknowledgements
