feat(cwl): add CWL workflow submission endpoint and DB storage model by ryuwd · Pull Request #877 · DIRACGrid/diracx

ryuwd · 2026-04-02T09:30:59Z

In draft, under development and to be tested in certification first

cc @aldbr

Follows the plan in #858

read-the-docs-community · 2026-04-02T11:35:25Z

Documentation build overview

📚 diracx | 🛠️ Build #32194815 | 📁 Comparing 1f35e11 against latest (63dc086)

🔍 Preview build

No files changed.

Introduce the `POST /api/jobs/` endpoint for submitting CWL workflows with per-job input parameters, and `GET /api/workflows/{workflow_id}` for retrieving stored workflows. CWL definitions are stored once in a new `Workflows` table (content-addressed by SHA-256), with per-job parameters and workflow references added as new columns on `Jobs`. During the transition period, JDL is still produced via `cwl_to_jdl()` for compatibility with existing DIRAC infrastructure.

Pydantic now validates schema_version at model construction instead of a runtime check in the logic layer. Users can omit schema_version to get the default "1.0", but invalid versions are rejected immediately.

Parse YAML to dict and serialize as sorted JSON before SHA-256 hashing, so whitespace, comments, and key ordering differences produce the same workflow ID. The original YAML is still stored as-is for readability.
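The normalization idea can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: it assumes the CWL YAML has already been parsed into a plain dict by a YAML library, and `workflow_id` is a hypothetical helper name.

```python
import hashlib
import json


def workflow_id(parsed_cwl: dict) -> str:
    """Hash a parsed CWL document in a canonical JSON form.

    Hypothetical helper: sorting keys and fixing the separators means two
    documents that differ only in key order (or in YAML whitespace and
    comments, which are lost at parse time) yield the same digest.
    """
    canonical = json.dumps(parsed_cwl, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order.
doc_a = {"cwlVersion": "v1.2", "class": "CommandLineTool", "baseCommand": "echo"}
doc_b = {"baseCommand": "echo", "class": "CommandLineTool", "cwlVersion": "v1.2"}

assert workflow_id(doc_a) == workflow_id(doc_b)
```

A change in actual content (for example a different `baseCommand`) still produces a different ID, so content addressing is preserved.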

Replace UploadFile multipart with a CWLJobSubmission pydantic model to fix autorest client generation. Add job_wrapper_template.py to diracx-logic for worker-side CWL execution via importlib.resources.

chore: regenerate client for gubbins

Move JobWrapper, JobReport, commands, and submission models into diracx-api and diracx-core so DIRAC does not need dirac-cwl installed.

fix: again

…eferences

…mines layout

Rename _resolve_lfn to _resolve_path and update all methods (_abs, glob, open, exists, isfile, isdir, size) to handle both LFN: and SB: prefixed paths. SB: keys are stored with prefix in the replica map while LFN keys are stored without prefix.

Parse SB: prefixed paths in input sandbox downloads, cache per PFN, and inject sandbox entries into the replica map for CWL executor resolution.

…mmandLineTool

28 tests covering LFN extraction, replica map filtering/merging, path resolution in DiracPathMapper, and tool factory routing in DiracCommandLineTool. Loads executor modules by file path to bypass __init__.py's mypyc compat hook.

…plica map wiring

…lica map
- Create test_executor_integration.py with three subprocess tests: test_basic_execution_with_replica_map (LFN resolved via replica map), test_execution_without_replica_map (plain local file), and test_sb_reference_in_replica_map (SB: key resolved via replica map)
- Fix DiracPathMapper to convert file:// PFNs to local paths for MapperEnt.target so cwltool passes filesystem paths (not file:// URLs) to subcommands
- Extend DiracPathMapper.visit to handle SB: URI scheme alongside LFN:
- Add DiracPathMapper.mapper() override so SB: keys with '#' fragments are found directly in _pathmap without cwltool's fragment-stripping logic

Skip non-SB: paths in input_sandbox and non-LFN: paths in input_data with a warning instead of silently handling them. This JobWrapper only supports CWL jobs where sandboxes are always SB: prefixed and input data is always LFN: prefixed.

test(api): add JobWrapper integration test for full CWL execution chain

Add integration test exercising the complete run_job() lifecycle: hint parsing, sandbox download, LFN download, replica map building, SB injection, real dirac-cwl-run execution, and output parsing. Add conftest.py to pre-load real cwl_utils/ruamel.yaml before other test files mock them, ensuring the integration test can access real types. Also fix dirac-cwl-run logging to use stderr so that only the cwltool JSON output is written to stdout, allowing job_wrapper.py to reliably parse it with json.loads().

All dependencies (cwltool, cwl_utils, ruamel.yaml, DIRACCommon, diracx.client.aio, diracx.api.job_report, diracx.api.jobs) are now installed in the test environment, so replace the sys.modules mock scaffolding and importlib file-loading with direct imports. Delete conftest.py which existed solely to save real module references before mocking occurred.

The executor is a CLI tool (dirac-cwl-run entry point), not an API component. Moves source and tests to diracx-cli, updates cwltool dependency location, and rewrites test_no_cwltool_import to verify the mypyc compatibility patch is installed before cwltool loads.

chore: update lockfile

chore: update .gitignore

fix: cwl-utils

Adds expand_range_inputs() to cwl_submission logic, extends the CWLJobSubmission router model with range_param/range_start/range_end/range_step/base_inputs fields with mutual-exclusion validation, and wires range expansion into the submit_cwl_jobs router handler.
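As a rough illustration of what range expansion could look like, the sketch below produces one input dict per range value. The function name, its signature, and the half-open interval (end excluded, as in Python's `range`) are assumptions made for this example; the PR's actual `expand_range_inputs()` may differ on all three.

```python
from typing import Any


def expand_range_inputs_sketch(
    base_inputs: dict[str, Any],
    range_param: str,
    range_start: int,
    range_end: int,
    range_step: int = 1,
) -> list[dict[str, Any]]:
    """Return one input dict per value of the range parameter.

    Assumption: the interval is half-open, [range_start, range_end).
    """
    if range_step == 0:
        raise ValueError("range_step must be non-zero")
    return [
        {**base_inputs, range_param: value}
        for value in range(range_start, range_end, range_step)
    ]


jobs = expand_range_inputs_sketch({"message": "hi"}, "seed", 0, 3)
# jobs == [{"message": "hi", "seed": 0},
#          {"message": "hi", "seed": 1},
#          {"message": "hi", "seed": 2}]
```

Each resulting dict would correspond to one submitted job sharing the same stored workflow.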

Move CLI commands to `dirac job submit {cwl,cmd,jdl}` hierarchy:
- dirac cwl submit → dirac job submit cwl
- dirac submit → dirac job submit cmd
- dirac jobs submit (JDL) → dirac job submit jdl

refactor(cli): remove legacy dirac jobs command group

Move search utilities into job/search.py and remove jobs.py entirely. dirac job search is now the only search entry point.

chore: regenerated gubbins client

The server and client CWLJobSubmission model already had range_param, range_start, range_end, range_step fields. Remove the stale NotImplementedError and call the API with the range spec.

The worker-side job wrapper used attribute-chain access (DIRAC.Core.Security.DiracX.diracxTokenFromPEM) which fails because Python does not auto-load submodules via attribute access. Use an explicit from-import instead.
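The import behaviour behind this fix can be reproduced without DIRAC. The sketch below builds a throwaway package on disk (`cwl_demo_pkg` and `token_from_pem` are made-up names) and shows that importing the top-level package does not make its submodule available as an attribute, while an explicit from-import does.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny package with one submodule (hypothetical names).
    pkg = Path(tmp) / "cwl_demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "security.py").write_text("def token_from_pem():\n    return 'token'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import cwl_demo_pkg  # imports only the top-level package

        # The submodule has not been loaded, so it is not an attribute yet.
        loaded_before = hasattr(cwl_demo_pkg, "security")

        # An explicit from-import loads the submodule and binds it on the parent.
        from cwl_demo_pkg.security import token_from_pem

        loaded_after = hasattr(cwl_demo_pkg, "security")
    finally:
        sys.path.remove(tmp)

assert loaded_before is False
assert loaded_after is True
```

Attribute-chain access only works when some other code has already imported the submodule, which is why it can appear to work in one environment and fail on a worker.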

Fix temp file leak from load_document_by_uri (delete=False without cleanup). Also extract SB: PFN references from workflow params so sandbox jobs have their files available on the worker.

The JobWrapper downloads sandboxes by walking CWL inputs via the dirac:Job hint's input_sandbox source references, not the JobInputModel.sandbox field. The extraction was dead code.

Shows scalar key=value pairs from the input dict in the job name, e.g. "hello-world (seed=42)" so parametric jobs are distinguishable in search results and monitoring.
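A minimal sketch of this naming rule, assuming that "scalar" means str, int, float, or bool and that multiple pairs are joined with a comma and space. Neither detail is stated in the commit message, and `job_name_with_params` is a hypothetical name.

```python
from typing import Any


def job_name_with_params(base_name: str, inputs: dict[str, Any]) -> str:
    """Append scalar key=value pairs from the inputs to the job name."""
    scalars = [
        f"{key}={value}"
        for key, value in inputs.items()
        if isinstance(value, (str, int, float, bool))  # skip lists, dicts, File objects
    ]
    if not scalars:
        return base_name
    return f"{base_name} ({', '.join(scalars)})"


name = job_name_with_params("hello-world", {"seed": 42, "data": {"class": "File"}})
assert name == "hello-world (seed=42)"
```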

DIRAC's wrapper script passes only the jobID (not a json-file), so accept 1 or 2 args and always read the jobID from the last argument.
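The argument handling described here could look roughly like the following. The function name and the assumption that the job ID is an integer are illustrative only.

```python
def job_id_from_args(args: list[str]) -> int:
    """Read the job ID from the last of one or two positional arguments.

    Accepts either [job_id] or [json_file, job_id]; both shapes put the
    job ID last. Treating the ID as an int is an assumption of this sketch.
    """
    if len(args) not in (1, 2):
        raise ValueError(f"expected 1 or 2 arguments, got {len(args)}")
    return int(args[-1])


assert job_id_from_args(["1234"]) == 1234
assert job_id_from_args(["job.json", "1234"]) == 1234
```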

Report meaningful ApplicationStatus using the CWL task label instead of leaving it as "Unknown". Shows e.g. "parametric-echo completed" or "parametric-echo failed (exit 1)" in job search results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace subprocess.run with Popen to read stderr line-by-line, re-emit each line to the wrapper's stderr, and store it as ApplicationStatus with immediate commit for real-time visibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…Status

Read stdout in a background thread to prevent pipe deadlock when both stdout and stderr use subprocess.PIPE. Add explicit application_status to the failure path so the final status is not just the last cwltool line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace subprocess.Popen + threading with asyncio.create_subprocess_exec for proper async cooperation. Stderr is streamed line-by-line without blocking the event loop, stdout collected concurrently via asyncio task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
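The pattern can be sketched as below. This is an illustration under assumptions, not the wrapper's code: `run_and_stream` is a hypothetical name, and a small Python child process stands in for dirac-cwl-run.

```python
import asyncio
import sys


async def run_and_stream(cmd: list[str]) -> tuple[int, str, list[str]]:
    """Run cmd, streaming stderr line by line while stdout is collected."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    assert proc.stdout is not None and proc.stderr is not None

    # Drain stdout concurrently so a full stdout pipe cannot block the child.
    stdout_task = asyncio.create_task(proc.stdout.read())

    stderr_lines: list[str] = []
    async for raw in proc.stderr:  # yields one line at a time, without blocking the loop
        line = raw.decode(errors="replace").rstrip()
        print(line, file=sys.stderr)  # re-emit to the wrapper's own stderr
        stderr_lines.append(line)

    stdout = (await stdout_task).decode(errors="replace")
    returncode = await proc.wait()
    return returncode, stdout, stderr_lines


# Stand-in child: JSON on stdout, progress messages on stderr.
CHILD = (
    "import sys; "
    "print('{\"ok\": true}'); "
    "print('step one', file=sys.stderr); "
    "print('step two', file=sys.stderr)"
)
returncode, stdout, stderr_lines = asyncio.run(
    run_and_stream([sys.executable, "-c", CHILD])
)
```

Keeping stdout reserved for the JSON result is what lets the caller pass it straight to `json.loads()`.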

Add stdout/stderr capture fields to generate_cwl() so cwltool writes the tool's output to stdout.log and stderr.log in the output sandbox, preventing job output from being silently discarded.

Avoids naming collision with DIRAC's conventional std.out filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Batch ApplicationStatus commits to at most once per 2 seconds instead of per stderr line, reducing HTTP traffic. All lines are still stored via set_job_status — only the commit frequency changes. Commit failures are logged as warnings instead of killing the job. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
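One way to express this throttling is sketched below. `ThrottledCommitter` and its injectable clock are hypothetical; the commit callback stands in for whatever sends the pending statuses to the server.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class ThrottledCommitter:
    """Call `commit` at most once per `min_interval` seconds."""

    def __init__(
        self,
        commit: Callable[[], None],
        min_interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._commit = commit
        self._min_interval = min_interval
        self._clock = clock
        self._last_attempt: float | None = None

    def maybe_commit(self) -> bool:
        """Attempt a commit if enough time has passed; return True if attempted."""
        now = self._clock()
        if self._last_attempt is not None and now - self._last_attempt < self._min_interval:
            return False  # too soon: statuses stay queued for the next commit
        self._last_attempt = now
        try:
            self._commit()
        except Exception as exc:  # a failed commit must not kill the job
            logger.warning("status commit failed: %s", exc)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
commits: list[float] = []
committer = ThrottledCommitter(lambda: commits.append(fake_now[0]), clock=lambda: fake_now[0])

attempts = []
for t in (0.0, 0.5, 2.5):
    fake_now[0] = t
    attempts.append(committer.maybe_commit())
# attempts == [True, False, True]; commits == [0.0, 2.5]
```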

Only report step/job/workflow lifecycle lines (start, completed, failed, skipped) as ApplicationStatus. Noise lines (validation, file staging, timing) still go to stderr for Watchdog peek but are not sent to the server. Log level prefixes (INFO/WARNING) are stripped — ApplicationStatus shows clean cwltool output like "[job X] completed success". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Require INFO/WARNING/ERROR prefix before cwltool lifecycle patterns to prevent false positives from user application output. Use group(1) to extract the bracket-onward portion, stripping the level prefix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
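A filter of this kind might look like the sketch below. The exact pattern is an assumption made for illustration (the PR's regex is not shown on this page); only the level-prefix requirement and the use of `group(1)` come from the commit message.

```python
import re
from typing import Optional

# Assumed shape of a cwltool lifecycle line: "<LEVEL> [step|job|workflow ...] <event> ..."
LIFECYCLE_RE = re.compile(
    r"^(?:INFO|WARNING|ERROR)\s+"
    r"(\[(?:step|job|workflow)\s[^\]]*\]\s+(?:start|completed|failed|skipped)\b.*)$"
)


def extract_status(line: str) -> Optional[str]:
    """Return the bracket-onward part of a lifecycle line, or None for noise."""
    match = LIFECYCLE_RE.match(line)
    return match.group(1) if match else None


assert extract_status("INFO [job X] completed success") == "[job X] completed success"
# Without a level prefix the line is treated as user output and ignored.
assert extract_status("[job X] completed success") is None
# Prefixed but not a lifecycle event: still goes to stderr, not to the server.
assert extract_status("INFO Resolved 'tool.cwl' to 'file:///tmp/tool.cwl'") is None
```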

The server checks sandbox ownership using the PFN without the #fragment. The fragment is only needed by the worker to extract the correct file from the tar archive.
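The split between the server-side and worker-side views of a sandbox reference can be sketched as follows. The PFN string is a made-up placeholder, not a real sandbox path, and `split_sandbox_ref` is a hypothetical helper.

```python
def split_sandbox_ref(ref: str) -> tuple[str, str]:
    """Split a sandbox reference into (pfn, fragment).

    The part before '#' is what an ownership check would see; the fragment
    names the file to extract from the tar archive on the worker.
    """
    pfn, _, fragment = ref.partition("#")
    return pfn, fragment


# Placeholder reference for illustration only.
pfn, member = split_sandbox_ref("SB:ExampleSE|/sandboxes/abc.tar.zst#inputs/data.txt")
assert pfn == "SB:ExampleSE|/sandboxes/abc.tar.zst"
assert member == "inputs/data.txt"
```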

ryuwd added 4 commits April 7, 2026 13:22

fix: enforce schema_version as Literal["1.0"] with default in JobHint

5d8ed95

Pydantic now validates schema_version at model construction instead of a runtime check in the logic layer. Users can omit schema_version to get the default "1.0", but invalid versions are rejected immediately.

fix: normalize CWL before hashing to prevent duplicate workflow rows

001e8ff

Parse YAML to dict and serialize as sorted JSON before SHA-256 hashing, so whitespace, comments, and key ordering differences produce the same workflow ID. The original YAML is still stored as-is for readability.

feat: use JSON body for CWL submission and add job wrapper template

13f8c14

Replace UploadFile multipart with a CWLJobSubmission pydantic model to fix autorest client generation. Add job_wrapper_template.py to diracx-logic for worker-side CWL execution via importlib.resources.

ryuwd force-pushed the feat/cwl-job-submission branch from 11d928f to b1362d6 Compare April 7, 2026 11:26

ryuwd added 24 commits April 8, 2026 15:53

chore: regenerate diracx client

93f3215

chore: regenerate client for gubbins

refactor: move job wrapper and commands from dirac-cwl into diracx

13bba67

Move JobWrapper, JobReport, commands, and submission models into diracx-api and diracx-core so DIRAC does not need dirac-cwl installed.

fix: job wrapper uses jobs.search

0de1f46

fix: again

feat: ReplicaMap integration and added dirac cwl executor

f649f63

refactor: cleaned executor significantly

69bb5ae

feat(core): extend replica map key validation to accept SB: sandbox r…

ddadf76

…eferences

refactor(core): remove path field from IOSource — tar structure deter…

e4416cf

…mines layout

feat(api): add SB: path parsing and sandbox replica map injection

bdad870

Parse SB: prefixed paths in input sandbox downloads, cache per PFN, and inject sandbox entries into the replica map for CWL executor resolution.

feat(api): add cwltool as explicit dependency of diracx-api

b0ea2d5

test(api): extend FsAccess tests with LFN resolution coverage

2c46728

test(api): add unit tests for JobWrapper commands, output parsing, re…

f1ce58d

…plica map wiring

fix: depend on cwl-utils

25c45be

fix: cwl-utils

chore: update lockfile

d6c3341

feat: add CWL input parsing module (YAML, JSON, CLI args, range)

1db8f7b

feat: add sandbox scanning, grouping, and path rewriting

bd85cab

test: add missing SB passthrough test for rewrite_sandbox_refs

fc6b398

feat: add submission confirmation prompt

6c7f8cd

ryuwd added 7 commits April 8, 2026 15:53

feat: add dirac job command group with search

63763ce

test: add integration tests for CWL submission pipeline

67e07a7

chore: regenerated client

a851b97

chore: regenerated gubbins client

chore: updated lockfile

636bc7d

test: no skipping if cwltool or dirac-cwl-run not available

32a33a2

ryuwd force-pushed the feat/cwl-job-submission branch from 850b0a5 to 32a33a2 Compare April 8, 2026 13:56

ryuwd and others added 22 commits April 8, 2026 18:45

chore: update lockfile

f1c3202

fix(core): SoftwareDistModule should be empty to not break the pilot

763ead5

feat: wire up --range submission to server-side CWL model

4234958

The server and client CWLJobSubmission model already had range_param, range_start, range_end, range_step fields. Remove the stale NotImplementedError and call the API with the range spec.

fix: use explicit import for diracxTokenFromPEM in job wrapper

26e3f35

The worker-side job wrapper used attribute-chain access (DIRAC.Core.Security.DiracX.diracxTokenFromPEM) which fails because Python does not auto-load submodules via attribute access. Use an explicit from-import instead.

fix: clean up temp CWL file and extract sandbox PFNs in job wrapper

5f32354

Fix temp file leak from load_document_by_uri (delete=False without cleanup). Also extract SB: PFN references from workflow params so sandbox jobs have their files available on the worker.

refactor: remove unnecessary sandbox PFN extraction from job wrapper

20238d7

The JobWrapper downloads sandboxes by walking CWL inputs via the dirac:Job hint's input_sandbox source references, not the JobInputModel.sandbox field. The extraction was dead code.

feat: append input parameters to job name for parametric jobs

8c19b4d

Shows scalar key=value pairs from the input dict in the job name, e.g. "hello-world (seed=42)" so parametric jobs are distinguishable in search results and monitoring.

fix: accept jobID as last argument in job wrapper

483ceb7

DIRAC's wrapper script passes only the jobID (not a json-file), so accept 1 or 2 args and always read the jobID from the last argument.

feat(cli): capture stdout/stderr to log files in generated CWL

76acbf8

Add stdout/stderr capture fields to generate_cwl() so cwltool writes the tool's output to stdout.log and stderr.log in the output sandbox, preventing job output from being silently discarded.

fix: use stdout.log instead of std.out in integration test

6f13421

Avoids naming collision with DIRAC's conventional std.out filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(cli): sandbox submit

36fbae1

fix: strip #fragment from InputSandbox JDL field

582ac77

The server checks sandbox ownership using the PFN without the #fragment. The fragment is only needed by the worker to extract the correct file from the tar archive.

fix: no temp files

85eea01

fix: some things need uploading regardless

0beb974

refactor: code around job wrapper sandbox handling

1f35e11

ryuwd wants to merge 60 commits into DIRACGrid:main from ryuwd:feat/cwl-job-submission
