
feat(cwl): add CWL workflow submission endpoint and DB storage model#877

Draft
ryuwd wants to merge 60 commits into DIRACGrid:main from ryuwd:feat/cwl-job-submission

Conversation


@ryuwd ryuwd commented Apr 2, 2026

In draft; still under development, to be tested in certification first.

cc @aldbr

Follows the plan in #858


read-the-docs-community bot commented Apr 2, 2026

Documentation build overview

📚 diracx | 🛠️ Build #32194815 | 📁 Comparing 1f35e11 against latest (63dc086)

  🔍 Preview build  

No files changed.

ryuwd added 4 commits April 7, 2026 13:22
Introduce the `POST /api/jobs/` endpoint for submitting CWL workflows
with per-job input parameters, and `GET /api/workflows/{workflow_id}`
for retrieving stored workflows. CWL definitions are stored once in a
new `Workflows` table (content-addressed by SHA-256), with per-job
parameters and workflow references added as new columns on `Jobs`.
During the transition period, JDL is still produced via
`cwl_to_jdl()` for compatibility with existing DIRAC infrastructure.
Pydantic now validates schema_version at model construction instead of
a runtime check in the logic layer. Users can omit schema_version to
get the default "1.0", but invalid versions are rejected immediately.
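The validate-at-construction idea can be sketched as follows. The PR uses a Pydantic model; this stdlib-only sketch shows the same behaviour (default applied when omitted, invalid versions rejected immediately), and the class/field names are taken from the commit message, not the real code:

```python
from dataclasses import dataclass

# Illustrative: the real model is Pydantic-based; "1.0" as the only
# supported version is an assumption for this sketch.
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}

@dataclass
class CWLJobSubmission:
    schema_version: str = "1.0"  # omitted -> default "1.0"

    def __post_init__(self) -> None:
        # Reject unsupported versions at construction time,
        # not via a later runtime check in the logic layer.
        if self.schema_version not in SUPPORTED_SCHEMA_VERSIONS:
            raise ValueError(f"unsupported schema_version: {self.schema_version!r}")

print(CWLJobSubmission().schema_version)
try:
    CWLJobSubmission(schema_version="9.9")
except ValueError:
    print("rejected")
```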
Parse YAML to dict and serialize as sorted JSON before SHA-256 hashing,
so whitespace, comments, and key ordering differences produce the same
workflow ID. The original YAML is still stored as-is for readability.
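The canonicalization step can be sketched like this, starting from an already-parsed dict (the PR parses the YAML first, e.g. with ruamel.yaml; the function name here is illustrative):

```python
import hashlib
import json

def workflow_id(parsed: dict) -> str:
    """Content-address a workflow: hash a canonical JSON rendering of the
    parsed YAML so whitespace, comments, and key order do not change the ID."""
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Two dicts a YAML parser would produce from files that differ only in key order:
a = {"cwlVersion": "v1.2", "class": "CommandLineTool"}
b = {"class": "CommandLineTool", "cwlVersion": "v1.2"}
assert workflow_id(a) == workflow_id(b)
print(workflow_id(a)[:12])
```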
Replace UploadFile multipart with a CWLJobSubmission pydantic model
to fix autorest client generation. Add job_wrapper_template.py to
diracx-logic for worker-side CWL execution via importlib.resources.
@ryuwd ryuwd force-pushed the feat/cwl-job-submission branch from 11d928f to b1362d6 on April 7, 2026 11:26
ryuwd added 24 commits April 8, 2026 15:53
chore: regenerate client for gubbins
Move JobWrapper, JobReport, commands, and submission models into
diracx-api and diracx-core so DIRAC does not need dirac-cwl
installed.
Rename _resolve_lfn to _resolve_path and update all methods (_abs, glob,
open, exists, isfile, isdir, size) to handle both LFN: and SB: prefixed
paths. SB: keys are stored with prefix in the replica map while LFN keys
are stored without prefix.
Parse SB: prefixed paths in input sandbox downloads, cache per PFN,
and inject sandbox entries into the replica map for CWL executor resolution.
…mmandLineTool

28 tests covering LFN extraction, replica map filtering/merging, path
resolution in DiracPathMapper, and tool factory routing in DiracCommandLineTool.
Loads executor modules by file path to bypass __init__.py's mypyc compat hook.
…lica map

- Create test_executor_integration.py with three subprocess tests:
  test_basic_execution_with_replica_map (LFN resolved via replica map),
  test_execution_without_replica_map (plain local file), and
  test_sb_reference_in_replica_map (SB: key resolved via replica map)
- Fix DiracPathMapper to convert file:// PFNs to local paths for MapperEnt.target
  so cwltool passes filesystem paths (not file:// URLs) to subcommands
- Extend DiracPathMapper.visit to handle SB: URI scheme alongside LFN:
- Add DiracPathMapper.mapper() override so SB: keys with '#' fragments are
  found directly in _pathmap without cwltool's fragment-stripping logic
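The file:// conversion in the second bullet can be sketched with the stdlib; this is not the real DiracPathMapper code, just the core idea (URL in, plain filesystem path out):

```python
from urllib.parse import unquote, urlparse

def pfn_to_local_path(pfn: str) -> str:
    """Convert a file:// PFN to a plain filesystem path so cwltool hands
    subcommands a path, not a URL; other values pass through unchanged."""
    parsed = urlparse(pfn)
    if parsed.scheme == "file":
        return unquote(parsed.path)
    return pfn

print(pfn_to_local_path("file:///scratch/job42/input.txt"))
print(pfn_to_local_path("/already/local.txt"))
```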
Skip non-SB: paths in input_sandbox and non-LFN: paths in input_data
with a warning instead of silently handling them. This JobWrapper only
supports CWL jobs where sandboxes are always SB: prefixed and input
data is always LFN: prefixed.
Add integration test exercising the complete run_job() lifecycle: hint
parsing, sandbox download, LFN download, replica map building, SB
injection, real dirac-cwl-run execution, and output parsing.

Also fix dirac-cwl-run logging to use stderr so that only the cwltool
JSON output is written to stdout, allowing job_wrapper.py to reliably
parse it with json.loads().
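The stdout/stderr separation can be sketched like this (the result payload is invented; the real dirac-cwl-run emits cwltool's JSON output document):

```python
import json
import logging
import sys

# Keep stdout machine-readable: route all logging to stderr so the only
# thing written to stdout is the JSON the wrapper will json.loads().
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("dirac-cwl-run")

log.info("resolving inputs")  # diagnostic text goes to stderr
result = {"output_file": {"class": "File", "path": "/tmp/out.txt"}}  # stand-in payload
print(json.dumps(result))     # the only stdout line
```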

test(api): add JobWrapper integration test for full CWL execution chain

Add conftest.py to pre-load real cwl_utils/ruamel.yaml before other test
files mock them, ensuring the integration test can access real types.

All dependencies (cwltool, cwl_utils, ruamel.yaml, DIRACCommon, diracx.client.aio,
diracx.api.job_report, diracx.api.jobs) are now installed in the test environment,
so replace the sys.modules mock scaffolding and importlib file-loading with direct
imports. Delete conftest.py which existed solely to save real module references
before mocking occurred.
The executor is a CLI tool (dirac-cwl-run entry point), not an API
component. Moves source and tests to diracx-cli, updates cwltool
dependency location, and rewrites test_no_cwltool_import to verify
the mypyc compatibility patch is installed before cwltool loads.

chore: update lockfile

chore: update .gitignore
ryuwd added 7 commits April 8, 2026 15:53
Adds expand_range_inputs() to cwl_submission logic, extends the
CWLJobSubmission router model with range_param/range_start/range_end/
range_step/base_inputs fields with mutual-exclusion validation, and
wires range expansion into the submit_cwl_jobs router handler.
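The expansion can be sketched as below. The function name comes from the commit message, but the signature, the half-open range semantics, and the merge order with base_inputs are assumptions:

```python
# Sketch of parametric-range expansion: one input dict per range value,
# with base_inputs shared across all jobs (exact real behaviour may differ).
def expand_range_inputs(range_param, range_start, range_end,
                        range_step=1, base_inputs=None):
    base = dict(base_inputs or {})
    return [{**base, range_param: value}
            for value in range(range_start, range_end, range_step)]

jobs = expand_range_inputs("seed", 0, 6, range_step=2,
                           base_inputs={"message": "hi"})
print(jobs)
```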
Move CLI commands to `dirac job submit {cwl,cmd,jdl}` hierarchy:
- `dirac cwl submit` → `dirac job submit cwl`
- `dirac submit` → `dirac job submit cmd`
- `dirac jobs submit` (JDL) → `dirac job submit jdl`

refactor(cli): remove legacy dirac jobs command group

Move search utilities into job/search.py and remove jobs.py entirely.
dirac job search is now the only search entry point.
chore: regenerated gubbins client
@ryuwd ryuwd force-pushed the feat/cwl-job-submission branch from 850b0a5 to 32a33a2 on April 8, 2026 13:56
ryuwd and others added 22 commits April 8, 2026 18:45
The server and client CWLJobSubmission model already had range_param,
range_start, range_end, range_step fields. Remove the stale
NotImplementedError and call the API with the range spec.
The worker-side job wrapper used attribute-chain access
(DIRAC.Core.Security.DiracX.diracxTokenFromPEM) which fails because
Python does not auto-load submodules via attribute access. Use an
explicit from-import instead.
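The failure mode can be demonstrated with a throwaway package rather than DIRAC itself (package and attribute names below are invented for the demo):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package with one submodule on disk.
root = Path(tempfile.mkdtemp())
(root / "demo_pkg").mkdir()
(root / "demo_pkg" / "__init__.py").write_text("")
(root / "demo_pkg" / "sub.py").write_text("VALUE = 42\n")
sys.path.insert(0, str(root))

pkg = importlib.import_module("demo_pkg")
try:
    pkg.sub  # never imported: attribute access does NOT auto-load submodules
    print("attribute access worked")
except AttributeError:
    print("AttributeError: submodule not auto-loaded")

from demo_pkg.sub import VALUE  # explicit from-import loads the submodule
print(VALUE)
```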
Fix temp file leak from load_document_by_uri (delete=False without
cleanup). Also extract SB: PFN references from workflow params so
sandbox jobs have their files available on the worker.
The JobWrapper downloads sandboxes by walking CWL inputs via the
dirac:Job hint's input_sandbox source references, not the
JobInputModel.sandbox field. The extraction was dead code.
Shows scalar key=value pairs from the input dict in the job name,
e.g. "hello-world (seed=42)" so parametric jobs are distinguishable
in search results and monitoring.
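A minimal sketch of that naming scheme (the helper name is illustrative, and which types count as "scalar" is an assumption):

```python
# Append scalar key=value pairs from the inputs to the job name so
# parametric jobs are distinguishable; non-scalars (lists, dicts) are skipped.
def job_display_name(base: str, inputs: dict) -> str:
    scalars = {k: v for k, v in inputs.items()
               if isinstance(v, (str, int, float, bool))}
    if not scalars:
        return base
    params = ", ".join(f"{k}={v}" for k, v in scalars.items())
    return f"{base} ({params})"

print(job_display_name("hello-world", {"seed": 42, "files": ["a.txt"]}))
```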
DIRAC's wrapper script passes only the jobID (not a json-file),
so accept 1 or 2 args and always read the jobID from the last
argument.
Report meaningful ApplicationStatus using the CWL task label instead of
leaving it as "Unknown". Shows e.g. "parametric-echo completed" or
"parametric-echo failed (exit 1)" in job search results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace subprocess.run with Popen to read stderr line-by-line,
re-emit each line to the wrapper's stderr, and store it as
ApplicationStatus with immediate commit for real-time visibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Status

Read stdout in a background thread to prevent pipe deadlock when both
stdout and stderr use subprocess.PIPE. Add explicit application_status
to the failure path so the final status is not just the last cwltool line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace subprocess.Popen + threading with asyncio.create_subprocess_exec
for proper async cooperation. Stderr is streamed line-by-line without
blocking the event loop, stdout collected concurrently via asyncio task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
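The final asyncio pattern can be sketched as below, with a trivial child process standing in for dirac-cwl-run; the real wrapper reports each stderr line as it arrives rather than collecting them:

```python
import asyncio
import sys

async def run_streaming(*argv):
    """Stream child stderr line-by-line without blocking the event loop,
    while stdout is collected concurrently."""
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)

    async def drain_stderr():
        lines = []
        async for raw in proc.stderr:            # yields lines as they arrive
            lines.append(raw.decode().rstrip())  # real code reports each line
        return lines

    stderr_task = asyncio.create_task(drain_stderr())
    stdout = await proc.stdout.read()            # runs concurrently with the task
    stderr_lines = await stderr_task
    await proc.wait()
    return stdout.decode(), stderr_lines

# Stand-in child: one stdout line, one stderr line.
child = "import sys; print('out'); print('progress', file=sys.stderr)"
out, err = asyncio.run(run_streaming(sys.executable, "-c", child))
print(out.strip(), err)
```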
Add stdout/stderr capture fields to generate_cwl() so cwltool writes
the tool's output to stdout.log and stderr.log in the output sandbox,
preventing job output from being silently discarded.
Avoids naming collision with DIRAC's conventional std.out filename.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Batch ApplicationStatus commits to at most once per 2 seconds instead
of per stderr line, reducing HTTP traffic. All lines are still stored
via set_job_status — only the commit frequency changes. Commit failures
are logged as warnings instead of killing the job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
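The batching logic can be sketched like this; class and method names are invented, and a list append stands in for the HTTP commit:

```python
import time

class ThrottledReporter:
    """Record every status line, but flush to the server at most once
    per interval; a flush failure is logged, never fatal to the job."""

    def __init__(self, interval: float = 2.0):
        self.interval = interval
        self.pending: list[str] = []
        self.committed: list[str] = []
        self._last_commit = float("-inf")  # so the first report always flushes

    def report(self, line: str) -> None:
        self.pending.append(line)          # every line is stored
        now = time.monotonic()
        if now - self._last_commit >= self.interval:
            self._commit(now)

    def _commit(self, now: float) -> None:
        try:
            self.committed.extend(self.pending)  # stand-in for the HTTP call
            self.pending.clear()
            self._last_commit = now
        except Exception as exc:           # warn instead of killing the job
            print(f"warning: status commit failed: {exc}")

r = ThrottledReporter(interval=60.0)
for i in range(5):
    r.report(f"line {i}")
print(len(r.committed), len(r.pending))
```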
Only report step/job/workflow lifecycle lines (start, completed, failed,
skipped) as ApplicationStatus. Noise lines (validation, file staging,
timing) still go to stderr for Watchdog peek but are not sent to the
server. Log level prefixes (INFO/WARNING) are stripped — ApplicationStatus
shows clean cwltool output like "[job X] completed success".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Require INFO/WARNING/ERROR prefix before cwltool lifecycle patterns to
prevent false positives from user application output. Use group(1) to
extract the bracket-onward portion, stripping the level prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
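The filtering described in the last two commits can be sketched with one regex; the exact lifecycle patterns used in the PR are not shown here, so this expression is an approximation:

```python
import re

# Require a log-level prefix, match cwltool step/job/workflow lifecycle
# lines, and capture the bracket-onward portion (level prefix stripped).
LIFECYCLE = re.compile(
    r"^(?:INFO|WARNING|ERROR)\s+"
    r"(\[(?:step|job|workflow)[^\]]*\].*(?:start|completed|failed|skipped).*)$"
)

def application_status(line: str):
    m = LIFECYCLE.match(line)
    return m.group(1) if m else None  # None: keep on stderr, don't report

print(application_status("INFO [job echo] completed success"))
print(application_status("INFO Resolved inputs in 0.2s"))  # noise: not reported
print(application_status("[job echo] completed success"))  # no level prefix
```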
The server checks sandbox ownership using the PFN without the
#fragment. The fragment is only needed by the worker to extract
the correct file from the tar archive.
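The split can be sketched in one helper (the function name and the SB: reference format are illustrative):

```python
# Strip the #fragment before the server-side ownership check; keep it so
# the worker can extract the right file from the sandbox tar archive.
def split_sandbox_ref(ref: str) -> tuple[str, "str | None"]:
    pfn, sep, fragment = ref.partition("#")
    return pfn, fragment if sep else None

print(split_sandbox_ref("SB:SE|/bucket/sandbox.tar.bz2#inputs/params.yaml"))
print(split_sandbox_ref("SB:SE|/bucket/sandbox.tar.bz2"))
```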