The current implementation of Submission.run_submission() (link) introduces a clean parameter that defaults to True.
When enabled, this calls machine.context.clean() after job completion. For SSHContext, that removes the entire remote working directory (i.e., everything under the submission-specific remote root) via recursive deletion (rmtree):
(clean)
(rmtree)
When using the CLI entrypoint:
dpdisp submit submission.json
the clean parameter is neither exposed nor configurable, whether through the JSON schema or through command-line options:
(submit.py)
As a result, dpdisp submit always executes with clean=True, leading to implicit deletion of all remote job artifacts after completion.
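To make the effect concrete, here is a minimal local model of the behavior described above. It is a sketch under stated assumptions, not dpdispatcher's code: it does not import the library, `FakeContext`, `run_submission`, and `simulate` are hypothetical stand-ins, and a temporary local directory plays the role of the submission-specific remote root.

```python
import shutil
import tempfile
from pathlib import Path


class FakeContext:
    """Hypothetical stand-in for a remote context; NOT dpdispatcher's SSHContext."""

    def __init__(self, remote_root: Path) -> None:
        self.remote_root = remote_root

    def clean(self) -> None:
        # Models the recursive deletion of the submission-specific remote root.
        shutil.rmtree(self.remote_root, ignore_errors=True)


def run_submission(context: FakeContext, *, clean: bool = True) -> None:
    """Hypothetical model of a run: write one artifact, then optionally clean."""
    (context.remote_root / "job.log").write_text("finished\n")
    if clean:
        context.clean()


def simulate(clean: bool) -> bool:
    """Return True if the job artifact survives the run."""
    root = Path(tempfile.mkdtemp(prefix="remote_root_"))
    try:
        run_submission(FakeContext(root), clean=clean)
        return (root / "job.log").exists()
    finally:
        # Remove the temporary directory if the run left it behind.
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print("artifacts kept with clean=True: ", simulate(True))
    print("artifacts kept with clean=False:", simulate(False))
```

In this model the artifact survives only when clean=False is passed explicitly. Since the real parameter exists on Submission.run_submission(), calling that method from Python with clean=False is presumably the only way to keep remote artifacts today; the CLI offers no equivalent.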
Impact
This behavior has several serious implications:
- Loss of traceability and auditability
Users cannot inspect intermediate or full results on the remote HPC environment after job completion.
- Silent destructive default
The cleanup is destructive (equivalent to rm -rf on the remote root) and occurs without explicit user consent.
- Agent / automation incompatibility
For automated workflows (e.g., LLM/agent-based pipelines), this implicit side effect is particularly problematic because it is:
not inferable from the CLI interface,
not declared in configuration,
and not documented.
Documentation Gap
The following are currently not documented in Submission.run_submission():
- The existence of the clean parameter
- Its default value (True)
- Its destructive effect on remote directories
- The fact that CLI submission always enables it
This significantly increases the risk of unintended data loss.
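One possible way to close the gap on the CLI side is an explicit opt-out flag. The sketch below is only an illustration built on argparse: the `--no-clean` flag, the `dpdisp-sketch` program name, and `build_parser` are hypothetical and do not exist in dpdisp; the default is kept at True to mirror the current behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing how a clean opt-out could be exposed."""
    parser = argparse.ArgumentParser(prog="dpdisp-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)

    submit = subparsers.add_parser(
        "submit", help="run a submission described by a JSON file"
    )
    submit.add_argument("filename", help="path to the submission JSON file")
    # store_false makes args.clean default to True and flip to False
    # only when the user passes --no-clean explicitly.
    submit.add_argument(
        "--no-clean",
        dest="clean",
        action="store_false",
        help="keep the remote working directory after the job completes",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["submit", "submission.json", "--no-clean"])
    print(args.command, args.filename, args.clean)
```

The parsed value could then be forwarded as run_submission(clean=args.clean), which would make the destructive default visible in the CLI help text while keeping existing invocations unchanged.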