Defer dynamic destination evaluation across resubmits#22670
Merged
mvdbeek merged 6 commits intoMay 19, 2026
Conversation
Chained dynamic destinations (e.g. gateway_1x -> gateway_1_5x -> gateway_2x, or any TPV setup with `destination: tpv_dispatcher` on resubmit) only resubmitted once. The resubmit handler in `state_handlers/resubmit.py` was eagerly re-evaluating the chain via `set_cached_job_destination` and caching the resolved bottom. Subsequent failures read resubmit definitions from that bottom, losing any definitions set on intermediate dynamic hops. Defer evaluation instead. The resubmit handler now persists only the dynamic *intent* on the job (e.g. `destination_id="tpv_dispatcher"`, `job_runner_name="dynamic"`) and exits without touching the mapper cache. When `__handle_waiting_jobs` picks the RESUBMITTED job up, `__recover_job_wrapper(resubmit=True)` calls `cache_job_destination(...)` to route through `__determine_job_destination`, which walks the entire chain afresh on every attempt. Caching invariants are preserved within a single attempt: the mapper still caches the resolved destination on first evaluation and serves all subsequent reads from cache. Cross-attempt, the cache is naturally fresh because each pickup constructs a new `JobWrapper`/`JobRunnerMapper`. Adds an integration test exercising a 3-link dynamic chain (initial_destination -> secondary_destination -> tertiary_destination) to verify multiple resubmits re-walk the chain end-to-end. Refs: galaxyproject#7118, galaxyproject#15208. Adopts the approach proposed in galaxyproject#9747.
The deferred-evaluation change in the previous commit moved persistence
to the dynamic dispatcher (e.g. tpv_dispatcher with mostly-empty static
params), which had the side effect of overwriting `job.destination_params`
on resubmit. Dynamic rules that branch on prior-attempt context by
reading `job.destination_params` (e.g. TPV's pattern of escalating
memory via `job.destination_params.get("SCALING_FACTOR")`) lost that
context after the first resubmit and produced the same destination on
every attempt.
Carry the prior attempt's `destination_params` forward by merging them
under the resubmit destination's static params before persisting. The
static dispatcher's keys (`function`, `rules_module`, `type`, ...) win
on conflicts so chain re-walk can still find the right rule; everything
else from the prior attempt is preserved for the rule to read.
Five focused tests covering the behaviour introduced in the previous two commits: - The handler does NOT call set_cached_job_destination (a regression test for the deferred-evaluation invariant). - It persists the dynamic dispatcher's id/runner as the *intent* on the job, not the resolved bottom of the prior chain walk. - Prior attempt's destination_params are merged into the persisted destination so dynamic rules that branch on prior context (e.g. TPV's SCALING_FACTOR escalation) keep seeing them. - Static dispatcher params (function, rules_module, ...) take precedence on key conflicts so the chain re-walk picks the right rule even if the prior resolved destination accidentally shadowed them. - The __resubmit_delay_seconds flag is persisted alongside the merged params so is_ready_for_resubmission picks it up. Tests use SimpleNamespace and unittest.mock to stand in for the Job model, JobWrapper, and JobConfig — no DB or Galaxy app required.
The TestJobResubmissionDynamicMultipleIntegration test previously only asserted the chained dynamic destinations (initial -> secondary -> tertiary) walked end-to-end without crashing. Extend each rule in resubmission_rules/rules.py to also read job.destination_params for a `chain_attempt` counter set by the previous link, and raise JobMappingException if it does not match the expected value. This makes the same single integration test simultaneously verify both behaviours of this PR: 1. Chain re-walk: every resubmit re-evaluates from the persisted dynamic intent rather than reusing the cached resolved destination of the prior attempt. 2. destination_params carry-forward: the prior attempt's resolved destination_params survive the resubmit handler so the next rule in the chain can read them on pickup. A regression in either property surfaces as a JobMappingException inside the rule, failing the integration test rather than producing a silently-wrong result. Verified passing locally (47s) alongside the other dynamic resubmission integration tests: - TestJobResubmissionDynamicIntegration::test_dynamic_resubmission - TestJobResubmissionSmallMemoryResubmitsToLargeIntegration::test_dynamic_resubmission
jmchilton
reviewed
May 10, 2026
jmchilton
reviewed
May 10, 2026
Remove redundant unit tests Move import
mvdbeek
reviewed
May 12, 2026
| @@ -0,0 +1,41 @@ | |||
| <?xml version="1.0"?> | |||
Member
There was a problem hiding this comment.
Bit of a nitpick but i like to pick up things from integration tests when i actually set up a server config. Could you write this in yaml to ease that ?
mvdbeek
approved these changes
May 12, 2026
Member
mvdbeek
left a comment
There was a problem hiding this comment.
This is awesome, thanks for working that out!
Address PR review feedback (galaxyproject#22670): integration test job configs are sometimes lifted into real server configs, so YAML is preferred over XML for readability. Replaces resubmission_dynamic_multiple_job_conf.xml with an equivalent .yml fixture and updates the test reference. No behavioural change; the TestJobResubmissionDynamicMultipleIntegration test still passes (verified locally, 22.7s).
Member
Author
|
Companion PR in TPV: galaxyproject/total-perspective-vortex#195 was tested with these changes, and the tests pass. |
Member
|
Thank you @nuwang ! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Chained dynamic destinations (e.g. gateway_1x -> gateway_1_5x -> gateway_2x, or any TPV setup with
destination: tpv_dispatcheron resubmit) only resubmitted once. The resubmit handler instate_handlers/resubmit.pywas eagerly re-evaluating the chain viaset_cached_job_destinationand caching the resolved bottom. Subsequent failures read resubmit definitions from that bottom, losing any definitions set on intermediate dynamic hops.Defer evaluation instead. The resubmit handler now persists only the dynamic intent on the job (e.g.
destination_id="tpv_dispatcher",job_runner_name="dynamic") and exits without touching the mapper cache. When__handle_waiting_jobspicks the RESUBMITTED job up,__recover_job_wrapper(resubmit=True)callscache_job_destination(...)to route through__determine_job_destination, which walks the entire chain afresh on every attempt.Caching invariants are preserved within a single attempt: the mapper still caches the resolved destination on first evaluation and serves all subsequent reads from cache. Cross-attempt, the cache is naturally fresh because each pickup constructs a new
JobWrapper/JobRunnerMapper.Adds an integration test exercising a 3-link dynamic chain (initial_destination -> secondary_destination -> tertiary_destination) to verify multiple resubmits re-walk the chain end-to-end.
Refs: #7118, #15208. Adopts the approach proposed in #9747.
How to test the changes?
(Select all options that apply)
License