Eliminate get_committed_snapshots from join path.#7832
Eliminate get_committed_snapshots from join path.#7832cjen1-msft wants to merge 18 commits intomicrosoft:mainfrom
Conversation
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the Python test infrastructure and multiple tests to stop implicitly calling get_committed_snapshots() as part of the default node join flow, instead making snapshot usage explicit via from_snapshot, snapshots_dir, and fetch_recent_snapshot parameters.
Changes:
- Update many test call sites to pass
from_snapshot=Falseexplicitly when joining nodes, and passsnapshots_dirwhen joining from a copied snapshot. - Refactor some tests (notably
tests/reconfiguration.py) to separate “copy snapshot locally” vs “fetch snapshot on startup” behaviors. - Adjust
tests/infra/network.pysnapshot-join logic and makeget_committed_snapshots()acceptnode=None(defaulting to the primary).
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/infra/network.py | Changes snapshot-join handling and get_committed_snapshots() defaulting behavior. |
| tests/reconfiguration.py | Makes snapshot usage explicit and refactors test_add_node parameters. |
| tests/recovery.py | Updates join calls to pass from_snapshot explicitly (and snapshots_dir where needed). |
| tests/lts_compatibility.py | Branches join behavior by version, using copied snapshots for older versions. |
| tests/e2e_operations.py | Updates several join paths and snapshot-related tests to be explicit about snapshot usage. |
| tests/redirects.py | Updates joins to pass from_snapshot=False. |
| tests/partitions_test.py | Updates join to be explicit about snapshot fetching behavior. |
| tests/limits.py | Updates join to pass from_snapshot=False. |
| tests/governance.py | Updates joins to pass from_snapshot=False. |
| tests/code_update.py | Updates multiple joins to pass from_snapshot=False. |
| tests/e2e_logging.py | Updates join to pass from_snapshot=False. |
| tests/e2e_common_endpoints.py | Updates joins to pass from_snapshot=False. |
| tests/js-custom-authorization/custom_authorization.py | Updates join to pass from_snapshot=False. |
| tests/infra/basicperf.py | Ensures snapshot-join setup explicitly sets from_snapshot=True. |
Custom instructions used:
.github/copilot-instructions.md.github/instructions/reviewing.instructions.md
|
@cjen1-msft it feels like turning |
|
@achamayou I don't quite agree. If we propagated that all the way to the config, then yes. |
| new_node, | ||
| args.package, | ||
| args, | ||
| # Tmp fix until the primary expresses an opinion |
There was a problem hiding this comment.
I am confused about why the primary does not express an opinion here. Do we not have backup download enabled?
There was a problem hiding this comment.
This is the 'we need the primary to have an opinion over what snapshot it likes that is more precise than startup_seqno.
The reason being that if startup_seqno is 0, as it is here, then we replicate the whole ledger via Raft.
There was a problem hiding this comment.
This is #7844, right? I suggest we mention that inline in the comment here ("Tmp" is just a way to avoid our "TODO" check, the way we actually prevent this shim living forever is "link to the issue/PR that fixes it), and probably aim to merge that before this, so this shim never hits main.
There was a problem hiding this comment.
@eddyashton Yep I'm going to delay merging this for a little bit until we merge #7844 and then this can be clean.
| network.join_node( | ||
| new_node, args.package, args, from_snapshot=from_snapshot | ||
| ) | ||
| if fetch_recent_snapshot: |
There was a problem hiding this comment.
You can make a kwargs dict, and pass it with **kwargs to avoid this type of duplication.
Currently most of our code uses get_committed_snapshots implicitly via the _add_node or join_node calls.
This PR has two changes:
This allows us to fully remove get_committed_snapshots from the startup path, pushing that out to the caller via snapshots_dir, this then paves the way for turning get_committed_snapshots into 'copy_snapshots_to_node_dir', bringing our testing closer to ACL's runtime.
I've left from_snapshot in in this PR to make it reviewable.
I'll submit another PR after this to remove it everywhere and instead just use snapshots_dir.