Skip to content

Fix non-contiguous ledger after join from snapshot with copied in-progress chunk (#7871)#7872

Closed
Copilot wants to merge 6 commits into
mainfrom
copilot/build-hypotheses-and-test
Closed

Fix non-contiguous ledger after join from snapshot with copied in-progress chunk (#7871)#7872
Copilot wants to merge 6 commits into
mainfrom
copilot/build-hypotheses-and-test

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 9, 2026

Fix for #7871 (reconfiguration.test_ledger_invariants failure: non-contiguous chunks after join from snapshot when target had an in-progress chunk).

  • In Ledger::init(idx, ...), promote any recovered writable file whose entries are entirely ≤ idx to .committed before clearing the in-memory list, so the on-disk ledger is contiguous up to the snapshot seqno.
  • Add unit test in src/host/test/ledger.cpp reproducing the scenario (writable file present at init).
  • Add CHANGELOG entry under [7.0.3]Fixed (with PR number reference).
  • Build & run unit tests (ledger_test 17/17 pass, files_cleanup_test 38/38 pass).
  • Apply scripts/check-format.sh (clang-format-18) to changed C++ files.

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/08d9f13b-0ab6-41e3-b8d0-b46dc31f2fd9

Co-authored-by: achamayou <4016369+achamayou@users.noreply.github.com>
@achamayou
Copy link
Copy Markdown
Member

@copilot fix the code formatting and try again

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/d609704c-2eb9-497d-a2fb-1324583fe76a

Co-authored-by: achamayou <4016369+achamayou@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 9, 2026

@copilot fix the code formatting and try again

Ran scripts/check-format.sh -f over the changed C++ files; only one line in src/host/test/ledger.cpp needed re-wrapping. Fixed in 0639d1d.

Comment thread CHANGELOG.md Outdated

### Fixed

- When a joiner copied an in-progress (uncommitted) ledger chunk from its target and then resumed from a snapshot whose seqno covered the chunk's last entry, that chunk used to remain forever as a writable file on disk, leaving an apparent gap between the joiner's read-only `.committed` chunks and any new chunks it produced. `Ledger::init` now promotes such writable files to `.committed` so that the on-disk ledger remains contiguous up to the snapshot seqno (#7871, #7872).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot only leave PR number here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e33d85b — entry now references only #7872.

Agent-Logs-Url: https://github.com/microsoft/CCF/sessions/229d441f-e1ea-4c5e-b03b-fe0b5548f221

Co-authored-by: maxtropets <16566519+maxtropets@users.noreply.github.com>
Copilot AI requested a review from maxtropets May 11, 2026 10:14
@cjen1-msft
Copy link
Copy Markdown
Contributor

@achamayou my recollection is that we require that every snapshot's seqno corresponds to the end of a ledger chunk.
Is this issue that we are somehow starting from a snapshot that is in the middle of a chunk?

@achamayou
Copy link
Copy Markdown
Member

@cjen1-msft I don't think it's in the middle of a chunk, but I think that the ledger files are copied when the last chunk before the snapshot is not yet closed. If the snapshot is also copied, this may be unlikely/impossible (but I am not sure that's the case, because the snapshot and ledger commit paths are somewhat independent), but if the snapshot is downloaded it's quite easy for that to happen obviously.

  1. Copy ledger_1-10.committed, ledger_10 from Node Foo
  2. Wait some
  3. Fetch latest snapshot from Primary Bar, get snapshot_20_22.committed
  4. !!!

@achamayou
Copy link
Copy Markdown
Member

Replaced by #7878

@achamayou achamayou closed this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-long-test Run Long Test job

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants