Skip to content

[SDKv2] resumable downloads (cross-process lock + per-chunk state + linked cancellation)#793

Draft
bmehta001 wants to merge 19 commits into
mainfrom
bhamehta/flcore/resumable-downloads
Draft

[SDKv2] resumable downloads (cross-process lock + per-chunk state + linked cancellation)#793
bmehta001 wants to merge 19 commits into
mainfrom
bhamehta/flcore/resumable-downloads

Conversation

@bmehta001

Copy link
Copy Markdown
Contributor

Resumable downloads (C++ port)

Ports the resumable-download flow from neutron-server's AzureExtensions.cs / BlobDownloadState.cs / CrossProcessFileLock.cs to the C++ SDK in sdk_v2/cpp. Two commits, one per increment.

Increment 1 — cross-process lock + skip-existing (commit 45c99d5d)

  • CrossProcessFileLock: RAII, <dir>/.download.lock exclusive lock. Win: CreateFileW FILE_SHARE_NONE FILE_FLAG_DELETE_ON_CLOSE; POSIX: flock LOCK_EX|LOCK_NB. PIMPL (struct State defined only in the .cc) keeps platform headers out of the public include.
  • WaitForLockForDirectory: polls every 1.25 s with a 100 ms cancellation slice and a 3 h ceiling, driven by a std::function<bool()> predicate so cancellation can also serve as a heartbeat (the progress callback returning non-zero).
  • DownloadManager::DownloadModel: takes the lock right after create_directories, re-checks the cache once acquired, then proceeds with the existing download flow. Stores ILogger& so the lock and resume layers can log uniformly.
  • DownloadBlobsToDirectory: blobs already present at the expected size are skipped; their bytes are credited to the initial progress callback so the percentage stays accurate on resume. Emits 100% immediately if everything is cached.
  • 9 CrossProcessFileLock tests + 4 new skip-existing tests in download_test.cc.

Increment 2 — per-chunk resume + linked cancellation (commit b6d98db5)

  • BlobDownloadState (new): compact binary <file>.dlstate sidecar. ~45-byte LE header (magic FLDS + version + sizes + counters) followed by the truncated bitmap suffix. The prefix of fully-completed chunks is implicit — SaveState advances bitmap_byte_aligned_start past every all-1s word so the file stays proportional to the unfinished tail. LoadState rejects on magic, version, or layout mismatch and restarts the download in that case. Atomic save via tmp + rename, with remove-then-rename fallback.
  • AzureBlobDownloader::DownloadBlob rework: protected virtuals GetBlobSize and DownloadChunkToBuffer (against an opaque ChunkContext) form a test seam. Worker pool uses an atomic queue index over the pending-chunks list; workers claim, fetch, write to file (under a single file_mutex), mark complete, and periodically SaveState (every max(10, num_chunks/50) chunks). Pre-allocation is skipped if the file already matches blob size, so resume doesn't discard valid bytes.
  • Linked cancellation: a single shared Azure::Core::Context + a single std::atomic<bool> internal_cancel per call. The first chunk failure (or external cancel signal) flips the flag and calls ctx.Cancel(). Other in-flight chunks see either signal and exit fast — a chunk failing while 9 others are mid-flight drains in ~300 ms in the new ChunkFailureCancelsInFlightPeersFast test.
  • IsDownloadNeeded now also returns true if a .dlstate sidecar exists (the data file may be pre-allocated with holes).
  • Sidecar is persisted on any failure and deleted on full success.

Open decisions (resolved)

Question Resolution
BlobDownloadState design Port the C# truncated-bitmap shape directly.
Sidecar filename .dlstate (cross-language compat with C# .download.state was explicitly out of scope).
OnDownloadComplete telemetry Deferred per spec; C++ already signals via 100% progress callback.
Mockability Both: IBlobDownloader mock for orchestrator tests + AzureBlobDownloader subclass via protected virtuals for chunk-level injection.

Tests

  • 9 new CrossProcessFileLockTest cases (Inc 1)
  • 4 new skip-existing cases in download_test.cc (Inc 1)
  • 15 new BlobDownloadStateTest cases (Inc 2): create/mark/save/load/delete, gap enumeration, partial final chunk math, byte-aligned-start advancement, rejection of magic / size / total_chunks mismatches.
  • 5 new AzureBlobDownloaderResumeTest cases (Inc 2): resume from sidecar, fresh download, sidecar persists on chunk failure, stale sidecar cleaned up for empty blobs, cancel-cascade timing.

Targeted BlobDownloader|DownloadManager|CrossProcessFileLock|AzureBlobDownloader suite: 60/60 in 13 s. Full gtest run is 789 passed / 54 skipped; the 3 pre-existing ModelLoadManagerUnloadTest failures are environmental (missing local test model file qwen2.5-0.5b-instruct-generic-cpu-4), not caused by this PR.

Out of scope

Region-based download (Increment 3 in the spec) — tracked separately.

Notes for reviewers

  • C ABI is unchanged; this PR is purely internal to sdk_v2/cpp.
  • Reference implementation in C# lives in neutron-server/src/Downloader/ (private repo); see Increment 2's bitmap-truncation logic in BlobDownloadState.cs:237-281 for the closest analog.

@vercel

vercel Bot commented Jun 9, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
foundry-local Ready Ready Preview, Comment Jun 18, 2026 10:57pm

Request Review

bmehta001 added a commit that referenced this pull request Jun 10, 2026
Two unrelated -Werror diagnostics from Clang (and modern GCC) were tripping
the Linux x64 and macOS ARM64 jobs on PR #793; Windows + MSVC silently
accepted them.

1. blob_download_state.cc: 'kHeaderSize' was a namespace-scope constexpr that
   nothing referenced (the header layout is materialized by the WriteLE call
   sequence, not this constant). Triggers -Wunused-const-variable on Clang.
   Delete it; the layout comment above already documents the 45-byte size.

2. download_test.cc: ChunkFailureCancelsInFlightPeersFast captured 'kFailOffset'
   in a lambda, but it's a constexpr int64_t used only in a constant expression
   so the capture is redundant and -Wunused-lambda-capture flags it. Replace
   [kFailOffset] with [] to match the sister test's pattern.

Also fix a latent issue surfaced during review:

3. MutexFstreamFileWriter::WriteAt now calls file_.clear() before seekp() so
   a prior failure doesn't permanently poison the stream and cause subsequent
   workers to surface a spurious 'write failed' instead of the original error.
   Positional writers are unaffected (pwrite/WriteFile are stateless).

71 tests still pass on Windows (35 in the affected suites verified explicitly).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 changed the title sdk_v2/cpp: resumable downloads (cross-process lock + per-chunk state + linked cancellation) [SDKv2] resumable downloads (cross-process lock + per-chunk state + linked cancellation) Jun 10, 2026
bmehta001 and others added 13 commits June 18, 2026 01:19
Increment 1 of the resumable-downloads port (see docs/ResumableDownloadsPlan.md).
No public C ABI changes.

CrossProcessFileLock
- New RAII helper backed by an OS-level exclusive lock on <dir>/.download.lock:
  Windows uses CreateFileW with FILE_SHARE_NONE + FILE_FLAG_DELETE_ON_CLOSE; POSIX
  uses open(O_CREAT|O_RDWR|O_CLOEXEC) + flock(LOCK_EX|LOCK_NB).
- Writes a PID:<pid>,Time:<iso8601> diagnostic line for crash forensics.
- WaitForLockForDirectory polls at 1.25 s with a 3 h timeout. The cancellation
  hook is a std::function<bool()> predicate (not a bare atomic) so callers can
  route it through their own cancellation channel — DownloadManager forwards it
  through the existing progress callback's non-zero return.

DownloadManager::DownloadModel
- Acquires the cross-process lock immediately after create_directories and
  before writing the in-progress signal file.
- Re-checks the cache after acquiring the lock to short-circuit when another
  process just finished the same download.
- Now stores ILogger& logger_ so the lock acquisition can log who is waiting.

DownloadBlobsToDirectory (skip-existing)
- New IsDownloadNeeded(blob, local_path) filter: blobs whose local file
  already exists at the expected content_length are skipped.
- Skipped bytes are credited toward the total — the initial progress callback
  now emits skipped_bytes / total_size * 100 instead of always 0%, so resumed
  downloads start at an honest percentage rather than rewinding to zero.
- If every blob is already on disk the function emits 100% and returns.

Tests
- 9 new CrossProcessFileLockTest cases (acquire, release, contention, recovery
  after release, directory creation, wait happy path, wait-then-acquire,
  cancellation, timeout).
- 4 new BlobDownloadTest cases for skip-existing (same-size, wrong-size,
  progress accounting, everything-cached).
- Full targeted suite passes 40/40 in 14 s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Increment 2 of the resumable-downloads C++ port. Adds a <file>.dlstate
sidecar that tracks per-chunk completion via a truncated bitmap (matching
the C# BlobDownloadState design from neutron-server), and replaces
AzureBlobDownloader::DownloadBlob's batch loop with a worker pool that
shares a single Azure::Core::Context. The first chunk failure calls
Cancel() on the shared context and flips an internal cancel flag, so every
other in-flight chunk drains within tens of milliseconds instead of
waiting on its own retry+timeout budget.

Highlights:

- BlobDownloadState (new): compact binary on-disk format. ~45-byte LE
  header (magic FLDS + version + sizes + counters) followed by the
  truncated bitmap suffix. The prefix of fully-completed chunks is
  implied — SaveState advances �itmap_byte_aligned_start past every
  fully-set word to keep the sidecar proportional to the unfinished tail.
  LoadState rejects on magic, version, or layout (blob_size / chunk_size /
  total_chunks) mismatch and starts the download fresh in that case.
  Atomic save via tmp file + rename, with remove-then-rename fallback for
  filesystems that don't replace atomically.

- AzureBlobDownloader rework: protected virtual GetBlobSize and
  DownloadChunkToBuffer (against an opaque ChunkContext) form a test
  seam so subclasses can simulate per-chunk behavior without touching
  Azure. Worker pool uses an atomic queue index over pending chunks;
  workers claim, fetch, write, mark complete, and periodically save
  (max(10, num_chunks/50) chunks). Pre-allocation is skipped if the file
  is already at full size, so resume doesn't discard valid bytes.
  Sidecar is persisted on any failure and deleted on full success.

- IsDownloadNeeded now treats the presence of a .dlstate sidecar as
  "download still needed" — the data file may be pre-allocated with holes.

- AzureBlobDownloader picks up an optional ILogger*; DownloadManager
  passes its own logger through.

Tests:

- 15 BlobDownloadStateTest cases (create/mark/save/load/delete, gap
  enumeration, partial final chunk math, byte-aligned-start advancement,
  rejection of magic/size mismatches).
- 5 AzureBlobDownloaderResumeTest cases via a FakeChunkAzureDownloader
  subclass: resume skips already-completed chunks, sidecar persists on
  chunk failure, stale sidecar is cleaned up for empty blobs, and a
  failing chunk drains 9 sleeping peers within ~300 ms (well under the
  2 s threshold) — exercising the linked-cancellation cascade end to end.

20 new tests; full BlobDownloader/DownloadManager/CrossProcessFileLock
suite is 59/59 in ~15 s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ker memory at 64 KB)

Eliminate the 2 MB-per-chunk std::vector<uint8_t> allocation in
AzureBlobDownloader::DownloadBlob by streaming each chunk through a
sink callback that forwards 64 KB pieces straight to a thread-safe file
writer. Peak download memory drops from concurrency * chunk_size
(128 MB at 64-way concurrency on 2 MB chunks) to roughly
concurrency * 64 KB (~4 MB) regardless of chunk size, matching the
.NET Stream.CopyTo semantics we ported from instead of doubling memory
with a buffer copy.

The file write strategy is now selected via a new `FileWriterKind`
constructor argument on `AzureBlobDownloader` and backed by a small
`IFileWriter` abstraction with two implementations:

- `Positional` (default, recommended): lock-free positional writes
  using `pwrite` on POSIX and `WriteFile` + `OVERLAPPED.Offset` on
  Windows. No user-space mutex; the kernel orders disjoint-range writes.
- `MutexFstream` (comparison / portable fallback): single shared
  `std::fstream` guarded by an internal mutex. The chunk-write fast
  path that used to open a fresh fstream and seek under a mutex per
  chunk is now subsumed by this writer; the file handle is opened once
  and reused for every WriteAt.

Both writers handle `Open` correctly for the resume path: an existing
file at exactly the expected size is preserved (so already-downloaded
bytes survive across the writer swap), and any other state triggers
pre-allocation to the expected size.

The orchestrator's worker pool, atomic cancel cascade, sidecar save
cadence, and progress reporting are unchanged. The renamed protected
virtual `DownloadChunkStreaming` (replacing
`DownloadChunkToBuffer`) is the new test seam; both production code
and the existing `FakeChunkAzureDownloader` test double now use the
sink callback to deliver chunk bytes.

Tests:
- New `WriterImpls/FileWriterTest` runs 5 correctness checks against
  both writer implementations (10 tests total): open semantics for
  fresh / existing-at-size / existing-at-different-size files, single
  thread WriteAt, and 8 threads writing 256 KB regions to disjoint
  offsets validated byte-for-byte after close.
- New `FileWriterPerfComparison.PositionalVsMutexFstream` runs the
  realistic AzureBlobDownloader workload (32 workers, 8 chunks/worker,
  2 MB chunks, 64 KB sink pieces, 512 MB total) against both writers
  and prints wall-clock + MB/s for comparison. Measured locally on
  NVMe NTFS: Positional averages ~590 MB/s, MutexFstream ~545 MB/s
  (Positional ~8 percent faster on average; both well above 500 MB/s).
- All existing 60 BlobDownload* + AzureBlobDownloader* tests still
  pass without modification beyond the chunk_hook signature update.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two unrelated -Werror diagnostics from Clang (and modern GCC) were tripping
the Linux x64 and macOS ARM64 jobs on PR #793; Windows + MSVC silently
accepted them.

1. blob_download_state.cc: 'kHeaderSize' was a namespace-scope constexpr that
   nothing referenced (the header layout is materialized by the WriteLE call
   sequence, not this constant). Triggers -Wunused-const-variable on Clang.
   Delete it; the layout comment above already documents the 45-byte size.

2. download_test.cc: ChunkFailureCancelsInFlightPeersFast captured 'kFailOffset'
   in a lambda, but it's a constexpr int64_t used only in a constant expression
   so the capture is redundant and -Wunused-lambda-capture flags it. Replace
   [kFailOffset] with [] to match the sister test's pattern.

Also fix a latent issue surfaced during review:

3. MutexFstreamFileWriter::WriteAt now calls file_.clear() before seekp() so
   a prior failure doesn't permanently poison the stream and cause subsequent
   workers to surface a spurious 'write failed' instead of the original error.
   Positional writers are unaffected (pwrite/WriteFile are stateless).

71 tests still pass on Windows (35 in the affected suites verified explicitly).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two correctness gaps surfaced during PR review of the new resumable-download
machinery; both bias toward silently destroying intact on-disk progress when a
transient filesystem error happens.

EnsureFileExistsAtSize (file_writer.cc): the previous implementation treated
any std::filesystem::file_size() error as 'file does not exist' and fell
through to opening an std::ofstream — which has implicit ios::trunc — over
the path. A permission glitch, NFS stat hiccup, or virus-scanner-induced EBUSY
on a file that *did* exist at the right size would wipe the partial download
and force a restart from chunk 0. Now: only the no_such_file_or_directory
case proceeds to (re)create; any other stat error throws so the resume bitmap
on disk is preserved and the caller can retry.

SaveState (blob_download_state.cc): the rename-failed fallback used to do
remove(state_path) + rename(tmp_path, state_path). If the second rename
also failed (sharing violation, EXDEV, etc.) we had already deleted the
old sidecar — leaving nothing on disk and forcing a from-scratch restart
on the next run. std::filesystem::rename atomically replaces on every
platform we target (POSIX rename(2); Windows MoveFileExW REPLACE_EXISTING),
so that fallback was both unnecessary and destructive. Now: on rename
failure, just remove the tmp file and log a warning; the previous
state_path is left intact and the next SaveState call retries with the
up-to-date in-memory bitmap.

All 61 download-related tests still pass on Windows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eases

The unlink-on-release behavior (POSIX explicit unlink, Windows
FILE_FLAG_DELETE_ON_CLOSE) mirrored what the C# reference does but inherited
the same theoretical race: between unlink and close on POSIX, another
acquirer can O_CREAT a fresh inode at the path and flock that, leaving two
processes briefly believing they hold the lock on different inodes. In our
download protocol the race is benign because every acquirer immediately
re-checks 'is the model already downloaded' under the new lock and returns a
no-op — but the cleaner answer is to never open the window in the first place.

Persist the .download.lock file across acquisitions:
  - POSIX State: drop the 'path' field; destructor just close()s.
  - Windows: drop FILE_FLAG_DELETE_ON_CLOSE; OPEN_ALWAYS opens the existing
    inode on re-acquire, and dwShareMode=0 still enforces exclusivity.

Re-acquirers reopen the same inode — there is no path-to-inode race window
anywhere in the lifecycle. The file is a few bytes of debug payload and
lives alongside the model artifacts; no user-visible impact.

Test update: ReleaseOnDestructionRemovesLockFile is replaced by
ReleaseLeavesLockFileForReuse, which asserts both that the file persists and
that a fresh TryAcquireForDirectory against the same directory succeeds.

All 61 download-related tests pass on Windows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The IFileWriter interface + FileWriterKind strategy enum + MutexFstream second
implementation + runtime selection were speculative generality: the only real
variation (Windows vs POSIX) is already a compile-time #ifdef, and nothing in
production ever selected MutexFstream (download_manager always used the default
Positional; MutexFstream existed only for a unit-test parameterization and a
thresholdless perf benchmark).

Replace it with a single concrete FileWriter (Open/WriteAt/Close), backed by
pwrite (POSIX) / WriteFile+OVERLAPPED (Windows) via #ifdef. The Windows HANDLE
is stored as void* so <windows.h> stays out of the header. AzureBlobDownloader
now stack-allocates the writer; the FileWriterKind enum, constructor parameter,
and selection branch are gone.

Tests: drop the MutexFstream parameterization (TEST_P -> TEST) and the
PositionalVsMutexFstream benchmark; the same correctness assertions now run once
against FileWriter. The Azure test seam (GetBlobSize/DownloadChunkStreaming) and
real-temp-file writes are unchanged.

Verified (RelWithDebInfo): build clean; FileWriterTest (5), DownloadManagerTest
(17), CrossProcessFileLockTest (9), BlobDownloadStateTest (15) all pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address the download-lock review items:

- Per-model in-process lock. Replace the single global download_mutex_ with a
  per-model mutex keyed on the resolved cache path. Two downloads of the same
  model serialize; downloads of different models run concurrently in-process
  instead of queuing behind each other's (up to 3 h) cross-process waits.

- Close the POSIX flock()+unlink() orphan-inode race. After flock() succeeds,
  verify (fstat vs stat) that the inode we locked is still the file at the lock
  path; if a racing releaser unlinked it and a third process recreated it, drop
  the stale lock and report contention so the caller retries. This makes the
  self-cleaning unlink-on-release provably safe and guarantees two processes can
  never both believe they hold the lock - so a model can never be downloaded to
  the same directory twice at once, across any number of processes or apps.

- Fix the misleading Windows ACCESS_DENIED comment: it is the DELETE_ON_CLOSE
  delete-pending window (STATUS_DELETE_PENDING), not "narrower access rights".

- Document why the POSIX unlink-before-close is safe (fresh-fd non-blocking
  waiters; no work between unlink and close; the inode check above).

- Decouple the lock-wait cadence from the progress heartbeat: poll the
  cancellation/heartbeat callback once per poll_interval instead of every
  100 ms, so a user callback is not invoked ~10x/s for the whole wait.

- Tests: add ConcurrentDownloadsOfDifferentModelsRunConcurrently (proves
  different models do not serialize). Existing same-model serialize and
  CrossProcessFileLock acquire/release/wait/cancel/timeout tests still pass
  (66 download-suite tests green; POSIX branch syntax-checked under g++).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…actor

Rebasing onto main surfaced API changes that auto-merge could not reconcile:
- blob_downloader.cc: drop the local EndsWith helper; main supplies the shared
  EndsWithIgnoreCase (used for the inference_model.json filter), so the local
  copy was unreferenced (/WX dead-code break).
- download_test.cc: main moved ModelRegistryClient's test HTTP injection from a
  SetHttpGet setter (returning a JSON string) to constructor injection of an
  HttpGetResponseFn returning http::HttpResponse; update the one remaining
  branch-added test to the new pattern (MakeRegistryResponse).
- download_manager.cc: refresh a stale comment that referenced the removed
  global download_mutex_ (now the per-model lock).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…boundary

bitmap_byte_aligned_start marks the chunk index below which every chunk is
implicitly complete. SaveState advanced it by accumulating +64 per fully-set
word onto the (possibly unaligned) previous start, then adding the trailing-zero
offset measured from the word base. When the previous start was not a multiple
of 64, the two bases disagreed and the new start overshot by (start % 64),
marking never-downloaded chunks complete on reload — silent, permanent data
corruption once the sidecar is deleted on completion.

Derive the new start from the word index directly (word_idx * 64 + trailing
zero), independent of the previous start. Add a regression test that saves from
a non-word-aligned prefix, extends it across a word boundary, saves again, and
asserts the never-downloaded chunks remain pending after reload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s resumable

Found via end-to-end SDK testing: interrupting a download (process kill / crash)
left the model "cached" but full of zeros.

Root cause: AzureBlobDownloader pre-allocates each data file to its full size
(FileWriter::Open) before downloading chunks, but the .dlstate sidecar was only
written at the first periodic save (after save_interval ~= 10 chunks, ~20 MB).
IsDownloadNeeded treats "data file at full content_length + no sidecar" as a
completed download and skips it. So a crash in the window between pre-allocation
and the first periodic save left a full-size, mostly-empty file with no sidecar;
the next run skipped it and wrote inference_model.json, marking the model
complete while the weights were all zeros — silent, permanent corruption.

Fix: persist the sidecar immediately after CreateNew, before Open() pre-allocates
the file, upholding the invariant IsDownloadNeeded relies on ("pre-allocated but
unfinished <=> sidecar present"). A subsequent run then sees the sidecar and
resumes the missing chunks instead of skipping the file.

Regression test (AzureBlobDownloaderResumeTest.SidecarExistsBeforeFirstChunkCompletes)
asserts the sidecar is on disk the moment the first chunk is requested.

Verified E2E through the Python SDK: download qwen2.5-0.5b, kill at ~18%, resume
to completion — the result is now byte-for-byte (SHA-256) identical to a fresh
uninterrupted download, including the 862 MB model.onnx.data. Previously that
file was 100% zeros.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The per-chunk progress path (per_chunk_progress -> options.progress) could be
entered concurrently by up to max_concurrency (default 64) chunk worker threads,
but the public download progress API does not require the caller's callback to
be thread-safe. A typical callback that updates a counter, UI handle, or logger
would data-race. Guard the user callback invocation with a mutex so it is never
re-entered concurrently; the atomics that compute the percentage are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The sidecar was saved only every save_interval = max(10, num_chunks/50) chunks.
On a slow connection that interval can span minutes, so a hard crash could lose
minutes of completed download (all re-fetched on resume). Add a time cap
(AzureBlobDownloader::save_state_interval_, default 3s): flush when the chunk
count OR the elapsed wall-clock since the last save is reached, whichever first.

This does not slow downloads: the check runs only at chunk completion (so it
never flushes more often than chunks arrive), the sidecar write is tiny and not
fsync'd, and it happens off the download critical path (network I/O and the file
write take no state lock). On fast links the chunk count is still hit first, so
save cadence is unchanged; on slow links the tiny extra writes land in the
network-idle gaps and bound crash loss to seconds.

The interval is an injectable member so tests can force the time path;
TimeBasedSaveFlushesBeforeChunkInterval drives a 5-chunk blob (below the 10-chunk
count interval) with a zero cap and asserts the sidecar is flushed mid-download.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename WriteLE/ReadLE to WriteNative/ReadNative — the helpers serialize
scalars in host byte order (little-endian on every supported target),
not via explicit little-endian conversion. Update the on-disk format
comment to match.

Fix two stale BlobDownloadState doc comments: full_completion_bitmap is
pre-sized for all chunks by CreateNew and indexed by absolute chunk_idx
(not lazily grown), and bitmap_byte_aligned_start only trims the
serialized sidecar prefix, not the in-memory buffer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Save the resume sidecar every ~16 MB of completed chunks instead of every 2%
of the blob, so a hard crash re-downloads at most a fixed amount regardless of
blob size. This makes the wall-clock save trigger redundant, so remove it
(save_state_interval_, the time-based branch, the SetSaveStateInterval hook,
and the TimeBasedSave test).

Drop ChunkContext::cancel_flag and GetCancelFlag(): production cancels in-flight
chunks via Azure::Core::Context::Cancel(), and the real DownloadChunkStreaming
never read the flag — it existed only for the test to observe cancellation. The
cancel test now polls the genuine signal through
IsCancellationRequested() -> azure_ctx->IsCancelled(), exercising the
production path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fork a child that holds the directory lock and verify this process is locked
out while the child holds it, and that the kernel releases the flock when the
child exits — even though the child leaves the lock file on disk, mirroring a
downloader that crashed mid-download. Closes the gap where lock coverage was
in-process only.

POSIX-gated (macOS/Linux); Windows share-none contention is already covered
in-process by SecondAcquireReturnsNullWhileFirstIsHeld.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant