Commit a2feffe

ci(rust-test): add mold linker to the coverage job (parity with test) + TD-CI-COVERAGE-MOLD-1
Diagnosis (grounded, not inferred): the test-with-coverage job intermittently failed (2/50 recent runs) while the plain test job stayed green on the SAME test command. Root cause is NOT the SoA-singleton migration and NOT a logic bug -- a logic bug would fail the plain test job too. The cause is a CI asymmetry: the `test` job sets up the mold linker (with a comment that the heavy lance+datafusion binaries OOM the default GNU ld at link), but the `test-with-coverage` job did not -- and it links even LARGER llvm-cov instrumented binaries with the default linker, so the OOM is more likely there.

Fix: add the identical mold setup step to the coverage job (the action is already trusted -- used by the test job, release.yml, rust-publish.yml).

Board: TD-CI-COVERAGE-MOLD-1 recorded (Open, paid-by this PR, confirm on next green coverage run). The entry explicitly records that the SoA migration plan (bindspace-singleton-to-mailbox-soa-v1) needs NO calibration on account of this -- the coverage failure is orthogonal infra noise, fail_ci_if_error:false already keeps it non-blocking, and the honest residual (timing-race not 100% excluded without the 403'd log) is noted with its escalation path.

https://claude.ai/code/session_01PBTGaPCSnnt6u3pjXpbLwY
1 parent 5363f43 commit a2feffe

2 files changed

Lines changed: 37 additions & 0 deletions

.claude/board/TECH_DEBT.md

Lines changed: 29 additions & 0 deletions
@@ -15,6 +15,35 @@

 ## Open Debt

+### TD-CI-COVERAGE-MOLD-1 — `test-with-coverage` job lacks the mold linker the `test` job has (2026-06-12)
+
+**Open — fix applied this PR, CONFIRM on next green run.** The `Rust Tests`
+workflow's `test` job sets up the `mold` linker (`rui314/setup-mold@v1`) with the
+comment *"Heavy lance+datafusion integration-test binaries OOM the default GNU
+`ld` at the link step (intermittent)."* The sibling `test-with-coverage` job did
+**not** set up mold, and links the **even larger llvm-cov-instrumented** binaries
+with the default linker — so the OOM is *more* likely there. Symptom: across the
+last 50 `rust-test.yml` runs, exactly 2 hit `test=success / cov=failure`
+(`claude/probe-mantissa-fill` a32cb177 and `claude/nice-edison-g4rhhl` 12c5ea35);
+the plain `test` job stayed green in both. **This is NOT a logic/test failure and
+NOT a side-effect of the SoA-singleton migration** (`bindspace-singleton-to-mailbox-soa-v1`
+et al.): a migration bug would fail the plain `test` job too — it doesn't, and the
+two SoA debts (TD-RESONANCEDTO-DUP-1 P3/deferred, TD-UNBUNDLE-FROM-1 ~1-bit/100-epoch
+drift) crash nothing. Confidence: HIGH on the infra cause; the workflow's own
+comment names this exact OOM, mold is missing only on the coverage job, and the
+failure is intermittent (= memory pressure, not a deterministic bug).
+**Residual (honest):** the codecov upload step already sets `fail_ci_if_error:
+false`, so the noise is a job-level ❌ that does NOT block merge (`mergeable=True`);
+and without the CI log (token 403) a timing-sensitive race surfacing only under
+instrumentation's slower execution cannot be *100%* excluded — but the migration's
+concurrency tests (D-SNGL-6 writer+reader threads) are PROPOSAL, not shipped, so
+there is no concurrent SoA test to race yet. **Paid by:** this PR adds the mold
+step to the coverage job (parity with `test`). **Confirm** by a green
+`test-with-coverage` run; if it still fails after mold, escalate to the
+timing-race hypothesis (read the actual `cargo llvm-cov` log with a scoped token).
+Cross-ref: `.github/workflows/rust-test.yml` (test job mold step vs coverage job);
+`bindspace-singleton-to-mailbox-soa-v1` (the migration this is NOT).
+
 ### TD-UNBUNDLE-FROM-1 — `unbundle_from` is NOT the inverse of `bundle_into` (2026-06-07)

 **Open.** `crates/lance-graph-planner/src/cache/kv_bundle.rs` `unbundle_from`
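
The triage argument in the TD-CI-COVERAGE-MOLD-1 entry (a logic bug would also fail the plain `test` job, so `test=success / cov=failure` points at the coverage job's environment) can be sketched as a small filter over per-run job conclusions. This is an illustrative sketch only: the `RunResult` record shape, both function names, and the sample data are hypothetical and are not taken from the repository or from the real run history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Job conclusions for one workflow run (hypothetical record shape)."""
    run_id: str
    test: str       # conclusion of the plain test job, e.g. "success"
    coverage: str   # conclusion of the coverage job, e.g. "failure"


def coverage_only_failures(runs):
    """Return runs where the plain test job passed but the coverage job failed."""
    return [r for r in runs if r.test == "success" and r.coverage == "failure"]


def looks_like_infra_flake(runs):
    """Heuristic mirroring the entry's reasoning, not a proof.

    True only if at least one run failed in the coverage job alone and no run
    failed the plain test job (a logic bug would be expected to fail both).
    """
    any_test_failure = any(r.test == "failure" for r in runs)
    return bool(coverage_only_failures(runs)) and not any_test_failure


# Made-up sample data, NOT the real run history.
sample = [
    RunResult("run-a", "success", "success"),
    RunResult("run-b", "success", "failure"),
    RunResult("run-c", "success", "success"),
]

print([r.run_id for r in coverage_only_failures(sample)])
print(looks_like_infra_flake(sample))
```

As the entry's residual note says, a pattern like this raises confidence in the infra cause but does not exclude a timing race that only appears under instrumentation; reading the actual coverage log remains the escalation path.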

.github/workflows/rust-test.yml

Lines changed: 8 additions & 0 deletions
@@ -120,6 +120,14 @@ jobs:
         run: |
           rustup toolchain install stable
           rustup default stable
+      - name: Setup mold linker
+        # Parity with the `test` job above (TD-CI-COVERAGE-MOLD-1): the heavy
+        # lance+datafusion test binaries OOM the default GNU `ld` at link
+        # (intermittent) — and llvm-cov INSTRUMENTED binaries are larger, so
+        # the OOM is MORE likely here than in the plain `test` job that already
+        # has mold. Without this step the coverage job flaked while `test`
+        # stayed green (2/50 runs). mold links them fast + low-memory.
+        uses: rui314/setup-mold@v1
       - uses: Swatinem/rust-cache@v2
         with:
           shared-key: "lance-graph-deps"
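
The asymmetry this change removes (one job sets up mold, its sibling does not) is the kind of drift a small parity check could flag. The following is a minimal sketch, not part of the commit: the function name `jobs_missing_action`, the line-based parsing, and the abbreviated `BEFORE` workflow text are all assumptions made for illustration. It assumes `jobs:` sits at column 0 and job names are indented by exactly two spaces; a real check would more likely use a proper YAML parser.

```python
import re

SETUP_MOLD = "rui314/setup-mold"
# A job header is a key indented by exactly two spaces (assumed layout).
JOB_HEADER = re.compile(r"^  ([A-Za-z0-9_-]+):\s*$")


def jobs_missing_action(workflow_text, action=SETUP_MOLD):
    """Return names of jobs that have no `uses:` line naming `action`."""
    uses_action = {}  # job name -> True once a matching `uses:` line is seen
    in_jobs = False
    current = None
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no structure
        if not line.startswith(" "):
            # A top-level key either opens or closes the `jobs:` section.
            in_jobs = stripped == "jobs:"
            current = None
            continue
        if not in_jobs:
            continue
        header = JOB_HEADER.match(line)
        if header:
            current = header.group(1)
            uses_action[current] = False
        elif (current is not None
              and stripped.lstrip("- ").startswith("uses:")
              and action in stripped):
            uses_action[current] = True
    return [name for name, ok in uses_action.items() if not ok]


# Simplified stand-in for the workflow, NOT the real rust-test.yml.
BEFORE = """\
name: Rust Tests
jobs:
  test:
    steps:
      - name: Setup mold linker
        uses: rui314/setup-mold@v1
      - run: cargo test
  test-with-coverage:
    steps:
      - uses: Swatinem/rust-cache@v2
      - run: cargo llvm-cov
"""

# Same text with the mold step added to the coverage job.
AFTER = BEFORE.replace(
    "      - uses: Swatinem/rust-cache@v2",
    "      - uses: rui314/setup-mold@v1\n      - uses: Swatinem/rust-cache@v2",
)

print(jobs_missing_action(BEFORE))
print(jobs_missing_action(AFTER))
```

On the stand-in text, the check reports `test-with-coverage` before the change and no jobs after it. Matching only on `uses:` lines keeps a comment that merely mentions mold from counting as setup.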
