Commit 35041fd

ci(rust-test): debuginfo=0 on the test job — link-footprint relief (TD-CI-COVERAGE-MOLD-1)
The `test` job has begun hitting the same disk/RSS link cliff already mitigated on `test-with-coverage` (b56bb2c): `ld terminated with signal 7 [Bus error]` + an LLVM crash at the `cargo test --no-run` link step of test_sql_query / intervene_counterfactual.

Root cause is link-footprint growth, NOT a logic break (a layout break would fail an assertion, not SIGBUS at link). PR #507 (0c6ef02, +4055 lines across causal-edge ce64-v2 layout + cognitive-shader-driver MailboxSoaOwner / SurrealMailboxView) grew the integration-test object set enough to tip the previously-marginal `test`-job link over the same ceiling. It surfaced on the first full-workspace CI run after #507 (the intervening PRs are root-excluded crates, so their CI never linked the post-#507 tree).

Fix: give the `test` job a job-level RUSTFLAGS with `-C debuginfo=0` (parity with the coverage job). debuginfo carries no value in CI (no debugger is attached); dropping it cut the coverage job's per-binary link ~930 MB -> ~252 MB (-73%, measured in b56bb2c) and relieves both the mold/GNU-ld RSS and the disk ceiling. mold is already installed on this job. Side effect: the job gets its own Swatinem cache key (first run repopulates).

This is a fence (buys headroom), not a root reduction of #507's legitimate codegen — documented as such in the TD-CI-COVERAGE-MOLD-1 ledger addendum, including the secular-growth caveat and the separate (warns-not-fails) intervene_counterfactual.rs deprecated-API debt.
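The -73% figure quoted above can be checked from the two sizes in the message, and the same comparison could be repeated locally. The following is a minimal sketch, not part of the commit: `pct_change` and `dir_size_mb` are hypothetical helper names, and the commented-out comparison measures the whole target directory as a rough proxy rather than the per-binary link size that b56bb2c reports.

```shell
# pct_change BEFORE AFTER: signed percentage change, rounded to the nearest integer.
# (Hypothetical helper, not present in the repository.)
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (after - before) / before * 100 }'
}

# dir_size_mb DIR: total size of DIR in MB, as reported by `du -sm`.
# (Hypothetical helper; a whole-directory proxy, not a per-binary measurement.)
dir_size_mb() {
  du -sm "$1" | cut -f1
}

# Sizes quoted in the commit message: ~930 MB -> ~252 MB.
pct_change 930 252   # prints -73

# Assumed local comparison (run from the workspace root; slow, so left commented out):
#   RUSTFLAGS="-C debuginfo=1" cargo test --no-run --target-dir target/dbg1
#   RUSTFLAGS="-C debuginfo=0" cargo test --no-run --target-dir target/dbg0
#   pct_change "$(dir_size_mb target/dbg1)" "$(dir_size_mb target/dbg0)"
```

Separate `--target-dir` values keep the two builds from overwriting each other, at the cost of roughly doubling local disk use during the comparison.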
1 parent 1869862 commit 35041fd

2 files changed

Lines changed: 48 additions & 0 deletions

.claude/board/TECH_DEBT.md

Lines changed: 32 additions & 0 deletions
@@ -147,6 +147,38 @@ timing-race hypothesis (read the actual `cargo llvm-cov` log with a scoped token
 Cross-ref: `.github/workflows/rust-test.yml` (test job mold step vs coverage job);
 `bindspace-singleton-to-mailbox-soa-v1` (the migration this is NOT).
 
+**2026-06-16 addendum — the `test` job now hits the SAME cliff; fix extended to
+it (branch `claude/ci-test-job-debuginfo0`).** The cliff this entry called out as
+*"a 2/50 intermittent"* on the coverage job has now surfaced on the **plain
+`test`** job: `ld terminated with signal 7 [Bus error]` + an LLVM crash dump at
+the `cargo test --no-run` link step of `test_sql_query` / `intervene_counterfactual`.
+Root-caused to a **link-footprint growth, not a logic break** (a layout break would
+fail an assertion, not SIGBUS at link): **PR #507** (`0c6ef02c`, +4055/−1048 across
+`causal-edge` edge.rs/layout.rs — the ce64-v2 layout — and `cognitive-shader-driver`
+mailbox_soa.rs/driver.rs/planner_bridge.rs — MailboxSoaOwner + SurrealMailboxView,
+D-PG-6) grew the object-file set linked by the lance-graph integration tests enough
+to tip the previously-marginal `test`-job link over the same disk/RSS ceiling. It
+surfaced on the first full-workspace CI run *after* #507 (the two PRs between, #509
+and the perturbation-sim #511, are root-`exclude`d so their CI never linked the
+post-#507 tree — which is why this is "the first failing PR" yet not its fault).
+**This is a FENCE, not a root reduction:** it does not shrink #507's legitimate
+codegen; it removes the dead `debuginfo=1` weight (CI never opens a debugger) to
+buy headroom — exactly the b56bb2cd lever, now applied to the `test` job. **Fix:**
+job-level `RUSTFLAGS: "-C debuginfo=0 -C target-cpu=x86-64-v3"` on `test` (parity
+with `test-with-coverage`; mold already installed). Side effect: the `test` job
+gets its own Swatinem cache key (first run repopulates). **Confirm** on the next
+green `test` run. **Residual debt if it recurs after this:** the footprint is on a
+secular upward trend (every cognitive-layer PR adds codegen) — the durable fix is a
+bigger runner or splitting the integration-test link set, not repeatedly shaving
+flags. Separately, #507 left `intervene_counterfactual.rs:133/165` calling the
+**deprecated** `CausalEdge64::inference_type()` (the consumer-migration commit
+`8131c480` lives on the unmerged `claude/continue-ndarray-x0Oaw`) — that WARNS, does
+not fail (v1 default routes through the canonical mapping per I-LEGACY-API-FEATURE-
+GATED); tracked here as a separate latent item, not fixed on this CI branch.
+Cross-ref: `.github/workflows/rust-test.yml` (now both jobs at `debuginfo=0`); PR
+#507 (`0c6ef02c`); `claude/continue-ndarray-x0Oaw` (the pending ce64-v2 consumer
+migration).
+
 ### TD-UNBUNDLE-FROM-1 — `unbundle_from` is NOT the inverse of `bundle_into` (2026-06-07)
 
 **Open.** `crates/lance-graph-planner/src/cache/kv_bundle.rs` `unbundle_from`

.github/workflows/rust-test.yml

Lines changed: 16 additions & 0 deletions
@@ -29,6 +29,22 @@ jobs:
   test:
     runs-on: ubuntu-24.04
     timeout-minutes: 30
+    env:
+      # Override the workflow-level debuginfo=1 for this job too (parity with
+      # test-with-coverage, TD-CI-COVERAGE-MOLD-1). The `test` job links the
+      # full lance+datafusion integration-test set at the SAME disk/RSS cliff
+      # the coverage job hit — and #507 (+4055 lines across causal-edge +
+      # cognitive-shader-driver: ce64-v2 layout + MailboxSoaOwner/
+      # SurrealMailboxView) grew that link footprint enough to tip the
+      # previously-marginal link into a hard `ld` SIGBUS (signal 7 = object
+      # file truncated when the runner partition fills mid-link). debuginfo=1
+      # carried no value here (CI never opens a debugger); dropping it cut the
+      # coverage job's per-binary link from ~930 MB to ~252 MB (-73%, measured
+      # in b56bb2cd) and relieves BOTH ceilings (mold/GNU-ld RSS + disk). mold
+      # is already installed below. Note: a job-level RUSTFLAGS gives this job
+      # its own Swatinem cache key — the first run after this change
+      # repopulates the test cache.
+      RUSTFLAGS: "-C debuginfo=0 -C target-cpu=x86-64-v3"
     defaults:
       run:
         working-directory: lance-graph
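Because the fix relies on the job-level `RUSTFLAGS` actually reaching the build, a small guard in a `run:` step could make the "confirm on the next green run" check explicit. This is only a sketch under assumptions: `require_debuginfo0` and `free_mb` are hypothetical helpers that do not exist in the workflow, and the guard only looks for the `debuginfo=0` substring; it does not model how rustc resolves repeated flags.

```shell
# require_debuginfo0 FLAGS: succeed only if FLAGS contains `debuginfo=0`.
# (Hypothetical helper; a plain substring check, nothing more.)
require_debuginfo0() {
  case "${1-}" in
    *debuginfo=0*) return 0 ;;
    *) echo "RUSTFLAGS lacks debuginfo=0: '${1-}'" >&2; return 1 ;;
  esac
}

# free_mb PATH: free space in MB on the filesystem that holds PATH.
# (Hypothetical helper; uses POSIX `df -P -k` output, column 4 = available KB.)
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Assumed use inside a CI step, before `cargo test --no-run`:
#   require_debuginfo0 "${RUSTFLAGS-}"
#   echo "free disk before link: $(free_mb .) MB"
```

Logging free disk before the link step would also give a data point for the secular-growth caveat recorded in the ledger addendum.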
