refactor: parallelise ATS integration tests and shard CI coverage #1308

Open
MiguelLZPF wants to merge 2 commits into develop from refactor/ats-test-shard-parallel-coverage
Conversation

MiguelLZPF (Contributor) commented Jun 19, 2026

Description

Behaviour-preserving test-infrastructure and CI changes on the ATS contracts. SDK and web are deprecated, so the workflow is contracts-only (PR trigger narrowed to packages/ats/contracts/**). Rebased onto develop v8.0.0; two commits (the test-isolation fix, and the coverage-sharding system).

  1. Parallel integration tests. The mega-asset integration suite is sharded across parallel mocha workers via a shared suiteDiscovery module + atsShardRunner: ats.test.ts delegates to the runner and ats.shard.{1..8}.test.ts are the local parallel entries, each deploying the mega-asset once (its own EVM/snapshot per worker — mocha parallelises by file). The custom revertedWithCustomError chai-matchers patch re-initialises per worker (mocha --parallel does not apply the require-based globalSetup to workers).

  2. Sharded CI coverage: balanced, with one deploy and one compile per shard. solidity-coverage runs across a 4-way matrix that doubles as the contracts-integration gate. ats.test.ts reads ATS_MEGA_SHARD_INDEX/TOTAL, so each shard deploys the mega-asset once for its slice. Suites are assigned by greedy longest-processing-time (LPT) bin-packing keyed on each suite's it() count (a near-exact proxy for test count), so per-shard load is balanced, unlike the old file-count round-robin (305–708 per shard). The redundant test-phase recompile that hardhat coverage would otherwise do is skipped during coverage runs (TASK_TEST with noCompile, gated on solidity-coverage's own __SOLIDITY_COVERAGE_RUNNING flag), cutting one full ~60s instrumented compile per shard.

  3. Single merged coverage report. Shards upload their lcov as artifacts; a merge-coverage job combines them into one report uploaded once to Codecov — matching a local single run instead of per-shard pieces. The merge uses a small in-repo lcov merger (mergeLcov.ts): the off-the-shelf lcov-result-merger silently drops function coverage, and the system lcov CLI needs a flag soup, emits thousands of warnings on solidity-coverage output, and rewrites the file in a newer format with unverified Codecov support — so the in-repo merger is the maintainable, exact choice. It is rewritten for clarity, unit-tested, and documented.

  4. Local parallel coverage. npm run test:coverage:parallel is the local twin of the CI matrix: one ephemeral git worktree per shard (required — a coverage run rewrites generated source in place, so concurrent shards in one checkout would corrupt each other), node_modules symlinked, merged via the same in-repo merger.

  5. Trust guards — proof the split drops no test. assignBalancedShards asserts a disjoint + complete partition at runtime (every suite in exactly one shard), backed by unit tests (shardPartition.test.ts, incl. the real planShard partition). A CI verify-test-count job runs the integration suite in one sequential pass and the merge-coverage job reconciles the sum of the shards' passing counts against that sequential count, failing on mismatch — so a silently dropped or double-run suite turns the build red.

  6. v8.0.0 coverage gas fix. After the rebase, the heavier instrumented orchestrator-library deploy (TokenCoreOps et al.) outgrew the fixed GAS_LIMIT.high (10M) those deploys used under coverage. They now route through gasLimitOverride, so a coverage run uses a higher limit (30M → 100M, well under the 300M coverage blockGasLimit) while Hedera/hiero-solo stays at GAS_LIMIT.high (10M), under the 15M per-tx cap — the Solo deployment job (real Hedera gas) confirms it.

  7. Scripts gate fix + shared install. Fixed a pre-existing test-isolation leak: atsRegistry.generated.test.ts seeded zero-address orchestrator libraries into a shared module singleton and never cleared them, which broke the scripts suite when run standalone. The seeded state is now cleared by resetOrchestratorLibraryAddresses() in an after hook. CI install is shared via an actions/cache of node_modules keyed on package-lock.json, installing with npm ci --ignore-scripts on a cache miss.
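
To make items 2 and 5 concrete, here is a minimal sketch of greedy LPT packing with a runtime partition check. It is not the code in this PR: the `Suite` shape and the names `assignBalancedShardsSketch` and `assertPartitionSketch` are assumptions made for illustration, and suite names are assumed to be unique.

```typescript
// Hypothetical shape: a suite and its it() count, used as the packing weight.
interface Suite {
  name: string;
  itCount: number;
}

// Throws unless every input suite appears in exactly one shard and no
// unknown suite appears in any shard (disjoint + complete).
function assertPartitionSketch(suites: Suite[], shards: Suite[][]): void {
  const seen = new Map<string, number>();
  for (const shard of shards) {
    for (const s of shard) seen.set(s.name, (seen.get(s.name) ?? 0) + 1);
  }
  for (const s of suites) {
    const n = seen.get(s.name) ?? 0;
    if (n !== 1) throw new Error(`suite ${s.name} is assigned ${n} times`);
  }
  if (seen.size !== suites.length) {
    throw new Error("shards contain a suite that was not in the input");
  }
}

// Greedy longest-processing-time (LPT): sort by descending weight, then
// always place the next suite on the currently lightest shard.
function assignBalancedShardsSketch(suites: Suite[], totalShards: number): Suite[][] {
  if (!Number.isInteger(totalShards) || totalShards < 1) {
    throw new Error(`totalShards must be a positive integer, got ${totalShards}`);
  }
  const shards: Suite[][] = Array.from({ length: totalShards }, () => []);
  const loads: number[] = new Array(totalShards).fill(0);

  // Tie-break on name so every CI shard computes the same assignment.
  const ordered = [...suites].sort(
    (a, b) => b.itCount - a.itCount || (a.name < b.name ? -1 : a.name > b.name ? 1 : 0),
  );
  for (const suite of ordered) {
    let lightest = 0;
    for (let i = 1; i < totalShards; i++) {
      if (loads[i] < loads[lightest]) lightest = i;
    }
    shards[lightest].push(suite);
    loads[lightest] += suite.itCount;
  }
  assertPartitionSketch(suites, shards);
  return shards;
}
```

Because the ordering is deterministic, each matrix job can recompute the full plan independently and pick its own slice by index, without any shared state between jobs.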

The whole pipeline is documented in scripts/tools/coverage-shard/README.md.
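
For readers unfamiliar with the lcov tracefile format, the merge in item 3 can be sketched roughly as below. This is an illustrative reduction, not mergeLcov.ts itself: `mergeLcovSketch` and `FileCov` are hypothetical names, DA lines are assumed to carry no checksum field, and FNDA entries without a matching FN line are ignored.

```typescript
// Per-file accumulators, keyed the way lcov keys its records.
interface FileCov {
  fn: Map<string, number>; // function name -> declaration line (FN)
  fnda: Map<string, number>; // function name -> hit count (FNDA)
  brda: Map<string, number | null>; // "line,block,branch" -> taken; null is "-"
  da: Map<number, number>; // line number -> hit count (DA)
}

const bump = <K>(m: Map<K, number>, k: K, n: number): void => {
  m.set(k, (m.get(k) ?? 0) + n);
};

function mergeLcovSketch(reports: string[]): string {
  const files = new Map<string, FileCov>();
  for (const report of reports) {
    let cur: FileCov | undefined;
    for (const raw of report.split(/\r?\n/)) {
      const line = raw.trim();
      const sep = line.indexOf(":");
      const tag = sep < 0 ? line : line.slice(0, sep);
      const rest = sep < 0 ? "" : line.slice(sep + 1);
      if (tag === "SF") {
        cur = files.get(rest) ?? { fn: new Map(), fnda: new Map(), brda: new Map(), da: new Map() };
        files.set(rest, cur);
      } else if (tag === "end_of_record") {
        cur = undefined;
      } else if (cur && (tag === "FN" || tag === "FNDA" || tag === "DA")) {
        const comma = rest.indexOf(",");
        const head = Number(rest.slice(0, comma));
        const tail = rest.slice(comma + 1);
        if (tag === "FN") cur.fn.set(tail, head);
        else if (tag === "FNDA") bump(cur.fnda, tail, head); // sum function hits
        else bump(cur.da, head, Number(tail)); // sum line hits
      } else if (cur && tag === "BRDA") {
        const cut = rest.lastIndexOf(",");
        const key = rest.slice(0, cut);
        const taken = rest.slice(cut + 1);
        const prev = cur.brda.get(key) ?? null;
        // "-" means the enclosing block never ran; keep it unless a shard has a count.
        cur.brda.set(key, taken === "-" ? prev : (prev ?? 0) + Number(taken));
      }
      // Summary lines (FNF, FNH, BRF, BRH, LF, LH) are recomputed below.
    }
  }
  const out: string[] = [];
  for (const [path, f] of files) {
    out.push(`SF:${path}`);
    for (const [name, ln] of f.fn) out.push(`FN:${ln},${name}`);
    for (const name of f.fn.keys()) out.push(`FNDA:${f.fnda.get(name) ?? 0},${name}`);
    const fnHit = [...f.fn.keys()].filter((n) => (f.fnda.get(n) ?? 0) > 0).length;
    out.push(`FNF:${f.fn.size}`, `FNH:${fnHit}`);
    for (const [key, taken] of f.brda) out.push(`BRDA:${key},${taken ?? "-"}`);
    const brHit = [...f.brda.values()].filter((t) => (t ?? 0) > 0).length;
    out.push(`BRF:${f.brda.size}`, `BRH:${brHit}`);
    const lines = [...f.da].sort((a, b) => a[0] - b[0]);
    for (const [ln, hits] of lines) out.push(`DA:${ln},${hits}`);
    out.push(`LF:${lines.length}`, `LH:${lines.filter(([, h]) => h > 0).length}`, "end_of_record");
  }
  return out.join("\n") + "\n";
}
```

Summing hit counts per key and recomputing the summary lines is what keeps function coverage intact: a function hit in any shard ends up with a non-zero FNDA in the merged output.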

flowchart TD
  MA["mega-asset suites<br/>(one shared deploy)"]
  ST["standalone infra suites"]
  MA -->|"balanced slice"| S0 & S1 & S2 & S3
  ST -->|"balanced (it-count LPT)"| S0 & S1 & S2 & S3
  S0["coverage shard 0 → lcov"] --> MG
  S1["coverage shard 1 → lcov"] --> MG
  S2["coverage shard 2 → lcov"] --> MG
  S3["coverage shard 3 → lcov"] --> MG
  MG["merge-coverage<br/>(in-repo merger, preserves functions)"] --> CV[("Codecov<br/>single report")]
  VC["verify-test-count<br/>(sequential oracle)"] -.->|"Σ shards == sequential?"| MG
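
The dotted "Σ shards == sequential?" edge in the flowchart amounts to a simple reconciliation. A minimal sketch follows, assuming the per-shard counts are read from mocha's "N passing" summary line; the names `parsePassingCount` and `reconcileTestCounts` are hypothetical and the real job may obtain its counts differently.

```typescript
// Extract the count from a mocha summary such as "  668 passing (5m)".
function parsePassingCount(mochaOutput: string): number {
  const match = /(\d+) passing/.exec(mochaOutput);
  if (!match) throw new Error("no 'N passing' line found in output");
  return Number(match[1]);
}

// Fail loudly when the shards' total differs from the sequential oracle,
// which is what a dropped or double-run suite would cause.
function reconcileTestCounts(shardPassing: number[], sequentialPassing: number): void {
  const sum = shardPassing.reduce((total, n) => total + n, 0);
  if (sum !== sequentialPassing) {
    const detail = shardPassing.map((n, i) => `shard ${i}: ${n}`).join(", ");
    throw new Error(
      `sharded passing count ${sum} != sequential ${sequentialPassing} (${detail})`,
    );
  }
}
```

The check catches a mismatch in totals only; which suite went missing still has to be found from the per-shard detail in the error message.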

Type of change

  • Bug fix 🐞
  • New feature ✨
  • Breaking change 💥
  • Documentation update 📖
  • Refactor 🔧

Testing

All green on CI run 28080825464 (token-studio-linux-large); the post-fold run re-validates the same tree.

CI:

| Job | Time |
| --- | --- |
| contracts scripts tests | 3m18s |
| coverage shards (×4, parallel) | 5m58s / 4m55s / 5m21s / 6m30s (slowest) |
| verify test count (sequential oracle) | 4m42s |
| merge coverage + upload (incl. count reconciliation) | 1m17s |
| total workflow | 8m1s |

Separately, the Deploy to Hiero Solo Network job (real Hedera gas) passes — confirming the 10M Hedera deploy path is unaffected by the coverage gas bump.

→ For reference, serial CI coverage measured ~14m9s before sharding (via a temporary baseline job, since removed) — sharded ≈ 7m21s wall, ~2× faster.

Local (warm working tree, illustrative — absolute counts shift slightly with v8.0.0 + the added unit tests):

| Run mode | Result |
| --- | --- |
| test (serial, all integration + scripts) | green |
| test:parallel:ats (mega core, --parallel) | green |
| test:coverage (single run, full) | ~2672 passing |
| test:coverage:parallel (4 worktrees) | green; merged lcov == single run |
| test:coverage:shard ×4 | 665–670 passing each (balanced) |

Local coverage: serial ~11m vs sharded parallel wall ~5m.

Coverage equality — single run vs the 4-shard merged lcov (what Codecov ingests), re-confirmed on the rebased branch (461 source files):

| Metric | Single run | 4-shard merged |
| --- | --- | --- |
| Lines | 4849/5027 = 96.459% | 4849/5027 = 96.459% (identical) |
| Functions | 1916/1972 = 97.160% | 1916/1972 = 97.160% (identical) |
| Branches | 3343/3542 = 94.382% | 3342/3542 = 94.353% (−0.03pp, 1 branch; never over-reports) |

Lines and functions are identical; branches differ by one (a cross-suite branch only the single shared deploy reaches).
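
The arithmetic behind the table can be reproduced with a few lines. This is only a worked check of the figures quoted above; the `Metric` shape and the helper names are assumptions, not part of the PR.

```typescript
// hit/found pairs as reported in an lcov summary (LH/LF, FNH/FNF, BRH/BRF).
interface Metric {
  hit: number;
  found: number;
}

const percent = (m: Metric): number => (100 * m.hit) / m.found;

// Difference in percentage points, merged minus single run.
const deltaPp = (single: Metric, merged: Metric): number => percent(merged) - percent(single);

// The merged report must never claim more coverage than the single run.
function assertNeverOverReports(single: Metric, merged: Metric): void {
  if (merged.found !== single.found) throw new Error("denominators differ");
  if (merged.hit > single.hit) throw new Error("merged report over-reports coverage");
}

// Figures copied from the table above.
const branchesSingle: Metric = { hit: 3343, found: 3542 };
const branchesMerged: Metric = { hit: 3342, found: 3542 };
assertNeverOverReports(branchesSingle, branchesMerged);
console.log(percent(branchesSingle).toFixed(3)); // 94.382
console.log(percent(branchesMerged).toFixed(3)); // 94.353
console.log(deltaPp(branchesSingle, branchesMerged).toFixed(2)); // -0.03
```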

Test Results

All green; merged sharded coverage equals a single run; the verify-test-count job confirms the sharded test count equals a sequential pass.

Node version:

  • 20
  • 22
  • 24

Checklist

  • Style Guidelines followed ✅
  • Documentation Updated 📚
  • Linters - No New Warnings ⚠️
  • Local Tests Pass ✅
  • Effective Tests Added ✔️
  • No reduction of Coverage

MiguelLZPF added the no-changeset (bypass changeset check) label Jun 19, 2026
MiguelLZPF self-assigned this Jun 19, 2026
MiguelLZPF force-pushed the refactor/ats-test-shard-parallel-coverage branch from b209bfd to 892df51 on June 22, 2026 06:32
MiguelLZPF marked this pull request as ready for review June 22, 2026 07:20
MiguelLZPF requested review from a team as code owners June 22, 2026 07:20
MiguelLZPF force-pushed the refactor/ats-test-shard-parallel-coverage branch 3 times, most recently from a0f2db0 to 2b9b244 on June 24, 2026 06:35
…unit suite

The atsRegistry.generated unit suite seeds zero-address orchestrator library
placeholders into a shared module singleton to construct facet factories in
isolation, but never cleared them. Running the scripts suite on its own then
leaked those zeros into later token deployments, linking scheduledTasksOps to
address(0) and reverting initializeERC20. Add resetOrchestratorLibraryAddresses()
and clear only the seeded state in an after hook.

Signed-off-by: Miguel_LZPF <miguel.carpena@io.builders>
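
The leak described in this commit can be sketched as follows. Only the name resetOrchestratorLibraryAddresses() comes from the PR; its signature, the registry shape, and the other helpers here are assumptions for illustration.

```typescript
const ZERO_ADDRESS = "0x" + "0".repeat(40);

// Module-level singleton: shared by every test file loaded in the same process.
const orchestratorLibraryAddresses = new Map<string, string>();

function setOrchestratorLibraryAddress(name: string, address: string): void {
  orchestratorLibraryAddresses.set(name, address);
}

// Assumed behaviour: clear only the named entries, or everything when omitted.
function resetOrchestratorLibraryAddresses(names?: string[]): void {
  if (!names) {
    orchestratorLibraryAddresses.clear();
    return;
  }
  for (const name of names) orchestratorLibraryAddresses.delete(name);
}

// Stand-in for a deploy step that links a library by its registered address.
function resolveLibraryForLinking(name: string): string {
  const address = orchestratorLibraryAddresses.get(name);
  if (address === ZERO_ADDRESS) {
    throw new Error(`${name} would be linked to address(0)`);
  }
  return address ?? "<deploy a fresh library>";
}

// What the unit suite does: seed placeholders to build facet factories in isolation.
const seeded = ["scheduledTasksOps"];
for (const name of seeded) setOrchestratorLibraryAddress(name, ZERO_ADDRESS);

// In a mocha suite this call would live in a hook, for example:
//   after(() => resetOrchestratorLibraryAddresses(seeded));
resetOrchestratorLibraryAddresses(seeded);
```

Without the reset, any later deployment in the same process would hit the zero-address branch in `resolveLibraryForLinking`, which mirrors the initializeERC20 revert described above.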
MiguelLZPF force-pushed the refactor/ats-test-shard-parallel-coverage branch from 2b9b244 to 295e804 on June 24, 2026 06:39
Shard the mega-asset integration suite for parallel mocha workers (shared suiteDiscovery +
atsShardRunner; ats.test.ts delegates, shards/ats.shard.1..8 are the local parallel entries)
and split solidity-coverage across a weight-balanced 4-way CI matrix that merges per-shard
lcov into a single Codecov upload via an in-repo merger preserving function coverage. Add
local parallel coverage via ephemeral git worktrees (test:coverage:parallel) and skip the
redundant test-phase recompile under coverage. Guard the split with a disjoint+complete
partition assertion plus unit tests, and a CI verify-test-count job that reconciles the
sharded test count against a sequential pass so no suite can be silently dropped. Raise the
coverage gas limit (gasLimitOverride 30M->100M) and route the orchestrator-library deploys
through it so v8.0.0's heavier instrumented deploy fits under solidity-coverage. Cache CI
deps and make the workflow contracts-only. Documented in scripts/tools/coverage-shard/README.md.

Signed-off-by: Miguel_LZPF <miguel.carpena@io.builders>
MiguelLZPF force-pushed the refactor/ats-test-shard-parallel-coverage branch from bceeab6 to 0c3ff70 on June 24, 2026 13:14
MiguelLZPF removed the no-changeset (bypass changeset check) label Jun 24, 2026