
loader: memoize includes to avoid re-expanding diamond include graphs #886

Open
davireis wants to merge 1 commit into compose-spec:main from davireis:loader-memoize-includes

@davireis


Problem

ApplyInclude re-parses and recursively re-expands an included file once per include path that reaches it. When the same file is reachable through more than one path (a "diamond" in the include graph), the work scales with the number of paths rather than the number of files, which is quadratic-to-exponential: a 24-level doubling graph loads the leaf 2²⁴ ≈ 16.8M times.

This shows up in real monorepos that aggregate per-target / per-project compose fragments via `include:`. A federation of ~80 services took ~55s in `docker compose config`; the cost comes from re-expansion, not from the size of the graph.
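To make the growth concrete, here is a toy model of a doubling include graph. It is a sketch only: it uses no compose-go API, and the `counter` type and `load` method are invented for the illustration. Each level includes the next one through two paths, and the counter records how often a file is parsed with and without memoization.

```go
package main

import "fmt"

// counter tallies parses in a toy include graph. It is an illustration only
// and does not use any compose-go API.
type counter struct {
	parses int          // every file parse, at any level
	leaf   int          // parses of the deepest file
	seen   map[int]bool // nil disables memoization
}

// load expands one level. Each level includes the next level through two
// distinct paths, so the number of paths doubles per level.
func (c *counter) load(level, depth int) {
	if c.seen != nil {
		if c.seen[level] {
			return // already expanded during this load: reuse it
		}
		c.seen[level] = true
	}
	c.parses++
	if level == depth {
		c.leaf++
		return
	}
	c.load(level+1, depth)
	c.load(level+1, depth)
}

func main() {
	const depth = 24

	naive := &counter{}
	naive.load(0, depth)

	memo := &counter{seen: map[int]bool{}}
	memo.load(0, depth)

	fmt.Println("uncached leaf loads:", naive.leaf)
	fmt.Println("memoized leaf loads:", memo.leaf)
	fmt.Println("memoized total parses:", memo.parses)
}
```

At depth 24 the uncached run parses the leaf 2²⁴ = 16,777,216 times, while the memoized run parses each of the 25 levels once.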

Fix

Memoize each loaded include model for the duration of a single load (carried in ctx, so it never leaks across Load calls). The cache key is every input that determines the model — resolved paths, working dir, project dir, and effective environment — and a deep copy is handed out on each hit.

The merge into the parent (importResources) still runs for every occurrence, so:

  • a same-file extends in the including file still resolves (the included content is present in each including scope), and
  • the result is byte-identical to loading the file each time; only the parse + recursive expansion is shared.

Correctness details

  • Per-include env_file / project_directory: folded into the key, so the same path included with a different environment or project dir does not share a cache entry.
  • Relative-path rebasing: the same file reached through two parents can have a different relative base (e.g. a/b vs b), which yields models with different relative paths. Keying on the working dir prevents reusing a model whose paths the caller would then rebase incorrectly.
  • Cycle-safe: an include cycle is intrinsic to a node's subtree (the back-edge is in the fixed set of files the node includes), so it is detected on the node's first load — before it can be cached. A cyclic node is therefore never served from cache.
  • Listeners still fire per include entry; only the load is memoized.
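As an illustration of the keying rules above, a cache key could be derived like this. The `includeInputs` struct and its `key` method are assumptions made for the sketch, not the PR's actual types. The point is that the working dir, project dir, and effective environment all feed the key, and that map iteration order must not affect it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// includeInputs lists the inputs that determine a loaded include model
// (hypothetical struct, named here for illustration).
type includeInputs struct {
	Paths      []string          // resolved include paths, in order
	WorkingDir string            // base used to rebase relative paths
	ProjectDir string            // per-include project_directory
	Env        map[string]string // effective environment, env_file included
}

// key hashes every input. Fields are length-prefixed so that adjacent values
// cannot run together, and env names are sorted for a stable result.
func (in includeInputs) key() string {
	h := sha256.New()
	write := func(s string) { fmt.Fprintf(h, "%d:%s;", len(s), s) }

	write(in.WorkingDir)
	write(in.ProjectDir)
	write(fmt.Sprint(len(in.Paths)))
	for _, p := range in.Paths {
		write(p)
	}
	names := make([]string, 0, len(in.Env))
	for name := range in.Env {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		write(name)
		write(in.Env[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	viaA := includeInputs{Paths: []string{"/repo/shared.yaml"}, WorkingDir: "/repo/a/b"}
	viaB := includeInputs{Paths: []string{"/repo/shared.yaml"}, WorkingDir: "/repo/b"}
	// Same file, different relative base: the two must not share an entry.
	fmt.Println(viaA.key() == viaB.key())
}
```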

Tests

  • TestIncludeDiamondDedup: a depth-24 diamond that times out without the cache and completes in ~0.15s with it (both a correctness and a non-flaky perf-regression test).
  • BenchmarkIncludeDiamond.
  • Full go test ./... passes; gofmt/go vet clean.

@davireis davireis requested a review from ndeloof as a code owner June 19, 2026 14:01
ApplyInclude re-parses and recursively re-expands an included file once per
include path that reaches it. When the same file is reached through more than
one path (a "diamond" in the include graph) this is quadratic-to-exponential:
a 24-level doubling graph loads the leaf 2^24 times. Monorepos that aggregate
per-target / per-project compose fragments hit this in practice (an ~80-service
federation took ~55s in `docker compose config`).

Memoize each loaded include model for the duration of a single load, keyed on
every input that determines it — resolved paths, working dir, project dir, and
effective environment — and hand out a deep copy on each hit. The merge into
the parent (importResources) still runs for every occurrence, so a same-file
`extends` in the including file still resolves and the result is identical to
loading each time; only the parse + recursive expansion is shared.

Keying on the working dir matters: the same file reached through two parents
can have a different relative base, yielding models with different relative
paths; reusing across bases would let the caller rebase an already-resolved
path. Cycle-safe: an include cycle is intrinsic to a node's subtree, so it is
detected on the node's first load, before it can be cached.

Adds a deep-diamond regression test (times out without the cache) and a
benchmark.

Signed-off-by: Davi de Castro Reis <davi@davi.eng.br>
@davireis davireis force-pushed the loader-memoize-includes branch from 08775c8 to e8f62dd Compare June 19, 2026 16:49