loader: memoize includes to avoid re-expanding diamond include graphs #886
Open
davireis wants to merge 1 commit into
ApplyInclude re-parses and recursively re-expands an included file once per include path that reaches it. When the same file is reached through more than one path (a "diamond" in the include graph), this is quadratic-to-exponential: a 24-level doubling graph loads the leaf 2^24 times. Monorepos that aggregate per-target / per-project compose fragments hit this in practice (an ~80-service federation took ~55s in `docker compose config`).

Memoize each loaded include model for the duration of a single load, keyed on every input that determines it (resolved paths, working dir, project dir, and effective environment), and hand out a deep copy on each hit. The merge into the parent (importResources) still runs for every occurrence, so a same-file `extends` in the including file still resolves and the result is identical to loading each time; only the parse + recursive expansion is shared.

Keying on the working dir matters: the same file reached through two parents can have a different relative base, yielding models with different relative paths; reusing a model across bases would let the caller rebase an already-resolved path.

Cycle-safe: an include cycle is intrinsic to a node's subtree, so it is detected on the node's first load, before it can be cached.

Adds a deep-diamond regression test (times out without the cache) and a benchmark.

Signed-off-by: Davi de Castro Reis <davi@davi.eng.br>
08775c8 to e8f62dd
Problem
`ApplyInclude` re-parses and recursively re-expands an included file once per include path that reaches it. When the same file is reachable through more than one path (a "diamond" in the include graph), this is quadratic-to-exponential: a 24-level doubling graph loads the leaf 2²⁴ ≈ 16.8M times.

This shows up in real monorepos that aggregate per-target / per-project compose fragments via `include:`. A federation of ~80 services took ~55s in `docker compose config`; the cost is re-expansion, not the graph size.

Fix
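The blow-up, and why memoization removes it, can be reproduced in miniature. This sketch is illustrative only: `buildDiamond`, `countNaive`, and `countMemo` are hypothetical helpers over an in-memory graph, not compose-go code, and it uses depth 20 instead of 24 so the naive walk finishes quickly. Level i includes two files that both include level i+1, so the leaf is reached through 2^depth paths; expanding every path touches it once per path, while a per-load memo expands it once.

```go
package main

import "fmt"

// buildDiamond returns a hypothetical doubling include graph: file n<i>
// includes l<i> and r<i>, and both of those include n<i+1>.
func buildDiamond(depth int) map[string][]string {
	g := map[string][]string{}
	for i := 0; i < depth; i++ {
		l, r := fmt.Sprintf("l%d", i), fmt.Sprintf("r%d", i)
		next := fmt.Sprintf("n%d", i+1)
		g[fmt.Sprintf("n%d", i)] = []string{l, r}
		g[l] = []string{next}
		g[r] = []string{next}
	}
	return g
}

// countNaive expands every include path (no cache) and returns how many
// times leaf gets expanded.
func countNaive(g map[string][]string, name, leaf string) int {
	n := 0
	if name == leaf {
		n = 1
	}
	for _, child := range g[name] {
		n += countNaive(g, child, leaf)
	}
	return n
}

// countMemo expands each file at most once per load (assumes an acyclic
// graph): a repeat visit is a cache hit and triggers no further expansion.
func countMemo(g map[string][]string, name, leaf string, done map[string]bool) int {
	if done[name] {
		return 0
	}
	done[name] = true
	n := 0
	if name == leaf {
		n = 1
	}
	for _, child := range g[name] {
		n += countMemo(g, child, leaf, done)
	}
	return n
}

func main() {
	const depth = 20 // 24 in the PR; smaller here so the naive walk is quick
	g := buildDiamond(depth)
	leaf := fmt.Sprintf("n%d", depth)
	fmt.Println(countNaive(g, "n0", leaf))                   // 1048576 (2^20)
	fmt.Println(countMemo(g, "n0", leaf, map[string]bool{})) // 1
}
```

At depth 24 the naive count would be 16,777,216 (2^24), while the memoized count stays at 1.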
Memoize each loaded include model for the duration of a single load (carried in `ctx`, so it never leaks across `Load` calls). The cache key is every input that determines the model (resolved paths, working dir, project dir, and effective environment), and a deep copy is handed out on each hit.

The merge into the parent (`importResources`) still runs for every occurrence, so:

- `extends` in the including file still resolves (the included content is present in each including scope), and
- the result is identical to loading each time; only the parse + recursive expansion is shared.

Correctness details
- `env_file` / `project_directory`: folded into the key, so the same path included with a different environment or project dir does not share a cache entry.
- Working dir: the same file reached through two parents can have a different relative base (`a/b` vs `b`), which yields models with different relative paths. Keying on the working dir prevents reusing a model whose paths the caller would then rebase incorrectly.

Tests
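The working-dir point from the correctness details can be checked in isolation. This is a minimal sketch, not one of this PR's tests: `resolve`, `cacheKey`, and `loadAll` are hypothetical, and "loading" is simplified to joining the file's relative paths onto the working dir it was reached from. With the working dir left out of the key, the second parent silently receives paths resolved against the first parent's base.

```go
package main

import (
	"fmt"
	"path"
)

// resolve is a hypothetical stand-in for loading an include: it joins the
// file's relative paths onto the working dir the file was reached from.
func resolve(workingDir string, rel []string) []string {
	out := make([]string, len(rel))
	for i, r := range rel {
		out[i] = path.Join(workingDir, r)
	}
	return out
}

// cacheKey is hypothetical; workingDir stays empty when keying on the file alone.
type cacheKey struct{ file, workingDir string }

// loadAll loads the same file once per working dir through a memo and
// returns what each caller received.
func loadAll(file string, rel, workingDirs []string, keyOnWorkingDir bool) [][]string {
	memo := map[cacheKey][]string{}
	var got [][]string
	for _, wd := range workingDirs {
		k := cacheKey{file: file}
		if keyOnWorkingDir {
			k.workingDir = wd
		}
		if _, ok := memo[k]; !ok {
			memo[k] = resolve(wd, rel)
		}
		got = append(got, memo[k])
	}
	return got
}

func main() {
	rel := []string{"conf/app.env"}
	dirs := []string{"a/b", "b"} // the same file reached through two parents
	fmt.Println(loadAll("shared.yaml", rel, dirs, false)) // [[a/b/conf/app.env] [a/b/conf/app.env]]
	fmt.Println(loadAll("shared.yaml", rel, dirs, true))  // [[a/b/conf/app.env] [b/conf/app.env]]
}
```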
- `TestIncludeDiamondDedup`: a depth-24 diamond that times out without the cache and completes in ~0.15s with it (both a correctness test and a non-flaky perf-regression test).
- `BenchmarkIncludeDiamond`.
- `go test ./...` passes; `gofmt` / `go vet` clean.