perf: replace execution-context map copies with a non-copying scope chain#378
Open
Richtermeister wants to merge 3 commits into
Open
perf: replace execution-context map copies with a non-copying scope chain#378Richtermeister wants to merge 3 commits into
Richtermeister wants to merge 3 commits into
Conversation
…chain
Child execution contexts (created by {% for %}, {% with %}, {% macro %},
and block.Super) previously copied the entire parent private map on every
creation, and each Execute merged the user context and set globals into a
fresh map. Both copies scaled O(N) with map size and ran on every render.
Replace both with a non-copying scope chain:
- ExecutionContext.Private changes from Context (map) to Scope, a method-only
handle backed by the ExecutionContext itself (no per-scope allocation). A
child layers its own data over the parent via the context chain; lookups
walk outward. The own layer is allocated lazily on first write. Delete uses
a tombstone to mask a parent key without mutating the parent, so delete and
range semantics are preserved.
- ExecutionContext.Public changes from Context (map) to PublicContext, a
read-only view over the user context and set globals. The user's map is used
directly (never copied); globals are a resolution fallback reached through
Get/Range. Method-only access means callers cannot bypass globals via direct
indexing, so resolution output is unchanged.
Rendering output is identical. The only peripheral change: globals are no
longer re-validated or macro-clash-checked on every Execute (only the user
context is).
Measured (BenchmarkExecuteLargeContext, 1000-key input map, template uses 3):
before: ~71us/op 82,720 B/op 21 allocs/op
after: ~29us/op 672 B/op 15 allocs/op
Context handling no longer scales with input-map size. Nested/block-heavy
templates render ~27% faster with ~25% less memory.
BREAKING CHANGE: custom tags must use methods instead of map indexing on
ExecutionContext.Private/Public (ctx.Private[k] -> ctx.Private.Get/Set, etc.).
See CHANGELOG.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-Execute clash check iterated the entire input context comparing each key against the template's exported macros. When the template exports no macros (the common case) there is nothing to clash with, so the O(N) scan is pure overhead. Guard it on len(tpl.exportedMacros) > 0. BenchmarkExecuteLargeContext (1000-key map, no exported macros): ~29us -> ~19us. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-Execute checkForValidIdentifiers scan byte-checks every key in the user context on every render. It is a usability guard — an invalid identifier key cannot be referenced from a template anyway — and its cost scales with the context size, which is wasteful for hot paths that render large, trusted maps. Add TemplateSet.SkipContextValidation (default false, validation enabled). When set, the check is skipped. Behavior is unchanged by default. BenchmarkExecuteLargeContext (1000-key map): ~19us/op with validation, ~0.65us/op with SkipContextValidation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
This PR removes the map copies that the execution context performs on every render, replacing them with a non-copying scope chain. It is a performance change with identical rendering output; the only API impact is that
ExecutionContext.Private/Publicbecome method-accessed types instead of bare maps (mechanical, compiler-flagged migration — details below).Two copies were happening on every
Execute, both scalingO(N)with map size:Executemerge — the user context and the set's globals were merged into a freshly allocated map (newContextForExecution), copying every entry even though a template typically references only a few keys.{% for %},{% with %},{% macro %},block.Super) copied the entire parent private map. This compounds with nesting depth.Approach
ExecutionContext.Private:Context→Scope. A child layers its own variables over its parent via a chain walked on lookup; nothing is copied.Scopeis a lightweight handle backed by theExecutionContextitself, so there is no extra per-scope allocation, and the own layer is allocated lazily on first write.Deleteuses a tombstone to mask a parent key without mutating the parent, so delete/range semantics are fully preserved.ExecutionContext.Public:Context→PublicContext. A read-only view over the user context and the set globals. The caller's map is used directly, never copied; globals are a resolution fallback reached throughGet/Range. Because access is method-only, callers cannot bypass globals by indexing a map — so resolution results are unchanged.Private→ user context → globals.Two follow-on wins on the same hot path:
TemplateSet.SkipContextValidation(defaultfalse) to opt out of the per-Executeidentifier validation in hot paths with large, trusted contexts.Benchmarks
go test -bench . -benchmem. Existing benchmarks (small contexts), median of 3:ExecuteBlocksExecuteBlocksDeepExecuteComplexThe win scales with context size and nesting depth. A new
BenchmarkExecuteLargeContext(1000-key context, template references 3 keys) makes the structural cost visible:SkipContextValidationContext handling no longer scales with map size — bytes are flat at 672 B whether the context has 30 keys or 10,000.
Why this is safe
{% for %}/{% with %}/{% macro %}scoping,block.Super,{% include %}/{% ssi %}context passing, and delete/range semantics all behave identically. The full suite (2960+ tests) passes;go vetandgofmtare clean.go test -raceonlinux/arm64(CGO_ENABLED=1): the full suite and the concurrentBenchmarkParallelExecuteComplexreport no data races. The scope chain is per-ExecutionContext, so concurrent renders never share private state. The only newly-shared read isset.Globals, which is read-only during rendering (no tag writes globals) — safe for concurrent reads.SkipContextValidationdefaults tofalse, so existing behavior is unchanged unless explicitly opted in.Breaking change (mechanical, compiler-flagged)
Custom tags that touch
ExecutionContext.Private/Publicmust use methods instead of indexing. The compiler flags every site, so nothing fails silently:All built-in tags and the bundled docs are updated in this PR. A
CHANGELOG.mdentry documents the migration.One intentional behavior note: with the un-merge, the set's globals are no longer re-validated/macro-clash-checked on every
Execute(only the user context is). Globals are set once and effectively trusted infrastructure.🤖 Generated with Claude Code