Skip to content

perf: replace execution-context map copies with a non-copying scope chain#378

Open
Richtermeister wants to merge 3 commits into
flosch:masterfrom
TheDMSGroup:perf/context-scope-chain
Open

perf: replace execution-context map copies with a non-copying scope chain#378
Richtermeister wants to merge 3 commits into
flosch:masterfrom
TheDMSGroup:perf/context-scope-chain

Conversation

@Richtermeister

Copy link
Copy Markdown
Contributor

Summary

This PR removes the map copies that the execution context performs on every render, replacing them with a non-copying scope chain. It is a performance change with identical rendering output; the only API impact is that ExecutionContext.Private/Public become method-accessed types instead of bare maps (mechanical, compiler-flagged migration — details below).

Two copies were happening on every Execute, both scaling O(N) with map size:

  1. Per-Execute merge — the user context and the set's globals were merged into a freshly allocated map (newContextForExecution), copying every entry even though a template typically references only a few keys.
  2. Per-child copy — every child context ({% for %}, {% with %}, {% macro %}, block.Super) copied the entire parent private map. This compounds with nesting depth.

Approach

  • ExecutionContext.Private: ContextScope. A child layers its own variables over its parent via a chain walked on lookup; nothing is copied. Scope is a lightweight handle backed by the ExecutionContext itself, so there is no extra per-scope allocation, and the own layer is allocated lazily on first write. Delete uses a tombstone to mask a parent key without mutating the parent, so delete/range semantics are fully preserved.
  • ExecutionContext.Public: ContextPublicContext. A read-only view over the user context and the set globals. The caller's map is used directly, never copied; globals are a resolution fallback reached through Get/Range. Because access is method-only, callers cannot bypass globals by indexing a map — so resolution results are unchanged.
  • Precedence is preserved exactly: Private → user context → globals.

Two follow-on wins on the same hot path:

  • Skip the macro-clash scan entirely when the template exports no macros.
  • New TemplateSet.SkipContextValidation (default false) to opt out of the per-Execute identifier validation in hot paths with large, trusted contexts.

Benchmarks

go test -bench . -benchmem. Existing benchmarks (small contexts), median of 3:

Benchmark Before After
ExecuteBlocks 907 ns, 1365 B, 13 allocs 650 ns, 1045 B, 11 allocs
ExecuteBlocksDeep 1596 ns, 2360 B, 22 allocs 1170 ns, 1720 B, 18 allocs
ExecuteComplex 15110 ns, 6560 B, 230 allocs 14950 ns, 6243 B, 228 allocs

The win scales with context size and nesting depth. A new BenchmarkExecuteLargeContext (1000-key context, template references 3 keys) makes the structural cost visible:

Stage Time/op Bytes/op
Original (copy + validate + clash) ~71 µs 82,720 B
+ scope chain (no copy) ~29 µs 672 B
+ skip clash when no macros ~19 µs 672 B
+ SkipContextValidation ~0.65 µs 672 B

Context handling no longer scales with map size — bytes are flat at 672 B whether the context has 30 keys or 10,000.

Why this is safe

  • Rendering output is unchanged. Lookup precedence, variable shadowing, {% for %}/{% with %}/{% macro %} scoping, block.Super, {% include %}/{% ssi %} context passing, and delete/range semantics all behave identically. The full suite (2960+ tests) passes; go vet and gofmt are clean.
  • Verified race-clean with go test -race on linux/arm64 (CGO_ENABLED=1): the full suite and the concurrent BenchmarkParallelExecuteComplex report no data races. The scope chain is per-ExecutionContext, so concurrent renders never share private state. The only newly-shared read is set.Globals, which is read-only during rendering (no tag writes globals) — safe for concurrent reads.
  • SkipContextValidation defaults to false, so existing behavior is unchanged unless explicitly opted in.

Breaking change (mechanical, compiler-flagged)

Custom tags that touch ExecutionContext.Private/Public must use methods instead of indexing. The compiler flags every site, so nothing fails silently:

ctx.Private[k]            → ctx.Private.Get(k)   // returns (any, bool)
ctx.Private[k] = vctx.Private.Set(k, v)
delete(ctx.Private, k)    → ctx.Private.Delete(k)
range ctx.Privatectx.Private.Range(func(k string, v any) bool { ... })
ctx.Public[k]             → ctx.Public.Get(k)    // includes globals

All built-in tags and the bundled docs are updated in this PR. A CHANGELOG.md entry documents the migration.

One intentional behavior note: with the un-merge, the set's globals are no longer re-validated/macro-clash-checked on every Execute (only the user context is). Globals are set once and effectively trusted infrastructure.

🤖 Generated with Claude Code

Richtermeister and others added 3 commits June 7, 2026 16:12
…chain

Child execution contexts (created by {% for %}, {% with %}, {% macro %},
and block.Super) previously copied the entire parent private map on every
creation, and each Execute merged the user context and set globals into a
fresh map. Both copies scaled O(N) with map size and ran on every render.

Replace both with a non-copying scope chain:

- ExecutionContext.Private changes from Context (map) to Scope, a method-only
  handle backed by the ExecutionContext itself (no per-scope allocation). A
  child layers its own data over the parent via the context chain; lookups
  walk outward. The own layer is allocated lazily on first write. Delete uses
  a tombstone to mask a parent key without mutating the parent, so delete and
  range semantics are preserved.
- ExecutionContext.Public changes from Context (map) to PublicContext, a
  read-only view over the user context and set globals. The user's map is used
  directly (never copied); globals are a resolution fallback reached through
  Get/Range. Method-only access means callers cannot bypass globals via direct
  indexing, so resolution output is unchanged.

Rendering output is identical. The only peripheral change: globals are no
longer re-validated or macro-clash-checked on every Execute (only the user
context is).

Measured (BenchmarkExecuteLargeContext, 1000-key input map, template uses 3):
  before: ~71us/op  82,720 B/op  21 allocs/op
  after:  ~29us/op       672 B/op  15 allocs/op
Context handling no longer scales with input-map size. Nested/block-heavy
templates render ~27% faster with ~25% less memory.

BREAKING CHANGE: custom tags must use methods instead of map indexing on
ExecutionContext.Private/Public (ctx.Private[k] -> ctx.Private.Get/Set, etc.).
See CHANGELOG.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-Execute clash check iterated the entire input context comparing each
key against the template's exported macros. When the template exports no
macros (the common case) there is nothing to clash with, so the O(N) scan is
pure overhead. Guard it on len(tpl.exportedMacros) > 0.

BenchmarkExecuteLargeContext (1000-key map, no exported macros): ~29us -> ~19us.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-Execute checkForValidIdentifiers scan byte-checks every key in the
user context on every render. It is a usability guard — an invalid identifier
key cannot be referenced from a template anyway — and its cost scales with the
context size, which is wasteful for hot paths that render large, trusted maps.

Add TemplateSet.SkipContextValidation (default false, validation enabled). When
set, the check is skipped. Behavior is unchanged by default.

BenchmarkExecuteLargeContext (1000-key map): ~19us/op with validation,
~0.65us/op with SkipContextValidation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented Jun 7, 2026

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant