
fix: enforce expression size limit before source construction#1302

Open
thesmartshadow wants to merge 4 commits into google:master from
thesmartshadow:fix/early-expression-size-limit-enforcement

Conversation

@thesmartshadow

This PR enforces the configured ParserExpressionSizeLimit in Env.Compile()
and Env.Parse() before calling common.NewTextSource(), preventing memory
allocation proportional to the full input size for oversized expressions.

Problem

The current flow is:

  1. Env.Compile(txt) → common.NewTextSource(txt)
  2. common.NewTextSource() → runes.NewBufferAndLineOffsets(contents)
  3. runes.newBuffer() eagerly allocates capacity proportional to the full input
  4. Only afterwards does the parser check buf.Len() > p.expressionSizeCodePointLimit

Applications configuring ParserExpressionSizeLimit(1000) expect oversized
expressions to be rejected cheaply, but the internal source/rune-buffer
materialization still occurs before enforcement.

Fix

Add an early utf8.RuneCountInString(txt) check in both Compile() and
Parse() before calling common.NewTextSource().

Before / After

  • Before: compiling a 100MB expression grows the heap from ~95MiB to 191MiB before the size-limit error is returned
  • After: the same compile stays at ~95MiB and returns the same error, with no memory spike

The test TestExpressionSizeLimitEarlyEnforcement confirms 0MiB allocation
delta for a 10MB oversized expression with a 1000 code point limit.

Enforce the configured ParserExpressionSizeLimit in Env.Compile() and
Env.Parse() before calling common.NewTextSource(), preventing memory
allocation proportional to the full input size for oversized expressions.

Previously, oversized expressions were correctly rejected but only after
the internal rune buffer had already been allocated. This allowed
substantial memory allocation even when a strict size limit was configured.

The fix adds an early utf8.RuneCountInString() check which is O(n) in CPU
but avoids the additional memory allocation and GC pressure caused by
eager source/rune-buffer construction.

CWE-400: Uncontrolled Resource Consumption
@thesmartshadow thesmartshadow force-pushed the fix/early-expression-size-limit-enforcement branch from 985387a to 9b5a353 Compare April 11, 2026 20:02
@TristonianJones TristonianJones requested a review from jcking April 12, 2026 22:52
@TristonianJones
Collaborator

/gcbrun

@jcking
Collaborator

jcking commented Apr 30, 2026

Decoding the entire string twice seems not ideal. I'd rather we just add additional funcs near common.NewTextSource and friends that take the limit and return an additional error. First we can check the string byte length (constant time) to see if it's over; otherwise we can enforce it as we go when building the buffer.

@thesmartshadow
Author

Hi @jcking, that makes sense.

I reworked the patch along those lines.

Instead of doing a separate full utf8.RuneCountInString() pass up front in Env.Compile / Env.Parse, the limit handling now lives closer to source construction.

The updated version adds limit-aware helpers near common.NewTextSource / NewStringSource, uses the cheap len(text) <= limit fast path, and otherwise enforces the code point limit incrementally while building the buffer / line offsets. It also keeps the existing default parser-limit behavior when no explicit limit is configured.

I reran the full test suite and the direct PoC as well. All tests are passing, and the compile-time memory spike is still gone with the updated approach.

Comment thread common/runes/buffer.go Outdated
@thesmartshadow
Author

I dropped the capHint logic and just used len(data) directly for the buffer capacities. I also added a comment there to make it a bit more obvious why that is safe.

I reran the full test suite and the direct PoC after the change. Everything is still passing on my side and the memory behavior still looks good.

Comment thread common/runes/buffer.go
}

-func newBuffer(data string, lines bool) (Buffer, []int32) {
+func newBufferWithLimit(data string, lines bool, limit int) (Buffer, []int32, error) {
Collaborator


I think we still get here if len(data) > limit, which means we will allocate the buffer even though we know we are going to go past it. I would still add the following after this if statement.

if limit >= 0 && len(data) > limit {
	return nil, nil, &SizeLimitError{
		Size:  countRemainingCodePoints(data, 0, 0),
		Limit: limit,
	}
}

Author


Yeah, I updated it so we return before any buffer allocation on the oversized path. I kept the code point count there so multibyte input still follows the same limit semantics.

I reran the tests after the change.
