Skip to content

fix: don't share datatype across recursive validator calls#100

Merged
nic-6443 merged 2 commits intoapi7:masterfrom
jarvis9443:fix/utf8len-on-non-string
Apr 23, 2026
Merged

fix: don't share datatype across recursive validator calls#100
nic-6443 merged 2 commits intoapi7:masterfrom
jarvis9443:fix/utf8len-on-non-string

Conversation

@jarvis9443
Copy link
Copy Markdown
Contributor

@jarvis9443 jarvis9443 commented Apr 23, 2026

What

The codegen stored each generated function's per-call datatype / datakind variables as fields on a shared locals table (introduced by #41 to dodge LuaJIT's 200-local limit). Because every generated schema validator is its own Lua function, recursing into a nested validator (e.g. via items or properties) overwrote the outer function's datatype with the inner value's type. After the recursion returned, the outer function would take the wrong branch.

Concrete crash

local jsonschema = require 'jsonschema'
local v = jsonschema.generate_validator({
  type = 'array', maxLength = 10, items = { type = 'string' }
})
v({ 'a', 'b' })
-- jsonschema.lua:386: bad argument #1 to 'str_byte' (string expected, got table)

The outer validator looped over string items (each call set datatype = 'string' on the shared locals table), and after the loop returned, if datatype == 'string' then was true, so the string-only maxLength branch ran on the array and called utf8_len on a table.

Fix

Use real local slots for datatype / datakind. Each generated validator function has its own 200-local budget, so the original LuaJIT workaround from #41 is unnecessary for these per-call vars. localvartab is still used for the cross-function validator forward declarations and the propset cache where shared storage is intentional.

Tests

Summary by CodeRabbit

  • Bug Fixes

    • Fixed validator recursion that could cause outer-type checks to run with incorrect state, preventing crashes when validating nested array/string schemas.
  • Tests

    • Added a regression test ensuring nested schema validation no longer shares per-call state and validates safely in recursive scenarios.

The codegen stored each generated function's per-call `datatype` /
`datakind` variables as fields on a shared `locals` table (introduced
by api7#41 to dodge LuaJIT's 200-local limit). Because every generated
schema validator is its own Lua function, recursing into a nested
validator (e.g. via `items` or `properties`) overwrote the outer
function's datatype with the inner value's type. After the recursion
returned, the outer function would take the wrong branch.

Concrete crash: a schema like

    { type = "array", maxLength = 10, items = { type = "string" } }

generated an outer validator that, after looping over string items, saw
`datatype == "string"` (the last item's type) and ran the
string-only `maxLength` check on the array value, calling `utf8_len`
on a table and crashing with 'string expected, got table'.

Fix: use real `local` slots for `datatype` / `datakind`. Each
generated validator function has its own 200-local budget, so the
original LuaJIT workaround is unnecessary for these per-call vars.
`localvartab` is still used for the cross-function validator forward
declarations (line 171) and propset cache (line 746) where shared
storage is intentional.

Adds a regression test in t/default.lua.
Copilot AI review requested due to automatic review settings April 23, 2026 11:37
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dafb53f5-ffc3-462a-8671-a5dee3c03209

📥 Commits

Reviewing files that changed from the base of the PR and between d963ff9 and 249a917.

📒 Files selected for processing (1)
  • t/default.lua
🚧 Files skipped from review as they are similar to previous changes (1)
  • t/default.lua

📝 Walkthrough

Walkthrough

Emits datatype and datakind as per-validator local variables (via ctx:localvar) in generate_validator to avoid shared-state mutation during recursive validator generation; adds a regression test ensuring recursive schemas don't corrupt per-call datatype state.

Changes

Cohort / File(s) Summary
Code generation fix
lib/jsonschema.lua
Change in generate_validator: emit datatype and datakind as actual per-validator local variables instead of entries in the shared locals table, preventing recursive generation from mutating outer-call state.
Regression test
t/default.lua
Add test verifying jsonschema.generate_validator handles a schema with type="array" and items={type="string"} without propagating datatype/datakind state across recursive validators (prevents crash when validating arrays).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main fix: preventing datatype variable sharing across recursive validator calls, which is the core issue addressed in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
E2e Test Quality Review ✅ Passed Regression test properly validates the core bug fix where per-call datatype state is no longer shared across recursive validator calls, with clear assertions and excellent documentation.
Security Check ✅ Passed PR fixes a resource isolation bug where datatype and datakind variables were shared across recursive validator calls, causing state corruption. No sensitive data handling, credentials, auth logic, or TLS operations present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a codegen reentrancy bug where per-call datatype/datakind state was stored in a shared locals table, allowing nested validator recursion to clobber the caller’s type state and trigger wrong-branch validation (including crashes).

Changes:

  • Switch datatype / datakind in generated validators from ctx:localvartab(...) (shared table fields) to ctx:localvar(...) (true per-function locals).
  • Add a regression test covering the recursive clobbering scenario involving items recursion and maxLength.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
lib/jsonschema.lua Moves per-call type state to real locals to prevent recursion from mutating outer validator state.
t/default.lua Adds regression coverage for the recursive datatype clobbering crash scenario.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread t/default.lua Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@t/default.lua`:
- Around line 249-250: The test currently only checks that calling
pcall(validator, { "a", "b", "c" }) did not error (local ok, err = ...), which
can hide a validator failure (validator returns false, error_msg); update the
assertion to capture pcall success and the validator result separately (e.g.,
local pcall_ok, valid, val_err = pcall(validator, { "a", "b", "c" })) and then
assert both pcall_ok and valid, including both pcall error and validator error
in the failure message so you fail the test when validation returns false as
well as when pcall throws.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2748523f-4f76-43bb-a7d5-4e26b9c89cc6

📥 Commits

Reviewing files that changed from the base of the PR and between cf8fbad and d963ff9.

📒 Files selected for processing (2)
  • lib/jsonschema.lua
  • t/default.lua

Comment thread t/default.lua Outdated
…rrors

Per coderabbit nit: the previous `assert(ok, ...)` only caught pcall
crashes; if the validator returned `false, err` instead of erroring,
the test would silently pass.
@jarvis9443
Copy link
Copy Markdown
Contributor Author

Addressed in 249a917 — split the assertion to check pcall success and the validator's boolean result separately.

@nic-6443 nic-6443 merged commit 92b1ea2 into api7:master Apr 23, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants