fix: address QA defects found by 10-spec public-API corpus#8
Conversation
Multi-spec QA against 10 widely-adopted public OpenAPI specs (github, twilio, box, cloudflare, plaid, openai, square, discourse, adyen-checkout, notion — 13,148 cases) surfaced four validator-side defects plus one jsonschema-side defect (the latter fixed in api7/jsonschema#100). This commit fixes all four validator-side bugs and adds regression coverage. Bug 1 — router: path parameter dropped when followed by literal suffix ==================================================================== lua-resty-radixtree's fetch_pat() treats any '/'-segment whose first byte is ':' as a single-segment param, discarding the literal suffix. Templates like '/users/{id}.json' or '/files/{name}.{ext}' became regex 'users/([^/]+)' with the param stored under name 'id.json' (or 'name}.{ext}'), so match() looked up the param under its declared name and found nil. Effect on the corpus: 195/197 Twilio operations and ~30 Discourse operations could never validate a positive request. Fix: detect mixed segments at compile time, build a per-route PCRE '^(prefix)<literal>(?P<id>[^/]+?)<literal>...$' and use ngx.re.match to re-extract path params after radixtree returns. Non-greedy '+?' is critical for '{name}.{ext}' patterns. base_paths are stripped so the PCRE only sees the spec-relative path. Bug 2 — normalize: nullable enum with explicit null silently disables ==================================================================== For schemas like '{type: string, nullable: true, enum: ["a", null]}', normalize wrapped them as 'anyOf [original, {type: null}]' but kept cjson.null inside the cloned enum. The cjson.null userdata in the enum array caused api7/jsonschema to silently skip the enum check entirely — any string value passed (Plaid corpus: 12 false negatives). Fix: when wrapping nullable schemas, strip cjson.null from the cloned enum (and clear const if it equals cjson.null). The added '{type:null}' branch already permits null, so the cjson.null entries were redundant. Bug 4 — params: pipeDelimited / spaceDelimited / form?explode=false ==================================================================== The array-style branches called split() on raw_value without checking whether it was already a Lua table (multi-value query input). Crashed with 'attempt to index a table value' on Box's deepObject-adjacent operations (48 crashes). Fix: coerce table raw_value to a delimiter-joined string before splitting. The form/explode=true branch was already table-safe. Bug 5 — body: cjson.null content_type crash ========================================== 'if route.body_content and content_type then' accepted cjson.null because userdata is truthy in Lua, then dereferenced it as a string. Easy to hit when callers decode requests via cjson.decode and pass fields through without sanitization. Fix: guard with 'type(content_type) == "string"'. Verification ============ Re-ran the full 13,148-case corpus with these fixes. Highlights: - discourse positive: 62/93 → 91/93 (Bug 1) - box positive: 246/296 → 277/296; crashes 48 → 0 (Bug 4) - cloudflare crashes: 17 → 0 (Bug 3, via jsonschema#100) - plaid bad_enum_body: 42/54 → 54/54 (Bug 2) - All existing unit + conformance tests still pass. Tests ===== Adds t/conformance/test_qa_bugs_v103.lua with one focused case per bug (compile-and-validate, no real HTTP).
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 5 minutes and 15 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThe PR fixes multiple issues: content-type validation crashes when non-string values are encountered, nullable schemas with enum/const containing null sentinels are incorrectly normalized, array parameters arriving as tables in delimited styles lack proper coercion, and path templates with mixed literal text and parameters are not matched accurately. A new conformance test suite validates all fixes. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Router
participant Radixtree
participant PCRE
participant ParamExtractor
Client->>Router: Request with path /users/{id}.json
Router->>Radixtree: Match path against tree
Radixtree-->>Router: Return route candidate (with base_path)
Note over Router: Detect mixed segment<br/>(contains {param} + literal suffix)
Router->>PCRE: Execute template-specific PCRE<br/>pattern with path
PCRE-->>Router: Capture groups with<br/>parameter mappings
alt PCRE Match Success
Router->>ParamExtractor: Extract params from<br/>PCRE captures
ParamExtractor-->>Router: Return accurate<br/>parameter values
Router-->>Client: Route match + params
else PCRE Match Fails
Router-->>Client: No match (override<br/>radixtree result)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
Fixes validator defects discovered by running a large public OpenAPI corpus, improving routing correctness, schema normalization, and parameter/body handling; adds targeted regression tests.
Changes:
- Router: add mixed-segment detection and PCRE-based re-extraction of path params for templates like
/users/{id}.json. - Normalization: ensure nullable enum/const normalization doesn’t leak
cjson.nullinto jsonschema validation. - Params/body: make delimiter-style array params table-safe; add a guard for non-string
content_type; add conformance regression tests.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
t/conformance/test_qa_bugs_v103.lua |
Adds regression cases covering the QA-found defects. |
lib/resty/openapi_validator/router.lua |
Adds mixed-segment detection and fallback param extraction via ngx.re.match. |
lib/resty/openapi_validator/params.lua |
Coerces table query values for delimiter-based array styles before splitting. |
lib/resty/openapi_validator/normalize.lua |
Cleans cjson.null out of nullable enum and adjusts nullable wrapping behavior. |
lib/resty/openapi_validator/body.lua |
Adds a type guard around the content-type declared-in-spec check. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/resty/openapi_validator/body.lua`:
- Around line 363-367: The current guard only checks content_type before one
str_lower call, but non-string content_type can still reach other str_lower
usages (e.g., later checks that assume a string). Sanitize content_type once at
the start of the validation scope (e.g., in the function that references
route.body_content) by replacing non-strings with nil: if type(content_type) ~=
"string" then content_type = nil end; then use the sanitized content_type for
all subsequent calls to str_lower and helper functions so no non-string value
ever reaches string.lower (refer to the variables content_type,
route.body_content and the str_lower calls).
In `@lib/resty/openapi_validator/router.lua`:
- Around line 323-338: Remove the unused local ngx_re and the unused err from
the fallback matcher: delete the unused require("ngx.re") assignment (remove the
ngx_re local) and change the re_match call that currently does "local mm, err =
re_match(rel, route.param_pcre, 'jo')" to either capture only mm or use a
throwaway variable (e.g., local mm, _ = re_match(...)) so that err is not an
unused local; this touches the fallback matcher that references re_match and
route.param_pcre.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2bf2b5ef-dcc9-4b71-b156-055394f5b922
📒 Files selected for processing (5)
lib/resty/openapi_validator/body.lualib/resty/openapi_validator/normalize.lualib/resty/openapi_validator/params.lualib/resty/openapi_validator/router.luat/conformance/test_qa_bugs_v103.lua
- body.lua: normalize non-string content_type to nil at the top of
validate() so all downstream sites (find_body_schema_for_content_type,
is_json_content_type, ...) treat it like an absent header. The
previous spot-fix only guarded the 'declared in spec' check.
- normalize.lua: previously, nullable+const-equals-null cleared the
const and produced an empty 'original' branch in the anyOf wrapper,
which would accept ANY value. Collapse to {type:'null'} instead so
the schema correctly accepts only null. Same collapse applied when
cleaning enum leaves an empty array (i.e. the enum was [null]).
- t/conformance/test_qa_bugs_v103.lua: the Bug 5 case used to early-
return on empty body, missing the actual crash path. Now sends a
non-empty body so content_type is evaluated. Added Bug 2b case for
the nullable+const=null collapse.
|
Addressed in d7e6c85:
|
Summary
Multi-spec QA against 10 widely-adopted public OpenAPI specs (github, twilio, box, cloudflare, plaid, openai, square, discourse, adyen-checkout, notion — 13,148 cases) surfaced four validator-side defects plus one jsonschema-side defect (the latter is fixed in api7/jsonschema#100). This PR fixes all four validator-side bugs and adds regression coverage.
QA report (with verification numbers):
api7ee/qa/lua-resty-openapi-validator-v1.0.3.md.Bugs fixed
Bug 1 — router: path parameter dropped when followed by a literal suffix
lua-resty-radixtree'sfetch_pat()treats any/-segment whose first byte is:as a single-segment param, discarding the literal suffix. Templates like/users/{id}.jsonor/files/{name}.{ext}became regexusers/([^/]+)with the param stored under nameid.json(orname}.{ext}), somatch()looked up the param under its declared name and got nil.Effect on corpus: 195/197 Twilio operations and ~30 Discourse operations could never validate a positive request.
Fix: detect mixed segments at compile time, build a per-route PCRE
^(prefix)<literal>(?P<id>[^/]+?)<literal>...$and usengx.re.matchto re-extract path params after radixtree returns. Non-greedy+?is critical for{name}.{ext}patterns.base_pathsare stripped so the PCRE only sees the spec-relative path.Bug 2 — normalize: nullable enum with explicit
nullsilently disables the enum checkFor schemas like
{type: string, nullable: true, enum: ["a", null]}, normalize wrapped them asanyOf [original, {type: null}]but keptcjson.nullinside the cloned enum. Thecjson.nulluserdata in the enum array causedapi7/jsonschemato silently skip the enum check entirely — any string value passed (Plaid corpus: 12 false negatives).Fix: when wrapping nullable schemas, strip
cjson.nullfrom the cloned enum (and clearconstif it equalscjson.null). The added{type: null}branch already permits null, so thecjson.nullentries were redundant.Bug 4 — params: pipeDelimited / spaceDelimited / form?explode=false crash on table value
The array-style branches called
split()onraw_valuewithout checking whether it was already a Lua table (multi-value query input). Crashed withattempt to index a table valueon Box's deepObject-adjacent operations (48 crashes).Fix: coerce table
raw_valueto a delimiter-joined string before splitting. Theform/explode=truebranch was already table-safe.Bug 5 — body:
cjson.nullcontent_type crashif route.body_content and content_type thenacceptedcjson.nullbecause userdata is truthy in Lua, then dereferenced it as a string. Easy to hit when callers decode requests viacjson.decodeand pass fields through without sanitization.Fix: guard with
type(content_type) == "string".Verification
Re-ran the full 13,148-case corpus with these fixes (out/*.fixed.results.jsonl). Highlights:
Twilio positive remains at 2/197, but the failures changed from "path param missing" to legitimate "value too short for minLength: 34" — that's a test-data artefact (Twilio SIDs are 34-char prefixed IDs; the generator emits
"1"), not a validator defect.All existing unit + conformance tests still pass.
Tests
Adds
t/conformance/test_qa_bugs_v103.luawith one focused case per bug (compile-and-validate, no real HTTP).Summary by CodeRabbit
Bug Fixes
Tests