Fix #1557: eagerly validate separator after non-root number values#1615
Draft
seonwooj0810 wants to merge 2 commits into
Draft
Fix #1557: eagerly validate separator after non-root number values#1615seonwooj0810 wants to merge 2 commits into
seonwooj0810 wants to merge 2 commits into
Conversation
…values Number values inside Arrays/Objects were only checked for a valid trailing separator at root level (cf. FasterXML#105 / FasterXML#679). Non-root values pushed the trailing character back unchecked, so malformed input such as `[ 123true ]`, `[ 100k ]` or `[ 1.5x ]` reported a number token and only failed lazily when accessing the *following* token. Generalize the existing root-only `_verifyRootSpace` into `_verifyNumberSeparator`, applied at every number-completion site in the three blocking parsers (Reader / UTF-8 stream / UTF-8 DataInput). A non-root number must now be followed by white space, `,`, `]`, `}`, a comment start (when comments are enabled) or end-of-input; otherwise an "expected space..." error is reported at the offending character. The check reuses the trailing character already read to detect the end of the number and is a constant-time switch, so the hot path cost is negligible. Tests: add non-root verification (and comment-adjacent cases) to `NonStandardNumberParsingTest` across ALL_MODES; remove the now-fixed `tofix` repros for the blocking parsers. Async (non-blocking) parser is intentionally left as a follow-up; its `tofix` repro remains under `@JacksonTestFailureExpected`.
77ab9b7 to
1ae30da
Compare
Contributor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1557
Problem
Number values inside Arrays / Objects (non-root) were not checked for a
valid trailing separator. Only root-level values were validated (added in #105,
extended in #679 for the second-decimal-point case). For non-root numbers the
parser simply pushed the trailing character back and returned the number token,
so malformed content failed lazily — only when the next token was read:
Approach
Generalize the existing root-only
_verifyRootSpaceinto a_verifyNumberSeparatorhelper, called at every number-completion site in thethree blocking parsers (
ReaderBasedJsonParser,UTF8StreamJsonParser,UTF8DataInputJsonParser). Root values keep the exact previous behavior; anon-root number must now be followed by one of:
' ','\t','\n','\r'),]}/or#when the correspondingJsonReadFeatureis enabledotherwise an
expected space, comma, or closing bracket/brace ...error isreported at the offending character (location aligned with the root path).
Performance: the check reuses the trailing character that the number loop
already reads to detect end-of-number (no extra buffer access / look-ahead),
and the accept/reject decision is a constant-time
switch. The hot path cost istherefore negligible, and the change keeps the same performance shape as the
existing
_verifyRootSpaceit generalizes.Scope / follow-up
This PR covers the three blocking parsers. The async (non-blocking)
parser still fails lazily for this case — its number state machine completes at
several different points (including buffer-boundary resumption) and deserves a
separate, carefully-tested change. Its reproduction remains under
tofix/async/AsyncTokenBranchNumberErrorTestwith@JacksonTestFailureExpected.@cowtowncoder — do you want async handled in this PR, or as a separate follow-up?
Happy to extend if preferred.
Tests
nonRootMangledIntegers1557/nonRootMangledFloats1557toNonStandardNumberParsingTest(verified acrossALL_MODES).nonRootNumberFollowedByComment1557to prove the/and#branches:numbers immediately adjacent to comments (
[1/* c */,2],[1// c\n,2],[1#c\n,2]) still parse when comments are enabled.tofixrepros for the blocking parsers (assertions movedinto the suite above), mirroring how Number parsing should fail for trailing dot (period) #679 was handled.
Full
./mvnw clean testis green.