Commit 4286a94

option to allow trailing characters while parsing (#439)
* option to allow trailing characters while parsing

  This adds an option `allowtrailing` to tolerate additional trailing characters in the buffer while parsing JSON. It is off by default, which keeps the parser strict: it tries to parse the entire buffer as JSON. When switched on, it parses a valid JSON value from the beginning of the buffer and ignores any characters that follow.

  This is useful in parsing scenarios that contain multiple JSON objects without a delimiter, e.g. `{"name": "value"}{"name": "value"}`, or a JSON value followed by other characters, e.g. `{"name": "value"} : this is...`.

  This also matches the pre-1.x behavior of this package.

* use isroot instead

* add docs, bump minor version
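As a minimal sketch of the behavior described above, assuming JSON.jl 1.5.0 or later is installed: the input string is taken from the tests added in this commit, while the variable names `buf`, `strict_failed`, and `first_value` are illustrative only.

```julia
using JSON  # assumes JSON.jl >= 1.5.0, where the `isroot` keyword exists

# A JSON object followed by non-JSON trailing characters.
buf = "{\"hello\": \"world\"} asdaa"

# Default (isroot=true): the whole buffer must be one JSON value,
# so the trailing characters raise an ArgumentError.
strict_failed = try
    JSON.parse(buf)
    false
catch err
    err isa ArgumentError
end

# isroot=false: only the first JSON value is parsed; the rest is ignored.
first_value = JSON.parse(buf; isroot=false)
```

Note that `JSON.parse` returns only the parsed value here, so this sketch does not show how to continue parsing from where the first value ended.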
1 parent f4fbb5a commit 4286a94

4 files changed

Lines changed: 20 additions & 3 deletions

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 name = "JSON"
 uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
-version = "1.4.0"
+version = "1.5.0"
 
 [deps]
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"

src/lazy.jl

Lines changed: 3 additions & 2 deletions
@@ -59,6 +59,7 @@ Currently supported keyword arguments include:
 - `inf::String = "Infinity"`: the string that will be used to parse `Inf` if `allownan=true`
 - `nan::String = "NaN"`: the string that will be sued to parse `NaN` if `allownan=true`
 - `jsonlines::Bool = false`: whether the JSON input should be treated as an implicit array, with newlines separating individual JSON elements with no leading `'['` or trailing `']'` characters. Common in logging or streaming workflows. Defaults to `true` when used with `JSON.parsefile` and the filename extension is `.jsonl` or `ndjson`. Note this ensures that parsing will _always_ return an array at the root-level.
+- `isroot::Bool = true`: whether this is the root LazyValue encompassing the entire json buffer. If `false` parses only the first JSON value and ignores trailing characters.
 
 Note that validation is only fully done on `null`, `true`, and `false`,
 while other values are only lazily inferred from the first non-whitespace character:
@@ -88,7 +89,7 @@ lazyfile(file; jsonlines::Union{Bool, Nothing}=nothing, kw...) = open(io -> lazy
 
 @doc (@doc lazy) lazyfile
 
-function lazy(buf::Union{AbstractVector{UInt8}, AbstractString}; kw...)
+function lazy(buf::Union{AbstractVector{UInt8}, AbstractString}; isroot::Bool=true, kw...)
     if !applicable(pointer, buf, 1) || (buf isa AbstractVector{UInt8} && !isone(only(strides(buf))))
         if buf isa AbstractString
             buf = String(buf)
@@ -116,7 +117,7 @@ function lazy(buf::Union{AbstractVector{UInt8}, AbstractString}; kw...)
     # detect and ignore UTF-8 BOM
     pos = (len >= 3 && getbyte(buf, pos) == 0xef && getbyte(buf, pos + 1) == 0xbb && getbyte(buf, pos + 2) == 0xbf) ? pos + 3 : pos
     @nextbyte
-    return _lazy(buf, pos, len, b, LazyOptions(; kw...), true)
+    return _lazy(buf, pos, len, b, LazyOptions(; kw...), isroot)
 
 @label invalid
     invalid(error, buf, pos, Any)

src/parse.jl

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ Currently supported keyword arguments include:
 * `inf`: string to use for `Inf` (default: `"Infinity"`)
 * `nan`: string to use for `NaN` (default: `"NaN"`)
 * `jsonlines`: treat the `json` input as an implicit JSON array, delimited by newlines, each element being parsed from each row/line in the input
+* `isroot`: whether this is the root LazyValue encompassing the entire json buffer. If `false` parses only the first JSON value and ignores trailing characters. (default: `true`)
 * `dicttype`: a custom `AbstractDict` type to use instead of `$DEFAULT_OBJECT_TYPE` as the default type for JSON object materialization
 * `null`: a custom value to use for JSON null values (default: `nothing`)
 * `style`: a custom `StructUtils.StructStyle` subtype instance to be used in calls to `StructUtils.make` and `StructUtils.lift`. This allows overriding

test/parse.jl

Lines changed: 15 additions & 0 deletions
@@ -771,3 +771,18 @@ end
     @test_throws ArgumentError JSON.parse("{}", Tuple{Int, Int, Int})
     @test_throws ArgumentError JSON.parse("{\"a\":1,\"b\":2}", Tuple{Int, Int, Int})
 end
+
+@testset "isroot=false allows trailing" begin
+    # default behavior: trailing content causes an error
+    @test_throws ArgumentError JSON.parse("{\"hello\": \"world\"} asdaa")
+    @test_throws ArgumentError JSON.parse("[1,2,3] extra")
+    @test_throws ArgumentError JSON.parse("123 {}")
+
+    # isroot=false: trailing content is ignored
+    @test JSON.parse("{\"hello\": \"world\"} asdaa", isroot=false) == JSON.Object("hello" => "world")
+    @test JSON.parse("[1,2,3] extra", isroot=false) == Any[1, 2, 3]
+    @test JSON.parse("123 {}", isroot=false) == 123
+
+    # isroot=false with typed parse
+    @test JSON.parse("{\"a\": 1, \"b\": 2.0, \"c\": \"hi\"} trailing", D; isroot=false) == D(1, 2.0, "hi")
+end
