Commit 69d0f44
Support JSON arrays reader/parse for datafusion (#19924)
## Which issue does this PR close?
Closes #19920
## Rationale for this change
DataFusion currently only supports line-delimited JSON (NDJSON) format.
Many data sources provide JSON in array format `[{...}, {...}]`, which
cannot be parsed by the existing implementation.
## What changes are included in this PR?
- Add `newline_delimited` option to `JsonOptions` (default `true` for
backward compatibility)
- Implement streaming JSON array to NDJSON conversion via
`JsonArrayToNdjsonReader`
- Support both file-based and stream-based (e.g., S3) reading with
memory-efficient streaming
- Add `ChannelReader` for async-to-sync byte transfer in object store
streaming scenarios
- Add protobuf serialization support for the new option
- Rename `NdJsonReadOptions` to `JsonReadOptions` (with deprecation
alias)
- SQL support via `OPTIONS ('format.newline_delimited' 'false')`
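
The async-to-sync hand-off mentioned above can be illustrated with a minimal sketch. It is not the PR's `ChannelReader`: the name `ChunkReader`, the helper `read_all_via_channel`, and the use of a bounded `std::sync::mpsc` channel with a producer thread standing in for the async object-store task are all assumptions made to keep the example self-contained.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical blocking reader fed by byte chunks arriving over a channel.
struct ChunkReader {
    rx: Receiver<io::Result<Vec<u8>>>,
    current: Vec<u8>, // chunk currently being drained
    pos: usize,       // read offset into `current`
}

impl ChunkReader {
    fn new(rx: Receiver<io::Result<Vec<u8>>>) -> Self {
        Self { rx, current: Vec::new(), pos: 0 }
    }
}

impl Read for ChunkReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill once the current chunk is exhausted; empty chunks are skipped.
        while self.pos == self.current.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.current = chunk?; // propagate an upstream I/O error
                    self.pos = 0;
                }
                Err(_) => return Ok(0), // sender dropped: end of stream
            }
        }
        let n = out.len().min(self.current.len() - self.pos);
        out[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Demo helper (assumption): a thread plays the role of the async producer.
fn read_all_via_channel(chunks: Vec<Vec<u8>>) -> io::Result<Vec<u8>> {
    // The bounded channel applies backpressure: at most 2 chunks are queued.
    let (tx, rx) = sync_channel(2);
    let producer = thread::spawn(move || {
        for c in chunks {
            if tx.send(Ok(c)).is_err() {
                break; // reader went away (e.g. cancellation)
            }
        }
    });
    let mut buf = Vec::new();
    ChunkReader::new(rx).read_to_end(&mut buf)?;
    producer.join().expect("producer thread panicked");
    Ok(buf)
}
```

The bounded channel is the design point here: the producer blocks when the consumer falls behind, so memory stays proportional to the channel capacity rather than to the file size.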
### Architecture
```text
JSON Array File (e.g., 33GB)
│
▼ read chunks via ChannelReader (for streams) or BufReader (for files)
┌───────────────────┐
│ JsonArrayToNdjson │ ← streaming character substitution:
│ Reader │ '[' skip, ',' → '\n', ']' stop
└───────────────────┘
│
▼ outputs NDJSON format
┌───────────────────┐
│ Arrow Reader │ ← batch parsing
└───────────────────┘
│
▼ RecordBatch
```
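
The substitution step in the diagram can be sketched as a small `Read` adapter. This is a simplified illustration under stated assumptions, not the PR's `JsonArrayToNdjsonReader`: the type name `ArrayToNdjson` is hypothetical, the input is assumed to be a well-formed top-level JSON array, and no validation is performed. The sketch tracks nesting depth and string state so that only commas separating top-level elements become newlines; commas and brackets inside nested values or string literals pass through unchanged.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: consumes a top-level JSON array, yields NDJSON bytes.
struct ArrayToNdjson<R: Read> {
    inner: R,
    depth: usize,    // nesting depth of [] and {} outside string literals
    in_string: bool, // currently inside a JSON string literal
    escaped: bool,   // previous byte was a backslash inside a string
    done: bool,      // the closing top-level ']' has been seen
}

impl<R: Read> ArrayToNdjson<R> {
    fn new(inner: R) -> Self {
        Self { inner, depth: 0, in_string: false, escaped: false, done: false }
    }

    /// Maps one input byte to at most one output byte.
    fn step(&mut self, b: u8) -> Option<u8> {
        if self.in_string {
            if self.escaped {
                self.escaped = false;
            } else if b == b'\\' {
                self.escaped = true;
            } else if b == b'"' {
                self.in_string = false;
            }
            return Some(b);
        }
        match b {
            b'"' => {
                self.in_string = true;
                Some(b)
            }
            b'[' | b'{' => {
                self.depth += 1;
                // Only the outermost '[' is skipped.
                if self.depth == 1 { None } else { Some(b) }
            }
            b']' | b'}' => {
                self.depth = self.depth.saturating_sub(1);
                if self.depth == 0 {
                    self.done = true; // top-level ']' ends the stream
                    None
                } else {
                    Some(b)
                }
            }
            // A comma between top-level elements becomes a record separator.
            b',' if self.depth == 1 => Some(b'\n'),
            // Whitespace around top-level elements is dropped.
            b' ' | b'\n' | b'\r' | b'\t' if self.depth <= 1 => None,
            _ => Some(b),
        }
    }
}

impl<R: Read> Read for ArrayToNdjson<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        loop {
            if self.done || out.is_empty() {
                return Ok(0);
            }
            let n = self.inner.read(out)?;
            if n == 0 {
                return Ok(0);
            }
            // Compact in place: output never exceeds input, so w <= i holds.
            let mut w = 0;
            for i in 0..n {
                if self.done {
                    break;
                }
                let b = out[i];
                if let Some(o) = self.step(b) {
                    out[w] = o;
                    w += 1;
                }
            }
            if w > 0 {
                return Ok(w);
            }
            // Every byte in this chunk was skipped; read the next chunk.
        }
    }
}
```

Because the adapter holds only a few flags and rewrites each chunk in the caller's buffer, its memory use does not grow with the input, which is the property the diagram relies on. Wrapping any `Read` source, for example `ArrayToNdjson::new(file)`, yields a reader that a line-delimited JSON parser can consume.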
### Memory Efficiency
| Approach | Memory for 33GB file | Parse count |
|----------|---------------------|-------------|
| Load entire file + serde_json | ~100GB+ | 3x |
| Streaming with JsonArrayToNdjsonReader | ~32MB | 1x |
## Are these changes tested?
Yes:
- Unit tests for `JsonArrayToNdjsonReader` (nested objects, escaped
strings, empty arrays, buffer boundaries)
- Unit tests for `ChannelReader`
- Integration tests for `JsonOpener` (file-based, stream-based, large
files, cancellation)
- Schema inference tests (normal, empty, nested struct, list types)
- End-to-end query tests with SQL
- SQLLogicTest for SQL validation
## Are there any user-facing changes?
Yes. Users can now read JSON array format files:
**Via SQL:**
```sql
CREATE EXTERNAL TABLE my_table
STORED AS JSON
OPTIONS ('format.newline_delimited' 'false')
LOCATION 'path/to/array.json';
```
**Via API:**
```rust
let options = JsonReadOptions::default().newline_delimited(false);
ctx.register_json("my_table", "path/to/array.json", options).await?;
```
**Note:** `NdJsonReadOptions` is deprecated in favor of
`JsonReadOptions`.
**Limitation:** JSON array format does not support range-based file
scanning (`repartition_file_scans`). Users will see a clear error
message if this is attempted.

1 parent dff1cad, commit 69d0f44
File tree: 28 files changed (+1847, -85 lines)
- datafusion-examples/examples/custom_data_source
- datafusion
- common/src
- core
- src
- dataframe
- datasource
- file_format
- listing
- physical_plan
- execution/context
- tests
- dataframe
- data
- datasource-json
- src
- proto-common
- proto
- src
- from_proto
- generated
- to_proto
- proto
- src
- generated
- logical_plan
- tests/cases
- sqllogictest/test_files