Commit 14f74f6

Merge pull request #17 from clue-labs/docs
Improve documentation and add NDJSON format description
2 parents d7e3fc7 + e2847f7 commit 14f74f6

4 files changed: +84 -1 lines changed

README.md

Lines changed: 75 additions & 0 deletions
@@ -22,6 +22,7 @@ remote procedure call (RPC) mechanism.

 **Table of contents**

+* [NDJSON format](#ndjson-format)
 * [Usage](#usage)
 * [Decoder](#decoder)
 * [Encoder](#encoder)
@@ -30,6 +31,76 @@ remote procedure call (RPC) mechanism.
 * [License](#license)
 * [More](#more)

+## NDJSON format
+
+NDJSON ("Newline-Delimited JSON" or sometimes referred to as "JSON lines") is a
+very simple text-based format for storing a large number of records, such as a
+list of user records or log entries.
+
+```JSON
+{"name":"Alice","age":30,"comment":"Yes, I like cheese"}
+{"name":"Bob","age":50,"comment":"Hello\nWorld!"}
+```
+
+If you understand JSON and you're now looking at this newline-delimited JSON for
+the first time, you should already know everything you need to know to
+understand NDJSON: As the name implies, this format essentially consists of
+individual lines where each line is any valid JSON text and is terminated by a
+newline character.
+
+This example uses a list of user objects where each user has some arbitrary
+properties. This can easily be adjusted for many different use cases, such as
+storing products instead of users, assigning additional properties or having a
+significantly larger number of records. You can edit NDJSON files in any text
+editor or use them in a streaming context where individual records should be
+processed. Unlike normal JSON files, adding a new record to this NDJSON file
+does not require modification of this file's structure (note there's no "outer
+array" to be modified). This makes it a perfect fit for line-oriented CLI tools
+(such as `grep` and others) or for a logging context where you want to append
+records at a later time. Additionally, this allows it to be used in a streaming
+context, such as a simple inter-process communication (IPC) protocol or a
+remote procedure call (RPC) mechanism.
+
+The newline character at the end of each line allows for some really simple
+*framing* (detecting individual records). While each individual line is valid
+JSON, the complete file as a whole is technically no longer valid JSON, because
+it contains multiple JSON texts. This means that, for example, calling PHP's
+`json_decode()` on this complete input would fail because it would try to parse
+multiple records at once. Likewise, "pretty printing" JSON
+(`JSON_PRETTY_PRINT`) is not allowed because each JSON text is limited to exactly
+one line. On the other hand, values containing newline characters (such as the
+`comment` property in the above example) do not cause issues because each newline
+within a JSON string will be represented by a `\n` instead.
+
+One common alternative to NDJSON would be Comma-Separated Values (CSV).
+If you want to process CSV files, you may want to take a look at the related
+project [clue/reactphp-csv](https://github.com/clue/reactphp-csv) instead:
+
+```
+name,age,comment
+Alice,30,"Yes, I like cheese"
+Bob,50,"Hello
+World!"
+```
+
+CSV may look slightly simpler, but this simplicity comes at a price. CSV is
+limited to untyped, two-dimensional data, so there's no standard way of storing
+nested structures or differentiating a boolean value from a string or integer.
+Field names are sometimes used, sometimes they're not (application-dependent).
+Inconsistent handling of fields that contain separators such as `,`, spaces or
+line breaks (see the `comment` field above) introduces additional complexity.
+Its text encoding is usually undefined: Unicode (or UTF-8) is unlikely to be
+supported and CSV files often use ISO 8859-1 encoding or some variant (again
+application-dependent).
+
+While NDJSON helps avoid many of CSV's shortcomings, it is still a
+(relatively) young format while CSV files have been used in production systems
+for decades. This means that if you want to interface with an existing system,
+you may have to rely on the format that's already supported. If you're building
+a new system, using NDJSON is an excellent choice as it provides a flexible way
+to process individual records using a common text-based format that can include
+any kind of structured data.
+
 ## Usage

 ### Decoder
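
The framing rules described in the new "NDJSON format" section can be sketched in plain PHP. This is only an illustration under stated assumptions, not the library's `Decoder` or `Encoder`: the helper names `decodeNdjson()` and `encodeNdjson()` are hypothetical, the whole input is held in memory instead of being streamed, and `JSON_THROW_ON_ERROR` assumes PHP 7.3 or newer.

```php
<?php

// Hypothetical helper (not part of clue/reactphp-ndjson): decode a complete
// NDJSON string held in memory by splitting on the newline delimiter.
function decodeNdjson(string $ndjson): array
{
    $records = [];
    foreach (explode("\n", $ndjson) as $line) {
        if ($line === '') {
            continue; // skip the empty chunk after the trailing newline
        }
        // each line is one complete JSON text
        $records[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    }

    return $records;
}

// Hypothetical helper: encode records as NDJSON, one JSON text per line.
function encodeNdjson(array $records): string
{
    $ndjson = '';
    foreach ($records as $record) {
        // no JSON_PRETTY_PRINT here: it would spread one record over several lines
        $ndjson .= json_encode($record, JSON_THROW_ON_ERROR) . "\n";
    }

    return $ndjson;
}

$ndjson = encodeNdjson([
    ['name' => 'Alice', 'age' => 30, 'comment' => 'Yes, I like cheese'],
    ['name' => 'Bob', 'age' => 50, 'comment' => "Hello\nWorld!"],
]);

// the complete input is not a single JSON text, so json_decode() returns null
var_dump(json_decode($ndjson));

// decoding line by line restores both records, including the embedded newline
$records = decodeNdjson($ndjson);
var_dump($records[1]['comment'] === "Hello\nWorld!");
```

The newline inside Bob's comment is escaped as `\n` by `json_encode()`, so the encoded string still contains exactly one newline per record and splitting on newlines stays safe.
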
@@ -271,3 +342,7 @@ This project is released under the permissive [MIT license](LICENSE).
 * If you want to concurrently process the records from your NDJSON stream,
 you may want to use [clue/reactphp-flux](https://github.com/clue/reactphp-flux)
 to concurrently process many (but not too many) records at once.
+
+* If you want to process structured data in the more common text-based format,
+you may want to use [clue/reactphp-csv](https://github.com/clue/reactphp-csv)
+to process Comma-Separated Values (CSV) files (`.csv` file extension).
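
The README's point that CSV is untyped while NDJSON keeps JSON's types can be checked with PHP's built-in functions alone. The sketch below parses the "Alice" row from the README in both formats; it uses neither clue/reactphp-csv nor this library, and the separator, enclosure and escape arguments are passed explicitly as an assumption about the CSV dialect.

```php
<?php

// CSV: every field comes back as a string, so the type of "age" is lost
$fields = str_getcsv('Alice,30,"Yes, I like cheese"', ',', '"', '\\');
var_dump($fields[1]); // string(2) "30"

// NDJSON: the same record keeps its integer type after decoding
$record = json_decode('{"name":"Alice","age":30,"comment":"Yes, I like cheese"}', true);
var_dump($record['age']); // int(30)
```

Whether `30` should be an integer or a string is something a CSV consumer has to decide by convention, while a JSON text states it explicitly.
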

examples/91-benchmark-count.php

Lines changed: 5 additions & 1 deletion
@@ -1,5 +1,9 @@
 <?php

+// simple usage:
+// $ php examples/91-benchmark-count.php < examples/users.ndjson
+//
+// getting reasonable results requires a large data set:
 // 1) download a large CSV/TSV dataset, for example:
 // @link https://datasets.imdbws.com/
 // @link https://github.com/fivethirtyeight/russian-troll-tweets
@@ -8,7 +12,7 @@
 // @link https://github.com/clue/reactphp-csv/blob/v1.0.0/examples/11-csv2ndjson.php
 //
 // 3) pipe NDJSON into benchmark script:
-// $ examples/91-benchmark-count.php < title.ratings.ndjson
+// $ php examples/91-benchmark-count.php < title.ratings.ndjson

 use Clue\React\NDJson\Decoder;
 use React\EventLoop\Factory;

examples/users.ndjson

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+{"name":"Alice","age":30,"comment":"Yes, I like cheese"}
+{"name":"Bob","age":50,"comment":"Hello\nWorld!"}
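
The README notes that adding a record to an NDJSON file needs no change to the file's structure. A minimal sketch of what that could look like for a file shaped like `examples/users.ndjson` follows. The helper `appendRecord()` and the "Carol" record are hypothetical, and the code works on a temporary file so it does not touch the repository; `JSON_THROW_ON_ERROR` assumes PHP 7.3 or newer.

```php
<?php

// Hypothetical helper (not part of clue/reactphp-ndjson): append one record
// to an NDJSON file without reading or rewriting the existing records.
function appendRecord(string $path, array $record): void
{
    $line = json_encode($record, JSON_THROW_ON_ERROR) . "\n";

    // FILE_APPEND writes at the end of the file, LOCK_EX guards against concurrent writers
    if (file_put_contents($path, $line, FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException("Unable to write to $path");
    }
}

// start from a temporary file that already holds one record
$path = tempnam(sys_get_temp_dir(), 'ndjson');
file_put_contents($path, '{"name":"Alice","age":30,"comment":"Yes, I like cheese"}' . "\n");

// hypothetical record used for illustration only
appendRecord($path, ['name' => 'Carol', 'age' => 41, 'comment' => 'hypothetical record']);

$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
echo count($lines), " records\n";

unlink($path);
```

With a regular JSON array, the same change would require parsing the file, appending to the array and writing the whole document back.
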

examples/validate.php

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 <?php

+// $ php examples/validate.php < examples/users.ndjson
+
 use React\EventLoop\Factory;
 use React\Stream\ReadableResourceStream;
 use React\Stream\WritableResourceStream;
