@@ -22,6 +22,7 @@ remote procedure call (RPC) mechanism.

**Table of contents**

* [NDJSON format](#ndjson-format)
* [Usage](#usage)
    * [Decoder](#decoder)
    * [Encoder](#encoder)
@@ -30,6 +31,76 @@ remote procedure call (RPC) mechanism.
* [License](#license)
* [More](#more)

## NDJSON format

NDJSON ("Newline-Delimited JSON", sometimes also referred to as "JSON Lines") is
a very simple text-based format for storing a large number of records, such as a
list of user records or log entries.

```JSON
{"name":"Alice","age":30,"comment":"Yes, I like cheese"}
{"name":"Bob","age":50,"comment":"Hello\nWorld!"}
```

If you understand JSON and you're now looking at this newline-delimited JSON for
the first time, you should already know everything you need to know to
understand NDJSON: As the name implies, this format essentially consists of
individual lines where each line is any valid JSON text and each line is
terminated by a newline character.

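The line-by-line structure described above can be sketched in a few lines of plain PHP. This is only an illustration that reads a complete string into memory, not this project's streaming decoder. The sample input is made up to match the example above, and `JSON_THROW_ON_ERROR` assumes PHP 7.3 or newer.

```php
<?php

// Hypothetical sample input: two NDJSON records, each terminated by "\n".
// Inside single quotes, \n stays a literal backslash + n (the JSON escape).
$ndjson = '{"name":"Alice","age":30,"comment":"Yes, I like cheese"}' . "\n"
        . '{"name":"Bob","age":50,"comment":"Hello\nWorld!"}' . "\n";

$records = [];
foreach (explode("\n", $ndjson) as $line) {
    if ($line === '') {
        continue; // skip the empty chunk after the final newline
    }

    // Each line is a complete JSON text and can be decoded on its own.
    $records[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
}

echo count($records) . PHP_EOL;    // 2
echo $records[0]['name'] . PHP_EOL; // Alice
```

Splitting on the raw newline is safe here because a newline inside a JSON string is always escaped, so it never appears as a literal line break within a record.
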
This example uses a list of user objects where each user has some arbitrary
properties. It can easily be adjusted for many different use cases, such as
storing products instead of users, assigning additional properties or having a
significantly larger number of records. You can edit NDJSON files in any text
editor or use them in a streaming context where records are processed one at a
time. Unlike normal JSON files, adding a new log entry to an NDJSON file does
not require modifying the file's structure (note there's no "outer array" to be
modified). This makes it a good fit for line-oriented CLI tools (such as `grep`
and others) and for a logging context where you want to append records at a
later time. It also lends itself to streaming use, such as a simple
inter-process communication (IPC) protocol or a remote procedure call (RPC)
mechanism.

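As a rough sketch of the append-only property, the following plain PHP snippet adds records to a file without touching what is already there. The file name and the `appendRecord()` helper are hypothetical and exist only for this illustration.

```php
<?php

// Hypothetical log file in the system temp directory.
$file = sys_get_temp_dir() . '/example-' . uniqid() . '.ndjson';

// Hypothetical helper: encode one record as a single line and append it.
function appendRecord(string $file, array $record): void
{
    $line = json_encode($record, JSON_THROW_ON_ERROR) . "\n";

    // FILE_APPEND only adds to the end, existing lines are left untouched.
    file_put_contents($file, $line, FILE_APPEND | LOCK_EX);
}

appendRecord($file, ['name' => 'Alice', 'age' => 30]);
appendRecord($file, ['name' => 'Bob', 'age' => 50]);

// Read the file back as an array of lines without their trailing newlines.
$lines = file($file, FILE_IGNORE_NEW_LINES);
echo count($lines) . PHP_EOL; // 2

unlink($file); // clean up the temporary file
```

With a regular JSON array, the same operation would require rewriting the closing bracket (or the whole file) on every append.
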
The newline character at the end of each line allows for some really simple
*framing* (detecting individual records). While each individual line is valid
JSON, the complete file as a whole is technically no longer valid JSON, because
it contains multiple JSON texts. This implies that, for example, calling PHP's
`json_decode()` on this complete input would fail because it would try to parse
multiple records at once. Likewise, "pretty printed" JSON
(`JSON_PRETTY_PRINT`) is not allowed because each JSON text is limited to exactly
one line. On the other hand, values containing newline characters (such as the
`comment` property in the above example) do not cause issues because each newline
within a JSON string is represented by a `\n` escape sequence instead.

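The three points above can be checked with PHP's built-in JSON functions. This is a minimal sketch with made-up sample data, not part of this project's API.

```php
<?php

$ndjson = '{"name":"Alice","age":30}' . "\n" . '{"name":"Bob","age":50}' . "\n";

// Decoding the whole input at once fails: it holds two JSON texts, not one.
$whole = json_decode($ndjson, true);
$failed = ($whole === null && json_last_error() !== JSON_ERROR_NONE);
var_dump($failed); // bool(true)

// A newline inside a string value is escaped, so the record stays on one line.
$line = json_encode(['comment' => "Hello\nWorld!"]);
var_dump(strpos($line, "\n") === false); // bool(true)

// Pretty printing spreads one record across several lines and breaks framing.
$pretty = json_encode(['name' => 'Bob', 'age' => 50], JSON_PRETTY_PRINT);
var_dump(substr_count($pretty, "\n") > 0); // bool(true)
```
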
One common alternative to NDJSON would be Comma-Separated Values (CSV).
If you want to process CSV files, you may want to take a look at the related
project [clue/reactphp-csv](https://github.com/clue/reactphp-csv) instead:

```
name,age,comment
Alice,30,"Yes, I like cheese"
Bob,50,"Hello
World!"
```

CSV may look slightly simpler, but this simplicity comes at a price. CSV is
limited to untyped, two-dimensional data, so there's no standard way of storing
any nested structures or of differentiating a boolean value from a string or
integer. Field names are sometimes used, sometimes they're not
(application-dependent). Inconsistent handling of fields that contain
separators such as `,`, spaces or line breaks (see the `comment` field above)
introduces additional complexity. Its text encoding is usually undefined:
Unicode (or UTF-8) is unlikely to be supported and CSV files often use ISO
8859-1 encoding or some variant (again application-dependent).

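The difference in typing can be illustrated with PHP's built-in `str_getcsv()` and `json_decode()`. This is only a sketch: the `active` field is a hypothetical addition to the example record, and the delimiter, enclosure and escape arguments are passed explicitly so the call does not rely on defaults.

```php
<?php

// CSV: every field comes back as a string, including the age.
$fields = str_getcsv('Alice,30,"Yes, I like cheese"', ',', '"', '\\');
var_dump($fields[1]); // string(2) "30"

// NDJSON: the decoded value keeps its JSON type.
// The "active" field is hypothetical and only shows a boolean value.
$record = json_decode('{"name":"Alice","age":30,"active":true}', true);
var_dump($record['age']);    // int(30)
var_dump($record['active']); // bool(true)
```
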
While NDJSON helps avoid many of CSV's shortcomings, it is still a
(relatively) young format, while CSV files have been used in production systems
for decades. This means that if you want to interface with an existing system,
you may have to rely on the format that's already supported. If you're building
a new system, using NDJSON is an excellent choice as it provides a flexible way
to process individual records using a common text-based format that can include
any kind of structured data.

## Usage

### Decoder
@@ -271,3 +342,7 @@ This project is released under the permissive [MIT license](LICENSE).
* If you want to concurrently process the records from your NDJSON stream,
  you may want to use [clue/reactphp-flux](https://github.com/clue/reactphp-flux)
  to concurrently process many (but not too many) records at once.

* If you want to process structured data in the more common text-based CSV format,
  you may want to use [clue/reactphp-csv](https://github.com/clue/reactphp-csv)
  to process Comma-Separated-Values (CSV) files (`.csv` file extension).