|
2 | 2 |
|
3 | 3 | Streaming TSV (Tab-Separated Values) parser and encoder for [ReactPHP](https://reactphp.org/). |
4 | 4 |
|
| 5 | +**Table of contents** |
| 6 | + |
| 7 | +* [Support us](#support-us) |
| 8 | +* [Quickstart example](#quickstart-example) |
| 9 | +* [TSV format](#tsv-format) |
| 10 | +* [Usage](#usage) |
| 11 | + * [TsvDecoder](#tsvdecoder) |
| 12 | + * [TsvEncoder](#tsvencoder) |
| 13 | +* [Install](#install) |
| 14 | +* [Tests](#tests) |
| 15 | +* [License](#license) |
| 16 | +* [More](#more) |
| 17 | + |
| 18 | +## Support us |
| 19 | + |
| 20 | +[](https://github.com/clue-access/clue-access) |
| 21 | + |
| 22 | +*This project is currently under active development, |
| 23 | +you're looking at a temporary placeholder repository.* |
| 24 | + |
| 25 | +The code is available in early access to my sponsors here: https://github.com/clue-access/reactphp-tsv |
| 26 | + |
| 27 | +Do you sponsor me on GitHub? Thank you for supporting sustainable open-source, you're awesome! ❤️ Have fun with the code! 🎉 |
| 28 | + |
| 29 | +Seeing a 404 (Not Found)? Sounds like you're not in the early access group. Consider becoming a [sponsor on GitHub](https://github.com/sponsors/clue) for early access. Check out [clue·access](https://github.com/clue-access/clue-access) for more details. |
| 30 | + |
| 31 | +This way, more people get a chance to take a look at the code before the public release. |
| 32 | + |
5 | 33 | ## Quickstart example |
6 | 34 |
|
7 | 35 | TSV (Tab-Separated Values) is a very simple text-based format for storing a |
@@ -43,26 +71,293 @@ Carol's birthday is 2006-01-01 |
43 | 71 | Dave's birthday is 1995-01-01 |
44 | 72 | ``` |
45 | 73 |
|
| 74 | +## TSV format |
| 75 | +
|
| 76 | +TSV (Tab-Separated Values) is a very simple text-based format for storing a |
| 77 | +large number of (uniform) records, such as a list of temperature records or log |
| 78 | +entries. |
| 79 | +
|
| 80 | +``` |
| 81 | +name birthday ip |
| 82 | +Alice 2017-01-01 1.1.1.1 |
| 83 | +Carol 2006-01-01 2.1.1.1 |
| 84 | +Dave 1995-01-01 3.1.1.1 |
| 85 | +``` |
| 86 | +
|
| 87 | +While this may look somewhat trivial, this simplicity comes at a price. TSV is |
| 88 | +limited to untyped, two-dimensional data, so there's no standard way of storing |
| 89 | +any nested structures or of differentiating a boolean value from a string or |
| 90 | +integer. |
| 91 | + |
| 92 | +While TSV may look somewhat similar to CSV (Comma-Separated Values or less |
| 93 | +commonly Character-Separated Values), it is more than just a small variation. |
| 94 | + |
| 95 | +* TSV always uses a tab character (`\t`) as the delimiter between fields. CSV uses a |
| 96 | + comma (`,`) by default, but some applications use variations such as a |
| 97 | + semicolon (`;`) or other application-dependent characters (this is |
| 98 | + particularly common for systems in Europe (and elsewhere) that use a comma as |
| 99 | + decimal separator). |
| 100 | +* TSV always uses field names in the first row. CSV allows for optional field |
| 101 | + names (which is application-dependent). |
| 102 | +* TSV always uses the same number of fields for all rows. CSV allows for rows |
| 103 | + with a different number of fields (though this is rarely used). |
| 104 | +* CSV requires quoting for fields that contain the delimiter, quotes or newlines. |
| 105 | +* CSV supports newlines within quoted fields and thus requires more advanced parsing rules. |
| 106 | +* The MIME type is `text/csv` for CSV and `text/tab-separated-values` for TSV. |
| 107 | +* TSV is defined in a [simple document](https://www.iana.org/assignments/media-types/text/tab-separated-values), |
| 108 | + while CSV is defined in a dedicated [RFC 4180](https://tools.ietf.org/html/rfc4180). |
| 109 | + However many applications started using some CSV-variant long before this |
| 110 | + standard was defined, so parsing rules differ somewhat between implementations. |
| 111 | + |
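The TSV rules listed above (tab delimiter, field names in the first row, same number of fields in every row) can be sketched in a few lines of plain PHP. This is only an illustration of the format and not this library's implementation: the `parseTsv()` function name is hypothetical, and the sketch assumes the whole input is available as one string and ignores any escape sequences.

```php
<?php

/**
 * Hypothetical helper (not part of this library): parses a complete TSV
 * string into a list of associative arrays keyed by the header fields.
 */
function parseTsv($tsv)
{
    // split into lines, accepting both "\n" and "\r\n" line endings
    $lines = preg_split('/\r?\n/', rtrim($tsv, "\r\n"));

    // the first row always contains the field names
    $names = explode("\t", array_shift($lines));

    $records = array();
    foreach ($lines as $line) {
        $fields = explode("\t", $line);

        // every row must use the same number of fields as the header row
        if (count($fields) !== count($names)) {
            throw new UnexpectedValueException('Unexpected number of fields');
        }

        $records[] = array_combine($names, $fields);
    }

    return $records;
}

$records = parseTsv("name\tbirthday\tip\nAlice\t2017-01-01\t1.1.1.1\n");
var_dump($records[0]['birthday']); // string(10) "2017-01-01"
```

Note that all values come back as strings, which reflects the untyped nature of TSV described above.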
| 112 | +TSV files are commonly limited to only ASCII characters for best interoperability. |
| 113 | +However, many legacy TSV files use ISO 8859-1 encoding or some other |
| 114 | +variant. Newer TSV files are usually best saved as UTF-8 and may thus also |
| 115 | +contain special characters from the Unicode range. The text encoding is usually |
| 116 | +application-dependent, so your best bet would be to convert to (or assume) UTF-8 |
| 117 | +consistently. |
| 118 | + |
| 119 | +Despite its shortcomings, TSV is widely used and this is unlikely to change any |
| 120 | +time soon. In particular, TSV is a very common export format for a lot of tools |
| 121 | +to interface with spreadsheet processors (such as Excel, Calc etc.). This means |
| 122 | +that TSV is often used for historical reasons and using TSV to store structured |
| 123 | +application data is usually not a good idea nowadays – but exporting to TSV for |
| 124 | +known applications continues to be a very reasonable approach. |
| 125 | + |
| 126 | +As an alternative, if you want to process structured data in a more modern |
| 127 | +JSON-based format, you may want to use [clue/reactphp-ndjson](https://github.com/clue/reactphp-ndjson) |
| 128 | +to process newline-delimited JSON (NDJSON) files (`.ndjson` file extension). |
| 129 | + |
| 130 | +```json |
| 131 | +{"name":"Alice","age":30,"comment":"Yes, I like cheese"} |
| 132 | +{"name":"Bob","age":50,"comment":"Hello\nWorld!"} |
| 133 | +``` |
| 134 | + |
| 135 | +## Usage |
| 136 | + |
| 137 | +### TsvDecoder |
| 138 | + |
| 139 | +The `TsvDecoder` (parser) class can be used to make sure you only get back |
| 140 | +complete, valid TSV elements when reading from a stream. |
| 141 | +It wraps a given |
| 142 | +[`ReadableStreamInterface`](https://github.com/reactphp/stream#readablestreaminterface) |
| 143 | +and exposes its data through the same interface, but emits the TSV elements |
| 144 | +as parsed values instead of just chunks of strings: |
| 145 | + |
| 146 | +``` |
| 147 | +name age |
| 148 | +Alice 20 |
| 149 | +Carol 30 |
| 150 | +``` |
| 151 | + |
| 152 | +```php |
| 153 | +$stdin = new React\Stream\ReadableResourceStream(STDIN); |
| 154 | +$stream = new Clue\React\Tsv\TsvDecoder($stdin); |
| 155 | +
|
| 156 | +$stream->on('data', function ($data) { |
| 157 | + // data is a parsed element from the TSV stream |
| 158 | + // line 1: $data = array('name' => 'Alice', 'age' => '20'); |
| 159 | + // line 2: $data = array('name' => 'Carol', 'age' => '30'); |
| 160 | + var_dump($data); |
| 161 | +}); |
| 162 | +``` |
| 163 | +
|
| 164 | +ReactPHP's streams emit chunks of data strings and make no assumption about their lengths. |
| 165 | +These chunks do not necessarily represent complete TSV elements, as an |
| 166 | +element may be broken up into multiple chunks. |
| 167 | +This class reassembles these elements by buffering incomplete ones. |
| 168 | +
|
| 169 | +Accordingly, the `TsvDecoder` limits the maximum buffer size (maximum line |
| 170 | +length) to avoid buffer overflows due to malformed user input. Usually, there |
| 171 | +should be no need to change this value, unless you know you're dealing with some |
| 172 | +unreasonably long lines. It accepts an additional argument if you want to change |
| 173 | +this from the default of 64 KiB: |
| 174 | +
|
| 175 | +```php |
| 176 | +$stream = new Clue\React\Tsv\TsvDecoder($stdin, 64 * 1024); |
| 177 | +``` |
| 178 | +
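The buffering and the size limit described above can be illustrated with a small, self-contained sketch. The `LineBuffer` class below is hypothetical and is not how `TsvDecoder` is implemented internally. It only shows the general idea of collecting arbitrary chunks, handing out complete lines and rejecting an incomplete line that grows beyond a limit (64 KiB here, mirroring the default mentioned above).

```php
<?php

/**
 * Hypothetical line buffer (not this library's code): collects arbitrary
 * chunks and returns only the complete lines seen so far.
 */
class LineBuffer
{
    private $buffer = '';
    private $maxLength;

    public function __construct($maxLength = 65536)
    {
        $this->maxLength = $maxLength;
    }

    public function push($chunk)
    {
        $this->buffer .= $chunk;

        $lines = array();
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            // cut one complete line (without its line ending) off the buffer
            $lines[] = rtrim(substr($this->buffer, 0, $pos), "\r");
            $this->buffer = (string) substr($this->buffer, $pos + 1);
        }

        // whatever remains is an incomplete line, refuse to buffer it forever
        if (strlen($this->buffer) > $this->maxLength) {
            throw new OverflowException('Buffer size exceeded');
        }

        return $lines;
    }
}

$buffer = new LineBuffer();
$first = $buffer->push("name\tage\nAli");  // only the header line is complete
$second = $buffer->push("ce\t20\nCarol");  // completes the "Alice" line
```

After both calls, `Carol` stays in the buffer until a later chunk delivers the rest of its line.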
|
| 179 | +If the underlying stream emits an `error` event or the plain stream contains |
| 180 | +any data that does not represent a valid TSV stream, |
| 181 | +it will emit an `error` event and then `close` the input stream: |
| 182 | +
|
| 183 | +```php |
| 184 | +$stream->on('error', function (Exception $error) { |
| 185 | + // an error occurred, stream will close next |
| 186 | +}); |
| 187 | +``` |
| 188 | +
|
| 189 | +If the underlying stream emits an `end` event, it will flush any incomplete |
| 190 | +data from the buffer, thus either possibly emitting a final `data` event |
| 191 | +followed by an `end` event on success or an `error` event for |
| 192 | +incomplete/invalid TSV data as above: |
| 193 | +
|
| 194 | +```php |
| 195 | +$stream->on('end', function () { |
| 196 | + // stream successfully ended, stream will close next |
| 197 | +}); |
| 198 | +``` |
| 199 | +
|
| 200 | +If either the underlying stream or the `TsvDecoder` is closed, it will forward |
| 201 | +the `close` event: |
| 202 | +
|
| 203 | +```php |
| 204 | +$stream->on('close', function () { |
| 205 | + // stream closed |
| 206 | + // possibly after an "end" event or due to an "error" event |
| 207 | +}); |
| 208 | +``` |
| 209 | +
|
| 210 | +The `close(): void` method can be used to explicitly close the `TsvDecoder` and |
| 211 | +its underlying stream: |
| 212 | +
|
| 213 | +```php |
| 214 | +$stream->close(); |
| 215 | +``` |
| 216 | +
|
| 217 | +The `pipe(WritableStreamInterface $dest, array $options = array()): WritableStreamInterface` |
| 218 | +method can be used to forward all data to the given destination stream. |
| 219 | +Please note that the `TsvDecoder` emits decoded/parsed data events, while many |
| 220 | +(most?) writable streams expect only data chunks: |
| 221 | +
|
| 222 | +```php |
| 223 | +$stream->pipe($logger); |
| 224 | +``` |
| 225 | +
|
| 226 | +For more details, see ReactPHP's |
| 227 | +[`ReadableStreamInterface`](https://github.com/reactphp/stream#readablestreaminterface). |
| 228 | +
|
| 229 | +### TsvEncoder |
| 230 | +
|
| 231 | +The `TsvEncoder` (serializer) class can be used to make sure anything you write to |
| 232 | +a stream ends up as valid TSV elements in the resulting TSV stream. |
| 233 | +It wraps a given |
| 234 | +[`WritableStreamInterface`](https://github.com/reactphp/stream#writablestreaminterface) |
| 235 | +and accepts its data through the same interface, but handles any data as complete |
| 236 | +TSV elements instead of just chunks of strings: |
| 237 | +
|
| 238 | +```php |
| 239 | +$stdout = new React\Stream\WritableResourceStream(STDOUT); |
| 240 | +$stream = new Clue\React\Tsv\TsvEncoder($stdout); |
| 241 | +
|
| 242 | +$stream->write(array('name' => 'Alice', 'age' => '20')); |
| 243 | +$stream->write(array('name' => 'Carol', 'age' => '30')); |
| 244 | +``` |
| 245 | +
|
| 246 | +``` |
| 247 | +name age |
| 248 | +Alice 20 |
| 249 | +Carol 30 |
| 250 | +``` |
| 251 | +
|
| 252 | +If the underlying stream emits an `error` event or the given data contains |
| 253 | +any data that cannot be represented as a valid TSV stream, |
| 254 | +it will emit an `error` event and then `close` the input stream: |
| 255 | +
|
| 256 | +```php |
| 257 | +$stream->on('error', function (Exception $error) { |
| 258 | + // an error occurred, stream will close next |
| 259 | +}); |
| 260 | +``` |
| 261 | +
|
| 262 | +If either the underlying stream or the `TsvEncoder` is closed, it will forward |
| 263 | +the `close` event: |
| 264 | +
|
| 265 | +```php |
| 266 | +$stream->on('close', function () { |
| 267 | + // stream closed |
| 268 | + // possibly after an "end" event or due to an "error" event |
| 269 | +}); |
| 270 | +``` |
| 271 | +
|
| 272 | +The `end(mixed $data = null): void` method can be used to optionally emit |
| 273 | +any final data and then soft-close the `TsvEncoder` and its underlying stream: |
| 274 | +
|
| 275 | +```php |
| 276 | +$stream->end(); |
| 277 | +``` |
| 278 | +
|
| 279 | +The `close(): void` method can be used to explicitly close the `TsvEncoder` and |
| 280 | +its underlying stream: |
| 281 | +
|
| 282 | +```php |
| 283 | +$stream->close(); |
| 284 | +``` |
| 285 | +
|
| 286 | +For more details, see ReactPHP's |
| 287 | +[`WritableStreamInterface`](https://github.com/reactphp/stream#writablestreaminterface). |
| 288 | +
|
46 | 289 | ## Install |
47 | 290 |
|
48 | | -[](https://github.com/clue-access/clue-access) |
| 291 | +The recommended way to install this library is [through Composer](https://getcomposer.org/). |
| 292 | +[New to Composer?](https://getcomposer.org/doc/00-intro.md) |
49 | 293 |
|
50 | | -*This project is currently under active development, |
51 | | -you're looking at a temporary placeholder repository.* |
| 294 | +This project does not yet follow [SemVer](https://semver.org/). |
| 295 | +The steps below will install the latest development version. |
52 | 296 |
|
53 | | -The code is available in early access to my sponsors here: https://github.com/clue-access/reactphp-tsv |
| 297 | +While in [early access](#support-us), you first have to manually change your |
| 298 | +`composer.json` to include these lines to access the supporters-only repository: |
54 | 299 |
|
55 | | -Do you sponsor me on GitHub? Thank you for supporting sustainable open-source, you're awesome! ❤️ Have fun with the code! 🎉 |
| 300 | +```json |
| 301 | +{ |
| 302 | + "repositories": [ |
| 303 | + { |
| 304 | + "type": "vcs", |
| 305 | + "url": "https://github.com/clue-access/reactphp-tsv" |
| 306 | + } |
| 307 | + ] |
| 308 | +} |
| 309 | +``` |
56 | 310 |
|
57 | | -Seeing a 404 (Not Found)? Sounds like you're not in the early access group. Consider becoming a [sponsor on GitHub](https://github.com/sponsors/clue) for early access. Check out [clue·access](https://github.com/clue-access/clue-access) for more details. |
| 311 | +Then install this package as usual: |
58 | 312 |
|
59 | | -This way, more people get a chance to take a look at the code before the public release. |
| 313 | +```bash |
| 314 | +$ composer require clue/reactphp-tsv:dev-main |
| 315 | +``` |
| 316 | +
|
| 317 | +This project aims to run on any platform and thus does not require any PHP |
| 318 | +extensions and supports running on legacy PHP 5.3 through current PHP 8+. |
| 319 | +It's *highly recommended to use the latest supported PHP version* for this project. |
| 320 | +
|
| 321 | +## Tests |
| 322 | +
|
| 323 | +To run the test suite, you first need to clone this repo and then install all |
| 324 | +dependencies [through Composer](https://getcomposer.org/): |
| 325 | +
|
| 326 | +```bash |
| 327 | +$ composer install |
| 328 | +``` |
| 329 | +
|
| 330 | +To run the test suite, go to the project root and run: |
60 | 331 |
|
61 | | -Rock on 🤘 |
| 332 | +```bash |
| 333 | +$ vendor/bin/phpunit |
| 334 | +``` |
62 | 335 |
|
63 | 336 | ## License |
64 | 337 |
|
65 | | -This project will be released under the permissive [MIT license](LICENSE). |
| 338 | +This project is released under the permissive [MIT license](LICENSE). |
66 | 339 |
|
67 | 340 | > Did you know that I offer custom development services and issue invoices for |
68 | 341 | sponsorships of releases and for contributions? Contact me (@clue) for details. |
| 342 | +
|
| 343 | +## More |
| 344 | +
|
| 345 | +* If you want to learn more about processing streams of data, refer to the documentation of |
| 346 | + the underlying [react/stream](https://github.com/reactphp/stream) component. |
| 347 | +
|
| 348 | +* If you want to process a more common text-based format, |
| 349 | + you may want to use [clue/reactphp-csv](https://github.com/clue/reactphp-csv) |
| 350 | + to process Comma-Separated Values (CSV) files (`.csv` file extension). |
| 351 | +
|
| 352 | +* If you want to process structured data in a more modern JSON-based format, |
| 353 | + you may want to use [clue/reactphp-ndjson](https://github.com/clue/reactphp-ndjson) |
| 354 | + to process newline-delimited JSON (NDJSON) files (`.ndjson` file extension). |
| 355 | +
|
| 356 | +* If you want to process compressed TSV files (`.tsv.gz` file extension) |
| 357 | + you may want to use [clue/reactphp-zlib](https://github.com/clue/reactphp-zlib) |
| 358 | + on the compressed input stream before passing the decompressed stream to the TSV decoder. |
| 359 | +
|
| 360 | +* If you want to create compressed TSV files (`.tsv.gz` file extension) |
| 361 | + you may want to use [clue/reactphp-zlib](https://github.com/clue/reactphp-zlib) |
| 362 | + on the resulting TSV encoder output stream before passing the compressed |
| 363 | + stream to the file output stream. |