Commit 321c9bc

committed
other changes
1 parent 6766551 commit 321c9bc

2 files changed

Lines changed: 69 additions & 1 deletion

c/tskit/core.c

Lines changed: 1 addition & 0 deletions
@@ -214,6 +214,7 @@ tsk_json_struct_metadata_get_blob(char *metadata, tsk_size_t metadata_length,
 out:
     return ret;
 }
+
 static const char *
 tsk_strerror_internal(int err)
 {

docs/metadata.md

Lines changed: 68 additions & 1 deletion
@@ -527,7 +527,7 @@ of `B`, `H`, `I`, `L` or `Q` which have the same meaning as in the numeric
 types above. `L` is the default. As an example:
 
 ```
-{"type": "array", {"items": {"type":"number", "binaryFormat":"h"}}, "arrayLengthFormat":"B"}
+{"type": "array", "items": {"type":"number", "binaryFormat":"h"}, "arrayLengthFormat":"B"}
 ```
 
 Will result in an array of 2 byte integers, prepended by a single-byte array-length.
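
As a rough illustration of the layout the corrected schema describes (a single-byte length followed by 2-byte integers), the same bytes can be produced with Python's standard `struct` module. This is only a sketch: it assumes a little-endian, unpadded layout, and the helper name `pack_short_array` is hypothetical, not part of tskit.

```python
import struct


def pack_short_array(values):
    """Pack a one-byte length ("B") followed by 2-byte signed ints ("h").

    Assumption: little-endian with no padding ("<"); this is not tskit's encoder.
    """
    if len(values) > 255:
        raise ValueError("a single-byte length prefix holds at most 255 items")
    return struct.pack(f"<B{len(values)}h", len(values), *values)


encoded = pack_short_array([1, -2, 300])
print(len(encoded))   # one length byte plus two bytes per item
print(encoded.hex())
```

The length check reflects the design trade-off in `arrayLengthFormat`: a smaller length format saves bytes per array but caps how many items the array can hold.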
@@ -555,6 +555,73 @@ As a special case under the `struct` codec, the top-level type of metadata can b
 union of `object` and `null`. Set `"type": ["object", "null"]`. Properties should
 be defined as normal, and will be ignored if the metadata is `None`.
 
+(sec_metadata_codecs_jsonstruct)=
+
+### `json+struct`
+
+An additional codec provides the ability to store *both* JSON and binary-encoded data.
+This is provided for the case where we want to store some arbitrary metadata
+(as JSON) along with a relatively large amount of data (as binary, for efficiency).
+For instance, we might want to record a raster map of the sampled area
+along with a few pieces of generic information (e.g., the name of the area).
+
+The metadata schema for "json+struct" metadata basically just specifies both
+a JSON metadata schema and a struct metadata schema.
+Each entry in the metadata is encoded with either the JSON or the struct codec.
+Here is a simple example:
+
+```{code-cell}
+schema = {
+    "codec": "json+struct",
+    "json": {
+        "type": "object",
+        "properties": {
+            "label": {"type": "string"},
+            "id": {"type": "number"},
+        },
+        "required": ["label"],
+    },
+    "struct": {
+        "type": "object",
+        "properties": {
+            "values": {
+                "type": "array",
+                "arrayLengthFormat": "B",
+                "items": {"type": "number", "binaryFormat": "i"},
+            },
+        },
+    },
+}
+ms = tskit.MetadataSchema(schema)
+row = {"label": "alpha", "id": 7, "values": [5, 10, 2, 12]}
+encoded = ms.validate_and_encode_row(row)
+print("Encoded:", encoded)
+print("Decoded:", ms.decode_row(encoded))
+```
+
+This encodes two things in JSON: a label and an ID number,
+and then an array of integers in binary (using the ``struct`` codec).
+If the array of integers is large, this could result in
+much better performance.
+
+
+#### Binary representation
+
+The underlying structure of the JSON+struct codec is as follows.
+(If you're not writing out data in this format,
+you don't need to worry about this.)
+(1) some magic bytes;
+(2) a version number;
+(3) the length of the JSON in bytes;
+(4) the length of the binary (struct) data in bytes;
+(5) the JSON data;
+(6) zero-ed "padding" bytes to bring the start of the binary section
+into 8-byte alignment;
+(7) the binary data.
+The structure of the binary data is specified using the "struct" portion
+of the metadata schema.
+
+
 (sec_metadata_schema_examples)=
 
 ## Schema examples
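
The added `json+struct` text says that storing a large integer array in binary "could result in much better performance". One aspect of that, encoded size, can be checked with a small sketch. It uses only Python's standard library rather than tskit, assumes 4-byte little-endian integers (mirroring `"binaryFormat": "i"` in the example schema), and the sample values are made up.

```python
import json
import struct

# Made-up sample data: 1000 six-digit integers.
values = list(range(100_000, 101_000))

# The same array as JSON text and as packed binary.
as_json = json.dumps({"values": values}).encode("utf-8")
# Assumed layout: 4-byte signed ints ("i"), little-endian, no length prefix.
as_binary = struct.pack(f"<{len(values)}i", *values)

print(len(as_json), "bytes as JSON text")
print(len(as_binary), "bytes as packed integers")

# The binary form round-trips without any text parsing.
decoded = list(struct.unpack(f"<{len(values)}i", as_binary))
```

The packed form costs a fixed four bytes per item, while the JSON form grows with the number of digits and separators, so the gap widens for larger values.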

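
The seven-part layout listed under "Binary representation" can be sketched in a few lines of Python. The diff does not state the magic bytes, the version value, the width of the length fields, or the byte order, so every one of those below is an assumption chosen for illustration, and the function names are hypothetical rather than tskit API.

```python
import json
import struct

# Assumptions for illustration only; the text above does not specify these.
MAGIC = b"JBLB"                    # placeholder magic bytes
VERSION = 1                        # placeholder version, stored in one byte
HEADER = struct.Struct("<4sBQQ")   # magic, version, JSON length, binary length


def pack_json_struct(json_obj, binary):
    """Lay out header, JSON, zero padding, then the binary section."""
    json_bytes = json.dumps(json_obj).encode("utf-8")
    header = HEADER.pack(MAGIC, VERSION, len(json_bytes), len(binary))
    # Zero padding so the binary section starts on an 8-byte boundary.
    padding = b"\x00" * (-(len(header) + len(json_bytes)) % 8)
    return header + json_bytes + padding + binary


def unpack_json_struct(buf):
    """Split a buffer written by pack_json_struct back into its two parts."""
    magic, version, json_len, binary_len = HEADER.unpack_from(buf, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("unrecognised header")
    json_start = HEADER.size
    binary_start = json_start + json_len
    binary_start += -binary_start % 8  # skip the padding bytes
    json_obj = json.loads(buf[json_start:json_start + json_len])
    return json_obj, buf[binary_start:binary_start + binary_len]


# Binary part mirroring the example schema: one length byte, then 4-byte ints.
blob = struct.pack("<B4i", 4, 5, 10, 2, 12)
packed = pack_json_struct({"label": "alpha", "id": 7}, blob)
print(unpack_json_struct(packed))
```

Storing both lengths in the header lets a reader locate the binary section by arithmetic alone, without parsing the JSON first.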