@@ -527,7 +527,7 @@ of `B`, `H`, `I`, `L` or `Q` which have the same meaning as in the numeric
527527 types above. `L` is the default. As an example:
528528
529529```
530- {"type": "array", { "items": {"type":"number", "binaryFormat":"h"} }, "arrayLengthFormat":"B"}
530+ {"type": "array", "items": {"type":"number", "binaryFormat":"h"}, "arrayLengthFormat":"B"}
531531```
532532
533533 Will result in an array of 2-byte integers, prepended by a single-byte array length.
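
To make that layout concrete, the same bytes can be reproduced with Python's standard `struct` module. This is only a sketch of the layout described above: it does not call tskit, and the little-endian byte order (`<`) is an assumption made for illustration.

```python
import struct

values = [1, 2, 3]

# "B": one unsigned byte holding the array length ("arrayLengthFormat": "B").
# "h": one 2-byte signed integer per item ("binaryFormat": "h").
# "<": little-endian with no padding; the byte order is an assumption here.
encoded = struct.pack(f"<B{len(values)}h", len(values), *values)

# One length byte followed by three 2-byte items.
print(len(encoded), encoded)
```

Because the length is stored in a single unsigned byte, an array encoded this way can hold at most 255 items.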
@@ -555,6 +555,73 @@ As a special case under the `struct` codec, the top-level type of metadata can b
555555 union of `object` and `null`. Set `"type": ["object", "null"]`. Properties should
556556 be defined as normal, and will be ignored if the metadata is `None`.
557557
558+ (sec_metadata_codecs_jsonstruct)=
559+
560+ ### `json+struct`
561+
562+ The `json+struct` codec stores *both* JSON and binary-encoded data in the same metadata.
563+ It is intended for cases where we want to store some arbitrary metadata
564+ (as JSON) along with a relatively large amount of data (as binary, for efficiency).
565+ For instance, we might want to record a raster map of the sampled area
566+ along with a few pieces of generic information (e.g., the name of the area).
567+
568+ The metadata schema for `json+struct` metadata specifies two sub-schemas:
569+ a JSON metadata schema (under `"json"`) and a struct metadata schema (under `"struct"`).
570+ Each top-level property of the metadata is encoded with either the JSON or the struct codec.
571+ Here is a simple example:
572+
573+ ``` {code-cell}
574+ schema = {
575+ "codec": "json+struct",
576+ "json": {
577+ "type": "object",
578+ "properties": {
579+ "label": {"type": "string"},
580+ "id": {"type": "number"},
581+ },
582+ "required": ["label"],
583+ },
584+ "struct": {
585+ "type": "object",
586+ "properties": {
587+ "values": {
588+ "type": "array",
589+ "arrayLengthFormat": "B",
590+ "items": {"type": "number", "binaryFormat": "i"},
591+ },
592+ },
593+ },
594+ }
595+ ms = tskit.MetadataSchema(schema)
596+ row = {"label": "alpha", "id": 7, "values": [5, 10, 2, 12]}
597+ encoded = ms.validate_and_encode_row(row)
598+ print("Encoded:", encoded)
599+ print("Decoded:", ms.decode_row(encoded))
600+ ```
601+
602+ This encodes two properties as JSON, a label and an ID number,
603+ and an array of integers in binary (using the `struct` codec).
604+ If the array of integers is large, storing it in binary rather than as JSON text
605+ can make the metadata much more efficient to encode, decode and store.
606+
607+
608+ #### Binary representation
609+
610+ The underlying structure of the JSON+struct codec is as follows
611+ (if you're not writing out data in this format, you don't need to worry about this):
612+ 
613+ 1. some magic bytes;
614+ 2. a version number;
615+ 3. the length of the JSON in bytes;
616+ 4. the length of the binary (struct) data in bytes;
617+ 5. the JSON data;
618+ 6. zeroed "padding" bytes to bring the start of the binary section into 8-byte alignment;
619+ 7. the binary data.
620+ 
621+ The structure of the binary data is specified using the "struct" portion
622+ of the metadata schema.
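
The seven-part layout above can be sketched with Python's standard `struct` module. This is an illustration, not tskit's implementation. The text above does not state the magic bytes, the version value, the header field widths or the byte order, so the values used here (`MAGIC`, `VERSION`, and the `"<4sBQQ"` header format) are assumptions, and `pack_blob` and `unpack_blob` are hypothetical helper names.

```python
import json
import struct

# Assumptions for illustration only: the real magic bytes, version value,
# field widths and byte order are not given in the text above.
MAGIC = b"JBLB"
VERSION = 1
# (1) 4 magic bytes, (2) 1-byte version, (3) and (4) two 8-byte lengths.
HEADER = struct.Struct("<4sBQQ")


def pack_blob(json_obj, binary):
    """Lay out header, JSON, zero padding and binary data in that order."""
    json_bytes = json.dumps(json_obj).encode("utf-8")
    header = HEADER.pack(MAGIC, VERSION, len(json_bytes), len(binary))
    # (6) Zero bytes so that the binary section starts on an 8-byte boundary.
    padding = (-(HEADER.size + len(json_bytes))) % 8
    return header + json_bytes + b"\x00" * padding + binary


def unpack_blob(blob):
    """Split a blob produced by pack_blob back into its two sections."""
    magic, version, json_len, binary_len = HEADER.unpack_from(blob, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("not a blob in the assumed layout")
    json_end = HEADER.size + json_len
    # Skip the padding: round json_end up to the next multiple of 8.
    binary_start = json_end + (-json_end) % 8
    json_obj = json.loads(blob[HEADER.size:json_end].decode("utf-8"))
    return json_obj, blob[binary_start:binary_start + binary_len]


# Binary section for the example row: a one-byte length, then four 4-byte ints.
binary = struct.pack("<B4i", 4, 5, 10, 2, 12)
blob = pack_blob({"label": "alpha", "id": 7}, binary)
print(unpack_blob(blob))
```

Storing both lengths in the header lets a reader locate either section without parsing the other, and the padding keeps the binary section aligned regardless of how long the JSON is.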
623+
624+
558625(sec_metadata_schema_examples)=
559626
560627## Schema examples