# About this document

This document attempts to describe the format currently implemented by binjs-fbssdc, also known as "context 0.1".

# Global structure

```
Stream ::= MagicHeader BrotliBody
```

# Magic Header

The magic header identifies the format.

```
MagicHeader   ::= "\x89BJS\r\n\0\n" FormatVersion
FormatVersion ::= 0b00000010
```
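
A decoder can reject foreign input by checking these nine bytes before it touches Brotli. The Python sketch below is illustrative only; `check_magic_header` is a hypothetical helper, not part of binjs-fbssdc.

```python
# Signature from the grammar above, followed by one FormatVersion byte.
MAGIC = b"\x89BJS\r\n\0\n"
FORMAT_VERSION = 0b00000010


def check_magic_header(stream: bytes) -> int:
    """Validate MagicHeader and return the offset where BrotliBody starts."""
    end = len(MAGIC) + 1  # 8 signature bytes + 1 FormatVersion byte
    if len(stream) < end:
        raise ValueError("stream too short for magic header")
    if stream[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic: not a binjs-fbssdc stream")
    if stream[len(MAGIC)] != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {stream[len(MAGIC)]}")
    return end
```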

# Brotli content

With the exception of the header, the entire file is Brotli-compressed.

```
BrotliBody ::= Brotli(Body)
Body       ::= Prelude AST
```

Here `Brotli(...)` denotes data that can be decompressed by the
`brotli` command-line tool or any compatible library.
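
Once the 9-byte magic header has been validated, the rest of the stream can be handed to any Brotli decoder. This sketch assumes the third-party `brotli` Python package (which provides `brotli.decompress`); the decompressor is a parameter so the splitting logic can be exercised without it. `read_body` is a hypothetical name.

```python
from typing import Callable, Optional

HEADER_LEN = 9  # 8-byte magic signature + 1 FormatVersion byte


def read_body(stream: bytes,
              decompress: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    """Return the decompressed Body (Prelude followed by AST).

    Assumes the magic header has already been checked.
    """
    if decompress is None:
        import brotli  # third-party: pip install brotli
        decompress = brotli.decompress
    # Everything after the header is a single Brotli stream.
    return decompress(stream[HEADER_LEN:])
```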

# Prelude

The prelude defines a dictionary of strings followed by a dictionary of probabilities. The order of entries in both is meaningful.

```
Prelude ::= StringPrelude ProbabilityPrelude
```

## String dictionary

```
StringPrelude    ::= n=NumberOfStrings StringDefinition{n}
NumberOfStrings  ::= varnum
StringDefinition ::= NonZeroByte* ZeroByte
NonZeroByte      ::= 0x01-0xFF
ZeroByte         ::= 0x00
```

Strings are UTF-8 encoded; any embedded `0x00 0x01` sequence is then replaced with `0x01 0x01` (FIXME: Why does this work?).
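
The string dictionary can be read with a short loop. This Python sketch assumes `varnum` is an unsigned LEB128 integer (7 payload bits per byte, high bit set on every byte except the last); this document does not define `varnum`, so treat that as an assumption. It returns raw bytes and deliberately does not undo the `0x00 0x01` substitution described above, since that rule is still marked FIXME. Function names are hypothetical.

```python
from typing import List, Tuple


def read_varnum(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer (ASSUMED encoding of `varnum`)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def read_string_prelude(data: bytes, pos: int = 0) -> Tuple[List[bytes], int]:
    """Read NumberOfStrings, then that many 0x00-terminated strings."""
    count, pos = read_varnum(data, pos)
    strings = []
    for _ in range(count):
        end = data.index(b"\x00", pos)  # ValueError if a string is unterminated
        strings.append(data[pos:end])   # raw bytes; escaping is not undone
        pos = end + 1                   # skip the ZeroByte terminator
    return strings, pos
```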

## Probability dictionaries

```
ProbabilityPrelude ::= ProbabilityTable*  # FIXME: How do we determine the number of tables?
ProbabilityTable   ::= ProbabilityTableUnreachable      # Compression artifact: a table that must appear here but is never used.
                     | ProbabilityTableOptimizedOne     # Optimization: a probability table with a single symbol.
                     | ProbabilityTableExplicitSymbols  # Used for strings, numbers.
                     | ProbabilityTableIndexedSymbols   # Used for enums, booleans, sums of interfaces.
```

The probability tables are written in an order extracted from the grammar, and together they define a model
`huffman_at: (parent type, my type) -> HuffmanTable`.

FIXME: Specify how the order is extracted from the grammar.

```
ProbabilityTableUnreachable       ::= 0x02
ProbabilityTableOptimizedOne      ::= 0x00 ExplicitSymbolData  # Used for strings, numbers.
                                    | 0x00 Index               # Used for enums, booleans, sums of interfaces.
ProbabilityTableExplicitSymbols   ::= 0x01 n=ProbabilityTableLen Probability{n} ExplicitSymbolData{n}  # Only list the symbols actually used.
ProbabilityTableIndexedSymbols    ::= 0x01 Probability*  # List all symbols, in the order extracted from the grammar.
ProbabilityTableLen               ::= varnum
Index                             ::= varnum
Probability                       ::= u8
ExplicitSymbolData                ::= ExplicitSymbolStringIndex
                                    | ExplicitSymbolOptionalStringIndex
                                    | ExplicitSymbolF64
                                    | ExplicitSymbolU32
                                    | ExplicitSymbolI32
ExplicitSymbolStringIndex         ::= varnum
ExplicitSymbolOptionalStringIndex ::= 0x00
                                    | n=varnum  where n > 0
ExplicitSymbolF64                 ::= f64  (IEEE 754, big endian)
ExplicitSymbolU32                 ::= u32  (big endian)
ExplicitSymbolI32                 ::= i32  (big endian)
```
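
The table layouts above can be read with a dispatch on the leading tag byte. This Python sketch is illustrative only: it assumes `varnum` is unsigned LEB128 (not specified in this document), that the caller already knows from the grammar whether a table uses explicit or indexed symbols, and, for indexed tables, how many symbols the grammar lists. Function names and the tuple return shape are hypothetical.

```python
import struct
from typing import Callable, List, Tuple

Reader = Callable[[bytes, int], Tuple[object, int]]


def read_varnum(data: bytes, pos: int) -> Tuple[int, int]:
    """Unsigned LEB128 (ASSUMED encoding of `varnum`)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Fixed-width explicit symbols, big endian as the grammar specifies.
def read_f64(data: bytes, pos: int) -> Tuple[float, int]:
    return struct.unpack_from(">d", data, pos)[0], pos + 8


def read_u32(data: bytes, pos: int) -> Tuple[int, int]:
    return struct.unpack_from(">I", data, pos)[0], pos + 4


def read_i32(data: bytes, pos: int) -> Tuple[int, int]:
    return struct.unpack_from(">i", data, pos)[0], pos + 4


def read_table(data: bytes, pos: int, read_symbol: Reader,
               indexed_count: int = 0):
    """Read one ProbabilityTable.

    read_symbol decodes one ExplicitSymbolData (or an Index, for indexed
    tables). indexed_count > 0 means the table is a
    ProbabilityTableIndexedSymbols with that many grammar-defined symbols.
    """
    tag = data[pos]
    pos += 1
    if tag == 0x02:  # ProbabilityTableUnreachable
        return ("unreachable", None), pos
    if tag == 0x00:  # ProbabilityTableOptimizedOne
        symbol, pos = read_symbol(data, pos)
        return ("one", symbol), pos
    if tag != 0x01:
        raise ValueError(f"unknown table tag {tag:#04x}")
    if indexed_count:  # IndexedSymbols: one u8 per grammar-defined symbol
        probs = list(data[pos:pos + indexed_count])
        return ("indexed", probs), pos + indexed_count
    n, pos = read_varnum(data, pos)  # ExplicitSymbols: n, then n u8s
    probs = list(data[pos:pos + n])
    pos += n
    symbols: List[object] = []
    for _ in range(n):  # ...then n symbols
        symbol, pos = read_symbol(data, pos)
        symbols.append(symbol)
    return ("explicit", list(zip(symbols, probs))), pos
```

Keeping the symbol reader as a parameter mirrors the grammar: the table layout is shared, and only `ExplicitSymbolData` varies with the field type.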

An `Index` is an index into a list of well-known symbols (enums, booleans, sums of interfaces). The list is
extracted statically from the grammar.

FIXME: Specify the order of well-known symbols.

Both `ExplicitSymbolStringIndex` and `ExplicitSymbolOptionalStringIndex` are indices into the list of strings,
in the order given by `StringPrelude`. For `ExplicitSymbolOptionalStringIndex`, the value `0` denotes an absent
string, while a non-zero value `n` denotes the string at index `n - 1`.
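
Resolving both kinds of string symbol against the string dictionary takes only a few lines. This sketch assumes `varnum` is unsigned LEB128 (not defined in this document) and treats `0` as the absent case for optional strings; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def read_varnum(data: bytes, pos: int) -> Tuple[int, int]:
    """Unsigned LEB128 (ASSUMED encoding of `varnum`)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_string_symbol(data: bytes, pos: int,
                       strings: List[bytes]) -> Tuple[bytes, int]:
    """ExplicitSymbolStringIndex: a direct index into the StringPrelude list."""
    index, pos = read_varnum(data, pos)
    return strings[index], pos


def read_optional_string_symbol(
        data: bytes, pos: int,
        strings: List[bytes]) -> Tuple[Optional[bytes], int]:
    """ExplicitSymbolOptionalStringIndex: 0 is absent, n > 0 is strings[n - 1]."""
    n, pos = read_varnum(data, pos)
    if n == 0:
        return None, pos
    return strings[n - 1], pos
```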

# AST

AST definitions are recursive: an AST may itself contain further AST definitions,
which are used to represent lazy functions.

```
AST               ::= RootNode n=NumberOfLazyParts LazyPartByteLen{n} LazyAST{n}
NumberOfLazyParts ::= varnum
LazyPartByteLen   ::= varnum
LazyAST           ::= Node
```

In the definition of `AST`, for each `i`, `LazyPartByteLen[i]` is the number
of bytes used to store the sub-AST `LazyAST[i]`.
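
Given the byte offset where `RootNode` ends, the lazy parts can be sliced out using their stored lengths. This sketch assumes `varnum` is unsigned LEB128 and that `RootNode` ends on a byte boundary; neither is stated in this document, and a real decoder only learns that offset by decoding `RootNode`. `split_lazy_parts` is a hypothetical name.

```python
from typing import List, Tuple


def read_varnum(data: bytes, pos: int) -> Tuple[int, int]:
    """Unsigned LEB128 (ASSUMED encoding of `varnum`)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_lazy_parts(data: bytes, pos: int) -> Tuple[List[bytes], int]:
    """Return the raw bytes of each LazyAST[i] and the offset after the last one.

    `pos` is the byte offset just past RootNode.
    """
    count, pos = read_varnum(data, pos)  # NumberOfLazyParts
    lengths = []
    for _ in range(count):
        length, pos = read_varnum(data, pos)  # LazyPartByteLen[i]
        lengths.append(length)
    parts = []
    for length in lengths:
        part = data[pos:pos + length]  # raw bytes of LazyAST[i]
        if len(part) != length:
            raise ValueError("truncated lazy part")
        parts.append(part)
        pos += length
    return parts, pos
```

Because every `LazyPartByteLen` precedes the lazy ASTs themselves, a decoder can locate any lazy part without decoding the ones before it.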

# Nodes

Nodes are stored as sequences of Huffman-encoded values. Note that the encoding uses
numerous distinct Huffman tables: each `(parent tag, value type)` pair determines the
Huffman table used to decode the next few bits in the sequence.

```
RootNode                ::= Value(ε)*
Node(parent)            ::= t=Tag(parent) Field(t)*
Tag(parent)             ::= Primitive(parent, TAG)
Value(parent)           ::= ""                                      # If field is lazy
                          | Node(parent)                            # If field is an interface or sum of interfaces
                          | List(parent)                            # If field is a list
                          | Primitive(parent, U32)                  # If field is a u32
                          | Primitive(parent, I32)                  # If field is an i32
                          | Primitive(parent, F64)                  # If field is an f64
                          | Primitive(parent, StringIndex)          # If field is a string
                          | Primitive(parent, OptionalStringIndex)  # If field is an optional string
List(parent)            ::= ListLength(parent) Value(parent)*
ListLength(parent)      ::= Primitive(ListLength<parent>, U32)      # List lengths are u32 values with a special parent
Primitive(parent, type) ::= bit*
```

For every instance of `Primitive(parent, type)`, the Huffman table given by `huffman_at(parent, type)` (see above)
is used both to determine the number of bits to read and to interpret those bits as a value of the corresponding `type`.
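
As an illustration of how `huffman_at` drives decoding, the sketch below reads one `Primitive(parent, type)` from a bit stream. It assumes each Huffman table has already been turned into a mapping from codeword (a string of `'0'`/`'1'`) to value, and that bits are consumed most-significant-bit first; this document specifies neither how codewords are derived from the `Probability` entries nor the bit order. `BitReader` and `read_primitive` are hypothetical names.

```python
from typing import Dict, Hashable, Tuple


class BitReader:
    """Reads bits MSB-first from a byte string (ASSUMED bit order)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bit_pos = 0

    def read_bit(self) -> str:
        byte_index, offset = divmod(self.bit_pos, 8)
        if byte_index >= len(self.data):
            raise EOFError("ran out of bits")
        self.bit_pos += 1
        return "1" if (self.data[byte_index] >> (7 - offset)) & 1 else "0"


# huffman_at: (parent, type) -> {codeword: value}; the contents are hypothetical.
HuffmanModel = Dict[Tuple[Hashable, Hashable], Dict[str, object]]


def read_primitive(bits: BitReader, huffman_at: HuffmanModel,
                   parent: Hashable, type_: Hashable) -> object:
    """Decode one Primitive(parent, type) using huffman_at[(parent, type)]."""
    table = huffman_at[(parent, type_)]
    code = ""
    # A prefix-free code guarantees the first match is the intended symbol.
    # A table keyed by the empty codeword "" consumes no bits at all.
    while code not in table:
        code += bits.read_bit()
    return table[code]
```

For example, with the table `{"0": "A", "10": "B", "11": "C"}`, the bits `0 10 11` decode to `A`, `B`, `C`.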