|
| 1 | +.. Licensed to the Apache Software Foundation (ASF) under one |
| 2 | + or more contributor license agreements. See the NOTICE file |
| 3 | + distributed with this work for additional information |
| 4 | + regarding copyright ownership. The ASF licenses this file |
| 5 | + to you under the Apache License, Version 2.0 (the |
| 6 | + "License"); you may not use this file except in compliance |
| 7 | + with the License. You may obtain a copy of the License at |
| 8 | +
|
| 9 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 10 | +
|
| 11 | + Unless required by applicable law or agreed to in writing, |
| 12 | + software distributed under the License is distributed on an |
| 13 | + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 14 | + KIND, either express or implied. See the License for the |
| 15 | + specific language governing permissions and limitations |
| 16 | + under the License. |
| 17 | +
|
| 18 | +.. include:: ../../common.defs |
| 19 | + |
| 20 | +.. _binary-log-v3-format: |
| 21 | + |
| 22 | +Self-Describing Binary Log Format (v3) |
| 23 | +************************************** |
| 24 | + |
| 25 | +This page specifies the on-disk format of a binary log segment, version 3, in |
| 26 | +enough detail to implement a decoder *without* the Traffic Server source tree. |
| 27 | +A version 3 segment is **self-describing**: every field's type is published in |
| 28 | +the segment header, so a generic reader can decode each entry by dispatching on |
| 29 | +a small, stable set of type codes — no embedded copy of the ATS symbol-to-type |
| 30 | +table is required. |
| 31 | + |
| 32 | +Motivation |
| 33 | +========== |
| 34 | + |
| 35 | +In version 2, a segment header carries the field *symbols* (``fmt_fieldlist``, |
| 36 | +e.g. ``"chi cqu pssc"``) and a printf-style *template* (``fmt_printf``) but |
| 37 | +**not** the field types. To decode an entry a reader had to already know the |
| 38 | +type of each symbol, because the value encodings are only self-delimiting once |
| 39 | +the type is known (``IP`` is variable length, for example). That coupled every |
| 40 | +out-of-tree parser to the exact ATS build that wrote the log. |
| 41 | + |
| 42 | +Version 3 adds one thing: a per-segment **field-type schema** that lists the |
| 43 | +wire type of every field, in field order. Decoding then needs only the symbols |
| 44 | +(as keys) and the schema (for types). |
| 45 | + |
| 46 | +Segment layout |
| 47 | +============== |
| 48 | + |
| 49 | +A ``.blog`` file is a stream of segments, each a serialized ``LogBuffer``: |
| 50 | + |
| 51 | +:: |
| 52 | + |
| 53 | +   LogBufferHeader (per segment) |
| 54 | +     cookie = 0xaceface |
| 55 | +     version = 3 |
| 56 | +     format_type, byte_count, entry_count, timestamps, flags, signature |
| 57 | +     fmt_name_offset |
| 58 | +     fmt_fieldlist_offset  -> "chi cqu pssc ..." (symbols, space separated) |
| 59 | +     fmt_printf_offset     -> "%<chi> %<cqu> ..." |
| 60 | +     src_hostname_offset, log_filename_offset |
| 61 | +     data_offset           -> first entry |
| 62 | +     fmt_fieldtypes_offset -> field-type schema (NEW in v3) |
| 63 | +   [ LogEntryHeader | field0 field1 field2 ... ] x entry_count |
| 64 | +     LogEntryHeader: timestamp(8) timestamp_usec(4) entry_len(4) |
| 65 | +     fields: concatenated in fieldlist order, no per-field tags |
| 66 | + |
| 67 | +All ``*_offset`` members are byte offsets from the start of the segment (the |
| 68 | +address of the ``LogBufferHeader``). ``fmt_fieldtypes_offset`` is appended |
| 69 | +**after** ``data_offset`` so that the layout through ``data_offset`` is |
| 70 | +byte-identical to version 2; a value of ``0`` means the schema is absent (e.g. |
| 71 | +a text-format segment, or a version 2 segment). |
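
As an illustration of how a reader might resolve one of these offsets, the sketch below reads a string member such as ``fmt_fieldlist`` from a segment held in memory. The helper name ``read_cstring`` is hypothetical. It assumes the referenced strings are NUL-terminated and that ``0`` marks an absent member for every string offset; the text above states the ``0`` convention only for ``fmt_fieldtypes_offset``.

```python
from typing import Optional


def read_cstring(segment: bytes, offset: int) -> Optional[str]:
    """Return the string stored at `offset` bytes from the segment start.

    Assumptions (not stated by the format text): the string is
    NUL-terminated, and an offset of 0 means "member absent".
    """
    if offset == 0:
        return None
    if not 0 < offset < len(segment):
        raise ValueError(f"offset {offset} lies outside the segment")
    end = segment.find(b"\x00", offset)
    if end == -1:
        raise ValueError("string is not NUL-terminated within the segment")
    # The text encoding is not specified here; UTF-8 with replacement is a guess.
    return segment[offset:end].decode("utf-8", errors="replace")
```

Because every offset is relative to the start of the segment, the same helper works regardless of where the segment sits in the ``.blog`` file, provided ``segment`` starts at the ``LogBufferHeader``.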
| 72 | + |
| 73 | +Field-type schema |
| 74 | +================= |
| 75 | + |
| 76 | +At ``fmt_fieldtypes_offset`` the segment stores: |
| 77 | + |
| 78 | +:: |
| 79 | + |
| 80 | + uint16_t field_count; // == number of symbols in fmt_fieldlist |
| 81 | + uint8_t type_code[field_count]; // one type code per field, in order |
| 82 | + |
| 83 | +``type_code[i]`` is the type of the i-th field, which corresponds to the i-th |
| 84 | +symbol in ``fmt_fieldlist`` and the i-th value in each entry. The ``uint16_t`` |
| 85 | +``field_count`` prefix is written in **host byte order**, like the rest of |
| 86 | +``LogBufferHeader``. The blob is padded along with the header to an 8-byte |
| 87 | +boundary. |
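
A minimal sketch of reading this blob, assuming the whole segment is available as a ``bytes`` object and the reader runs on a host with the same endianness as the writer. The function name ``read_schema`` is hypothetical.

```python
import struct
from typing import List, Optional


def read_schema(segment: bytes, fieldtypes_offset: int) -> Optional[List[int]]:
    """Return the per-field type codes, or None when the schema is absent."""
    if fieldtypes_offset == 0:
        return None  # 0 means "no schema" (text-format or version 2 segment)
    # "=H": uint16_t in native byte order, standard size (2 bytes).
    (field_count,) = struct.unpack_from("=H", segment, fieldtypes_offset)
    start = fieldtypes_offset + 2
    codes = segment[start:start + field_count]
    if len(codes) != field_count:
        raise ValueError("schema is truncated")
    return list(codes)  # one small integer per field, in field order
```

The trailing padding needs no special handling here, because the reader consumes exactly ``field_count`` bytes after the prefix.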
| 88 | + |
| 89 | +The schema carries no independent version of its own: the segment ``version`` |
| 90 | +(``3`` here) governs this layout, so a future schema change rides the same |
| 91 | +``LOG_SEGMENT_VERSION`` bump rather than a second, separate counter. |
| 92 | + |
| 93 | +Stable type codes |
| 94 | +================= |
| 95 | + |
| 96 | +The type codes are the values of the in-tree ``LogField::Type`` enumeration, |
| 97 | +serialized directly. They are part of the published format and are |
| 98 | +**append-only**: codes are never renumbered or reused. |
| 99 | + |
| 100 | +==== ========= =========================================================== |
| 101 | +Code Name      Wire encoding |
| 102 | +==== ========= =========================================================== |
| 103 | +0    INVALID   Reserved. Not emitted by a correct writer; a reader that |
| 104 | +               meets it -- or any code it does not recognize -- cannot |
| 105 | +               determine the field length and must stop decoding the entry. |
| 106 | +1    sINT      A single ``int64_t``, fixed 8 bytes, **host byte order**. |
| 107 | +2    dINT      Two ``int64_t`` (16 bytes), host byte order. Used for |
| 108 | +               values stored as two integers, e.g. HTTP version |
| 109 | +               major/minor. |
| 110 | +3    STRING    NUL-terminated bytes, then padded to an 8-byte boundary. |
| 111 | +4    IP        ``uint16_t`` address family followed by a family-sized |
| 112 | +               address, then padded to an 8-byte boundary (see below). |
| 113 | +==== ========= =========================================================== |
| 114 | + |
| 115 | +The code reflects how the value is *framed* on disk, i.e. how a reader walks |
| 116 | +(or skips) it -- not what the value means. (The ``sINT``/``dINT`` names are an |
| 117 | +ATS-internal distinction; on the wire ``sINT`` is one 8-byte integer and |
| 118 | +``dINT`` is two consecutive ones.) How a consumer *renders* a value -- mapping |
| 119 | +a cache-result integer to ``TCP_HIT``, or a ``dINT`` to ``1.1`` -- is layered |
| 120 | +on top by the consumer and is not part of the wire format. |
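
One way an out-of-tree reader might mirror the table is sketched below. The names ``WireType`` and ``check_schema`` are hypothetical and only the numeric values come from the table. The check rejects ``INVALID`` and unrecognized codes up front, since a field of unknown type cannot be skipped.

```python
from enum import IntEnum
from typing import Iterable, List


class WireType(IntEnum):
    """Type codes from the table above (numeric values only)."""
    INVALID = 0
    SINT = 1
    DINT = 2
    STRING = 3
    IP = 4


def check_schema(type_codes: Iterable[int]) -> List[WireType]:
    """Convert raw codes to WireType, refusing codes that cannot be framed."""
    checked = []
    for index, code in enumerate(type_codes):
        try:
            wire_type = WireType(code)
        except ValueError:
            raise ValueError(f"field {index}: unrecognized type code {code}") from None
        if wire_type is WireType.INVALID:
            raise ValueError(f"field {index}: INVALID type code")
        checked.append(wire_type)
    return checked
```

Because the codes are append-only, a reader written this way keeps decoding older segments unchanged and fails loudly, rather than silently misframing, when it meets a code added after it was written.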
| 121 | + |
| 122 | +Value encodings |
| 123 | +=============== |
| 124 | + |
| 125 | +sINT |
| 126 | + An ``int64_t`` occupying exactly 8 bytes, in **host byte order** (as in |
| 127 | + version 2). Integer values are not endianness-normalized, so a ``.blog`` is |
| 128 | + not portable across hosts of differing endianness; cross-architecture |
| 129 | + portability is future work. |
| 130 | + |
| 131 | +dINT |
| 132 | + Two consecutive ``sINT`` values: 16 bytes total, in host byte order. Used |
| 133 | + where one log field is stored as two integers, such as an HTTP version |
| 134 | + (major then minor). The reference decoder renders it as a JSON array, e.g. |
| 135 | + ``[1,1]``; turning that into ``1.1`` is a consumer concern. |
| 136 | + |
| 137 | +STRING |
| 138 | + The string bytes followed by a single NUL, then zero padding up to the next |
| 139 | + 8-byte boundary. The on-wire length is therefore |
| 140 | + ``align_up(strlen + 1, 8)``. An empty/absent string is written as ``"-"``. |
| 141 | + |
| 142 | +IP |
| 143 | + A ``uint16_t`` address family in host byte order, then: |
| 144 | + |
| 145 | + .. list-table:: |
| 146 | + :header-rows: 1 |
| 147 | + :widths: 30 70 |
| 148 | + |
| 149 | + * - Family |
| 150 | + - Following bytes |
| 151 | + * - ``AF_INET`` (IPv4) |
| 152 | + - 4-byte ``in_addr`` |
| 153 | + * - ``AF_INET6`` (IPv6) |
| 154 | + - 16-byte ``in6_addr`` |
| 155 | + * - ``AF_UNIX`` |
| 156 | + - fixed-size path buffer |
| 157 | + * - ``AF_UNSPEC`` / other |
| 158 | + - no address bytes |
| 159 | + |
| 160 | + The whole field is padded to the next 8-byte boundary. Because the length |
| 161 | + depends on the family word *inside* the value, only a reader that knows the |
| 162 | + field is an ``IP`` (from the schema) can compute its size, which is exactly |
| 163 | + why the schema is required to skip or decode unknown fields safely. |
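
The four encodings above can be sketched as small decoders that each return the value and the next read position. This is an illustration under stated assumptions, not the in-tree code: all function names are hypothetical; the ``AF_*`` constants are taken from the reader's own ``socket`` module, so the writer is assumed to run on the same platform; the address is assumed to start immediately after the 2-byte family word (if the writer serializes a C struct, compiler padding could shift it, so check against real data); and ``AF_UNIX`` is rejected because the size of its path buffer is not given here.

```python
import ipaddress
import socket
import struct


def align_up(n, boundary=8):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def decode_sint(buf, pos):
    """sINT: one int64 in native byte order."""
    (value,) = struct.unpack_from("=q", buf, pos)
    return value, pos + 8


def decode_dint(buf, pos):
    """dINT: two consecutive int64 values, returned as a 2-element list."""
    first, second = struct.unpack_from("=qq", buf, pos)
    return [first, second], pos + 16


def decode_string(buf, pos):
    """STRING: bytes up to a NUL, then padding to align_up(strlen + 1, 8)."""
    end = buf.index(b"\x00", pos)  # ValueError if the NUL is missing
    # The text encoding is unspecified; UTF-8 with replacement is a guess.
    text = buf[pos:end].decode("utf-8", errors="replace")
    return text, pos + align_up(end - pos + 1)


def decode_ip(buf, pos):
    """IP: uint16 family, family-sized address, padded to 8 bytes."""
    (family,) = struct.unpack_from("=H", buf, pos)
    if family == socket.AF_INET:
        size, parse = 4, ipaddress.IPv4Address
    elif family == socket.AF_INET6:
        size, parse = 16, ipaddress.IPv6Address
    elif family == getattr(socket, "AF_UNIX", None):
        raise NotImplementedError("AF_UNIX path buffer size is not given here")
    else:
        size, parse = 0, None  # AF_UNSPEC / other: no address bytes
    raw = bytes(buf[pos + 2:pos + 2 + size])
    if len(raw) != size:
        raise ValueError("truncated IP field")
    value = str(parse(raw)) if parse else None
    return value, pos + align_up(2 + size)
```

Under these assumptions an IPv4 field occupies 8 bytes and an IPv6 field 24 bytes, which is why a reader cannot step over an ``IP`` value without first reading its family word.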
| 164 | + |
| 165 | +Decoding an entry |
| 166 | +================= |
| 167 | + |
| 168 | +Given a segment, a generic decoder: |
| 169 | + |
| 170 | +#. Reads ``field_count`` and the ``type_code[]`` array from the schema at |
| 171 | + ``fmt_fieldtypes_offset``. |
| 172 | +#. Splits ``fmt_fieldlist`` into ``field_count`` whitespace-separated symbols. |
| 173 | +#. For each entry (located via ``data_offset`` and walked using |
| 174 | + ``LogEntryHeader::entry_len``), reads the fields left to right, using |
| 175 | + ``type_code[i]`` to pick the encoding above and advance the read cursor. |
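
The steps above can be sketched as a short entry walker. It is a simplified illustration with hypothetical names: it handles only ``sINT``, ``dINT`` and ``STRING`` and stops on any other code; it assumes ``entry_len`` counts the ``LogEntryHeader`` itself, which this page does not state; and it assumes the header members are signed, signed and unsigned respectively.

```python
import struct

# timestamp(8) timestamp_usec(4) entry_len(4), native byte order.
# Signedness of each member is an assumption.
ENTRY_HEADER = struct.Struct("=qiI")


def align_up(n, boundary=8):
    return (n + boundary - 1) // boundary * boundary


def decode_fields(payload, symbols, type_codes):
    """Walk one entry's fields left to right, keyed by symbol."""
    record, pos = {}, 0
    for symbol, code in zip(symbols, type_codes):
        if code == 1:  # sINT
            (record[symbol],) = struct.unpack_from("=q", payload, pos)
            pos += 8
        elif code == 2:  # dINT
            record[symbol] = list(struct.unpack_from("=qq", payload, pos))
            pos += 16
        elif code == 3:  # STRING
            end = payload.index(b"\x00", pos)
            record[symbol] = payload[pos:end].decode("utf-8", errors="replace")
            pos += align_up(end - pos + 1)
        else:
            # IP is omitted for brevity; unknown codes cannot be skipped.
            raise ValueError(f"{symbol}: type code {code} not handled in this sketch")
    return record


def iter_entries(segment, data_offset, entry_count, fieldlist, type_codes):
    """Yield (timestamp, timestamp_usec, record) for each entry in a segment."""
    symbols = fieldlist.split()
    if len(symbols) != len(type_codes):
        raise ValueError("fmt_fieldlist and schema disagree on the field count")
    pos = data_offset
    for _ in range(entry_count):
        timestamp, usec, entry_len = ENTRY_HEADER.unpack_from(segment, pos)
        if entry_len < ENTRY_HEADER.size:
            raise ValueError("entry_len is smaller than the entry header")
        # Assumption: entry_len includes the 16-byte header.
        payload = segment[pos + ENTRY_HEADER.size:pos + entry_len]
        yield timestamp, usec, decode_fields(payload, symbols, type_codes)
        pos += entry_len
```

Advancing by ``entry_len`` rather than by the decoded field sizes keeps the walk aligned with the writer even if a single entry fails to decode.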
| 176 | + |
| 177 | +The reference implementation is ``log_entry_to_json()`` |
| 178 | +(``src/traffic_logcat/LogEntryJson.cc``), which renders an entry as a JSON |
| 179 | +object using only the symbols and the schema — it does not consult the global |
| 180 | +field table. It is exposed by :program:`traffic_logcat`'s ``-j``/``--json`` |
| 181 | +option. For example, a three-field entry decodes to: |
| 182 | + |
| 183 | +:: |
| 184 | + |
| 185 | + {"chi":"192.0.2.10","cqu":"GET /index.html","pssc":200} |
| 186 | + |
| 187 | +.. note:: |
| 188 | + |
| 189 | + Some integer fields hold coded values (cache result, hierarchy, finish |
| 190 | + status, etc.). The binary format stores the raw integer; mapping it to a |
| 191 | + mnemonic such as ``TCP_HIT`` is a presentation concern left to the consumer. |
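
A consumer-side rendering layer might look like the sketch below. Both helpers are hypothetical, and the mnemonic table in the usage is a placeholder: the real integer-to-mnemonic assignments are defined by Traffic Server and are not reproduced on this page.

```python
def render_version(pair):
    """Render a decoded dINT such as [1, 1] as the string "1.1"."""
    major, minor = pair
    return f"{major}.{minor}"


def render_coded(value, names):
    """Map a raw coded integer to a mnemonic, falling back to the number."""
    return names.get(value, str(value))


# Placeholder table for illustration only; not the real ATS assignments.
EXAMPLE_NAMES = {7: "EXAMPLE_MNEMONIC"}
print(render_version([1, 1]))
print(render_coded(7, EXAMPLE_NAMES))
```

Keeping this mapping outside the decoder means the wire format stays stable even if a consumer prefers different labels.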
| 192 | + |
| 193 | +Compatibility |
| 194 | +============= |
| 195 | + |
| 196 | +* **New reader, old file (v3 reader, v2 file):** supported. The readers shipped |
| 197 | + with Traffic Server accept the inclusive version range |
| 198 | + ``[2, 3]`` and size the header read to the on-disk version, so a v2 segment |
| 199 | + (which has no ``fmt_fieldtypes_offset``) still decodes. Its ASCII output is |
| 200 | + produced from ``fmt_fieldlist`` + ``fmt_printf`` exactly as before. |
| 201 | +* **Old reader, new file (v2 reader, v3 file):** a reader built before v3 |
| 202 | + support gates on the version and will skip v3 segments. v3 logs therefore |
| 203 | + require tooling from a release that understands v3. As an escape hatch, a |
| 204 | + binary log object can be pinned to the version 2 layout with |
| 205 | + ``binary_log_version: 2`` in :file:`logging.yaml`, so a not-yet-upgraded |
| 206 | + downstream parser keeps working during a migration. |
| 207 | +* The text/Squid/CLF ASCII output paths are unchanged: the schema is additive |
| 208 | + and ignored when rendering ASCII. |
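
The version gate described above can be sketched as follows. The function and constant names are hypothetical; only the accepted range ``[2, 3]`` and the meaning of a zero ``fmt_fieldtypes_offset`` come from this page.

```python
MIN_SEGMENT_VERSION = 2
MAX_SEGMENT_VERSION = 3


def schema_available(version, fieldtypes_offset=0):
    """Decide whether a segment carries a usable field-type schema.

    Raises ValueError for versions outside the accepted range, which a
    real reader would treat as "skip this segment".
    """
    if not MIN_SEGMENT_VERSION <= version <= MAX_SEGMENT_VERSION:
        raise ValueError(f"unsupported segment version {version}")
    # A v2 header has no fmt_fieldtypes_offset member at all, so callers
    # should pass 0 for it; v3 uses 0 to mean "schema absent".
    return version >= 3 and fieldtypes_offset != 0
```

When this returns ``False`` for an accepted version, a reader falls back to the version 2 behavior: it needs its own symbol-to-type knowledge to decode the entries.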
| 209 | + |
| 210 | +.. note:: |
| 211 | + |
| 212 | + v3 does not change integer endianness: field values, the integers in |
| 213 | + ``LogBufferHeader`` / ``LogEntryHeader``, and the ``IP`` family word are all |
| 214 | + written in host byte order, as in v2. A ``.blog`` is therefore not portable |
| 215 | + across hosts of differing endianness; cross-architecture portability is |
| 216 | + future work. |