| 1 | +# Tracegrind MsgPack+LZ4 Output Format |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Tracegrind's `--output-format=msgpack` produces a binary trace file combining MsgPack serialization with LZ4 block compression. Files use the `.msgpack.lz4` extension. |
| 6 | + |
| 7 | +## File Structure |
| 8 | + |
| 9 | +``` |
| 10 | +┌─────────────────────────────────┐ |
| 11 | +│  File Header (8 bytes)          │ |
| 12 | +├─────────────────────────────────┤ |
| 13 | +│  Schema Chunk                   │ |
| 14 | +├─────────────────────────────────┤ |
| 15 | +│  Data Chunk 1..N                │ |
| 16 | +├─────────────────────────────────┤ |
| 17 | +│  End Marker (8 bytes)           │ |
| 18 | +└─────────────────────────────────┘ |
| 19 | +``` |
| 20 | + |
| 21 | +## File Header |
| 22 | + |
| 23 | +| Offset | Size | Field | Description | |
| 24 | +|--------|------|---------|-------------| |
| 25 | +| 0 | 4 | magic | ASCII `TGMP` (0x54 0x47 0x4D 0x50) | |
| 26 | +| 4 | 4 | version | Format version, uint32 LE (currently 1) | |
| 27 | + |
| 28 | +## Chunk Format |
| 29 | + |
| 30 | +Every chunk, schema or data, uses the same layout: an 8-byte header followed by the compressed payload. |
| 31 | + |
| 32 | +| Offset | Size | Field | Description | |
| 33 | +|--------|------|-------------------|-------------| |
| 34 | +| 0 | 4 | uncompressed_size | Size after decompression, uint32 LE | |
| 35 | +| 4 | 4 | compressed_size | Size of LZ4 block, uint32 LE | |
| 36 | +| 8 | N | data | LZ4 block-compressed MsgPack data (N = compressed_size) | |
| 37 | + |
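The chunk framing can be walked without decompressing anything. The following sketch uses only the standard library: it yields each chunk's declared uncompressed size together with its still-compressed payload, and stops at the all-zero header described under End Marker. `iter_chunks` is a hypothetical helper name, and the stream is assumed to be positioned just past the 8-byte file header.

```python
import io
import struct

def iter_chunks(stream):
    """Yield (uncompressed_size, compressed_payload) for each chunk."""
    while True:
        header = stream.read(8)
        if len(header) != 8:
            raise ValueError("truncated chunk header")
        usize, csize = struct.unpack("<II", header)
        if usize == 0 and csize == 0:  # end marker: stop iterating
            return
        payload = stream.read(csize)
        if len(payload) != csize:
            raise ValueError("truncated chunk payload")
        yield usize, payload

# Two fake chunks (the payload bytes are placeholders, not real LZ4 data),
# followed by the 8-byte end marker.
raw = (struct.pack("<II", 10, 3) + b"abc"
       + struct.pack("<II", 20, 2) + b"de"
       + bytes(8))
print(list(iter_chunks(io.BytesIO(raw))))  # [(10, b'abc'), (20, b'de')]
```

Checking the length of every read turns a truncated file into an explicit error instead of a confusing `struct.error` or a silently short payload.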
| 38 | +## Schema Chunk |
| 39 | + |
| 40 | +The first chunk contains a MsgPack map: |
| 41 | + |
| 42 | +```json |
| 43 | +{ |
| 44 | +  "version": 1, |
| 45 | +  "format": "tracegrind-msgpack", |
| 46 | +  "columns": ["seq", "tid", "event", "fn", "obj", "file", "line", "Ir", ...] |
| 47 | +} |
| 48 | +``` |
| 49 | + |
| 50 | +### Fixed Columns |
| 51 | + |
| 52 | +| Index | Name | Type | Description | |
| 53 | +|-------|-------|--------|-------------| |
| 54 | +| 0 | seq | uint64 | Sequence number | |
| 55 | +| 1 | tid | int32 | Thread ID | |
| 56 | +| 2 | event | int | 0 = ENTER, 1 = EXIT | |
| 57 | +| 3 | fn | string | Function name | |
| 58 | +| 4 | obj | string | Shared object path | |
| 59 | +| 5 | file | string | Source file path | |
| 60 | +| 6 | line | int32 | Line number (0 if unknown) | |
| 61 | + |
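To illustrate how the `tid` and `event` columns might be used together, here is a small sketch that tracks call depth per thread. It assumes that ENTER and EXIT events nest properly within each thread, which this document does not state explicitly; `call_depths` is a hypothetical helper, and the sample rows are made up.

```python
from collections import defaultdict

ENTER, EXIT = 0, 1  # values of the "event" column

def call_depths(rows):
    """Return (seq, tid, fn, depth) tuples, keeping one call stack per thread."""
    stacks = defaultdict(list)
    out = []
    for seq, tid, event, fn, *_rest in rows:
        stack = stacks[tid]
        if event == ENTER:
            out.append((seq, tid, fn, len(stack)))
            stack.append(fn)
        elif event == EXIT and stack:
            stack.pop()
            out.append((seq, tid, fn, len(stack)))
    return out

# Made-up rows holding only the seven fixed columns.
rows = [
    [0, 1, ENTER, "main", "a.out", "main.c", 3],
    [1, 1, ENTER, "f", "a.out", "main.c", 10],
    [2, 1, EXIT, "f", "a.out", "main.c", 10],
    [3, 1, EXIT, "main", "a.out", "main.c", 3],
]
for entry in call_depths(rows):
    print(entry)
```
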
| 62 | +### Event Columns (index 7+) |
| 63 | + |
| 64 | +Event counters are stored as delta values: `Ir`, `Dr`, `Dw`, `I1mr`, `D1mr`, `D1mw`, `ILmr`, `DLmr`, `DLmw`, `Bc`, `Bcm`, `Bi`, `Bim`. Which of these columns are present depends on the Tracegrind options in effect. |
| 65 | + |
| 66 | +## Data Chunks |
| 67 | + |
| 68 | +Each data chunk contains concatenated MsgPack arrays (one per row): |
| 69 | + |
| 70 | +``` |
| 71 | +[seq, tid, event, fn, obj, file, line, delta_Ir, ...] |
| 72 | +``` |
| 73 | + |
| 74 | +The reference implementation writes 4096 rows per chunk. |
| 75 | + |
| 76 | +## End Marker |
| 77 | + |
| 78 | +The file ends with 8 zero bytes, i.e. a chunk header with uncompressed_size = 0 and compressed_size = 0 and no payload. |
| 79 | + |
| 80 | +## Example: Reading in Python |
| 81 | + |
| 82 | +```python |
| 83 | +import struct, lz4.block, msgpack |
| 84 | + |
| 85 | +def read_tracegrind(filepath): |
| 86 | +    with open(filepath, 'rb') as f: |
| 87 | +        assert f.read(4) == b'TGMP'  # magic |
| 88 | +        version = struct.unpack('<I', f.read(4))[0]  # format version (currently 1) |
| 89 | + |
| 90 | +        # Read schema chunk |
| 91 | +        usize, csize = struct.unpack('<II', f.read(8)) |
| 92 | +        schema = msgpack.unpackb( |
| 93 | +            lz4.block.decompress(f.read(csize), uncompressed_size=usize), raw=False) |
| 94 | +        columns = [c.decode() if isinstance(c, bytes) else c |
| 95 | +                   for c in schema['columns']]  # raw=False yields str keys |
| 96 | + |
| 97 | +        # Read data chunks |
| 98 | +        rows = [] |
| 99 | +        while True: |
| 100 | +            usize, csize = struct.unpack('<II', f.read(8)) |
| 101 | +            if usize == 0 and csize == 0:  # end marker |
| 102 | +                break |
| 103 | +            chunk = lz4.block.decompress(f.read(csize), uncompressed_size=usize) |
| 104 | +            unpacker = msgpack.Unpacker(raw=False) |
| 105 | +            unpacker.feed(chunk) |
| 106 | +            for row in unpacker:  # one MsgPack array per row |
| 107 | +                rows.append(dict(zip(columns, row))) |
| 108 | + |
| 109 | +    return columns, rows |
| 110 | +``` |
| 111 | + |
| 112 | +## References |
| 113 | + |
| 114 | +- [MsgPack Specification](https://github.com/msgpack/msgpack/blob/master/spec.md) |
| 115 | +- [LZ4 Block Format](https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md) |
| 116 | + |
| 117 | +## Reference Implementation |
| 118 | + |
| 119 | +- `tracegrind/tg_msgpack.c/h` - MsgPack encoder |
| 120 | +- `tracegrind/tg_lz4.c/h` - LZ4 compression wrapper |
| 121 | +- `tracegrind/lz4.c/h` - Vendored LZ4 library |
| 122 | +- `tracegrind/dump.c` - Trace output integration |