Commit dccfb24: Add tracing support to the compressor (#7385)
## Summary

Tracking issue: #7216

We have very little observability into the compressor. When debugging, we
have no real idea which schemes the compressor is trying, how good or bad
its estimates are, how reliable sampling is, what the cascading paths look
like, and so on.

This change adds structured `tracing` support to `vortex-compressor`.
The compressor now emits a top-level `compress` span and decision/debug
events on the `vortex_compressor::encode` target, so a normal tracing
subscriber can see what the compressor sampled, selected,
accepted/rejected, and where nested failures happened.

The `scheme.compress_result` event reports `scheme`, `before_nbytes`,
`after_nbytes`, `estimated_ratio` when available, `actual_ratio` when
available, and `accepted`. Sampling is recorded through `sample.result`;
compression failures are recorded through `scheme.compress_failed` /
`sample.compress_failed` with `cascade_path` and `cascade_depth`.
Zero-byte outputs intentionally omit ratio fields instead of logging
infinities.
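
The ratio handling described above can be illustrated with a small sketch. It is written in Python purely for illustration; the actual change lives in the Rust `vortex-compressor` crate and emits `tracing` events. The function name `compress_result_fields` is hypothetical, and the ratio definition (`before_nbytes / after_nbytes`) is an assumption inferred from the example output in this description, not taken from the code.

```python
from typing import Any, Optional


def compress_result_fields(
    scheme: str,
    before_nbytes: int,
    after_nbytes: int,
    estimated_ratio: Optional[float],
    accepted: bool,
) -> dict[str, Any]:
    """Build the fields of a hypothetical `scheme.compress_result` event."""
    fields: dict[str, Any] = {
        "scheme": scheme,
        "before_nbytes": before_nbytes,
        "after_nbytes": after_nbytes,
        "accepted": accepted,
    }
    # The estimate is only reported when one is available.
    if estimated_ratio is not None:
        fields["estimated_ratio"] = estimated_ratio
    # Assumed definition: ratio = before / after. A zero-byte output would
    # make this infinite, so the field is omitted instead of logged.
    if after_nbytes > 0:
        fields["actual_ratio"] = before_nbytes / after_nbytes
    return fields
```

Leaving the key out, rather than writing a sentinel value, lets downstream filters such as `.fields.actual_ratio != null` skip these events without special cases.
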

This also adds JSON formatting to the benchmark logging setup via
`--log-format json`, which makes `data-gen` / `compress-bench` output
usable as JSONL. One useful workflow is to generate TPC-H data with
compressor logs enabled and use `jq` to find over-optimistic estimates
that were rejected.

<details>
<summary>Example jq query for rejected over-estimates</summary>
```fish
set LOG data-gen.jsonl

jq -R -s -r '
  # Parse each input line as JSON, dropping lines that are not valid JSON.
  def rows: split("\n") | map(fromjson? // empty);

  # Round numbers to three decimal places; pass other values through.
  def r:
    if type == "number" then ((. * 1000 | round) / 1000)
    else .
    end;

  # Header row.
  ([
    "estimated_over_actual",
    "scheme",
    "estimated_ratio",
    "actual_ratio",
    "before_nbytes",
    "after_nbytes",
    "extra_bytes",
    "span_dtype",
    "span_len"
  ] | @tsv),
  (
    rows
    # Keep rejected results whose estimate was better than the actual ratio.
    | map(select(.target == "vortex_compressor::encode"))
    | map(select(.fields.message == "scheme.compress_result"))
    | map(select(.fields.accepted == false))
    | map(select(.fields.estimated_ratio != null and .fields.actual_ratio != null))
    | map(select(.fields.estimated_ratio > .fields.actual_ratio))
    | map(.fields as $f | {
        scheme: $f.scheme,
        estimated_ratio: $f.estimated_ratio,
        actual_ratio: $f.actual_ratio,
        estimated_over_actual: ($f.estimated_ratio / $f.actual_ratio),
        before_nbytes: $f.before_nbytes,
        after_nbytes: $f.after_nbytes,
        extra_bytes: ($f.after_nbytes - $f.before_nbytes),
        span_dtype: (.span.dtype // ""),
        span_len: (.span.len // "")
      })
    # Worst over-estimates first, top 50 only.
    | sort_by(.estimated_over_actual) | reverse
    | .[:50][]
    | [
        (.estimated_over_actual | r),
        .scheme,
        (.estimated_ratio | r),
        (.actual_ratio | r),
        .before_nbytes,
        .after_nbytes,
        .extra_bytes,
        .span_dtype,
        .span_len
      ]
    | @tsv
  )
' $LOG
```
```
estimated_over_actual scheme estimated_ratio actual_ratio before_nbytes after_nbytes extra_bytes span_dtype span_len
512 vortex.int.for 1.6 0.003 2 640 638 utf8? 2
512 vortex.int.for 2.667 0.005 2 384 382 utf8? 2
512 vortex.int.for 5.333 0.01 8 768 760 decimal(15,2)? 2
512 vortex.int.for 4.571 0.009 8 896 888 decimal(15,2)? 2
512 vortex.int.for 4 0.008 8 1024 1016 decimal(15,2)? 2
512 vortex.int.for 1.6 0.003 2 640 638 utf8? 2
512 vortex.int.for 8 0.016 2 128 126 utf8? 2
512 vortex.int.for 2.56 0.005 16 3200 3184 i64? 2
512 vortex.int.for 5.818 0.011 16 1408 1392 i64? 2
256 vortex.int.for 2 0.008 4 512 508 utf8 4
256 vortex.int.for 2 0.008 4 512 508 utf8 4
256 vortex.int.for 2 0.008 4 512 508 utf8 8192
256 vortex.int.for 2 0.008 4 512 508 utf8 8192
256 vortex.int.for 2 0.008 4 512 508 utf8 8192
256 vortex.int.for 2 0.008 4 512 508 utf8 8192
256 vortex.int.for 2 0.008 4 512 508 utf8 8192
256 vortex.int.for 2 0.008 4 512 508 utf8 8192
204.8 vortex.int.for 4 0.02 5 256 251 utf8 5
204.8 vortex.int.for 4 0.02 5 256 251 utf8 8192
204.8 vortex.int.for 4 0.02 5 256 251 utf8 8192
204.8 vortex.int.for 2.667 0.013 5 384 379 utf8 5
204.8 vortex.int.for 2.667 0.013 5 384 379 utf8 5
146.286 vortex.int.for 1.333 0.009 7 768 761 utf8? 19
146.286 vortex.int.for 1.333 0.009 7 768 761 utf8? 19
128 vortex.int.for 2 0.016 8 512 504 utf8? 98
128 vortex.int.for 2 0.016 8 512 504 utf8? 98
113.778 vortex.int.for 1.6 0.014 18 1280 1262 i32? 184
113.778 vortex.int.for 1.6 0.014 18 1280 1262 i32? 184
113.778 vortex.int.for 32 0.281 36 128 92 i32? 184
113.778 vortex.int.for 32 0.281 36 128 92 i32? 184
64 vortex.int.for 2 0.031 16 512 496 utf8? 98
64 vortex.int.for 2 0.031 16 512 496 utf8? 98
64 vortex.int.for 2 0.031 16 512 496 utf8? 98
64 vortex.int.for 2 0.031 16 512 496 utf8? 98
53.895 vortex.int.for 1.6 0.03 19 640 621 utf8? 19
53.895 vortex.int.for 3.2 0.059 76 1280 1204 decimal(15,2)? 19
53.895 vortex.int.for 2.909 0.054 76 1408 1332 decimal(15,2)? 19
53.895 vortex.int.for 1.6 0.03 19 640 621 utf8? 19
53.895 vortex.int.for 1.6 0.03 19 640 621 utf8? 19
53.895 vortex.int.for 32 0.594 76 128 52 i32? 184
53.895 vortex.int.for 32 0.594 76 128 52 i32? 184
53.895 vortex.int.for 4 0.074 76 1024 948 decimal(15,2)? 19
51.2 vortex.int.for 2 0.039 20 512 492 utf8? 98
51.2 vortex.int.for 2 0.039 20 512 492 utf8? 98
51.2 vortex.int.for 2 0.039 20 512 492 utf8? 98
51.2 vortex.int.for 2 0.039 20 512 492 utf8? 98
51.2 vortex.int.for 2 0.039 20 512 492 utf8? 98
51.2 vortex.int.for 2 0.039 20 512 492 utf8? 98
40.96 vortex.int.for 2 0.049 25 512 487 utf8? 25
40.96 vortex.int.for 2 0.049 25 512 487 utf8? 25
```
</details>
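
When `jq` is not convenient, roughly the same filter can be written in a general-purpose language. The following is a minimal Python sketch, assuming the JSONL layout implied by the query above (`target`, `fields.message`, `fields.accepted`, the ratio fields, and an optional `span` object). The function name `rejected_overestimates` is hypothetical, and the zero-ratio guard is an addition that the `jq` query does not have.

```python
import json
from typing import Any, Iterable


def rejected_overestimates(lines: Iterable[str], limit: int = 50) -> list[dict[str, Any]]:
    """Return rejected results whose estimated ratio exceeded the actual one."""
    rows: list[dict[str, Any]] = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, like `fromjson? // empty`
        if not isinstance(event, dict):
            continue
        fields = event.get("fields") or {}
        if event.get("target") != "vortex_compressor::encode":
            continue
        if fields.get("message") != "scheme.compress_result":
            continue
        if fields.get("accepted") is not False:
            continue
        est = fields.get("estimated_ratio")
        act = fields.get("actual_ratio")
        # Ratios are omitted for zero-byte outputs; also guard against a
        # zero actual ratio (extra guard, not part of the jq query).
        if est is None or act is None or act == 0 or est <= act:
            continue
        span = event.get("span") or {}
        rows.append({
            "estimated_over_actual": est / act,
            "scheme": fields.get("scheme"),
            "estimated_ratio": est,
            "actual_ratio": act,
            "before_nbytes": fields.get("before_nbytes"),
            "after_nbytes": fields.get("after_nbytes"),
            "extra_bytes": fields.get("after_nbytes", 0) - fields.get("before_nbytes", 0),
            "span_dtype": span.get("dtype", ""),
            "span_len": span.get("len", ""),
        })
    # Worst over-estimates first.
    rows.sort(key=lambda row: row["estimated_over_actual"], reverse=True)
    return rows[:limit]
```

A file object can be passed directly, for example `rejected_overestimates(open("data-gen.jsonl"))`, since iterating over it yields one line at a time.
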

## Testing

Some basic tracing tests (which were Claude-generated).

---------
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Pull request #7385, 1 parent: 91f1c2f

13 files changed
Lines changed: 890 additions & 159 deletions

File tree
- benchmarks/compress-bench/src
- vortex-bench
  - src
    - bin
    - utils
- vortex-btrblocks/src/schemes
- vortex-compressor
  - src