[CORE] Add customMetrics extension point to ShuffleWriterMetrics
ShuffleWriterMetrics currently has a hand-rolled list of 9 scalar fields,
two of which (`avgDictionaryFields`, `dictionarySize`) are Velox-specific.
Adding another backend-specific scalar every time someone needs a counter
doesn't scale: other backends (ClickHouse, GPU, RSS) have the same need, and
every new metric adds its own cross-backend coordination cost.
This PR adds a generic `std::unordered_map<std::string, int64_t> customMetrics`
to ShuffleWriterMetrics that any shuffle writer can populate with
backend-specific stats. It is plumbed through the existing JNI `stop()`
serialization as two parallel arrays (keys + values) into
`GlutenSplitResult`, where the JVM side reassembles them lazily into an
unmodifiable `Map<String, Long>` on first access.
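The lazy reassembly described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual `GlutenSplitResult` code: the class name `CustomMetricsHolder`, its field names, and the handling of mismatched array lengths are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the customMetrics part of GlutenSplitResult.
final class CustomMetricsHolder {
  private final String[] keys;      // may be null if the native side sent nothing
  private final long[] values;      // parallel to keys
  private Map<String, Long> cached; // built on first access, then reused

  CustomMetricsHolder(String[] keys, long[] values) {
    this.keys = keys;
    this.values = values;
  }

  // Zips the two parallel arrays into an unmodifiable map on first call.
  synchronized Map<String, Long> getCustomMetrics() {
    if (cached == null) {
      if (keys == null || values == null || keys.length == 0) {
        cached = Collections.emptyMap();
      } else {
        // Assumption: on a length mismatch, only the common prefix is used.
        int n = Math.min(keys.length, values.length);
        Map<String, Long> m = new HashMap<>(n * 2);
        for (int i = 0; i < n; i++) {
          m.put(keys[i], values[i]);
        }
        cached = Collections.unmodifiableMap(m);
      }
    }
    return cached;
  }
}
```

Deferring the zip to first access means a split result whose custom metrics are never read pays nothing beyond holding the two array references.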
Convention for keys: `<Backend>.<Family>.<Stat>`, e.g.
`Velox.InputEncoding.Flat` or `Velox.SplitRV.FixedWidthWallNanos`.
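As a small illustration of the key convention, a helper along these lines could split and validate keys. `MetricKey` is hypothetical and not part of this commit; it assumes exactly three non-empty, dot-separated segments, which is stricter than anything the commit itself enforces.

```java
// Hypothetical helper illustrating the <Backend>.<Family>.<Stat> convention.
final class MetricKey {
  final String backend;
  final String family;
  final String stat;

  MetricKey(String backend, String family, String stat) {
    this.backend = backend;
    this.family = family;
    this.stat = stat;
  }

  // Assumption: a well-formed key has exactly three non-empty segments.
  static MetricKey parse(String key) {
    String[] parts = key.split("\\.", -1);
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected <Backend>.<Family>.<Stat>: " + key);
    }
    for (String p : parts) {
      if (p.isEmpty()) {
        throw new IllegalArgumentException("empty segment in key: " + key);
      }
    }
    return new MetricKey(parts[0], parts[1], parts[2]);
  }

  // Joins the three segments back into the dotted key form.
  String format() {
    return backend + "." + family + "." + stat;
  }
}
```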
Spark-side registration as SQLMetrics happens per-key in the backend's
MetricsApi (`VeloxMetricsApi` / `CHMetricsApi`); unknown keys are silently
dropped on the Scala side so a backend can ship new metrics ahead of the
Spark-side registration without breaking older Spark wrappers.
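The drop-unknown-keys behavior can be sketched as below. The real code lives on the Scala side and updates SQLMetrics; this Java sketch uses `AtomicLong` as a stand-in accumulator, and `CustomMetricsUpdater` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: apply reported custom metrics to registered accumulators.
final class CustomMetricsUpdater {
  // Adds each reported value to the accumulator registered under the same key.
  // Keys with no registered accumulator are skipped without error.
  // Returns the number of keys that were actually applied.
  static int apply(Map<String, AtomicLong> registered, Map<String, Long> reported) {
    int applied = 0;
    for (Map.Entry<String, Long> e : reported.entrySet()) {
      AtomicLong acc = registered.get(e.getKey());
      if (acc == null) {
        continue; // unknown key: the native side is ahead of Spark-side registration
      }
      acc.addAndGet(e.getValue());
      applied++;
    }
    return applied;
  }
}
```

Skipping rather than failing keeps the native library and the Spark wrapper independently upgradable, at the cost that a misspelled key goes unnoticed unless the returned count (or a similar signal) is checked.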
This commit only introduces the plumbing; no backend populates the map
yet. The follow-up commit on this PR wires up the Velox hash shuffle
writer as the first consumer.
Includes `GlutenSplitResultSuite` covering the JVM-side reassembly
(empty / null / populated arrays, caching, immutability) so the JNI boundary
is fenced by a unit test that doesn't need a full Spark / native round-trip.
Generated-by: GitHub Copilot CLI (Claude Opus 4.7 1M context)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>