fix(celerdata): collect all per-db starrocks_fe_table_num series #3025

Open
jaogoy wants to merge 2 commits into DataDog:master from jaogoy:fix.celerdata-table-num-gauge

jaogoy commented Jun 3, 2026

Why I'm doing this

StarRocks FE exposes `starrocks_fe_table_num` interleaved with `starrocks_fe_db_size_bytes` (one pair per database) under a single `# TYPE ... gauge` line, which violates the Prometheus text exposition rule that all samples of a metric family must be contiguous.
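
To make the violation concrete, here is a minimal sketch of what such an interleaved payload might look like, with a stdlib-only check for it. The sample values, the `sys` database row, the exact line layout, and the helper names are illustrative assumptions, not captured StarRocks output and not part of this PR.

```python
# Hypothetical payload: values and exact line layout are made up for illustration.
INTERLEAVED = """\
# TYPE starrocks_fe_table_num gauge
starrocks_fe_table_num{db_name="information_schema"} 30
starrocks_fe_db_size_bytes{db_name="information_schema"} 0
starrocks_fe_table_num{db_name="sys"} 5
starrocks_fe_db_size_bytes{db_name="sys"} 1024
"""


def sample_name(line):
    """Return the metric name of a sample line (text before '{' or the first space)."""
    return line.split("{", 1)[0].split(" ", 1)[0]


def non_contiguous_metrics(text):
    """Return names whose samples reappear after another metric's samples."""
    finished, offenders, current = set(), set(), None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = sample_name(line)
        if name != current:
            if current is not None:
                finished.add(current)  # the previous run of samples has ended
            if name in finished:
                offenders.add(name)  # this name already had a run earlier
            current = name
    return offenders
```

On the payload above, both metric names are flagged, because each one resumes after the other has interrupted it.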

The `prometheus_client` streaming parser used by OpenMetrics V2 therefore splits the metric into separate families: only the first carries the `gauge` type while the rest are typed `unknown`. `OpenMetricsBaseCheckV2` drops the `unknown` families via `skip_native_metric`, so only the first per-db series (`db_name=information_schema`) is collected and every other database silently disappears.
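
The splitting described above can be approximated with a small stdlib-only model. This is a simplified sketch of the behavior, not the `prometheus_client` or `OpenMetricsBaseCheckV2` code; the payload values and the function names `split_into_families` and `collected_samples` are assumptions made for illustration.

```python
# Hypothetical payload: values and exact line layout are made up for illustration.
PAYLOAD = """\
# TYPE starrocks_fe_table_num gauge
starrocks_fe_table_num{db_name="information_schema"} 30
starrocks_fe_db_size_bytes{db_name="information_schema"} 0
starrocks_fe_table_num{db_name="sys"} 5
starrocks_fe_db_size_bytes{db_name="sys"} 1024
"""


def split_into_families(text):
    """Group samples the way a streaming parser would: a change of sample
    name closes the current family, and a TYPE line only applies to the
    family that starts right after it."""
    families = []   # list of (name, type, [sample lines])
    declared = {}   # metric name -> type from the most recent TYPE line
    current = None
    for line in text.splitlines():
        if line.startswith("# TYPE "):
            _, _, name, typ = line.split(" ", 3)
            declared = {name: typ}
            current = None  # the next sample starts a fresh family
            continue
        if not line or line.startswith("#"):
            continue
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if current is None or current[0] != name:
            # A declared type is consumed by the first family that uses it.
            current = (name, declared.pop(name, "unknown"), [])
            families.append(current)
        current[2].append(line)
    return families


def collected_samples(families):
    """Model of the skip: families typed 'unknown' are dropped."""
    return [s for _, typ, samples in families if typ != "unknown" for s in samples]
```

In this model the payload yields four families, of which only the first is typed `gauge`, so `collected_samples` keeps just the `information_schema` sample of `starrocks_fe_table_num`.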

What I'm doing

Pin `starrocks_fe_table_num` to the `gauge` type in `METRIC_MAP`, using the same dict form already applied to `tablet_num` / `thread_pool`. An explicitly typed metric is handled by the `gauge` transformer instead of the default `native` path, which bypasses the `unknown`-type skip and lets every per-db series through. The interleaving itself is a StarRocks-side bug; pinning the type fixes it on the integration side without waiting for a StarRocks release.
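
As a rough sketch of the dict form, the entry might look like the following. The mapped name `fe.table_num` is inferred from the reported `celerdata.fe.table_num` metric and is an assumption, as is the helper `effective_type`, which only models the idea that an explicit type takes precedence over the parsed one; see the diff for the actual change.

```python
# Sketch only: the mapped name is inferred from celerdata.fe.table_num,
# and the real METRIC_MAP contains many more entries.
METRIC_MAP = {
    # dict form: rename and force the type instead of trusting the payload
    "starrocks_fe_table_num": {"name": "fe.table_num", "type": "gauge"},
}


def effective_type(raw_name, parsed_type, metric_map=METRIC_MAP):
    """Hypothetical helper: an explicit type in the map wins over the parsed one."""
    entry = metric_map.get(raw_name)
    if isinstance(entry, dict) and "type" in entry:
        return entry["type"]
    return parsed_type
```

With this entry, a family parsed as `unknown` for `starrocks_fe_table_num` would still be treated as a gauge, while metrics without an explicit type keep whatever type the parser assigned.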

Verified on a live Agent: celerdata.fe.table_num now reports all per-db series (information_schema, _statistics_, sys, plus user databases) instead of only information_schema.

Fixes #2854.


Notes for reviewers

  • Depends on #3021, "[CelerData] Add slow_lock_held_time_ms and slow_lock_wait_time_ms metrics" (slow_lock metrics, bumps to 1.3.0). Please merge that first; I will rebase this onto master afterward, at which point it becomes 1.3.1. It is versioned 1.2.2 here because it currently branches off master (1.2.1).
  • conf.yaml.example and config_models/ are regenerated via ddev validate ... --sync to clear pre-existing drift from the shared OpenMetrics config template (not specific to this change).

jaogoy added 2 commits June 3, 2026 18:01
## Why I'm doing this

StarRocks FE exposes `starrocks_fe_table_num` interleaved with
`starrocks_fe_db_size_bytes` (one pair per database) under a single
`# TYPE ... gauge` line, which violates the Prometheus text exposition
rule that all samples of a metric family must be contiguous.

The prometheus_client streaming parser used by OpenMetrics V2 therefore
splits the metric into separate families: only the first carries the
`gauge` type while the rest are typed `unknown`. OpenMetricsBaseCheckV2
drops the `unknown` families via `skip_native_metric`, so only the first
per-db series (db_name=information_schema) is collected and every other
database silently disappears.

## What I'm doing

Pin `starrocks_fe_table_num` to the `gauge` type in `METRIC_MAP`, using
the same dict form already applied to `tablet_num` / `thread_pool`. An
explicitly typed metric is handled by the `gauge` transformer instead of
the default `native` path, which bypasses the `unknown`-type skip and
lets every per-db series through. The interleaving itself is a
StarRocks-side bug; pinning the type fixes it on the integration side
without waiting for a StarRocks release.

Fixes DataDog#2854.

Signed-off-by: Planck Li <jaogoy@gmail.com>

Bump version to 1.2.2 and add the CHANGELOG entry for the table_num fix.

Also regenerate conf.yaml.example and config_models via 'ddev validate ... --sync' to clear pre-existing drift from the shared OpenMetrics template, so 'ddev validate' passes.

Signed-off-by: Planck Li <jaogoy@gmail.com>
@jaogoy jaogoy requested a review from a team as a code owner June 3, 2026 11:10
@jaogoy jaogoy requested review from davidfeng-datadog and removed request for a team June 3, 2026 11:10
datadog-prod-us1-5 Bot commented Jun 3, 2026

Pipelines

⚠️ Warnings

🚦 2 Pipeline jobs failed

PR | test / check

PR | test / test-minimum-base-package (linux, ubuntu-22.04, celerdata, celerdata (py3.13), py3.13) / minimum-base-package-celerdata (py3.13)-py3.13

Error: Failed to build `ddtrace==2.10.6`. Error running `setuptools.build_meta.build_wheel`: No module named 'pkg_resources'

Commit SHA: 1f57452

Development

Successfully merging this pull request may close these issues.

[Celerdata Integration]Some metrics were not reported to Datadog