[CelerData] Add slow_lock_held_time_ms and slow_lock_wait_time_ms metrics #3021

Open
jaogoy wants to merge 5 commits into DataDog:master from jaogoy:feat.celerdata-slow-lock-metrics

@jaogoy jaogoy commented May 29, 2026

What does this PR do?

CelerData/StarRocks FE introduced two summary metrics for slow-lock observability in StarRocks/starrocks#66027:

  • starrocks_fe_slow_lock_held_time_ms — how long locks were held when slow locks were detected (max across owners).
  • starrocks_fe_slow_lock_wait_time_ms — how long waiters waited before the lock was acquired.

Both are emitted as Prometheus summaries (quantiles 0.75/0.95/0.98/0.99/0.999 plus _sum / _count), so this PR maps them with the same three-line pattern already used by other histogram metrics in this integration and adds the corresponding metadata.csv entries.
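
As a rough illustration of what a three-line mapping per summary metric could look like, here is a minimal Python sketch. The Datadog-side names (`fe.slow_lock_held_time_ms` and so on), the `.sum` / `.count` suffixes, and the helper functions are assumptions made for this example; the integration's actual metric map may be laid out differently.

```python
# Hypothetical sketch: expand each Prometheus summary into three map entries
# (base name for the quantile series, plus _sum and _count).
# The right-hand names are illustrative, not taken from the integration.

SUMMARY_METRICS = {
    "starrocks_fe_slow_lock_held_time_ms": "fe.slow_lock_held_time_ms",
    "starrocks_fe_slow_lock_wait_time_ms": "fe.slow_lock_wait_time_ms",
}


def expand_summary(raw_name: str, dd_name: str) -> dict:
    """Return the three entries that cover one summary metric."""
    return {
        raw_name: dd_name,                          # quantile series
        f"{raw_name}_sum": f"{dd_name}.sum",        # running total of observations
        f"{raw_name}_count": f"{dd_name}.count",    # number of observations
    }


def build_metric_map(summaries: dict) -> dict:
    """Build a flat metric map from a dict of summary metrics."""
    metric_map = {}
    for raw_name, dd_name in summaries.items():
        metric_map.update(expand_summary(raw_name, dd_name))
    return metric_map


if __name__ == "__main__":
    for raw, mapped in build_metric_map(SUMMARY_METRICS).items():
        print(f"{raw} -> {mapped}")
```

Two summaries produce six entries in total, which is where the "three-line pattern" wording comes from.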

Motivation

Lock information is important to monitor when the cluster has a large backlog of pending loads and queries.

Review checklist

  • PR has a meaningful title or PR has the no-changelog label attached
  • Feature or bugfix has tests
  • Git history is clean
  • If PR impacts documentation, docs team has been notified or an issue has been opened on the documentation repo
  • If this PR includes a log pipeline, please add a description describing the remappers and processors.

Additional Notes

Anything else we should know when reviewing?

jaogoy added 2 commits May 28, 2026 18:58
…rics

StarRocks FE introduced two summary metrics for slow-lock observability
in StarRocks/starrocks#66027 (back-ported to 4.0/3.5):

- starrocks_fe_slow_lock_held_time_ms — how long locks were held when slow
  locks were detected (max across owners).
- starrocks_fe_slow_lock_wait_time_ms — how long waiters waited before the
  lock was acquired.

Both are emitted as Prometheus summary (quantiles 0.75/0.95/0.98/0.99/0.999
plus _sum / _count), so map them with the same three-line pattern already
used by other histogram metrics in this integration. Add corresponding
metadata.csv entries.

Bump version to 1.3.0 and update CHANGELOG + README.

Signed-off-by: Planck Li <jaogoy@gmail.com>
@jaogoy jaogoy requested review from a team as code owners May 29, 2026 02:18
@jaogoy jaogoy requested review from bgoldberg122 and removed request for a team May 29, 2026 02:18
…rics

StarRocks FE introduced two summary metrics for slow-lock observability
in StarRocks/starrocks#66027 (back-ported to 4.0/3.5):

- starrocks_fe_slow_lock_held_time_ms — how long locks were held when slow
  locks were detected (max across owners).
- starrocks_fe_slow_lock_wait_time_ms — how long waiters waited before the
  lock was acquired.

Both are emitted as Prometheus summary (quantiles 0.75/0.95/0.98/0.99/0.999
plus _sum / _count), so map them with the same three-line pattern already
used by other histogram metrics in this integration. Add corresponding
metadata.csv entries.

Bump version to 1.3.0 and update CHANGELOG + README.

Also regenerate config_models/{defaults,instance}.py via `ddev validate
models celerdata -s` (ddev 16.1.1) — the generated files had drifted from
their spec.yaml source since the previous sync in 1.2.0, which CI flags
as out-of-sync. These changes are auto-generated and not behavioral.

Signed-off-by: Planck Li <jaogoy@gmail.com>
Regenerate conf.yaml.example via 'ddev validate config celerdata -s' so it
matches the updated shared OpenMetrics config template. Fixes the 'validations'
CI job which failed with 'conf.yaml.example is not in sync'.

Signed-off-by: Planck Li <jaogoy@gmail.com>

jaogoy commented Jun 2, 2026

Author

@urseberry Thanks for your review.
I need some help: what should I do about the CI failure involving ddtrace==2.10.6? I have no idea.

@urseberry

Contributor

@urseberry Thanks for your review. I need some help: what should I do about the CI failure involving ddtrace==2.10.6? I have no idea.

@jaogoy I am a technical writer and also don't know how to address that CI failure. An engineer will also review your PR, and they will advise.

The previous floor (36.16.0) pulls ddtrace 2.10.6, whose sdist build
requires pkg_resources and fails on Python 3.13 + modern setuptools,
breaking the `test-minimum-base-package` CI job. Base 37.21.0 pulls
ddtrace 3.12.5 which ships cp313 prebuilt wheels, so the source build
step (and the pkg_resources lookup) is skipped entirely.

Verified locally: `ddev test --compat celerdata` passes 2/2 in 7m02s on
Python 3.13 with ddtrace 3.12.5 installed.

Signed-off-by: Planck Li <jaogoy@gmail.com>
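
The commit message above attributes the failure to the ddtrace 2.10.6 source build needing `pkg_resources`. A minimal sketch for checking whether that module is importable in a given interpreter follows; the function name and the returned dictionary are made up for this example, and because pip usually builds sdists in an isolated environment, the result for the current interpreter is only a hint, not a reproduction of the CI job.

```python
import importlib.util
import sys


def sdist_build_risk(module_name: str = "pkg_resources") -> dict:
    """Report whether a module that a legacy sdist build imports is available.

    Hypothetical helper: it inspects the current interpreter only, not the
    isolated build environment that pip may create.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "module": module_name,
        "importable": spec is not None,
    }


if __name__ == "__main__":
    print(sdist_build_risk())
```

If `importable` is false, a package that only ships an sdist relying on that module would be expected to fail at build time, whereas a package with a prebuilt wheel for the interpreter skips the build step altogether.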