Commit 98255f8

add or update a changelog
1 parent a6e721d commit 98255f8

1 file changed

Lines changed: 29 additions & 0 deletions

CHANGELOG.md

@@ -0,0 +1,29 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **Metrics harmonization** - canonical metric surface aligned with the cross-SDK catalog, opt-in via `WORKER_CANONICAL_METRICS=true`
- New `CanonicalMetricsCollector` emits the harmonized cross-SDK catalog using real Prometheus `Histogram`s for timing and size, replacing the legacy quantile-gauge timing shape. New canonical-only metrics: `task_poll_error_total`, `task_execution_started_total`, `task_result_size_bytes`, `workflow_input_size_bytes`, `http_api_client_request_seconds`, `active_workers`. Time buckets `0.001…10s`; size buckets `100…10_000_000` bytes.
- `metrics_factory.create_metrics_collector(settings)` selects `LegacyMetricsCollector` (default) or `CanonicalMetricsCollector` based on `WORKER_CANONICAL_METRICS` (truthy: `true`, `1`, `yes`, case-insensitive, whitespace-trimmed). `WORKER_LEGACY_METRICS` is documented but not yet read.
- New abstract `MetricsCollectorBase` consolidates Prometheus infrastructure (lazy `prometheus_client` imports, multiprocess `NoPidCollector` aggregation, HTTP server, exception-label cardinality bounding) and event handlers shared by both collectors.
- `(Async)TaskRunner` now records `task_update_time` (`status="SUCCESS"` / `"FAILURE"`) on every update path.
- `OrkesWorkflowClient.start_workflow*` records workflow input payload size and increments `workflow_start_error` on exception; `OrkesClients` / `OrkesBaseClient` accept an optional `metrics_collector`.
- `MetricsSettings(clean_directory=True)` removes leftover `*.db` files in the multiprocess directory at init.
- `CONDUCTOR_MP_START_METHOD` env var (`spawn` / `fork` / `forkserver`; default `fork` on POSIX, `spawn` on Windows) to control the worker pool's multiprocessing start method (motivated by a `prometheus_client` lock-fork deadlock).
- Harness manifest sets `WORKER_CANONICAL_METRICS=true`; `harness/main.py` logs which collector is active.
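
The two environment-variable gates described in the list above could be read roughly as follows. This is a minimal sketch, not the SDK's implementation: the helper names `canonical_metrics_enabled`, `select_collector_name`, and `resolve_start_method` are hypothetical, and falling back to the platform default on an unrecognized `CONDUCTOR_MP_START_METHOD` value is an assumption the changelog does not state.

```python
import os
from typing import Mapping, Optional

# Truthy spellings listed in the changelog entry for WORKER_CANONICAL_METRICS.
_TRUTHY = {"true", "1", "yes"}
# Start methods listed in the changelog entry for CONDUCTOR_MP_START_METHOD.
_START_METHODS = {"spawn", "fork", "forkserver"}


def canonical_metrics_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True when WORKER_CANONICAL_METRICS is truthy."""
    env = os.environ if env is None else env
    raw = env.get("WORKER_CANONICAL_METRICS", "")
    # Case-insensitive and whitespace-trimmed, as the entry describes.
    return raw.strip().lower() in _TRUTHY


def select_collector_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: name of the collector the gate would pick."""
    if canonical_metrics_enabled(env):
        return "CanonicalMetricsCollector"
    return "LegacyMetricsCollector"  # default when the variable is unset


def resolve_start_method(
    env: Optional[Mapping[str, str]] = None, os_name: str = os.name
) -> str:
    """Hypothetical helper: pick the multiprocessing start method."""
    env = os.environ if env is None else env
    default = "spawn" if os_name == "nt" else "fork"
    raw = env.get("CONDUCTOR_MP_START_METHOD", "").strip().lower()
    # Assumption: unknown or empty values fall back to the platform default.
    return raw if raw in _START_METHODS else default
```

The resolved name could then be passed to the standard library's `multiprocessing.get_context(...)` to build the worker pool's context; how the SDK actually wires this is not shown in the changelog.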

### Changed

- **Metrics harmonization** - defaults preserved; legacy metrics emit unchanged when `WORKER_CANONICAL_METRICS` is unset
- `MetricLabel.PAYLOAD_TYPE` value changed from `"payload_type"` to `"payloadType"` to align with canonical camelCase labels; `PAYLOAD_TYPE_LEGACY = "payload_type"` was added so the legacy collector keeps emitting the snake-case label on `external_payload_used_total`.
- `metrics_collector.py` is now a thin compatibility shim: `MetricsCollector = LegacyMetricsCollector`, so `from conductor.client.telemetry.metrics_collector import MetricsCollector` continues to work.
- Default behavior is unchanged: with no env var set, the legacy metric names, label conventions, and quantile-gauge timing shape from prior releases are preserved.
- Rewrote `METRICS.md` to document both surfaces, the env-var gate, full canonical and legacy catalogs, labels, a "Migrating From Legacy to Canonical" mapping (including the `payload_type` to `payloadType` label change and PromQL replacements), and troubleshooting.
- Updated `README.md`, `WORKER_CONFIGURATION.md`, and `docs/design/WORKER_DESIGN.md` to point at `METRICS.md`.
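
The label rename and the compatibility alias described above can be illustrated with stand-in classes. This is a sketch under stated assumptions: the classes below are simplified placeholders rather than the SDK's real ones, and the `payload_label` attribute and `payload_labels` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum


class MetricLabel(str, Enum):
    """Stand-in label enum holding only the two members named above."""

    PAYLOAD_TYPE = "payloadType"          # canonical camelCase label
    PAYLOAD_TYPE_LEGACY = "payload_type"  # snake-case label kept for legacy


class LegacyMetricsCollector:
    """Stand-in class; the real collector lives in the SDK."""

    payload_label = MetricLabel.PAYLOAD_TYPE_LEGACY  # hypothetical attribute


class CanonicalMetricsCollector:
    """Stand-in class; the real collector lives in the SDK."""

    payload_label = MetricLabel.PAYLOAD_TYPE  # hypothetical attribute


# Compatibility alias in the style of the metrics_collector.py shim:
# existing imports of MetricsCollector resolve to the legacy collector.
MetricsCollector = LegacyMetricsCollector


def payload_labels(collector_cls: type, payload_type: str) -> dict:
    """Hypothetical helper: label dict a collector would attach."""
    return {collector_cls.payload_label.value: payload_type}
```

With this shape, dashboards that query the legacy surface keep seeing `payload_type`, while queries against the canonical surface need the `payloadType` label instead.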
