CHANGELOG.md
Lines changed: 6 additions & 0 deletions
@@ -3,12 +3,15 @@
 ## master / unreleased
 * [CHANGE] Querier: Make query time range configurations per-tenant: `query_ingesters_within`, `query_store_after`, and `shuffle_sharding_ingesters_lookback_period`. Uses `model.Duration` instead of `time.Duration` to support serialization but has a minimum unit of 1ms (nanoseconds/microseconds not supported). #7160
 * [CHANGE] Cache: Setting `-blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl` to 0 will disable the bucket-index cache. #7446
+* [CHANGE] HA Tracker: Move `-distributor.ha-tracker.failover-timeout` from a global config to a per-tenant runtime config. The flag name and default value (30s) remain the same. #7481
+* [FEATURE] Ingester: Add experimental active series tracker that counts active series by configurable label matchers (including regex) per tenant and exposes `cortex_ingester_active_series_per_tracker` metric. Configured via `active_series_trackers` in runtime config overrides. #7476
 * [FEATURE] Ruler: Add per-tenant `ruler_alert_generator_url_template` runtime config option to customize alert generator URLs using Go templates. Supports Grafana Explore, Perses, and other UIs. #7302
 * [FEATURE] Distributor: Add experimental `-distributor.enable-start-timestamp` flag for Prometheus Remote Write 2.0. When enabled, `StartTimestamp (ST)` is ingested. #7371
 * [FEATURE] Memberlist: Add `-memberlist.cluster-label` and `-memberlist.cluster-label-verification-disabled` to prevent accidental cross-cluster gossip joins and support rolling label rollout. #7385
 * [FEATURE] Querier: Add timeout classification to classify query timeouts as 4XX (user error) or 5XX (system error) based on phase timing. When enabled, queries that spend most of their time in PromQL evaluation return `422 Unprocessable Entity` instead of `503 Service Unavailable`. #7374
 * [FEATURE] Querier: Implement Resource Based Throttling in Querier. #7442
 * [FEATURE] Querier: Add resource-based query eviction that automatically cancels the heaviest running query when CPU or heap utilization exceeds configured thresholds. #7488
+* [ENHANCEMENT] Tenant Federation: Avoid purging the regex resolver LRU cache on user-sync ticks when the set of known users has not changed. #7489
 * [ENHANCEMENT] Parquet Converter: Add a ring status page to expose the ring status. #7455
 * [ENHANCEMENT] Ingester: Add WAL record metrics to help evaluate the effectiveness of WAL compression type (e.g. snappy, zstd): `cortex_ingester_tsdb_wal_record_part_writes_total`, `cortex_ingester_tsdb_wal_record_parts_bytes_written_total`, and `cortex_ingester_tsdb_wal_record_bytes_saved_total`. #7420
 * [ENHANCEMENT] Compactor: Prevent partition compaction from compacting blocks marked for deletion. #7391
 * [ENHANCEMENT] Distributor: Optimize memory allocations by reusing the existing capacity of these pooled slices in the Prometheus Remote Write 2.0 path. #7392
 * [ENHANCEMENT] Upgrade gRPC from v1.71.2 to v1.79.3 to address CVE-2026-33186. #7460
+* [ENHANCEMENT] Query Frontend: Add `query_too_expensive` reason to QFE and `reason` field to query stats. #7479
+* [ENHANCEMENT] Distributor: Add HMAC-SHA256 stream authentication for `PushStream` via `-distributor.sign-write-requests-keys`. #7475
+* [ENHANCEMENT] Instrument Ingester CPU profile with source for read APIs. #7494
 * [BUGFIX] Querier: Fix queryWithRetry and labelsWithRetry returning (nil, nil) on cancelled context by propagating ctx.Err(). #7370
 * [BUGFIX] Metrics Helper: Fix non-deterministic bucket order in merged histograms by sorting buckets after map iteration, matching Prometheus client library behavior. #7380
 * [BUGFIX] Distributor: Return HTTP 401 Unauthorized when tenant ID resolution fails in the Prometheus Remote Write 2.0 path. #7389
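
The Querier timeout classification entry in the changelog above (#7374) describes a rule that is easy to picture in code. The following is only a minimal sketch, not the Cortex implementation: the function name `classifyTimeout`, its parameters, and the "more than half of the total time" threshold are assumptions standing in for "most of their time in PromQL evaluation".

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// classifyTimeout picks a status code for a query that timed out.
// evalTime is the time spent in PromQL evaluation; total is the whole
// query duration. The "more than half" threshold is an assumption made
// for illustration; the changelog only says "most of their time".
func classifyTimeout(evalTime, total time.Duration) int {
	if total > 0 && evalTime*2 > total {
		// Evaluation dominated: treat the timeout as a user error.
		return http.StatusUnprocessableEntity // 422
	}
	// Otherwise treat the timeout as a system error.
	return http.StatusServiceUnavailable // 503
}

func main() {
	fmt.Println(classifyTimeout(9*time.Second, 10*time.Second)) // 422
	fmt.Println(classifyTimeout(1*time.Second, 10*time.Second)) // 503
}
```

Splitting on where the time went lets an expensive query surface as a client-side problem, while a slow backend still reports as a server-side one.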
docs/configuration/v1-guarantees.md
Lines changed: 4 additions & 0 deletions
@@ -114,6 +114,7 @@ Currently experimental features are:
 - `-store-gateway.query-protection.rejection`
 - Distributor/Ingester: Stream push connection
 - Enable stream push connection between distributor and ingester by setting `-distributor.use-stream-push=true` on Distributor.
+- Enable stream push authentication on Distributor/Ingester. (`-distributor.sign-write-requests-keys`)
 - Add `__type__` and `__unit__` labels to OTLP and remote write v2 requests (`-distributor.enable-type-and-unit-labels`)
 - Handle StartTimestampMs (ST) for remote write v2 samples and histograms, using CreatedTimestamp (CT) as a fallback when ST is not set (`-distributor.enable-start-timestamp`)
 - Ingester: Series Queried Metric
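
For the stream push authentication item in the hunk above, HMAC-SHA256 signing and verification can be sketched with the Go standard library. This is a hedged illustration only: the helper names `sign` and `verify`, the hex encoding, the payload layout, and the idea of accepting several keys at once (suggested by the plural `keys` in the flag name) are all assumptions, not the Cortex wire format.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of payload under key.
func sign(key, payload []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether sig is a valid signature of payload under any of
// the accepted keys. Accepting multiple keys is an assumption made here to
// illustrate key rotation.
func verify(keys [][]byte, payload []byte, sig string) bool {
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	for _, k := range keys {
		mac := hmac.New(sha256.New, k)
		mac.Write(payload)
		// hmac.Equal compares in constant time.
		if hmac.Equal(mac.Sum(nil), got) {
			return true
		}
	}
	return false
}

func main() {
	keys := [][]byte{[]byte("new-key"), []byte("old-key")}
	payload := []byte("hypothetical write request bytes")

	sig := sign(keys[1], payload) // sender still uses the old key
	fmt.Println(verify(keys, payload, sig))            // true
	fmt.Println(verify(keys, []byte("tampered"), sig)) // false
}
```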
@@ -136,3 +137,6 @@ Currently experimental features are:
+AMP needs to monitor active series counts by configurable patterns (e.g., all series with `__name__=~"api_.*"`) for internal observability. The existing `LimitsPerLabelSet` feature is unsuitable because:
+
+1. **No regex matching**: only supports exact `label=value` matching.
+3. **Coupled to limit enforcement**: designed for enforcing series limits, not pure monitoring.
+
+## Requirements
+
+- Track active series counts by configurable label matchers (including regex).
+- Expose counts as Prometheus metrics on the ingester (internal only, not vended to customers).
+- Configuration supports **per-tenant overrides** with a **default** fallback (same pattern as all other Limits fields).
+- **Runtime hot-reloadable** via the existing runtime config file mechanism.
+- **No limit enforcement**: purely observational.
+- **No default partition**: unmatched series are simply not tracked.
+- A series can match multiple tracker entries simultaneously.
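
The counting semantics in the requirements above (regex matchers, no default partition, one series counted by several trackers) can be sketched as follows. This is a simplified illustration under stated assumptions: the `tracker` type, `newTracker`, and `countActive` are hypothetical names, and each tracker is reduced to a single regex on a single label rather than a full PromQL selector. The pattern is anchored because Prometheus regex matchers are fully anchored.

```go
package main

import (
	"fmt"
	"regexp"
)

// tracker is a hypothetical, simplified tracker: one regex on one label.
type tracker struct {
	name  string
	label string
	re    *regexp.Regexp
}

// newTracker anchors the pattern so it must match the whole label value.
func newTracker(name, label, pattern string) tracker {
	return tracker{
		name:  name,
		label: label,
		re:    regexp.MustCompile("^(?:" + pattern + ")$"),
	}
}

// countActive returns, per tracker name, how many series it matches.
// A series may be counted by several trackers; series matched by no
// tracker are not counted anywhere (no default partition).
func countActive(trackers []tracker, series []map[string]string) map[string]int {
	counts := make(map[string]int)
	for _, lbls := range series {
		for _, t := range trackers {
			// A missing label is treated as the empty string.
			if t.re.MatchString(lbls[t.label]) {
				counts[t.name]++
			}
		}
	}
	return counts
}

func main() {
	trackers := []tracker{
		newTracker("api_metrics", "__name__", "api_.*"),
		newTracker("request_counters", "__name__", ".*_requests_total"),
	}
	series := []map[string]string{
		{"__name__": "api_requests_total"},     // matches both trackers
		{"__name__": "api_latency_seconds"},    // matches api_metrics only
		{"__name__": "node_cpu_seconds_total"}, // matches neither
	}
	counts := countActive(trackers, series)
	fmt.Println(counts["api_metrics"], counts["request_counters"]) // 2 1
}
```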
+
+## Design
+
+### Configuration
+
+Tracker config lives in the `Limits` struct, following the same per-tenant override pattern as `LimitsPerLabelSet`:
+
+```yaml
+# Default trackers (applied to all tenants without overrides)
+limits:
+  active_series_trackers:
+    - name: api_metrics
+      matchers: '{__name__=~"api_.*"}'
+
+# Per-tenant overrides via runtime config
+overrides:
+  tenant-123:
+    active_series_trackers:
+      - name: api_metrics
+        matchers: '{__name__=~"api_.*"}'
+      - name: system_metrics
+        matchers: '{__name__=~"node_.*|process_.*"}'
+```
+
+The `matchers` field uses standard PromQL matcher syntax parsed via `parser.ParseMetricSelector`.
+
+### Runtime Reload
+
+Tracker config is part of `Limits`, which is reloaded via the runtime config manager every `runtime-config.reload-period` (default 10s). Matchers are parsed and validated during YAML/JSON unmarshalling. Invalid configs are rejected (existing config stays active).
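
The reject-and-keep behaviour described above can be sketched as a validate-then-swap step. This is a minimal sketch, not the runtime config manager itself: `trackerConfig`, `store`, `validate`, and `reload` are hypothetical names, and `regexp.Compile` stands in for the PromQL matcher parsing that the design performs during unmarshalling.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// trackerConfig is a hypothetical, simplified tracker entry.
type trackerConfig struct {
	Name    string
	Pattern string // stands in for the PromQL matcher string
}

// store holds the configuration that is currently active.
type store struct {
	mu     sync.RWMutex
	active []trackerConfig
}

// validate rejects a config if any pattern fails to compile.
// regexp.Compile is a stand-in for real matcher parsing.
func validate(cfg []trackerConfig) error {
	for _, t := range cfg {
		if _, err := regexp.Compile(t.Pattern); err != nil {
			return fmt.Errorf("tracker %q: %w", t.Name, err)
		}
	}
	return nil
}

// reload swaps in cfg only if it validates; on error the previously
// active config is left untouched.
func (s *store) reload(cfg []trackerConfig) error {
	if err := validate(cfg); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active = cfg
	return nil
}

// current returns the active config.
func (s *store) current() []trackerConfig {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.active
}

func main() {
	s := &store{}
	fmt.Println(s.reload([]trackerConfig{{Name: "api_metrics", Pattern: "api_.*"}}) == nil) // true
	// The unbalanced parenthesis makes this config invalid, so it is rejected.
	fmt.Println(s.reload([]trackerConfig{{Name: "broken", Pattern: "api_("}}) != nil) // true
	fmt.Println(s.current()[0].Name) // api_metrics
}
```

Validating before taking the write lock keeps a bad reload from ever being visible to readers.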