enhancement(tag_cardinality_limit transform): add sliding-window TTL for tag values by kaarolch · Pull Request #25485 · vectordotdev/vector

kaarolch · 2026-05-22T09:20:37Z

Summary

Adds optional sliding-window TTL to the tag_cardinality_limit transform via two new settings (ttl_secs, ttl_generations) on the global config, every per_metric_limits entry, and the corresponding schemas (Inner / OverrideInner).

When set, tag values not observed within ttl_secs are expired so fresh values can take their slot under value_limit.

The cardinality cache is currently monotonic: every distinct value ever seen for a tag holds a slot under value_limit until the next Vector restart — and a restart is the only thing that resets the cache today. That's fine for stable identifiers, but it breaks down for high-churn tag values with a natural lifetime: pod_name, abused_account_id, ephemeral request IDs, short-lived job IDs, etc. These are valid for hours to days at most (a pod rolled out yesterday, an account that was banned last week, a request that completed minutes ago), then never appear again — yet they keep holding slots until Vector is restarted. Once enough dead slots accumulate, the cache hits value_limit and starts rejecting legitimate new values that would fit comfortably under the cap, simply because old ones never aged out.

Setting ttl_secs turns the cache into a sliding window: any tag value not observed within that many seconds is dropped, freeing the slot for a fresh value, without requiring a Vector restart.

Two storage backends, picked at construction time from (Mode, ttl_secs):

mode: exact + TTL → HashMap<value, last_seen> with lazy sweep on access. Per-value precision; the only added memory over today's HashSet is one timestamp per value.
mode: probabilistic + TTL → generational / rolling bloom filter (VecDeque<BloomFilter> of ttl_generations shards) so a non-deletable bloom can still expire values. Eviction granularity is ttl_secs / ttl_generations. Memory cost is ttl_generations × cache_size_per_key per (metric, tag-key) pair; reduce cache_size_per_key to keep total memory flat.

Other design points worth flagging for review:

Refresh-on-sighting on both backends: any contains hit on the accept path extends the value's lease, so continuously-observed values stay in the cache across rotation boundaries.
Lazy eviction (sweep / shard rotation) on insert / contains / len. No background task, no extra locking — drops cleanly into the existing TaskTransform model.
DropEvent correctness: the pre-check uses a new contains_no_refresh variant so an event rejected by a later tag does not silently extend the leases of earlier-checked tags. The cache reflects accepted values, not just seen values.
Per-metric ttl_secs is a full override (unset on a per-metric entry means "no TTL for this metric", not "inherit from global"), matching how per-metric value_limit shadows the global one.
New internal counter tag_cardinality_ttl_expirations_total reports the volume of evictions.
Backwards compatibility: when ttl_secs is unset (default) or 0, both backends are bit-for-bit identical to pre-PR behavior — HashSet for exact, single BloomFilter for probabilistic. All existing configs continue to deserialize unchanged (regression-tested in ttl_existing_yaml_unchanged).
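
The exact-mode design points above (refresh on sighting, lazy eviction, and the non-refreshing pre-check) can be sketched as follows. This is a minimal illustration, not the PR's code: the type name `TtlExactSketch` and the explicit `now` parameter are assumptions made so the behavior is easy to test, and the sketch sweeps on every call, whereas the PR describes a sweep that runs on a bounded cadence.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical exact-mode store: tag value -> last accepted sighting.
pub struct TtlExactSketch {
    ttl: Duration,
    last_seen: HashMap<String, Instant>,
}

impl TtlExactSketch {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, last_seen: HashMap::new() }
    }

    /// Lazy eviction: drop every value whose lease has run out as of `now`.
    /// Returns how many values expired (what an expirations counter would report).
    pub fn sweep(&mut self, now: Instant) -> usize {
        let before = self.last_seen.len();
        let ttl = self.ttl;
        self.last_seen
            .retain(|_, seen| now.saturating_duration_since(*seen) < ttl);
        before - self.last_seen.len()
    }

    /// Accept path: record the value, or extend its lease if already present.
    pub fn insert(&mut self, value: &str, now: Instant) {
        self.sweep(now);
        self.last_seen.insert(value.to_owned(), now);
    }

    /// Refreshing lookup: a hit moves the stored timestamp forward to `now`.
    pub fn contains(&mut self, value: &str, now: Instant) -> bool {
        self.sweep(now);
        match self.last_seen.get_mut(value) {
            Some(seen) => {
                *seen = now;
                true
            }
            None => false,
        }
    }

    /// Pre-check lookup: same answer, but the stored timestamp is untouched.
    pub fn contains_no_refresh(&mut self, value: &str, now: Instant) -> bool {
        self.sweep(now);
        self.last_seen.contains_key(value)
    }

    pub fn len(&self) -> usize {
        self.last_seen.len()
    }
}
```

With a 10 s TTL, a value that is looked up through `contains` at 6 s survives a check at 12 s, while a value that was only checked through `contains_no_refresh` at 6 s has expired by then.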

Why generational/rolling Bloom + per-value `HashMap`, not Cuckoo or LRU

| Approach | Why it was rejected (or chosen) for this transform |
| --- | --- |
| Cuckoo filter | The TTL-capable cuckoo work referenced in #23595 (esensar) targets VRL enrichment tables, not transforms, and is not yet upstream. The available Rust crate (`axiomhq/rust-cuckoofilter`) supports deletion but has no TTL or LRU primitive, so we'd still need an external timestamp store to know when to delete each value, which defeats the space win. False-not-found on full buckets is also a worse failure mode than bloom's false-positive for cardinality limiting (bloom's "ah, we've seen this, pass it through" is safe; cuckoo's "no we haven't" can drop legitimate values). |
| Counting Bloom + timestamps | A counting bloom maps a value to k slots; one timestamp per slot doesn't map back to individual values, and one timestamp per value requires a parallel data structure, at which point you've reinvented `HashMap<value, Instant>` (which is what we use for `exact` mode anyway) and lost the bloom's memory advantage. |
| HashMap + LRU | LRU evicts the least recently used entry only when the cache is full: it is eviction by capacity, not by time. A value last seen 24 h ago can sit at the head of an under-capacity LRU indefinitely. That doesn't match the goal: we want stale values gone whether or not the cache is full, because downstream billing (and `value_limit`) cares about the rolling window of activity, not "how many distinct values fit in N slots". |
| `HashMap<value, Instant>` for both modes | Works fine, and is exactly what we use for `mode: exact`. But `mode: probabilistic` is chosen specifically by users who can't afford `HashSet`-sized memory at high cardinalities, so replacing the bloom with a hashmap would silently undo their memory savings. |
| Generational / rolling Bloom (chosen for `probabilistic`) | Keeps bloom's memory profile (no per-value overhead). TTL is implemented by maintaining `N` time-sliced bloom filters and rotating the oldest one out every `ttl_secs / N`. Refresh-on-sighting is one line (insert into newest shard). Rotation is O(1). Approximation error is bounded and configurable via `ttl_generations`. |
| `HashMap<value, Instant>` (chosen for `exact`) | Per-value precision is what `mode: exact` users explicitly opted into. Lazy `retain` sweep on access is cheap (amortized over real traffic) and avoids a background task. Memory overhead versus today's `HashSet` is limited to one `Instant` per value. |
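
The generational rotation described in the table can be sketched as below. This is an illustration under stated assumptions, not the PR's `RollingBloomStorage`: the name `RollingSketch` is hypothetical, each shard is a `HashSet` standing in for a bloom filter (so the sketch shows the rotation and refresh logic only, not the memory profile), and `now` is passed in explicitly.

```rust
use std::collections::{HashSet, VecDeque};
use std::time::{Duration, Instant};

/// Hypothetical rolling store; front of the deque is the oldest shard.
pub struct RollingSketch {
    shards: VecDeque<HashSet<String>>,
    slice: Duration, // ttl / generations
    next_rotate: Instant,
}

impl RollingSketch {
    pub fn new(ttl: Duration, generations: u32, now: Instant) -> Self {
        let generations = generations.max(1);
        let slice = ttl / generations;
        Self {
            shards: (0..generations).map(|_| HashSet::new()).collect(),
            slice,
            next_rotate: now + slice,
        }
    }

    /// Lazy rotation: retire the oldest shard once per elapsed slice, capped
    /// at one full window so a long idle period cannot cause a tight spin.
    fn rotate_if_needed(&mut self, now: Instant) {
        let mut rotations = 0;
        while now >= self.next_rotate && rotations < self.shards.len() {
            self.shards.pop_front();
            self.shards.push_back(HashSet::new());
            self.next_rotate += self.slice;
            rotations += 1;
        }
        if now >= self.next_rotate {
            // Every shard is already empty; just realign the clock.
            self.next_rotate = now + self.slice;
        }
    }

    pub fn insert(&mut self, value: &str, now: Instant) {
        self.rotate_if_needed(now);
        if let Some(newest) = self.shards.back_mut() {
            newest.insert(value.to_owned());
        }
    }

    /// A hit in any shard re-inserts the value into the newest shard.
    pub fn contains(&mut self, value: &str, now: Instant) -> bool {
        self.rotate_if_needed(now);
        let hit = self.shards.iter().any(|s| s.contains(value));
        if hit {
            if let Some(newest) = self.shards.back_mut() {
                newest.insert(value.to_owned());
            }
        }
        hit
    }
}
```

With a 40 s TTL and four generations, a value refreshed at 15 s is still present at 45 s, while a value inserted at 0 s and never seen again is gone once its shard rotates out.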

Vector configuration

transforms:
  cardinality_limiter:
    type: tag_cardinality_limit
    inputs: [some_metrics]
    value_limit: 500
    mode: probabilistic
    cache_size_per_key: 5120
    limit_exceeded_action: drop_tag
    ttl_secs: 3600
    ttl_generations: 4
    # Global per-tag overrides (top-level — same shape as per-metric per_tag_limits).
    per_tag_limits:
      kube_pod_name:
        mode: limit_override
        value_limit: 200      # tighter cap on this one tag
      trace_id:
        mode: excluded        # don't track at all
    per_metric_limits:
      http_requests_total:
        namespace: my_app     # optional; restricts the match
        mode: exact
        value_limit: 1000
        ttl_secs: 7200
        # When per_metric_limits matches, the top-level per_tag_limits above is
        # IGNORED for this metric — only this entry's per_tag_limits applies.
        per_tag_limits:
          abused_account_id:
            mode: limit_override
            value_limit: 50

Operational impact and tuning recommendations

Side effects of enabling TTL

| Effect | `mode: exact` | `mode: probabilistic` | Notes |
| --- | --- | --- | --- |
| Memory per `(metric, tag-key)` pair | ~50% over the `HashSet` baseline (one `Instant` per value) | `ttl_generations × cache_size_per_key` (4× by default) | Probabilistic is the dominant cost. |
| CPU on every event | `Instant::now()` + a `HashMap::get_mut` | `Instant::now()` + bloom OR-scan + possible idempotent insert into newest shard | Negligible vs. the rest of the transform's hot path. |
| CPU on access after a sweep boundary | `HashMap::retain` over the bucket (lazy, at most once per `ttl_secs / ttl_generations`) | Drop-and-replace a bloom shard (O(1) `VecDeque::pop_front` + `push_back`) | Amortized over real traffic. No background task. |
| Eviction precision | Exact to the second (subject to the lazy-sweep cadence above) | Approximate to `ttl_secs / ttl_generations` | Default ±15 min on a 1 h TTL. |
| TTL clock semantics | "Time since last accepted sighting": the `DropEvent` pre-check uses `contains_no_refresh`, so rejected events don't extend leases | Same | Important if you rely on `DropEvent`: the TTL window is the active set of accepted values, not raw event volume. |
| Restart behavior | Cache is empty after every Vector restart (unchanged from today) | Same | TTL doesn't persist to disk. First N events after restart see "false misses" until the cache warms up. |

Memory estimation (concrete numbers)

Assume the default cache_size_per_key = 5 × 1024 = 5,120 bytes and a workload of 100 metrics × 5 tracked tag keys = 500 pairs.

| Configuration | Per-pair | Total | Δ vs. today |
| --- | --- | --- | --- |
| `mode: probabilistic`, no TTL (today) | 5,120 B | ~2.5 MB | baseline |
| `mode: probabilistic`, `ttl_secs: 3600, ttl_generations: 4` | 20,480 B | ~10 MB | +4× |
| `mode: probabilistic`, `ttl_secs: 3600, ttl_generations: 4`, `cache_size_per_key: 1280` (1/4 of default) | 5,120 B | ~2.5 MB | unchanged total; trade 4× bloom precision for 4× shards |
| `mode: probabilistic`, `ttl_secs: 3600, ttl_generations: 1` (tumbling window) | 5,120 B | ~2.5 MB | unchanged total; gives up sliding-window smoothness for memory parity |
| `mode: exact`, 500 distinct values × 32 B avg + 16 B `Instant` per value | ~24,000 B | ~12 MB | ~50% over the `HashSet`-only baseline |

For probabilistic mode, the rule of thumb is that dividing cache_size_per_key by ttl_generations keeps total memory flat. If memory is the constraint, scale cache_size_per_key down by the same factor you scale ttl_generations up.
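
The probabilistic rows of the table follow directly from the `ttl_generations × cache_size_per_key` formula. A small sketch of that arithmetic, with hypothetical function names and with `None` standing for the no-TTL single-filter backend:

```rust
/// Bytes held by probabilistic-mode filters across all (metric, tag-key) pairs.
/// `ttl_generations = None` models the no-TTL backend with a single filter.
pub fn probabilistic_bytes(
    pairs: u64,
    cache_size_per_key: u64,
    ttl_generations: Option<u64>,
) -> u64 {
    pairs * cache_size_per_key * ttl_generations.unwrap_or(1).max(1)
}

/// Per-filter size that keeps total memory equal to the no-TTL baseline.
pub fn flat_cache_size(baseline_cache_size: u64, ttl_generations: u64) -> u64 {
    baseline_cache_size / ttl_generations.max(1)
}
```

For 500 pairs this gives 2,560,000 bytes without TTL, 10,240,000 bytes with four generations at the default filter size, and 2,560,000 bytes again when each filter is shrunk to `flat_cache_size(5120, 4)`, which is 1280.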

Recommendations

1. Don't enable TTL globally — scope it to the metrics that need it

Most deployments have a small subset of metrics whose tags actually churn (pod_name, account_id, request_id, short-lived job IDs). The vast majority of tags (service, environment, region, host) are stable and will be re-sighted on every event anyway, so enabling TTL on them changes nothing except memory use.

Use per_metric_limits to scope TTL to the noisy metrics:

type: tag_cardinality_limit
value_limit: 500
mode: probabilistic
cache_size_per_key: 5120
# No global ttl_secs → stable metrics keep the cheap, no-TTL backend.
per_metric_limits:
  k8s_pod_metrics:
    mode: probabilistic
    cache_size_per_key: 2048   # 2048 B × 4 shards = 8192 B per pair; use 1280 for parity with the 5120 B default
    value_limit: 5000
    ttl_secs: 3600
    ttl_generations: 4
  fraud_events:
    mode: probabilistic
    cache_size_per_key: 2048
    value_limit: 10000
    ttl_secs: 86400            # banned accounts age out after 24h
    ttl_generations: 6         # ±4h precision is fine for this one

2. Pick `ttl_generations` based on memory budget vs. eviction precision

ttl_generations: 1 → tumbling window. Free memory-wise (1 shard = same as no TTL), but every ttl_secs the cache is dumped wholesale. Good for "clean reset every hour" semantics; bad if your traffic isn't uniform across the window (a burst near the boundary loses a lot of useful state).
ttl_generations: 4 (default) → sliding window with ±25% eviction granularity. Reasonable middle ground.
ttl_generations: 8 → ±12.5% eviction granularity, 8× memory. Only worth it if you're matching a strict downstream window and you have memory to spare.

For mode: exact, ttl_generations controls only the sweep cadence (and the sweep is cheap), not the precision — bumping it up just makes the sweep run more often. Default is fine for exact mode in almost all cases.
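
The eviction granularity quoted for each setting is `ttl_secs / ttl_generations`. A one-function sketch of that calculation (the function name is hypothetical, and the generation capping applied by later commits in this PR is deliberately ignored here):

```rust
use std::time::Duration;

/// Width of one generation: how coarse eviction timing can be.
pub fn eviction_granularity(ttl_secs: u64, ttl_generations: u32) -> Duration {
    Duration::from_secs(ttl_secs) / ttl_generations.max(1)
}
```

A 1 h TTL with the default four generations gives 900 s (15 min) slices, and a 24 h TTL with six generations gives 14,400 s (4 h) slices, matching the figures used in the configuration examples.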

3. Monitor `tag_cardinality_ttl_expirations_total`

After enabling TTL, watch this counter for the first few hours:

Climbing steadily → TTL is doing its job; the cache is being kept clean.
Spikes followed by zeros (probabilistic mode) → expected — eviction happens in shard-rotation jumps, not continuously.
Zero for an extended period → either TTL is too long for your actual churn, or there isn't enough churn to justify TTL in the first place. Reconsider.
Climbing while tag_value_limit_exceeded_total is also climbing → new values arrive faster than old ones expire: ttl_secs is too long for the churn rate, or value_limit is too low. Don't just lower the TTL further; check the lifecycle of the underlying entity.

How did you test this PR?

cargo test -p vector \
  --no-default-features \
  --features sources-internal_metrics,transforms-tag_cardinality_limit,test-utils \
  tag_cardinality_limit

test result: ok. 52 passed; 0 failed; 0 ignored; 0 measured; 269 filtered out

Storage-level unit tests in src/transforms/tag_cardinality_limit/tag_value_set.rs::tests — these drive Instants directly (and use small real-time sleeps where the public method internally calls Instant::now()), which is the only way to exercise timing-sensitive behavior deterministically:

| Test | What it pins |
| --- | --- |
| `ttl_exact_expires_values_past_ttl` | `HashMap<value, Instant>` sweep evicts entries whose `last_seen` is older than TTL. |
| `ttl_exact_refresh_on_contains_extends_lease` | A real `contains()` call pushes the stored `Instant` forward; guards the `*slot = now;` line that implements refresh-on-sighting. |
| `ttl_exact_sweep_interval_floors_to_one_second` | `ttl_secs / ttl_generations` is clamped to a 1 s minimum so the sweep can't dominate CPU on tight TTLs. |
| `ttl_exact_contains_no_refresh_does_not_extend_lease` | Regression guard for the `DropEvent` pre-check bug: a `contains_no_refresh` call must not mutate the stored `Instant`. |
| `rolling_bloom_drops_oldest_shard_on_rotate` | A full window's worth of rotations evicts the original shard's contents. |
| `rolling_bloom_refresh_on_contains_seeds_newest_shard` | A real `contains()` hit re-inserts the value into the newest shard, the mechanism that gives hot values survival across future rotations. |
| `rolling_bloom_catch_up_capped_to_generations` | After a 1 h idle (>> window), `rotate_if_needed` performs exactly `generations` rotations rather than a tight spin. |
| `rolling_bloom_slice_floors_to_one_second` | `ttl_secs / ttl_generations` is clamped to 1 s for the bloom backend too. |
| `rolling_bloom_generations_clamped_to_at_least_one` | `ttl_generations: 0` doesn't produce an empty deque (constructor clamps to 1). |
| `rolling_bloom_contains_no_refresh_does_not_seed_newest_shard` | Companion to the exact-mode no-refresh regression guard. |

Transform-level integration tests in src/transforms/tag_cardinality_limit/tests.rs:

| Test | What it pins |
| --- | --- |
| `ttl_defaults_off` | Default `Inner` produces `ttl_secs: None`; without TTL the backend is bit-for-bit the legacy `HashSet`/`BloomFilter`. |
| `ttl_global_yaml_deserializes` | YAML config surface for global TTL deserializes and reaches `Inner`. |
| `ttl_per_metric_yaml_deserializes` | YAML config surface for per-metric TTL deserializes and reaches `OverrideInner` (covers the "full override, not inherit" semantic). |
| `ttl_existing_yaml_unchanged` | Backwards-compat contract: a pre-PR YAML config (no `ttl_secs`, no `ttl_generations`) still parses and produces `ttl_secs: None` and `ttl_generations: default_ttl_generations()`. |
| `ttl_zero_disables_ttl` | Regression guard: `ttl_secs: 0` and `ttl_secs: None` both select the non-TTL backend on every mode. Verified directly via a `#[cfg(test)] AcceptedTagValueSet::ttl_enabled()` accessor, not just by observing externally-equivalent behavior, so a future change that flipped `0` to "expire immediately" (which the 1 s sweep-interval floor would otherwise mask) gets caught. |
| `contains_no_refresh_finds_inserted_values_on_all_backends` | Basic contract: `contains_no_refresh` returns `true` for an inserted value across all four backends (exact/bloom × TTL/non-TTL). Note: the timing-sensitive no-refresh semantic is enforced by the two storage-level tests above; the wiring from `tag_limit_exceeded` to `contains_no_refresh` is enforced by code review of the private match arm. |
| All 45 pre-existing transform tests | Continue to pass unchanged; no behavior regression for the no-TTL path. |

Change Type

Is this a breaking change?

- [ ] Yes
- [x] No

All new fields default to "off"; existing YAML/TOML configs parse unchanged and produce bit-for-bit identical behavior. Verified by ttl_existing_yaml_unchanged, ttl_defaults_off, and ttl_zero_disables_ttl.

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

Related: Cuckoo filter + LRU - seems like a natural match for Vector? #23595


…for tracked tag values Add `ttl_secs` and `ttl_generations` to the global config, per-metric overrides, and the corresponding schemas. When set, tag values not observed within `ttl_secs` are expired so fresh values can take their slot under `value_limit` — useful when the downstream backend (e.g. Datadog custom metrics) counts uniqueness over a rolling window. `mode: exact` uses a `HashMap<value, last_seen>` with lazy sweep; `mode: probabilistic` uses a generational rolling bloom filter so a non-deletable bloom can still expire values (eviction granularity = `ttl_secs / ttl_generations`). Both refresh on sighting and evict lazily on access. The `DropEvent` pre-check uses a new `contains_no_refresh` variant so rejected events don't extend leases. New `tag_cardinality_ttl_expirations_total` internal counter exposes eviction volume. Backwards compatible: unset / `0` ⇒ pre-PR behavior.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7aa7efbef3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…ache_ttl # Conflicts: # src/transforms/tag_cardinality_limit/mod.rs # src/transforms/tag_cardinality_limit/tag_value_set.rs

chatgpt-codex-connector

💡 Codex Review

vector/src/transforms/tag_cardinality_limit/tag_value_set.rs

Lines 59 to 63 in dde8896

    
    fn insert(&mut self, value: &TagValueSet) {
        // Only update the count if the value is not already in the bloom filter.
        if !self.inner.contains(value) {
            self.inner.insert(value);
            self.count += 1;

Always insert on rolling-bloom refresh hits

BloomFilterStorage::insert currently skips writes when contains is already true, which is safe for a single monotonic bloom but breaks the new rolling TTL path. In RollingBloomStorage::contains, a hit in an older shard is refreshed into the newest shard via this method; if the newest shard returns a false positive for that value, the refresh is skipped, and after the older shard rotates out the value can disappear despite being recently observed. Near value_limit, that turns into legitimate recurring tag values being treated as new and dropped (or causing DropEvent rejections). Refresh writes in the active shard need to be unconditional, with count tracking handled separately.
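
The suggested shape (always write the bits, bump the count only when the value looked new) can be sketched with a toy filter. Everything here is an assumption for illustration: `ToyBloom` is not the filter type used by the transform, and it derives two bit positions per value from seeded standard-library hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter (hypothetical; for illustration only).
pub struct ToyBloom {
    bits: Vec<bool>,
    count: usize, // inserts that looked new at insert time
}

impl ToyBloom {
    pub fn new(num_bits: usize) -> Self {
        Self { bits: vec![false; num_bits.max(1)], count: 0 }
    }

    /// Two bit positions per value, derived from seeded std hashes.
    fn positions(&self, value: &str) -> [usize; 2] {
        let mut out = [0usize; 2];
        for (seed, slot) in out.iter_mut().enumerate() {
            let mut h = DefaultHasher::new();
            (seed as u64).hash(&mut h);
            value.hash(&mut h);
            *slot = (h.finish() % self.bits.len() as u64) as usize;
        }
        out
    }

    pub fn contains(&self, value: &str) -> bool {
        self.positions(value).iter().all(|&i| self.bits[i])
    }

    /// Always set the bits; only bump the count when the value looked new.
    pub fn insert(&mut self, value: &str) {
        let looked_new = !self.contains(value);
        for i in self.positions(value) {
            self.bits[i] = true;
        }
        if looked_new {
            self.count += 1;
        }
    }

    pub fn count(&self) -> usize {
        self.count
    }
}
```

In a rolling setup, the unconditional write means a refreshed value owns its bits in the newest shard even when that shard already reported a false positive for it, while the count still reflects only values that looked new.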

…lementation

Five fixes flagged by Codex on PR vectordotdev#25485:

P1 - `RollingBloomStorage::len()` returned the max across shards, which under-counts the window-union cardinality. Distinct cold values spread across shards kept every individual shard below `value_limit` while the union exceeded it, silently admitting values past the cap. Now sums shard counts as a strict upper bound; the only failure mode is over-rejection (observable via `tag_value_limit_exceeded_total`).
P1 - `BloomFilterStorage::insert` skipped setting bits when `contains` already returned true. For the rolling-bloom refresh path, that meant a recently-sighted value could ride on another value's false-positive bits rather than holding its own; its lifetime then depended on when those unrelated bits aged out instead of its own activity. The insert now sets bits unconditionally; count tracking remains conditional.
P2 - `slice` was floored to 1s, silently stretching the configured TTL window to `ttl_generations` seconds whenever `ttl_secs < ttl_generations`. The effective number of generations is now capped to `ttl_secs` so `slice * generations == ttl` is preserved exactly. Same cap applied to `TtlExactStorage::new` for consistency.
P2 - `Instant + Duration` could panic on pathological `ttl_secs` close to `u64::MAX`. Now uses checked arithmetic with saturation, so an oversized config keeps the transform alive instead of crashing it.
P2 - TTL eviction left empty `(metric, tag-key)` buckets in place, which permanently consumed slots under `max_tracked_keys`. A lazy `reclaim_empty_buckets` runs only on the cap-hit path; freed slots are now reused. Steady-state allocations are unaffected.

Refs: vectordotdev#25485
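
The first P1 item in this commit message comes down to how per-shard counts are combined. A minimal sketch, with hypothetical function names and plain counts standing in for the shards:

```rust
/// Max across shards: under-counts when distinct values are spread out.
pub fn len_by_max(shard_counts: &[usize]) -> usize {
    shard_counts.iter().copied().max().unwrap_or(0)
}

/// Sum across shards: an upper bound on the window-union cardinality,
/// since a value refreshed into several shards is counted more than once.
pub fn len_by_sum(shard_counts: &[usize]) -> usize {
    shard_counts.iter().sum()
}
```

With four shards each holding 60 distinct cold values and a `value_limit` of 100, the max reports 60 and keeps admitting values, while the sum reports 240 and rejects them.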

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b95265e91c

…check, and Instant saturation

- Trim Codex / "previously …" framing and consolidate the changelog so all pre-merge refinements ship under the single feature entry.
- `tag_limit_exceeded` now reclaims empty TTL buckets before the `value_limit: 0` capacity check, matching `record_tag_value` and preventing `drop_event` events from silently admitting a value when the cap is full but contains reclaimable empty buckets.
- `saturating_add` falls back to `instant + ~136 years` (with bisection) on `Instant + Duration` overflow so `next_rotate` is never pinned to `now`, avoiding `generations`-per-call rotation churn under pathological `ttl_secs`.

Co-authored-by: Cursor <cursoragent@cursor.com>
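
A non-panicking `Instant + Duration` can be built on `Instant::checked_add`. The sketch below is one possible shape, not the PR's `saturating_add`: the function name is hypothetical, and it halves the duration on overflow rather than using the fixed "~136 years" fallback the commit message describes.

```rust
use std::time::{Duration, Instant};

/// Add `d` to `base` without panicking. On overflow, halve the duration
/// until the addition fits, so the result lands far in the future instead
/// of collapsing to `base`.
pub fn saturating_instant_add(base: Instant, d: Duration) -> Instant {
    let mut step = d;
    loop {
        if let Some(t) = base.checked_add(step) {
            return t;
        }
        // Terminates: a zero duration always fits.
        step /= 2;
    }
}
```

For ordinary durations this behaves exactly like `base + d`; for an oversized one such as `Duration::MAX` it returns some instant well after `base`, so a rotation deadline derived from it is not immediately due.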

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 528561d02f

chatgpt-codex-connector · 2026-05-22T12:39:36Z

+        // `sweep_interval * effective == ttl`. Eviction precision is then
+        // `[ttl, ttl + sweep_interval)`.
+        let requested = generations.max(1) as u32;
+        let max_for_ttl = ttl.as_secs().max(1) as u32;


Preserve full ttl_secs range when capping generations

ttl.as_secs() is cast to u32 before computing effective generations, which truncates values ≥ 2^32 seconds. For example, ttl_secs: 4294967296 becomes 0 in this expression, forcing effective to 1 and changing sweep/rotation cadence from ttl/4 (or requested) to a full-window interval. This silently alters TTL behavior for large but valid u64 configs in both TtlExactStorage::new and RollingBloomStorage::new; keep the calculation in u64 (or clamp before narrowing) to avoid modulo truncation.
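
One way to keep the comparison in `u64`, as the comment suggests, is to narrow only after taking the minimum. This is a sketch with hypothetical function names and an assumed `u32` parameter type for the requested generation count; the second function reproduces the narrowing order from the quoted diff for comparison.

```rust
/// Effective generation count: the requested value, capped so it never
/// exceeds the TTL in whole seconds. The comparison stays in `u64`, so a
/// `ttl_secs` of 2^32 or more cannot wrap to a small number.
pub fn effective_generations(ttl_secs: u64, requested: u32) -> u32 {
    let requested = u64::from(requested.max(1));
    let max_for_ttl = ttl_secs.max(1);
    // The minimum is bounded by `requested`, which came from a u32.
    requested.min(max_for_ttl) as u32
}

/// The narrowing order from the quoted diff, kept for comparison.
pub fn effective_generations_truncating(ttl_secs: u64, requested: u32) -> u32 {
    let requested = requested.max(1);
    let max_for_ttl = ttl_secs.max(1) as u32;
    requested.min(max_for_ttl).max(1)
}
```

For `ttl_secs: 4294967296` with four requested generations, the first function returns 4 and the second returns 1, which is the cadence change the comment describes.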

chatgpt-codex-connector · 2026-05-22T12:39:36Z

+        let requested = generations.max(1) as u32;
+        let max_for_ttl = ttl.as_secs().max(1) as u32;
+        let effective = requested.min(max_for_ttl).max(1);
+        let slice = ttl / effective;


Keep rolling-bloom window equal to configured ttl_secs

slice is computed with integer division (ttl / effective), so when ttl_secs is not divisible by effective the retained window becomes slice * generations, which is strictly smaller than the configured TTL. For example, ttl_secs: 10 with ttl_generations: 4 yields 2-second slices and an ~8-second window, evicting values earlier than configured and admitting fresh values too soon for the intended TTL contract.

kaarolch added 5 commits May 22, 2026 10:12

Fix tests

3c9a0bf

Add changelog.md

6395814

Fix formating

59e05f8

Add missing const

7aa7efb

kaarolch requested review from a team as code owners May 22, 2026 09:20

chatgpt-codex-connector Bot reviewed May 22, 2026

Comment thread src/transforms/tag_cardinality_limit/tag_value_set.rs Outdated

Comment thread src/transforms/tag_cardinality_limit/tag_value_set.rs Outdated

Merge remote-tracking branch 'upstream/master' into tag_cardinality_c…

dde8896

…ache_ttl # Conflicts: # src/transforms/tag_cardinality_limit/mod.rs # src/transforms/tag_cardinality_limit/tag_value_set.rs

github-actions Bot added the docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. label May 22, 2026

chatgpt-codex-connector Bot reviewed May 22, 2026

Comment thread src/transforms/tag_cardinality_limit/tag_value_set.rs Outdated

Comment thread src/transforms/tag_cardinality_limit/tag_value_set.rs

chatgpt-codex-connector Bot reviewed May 22, 2026

Comment thread src/transforms/tag_cardinality_limit/mod.rs Outdated

Comment thread src/transforms/tag_cardinality_limit/tag_value_set.rs Outdated

chatgpt-codex-connector Bot reviewed May 22, 2026

brett0000FF approved these changes May 22, 2026

kaarolch wants to merge 8 commits into vectordotdev:master from kaarolch:tag_cardinality_cache_ttl
