Skip to content

enhancement(tag_cardinality_limit): Add per-tag cache_size_per_key override in probabilistic mode#25650

Merged
ArunPiduguDD merged 6 commits into
masterfrom
feature/tag-cardinality-per-tag-cache-size
Jun 29, 2026
Merged

enhancement(tag_cardinality_limit): Add per-tag cache_size_per_key override in probabilistic mode#25650
ArunPiduguDD merged 6 commits into
masterfrom
feature/tag-cardinality-per-tag-cache-size

Conversation

@ArunPiduguDD

@ArunPiduguDD ArunPiduguDD commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

In probabilistic mode, the bloom filter cache size (cache_size_per_key) was a single global/per-metric setting inherited by every tag. However, when specifying a per-tag override, it doesn't really make sense to inherit this value, especially in cases where the per-tag override specifies a much higher limit than the global/per-metric limit.

This adds an optional cache_size_per_key field to per-tag limit_override entries. When set, it overrides the bloom filter size for that tag only. When omitted, it inherits from the enclosing per-metric or global config as before. Setting it in exact mode causes a validation error.

Example:

mode: probabilistic
cache_size_per_key: 5120
per_tag_limits:
  trace_id:
    mode: limit_override
    value_limit: 1000
    cache_size_per_key: 32768  # bigger filter for this tag only

Vector configuration

See example above. Works in both top-level per_tag_limits and inside per_metric_limits.

How did you test this PR?

  • Added unit tests for apply_cache_size_override covering probabilistic+Some, probabilistic+None, and exact+Some (no-op).
  • Added YAML deserialization tests for both per_metric_limits and global per_tag_limits scopes.
  • All existing 41 tag cardinality tests continue to pass.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

@github-actions github-actions Bot added docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. domain: transforms Anything related to Vector's transform components domain: external docs Anything related to Vector's external, public documentation and removed docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. labels Jun 18, 2026
@ArunPiduguDD ArunPiduguDD marked this pull request as ready for review June 18, 2026 02:36
@ArunPiduguDD ArunPiduguDD requested review from a team as code owners June 18, 2026 02:36
@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch from 3bc46fc to b845b1c Compare June 18, 2026 02:47
@github-actions github-actions Bot added the docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. label Jun 18, 2026
@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch 2 times, most recently from 9850e50 to ce0e240 Compare June 18, 2026 02:49

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce0e240502

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread website/cue/reference/generated/configuration.cue
@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch 2 times, most recently from 98dc579 to 657934e Compare June 18, 2026 03:56
@ArunPiduguDD ArunPiduguDD changed the title feat(tag_cardinality_limit): add per-tag cache_size_per_key override in probabilistic mode enhancement(tag_cardinality_limit): add per-tag cache_size_per_key override in probabilistic mode Jun 18, 2026
@ArunPiduguDD ArunPiduguDD changed the title enhancement(tag_cardinality_limit): add per-tag cache_size_per_key override in probabilistic mode enhancement(tag_cardinality_limit): Add per-tag cache_size_per_key override in probabilistic mode Jun 18, 2026
@rtrieu rtrieu self-requested a review June 22, 2026 18:22
Comment thread changelog.d/tag_cardinality_per_tag_cache_size.enhancement.md
@ArunPiduguDD ArunPiduguDD requested a review from pront June 24, 2026 17:35

pront commented Jun 26, 2026

Copy link
Copy Markdown
Member

codex generated

Mostly yes, but I’d ask for one small follow-up before merging:

Finding

  • [P2] cache_size_per_key validation only rejects inherited exact, not inherited excluded. In src/transforms/tag_cardinality_limit/config.rs, per-metric validation only checks OverrideMode::Exact; a per-metric config with mode: excluded plus a per-tag cache_size_per_key would still build, even though the changelog/docs now say the option is only valid in probabilistic mode. Tiny fix: reject any per-metric inherited mode that is not OverrideMode::Probabilistic(_), or explicitly document that excluded metrics ignore nested per-tag settings.

Minor polish:

  • The PR body still says exact mode is “silently ignored,” but the code/changelog now say it errors.
  • A couple comments/docs still imply all per-tag settings are inherited, e.g. src/transforms/tag_cardinality_limit/mod.rs. Worth rewording so cache_size_per_key is called out as the exception.

Verification: cargo test -p vector --lib --no-default-features --features transforms-tag_cardinality_limit transforms::tag_cardinality_limit:: passes: 45/45. git diff --check is clean. The earlier generated-docs churn also looks fixed now.

@ArunPiduguDD

Copy link
Copy Markdown
Contributor Author

Updated PR with requested changes

@pront

pront commented Jun 29, 2026

Copy link
Copy Markdown
Member

Updated PR with requested changes

@ArunPiduguDD thanks. Please also fix the merge conflicts and I will take a final look.

@ArunPiduguDD ArunPiduguDD force-pushed the feature/tag-cardinality-per-tag-cache-size branch from 3536cf6 to 2ab0c0f Compare June 29, 2026 15:37
@ArunPiduguDD

Copy link
Copy Markdown
Contributor Author

Updated PR with requested changes

@ArunPiduguDD thanks. Please also fix the merge conflicts and I will take a final look.

@pront Rebased PR & fixed merged conflicts (+signed earlier unsigned commits)

@github-actions github-actions Bot removed the docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. label Jun 29, 2026
@ArunPiduguDD ArunPiduguDD added this pull request to the merge queue Jun 29, 2026
Merged via the queue into master with commit be2b50c Jun 29, 2026
84 checks passed
@ArunPiduguDD ArunPiduguDD deleted the feature/tag-cardinality-per-tag-cache-size branch June 29, 2026 17:44
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 29, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

domain: external docs Anything related to Vector's external, public documentation domain: transforms Anything related to Vector's transform components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants