Skip to content

fix(enrichment tables): preserve memory enrichment table state on reload#25547

Open
esensar wants to merge 25 commits into
vectordotdev:masterfrom
esensar:fix/memory-table-reload-state
Open

fix(enrichment tables): preserve memory enrichment table state on reload#25547
esensar wants to merge 25 commits into
vectordotdev:masterfrom
esensar:fix/memory-table-reload-state

Conversation

@esensar

@esensar esensar commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

While working on #25143, it was brought to my attention that reload was not handled for memory tables (#25143 (comment)) - that is, new components were generated, that were not attached to the tables that were queried. This PR resolves that by making these tables take over the state of previous components.

Vector configuration

enrichment_tables:
  memory_table:
    type: memory
    ttl: 60
    flush_interval: 5
    inputs: ["cache_generator"]


sources:
  demo_logs_test:
    type: "demo_logs"
    format: "json"

transforms:
  demo_logs_processor:
    type: "remap"
    inputs: ["demo_logs_test"]
    source: |
      . = parse_json!(.message)
      user_id = get!(., path: ["user-identifier"])

      existing, err = get_enrichment_table_record("memory_table", { "key": user_id })

      if err == null {
        . = existing.value
        .source = "cache"
      } else {
        .referer = parse_url!(.referer)
        .referer.host = encode_punycode!(.referer.host)
        .source = "transform"
      }      

  cache_generator:
    type: "remap"
    inputs: ["demo_logs_processor"]
    source: |
      existing, err = get_enrichment_table_record("memory_table", { "key": get!(., path: ["user-identifier"]) })
      if err != null {
        data = .
        . = set!(value: {}, path: [get!(data, path: ["user-identifier"])], data: data)
      } else {
        . = {}
      }      

sinks:
  console:
    inputs: ["demo_logs_processor"]
    target: "stdout"
    type: "console"
    encoding:
      codec: "json"

How did you test this PR?

Ran vector with the above configuration and the --watch-config flag. Changed TTL a couple of times and Vector properly reloaded and kept state, observed by seeing cached output data, instead of newly generated.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

Sponsored by Quad9

@esensar esensar requested a review from a team as a code owner June 1, 2026 12:51
@github-actions github-actions Bot added the domain: topology Anything related to Vector's topology code label Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6583fd70e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs Outdated
Comment thread src/topology/builder.rs Outdated
@pront

pront commented Jun 1, 2026

Copy link
Copy Markdown
Member

Thanks @esensar! Per our new policy I will come back to this once codex comments are resolved.

@esensar esensar changed the title fix(enrichment): preserve memory enrichment table state on reload fix(enrichment tables): preserve memory enrichment table state on reload Jun 2, 2026
@esensar

esensar commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

While resolving this I found that there were quite a few typos and missed cases for reload in enrichment tables that have source/sink. I think I covered them all now.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0044cc7f03

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/builder.rs Outdated

@pront pront left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution. Can please add a couple of test cases, e.g. take_state_preserves_data and reload_preserves_state?

Comment thread lib/vector-vrl/enrichment/src/lib.rs Outdated
Comment thread lib/vector-vrl/enrichment/src/lib.rs Outdated
Comment thread lib/vector-vrl/enrichment/src/lib.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ed0732c66

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/vector-vrl/enrichment/src/test_util.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4cb301f9e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/builder.rs
@esensar

esensar commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I am adding a topology reload test to confirm this, but I am having some issues with it. For some reason tables are missing on reload, that is TableRegistry::finish_load doesn't store tables properly during the test (it works fine in runtime)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 961e72fe3b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/enrichment_tables/memory/table.rs
Comment thread src/topology/running.rs
@esensar

esensar commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

I am adding a topology reload test to confirm this, but I am having some issues with it. For some reason tables are missing on reload, that is TableRegistry::finish_load doesn't store tables properly during the test (it works fine in runtime)

I have added the test - it works when run individually, but fails when run together with others - I assume because ENRICHMENT_TABLES is static and shared between tests. Not sure how to best resolve this...

@esensar esensar requested a review from pront June 8, 2026 15:27
@pront

pront commented Jun 12, 2026

Copy link
Copy Markdown
Member

I am adding a topology reload test to confirm this, but I am having some issues with it. For some reason tables are missing on reload, that is TableRegistry::finish_load doesn't store tables properly during the test (it works fine in runtime)

I have added the test - it works when run individually, but fails when run together with others - I assume because ENRICHMENT_TABLES is static and shared between tests. Not sure how to best resolve this...

@esensar no worries, I will take a closer look and make suggestions.

@pront

pront commented Jun 16, 2026

Copy link
Copy Markdown
Member

Updated this branch with a fix. Can you try the tests now?

@esensar

esensar commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Updated this branch with a fix. Can you try the tests now?

Thanks, that fixed it!

@pront pront left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit:

This makes memory table state preservation across reload unconditional when the table key remains the same. I can imagine users who treat reload as a cache invalidation point after changing the table’s producing logic or cached value schema.

This is currently breaking the existing reload behavior. Thoughts on adding an explicit reload behavior config option? Or preserving the current behavior and making this opt-in?

@esensar

esensar commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Edit:

This makes memory table state preservation across reload unconditional when the table key remains the same. I can imagine users who treat reload as a cache invalidation point after changing the table’s producing logic or cached value schema.

This is currently breaking the existing reload behavior. Thoughts on adding an explicit reload behavior config option? Or preserving the current behavior and making this opt-in?

Okay, that makes sense. Maybe a flag for state preservation on reload, or like you said reload behavior with just these 2 options for now so that we can expand it in the future if some other interesting behavior comes up.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12939a04e6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs
Comment thread src/topology/builder.rs
Comment thread src/enrichment_tables/memory/table.rs Outdated
@pront pront enabled auto-merge June 22, 2026 17:13
@datadog-vectordotdev

This comment has been minimized.

auto-merge was automatically disabled June 22, 2026 18:21

Head branch was pushed to by a user without write access

@esensar esensar requested a review from a team as a code owner June 22, 2026 18:21
@github-actions github-actions Bot added the domain: external docs Anything related to Vector's external, public documentation label Jun 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

.filter_map(|key| {
self.config
.enrichment_table(key)
.and_then(|t| t.as_source(key).map(|(key, _)| key))

P2 Badge Register added table sources from the new config

When a reload adds a memory enrichment table with source_config (or changes its source_key), diff.enrichment_tables.changed_and_added() contains the new source key, but this lookup uses the old self.config, where that table/source does not exist. As a result outputs_tap_metadata and component_type_names are never populated for the newly spawned table source, so topology/tap subscribers cannot see or tap it until Vector is restarted.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2564ca377f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs
Comment thread src/topology/test/mod.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: external docs Anything related to Vector's external, public documentation domain: topology Anything related to Vector's topology code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants