enhancement(transforms): Add global option to loosen ordering guarantees in stateless transforms + introduce associated metric #25070
Open
ArunPiduguDD wants to merge 10 commits into master
pront (Member) requested changes on Apr 21, 2026 and left a comment:
Hello, thank you for this contribution! This PR can be split into two dedicated PRs; I would start by introducing `estimated_concurrent_transform_scheduling_pressure` first.
Summary
Introduces a new global configuration option called `preserve_ordering_stateless_transforms` that disables ordering guarantees in concurrently running function transforms. Also adds a corresponding metric, `estimated_concurrent_transform_scheduling_pressure`, to inform users when they should consider enabling this option.
Detailed Context
In Vector's existing concurrency model, stateless function transforms run concurrently (e.g. a function transform can have multiple threads working on batches of events in parallel). However, the existing implementation still guarantees event ordering: if 1000 events arrive at a transform and are processed across 10 batches/Tasks, they still leave the transform in the order they arrived, even if later batches complete before earlier ones.
When the processing latency of events within the transform is both high and variable, this can lead to inefficiencies: events processed in later batches can be blocked by batches/Tasks scheduled earlier (if the earlier batch is still processing when the later batch finishes).
The effect can be illustrated by this wall-time profile (measured during a benchmark test with 8 CPUs / parallel threads)
In this test the Vector instance was constantly flooded with events, so there were always events waiting to be processed. Multiple threads finish processing their batches, but new batches/Tasks cannot be scheduled on these idle threads because the head Task in the `FuturesOrdered` [queue](https://github.com/vectordotdev/vector/blob/master/src/topology/builder.rs#L1215) is still processing its batch. This leads to inefficient CPU utilization and lower overall throughput.
When this is switched to an unordered queue, the transform is no longer held up by long-running tasks and overall ingress throughput increases (the graph below shows bytes/second throughput of the ordered vs. unordered queue; the test used a remap processor with many regex rules).
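As a rough illustration of why the queue type matters, here is a toy, std-only simulation. It is a sketch under stated assumptions, not Vector's implementation: `makespan`, `durations`, and `limit` are hypothetical names, time is measured in abstract units, and the model assumes that a finished batch keeps occupying its in-flight slot in the ordered queue until every earlier batch has left.

```rust
use std::collections::VecDeque;

/// Simulated time at which the last batch leaves the transform.
/// `durations[i]` is the processing time of batch `i`; `limit` is the
/// maximum number of batches allowed in flight at once (hypothetical model).
fn makespan(durations: &[u64], limit: usize, ordered: bool) -> u64 {
    assert!(limit > 0, "limit must be at least 1");
    let mut now = 0u64;
    let mut next = 0usize;
    // Finish times of in-flight batches, in the order they were scheduled.
    let mut in_flight: VecDeque<u64> = VecDeque::new();

    loop {
        // Fill every free slot with the next waiting batch.
        while in_flight.len() < limit && next < durations.len() {
            in_flight.push_back(now + durations[next]);
            next += 1;
        }
        if in_flight.is_empty() {
            return now;
        }
        if ordered {
            // Only the head may leave; finished batches behind it keep
            // their slots until the head is done (head-of-line blocking).
            now = now.max(in_flight[0]);
            while in_flight.front().map_or(false, |&finish| finish <= now) {
                in_flight.pop_front();
            }
        } else {
            // Any finished batch leaves immediately and frees its slot.
            let earliest = *in_flight.iter().min().unwrap();
            now = now.max(earliest);
            in_flight.retain(|&finish| finish > now);
        }
    }
}

fn main() {
    // One slow batch followed by seven fast ones, two batches in flight.
    let durations = [10, 1, 1, 1, 1, 1, 1, 1];
    println!("ordered:   {}", makespan(&durations, 2, true)); // 13
    println!("unordered: {}", makespan(&durations, 2, false)); // 10
}
```

In this toy input, the ordered queue leaves one slot idle while the slow head batch runs, so the fast batches only start draining after it finishes. When all batches take the same time, both variants give the same result, which matches the observation that the effect depends on latency being variable.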
To help determine whether enabling this option is needed, this PR also adds a new metric, `estimated_concurrent_transform_scheduling_pressure`, which tracks how many Tasks have completed but are held behind the head Task, preventing new Tasks from being scheduled (the metric is a distribution ranging from 0 to 1). Because this introduces a shared counter on the transform "hot path", regression benchmark tests were run to confirm there are no issues: https://github.com/vectordotdev/vector/actions/runs/24585515449 (note: a few tests failed to run, but this seems to be due to unrelated issues; the same tests also fail to run on the latest merged commit in master).
Vector configuration
Added a new regression test with the following Vector config
How did you test this PR?
Ran Vector pipelines with the `preserve_ordering_stateless_transforms` option set. Also ran regression tests.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
References