dead_letter_queue.flush_check_interval new config for flushing staled segment files.#19036

Merged
mashhurs merged 6 commits into elastic:main from mashhurs:introduce-flush-check-interval on May 7, 2026
Contributor

@mashhurs mashhurs commented Apr 21, 2026

Release notes

Introduces a new dead_letter_queue.flush_check_interval config that sets how often segment files are checked for staleness and flushed, which can reduce the overhead of frequent checks.

What does this PR do?

  1. Introduces a new configurable dead_letter_queue.flush_check_interval param for the segment file stale check scheduler. See the problem description: Introduce a period for file flushing staled segment files #19037
     • It cannot be less than 1 second.
     • It cannot be greater than dead_letter_queue.flush_interval.
  2. Validates that dead_letter_queue.flush_interval is at least 1s, for consistency with the docs (https://www.elastic.co/docs/reference/logstash/dead-letter-queues): "Note that this value cannot be set to lower than 1000ms."
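
The bounds listed above can be sketched as follows. This is a minimal illustration in Python, not the PR's Java code: the function name `effective_check_interval_ms` is hypothetical, and the choice to clamp an out-of-range check interval (rather than reject it) is an assumption made for the example.

```python
MIN_INTERVAL_MS = 1000  # both settings have a 1s floor per the description above


def effective_check_interval_ms(flush_interval_ms: int, flush_check_interval_ms: int) -> int:
    """Return the check interval after applying the two documented bounds.

    Assumption: an out-of-range check interval is clamped, and a too-small
    flush interval is rejected. The real implementation may behave differently.
    """
    if flush_interval_ms < MIN_INTERVAL_MS:
        raise ValueError("dead_letter_queue.flush_interval cannot be lower than 1000ms")
    # Clamp into the range [1000, flush_interval_ms].
    return max(MIN_INTERVAL_MS, min(flush_check_interval_ms, flush_interval_ms))


print(effective_check_interval_ms(5000, 1000))  # already within bounds
print(effective_check_interval_ms(5000, 200))   # below the 1s floor
print(effective_check_interval_ms(5000, 9000))  # above flush_interval
```

With this sketch, a value inside the range passes through unchanged, and values outside it are pulled to the nearest bound.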

High level results

Five pipelines were run with one worker each (to get comparable output), with the following configurations:

  • pipeline-1: default, flush_interval: 5000 and flush_check_interval: 1000
  • pipeline-2: default, flush_interval: 5000 and flush_check_interval: 2000
  • pipeline-3: default, flush_interval: 5000 and flush_check_interval: 5000
  • pipeline-4: default, flush_interval: 10000 and flush_check_interval: 5000
  • pipeline-5: default, flush_interval: 10000 and flush_check_interval: 7000

The following table shows the results:

| Configuration | Flush Interval | Check Schedule Interval | CPU time | Total CPU/min |
|---|---|---|---|---|
| Pipeline 1 (baseline) | 5s | 1s | ~50.67ms | ~3,040ms (5.1%) |
| Pipeline 2 | 5s | 2s | ~50.67ms | ~1,520ms (2.5%) |
| Pipeline 3 | 5s | 5s | ~50.67ms | ~608ms (1.0%) |
| Pipeline 4 | 10s | 5s | ~50.67ms | ~608ms (1.0%) |
| Pipeline 5 | 10s | 7s | ~50.67ms | ~434ms (0.7%) |
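
The "Total CPU/min" figures appear to follow from the check interval alone. The sketch below assumes that the "CPU time" column is the cost of a single check and that one check runs per flush_check_interval; both are readings of the table, not statements from the implementation. The function name `cpu_per_minute` is hypothetical.

```python
CPU_MS_PER_CHECK = 50.67  # "CPU time" column, read here as the cost of one check


def cpu_per_minute(flush_check_interval_ms: int) -> tuple[float, float]:
    """Return (total CPU ms per minute, share of one minute in percent)."""
    checks_per_minute = 60_000 / flush_check_interval_ms
    total_ms = checks_per_minute * CPU_MS_PER_CHECK
    return total_ms, 100 * total_ms / 60_000


for interval_ms in (1000, 2000, 5000, 7000):
    total_ms, percent = cpu_per_minute(interval_ms)
    print(f"{interval_ms}ms -> ~{total_ms:.0f}ms/min ({percent:.1f}%)")
```

Under these assumptions the flush interval itself does not enter the calculation, which matches pipelines 3 and 4 reporting the same total.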

Why is it important/What is the impact to the user?

For users with intensive DLQ operations (write/read), the frequent flush check scheduler can add overhead to the pipeline, that is, it uses a lot of CPU. A configurable scheduler cadence improves pipeline efficiency by removing overly frequent operations.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

  • Pull this change.
  • Create multiple pipelines with different flush_interval and flush_check_interval params (see the example below).
  • Run Logstash.
- pipeline.id: dlq-test-pipeline-1
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 1000
  dead_letter_queue.max_bytes: 14mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-2
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 2000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-3
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 5000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-4
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 10000
  dead_letter_queue.flush_check_interval: 5000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-5
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 10000
  dead_letter_queue.flush_check_interval: 7000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

The Elasticsearch output configured in config/tests/elasticsearch-output.conf must reject events with a 400 or 404 response so that they are routed to the DLQ. I used the following config:

input {
  generator {
    id => "generator-id"
    ecs_compatibility => disabled
    count => 30000000
    threads => 2
    codec => json
    lines => [
      '{"fileset":{"module":"system","name":"auth"},"system":{"auth":{"timestamp":"May 17 05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"module":"cisco","data":{"User-Name":"mashhur"}},"client":{"ip":"123.123.123.123"},"DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","destination":{"ip":"123.123.123.123"},"source":{"ip":"123.123.123.123"},"ReferencedHost":"ip-192-168-1-2","DNSQuery":"example.com/my-path?query=value"}',
      '{"fileset":{"module":"system","name":"syslog"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"module":"cisco","data":{"User-Name":"mashhur"}},"client":{"ip":"123.123.123.123"},"DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","destination":{"ip":"123.123.123.123"},"source":{"ip":"123.123.123.123"},"ReferencedHost":"ip-192-168-1-2","DNSQuery":"example.com/my-path?query=value"}',
      '{"fileset":{"module":"system","name":"asa"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"category":"cisco-category", "type":"cisco-type", "data":{"User-Name":"mashhur"}},"client":{"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123", "ip":"123.123.123.123"}, "destination": {"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123"}, "source": {"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123"}, "url":{"origin_domain": "ip-192-168-1-2"}, "DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","ReferencedHost":"ip-192-168-1-2", "dns":{"question": {"origin_domain":"example.com/my-path?query=value"}}}'
    ]
  }
}

output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    user => "elastic"
    password => "{pwd}"
    index => "test-dlq"
    action => "update"
    document_id => "nonexistent_id_12345"
    ecs_compatibility => disabled
  }
}
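
The routing condition this test relies on can be stated as a small predicate. This is only a sketch of the rule named in the description (400 or 404 responses go to the DLQ); the names `DLQ_STATUS_CODES` and `routes_to_dlq` are hypothetical and not part of the Elasticsearch output plugin. In the config above, the update action against a document_id that does not exist is what is expected to produce the 404.

```python
DLQ_STATUS_CODES = {400, 404}  # per the description above; hypothetical constant name


def routes_to_dlq(status: int) -> bool:
    """True when a per-event response status would send the event to the DLQ."""
    return status in DLQ_STATUS_CODES


for status in (200, 400, 404):
    print(status, routes_to_dlq(status))
```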

Related issues

Use cases

Screenshots

Logs

…new config for flushing staled segment files.
@mashhurs mashhurs self-assigned this Apr 21, 2026
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Contributor

mergify Bot commented Apr 21, 2026

This pull request does not have a backport label. Could you fix it @mashhurs? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

🔍 Preview links for changed docs

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

✅ Vale Linting Results

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

@mashhurs mashhurs linked an issue Apr 22, 2026 that may be closed by this pull request
…in the threads API results. Add suggestions from the docs review. Re-organize the duration clamp logic in a way for better maintainability and fix the unit tests.
@mashhurs mashhurs marked this pull request as ready for review April 22, 2026 22:42
@mashhurs mashhurs added the backport-8.19, backport-9.3, and backport-9.4 labels Apr 22, 2026
Contributor

@alexcams alexcams left a comment

Great implementation @mashhurs! I've compared results locally before and after this feature, and I see the CPU usage improvement on pipelines using a value greater than the default (1s).
I just left one question about the value clamping.
It would be nice to hear @andsel's thoughts too :)

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
@mashhurs mashhurs requested a review from andsel April 27, 2026 22:07
Member

@andsel andsel left a comment

Left some comments on how to make the changes more readable and testable.

Setting::BooleanSetting.new("dead_letter_queue.enable", false),
Setting::BytesSetting.new("dead_letter_queue.max_bytes", "1024mb"),
Setting::NumericSetting.new("dead_letter_queue.flush_interval", 5000),
Setting::NumericSetting.new("dead_letter_queue.flush_check_interval", 1000),
Member

Is there a specific reason not to use TimeValueSetting, which should be preferred for time intervals?

Contributor Author

Don't we need to keep consistency with flush_interval? I initially tried it, and in the docs it looked strange that flush_interval is given in milliseconds (like 5000) while flush_check_interval is in string format (like 1s).

Member

I understand your concern, but the setting is an interval and there is a specific setting type for this case; it would be nice to use it because it is less error-prone. If we use a numeric value in milliseconds, we have to specify the unit of time in the docs, while with TimeValueSetting it is explicit. However, as you said, it creates an inconsistency with the existing setting, although this is a new setting.
Prefer consistency or use the appropriate setting? I don't have a strong preference for one or the other.

Contributor Author

> Prefer consistency or use the appropriate setting?

I am also okay with either one. I was thinking from the user experience point of view: they belong to the same domain and are the same type, but would be in different formats.

Member

@jsvd WDYT about this thread? Prefer consistency, or use the setting type that's more aligned with the use (it's an interval => TimeValue)?

Member

++ on using the more aligned one. If we want, we can change the type for 10.x.
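
To illustrate the trade-off discussed in this thread: a bare number leaves the unit to the documentation, while a time-value string carries its unit with it. The parser below is a hypothetical sketch, not Logstash's TimeValueSetting; the function name `to_millis` and the accepted units (`ms`, `s`) are assumptions chosen for the example.

```python
import re

_UNIT_TO_MS = {"ms": 1, "s": 1000}  # assumed units, for illustration only


def to_millis(value) -> int:
    """Accept a bare number (read as milliseconds) or a string such as '1s'."""
    if isinstance(value, int):
        return value  # unit is implicit: the docs must say "milliseconds"
    match = re.fullmatch(r"(\d+)\s*(ms|s)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_MS[unit]


print(to_millis(1000))  # numeric style, as flush_interval is written in this PR
print(to_millis("1s"))  # explicit-unit style
```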

Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread docs/reference/dead-letter-queues.md Outdated
mashhurs and others added 2 commits April 28, 2026 10:35
…ueueWriter.java


Remove unused method.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
…emove confusing scheduler from the docs explanations. unit tests for the only newly introduced conditions.
@mashhurs mashhurs requested review from alexcams and andsel April 28, 2026 22:15
Member

@andsel andsel left a comment

Left some suggestions to improve the test code and a small rewording of the doc to avoid using "scheduler" without introducing the concept first.

Comment thread docs/reference/dead-letter-queues.md Outdated
Comment thread logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java Outdated
Comment thread logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterTest.java Outdated
Comment thread logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterTest.java Outdated
mashhurs and others added 2 commits April 29, 2026 10:22
Doc consistency and test rename suggestions accepted.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
@elasticmachine

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @mashhurs

@mashhurs mashhurs requested a review from andsel April 29, 2026 20:09
Member

@andsel andsel left a comment

LGTM!

Thanks @mashhurs, great work 👍
I've asked @jsvd for an opinion about the type of setting; wait for his thoughts, but to me it's not blocking.

@mashhurs mashhurs merged commit f2f0d3f into elastic:main May 7, 2026
11 checks passed
@mashhurs mashhurs deleted the introduce-flush-check-interval branch May 7, 2026 16:10
Contributor Author

mashhurs commented May 7, 2026

@Mergifyio backport 9.4

@mergify
Contributor

mergify Bot commented May 7, 2026

backport 9.4

✅ Backports have been created

Details

mashhurs added a commit that referenced this pull request May 7, 2026
…ed segment files. (#19036) (#19090)

* Validates `dead_letter_queue.flush_interval` to be min 1s to keep consistency with the docs. Introduces `dead_letter_queue.flush_check_interval` new config for flushing staled segment files.

* Add pipeline name to the DLQ flush thread name for better visibility in the threads API results. Add suggestions from the docs review. Re-organize the duration clamp logic in a way for better maintainability and fix the unit tests.

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java

Remove unused method.



* Move the flush check interval to the DeadLetterQueueWriter.Builder. Remove confusing scheduler from the docs explanations. Unit tests for the only newly introduced conditions.

* Apply suggestions from code review

Doc consistency and test rename suggestions accepted.



* Keep the interval type as a Duration, rename and simplify test suites.

---------


(cherry picked from commit f2f0d3f)

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
mashhurs added a commit that referenced this pull request May 7, 2026
…config for flushing staled segment files. (#19089)

* `dead_letter_queue.flush_check_interval` new config for flushing staled segment files. (#19036)

* Validates `dead_letter_queue.flush_interval` to be min 1s to keep consistency with the docs. Introduces `dead_letter_queue.flush_check_interval` new config for flushing staled segment files.

* Add pipeline name to the DLQ flush thread name for better visibility in the threads API results. Add suggestions from the docs review. Re-organize the duration clamp logic in a way for better maintainability and fix the unit tests.

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java

Remove unused method.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Move the flush check interval to the DeadLetterQueueWriter.Builder. Remove confusing scheduler from the docs explanations. Unit tests for the only newly introduced conditions.

* Apply suggestions from code review

Doc consistency and test rename suggestions accepted.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Keep the interval type as a Duration, rename and simplify test suites.

---------

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
(cherry picked from commit f2f0d3f)

# Conflicts:
#	logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java

* Resolve the conflict.

---------

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: Mashhur <mashhur.sattorov@elastic.co>
Labels

backport-8.19, backport-9.3, backport-9.4, enhancement

Development

Successfully merging this pull request may close these issues.

Introduce a period for file flushing staled segment files

5 participants