[8.19] (backport #19036) dead_letter_queue.flush_check_interval new config for flushing staled segment files. #19088

Draft
mergify[bot] wants to merge 3 commits into 8.19 from mergify/bp/8.19/pr-19036

@mergify
Contributor

@mergify mergify Bot commented May 7, 2026

Release notes

Introduces a new dead_letter_queue.flush_check_interval setting that controls how often the scheduler checks for stale segment files to flush, which can reduce the overhead of frequent checks.

What does this PR do?

  1. Introduces a new configurable dead_letter_queue.flush_check_interval param for the scheduler that checks for stale segment files. For the problem description, see "Introduce a period for file flushing staled segment files" #19037.
     • it cannot be less than 1 second
     • it cannot be greater than dead_letter_queue.flush_interval
  2. Validates that dead_letter_queue.flush_interval is at least 1s, for consistency with the docs - https://www.elastic.co/docs/reference/logstash/dead-letter-queues: "Note that this value cannot be set to lower than 1000ms."
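
The constraints listed above can be sketched as a small validator. This is an illustration in Python, not the Java/Ruby code in this PR; the function name validate_dlq_intervals and the choice to raise an error (rather than, for example, clamping the value) are assumptions made for the example.

```python
MIN_INTERVAL_MS = 1000  # documented lower bound: "cannot be set to lower than 1000ms"


def validate_dlq_intervals(flush_interval_ms: int, flush_check_interval_ms: int) -> None:
    """Raise ValueError if the two settings violate the rules listed above.

    Hypothetical helper: Logstash's real validation may report or
    handle violations differently.
    """
    if flush_interval_ms < MIN_INTERVAL_MS:
        raise ValueError("dead_letter_queue.flush_interval cannot be less than 1000ms")
    if flush_check_interval_ms < MIN_INTERVAL_MS:
        raise ValueError("dead_letter_queue.flush_check_interval cannot be less than 1000ms")
    if flush_check_interval_ms > flush_interval_ms:
        raise ValueError(
            "dead_letter_queue.flush_check_interval cannot be greater than "
            "dead_letter_queue.flush_interval"
        )


# Interval pairs used by the test pipelines in this PR: all are accepted.
for flush_ms, check_ms in [(5000, 1000), (5000, 5000), (10000, 7000)]:
    validate_dlq_intervals(flush_ms, check_ms)
```

A check interval equal to the flush interval is allowed under these rules; only a strictly larger value is rejected.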

High level results

Five pipelines were run with one worker each (to get comparable output), with the following configurations:

  • pipeline-1: default, flush_interval: 5000 and flush_check_interval: 1000
  • pipeline-2: default, flush_interval: 5000 and flush_check_interval: 2000
  • pipeline-3: default, flush_interval: 5000 and flush_check_interval: 5000
  • pipeline-4: default, flush_interval: 10000 and flush_check_interval: 5000
  • pipeline-5: default, flush_interval: 10000 and flush_check_interval: 7000

The following table shows the results:

| Configuration | Flush Interval | Check Schedule Interval | CPU time per check | Total CPU/min |
|---|---|---|---|---|
| Pipeline 1 (baseline) | 5s | 1s | ~50.67ms | ~3,040ms (5.1%) |
| Pipeline 2 | 5s | 2s | ~50.67ms | ~1,520ms (2.5%) |
| Pipeline 3 | 5s | 5s | ~50.67ms | ~608ms (1.0%) |
| Pipeline 4 | 10s | 5s | ~50.67ms | ~608ms (1.0%) |
| Pipeline 5 | 10s | 7s | ~50.67ms | ~434ms (0.7%) |
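
The Total CPU/min figures are consistent with multiplying the CPU time (~50.67ms, which appears to be the cost of a single check) by the number of checks per minute, that is, 60s divided by the check interval. The short Python sketch below reproduces that arithmetic; the function name and the assumption that cost scales linearly with check frequency are illustrative, not taken from the PR.

```python
CPU_MS_PER_CHECK = 50.67  # CPU time figure from the table, assumed to be per check
MS_PER_MINUTE = 60_000


def cpu_per_minute(check_interval_s: float) -> tuple[float, float]:
    """Return (total CPU ms per minute, share of one minute in percent)."""
    checks_per_minute = 60 / check_interval_s
    total_ms = checks_per_minute * CPU_MS_PER_CHECK
    return total_ms, 100 * total_ms / MS_PER_MINUTE


# Check intervals used by the five test pipelines (pipelines 3 and 4 share 5s).
for interval_s in (1, 2, 5, 7):
    total_ms, percent = cpu_per_minute(interval_s)
    print(f"check every {interval_s}s: ~{total_ms:,.0f}ms/min ({percent:.1f}%)")
```

Under this model the flush interval itself does not affect the total, which matches pipelines 3 and 4 reporting the same ~608ms despite different flush intervals.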

Why is it important/What is the impact to the user?

For users with intensive DLQ operations (write/read), the frequent flush check scheduler can add overhead to the pipeline in the form of extra CPU usage. Making the scheduler cadence configurable improves pipeline efficiency by reducing how often these checks run.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

How to test this PR locally

  • pull this change
  • create multiple pipelines with different flush_interval and flush_check_interval params (see the example below)
  • run Logstash

- pipeline.id: dlq-test-pipeline-1
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 1000
  dead_letter_queue.max_bytes: 14mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-2
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 2000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-3
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 5000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-4
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 10000
  dead_letter_queue.flush_check_interval: 5000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-5
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 10000
  dead_letter_queue.flush_check_interval: 7000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

The Elasticsearch output configured in config/tests/elasticsearch-output.conf needs to reject events with a 400 or 404 response so that they are routed to the DLQ. The following config was used:

input {
  generator {
    id => "generator-id"
    ecs_compatibility => disabled
    count => 30000000
    threads => 2
    codec => json
    lines => [
	'{"fileset":{"module":"system","name":"auth"},"system":{"auth":{"timestamp":"May 17 05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"module":"cisco","data":{"User-Name":"mashhur"}},"client":{"ip":"123.123.123.123"},"DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","destination":{"ip":"123.123.123.123"},"source":{"ip":"123.123.123.123"},"ReferencedHost":"ip-192-168-1-2","DNSQuery":"example.com/my-path?query=value"}',
'{"fileset":{"module":"system","name":"syslog"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"module":"cisco","data":{"User-Name":"mashhur"}},"client":{"ip":"123.123.123.123"},"DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","destination":{"ip":"123.123.123.123"},"source":{"ip":"123.123.123.123"},"ReferencedHost":"ip-192-168-1-2","DNSQuery":"example.com/my-path?query=value"}',
'{"fileset":{"module":"system","name":"asa"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"category":"cisco-category", "type":"cisco-type", "data":{"User-Name":"mashhur"}},"client":{"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123", "ip":"123.123.123.123"}, "destination": {"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123"}, "source": {"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123"}, "url":{"origin_domain": "ip-192-168-1-2"}, "DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","ReferencedHost":"ip-192-168-1-2", "dns":{"question": {"origin_domain":"example.com/my-path?query=value"}}}'
    ]
  }
}

output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    user => "elastic"
    password => "{pwd}"
    index => "test-dlq"
    action => "update"
    document_id => "nonexistent_id_12345"
    ecs_compatibility => disabled
  }
}

This is an automatic backport of pull request #19036 done by [Mergify](https://mergify.com).

…ed segment files. (#19036)

* Validates dead_letter_queue.flush_interval to be min 1s to keep consistency with the docs. Introduces dead_letter_queue.flush_check_interval new config for flushing staled segment files.

* Add pipeline name to the DLQ flush thread name for better visibility in the threads API results. Add suggestions from the docs review. Reorganize the duration clamp logic to make it more maintainable and fix the unit tests.

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java

Remove unused method.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Move the flush check interval to the DeadLetterQueueWriter.Builder. Remove the confusing scheduler from the docs explanations. Add unit tests only for the newly introduced conditions.

* Apply suggestions from code review

Doc consistency and test rename suggestions accepted.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Keep the interval type as a Duration, rename and simplify test suites.

---------

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
(cherry picked from commit f2f0d3f)

# Conflicts:
#	docs/reference/dead-letter-queues.md
#	logstash-core/lib/logstash/environment.rb
#	logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java
#	logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterTest.java
@mergify mergify Bot added the backport and conflicts (Detected git conflicts) labels May 7, 2026
@mergify
Contributor Author

mergify Bot commented May 7, 2026

Cherry-pick of f2f0d3f has failed:

On branch mergify/bp/8.19/pr-19036
Your branch is up to date with 'origin/8.19'.

You are currently cherry-picking commit f2f0d3fde.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   config/logstash.yml
	modified:   logstash-core/lib/logstash/settings.rb
	modified:   logstash-core/src/main/java/org/logstash/common/DeadLetterQueueFactory.java
	modified:   logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java
	modified:   logstash-core/src/main/java/org/logstash/common/io/RecordIOWriter.java
	modified:   logstash-core/src/test/java/org/logstash/common/AbstractDeadLetterQueueWriterExtTest.java
	modified:   logstash-core/src/test/java/org/logstash/common/DeadLetterQueueFactoryTest.java
	modified:   logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueReaderTest.java
	modified:   logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterAgeRetentionTest.java

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	deleted by us:   docs/reference/dead-letter-queues.md
	both modified:   logstash-core/lib/logstash/environment.rb
	both modified:   logstash-core/src/main/java/org/logstash/execution/AbstractPipelineExt.java
	both modified:   logstash-core/src/test/java/org/logstash/common/io/DeadLetterQueueWriterTest.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@github-actions
Contributor

github-actions Bot commented May 7, 2026

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mashhurs mashhurs marked this pull request as draft May 7, 2026 16:13
@mashhurs
Contributor

mashhurs commented May 7, 2026

This needs some work to convert the docs from Markdown to AsciiDoc format.

@mashhurs mashhurs requested a review from andsel May 7, 2026 19:05
@mashhurs mashhurs marked this pull request as ready for review May 7, 2026 19:12
@elasticmachine

💚 Build Succeeded

History

cc @mashhurs

@mashhurs mashhurs marked this pull request as draft May 7, 2026 20:02
@mashhurs
Contributor

mashhurs commented May 7, 2026

@andsel I created this backport to see how complex it would be. 8.19 is the last 8.x line we have, though it will still get more releases. I am not sure whether we are committed to including this in 8.19; it looks like a nice-to-have to me.

@andsel
Member

andsel commented May 8, 2026

Technically this is a feature, so it shouldn't be backported. However, we can keep this open in case someone requests it, or merge it once the feature has proven stable on 9.x releases.

Member

@andsel andsel left a comment

LGTM

@mashhurs
Contributor

mashhurs commented May 8, 2026

> Technically this is a feature, so it shouldn't be backported. However, we can keep this open in case someone requests it, or merge it once the feature has proven stable on 9.x releases.

Makes sense, I will keep in mind. Thank you!

@mergify
Contributor Author

mergify Bot commented May 11, 2026

This pull request has not been merged yet. Could you please review and merge it @mashhurs? 🙏

@mergify
Contributor Author

mergify Bot commented May 18, 2026

This pull request has not been merged yet. Could you please review and merge it @mashhurs? 🙏

Labels

backport, conflicts (Detected git conflicts)