[9.4] (backport #19036) dead_letter_queue.flush_check_interval new config for flushing staled segment files. #19090

Merged
mashhurs merged 1 commit into 9.4 from mergify/bp/9.4/pr-19036
May 7, 2026

@mergify mergify Bot commented May 7, 2026

Release notes

Introduces a new dead_letter_queue.flush_check_interval setting that controls how often the scheduler checks for stale segment files to flush. Raising it can reduce the overhead of frequent checks.

What does this PR do?

  1. Introduces a new configurable dead_letter_queue.flush_check_interval parameter that sets how often the scheduler checks segment files for staleness. For the problem description, see "Introduce a period for file flushing staled segment files" #19037.
  • It cannot be less than 1 second.
  • It cannot be greater than dead_letter_queue.flush_interval.
  2. Validates that dead_letter_queue.flush_interval is at least 1s, for consistency with the docs (https://www.elastic.co/docs/reference/logstash/dead-letter-queues): "Note that this value cannot be set to lower than 1000ms."
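
The constraints above can be sketched as a small validation routine. This is only an illustration in Python, not the Logstash code: the helper name validate_dlq_intervals is hypothetical, and the sketch rejects invalid values with an error, whereas the commit notes for this PR mention clamp logic, so the real implementation may adjust out-of-range values instead.

```python
from datetime import timedelta

# Lower bound stated in the PR description and in the DLQ docs (1000ms).
MIN_INTERVAL = timedelta(seconds=1)


def validate_dlq_intervals(flush_interval: timedelta,
                           flush_check_interval: timedelta) -> None:
    """Raise ValueError if either interval breaks the rules listed above.

    Hypothetical helper for illustration; not the Logstash implementation.
    """
    if flush_interval < MIN_INTERVAL:
        raise ValueError(
            "dead_letter_queue.flush_interval cannot be less than 1s")
    if flush_check_interval < MIN_INTERVAL:
        raise ValueError(
            "dead_letter_queue.flush_check_interval cannot be less than 1s")
    if flush_check_interval > flush_interval:
        raise ValueError(
            "dead_letter_queue.flush_check_interval cannot be greater than "
            "dead_letter_queue.flush_interval")


# The five (flush_interval, flush_check_interval) pairs tested in this PR,
# in milliseconds, all satisfy the rules.
for flush_ms, check_ms in [(5000, 1000), (5000, 2000), (5000, 5000),
                           (10000, 5000), (10000, 7000)]:
    validate_dlq_intervals(timedelta(milliseconds=flush_ms),
                           timedelta(milliseconds=check_ms))
```

A check interval equal to the flush interval is accepted, since the rule only forbids values greater than dead_letter_queue.flush_interval.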

High level results

Five pipelines were run with one worker each (to get comparable output), using the following configurations:

  • pipeline-1: default, flush_interval: 5000 and flush_check_interval: 1000
  • pipeline-2: default, flush_interval: 5000 and flush_check_interval: 2000
  • pipeline-3: default, flush_interval: 5000 and flush_check_interval: 5000
  • pipeline-4: default, flush_interval: 10000 and flush_check_interval: 5000
  • pipeline-5: default, flush_interval: 10000 and flush_check_interval: 7000

The following table shows the resulting CPU cost of each configuration:

| Configuration | Flush Interval | Check Schedule Interval | CPU time | Total CPU/min |
|---|---|---|---|---|
| Pipeline 1 (baseline) | 5s | 1s | ~50.67ms | ~3,040ms (5.1%) |
| Pipeline 2 | 5s | 2s | ~50.67ms | ~1,520ms (2.5%) |
| Pipeline 3 | 5s | 5s | ~50.67ms | ~608ms (1.0%) |
| Pipeline 4 | 10s | 5s | ~50.67ms | ~608ms (1.0%) |
| Pipeline 5 | 10s | 7s | ~50.67ms | ~434ms (0.7%) |
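
The totals in the table are consistent with simple arithmetic: the number of checks per minute multiplied by the CPU time of one check. The Python sketch below reproduces that calculation. It assumes the "CPU time" column is the cost of a single check and that the percentage is the share of one 60,000ms CPU-minute; neither is stated explicitly in the PR, and the function name cpu_per_minute is hypothetical.

```python
# "CPU time" column, read here as the cost of one stale-segment check
# (an assumption; the PR does not define the column).
CPU_MS_PER_CHECK = 50.67


def cpu_per_minute(check_interval_s, cpu_ms_per_check=CPU_MS_PER_CHECK):
    """Return (total CPU ms per minute, percent of one CPU-minute)."""
    checks_per_minute = 60 / check_interval_s
    total_ms = checks_per_minute * cpu_ms_per_check
    percent = total_ms / 60_000 * 100
    return total_ms, percent


# Check intervals used by the five test pipelines.
for interval_s in (1, 2, 5, 7):
    total_ms, percent = cpu_per_minute(interval_s)
    print(f"check every {interval_s}s: ~{total_ms:,.0f}ms/min ({percent:.1f}%)")
```

Under these assumptions the cost depends only on the check interval, not on the flush interval, which is why Pipeline 3 and Pipeline 4 show the same total.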

Why is it important/What is the impact to the user?

For users with intensive DLQ operations (write/read), the frequent flush check scheduler can add overhead to the pipeline in the form of extra CPU usage. A configurable scheduler cadence improves pipeline efficiency by removing unnecessarily frequent checks.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

  • Pull this change.
  • Create multiple pipelines with different flush_interval and flush_check_interval params, as in the example below.
  • Run Logstash.

- pipeline.id: dlq-test-pipeline-1
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 1000
  dead_letter_queue.max_bytes: 14mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-2
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 2000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-3
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 5000
  dead_letter_queue.flush_check_interval: 5000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-4
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 10000
  dead_letter_queue.flush_check_interval: 5000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

- pipeline.id: dlq-test-pipeline-5
  pipeline.workers: 1
  dead_letter_queue.enable: true
  dead_letter_queue.flush_interval: 10000
  dead_letter_queue.flush_check_interval: 7000
  dead_letter_queue.max_bytes: 24mb
  path.config: "config/tests/elasticsearch-output.conf"

The Elasticsearch output configured in config/tests/elasticsearch-output.conf needs to reject events with a 400 or 404 response so that they are routed to the DLQ. The following config was used:

input {
  generator {
    id => "generator-id"
    ecs_compatibility => disabled
    count => 30000000
    threads => 2
    codec => json
    lines => [
      '{"fileset":{"module":"system","name":"auth"},"system":{"auth":{"timestamp":"May 17 05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"module":"cisco","data":{"User-Name":"mashhur"}},"client":{"ip":"123.123.123.123"},"DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","destination":{"ip":"123.123.123.123"},"source":{"ip":"123.123.123.123"},"ReferencedHost":"ip-192-168-1-2","DNSQuery":"example.com/my-path?query=value"}',
      '{"fileset":{"module":"system","name":"syslog"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"module":"cisco","data":{"User-Name":"mashhur"}},"client":{"ip":"123.123.123.123"},"DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","destination":{"ip":"123.123.123.123"},"source":{"ip":"123.123.123.123"},"ReferencedHost":"ip-192-168-1-2","DNSQuery":"example.com/my-path?query=value"}',
      '{"fileset":{"module":"system","name":"asa"},"system":{"auth":{"timestamp":"May 17  05:17:00","ssh":{"source":{"ip":"123.123.123.123"}}}},"event":{"category":"cisco-category", "type":"cisco-type", "data":{"User-Name":"mashhur"}},"client":{"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123", "ip":"123.123.123.123"}, "destination": {"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123"}, "source": {"ar_net":"123.123.123.123", "ongisac_ip":"123.123.123.123"}, "url":{"origin_domain": "ip-192-168-1-2"}, "DstIP":"123.123.123.123","SrcIP":"123.123.123.123","orginalClientSrcIP":"123.123.123.123","ReferencedHost":"ip-192-168-1-2", "dns":{"question": {"origin_domain":"example.com/my-path?query=value"}}}'
    ]
  }
}

output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    user => "elastic"
    password => "{pwd}"
    index => "test-dlq"
    action => "update"
    document_id => "nonexistent_id_12345"
    ecs_compatibility => disabled
  }
}

Related issues

Use cases

Screenshots

Logs


This is an automatic backport of pull request #19036 done by [Mergify](https://mergify.com).

…ed segment files. (#19036)

* Validates  to be min 1s to keep consistency with the docs. Introduces  new config for flushing staled segment files.

* Add pipeline name to the DLQ flush thread name for better visibility in the threads API results. Add suggestions from the docs review. Reorganize the duration clamp logic for better maintainability and fix the unit tests.

* Update logstash-core/src/main/java/org/logstash/common/io/DeadLetterQueueWriter.java

Remove unused method.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Move the flush check interval to the DeadLetterQueueWriter.Builder. Remove the confusing scheduler wording from the docs explanations. Unit tests cover only the newly introduced conditions.

* Apply suggestions from code review

Doc consistency and test rename suggestions accepted.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Keep the interval type as a Duration, rename and simplify test suites.

---------

Co-authored-by: Andrea Selva <selva.andre@gmail.com>
(cherry picked from commit f2f0d3f)
@mergify mergify Bot added the backport label May 7, 2026

github-actions Bot commented May 7, 2026

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

github-actions Bot commented May 7, 2026

✅ Vale Linting Results

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

@mashhurs mashhurs left a comment

clean backport

@elasticmachine

💚 Build Succeeded

cc @mashhurs

@mashhurs mashhurs merged commit 9fa6631 into 9.4 May 7, 2026
13 checks passed
@mashhurs mashhurs deleted the mergify/bp/9.4/pr-19036 branch May 7, 2026 17:09