Honor max_event_age in cluster event periodical and improve performance (7.0) #25514
Merged
patrickmann merged 4 commits into 7.0 on Apr 14, 2026
## Conflict resolutions
Three files conflicted due to the JUnit 4 → 5 migration on
The original PR replaced the compound index from (timestamp, producer, consumers) to (consumers, timestamp) but didn't remove the old index. Drop it on startup if present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xd4rker approved these changes on Apr 13, 2026
Tested on 7.0 and it worked as expected 👍
garybot2 pushed a commit that referenced this pull request on Apr 14, 2026

…ce (`7.0`) (#25514)

* Honor max_event_age in cluster event periodical and improve performance (#25265)
  * honor max_event_age
  * CL
  * add validation; migrate test format to junit5 style
  * reduce default and min period
  * revise default and min value
  * add config param documentation

  ---------

  Co-authored-by: Anton Ebel <anton.ebel@graylog.com>
  (cherry picked from commit a2c2f52)
* Drop old cluster_events index when creating the new one

  The original PR replaced the compound index from (timestamp, producer, consumers) to (consumers, timestamp) but didn't remove the old index. Drop it on startup if present.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* adjust for graylog collection wrapper

---------

Co-authored-by: Anton Ebel <anton.ebel@graylog.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ismail Belkacim <xd4rker@users.noreply.github.com>
(cherry picked from commit a3e0792)
patrickmann added a commit that referenced this pull request on Apr 14, 2026

…ce (`7.0`) (#25514) (#25631)

* Honor max_event_age in cluster event periodical and improve performance (#25265)
  * honor max_event_age
  * CL
  * add validation; migrate test format to junit5 style
  * reduce default and min period
  * revise default and min value
  * add config param documentation

  ---------

  (cherry picked from commit a2c2f52)
* Drop old cluster_events index when creating the new one

  The original PR replaced the compound index from (timestamp, producer, consumers) to (consumers, timestamp) but didn't remove the old index. Drop it on startup if present.
* adjust for graylog collection wrapper

---------

(cherry picked from commit a3e0792)

Co-authored-by: Patrick Mann <patrickmann@users.noreply.github.com>
Co-authored-by: Anton Ebel <anton.ebel@graylog.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ismail Belkacim <xd4rker@users.noreply.github.com>
Note: This is a backport of #25265 to `7.0`. However, a few changes were required to adapt from 7.1 (mainly affecting unit tests; see the conflict resolution comment). We also add a missing `dropIndex` of the obsolete old `cluster_events` index.

## Problem
`ClusterEventCleanupPeriodical` had a hardcoded cleanup period of 1 day (86400s), regardless of the configured `max_event_age`. If `max_event_age` was set to e.g. 2 hours, stale cluster events could linger for up to ~25 hours before being cleaned up.

## Behavioral Changes
* Dynamic cleanup period based on `max_event_age`: `getPeriodSeconds()` now returns the configured `max_event_age` duration (in seconds) instead of a fixed 1-day interval, clamped to a minimum of 1 hour to prevent excessive DB load. The default is reduced to 12 hours given the performance impact noted by customers.
* MongoDB index reordering (`ClusterEventPeriodical`): The compound index changed from `(timestamp, producer, consumers)` to `(consumers, timestamp)`. The `producer` field was removed since it's not used in the query predicate. This better matches the `eventsIterable()` query pattern, which filters on `consumers` (`$nin`) and sorts by `timestamp`. The new index allows MongoDB to satisfy both the filter and sort from a single index scan, whereas the old index order required scanning all timestamps first. Given the 1-second polling frequency across all cluster nodes, the cumulative effect is substantial: fewer documents scanned, no in-memory sorts, and a smaller index to maintain. On a cluster with N nodes, that's N queries/second against this collection, continuously. The improvement scales with both cluster size and event volume.
* Joda-Time → java.time migration: `ClusterEventCleanupPeriodical` now uses `java.time.Clock`/`Instant`/`Duration` instead of Joda's `DateTime`. The `Clock` is injected via constructor, making the class properly testable without global time mocking.

#25404

## Motivation and Context
Relates to #25259
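
To illustrate why the injected `Clock` noted under Behavioral Changes helps testing, here is a minimal sketch. It is not the actual Graylog code: the class `CutoffSketch` and its `cutoff()` method are hypothetical names, and the assumption is that cleanup removes events older than "now minus `max_event_age`".

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical illustration of a constructor-injected Clock; not the real Graylog class.
class CutoffSketch {
    private final Clock clock;
    private final Duration maxEventAge;

    CutoffSketch(Clock clock, Duration maxEventAge) {
        this.clock = clock;
        this.maxEventAge = maxEventAge;
    }

    // Assumption: events with a timestamp before this instant count as stale.
    Instant cutoff() {
        return Instant.now(clock).minus(maxEventAge);
    }

    public static void main(String[] args) {
        // A fixed clock makes the result deterministic, with no global time mocking.
        Clock fixed = Clock.fixed(Instant.parse("2026-04-14T12:00:00Z"), ZoneOffset.UTC);
        CutoffSketch sketch = new CutoffSketch(fixed, Duration.ofHours(2));
        System.out.println(sketch.cutoff()); // prints 2026-04-14T10:00:00Z
    }
}
```

In production the same constructor would typically receive `Clock.systemUTC()`, while a test passes `Clock.fixed(...)` as above.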
## Manual test
* Default behavior: Start server with default config (no `max_event_age` set). Verify in logs that `ClusterEventCleanupPeriodical` schedules at 43200s (12h), not 86400s (1d).
* Custom `max_event_age`: Set `max_event_age = 1h` in `server.conf`, restart. Confirm cleanup schedules at 3600s.
* Minimum clamp: Set `max_event_age = 30m`, restart. Confirm cleanup still schedules at 3600s (1h minimum).
* Index: Check `db.cluster_events.getIndexes()` in MongoDB. Should show `{ consumers: 1, timestamp: 1 }` (not the old `{ timestamp: 1, producer: 1, consumers: 1 }`).
* Smoke test `max_event_age = 1h`, and confirm `getInitialDelaySeconds()` returns 0).
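
The expected schedule values in the manual test follow from the clamping rule described for `getPeriodSeconds()`. The sketch below reproduces that arithmetic under stated assumptions: `CleanupPeriodSketch` and `periodSeconds` are hypothetical names, and the real method may be structured differently.

```java
import java.time.Duration;

// Hypothetical stand-in for the period calculation; not the actual Graylog class.
class CleanupPeriodSketch {
    // Lower bound on the cleanup period, per the PR description (1 hour).
    static final Duration MIN_PERIOD = Duration.ofHours(1);

    // Returns max_event_age in seconds, but never less than the 1-hour minimum.
    static long periodSeconds(Duration maxEventAge) {
        Duration period = maxEventAge.compareTo(MIN_PERIOD) < 0 ? MIN_PERIOD : maxEventAge;
        return period.getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(periodSeconds(Duration.ofHours(12)));   // prints 43200
        System.out.println(periodSeconds(Duration.ofHours(1)));    // prints 3600
        System.out.println(periodSeconds(Duration.ofMinutes(30))); // prints 3600 (clamped)
    }
}
```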