Let events affect incidents in multiple ways#442
Conversation
d5fd9ae to
581ecf4
Compare
|
Rebased and fixed some issues reported by Johannes offline. |
63de7a2 to
0a6a32b
Compare
|
Added some new basic testing for the |
0a6a32b to
12445a7
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates the event/incident processing model to support multiple independent “effects” (open/escalate, close, mute/unmute, notify) in a more atomic way, and introduces persistence of the latest incident message (plugin output) on the incident itself.
Changes:
- Add
incident.messagecolumn (PostgreSQL/MySQL) and persist incident message from incoming events. - Rework incident processing to support combinations of
incident/close/muted/notify, reduce redundant DB writes, and adjust HTTP validation/handling accordingly. - Update object syncing and channel notification payloads to reduce copying and align with the updated event model/library dependency.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| schema/pgsql/upgrades/incident-message.sql | Adds message column upgrade for PostgreSQL. |
| schema/pgsql/schema.sql | Adds message column to PostgreSQL incident schema. |
| schema/mysql/upgrades/incident-message.sql | Adds message column upgrade for MySQL. |
| schema/mysql/schema.sql | Adds message column to MySQL incident schema. |
| internal/object/object.go | Reduces copying in SyncFromEvent and updates only changed fields. |
| internal/listener/listener.go | Updates request preprocessing/validation flow and incomplete-relations handling based on OpenOrEscalate(). |
| internal/incident/sync.go | Extends incident upsert shape to include message (but currently contains a critical bug). |
| internal/incident/incidents.go | Updates incident creation/closing logic and introduces stricter error mapping for invalid event combinations. |
| internal/incident/incidents_test.go | Refactors and expands incident behavior tests for new flags/flows. |
| internal/incident/incident.go | Implements new event-effect handling, notification triggering rules, message persistence, and close semantics. |
| internal/event/event.go | Delegates base validation to library, improves log marshaling, and adds early return in ExtractMissingRelations. |
| internal/channel/channel.go | Adjusts plugin notification event payload to match updated event fields. |
| go.mod | Updates icinga-go-library dependency revision. |
| go.sum | Updates checksums for the bumped icinga-go-library revision. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
12445a7 to
de9d3c6
Compare
|
Fixed the broken new unittests after the newest changes in Icinga/icinga-go-library#212 and |
de9d3c6 to
c643460
Compare
Previously, we've updated the incident table multiple times within the very same transaction, and was clearly highlighted by the Go profiler. Since the attributes aren't referenced anywhere else, we can just update them in memory and only persist them to the database once at the end of the transaction, which significantly eliminates a bunch of useless upsert queries.
There's no need to retrieve a parser from pool to just immediately release it if no filter columns were provided.
Instead of copying blindly all fields, we only need to update the fields that were actually changed. For example, the tags field doesn't need to be updated here, because if they've different tags, the ID would be different, and thus we would have a different object in the cache. This was highlighted by the Go profiler as one of the hotspots for memory allocations, so don't update the fields that don't need to be updated.
c643460 to
9cb792b
Compare
9cb792b to
99f5900
Compare
Apart from the main changes, this PR also contains the following three additional commits that are not directly related to the referenced issue.
Also, about the memory problems I had in Icinga Notifications, it was not a memory leak and there's no memory problems at all ;). It was consuming too much memory simply because I've built my containers with the
-raceflag, and after removing it, the daemon never goes beyond the 200MiB mark.Requires
fixes #261
resolves #407