fix(kafka): preserve offset when DLQ publish fails (#69) #108

Merged
galt-tr merged 3 commits into main from fix/issue-69-arcade-dlq-publish
May 1, 2026
Conversation

@galt-tr galt-tr commented May 1, 2026

Summary

  • processOne no longer calls MarkMessage when sendToDLQ returns an error; the offset stays uncommitted, so Kafka redelivers the message on the next session or pod restart.
  • sendToDLQ now returns an error when the broker publish fails. Marshal failures and the no-producer-configured branch still return nil, since retrying would not help in either case.
  • New counter arcade_kafka_dlq_publish_failures_total{topic} so operators can alert on DLQ outages.
  • Three new tests cover the failure mode (no Mark, metric incremented), the recovery path (Mark, no metric increment), and the happy path (Mark, no DLQ traffic). The existing processWithRetry tests still pass.

Closes #69

Test plan

  • go build ./...
  • go vet ./...
  • go test ./kafka/... -race
  • golangci-lint run ./kafka/... ./metrics/... (0 issues)
  • Reviewer to confirm metric naming convention

processOne previously called MarkMessage unconditionally after attempting
to send a failed message to the DLQ, so a transient DLQ-topic outage
silently lost the message. The offset is now only marked when processing
succeeded or the DLQ publish succeeded; DLQ-publish failures leave the
offset uncommitted so the consumer redelivers on the next session.
A new metric counts DLQ publish failures. Closes F-011.
@galt-tr galt-tr requested a review from mrz1836 as a code owner May 1, 2026 18:21
@github-actions github-actions Bot added the size/M (Medium change, 51–200 lines) and bug-P3 (Lowest rated bug, affects nearly none or low-impact) labels May 1, 2026
@galt-tr galt-tr merged commit e3b2272 into main May 1, 2026
46 checks passed
@galt-tr galt-tr deleted the fix/issue-69-arcade-dlq-publish branch May 1, 2026 18:59
Labels

bug-P3: Lowest rated bug, affects nearly none or low-impact
size/M: Medium change (51–200 lines)

Development

Successfully merging this pull request may close these issues.

[F-011] failed messages are acknowledged even when DLQ publish fails

2 participants