
[-] handle pg sink being down gracefully, fixes #1426 #1429

Open
0xgouda wants to merge 3 commits into master from fix-pg-sink-flush

Conversation


@0xgouda 0xgouda commented Jun 2, 2026

Collaborator
  • If CopyFrom() fails due to a connection error, assume that the sink DB is down and break from the write loop without writing the metrics.
  • The to-be-flushed metrics will be lost and removed from the pgwatch cache.

New behaviour:

pgwatch now retries the connection on each flush and, if the sink is still down, logs at most one connection error per flush.

I also chose to drop the to-be-written metrics from the pg sink cache. Otherwise the measurements accumulate and trigger a highLoadTimeOut delay (i.e. 5 seconds) for every new measurement added, which slows the whole system down and is a problem if other sinks are expecting the data (e.g. Prometheus or an additional pg sink).
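
For readers who want the drop-on-connection-error idea in isolation, here is a minimal sketch. It is not the pgwatch code: `flush`, `errSinkDown`, and the batch layout are hypothetical names made up for illustration, and the handling of non-connection errors is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errSinkDown is a hypothetical sentinel standing in for a connection
// error returned by the sink (the PR checks *pgconn.ConnectError).
var errSinkDown = errors.New("sink DB not reachable")

// flush tries to write every cached batch using write.
// On a connection error it stops immediately and counts all remaining
// measurements as dropped, so the cache can be emptied instead of growing.
// Assumption: any other error only drops the failing batch.
func flush(cache [][]string, write func([]string) error) (written, dropped int) {
	for i, batch := range cache {
		err := write(batch)
		if errors.Is(err, errSinkDown) {
			for _, rest := range cache[i:] {
				dropped += len(rest)
			}
			break
		}
		if err != nil {
			dropped += len(batch)
			continue
		}
		written += len(batch)
	}
	return written, dropped
}

func main() {
	cache := [][]string{{"cpu", "mem"}, {"wal"}}
	alwaysDown := func([]string) error { return errSinkDown }
	w, d := flush(cache, alwaysDown)
	fmt.Printf("written=%d dropped=%d\n", w, d)
}
```

The caller would clear the cache after `flush` returns regardless of the outcome, which is the trade-off described above: some measurements are lost, but the cache never backs up.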

Closes #1426

@0xgouda 0xgouda self-assigned this Jun 2, 2026
@0xgouda 0xgouda added bug Something isn't working sinks Where and how to store monitored data labels Jun 2, 2026
@0xgouda 0xgouda force-pushed the fix-pg-sink-flush branch from 20a1e76 to 7e6eead Compare June 2, 2026 17:19

coveralls commented Jun 2, 2026


Coverage Report for CI Build 26842545543

Coverage increased (+0.2%) to 86.102%

Details

  • Coverage increased (+0.2%) from the base build.
  • Patch coverage: 3 of 3 lines across 1 file are fully covered (100%).
  • No coverage regressions found.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

No coverage regressions found.


Coverage Stats

Relevant Lines: 5454
Covered Lines: 4696
Line Coverage: 86.1%
Coverage Strength: 0.98 hits per line

💛 - Coveralls

@0xgouda 0xgouda requested a review from pashagolub June 2, 2026 17:24
@0xgouda 0xgouda force-pushed the fix-pg-sink-flush branch from d38deb2 to 495c410 Compare June 2, 2026 19:09
@pashagolub pashagolub changed the title [-] handle pg sink being down gracefully [-] handle pg sink being down gracefully, fixes #1426 Jun 2, 2026
Comment on lines +441 to 447
    if _, ok := err.(*pgconn.ConnectError); ok {
        logger.Errorf("Sink DB not reachable, dropping %d cached measurements", len(msgs))
        break
    }
    if PgError, ok := err.(*pgconn.PgError); ok {
        pgw.forceRecreatePartitions = PgError.Code == "23514"
    }

Collaborator


Should we use switch+errors.Is() here?
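
One possible shape for that suggestion is sketched below. Since the check here is on the error's type rather than a sentinel value, the sketch uses the standard library's `errors.As` inside a `switch`. The `connectError` and `pgError` types and the `classify` function are hypothetical stand-ins so the example compiles without pgx. In the real code they would be `*pgconn.ConnectError` and `*pgconn.PgError`.

```go
package main

import (
	"errors"
	"fmt"
)

// connectError and pgError are hypothetical stand-ins for
// *pgconn.ConnectError and *pgconn.PgError.
type connectError struct{ err error }

func (e *connectError) Error() string { return "connect failed: " + e.err.Error() }
func (e *connectError) Unwrap() error { return e.err }

type pgError struct{ Code string }

func (e *pgError) Error() string { return "pg error " + e.Code }

// classify reports what the write loop would do for a given error.
// errors.As also matches errors wrapped with %w, which a plain type
// assertion would miss.
func classify(err error) string {
	var connErr *connectError
	var pgErr *pgError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &connErr):
		return "drop"
	case errors.As(err, &pgErr) && pgErr.Code == "23514": // check_violation
		return "recreate-partitions"
	default:
		return "other"
	}
}

func main() {
	wrapped := fmt.Errorf("copy failed: %w", &connectError{err: errors.New("connection refused")})
	fmt.Println(classify(wrapped))
	fmt.Println(classify(&pgError{Code: "23514"}))
	fmt.Println(classify(errors.New("boom")))
}
```

Note that this sketch only flags partition recreation when the code is `23514`, whereas the snippet under review assigns `forceRecreatePartitions` for every `PgError`. Whether that difference matters is a question for the author.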


Labels

bug Something isn't working sinks Where and how to store monitored data

Development

Successfully merging this pull request may close these issues.

pgwatch goes mad when the pg sink is down

3 participants