Skip to content

cert-checker: fix logging & push metrics#8763

Open
lenaunderwood22 wants to merge 22 commits into
mainfrom
cert-checker-logs
Open

cert-checker: fix logging & push metrics#8763
lenaunderwood22 wants to merge 22 commits into
mainfrom
cert-checker-logs

Conversation

@lenaunderwood22
Copy link
Copy Markdown
Contributor

Fixes #8753

@lenaunderwood22 lenaunderwood22 requested a review from a team as a code owner May 26, 2026 21:25
@lenaunderwood22 lenaunderwood22 requested a review from ezekiel May 26, 2026 21:25
ezekiel
ezekiel previously approved these changes May 26, 2026
@ezekiel ezekiel requested review from a team and aarongable and removed request for a team May 26, 2026 21:56
@github-actions
Copy link
Copy Markdown
Contributor

@lenaunderwood22, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

@ezekiel ezekiel requested review from a team and removed request for aarongable May 26, 2026 21:56
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go
Comment thread cmd/cert-checker/main.go
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/shell.go
Co-authored-by: Aaron Gable <aaron@letsencrypt.org>
aarongable
aarongable previously approved these changes May 28, 2026
@aarongable aarongable requested a review from ezekiel May 28, 2026 23:33
ezekiel
ezekiel previously approved these changes May 28, 2026
@lenaunderwood22 lenaunderwood22 dismissed stale reviews from ezekiel and aarongable via 15ce586 May 29, 2026 14:50
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates cert-checker to emit structured audit logs (instead of printing a large multi-line JSON blob to stdout) and to publish run metrics (good/bad cert counts, latency, last-run timestamp) to a Prometheus Pushgateway discovered via DNS SRV (intended to replace an external wrapper script per issue #8753).

Changes:

  • Replace stdout JSON report output with per-certificate audit error logs plus a final audit summary.
  • Add Prometheus metrics for cert-checker results and push them to a Pushgateway (incl. SRV discovery + shared cmd.PushMetrics helper).
  • Update tests/config fixtures and vendor the Prometheus push subpackage.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
cmd/cert-checker/main.go Switch to audit logging, add metrics, add SRV-based Pushgateway discovery and push behavior.
cmd/cert-checker/main_test.go Update tests to assert audit log emission rather than JSON report contents.
cmd/shell.go Add PushMetrics helper using Prometheus push client.
test/config/cert-checker.json Remove unexpiredOnly field from fixture config.
test/config-next/cert-checker.json Remove unexpiredOnly field from fixture config.
vendor/modules.txt Add vendored package path for prometheus/push.
vendor/github.com/prometheus/client_golang/prometheus/push/push.go Vendor Prometheus Pushgateway client implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/shell.go
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Comment thread cmd/cert-checker/main.go Outdated
Copy link
Copy Markdown
Contributor

@aarongable aarongable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits / test improvements.

Comment thread cmd/cert-checker/main.go
checkerBadCount prometheus.Gauge
}

func NewCertCheckerMetrics(stats prometheus.Registerer) *certCheckerMetrics {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is only called from inside this package, so it doesn't need to be Exported.

Suggested change
func NewCertCheckerMetrics(stats prometheus.Registerer) *certCheckerMetrics {
func newCertCheckerMetrics(stats prometheus.Registerer) *certCheckerMetrics {

Comment thread cmd/cert-checker/main.go
checker.processCerts(context.TODO(), wg, config.CertChecker.BadResultsOnly)
checkerLatency.Observe(checker.clock.Since(s).Seconds())
}()
checker.processCerts(context.TODO())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
checker.processCerts(context.TODO())
checker.processCerts(context.Background())

wg := new(sync.WaitGroup)
wg.Add(1)
checker.processCerts(context.Background(), wg, false)
checker.processCerts(context.Background())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In unit tests, always use the context provided by the test harness:

Suggested change
checker.processCerts(context.Background())
checker.processCerts(t.Context())

}

func TestGetPushgatewayURL(t *testing.T) {
ctx := context.Background()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here. In fact, probably leave this line out entirely, and just use t.Context() in the three places below.

t.Run("happy path", func(t *testing.T) {
gotURL, err := getPushgatewayURL(ctx, "consul.service.consul:53",
cmd.ServiceDomain{Service: "redisratelimits", Domain: "service.consul"})
test.AssertNotError(t, err, "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a new test, we prefer to follow the Google Go style guide. That means using traditional error checking, and not using the assertion helper library.

Suggested change
test.AssertNotError(t, err, "")
if err != nil {
t.Fatalf("getPushgatewayURL(consul.service.consul:53) = %s, but want success", err)
}

The same applies for the assertions below. Use t.Fatalf for places where test execution should stop, and t.Errorf for places where the test should fail but we should maybe also see the results of the next thing as well.

t.Run("SRV not found", func(t *testing.T) {
_, err := getPushgatewayURL(ctx, "consul.service.consul:53",
cmd.ServiceDomain{Service: "doesnotexist", Domain: "service.consul"})
test.AssertError(t, err, "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the comment above about not using the test assertion helper, these tests should probably also ensure that they're getting the right error:

Suggested change
test.AssertError(t, err, "")
if err != nil {
t.Fatalf("getPushgatewayURL(consul.service.consul:53) = %s, but want success", err)
}
if !strings.Contains(err.Error(), "SRV lookup of doesnotexist._tcp.service.consul failed") {
t.Errorf("getPushgatewayURL(doesnotexist) = %q, but want 'SRV lookup failed'"), err)
}

or something like that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve cert-checker's log and metric output

5 participants