feat: add Prometheus Pushgateway support for CLI apps #3176
coolwednesday wants to merge 3 commits into
CLI apps are short-lived and exit before Prometheus can scrape /metrics. This adds push-based metrics export via Pushgateway, configured through METRICS_PUSH_GATEWAY_URL env var, along with auto CLI metrics tracking (duration, success/error counters) and observability infrastructure. Closes gofr-dev#2232
Umang01-hash left a comment
- Issue #2232 explicitly listed "Support cleanup (optional) so old metrics don't pile up" as a requirement. Every CronJob run permanently adds a job group to the Pushgateway. Please add a `Delete(ctx context.Context) error` method on PushGateway using `pusher.DeleteContext(ctx)` and a `METRICS_PUSH_GATEWAY_DELETE_ON_FINISH=true` env var to opt in.
- All apps without `APP_NAME` set push under the same job group and silently overwrite each other. Change the fallback to `filepath.Base(os.Args[0])` or add a dedicated `METRICS_PUSH_GATEWAY_JOB` env var override.
- Current max bucket is 60s. Cron buckets extend to 3600s. A 5-minute batch job falls into +Inf only. Align upper boundary with `app_cron_duration_seconds`.
- Metric naming inconsistency with cron:
  - `app_cmd_errors_total` → `app_cmd_failures` (match cron's `_failures`)
  - `app_cmd_success_total` → `app_cmd_success` (match cron's no-`_total`)
  - Add `app_cmd_total` (match cron's `app_cron_job_total`)
- Move `metricServer.Shutdown(ctx)` before `container.Close()` in `Shutdown()` so the Prometheus scrape endpoint stops accepting requests before the OTel meter provider is shut down.
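The bucket concern in the review above can be illustrated with a small sketch. It assumes hypothetical bucket boundaries (only the 60s and 3600s upper limits come from the review) and a helper name, `bucketLabel`, that is not part of GoFr. It shows which `le` bucket a 5-minute observation would first land in under each layout.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketLabel returns the upper bound ("le") of the first bucket that can
// hold v, or "+Inf" when v exceeds every finite boundary.
// bounds must be sorted in ascending order.
func bucketLabel(bounds []float64, v float64) string {
	// SearchFloat64s returns the smallest index i with bounds[i] >= v.
	i := sort.SearchFloat64s(bounds, v)
	if i == len(bounds) {
		return "+Inf"
	}

	return fmt.Sprintf("%g", bounds[i])
}

func main() {
	// Hypothetical boundaries in seconds; the intermediate values are
	// assumptions made for this sketch.
	cliBuckets := []float64{0.1, 1, 5, 30, 60}
	cronLikeBuckets := []float64{0.1, 1, 5, 30, 60, 300, 900, 3600}

	const fiveMinutes = 300.0

	fmt.Println("max 60s buckets:  ", bucketLabel(cliBuckets, fiveMinutes))
	fmt.Println("max 3600s buckets:", bucketLabel(cronLikeBuckets, fiveMinutes))
}
```

With a 60s ceiling, every job longer than a minute is indistinguishable in the histogram, which is why percentile panels for batch commands would flatten out.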
Regarding Comment 1 (Delete support / The Pushgateway documentation explicitly states that the Pushgateway is designed as a metric cache; the standard recommendation is to not delete pushed metrics, and instead use
If you push and immediately delete, Prometheus may not have scraped yet (typical scrape interval is 15–30s), and the metrics are lost forever. There's no reliable way for the CLI to know whether Prometheus has completed its scrape before issuing a delete.
For users who need cleanup of stale metrics, this is best handled at the Pushgateway operational level (e.g., Pushgateway's own
This can always be revisited in a follow-up if users explicitly request it, but for v1 the "push and leave" approach is the correct and safe default.
Regarding Comment 5 (Shutdown order: move metricServer.Shutdown before container.Close): The current shutdown order is actually correct: The
For the Pushgateway path specifically, the push happens inside
Regarding Comment 8 (Factory.go test coverage): The new pushgateway wiring in
- Replace basic CMD Metrics panels with enriched CLI dashboard (health overview, job status table, duration bar chart with p50-p99)
- Add pushgateway service to http-server docker-compose
- Add pushgateway scrape config with honor_labels
- Add $job and $command template variables for CLI filtering
- Expand sample-cmd README with setup instructions
Pull request overview
This PR adds Pushgateway-based metrics support for GoFr CLI applications so short-lived commands can export Prometheus metrics, while also wiring example dashboards and docker setups to visualize those metrics. It extends the existing metrics/exporter infrastructure and CLI runtime path to support pushing metrics on shutdown.
Changes:
- Added a new Pushgateway exporter with read-modify-write merge logic and hooked CLI shutdown into metric pushing.
- Added automatic CLI command metrics (`success`, `failures`, `duration`, `last_success_timestamp`) plus related tests and exporter plumbing.
- Expanded sample and http-server observability assets with Pushgateway, Prometheus, Grafana, and dashboard updates.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `pkg/gofr/run.go` | Triggers CLI shutdown after command execution so metrics can flush. |
| `pkg/gofr/metrics/register_test.go` | Refactors metrics tests to use the new Prometheus helper signature. |
| `pkg/gofr/metrics/handler_test.go` | Updates handler tests to use the new test metrics helper. |
| `pkg/gofr/metrics/exporters/pushgateway_test.go` | Adds unit coverage for Pushgateway push/merge behavior. |
| `pkg/gofr/metrics/exporters/pushgateway.go` | Implements the Pushgateway client and metric merge logic. |
| `pkg/gofr/metrics/exporters/exporter.go` | Returns both meter and provider; adds dedicated app registry support. |
| `pkg/gofr/factory.go` | Wires Pushgateway support into NewCMD() based on config. |
| `pkg/gofr/container/container_test.go` | Adds tests for pushing metrics during container close. |
| `pkg/gofr/container/container.go` | Stores meter provider / pusher and flushes them on close. |
| `pkg/gofr/cmd_test.go` | Adds CLI tests around success/failure command execution paths. |
| `pkg/gofr/cmd.go` | Registers and records built-in CLI metrics. |
| `go.sum` | Updates dependency lockfile for exporter-related dependency changes. |
| `go.mod` | Adds Prometheus parsing deps and removes unused top-level deps. |
| `examples/sample-cmd/main.go` | Adds sample failing and batch commands for CLI metric demos. |
| `examples/sample-cmd/docker/provisioning/datasources/datasource.yaml` | Adds Grafana Prometheus datasource provisioning for the sample stack. |
| `examples/sample-cmd/docker/provisioning/dashboards/gofr-dashboard/dashboards.json` | Adds a sample CLI metrics dashboard. |
| `examples/sample-cmd/docker/provisioning/dashboards/dashboards.yaml` | Provisions the sample dashboard directory in Grafana. |
| `examples/sample-cmd/docker/prometheus/prometheus.yml` | Configures Prometheus to scrape Pushgateway in the sample stack. |
| `examples/sample-cmd/docker/docker-compose.yaml` | Defines the sample CLI observability stack with Pushgateway/Prometheus/Grafana. |
| `examples/sample-cmd/configs/.env` | Adds sample CLI env defaults. |
| `examples/sample-cmd/README.md` | Documents CLI metrics usage and local observability setup. |
| `examples/sample-cmd/Dockerfile` | Adds a container image for the sample CLI app. |
| `examples/http-server/docker/provisioning/dashboards/gofr-dashboard/dashboards.json` | Extends the main dashboard with CLI metrics panels and filters. |
| `examples/http-server/docker/prometheus/prometheus.yml` | Adds Pushgateway scraping to the shared http-server Prometheus config. |
| `examples/http-server/docker/docker-compose.yaml` | Adds a Pushgateway service to the shared observability stack. |
```json
  "type": "prometheus",
  "uid": "prometheus"
},
"expr": "sum(app_cmd_success_total) by (command)",
```
```json
  "type": "prometheus",
  "uid": "prometheus"
},
"expr": "sum(app_cmd_errors_total) by (command)",
```
```go
existing := p.fetchExisting(ctx)

merged := mergeMetrics(existing, localFamilies)
```
```go
func mergeMetrics(existing map[string]*dto.MetricFamily, local []*dto.MetricFamily) []*dto.MetricFamily {
	result := make([]*dto.MetricFamily, 0, len(local))

	for _, lf := range local {
		name := lf.GetName()
		ef, found := existing[name]

		if !found {
			result = append(result, lf)
			continue
		}

		merged := mergeFamilies(ef, lf)
		result = append(result, merged)
	}

	return result
}
```
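Merging two families also means pairing up individual series. The PR description says `labelKey()` matches metrics by app-defined labels only. The following is a minimal sketch of that idea, not the PR's implementation: the function name `seriesKey`, the plain `map[string]string` label type, and the exact set of ignored label names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// ignoredLabels lists labels that are not defined by the application.
// The exact set is an assumption for this sketch: job/instance are added by
// the Pushgateway grouping, otel_scope_* by the OTel Prometheus exporter.
var ignoredLabels = map[string]bool{
	"job":                true,
	"instance":           true,
	"otel_scope_name":    true,
	"otel_scope_version": true,
}

// seriesKey builds a stable, order-independent key from app-defined labels.
func seriesKey(labels map[string]string) string {
	pairs := make([]string, 0, len(labels))

	for name, value := range labels {
		if ignoredLabels[name] {
			continue
		}
		// Quote the value so separators inside it cannot cause collisions.
		pairs = append(pairs, name+"="+strconv.Quote(value))
	}

	sort.Strings(pairs) // map iteration order is random, so sort for stability

	return strings.Join(pairs, ",")
}

func main() {
	// Series as read back from the Pushgateway (grouping labels attached).
	existing := map[string]string{"command": "hello", "job": "sample-cmd", "instance": ""}
	// Series as gathered locally (scope label attached).
	local := map[string]string{"command": "hello", "otel_scope_name": "sample-cmd"}

	fmt.Println(seriesKey(existing) == seriesKey(local))
}
```

Without this filtering, the same logical series would never match across the read and the local gather, and counters would be overwritten rather than summed.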
```go
// into a map of metric families. Returns a nil map on any failure (lookups on
// a nil map are safe) so that Push can proceed with local-only values (first
// run or Pushgateway down).
func (p *PushGateway) fetchExisting(ctx context.Context) map[string]*dto.MetricFamily {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, p.metricsURL, http.NoBody)
	if err != nil {
		p.logger.Logf("could not create GET request for existing metrics: %v", err)
		return nil
	}

	resp, err := p.client.Do(req)
	if err != nil {
		p.logger.Logf("could not fetch existing metrics (first run?): %v", err)
		return nil
	}

	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		p.logger.Logf("Pushgateway returned %d on GET /metrics, treating as empty", resp.StatusCode)
		return nil
	}

	parser := expfmt.NewTextParser(model.LegacyValidation)

	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		p.logger.Logf("could not parse existing metrics: %v", err)
		return nil
	}

	return families
}
```
```go
app.container.Create(app.Config)
app.initTracer()

if url := app.Config.Get("METRICS_PUSH_GATEWAY_URL"); url != "" {
	jobName := app.Config.Get("APP_NAME")
	if jobName == "" {
		jobName = filepath.Base(os.Args[0])
	}

	// Use a dedicated registry that only collects app metrics (no Go runtime/process
	// collectors) so Pushgateway groups stay clean and consistent with pull-based scraping.
	appRegistry, meter, provider := exporters.NewAppRegistry(app.container.GetAppName(), app.container.GetAppVersion())
	app.container.SetMeterProvider(provider)
	app.container.SetMetricsManager(meter)
	app.container.SetPushGateway(exporters.NewPushGateway(url, jobName, appRegistry, app.container.Logger))
}
```
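The job-name fallback in this hunk can be isolated into a small, testable function. The sketch below is hypothetical: `resolveJobName` is not a GoFr function, and the explicit `override` argument models the `METRICS_PUSH_GATEWAY_JOB` variable that the review only proposed, so it should be read as one possible shape rather than what the PR ships.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveJobName picks the Pushgateway job name in priority order:
// an explicit override (hypothetical), then the app name, then the
// base name of the running binary.
func resolveJobName(override, appName, argv0 string) string {
	switch {
	case override != "":
		return override
	case appName != "":
		return appName
	default:
		return filepath.Base(argv0)
	}
}

func main() {
	// In a real app argv0 would be os.Args[0]; a fixed path keeps the sketch deterministic.
	fmt.Println(resolveJobName("", "", "/usr/local/bin/sample-cmd"))
	fmt.Println(resolveJobName("", "billing-cli", "/usr/local/bin/sample-cmd"))
	fmt.Println(resolveJobName("nightly-report", "billing-cli", "/usr/local/bin/sample-cmd"))
}
```

Falling back to the binary name means two differently named CLIs without `APP_NAME` no longer share one job group, which was the overwrite risk raised in the review.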

Summary
CLI applications are short-lived: they exit before Prometheus can scrape `/metrics`. This PR adds push-based metrics export via Prometheus Pushgateway for GoFr CLI apps with cumulative counters across runs.
Closes #2232
Problem
Each CLI run starts counters at 0. A plain Pushgateway overwrites on each push, so counters never accumulate (run1=1, run2=1, run3=1 instead of 1, 2, 3). Gauges like `last_success_timestamp` must not be summed.
Solution: Read-Modify-Write
Instead of using an aggregation gateway, we implement read-modify-write on the standard Pushgateway:
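A minimal sketch of that cycle, under simplifying assumptions: an in-memory map stands in for the Pushgateway, and the `sample` type and `merge` function are illustrative names rather than the PR's `dto.MetricFamily`-based code. It simulates three consecutive CLI runs, each starting its counter at 1.

```go
package main

import "fmt"

type kind int

const (
	counter kind = iota
	gauge
)

// sample is a simplified stand-in for one metric series.
type sample struct {
	kind  kind
	value float64
}

// merge applies the read-modify-write rule to locally gathered metrics:
// counters are added to the stored value, gauges keep the local (latest) value.
// Only metrics present in local are returned.
func merge(existing, local map[string]sample) map[string]sample {
	merged := make(map[string]sample, len(local))

	for name, l := range local {
		// Reading from a nil map is safe, so a failed or first read works too.
		if e, found := existing[name]; found && l.kind == counter {
			l.value += e.value
		}

		merged[name] = l
	}

	return merged
}

func main() {
	var gateway map[string]sample // nil before the first push

	for run := 1; run <= 3; run++ {
		// Every process starts from zero and records one success.
		local := map[string]sample{
			"app_cmd_success":                {counter, 1},
			"app_cmd_last_success_timestamp": {gauge, float64(1700000000 + run)},
		}

		existing := gateway              // read
		gateway = merge(existing, local) // modify, then write back

		fmt.Printf("run %d: success=%.0f last_success_timestamp=%.0f\n",
			run, gateway["app_cmd_success"].value, gateway["app_cmd_last_success_timestamp"].value)
	}
}
```

The read and the write are separate steps with no locking, which is where the race window for overlapping runs comes from.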
Why not an aggregation gateway?
- `last_success_timestamp` would produce nonsensical values (timestamp1 + timestamp2)
- `clearmode` label, but dormant since Nov 2023, single maintainer, no prebuilt Docker images, only ~117 stars
Read-modify-write gives correct semantics with zero additional infrastructure. The trade-off is a small race window when concurrent CLI runs overlap, but this is unlikely for CLI workloads (worst case: one lost increment).
What's included
- `pkg/gofr/metrics/exporters/pushgateway.go`:
  - `expfmt` for Prometheus text format encoding/decoding
  - `mergeMetrics()`: sums counters/histograms, replaces gauges
  - `labelKey()`: matches metrics by app-defined labels only, filtering out Pushgateway-injected (`job`, `instance`) and OTel scope labels
- `cmd.go`:
  - `app_cmd_duration_seconds` (histogram with CLI-appropriate buckets)
  - `app_cmd_success` (counter, cumulative across runs)
  - `app_cmd_failures` (counter, cumulative across runs)
  - `app_cmd_last_success_timestamp` (gauge, latest value)
- `run.go`: Calls `Shutdown()` after `cmd.Run()` to flush metrics
- `METRICS_PUSH_GATEWAY_URL` env var to enable (CLI only)
- `$job` and `$command` template variables for CLI filtering
- `honor_labels: true`
Design decisions
- `NewCMD()` only: HTTP apps continue using pull-based scraping
- `Container` owns the pushgateway and flushes on `Close()`
- `prometheus/push` dependency: raw HTTP with `expfmt` gives full control over the read-modify-write cycle
- `AppRegistry` (not `DefaultGatherer`) to avoid pushing Go runtime metrics
- `${DataSource}` variable and collapsed row: non-intrusive for HTTP-only users
Test plan
- `go build ./...` compiles
- `go test ./pkg/gofr/metrics/exporters/`: 18 tests covering all merge logic, label filtering, error paths
- `go test ./pkg/gofr/container/ ./pkg/gofr/`: existing tests pass
- `golangci-lint run` clean
- `go vet -race` clean on our packages