feat: add Prometheus Pushgateway support for CLI apps by coolwednesday · Pull Request #3176 · gofr-dev/gofr

coolwednesday · 2026-03-17T08:15:58Z

Summary

CLI applications are short-lived — they exit before Prometheus can scrape /metrics. This PR adds push-based metrics export via Prometheus Pushgateway for GoFr CLI apps with cumulative counters across runs.

Closes #2232

Problem

Each CLI run starts counters at 0. A plain Pushgateway overwrites on each push, so counters never accumulate (run1=1, run2=1, run3=1 instead of 1, 2, 3). Gauges like last_success_timestamp must not be summed.

Solution: Read-Modify-Write

Instead of using an aggregation gateway, we implement read-modify-write on the standard Pushgateway:

CLI Run N:
  1. GET /metrics from Pushgateway (fetch existing values)
  2. Gather local metrics from this run
  3. Merge:
     - Counters/Histograms → sum existing + local
     - Gauges → use local value (latest wins)
  4. PUT merged result back to Pushgateway
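
A minimal sketch of the merge step above, assuming a simplified model in which each metric is a single named value. The `Kind`, `Sample`, and `merge` names are hypothetical and are not the PR's `mergeMetrics()`; histograms, which the PR also sums, are left out for brevity.

```go
package main

import "fmt"

// Kind is a simplified stand-in for a Prometheus metric type.
type Kind int

const (
	Counter Kind = iota
	Gauge
)

// Sample is a hypothetical, flattened metric: one value per metric name.
type Sample struct {
	Kind  Kind
	Value float64
}

// merge combines what the Pushgateway already holds with this run's values.
// Counters are summed; gauges take the local (latest) value.
func merge(existing, local map[string]Sample) map[string]Sample {
	out := make(map[string]Sample, len(local))
	for name, l := range local {
		e, found := existing[name]
		if found && l.Kind == Counter && e.Kind == Counter {
			l.Value += e.Value
		}
		out[name] = l
	}
	return out
}

func main() {
	stored := map[string]Sample{} // empty Pushgateway before the first run
	for run := 1; run <= 3; run++ {
		local := map[string]Sample{
			"app_cmd_success":                {Kind: Counter, Value: 1},
			"app_cmd_last_success_timestamp": {Kind: Gauge, Value: float64(1000 + run)},
		}
		stored = merge(stored, local)
		fmt.Println(run, stored["app_cmd_success"].Value,
			stored["app_cmd_last_success_timestamp"].Value)
	}
}
```

In this model the counter grows by one per run while the gauge always reflects the most recent run, which is the behavior the Problem section asks for.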

Why not an aggregation gateway?

Zapier prom-aggregation-gateway: Sums ALL metric types including gauges — last_success_timestamp would produce nonsensical values (timestamp1 + timestamp2)
Prometheus Gravel Gateway: Supports per-type aggregation via clearmode label, but dormant since Nov 2023, single maintainer, no prebuilt Docker images, only ~117 stars

Read-modify-write gives correct semantics with zero additional infrastructure. The trade-off is a small race window when concurrent CLI runs overlap, but this is unlikely for CLI workloads (worst case: one lost increment).
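
The race window can be illustrated with a deterministic interleaving. This is only a sketch: `gateway` is a hypothetical in-memory stand-in for one counter held by the Pushgateway, not real HTTP traffic.

```go
package main

import "fmt"

// gateway is a hypothetical in-memory stand-in for one counter on a Pushgateway.
type gateway struct{ counter float64 }

func (g *gateway) get() float64  { return g.counter }
func (g *gateway) put(v float64) { g.counter = v }

// sequentialRuns performs two complete read-modify-write cycles back to back.
func sequentialRuns(g *gateway) {
	for i := 0; i < 2; i++ {
		seen := g.get()
		g.put(seen + 1)
	}
}

// overlappingRuns interleaves two cycles: both runs read before either writes.
func overlappingRuns(g *gateway) {
	seenA := g.get()
	seenB := g.get()
	g.put(seenA + 1)
	g.put(seenB + 1) // overwrites run A's result with the same value
}

func main() {
	a, b := &gateway{}, &gateway{}
	sequentialRuns(a)
	overlappingRuns(b)
	fmt.Println(a.counter, b.counter) // prints "2 1"
}
```

Two overlapping runs leave the counter at 1 instead of 2, which is the "one lost increment" worst case described above.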

What's included

Read-modify-write Pushgateway client (pkg/gofr/metrics/exporters/pushgateway.go):
- Custom HTTP client using expfmt for Prometheus text format encoding/decoding
- mergeMetrics() — sums counters/histograms, replaces gauges
- labelKey() — matches metrics by app-defined labels only, filtering out Pushgateway-injected (job, instance) and OTel scope labels
Auto CLI metrics in cmd.go:
- app_cmd_duration_seconds (histogram with CLI-appropriate buckets)
- app_cmd_success (counter, cumulative across runs)
- app_cmd_failures (counter, cumulative across runs)
- app_cmd_last_success_timestamp (gauge, latest value)
CLI shutdown path in run.go: Calls Shutdown() after cmd.Run() to flush metrics
Config-driven: Set METRICS_PUSH_GATEWAY_URL env var to enable (CLI only)
Enriched CMD Metrics dashboard merged into the main GoFr Application Services Monitoring dashboard:
- Health Overview: jobs tracked, last push age, total successes/failures, success rate
- Per-Job Breakdown: table with merge transform (successes, failures, p95 duration, last success per job×command)
- Duration Analysis: bar chart with p50/p90/p95/p99 percentiles, gauge panel with thresholds
- $job and $command template variables for CLI filtering
Pushgateway added to http-server docker setup: docker-compose service + Prometheus scrape config with honor_labels: true
sample-cmd README expanded with setup instructions and GitHub links to shared docker/dashboard setup
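
To make the label-matching idea from the list above concrete, here is a hedged sketch of a key function. It is not the PR's `labelKey()`: the map-based signature, the key format, and the assumption that OTel scope labels share the `otel_scope_` prefix are all illustrative choices.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelKey builds a stable key from app-defined labels only. It skips the
// labels injected by the Pushgateway (job, instance) and, as an assumption,
// any label whose name starts with "otel_scope_".
func labelKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		if name == "job" || name == "instance" || strings.HasPrefix(name, "otel_scope_") {
			continue
		}
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stability

	var b strings.Builder
	for _, name := range names {
		b.WriteString(name)
		b.WriteByte('=')
		b.WriteString(labels[name])
		b.WriteByte(',')
	}
	return b.String()
}

func main() {
	fromGateway := map[string]string{"command": "hello", "job": "sample-cmd", "instance": ""}
	fromLocal := map[string]string{"command": "hello", "otel_scope_name": "gofr"}
	fmt.Println(labelKey(fromGateway) == labelKey(fromLocal)) // prints "true"
}
```

Because both sides reduce to the same key, a series fetched from the Pushgateway can be paired with the matching local series before the values are merged.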

Design decisions

Pushgateway is wired in NewCMD() only — HTTP apps continue using pull-based scraping
Container owns the pushgateway and flushes on Close()
Dropped prometheus/push dependency — raw HTTP with expfmt gives full control over the read-modify-write cycle
Uses dedicated AppRegistry (not DefaultGatherer) to avoid pushing Go runtime metrics
Dashboard uses ${DataSource} variable and collapsed row — non-intrusive for HTTP-only users

Test plan

go build ./... compiles
go test ./pkg/gofr/metrics/exporters/ — 18 tests covering all merge logic, label filtering, error paths
go test ./pkg/gofr/container/ ./pkg/gofr/ — existing tests pass
golangci-lint run clean
go vet -race clean on our packages
Docker smoke test: run hello×6, fail×4, batch×1, progress×1 → counters accumulated correctly, gauge shows latest timestamp, histogram buckets merged
Grafana dashboard verified: all panels populated, merge transform working, otel labels hidden

CLI apps are short-lived and exit before Prometheus can scrape /metrics. This adds push-based metrics export via Pushgateway, configured through METRICS_PUSH_GATEWAY_URL env var, along with auto CLI metrics tracking (duration, success/error counters) and observability infrastructure. Closes gofr-dev#2232

Umang01-hash

1. Issue #2232 explicitly listed "Support cleanup (optional) so old metrics don't pile up" as a requirement. Every CronJob run permanently adds a job group to the Pushgateway. Please add a Delete(ctx context.Context) error method on PushGateway using pusher.DeleteContext(ctx), plus a METRICS_PUSH_GATEWAY_DELETE_ON_FINISH=true env var to opt in.
2. All apps without APP_NAME set push under the same job group and silently overwrite each other. Change the fallback to filepath.Base(os.Args[0]) or add a dedicated METRICS_PUSH_GATEWAY_JOB env var override.
3. The current max bucket is 60s, while the cron buckets extend to 3600s, so a 5-minute batch job falls into +Inf only. Align the upper boundary with app_cron_duration_seconds.
4. Metric naming is inconsistent with cron:
   - app_cmd_errors_total → app_cmd_failures (match cron's _failures)
   - app_cmd_success_total → app_cmd_success (match cron's no-_total)
   - Add app_cmd_total (match cron's app_cron_job_total)
5. Move metricServer.Shutdown(ctx) before container.Close() in Shutdown() so the Prometheus scrape endpoint stops accepting requests before the OTel meter provider is shut down.
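
The bucket-boundary point in the review above can be shown with a small sketch. The bucket lists below are hypothetical and only share the 60s and 3600s upper limits mentioned in the comment; they are not the boundaries used by the PR or by the cron metrics.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucketFor returns the upper bound (le) of the first bucket that contains v,
// or +Inf when v exceeds every finite bound. bounds must be sorted ascending.
func bucketFor(bounds []float64, v float64) float64 {
	i := sort.SearchFloat64s(bounds, v) // smallest i with bounds[i] >= v
	if i == len(bounds) {
		return math.Inf(1)
	}
	return bounds[i]
}

func main() {
	short := []float64{1, 5, 30, 60}           // hypothetical, tops out at 60s
	long := []float64{1, 5, 30, 60, 300, 3600} // hypothetical, tops out at 3600s
	fiveMinutes := 300.0
	fmt.Println(bucketFor(short, fiveMinutes), bucketFor(long, fiveMinutes))
}
```

With the shorter list, a 300-second run is only counted in +Inf, so percentile estimates above 60s carry no resolution; the longer list places it in a finite bucket.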

coolwednesday · 2026-03-20T06:58:49Z

Regarding Comment 1 (Delete support / METRICS_PUSH_GATEWAY_DELETE_ON_FINISH):

The Pushgateway documentation explicitly states that the Pushgateway is designed as a metric cache — the standard recommendation is to not delete pushed metrics, and instead use job and instance labels to distinguish runs.

If you push and immediately delete, Prometheus may not have scraped yet (typical scrape interval is 15–30s), and the metrics are lost forever. There's no reliable way for the CLI to know whether Prometheus has completed its scrape before issuing a delete.

For users who need cleanup of stale metrics, this is best handled at the Pushgateway operational level (e.g., external cron jobs that prune old job groups), not at the framework level. Baking delete into the framework adds a footgun that's hard to use safely by default.

This can always be revisited in a follow-up if users explicitly request it, but for v1 the "push and leave" approach is the correct and safe default.

coolwednesday · 2026-03-20T07:18:49Z

Regarding Comment 5 (Shutdown order — move metricServer.Shutdown before container.Close):

The current shutdown order is actually correct:

httpServer.Shutdown → grpcServer.Shutdown → container.Close() → metricServer.Shutdown

The /metrics HTTP endpoint should stay alive as long as possible so Prometheus can scrape final metrics. Shutting it down earlier would mean Prometheus misses the last scrape window.

For the Pushgateway path specifically, the push happens inside container.Close() before the meter provider shuts down — which is the right sequence (push metrics first, then tear down the provider).

coolwednesday · 2026-03-20T07:18:51Z

Regarding Comment 8 (Factory.go test coverage):

The new pushgateway wiring in factory.go is 4 lines of config-read + constructor call. The core logic (NewPushGateway, Push) is already covered in pushgateway_test.go. Writing a proper test for the factory wiring requires heavy config mocking for minimal additional coverage. Deferring this to a follow-up PR.

- Replace basic CMD Metrics panels with enriched CLI dashboard (health overview, job status table, duration bar chart with p50-p99)
- Add pushgateway service to http-server docker-compose
- Add pushgateway scrape config with honor_labels
- Add $job and $command template variables for CLI filtering
- Expand sample-cmd README with setup instructions

coolwednesday · 2026-03-24T06:43:48Z

Here is the screenshot of the CLI Dashboard:

Copilot

Pull request overview

This PR adds Pushgateway-based metrics support for GoFr CLI applications so short-lived commands can export Prometheus metrics, while also wiring example dashboards and docker setups to visualize those metrics. It extends the existing metrics/exporter infrastructure and CLI runtime path to support pushing metrics on shutdown.

Changes:

Added a new Pushgateway exporter with read-modify-write merge logic and hooked CLI shutdown into metric pushing.
Added automatic CLI command metrics (success, failures, duration, last_success_timestamp) plus related tests and exporter plumbing.
Expanded sample and http-server observability assets with Pushgateway, Prometheus, Grafana, and dashboard updates.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.

File	Description
`pkg/gofr/run.go`	Triggers CLI shutdown after command execution so metrics can flush.
`pkg/gofr/metrics/register_test.go`	Refactors metrics tests to use the new Prometheus helper signature.
`pkg/gofr/metrics/handler_test.go`	Updates handler tests to use the new test metrics helper.
`pkg/gofr/metrics/exporters/pushgateway_test.go`	Adds unit coverage for Pushgateway push/merge behavior.
`pkg/gofr/metrics/exporters/pushgateway.go`	Implements the Pushgateway client and metric merge logic.
`pkg/gofr/metrics/exporters/exporter.go`	Returns both meter and provider; adds dedicated app registry support.
`pkg/gofr/factory.go`	Wires Pushgateway support into `NewCMD()` based on config.
`pkg/gofr/container/container_test.go`	Adds tests for pushing metrics during container close.
`pkg/gofr/container/container.go`	Stores meter provider / pusher and flushes them on close.
`pkg/gofr/cmd_test.go`	Adds CLI tests around success/failure command execution paths.
`pkg/gofr/cmd.go`	Registers and records built-in CLI metrics.
`go.sum`	Updates dependency lockfile for exporter-related dependency changes.
`go.mod`	Adds Prometheus parsing deps and removes unused top-level deps.
`examples/sample-cmd/main.go`	Adds sample failing and batch commands for CLI metric demos.
`examples/sample-cmd/docker/provisioning/datasources/datasource.yaml`	Adds Grafana Prometheus datasource provisioning for the sample stack.
`examples/sample-cmd/docker/provisioning/dashboards/gofr-dashboard/dashboards.json`	Adds a sample CLI metrics dashboard.
`examples/sample-cmd/docker/provisioning/dashboards/dashboards.yaml`	Provisions the sample dashboard directory in Grafana.
`examples/sample-cmd/docker/prometheus/prometheus.yml`	Configures Prometheus to scrape Pushgateway in the sample stack.
`examples/sample-cmd/docker/docker-compose.yaml`	Defines the sample CLI observability stack with Pushgateway/Prometheus/Grafana.
`examples/sample-cmd/configs/.env`	Adds sample CLI env defaults.
`examples/sample-cmd/README.md`	Documents CLI metrics usage and local observability setup.
`examples/sample-cmd/Dockerfile`	Adds a container image for the sample CLI app.
`examples/http-server/docker/provisioning/dashboards/gofr-dashboard/dashboards.json`	Extends the main dashboard with CLI metrics panels and filters.
`examples/http-server/docker/prometheus/prometheus.yml`	Adds Pushgateway scraping to the shared http-server Prometheus config.
`examples/http-server/docker/docker-compose.yaml`	Adds a Pushgateway service to the shared observability stack.

+            "type": "prometheus",
+            "uid": "prometheus"
+          },
+          "expr": "sum(app_cmd_success_total) by (command)",


+            "type": "prometheus",
+            "uid": "prometheus"
+          },
+          "expr": "sum(app_cmd_errors_total) by (command)",


+	existing := p.fetchExisting(ctx)
+
+	merged := mergeMetrics(existing, localFamilies)


+func mergeMetrics(existing map[string]*dto.MetricFamily, local []*dto.MetricFamily) []*dto.MetricFamily {
+	result := make([]*dto.MetricFamily, 0, len(local))
+
+	for _, lf := range local {
+		name := lf.GetName()
+		ef, found := existing[name]
+
+		if !found {
+			result = append(result, lf)
+			continue
+		}
+
+		merged := mergeFamilies(ef, lf)
+		result = append(result, merged)
+	}
+
+	return result


+// into a map of metric families. Returns an empty map on any failure so that
+// Push can proceed with local-only values (first run or Pushgateway down).
+func (p *PushGateway) fetchExisting(ctx context.Context) map[string]*dto.MetricFamily {
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, p.metricsURL, http.NoBody)
+	if err != nil {
+		p.logger.Logf("could not create GET request for existing metrics: %v", err)
+		return nil
+	}
+
+	resp, err := p.client.Do(req)
+	if err != nil {
+		p.logger.Logf("could not fetch existing metrics (first run?): %v", err)
+		return nil
+	}
+
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		p.logger.Logf("Pushgateway returned %d on GET /metrics, treating as empty", resp.StatusCode)
+		return nil
+	}
+
+	parser := expfmt.NewTextParser(model.LegacyValidation)
+
+	families, err := parser.TextToMetricFamilies(resp.Body)
+	if err != nil {
+		p.logger.Logf("could not parse existing metrics: %v", err)
+		return nil
+	}
+
+	return families


 	app.container.Create(app.Config)
 	app.initTracer()

+	if url := app.Config.Get("METRICS_PUSH_GATEWAY_URL"); url != "" {
+		jobName := app.Config.Get("APP_NAME")
+		if jobName == "" {
+			jobName = filepath.Base(os.Args[0])
+		}
+
+		// Use a dedicated registry that only collects app metrics (no Go runtime/process
+		// collectors) so Pushgateway groups stay clean and consistent with pull-based scraping.
+		appRegistry, meter, provider := exporters.NewAppRegistry(app.container.GetAppName(), app.container.GetAppVersion())
+		app.container.SetMeterProvider(provider)
+		app.container.SetMetricsManager(meter)
+		app.container.SetPushGateway(exporters.NewPushGateway(url, jobName, appRegistry, app.container.Logger))
+	}
+


Umang01-hash reviewed Mar 18, 2026

Comment thread pkg/gofr/container/container.go Outdated

Comment thread pkg/gofr/container/container.go Outdated

Comment thread pkg/gofr/factory.go

Comment thread pkg/gofr/run.go Outdated

Comment thread pkg/gofr/metrics/exporters/pushgateway.go Outdated

merge development to resolve go.work.sum conflicts

7bf1ee5

Umang01-hash requested a review from Copilot May 4, 2026 06:59

Copilot started reviewing on behalf of Umang01-hash May 4, 2026 06:59

Copilot AI reviewed May 4, 2026

coolwednesday wants to merge 3 commits into gofr-dev:development from coolwednesday:feature/metrics-pushgateway-cli

3 participants
