
POC: Async variant of SelectMergeProfile#4995

Closed
aleks-p wants to merge 11 commits into main from async-query-poc


aleks-p commented Apr 7, 2026

No description provided.

aleks-p and others added 11 commits April 20, 2026 09:22
Introduces a new AsyncQuerierService with a SelectMergeProfileAsync RPC
that allows callers to start a query asynchronously and poll for results
using a request_id. Results are stored in object storage with a 30-minute
TTL and per-tenant concurrency limits with tenant-level overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --async is set, profilecli uses SelectMergeProfileAsync to start
the query and polls every second until results are ready. Prints elapsed
time during polling so the user sees progress. Gracefully detects servers
that don't support async queries.

Also fixes the async coordinator to inject the tenant ID into the
background context so downstream handlers can extract it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests the full async flow: start query, poll until completion, verify
profile results. Covers tenants with data, single-service tenants, and
non-existing tenants. Skipped for V1, which lacks a query-frontend
component in the microservices test cluster.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of duplicating fields, SelectMergeProfileAsyncRequest now
contains a SelectMergeProfileRequest field. This avoids drift between
the two messages and simplifies the handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The executing goroutine sends periodic heartbeats (every 15s) to object
storage. When polling, if an in_progress query's last heartbeat is older
than 45s, it is reported as failed with a clear orphaned message. This
handles the case where a query-frontend replica crashes mid-execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cQuerierService

The Async suffix on the method was redundant given the service name
already conveys it. The RPC is now AsyncQuerierService/SelectMergeProfile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the SelectMergeProfile-specific AsyncQuerierService with a
generic QueryFrontendService.Query endpoint that supports all query
types (pprof, tree, series, heatmap, labels) with automatic async
promotion.

Key changes:
- Extends QueryRequest/QueryResponse in query.proto with async fields
  (request_id, async flag, status, error_message)
- Deletes the old querier_async.proto and AsyncQuerierService
- Handler runs every query in a goroutine; if it completes within
  the configurable AsyncQueryThreshold (default 2s, per-tenant), the
  response is returned synchronously. If the threshold fires, the query
  is promoted to async and the client polls for results.
- The async flag in QueryRequest forces async mode (for testing)
- profilecli transparently handles async responses via a shared
  queryViaFrontendService helper, with fallback to QuerierService
  for servers that don't support QueryFrontendService
- Store now serializes QueryResponse instead of Profile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ands

Moves the --async flag to the parent query command so it applies to all
subcommands (profile, go-pgo, series, top, label-values-cardinality).

Each command now tries QueryFrontendService first with transparent async
handling, falling back to the existing QuerierService RPCs for servers
that don't support it.

Adapters in query_frontend_adapter.go translate between the existing
QuerierService request types and QueryRequest, keeping the existing
command implementations untouched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aleks-p aleks-p closed this May 8, 2026