CRE-4405: Add gateway request tracing; use default HTTP transport#22559
Conversation
|
I see you updated files related to
|
|
✅ No conflicts with other open PRs targeting |
4b4b31f to
9d8b7e1
Compare
|
1554a55 to
c28c309
Compare
824c827 to
714bf49
Compare
5f7921f to
cd68438
Compare
There was a problem hiding this comment.
Pull request overview
Risk Rating: MEDIUM
Adds outbound HTTP request tracing/metrics for the Gateway HTTP client and switches safeurl to use a transport with Go’s sensible defaults (enabling connection pooling), via a temporary fork.
Changes:
- Instrument Gateway outbound HTTP requests with
net/http/httptracephase timing and a total request-duration metric, plus OTEL histogram views registration. - Configure
safeurlto use the default HTTP transport (via a temporaryreplaceto a forkedsafeurl). - Propagate the
safeurlreplaceacross submodules (deployment, integration-tests, system-tests, scripts) with updatedgo.sumentries.
Scrupulous human review recommended (high impact areas):
core/services/gateway/network/httpclient.go: transport wiring and safety aroundhttp.DefaultTransportusage (panic risk) and ensuring the chosen transport semantics match desired pooling/timeouts.core/services/gateway/network/httpclient_trace.go+httpclient_metrics.go: metric semantics (phase “durations” vs “time-to” events) and label meaning/usage.
Reviewed changes
Copilot reviewed 12 out of 19 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
core/services/gateway/network/httpclient.go |
Adds transport override for safeurl client and records trace/total metrics on request paths. |
core/services/gateway/network/httpclient_trace.go |
Introduces httptrace.ClientTrace callbacks to record phase timings and connection reuse. |
core/services/gateway/network/httpclient_metrics.go |
Defines OTEL histogram instrument, attributes, and metric view/bucket configuration. |
core/services/gateway/network/httpclient_metrics_test.go |
Basic construction/recording smoke tests for the new metrics helpers. |
core/cmd/shell.go |
Registers the new Gateway HTTP client metric views in the node’s metric view list. |
go.mod |
Adds replace to forked safeurl pending upstream merge. |
go.sum |
Updates checksums for the forked safeurl. |
deployment/go.mod |
Adds replace to forked safeurl in deployment module. |
deployment/go.sum |
Updates checksums for the forked safeurl in deployment module. |
integration-tests/go.mod |
Adds replace to forked safeurl in integration-tests module. |
integration-tests/go.sum |
Updates checksums for the forked safeurl in integration-tests module. |
integration-tests/load/go.mod |
Adds replace to forked safeurl in integration-tests/load module. |
integration-tests/load/go.sum |
Updates checksums for the forked safeurl in integration-tests/load module. |
system-tests/lib/go.mod |
Adds replace to forked safeurl in system-tests/lib module. |
system-tests/lib/go.sum |
Updates checksums for the forked safeurl in system-tests/lib module. |
system-tests/tests/go.mod |
Adds replace to forked safeurl in system-tests/tests module. |
system-tests/tests/go.sum |
Updates checksums for the forked safeurl in system-tests/tests module. |
core/scripts/go.mod |
Adds replace to forked safeurl in core/scripts module. |
core/scripts/go.sum |
Updates checksums for the forked safeurl in core/scripts module. |
| requestStart := time.Now() | ||
| trace, traceState := newClientTrace(ctx, req.Method, requestStart, c.metrics) | ||
| timeoutCtx = httptrace.WithClientTrace(timeoutCtx, trace) | ||
|
|
cd68438 to
3bddc70
Compare
| // recordTotal records the total request lifetime with the result attributes. | ||
| // The histogram's count for phase=total doubles as the request counter. | ||
| // Success means the request returned a response, it does not imply a successful 2xx statusCode. | ||
| func (m *httpClientMetrics) recordTotal(ctx context.Context, method string, statusCode int, success, connReused bool, d time.Duration) { | ||
| m.phaseDuration.Record(ctx, d.Milliseconds(), metric.WithAttributes( | ||
| attribute.String("method", method), | ||
| attribute.String("phase", string(PhaseTotal)), | ||
| attribute.String("statusCode", strconv.Itoa(statusCode)), | ||
| attribute.String("success", strconv.FormatBool(success)), | ||
| attribute.String("connectionReused", strconv.FormatBool(connReused)), | ||
| )) |
|
|
||
| defaultTransport, ok := http.DefaultTransport.(*http.Transport) | ||
| if !ok { | ||
| return nil, errors.New("could not coerce http.DefaultTransport to *http.Transport") |
3bddc70 to
facd17c
Compare
|




See this screenshot from a before/after of using the default HTTP transport using the metrics:
