Summary
When running hyperf-opentelemetry inside a long-lived Hyperf/Swoole worker, the
OTLP exporters become unreliable after the collector recycles the HTTP/2
connection (e.g. LoadBalancer idle-timeout, rolling restart of the collector pod).
Two independent issues surface:
- SwooleGrpcTransport::getClient() never detects a server-side close, so
  the transport keeps reusing a half-closed Swoole\Coroutine\Http2\Client and
  every subsequent export fails with "Broken pipe".
- OtlpHttpTransportFactory falls back to PsrTransport with a blocking cURL
  client, which is incompatible with Swoole's coroutine scheduler: it throws
  "eventLoop already created" whenever the batch processor flushes inside a
  coroutine context.
Both issues are reproducible with the exporters otlp_grpc and otlp_http as
documented in publish/open-telemetry.php, using:
- opencodeco/hyperf-opentelemetry: 0.3.6 (latest)
- PHP: 8.2+ / 8.3
- ext-swoole: 5.1.x / 5.x
- Hyperf: 3.1
- Collector: otel/opentelemetry-collector-contrib (any recent version), reached
  through a Kubernetes Service / ALB that closes idle HTTP/2 connections after
  a few minutes.
Bug 1 — SwooleGrpcTransport::getClient() does not detect closed server connections
Location
src/Transport/SwooleGrpcTransport.php
private function getClient(): Client
{
    if ($this->client === null || ! $this->client->connected) {
        // (re)connect ...
    }
    return $this->client;
}
Problem
Swoole\Coroutine\Http2\Client::$connected only reflects whether
connect() was called — it does not flip to false when the peer sends a
TCP FIN / HTTP/2 GOAWAY. After the collector rotates, the client is still marked
connected === true, so getClient() returns the stale instance and the next
send() fails with:
RuntimeException: Failed to send gRPC request: Broken pipe
or, depending on timing:
RuntimeException: Failed to receive gRPC response: recv timeout
The transport keeps the same dead client forever, so every subsequent span/
metric export fails for the lifetime of the worker.
Additional noise
On PHP 8.2+ the same code path also triggers:
PHP Deprecated: Creation of dynamic property Swoole\Coroutine\Http2\Client::$serverLastStreamId is deprecated
The property is written by Swoole's own extension code inside recv(), so the
deprecation cannot be fixed by userland — it should at least be filtered.
Suggested fix
In send(), treat "broken pipe", "connection reset", "connection closed",
"GOAWAY" and send/recv false as retryable: drop and rebuild the client, then
retry once before surfacing the error. The stored client should also be nulled
on any error so the next batch re-connects.
Minimal patch:
public function send(string $payload, ?CancellationInterface $c = null): FutureInterface
{
    if ($this->closed) {
        return new ErrorFuture(new RuntimeException('Transport is closed'));
    }
    $attempts = 0;
    retry:
    try {
        $client = $this->getClient();
        $streamId = $client->send($this->buildRequest($payload));
        if ($streamId === false || $streamId <= 0) {
            throw new RuntimeException('send failed: ' . ($client->errMsg ?: 'unknown'));
        }
        $response = $client->recv($this->timeout);
        if ($response === false) {
            throw new RuntimeException('recv failed: ' . ($client->errMsg ?: 'timeout'));
        }
        // ... grpc-status handling
        return new CompletedFuture(null);
    } catch (Throwable $e) {
        // Always drop the stored client so the next call reconnects.
        $this->client?->close();
        $this->client = null;
        // Retry at most once, and only for connection-level failures.
        if ($attempts++ < 1 && $this->isRetryable($e)) {
            goto retry;
        }
        return new ErrorFuture($e);
    }
}
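The patch above calls isRetryable() without showing it. A minimal sketch of that check, written as a standalone function so it can be tried outside the transport, could look like the following. The function name and the exact list of substrings are assumptions for illustration; the list mirrors the conditions named in the suggested fix plus the 'send failed' / 'recv failed' prefixes thrown by the patch itself.

```php
<?php

/**
 * Hypothetical helper (not part of the library): decides whether an error
 * looks like a dead connection that deserves one reconnect-and-retry.
 */
function isRetryableTransportError(Throwable $e): bool
{
    // Substrings that indicate a connection-level failure. This list is an
    // assumption; adjust it to the messages actually seen in production.
    $needles = [
        'broken pipe',
        'connection reset',
        'connection closed',
        'goaway',
        'send failed',
        'recv failed',
    ];

    $message = strtolower($e->getMessage());
    foreach ($needles as $needle) {
        if (str_contains($message, $needle)) {
            return true;
        }
    }

    // Anything else (for example a non-zero grpc-status) is surfaced as-is.
    return false;
}
```

Matching on message text is brittle, so this is only a stopgap; it is used here because the failures in this report surface as plain RuntimeException messages.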
Bonus: wrap send()/recv() in a scoped set_error_handler that swallows
only the serverLastStreamId E_DEPRECATED noise.
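A rough sketch of such a scoped handler, assuming PHP 8.0+ and using only set_error_handler / restore_error_handler from core PHP. The helper name withFilteredDeprecation is made up for this example. It swallows E_DEPRECATED notices whose message contains the given needle and forwards everything else to whatever handler was installed before; note that the previous handler's own error mask is not reproduced here.

```php
<?php

/**
 * Hypothetical helper: runs $operation while suppressing only E_DEPRECATED
 * notices whose message contains $needle. All other errors are forwarded.
 */
function withFilteredDeprecation(callable $operation, string $needle = '$serverLastStreamId'): mixed
{
    $previous = null;
    $previous = set_error_handler(
        static function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use (&$previous, $needle): bool {
            if ($errno === E_DEPRECATED && str_contains($errstr, $needle)) {
                return true; // swallow just this one notice
            }
            if ($previous !== null) {
                // Delegate to the handler that was active before this scope.
                return $previous($errno, $errstr, $errfile, $errline) !== false;
            }
            return false; // fall through to PHP's built-in handling
        }
    );

    try {
        return $operation();
    } finally {
        // Always restore, even if $operation throws.
        restore_error_handler();
    }
}
```

Inside the transport, the recv() call would be passed in as the callable, so the filter is active only for the duration of that call.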
Workaround we shipped
We had to subclass and wire our own ResilientSwooleGrpcTransport via
config/autoload/dependencies.php, overriding both
OtlpGrpcTraceExporterFactory and OtlpGrpcMetricExporterFactory. Happy to
open a PR with the fix.
Bug 2 — OtlpHttpTransportFactory / PsrTransport blocks the Swoole event loop
Location
- src/Factory/Trace/Exporter/OtlpHttpTraceExporterFactory.php
- src/Factory/Log/Exporter/OtlpHttpLogExporterFactory.php
- src/Factory/Metric/Exporter/OtlpHttpMetricExporterFactory.php
These factories resolve to open-telemetry/exporter-otlp's
OtlpHttpTransportFactory, which picks a PSR-18 client via discovery. In our
environment (Hyperf + Swoole 5.x) that discovery lands on the blocking cURL
stack.
Problem
When the BatchSpan/Metric/LogProcessor flushes inside a worker coroutine, the
blocking cURL call triggers:
RuntimeException: eventLoop already created. Swoole only support one eventLoop,
it cannot be created repeatedly.
…and the worker dies on the next exporter tick.
Suggested fix
OtlpHttp*ExporterFactory should prefer HyperfGuzzle (already declared via
Support/HyperfGuzzle) or a custom coroutine-safe PSR transport by default
when running under Swoole, and only fall back to cURL when ext-swoole is not
loaded. Alternatively, expose a config key to force the PSR client.
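The selection rule proposed here could be sketched roughly as below. Everything in it is hypothetical: the function name, the 'coroutine' / 'curl' labels and the idea of a forced value coming from config are illustrative and do not exist in the library today. The sketch only shows the decision, not how the chosen PSR-18 client is built.

```php
<?php

/**
 * Hypothetical selection rule for the PSR-18 client behind the OTLP HTTP
 * exporters. $forced stands in for an (assumed) config key.
 */
function chooseOtlpHttpClient(?string $forced, bool $swooleLoaded): string
{
    $allowed = ['coroutine', 'curl'];

    // An explicit config value always wins.
    if ($forced !== null && $forced !== '') {
        if (!in_array($forced, $allowed, true)) {
            throw new InvalidArgumentException("Unknown HTTP client kind: {$forced}");
        }
        return $forced;
    }

    // Default: coroutine-safe client under Swoole, cURL otherwise.
    return $swooleLoaded ? 'coroutine' : 'curl';
}

// Usage: the caller passes in the environment check.
$kind = chooseOtlpHttpClient(null, extension_loaded('swoole'));
```

Keeping the extension check outside the function makes the rule easy to unit-test without ext-swoole installed.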
Workaround
Currently we are forced to use otlp_grpc exclusively (which hits Bug 1). A
coroutine-native HTTP transport in the library would let us avoid gRPC
altogether.
Reproduction
- Hyperf 3.1 skeleton with opencodeco/hyperf-opentelemetry: ^0.3.6.
- Enable traces + metrics exporting to a local collector over gRPC.
- Run tc qdisc add dev eth0 root netem loss 100% for ~30s against the collector
  (or simply restart the collector pod) to force a server-side close.
- After traffic resumes, every exporter tick fails with "Broken pipe" until the
  worker is killed.
For Bug 2: set OTEL_TRACES_EXPORTER=otlp_http, fire a request, observe
eventLoop already created on the first batch flush.
Stack traces
Bug 1 (gRPC broken pipe)
RuntimeException: Failed to send gRPC request: Broken pipe
at Hyperf\OpenTelemetry\Transport\SwooleGrpcTransport::send()
/vendor/opencodeco/hyperf-opentelemetry/src/Transport/SwooleGrpcTransport.php:60
at OpenTelemetry\Contrib\Otlp\SpanExporter::export()
at OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor::flush()
Companion:
PHP Deprecated: Creation of dynamic property Swoole\Coroutine\Http2\Client::$serverLastStreamId is deprecated
in /vendor/opencodeco/hyperf-opentelemetry/src/Transport/SwooleGrpcTransport.php line ~69
Bug 2 (HTTP / eventLoop)
RuntimeException: eventLoop already created. Swoole only support one eventLoop, it cannot be created repeatedly.
at OpenTelemetry\SDK\Common\Export\Http\PsrTransport::send()
at OpenTelemetry\Contrib\Otlp\SpanExporter::export()
at OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor::flush()
Environment
| Component | Version |
| --- | --- |
| opencodeco/hyperf-opentelemetry | 0.3.6 |
| open-telemetry/sdk | latest compatible |
| PHP | 8.2.x / 8.3.x |
| ext-swoole | 5.1.x / 5.x |
| Hyperf | 3.1 |
| OS | Linux (k8s pod, Alpine/Debian base) |
Happy to contribute
If a maintainer can confirm the approach, I can open a PR for Bug 1 (retry +
error-handler scoping) and start a discussion on Bug 2 (Swoole-native HTTP
transport).