Skip to content

ext_proc: ProcessorClientImpl::start() returns nullptr without firing onGrpcError on client-creation failure (silent fail-open) #45188

@omkhar

Description

@omkhar

Description

ProcessorClientImpl::start() in source/extensions/filters/common/ext_proc/grpc_client_impl.h:220-235 returns nullptr without invoking callbacks.onGrpcError() when getOrCreateRawAsyncClientWithHashKey() returns an error (missing or invalid cluster discovered at runtime, raw async-client construction failure, etc):

auto client_or_error =
    client_manager_.getOrCreateRawAsyncClientWithHashKey(config_with_hash_key, scope_, true);
if (!client_or_error.status().ok()) {
  ENVOY_LOG_PERIODIC_MISC(error, std::chrono::seconds(10), "Creating raw asyc client failed {}",
                          client_or_error.status());
  return nullptr;  // no callback fired
}

The downstream openStream() predicate in source/extensions/filters/http/ext_proc/ext_proc.cc:786-793:

ExternalProcessorStreamPtr stream_object = grpc_client->start(*this, ...);
if (processing_complete_ || stream_object == nullptr) {
  // Stream failed while starting and either onGrpcError or onGrpcClose was already called.
  return sent_immediate_response_ ? StreamOpenState::Error : StreamOpenState::IgnoreError;
}

…treats stream_object == nullptr as IgnoreError and continues request processing. Since start() did not invoke onGrpcError() / onGrpcClose(), processing_complete_ is still false on the error path — but the predicate widens to accept that case after #39388. The filter therefore silently admits the request rather than running the operator-configured failure_mode_allow policy.

Behavior chronology

First affected release: v1.35.0. Reachable on 1.35.x / 1.36.x / 1.37.x / 1.38.x / current main.

Proposed fix

Call callbacks.onGrpcError(Grpc::Status::Unavailable, "ext_proc client creation failed: <status>") before return nullptr;. The ext_proc filter then takes the operator-configured failure mode (fail-open or fail-close per failure_mode_allow) instead of unconditionally failing open. Pull request follows.

Background

This was first reported privately as GHSA-jp7j-jw2q-9494. @yanjunxiang-google asked for the fix to land in the open as a non-security bug ("This is not an issue which can be exploited by a client or an ext_proc server. We can fix this in open."), so filing here per their redirect.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions