
fix(spanner): don't set sticky AFFINITY_KEY for multiplexed sessions#12725

Open
akash329d wants to merge 1 commit into googleapis:main from akash329d:fix-spanner-static-pool-affinity-collapse

Conversation

@akash329d akash329d commented Apr 9, 2026

Problem

Since 6.105.0 (#4239 enabled grpc-gcp by default), a Spanner client whose first traffic is a high-concurrency burst sees throughput collapse and p50 latency climb. disableGrpcGcpExtension() restores the expected scaling.

| concurrency | static numChannels=32 (qps / p50 / p99) | dynamic channel pool (qps / p50 / p99) | grpc-gcp OFF (qps / p50 / p99) |
|---|---|---|---|
| 50 | 723 / 67.6 / 84.9 | | 723 / 67.5 / 78.9 |
| 100 | 1198 / 82.1 / 137.8 | | 1261 / 68.1 / 309.5 |
| 200 | 1491 / 142.9 / 178.5 | | 2841 / 67.5 / 109.7 |
| 400 | 973 / 402.2 / 538.8 | 5384 / 66 / 341 | 5862 / 64.8 / 80.1 |

(single multiplexed-session client, SELECT 1; each cell is a fresh client whose first traffic is at the target concurrency. Dynamic-pool row from a separate probe with enableDynamicChannelPool() set on the builder.)

Root cause

Under grpc-gcp, newCallContext sets GcpManagedChannel.AFFINITY_KEY on every data RPC. GcpManagedChannel.getChannelRef(key) binds each new key via pickLeastBusyChannel() (lines 1754-1790), which reads activeStreamsCount and breaks ties toward channelRefs.get(0). That count isn't incremented until later, in GcpClientCall.start() (line 284), so a concurrent first burst binds every key to channel 0, and the bindings are sticky. Subsequent RPCs funnel through a single HTTP/2 connection and queue at MAX_CONCURRENT_STREAMS.
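
The timing can be shown with a deterministic toy model (hypothetical names; the real logic lives in grpc-gcp's GcpManagedChannel and is more involved). Because the stream count is only bumped after the bind, a burst of first RPCs all observe zero active streams on every channel; the least-busy scan then ties to index 0 and every key→channel binding is sticky:

```java
// Toy model of the grpc-gcp warmup race (simplified, not the real classes).
public class AffinityRaceSketch {
  static final int NUM_CHANNELS = 32;

  static int pickLeastBusy(int[] activeStreams) {
    int best = 0;
    for (int i = 1; i < activeStreams.length; i++) {
      // strict "<" means ties keep the lowest index, i.e. channel 0
      if (activeStreams[i] < activeStreams[best]) best = i;
    }
    return best;
  }

  public static void main(String[] args) {
    int[] activeStreams = new int[NUM_CHANNELS];
    int keys = 63; // ~2*numChannels-1 distinct keys on the static path
    int[] binding = new int[keys];
    // Phase 1: the concurrent burst -- every key is bound before any
    // in-flight RPC has incremented its channel's stream count.
    for (int k = 0; k < keys; k++) binding[k] = pickLeastBusy(activeStreams);
    // Phase 2: the increments land, but the sticky bindings already exist.
    for (int k = 0; k < keys; k++) activeStreams[binding[k]]++;
    long distinct = java.util.Arrays.stream(binding).distinct().count();
    System.out.println("distinct channels used by " + keys + " keys: " + distinct);
    System.out.println("streams queued on channel 0: " + activeStreams[0]);
  }
}
```

All 63 keys land on channel 0; in the real client those bindings never expire, so the collapse outlives the warmup burst.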

The static-numChannels path used affinity.intValue() % numChannels (≈63 distinct keys), so the collapse is permanent. The dynamic-pool path mostly self-corrects (p50 matches OFF) because most hints are per-transaction random longs. However, MultiplexedSessionDatabaseClient.getSingleUseChannelHint allocates the first numChannels concurrent hints from a recycled BitSet (values 0..N-1); those few hints sticky-bind during the warmup race and keep getting reused, leaving a p99 tail. (Separately: in our testing, setting enableDynamicChannelPool=true via JDBC/connection properties did not fully propagate to GcpManagedChannel, so the dynamic channel pool is not currently a workaround for connection-API users.)
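
A sketch of the recycled-hint behavior described above (assumed shape based on this description, not the actual MultiplexedSessionDatabaseClient code): if each in-flight single-use RPC takes the lowest clear bit as its channel hint and clears it on completion, the same small values 0..N-1 get handed out over and over:

```java
import java.util.BitSet;

// Hypothetical model of BitSet-recycled single-use channel hints.
public class RecycledHintSketch {
  public static void main(String[] args) {
    BitSet inFlight = new BitSet();
    // A burst of 4 concurrent single-use calls claims hints 0,1,2,3.
    int[] burst = new int[4];
    for (int i = 0; i < burst.length; i++) {
      burst[i] = inFlight.nextClearBit(0);
      inFlight.set(burst[i]);
    }
    System.out.println("burst hints: " + java.util.Arrays.toString(burst));
    // The calls complete and release their hints...
    for (int h : burst) inFlight.clear(h);
    // ...so the next burst reuses the exact same values, and whatever
    // channel each hint sticky-bound to during warmup keeps getting hit.
    System.out.println("next hint after recycle: " + inFlight.nextClearBit(0));
  }
}
```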

Fix

Don't set AFFINITY_KEY when grpc-gcp is on. Multiplexed sessions use a single session, so sticky per-transaction channel affinity provides no backend-locality benefit. With no key, getChannelRef(null) does a fresh per-call least-busy pick, with no sticky binding and no affinity-map growth, which matches the OFF curve. (Math.floorMod alone doesn't help; the race still ties bounded keys to channel 0.)
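
To see why the keyless path recovers, extend the toy model above (again simplified; in the real client the stream-count increment in start() can still race during the very first instant, but with nothing sticky, every later call re-picks against up-to-date counts):

```java
import java.util.*;

// Toy contrast: with no affinity key, each call re-runs the least-busy
// scan once earlier calls' stream counts are visible, so load spreads.
public class KeylessPickSketch {
  static int leastBusy(int[] active) {
    int best = 0;
    for (int i = 1; i < active.length; i++) if (active[i] < active[best]) best = i;
    return best;
  }

  public static void main(String[] args) {
    int[] active = new int[4]; // 4 channels, all idle
    int[] picks = new int[8];
    for (int c = 0; c < picks.length; c++) {
      picks[c] = leastBusy(active); // fresh per-call pick, nothing cached
      active[picks[c]]++;           // this call's stream is now counted
    }
    System.out.println(Arrays.toString(picks));
  }
}
```

Here the eight sequential picks round-robin across all four channels instead of collapsing onto one.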

RetryOnDifferentGrpcChannelMockServerTest previously asserted distinct AFFINITY_KEY values via an interceptor; with no key set, those assertions are unobservable, so they're removed (the request-count and session assertions still cover the retry loop). Note that under grpc-gcp, the opt-in spanner.retry_deadline_exceeded_on_different_channel feature now relies on the per-call least-busy pick rather than a forced distinct channel: the wedged channel will normally carry a higher active-stream count, so least-busy usually picks a different one, but a different channel is no longer guaranteed. The GAX withChannelAffinity path (used when grpc-gcp is off) is unchanged.

Repro

The trigger is the client's first traffic burst being high-concurrency (e.g., a connection pool warming many connections at once); a gentle low-concurrency warmup spreads the keys and masks the bug. Standalone single-file reproducer (only dependency: google-cloud-spanner):

SpannerAffinityRepro.java

```java
// Repro: grpc-gcp channel affinity collapses onto a few channels under
// concurrent warmup with multiplexed sessions, capping per-client throughput.
//
// For each (grpcGcp, concurrency) pair, builds a FRESH Spanner client,
// fires a high-concurrency warmup burst (this is when the race binds keys),
// then measures qps/p50/p99 of SELECT 1 at that concurrency. Expect parity
// at C<=100 and a large divergence (lower qps, higher p50) for grpc-gcp=ON
// at C>=200.
//
// javac -cp 'lib/*' SpannerAffinityRepro.java
// java  -cp '.:lib/*' SpannerAffinityRepro PROJECT INSTANCE DATABASE
import com.google.cloud.spanner.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class SpannerAffinityRepro {
  static final int NUM_CHANNELS = 32;
  static final int[] SWEEP = {50, 100, 200, 400};

  record Row(double qps, double p50, double p99) {}

  static Row one(String proj, String inst, String dbId, boolean gcp, int c) throws Exception {
    SpannerOptions.Builder b = SpannerOptions.newBuilder()
        .setProjectId(proj).setNumChannels(NUM_CHANNELS);
    if (!gcp) b.disableGrpcGcpExtension();
    try (Spanner s = b.build().getService()) {
      DatabaseClient db = s.getDatabaseClient(DatabaseId.of(proj, inst, dbId));
      Statement stmt = Statement.of("SELECT 1");
      Runnable q = () -> { try (ResultSet rs = db.singleUse().executeQuery(stmt)) { while (rs.next()) {} } };
      ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor();
      // Warmup AT the target concurrency: this is the burst during which the
      // ~63 affinity keys race through pickLeastBusyChannel and bind.
      runAt(ex, q, c, Math.max(500, c * 4), null);
      int iters = Math.max(2000, c * 30);
      long[] lats = new long[iters];
      long elapsed = runAt(ex, q, c, iters, lats);
      ex.shutdown();
      Arrays.sort(lats);
      double ms = 1e-6;
      return new Row(iters / (elapsed * 1e-9), lats[iters / 2] * ms, lats[(int) (iters * 0.99)] * ms);
    }
  }

  static long runAt(ExecutorService ex, Runnable q, int c, int iters, long[] lats) throws Exception {
    AtomicInteger left = new AtomicInteger(iters), idx = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(c);
    long t0 = System.nanoTime();
    for (int i = 0; i < c; i++) ex.submit(() -> {
      while (true) {
        int s = left.decrementAndGet(); if (s < 0) break;
        long t = System.nanoTime(); q.run();
        if (lats != null) lats[idx.getAndIncrement()] = System.nanoTime() - t;
      }
      done.countDown();
    });
    done.await();
    return System.nanoTime() - t0;
  }

  public static void main(String[] a) throws Exception {
    if (a.length < 3) { System.err.println("usage: PROJECT INSTANCE DATABASE"); System.exit(1); }
    System.out.printf("numChannels=%d, multiplexed sessions (default), fresh client per cell%n", NUM_CHANNELS);
    System.out.printf("%5s | %22s | %22s%n", "", "grpc-gcp ON (default)", "grpc-gcp OFF");
    System.out.printf("%5s | %6s %6s %6s | %6s %6s %6s%n", "C", "qps", "p50ms", "p99ms", "qps", "p50ms", "p99ms");
    for (int c : SWEEP) {
      Row on = one(a[0], a[1], a[2], true, c);
      Row off = one(a[0], a[1], a[2], false, c);
      System.out.printf("%5d | %6.0f %6.1f %6.1f | %6.0f %6.1f %6.1f%n",
          c, on.qps, on.p50, on.p99, off.qps, off.p50, off.p99);
    }
  }
}
```

@akash329d akash329d requested review from a team as code owners April 9, 2026 06:16
google-cla bot commented Apr 9, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request modifies the channel affinity logic in GapicSpannerRpc.java. Specifically, it disables the setting of AFFINITY_KEY when dynamic channel pooling is disabled to prevent sticky binding issues under concurrent load, allowing the system to perform fresh least-busy channel picks instead. I have no feedback to provide.

Under grpc-gcp (default since 6.105.0, googleapis#4239), newCallContext set
GcpManagedChannel.AFFINITY_KEY on every data RPC. GcpManagedChannel
binds each new key via pickLeastBusyChannel, which reads
activeStreamsCount before any concurrent caller's start() has
incremented it (tiebreak channelRefs[0]). A high-concurrency cold start
therefore binds keys to channel 0 and the bindings are sticky, so RPCs
funnel through one HTTP/2 connection and queue at MAX_CONCURRENT_STREAMS.

The static-numChannels path bounded the key to ~2*numChannels-1 distinct
values, making the collapse permanent (~6x throughput regression at 400
concurrent). The dynamic-channel-pool path used per-transaction random
keys and largely self-corrected, but a few BitSet-recycled hints still
sticky-bound, leaving a p99 tail.

Multiplexed sessions get no backend-locality benefit from sticky
per-transaction channel affinity, so don't set the key under grpc-gcp at
all. getChannelRef(null) does a fresh per-call least-busy pick with no
sticky binding and no affinity-map growth.

Drops the now-unobservable distinct-AFFINITY_KEY assertions from
RetryOnDifferentGrpcChannelMockServerTest; the request-count and session
assertions still cover the retry loop.
@akash329d akash329d force-pushed the fix-spanner-static-pool-affinity-collapse branch from 55a3136 to 4738cd6 Compare April 9, 2026 10:38
@akash329d akash329d changed the title fix(spanner): drop sticky AFFINITY_KEY for static gRPC-GCP channel pool fix(spanner): don't set sticky AFFINITY_KEY for multiplexed sessions Apr 9, 2026
