Excessive gRPC Executor Threads Without Resource Quota Limits #291

@Besroy

Problem

During SH testing, a Storage Manager (SM) process grew to 541 threads (the normal count is roughly 80) and exhausted its thread resources. Thread dump analysis shows:

Thread Distribution:

  • 209 threads: grpc_core::Executor::ThreadMain (gRPC's global executor pool, all idle)
  • 246 threads: Empty stack (0x0000000000000000, likely leaked/zombie threads)
  • 32 threads: grpc_threadpool (gRPC internal thread pool)
  • 54 threads: Application threads (folly, nuraft, iomanager, etc.) - normal
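
A breakdown like the one above can also be sampled from a live process rather than from a one-off dump. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/task/<tid>/comm` is readable. The function name `count_threads_by_name` is hypothetical and not part of sisl or gRPC. Thread names in `comm` are truncated by the kernel and will not match the stack-frame names in the dump exactly.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <system_error>

// Hypothetical helper: count the threads of a process, grouped by thread name.
// Linux only. Reads /proc/<pid>/task/<tid>/comm; pid may be "self".
// Returns an empty map if the task directory cannot be opened.
std::map<std::string, int> count_threads_by_name(const std::string& pid = "self") {
    namespace fs = std::filesystem;
    std::map<std::string, int> counts;
    std::error_code ec;
    fs::directory_iterator it("/proc/" + pid + "/task", ec);
    const fs::directory_iterator end;
    for (; !ec && it != end; it.increment(ec)) {
        std::ifstream comm(it->path() / "comm");
        std::string name;
        if (std::getline(comm, name)) {
            ++counts[name];
        } else {
            // The thread may have exited between listing and reading.
            ++counts["<unreadable>"];
        }
    }
    return counts;
}
```

Logging this map periodically would show which thread names grow over time, which may help separate executor growth from the empty-stack threads.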

AI Analysis - Possible Causes

  1. gRPC executor unbounded growth: gRPC's internal thread pool appears to scale up with load and not shrink afterwards. With no ResourceQuota limit set, many concurrent RPC calls or heavy connection churn could let threads accumulate.
  2. Thread lifecycle issue: 246 empty-stack threads suggest cleanup problems, possibly in system libraries (folly/nuraft/boost.asio) or OS-level issues.

Current State

The sisl gRPC wrapper (sisl/src/grpc/rpc_server.cpp:44-79) does not set resource quotas:

  m_builder.SetMaxReceiveMessageSize(max_receive_msg_size);
  m_builder.SetMaxSendMessageSize(max_send_msg_size);
  // Missing: ResourceQuota to limit executor threads
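
For reference, a minimal sketch of attaching a thread limit through gRPC's public C++ API. It uses `grpc::ResourceQuota::SetMaxThreads` and `grpc::ServerBuilder::SetResourceQuota`, both of which exist in gRPC C++. The helper name `apply_thread_quota`, the quota name, and the `max_threads` value are illustrative assumptions and are not taken from sisl. Whether this limit also bounds `grpc_core::Executor` threads, rather than only the threads the server allocates from the quota, is not confirmed here and should be verified before relying on it.

```cpp
#include <grpcpp/grpcpp.h>
#include <grpcpp/resource_quota.h>

// Hypothetical helper: attach a thread cap to a ServerBuilder.
// Call it before BuildAndStart(). max_threads is a placeholder value;
// the issue leaves the appropriate limit undetermined.
void apply_thread_quota(grpc::ServerBuilder& builder, int max_threads) {
    // The quota name is arbitrary and only used for identification.
    grpc::ResourceQuota quota("sm_rpc_server_quota");
    quota.SetMaxThreads(max_threads);
    // The builder takes its own reference to the underlying quota,
    // so the local object may go out of scope afterwards.
    builder.SetResourceQuota(quota);
}
```

In the wrapper this would presumably be called on `m_builder` next to the message-size settings, for example `apply_thread_quota(m_builder, 128);`, where 128 is only a stand-in.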

Why Not Fixing Now

  • Root cause unclear: The analysis above still needs to be confirmed by a human
  • Impact minimal: The pod restarts automatically when it hits the limit, so there is no persistent service degradation
  • Optimal limit unknown: An appropriate MaxThreads value still needs to be determined
