
fix(tracer): prevent stale ctx.tracing crash on HTTPS keepalive connections #13232

Open
janiussyafiq wants to merge 3 commits into apache:master from janiussyafiq:fix/tracing-https


janiussyafiq commented Apr 16, 2026

Description

When apisix.tracing is enabled, the core tracer instruments every request phase — including ssl_client_hello_phase — by allocating a tracing table from lua-tablepool and storing it in ngx.ctx.tracing. On HTTPS keepalive connections, OpenResty reuses the same ngx.ctx object across multiple HTTP requests on the same TLS session.

The bug occurs in the following sequence:

  1. ssl_client_hello_phase calls tracer.start(), which allocates ctx.tracing via tablepool and initialises tracing.spans.
  2. The first HTTP request completes and tracer.release() is called in the log phase, returning the tracing table to the pool. lua-tablepool internally calls table.clear() on release, zeroing all fields — tracing.spans becomes nil — but ctx.tracing still holds a reference to this now-cleared table.
  3. On the second HTTP request (same keepalive connection), tracer.start() finds ctx.tracing is non-nil (a cleared table is still truthy in Lua) and skips re-initialisation.
  4. span.new() then crashes at table.insert(tracing.spans, self) because spans is nil.
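
The sequence above can be reproduced outside OpenResty with a minimal sketch. The pool here is a hypothetical stand-in for lua-tablepool that only mimics the clear-on-release behaviour described above. The functions start, release, and new_span are simplified illustrations, not the actual APISIX tracer code.

```lua
-- Hypothetical stand-in for lua-tablepool: release() clears the table
-- and keeps it for reuse, mirroring the behaviour described above.
local pool = {}

local function fetch()
    return table.remove(pool) or {}
end

local function release(t)
    for k in pairs(t) do
        t[k] = nil          -- emulates table.clear()
    end
    pool[#pool + 1] = t
end

-- Simplified model of the pre-fix tracer.start(): it only checks ctx.tracing.
local function start(ctx)
    local tracing = ctx.tracing
    if not tracing then
        tracing = fetch()
        tracing.spans = {}
        ctx.tracing = tracing
    end
    return tracing
end

-- Simplified model of span.new(): appends the span to tracing.spans.
local function new_span(ctx, name)
    local span = { name = name }
    table.insert(ctx.tracing.spans, span)
    return span
end

local ctx = {}  -- models the ngx.ctx reused across the keepalive connection

-- Steps 1 and 2: the first request runs, then the log phase releases the table.
start(ctx)
new_span(ctx, "request-1")
release(ctx.tracing)        -- ctx.tracing still points at the cleared table

-- Steps 3 and 4: the second request skips re-initialisation and fails.
start(ctx)
local ok, err = pcall(new_span, ctx, "request-2")
print(ok, err)              -- ok is false: table.insert received nil
```

In this model the failure depends only on the stale reference surviving in ctx, which is why the guard in start() is the natural place to detect it.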

This fix addresses the root cause at two layers:

  • tracer.start(): The initialisation guard is extended from if not tracing then to if not tracing or not tracing.spans then. Since lua-tablepool always zeroes tracing.spans on release via table.clear(), this reliably detects a stale or cleared tracing table and re-initialises it correctly — including on HTTPS keepalive second requests and any diverged HTTP/2 contexts.
  • tracer.release(): An if spans then guard is added before iterating over and releasing the spans table, making release safe even when called on a partially-cleared state. Explicit nil assignments are intentionally avoided to preserve the tablepool contract: re-allocating and de-allocating tables is expensive, and lua-tablepool already handles clearing internally.
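
The two guards can be sketched in plain Lua as follows. This is a minimal illustration under stated assumptions, not the PR's actual diff: the pool is a hypothetical stand-in for lua-tablepool, and release() here simply empties the spans list rather than returning each span to a pool as the real tracer is described as doing.

```lua
-- Hypothetical stand-in for lua-tablepool (clear on release, reuse on fetch).
local pool = {}

local function fetch()
    return table.remove(pool) or {}
end

local function put(t)
    for k in pairs(t) do
        t[k] = nil          -- emulates table.clear()
    end
    pool[#pool + 1] = t
end

-- Guard 1: treat a table without spans as stale and re-initialise it.
local function start(ctx)
    local tracing = ctx.tracing
    if not tracing or not tracing.spans then
        tracing = fetch()
        tracing.spans = {}
        ctx.tracing = tracing
    end
    return tracing
end

-- Simplified model of span.new(): appends the span to tracing.spans.
local function new_span(ctx, name)
    local span = { name = name }
    table.insert(ctx.tracing.spans, span)
    return span
end

-- Guard 2: only walk spans when it is present.
local function release(ctx)
    local tracing = ctx.tracing
    if not tracing then
        return
    end
    local spans = tracing.spans
    if spans then
        for i = #spans, 1, -1 do
            spans[i] = nil  -- simplification: the real code releases each span
        end
    end
    put(tracing)            -- ctx.tracing is deliberately not set to nil
end

local ctx = {}  -- models the ngx.ctx reused across the keepalive connection

-- First request on the connection.
start(ctx)
new_span(ctx, "request-1")
release(ctx)

-- Second request: start() sees the cleared table and re-initialises it.
start(ctx)
local ok = pcall(new_span, ctx, "request-2")
local span_count = #ctx.tracing.spans
release(ctx)
print(ok, span_count)  --> true 1
```

Checking tracing.spans rather than clearing ctx.tracing keeps the detection in one place and relies only on the pool's clear-on-release behaviour described above.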

Which issue(s) this PR fixes:

Fixes #13200

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change (t/node/tracer.t)
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

dosubot bot added the size:L (this PR changes 100-499 lines, ignoring generated files) and bug (something isn't working) labels on Apr 16, 2026

Development

Successfully merging this pull request may close these issues.

bug: apisix 3.16.0 comprehensive tracing breaks with HTTPS keepalive connections
