fix(trogon-gateway): restore server_info().max_payload once async-nats race is resolved #122

@yordis

Background

trogon_nats::connect always sets retry_on_initial_connect, which causes async-nats to return the Client before the background connection task has populated the ServerInfo watch channel. Calling server_info().max_payload immediately after connect() therefore returns 0 — the Default value for usize — not the actual server limit.

MaxPayload::from_server_limit(0) saturates to a threshold of 0, routing every payload through the object-store claim-check path regardless of size.
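To make the failure mode concrete, here is a minimal sketch of how a saturating threshold collapses to 0. The `MaxPayload` below is a hypothetical stand-in, not the gateway's actual type: the `HEADROOM_BYTES` constant, the `threshold()` accessor, and the `needs_claim_check` method are assumptions made for illustration only.

```rust
/// Hypothetical stand-in for the gateway's `MaxPayload`; the real type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaxPayload {
    threshold: usize,
}

/// Assumed headroom reserved for headers and protocol overhead (illustrative value).
pub const HEADROOM_BYTES: usize = 8 * 1024;

impl MaxPayload {
    /// Derives the inline threshold from the server limit.
    /// `saturating_sub` clamps at 0 instead of underflowing, so a limit of 0 yields 0.
    pub fn from_server_limit(server_limit: usize) -> Self {
        Self {
            threshold: server_limit.saturating_sub(HEADROOM_BYTES),
        }
    }

    pub fn threshold(&self) -> usize {
        self.threshold
    }

    /// Returns true when the payload is too large to publish inline
    /// and would have to go through the object-store claim-check path.
    pub fn needs_claim_check(&self, payload_len: usize) -> bool {
        payload_len > self.threshold
    }
}
```

In this sketch, `MaxPayload::from_server_limit(0).needs_claim_check(1)` is true, so even a one-byte payload is diverted, while a 1 MiB limit keeps small payloads inline.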

Root cause in async-nats

async-nats maintains two separate sources of truth for max_payload:

  1. An Arc<AtomicUsize> initialized to 1024 * 1024 — used internally for publish size enforcement, correctly updated on connection
  2. A ServerInfo watch channel seeded with ServerInfo::default() (where max_payload = 0) when retry_on_initial_connect is set — this is what server_info().max_payload reads

The bug is that the watch channel seed uses Default::default() instead of the same 1 MiB default the Arc uses. A comment in async-nats at line 1017 even says "We're setting it to the default server payload size": the correct default was applied to the Arc but not to the ServerInfo seed.
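The divergence between the two sources can be modelled with standard-library types only. This is a simplified model, not async-nats code: `ClientModel`, `new_before_connect`, `on_connected`, and `enforced_max_payload` are hypothetical names, and a `Mutex` stands in for the watch channel.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the server info struct; only the field discussed here.
#[derive(Debug, Default, Clone)]
pub struct ServerInfo {
    pub max_payload: usize,
}

/// Hypothetical model of a client holding two copies of the payload limit.
pub struct ClientModel {
    enforced_max_payload: Arc<AtomicUsize>,
    info: Arc<Mutex<ServerInfo>>,
}

impl ClientModel {
    /// State right after connect() returns but before the server's INFO arrives.
    pub fn new_before_connect() -> Self {
        Self {
            // Seeded with the 1 MiB default.
            enforced_max_payload: Arc::new(AtomicUsize::new(1024 * 1024)),
            // Seeded with Default::default(), so max_payload starts at 0.
            info: Arc::new(Mutex::new(ServerInfo::default())),
        }
    }

    /// Simulates the background task receiving the real server limit.
    pub fn on_connected(&self, server_limit: usize) {
        self.enforced_max_payload.store(server_limit, Ordering::Relaxed);
        self.info.lock().unwrap().max_payload = server_limit;
    }

    pub fn server_info(&self) -> ServerInfo {
        self.info.lock().unwrap().clone()
    }

    pub fn enforced_max_payload(&self) -> usize {
        self.enforced_max_payload.load(Ordering::Relaxed)
    }
}
```

Before `on_connected` runs, the two getters disagree (0 versus 1 MiB); afterwards they report the same value. Seeding `ServerInfo` with the 1 MiB default would remove the disagreement in this model.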

Current workaround

The limit is configured explicitly through the TROGON_GATEWAY_NATS_MAX_PAYLOAD_BYTES environment variable (default 1_048_576); see PR #119.
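For illustration, resolving such a setting might look roughly like the sketch below. The function and constant names are hypothetical, and the actual implementation in PR #119 may parse, validate, or report errors differently.

```rust
use std::num::ParseIntError;

/// Default taken from the issue text: 1 MiB.
pub const DEFAULT_NATS_MAX_PAYLOAD_BYTES: usize = 1_048_576;

/// Hypothetical helper: resolves the limit from the raw env var value, if any.
/// Returns the default when the variable is unset and an error when it is
/// set to something that is not a plain unsigned integer.
pub fn resolve_nats_max_payload_bytes(raw: Option<&str>) -> Result<usize, ParseIntError> {
    match raw {
        Some(value) => value.trim().parse::<usize>(),
        None => Ok(DEFAULT_NATS_MAX_PAYLOAD_BYTES),
    }
}
```

A caller could pass `std::env::var("TROGON_GATEWAY_NATS_MAX_PAYLOAD_BYTES").ok().as_deref()` as the argument. Note that Rust's integer parsing rejects digit separators, so the variable would need to be set as `1048576`, not `1_048_576`.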

What to restore when fixed

// In crates/trogon-gateway/src/main.rs
let server_max_payload = nats.server_info().max_payload;
let max_payload = MaxPayload::from_server_limit(server_max_payload);

Then remove nats_max_payload_bytes from GatewayConfig, ResolvedConfig, and config.rs, and drop the TROGON_GATEWAY_NATS_MAX_PAYLOAD_BYTES env var.
