CBOR event replay truncation causes workflows to hang indefinitely (specVersion 3 / workflow 4.2.x) #1735

@thedogwiththedataonit

After upgrading to workflow@4.2.2 (specVersion 3 with CBOR queue transport), workflows with moderate step I/O sizes hang indefinitely after completing their initial steps. The queue callback fails with CBOR parsing errors, preventing subsequent steps from executing.

The workflow completes its first long-running step successfully (verified via step output logging), but then hangs in a processing state forever. No subsequent steps execute.

Error Messages
Two distinct transport-level errors appear in server logs:

Error 1 -- CBOR truncation:

Queue callback error: Error [WorkflowWorldError]: Failed to parse response body for GET /v3/runs/wrun_01KP718STBFFJBXMGVB5CXJC26/events?sortOrder=asc&remoteRefBehavior=lazy (Content-Type: application/cbor):
Error: Unexpected end of CBOR data
at processTicksAndRejections (null) {
status: undefined,
url: 'https://vercel-workflow.com/api/v3/runs/wrun_01KP718STBFFJBXMGVB5CXJC26/events?sortOrder=asc&remoteRefBehavior=lazy',
[cause]: Error: Unexpected end of CBOR data { incomplete: true }
}

Error 2 -- Multipart boundary missing:

Queue callback error: Error [MultipartParseError]: Invalid multipart stream: missing initial boundary
at processTicksAndRejections (null)

Key Observations
The { incomplete: true } flag on the CBOR cause error indicates the decoder ran out of input before the data item was complete, which points to the response body being truncated mid-stream rather than malformed at the encoding level.
Both errors happen during the queue callback phase (event replay), not during step execution.
The workflow's first step (prepareCsvStep) completes successfully and returns a valid result (verified in logs). The hang occurs when the runtime tries to fetch events for replay to execute the second step.
The issue appeared immediately after upgrading to 4.2.x. Prior versions (JSON transport) worked correctly for the same workflow.
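
To separate the truncation case from other parse failures in logs or retry logic, the `incomplete` flag can be checked by walking the error's cause chain. This is a minimal sketch, not part of the workflow SDK: `isTruncatedCborError` is a hypothetical helper name, and it assumes only the error shape shown in the log above (an `incomplete: true` property somewhere in the `cause` chain).

```typescript
// Hypothetical helper (not part of the workflow SDK).
// Assumes the error shape from the log above: a wrapper error whose
// `cause` carries `{ incomplete: true }` when the CBOR input ended early.
function isTruncatedCborError(err: unknown): boolean {
  let current: unknown = err;
  // Bound the walk so a cyclic cause chain cannot loop forever.
  for (let depth = 0; depth < 10 && current != null; depth++) {
    if (typeof current !== "object") {
      return false;
    }
    if ((current as { incomplete?: unknown }).incomplete === true) {
      return true;
    }
    current = (current as { cause?: unknown }).cause;
  }
  return false;
}
```

An error that matches is a candidate for re-fetching the events; the multipart boundary error (Error 2) carries no such flag and would not match.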

Reproduction Context
Workflow: A CSV import workflow that processes 20,000 rows
Step 1 output: A PrepareCsvResult object containing chunkBatches: Array<Array> (8 arrays of 5 Convex document IDs each = ~2KB of serialized data)
Step count: 12+ steps total (prepare, indexes, status update, cleanup, 8x process chunk batches, finalize)
Each subsequent step requires replaying all prior events via GET /v3/runs/{runId}/events
Even though the individual step I/O is small (~2KB), the cumulative events payload grows with each completed step. The CBOR response truncation suggests either a response size limit or a streaming issue in the Vercel Workflow API's CBOR encoder.
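
The growth can be estimated with a rough model. The sketch below assumes every completed step adds the same number of bytes to the event log and that the events fetched before step k cover the k-1 steps already completed; real event sizes vary and include framing overhead, so the numbers are illustrative only. `estimateReplayBytes` is a hypothetical name.

```typescript
interface ReplayEstimate {
  largestResponseBytes: number; // size of the last (largest) events response
  totalBytesFetched: number;    // sum over all replays in the run
}

// Rough model only: assumes a constant number of event bytes per completed step.
function estimateReplayBytes(stepCount: number, bytesPerStep: number): ReplayEstimate {
  let largestResponseBytes = 0;
  let totalBytesFetched = 0;
  for (let step = 1; step <= stepCount; step++) {
    // Before running step `step`, the runtime replays events for the prior steps.
    const responseBytes = (step - 1) * bytesPerStep;
    largestResponseBytes = Math.max(largestResponseBytes, responseBytes);
    totalBytesFetched += responseBytes;
  }
  return { largestResponseBytes, totalBytesFetched };
}

// 12 steps at ~2KB each, matching the figures in this report.
const estimate = estimateReplayBytes(12, 2048);
console.log(estimate.largestResponseBytes, estimate.totalBytesFetched);
```

Under these assumptions the largest single response stays around 22KB for this workflow, which is small enough that a plain size limit seems an unlikely explanation on its own.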

We worked around this by minimizing step I/O: storing data in our database (Convex) instead of returning it from steps, then fetching it via a lightweight step that returns only a reference. This reduces the cumulative events payload.
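
The shape of that workaround, reduced to plain functions, looks roughly like this. It is a sketch under stated assumptions: the `Map` stands in for the Convex tables, and `storeBatches`, `readBatch`, and `BatchesRef` are hypothetical names. In the real workflow these calls sit inside async step functions and go to the database.

```typescript
type ChunkBatches = string[][];

// Stand-in for the database; the real workflow writes to Convex instead.
const batchStore = new Map<string, ChunkBatches>();
let nextRefId = 0;

// The only value a step returns: a small reference, not the data itself.
interface BatchesRef {
  ref: string;
  batchCount: number;
}

// Called from the prepare step: persist the batches, return only a reference.
function storeBatches(chunkBatches: ChunkBatches): BatchesRef {
  const ref = `batches_${nextRefId++}`;
  batchStore.set(ref, chunkBatches);
  return { ref, batchCount: chunkBatches.length };
}

// Called from a processing step: resolve the reference inside the step,
// so the batch contents never become step output in the event log.
function readBatch(batchesRef: BatchesRef, index: number): string[] {
  const batches = batchStore.get(batchesRef.ref);
  if (batches === undefined) {
    throw new Error(`Unknown reference: ${batchesRef.ref}`);
  }
  const batch = batches[index];
  if (batch === undefined) {
    throw new Error(`Batch index out of range: ${index}`);
  }
  return batch;
}
```

The step output recorded per step is then a short string and a count, independent of how many document IDs each batch holds.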

However, this only mitigates what looks like a transport-level issue: the CBOR response appears to be truncated even when individual payloads are small, and larger workflows will always accumulate more events.

Environment
workflow: 4.2.2
@workflow/core: 4.2.2
@workflow/next: 4.0.3
Next.js 16 with withWorkflow() plugin
Deployed on Vercel (Fluid Compute)
Node.js 25.2.1

Suspected Root Cause
The CBOR queue transport was introduced in 4.2.0-beta.78 (PR #1627, specVersion 3). The GET /v3/runs/{runId}/events endpoint appears to truncate the CBOR-encoded response body under certain conditions, causing the CBOR decoder to fail with { incomplete: true }. Old deployments (specVersion < 3) use JSON transport and are unaffected.
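
One way to test the truncation hypothesis from the client side is to compare the declared Content-Length with the number of bytes actually received. A minimal sketch, with `checkBodyLength` as a hypothetical name; it assumes the server sends a Content-Length header and that the body is not transparently decompressed by the HTTP client (in either of those cases the comparison is not meaningful and the result should be treated as unknown).

```typescript
type BodyCheck = "complete" | "truncated" | "unknown";

// Compares the declared Content-Length with the bytes actually read.
// Returns "unknown" when the header is missing or unparseable (for example
// with chunked transfer encoding) or when more bytes arrived than declared.
function checkBodyLength(contentLength: string | null, receivedBytes: number): BodyCheck {
  if (contentLength === null) {
    return "unknown";
  }
  const declared = Number.parseInt(contentLength, 10);
  if (!Number.isFinite(declared) || declared < 0) {
    return "unknown";
  }
  if (receivedBytes === declared) {
    return "complete";
  }
  return receivedBytes < declared ? "truncated" : "unknown";
}
```

With `fetch`, the inputs would be `res.headers.get("content-length")` and `(await res.arrayBuffer()).byteLength`. A "truncated" result would support the suspicion that the body is cut off before it reaches the decoder, while "complete" would shift suspicion toward the encoder producing a short but fully delivered body.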

Questions
Is there a known response size limit for the events endpoint with CBOR transport?
Is there a way to force JSON transport on specVersion 3 deployments as a temporary fallback?
Are there recommended limits for step I/O payload sizes to avoid this?
