
[pull] main from triggerdotdev:main#119

Merged
pull[bot] merged 10 commits into Dustin4444:main from triggerdotdev:main
May 14, 2026


@pull pull Bot commented May 14, 2026


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

ericallam and others added 10 commits May 14, 2026 13:12
Adds Sessions, a durable, run-aware stream primitive that scopes
session.in / session.out records to a session (not a single run).
Records survive run boundaries; reconnect-from-last-event-id is built in.

Server foundation:
- New /realtime/v1/sessions/:session/:io/append + /records routes
- sessionRunManager + sessionsRepository + clickhouseSessionsRepository
- mintRunToken for short-lived per-session tokens
- s2Append retry-with-backoff + undici cause diagnostics
- /api/v[12]/packets/* exempt from customer rate limits
- BackgroundWorker schema gains taskKind enum (TASK, AGENT, SCHEDULED)
- TaskRun.taskKind column + clickhouse 029_add_task_kind_to_task_runs_v2

Core types:
- new sessionStreams, inputStreams, realtimeStreams packages in @trigger.dev/core
- session-streams-api / realtime-streams-api surface

Sessions dashboard UI (the primitive's own viewer):
- /sessions index + detail routes
- SessionsTable, SessionFilters, SessionStatus, CloseSessionDialog
- AGENT/SCHEDULED filter in RunFilters + TaskTriggerSource

Includes the sessions-primitive changeset.
`tasks.trigger`, `tasks.batchTrigger`, `batch.create`,
`wait.createToken`, `wait.forDuration`, and the input/session stream
waitpoint endpoints all accept a caller-supplied `idempotencyKey` and
store it verbatim against a composite-unique index on `TaskRun`,
`BatchTaskRun`, or `Waitpoint`. The schemas had no length cap, so a
sufficiently long high-entropy key produced an index row larger than the
underlying storage layer can hold. The insert failed at the database,
and the caller saw a generic 500 from
`RunEngineTriggerTaskService.call()` / `CreateBatchService` / waitpoint
creation, depending on the endpoint.

Keys produced by `idempotencyKeys.create()` are 64-character SHA-256
hashes and never trip this — it only manifests for direct REST callers
(or SDK callers passing a raw string they generated themselves).
Low-entropy keys also sail through, because the storage layer compresses
repeated bytes before they reach the index, which is why the failure
mode is intermittent and tied to caller-side key shape.

## Fix

Add `.max(2048, "<field> must be 2048 characters or less")` to the seven
schemas that feed an indexed `idempotencyKey` column:

- `TriggerTaskRequestBody.options.idempotencyKey`
- `BatchTriggerTaskItem.options.idempotencyKey`
- `CreateBatchRequestBody.idempotencyKey`
- `CreateWaitpointTokenRequestBody.idempotencyKey`
- `CreateInputStreamWaitpointRequestBody.idempotencyKey`
- `CreateSessionStreamWaitpointRequestBody.idempotencyKey`
- `WaitForDurationRequestBody.idempotencyKey`

Plus the `idempotency-key` HTTP header on the trigger route (and the
three batch routes that re-export `HeadersSchema`). The header schema is
lifted out of `api.v1.tasks.$taskId.trigger.ts` into
`apps/webapp/app/v3/triggerHeaders.server.ts` so it can be exercised in
tests without dragging the route's import-time side effects.

The 2048 character ceiling is chosen to sit safely under the per-row
index limit while staying generous against existing callers — keys that
fit before still fit. Oversized keys now return a structured Zod 400
instead of a generic 500.
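
As a rough illustration of the boundary behaviour described above, the sketch below is a dependency-free stand-in for the Zod `.max(2048, ...)` check. `MAX_IDEMPOTENCY_KEY_LENGTH`, `checkIdempotencyKey`, and the result shape are hypothetical names chosen for this example; the real schemas live in the files listed in this PR.

```ts
// Hypothetical stand-in for the Zod `.max(2048, ...)` cap; not the real schema code.
const MAX_IDEMPOTENCY_KEY_LENGTH = 2048;

type KeyCheckResult =
  | { ok: true }
  | { ok: false; status: 400; message: string };

// Accepts keys up to and including the cap and rejects longer ones with a
// message in the same shape as the one this PR describes.
function checkIdempotencyKey(field: string, key: string): KeyCheckResult {
  if (key.length > MAX_IDEMPOTENCY_KEY_LENGTH) {
    return {
      ok: false,
      status: 400,
      message: `${field} must be ${MAX_IDEMPOTENCY_KEY_LENGTH} characters or less`,
    };
  }
  return { ok: true };
}

// A key exactly at the boundary passes; a 3000-character key is rejected.
const atBoundary = checkIdempotencyKey("idempotencyKey", "a".repeat(2048));
const overBoundary = checkIdempotencyKey("idempotencyKey", "a".repeat(3000));
```

The point of validating at the schema layer is that the caller gets a 400 with a field-specific message before the insert is ever attempted.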

The limit is documented under `Idempotency key` in `docs/limits.mdx` and
as a `<Note>` in `docs/idempotency.mdx`.

## Test plan

- [x] 15 schema unit tests added
(`packages/core/src/v3/schemas/idempotencyKey.test.ts`,
`apps/webapp/test/routes/triggerHeaders.test.ts`) —
rejection-with-message + boundary acceptance for each capped schema. The
webapp test exercises the extracted `TriggerHeadersSchema` directly with
no mocks.
- [x] `pnpm run build --filter @trigger.dev/core`
- [x] `pnpm run typecheck --filter webapp`
- [x] End-to-end verified locally: baseline (small key) → 200; 3000-char
high-entropy header → 400 with the expected Zod error; same key at the
2048 boundary → 200; same key with the cap reverted → the database
rejected the insert and the route returned 500 to the caller. Cap
restored.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…3542)

## Summary

A `/sessions` dashboard for inspecting durable Sessions, an `AGENT` /
`SCHEDULED` task-kind filter for the runs list, and the server-side
hardening (rate-limit exemption for packets, retry-with-backoff on
stream appends, typed too-large-chunk error) that the `chat.agent`
runtime in #3543 needs. Builds on the Sessions primitive shipped in
#3417.

## Design

The Sessions list + detail routes mirror the run inspector pattern.
`TaskTriggerSource` gains `AGENT` and `SCHEDULED` values, persisted on
`BackgroundWorker.taskKind` and `TaskRun.taskKind` (plus a matching
Clickhouse column), so the runs list can filter by kind.

New `@trigger.dev/core` modules — `sessionStreams`, `inputStreams`, a
`sessionStreamInstance` for realtime streams, and the
`realtime-streams-api` / `session-streams-api` surfaces — expose the
typed shapes that chat.agent will use to drive `session.out`.
`ChatChunkTooLargeError` lets the runtime drop oversized chunks with a
typed surface instead of failing the run. `s2Append` retries transient
failures with exponential backoff. `/api/v[12]/packets/*` is exempt from
customer rate limits so chat snapshot reads and writes don't get
throttled under load.
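
The retry behaviour mentioned for `s2Append` can be pictured with a generic helper like the one below. This is a minimal sketch under assumptions: `retryWithBackoff`, `BackoffOptions`, and the doubling schedule are illustrative, and the actual `s2Append` code may classify errors and pick delays differently.

```ts
// Hypothetical helper; the real `s2Append` retry logic is not shown in this PR text.
type BackoffOptions = {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the first retry
  isTransient: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
};

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: BackoffOptions
): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!options.isTransient(error) || attempt >= options.maxAttempts) throw error;
      // The delay doubles on every retry: base, 2x base, 4x base, and so on.
      await sleep(options.baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
}
```

Injecting `sleep` keeps the helper testable without real timers; production callers would simply omit it.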

## Stack

Part of a 4-PR stack. Merge bottom-up.

1. **This PR** (#3542) → `main`
2. #3543 → #3542: `chat.agent` runtime + browser transport
3. #3545 → #3543: agent-view dashboard
4. #3546 → #3545: ai-chat reference + MCP tooling

Replaces #3173 (closed).

<!-- GitButler Footer Boundary Top -->
---
This is **part 5 of 5 in a stack** made with GitButler:
- <kbd>&nbsp;5&nbsp;</kbd> #3612
- <kbd>&nbsp;4&nbsp;</kbd> #3546
- <kbd>&nbsp;3&nbsp;</kbd> #3545
- <kbd>&nbsp;2&nbsp;</kbd> #3543
- <kbd>&nbsp;1&nbsp;</kbd> #3542 👈 
<!-- GitButler Footer Boundary Bottom -->
The `code` paths filter currently matches `**` minus a tiny exclusion
list, so a PR that only touches `.github/workflows/*.yml` still flips
`code == true` and runs typecheck (~2 min on the runner).

Exclude `.github/**` from `code`, then re-include just `pr_checks.yml`
and `typecheck.yml` so a change to either of those still triggers the
full code check matrix.

Effect:
- workflow-only PRs (this one, future dependabot/codeql/etc.) skip
typecheck; `all-checks` treats the skipped job as non-failure so the
required status passes.
- modifying `pr_checks.yml` or `typecheck.yml` themselves still triggers
typecheck.
- the existing per-suite filters (`webapp`, `packages`, `internal`,
`cli`, `sdk`) already re-include the specific workflows that gate them,
so they're unaffected.
Adds a Mon 08:00 UTC workflow that posts a summary of open Dependabot
alerts and PRs to Slack. Uses env-scoped secrets so the alerts PAT and
Slack token are only available to this workflow.
Adds the chat.agent({...}) task definition (server runtime) and the
browser-side TriggerChatTransport + AgentChat that drives it from a
React or Next.js app. The runtime sits on top of the Sessions primitive
and handles the durable conversational task lifecycle.

Server runtime:
- chat.agent({...}) — session-aware task definition
- Lifecycle hooks: onChatStart, onTurnStart, onTurnComplete, onAction,
  onValidateMessages, hydrateMessages
- chat.history read primitives for HITL flows
- chat.local, chat.headStart, chat.handover, oomMachine
- Delta-only wire + S3 snapshot reconstruction at run boot
- Actions are no longer turns

Browser transport:
- TriggerChatTransport (ai-sdk Transport): delta-only wire sends,
  SSE reconnection with lastEventId resume, stop/abort cleanup,
  dynamic accessToken refresh
- AgentChat: direct programmatic API
- useTriggerChatTransport (React hook)
- chat-tab-coordinator: cross-tab leader election

Includes the chat-agent, chat-agent-delta-wire-snapshots,
chat-history-read-primitives, chat-head-start, chat-actions-no-turn,
chat-session-attributes, agent-skills, and mock-chat-agent-test-harness
changesets.
## Summary

Adds `chat.agent({...})`, a durable conversational task runtime, plus
the browser-side `TriggerChatTransport` + `AgentChat` that drive it from
a React or Next.js app. Conversations survive page refreshes, network
blips, idle suspend, and process restarts, with built-in tools, HITL
approvals, multi-turn state, and stop-mid-stream cancellation. Builds on
#3542.

## Design

Each `/in/append` request carries at most one new message. The agent
reconstructs prior history at run boot from an object-store snapshot
plus a `session.out` replay tail, so conversation context lives
server-side instead of bloating the wire. Awaited snapshot writes after
every `onTurnComplete` keep the chain durable across idle suspend.
Registering `hydrateMessages` short-circuits both paths for customers
who own their own conversation store.
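
The snapshot-plus-replay idea can be sketched as a pure function. The record shapes here (`ChatMessage`, `Snapshot`, `OutRecord`, a numeric `seq`) are assumptions made for illustration; the real snapshot and `session.out` formats are not described in this PR text.

```ts
// Hypothetical shapes; the real snapshot and `session.out` record formats are not shown here.
type ChatMessage = { id: string; role: "user" | "assistant"; text: string };
type Snapshot = { lastSeq: number; messages: ChatMessage[] };
type OutRecord = { seq: number; message: ChatMessage };

// Rebuilds history from the last snapshot plus any records appended after it.
function rebuildHistory(snapshot: Snapshot | undefined, tail: OutRecord[]): ChatMessage[] {
  const history: ChatMessage[] = snapshot ? snapshot.messages.slice() : [];
  const lastSeq = snapshot ? snapshot.lastSeq : -1;
  const seen = new Set(history.map((m) => m.id));
  const ordered = tail.slice().sort((a, b) => a.seq - b.seq);
  for (const record of ordered) {
    // Skip records already covered by the snapshot, and duplicate message ids.
    if (record.seq <= lastSeq || seen.has(record.message.id)) continue;
    seen.add(record.message.id);
    history.push(record.message);
  }
  return history;
}
```

Because the server can always rebuild context this way, each request only needs to carry the one new message.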

Lifecycle hooks — `onChatStart`, `onTurnStart`, `onTurnComplete`,
`onAction`, `onValidateMessages`, `hydrateMessages` — cover validation,
persistence, and post-turn work. `chat.history` exposes read primitives
(`getPendingToolCalls`, `getResolvedToolCalls`, `extractNewToolResults`,
`findMessage`, `all`) for HITL flows. `chat.local` gives per-run typed
state with Proxy access and dirty tracking. `chat.headStart` bridges
first-turn TTFC via a customer HTTP handler. `oomMachine` opts a chat
into one-shot OOM-retry on a larger machine.
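
To make "Proxy access and dirty tracking" concrete, here is one way such state could work. This is not the `chat.local` implementation; `trackDirty` and its return shape are hypothetical.

```ts
// Hypothetical sketch of Proxy-based dirty tracking; not the `chat.local` implementation.
function trackDirty<T extends object>(initial: T) {
  const dirtyKeys = new Set<string>();
  const state = new Proxy(initial, {
    set(target, key, value, receiver) {
      // Only mark a key dirty when the value actually changes.
      if (typeof key === "string" && !Object.is(Reflect.get(target, key, receiver), value)) {
        dirtyKeys.add(key);
      }
      return Reflect.set(target, key, value, receiver);
    },
  });
  return {
    state,
    isDirty: () => dirtyKeys.size > 0,
    // Returns the changed keys and resets tracking, for example after persisting.
    flush: () => {
      const keys = Array.from(dirtyKeys);
      dirtyKeys.clear();
      return keys;
    },
  };
}
```

Callers read and write `state` like a plain object, and only pay for persistence when `isDirty()` reports a change.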

`TriggerChatTransport` is a `Transport` implementation for Vercel's
ai-sdk `useChat`: delta-only wire sends, SSE reconnection with
`lastEventId` resume, stop/abort cleanup, dynamic `accessToken` refresh,
`X-Peek-Settled` fast-close. `AgentChat` is the direct programmatic
equivalent. A cross-tab coordinator does leader election so multiple
open tabs share a single SSE.

```ts
import { chat } from "@trigger.dev/sdk/ai";
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export const myChat = chat.agent({
  id: "my-chat",
  // `signal` is forwarded so stop-mid-stream cancellation aborts the model call.
  run: async ({ messages, signal }) =>
    streamText({ model: openai("gpt-4o"), messages, abortSignal: signal }),
});
```
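
The `lastEventId` resume mentioned in the Design section follows the usual Server-Sent Events pattern: remember the id of the last event seen and send it back on reconnect. The sketch below shows only that bookkeeping. `ResumeCursor` and `SseEvent` are hypothetical names, and `TriggerChatTransport` may track this differently.

```ts
// Hypothetical bookkeeping for SSE resume; not the TriggerChatTransport internals.
type SseEvent = { id?: string; data: string };

// Tracks the most recent event id so a reconnect can resume where it left off.
class ResumeCursor {
  private lastEventId: string | undefined;

  observe(event: SseEvent): void {
    // Events without an id (for example heartbeats) leave the cursor unchanged.
    if (event.id !== undefined) this.lastEventId = event.id;
  }

  // `Last-Event-ID` is the standard SSE request header for resuming a stream.
  reconnectHeaders(): Record<string, string> {
    return this.lastEventId === undefined ? {} : { "Last-Event-ID": this.lastEventId };
  }
}
```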
#3610)

Concurrent `POST /api/v1/deployments` requests for the same environment
race on the `WorkerDeployment(environmentId, version)` unique
constraint. Both requests read the same latest deployment via
`findFirst`, compute the same next version via
`calculateNextBuildVersion`, and both attempt
`prisma.workerDeployment.create()` — one wins, the other crashes with
Prisma `P2002`. The bug is a classic TOCTOU between the version read and
the version write; it's been latent since the version-assignment logic
was first added but only fires when two deploys land within milliseconds
of each other (CI matrices, retried CLI calls, webhook-triggered
redeploys).

## Approach

Extracts the version assignment + create into a small helper
`createDeploymentWithNextVersion`
(`apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.ts`).
The helper retries on `P2002 (environmentId, version)` up to 5 times
with randomised 5–50ms jitter so N concurrent racers don't loop in
lockstep. Each attempt re-reads the latest version, recomputes via
`calculateNextBuildVersion`, and re-runs the caller's `buildData`
callback so version-dependent fields (image ref tag, friendlyId) are
always consistent with the version actually persisted. A `logger.warn`
fires per collision so the retry rate is observable in production logs.

When retries are exhausted, the helper throws a dedicated
`DeploymentVersionCollisionError` carrying `environmentId`, `attempts`,
and `lastAttemptedVersion`, with the original
`PrismaClientKnownRequestError` attached as `cause`. Sentry walks the
`cause` chain natively, so contention exhaustion shows up as a
distinguishable wrapper exception linked to the underlying `P2002`
rather than a generic unique-constraint violation that looks identical
to every other duplicate-key bug.

The behavioural change is limited to "catch P2002 and retry instead of
crashing." The image ref computation stays inside the builder callback
(same call site as before the refactor), so ECR / non-ECR behaviour, S2
stream creation order, and all downstream side effects are unchanged.
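
The retry-on-conflict shape described above can be sketched as follows. This is a simplified illustration, not the helper from this PR: versions are plain integers here, `readLatestVersion` and `create` stand in for the `findFirst` / `calculateNextBuildVersion` / `prisma.workerDeployment.create()` steps, and a thrown object with `code: "P2002"` stands in for `PrismaClientKnownRequestError`.

```ts
// Hypothetical sketch; only the error name, the P2002 code, the 5-retry default,
// and the 5-50ms jitter come from the PR description.
class DeploymentVersionCollisionError extends Error {
  constructor(
    readonly environmentId: string,
    readonly attempts: number,
    readonly lastAttemptedVersion: string,
    readonly cause: unknown
  ) {
    super(`Version collision in ${environmentId} after ${attempts} attempts`);
    this.name = "DeploymentVersionCollisionError";
  }
}

const isUniqueViolation = (e: unknown): boolean =>
  typeof e === "object" && e !== null && (e as { code?: unknown }).code === "P2002";

async function createWithNextVersion<T>(args: {
  environmentId: string;
  readLatestVersion: () => Promise<number>;
  create: (version: number) => Promise<T>;
  maxRetries?: number;
}): Promise<T> {
  const maxRetries = args.maxRetries ?? 5;
  for (let attempt = 1; ; attempt++) {
    // Re-read on every attempt so the version reflects what other racers persisted.
    const version = (await args.readLatestVersion()) + 1;
    try {
      return await args.create(version);
    } catch (error) {
      if (!isUniqueViolation(error)) throw error; // unrelated errors propagate untouched
      if (attempt > maxRetries) {
        throw new DeploymentVersionCollisionError(
          args.environmentId,
          attempt,
          String(version),
          error
        );
      }
      // Randomised 5-50ms jitter so concurrent racers do not retry in lockstep.
      await new Promise<void>((resolve) => setTimeout(resolve, 5 + Math.random() * 45));
    }
  }
}
```

Each collision means another racer has persisted a version, so the re-read on the next attempt always observes progress.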

## Non-goals

- No new database migrations, no schema changes, no isolation-level /
locking changes. A serialisable transaction or advisory lock would also
fix this; retry-on-conflict is the smaller change that keeps the
existing version-allocation logic intact.
- Does not touch the analogous `calculateNextBuildVersion` call in
`createBackgroundWorker.server.ts`, which likely has the same race shape
against `BackgroundWorker`'s unique constraint — flagged as a follow-up.

## Test plan

- [x] `pnpm run typecheck --filter webapp` passes (no new errors in the
modified files).
- [x] Three real-Postgres tests in
`apps/webapp/test/createDeploymentWithNextVersion.test.ts` via
`containerTest`:
- 5 concurrent calls all produce distinct, persistable versions
(`Set(versions).size === concurrency`). The naive read-then-create
version of the helper fails this test with the exact same `P2002` seen
in production; the retry version passes.
- Non-`P2002` errors raised from the `buildData` callback propagate
immediately without retry, builder invoked exactly once.
- With `maxRetries: 0`, concurrent racers surface the wrapped
`DeploymentVersionCollisionError` (not a raw `P2002`); `environmentId`,
`attempts`, `lastAttemptedVersion` are populated and `error.cause.code
=== "P2002"`.
- [x] Existing `apps/webapp/test/getDeploymentImageRef.test.ts` still
green (the file was untouched in the final diff).

## Follow-ups (not in this PR)

- `createBackgroundWorker.server.ts` likely has the same TOCTOU shape
against its background-worker version unique constraint — should use the
same helper.
- Sentry visibility check: confirm `error.cause` chain renders as a
linked exception in the Sentry UI when the wrapped error fires (requires
a sandboxed triggering of the exhaustion path).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dashboard surfaces for inspecting and debugging chat.agent runs.
Depends on the Sessions primitive (L1) and chat.agent runtime (L2+L3).

Run inspector — chat-aware:
- AgentView + AgentMessageView (run inspector tab for chat.agent runs)
- AIChatMessages + AISpanDetails + types.ts (per-span chat message
  rendering, tool-call/tool-output handling)
- PromptSpanDetails (gen_ai.* span detail panel)
- StreamdownRenderer + shikiTheme (markdown renderer with shiki
  highlighting and v2 patch)
- useAutoScrollToBottom hook

Playground UI (interactive chat.agent debugger):
- /playground index + /playground/$agentParam routes
- /agents route + AgentListPresenter
- PlaygroundPresenter (per-org basin variants, clientData wiring)
- realtime session routes for playground + run inspector chat
- AI-generate-payload + AIPayloadTabContent for the test panel

Navigation + theming:
- SideMenu links for Agents and Playground
- BlankStatePanels copy updates
- tailwind config + tailwind.css storybook hooks
- streamdown@2 dep in apps/webapp/package.json

Includes agent-view-sessions, playground-trigger-config-fields,
run-agent-view, and streamdown-v2-upgrade .server-changes.
## Summary

A chat-aware run inspector and a `/playground` UI for testing
`chat.agent` tasks interactively. Builds on #3543's runtime.

## Design

The run inspector grows a new tab that renders the conversation chain
for any `chat.agent`-kind run. It subscribes to the run's session
streams, threads chat parts through a per-message renderer, and uses a
shared markdown + Shiki component for code highlighting (also used by
the test-payload panel).

The playground is a standalone `/playground` route that lets you drive a
deployed chat agent from the dashboard — pick a task, send messages,
watch tool calls render, and see span detail on every turn. The matching
`/agents` list view shows all deployed agents in the project.
@pull pull Bot locked and limited conversation to collaborators May 14, 2026
@pull pull Bot added the ⤵️ pull label May 14, 2026
@pull pull Bot merged commit 16538f6 into Dustin4444:main May 14, 2026
0 of 3 checks passed