[pull] main from triggerdotdev:main#113

Merged
pull[bot] merged 4 commits into Dustin4444:main from triggerdotdev:main
May 12, 2026
@pull pull Bot commented May 12, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

github-actions Bot and others added 4 commits May 12, 2026 11:33
## Summary
1 improvement, 1 bug fix.

## Improvements
- Fail attempts on uncaught exceptions instead of hanging to
`MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`)
emitting `"error"` with no `.on("error", ...)` listener escalates to
`uncaughtException`, which the worker previously reported but did not
act on — runs drifted to maxDuration with empty attempts. They now fail
fast with the original error and status `FAILED`, and respect the task's
normal retry policy. You should still attach `.on("error", ...)`
listeners to long-lived clients to handle errors gracefully.
([#3529](#3529))
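
The escalation described above can be reproduced with a plain Node
`EventEmitter`. This is a minimal sketch, not the worker's code:
`FakeClient` and both helper functions are hypothetical stand-ins for a
long-lived client such as a Redis connection.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a long-lived client (for example a Redis connection).
class FakeClient extends EventEmitter {
  simulateFailure(): void {
    this.emit("error", new Error("connection reset"));
  }
}

// With no "error" listener, emit("error") throws the error itself. When that
// happens inside a callback that nothing wraps in try/catch, Node reports it
// as an uncaughtException.
function failsWithoutListener(): boolean {
  const client = new FakeClient();
  try {
    client.simulateFailure();
    return false;
  } catch {
    return true;
  }
}

// With a listener attached, the error goes to the handler and nothing throws.
function handledWithListener(): string[] {
  const client = new FakeClient();
  const seen: string[] = [];
  client.on("error", (err: Error) => {
    seen.push(err.message);
  });
  client.simulateFailure();
  return seen;
}
```

Attaching the listener at construction time keeps a dropped connection
from failing the attempt at all, which is usually preferable to relying
on the new fail-fast path.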

## Bug fixes
- Fix dev workers spinning at 100% CPU after the parent CLI disconnects.
Orphaned `trigger-dev-run-worker` (and indexer) processes were caught in
an `uncaughtException` feedback loop: a periodic IPC send via
`process.send` would throw `ERR_IPC_CHANNEL_CLOSED` once the parent
closed the channel, which re-entered the same handler that itself called
`process.send`, scheduled via `setImmediate` and amplified by
source-map-support's `prepareStackTrace`. Fixed by (1) silently dropping
packets in `ZodIpcConnection` when the channel is disconnected, (2)
adding a `process.on("disconnect", ...)` handler in dev workers so they
exit cleanly when the CLI closes the IPC channel, and (3) wrapping all
`uncaughtException`-path `process.send` calls in a `safeSend` guard that
checks `process.connected` and swallows synchronous throws.
([#3491](#3491))
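
A guard of the kind described in (3) might look like the sketch below.
It is an illustration under assumptions, not the code from #3491: the
`IpcChannel` interface and this `safeSend` signature are hypothetical,
and the channel is passed in explicitly so the example runs without a
parent process.

```typescript
// Minimal shape of the parts of `process` this sketch touches (assumed).
interface IpcChannel {
  connected?: boolean;
  send?: (message: unknown) => boolean;
}

// Returns true if the message was handed to the channel, false if dropped.
function safeSend(channel: IpcChannel, message: unknown): boolean {
  if (!channel.connected || typeof channel.send !== "function") {
    // Channel is closed or was never opened: drop the packet silently.
    return false;
  }
  try {
    channel.send(message);
    return true;
  } catch {
    // A synchronous throw (for example ERR_IPC_CHANNEL_CLOSED racing with the
    // `connected` check) is swallowed so it cannot re-enter the
    // uncaughtException handler.
    return false;
  }
}
```

In a worker, `process` itself would be passed as the channel. The
important property is that the guard never throws, so calling it from an
`uncaughtException` handler cannot start the loop again.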

<details>
<summary>Raw changeset output</summary>

# Releases
## @trigger.dev/build@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`

## trigger.dev@4.4.6

### Patch Changes

- Fix dev workers spinning at 100% CPU after the parent CLI disconnects.
Orphaned `trigger-dev-run-worker` (and indexer) processes were caught in
an `uncaughtException` feedback loop: a periodic IPC send via
`process.send` would throw `ERR_IPC_CHANNEL_CLOSED` once the parent
closed the channel, which re-entered the same handler that itself called
`process.send`, scheduled via `setImmediate` and amplified by
source-map-support's `prepareStackTrace`. Fixed by (1) silently dropping
packets in `ZodIpcConnection` when the channel is disconnected, (2)
adding a `process.on("disconnect", ...)` handler in dev workers so they
exit cleanly when the CLI closes the IPC channel, and (3) wrapping all
`uncaughtException`-path `process.send` calls in a `safeSend` guard that
checks `process.connected` and swallows synchronous throws.
([#3491](#3491))
- Fail attempts on uncaught exceptions instead of hanging to
`MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`)
emitting `"error"` with no `.on("error", ...)` listener escalates to
`uncaughtException`, which the worker previously reported but did not
act on — runs drifted to maxDuration with empty attempts. They now fail
fast with the original error and status `FAILED`, and respect the task's
normal retry policy. You should still attach `.on("error", ...)`
listeners to long-lived clients to handle errors gracefully.
([#3529](#3529))
-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`
    -   `@trigger.dev/build@4.4.6`
    -   `@trigger.dev/schema-to-json@4.4.6`

## @trigger.dev/core@4.4.6

### Patch Changes

- Fix dev workers spinning at 100% CPU after the parent CLI disconnects.
Orphaned `trigger-dev-run-worker` (and indexer) processes were caught in
an `uncaughtException` feedback loop: a periodic IPC send via
`process.send` would throw `ERR_IPC_CHANNEL_CLOSED` once the parent
closed the channel, which re-entered the same handler that itself called
`process.send`, scheduled via `setImmediate` and amplified by
source-map-support's `prepareStackTrace`. Fixed by (1) silently dropping
packets in `ZodIpcConnection` when the channel is disconnected, (2)
adding a `process.on("disconnect", ...)` handler in dev workers so they
exit cleanly when the CLI closes the IPC channel, and (3) wrapping all
`uncaughtException`-path `process.send` calls in a `safeSend` guard that
checks `process.connected` and swallows synchronous throws.
([#3491](#3491))
- Fail attempts on uncaught exceptions instead of hanging to
`MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`)
emitting `"error"` with no `.on("error", ...)` listener escalates to
`uncaughtException`, which the worker previously reported but did not
act on — runs drifted to maxDuration with empty attempts. They now fail
fast with the original error and status `FAILED`, and respect the task's
normal retry policy. You should still attach `.on("error", ...)`
listeners to long-lived clients to handle errors gracefully.
([#3529](#3529))

## @trigger.dev/python@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`
    -   `@trigger.dev/build@4.4.6`
    -   `@trigger.dev/sdk@4.4.6`

## @trigger.dev/react-hooks@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`

## @trigger.dev/redis-worker@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`

## @trigger.dev/rsc@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`

## @trigger.dev/schema-to-json@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`

## @trigger.dev/sdk@4.4.6

### Patch Changes

-   Updated dependencies:
    -   `@trigger.dev/core@4.4.6`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…3552)

Closes
[TRI-9234](https://linear.app/triggerdotdev/issue/TRI-9234/retry-task-process-sigsegv-errors-respecting-user-retry-config)

## What this changes

SIGSEGV crashes (`TASK_PROCESS_SIGSEGV`) will now be **retried when an
attempt fails**, in line with the task's configured retry settings
(`retry.maxAttempts` etc.) — the same path SIGTERM and uncaught
exceptions already use. Previously SIGSEGV was hard-classified as
non-retriable and failed the run on the first segfault, ignoring the
user's retry policy.

Tasks without a retry policy still fail fast on the first SIGSEGV.
Behaviour is unchanged for OOM kills (separate machine-bump retry path)
and SIGKILL_TIMEOUT.
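
The resulting behaviour for a SIGSEGV can be summarised in a few lines.
This is a hedged sketch only: `decideAfterSigsegv`, `RetryPolicy` and
the 1-based `attemptNumber` are hypothetical names chosen for
illustration, and it assumes `maxAttempts` counts total attempts. The
real decision lives in the run engine and `completeAttempt.server.ts`.

```typescript
// Assumed shape: only the field this sketch needs.
interface RetryPolicy {
  maxAttempts: number;
}

type Decision = "retry" | "fail_run";

// attemptNumber is assumed to be 1-based (the first attempt is 1).
function decideAfterSigsegv(attemptNumber: number, retry?: RetryPolicy): Decision {
  // No retry policy configured: fail fast on the first segfault, as before.
  if (!retry) {
    return "fail_run";
  }
  // Otherwise retry until the configured number of attempts is used up.
  return attemptNumber < retry.maxAttempts ? "retry" : "fail_run";
}
```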

## Deploy

**Only the webapp needs to ship.** The retry decision lives entirely in
the webapp:
- V2 path: `internal-packages/run-engine` (bundled into the webapp)
- V1 path: `apps/webapp/app/v3/services/completeAttempt.server.ts`

No supervisor, CLI, SDK, or customer-task-image changes required.
Customers do not need to redeploy. The `@trigger.dev/core` changeset is
just keeping the public package in sync — the published npm version
isn't what makes the fix work.

## Why retry

SIGSEGV in Node tasks is frequently non-deterministic across processes:

- **Native addon races** (`sharp`, `canvas`, `better-sqlite3`,
`node-rdkafka`, `bcrypt`, …) — libuv thread-pool work stepping on V8
handles. Different heap layout / thread schedule on a fresh process →
retry often succeeds.
- **JIT / GC interaction** — V8 turbofan deopt or GC during a native
callback. Timing-dependent.
- **Near-OOM in native code** — when RSS approaches the cgroup limit,
native allocations fail and poorly-written addons dereference NULL →
SIGSEGV instead of clean OOM-kill.
- **Host / hardware issues** — bit flips, kernel quirks. Retry lands on
a different host.

The genuinely deterministic case (a user-code bug always tripping the
same addon) is real, but a subset — and `maxAttempts` bounds the damage.

## Pre-existing inconsistency this resolves

- `shouldRetryError` returned `false` for `TASK_PROCESS_SIGSEGV` →
`fail_run`.
- `shouldLookupRetrySettings` already listed `TASK_PROCESS_SIGSEGV` as
retry-config-aware — but that branch was unreachable because
`shouldRetryError` short-circuited first in `retrying.ts:86-90`.
- We already retry `TASK_RUN_UNCAUGHT_EXCEPTION` (clearly a user-code
bug) under the user's retry policy; refusing to retry SIGSEGV was the
odd one out.

## Test plan

- [x] `pnpm exec vitest run test/errors.test.ts` in `packages/core` —
26/26 pass (4 new)
- [x] `pnpm run build --filter @trigger.dev/core`
- [ ] CI green on PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

Adds `.claude/REVIEW.md` — a repo-specific source of truth for what AI /
agent code reviewers should treat as critical in this codebase
(rolling-deploy safety, hot-table indexes, recovery-path queries,
testcontainers usage, etc.). Pairs with a Claude-based PR audit that
flags drift between REVIEW.md and the code as it evolves.

## How the audit works

Mirrors the existing `.github/workflows/claude-md-audit.yml` pattern. On
non-draft, non-fork PRs that touch code, `anthropics/claude-code-action`
reads REVIEW.md, samples the PR diff, and posts a sticky comment with up
to 3 of:

- `[stale]` — rule cites a path / function / table that's been removed
or renamed
- `[contradiction]` — code in the PR violates a current rule
- `[missing]` — PR introduces a new pattern future reviewers should know
about
- `[obsolete]` — rule asserts a constraint the repo has moved past

If nothing's off, posts `✅ REVIEW.md looks current for this PR.`

## Test plan

- [ ] Convert this PR to ready-for-review, confirm the audit runs and
posts a sticky comment
- [ ] Verify the audit doesn't run on fork PRs (gated by
`head.repo.full_name == github.repository`)
- [ ] Verify suggestions are actionable on at least one follow-up PR
…3499)

## Summary

Consolidates the webapp's authentication and authorization into a small
set of route helpers, replacing the ad-hoc `requireUser` /
`requireUserId` / `authenticatedEnvironmentForAuthentication` calls
scattered across routes. Same security model, but the per-request flow
(authenticate → authorize → load) now lives in one place per route
family.

Introduces a plugin seam (`@trigger.dev/plugins`) that lets the cloud
build install a richer RBAC implementation without touching webapp code.
The OSS fallback keeps the pre-RBAC permissive behaviour intact, so
self-hosted deployments work unchanged.

Adds a comprehensive end-to-end auth test suite that didn't exist before
— 193 `it()` blocks (vitest reports ~199 after `it.each` expansion)
covering API key, PAT and JWT auth across the public API surface, plus
dashboard session auth for admin pages.

## Changes

### Plugin contract — `@trigger.dev/plugins`

The `RoleBaseAccessController` interface is authoritative for both OSS
(fallback) and cloud (enterprise plugin):
- `authenticateBearer(request, { allowJWT? })` — API-key / public-JWT
auth, returns env + ability
- `authenticateSession(request, { userId, organizationId?, projectId?
})` — dashboard auth, caller resolves `userId` from the session cookie
and passes it in (no `helpers.getSessionUserId` callback — decouples the
plugin host from session-cookie code)
- `authenticatePat(request, { organizationId?, projectId? })` — PAT
auth, returns identity + `lastAccessedAt` so the host can throttle the
per-request update
- `authenticateAuthorize*` variants for the auth-and-check-in-one-call
cases
- `isUsingPlugin(): Promise<boolean>` — capability flag for UI /
branching where plugin-present-ness matters; replaces the
sentinel-string coupling that had `personalAccessToken.server` matching
`"RBAC plugin not installed"` literally

### Dashboard auth (started, partial rollout)

Admin and settings pages are migrated to a unified `dashboardLoader` /
`dashboardAction` helper that authenticates the session, runs an
authorization check, and exposes the result to the route. Other
dashboard routes are still on the old pattern; the remaining migration
is tracked in TRI-8730.

Migrated routes:
- `admin.*` (14 admin / back-office / feature-flags / LLM-models /
notifications / orgs / concurrency pages)
- `_app.orgs.$organizationSlug.settings.team`
- `_app.orgs.$organizationSlug.settings.roles`

### API / realtime / engine auth (complete for the migrated families)

71 routes migrated to a unified `apiBuilder` that centralizes Bearer /
PAT / Public-JWT authentication and applies the per-route authorization
check before the handler runs. Includes:
- `api.v1.*` and `api.v2.*` and `api.v3.*` — tasks, runs, batches,
queues, prompts, deployments, query, sessions, waitpoints, packets,
workers, idempotency keys
- `realtime.v1.*` — runs, batches, sessions, streams
- `engine.v1.*` — dev / worker-action protocols

29 routes are still on the legacy `authenticateApiRequest*` helpers,
tracked as a post-deploy follow-up in TRI-9228.

Multi-resource auth direction is now explicit at the call site via
`anyResource(...)` (OR) and `everyResource(...)` (AND). Bare arrays no
longer typecheck — fixes a class of bug where a JWT scoped to one
resource could implicitly access others under OR semantics.
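
One way to make the direction explicit in the type system is sketched
below. Only the names `anyResource` and `everyResource` come from this
PR; the `Resource` and `ResourceCheck` shapes and the `isAuthorized`
helper are assumptions made for the example.

```typescript
// Assumed minimal resource shape.
type Resource = { type: string; id?: string };

// A tagged wrapper: a bare Resource[] is not assignable to this type, so the
// caller has to pick OR or AND at the call site.
type ResourceCheck =
  | { mode: "any"; resources: Resource[] }
  | { mode: "every"; resources: Resource[] };

const anyResource = (...resources: Resource[]): ResourceCheck => ({ mode: "any", resources });
const everyResource = (...resources: Resource[]): ResourceCheck => ({ mode: "every", resources });

// `can` answers whether the caller's token grants access to one resource.
function isAuthorized(check: ResourceCheck, can: (r: Resource) => boolean): boolean {
  // Deny empty lists so that every() on [] cannot grant access vacuously.
  if (check.resources.length === 0) {
    return false;
  }
  return check.mode === "any" ? check.resources.some(can) : check.resources.every(can);
}
```

With this shape, a token scoped to a single run passes an
`anyResource(run, batch)` check but fails `everyResource(run, batch)`,
and the choice is visible wherever the route declares its authorization.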

PAT auth path consolidated: was three DB queries per request (legacy
`authenticateApiRequestWithPersonalAccessToken` findFirst +
`rbac.authenticatePat` join + `lastAccessedAt` update). Now one query in
the steady state — plugin returns `lastAccessedAt`, host smart-skips the
update via JS-side throttle when fresh.
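
The host-side throttle can be as small as a timestamp comparison. The
sketch below is an assumption about its shape: the function name and the
five-minute window are hypothetical, since the PR does not state the
actual threshold.

```typescript
// Assumed freshness window; the real value is not given in this PR.
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Decides whether the host should write a new lastAccessedAt for a token.
function shouldUpdateLastAccessed(
  lastAccessedAt: Date | null,
  now: Date,
  windowMs: number = FIVE_MINUTES_MS
): boolean {
  // Never recorded before: always write.
  if (lastAccessedAt === null) {
    return true;
  }
  // Skip the write while the stored timestamp is still fresh.
  return now.getTime() - lastAccessedAt.getTime() >= windowMs;
}
```

Because the plugin already returns `lastAccessedAt` with the
authentication result, this check needs no extra query, which is what
brings the steady state down to one.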

Side effect: action aliases preserve historic JWT scope semantics where
the new model is stricter (e.g. a `write:tasks` JWT now also satisfies
`trigger` / `batchTrigger` / `update` actions on the same resource; the
alias is matched at the auth boundary, not in the route handler).

### Backwards-compat fixes

The strict-match model regressed several real-world JWT shapes. Each is
preserved via explicit `anyResource(...)` entries in the route's authz
block:

- **Batch retrieve routes** (`api.v1.batches.$batchId`, `api.v2.*`,
`realtime.v1.batches.*`) accept `read:runs` JWTs again (pre-RBAC
literal-match superScope behaviour)
- **Runs list routes** (`api.v1.runs`, `realtime.v1.runs`) accept
type-level `read:tasks` / `read:tags` on unfiltered queries (matched the
legacy `Object.keys` iteration semantic)
- **PAT/OAT auth shape** normalized through `toAuthenticated` so all
auth methods return the same slim `AuthenticatedEnvironment` (was:
API-key returned the slim shape but PAT/OAT returned raw Prisma
`Decimal` / no `orgMember`)
- **Scope `:` preservation** in resource ids — `read:tags:env:staging`
now correctly identifies the tag id as `env:staging`, not `env`
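
The `:` preservation fix comes down to splitting only the first two
segments and rejoining the rest. A minimal sketch, assuming a scope
format of `action:resourceType[:resourceId]`; `parseScope` and
`ParsedScope` are hypothetical names, not the webapp's parser.

```typescript
interface ParsedScope {
  action: string;
  resourceType: string;
  resourceId: string | undefined;
}

// Splits "action:type:id" while keeping any further ":" inside the id.
function parseScope(scope: string): ParsedScope | null {
  const [action, resourceType, ...rest] = scope.split(":");
  if (!action || !resourceType) {
    return null; // malformed: needs at least an action and a resource type
  }
  return {
    action,
    resourceType,
    // Rejoin the remainder so "env:staging" is not truncated to "env".
    resourceId: rest.length > 0 ? rest.join(":") : undefined,
  };
}
```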

### Slim `AuthenticatedEnvironment`

Extracted to `@trigger.dev/core/v3/auth/environment` — a structural
shape independent of `@trigger.dev/database`. The plugin contract
returns this; webapp consumers import from there; the cloud plugin
(Drizzle) returns the same shape without Prisma's `Decimal` class
leaking into the public surface. Lets internal-packages (run-engine,
etc.) refer to `AuthenticatedEnvironment` without pulling Prisma in.

### Auth test suite (new — `*.e2e.full.test.ts`)

193 e2e tests run against a real spawned webapp + Postgres (no mocks).
Coverage matrix:

- **API key auth** — read / write / trigger / batchTrigger / deploy
actions across runs, batches, deployments, prompts, queues, query,
sessions, input-streams, waitpoints, tasks, idempotency keys; multi-key
resources (a run carries batch / tag / task identifiers — auth must
accept any matching scope)
- **Personal Access Token auth** — comprehensive matrix: scope match,
scope mismatch, missing scope, expired token, malformed token
- **Public JWT auth** — sub-vs-URL environment resolution, expired JWTs,
signature verification, scope checking, otu (one-time-use) token
semantics, branch-environment signing-key fallback
- **Dashboard session auth** — admin-only pages reject non-admins;
per-action gating
- **Cross-cutting edge cases** — revoked API key grace window, JWT
cross-environment isolation, MissingResource branch behaviour

### Hygiene cleanups

- Deleted dead `app/services/authorization.server.ts` (legacy
`checkAuthorization` + types — no live consumers post-migration) and its
orphaned test
- Dropped the never-populated `scopes` field from
`ApiAuthenticationResultSuccess`
- `scheduleEmail` moved out of `email.server.ts` into its own module —
breaks a `commonWorker → marqs/V1` import chain that was poisoning the
auth test graph
- OSS Roles page shows a deployment-aware empty state ("Roles aren't
available in this self-hosted deployment" vs the plan-upsell copy) via
`rbac.isUsingPlugin()`
- Team action handler: explicit per-intent ability gates
(`manage:billing` for purchase-seats, `manage:members` for set-role +
remove-member with self-leave carve-out)

### Cross-repo coordination

All public-package contract changes paired in `triggerdotdev/cloud#763`
(rbac-packages branch) — the enterprise plugin implements the same
`RoleBaseAccessController` interface against Drizzle.

## Test plan

- [x] `pnpm run typecheck --filter webapp` clean
- [x] `pnpm --filter webapp exec vitest run --config
vitest.e2e.full.config.ts` — 193/193 pass (requires Docker for
testcontainers)
- [x] Spot-check an authed API endpoint with a valid + invalid API key
against a local stack
- [x] Spot-check the migrated admin pages render and gate non-admins

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pull pull Bot locked and limited conversation to collaborators May 12, 2026
@pull pull Bot added the ⤵️ pull label May 12, 2026
@pull pull Bot merged commit e4981d1 into Dustin4444:main May 12, 2026
0 of 3 checks passed