[pull] main from triggerdotdev:main by pull[bot] · Pull Request #113 · Dustin4444/trigger.dev

pull · 2026-05-12T16:29:34Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


## Summary

1 improvement, 1 bug fix.

## Improvements

- Fail attempts on uncaught exceptions instead of hanging to `MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`) emitting `"error"` with no `.on("error", ...)` listener escalates to `uncaughtException`, which the worker previously reported but did not act on, so runs drifted to maxDuration with empty attempts. They now fail fast with the original error and status `FAILED`, and respect the task's normal retry policy. You should still attach `.on("error", ...)` listeners to long-lived clients to handle errors gracefully. ([#3529](#3529))

## Bug fixes

- Fix dev workers spinning at 100% CPU after the parent CLI disconnects. Orphaned `trigger-dev-run-worker` (and indexer) processes were caught in an `uncaughtException` feedback loop: a periodic IPC send via `process.send` would throw `ERR_IPC_CHANNEL_CLOSED` once the parent closed the channel, which re-entered the same handler that itself called `process.send`, scheduled via `setImmediate` and amplified by source-map-support's `prepareStackTrace`. Fixed by (1) silently dropping packets in `ZodIpcConnection` when the channel is disconnected, (2) adding a `process.on("disconnect", ...)` handler in dev workers so they exit cleanly when the CLI closes the IPC channel, and (3) wrapping all `uncaughtException`-path `process.send` calls in a `safeSend` guard that checks `process.connected` and swallows synchronous throws. ([#3491](#3491))

<details>
<summary>Raw changeset output</summary>

# Releases

## @trigger.dev/build@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`

## trigger.dev@4.4.6

### Patch Changes

- Fix dev workers spinning at 100% CPU after the parent CLI disconnects. Orphaned `trigger-dev-run-worker` (and indexer) processes were caught in an `uncaughtException` feedback loop: a periodic IPC send via `process.send` would throw `ERR_IPC_CHANNEL_CLOSED` once the parent closed the channel, which re-entered the same handler that itself called `process.send`, scheduled via `setImmediate` and amplified by source-map-support's `prepareStackTrace`. Fixed by (1) silently dropping packets in `ZodIpcConnection` when the channel is disconnected, (2) adding a `process.on("disconnect", ...)` handler in dev workers so they exit cleanly when the CLI closes the IPC channel, and (3) wrapping all `uncaughtException`-path `process.send` calls in a `safeSend` guard that checks `process.connected` and swallows synchronous throws. ([#3491](#3491))
- Fail attempts on uncaught exceptions instead of hanging to `MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`) emitting `"error"` with no `.on("error", ...)` listener escalates to `uncaughtException`, which the worker previously reported but did not act on, so runs drifted to maxDuration with empty attempts. They now fail fast with the original error and status `FAILED`, and respect the task's normal retry policy. You should still attach `.on("error", ...)` listeners to long-lived clients to handle errors gracefully. ([#3529](#3529))
- Updated dependencies:
  - `@trigger.dev/core@4.4.6`
  - `@trigger.dev/build@4.4.6`
  - `@trigger.dev/schema-to-json@4.4.6`

## @trigger.dev/core@4.4.6

### Patch Changes

- Fix dev workers spinning at 100% CPU after the parent CLI disconnects. Orphaned `trigger-dev-run-worker` (and indexer) processes were caught in an `uncaughtException` feedback loop: a periodic IPC send via `process.send` would throw `ERR_IPC_CHANNEL_CLOSED` once the parent closed the channel, which re-entered the same handler that itself called `process.send`, scheduled via `setImmediate` and amplified by source-map-support's `prepareStackTrace`. Fixed by (1) silently dropping packets in `ZodIpcConnection` when the channel is disconnected, (2) adding a `process.on("disconnect", ...)` handler in dev workers so they exit cleanly when the CLI closes the IPC channel, and (3) wrapping all `uncaughtException`-path `process.send` calls in a `safeSend` guard that checks `process.connected` and swallows synchronous throws. ([#3491](#3491))
- Fail attempts on uncaught exceptions instead of hanging to `MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`) emitting `"error"` with no `.on("error", ...)` listener escalates to `uncaughtException`, which the worker previously reported but did not act on, so runs drifted to maxDuration with empty attempts. They now fail fast with the original error and status `FAILED`, and respect the task's normal retry policy. You should still attach `.on("error", ...)` listeners to long-lived clients to handle errors gracefully. ([#3529](#3529))

## @trigger.dev/python@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`
  - `@trigger.dev/build@4.4.6`
  - `@trigger.dev/sdk@4.4.6`

## @trigger.dev/react-hooks@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`

## @trigger.dev/redis-worker@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`

## @trigger.dev/rsc@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`

## @trigger.dev/schema-to-json@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`

## @trigger.dev/sdk@4.4.6

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.6`

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
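The `safeSend` guard described in the bug fix above can be sketched roughly as follows. This is a minimal illustration, not the code from #3491: the `IpcProcessLike` type, the boolean return value, and the exact signature are assumptions made so the example stands alone without Node typings.

```typescript
// Hypothetical structural type: just the parts of Node's `process` the guard touches.
interface IpcProcessLike {
  connected?: boolean;
  send?: (message: unknown) => boolean;
}

// Assumed shape of a `safeSend`-style guard: check the channel first,
// then swallow any synchronous throw (e.g. ERR_IPC_CHANNEL_CLOSED).
// Returns true only when the message was handed to the IPC channel.
function safeSend(proc: IpcProcessLike, message: unknown): boolean {
  if (!proc.connected || typeof proc.send !== "function") {
    // Channel is closed or was never opened: drop the packet silently.
    return false;
  }
  try {
    proc.send(message);
    return true;
  } catch {
    // Never rethrow here: a throw inside an uncaughtException handler
    // would re-enter that same handler and recreate the feedback loop.
    return false;
  }
}
```

In a real worker the first argument would be the global `process`, for example `safeSend(process, { type: "UNCAUGHT_EXCEPTION" })` with a hypothetical message shape, alongside a `process.on("disconnect", ...)` handler that exits the worker once the parent CLI closes the channel.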

…3552)

Closes [TRI-9234](https://linear.app/triggerdotdev/issue/TRI-9234/retry-task-process-sigsegv-errors-respecting-user-retry-config)

## What this changes

SIGSEGV crashes (`TASK_PROCESS_SIGSEGV`) will now be **retried when an attempt fails**, in line with the task's configured retry settings (`retry.maxAttempts` etc.), the same path SIGTERM and uncaught exceptions already use. Previously SIGSEGV was hard-classified as non-retriable and failed the run on the first segfault, ignoring the user's retry policy.

Tasks without a retry policy still fail fast on the first SIGSEGV. Behaviour is unchanged for OOM kills (separate machine-bump retry path) and SIGKILL_TIMEOUT.

## Deploy

**Only the webapp needs to ship.** The retry decision lives entirely in the webapp:

- V2 path: `internal-packages/run-engine` (bundled into the webapp)
- V1 path: `apps/webapp/app/v3/services/completeAttempt.server.ts`

No supervisor, CLI, SDK, or customer-task-image changes required. Customers do not need to redeploy. The `@trigger.dev/core` changeset is just keeping the public package in sync; the published npm version isn't what makes the fix work.

## Why retry

SIGSEGV in Node tasks is frequently non-deterministic across processes:

- **Native addon races** (`sharp`, `canvas`, `better-sqlite3`, `node-rdkafka`, `bcrypt`, …): libuv thread-pool work stepping on V8 handles. Different heap layout / thread schedule on a fresh process → retry often succeeds.
- **JIT / GC interaction**: V8 turbofan deopt or GC during a native callback. Timing-dependent.
- **Near-OOM in native code**: when RSS approaches the cgroup limit, native allocations fail and poorly-written addons dereference NULL → SIGSEGV instead of clean OOM-kill.
- **Host / hardware issues**: bit flips, kernel quirks. Retry lands on a different host.

The genuinely deterministic case (a user-code bug always tripping the same addon) is real, but a subset, and `maxAttempts` bounds the damage.

## Pre-existing inconsistency this resolves

- `shouldRetryError` returned `false` for `TASK_PROCESS_SIGSEGV` → `fail_run`.
- `shouldLookupRetrySettings` already listed `TASK_PROCESS_SIGSEGV` as retry-config-aware, but that branch was unreachable because `shouldRetryError` short-circuited first in `retrying.ts:86-90`.
- We already retry `TASK_RUN_UNCAUGHT_EXCEPTION` (clearly a user-code bug) under the user's retry policy; refusing to retry SIGSEGV was the odd one out.

## Test plan

- [x] `pnpm exec vitest run test/errors.test.ts` in `packages/core`: 26/26 pass (4 new)
- [x] `pnpm run build --filter @trigger.dev/core`
- [ ] CI green on PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
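The retry behaviour described above (SIGSEGV follows the task's retry settings, and a task without a retry policy still fails on the first crash) can be illustrated with a small decision function. This is a sketch under stated assumptions: `decideOnFailure`, `RetrySettings`, and `RETRY_CONFIG_AWARE` are hypothetical names, and the real logic lives in `shouldRetryError` / `shouldLookupRetrySettings`, whose actual signatures are not shown in this PR.

```typescript
// Hypothetical, simplified retry settings; the real config has more fields.
interface RetrySettings {
  maxAttempts: number;
}

type Decision = "retry" | "fail_run";

// Error codes that defer to the user's retry policy in this sketch.
// Both names appear in the PR description; the list itself is illustrative.
const RETRY_CONFIG_AWARE: string[] = [
  "TASK_PROCESS_SIGSEGV",
  "TASK_RUN_UNCAUGHT_EXCEPTION",
];

// Assumed decision flow: only retry-config-aware codes consult the policy,
// and the policy bounds the number of attempts via maxAttempts.
function decideOnFailure(
  code: string,
  attemptNumber: number, // 1-based number of the attempt that just failed
  retry?: RetrySettings
): Decision {
  if (RETRY_CONFIG_AWARE.indexOf(code) === -1) {
    return "fail_run"; // not governed by the user's retry policy here
  }
  if (!retry) {
    return "fail_run"; // no retry policy: fail fast on the first crash
  }
  return attemptNumber < retry.maxAttempts ? "retry" : "fail_run";
}
```

With `maxAttempts: 3`, a segfault on attempts 1 and 2 yields `"retry"` and a segfault on attempt 3 yields `"fail_run"`, which is how `maxAttempts` bounds the cost of a deterministic crash.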

## Summary

Adds `.claude/REVIEW.md`, a repo-specific source of truth for what AI / agent code reviewers should treat as critical in this codebase (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, etc.). Pairs with a Claude-based PR audit that flags drift between REVIEW.md and the code as it evolves.

## How the audit works

Mirrors the existing `.github/workflows/claude-md-audit.yml` pattern. On non-draft, non-fork PRs that touch code, `anthropics/claude-code-action` reads REVIEW.md, samples the PR diff, and posts a sticky comment with up to 3 of:

- `[stale]`: rule cites a path / function / table that's been removed or renamed
- `[contradiction]`: code in the PR violates a current rule
- `[missing]`: PR introduces a new pattern future reviewers should know about
- `[obsolete]`: rule asserts a constraint the repo has moved past

If nothing's off, posts `✅ REVIEW.md looks current for this PR.`

## Test plan

- [ ] Convert this PR to ready-for-review, confirm the audit runs and posts a sticky comment
- [ ] Verify the audit doesn't run on fork PRs (gated by `head.repo.full_name == github.repository`)
- [ ] Verify suggestions are actionable on at least one follow-up PR

…3499)

## Summary

Consolidates the webapp's authentication and authorization into a small set of route helpers, replacing the ad-hoc `requireUser` / `requireUserId` / `authenticatedEnvironmentForAuthentication` calls scattered across routes. Same security model, but the per-request flow (authenticate → authorize → load) now lives in one place per route family.

Introduces a plugin seam (`@trigger.dev/plugins`) that lets the cloud build install a richer RBAC implementation without touching webapp code. The OSS fallback keeps the pre-RBAC permissive behaviour intact, so self-hosted deployments work unchanged.

Adds a comprehensive end-to-end auth test suite that didn't exist before: 193 `it()` blocks (vitest reports ~199 after `it.each` expansion) covering API key, PAT and JWT auth across the public API surface, plus dashboard session auth for admin pages.

## Changes

### Plugin contract: `@trigger.dev/plugins`

`RoleBaseAccessController` interface authoritative for both OSS (fallback) and cloud (enterprise plugin):

- `authenticateBearer(request, { allowJWT? })`: API-key / public-JWT auth, returns env + ability
- `authenticateSession(request, { userId, organizationId?, projectId? })`: dashboard auth, caller resolves `userId` from the session cookie and passes it in (no `helpers.getSessionUserId` callback, which decouples the plugin host from session-cookie code)
- `authenticatePat(request, { organizationId?, projectId? })`: PAT auth, returns identity + `lastAccessedAt` so the host can throttle the per-request update
- `authenticateAuthorize*` variants for the auth-and-check-in-one-call cases
- `isUsingPlugin(): Promise<boolean>`: capability flag for UI / branching where plugin-present-ness matters; replaces the sentinel-string coupling that had `personalAccessToken.server` matching `"RBAC plugin not installed"` literally

### Dashboard auth (started, partial rollout)

Admin and settings pages migrated to a unified `dashboardLoader` / `dashboardAction` helper that authenticates the session, runs an authorization check, and exposes the result to the route. Other dashboard routes still on the old pattern; remaining migration tracked in TRI-8730.

Migrated routes:

- `admin.*` (14 admin / back-office / feature-flags / LLM-models / notifications / orgs / concurrency pages)
- `_app.orgs.$organizationSlug.settings.team`
- `_app.orgs.$organizationSlug.settings.roles`

### API / realtime / engine auth (complete for the migrated families)

71 routes migrated to a unified `apiBuilder` that centralizes Bearer / PAT / Public-JWT authentication and applies the per-route authorization check before the handler runs. Includes:

- `api.v1.*` and `api.v2.*` and `api.v3.*`: tasks, runs, batches, queues, prompts, deployments, query, sessions, waitpoints, packets, workers, idempotency keys
- `realtime.v1.*`: runs, batches, sessions, streams
- `engine.v1.*`: dev / worker-action protocols

29 routes still on the legacy `authenticateApiRequest*` helpers, tracked as a post-deploy follow-up in TRI-9228.

Multi-resource auth direction is now explicit at the call site via `anyResource(...)` (OR) and `everyResource(...)` (AND). Bare arrays no longer typecheck, which fixes a class of bug where a JWT scoped to one resource could implicitly access others under OR semantics.
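The difference between `anyResource(...)` (OR) and `everyResource(...)` (AND) can be sketched as below. Only the two helper names come from the PR description; the `Resource` and `ResourceMatcher` types, the variadic signatures, and `isAuthorized` are assumptions for illustration, not the `@trigger.dev/plugins` API.

```typescript
// Hypothetical resource shape: a type plus an optional id.
interface Resource {
  type: string;
  id?: string;
}

// A tagged wrapper makes the direction explicit; a bare Resource[] is not
// assignable to this type, so an array passed by mistake fails to typecheck.
interface ResourceMatcher {
  mode: "any" | "every";
  resources: Resource[];
}

function anyResource(...resources: Resource[]): ResourceMatcher {
  return { mode: "any", resources };
}

function everyResource(...resources: Resource[]): ResourceMatcher {
  return { mode: "every", resources };
}

// `can` stands in for the caller's ability check for a single resource.
function isAuthorized(
  matcher: ResourceMatcher,
  can: (resource: Resource) => boolean
): boolean {
  if (matcher.resources.length === 0) {
    return false; // never authorize against an empty resource list
  }
  return matcher.mode === "any"
    ? matcher.resources.some(can)
    : matcher.resources.every(can);
}
```

For a token scoped only to run `run_1`, `anyResource({ type: "runs", id: "run_1" }, { type: "batches", id: "batch_1" })` authorizes while the `everyResource(...)` form of the same list does not, which is why the choice has to be visible at the call site.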
PAT auth path consolidated: was three DB queries per request (legacy `authenticateApiRequestWithPersonalAccessToken` findFirst + `rbac.authenticatePat` join + `lastAccessedAt` update). Now one query in the steady state: plugin returns `lastAccessedAt`, host smart-skips the update via JS-side throttle when fresh.

Side effect: action aliases preserved historic JWT scope semantics where the new model is stricter (e.g. a `write:tasks` JWT now also satisfies `trigger` / `batchTrigger` / `update` actions on the same resource, matched at the auth boundary, not in the route handler).

### Backwards-compat fixes

The strict-match model regressed several real-world JWT shapes. Each preserved via explicit `anyResource(...)` entries in the route's authz block:

- **Batch retrieve routes** (`api.v1.batches.$batchId`, `api.v2.*`, `realtime.v1.batches.*`) accept `read:runs` JWTs again (pre-RBAC literal-match superScope behaviour)
- **Runs list routes** (`api.v1.runs`, `realtime.v1.runs`) accept type-level `read:tasks` / `read:tags` on unfiltered queries (matched the legacy `Object.keys` iteration semantic)
- **PAT/OAT auth shape** normalized through `toAuthenticated` so all auth methods return the same slim `AuthenticatedEnvironment` (was: API-key returned the slim shape but PAT/OAT returned raw Prisma `Decimal` / no `orgMember`)
- **Scope `:` preservation** in resource ids: `read:tags:env:staging` now correctly identifies the tag id as `env:staging`, not `env`

### Slim `AuthenticatedEnvironment`

Extracted to `@trigger.dev/core/v3/auth/environment`, a structural shape independent of `@trigger.dev/database`. The plugin contract returns this; webapp consumers import from there; the cloud plugin (Drizzle) returns the same shape without Prisma's `Decimal` class leaking into the public surface. Lets internal-packages (run-engine, etc.) refer to `AuthenticatedEnvironment` without pulling Prisma in.
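The scope `:` preservation fix listed under the backwards-compat fixes amounts to splitting a scope into action, resource type, and id while keeping any further colons inside the id. A minimal sketch, assuming a scope has the form `action:type[:id]`; `parseScope` and `ParsedScope` are hypothetical names, not the webapp's actual parser.

```typescript
// Hypothetical parsed form of a JWT scope string.
interface ParsedScope {
  action: string;
  type: string;
  id?: string;
}

// Split on ":" but rejoin everything after the second segment, so an id
// that itself contains ":" (e.g. "env:staging") survives intact.
function parseScope(scope: string): ParsedScope {
  const [action = "", type = "", ...rest] = scope.split(":");
  if (rest.length === 0) {
    return { action, type }; // type-level scope such as "read:runs"
  }
  return { action, type, id: rest.join(":") };
}
```

Taking only the third segment of `read:tags:env:staging` would give the id `env`; rejoining the remainder gives `env:staging`, the behaviour the PR describes as correct.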
### Auth test suite (new, `*.e2e.full.test.ts`)

193 e2e tests run against a real spawned webapp + Postgres (no mocks). Coverage matrix:

- **API key auth**: read / write / trigger / batchTrigger / deploy actions across runs, batches, deployments, prompts, queues, query, sessions, input-streams, waitpoints, tasks, idempotency keys; multi-key resources (a run carries batch / tag / task identifiers, and auth must accept any matching scope)
- **Personal Access Token auth**: comprehensive matrix of scope match, scope mismatch, missing scope, expired token, malformed token
- **Public JWT auth**: sub-vs-URL environment resolution, expired JWTs, signature verification, scope checking, otu (one-time-use) token semantics, branch-environment signing-key fallback
- **Dashboard session auth**: admin-only pages reject non-admins; per-action gating
- **Cross-cutting edge cases**: revoked API key grace window, JWT cross-environment isolation, MissingResource branch behaviour

### Hygiene cleanups

- Deleted dead `app/services/authorization.server.ts` (legacy `checkAuthorization` + types, no live consumers post-migration) and its orphaned test
- Dropped the never-populated `scopes` field from `ApiAuthenticationResultSuccess`
- `scheduleEmail` moved out of `email.server.ts` into its own module, which breaks a `commonWorker → marqs/V1` import chain that was poisoning the auth test graph
- OSS Roles page shows a deployment-aware empty state ("Roles aren't available in this self-hosted deployment" vs the plan-upsell copy) via `rbac.isUsingPlugin()`
- Team action handler: explicit per-intent ability gates (`manage:billing` for purchase-seats, `manage:members` for set-role + remove-member with self-leave carve-out)

### Cross-repo coordination

All public-package contract changes paired in `triggerdotdev/cloud#763` (rbac-packages branch); the enterprise plugin implements the same `RoleBaseAccessController` interface against Drizzle.
## Test plan

- [x] `pnpm run typecheck --filter webapp` clean
- [x] `pnpm --filter webapp exec vitest run --config vitest.e2e.full.config.ts`: 193/193 pass (requires Docker for testcontainers)
- [x] Spot-check an authed API endpoint with a valid + invalid API key against a local stack
- [x] Spot-check the migrated admin pages render and gate non-admins

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot and others added 4 commits May 12, 2026 11:33

pull Bot locked and limited conversation to collaborators May 12, 2026

pull Bot added the ⤵️ pull label May 12, 2026

pull Bot merged commit e4981d1 into Dustin4444:main May 12, 2026
0 of 3 checks passed

pull[bot] merged 4 commits into Dustin4444:main from triggerdotdev:main
