fix: add docstring to defineConfig for maxComputeSeconds PR#6
Open
deepshekhardas wants to merge 292 commits into
Conversation
…v#3070) Fixes an issue introduced in triggerdotdev#3024. Local builds in older CLI versions rely on `externalBuildData` being defined to distinguish them from the self-hosting local build path, even though they don't actually use the token.
To access the runs from `batch.triggerAndWait`, use `results.runs`.
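A minimal sketch of the access path, using a hypothetical child task for illustration; the point is that the per-run results live on the `runs` property of the result object.

```typescript
import { batch, task } from "@trigger.dev/sdk";

// Hypothetical child task, for illustration only.
export const childTask = task({
  id: "child-task",
  run: async (payload: { name: string }) => {
    return { greeting: `Hello, ${payload.name}` };
  },
});

export const parentTask = task({
  id: "parent-task",
  run: async () => {
    const results = await batch.triggerAndWait<typeof childTask>([
      { id: "child-task", payload: { name: "a" } },
      { id: "child-task", payload: { name: "b" } },
    ]);

    // The per-run results live on `results.runs`, not on the result object itself.
    for (const run of results.runs) {
      if (run.ok) {
        console.log(run.output.greeting);
      }
    }
  },
});
```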
…v#3081) Expand documentation for the Vercel integration with detailed usage, installation, environment variable sync, atomic deployments, and environment mapping. Replace the previous "coming soon" placeholder with complete instructions and UI flow for connecting via the Trigger.dev dashboard or the Vercel Marketplace. Explain required GitHub integration, how env vars sync in both directions, which vars are excluded, and how to control sync behavior. Describe atomic deployments (default for production), how they gate Vercel deployments to ensure task/app consistency, and note related configuration changes. Add tips and notes to guide setup and troubleshooting. This provides users with actionable guidance to connect Vercel, map environments, and keep app and tasks in sync without custom CI scripts.
) This will prevent internal logs from being added to the task_events_search_table Closes #<issue> ## ✅ Checklist - [x] I have followed every step in the [contributing guide](https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md) - [x] The PR title follows the convention. - [x] I ran and tested the code works --- ## Testing Ran the migration, deleted the old invalid rows and ran new tasks. The undesired logs are not added to the table. --- ## Changelog Updated the MATERIALIZED VIEW to also filter for `trace_id != ''` --------- Co-authored-by: Matt Aitken <matt@mattaitken.com>
…dev#3082) Adds a region selector to the Test task page and Replay run dialog, so users can override the region from the dashboard. Disabled with a placeholder for dev environments. Closes triggerdotdev#3016
…ev#3083) Summary - Only render the top 50 series - Improved rendering performance on bar charts
…ev#3087) Closes #<issue> ## ✅ Checklist - [ ] I have followed every step in the [contributing guide](https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md) - [ ] The PR title follows the convention. - [ ] I ran and tested the code works --- ## Testing N/A - Documentation and OpenAPI schema updates only. --- ## Changelog Added comprehensive Queue Management API support: **OpenAPI Endpoints:** - `GET /api/v1/queues` - List all queues with pagination support - `GET /api/v1/queues/{queueParam}` - Retrieve a specific queue by ID, task ID, or custom queue name - `POST /api/v1/queues/{queueParam}/pause` - Pause or resume a queue - `POST /api/v1/queues/{queueParam}/concurrency/override` - Override queue concurrency limits - `POST /api/v1/queues/{queueParam}/concurrency/reset` - Reset concurrency limits to base values **Schema Definitions:** - `QueueObject` - Complete queue representation with concurrency details - `ListQueuesResult` - Paginated queue listing response **Documentation:** - Updated `queue-concurrency.mdx` with SDK usage examples for queue management - Added 5 new management API documentation pages for each endpoint - Updated `docs.json` navigation structure with new "Queues API" section All endpoints support flexible queue identification (by ID, task ID, or custom queue name) and include TypeScript code samples. --- ## Screenshots N/A 💯 https://claude.ai/code/session_01LyrXwxHCbejvi34fykifPP Co-authored-by: Claude <noreply@anthropic.com>
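A minimal sketch of calling two of the endpoints above over plain HTTP. The endpoint paths come from the OpenAPI list; the base URL, bearer-auth header, and the pause body shape are assumptions for illustration.

```typescript
// Assumptions: cloud API host, secret-key bearer auth, { action } body for pause/resume.
const baseUrl = "https://api.trigger.dev";
const headers = {
  Authorization: `Bearer ${process.env.TRIGGER_SECRET_KEY}`,
  "Content-Type": "application/json",
};

// GET /api/v1/queues — list all queues (paginated).
const queues = await fetch(`${baseUrl}/api/v1/queues`, { headers }).then((r) => r.json());
console.log(queues);

// POST /api/v1/queues/{queueParam}/pause — pause a queue by custom queue name.
await fetch(`${baseUrl}/api/v1/queues/my-queue/pause`, {
  method: "POST",
  headers,
  body: JSON.stringify({ action: "pause" }), // assumed body shape
});
```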
…rdotdev#3090) - If your project is v3, show a v3 deprecation panel in the side menu - If there is an active incident, show the incident panel instead - Links to the Migration guide in the docs - Displays when the side menu is collapsed https://github.com/user-attachments/assets/f8492713-c58b-4f83-bcce-0e85f4a967ef <img width="972" height="694" alt="CleanShot 2026-02-19 at 08 13 59@2x" src="https://github.com/user-attachments/assets/0599dd20-d598-48c6-b83c-208648cee071" />
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and publish to npm yourself or [setup this action to publish automatically](https://github.com/changesets/action#with-publishing). If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated. # Releases ## @trigger.dev/sdk@4.4.0 ### Minor Changes - Added `query.execute()` which lets you query your Trigger.dev data using TRQL (Trigger Query Language) and returns results as typed JSON rows or CSV. It supports configurable scope (environment, project, or organization), time filtering via `period` or `from`/`to` ranges, and a `format` option for JSON or CSV output. ([triggerdotdev#3060](triggerdotdev#3060)) ```typescript import { query } from "@trigger.dev/sdk"; import type { QueryTable } from "@trigger.dev/sdk"; // Basic untyped query const result = await query.execute("SELECT run_id, status FROM runs LIMIT 10"); // Type-safe query using QueryTable to pick specific columns const typedResult = await query.execute<QueryTable<"runs", "run_id" | "status" | "triggered_at">>( "SELECT run_id, status, triggered_at FROM runs LIMIT 10" ); typedResult.results.forEach((row) => { console.log(row.run_id, row.status); // Fully typed }); // Aggregation query with inline types const stats = await query.execute<{ status: string; count: number }>( "SELECT status, COUNT(*) as count FROM runs GROUP BY status", { scope: "project", period: "30d" } ); // CSV export const csv = await query.execute("SELECT run_id, status FROM runs", { format: "csv", period: "7d", }); console.log(csv.results); // Raw CSV string ``` ### Patch Changes - Add `maxDelay` option to debounce feature. This allows setting a maximum time limit for how long a debounced run can be delayed, ensuring execution happens within a specified window even with continuous triggers. ([triggerdotdev#2984](triggerdotdev#2984)) ```typescript await myTask.trigger(payload, { debounce: { key: "my-key", delay: "5s", maxDelay: "30m", // Execute within 30 minutes regardless of continuous triggers }, }); ``` - Aligned the SDK's `getRunIdForOptions` logic with the Core package to handle semantic targets (`root`, `parent`) in root tasks. ([triggerdotdev#2874](triggerdotdev#2874)) - Export `AnyOnStartAttemptHookFunction` type to allow defining `onStartAttempt` hooks for individual tasks. ([triggerdotdev#2966](triggerdotdev#2966)) - Fixed a minor issue in the deployment command on distinguishing between local builds for the cloud vs local builds for self-hosting setups. ([triggerdotdev#3070](triggerdotdev#3070)) - Updated dependencies: - `@trigger.dev/core@4.4.0` ## @trigger.dev/build@4.4.0 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.0` ## trigger.dev@4.4.0 ### Patch Changes - Fix runner getting stuck indefinitely when `execute()` is called on a dead child process. ([triggerdotdev#2978](triggerdotdev#2978)) - Add optional `timeoutInSeconds` parameter to the `wait_for_run_to_complete` MCP tool. Defaults to 60 seconds. If the run doesn't complete within the timeout, the current state of the run is returned instead of waiting indefinitely. ([triggerdotdev#3035](triggerdotdev#3035)) - Fixed a minor issue in the deployment command on distinguishing between local builds for the cloud vs local builds for self-hosting setups. 
([triggerdotdev#3070](triggerdotdev#3070)) - Updated dependencies: - `@trigger.dev/core@4.4.0` - `@trigger.dev/build@4.4.0` - `@trigger.dev/schema-to-json@4.4.0` ## @trigger.dev/core@4.4.0 ### Patch Changes - Add `maxDelay` option to debounce feature. This allows setting a maximum time limit for how long a debounced run can be delayed, ensuring execution happens within a specified window even with continuous triggers. ([triggerdotdev#2984](triggerdotdev#2984)) ```typescript await myTask.trigger(payload, { debounce: { key: "my-key", delay: "5s", maxDelay: "30m", // Execute within 30 minutes regardless of continuous triggers }, }); ``` - Fixed a minor issue in the deployment command on distinguishing between local builds for the cloud vs local builds for self-hosting setups. ([triggerdotdev#3070](triggerdotdev#3070)) - fix: vendor superjson to fix ESM/CJS compatibility ([triggerdotdev#2949](triggerdotdev#2949)) Bundle superjson during build to avoid `ERR_REQUIRE_ESM` errors on Node.js versions that don't support `require(ESM)` by default (< 22.12.0) and AWS Lambda which intentionally disables it. - Add Vercel integration support to API schemas: `commitSHA` and `integrationDeployments` on deployment responses, and `source` field for environment variable imports. ([triggerdotdev#2994](triggerdotdev#2994)) ## @trigger.dev/python@4.4.0 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.0` - `@trigger.dev/sdk@4.4.0` - `@trigger.dev/build@4.4.0` ## @trigger.dev/react-hooks@4.4.0 ### Patch Changes - Fix `onComplete` callback firing prematurely when the realtime stream disconnects before the run finishes. ([triggerdotdev#2929](triggerdotdev#2929)) - Updated dependencies: - `@trigger.dev/core@4.4.0` ## @trigger.dev/redis-worker@4.4.0 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.0` ## @trigger.dev/rsc@4.4.0 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.0` ## @trigger.dev/schema-to-json@4.4.0 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.0` --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
- Adds an end-to-end OTEL metrics pipeline: task workers collect and export metrics via OpenTelemetry, the webapp ingests them into ClickHouse, and they're queryable through the existing dashboard query engine - Workers emit process CPU/memory metrics (via `@opentelemetry/host-metrics`) and Node.js runtime metrics (event loop utilization, event loop delay, heap usage) - Users can create custom metrics in their tasks via `otel.metrics.getMeter()` from `@trigger.dev/sdk` - Metrics are automatically tagged with run context (run ID, task slug, machine, worker version) so they can be sliced per-run, per-task, or per-machine - The TSQL query engine gains metrics table support with typed attribute columns, `prettyFormat()` for human-readable values, and per-schema time bucket thresholds - Includes reference tasks (`references/hello-world/src/trigger/metrics.ts`) demonstrating CPU-intensive, memory-ramp, bursty workload, and custom metrics patterns ## What changed ### Metrics collection (packages/core, packages/cli-v3) - **Metrics export pipeline** — `TracingSDK` now sets up a `MeterProvider` with a `PeriodicExportingMetricReader` that chains through `TaskContextMetricExporter` (adds run context attributes) and `BufferingMetricExporter` (batches exports to reduce overhead) - **Host metrics** — Enabled `@opentelemetry/host-metrics` for process CPU, memory, and system-level metrics - **Node.js runtime metrics** — New `nodejsRuntimeMetrics.ts` module using `performance.eventLoopUtilization()`, `monitorEventLoopDelay()`, and `process.memoryUsage()` to emit 6 observable gauges - File system and diskio metrics - **Custom metrics** — Exposed `otel.metrics` from `@trigger.dev/sdk` so users can create counters, histograms, and gauges in their tasks - **Machine ID** — Stable per-worker machine identifier for grouping metrics - **Dev worker** — Drops `system.*` metrics to reduce noise, keeps sending metrics between runs in warm workers ### Metrics ingestion (apps/webapp) - **OTEL endpoint** — `otel.v1.metrics.ts` accepts OTEL metric export requests (JSON and protobuf), converts to ClickHouse rows - **ClickHouse schema** — `017_create_metrics_v1.sql` with 10-second aggregation buckets, JSON attributes column, 60-day TTLs ### Query engine (internal-packages/tsql, apps/webapp) - **Metrics query schema** — Typed columns for metric attributes (`task_identifier`, `run_id`, `machine_name`, `worker_version`, etc.) 
extracted from the JSON attributes column - **`prettyFormat()`** — TSQL function that annotates columns with format hints (`bytes`, `percent`, `durationSeconds`) for frontend rendering without changing the underlying data - **Per-schema time buckets** — Different tables can define their own time bucket thresholds (metrics uses tighter intervals than runs) - **AI query integration** — The AI query service knows about the metrics table and can generate metric queries - **Chart improvements** — Better formatting for byte values, percentages, and durations in charts and tables ### Reference project - **`references/hello-world/src/trigger/metrics.ts`** — 6 example tasks: `cpu-intensive`, `memory-ramp`, `bursty-workload`, `sustained-workload`, `concurrent-load`, `custom-metrics` ## Test plan - [ ] Build all packages and webapp - [ ] Start dev worker with hello-world reference project - [ ] Run `cpu-intensive`, `memory-ramp`, and `custom-metrics` tasks - [ ] Verify metrics in ClickHouse: `SELECT DISTINCT metric_name FROM metrics_v1` - [ ] Query via dashboard AI: "show me CPU utilization over time" - [ ] Verify `prettyFormat` renders correctly in chart tooltips and table cells - [ ] Confirm dev worker drops `system.*` metrics but keeps `process.*` and `nodejs.*`
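A minimal sketch of the custom-metrics pattern described above, via `otel.metrics.getMeter()` from `@trigger.dev/sdk`. The meter and counter names are illustrative assumptions, not the reference task's actual names.

```typescript
import { task, otel } from "@trigger.dev/sdk";

// Create a meter and a counter once at module level.
const meter = otel.metrics.getMeter("my-task-metrics");
const itemsProcessed = meter.createCounter("items_processed", {
  description: "Number of items processed by the task",
});

export const customMetricsTask = task({
  id: "custom-metrics-example",
  run: async (payload: { items: string[] }) => {
    for (const item of payload.items) {
      // ...do work on item...
      // Run context attributes (run ID, task slug, machine, worker version)
      // are attached automatically by the export pipeline.
      itemsProcessed.add(1);
    }
  },
});
```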
…ookie call (triggerdotdev#3104) Co-authored-by: Oskar Otwinowski <oskar.otwinowski@gmail.com>
…-from-v3 (triggerdotdev#3098) Adds a deprecation warning at the top of the migrating-from-v3 page and updates the “Migrate using AI” prompt and intro
…ctory warning (triggerdotdev#3097) Adds a direct Vercel Marketplace link, documents configuring build options via the project config page, and adds a warning and workaround for projects using a Vercel Root Directory
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and publish to npm yourself or [setup this action to publish automatically](https://github.com/changesets/action#with-publishing). If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated. # Releases ## @trigger.dev/build@4.4.1 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.1` ## trigger.dev@4.4.1 ### Patch Changes - Add OTEL metrics pipeline for task workers. Workers collect process CPU/memory, Node.js runtime metrics (event loop utilization, event loop delay, heap usage), and user-defined custom metrics via `otel.metrics.getMeter()`. Metrics are exported to ClickHouse with 10-second aggregation buckets and 1m/5m rollups, and are queryable through the dashboard query engine with typed attribute columns, `prettyFormat()` for human-readable values, and AI query support. ([triggerdotdev#3061](triggerdotdev#3061)) - Updated dependencies: - `@trigger.dev/build@4.4.1` - `@trigger.dev/core@4.4.1` - `@trigger.dev/schema-to-json@4.4.1` ## @trigger.dev/python@4.4.1 ### Patch Changes - Updated dependencies: - `@trigger.dev/sdk@4.4.1` - `@trigger.dev/build@4.4.1` - `@trigger.dev/core@4.4.1` ## @trigger.dev/react-hooks@4.4.1 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.1` ## @trigger.dev/redis-worker@4.4.1 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.1` ## @trigger.dev/rsc@4.4.1 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.1` ## @trigger.dev/schema-to-json@4.4.1 ### Patch Changes - Updated dependencies: - `@trigger.dev/core@4.4.1` ## @trigger.dev/sdk@4.4.1 ### Patch Changes - Add OTEL metrics pipeline for task workers. Workers collect process CPU/memory, Node.js runtime metrics (event loop utilization, event loop delay, heap usage), and user-defined custom metrics via `otel.metrics.getMeter()`. Metrics are exported to ClickHouse with 10-second aggregation buckets and 1m/5m rollups, and are queryable through the dashboard query engine with typed attribute columns, `prettyFormat()` for human-readable values, and AI query support. ([triggerdotdev#3061](triggerdotdev#3061)) - Updated dependencies: - `@trigger.dev/core@4.4.1` ## @trigger.dev/core@4.4.1 --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Small fixes and improvements to the logs page: - Clicking the Run ID didn't open the inspector - Swapped the "open link in tab" icon for the Runs icon - Prevented the tooltip from appearing when hovering over the Level info <img width="350" height="206" alt="CleanShot 2026-02-20 at 10 00 37@2x" src="https://github.com/user-attachments/assets/3e82f24a-c0a1-4c01-a8e9-9e06a8af982a" />
…rdotdev#3113) Without doing an expensive query – like getting run counts – we can’t tell if it’s definitely a v3 project. So let’s just assume that if the project hasn’t been upgraded to v4 (by running dev/deploy CLI with v4) AND the project is older than the v4 release, then it’s v3.
…#3108) ## ✅ Checklist - [x] I have followed every step in the [contributing guide](https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md) - [x] The PR title follows the convention. - [x] I ran and tested the code works --- ## Testing Slack + GitHub + Vercel + Builds + Deployments --- ## Changelog Settings changes: - Split general from integrations - Add new Slack section to org level integrations Vercel improvements: - bugfix for TRIGGER_SECRET_KEY collision - onboarding improvements for connecting to projects - new loops event Slack improvements: - nicer alerts Webhook/Email alerts: - rich events with Github & Vercel integration data --- ## Screenshots <img width="2550" height="652" alt="Screenshot 2026-02-20 at 21 53 34" src="https://github.com/user-attachments/assets/8d7c9f1d-5fe9-4516-8fb3-885460b4207f" /> <img width="843" height="710" alt="Screenshot 2026-02-23 at 10 55 54" src="https://github.com/user-attachments/assets/8ea72c1f-431b-493c-b9a9-8076cce12262" /> <img width="765" height="466" alt="Screenshot 2026-02-20 at 21 52 46" src="https://github.com/user-attachments/assets/157fafb8-b7bf-499d-8953-c2aed5e44ce0" /> <img width="691" height="261" alt="Screenshot 2026-02-20 at 22 04 24" src="https://github.com/user-attachments/assets/3aea7369-2008-4af8-a9c0-5fbfa2cc381d" /> <img width="2032" height="1114" alt="Screenshot 2026-02-19 at 14 48 49" src="https://github.com/user-attachments/assets/dc10c14e-cd15-445a-b5be-d694d29d20e5" /> <img width="2032" height="1114" alt="Screenshot 2026-02-19 at 14 49 04" src="https://github.com/user-attachments/assets/1ef591fd-fd00-430a-9649-8b18cff9586d" /> <img width="1583" height="1115" alt="Screenshot 2026-02-19 at 17 32 56" src="https://github.com/user-attachments/assets/c5c8f318-d193-4dd4-86f7-1cc4bbcc4e0c" /> <img width="422" height="187" alt="Screenshot 2026-02-20 at 21 57 41" src="https://github.com/user-attachments/assets/37865cb6-4c0d-40ef-9c60-7b057d546c61" /> <img width="1583" height="1115" alt="Screenshot 2026-02-19 at 17 33 06" src="https://github.com/user-attachments/assets/e9180e8e-e611-4734-9232-80c62ff863ad" /> 💯
…aitpoint creation (triggerdotdev#2980) This PR implements a new run TTL system and queue size limits to prevent unbounded queue growth, which should help avoid "death spiral" situations where a queue can never catch up. The main/correct way to battle this situation is to enforce a maximum TTL on all runs (e.g. up to 14 days): runs that have been queued for that maximum TTL get auto-expired, making room for newer runs to execute. This required creating a new TTL system that can handle higher workloads and is now deeply integrated into the RunQueue. When runs are enqueued with a TTL, they are added to their normal queue as well as to the TTL queue. When runs are dequeued, they are removed from both their normal queue and the TTL queue. If runs are dequeued by the TTL system, they are removed from their normal queue. Both these dequeues happen automatically, so there is no race condition. The TTL expiration system is also made reliable by expiring runs via a Redis worker, which is enqueued atomically from inside the TTL dequeue lua script. ### Optional associated waitpoints Additionally, this PR implements an optimization where runs that aren't triggered with a dependent parent run no longer create an associated waitpoint. Associated waitpoints are then lazily created if a dependent run wants to wait for the child run post-facto (via debounce or idempotency), which is a rare but possible situation. This means fewer waitpoint creations and also fewer waitpoint completions for runs with no dependencies. ### Environment Queue Limits Prevents any single queue from growing too large by enforcing queue size limits at trigger time. - Queue size checks happen at trigger time - runs are rejected if the queue would exceed the limit - Dashboard UI shows queue limits on both the Queues page and a new Limits page - In-memory caching for queue size checks to reduce Redis load ### Batch trigger fixes Currently, when a batch item cannot be created for whatever reason (e.g. queue limits), the run never gets created, which means a stalled run when using `batchTriggerAndWait`. We've updated the system to handle this differently: when a batch item cannot be triggered and converted into a run, we will eventually (after retrying 8 times, up to 30s) create a "pre-failed" run with the error details, correctly resolving the `batchTriggerAndWait`.
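A minimal sketch of what enqueueing with a TTL looks like from the SDK side, assuming the existing `ttl` trigger option is what feeds the new TTL queue; the task and values are illustrative.

```typescript
import { task } from "@trigger.dev/sdk";

export const importTask = task({
  id: "import-task",
  run: async (payload: { id: string }) => {
    // ...do the import...
  },
});

// Assumption: the `ttl` trigger option feeds the TTL queue described above.
// A run still queued after the TTL elapses is auto-expired instead of executing.
export async function enqueueImport() {
  await importTask.trigger(
    { id: "123" },
    { ttl: "14d" } // mirrors the "up to 14 days" maximum in the PR text
  );
}
```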
…rom the whole system (triggerdotdev#3115)
…tdev#3119) Adds a warning to the onCancel docs clarifying that the hook only fires when a run is actively executing
Adds API reference pages for three previously undocumented run endpoints: retrieve run events, retrieve run trace, and add tags to a run.
Adds OpenAPI specs and sidebar pages for four previously undocumented public endpoints: retrieve run result, per-task batch trigger, retrieve batch, and retrieve batch results.
…#3489) ## Summary - Upgrade pnpm from 10.23.0 → 10.33.2 (latest minor) - Enable `blockExoticSubdeps: true` for supply-chain defense - Update all version references across the repo ## Security improvements in 10.28.2+ - Path traversal protection in `directories.bin` - Symlink-escape protection for `file:/git:` dependencies (prevents reading `/etc/passwd`, `~/.ssh/...`) - https://pnpm.io/settings#blockexoticsubdeps ## Files updated - `package.json` — `packageManager` field - `docker/Dockerfile` — 5 `corepack prepare` calls - `apps/supervisor/Containerfile` — 1 `corepack prepare` call - `pnpm-workspace.yaml` — added `blockExoticSubdeps: true` - `CLAUDE.md`, `AGENTS.md`, `CONTRIBUTING.md`, `ai/references/repo.md` — version references ## Verification - `pnpm install --frozen-lockfile` succeeds (no lockfile regen needed) - `pnpm install` (plain) produces zero lockfile diff - All CI checks pass Slack thread: https://triggerdotdev.slack.com/archives/C061L2MHW93/p1777625600974279?thread_ts=1777622248.762639&cid=C061L2MHW93 https://claude.ai/code/session_01G759MUqmjsPh9k1qDxbdjG --------- Co-authored-by: Claude <noreply@anthropic.com>
… disconnect (triggerdotdev#3491) Orphaned `trigger-dev-run-worker` processes were pinning CPU at 100% after the dev CLI exited — stuck in an uncaughtException feedback loop where a closed IPC channel kept throwing `ERR_IPC_CHANNEL_CLOSED` back into a handler that itself called `process.send`. Fix: - `ZodIpcConnection` no-ops sends when the channel is disconnected. - Dev workers exit on `process.disconnect` instead of being re-parented to init. - All worker `uncaughtException` handlers route through a `safeSend` guard so the handler can never re-enter itself. Verified end-to-end: `kill -9` of the dev CLI now cleans up all child workers within ~2s.
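A minimal sketch of the `safeSend` guard and disconnect handling, as assumptions about their shape rather than the actual `ZodIpcConnection` code; the point is that a send on a dead IPC channel becomes a no-op instead of re-entering the exception handler.

```typescript
// No-op sends when the channel is disconnected, so ERR_IPC_CHANNEL_CLOSED
// can never be thrown back into the handler that called us.
function safeSend(message: unknown): void {
  if (!process.connected || typeof process.send !== "function") {
    return; // channel is gone; dropping the message beats a feedback loop
  }
  try {
    process.send(message);
  } catch {
    // EPIPE / ERR_IPC_CHANNEL_CLOSED can still race the connected check; swallow it
  }
}

process.on("uncaughtException", (error) => {
  safeSend({ type: "UNCAUGHT_EXCEPTION", error: { name: error.name, message: error.message } });
});

// Exit when the parent disconnects instead of being re-parented to init.
process.on("disconnect", () => process.exit(0));
```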
…tdev#3502) Updates the compute private beta page with the May 1 release entry, plus a deploy-time warning when `us-east-1-next` is the project default. The new What's new entry, verbatim: ### May 1, 2026 - **Cold starts are faster across all machine sizes.** Every preset starts faster, including `micro` and `small-1x` - there's no longer a cold-start penalty for picking a smaller machine. - **First runs after a deploy are faster on every preset.** Boot snapshot creation is significantly quicker across the board, so the cold path is consistently snappier. - **`large-1x` and `large-2x` no longer hard-fail.** They're still not recommended - cold-start performance trails the smaller presets and we're ironing out reliability issues. Follow-up to triggerdotdev#3472 and triggerdotdev#3479.
Adds an `actionlint` job that runs on changes to `.github/workflows/**` and `.github/actions/**`. Catches workflow bugs at PR time — expression typos, deprecated runner labels, broken matrices, and shellcheck issues in `run:` blocks. Run from the official `docker://rhysd/actionlint` image, digest-pinned alongside everything else. Existing workflows had 6 shellcheck findings, all fixed.
Follow-up to the v4.4.5 release incident where the release PR (triggerdotdev#3406) was merged with a stale lockfile and stale Chart.yaml, breaking npm + helm releases. The two automation jobs (`update-lockfile`, `bump-chart-version`) got cancelled mid-flight by `cancel-in-progress` when the merge fired the workflow again on `main`. This restructures `changeset:version` so all the post-version-bump fixups happen in the same script and end up in a single atomic commit on `changeset-release/main`, via `changesets/action`'s normal commit step. Pattern borrowed from Cloudflare workers-sdk, Astro, shadcn/ui. ## Before ``` push: main └── release-pr (changeset version → bumps package.jsons, opens PR) └── update-lockfile (separate job, separate commit) └── bump-chart-version (separate job, separate commit) ``` Three jobs, three commits to the release branch. ## After ``` push: main └── release-pr └── changesets/action runs: changeset version pnpm install --lockfile-only node scripts/bump-helm-chart.mjs node scripts/cleanup-server-changes.mjs ...all staged and committed as ONE commit by the action ``` One job, one commit.
…otdev#3509) ## Summary Delete 34 `.server-changes/*.md` files that should have been cleaned up automatically when v4.4.5 (triggerdotdev#3406) was merged but were stranded by a workflow race. ## Why these are stale The `update-lockfile` job in `.github/workflows/changesets-pr.yml` is what cleans up consumed `.server-changes/*.md` files on the release branch. When v4.4.5 was merged on 2026-05-01, the post-merge workflow run on `main` failed at `pnpm install --frozen-lockfile` (stale lockfile in the merge commit), and `cancel-in-progress: true` cancelled the in-flight run from the previous push — so `update-lockfile` never reached the cleanup step. Result: the 34 files described changes that v4.4.5 already shipped, and they were re-appearing in the v4.4.6 release PR (triggerdotdev#3501) under "Server changes" plus showing up as deletions in its diff. ## What this PR keeps - `fix-rollback-schedule-sync.md` — genuinely new for v4.4.6 (triggerdotdev#3468), the only server change introduced after v4.4.5 - `README.md`, `.gitkeep` — directory infrastructure - `dev-cli-disconnect-md` — leaving alone (typo'd filename from March, no `.md` extension, not picked up by the cleanup glob anyway) ## After merge The next run of `changesets-pr.yml` will refresh triggerdotdev#3501 with a "Server changes" section that only lists the v4.4.6 entry, and the only `.server-changes/` deletion in its diff will be `fix-rollback-schedule-sync.md`. ## Related - triggerdotdev#3505 is the proper underlying fix — collapses the three-job graph into a single atomic commit by `changesets/action` so this race can't strand the cleanup again. This PR is just the one-time catch-up for the files that already got stranded.
…dotdev#3504) Reported by external contributor. The supervisor template hardcoded a short DNS name for `OTEL_EXPORTER_OTLP_ENDPOINT`, which the supervisor then propagates verbatim into runner pods (`apps/supervisor/src/workloadManager/kubernetes.ts:196`). When runners are spawned in a different namespace via `supervisor.config.kubernetes.namespace`, the short name doesn't resolve and span/log export silently fails - runs complete fine but the dashboard shows nothing. Same FQDN pattern the chart already uses for `TRIGGER_WORKLOAD_API_DOMAIN` (line 203). Verified with `helm template trigger . --namespace my-ns` - renders `http://trigger-webapp.my-ns.svc.cluster.local:3030/otel`. Cheers Niels
<img width="2284" height="2028" alt="CleanShot 2026-05-01 at 18 53 50@2x" src="https://github.com/user-attachments/assets/4f58cbb1-0168-40fb-a523-017f2ba625a1" /> ## Performance - **Per-request DB hit**: `getUserId` runs `getEffectiveSessionDuration` (User lookup + Org `aggregate`) on *every* authenticated request, including each fetcher poll. Consider caching the effective duration in the session cookie with a short TTL (e.g. 60s) and revalidating in the background. - **Double session commit in `root.tsx`**: `getUser` already runs the expiry check; then `commitAuthenticatedSessionLazy` commits the cookie again. Fine, but doubles `Set-Cookie` headers on every page load — worth a quick perf check. ## Correctness / Edge cases - **Lazy backfill assumes a root.tsx hit first**: users whose first post-deploy request is a fetcher/API route (`/resources/*`) skip the backfill until they navigate to a page. Not a security hole, but `getUserId` could backfill itself for completeness. - **No upper bound on `Organization.maxSessionDuration`**: admin API accepts `1` second, which would instant-logout every member on next request. Add a `min(60)` (or `min(300)` to match the lowest user option) to the Zod schema. - **No clock-skew tolerance**: `isSessionExpired` is exact-millisecond. Multi-instance deploys with skewed clocks could log users out a few seconds early/late. Probably fine for the 5-min minimum, but worth noting. ## Security - **Auto-logout audit log lacks IP/orgId**: HIPAA forensics typically wants source IP and which org context. Currently logs only `userId` + path. IP isn't PII for audit purposes; orgIds help correlate. Add both. - **Cookie `Max-Age` is 1 year regardless of user's setting**: intentional (server-side `issuedAt` is the source of truth), but reviewers will ask. Add a one-line comment on the cookie config explaining why. ## API surface - **`maxSessionDuration` is admin-PAT only**: no in-app UI for org owners to set/change their own cap. If this is "Trigger staff sets it during HIPAA onboarding", say so in the PR description; otherwise add an org-settings UI. - **Auto-submit dropdown has no confirmation**: misclicking "5 minutes" immediately shortens the user's session window with no undo. Consider a save button or 3-sec undo toast. ## Schema / migration - **`User.sessionDuration NOT NULL DEFAULT 31556952`**: instant on PG 11+ (metadata-only), but call out in the PR description so reviewers don't worry about a table rewrite on the User table. - **No DB-level constraint matching `SESSION_DURATION_OPTIONS`**: if the option list changes, existing users keep orphaned values. The dropdown's tag-along behaviour hides this — fine for now, but if you ever drop an option you'll need a backfill. ## UX - **Session expiry only fires on next request**: an idle authenticated tab keeps showing UI past the cap (until SSE/polling catches it, ~60s). Add a client-side timer based on the user's effective duration that triggers a fetcher to `/account` or `/logout` at expiry. - **No "you were signed out" message on logout**: users hitting their cap are bounced to `/` with no explanation. Was intentionally reverted in this PR — call that out so reviewers don't request it. ## Tests - Unit coverage on `sessionDuration.server.ts` is solid (215 lines). Missing: integration test for `getUserId` → expired session → redirect to `/logout`, and one for the loader's clamping fix (the most recent bug). Add at least the second one to lock in the regression. 
--------- Co-authored-by: Matt Aitken <matt@mattaitken.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iCache IP-finding tip and NLB inbound-rules step (triggerdotdev#3517)
Adds zizmor alongside the actionlint job from triggerdotdev#3503. Both now run as parallel jobs in a single `.github/workflows/workflow-checks.yml`, triggered on `.github/workflows/**` and `.github/actions/**` changes. Zizmor is configured with `unpinned-uses: hash-pin` policy via `.github/zizmor.yml`, so any future unpinned action will fail CI. Findings upload SARIF to the Security tab alongside CodeQL. Bulk of the diff is cleanup of the findings zizmor surfaced on first run. `zizmor --fix=all` handled most of them mechanically; the rest were judgment calls.
## Summary Move from a single shared S2 basin to **per-org basins** with retention tied to the org's billing plan. Stops S2 from deleting streams out from under live chat sessions when basin retention fires before the chat ends, and unlocks per-org cost attribution. OSS / s2-lite installs are unaffected: provisioning is gated by `REALTIME_STREAMS_PER_ORG_BASINS_ENABLED` (default `false`), and the read precedence falls back to the global basin env var when an entity has no stamped basin. ``` basin = run.streamBasinName ?? session.streamBasinName ?? env.REALTIME_STREAMS_S2_BASIN ``` ## Design Three nullable `streamBasinName` columns (`Organization`, `TaskRun`, `Session`) plus a provisioner that idempotently creates the basin and reconfigures retention on plan changes. The trigger and session-create paths stamp the org's basin onto new rows; the realtime read path picks the basin from the entity context. Admin routes back-fill existing orgs and force-reconfigure a single org. ## Test plan - [x] `pnpm run typecheck --filter webapp --filter @internal/run-engine` - [x] Backfill admin route end-to-end (provision + DB stamp + S2 basin config). - [x] Reconfigure on plan change (all retention tiers). - [x] chat.agent multi-turn drives streams into the per-org basin. - [x] Legacy fallback when entity has no stamped basin. - [x] Provisioner is a no-op when the flag is off.
triggerdotdev#3524) Fixes triggerdotdev#3520. The bundled bitnami clickhouse subchart was pinned at `9.3.7` (clickhouse `25.6.1-debian-12-r0`), which hits a memory-tracker accounting bug under sustained ingest - the global counter overflows to ~7 EiB and every query gets rejected by OvercommitTracker until the pod is restarted. Self-hosters running 4.0.5 through 4.4.5 are exposed regardless of chart version since the subchart pin hadn't moved. Bumping to `9.4.4` (clickhouse `25.7.5-debian-12-r0`) pulls in the 25.7.x memory-tracker fixes. This is also the latest publicly packaged release at `oci://registry-1.docker.io/bitnamicharts` - that registry has been frozen since 2025-08-28 (Bitnami catalog changes), but the chart source remains under Apache 2 on `bitnami/charts`. The image continues to resolve via `bitnamilegacy/clickhouse` per the existing `values.yaml` override, since `bitnami/clickhouse` itself moved to paid-only. Verified locally: `helm dependency update` + `helm lint` + `helm template` + kubeconform across all 57 rendered manifests. Rendered statefulset image is `docker.io/bitnamilegacy/clickhouse:25.7.5-debian-12-r0`.
…rdotdev#3523) `dac9c83bd` added `ignoreErrors: /^ServiceValidationError(?::|$)/` in `apps/webapp/sentry.server.ts` to drop SVEs before they reach Sentry. The filter only matches when the captured event's *type* is `ServiceValidationError`, but nine call sites in the webapp catch SVE (and analogous user-input error types — `OutOfEntitlementError`, `CreateDeclarativeScheduleError`, `QueryError`) and call `logger.error("wrapper message", { error: e })` *before* the type check. The captured event is then titled with the wrapper message, with the inner error buried in `extra.error` — invisible to the SDK filter. Result: a steady stream of expected user-input failures escalating as `error`-level events when they should be `warn`. Each catch block now type-discriminates first, logs expected types at `warn`, and keeps unknown-error fall-throughs at `error`. For service sites that wrap into SVE (`createBackgroundWorker`, `createDeploymentBackgroundWorkerV4`), the inner error is logged at `error` before wrapping — mirrors the `waitpointCompletionPacket.server.ts` pattern from `dac9c83bd`. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
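A minimal, self-contained sketch of the per-call-site pattern (stand-in error classes and logger; the real types come from the webapp): type-discriminate first, log expected types at `warn`, keep unknown-error fall-throughs at `error`.

```typescript
// Stand-ins for the webapp's error types and logger, so the sketch is self-contained.
class ServiceValidationError extends Error {}
class OutOfEntitlementError extends Error {}
const logger = {
  warn: (msg: string, meta?: unknown) => console.warn(msg, meta),
  error: (msg: string, meta?: unknown) => console.error(msg, meta),
};

async function doWork(): Promise<void> {
  throw new ServiceValidationError("queue limit exceeded");
}

export async function handler() {
  try {
    await doWork();
  } catch (e) {
    if (e instanceof ServiceValidationError || e instanceof OutOfEntitlementError) {
      // Expected user-input failure: warn level, never escalates as an error event.
      logger.warn("Expected user-input failure", { error: e });
      return; // surfaced to the caller as a 4xx elsewhere
    }
    // Unknown errors keep the original error-level logging.
    logger.error("Unexpected failure", { error: e });
    throw e;
  }
}
```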
… create (triggerdotdev#3525) Large deploys (projects with many tasks or source files) blocked the webapp event loop for several seconds inside Prisma's client-side serializer on `BackgroundWorker.create`, tail-latencying every other in-flight request on the same Node process. The `metadata` JSON column was being written with the full deploy manifest — every task's config, every queue and prompt, and the full source of every file — all of which already live on dedicated columns or in dedicated tables. Fix: project the manifest to `{ packageVersion, contentHash, tasks: [{ id, filePath, schedule }] }` on insert. The only post-write read site is `changeCurrentDeployment`, which feeds `tasks[].schedule` into `syncDeclarativeSchedules` at deploy promotion. The retained top-level keys and per-task `filePath` are kept solely so `BackgroundWorkerMetadata.safeParse` still succeeds on read. ## Test plan - [ ] Deploy a project with declarative schedules; verify schedules are created on first deploy - [ ] Modify / remove schedules across subsequent deploys; verify sync - [ ] Roll back to a previous deploy; verify `changeCurrentDeployment` re-syncs schedules - [ ] Inspect `BackgroundWorker.metadata` on a fresh deploy — should be a small object, not the full manifest
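A minimal sketch of the projection described above. Field names beyond `{ packageVersion, contentHash, tasks: [{ id, filePath, schedule }] }` are assumptions for illustration.

```typescript
type DeployManifest = {
  packageVersion: string;
  contentHash: string;
  tasks: Array<{ id: string; filePath: string; schedule?: unknown; [key: string]: unknown }>;
};

function projectMetadata(manifest: DeployManifest) {
  return {
    packageVersion: manifest.packageVersion,
    contentHash: manifest.contentHash,
    // Keep only what changeCurrentDeployment reads (tasks[].schedule), plus the
    // top-level keys and per-task filePath that BackgroundWorkerMetadata.safeParse
    // needs to succeed on read. Everything else lives on dedicated columns/tables.
    tasks: manifest.tasks.map(({ id, filePath, schedule }) => ({ id, filePath, schedule })),
  };
}
```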
…dev#3528) - Tags webapp images by full commit SHA on `main` pushes (`ghcr.io/triggerdotdev/trigger.dev:<sha>`) so any commit can be resolved to a digest easily. - Adds OCI labels (`source`, `revision`, `version`, `created`) so `docker inspect`, vulnerability scanners, and registry browsers see source/commit/version directly. - Signs each pushed digest with SLSA build provenance via `actions/attest-build-provenance@v4.1.0` (pinned by SHA), enabling `gh attestation verify oci://...` against the source commit and workflow.
…xDuration (TRI-9117) (triggerdotdev#3529) When a Node EventEmitter (e.g. node-redis) emits an "error" event with no listener attached, Node escalates it to process.on("uncaughtException") in the task worker. The worker reported the error via the UNCAUGHT_EXCEPTION IPC event but did not exit, and the supervisor-side handler in taskRunProcess only logged the message at debug level — leaving the run() promise orphaned until maxDuration fired and producing empty attempts (durationMs=0, costInCents=0). The supervisor now rejects the in-flight attempt with an UncaughtExceptionError and gracefully terminates the worker (preserving the OTEL flush window) on UNCAUGHT_EXCEPTION. The attempt fails fast with TASK_EXECUTION_FAILED, surfacing the original error name, message, and stack trace, and falls under the normal retry policy. This mirrors the existing indexing-side behavior in indexWorkerManifest. Apply the same handling to unhandled promise rejections, which Node already routes through uncaughtException by default.
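A minimal sketch of the supervisor-side handling described above; the handler, helper names, and IPC surface are assumptions modelled on the PR text, not the actual `taskRunProcess` code.

```typescript
// Hypothetical stubs standing in for supervisor internals.
declare const ipc: { on(event: string, handler: (payload: any) => void): void };
declare function rejectCurrentAttempt(error: Error): void;
declare function gracefullyTerminateWorker(): Promise<void>;

class UncaughtExceptionError extends Error {
  constructor(original: { name: string; message: string; stack?: string }) {
    super(original.message);
    this.name = "UncaughtExceptionError";
    this.stack = original.stack; // surface the worker's original stack trace
  }
}

ipc.on("UNCAUGHT_EXCEPTION", async (event: { error: { name: string; message: string; stack?: string } }) => {
  // Reject the in-flight attempt instead of logging at debug and leaving run()
  // orphaned until maxDuration fires; the attempt then fails fast upstream.
  rejectCurrentAttempt(new UncaughtExceptionError(event.error));
  // Graceful termination preserves the OTEL flush window before the worker dies.
  await gracefullyTerminateWorker();
});
```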
…otdev#3532) ## Summary Both Claude Code workflows (`claude.yml` and `claude-md-audit.yml`) authenticated via `CLAUDE_CODE_OAUTH_TOKEN`, which broke when the org disabled Claude subscription access for Claude Code: > Your organization has disabled Claude subscription access for Claude Code · Use an Anthropic API key instead, or ask your admin to enable access This switches both workflows to `anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}` (secret already added to the repo). ## Test plan - [ ] Confirm `📝 CLAUDE.md Audit` runs to completion on this PR - [ ] Confirm `@claude` mention in a PR comment still triggers the `Claude Code` workflow successfully
…dotdev#3531) ## Summary Stamps the active OpenTelemetry `trace_id` and `span_id` onto every Sentry event captured from the webapp, so engineers can copy a `trace_id` from a Sentry issue and search for the corresponding trace in any OTel-aware backend. Also adds an `otel_sampled` tag to indicate whether the trace was head-sampled — a cheap signal for whether the link will resolve to span data or hit a missing trace. ## Why Sentry and OTel were OTel-disconnected: `apps/webapp/sentry.server.ts` initialised Sentry with `skipOpenTelemetrySetup: true`, and no error-capture site (`logger.server.ts`, the Remix-wrapped `handleError`, the root `ErrorBoundary`) attached OTel context to the event. With many spans/sec across services, getting from a Sentry issue to its trace was guesswork. ## Approach Single global Sentry event processor, registered immediately after `Sentry.init`. On each event it reads `trace.getActiveSpan()?.spanContext()` via `@opentelemetry/api`, then writes: - `event.contexts.trace.trace_id` and `event.contexts.trace.span_id` (Sentry's native trace context fields) - `event.tags.otel_sampled` = `"true"` | `"false"` (derived from `traceFlags`) If no active span (module-load errors, scheduled timers without a context, primary cluster process), the processor returns the event unmodified — Sentry's default propagation context fills in. Implementation is co-located in `apps/webapp/sentry.server.ts` (no separate helper module — `sentry.server.ts` is built standalone by esbuild and a separate import would have required a new bundling step). Helper functions are exported so the unit tests can reach them without re-running `Sentry.init`. ## Non-goals (deliberate) - No sample rate change. ~95% of Sentry events will carry a `trace_id` that returns no spans in the tracing backend (head-sampled out). The `otel_sampled` tag makes that obvious at a glance. Raising find-rate is a separate conversation with cost trade-offs. - No user/org tags or `Sentry.setUser` (would need auth-helper + per-request scope wiring across multiple worker entrypoints — separate ticket). - Webapp image only. No changes to supervisor or CLI workers. ## Test plan - [x] Unit tests in `apps/webapp/test/sentryTraceContext.server.test.ts` — 9 tests covering: helper returns \`undefined\` with no active span; returns \`traceId\`/\`spanId\`/\`sampled=true\` for a recording span; returns \`sampled=false\` for a non-recording span; processor leaves the event unchanged with no active span; processor stamps \`trace_id\`/\`span_id\` onto \`contexts.trace\`; preserves existing \`contexts.trace\` fields; tags \`otel_sampled\` correctly for both sampled and non-sampled cases; never throws if \`@opentelemetry/api\` access throws. - [x] \`pnpm run typecheck --filter webapp\` passes. - [x] Manually verified end-to-end against a sandboxed Sentry project: confirmed both sampled and non-sampled traces correctly populate \`contexts.trace.trace_id\` matching the OTel ids logged from the loader, and the \`otel_sampled\` tag appears with the expected value. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
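A minimal sketch of the processor, assuming roughly this shape in `apps/webapp/sentry.server.ts`, registered immediately after `Sentry.init`.

```typescript
import * as Sentry from "@sentry/node";
import { trace, TraceFlags } from "@opentelemetry/api";

Sentry.addEventProcessor((event) => {
  const spanContext = trace.getActiveSpan()?.spanContext();
  if (!spanContext) {
    // No active span (module-load errors, timers without a context): leave the
    // event untouched so Sentry's default propagation context fills in.
    return event;
  }
  event.contexts = {
    ...event.contexts,
    trace: {
      ...event.contexts?.trace,
      trace_id: spanContext.traceId,
      span_id: spanContext.spanId,
    },
  };
  // otel_sampled signals whether the linked trace will actually resolve to span data.
  const sampled = (spanContext.traceFlags & TraceFlags.SAMPLED) === TraceFlags.SAMPLED;
  event.tags = { ...event.tags, otel_sampled: sampled ? "true" : "false" };
  return event;
});
```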
…iggerdotdev#3536) When a webapp API route's catch-all 500 branch handles a non-typed exception, it returns the raw `error.message` to the caller. If the exception originates from an internal subsystem (the ORM client, an infra dependency, etc.) the server-side error string is surfaced verbatim in the response body — exposing implementation details the API surface shouldn't carry. The leak shows up in three shapes across the routes: - `return json({ error: error.message }, { status: 500 })` - `return json({ error: error instanceof Error ? error.message : "Internal Server Error" }, { status: 500 })` - ``return json({ error: `Internal server error: ${error.message}` }, { status: 500 })`` (plus a couple of analogous neverthrow-Result variants on admin routes.) ## Fix Across 19 webapp routes, replace each leaking branch with a generic body (`"Something went wrong"` / `"Internal Server Error"` to match the file's existing fallback) and add `logger.error(...)` so full visibility is preserved server-side. Catch blocks that branch on typed user-input errors (`ServiceValidationError`, `EngineServiceValidationError`, `OutOfEntitlementError`, `PrismaClientKnownRequestError`) are left intact — those messages are constructed deliberately and intended to be customer-facing. ## Test plan - [x] `pnpm run typecheck --filter webapp` - [x] Per-route manual probe: inject a synthetic `Error` at the top of the catch'd `try` block (or fake the wrapped call's rejection / Result error), curl the route with the dev API key, confirm the response body changed from the synthetic message verbatim → generic body. 21/21 leak sites verified end-to-end. - [x] 4xx-typed-error paths spot-checked: throwing `ServiceValidationError` from inside the catch'd try still surfaces its message at 422 as intended.
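A minimal sketch of the before/after pattern in one route's catch-all branch; the logger stand-in and route body are illustrative.

```typescript
import { json } from "@remix-run/node";

// Stand-in logger so the sketch is self-contained.
const logger = { error: (msg: string, meta?: unknown) => console.error(msg, meta) };

export async function action({ request }: { request: Request }) {
  try {
    // ...route work that may throw an internal (non-typed) exception...
    return json({ ok: true });
  } catch (error) {
    // Before: return json({ error: error instanceof Error ? error.message : "Internal Server Error" }, { status: 500 });
    // After: keep the full detail server-side, return a generic body to the caller.
    logger.error("Route failed", { error });
    return json({ error: "Internal Server Error" }, { status: 500 });
  }
}
```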
- defineConfig: resolves maxComputeSeconds ?? maxDuration into maxDuration - new resolveMaxComputeSeconds helper for shared.ts - task definitions, trigger and batchTrigger options funnel through helper Internal references to maxDuration (run engine, queues, DB) are unchanged.
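A minimal sketch of the helper's shape, assuming roughly this signature in `shared.ts`; not the actual implementation.

```typescript
type MaxComputeOptions = {
  maxComputeSeconds?: number;
  /** @deprecated use maxComputeSeconds instead */
  maxDuration?: number;
};

// New-style maxComputeSeconds wins when both are set; downstream internals
// (run engine, queues, DB) keep reading the resolved maxDuration value.
export function resolveMaxComputeSeconds(options: MaxComputeOptions): number | undefined {
  return options.maxComputeSeconds ?? options.maxDuration;
}
```

defineConfig, task definitions, and the trigger/batchTrigger option paths would all funnel through this one resolution point.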
…sting - trigger.config.ts: top-level maxComputeSeconds - example.ts: per-task maxComputeSeconds on maxDurationTask - example.ts: per-trigger override on triggerAndWait Variable name maxDurationTask kept since it labels the legacy fixture concept; the payload.maxDuration field is unrelated to the SDK property and untouched.
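A minimal sketch of the three fixture placements, assuming the PR's `maxComputeSeconds` surface; ids and values are illustrative.

```typescript
import { defineConfig, task } from "@trigger.dev/sdk";

// trigger.config.ts — top-level default
export default defineConfig({
  project: "proj_example", // placeholder project ref
  maxComputeSeconds: 300,
});

// example.ts — per-task override on the legacy fixture
export const maxDurationTask = task({
  id: "max-duration-task",
  maxComputeSeconds: 60,
  run: async (payload: { maxDuration: number }) => {
    // payload.maxDuration is a fixture field, unrelated to the SDK property
    return payload.maxDuration;
  },
});

// example.ts — per-trigger override on triggerAndWait
export const callerTask = task({
  id: "caller-task",
  run: async () => {
    await maxDurationTask.triggerAndWait({ maxDuration: 10 }, { maxComputeSeconds: 120 });
  },
});
```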
If a user exports a trigger.config plain object without going through defineConfig() (TypeScript allows it), validation previously rejected the new maxComputeSeconds field with an error mentioning maxDuration. Mirror the SDK boundary's resolution at the CLI boundary so downstream internals (which still read maxDuration) keep working.
Adds JSDoc to the defineConfig function to improve docstring coverage.
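An illustrative sketch of the kind of JSDoc added — hypothetical wording and types, not the PR's exact docstring.

```typescript
type TriggerConfig = {
  project: string;
  /** Maximum compute time in seconds for runs in this project. */
  maxComputeSeconds?: number;
  /** @deprecated use maxComputeSeconds instead */
  maxDuration?: number;
};

/**
 * Defines the configuration for a Trigger.dev project.
 *
 * Resolves `maxComputeSeconds ?? maxDuration` into the `maxDuration` value
 * that downstream internals (run engine, queues, DB) continue to read.
 *
 * @param config - Project configuration, including the top-level
 *   `maxComputeSeconds` compute-time cap.
 * @returns The config with the compute cap resolved onto `maxDuration`.
 */
export function defineConfig(config: TriggerConfig): TriggerConfig {
  return { ...config, maxDuration: config.maxComputeSeconds ?? config.maxDuration };
}
```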
  id: release
- uses: softprops/action-gh-release@v1
+ uses: softprops/action-gh-release@b4309332981a82ec1c5618f44dd2e27cc8bfbfda # v3.0.0
  if: github.event_name == 'push'
Adds JSDoc documentation to the defineConfig function, improving docstring coverage to meet the 80% threshold required by CI.
Summary by cubic
Adds JSDoc to `defineConfig` and introduces `maxComputeSeconds` as a clearer replacement for `maxDuration` across config, tasks, and trigger options. Keeps backward-compatible behavior and satisfies CI docstring coverage.
Migration
- `maxComputeSeconds` in `defineConfig`, task definitions, and trigger options. `maxDuration` is JSDoc-deprecated but still accepted; if both are set, `maxComputeSeconds` wins.
Bug Fixes
- Runs now fail fast on `uncaughtException` instead of drifting to max duration, respecting normal retry policy.
Written for commit f89e3de. Summary will update on new commits.