fix(webapp): notification style updates #5
Open
deepshekhardas wants to merge 301 commits into
Conversation
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and publish to npm yourself, or [set up this action to publish automatically](https://github.com/changesets/action#with-publishing). If you're not ready to do a release yet, that's fine: whenever you add more changesets to main, this PR will be updated.

# Releases

## @trigger.dev/sdk@4.4.0

### Minor Changes

- Added `query.execute()`, which lets you query your Trigger.dev data using TRQL (Trigger Query Language) and returns results as typed JSON rows or CSV. It supports configurable scope (environment, project, or organization), time filtering via `period` or `from`/`to` ranges, and a `format` option for JSON or CSV output. ([triggerdotdev#3060](triggerdotdev#3060))

  ```typescript
  import { query } from "@trigger.dev/sdk";
  import type { QueryTable } from "@trigger.dev/sdk";

  // Basic untyped query
  const result = await query.execute("SELECT run_id, status FROM runs LIMIT 10");

  // Type-safe query using QueryTable to pick specific columns
  const typedResult = await query.execute<QueryTable<"runs", "run_id" | "status" | "triggered_at">>(
    "SELECT run_id, status, triggered_at FROM runs LIMIT 10"
  );

  typedResult.results.forEach((row) => {
    console.log(row.run_id, row.status); // Fully typed
  });

  // Aggregation query with inline types
  const stats = await query.execute<{ status: string; count: number }>(
    "SELECT status, COUNT(*) as count FROM runs GROUP BY status",
    { scope: "project", period: "30d" }
  );

  // CSV export
  const csv = await query.execute("SELECT run_id, status FROM runs", {
    format: "csv",
    period: "7d",
  });

  console.log(csv.results); // Raw CSV string
  ```

### Patch Changes

- Add a `maxDelay` option to the debounce feature. This allows setting a maximum time limit for how long a debounced run can be delayed, ensuring execution happens within a specified window even with continuous triggers. ([triggerdotdev#2984](triggerdotdev#2984))

  ```typescript
  await myTask.trigger(payload, {
    debounce: {
      key: "my-key",
      delay: "5s",
      maxDelay: "30m", // Execute within 30 minutes regardless of continuous triggers
    },
  });
  ```

- Aligned the SDK's `getRunIdForOptions` logic with the Core package to handle semantic targets (`root`, `parent`) in root tasks. ([triggerdotdev#2874](triggerdotdev#2874))

- Export the `AnyOnStartAttemptHookFunction` type to allow defining `onStartAttempt` hooks for individual tasks. ([triggerdotdev#2966](triggerdotdev#2966))

- Fixed a minor issue in the deployment command when distinguishing between local builds for the cloud vs. local builds for self-hosting setups. ([triggerdotdev#3070](triggerdotdev#3070))

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`

## @trigger.dev/build@4.4.0

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`

## trigger.dev@4.4.0

### Patch Changes

- Fix the runner getting stuck indefinitely when `execute()` is called on a dead child process. ([triggerdotdev#2978](triggerdotdev#2978))

- Add an optional `timeoutInSeconds` parameter to the `wait_for_run_to_complete` MCP tool. Defaults to 60 seconds. If the run doesn't complete within the timeout, the current state of the run is returned instead of waiting indefinitely. ([triggerdotdev#3035](triggerdotdev#3035))

- Fixed a minor issue in the deployment command when distinguishing between local builds for the cloud vs. local builds for self-hosting setups. ([triggerdotdev#3070](triggerdotdev#3070))

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`
  - `@trigger.dev/build@4.4.0`
  - `@trigger.dev/schema-to-json@4.4.0`

## @trigger.dev/core@4.4.0

### Patch Changes

- Add a `maxDelay` option to the debounce feature. This allows setting a maximum time limit for how long a debounced run can be delayed, ensuring execution happens within a specified window even with continuous triggers. ([triggerdotdev#2984](triggerdotdev#2984))

  ```typescript
  await myTask.trigger(payload, {
    debounce: {
      key: "my-key",
      delay: "5s",
      maxDelay: "30m", // Execute within 30 minutes regardless of continuous triggers
    },
  });
  ```

- Fixed a minor issue in the deployment command when distinguishing between local builds for the cloud vs. local builds for self-hosting setups. ([triggerdotdev#3070](triggerdotdev#3070))

- fix: vendor superjson to fix ESM/CJS compatibility ([triggerdotdev#2949](triggerdotdev#2949))

  Bundle superjson during the build to avoid `ERR_REQUIRE_ESM` errors on Node.js versions that don't support `require(ESM)` by default (< 22.12.0) and on AWS Lambda, which intentionally disables it.

- Add Vercel integration support to API schemas: `commitSHA` and `integrationDeployments` on deployment responses, and a `source` field for environment variable imports. ([triggerdotdev#2994](triggerdotdev#2994))

## @trigger.dev/python@4.4.0

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`
  - `@trigger.dev/sdk@4.4.0`
  - `@trigger.dev/build@4.4.0`

## @trigger.dev/react-hooks@4.4.0

### Patch Changes

- Fix the `onComplete` callback firing prematurely when the realtime stream disconnects before the run finishes. ([triggerdotdev#2929](triggerdotdev#2929))

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`

## @trigger.dev/redis-worker@4.4.0

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`

## @trigger.dev/rsc@4.4.0

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`

## @trigger.dev/schema-to-json@4.4.0

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.0`

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
- Adds an end-to-end OTEL metrics pipeline: task workers collect and export metrics via OpenTelemetry, the webapp ingests them into ClickHouse, and they're queryable through the existing dashboard query engine
- Workers emit process CPU/memory metrics (via `@opentelemetry/host-metrics`) and Node.js runtime metrics (event loop utilization, event loop delay, heap usage)
- Users can create custom metrics in their tasks via `otel.metrics.getMeter()` from `@trigger.dev/sdk`
- Metrics are automatically tagged with run context (run ID, task slug, machine, worker version) so they can be sliced per-run, per-task, or per-machine
- The TSQL query engine gains metrics table support with typed attribute columns, `prettyFormat()` for human-readable values, and per-schema time bucket thresholds
- Includes reference tasks (`references/hello-world/src/trigger/metrics.ts`) demonstrating CPU-intensive, memory-ramp, bursty workload, and custom metrics patterns

## What changed

### Metrics collection (packages/core, packages/cli-v3)

- **Metrics export pipeline** — `TracingSDK` now sets up a `MeterProvider` with a `PeriodicExportingMetricReader` that chains through `TaskContextMetricExporter` (adds run context attributes) and `BufferingMetricExporter` (batches exports to reduce overhead)
- **Host metrics** — Enabled `@opentelemetry/host-metrics` for process CPU, memory, and system-level metrics
- **Node.js runtime metrics** — New `nodejsRuntimeMetrics.ts` module using `performance.eventLoopUtilization()`, `monitorEventLoopDelay()`, and `process.memoryUsage()` to emit 6 observable gauges
- File system and diskio metrics
- **Custom metrics** — Exposed `otel.metrics` from `@trigger.dev/sdk` so users can create counters, histograms, and gauges in their tasks (a usage sketch follows this section)
- **Machine ID** — Stable per-worker machine identifier for grouping metrics
- **Dev worker** — Drops `system.*` metrics to reduce noise, keeps sending metrics between runs in warm workers

### Metrics ingestion (apps/webapp)

- **OTEL endpoint** — `otel.v1.metrics.ts` accepts OTEL metric export requests (JSON and protobuf), converts to ClickHouse rows
- **ClickHouse schema** — `017_create_metrics_v1.sql` with 10-second aggregation buckets, JSON attributes column, 60-day TTLs

### Query engine (internal-packages/tsql, apps/webapp)

- **Metrics query schema** — Typed columns for metric attributes (`task_identifier`, `run_id`, `machine_name`, `worker_version`, etc.) extracted from the JSON attributes column
- **`prettyFormat()`** — TSQL function that annotates columns with format hints (`bytes`, `percent`, `durationSeconds`) for frontend rendering without changing the underlying data
- **Per-schema time buckets** — Different tables can define their own time bucket thresholds (metrics uses tighter intervals than runs)
- **AI query integration** — The AI query service knows about the metrics table and can generate metric queries
- **Chart improvements** — Better formatting for byte values, percentages, and durations in charts and tables

### Reference project

- **`references/hello-world/src/trigger/metrics.ts`** — 6 example tasks: `cpu-intensive`, `memory-ramp`, `bursty-workload`, `sustained-workload`, `concurrent-load`, `custom-metrics`

## Test plan

- [ ] Build all packages and webapp
- [ ] Start dev worker with hello-world reference project
- [ ] Run `cpu-intensive`, `memory-ramp`, and `custom-metrics` tasks
- [ ] Verify metrics in ClickHouse: `SELECT DISTINCT metric_name FROM metrics_v1`
- [ ] Query via dashboard AI: "show me CPU utilization over time"
- [ ] Verify `prettyFormat` renders correctly in chart tooltips and table cells
- [ ] Confirm dev worker drops `system.*` metrics but keeps `process.*` and `nodejs.*`
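The custom-metrics surface mentioned above (`otel.metrics.getMeter()` from `@trigger.dev/sdk`) presumably returns a standard OpenTelemetry Meter, so the usual instrument APIs apply. A minimal sketch of what a task-level custom metric could look like (the task id, metric names, and attributes are illustrative, not taken from the repo):

```typescript
import { task, otel } from "@trigger.dev/sdk";

export const processOrder = task({
  id: "process-order", // illustrative task id
  run: async (payload: { orderId: string; items: number }) => {
    // getMeter() hands back an OpenTelemetry Meter, so counters,
    // histograms, and gauges are created with the usual OTel API.
    const meter = otel.metrics.getMeter("orders");

    const processedCounter = meter.createCounter("orders.processed", {
      description: "Number of orders processed",
    });
    const itemsHistogram = meter.createHistogram("orders.items_per_order", {
      description: "Items contained in each processed order",
    });

    // ... real work would happen here ...

    // Run context (run ID, task slug, machine, worker version) is attached by
    // the export pipeline; extra attributes here are optional.
    processedCounter.add(1, { region: "eu-west-1" });
    itemsHistogram.record(payload.items);

    return { orderId: payload.orderId };
  },
});
```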
…ookie call (triggerdotdev#3104) Co-authored-by: Oskar Otwinowski <oskar.otwinowski@gmail.com>
…-from-v3 (triggerdotdev#3098) Adds a deprecation warning at the top of the migrating-from-v3 page and updates the “Migrate using AI” prompt and intro
…ctory warning (triggerdotdev#3097) Adds a direct Vercel Marketplace link, documents configuring build options via the project config page, and adds a warning and workaround for projects using a Vercel Root Directory
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and publish to npm yourself, or [set up this action to publish automatically](https://github.com/changesets/action#with-publishing). If you're not ready to do a release yet, that's fine: whenever you add more changesets to main, this PR will be updated.

# Releases

## @trigger.dev/build@4.4.1

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.1`

## trigger.dev@4.4.1

### Patch Changes

- Add an OTEL metrics pipeline for task workers. Workers collect process CPU/memory, Node.js runtime metrics (event loop utilization, event loop delay, heap usage), and user-defined custom metrics via `otel.metrics.getMeter()`. Metrics are exported to ClickHouse with 10-second aggregation buckets and 1m/5m rollups, and are queryable through the dashboard query engine with typed attribute columns, `prettyFormat()` for human-readable values, and AI query support. ([triggerdotdev#3061](triggerdotdev#3061))

- Updated dependencies:
  - `@trigger.dev/build@4.4.1`
  - `@trigger.dev/core@4.4.1`
  - `@trigger.dev/schema-to-json@4.4.1`

## @trigger.dev/python@4.4.1

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/sdk@4.4.1`
  - `@trigger.dev/build@4.4.1`
  - `@trigger.dev/core@4.4.1`

## @trigger.dev/react-hooks@4.4.1

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.1`

## @trigger.dev/redis-worker@4.4.1

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.1`

## @trigger.dev/rsc@4.4.1

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.1`

## @trigger.dev/schema-to-json@4.4.1

### Patch Changes

- Updated dependencies:
  - `@trigger.dev/core@4.4.1`

## @trigger.dev/sdk@4.4.1

### Patch Changes

- Add an OTEL metrics pipeline for task workers. Workers collect process CPU/memory, Node.js runtime metrics (event loop utilization, event loop delay, heap usage), and user-defined custom metrics via `otel.metrics.getMeter()`. Metrics are exported to ClickHouse with 10-second aggregation buckets and 1m/5m rollups, and are queryable through the dashboard query engine with typed attribute columns, `prettyFormat()` for human-readable values, and AI query support. ([triggerdotdev#3061](triggerdotdev#3061))

- Updated dependencies:
  - `@trigger.dev/core@4.4.1`

## @trigger.dev/core@4.4.1

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Small fixes and improvements to the logs page:

- Clicking the Run ID didn't open the inspector
- Swapped the "open link in tab" icon for the Runs icon
- Prevent tooltip hovering on Level info

<img width="350" height="206" alt="CleanShot 2026-02-20 at 10 00 37@2x" src="https://github.com/user-attachments/assets/3e82f24a-c0a1-4c01-a8e9-9e06a8af982a" />
…rdotdev#3113) Without running an expensive query (like getting run counts), we can't tell whether a project is definitely v3. So let's just assume that if the project hasn't been upgraded to v4 (by running the dev/deploy CLI with v4) AND the project is older than the v4 release, then it's v3.
…#3108)

## ✅ Checklist

- [x] I have followed every step in the [contributing guide](https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md)
- [x] The PR title follows the convention.
- [x] I ran and tested the code works

---

## Testing

Slack + GitHub + Vercel + Builds + Deployments

---

## Changelog

Settings changes:

- Split general from integrations
- Add new Slack section to org level integrations

Vercel improvements:

- bugfix for TRIGGER_SECRET_KEY collision
- onboarding improvements for connecting to projects
- new loops event

Slack improvements:

- nicer alerts

Webhook/Email alerts:

- rich events with Github & Vercel integration data

---

## Screenshots

<img width="2550" height="652" alt="Screenshot 2026-02-20 at 21 53 34" src="https://github.com/user-attachments/assets/8d7c9f1d-5fe9-4516-8fb3-885460b4207f" />
<img width="843" height="710" alt="Screenshot 2026-02-23 at 10 55 54" src="https://github.com/user-attachments/assets/8ea72c1f-431b-493c-b9a9-8076cce12262" />
<img width="765" height="466" alt="Screenshot 2026-02-20 at 21 52 46" src="https://github.com/user-attachments/assets/157fafb8-b7bf-499d-8953-c2aed5e44ce0" />
<img width="691" height="261" alt="Screenshot 2026-02-20 at 22 04 24" src="https://github.com/user-attachments/assets/3aea7369-2008-4af8-a9c0-5fbfa2cc381d" />
<img width="2032" height="1114" alt="Screenshot 2026-02-19 at 14 48 49" src="https://github.com/user-attachments/assets/dc10c14e-cd15-445a-b5be-d694d29d20e5" />
<img width="2032" height="1114" alt="Screenshot 2026-02-19 at 14 49 04" src="https://github.com/user-attachments/assets/1ef591fd-fd00-430a-9649-8b18cff9586d" />
<img width="1583" height="1115" alt="Screenshot 2026-02-19 at 17 32 56" src="https://github.com/user-attachments/assets/c5c8f318-d193-4dd4-86f7-1cc4bbcc4e0c" />
<img width="422" height="187" alt="Screenshot 2026-02-20 at 21 57 41" src="https://github.com/user-attachments/assets/37865cb6-4c0d-40ef-9c60-7b057d546c61" />
<img width="1583" height="1115" alt="Screenshot 2026-02-19 at 17 33 06" src="https://github.com/user-attachments/assets/e9180e8e-e611-4734-9232-80c62ff863ad" />

💯
…aitpoint creation (triggerdotdev#2980)

This PR implements a new run TTL system and queue size limits to prevent unbounded queue growth, which should help avoid situations where a queue enters a "death spiral" and can never catch up.

The main/correct way to battle this situation is to enforce a maximum TTL on all runs (e.g. up to 14 days), where runs that have been queued for that maximum TTL get auto-expired, making room for newer runs to execute. This required creating a new TTL system that can handle higher workloads and is now deeply integrated into the RunQueue. When runs are enqueued with a TTL, they are added to their normal queue as well as to the TTL queue. When runs are dequeued, they are removed from both their normal queue and the TTL queue. If runs are dequeued by the TTL system, they are removed from their normal queue. Both of these dequeues happen automatically, so there is no race condition. The TTL expiration system is also made reliable by expiring runs via a Redis worker, which is enqueued atomically inside the TTL dequeue Lua script.

### Optional associated waitpoints

Additionally, this PR implements an optimization where runs that aren't triggered with a dependent parent run will no longer create an associated waitpoint. Associated waitpoints are then lazily created if a dependent run wants to wait for the child run post-facto (via debounce or idempotency), which is a rare but possible situation. This means fewer waitpoint creations, but also fewer waitpoint completions for runs with no dependencies.

### Environment Queue Limits

Prevents any single queue from growing too large by enforcing queue size limits at trigger time.

- Queue size checks happen at trigger time: runs are rejected if the queue would exceed the limit
- Dashboard UI shows queue limits on both the Queues page and a new Limits page
- In-memory caching for queue size checks to reduce Redis load

### Batch trigger fixes

Currently, when a batch item cannot be created for whatever reason (e.g. queue limits), the run never gets created, which means a stalled run if using `batchTriggerAndWait`. We've updated the system to handle this differently: now, when a batch item cannot be triggered and converted into a run, we will eventually (after retrying 8 times up to 30s) create a "pre-failed" run with the error details, correctly resolving the `batchTriggerAndWait`.
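For callers, the expiry side of this is driven by the per-run TTL. A hedged sketch assuming the SDK's standard `ttl` trigger option (the task module and payload are made up; the server-enforced maximum TTL described above applies on top of whatever the caller passes):

```typescript
import { myTask } from "./trigger/my-task"; // hypothetical task module

// If this run is still queued after 10 minutes it expires instead of executing,
// which is what keeps a backed-up queue from compounding. The platform-enforced
// maximum TTL caps this value even if a caller asks for something longer.
const handle = await myTask.trigger({ userId: "user_123" }, { ttl: "10m" });

console.log(handle.id);
```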
…rom the whole system (triggerdotdev#3115)
…tdev#3119) Adds a warning to the onCancel docs clarifying that the hook only fires when a run is actively executing
Adds API reference pages for three previously undocumented run endpoints: retrieve run events, retrieve run trace, and add tags to a run.
Adds OpenAPI specs and sidebar pages for four previously undocumented public endpoints: retrieve run result, per-task batch trigger, retrieve batch, and retrieve batch results.
What changed

- Fixed some functions like dateAdd, toString, ifNotFinite
- Removed all functions that accept lambdas as they're not supported (yet)
- Added tests for all TRQL functions that use ClickHouse
…lease PR (triggerdotdev#3085)

- Add the .server-changes/ convention for tracking server-only changes
- Create scripts/enhance-release-pr.mjs to deduplicate and categorize the changeset PR body
- Create scripts/generate-github-release.mjs to format a unified GitHub release body
- Change release.yml to create one unified GitHub release instead of per-package releases
- Add an update-release job to patch the Docker image link after images are pushed to GHCR
- Update changesets-pr.yml to trigger on .server-changes, enhance the PR body, and clean up consumed files
- Document server changes in CLAUDE.md, CONTRIBUTING.md, CHANGESETS.md, and RELEASE.md
…entifier instead of unknown (triggerdotdev#3080) Fixes triggerdotdev#2942
…erdotdev#3126)

- Fix for series color assignment being out of sync with the graph (ensures added series colors match their graph representation)
- Reload widgets when returning to the screen if props changed (prevents stale widgets after filtering and scrolling)
…f and fixing retry race (triggerdotdev#3079) Fix slow fair queue processing by removing spurious cooloff on concurrency blocks and fixing a race condition where retry attempt counts were not atomically updated during message re-queue. Removed cooloff entirely from the batch queue
Adds zizmor alongside the actionlint job from triggerdotdev#3503. Both now run as parallel jobs in a single `.github/workflows/workflow-checks.yml`, triggered on `.github/workflows/**` and `.github/actions/**` changes. Zizmor is configured with `unpinned-uses: hash-pin` policy via `.github/zizmor.yml`, so any future unpinned action will fail CI. Findings upload SARIF to the Security tab alongside CodeQL. Bulk of the diff is cleanup of the findings zizmor surfaced on first run. `zizmor --fix=all` handled most of them mechanically; the rest were judgment calls.
## Summary

Move from a single shared S2 basin to **per-org basins** with retention tied to the org's billing plan. Stops S2 from deleting streams out from under live chat sessions when basin retention fires before the chat ends, and unlocks per-org cost attribution.

OSS / s2-lite installs are unaffected: provisioning is gated by `REALTIME_STREAMS_PER_ORG_BASINS_ENABLED` (default `false`), and the read precedence falls back to the global basin env var when an entity has no stamped basin.

```
basin = run.streamBasinName ?? session.streamBasinName ?? env.REALTIME_STREAMS_S2_BASIN
```

## Design

Three nullable `streamBasinName` columns (`Organization`, `TaskRun`, `Session`) plus a provisioner that idempotently creates the basin and reconfigures retention on plan changes. The trigger and session-create paths stamp the org's basin onto new rows; the realtime read path picks the basin from the entity context. Admin routes back-fill existing orgs and force-reconfigure a single org.

## Test plan

- [x] `pnpm run typecheck --filter webapp --filter @internal/run-engine`
- [x] Backfill admin route end-to-end (provision + DB stamp + S2 basin config).
- [x] Reconfigure on plan change (all retention tiers).
- [x] chat.agent multi-turn drives streams into the per-org basin.
- [x] Legacy fallback when entity has no stamped basin.
- [x] Provisioner is a no-op when the flag is off.
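A rough TypeScript rendering of that read-path precedence (the column and env var names come from the PR; the helper name and types are invented for illustration):

```typescript
type BasinContext = {
  run?: { streamBasinName: string | null };
  session?: { streamBasinName: string | null };
};

// Hypothetical helper: prefer the basin stamped on the run, then the session,
// then fall back to the legacy global basin used by OSS / s2-lite installs.
export function resolveStreamBasin(ctx: BasinContext): string | undefined {
  return (
    ctx.run?.streamBasinName ??
    ctx.session?.streamBasinName ??
    process.env.REALTIME_STREAMS_S2_BASIN
  );
}
```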
triggerdotdev#3524) Fixes triggerdotdev#3520. The bundled bitnami clickhouse subchart was pinned at `9.3.7` (clickhouse `25.6.1-debian-12-r0`), which hits a memory-tracker accounting bug under sustained ingest - the global counter overflows to ~7 EiB and every query gets rejected by OvercommitTracker until the pod is restarted. Self-hosters running 4.0.5 through 4.4.5 are exposed regardless of chart version since the subchart pin hadn't moved. Bumping to `9.4.4` (clickhouse `25.7.5-debian-12-r0`) pulls in the 25.7.x memory-tracker fixes. This is also the latest publicly packaged release at `oci://registry-1.docker.io/bitnamicharts` - that registry has been frozen since 2025-08-28 (Bitnami catalog changes), but the chart source remains under Apache 2 on `bitnami/charts`. The image continues to resolve via `bitnamilegacy/clickhouse` per the existing `values.yaml` override, since `bitnami/clickhouse` itself moved to paid-only. Verified locally: `helm dependency update` + `helm lint` + `helm template` + kubeconform across all 57 rendered manifests. Rendered statefulset image is `docker.io/bitnamilegacy/clickhouse:25.7.5-debian-12-r0`.
…rdotdev#3523) `dac9c83bd` added `ignoreErrors: /^ServiceValidationError(?::|$)/` in `apps/webapp/sentry.server.ts` to drop SVEs before they reach Sentry. The filter only matches when the captured event's *type* is `ServiceValidationError`, but nine call sites in the webapp catch SVE (and analogous user-input error types — `OutOfEntitlementError`, `CreateDeclarativeScheduleError`, `QueryError`) and call `logger.error("wrapper message", { error: e })` *before* the type check. The captured event is then titled with the wrapper message, with the inner error buried in `extra.error` — invisible to the SDK filter. Result: a steady stream of expected user-input failures escalating as `error`-level events when they should be `warn`. Each catch block now type-discriminates first, logs expected types at `warn`, and keeps unknown-error fall-throughs at `error`. For service sites that wrap into SVE (`createBackgroundWorker`, `createDeploymentBackgroundWorkerV4`), the inner error is logged at `error` before wrapping — mirrors the `waitpointCompletionPacket.server.ts` pattern from `dac9c83bd`. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
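A sketch of the per-call-site pattern described above (the error class and service names come from the PR text; the import paths, wrapper function, and logger call shapes are assumptions):

```typescript
// Import paths are assumptions; the names come from the PR description.
import { ServiceValidationError } from "~/v3/services/baseService.server";
import { logger } from "~/services/logger.server";

export async function registerWorker(params: unknown) {
  try {
    return await createBackgroundWorker(params);
  } catch (e) {
    if (e instanceof ServiceValidationError) {
      // Expected user-input failure: warn level, so alerting and the
      // Sentry ignoreErrors filter never see it as an error-level event.
      logger.warn("createBackgroundWorker rejected the request", { error: e.message });
      throw e;
    }

    // Anything else is genuinely unexpected and stays at error level.
    logger.error("createBackgroundWorker failed unexpectedly", { error: e });
    throw e;
  }
}

declare function createBackgroundWorker(params: unknown): Promise<unknown>; // stand-in
```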
… create (triggerdotdev#3525)

Large deploys (projects with many tasks or source files) blocked the webapp event loop for several seconds inside Prisma's client-side serializer on `BackgroundWorker.create`, tail-latencying every other in-flight request on the same Node process. The `metadata` JSON column was being written with the full deploy manifest — every task's config, every queue and prompt, and the full source of every file — all of which already live on dedicated columns or in dedicated tables.

Fix: project the manifest to `{ packageVersion, contentHash, tasks: [{ id, filePath, schedule }] }` on insert. The only post-write read site is `changeCurrentDeployment`, which feeds `tasks[].schedule` into `syncDeclarativeSchedules` at deploy promotion. The retained top-level keys and per-task `filePath` are kept solely so `BackgroundWorkerMetadata.safeParse` still succeeds on read.

## Test plan

- [ ] Deploy a project with declarative schedules; verify schedules are created on first deploy
- [ ] Modify / remove schedules across subsequent deploys; verify sync
- [ ] Roll back to a previous deploy; verify `changeCurrentDeployment` re-syncs schedules
- [ ] Inspect `BackgroundWorker.metadata` on a fresh deploy — should be a small object, not the full manifest
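A hedged sketch of the projection (only the projected shape follows the PR description; the type and function names are invented):

```typescript
// Invented names; only the retained shape is taken from the PR.
type DeployManifest = {
  packageVersion: string;
  contentHash: string;
  tasks: Array<{ id: string; filePath: string; schedule?: unknown } & Record<string, unknown>>;
};

export function projectWorkerMetadata(manifest: DeployManifest) {
  return {
    packageVersion: manifest.packageVersion,
    contentHash: manifest.contentHash,
    // Keep only what changeCurrentDeployment / BackgroundWorkerMetadata.safeParse
    // need on read; task configs, queues, prompts, and file sources live elsewhere.
    tasks: manifest.tasks.map((t) => ({
      id: t.id,
      filePath: t.filePath,
      schedule: t.schedule,
    })),
  };
}
```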
…dev#3528)

- Tags webapp images by full commit SHA on `main` pushes (`ghcr.io/triggerdotdev/trigger.dev:<sha>`) so any commit can be resolved to a digest easily.
- Adds OCI labels (`source`, `revision`, `version`, `created`) so `docker inspect`, vulnerability scanners, and registry browsers see source/commit/version directly.
- Signs each pushed digest with SLSA build provenance via `actions/attest-build-provenance@v4.1.0` (pinned by SHA), enabling `gh attestation verify oci://...` against the source commit and workflow.
…xDuration (TRI-9117) (triggerdotdev#3529)

When a Node EventEmitter (e.g. node-redis) emits an "error" event with no listener attached, Node escalates it to `process.on("uncaughtException")` in the task worker. The worker reported the error via the UNCAUGHT_EXCEPTION IPC event but did not exit, and the supervisor-side handler in taskRunProcess only logged the message at debug level — leaving the run() promise orphaned until maxDuration fired and producing empty attempts (durationMs=0, costInCents=0).

The supervisor now rejects the in-flight attempt with an UncaughtExceptionError and gracefully terminates the worker (preserving the OTEL flush window) on UNCAUGHT_EXCEPTION. The attempt fails fast with TASK_EXECUTION_FAILED, surfacing the original error name, message, and stack trace, and falls under the normal retry policy. This mirrors the existing indexing-side behavior in indexWorkerManifest.

Apply the same handling to unhandled promise rejections, which Node already routes through uncaughtException by default.
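Roughly, the supervisor-side change amounts to treating the IPC event as fatal for the attempt rather than a debug log. A sketch under assumed names (the IPC message shape, `rejectInflightAttempt`, and `gracefullyTerminateWorker` are placeholders; `UncaughtExceptionError` is the name used in the PR, but its constructor shape here is guessed):

```typescript
// Placeholder wiring to illustrate the behaviour change; names are assumptions.
function onWorkerMessage(msg: {
  type: string;
  error?: { name: string; message: string; stack?: string };
}) {
  if (msg.type !== "UNCAUGHT_EXCEPTION" || !msg.error) return;

  // Fail the in-flight attempt immediately (TASK_EXECUTION_FAILED) with the
  // original error details so the normal retry policy applies.
  rejectInflightAttempt(
    new UncaughtExceptionError(`${msg.error.name}: ${msg.error.message}`, msg.error.stack)
  );

  // Terminate the child gracefully so the OTEL flush window is preserved,
  // instead of leaving run() orphaned until maxDuration fires.
  void gracefullyTerminateWorker();
}

declare function rejectInflightAttempt(error: Error): void;  // placeholder
declare function gracefullyTerminateWorker(): Promise<void>; // placeholder
declare class UncaughtExceptionError extends Error {         // name from the PR; constructor assumed
  constructor(message: string, stack?: string);
}
```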
…otdev#3532)

## Summary

Both Claude Code workflows (`claude.yml` and `claude-md-audit.yml`) authenticated via `CLAUDE_CODE_OAUTH_TOKEN`, which broke when the org disabled Claude subscription access for Claude Code:

> Your organization has disabled Claude subscription access for Claude Code · Use an Anthropic API key instead, or ask your admin to enable access

This switches both workflows to `anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}` (secret already added to the repo).

## Test plan

- [ ] Confirm `📝 CLAUDE.md Audit` runs to completion on this PR
- [ ] Confirm `@claude` mention in a PR comment still triggers the `Claude Code` workflow successfully
…dotdev#3531)

## Summary

Stamps the active OpenTelemetry `trace_id` and `span_id` onto every Sentry event captured from the webapp, so engineers can copy a `trace_id` from a Sentry issue and search for the corresponding trace in any OTel-aware backend. Also adds an `otel_sampled` tag to indicate whether the trace was head-sampled — a cheap signal for whether the link will resolve to span data or hit a missing trace.

## Why

Sentry and OTel were disconnected: `apps/webapp/sentry.server.ts` initialised Sentry with `skipOpenTelemetrySetup: true`, and no error-capture site (`logger.server.ts`, the Remix-wrapped `handleError`, the root `ErrorBoundary`) attached OTel context to the event. With many spans/sec across services, getting from a Sentry issue to its trace was guesswork.

## Approach

A single global Sentry event processor, registered immediately after `Sentry.init`. On each event it reads `trace.getActiveSpan()?.spanContext()` via `@opentelemetry/api`, then writes:

- `event.contexts.trace.trace_id` and `event.contexts.trace.span_id` (Sentry's native trace context fields)
- `event.tags.otel_sampled` = `"true"` | `"false"` (derived from `traceFlags`)

If there is no active span (module-load errors, scheduled timers without a context, the primary cluster process), the processor returns the event unmodified — Sentry's default propagation context fills in.

The implementation is co-located in `apps/webapp/sentry.server.ts` (no separate helper module — `sentry.server.ts` is built standalone by esbuild and a separate import would have required a new bundling step). Helper functions are exported so the unit tests can reach them without re-running `Sentry.init`.

## Non-goals (deliberate)

- No sample rate change. ~95% of Sentry events will carry a `trace_id` that returns no spans in the tracing backend (head-sampled out). The `otel_sampled` tag makes that obvious at a glance. Raising the find-rate is a separate conversation with cost trade-offs.
- No user/org tags or `Sentry.setUser` (would need auth-helper + per-request scope wiring across multiple worker entrypoints — separate ticket).
- Webapp image only. No changes to supervisor or CLI workers.

## Test plan

- [x] Unit tests in `apps/webapp/test/sentryTraceContext.server.test.ts` — 9 tests covering: helper returns `undefined` with no active span; returns `traceId`/`spanId`/`sampled=true` for a recording span; returns `sampled=false` for a non-recording span; processor leaves the event unchanged with no active span; processor stamps `trace_id`/`span_id` onto `contexts.trace`; preserves existing `contexts.trace` fields; tags `otel_sampled` correctly for both sampled and non-sampled cases; never throws if `@opentelemetry/api` access throws.
- [x] `pnpm run typecheck --filter webapp` passes.
- [x] Manually verified end-to-end against a sandboxed Sentry project: confirmed both sampled and non-sampled traces correctly populate `contexts.trace.trace_id` matching the OTel ids logged from the loader, and the `otel_sampled` tag appears with the expected value.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
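For flavour, the processor described above could look roughly like this. The `trace.getActiveSpan()` and `TraceFlags` calls are the real `@opentelemetry/api` surface; the Sentry package import and the availability of a top-level `addEventProcessor` are assumptions about the webapp's SDK version:

```typescript
import * as Sentry from "@sentry/node"; // assumption: whichever Sentry server package the webapp uses
import { trace, TraceFlags } from "@opentelemetry/api";

// Register right after Sentry.init() so every captured event passes through it.
Sentry.addEventProcessor((event) => {
  const spanContext = trace.getActiveSpan()?.spanContext();
  if (!spanContext) return event; // no active span: leave the event unmodified

  event.contexts = {
    ...event.contexts,
    trace: {
      ...event.contexts?.trace,
      trace_id: spanContext.traceId,
      span_id: spanContext.spanId,
    },
  };

  const sampled = (spanContext.traceFlags & TraceFlags.SAMPLED) === TraceFlags.SAMPLED;
  event.tags = { ...event.tags, otel_sampled: sampled ? "true" : "false" };

  return event;
});
```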
…iggerdotdev#3536)

When a webapp API route's catch-all 500 branch handles a non-typed exception, it returns the raw `error.message` to the caller. If the exception originates from an internal subsystem (the ORM client, an infra dependency, etc.) the server-side error string is surfaced verbatim in the response body — exposing implementation details the API surface shouldn't carry.

The leak shows up in three shapes across the routes:

- `return json({ error: error.message }, { status: 500 })`
- `return json({ error: error instanceof Error ? error.message : "Internal Server Error" }, { status: 500 })`
- ``return json({ error: `Internal server error: ${error.message}` }, { status: 500 })``

(plus a couple of analogous neverthrow-Result variants on admin routes.)

## Fix

Across 19 webapp routes, replace each leaking branch with a generic body (`"Something went wrong"` / `"Internal Server Error"` to match the file's existing fallback) and add `logger.error(...)` so full visibility is preserved server-side. Catch blocks that branch on typed user-input errors (`ServiceValidationError`, `EngineServiceValidationError`, `OutOfEntitlementError`, `PrismaClientKnownRequestError`) are left intact — those messages are constructed deliberately and intended to be customer-facing.

## Test plan

- [x] `pnpm run typecheck --filter webapp`
- [x] Per-route manual probe: inject a synthetic `Error` at the top of the catch'd `try` block (or fake the wrapped call's rejection / Result error), curl the route with the dev API key, confirm the response body changed from the synthetic message verbatim → generic body. 21/21 leak sites verified end-to-end.
- [x] 4xx-typed-error paths spot-checked: throwing `ServiceValidationError` from inside the catch'd try still surfaces its message at 422 as intended.
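As a sketch, the replacement shape for each leaking branch (the route logic and import paths are stand-ins, not code from the repo):

```typescript
import { json } from "@remix-run/server-runtime";
import { logger } from "~/services/logger.server"; // assumed path

export async function action({ request }: { request: Request }) {
  try {
    return json(await handleRequest(request));
  } catch (error) {
    // Full details stay in the server logs...
    logger.error("Route handler failed", { error });
    // ...but the response body never echoes internal error strings.
    return json({ error: "Internal Server Error" }, { status: 500 });
  }
}

declare function handleRequest(request: Request): Promise<unknown>; // stand-in for the route's real work
```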
## Lots of filter UX improvements across lots of routes

### General

- Promoted important filters out of the "More filters" so they're always visible
- SearchInput primitive is now reusable and Esc now clears the field (AI filter input also clears with Esc)
- Tooltips + keyboard shortcuts on every primary filter button
- Brighter text on selected filter items / queue items
- Filter dropdowns reordered for better hierarchy
- Removed debounce on Tasks page search for faster filtering

### Tasks page search

- Esc now clears the field
- ENTER submits a search to improve performance when you have lots of tasks

https://github.com/user-attachments/assets/4b30521e-dbc4-4468-b2af-8c85bdfb9002

### Runs filters

- Moves Status and Tasks out of the More filters menu
- "Root only" toggle is set to false when you filter for a Task. This state isn't stored and flips back to the stored value if filters are cleared

<img width="1690" height="986" alt="CleanShot 2026-04-26 at 19 24 08@2x" src="https://github.com/user-attachments/assets/b07da73c-140e-451f-a7bf-c32129317f63" />

### Batches filters

- General consistency improvements

<img width="1429" height="948" alt="CleanShot 2026-05-08 at 09 50 35" src="https://github.com/user-attachments/assets/e5ec267f-2aa3-43ef-991e-93bf01bdaea5" />

### Schedules

- General consistency improvements

<img width="1567" height="1141" alt="CleanShot 2026-05-08 at 09 51 11" src="https://github.com/user-attachments/assets/34b7da88-87c6-4e4d-a70f-fe13ea9f87ec" />

### Queues

- General consistency improvements

<img width="824" height="416" alt="CleanShot 2026-05-08 at 09 52 02" src="https://github.com/user-attachments/assets/b4adc102-8192-4a68-b199-a175c2645a6c" />

### Waitpoint tokens

- General consistency improvements

<img width="941" height="363" alt="CleanShot 2026-05-08 at 09 52 19" src="https://github.com/user-attachments/assets/d43aeb3f-7f80-454d-b183-fd077a4e3ff7" />

### Models

- General consistency improvements

<img width="1570" height="509" alt="CleanShot 2026-05-08 at 09 53 17" src="https://github.com/user-attachments/assets/066d7646-4672-4cae-8ec0-e30a82889914" />

### AI metrics

- General consistency improvements

<img width="1568" height="624" alt="CleanShot 2026-05-08 at 09 53 43" src="https://github.com/user-attachments/assets/fdfc4806-26fa-458d-a5ed-5c226b3bbc9f" />

### Logs

- General consistency improvements

<img width="1267" height="752" alt="CleanShot 2026-05-08 at 09 54 30" src="https://github.com/user-attachments/assets/3e9ba871-b9dd-490e-aded-5d87134fd2bb" />

### Errors

- General consistency improvements

<img width="1568" height="670" alt="CleanShot 2026-05-08 at 09 54 50" src="https://github.com/user-attachments/assets/fdda027a-e24f-4804-b4bb-203a6c2db960" />

### Query

- General consistency improvements
- History, Scope, Triggered (date) filters all have shortcut tooltips
- Scope filter now reuses the metrics ScopeFilter component

<img width="1566" height="716" alt="CleanShot 2026-05-08 at 09 55 22" src="https://github.com/user-attachments/assets/0130b4a2-9daf-4edc-bada-3380aff4022a" />

### Dashboards

- General consistency improvements
- Scope filter gets nicer icons and a shortcut
- Nice icons for the Scope menu items

<img width="1567" height="769" alt="CleanShot 2026-05-08 at 09 56 10" src="https://github.com/user-attachments/assets/7bea25f7-6c33-4d4a-a36d-3a1cb56afe09" />

### Custom dashboard

- General consistency improvements
- Add chart, Add title, and the kebab menu now have tooltips + shortcuts

<img width="1566" height="782" alt="CleanShot 2026-05-08 at 09 58 11" src="https://github.com/user-attachments/assets/9df4db25-b2c0-43a2-b92f-00256337d5a9" />

### Environment variables

- General consistency improvements

<img width="1569" height="930" alt="CleanShot 2026-05-08 at 09 58 55" src="https://github.com/user-attachments/assets/26e614b4-88e7-400b-aa6d-a96bad488fb8" />

### Preview branches

- General consistency improvements

<img width="1570" height="986" alt="CleanShot 2026-05-08 at 09 59 17" src="https://github.com/user-attachments/assets/57a2b939-3670-4252-ab2c-d6dc65bdda1b" />
…tdev#3534)

## Summary

Adds a Redis pub/sub reload path to the webapp's in-memory LLM pricing registry. When enabled on a process, the registry reloads from the database whenever a publish lands on the configured channel — instead of waiting for the existing 5-minute interval. This lets pricing/model changes propagate to cost enrichment within seconds.

Subscription is **off by default** and opt-in per process. Only OTel-ingesting services need real-time freshness; dashboard and worker services run fine on the periodic interval and shouldn't pile onto each publish with a full-table reload.

## Design

When `LLM_PRICING_RELOAD_PUBSUB_ENABLED=true`, the registry subscribes via `createRedisClient` against `COMMON_WORKER_REDIS_*` and listens on `LLM_PRICING_RELOAD_CHANNEL` (default `llm-registry:reload`). The 5-minute periodic reload stays as a backstop, and a SIGTERM/SIGINT handler closes the subscription cleanly.

The publisher side lives outside this PR — any process running in the same Redis namespace can trigger a reload with `PUBLISH llm-registry:reload <anything>`.

Includes a `.server-changes/` note for the changelog.

### Debounced reload

Bursts of publishes are coalesced. The first publish schedules a reload at T+`LLM_PRICING_RELOAD_DEBOUNCE_MS` (default 1s); subsequent publishes during that window are no-ops because the trailing reload picks up everything when it queries the DB. This bounds the reload rate to at most 1 per debounce window regardless of publisher chattiness, so a runaway upstream publisher can't fan out into a flood of full-table-scan reloads.

## Test plan

- [ ] With `LLM_PRICING_RELOAD_PUBSUB_ENABLED=false` (default): `redis-cli PUBSUB NUMSUB llm-registry:reload` returns `0` while the webapp is up
- [ ] With it set to `true`: returns `>= 1`
- [ ] `redis-cli PUBLISH llm-registry:reload test` returns `1` (one subscriber received) on a subscribed process
- [ ] Mutate an `LlmModel` row externally, publish on the channel, observe the registry's match() picks up the change without waiting for the 5-min tick
- [ ] Publish 100x in rapid succession; confirm only one reload fires within the debounce window
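A rough sketch of the subscriber with the debounce described above (the channel and env var names follow the PR; the reload function and client wiring are stand-ins):

```typescript
import { Redis } from "ioredis";

const CHANNEL = process.env.LLM_PRICING_RELOAD_CHANNEL ?? "llm-registry:reload";
const DEBOUNCE_MS = Number(process.env.LLM_PRICING_RELOAD_DEBOUNCE_MS ?? "1000");

let pendingReload: NodeJS.Timeout | undefined;

export async function startPricingReloadSubscriber(subscriber: Redis) {
  await subscriber.subscribe(CHANNEL);

  subscriber.on("message", (channel) => {
    if (channel !== CHANNEL) return;
    // First publish in a window schedules the trailing reload; later publishes
    // are no-ops because that reload reads the latest DB state anyway.
    if (pendingReload) return;
    pendingReload = setTimeout(() => {
      pendingReload = undefined;
      void reloadPricingFromDatabase();
    }, DEBOUNCE_MS);
  });
}

declare function reloadPricingFromDatabase(): Promise<void>; // stand-in for the registry's reload
```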
…ze (triggerdotdev#3538)

## Summary

- Run-view inspector panel was glitching out on Firefox: visual flicker on close, locking up at min size, and intermittent `panelHasSpace` invariant errors. Root cause is the underlying `react-window-splitter` library's collapse animation, which uses `@react-spring/rafz` and interacts poorly with Firefox.
- Disabled the library's collapse animation on Firefox only, app-wide (every consumer of `RESIZABLE_PANEL_ANIMATION`). Chromium and Safari behaviour is unchanged.

## Changes

- **Firefox animation skip** in `RESIZABLE_PANEL_ANIMATION` — UA-detected at module load, resolves to `undefined` for Firefox so the library's animation actor completes in one frame instead of running its rAF loop.
- **Inspector min raised 50px → 250px** so dragging can't shrink the panel into a near-useless width.
- **`autosaveId` bumped `v2` → `v3`** to invalidate stale persisted snapshots (the library has a `// TODO` branch that ignores prop changes for already-registered panels, so existing users would otherwise still see the old 50px min).
- **`react-window-splitter` pinned** to exact `0.4.1` to protect the patch from drifting if line offsets change in a patch release.
- **Two hunks added to the existing `@window-splitter/state` patch:**
  - Removed the library's auto-collapse-on-drag block entirely. Every collapsible panel in the app is parent-controlled, and that block was triggering state-machine deadlocks when handlers were no-ops. Drag-to-collapse is now disabled across the app; collapse is only triggered explicitly (close button, ESC, URL change, etc.).
  - In `getDeltaForEvent`, fall back to the panel's `default` before its `min` when expanding — so the first ever click on a span opens the inspector at 500px, not 250px.

## Local testing confirmed

- [x] Firefox: open a run, click various spans → panel opens instantly at 500px, drags freely between 250px and max, closes instantly to 0. No console errors.
- [x] Chrome/Chromium: same flow, but with smooth open/close animation as before.
- [x] Safari: same as Chrome.
- [x] Reload mid-session → panel restores cleanly to the dragged size.
- [x] Other resizable panels in the app (logs, deployments, schedules, batches, bulk-actions, runs index) still animate on Chromium/Safari.

## Notes

- Linear: TRI-8584
- Branch contains intermediate commits exploring an unsuccessful snapshot-validator approach; they're reverted by the final commit. Cumulative diff is 6 files. Squash on merge if you'd prefer a clean history.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
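The UA-gated constant could look something like this (only the Firefox branch resolving to `undefined` reflects the PR; the non-Firefox animation value's shape is a placeholder):

```typescript
// Module-load UA detection: Firefox gets no animation config, so the
// library's collapse/expand "animation" completes in a single frame.
const isFirefox =
  typeof navigator !== "undefined" && /firefox/i.test(navigator.userAgent);

export const RESIZABLE_PANEL_ANIMATION = isFirefox
  ? undefined
  : ({ duration: 300, easing: "ease-in-out" } as const); // placeholder shape for Chromium/Safari
```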
…over (triggerdotdev#3548)

## Summary

During an ElastiCache role swap (failover) or node-type change (vertical scale), the ioredis TCP/TLS connection stays open but the server starts answering with `READONLY` (the client is talking to a node that became a replica) or `LOADING` (node still loading data from disk). Without an explicit hook, those errors surface to caller code as `ReplyError` instances — every write op on the affected connection fails until the cluster fully cuts over.

This PR adds `reconnectOnError` to every prod ioredis client so the disconnect + reconnect + retry cycle absorbs these errors and caller code never sees them.

## Fix

```ts
export function defaultReconnectOnError(err: Error): boolean | 1 | 2 {
  const msg = err.message ?? "";
  if (msg.startsWith("READONLY") || msg.startsWith("LOADING")) return 2;
  return false;
}
```

Returning `2` tells ioredis to disconnect, reconnect, and re-issue the failed command. After reconnect, DNS / SG state routes the new socket to a writable node.

The helper lives in `@internal/redis` and is wired into both the shared `createRedisClient` (which covers RunQueue, schedule-engine, redis-worker, and every other internal-package consumer) and the direct `new Redis(...)` call sites in the webapp. V1-only marqs files are intentionally not migrated.

## Test plan

- [x] `pnpm run typecheck --filter webapp`
- [x] `pnpm run typecheck --filter @internal/run-engine`
- [x] Verified end-to-end against a live ElastiCache vertical-scale event — caller-surfaced errors went from tens of thousands during the cutover window down to a handful per ioredis client
- [ ] Confirm steady-state behavior unchanged after deploy
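On the wiring side, ioredis accepts the helper via its `reconnectOnError` option, where a return value of `2` means disconnect, reconnect, and resend the failed command. A minimal consumer sketch (host/port handling simplified; the helper's exact export path within `@internal/redis` is assumed):

```typescript
import { Redis } from "ioredis";
import { defaultReconnectOnError } from "@internal/redis"; // package name from the PR; export path assumed

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: Number(process.env.REDIS_PORT ?? "6379"),
  // READONLY / LOADING replies trigger a disconnect + reconnect + resend
  // instead of surfacing as ReplyError to callers.
  reconnectOnError: defaultReconnectOnError,
});

export { redis };
```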
…riggerdotdev#3549)

## Summary

When ElastiCache demotes a primary to replica — during a Multi-AZ failover or a vertical node-type change — the demoting primary issues an `UNBLOCKED` reply to any in-flight blocking commands (`BLPOP`, `BRPOP`, `BLMOVE`, `XREADGROUP ... BLOCK`, etc.) to clear them before the role flips. ioredis surfaces these as `ReplyError` to caller code.

The shared `defaultReconnectOnError` added in triggerdotdev#3548 only matches `READONLY` and `LOADING`. This extends it to `UNBLOCKED` so the disconnect-reconnect-retry cycle handles BLPOP-shaped errors the same way the existing two cases handle non-blocking-command errors.

## Fix

```ts
export function defaultReconnectOnError(err: Error): boolean | 1 | 2 {
  const msg = err.message ?? "";
  if (
    msg.startsWith("READONLY") ||
    msg.startsWith("LOADING") ||
    msg.startsWith("UNBLOCKED")
  ) {
    return 2;
  }
  return false;
}
```

Returning `2` tells ioredis to disconnect, reconnect, and re-issue the command. For a BLPOP that means a fresh BLPOP against the new primary instead of the `UNBLOCKED` error escaping to the caller.

## Test plan

- [ ] CI green
- [ ] Trigger a Multi-AZ failover or a vertical scale event on an ElastiCache replication group whose clients are running blocking commands and confirm no `UNBLOCKED` errors surface to caller code during the cutover.
…h rate limit (triggerdotdev#3475)

## Summary

- Adds admin-only editors on the back-office org page for `Organization.maximumProjectCount` and `Organization.batchRateLimitConfig`, alongside the existing API rate limit editor.
- Splits the back-office org page into per-section components (`ApiRateLimitSection`, `BatchRateLimitSection`, `MaxProjectsSection`) so each tool is self-contained — adding new sections later doesn't bloat the route.
- Generalizes the rate-limit form into a reusable `RateLimitSection` component + `RateLimitDomain` server config so API and batch share the same UI, validation, and action handler. Each domain only owns its env defaults, DB column, and logger key.
- "Saved." banner and validation errors are scoped to the section that submitted, not the page.

Heads-up: the API rate-limit log key was renamed `admin.backOffice.rateLimit` → `admin.backOffice.apiRateLimit` for symmetry with the new `admin.backOffice.batchRateLimit`.

## Test plan

- [ ] As an admin, visit `/admin/back-office/orgs/:orgId` and confirm all three sections render with the org's current values (or system defaults).
- [ ] Edit and save each section; confirm only that section shows the "Saved." banner.
- [ ] Submit invalid input (e.g. `0` tokens, malformed interval); confirm errors render in the offending form only and the other sections stay closed.
- [ ] Confirm a non-admin user is redirected away from the route.
- [ ] After saving a rate-limit override, hit the org with traffic and confirm the new limit is enforced (API rate limit + batch rate limit code paths read the column at request time).
…ed as HTML' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
    id: release
    uses: softprops/action-gh-release@v1
    if: github.event_name == 'push'
    uses: softprops/action-gh-release@b4309332981a82ec1c5618f44dd2e27cc8bfbfda # v3.0.0
This PR includes notification style improvements:

- Label notification dismiss button for accessibility
- Show notification dismiss button on keyboard focus
- Sanitize notification action URL to block unsafe protocols
- Fix CodeQL "DOM text reinterpreted as HTML" warning
- Avoid nested interactive elements in notification card
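For the URL sanitization item, a minimal sketch of the idea (the actual helper in this PR may differ): keep http(s) and same-origin relative URLs, and drop anything else such as `javascript:` or `data:` URLs.

```typescript
const SAFE_PROTOCOLS = new Set(["http:", "https:"]);

export function sanitizeNotificationActionUrl(url: string): string | undefined {
  try {
    // Relative URLs resolve against the current origin and are considered safe.
    const parsed = new URL(url, window.location.origin);
    return SAFE_PROTOCOLS.has(parsed.protocol) ? url : undefined;
  } catch {
    // Unparseable input is treated as unsafe and dropped.
    return undefined;
  }
}
```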
Summary by cubic
Improves webapp notification cards for accessibility and safety. Labels and exposes the dismiss button on focus, sanitizes action URLs, and resolves a CodeQL warning.
Written for commit e0dd504. Summary will update on new commits.