
Commit d5a3299

giovaborgogno and claude committed
chore: document E2E test findings — ordering limitation, payload size bug, ClickHouse in dev
Found during live E2E testing:

- 9.5: orderingKey doesn't guarantee strict ordering (Trigger.dev concurrencyKey limitation)
- 9.6: payloads >512KB cause silent fan-out failure (0 runs, HTTP 200)
- 9.7: ClickHouse tables not created in dev (stats/history/replay return 500)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 826762d commit d5a3299

File tree

1 file changed: +81 -1 lines changed

.claude/projects/-Users-terac-repos-trigger-dev/memory/pubsub-pending.md

Lines changed: 81 additions & 1 deletion
@@ -38,7 +38,87 @@ Items identified during post-implementation audit. Ordered by priority.
- SDK docs in `rules/` directory
- Update `.claude/skills/trigger-dev-tasks/SKILL.md`

### 9.5 — Ordering Key Does Not Guarantee Strict Ordering

**Status**: NOT RESOLVED — needs design decision
**Priority**: HIGH — correctness issue
**Found during**: E2E testing (2026-03-01)
**Problem**: `orderingKey` maps to Trigger.dev's `concurrencyKey`, which creates a **copy of the queue per key**, each with the same `concurrencyLimit`. This means:

- If a task has `concurrencyLimit: 1` → ordering works per key, BUT the limit is per-key, not global. All different keys run in parallel with no global cap (bounded only by the environment concurrency limit).
- If a task has `concurrencyLimit: 10` → 10 events with the SAME key can run in parallel, breaking ordering.
- There is no way to express "strict ordering per key + global concurrency limit N" with Trigger.dev's current queue model.

**Expected behavior** (like Kafka/SQS FIFO):

- `orderingKey` = strict sequential per key (always 1 at a time per key)
- `concurrencyLimit` = total parallel runs across all keys (a separate concept)

```
concurrencyLimit: 3, ordering keys A/B/C:

Slot 1: A1 → A2 → A3   (key A in order)
Slot 2: B1 → B2        (key B in order)
Slot 3: C1 → C2        (key C in order)

Max 3 running at once, each key strictly ordered.
```
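The expected semantics — strict FIFO per key under one global cap — can be sketched with a small in-memory scheduler. This is a hypothetical illustration of the model, not Trigger.dev code; `OrderedScheduler` and `Job` are made-up names:

```typescript
// Sketch: per-key FIFO queues drained under a single global concurrency cap.
type Job = { key: string; id: string; run: () => Promise<void> };

class OrderedScheduler {
  private queues = new Map<string, Job[]>();
  private activeKeys = new Set<string>(); // keys with a run in flight
  private running = 0;

  constructor(private readonly maxConcurrency: number) {}

  enqueue(job: Job): void {
    const q = this.queues.get(job.key) ?? [];
    q.push(job);
    this.queues.set(job.key, q);
    this.pump();
  }

  private pump(): void {
    for (const [key, q] of this.queues) {
      if (this.running >= this.maxConcurrency) return; // global cap
      if (q.length === 0 || this.activeKeys.has(key)) continue; // max 1 per key
      const job = q.shift()!;
      this.activeKeys.add(key);
      this.running++;
      job.run().finally(() => {
        this.activeKeys.delete(key);
        this.running--;
        this.pump(); // free slot: pick up the next job (same key first-come)
      });
    }
  }
}
```

A Redis-backed variant of these per-key queues is essentially what option 4 below describes; the in-memory version only shows the intended semantics.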
**Trigger.dev's actual behavior with concurrencyKey**:

- Creates 3 separate queues (A, B, C), EACH with concurrencyLimit 3
- So 9 runs could execute simultaneously (3 per key × 3 keys)
- Not true ordering

**Options to resolve**:

1. Build ordering on top of Trigger.dev's queue system with custom logic in PublishEventService
2. Contribute ordering support upstream to Trigger.dev's run engine
3. Document as a limitation and recommend `concurrencyLimit: 1` for ordering use cases
4. Use a separate ordering mechanism (Redis-based FIFO per key) before triggering runs
**Test results that confirmed this**:

- `concurrencyLimit: 1` + same key → sequential (correct)
- `concurrencyLimit: 1` + different keys → parallel (capped by env limit ~8, not by concurrencyLimit)
- `concurrencyLimit: 2` + same key → 2 at a time (breaks ordering)
- 10 different keys + `concurrencyLimit: 1` → only ~8 ran in parallel (env limit, not queue limit)
### 9.6 — Large Payloads Cause Silent Fan-out Failure

**Status**: NOT RESOLVED — needs fix
**Priority**: HIGH — data loss / silent failure
**Found during**: E2E testing (2026-03-01)

**Problem**: Payloads >512KB cause `PublishEventService` to return `runs: []` (HTTP 200, no error) because Trigger.dev's task trigger silently fails for large payloads: payloads >512KB need object-storage offloading, which our event publish path doesn't handle.
**Test results**:

- 100KB payload: 4 runs (OK)
- 500KB payload: 4 runs (OK)
- 600KB payload: 0 runs (SILENT FAILURE)
- 2MB payload: 0 runs (SILENT FAILURE)

**The trigger call fails silently**: `TriggerTaskService` returns `undefined` for each subscriber, and `PublishEventService` logs it as a partial failure but still returns HTTP 200 with empty runs.
**Options to resolve**:

1. Validate payload size in PublishEventService before fan-out (reject >512KB with a clear error)
2. Use Trigger.dev's payload offloading mechanism (payloads >512KB go to object storage)
3. Both: warn on large payloads + support offloading
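Option 1 amounts to a small guard in front of the fan-out. A minimal sketch, assuming the payload is serialized as JSON and measured in UTF-8 bytes; `assertPayloadSize` and the constant name are illustrative, not the real service API:

```typescript
// Hypothetical guard (names illustrative): reject oversized payloads up front
// instead of letting the fan-out silently produce zero runs with an HTTP 200.
const MAX_PAYLOAD_BYTES = 512 * 1024; // assumed direct-payload limit from testing

// Measure the payload as serialized JSON in UTF-8 bytes.
function payloadSizeBytes(payload: unknown): number {
  return Buffer.byteLength(JSON.stringify(payload), "utf8");
}

function assertPayloadSize(payload: unknown): void {
  const size = payloadSizeBytes(payload);
  if (size > MAX_PAYLOAD_BYTES) {
    throw new Error(
      `Payload is ${size} bytes, exceeding the ${MAX_PAYLOAD_BYTES}-byte limit; ` +
        `payloads this large require object-storage offloading`
    );
  }
}
```

Even if offloading (option 2) is added later, keeping the explicit check turns the current silent failure into a loud, diagnosable one.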
### 9.7 — ClickHouse Tables Not Created in Dev

**Status**: KNOWN LIMITATION
**Priority**: LOW — only affects stats/history/replay in local dev

**Problem**: ClickHouse migrations (`021_event_log_v1.sql`, `022_event_counts_mv_v1.sql`) are not automatically applied in local dev. This causes:

- `GET /api/v1/events/:id/stats` → 500 "Failed to query event stats"
- `GET /api/v1/events/:id/history` → 500 "Failed to query event history"
- `POST /api/v1/events/:id/replay` → 500 "Failed to query events for replay"

The event log writer (fire-and-forget) also fails silently:

```
Table trigger_dev.event_log_v1 does not exist.
```

**Resolution**: Apply the ClickHouse migrations in local dev, or improve error messages to indicate that ClickHouse is not configured.
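For the "improve error messages" half of the resolution, one sketch is to translate the raw ClickHouse failure into something actionable before it surfaces as a generic 500. `describeClickHouseError` is a hypothetical helper, not existing code, and the error-string patterns are assumptions based on the message seen in testing:

```typescript
// Hypothetical helper: map a ClickHouse "missing table" failure to a message
// that points at the fix, instead of "Failed to query event stats".
function describeClickHouseError(err: unknown): string {
  const message = err instanceof Error ? err.message : String(err);
  // Assumed patterns for a missing-table error, per the log line observed in dev.
  if (/does not exist/i.test(message) || /UNKNOWN_TABLE/.test(message)) {
    return (
      "ClickHouse event-log tables are missing — apply the ClickHouse " +
      "migrations (021/022) in local dev, or treat stats/history/replay " +
      "as unavailable when ClickHouse is not configured."
    );
  }
  return message;
}
```

The same check could let the fire-and-forget event log writer log one clear warning instead of failing silently on every write.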
### 9.8 — Consumer-side Rate Limiting + Backpressure Monitor

**Status**: NOT STARTED (deferred from Phase 7)
**Complexity**: MEDIUM
