chore: document E2E test findings — ordering limitation, payload size bug, ClickHouse in dev
Found during live E2E testing:
- 9.5: orderingKey doesn't guarantee strict ordering (Trigger.dev concurrencyKey limitation)
- 9.6: payloads >512KB cause silent fan-out failure (0 runs, HTTP 200)
- 9.7: ClickHouse tables not created in dev (stats/history/replay return 500)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
### 9.5 — Ordering Key Does Not Guarantee Strict Ordering

**Status**: NOT RESOLVED — needs design decision
**Priority**: HIGH — correctness issue
**Found during**: E2E testing (2026-03-01)

**Problem**: `orderingKey` maps to Trigger.dev's `concurrencyKey`, which creates a **copy of the queue per key**, each with the same `concurrencyLimit`. This means:

- If the task has `concurrencyLimit: 1` → ordering works per key, BUT the limit is per-key, not global. All distinct keys run in parallel with no global cap (bounded only by the environment concurrency limit).
- If the task has `concurrencyLimit: 10` → 10 events with the SAME key can run in parallel, breaking ordering.
- There is no way to express "strict ordering per key + a global concurrency limit of N" with Trigger.dev's current queue model.

**Expected behavior** (like Kafka/SQS FIFO):
- `orderingKey` = strict sequential per key (always 1 at a time per key)
- `concurrencyLimit` = total parallel runs across all keys (a separate concept)

```
concurrencyLimit: 3, ordering keys A/B/C:

Slot 1: A1 → A2 → A3 (key A in order)
Slot 2: B1 → B2 (key B in order)
Slot 3: C1 → C2 (key C in order)

Max 3 running at once, each key strictly ordered.
```

**Trigger.dev's actual behavior with `concurrencyKey`**:
- Creates 3 separate queues (A, B, C), EACH with a `concurrencyLimit` of 3
- So 9 runs could execute simultaneously (3 per key × 3 keys)
- Not true ordering

**Options to resolve**:
1. Build ordering on top of Trigger.dev's queue system with custom logic in `PublishEventService`
2. Contribute ordering support upstream to Trigger.dev's run engine
3. Document as a limitation and recommend `concurrencyLimit: 1` for ordering use cases
4. Use a separate ordering mechanism (Redis-based FIFO per key) before triggering runs
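
Options 1 and 4 boil down to the same scheduling discipline: a strict FIFO per ordering key, plus one global concurrency cap. A minimal in-memory sketch of that discipline follows; every name here is hypothetical, and a production version would back the queues with durable storage (e.g. Redis) rather than process memory.

```typescript
// Hypothetical sketch of options 1/4: strict per-key ordering with a single
// global concurrency cap — the Kafka/SQS-FIFO semantics described above.
// In-memory only; not Trigger.dev's actual API.

type Job = { key: string; run: () => Promise<void> };

class KeyedFifoScheduler {
  private queues = new Map<string, Job[]>(); // pending jobs, FIFO per ordering key
  private activeKeys = new Set<string>();    // keys that have a job in flight
  private running = 0;                       // global in-flight count

  constructor(private readonly globalLimit: number) {}

  enqueue(job: Job): void {
    const queue = this.queues.get(job.key) ?? [];
    queue.push(job);
    this.queues.set(job.key, queue);
    this.pump();
  }

  private pump(): void {
    for (const [key, queue] of this.queues) {
      if (this.running >= this.globalLimit) return; // global cap across all keys
      if (this.activeKeys.has(key) || queue.length === 0) continue; // at most 1 per key
      const job = queue.shift()!;
      this.activeKeys.add(key);
      this.running += 1;
      job.run().finally(() => {
        this.activeKeys.delete(key);
        this.running -= 1;
        this.pump(); // a slot freed: start the next job (same key or another)
      });
    }
  }
}
```

With `globalLimit: 3` and keys A/B/C this reproduces the slot diagram above: each key drains strictly in order, and at most three runs execute at once regardless of how many keys exist.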

**Test results that confirmed this**:
- `concurrencyLimit: 1` + same key → sequential (correct)
- `concurrencyLimit: 1` + different keys → parallel (capped by the env limit of ~8, not by `concurrencyLimit`)
- `concurrencyLimit: 2` + same key → 2 at a time (breaks ordering)
- 10 different keys + `concurrencyLimit: 1` → only ~8 ran in parallel (env limit, not queue limit)

### 9.6 — Large Payloads Cause Silent Fan-out Failure

**Status**: NOT RESOLVED — needs fix
**Priority**: HIGH — data loss / silent failure
**Found during**: E2E testing (2026-03-01)

**Problem**: Payloads >512KB cause `PublishEventService` to return `runs: []` (HTTP 200, no error). Trigger.dev's task trigger silently fails for these payloads: anything over 512KB must be offloaded to object storage, and our event publish path doesn't handle that.

**Test results**:
- 100KB payload: 4 runs (OK)
- 500KB payload: 4 runs (OK)
- 600KB payload: 0 runs (SILENT FAILURE)
- 2MB payload: 0 runs (SILENT FAILURE)

**The trigger call fails silently** — `TriggerTaskService` returns `undefined` for each subscriber, and `PublishEventService` logs it as a partial failure but still returns HTTP 200 with empty runs.

**Options to resolve**:
1. Validate payload size in `PublishEventService` before fan-out (reject >512KB with a clear error)
2. Use Trigger.dev's payload offloading mechanism (payloads >512KB go to object storage)
3. Both: warn on large payloads + support offloading
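
Option 1 is a small guard. A sketch follows, assuming the payload is JSON-serialized before triggering; the constant and error class names are made up, and the 512KB threshold mirrors the limit observed in testing.

```typescript
// Sketch of option 1: reject oversized payloads up front with a clear error
// instead of returning HTTP 200 with zero runs. Names here are hypothetical.

const MAX_PAYLOAD_BYTES = 512 * 1024;

class PayloadTooLargeError extends Error {
  constructor(public readonly sizeBytes: number) {
    super(
      `Event payload is ${sizeBytes} bytes, over the ${MAX_PAYLOAD_BYTES}-byte limit; ` +
        `larger payloads require object-storage offloading, which event publish does not support yet.`
    );
    this.name = "PayloadTooLargeError";
  }
}

// A guard like this would run in PublishEventService before fanning out.
function assertPayloadSize(payload: unknown): void {
  // The serialized size is what hits the trigger API, not the in-memory size.
  const sizeBytes = Buffer.byteLength(JSON.stringify(payload), "utf8");
  if (sizeBytes > MAX_PAYLOAD_BYTES) {
    throw new PayloadTooLargeError(sizeBytes);
  }
}
```

Rejecting with a 413-style error turns the silent failure into an actionable one; offloading (option 2) could then be layered on without changing callers.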
### 9.7 — ClickHouse Tables Not Created in Dev

**Status**: KNOWN LIMITATION
**Priority**: LOW — only affects stats/history/replay in local dev

**Problem**: ClickHouse migrations (`021_event_log_v1.sql`, `022_event_counts_mv_v1.sql`) are not automatically applied in local dev. This causes:
- `GET /api/v1/events/:id/stats` → 500 "Failed to query event stats"
- `GET /api/v1/events/:id/history` → 500 "Failed to query event history"
- `POST /api/v1/events/:id/replay` → 500 "Failed to query events for replay"

The event log writer (fire-and-forget) also fails silently:
```
Table trigger_dev.event_log_v1 does not exist.
```
**Resolution**: Apply ClickHouse migrations in local dev, or improve error messages to indicate ClickHouse is not configured.
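
For the error-message half of the resolution, a possible sketch: translate ClickHouse's missing-table error into a setup hint before it surfaces as a generic 500. The helper name is hypothetical; the matched error text is the log line quoted above.

```typescript
// Sketch: map ClickHouse's "table does not exist" error to an actionable
// dev-setup hint instead of a bare 500. Helper name is hypothetical.

const MIGRATION_HINT =
  "ClickHouse tables are missing. Apply migrations 021_event_log_v1.sql and " +
  "022_event_counts_mv_v1.sql to the local ClickHouse instance, or expect " +
  "stats/history/replay to be unavailable in dev.";

function describeClickHouseError(err: unknown): string {
  const message = err instanceof Error ? err.message : String(err);
  // ClickHouse reports a missing table as "Table <db>.<table> does not exist."
  return /Table \S+ does not exist/.test(message) ? MIGRATION_HINT : message;
}
```

The stats/history/replay handlers could return this message with a 503 instead of a 500, making it obvious the failure is a local-setup gap rather than a bug.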