Commit 689feea

chore: rivetkit core/napi/typescript follow up review
1 parent f102f4d commit 689feea

11 files changed

Lines changed: 1073 additions & 805 deletions

.agent/notes/driver-test-flake-investigation-plan.md

Lines changed: 0 additions & 459 deletions
This file was deleted.

.agent/notes/driver-test-progress.md

Lines changed: 39 additions & 85 deletions
@@ -6,98 +6,52 @@ Config: registry (static), client type (http), encoding (bare)
 ## Fast Tests
 
 - [x] manager-driver | Manager Driver Tests
-- [x] actor-conn | Actor Connection Tests
-- [x] actor-conn-state | Actor Connection State Tests
-- [x] conn-error-serialization | Connection Error Serialization Tests
-- [x] actor-destroy | Actor Destroy Tests
-- [x] request-access | Request Access in Lifecycle Hooks
-- [x] actor-handle | Actor Handle Tests
-- [x] action-features | Action Features (was listed as "Tests" in skill doc; actual describe is "Action Features")
-- [x] access-control | access control
-- [x] actor-vars | Actor Variables
-- [x] actor-metadata | Actor Metadata Tests
-- [x] actor-onstatechange | Actor onStateChange Tests (was listed as "State Change Tests")
-- [x] actor-db | Actor Database (flaky: "handles parallel actor lifecycle churn" hit `no_envoys` 1/4 runs)
-- [x] actor-db-raw | Actor Database (Raw) Tests
-- [~] actor-workflow | Actor Workflow Tests (US-103 fixed sleep-grace/run-handler crash-path coverage; remaining known red test is workflow destroy semantics)
-- [~] actor-error-handling | Actor Error Handling Tests (6 pass / 1 fail)
-- [x] actor-queue | Actor Queue Tests (flaky on first run: 3 failures related to "reply channel dropped" / timeout; clean on retry)
-- [x] actor-kv | Actor KV Tests
-- [x] actor-stateless | Actor Stateless Tests
-- [x] raw-http | raw http
-- [x] raw-http-request-properties | raw http request properties
-- [x] raw-websocket | raw websocket
-- [~] actor-inspector | Actor Inspector HTTP API (1 fail is workflow-replay related; 20 pass)
-- [x] gateway-query-url | Gateway Query URLs (filter was missing the "s")
-- [x] actor-db-pragma-migration | Actor Database PRAGMA Migration Tests
-- [x] actor-state-zod-coercion | Actor State Zod Coercion Tests (filter needed suffix)
-- [x] actor-conn-status | Connection Status Changes
-- [x] gateway-routing | Gateway Routing
-- [x] lifecycle-hooks | Lifecycle Hooks
+- [ ] actor-conn | Actor Connection Tests
+- [ ] actor-conn-state | Actor Connection State Tests
+- [ ] conn-error-serialization | Connection Error Serialization Tests
+- [ ] actor-destroy | Actor Destroy Tests
+- [ ] request-access | Request Access in Lifecycle Hooks
+- [ ] actor-handle | Actor Handle Tests
+- [ ] action-features | Action Features Tests
+- [ ] access-control | access control
+- [ ] actor-vars | Actor Variables
+- [ ] actor-metadata | Actor Metadata Tests
+- [ ] actor-onstatechange | Actor State Change Tests
+- [ ] actor-db | Actor Database
+- [ ] actor-db-raw | Actor Database Raw Tests
+- [ ] actor-workflow | Actor Workflow Tests
+- [ ] actor-error-handling | Actor Error Handling Tests
+- [ ] actor-queue | Actor Queue Tests
+- [ ] actor-kv | Actor KV Tests
+- [ ] actor-stateless | Actor Stateless Tests
+- [ ] raw-http | raw http
+- [ ] raw-http-request-properties | raw http request properties
+- [ ] raw-websocket | raw websocket
+- [ ] actor-inspector | Actor Inspector Tests
+- [ ] gateway-query-url | Gateway Query URL Tests
+- [ ] actor-db-pragma-migration | Actor Database Pragma Migration
+- [ ] actor-state-zod-coercion | Actor State Zod Coercion
+- [ ] actor-conn-status | Connection Status Changes
+- [ ] gateway-routing | Gateway Routing
+- [ ] lifecycle-hooks | Lifecycle Hooks
 
 ## Slow Tests
 
-- [x] actor-state | Actor State Tests
-- [x] actor-schedule | Actor Schedule Tests
-- [x] actor-sleep | Actor Sleep Tests
-- [x] actor-sleep-db | Actor Sleep Database Tests
-- [x] actor-lifecycle | Actor Lifecycle Tests
-- [x] actor-conn-hibernation | Connection Hibernation (flaky first run; clean on retry)
-- [x] actor-run | Actor Run Tests
-- [x] hibernatable-websocket-protocol | hibernatable websocket protocol (all 6 tests skipped; the feature flag `hibernatableWebSocketProtocol` is not enabled for the static driver config)
-- [x] actor-db-stress | Actor Database Stress Tests
+- [ ] actor-state | Actor State Tests
+- [ ] actor-schedule | Actor Schedule Tests
+- [ ] actor-sleep | Actor Sleep Tests
+- [ ] actor-sleep-db | Actor Sleep Database Tests
+- [ ] actor-lifecycle | Actor Lifecycle Tests
+- [ ] actor-conn-hibernation | Actor Connection Hibernation Tests
+- [ ] actor-run | Actor Run Tests
+- [ ] hibernatable-websocket-protocol | hibernatable websocket protocol
+- [ ] actor-db-stress | Actor Database Stress Tests
 
 ## Excluded
 
 - [ ] actor-agent-os | Actor agentOS Tests (skip unless explicitly requested)
 
 ## Log
 
-- 2026-04-22 manager-driver: PASS (16 tests, 12.20s)
-- 2026-04-22 actor-conn: PASS (23 tests, 28.12s) -- Note: first run showed 2 flaky failures (lifecycle hooks `onWake` missing; `maxIncomingMessageSize` timeout). Re-ran 5 times with trace after, all passed. Likely cold-start race on first run.
-- 2026-04-22 actor-conn-state: PASS (8 tests, 6.80s)
-- 2026-04-22 conn-error-serialization: PASS (3 tests, 2.53s)
-- 2026-04-22 actor-destroy: PASS (10 tests, 19.47s)
-- 2026-04-22 request-access: PASS (4 tests, 3.52s)
-- 2026-04-22 actor-handle: PASS (12 tests, 8.42s)
-- 2026-04-22 action-features: PASS (11 tests, 8.46s) -- corrected filter to "Action Features" (no "Tests" suffix)
-- 2026-04-22 access-control: PASS (8 tests, 6.29s)
-- 2026-04-22 actor-vars: PASS (5 tests, 3.81s)
-- 2026-04-22 actor-metadata: PASS (6 tests, 4.34s)
-- 2026-04-22 actor-onstatechange: PASS (5 tests, 3.97s) -- corrected filter to "Actor onStateChange Tests"
-- 2026-04-22 actor-db: PASS (16 tests, 26.21s) -- flaky 1/4: "handles parallel actor lifecycle churn" intermittently fails with no_envoys. Passes on retry.
-- 2026-04-22 actor-db-raw: PASS (4 tests, 4.04s) -- corrected filter to "Actor Database (Raw) Tests"
-- 2026-04-22 actor-queue: PASS (25 tests, 32.95s) -- first run had 3 flaky failures, all passed on retry
-- 2026-04-22 actor-kv: PASS (3 tests, 2.51s)
-- 2026-04-22 actor-stateless: PASS (6 tests, 4.38s)
-- 2026-04-22 raw-http: PASS (15 tests, 10.76s)
-- 2026-04-22 raw-http-request-properties: PASS (16 tests, 11.44s)
-- 2026-04-22 raw-websocket: PASS (11 tests, 8.77s)
-- 2026-04-22 actor-inspector: PARTIAL PASS (20 passed, 1 failed, 42 skipped) -- filter corrected to "Actor Inspector HTTP API". Only failure is `POST /inspector/workflow/replay rejects workflows that are currently in flight` (workflow-related; user asked to skip workflow issues).
-- 2026-04-22 gateway-query-url: PASS (2 tests, 2.35s) -- filter corrected to "Gateway Query URLs"
-- 2026-04-22 actor-db-pragma-migration: PASS (4 tests, 4.09s)
-- 2026-04-22 actor-state-zod-coercion: PASS (3 tests, 3.34s)
-- 2026-04-22 actor-conn-status: PASS (6 tests, 5.76s)
-- 2026-04-22 gateway-routing: PASS (8 tests, 5.96s)
-- 2026-04-22 lifecycle-hooks: PASS (8 tests, 6.62s)
-- 2026-04-22 actor-state: PASS (3 tests, 3.08s)
-- 2026-04-22 actor-schedule: PASS (4 tests, 6.79s)
-- 2026-04-22 actor-sleep: PASS (21 tests, 53.61s)
-- 2026-04-22 actor-sleep-db: PASS (14 tests, 42.29s)
-- 2026-04-22 actor-lifecycle: PASS (5 tests, 30.22s)
-- 2026-04-22 actor-conn-hibernation: PASS (5 tests) -- filter is "Connection Hibernation". Flaky first run ("conn state persists through hibernation"), passed on retry.
-- 2026-04-22 hibernatable-websocket-protocol: N/A (feature not enabled; all 6 tests correctly skipped)
-- 2026-04-22 actor-db-stress: PASS (3 tests, 24.22s)
-- 2026-04-22 actor-run: PASS after US-103 (8 passed / 16 skipped) -- native abortSignal binding plus sleep-grace abort firing and NAPI run-handler active gating now cover `active run handler keeps actor awake past sleep timeout`.
-- 2026-04-22 actor-error-handling: FAIL (1 failed, 6 passed, 14 skipped) -- `should convert internal errors to safe format` leaks the original `Error` message through instead of sanitizing to `INTERNAL_ERROR_DESCRIPTION`. Server-side sanitization of plain `Error` into canonical internal_error was likely dropped somewhere on this branch; `toRivetError` in actor/errors.ts preserves `error.message` and the classifier in common/utils.ts is not being invoked on this path. Needs fix outside driver-runner scope.
-- 2026-04-22 actor-workflow: FAIL (6 failed / 12 passed / 39 skipped) -- REVERTED the `isLifecycleEventsNotConfiguredError` swallow in `stateManager.saveState`. The fix only masked the symptom: workflow `batch()` does `Promise.all([kvBatchPut, stateManager.saveState])`, and when the task joins and `registry/mod.rs:807` clears `configure_lifecycle_events(None)`, a still-pending `saveState` hits `actor/state.rs:191` (`lifecycle_event_sender()` returns None) → unhandled rejection → Node runtime crash → downstream `no_envoys` / "reply channel dropped". Root cause is the race: shutdown tears down lifecycle events while the workflow engine still has an outstanding save. Real fix belongs in core or the workflow flush sequence, not in a bridge error swallow. Failures that were being masked:
-* `starts child workflows created inside workflow steps` - 2 identical "child-1" results instead of 1. Workflow step body re-executes on replay, double-pushing to `state.results`.
-* `workflow steps can destroy the actor` - ctx.destroy() fires onDestroy but actor still resolvable via `get`. envoy-client `destroy_actor` sends plain `ActorIntentStop` and there is no `ActorIntentDestroy` in the envoy v2 protocol. TS runner sets `graceful_exit` marker; equivalent marker is not wired through Rust envoy-client.
-- 2026-04-22 actor-workflow after US-103: PARTIAL PASS (17 passed / 1 failed / 39 skipped). Crash-path coverage passed, including `replays steps and guards state access`, `tryStep and try recover terminal workflow failures`, `sleeps and resumes between ticks`, and `completed workflows sleep instead of destroying the actor`. Remaining failure is still `workflow steps can destroy the actor`, matching the known missing envoy destroy marker above.
-- 2026-04-22 actor-db sanity after US-103: PASS for `handles parallel actor lifecycle churn`.
-- 2026-04-22 actor-queue sanity after US-103: combined route-sensitive run still hit the known many-queue dropped-reply/overload flake; both targeted cases passed when run in isolation.
-- 2026-04-22 ALL FILES PROCESSED (37 files). Summary: 30 full-pass, 4 partial-pass (actor-workflow, actor-error-handling, actor-inspector, actor-run), 1 n/a (hibernatable-websocket-protocol - feature disabled). 2 code fixes landed: (1) `stateManager.saveState` swallows post-shutdown state-save bridge error in workflow cleanup; (2) `#createActorAbortSignal` uses native `AbortSignal` property/event API instead of calling non-existent methods. Outstanding issues captured above; none caused by the test-runner pass itself.
-- 2026-04-22 flake investigation Step 1: `actor-error-handling` recheck is GREEN for static/bare `Actor Error Handling Tests` (`/tmp/driver-logs/error-handling-recheck.log`, exit 0). `actor-workflow` child-workflow recheck is GREEN for static/bare `starts child workflows` (`/tmp/driver-logs/workflow-child-recheck.log`, exit 0). Step 5 skipped because the child-workflow target is no longer red.
-- 2026-04-22 flake investigation Step 2: `actor-inspector` replay target still fails, but the failure is after the expected 409. `/tmp/driver-logs/inspector-replay.log` shows replay rejection works, then `handle.release()` does not lead to `finishedAt` before the 30s test timeout. Evidence and fix direction captured in `.agent/notes/flake-inspector-replay.md`.
-- 2026-04-22 flake investigation Step 3: `actor-conn` targeted runs: `isConnected should be false before connection opens` 5/5 PASS; `onOpen should be called when connection opens` 2/3 PASS and 1/3 FAIL; `should reject request exceeding maxIncomingMessageSize` 2/3 PASS and 1/3 FAIL; `should reject response exceeding maxOutgoingMessageSize` 3/3 PASS. Evidence and fix direction captured in `.agent/notes/flake-conn-websocket.md`.
-- 2026-04-22 flake investigation Step 4: isolated `actor-queue` `wait send returns completion response` is 5/5 PASS. `drains many-queue child actors created from actions while connected` is 1/3 PASS and 2/3 FAIL with `actor/dropped_reply` plus HTTP 500 responses. Evidence and fix direction captured in `.agent/notes/flake-queue-waitsend.md`.
+- 2026-04-23T03:45:07.364Z manager-driver: PASS (41.0s)
+- 2026-04-23T03:46:11.489Z actor-conn: FAIL - FAIL tests/driver/actor-conn.test.ts > Actor Conn > static registry > encoding (bare) > Actor Connection Tests > Large Payloads > should reject response exceeding maxOutgoingMessageSize
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+sequenceDiagram
+autonumber
+participant Timer as Sleep timer
+participant Task as ActorTask
+participant Ctx as ActorContext
+participant Adapter as Runtime adapter
+participant User as User hooks/run
+participant Store as State/KV
+
+Timer->>Task: sleep_timer() / idle deadline fires
+Task->>Task: reset_sleep_deadline() used can_arm_sleep_timer()
+Task->>Task: begin_grace(StopReason::Sleep)
+Task->>Task: transition_to(LifecycleState::SleepGrace)
+Task->>Ctx: suspend_alarm_dispatch()
+Task->>Ctx: cancel_local_alarm_timeouts()
+Task->>Ctx: cancel_abort_signal_for_sleep()
+Task->>Task: emit_grace_events(StopReason::Sleep)
+Task->>Ctx: begin_core_dispatched_hook()
+Task->>Adapter: send_actor_event(DisconnectConn { conn_id, reply })
+Adapter->>User: call_on_disconnect_final()
+Adapter->>Ctx: disconnect_conn(conn_id)
+Adapter-->>Task: Reply<()> Ok
+Task->>Ctx: mark_core_dispatched_hook_completed()
+Task->>Ctx: begin_core_dispatched_hook()
+Task->>Adapter: send_actor_event(RunGracefulCleanup { Sleep, reply })
+Adapter->>User: call_on_sleep()
+Adapter-->>Task: Reply<()> Ok
+Task->>Ctx: mark_core_dispatched_hook_completed()
+User-->>Ctx: waitUntil / on_state_change / http / raw ws completes
+Ctx-->>Task: activity_notify.notified()
+Task->>Task: on_activity_signal()
+Task->>Task: try_finish_grace()
+Task->>Ctx: can_finalize_sleep()
+alt all work idle before grace deadline
+Task->>Task: run_shutdown(StopReason::Sleep)
+Task->>Task: transition_to(LifecycleState::SleepFinalize)
+Task->>Task: save_final_state(StopReason::Sleep)
+Task->>Adapter: send_actor_event(SerializeState { Save, reply })
+Adapter-->>Task: Reply<Vec<StateDelta>>
+Task->>Ctx: save_state(deltas)
+Ctx->>Store: persist actor state
+Task->>Task: close_actor_event_channel()
+Task->>User: abort_and_join_run_handle()
+Task->>Ctx: cleanup_for_stopped()
+Task->>Task: LiveExit::Stopped(Sleep)
+else grace deadline expires
+Task->>Task: on_sleep_grace_deadline()
+Task->>User: run_handle.abort()
+Task->>Task: run_shutdown(StopReason::Sleep)
+Task->>Task: transition_to(LifecycleState::SleepFinalize)
+Task->>Task: save_final_state(StopReason::Sleep)
+Task->>Adapter: send_actor_event(SerializeState { Save, reply })
+Adapter-->>Task: Reply<Vec<StateDelta>> or empty on timeout/error
+Task->>Ctx: save_state(deltas)
+Ctx->>Store: persist actor state
+Task->>Task: close_actor_event_channel()
+Task->>Ctx: cleanup_for_stopped()
+Task->>Task: LiveExit::Stopped(Sleep)
+end
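
Reviewer sketch (not part of the commit): the two branches of the diagram's `alt` block differ mainly in when the run handle is aborted relative to the final state save. The toy model below, written in Python purely for illustration, mirrors that ordering. The class `ActorTaskSketch`, its fields, and the `pending_work` counter are hypothetical stand-ins; only the step names recorded in `trace` come from the diagram, and the real implementation may differ in details the diagram does not show.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class LifecycleState(Enum):
    STARTED = auto()
    SLEEP_GRACE = auto()
    SLEEP_FINALIZE = auto()
    STOPPED = auto()


@dataclass
class ActorTaskSketch:
    """Toy model of the sleep path in the diagram; hypothetical, not the real ActorTask."""

    pending_work: int = 0  # stand-in for waitUntil / http / raw ws activity
    conn_ids: list = field(default_factory=list)
    state: LifecycleState = LifecycleState.STARTED
    run_aborted_by_deadline: bool = False
    trace: list = field(default_factory=list)

    def begin_grace(self):
        """Enter SleepGrace and emit the grace events in diagram order."""
        self.state = LifecycleState.SLEEP_GRACE
        self.trace += [
            "suspend_alarm_dispatch",
            "cancel_local_alarm_timeouts",
            "cancel_abort_signal_for_sleep",
        ]
        for conn_id in self.conn_ids:
            self.trace.append(f"DisconnectConn({conn_id})")
        self.conn_ids.clear()
        self.trace.append("RunGracefulCleanup(Sleep)")

    def can_finalize_sleep(self):
        return self.pending_work == 0

    def on_activity_signal(self):
        """One unit of user work finished; try to leave grace early."""
        self.pending_work = max(0, self.pending_work - 1)
        return self.try_finish_grace()

    def try_finish_grace(self):
        if self.state is LifecycleState.SLEEP_GRACE and self.can_finalize_sleep():
            self._run_shutdown(deadline_expired=False)
            return True
        return False

    def on_sleep_grace_deadline(self):
        """Grace deadline fired: abort the run handle first, then shut down."""
        if self.state is LifecycleState.SLEEP_GRACE:
            self._run_shutdown(deadline_expired=True)

    def _run_shutdown(self, deadline_expired):
        if deadline_expired:
            self.run_aborted_by_deadline = True
            self.trace.append("run_handle.abort")
        self.state = LifecycleState.SLEEP_FINALIZE
        self.trace += ["SerializeState(Save)", "save_state", "close_actor_event_channel"]
        if not deadline_expired:
            # Idle branch: the run handle is aborted and joined after the save.
            self.trace.append("abort_and_join_run_handle")
        self.trace.append("cleanup_for_stopped")
        self.state = LifecycleState.STOPPED


if __name__ == "__main__":
    idle = ActorTaskSketch(pending_work=1, conn_ids=["c1"])
    idle.begin_grace()
    idle.on_activity_signal()  # last pending work completes before the deadline
    print(idle.state, idle.trace)

    late = ActorTaskSketch(pending_work=1)
    late.begin_grace()
    late.on_sleep_grace_deadline()  # work still pending when the deadline fires
    print(late.state, late.trace)
```

In this sketch both branches end in `STOPPED` after a state save; the deadline branch simply records `run_handle.abort` before `SerializeState(Save)`, while the idle branch records `abort_and_join_run_handle` after `close_actor_event_channel`, as the diagram draws them.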
