diff --git a/.changeset/batch-child-completion-wakes.md b/.changeset/batch-child-completion-wakes.md
new file mode 100644
index 0000000000..6a654c7bfa
--- /dev/null
+++ b/.changeset/batch-child-completion-wakes.md
@@ -0,0 +1,6 @@
+---
+'@electric-ax/agents-runtime': patch
+'@electric-ax/agents-server': patch
+---
+
+Batch queued child completion wakes into a single wake payload so parent agents receive every child result without extra handler runs. Preserve manifest-backed child wake registrations during spawn reconciliation and catch up late runFinished registrations so fast child completions are not missed.
diff --git a/.changeset/quiet-markdown-docs.md b/.changeset/quiet-markdown-docs.md
new file mode 100644
index 0000000000..6b4f6ee90a
--- /dev/null
+++ b/.changeset/quiet-markdown-docs.md
@@ -0,0 +1,10 @@
+---
+"@electric-ax/agents": patch
+"@electric-ax/agents-runtime": patch
+"@electric-ax/agents-server": patch
+"@electric-ax/agents-server-ui": patch
+---
+
+Add collaborative markdown document tools backed by Yjs durable streams.
+
+Horton can create, read, replace, edit, and stream inserts into markdown documents by mutating a wake-local Y.Doc and appending binary Yjs updates to the document stream. The server now keeps markdown document handling thin by creating document streams and serving manifest metadata while document content changes flow through the Yjs stream.
diff --git a/.changeset/realtime-agents-voice-mode.md b/.changeset/realtime-agents-voice-mode.md
new file mode 100644
index 0000000000..efe20c4899
--- /dev/null
+++ b/.changeset/realtime-agents-voice-mode.md
@@ -0,0 +1,9 @@
+---
+'@electric-ax/agents': patch
+'@electric-ax/agents-desktop': patch
+'@electric-ax/agents-runtime': patch
+'@electric-ax/agents-server': patch
+'@electric-ax/agents-server-ui': patch
+---
+
+Add OpenAI realtime voice mode for Electric Agents, backed by durable audio/control streams. Horton can enter realtime mode with normal context and tools, desktop exposes realtime model/voice/reasoning settings, the server/runtime persist session stream refs, transcripts, and audio spans, and the UI adds voice controls, typed-message forwarding, credential gating, input metering, new-session voice startup, and audio capture/playback fixes.
diff --git a/.changeset/streaming-tool-call-args.md b/.changeset/streaming-tool-call-args.md
new file mode 100644
index 0000000000..88d56049b4
--- /dev/null
+++ b/.changeset/streaming-tool-call-args.md
@@ -0,0 +1,5 @@
+---
+"@electric-ax/agents-runtime": minor
+---
+
+Add runtime support for streaming tool call arguments from Pi model events.
diff --git a/AGENTS_MARKDOWN_DOCS_PLAN.md b/AGENTS_MARKDOWN_DOCS_PLAN.md
new file mode 100644
index 0000000000..ffee9b0977
--- /dev/null
+++ b/AGENTS_MARKDOWN_DOCS_PLAN.md
@@ -0,0 +1,770 @@
+# Agents Markdown Docs Implementation Plan
+
+## Goal
+
+Add first-class collaborative markdown documents to Electric Agents.
+
+Agents should be able to create a markdown document, add it to the entity
+manifest, read it, and edit it with file-like replacement tools. Users should be
+able to click the manifest entry, open a CodeMirror markdown editor in the
+workspace, edit concurrently with other users, and see agent/user presence.
+
+The first implementation intentionally does not require streaming tool calls or
+runtime-level interception of assistant text. Streaming edits can be added after
+the document model, auth, UI, and non-streaming tools are working.
+
+## MVP Scope
+
+### In scope
+
+- A new manifest entry kind for collaborative markdown documents.
+- One durable Yjs document stream per document, using
+  `@durable-streams/y-durable-streams`.
+- A CodeMirror markdown editor bound to `Y.Text`.
+- User presence through Yjs awareness.
+- Agent presence during document tools, including status and edit location.
+- Agent tools for create/read/write/exact text replacement.
+- Unified diff results from write/edit tools, matching the current file tool
+  behavior.
+- Explicit server auth for document Yjs stream paths.
+- Forking support so forked entities receive forked document streams.
+
+### Out of scope for MVP
+
+- Token-by-token agent edits.
+- Streaming tool arguments.
+- Runtime routing of assistant text into documents.
+- Rich-text CRDTs such as ProseMirror fragments.
+- Markdown preview/render mode.
+- Comments or suggestions inside docs.
+- Document history UI beyond the Yjs/Durable Streams backing log.
+
+## Core Design
+
+### Manifest Entry
+
+Add a new manifest entry kind rather than encoding docs as attachments.
+
+Attachments are immutable, closed streams with byte length and sha256 semantics.
+Markdown docs are mutable CRDT-backed resources, so they should be first-class
+manifest entries with their own lifecycle and fork/auth behavior.
+
+Proposed manifest shape:
+
+```ts
+type ManifestDocumentEntry = {
+  key?: string
+  kind: 'document'
+  id: string
+  title: string
+  provider: 'y-durable-streams'
+  docId: string
+  docPath: string
+  streamPath: string
+  contentMimeType: 'text/markdown'
+  transportMimeType: 'application/vnd.electric-agents.markdown-yjs'
+  yTextName: 'markdown'
+  createdAt: string
+  createdBy?: string
+  updatedAt?: string
+  meta?: Record<string, unknown>
+}
+```
+
+Recommended manifest key:
+
+```ts
+document:${id}
+```
+
+Recommended stream path:
+
+```ts
+/docs/agents/${entityType}/${instanceId}/documents/${id}
+```
+
+`docId` is the value passed to `YjsProvider`.
+
+`docPath` is the provider-facing stable document path and should not have a
+leading slash:
+
+```ts
+agents/${entityType}/${instanceId}/documents/${id}
+```
+
+`streamPath` is the Durable Streams document stream path used for auth, forking,
+and debugging:
+
+```ts
+/docs/${docPath}
+```
+
+This shape follows the `y-durable-streams` URL contract. The provider requests:
+
+```ts
+{baseUrl}/docs/{docPath}?{queryParams}
+```
+
+For the agents server, use:
+
+```ts
+baseUrl = agentsServerUrl
+docId = docPath
+```
+
+Do not set `baseUrl` to the raw `streamPath`; the provider appends `/docs/...`
+itself.
+
+### Yjs Document Model
+
+Use a plain Yjs text type:
+
+```ts
+const ytext = ydoc.getText('markdown')
+```
+
+This keeps the MVP simple:
+
+- The stored CRDT is binary Yjs updates.
+- The logical document content is markdown text.
+- CodeMirror can bind directly to `Y.Text`.
+- Agent tools can operate on `ytext.toString()` and commit Yjs transactions.
+
+### Mime Types
+
+Use two concepts:
+
+- `contentMimeType: 'text/markdown'` for what users and tools are editing.
+- `transportMimeType: 'application/vnd.electric-agents.markdown-yjs'` for what
+  is stored in the durable stream.
+
+Do not label the durable stream itself as `text/markdown`; its bytes are Yjs
+updates/snapshots.
+
+## Implementation Areas
+
+### 1. Runtime Types and Manifest Schema
+
+Files:
+
+- `packages/agents-runtime/src/entity-schema.ts`
+- `packages/agents-runtime/src/types.ts`
+- `packages/agents-runtime/src/manifest-helpers.ts`
+- `packages/agents-server-ui/src/lib/ElectricAgentsProvider.tsx`
+
+Tasks:
+
+- Add `ManifestDocumentEntryValue`.
+- Extend the manifest zod union with `kind: 'document'`.
+- Export `ManifestDocumentEntry`.
+- Add `manifestDocumentKey(id: string)`.
+- Update UI-side manifest parsing/types to accept `kind: 'document'`.
+
+Acceptance:
+
+- Entity state can contain a `manifest` event with `kind: 'document'`.
+- Document manifest entries use the strict shape above; older draft document
+  manifest shapes are not supported.
+
+### 2. Server Document API
+
+Files:
+
+- `packages/agents-server/src/entity-manager.ts`
+- `packages/agents-server/src/routing/entities-router.ts`
+- `packages/agents-runtime/src/runtime-server-client.ts`
+- `packages/agents-runtime/src/types.ts`
+
+Tasks:
+
+- Add document validation helpers:
+  - document id cannot be empty, start with `.`, or contain `/`.
+  - title should be non-empty and bounded.
+- Add `createDocument(entityUrl, req)`:
+  - create durable Yjs backing stream if needed.
+  - initialize the Yjs document with optional markdown text.
+  - write the document manifest entry.
+  - return `{ txid, document }`.
+- Add `getDocument(entityUrl, id)`.
+- Add `readDocument(entityUrl, id)` returning current markdown text.
+- Add `writeDocument(entityUrl, id, content)` replacing the whole `Y.Text`.
+- Add `editDocument(entityUrl, id, old_string, new_string, replace_all?)`.
+- Add HTTP routes under entity API:
+  - `POST /:type/:instanceId/documents`
+  - `GET /:type/:instanceId/documents/:documentId`
+  - `GET /:type/:instanceId/documents/:documentId/content`
+  - `PUT /:type/:instanceId/documents/:documentId/content`
+  - `PATCH /:type/:instanceId/documents/:documentId/content`
+
+Open implementation choice:
+
+- Preferred for MVP: put create/read/write/edit on the server API and expose
+  them through `RuntimeServerClient`. This keeps auth, fork locks, and manifest
+  writes in one place.
+- Avoid direct runtime-tool writes to `YjsProvider` in the first cut. That is
+  faster to prototype, but it spreads auth, fork locks, and stream path rules
+  into the runtime.
+
+Acceptance:
+
+- Creating a doc appends a manifest row.
+- Reading returns markdown text from the Yjs doc.
+- Writing/editing produces Yjs updates, not manifest content mutations.
+- Server rejects operations when the entity is stopped or fork-write-locked.
+
+### 3. Durable Stream Yjs Integration
+
+Files:
+
+- `packages/agents-server/package.json`
+- `packages/agents-server-ui/package.json`
+- `packages/agents-runtime/package.json` if runtime tools manipulate Yjs locally.
+
+Dependencies:
+
+- `@durable-streams/y-durable-streams`
+- `yjs`
+- `y-protocols`
+- `lib0`
+- UI only:
+  - `codemirror`
+  - `@codemirror/state`
+  - `@codemirror/view`
+  - `@codemirror/lang-markdown`
+  - `y-codemirror.next`
+
+Tasks:
+
+- Use `YjsProvider` for browser/editor connections.
+- On server create/write/edit, either:
+  - use `YjsProvider` server-side and wait for sync, or
+  - use the y-durable-streams server utilities if exposed by the package.
+- Always destroy providers after tool/server operations.
+- For initial content, create a `Y.Doc`, insert the text into
+  `getText('markdown')`, and persist it through the provider.
+- Keep the Yjs mount constants in one shared server module:
+  - `docPathForDocument(entityUrl, documentId)`
+  - `documentStreamPathForDocPath(docPath)`
+  - `entityUrlFromYjsDocumentPath(path)`
+  - `entityUrlFromYjsAwarenessPath(path)`
+
+Acceptance:
+
+- A browser editor and server operation converge on the same markdown text.
+- New editor clients load through snapshot discovery and then live updates.
+
+### 4. Durable Stream Auth
+
+Files:
+
+- `packages/agents-server/src/routing/durable-streams-router.ts`
+- `packages/agents-server/src/routing/stream-append.ts`
+
+Tasks:
+
+- Add document path recognition for provider document requests:
+
+```ts
+function entityUrlFromYjsDocumentPath(path: string): string | null {
+  const match = path.match(
+    /^\/docs\/agents\/([^/]+)\/([^/]+)\/documents\/[^/]+(?:\/.*)?$/
+  )
+  if (!match) return null
+  return `/${match[1]}/${match[2]}`
+}
+```
+
+- Authorize `GET`/`HEAD` document stream access with entity read permission.
+- Authorize `POST`/`PUT` document stream writes with entity write/manage rules
+  or a dedicated document write permission rule.
+- Inspect the installed `@durable-streams/y-durable-streams` package and add an
+  equivalent `entityUrlFromYjsAwarenessPath(path)` for the exact awareness URL
+  pattern used by the provider.
+- Add route tests using real provider URL shapes for:
+  - snapshot discovery and snapshot load.
+  - live update reads.
+  - local edit writes.
+  - awareness reads/writes.
+- Reject direct writes to document paths during fork locks.
+
+Important:
+
+The current durable-stream proxy explicitly guards entity streams, attachment
+streams, and shared-state streams. Unknown paths intentionally pass through.
+Document and document-awareness paths must not remain in that pass-through
+bucket.
+
+Acceptance:
+
+- Unauthorized users cannot read, write, or observe awareness for document
+  streams.
+- Authorized users can edit through CodeMirror.
+- Fork locks prevent concurrent writes while the subtree is being forked.
+
+### 5. Forking
+
+Files:
+
+- `packages/agents-server/src/entity-manager.ts`
+
+Tasks:
+
+- Collect document stream paths from document manifest entries during fork
+  snapshot reads.
+- Lock document stream paths during fork, like shared-state streams.
+- Fork each document durable stream from source to fork destination.
+- Remap document manifest entries:
+  - `streamPath`
+  - `docPath`
+  - `docId`
+  - possibly `key` if document ids are rewritten.
+- Keep document ids stable within a fork unless collisions require suffixing.
+
+Acceptance:
+
+- A forked entity opens an independent copy of each document.
+- Editing a forked doc does not change the source entity's doc.
+- Pointer forks include only document manifest entries visible at the fork point.
+
+### 6. Runtime Tool Surface
+
+Files:
+
+- `packages/agents-runtime/src/tools/documents.ts`
+- `packages/agents-runtime/src/tools.ts`
+- `packages/agents-runtime/src/types.ts`
+- `packages/agents-runtime/src/process-wake.ts`
+- `packages/agents/src/bootstrap.ts`
+
+Tasks:
+
+- Add framework document tool factory.
+- Extend `ProcessWakeConfig.createElectricTools` context with
+  `principal?: RuntimePrincipal`, and pass `config.principal` through from
+  `processWake`. Document tools need this for agent awareness state.
+- Extend `ProcessWakeConfig.createElectricTools` context with document methods
+  backed by `RuntimeServerClient`:
+  - `createMarkdownDocument`
+  - `readMarkdownDocument`
+  - `writeMarkdownDocument`
+  - `editMarkdownDocument`
+- Add default built-in tools in `packages/agents/src/bootstrap.ts`, alongside
+  event-source tools.
+- Keep worker exposure explicit if desired. Horton already includes
+  `ctx.electricTools`; Worker currently gets only selected tools.
+
+Tool shapes:
+
+```ts
+create_markdown_doc({
+  title: string,
+  content?: string
+})
+```
+
+```ts
+read_markdown_doc({
+  docId: string,
+})
+```
+
+```ts
+write_markdown_doc({
+  docId: string,
+  content: string,
+})
+```
+
+```ts
+edit_markdown_doc({
+  docId: string,
+  old_string: string,
+  new_string: string,
+  replace_all?: boolean
+})
+```
+
+Tool behavior should mirror file tools:
+
+- `read_markdown_doc`, `create_markdown_doc`, and `write_markdown_doc` mark the
+  document as read in a per-wake read set.
+- `edit_markdown_doc` must reject edits unless the document has been read or
+  written in the same wake.
+- `old_string` must occur exactly once unless `replace_all` is true.
+- Return a useful error when `old_string` is not found or is ambiguous.
+- Return `details.diff` using `createTwoFilesPatch`.
+- Return replacement counts and byte/char counts.
+
+Acceptance:
+
+- An agent can create a doc and then read/edit it with file-like tools.
+- Tool call UI shows a diff for document edits without special casing if
+  possible.
+
+### 7. Agent Presence During Tools
+
+Files:
+
+- `packages/agents-runtime/src/tools/documents.ts`
+- server-side document service module, if split from `entity-manager.ts`
+
+Tasks:
+
+- When a document tool edits content:
+  - connect to the Yjs provider with an `Awareness` instance.
+  - set local awareness state from the principal passed through
+    `createElectricTools`, or from an agent principal derived by the server:
+
+```ts
+{
+  user: {
+    principalUrl,
+    role: 'agent',
+    name,
+    color,
+    status: 'editing'
+  }
+}
+```
+
+- Before applying a replacement, set the agent selection/cursor near the
+  replacement range.
+- Apply the Yjs transaction.
+- Move cursor to the end of the replacement.
+- Set status back to `idle` or destroy provider so awareness removal is
+  broadcast.
+
+Acceptance:
+
+- While an agent edit tool is running, open editors see the agent presence.
+- For quick edits this may be brief; that is acceptable for MVP.
+
+### 8. UI: Document Manifest Rows
+
+Files:
+
+- `packages/agents-server-ui/src/components/EntityTimeline.tsx`
+- `packages/agents-server-ui/src/lib/attachments.ts` or a new
+  `documents.ts`
+
+Tasks:
+
+- Add `isDocumentManifest`.
+- Display document rows as `Document`.
+- Use the title as primary text.
+- Show `text/markdown`, provider, and created metadata.
+- Add an open action.
+- Use workspace helper:
+
+```ts
+workspace.helpers.openEntity(entityUrl, {
+  viewId: 'markdown-doc',
+  viewParams: { doc: manifest.id },
+})
+```
+
+Acceptance:
+
+- Document manifests are not hidden as attachments.
+- Clicking a document opens the editor view.
+
+### 9. UI: CodeMirror Markdown Editor View
+
+Files:
+
+- `packages/agents-server-ui/src/lib/workspace/registerViews.ts`
+- `packages/agents-server-ui/src/components/views/MarkdownDocumentView.tsx`
+- new CSS module for the editor view.
+
+Tasks:
+
+- Register entity view:
+
+```ts
+registerView({
+  kind: 'entity',
+  id: 'markdown-doc',
+  label: 'Docs',
+  icon: FileText,
+  Component: MarkdownDocumentView,
+})
+```
+
+- Resolve `doc` from `viewParams`.
+- Find document manifest from entity DB.
+- Construct `Y.Doc`, `Awareness`, and `YjsProvider`.
+- Use `baseUrl` pointing at the agents server durable-stream proxy.
+- Bind CodeMirror to `ydoc.getText('markdown')`.
+- Set local user awareness from `useCurrentPrincipal()`.
+- Pass configured auth/principal headers to `YjsProvider.headers`, matching the
+  rest of the agents UI request path.
+- Render presence bar from awareness states.
+- Destroy CodeMirror view/provider on unmount.
+
+Acceptance:
+
+- Two browser windows can concurrently edit one doc.
+- Remote cursor/presence appears.
+- Agent tool edits appear live in open editors.
+- The editor survives tile split/open/close cycles.
+
+### 10. Tests
+
+Unit and integration tests should be added at the layer being changed.
+
+Runtime:
+
+- Manifest schema accepts document entries.
+- Document tool exact replacement behavior matches file edit behavior.
+- Diff details are returned.
+
+Server:
+
+- Create document writes manifest.
+- Read/write/edit round trip through Yjs.
+- Unauthorized durable stream document access is rejected.
+- Forked docs are independent.
+
+UI:
+
+- Manifest row labels and open action.
+- View registration.
+- Editor view mounts with missing/invalid doc id states.
+
+## Deferred Streaming Edit Work
+
+The repo already has enough evidence for a later streaming path:
+
+- `@mariozechner/pi-ai` emits `toolcall_start`, `toolcall_delta`, and
+  `toolcall_end` provider events.
+- `@mariozechner/pi-agent-core` forwards those as `message_update` while the
+  assistant message is streaming.
+- `packages/agents-runtime/src/pi-adapter.ts` currently only handles
+  `text_delta` in `message_update`.
+- `packages/agents-runtime/src/outbound-bridge.ts` currently persists tool calls
+  only at `tool_execution_start` and final completion.
+
+Later streaming options:
+
+1. Surface tool argument deltas through the outbound bridge and persist partial
+   args in the `toolCalls` collection.
+2. Add a streaming document insertion tool whose string argument can be consumed
+   incrementally.
+3. Or add a runtime-level text routing mode. This is more invasive and should
+   remain separate from the MVP.
+
+This plan intentionally chooses non-streaming exact replacements first because
+it avoids changing agent execution semantics.
+
+## Open Questions
+
+Resolve these inside the single PR before enabling the feature:
+
+- Should document tools be enabled for all built-in agents by default, or only
+  for Horton initially?
+- Should workers be able to receive document tools by name in their spawn args?
+- Should document stream write permission be tied to entity `manage`, entity
+  `write`, or a new permission?
+- Should document ids remain stable across forks, or be suffixed like shared
+  state ids?
+
+## Single PR Implementation Phases
+
+Implement this as one PR. The phases below describe the order of development
+and review inside the branch; they are not separate merge boundaries. The PR
+should not be merged with document creation/editing enabled until schema, server
+API, auth, forking, tools, UI, presence, and tests are all complete.
+
+### Phase 0: Provider Path Spike
+
+Goal: remove uncertainty before changing product code.
+
+Tasks:
+
+- Inspect the installed `@durable-streams/y-durable-streams` package.
+- Confirm the exact document request URLs for:
+  - snapshot discovery.
+  - snapshot load.
+  - live update reads.
+  - local edit writes.
+- Confirm the exact awareness request URLs and methods.
+- Capture helper names and URL examples in code comments/tests, not as
+  free-floating assumptions.
+
+Exit criteria:
+
+- The implementation has concrete helpers for document and awareness path
+  recognition.
+- Route tests use real provider-shaped URLs.
+
+### Phase 1: Schema and Shared Types
+
+Tasks:
+
+- Add `ManifestDocumentEntryValue`.
+- Extend the manifest schema union.
+- Export document manifest types.
+- Add `manifestDocumentKey(id)`.
+- Add shared document path helpers.
+- Update UI manifest parsing/types.
+
+Exit criteria:
+
+- Existing entity streams still load.
+- A synthetic document manifest row parses in runtime and UI tests.
+
+### Phase 2: Server Document Service
+
+Tasks:
+
+- Add document id/title validation.
+- Add create/get/read/write/edit document methods.
+- Store initial markdown in `ydoc.getText('markdown')`.
+- Return unified diffs from write/edit operations.
+- Add entity API routes and `RuntimeServerClient` methods.
+- Keep document writes server-mediated for MVP.
+
+Exit criteria:
+
+- Server tests can create, read, write, and exact-replace a markdown doc.
+- Edit errors match the file edit tool behavior for missing/ambiguous strings.
+
+### Phase 3: Auth and Fork Safety
+
+Tasks:
+
+- Authorize `/docs/agents/...` document paths.
+- Authorize the matching y-durable-streams awareness paths.
+- Reject unauthorized document reads/writes/presence.
+- Lock document streams during fork work.
+- Clone document streams during fork.
+- Remap `streamPath`, `docPath`, and `docId` in forked manifest entries.
+
+Exit criteria:
+
+- Unauthorized users cannot read/write doc streams or awareness streams.
+- Forked entities edit independent document streams.
+- Pointer forks include only document manifests visible at the fork point.
+
+### Phase 4: Runtime Tools
+
+Tasks:
+
+- Add document methods and `principal` to `createElectricTools` context.
+- Add `create_markdown_doc`, `read_markdown_doc`, `write_markdown_doc`, and
+  `edit_markdown_doc`.
+- Maintain a per-wake read set.
+- Require read/write/create before exact edit in the same wake.
+- Add document tools to the built-in electric tool bundle.
+- Decide and document Worker exposure in the same PR.
+
+Exit criteria:
+
+- Horton can create/read/write/edit a doc through tools.
+- Tool results include `details.diff`.
+- Tool behavior mirrors file tools closely enough that the existing tool UI is
+  usable.
+
+### Phase 5: UI Manifest and Editor
+
+Tasks:
+
+- Add document manifest row rendering.
+- Add open action using `viewId: 'markdown-doc'` and
+  `viewParams: { doc }`.
+- Register the `markdown-doc` entity view.
+- Add CodeMirror markdown editor bound to `ydoc.getText('markdown')`.
+- Pass auth/principal headers to `YjsProvider`.
+- Handle missing/invalid doc ids and provider errors.
+- Destroy CodeMirror/Yjs resources on unmount.
+
+Exit criteria:
+
+- Clicking a document manifest opens the editor.
+- Two editor tiles/windows can edit the same doc concurrently.
+- Agent tool edits appear in open editors.
+
+### Phase 6: Presence
+
+Tasks:
+
+- Set user awareness from `useCurrentPrincipal()`.
+- Render presence states in the editor.
+- Set agent awareness while document tools are running.
+- Show agent status and cursor/edit location for replacements.
+
+Exit criteria:
+
+- Users see other active users in the document.
+- Users see agent presence while an agent edit tool is applying a change.
+
+### Phase 7: Verification
+
+Tasks:
+
+- Run runtime tests.
+- Run server tests.
+- Run UI tests.
+- Run package typechecks.
+- Manually verify the desktop flow:
+  - agent creates a document.
+  - manifest entry appears.
+  - user opens it in a tile.
+  - user edits it.
+  - agent edits it with exact replacement.
+  - two windows/tiles see concurrent updates and presence.
+  - a forked entity receives an independent document.
+
+Suggested commands, run from the repo root after `pnpm install`:
+
+```sh
+pnpm --filter @electric-ax/agents-runtime test
+pnpm --filter @electric-ax/agents-server test
+pnpm --filter @electric-ax/agents-server-ui test
+pnpm --filter @electric-ax/agents-runtime typecheck
+pnpm --filter @electric-ax/agents-server typecheck
+pnpm --filter @electric-ax/agents-server-ui typecheck
+```
+
+Streaming edits should be a later design/implementation after the single PR
+lands and the non-streaming collaborative document workflow is stable.
+
+## Current Implementation Status
+
+Implemented in `samwillis/agents-markdown-docs`:
+
+- Strict document manifest metadata:
+  - `provider: 'y-durable-streams'`
+  - `docId`
+  - `docPath`
+  - `streamPath`
+  - `transportMimeType: 'application/vnd.electric-agents.markdown-yjs'`
+  - `contentMimeType: 'text/markdown'`
+  - `yTextName: 'markdown'`
+- Server-mediated create/read/write/edit document API backed by framed Yjs
+  updates.
+- Runtime markdown document tools with file-like read-before-edit behavior and
+  diff details.
+- Public Yjs document routes and private backing stream routes guarded by entity
+  permissions, including awareness streams.
+- Fork handling for document update streams and remapped `docPath`, `docId`,
+  and `streamPath` manifest fields.
+- CodeMirror markdown editor view backed by `YjsProvider` and `Y.Text`.
+- Manifest and context-drawer open/split-right actions for document entries.
+- User awareness in the editor and server-published agent awareness around
+  create/write/edit tools, including status and cursor/edit range.
+- Focused runtime, server, and UI tests plus package typechecks for the touched
+  packages.
+
+Deferred:
+
+- Snapshot discovery/compaction implementation beyond the current MVP redirect
+  behavior.
+- Token-by-token/streaming document edits.
+
+Manual verification note:
+
+- Automated tests cover the new server, runtime, fork, auth, and UI helper
+  behavior. A full desktop two-window/two-tile manual pass still needs a running
+  agents server plus desktop UI; the in-app browser check against the only
+  detected local UI port was blocked by browser policy after the tab crashed.
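The exact-replacement rule the plan above specifies for `edit_markdown_doc` and the server's `editDocument` can be sketched as a pure function over `ytext.toString()`. Under that rule, `old_string` must occur exactly once unless `replace_all` is true, and a missing or ambiguous match produces a useful error.

This is a minimal illustration under stated assumptions, not the runtime's implementation. `applyExactReplacement` and `ReplacementResult` are hypothetical names. Per-wake read tracking, `createTwoFilesPatch` diff generation, and the Yjs transaction itself are left out.

A real edit would likely translate the result into `ytext.delete`/`ytext.insert` calls at the match offset inside `ydoc.transact`, rather than rewriting the whole `Y.Text`. That way, concurrent user edits elsewhere in the document are preserved.

```typescript
// Hypothetical sketch of file-tool-style exact replacement for markdown docs.
// Names are illustrative; they are not the agents-runtime API.
type ReplacementResult =
  | { ok: true; content: string; replacements: number; firstIndex: number }
  | { ok: false; error: string }

// Count non-overlapping occurrences scanning left to right, which is the same
// rule String.prototype.split uses for a string separator.
function countOccurrences(haystack: string, needle: string): number {
  let count = 0
  let idx = haystack.indexOf(needle)
  while (idx !== -1) {
    count += 1
    idx = haystack.indexOf(needle, idx + needle.length)
  }
  return count
}

function applyExactReplacement(
  content: string,
  oldString: string,
  newString: string,
  replaceAll = false
): ReplacementResult {
  if (oldString.length === 0) {
    return { ok: false, error: `old_string must not be empty` }
  }
  const occurrences = countOccurrences(content, oldString)
  if (occurrences === 0) {
    return { ok: false, error: `old_string was not found in the document` }
  }
  if (occurrences > 1 && !replaceAll) {
    return {
      ok: false,
      error: `old_string occurs ${occurrences} times; add context or set replace_all`,
    }
  }
  // Offset of the first match, e.g. for placing the agent's awareness cursor.
  const firstIndex = content.indexOf(oldString)
  // A replacer function keeps patterns such as `$&` in newString literal.
  const next = replaceAll
    ? content.split(oldString).join(newString)
    : content.replace(oldString, () => newString)
  return { ok: true, content: next, replacements: occurrences, firstIndex }
}
```

The replacer function matters because `String.prototype.replace` interprets `$&`, `$1`, and similar patterns in a plain replacement string. Agent-supplied markdown could contain those sequences verbatim.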
diff --git a/examples/yjs/package.json b/examples/yjs/package.json
index b85c16f412..282d8bf401 100644
--- a/examples/yjs/package.json
+++ b/examples/yjs/package.json
@@ -23,8 +23,8 @@
   },
   "dependencies": {
     "@codemirror/lang-javascript": "^6.2.2",
-    "@codemirror/state": "^6.4.1",
-    "@codemirror/view": "^6.32.0",
+    "@codemirror/state": "^6.6.0",
+    "@codemirror/view": "^6.43.0",
     "@electric-sql/y-electric": "workspace:*",
     "@hono/node-server": "^1.8.2",
     "codemirror": "^6.0.1",
diff --git a/packages/agents-desktop/package.json b/packages/agents-desktop/package.json
index ac64b0fa52..ad6f38c74b 100644
--- a/packages/agents-desktop/package.json
+++ b/packages/agents-desktop/package.json
@@ -27,6 +27,7 @@
     "typecheck": "tsc --noEmit"
   },
   "dependencies": {
+    "@electric-ax/agents-runtime": "workspace:*",
     "@electric-sql/client": "^1.5.21",
     "@mixmark-io/domino": "^2.2.0",
     "better-sqlite3": "^12.9.0",
diff --git a/packages/agents-desktop/src/app/controller.ts b/packages/agents-desktop/src/app/controller.ts
index 1560564d4b..0eb1f96f7b 100644
--- a/packages/agents-desktop/src/app/controller.ts
+++ b/packages/agents-desktop/src/app/controller.ts
@@ -11,6 +11,7 @@ import * as DesktopIpc from '../ipc/register'
 import { ensureRuntimeEntry as ensureRuntimeEntryInStore } from '../runtime/entries'
 import { createRuntimeController } from '../runtime/controller'
 import * as SettingsBootstrap from '../settings/bootstrap'
+import * as RealtimeSettings from '../settings/realtime'
 import * as ServerSelection from '../settings/selection'
 import { saveDesktopSettings } from '../settings/store'
 import { desktopStateForWindow as desktopStateForWindowImpl } from '../state/desktop-state'
@@ -30,6 +31,7 @@ import type {
   DesktopMenuSection,
   DesktopMenuState,
   DesktopState,
+  RealtimeSettings as RealtimeSettingsConfig,
   RuntimeEntry,
   ServerConfig,
 } from '../shared/types'
@@ -328,6 +330,20 @@ export function createDesktopMainController(ctx: DesktopAppContext) {
     runtime.refreshPowerSaveBlocker()
   }
 
+  const getRealtimeSettingsStatus = async () =>
+    await RealtimeSettings.realtimeSettingsStatus({
+      settings,
+      apiKeys,
+      launchEnv: ctx.envApiKeysSnapshot,
+    })
+
+  const setRealtimeSettings = async (
+    next: RealtimeSettingsConfig
+  ): Promise<void> => {
+    settings.realtime = RealtimeSettings.normalizeRealtimeSettings(next)
+    await saveSettings()
+  }
+
   const syncLaunchAtLoginSetting = async (): Promise<void> => {
     await LoginItems.setLaunchAtLogin(settings.launchAtLogin === true)
   }
@@ -438,6 +454,8 @@ export function createDesktopMainController(ctx: DesktopAppContext) {
     setLaunchAtLogin,
     getPreventAppSuspension,
     setPreventAppSuspension,
+    getRealtimeSettingsStatus,
+    setRealtimeSettings,
   }
 
   const loadSettings = (): Promise =>
diff --git a/packages/agents-desktop/src/ipc/preferences.ts b/packages/agents-desktop/src/ipc/preferences.ts
index cfd50bab3f..abb98d51cc 100644
--- a/packages/agents-desktop/src/ipc/preferences.ts
+++ b/packages/agents-desktop/src/ipc/preferences.ts
@@ -2,6 +2,8 @@ import { ipcMain } from 'electron'
 import type {
   LaunchAtLoginStatus,
   PreventAppSuspensionPreference,
+  RealtimeSettings,
+  RealtimeSettingsStatus,
 } from '../shared/types'
 
 export type PreferencesIpcDeps = {
@@ -9,6 +11,10 @@ export type PreferencesIpcDeps = {
   setLaunchAtLogin: (enabled: boolean) => Promise
   getPreventAppSuspension: () => PreventAppSuspensionPreference
   setPreventAppSuspension: (enabled: boolean) => Promise
+  getRealtimeSettingsStatus: () =>
+    | RealtimeSettingsStatus
+    | Promise<RealtimeSettingsStatus>
+  setRealtimeSettings: (settings: RealtimeSettings) => Promise<void>
 }
 
 export function registerPreferencesIpcHandlers(deps: PreferencesIpcDeps): void {
@@ -25,4 +31,11 @@ export function registerPreferencesIpcHandlers(deps: PreferencesIpcDeps): void {
     `desktop:set-prevent-app-suspension`,
     (_event, enabled: boolean) => deps.setPreventAppSuspension(Boolean(enabled))
   )
+  ipcMain.handle(`desktop:get-realtime-settings`, () =>
+    deps.getRealtimeSettingsStatus()
+  )
+  ipcMain.handle(
+    `desktop:set-realtime-settings`,
+    (_event, settings: RealtimeSettings) => deps.setRealtimeSettings(settings)
+  )
 }
diff --git a/packages/agents-desktop/src/preload.ts b/packages/agents-desktop/src/preload.ts
index 82c437a935..af780edb47 100644
--- a/packages/agents-desktop/src/preload.ts
+++ b/packages/agents-desktop/src/preload.ts
@@ -21,6 +21,8 @@ import type {
   McpServerConfig,
   OnboardingState,
   PreventAppSuspensionPreference,
+  RealtimeSettings,
+  RealtimeSettingsStatus,
   ServerConfig,
 } from './shared/types'
 import type { CloudAgentServersState } from './cloud/cloud-agent-servers'
@@ -190,6 +192,10 @@ const api = {
     ipcRenderer.invoke(`desktop:get-prevent-app-suspension`),
   setPreventAppSuspension: (enabled: boolean): Promise =>
     ipcRenderer.invoke(`desktop:set-prevent-app-suspension`, enabled),
+  getRealtimeSettings: (): Promise<RealtimeSettingsStatus> =>
+    ipcRenderer.invoke(`desktop:get-realtime-settings`),
+  setRealtimeSettings: (settings: RealtimeSettings): Promise<void> =>
+    ipcRenderer.invoke(`desktop:set-realtime-settings`, settings),
   getWorkingDirectory: (): Promise =>
     ipcRenderer.invoke(`desktop:get-working-directory`),
   chooseWorkingDirectory: (): Promise =>
diff --git a/packages/agents-desktop/src/settings/realtime.ts b/packages/agents-desktop/src/settings/realtime.ts
new file mode 100644
index 0000000000..23456255b4
--- /dev/null
+++ b/packages/agents-desktop/src/settings/realtime.ts
@@ -0,0 +1,145 @@
+import { createHash } from 'node:crypto'
+import type {
+  ApiKeys,
+  DesktopSettings,
+  RealtimeCredentialStatus,
+  RealtimeSettings,
+  RealtimeSettingsStatus,
+} from '../shared/types'
+import {
+  DEFAULT_OPENAI_REALTIME_MODEL,
+  DEFAULT_OPENAI_REALTIME_REASONING_EFFORT,
+  DEFAULT_OPENAI_REALTIME_VOICE,
+  OPENAI_REALTIME_MODELS,
+  OPENAI_REALTIME_REASONING_EFFORTS,
+  OPENAI_REALTIME_VOICES,
+  isOpenAIRealtimeModel,
+  isOpenAIRealtimeReasoningEffort,
+  isOpenAIRealtimeVoice,
+} from '@electric-ax/agents-runtime'
+
+export const DEFAULT_REALTIME_SETTINGS: RealtimeSettings = {
+  provider: `openai`,
+  model: DEFAULT_OPENAI_REALTIME_MODEL,
+  voice: DEFAULT_OPENAI_REALTIME_VOICE,
+  reasoningEffort: DEFAULT_OPENAI_REALTIME_REASONING_EFFORT,
+  interruptResponse: true,
+}
+
+const OPENAI_REALTIME_VALIDATION_TTL_MS = 5 * 60 * 1000
+
+type RealtimeCredentialValidation = {
+  openAIApiKeyStatus: RealtimeCredentialStatus
+  openAIApiKeyError?: string
+}
+
+const validationCache = new Map<
+  string,
+  { expiresAt: number; result: RealtimeCredentialValidation }
+>()
+
+export function normalizeRealtimeSettings(value: unknown): RealtimeSettings {
+  if (!value || typeof value !== `object`) return DEFAULT_REALTIME_SETTINGS
+  const maybe = value as Partial>
+  return {
+    provider: `openai`,
+    model: isOpenAIRealtimeModel(maybe.model)
+      ? maybe.model
+      : DEFAULT_REALTIME_SETTINGS.model,
+    voice: isOpenAIRealtimeVoice(maybe.voice)
+      ? maybe.voice
+      : DEFAULT_REALTIME_SETTINGS.voice,
+    reasoningEffort: isOpenAIRealtimeReasoningEffort(maybe.reasoningEffort)
+      ? maybe.reasoningEffort
+      : DEFAULT_REALTIME_SETTINGS.reasoningEffort,
+    interruptResponse:
+      typeof maybe.interruptResponse === `boolean`
+        ? maybe.interruptResponse
+        : DEFAULT_REALTIME_SETTINGS.interruptResponse,
+  }
+}
+
+function validationCacheKey(apiKey: string, model: string): string {
+  const keyHash = createHash(`sha256`).update(apiKey).digest(`hex`)
+  return `${keyHash}:${model}`
+}
+
+async function validateOpenAIRealtimeApiKey(
+  apiKey: string | null | undefined,
+  model: string
+): Promise<RealtimeCredentialValidation> {
+  if (!apiKey) {
+    return { openAIApiKeyStatus: `missing` }
+  }
+
+  const cacheKey = validationCacheKey(apiKey, model)
+  const cached = validationCache.get(cacheKey)
+  if (cached && cached.expiresAt > Date.now()) return cached.result
+
+  let result: RealtimeCredentialValidation
+  try {
+    const response = await fetch(
+      `https://api.openai.com/v1/models/${encodeURIComponent(model)}`,
+      {
+        headers: { Authorization: `Bearer ${apiKey}` },
+      }
+    )
+    if (response.ok) {
+      result = { openAIApiKeyStatus: `valid` }
+    } else if (
+      response.status === 401 ||
+      response.status === 403 ||
+      response.status === 404
+    ) {
+      result = {
+        openAIApiKeyStatus: `invalid`,
+        openAIApiKeyError:
+          response.status === 404
+            ? `OpenAI API key cannot access ${model}.`
+            : `OpenAI API key was rejected (${response.status}).`,
+      }
+    } else {
+      result = {
+        openAIApiKeyStatus: `unknown`,
+        openAIApiKeyError: `OpenAI credential check failed (${response.status}).`,
+      }
+    }
+  } catch (error) {
+    result = {
+      openAIApiKeyStatus: `unknown`,
+      openAIApiKeyError: error instanceof Error ?
error.message : String(error), + } + } + + validationCache.set(cacheKey, { + expiresAt: Date.now() + OPENAI_REALTIME_VALIDATION_TTL_MS, + result, + }) + return result +} + +export async function realtimeSettingsStatus({ + settings, + apiKeys, + launchEnv, +}: { + settings: DesktopSettings + apiKeys: ApiKeys + launchEnv: ApiKeys +}): Promise { + const normalized = normalizeRealtimeSettings(settings.realtime) + const apiKey = apiKeys.openai || launchEnv.openai + const validation = await validateOpenAIRealtimeApiKey( + apiKey, + normalized.model + ) + return { + settings: normalized, + availableModels: [...OPENAI_REALTIME_MODELS], + availableVoices: [...OPENAI_REALTIME_VOICES], + availableReasoningEfforts: [...OPENAI_REALTIME_REASONING_EFFORTS], + hasOpenAIApiKey: Boolean(apiKey), + ...validation, + codexEnabled: settings.codex?.enabled === true, + } +} diff --git a/packages/agents-desktop/src/settings/store.ts b/packages/agents-desktop/src/settings/store.ts index 49a938ec98..4519a34392 100644 --- a/packages/agents-desktop/src/settings/store.ts +++ b/packages/agents-desktop/src/settings/store.ts @@ -17,11 +17,15 @@ import { saveApiKeysToSecret, } from '../credentials/api-keys' import { normalizeEnabledModelValues } from '../credentials/model-picker' +import { + DEFAULT_REALTIME_SETTINGS, + normalizeRealtimeSettings, +} from './realtime' import { normalizeServer, normalizeServers } from './servers' export { settingsPath } from '../shared/paths' -export const SETTINGS_VERSION = 2 +export const SETTINGS_VERSION = 3 export const DEFAULT_SETTINGS: DesktopSettings = { servers: [], @@ -31,6 +35,7 @@ export const DEFAULT_SETTINGS: DesktopSettings = { launchAtLogin: false, preventAppSuspension: true, codex: { enabled: false, source: null }, + realtime: DEFAULT_REALTIME_SETTINGS, } export function normalizeCodexSettings(value: unknown): CodexSettings { @@ -165,6 +170,7 @@ export async function loadDesktopSettings( preventAppSuspension: parsed.preventAppSuspension !== false, 
onboardingDismissed: parsed.onboardingDismissed === true, codex: normalizeCodexSettings(parsed.codex), + realtime: normalizeRealtimeSettings(parsed.realtime), enabledModelValues: enabledModelValues.length > 0 ? enabledModelValues : undefined, mcp: normalizeMcp(parsed.mcp), diff --git a/packages/agents-desktop/src/shared/types.ts b/packages/agents-desktop/src/shared/types.ts index efce0dd4b4..6023e66121 100644 --- a/packages/agents-desktop/src/shared/types.ts +++ b/packages/agents-desktop/src/shared/types.ts @@ -4,6 +4,12 @@ import type { McpServerConfig, RegistrySnapshot, } from '@electric-ax/agents' +import type { + OpenAIRealtimeReasoningEffort, + RealtimeModelChoice, + RealtimeReasoningEffortChoice, + RealtimeVoiceChoice, +} from '@electric-ax/agents-runtime' export type ServerSource = `manual` | `local-discovery` | `electric-cloud` export type ServerDesiredState = `connected` | `disconnected` @@ -122,6 +128,33 @@ export type CodexSettings = { source: CodexAuthSource | null } +export type RealtimeProvider = `openai` + +export type RealtimeSettings = { + provider: RealtimeProvider + model: string + voice: string + reasoningEffort: OpenAIRealtimeReasoningEffort + interruptResponse: boolean +} + +export type RealtimeCredentialStatus = + | `missing` + | `valid` + | `invalid` + | `unknown` + +export type RealtimeSettingsStatus = { + settings: RealtimeSettings + availableModels: Array + availableVoices: Array + availableReasoningEfforts: Array + hasOpenAIApiKey: boolean + openAIApiKeyStatus: RealtimeCredentialStatus + openAIApiKeyError?: string + codexEnabled: boolean +} + export type DesktopSettings = { servers: Array defaultServerId: string | null @@ -131,6 +164,7 @@ export type DesktopSettings = { preventAppSuspension?: boolean codex?: CodexSettings enabledModelValues?: Array + realtime?: RealtimeSettings onboardingDismissed?: boolean mcp?: { servers: Array } seededDefaultMcpServerNames?: Array diff --git a/packages/agents-runtime/package.json 
b/packages/agents-runtime/package.json index a12caa5275..690a17d2f0 100644 --- a/packages/agents-runtime/package.json +++ b/packages/agents-runtime/package.json @@ -110,6 +110,7 @@ "@anthropic-ai/sdk": "^0.78.0", "@durable-streams/client": "^0.2.6", "@durable-streams/state": "^0.3.1", + "@durable-streams/y-durable-streams": "0.2.7", "@electric-ax/agents-mcp": "workspace:*", "@mariozechner/pi-agent-core": "^0.70.2", "@mariozechner/pi-ai": "^0.70.2", @@ -121,11 +122,14 @@ "cron-parser": "^5.5.0", "diff": "^9.0.0", "jsdom": "^28.1.0", + "lib0": "^0.2.99", "pino": "^10.3.1", "pino-pretty": "^13.0.0", "turndown": "^7.2.2", "turndown-plugin-gfm": "^1.0.2", "xstate": "^5.32.0", + "y-protocols": "^1.0.6", + "yjs": "^13.6.26", "zod": "^4.3.6", "zod-to-json-schema": "^3.25.2" }, diff --git a/packages/agents-runtime/src/agents-client.ts b/packages/agents-runtime/src/agents-client.ts index d8995024ac..7c89927838 100644 --- a/packages/agents-runtime/src/agents-client.ts +++ b/packages/agents-runtime/src/agents-client.ts @@ -4,6 +4,10 @@ import { normalizeObservationSchema } from './observation-schema' import { createRuntimeServerClient } from './runtime-server-client' import { appendPathToUrl } from './url' import type { EntitySignal } from './runtime-server-client' +import type { + RealtimeSessionStartResult, + StartRealtimeSessionOptions, +} from './runtime-server-client' import type { EntitiesObservationSource, EntityObservationSource, @@ -32,6 +36,9 @@ export interface AgentsClient { payload?: unknown }) => Promise<{ txid: number }> kill: (entityUrl: string, reason?: string) => Promise<{ txid: number }> + startRealtimeSession: ( + options: StartRealtimeSessionOptions + ) => Promise } export function createAgentsClient(config: AgentsClientConfig): AgentsClient { @@ -45,6 +52,8 @@ export function createAgentsClient(config: AgentsClientConfig): AgentsClient { signal: `SIGKILL`, reason, }), + startRealtimeSession: (options) => + serverClient.startRealtimeSession(options), async 
observe(source) { if (source.sourceType === `entity`) { const info = await serverClient.getEntity( diff --git a/packages/agents-runtime/src/client.ts b/packages/agents-runtime/src/client.ts index 55306cc03a..95b4f3404e 100644 --- a/packages/agents-runtime/src/client.ts +++ b/packages/agents-runtime/src/client.ts @@ -12,6 +12,24 @@ export { normalizeTimelineEntities, TIMELINE_ORDER_FALLBACK, } from './entity-timeline' +export { + DEFAULT_OPENAI_REALTIME_MODEL, + DEFAULT_OPENAI_REALTIME_REASONING_EFFORT, + DEFAULT_OPENAI_REALTIME_VOICE, + OPENAI_REALTIME_MODELS, + OPENAI_REALTIME_REASONING_EFFORTS, + OPENAI_REALTIME_VOICES, + isOpenAIRealtimeModel, + isOpenAIRealtimeReasoningEffort, + isOpenAIRealtimeVoice, +} from './realtime-options' +export type { + OpenAIRealtimeReasoningEffort, + RealtimeModelChoice, + RealtimeProviderId, + RealtimeReasoningEffortChoice, + RealtimeVoiceChoice, +} from './realtime-options' export { canonicalPgSyncOptions, db, @@ -29,7 +47,10 @@ export { export { appendPathToUrl } from './url' export { getEntityAttachmentStreamPath, + getEntityMarkdownDocumentPath, + getEntityMarkdownDocumentUrlPath, manifestAttachmentKey, + manifestMarkdownDocumentKey, } from './manifest-helpers' export { buildSections, buildTimelineEntries } from './use-chat' export { @@ -47,6 +68,7 @@ export { export { isGoalCommandText, parseGoalCommand } from './goal-command' export { formatTokenCount } from './token-budget' export type { GoalCommand } from './goal-command' +export { MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS } from './markdown-document-constants' export type { EntityStreamDB, @@ -60,6 +82,11 @@ export type { SlashCommandTrigger, } from './composer-input' export type { AgentsClient, AgentsClientConfig } from './agents-client' +export type { + RealtimeAudioOptions, + RealtimeSessionStartResult, + StartRealtimeSessionOptions, +} from './runtime-server-client' export type { AttachmentRole, AttachmentStatus, @@ -68,6 +95,7 @@ export type { GoalStatus, Manifest, 
ManifestAttachmentEntry, + ManifestDocumentEntry, ManifestGoalEntry, } from './entity-schema' export type { diff --git a/packages/agents-runtime/src/context-factory.ts b/packages/agents-runtime/src/context-factory.ts index e255096f8c..3e457ff02b 100644 --- a/packages/agents-runtime/src/context-factory.ts +++ b/packages/agents-runtime/src/context-factory.ts @@ -1,4 +1,5 @@ import { queryOnce } from '@durable-streams/state/db' +import { DurableStream } from '@durable-streams/client' import { assembleContext } from './context-assembly' import { createContextEntriesApi } from './context-entries' import { entityStateSchema } from './entity-schema' @@ -11,6 +12,7 @@ import { } from './outbound-bridge' import { createPiAgentAdapter } from './pi-adapter' import { + defaultProjection, timelineMessages as runtimeTimelineMessages, timelineToMessages, } from './timeline-context' @@ -18,6 +20,7 @@ import { getCronStreamPath } from './cron-utils' import { runtimeLog } from './log' import { sliceChars } from './token-budget' import { createContextTools } from './tools/context-tools' +import { appendPathToUrl } from './url' import { CACHE_TIERS } from './types' import { composeToolsWithProviders } from './tool-providers' import { validateSlashCommandDefinitions } from './composer-input' @@ -47,8 +50,17 @@ import type { HandlerWake, LLMMessage, ManifestAttachmentEntry, + ManifestDocumentEntry, + ManifestRealtimeSessionEntry, ObservationHandle, ObservationSource, + RealtimeAudioConfig, + RealtimeAudioFormat, + RealtimeConfig, + RealtimeHandle, + RealtimeProviderEvent, + RealtimeProviderSession, + RealtimeRunResult, RunHandle, SendResult, SharedStateHandle, @@ -58,9 +70,20 @@ import type { UseContextConfig, Wake, WakeEvent, + WakeMessage, WakeSession, } from './types' +const REALTIME_MIN_INPUT_COMMIT_BYTES = 4_800 +const REALTIME_SESSION_SOFT_LIMIT_MS = 55 * 60 * 1000 +const REALTIME_AUDIO_SPAN_MAX_MS = 500 +const REALTIME_PCM16_BYTES_PER_SAMPLE = 2 +const 
REALTIME_DEFAULT_AUDIO_FORMAT: RealtimeAudioFormat = { + codec: `pcm16`, + sampleRate: 24_000, + channels: 1, +} + function agentModelId(model: AgentModel): string { return typeof model === `string` ? model : model.id } @@ -71,6 +94,582 @@ function agentModelProvider(config: AgentConfig): string { : config.model.provider } +function isRealtimeSessionManifest( + entry: unknown +): entry is ManifestRealtimeSessionEntry { + return ( + typeof entry === `object` && + entry !== null && + (entry as { kind?: unknown }).kind === `realtime-session` && + typeof (entry as { id?: unknown }).id === `string` + ) +} + +function realtimeManifestIsActive( + entry: ManifestRealtimeSessionEntry +): boolean { + return entry.status === `requested` || entry.status === `active` +} + +function getToolName(tool: AgentTool): string | null { + const name = (tool as { name?: unknown }).name + return typeof name === `string` ? name : null +} + +function applyRealtimeToolPolicy( + tools: Array, + policy: RealtimeConfig[`toolPolicy`] +): Array { + if (!policy) return tools + const allowed = new Set([...(policy.direct ?? []), ...(policy.confirm ?? 
[])]) + if (allowed.size === 0) return [] + return tools.filter((tool) => { + const name = getToolName(tool) + return name != null && allowed.has(name) + }) +} + +type RealtimeStreamConfig = NonNullable +type RealtimeControlInput = + | { type: `input_text`; text: string } + | { type: `input_audio.commit`; afterAudioBytes?: number } + | { type: `response.cancel` } + | { type: `output_audio.truncate`; itemId: string; audioEndMs: number } + | { type: `session.close`; reason?: string } +type RealtimeStreamIo = { + writeProviderEvent: (event: RealtimeProviderEvent) => Promise + close: () => Promise +} +type RealtimeAudioSpanDraft = { + stream: `input` | `output` + seq: number + producerId: string + producerEpoch: number + byteStart: number + byteEnd: number + sampleStart: number + sampleCount: number + sampleRate: number + channels: number + timingSource: `runtime` | `provider` + createdAt: string + capturedAt?: string + receivedAt?: string + participantId?: string + providerItemId?: string + responseId?: string +} + +function trackRealtimeAppend( + pending: Set>, + append: Promise, + onError: (error: unknown) => void +): void { + let tracked: Promise + tracked = append.catch(onError).finally(() => { + pending.delete(tracked) + }) + pending.add(tracked) +} + +function isRealtimeControlInput(value: unknown): value is RealtimeControlInput { + if (!value || typeof value !== `object`) return false + const type = (value as { type?: unknown }).type + if (type === `output_audio.truncate`) { + return ( + typeof (value as { itemId?: unknown }).itemId === `string` && + typeof (value as { audioEndMs?: unknown }).audioEndMs === `number` + ) + } + if (type === `input_audio.commit`) { + const afterAudioBytes = (value as { afterAudioBytes?: unknown }) + .afterAudioBytes + return ( + afterAudioBytes === undefined || + (typeof afterAudioBytes === `number` && + Number.isFinite(afterAudioBytes) && + afterAudioBytes >= 0) + ) + } + if (type === `input_text`) { + return typeof (value as { 
text?: unknown }).text === `string` + } + return type === `response.cancel` || type === `session.close` +} + +function realtimeDurableStream( + streams: RealtimeStreamConfig, + path: string, + contentType: string +): DurableStream { + return new DurableStream({ + url: appendPathToUrl(streams.baseUrl, path), + headers: streams.headers, + contentType, + batching: true, + }) +} + +function jsonBytes(value: unknown): Uint8Array { + return new TextEncoder().encode(JSON.stringify(value)) +} + +function realtimeControlOutput(event: RealtimeProviderEvent): unknown { + if (event.type !== `output_audio.delta`) return event + return { + type: event.type, + responseId: event.responseId, + itemId: event.itemId, + byteLength: event.audio.byteLength, + } +} + +function useManualRealtimeInputCommits( + audio: RealtimeAudioConfig | undefined +): boolean { + return audio?.turnDetection === false || audio?.turnDetection?.type === `none` +} + +function realtimeByteOffset(byte: number): string { + return `byte:${byte}` +} + +function realtimeAudioFrameBytes(format: RealtimeAudioFormat): number { + return REALTIME_PCM16_BYTES_PER_SAMPLE * format.channels +} + +function realtimeAudioSamples( + byteLength: number, + format: RealtimeAudioFormat +): number { + return Math.floor(byteLength / realtimeAudioFrameBytes(format)) +} + +function createRealtimeStreamIo( + config: HandlerContextConfig, + session: ManifestRealtimeSessionEntry | undefined, + providerSession: RealtimeProviderSession, + audio: RealtimeAudioConfig | undefined +): RealtimeStreamIo | undefined { + if (!config.realtimeStreams || !session) return undefined + + const logPrefix = `[agent-runtime]` + const abort = new AbortController() + const abortFromRun = (): void => abort.abort() + if (config.runSignal?.aborted) { + abort.abort() + } else { + config.runSignal?.addEventListener(`abort`, abortFromRun, { once: true }) + } + + const audioIn = realtimeDurableStream( + config.realtimeStreams, + session.streams.audio_in, + 
`audio/pcm` + ) + const audioOut = realtimeDurableStream( + config.realtimeStreams, + session.streams.audio_out, + `audio/pcm` + ) + const controlIn = realtimeDurableStream( + config.realtimeStreams, + session.streams.control_in, + `application/json` + ) + const controlOut = realtimeDurableStream( + config.realtimeStreams, + session.streams.control_out, + `application/json` + ) + const tasks: Array> = [] + let audioInChunks = 0 + let audioInBytes = 0 + let committedAudioInBytes = 0 + let controlInCommands = 0 + let audioOutChunks = 0 + let audioOutBytes = 0 + let controlOutEvents = 0 + const pendingOutputAppends = new Set>() + const pendingInputCommits: Array<{ afterAudioBytes?: number }> = [] + const pendingAudioChunks: Array<{ + start: number + end: number + data: Uint8Array + }> = [] + const inputAudioFormat = audio?.inputFormat ?? REALTIME_DEFAULT_AUDIO_FORMAT + const outputAudioFormat = audio?.outputFormat ?? REALTIME_DEFAULT_AUDIO_FORMAT + const audioSpanDrafts: Partial< + Record<`input` | `output`, RealtimeAudioSpanDraft> + > = {} + let inputAudioSpanSeq = 0 + let outputAudioSpanSeq = 0 + let processingInputCommits = false + const manualInputCommits = useManualRealtimeInputCommits(audio) + + const trackOutputAppend = (append: Promise, label: string): void => { + trackRealtimeAppend(pendingOutputAppends, append, (error) => { + if (!abort.signal.aborted) { + runtimeLog.warn(logPrefix, `${label}:`, error) + } + }) + } + + const flushAudioSpan = (stream: `input` | `output`): void => { + const draft = audioSpanDrafts[stream] + if (!draft || draft.byteEnd <= draft.byteStart) return + audioSpanDrafts[stream] = undefined + config.writeEvent( + entityStateSchema.realtimeAudioSpans.insert({ + key: `realtime-audio-span:${session.id}:${stream}:${draft.seq}`, + value: { + session_id: session.id, + stream, + producer_id: draft.producerId, + producer_epoch: draft.producerEpoch, + seq: draft.seq, + offset: realtimeByteOffset(draft.byteStart), + next_offset: 
realtimeByteOffset(draft.byteEnd), + byte_start: draft.byteStart, + byte_end: draft.byteEnd, + byte_length: draft.byteEnd - draft.byteStart, + sample_start: draft.sampleStart, + sample_count: draft.sampleCount, + sample_rate: draft.sampleRate, + channels: draft.channels, + codec: `pcm16`, + timing_source: draft.timingSource, + created_at: draft.createdAt, + ...(draft.capturedAt ? { captured_at: draft.capturedAt } : {}), + ...(draft.receivedAt ? { received_at: draft.receivedAt } : {}), + ...(draft.participantId + ? { participant_id: draft.participantId } + : {}), + ...(draft.providerItemId + ? { provider_item_id: draft.providerItemId } + : {}), + ...(draft.responseId ? { response_id: draft.responseId } : {}), + } as never, + }) as ChangeEvent + ) + } + + const appendAudioSpan = (input: { + stream: `input` | `output` + byteStart: number + byteLength: number + format: RealtimeAudioFormat + producerId: string + timingSource: `runtime` | `provider` + capturedAt?: string + receivedAt?: string + participantId?: string + providerItemId?: string + responseId?: string + }): void => { + if (input.byteLength <= 0) return + const frameBytes = realtimeAudioFrameBytes(input.format) + const byteEnd = input.byteStart + input.byteLength + const sampleStart = Math.floor(input.byteStart / frameBytes) + const sampleCount = realtimeAudioSamples(input.byteLength, input.format) + const maxSampleCount = Math.max( + 1, + Math.floor((input.format.sampleRate * REALTIME_AUDIO_SPAN_MAX_MS) / 1000) + ) + const draft = audioSpanDrafts[input.stream] + const compatible = + draft && + draft.producerId === input.producerId && + draft.timingSource === input.timingSource && + draft.participantId === input.participantId && + draft.providerItemId === input.providerItemId && + draft.responseId === input.responseId && + draft.byteEnd === input.byteStart && + draft.sampleRate === input.format.sampleRate && + draft.channels === input.format.channels && + draft.sampleCount + sampleCount <= maxSampleCount + + 
if (compatible) { + draft.byteEnd = byteEnd + draft.sampleCount += sampleCount + draft.receivedAt = input.receivedAt ?? draft.receivedAt + return + } + + flushAudioSpan(input.stream) + const seq = + input.stream === `input` ? inputAudioSpanSeq++ : outputAudioSpanSeq++ + audioSpanDrafts[input.stream] = { + stream: input.stream, + seq, + producerId: input.producerId, + producerEpoch: config.epoch, + byteStart: input.byteStart, + byteEnd, + sampleStart, + sampleCount, + sampleRate: input.format.sampleRate, + channels: input.format.channels, + timingSource: input.timingSource, + createdAt: new Date().toISOString(), + capturedAt: input.capturedAt, + receivedAt: input.receivedAt, + participantId: input.participantId, + providerItemId: input.providerItemId, + responseId: input.responseId, + } + } + + const discardCommittedAudioChunks = (): void => { + while ( + pendingAudioChunks.length > 0 && + pendingAudioChunks[0]!.end <= committedAudioInBytes + ) { + pendingAudioChunks.shift() + } + } + + const appendAudioRangeToProvider = async ( + start: number, + end: number + ): Promise => { + if (!providerSession.appendInputAudio) return + for (const chunk of pendingAudioChunks) { + if (chunk.end <= start) continue + if (chunk.start >= end) break + const sliceStart = Math.max(0, start - chunk.start) + const sliceEnd = Math.min(chunk.data.byteLength, end - chunk.start) + if (sliceEnd <= sliceStart) continue + await providerSession.appendInputAudio( + chunk.data.subarray(sliceStart, sliceEnd) + ) + } + } + + const processPendingInputCommits = async (): Promise => { + if (processingInputCommits) return + processingInputCommits = true + try { + while (pendingInputCommits.length > 0) { + const command = pendingInputCommits[0]! + const commitAudioBytes = command.afterAudioBytes ?? 
audioInBytes + if (audioInBytes < commitAudioBytes) return + + pendingInputCommits.shift() + if (commitAudioBytes <= committedAudioInBytes) { + runtimeLog.info( + logPrefix, + `realtime input_audio.commit ignored session=${session.id} audioInBytes=${audioInBytes} committedAudioInBytes=${committedAudioInBytes} commitAudioBytes=${commitAudioBytes}` + ) + continue + } + + const pendingAudioBytes = commitAudioBytes - committedAudioInBytes + if (pendingAudioBytes < REALTIME_MIN_INPUT_COMMIT_BYTES) { + runtimeLog.info( + logPrefix, + `realtime input_audio.commit skipped session=${session.id} audioInBytes=${audioInBytes} committedAudioInBytes=${committedAudioInBytes} commitAudioBytes=${commitAudioBytes}` + ) + await providerSession.clearInputAudio?.() + committedAudioInBytes = commitAudioBytes + discardCommittedAudioChunks() + continue + } + + await appendAudioRangeToProvider( + committedAudioInBytes, + commitAudioBytes + ) + await providerSession.commitInputAudio?.() + committedAudioInBytes = commitAudioBytes + discardCommittedAudioChunks() + } + } finally { + processingInputCommits = false + } + } + + runtimeLog.info( + logPrefix, + `realtime stream bridge starting session=${session.id} inputMode=${manualInputCommits ? 
`manual-commit` : `provider-vad`} audioIn=${session.streams.audio_in} audioOut=${session.streams.audio_out}` + ) + + if (providerSession.appendInputAudio) { + tasks.push( + (async () => { + const response = await audioIn.stream({ + live: true, + signal: abort.signal, + warnOnHttp: false, + }) + try { + for await (const chunk of response.bodyStream()) { + if (abort.signal.aborted) break + const nextChunkCount = audioInChunks + 1 + if (nextChunkCount === 1) { + runtimeLog.info( + logPrefix, + `realtime audio/in first chunk session=${session.id} bytes=${chunk.byteLength}` + ) + } + const start = audioInBytes + audioInChunks = nextChunkCount + audioInBytes += chunk.byteLength + appendAudioSpan({ + stream: `input`, + byteStart: start, + byteLength: chunk.byteLength, + format: inputAudioFormat, + producerId: session.streams.audio_in, + timingSource: `runtime`, + participantId: `user`, + receivedAt: new Date().toISOString(), + }) + if (manualInputCommits) { + pendingAudioChunks.push({ + start, + end: start + chunk.byteLength, + data: chunk, + }) + await processPendingInputCommits() + } else { + await providerSession.appendInputAudio?.(chunk) + } + } + } finally { + response.cancel() + } + })().catch((error) => { + if (!abort.signal.aborted) { + runtimeLog.warn( + `[agent-runtime] realtime audio/in pump failed:`, + error + ) + } + }) + ) + } + + tasks.push( + (async () => { + const response = await controlIn.stream({ + live: true, + signal: abort.signal, + json: true, + warnOnHttp: false, + }) + try { + for await (const command of response.jsonStream()) { + if (abort.signal.aborted || !isRealtimeControlInput(command)) { + continue + } + controlInCommands += 1 + if (controlInCommands === 1) { + runtimeLog.info( + logPrefix, + `realtime control/in first command session=${session.id} type=${command.type}` + ) + } + switch (command.type) { + case `input_text`: + await providerSession.sendText?.(command.text) + break + case `input_audio.commit`: + if (manualInputCommits) { + 
pendingInputCommits.push({ + afterAudioBytes: command.afterAudioBytes, + }) + await processPendingInputCommits() + } else { + runtimeLog.info( + logPrefix, + `realtime input_audio.commit ignored in provider-vad mode session=${session.id}` + ) + } + break + case `response.cancel`: + await providerSession.cancelResponse?.() + break + case `output_audio.truncate`: + await providerSession.truncateOutputAudio?.({ + itemId: command.itemId, + audioEndMs: command.audioEndMs, + }) + break + case `session.close`: + await providerSession.close?.(command.reason) + abort.abort() + break + } + } + } finally { + response.cancel() + } + })().catch((error) => { + if (!abort.signal.aborted) { + runtimeLog.warn( + `[agent-runtime] realtime control/in pump failed:`, + error + ) + } + }) + ) + + return { + async writeProviderEvent(event) { + controlOutEvents += 1 + if (controlOutEvents === 1) { + runtimeLog.info( + logPrefix, + `realtime provider first event session=${session.id} type=${event.type}` + ) + } + if (event.type === `output_audio.delta`) { + const byteStart = audioOutBytes + audioOutChunks += 1 + audioOutBytes += event.audio.byteLength + if (audioOutChunks === 1) { + runtimeLog.info( + logPrefix, + `realtime audio/out first chunk session=${session.id} bytes=${event.audio.byteLength}` + ) + } + appendAudioSpan({ + stream: `output`, + byteStart, + byteLength: event.audio.byteLength, + format: outputAudioFormat, + producerId: session.streams.audio_out, + timingSource: `provider`, + participantId: `assistant`, + providerItemId: event.itemId, + responseId: event.responseId, + receivedAt: new Date().toISOString(), + }) + trackOutputAppend( + audioOut.append(event.audio), + `realtime audio/out append failed` + ) + } + trackOutputAppend( + controlOut.append(jsonBytes(realtimeControlOutput(event))), + `realtime control/out append failed` + ) + }, + async close() { + abort.abort() + config.runSignal?.removeEventListener(`abort`, abortFromRun) + await Promise.allSettled([...tasks, 
...pendingOutputAppends]) + flushAudioSpan(`input`) + flushAudioSpan(`output`) + runtimeLog.info( + logPrefix, + `realtime stream bridge closed session=${session.id} audioInChunks=${audioInChunks} audioInBytes=${audioInBytes} controlInCommands=${controlInCommands} providerEvents=${controlOutEvents} audioOutChunks=${audioOutChunks} audioOutBytes=${audioOutBytes}` + ) + }, + } +} + const MAX_HYDRATED_IMAGE_ATTACHMENTS = 4 const MAX_HYDRATED_IMAGE_ATTACHMENT_BYTES = 10 * 1024 * 1024 @@ -102,6 +701,18 @@ export interface HandlerContextConfig { }) => void | Promise ) => void hydratedWebhookSourceWake?: HydratedWebhookSourceWake | null + realtimeStreams?: { + baseUrl: string + headers?: Record + } + registerLiveWakeHandler?: ( + handler: (wake: { + wakeEvent: WakeEvent + wakeOffset: string + ackOffset: string + events: Array + }) => boolean | Promise + ) => () => void doObserve: ( source: ObservationSource, wake?: Wake @@ -159,6 +770,10 @@ function asMessageText(value: unknown): string { return typeof value === `string` ? value : JSON.stringify(value ?? 
``) } +function isRecord(value: unknown): value is Record { + return typeof value === `object` && value !== null && !Array.isArray(value) +} + function missingContextToolData(message: string): Promise { return Promise.reject(new Error(message)) } @@ -194,6 +809,126 @@ function getCronScheduleTriggerPayload( return undefined } +function isMarkdownDocumentManifestEntry( + value: unknown +): value is ManifestDocumentEntry { + if (!value || typeof value !== `object`) return false + const entry = value as Partial + return ( + entry.kind === `document` && + typeof entry.id === `string` && + entry.provider === `y-durable-streams` && + typeof entry.docPath === `string` && + typeof entry.streamPath === `string` && + entry.transportMimeType === + `application/vnd.electric-agents.markdown-yjs` && + entry.contentMimeType === `text/markdown` && + entry.yTextName === `markdown` && + typeof entry.title === `string` + ) +} + +function markdownDocumentSourceEntity( + document: ManifestDocumentEntry +): string | undefined { + const meta = document.meta + return meta && typeof meta === `object` && `sourceEntityUrl` in meta + ? typeof meta.sourceEntityUrl === `string` + ? 
meta.sourceEntityUrl + : undefined + : undefined +} + +function markdownDocumentRefsForWakeSource( + db: Pick, + sourceUrl: string +): Array { + const docsById = new Map() + for (const entry of db.collections.manifests.toArray as Array) { + if (!isMarkdownDocumentManifestEntry(entry)) continue + if (markdownDocumentSourceEntity(entry) !== sourceUrl) continue + docsById.set(entry.id, entry) + } + return [...docsById.values()] +} + +function renderMarkdownDocumentWakeRefs( + db: Pick, + wakeEvent: WakeEvent +): string | null { + if (wakeEvent.type !== `wake`) { + return null + } + const documentsByPath = new Map() + for (const source of wakeSourceUrls(wakeEvent)) { + for (const document of markdownDocumentRefsForWakeSource(db, source)) { + documentsByPath.set(document.docPath, document) + } + } + const documents = [...documentsByPath.values()] + if (documents.length === 0) return null + + return [ + `Collaborative markdown documents now available from this entity's manifest:`, + ...documents.map((document) => { + return `- ${document.title} (id: ${document.id})` + }), + `Use read_markdown_doc with the document id to inspect the current content before editing.`, + ].join(`\n`) +} + +function wakeBatchMessages(wakeEvent: WakeEvent): Array { + const payload = wakeEvent.payload + if (!isRecord(payload) || payload.type !== `wake_batch`) { + return [] + } + return Array.isArray(payload.wakes) + ? 
payload.wakes.filter(isRecord).map((wake) => wake as WakeMessage) + : [] +} + +function wakeSourceUrls(wakeEvent: WakeEvent): Array<string> { + const sources = new Set<string>() + if (typeof wakeEvent.source === `string`) { + sources.add(wakeEvent.source) + } + + const payload = wakeEvent.payload + if (isRecord(payload) && payload.type === `wake_batch`) { + if (Array.isArray(payload.sources)) { + for (const source of payload.sources) { + if (typeof source === `string`) sources.add(source) + } + } + for (const wake of wakeBatchMessages(wakeEvent)) { + if (typeof wake.source === `string`) sources.add(wake.source) + } + } + + return [...sources] +} + +function renderWakeBatchText(wakeEvent: WakeEvent): string | null { + const wakes = wakeBatchMessages(wakeEvent) + if (wakes.length === 0) return null + + return [ + `Batched wake notification with ${wakes.length} child/source updates:`, + ...wakes.map((wake, index) => { + const finishedChild = wake.finished_child + if (finishedChild) { + const status = finishedChild.run_status ?? `completed` + const detail = + finishedChild.response ?? + finishedChild.error ?? + asMessageText({ changes: wake.changes }) + return `- ${finishedChild.url} (${status}): ${detail}` + } + return `- ${wake.source ?? `source ${index + 1}`}: ${asMessageText(wake)}` + }), + ].join(`\n`) +} + function getTriggerMessageText( db: Pick, wakeEvent: WakeEvent, @@ -243,7 +978,13 @@ function getTriggerMessageText( } } - return asMessageText({ + const batchText = renderWakeBatchText(wakeEvent) + if (batchText) { + const documentRefs = renderMarkdownDocumentWakeRefs(db, wakeEvent) + return documentRefs ? `${batchText}\n\n${documentRefs}` : batchText + } + + const text = asMessageText({ type: wakeEvent.type, source: wakeEvent.source, payload: wakeEvent.payload, @@ -253,6 +994,8 @@ function getTriggerMessageText( toOffset: wakeEvent.toOffset, eventCount: wakeEvent.eventCount, }) + const documentRefs = renderMarkdownDocumentWakeRefs(db, wakeEvent) + return documentRefs ?
`${text}\n\n${documentRefs}` : text } function combineAbortSignals(a: AbortSignal, b: AbortSignal): AbortSignal { @@ -476,6 +1219,8 @@ export function createHandlerContext( ): HandlerContextResult { let sleepRequested = false let agentConfig: AgentConfig | null = null + let realtimeConfig: RealtimeConfig | null = null + let activeRealtimeProviderSession: RealtimeProviderSession | null = null let useContextConfig: UseContextConfig | null = null let useContextHash = `` let useContextRegistrations = 0 @@ -542,6 +1287,85 @@ export function createHandlerContext( }, } + function realtimeSessions(): Array<ManifestRealtimeSessionEntry> { + const sessions: Array<ManifestRealtimeSessionEntry> = [] + for (const entry of config.db.collections.manifests.toArray) { + if (isRealtimeSessionManifest(entry)) { + sessions.push(entry) + } + } + return sessions.sort((a, b) => a.startedAt.localeCompare(b.startedAt)) + } + + function activeRealtimeSession(): ManifestRealtimeSessionEntry | undefined { + return realtimeSessions().filter(realtimeManifestIsActive).at(-1) + } + + async function updateRealtimeSessionStatus( + session: ManifestRealtimeSessionEntry | undefined, + status: `active` | `closed` | `failed`, + opts: { reason?: string; error?: string } = {} + ): Promise<void> { + if (!session) return + + const key = session.key ?? `realtime-session:${session.id}` + const terminal = status === `closed` || status === `failed` + const endedAt = terminal ? new Date().toISOString() : session.endedAt + const meta = { + ...(session.meta ?? {}), + ...(opts.reason ? { reason: opts.reason } : {}), + ...(opts.error ? { error: opts.error } : {}), + } + + const nextSession: ManifestRealtimeSessionEntry = { + key, + kind: `realtime-session`, + id: session.id, + provider: session.provider, + model: session.model, + ...(session.voice ? { voice: session.voice } : {}), + ...(session.reasoningEffort + ? { reasoningEffort: session.reasoningEffort } + : {}), + ...(typeof session.interruptResponse === `boolean` + ?
{ interruptResponse: session.interruptResponse } + : {}), + status, + startedAt: session.startedAt, + endedAt: endedAt ?? null, + streams: session.streams, + retention: `forever`, + ...(Object.keys(meta).length > 0 ? { meta } : {}), + } + + config.wakeSession.registerManifestEntry(nextSession) + config.writeEvent( + entityStateSchema.realtimeSessions.update({ + key, + value: { + session_id: session.id, + provider: session.provider, + model: session.model, + ...(session.voice ? { voice: session.voice } : {}), + ...(session.reasoningEffort + ? { reasoning_effort: session.reasoningEffort } + : {}), + ...(typeof session.interruptResponse === `boolean` + ? { interrupt_response: session.interruptResponse } + : {}), + status, + started_at: session.startedAt, + ...(endedAt ? { ended_at: endedAt } : {}), + streams: session.streams, + ...(opts.reason ? { reason: opts.reason } : {}), + ...(opts.error ? { error: opts.error } : {}), + ...(Object.keys(meta).length > 0 ? { meta } : {}), + } as never, + }) as ChangeEvent + ) + await config.wakeSession.commitManifestEntries() + } + function structuralHash(nextConfig: UseContextConfig): string { const sources = Object.entries(nextConfig.sources) .sort(([leftName], [rightName]) => leftName.localeCompare(rightName)) @@ -950,6 +1774,849 @@ export function createHandlerContext( }, } + const realtimeHandle: RealtimeHandle = { + async run(): Promise { + if (!realtimeConfig) { + throw new Error( + `[agent-runtime] realtime.run() called without useRealtime().` + ) + } + + if (config.prepareAgentRun) { + await config.prepareAgentRun() + } + + const activeRealtimeConfig = realtimeConfig + const bridge = createOutboundBridge( + await loadOutboundIdSeed(config.db), + config.writeEvent + ) + const startedAt = Date.now() + let textStarted = false + let currentToolCall: + | { toolCallId: string; name: string; args: unknown } + | undefined + const realtimeSession = activeRealtimeSession() + + const endText = (): void => { + if (!textStarted) return 
+ bridge.onTextEnd() + textStarted = false + } + + const emitText = (delta: string): void => { + if (delta.length === 0) return + if (!textStarted) { + bridge.onTextStart() + textStarted = true + } + bridge.onTextDelta(delta) + } + + const transcriptTextByKey = new Map<string, string>() + const transcriptCreatedAtByKey = new Map<string, string>() + const transcriptDeltaSeqByKey = new Map<string, number>() + const transcriptFallbackIds = new Map<`input` | `output`, string>() + const inputTranscriptKeyByTurnId = new Map<string, string>() + const outputTranscriptKeyByResponseId = new Map<string, string>() + const outputTranscriptKeysByResponseId = new Map<string, Array<string>>() + const outputTranscriptSegmentByResponseId = new Map<string, number>() + const outputTranscriptSourceByKey = new Map<string, string>() + let transcriptFallbackCounter = 0 + let pendingInputTranscriptKey: string | undefined + let activeOutputTranscript: + | { key: string; responseId?: string } + | undefined + let providerSessionId = realtimeSession?.id + + const currentTranscriptSessionId = (): string => + realtimeSession?.id ?? providerSessionId ?? `ephemeral` + + const transcriptKey = ( + direction: `input` | `output`, + id?: string + ): string => { + let stableId = id + if (!stableId) { + stableId = transcriptFallbackIds.get(direction) + if (!stableId) { + stableId = `fallback-${transcriptFallbackCounter}` + transcriptFallbackCounter += 1 + transcriptFallbackIds.set(direction, stableId) + } + } + return `realtime-transcript:${currentTranscriptSessionId()}:${direction}:${stableId}` + } + + const inputTranscriptKey = (turnId?: string): string => { + if (turnId) { + const existing = inputTranscriptKeyByTurnId.get(turnId) + if (existing) return existing + if (pendingInputTranscriptKey) { + inputTranscriptKeyByTurnId.set(turnId, pendingInputTranscriptKey) + return pendingInputTranscriptKey + } + const key = transcriptKey(`input`, turnId) + inputTranscriptKeyByTurnId.set(turnId, key) + return key + } + const key = pendingInputTranscriptKey ??
transcriptKey(`input`) + pendingInputTranscriptKey = key + return key + } + + const trackOutputTranscriptKey = ( + responseId: string | undefined, + key: string + ): void => { + activeOutputTranscript = { key, responseId } + if (!responseId) return + const keys = outputTranscriptKeysByResponseId.get(responseId) ?? [] + if (!keys.includes(key)) { + keys.push(key) + outputTranscriptKeysByResponseId.set(responseId, keys) + } + } + + const outputTranscriptKey = (responseId?: string): string => { + if (responseId) { + const existing = outputTranscriptKeyByResponseId.get(responseId) + if (existing) return existing + const key = transcriptKey(`output`, responseId) + outputTranscriptKeyByResponseId.set(responseId, key) + trackOutputTranscriptKey(responseId, key) + return key + } + const key = activeOutputTranscript?.responseId + ? transcriptKey(`output`) + : (activeOutputTranscript?.key ?? transcriptKey(`output`)) + trackOutputTranscriptKey(undefined, key) + return key + } + + const rotateActiveOutputTranscript = (): void => { + const active = activeOutputTranscript + if (!active) return + const text = transcriptTextByKey.get(active.key) ?? `` + if (text.length === 0) return + + if (active.responseId) { + const nextSegment = + (outputTranscriptSegmentByResponseId.get(active.responseId) ?? 
0) + + 1 + outputTranscriptSegmentByResponseId.set( + active.responseId, + nextSegment + ) + const key = transcriptKey( + `output`, + `${active.responseId}:segment-${nextSegment}` + ) + outputTranscriptKeyByResponseId.set(active.responseId, key) + trackOutputTranscriptKey(active.responseId, key) + return + } + + transcriptFallbackIds.delete(`output`) + activeOutputTranscript = undefined + } + + const outputTranscriptSourceRank = (source: string): number => { + switch (source) { + case `response.output_audio_transcript`: + return 3 + case `response.audio_transcript`: + return 2 + case `response.output_text`: + return 1 + default: + return 0 + } + } + + const outputTranscriptSourceKey = (input: { + responseId?: string + itemId?: string + contentIndex?: number + }): string | undefined => { + if (input.responseId) { + return `${input.responseId}:${input.itemId ?? ``}:${input.contentIndex ?? 0}` + } + if (input.itemId) { + return `${input.itemId}:${input.contentIndex ?? 0}` + } + return undefined + } + + const resetOutputTranscriptText = ( + responseId: string | undefined + ): void => { + const keys = responseId + ? (outputTranscriptKeysByResponseId.get(responseId) ?? []) + : activeOutputTranscript + ? 
[activeOutputTranscript.key] + : [] + for (const key of keys) { + transcriptTextByKey.set(key, ``) + deleteRealtimeTranscriptDeltas(key) + } + } + + const shouldUseOutputTranscriptSource = (input: { + responseId?: string + itemId?: string + contentIndex?: number + transcriptSource?: string + }): boolean => { + if (!input.transcriptSource) return true + const key = outputTranscriptSourceKey(input) + if (!key) return true + const existing = outputTranscriptSourceByKey.get(key) + if (!existing) { + outputTranscriptSourceByKey.set(key, input.transcriptSource) + return true + } + if (existing === input.transcriptSource) return true + if ( + outputTranscriptSourceRank(input.transcriptSource) > + outputTranscriptSourceRank(existing) + ) { + outputTranscriptSourceByKey.set(key, input.transcriptSource) + resetOutputTranscriptText(input.responseId) + return true + } + return false + } + + const writeRealtimeTranscript = (input: { + direction: `input` | `output` + key: string + text: string + status: `partial` | `final` + turnId?: string + responseId?: string + allowEmpty?: boolean + }): void => { + const collection = config.db.collections.realtimeTranscripts + if ( + input.text.length === 0 && + !input.allowEmpty && + !collection.has(input.key) + ) { + return + } + + const existing = collection.get(input.key) as + | { created_at?: string } + | undefined + const createdAt = + transcriptCreatedAtByKey.get(input.key) ?? + existing?.created_at ?? + new Date().toISOString() + transcriptCreatedAtByKey.set(input.key, createdAt) + + const value = { + session_id: currentTranscriptSessionId(), + direction: input.direction, + text: input.text, + status: input.status, + audio_stream: input.direction, + ...(input.turnId ? { turn_id: input.turnId } : {}), + ...(input.responseId ? { response_id: input.responseId } : {}), + created_at: createdAt, + } + config.writeEvent( + (collection.has(input.key) + ? 
entityStateSchema.realtimeTranscripts.update({ + key: input.key, + value: value as never, + }) + : entityStateSchema.realtimeTranscripts.insert({ + key: input.key, + value: value as never, + })) as ChangeEvent + ) + + emitRealtimeTranscript(input) + } + + const emitRealtimeTranscript = (input: { + direction: `input` | `output` + key: string + text: string + status: `partial` | `final` + turnId?: string + responseId?: string + }): void => { + const onTranscript = activeRealtimeConfig.onTranscript + if (!onTranscript) return + void Promise.resolve( + onTranscript({ + key: input.key, + sessionId: currentTranscriptSessionId(), + direction: input.direction, + text: input.text, + status: input.status, + ...(input.turnId ? { turnId: input.turnId } : {}), + ...(input.responseId ? { responseId: input.responseId } : {}), + }) + ).catch((error) => { + runtimeLog.warn( + `[agent-runtime]`, + `realtime transcript callback failed:`, + error + ) + }) + } + + const writeRealtimeTranscriptDelta = (input: { + key: string + delta: string + }): void => { + if (input.delta.length === 0) return + const seq = transcriptDeltaSeqByKey.get(input.key) ?? 0 + transcriptDeltaSeqByKey.set(input.key, seq + 1) + config.writeEvent( + entityStateSchema.textDeltas.insert({ + key: `${input.key}:delta-${seq}`, + value: { + text_id: input.key, + realtime_transcript_id: input.key, + delta: input.delta, + } as never, + }) as ChangeEvent + ) + } + + const deleteRealtimeTranscriptDeltas = (key: string): void => { + const deltaCount = transcriptDeltaSeqByKey.get(key) ?? 0 + for (let index = 0; index < deltaCount; index += 1) { + config.writeEvent( + entityStateSchema.textDeltas.delete({ + key: `${key}:delta-${index}`, + }) as ChangeEvent + ) + } + transcriptDeltaSeqByKey.set(key, 0) + } + + const reconcileRealtimeTranscriptDeltas = ( + key: string, + finalText: string + ): void => { + const currentText = transcriptTextByKey.get(key) ?? 
`` + if (finalText === currentText) return + if (finalText.startsWith(currentText)) { + writeRealtimeTranscriptDelta({ + key, + delta: finalText.slice(currentText.length), + }) + return + } + deleteRealtimeTranscriptDeltas(key) + writeRealtimeTranscriptDelta({ key, delta: finalText }) + } + + const beginRealtimeTranscript = (input: { + direction: `input` | `output` + turnId?: string + responseId?: string + }): void => { + const key = + input.direction === `input` + ? inputTranscriptKey(input.turnId) + : outputTranscriptKey(input.responseId) + const existing = config.db.collections.realtimeTranscripts.get(key) as + | { text?: string } + | undefined + const text = transcriptTextByKey.get(key) ?? existing?.text ?? `` + transcriptTextByKey.set(key, text) + writeRealtimeTranscript({ + direction: input.direction, + key, + text, + status: `partial`, + turnId: input.turnId, + responseId: input.responseId, + allowEmpty: true, + }) + } + + const appendRealtimeTranscript = (input: { + direction: `input` | `output` + delta: string + turnId?: string + responseId?: string + itemId?: string + contentIndex?: number + transcriptSource?: string + }): void => { + if (input.delta.length === 0) return + if ( + input.direction === `output` && + !shouldUseOutputTranscriptSource(input) + ) { + return + } + const key = + input.direction === `input` + ? inputTranscriptKey(input.turnId) + : outputTranscriptKey(input.responseId) + const text = `${transcriptTextByKey.get(key) ?? 
``}${input.delta}` + transcriptTextByKey.set(key, text) + if (!config.db.collections.realtimeTranscripts.has(key)) { + writeRealtimeTranscript({ + direction: input.direction, + key, + text: ``, + status: `partial`, + turnId: input.turnId, + responseId: input.responseId, + allowEmpty: true, + }) + } + writeRealtimeTranscriptDelta({ key, delta: input.delta }) + emitRealtimeTranscript({ + direction: input.direction, + key, + text, + status: `partial`, + turnId: input.turnId, + responseId: input.responseId, + }) + } + + const completeRealtimeTranscript = (input: { + direction: `input` | `output` + text?: string + turnId?: string + responseId?: string + }): void => { + const key = + input.direction === `input` + ? inputTranscriptKey(input.turnId) + : outputTranscriptKey(input.responseId) + const text = input.text ?? transcriptTextByKey.get(key) ?? `` + reconcileRealtimeTranscriptDeltas(key, text) + transcriptTextByKey.set(key, text) + writeRealtimeTranscript({ + direction: input.direction, + key, + text, + status: `final`, + turnId: input.turnId, + responseId: input.responseId, + }) + if ( + (input.direction === `input` && !input.turnId) || + (input.direction === `output` && !input.responseId) + ) { + transcriptFallbackIds.delete(input.direction) + } + if (input.direction === `input` && pendingInputTranscriptKey === key) { + pendingInputTranscriptKey = undefined + if (input.turnId) { + transcriptFallbackIds.delete(`input`) + } + } + } + + const completeOutputTranscript = (input: { + text?: string + responseId?: string + itemId?: string + contentIndex?: number + transcriptSource?: string + }): void => { + if (!shouldUseOutputTranscriptSource(input)) return + const existingKeys = input.responseId + ? outputTranscriptKeysByResponseId.get(input.responseId) + : activeOutputTranscript + ? [activeOutputTranscript.key] + : undefined + const keys = + existingKeys && existingKeys.length > 0 + ? 
existingKeys + : [outputTranscriptKey(input.responseId)] + + for (const [index, key] of keys.entries()) { + const existing = config.db.collections.realtimeTranscripts.get( + key + ) as { text?: string } | undefined + const text = + keys.length === 1 && input.text !== undefined + ? input.text + : (transcriptTextByKey.get(key) ?? + existing?.text ?? + (index === keys.length - 1 ? (input.text ?? ``) : ``)) + reconcileRealtimeTranscriptDeltas(key, text) + transcriptTextByKey.set(key, text) + writeRealtimeTranscript({ + direction: `output`, + key, + text, + status: `final`, + responseId: input.responseId, + }) + } + + if (!input.responseId) { + transcriptFallbackIds.delete(`output`) + } + if ( + activeOutputTranscript && + activeOutputTranscript.responseId === input.responseId + ) { + activeOutputTranscript = undefined + } + } + + const composedTools = (await composeToolsWithProviders( + activeRealtimeConfig.tools ?? [] + )) as Array + const providerTools = applyRealtimeToolPolicy( + composedTools, + activeRealtimeConfig.toolPolicy + ) + const activeRealtimeSessionId = realtimeSession?.id + let realtimeCloseReason: string | undefined + const messages = + activeRealtimeConfig.context?.includeTimeline === false + ? 
[] + : await hydrateAttachmentBlocks( + runtimeTimelineMessages(config.db, { + projection: (item) => { + if ( + item.kind === `realtime_transcript` && + item.sessionId === activeRealtimeSessionId + ) { + return null + } + return defaultProjection(item) + }, + }).map(({ at: _at, ...message }) => message as LLMMessage) + ) + let realtimeIo: RealtimeStreamIo | undefined + let realtimeSessionTerminalWritten = false + let realtimeSessionLimitTimer: ReturnType<typeof setTimeout> | undefined + let unregisterLiveWakeHandler: (() => void) | undefined + let checkpointQueue: Promise<void> = Promise.resolve() + + const checkpointRealtimeRun = async (): Promise<void> => { + checkpointQueue = checkpointQueue.then(async () => { + await config.prepareAgentRun?.() + }) + await checkpointQueue + } + + const liveWakeText = (wake: { + wakeEvent: WakeEvent + wakeOffset: string + events: Array + }): string => { + const text = getTriggerMessageText( + config.db, + wake.wakeEvent, + wake.events, + wake.wakeOffset + ) + if (wake.wakeEvent.type === `inbox`) { + return text + } + return [ + `A live Electric Agents notification arrived while this realtime session is active.`, + `Treat it as fresh background context, not as user speech. If it changes what the user should know, respond briefly.`, + text, + ].join(`\n\n`) + } + + async function handleProviderEvent( + event: RealtimeProviderEvent + ): Promise<void> { + switch (event.type) { + case `session.started`: + providerSessionId = + realtimeSession?.id ?? event.sessionId ??
providerSessionId + break + + case `session.updated`: + case `output_audio.delta`: + case `output_audio.completed`: + case `response.started`: + case `response.cancelled`: + break + + case `input_audio.speech_started`: + rotateActiveOutputTranscript() + beginRealtimeTranscript({ + direction: `input`, + turnId: event.turnId, + }) + break + + case `input_audio.speech_stopped`: + if (event.turnId || pendingInputTranscriptKey) { + beginRealtimeTranscript({ + direction: `input`, + turnId: event.turnId, + }) + } + break + + case `input_audio.committed`: + beginRealtimeTranscript({ + direction: `input`, + turnId: event.turnId, + }) + break + + case `input_transcript.delta`: + appendRealtimeTranscript({ + direction: `input`, + delta: event.delta, + turnId: event.turnId, + }) + break + + case `input_transcript.completed`: + completeRealtimeTranscript({ + direction: `input`, + text: event.text, + turnId: event.turnId, + }) + break + + case `session.closed`: + realtimeCloseReason = event.reason + endText() + break + + case `response.completed`: + endText() + break + + case `session.error`: + if (event.code === `response_cancel_not_active`) { + runtimeLog.warn( + `[agent-runtime]`, + `realtime provider ignored inactive response cancellation: ${event.error}` + ) + break + } + if ( + event.code === `invalid_value` && + event.error.includes(`Audio content`) && + event.error.includes(`already shorter than`) + ) { + runtimeLog.warn( + `[agent-runtime]`, + `realtime provider ignored stale output audio truncate: ${event.error}` + ) + break + } + throw new Error( + `[agent-runtime] realtime provider error${event.code ? 
` ${event.code}` : ``}: ${event.error}` + ) + + case `output_transcript.delta`: + appendRealtimeTranscript({ + direction: `output`, + delta: event.delta, + responseId: event.responseId, + itemId: event.itemId, + contentIndex: event.contentIndex, + transcriptSource: event.transcriptSource, + }) + break + + case `output_transcript.completed`: + completeOutputTranscript({ + text: event.text, + responseId: event.responseId, + itemId: event.itemId, + contentIndex: event.contentIndex, + transcriptSource: event.transcriptSource, + }) + break + + case `tool_call.started`: + currentToolCall = { + toolCallId: event.toolCallId, + name: event.name, + args: event.args, + } + if (event.args !== undefined) { + bridge.onToolCallStart(event.toolCallId, event.name, event.args) + } + break + + case `tool_call.arguments_delta`: + break + + case `tool_call.arguments_completed`: + currentToolCall = { + toolCallId: event.toolCallId, + name: event.name, + args: event.args, + } + bridge.onToolCallStart(event.toolCallId, event.name, event.args) + break + + case `tool_call.completed`: { + if (currentToolCall?.toolCallId !== event.toolCallId) { + bridge.onToolCallStart(event.toolCallId, event.name, {}) + } + bridge.onToolCallEnd( + event.toolCallId, + event.name, + event.result, + event.isError ?? false + ) + await checkpointRealtimeRun() + break + } + } + } + + try { + bridge.onRunStart() + bridge.onStepStart({ + modelProvider: activeRealtimeConfig.provider.id, + modelId: activeRealtimeConfig.provider.model, + }) + + if (activeRealtimeConfig.testResponses) { + const messageText = getTriggerMessageText( + config.db, + config.wakeEvent, + config.events, + config.wakeOffset, + config.hydratedWebhookSourceWake + ) + const responses = activeRealtimeConfig.testResponses + if (Array.isArray(responses)) { + const priorRunCount = ( + await queryOnce((q) => + q.from({ runs: config.db.collections.runs }) + ) + ).length + emitText( + responses[priorRunCount % Math.max(responses.length, 1)] ?? 
`` + ) + } else { + const response = await responses(messageText, bridge) + if (response !== undefined) emitText(response) + } + endText() + } else { + activeRealtimeProviderSession = + await activeRealtimeConfig.provider.connect({ + systemPrompt: activeRealtimeConfig.systemPrompt, + messages, + tools: providerTools, + audio: activeRealtimeConfig.audio, + session: realtimeSession, + signal: config.runSignal, + }) + realtimeSessionLimitTimer = setTimeout(() => { + runtimeLog.info( + `[agent-runtime]`, + `realtime session soft limit reached session=${realtimeSession?.id ?? `ephemeral`}` + ) + void activeRealtimeProviderSession?.close?.( + `session-duration-limit` + ) + }, REALTIME_SESSION_SOFT_LIMIT_MS) + await updateRealtimeSessionStatus(realtimeSession, `active`) + realtimeIo = createRealtimeStreamIo( + config, + realtimeSession, + activeRealtimeProviderSession, + activeRealtimeConfig.audio + ) + unregisterLiveWakeHandler = config.registerLiveWakeHandler?.( + async (wake) => { + if (config.runSignal?.aborted) { + return false + } + const providerSession = activeRealtimeProviderSession + if (!providerSession?.sendText) { + return false + } + await checkpointRealtimeRun() + await providerSession.sendText(liveWakeText(wake)) + await checkpointRealtimeRun() + return true + } + ) + + for await (const event of activeRealtimeProviderSession.events) { + if (config.runSignal?.aborted) { + break + } + await realtimeIo?.writeProviderEvent(event) + await handleProviderEvent(event) + } + } + + endText() + await updateRealtimeSessionStatus(realtimeSession, `closed`, { + reason: config.runSignal?.aborted + ? `aborted` + : (realtimeCloseReason ?? `completed`), + }) + realtimeSessionTerminalWritten = true + bridge.onStepEnd({ + finishReason: config.runSignal?.aborted ? `aborted` : `stop`, + durationMs: Date.now() - startedAt, + }) + bridge.onRunEnd({ + finishReason: config.runSignal?.aborted ? 
`aborted` : `stop`, + }) + } catch (error) { + endText() + if (!realtimeSessionTerminalWritten) { + await updateRealtimeSessionStatus(realtimeSession, `failed`, { + error: error instanceof Error ? error.message : String(error), + }) + realtimeSessionTerminalWritten = true + } + bridge.onStepEnd({ + finishReason: `error`, + durationMs: Date.now() - startedAt, + }) + bridge.onRunEnd({ finishReason: `error` }) + throw error + } finally { + unregisterLiveWakeHandler?.() + await checkpointQueue.catch(() => undefined) + if (realtimeSessionLimitTimer) { + clearTimeout(realtimeSessionLimitTimer) + } + await realtimeIo?.close() + activeRealtimeProviderSession = null + } + + return { + writes: [], + toolCalls: [], + usage: { tokens: 0, duration: Date.now() - startedAt }, + } + }, + async close(reason?: string): Promise<void> { + await activeRealtimeProviderSession?.close?.(reason) + }, + async stop(reason?: string): Promise<void> { + await this.close(reason) + }, + async cancelResponse(): Promise<void> { + await activeRealtimeProviderSession?.cancelResponse?.() + }, + async sendText(text: string): Promise<void> { + await activeRealtimeProviderSession?.sendText?.(text) + }, + } + const ctx: DebugHandlerContext = { firstWake: config.firstWake, wake: toHandlerWake(config.wakeEvent), @@ -970,6 +2637,10 @@ export function createHandlerContext( agentConfig = cfg return agent }, + useRealtime(cfg) { + realtimeConfig = cfg + return realtimeHandle + }, useContext(nextConfig) { assertValidUseContextConfig(nextConfig) const hash = structuralHash(nextConfig) @@ -995,6 +2666,10 @@ export function createHandlerContext( useContextRegistrations: () => useContextRegistrations, }, agent, + realtime: { + activeSession: activeRealtimeSession, + sessions: realtimeSessions, + }, observe: ((source: ObservationSource, opts?: { wake?: Wake }) => { return config.doObserve(source, opts?.wake) as Promise< ObservationHandle & EntityHandle & SharedStateHandle diff --git a/packages/agents-runtime/src/create-handler.ts
b/packages/agents-runtime/src/create-handler.ts index 1aab7893ed..aae4a12e62 100644 --- a/packages/agents-runtime/src/create-handler.ts +++ b/packages/agents-runtime/src/create-handler.ts @@ -20,7 +20,10 @@ import type { AnyEntityDefinition, EntityStreamDBWithActions, HeadersProvider, + ManifestDocumentEntry, + MarkdownDocumentConnection, ProcessWakeConfig, + RuntimePrincipal, WakeNotification, WebhookNotification, } from './types' @@ -73,6 +76,7 @@ export interface RuntimeRouterConfig { createElectricTools?: (context: { entityUrl: string entityType: string + principal?: RuntimePrincipal args: Readonly> db: EntityStreamDBWithActions events: Array @@ -99,6 +103,27 @@ export interface RuntimeRouterConfig { unsubscribeFromWebhookSource: (opts: { id: string }) => Promise<{ txid: string }> + createMarkdownDocument: (opts: { + id?: string + title: string + meta?: Record + }) => Promise<{ txid: string; document: ManifestDocumentEntry }> + getMarkdownDocumentConnection: ( + streamPath: string + ) => Promise + readMarkdownDocumentStream: ( + streamPath: string, + opts?: { offset?: string } + ) => Promise<{ bytes: Uint8Array; offset?: string }> + appendMarkdownDocumentUpdate: ( + streamPath: string, + update: Uint8Array + ) => Promise<{ offset?: string }> + appendMarkdownDocumentAwareness: ( + streamPath: string, + update: Uint8Array + ) => Promise<{ offset?: string }> + registerCleanup: (cleanup: () => void | Promise) => void }) => Array | Promise> /** * Optional observer for background wake failures. 
Return true to mark the diff --git a/packages/agents-runtime/src/entity-schema.ts b/packages/agents-runtime/src/entity-schema.ts index d94b99eac9..10645e5c43 100644 --- a/packages/agents-runtime/src/entity-schema.ts +++ b/packages/agents-runtime/src/entity-schema.ts @@ -11,6 +11,7 @@ import type { } from '@standard-schema/spec' import type { SlashCommandRow } from './composer-input' import type { JsonValue } from './types' +import type { OpenAIRealtimeReasoningEffort } from './realtime-options' // ============================================================================ // Passthrough Schema Utility @@ -174,7 +175,8 @@ type TextValue = { type TextDeltaValue = { key?: string text_id: string - run_id: string + run_id?: string + realtime_transcript_id?: string delta: string } type ToolCallValue = { @@ -182,12 +184,31 @@ type ToolCallValue = { run_id?: string tool_call_id?: string tool_name: string - status: `started` | `args_complete` | `executing` | `completed` | `failed` + status: + | `started` + | `args_streaming` + | `args_complete` + | `executing` + | `completed` + | `failed` args?: unknown + args_preview?: unknown result?: unknown error?: string duration_ms?: number } +// Tool argument deltas intentionally mirror text deltas: every streamed chunk is +// retained for replay/inspection, while the final parsed args are still stored +// on the tool_call row for the normal result lifecycle. 
+type ToolArgDeltaValue = { + key?: string + tool_call_key: string + tool_call_id?: string + run_id?: string + seq: number + delta: string + content_index?: number +} type ReasoningValue = { key?: string run_id?: string @@ -339,6 +360,23 @@ type ManifestAttachmentEntryValue = { error?: string meta?: Record } +type ManifestDocumentEntryValue = { + key?: string + kind: `document` + id: string + provider: `y-durable-streams` + docId: string + docPath: string + streamPath: string + transportMimeType: `application/vnd.electric-agents.markdown-yjs` + contentMimeType: `text/markdown` + yTextName: `markdown` + title: string + createdAt: string + createdBy?: string + updatedAt?: string + meta?: Record +} type ContextEntryAttrsValue = Record type ManifestContextEntryValue = { key?: string @@ -394,6 +432,91 @@ type ManifestGoalEntryValue = { createdAt: string updatedAt: string } +type RealtimeSessionStatusValue = + | `requested` + | `active` + | `closing` + | `closed` + | `failed` +type RealtimeSessionStreamRefsValue = { + audio_in: string + audio_out: string + control_in: string + control_out: string +} +type ManifestRealtimeSessionEntryValue = { + key?: string + kind: `realtime-session` + id: string + provider: string + model: string + voice?: string + reasoningEffort?: OpenAIRealtimeReasoningEffort + interruptResponse?: boolean + status: RealtimeSessionStatusValue + startedAt: string + endedAt?: string | null + streams: RealtimeSessionStreamRefsValue + retention: `forever` + meta?: Record +} +type RealtimeSessionValue = { + key?: string + session_id: string + provider: string + model: string + voice?: string + reasoning_effort?: OpenAIRealtimeReasoningEffort + interrupt_response?: boolean + status: RealtimeSessionStatusValue + started_at: string + ended_at?: string + streams: RealtimeSessionStreamRefsValue + reason?: string + error?: string + meta?: Record +} +type RealtimeAudioSpanValue = { + key?: string + session_id: string + stream: `input` | `output` + producer_id: 
string + producer_epoch: number + seq: number + offset: string + next_offset?: string + byte_start?: number + byte_end?: number + byte_length: number + sample_start: number + sample_count: number + sample_rate: number + channels: number + codec: `pcm16` + timing_source: `client` | `runtime` | `provider` + captured_at?: string + received_at?: string + participant_id?: string + turn_id?: string + provider_item_id?: string + response_id?: string + created_at: string +} +type RealtimeTranscriptValue = { + key?: string + session_id: string + direction: `input` | `output` + text: string + status: `partial` | `final` + turn_id?: string + response_id?: string + audio_stream?: `input` | `output` + audio_offset?: string + audio_next_offset?: string + sample_start?: number + sample_end?: number + created_at: string +} type ReplayWatermarkValue = { key?: string source_id: string @@ -539,7 +662,8 @@ function createTextDeltaSchema(): Schema { key: z.string().optional(), ...timelineOrderField, text_id: z.string(), - run_id: z.string(), + run_id: z.string().optional(), + realtime_transcript_id: z.string().optional(), delta: z.string(), }) } @@ -553,18 +677,33 @@ function createToolCallSchema(): Schema { tool_name: z.string(), status: z.enum([ `started`, + `args_streaming`, `args_complete`, `executing`, `completed`, `failed`, ]), args: z.unknown().optional(), + args_preview: z.unknown().optional(), result: z.unknown().optional(), error: z.string().optional(), duration_ms: z.number().int().optional(), }) } +function createToolArgDeltaSchema(): Schema { + return z.object({ + key: z.string().optional(), + ...timelineOrderField, + tool_call_key: z.string(), + tool_call_id: z.string().optional(), + run_id: z.string().optional(), + seq: z.number().int(), + delta: z.string(), + content_index: z.number().int().optional(), + }) +} + function createReasoningSchema(): Schema { return z.object({ key: z.string().optional(), @@ -771,16 +910,32 @@ function createContextRemovedSchema(): Schema { 
     timestamp: z.string(),
   })
 }
+
+function createRealtimeSessionStreamRefsSchema(): Schema {
+  return z.object({
+    audio_in: z.string(),
+    audio_out: z.string(),
+    control_in: z.string(),
+    control_out: z.string(),
+  })
+}
+
+function createRealtimeSessionStatusSchema() {
+  return z.enum([`requested`, `active`, `closing`, `closed`, `failed`])
+}
+
 function createManifestSchema(): Schema<
   | ManifestChildEntryValue
   | ManifestSourceEntryValue
   | ManifestSharedStateEntryValue
   | ManifestEffectEntryValue
   | ManifestAttachmentEntryValue
+  | ManifestDocumentEntryValue
   | ManifestContextEntryValue
   | ManifestCronScheduleEntryValue
   | ManifestFutureSendScheduleEntryValue
   | ManifestGoalEntryValue
+  | ManifestRealtimeSessionEntryValue
 > {
   return z.union([
     z.object({
@@ -843,6 +998,26 @@ function createManifestSchema(): Schema<
       error: z.string().optional(),
       meta: createAttachmentMetaSchema().optional(),
     }),
+    z.object({
+      key: z.string().optional(),
+      ...timelineOrderField,
+      kind: z.literal(`document`),
+      id: z.string(),
+      provider: z.literal(`y-durable-streams`),
+      docId: z.string(),
+      docPath: z.string(),
+      streamPath: z.string(),
+      transportMimeType: z.literal(
+        `application/vnd.electric-agents.markdown-yjs`
+      ),
+      contentMimeType: z.literal(`text/markdown`),
+      yTextName: z.literal(`markdown`),
+      title: z.string(),
+      createdAt: z.string(),
+      createdBy: z.string().optional(),
+      updatedAt: z.string().optional(),
+      meta: createAttachmentMetaSchema().optional(),
+    }),
     z.object({
       key: z.string().optional(),
       ...timelineOrderField,
@@ -896,19 +1071,107 @@ function createManifestSchema(): Schema<
       createdAt: z.string(),
       updatedAt: z.string(),
     }),
+    z.object({
+      key: z.string().optional(),
+      ...timelineOrderField,
+      kind: z.literal(`realtime-session`),
+      id: z.string(),
+      provider: z.string(),
+      model: z.string(),
+      voice: z.string().optional(),
+      reasoningEffort: z.enum([`low`, `medium`, `high`]).optional(),
+      interruptResponse: z.boolean().optional(),
+      status:
createRealtimeSessionStatusSchema(),
+      startedAt: z.string(),
+      endedAt: z.string().nullable().optional(),
+      streams: createRealtimeSessionStreamRefsSchema(),
+      retention: z.literal(`forever`).default(`forever`),
+      meta: createJsonObjectSchema().optional(),
+    }),
   ]) as unknown as Schema<
     | ManifestChildEntryValue
     | ManifestSourceEntryValue
     | ManifestSharedStateEntryValue
     | ManifestEffectEntryValue
     | ManifestAttachmentEntryValue
+    | ManifestDocumentEntryValue
     | ManifestContextEntryValue
     | ManifestCronScheduleEntryValue
     | ManifestFutureSendScheduleEntryValue
     | ManifestGoalEntryValue
+    | ManifestRealtimeSessionEntryValue
   >
 }

+function createRealtimeSessionSchema(): Schema {
+  return z.object({
+    key: z.string().optional(),
+    ...timelineOrderField,
+    session_id: z.string(),
+    provider: z.string(),
+    model: z.string(),
+    voice: z.string().optional(),
+    reasoning_effort: z.enum([`low`, `medium`, `high`]).optional(),
+    interrupt_response: z.boolean().optional(),
+    status: createRealtimeSessionStatusSchema(),
+    started_at: z.string(),
+    ended_at: z.string().optional(),
+    streams: createRealtimeSessionStreamRefsSchema(),
+    reason: z.string().optional(),
+    error: z.string().optional(),
+    meta: createJsonObjectSchema().optional(),
+  })
+}
+
+function createRealtimeAudioSpanSchema(): Schema {
+  return z.object({
+    key: z.string().optional(),
+    ...timelineOrderField,
+    session_id: z.string(),
+    stream: z.enum([`input`, `output`]),
+    producer_id: z.string(),
+    producer_epoch: z.number().int().nonnegative(),
+    seq: z.number().int().nonnegative(),
+    offset: z.string(),
+    next_offset: z.string().optional(),
+    byte_start: z.number().int().nonnegative().optional(),
+    byte_end: z.number().int().nonnegative().optional(),
+    byte_length: z.number().int().nonnegative(),
+    sample_start: z.number().int().nonnegative(),
+    sample_count: z.number().int().nonnegative(),
+    sample_rate: z.number().int().positive(),
+    channels: z.number().int().positive(),
+    codec: z.literal(`pcm16`),
+    timing_source: z.enum([`client`, `runtime`, `provider`]),
+    captured_at: z.string().optional(),
+    received_at: z.string().optional(),
+    participant_id: z.string().optional(),
+    turn_id: z.string().optional(),
+    provider_item_id: z.string().optional(),
+    response_id: z.string().optional(),
+    created_at: z.string(),
+  })
+}
+
+function createRealtimeTranscriptSchema(): Schema {
+  return z.object({
+    key: z.string().optional(),
+    ...timelineOrderField,
+    session_id: z.string(),
+    direction: z.enum([`input`, `output`]),
+    text: z.string(),
+    status: z.enum([`partial`, `final`]),
+    turn_id: z.string().optional(),
+    response_id: z.string().optional(),
+    audio_stream: z.enum([`input`, `output`]).optional(),
+    audio_offset: z.string().optional(),
+    audio_next_offset: z.string().optional(),
+    sample_start: z.number().int().nonnegative().optional(),
+    sample_end: z.number().int().nonnegative().optional(),
+    created_at: z.string(),
+  })
+}
+
 function createReplayWatermarkSchema(): Schema {
   return z.object({
     key: z.string().optional(),
@@ -927,6 +1190,7 @@ export type Step = SequencedPersistedRow
 export type Text = SequencedPersistedRow
 export type TextDelta = SequencedPersistedRow
 export type ToolCall = SequencedPersistedRow
+export type ToolArgDelta = SequencedPersistedRow
 export type Reasoning = SequencedPersistedRow
 export type ReasoningDelta = SequencedPersistedRow
 export type ErrorEvent = SequencedPersistedRow
@@ -955,6 +1219,8 @@ export type AttachmentRole = AttachmentRoleValue
 export type AttachmentSubject = AttachmentSubjectValue
 export type ManifestAttachmentEntry =
   SequencedPersistedRow
+export type ManifestDocumentEntry =
+  SequencedPersistedRow
 export type ManifestContextEntry =
   SequencedPersistedRow
 export type ManifestCronScheduleEntry =
@@ -963,16 +1229,22 @@ export type ManifestFutureSendScheduleEntry =
   SequencedPersistedRow
 export type GoalStatus = GoalStatusValue
 export type ManifestGoalEntry = SequencedPersistedRow
+export type RealtimeSessionStatus =
RealtimeSessionStatusValue
+export type RealtimeSessionStreamRefs = RealtimeSessionStreamRefsValue
+export type ManifestRealtimeSessionEntry =
+  SequencedPersistedRow
 type ManifestUnion =
   | ManifestChildEntry
   | ManifestSourceEntry
   | ManifestSharedStateEntry
   | ManifestEffectEntry
   | ManifestAttachmentEntry
+  | ManifestDocumentEntry
   | ManifestContextEntry
   | ManifestCronScheduleEntry
   | ManifestFutureSendScheduleEntry
   | ManifestGoalEntry
+  | ManifestRealtimeSessionEntry
 export type Manifest = ManifestUnion & {
   id?: string
   entity_url?: string
@@ -993,6 +1265,14 @@ export type Manifest = ManifestUnion & {
   createdBy?: string
   error?: string
   meta?: Record
+  provider?: string
+  docId?: string
+  docPath?: string
+  transportMimeType?: `application/vnd.electric-agents.markdown-yjs`
+  contentMimeType?: `text/markdown`
+  yTextName?: `markdown`
+  title?: string
+  updatedAt?: string
   name?: string
   attrs?: ContextEntryAttrs
   content?: string
@@ -1004,7 +1284,11 @@ export type Manifest = ManifestUnion & {
   targetUrl?: string
   producerId?: string
   messageType?: string
-  status?: FutureSendScheduleStatus | AttachmentStatusValue | GoalStatusValue
+  status?:
+    | FutureSendScheduleStatus
+    | AttachmentStatusValue
+    | GoalStatusValue
+    | RealtimeSessionStatusValue
   sentAt?: string
   failedAt?: string
   lastError?: string
@@ -1012,8 +1296,15 @@ export type Manifest = ManifestUnion & {
   tokenBudget?: number | null
   tokensUsed?: number
   summary?: string
-  updatedAt?: string
-}
+  model?: string
+  startedAt?: string
+  endedAt?: string | null
+  streams?: RealtimeSessionStreamRefs
+  retention?: `forever`
+}
+export type RealtimeSession = SequencedPersistedRow
+export type RealtimeAudioSpan = SequencedPersistedRow
+export type RealtimeTranscript = SequencedPersistedRow
 export type ReplayWatermark = SequencedPersistedRow

 // ============================================================================
@@ -1038,6 +1329,9 @@ export const ENTITY_COLLECTIONS = {
   tags: `tags`,
   slashCommands: `slashCommands`,
   manifests:
`manifests`,
+  realtimeSessions: `realtimeSessions`,
+  realtimeAudioSpans: `realtimeAudioSpans`,
+  realtimeTranscripts: `realtimeTranscripts`,
   contextInserted: `contextInserted`,
   contextRemoved: `contextRemoved`,
   replayWatermarks: `replayWatermarks`,
@@ -1050,6 +1344,8 @@ export const BUILT_IN_EVENT_SCHEMAS = {
   text_delta:
     createTextDeltaSchema() as unknown as BuiltInEntitySchema,
   tool_call: createToolCallSchema() as unknown as BuiltInEntitySchema,
+  tool_arg_delta:
+    createToolArgDeltaSchema() as unknown as BuiltInEntitySchema,
   reasoning:
     createReasoningSchema() as unknown as BuiltInEntitySchema,
   reasoning_delta:
@@ -1073,6 +1369,12 @@ export const BUILT_IN_EVENT_SCHEMAS = {
   context_removed:
     createContextRemovedSchema() as unknown as BuiltInEntitySchema,
   manifest: createManifestSchema() as unknown as BuiltInEntitySchema,
+  realtime_session:
+    createRealtimeSessionSchema() as unknown as BuiltInEntitySchema,
+  realtime_audio_span:
+    createRealtimeAudioSpanSchema() as unknown as BuiltInEntitySchema,
+  realtime_transcript:
+    createRealtimeTranscriptSchema() as unknown as BuiltInEntitySchema,
   replay_watermark:
     createReplayWatermarkSchema() as unknown as BuiltInEntitySchema,
 } as const
@@ -1088,6 +1390,7 @@ type EntityCollectionsDefinition = {
   texts: CollectionDefinition
   textDeltas: CollectionDefinition
   toolCalls: CollectionDefinition
+  toolArgDeltas: CollectionDefinition
   reasoning: CollectionDefinition
   reasoningDeltas: CollectionDefinition
   errors: CollectionDefinition
@@ -1100,6 +1403,9 @@ type EntityCollectionsDefinition = {
   tags: CollectionDefinition
   slashCommands: CollectionDefinition
   manifests: CollectionDefinition
+  realtimeSessions: CollectionDefinition
+  realtimeAudioSpans: CollectionDefinition
+  realtimeTranscripts: CollectionDefinition
   contextInserted: CollectionDefinition
   contextRemoved: CollectionDefinition
   replayWatermarks: CollectionDefinition
@@ -1137,6 +1443,12 @@ export const builtInCollections: EntityCollectionsDefinition = {
     type: `tool_call`,
     primaryKey: `key`,
   },
+  toolArgDeltas: {
+    schema:
+      BUILT_IN_EVENT_SCHEMAS.tool_arg_delta as StandardSchemaV1,
+    type: `tool_arg_delta`,
+    primaryKey: `key`,
+  },
   reasoning: {
     schema: BUILT_IN_EVENT_SCHEMAS.reasoning as StandardSchemaV1,
     type: `reasoning`,
@@ -1202,6 +1514,24 @@ export const builtInCollections: EntityCollectionsDefinition = {
     type: `manifest`,
     primaryKey: `key`,
   },
+  realtimeSessions: {
+    schema:
+      BUILT_IN_EVENT_SCHEMAS.realtime_session as StandardSchemaV1,
+    type: `realtime_session`,
+    primaryKey: `key`,
+  },
+  realtimeAudioSpans: {
+    schema:
+      BUILT_IN_EVENT_SCHEMAS.realtime_audio_span as StandardSchemaV1,
+    type: `realtime_audio_span`,
+    primaryKey: `key`,
+  },
+  realtimeTranscripts: {
+    schema:
+      BUILT_IN_EVENT_SCHEMAS.realtime_transcript as StandardSchemaV1,
+    type: `realtime_transcript`,
+    primaryKey: `key`,
+  },
   contextInserted: {
     schema:
       BUILT_IN_EVENT_SCHEMAS.context_inserted as StandardSchemaV1,
@@ -1238,6 +1568,8 @@ const MANAGEMENT_TYPES = new Set([
   `entity_created`,
   `signal`,
   `manifest`,
+  `realtime_session`,
+  `realtime_audio_span`,
   `replay_watermark`,
   `ack`,
 ])
diff --git a/packages/agents-runtime/src/entity-stream-db.ts b/packages/agents-runtime/src/entity-stream-db.ts
index 37a87b11df..4fa4c1f797 100644
--- a/packages/agents-runtime/src/entity-stream-db.ts
+++ b/packages/agents-runtime/src/entity-stream-db.ts
@@ -9,6 +9,7 @@ import {
   createTransaction,
   getStreamDBCollectionId,
 } from '@durable-streams/state/db'
+import { BasicIndex } from '@tanstack/db'
 import { builtInCollections, passthrough } from './entity-schema'
 import type { StandardSchemaV1 } from '@standard-schema/spec'
 import { formatPointerOrderToken, type EventPointer } from './event-pointer'
@@ -106,6 +107,36 @@ type EntityStreamDBOptions = {

 const WRITE_TXID_TIMEOUT_MS = 20_000

+function createCollectionIndex(
+  collection: unknown,
+  indexCallback: (row: Record) => unknown
+): void {
+  const createIndex = (
+    collection as {
+      createIndex?: (
+        indexCallback: (row:
Record) => unknown,
+        config: { indexType: typeof BasicIndex }
+      ) => unknown
+    }
+  ).createIndex
+  if (typeof createIndex === `function`) {
+    createIndex.call(collection, indexCallback, { indexType: BasicIndex })
+  }
+}
+
+function createEntityTimelineIndexes(collections: EntityCollections): void {
+  createCollectionIndex(collections.texts, (row) => row.run_id)
+  createCollectionIndex(collections.textDeltas, (row) => row.text_id)
+  createCollectionIndex(collections.textDeltas, (row) => row.run_id)
+  createCollectionIndex(
+    collections.textDeltas,
+    (row) => row.realtime_transcript_id
+  )
+  createCollectionIndex(collections.toolCalls, (row) => row.run_id)
+  createCollectionIndex(collections.steps, (row) => row.run_id)
+  createCollectionIndex(collections.errors, (row) => row.run_id)
+}
+
 /**
  * Virtual column the authenticated principal (from the change-event header) is
  * materialized into for externally writable collections. Like `_timeline_order`,
@@ -563,6 +594,7 @@ export function createEntityStreamDB(
   }
   replayDb.__electricReplayBatchOffset = replayBatchOffset
   replayDb.__electricReplaySourceId = streamUrl
+  createEntityTimelineIndexes(replayDb.collections)
   const pendingWritePersistences = new Set>()
   let nextWriteSequence = 0
   const pendingWriteSequences = new Set()
diff --git a/packages/agents-runtime/src/entity-timeline.ts b/packages/agents-runtime/src/entity-timeline.ts
index 171c85338f..aa4653f1dc 100644
--- a/packages/agents-runtime/src/entity-timeline.ts
+++ b/packages/agents-runtime/src/entity-timeline.ts
@@ -11,6 +11,7 @@ import {
   isUndefined,
   like,
   localOnlyCollectionOptions,
+  min,
   or,
   sum,
   toArray,
@@ -23,7 +24,12 @@ import type {
 } from '@tanstack/db'
 import type { EntityStreamDB } from './entity-stream-db'
 import { formatPointerOrderToken, type EventPointer } from './event-pointer'
-import type { ChildStatusEntry, MessageReceived, Signal } from './entity-schema'
+import type {
+  ChildStatusEntry,
+  MessageReceived,
+  RealtimeTranscript,
+  Signal,
+}
from './entity-schema'
 import type { ManifestEntry, Wake, WakeMessage } from './types'

 export const TIMELINE_ORDER_FALLBACK = `~`
@@ -44,7 +50,13 @@ export type EntityTimelineContentItem =
       toolCallId: string
       toolName: string
       args: Record
-      status: `started` | `args_complete` | `executing` | `completed` | `failed`
+      status:
+        | `started`
+        | `args_streaming`
+        | `args_complete`
+        | `executing`
+        | `completed`
+        | `failed`
       result?: string
       error?: string
       isError: boolean
@@ -113,7 +125,13 @@ export interface IncludesToolCall {
   run_id: string
   order: TimelineOrder
   tool_name: string
-  status: `started` | `args_complete` | `executing` | `completed` | `failed`
+  status:
+    | `started`
+    | `args_streaming`
+    | `args_complete`
+    | `executing`
+    | `completed`
+    | `failed`
   args?: unknown
   result?: unknown
   error?: string
@@ -159,6 +177,13 @@ export type IncludesSignal = Omit & {
   order: TimelineOrder
 }

+export type IncludesRealtimeTranscript = Omit<
+  RealtimeTranscript,
+  `_seq` | `_timeline_order`
+> & {
+  order: TimelineOrder
+}
+
 export interface IncludesContextInserted {
   key: string
   order: TimelineOrder
@@ -195,6 +220,7 @@ export interface EntityTimelineData {
   inbox: Array
   wakes: Array
   signals: Array
+  realtimeTranscripts?: Array
   contextInserted: Array
   contextRemoved: Array
   entities: Array
@@ -216,7 +242,7 @@ export interface EntityTimelineQueryOptions {
   /**
    * Additional sources merged into the timeline, keyed by row name. Names
    * must not collide with the built-in sources (`inbox`, `run`, `wake`,
-   * `signal`, `manifest`).
+   * `signal`, `error`, `realtimeTranscript`, `manifest`).
    */
   customSources?: Record
 }
@@ -243,7 +269,13 @@ export interface EntityTimelineToolCallItem {
   order: TimelineOrder
   tool_call_id?: string
   tool_name: string
-  status: `started` | `args_complete` | `executing` | `completed` | `failed`
+  status:
+    | `started`
+    | `args_streaming`
+    | `args_complete`
+    | `executing`
+    | `completed`
+    | `failed`
   args?: unknown
   result?: unknown
   error?: string
@@ -323,6 +355,7 @@ export type EntityTimelineSignalRow = IncludesSignal
 export type EntityTimelineErrorRow = EntityTimelineErrorItem & {
   order: TimelineOrder
 }
+export type EntityTimelineRealtimeTranscriptRow = IncludesRealtimeTranscript

 export type EntityTimelineQueryRow =
   | {
@@ -332,6 +365,7 @@ export type EntityTimelineQueryRow =
       wake?: undefined
       signal?: undefined
       error?: undefined
+      realtimeTranscript?: undefined
       manifest?: undefined
     }
   | {
@@ -341,6 +375,7 @@ export type EntityTimelineQueryRow =
       wake?: undefined
       signal?: undefined
       error?: undefined
+      realtimeTranscript?: undefined
       manifest?: undefined
     }
   | {
@@ -350,6 +385,7 @@ export type EntityTimelineQueryRow =
       wake: EntityTimelineWakeRow
       signal?: undefined
       error?: undefined
+      realtimeTranscript?: undefined
       manifest?: undefined
     }
   | {
@@ -359,6 +395,7 @@ export type EntityTimelineQueryRow =
       wake?: undefined
       signal: EntityTimelineSignalRow
       error?: undefined
+      realtimeTranscript?: undefined
       manifest?: undefined
     }
   | {
@@ -368,6 +405,7 @@ export type EntityTimelineQueryRow =
       wake?: undefined
       signal?: undefined
       error: EntityTimelineErrorRow
+      realtimeTranscript?: undefined
       manifest?: undefined
     }
   | {
@@ -377,6 +415,17 @@ export type EntityTimelineQueryRow =
       wake?: undefined
       signal?: undefined
       error?: undefined
+      realtimeTranscript: EntityTimelineRealtimeTranscriptRow
+      manifest?: undefined
+    }
+  | {
+      $key: string
+      inbox?: undefined
+      run?: undefined
+      wake?: undefined
+      signal?: undefined
+      error?: undefined
+      realtimeTranscript?: undefined
       manifest: ManifestEntry
     }

@@ -492,6 +541,9 @@ export function
normalizeEntityTimelineData(
     inbox: data.inbox,
     wakes: data.wakes,
     signals: data.signals ?? [],
+    realtimeTranscripts: [...(data.realtimeTranscripts ?? [])].sort(
+      compareTimelineOrder
+    ),
     contextInserted: data.contextInserted,
     contextRemoved: data.contextRemoved,
     entities: normalizeTimelineEntities(data.entities),
@@ -528,6 +580,9 @@ type WakeRow = OrderedValue<
 type SignalRow = OrderedValue<
   EntityStreamDB[`collections`][`signals`][`toArray`][number]
 >
+type RealtimeTranscriptValueRow =
+  EntityStreamDB[`collections`][`realtimeTranscripts`][`toArray`][number]
+type RealtimeTranscriptRow = OrderedValue
 type ContextInsertedValueRow =
   EntityStreamDB[`collections`][`contextInserted`][`toArray`][number]
 type ContextRemovedValueRow =
@@ -680,6 +735,23 @@ function getOrderableCollection(
   return collection
 }

+function getOptionalOrderableCollection(
+  collection:
+    | {
+        id?: string
+        toArray: Array
+        __electricRowOffsets?: Map
+      }
+    | undefined,
+  id: string
+): {
+  id?: string
+  toArray: Array
+  __electricRowOffsets?: Map
+} {
+  return collection ?? { id, toArray: [] }
+}
+
 function createOrderIndex(
   groups: ReadonlyArray>
 ): Map {
@@ -797,6 +869,25 @@ function buildTextContentById(
   return deltasById
 }

+function buildRealtimeTranscriptContentById(
+  textDeltas: Array
+): Map {
+  const deltasById = new Map()
+
+  for (const delta of [...textDeltas].sort(compareTimelineOrder)) {
+    const transcriptId =
+      (delta as { realtime_transcript_id?: string }).realtime_transcript_id ??
+      (!delta.run_id ? delta.text_id : undefined)
+    if (!transcriptId) continue
+    deltasById.set(
+      transcriptId,
+      `${deltasById.get(transcriptId) ??
``}${delta.delta}`
+    )
+  }
+
+  return deltasById
+}
+
 function buildIncludesRuns(input: {
   runs: Array
   texts: Array
@@ -980,6 +1071,25 @@ function buildSignalMessages(signals: Array): Array {
   })
 }

+function buildRealtimeTranscriptMessages(
+  transcripts: Array,
+  textDeltas: Array = []
+): Array {
+  const textContentById = buildRealtimeTranscriptContentById(textDeltas)
+  return [...transcripts].sort(compareTimelineOrder).map((transcript) => {
+    const {
+      _seq: _ignoredSeq,
+      _timeline_order: _ignoredTimelineOrder,
+      ...value
+    } = transcript
+    return {
+      ...value,
+      order: transcript.order,
+      text: textContentById.get(transcript.key) ?? transcript.text,
+    }
+  })
+}
+
 function buildContextInsertedMessages(
   entries: Array
 ): Array {
@@ -1098,6 +1208,14 @@ export function buildEntityTimelineData(
   const inbox = withOrderToken(db.collections.inbox)
   const wakes = withOrderToken(db.collections.wakes)
   const signals = withOrderToken(db.collections.signals)
+  const realtimeTranscripts = withOrderToken(
+    getOptionalOrderableCollection(
+      db.collections.realtimeTranscripts as
+        | typeof db.collections.realtimeTranscripts
+        | undefined,
+      `realtimeTranscripts`
+    )
+  )
   const contextInserted = withOrderToken(
     getOrderableCollection(
       db.collections.contextInserted as
@@ -1145,6 +1263,7 @@ export function buildEntityTimelineData(
     inbox,
     wakes,
     signals,
+    realtimeTranscripts,
     contextInserted,
     contextRemoved,
     manifests.filter(hasOrderToken),
@@ -1162,6 +1281,10 @@ export function buildEntityTimelineData(
     inbox: buildInboxMessages(withOrderFromOrderIndex(inbox, orderIndex)),
     wakes: buildWakeMessages(withOrderFromOrderIndex(wakes, orderIndex)),
     signals: buildSignalMessages(withOrderFromOrderIndex(signals, orderIndex)),
+    realtimeTranscripts: buildRealtimeTranscriptMessages(
+      withOrderFromOrderIndex(realtimeTranscripts, orderIndex),
+      withOrderFromOrderIndex(textDeltas, orderIndex)
+    ),
     contextInserted: buildContextInsertedMessages(
       withOrderAndHistoryOffsetFromOrderIndex(contextInserted, orderIndex)
     ),
@@ -1314,6 +1437,43 @@ const getEntitySignalsCollection = cachedCollectionFactory(
     })
 )

+const getEntityRealtimeTranscriptsCollection = cachedCollectionFactory(
+  (db: EntityStreamDB) =>
+    createLiveQueryCollection({
+      id: `${String(db.collections.realtimeTranscripts.id)}:realtime-transcripts-live`,
+      query: (q) =>
+        q
+          .from({ realtimeTranscript: db.collections.realtimeTranscripts })
+          .select(({ realtimeTranscript }) => ({
+            timelineKey: TIMELINE_KEY,
+            key: realtimeTranscript.key,
+            order: coalesce(realtimeTranscript._seq, -1),
+            session_id: realtimeTranscript.session_id,
+            direction: realtimeTranscript.direction,
+            text: concat(
+              toArray(
+                q
+                  .from({ delta: db.collections.textDeltas })
+                  .where(({ delta }) =>
+                    eq(delta.realtime_transcript_id, realtimeTranscript.key)
+                  )
+                  .orderBy(({ delta }) => coalesce(delta._seq, -1))
+                  .select(({ delta }) => delta.delta)
+              )
+            ),
+            status: realtimeTranscript.status,
+            turn_id: realtimeTranscript.turn_id,
+            response_id: realtimeTranscript.response_id,
+            audio_stream: realtimeTranscript.audio_stream,
+            audio_offset: realtimeTranscript.audio_offset,
+            audio_next_offset: realtimeTranscript.audio_next_offset,
+            sample_start: realtimeTranscript.sample_start,
+            sample_end: realtimeTranscript.sample_end,
+            created_at: realtimeTranscript.created_at,
+          })),
+    })
+)
+
 type EntityTimelineQueryBuilder = (q: InitialQueryBuilder) => QueryBuilder

 /**
@@ -1423,6 +1583,36 @@ function buildEntityTimelineQuery(
       run_id: error.run_id,
     }))

+  const realtimeTranscriptSource = q
+    .from({ realtimeTranscript: db.collections.realtimeTranscripts })
+    .select(({ realtimeTranscript }) => ({
+      key: realtimeTranscript.key,
+      order: coalesce(realtimeTranscript._timeline_order, `~`),
+      session_id: realtimeTranscript.session_id,
+      direction: realtimeTranscript.direction,
+      text: concat(
+        toArray(
+          q
+            .from({ delta: db.collections.textDeltas })
+            .where(({ delta }) =>
+              eq(delta.realtime_transcript_id, realtimeTranscript.key)
+            )
+            .orderBy(({ delta }) => coalesce(delta._timeline_order, `~`))
+            .orderBy(({ delta }) => delta.key)
+            .select(({ delta }) => delta.delta)
+        )
+      ),
+      status: realtimeTranscript.status,
+      turn_id: realtimeTranscript.turn_id,
+      response_id: realtimeTranscript.response_id,
+      audio_stream: realtimeTranscript.audio_stream,
+      audio_offset: realtimeTranscript.audio_offset,
+      audio_next_offset: realtimeTranscript.audio_next_offset,
+      sample_start: realtimeTranscript.sample_start,
+      sample_end: realtimeTranscript.sample_end,
+      created_at: realtimeTranscript.created_at,
+    }))
+
   // Union texts + tool calls into a single ordered stream. The
   // text-delta join lives at this level (vs. inside the consumer's
   // `items.select`) so the correlation key is `text.key` — a field
@@ -1517,14 +1707,25 @@ function buildEntityTimelineQuery(
       output_count: count(step.output_tokens),
     }))

+  const runItemAnchorSource = q
+    .from({ item: runItemsSource })
+    .groupBy(({ item }) => item.run_id)
+    .select(({ item }) => ({
+      run_id: item.run_id,
+      order: min(item.order),
+    }))
+
   const runSource = q
     .from({ run: db.collections.runs })
     .leftJoin({ runTokens: runTokensSource }, ({ run, runTokens }) =>
       eq(run.key, runTokens.run_id)
     )
-    .select(({ run, runTokens }) => ({
+    .leftJoin({ anchor: runItemAnchorSource }, ({ run, anchor }) =>
+      eq(anchor.run_id, run.key)
+    )
+    .select(({ run, runTokens, anchor }) => ({
       key: run.key,
-      order: coalesce(run._timeline_order, `~`),
+      order: coalesce(anchor.order, run._timeline_order, `~`),
       status: run.status,
       finish_reason: run.finish_reason,
       // Mirrors the `tokens` shape produced by `createEntityIncludesQuery`
@@ -1625,6 +1826,7 @@ function buildEntityTimelineQuery(
     wake: wakeSource,
     signal: signalSource,
     error: errorSource,
+    realtimeTranscript: realtimeTranscriptSource,
     manifest: db.collections.manifests,
   }
   for (const [name, buildSource] of Object.entries(opts.customSources ??
{})) {
@@ -1672,6 +1874,8 @@ export function createEntityIncludesQuery(
   const inboxCollection = getEntityInboxCollection(db)
   const wakesCollection = getEntityWakesCollection(db)
   const signalsCollection = getEntitySignalsCollection(db)
+  const realtimeTranscriptsCollection =
+    getEntityRealtimeTranscriptsCollection(db)
   const entitiesCollection = getEntityEntitiesCollection(db)

   return (q: InitialQueryBuilder) =>
@@ -1850,6 +2054,30 @@ export function createEntityIncludesQuery(
               new_state: signal.new_state,
             }))
         ),
+        realtimeTranscripts: toArray(
+          q
+            .from({ realtimeTranscript: realtimeTranscriptsCollection })
+            .where(({ realtimeTranscript }) =>
+              eq(realtimeTranscript.timelineKey, timeline.key)
+            )
+            .orderBy(({ realtimeTranscript }) => realtimeTranscript.order)
+            .select(({ realtimeTranscript }) => ({
+              key: realtimeTranscript.key,
+              order: realtimeTranscript.order,
+              session_id: realtimeTranscript.session_id,
+              direction: realtimeTranscript.direction,
+              text: realtimeTranscript.text,
+              status: realtimeTranscript.status,
+              turn_id: realtimeTranscript.turn_id,
+              response_id: realtimeTranscript.response_id,
+              audio_stream: realtimeTranscript.audio_stream,
+              audio_offset: realtimeTranscript.audio_offset,
+              audio_next_offset: realtimeTranscript.audio_next_offset,
+              sample_start: realtimeTranscript.sample_start,
+              sample_end: realtimeTranscript.sample_end,
+              created_at: realtimeTranscript.created_at,
+            }))
+        ),
         entities: toArray(
           q
             .from({ entity: entitiesCollection })
diff --git a/packages/agents-runtime/src/index.ts b/packages/agents-runtime/src/index.ts
index 61093c87b4..41f63f8022 100644
--- a/packages/agents-runtime/src/index.ts
+++ b/packages/agents-runtime/src/index.ts
@@ -7,10 +7,33 @@ export type {
   ManifestAttachmentEntry,
   ManifestChildEntry,
   ManifestContextEntry,
+  ManifestDocumentEntry,
   ManifestEntry,
   ManifestEffectEntry,
+  ManifestRealtimeSessionEntry,
   ManifestSourceEntry,
   ManifestSharedStateEntry,
+  RealtimeAudioSpan,
+  RealtimeAudioConfig,
+  RealtimeAudioFormat,
+  RealtimeConfig,
+  RealtimeContextConfig,
+  RealtimeHandle,
+  RealtimeHelpers,
+  RealtimeProviderConfig,
+  RealtimeProviderConnectInput,
+  RealtimeProviderEvent,
+  RealtimeProviderSession,
+  RealtimeRunResult,
+  RealtimeSession,
+  RealtimeSessionPolicy,
+  RealtimeSessionStatus,
+  RealtimeSessionStreamRefs,
+  RealtimeToolPolicy,
+  RealtimeToolResult,
+  RealtimeTranscript,
+  RealtimeTranscriptEvent,
+  RealtimeTurnDetectionConfig,
   PendingSend,
   EffectConfig,
   ObservationSource,
@@ -74,6 +97,7 @@ export type {
   GeneratedStateActions,
   HandlerActions,
   ManifestContextEntry as ManifestContextRow,
+  ManifestDocumentEntry as ManifestDocumentRow,
   SchemaInput,
   SchemaOutput,
   SourceConfig,
@@ -117,14 +141,46 @@ export type {
   AttachmentSubject,
   AttachmentSubjectType,
   ManifestContextEntry as ManifestContextEntryRow,
+  ManifestDocumentEntry as ManifestDocumentEntryRow,
+  ManifestRealtimeSessionEntry as ManifestRealtimeSessionEntryRow,
+  RealtimeAudioSpan as RealtimeAudioSpanRow,
+  RealtimeSession as RealtimeSessionRow,
+  RealtimeSessionStatus as RealtimeSessionStatusRow,
+  RealtimeSessionStreamRefs as RealtimeSessionStreamRefsRow,
+  RealtimeTranscript as RealtimeTranscriptRow,
   ReplayWatermark,
   WakeConfigValue,
 } from './entity-schema'
 export { createEntityStreamDB } from './entity-stream-db'
+export { createTestRealtimeProvider } from './realtime'
+export type { TestRealtimeProviderOptions } from './realtime'
+export { createOpenAIRealtimeProvider } from './openai-realtime'
+export type { OpenAIRealtimeProviderOptions } from './openai-realtime'
+export {
+  DEFAULT_OPENAI_REALTIME_MODEL,
+  DEFAULT_OPENAI_REALTIME_REASONING_EFFORT,
+  DEFAULT_OPENAI_REALTIME_VOICE,
+  OPENAI_REALTIME_MODELS,
+  OPENAI_REALTIME_REASONING_EFFORTS,
+  OPENAI_REALTIME_VOICES,
+  isOpenAIRealtimeModel,
+  isOpenAIRealtimeReasoningEffort,
+  isOpenAIRealtimeVoice,
+} from './realtime-options'
+export type {
+  OpenAIRealtimeReasoningEffort,
+  RealtimeModelChoice,
+  RealtimeProviderId,
+  RealtimeReasoningEffortChoice,
+  RealtimeVoiceChoice,
+} from './realtime-options'
 export {
   getEntityAttachmentStreamPath,
+  getEntityMarkdownDocumentPath,
+  getEntityMarkdownDocumentUrlPath,
   manifestAttachmentKey,
+  manifestMarkdownDocumentKey,
 } from './manifest-helpers'
 export {
   COMPOSER_INPUT_MESSAGE_TYPE,
@@ -261,6 +317,9 @@ export type {
   DispatchPolicy,
   SpawnEntityOptions,
   SendEntityMessageOptions,
+  RealtimeAudioOptions,
+  RealtimeSessionStartResult,
+  StartRealtimeSessionOptions,
 } from './runtime-server-client'
 export {
   buildWebhookSourceManifestEntry,
diff --git a/packages/agents-runtime/src/manifest-helpers.ts b/packages/agents-runtime/src/manifest-helpers.ts
index 599cd6fa38..bce492b98c 100644
--- a/packages/agents-runtime/src/manifest-helpers.ts
+++ b/packages/agents-runtime/src/manifest-helpers.ts
@@ -16,9 +16,37 @@ export function manifestAttachmentKey(id: string): string {
   return `attachment:${id}`
 }

+export function manifestMarkdownDocumentKey(id: string): string {
+  return `document:${id}`
+}
+
 export function getEntityAttachmentStreamPath(
   entityUrl: string,
   attachmentId: string
 ): string {
   return `${entityUrl.replace(/\/+$/, ``)}/attachments/${attachmentId}`
 }
+
+export function getEntityMarkdownDocumentPath(
+  entityUrl: string,
+  documentId: string
+): string {
+  const segments = entityUrl.replace(/^\/+|\/+$/g, ``).split(`/`)
+  if (segments.length !== 2 || !segments[0] || !segments[1]) {
+    throw new Error(
+      `Invalid entity URL for markdown document path: ${entityUrl}`
+    )
+  }
+  return `agents/${segments[0]}/${segments[1]}/documents/${documentId}`
+}
+
+export function getEntityMarkdownDocumentUrlPath(
+  service: string,
+  entityUrl: string,
+  documentId: string
+): string {
+  return `/v1/yjs/${encodeURIComponent(service)}/docs/${getEntityMarkdownDocumentPath(
+    entityUrl,
+    documentId
+  )}`
+}
diff --git a/packages/agents-runtime/src/markdown-document-constants.ts b/packages/agents-runtime/src/markdown-document-constants.ts
new file mode
100644 index 0000000000..e58a7ebe69 --- /dev/null +++ b/packages/agents-runtime/src/markdown-document-constants.ts @@ -0,0 +1,2 @@ +export const MARKDOWN_DOCUMENT_TEXT_NAME = `markdown` as const +export const MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS = 45_000 diff --git a/packages/agents-runtime/src/markdown-document-session.ts b/packages/agents-runtime/src/markdown-document-session.ts new file mode 100644 index 0000000000..c3b5814f6d --- /dev/null +++ b/packages/agents-runtime/src/markdown-document-session.ts @@ -0,0 +1,164 @@ +import { YjsProvider } from '@durable-streams/y-durable-streams' +import { Awareness } from 'y-protocols/awareness' +import * as Y from 'yjs' +import { + MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS, + MARKDOWN_DOCUMENT_TEXT_NAME, +} from './markdown-document-constants' +import { markdownText } from './markdown-yjs' +import type { + ManifestDocumentEntry, + MarkdownDocumentConnection, + RuntimePrincipal, +} from './types' + +export type MarkdownDocumentPresence = { + anchor?: number + head?: number + clear?: boolean +} + +export type MarkdownDocumentSession = { + readonly document: ManifestDocumentEntry + readonly doc: Y.Doc + readonly text: Y.Text + readonly textName: string + content: () => string + setPresence: (opts: MarkdownDocumentPresence) => Promise<void> + flush: () => Promise<void> + close: () => Promise<void> +} + +export async function openMarkdownDocumentSession(opts: { + document: ManifestDocumentEntry + connection: MarkdownDocumentConnection + entityUrl: string + principal?: RuntimePrincipal +}): Promise<MarkdownDocumentSession> { + const doc = new Y.Doc() + const textName = opts.document.yTextName || MARKDOWN_DOCUMENT_TEXT_NAME + const text = markdownText(doc, textName) + const awareness = new Awareness(doc) + const provider = new YjsProvider({ + doc, + baseUrl: opts.connection.baseUrl, + docId: opts.connection.docId, + awareness, + headers: opts.connection.headers, + liveMode: `sse`, + connect: false, + }) + const principalUrl =
`/principal/entity:${encodeURIComponent(opts.entityUrl)}` + const color = principalColor(principalUrl) + + await provider.connect() + + const content = (): string => text.toString() + + const setPresence = async ({ + anchor, + head, + clear, + }: MarkdownDocumentPresence): Promise<void> => { + if (clear) { + awareness.setLocalState(null) + await settleAwarenessUpdate() + return + } + const boundedAnchor = boundIndex(anchor ?? text.length, text.length) + const boundedHead = boundIndex(head ?? boundedAnchor, text.length) + const now = Date.now() + awareness.setLocalState({ + user: { + name: principalDisplayName(principalUrl), + principalUrl, + role: principalRole(principalUrl), + status: `editing`, + updatedAt: now, + expiresAt: now + MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS, + color: color.color, + colorLight: color.colorLight, + }, + cursor: { + anchor: Y.createRelativePositionFromTypeIndex(text, boundedAnchor), + head: Y.createRelativePositionFromTypeIndex(text, boundedHead), + }, + }) + await settleAwarenessUpdate() + } + + return { + document: opts.document, + doc, + text, + textName, + content, + setPresence, + flush: () => provider.flush(), + close: async () => { + awareness.setLocalState(null) + await settleAwarenessUpdate() + await provider.flush() + await provider.disconnect() + awareness.destroy() + provider.destroy() + doc.destroy() + }, + } +} + +function boundIndex(value: number, length: number): number { + return Math.max(0, Math.min(Math.floor(value), length)) +} + +function settleAwarenessUpdate(): Promise<void> { + return new Promise((resolve) => setTimeout(resolve, 0)) +} + +function principalDisplayName(principalUrl: string): string { + const raw = principalUrl.split(`/principal/`).at(-1) ?? principalUrl + let decoded = raw + try { + decoded = decodeURIComponent(raw) + } catch { + // Keep the raw value when the URL segment is not URI encoded.
+ } + const withoutPrefix = decoded.replace(/^(user|agent|entity|system):/, ``) + if (withoutPrefix.startsWith(`/`)) { + return withoutPrefix.split(`/`).filter(Boolean).at(-1) ?? withoutPrefix + } + return withoutPrefix || decoded || principalUrl +} + +function principalRole(principalUrl: string): `agent` | `user` | `system` { + const raw = principalUrl.split(`/principal/`).at(-1) ?? principalUrl + let decoded = raw + try { + decoded = decodeURIComponent(raw) + } catch { + // Keep the raw value when the URL segment is not URI encoded. + } + if (decoded.startsWith(`user:`)) return `user` + if (decoded.startsWith(`system:`)) return `system` + return `agent` +} + +function principalColor(principalUrl: string): { + color: string + colorLight: string +} { + const colors = [ + [`#2563eb`, `#2563eb33`], + [`#059669`, `#05966933`], + [`#dc2626`, `#dc262633`], + [`#7c3aed`, `#7c3aed33`], + [`#c2410c`, `#c2410c33`], + [`#0f766e`, `#0f766e33`], + ] as const + let hash = 0 + for (let i = 0; i < principalUrl.length; i += 1) { + hash = (hash * 31 + principalUrl.charCodeAt(i)) >>> 0 + } + const [color, colorLight] = colors[hash % colors.length]! 
+ return { color, colorLight } +} diff --git a/packages/agents-runtime/src/markdown-yjs.ts b/packages/agents-runtime/src/markdown-yjs.ts new file mode 100644 index 0000000000..57d040d085 --- /dev/null +++ b/packages/agents-runtime/src/markdown-yjs.ts @@ -0,0 +1,263 @@ +import * as decoding from 'lib0/decoding' +import * as encoding from 'lib0/encoding' +import { Awareness, encodeAwarenessUpdate } from 'y-protocols/awareness' +import * as Y from 'yjs' + +import { + MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS, + MARKDOWN_DOCUMENT_TEXT_NAME, +} from './markdown-document-constants' + +export { + MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS, + MARKDOWN_DOCUMENT_TEXT_NAME, +} from './markdown-document-constants' + +export function frameYjsUpdate(update: Uint8Array): Uint8Array { + const encoder = encoding.createEncoder() + encoding.writeVarUint8Array(encoder, update) + return encoding.toUint8Array(encoder) +} + +export function applyFramedYjsUpdates(doc: Y.Doc, data: Uint8Array): void { + if (data.length === 0) return + const decoder = decoding.createDecoder(data) + while (decoding.hasContent(decoder)) { + Y.applyUpdate(doc, decoding.readVarUint8Array(decoder), `agent`) + } +} + +export function markdownText( + doc: Y.Doc, + name: string = MARKDOWN_DOCUMENT_TEXT_NAME +): Y.Text { + return doc.getText(name) +} + +export function createMarkdownYDoc(data: Uint8Array): Y.Doc { + const doc = new Y.Doc() + applyFramedYjsUpdates(doc, data) + return doc +} + +export function replaceMarkdownText( + doc: Y.Doc, + content: string, + textName: string = MARKDOWN_DOCUMENT_TEXT_NAME +): Uint8Array { + const before = Y.encodeStateVector(doc) + const text = markdownText(doc, textName) + doc.transact(() => { + text.delete(0, text.length) + if (content.length > 0) text.insert(0, content) + }, `agent`) + return Y.encodeStateAsUpdate(doc, before) +} + +export function editMarkdownText( + doc: Y.Doc, + oldString: string, + newString: string, + replaceAll: boolean | undefined, + textName: string = 
MARKDOWN_DOCUMENT_TEXT_NAME +): { + update: Uint8Array + content: string + replacements: number + cursorIndex?: number +} { + const text = markdownText(doc, textName) + const beforeContent = text.toString() + const matches = beforeContent.split(oldString).length - 1 + if (matches === 0 || (!replaceAll && matches > 1)) { + return { + update: new Uint8Array(), + content: beforeContent, + replacements: matches, + } + } + + const before = Y.encodeStateVector(doc) + let cursorIndex = 0 + if (replaceAll) { + let cursor = 0 + doc.transact(() => { + while (true) { + const index = text.toString().indexOf(oldString, cursor) + if (index < 0) break + text.delete(index, oldString.length) + text.insert(index, newString) + cursor = index + newString.length + cursorIndex = cursor + } + }, `agent`) + } else { + const index = beforeContent.indexOf(oldString) + doc.transact(() => { + text.delete(index, oldString.length) + text.insert(index, newString) + }, `agent`) + cursorIndex = index + newString.length + } + return { + update: Y.encodeStateAsUpdate(doc, before), + content: text.toString(), + replacements: matches, + cursorIndex, + } +} + +export function insertMarkdownText( + doc: Y.Doc, + content: string, + opts?: { + index?: number + position?: Y.RelativePosition + textName?: string + } +): { + update: Uint8Array + index: number + nextIndex: number + nextPosition: Y.RelativePosition +} { + const text = markdownText(doc, opts?.textName) + const absolute = opts?.position + ? Y.createAbsolutePositionFromRelativePosition(opts.position, doc) + : null + const index = + absolute && absolute.type === text + ? Math.max(0, Math.min(absolute.index, text.length)) + : Math.max(0, Math.min(opts?.index ?? 
text.length, text.length)) + const before = Y.encodeStateVector(doc) + if (content.length > 0) { + doc.transact(() => { + text.insert(index, content) + }, `agent`) + } + const nextIndex = index + content.length + return { + update: Y.encodeStateAsUpdate(doc, before), + index, + nextIndex, + nextPosition: Y.createRelativePositionFromTypeIndex(text, nextIndex), + } +} + +export function deleteMarkdownTextRange( + doc: Y.Doc, + index: number, + length: number, + textName: string = MARKDOWN_DOCUMENT_TEXT_NAME +): { + update: Uint8Array + index: number + length: number + position: Y.RelativePosition +} { + const text = markdownText(doc, textName) + const boundedIndex = Math.max(0, Math.min(index, text.length)) + const boundedLength = Math.max( + 0, + Math.min(length, text.length - boundedIndex) + ) + const before = Y.encodeStateVector(doc) + if (boundedLength > 0) { + doc.transact(() => { + text.delete(boundedIndex, boundedLength) + }, `agent`) + } + return { + update: Y.encodeStateAsUpdate(doc, before), + index: boundedIndex, + length: boundedLength, + position: Y.createRelativePositionFromTypeIndex(text, boundedIndex), + } +} + +export function relativePositionAtMarkdownIndex( + doc: Y.Doc, + index: number, + textName: string = MARKDOWN_DOCUMENT_TEXT_NAME +): Y.RelativePosition { + const text = markdownText(doc, textName) + const boundedIndex = Math.max(0, Math.min(index, text.length)) + return Y.createRelativePositionFromTypeIndex(text, boundedIndex) +} + +export function markdownIndexFromRelativePosition( + doc: Y.Doc, + position: Y.RelativePosition, + textName: string = MARKDOWN_DOCUMENT_TEXT_NAME +): number | undefined { + const text = markdownText(doc, textName) + const absolute = Y.createAbsolutePositionFromRelativePosition(position, doc) + if (!absolute || absolute.type !== text) return undefined + return Math.max(0, Math.min(absolute.index, text.length)) +} + +export function encodeMarkdownAwarenessUpdate(opts: { + doc: Y.Doc + docPath: string + principalUrl: 
string + clientKey?: string + name: string + role: `agent` | `user` | `system` + status?: `editing` + anchor?: number + head?: number + color: string + colorLight: string + clear?: boolean + textName?: string +}): Uint8Array { + const awarenessDoc = new Y.Doc() + ;(awarenessDoc as { clientID: number }).clientID = + markdownDocumentPresenceClientId( + opts.docPath, + opts.clientKey ?? opts.principalUrl + ) + const awareness = new Awareness(awarenessDoc) + if (opts.clear) { + awareness.setLocalState(null) + } else { + const text = markdownText(opts.doc, opts.textName) + const anchor = Math.max( + 0, + Math.min(opts.anchor ?? text.length, text.length) + ) + const head = Math.max(0, Math.min(opts.head ?? anchor, text.length)) + const now = Date.now() + awareness.setLocalState({ + user: { + name: opts.name, + principalUrl: opts.principalUrl, + role: opts.role, + status: opts.status ?? `editing`, + updatedAt: now, + expiresAt: now + MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS, + color: opts.color, + colorLight: opts.colorLight, + }, + cursor: { + anchor: Y.createRelativePositionFromTypeIndex(text, anchor), + head: Y.createRelativePositionFromTypeIndex(text, head), + }, + }) + } + return frameYjsUpdate(encodeAwarenessUpdate(awareness, [awareness.clientID])) +} + +function markdownDocumentPresenceClientId( + docPath: string, + principalUrl: string +): number { + let hash = 2166136261 + const input = `${docPath}\0${principalUrl}` + for (let i = 0; i < input.length; i += 1) { + hash ^= input.charCodeAt(i) + hash = Math.imul(hash, 16777619) + } + const id = hash >>> 0 + return id === 0 ? 
1 : id +} diff --git a/packages/agents-runtime/src/openai-realtime.ts b/packages/agents-runtime/src/openai-realtime.ts new file mode 100644 index 0000000000..8909caf9e8 --- /dev/null +++ b/packages/agents-runtime/src/openai-realtime.ts @@ -0,0 +1,994 @@ +import type { + AgentTool, + LLMMessage, + RealtimeAudioFormat, + RealtimeProviderConfig, + RealtimeProviderConnectInput, + RealtimeProviderEvent, + RealtimeProviderSession, + RealtimeToolResult, + RealtimeTurnDetectionConfig, +} from './types' +import { + DEFAULT_OPENAI_REALTIME_MODEL, + DEFAULT_OPENAI_REALTIME_REASONING_EFFORT, + type OpenAIRealtimeReasoningEffort, +} from './realtime-options' + +type MaybePromise<T> = T | Promise<T> +type OpenAIRealtimeSocket = { + send: (data: string) => void + close?: (code?: number, reason?: string) => void + addEventListener?: ( + event: string, + handler: (...args: Array<unknown>) => void + ) => void + removeEventListener?: ( + event: string, + handler: (...args: Array<unknown>) => void + ) => void + on?: (event: string, handler: (...args: Array<unknown>) => void) => void + off?: (event: string, handler: (...args: Array<unknown>) => void) => void + readyState?: number +} +type OpenAIRealtimeWebSocketConstructor = new ( + url: string, + init?: unknown +) => OpenAIRealtimeSocket + +const DEFAULT_OPENAI_INPUT_TRANSCRIPTION_MODEL = `gpt-4o-mini-transcribe` +const BYTES_PER_PCM16_SAMPLE = 2 +const MAX_INPUT_AUDIO_APPEND_BYTES = 32 * 1024 + +export interface OpenAIRealtimeProviderOptions { + apiKey: string | (() => MaybePromise<string>) + model?: string + url?: string + voice?: string + reasoningEffort?: OpenAIRealtimeReasoningEffort + safetyIdentifier?: string + headers?: Record<string, string> + WebSocket?: OpenAIRealtimeWebSocketConstructor +} + +type OpenAIRealtimeEvent = Record<string, any> & { type?: string } + +class AsyncEventQueue<T> implements AsyncIterable<T> { + private values: Array<T> = [] + private resolvers: Array<{ + resolve: (value: IteratorResult<T>) => void + reject: (error: unknown) => void + }> = [] + private closed = false + private error:
unknown + + push(value: T): void { + if (this.closed) return + const resolver = this.resolvers.shift() + if (resolver) { + resolver.resolve({ value, done: false }) + return + } + this.values.push(value) + } + + close(): void { + if (this.closed) return + this.closed = true + for (const resolver of this.resolvers.splice(0)) { + resolver.resolve({ value: undefined as T, done: true }) + } + } + + fail(error: unknown): void { + if (this.closed) return + this.error = error + this.closed = true + for (const resolver of this.resolvers.splice(0)) { + resolver.reject(error) + } + } + + [Symbol.asyncIterator](): AsyncIterator<T> { + return { + next: () => { + if (this.values.length > 0) { + return Promise.resolve({ value: this.values.shift()!, done: false }) + } + if (this.error) { + return Promise.reject(this.error) + } + if (this.closed) { + return Promise.resolve({ value: undefined as T, done: true }) + } + return new Promise<IteratorResult<T>>((resolve, reject) => { + this.resolvers.push({ resolve, reject }) + }) + }, + } + } +} + +function resolveWebSocket( + opts: OpenAIRealtimeProviderOptions +): OpenAIRealtimeWebSocketConstructor { + const ctor = opts.WebSocket ??
globalThis.WebSocket + if (!ctor) { + throw new Error( + `[agent-runtime] OpenAI realtime requires a WebSocket implementation` + ) + } + return ctor as unknown as OpenAIRealtimeWebSocketConstructor +} + +function onSocket( + ws: OpenAIRealtimeSocket, + event: string, + handler: (...args: Array<unknown>) => void +): void { + if (ws.addEventListener) { + ws.addEventListener(event, handler) + return + } + ws.on?.(event, handler) +} + +function socketMessageData(args: Array<unknown>): unknown { + const [first] = args + if (first && typeof first === `object` && `data` in first) { + return (first as { data: unknown }).data + } + return first +} + +function socketCloseDetails(args: Array<unknown>): { + code?: number + reason?: string + wasClean?: boolean +} { + const [first, second] = args + if (typeof first === `number`) { + return { + code: first, + reason: second === undefined ? undefined : dataToString(second), + } + } + if (!first || typeof first !== `object`) return {} + const event = first as { + code?: unknown + reason?: unknown + wasClean?: unknown + } + return { + code: typeof event.code === `number` ? event.code : undefined, + reason: + typeof event.reason === `string` + ? event.reason + : event.reason === undefined + ? undefined + : dataToString(event.reason), + wasClean: typeof event.wasClean === `boolean` ?
event.wasClean : undefined, + } +} + +function socketCloseError(details: { + code?: number + reason?: string + wasClean?: boolean +}): string { + const parts = [`OpenAI realtime WebSocket closed before client stop`] + if (details.code !== undefined) parts.push(`code=${details.code}`) + if (details.reason) parts.push(`reason=${details.reason}`) + if (details.wasClean !== undefined) parts.push(`clean=${details.wasClean}`) + return parts.join(` `) +} + +function dataToString(data: unknown): string { + if (typeof data === `string`) return data + if (data instanceof ArrayBuffer) return new TextDecoder().decode(data) + if (data instanceof Uint8Array) return new TextDecoder().decode(data) + if ( + data && + typeof data === `object` && + `toString` in data && + typeof data.toString === `function` + ) { + return data.toString() + } + return String(data) +} + +function bytesToBase64(bytes: Uint8Array): string { + const bufferCtor = (globalThis as { Buffer?: typeof Buffer }).Buffer + if (bufferCtor) return bufferCtor.from(bytes).toString(`base64`) + let binary = `` + for (const byte of bytes) binary += String.fromCharCode(byte) + return btoa(binary) +} + +function alignedPcm16Bytes(bytes: Uint8Array): Uint8Array { + const alignedLength = + bytes.byteLength - (bytes.byteLength % BYTES_PER_PCM16_SAMPLE) + if (alignedLength <= 0) return new Uint8Array() + return alignedLength === bytes.byteLength + ? 
bytes + : bytes.subarray(0, alignedLength) +} + +function inputAudioAppendChunks(bytes: Uint8Array): Array<Uint8Array> { + const aligned = alignedPcm16Bytes(bytes) + if (aligned.byteLength === 0) return [] + if (aligned.byteLength <= MAX_INPUT_AUDIO_APPEND_BYTES) return [aligned] + + const chunks: Array<Uint8Array> = [] + const chunkSize = + MAX_INPUT_AUDIO_APPEND_BYTES - + (MAX_INPUT_AUDIO_APPEND_BYTES % BYTES_PER_PCM16_SAMPLE) + for (let offset = 0; offset < aligned.byteLength; offset += chunkSize) { + chunks.push(aligned.subarray(offset, offset + chunkSize)) + } + return chunks +} + +function base64ToBytes(value: string): Uint8Array { + const bufferCtor = (globalThis as { Buffer?: typeof Buffer }).Buffer + if (bufferCtor) return new Uint8Array(bufferCtor.from(value, `base64`)) + const binary = atob(value) + const bytes = new Uint8Array(binary.length) + for (let index = 0; index < binary.length; index += 1) { + bytes[index] = binary.charCodeAt(index) + } + return bytes +} + +function sendJson(ws: OpenAIRealtimeSocket, event: unknown): void { + ws.send(JSON.stringify(event)) +} + +function toolName(tool: AgentTool): string { + return tool.name +} + +function toOpenAITool(tool: AgentTool): Record<string, unknown> { + return { + type: `function`, + name: tool.name, + description: tool.description, + parameters: tool.parameters, + } +} + +function messageContentText(content: unknown): string { + if (typeof content === `string`) return content + if (!Array.isArray(content)) return `` + return content + .map((part) => { + if (!part || typeof part !== `object`) return `` + const text = (part as { text?: unknown }).text + return typeof text === `string` ? text : `` + }) + .filter(Boolean) + .join(`\n`) +} + +function messageRole(message: LLMMessage): `user` | `assistant` | null { + const role = (message as { role?: unknown }).role + return role === `assistant` ? `assistant` : role === `user` ?
`user` : null +} + +function sendConversationMessage( + ws: OpenAIRealtimeSocket, + message: LLMMessage +): void { + const role = messageRole(message) + if (!role) return + const text = messageContentText((message as { content?: unknown }).content) + if (!text) return + sendJson(ws, { + type: `conversation.item.create`, + item: { + type: `message`, + role, + content: [ + { + type: role === `assistant` ? `output_text` : `input_text`, + text, + }, + ], + }, + }) +} + +function realtimeFormat( + format: RealtimeAudioFormat | undefined +): Record<string, unknown> | undefined { + if (!format) return undefined + return { + type: `audio/pcm`, + rate: format.sampleRate, + } +} + +function inputTranscription( + input: RealtimeProviderConnectInput +): Record<string, unknown> | undefined { + if (!input.audio?.inputFormat || input.audio.inputTranscription === false) { + return undefined + } + const config = input.audio.inputTranscription ?? {} + return { + model: config.model ?? DEFAULT_OPENAI_INPUT_TRANSCRIPTION_MODEL, + ...(config.language ? { language: config.language } : {}), + ...(config.prompt ? { prompt: config.prompt } : {}), + ...(config.delay ? { delay: config.delay } : {}), + } +} + +function realtimeTurnDetection( + config: RealtimeTurnDetectionConfig | undefined +): Record<string, unknown> | null { + if (config === false || config?.type === `none`) return null + if (!config) { + return { + type: `server_vad`, + threshold: 0.55, + prefix_padding_ms: 300, + silence_duration_ms: 500, + create_response: true, + interrupt_response: true, + } + } + if (config.type === `semantic_vad`) { + return { + type: `semantic_vad`, + ...(config.eagerness ? { eagerness: config.eagerness } : {}), + create_response: config.createResponse ?? true, + interrupt_response: config.interruptResponse ?? true, + } + } + return { + type: `server_vad`, + ...(config.threshold != null ? { threshold: config.threshold } : {}), + ...(config.prefixPaddingMs != null + ?
{ prefix_padding_ms: config.prefixPaddingMs } + : {}), + ...(config.silenceDurationMs != null + ? { silence_duration_ms: config.silenceDurationMs } + : {}), + create_response: config.createResponse ?? true, + interrupt_response: config.interruptResponse ?? true, + } +} + +function buildSessionUpdate( + opts: OpenAIRealtimeProviderOptions, + input: RealtimeProviderConnectInput +): Record<string, unknown> { + const inputFormat = realtimeFormat(input.audio?.inputFormat) + const outputFormat = realtimeFormat(input.audio?.outputFormat) + const transcription = inputTranscription(input) + const model = opts.model ?? DEFAULT_OPENAI_REALTIME_MODEL + const wantsAudioOutput = Boolean(outputFormat || opts.voice) + const reasoningEffort = + model === DEFAULT_OPENAI_REALTIME_MODEL + ? (opts.reasoningEffort ?? DEFAULT_OPENAI_REALTIME_REASONING_EFFORT) + : undefined + return { + type: `session.update`, + session: { + type: `realtime`, + model, + instructions: input.systemPrompt, + output_modalities: wantsAudioOutput ? [`audio`] : [`text`], + tool_choice: input.tools.length > 0 ? `auto` : `none`, + ...(reasoningEffort ? { reasoning: { effort: reasoningEffort } } : {}), + ...(input.tools.length > 0 + ? { tools: input.tools.map((tool) => toOpenAITool(tool)) } + : {}), + ...(inputFormat || wantsAudioOutput + ? { + audio: { + ...(inputFormat + ? { + input: { + format: inputFormat, + ...(transcription ? { transcription } : {}), + turn_detection: realtimeTurnDetection( + input.audio?.turnDetection + ), + }, + } + : {}), + ...(wantsAudioOutput + ? { + output: { + ...(outputFormat ? { format: outputFormat } : {}), + ...(opts.voice ? { voice: opts.voice } : {}), + }, + } + : {}), + }, + } + : {}), + }, + } +} + +function parseToolArgs(value: unknown): unknown { + if (typeof value !== `string`) return value ??
{} + try { + return JSON.parse(value) as unknown + } catch { + return value + } +} + +function toolResultOutput(result: RealtimeToolResult): string { + if (typeof result.result === `string`) return result.result + return JSON.stringify(result.result) +} + +type OutputTranscriptSource = + | `response.audio_transcript` + | `response.output_audio_transcript` + | `response.output_text` + +function outputTranscriptSource( + event: OpenAIRealtimeEvent +): OutputTranscriptSource | undefined { + if ( + event.type === `response.audio_transcript.delta` || + event.type === `response.audio_transcript.done` + ) { + return `response.audio_transcript` + } + if ( + event.type === `response.output_audio_transcript.delta` || + event.type === `response.output_audio_transcript.done` + ) { + return `response.output_audio_transcript` + } + if ( + event.type === `response.output_text.delta` || + event.type === `response.output_text.done` + ) { + return `response.output_text` + } + return undefined +} + +function openAIString(value: unknown): string | undefined { + return typeof value === `string` ? value : undefined +} + +function openAINumber(value: unknown): number | undefined { + return typeof value === `number` && Number.isFinite(value) ? value : undefined +} + +function openAIResponseId(event: OpenAIRealtimeEvent): string | undefined { + return typeof event.response?.id === `string` + ? event.response.id + : typeof event.response_id === `string` + ? event.response_id + : undefined +} + +function mapOpenAIEvent( + event: OpenAIRealtimeEvent +): Array<RealtimeProviderEvent> { + switch (event.type) { + case `session.created`: + return [{ type: `session.started`, sessionId: event.session?.id }] + case `session.updated`: + return [{ type: `session.updated` }] + case `error`: + return [ + { + type: `session.error`, + error: + typeof event.error?.message === `string` + ? event.error.message + : `OpenAI realtime error`, + code: + typeof event.error?.code === `string` + ?
event.error.code + : undefined, + }, + ] + case `input_audio_buffer.speech_started`: + return [ + { + type: `input_audio.speech_started`, + audioOffset: + typeof event.audio_start_ms === `number` + ? String(event.audio_start_ms) + : undefined, + turnId: typeof event.item_id === `string` ? event.item_id : undefined, + }, + ] + case `input_audio_buffer.speech_stopped`: + return [ + { + type: `input_audio.speech_stopped`, + audioOffset: + typeof event.audio_end_ms === `number` + ? String(event.audio_end_ms) + : undefined, + turnId: typeof event.item_id === `string` ? event.item_id : undefined, + }, + ] + case `input_audio_buffer.committed`: + return [ + { + type: `input_audio.committed`, + turnId: openAIString(event.item_id), + previousTurnId: openAIString(event.previous_item_id), + }, + ] + case `conversation.item.input_audio_transcription.delta`: + return [ + { + type: `input_transcript.delta`, + delta: String(event.delta ?? ``), + turnId: typeof event.item_id === `string` ? event.item_id : undefined, + }, + ] + case `conversation.item.input_audio_transcription.completed`: + return [ + { + type: `input_transcript.completed`, + text: String(event.transcript ?? ``), + turnId: typeof event.item_id === `string` ? event.item_id : undefined, + }, + ] + case `response.created`: + return [ + { + type: `response.started`, + responseId: openAIResponseId(event), + }, + ] + case `response.audio.delta`: + case `response.output_audio.delta`: + return [ + { + type: `output_audio.delta`, + audio: base64ToBytes(String(event.delta ?? ``)), + responseId: + typeof event.response_id === `string` + ? event.response_id + : undefined, + itemId: typeof event.item_id === `string` ? event.item_id : undefined, + }, + ] + case `response.audio.done`: + case `response.output_audio.done`: + return [ + { + type: `output_audio.completed`, + responseId: + typeof event.response_id === `string` + ? event.response_id + : undefined, + itemId: typeof event.item_id === `string` ? 
event.item_id : undefined, + }, + ] + case `response.audio_transcript.delta`: + case `response.output_audio_transcript.delta`: + case `response.output_text.delta`: + return [ + { + type: `output_transcript.delta`, + delta: String(event.delta ?? ``), + responseId: openAIString(event.response_id), + itemId: openAIString(event.item_id), + contentIndex: openAINumber(event.content_index), + transcriptSource: outputTranscriptSource(event), + }, + ] + case `response.audio_transcript.done`: + case `response.output_audio_transcript.done`: + case `response.output_text.done`: + return [ + { + type: `output_transcript.completed`, + text: + typeof event.transcript === `string` + ? event.transcript + : typeof event.text === `string` + ? event.text + : undefined, + responseId: openAIString(event.response_id), + itemId: openAIString(event.item_id), + contentIndex: openAINumber(event.content_index), + transcriptSource: outputTranscriptSource(event), + }, + ] + case `response.done`: + return [ + { + type: `response.completed`, + responseId: openAIResponseId(event), + }, + ] + case `response.cancelled`: + return [ + { + type: `response.cancelled`, + responseId: openAIResponseId(event), + }, + ] + case `response.output_item.added`: + if (event.item?.type !== `function_call`) return [] + return [ + { + type: `tool_call.started`, + toolCallId: String(event.item.call_id ?? event.item.id ?? ``), + name: String(event.item.name ?? ``), + }, + ] + case `response.function_call_arguments.delta`: + return [ + { + type: `tool_call.arguments_delta`, + toolCallId: String(event.call_id ?? event.item_id ?? ``), + delta: String(event.delta ?? ``), + }, + ] + default: + return [] + } +} + +export function createOpenAIRealtimeProvider( + opts: OpenAIRealtimeProviderOptions +): RealtimeProviderConfig { + const model = opts.model ?? DEFAULT_OPENAI_REALTIME_MODEL + + return { + id: `openai`, + model, + async connect(input): Promise<RealtimeProviderSession> { + const apiKey = + typeof opts.apiKey === `function` ?
await opts.apiKey() : opts.apiKey + if (!apiKey) { + throw new Error(`[agent-runtime] OpenAI realtime apiKey is required`) + } + + const WebSocketCtor = resolveWebSocket(opts) + const url = new URL(opts.url ?? `wss://api.openai.com/v1/realtime`) + url.searchParams.set(`model`, model) + const headers: Record<string, string> = { + Authorization: `Bearer ${apiKey}`, + ...opts.headers, + } + if (opts.safetyIdentifier) { + headers[`OpenAI-Safety-Identifier`] = opts.safetyIdentifier + } + + const ws = new WebSocketCtor(url.toString(), { headers }) + const queue = new AsyncEventQueue<RealtimeProviderEvent>() + const toolsByName = new Map( + input.tools.map((tool) => [toolName(tool), tool]) + ) + const seenProviderEventIds = new Set<string>() + let socketOpen = false + let socketClosed = false + let clientCloseRequested = false + let responseEpoch = 0 + let responseInFlight = false + let responseCreatePendingAck = false + let activeResponseId: string | undefined + let pendingResponseCreate = false + const finishedResponseIds: Array<string> = [] + let rejectOpen: ((error: Error) => void) | undefined + let sequentialToolQueue: Promise<void> = Promise.resolve() + + const runSequentialTool = async (fn: () => Promise<void>): Promise<void> => { + const run = sequentialToolQueue.then(fn, fn) + sequentialToolQueue = run.then( + () => undefined, + () => undefined + ) + return run + } + + const rememberFinishedResponse = (responseId: string | undefined) => { + if (!responseId || finishedResponseIds.includes(responseId)) return + finishedResponseIds.push(responseId) + if (finishedResponseIds.length > 32) { + finishedResponseIds.shift() + } + } + + const requestResponse = (): void => { + if (clientCloseRequested || socketClosed || input.signal?.aborted) { + return + } + if (responseInFlight) { + pendingResponseCreate = true + return + } + pendingResponseCreate = false + responseInFlight = true + responseCreatePendingAck = true + activeResponseId = undefined + sendJson(ws, { type: `response.create` }) + } + + const finishActiveResponse = (responseId?:
string): void => { + if (!responseInFlight) return + if ( + responseId !== undefined && + finishedResponseIds.includes(responseId) + ) { + return + } + if ( + activeResponseId !== undefined && + responseId !== undefined && + responseId !== activeResponseId + ) { + return + } + if ( + responseCreatePendingAck && + activeResponseId === undefined && + responseId === undefined + ) { + return + } + responseInFlight = false + responseCreatePendingAck = false + rememberFinishedResponse(responseId) + activeResponseId = undefined + if (pendingResponseCreate) { + requestResponse() + } + } + + const closeQueue = (reason?: string): void => { + if (socketClosed) return + socketClosed = true + queue.push({ type: `session.closed`, reason }) + queue.close() + input.signal?.removeEventListener(`abort`, handleAbort) + } + + const handleAbort = (): void => { + const error = new Error( + `[agent-runtime] OpenAI realtime WebSocket aborted` + ) + clientCloseRequested = true + closeQueue(`aborted`) + ws.close?.(1000, `aborted`) + if (!socketOpen) rejectOpen?.(error) + } + + const sendToolResult = async ( + result: RealtimeToolResult + ): Promise<void> => { + sendJson(ws, { + type: `conversation.item.create`, + item: { + type: `function_call_output`, + call_id: result.toolCallId, + output: toolResultOutput(result), + }, + }) + requestResponse() + } + + const executeToolCall = async ( + event: OpenAIRealtimeEvent + ): Promise<void> => { + const toolResponseEpoch = responseEpoch + const item = event.item ?? {} + const toolCallId = String( + event.call_id ?? item.call_id ?? item.id ?? event.item_id ?? `` + ) + const name = String(event.name ?? item.name ?? ``) + const args = parseToolArgs(event.arguments ??
item.arguments) + queue.push({ + type: `tool_call.arguments_completed`, + toolCallId, + name, + args, + }) + const tool = toolsByName.get(name) + if (!tool) { + const result: RealtimeToolResult = { + toolCallId, + name, + result: `Tool "${name}" is not available.`, + isError: true, + } + queue.push({ type: `tool_call.completed`, ...result }) + await sendToolResult(result) + return + } + + const executeAndSendToolResult = async (): Promise<void> => { + try { + const prepared = + typeof tool.prepareArguments === `function` + ? tool.prepareArguments(args) + : args + const result = await tool.execute( + toolCallId, + prepared as never, + input.signal + ) + const realtimeResult: RealtimeToolResult = { + toolCallId, + name, + result, + } + queue.push({ type: `tool_call.completed`, ...realtimeResult }) + if ( + clientCloseRequested || + socketClosed || + input.signal?.aborted || + toolResponseEpoch !== responseEpoch + ) { + return + } + await sendToolResult(realtimeResult) + } catch (error) { + const realtimeResult: RealtimeToolResult = { + toolCallId, + name, + result: error instanceof Error ? error.message : String(error), + isError: true, + } + queue.push({ type: `tool_call.completed`, ...realtimeResult }) + if ( + clientCloseRequested || + socketClosed || + input.signal?.aborted || + toolResponseEpoch !== responseEpoch + ) { + return + } + await sendToolResult(realtimeResult) + } + } + + if ( + (tool as { executionMode?: unknown }).executionMode === `sequential` + ) { + await runSequentialTool(executeAndSendToolResult) + return + } + + await executeAndSendToolResult() + } + + const opened = new Promise<void>((resolve, reject) => { + rejectOpen = reject + onSocket(ws, `open`, () => { + if (socketClosed) return + socketOpen = true + if (input.signal?.aborted) { + handleAbort() + return + } + resolve() + }) + onSocket(ws, `error`, (event) => { + const error = + event instanceof Error + ?
event + : new Error(`[agent-runtime] OpenAI realtime WebSocket error`) + input.signal?.removeEventListener(`abort`, handleAbort) + queue.fail(error) + reject(error) + }) + }) + + onSocket(ws, `message`, (...args) => { + try { + const parsed = JSON.parse( + dataToString(socketMessageData(args)) + ) as OpenAIRealtimeEvent + if (typeof parsed.event_id === `string`) { + if (seenProviderEventIds.has(parsed.event_id)) return + seenProviderEventIds.add(parsed.event_id) + } + if (parsed.type === `response.created`) { + responseEpoch += 1 + responseInFlight = true + responseCreatePendingAck = false + activeResponseId = openAIResponseId(parsed) + } + if (parsed.type === `response.function_call_arguments.done`) { + void executeToolCall(parsed).catch((error) => queue.fail(error)) + return + } + for (const event of mapOpenAIEvent(parsed)) queue.push(event) + if ( + parsed.type === `response.done` || + parsed.type === `response.cancelled` + ) { + finishActiveResponse(openAIResponseId(parsed)) + } + } catch (error) { + queue.fail(error) + } + }) + onSocket(ws, `close`, (...args) => { + const details = socketCloseDetails(args) + if (clientCloseRequested || input.signal?.aborted) { + closeQueue(details.reason || undefined) + return + } + queue.push({ + type: `session.error`, + code: `websocket_closed`, + error: socketCloseError(details), + }) + closeQueue(details.reason || `websocket_closed`) + }) + + if (input.signal?.aborted) { + handleAbort() + } else { + input.signal?.addEventListener(`abort`, handleAbort, { once: true }) + } + + await opened + sendJson(ws, buildSessionUpdate(opts, input)) + for (const message of input.messages) { + sendConversationMessage(ws, message) + } + + return { + events: queue, + appendInputAudio: async (chunk) => { + for (const appendChunk of inputAudioAppendChunks(chunk)) { + sendJson(ws, { + type: `input_audio_buffer.append`, + audio: bytesToBase64(appendChunk), + }) + } + }, + clearInputAudio: async () => { + sendJson(ws, { type: 
`input_audio_buffer.clear` }) + }, + commitInputAudio: async () => { + sendJson(ws, { type: `input_audio_buffer.commit` }) + requestResponse() + }, + sendText: async (text) => { + sendJson(ws, { + type: `conversation.item.create`, + item: { + type: `message`, + role: `user`, + content: [{ type: `input_text`, text }], + }, + }) + requestResponse() + }, + sendToolResult, + cancelResponse: async () => { + responseEpoch += 1 + sendJson(ws, { type: `response.cancel` }) + }, + truncateOutputAudio: async ({ itemId, audioEndMs }) => { + sendJson(ws, { + type: `conversation.item.truncate`, + item_id: itemId, + content_index: 0, + audio_end_ms: audioEndMs, + }) + }, + close: async (reason) => { + clientCloseRequested = true + closeQueue(reason) + ws.close?.(1000, reason) + }, + } + }, + } +} diff --git a/packages/agents-runtime/src/outbound-bridge.ts b/packages/agents-runtime/src/outbound-bridge.ts index 87c902df93..6d8600b98b 100644 --- a/packages/agents-runtime/src/outbound-bridge.ts +++ b/packages/agents-runtime/src/outbound-bridge.ts @@ -178,6 +178,18 @@ export interface OutboundBridge { onReasoningStart: () => void onReasoningDelta: (delta: string) => void onReasoningEnd: (opts?: { encrypted?: string; summaryTitle?: string }) => void + onToolCallArgsStart( + toolCallId: string, + name: string, + argsPreview?: unknown + ): void + onToolCallArgsDelta( + toolCallId: string, + name: string, + delta: string, + opts?: { contentIndex?: number; argsPreview?: unknown } + ): void + onToolCallArgsEnd(toolCallId: string, name: string, args: unknown): void onToolCallStart(toolCallId: string, name: string, args: unknown): void onToolCallStart(name: string, args: unknown): void onToolCallEnd( @@ -244,7 +256,7 @@ export function createOutboundBridge( let currentReasoningRunKey: string | null = null const toolCallsById = new Map< string, - { key: string; runKey: string; args: unknown } + { key: string; runKey: string; args: unknown; argSeq: number } >() const legacyToolCallIdsByName = 
new Map<string, Array<string>>() const requireActiveRun = (action: string): string => { @@ -255,6 +267,65 @@ export function createOutboundBridge( } return currentRunKey } + const ensureToolCall = ( + toolCallId: string, + name: string, + opts?: { + args?: unknown + argsPreview?: unknown + status?: `started` | `args_streaming` | `args_complete` | `executing` + } + ): { key: string; runKey: string; args: unknown; argSeq: number } => { + const runKey = requireActiveRun(`ensureToolCall`) + const existing = toolCallsById.get(toolCallId) + if (existing) { + if (opts && (`args` in opts || `argsPreview` in opts || opts.status)) { + const nextArgs = `args` in opts ? opts.args : existing.args + if (`args` in opts) existing.args = opts.args + writeEvent( + entityStateSchema.toolCalls.update({ + key: existing.key, + value: { + tool_call_id: toolCallId, + tool_name: name, + status: opts.status ?? `args_streaming`, + args: nextArgs, + ...(opts.argsPreview !== undefined && { + args_preview: opts.argsPreview, + }), + run_id: existing.runKey, + } as never, + }) as ChangeEvent + ) + } + return existing + } + const key = `tc-${counters.tc++}` + persistSeed() + const created = { + key, + runKey, + args: opts && `args` in opts ? opts.args : undefined, + argSeq: 0, + } + toolCallsById.set(toolCallId, created) + writeEvent( + entityStateSchema.toolCalls.insert({ + key, + value: { + tool_call_id: toolCallId, + tool_name: name, + status: opts?.status ??
`started`, + args: created.args, + ...(opts?.argsPreview !== undefined && { + args_preview: opts.argsPreview, + }), + run_id: runKey, + } as never, + }) as ChangeEvent + ) + return created + } return { onRunStart() { @@ -444,15 +515,69 @@ export function createOutboundBridge( currentReasoningRunKey = null }, + onToolCallArgsStart( + toolCallId: string, + name: string, + argsPreview?: unknown + ) { + ensureToolCall(toolCallId, name, { + status: `started`, + argsPreview, + }) + }, + + onToolCallArgsDelta( + toolCallId: string, + name: string, + delta: string, + opts?: { contentIndex?: number; argsPreview?: unknown } + ) { + let toolCall = toolCallsById.get(toolCallId) + if (toolCall) { + ensureToolCall(toolCallId, name, { + status: `args_streaming`, + argsPreview: opts?.argsPreview, + }) + } else { + toolCall = ensureToolCall(toolCallId, name, { + status: `args_streaming`, + argsPreview: opts?.argsPreview, + }) + } + const seq = toolCall.argSeq++ + writeEvent( + entityStateSchema.toolArgDeltas.insert({ + key: `${toolCall.key}:args-${seq}`, + value: { + tool_call_key: toolCall.key, + tool_call_id: toolCallId, + run_id: toolCall.runKey, + seq, + delta, + ...(opts?.contentIndex !== undefined && { + content_index: opts.contentIndex, + }), + } as never, + }) as ChangeEvent + ) + }, + + onToolCallArgsEnd(toolCallId: string, name: string, args: unknown) { + ensureToolCall(toolCallId, name, { + status: `args_complete`, + args, + }) + }, + onToolCallStart( toolCallIdOrName: string, nameOrArgs: string | unknown, maybeArgs?: unknown ) { - const runKey = requireActiveRun(`onToolCallStart`) - const key = `tc-${counters.tc++}` const legacyCall = maybeArgs === undefined - const toolCallId = legacyCall ? key : toolCallIdOrName + const toolCallId = legacyCall + ? `legacy-tc-${counters.tc}` + : toolCallIdOrName const name = legacyCall ? toolCallIdOrName : (nameOrArgs as string) const args = legacyCall ? 
nameOrArgs : maybeArgs if (legacyCall) { @@ -460,20 +585,11 @@ export function createOutboundBridge( ids.push(toolCallId) legacyToolCallIdsByName.set(name, ids) } - persistSeed() - toolCallsById.set(toolCallId, { key, runKey, args }) - writeEvent( - entityStateSchema.toolCalls.insert({ - key, - value: { - tool_call_id: toolCallId, - tool_name: name, - status: `started`, - args, - run_id: runKey, - } as never, - }) as ChangeEvent - ) + const existing = toolCallsById.has(toolCallId) + ensureToolCall(toolCallId, name, { + status: existing ? `executing` : `started`, + args, + }) }, onToolCallEnd( diff --git a/packages/agents-runtime/src/pi-adapter.ts b/packages/agents-runtime/src/pi-adapter.ts index 269805a1cb..a1f55691b1 100644 --- a/packages/agents-runtime/src/pi-adapter.ts +++ b/packages/agents-runtime/src/pi-adapter.ts @@ -21,7 +21,6 @@ import type { ChangeEvent } from '@durable-streams/state' import type { AgentEvent, AgentMessage, - AgentTool, StreamFn, } from '@mariozechner/pi-agent-core' import type { @@ -30,7 +29,12 @@ import type { Provider, SimpleStreamOptions, } from '@mariozechner/pi-ai' -import type { LLMContentBlock, LLMMessage, LLMMessageContent } from './types' +import type { + AgentTool, + LLMContentBlock, + LLMMessage, + LLMMessageContent, +} from './types' /** * Split a streamed reasoning blob into `{ title, body }`. 
@@ -269,6 +273,8 @@ export function createPiAgentAdapter( let reasoningStarted = false let reasoningAccum = `` let abortedRun = false + let activeRunSignal: AbortSignal | undefined + const pendingToolArgDeltaHooks = new Map<string, Promise<void>>() const model = resolvePiModel({ model: opts.model, @@ -284,11 +290,61 @@ export function createPiAgentAdapter( timeoutMs: modelTimeoutMs, maxRetries: modelMaxRetries, }) + const awaitToolArgDeltaHooks = async ( + toolCallId: string | undefined + ): Promise<void> => { + if (!toolCallId) return + await pendingToolArgDeltaHooks.get(toolCallId) + } + const enqueueToolArgDeltaHook = ( + tool: AgentTool, + context: { + toolCallId: string + toolName: string + contentIndex?: number + delta: string + argsPreview?: unknown + }, + logPrefix: string + ): void => { + if (!tool.onArgsDelta) return + const previous = + pendingToolArgDeltaHooks.get(context.toolCallId) ?? Promise.resolve() + const next = previous + .catch(() => undefined) + .then(async () => { + try { + await tool.onArgsDelta?.(context, activeRunSignal) + } catch (error) { + runtimeLog.warn( + logPrefix, + `streaming tool arg hook failed for ${context.toolName}:`, + error + ) + } + }) + pendingToolArgDeltaHooks.set(context.toolCallId, next) + void next.finally(() => { + if (pendingToolArgDeltaHooks.get(context.toolCallId) === next) { + pendingToolArgDeltaHooks.delete(context.toolCallId) + } + }) + } + const agentTools = opts.tools.map( + (tool): AgentTool => ({ + ...tool, + execute: (async (...args: Parameters<AgentTool[`execute`]>) => { + const toolCallId = typeof args[0] === `string` ?
args[0] : undefined + await awaitToolArgDeltaHooks(toolCallId) + return tool.execute(...args) + }) as AgentTool[`execute`], + }) + ) const agentOptions = { initialState: { systemPrompt: opts.systemPrompt, - tools: opts.tools as Array, + tools: agentTools as Array, messages: history as Array, model, }, @@ -347,7 +403,24 @@ export function createPiAgentAdapter( case `message_update`: { const assistantEvent = (event as Record) .assistantMessageEvent as - | { type: string; delta?: string } + | { + type: string + contentIndex?: number + delta?: string + toolCall?: { + id?: string + name?: string + arguments?: Record + } + partial?: { + content?: Array<{ + type?: string + id?: string + name?: string + arguments?: Record + }> + } + } | undefined if (assistantEvent?.type === `text_delta`) { if (!textStarted) { @@ -392,6 +465,62 @@ export function createPiAgentAdapter( reasoningStarted = false reasoningAccum = `` } + } else if ( + assistantEvent?.type === `toolcall_start` || + assistantEvent?.type === `toolcall_delta` || + assistantEvent?.type === `toolcall_end` + ) { + const contentIndex = assistantEvent.contentIndex + const partialToolCall = + typeof contentIndex === `number` + ? assistantEvent.partial?.content?.[contentIndex] + : undefined + const toolCall = assistantEvent.toolCall ?? partialToolCall + const toolCallId = toolCall?.id + const toolName = toolCall?.name + const argsPreview = toolCall?.arguments + if (toolCallId && toolName) { + if (assistantEvent.type === `toolcall_start`) { + bridge.onToolCallArgsStart( + toolCallId, + toolName, + argsPreview + ) + } else if (assistantEvent.type === `toolcall_delta`) { + const delta = assistantEvent.delta ?? 
`` + bridge.onToolCallArgsDelta(toolCallId, toolName, delta, { + contentIndex, + argsPreview, + }) + const tool = opts.tools.find( + (candidate) => candidate.name === toolName + ) + if (tool) { + enqueueToolArgDeltaHook( + tool, + { + toolCallId, + toolName, + contentIndex, + delta, + argsPreview, + }, + logPrefix + ) + } + } else { + bridge.onToolCallArgsEnd( + toolCallId, + toolName, + argsPreview + ) + } + } else { + runtimeLog.debug( + logPrefix, + `pi-adapter message_update missing tool call identity type=${assistantEvent.type}` + ) + } } else { runtimeLog.debug( logPrefix, @@ -626,6 +755,7 @@ export function createPiAgentAdapter( async run(input?: string, abortSignal?: AbortSignal): Promise { running = true abortedRun = false + activeRunSignal = abortSignal bridge.onRunStart() @@ -643,6 +773,7 @@ export function createPiAgentAdapter( settled = true clearAbortFallback() running = false + activeRunSignal = undefined abortSignal?.removeEventListener(`abort`, abortRun) unsubscribe() bridge.onRunEnd({ finishReason }) @@ -676,6 +807,7 @@ export function createPiAgentAdapter( if (settled) return settled = true running = false + activeRunSignal = undefined clearAbortFallback() abortSignal?.removeEventListener(`abort`, abortRun) unsubscribe() diff --git a/packages/agents-runtime/src/process-wake.ts b/packages/agents-runtime/src/process-wake.ts index edb7a850a8..5c7e1f4093 100644 --- a/packages/agents-runtime/src/process-wake.ts +++ b/packages/agents-runtime/src/process-wake.ts @@ -69,6 +69,7 @@ interface WakeDeltaWindow { } type FreshKind = `inbox` | `wake` +type LiveWakeHandler = (wake: WakeDeltaWindow) => boolean | Promise<boolean> interface ClaimCallbackResponse { ok: boolean @@ -330,6 +331,43 @@ function changeEventToWakeEvent( return null } +function combineWakeEvents( + events: Array, + fallbackSource: string +): { wakeEvent: WakeEvent; offset: string | null } | null { + const wakeEvents = events.filter((event) => event.type === `wake`) + if (wakeEvents.length === 0)
return null + if (wakeEvents.length === 1) { + const event = wakeEvents[0]! + const wakeEvent = changeEventToWakeEvent(event, fallbackSource) + return wakeEvent + ? { wakeEvent, offset: event.headers.offset ?? null } + : null + } + + const messages = wakeEvents + .map((event) => event.value as WakeMessage | undefined) + .filter((message): message is WakeMessage => message !== undefined) + const sources = [...new Set(messages.map((message) => message.source))] + + return { + wakeEvent: { + source: sources.length === 1 ? sources[0]! : fallbackSource, + type: `wake`, + fromOffset: 0, + toOffset: 0, + eventCount: wakeEvents.length, + payload: { + type: `wake_batch`, + sources, + wakes: messages, + changes: messages.flatMap((message) => message.changes ?? []), + }, + }, + offset: wakeEvents[0]!.headers.offset ?? null, + } +} + function selectWakeFromEvents( events: Array, fallbackSource: string, @@ -366,6 +404,16 @@ function selectWakeFromEvents( return null } +function selectWakeInputFromEvents( + events: Array, + fallbackSource: string, + preferredKind?: FreshKind +): { wakeEvent: WakeEvent; offset: string | null } | null { + return preferredKind === `wake` + ? 
combineWakeEvents(events, fallbackSource) + : selectWakeFromEvents(events, fallbackSource, preferredKind) +} + function createInFlightTracker() { let count = 0 let resolve: (() => void) | null = null @@ -584,8 +632,11 @@ export async function processWake( detachWrites?: () => Promise<void> close: () => void }> = [] + const electricToolCleanups: Array<() => void | Promise<void>> = [] let liveProcessError: Error | null = null let acceptLiveInputs = false + let liveWakeHandler: LiveWakeHandler | null = null + let liveWakeDrain: Promise<void> | null = null const handledSignalKeys = new Set() const compareOffsets = (left: string, right: string): number => { @@ -1045,7 +1096,7 @@ export async function processWake( return null } - const selectedWake = selectWakeFromEvents( + const selectedWake = selectWakeInputFromEvents( deltaEvents, entityUrl, selectedKind @@ -1076,6 +1127,54 @@ export async function processWake( } } + function scheduleLiveWakeDrain(): void { + if (!liveWakeHandler || liveWakeDrain || queuedNextWake || liveProcessError) + return + + liveWakeDrain = (async () => { + while (!queuedNextWake) { + const handler = liveWakeHandler + if (!handler) return + const nextWake = dequeueNextWakeFromPending() + if (!nextWake) return + + const accepted = await handler(nextWake) + if (!accepted) { + queuedNextWake = nextWake + clearIdleTimer() + idleController?.abort() + return + } + + handleRuntimeSideEffectEvents(nextWake.events) + setSafeAckOffset(nextWake.ackOffset) + } + })() + .catch((err) => { + failBackgroundWake(err, `LIVE_WAKE_HANDLER_FAILED`) + }) + .finally(() => { + liveWakeDrain = null + if ( + liveWakeHandler && + !liveProcessError && + pendingLiveBatches.length > 0 + ) { + scheduleLiveWakeDrain() + } + }) + } + + const registerLiveWakeHandler = (handler: LiveWakeHandler): (() => void) => { + liveWakeHandler = handler + scheduleLiveWakeDrain() + return () => { + if (liveWakeHandler === handler) { + liveWakeHandler = null + } + } + } + function handleLiveBatch(batch:
JsonBatch): void { if (!preloaded) { const changeEvents = toChangeEvents(batch) @@ -1108,7 +1207,11 @@ export async function processWake( ...batch, items: changeEvents as Array, }) - queueNextWakeIfReady() + if (liveWakeHandler) { + scheduleLiveWakeDrain() + } else { + queueNextWakeIfReady() + } } try { @@ -1616,7 +1719,11 @@ export async function processWake( ) const initialFromCatchUp = notification.wakeEvent ? null - : selectWakeFromEvents(actionableEventsAtOrAfterNotification, entityUrl) + : selectWakeInputFromEvents( + actionableEventsAtOrAfterNotification, + entityUrl, + getFreshKind(actionableEventsAtOrAfterNotification) ?? undefined + ) if (initialFromCatchUp) { currentWakeEvent = initialFromCatchUp.wakeEvent currentWakeOffset = initialFromCatchUp.offset ?? notificationOffset @@ -1974,7 +2081,7 @@ export async function processWake( if (!skipInitialHandlerPass) { await waitForCurrentWakeInput() currentWakeEvents = computeCurrentNotificationEvents() - const initialWake = selectWakeFromEvents( + const initialWake = selectWakeInputFromEvents( currentWakeEvents, entityUrl, getFreshKind(currentWakeEvents) ?? undefined @@ -2078,6 +2185,7 @@ export async function processWake( ? 
await config.createElectricTools({ entityUrl, entityType: typeName, + principal: notification.principal, args: entityArgs, db, events: currentWakeEvents, @@ -2107,6 +2215,22 @@ export async function processWake( entityUrl, ...opts, }), + createMarkdownDocument: (opts) => + serverClient.createMarkdownDocument({ + entityUrl, + ...opts, + }), + getMarkdownDocumentConnection: (streamPath) => + serverClient.getMarkdownDocumentConnection(streamPath), + readMarkdownDocumentStream: (streamPath, opts) => + serverClient.readMarkdownDocumentStream(streamPath, opts), + appendMarkdownDocumentUpdate: (streamPath, update) => + serverClient.appendMarkdownDocumentUpdate(streamPath, update), + appendMarkdownDocumentAwareness: (streamPath, update) => + serverClient.appendMarkdownDocumentAwareness(streamPath, update), + registerCleanup: (cleanup) => { + electricToolCleanups.push(cleanup) + }, }) : [] @@ -2149,6 +2273,11 @@ export async function processWake( activeSignalHandler = handler }, hydratedWebhookSourceWake: await hydrateCurrentWebhookSourceWake(), + realtimeStreams: { + baseUrl, + headers: serverHeaders, + }, + registerLiveWakeHandler, doObserve, doSpawn, doFork, @@ -2309,8 +2438,9 @@ export async function processWake( break } - const nextWake = dequeueNextWakeFromPending() + const nextWake = queuedNextWake ?? dequeueNextWakeFromPending() if (nextWake) { + queuedNextWake = null log.info( debugWakeTypes ? 
`fresh work already pending, continuing in-process (type=${nextWake.wakeEvent.type}, offset=${nextWake.wakeOffset}, ack=${nextWake.ackOffset})` @@ -2352,6 +2482,11 @@ export async function processWake( if (heartbeat) { clearInterval(heartbeat) } + try { + await liveWakeDrain + } catch (err) { + cleanupErrors.push(toError(err)) + } try { await io.drain() } catch (err) { @@ -2374,6 +2509,13 @@ export async function processWake( } catch (err) { cleanupErrors.push(toError(err)) } + for (const cleanup of electricToolCleanups.splice(0).reverse()) { + try { + await cleanup() + } catch (err) { + cleanupErrors.push(toError(err)) + } + } // Updated by the handler-error path before control reaches this async cleanup. if (ackCurrentWakeOnFailure && cleanupErrors.length === 0) { diff --git a/packages/agents-runtime/src/realtime-options.ts b/packages/agents-runtime/src/realtime-options.ts new file mode 100644 index 0000000000..49e138cd48 --- /dev/null +++ b/packages/agents-runtime/src/realtime-options.ts @@ -0,0 +1,144 @@ +export type RealtimeProviderId = `openai` + +export type RealtimeModelChoice = { + id: string + label: string + description: string + recommended?: boolean +} + +export type RealtimeVoiceChoice = { + id: string + label: string + description: string + recommended?: boolean +} + +export type OpenAIRealtimeReasoningEffort = `low` | `medium` | `high` + +export type RealtimeReasoningEffortChoice = { + id: OpenAIRealtimeReasoningEffort + label: string + description: string + recommended?: boolean +} + +export const DEFAULT_OPENAI_REALTIME_MODEL = `gpt-realtime-2` +export const DEFAULT_OPENAI_REALTIME_VOICE = `marin` +export const DEFAULT_OPENAI_REALTIME_REASONING_EFFORT: OpenAIRealtimeReasoningEffort = `low` + +export const OPENAI_REALTIME_MODELS = [ + { + id: `gpt-realtime-2`, + label: `GPT-Realtime-2`, + description: `Strongest realtime reasoning, tool use, and instruction following.`, + recommended: true, + }, + { + id: `gpt-realtime-1.5`, + label: 
`GPT-Realtime-1.5`, + description: `Fast, reliable speech-to-speech model for audio in, audio out.`, + }, + { + id: `gpt-realtime-mini`, + label: `GPT-Realtime mini`, + description: `Cost-efficient realtime voice model.`, + }, +] as const satisfies ReadonlyArray<RealtimeModelChoice> + +export const OPENAI_REALTIME_VOICES = [ + { + id: `marin`, + label: `Marin`, + description: `OpenAI recommended voice with the strongest naturalness.`, + recommended: true, + }, + { + id: `cedar`, + label: `Cedar`, + description: `OpenAI recommended voice with a distinct, expressive tone.`, + recommended: true, + }, + { + id: `alloy`, + label: `Alloy`, + description: `Balanced general-purpose voice.`, + }, + { + id: `ash`, + label: `Ash`, + description: `Clear general-purpose voice.`, + }, + { + id: `ballad`, + label: `Ballad`, + description: `Warm general-purpose voice.`, + }, + { + id: `coral`, + label: `Coral`, + description: `Bright general-purpose voice.`, + }, + { + id: `echo`, + label: `Echo`, + description: `Steady general-purpose voice.`, + }, + { + id: `sage`, + label: `Sage`, + description: `Calm general-purpose voice.`, + }, + { + id: `shimmer`, + label: `Shimmer`, + description: `Light general-purpose voice.`, + }, + { + id: `verse`, + label: `Verse`, + description: `Expressive general-purpose voice.`, + }, +] as const satisfies ReadonlyArray<RealtimeVoiceChoice> + +export const OPENAI_REALTIME_REASONING_EFFORTS = [ + { + id: `low`, + label: `Low`, + description: `Lowest recommended latency for production voice agents.`, + recommended: true, + }, + { + id: `medium`, + label: `Medium`, + description: `More reasoning for harder requests, with higher latency.`, + }, + { + id: `high`, + label: `High`, + description: `Deepest reasoning; use only when latency is acceptable.`, + }, +] as const satisfies ReadonlyArray<RealtimeReasoningEffortChoice> + +export function isOpenAIRealtimeModel(value: unknown): value is string { + return ( + typeof value === `string` && + OPENAI_REALTIME_MODELS.some((model) => model.id === value) + ) +} + +export function
isOpenAIRealtimeVoice(value: unknown): value is string { + return ( + typeof value === `string` && + OPENAI_REALTIME_VOICES.some((voice) => voice.id === value) + ) +} + +export function isOpenAIRealtimeReasoningEffort( + value: unknown +): value is OpenAIRealtimeReasoningEffort { + return ( + typeof value === `string` && + OPENAI_REALTIME_REASONING_EFFORTS.some((effort) => effort.id === value) + ) +} diff --git a/packages/agents-runtime/src/realtime.ts b/packages/agents-runtime/src/realtime.ts new file mode 100644 index 0000000000..5916d4ebd6 --- /dev/null +++ b/packages/agents-runtime/src/realtime.ts @@ -0,0 +1,42 @@ +import type { RealtimeProviderConfig, RealtimeProviderEvent } from './types' + +export interface TestRealtimeProviderOptions { + model?: string + events?: Array<RealtimeProviderEvent> + response?: string +} + +export function createTestRealtimeProvider( + opts: TestRealtimeProviderOptions = {} +): RealtimeProviderConfig { + return { + id: `test`, + model: opts.model ?? `test-realtime`, + async connect() { + const events = + opts.events ?? + (opts.response != null + ?
[ + { type: `session.started` as const }, + { + type: `output_transcript.completed` as const, + text: opts.response, + }, + { type: `response.completed` as const }, + { type: `session.closed` as const }, + ] + : [ + { type: `session.started` as const }, + { type: `session.closed` as const }, + ]) + + return { + events: (async function* () { + for (const event of events) { + yield event + } + })(), + } + }, + } +} diff --git a/packages/agents-runtime/src/runtime-server-client.ts b/packages/agents-runtime/src/runtime-server-client.ts index ec7f713166..61d9abc2d4 100644 --- a/packages/agents-runtime/src/runtime-server-client.ts +++ b/packages/agents-runtime/src/runtime-server-client.ts @@ -9,9 +9,13 @@ import type { AttachmentCreateInput, ClaimTokenHeader, HeadersProvider, + MarkdownDocumentConnection, ManifestAttachmentEntry, + ManifestDocumentEntry, } from './types' import type { EntitySignal } from './entity-schema' +import type { RealtimeSessionStreamRefs } from './entity-schema' +import type { OpenAIRealtimeReasoningEffort } from './realtime-options' import type { WebhookSourceContract, WebhookSourceSubscription, @@ -21,6 +25,13 @@ export type { EntitySignal } from './entity-schema' const ELECTRIC_PRINCIPAL_HEADER = `electric-principal` +function bytesBody(bytes: Uint8Array): ArrayBuffer { + return bytes.buffer.slice( + bytes.byteOffset, + bytes.byteOffset + bytes.byteLength + ) as ArrayBuffer +} + export interface RuntimeServerClientConfig { baseUrl: string fetch?: typeof globalThis.fetch @@ -95,6 +106,38 @@ export interface SendEntityMessageOptions { writeToken?: string } +export interface RealtimeAudioOptions { + codec?: `pcm16` + sampleRate?: number + channels?: number +} + +export interface StartRealtimeSessionOptions { + entityUrl: string + id?: string + provider: string + model: string + voice?: string + reasoningEffort?: OpenAIRealtimeReasoningEffort + interruptResponse?: boolean + inputAudio?: RealtimeAudioOptions + outputAudio?: RealtimeAudioOptions + 
meta?: Record<string, unknown> +} + +export interface RealtimeSessionStartResult { + sessionId: string + entityUrl: string + provider: string + model: string + voice?: string + reasoningEffort?: OpenAIRealtimeReasoningEffort + interruptResponse?: boolean + status: `requested` + startedAt: string + streams: RealtimeSessionStreamRefs +} + export interface RegisterWakeOptions { subscriberUrl: string sourceUrl: string @@ -120,6 +163,9 @@ export interface SignalEntityOptions { export interface RuntimeServerClient { sendEntityMessage: (options: SendEntityMessageOptions) => Promise + startRealtimeSession: ( + options: StartRealtimeSessionOptions + ) => Promise<RealtimeSessionStartResult> createAttachment: (options: { entityUrl: string attachment: AttachmentCreateInput @@ -128,6 +174,27 @@ export interface RuntimeServerClient { entityUrl: string id: string }) => Promise<Uint8Array> + createMarkdownDocument: (options: { + entityUrl: string + id?: string + title: string + meta?: Record<string, unknown> + }) => Promise<{ txid: string; document: ManifestDocumentEntry }> + getMarkdownDocumentConnection: ( + streamPath: string + ) => Promise<MarkdownDocumentConnection> + readMarkdownDocumentStream: ( + streamPath: string, + opts?: { offset?: string } + ) => Promise<{ bytes: Uint8Array; offset?: string }> + appendMarkdownDocumentUpdate: ( + streamPath: string, + update: Uint8Array + ) => Promise<{ offset?: string }> + appendMarkdownDocumentAwareness: ( + streamPath: string, + update: Uint8Array + ) => Promise<{ offset?: string }> spawnEntity: (options: SpawnEntityOptions) => Promise /** * Fork an entity at the server-resolved `latest_completed_run` anchor.
@@ -386,6 +453,24 @@ export function createRuntimeServerClient( } } + const startRealtimeSession = async ( + options: StartRealtimeSessionOptions + ): Promise<RealtimeSessionStartResult> => { + const response = await request(`/_electric/realtime/sessions`, { + method: `POST`, + headers: { 'content-type': `application/json` }, + body: JSON.stringify(options), + }) + + if (!response.ok) { + throw new Error( + `startRealtimeSession ${options.entityUrl} failed (${response.status}): ${await readErrorText(response)}` + ) + } + + return (await response.json()) as RealtimeSessionStartResult + } + const createAttachment = async ({ entityUrl, attachment, @@ -457,6 +542,129 @@ export function createRuntimeServerClient( return new Uint8Array(await response.arrayBuffer()) } + const createMarkdownDocument = async ({ + entityUrl, + id, + title, + meta, + }: { + entityUrl: string + id?: string + title: string + meta?: Record<string, unknown> + }): Promise<{ txid: string; document: ManifestDocumentEntry }> => { + const response = await request(`${entityRpcPath(entityUrl)}/documents`, { + method: `POST`, + headers: { 'content-type': `application/json` }, + body: JSON.stringify({ id, title, meta }), + }) + if (!response.ok) { + throw new Error( + `create markdown document on ${entityUrl} failed (${response.status}): ${await readErrorText(response)}` + ) + } + return (await response.json()) as { + txid: string + document: ManifestDocumentEntry + } + } + + const getMarkdownDocumentConnection = async ( + streamPath: string + ): Promise<MarkdownDocumentConnection> => { + const docsIndex = streamPath.indexOf(`/docs/`) + if (docsIndex < 0) { + throw new Error( + `markdown document stream path is missing /docs/: ${streamPath}` + ) + } + const prefix = streamPath.slice(0, docsIndex) + const docId = streamPath.slice(docsIndex + `/docs/`.length) + const headers = await resolveHeaders() + if (config.principalKey) { + headers.set(ELECTRIC_PRINCIPAL_HEADER, config.principalKey) + } + const headerRecord: Record<string, string> = {} + headers.forEach((value, key) => {
headerRecord[key] = value + }) + return { + baseUrl: appendPathToUrl(config.baseUrl, prefix).replace(/\/+$/, ``), + docId, + headers: headerRecord, + } + } + + const readMarkdownDocumentStream = async ( + streamPath: string, + opts?: { offset?: string } + ): Promise<{ bytes: Uint8Array; offset?: string }> => { + const url = new URL(streamPath, `http://agent-runtime.local`) + if (opts?.offset !== undefined) { + url.searchParams.set(`offset`, opts.offset) + } + const path = `${url.pathname}${url.search}` + const response = await request(path, { method: `GET` }) + if (!response.ok) { + throw new Error( + `read markdown document stream ${path} failed (${response.status}): ${await readErrorText(response)}` + ) + } + return { + bytes: new Uint8Array(await response.arrayBuffer()), + offset: response.headers.get(`stream-next-offset`) ?? undefined, + } + } + + const appendMarkdownDocumentUpdate = async ( + streamPath: string, + update: Uint8Array + ): Promise<{ offset?: string }> => { + const response = await request(streamPath, { + method: `POST`, + headers: { 'content-type': `application/octet-stream` }, + body: bytesBody(update), + }) + if (!response.ok) { + throw new Error( + `append markdown document update ${streamPath} failed (${response.status}): ${await readErrorText(response)}` + ) + } + return { + offset: response.headers.get(`stream-next-offset`) ?? 
undefined, + } + } + + const appendMarkdownDocumentAwareness = async ( + streamPath: string, + update: Uint8Array + ): Promise<{ offset?: string }> => { + const awarenessPath = `${streamPath}?awareness=default` + const append = () => + request(awarenessPath, { + method: `POST`, + headers: { 'content-type': `application/octet-stream` }, + body: bytesBody(update), + }) + let response = await append() + if (response.status === 404) { + response = await request(awarenessPath, { + method: `PUT`, + headers: { 'content-type': `application/octet-stream` }, + body: bytesBody(update), + }) + if (response.status === 409) response = await append() + } + if (!response.ok) { + throw new Error( + `append markdown document awareness ${streamPath} failed (${response.status}): ${await readErrorText(response)}` + ) + } + return { + offset: response.headers.get(`stream-next-offset`) ?? undefined, + } + } + const getEntity = async (entityUrl: string): Promise => { const response = await request(entityRpcPath(entityUrl), { method: `GET` }) if (!response.ok) { @@ -940,8 +1148,14 @@ export function createRuntimeServerClient( return { sendEntityMessage, + startRealtimeSession, createAttachment, readAttachment, + createMarkdownDocument, + getMarkdownDocumentConnection, + readMarkdownDocumentStream, + appendMarkdownDocumentUpdate, + appendMarkdownDocumentAwareness, spawnEntity, forkEntity, getEntity, diff --git a/packages/agents-runtime/src/setup-context.ts b/packages/agents-runtime/src/setup-context.ts index e4187c3ba5..29f328e5d0 100644 --- a/packages/agents-runtime/src/setup-context.ts +++ b/packages/agents-runtime/src/setup-context.ts @@ -240,6 +240,16 @@ export function createSetupContext( executeSend: executeSendFn, } = config let inSetup = true + let inlineSpawnQueue: Promise = Promise.resolve() + + const runInlineSpawn = async (fn: () => Promise): Promise => { + const run = inlineSpawnQueue.then(fn, fn) + inlineSpawnQueue = run.then( + () => undefined, + () => undefined + ) + return 
run + } const dbActions = ( @@ -999,61 +1009,63 @@ export function createSetupContext( // ---- Inline wiring (production path) ---- if (wiring) { - // Register spawn handle FIRST, then manifest entry - wakeSession.registerSpawnHandle(id, { - wireDb: () => {}, - updateEntityUrl: (newUrl: string) => { - realEntityUrl = newUrl - wakeSession.registerManifestEntry(childRow(newUrl)) - }, - }) - // Check dedup before creating child - const existingChild = await queryOnce((q) => - q - .from({ manifests: db.collections.manifests }) - .where(({ manifests }) => eq(manifests.key, childKey)) - .findOne() - ) - if (existingChild?.kind === `child`) { - throw new Error( - `[agent-runtime] child "${type}:${id}" already exists — use observe(entity("${existingChild.entity_url}")) to get a handle` + return runInlineSpawn(async () => { + // Register spawn handle FIRST, then manifest entry + wakeSession.registerSpawnHandle(id, { + wireDb: () => {}, + updateEntityUrl: (newUrl: string) => { + realEntityUrl = newUrl + wakeSession.registerManifestEntry(childRow(newUrl)) + }, + }) + // Check dedup before creating child + const existingChild = await queryOnce((q) => + q + .from({ manifests: db.collections.manifests }) + .where(({ manifests }) => eq(manifests.key, childKey)) + .findOne() ) - } - - try { - const { entityUrl: childUrl, streamPath } = - await wiring.createOrGetChild( - type, - id, - spawnArgs ?? 
{}, - entityUrl, - { - initialMessage: opts?.initialMessage, - initialMessageType: opts?.initialMessageType, - wake: opts?.wake, - tags: opts?.tags, - sandbox: opts?.sandbox, - } + if (existingChild?.kind === `child`) { + throw new Error( + `[agent-runtime] child "${type}:${id}" already exists — use observe(entity("${existingChild.entity_url}")) to get a handle` ) - realEntityUrl = childUrl - wakeSession.registerManifestEntry(childRow(childUrl)) + } - if (observeChild) { - const childDb = await wiring.createChildDb( - `${config.serverBaseUrl}${streamPath}`, - type, - () => {} - ) - handle.db = childDb + try { + const { entityUrl: childUrl, streamPath } = + await wiring.createOrGetChild( + type, + id, + spawnArgs ?? {}, + entityUrl, + { + initialMessage: opts?.initialMessage, + initialMessageType: opts?.initialMessageType, + wake: opts?.wake, + tags: opts?.tags, + sandbox: opts?.sandbox, + } + ) + realEntityUrl = childUrl + wakeSession.registerManifestEntry(childRow(childUrl)) + + if (observeChild) { + const childDb = await wiring.createChildDb( + `${config.serverBaseUrl}${streamPath}`, + type, + () => {} + ) + handle.db = childDb + } + } catch (err) { + spawnError = err instanceof Error ? err : new Error(String(err)) + throw spawnError } - } catch (err) { - spawnError = err instanceof Error ? 
err : new Error(String(err)) - throw spawnError - } - observeHandleCache.set(realEntityUrl, handle) + observeHandleCache.set(realEntityUrl, handle) - return handle + return handle + }) } // ---- Deferred wiring (unit test path) ---- diff --git a/packages/agents-runtime/src/timeline-context.ts b/packages/agents-runtime/src/timeline-context.ts index 461430da4a..39c8030c3f 100644 --- a/packages/agents-runtime/src/timeline-context.ts +++ b/packages/agents-runtime/src/timeline-context.ts @@ -7,6 +7,7 @@ import type { IncludesContextInserted, IncludesContextRemoved, IncludesInboxMessage, + IncludesRealtimeTranscript, IncludesRun, IncludesSignal, IncludesWakeMessage, @@ -69,12 +70,14 @@ export function buildTimelineMessages(input: { inbox: Array wakes?: Array signals?: Array + realtimeTranscripts?: Array }): Array { return materializeTimeline({ runs: input.runs, inbox: input.inbox, wakes: input.wakes ?? [], signals: input.signals ?? [], + realtimeTranscripts: input.realtimeTranscripts ?? [], contextInserted: [], contextRemoved: [], entities: [], @@ -194,6 +197,21 @@ function renderSignalMessage(signal: Signal): LLMMessage { } } +function isRealtimeSessionWake(payload: unknown): boolean { + if (!payload || typeof payload !== `object`) return false + const changes = (payload as { changes?: unknown }).changes + if (!Array.isArray(changes)) return false + return changes.some((change) => { + if (!change || typeof change !== `object`) return false + const payload = (change as { payload?: unknown }).payload + return ( + !!payload && + typeof payload === `object` && + (payload as { type?: unknown }).type === `realtime_session.started` + ) + }) +} + export function defaultProjection( item: TimelineItem ): Array | null { @@ -202,11 +220,22 @@ export function defaultProjection( return [{ role: `user`, content: projectInboxPayload(item) }] case `wake`: + if (isRealtimeSessionWake(item.payload)) return null return [{ role: `user`, content: asString(item.payload) }] case `signal`: 
return [renderSignalMessage(item.signal)] + case `realtime_transcript`: + if (item.text.length === 0) return null + if (item.status !== `final`) return null + return [ + { + role: item.direction === `input` ? `user` : `assistant`, + content: item.text, + }, + ] + case `run`: { const messages: Array = [] @@ -341,6 +370,11 @@ export function materializeTimeline( | { kind: `inbox`; order: TimelineOrder; item: IncludesInboxMessage } | { kind: `wake`; order: TimelineOrder; item: IncludesWakeMessage } | { kind: `signal`; order: TimelineOrder; item: IncludesSignal } + | { + kind: `realtime_transcript` + order: TimelineOrder + item: IncludesRealtimeTranscript + } | { kind: `run`; order: TimelineOrder; item: IncludesRun } | { kind: `context_inserted` @@ -371,6 +405,13 @@ export function materializeTimeline( order: item.order, item, })), + ...(data.realtimeTranscripts ?? []) + .filter((item) => item.text.length > 0) + .map((item) => ({ + kind: `realtime_transcript` as const, + order: item.order, + item, + })), ...data.runs.map((item) => ({ kind: `run` as const, order: item.order, @@ -429,6 +470,17 @@ export function materializeTimeline( signal: entry.item, } + case `realtime_transcript`: + return { + kind: `realtime_transcript`, + at: orderToOffset(entry.order), + key: entry.item.key, + sessionId: entry.item.session_id, + direction: entry.item.direction, + text: entry.item.text, + status: entry.item.status, + } + case `run`: return materializeRunItem(entry.item) diff --git a/packages/agents-runtime/src/tools.ts b/packages/agents-runtime/src/tools.ts index e0026c6bf9..a8132d3c4a 100644 --- a/packages/agents-runtime/src/tools.ts +++ b/packages/agents-runtime/src/tools.ts @@ -8,3 +8,4 @@ export { createScheduleTools } from './tools/schedules' export { createWebhookSourceTools } from './tools/webhook-sources' export { createSendTool } from './tools/send' export { createMarkGoalCompleteTool } from './tools/goal-tools' +export { createMarkdownDocumentTools } from 
'./tools/markdown-docs' diff --git a/packages/agents-runtime/src/tools/markdown-docs.ts b/packages/agents-runtime/src/tools/markdown-docs.ts new file mode 100644 index 0000000000..d9354db5fc --- /dev/null +++ b/packages/agents-runtime/src/tools/markdown-docs.ts @@ -0,0 +1,1064 @@ +import { createTwoFilesPatch } from 'diff' +import { Type } from '@sinclair/typebox' +import { + markdownIndexFromRelativePosition, + relativePositionAtMarkdownIndex, +} from '../markdown-yjs' +import { + openMarkdownDocumentSession, + type MarkdownDocumentSession, +} from '../markdown-document-session' +import type { AgentTool, ProcessWakeConfig } from '../types' +import type { ManifestDocumentEntry } from '../entity-schema' +import * as Y from 'yjs' + +type ElectricToolContextBase = Parameters< + NonNullable +>[0] + +type ElectricToolContext = ElectricToolContextBase & { + openMarkdownDocumentSession?: (opts: { + document: ManifestDocumentEntry + entityUrl: string + principal: ElectricToolContextBase[`principal`] + }) => Promise +} + +function docLabel(id: string): string { + return `markdown-doc:${id}` +} + +type InsertMarkdownArgs = { + id: string + content: string + index?: number +} + +type ReplaceMarkdownArgs = { + id: string + content: string + old_string?: string + occurrence?: number + index?: number + length?: number +} + +type SetCursorArgs = { + id: string + index?: number + before?: string + after?: string + occurrence?: number +} + +type InsertSession = { + id?: string + inserted: string + nextIndex?: number + nextPosition?: Y.RelativePosition + seq: number + streamed: boolean + pending: Promise + error?: unknown +} + +type ReplaceSession = InsertSession & { + prepared?: boolean + deleted?: string + deleteIndex?: number + deleteLength?: number + beforeContent?: string +} + +type MaterializedMarkdownDocument = { + document: ManifestDocumentEntry + session: MarkdownDocumentSession + textName: string +} + +function injectedMarkdownDocuments( + args: Readonly> +): Array { + 
const docs = args.markdownDocs + if (!Array.isArray(docs)) return [] + return docs.filter(isManifestDocumentEntry) +} + +function isManifestDocumentEntry( + value: unknown +): value is ManifestDocumentEntry { + if (!value || typeof value !== `object`) return false + const entry = value as Partial + return ( + entry.kind === `document` && + typeof entry.id === `string` && + entry.provider === `y-durable-streams` && + typeof entry.docPath === `string` && + typeof entry.streamPath === `string` && + entry.transportMimeType === + `application/vnd.electric-agents.markdown-yjs` && + entry.contentMimeType === `text/markdown` && + entry.yTextName === `markdown` && + typeof entry.title === `string` + ) +} + +function asInsertArgs(value: unknown): Partial { + if (!value || typeof value !== `object`) return {} + const input = value as Record + return { + ...(typeof input.id === `string` && { id: input.id }), + ...(typeof input.content === `string` && { content: input.content }), + ...(typeof input.index === `number` && Number.isFinite(input.index) + ? { index: input.index } + : {}), + } +} + +function asReplaceArgs(value: unknown): Partial { + if (!value || typeof value !== `object`) return {} + const input = value as Record + return { + ...(typeof input.id === `string` && { id: input.id }), + ...(typeof input.content === `string` && { content: input.content }), + ...(typeof input.old_string === `string` && { + old_string: input.old_string, + }), + ...(typeof input.occurrence === `number` && + Number.isFinite(input.occurrence) + ? { occurrence: input.occurrence } + : {}), + ...(typeof input.index === `number` && Number.isFinite(input.index) + ? { index: input.index } + : {}), + ...(typeof input.length === `number` && Number.isFinite(input.length) + ? 
{ length: input.length } + : {}), + } +} + +export function createMarkdownDocumentTools( + context: ElectricToolContext +): Array { + const readDocs = new Map() + const insertSessions = new Map() + const replaceSessions = new Map() + const materializedDocs = new Map() + const cursorPositions = new Map() + + const findManifestDocument = ( + id: string + ): ManifestDocumentEntry | undefined => { + const manifests = context.db.collections.manifests?.toArray as + | Array + | undefined + return ( + manifests?.find( + (entry): entry is ManifestDocumentEntry => + isManifestDocumentEntry(entry) && entry.id === id + ) ?? + injectedMarkdownDocuments(context.args).find( + (entry): entry is ManifestDocumentEntry => + isManifestDocumentEntry(entry) && entry.id === id + ) + ) + } + + context.registerCleanup(async () => { + const sessions = Array.from(materializedDocs.values()) + materializedDocs.clear() + await Promise.all( + sessions.map((materialized) => materialized.session.close()) + ) + }) + + const openDocumentSession = async ( + document: ManifestDocumentEntry + ): Promise => { + const cached = materializedDocs.get(document.id) + if (cached) return cached + const session = context.openMarkdownDocumentSession + ? 
await context.openMarkdownDocumentSession({ + document, + entityUrl: context.entityUrl, + principal: context.principal, + }) + : await openMarkdownDocumentSession({ + document, + connection: await context.getMarkdownDocumentConnection( + document.streamPath + ), + entityUrl: context.entityUrl, + principal: context.principal, + }) + const materialized = { + document, + session, + textName: document.yTextName, + } + materializedDocs.set(document.id, materialized) + readDocs.set(document.id, contentOf(materialized)) + return materialized + } + + const materializeDocument = async ( + id: string + ): Promise => { + const cached = materializedDocs.get(id) + if (cached) return cached + const document = findManifestDocument(id) + if (!document) { + throw new Error( + `Markdown document ${JSON.stringify( + id + )} is not in this entity's manifest or injected document refs. Create it with create_markdown_doc first or pass the document ref to this worker.` + ) + } + return openDocumentSession(document) + } + + const contentOf = (materialized: MaterializedMarkdownDocument): string => + materialized.session.content() + + const appendPresence = async ( + materialized: MaterializedMarkdownDocument, + opts: { anchor?: number; head?: number; clear?: boolean } + ): Promise => { + await materialized.session.setPresence(opts).catch(() => undefined) + } + + const applyInsertChunk = async ( + id: string, + chunk: string, + session: InsertSession, + index?: number + ): Promise => { + const materialized = await materializeDocument(id) + const text = materialized.session.text + const position = + session.nextPosition ?? + (index === undefined ? cursorPositions.get(id) : undefined) + const absolute = position + ? Y.createAbsolutePositionFromRelativePosition( + position, + materialized.session.doc + ) + : null + const insertIndex = + absolute && absolute.type === text + ? Math.max(0, Math.min(absolute.index, text.length)) + : Math.max( + 0, + Math.min( + session.nextIndex ?? 
(index !== undefined ? index : text.length), + text.length + ) + ) + if (chunk.length > 0) { + materialized.session.doc.transact(() => { + text.insert(insertIndex, chunk) + }, `agent`) + } + const nextIndex = insertIndex + chunk.length + const nextPosition = Y.createRelativePositionFromTypeIndex(text, nextIndex) + await appendPresence(materialized, { + anchor: nextIndex, + head: nextIndex, + }) + session.nextIndex = nextIndex + session.nextPosition = nextPosition + cursorPositions.set(id, nextPosition) + session.streamed = true + readDocs.set(id, contentOf(materialized)) + } + + const setCursor = async ( + id: string, + index: number + ): Promise<{ materialized: MaterializedMarkdownDocument; index: number }> => { + const materialized = await materializeDocument(id) + const text = materialized.session.text + const boundedIndex = Math.max(0, Math.min(index, text.length)) + const position = relativePositionAtMarkdownIndex( + materialized.session.doc, + boundedIndex, + materialized.textName + ) + cursorPositions.set(id, position) + return { materialized, index: boundedIndex } + } + + const resolveCursorIndex = ( + content: string, + args: SetCursorArgs + ): { index?: number; error?: string } => { + const locatorCount = + (args.index !== undefined ? 1 : 0) + + (args.before !== undefined ? 1 : 0) + + (args.after !== undefined ? 1 : 0) + if (locatorCount > 1) { + return { error: `Pass only one of index, before, or after.` } + } + if (args.index !== undefined) return { index: args.index } + const needle = args.before ?? args.after + if (needle === undefined) return { index: content.length } + if (needle.length === 0) { + return { error: `before/after must not be empty.` } + } + const occurrence = Math.max(1, Math.floor(args.occurrence ?? 
1)) + let from = 0 + let found = -1 + for (let count = 0; count < occurrence; count += 1) { + found = content.indexOf(needle, from) + if (found < 0) { + return { + error: `Could not find occurrence ${occurrence} of ${JSON.stringify( + needle + )}.`, + } + } + from = found + needle.length + } + return { index: args.after !== undefined ? found + needle.length : found } + } + + const resolveReplaceRange = ( + content: string, + args: Omit + ): { index?: number; length?: number; deleted?: string; error?: string } => { + const hasOldString = args.old_string !== undefined + const hasRange = args.index !== undefined || args.length !== undefined + if (hasOldString && hasRange) { + return { error: `Pass either old_string or index/length, not both.` } + } + if (!hasOldString && !hasRange) { + return { error: `Pass old_string or index/length to choose a range.` } + } + if (hasOldString) { + const oldString = args.old_string! + if (oldString.length === 0) { + return { error: `old_string must not be empty.` } + } + const occurrence = + args.occurrence === undefined + ? undefined + : Math.max(1, Math.floor(args.occurrence)) + let from = 0 + let found = -1 + let count = 0 + while (true) { + const index = content.indexOf(oldString, from) + if (index < 0) break + count += 1 + if (occurrence === undefined || count === occurrence) { + found = index + if (occurrence !== undefined) break + } + from = index + oldString.length + } + if (found < 0) { + return { + error: + occurrence === undefined + ? 
`old_string not found.` + : `Could not find occurrence ${occurrence} of old_string.`, + } + } + if (occurrence === undefined && count > 1) { + return { + error: `found ${count} matches for old_string; pass occurrence to choose one or provide a more specific old_string.`, + } + } + return { + index: found, + length: oldString.length, + deleted: oldString, + } + } + + if (args.index === undefined || args.length === undefined) { + return { error: `Pass both index and length for explicit ranges.` } + } + const index = Math.max(0, Math.min(Math.floor(args.index), content.length)) + const length = Math.max( + 0, + Math.min(Math.floor(args.length), content.length - index) + ) + if (length === 0) { + return { error: `Replacement range length must be greater than zero.` } + } + return { + index, + length, + deleted: content.slice(index, index + length), + } + } + + const prepareReplaceSession = async ( + session: ReplaceSession, + args: Omit + ): Promise => { + const materialized = await materializeDocument(args.id) + if (session.prepared) return materialized + + const before = contentOf(materialized) + const range = resolveReplaceRange(before, args) + if ( + range.error || + range.index === undefined || + range.length === undefined + ) { + throw new Error(range.error ?? 
`Could not resolve replacement range.`) + } + + await appendPresence(materialized, { + anchor: range.index, + head: range.index + range.length, + }) + const text = materialized.session.text + const deleteIndex = Math.max(0, Math.min(range.index, text.length)) + const deleteLength = Math.max( + 0, + Math.min(range.length, text.length - deleteIndex) + ) + if (deleteLength > 0) { + materialized.session.doc.transact(() => { + text.delete(deleteIndex, deleteLength) + }, `agent`) + } + const deletePosition = Y.createRelativePositionFromTypeIndex( + text, + deleteIndex + ) + await appendPresence(materialized, { + anchor: deleteIndex, + head: deleteIndex, + }) + + session.id = args.id + session.nextIndex = deleteIndex + session.nextPosition = deletePosition + session.prepared = true + session.deleted = + range.deleted ?? before.slice(range.index, range.index + range.length) + session.deleteIndex = deleteIndex + session.deleteLength = deleteLength + session.beforeContent = before + cursorPositions.set(args.id, deletePosition) + readDocs.set(args.id, contentOf(materialized)) + return materialized + } + + const enqueueInsert = ( + toolCallId: string, + action: (session: InsertSession) => Promise + ): void => { + const session = + insertSessions.get(toolCallId) ?? + ({ + inserted: ``, + seq: 0, + streamed: false, + pending: Promise.resolve(), + } satisfies InsertSession) + insertSessions.set(toolCallId, session) + session.pending = session.pending + .then(() => action(session)) + .catch((error) => { + session.error = error + }) + } + + const awaitInsertSession = async ( + toolCallId: string + ): Promise => { + const session = insertSessions.get(toolCallId) + if (!session) return undefined + await session.pending + if (session.error) throw session.error + return session + } + + const enqueueReplace = ( + toolCallId: string, + action: (session: ReplaceSession) => Promise + ): void => { + const session = + replaceSessions.get(toolCallId) ?? 
+ ({ + inserted: ``, + seq: 0, + streamed: false, + pending: Promise.resolve(), + } satisfies ReplaceSession) + replaceSessions.set(toolCallId, session) + session.pending = session.pending + .then(() => action(session)) + .catch((error) => { + session.error = error + }) + } + + const awaitReplaceSession = async ( + toolCallId: string + ): Promise => { + const session = replaceSessions.get(toolCallId) + if (!session) return undefined + await session.pending + if (session.error) throw session.error + return session + } + + return [ + { + name: `create_markdown_doc`, + label: `Create Markdown Doc`, + description: `Create a collaborative markdown document, persist it as Yjs updates, and add it to this entity's manifest so users can open it in the app. This is not a filesystem file.`, + parameters: Type.Object({ + title: Type.String({ description: `Document title shown in the UI.` }), + content: Type.Optional( + Type.String({ description: `Initial markdown content.` }) + ), + id: Type.Optional( + Type.String({ + description: `Optional stable document id. 
Use letters, numbers, hyphens, or underscores.`, + }) + ), + }), + execute: async (_toolCallId, params) => { + const { id, title, content } = params as { + id?: string + title: string + content?: string + } + const result = await context.createMarkdownDocument({ + id, + title, + }) + const materialized = await openDocumentSession(result.document) + if (content && content.length > 0) { + await appendPresence(materialized, { anchor: 0, head: 0 }) + materialized.session.doc.transact(() => { + materialized.session.text.delete( + 0, + materialized.session.text.length + ) + materialized.session.text.insert(0, content) + }, `agent`) + await appendPresence(materialized, { + anchor: content.length, + head: content.length, + }) + await materialized.session.flush() + await appendPresence(materialized, { clear: true }) + readDocs.set(result.document.id, contentOf(materialized)) + } + return { + content: [ + { + type: `text` as const, + text: `Created markdown document ${result.document.id}: ${result.document.title}`, + }, + ], + details: { document: result.document, txid: result.txid }, + } + }, + }, + { + name: `set_markdown_doc_cursor`, + label: `Set Markdown Doc Cursor`, + description: `Set the stateful insertion cursor for a collaborative markdown document. The cursor is stored as a Yjs relative position for this wake, so later insert_markdown_doc calls can stream at that position even if the document changes around it. 
Pass exactly one of index, before, or after; omit all three to place the cursor at the end.`, + parameters: Type.Object({ + id: Type.String({ description: `Document id.` }), + index: Type.Optional( + Type.Number({ + description: `Optional UTF-16 text offset for the cursor.`, + }) + ), + before: Type.Optional( + Type.String({ + description: `Place the cursor before this literal markdown text.`, + }) + ), + after: Type.Optional( + Type.String({ + description: `Place the cursor after this literal markdown text.`, + }) + ), + occurrence: Type.Optional( + Type.Number({ + description: `1-based occurrence for before/after matching. Defaults to 1.`, + }) + ), + }), + execute: async (_toolCallId, params) => { + const args = params as SetCursorArgs + const materialized = await materializeDocument(args.id) + const content = contentOf(materialized) + const resolved = resolveCursorIndex(content, args) + if (resolved.error || resolved.index === undefined) { + return { + content: [ + { + type: `text` as const, + text: `Error: ${resolved.error ?? `could not resolve cursor`}`, + }, + ], + details: { cursorSet: false }, + } + } + const result = await setCursor(args.id, resolved.index) + return { + content: [ + { + type: `text` as const, + text: `Set markdown document ${args.id} cursor at index ${result.index}`, + }, + ], + details: { + document: result.materialized.document, + cursorSet: true, + index: result.index, + }, + } + }, + }, + { + name: `insert_markdown_doc`, + label: `Insert Markdown Doc`, + description: `Insert markdown into a collaborative app document. When the model streams the content argument, the insertion is applied incrementally to the wake-local Yjs document and appended to the document stream so open editors can watch it appear. Put id and optional index before content in the tool arguments. 
If index is omitted, the current set_markdown_doc_cursor position is used; if no cursor is set, content is appended.`, + parameters: Type.Object({ + id: Type.String({ description: `Document id.` }), + index: Type.Optional( + Type.Number({ + description: `Optional UTF-16 text offset. Omit to insert at the current cursor, or to append to the end of the document if no cursor is set.`, + }) + ), + content: Type.String({ description: `Markdown content to insert.` }), + }), + onArgsDelta: ({ toolCallId, argsPreview }) => { + const args = asInsertArgs(argsPreview) + if (!args.id || typeof args.content !== `string`) return + enqueueInsert(toolCallId, async (session) => { + session.id = args.id + if (session.nextIndex === undefined && args.index !== undefined) { + session.nextIndex = args.index + } + if (!args.content!.startsWith(session.inserted)) return + const chunk = args.content!.slice(session.inserted.length) + if (chunk.length === 0) return + session.inserted = args.content! + await applyInsertChunk(args.id!, chunk, session, args.index) + session.seq++ + }) + }, + execute: async (toolCallId, params) => { + const { id, content, index } = params as InsertMarkdownArgs + const session = await awaitInsertSession(toolCallId) + let inserted = session?.inserted ?? `` + let streamed = session?.streamed ?? false + let nextIndex = session?.nextIndex ?? index + + if (content !== inserted) { + if (inserted.length === 0 || content.startsWith(inserted)) { + const remaining = + inserted.length === 0 ? content : content.slice(inserted.length) + if (remaining.length > 0) { + const finalSession = + session ??
+ ({ + inserted: ``, + seq: 0, + streamed: false, + pending: Promise.resolve(), + } satisfies InsertSession) + await applyInsertChunk(id, remaining, finalSession, nextIndex) + nextIndex = finalSession.nextIndex + inserted = content + streamed = streamed || remaining.length !== content.length + } + } else { + const materialized = materializedDocs.get(id) + if (materialized) { + await appendPresence(materialized, { clear: true }) + } + insertSessions.delete(toolCallId) + return { + content: [ + { + type: `text` as const, + text: `Error: streamed content diverged from final insert content; no final reconciliation was applied.`, + }, + ], + details: { inserted: inserted.length, expected: content.length }, + } + } + } + + const materialized = await materializeDocument(id) + await materialized.session.flush() + await appendPresence(materialized, { clear: true }) + const finalContent = contentOf(materialized) + readDocs.set(id, finalContent) + insertSessions.delete(toolCallId) + return { + content: [ + { + type: `text` as const, + text: `Inserted ${content.length} characters into markdown document ${id}`, + }, + ], + details: { + document: materialized.document, + streamed, + insertedBytes: new TextEncoder().encode(content).length, + nextIndex, + }, + } + }, + executionMode: `sequential`, + }, + { + name: `replace_markdown_doc_range`, + label: `Replace Markdown Doc Range`, + description: `Delete one range from a collaborative app markdown document, then stream or insert replacement markdown at that location. Use old_string for a unique literal match, old_string plus occurrence for repeated text, or index plus length for an explicit UTF-16 range. The agent cursor follows the end of streamed replacement text.`, + parameters: Type.Object({ + id: Type.String({ description: `Document id.` }), + old_string: Type.Optional( + Type.String({ + description: `Literal markdown text to replace. 
Must be unique unless occurrence is provided.`, + }) + ), + occurrence: Type.Optional( + Type.Number({ + description: `1-based occurrence to replace when old_string appears multiple times.`, + }) + ), + index: Type.Optional( + Type.Number({ + description: `Optional UTF-16 start offset for an explicit replacement range.`, + }) + ), + length: Type.Optional( + Type.Number({ + description: `UTF-16 length for an explicit replacement range. Required when index is used.`, + }) + ), + content: Type.String({ + description: `Replacement markdown content to stream into the deleted range.`, + }), + }), + onArgsDelta: ({ toolCallId, argsPreview }) => { + const args = asReplaceArgs(argsPreview) + if (!args.id || typeof args.content !== `string`) return + if ( + args.old_string === undefined && + (args.index === undefined || args.length === undefined) + ) { + return + } + enqueueReplace(toolCallId, async (session) => { + await prepareReplaceSession(session, { + id: args.id!, + old_string: args.old_string, + occurrence: args.occurrence, + index: args.index, + length: args.length, + }) + if (!args.content!.startsWith(session.inserted)) return + const chunk = args.content!.slice(session.inserted.length) + if (chunk.length === 0) return + session.inserted = args.content! + await applyInsertChunk(args.id!, chunk, session) + session.seq++ + }) + }, + execute: async (toolCallId, params) => { + const args = params as ReplaceMarkdownArgs + const session = + (await awaitReplaceSession(toolCallId)) ?? + ({ + inserted: ``, + seq: 0, + streamed: false, + pending: Promise.resolve(), + } satisfies ReplaceSession) + + try { + await prepareReplaceSession(session, { + id: args.id, + old_string: args.old_string, + occurrence: args.occurrence, + index: args.index, + length: args.length, + }) + } catch (error) { + replaceSessions.delete(toolCallId) + return { + content: [ + { + type: `text` as const, + text: `Error: ${error instanceof Error ? 
error.message : `could not prepare replacement`}`, + }, + ], + details: { replaced: false }, + } + } + + let inserted = session.inserted + let streamed = session.streamed + let nextIndex = session.nextIndex + if (args.content !== inserted) { + if (inserted.length === 0 || args.content.startsWith(inserted)) { + const remaining = + inserted.length === 0 + ? args.content + : args.content.slice(inserted.length) + if (remaining.length > 0) { + await applyInsertChunk(args.id, remaining, session) + nextIndex = session.nextIndex + inserted = args.content + streamed = streamed || remaining.length !== args.content.length + } + } else { + const materialized = materializedDocs.get(args.id) + if (materialized) { + await appendPresence(materialized, { clear: true }) + } + replaceSessions.delete(toolCallId) + return { + content: [ + { + type: `text` as const, + text: `Error: streamed replacement content diverged from final content; no final reconciliation was applied.`, + }, + ], + details: { + replaced: false, + inserted: inserted.length, + expected: args.content.length, + }, + } + } + } + + const materialized = await materializeDocument(args.id) + await materialized.session.flush() + await appendPresence(materialized, { clear: true }) + const finalContent = contentOf(materialized) + readDocs.set(args.id, finalContent) + replaceSessions.delete(toolCallId) + const diff = createTwoFilesPatch( + docLabel(args.id), + docLabel(args.id), + session.beforeContent ?? finalContent, + finalContent, + undefined, + undefined, + { context: 3 } + ) + return { + content: [ + { + type: `text` as const, + text: `Replaced ${session.deleteLength ?? 
0} characters in markdown document ${args.id}`, + }, + ], + details: { + document: materialized.document, + replaced: true, + deleted: session.deleted, + deleteIndex: session.deleteIndex, + deleteLength: session.deleteLength, + streamed, + insertedBytes: new TextEncoder().encode(args.content).length, + nextIndex, + diff, + }, + } + }, + executionMode: `sequential`, + }, + { + name: `read_markdown_doc`, + label: `Read Markdown Doc`, + description: `Read the current plain markdown content from a collaborative app document, not from the filesystem.`, + parameters: Type.Object({ + id: Type.String({ description: `Document id.` }), + }), + execute: async (_toolCallId, params) => { + const { id } = params as { id: string } + const materialized = await materializeDocument(id) + const content = contentOf(materialized) + const cursorIndex = cursorPositions.has(id) + ? markdownIndexFromRelativePosition( + materialized.session.doc, + cursorPositions.get(id)!, + materialized.textName + ) + : undefined + readDocs.set(id, content) + return { + content: [ + { + type: `text` as const, + text: content, + }, + ], + details: { + document: materialized.document, + bytes: new TextEncoder().encode(content).length, + cursorIndex, + }, + } + }, + }, + { + name: `write_markdown_doc`, + label: `Write Markdown Doc`, + description: `Replace the full content of a collaborative app markdown document. 
This does not write a filesystem file.`, + parameters: Type.Object({ + id: Type.String({ description: `Document id.` }), + content: Type.String({ description: `Full markdown content.` }), + }), + execute: async (_toolCallId, params) => { + const { id, content } = params as { id: string; content: string } + const materialized = await materializeDocument(id) + const before = contentOf(materialized) + await appendPresence(materialized, { anchor: 0, head: 0 }) + materialized.session.doc.transact(() => { + materialized.session.text.delete(0, materialized.session.text.length) + if (content.length > 0) materialized.session.text.insert(0, content) + }, `agent`) + await appendPresence(materialized, { + anchor: content.length, + head: content.length, + }) + await materialized.session.flush() + await appendPresence(materialized, { clear: true }) + readDocs.set(id, content) + const diff = createTwoFilesPatch( + docLabel(id), + docLabel(id), + before, + content, + undefined, + undefined, + { context: 3 } + ) + return { + content: [ + { + type: `text` as const, + text: `Wrote markdown document ${id}`, + }, + ], + details: { document: materialized.document, diff }, + } + }, + executionMode: `sequential`, + }, + { + name: `edit_markdown_doc`, + label: `Edit Markdown Doc`, + description: `Replace text in a collaborative app markdown document by appending a Yjs update, not by writing a filesystem file. Read the document first when you need to inspect current content. By default old_string must occur exactly once; set replace_all to true to replace every occurrence.`, + parameters: Type.Object({ + id: Type.String({ description: `Document id.` }), + old_string: Type.String({ + description: `Literal markdown text to find. 
Must be unique unless replace_all is true.`, + }), + new_string: Type.String({ description: `Replacement markdown text.` }), + replace_all: Type.Optional( + Type.Boolean({ description: `Replace every occurrence.` }) + ), + }), + execute: async (_toolCallId, params) => { + const { id, old_string, new_string, replace_all } = params as { + id: string + old_string: string + new_string: string + replace_all?: boolean + } + const materialized = await materializeDocument(id) + const before = contentOf(materialized) + + const matches = before.split(old_string).length - 1 + if (old_string.length === 0 || matches <= 0) { + return { + content: [ + { type: `text` as const, text: `Error: old_string is empty or not found` }, + ], + details: { replacements: 0 }, + } + } + if (!replace_all && matches > 1) { + return { + content: [ + { + type: `text` as const, + text: `Error: found ${matches} matches for old_string; pass replace_all=true or provide a more specific old_string.`, + }, + ], + details: { replacements: 0 }, + } + } + + const index = before.indexOf(old_string) + await appendPresence(materialized, { anchor: index, head: index }) + let cursorIndex = index + materialized.session.doc.transact(() => { + if (replace_all) { + let cursor = 0 + while (true) { + const nextIndex = materialized.session.text + .toString() + .indexOf(old_string, cursor) + if (nextIndex < 0) break + materialized.session.text.delete(nextIndex, old_string.length) + materialized.session.text.insert(nextIndex, new_string) + cursor = nextIndex + new_string.length + cursorIndex = cursor + } + } else { + materialized.session.text.delete(index, old_string.length) + materialized.session.text.insert(index, new_string) + cursorIndex = index + new_string.length + } + }, `agent`) + const resultContent = contentOf(materialized) + await appendPresence(materialized, { + anchor: cursorIndex, + head: cursorIndex, + }) + await materialized.session.flush() + await appendPresence(materialized, { clear: true }) + readDocs.set(id, resultContent) + const diff =
createTwoFilesPatch( + docLabel(id), + docLabel(id), + before, + resultContent, + undefined, + undefined, + { context: 3 } + ) + return { + content: [ + { + type: `text` as const, + text: `Edited markdown document ${id}: ${matches} replacement${ + matches === 1 ? `` : `s` + }`, + }, + ], + details: { + replacements: matches, + document: materialized.document, + diff, + }, + } + }, + executionMode: `sequential`, + }, + ] +} diff --git a/packages/agents-runtime/src/types.ts b/packages/agents-runtime/src/types.ts index 3ef89c41b2..86a4c633aa 100644 --- a/packages/agents-runtime/src/types.ts +++ b/packages/agents-runtime/src/types.ts @@ -41,12 +41,19 @@ import type { ManifestAttachmentEntry as EntityManifestAttachmentEntry, ManifestChildEntry as EntityManifestChildEntry, ManifestContextEntry as EntityManifestContextEntry, + ManifestDocumentEntry as EntityManifestDocumentEntry, ManifestCronScheduleEntry as EntityManifestCronScheduleEntry, ManifestEffectEntry as EntityManifestEffectEntry, ManifestFutureSendScheduleEntry as EntityManifestFutureSendScheduleEntry, ManifestGoalEntry as EntityManifestGoalEntry, + ManifestRealtimeSessionEntry as EntityManifestRealtimeSessionEntry, ManifestSharedStateEntry as EntityManifestSharedStateEntry, ManifestSourceEntry as EntityManifestSourceEntry, + RealtimeAudioSpan as EntityRealtimeAudioSpan, + RealtimeSession as EntityRealtimeSession, + RealtimeSessionStatus as EntityRealtimeSessionStatus, + RealtimeSessionStreamRefs as EntityRealtimeSessionStreamRefs, + RealtimeTranscript as EntityRealtimeTranscript, Signal as EntitySignalEntry, WakeEntry, } from './entity-schema' @@ -81,6 +88,12 @@ export type EntitiesObservationHandle = ObservationHandle & { db: ObservationStreamDB } +export type MarkdownDocumentConnection = { + baseUrl: string + docId: string + headers?: Record +} + export type JsonValue = | string | number @@ -319,13 +332,20 @@ export type ManifestEntry = EntityManifest export type ManifestAttachmentEntry = 
EntityManifestAttachmentEntry export type ManifestChildEntry = EntityManifestChildEntry export type ManifestContextEntry = EntityManifestContextEntry +export type ManifestDocumentEntry = EntityManifestDocumentEntry export type ManifestCronScheduleEntry = EntityManifestCronScheduleEntry export type ManifestEffectEntry = EntityManifestEffectEntry export type ManifestFutureSendScheduleEntry = EntityManifestFutureSendScheduleEntry export type ManifestGoalEntry = EntityManifestGoalEntry +export type ManifestRealtimeSessionEntry = EntityManifestRealtimeSessionEntry export type ManifestSourceEntry = EntityManifestSourceEntry export type ManifestSharedStateEntry = EntityManifestSharedStateEntry +export type RealtimeSession = EntityRealtimeSession +export type RealtimeSessionStatus = EntityRealtimeSessionStatus +export type RealtimeSessionStreamRefs = EntityRealtimeSessionStreamRefs +export type RealtimeAudioSpan = EntityRealtimeAudioSpan +export type RealtimeTranscript = EntityRealtimeTranscript export type ContextInserted = EntityContextInserted export type ContextRemoved = EntityContextRemoved export type ContextEntryAttrs = EntityContextEntryAttrs @@ -395,6 +415,15 @@ export type TimelineItem = } | { kind: `wake`; at: number; payload: unknown } | { kind: `signal`; at: number; signal: EntitySignalEntry } + | { + kind: `realtime_transcript` + at: number + key: string + sessionId: string + direction: `input` | `output` + text: string + status: `partial` | `final` + } | { kind: `run` at: number @@ -410,6 +439,7 @@ export type TimelineItem = error: string | null status: | `started` + | `args_streaming` | `args_complete` | `executing` | `completed` @@ -759,6 +789,7 @@ export interface ProcessWakeConfig { createElectricTools?: (context: { entityUrl: string entityType: string + principal?: RuntimePrincipal args: Readonly> db: EntityStreamDBWithActions events: Array @@ -785,6 +816,27 @@ export interface ProcessWakeConfig { unsubscribeFromWebhookSource: (opts: { id: string }) => 
Promise<{ txid: string }> + createMarkdownDocument: (opts: { + id?: string + title: string + meta?: Record + }) => Promise<{ txid: string; document: ManifestDocumentEntry }> + getMarkdownDocumentConnection: ( + streamPath: string + ) => Promise + readMarkdownDocumentStream: ( + streamPath: string, + opts?: { offset?: string } + ) => Promise<{ bytes: Uint8Array; offset?: string }> + appendMarkdownDocumentUpdate: ( + streamPath: string, + update: Uint8Array + ) => Promise<{ offset?: string }> + appendMarkdownDocumentAwareness: ( + streamPath: string, + update: Uint8Array + ) => Promise<{ offset?: string }> + registerCleanup: (cleanup: () => void | Promise) => void }) => Array | Promise> /** Optional shutdown signal to end idle waits during host teardown. */ shutdownSignal?: AbortSignal @@ -947,7 +999,20 @@ export type AgentRunResult = { usage: { tokens: number; duration: number } } -export type AgentTool = PiAgentTool +export interface ToolArgumentDeltaContext { + toolCallId: string + toolName: string + contentIndex?: number + delta: string + argsPreview?: unknown +} + +export type AgentTool = PiAgentTool & { + onArgsDelta?: ( + context: ToolArgumentDeltaContext, + signal?: AbortSignal + ) => Promise | void +} export type AgentModel = string | Model export interface AgentConfig { @@ -976,6 +1041,220 @@ export interface AgentConfig { testResponses?: TestResponses } +export type RealtimeAudioCodec = `pcm16` + +export interface RealtimeAudioFormat { + codec: RealtimeAudioCodec + sampleRate: number + channels: number +} + +export interface RealtimeInputTranscriptionConfig { + model?: string + language?: string + prompt?: string + delay?: `minimal` | `low` | `medium` | `high` | `xhigh` +} + +export type RealtimeTurnDetectionConfig = + | false + | { type: `none` } + | { + type: `server_vad` + threshold?: number + prefixPaddingMs?: number + silenceDurationMs?: number + createResponse?: boolean + interruptResponse?: boolean + } + | { + type: `semantic_vad` + eagerness?: 
`low` | `medium` | `high` | `auto` + createResponse?: boolean + interruptResponse?: boolean + } + +export interface RealtimeAudioConfig { + inputFormat?: RealtimeAudioFormat + outputFormat?: RealtimeAudioFormat + inputTranscription?: false | RealtimeInputTranscriptionConfig + turnDetection?: RealtimeTurnDetectionConfig +} + +export interface RealtimeToolPolicy { + direct?: Array + confirm?: Array + delegate?: Array +} + +export interface RealtimeSessionPolicy { + textDuringSession?: `route-to-realtime` + retention?: `forever` +} + +export interface RealtimeContextConfig { + includeTimeline?: boolean +} + +export type RealtimeProviderEvent = + | { type: `session.started`; sessionId?: string } + | { type: `session.updated` } + | { type: `session.closed`; reason?: string } + | { type: `session.error`; error: string; code?: string } + | { + type: `input_audio.speech_started` + audioOffset?: string + turnId?: string + } + | { + type: `input_audio.speech_stopped` + audioOffset?: string + turnId?: string + } + | { + type: `input_audio.committed` + turnId?: string + previousTurnId?: string + } + | { type: `input_transcript.delta`; delta: string; turnId?: string } + | { type: `input_transcript.completed`; text: string; turnId?: string } + | { + type: `output_audio.delta` + audio: Uint8Array + responseId?: string + itemId?: string + } + | { type: `output_audio.completed`; responseId?: string; itemId?: string } + | { + type: `output_transcript.delta` + delta: string + responseId?: string + itemId?: string + contentIndex?: number + transcriptSource?: + | `response.audio_transcript` + | `response.output_audio_transcript` + | `response.output_text` + } + | { + type: `output_transcript.completed` + text?: string + responseId?: string + itemId?: string + contentIndex?: number + transcriptSource?: + | `response.audio_transcript` + | `response.output_audio_transcript` + | `response.output_text` + } + | { type: `response.started`; responseId?: string } + | { type: 
`response.completed`; responseId?: string } + | { type: `response.cancelled`; responseId?: string } + | { + type: `tool_call.started` + toolCallId: string + name: string + args?: unknown + } + | { + type: `tool_call.arguments_delta` + toolCallId: string + delta: string + } + | { + type: `tool_call.arguments_completed` + toolCallId: string + name: string + args: unknown + } + | { + type: `tool_call.completed` + toolCallId: string + name: string + result: unknown + isError?: boolean + } + +export interface RealtimeProviderConnectInput { + systemPrompt: string + messages: Array + tools: Array + audio?: RealtimeAudioConfig + session?: ManifestRealtimeSessionEntry + signal?: AbortSignal +} + +export interface RealtimeToolResult { + toolCallId: string + name: string + result: unknown + isError?: boolean +} + +export interface RealtimeProviderSession { + events: AsyncIterable + updateSession?: (update: unknown) => Promise + appendInputAudio?: ( + chunk: Uint8Array, + meta?: Record + ) => Promise + clearInputAudio?: () => Promise + commitInputAudio?: () => Promise + sendText?: (text: string) => Promise + sendToolResult?: (result: RealtimeToolResult) => Promise + cancelResponse?: () => Promise + truncateOutputAudio?: (opts: { + itemId: string + audioEndMs: number + }) => Promise + close?: (reason?: string) => Promise +} + +export interface RealtimeProviderConfig { + id: string + model: string + connect: ( + input: RealtimeProviderConnectInput + ) => Promise +} + +export interface RealtimeTranscriptEvent { + key: string + sessionId: string + direction: `input` | `output` + text: string + status: `partial` | `final` + turnId?: string + responseId?: string +} + +export interface RealtimeConfig { + systemPrompt: string + provider: RealtimeProviderConfig + tools?: Array + audio?: RealtimeAudioConfig + toolPolicy?: RealtimeToolPolicy + context?: RealtimeContextConfig + session?: RealtimeSessionPolicy + onTranscript?: (transcript: RealtimeTranscriptEvent) => void | Promise + 
testResponses?: TestResponses +} + +export type RealtimeRunResult = AgentRunResult + +export interface RealtimeHandle { + run: () => Promise + close: (reason?: string) => Promise + stop: (reason?: string) => Promise + cancelResponse: (opts?: { truncateAudio?: boolean }) => Promise + sendText: (text: string) => Promise +} + +export interface RealtimeHelpers { + activeSession: () => ManifestRealtimeSessionEntry | undefined + sessions: () => Array +} + export type TestResponses = Array | TestResponseFn export type TestResponseFn = ( @@ -1075,6 +1354,7 @@ export interface HandlerContext< */ sandbox: Sandbox useAgent: (config: AgentConfig) => AgentHandle + useRealtime: (config: RealtimeConfig) => RealtimeHandle useContext: (config: UseContextConfig) => void timelineMessages: (opts?: TimelineProjectionOpts) => Array insertContext: (id: string, entry: ContextEntryInput) => void @@ -1090,6 +1370,7 @@ export interface HandlerContext< opts?: { status?: GoalEntry[`status`] } ) => GoalEntry | undefined agent: AgentHandle + realtime: RealtimeHelpers spawn: ( type: string, id: string, diff --git a/packages/agents-runtime/test/electric-agents-client.test.ts b/packages/agents-runtime/test/electric-agents-client.test.ts index b4587f4fd4..858faefff0 100644 --- a/packages/agents-runtime/test/electric-agents-client.test.ts +++ b/packages/agents-runtime/test/electric-agents-client.test.ts @@ -9,6 +9,7 @@ const { mockState } = vi.hoisted(() => ({ ensureCronStream: vi.fn(), registerPgSyncSource: vi.fn(), signalEntity: vi.fn(), + startRealtimeSession: vi.fn(), ensureStream: vi.fn(), createStreamDB: vi.fn(), preload: vi.fn(), @@ -27,6 +28,7 @@ vi.mock(`../src/runtime-server-client`, () => ({ ensureCronStream: mockState.ensureCronStream, registerPgSyncSource: mockState.registerPgSyncSource, signalEntity: mockState.signalEntity, + startRealtimeSession: mockState.startRealtimeSession, ensureStream: mockState.ensureStream, }), })) @@ -55,6 +57,20 @@ describe(`createAgentsClient`, () => { 
mockState.ensureStream = vi.fn().mockResolvedValue(`/_webhooks/repo`) mockState.createStreamDB = vi.fn() mockState.signalEntity = vi.fn().mockResolvedValue({ txid: 123 }) + mockState.startRealtimeSession = vi.fn().mockResolvedValue({ + sessionId: `rt-1`, + entityUrl: `/horton/demo`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `requested`, + startedAt: `2026-06-09T10:00:00.000Z`, + streams: { + audio_in: `/horton/demo/realtime/rt-1/audio/in`, + audio_out: `/horton/demo/realtime/rt-1/audio/out`, + control_in: `/horton/demo/realtime/rt-1/control/in`, + control_out: `/horton/demo/realtime/rt-1/control/out`, + }, + }) mockState.observedDb = { preload: vi.fn().mockResolvedValue(undefined), collections: { @@ -191,6 +207,31 @@ describe(`createAgentsClient`, () => { }) }) + it(`exposes realtime session start through the server client`, async () => { + const client = createAgentsClient({ + baseUrl: `http://electric-agents.test`, + }) + + await expect( + client.startRealtimeSession({ + entityUrl: `/horton/demo`, + provider: `openai`, + model: `gpt-realtime-2`, + }) + ).resolves.toMatchObject({ + sessionId: `rt-1`, + streams: { + audio_in: `/horton/demo/realtime/rt-1/audio/in`, + }, + }) + + expect(mockState.startRealtimeSession).toHaveBeenCalledWith({ + entityUrl: `/horton/demo`, + provider: `openai`, + model: `gpt-realtime-2`, + }) + }) + it(`observe(webhook(...)) ensures the exact stream before preloading it`, async () => { const client = createAgentsClient({ baseUrl: `http://electric-agents.test/t/tenant-a/v1`, diff --git a/packages/agents-runtime/test/entity-timeline.test.ts b/packages/agents-runtime/test/entity-timeline.test.ts index d84696524c..63f215bff2 100644 --- a/packages/agents-runtime/test/entity-timeline.test.ts +++ b/packages/agents-runtime/test/entity-timeline.test.ts @@ -3,6 +3,7 @@ import { createCollection, createLiveQueryCollection, } from '@durable-streams/state/db' +import { BasicIndex } from '@tanstack/db' import { 
buildEntityTimelineData, compareTimelineOrders, @@ -38,6 +39,10 @@ function offset(index: number): EventPointer { } } +function emptyOrderableCollection() { + return { toArray: [], __electricRowOffsets: new Map() } +} + describe(`compareTimelineOrders`, () => { it(`compares two numbers`, () => { expect(compareTimelineOrders(1, 2)).toBeLessThan(0) @@ -1596,6 +1601,10 @@ describe(`entity includes query`, () => { const inbox = createSyncCollection(`test-inbox`, takeOffset) const wakes = createSyncCollection(`test-wakes`, takeOffset) const signals = createSyncCollection(`test-signals`, takeOffset) + const realtimeTranscripts = createSyncCollection( + `test-realtime-transcripts`, + takeOffset + ) const contextInserted = createSyncCollection( `test-context-inserted`, takeOffset @@ -1611,6 +1620,24 @@ describe(`entity includes query`, () => { `test-reasoningDeltas`, takeOffset ) + texts.collection.createIndex((row) => row.run_id, { + indexType: BasicIndex, + }) + textDeltas.collection.createIndex((row) => row.text_id, { + indexType: BasicIndex, + }) + textDeltas.collection.createIndex((row) => row.run_id, { + indexType: BasicIndex, + }) + toolCalls.collection.createIndex((row) => row.run_id, { + indexType: BasicIndex, + }) + steps.collection.createIndex((row) => row.run_id, { + indexType: BasicIndex, + }) + errors.collection.createIndex((row) => row.run_id, { + indexType: BasicIndex, + }) return { collections: { runs: runs.collection, @@ -1622,6 +1649,7 @@ describe(`entity includes query`, () => { inbox: inbox.collection, wakes: wakes.collection, signals: signals.collection, + realtimeTranscripts: realtimeTranscripts.collection, contextInserted: contextInserted.collection, contextRemoved: contextRemoved.collection, manifests: manifests.collection, @@ -1639,6 +1667,7 @@ describe(`entity includes query`, () => { inbox: withSeqInjection(inbox, takeSeq), wakes: withSeqInjection(wakes, takeSeq), signals: withSeqInjection(signals, takeSeq), + realtimeTranscripts: 
withSeqInjection(realtimeTranscripts, takeSeq), contextInserted: withSeqInjection(contextInserted, takeSeq), contextRemoved: withSeqInjection(contextRemoved, takeSeq), manifests: withSeqInjection(manifests, takeSeq), @@ -1934,6 +1963,55 @@ describe(`entity includes query`, () => { expect(rows[1]?.annotation?.note).toBe(`between`) }) + it(`orders live run rows by their first visible item`, async () => { + const { collections, sync } = createEntityCollections() + const queryFn = createEntityTimelineQuery({ collections } as any) + const liveQuery = createLiveQueryCollection({ + query: queryFn, + startSync: true, + }) + await liveQuery.preload() + + sync.runs.insert({ + key: `run-1`, + status: `started`, + _timeline_order: order(1), + }) + sync.realtimeTranscripts.insert({ + key: `rt-in-1`, + session_id: `rt-1`, + direction: `input`, + text: `Find the latest Electric Agents post`, + status: `final`, + audio_stream: `/horton/test/realtime/rt-1/audio/in`, + created_at: `2026-06-09T14:56:00.000Z`, + _timeline_order: order(2), + }) + sync.toolCalls.insert({ + key: `tc-1`, + run_id: `run-1`, + tool_call_id: `tc-1`, + tool_name: `web_search`, + status: `completed`, + args: { query: `most recent blog post Electric Agents site` }, + result: `https://electric.ax/blog/2026/04/29/introducing-electric-agents`, + _timeline_order: order(3), + }) + await new Promise((r) => setTimeout(r, 50)) + + const rows = getData(liveQuery) + expect( + rows.map((row) => + row.realtimeTranscript + ? `realtimeTranscript:${row.realtimeTranscript.key}` + : row.run + ? 
`run:${row.run.key}` + : `other` + ) + ).toEqual([`realtimeTranscript:rt-in-1`, `run:run-1`]) + expect(rows[1]?.run.order).toBe(order(3)) + }) + it(`projects related entities from one manifest row per related entity`, () => { const timeline = buildEntityTimelineData({ collections: { @@ -1946,6 +2024,7 @@ describe(`entity includes query`, () => { inbox: { toArray: [] }, wakes: { toArray: [] }, signals: { toArray: [] }, + realtimeTranscripts: emptyOrderableCollection(), contextInserted: { toArray: [], __electricRowOffsets: new Map() }, contextRemoved: { toArray: [], __electricRowOffsets: new Map() }, manifests: { @@ -2054,6 +2133,7 @@ describe(`entity includes query`, () => { }, wakes: { toArray: [], __electricRowOffsets: new Map() }, signals: { toArray: [], __electricRowOffsets: new Map() }, + realtimeTranscripts: emptyOrderableCollection(), contextInserted: { toArray: [], __electricRowOffsets: new Map() }, contextRemoved: { toArray: [], __electricRowOffsets: new Map() }, manifests: { toArray: [], __electricRowOffsets: new Map() }, @@ -2079,6 +2159,7 @@ describe(`entity includes query`, () => { inbox: { toArray: [], __electricRowOffsets: new Map() }, wakes: { toArray: [], __electricRowOffsets: new Map() }, signals: { toArray: [], __electricRowOffsets: new Map() }, + realtimeTranscripts: emptyOrderableCollection(), contextInserted: { toArray: [ { @@ -2196,6 +2277,7 @@ describe(`entity includes query`, () => { inbox: { toArray: [] }, wakes: { toArray: [] }, signals: { toArray: [] }, + realtimeTranscripts: emptyOrderableCollection(), contextInserted: { toArray: [], __electricRowOffsets: new Map() }, contextRemoved: { toArray: [], __electricRowOffsets: new Map() }, manifests: { @@ -2305,6 +2387,7 @@ describe(`entity includes query`, () => { inbox: { toArray: [] }, wakes: { toArray: [] }, signals: { toArray: [] }, + realtimeTranscripts: emptyOrderableCollection(), contextInserted: { toArray: [], __electricRowOffsets: new Map() }, contextRemoved: { toArray: [], 
__electricRowOffsets: new Map() }, manifests: { diff --git a/packages/agents-runtime/test/helpers/context-test-helpers.ts b/packages/agents-runtime/test/helpers/context-test-helpers.ts index beaf867202..eb475c9959 100644 --- a/packages/agents-runtime/test/helpers/context-test-helpers.ts +++ b/packages/agents-runtime/test/helpers/context-test-helpers.ts @@ -304,6 +304,13 @@ export function createTestHandlerContext( wakeEvent?: WakeEvent hydratedWebhookSourceWake?: HydratedWebhookSourceWake | null prepareAgentRun?: () => Promise + realtimeStreams?: { + baseUrl: string + headers?: Record + } + registerLiveWakeHandler?: Parameters< + typeof createHandlerContext + >[0][`registerLiveWakeHandler`] } = {} ) { const db = opts.db ?? buildStreamFixture([]) @@ -334,6 +341,8 @@ export function createTestHandlerContext( payload: `hi`, }, hydratedWebhookSourceWake: opts.hydratedWebhookSourceWake, + realtimeStreams: opts.realtimeStreams, + registerLiveWakeHandler: opts.registerLiveWakeHandler, prepareAgentRun: opts.prepareAgentRun, doObserve: vi.fn(), doSpawn: vi.fn(), diff --git a/packages/agents-runtime/test/markdown-docs-tools.test.ts b/packages/agents-runtime/test/markdown-docs-tools.test.ts new file mode 100644 index 0000000000..a1bb7d571c --- /dev/null +++ b/packages/agents-runtime/test/markdown-docs-tools.test.ts @@ -0,0 +1,608 @@ +import { describe, expect, it, vi } from 'vitest' +import * as decoding from 'lib0/decoding' +import { Awareness, applyAwarenessUpdate } from 'y-protocols/awareness' +import * as Y from 'yjs' +import { + createMarkdownYDoc, + encodeMarkdownAwarenessUpdate, + frameYjsUpdate, + markdownText, +} from '../src/markdown-yjs' +import { createMarkdownDocumentTools } from '../src/tools/markdown-docs' + +function concatBytes(a: Uint8Array, b: Uint8Array): Uint8Array { + const next = new Uint8Array(a.length + b.length) + next.set(a, 0) + next.set(b, a.length) + return next +} + +function concatFrames(frames: Array): Uint8Array { + return frames.reduce( + 
(bytes, frame) => concatBytes(bytes, frame), + new Uint8Array() + ) +} + +function streamBytesFromContent(content: string): Uint8Array { + const doc = new Y.Doc() + markdownText(doc).insert(0, content) + return frameYjsUpdate(Y.encodeStateAsUpdate(doc)) +} + +function contentFromStream(streamBytes: Uint8Array): string { + return markdownText(createMarkdownYDoc(streamBytes)).toString() +} + +async function waitForCondition( + predicate: () => boolean, + message: string +): Promise { + for (let i = 0; i < 20; i += 1) { + if (predicate()) return + await new Promise((resolve) => setTimeout(resolve, 0)) + } + throw new Error(message) +} + +function applyFramedAwarenessUpdate( + awareness: Awareness, + data: Uint8Array +): void { + const decoder = decoding.createDecoder(data) + while (decoding.hasContent(decoder)) { + applyAwarenessUpdate(awareness, decoding.readVarUint8Array(decoder), `test`) + } +} + +function cursorHeadIndexFromAwarenessFrame( + doc: Y.Doc, + frame: Uint8Array +): number | undefined { + const awareness = new Awareness(new Y.Doc()) + applyFramedAwarenessUpdate(awareness, frame) + for (const state of awareness.getStates().values()) { + const cursor = ( + state as { + cursor?: { head?: Y.RelativePosition; anchor?: Y.RelativePosition } + } + ).cursor + if (!cursor?.head) continue + const absolute = Y.createAbsolutePositionFromRelativePosition( + cursor.head, + doc + ) + return absolute?.index + } + return undefined +} + +function usersFromAwarenessFrame( + frame: Uint8Array +): Array> { + const awareness = new Awareness(new Y.Doc()) + applyFramedAwarenessUpdate(awareness, frame) + return Array.from(awareness.getStates().values()).flatMap((state) => { + const user = (state as { user?: Record }).user + return user ? 
[user] : [] + }) +} + +function createToolContext( + opts: { + manifestDocuments?: Array + markdownDocs?: Array + entityUrl?: string + principalUrl?: string + } = {} +) { + const document = { + key: `document:notes`, + kind: `document`, + id: `notes`, + provider: `y-durable-streams`, + docId: `agents/chat/session/documents/notes`, + docPath: `agents/chat/session/documents/notes`, + streamPath: `/v1/yjs/default/docs/agents/chat/session/documents/notes`, + transportMimeType: `application/vnd.electric-agents.markdown-yjs`, + contentMimeType: `text/markdown`, + yTextName: `markdown`, + title: `Notes`, + createdAt: `2026-06-07T00:00:00.000Z`, + } as const + let streamFrames = [streamBytesFromContent(`# Notes\n\nFirst line\n`)] + const awarenessFrames: Array = [] + const openSessions: Array<{ + doc: Y.Doc + off: () => void + }> = [] + const cleanupCallbacks: Array<() => void | Promise> = [] + const context: any = { + entityUrl: opts.entityUrl ?? `/chat/session`, + entityType: `chat`, + principal: { + url: opts.principalUrl ?? `/principal/agent:horton`, + kind: `agent`, + }, + args: { + ...(opts.markdownDocs ? { markdownDocs: opts.markdownDocs } : {}), + }, + db: { + collections: { + manifests: { toArray: opts.manifestDocuments ?? [document] }, + }, + }, + events: [], + createMarkdownDocument: vi.fn( + async (opts: { id?: string; title: string }) => { + streamFrames = [] + return { + txid: `tx-create`, + document: { + ...document, + id: opts.id ?? 
document.id, + title: opts.title, + }, + } + } + ), + getMarkdownDocumentConnection: vi.fn(async () => ({ + baseUrl: `http://test.local/v1/yjs/default`, + docId: document.docId, + headers: {}, + })), + openMarkdownDocumentSession: vi.fn( + async ({ document, entityUrl }: { document: any; entityUrl: string }) => { + const ydoc = createMarkdownYDoc(concatFrames(streamFrames)) + const text = markdownText(ydoc, document.yTextName) + const onUpdate = (update: Uint8Array, origin: unknown): void => { + if (origin === `server`) return + void context.appendMarkdownDocumentUpdate( + document.streamPath, + frameYjsUpdate(update) + ) + } + ydoc.on(`update`, onUpdate) + openSessions.push({ + doc: ydoc, + off: () => ydoc.off(`update`, onUpdate), + }) + const principalUrl = `/principal/entity:${encodeURIComponent( + entityUrl + )}` + const presenceName = + entityUrl.split(`/`).filter(Boolean).at(-1) ?? entityUrl + return { + document, + doc: ydoc, + text, + textName: document.yTextName, + content: () => text.toString(), + setPresence: vi.fn( + async (presence: { + anchor?: number + head?: number + clear?: boolean + }) => { + void context.appendMarkdownDocumentAwareness( + document.streamPath, + encodeMarkdownAwarenessUpdate({ + doc: ydoc, + docPath: document.docPath, + principalUrl, + clientKey: `${principalUrl}\0${entityUrl}`, + name: presenceName, + role: `agent`, + anchor: presence.anchor, + head: presence.head, + clear: presence.clear, + color: `#000000`, + colorLight: `#00000033`, + textName: document.yTextName, + }) + ) + } + ), + flush: vi.fn(async () => {}), + close: vi.fn(async () => { + ydoc.off(`update`, onUpdate) + ydoc.destroy() + }), + } + } + ), + readMarkdownDocumentStream: vi.fn( + async (_streamPath: string, opts?: { offset?: string }) => { + const offset = + opts?.offset !== undefined ? Number.parseInt(opts.offset, 10) : 0 + const start = Number.isFinite(offset) && offset >= 0 ? 
offset : 0 + return { + bytes: concatFrames(streamFrames.slice(start)), + offset: String(streamFrames.length), + } + } + ), + appendMarkdownDocumentUpdate: vi.fn( + async (_streamPath: string, update: Uint8Array) => { + streamFrames.push(update) + return { offset: String(streamFrames.length) } + } + ), + appendMarkdownDocumentAwareness: vi.fn( + async (_streamPath: string, update: Uint8Array) => { + awarenessFrames.push(update) + return {} + } + ), + registerCleanup: vi.fn((cleanup: () => void | Promise) => { + cleanupCallbacks.push(cleanup) + }), + upsertCronSchedule: vi.fn(), + upsertFutureSendSchedule: vi.fn(), + deleteSchedule: vi.fn(), + listEventSources: vi.fn(), + subscribeToEventSource: vi.fn(), + unsubscribeFromEventSource: vi.fn(), + } + return { + context, + getContent: () => contentFromStream(concatFrames(streamFrames)), + getDoc: () => createMarkdownYDoc(concatFrames(streamFrames)), + getAwarenessFrames: () => awarenessFrames, + appendExternalText: (text: string) => { + const streamBytes = concatFrames(streamFrames) + const doc = createMarkdownYDoc(streamBytes) + const yText = markdownText(doc) + const before = Y.encodeStateVector(doc) + yText.insert(yText.length, text) + const update = Y.encodeStateAsUpdate(doc, before) + streamFrames.push(frameYjsUpdate(update)) + for (const session of openSessions) { + Y.applyUpdate(session.doc, update, `server`) + } + doc.destroy() + }, + cleanup: async () => { + for (const cleanup of cleanupCallbacks) await cleanup() + for (const session of openSessions) session.off() + }, + document, + } +} + +describe(`markdown document tools`, () => { + it(`uses the optional awareness client key to distinguish same-principal editors`, () => { + const doc = new Y.Doc() + markdownText(doc).insert(0, `hello`) + const awareness = new Awareness(new Y.Doc()) + + applyFramedAwarenessUpdate( + awareness, + encodeMarkdownAwarenessUpdate({ + doc, + docPath: `agents/chat/session/documents/notes`, + principalUrl: `/principal/agent:horton`, 
+ clientKey: `/principal/agent:horton\0/chat/session`, + name: `horton`, + role: `agent`, + color: `#000000`, + colorLight: `#00000033`, + }) + ) + applyFramedAwarenessUpdate( + awareness, + encodeMarkdownAwarenessUpdate({ + doc, + docPath: `agents/chat/session/documents/notes`, + principalUrl: `/principal/agent:horton`, + clientKey: `/principal/agent:horton\0/worker/one`, + name: `worker`, + role: `agent`, + color: `#111111`, + colorLight: `#11111133`, + }) + ) + + const remoteStates = Array.from(awareness.getStates()).filter( + ([clientId]) => clientId !== awareness.clientID + ) + expect(remoteStates).toHaveLength(2) + }) + + it(`creates the server document empty and appends initial content as a Yjs update`, async () => { + const { context, getContent } = createToolContext() + const create = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `create_markdown_doc` + )! + + await create.execute(`tool-create`, { + id: `notes`, + title: `Notes`, + content: `# Created\n\nInitial content`, + }) + + expect(context.createMarkdownDocument).toHaveBeenCalledWith({ + id: `notes`, + title: `Notes`, + }) + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(1) + expect(getContent()).toBe(`# Created\n\nInitial content`) + }) + + it(`materializes and edits markdown documents through Yjs stream updates`, async () => { + const { context, getContent } = createToolContext() + const edit = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `edit_markdown_doc` + )! 
+ + const result = await edit.execute(`tool-edit`, { + id: `notes`, + old_string: `First`, + new_string: `Second`, + }) + + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(1) + expect(context.appendMarkdownDocumentAwareness).toHaveBeenCalled() + expect(getContent()).toContain(`Second line`) + expect(result.details).toMatchObject({ replacements: 1 }) + }) + + it(`reads injected markdown document refs without a local manifest entry`, async () => { + const base = createToolContext() + const { context } = createToolContext({ + manifestDocuments: [], + markdownDocs: [base.document], + entityUrl: `/worker/subagent`, + }) + const read = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `read_markdown_doc` + )! + + const result = await read.execute(`tool-read-injected`, { id: `notes` }) + + expect(context.openMarkdownDocumentSession).toHaveBeenCalledWith( + expect.objectContaining({ + document: base.document, + entityUrl: `/worker/subagent`, + }) + ) + expect((result.content[0] as { text: string }).text).toContain(`# Notes`) + }) + + it(`edits a read document and returns a diff`, async () => { + const { context, getContent } = createToolContext() + const tools = createMarkdownDocumentTools(context) + const read = tools.find((tool) => tool.name === `read_markdown_doc`)! + const edit = tools.find((tool) => tool.name === `edit_markdown_doc`)! 
+ + await read.execute(`tool-read`, { id: `notes` }) + const result = await edit.execute(`tool-edit`, { + id: `notes`, + old_string: `First line`, + new_string: `Second line`, + }) + + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(1) + expect(getContent()).toContain(`Second line`) + expect(result.details).toMatchObject({ replacements: 1 }) + expect(String((result.details as any).diff)).toContain(`Second line`) + }) + + it(`streams insert_markdown_doc content deltas before final execution`, async () => { + const { context, getContent, getDoc, getAwarenessFrames } = + createToolContext() + const insert = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `insert_markdown_doc` + )! + + await insert.onArgsDelta?.({ + toolCallId: `tool-insert`, + toolName: `insert_markdown_doc`, + delta: `"Hello`, + argsPreview: { id: `notes`, content: `Hello` }, + }) + await waitForCondition( + () => context.appendMarkdownDocumentAwareness.mock.calls.length === 1, + `expected first streamed insert presence update` + ) + expect(context.appendMarkdownDocumentAwareness).toHaveBeenCalledTimes(1) + expect( + cursorHeadIndexFromAwarenessFrame(getDoc(), getAwarenessFrames().at(-1)!) + ).toBe(getContent().length) + + await insert.onArgsDelta?.({ + toolCallId: `tool-insert`, + toolName: `insert_markdown_doc`, + delta: ` world"`, + argsPreview: { id: `notes`, content: `Hello world` }, + }) + await waitForCondition( + () => context.appendMarkdownDocumentAwareness.mock.calls.length === 2, + `expected second streamed insert presence update` + ) + expect(context.appendMarkdownDocumentAwareness).toHaveBeenCalledTimes(2) + expect( + cursorHeadIndexFromAwarenessFrame(getDoc(), getAwarenessFrames().at(-1)!) 
+ ).toBe(getContent().length) + + const result = await insert.execute(`tool-insert`, { + id: `notes`, + content: `Hello world`, + }) + + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(2) + expect(context.appendMarkdownDocumentAwareness).toHaveBeenCalledTimes(3) + expect(getContent()).toContain(`Hello world`) + expect(result.details).toMatchObject({ streamed: true }) + }) + + it(`labels streamed markdown presence with the agent entity name`, async () => { + const { context, getAwarenessFrames } = createToolContext({ + entityUrl: `/worker/expand-act-dsnosb`, + principalUrl: `/principal/system:dev-local`, + }) + const insert = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `insert_markdown_doc` + )! + + await insert.onArgsDelta?.({ + toolCallId: `tool-insert-labelled`, + toolName: `insert_markdown_doc`, + delta: `"Draft`, + argsPreview: { id: `notes`, content: `Draft` }, + }) + await waitForCondition( + () => context.appendMarkdownDocumentAwareness.mock.calls.length === 1, + `expected streamed insert presence update` + ) + + const users = usersFromAwarenessFrame(getAwarenessFrames().at(-1)!) + expect(users).toHaveLength(1) + expect(users[0]).toMatchObject({ + name: `expand-act-dsnosb`, + principalUrl: `/principal/entity:%2Fworker%2Fexpand-act-dsnosb`, + role: `agent`, + }) + }) + + it(`streams insert_markdown_doc at a saved Yjs-relative cursor`, async () => { + const { context, getContent } = createToolContext() + const tools = createMarkdownDocumentTools(context) + const setCursor = tools.find( + (tool) => tool.name === `set_markdown_doc_cursor` + )! + const insert = tools.find((tool) => tool.name === `insert_markdown_doc`)! 
+ + const cursorResult = await setCursor.execute(`tool-cursor`, { + id: `notes`, + after: `# Notes\n`, + }) + expect(cursorResult.details).toMatchObject({ + cursorSet: true, + index: `# Notes\n`.length, + }) + + await insert.onArgsDelta?.({ + toolCallId: `tool-insert-cursor`, + toolName: `insert_markdown_doc`, + delta: `"Inserted`, + argsPreview: { id: `notes`, content: `Inserted` }, + }) + await insert.onArgsDelta?.({ + toolCallId: `tool-insert-cursor`, + toolName: `insert_markdown_doc`, + delta: ` text\n"`, + argsPreview: { id: `notes`, content: `Inserted text\n` }, + }) + + await insert.execute(`tool-insert-cursor`, { + id: `notes`, + content: `Inserted text\n`, + }) + + expect(getContent()).toBe(`# Notes\nInserted text\n\nFirst line\n`) + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(2) + }) + + it(`replaces a markdown range with one delete update and one insert update`, async () => { + const { context, getContent } = createToolContext() + const replace = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `replace_markdown_doc_range` + )! + + const result = await replace.execute(`tool-replace`, { + id: `notes`, + old_string: `First line`, + content: `Replacement line`, + }) + + expect(getContent()).toBe(`# Notes\n\nReplacement line\n`) + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(2) + expect(context.appendMarkdownDocumentAwareness).toHaveBeenCalledTimes(4) + expect(result.details).toMatchObject({ + replaced: true, + deleted: `First line`, + streamed: false, + }) + }) + + it(`streams replace_markdown_doc_range replacement content at the deleted range`, async () => { + const { context, getContent, getDoc, getAwarenessFrames } = + createToolContext() + const replace = createMarkdownDocumentTools(context).find( + (tool) => tool.name === `replace_markdown_doc_range` + )! 
+ + await replace.onArgsDelta?.({ + toolCallId: `tool-stream-replace`, + toolName: `replace_markdown_doc_range`, + delta: `"Replacement`, + argsPreview: { + id: `notes`, + old_string: `First line`, + content: `Replacement`, + }, + }) + await waitForCondition( + () => context.appendMarkdownDocumentAwareness.mock.calls.length === 3, + `expected replacement delete and first streamed insert presence updates` + ) + expect(getContent()).toBe(`# Notes\n\nReplacement\n`) + expect( + cursorHeadIndexFromAwarenessFrame(getDoc(), getAwarenessFrames().at(-1)!) + ).toBe(getContent().length - 1) + + await replace.onArgsDelta?.({ + toolCallId: `tool-stream-replace`, + toolName: `replace_markdown_doc_range`, + delta: ` line"`, + argsPreview: { + id: `notes`, + old_string: `First line`, + content: `Replacement line`, + }, + }) + await waitForCondition( + () => context.appendMarkdownDocumentAwareness.mock.calls.length === 4, + `expected second streamed replacement presence update` + ) + expect(getContent()).toBe(`# Notes\n\nReplacement line\n`) + expect( + cursorHeadIndexFromAwarenessFrame(getDoc(), getAwarenessFrames().at(-1)!) + ).toBe(getContent().length - 1) + + const result = await replace.execute(`tool-stream-replace`, { + id: `notes`, + old_string: `First line`, + content: `Replacement line`, + }) + + expect(context.appendMarkdownDocumentUpdate).toHaveBeenCalledTimes(3) + expect(context.appendMarkdownDocumentAwareness).toHaveBeenCalledTimes(5) + expect(result.details).toMatchObject({ + replaced: true, + streamed: true, + deleted: `First line`, + }) + }) + + it(`refreshes a cached Yjs document from the stream before editing`, async () => { + const { context, getContent, appendExternalText } = createToolContext() + const tools = createMarkdownDocumentTools(context) + const read = tools.find((tool) => tool.name === `read_markdown_doc`)! + const edit = tools.find((tool) => tool.name === `edit_markdown_doc`)! 
+ + await read.execute(`tool-read`, { id: `notes` }) + appendExternalText(`External line\n`) + + await edit.execute(`tool-edit`, { + id: `notes`, + old_string: `External line`, + new_string: `Refreshed line`, + }) + + expect(getContent()).toContain(`Refreshed line`) + expect(context.readMarkdownDocumentStream).not.toHaveBeenCalled() + expect(context.openMarkdownDocumentSession).toHaveBeenCalledTimes(1) + }) +}) diff --git a/packages/agents-runtime/test/openai-realtime.test.ts b/packages/agents-runtime/test/openai-realtime.test.ts new file mode 100644 index 0000000000..f2a096f861 --- /dev/null +++ b/packages/agents-runtime/test/openai-realtime.test.ts @@ -0,0 +1,1004 @@ +import { Type } from '@sinclair/typebox' +import { describe, expect, it, vi } from 'vitest' +import { createOpenAIRealtimeProvider } from '../src/openai-realtime' +import type { AgentTool, RealtimeProviderEvent } from '../src/types' + +type Listener = (...args: Array) => void + +class FakeWebSocket { + static instances: Array = [] + + readonly sent: Array = [] + readonly listeners = new Map>() + + constructor( + readonly url: string, + readonly init?: unknown + ) { + FakeWebSocket.instances.push(this) + queueMicrotask(() => this.emit(`open`)) + } + + addEventListener(event: string, listener: Listener): void { + const listeners = this.listeners.get(event) ?? [] + listeners.push(listener) + this.listeners.set(event, listeners) + } + + send(data: string): void { + this.sent.push(JSON.parse(data) as unknown) + } + + close(): void { + this.emit(`close`) + } + + emit(event: string, payload?: unknown): void { + for (const listener of this.listeners.get(event) ?? 
[]) { + listener(payload) + } + } + + emitMessage(payload: unknown): void { + this.emit(`message`, { data: JSON.stringify(payload) }) + } +} + +function nextEvent(iterator: AsyncIterator) { + return iterator.next().then((result) => result.value) +} + +function responseCreateEvents(socket: FakeWebSocket): Array { + return socket.sent.filter( + (event) => + typeof event === `object` && + event !== null && + (event as { type?: unknown }).type === `response.create` + ) +} + +describe(`createOpenAIRealtimeProvider`, () => { + it(`connects over WebSocket and configures session state`, async () => { + FakeWebSocket.instances = [] + const tool: AgentTool = { + name: `lookup`, + label: `Lookup`, + description: `Look up a value`, + parameters: Type.Object({ q: Type.String() }), + execute: vi.fn(), + } + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + voice: `marin`, + reasoningEffort: `medium`, + safetyIdentifier: `user-1`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `You are Horton.`, + messages: [{ role: `user`, content: `Previous context` } as never], + tools: [tool], + audio: { + inputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + outputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + }, + }) + + const socket = FakeWebSocket.instances[0]! 
+ expect(socket.url).toBe( + `wss://api.openai.com/v1/realtime?model=gpt-realtime-2` + ) + expect(socket.init).toEqual({ + headers: { + Authorization: `Bearer sk-test`, + 'OpenAI-Safety-Identifier': `user-1`, + }, + }) + expect(socket.sent[0]).toMatchObject({ + type: `session.update`, + session: { + type: `realtime`, + model: `gpt-realtime-2`, + instructions: `You are Horton.`, + reasoning: { effort: `medium` }, + output_modalities: [`audio`], + tool_choice: `auto`, + tools: [ + { + type: `function`, + name: `lookup`, + description: `Look up a value`, + }, + ], + audio: { + input: { + format: { type: `audio/pcm`, rate: 24_000 }, + transcription: { model: `gpt-4o-mini-transcribe` }, + turn_detection: { + type: `server_vad`, + threshold: 0.55, + prefix_padding_ms: 300, + silence_duration_ms: 500, + create_response: true, + interrupt_response: true, + }, + }, + output: { + format: { type: `audio/pcm`, rate: 24_000 }, + voice: `marin`, + }, + }, + }, + }) + expect(socket.sent[1]).toEqual({ + type: `conversation.item.create`, + item: { + type: `message`, + role: `user`, + content: [{ type: `input_text`, text: `Previous context` }], + }, + }) + }) + + it(`does not send reasoning effort to non-reasoning realtime models`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + model: `gpt-realtime-1.5`, + reasoningEffort: `low`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `You are Horton.`, + messages: [], + tools: [], + audio: { + outputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + }, + }) + + const socket = FakeWebSocket.instances[0]! 
+ expect((socket.sent[0] as any).session.reasoning).toBeUndefined() + }) + + it(`requests audio output when a voice is configured without an output format`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + voice: `marin`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + + const socket = FakeWebSocket.instances[0]! + expect(socket.sent[0]).toMatchObject({ + type: `session.update`, + session: { + output_modalities: [`audio`], + audio: { + output: { + voice: `marin`, + }, + }, + }, + }) + }) + + it(`can disable input audio transcription`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + audio: { + inputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + inputTranscription: false, + }, + }) + + const socket = FakeWebSocket.instances[0]! + expect(socket.sent[0]).toMatchObject({ + session: { + audio: { + input: { + format: { type: `audio/pcm`, rate: 24_000 }, + }, + }, + }, + }) + expect( + (socket.sent[0] as any).session.audio.input.transcription + ).toBeUndefined() + }) + + it(`maps input transcription delay for low latency captions`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + audio: { + inputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + inputTranscription: { + model: `gpt-realtime-whisper`, + delay: `minimal`, + }, + }, + }) + + const socket = FakeWebSocket.instances[0]! 
+ expect(socket.sent[0]).toMatchObject({ + session: { + audio: { + input: { + transcription: { + model: `gpt-realtime-whisper`, + delay: `minimal`, + }, + }, + }, + }, + }) + }) + + it(`can disable realtime turn detection for manual audio commits`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + audio: { + inputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + turnDetection: { type: `none` }, + }, + }) + + const socket = FakeWebSocket.instances[0]! + expect(socket.sent[0]).toMatchObject({ + session: { + audio: { + input: { + turn_detection: null, + }, + }, + }, + }) + }) + + it(`maps realtime server VAD configuration`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + audio: { + inputFormat: { codec: `pcm16`, sampleRate: 24_000, channels: 1 }, + turnDetection: { + type: `server_vad`, + threshold: 0.7, + prefixPaddingMs: 250, + silenceDurationMs: 650, + createResponse: false, + interruptResponse: false, + }, + }, + }) + + const socket = FakeWebSocket.instances[0]! + expect(socket.sent[0]).toMatchObject({ + session: { + audio: { + input: { + turn_detection: { + type: `server_vad`, + threshold: 0.7, + prefix_padding_ms: 250, + silence_duration_ms: 650, + create_response: false, + interrupt_response: false, + }, + }, + }, + }, + }) + }) + + it(`sends audio input chunks as OpenAI input buffer events`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! 
+ + await session.appendInputAudio?.(new Uint8Array([1, 2, 3, 4])) + await session.clearInputAudio?.() + await session.commitInputAudio?.() + + expect(socket.sent.at(-4)).toEqual({ + type: `input_audio_buffer.append`, + audio: `AQIDBA==`, + }) + expect(socket.sent.at(-3)).toEqual({ type: `input_audio_buffer.clear` }) + expect(socket.sent.at(-2)).toEqual({ type: `input_audio_buffer.commit` }) + expect(socket.sent.at(-1)).toEqual({ type: `response.create` }) + }) + + it(`queues text-triggered response creation while a response is active`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! + const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ type: `response.created`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.started`, + responseId: `resp-1`, + }) + + await session.sendText?.(`worker finished`) + expect(socket.sent.at(-1)).toMatchObject({ + type: `conversation.item.create`, + item: { + type: `message`, + role: `user`, + content: [{ type: `input_text`, text: `worker finished` }], + }, + }) + expect(socket.sent).not.toContainEqual({ type: `response.create` }) + + socket.emitMessage({ type: `response.done`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.completed`, + responseId: `resp-1`, + }) + expect(socket.sent.at(-1)).toEqual({ type: `response.create` }) + }) + + it(`keeps queued response creation guarded against stale completions`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + 
const socket = FakeWebSocket.instances[0]! + const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ type: `response.created`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.started`, + responseId: `resp-1`, + }) + + await session.sendText?.(`first queued`) + expect(responseCreateEvents(socket)).toHaveLength(0) + + socket.emitMessage({ type: `response.done`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.completed`, + responseId: `resp-1`, + }) + expect(responseCreateEvents(socket)).toHaveLength(1) + + socket.emitMessage({ type: `response.done`, response: { id: `resp-1` } }) + await session.sendText?.(`second queued`) + expect(responseCreateEvents(socket)).toHaveLength(1) + + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.completed`, + responseId: `resp-1`, + }) + socket.emitMessage({ type: `response.created`, response: { id: `resp-2` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.started`, + responseId: `resp-2`, + }) + + socket.emitMessage({ type: `response.done`, response: { id: `resp-2` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.completed`, + responseId: `resp-2`, + }) + expect(responseCreateEvents(socket)).toHaveLength(2) + }) + + it(`normalizes audio input chunks before appending them`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! 
+ + await session.appendInputAudio?.(new Uint8Array()) + await session.appendInputAudio?.(new Uint8Array([1])) + await session.appendInputAudio?.(new Uint8Array([1, 2, 3])) + + const large = new Uint8Array(32 * 1024 + 4) + large.fill(7) + await session.appendInputAudio?.(large) + + const appendEvents = socket.sent.filter( + (event): event is { type: string; audio: string } => + typeof event === `object` && + event !== null && + (event as { type?: unknown }).type === `input_audio_buffer.append` + ) + expect(appendEvents).toHaveLength(3) + expect(appendEvents[0]!.audio).toBe(`AQI=`) + expect(Buffer.from(appendEvents[1]!.audio, `base64`)).toHaveLength( + 32 * 1024 + ) + expect(Buffer.from(appendEvents[2]!.audio, `base64`)).toHaveLength(4) + }) + + it(`unblocks the event stream when the run signal aborts`, async () => { + FakeWebSocket.instances = [] + const controller = new AbortController() + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + signal: controller.signal, + }) + const iterator = session.events[Symbol.asyncIterator]() + + controller.abort() + + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `session.closed`, + reason: `aborted`, + }) + }) + + it(`surfaces unexpected WebSocket closes as provider errors`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! 
+ const iterator = session.events[Symbol.asyncIterator]() + + socket.emit(`close`, { code: 1008, reason: `invalid model` }) + + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `session.error`, + code: `websocket_closed`, + error: + `OpenAI realtime WebSocket closed before client stop ` + + `code=1008 reason=invalid model`, + }) + }) + + it(`can truncate output audio for interrupted playback`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! + + await session.truncateOutputAudio?.({ + itemId: `item-1`, + audioEndMs: 320, + }) + + expect(socket.sent.at(-1)).toEqual({ + type: `conversation.item.truncate`, + item_id: `item-1`, + content_index: 0, + audio_end_ms: 320, + }) + }) + + it(`maps GA output audio and transcript events`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! 
+ const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ + type: `response.output_audio.delta`, + response_id: `resp-1`, + item_id: `item-1`, + delta: `AQID`, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `output_audio.delta`, + responseId: `resp-1`, + itemId: `item-1`, + audio: new Uint8Array([1, 2, 3]), + }) + + socket.emitMessage({ + type: `response.output_audio_transcript.delta`, + response_id: `resp-1`, + item_id: `item-1`, + content_index: 0, + delta: `hello`, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `output_transcript.delta`, + responseId: `resp-1`, + itemId: `item-1`, + contentIndex: 0, + transcriptSource: `response.output_audio_transcript`, + delta: `hello`, + }) + + socket.emitMessage({ + type: `response.output_audio.done`, + response_id: `resp-1`, + item_id: `item-1`, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `output_audio.completed`, + responseId: `resp-1`, + itemId: `item-1`, + }) + }) + + it(`maps GA input audio transcript events`, async () => { + FakeWebSocket.instances = [] + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [], + }) + const socket = FakeWebSocket.instances[0]! 
+ const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ + type: `input_audio_buffer.speech_started`, + item_id: `item-1`, + audio_start_ms: 120, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `input_audio.speech_started`, + turnId: `item-1`, + audioOffset: `120`, + }) + + socket.emitMessage({ + type: `input_audio_buffer.speech_stopped`, + item_id: `item-1`, + audio_end_ms: 860, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `input_audio.speech_stopped`, + turnId: `item-1`, + audioOffset: `860`, + }) + + socket.emitMessage({ + type: `input_audio_buffer.committed`, + item_id: `item-1`, + previous_item_id: `previous-item`, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `input_audio.committed`, + turnId: `item-1`, + previousTurnId: `previous-item`, + }) + + socket.emitMessage({ + type: `conversation.item.input_audio_transcription.delta`, + item_id: `item-1`, + delta: `hello`, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `input_transcript.delta`, + turnId: `item-1`, + delta: `hello`, + }) + + socket.emitMessage({ + type: `conversation.item.input_audio_transcription.completed`, + item_id: `item-1`, + transcript: `hello there`, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `input_transcript.completed`, + turnId: `item-1`, + text: `hello there`, + }) + }) + + it(`maps OpenAI events and executes function calls`, async () => { + FakeWebSocket.instances = [] + const execute = vi.fn().mockResolvedValue({ + content: [{ type: `text`, text: `done` }], + details: { ok: true }, + }) + const tool: AgentTool = { + name: `lookup`, + label: `Lookup`, + description: `Look up a value`, + parameters: Type.Object({ q: Type.String() }), + execute, + } + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [tool], + }) 
+ const socket = FakeWebSocket.instances[0]! + const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ type: `session.created`, session: { id: `sess-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `session.started`, + sessionId: `sess-1`, + }) + + socket.emitMessage({ + type: `response.output_item.added`, + item: { + type: `function_call`, + id: `fc-1`, + call_id: `call-1`, + name: `lookup`, + }, + }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `tool_call.started`, + toolCallId: `call-1`, + name: `lookup`, + }) + + socket.emitMessage({ + type: `response.function_call_arguments.done`, + call_id: `call-1`, + name: `lookup`, + arguments: JSON.stringify({ q: `status` }), + }) + + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `tool_call.arguments_completed`, + toolCallId: `call-1`, + name: `lookup`, + args: { q: `status` }, + }) + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.completed`, + toolCallId: `call-1`, + name: `lookup`, + }) + expect(execute).toHaveBeenCalledWith(`call-1`, { q: `status` }, undefined) + expect(socket.sent.at(-2)).toMatchObject({ + type: `conversation.item.create`, + item: { + type: `function_call_output`, + call_id: `call-1`, + }, + }) + expect(socket.sent.at(-1)).toEqual({ type: `response.create` }) + }) + + it(`runs realtime sequential tools one at a time`, async () => { + FakeWebSocket.instances = [] + type ToolResult = { + content: Array<{ type: `text`; text: string }> + details: Record + } + let releaseFirst!: (result: ToolResult) => void + const firstResult = new Promise((resolve) => { + releaseFirst = resolve + }) + let markFirstStarted!: () => void + const firstStarted = new Promise((resolve) => { + markFirstStarted = resolve + }) + let markSecondStarted!: () => void + const secondStarted = new Promise((resolve) => { + markSecondStarted = resolve + }) + let secondStartedFlag = false + const execute = vi.fn((toolCallId: 
string) => { + if (toolCallId === `call-1`) { + markFirstStarted() + return firstResult + } + secondStartedFlag = true + markSecondStarted() + return Promise.resolve({ + content: [{ type: `text`, text: `second done` }], + details: { ok: true }, + } satisfies ToolResult) + }) + const tool: AgentTool = { + name: `lookup`, + label: `Lookup`, + description: `Look up a value`, + parameters: Type.Object({ q: Type.String() }), + executionMode: `sequential`, + execute, + } + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [tool], + }) + const socket = FakeWebSocket.instances[0]! + const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ + type: `response.function_call_arguments.done`, + call_id: `call-1`, + name: `lookup`, + arguments: JSON.stringify({ q: `first` }), + }) + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.arguments_completed`, + toolCallId: `call-1`, + }) + await firstStarted + + socket.emitMessage({ + type: `response.function_call_arguments.done`, + call_id: `call-2`, + name: `lookup`, + arguments: JSON.stringify({ q: `second` }), + }) + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.arguments_completed`, + toolCallId: `call-2`, + }) + + expect(execute).toHaveBeenCalledTimes(1) + expect(secondStartedFlag).toBe(false) + + releaseFirst({ + content: [{ type: `text`, text: `first done` }], + details: { ok: true }, + }) + + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.completed`, + toolCallId: `call-1`, + }) + await secondStarted + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.completed`, + toolCallId: `call-2`, + }) + expect(execute.mock.calls.map((call) => call[0])).toEqual([ + `call-1`, + `call-2`, + ]) + }) + + it(`queues tool-result response creation while a 
response is active`, async () => { + FakeWebSocket.instances = [] + const execute = vi.fn().mockResolvedValue({ + content: [{ type: `text`, text: `done` }], + details: { ok: true }, + }) + const tool: AgentTool = { + name: `lookup`, + label: `Lookup`, + description: `Look up a value`, + parameters: Type.Object({ q: Type.String() }), + execute, + } + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [tool], + }) + const socket = FakeWebSocket.instances[0]! + const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ type: `response.created`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.started`, + responseId: `resp-1`, + }) + + socket.emitMessage({ + type: `response.function_call_arguments.done`, + call_id: `call-1`, + name: `lookup`, + arguments: JSON.stringify({ q: `status` }), + }) + + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.arguments_completed`, + toolCallId: `call-1`, + }) + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.completed`, + toolCallId: `call-1`, + }) + expect(socket.sent.at(-1)).toMatchObject({ + type: `conversation.item.create`, + item: { + type: `function_call_output`, + call_id: `call-1`, + }, + }) + expect(socket.sent).not.toContainEqual({ type: `response.create` }) + + socket.emitMessage({ type: `response.done`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.completed`, + responseId: `resp-1`, + }) + expect(socket.sent.at(-1)).toEqual({ type: `response.create` }) + }) + + it(`does not send tool results for a cancelled response`, async () => { + FakeWebSocket.instances = [] + let resolveTool: (value: { + content: Array<{ type: `text`; text: string }> + details: Record + }) => void = () => undefined + 
const execute = vi.fn( + () => + new Promise<{ + content: Array<{ type: `text`; text: string }> + details: Record + }>((resolve) => { + resolveTool = resolve + }) + ) + const tool: AgentTool = { + name: `lookup`, + label: `Lookup`, + description: `Look up a value`, + parameters: Type.Object({ q: Type.String() }), + execute, + } + const provider = createOpenAIRealtimeProvider({ + apiKey: `sk-test`, + WebSocket: FakeWebSocket, + }) + + const session = await provider.connect({ + systemPrompt: `Talk`, + messages: [], + tools: [tool], + }) + const socket = FakeWebSocket.instances[0]! + const iterator = session.events[Symbol.asyncIterator]() + + socket.emitMessage({ type: `response.created`, response: { id: `resp-1` } }) + await expect(nextEvent(iterator)).resolves.toEqual({ + type: `response.started`, + responseId: `resp-1`, + }) + + socket.emitMessage({ + type: `response.function_call_arguments.done`, + call_id: `call-1`, + name: `lookup`, + arguments: JSON.stringify({ q: `status` }), + }) + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.arguments_completed`, + toolCallId: `call-1`, + }) + expect(execute).toHaveBeenCalledWith(`call-1`, { q: `status` }, undefined) + + await session.cancelResponse?.() + resolveTool({ content: [{ type: `text`, text: `done` }], details: {} }) + + await expect(nextEvent(iterator)).resolves.toMatchObject({ + type: `tool_call.completed`, + toolCallId: `call-1`, + }) + expect(socket.sent).toContainEqual({ type: `response.cancel` }) + expect(socket.sent).not.toEqual( + expect.arrayContaining([ + expect.objectContaining({ + type: `conversation.item.create`, + item: expect.objectContaining({ + type: `function_call_output`, + call_id: `call-1`, + }), + }), + ]) + ) + }) +}) diff --git a/packages/agents-runtime/test/outbound-bridge.test.ts b/packages/agents-runtime/test/outbound-bridge.test.ts index 62fb5d75b3..a999e16d5f 100644 --- a/packages/agents-runtime/test/outbound-bridge.test.ts +++ 
b/packages/agents-runtime/test/outbound-bridge.test.ts @@ -101,6 +101,168 @@ describe(`createOutboundBridge`, () => { expect((writes[1]!.value as Record).run_id).toBe(`run-0`) }) + it(`persists streaming tool call argument deltas`, () => { + const writes: Array = [] + const bridge = createOutboundBridge([], (e) => { + writes.push(e) + }) + + bridge.onRunStart() + bridge.onToolCallArgsStart(`call-draft`, `draft`, { text: `He` }) + bridge.onToolCallArgsDelta(`call-draft`, `draft`, `llo`, { + contentIndex: 1, + argsPreview: { text: `Hello` }, + }) + bridge.onToolCallArgsEnd(`call-draft`, `draft`, { text: `Hello` }) + + expect(writes[1]).toMatchObject({ + type: `tool_call`, + key: `tc-0`, + headers: { operation: `insert` }, + value: { + tool_call_id: `call-draft`, + tool_name: `draft`, + status: `started`, + args_preview: { text: `He` }, + run_id: `run-0`, + }, + }) + expect(writes[2]).toMatchObject({ + type: `tool_call`, + key: `tc-0`, + headers: { operation: `update` }, + value: { + tool_call_id: `call-draft`, + tool_name: `draft`, + status: `args_streaming`, + args_preview: { text: `Hello` }, + run_id: `run-0`, + }, + }) + expect(writes[3]).toMatchObject({ + type: `tool_arg_delta`, + key: `tc-0:args-0`, + value: { + tool_call_key: `tc-0`, + tool_call_id: `call-draft`, + run_id: `run-0`, + seq: 0, + delta: `llo`, + content_index: 1, + }, + }) + expect(writes[4]).toMatchObject({ + type: `tool_call`, + key: `tc-0`, + headers: { operation: `update` }, + value: { + tool_call_id: `call-draft`, + tool_name: `draft`, + status: `args_complete`, + args: { text: `Hello` }, + run_id: `run-0`, + }, + }) + }) + + it(`transitions a streamed tool call to executing before completion`, () => { + const writes: Array = [] + const bridge = createOutboundBridge([], (e) => { + writes.push(e) + }) + + bridge.onRunStart() + bridge.onToolCallArgsStart(`call-draft`, `draft`, { text: `He` }) + bridge.onToolCallArgsDelta(`call-draft`, `draft`, `llo`, { + argsPreview: { text: `Hello` }, + }) + 
bridge.onToolCallArgsEnd(`call-draft`, `draft`, { text: `Hello` }) + bridge.onToolCallStart(`call-draft`, `draft`, { text: `Hello` }) + bridge.onToolCallEnd(`call-draft`, `draft`, `ok`, false) + + expect(writes[5]).toMatchObject({ + type: `tool_call`, + key: `tc-0`, + headers: { operation: `update` }, + value: { + tool_call_id: `call-draft`, + tool_name: `draft`, + status: `executing`, + args: { text: `Hello` }, + run_id: `run-0`, + }, + }) + expect(writes[6]).toMatchObject({ + type: `tool_call`, + key: `tc-0`, + value: { + status: `completed`, + args: { text: `Hello` }, + result: `ok`, + }, + }) + }) + + it(`creates a streaming tool call when a delta arrives before start`, () => { + const writes: Array = [] + const bridge = createOutboundBridge([], (e) => { + writes.push(e) + }) + + bridge.onRunStart() + bridge.onToolCallArgsDelta(`call-draft`, `draft`, `He`, { + argsPreview: { text: `He` }, + }) + + expect(writes[1]).toMatchObject({ + type: `tool_call`, + key: `tc-0`, + headers: { operation: `insert` }, + value: { + tool_call_id: `call-draft`, + tool_name: `draft`, + status: `args_streaming`, + args_preview: { text: `He` }, + }, + }) + expect(writes[2]).toMatchObject({ + type: `tool_arg_delta`, + key: `tc-0:args-0`, + value: { + tool_call_key: `tc-0`, + tool_call_id: `call-draft`, + seq: 0, + delta: `He`, + }, + }) + }) + + it(`keeps legacy synthetic tool ids distinct from provider ids`, () => { + const writes: Array = [] + const bridge = createOutboundBridge([], (e) => { + writes.push(e) + }) + + bridge.onRunStart() + bridge.onToolCallArgsStart(`tc-0`, `provider`, {}) + bridge.onToolCallStart(`legacy`, {}) + + expect(writes[1]).toMatchObject({ + key: `tc-0`, + value: { + tool_call_id: `tc-0`, + tool_name: `provider`, + }, + }) + expect(writes[2]).toMatchObject({ + key: `tc-1`, + value: { + tool_call_id: `legacy-tc-1`, + tool_name: `legacy`, + }, + }) + }) + it(`maps tool_call_end to tool_call update with result`, () => { const writes: Array = [] const bridge = 
createOutboundBridge([], (e) => { diff --git a/packages/agents-runtime/test/pi-adapter.test.ts b/packages/agents-runtime/test/pi-adapter.test.ts index 1424d7e29c..6fa8e04ca4 100644 --- a/packages/agents-runtime/test/pi-adapter.test.ts +++ b/packages/agents-runtime/test/pi-adapter.test.ts @@ -7,9 +7,8 @@ import { import { createAssistantMessageEventStream } from '@mariozechner/pi-ai' import { Type } from '@sinclair/typebox' import type { OutboundIdSeed } from '../src/outbound-bridge' -import type { LLMMessage } from '../src/types' +import type { AgentTool, LLMMessage } from '../src/types' import type { ChangeEvent } from '@durable-streams/state' -import type { AgentTool } from '@mariozechner/pi-agent-core' import type { AssistantMessage, Model, @@ -570,6 +569,387 @@ describe(`createPiAgentAdapter`, () => { ) }) + it(`dispatches streamed tool call arguments to the bridge and tool hook`, async () => { + let streamReadyResolve: + | ((stream: ReturnType) => void) + | null = null + const streamReady = new Promise< + ReturnType + >((resolve) => { + streamReadyResolve = resolve + }) + const partialMessage: AssistantMessage = { + role: `assistant`, + content: [ + { + type: `toolCall`, + id: `call-draft`, + name: `draft`, + arguments: { text: `Hello` }, + }, + ], + api: `anthropic-messages`, + provider: `anthropic`, + model: `claude-sonnet-4-5-20250929`, + usage: { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + totalTokens: 0, + cost: { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + total: 0, + }, + }, + stopReason: `toolUse`, + timestamp: Date.now(), + } + const completedMessage: AssistantMessage = { + ...partialMessage, + content: [{ type: `text`, text: `` }], + stopReason: `stop`, + } + const argDeltas: Array = [] + const signals: Array = [] + const events: Array = [] + const controller = new AbortController() + const factory = createPiAgentAdapter({ + systemPrompt: `Test system prompt`, + model: `claude-sonnet-4-5-20250929`, + tools: [ + { + 
name: `draft`, + label: `Draft`, + description: `Draft text`, + parameters: { + type: `object`, + properties: { text: { type: `string` } }, + required: [`text`], + } as never, + onArgsDelta: (context, signal) => { + argDeltas.push(context) + signals.push(signal) + }, + execute: async () => ({ + content: [{ type: `text`, text: `ok` }], + details: null, + }), + }, + ], + streamFn: () => { + const stream = createAssistantMessageEventStream() + streamReadyResolve?.(stream) + return stream + }, + }) + const handle = factory({ + entityUrl: `test/entity-1`, + epoch: 1, + messages: [], + outboundIdSeed: { run: 0, step: 0, msg: 0, tc: 0, reasoning: 0 }, + writeEvent: (event: ChangeEvent) => { + events.push(event) + }, + }) + + const runPromise = handle.run(`hello`, controller.signal) + const stream = await streamReady + stream.push({ + type: `start`, + partial: partialMessage, + }) + stream.push({ + type: `toolcall_start`, + contentIndex: 0, + partial: partialMessage, + }) + stream.push({ + type: `toolcall_delta`, + contentIndex: 0, + delta: `"Hello"`, + partial: partialMessage, + }) + stream.push({ + type: `toolcall_end`, + contentIndex: 0, + toolCall: partialMessage.content[0] as never, + partial: partialMessage, + }) + stream.push({ + type: `done`, + reason: `stop`, + message: completedMessage, + }) + await runPromise + + expect(argDeltas).toEqual([ + { + toolCallId: `call-draft`, + toolName: `draft`, + contentIndex: 0, + delta: `"Hello"`, + argsPreview: { text: `Hello` }, + }, + ]) + expect(signals).toEqual([controller.signal]) + expect(events).toContainEqual( + expect.objectContaining({ + type: `tool_arg_delta`, + value: expect.objectContaining({ + tool_call_id: `call-draft`, + delta: `"Hello"`, + content_index: 0, + }), + }) + ) + }) + + it(`serializes streamed tool argument hooks before tool execution`, async () => { + let streamReadyResolve: + | ((stream: ReturnType) => void) + | null = null + const streamReady = new Promise< + ReturnType + >((resolve) => { + 
streamReadyResolve = resolve + }) + let releaseFirstDelta!: () => void + const firstDeltaBarrier = new Promise((resolve) => { + releaseFirstDelta = resolve + }) + const usage = { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + totalTokens: 0, + cost: { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + total: 0, + }, + } + const toolCallMessage: AssistantMessage = { + role: `assistant`, + content: [ + { + type: `toolCall`, + id: `call-draft`, + name: `draft`, + arguments: { text: `AB` }, + }, + ], + api: `anthropic-messages`, + provider: `anthropic`, + model: `claude-sonnet-4-5-20250929`, + usage, + stopReason: `toolUse`, + timestamp: Date.now(), + } + const completedMessage: AssistantMessage = { + ...toolCallMessage, + content: [{ type: `text`, text: `done` }], + stopReason: `stop`, + } + const order: Array = [] + let streamCount = 0 + const factory = createPiAgentAdapter({ + systemPrompt: `Test system prompt`, + model: `claude-sonnet-4-5-20250929`, + tools: [ + { + name: `draft`, + label: `Draft`, + description: `Draft text`, + parameters: Type.Object({ text: Type.String() }), + onArgsDelta: async ({ delta }) => { + order.push(`start:${delta}`) + if (delta === `A`) { + await firstDeltaBarrier + } + order.push(`end:${delta}`) + }, + execute: async () => { + order.push(`execute`) + return { + content: [{ type: `text`, text: `ok` }], + details: null, + } + }, + }, + ], + streamFn: () => { + const stream = createAssistantMessageEventStream() + streamCount++ + if (streamCount === 1) { + streamReadyResolve?.(stream) + } else { + queueMicrotask(() => stream.end(completedMessage)) + } + return stream + }, + }) + const handle = factory({ + entityUrl: `test/entity-1`, + epoch: 1, + messages: [], + outboundIdSeed: { run: 0, step: 0, msg: 0, tc: 0, reasoning: 0 }, + writeEvent: (_event: ChangeEvent) => {}, + }) + + const runPromise = handle.run(`hello`) + const stream = await streamReady + stream.push({ type: `start`, partial: toolCallMessage }) + 
stream.push({ + type: `toolcall_start`, + contentIndex: 0, + partial: toolCallMessage, + }) + stream.push({ + type: `toolcall_delta`, + contentIndex: 0, + delta: `A`, + partial: toolCallMessage, + }) + stream.push({ + type: `toolcall_delta`, + contentIndex: 0, + delta: `B`, + partial: toolCallMessage, + }) + stream.push({ + type: `toolcall_end`, + contentIndex: 0, + toolCall: toolCallMessage.content[0] as never, + partial: toolCallMessage, + }) + stream.push({ + type: `done`, + reason: `toolUse`, + message: toolCallMessage, + }) + + await new Promise((resolve) => setTimeout(resolve, 0)) + expect(order).toEqual([`start:A`]) + + releaseFirstDelta() + await runPromise + + expect(order).toEqual([`start:A`, `end:A`, `start:B`, `end:B`, `execute`]) + }) + + it(`continues tool execution after a streamed argument hook rejects`, async () => { + let streamReadyResolve: + | ((stream: ReturnType) => void) + | null = null + const streamReady = new Promise< + ReturnType + >((resolve) => { + streamReadyResolve = resolve + }) + const usage = { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + totalTokens: 0, + cost: { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + total: 0, + }, + } + const toolCallMessage: AssistantMessage = { + role: `assistant`, + content: [ + { + type: `toolCall`, + id: `call-draft`, + name: `draft`, + arguments: { text: `A` }, + }, + ], + api: `anthropic-messages`, + provider: `anthropic`, + model: `claude-sonnet-4-5-20250929`, + usage, + stopReason: `toolUse`, + timestamp: Date.now(), + } + const completedMessage: AssistantMessage = { + ...toolCallMessage, + content: [{ type: `text`, text: `done` }], + stopReason: `stop`, + } + let executed = false + let streamCount = 0 + const factory = createPiAgentAdapter({ + systemPrompt: `Test system prompt`, + model: `claude-sonnet-4-5-20250929`, + tools: [ + { + name: `draft`, + label: `Draft`, + description: `Draft text`, + parameters: Type.Object({ text: Type.String() }), + onArgsDelta: async 
() => { + throw new Error(`hook failed`) + }, + execute: async () => { + executed = true + return { + content: [{ type: `text`, text: `ok` }], + details: null, + } + }, + }, + ], + streamFn: () => { + const stream = createAssistantMessageEventStream() + streamCount++ + if (streamCount === 1) { + streamReadyResolve?.(stream) + } else { + queueMicrotask(() => stream.end(completedMessage)) + } + return stream + }, + }) + const handle = factory({ + entityUrl: `test/entity-1`, + epoch: 1, + messages: [], + outboundIdSeed: { run: 0, step: 0, msg: 0, tc: 0, reasoning: 0 }, + writeEvent: (_event: ChangeEvent) => {}, + }) + + const runPromise = handle.run(`hello`) + const stream = await streamReady + stream.push({ type: `start`, partial: toolCallMessage }) + stream.push({ + type: `toolcall_delta`, + contentIndex: 0, + delta: `A`, + partial: toolCallMessage, + }) + stream.push({ + type: `done`, + reason: `toolUse`, + message: toolCallMessage, + }) + + await expect(runPromise).resolves.toBeUndefined() + expect(executed).toBe(true) + }) + it(`isRunning returns false initially`, () => { const factory = createPiAgentAdapter({ systemPrompt: `Test system prompt`, diff --git a/packages/agents-runtime/test/process-wake.test.ts b/packages/agents-runtime/test/process-wake.test.ts index 1db85bcb25..57dbd71909 100644 --- a/packages/agents-runtime/test/process-wake.test.ts +++ b/packages/agents-runtime/test/process-wake.test.ts @@ -427,6 +427,19 @@ const githubWebhookSourceContract: WebhookSourceContract = { ], } +async function waitFor( + predicate: () => boolean, + timeoutMs = 500 +): Promise { + const deadline = Date.now() + timeoutMs + while (!predicate()) { + if (Date.now() > deadline) { + throw new Error(`timed out waiting for condition`) + } + await new Promise((resolve) => setTimeout(resolve, 5)) + } +} + // --------------------------------------------------------------------------- // Tests // --------------------------------------------------------------------------- @@ -1976,6 
+1989,111 @@ describe(`processWake`, () => { ]) }) + it(`routes live wake batches to an active realtime session before the handler returns`, async () => { + const sentTexts: Array = [] + let resolveConnected!: () => void + const connected = new Promise((resolve) => { + resolveConnected = resolve + }) + let closeProvider!: () => void + const providerClosed = new Promise((resolve) => { + closeProvider = resolve + }) + + defineEntity(`test-agent`, { + handler: async (ctx) => { + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + async connect() { + resolveConnected() + return { + events: (async function* () { + yield { type: `session.started` as const } + await providerClosed + yield { type: `session.closed` as const } + })(), + sendText: async (text) => { + sentTexts.push(text) + }, + } + }, + }, + tools: [], + context: { includeTimeline: false }, + }) + await realtime.run() + }, + }) + + const wakePromise = processWake( + makeNotification({ triggerEvent: `inbox` }), + BASE_CONFIG + ) + await connected + + mockEntityOnBatch.current?.({ + items: [ + ev( + `wake`, + `wake-worker-one`, + `insert`, + { + source: `/worker/one`, + timeout: false, + changes: [], + finished_child: { + url: `/worker/one`, + type: `worker`, + run_status: `completed`, + response: `The markdown document is ready.`, + }, + }, + { offset: `11_000` } + ), + ev( + `wake`, + `wake-worker-two`, + `insert`, + { + source: `/worker/two`, + timeout: false, + changes: [], + finished_child: { + url: `/worker/two`, + type: `worker`, + run_status: `completed`, + response: `The second markdown section is ready.`, + }, + }, + { offset: `11_001` } + ), + ], + offset: `11_001`, + }) + + await waitFor(() => sentTexts.length === 1) + expect(sentTexts[0]).toContain(`live Electric Agents notification`) + expect(sentTexts[0]).toContain(`The markdown document is ready.`) + expect(sentTexts[0]).toContain(`The second markdown section is ready.`) 
+ + closeProvider() + await wakePromise + + const doneCalls = fetchMock.mock.calls.filter(([url]) => + String(url).includes(`/_electric/wakes/wake-abc`) + ) + const lastDoneCall = doneCalls[doneCalls.length - 1]! + const body = JSON.parse(lastDoneCall[1]!.body as string) as { + acks: Array<{ path: string; offset: string }> + } + expect(body.acks).toEqual([ + { path: `/streams/entity:agent-1`, offset: `11_001` }, + ]) + }) + it(`processes a fresh message that arrives during idle after a management-only catch-up wake`, async () => { const wakePayloads: Array = [] diff --git a/packages/agents-runtime/test/realtime-context.test.ts b/packages/agents-runtime/test/realtime-context.test.ts new file mode 100644 index 0000000000..a294dab27a --- /dev/null +++ b/packages/agents-runtime/test/realtime-context.test.ts @@ -0,0 +1,1312 @@ +import { beforeEach, describe, expect, it, vi } from 'vitest' +import { createTestRealtimeProvider } from '../src/realtime' +import { + buildStreamFixture, + createTestHandlerContext, +} from './helpers/context-test-helpers' +import type { ChangeEvent } from '@durable-streams/state' +import type { WakeEvent } from '../src/types' + +const durableMock = vi.hoisted(() => { + type StreamSource = Iterable | AsyncIterable + const appends: Array<{ url: string; data: unknown }> = [] + const bodyStreams = new Map>() + const jsonStreams = new Map>() + class DurableStream { + constructor(readonly opts: { url: string }) {} + + async append(data: unknown): Promise { + appends.push({ url: this.opts.url, data }) + } + + async stream() { + const url = this.opts.url + return { + bodyStream: async function* () { + for await (const chunk of bodyStreams.get(url) ?? []) { + yield chunk + } + }, + jsonStream: async function* () { + for await (const event of jsonStreams.get(url) ?? 
[]) { + yield event + } + }, + cancel: vi.fn(), + } + } + } + + return { appends, bodyStreams, jsonStreams, DurableStream } +}) + +vi.mock(`@durable-streams/client`, () => ({ + DurableStream: durableMock.DurableStream, +})) + +describe(`ctx.useRealtime()`, () => { + beforeEach(() => { + durableMock.appends.length = 0 + durableMock.bodyStreams.clear() + durableMock.jsonStreams.clear() + }) + + it(`records provider transcript output as realtime transcript rows`, async () => { + const { ctx } = createTestHandlerContext() + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ response: `hello from voice` }), + tools: [], + }) + + await realtime.run() + + expect(ctx.db.collections.runs.toArray).toMatchObject([ + { key: `run-0`, status: `completed`, finish_reason: `stop` }, + ]) + expect(ctx.db.collections.steps.toArray).toMatchObject([ + { + key: `step-0`, + run_id: `run-0`, + model_provider: `test`, + model_id: `test-realtime`, + status: `completed`, + finish_reason: `stop`, + }, + ]) + expect(ctx.db.collections.textDeltas.toArray).toMatchObject([ + { + key: `realtime-transcript:ephemeral:output:fallback-0:delta-0`, + text_id: `realtime-transcript:ephemeral:output:fallback-0`, + realtime_transcript_id: `realtime-transcript:ephemeral:output:fallback-0`, + delta: `hello from voice`, + }, + ]) + expect(ctx.db.collections.realtimeTranscripts.toArray).toMatchObject([ + { + direction: `output`, + text: `hello from voice`, + status: `final`, + }, + ]) + }) + + it(`persists realtime input and output transcripts`, async () => { + const { ctx } = createTestHandlerContext() + const transcriptEvents: Array<{ + direction: `input` | `output` + text: string + status: `partial` | `final` + turnId?: string + responseId?: string + }> = [] + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started`, sessionId: `provider-session` }, 
+ { + type: `input_transcript.delta`, + delta: `hel`, + turnId: `input-item-1`, + }, + { + type: `input_transcript.delta`, + delta: `lo`, + turnId: `input-item-1`, + }, + { + type: `input_transcript.completed`, + text: `hello there`, + turnId: `input-item-1`, + }, + { + type: `output_transcript.delta`, + delta: `Hi`, + responseId: `resp-1`, + }, + { + type: `output_transcript.completed`, + text: `Hi there`, + responseId: `resp-1`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + onTranscript: (event) => { + transcriptEvents.push(event) + }, + }) + + await realtime.run() + + expect(ctx.db.collections.realtimeTranscripts.toArray).toMatchObject([ + { + key: `realtime-transcript:provider-session:input:input-item-1`, + session_id: `provider-session`, + direction: `input`, + text: `hello there`, + status: `final`, + turn_id: `input-item-1`, + audio_stream: `input`, + created_at: expect.any(String), + }, + { + key: `realtime-transcript:provider-session:output:resp-1`, + session_id: `provider-session`, + direction: `output`, + text: `Hi there`, + status: `final`, + response_id: `resp-1`, + audio_stream: `output`, + created_at: expect.any(String), + }, + ]) + expect(ctx.db.collections.textDeltas.toArray).toMatchObject([ + { + key: `realtime-transcript:provider-session:input:input-item-1:delta-0`, + text_id: `realtime-transcript:provider-session:input:input-item-1`, + realtime_transcript_id: `realtime-transcript:provider-session:input:input-item-1`, + delta: `hel`, + }, + { + key: `realtime-transcript:provider-session:input:input-item-1:delta-1`, + text_id: `realtime-transcript:provider-session:input:input-item-1`, + realtime_transcript_id: `realtime-transcript:provider-session:input:input-item-1`, + delta: `lo`, + }, + { + key: `realtime-transcript:provider-session:input:input-item-1:delta-2`, + text_id: `realtime-transcript:provider-session:input:input-item-1`, + realtime_transcript_id: `realtime-transcript:provider-session:input:input-item-1`, + delta: ` 
there`, + }, + { + key: `realtime-transcript:provider-session:output:resp-1:delta-0`, + text_id: `realtime-transcript:provider-session:output:resp-1`, + realtime_transcript_id: `realtime-transcript:provider-session:output:resp-1`, + delta: `Hi`, + }, + { + key: `realtime-transcript:provider-session:output:resp-1:delta-1`, + text_id: `realtime-transcript:provider-session:output:resp-1`, + realtime_transcript_id: `realtime-transcript:provider-session:output:resp-1`, + delta: ` there`, + }, + ]) + expect(transcriptEvents).toEqual( + expect.arrayContaining([ + expect.objectContaining({ + direction: `input`, + text: `hello there`, + status: `final`, + turnId: `input-item-1`, + }), + expect.objectContaining({ + direction: `output`, + text: `Hi there`, + status: `final`, + responseId: `resp-1`, + }), + ]) + ) + }) + + it(`uses one output transcript source family per response`, async () => { + const { ctx } = createTestHandlerContext() + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started`, sessionId: `provider-session` }, + { + type: `output_transcript.delta`, + delta: `Text duplicate`, + responseId: `resp-1`, + itemId: `item-1`, + transcriptSource: `response.output_text`, + }, + { + type: `output_transcript.delta`, + delta: `Audio transcript`, + responseId: `resp-1`, + itemId: `item-1`, + transcriptSource: `response.output_audio_transcript`, + }, + { + type: `output_transcript.delta`, + delta: ` ignored`, + responseId: `resp-1`, + itemId: `item-1`, + transcriptSource: `response.output_text`, + }, + { + type: `output_transcript.completed`, + text: `Audio transcript final`, + responseId: `resp-1`, + itemId: `item-1`, + transcriptSource: `response.output_audio_transcript`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await realtime.run() + + expect( + ctx.db.collections.realtimeTranscripts.get( + `realtime-transcript:provider-session:output:resp-1` + 
) + ).toMatchObject({ + direction: `output`, + text: `Audio transcript final`, + status: `final`, + }) + }) + + it(`does not seed active realtime session transcripts into provider history`, async () => { + const { ctx } = createTestHandlerContext() + const capturedMessages: Array = [] + + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `test`, + model: `test-realtime`, + status: `requested`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/entities/test/realtime/rt-1/audio/in`, + audio_out: `/entities/test/realtime/rt-1/audio/out`, + control_in: `/entities/test/realtime/rt-1/control/in`, + control_out: `/entities/test/realtime/rt-1/control/out`, + }, + }) + ctx.db.collections.realtimeTranscripts.insert({ + key: `rt-active`, + session_id: `rt-1`, + direction: `input`, + text: `active session text`, + status: `final`, + audio_stream: `input`, + created_at: `2026-06-09T12:00:01.000Z`, + }) + ctx.db.collections.realtimeTranscripts.insert({ + key: `rt-prior`, + session_id: `rt-prior`, + direction: `input`, + text: `prior session text`, + status: `final`, + audio_stream: `input`, + created_at: `2026-06-09T11:00:01.000Z`, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + async connect(input) { + capturedMessages.push(...input.messages) + return { + events: (async function* () { + yield { type: `session.started` as const, sessionId: `rt-1` } + yield { type: `session.closed` as const } + })(), + } + }, + }, + tools: [], + }) + + await realtime.run() + + expect(capturedMessages).toEqual([ + { role: `user`, content: `prior session text` }, + ]) + }) + + it(`anchors delayed input transcripts at speech start`, async () => { + const db = buildStreamFixture([]) + const events: Array = [] + const { ctx } = createTestHandlerContext({ + db, + writeEvent: (event) 
=> { + events.push(event) + db.utils.applyEvent(event) + }, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started`, sessionId: `provider-session` }, + { type: `input_audio.speech_started`, turnId: `input-item-1` }, + { + type: `output_transcript.delta`, + delta: `Hi`, + responseId: `resp-1`, + }, + { + type: `output_transcript.completed`, + text: `Hi there`, + responseId: `resp-1`, + }, + { + type: `input_transcript.completed`, + text: `hello there`, + turnId: `input-item-1`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await realtime.run() + + const transcriptEvents = events.filter( + (event) => + event.type === `realtime_transcript` && + event.key === `realtime-transcript:provider-session:input:input-item-1` + ) + expect(transcriptEvents).toHaveLength(2) + expect(transcriptEvents[0]).toMatchObject({ + headers: { operation: `insert` }, + value: { + direction: `input`, + text: ``, + status: `partial`, + }, + }) + expect(transcriptEvents[1]).toMatchObject({ + headers: { operation: `update` }, + value: { + direction: `input`, + text: `hello there`, + status: `final`, + }, + }) + + const inputTranscriptInsertIndex = events.findIndex( + (event) => event === transcriptEvents[0] + ) + const firstAssistantTranscriptIndex = events.findIndex( + (event) => + event.type === `realtime_transcript` && + event.key === `realtime-transcript:provider-session:output:resp-1` && + event.headers.operation === `insert` + ) + expect(inputTranscriptInsertIndex).toBeGreaterThanOrEqual(0) + expect(firstAssistantTranscriptIndex).toBeGreaterThanOrEqual(0) + expect(inputTranscriptInsertIndex).toBeLessThan( + firstAssistantTranscriptIndex + ) + }) + + it(`splits output transcripts around later input speech`, async () => { + const db = buildStreamFixture([]) + const events: Array = [] + const { ctx } = createTestHandlerContext({ + db, + writeEvent: (event) => { + 
events.push(event) + db.utils.applyEvent(event) + }, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started`, sessionId: `provider-session` }, + { + type: `output_transcript.delta`, + delta: `Hello `, + responseId: `resp-1`, + }, + { type: `input_audio.speech_started`, turnId: `input-item-1` }, + { + type: `output_transcript.delta`, + delta: `there`, + responseId: `resp-1`, + }, + { + type: `input_transcript.completed`, + text: `interrupting`, + turnId: `input-item-1`, + }, + { + type: `output_transcript.completed`, + text: `Hello there`, + responseId: `resp-1`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await realtime.run() + + expect( + ctx.db.collections.realtimeTranscripts.get( + `realtime-transcript:provider-session:output:resp-1` + ) + ).toMatchObject({ + direction: `output`, + text: `Hello `, + status: `final`, + }) + expect( + ctx.db.collections.realtimeTranscripts.get( + `realtime-transcript:provider-session:input:input-item-1` + ) + ).toMatchObject({ + direction: `input`, + text: `interrupting`, + status: `final`, + }) + expect( + ctx.db.collections.realtimeTranscripts.get( + `realtime-transcript:provider-session:output:resp-1:segment-1` + ) + ).toMatchObject({ + direction: `output`, + text: `there`, + status: `final`, + }) + + const firstOutputInsertIndex = events.findIndex( + (event) => + event.type === `realtime_transcript` && + event.key === `realtime-transcript:provider-session:output:resp-1` && + event.headers.operation === `insert` + ) + const inputInsertIndex = events.findIndex( + (event) => + event.type === `realtime_transcript` && + event.key === + `realtime-transcript:provider-session:input:input-item-1` && + event.headers.operation === `insert` + ) + const secondOutputInsertIndex = events.findIndex( + (event) => + event.type === `realtime_transcript` && + event.key === + 
`realtime-transcript:provider-session:output:resp-1:segment-1` && + event.headers.operation === `insert` + ) + expect(firstOutputInsertIndex).toBeGreaterThanOrEqual(0) + expect(inputInsertIndex).toBeGreaterThan(firstOutputInsertIndex) + expect(secondOutputInsertIndex).toBeGreaterThan(inputInsertIndex) + }) + + it(`finds active realtime sessions from the manifest`, () => { + const { ctx } = createTestHandlerContext() + + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `active`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/entities/test/realtime/rt-1/audio/in`, + audio_out: `/entities/test/realtime/rt-1/audio/out`, + control_in: `/entities/test/realtime/rt-1/control/in`, + control_out: `/entities/test/realtime/rt-1/control/out`, + }, + }) + + expect(ctx.realtime.activeSession()).toMatchObject({ + id: `rt-1`, + status: `active`, + }) + }) + + it(`marks realtime sessions closed when the provider stream ends`, async () => { + const { ctx } = createTestHandlerContext() + + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `requested`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/entities/test/realtime/rt-1/audio/in`, + audio_out: `/entities/test/realtime/rt-1/audio/out`, + control_in: `/entities/test/realtime/rt-1/control/in`, + control_out: `/entities/test/realtime/rt-1/control/out`, + }, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ response: `done` }), + tools: [], + }) + + await realtime.run() + + expect(ctx.realtime.activeSession()).toBeUndefined() + expect( + ctx.db.collections.manifests.get(`realtime-session:rt-1`) + 
).toMatchObject({ + status: `closed`, + endedAt: expect.any(String), + meta: { reason: `completed` }, + }) + expect( + ctx.db.collections.realtimeSessions.get(`realtime-session:rt-1`) + ).toMatchObject({ + status: `closed`, + ended_at: expect.any(String), + reason: `completed`, + }) + }) + + it(`marks realtime sessions failed when provider setup fails`, async () => { + const { ctx } = createTestHandlerContext() + + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `requested`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/entities/test/realtime/rt-1/audio/in`, + audio_out: `/entities/test/realtime/rt-1/audio/out`, + control_in: `/entities/test/realtime/rt-1/control/in`, + control_out: `/entities/test/realtime/rt-1/control/out`, + }, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `openai`, + model: `gpt-realtime-2`, + connect: async () => { + throw new Error(`missing key`) + }, + }, + tools: [], + }) + + await expect(realtime.run()).rejects.toThrow(`missing key`) + expect(ctx.realtime.activeSession()).toBeUndefined() + expect( + ctx.db.collections.manifests.get(`realtime-session:rt-1`) + ).toMatchObject({ + status: `failed`, + endedAt: expect.any(String), + meta: { error: `missing key` }, + }) + }) + + it(`does not fail the run when OpenAI reports inactive response cancellation`, async () => { + const { ctx } = createTestHandlerContext() + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started` }, + { + type: `session.error`, + code: `response_cancel_not_active`, + error: `Cancellation failed: no active response found`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await 
expect(realtime.run()).resolves.toMatchObject({ + usage: { tokens: 0 }, + }) + expect(ctx.db.collections.runs.toArray).toMatchObject([ + { status: `completed`, finish_reason: `stop` }, + ]) + }) + + it(`does not fail the run when OpenAI reports a stale output audio truncate`, async () => { + const { ctx } = createTestHandlerContext() + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started` }, + { + type: `session.error`, + code: `invalid_value`, + error: `Audio content of 6350ms is already shorter than 8160ms`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await expect(realtime.run()).resolves.toMatchObject({ + usage: { tokens: 0 }, + }) + expect(ctx.db.collections.runs.toArray).toMatchObject([ + { status: `completed`, finish_reason: `stop` }, + ]) + }) + + it(`does not create legacy tool rows for out-of-order realtime tool completions`, async () => { + const { ctx } = createTestHandlerContext() + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started` }, + { + type: `tool_call.arguments_completed`, + toolCallId: `call-1`, + name: `first_tool`, + args: { value: 1 }, + }, + { + type: `tool_call.arguments_completed`, + toolCallId: `call-2`, + name: `second_tool`, + args: { value: 2 }, + }, + { + type: `tool_call.completed`, + toolCallId: `call-1`, + name: `first_tool`, + result: `first done`, + }, + { + type: `tool_call.completed`, + toolCallId: `call-2`, + name: `second_tool`, + result: `second done`, + }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await realtime.run() + + expect(ctx.db.collections.toolCalls.toArray).toMatchObject([ + { + tool_call_id: `call-1`, + tool_name: `first_tool`, + status: `completed`, + }, + { + tool_call_id: `call-2`, + tool_name: `second_tool`, + status: `completed`, + }, + ]) + expect( + 
ctx.db.collections.toolCalls.toArray.some((toolCall) => + toolCall.tool_call_id?.startsWith(`legacy-tc-`) + ) + ).toBe(false) + }) + + it(`forwards live inbox notifications to the active realtime provider`, async () => { + let liveWakeHandler: + | ((wake: { + wakeEvent: WakeEvent + wakeOffset: string + ackOffset: string + events: Array + }) => boolean | Promise) + | undefined + let resolveRegistered!: () => void + const registered = new Promise((resolve) => { + resolveRegistered = resolve + }) + let closeProvider!: () => void + const providerClosed = new Promise((resolve) => { + closeProvider = resolve + }) + const sendText = vi.fn(async () => undefined) + const prepareAgentRun = vi.fn(async () => undefined) + const { ctx } = createTestHandlerContext({ + prepareAgentRun, + registerLiveWakeHandler: (handler) => { + liveWakeHandler = handler + resolveRegistered() + return () => { + if (liveWakeHandler === handler) { + liveWakeHandler = undefined + } + } + }, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + async connect() { + return { + events: (async function* () { + yield { type: `session.started` as const } + await providerClosed + yield { type: `session.closed` as const } + })(), + sendText, + } + }, + }, + tools: [], + }) + + const run = realtime.run() + await registered + + await expect( + liveWakeHandler?.({ + wakeEvent: { + type: `inbox`, + source: `/user/alice`, + fromOffset: 0, + toOffset: 0, + eventCount: 1, + payload: `typed while realtime is active`, + }, + wakeOffset: `10_0`, + ackOffset: `10_0`, + events: [], + }) + ).resolves.toBe(true) + + expect(sendText).toHaveBeenCalledWith(`typed while realtime is active`) + expect(prepareAgentRun).toHaveBeenCalled() + + ctx.db.collections.manifests.insert({ + key: `document:story-outline`, + kind: `document`, + id: `story-outline`, + provider: `y-durable-streams`, + docId: `agents/worker/worker-1/documents/story-outline`, + 
docPath: `agents/worker/worker-1/documents/story-outline`, + streamPath: `/v1/yjs/default/docs/agents/worker/worker-1/documents/story-outline`, + transportMimeType: `application/vnd.electric-agents.markdown-yjs`, + contentMimeType: `text/markdown`, + yTextName: `markdown`, + title: `Story Outline`, + createdAt: `2026-06-17T14:00:00.000Z`, + meta: { + sourceEntityUrl: `/worker/worker-1`, + sourceDocumentId: `story-outline`, + }, + }) + ctx.db.collections.manifests.insert({ + key: `document:story-act-two`, + kind: `document`, + id: `story-act-two`, + provider: `y-durable-streams`, + docId: `agents/worker/worker-2/documents/story-act-two`, + docPath: `agents/worker/worker-2/documents/story-act-two`, + streamPath: `/v1/yjs/default/docs/agents/worker/worker-2/documents/story-act-two`, + transportMimeType: `application/vnd.electric-agents.markdown-yjs`, + contentMimeType: `text/markdown`, + yTextName: `markdown`, + title: `Story Act Two`, + createdAt: `2026-06-17T14:00:00.000Z`, + meta: { + sourceEntityUrl: `/worker/worker-2`, + sourceDocumentId: `story-act-two`, + }, + }) + + await expect( + liveWakeHandler?.({ + wakeEvent: { + type: `wake`, + source: `/horton/parent`, + fromOffset: 0, + toOffset: 0, + eventCount: 2, + payload: { + type: `wake_batch`, + sources: [`/worker/worker-1`, `/worker/worker-2`], + wakes: [ + { + source: `/worker/worker-1`, + timeout: false, + changes: [], + finished_child: { + url: `/worker/worker-1`, + type: `worker`, + run_status: `completed`, + response: `The markdown document is ready.`, + }, + }, + { + source: `/worker/worker-2`, + timeout: false, + changes: [], + finished_child: { + url: `/worker/worker-2`, + type: `worker`, + run_status: `completed`, + response: `The second markdown document is ready.`, + }, + }, + ], + }, + }, + wakeOffset: `11_0`, + ackOffset: `11_0`, + events: [], + }) + ).resolves.toBe(true) + + expect(sendText).toHaveBeenLastCalledWith( + expect.stringContaining(`live Electric Agents notification`) + ) + 
expect(sendText).toHaveBeenLastCalledWith( + expect.stringContaining(`The markdown document is ready.`) + ) + expect(sendText).toHaveBeenLastCalledWith( + expect.stringContaining(`The second markdown document is ready.`) + ) + expect(sendText).toHaveBeenLastCalledWith( + expect.stringContaining(`Story Outline (id: story-outline)`) + ) + expect(sendText).toHaveBeenLastCalledWith( + expect.stringContaining(`Story Act Two (id: story-act-two)`) + ) + expect(sendText).toHaveBeenLastCalledWith( + expect.stringContaining(`read_markdown_doc`) + ) + + closeProvider() + await run + }) + + it(`persists provider audio and control output to realtime durable streams`, async () => { + const { ctx } = createTestHandlerContext({ + realtimeStreams: { + baseUrl: `http://server.test`, + headers: { authorization: `Bearer claim` }, + }, + }) + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `active`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/test/entity/realtime/rt-1/audio/in`, + audio_out: `/test/entity/realtime/rt-1/audio/out`, + control_in: `/test/entity/realtime/rt-1/control/in`, + control_out: `/test/entity/realtime/rt-1/control/out`, + }, + }) + + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: createTestRealtimeProvider({ + events: [ + { type: `session.started`, sessionId: `rt-1` }, + { + type: `output_audio.delta`, + audio: new Uint8Array([1, 2, 3]), + responseId: `resp-1`, + itemId: `item-1`, + }, + { type: `output_audio.completed`, responseId: `resp-1` }, + { type: `session.closed` }, + ], + }), + tools: [], + }) + + await realtime.run() + + expect(durableMock.appends).toEqual([ + { + url: `http://server.test/test/entity/realtime/rt-1/control/out`, + data: expect.any(Uint8Array), + }, + { + url: `http://server.test/test/entity/realtime/rt-1/audio/out`, + 
data: new Uint8Array([1, 2, 3]), + }, + { + url: `http://server.test/test/entity/realtime/rt-1/control/out`, + data: expect.any(Uint8Array), + }, + { + url: `http://server.test/test/entity/realtime/rt-1/control/out`, + data: expect.any(Uint8Array), + }, + { + url: `http://server.test/test/entity/realtime/rt-1/control/out`, + data: expect.any(Uint8Array), + }, + ]) + const decoder = new TextDecoder() + expect( + JSON.parse(decoder.decode(durableMock.appends[2]!.data as Uint8Array)) + ).toEqual({ + type: `output_audio.delta`, + responseId: `resp-1`, + itemId: `item-1`, + byteLength: 3, + }) + expect(ctx.db.collections.realtimeAudioSpans.toArray).toMatchObject([ + { + session_id: `rt-1`, + stream: `output`, + producer_id: `/test/entity/realtime/rt-1/audio/out`, + seq: 0, + byte_start: 0, + byte_end: 3, + byte_length: 3, + sample_start: 0, + sample_count: 1, + sample_rate: 24_000, + channels: 1, + codec: `pcm16`, + timing_source: `provider`, + participant_id: `assistant`, + provider_item_id: `item-1`, + response_id: `resp-1`, + }, + ]) + }) + + it(`skips realtime input audio commits below the provider minimum`, async () => { + const { ctx } = createTestHandlerContext({ + realtimeStreams: { + baseUrl: `http://server.test`, + headers: { authorization: `Bearer claim` }, + }, + }) + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `active`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/test/entity/realtime/rt-1/audio/in`, + audio_out: `/test/entity/realtime/rt-1/audio/out`, + control_in: `/test/entity/realtime/rt-1/control/in`, + control_out: `/test/entity/realtime/rt-1/control/out`, + }, + }) + + durableMock.bodyStreams.set( + `http://server.test/test/entity/realtime/rt-1/audio/in`, + [new Uint8Array(2048)] + ) + durableMock.jsonStreams.set( + 
`http://server.test/test/entity/realtime/rt-1/control/in`, + [ + { type: `input_audio.commit`, afterAudioBytes: 2048 }, + { type: `session.close`, reason: `test` }, + ] + ) + + const appendInputAudio = vi.fn() + const clearInputAudio = vi.fn() + const commitInputAudio = vi.fn() + const close = vi.fn() + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + connect: async () => ({ + appendInputAudio, + clearInputAudio, + commitInputAudio, + close, + events: (async function* () { + yield { type: `session.started` as const, sessionId: `rt-1` } + await new Promise((resolve) => setTimeout(resolve, 20)) + yield { type: `session.closed` as const } + })(), + }), + }, + tools: [], + audio: { + turnDetection: { type: `none` }, + }, + }) + + await realtime.run() + + expect(appendInputAudio).not.toHaveBeenCalled() + expect(clearInputAudio).toHaveBeenCalledTimes(1) + expect(commitInputAudio).not.toHaveBeenCalled() + expect(close).toHaveBeenCalledWith(`test`) + }) + + it(`commits only the requested realtime input audio byte range`, async () => { + const { ctx } = createTestHandlerContext({ + realtimeStreams: { + baseUrl: `http://server.test`, + headers: { authorization: `Bearer claim` }, + }, + }) + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `active`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/test/entity/realtime/rt-1/audio/in`, + audio_out: `/test/entity/realtime/rt-1/audio/out`, + control_in: `/test/entity/realtime/rt-1/control/in`, + control_out: `/test/entity/realtime/rt-1/control/out`, + }, + }) + + const firstTurnAudio = new Uint8Array(4800).fill(1) + const secondTurnAudio = new Uint8Array(4800).fill(2) + durableMock.bodyStreams.set( + `http://server.test/test/entity/realtime/rt-1/audio/in`, + 
[firstTurnAudio, secondTurnAudio] + ) + durableMock.jsonStreams.set( + `http://server.test/test/entity/realtime/rt-1/control/in`, + [ + { type: `input_audio.commit`, afterAudioBytes: 4800 }, + { type: `input_audio.commit`, afterAudioBytes: 9600 }, + { type: `session.close`, reason: `test` }, + ] + ) + + const appendInputAudio = vi.fn() + const commitInputAudio = vi.fn() + const close = vi.fn() + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + connect: async () => ({ + appendInputAudio, + commitInputAudio, + close, + events: (async function* () { + yield { type: `session.started` as const, sessionId: `rt-1` } + await new Promise((resolve) => setTimeout(resolve, 20)) + yield { type: `session.closed` as const } + })(), + }), + }, + tools: [], + audio: { + turnDetection: { type: `none` }, + }, + }) + + await realtime.run() + + expect(appendInputAudio).toHaveBeenNthCalledWith(1, firstTurnAudio) + expect(appendInputAudio).toHaveBeenNthCalledWith(2, secondTurnAudio) + expect(commitInputAudio).toHaveBeenCalledTimes(2) + expect(close).toHaveBeenCalledWith(`test`) + }) + + it(`streams realtime input audio directly when provider VAD is enabled`, async () => { + const { ctx } = createTestHandlerContext({ + realtimeStreams: { + baseUrl: `http://server.test`, + headers: { authorization: `Bearer claim` }, + }, + }) + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `active`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/test/entity/realtime/rt-1/audio/in`, + audio_out: `/test/entity/realtime/rt-1/audio/out`, + control_in: `/test/entity/realtime/rt-1/control/in`, + control_out: `/test/entity/realtime/rt-1/control/out`, + }, + }) + + const firstChunk = new Uint8Array(2048).fill(1) + const secondChunk = new 
Uint8Array(2048).fill(2) + durableMock.bodyStreams.set( + `http://server.test/test/entity/realtime/rt-1/audio/in`, + [firstChunk, secondChunk] + ) + durableMock.jsonStreams.set( + `http://server.test/test/entity/realtime/rt-1/control/in`, + (async function* () { + await new Promise((resolve) => setTimeout(resolve, 20)) + yield { type: `session.close`, reason: `test` } + })() + ) + + const appendInputAudio = vi.fn() + const commitInputAudio = vi.fn() + const close = vi.fn() + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + connect: async () => ({ + appendInputAudio, + commitInputAudio, + close, + events: (async function* () { + yield { type: `session.started` as const, sessionId: `rt-1` } + await new Promise((resolve) => setTimeout(resolve, 20)) + yield { type: `session.closed` as const } + })(), + }), + }, + tools: [], + }) + + await realtime.run() + + expect(appendInputAudio).toHaveBeenNthCalledWith(1, firstChunk) + expect(appendInputAudio).toHaveBeenNthCalledWith(2, secondChunk) + expect(commitInputAudio).not.toHaveBeenCalled() + expect(close).toHaveBeenCalledWith(`test`) + expect(ctx.db.collections.realtimeAudioSpans.toArray).toMatchObject([ + { + session_id: `rt-1`, + stream: `input`, + producer_id: `/test/entity/realtime/rt-1/audio/in`, + seq: 0, + byte_start: 0, + byte_end: 4096, + byte_length: 4096, + sample_start: 0, + sample_count: 2048, + sample_rate: 24_000, + channels: 1, + codec: `pcm16`, + timing_source: `runtime`, + participant_id: `user`, + }, + ]) + }) + + it(`does not block later realtime control commands behind pending audio bytes`, async () => { + const { ctx } = createTestHandlerContext({ + realtimeStreams: { + baseUrl: `http://server.test`, + headers: { authorization: `Bearer claim` }, + }, + }) + ctx.db.collections.manifests.insert({ + key: `realtime-session:rt-1`, + kind: `realtime-session`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + status: 
`active`, + startedAt: `2026-06-09T12:00:00.000Z`, + endedAt: null, + retention: `forever`, + streams: { + audio_in: `/test/entity/realtime/rt-1/audio/in`, + audio_out: `/test/entity/realtime/rt-1/audio/out`, + control_in: `/test/entity/realtime/rt-1/control/in`, + control_out: `/test/entity/realtime/rt-1/control/out`, + }, + }) + + durableMock.jsonStreams.set( + `http://server.test/test/entity/realtime/rt-1/control/in`, + [ + { type: `input_audio.commit`, afterAudioBytes: 9600 }, + { type: `session.close`, reason: `test` }, + ] + ) + + const commitInputAudio = vi.fn() + const close = vi.fn() + const realtime = ctx.useRealtime({ + systemPrompt: `You are realtime.`, + provider: { + id: `test`, + model: `test-realtime`, + connect: async () => ({ + commitInputAudio, + close, + events: (async function* () { + yield { type: `session.started` as const, sessionId: `rt-1` } + await new Promise((resolve) => setTimeout(resolve, 20)) + yield { type: `session.closed` as const } + })(), + }), + }, + tools: [], + audio: { + turnDetection: { type: `none` }, + }, + }) + + await realtime.run() + + expect(commitInputAudio).not.toHaveBeenCalled() + expect(close).toHaveBeenCalledWith(`test`) + }) +}) diff --git a/packages/agents-runtime/test/runtime-dsl.test.ts b/packages/agents-runtime/test/runtime-dsl.test.ts index 4cca907005..d86592f509 100644 --- a/packages/agents-runtime/test/runtime-dsl.test.ts +++ b/packages/agents-runtime/test/runtime-dsl.test.ts @@ -2711,12 +2711,15 @@ t.define(TYPES.n1WakeTypeParent, { key: string wakeType: string source: string + payloadSources?: Array }>(ctx.db, `wakeLog`) + const payload = wake.payload as { sources?: Array } | undefined wakeLog.insert({ key: `wake-${Date.now()}-${Math.random().toString(36).slice(2, 6)}`, wakeType: wake.type, source: wake.source, + ...(payload?.sources ? 
{ payloadSources: payload.sources } : {}), }) await runTestAgent( @@ -2738,6 +2741,24 @@ t.define(TYPES.n1WakeTypeParent, { ) return `spawned:${childId}:wake.type=${wake.type}` } + if (trimmed.startsWith(`spawn_two_and_observe `)) { + const childIds = trimmed + .slice(`spawn_two_and_observe `.length) + .split(/\s+/) + .filter(Boolean) + for (const childId of childIds) { + await ctx.spawn( + TYPES.n1WakeTypeChild, + childId, + {}, + { + initialMessage: `hello from parent`, + wake: { on: `runFinished`, includeResponse: true }, + } + ) + } + return `spawned:${childIds.join(`,`)}:wake.type=${wake.type}` + } return `echo:${trimmed}:wake.type=${wake.type}` }, }) @@ -6258,6 +6279,55 @@ describe(`N: wake primitives verification`, () => { expect(wakeEntry!.wakeType).toBe(`wake`) }, 30_000) + it(`N1b: parent handles both child completion wakes that arrive together`, async () => { + const parent = await t.spawn(TYPES.n1WakeTypeParent, `wake-type-two`) + + await parent.send(`spawn_two_and_observe wt-child-2a wt-child-2b`) + await parent.waitForRun() + + await Promise.all([ + t.entity(`/${TYPES.n1WakeTypeChild}/wt-child-2a`).waitForRun(), + t.entity(`/${TYPES.n1WakeTypeChild}/wt-child-2b`).waitForRun(), + ]) + + const expectedSources = [ + `/${TYPES.n1WakeTypeChild}/wt-child-2a`, + `/${TYPES.n1WakeTypeChild}/wt-child-2b`, + ] + const parentHistory = await parent.waitFor((history) => { + const wakeEntries = history.events + .filter((event) => event.type === `wake_log_entry`) + .map((event) => eventValueRecord(event)) + .filter((value) => value?.wakeType === `wake`) + return ( + wakeEntries.some((value) => + expectedSources.every((source) => + (value?.payloadSources as Array | undefined)?.includes( + source + ) + ) + ) || + expectedSources.every((source) => + wakeEntries.some((value) => value?.source === source) + ) + ) + }, 15_000) + + const wakeEntries = parentHistory.events + .filter((event) => event.type === `wake_log_entry`) + .map((event) => eventValueRecord(event)) + 
.filter((value) => value?.wakeType === `wake`) + const combinedWakeEntry = wakeEntries.find((value) => value?.payloadSources) + + if (combinedWakeEntry) { + expect(combinedWakeEntry.payloadSources).toEqual(expectedSources) + } else { + expect(new Set(wakeEntries.map((value) => value?.source))).toEqual( + new Set(expectedSources) + ) + } + }, 30_000) + it(`N2: observe(db(...)) with wake option triggers re-wake on shared state write`, async () => { // Finding 2: ctx.observe(db(id, schema), { wake }) now calls // registerWake(), and the server evaluates wakes for shared-state diff --git a/packages/agents-runtime/test/runtime-server-client-update-metadata.test.ts b/packages/agents-runtime/test/runtime-server-client-update-metadata.test.ts index 43398ef7a4..092a72a782 100644 --- a/packages/agents-runtime/test/runtime-server-client-update-metadata.test.ts +++ b/packages/agents-runtime/test/runtime-server-client-update-metadata.test.ts @@ -266,6 +266,84 @@ describe(`runtime-server-client.deleteTag`, () => { }) }) +describe(`runtime-server-client realtime sessions`, () => { + it(`starts a realtime session through the control-plane route`, async () => { + const calls: Array<{ url: string; init?: RequestInit }> = [] + const responseBody = { + sessionId: `rt-1`, + entityUrl: `/horton/demo`, + provider: `openai`, + model: `gpt-realtime-2`, + status: `requested`, + startedAt: `2026-06-09T10:00:00.000Z`, + streams: { + audio_in: `/horton/demo/realtime/rt-1/audio/in`, + audio_out: `/horton/demo/realtime/rt-1/audio/out`, + control_in: `/horton/demo/realtime/rt-1/control/in`, + control_out: `/horton/demo/realtime/rt-1/control/out`, + }, + } + const fakeFetch = vi.fn(async (url: string, init?: RequestInit) => { + calls.push({ url, init }) + return new Response(JSON.stringify(responseBody), { + status: 201, + headers: { 'content-type': `application/json` }, + }) + }) as unknown as typeof fetch + const client = createRuntimeServerClient({ + baseUrl: 
`http://test.example/t/tenant-a/v1`, + fetch: fakeFetch, + principalKey: `user:sam`, + }) + + await expect( + client.startRealtimeSession({ + entityUrl: `/horton/demo`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + inputAudio: { codec: `pcm16`, sampleRate: 16_000, channels: 1 }, + meta: { source: `button` }, + }) + ).resolves.toEqual(responseBody) + + expect(calls).toHaveLength(1) + expect(calls[0]!.url).toBe( + `http://test.example/t/tenant-a/v1/_electric/realtime/sessions` + ) + expect(calls[0]!.init?.method).toBe(`POST`) + const headers = new Headers(calls[0]!.init?.headers) + expect(headers.get(`content-type`)).toBe(`application/json`) + expect(headers.get(`electric-principal`)).toBe(`user:sam`) + expect(JSON.parse(calls[0]!.init!.body as string)).toEqual({ + entityUrl: `/horton/demo`, + id: `rt-1`, + provider: `openai`, + model: `gpt-realtime-2`, + inputAudio: { codec: `pcm16`, sampleRate: 16_000, channels: 1 }, + meta: { source: `button` }, + }) + }) + + it(`surfaces realtime session start failures`, async () => { + const fakeFetch = vi.fn( + async () => new Response(`not allowed`, { status: 401 }) + ) as unknown as typeof fetch + const client = createRuntimeServerClient({ + baseUrl: `http://test.example`, + fetch: fakeFetch, + }) + + await expect( + client.startRealtimeSession({ + entityUrl: `/horton/demo`, + provider: `openai`, + model: `gpt-realtime-2`, + }) + ).rejects.toThrow(/startRealtimeSession.*401.*not allowed/) + }) +}) + describe(`runtime-server-client webhook sources`, () => { it(`lists webhook sources from the runtime server`, async () => { const fakeFetch = vi.fn( diff --git a/packages/agents-runtime/test/setup-context.test.ts b/packages/agents-runtime/test/setup-context.test.ts index ec98b88389..05f9913584 100644 --- a/packages/agents-runtime/test/setup-context.test.ts +++ b/packages/agents-runtime/test/setup-context.test.ts @@ -2060,6 +2060,86 @@ describe(`entity patterns`, () => { 
expect(handle.entityUrl).toContain(`dyn-child-1`) }, 5000) + it(`serializes inline spawn wiring before creating the next child`, async () => { + const db = mockDb() + let releaseFirst!: () => void + const firstRelease = new Promise((resolve) => { + releaseFirst = resolve + }) + let markFirstStarted!: () => void + const firstStarted = new Promise((resolve) => { + markFirstStarted = resolve + }) + let secondStarted = false + + const createOrGetChild = vi.fn(async (_type: string, id: string) => { + if (id === `one`) { + markFirstStarted() + await firstRelease + } else { + secondStarted = true + } + return { + entityUrl: `/worker/${id}-server`, + streamPath: `/worker/${id}-server/main`, + } + }) + const createChildDb = vi.fn(async () => mockDb()) + const ctx = createSetupContext({ + entityUrl: `test-inline-spawn`, + entityType: `test-agent`, + args: Object.freeze({}), + db, + events: [], + writeEvent: () => {}, + serverBaseUrl: `http://localhost:3000`, + effectScope: { + register: vi.fn(), + activateAll: vi.fn(), + disposeAll: vi.fn().mockResolvedValue(undefined), + } as never, + customStateNames: [], + wiring: { + createOrGetChild, + forkEntity: vi.fn(), + createChildDb, + createSourceDb: vi.fn(), + createSharedStateDb: vi.fn(), + } as unknown as WiringConfig, + }) + + const first = ctx.spawn(`worker`, `one`) + await firstStarted + const second = ctx.spawn(`worker`, `two`) + await Promise.resolve() + + expect(createOrGetChild).toHaveBeenCalledTimes(1) + expect(secondStarted).toBe(false) + + releaseFirst() + const [firstHandle, secondHandle] = await Promise.all([first, second]) + + expect(secondStarted).toBe(true) + expect(createOrGetChild.mock.calls.map((call) => call[1])).toEqual([ + `one`, + `two`, + ]) + expect(firstHandle.entityUrl).toBe(`/worker/one-server`) + expect(secondHandle.entityUrl).toBe(`/worker/two-server`) + expect(db.collections.manifests.toArray).toEqual( + expect.arrayContaining([ + expect.objectContaining({ + key: manifestChildKey(`worker`, `one`), 
+ entity_url: `/worker/one-server`, + }), + expect.objectContaining({ + key: manifestChildKey(`worker`, `two`), + entity_url: `/worker/two-server`, + }), + ]) + ) + }) + it(`active-phase observe stages an observe manifest row before completion`, async () => { const { createWakeSession } = await import(`../src/wake-session`) const db = mockDb() diff --git a/packages/agents-runtime/test/timeline-context.test.ts b/packages/agents-runtime/test/timeline-context.test.ts index 0370ca1b1c..ec434cb326 100644 --- a/packages/agents-runtime/test/timeline-context.test.ts +++ b/packages/agents-runtime/test/timeline-context.test.ts @@ -6,6 +6,7 @@ import { import type { EntityStreamDB } from '../src/entity-stream-db' import type { IncludesInboxMessage, + IncludesRealtimeTranscript, IncludesRun, IncludesSignal, IncludesWakeMessage, @@ -172,6 +173,77 @@ describe(`timeline context`, () => { expect(result).toEqual([{ role: `user`, content: `updated text` }]) }) + it(`projects realtime input and output transcripts as chat messages`, () => { + const realtimeTranscripts: Array = [ + { + key: `rt-in`, + order: order(1), + session_id: `rt-1`, + direction: `input`, + text: `voice question`, + status: `final`, + audio_stream: `input`, + created_at: `2026-03-28T00:00:00.000Z`, + }, + { + key: `rt-out`, + order: order(2), + session_id: `rt-1`, + direction: `output`, + text: `voice answer`, + status: `final`, + audio_stream: `output`, + created_at: `2026-03-28T00:00:01.000Z`, + }, + ] + + expect( + buildTimelineMessages({ + runs: [], + inbox: [], + wakes: [], + realtimeTranscripts, + }) + ).toEqual([ + { role: `user`, content: `voice question` }, + { role: `assistant`, content: `voice answer` }, + ]) + }) + + it(`does not project partial realtime transcripts as chat messages`, () => { + const realtimeTranscripts: Array = [ + { + key: `rt-partial`, + order: order(1), + session_id: `rt-1`, + direction: `input`, + text: `partially heard`, + status: `partial`, + audio_stream: `input`, + 
created_at: `2026-03-28T00:00:00.000Z`, + }, + { + key: `rt-final`, + order: order(2), + session_id: `rt-1`, + direction: `input`, + text: `final question`, + status: `final`, + audio_stream: `input`, + created_at: `2026-03-28T00:00:01.000Z`, + }, + ] + + expect( + buildTimelineMessages({ + runs: [], + inbox: [], + wakes: [], + realtimeTranscripts, + }) + ).toEqual([{ role: `user`, content: `final question` }]) + }) + it(`buildTimelineMessages keeps pending tool calls without emitting tool results`, () => { expect( buildTimelineMessages({ @@ -494,6 +566,7 @@ describe(`timeline context`, () => { __electricRowOffsets: new Map([[`wake-1`, offset(7)]]), }, signals: { toArray: [], __electricRowOffsets: new Map() }, + realtimeTranscripts: { toArray: [], __electricRowOffsets: new Map() }, contextInserted: { toArray: [], __electricRowOffsets: new Map() }, contextRemoved: { toArray: [], __electricRowOffsets: new Map() }, manifests: { toArray: [], __electricRowOffsets: new Map() }, @@ -536,6 +609,7 @@ describe(`timeline context`, () => { inbox: { toArray: [] }, wakes: { toArray: [] }, signals: { toArray: [] }, + realtimeTranscripts: { toArray: [] }, contextInserted: { toArray: [] }, contextRemoved: { toArray: [] }, manifests: { toArray: [] }, diff --git a/packages/agents-runtime/tsdown.config.ts b/packages/agents-runtime/tsdown.config.ts index f2e095fd14..d0f378e6d7 100644 --- a/packages/agents-runtime/tsdown.config.ts +++ b/packages/agents-runtime/tsdown.config.ts @@ -8,6 +8,8 @@ const config: Options = { `src/sandbox.ts`, `src/sandbox-docker.ts`, `src/client.ts`, + `src/use-chat.ts`, + `src/use-chat-hook.ts`, // First-class entry so its .d.ts is stable (raced chunk fails dts gen in CI). 
`src/skills/types.ts`, ], diff --git a/packages/agents-server-ui/package.json b/packages/agents-server-ui/package.json index 170fbb2854..78fe977656 100644 --- a/packages/agents-server-ui/package.json +++ b/packages/agents-server-ui/package.json @@ -15,8 +15,12 @@ }, "dependencies": { "@base-ui/react": "^1.4.1", + "@codemirror/lang-markdown": "^6.5.0", + "@codemirror/state": "^6.6.0", + "@codemirror/view": "^6.43.0", "@durable-streams/client": "^0.2.6", "@durable-streams/state": "^0.3.1", + "@durable-streams/y-durable-streams": "0.2.7", "@electric-ax/agents-runtime": "workspace:*", "@handlewithcare/react-prosemirror": "^3.0.6", "@streamdown/math": "^1.0.2", @@ -26,8 +30,10 @@ "@tanstack/react-router": "^1.167.4", "@tanstack/react-table": "^8.21.3", "@tanstack/react-virtual": "^3.13.23", + "codemirror": "^6.0.1", "fractional-indexing": "^3.2.0", "katex": "^0.16.45", + "lib0": "^0.2.99", "lucide-react": "^0.561.0", "mermaid": "^11.14.0", "nanoid": "^3.3.11", @@ -41,6 +47,9 @@ "react-reconciler": "0.32.0", "shiki": "^4.0.2", "streamdown": "^2.5.0", + "y-codemirror.next": "0.3.5", + "y-protocols": "^1.0.6", + "yjs": "^13.6.26", "zod": "^3.25.76" }, "devDependencies": { diff --git a/packages/agents-server-ui/src/components/AgentResponse.tsx b/packages/agents-server-ui/src/components/AgentResponse.tsx index 34fd4d2a29..85dd309d12 100644 --- a/packages/agents-server-ui/src/components/AgentResponse.tsx +++ b/packages/agents-server-ui/src/components/AgentResponse.tsx @@ -490,6 +490,10 @@ export const AgentResponseLive = memo(function AgentResponseLive({ (q) => (run.errors ? q.from({ error: run.errors }) : undefined), [run.errors] ) + const { data: steps = [] } = useLiveQuery( + (q) => (run.steps ? q.from({ step: run.steps }) : undefined), + [run.steps] + ) // Subscribe to the run's reasoning rows so the section ticks as // each `reasoning_delta` arrives. Empty array for runs without // any reasoning content (most non-extended-thinking models). 
@@ -585,6 +589,13 @@ export const AgentResponseLive = memo(function AgentResponseLive({ const toggleReasoning = useCallback((key: string) => { setExpandedReasoning((prev) => ({ ...prev, [key]: !prev[key] })) }, []) + const isRealtimeRun = useMemo( + () => + (steps as Array<{ model_id?: string }>).some((step) => + step.model_id?.includes(`realtime`) + ), + [steps] + ) const contentItems = useMemo( () => liveRunItemsToContentItems(sortedItems), [sortedItems] @@ -657,6 +668,10 @@ export const AgentResponseLive = memo(function AgentResponseLive({ copiedTimerRef.current = setTimeout(() => setCopied(false), 1200) } + if (isRealtimeRun && sortedItems.length === 0 && !failureText) { + return <> + } + return ( {renderEntries.map((entry) => { diff --git a/packages/agents-server-ui/src/components/EntityContextDrawer.tsx b/packages/agents-server-ui/src/components/EntityContextDrawer.tsx index 292b147c89..a563ea2d4e 100644 --- a/packages/agents-server-ui/src/components/EntityContextDrawer.tsx +++ b/packages/agents-server-ui/src/components/EntityContextDrawer.tsx @@ -50,6 +50,7 @@ type DrawerEntry = action: | { kind: `entity`; url: string } | { kind: `state`; sourceId: string } + | { kind: `document`; id: string } | { kind: `inspect` } entity: DrawerEntity | null } @@ -218,11 +219,21 @@ export function EntityContextDrawer({ }) } + const openDocument = (documentId: string, side = false): void => { + helpers.openEntity(entity.url, { + viewId: `markdown-doc`, + viewParams: { doc: documentId }, + ...(side ? 
{ target: { tileId, position: `split-right` as const } } : {}), + }) + } + const handleEntry = (entry: DrawerEntry): void => { if (entry.action.kind === `entity`) { openEntity(entry.action.url) } else if (entry.action.kind === `state`) { openStateInspector(entry.action.sourceId) + } else if (entry.action.kind === `document`) { + openDocument(entry.action.id) } else { setInspectTarget({ title: entry.title, value: entry.manifest }) } @@ -233,6 +244,8 @@ export function EntityContextDrawer({ openEntity(entry.action.url, true) } else if (entry.action.kind === `state`) { openStateInspector(entry.action.sourceId, true) + } else if (entry.action.kind === `document`) { + openDocument(entry.action.id, true) } } @@ -565,6 +578,8 @@ function manifestKindLabel(manifest: Manifest): string { return `Effect` case `attachment`: return `Attachment` + case `document`: + return `Markdown document` case `context`: return `Context` case `schedule`: @@ -572,6 +587,7 @@ function manifestKindLabel(manifest: Manifest): string { case `goal`: return `Goal` } + return manifest.kind } function createParentEntry(parent: DrawerEntity): DrawerEntry { @@ -684,6 +700,18 @@ function createManifestEntry( entity: null, } + case `document`: + return { + key: manifest.key, + groupKey: `document`, + groupLabel: `Documents`, + title: manifest.title, + meta: manifest.id, + manifest, + action: { kind: `document`, id: manifest.id }, + entity: null, + } + case `context`: return { key: manifest.key, @@ -714,6 +742,7 @@ function createManifestEntry( case `goal`: return null } + return null } function describeSourceConfig(config: unknown): string { diff --git a/packages/agents-server-ui/src/components/EntityTimeline.tsx b/packages/agents-server-ui/src/components/EntityTimeline.tsx index 697ff709c5..c2c20411d0 100644 --- a/packages/agents-server-ui/src/components/EntityTimeline.tsx +++ b/packages/agents-server-ui/src/components/EntityTimeline.tsx @@ -21,9 +21,11 @@ import { Database, ExternalLink, FileJson, + 
FileText, GitBranch, Radio, Reply, + SplitSquareHorizontal, } from 'lucide-react' import { loadTimelineRowHeights, @@ -46,7 +48,7 @@ import { useCurrentPrincipal } from '../hooks/useCurrentPrincipal' import { Icon, IconButton, ScrollArea, Stack, Text, Tooltip } from '../ui' import { UserMessage } from './UserMessage' import type { ForkFromHereAction, UserMessageAttachment } from './UserMessage' -import { AgentResponseLive } from './AgentResponse' +import { AgentResponse, AgentResponseLive } from './AgentResponse' import { CommentBubble } from './CommentBubble' import { InlineEventCard } from './InlineEventCard' import { InlineStatusBadge } from './InlineStatusBadge' @@ -110,6 +112,20 @@ function readInboxPayloadDisplay(payload: unknown): string { return stringifyPayload(payload, 2) } +function isRealtimeSessionWake(row: RenderTimelineRow): boolean { + const changes = row.wake?.payload.changes + if (!Array.isArray(changes)) return false + return changes.some((change) => { + if (!change || typeof change !== `object`) return false + const payload = (change as { payload?: unknown }).payload + return ( + !!payload && + typeof payload === `object` && + (payload as { type?: unknown }).type === `realtime_session.started` + ) + }) +} + function stringifySearchPayload(value: unknown): string { if (value == null) return `` if (typeof value === `string`) return value @@ -243,6 +259,13 @@ function estimateRowHeight( const lines = Math.max(1, Math.ceil(row.comment.body.length / charsPerLine)) return Math.max(58, 42 + lines * lineHeight) + timelineRowGap(row, nextRow) } + if (row.realtimeTranscript) { + const lines = Math.max( + 1, + Math.ceil(row.realtimeTranscript.text.length / charsPerLine) + ) + return Math.max(64, 48 + lines * lineHeight) + timelineRowGap(row) + } if (row.wake || row.signal || row.manifest) { return 76 + timelineRowGap(row, nextRow) } @@ -299,6 +322,7 @@ function timelineRowSearchText( ): string { if (row.comment) return row.comment.body if (row.inbox) 
return readInboxText(row.inbox.payload) + if (row.realtimeTranscript) return row.realtimeTranscript.text if (row.wake) { return wakeSectionText({ kind: `wake`, @@ -316,6 +340,7 @@ function timelineRowLabel(row: RenderTimelineRow): string { if (row.comment) return `Comment` if (row.inbox?.from_agent) return `Agent message` if (row.inbox) return `User message` + if (row.realtimeTranscript) return `Voice message` if (row.wake) return `Wake` if (row.signal) return `Signal` if (row.error) return `Error` @@ -815,6 +840,7 @@ function isTimelineFindMatch( function ManifestTimelineRow({ manifest, entityUrl, + tileId, entityStatus, onReply, }: { @@ -828,6 +854,8 @@ function ManifestTimelineRow({ const navigate = useNavigate() const entityTarget = getManifestEntityUrl(manifest) const stateSourceId = getManifestStateSourceId(manifest) + const documentId = manifest.kind === `document` ? manifest.id : null + const splitTargetTileId = tileId ?? workspace?.helpers.activeTileId ?? null const isEntity = entityTarget !== null const title = manifestTitle(manifest) const meta = manifestMeta(manifest) @@ -856,13 +884,62 @@ function ManifestTimelineRow({ }) }, [entityUrl, stateSourceId, workspace]) + const openDocument = useCallback(() => { + if (!entityUrl || !documentId || !workspace) return + workspace.helpers.openEntity(entityUrl, { + viewId: `markdown-doc`, + viewParams: { doc: documentId }, + }) + }, [documentId, entityUrl, workspace]) + + const splitDocumentRight = useCallback(() => { + if (!entityUrl || !documentId || !workspace) return + if (!splitTargetTileId) return + workspace.helpers.openEntity(entityUrl, { + viewId: `markdown-doc`, + viewParams: { doc: documentId }, + target: { tileId: splitTargetTileId, position: `split-right` }, + }) + }, [documentId, entityUrl, splitTargetTileId, workspace]) + const statusBadge = entityStatus ? ( {entityStatus} ) : null - const openAction = stateSourceId ? ( + const openAction = documentId ? 
( + <> + + + + + + + + + + + + ) : stateSourceId ? ( - {isEntity || stateSourceId ? ( + {isEntity || stateSourceId || documentId ? ( details ) : ( <> @@ -977,6 +1054,8 @@ function manifestKindLabel(manifest: Manifest): string { return `Effect` case `attachment`: return `Attachment` + case `document`: + return `Markdown document` case `context`: return `Context` case `schedule`: @@ -984,6 +1063,7 @@ function manifestKindLabel(manifest: Manifest): string { case `goal`: return `Goal` } + return manifest.kind } function manifestTitle(manifest: Manifest): string { @@ -995,11 +1075,13 @@ function manifestTitle(manifest: Manifest): string { case `shared-state`: case `effect`: case `attachment`: + case `document`: case `context`: case `schedule`: case `goal`: return manifest.id } + return manifest.key } function manifestMeta(manifest: Manifest): string { @@ -1014,6 +1096,8 @@ function manifestMeta(manifest: Manifest): string { return manifest.function_ref case `attachment`: return `${manifest.mimeType} · ${manifest.status}` + case `document`: + return manifest.title case `context`: return `${Object.keys(manifest.attrs).length} attrs` case `schedule`: @@ -1023,6 +1107,7 @@ function manifestMeta(manifest: Manifest): string { case `goal`: return manifest.status ?? 
`active` } + return `` } function manifestDetails( @@ -1064,6 +1149,16 @@ function manifestDetails( value: `${manifest.subject.type}:${manifest.subject.key}`, }, ] + case `document`: + return [ + { label: `Title`, value: manifest.title }, + { label: `MIME`, value: manifest.contentMimeType }, + { label: `Transport`, value: manifest.transportMimeType }, + { label: `Provider`, value: manifest.provider }, + { label: `Y.Text`, value: manifest.yTextName }, + { label: `Doc ID`, value: manifest.docId }, + { label: `Path`, value: manifest.docPath }, + ] case `context`: return [ { label: `Name`, value: manifest.name }, @@ -1093,12 +1188,14 @@ function manifestDetails( }, ] } + return [] } function manifestIcon(manifest: Manifest) { if (getManifestStateSourceId(manifest)) return Database if (getManifestEntityUrl(manifest)) return GitBranch if (manifest.kind === `schedule`) return Radio + if (manifest.kind === `document`) return FileText if (manifest.kind === `attachment`) return FileJson return FileJson } @@ -1291,6 +1388,46 @@ const TimelineRow = memo(function TimelineRow({ ) } + if (row.realtimeTranscript) { + if (row.realtimeTranscript.text.trim().length === 0) { + return <> + } + const timestamp = Date.parse(row.realtimeTranscript.created_at) + if (row.realtimeTranscript.direction === `output`) { + const isStreamingTranscript = row.realtimeTranscript.status !== `final` + return ( + + ) + } + return ( + + ) + } + if (row.wake) { return ( (null) const textColumnWidth = Math.max(0, contentWidth - CHAT_SURFACE_GUTTER) const displayRows = useMemo( - () => rows.filter((row) => !isAttachmentManifest(row.manifest)), + () => + rows.filter( + (row) => + !isAttachmentManifest(row.manifest) && !isRealtimeSessionWake(row) + ), [rows] ) const attachmentsByInboxKey = useMemo(() => { @@ -1548,7 +1689,7 @@ export function EntityTimeline({ if (streamingIndex < 0) return null for (let index = streamingIndex - 1; index >= 0; index--) { const row = displayRows[index] - if (row?.inbox) { + if 
(row?.inbox || row?.realtimeTranscript) { return row.$key } } @@ -1565,6 +1706,9 @@ export function EntityTimeline({ if (row.inbox) { const timestamp = Date.parse(row.inbox.timestamp) lastUserTimestamp = Number.isFinite(timestamp) ? timestamp : null + } else if (row.realtimeTranscript) { + const timestamp = Date.parse(row.realtimeTranscript.created_at) + lastUserTimestamp = Number.isFinite(timestamp) ? timestamp : null } else if (row.run) { timestampByRowKey.set(row.$key, lastUserTimestamp) } diff --git a/packages/agents-server-ui/src/components/MessageInput.module.css b/packages/agents-server-ui/src/components/MessageInput.module.css index bf5683dd52..86bff9f36e 100644 --- a/packages/agents-server-ui/src/components/MessageInput.module.css +++ b/packages/agents-server-ui/src/components/MessageInput.module.css @@ -65,6 +65,43 @@ color: var(--ds-text-1); } +.inlineIconButton.voiceActive { + background: var(--ds-accent-a3); + color: var(--ds-accent-11); +} + +.voiceMeter { + display: inline-flex; + align-items: center; + justify-content: center; + gap: 2px; + width: 0; + height: 20px; + color: var(--ds-accent-11); + opacity: 0; + overflow: hidden; + transition: + opacity 0.12s ease, + width 0.12s ease; +} + +.voiceMeter[data-active='true'] { + width: 18px; + opacity: 1; +} + +.voiceMeterBar { + display: block; + width: 3px; + height: 14px; + border-radius: var(--ds-radius-full); + background: currentColor; + transform-origin: center bottom; + transition: + opacity 0.08s linear, + transform 0.08s linear; +} + .inlineIconButton:focus-visible { outline: 2px solid var(--ds-accent-a6); outline-offset: -2px; diff --git a/packages/agents-server-ui/src/components/MessageInput.tsx b/packages/agents-server-ui/src/components/MessageInput.tsx index 906e4e890b..f8af213ad8 100644 --- a/packages/agents-server-ui/src/components/MessageInput.tsx +++ b/packages/agents-server-ui/src/components/MessageInput.tsx @@ -1,5 +1,5 @@ import { useCallback, useEffect, useMemo, useRef, useState } 
from 'react' -import { ArrowUp, Square } from 'lucide-react' +import { ArrowUp, AudioLines, Square } from 'lucide-react' import { useLiveQuery } from '@tanstack/react-db' import type { EntityStreamDBWithActions } from '@electric-ax/agents-runtime/client' import { @@ -18,6 +18,18 @@ import { parseGoalCommand, serializeComposerInput, } from '@electric-ax/agents-runtime/client' +import { + startRealtimeAudioSession, + type RealtimeAudioSession, +} from '../lib/realtime-audio' +import { + adoptSharedRealtimeSession, + createRealtimeSessionKey, + releaseSharedRealtimeSession, + stopSharedRealtimeSession, + storeSharedRealtimeSession, +} from '../lib/realtime-session-store' +import { useRealtimeAvailability } from '../hooks/useRealtimeAvailability' import { ComposerEditor } from './ComposerEditor' import { ComposerShell } from './ComposerShell' import { Icon, Stack, Text, Tooltip } from '../ui' @@ -70,6 +82,10 @@ export function MessageInput({ drawer, onSend, onStop, + autoStartRealtimeSignal, + autoStartRealtimeInitialText, + autoStartRealtimeGreetIfSilent = false, + onRealtimeAutoStartConsumed, }: { db: EntityStreamDBWithActions | null baseUrl: string @@ -91,6 +107,10 @@ export function MessageInput({ onClearCommentTarget?: () => void onSend?: () => void onStop?: () => void + autoStartRealtimeSignal?: string | null + autoStartRealtimeInitialText?: string + autoStartRealtimeGreetIfSilent?: boolean + onRealtimeAutoStartConsumed?: () => void /** * Optional content rendered above the composer, sharing its docked * width and lift into the timeline above. 
The composer is z-indexed @@ -117,7 +137,17 @@ export function MessageInput({ key: string originalText: string } | null>(null) + const [realtimePending, setRealtimePending] = useState(false) + const [realtimeActive, setRealtimeActive] = useState(false) + const [realtimeInputLevel, setRealtimeInputLevel] = useState(0) + const realtimeSessionRef = useRef(null) + const handledAutoStartRealtimeRef = useRef(null) const composerFocusRef = useRef<{ focus: () => void } | null>(null) + const realtimeAvailability = useRealtimeAvailability() + const realtimeKey = useMemo( + () => createRealtimeSessionKey(baseUrl, entityUrl), + [baseUrl, entityUrl] + ) const inputDisabled = disabled || writeDisabled const isCommentMode = composerMode === `comment` const attachmentsDisabled = @@ -223,10 +253,35 @@ export function MessageInput({ const showStop = !isCommentMode && generationActive && + !realtimeActive && inputText.length === 0 && attachmentCount === 0 && !disabled const canStop = showStop && !stopPending && !stopDisabled + const canStartRealtime = + !inputDisabled && + !editingMessage && + !isCommentMode && + attachmentCount === 0 && + Boolean(baseUrl) && + realtimeAvailability.canStart + + useEffect(() => { + const session = adoptSharedRealtimeSession(realtimeKey) + realtimeSessionRef.current = session + session?.setInputLevelHandler(setRealtimeInputLevel) + setRealtimeActive(Boolean(session)) + if (!session) setRealtimeInputLevel(0) + + return () => { + const currentSession = realtimeSessionRef.current + currentSession?.setInputLevelHandler(undefined) + if (currentSession) { + releaseSharedRealtimeSession(realtimeKey, currentSession) + } + realtimeSessionRef.current = null + } + }, [realtimeKey]) const handleSubmit = useCallback( (composerPayload?: ComposerInputPayload) => { @@ -254,6 +309,16 @@ export function MessageInput({ return } const files = imageAttachmentsEnabled ? 
attachments : [] + if (realtimeSessionRef.current && !editingMessage && files.length === 0) { + const session = realtimeSessionRef.current + setValue(``) + onSend?.() + session.sendText(text).catch((err: Error) => { + setError(err.message) + setValue((current) => (current ? current : text)) + }) + return + } const tx = editingMessage ? updateAction?.({ key: editingMessage.key, @@ -318,6 +383,112 @@ export function MessageInput({ handleSubmit() }, [canStop, handleSubmit, onStop]) + const startRealtimeSession = useCallback( + ({ + initialText, + greetIfSilent = false, + }: { initialText?: string; greetIfSilent?: boolean } = {}) => { + if (realtimePending) return + setError(null) + if (!canStartRealtime) { + if (realtimeAvailability.unavailableReason) { + setError(realtimeAvailability.unavailableReason) + } + return + } + const existingSession = adoptSharedRealtimeSession(realtimeKey) + if (existingSession) { + existingSession.setInputLevelHandler(setRealtimeInputLevel) + realtimeSessionRef.current = existingSession + setRealtimeActive(true) + return + } + setRealtimePending(true) + startRealtimeAudioSession({ + baseUrl, + entityUrl, + onInputLevel: setRealtimeInputLevel, + initialText, + greetIfSilent, + }) + .then((session) => { + session.setInputLevelHandler(setRealtimeInputLevel) + storeSharedRealtimeSession(realtimeKey, session) + realtimeSessionRef.current = session + setRealtimeActive(true) + }) + .catch((err: Error) => { + setError(err.message) + setRealtimeInputLevel(0) + }) + .finally(() => { + setRealtimePending(false) + }) + }, + [ + baseUrl, + canStartRealtime, + entityUrl, + realtimeKey, + realtimeAvailability.unavailableReason, + realtimePending, + ] + ) + + const handleRealtimeToggle = useCallback(() => { + if (realtimePending) return + setError(null) + if (realtimeSessionRef.current) { + const session = realtimeSessionRef.current + realtimeSessionRef.current = null + session.setInputLevelHandler(undefined) + setRealtimePending(true) + 
stopSharedRealtimeSession(realtimeKey, session) + .catch((err: Error) => setError(err.message)) + .finally(() => { + setRealtimeActive(false) + setRealtimeInputLevel(0) + setRealtimePending(false) + }) + return + } + startRealtimeSession() + }, [realtimeKey, realtimePending, startRealtimeSession]) + + useEffect(() => { + if (!autoStartRealtimeSignal) return + if (handledAutoStartRealtimeRef.current === autoStartRealtimeSignal) return + if (realtimeAvailability.loading || realtimePending) return + if (!realtimeAvailability.canStart) { + handledAutoStartRealtimeRef.current = autoStartRealtimeSignal + onRealtimeAutoStartConsumed?.() + if (realtimeAvailability.unavailableReason) { + setError(realtimeAvailability.unavailableReason) + } + return + } + if (!canStartRealtime) return + handledAutoStartRealtimeRef.current = autoStartRealtimeSignal + onRealtimeAutoStartConsumed?.() + if (!realtimeSessionRef.current) { + startRealtimeSession({ + initialText: autoStartRealtimeInitialText, + greetIfSilent: autoStartRealtimeGreetIfSilent, + }) + } + }, [ + autoStartRealtimeSignal, + autoStartRealtimeGreetIfSilent, + autoStartRealtimeInitialText, + canStartRealtime, + onRealtimeAutoStartConsumed, + realtimeAvailability.canStart, + realtimeAvailability.loading, + realtimeAvailability.unavailableReason, + realtimePending, + startRealtimeSession, + ]) + const startEditing = useCallback( (message: EntityTimelineData[`inbox`][number]) => { if (inputDisabled) return @@ -394,6 +565,12 @@ export function MessageInput({ ) const isButtonActive = canSubmit || (showStop && !stopDisabled) + const voiceLevel = realtimeActive ? realtimeInputLevel : 0 + const voiceBars = [ + Math.max(0.18, Math.min(1, 0.24 + voiceLevel * 0.76)), + Math.max(0.24, Math.min(1, 0.34 + voiceLevel * 0.9)), + Math.max(0.16, Math.min(1, 0.2 + voiceLevel * 0.82)), + ] const sendTooltip = showStop ? stopDisabled ? 
`Signal permission required` @@ -403,6 +580,15 @@ export function MessageInput({ : `Send message` const replyPreviewLabel = formatReplyBannerLabel(commentTarget) const replyPreviewText = commentTarget?.snapshot.text + const realtimeTooltip = realtimeActive + ? `Stop voice mode` + : attachmentCount > 0 + ? `Remove attachments to start voice mode` + : realtimeAvailability.loading + ? `Checking realtime credentials` + : (realtimeAvailability.unavailableReason ?? `Start voice mode`) + const realtimeButtonDisabled = + realtimePending || (!realtimeActive && !canStartRealtime) return ( {drawer?.({ @@ -472,15 +658,61 @@ export function MessageInput({ ) : null } controls={ - imageAttachmentsEnabled && !isCommentMode ? ( - - ) : null + <> + {!isCommentMode ? ( + <> + + + + + + + + ) : null} + {imageAttachmentsEnabled && !isCommentMode ? ( + + ) : null} + } send={ diff --git a/packages/agents-server-ui/src/components/NewSessionPage.module.css b/packages/agents-server-ui/src/components/NewSessionPage.module.css index 70106efecc..241c4d76e1 100644 --- a/packages/agents-server-ui/src/components/NewSessionPage.module.css +++ b/packages/agents-server-ui/src/components/NewSessionPage.module.css @@ -470,6 +470,36 @@ display: inline-flex; } +.composerVoice { + all: unset; + display: inline-flex; + align-items: center; + justify-content: center; + width: 24px; + height: 24px; + border-radius: var(--ds-radius-full); + background: var(--ds-gray-a3); + color: var(--ds-text-3); + cursor: pointer; + transition: + background 0.12s ease, + color 0.12s ease, + opacity 0.12s ease; + flex-shrink: 0; +} +.composerVoice:hover:not(:disabled) { + background: var(--ds-gray-a4); + color: var(--ds-text-1); +} +.composerVoice:disabled { + cursor: not-allowed; + opacity: 0.55; +} +.composerVoicePending { + background: var(--ds-accent-a3); + color: var(--ds-accent-11); +} + .composerSend { all: unset; display: inline-flex; diff --git a/packages/agents-server-ui/src/components/settings/SettingsSidebar.tsx 
b/packages/agents-server-ui/src/components/settings/SettingsSidebar.tsx index adbb6d5c95..30019938fa 100644 --- a/packages/agents-server-ui/src/components/settings/SettingsSidebar.tsx +++ b/packages/agents-server-ui/src/components/settings/SettingsSidebar.tsx @@ -6,6 +6,7 @@ import { KeyRound, Palette, Plug, + RadioTower, Server, Settings as SettingsIcon, Terminal, @@ -21,6 +22,7 @@ export type SettingsCategoryId = | `account` | `servers` | `credentials` + | `realtime` | `command-line` | `appearance` | `local-runtime` @@ -105,6 +107,12 @@ export function SettingsSidebar({ icon: , visible: true, }, + { + id: `realtime`, + label: `Realtime`, + icon: , + visible: true, + }, { id: `command-line`, label: `Command Line`, diff --git a/packages/agents-server-ui/src/components/settings/pages/RealtimePage.module.css b/packages/agents-server-ui/src/components/settings/pages/RealtimePage.module.css new file mode 100644 index 0000000000..f7681ab603 --- /dev/null +++ b/packages/agents-server-ui/src/components/settings/pages/RealtimePage.module.css @@ -0,0 +1,61 @@ +.modelSelect { + min-width: 240px; +} + +.modelList { + display: flex; + flex-direction: column; + gap: 0; +} + +.modelItem { + display: flex; + align-items: flex-start; + justify-content: space-between; + gap: 16px; + padding: 12px 0; + border-top: 1px solid var(--ds-border-1); +} + +.modelItem:first-child { + padding-top: 0; + border-top: 0; +} + +.modelItem:last-child { + padding-bottom: 0; +} + +.modelText { + min-width: 0; + display: flex; + flex-direction: column; + gap: 4px; +} + +.modelTitle { + display: inline-flex; + align-items: center; + gap: 6px; + min-width: 0; + color: var(--ds-text-1); + font-size: var(--ds-text-sm); +} + +.modelId { + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + color: var(--ds-text-3); + font-size: var(--ds-text-xs); +} + +.modelDescription { + color: var(--ds-text-3); + font-size: var(--ds-text-xs); + line-height: 1.45; +} + +.recommended { + flex-shrink: 0; 
+} diff --git a/packages/agents-server-ui/src/components/settings/pages/RealtimePage.tsx b/packages/agents-server-ui/src/components/settings/pages/RealtimePage.tsx new file mode 100644 index 0000000000..d465242059 --- /dev/null +++ b/packages/agents-server-ui/src/components/settings/pages/RealtimePage.tsx @@ -0,0 +1,388 @@ +import { useEffect, useMemo, useState } from 'react' +import { useNavigate } from '@tanstack/react-router' +import { + loadRealtimeSettingsStatus, + saveRealtimeSettings, + type RealtimeSettingsStatus, +} from '../../../lib/server-connection' +import { Button, Select, Switch, Text } from '../../../ui' +import { + SettingsPanel, + SettingsRow, + SettingsScreen, + SettingsSection, + SettingsStatusBadge, +} from '../SettingsScreen' +import styles from './RealtimePage.module.css' + +export function RealtimePage(): React.ReactElement { + const isDesktop = typeof window !== `undefined` && Boolean(window.electronAPI) + const navigate = useNavigate() + const [status, setStatus] = useState(null) + const [saving, setSaving] = useState(false) + const [error, setError] = useState(null) + + useEffect(() => { + let cancelled = false + void loadRealtimeSettingsStatus().then((next) => { + if (cancelled) return + setStatus(next) + }) + return () => { + cancelled = true + } + }, []) + + const modelById = useMemo( + () => new Map(status?.availableModels.map((model) => [model.id, model])), + [status?.availableModels] + ) + const voiceById = useMemo( + () => new Map(status?.availableVoices.map((voice) => [voice.id, voice])), + [status?.availableVoices] + ) + const reasoningEffortById = useMemo( + () => + new Map( + status?.availableReasoningEfforts.map((effort) => [effort.id, effort]) + ), + [status?.availableReasoningEfforts] + ) + const selectedModel = status ? modelById.get(status.settings.model) : null + const selectedVoice = status ? voiceById.get(status.settings.voice) : null + const selectedReasoningEffort = status + ? 
reasoningEffortById.get(status.settings.reasoningEffort) + : null + + const saveSettingsPatch = async ( + patch: Partial + ): Promise => { + if (!status) return + const next = { + ...status, + settings: { ...status.settings, ...patch }, + } + setStatus(next) + setSaving(true) + setError(null) + try { + await saveRealtimeSettings(next.settings) + } catch (err) { + setStatus(status) + setError(err instanceof Error ? err.message : String(err)) + } finally { + setSaving(false) + } + } + + return ( + + + {!isDesktop ? ( + + + Realtime settings are managed by the connected desktop or server + runtime. This web build uses the default model when starting a + session from the browser. + + + ) : !status ? ( + + + Loading… + + + ) : ( + <> + + + {authBadgeLabel(status)} + + + + } + /> + OpenAI + } + /> + { + if (model) void saveSettingsPatch({ model }) + }} + disabled={saving} + > + + model ? (modelById.get(model)?.label ?? model) : `Model` + } + /> + + {status.availableModels.map((model) => ( + + {model.label} + + ))} + + + } + /> + { + if (voice) void saveSettingsPatch({ voice }) + }} + disabled={saving} + > + + voice ? (voiceById.get(voice)?.label ?? voice) : `Voice` + } + /> + + {status.availableVoices.map((voice) => ( + + {voice.label} + + ))} + + + } + /> + { + if (reasoningEffort) { + void saveSettingsPatch({ + reasoningEffort: + reasoningEffort as RealtimeSettingsStatus[`settings`][`reasoningEffort`], + }) + } + }} + disabled={ + saving || status.settings.model !== `gpt-realtime-2` + } + > + + reasoningEffort + ? (reasoningEffortById.get( + reasoningEffort as RealtimeSettingsStatus[`settings`][`reasoningEffort`] + )?.label ?? reasoningEffort) + : `Effort` + } + /> + + {status.availableReasoningEfforts.map((effort) => ( + + {effort.label} + + ))} + + + } + /> + { + void saveSettingsPatch({ interruptResponse }) + }} + /> + } + /> + {saving && ( + + + Saving… + + + )} + {error && ( + + + {error} + + + )} + + )} + + + {status && ( + + +
+ {status.availableVoices.map((voice) => ( +
+
+ + {voice.label} + {voice.recommended && ( + + Recommended + + )} + + {voice.id} + + {voice.description} + +
+ {voice.id === status.settings.voice && ( + + + Selected + + + )} +
+ ))} +
+
+
+ )} + + {status && ( + + +
+ {status.availableModels.map((model) => ( +
+
+ + {model.label} + {model.recommended && ( + + Recommended + + )} + + {model.id} + + {model.description} + +
+ {model.id === status.settings.model && ( + + + Selected + + + )} +
+ ))} +
+
+
+ )} +
+ ) +} + +function authDescription(status: RealtimeSettingsStatus): string { + if (status.openAIApiKeyStatus === `valid`) { + return `Realtime sessions connect to the OpenAI Realtime API with your OpenAI API key.` + } + if (status.openAIApiKeyStatus === `invalid`) { + return ( + status.openAIApiKeyError ?? + `The configured OpenAI API key could not be used for realtime audio.` + ) + } + if (status.openAIApiKeyStatus === `unknown`) { + return ( + status.openAIApiKeyError ?? + `Unable to verify realtime API access right now.` + ) + } + if (status.codexEnabled) { + return `ChatGPT / Codex sign-in is enabled, but realtime voice still needs an OpenAI API key.` + } + return `Add an OpenAI API key in Credentials. ChatGPT / Codex sign-in alone does not grant Realtime API access.` +} + +function authBadgeLabel(status: RealtimeSettingsStatus): string { + switch (status.openAIApiKeyStatus) { + case `valid`: + return `Ready` + case `invalid`: + return `Invalid key` + case `unknown`: + return status.hasOpenAIApiKey ? `Verify failed` : `Checking` + case `missing`: + return `API key required` + } +} diff --git a/packages/agents-server-ui/src/components/views/ChatView.tsx b/packages/agents-server-ui/src/components/views/ChatView.tsx index b6b2509332..caefa99aa2 100644 --- a/packages/agents-server-ui/src/components/views/ChatView.tsx +++ b/packages/agents-server-ui/src/components/views/ChatView.tsx @@ -32,6 +32,9 @@ const CHAT_VIEW_PERMISSIONS: ReadonlyArray = [ `signal`, `fork`, ] +const REALTIME_INITIAL_TEXT_VIEW_PARAM = `realtimeInitialText` +const REALTIME_GREET_VIEW_PARAM = `realtimeGreet` + /** * The default view: chat / timeline + message composer. * @@ -362,6 +365,37 @@ function GenericChatBody({ setStopPending(false) }, [entityUrl]) + const autoStartRealtimeSignal = + viewParams?.realtime === `start` && entityUrl + ? [ + entityUrl, + `realtime`, + `start`, + viewParams[REALTIME_INITIAL_TEXT_VIEW_PARAM] ?? ``, + viewParams[REALTIME_GREET_VIEW_PARAM] ?? 
``, + ].join(`:`) + : null + const autoStartRealtimeInitialText = + viewParams?.realtime === `start` + ? viewParams[REALTIME_INITIAL_TEXT_VIEW_PARAM] + : undefined + const autoStartRealtimeGreetIfSilent = + viewParams?.realtime === `start` && + viewParams[REALTIME_GREET_VIEW_PARAM] === `1` + const handleRealtimeAutoStartConsumed = useCallback(() => { + const nextParams = Object.fromEntries( + Object.entries(viewParams ?? {}).filter( + ([key]) => + key !== `realtime` && + key !== REALTIME_INITIAL_TEXT_VIEW_PARAM && + key !== REALTIME_GREET_VIEW_PARAM + ) + ) + helpers.setTileView(tileId, `chat`, { + viewParams: Object.keys(nextParams).length > 0 ? nextParams : undefined, + }) + }, [helpers, tileId, viewParams]) + const stopGeneration = useCallback(() => { if (!canSignal) return if (!entityUrl || !signalEntity || !generationActive || stopPending) return @@ -441,6 +475,10 @@ function GenericChatBody({ )} onSend={() => setSentMessageSignal((value) => value + 1)} onStop={stopGeneration} + autoStartRealtimeSignal={autoStartRealtimeSignal} + autoStartRealtimeInitialText={autoStartRealtimeInitialText} + autoStartRealtimeGreetIfSilent={autoStartRealtimeGreetIfSilent} + onRealtimeAutoStartConsumed={handleRealtimeAutoStartConsumed} /> ) diff --git a/packages/agents-server-ui/src/components/views/MarkdownDocumentView.module.css b/packages/agents-server-ui/src/components/views/MarkdownDocumentView.module.css new file mode 100644 index 0000000000..f740653a1c --- /dev/null +++ b/packages/agents-server-ui/src/components/views/MarkdownDocumentView.module.css @@ -0,0 +1,314 @@ +.root { + --markdown-doc-editor-bg: #fff; + + display: flex; + min-height: 0; + height: 100%; + flex-direction: column; + background: var(--ds-bg); + color: var(--ds-text-1); + font-family: var(--ds-font-body); +} + +.bar { + display: flex; + align-items: center; + justify-content: space-between; + gap: var(--ds-space-3); + min-height: 36px; + box-sizing: border-box; + padding: 6px var(--ds-space-3); + 
border-top: 1px solid var(--ds-divider); + border-bottom: 1px solid var(--ds-divider); + background: var(--ds-surface); +} + +:global(:root[data-theme='dark']) .root { + --markdown-doc-editor-bg: var(--ds-bg); +} + +.title { + min-width: 0; + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + color: var(--ds-text-1); + font-family: var(--ds-font-heading); + font-size: var(--ds-text-sm); + line-height: var(--ds-text-sm-lh); + font-weight: 600; +} + +.status { + flex: 0 0 auto; + color: var(--ds-text-3); + font-size: var(--ds-text-xs); + line-height: var(--ds-text-xs-lh); +} + +.connectionStatus { + display: inline-flex; + align-items: center; + justify-content: center; + width: var(--ds-icon-sm); + height: var(--ds-icon-sm); + flex: 0 0 auto; + color: var(--ds-text-4); +} + +.connectionStatus[data-status='connected'] { + color: var(--ds-green-11); +} + +.connectionStatus[data-status='connecting'], +.connectionStatus[data-status='loading'] { + color: var(--ds-accent-11); +} + +.connectionStatus[data-status='disconnected'] { + color: var(--ds-text-4); +} + +.connectionStatus[data-status='error'] { + color: var(--ds-red-11); +} + +.editorScrollArea { + min-height: 0; + flex: 1; + overflow: hidden; + background: var(--markdown-doc-editor-bg); +} + +.editorViewport { + background: var(--markdown-doc-editor-bg); +} + +.editor { + min-height: 100%; + background: var(--markdown-doc-editor-bg); +} + +.editor :global(.cm-editor) { + min-height: 100%; + background: var(--markdown-doc-editor-bg); + color: var(--ds-text-1); + font-family: var(--ds-font-mono); + font-size: 13px; + line-height: 1.6; +} + +.editor :global(.cm-editor.cm-focused) { + outline: none; +} + +.editor :global(.cm-scroller) { + font-family: var(--ds-font-mono); + background: var(--markdown-doc-editor-bg); + overflow: visible; +} + +.editor :global(.cm-content) { + min-height: 100%; + padding: 8px 0 36px; + caret-color: var(--ds-accent-11); +} + +.editor :global(.cm-line) { + box-sizing: 
border-box; + padding: 0 8px; +} + +.editor :global(.cm-cursor) { + border-left-color: var(--ds-accent-11); +} + +.editor :global(.cm-selectionBackground), +.editor :global(.cm-focused .cm-selectionBackground) { + background: var(--ds-accent-a4) !important; +} + +.editor :global(.cm-activeLine) { + background: var(--ds-gray-a2); +} + +.editor :global(.cm-gutters) { + background: var(--ds-bg-subtle); + color: var(--ds-text-4); + border-right: 1px solid var(--ds-divider); + font-family: var(--ds-font-mono); + font-size: var(--ds-text-sm); +} + +.editor :global(.cm-activeLineGutter) { + background: var(--ds-gray-a2); + color: var(--ds-text-2); +} + +.editor :global(.cm-lineNumbers .cm-gutterElement) { + min-width: 24px; + padding: 0 6px; +} + +.editor :global(.cm-foldGutter .cm-gutterElement) { + width: 10px; + padding: 0 2px; + color: var(--ds-text-4); +} + +.editor :global(.cm-matchingBracket), +.editor :global(.cm-nonmatchingBracket) { + background: var(--ds-accent-a3); + color: var(--ds-text-1); +} + +.editor :global(.cm-tooltip), +.editor :global(.cm-tooltip-autocomplete) { + overflow: hidden; + border: 1px solid var(--ds-overlay-border); + border-radius: var(--ds-radius-3); + background: var(--ds-surface-raised); + color: var(--ds-text-1); + box-shadow: var(--ds-overlay-shadow); + font-family: var(--ds-font-body); + font-size: var(--ds-text-sm); +} + +.editor :global(.cm-tooltip-autocomplete ul li[aria-selected]) { + background: var(--ds-bg-hover); + color: var(--ds-text-1); +} + +.editor :global(.cm-panels) { + border-color: var(--ds-divider); + background: var(--ds-surface); + color: var(--ds-text-1); + font-family: var(--ds-font-body); + font-size: var(--ds-text-sm); +} + +.editor :global(.cm-panels-top) { + border-bottom: 1px solid var(--ds-divider); +} + +.editor :global(.cm-panels-bottom) { + border-top: 1px solid var(--ds-divider); +} + +.editor :global(.cm-search) { + display: flex; + align-items: center; + gap: var(--ds-space-2); + padding: 6px 
var(--ds-space-3); +} + +.editor :global(.cm-search label) { + display: inline-flex; + align-items: center; + gap: 4px; + color: var(--ds-text-2); +} + +.editor :global(.cm-search input) { + min-height: 24px; + box-sizing: border-box; + border: 1px solid var(--ds-border-1); + border-radius: var(--ds-radius-2); + background: var(--ds-input-bg); + color: var(--ds-text-1); + font-family: var(--ds-font-body); + font-size: var(--ds-text-sm); + outline: none; + padding: 2px 8px; +} + +.editor :global(.cm-search input:focus) { + border-color: var(--ds-accent-9); + box-shadow: 0 0 0 1px var(--ds-accent-9); +} + +.editor :global(.cm-search button) { + min-height: 24px; + border: 1px solid transparent; + border-radius: var(--ds-radius-2); + background: transparent; + color: var(--ds-text-2); + cursor: pointer; + font-family: var(--ds-font-body); + font-size: var(--ds-text-sm); + font-weight: 500; + padding: 2px 8px; +} + +.editor :global(.cm-search button:hover) { + background: var(--ds-bg-hover); + color: var(--ds-text-1); +} + +.presence { + display: flex; + align-items: center; + gap: var(--ds-space-2); + min-width: 0; +} + +.presenceDot { + width: 8px; + height: 8px; + flex: 0 0 auto; + border-radius: 999px; + box-shadow: 0 0 0 1px var(--ds-surface); +} + +.empty { + padding: var(--ds-space-5); + color: var(--ds-text-3); + font-family: var(--ds-font-body); + font-size: var(--ds-text-sm); + line-height: var(--ds-text-sm-lh); +} + +.editor :global(.cm-ySelection) { + opacity: 0.32; +} + +.editor :global(.cm-ySelectionCaret) { + z-index: 20; + border-right: 0; + border-left-width: 2px; + display: inline-block; + min-height: 1.25em; +} + +.editor :global(.cm-ySelectionCaretDot) { + display: none; +} + +.editor :global(.cm-ySelectionInfo) { + top: -1.35em; + left: -2px; + z-index: 21; + max-width: 180px; + overflow: hidden; + padding: 1px 5px 2px; + border-radius: var(--ds-radius-1); + color: #fff; + font-family: var(--ds-font-body); + font-size: var(--ds-text-2xs); + 
font-style: normal; + font-weight: 600; + line-height: 1.2; + opacity: 1; + pointer-events: none; + text-overflow: ellipsis; + white-space: nowrap; + box-shadow: var(--ds-shadow-1); +} + +.editor :global(.cm-ySelectionCaret:hover > .cm-ySelectionInfo) { + opacity: 1; +} + +.editor :global(.cm-yLineSelection) { + opacity: 0.32; +} diff --git a/packages/agents-server-ui/src/components/views/MarkdownDocumentView.test.ts b/packages/agents-server-ui/src/components/views/MarkdownDocumentView.test.ts new file mode 100644 index 0000000000..7746958de6 --- /dev/null +++ b/packages/agents-server-ui/src/components/views/MarkdownDocumentView.test.ts @@ -0,0 +1,73 @@ +import { describe, expect, it } from 'vitest' +import * as encoding from 'lib0/encoding' +import { + Awareness, + encodeAwarenessUpdate, + type Awareness as AwarenessType, +} from 'y-protocols/awareness' +import * as Y from 'yjs' +import { + applyMarkdownAwarenessFrames, + markdownDocumentConnectionConfig, +} from './MarkdownDocumentView' +import type { ManifestDocumentEntry } from '@electric-ax/agents-runtime/client' + +function frame(update: Uint8Array): Uint8Array { + const encoder = encoding.createEncoder() + encoding.writeVarUint8Array(encoder, update) + return encoding.toUint8Array(encoder) +} + +describe(`markdownDocumentConnectionConfig`, () => { + it(`uses explicit provider doc metadata for editor connections`, () => { + const config = markdownDocumentConnectionConfig( + `http://localhost:4437/app`, + { + key: `document:notes`, + kind: `document`, + id: `notes`, + provider: `y-durable-streams`, + docId: `agents/chat/session/documents/notes`, + docPath: `agents/chat/session/documents/notes`, + streamPath: `/v1/yjs/default/docs/agents/chat/session/documents/notes`, + transportMimeType: `application/vnd.electric-agents.markdown-yjs`, + contentMimeType: `text/markdown`, + yTextName: `markdown`, + title: `Notes`, + createdAt: `2026-01-01T00:00:00.000Z`, + } as ManifestDocumentEntry + ) + + 
expect(config).toMatchObject({ + providerUrl: `http://localhost:4437/app/v1/yjs/default`, + docId: `agents/chat/session/documents/notes`, + yTextName: `markdown`, + }) + expect(config.docUrl.toString()).toBe( + `http://localhost:4437/app/v1/yjs/default/docs/agents/chat/session/documents/notes` + ) + }) +}) + +describe(`applyMarkdownAwarenessFrames`, () => { + it(`applies lib0-framed awareness updates`, () => { + const sourceDoc = new Y.Doc() + const source = new Awareness(sourceDoc) + source.setLocalState({ + user: { name: `horton`, role: `agent`, status: `editing` }, + cursor: { anchor: 4, head: 4 }, + }) + + const target = new Awareness(new Y.Doc()) as AwarenessType + applyMarkdownAwarenessFrames( + target, + frame(encodeAwarenessUpdate(source, [source.clientID])) + ) + + const remoteState = target.getStates().get(source.clientID) + expect(remoteState).toMatchObject({ + user: { name: `horton`, role: `agent`, status: `editing` }, + cursor: { anchor: 4, head: 4 }, + }) + }) +}) diff --git a/packages/agents-server-ui/src/components/views/MarkdownDocumentView.tsx b/packages/agents-server-ui/src/components/views/MarkdownDocumentView.tsx new file mode 100644 index 0000000000..74d454377f --- /dev/null +++ b/packages/agents-server-ui/src/components/views/MarkdownDocumentView.tsx @@ -0,0 +1,427 @@ +import { useEffect, useMemo, useRef, useState } from 'react' +import { markdown } from '@codemirror/lang-markdown' +import { EditorState } from '@codemirror/state' +import { EditorView, basicSetup } from 'codemirror' +import { keymap } from '@codemirror/view' +import { YjsProvider } from '@durable-streams/y-durable-streams' +import { useLiveQuery } from '@tanstack/react-db' +import { Plug, TriangleAlert, Unplug } from 'lucide-react' +import { yCollab, yUndoManagerKeymap } from 'y-codemirror.next' +import * as decoding from 'lib0/decoding' +import { + Awareness, + applyAwarenessUpdate, + encodeAwarenessUpdate, + removeAwarenessStates, +} from 'y-protocols/awareness' +import * as 
Y from 'yjs' +import { useCurrentPrincipal } from '../../hooks/useCurrentPrincipal' +import { getConfiguredServerHeaders, serverFetch } from '../../lib/auth-fetch' +import { useElectricAgents } from '../../lib/ElectricAgentsProvider' +import { + principalKeyFromInput, + userDisplayName, + userIdFromPrincipal, +} from '../../lib/principals' +import { Icon, ScrollArea } from '../../ui' +import styles from './MarkdownDocumentView.module.css' +import type { EntityViewProps } from '../../lib/workspace/viewRegistry' +import { + MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS, + type ManifestDocumentEntry, +} from '@electric-ax/agents-runtime/client' +import type { LucideIcon } from 'lucide-react' + +type DocumentResponse = { + document: ManifestDocumentEntry +} + +type DocumentConnectionStatus = + | `loading` + | `connecting` + | `connected` + | `disconnected` + | `error` + +type RemoteUser = { + name: string + status?: string + color?: string + expiresAt?: number +} + +function entityApiUrl(baseUrl: string, entityUrl: string, suffix: string): URL { + const url = new URL(baseUrl) + url.pathname = `${url.pathname.replace(/\/+$/, ``)}/_electric/entities${entityUrl}${suffix}` + return url +} + +function colorFor(value: string): { color: string; light: string } { + const colors = [ + [`#2563eb`, `#2563eb33`], + [`#059669`, `#05966933`], + [`#dc2626`, `#dc262633`], + [`#7c3aed`, `#7c3aed33`], + [`#c2410c`, `#c2410c33`], + [`#0f766e`, `#0f766e33`], + ] as const + let hash = 0 + for (let i = 0; i < value.length; i += 1) { + hash = (hash * 31 + value.charCodeAt(i)) >>> 0 + } + const [color, light] = colors[hash % colors.length]! + return { color, light } +} + +function providerBaseUrl(baseUrl: string, streamPath: string): string { + const docsIndex = streamPath.indexOf(`/docs/`) + const prefix = docsIndex >= 0 ? 
streamPath.slice(0, docsIndex) : streamPath + const url = new URL(baseUrl) + url.pathname = `${url.pathname.replace(/\/+$/, ``)}${prefix}` + return url.toString().replace(/\/+$/, ``) +} + +function connectionStatusLabel(status: DocumentConnectionStatus): string { + switch (status) { + case `loading`: + return `Loading document` + case `connecting`: + return `Connecting` + case `connected`: + return `Connected` + case `disconnected`: + return `Disconnected` + case `error`: + return `Connection error` + } +} + +function connectionStatusIcon(status: DocumentConnectionStatus): LucideIcon { + switch (status) { + case `error`: + return TriangleAlert + case `disconnected`: + return Unplug + case `loading`: + case `connecting`: + case `connected`: + return Plug + } +} + +function principalPresenceLabel(principalKey: string): string { + const colon = principalKey.indexOf(`:`) + const id = colon >= 0 ? principalKey.slice(colon + 1) : principalKey + if (id.startsWith(`/`)) { + return id.split(`/`).filter(Boolean).at(-1) ?? 
id + } + return id || principalKey +} + +export function applyMarkdownAwarenessFrames( + awareness: Awareness, + data: Uint8Array +): void { + if (data.length === 0) return + const decoder = decoding.createDecoder(data) + while (decoding.hasContent(decoder)) { + applyAwarenessUpdate( + awareness, + decoding.readVarUint8Array(decoder), + `server` + ) + } +} + +async function primeMarkdownAwareness( + awareness: Awareness, + docUrl: URL, + signal: AbortSignal +): Promise<void> { + const awarenessUrl = new URL(docUrl) + awarenessUrl.searchParams.set(`awareness`, `default`) + awarenessUrl.searchParams.set(`offset`, `-1`) + const response = await serverFetch(awarenessUrl, { + method: `GET`, + headers: getConfiguredServerHeaders(awarenessUrl), + signal, + }) + if (signal.aborted) return + if (response.status === 404) return + if (!response.ok) return + const bytes = new Uint8Array(await response.arrayBuffer()) + if (signal.aborted) return + + const snapshot = new Awareness(new Y.Doc()) + applyMarkdownAwarenessFrames(snapshot, bytes) + const now = Date.now() + const activeAgents = Array.from(snapshot.getStates()) + .filter(([, state]) => { + const user = ( + state as { + user?: { role?: string; status?: string; expiresAt?: number } + } + ).user + return ( + user?.role === `agent` && + user.status === `editing` && + typeof user.expiresAt === `number` && + user.expiresAt > now + ) + }) + .map(([clientId]) => clientId) + if (activeAgents.length > 0) { + applyAwarenessUpdate( + awareness, + encodeAwarenessUpdate(snapshot, activeAgents), + `server` + ) + } + snapshot.destroy() +} + +export function markdownDocumentConnectionConfig( + baseUrl: string, + documentEntry: ManifestDocumentEntry +): { + providerUrl: string + docUrl: URL + docId: string + yTextName: string +} { + const providerUrl = providerBaseUrl(baseUrl, documentEntry.streamPath) + const docId = documentEntry.docId + return { + providerUrl, + docId, + yTextName: documentEntry.yTextName, + docUrl: new
URL(`${providerUrl}/docs/${docId}`), + } +} + +export function MarkdownDocumentView({ + baseUrl, + entityUrl, + viewParams, +}: EntityViewProps): React.ReactElement { + const documentId = viewParams?.doc ?? null + const editorRef = useRef<HTMLDivElement>(null) + const editorViewRef = useRef<EditorView | null>(null) + const remoteStateFirstSeenRef = useRef<Map<number, number>>(new Map()) + const [documentEntry, setDocumentEntry] = + useState<ManifestDocumentEntry | null>(null) + const [status, setStatus] = useState<DocumentConnectionStatus>(`loading`) + const [remoteUsers, setRemoteUsers] = useState<Array<RemoteUser>>([]) + const { principal } = useCurrentPrincipal() + const { usersCollection } = useElectricAgents() + const { data: users = [] } = useLiveQuery( + (q) => { + if (!usersCollection) return undefined + return q.from({ user: usersCollection }) + }, + [usersCollection] + ) + const usersById = useMemo( + () => new Map(users.map((user) => [user.id, user] as const)), + [users] + ) + + useEffect(() => { + let cancelled = false + setDocumentEntry(null) + setStatus(documentId ? `loading` : `error`) + if (!documentId) return + const url = entityApiUrl( + baseUrl, + entityUrl, + `/documents/${encodeURIComponent(documentId)}` + ) + serverFetch(url, { headers: { accept: `application/json` } }) + .then(async (response) => { + if (!response.ok) { + throw new Error(`Document request failed (${response.status})`) + } + return (await response.json()) as DocumentResponse + }) + .then((result) => { + if (!cancelled) setDocumentEntry(result.document) + }) + .catch(() => { + if (!cancelled) setStatus(`error`) + }) + return () => { + cancelled = true + } + }, [baseUrl, entityUrl, documentId]) + + const principalLabel = useMemo(() => { + const userId = userIdFromPrincipal(principal) + const user = userId ? usersById.get(userId) : undefined + const displayName = userDisplayName(user) + if (displayName) return displayName + return principalPresenceLabel(principalKeyFromInput(principal) ??
principal) + }, [principal, usersById]) + + useEffect(() => { + if (!editorRef.current || !documentEntry) return + + const ydoc = new Y.Doc() + const awareness = new Awareness(ydoc) + const userColor = colorFor(principalLabel) + awareness.setLocalStateField(`user`, { + name: principalLabel, + color: userColor.color, + colorLight: userColor.light, + }) + + const { providerUrl, docUrl, docId, yTextName } = + markdownDocumentConnectionConfig(baseUrl, documentEntry) + const awarenessPrimeController = new AbortController() + void primeMarkdownAwareness( + awareness, + docUrl, + awarenessPrimeController.signal + ).catch(() => undefined) + const provider = new YjsProvider({ + doc: ydoc, + baseUrl: providerUrl, + docId, + awareness, + headers: getConfiguredServerHeaders(docUrl), + liveMode: `sse`, + }) + const ytext = ydoc.getText(yTextName) + const state = EditorState.create({ + doc: ytext.toString(), + extensions: [ + keymap.of([...yUndoManagerKeymap]), + basicSetup, + markdown(), + EditorView.lineWrapping, + yCollab(ytext, awareness), + ], + }) + const view = new EditorView({ state, parent: editorRef.current }) + editorViewRef.current = view + + const updateRemoteUsers = (): void => { + const users: Array<RemoteUser> = [] + const staleClients: Array<number> = [] + const seenClients = new Set<number>() + const now = Date.now() + awareness.getStates().forEach((state, clientId) => { + if (clientId === awareness.clientID) return + seenClients.add(clientId) + const user = ( + state as { + user?: { + name?: string + status?: string + color?: string + role?: string + expiresAt?: number + } + } + ).user + const firstSeen = + remoteStateFirstSeenRef.current.get(clientId) ?? Date.now() + remoteStateFirstSeenRef.current.set(clientId, firstSeen) + const isExpired = + typeof user?.expiresAt === `number` + ?
user.expiresAt <= now + : user?.role === `agent` && + user.status === `editing` && + now - firstSeen > MARKDOWN_DOCUMENT_AGENT_PRESENCE_TTL_MS + if (isExpired) { + staleClients.push(clientId) + return + } + if (user?.name) { + users.push({ + name: user.name, + status: user.status, + color: user.color, + expiresAt: user.expiresAt, + }) + } + }) + for (const clientId of remoteStateFirstSeenRef.current.keys()) { + if (!seenClients.has(clientId)) { + remoteStateFirstSeenRef.current.delete(clientId) + } + } + if (staleClients.length > 0) { + removeAwarenessStates(awareness, staleClients, `stale-agent-presence`) + } + setRemoteUsers(users) + } + const statusHandler = (next: DocumentConnectionStatus): void => + setStatus(next) + provider.on(`status`, statusHandler) + awareness.on(`change`, updateRemoteUsers) + const stalePresenceInterval = window.setInterval(updateRemoteUsers, 1_000) + provider.connect() + setStatus(`connecting`) + + return () => { + awarenessPrimeController.abort() + window.clearInterval(stalePresenceInterval) + provider.off(`status`, statusHandler) + awareness.off(`change`, updateRemoteUsers) + provider.destroy() + editorViewRef.current?.destroy() + editorViewRef.current = null + ydoc.destroy() + setRemoteUsers([]) + } + }, [baseUrl, documentEntry, principalLabel]) + + if (!documentId) { + return
No document selected.
+ } + + return ( +
+
+
+ {documentEntry?.title ?? `Markdown document`} +
+
+ + + + {remoteUsers.slice(0, 3).map((user) => { + const color = user.color ?? colorFor(user.name).color + return ( + + + + {user.status ? `${user.name} · ${user.status}` : user.name} + + + ) + })} +
+
+ {status === `error` ? ( +
Document could not be opened.
+ ) : ( + +
+ + )} +
+ ) +} diff --git a/packages/agents-server-ui/src/components/views/NewSessionView.tsx b/packages/agents-server-ui/src/components/views/NewSessionView.tsx index 225cb9351f..acb82306b2 100644 --- a/packages/agents-server-ui/src/components/views/NewSessionView.tsx +++ b/packages/agents-server-ui/src/components/views/NewSessionView.tsx @@ -1,6 +1,7 @@ import { useCallback, useEffect, useMemo, useRef, useState } from 'react' import { ArrowUp, + AudioLines, Check, ChevronDown, ChevronRight, @@ -12,6 +13,7 @@ import { COMPOSER_INPUT_MESSAGE_TYPE } from '@electric-ax/agents-runtime/client' import { nanoid } from 'nanoid' import { useElectricAgents } from '../../lib/ElectricAgentsProvider' import { useWorkspace } from '../../hooks/useWorkspace' +import { useRealtimeAvailability } from '../../hooks/useRealtimeAvailability' import { recentWorkingDirsForRunner } from '../../lib/recentWorkingDirectories' import { isSandboxProfileRemote, @@ -67,6 +69,7 @@ import type { SlashCommandRow, } from '@electric-ax/agents-runtime/client' import type { StandaloneViewProps } from '../../lib/workspace/viewRegistry' +import type { TileViewParams } from '../../lib/workspace/types' /** * The "default agent" — when an entity type with this name is registered @@ -74,6 +77,9 @@ import type { StandaloneViewProps } from '../../lib/workspace/viewRegistry' * so the most common flow is one keystroke away. 
*/ const DEFAULT_AGENT_NAME = `horton` +const REALTIME_AUTOSTART_VIEW_PARAMS: TileViewParams = { realtime: `start` } +const REALTIME_INITIAL_TEXT_VIEW_PARAM = `realtimeInitialText` +const REALTIME_GREET_VIEW_PARAM = `realtimeGreet` const HERO_TITLES = [ `Let’s ship`, @@ -344,7 +350,8 @@ export function NewSessionView({ initialMessage?: unknown, initialMessageType?: string, initialAttachments?: Array, - sandboxProfile?: string | null + sandboxProfile?: string | null, + viewParams?: TileViewParams ): Promise => { if (!spawnEntity) return false setError(null) @@ -402,6 +409,7 @@ export function NewSessionView({ } helpers.openEntity(entityUrl, { target: { tileId, position: `replace` }, + ...(viewParams ? { viewParams } : {}), }) return true } catch (err) { @@ -450,20 +458,15 @@ export function NewSessionView({ return () => setToolbarTitle(null) }, [handleCancelSelected, selected, setToolbarTitle]) - const handleStartDefault = useCallback( - async ( - input: string | ComposerInputPayload, + const prepareDefaultAgentArgs = useCallback( + ( args: Record, - attachments: Array, sandboxProfile: string | null - ): Promise => { - if (!defaultAgent) return false - // Inject the picker's choice into the spawn args for the composer flow - // only — non-default agents have their own schemas and may not - // understand `workingDirectory`. A remote sandbox runs in the provider - // VM, so a host working directory is meaningless there: skip it for - // remote profiles. The spawned session itself becomes the newest - // synced recent for this runner. + ): Record => { + // Inject the picker's choice into the spawn args for the default-agent + // composer only — non-default agents have their own schemas and may not + // understand `workingDirectory`. Remote sandboxes run in provider VMs, so + // host paths are meaningless there. 
const profileIsRemote = isSandboxProfileRemote( allSandboxProfiles, sandboxProfile @@ -472,7 +475,20 @@ export function NewSessionView({ // factory — require a (non-remote) profile or the arg is a no-op. const includeWorkingDir = workingDirectory !== null && sandboxProfile !== null && !profileIsRemote - const augmented = includeWorkingDir ? { ...args, workingDirectory } : args + return includeWorkingDir ? { ...args, workingDirectory } : args + }, + [allSandboxProfiles, workingDirectory] + ) + + const handleStartDefault = useCallback( + async ( + input: string | ComposerInputPayload, + args: Record, + attachments: Array, + sandboxProfile: string | null + ): Promise => { + if (!defaultAgent) return false + const augmented = prepareDefaultAgentArgs(args, sandboxProfile) const hasAttachments = attachments.length > 0 const initialMessage = typeof input === `string` @@ -493,7 +509,35 @@ export function NewSessionView({ sandboxProfile ) }, - [defaultAgent, doSpawn, workingDirectory, allSandboxProfiles] + [defaultAgent, doSpawn, prepareDefaultAgentArgs] + ) + + const handleStartDefaultRealtime = useCallback( + async ( + input: string, + args: Record, + sandboxProfile: string | null + ): Promise => { + if (!defaultAgent) return false + const augmented = prepareDefaultAgentArgs(args, sandboxProfile) + const initialText = input.trim() + const viewParams: TileViewParams = { + ...REALTIME_AUTOSTART_VIEW_PARAMS, + ...(initialText + ? { [REALTIME_INITIAL_TEXT_VIEW_PARAM]: initialText } + : { [REALTIME_GREET_VIEW_PARAM]: `1` }), + } + return await doSpawn( + defaultAgent.name, + augmented, + undefined, + undefined, + undefined, + sandboxProfile, + viewParams + ) + }, + [defaultAgent, doSpawn, prepareDefaultAgentArgs] ) const defaultComposerReady = @@ -531,6 +575,7 @@ export function NewSessionView({ defaultAgentSandboxProfiles={defaultAgent ? 
allSandboxProfiles : []} onSelectType={handleSelectType} onStartDefault={handleStartDefault} + onStartDefaultRealtime={handleStartDefaultRealtime} spawnReady={Boolean(spawnEntity)} defaultComposerReady={defaultComposerReady} error={error} @@ -553,6 +598,7 @@ function Picker({ defaultAgentSandboxProfiles, onSelectType, onStartDefault, + onStartDefaultRealtime, spawnReady, defaultComposerReady, error, @@ -573,6 +619,11 @@ function Picker({ attachments: Array, sandboxProfile: string | null ) => Promise + onStartDefaultRealtime: ( + input: string, + args: Record, + sandboxProfile: string | null + ) => Promise spawnReady: boolean defaultComposerReady: boolean error: string | null @@ -608,6 +659,7 @@ function Picker({ agent={defaultAgent} sandboxProfiles={defaultAgentSandboxProfiles} onSubmit={onStartDefault} + onStartRealtime={onStartDefaultRealtime} disabled={!defaultComposerReady} workingDirectory={workingDirectory} onChangeWorkingDirectory={onChangeWorkingDirectory} @@ -888,6 +940,7 @@ function DefaultAgentComposer({ agent, sandboxProfiles, onSubmit, + onStartRealtime, disabled, workingDirectory, onChangeWorkingDirectory, @@ -904,6 +957,11 @@ function DefaultAgentComposer({ attachments: Array, sandboxProfile: string | null ) => Promise + onStartRealtime: ( + input: string, + args: Record, + sandboxProfile: string | null + ) => Promise disabled?: boolean workingDirectory: string | null onChangeWorkingDirectory: (path: string | null) => void @@ -923,8 +981,13 @@ function DefaultAgentComposer({ [sandboxProfiles, selectedSandboxProfile] ) const [value, setValue] = useState(``) - const [submitting, setSubmitting] = useState(false) + const [submittingMode, setSubmittingMode] = useState< + `message` | `realtime` | null + >(null) + const submitting = submittingMode !== null + const realtimeSubmitting = submittingMode === `realtime` const composerFocusRef = useRef<{ focus: () => void } | null>(null) + const realtimeAvailability = useRealtimeAvailability() const inlineProps = 
useMemo( () => inlineSchemaProperties(agent.creation_schema), [agent.creation_schema] @@ -1004,7 +1067,7 @@ function DefaultAgentComposer({ payload ?? serializeComposerInput(value, slashCommands) const trimmed = nextPayload.source.trim() if ((!trimmed && files.length === 0) || disabled || submitting) return - setSubmitting(true) + setSubmittingMode(`message`) const cleaned: Record = {} for (const [k, v] of Object.entries(args)) { if (v !== undefined && v !== ``) cleaned[k] = v @@ -1023,7 +1086,7 @@ function DefaultAgentComposer({ }) .catch(() => undefined) .finally(() => { - setSubmitting(false) + setSubmittingMode(null) }) }, [ @@ -1040,6 +1103,40 @@ function DefaultAgentComposer({ ] ) + const startRealtime = useCallback(() => { + const files = imageAttachmentsEnabled ? attachments : [] + if (disabled || submitting || files.length > 0) return + if (!realtimeAvailability.canStart) return + const initialText = serializeComposerInput( + value, + slashCommands + ).source.trim() + setSubmittingMode(`realtime`) + const cleaned: Record = {} + for (const [k, v] of Object.entries(args)) { + if (v !== undefined && v !== ``) cleaned[k] = v + } + void onStartRealtime(initialText, cleaned, selectedSandboxProfile) + .then((ok) => { + if (ok) setValue(``) + }) + .catch(() => undefined) + .finally(() => { + setSubmittingMode(null) + }) + }, [ + args, + attachments, + disabled, + imageAttachmentsEnabled, + onStartRealtime, + realtimeAvailability.canStart, + selectedSandboxProfile, + slashCommands, + submitting, + value, + ]) + const attachmentCount = imageAttachmentsEnabled ? attachments.length : 0 const isActive = Boolean( (value.trim() || attachmentCount > 0) && !disabled && !submitting @@ -1048,6 +1145,19 @@ function DefaultAgentComposer({ const sendTooltip = submitting ? `Starting ${agent.name} session` : `Start ${agent.name} session` + const realtimeTooltip = + attachmentCount > 0 + ? `Remove attachments to start voice mode` + : realtimeSubmitting + ? 
`Starting voice session` + : realtimeAvailability.loading + ? `Checking realtime credentials` + : (realtimeAvailability.unavailableReason ?? `Start voice session`) + const realtimeDisabled = + disabled || + submitting || + attachmentCount > 0 || + !realtimeAvailability.canStart return (
{submitting && ( - Starting… + + {realtimeSubmitting ? `Starting voice…` : `Starting…`} + )} + + + + +