Problem
The existing <site> ask commands for chat-app adapters (grok, chatgpt, claude, gemini, deepseek) all wait for the assistant's next visible message bubble and return that text:
- clis/grok/ask.js polls [data-testid="assistant-message"] via getMessageBubbles
- clis/chatgpt/ask.js, clis/claude/ask.js, clis/deepseek/ask.js, clis/gemini/ask.js follow the same DOM-scrape shape
That works, but the assistant's underlying SSE response carries a lot the DOM doesn't:
- thinking trace (reasoning models hide it from the rendered bubble until the user clicks a toggle)
- server-assigned conversationId / responseId / parentResponseId (needed to chain follow-ups deterministically)
- model / modelHash (which exact variant served the answer; useful for grok-3 vs grok-4 routing)
- generatedImageUrls (the bubble only shows <img> tags; the SSE has stable URLs the agent can save)
- title (Grok / DeepSeek / ChatGPT generate a title in the SSE tail)
For agents driving these UIs, missing this metadata means an extra round trip into the DOM or a fragile read call.
Proposal
Add a <site> stream sub-command per chat adapter that:
- Installs a window.fetch interceptor for the site's streaming chat endpoint
  - Grok: POST /rest/app-chat/conversations/new
  - ChatGPT: POST /backend-api/f/conversation
  - Claude: POST /api/organizations/<org>/chat_conversations/<id>/completion
  - Gemini: POST /_/BardChatUi/.../StreamGenerate
  - DeepSeek: POST /api/v0/chat/completion (uses XHR, not fetch, so the hook also patches XMLHttpRequest)
- Drains the response via response.clone().body.getReader() so SSE chunks are captured even while the site's own client SDK is consuming the body
- Reuses the existing sendMessage / ensureOn<Site> / isLoggedIn helpers from utils.js to drive the UI
- Parses the SSE / JSON-lines / wrb.fr / JSON-patch frames per site and returns ONE row with response, thinking, model, conversationId, responseId, title, images
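The hook-and-drain steps in the list above could look roughly like the sketch below. It is only an illustration, not the code in this PR: installFetchCapture and drain are hypothetical names, the target argument stands in for window, and the XHR patch needed for DeepSeek is left out.

```javascript
// Hypothetical helper (not part of opencli): wraps target.fetch so that any
// response whose URL matches `pattern` is also drained into a raw string.
function installFetchCapture(target, pattern) {
  const originalFetch = target.fetch;
  const captures = []; // one promise per matching request, resolving to the raw body

  target.fetch = async function (...args) {
    const response = await originalFetch.call(target, ...args);
    const input = args[0];
    // fetch accepts a string, a URL, or a Request; normalise to a URL string.
    const url = typeof input === "string" ? input : (input && input.url) || String(input);
    if (pattern.test(url) && response.body) {
      // Read from a clone so the page's own consumer still gets the full body.
      captures.push(drain(response.clone()));
    }
    return response;
  };

  return {
    captures,
    restore() {
      target.fetch = originalFetch;
    },
  };
}

// Reads a streamed body chunk by chunk and returns it as one decoded string.
async function drain(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let raw = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    raw += decoder.decode(value, { stream: true });
  }
  return raw + decoder.decode(); // flush any buffered partial character
}
```

In a page context this would be called as installFetchCapture(window, /\/rest\/app-chat\/conversations\/new/) before sendMessage runs, and the first entry of captures awaited once the reply finishes. Because the clone is read independently, the site's client sees an untouched response.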
Existing ask stays untouched — it's the right primitive for "just give me the rendered text"; stream is for callers that want the full envelope.
We aren't using Strategy.INTERCEPT / page.installInterceptor() because the upstream interceptor calls await response.clone().json(), which drops Grok's newline-delimited JSON, DeepSeek's text/event-stream, ChatGPT's JSON-patch SSE, etc. The fetch hook for stream keeps the raw body string and parses per site.
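To illustrate why keeping the raw body string matters, here is a minimal sketch of two generic frame splitters, one for newline-delimited JSON and one for text/event-stream. The function names and the sample payloads are made up for illustration; the real per-site frame shapes (wrb.fr, JSON-patch, and so on) are not reproduced here and would need their own parsers.

```javascript
// Newline-delimited JSON: one JSON document per non-empty line.
// A single JSON.parse over the whole body would throw on the second line.
function parseJsonLines(raw) {
  const frames = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      frames.push(JSON.parse(trimmed));
    } catch {
      // Ignore a truncated or non-JSON line instead of losing the whole body.
    }
  }
  return frames;
}

// text/event-stream: events are separated by a blank line; the `data:` lines
// of one event are joined with "\n". Other fields (event:, id:) are ignored.
function parseSseData(raw) {
  const events = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
      .join("\n");
    if (data) events.push(data);
  }
  return events;
}

// Usage with made-up payloads:
const ndjson = '{"token":"Hel"}\n{"token":"lo"}\n';
console.log(parseJsonLines(ndjson).map((frame) => frame.token).join(""));

const sse = 'event: message\ndata: {"v":1}\n\ndata: [DONE]\n\n';
console.log(parseSseData(sse));
```

Each site parser would then map these frames onto the response / thinking / model / conversationId / responseId / title / images columns.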
Prior art
We've already built and tested this pattern against all five sites in a separate CLI: https://github.com/Daily-AC/webai-cli — single binary that wraps opencli's browser bridge. Each adapter's selectors, endpoint patterns, and SSE shape are documented there.
Plan
- This PR (#TBD): clis/grok/stream.js + clis/grok/stream.test.js only. Smallest reviewable surface; validates the pattern in the existing adapter style.
- Follow-ups (one PR per site): chatgpt, claude, gemini, deepseek. Each parser is non-trivial enough to warrant its own PR.
- Possible refactor later: if all 5 land cleanly, extract a _shared/stream-capture.js helper for the fetch/XHR hook + drain pattern. Skipping that now to keep this PR minimal.
Happy to adjust naming (stream vs ask-stream vs ask --api), the columns shape, or anything else before expanding to the other four sites.