Add stream command for chat-app adapters that surfaces the full SSE response (grok first) #1847

@Daily-AC

Problem

The existing <site> ask commands for chat-app adapters (grok, chatgpt, claude, gemini, deepseek) all wait for the assistant's next visible message bubble and return that text:

  • clis/grok/ask.js polls [data-testid="assistant-message"] via getMessageBubbles
  • clis/chatgpt/ask.js, clis/claude/ask.js, clis/deepseek/ask.js, clis/gemini/ask.js — same DOM-scrape shape

That works, but the assistant's underlying SSE response carries fields the rendered DOM does not expose:

  • thinking trace (reasoning models hide it from the rendered bubble until the user clicks a toggle)
  • server-assigned conversationId/responseId/parentResponseId (needed to chain follow-ups deterministically)
  • model / modelHash (which exact variant served the answer — useful for grok-3 vs grok-4 routing)
  • generatedImageUrls (the bubble only shows <img> tags; the SSE has stable URLs the agent can save)
  • title (Grok / DeepSeek / ChatGPT generate a title in the SSE tail)

For agents driving these UIs, missing this metadata means an extra round trip into the DOM or a fragile read call.

Proposal

Add a <site> stream sub-command per chat adapter that:

  1. Installs a window.fetch interceptor for the site's streaming chat endpoint
    • Grok: POST /rest/app-chat/conversations/new
    • ChatGPT: POST /backend-api/f/conversation
    • Claude: POST /api/organizations/<org>/chat_conversations/<id>/completion
    • Gemini: POST /_/BardChatUi/.../StreamGenerate
    • DeepSeek: POST /api/v0/chat/completion (uses XHR, not fetch, so the hook also patches XMLHttpRequest)
  2. Drains the response via response.clone().body.getReader() so SSE chunks are captured even while the site's own client SDK is consuming the body
  3. Reuses the existing sendMessage / ensureOn<Site> / isLoggedIn helpers from utils.js to drive the UI
  4. Parses the SSE/JSON-lines/wrb.fr/JSON-patch frames per site and returns ONE row with response, thinking, model, conversationId, responseId, title, images
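
Steps 1 and 2 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: installStreamCapture, the captures array, and restore are hypothetical names, and the XMLHttpRequest path that DeepSeek needs is left out.

```javascript
// Hypothetical helper: the name and return shape are illustrative only.
// Patches target.fetch (normally window) and records the raw body of any
// response whose URL matches urlPattern, without disturbing the page.
function installStreamCapture(target, urlPattern) {
  const originalFetch = target.fetch;
  const captures = [];

  target.fetch = async function patchedFetch(input, init) {
    const response = await originalFetch.call(target, input, init);
    // input may be a string, a URL, or a Request.
    const url = typeof input === 'string' ? input : (input.url ?? String(input));
    if (!urlPattern.test(url) || !response.body) return response;

    // Read from a clone so the site's own client can still consume
    // the original body.
    const reader = response.clone().body.getReader();
    const decoder = new TextDecoder();
    const capture = { url, text: '', done: null };

    capture.done = (async () => {
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        capture.text += decoder.decode(value, { stream: true });
      }
      capture.text += decoder.decode(); // flush any buffered bytes
      return capture.text;
    })();

    captures.push(capture);
    return response;
  };

  return {
    captures,
    restore() {
      target.fetch = originalFetch;
    },
  };
}
```

A caller would install the hook before sending the message, await captures[0].done once the UI has finished, then call restore(). Keeping the raw text rather than a parsed object leaves frame parsing to each site's adapter.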

Existing ask stays untouched — it's the right primitive for "just give me the rendered text"; stream is for callers that want the full envelope.

stream does not use Strategy.INTERCEPT / page.installInterceptor() because the upstream interceptor calls await response.clone().json(), which only handles a body that is a single JSON document. Grok's newline-delimited JSON, DeepSeek's text/event-stream, ChatGPT's JSON-patch SSE, and the other streaming formats are dropped. The fetch hook for stream instead keeps the raw body string and parses it per site.
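
To illustrate why the raw body matters, here is a small sketch of line-by-line parsing. The helper name parseFrames, the token field, and the sample payload are made up for the example, and the [DONE] sentinel is an assumption borrowed from common SSE APIs; real frame shapes differ per site.

```javascript
// Hypothetical helper: parse a raw streamed body one line at a time.
// Handles bare JSON lines and SSE "data:" lines.
function parseFrames(rawBody) {
  const frames = [];
  for (const line of rawBody.split(/\r?\n/)) {
    let text = line.trim();
    if (text.startsWith('data:')) text = text.slice(5).trim(); // SSE data field
    if (!text || text === '[DONE]') continue; // blank line or assumed sentinel
    try {
      frames.push(JSON.parse(text));
    } catch {
      // Not JSON (for example an SSE "event:" line); skip it.
    }
  }
  return frames;
}

// Made-up newline-delimited payload; "token" is an illustrative field name.
const ndjson = '{"token":"Hel"}\n{"token":"lo"}\n';

// Parsing the whole body as one JSON document fails, which is the same
// failure a single .json() call would hit on this body.
let wholeBodyParses = true;
try {
  JSON.parse(ndjson);
} catch {
  wholeBodyParses = false;
}

// Parsing per line recovers every frame.
const frames = parseFrames(ndjson);
const responseText = frames.map((f) => f.token ?? '').join('');
```

Each site's parser would replace the token join with its own logic for assembling response, thinking, and the remaining columns.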

Prior art

We've already built and tested this pattern against all five sites in a separate CLI: https://github.com/Daily-AC/webai-cli, a single binary that wraps opencli's browser bridge. Each adapter's selectors, endpoint patterns, and SSE shape are documented there.

Plan

  • This PR (#TBD): clis/grok/stream.js + clis/grok/stream.test.js only. Smallest reviewable surface; validates the pattern in the existing adapter style.
  • Follow-ups (one PR per site): chatgpt, claude, gemini, deepseek. Each parser is non-trivial enough to warrant its own PR.
  • Possible refactor later: if all five land cleanly, extract a _shared/stream-capture.js helper for the fetch/XHR hook + drain pattern. Skipping that now to keep this PR minimal.

Happy to adjust naming (stream vs ask-stream vs ask --api), the columns shape, or anything else before expanding to the other four sites.
