How does Claude Code transform the raw API stream into a unified runtime event model?
l2-agent-loop, l8-streaming
services/api/claude.ts, query.ts
queryModelWithStreaming, api_request_sent, first_chunk, stream_request_start, message_start
- how request-sent, headers-received, and first-chunk timings are measured
- why query.ts emits stream_request_start before the stream is consumed
- why watchdogs, stall detection, and fallback timeouts live close to the API layer
- services/api/claude.ts:752: export async function* queryModelWithStreaming
- services/api/claude.ts:1805-1807: api_request_sent
- services/api/claude.ts:1971-1973: first_chunk
- query.ts:337: stream_request_start
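The request-sent, headers-received, and first-chunk timings can be pictured as marks on a single trace. What follows is a minimal sketch under stated assumptions, not the code in services/api/claude.ts: LatencyTrace, tracedStream, and the open callback are hypothetical names, headers_received is an assumed label, and only api_request_sent and first_chunk are taken from the source anchors.

```typescript
// Hypothetical sketch: checkpoint labels echo the text, the code is illustrative.
type Checkpoint = "api_request_sent" | "headers_received" | "first_chunk";

class LatencyTrace {
  private marks = new Map<Checkpoint, number>();
  private now: () => number;

  // The clock is injectable so the trace can be exercised with a fake clock.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Record a checkpoint once. Repeat calls are ignored, so "first_chunk"
  // keeps the time of the first chunk only.
  mark(name: Checkpoint): void {
    if (!this.marks.has(name)) this.marks.set(name, this.now());
  }

  // Elapsed clock units between two checkpoints, or undefined if one is missing.
  between(from: Checkpoint, to: Checkpoint): number | undefined {
    const a = this.marks.get(from);
    const b = this.marks.get(to);
    return a === undefined || b === undefined ? undefined : b - a;
  }
}

// Wrap a stream so the checkpoints are recorded as a side effect of consuming it.
// Assumption: `open` resolves once response headers have arrived.
async function* tracedStream<T>(
  open: () => Promise<AsyncIterable<T>>,
  trace: LatencyTrace,
): AsyncGenerator<T> {
  trace.mark("api_request_sent"); // just before the request is issued
  const stream = await open();
  trace.mark("headers_received");
  for await (const chunk of stream) {
    trace.mark("first_chunk"); // no-op after the first chunk
    yield chunk;
  }
}
```

With a trace like this, time to first chunk is trace.between("api_request_sent", "first_chunk"). Splitting it at headers_received helps separate connection latency from time the model spends before emitting its first chunk.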
Claude Code streaming is not just text chunking. It is a runtime model that unifies network state, event boundaries, recovery behavior, and latency measurement into one consumable stream. The lesson here is not “how to use the Anthropic SDK,” but how to define event semantics cleanly.
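One way to picture this unification is a single discriminated union that carries lifecycle, content, and failure events, fed by a loop that races every read against an idle timer. This is a sketch under stated assumptions rather than Claude Code's implementation: RawChunk, RuntimeEvent, withIdleTimeout, and toRuntimeEvents are hypothetical names, text_delta and stream_error are assumed event variants, and only stream_request_start and message_start come from the text. For brevity it collapses the query.ts side and the API-layer side into one function.

```typescript
// Hypothetical raw chunk shape standing in for what the SDK stream yields.
type RawChunk = { kind: "start" } | { kind: "text"; text: string };

// One consumable event type covering lifecycle, content, and failures.
type RuntimeEvent =
  | { type: "stream_request_start" }
  | { type: "message_start" }
  | { type: "text_delta"; text: string }
  | { type: "stream_error"; reason: "idle_timeout" | "transport"; detail: string };

// Resolve with "idle" if `p` has not settled within `ms` milliseconds.
function withIdleTimeout<T>(p: Promise<T>, ms: number): Promise<T | "idle"> {
  return new Promise<T | "idle">((resolve, reject) => {
    const timer = setTimeout(() => resolve("idle"), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function* toRuntimeEvents(
  raw: AsyncIterable<RawChunk>,
  idleMs: number,
): AsyncGenerator<RuntimeEvent> {
  // Emitted before the first read, so consumers see the request begin
  // even if the stream never produces a chunk.
  yield { type: "stream_request_start" };

  const it = raw[Symbol.asyncIterator]();
  while (true) {
    let next: IteratorResult<RawChunk> | "idle";
    try {
      next = await withIdleTimeout(it.next(), idleMs);
    } catch (err) {
      // A dead stream becomes a typed event, not an untyped exception.
      yield { type: "stream_error", reason: "transport", detail: String(err) };
      return;
    }
    if (next === "idle") {
      yield { type: "stream_error", reason: "idle_timeout", detail: `no chunk for ${idleMs}ms` };
      return;
    }
    if (next.done) return;

    const chunk = next.value;
    if (chunk.kind === "start") {
      yield { type: "message_start" };
    } else {
      yield { type: "text_delta", text: chunk.text };
    }
  }
}
```

Because a stall and a transport failure arrive as distinct stream_error reasons, a consumer can choose between retrying, falling back, or surfacing the failure without parsing exception messages.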
- Why does services/api/claude.ts care about stalls and idle timeouts?
- What is the most important contract between query.ts and the API layer?
- If the underlying stream dies, why is a generic exception not enough?