Add lightweight turn-level observability to the example LLM server
harness. The C++ worker now reports warm-resume accounting,
prefill/decode duration, total worker time, and token rates in the
terminal JSONL done message. The Python WorkerClient and SessionRuntime
propagate those fields, and ServingChat emits one structured INFO log
line per completed turn.
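
As a rough illustration of the Python side, the sketch below parses a terminal JSONL done message and formats one structured log line for the turn. The field names (`reused_tokens`, `prefill_ms`, and so on), the `"type": "done"` discriminator, and the helper names `parse_done_message` and `format_turn_log` are all assumptions made for this example; they are not taken from the actual worker protocol or from WorkerClient, SessionRuntime, or ServingChat.

```python
import json
import logging

logger = logging.getLogger("serving_chat")

# Hypothetical field names; the real keys in the worker's done message may differ.
TURN_FIELDS = (
    "reused_tokens",   # tokens served from the warm-resumed state
    "prefill_tokens",
    "decode_tokens",
    "prefill_ms",
    "decode_ms",
    "total_ms",
)


def parse_done_message(line):
    """Return turn metrics from a JSONL done message, or None for other messages."""
    msg = json.loads(line)
    if msg.get("type") != "done":
        return None
    # Copy only the known metric fields that are present.
    metrics = {key: msg[key] for key in TURN_FIELDS if key in msg}
    # Fallback for this sketch only: derive a decode rate when the
    # message carries a non-zero decode duration.
    decode_ms = metrics.get("decode_ms")
    if decode_ms:
        rate = metrics.get("decode_tokens", 0) * 1000.0 / decode_ms
        metrics["decode_tok_per_s"] = round(rate, 2)
    return metrics


def format_turn_log(metrics):
    """Render metrics as a single key=value line with keys in sorted order."""
    pairs = " ".join(f"{key}={value}" for key, value in sorted(metrics.items()))
    return "turn_done " + pairs


def log_turn(line):
    """Emit one INFO line for a done message; ignore everything else."""
    metrics = parse_done_message(line)
    if metrics is not None:
        logger.info(format_turn_log(metrics))
```

Because the metrics only reach the log line, the chat completion response body is left untouched in this sketch, which matches the stated goal of keeping the OpenAI-compatible payload unchanged.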
This is intended for local-agent demos and evals where we need to
understand whether warm resume is firing and where time is spent,
without changing the OpenAI-compatible response payload.