chore(examples): soften 00 determinism wording + clarify 09 loop exit

chris-colinsky · chris-colinsky · commit 4ea1657c1c5b · 2026-05-18T14:11:55.000-07:00
- 00 hello-world: docstring + module comment described
  temperature=0.0 as making the run "reproduce deterministically",
  which over-promises. LLM APIs don't guarantee strict determinism
  even at temp 0 (provider-side batching, GPU numeric nondeterminism,
  model-version drift). Reworded to "reduce sampling variance" and
  "as reproducible as the API allows" so the pedagogical point
  (RuntimeConfig is the tuning knob) lands without an inaccurate
  guarantee. ``_DETERMINISTIC`` variable name kept as a recognizable
  shorthand for the demo.
- 09 tool-use: docstring said the loop terminates when
  ``finish_reason="stop"``, but the route function actually checks
  whether the last AssistantMessage carries any ``tool_calls``.
  finish_reason isn't tracked in state. Reworded to match the
  implementation: "loop terminates when the assistant message has
  no tool_calls (the model is done requesting tools) or after a
  hard turn cap."
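
For reference, the exit condition described in the 09 item can be
sketched roughly as below. This is an illustrative stand-in, not the
code in examples/09-tool-use/main.py: ``AgentState``, the
``AssistantMessage`` dataclass, ``MAX_TURNS``, and the route labels are
all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field

MAX_TURNS = 8  # hypothetical hard turn cap; the real example may use another value


@dataclass
class AssistantMessage:
    """Stand-in for the library's assistant message type (assumed shape)."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


@dataclass
class AgentState:
    """Stand-in state: conversation history plus a turn counter."""
    messages: list = field(default_factory=list)
    turns: int = 0


def route(state: AgentState) -> str:
    """Pick the next step from the last assistant message.

    finish_reason is not consulted: the only signals are whether the
    last assistant message carries tool_calls and the turn counter.
    """
    if state.turns >= MAX_TURNS:
        return "end"  # hard cap reached, stop even if tools were requested
    last = state.messages[-1] if state.messages else None
    if isinstance(last, AssistantMessage) and last.tool_calls:
        return "dispatch_tools"  # model asked for tools, keep looping
    return "end"  # no tool_calls: the model is done requesting tools
```

Checking ``tool_calls`` directly means the exit decision needs nothing
beyond the message history already held in state.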
diff --git a/examples/00-hello-world/main.py b/examples/00-hello-world/main.py
@@ -13,8 +13,10 @@
     instance on ``Response.parsed``.
   - JSON Schema dict (``research``): raw dict on ``Response.parsed``.
 - ``RuntimeConfig`` for per-call sampling knobs — every ``complete()``
-  here passes ``config=RuntimeConfig(temperature=0.0)`` so the run
-  reproduces deterministically.
+  here passes ``config=RuntimeConfig(temperature=0.0)`` to reduce
+  sampling variance across runs. Temperature 0 isn't a strict
+  determinism guarantee (providers vary at the infra level) but it's
+  the standard tuning knob for "as reproducible as the API allows."
 - Conditional routing on a parsed field (``route`` reads
   ``state.classification.intent``).
 - ``attach_observer`` for boundary visibility.
@@ -87,10 +89,11 @@ class PipelineState(State):
 # builders, IDE inspection) import this module without running main().
 _provider_instance: OpenAIProvider | None = None
 
-# Per-call sampling knobs. The demo locks the model at temperature 0
-# so the routing classification (and the rest of the run) reproduces
-# across invocations — useful for tutorial output, less appropriate
-# for production where some sampling variety is desirable.
+# Per-call sampling knobs. The demo sets temperature 0 to reduce
+# variance across invocations — the run is "as reproducible as the
+# API allows" but not strictly deterministic (providers vary at the
+# infra level even at temp 0). Useful for tutorial output; production
+# usually wants some sampling variety.
 # RuntimeConfig also surfaces max_tokens, top_p, and seed; only
 # temperature is set here so the others fall through to provider
 # defaults.
diff --git a/examples/09-tool-use/main.py b/examples/09-tool-use/main.py
@@ -11,9 +11,9 @@
 
 The agent loops: send messages + tools to the model, dispatch any
 ``tool_calls`` the model emits, feed the results back as
-``ToolMessage`` entries, and call the model again. Loop terminates when
-the model returns content with ``finish_reason="stop"`` (or after a
-hard turn cap).
+``ToolMessage`` entries, and call the model again. Loop terminates
+when the assistant message has no ``tool_calls`` (the model is done
+requesting tools) or after a hard turn cap.
 
 **What's interesting in the implementation:**