Skip to content

Latest commit

 

History

History
34 lines (26 loc) · 1.47 KB

File metadata and controls

34 lines (26 loc) · 1.47 KB

Simulation examples

Worked YAML files demonstrating evalview simulate — hermetic runs against declared mocks (or recorded cassettes) for tool calls, LLM responses, and HTTP.

File What it shows
flight-booking.yaml Declarative mocks: block with subset param matching, a deliberate error-path mock, and expected-behavior assertions that fail if the agent picks the wrong flight.
order-lookup-replay.yaml + cassettes/order-lookup-replay.json The cassette workflow: no mocks: block in the YAML, hermetic replay served entirely from a pre-recorded JSON cassette.

Cassette walkthrough

Cassettes capture real tool calls once and replay them deterministically forever — no live services in CI.

# Replay hermetically using the bundled cassette:
evalview simulate examples/simulation/order-lookup-replay.yaml \
  --replay --cassette-dir examples/simulation/cassettes

# Re-record after the real backend changes (requires API keys):
evalview simulate examples/simulation/order-lookup-replay.yaml \
  --record --cassette-dir examples/simulation/cassettes

The cassette format is documented in docs/SIMULATE.md. Per-tool sequential matching keeps replay robust even when the agent reorders calls between runs; declarative mocks: (if present) take precedence so a single recording can be overridden without re-recording the whole run.

See docs/SIMULATE.md for the full YAML schema and CLI reference.