chore(weave): stop verifiers test from hitting live OpenAI #7003

Open
gtarpenning wants to merge 7 commits into master from gtarpenning/fix-verifiers-vcr

@gtarpenning
Member

@gtarpenning gtarpenning commented May 28, 2026

WB-34922

Summary

  • test_verifiers_environment_evaluate_with_mock_env mocked only the verifiers Environment loop, not the model call, so env.evaluate(...) made a live OpenAI request. In CI this returns a 401 (not_authorized_invalid_key_type: the org rejects user keys), and even locally the test depended on a real key.
  • Add a @pytest.mark.vcr marker and a committed cassette so the single chat.completions.create call is replayed, matching the rest of tests/integrations/. VCR intercepts at the httpx layer, so the 43-call trace structure is unchanged.
  • allow_playback_repeats=True and a dummy API key keep the test hermetic regardless of how many completion calls the rollout makes.
  • Also pin project="wandb-qa" in test_langchain_google_vertexai_usage: ChatVertexAI resolves the project from GOOGLE_CLOUD_PROJECT (which the runner sets to wandb-production) before the patched google.auth.default, so VCR couldn't match the wandb-qa cassette path.
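The effect of allow_playback_repeats in the third bullet can be illustrated with a toy model. This is a sketch, not vcrpy's implementation: ToyCassette, play, and UnmatchedRequest are hypothetical names, and the recorded body is a placeholder. It assumes only the behaviour the summary relies on: without repeats, a recording is consumed by its first match, so a second identical request has nothing left to replay.

```python
from dataclasses import dataclass, field


class UnmatchedRequest(Exception):
    """Hypothetical error: no recorded response is available for a request."""


@dataclass
class ToyCassette:
    # Maps a request key (method, URL) to its single recorded response body.
    recorded: dict
    allow_playback_repeats: bool = False
    _played: set = field(default_factory=set)

    def play(self, method: str, url: str) -> str:
        key = (method, url)
        if key not in self.recorded:
            raise UnmatchedRequest(f"no recording for {method} {url}")
        if key in self._played and not self.allow_playback_repeats:
            # Each recording is consumed once unless repeats are allowed.
            raise UnmatchedRequest(f"recording for {method} {url} already played")
        self._played.add(key)
        return self.recorded[key]


URL = "https://api.openai.com/v1/chat/completions"
recording = {("POST", URL): '{"choices": []}'}  # placeholder body

strict = ToyCassette(recording)
strict.play("POST", URL)  # first call replays the recording
try:
    strict.play("POST", URL)  # second identical call finds nothing to replay
    second_call_ok = True
except UnmatchedRequest:
    second_call_ok = False

lenient = ToyCassette(recording, allow_playback_repeats=True)
# The same single recording serves every call the rollout makes.
replies = [lenient.play("POST", URL) for _ in range(3)]
```

In this model the strict cassette fails on the second call while the lenient one serves all three, which is why one recorded completion is enough however many calls the rollout issues.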

Testing

Ran locally against verifiers==0.1.3.post0 (CI's pinned version) on the SQLite trace server: the test passes in 4s and the cassette replays the model call with no network access (dummy key). The Vertex AI test reproduced the CI failure with GOOGLE_CLOUD_PROJECT=wandb-production and passes with the project pin.

test_verifiers_environment_evaluate_with_mock_env only mocked the
verifiers Environment loop, not the model call, so env.evaluate made a
live OpenAI request. In CI this 401s (org rejects user keys). Add a VCR
cassette + marker so the model call is replayed and the test is hermetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@gtarpenning gtarpenning marked this pull request as ready for review May 29, 2026 00:02
@gtarpenning gtarpenning requested a review from a team as a code owner May 29, 2026 00:02
gtarpenning and others added 6 commits May 29, 2026 09:05
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The cassette records project wandb-qa, but ChatVertexAI resolves project
from GOOGLE_CLOUD_PROJECT before the patched google.auth.default. The CI
runner sets it to wandb-production, so VCR (record mode none) can't match
the cassette path. Pin project explicitly so the request is deterministic.
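The mismatch described in this commit message can be sketched as a simplified model. This is not ChatVertexAI's actual code: resolve_project and request_path are hypothetical helpers, the path shape, location, and model name are illustrative assumptions, and the precedence (explicit argument, then GOOGLE_CLOUD_PROJECT, then the auth default) is inferred from the description above.

```python
def resolve_project(explicit, env, credentials_default):
    """Assumed precedence: explicit argument, then env var, then auth default."""
    return explicit or env.get("GOOGLE_CLOUD_PROJECT") or credentials_default


def request_path(project, location="us-central1", model="example-model"):
    # Illustrative path shape; the point is that the project is part of the path.
    return (
        f"/v1/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )


recorded_path = request_path("wandb-qa")  # what the cassette holds

ci_env = {"GOOGLE_CLOUD_PROJECT": "wandb-production"}  # set by the CI runner
patched_auth_default = "wandb-qa"  # what the patched google.auth.default yields

# Without a pin, the env var wins over the patched auth default.
unpinned = request_path(resolve_project(None, ci_env, patched_auth_default))

# With project="wandb-qa" pinned, the env var is never consulted.
pinned = request_path(resolve_project("wandb-qa", ci_env, patched_auth_default))

path_matches_without_pin = unpinned == recorded_path
path_matches_with_pin = pinned == recorded_path
```

Under these assumptions the unpinned request path contains wandb-production and cannot match the recording, while the pinned request matches regardless of the runner's environment.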

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>