chore(weave): stop verifiers test from hitting live OpenAI #7003
Open
gtarpenning wants to merge 7 commits into
test_verifiers_environment_evaluate_with_mock_env only mocked the verifiers Environment loop, not the model call, so env.evaluate made a live OpenAI request. In CI this 401s (org rejects user keys). Add a VCR cassette + marker so the model call is replayed and the test is hermetic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
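The replay behaviour this commit relies on can be modelled in a few lines of plain Python. This is a simplified, hypothetical sketch of what a VCR cassette does in record mode `none`, not vcrpy's implementation: `Cassette`, `play` and `UnmatchedRequest` are illustrative names, and request matching is reduced to method plus URL. It shows why an unmatched request fails instead of reaching the network, and why `allow_playback_repeats=True` lets one recorded completion serve several model calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class UnmatchedRequest(Exception):
    """Hypothetical error: the request has no usable recorded interaction."""


@dataclass
class Cassette:
    # Recorded ((method, url), response_body) pairs, in recording order.
    interactions: list[tuple[tuple[str, str], str]]
    # Mirrors the intent of VCR's allow_playback_repeats option.
    allow_playback_repeats: bool = False
    # Indices of interactions that have already been replayed.
    played: set[int] = field(default_factory=set, init=False)

    def play(self, method: str, url: str) -> str:
        """Return a recorded response body; never fall through to the network."""
        key = (method.upper(), url)
        for index, (recorded_key, body) in enumerate(self.interactions):
            if recorded_key != key:
                continue
            if index in self.played and not self.allow_playback_repeats:
                continue  # by default each recorded interaction is used once
            self.played.add(index)
            return body
        # Record mode "none": an unmatched request is an error, not a live call.
        raise UnmatchedRequest(f"no recorded interaction for {method} {url}")


# Usage: one recorded completion, replayed for a rollout that calls the model 3 times.
URL = "https://api.openai.com/v1/chat/completions"
cassette = Cassette([(("POST", URL), '{"choices": []}')], allow_playback_repeats=True)
responses = [cassette.play("POST", URL) for _ in range(3)]
```

Without `allow_playback_repeats`, the second identical call would raise, which is the failure mode the flag guards against when the rollout makes more than one completion call.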
Codecov Report: ✅ All modified and coverable lines are covered by tests.
mscavezze-cw approved these changes on May 29, 2026
The cassette records project wandb-qa, but ChatVertexAI resolves project from GOOGLE_CLOUD_PROJECT before the patched google.auth.default. The CI runner sets it to wandb-production, so VCR (record mode none) can't match the cassette path. Pin project explicitly so the request is deterministic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
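The mismatch can be reproduced without Google libraries by modelling the lookup order the commit describes. This is a hypothetical sketch: `resolve_project`, `request_path` and `patched_auth_default` are illustrative names, the precedence (explicit `project`, then `GOOGLE_CLOUD_PROJECT`, then `google.auth.default()`) follows the commit message rather than `ChatVertexAI`'s source, the REST path shape is assumed, and `us-central1` / `gemini-pro` are placeholder values. It shows that the runner's environment variable changes the request path VCR matches on, while an explicit project pin restores the recorded path.

```python
from typing import Callable, Mapping, Optional, Tuple

# Shape of google.auth.default(): returns (credentials, project_id).
AuthDefault = Callable[[], Tuple[object, Optional[str]]]


def resolve_project(
    explicit: Optional[str], env: Mapping[str, str], auth_default: AuthDefault
) -> Optional[str]:
    """Hypothetical resolver following the order described in the commit."""
    if explicit:  # 1. project=... passed to the model wins
        return explicit
    if env.get("GOOGLE_CLOUD_PROJECT"):  # 2. then the environment variable
        return env["GOOGLE_CLOUD_PROJECT"]
    _credentials, project = auth_default()  # 3. only then google.auth.default()
    return project


def request_path(project: str, location: str, model: str) -> str:
    """Assumed Vertex AI REST path; the project is part of what VCR matches."""
    return (
        f"/v1/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )


def patched_auth_default() -> Tuple[object, Optional[str]]:
    return object(), "wandb-qa"  # assumed value returned by the test's patch


ci_env = {"GOOGLE_CLOUD_PROJECT": "wandb-production"}
recorded = request_path("wandb-qa", "us-central1", "gemini-pro")

unpinned = resolve_project(None, ci_env, patched_auth_default)
pinned = resolve_project("wandb-qa", ci_env, patched_auth_default)
```

Because the patched `google.auth.default` is only consulted last, patching it alone cannot override the runner's environment; pinning `project` moves the decision to step 1, where the environment no longer matters.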
WB-34922
Summary
- `test_verifiers_environment_evaluate_with_mock_env` only mocked the verifiers `Environment` loop, not the model call, so `env.evaluate(...)` made a live OpenAI request. In CI this returns a 401 (`not_authorized_invalid_key_type`: the org rejects user keys), and even locally the test depended on a real key.
- Add a `@pytest.mark.vcr` marker and a committed cassette so the single `chat.completions.create` call is replayed, matching the rest of `tests/integrations/`. VCR intercepts at the httpx layer, so the 43-call trace structure is unchanged.
- `allow_playback_repeats=True` plus a dummy API key keep the test hermetic regardless of how many completion calls the rollout makes.
- Pin `project="wandb-qa"` in `test_langchain_google_vertexai_usage`: `ChatVertexAI` resolves the project from `GOOGLE_CLOUD_PROJECT` (the runner sets `wandb-production`) before consulting the patched `google.auth.default`, so VCR couldn't match the `wandb-qa` cassette path.

Testing
- Ran locally against `verifiers==0.1.3.post0` (CI's pinned version) on the sqlite trace server: the test passes in 4s, the cassette replays the model call, and no network is used (dummy key).
- The vertexai test reproduced the CI failure with `GOOGLE_CLOUD_PROJECT=wandb-production` and passes with the project pin.
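One way to back the "no network" observation with an assertion is to make any real socket connection fail while the test body runs. This is a hypothetical stdlib sketch, not part of this PR: `block_network` and `NetworkAccessBlocked` are illustrative names, and it assumes the HTTP client ultimately calls `socket.socket.connect` (true for blocking sockets; other transports may bypass it). A VCR-replayed request never reaches the socket layer, so it would pass under this guard.

```python
import socket
from contextlib import contextmanager
from typing import Iterator
from unittest import mock


class NetworkAccessBlocked(RuntimeError):
    """Raised when code under test tries to open a real connection."""


@contextmanager
def block_network() -> Iterator[None]:
    """Fail loudly on any socket connect while the block is active."""

    def refuse(self: socket.socket, address: object) -> None:
        raise NetworkAccessBlocked(f"unexpected connection to {address!r}")

    # Patch both connect variants; mock restores the originals on exit.
    with mock.patch.object(socket.socket, "connect", refuse), mock.patch.object(
        socket.socket, "connect_ex", refuse
    ):
        yield


# Usage: a direct connection attempt inside the guard is rejected.
blocked = False
with block_network():
    sock = socket.socket()
    try:
        sock.connect(("127.0.0.1", 9))
    except NetworkAccessBlocked:
        blocked = True
    finally:
        sock.close()
```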