feat: add per-model Vertex AI region support for judge panel#240
Conversation
litellm.drop_params=True (set by DeepEval) silently strips vertex_project and vertex_location from completion kwargs. This adds a _vertex_override context manager to litellm_patch.py that intercepts these params and temporarily sets them as litellm module-level attributes, which litellm checks as a fallback in its vertex_ai handler. Thread-safe and a no-op when no vertex params are present. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughAdds per-request Vertex AI region overrides to LiteLLM completion wrappers: new sync/async context managers and a shared lock apply ChangesVertex AI per-model region support in LiteLLM wrapper
🎯 4 (Complex) | ⏱️ ~40 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 118-140: The current _vertex_override context manager skips
acquiring litellm_state_lock when no per-request Vertex params are present,
allowing a race where unmodified requests can read
litellm.vertex_project/litellm.vertex_location while another thread is mutating
them; always acquire the lock before reading or mutating these globals (i.e.,
move the lock acquisition to cover the pop/getattr checks and the set/restore
sequence in _vertex_override). Also stop using a blocking threading.Lock for
async usage: introduce an async-safe lock (e.g., an asyncio.Lock named
litellm_state_async_lock) and ensure any async code path uses an async context
manager that acquires this asyncio.Lock (and never holds either lock across an
await); keep separate sync (threading.Lock) and async locks and make code paths
use the appropriate one so no blocking occurs on the event loop and all accesses
to litellm.vertex_project and litellm.vertex_location are consistently
protected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d43b6f27-7dbc-4527-a00b-11f24af79baa
📒 Files selected for processing (3)
config/system.yamlsrc/lightspeed_evaluation/core/llm/litellm_patch.pytests/unit/core/llm/test_litellm_patch.py
The sync context manager now always acquires litellm_state_lock, even when no vertex params are present so that non-vertex requests cannot read litellm.vertex_project/vertex_location while another thread is mutating them. An async variant (_vertex_override_async) backed by a new asyncio.Lock replaces the threading.Lock usage in the acompletion path, preventing the event loop from blocking during concurrent calls.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 109-110: Replace the two module locks with a single
threading.Lock: remove litellm_state_async_lock and keep a single
litellm_state_lock (threading.Lock) to guard access to litellm.vertex_project
and litellm.vertex_location; update _vertex_override_async so it does not await
an asyncio.Lock but instead uses asyncio.to_thread to run a helper that acquires
litellm_state_lock, mutates the vertex fields, and releases the lock (so the
event loop is never bound to an asyncio.Lock); ensure all sync code that
currently uses litellm_state_lock continues to use that same lock and add a test
that exercises interleaved sync/async completions to verify no races.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 76851658-bcac-40ff-8f1b-4ff7776634d0
📒 Files selected for processing (2)
src/lightspeed_evaluation/core/llm/litellm_patch.pytests/unit/core/llm/test_litellm_patch.py
Replace the dual-lock scheme (threading.Lock + asyncio.Lock) with a single threading.Lock shared by both sync and async code paths. _vertex_override_async now uses asyncio.to_thread to acquire the lock in a thread-pool worker, avoiding event-loop blocking without needing a separate asyncio.Lock.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unit/core/llm/test_litellm_patch.py (1)
93-94: ⚡ Quick winAlso assert lock release, not only acquisition.
Line 93 and Line 179/193 only verify
__enter__; this can miss regressions where the lock is acquired but not released.Proposed fix
- mock_lock.__enter__.assert_called_once() + mock_lock.__enter__.assert_called_once() + mock_lock.__exit__.assert_called_once() ... - assert mock_lock.__enter__.call_count >= 1 + assert mock_lock.__enter__.call_count >= 1 + assert mock_lock.__exit__.call_count >= 1 + assert mock_lock.__enter__.call_count == mock_lock.__exit__.call_count ... - mock_lock.__enter__.assert_called_once() + mock_lock.__enter__.assert_called_once() + mock_lock.__exit__.assert_called_once()Also applies to: 179-193
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/core/llm/test_litellm_patch.py` around lines 93 - 94, The tests currently only assert lock acquisition via mock_lock.__enter__.assert_called_once(); update the assertions to also verify the lock is released by asserting mock_lock.__exit__ was called once (and optionally with the expected exception args (None, None, None) where appropriate) in the same test(s) that reference mock_lock (e.g., the assertions near mock_lock.__enter__ at the locations around lines 93 and 179/193); ensure you add the corresponding mock_lock.__exit__ assertion after the code under test so the test fails if the lock is not released.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 146-184: The async context manager _vertex_override_async
currently acquires litellm_state_lock only inside the helper _apply and releases
it inside _restore, which lets the lock be dropped during the yield and causes
races; change the flow to acquire the lock (via await
asyncio.to_thread(litellm_state_lock.acquire)) before mutating
litellm.vertex_project / litellm.vertex_location, keep the lock held across the
yield, and then in the finally block release the lock (via await
asyncio.to_thread(litellm_state_lock.release)) after restoring the old values;
use the existing local variables for old_vp/old_vl (from _apply) and drop the
internal with-blocks so the lock lifetime spans the yield in
_vertex_override_async.
In `@tests/unit/core/llm/test_litellm_patch.py`:
- Around line 336-341: The test awaiting asyncio.gather(...) can hang
indefinitely; wrap the gather call in a timeout (e.g., using asyncio.wait_for or
asyncio.timeout) so the call to asyncio.gather(run_sync(...), run_async(...)) is
bounded (add a reasonable seconds constant) and let the test fail if the timeout
is reached; update the test that calls run_sync and run_async (the
asyncio.gather block) to use asyncio.wait_for(..., timeout=...) and assert/allow
the timeout to surface as a test failure.
---
Nitpick comments:
In `@tests/unit/core/llm/test_litellm_patch.py`:
- Around line 93-94: The tests currently only assert lock acquisition via
mock_lock.__enter__.assert_called_once(); update the assertions to also verify
the lock is released by asserting mock_lock.__exit__ was called once (and
optionally with the expected exception args (None, None, None) where
appropriate) in the same test(s) that reference mock_lock (e.g., the assertions
near mock_lock.__enter__ at the locations around lines 93 and 179/193); ensure
you add the corresponding mock_lock.__exit__ assertion after the code under test
so the test fails if the lock is not released.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f51547cd-ebc5-48f0-88a4-8a95fab49735
📒 Files selected for processing (2)
src/lightspeed_evaluation/core/llm/litellm_patch.pytests/unit/core/llm/test_litellm_patch.py
…indow The async context manager acquired and released litellm_state_lock inside separate _apply/_restore helpers, leaving the lock unheld during the yield. A concurrent caller could overwrite vertex_project/ vertex_location globals in that window. Now acquire before mutating and release in finally, matching the sync version's lock lifetime.
Wrap the asyncio.gather call in asyncio.wait_for with a 10-second timeout so the test fails cleanly instead of hanging indefinitely if a deadlock occurs between sync and async vertex completions.
Add __exit__ assertions alongside existing __enter__ checks so the tests verify the lock is both acquired and released.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 158-162: The async _vertex_override context manager currently
checks vp/vl before acquiring litellm_state_lock, causing a race with sync
callers; change the async implementation to acquire litellm_state_lock first
(using the same async lock/context as the sync path), then pop vp/vl and perform
the None check while holding the lock, and only yield after the lock-protected
override setup so that litellm.vertex_project and litellm.vertex_location are
never observed in a partially-updated state; ensure the lock is released
correctly on exit just like the sync _vertex_override.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3a2515af-05ba-4f04-9dc1-407574e8bda4
📒 Files selected for processing (2)
src/lightspeed_evaluation/core/llm/litellm_patch.pytests/unit/core/llm/test_litellm_patch.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/core/llm/test_litellm_patch.py
| vp = kwargs.pop("vertex_project", None) | ||
| vl = kwargs.pop("vertex_location", None) | ||
| if vp is None and vl is None: | ||
| yield | ||
| return |
There was a problem hiding this comment.
Async no-op path skips lock, creating race condition with sync callers.
The sync _vertex_override holds litellm_state_lock for the entire duration including the no-op case (lines 127-132), but the async version checks vp is None and vl is None before acquiring the lock and yields without it. This allows an async no-op caller to proceed while a sync caller is modifying litellm.vertex_project/litellm.vertex_location, causing the async caller's underlying litellm call to observe partially-updated or temporary override values.
Move the lock acquisition before the None check to match the sync behavior:
🔧 Proposed fix
async def _vertex_override_async(
kwargs: dict[str, Any],
) -> AsyncGenerator[None, None]:
...
- vp = kwargs.pop("vertex_project", None)
- vl = kwargs.pop("vertex_location", None)
- if vp is None and vl is None:
- yield
- return
-
await asyncio.to_thread(litellm_state_lock.acquire)
- old_vp = getattr(litellm, "vertex_project", None)
- old_vl = getattr(litellm, "vertex_location", None)
try:
+ vp = kwargs.pop("vertex_project", None)
+ vl = kwargs.pop("vertex_location", None)
+ if vp is None and vl is None:
+ yield
+ return
+ old_vp = getattr(litellm, "vertex_project", None)
+ old_vl = getattr(litellm, "vertex_location", None)
if vp is not None:
litellm.vertex_project = vp
if vl is not None:
litellm.vertex_location = vl
- yield
- finally:
- litellm.vertex_project = old_vp
- litellm.vertex_location = old_vl
+ try:
+ yield
+ finally:
+ litellm.vertex_project = old_vp
+ litellm.vertex_location = old_vl
+ finally:
await asyncio.to_thread(litellm_state_lock.release)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py` around lines 158 - 162,
The async _vertex_override context manager currently checks vp/vl before
acquiring litellm_state_lock, causing a race with sync callers; change the async
implementation to acquire litellm_state_lock first (using the same async
lock/context as the sync path), then pop vp/vl and perform the None check while
holding the lock, and only yield after the lock-protected override setup so that
litellm.vertex_project and litellm.vertex_location are never observed in a
partially-updated state; ensure the lock is released correctly on exit just like
the sync _vertex_override.
…close race The async _vertex_override_async popped vertex_project/vertex_location from kwargs and checked for None before acquiring litellm_state_lock, allowing concurrent callers to observe partially-updated litellm globals. Move lock acquisition before the pop/check so the async path matches the sync _vertex_override, which already holds the lock from the start.
Description
Vertex AI (Agent Platform) hosts models across different regions; for example,
meta/llama-3.3-70b-instruct-maasmay only be available inus-central1whilegemini-2.0-flashis served fromeurope-west1. When using a panel of various judges, it is possible to require use ofvertex_location(and optionallyvertex_project) to correctly query specific judge models.Currently,
litellm.drop_params=True(set by DeepEval) silently stripsvertex_projectandvertex_locationfrom completion kwargs because they aren't in litellm's supported-params list for the provider. This means per-model region configuration is quietly ignored.This PR adds a
_vertex_overridecontext manager tolitellm_patch.pythat intercepts these params before they reach litellm and temporarily sets themlitellm_state_lock) and a complete no-op when no vertex params are present.Usage Example
Changes
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests