Skip to content

feat: add per-model Vertex AI region support for judge panel#240

Draft
nvola wants to merge 7 commits into
lightspeed-core:mainfrom
nvola:vertex-per-model-region
Draft

feat: add per-model Vertex AI region support for judge panel#240
nvola wants to merge 7 commits into
lightspeed-core:mainfrom
nvola:vertex-per-model-region

Conversation

@nvola
Copy link
Copy Markdown

@nvola nvola commented May 18, 2026

Description

Vertex AI (Agent Platform) hosts models across different regions; for example, meta/llama-3.3-70b-instruct-maas may only be available in us-central1 while gemini-2.0-flash is served from europe-west1. When using a panel of various judges, it is possible to require use of vertex_location (and optionally vertex_project) to correctly query specific judge models.

Currently, litellm.drop_params=True (set by DeepEval) silently strips vertex_project and vertex_location from completion kwargs because they aren't in litellm's supported-params list for the provider. This means per-model region configuration is quietly ignored.

This PR adds a _vertex_override context manager to litellm_patch.py that intercepts these params before they reach litellm and temporarily sets them litellm_state_lock) and a complete no-op when no vertex params are present.

Usage Example

llm_pool:
  models:
    judge_vertex_gemini:
      provider: vertex_ai
      model: gemini-2.0-flash
      parameters:
        vertex_location: us-central1
    judge_vertex_llama:
      provider: vertex_ai
      model: meta/llama-3.3-70b-instruct-maas
      parameters:
        vertex_location: europe-west1

Changes

  • src/lightspeed_evaluation/core/llm/litellm_patch.py: Added _vertex_override context manager; integrated into existing completion/acompletion wrappers
  • tests/unit/core/llm/test_litellm_patch.py: 11 new tests covering: no-op path, location-only, project-only, both params, exception recovery, lock behavior, sync and async integration
  • config/system.yaml: Added commented example of per-model vertex_location

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Per-request region routing for Vertex AI judge models so calls can target a specific Vertex region on a per-call basis.
  • Documentation

    • Added commented example configuration blocks showing how to declare Vertex AI judge models and specify per-model region settings.
  • Tests

    • Added unit and integration-style tests verifying region-override behavior, parameter forwarding/removal, global state restoration, and safe sync/async interleaving.

Review Change Stack

litellm.drop_params=True (set by DeepEval) silently strips
vertex_project and vertex_location from completion kwargs. This
adds a _vertex_override context manager to litellm_patch.py that
intercepts these params and temporarily sets them as litellm
module-level attributes, which litellm checks as a fallback in
its vertex_ai handler. Thread-safe and a no-op when no vertex
params are present.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 57e339e6-f439-4ab3-8ef7-95623f036ecc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

Adds per-request Vertex AI region overrides to LiteLLM completion wrappers: new sync/async context managers and a shared lock apply vertex_location/vertex_project from completion kwargs to litellm module attributes during the call and restore them afterward. Sync/async wrappers and tests are updated; config examples added.

Changes

Vertex AI per-model region support in LiteLLM wrapper

Layer / File(s) Summary
Module docstring and imports
src/lightspeed_evaluation/core/llm/litellm_patch.py
Document Vertex region support and add contextmanager/asynccontextmanager and typing imports.
Global lock and Vertex override context managers
src/lightspeed_evaluation/core/llm/litellm_patch.py
Add litellm_state_lock and implement _vertex_override and _vertex_override_async that pop vertex_location/vertex_project from kwargs, apply them to litellm module attributes, and restore prior values in a finally block (async uses asyncio.to_thread for lock handling).
Sync and async completion wrapper integration
src/lightspeed_evaluation/core/llm/litellm_patch.py
Execute original litellm.completion and litellm.acompletion inside the Vertex override contexts so per-request Vertex params are active during execution while preserving token tracking and exception logging.
Final patch wiring and embeddings
src/lightspeed_evaluation/core/llm/litellm_patch.py
Retain module-level patching assignments for litellm.completion, litellm.acompletion, litellm.embedding, and litellm.aembedding after the new Vertex infrastructure; embedding wrappers unchanged.
Unit tests for Vertex override and integration
tests/unit/core/llm/test_litellm_patch.py
Add sync and async tests for _vertex_override / _vertex_override_async (no-op, set/restore, kwargs removal, exception restore, lock acquisition), integration tests asserting completion wrappers strip Vertex kwargs when forwarding and restore module state, and concurrency/interleaving tests.
Configuration examples
config/system.yaml
Add commented llm_pool model examples (judge_vertex_gemini, judge_vertex_llama) showing per-model vertex_location for Vertex AI.

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main feature: per-model Vertex AI region support for judge panels, matching the core objective of enabling region-specific model routing.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 118-140: The current _vertex_override context manager skips
acquiring litellm_state_lock when no per-request Vertex params are present,
allowing a race where unmodified requests can read
litellm.vertex_project/litellm.vertex_location while another thread is mutating
them; always acquire the lock before reading or mutating these globals (i.e.,
move the lock acquisition to cover the pop/getattr checks and the set/restore
sequence in _vertex_override). Also stop using a blocking threading.Lock for
async usage: introduce an async-safe lock (e.g., an asyncio.Lock named
litellm_state_async_lock) and ensure any async code path uses an async context
manager that acquires this asyncio.Lock (and never holds either lock across an
await); keep separate sync (threading.Lock) and async locks and make code paths
use the appropriate one so no blocking occurs on the event loop and all accesses
to litellm.vertex_project and litellm.vertex_location are consistently
protected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d43b6f27-7dbc-4527-a00b-11f24af79baa

📥 Commits

Reviewing files that changed from the base of the PR and between 07cee27 and e46d5e1.

📒 Files selected for processing (3)
  • config/system.yaml
  • src/lightspeed_evaluation/core/llm/litellm_patch.py
  • tests/unit/core/llm/test_litellm_patch.py

Comment thread src/lightspeed_evaluation/core/llm/litellm_patch.py
The sync context manager now always acquires litellm_state_lock, even when no vertex params are present so that non-vertex requests cannot read litellm.vertex_project/vertex_location while another thread is mutating them. An async variant (_vertex_override_async) backed by a new asyncio.Lock replaces the threading.Lock usage in the acompletion path, preventing the event loop from blocking during concurrent calls.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 109-110: Replace the two module locks with a single
threading.Lock: remove litellm_state_async_lock and keep a single
litellm_state_lock (threading.Lock) to guard access to litellm.vertex_project
and litellm.vertex_location; update _vertex_override_async so it does not await
an asyncio.Lock but instead uses asyncio.to_thread to run a helper that acquires
litellm_state_lock, mutates the vertex fields, and releases the lock (so the
event loop is never bound to an asyncio.Lock); ensure all sync code that
currently uses litellm_state_lock continues to use that same lock and add a test
that exercises interleaved sync/async completions to verify no races.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76851658-bcac-40ff-8f1b-4ff7776634d0

📥 Commits

Reviewing files that changed from the base of the PR and between e46d5e1 and 2369a0b.

📒 Files selected for processing (2)
  • src/lightspeed_evaluation/core/llm/litellm_patch.py
  • tests/unit/core/llm/test_litellm_patch.py

Comment thread src/lightspeed_evaluation/core/llm/litellm_patch.py Outdated
Replace the dual-lock scheme (threading.Lock + asyncio.Lock) with a single threading.Lock shared by both sync and async code paths. _vertex_override_async now uses asyncio.to_thread to acquire the lock in a thread-pool worker, avoiding event-loop blocking without needing a separate asyncio.Lock.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/core/llm/test_litellm_patch.py (1)

93-94: ⚡ Quick win

Also assert lock release, not only acquisition.

Line 93 and Line 179/193 only verify __enter__; this can miss regressions where the lock is acquired but not released.

Proposed fix
-        mock_lock.__enter__.assert_called_once()
+        mock_lock.__enter__.assert_called_once()
+        mock_lock.__exit__.assert_called_once()

...
-        assert mock_lock.__enter__.call_count >= 1
+        assert mock_lock.__enter__.call_count >= 1
+        assert mock_lock.__exit__.call_count >= 1
+        assert mock_lock.__enter__.call_count == mock_lock.__exit__.call_count

...
-        mock_lock.__enter__.assert_called_once()
+        mock_lock.__enter__.assert_called_once()
+        mock_lock.__exit__.assert_called_once()

Also applies to: 179-193

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/core/llm/test_litellm_patch.py` around lines 93 - 94, The tests
currently only assert lock acquisition via
mock_lock.__enter__.assert_called_once(); update the assertions to also verify
the lock is released by asserting mock_lock.__exit__ was called once (and
optionally with the expected exception args (None, None, None) where
appropriate) in the same test(s) that reference mock_lock (e.g., the assertions
near mock_lock.__enter__ at the locations around lines 93 and 179/193); ensure
you add the corresponding mock_lock.__exit__ assertion after the code under test
so the test fails if the lock is not released.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 146-184: The async context manager _vertex_override_async
currently acquires litellm_state_lock only inside the helper _apply and releases
it inside _restore, which lets the lock be dropped during the yield and causes
races; change the flow to acquire the lock (via await
asyncio.to_thread(litellm_state_lock.acquire)) before mutating
litellm.vertex_project / litellm.vertex_location, keep the lock held across the
yield, and then in the finally block release the lock (via await
asyncio.to_thread(litellm_state_lock.release)) after restoring the old values;
use the existing local variables for old_vp/old_vl (from _apply) and drop the
internal with-blocks so the lock lifetime spans the yield in
_vertex_override_async.

In `@tests/unit/core/llm/test_litellm_patch.py`:
- Around line 336-341: The test awaiting asyncio.gather(...) can hang
indefinitely; wrap the gather call in a timeout (e.g., using asyncio.wait_for or
asyncio.timeout) so the call to asyncio.gather(run_sync(...), run_async(...)) is
bounded (add a reasonable seconds constant) and let the test fail if the timeout
is reached; update the test that calls run_sync and run_async (the
asyncio.gather block) to use asyncio.wait_for(..., timeout=...) and assert/allow
the timeout to surface as a test failure.

---

Nitpick comments:
In `@tests/unit/core/llm/test_litellm_patch.py`:
- Around line 93-94: The tests currently only assert lock acquisition via
mock_lock.__enter__.assert_called_once(); update the assertions to also verify
the lock is released by asserting mock_lock.__exit__ was called once (and
optionally with the expected exception args (None, None, None) where
appropriate) in the same test(s) that reference mock_lock (e.g., the assertions
near mock_lock.__enter__ at the locations around lines 93 and 179/193); ensure
you add the corresponding mock_lock.__exit__ assertion after the code under test
so the test fails if the lock is not released.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f51547cd-ebc5-48f0-88a4-8a95fab49735

📥 Commits

Reviewing files that changed from the base of the PR and between 2369a0b and 0a37727.

📒 Files selected for processing (2)
  • src/lightspeed_evaluation/core/llm/litellm_patch.py
  • tests/unit/core/llm/test_litellm_patch.py

Comment thread src/lightspeed_evaluation/core/llm/litellm_patch.py Outdated
Comment thread tests/unit/core/llm/test_litellm_patch.py Outdated
nvola added 2 commits May 18, 2026 21:41
…indow

The async context manager acquired and released litellm_state_lock inside separate _apply/_restore helpers, leaving the lock unheld during the yield. A concurrent caller could overwrite vertex_project/ vertex_location globals in that window. Now acquire before mutating and release in finally, matching the sync version's lock lifetime.
Wrap the asyncio.gather call in asyncio.wait_for with a 10-second timeout so the test fails cleanly instead of hanging indefinitely if a deadlock occurs between sync and async vertex completions.
@nvola nvola marked this pull request as draft May 19, 2026 04:48
Add __exit__ assertions alongside existing __enter__ checks so the tests verify the lock is both acquired and released.
@nvola nvola marked this pull request as ready for review May 19, 2026 16:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 158-162: The async _vertex_override context manager currently
checks vp/vl before acquiring litellm_state_lock, causing a race with sync
callers; change the async implementation to acquire litellm_state_lock first
(using the same async lock/context as the sync path), then pop vp/vl and perform
the None check while holding the lock, and only yield after the lock-protected
override setup so that litellm.vertex_project and litellm.vertex_location are
never observed in a partially-updated state; ensure the lock is released
correctly on exit just like the sync _vertex_override.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a2515af-05ba-4f04-9dc1-407574e8bda4

📥 Commits

Reviewing files that changed from the base of the PR and between 0a37727 and 8d456a4.

📒 Files selected for processing (2)
  • src/lightspeed_evaluation/core/llm/litellm_patch.py
  • tests/unit/core/llm/test_litellm_patch.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/core/llm/test_litellm_patch.py

Comment on lines +158 to +162
vp = kwargs.pop("vertex_project", None)
vl = kwargs.pop("vertex_location", None)
if vp is None and vl is None:
yield
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Async no-op path skips lock, creating race condition with sync callers.

The sync _vertex_override holds litellm_state_lock for the entire duration including the no-op case (lines 127-132), but the async version checks vp is None and vl is None before acquiring the lock and yields without it. This allows an async no-op caller to proceed while a sync caller is modifying litellm.vertex_project/litellm.vertex_location, causing the async caller's underlying litellm call to observe partially-updated or temporary override values.

Move the lock acquisition before the None check to match the sync behavior:

🔧 Proposed fix
 async def _vertex_override_async(
     kwargs: dict[str, Any],
 ) -> AsyncGenerator[None, None]:
     ...
-    vp = kwargs.pop("vertex_project", None)
-    vl = kwargs.pop("vertex_location", None)
-    if vp is None and vl is None:
-        yield
-        return
-
     await asyncio.to_thread(litellm_state_lock.acquire)
-    old_vp = getattr(litellm, "vertex_project", None)
-    old_vl = getattr(litellm, "vertex_location", None)
     try:
+        vp = kwargs.pop("vertex_project", None)
+        vl = kwargs.pop("vertex_location", None)
+        if vp is None and vl is None:
+            yield
+            return
+        old_vp = getattr(litellm, "vertex_project", None)
+        old_vl = getattr(litellm, "vertex_location", None)
         if vp is not None:
             litellm.vertex_project = vp
         if vl is not None:
             litellm.vertex_location = vl
-        yield
-    finally:
-        litellm.vertex_project = old_vp
-        litellm.vertex_location = old_vl
+        try:
+            yield
+        finally:
+            litellm.vertex_project = old_vp
+            litellm.vertex_location = old_vl
+    finally:
         await asyncio.to_thread(litellm_state_lock.release)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/llm/litellm_patch.py` around lines 158 - 162,
The async _vertex_override context manager currently checks vp/vl before
acquiring litellm_state_lock, causing a race with sync callers; change the async
implementation to acquire litellm_state_lock first (using the same async
lock/context as the sync path), then pop vp/vl and perform the None check while
holding the lock, and only yield after the lock-protected override setup so that
litellm.vertex_project and litellm.vertex_location are never observed in a
partially-updated state; ensure the lock is released correctly on exit just like
the sync _vertex_override.

@nvola nvola marked this pull request as draft May 19, 2026 23:07
…close race

The async _vertex_override_async popped vertex_project/vertex_location from kwargs and checked for None before acquiring litellm_state_lock, allowing concurrent callers to observe partially-updated litellm globals. Move lock acquisition before the pop/check so the async path matches the sync _vertex_override, which already holds the lock from the start.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant