feat: add per-model Vertex AI region support for judge panel by nvola · Pull Request #240 · lightspeed-core/lightspeed-evaluation

nvola · 2026-05-18T23:46:27Z

Description

Vertex AI (Agent Platform) hosts models across different regions; for example, meta/llama-3.3-70b-instruct-maas may only be available in us-central1 while gemini-2.0-flash is served from europe-west1. When using a panel of various judges, it is possible to require use of vertex_location (and optionally vertex_project) to correctly query specific judge models.

Currently, litellm.drop_params=True (set by DeepEval) silently strips vertex_project and vertex_location from completion kwargs because they aren't in litellm's supported-params list for the provider. This means per-model region configuration is quietly ignored.

This PR adds a _vertex_override context manager to litellm_patch.py that intercepts these params before they reach litellm and temporarily sets them litellm_state_lock) and a complete no-op when no vertex params are present.

Usage Example

llm_pool:
  models:
    judge_vertex_gemini:
      provider: vertex_ai
      model: gemini-2.0-flash
      parameters:
        vertex_location: us-central1
    judge_vertex_llama:
      provider: vertex_ai
      model: meta/llama-3.3-70b-instruct-maas
      parameters:
        vertex_location: europe-west1

Changes

src/lightspeed_evaluation/core/llm/litellm_patch.py: Added _vertex_override context manager; integrated into existing completion/acompletion wrappers
tests/unit/core/llm/test_litellm_patch.py: 11 new tests covering: no-op path, location-only, project-only, both params, exception recovery, lock behavior, sync and async integration
config/system.yaml: Added commented example of per-model vertex_location

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude
Generated by: N/A

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Per-request region routing for Vertex AI judge models so calls can target a specific Vertex region on a per-call basis.
Documentation
- Added commented example configuration blocks showing how to declare Vertex AI judge models and specify per-model region settings.
Tests
- Added unit and integration-style tests verifying region-override behavior, parameter forwarding/removal, global state restoration, and safe sync/async interleaving.

litellm.drop_params=True (set by DeepEval) silently strips vertex_project and vertex_location from completion kwargs. This adds a _vertex_override context manager to litellm_patch.py that intercepts these params and temporarily sets them as litellm module-level attributes, which litellm checks as a fallback in its vertex_ai handler. Thread-safe and a no-op when no vertex params are present. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-18T23:46:40Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 57e339e6-f439-4ab3-8ef7-95623f036ecc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

Adds per-request Vertex AI region overrides to LiteLLM completion wrappers: new sync/async context managers and a shared lock apply vertex_location/vertex_project from completion kwargs to litellm module attributes during the call and restore them afterward. Sync/async wrappers and tests are updated; config examples added.

Changes

Vertex AI per-model region support in LiteLLM wrapper

Layer / File(s)	Summary
Module docstring and imports `src/lightspeed_evaluation/core/llm/litellm_patch.py`	Document Vertex region support and add contextmanager/asynccontextmanager and typing imports.
Global lock and Vertex override context managers `src/lightspeed_evaluation/core/llm/litellm_patch.py`	Add `litellm_state_lock` and implement `_vertex_override` and `_vertex_override_async` that pop `vertex_location`/`vertex_project` from kwargs, apply them to `litellm` module attributes, and restore prior values in a finally block (async uses asyncio.to_thread for lock handling).
Sync and async completion wrapper integration `src/lightspeed_evaluation/core/llm/litellm_patch.py`	Execute original `litellm.completion` and `litellm.acompletion` inside the Vertex override contexts so per-request Vertex params are active during execution while preserving token tracking and exception logging.
Final patch wiring and embeddings `src/lightspeed_evaluation/core/llm/litellm_patch.py`	Retain module-level patching assignments for `litellm.completion`, `litellm.acompletion`, `litellm.embedding`, and `litellm.aembedding` after the new Vertex infrastructure; embedding wrappers unchanged.
Unit tests for Vertex override and integration `tests/unit/core/llm/test_litellm_patch.py`	Add sync and async tests for `_vertex_override` / `_vertex_override_async` (no-op, set/restore, kwargs removal, exception restore, lock acquisition), integration tests asserting completion wrappers strip Vertex kwargs when forwarding and restore module state, and concurrency/interleaving tests.
Configuration examples `config/system.yaml`	Add commented `llm_pool` model examples (`judge_vertex_gemini`, `judge_vertex_llama`) showing per-model `vertex_location` for Vertex AI.

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main feature: per-model Vertex AI region support for judge panels, matching the core objective of enabling region-specific model routing.
Docstring Coverage	✅ Passed	Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 118-140: The current _vertex_override context manager skips
acquiring litellm_state_lock when no per-request Vertex params are present,
allowing a race where unmodified requests can read
litellm.vertex_project/litellm.vertex_location while another thread is mutating
them; always acquire the lock before reading or mutating these globals (i.e.,
move the lock acquisition to cover the pop/getattr checks and the set/restore
sequence in _vertex_override). Also stop using a blocking threading.Lock for
async usage: introduce an async-safe lock (e.g., an asyncio.Lock named
litellm_state_async_lock) and ensure any async code path uses an async context
manager that acquires this asyncio.Lock (and never holds either lock across an
await); keep separate sync (threading.Lock) and async locks and make code paths
use the appropriate one so no blocking occurs on the event loop and all accesses
to litellm.vertex_project and litellm.vertex_location are consistently
protected.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d43b6f27-7dbc-4527-a00b-11f24af79baa

📥 Commits

Reviewing files that changed from the base of the PR and between 07cee27 and e46d5e1.

📒 Files selected for processing (3)

config/system.yaml
src/lightspeed_evaluation/core/llm/litellm_patch.py
tests/unit/core/llm/test_litellm_patch.py

The sync context manager now always acquires litellm_state_lock, even when no vertex params are present so that non-vertex requests cannot read litellm.vertex_project/vertex_location while another thread is mutating them. An async variant (_vertex_override_async) backed by a new asyncio.Lock replaces the threading.Lock usage in the acompletion path, preventing the event loop from blocking during concurrent calls.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 109-110: Replace the two module locks with a single
threading.Lock: remove litellm_state_async_lock and keep a single
litellm_state_lock (threading.Lock) to guard access to litellm.vertex_project
and litellm.vertex_location; update _vertex_override_async so it does not await
an asyncio.Lock but instead uses asyncio.to_thread to run a helper that acquires
litellm_state_lock, mutates the vertex fields, and releases the lock (so the
event loop is never bound to an asyncio.Lock); ensure all sync code that
currently uses litellm_state_lock continues to use that same lock and add a test
that exercises interleaved sync/async completions to verify no races.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76851658-bcac-40ff-8f1b-4ff7776634d0

📥 Commits

Reviewing files that changed from the base of the PR and between e46d5e1 and 2369a0b.

📒 Files selected for processing (2)

src/lightspeed_evaluation/core/llm/litellm_patch.py
tests/unit/core/llm/test_litellm_patch.py

Replace the dual-lock scheme (threading.Lock + asyncio.Lock) with a single threading.Lock shared by both sync and async code paths. _vertex_override_async now uses asyncio.to_thread to acquire the lock in a thread-pool worker, avoiding event-loop blocking without needing a separate asyncio.Lock.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/unit/core/llm/test_litellm_patch.py (1)

93-94: ⚡ Quick win

Also assert lock release, not only acquisition.

Line 93 and Line 179/193 only verify __enter__; this can miss regressions where the lock is acquired but not released.

Proposed fix

-        mock_lock.__enter__.assert_called_once()
+        mock_lock.__enter__.assert_called_once()
+        mock_lock.__exit__.assert_called_once()

...
-        assert mock_lock.__enter__.call_count >= 1
+        assert mock_lock.__enter__.call_count >= 1
+        assert mock_lock.__exit__.call_count >= 1
+        assert mock_lock.__enter__.call_count == mock_lock.__exit__.call_count

...
-        mock_lock.__enter__.assert_called_once()
+        mock_lock.__enter__.assert_called_once()
+        mock_lock.__exit__.assert_called_once()

Also applies to: 179-193

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/core/llm/test_litellm_patch.py` around lines 93 - 94, The tests
currently only assert lock acquisition via
mock_lock.__enter__.assert_called_once(); update the assertions to also verify
the lock is released by asserting mock_lock.__exit__ was called once (and
optionally with the expected exception args (None, None, None) where
appropriate) in the same test(s) that reference mock_lock (e.g., the assertions
near mock_lock.__enter__ at the locations around lines 93 and 179/193); ensure
you add the corresponding mock_lock.__exit__ assertion after the code under test
so the test fails if the lock is not released.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 146-184: The async context manager _vertex_override_async
currently acquires litellm_state_lock only inside the helper _apply and releases
it inside _restore, which lets the lock be dropped during the yield and causes
races; change the flow to acquire the lock (via await
asyncio.to_thread(litellm_state_lock.acquire)) before mutating
litellm.vertex_project / litellm.vertex_location, keep the lock held across the
yield, and then in the finally block release the lock (via await
asyncio.to_thread(litellm_state_lock.release)) after restoring the old values;
use the existing local variables for old_vp/old_vl (from _apply) and drop the
internal with-blocks so the lock lifetime spans the yield in
_vertex_override_async.

In `@tests/unit/core/llm/test_litellm_patch.py`:
- Around line 336-341: The test awaiting asyncio.gather(...) can hang
indefinitely; wrap the gather call in a timeout (e.g., using asyncio.wait_for or
asyncio.timeout) so the call to asyncio.gather(run_sync(...), run_async(...)) is
bounded (add a reasonable seconds constant) and let the test fail if the timeout
is reached; update the test that calls run_sync and run_async (the
asyncio.gather block) to use asyncio.wait_for(..., timeout=...) and assert/allow
the timeout to surface as a test failure.

---

Nitpick comments:
In `@tests/unit/core/llm/test_litellm_patch.py`:
- Around line 93-94: The tests currently only assert lock acquisition via
mock_lock.__enter__.assert_called_once(); update the assertions to also verify
the lock is released by asserting mock_lock.__exit__ was called once (and
optionally with the expected exception args (None, None, None) where
appropriate) in the same test(s) that reference mock_lock (e.g., the assertions
near mock_lock.__enter__ at the locations around lines 93 and 179/193); ensure
you add the corresponding mock_lock.__exit__ assertion after the code under test
so the test fails if the lock is not released.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f51547cd-ebc5-48f0-88a4-8a95fab49735

📥 Commits

Reviewing files that changed from the base of the PR and between 2369a0b and 0a37727.

📒 Files selected for processing (2)

src/lightspeed_evaluation/core/llm/litellm_patch.py
tests/unit/core/llm/test_litellm_patch.py

…indow The async context manager acquired and released litellm_state_lock inside separate _apply/_restore helpers, leaving the lock unheld during the yield. A concurrent caller could overwrite vertex_project/ vertex_location globals in that window. Now acquire before mutating and release in finally, matching the sync version's lock lifetime.

Wrap the asyncio.gather call in asyncio.wait_for with a 10-second timeout so the test fails cleanly instead of hanging indefinitely if a deadlock occurs between sync and async vertex completions.

Add __exit__ assertions alongside existing __enter__ checks so the tests verify the lock is both acquired and released.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/llm/litellm_patch.py`:
- Around line 158-162: The async _vertex_override context manager currently
checks vp/vl before acquiring litellm_state_lock, causing a race with sync
callers; change the async implementation to acquire litellm_state_lock first
(using the same async lock/context as the sync path), then pop vp/vl and perform
the None check while holding the lock, and only yield after the lock-protected
override setup so that litellm.vertex_project and litellm.vertex_location are
never observed in a partially-updated state; ensure the lock is released
correctly on exit just like the sync _vertex_override.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a2515af-05ba-4f04-9dc1-407574e8bda4

📥 Commits

Reviewing files that changed from the base of the PR and between 0a37727 and 8d456a4.

📒 Files selected for processing (2)

src/lightspeed_evaluation/core/llm/litellm_patch.py
tests/unit/core/llm/test_litellm_patch.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit/core/llm/test_litellm_patch.py

coderabbitai · 2026-05-19T17:02:12Z

+    vp = kwargs.pop("vertex_project", None)
+    vl = kwargs.pop("vertex_location", None)
+    if vp is None and vl is None:
+        yield
+        return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Async no-op path skips lock, creating race condition with sync callers.

The sync _vertex_override holds litellm_state_lock for the entire duration including the no-op case (lines 127-132), but the async version checks vp is None and vl is None before acquiring the lock and yields without it. This allows an async no-op caller to proceed while a sync caller is modifying litellm.vertex_project/litellm.vertex_location, causing the async caller's underlying litellm call to observe partially-updated or temporary override values.

Move the lock acquisition before the None check to match the sync behavior:

🔧 Proposed fix

async def _vertex_override_async( kwargs: dict[str, Any], ) -> AsyncGenerator[None, None]: ... - vp = kwargs.pop("vertex_project", None) - vl = kwargs.pop("vertex_location", None) - if vp is None and vl is None: - yield - return - await asyncio.to_thread(litellm_state_lock.acquire) - old_vp = getattr(litellm, "vertex_project", None) - old_vl = getattr(litellm, "vertex_location", None) try: + vp = kwargs.pop("vertex_project", None) + vl = kwargs.pop("vertex_location", None) + if vp is None and vl is None: + yield + return + old_vp = getattr(litellm, "vertex_project", None) + old_vl = getattr(litellm, "vertex_location", None) if vp is not None: litellm.vertex_project = vp if vl is not None: litellm.vertex_location = vl - yield - finally: - litellm.vertex_project = old_vp - litellm.vertex_location = old_vl + try: + yield + finally: + litellm.vertex_project = old_vp + litellm.vertex_location = old_vl + finally: await asyncio.to_thread(litellm_state_lock.release)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lightspeed_evaluation/core/llm/litellm_patch.py` around lines 158 - 162, The async _vertex_override context manager currently checks vp/vl before acquiring litellm_state_lock, causing a race with sync callers; change the async implementation to acquire litellm_state_lock first (using the same async lock/context as the sync path), then pop vp/vl and perform the None check while holding the lock, and only yield after the lock-protected override setup so that litellm.vertex_project and litellm.vertex_location are never observed in a partially-updated state; ensure the lock is released correctly on exit just like the sync _vertex_override.

…close race The async _vertex_override_async popped vertex_project/vertex_location from kwargs and checked for None before acquiring litellm_state_lock, allowing concurrent callers to observe partially-updated litellm globals. Move lock acquisition before the pop/check so the async path matches the sync _vertex_override, which already holds the lock from the start.

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread src/lightspeed_evaluation/core/llm/litellm_patch.py

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/lightspeed_evaluation/core/llm/litellm_patch.py Outdated

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread src/lightspeed_evaluation/core/llm/litellm_patch.py Outdated

Comment thread tests/unit/core/llm/test_litellm_patch.py Outdated

nvola added 2 commits May 18, 2026 21:41

test: add timeout to asyncio.gather in deadlock detection test

d9addea

Wrap the asyncio.gather call in asyncio.wait_for with a 10-second timeout so the test fails cleanly instead of hanging indefinitely if a deadlock occurs between sync and async vertex completions.

nvola marked this pull request as draft May 19, 2026 04:48

test: assert lock release in sync vertex override tests

8d456a4

Add __exit__ assertions alongside existing __enter__ checks so the tests verify the lock is both acquired and released.

nvola marked this pull request as ready for review May 19, 2026 16:59

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

nvola marked this pull request as draft May 19, 2026 23:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add per-model Vertex AI region support for judge panel#240

feat: add per-model Vertex AI region support for judge panel#240
nvola wants to merge 7 commits into
lightspeed-core:mainfrom
nvola:vertex-per-model-region

nvola commented May 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 18, 2026 •

edited

Loading

Review skipped

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nvola commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Usage Example

Changes

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nvola commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading