
feat(vertex): support embedding via :predict endpoint#4640

Open
vinci7 wants to merge 1 commit into QuantumNous:main from vinci7:feat/vertex-embedding

Conversation

@vinci7 vinci7 commented May 6, 2026

Summary

Implement Vertex AI embedding support by translating Gemini-format embedding requests into Vertex's :predict format and converting the response back to OpenAI format. Currently, relay/channel/vertex/adaptor.go::ConvertEmbeddingRequest returns a "not implemented" error.

This is a re-submission of #2488 with two additional fixes that the original PR was missing:

  1. Routing covers OpenAI-compatible /v1/embeddings, not only Gemini-native :embedContent / :batchEmbedContents. The original PR's check was nested inside info.RelayMode == constant.RelayModeGemini, so OpenAI-compat embedding requests fell through to gemini.GeminiChatHandler.
  2. Response is converted to OpenAI format, not streamed verbatim. Vertex returns {"predictions":[{"embeddings":{"values":[...]}}]}, which OpenAI clients cannot parse. Output is now a proper dto.OpenAIEmbeddingResponse.

Changes

relay/channel/vertex/adaptor.go

  • URL builder appends :predict when the model name contains "embedding"
  • ConvertEmbeddingRequest delegates to the existing gemini adaptor (which already handles the OpenAI→Gemini conversion)
  • DoRequest intercepts embedding requests and reshapes Gemini-format {content:{parts:[{text}]}, taskType, title, outputDimensionality} into Vertex-format {instances:[{content, task_type, title}], parameters:{outputDimensionality}}
  • DoResponse routes embedding responses through new vertexEmbeddingHandler via isVertexEmbedding(info) helper
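The single-request reshape described above can be sketched in isolation. This is a minimal sketch using encoding/json with anonymous structs; the function name is an assumption, and the real adaptor works with the project's dto types and common.Marshal / common.Unmarshal helpers instead:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reshapeGeminiToVertex is a hypothetical standalone version of the rewrite:
// Gemini {content:{parts:[{text}]}, taskType, title, outputDimensionality}
// becomes Vertex {instances:[{content, task_type, title}], parameters:{outputDimensionality}}.
func reshapeGeminiToVertex(body []byte) ([]byte, error) {
	var req struct {
		Content struct {
			Parts []struct {
				Text string `json:"text"`
			} `json:"parts"`
		} `json:"content"`
		TaskType             string `json:"taskType"`
		Title                string `json:"title"`
		OutputDimensionality int    `json:"outputDimensionality"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	// Concatenate all text parts into a single instance content string.
	content := ""
	for _, p := range req.Content.Parts {
		content += p.Text
	}
	instance := map[string]any{"content": content}
	if req.TaskType != "" {
		instance["task_type"] = req.TaskType
	}
	if req.Title != "" {
		instance["title"] = req.Title
	}
	out := map[string]any{"instances": []any{instance}}
	if req.OutputDimensionality > 0 {
		out["parameters"] = map[string]any{"outputDimensionality": req.OutputDimensionality}
	}
	return json.Marshal(out)
}

func main() {
	in := []byte(`{"content":{"parts":[{"text":"hello"}]},"taskType":"RETRIEVAL_QUERY"}`)
	v, err := reshapeGeminiToVertex(in)
	if err != nil {
		panic(err)
	}
	// prints {"instances":[{"content":"hello","task_type":"RETRIEVAL_QUERY"}]}
	fmt.Println(string(v))
}
```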

relay/channel/vertex/relay-vertex.go

  • VertexEmbeddingResponse struct (parses predictions[].embeddings.values and statistics.token_count)
  • vertexEmbeddingHandler — converts Vertex response → dto.OpenAIEmbeddingResponse → writes back to client
  • isVertexEmbedding(info) helper — matches both a URL path containing "embed" and embedding model name prefixes (gemini-embedding-*, text-embedding-*, text-multilingual-embedding-*)
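The detection described above can be sketched as a plain predicate. The name mirrors the PR's helper, but the signature here is an assumption for illustration; the actual helper takes the relay info struct rather than raw strings:

```go
package main

import (
	"fmt"
	"strings"
)

// isVertexEmbedding reports whether a request should be treated as a Vertex
// embedding call: either the URL path contains "embed" (Gemini-native
// :embedContent / :batchEmbedContents) or the model name carries a known
// embedding prefix (covers OpenAI-compatible /v1/embeddings routing).
func isVertexEmbedding(modelName, urlPath string) bool {
	if strings.Contains(urlPath, "embed") {
		return true
	}
	for _, prefix := range []string{
		"gemini-embedding-",
		"text-embedding-",
		"text-multilingual-embedding-",
	} {
		if strings.HasPrefix(modelName, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// prints true
	fmt.Println(isVertexEmbedding("gemini-embedding-001", "/v1/embeddings"))
	// prints false
	fmt.Println(isVertexEmbedding("gemini-2.0-flash", "/v1/chat/completions"))
}
```

Checking both signals is what closes the gap from the original PR: a /v1/embeddings request carries no "embed"-containing Gemini path, so the model-prefix match is what catches it.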

All JSON operations use common.Marshal / common.Unmarshal per Rule 1. No changes to non-vertex code paths.

Test plan

Verified on a Vertex AI channel against the following models on us-central1 and global locations:

  • gemini-embedding-001 → 3072-dim vectors, OpenAI format
  • text-embedding-005 → 768-dim vectors
  • text-multilingual-embedding-002 → 768-dim vectors
  • Token usage logged correctly for billing

Sample request:

curl http://your-new-api/v1/embeddings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-embedding-001","input":"hello"}'

Sample response:

{
  "object": "list",
  "data": [{"object":"embedding","index":0,"embedding":[-0.034, 0.011, ...]}],
  "model": "gemini-embedding-001",
  "usage": {"prompt_tokens": 1, "total_tokens": 1}
}

Related

Summary by CodeRabbit

  • New Features
    • Vertex AI embedding support now available with OpenAI-compatible response formatting
    • Embedding requests support optional output dimensionality configuration
    • Enhanced request routing and response handling for embedding operations

Implement Vertex AI embedding by translating Gemini-format embedding
requests into Vertex's :predict format and converting the response
back to OpenAI format.

This is a re-submission of QuantumNous#2488 with two additional fixes:

1. Routing covers OpenAI-compatible /v1/embeddings path, not only the
   Gemini-native :embedContent / :batchEmbedContents paths.
2. Response is converted from Vertex predict format
   ({"predictions":[{"embeddings":{"values":[...]}}]}) into
   OpenAIEmbeddingResponse so OpenAI clients can parse it.

Changes:
- vertex/adaptor.go:
  - URL builder appends :predict for any model name containing "embedding"
  - ConvertEmbeddingRequest delegates to gemini adaptor
  - DoRequest reshapes Gemini {content,parts,taskType,title,outputDimensionality}
    into Vertex {instances:[{content,task_type,title}], parameters:{outputDimensionality}}
  - DoResponse routes embedding responses to vertexEmbeddingHandler via
    new isVertexEmbedding(info) helper that matches both URL path and
    embedding model name prefixes
- vertex/relay-vertex.go:
  - VertexEmbeddingResponse struct
  - vertexEmbeddingHandler: parses predictions, converts to
    OpenAIEmbeddingResponse, writes back to client
  - isVertexEmbedding helper

All JSON ops use common.Marshal/Unmarshal per Rule 1.

Tested against gemini-embedding-001, text-embedding-005, and
text-multilingual-embedding-002 on us-central1 and global locations.

Closes-related-to: QuantumNous#2488 (auto-closed due to main force-push, never merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 6, 2026

Walkthrough

This PR adds Vertex embedding support by introducing request/response handling for Vertex embedding models. ConvertEmbeddingRequest delegates to the Gemini adaptor, DoRequest rewrites Gemini embedding payloads into Vertex format, and DoResponse routes embeddings to a new vertexEmbeddingHandler that transforms Vertex responses to OpenAI-compatible format.

Changes

Vertex Embedding Support

  • Data Shape (relay/channel/vertex/relay-vertex.go): New public type VertexEmbeddingResponse models Vertex embedding API responses with predictions, embeddings, and metadata including token counts.
  • Request Routing & Transformation (relay/channel/vertex/adaptor.go): GetRequestURL treats embedding models (like imagen) for the predict path; DoRequest intercepts Gemini embedding requests and rewrites the body into Vertex embedding payloads (single and batch) with optional outputDimensionality; ConvertEmbeddingRequest delegates to the Gemini adaptor.
  • Response Handling & Conversion (relay/channel/vertex/relay-vertex.go, relay/channel/vertex/adaptor.go): DoResponse routes vertex embeddings to the new vertexEmbeddingHandler; vertexEmbeddingHandler unmarshals VertexEmbeddingResponse, transforms it to OpenAI-like embedding format, aggregates token usage, and writes the HTTP response.
  • Wiring & Logging (relay/channel/vertex/adaptor.go, relay/channel/vertex/relay-vertex.go): Added bytes and logger imports; introduced isVertexEmbedding routing logic to determine when to apply vertex embedding handlers; added debug logging for transformed request bodies.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router
    participant DoRequest as DoRequest<br/>(Payload Transform)
    participant VertexAPI as Vertex API
    participant DoResponse as DoResponse<br/>(Routing)
    participant Handler as vertexEmbeddingHandler<br/>(Transform)
    participant Return

    Client->>Router: Embedding request (Gemini model)
    Router->>DoRequest: Route based on GetRequestURL<br/>(embedding model detected)
    DoRequest->>DoRequest: ConvertEmbeddingRequest<br/>(delegate to Gemini adaptor)
    DoRequest->>DoRequest: Rewrite body to Vertex<br/>payload format
    DoRequest->>VertexAPI: Forward transformed request
    VertexAPI-->>DoResponse: Vertex embedding response
    DoResponse->>DoResponse: isVertexEmbedding check
    DoResponse->>Handler: Route to vertexEmbeddingHandler
    Handler->>Handler: Unmarshal VertexEmbeddingResponse
    Handler->>Handler: Transform to OpenAI format
    Handler->>Handler: Aggregate token usage
    Handler->>Return: Marshal & write response
    Return-->>Client: OpenAI-compatible embedding response

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

  • QuantumNous/new-api#1500: Modifies embedding request/response handling for Gemini (batch vs single) and relates to the payload transformation logic in this PR.
  • QuantumNous/new-api#1537: Directly related through shared embedding request/response flow modifications and routing to native embedding handlers.
  • QuantumNous/new-api#1834: Related through Gemini embedding detection and validation logic that integrates into Vertex embedding processing.

Poem

🐰 Embeddings now soar through Vertex's embrace,
From Gemini requests, transformed without trace,
Payloads are woven with dimensional grace,
OpenAI format wins the rabbit race!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the pull request title clearly and concisely describes the main change (adding Vertex embedding support via the :predict endpoint), which matches the primary objective of the changeset.
  • Linked Issues check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
relay/channel/vertex/adaptor.go (1)

329-329: 💤 Low value

Align embedding detection between request and response paths.

DoRequest keys off strings.Contains(c.Request.URL.Path, "embed") while isVertexEmbedding (used in DoResponse) additionally accepts embedding model-name prefixes (gemini-embedding, text-embedding, text-multilingual-embedding). If a request ever lands here with one of those models but a path that doesn't contain embed, DoResponse will route to vertexEmbeddingHandler while the body is never converted to Vertex instances/parameters, so Vertex would reject the call. Consider extracting a shared isVertexEmbedding(info) check and using it in both places to keep the pre/post conversion symmetric.



📥 Commits

Reviewing files that changed from the base of the PR and between 9acf5fe and 874f354.

📒 Files selected for processing (2)
  • relay/channel/vertex/adaptor.go
  • relay/channel/vertex/relay-vertex.go

Comment on lines 328 to 391
func (a *Adaptor) DoRequest(c *gin.Context, info *relaycommon.RelayInfo, requestBody io.Reader) (any, error) {
	if a.RequestMode == RequestModeGemini && strings.Contains(c.Request.URL.Path, "embed") {
		bodyBytes, err := io.ReadAll(requestBody)
		if err != nil {
			return nil, err
		}

		vertexReq := make(map[string]interface{})
		instances := make([]interface{}, 0)

		if info.IsGeminiBatchEmbedding {
			var req dto.GeminiBatchEmbeddingRequest
			if err := common.Unmarshal(bodyBytes, &req); err == nil {
				for _, r := range req.Requests {
					instance := make(map[string]interface{})
					content := ""
					for _, part := range r.Content.Parts {
						if part.Text != "" {
							content += part.Text
						}
					}
					instance["content"] = content
					if r.TaskType != "" {
						instance["task_type"] = r.TaskType
					}
					if r.Title != "" {
						instance["title"] = r.Title
					}
					instances = append(instances, instance)
				}
			}
		} else {
			var req dto.GeminiEmbeddingRequest
			if err := common.Unmarshal(bodyBytes, &req); err == nil {
				instance := make(map[string]interface{})
				content := ""
				for _, part := range req.Content.Parts {
					if part.Text != "" {
						content += part.Text
					}
				}
				instance["content"] = content
				if req.TaskType != "" {
					instance["task_type"] = req.TaskType
				}
				if req.Title != "" {
					instance["title"] = req.Title
				}
				instances = append(instances, instance)

				if req.OutputDimensionality > 0 {
					vertexReq["parameters"] = map[string]interface{}{
						"outputDimensionality": req.OutputDimensionality,
					}
				}
			}
		}
		vertexReq["instances"] = instances
		newBodyBytes, _ := common.Marshal(vertexReq)
		requestBody = bytes.NewReader(newBodyBytes)
		logger.LogDebug(c, "Vertex Embedding request body: "+string(newBodyBytes))
	}
	return channel.DoApiRequest(a, c, info, requestBody)
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Surface body-conversion errors instead of swallowing them.

The new embedding rewrite path silently ignores both unmarshal and marshal failures:

  • Lines 340 and 361: if err := common.Unmarshal(bodyBytes, &req); err == nil { ... } — when parsing fails, control just falls through and the code sends {"instances":[]} to Vertex. The client then sees a confusing upstream 400 INVALID_ARGUMENT: Should provide instances for text model prediction while the real parse error is gone.
  • Line 386: newBodyBytes, _ := common.Marshal(vertexReq) — if marshalling fails, an empty/nil body is forwarded silently.

Please return the error in both cases so misuse / schema drift is observable.

🛠️ Suggested fix
 		if info.IsGeminiBatchEmbedding {
 			var req dto.GeminiBatchEmbeddingRequest
-			if err := common.Unmarshal(bodyBytes, &req); err == nil {
-				for _, r := range req.Requests {
-					instance := make(map[string]interface{})
-					content := ""
-					for _, part := range r.Content.Parts {
-						if part.Text != "" {
-							content += part.Text
-						}
-					}
-					instance["content"] = content
-					if r.TaskType != "" {
-						instance["task_type"] = r.TaskType
-					}
-					if r.Title != "" {
-						instance["title"] = r.Title
-					}
-					instances = append(instances, instance)
-				}
-			}
+			if err := common.Unmarshal(bodyBytes, &req); err != nil {
+				return nil, fmt.Errorf("failed to parse gemini batch embedding request: %w", err)
+			}
+			for _, r := range req.Requests {
+				instance := make(map[string]interface{})
+				content := ""
+				for _, part := range r.Content.Parts {
+					if part.Text != "" {
+						content += part.Text
+					}
+				}
+				instance["content"] = content
+				if r.TaskType != "" {
+					instance["task_type"] = r.TaskType
+				}
+				if r.Title != "" {
+					instance["title"] = r.Title
+				}
+				instances = append(instances, instance)
+			}
 		} else {
 			var req dto.GeminiEmbeddingRequest
-			if err := common.Unmarshal(bodyBytes, &req); err == nil {
-				instance := make(map[string]interface{})
-				content := ""
-				for _, part := range req.Content.Parts {
-					if part.Text != "" {
-						content += part.Text
-					}
-				}
-				instance["content"] = content
-				if req.TaskType != "" {
-					instance["task_type"] = req.TaskType
-				}
-				if req.Title != "" {
-					instance["title"] = req.Title
-				}
-				instances = append(instances, instance)
-
-				if req.OutputDimensionality > 0 {
-					vertexReq["parameters"] = map[string]interface{}{
-						"outputDimensionality": req.OutputDimensionality,
-					}
-				}
-			}
+			if err := common.Unmarshal(bodyBytes, &req); err != nil {
+				return nil, fmt.Errorf("failed to parse gemini embedding request: %w", err)
+			}
+			instance := make(map[string]interface{})
+			content := ""
+			for _, part := range req.Content.Parts {
+				if part.Text != "" {
+					content += part.Text
+				}
+			}
+			instance["content"] = content
+			if req.TaskType != "" {
+				instance["task_type"] = req.TaskType
+			}
+			if req.Title != "" {
+				instance["title"] = req.Title
+			}
+			instances = append(instances, instance)
+
+			if req.OutputDimensionality > 0 {
+				vertexReq["parameters"] = map[string]interface{}{
+					"outputDimensionality": req.OutputDimensionality,
+				}
+			}
 		}
 		vertexReq["instances"] = instances
-		newBodyBytes, _ := common.Marshal(vertexReq)
+		newBodyBytes, err := common.Marshal(vertexReq)
+		if err != nil {
+			return nil, fmt.Errorf("failed to marshal vertex embedding request: %w", err)
+		}
 		requestBody = bytes.NewReader(newBodyBytes)

Comment on lines +65 to +114
func vertexEmbeddingHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) (*dto.Usage, *types.NewAPIError) {
	defer service.CloseResponseBodyGracefully(resp)

	responseBody, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
	}

	if common.DebugEnabled {
		logger.LogDebug(c, "Vertex Embedding response body: "+string(responseBody))
	}

	var vertexResponse VertexEmbeddingResponse
	if err := common.Unmarshal(responseBody, &vertexResponse); err != nil {
		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
	}

	openAIResponse := dto.OpenAIEmbeddingResponse{
		Object: "list",
		Data:   make([]dto.OpenAIEmbeddingResponseItem, 0, len(vertexResponse.Predictions)),
		Model:  info.UpstreamModelName,
	}

	tokenCount := 0
	for i, prediction := range vertexResponse.Predictions {
		openAIResponse.Data = append(openAIResponse.Data, dto.OpenAIEmbeddingResponseItem{
			Object:    "embedding",
			Embedding: prediction.Embeddings.Values,
			Index:     i,
		})
		tokenCount += prediction.Embeddings.Statistics.TokenCount
	}

	usage := &dto.Usage{
		PromptTokens: tokenCount,
		TotalTokens:  tokenCount,
	}
	openAIResponse.Usage = *usage

	jsonResponse, err := common.Marshal(openAIResponse)
	if err != nil {
		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
	}

	c.Writer.Header().Set("Content-Type", "application/json")
	c.Writer.WriteHeader(http.StatusOK)
	_, _ = c.Writer.Write(jsonResponse)

	return usage, nil
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether other handlers in this repo guard on resp.StatusCode and how relay dispatches DoResponse.
rg -nP -C3 '\bresp\.StatusCode\b' relay/channel/vertex relay/channel/gemini
rg -nP -C5 'DoResponse\(' relay/relay_adaptor.go relay/relay_text.go relay/relay_embedding.go 2>/dev/null
fd -t f -e go . relay | xargs rg -nP -C2 'StatusCode\s*!=\s*http\.StatusOK' | head -n 80

Repository: QuantumNous/new-api

Length of output: 6210


Add status code check to vertexEmbeddingHandler.

vertexEmbeddingHandler unconditionally unmarshals resp.Body without checking resp.StatusCode. When Vertex returns an error (4xx/5xx), the body contains {"error":{...}} which silently unmarshals into a struct with empty Predictions. The client receives a 200 OK with an empty embedding list while the billing layer records 0 tokens for what was actually an upstream failure.

Add an early return on non-2xx status, mirroring the pattern used throughout the relay framework (e.g., relay/embedding_handler.go:72, relay/claude_handler.go:196):

if resp.StatusCode != http.StatusOK {
    newAPIError := service.RelayErrorHandler(c.Request.Context(), resp, false)
    return nil, newAPIError
}
