[BUG] VertexAI embeddings cannot send task_type for gemini-embedding-001 #810

Description

@adamcooper

Basic checks

  • I searched existing issues - this hasn't been reported
  • I can reproduce this consistently
  • This is a RubyLLM bug, not my application code

What's broken?

RubyLLM's VertexAI embedding provider does not expose a way to send Vertex AI's per-request embedding task_type for gemini-embedding-001.

This matters because Vertex AI embedding task types select embeddings optimized for different use cases, such as SEMANTIC_SIMILARITY, RETRIEVAL_QUERY, and RETRIEVAL_DOCUMENT. If task_type is omitted, embeddings can be generated in a different space than the caller intended, with no obvious failure.

Current RubyLLM behaviour appears to be:

  • RubyLLM::Embedding.embed accepts and forwards dimensions:, but not task_type: or other provider-specific embedding params.
  • RubyLLM::Providers::VertexAI::Embeddings.render_embedding_payload builds instances as { content: ... } only, so task_type cannot reach the Vertex AI request body.

We had to carry a local monkey patch to thread task_type: through RubyLLM.embed and merge it into each VertexAI instances[] entry.

How to reproduce

  1. Configure RubyLLM for Vertex AI.
  2. Try to create an embedding that explicitly selects a Vertex AI task type:
RubyLLM.embed(
  "hello",
  model: "gemini-embedding-001",
  provider: :vertexai,
  dimensions: 1536,
  task_type: "SEMANTIC_SIMILARITY"
)
  3. Ruby raises because task_type: is not accepted by RubyLLM.embed.
  4. If the caller omits task_type: to avoid the unsupported keyword:
RubyLLM.embed(
  "hello",
  model: "gemini-embedding-001",
  provider: :vertexai,
  dimensions: 1536
)
  5. The request succeeds, but the Vertex AI payload cannot include the intended task type.
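
The failure in step 3 is plain Ruby keyword-argument checking, and it can be seen without any Vertex AI credentials. The sketch below uses a hypothetical stand-in, sketch_embed, whose keyword list mirrors the one this issue attributes to RubyLLM.embed today (no **rest parameter). It is not RubyLLM's actual method.

```ruby
# Hypothetical stand-in (not RubyLLM code): a method with a fixed keyword list
# and no **rest parameter, mirroring the signature described in this issue.
def sketch_embed(text, model: nil, provider: nil, assume_model_exists: false,
                 context: nil, dimensions: nil)
  { text: text, model: model, provider: provider, dimensions: dimensions }
end

error_message = nil
begin
  sketch_embed(
    'hello',
    model: 'gemini-embedding-001',
    provider: :vertexai,
    dimensions: 1536,
    task_type: 'SEMANTIC_SIMILARITY' # not in the keyword list, so Ruby rejects it
  )
rescue ArgumentError => e
  error_message = e.message
end

puts error_message
```

Because the rejection happens at the method boundary, the option never gets as far as the provider, so a fix in the provider alone would not be enough.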

Expected behavior

RubyLLM should support forwarding VertexAI embedding task type information, for example:

RubyLLM.embed(
  "hello",
  model: "gemini-embedding-001",
  provider: :vertexai,
  dimensions: 1536,
  task_type: "SEMANTIC_SIMILARITY"
)

That should produce a VertexAI payload shaped like:

{
  "instances": [
    {
      "content": "hello",
      "task_type": "SEMANTIC_SIMILARITY"
    }
  ],
  "parameters": {
    "outputDimensionality": 1536
  }
}

For RETRIEVAL_DOCUMENT, it may also be worth supporting title, since Vertex's embedding docs expose that alongside task_type.
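
A minimal sketch of how title could be handled per instance, assuming title is only meaningful together with RETRIEVAL_DOCUMENT. The helper name sketch_vertex_instance and the choice to raise on a mismatch are illustrative assumptions, not existing RubyLLM behaviour.

```ruby
require 'json'

# Hypothetical helper for illustration; not part of RubyLLM.
def sketch_vertex_instance(content, task_type: nil, title: nil)
  # Assumption: title only applies to RETRIEVAL_DOCUMENT, so reject other combinations.
  if title && task_type != 'RETRIEVAL_DOCUMENT'
    raise ArgumentError, 'title requires task_type RETRIEVAL_DOCUMENT'
  end

  instance = { content: content.to_s }
  instance[:task_type] = task_type if task_type
  instance[:title] = title if title
  instance
end

document_instance = sketch_vertex_instance(
  'hello',
  task_type: 'RETRIEVAL_DOCUMENT',
  title: 'Greeting'
)
puts JSON.generate(document_instance)
```

Raising is one option. Dropping title silently for other task types would also work, but it would repeat the silent-loss problem this issue describes.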

What actually happened

Passing task_type: to RubyLLM.embed is not currently supported:

ArgumentError: unknown keyword: :task_type

If task_type: is omitted, RubyLLM's VertexAI provider sends a payload with only content per instance:

{
  "instances": [
    {
      "content": "hello"
    }
  ],
  "parameters": {
    "outputDimensionality": 1536
  }
}

That means the request succeeds, but the selected embedding task type is silently lost.

Environment

  • Ruby version: 3.3.10
  • RubyLLM version: 1.16.0
  • Provider: VertexAI
  • Model: gemini-embedding-001
  • OS: macOS

AI-suggested fix

A minimal fix would be to plumb provider-specific embedding options through the embedding path and let the VertexAI provider consume the ones it supports.

One possible shape:

# lib/ruby_llm/embedding.rb
def self.embed(text,
               model: nil,
               provider: nil,
               assume_model_exists: false,
               context: nil,
               dimensions: nil,
               **provider_params)
  # existing setup...

  RubyLLM.instrument('embedding.ruby_llm', payload, config: config) do |event|
    result = provider_instance.embed(
      text,
      model: model_id,
      dimensions: dimensions,
      **provider_params
    )

    # existing instrumentation...
    result
  end
end

# lib/ruby_llm/providers/vertexai/embeddings.rb
def render_embedding_payload(text, model:, dimensions:, task_type: nil, title: nil)
  instances = [text].flatten.map do |t|
    instance = { content: t.to_s }
    instance[:task_type] = task_type if task_type
    instance[:title] = title if title
    instance
  end

  { instances: instances }.tap do |payload|
    payload[:parameters] = { outputDimensionality: dimensions } if dimensions
  end
end

The provider's embed method may also need to forward these keyword arguments to render_embedding_payload, depending on the method signature that the provider protocol currently defines.
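
To show the forwarding end to end, here is a self-contained sketch with a hypothetical stand-in class, SketchVertexEmbeddings. It is not RubyLLM's provider class and makes no HTTP request; it only shows extra keyword arguments passing from embed into the payload builder.

```ruby
require 'json'

# Hypothetical stand-in for the VertexAI embeddings provider (not RubyLLM code).
class SketchVertexEmbeddings
  # Accepts provider-specific options and forwards them unchanged.
  def embed(text, model:, dimensions: nil, **provider_params)
    # A real provider would POST this payload; the sketch simply returns it.
    render_embedding_payload(text, model: model, dimensions: dimensions, **provider_params)
  end

  # model: is kept only to mirror the signature shown in the suggested fix.
  def render_embedding_payload(text, model:, dimensions:, task_type: nil, title: nil)
    instances = Array(text).map do |t|
      instance = { content: t.to_s }
      instance[:task_type] = task_type if task_type
      instance[:title] = title if title
      instance
    end

    payload = { instances: instances }
    payload[:parameters] = { outputDimensionality: dimensions } if dimensions
    payload
  end
end

payload = SketchVertexEmbeddings.new.embed(
  'hello',
  model: 'gemini-embedding-001',
  dimensions: 1536,
  task_type: 'SEMANTIC_SIMILARITY'
)
puts JSON.pretty_generate(payload)
```

Because render_embedding_payload lists its keywords explicitly, an option it does not recognise still raises ArgumentError instead of being dropped, which keeps failures visible.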

Labels: bug (Something isn't working)