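fix: Add a vLLM Embedding provider. When newer vLLM versions serve embedding models such as bge-m3 whose output dimension cannot be reduced, they reject the dimensions parameter, but the existing OpenAI Embedding provider always sends the vector-dimension parameter, so requests to vLLM embedding endpoints fail. #8236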
Conversation
Code Review
This pull request introduces significant updates to the AstrBot configuration and provider management, including the addition of new embedding providers (NVIDIA, Ollama, vLLM) and support for MiniMax Token Plan. It also refactors the embedding dimension handling to improve compatibility with vLLM, which does not support the dimensions parameter. My review identified several areas for improvement: the manual dimension input requirement in the dashboard should be refined to support automatic filling for standard providers, redundant logic for dimension inference and error handling in embedding sources should be refactored into shared utilities, and a type mismatch in the vLLM embedding configuration template needs to be corrected to ensure successful validation.
| // [Disabled] No longer writes to the config file automatically; only shows a notice | ||
| // providerConfig.embedding_dimensions = response.data.data.embedding_dimensions | ||
| useToast().success("获取成功: " + response.data.data.embedding_dimensions) | ||
| useToast().info(`检测到维度: ${response.data.data.embedding_dimensions}。如需保存,请手动填入后点保存。`) |
| model_dims = { | ||
| "bge-m3": 1024, | ||
| "bge-large-en-v1.5": 1024, | ||
| "bge-large-zh-v1.5": 1024, | ||
| "text-embedding-3-small": 1536, | ||
| "text-embedding-3-large": 3072, | ||
| "text-embedding-ada-002": 1536, | ||
| } | ||
| for model_key, dim in model_dims.items(): |
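The `model_dims` dictionary defined here, together with its dimension-inference logic, exactly duplicates `_COMMON_MODEL_DIMENSIONS` in `vllm_embedding_source.py`. Per the general rule, consider extracting this logic into a shared utility module to avoid duplication and improve maintainability.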
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
| except Exception as e: | ||
| # An error mentioning "matryoshka" or "dimensions" suggests that vLLM does not support the parameter | ||
| # Retry without dimensions | ||
| error_msg = str(e).lower() | ||
| if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"): | ||
| logger.warning( | ||
| f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}" | ||
| ) | ||
| kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"} | ||
| try: | ||
| embedding = await self.client.embeddings.create( | ||
| input=text, | ||
| model=self.model, | ||
| **kwargs_retry, | ||
| ) | ||
| logger.info( | ||
| "[OpenAI Embedding] Successfully retrieved embedding without dimensions parameter, marking as vLLM" | ||
| ) | ||
| # Mark as vLLM so that later calls also skip dimensions | ||
| self._mark_as_vllm() | ||
| return embedding.data[0].embedding | ||
| except Exception as retry_error: | ||
| logger.error( | ||
| f"[OpenAI Embedding] Retry without dimensions also failed: {retry_error}" | ||
| ) | ||
| raise retry_error | ||
| else: | ||
| raise | ||
|
|
||
| async def get_embeddings(self, text: list[str]) -> list[list[float]]: | ||
| """Get embeddings for a batch of texts""" | ||
| kwargs = self._embedding_kwargs() | ||
| embeddings = await self.client.embeddings.create( | ||
| input=text, | ||
| model=self.model, | ||
| **kwargs, | ||
| ) | ||
| return [item.embedding for item in embeddings.data] | ||
| try: | ||
| embeddings = await self.client.embeddings.create( | ||
| input=text, | ||
| model=self.model, | ||
| **kwargs, | ||
| ) | ||
| return [item.embedding for item in embeddings.data] | ||
| except Exception as e: | ||
| # An error mentioning "matryoshka" or "dimensions" suggests that vLLM does not support the parameter | ||
| # Retry without dimensions | ||
| error_msg = str(e).lower() | ||
| if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"): | ||
| logger.warning( | ||
| f"[OpenAI Embedding] Detected vLLM dimensions error in batch mode, retrying without dimensions: {e}" | ||
| ) | ||
| kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"} | ||
| try: | ||
| embeddings = await self.client.embeddings.create( | ||
| input=text, | ||
| model=self.model, | ||
| **kwargs_retry, | ||
| ) | ||
| logger.info( | ||
| "[OpenAI Embedding] Successfully retrieved batch embeddings without dimensions parameter" | ||
| ) | ||
| # Mark as vLLM so that later calls also skip dimensions | ||
| self._mark_as_vllm() | ||
| return [item.embedding for item in embeddings.data] | ||
| except Exception as retry_error: | ||
| logger.error( | ||
| f"[OpenAI Embedding] Batch retry without dimensions also failed: {retry_error}" | ||
| ) | ||
| raise retry_error | ||
| else: | ||
| raise |
The handling of vLLM dimension errors in `get_embedding` and `get_embeddings` (catching, logging, and retrying) is highly duplicated. Consider extracting it into a shared private helper (for example `_request_with_vllm_retry`) to reduce redundant code.
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
| "embedding_api_key": "", | ||
| "embedding_api_base": "", | ||
| "embedding_model": "", | ||
| "embedding_dimensions": "", |
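The review summary flags a type mismatch in this template; presumably it concerns `embedding_dimensions`, which defaults to an empty string where a number is expected. A minimal sketch, assuming a hypothetical helper name `parse_embedding_dimensions` that is not part of the PR, of how a provider could coerce such a value before using it:

```python
from __future__ import annotations

from typing import Any


def parse_embedding_dimensions(value: Any) -> int | None:
    """Coerce a config value to a positive int, or None when unset/invalid.

    Hypothetical helper for illustration; the name and the exact rules are
    assumptions, not code from the PR.
    """
    # None and booleans are never valid dimensions.
    if value is None or isinstance(value, bool):
        return None
    try:
        dim = int(str(value).strip())
    except ValueError:
        # Covers "" (blank template default) and sentinels like "vLLM-Adaptive".
        return None
    return dim if dim > 0 else None
```

With a helper like this, a blank template field and a sentinel string both resolve to "no dimension configured" rather than failing at request time.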
Hey - I've found 4 issues, and left some high level feedback:
- This PR mixes the new vLLM Embedding provider with a large number of unrelated default-config changes (dashboard password handling, enabling many channels by default, new CUA/sandbox and websearch options, wording tweaks, etc.); consider splitting the provider work into a focused PR so behavior changes to existing deployments are easier to review and roll back.
- You now both add a dedicated `vllm_embedding` provider and add vLLM auto-detection/`dimensions` workarounds into `openai_embedding_source.py` (including the magic API key value `"vllm"`); it would be clearer and less surprising to keep vLLM-specific behavior confined to the new provider or explicitly document why OpenAI-style providers should also try to mutate behavior based on naming heuristics.
- The new `vLLM Embedding` template `hint` is a hard-coded Chinese string instead of an i18n key like the nearby providers; aligning it with the existing localization system (and adding the translation entry) will keep the UI consistent across languages.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- This PR mixes the new vLLM Embedding provider with a large number of unrelated default-config changes (dashboard password handling, enabling many channels by default, new CUA/sandbox and websearch options, wording tweaks, etc.); consider splitting the provider work into a focused PR so behavior changes to existing deployments are easier to review and roll back.
- You now both add a dedicated `vllm_embedding` provider and add vLLM auto-detection/`dimensions` workarounds into `openai_embedding_source.py` (including the magic API key value `"vllm"`); it would be clearer and less surprising to keep vLLM-specific behavior confined to the new provider or explicitly document why OpenAI-style providers should also try to mutate behavior based on naming heuristics.
- The new `vLLM Embedding` template `hint` is a hard-coded Chinese string instead of an i18n key like the nearby providers; aligning it with the existing localization system (and adding the translation entry) will keep the UI consistent across languages.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="83-63" />
<code_context>
- **kwargs,
- )
- return [item.embedding for item in embeddings.data]
+ try:
+ embeddings = await self.client.embeddings.create(
+ input=text,
+ model=self.model,
+ **kwargs,
+ )
+ return [item.embedding for item in embeddings.data]
+ except Exception as e:
+ # An error mentioning "matryoshka" or "dimensions" suggests that vLLM does not support the parameter
</code_context>
<issue_to_address>
**suggestion:** The vLLM `dimensions` retry logic is duplicated between single and batch embedding methods.
The vLLM-specific error handling (`matryoshka`/`dimensions` detection, stripping `dimensions` from kwargs, retry, and marking as vLLM) is duplicated in both `get_embedding` and `get_embeddings`. Please extract this into a shared helper (e.g., `_create_embeddings_with_optional_dimensions_retry(input, kwargs)`) that both methods call so vLLM compatibility logic is maintained in one place.
Suggested implementation:
```python
async def _create_embeddings_with_optional_dimensions_retry(
    self,
    input: Any,
    **kwargs: Any,
):
    """
    Call the embeddings endpoint and, if vLLM turns out to be incompatible with
    dimensions/matryoshka, retry once without the dimensions parameter.
    """
    try:
        return await self.client.embeddings.create(
            input=input,
            model=self.model,
            **kwargs,
        )
    except Exception as e:
        # An error mentioning "matryoshka" or "dimensions" suggests that vLLM
        # does not support the parameter, so retry without dimensions.
        error_msg = str(e).lower()
        if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
            logger.warning(
                f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions parameter: {e}"
            )
            kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
            # To mark the backend as vLLM, set a state flag here, e.g. self._is_vllm = True
            return await self.client.embeddings.create(
                input=input,
                model=self.model,
                **kwargs_retry,
            )
        # Not a vLLM-related error: re-raise.
        raise

async def get_embedding(self, text: str) -> list[float]:
    """Get the embedding for a single text."""
    kwargs = self._embedding_kwargs()
    embeddings = await self._create_embeddings_with_optional_dimensions_retry(
        input=text,
        **kwargs,
    )
    return embeddings.data[0].embedding
```
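1. Make sure `Any` is imported at the top of the file (if it is not already), for example:
- `from typing import Any`, or `import typing as t` with the signature changed to `input: t.Any, **kwargs: t.Any`, to match the project's existing typing import style.
2. In the batch embedding method (usually named `async def get_embeddings(...)` or similar), remove the current `try/except` vLLM handling and instead:
- Build `kwargs = self._embedding_kwargs()`
- Call `embeddings = await self._create_embeddings_with_optional_dimensions_retry(input=texts, **kwargs)`
- Return `[item.embedding for item in embeddings.data]`
3. If the class already has logic for marking the backend as vLLM (for example `self._is_vllm` or `self._mark_vllm_backend()`), move that marking into the vLLM branch of `_create_embeddings_with_optional_dimensions_retry` so that it is not set separately in the single and batch methods.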
</issue_to_address>
### Comment 2
<location path="astrbot/dashboard/routes/config.py" line_range="914" />
<code_context>
+ if "matryoshka" in err_msg or "dimensions" in err_msg:
+ logger.info("Detected vLLM specific error, bypassing...")
+ # Fake a successful response to tell the frontend to enter "compatibility mode"
+ return Response().ok({"embedding_dimensions": "vLLM-Adaptive"}).__dict__
return Response().error(f"获取嵌入维度失败: {e!s}").__dict__
</code_context>
<issue_to_address>
**issue (bug_risk):** Embedding dimension API now returns a string sentinel, which may break consumers expecting an integer.
The new vLLM path returns `{"embedding_dimensions": "vLLM-Adaptive"}`, changing the type from number to string. While the Vue config UI handles this, other consumers of `/api/config/provider/get_embedding_dim` may still expect an integer (e.g. schema validation or `int(...)` casts) and break.
To avoid that, consider keeping `embedding_dimensions` numeric and conveying the adaptive mode via a separate field, for example:
- `{ embedding_dimensions: 0, vllm_adaptive: true }`, or
- `{ embedding_dimensions: 0, mode: "vllm_adaptive" }`.
This preserves the existing type contract while still signaling vLLM adaptive behavior.
</issue_to_address>
### Comment 3
<location path="astrbot/core/provider/sources/openai_embedding_source.py" line_range="80" />
<code_context>
+ self._is_vllm_detected = True
+ logger.info("[OpenAI Embedding] Marked as vLLM (runtime detection via error)")
async def get_embedding(self, text: str) -> list[float]:
"""获取文本的嵌入"""
kwargs = self._embedding_kwargs()
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the embedding provider to centralize vLLM handling, retry logic, and dimension inference into shared helpers to avoid duplication and noisy logging.
You can keep all the new behavior but significantly reduce complexity with a small refactor inside this provider.
### 1. Deduplicate vLLM retry / error handling
`get_embedding` and `get_embeddings` are almost identical apart from the input/output shape. You can move the try/except + retry + `_mark_as_vllm` logic into a single helper:
```python
from typing import Any, Sequence

async def _create_embeddings(self, input: Any) -> Sequence[Sequence[float]]:
    kwargs = self._embedding_kwargs()
    try:
        resp = await self.client.embeddings.create(
            input=input,
            model=self.model,
            **kwargs,
        )
        return [item.embedding for item in resp.data]
    except Exception as e:
        error_msg = str(e).lower()
        if ("matryoshka" in error_msg or "dimensions" in error_msg) and kwargs.get("dimensions"):
            logger.warning(
                f"[OpenAI Embedding] Detected vLLM dimensions error, retrying without dimensions: {e}"
            )
            kwargs_retry = {k: v for k, v in kwargs.items() if k != "dimensions"}
            try:
                resp = await self.client.embeddings.create(
                    input=input,
                    model=self.model,
                    **kwargs_retry,
                )
                self._mark_as_vllm()
                logger.info(
                    "[OpenAI Embedding] Successfully retrieved embeddings without dimensions parameter, marking as vLLM"
                )
                return [item.embedding for item in resp.data]
            except Exception as retry_error:
                logger.error(
                    f"[OpenAI Embedding] Retry without dimensions also failed: {retry_error}"
                )
                raise retry_error
        raise

async def get_embedding(self, text: str) -> list[float]:
    return (await self._create_embeddings(text))[0]

async def get_embeddings(self, text: list[str]) -> list[list[float]]:
    return list(await self._create_embeddings(text))
```
This keeps all existing behavior (including runtime vLLM detection) but centralizes it.
### 2. Centralize vLLM detection within `_embedding_kwargs`
Right now vLLM heuristics live in `_is_vllm`, `_mark_as_vllm`, `_embedding_kwargs`, and the error handler. You can at least confine detection/decision to `_is_vllm` and `_mark_as_vllm`, and keep `_embedding_kwargs` “dumb”:
```python
def _embedding_kwargs(self) -> dict:
    kwargs: dict[str, Any] = {}
    embedding_dim_config = self.provider_config.get("embedding_dimensions", "")
    provider_id = self.provider_config.get("id", "unknown")
    if self._is_vllm():
        # vLLM never gets dimensions here
        logger.debug(
            f"[OpenAI Embedding] {provider_id}: vLLM detected, skipping dimensions (config='{embedding_dim_config}')"
        )
        return kwargs
    if embedding_dim_config:
        try:
            dim_value = int(embedding_dim_config)
            kwargs["dimensions"] = dim_value
            logger.debug(
                f"[OpenAI Embedding] {provider_id}: Added dimensions parameter: {dim_value}"
            )
        except (ValueError, TypeError):
            logger.warning(
                f"[OpenAI Embedding] {provider_id}: embedding_dimensions is not a valid integer: "
                f"'{embedding_dim_config}', ignored."
            )
    return kwargs
```
All call sites (`get_embedding`, `get_embeddings`, `_create_embeddings`) simply call `_embedding_kwargs` without doing any extra vLLM-specific branching.
### 3. Reduce noisy logging in hot paths and share the model→dimension map
`_embedding_kwargs` and `get_dim` are on the critical path. Most of those `info` logs can be `debug`, and the model dimension map can be shared to keep behavior consistent with other providers:
```python
# module-level shared map
_MODEL_DIMS = {
    "bge-m3": 1024,
    "bge-large-en-v1.5": 1024,
    "bge-large-zh-v1.5": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def get_dim(self) -> int:
    provider_id = self.provider_config.get("id", "unknown")
    embedding_dim_config = self.provider_config.get("embedding_dimensions", "")
    if embedding_dim_config:
        try:
            dim = int(embedding_dim_config)
            if dim > 0:
                logger.debug(
                    f"[OpenAI Embedding] {provider_id}: Dimension from config: {dim}"
                )
                return dim
        except (ValueError, TypeError):
            logger.warning(
                f"[OpenAI Embedding] {provider_id}: embedding_dimensions is not a valid integer: "
                f"'{embedding_dim_config}', trying model inference"
            )
    model = self.provider_config.get("embedding_model", "").lower()
    for model_key, dim in _MODEL_DIMS.items():
        if model_key in model:
            logger.debug(
                f"[OpenAI Embedding] {provider_id}: Inferred dimension {dim} from model: {model}"
            )
            return dim
    logger.warning(
        f"[OpenAI Embedding] {provider_id}: Could not determine dimension "
        f"(model: {model}, config: '{embedding_dim_config}')"
    )
    return 0
```
This keeps all current functionality (vLLM detection, auto-dim inference, retry behavior) but shrinks the mental surface area of the class and makes future changes (e.g., adjusting retry heuristics or model dims) safer and localized.
</issue_to_address>
### Comment 4
<location path="astrbot/core/provider/sources/vllm_embedding_source.py" line_range="33" />
<code_context>
+ provider_type=ProviderType.EMBEDDING,
+ provider_display_name="vLLM Embedding",
+)
+class VLLMEmbeddingProvider(EmbeddingProvider):
+ def __init__(self, provider_config: dict, provider_settings: dict) -> None:
+ super().__init__(provider_config, provider_settings)
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring transport setup, model resolution, dimension inference, and logging into smaller shared utilities and linear flows to keep the provider’s behavior while making it easier to follow and maintain.
You can keep all current behavior while reducing complexity by factoring a few concerns out and simplifying some flows.
### 1. Simplify transport selection & runtime swap
You already compute `_force_direct_transport` in `__init__`, but `_ensure_runtime_ready` recomputes `_should_force_direct_transport()` and duplicates client‑construction logic.
You can make transport selection a one‑liner and centralize client creation, which removes cross‑method branching while preserving heuristics:
```python
def __init__(self, provider_config: dict, provider_settings: dict) -> None:
    super().__init__(provider_config, provider_settings)
    self.provider_config = provider_config
    self.provider_settings = provider_settings
    self.timeout = int(provider_config.get("timeout", 20) or 20)
    self.model = str(provider_config.get("embedding_model", "") or "").strip()
    self.set_model(self.model)
    self._force_direct_transport = self._should_force_direct_transport()
    self._direct_client_ready = self._force_direct_transport
    self._detected_dimension: int | None = None
    self._resolved_request_model: str | None = None
    self.client = self._build_openai_client(force_direct=self._force_direct_transport)

def _build_openai_client(self, force_direct: bool) -> AsyncOpenAI:
    return AsyncOpenAI(
        api_key=self.provider_config.get("embedding_api_key"),
        base_url=self._effective_api_base(),
        timeout=self.timeout,
        http_client=self._build_http_client(force_direct=force_direct),
    )

def _build_http_client(self, force_direct: bool) -> httpx.AsyncClient | None:
    proxy = str(self.provider_config.get("proxy", "") or "").strip()
    if proxy:
        logger.info("[vLLM Embedding] %s 使用显式代理: %s", self._provider_id(), proxy)
        return httpx.AsyncClient(proxy=proxy, timeout=self.timeout)
    if force_direct:
        return httpx.AsyncClient(timeout=self.timeout, trust_env=False)
    return None

async def _ensure_runtime_ready(self) -> None:
    if self._direct_client_ready or not self._force_direct_transport:
        return
    old_client = self.client
    self.client = self._build_openai_client(force_direct=True)
    self._direct_client_ready = True
    logger.info(
        "[vLLM Embedding] %s 检测到本地/内网端点,已切换为 trust_env=False 的直连 client。",
        self._provider_id(),
    )
    if old_client is not None and old_client is not self.client:
        try:
            await old_client.close()
        except Exception:
            logger.debug("[vLLM Embedding] %s 关闭旧 client 失败,已忽略。", self._provider_id())
```
This removes the second `_should_force_direct_transport()` call and keeps all behavior the same.
### 2. Share embedding dimension inference
`_COMMON_MODEL_DIMENSIONS` and `_infer_dimension_from_model` are likely identical to the OpenAI provider. You can move them to a shared utility to avoid duplication and keep the logic in one place:
```python
# embedding_dimensions.py (new module)
_COMMON_MODEL_DIMENSIONS = {
    "bge-m3": 1024,
    "bge-large-en-v1.5": 1024,
    "bge-large-zh-v1.5": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def infer_dimension_from_model(model_name: Any) -> int | None:
    normalized_model = str(model_name or "").strip().lower()
    for model_key, dimension in _COMMON_MODEL_DIMENSIONS.items():
        if model_key in normalized_model:
            return dimension
    return None
```
Then in this provider:
```python
from .embedding_dimensions import infer_dimension_from_model

def get_dim(self) -> int:
    configured_dim = self._configured_dimension()
    if configured_dim:
        return configured_dim
    if self._detected_dimension:
        return self._detected_dimension
    inferred_dim = infer_dimension_from_model(self.model)
    return inferred_dim or 0
```
This keeps behavior while reducing duplication and maintenance cost.
### 3. Reduce perceived complexity in logging
The current `info` logs are quite verbose and fire on every request. You can keep the diagnostics but move most of them to `debug` so normal logs are cleaner:
```python
async def get_embedding(self, text: str) -> list[float]:
    await self._ensure_runtime_ready()
    request_model = await self._resolve_request_model()
    logger.debug(
        "[vLLM Embedding] %s 单条 embedding 请求,model=%s,text_len=%s,跳过 dimensions。",
        self._provider_id(),
        request_model,
        len(text),
    )
    ...
```
Keep `info`/`warning` only for misconfigurations or fallbacks (`/models` failure, basename fallback, client switch), which makes the class easier to reason about in normal operation.
### 4. Make model resolution flow more linear
You can keep the `/models` robustness but tighten the control flow in `_resolve_request_model` to a straight, small decision tree:
```python
async def _resolve_request_model(self) -> str:
    if self._resolved_request_model is not None:
        return self._resolved_request_model
    configured_model = self.model
    if not configured_model:
        self._resolved_request_model = ""
        return ""
    available_models = await self._list_vllm_models()
    resolved = self._match_served_model(configured_model, available_models)
    if resolved:
        self._resolved_request_model = resolved
        if resolved != configured_model:
            logger.info(
                "[vLLM Embedding] %s 已将模型名 %s 对齐到 served-model-name %s。",
                self._provider_id(),
                configured_model,
                resolved,
            )
        return resolved
    basename_model = configured_model.rsplit("/", 1)[-1].strip()
    if basename_model and basename_model != configured_model:
        logger.warning(
            "[vLLM Embedding] %s 未能从 /models 精确匹配 %s,回退为 %s。",
            self._provider_id(),
            configured_model,
            basename_model,
        )
        self._resolved_request_model = basename_model
        return basename_model
    self._resolved_request_model = configured_model
    return configured_model
```
This doesn’t change the behavior, but it reads as a single linear decision with clear ordering and fewer early returns, which makes the model‑name reconciliation easier to follow.
</issue_to_address>
| if "matryoshka" in err_msg or "dimensions" in err_msg: | ||
| logger.info("Detected vLLM specific error, bypassing...") | ||
| # Fake a successful response to tell the frontend to enter "compatibility mode" | ||
| return Response().ok({"embedding_dimensions": "vLLM-Adaptive"}).__dict__ |
issue (bug_risk): Embedding dimension API now returns a string sentinel, which may break consumers expecting an integer.
The new vLLM path returns {"embedding_dimensions": "vLLM-Adaptive"}, changing the type from number to string. While the Vue config UI handles this, other consumers of /api/config/provider/get_embedding_dim may still expect an integer (e.g. schema validation or int(...) casts) and break.
To avoid that, consider keeping embedding_dimensions numeric and conveying the adaptive mode via a separate field, for example:
- { embedding_dimensions: 0, vllm_adaptive: true }, or
- { embedding_dimensions: 0, mode: "vllm_adaptive" }.
This preserves the existing type contract while still signaling vLLM adaptive behavior.
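To make the suggestion concrete, here is a minimal sketch of a payload builder that keeps `embedding_dimensions` numeric and signals the adaptive mode through a separate flag. The function names `build_embedding_dim_payload` and `is_vllm_dimensions_error` are hypothetical and do not exist in the PR; the `vllm_adaptive` field follows the first option proposed above.

```python
from __future__ import annotations

from typing import Any


def is_vllm_dimensions_error(exc: Exception) -> bool:
    """Same substring heuristic as the PR: look for matryoshka/dimensions."""
    msg = str(exc).lower()
    return "matryoshka" in msg or "dimensions" in msg


def build_embedding_dim_payload(dim: int | None, adaptive: bool = False) -> dict[str, Any]:
    """Build a response body whose `embedding_dimensions` is always an int.

    Hypothetical helper: 0 means "unknown", and the adaptive vLLM case is
    reported via a separate boolean instead of a string sentinel.
    """
    payload: dict[str, Any] = {"embedding_dimensions": int(dim or 0)}
    if adaptive:
        payload["vllm_adaptive"] = True
    return payload
```

In the route's error branch, the handler could then return `build_embedding_dim_payload(None, adaptive=True)` when `is_vllm_dimensions_error(e)` is true, so existing consumers that cast the value with `int(...)` keep working.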
This PR adds a dedicated built-in `vllm_embedding` provider to AstrBot and exposes it as a first-class Embedding provider in the WebUI.

Before this change, users who wanted to use vLLM's OpenAI-compatible Embedding endpoint had to either reuse `openai_embedding` or rely on extra runtime patch/plugin logic. In practice, that caused several problems:
- … `vLLM Embedding` provider card in the Add Provider dialog;
- … `dimensions` habit is not always compatible with vLLM embedding endpoints;
- … `served-model-name`, which is not obvious from the existing provider options;

![image](https://private-user-images.githubusercontent.com/219669423/594848869-3fdb1ed0-8781-4b93-b08d-da4ca43abe49.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Nzk1MjIxOTAsIm5iZiI6MTc3OTUyMTg5MCwicGF0aCI6Ii8yMTk2Njk0MjMvNTk0ODQ4ODY5LTNmZGIxZWQwLTg3ODEtNGI5My1iMDhkLWRhNGNhNDNhYmU0OS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTIzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUyM1QwNzM4MTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mMDFiMjEzYzQ1MWFiMWVlNGZlNDVhY2ZhNzQ1ZDI4YmM1ZWI4ODBkNGNlNTY0YTQ3MjRkMmE0OWQ0ZDQ2YjBjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9aW1hZ2UlMkZwbmcifQ.z1Q-yCVmvQY7KiM_2DBEXg1JhiYj1w3EfHBoQjThdTQ)
![image](https://private-user-images.githubusercontent.com/219669423/594848817-1a7e9d25-a9f4-49d7-883a-9a9f40c2f7f3.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Nzk1MjIxOTAsIm5iZiI6MTc3OTUyMTg5MCwicGF0aCI6Ii8yMTk2Njk0MjMvNTk0ODQ4ODE3LTFhN2U5ZDI1LWE5ZjQtNDlkNy04ODNhLTlhOWY0MGMyZjdmMy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTIzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUyM1QwNzM4MTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMmVjYTA4NzJjMGFhYTFkZjE5MTlmNGU0ZTM0NmRlZGM5MjA0ZmYzY2Y4ZWZjZTZhZTUyZGMxNTQ0ZjNiYmM5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9aW1hZ2UlMkZwbmcifQ.CszcU_nTQSRwaXU-X6DKzc9NQ3vTnwb7pJmPXj6Y2GA)

Modifications
- Added a new built-in provider source: `astrbot/core/provider/sources/vllm_embedding_source.py`.
- Added `vllm_embedding` import wiring in `astrbot/core/provider/manager.py`, so AstrBot can load the provider type through the normal core provider path.
- Added a `vLLM Embedding` provider template in `astrbot/core/config/default.py`, so the provider appears under `Embedding` in the Add Provider dialog.
- Normalized the default values of the new provider template: keep `id = vllm_embedding` and `timeout = 20`, while leaving other visible fields blank and defaulting `enable` to `false`.
- Implemented the provider behavior specifically for vLLM embedding compatibility instead of treating vLLM as a disguised OpenAI embedding provider:
  - … `dimensions` request parameter when sending embedding requests;
  - … `embedding_api_base` to the expected `/v1` style endpoint;
  - … `served-model-name` via `/models`;
- Included the supporting config/UI changes already present on this branch so provider hints are surfaced more clearly in the configuration form:
  - `astrbot/dashboard/routes/config.py`
  - `dashboard/src/components/shared/AstrBotConfig.vue`
  - `dashboard/src/i18n/locales/zh-CN/features/config-metadata.json`

This change makes `vLLM Embedding` a first-class built-in provider instead of requiring users to overload `openai_embedding` for vLLM.

This is NOT a breaking change.
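The endpoint normalization and model-name alignment mentioned in the modifications are not shown on this page. A minimal sketch of what they could look like follows; the function names `normalize_api_base` and `match_served_model` and the exact matching rules are assumptions for illustration, not the PR's actual implementation.

```python
from __future__ import annotations


def normalize_api_base(api_base: str) -> str:
    """Return the base URL ending in `/v1` (assumed normalization rule)."""
    base = (api_base or "").strip().rstrip("/")
    if not base:
        return ""
    # Users sometimes paste the full embeddings URL; strip that suffix.
    if base.endswith("/embeddings"):
        base = base[: -len("/embeddings")].rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base


def match_served_model(configured: str, available: list[str]) -> str | None:
    """Pick the served model name that best matches the configured one.

    Order (assumed): exact match, case-insensitive match, then a match on
    the last path segment (e.g. "BAAI/bge-m3" vs. "bge-m3").
    """
    configured = configured.strip()
    if not configured:
        return None
    if configured in available:
        return configured
    lowered = configured.lower()
    for name in available:
        if name.lower() == lowered:
            return name
    basename = lowered.rsplit("/", 1)[-1]
    for name in available:
        if name.lower().rsplit("/", 1)[-1] == basename:
            return name
    return None
```

Here `available` stands for the model IDs returned by the server's `/models` endpoint; when nothing matches, the caller can fall back to the configured name or its basename, as the review's `_resolve_request_model` suggestion does.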
Screenshots or Test Results
Verification steps executed on a mirrored AstrBot copy before pushing this branch:
- … `py_compile` succeeded for:
  - `astrbot/core/provider/sources/vllm_embedding_source.py`
  - `astrbot/core/provider/manager.py`
  - `astrbot/core/config/default.py`
- … `astrbot.core.provider.sources.vllm_embedding_source` succeeded.
  - `registered=True`
  - `class_name=VLLMEmbeddingProvider`
- … `vLLM Embedding` appears in `Embedding -> Add Provider`.
- `ID = vllm_embedding`, `Enable = false`, `API Key = empty`, `API Base URL = empty`, `Embedding Model = empty`, `Embedding Dimensions = empty`, `Timeout = 20`, `Proxy = empty`
- … `vllm_embedding` provider was created successfully in the mirrored AstrBot instance.
- … `openai`, `httpx`) that are already present in both `requirements.txt` and `pyproject.toml`.

Checklist
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
😮 My changes do not introduce malicious code.
Summary by Sourcery
Add a dedicated vLLM Embedding provider and related config/UI updates to improve compatibility with vLLM-based embedding backends and enhance provider configuration capabilities.
New Features:
Enhancements: