Commit 9ec0044

feat: add retry configuration parameters for StackitEmbedder and update README with embedder retry behavior
1 parent e4ffae2 commit 9ec0044

5 files changed

Lines changed: 172 additions & 10 deletions

infrastructure/rag/values.yaml

Lines changed: 9 additions & 0 deletions
@@ -192,6 +192,14 @@ backend:
192192
stackitEmbedder:
193193
STACKIT_EMBEDDER_MODEL: "intfloat/e5-mistral-7b-instruct"
194194
STACKIT_EMBEDDER_BASE_URL: https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1
195+
# Retry settings (optional). If omitted, fall back to shared RETRY_DECORATOR_* values.
196+
STACKIT_EMBEDDER_MAX_RETRIES: "5"
197+
STACKIT_EMBEDDER_RETRY_BASE_DELAY: "0.5"
198+
STACKIT_EMBEDDER_RETRY_MAX_DELAY: "600"
199+
STACKIT_EMBEDDER_BACKOFF_FACTOR: "2"
200+
STACKIT_EMBEDDER_ATTEMPT_CAP: "6"
201+
STACKIT_EMBEDDER_JITTER_MIN: "0.05"
202+
STACKIT_EMBEDDER_JITTER_MAX: "0.25"
195203
ollama:
196204
OLLAMA_MODEL: "llama3.2:3b-instruct-fp16"
197205
OLLAMA_BASE_URL: "http://rag-ollama:11434"
@@ -314,6 +322,7 @@ adminBackend:
314322
summarizer:
315323
SUMMARIZER_MAXIMUM_INPUT_SIZE: "8000"
316324
SUMMARIZER_MAXIMUM_CONCURRENCY: "10"
325+
# Retry settings (optional). If omitted, fall back to shared RETRY_DECORATOR_* values.
317326
SUMMARIZER_MAX_RETRIES: "5"
318327
SUMMARIZER_RETRY_BASE_DELAY: "0.5"
319328
SUMMARIZER_RETRY_MAX_DELAY: "600"

libs/README.md

Lines changed: 29 additions & 0 deletions
@@ -8,6 +8,7 @@ It consists of the following python packages:
88
- [1.1 Requirements](#11-requirements)
99
- [1.2 Endpoints](#12-endpoints)
1010
- [1.3 Replaceable parts](#13-replaceable-parts)
11+
- [1.4 Embedder retry behavior](#14-embedder-retry-behavior)
1112
- [`2. Admin API lib`](#2-admin-api-lib)
1213
- [2.1 Requirements](#21-requirements)
1314
- [2.2 Endpoints](#22-endpoints)
@@ -99,6 +100,34 @@ Uploaded documents are required to contain the following metadata:
99100
| chat_endpoint | [`rag_core_api.api_endpoints.chat.Chat`](./rag-core-api/src/rag_core_api/api_endpoints/chat.py) | [`rag_core_api.impl.api_endpoints.default_chat.DefaultChat`](./rag-core-api/src/rag_core_api/impl/api_endpoints/default_chat.py) | Implementation of the chat endpoint. Default implementation just calls the *traced_chat_graph* |
100101
| ragas_llm | `langchain_core.language_models.chat_models.BaseChatModel` | `langchain_openai.ChatOpenAI` or `langchain_ollama.ChatOllama` | The LLM used for the ragas evaluation. |
101102

103+
### 1.4 Embedder retry behavior
104+
105+
The default STACKIT embedder implementation (`StackitEmbedder`) uses the shared retry decorator with exponential backoff from the core library.
106+
107+
- Decorator: `rag_core_lib.impl.utils.retry_decorator.retry_with_backoff`
108+
- Base settings (fallback): [`RetryDecoratorSettings`](./rag-core-lib/src/rag_core_lib/impl/settings/retry_decorator_settings.py)
109+
- Per-embedder overrides: [`StackitEmbedderSettings`](./rag-core-api/src/rag_core_api/impl/settings/stackit_embedder_settings.py)
110+
111+
How it resolves settings
112+
113+
- Each retry-related field in `StackitEmbedderSettings` is optional. When a field is provided (not None), it overrides the corresponding value from `RetryDecoratorSettings`.
114+
- When a field is not provided (None), the embedder falls back to the value from `RetryDecoratorSettings`.
115+
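- Zero values (e.g., `0` or `0.0`) are honored wherever the field's validation allows them and do not trigger fallback, because the check is `is not None` rather than truthiness. `max_retries` must be a positive integer, so `0` is rejected for that field.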
116+
- The effective retry configuration is computed once per embedder instance at initialization.
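
The resolution rules above can be sketched in a few lines. This is a minimal illustration, not the library's code: `RetryDefaults`, `EmbedderOverrides`, and `resolve` are hypothetical stand-ins for `RetryDecoratorSettings`, the optional fields of `StackitEmbedderSettings`, and the merge step, and the default values are made up for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RetryDefaults:
    """Hypothetical stand-in for RetryDecoratorSettings (illustrative values)."""

    max_retries: int = 5
    retry_base_delay: float = 0.5
    jitter_min: float = 0.05


@dataclass
class EmbedderOverrides:
    """Hypothetical stand-in for the optional retry fields of StackitEmbedderSettings."""

    max_retries: Optional[int] = None
    retry_base_delay: Optional[float] = None
    jitter_min: Optional[float] = None


def resolve(overrides: EmbedderOverrides, defaults: RetryDefaults) -> RetryDefaults:
    """Take each override that is not None; otherwise keep the default."""
    resolved = {}
    for f in fields(defaults):
        value = getattr(overrides, f.name)
        # Compare against None, not truthiness, so 0 and 0.0 are kept as overrides.
        resolved[f.name] = value if value is not None else getattr(defaults, f.name)
    return RetryDefaults(**resolved)


# max_retries and jitter_min are overridden; retry_base_delay falls back.
effective = resolve(EmbedderOverrides(max_retries=3, jitter_min=0.0), RetryDefaults())
print(effective)
```

Comparing against `None` instead of relying on truthiness is what lets an explicit `0.0` jitter survive the merge.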
117+
118+
Configuring via environment variables
119+
120+
- Embedder-specific (prefix `STACKIT_EMBEDDER_`):
121+
- `STACKIT_EMBEDDER_MAX_RETRIES`
122+
- `STACKIT_EMBEDDER_RETRY_BASE_DELAY`
123+
- `STACKIT_EMBEDDER_RETRY_MAX_DELAY`
124+
- `STACKIT_EMBEDDER_BACKOFF_FACTOR`
125+
- `STACKIT_EMBEDDER_ATTEMPT_CAP`
126+
- `STACKIT_EMBEDDER_JITTER_MIN`
127+
- `STACKIT_EMBEDDER_JITTER_MAX`
128+
- Global fallback (prefix `RETRY_DECORATOR_`): see section [4.2](#42-retry-decorator-exponential-backoff) for all keys and defaults.
129+
- Helm chart: set the same keys under `backend.envs.stackitEmbedder` in [infrastructure/rag/values.yaml](../infrastructure/rag/values.yaml).
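
As a rough illustration of how an unset variable differs from one set to zero, the sketch below reads prefixed variables with the standard library only. The real settings classes use `pydantic_settings`; the helper `read_optional_float` and its `env` parameter are hypothetical and exist only for this example.

```python
import os
from typing import Mapping, Optional

PREFIX = "STACKIT_EMBEDDER_"


def read_optional_float(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """Return the prefixed variable as a float, or None when it is not set."""
    source = os.environ if env is None else env
    raw = source.get(PREFIX + name)
    # None signals "not configured", which is what triggers the global fallback.
    return None if raw is None else float(raw)


# Example with an explicit mapping instead of the process environment.
example_env = {"STACKIT_EMBEDDER_RETRY_BASE_DELAY": "0.5", "STACKIT_EMBEDDER_JITTER_MIN": "0"}
print(read_optional_float("RETRY_BASE_DELAY", example_env))
print(read_optional_float("JITTER_MIN", example_env))
print(read_optional_float("RETRY_MAX_DELAY", example_env))
```

A variable that is set to `"0"` yields `0.0` and counts as an override, while a missing variable yields `None`.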
130+
102131
## 2. Admin API Lib
103132

104133
The Admin API Library contains all required components for file management capabilities for RAG systems, handling all document lifecycle operations. It also includes a default `dependency_container` that is pre-configured and should fit most use cases.

libs/rag-core-api/src/rag_core_api/dependency_container.py

Lines changed: 3 additions & 1 deletion
@@ -63,6 +63,7 @@
6363
from rag_core_lib.impl.settings.langfuse_settings import LangfuseSettings
6464
from rag_core_lib.impl.settings.ollama_llm_settings import OllamaSettings
6565
from rag_core_lib.impl.settings.rag_class_types_settings import RAGClassTypeSettings
66+
from rag_core_lib.impl.settings.retry_decorator_settings import RetryDecoratorSettings
6667
from rag_core_lib.impl.settings.stackit_vllm_settings import StackitVllmSettings
6768
from rag_core_lib.impl.tracers.langfuse_traced_chain import LangfuseTracedGraph
6869
from rag_core_lib.impl.utils.async_threadsafe_semaphore import AsyncThreadsafeSemaphore
@@ -89,6 +90,7 @@ class DependencyContainer(DeclarativeContainer):
8990
stackit_embedder_settings = StackitEmbedderSettings()
9091
chat_history_settings = ChatHistorySettings()
9192
sparse_embedder_settings = SparseEmbedderSettings()
93+
retry_decorator_settings = RetryDecoratorSettings()
9294
chat_history_config.from_dict(chat_history_settings.model_dump())
9395

9496
class_selector_config.from_dict(rag_class_type_settings.model_dump() | embedder_class_type_settings.model_dump())
@@ -98,7 +100,7 @@ class DependencyContainer(DeclarativeContainer):
98100
ollama=Singleton(
99101
LangchainCommunityEmbedder, embedder=Singleton(OllamaEmbeddings, **ollama_embedder_settings.model_dump())
100102
),
101-
stackit=Singleton(StackitEmbedder, stackit_embedder_settings),
103+
stackit=Singleton(StackitEmbedder, stackit_embedder_settings, retry_decorator_settings),
102104
)
103105

104106
sparse_embedder = Singleton(FastEmbedSparse, **sparse_embedder_settings.model_dump())

libs/rag-core-api/src/rag_core_api/impl/embeddings/stackit_embedder.py

Lines changed: 74 additions & 8 deletions
@@ -1,29 +1,41 @@
11
"""Module that contains the StackitEmbedder class."""
22

33
from langchain_core.embeddings import Embeddings
4-
from openai import OpenAI
4+
from openai import OpenAI, APIConnectionError, APIError, APITimeoutError, RateLimitError
55

66
from rag_core_api.embeddings.embedder import Embedder
77
from rag_core_api.impl.settings.stackit_embedder_settings import StackitEmbedderSettings
8+
import logging
9+
from rag_core_lib.impl.settings.retry_decorator_settings import RetryDecoratorSettings
10+
from rag_core_lib.impl.utils.retry_decorator import retry_with_backoff
11+
12+
logger = logging.getLogger(__name__)
813

914

1015
class StackitEmbedder(Embedder, Embeddings):
1116
"""Embedder that calls the STACKIT OpenAI-compatible embeddings API."""
1217

13-
def __init__(self, stackit_embedder_settings: StackitEmbedderSettings):
18+
def __init__(
19+
self, stackit_embedder_settings: StackitEmbedderSettings, retry_decorator_settings: RetryDecoratorSettings
20+
):
1421
"""
1522
Initialize the StackitEmbedder with the given settings.
1623
1724
Parameters
1825
----------
1926
stackit_embedder_settings : StackitEmbedderSettings
2027
The settings for configuring the StackitEmbedder, including the API key and base URL.
28+
retry_decorator_settings : RetryDecoratorSettings
29+
Default retry settings used as fallback when StackitEmbedderSettings leaves fields unset.
2130
"""
2231
self._client = OpenAI(
2332
api_key=stackit_embedder_settings.api_key,
2433
base_url=stackit_embedder_settings.base_url,
2534
)
2635
self._settings = stackit_embedder_settings
36+
self._retry_decorator_settings = self._create_retry_decorator_settings(
37+
stackit_embedder_settings, retry_decorator_settings
38+
)
2739

2840
def get_embedder(self) -> "StackitEmbedder":
2941
"""Return the embedder instance.
@@ -48,12 +60,16 @@ def embed_documents(self, texts: list[str]) -> list[list[float]]:
4860
list[list[float]]
4961
A list where each element is a list of floats representing the embedded vector of a document.
5062
"""
51-
responses = self._client.embeddings.create(
52-
input=texts,
53-
model=self._settings.model,
54-
)
5563

56-
return [data.embedding for data in responses.data]
64+
@self._retry_with_backoff_wrapper()
65+
def _call(texts: list[str]) -> list[list[float]]:
66+
responses = self._client.embeddings.create(
67+
input=texts,
68+
model=self._settings.model,
69+
)
70+
return [data.embedding for data in responses.data]
71+
72+
return _call(texts)
5773

5874
def embed_query(self, text: str) -> list[float]:
5975
"""
@@ -69,4 +85,54 @@ def embed_query(self, text: str) -> list[float]:
6985
list[float]
7086
The embedded representation of the query text.
7187
"""
72-
return self.embed_documents([text])[0]
88+
embeddings_list = self.embed_documents([text])
89+
if embeddings_list:
90+
embeddings = embeddings_list[0]
91+
return embeddings if embeddings else []
92+
logger.warning("No embeddings found for query: %s", text)
93+
return []  # no embeddings returned; keep the declared list[float] return type
94+
95+
def _create_retry_decorator_settings(
96+
self,
97+
stackit_settings: StackitEmbedderSettings,
98+
retry_defaults: RetryDecoratorSettings,
99+
) -> RetryDecoratorSettings:
100+
# Prefer values from StackitEmbedderSettings when provided;
101+
# otherwise fall back to RetryDecoratorSettings defaults
102+
return RetryDecoratorSettings(
103+
max_retries=(
104+
stackit_settings.max_retries if stackit_settings.max_retries is not None else retry_defaults.max_retries
105+
),
106+
retry_base_delay=(
107+
stackit_settings.retry_base_delay
108+
if stackit_settings.retry_base_delay is not None
109+
else retry_defaults.retry_base_delay
110+
),
111+
retry_max_delay=(
112+
stackit_settings.retry_max_delay
113+
if stackit_settings.retry_max_delay is not None
114+
else retry_defaults.retry_max_delay
115+
),
116+
backoff_factor=(
117+
stackit_settings.backoff_factor
118+
if stackit_settings.backoff_factor is not None
119+
else retry_defaults.backoff_factor
120+
),
121+
attempt_cap=(
122+
stackit_settings.attempt_cap if stackit_settings.attempt_cap is not None else retry_defaults.attempt_cap
123+
),
124+
jitter_min=(
125+
stackit_settings.jitter_min if stackit_settings.jitter_min is not None else retry_defaults.jitter_min
126+
),
127+
jitter_max=(
128+
stackit_settings.jitter_max if stackit_settings.jitter_max is not None else retry_defaults.jitter_max
129+
),
130+
)
131+
132+
def _retry_with_backoff_wrapper(self):
133+
return retry_with_backoff(
134+
settings=self._retry_decorator_settings,
135+
exceptions=(APIError, RateLimitError, APITimeoutError, APIConnectionError),
136+
rate_limit_exceptions=(RateLimitError,),
137+
logger=logger,
138+
)
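
The pattern used in `embed_documents`, decorating an inner closure so that the retry settings come from the instance, can be sketched without the real dependencies. Everything here is hypothetical: `retry` is a deliberately simplified decorator (fixed delay, no backoff or jitter) and is not `retry_with_backoff`, and `FlakyClient` stands in for the OpenAI client.

```python
import time
from functools import wraps


def retry(max_retries, exceptions, delay=0.0, sleep=time.sleep):
    """Simplified retry decorator: up to max_retries retries after the first attempt."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted, surface the last error
                    sleep(delay)

        return wrapper

    return decorator


class FlakyClient:
    """Hypothetical client that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def embed(self, texts):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return [[float(len(t))] for t in texts]


def embed_documents(client, texts):
    # Decorating an inner function lets the decorator use per-call or per-instance settings.
    @retry(max_retries=3, exceptions=(ConnectionError,))
    def _call(texts):
        return client.embed(texts)

    return _call(texts)


client = FlakyClient(failures=2)
result = embed_documents(client, ["ab", "cde"])
print(result, client.calls)
```

Here `max_retries` counts retries after the initial attempt, matching the description of the setting in `StackitEmbedderSettings`.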

libs/rag-core-api/src/rag_core_api/impl/settings/stackit_embedder_settings.py

Lines changed: 57 additions & 1 deletion
@@ -1,6 +1,7 @@
11
"""Module contains settings regarding the stackit embedder."""
22

3-
from pydantic import Field
3+
from typing import Optional
4+
from pydantic import Field, PositiveInt
45
from pydantic_settings import BaseSettings
56

67

@@ -17,6 +18,20 @@ class StackitEmbedderSettings(BaseSettings):
1718
(default "https://e629124b-accc-4e25-a1cc-dc57ac741e1d.model-serving.eu01.onstackit.cloud/v1").
1819
api_key : str
1920
The API key for authentication.
21+
max_retries: Optional[PositiveInt]
22+
Total retries, not counting the initial attempt.
23+
retry_base_delay: Optional[float]
24+
Base delay in seconds for the first retry.
25+
retry_max_delay: Optional[float]
26+
Maximum delay cap in seconds for any single wait.
27+
backoff_factor: Optional[float]
28+
Exponential backoff factor (>= 1).
29+
attempt_cap: Optional[int]
30+
Cap for exponent growth (backoff_factor ** attempt_cap).
31+
jitter_min: Optional[float]
32+
Minimum jitter in seconds.
33+
jitter_max: Optional[float]
34+
Maximum jitter in seconds.
2035
"""
2136

2237
class Config:
@@ -28,3 +43,44 @@ class Config:
2843
model: str = Field(default="intfloat/e5-mistral-7b-instruct")
2944
base_url: str = Field(default="https://e629124b-accc-4e25-a1cc-dc57ac741e1d.model-serving.eu01.onstackit.cloud/v1")
3045
api_key: str = Field(default="")
46+
max_retries: Optional[PositiveInt] = Field(
47+
default=None,
48+
title="Max Retries",
49+
description="Total retries, not counting the initial attempt.",
50+
)
51+
retry_base_delay: Optional[float] = Field(
52+
default=None,
53+
ge=0,
54+
title="Retry Base Delay",
55+
description="Base delay in seconds for the first retry.",
56+
)
57+
retry_max_delay: Optional[float] = Field(
58+
default=None,
59+
gt=0,
60+
title="Retry Max Delay",
61+
description="Maximum delay cap in seconds for any single wait.",
62+
)
63+
backoff_factor: Optional[float] = Field(
64+
default=None,
65+
ge=1.0,
66+
title="Backoff Factor",
67+
description="Exponential backoff factor (>= 1).",
68+
)
69+
attempt_cap: Optional[int] = Field(
70+
default=None,
71+
ge=0,
72+
title="Attempt Cap",
73+
description="Cap for exponent growth (backoff_factor ** attempt_cap).",
74+
)
75+
jitter_min: Optional[float] = Field(
76+
default=None,
77+
ge=0.0,
78+
title="Jitter Min (s)",
79+
description="Minimum jitter in seconds.",
80+
)
81+
jitter_max: Optional[float] = Field(
82+
default=None,
83+
ge=0.0,
84+
title="Jitter Max (s)",
85+
description="Maximum jitter in seconds.",
86+
)
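
One plausible way these parameters combine into a wait time is sketched below. This is an assumption based only on the field descriptions above, not the actual implementation of `retry_with_backoff`, which may order the cap and jitter differently; `backoff_delay` is a hypothetical helper, and its defaults mirror the example values from `values.yaml`.

```python
import random


def backoff_delay(
    attempt,
    base_delay=0.5,
    max_delay=600.0,
    backoff_factor=2.0,
    attempt_cap=6,
    jitter_min=0.05,
    jitter_max=0.25,
):
    """Assumed formula: capped exponential delay plus uniform jitter."""
    # attempt is 0 for the first retry; the exponent stops growing at attempt_cap.
    exponent = min(attempt, attempt_cap)
    delay = min(base_delay * backoff_factor**exponent, max_delay)
    return delay + random.uniform(jitter_min, jitter_max)


# With jitter disabled the delays are deterministic.
for attempt in range(8):
    print(attempt, backoff_delay(attempt, jitter_min=0.0, jitter_max=0.0))
```

Under these assumptions the delay doubles from 0.5 s until the exponent reaches `attempt_cap`, then stays flat, and `max_delay` only takes effect when it is smaller than that plateau.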
