
Commit 6ad9d27

feat: support external embedding service for scaling (#248)
tested with (using both config.yaml and env vars):
- internal main_em.py service
- external local embedding service
- remotely hosted embedding service (IONOS)

---------

Signed-off-by: Anupam Kumar <kyteinsky@gmail.com>
1 parent 4c2b3f5 commit 6ad9d27

10 files changed

Lines changed: 225 additions & 37 deletions


README.md

Lines changed: 7 additions & 0 deletions
@@ -116,6 +116,13 @@ Make sure to restart the app after changing the config file. For docker, this wo

 This is a file copied from one of the two configurations (config.cpu.yaml or config.gpu.yaml) during app startup if `config.yaml` is not already present in the persistent storage. See the [Repair section](#repair) for details on the repair step that removes the config if you have a custom config.

+By default, the app spawns an embedding server backed by llama.cpp, which runs a local model on either CPU or GPU. The other option is to use a remote model served through an OpenAI-compatible API. The configuration for the remote model is also present in the sample config files.
+The API key or username/password for the remote API can be stored in the config file itself, or supplied through environment variables: `CC_EM_APIKEY` for the API key, and `CC_EM_USERNAME` and `CC_EM_PASSWORD` for the username and password respectively.
+To indicate the use of environment variables, set the value of `auth` in the config file to `from_env`, like so:
+```yaml
+auth: from_env
+```
+
 ## Repair
 v2.1.0 introduces repair steps. These run on app startup.

appinfo/info.xml

Lines changed: 25 additions & 0 deletions
@@ -56,6 +56,31 @@ Setup background job workers as described here: https://docs.nextcloud.com/serve
                 <display-name>Auto-download models from Huggingface</display-name>
                 <description>When set to "false", "0" or "no", initial download of the Huggingface models will be skipped in the init phase. They would have to be downloaded and placed in the persistent storage manually or through a mount point.</description>
             </variable>
+            <variable>
+                <name>CC_EM_BASE_URL</name>
+                <display-name>External OpenAI-compatible endpoint</display-name>
+                <description>Set this to an OpenAI-compatible endpoint like https://api.my-local-llm.lan/v1. When set, the internal embedding server is not started. For authentication, set CC_EM_APIKEY or CC_EM_USERNAME and CC_EM_PASSWORD as needed.</description>
+            </variable>
+            <variable>
+                <name>CC_EM_MODEL_NAME</name>
+                <display-name>External embedding model name</display-name>
+                <description>Model name to be used with the OpenAI-compatible endpoint set in CC_EM_BASE_URL. For example, "text-embedding-3-small" or any other model supported by the endpoint. If unset, no model name is sent in the requests.</description>
+            </variable>
+            <variable>
+                <name>CC_EM_APIKEY</name>
+                <display-name>API key for authentication to CC_EM_BASE_URL</display-name>
+                <description>API key to be used for authenticating requests to the OpenAI-compatible endpoint set in CC_EM_BASE_URL. Either this or CC_EM_USERNAME and CC_EM_PASSWORD should be set if the endpoint requires authentication.</description>
+            </variable>
+            <variable>
+                <name>CC_EM_USERNAME</name>
+                <display-name>Username for authentication to CC_EM_BASE_URL</display-name>
+                <description>Username to be used for authenticating requests to the OpenAI-compatible endpoint set in CC_EM_BASE_URL.</description>
+            </variable>
+            <variable>
+                <name>CC_EM_PASSWORD</name>
+                <display-name>Password for authentication to CC_EM_BASE_URL</display-name>
+                <description>Password to be used for authenticating requests to the OpenAI-compatible endpoint set in CC_EM_BASE_URL.</description>
+            </variable>
         </environment-variables>
     </external-app>
 </info>
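
As a rough illustration of what the two authentication options amount to on the wire, the sketch below builds the `Authorization` header for an API key (sent as a Bearer token, as `ApiKeyAuth` does in this commit) and for a username/password pair (HTTP Basic). The helper name `build_auth_header` and the sample credentials are hypothetical; the precedence of the API key over username/password mirrors the env-var handling in `config_parser.py`.

```python
import base64


def build_auth_header(env: dict) -> dict:
    """Hypothetical helper: map CC_EM_* variables to an Authorization header."""
    apikey = env.get('CC_EM_APIKEY')
    username = env.get('CC_EM_USERNAME')
    password = env.get('CC_EM_PASSWORD')

    if apikey:
        # the API key takes precedence and is sent as a Bearer token
        return {'Authorization': f'Bearer {apikey}'}
    if username and password:
        # HTTP Basic: base64 of "username:password"
        token = base64.b64encode(f'{username}:{password}'.encode()).decode('ascii')
        return {'Authorization': f'Basic {token}'}
    # no credentials set: the request goes out unauthenticated
    return {}


# sample values only, not real credentials
print(build_auth_header({'CC_EM_APIKEY': 'sample-key'}))
print(build_auth_header({'CC_EM_USERNAME': 'user', 'CC_EM_PASSWORD': 'pass'}))
```

Setting only one of `CC_EM_USERNAME` and `CC_EM_PASSWORD` results in no header in this sketch, matching the `and` check in the parser.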

config.cpu.yaml

Lines changed: 14 additions & 4 deletions
@@ -16,12 +16,21 @@ vectordb:
     # 'connection' overrides the env var 'CCB_DB_URL'

 embedding:
-  protocol: http
-  host: localhost
-  port: 5000
+  # embedding service config
+  # for external embedding service, set CC_EM_BASE_URL and CC_EM_APIKEY env vars during deployment
+  # if the env vars are set, this config is ignored
+  # request_timeout is always respected even for remote service
+  base_url: http://localhost:5000/v1
   workers: 1
-  offload_after_mins: 15 # in minutes
   request_timeout: 1800 # in seconds
+  # only for external embedding service
+  # remote_service: true
+  # model_name: text-embedding-3-small
+  # auth:
+  #   apikey: your_api_key_here
+  #   # -or-
+  #   username: your_username_here
+  #   password: your_password_here
   llama:
     # all options: https://python.langchain.com/api_reference/community/embeddings/langchain_community.embeddings.llamacpp.LlamaCppEmbeddings.html
     # 'model_alias' is reserved
@@ -30,6 +39,7 @@ embedding:
     n_batch: 16
     n_ctx: 8192

+
 llm:
   nc_texttotext:
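
The change above replaces the separate `protocol`, `host` and `port` keys with a single `base_url`. A minimal sketch of how an older config could be normalised to the new shape, modelled on the conversion this commit adds to `config_parser.py`. The function name `normalise_embedding` is hypothetical, and the sketch works on a plain dict rather than the real `TEmbeddingConfig` model.

```python
def normalise_embedding(embedding: dict) -> dict:
    """Return a copy of the embedding config that uses 'base_url'."""
    result = dict(embedding)  # shallow copy, the caller's dict is left untouched
    legacy_keys = ('protocol', 'host', 'port')
    if all(key in result for key in legacy_keys):
        # convert only when all three legacy keys are present
        result['base_url'] = f"{result['protocol']}://{result['host']}:{result['port']}/v1"
        for key in legacy_keys:
            del result[key]
    return result


legacy = {'protocol': 'http', 'host': 'localhost', 'port': 5000, 'workers': 1}
print(normalise_embedding(legacy))
# {'workers': 1, 'base_url': 'http://localhost:5000/v1'}
```

A config that already has `base_url`, or that has only some of the legacy keys, passes through unchanged.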

config.gpu.yaml

Lines changed: 14 additions & 4 deletions
@@ -16,12 +16,21 @@ vectordb:
     # 'connection' overrides the env var 'CCB_DB_URL'

 embedding:
-  protocol: http
-  host: localhost
-  port: 5000
+  # embedding service config
+  # for external embedding service, set CC_EM_BASE_URL and CC_EM_APIKEY env vars during deployment
+  # if the env vars are set, this config is ignored
+  # request_timeout is always respected even for remote service
+  base_url: http://localhost:5000/v1
   workers: 1
-  offload_after_mins: 15 # in minutes
   request_timeout: 1800 # in seconds
+  # only for external embedding service
+  # remote_service: true
+  # model_name: text-embedding-3-small
+  # auth:
+  #   apikey: your_api_key_here
+  #   # -or-
+  #   username: your_username_here
+  #   password: your_password_here
   llama:
     # all options: https://python.langchain.com/api_reference/community/embeddings/langchain_community.embeddings.llamacpp.LlamaCppEmbeddings.html
     # 'model_alias' is reserved
@@ -31,6 +40,7 @@ embedding:
     n_ctx: 8192
     n_gpu_layers: -1

+
 llm:
   nc_texttotext:

context_chat_backend/config_parser.py

Lines changed: 57 additions & 2 deletions
@@ -2,10 +2,13 @@
 # SPDX-FileCopyrightText: 2024 Nextcloud GmbH and Nextcloud contributors
 # SPDX-License-Identifier: AGPL-3.0-or-later
 #
+import os
+
 from ruamel.yaml import YAML

 from .models.loader import models
-from .types import TConfig
+from .types import TConfig, TEmbeddingAuthApiKey, TEmbeddingAuthBasic, TEmbeddingConfig
+from .utils import value_of
 from .vectordb.loader import vector_dbs


@@ -47,6 +50,58 @@ def get_config(file_path: str) -> TConfig:
             f'Error: llm model should be at least one of {models["llm"]} in the config file'
         )

+    # convert protocol, host and port to base_url
+    embedding = config.get('embedding')
+    if (embedding is None or not isinstance(embedding, dict)) and not os.getenv('CC_EM_BASE_URL'):
+        raise AssertionError(
+            'Error: "embedding" key should be defined in the config file or CC_EM_BASE_URL env var should be set in the'
+            ' Deploy Options.'
+        )
+
+    if os.getenv('CC_EM_BASE_URL'):
+        if os.getenv('CC_EM_APIKEY'):
+            auth = TEmbeddingAuthApiKey(apikey=os.environ['CC_EM_APIKEY'])
+        elif os.getenv('CC_EM_USERNAME') and os.getenv('CC_EM_PASSWORD'):
+            auth = TEmbeddingAuthBasic(
+                username=os.environ['CC_EM_USERNAME'],
+                password=os.environ['CC_EM_PASSWORD'],
+            )
+        else:
+            auth = None
+
+        try:
+            # override embedding config from env vars
+            embedding_config = TEmbeddingConfig(
+                base_url=os.environ['CC_EM_BASE_URL'],
+                model_name=value_of(os.getenv('CC_EM_MODEL_NAME', None)),
+                auth=auth,
+                remote_service=True,
+                workers=0,
+                request_timeout=embedding.get('request_timeout', 1800) if embedding else 1800,
+            )
+        except Exception as e:
+            raise AssertionError(
+                'Error: could not create embedding config from env vars'
+            ) from e
+
+    elif embedding is None:
+        raise AssertionError(
+            'Error: "embedding" key should be defined in the config file if CC_EM_BASE_URL env var is not set in the'
+            ' Deploy Options.'
+        )
+    else:
+        # embedding from config file
+        if 'protocol' in embedding and 'host' in embedding and 'port' in embedding:
+            embedding['base_url'] = f"{embedding['protocol']}://{embedding['host']}:{embedding['port']}/v1"
+            del embedding['protocol']
+            del embedding['host']
+            del embedding['port']
+
+        try:
+            embedding_config = TEmbeddingConfig(**embedding)
+        except Exception as e:
+            raise AssertionError('Error: could not create embedding config from config file') from e
+
     return TConfig(
         debug=config.get('debug', False),
         uvicorn_log_level=config.get('uvicorn_log_level', 'info'),
@@ -58,6 +113,6 @@ def get_config(file_path: str) -> TConfig:
         doc_parser_worker_limit=config.get('doc_parser_worker_limit', 10),

         vectordb=vectordb,
-        embedding=config.get('embedding', {}), # for a more appropriate response
+        embedding=embedding_config,
         llm=llm,
     )
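
The precedence in `get_config` can be summarised in a small standalone sketch: when `CC_EM_BASE_URL` is set the environment wins, `workers` is forced to 0 and `remote_service` to true, and only `request_timeout` is still read from the file. The dataclass `EmbeddingSettings` and the function `resolve_embedding` are hypothetical stand-ins for `TEmbeddingConfig` and the parser logic. Authentication and the `llama` options are left out for brevity, and an empty model name is assumed to mean "unset".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EmbeddingSettings:
    """Hypothetical, reduced stand-in for TEmbeddingConfig."""
    base_url: str = 'http://localhost:5000/v1'
    model_name: str | None = None
    remote_service: bool = False
    workers: int = 1
    request_timeout: int = 1800


def resolve_embedding(env: dict, file_cfg: dict | None) -> EmbeddingSettings:
    base_url = env.get('CC_EM_BASE_URL')
    if base_url:
        # env vars override the file, except for request_timeout
        timeout = (file_cfg or {}).get('request_timeout', 1800)
        return EmbeddingSettings(
            base_url=base_url,
            model_name=env.get('CC_EM_MODEL_NAME') or None,
            remote_service=True,
            workers=0,  # no internal embedding workers for a remote service
            request_timeout=timeout,
        )
    if file_cfg is None:
        raise ValueError('set "embedding" in the config file or CC_EM_BASE_URL in the environment')
    # only keys known to this reduced dataclass are accepted here
    return EmbeddingSettings(**file_cfg)


print(resolve_embedding({'CC_EM_BASE_URL': 'https://example.invalid/v1'}, {'request_timeout': 60}))
```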

context_chat_backend/network_em.py

Lines changed: 40 additions & 5 deletions
@@ -3,14 +3,22 @@
 # SPDX-License-Identifier: AGPL-3.0-or-later
 #
 import logging
+from collections.abc import Generator
 from time import sleep
 from typing import Literal, TypedDict

 import httpx
 from langchain_core.embeddings import Embeddings
 from pydantic import BaseModel

-from .types import EmbeddingException, RetryableEmbeddingException, TConfig
+from .types import (
+    EmbeddingException,
+    FatalEmbeddingException,
+    RetryableEmbeddingException,
+    TConfig,
+    TEmbeddingAuthApiKey,
+    TEmbeddingAuthBasic,
+)

 logger = logging.getLogger('ccb.nextwork_em')

@@ -34,6 +42,15 @@ class CreateEmbeddingResponse(TypedDict):
     usage: EmbeddingUsage


+class ApiKeyAuth(httpx.Auth):
+    def __init__(self, apikey: str | bytes) -> None:
+        self._apikey = apikey
+
+    def auth_flow(self, request: httpx.Request) -> Generator[httpx.Request, httpx.Response, None]:
+        request.headers['Authorization'] = f'Bearer {self._apikey}'
+        yield request
+
+
 class NetworkEmbeddings(Embeddings, BaseModel):
     app_config: TConfig

@@ -47,14 +64,32 @@ def _get_embedding(self, input_: str | list[str], try_: int = 3) -> list[float]
         )

         try:
-            with httpx.Client() as client:
+            match emconf.auth:
+                case None:
+                    auth = httpx.USE_CLIENT_DEFAULT
+                case TEmbeddingAuthApiKey(apikey=apikey):
+                    auth = ApiKeyAuth(apikey=apikey)
+                case TEmbeddingAuthBasic(username=username, password=password):
+                    auth = httpx.BasicAuth(username=username, password=password)
+
+            data = {'input': input_}
+            if emconf.model_name:
+                data['model'] = emconf.model_name
+
+            with httpx.Client(verify=self.app_config.httpx_verify_ssl) as client:
                 response = client.post(
-                    f'{emconf.protocol}://{emconf.host}:{emconf.port}/v1/embeddings',
-                    json={'input': input_},
+                    f'{emconf.base_url.removesuffix("/")}/embeddings',
+                    json=data,
                     timeout=emconf.request_timeout,
+                    auth=auth,
                 )
-                if response.status_code != 200:
+                if response.status_code // 100 == 4:
+                    raise FatalEmbeddingException(response.text)
+                if response.status_code // 100 != 2:
                     raise EmbeddingException(response.text)
+        except FatalEmbeddingException as e:
+            logger.error('Fatal error while getting embeddings: %s', str(e), exc_info=e)
+            raise e
         except (
             EmbeddingException,
             httpx.RemoteProtocolError,
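
A small sketch of the request URL and response handling introduced in this file: the embeddings URL is built from `base_url` with one trailing slash removed, 4xx responses are treated as fatal, and any other non-2xx response raises the generic exception. The exception classes below are local stand-ins that reuse the names from `types.py`; `embeddings_url` and `check_status` are hypothetical helper names, not functions from the commit.

```python
class EmbeddingException(Exception):
    """Local stand-in for the generic embedding error."""


class FatalEmbeddingException(EmbeddingException):
    """Local stand-in for the non-retryable embedding error."""


def embeddings_url(base_url: str) -> str:
    # removesuffix drops at most one trailing '/', so both forms map to the same URL
    return f'{base_url.removesuffix("/")}/embeddings'


def check_status(status_code: int, body: str = '') -> None:
    if status_code // 100 == 4:
        # client errors (bad request, bad credentials, ...) will not succeed on retry
        raise FatalEmbeddingException(body)
    if status_code // 100 != 2:
        # everything else outside 2xx is reported as a generic embedding error
        raise EmbeddingException(body)


print(embeddings_url('http://localhost:5000/v1/'))
check_status(200)  # no exception for a successful response
```

Because `FatalEmbeddingException` subclasses `EmbeddingException`, a handler for the fatal case has to come before the handler for the generic one, as it does in the hunk above.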

context_chat_backend/types.py

Lines changed: 30 additions & 10 deletions
@@ -5,20 +5,33 @@
 from pydantic import BaseModel

 __all__ = [
+    'DEFAULT_EM_MODEL_ALIAS',
     'EmbeddingException',
     'LoaderException',
     'TConfig',
-    'TEmbedding',
+    'TEmbeddingAuthApiKey',
+    'TEmbeddingAuthBasic',
+    'TEmbeddingConfig',
 ]

-class TEmbedding(BaseModel):
-    protocol: str
-    host: str
-    port: int
-    workers: int
-    offload_after_mins: int
-    request_timeout: int
-    llama: dict
+DEFAULT_EM_MODEL_ALIAS = 'em_model'
+
+
+class TEmbeddingAuthApiKey(BaseModel):
+    apikey: str
+
+class TEmbeddingAuthBasic(BaseModel):
+    username: str
+    password: str
+
+class TEmbeddingConfig(BaseModel):
+    base_url: str = 'http://localhost:5000/v1'
+    workers: int = 1
+    request_timeout: int = 1750
+    model_name: str | None = DEFAULT_EM_MODEL_ALIAS
+    auth: TEmbeddingAuthApiKey | TEmbeddingAuthBasic | None = None
+    remote_service: bool = False
+    llama: dict = dict() # noqa: C408


 class TConfig(BaseModel):
@@ -32,7 +45,7 @@ class TConfig(BaseModel):
     doc_parser_worker_limit: int

     vectordb: tuple[str, dict]
-    embedding: TEmbedding
+    embedding: TEmbeddingConfig
     llm: tuple[str, dict]


@@ -50,3 +63,10 @@ class RetryableEmbeddingException(EmbeddingException):
     This keeps the indexing loop running and adds to the retry list.
     The parent exception would break the loop and stop the indexing process.
     """
+
+class FatalEmbeddingException(EmbeddingException):
+    """
+    Exception that indicates a fatal error in the embedding request.
+
+    Either malformed request, authentication error, or other non-retryable error.
+    """

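The docstrings in `types.py` describe how the exception hierarchy steers the indexing loop: a retryable error queues the document for another attempt, while the parent exception stops the loop. The sketch below illustrates that behaviour under stated assumptions. The real indexing loop is not part of this commit, so `index_all` and its arguments are hypothetical, and the exception classes are local stand-ins with the same names.

```python
class EmbeddingException(Exception):
    """Stand-in: generic embedding error, stops indexing."""


class RetryableEmbeddingException(EmbeddingException):
    """Stand-in: transient error, the document is retried later."""


class FatalEmbeddingException(EmbeddingException):
    """Stand-in: malformed request or authentication error."""


def index_all(docs, embed):
    """Hypothetical loop: returns (indexed documents, documents to retry)."""
    indexed, retry = [], []
    for doc in docs:
        try:
            embed(doc)
        except RetryableEmbeddingException:
            # keep going and remember the document for a later attempt
            retry.append(doc)
            continue
        except EmbeddingException:
            # also catches FatalEmbeddingException, since it is a subclass
            break
        indexed.append(doc)
    return indexed, retry
```

The subclass handler is listed first on purpose: with the order reversed, the generic handler would also swallow the retryable case.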
context_chat_backend/utils.py

Lines changed: 21 additions & 0 deletions
@@ -14,6 +14,8 @@

 from fastapi.responses import JSONResponse as FastAPIJSONResponse

+from .types import TConfig, TEmbeddingAuthApiKey, TEmbeddingAuthBasic, TEmbeddingConfig
+
 T = TypeVar('T')
 _logger = logging.getLogger('ccb.utils')

@@ -121,3 +123,22 @@ def wrapper(*args, **kwargs):
         return res

     return wrapper
+
+
+def redact_config(config: TConfig | TEmbeddingConfig) -> TConfig | TEmbeddingConfig:
+    '''
+    Redact sensitive information from the config for logging
+    '''
+    # Work on a deep copy: redacting in place would also overwrite the
+    # credentials that the running app later uses for embedding requests.
+    config = config.model_copy(deep=True)
+    em_conf = config.embedding if isinstance(config, TConfig) else config
+
+    if em_conf.auth:
+        if isinstance(em_conf.auth, TEmbeddingAuthApiKey):
+            em_conf.auth.apikey = '***REDACTED***'
+        elif isinstance(em_conf.auth, TEmbeddingAuthBasic):
+            em_conf.auth.username = '***REDACTED***'
+            em_conf.auth.password = '***REDACTED***' # noqa: S105
+
+    return config
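
Redacting for logs is safest on a copy, so that the credentials used for real requests stay intact. A minimal sketch of that idea using stdlib dataclasses in place of the pydantic models; `ApiKeyCredentials`, `EmbeddingConf` and `redacted` are hypothetical names used only for this illustration.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass

REDACTED = '***REDACTED***'


@dataclass
class ApiKeyCredentials:
    apikey: str


@dataclass
class EmbeddingConf:
    base_url: str
    auth: ApiKeyCredentials | None = None


def redacted(conf: EmbeddingConf) -> EmbeddingConf:
    """Return a deep copy that is safe to print; the original is not modified."""
    clone = copy.deepcopy(conf)
    if clone.auth is not None:
        clone.auth.apikey = REDACTED
    return clone


live = EmbeddingConf('https://example.invalid/v1', ApiKeyCredentials('sample-key'))
print(redacted(live))    # the copy shows the placeholder
print(live.auth.apikey)  # the live config still holds the original value
```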

main.py

Lines changed: 2 additions & 1 deletion
@@ -12,6 +12,7 @@
 from context_chat_backend.types import TConfig # isort: skip
 from context_chat_backend.controller import app # isort: skip
 from context_chat_backend.logger import get_logging_config, setup_logging # isort: skip
+from context_chat_backend.utils import redact_config # isort: skip

 LOGGER_CONFIG_NAME = 'logger_config.yaml'

@@ -47,7 +48,7 @@ def _setup_log_levels(debug: bool):
     app_config: TConfig = app.extra['CONFIG']
     _setup_log_levels(app_config.debug)

-    print('App config:\n' + app_config.model_dump_json(indent=2), flush=True)
+    print('App config:\n' + redact_config(app_config).model_dump_json(indent=2), flush=True)

     uv_log_config = uvicorn.config.LOGGING_CONFIG # pyright: ignore[reportAttributeAccessIssue]
     uv_log_config['formatters']['json'] = logging_config['formatters']['json']
