
Commit b689509

fix: make logs persistent + other fixes (#140)
- remove root logs dir
- revert CI env var in embedding server proc
- remove unsupported vectordb configs
- update readme: remove beta status
- update readme for logs
- fix: better handling of embedding server failure

---------

Signed-off-by: Anupam Kumar <kyteinsky@gmail.com>
1 parent fdeac86 commit b689509

12 files changed

Lines changed: 40 additions & 35 deletions


.github/workflows/integration-test.yml

Lines changed: 5 additions & 3 deletions
```diff
@@ -163,7 +163,7 @@ jobs:
 pip install -r requirements.txt
 cp example.env .env
 echo "NEXTCLOUD_URL=http://localhost:8080" >> .env
-python3 -u ./main.py > logs/backend_logs 2>&1 &
+python3 -u ./main.py > backend_logs 2>&1 &
 echo $! > ../pid.txt # Save the process ID (PID)

 - name: Register backend
@@ -233,7 +233,9 @@ jobs:
 - name: Show logs
 if: always()
 run: |
-tail data/nextcloud.log
+cat data/nextcloud.log
 echo '--------------------------------------------------'
-tail -v -n +1 context_chat_backend/logs/* || echo "No logs in logs directory"
+cat context_chat_backend/backend_logs || echo "No backend logs"
+echo '--------------------------------------------------'
+tail -v -n +1 context_chat_backend/persistent_storage/logs/* || echo "No logs in logs directory"

```

.gitignore

Lines changed: 0 additions & 1 deletion
```diff
@@ -5,4 +5,3 @@ __pycache__/
 .env
 persistent_storage/*
 .vscode/
-logs/
```

Dockerfile

Lines changed: 0 additions & 3 deletions
```diff
@@ -40,9 +40,6 @@ RUN python3 -m pip install --no-cache-dir https://github.com/abetlen/llama-cpp-p
 RUN sed -i '/llama_cpp_python/d' requirements.txt
 RUN python3 -m pip install --no-cache-dir -r requirements.txt && python3 -m pip cache purge

-# Create an empty logs dir
-RUN mkdir logs
-
 # Copy application files
 COPY context_chat_backend context_chat_backend
 COPY main.py .
```

README.md

Lines changed: 5 additions & 2 deletions
````diff
@@ -7,8 +7,6 @@
 [![REUSE status](https://api.reuse.software/badge/github.com/nextcloud/context_chat_backend)](https://api.reuse.software/info/github.com/nextcloud/context_chat_backend)

 > [!NOTE]
-> This is a beta software. Expect breaking changes.
->
 > Be mindful to install the backend before the Context Chat php app (Context Chat php app would sends all the user-accessible files to the backend for indexing in the background. It is not an issue even if the request fails to an uninitialised backend since those files would be tried again in the next background job run.)
 >
 > The HTTP request timeout is 50 minutes for all requests and can be changed with the `request_timeout` app config for the php app `context_chat` using the occ command (`occ config:app:set context_chat request_timeout --value=3000`, value is in seconds). The same also needs to be done for docker socket proxy. See [Slow responding ExApps](https://github.com/cloud-py-api/docker-socket-proxy?tab=readme-ov-file#slow-responding-exapps)
@@ -101,6 +99,11 @@ volumes:
 -v /var/run/docker.sock:/var/run/docker.sock:ro
 ```

+## Logs
+Logs are stored in the `logs/` directory in the persistent directory. In a docker container, it should be at `/nc_app_context_chat_backend/logs/`. The log file is named `ccb.log` and is set to otate at 20 MB with 10 backups. These logs are in JSONL format, i.e. each line is a valid JSON object.
+Now only warning and above logs are printed to the console. All the debug logs are written to the log file if `debug` is set to `true` in the config file.
+The logs of the embedding server are written to `logs/embedding_server_[date].log` in the persistent directory, it rotates with date change and is not in JSONL format, just raw stdout and stderr from the embedding server's process.
+
 ## Configuration
 Configuration resides inside the persistent storage as `config.yaml`. The location is `$APP_PERSISTENT_STORAGE`. By default it would be at `/nc_app_context_chat_backend_data/config.yaml` inside the container.

````

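The README text above says that `ccb.log` rotates at 20 MB with 10 backups, is written as JSONL, and that only warnings and above reach the console. The real handler setup lives in `logger_config.yaml`, which this commit does not show, so this is only a minimal sketch of an equivalent setup built on the standard library's `logging.handlers.RotatingFileHandler`. The `JSONLineFormatter` class and its field names, the `setup_logging` helper, the `ccb` logger name, and the INFO level used when `debug` is off are all assumptions.

```python
import json
import logging
import logging.handlers
import os
import sys


class JSONLineFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            'time': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        if record.exc_info:
            payload['exc_info'] = self.formatException(record.exc_info)
        return json.dumps(payload)


def setup_logging(persistent_storage: str, debug: bool) -> logging.Logger:
    """Hypothetical helper: JSONL file log under <persistent_storage>/logs plus console."""
    log_dir = os.path.join(persistent_storage, 'logs')
    os.makedirs(log_dir, exist_ok=True)

    # File handler: rotate at 20 MB, keep 10 backups (ccb.log.1 ... ccb.log.10)
    file_handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, 'ccb.log'),
        maxBytes=20 * 1024 * 1024,
        backupCount=10,
        encoding='utf-8',
    )
    file_handler.setFormatter(JSONLineFormatter())
    # Assumption: INFO is the file level when debug is off
    file_handler.setLevel(logging.DEBUG if debug else logging.INFO)

    # Console handler: warnings and above only
    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(logging.WARNING)

    logger = logging.getLogger('ccb')
    logger.setLevel(logging.DEBUG)  # handlers do the per-destination filtering
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```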
config.cpu.yaml

Lines changed: 0 additions & 11 deletions
```diff
@@ -14,17 +14,6 @@ vectordb:
 pgvector:
 # 'connection' overrides the env var 'CCB_DB_URL'

-chroma:
-is_persistent: true
-# chroma_server_host:
-# chroma_server_http_port:
-# chroma_server_ssl_enabled:
-# chroma_server_api_default_path:
-
-weaviate:
-# auth_client_secret:
-# url: http://localhost:8080
-
 embedding:
 protocol: http
 host: localhost
```

config.gpu.yaml

Lines changed: 0 additions & 11 deletions
```diff
@@ -14,17 +14,6 @@ vectordb:
 pgvector:
 # 'connection' overrides the env var 'CCB_DB_URL'

-chroma:
-is_persistent: true
-# chroma_server_host:
-# chroma_server_http_port:
-# chroma_server_ssl_enabled:
-# chroma_server_api_default_path:
-
-weaviate:
-# auth_client_secret:
-# url: http://localhost:8080
-
 embedding:
 protocol: http
 host: localhost
```

context_chat_backend/controller.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -351,6 +351,8 @@ def _(sources: list[UploadFile]):

 try:
 added_sources = exec_in_proc(target=embed_sources, args=(vectordb_loader, app.extra['CONFIG'], sources))
+except (DbException, EmbeddingException) as e:
+raise e
 except Exception as e:
 raise DbException('Error: failed to load sources') from e
 finally:
```

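The `controller.py` change lets `DbException` and `EmbeddingException` raised inside `exec_in_proc` propagate unchanged, while any other error is still wrapped in `DbException`. Without the new clause, the broad `except Exception` would also catch `EmbeddingException` and report it as a generic source-loading failure. Here is a self-contained sketch of the pattern; the two exception classes are stand-ins for the project's own, and `load_sources` is a hypothetical wrapper. It uses a bare `raise`, which re-raises the active exception with its traceback and has the same effect here as the diff's `raise e`.

```python
from typing import Any, Callable


class DbException(Exception):
    """Stand-in for the project's DbException."""


class EmbeddingException(Exception):
    """Stand-in for the project's EmbeddingException."""


def load_sources(task: Callable[[], Any]) -> Any:
    """Hypothetical wrapper mirroring the try/except order in the diff."""
    try:
        return task()
    except (DbException, EmbeddingException):
        # Known, already-descriptive errors pass through unchanged
        raise
    except Exception as e:
        # Anything else is wrapped; the original stays available as __cause__
        raise DbException('Error: failed to load sources') from e
```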
context_chat_backend/dyn_loader.py

Lines changed: 8 additions & 4 deletions
```diff
@@ -11,6 +11,7 @@
 import signal
 import subprocess
 from abc import ABC, abstractmethod
+from datetime import datetime
 from time import sleep, time
 from typing import Any

@@ -22,7 +23,7 @@

 from .models.loader import init_model
 from .network_em import NetworkEmbeddings
-from .types import LoaderException, TConfig
+from .types import EmbeddingException, LoaderException, TConfig
 from .vectordb.base import BaseVectorDB
 from .vectordb.loader import get_vector_db
 from .vectordb.types import DbException
@@ -43,7 +44,11 @@ def offload(self):
 class EmbeddingModelLoader(Loader):
 def __init__(self, config: TConfig):
 self.config = config
-self.logfile = open('logs/embedding_server.log', 'a+')
+logfile_path = os.path.join(
+os.environ['EM_SERVER_LOG_PATH'],
+f'embedding_server_{datetime.now().strftime("%Y-%m-%d")}.log',
+)
+self.logfile = open(logfile_path, 'a+')

 def load(self):
 global pid
@@ -84,8 +89,7 @@ def load(self):
 try_ += 1
 sleep(3)

-logger.error('Error: failed to start the embedding server')
-os.kill(os.getpid(), signal.SIGTERM)
+raise EmbeddingException('Error: the embedding server is not responding')

 def offload(self):
 global pid
```

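After this change the embedding server's output goes to a file named after the current date inside `EM_SERVER_LOG_PATH`, and a server that never becomes ready raises `EmbeddingException` instead of calling `os.kill(os.getpid(), signal.SIGTERM)` on the backend's own process, which leaves the decision to the caller. This minimal sketch isolates both behaviours. `embedding_log_path`, `wait_until_ready`, the `is_ready` callable, and the default retry count are hypothetical, because the full body of `load()` is not part of this diff; only the 3-second delay comes from the diff's `sleep(3)`. Note that in the diff the date is read once, in `EmbeddingModelLoader.__init__`, so the file name changes only when a new loader instance is created.

```python
import os
from datetime import datetime
from time import sleep
from typing import Callable, Optional


class EmbeddingException(Exception):
    """Stand-in for the project's EmbeddingException."""


def embedding_log_path(log_dir: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: one log file per calendar day."""
    now = now or datetime.now()
    return os.path.join(log_dir, f'embedding_server_{now.strftime("%Y-%m-%d")}.log')


def wait_until_ready(
    is_ready: Callable[[], bool],
    retries: int = 5,
    delay: float = 3.0,
) -> None:
    """Hypothetical readiness loop: poll, then raise instead of killing the process."""
    for _ in range(retries):
        if is_ready():
            return
        sleep(delay)
    raise EmbeddingException('Error: the embedding server is not responding')
```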
context_chat_backend/logger.py

Lines changed: 14 additions & 0 deletions
```diff
@@ -9,6 +9,7 @@
 import logging
 import logging.config
 import logging.handlers
+import os
 from time import gmtime

 from ruamel.yaml import YAML
@@ -88,6 +89,19 @@ def get_logging_config() -> dict:
 try:
 yaml = YAML(typ='safe')
 config: dict = yaml.load(f)
+
+persistent_storage = os.getenv('APP_PERSISTENT_STORAGE', 'persistent_storage')
+if (config.get('handlers', {}).get('file_json', {}).get('filename')):
+if (
+not config['handlers']['file_json']['filename'].startswith(persistent_storage)
+and not config['handlers']['file_json']['filename'].startswith('/')
+):
+config['handlers']['file_json']['filename'] = os.path.join(
+persistent_storage,
+config['handlers']['file_json']['filename'],
+)
+# create logs directory if it doesn't exist
+os.makedirs(os.path.dirname(config['handlers']['file_json']['filename']), exist_ok=True)
 except Exception as e:
 raise AssertionError('Error: could not load config from logger_config.yaml file') from e

```

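The new block in `get_logging_config()` re-roots a relative `file_json` filename under `APP_PERSISTENT_STORAGE` (falling back to `persistent_storage`) unless the name is absolute or already starts with that prefix, and then creates the parent directory. Below is a minimal sketch of the same rule as a pure function over a `logging.config.dictConfig`-style dict, so it can be exercised without a `logger_config.yaml`. The name `resolve_file_json_path` is hypothetical, the sketch swaps the diff's `startswith('/')` for `os.path.isabs`, and it leaves the `os.makedirs(os.path.dirname(path), exist_ok=True)` step to the caller.

```python
import copy
import os


def resolve_file_json_path(config: dict, persistent_storage: str) -> dict:
    """Hypothetical helper: return a copy of the config with
    handlers.file_json.filename re-rooted under persistent_storage
    when it is a relative path outside that directory."""
    config = copy.deepcopy(config)  # leave the caller's dict untouched
    handler = config.get('handlers', {}).get('file_json', {})
    filename = handler.get('filename')
    if not filename:
        return config  # no JSON file handler configured; nothing to do
    if not filename.startswith(persistent_storage) and not os.path.isabs(filename):
        handler['filename'] = os.path.join(persistent_storage, filename)
    return config
```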
context_chat_backend/setup_functions.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -44,9 +44,14 @@ def setup_env_vars():

 config_path = os.path.join(persistent_storage, 'config.yaml')

+em_server_log_path = os.path.join(persistent_storage, 'logs')
+if not os.path.exists(em_server_log_path):
+os.makedirs(em_server_log_path, 0o750, True)
+
 os.environ['APP_PERSISTENT_STORAGE'] = persistent_storage
 os.environ['VECTORDB_DIR'] = vector_db_dir
 os.environ['MODEL_DIR'] = model_dir
 os.environ['SENTENCE_TRANSFORMERS_HOME'] = os.getenv('SENTENCE_TRANSFORMERS_HOME', model_dir)
 os.environ['HF_HOME'] = os.getenv('HF_HOME', model_dir)
 os.environ['CC_CONFIG_PATH'] = os.getenv('CC_CONFIG_PATH', config_path)
+os.environ['EM_SERVER_LOG_PATH'] = os.getenv('EM_SERVER_LOG_PATH', em_server_log_path)
```

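`setup_env_vars()` now creates `<persistent_storage>/logs` with mode `0o750` and exports it as `EM_SERVER_LOG_PATH` unless that variable is already set; this is the variable `EmbeddingModelLoader` reads. Below is a condensed sketch of just that step, using a hypothetical helper name. Since the third positional argument of `os.makedirs` is `exist_ok`, the diff's separate `os.path.exists` check is not strictly needed. One design difference is deliberate: the sketch creates whichever directory ends up in `EM_SERVER_LOG_PATH`. That way an externally supplied path also exists before the loader opens a file in it, whereas the diff only creates the default directory. The effective mode is `0o750` filtered by the process umask.

```python
import os


def setup_em_server_log_path(persistent_storage: str) -> str:
    """Hypothetical helper: export EM_SERVER_LOG_PATH and make sure it exists."""
    default_path = os.path.join(persistent_storage, 'logs')
    # Respect a value that is already set in the environment
    log_path = os.getenv('EM_SERVER_LOG_PATH', default_path)
    # exist_ok=True makes a prior os.path.exists() check unnecessary
    os.makedirs(log_path, 0o750, exist_ok=True)
    os.environ['EM_SERVER_LOG_PATH'] = log_path
    return log_path
```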