
Commit 550f9e5

chandrasekharan-zipstack, claude, and coderabbitai[bot] authored
[FIX] Unblock OSS first-time setup — sample envs, host-gateway, ollama hint (#2107)
* [FIX] Unblock OSS first-time setup — sample envs, host-gateway, ollama hint

  - backend/sample.env: set INTERNAL_SERVICE_API_KEY=dev-internal-key-123 so
    workers' internal API calls don't 500 against backend middleware. Add
    TEMPORARY_REMOTE_STORAGE alongside PERMANENT_REMOTE_STORAGE.
  - workers/sample.env: mirror PERMANENT_REMOTE_STORAGE,
    TEMPORARY_REMOTE_STORAGE, REMOTE_PROMPT_STUDIO_FILE_PATH. Without these
    the executor / ide-callback workers crash at json.loads("") on first
    Prompt Studio index.
  - unstract/sdk1 file_storage/env_helper.py: raise FileStorageError with a
    clear message when the storage env var is unset or invalid JSON, instead
    of letting json.loads("") raise an inscrutable JSONDecodeError.
  - adapters/llm1 + embedding1 ollama.json: fix typo docker.host.internal ->
    host.docker.internal in the Base URL hint.
  - docker/docker-compose.yaml: extract a YAML anchor x-host-gateway and
    apply it (<<: *host_gateway) to backend, prompt-service, runner,
    celery-*, and every worker-* service. Before this only backend and
    prompt-service had extra_hosts, so adapter Test passed in prompt-service
    but real prompt execution from the worker pool failed with EAI_NONAME
    against host.docker.internal.
  - run-platform.sh: fail fast with actionable hint when `docker info` can't
    reach the daemon (missing docker group membership, etc.).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Br691aZbhfcB4xdswrUjuw

* [FIX] Address greptile / coderabbit review comments

  - env_helper.py: validate parsed JSON is a dict; normalize KeyError /
    TypeError / ValueError from provider construction into FileStorageError
    with the remediation message, so callers never see raw json/dict
    exceptions on a misconfigured env var.
  - docker-compose.yaml: extend the x-host-gateway anchor's docstring to warn
    that YAML merge does NOT concatenate lists: adding a sibling extra_hosts
    to a service shadows the anchor entry rather than appending. Future
    contributors must inline all entries instead.
  - run-platform.sh: branch the docker-daemon remediation hints by OS
    (Linux / macOS / Windows / other), so macOS users see the Docker-Desktop
    hint instead of irrelevant getent/usermod commands.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Br691aZbhfcB4xdswrUjuw

* [FIX] Address Jaseem's PR review nits

  - env_helper.py: narrow except (KeyError, ValueError) to the
    FileStorageProvider resolution only; constructor / fsspec exceptions now
    propagate untouched instead of being mislabeled as env config errors.
  - env_helper.py: drop dead `except FileStorageError: raise e` (inner try
    doesn't catch FileStorageError, and `raise e` would reset traceback).
  - workers/sample.env: refresh comment to match new FileStorageError
    behavior.
  - backend/sample.env: unquote REMOTE_PROMPT_STUDIO_FILE_PATH for
    consistency with workers/sample.env.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Br691aZbhfcB4xdswrUjuw

* Update unstract/sdk1/src/unstract/sdk1/file_storage/env_helper.py

  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>

* [FIX] Split long f-string in EnvHelper.get_storage to satisfy ruff E501

  The coderabbit suggestion added an isinstance(credentials, dict) guard
  whose error message exceeded the 90-char line limit. Split the f-string
  across two adjacent string literals to keep behavior identical.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Br691aZbhfcB4xdswrUjuw

---------

Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent 7aba21f commit 550f9e5

7 files changed

Lines changed: 98 additions & 26 deletions


backend/sample.env

Lines changed: 5 additions & 2 deletions
@@ -64,7 +64,9 @@ SESSION_EXPIRATION_TIME_IN_SECOND=7200
 WEB_APP_ORIGIN_URL="http://frontend.unstract.localhost"
 
 # API keys for trusted services
-INTERNAL_SERVICE_API_KEY=
+# Workers send this as X-API-Key on internal calls to backend.
+# Must match workers/sample.env INTERNAL_SERVICE_API_KEY; rotate in both for prod.
+INTERNAL_SERVICE_API_KEY=dev-internal-key-123
 
 # Unstract Core envs
 BUILTIN_FUNCTIONS_API_KEY=
@@ -199,7 +201,8 @@ API_FILE_STORAGE_CREDENTIALS='{"provider": "minio", "credentials": {"endpoint_ur
 
 #Remote storage related envs
 PERMANENT_REMOTE_STORAGE='{"provider": "minio", "credentials": {"endpoint_url": "http://unstract-minio:9000", "key": "minio", "secret": "minio123"}}'
-REMOTE_PROMPT_STUDIO_FILE_PATH="unstract/prompt-studio-data"
+TEMPORARY_REMOTE_STORAGE='{"provider": "minio", "credentials": {"endpoint_url": "http://unstract-minio:9000", "key": "minio", "secret": "minio123"}}'
+REMOTE_PROMPT_STUDIO_FILE_PATH=unstract/prompt-studio-data
 
 # Storage Provider for Tool registry
 TOOL_REGISTRY_STORAGE_CREDENTIALS='{"provider":"local"}'

docker/docker-compose.yaml

Lines changed: 27 additions & 6 deletions
@@ -2,9 +2,21 @@ name: ${COMPOSE_PROJECT_NAME:-docker}
 include:
   - docker-compose-dev-essentials.yaml
 
+# Reusable host-gateway mapping so containers can reach services on the host
+# (e.g. host-installed Ollama at http://host.docker.internal:11434).
+# NOTE: YAML merge (<<:) merges mappings shallowly and does NOT concatenate
+# lists. If a service ever needs additional extra_hosts entries alongside
+# this one, inline ALL entries under that service (including
+# host.docker.internal:host-gateway) instead of using the anchor — a sibling
+# extra_hosts key would silently shadow the anchor's list.
+x-host-gateway: &host_gateway
+  extra_hosts:
+    - "host.docker.internal:host-gateway"
+
 services:
   # Backend service
   backend:
+    <<: *host_gateway
     image: unstract/backend:${VERSION}
     container_name: unstract-backend
     restart: unless-stopped
@@ -34,12 +46,10 @@ services:
       - traefik.enable=true
       - traefik.http.routers.backend.rule=Host(`frontend.unstract.localhost`) && (PathPrefix(`/api/v1`) || PathPrefix(`/deployment`) || PathPrefix(`/public`))
       - traefik.http.services.backend.loadbalancer.server.port=8000
-    extra_hosts:
-      # "host-gateway" is a special string that translates to host docker0 i/f IP.
-      - "host.docker.internal:host-gateway"
 
   # Celery worker for dashboard metrics processing
   worker-metrics:
+    <<: *host_gateway
     image: unstract/backend:${VERSION}
     container_name: unstract-worker-metrics
     restart: unless-stopped
@@ -61,6 +71,7 @@ services:
   # Processes post-execution callbacks via InternalAPIClient (no Django).
   # Handles: ide_index_complete/error, ide_prompt_complete/error.
   worker-ide-callback:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-ide-callback
     restart: unless-stopped
@@ -82,6 +93,7 @@ services:
 
   # Celery Flower
   celery-flower:
+    <<: *host_gateway
     image: unstract/backend:${VERSION}
     container_name: unstract-celery-flower
     restart: unless-stopped
@@ -105,6 +117,7 @@ services:
 
   # Celery Beat
   celery-beat:
+    <<: *host_gateway
     image: unstract/backend:${VERSION}
     container_name: unstract-celery-beat
     restart: unless-stopped
@@ -152,6 +165,7 @@ services:
       - traefik.enable=false
 
   prompt-service:
+    <<: *host_gateway
     image: unstract/prompt-service:${VERSION}
     container_name: unstract-prompt-service
     restart: unless-stopped
@@ -166,9 +180,6 @@ services:
       - ../prompt-service/.env
     labels:
       - traefik.enable=false
-    extra_hosts:
-      # "host-gateway" is a special string that translates to host docker0 i/f IP.
-      - "host.docker.internal:host-gateway"
 
   x2text-service:
     image: unstract/x2text-service:${VERSION}
@@ -184,6 +195,7 @@ services:
       - traefik.enable=false
 
   runner:
+    <<: *host_gateway
     image: unstract/runner:${VERSION}
     container_name: unstract-runner
     restart: unless-stopped
@@ -206,6 +218,7 @@ services:
   # ====================================================================
 
   worker-api-deployment-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-api-deployment-v2
     restart: unless-stopped
@@ -238,6 +251,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-callback-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-callback-v2
     restart: unless-stopped
@@ -264,6 +278,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-file-processing-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-file-processing-v2
     restart: unless-stopped
@@ -316,6 +331,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-general-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-general-v2
     restart: unless-stopped
@@ -343,6 +359,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-notification-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-notification-v2
     restart: unless-stopped
@@ -391,6 +408,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-log-consumer-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-log-consumer-v2
     restart: unless-stopped
@@ -440,6 +458,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-log-history-scheduler-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-log-history-scheduler-v2
     restart: unless-stopped
@@ -463,6 +482,7 @@ services:
       - traefik.enable=false
 
   worker-scheduler-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-scheduler-v2
     restart: unless-stopped
@@ -507,6 +527,7 @@ services:
       - ${TOOL_REGISTRY_CONFIG_SRC_PATH}:/data/tool_registry_config
 
   worker-executor-v2:
+    <<: *host_gateway
     image: unstract/worker-unified:${VERSION}
     container_name: unstract-worker-executor-v2
     restart: unless-stopped

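The NOTE on the x-host-gateway anchor can be checked without Compose. The sketch below emulates the shallow semantics of the YAML merge key with a plain Python dict merge. It is an emulation, not a YAML parser, and the `unstract/runner:latest` tag and the `db.local:10.0.0.5` entry are hypothetical values chosen for illustration. Keys set on the service win outright, so a sibling `extra_hosts` replaces the anchor's list instead of extending it.

```python
# Emulates YAML merge-key (<<:) semantics with plain dicts. Illustration
# only: this does not parse YAML and does not call Docker Compose.

HOST_GATEWAY = {"extra_hosts": ["host.docker.internal:host-gateway"]}


def apply_merge_key(anchor: dict, service: dict) -> dict:
    """Shallow merge: keys defined on the service override the anchor's keys."""
    return {**anchor, **service}


# A service that only uses the anchor inherits the host-gateway mapping.
# The image tag is a hypothetical placeholder.
runner = apply_merge_key(HOST_GATEWAY, {"image": "unstract/runner:latest"})

# A hypothetical service that also declares its own extra_hosts: the
# anchor's list is replaced, not extended, so host.docker.internal is lost.
shadowed = apply_merge_key(HOST_GATEWAY, {"extra_hosts": ["db.local:10.0.0.5"]})

# What the comment recommends instead: inline every entry under the service.
inlined = {
    "extra_hosts": ["host.docker.internal:host-gateway", "db.local:10.0.0.5"]
}

print(runner["extra_hosts"])
print(shadowed["extra_hosts"])
print(inlined["extra_hosts"])
```

Running `docker compose config` on the real file is the authoritative way to see the merged result for each service.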
run-platform.sh

Lines changed: 22 additions & 0 deletions
@@ -33,6 +33,28 @@ check_dependencies() {
     echo "$red_text""docker not found. Exiting.""$default_text"
     exit 1
   fi
+  if ! docker info >/dev/null 2>&1; then
+    echo "$red_text""Cannot connect to the Docker daemon.""$default_text"
+    case "$(uname -s)" in
+      Linux*)
+        echo " On Linux (daemon access via the 'docker' group):"
+        echo " - Check group membership: getent group docker"
+        echo " - Add your user to it: sudo usermod -aG docker \$USER"
+        echo " - Activate in current shell: newgrp docker"
+        echo " - For new shells, a full desktop logout (not just terminal close) is required."
+        ;;
+      Darwin*)
+        echo " On macOS: ensure Docker Desktop is running (whale icon in the menu bar)."
+        ;;
+      MINGW*|MSYS*|CYGWIN*)
+        echo " On Windows: ensure Docker Desktop is running and WSL integration is enabled if applicable."
+        ;;
+      *)
+        echo " Ensure the Docker daemon is running and your user can reach its socket."
+        ;;
+    esac
+    exit 1
+  fi
   # For 'docker compose' vs 'docker-compose', see https://stackoverflow.com/a/66526176.
   docker compose >/dev/null 2>&1
   if [ $? -eq 0 ]; then

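For tooling that wraps the platform in Python rather than shell, the same fail-fast check could look roughly like the sketch below. The function names, the `timeout` default, and the shortened hint strings are assumptions made for this illustration and are not part of the commit. The only external call is `docker info`, the same probe the script uses.

```python
import shutil
import subprocess


def docker_daemon_reachable(docker_cmd: str = "docker", timeout: float = 10.0) -> bool:
    """Return True only if `<docker_cmd> info` exits with status 0."""
    if shutil.which(docker_cmd) is None:
        return False  # the CLI binary is not on PATH
    try:
        result = subprocess.run(
            [docker_cmd, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def daemon_hint(uname_s: str) -> str:
    """Pick a remediation hint from a `uname -s` style system name."""
    if uname_s.startswith("Linux"):
        return "Check 'docker' group membership: getent group docker"
    if uname_s.startswith("Darwin"):
        return "Ensure Docker Desktop is running."
    if uname_s.startswith(("MINGW", "MSYS", "CYGWIN")):
        return "Ensure Docker Desktop is running and WSL integration is enabled if applicable."
    return "Ensure the Docker daemon is running and your user can reach its socket."


if __name__ == "__main__":
    import platform

    if not docker_daemon_reachable():
        # platform.system() reports "Windows" natively, which falls through
        # to the generic hint here; the script matches MINGW/MSYS/CYGWIN shells.
        print(daemon_hint(platform.system()))
```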
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/ollama.json

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
       "type": "string",
       "title": "Base URL",
       "default": "",
-      "description": "Provide the base URL where Ollama server is running. Example: `http://docker.host.internal:11434` or `http://localhost:11434`"
+      "description": "Provide the base URL where Ollama server is running. Example: `http://host.docker.internal:11434` or `http://localhost:11434`"
     },
     "max_retries": {
       "type": "number",

unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/ollama.json

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
       "type": "string",
       "title": "Base URL",
       "default": "",
-      "description": "Provide the base URL where Ollama server is running. Example: http://docker.host.internal:11434 or http://localhost:11434"
+      "description": "Provide the base URL where Ollama server is running. Example: http://host.docker.internal:11434 or http://localhost:11434"
     },
     "max_tokens": {
       "type": "number",

unstract/sdk1/src/unstract/sdk1/file_storage/env_helper.py

Lines changed: 36 additions & 16 deletions
@@ -31,22 +31,42 @@ def get_storage(storage_type: StorageType, env_name: str) -> FileStorage:
             FileStorage: FIleStorage instance initialised using the provider
                 and credentials configured in the env
         """
+        raw = os.environ.get(env_name)
+        if not raw:
+            raise FileStorageError(
+                f"Required env var '{env_name}' is unset or empty. "
+                f"Expected JSON config of the form: {EnvHelper.ENV_CONFIG_FORMAT}"
+            )
+        try:
+            file_storage_creds = json.loads(raw)
+        except json.JSONDecodeError as e:
+            raise FileStorageError(
+                f"Env var '{env_name}' is not valid JSON: {e}. "
+                f"Expected: {EnvHelper.ENV_CONFIG_FORMAT}"
+            ) from e
+        if not isinstance(file_storage_creds, dict):
+            raise FileStorageError(
+                f"Env var '{env_name}' must be a JSON object. "
+                f"Expected: {EnvHelper.ENV_CONFIG_FORMAT}"
+            )
         try:
-            file_storage_creds = json.loads(os.environ.get(env_name, ""))
             provider = FileStorageProvider(file_storage_creds[CredentialKeyword.PROVIDER])
-            credentials = file_storage_creds.get(CredentialKeyword.CREDENTIALS, {})
-            if storage_type == StorageType.PERMANENT:
-                file_storage = PermanentFileStorage(provider=provider, **credentials)
-            elif storage_type == StorageType.SHARED_TEMPORARY:
-                file_storage = SharedTemporaryFileStorage(
-                    provider=provider, **credentials
-                )
-            else:
-                raise NotImplementedError()
-            return file_storage
-        except KeyError as e:
-            logger.error(f"Required credentials are missing in the env: {str(e)}")
+        except (KeyError, ValueError) as e:
+            logger.error(f"Invalid storage configuration in env: {str(e)}")
             logger.error(f"The configuration format is {EnvHelper.ENV_CONFIG_FORMAT}")
-            raise e
-        except FileStorageError as e:
-            raise e
+            raise FileStorageError(
+                f"Invalid storage configuration in env var '{env_name}': {e}. "
+                f"Expected: {EnvHelper.ENV_CONFIG_FORMAT}"
+            ) from e
+        credentials = file_storage_creds.get(CredentialKeyword.CREDENTIALS, {})
+        if not isinstance(credentials, dict):
+            raise FileStorageError(
+                f"Env var '{env_name}' field '{CredentialKeyword.CREDENTIALS}' "
+                f"must be a JSON object. Expected: {EnvHelper.ENV_CONFIG_FORMAT}"
+            )
+        if storage_type == StorageType.PERMANENT:
+            return PermanentFileStorage(provider=provider, **credentials)
+        elif storage_type == StorageType.SHARED_TEMPORARY:
+            return SharedTemporaryFileStorage(provider=provider, **credentials)
+        else:
+            raise NotImplementedError()

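The validation order introduced in get_storage (unset or empty, then invalid JSON, then non-object, then provider lookup, then credentials type) can be tried in isolation with the standalone sketch below. It is not the SDK code: `StorageConfigError`, `KNOWN_PROVIDERS`, `EXPECTED`, and `load_storage_config` are hypothetical stand-ins for FileStorageError, the FileStorageProvider enum, ENV_CONFIG_FORMAT, and the method itself, and the provider set is an assumption.

```python
import json
import os


class StorageConfigError(Exception):
    """Hypothetical stand-in for the SDK's FileStorageError."""


# Assumed provider names for illustration; the SDK resolves these through
# its FileStorageProvider enum instead.
KNOWN_PROVIDERS = {"minio", "local"}
EXPECTED = '{"provider": "<name>", "credentials": {...}}'


def load_storage_config(env_name, environ=None):
    """Validate a JSON storage env var and return (provider, credentials)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_name)
    if not raw:
        raise StorageConfigError(
            f"'{env_name}' is unset or empty. Expected: {EXPECTED}"
        )
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as e:
        raise StorageConfigError(f"'{env_name}' is not valid JSON: {e}") from e
    if not isinstance(config, dict):
        raise StorageConfigError(f"'{env_name}' must be a JSON object.")
    provider = config.get("provider")
    if not isinstance(provider, str) or provider not in KNOWN_PROVIDERS:
        raise StorageConfigError(f"'{env_name}' has an invalid provider: {provider!r}")
    credentials = config.get("credentials", {})
    if not isinstance(credentials, dict):
        raise StorageConfigError(
            f"'{env_name}' field 'credentials' must be a JSON object."
        )
    return provider, credentials
```

Passing a plain dict as `environ` makes the function easy to exercise without touching the process environment. As in the commit, only configuration problems are converted into the config error, so anything raised later by a storage constructor would surface unchanged.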
workers/sample.env

Lines changed: 6 additions & 0 deletions
@@ -316,6 +316,12 @@ UNSTRACT_RUNNER_API_BACKOFF_FACTOR=3
 WORKFLOW_EXECUTION_FILE_STORAGE_CREDENTIALS='{"provider": "minio", "credentials": {"endpoint_url": "http://unstract-minio:9000", "key": "minio", "secret": "minio123"}}'
 API_FILE_STORAGE_CREDENTIALS='{"provider": "minio", "credentials": {"endpoint_url": "http://unstract-minio:9000", "key": "minio", "secret": "minio123"}}'
 
+# Remote storage for Prompt Studio / IDE flows. Must match backend/sample.env.
+# Required by executor and ide-callback workers; missing/empty values raise FileStorageError in EnvHelper.get_storage().
+PERMANENT_REMOTE_STORAGE='{"provider": "minio", "credentials": {"endpoint_url": "http://unstract-minio:9000", "key": "minio", "secret": "minio123"}}'
+TEMPORARY_REMOTE_STORAGE='{"provider": "minio", "credentials": {"endpoint_url": "http://unstract-minio:9000", "key": "minio", "secret": "minio123"}}'
+REMOTE_PROMPT_STUDIO_FILE_PATH=unstract/prompt-studio-data
+
 # File Execution Configuration
 WORKFLOW_EXECUTION_DIR_PREFIX=unstract/execution
 API_EXECUTION_DIR_PREFIX=unstract/api

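Because the comments in both sample files require the values to match, a small consistency check may help catch drift between them. The sketch below is a hypothetical helper, not something shipped in the repository: `parse_env`, `SHARED_KEYS`, and `mismatched_keys` are names invented here, and the parser handles only simple KEY=VALUE lines with optional surrounding quotes.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines; skip blanks and comments, strip matching quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Keys that the sample-file comments say must agree across both files.
SHARED_KEYS = (
    "INTERNAL_SERVICE_API_KEY",
    "PERMANENT_REMOTE_STORAGE",
    "TEMPORARY_REMOTE_STORAGE",
    "REMOTE_PROMPT_STUDIO_FILE_PATH",
)


def mismatched_keys(backend_env: str, workers_env: str) -> list:
    """Return shared keys that are empty in backend or differ in workers."""
    backend, workers = parse_env(backend_env), parse_env(workers_env)
    return [
        key
        for key in SHARED_KEYS
        if not backend.get(key) or backend.get(key) != workers.get(key)
    ]
```

Called with the contents of backend/.env and workers/.env, an empty list would indicate that the shared keys agree. Quoted and unquoted forms of the same value compare equal because the parser strips matching quotes.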