
Fix remote-ingest worker: GPU acceleration docs, Celery prereqs, embedder env-var bug #2069

Open
JSv4 wants to merge 1 commit into main from feature/remote-ingest-doc-fixes

JSv4 commented Jun 25, 2026

Collaborator

Summary

  • Bug fix: API_KEY → VECTOR_EMBEDDER_API_KEY in scripts/remote_ingest/remote_worker.yml. The embedder image reads VECTOR_EMBEDDER_API_KEY but the compose file was setting API_KEY, causing HTTP 401 on every embedding call. Documents landed with no embeddings and no obvious error.
  • New doc docs/upload_methods/worker_celery_setup.md: ops reference for the Celery workers + queues a target must run so worker uploads become documents. Covers the #1 silent failure (uploads accepted as 202 but stay PENDING forever when no worker consumes worker_uploads), Beat schedule, scaling/HA, key settings, verification commands.
  • GPU acceleration docs: how to merge the accel override for the remote worker, NVIDIA/ROCm device-passthrough instructions with commented stanzas in remote_worker.accel.yml and compose/accelerated/accel.override.yml, note to benchmark on your specific hardware.
  • RAM sizing: CPU OCR is serial + 3-6 GB per in-flight parse; size --max-workers to available RAM, not CPU count.
  • Mint token fix: examples now show the command runs inside the Django container (docker compose run --rm django python manage.py mint_worker_token …).
  • Troubleshooting tables added to both READMEs and remote_ingest_worker.md.
  • Minor: --insecure/--max-attempts flags documented, bench_parse.py build-context comment fixed, Intel NPU device note added.
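
The RAM sizing bullet above can be read as simple arithmetic. The sketch below is illustrative only: the function name `max_workers_for_ram`, the 2 GB headroom reserve, and the choice of the 6 GB upper bound as the per-parse cost are assumptions, not values taken from the PR's docs.

```python
def max_workers_for_ram(
    available_gb: float,
    per_parse_gb: float = 6.0,   # assumption: upper end of the 3-6 GB range
    reserve_gb: float = 2.0,     # assumption: headroom for the OS and other services
) -> int:
    """Estimate a --max-workers value from available RAM rather than CPU count."""
    usable_gb = max(available_gb - reserve_gb, 0.0)
    # Floor division: a partially fitting parse does not count; always allow one worker.
    return max(int(usable_gb // per_parse_gb), 1)


if __name__ == "__main__":
    for ram_gb in (16, 32, 64):
        print(f"{ram_gb} GB RAM -> --max-workers {max_workers_for_ram(ram_gb)}")
```

Sizing against the upper bound is the conservative choice; a host that benchmarks closer to 3 GB per parse could pass a smaller `per_parse_gb`.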

Test plan

  • Verify remote_worker.yml embedder service picks up VECTOR_EMBEDDER_API_KEY (not API_KEY)
  • Read through new worker_celery_setup.md for accuracy against config/settings/base.py Beat schedule and task routes
  • Confirm mkdocs.yml nav entry and docs/upload_methods/index.md link resolve correctly in the docs build

…var bug

- Fix silent embedding failure: `API_KEY` env var renamed to
  `VECTOR_EMBEDDER_API_KEY` in `scripts/remote_ingest/remote_worker.yml`
  to match what the embedder image actually reads (mismatch caused HTTP 401
  on every embed, landing docs with no embeddings and no obvious error)

- New doc `docs/upload_methods/worker_celery_setup.md`: ops reference for
  the Celery workers + queues a target must run so worker uploads become
  documents (the #1 silent failure: uploads accepted as 202 but stay PENDING
  forever when no worker consumes the `worker_uploads` queue)

- Add GPU acceleration section to `remote_ingest_worker.md` and
  `scripts/remote_ingest/README.md`: how to merge the accel override,
  NVIDIA/ROCm device-passthrough instructions (commented stanzas in both
  `remote_worker.accel.yml` and `compose/accelerated/accel.override.yml`),
  and a reminder to benchmark on your specific hardware

- RAM sizing guidance: CPU OCR is serial + 3-6 GB per in-flight parse;
  size `--max-workers` to available RAM, not CPU count

- Fix `mint_worker_token` examples to show it runs inside the Django
  container (`docker compose run --rm django python manage.py …`)

- Clarify `VECTOR_EMBEDDER_API_KEY`: one value wired to both the embedder
  service and the worker; mismatch → HTTP 401; any value works as long as
  both sides match

- Document `--insecure` and `--max-attempts` flags in the flag reference

- Add troubleshooting tables to both READMEs and `remote_ingest_worker.md`

- Fix `bench_parse.py` build-context comment (correct `-f` + context path)

- Add Intel NPU `/dev/accel/accel0` note to `compose/accelerated/README.md`

Closes #2067

claude Bot commented Jun 25, 2026

Code Review

This PR does three things: fixes the API_KEY → VECTOR_EMBEDDER_API_KEY env-var name mismatch that caused silent HTTP 401 failures on every embedding call, adds a new Celery ops reference doc, and expands the GPU acceleration / troubleshooting docs. The documentation content is accurate: Beat schedules, queue routing, Celery settings, and scaling guidance all match the actual config/settings/base.py. A few actionable items below.


1. Empty-string default for VECTOR_EMBEDDER_API_KEY may still cause 401 when the variable is unset

scripts/remote_ingest/remote_worker.yml — embedder service

# current (after fix)
VECTOR_EMBEDDER_API_KEY: ${VECTOR_EMBEDDER_API_KEY:-}

The fix correctly wires the right env-var name to the embedder container, but the fallback is an empty string. If the embedder's entrypoint resolves the key with a falsy-default pattern (os.getenv("VECTOR_EMBEDDER_API_KEY") or "abc123" rather than os.getenv(..., "abc123")), an empty string in the container is still falsy, so the embedder silently resets to "abc123". The worker container gets "" from the same shell fallback, sends X-API-Key: "", and hits 401 — the original failure mode, just in a narrower scenario (only when the user forgets to export the variable).

compose/accelerated/accel.override.yml already uses the safer default:

VECTOR_EMBEDDER_API_KEY: ${VECTOR_EMBEDDER_API_KEY:-abc123}

Using the same :-abc123 fallback in remote_worker.yml (and remote_worker.accel.yml, which also uses :-) aligns all three files with the embedder image's documented default, making the "forgot to export" path safe regardless of which Python resolution pattern the embedder uses.
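
The scenario in item 1 can be modelled with plain dictionaries standing in for environments. This is a sketch under stated assumptions: the helper names are hypothetical, the two embedder functions represent the two possible resolution patterns named above (which one the real image uses is not confirmed here), and the worker is assumed to send whatever value Compose injected.

```python
from typing import Dict, Optional

KEY = "VECTOR_EMBEDDER_API_KEY"


def compose_default(host_env: Dict[str, str], name: str, default: str) -> str:
    """Emulate Compose's ${NAME:-default}: the default applies when unset OR empty."""
    value: Optional[str] = host_env.get(name)
    return value if value else default


def embedder_key_falsy(container_env: Dict[str, str]) -> str:
    # Hypothetical pattern: os.getenv(KEY) or "abc123" (empty string is falsy).
    return container_env.get(KEY) or "abc123"


def embedder_key_default_arg(container_env: Dict[str, str]) -> str:
    # Hypothetical pattern: os.getenv(KEY, "abc123") (empty string is kept).
    return container_env.get(KEY, "abc123")


def keys_match(host_env: Dict[str, str], fallback: str) -> Dict[str, bool]:
    """Report whether the worker's key equals the embedder's under each pattern."""
    injected = compose_default(host_env, KEY, fallback)
    container_env = {KEY: injected}   # both services receive the same injected value
    worker_sends = injected           # assumption: worker forwards it as X-API-Key
    return {
        "falsy_pattern": embedder_key_falsy(container_env) == worker_sends,
        "default_arg_pattern": embedder_key_default_arg(container_env) == worker_sends,
    }


if __name__ == "__main__":
    print("unset, ':-'       ->", keys_match({}, ""))
    print("unset, ':-abc123' ->", keys_match({}, "abc123"))
    print("exported value    ->", keys_match({KEY: "s3cret"}, ""))
```

With the variable unset and an empty fallback, the keys diverge only under the falsy pattern; with the `:-abc123` fallback they agree under both, which is the point of the suggestion.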


2. RENDER_GID export appears before the NVIDIA/AMD warning in the GPU section

docs/upload_methods/remote_ingest_worker.md — GPU acceleration (optional)

The section opens with a runnable code block:

export RENDER_GID=$(stat -c '%g' /dev/dri/renderD128)   # Intel/AMD; host-specific
docker compose -f remote_worker.yml -f remote_worker.accel.yml build …

The "Intel/AMD; host-specific" comment is easy to miss inside a code block. The prose warning that NVIDIA/AMD/ROCm hosts must edit the device passthrough comes after the block. On an NVIDIA host, /dev/dri/renderD128 may not exist, so the stat call fails, the export still succeeds with an empty RENDER_GID, and the user continues with a broken value. Moving the NVIDIA caveat to before the code block, or splitting into two clearly labelled code blocks ("Intel/AMD" vs "NVIDIA / AMD ROCm"), would prevent the confusion.
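
A guard with an explicit failure path is one way to keep a missing render node from going unnoticed. The sketch below is not part of the PR; `render_gid` is a hypothetical helper that reads the same group ID as `stat -c '%g'` and returns `None` when the device node is absent.

```python
import os
import sys
from typing import Optional


def render_gid(device: str = "/dev/dri/renderD128") -> Optional[int]:
    """Return the owning group ID of the render node, or None if it does not exist."""
    try:
        return os.stat(device).st_gid
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    gid = render_gid()
    if gid is None:
        # Fail loudly instead of exporting an empty RENDER_GID.
        sys.exit("No /dev/dri/renderD128 found; check the device-passthrough notes for this host.")
    print(f"export RENDER_GID={gid}")
```

The printed line is meant to be evaluated by the calling shell; on a host without the device the script exits non-zero before anything is exported.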


3. --max-attempts flag documented in scripts README but not in the docs-site page

docs/upload_methods/remote_ingest_worker.md adds --insecure to its flag summary but omits the --max-attempts N flag that scripts/remote_ingest/README.md documents in the same PR. The two pages don't need to be identical, but a user who reads only the docs-site page will not know that stuck-PARKED documents are controlled by this flag, or that a PARKED status exists at all. Adding a one-liner for --max-attempts (or a note that PARKED is the terminal retry state) closes the gap.


Minor nit — accel.override.yml comment structure for AMD

In both remote_worker.accel.yml and compose/accelerated/accel.override.yml, the AMD stanza comment shows:

#   group_add: ["video", "render"]

while the surrounding file uses block-style lists (- item). When a user uncomments this line it will work (YAML flow style is valid), but it can look inconsistent next to the existing group_add in the base file. A block-style comment (# - "video" / # - "render") would match the rest of the file's formatting and reduce copy-paste confusion.


Summary

The core bug fix is correct and the docs content is accurate. Items 1 and 2 above are the most actionable: Item 1 is a latent regression risk (wrong-key-when-unset in a narrower scenario than the original bug); Item 2 is a usability trap for NVIDIA users following the GPU section step by step.


codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
