Worker: allow multiple workers to safely share a repo directory

## Problem

If you set `WORKER_REPO_DIR` to a path on a shared volume (e.g. an NFS mount,
a Kubernetes PVC, or a shared Docker volume), the intent is obvious: persist
the adaptor cache so workers don't re-download everything on every restart,
and share it across multiple workers so we're not duplicating gigabytes per pod.

This *almost* works today. A single worker pointed at a persistent volume
behaves correctly — restarts are fast, no redundant installs. But the moment
you point more than one worker at the same volume, things go sideways:

- The in-memory install queue in engine-multi only serialises installs within
  one process. Two workers seeing the same uninstalled adaptor will both shell
  out to `npm install` against the same `node_modules/` and `package.json`.
- npm has no awareness of other npm processes writing the same tree. Result:
  corrupt `node_modules`, half-written `package.json`, runs failing with
  module resolution errors that are a pain to diagnose (see #503).
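
To make the first bullet concrete, here is a minimal sketch of how an in-process queue of this kind typically works. It is a hypothetical stand-in, not the actual engine-multi code: `tail` and `enqueueInstall` are names invented for illustration.

```typescript
// Hypothetical stand-in for an in-process install queue; not the engine-multi code.
// `tail` is plain module state, so it exists once per Node process.
let tail: Promise<void> = Promise.resolve();

// Chains each install behind the previous one, so installs started from
// THIS process never overlap.
function enqueueInstall(install: () => Promise<void>): Promise<void> {
  const next = tail.then(install);
  // Keep the chain usable even if an install rejects.
  tail = next.catch(() => undefined);
  return next;
}
```

Because `tail` lives in one process's memory, a second worker has its own independent chain, and nothing stops the two chains from running `npm install` at the same time.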

## What we want

Multiple workers should be able to share one repo directory safely. The common
case (adaptor already installed) should stay fast — no coordination overhead
when there's nothing to do. The rarer, one-off case (something needs installing)
should end up with a single install across all the workers sharing the volume,
with the others waiting for it to finish.
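
One possible shape for this, as a rough sketch rather than a proposed implementation: check for the adaptor without touching any lock, and only fall back to a filesystem lock when an install is needed. The lock here is a directory created with `mkdir`, which fails with `EEXIST` if it already exists. The names `ensureInstalled`, `isInstalled`, `.install.lock` and the `node_modules/<adaptor>/package.json` check are all assumptions made for illustration, and how reliably `mkdir` excludes other hosts depends on the shared filesystem.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed layout: an adaptor counts as installed once its package.json exists.
// A real check would probably want an explicit "install finished" marker.
const isInstalled = (repoDir: string, adaptor: string): boolean =>
  fs.existsSync(path.join(repoDir, "node_modules", adaptor, "package.json"));

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function ensureInstalled(
  repoDir: string,
  adaptor: string,
  install: () => Promise<void>, // e.g. a wrapper that shells out to npm
  pollMs = 200,
): Promise<"cached" | "installed" | "waited"> {
  // Fast path: nothing to do, so no lock is created or read.
  if (isInstalled(repoDir, adaptor)) return "cached";

  const lockDir = path.join(repoDir, ".install.lock"); // hypothetical lock name
  for (;;) {
    try {
      fs.mkdirSync(lockDir); // throws EEXIST while another worker holds the lock
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      await sleep(pollMs); // someone else is installing; wait and retry
      continue;
    }
    try {
      // Re-check under the lock: another worker may have installed it meanwhile.
      if (isInstalled(repoDir, adaptor)) return "waited";
      await install();
      return "installed";
    } finally {
      fs.rmdirSync(lockDir); // release the lock
    }
  }
}
```

This sketch deliberately ignores the crash case: if the lock holder dies before the `finally` runs, the lock directory stays behind and every waiter polls forever.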

Crash safety matters: if a worker dies mid-install, the next one shouldn't be
stuck forever waiting on a ghost.
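
A common way to handle this with a filesystem-only lock is a heartbeat: the holder keeps refreshing the lock's mtime while it works, and a waiter treats a lock that has gone quiet for too long as abandoned. The sketch below assumes a lock represented as a directory; `isStale`, `heartbeat`, `breakStaleLock` and the `staleMs` threshold are hypothetical names, not existing worker APIs. It also assumes clocks on the hosts sharing the volume are roughly in sync, and it leaves a small window in which a waiter could break a lock that was re-acquired just after the staleness check.

```typescript
import * as fs from "node:fs";

// True if the lock directory exists and has not been touched for staleMs.
function isStale(lockDir: string, staleMs: number, now = Date.now()): boolean {
  try {
    return now - fs.statSync(lockDir).mtimeMs > staleMs;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return false; // no lock at all
    throw err;
  }
}

// The lock holder calls this periodically during an install to show it is alive.
function heartbeat(lockDir: string): void {
  const now = new Date();
  fs.utimesSync(lockDir, now, now);
}

// A waiter calls this between polls. Returns true if it removed an abandoned lock.
function breakStaleLock(lockDir: string, staleMs: number): boolean {
  if (!isStale(lockDir, staleMs)) return false;
  const grave = `${lockDir}.stale.${process.pid}.${Date.now()}`;
  try {
    // Rename first so that, of several waiters, only one wins the takeover.
    fs.renameSync(lockDir, grave);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return false; // already broken
    throw err;
  }
  fs.rmSync(grave, { recursive: true, force: true });
  return true;
}
```

The threshold is a trade-off: it has to be comfortably longer than the heartbeat interval, but short enough that a crashed worker does not block installs for long.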

## Out of scope

- Anything that requires external coordination (Redis, etcd, etc.). Keep it
  filesystem-only so it works wherever a shared volume works.
- Changing the CLI's behaviour.
- Garbage-collecting old adaptor versions from the cache.

## Related

- #919 — ephemeral storage pressure in workers. A shared persistent cache is
  one way to take pressure off pod-local disk, so this work helps there too.
- #73 — closed, but the in-process precedent. This is essentially the
  multi-process / multi-host version of that.


Worker: allow multiple workers to safely share a repo directory #1414
