Worker: allow multiple workers to safely share a repo directory #1414

@stuartc

Problem

If you set WORKER_REPO_DIR to a path on a shared volume (e.g. an NFS mount, k8s PVC, or a shared docker volume), the intent is obvious: persist the adaptor
cache so workers don't re-download everything on every restart, and share it
across multiple workers so we're not duplicating gigabytes per pod.

This almost works today. A single worker pointed at a persistent volume
behaves correctly — restarts are fast, no redundant installs. But the moment
you point more than one worker at the same volume, things go sideways:

  • The in-memory install queue in engine-multi only serialises installs within
    one process. Two workers seeing the same uninstalled adaptor will both shell
    out to npm install against the same node_modules/ and package.json.
  • npm has no awareness of other npm processes writing the same tree. Result:
    corrupt node_modules, half-written package.json, runs failing with
    module resolution errors that are a pain to diagnose (see #503,
    "Autoinstall is breaking ?").

What we want

Multiple workers should be able to share one repo directory safely. The common
case (adaptor already installed) should stay fast, with no coordination overhead
when there's nothing to do. The once-per-adaptor case (something needs
installing) should result in a single install across all the workers sharing
the volume, with the others waiting for it to finish.
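
A rough sketch of what that could look like, not a proposed implementation. It
assumes mkdir is atomic on the shared volume (true for local filesystems and
generally expected on NFS, but worth verifying per deployment). The names
tryLock, unlock, ensureInstalled and the .install.lock directory are
hypothetical, and isInstalled / install are stand-ins for whatever the engine
actually does. The lock is repo-wide because installs touch the shared
package.json and node_modules/.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical lock location inside the shared repo dir.
export const lockPath = (repoDir: string): string =>
  path.join(repoDir, ".install.lock");

// Try to take the repo-wide lock. mkdir fails with EEXIST if another
// worker already holds it, so exactly one caller gets `true`.
export function tryLock(repoDir: string): boolean {
  try {
    fs.mkdirSync(lockPath(repoDir));
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

export function unlock(repoDir: string): void {
  fs.rmdirSync(lockPath(repoDir));
}

export async function ensureInstalled(
  repoDir: string,
  isInstalled: () => boolean, // stand-in for the engine's own check
  install: () => Promise<void>, // stand-in for the real npm install
  pollMs = 250,
): Promise<"cached" | "installed" | "waited"> {
  // Fast path: nothing to do, so the lock is never touched.
  if (isInstalled()) return "cached";

  let waited = false;
  while (!tryLock(repoDir)) {
    // Another worker is installing: poll until it finishes.
    waited = true;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    if (isInstalled()) return "waited";
  }

  try {
    // Re-check under the lock in case someone installed it in the meantime.
    if (isInstalled()) return waited ? "waited" : "cached";
    await install();
    return "installed";
  } finally {
    unlock(repoDir);
  }
}
```

As written, a waiter polls forever if the lock holder dies without releasing
the lock, so this only covers the happy path.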

Crash safety matters: if a worker dies mid-install, the next one shouldn't be
stuck forever waiting on a ghost.
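
One filesystem-only way to detect a ghost, sketched under assumptions: the lock
is a directory, the holder refreshes its mtime on a timer while installing, and
waiters treat a lock whose mtime is older than some TTL as abandoned. A PID
check is not used because workers may be on different hosts or pods. All names
here (heartbeat, isStale, breakIfStale) are hypothetical. The sketch also
assumes clocks on the workers and the file server are roughly in sync, and it
does not close the window between the staleness check and the rename, so the
TTL should be much longer than the heartbeat interval.

```typescript
import * as fs from "node:fs";

// Holder side: call on a timer while the install is running.
export function heartbeat(lockDir: string, now: Date = new Date()): void {
  fs.utimesSync(lockDir, now, now);
}

// Waiter side: a lock is stale if its mtime is older than ttlMs.
export function isStale(
  lockDir: string,
  ttlMs: number,
  nowMs: number = Date.now(),
): boolean {
  try {
    return nowMs - fs.statSync(lockDir).mtimeMs > ttlMs;
  } catch (err) {
    // Lock already released or removed: nothing to break.
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return false;
    throw err;
  }
}

// Remove a stale lock. The rename means only one waiter wins when
// several notice the stale lock at the same moment.
export function breakIfStale(
  lockDir: string,
  ttlMs: number,
  nowMs: number = Date.now(),
): boolean {
  if (!isStale(lockDir, ttlMs, nowMs)) return false;
  const tombstone = `${lockDir}.stale-${process.pid}-${nowMs}`;
  try {
    fs.renameSync(lockDir, tombstone);
  } catch (err) {
    // Another waiter got there first.
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return false;
    throw err;
  }
  fs.rmSync(tombstone, { recursive: true, force: true });
  return true;
}
```

After breakIfStale returns true the waiter still has to acquire the lock in the
normal way; it may lose that race to another worker, which is fine.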

Out of scope

  • Anything that requires external coordination (Redis, etcd, etc.). Keep it
    filesystem-only so it works wherever a shared volume works.
  • Changing the CLI's behaviour.
  • Garbage-collecting old adaptor versions from the cache.
