
Add durable per-worker num and execution context #61

Open
udipl wants to merge 5 commits into socketry:main from udipl:worker-instance-num-context

Conversation

@udipl udipl commented Jun 1, 2026

Problem

A container spawns workers but exposes nothing to a worker about which worker it is. The Child::Instance handed to the run block carries only name. Under a real deployment (Rails on Falcon, where the app never calls container.run itself), code that needs a stable per-worker identifier has nothing to key on.

The concrete driver is GitLab's prometheus-client-mmap, whose multiprocess mode writes one set of mmap files per process, keyed by a configurable pid_provider. For that keying to behave, the identifier needs two properties at once:

  • Durable — a worker keeps the same identifier across a restart, so its metrics carry over a re-fork instead of fragmenting into a fresh file each time.
  • Bounded in cardinality — drawn from a small fixed set (0..N-1) rather than the open-ended space of OS PIDs, so the number of mmap files (and the metric series they back) stays constant instead of growing with every respawn.

A worker ordinal that is retained across a restart and recycled only once its worker has exited for good satisfies both; a PID satisfies neither.
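
To make the cardinality argument concrete, here is a minimal simulation. It is a sketch only: the `metrics_*` key names and the `distinct_keys` helper are hypothetical and do not reflect prometheus-client-mmap's actual file naming, and it assumes the OS never reuses a PID during the run.

```ruby
require "set"

# Hypothetical simulation: count the distinct per-worker keys produced when
# `workers` slots are each started once and then restarted `respawns` times.
def distinct_keys(workers:, respawns:)
  next_pid = 1000
  pid_keys = Set.new
  ordinal_keys = Set.new

  workers.times do |ordinal|
    (respawns + 1).times do
      pid_keys << "metrics_#{next_pid}"     # every fork gets a fresh PID
      ordinal_keys << "metrics_#{ordinal}"  # the slot keeps its ordinal
      next_pid += 1
    end
  end

  {pid: pid_keys.size, ordinal: ordinal_keys.size}
end

distinct_keys(workers: 4, respawns: 10)
# => {pid: 44, ordinal: 4}
```

Keyed by PID, the number of files grows with every respawn; keyed by ordinal, it stays at the worker count.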

The natural access point already exists: async-service's Managed::Environment#prepare!(instance) is called with the container instance at worker entry. What's missing is a durable ordinal on that instance, and — for Hybrid — a way to reach the process number, since Hybrid#run yields the inner thread instance to the block (the fork instance is consumed internally), so the thread worker can't see its fork's number.

What this adds

container.spawn do |instance|
  instance.num      # container-scoped ordinal (Integer), stable across restart
  instance.kind     # :process | :thread
  instance.parent   # the worker this is nested inside (a Hybrid thread => its fork), else nil
  instance.context  # [Frame(:process, 3, name), Frame(:thread, 0, name)] for a Hybrid thread
end

child.instance_num  # parent side: the ordinal of the worker a child represents

Async::Container::Frame = Data.define(:kind, :num, :name). context is built from the object graph (instance + parent chain) — no process- or thread-global state.
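
As a rough illustration of how a context can be derived from the parent chain alone, the sketch below rebuilds the idea outside the gem. `SketchInstance` is a hypothetical stand-in for Child::Instance, and the `Context` module body is an assumption based on the description above, not the PR's code. It needs Ruby 3.2+ for `Data.define`.

```ruby
Frame = Data.define(:kind, :num, :name)

# Assumed shape of the mixin: a parent link plus a recursive `context`.
module Context
  attr_accessor :parent

  # Outermost frame first: the parent's frames, then this instance's own.
  def context
    frames = parent ? parent.context : []
    frames + [Frame.new(kind: kind, num: num, name: name)]
  end
end

# Hypothetical stand-in for Child::Instance.
class SketchInstance
  include Context
  attr_reader :kind, :num, :name

  def initialize(kind, num, name)
    @kind = kind
    @num = num
    @name = name
  end
end

fork_instance = SketchInstance.new(:process, 3, "web")
thread_instance = SketchInstance.new(:thread, 0, "web")
thread_instance.parent = fork_instance

thread_instance.context.map { |frame| [frame.kind, frame.num] }
# => [[:process, 3], [:thread, 0]]
```

Because the frames are computed from the instance and its parent on demand, nothing has to be stored in process- or thread-global state.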

Implementation

  • Generic: container-scoped allocator built from a monotonic counter plus a Set free-list. acquire reuses the lowest released num; release adds to the set, so a double-release can't hand the same num to two workers. spawn allocates before the fiber, so the num is captured in the closure and is unchanged when a restart: true worker re-enters start; it releases in the fiber's ensure, only on permanent exit, and only for nums it allocated. Allocation runs on the single reactor thread, so no synchronisation is needed.
  • context.rb (new): Frame and a Context mixin (parent accessor + recursive context), included into each Child::Instance.
  • Forked / Threaded: instance_num threaded through start → Child.fork → Instance.for; Instance#num, Instance#kind, num added to as_json; Child#instance_num on the parent side. Signal and handle_interrupt paths are unchanged.
  • Hybrid: Hybrid#run sets worker.parent = <fork instance> on each inner thread worker, so a Hybrid thread's context is [process, thread] and its durable fork number is instance.parent.num.
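
The allocator behaviour described for Generic can be sketched in isolation. `OrdinalAllocator` is a hypothetical name and this is not the PR's implementation; like the description, it assumes single-threaded use and so does no locking.

```ruby
require "set"

# Hypothetical stand-in for the counter-plus-free-list allocator.
class OrdinalAllocator
  def initialize
    @next = 0        # monotonic counter for nums never handed out
    @free = Set.new  # released nums available for reuse
  end

  # Reuse the lowest released num if there is one, else mint a new one.
  def acquire
    if (num = @free.min)
      @free.delete(num)
      num
    else
      num = @next
      @next += 1
      num
    end
  end

  # Set semantics make a repeated release of the same num a no-op.
  # Nums that were never allocated are ignored.
  def release(num)
    @free.add(num) if num < @next
    nil
  end
end

allocator = OrdinalAllocator.new
3.times.map { allocator.acquire }  # => [0, 1, 2]
allocator.release(1)
allocator.release(1)               # double release, still one free entry
allocator.acquire                  # => 1
allocator.acquire                  # => 3
```

Using a Set rather than an Array for the free-list is what keeps the double release from queueing the same num twice.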

Tests

  • Allocator: sequential allocation, lowest-num reuse after release, idempotent double-release, no allocation for a mark?-reused keyed child, num/kind visible in Forked (:process) and Threaded (:thread) workers, num preserved across a restart.
  • Context: [process] / [thread] with parent == nil for plain containers; [process, thread] under Hybrid; a 2-fork Hybrid where both workers are thread/0 but reach distinct parent nums process/0 / process/1 — i.e. parent.num is the fork number, not the thread number.

Scope / notes

  • nums are container-global, assigned in spawn order — not per-service 0..N-1. Compose with name at a higher layer for per-service numbering.
  • The exec path (Forked::Child.spawn) doesn't carry a num — it bypasses Instance.for. Left as-is.
  • Reuse can hand a dead worker's num to a new worker within one container lifetime; consumers needing exact isolation should fold a generation token into their key.
  • Deliberately no module-level/ambient lookup here. An Async::Container.context convenience for code with no instance handle could be a separate follow-up; it would need a process-global + thread-variable and isn't required for the prepare!-based use case above.
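
One way a consumer might fold a generation token into its key, as suggested in the notes above, is sketched here. `GenerationKeys` and the key format are hypothetical and not part of this PR.

```ruby
# Hypothetical helper: counts how many times each num has been handed to a
# new worker and embeds that count in the key.
class GenerationKeys
  def initialize
    @generations = Hash.new(0)
  end

  # Call once per new worker, not when the same worker restarts.
  def key_for(name, num)
    generation = @generations[num]
    @generations[num] += 1
    "#{name}/#{num}/g#{generation}"
  end
end

keys = GenerationKeys.new
keys.key_for("web", 0)  # => "web/0/g0"
keys.key_for("web", 0)  # => "web/0/g1" (num 0 reused by a later worker)
```

The trade-off is that each reuse produces a new key, so exact isolation is bought at the cost of the bounded cardinality that a bare num provides.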

🤖 Generated with Claude Code

Containers expose nothing to a worker about which worker it is; the
Child::Instance handed to the run block carries only `name`. Code that
needs a stable per-worker identifier (e.g. a prometheus-client-mmap
pid_provider, which keys mmap files per process) has nothing to key on,
and under Falcon/async-service the app never holds the run block itself.

Such an identifier needs to be both durable across a restart (so metrics
survive a re-fork instead of fragmenting) and bounded in cardinality
(drawn from 0..N-1 rather than the open-ended PID space, so the file and
series count stays constant). A recycled worker ordinal satisfies both.

Add a container-scoped `num` allocated by Generic (a counter plus a Set
free-list; idempotent release), captured in the spawn closure so it is
unchanged when a `restart: true` worker re-enters `start`. Expose `num`
and `kind` on Child::Instance, `instance_num` on the parent-side Child,
and a `parent` link plus a `context` Frame stack built from the object
graph. Hybrid links each inner thread worker to its fork, so a Hybrid
thread can reach its durable process num via `instance.parent.num` with
no process- or thread-global state.

samuel-williams-shopify commented Jun 26, 2026

Contributor

I understand the problem, and I'm broadly open to exposing some kind of worker identity. I do want to push back slightly on using that identity as the primary mechanism for coordinating external resources.

Coordinating resources this way is error-prone because the source of truth is not the consumer of the data. We see similar classes of problems with process IDs, where kill/waitpid style coordination can race and accidentally target the wrong process.

An ordinal (num in the original commit) gives us a stable piece of topology state, and that can be useful. But I think it's important to distinguish "worker identity/topology" from "resource allocation". For resources like ports, files, mmap regions, etc., the more robust model is usually an explicit lease from the component that owns the resource.

For example, in async-utilization, workers publish metrics to a memory-mapped file, but a central coordinator owns the allocation:

  1. When a worker starts, it connects to the monitor and registers itself.
  2. The monitor allocates a memory-mapped region and tells the worker where to write.
  3. The worker updates that region.
  4. If the worker exits or crashes, the connection is disrupted, so the region can be reclaimed and reused.
  5. If the supervisor crashes, workers reconnect/register again and the allocation state is rebuilt.

That model makes the coordinator the source of truth for both allocation and reclamation.
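
The lease flow in steps 1 to 4 can be modelled in-process as below. This is a sketch under assumptions: `LeaseMonitor`, its method names, and the connection ids are hypothetical and are not async-utilization's API; a real monitor would sit behind a socket and hand out offsets into a memory-mapped file. Step 5 (rebuilding state after a supervisor crash) is not modelled.

```ruby
# Hypothetical in-process model of a coordinator that owns region leases.
class LeaseMonitor
  def initialize(region_count)
    @free = (0...region_count).to_a  # region indices not currently leased
    @leases = {}                     # connection id => region index
  end

  # Steps 1-2: a worker registers and is told which region to write to.
  # Registering twice on the same connection returns the same region.
  def register(connection_id)
    return @leases[connection_id] if @leases.key?(connection_id)

    region = @free.shift
    raise "no free regions" if region.nil?

    @leases[connection_id] = region
  end

  # Step 4: the connection dropped, so the region goes back to the pool.
  def disconnect(connection_id)
    region = @leases.delete(connection_id)
    @free.unshift(region) unless region.nil?
    region
  end
end

monitor = LeaseMonitor.new(2)
monitor.register(:worker_a)    # => 0
monitor.register(:worker_b)    # => 1
monitor.disconnect(:worker_a)  # => 0, region reclaimed
monitor.register(:worker_c)    # => 0, region reused
```

Allocation and reclamation both happen inside the monitor, so a worker never has to infer from its own identity which region it may write to.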

So my concern with this proposal is not the ordinal itself, but using implicit container state as a resource-allocation mechanism. I'm open to exposing a minimal ordinal (and possibly parent for Hybrid) as worker topology metadata, but I'd prefer we avoid growing this into a general context/resource-allocation API.

With that in mind, I think the smaller API of instance.ordinal plus instance.parent is closer to what I'd be comfortable merging, provided it does not overcomplicate the implementation or constrain us in the future as the internals evolve. As an alternative for the hybrid container, we could use nested ordinals, e.g. 0.0, 0.1, etc. This would be a bit easier to coordinate.
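
For illustration, a nested ordinal such as 0.1 could be derived from the smaller ordinal-plus-parent API roughly as follows. `SketchWorker` and `nested_ordinal` are hypothetical names, and the dotted format is only one possible reading of the suggestion.

```ruby
# Hypothetical stand-in for an instance exposing `ordinal` and `parent`.
SketchWorker = Struct.new(:ordinal, :parent)

# Walk the parent chain and join the ordinals, outermost first.
def nested_ordinal(instance)
  parts = []
  while instance
    parts.unshift(instance.ordinal)
    instance = instance.parent
  end
  parts.join(".")
end

fork_worker = SketchWorker.new(0, nil)
thread_worker = SketchWorker.new(1, fork_worker)

nested_ordinal(fork_worker)    # => "0"
nested_ordinal(thread_worker)  # => "0.1"
```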

@samuel-williams-shopify samuel-williams-shopify force-pushed the worker-instance-num-context branch from 34db040 to cf47c83 Compare June 26, 2026 01:18
Assisted-By: devx/c78f867c-4c73-40b2-a763-4a9332e15ef9
@samuel-williams-shopify samuel-williams-shopify force-pushed the worker-instance-num-context branch from cf47c83 to 3be05d0 Compare June 26, 2026 01:24
Assisted-By: devx/c78f867c-4c73-40b2-a763-4a9332e15ef9
Assisted-By: devx/c78f867c-4c73-40b2-a763-4a9332e15ef9
Assisted-By: devx/c78f867c-4c73-40b2-a763-4a9332e15ef9
@samuel-williams-shopify

Contributor

There is one more point worth mentioning. Async::Container::Controller supports blue-green restarts, and during a restart ordinals would overlap, i.e. the old process and the new process could hold the same ordinal. Depending on the usage, this may be desirable or undesirable, e.g. if you used ordinals to control port allocation, you probably want to reuse the same port... for resources like memory-mapped files, maybe not.


2 participants