
Parallel apply design

HostKit plans are deterministic data. Parallel apply should consume that data through an explicit execution graph rather than introducing hidden async behavior inside resource applicators.

This note describes the intended direction. It is not an implementation contract yet.

Goals

  • Preserve the existing Project -> Plan -> Apply model.
  • Keep %HostKit.Plan{} inspectable and deterministic.
  • Use HostKit.Plan.ExecutionGraph as the scheduler input.
  • Make --parallel 1 equivalent to today's serial apply semantics.
  • Make --parallel N execute only dependency-ready changes concurrently.
  • Keep progress mailbox-first through %HostKit.Apply.Event{}.
  • Keep down/rollback as ordinary down plans; graph construction handles reversed dependencies.

Non-goals

  • Do not start tasks opportunistically inside individual resource apply functions.
  • Do not hide deploy engines behind recipes or providers.
  • Do not make telemetry the primary apply API.
  • Do not require plan artifacts to embed scheduler state.

Execution graph

HostKit.Plan.ExecutionGraph.build/2 derives graph nodes from active create/update/delete changes and labels ordering edges with stable reasons such as:

  • declared depends_on,
  • parent directory,
  • owner/group account,
  • source input,
  • symlink target path,
  • systemd timer/service,
  • systemd file/path references,
  • readiness checks.

The graph also computes topological layers and cycle diagnostics. A future scheduler should reject cyclic graphs before any resource is applied.
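
As a rough illustration of what "topological layers and cycle diagnostics" could mean, here is a minimal sketch of layer-by-layer peeling (Kahn-style). The module `LayerSketch`, the `{from, to}` edge tuples, and the return shapes are assumptions made for this example and are not the HostKit.Plan.ExecutionGraph API. It also assumes every edge endpoint appears in the node list.

```elixir
defmodule LayerSketch do
  @moduledoc "Hypothetical helper for illustration; not part of HostKit."

  # nodes: list of node ids.
  # edges: list of {from, to}, meaning `from` must be applied before `to`.
  # Returns {:ok, layers} or {:error, {:cycle, unschedulable_ids}}.
  def layers(nodes, edges) do
    deps =
      Map.new(nodes, fn id ->
        {id, for({from, ^id} <- edges, into: MapSet.new(), do: from)}
      end)

    peel(deps, [])
  end

  defp peel(deps, acc) when map_size(deps) == 0, do: {:ok, Enum.reverse(acc)}

  defp peel(deps, acc) do
    ready = for {id, pending} <- deps, MapSet.size(pending) == 0, do: id

    case Enum.sort(ready) do
      [] ->
        # Nothing is ready but nodes remain: they sit on a cycle or downstream of one.
        {:error, {:cycle, deps |> Map.keys() |> Enum.sort()}}

      layer ->
        done = MapSet.new(layer)

        remaining =
          deps
          |> Map.drop(layer)
          |> Map.new(fn {id, pending} -> {id, MapSet.difference(pending, done)} end)

        peel(remaining, [layer | acc])
    end
  end
end
```

Sorting each layer keeps the output stable for a given input, which fits the goal of deterministic, inspectable plan data. Rejecting on `{:error, {:cycle, _}}` before starting any worker matches the "reject before any resource is applied" requirement.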

flowchart TD
  Plan[HostKit.Plan] --> Graph[Plan.ExecutionGraph]
  Graph --> Validate[acyclic?]
  Validate --> Ready[ready node queue]
  Ready --> Workers[bounded workers]
  Workers --> Events[Apply.Event timeline]
  Workers --> Results[apply results]

Scheduler model

A scheduler can maintain:

  • pending nodes,
  • running nodes,
  • completed nodes,
  • failed/skipped nodes,
  • dependency counts,
  • lock ownership.

At each step, it selects nodes whose dependencies are complete and whose locks are available, then runs up to the configured concurrency limit.
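
The selection step can be sketched as a pure function over scheduler state. This is only an illustration: `ReadySketch`, the `deps` map shape, and the argument order are hypothetical, and lock availability and failed nodes are deliberately left out to keep the example short.

```elixir
defmodule ReadySketch do
  @moduledoc "Hypothetical selection step for illustration; not part of HostKit."

  # deps:      %{node_id => [ids this node depends on]}
  # completed: MapSet of finished node ids
  # running:   MapSet of node ids currently being applied
  # limit:     configured concurrency (the value passed as `parallel`)
  def next_batch(deps, completed, running, limit) do
    free_slots = max(limit - MapSet.size(running), 0)

    deps
    |> Enum.filter(fn {id, needs} ->
      not MapSet.member?(completed, id) and
        not MapSet.member?(running, id) and
        Enum.all?(needs, &MapSet.member?(completed, &1))
    end)
    |> Enum.map(fn {id, _needs} -> id end)
    |> Enum.sort()
    |> Enum.take(free_slots)
  end
end
```

With `limit` set to 1 the function hands out one ready node at a time in sorted order. Whether that order matches today's serial apply order depends on how the real graph assigns node ids, which this sketch does not model.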

Conceptually:

HostKit.apply(plan, parallel: 1) # current serial behavior
HostKit.apply(plan, parallel: 8) # graph-scheduled behavior

The CLI shape can mirror that:

mix host_kit.apply --plan host_kit.plan.json --confirm --parallel 8

Locks and serialization

Dependency edges alone are not enough. Some operations must be serialized even when no dependency edge connects them, because they contend for a shared system facility.

Initial lock groups should likely include:

  • package manager operations,
  • systemd daemon reload and unit operations,
  • same exact filesystem path,
  • parent/child filesystem path mutations when not already expressed by edges,
  • same systemd unit,
  • commands that declare overlapping outputs/stamps.

Locks should be derived and inspectable rather than scattered across resource apply code.
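
One way to keep locks derived and inspectable is to compute a set of lock keys per change and compare sets before dispatch. The sketch below assumes a made-up change shape (`%{type: ..., path: ..., unit: ...}`) and made-up lock keys. Real HostKit changes will look different, and parent/child path overlap is not handled here.

```elixir
defmodule LockSketch do
  @moduledoc "Hypothetical lock derivation for illustration; not part of HostKit."

  # Each clause maps an (assumed) change shape to the lock keys it must hold.
  def lock_keys(%{type: :package}), do: MapSet.new([:package_manager])

  def lock_keys(%{type: :systemd_unit, unit: unit}),
    do: MapSet.new([:systemd, {:unit, unit}])

  def lock_keys(%{type: :file, path: path}), do: MapSet.new([{:path, path}])

  # Changes with no known contention take no locks.
  def lock_keys(_other), do: MapSet.new()

  # A node may start only if none of its keys are currently held.
  def available?(wanted, held), do: MapSet.disjoint?(wanted, held)
end
```

Because `lock_keys/1` is a pure function of the change, its output could be printed next to the graph for inspection instead of living inside resource apply code.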

Failure behavior

A failed node should prevent dependent nodes from running. Independent nodes may either continue or be cancelled depending on policy.

Potential policies:

  • fail_fast: true — stop scheduling new work after first failure.
  • fail_fast: false — continue independent branches and report blocked dependents.

The default should remain conservative until dogfooding proves otherwise.
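
Under either policy, the scheduler needs to know which nodes a failure blocks. A minimal sketch of that transitive computation follows. `FailureSketch` and the `deps` map shape are hypothetical and only illustrate the idea.

```elixir
defmodule FailureSketch do
  @moduledoc "Hypothetical failure propagation for illustration; not part of HostKit."

  # deps: %{node_id => [ids this node depends on]}
  # Returns the sorted ids of every node that directly or transitively
  # depends on `failed_id` and therefore must not run.
  def blocked_by(deps, failed_id) do
    deps
    |> expand(MapSet.new([failed_id]))
    |> MapSet.delete(failed_id)
    |> Enum.sort()
  end

  # Grow the tainted set until no new dependents are found (fixed point).
  defp expand(deps, tainted) do
    next =
      for {id, needs} <- deps,
          Enum.any?(needs, &MapSet.member?(tainted, &1)),
          into: tainted,
          do: id

    if MapSet.equal?(next, tainted), do: tainted, else: expand(deps, next)
  end
end
```

With `fail_fast: false`, nodes outside the returned list could keep running, and the list itself is what would be reported as blocked dependents.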

Events

Parallel apply needs timeline-capable events. Existing mailbox-first progress should remain primary:

%HostKit.Apply.Event{}

Future event payloads may need:

  • graph node id,
  • resource id,
  • action,
  • worker id,
  • started/completed timestamps,
  • dependency-blocked/skipped status,
  • lock wait metadata.

Telemetry may mirror these events, but callers should not need telemetry to observe apply progress.
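
To make "mailbox-first" concrete, the sketch below sends events as plain messages to a caller process and drains them in arrival order. `EventSketch` is a hypothetical stand-in carrying some of the fields listed above. It is not the real %HostKit.Apply.Event{} struct, and the `{:apply_event, event}` message shape is an assumption.

```elixir
defmodule EventSketch do
  @moduledoc "Hypothetical stand-in for illustration; not %HostKit.Apply.Event{}."

  defstruct [:node_id, :resource_id, :action, :worker_id, :status, :at]

  # A worker reports progress by sending a message to the caller's mailbox.
  def emit(pid, fields) do
    send(pid, {:apply_event, struct!(__MODULE__, fields)})
  end

  # The caller reads whatever has arrived so far, oldest first, without blocking.
  def drain(acc \\ []) do
    receive do
      {:apply_event, event} -> drain([event | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end
```

A caller could then rebuild a per-node timeline by grouping drained events on `node_id`, with no telemetry handler involved.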

Down plans

Down/rollback remains a plan. HostKit.Plan.down/2 returns %HostKit.Plan{}, and HostKit.Plan.ExecutionGraph.build/2 already reverses dependency direction for delete changes, so dependents are removed before the resources they rely on.

A scheduler should not need a rollback-specific graph type.

Rollout path

  1. Keep graph construction and --show-graph as inspection tools.
  2. Expand graph derivation through dogfooding.
  3. Add scheduler tests against synthetic plans.
  4. Add parallel: 1 scheduler path and prove it matches current serial behavior.
  5. Add bounded parallel: N execution behind explicit opt-in.
  6. Dogfood on local/safe targets before enabling for remote production use.