Skip to content

feat(longhorn): generalize install for non-AWS providers#270

Open
aktech wants to merge 5 commits into
mainfrom
feat/longhorn-shared-package
Open

feat(longhorn): generalize install for non-AWS providers#270
aktech wants to merge 5 commits into
mainfrom
feat/longhorn-shared-package

Conversation

@aktech
Copy link
Copy Markdown
Member

@aktech aktech commented Apr 28, 2026

Closes #269.

longhorn

Moves the Longhorn install out of the AWS provider into pkg/storage/longhorn/. The existing and hetzner providers now accept a longhorn: block with the same shape AWS already exposes.

cluster:
  existing:        # also: hetzner
    longhorn: {}   # defaults: enabled, 2 replicas

Default-enabled on AWS and Hetzner (both provision the cluster — RWO-only EBS / hcloud-volumes need an RWX class for shared workloads). Opt-in on existing (omitted block = no install) since the user's pre-existing cluster may already have its own StorageClass. enabled: false opts out anywhere.

Validated end-to-end on a Hetzner k3s cluster: nic deploy provisioned the iSCSI DaemonSet and the Longhorn release, downstream RWX PVCs bound to a Longhorn share-manager, and JupyterHub user pods mounted shared-storage cross-node — replacing the in-cluster NFS-on-RWO workaround that fails on overlayfs.

Lift the Longhorn install logic out of the AWS provider into a shared
package so the existing-cluster and hetzner providers can opt in. On
clusters without managed RWX (Hetzner hcloud-volumes is RWO-only,
on-prem k3s, kind/k3d), charts that need RWX previously had to fall
back to the in-cluster NFS-on-RWO workaround, which is fragile on
overlayfs hosts. Closes #269.

Shared package pkg/storage/longhorn:
- Install(ctx, kubeconfigBytes, *Config) — idempotent helm install or
  upgrade. iSCSI prerequisite DaemonSet runs on first install only;
  upgrade path skips it (saves the 3-minute readiness wait).
- Config + IsEnabled / Replicas helpers. nil *Config = do not install.

aws provider:
- Longhorn field type is now *longhorn.Config; aws.LonghornConfig is a
  type alias so existing yaml under cluster.aws.longhorn keeps working.
- LonghornEnabled() retains AWS's nil-block-as-enabled default.
- Provider.Deploy calls longhorn.Install directly.

existing provider:
- New opt-in longhorn block (nil = no install).
- Deploy installs Longhorn when enabled.
- GetStorageClass / InfraSettings.StorageClass returns 'longhorn' when
  Longhorn is enabled and storage_class is unset; explicit
  storage_class always wins.

hetzner provider:
- Same opt-in. Deploy reads the kubeconfig hetzner-k3s wrote and
  installs Longhorn after cluster create.
- InfraSettings.StorageClass flips from hcloud-volumes to longhorn
  when enabled.
@aktech aktech force-pushed the feat/longhorn-shared-package branch from 82ca29a to b798497 Compare April 28, 2026 13:59
@aktech aktech marked this pull request as ready for review April 30, 2026 11:06
aktech added 3 commits April 30, 2026 12:38
Inline the two construction sites — the helpers hid no unexported fields
(longhorn.Config exports all four) and the inline form is shorter.
Drop the duplicate storageClassLonghorn const now that the shared
package exports the same value.
hcloud-volumes is RWO-only, so charts that need RWX (jupyterhub
shared-storage, etc.) require Longhorn. Match the AWS pattern: install
unless 'longhorn.enabled: false' is set explicitly.
@viniciusdc viniciusdc added the needs: review 👀 This PR is complete and ready for reviewing label Apr 30, 2026
WithAction("installed").
WithMetadata("chart_version", ChartVersion))

return nil
Copy link
Copy Markdown
Contributor

@tylerpotts tylerpotts May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing an earlier comment of mine on this same hunk that suggested flipping defaultClass to false — that was the wrong direction; I deleted it and am posting this corrected version.

Hardcoding persistence.defaultClass: true is correct: when a user opts into Longhorn, Longhorn should be the cluster's default StorageClass. The actual gap is that this install path doesn't displace the previous default. On any cluster that already ships a default SC (Hetzner's hcloud-volumes, vSphere's thin, etc.), Longhorn's chart annotates longhorn as default but leaves the prior default's annotation in place, so two SCs end up flagged as default at once — which Kubernetes treats as undefined behavior (API server picks one nondeterministically and logs a warning).

Observed on a fresh Hetzner k3s cluster provisioned through this code path:

NAME                       PROVISIONER          AGE
hcloud-volumes (default)   csi.hetzner.cloud    12m
longhorn (default)         driver.longhorn.io   11m
longhorn-static            driver.longhorn.io   11m

Downstream PVCs without an explicit storageClassName then get sprayed across both classes by API-server roulette. AWS sidesteps this because gp2 isn't pre-annotated as default, but Hetzner (and most existing on-prem clusters with a pre-installed CSI) hit it.

Suggested fix

After the Helm install succeeds and the longhorn SC is created with the default annotation, list all StorageClasses and strip the default-class annotation from any other SC that still carries it. Existing PVCs are unaffected (they're already bound); only new PVCs without an explicit class are redirected to Longhorn — which is the intended behavior when Longhorn is enabled.

A small helper following the package's existing style (mirroring ensureISCSI / ensureISCSIWithClient):

// stripPreviousDefaultStorageClasses removes the
// `storageclass.kubernetes.io/is-default-class` annotation from every
// StorageClass other than keepName. Kubernetes treats multiple default SCs
// as undefined behavior, so after Longhorn registers itself as default we
// need to displace whatever class previously held that role (e.g.
// hcloud-volumes on Hetzner).
func stripPreviousDefaultStorageClasses(ctx context.Context, kubeconfigBytes []byte, keepName string) error {
	client, err := newK8sClient(kubeconfigBytes)
	if err != nil {
		return fmt.Errorf("failed to create Kubernetes client: %w", err)
	}
	return stripPreviousDefaultStorageClassesWithClient(ctx, client, keepName)
}

func stripPreviousDefaultStorageClassesWithClient(ctx context.Context, client kubernetes.Interface, keepName string) error {
	const defaultAnnotation = "storageclass.kubernetes.io/is-default-class"

	classes, err := client.StorageV1().StorageClasses().List(ctx, metav1.ListOptions{})
	if err != nil {
		return fmt.Errorf("failed to list StorageClasses: %w", err)
	}

	for i := range classes.Items {
		sc := &classes.Items[i]
		if sc.Name == keepName {
			continue
		}
		if sc.Annotations[defaultAnnotation] != "true" {
			continue
		}

		patch := []byte(`{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}`)
		if _, err := client.StorageV1().StorageClasses().Patch(ctx, sc.Name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			return fmt.Errorf("failed to clear default-class annotation on StorageClass %q: %w", sc.Name, err)
		}

		status.Send(ctx, status.NewUpdate(status.LevelInfo,
			fmt.Sprintf("Cleared default-class annotation from previous default StorageClass %q", sc.Name)).
			WithResource("storageclass").
			WithAction("updating"))
	}

	return nil
}

Then call it in Install right before the success status update — e.g. between the span.SetAttributes block and the status.Send(...LevelSuccess...) call near this line:

if err := stripPreviousDefaultStorageClasses(ctx, kubeconfigBytes, StorageClassName); err != nil {
	span.RecordError(err)
	return fmt.Errorf("failed to clear previous default StorageClass: %w", err)
}

(Worth doing the same in upgrade for clusters that gained Longhorn after a previous default was already in place.)

New imports needed: k8s.io/apimachinery/pkg/types (for MergePatchType); metav1 and kubernetes.Interface are already in the package via iscsi.go, and newK8sClient is reused as-is.

Copy link
Copy Markdown
Contributor

@dcmcand dcmcand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid extraction, ~400 lines of AWS-specific Longhorn install logic moved into pkg/storage/longhorn with the abstraction boundary respected (no provider imports leaking into storage). Tests are thorough: table-driven, fake kubernetes.Interface injection, iSCSI wait/timeout paths covered. The nil-receiver-safe IsEnabled() / Replicas() idiom is nice, and keeping the divergent enabled-by-default policies at the provider layer (rather than baking them into the shared package) is the right call.

Two blockers worth resolving before merge:

  1. longhorn.Install accepts *Config with no documented nil-precondition; span attrs are inconsistent on whether nil is possible.
  2. Hetzner reads the kubeconfig directly from disk while AWS and existing both go through p.GetKubeconfig - inconsistent and fragile.

Plus a few questions about destroy-path symmetry (ADR-0002 requires Longhorn uninstall before cluster teardown, and neither hetzner nor existing wire that up), the iSCSI-on-upgrade skip, and the silent DryRun return on existing.

One doc follow-up: ADR-0002 still says "Longhorn is installed via Helm inside the AWS provider's Deploy()" - worth adding a 2026-04 update or superseding ADR.

Comment thread pkg/storage/longhorn/install.go Outdated
span.SetAttributes(
attribute.String("chart_version", ChartVersion),
attribute.Int("replica_count", cfg.Replicas()),
attribute.Bool("dedicated_nodes", cfg != nil && cfg.DedicatedNodes),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker. The asymmetry here reveals that cfg could be nil:

attribute.Int("replica_count", cfg.Replicas()),           // nil-safe via receiver
attribute.Bool("dedicated_nodes", cfg != nil && cfg.DedicatedNodes),

In practice all current callers gate on IsEnabled() first, but Install is exported with no documented precondition. A nil cfg would silently install with defaults rather than erroring or no-oping. Either add an explicit early return at the top of Install, or document the precondition with a one-line comment on the function.

Comment thread pkg/provider/hetzner/provider.go Outdated
// CSI is RWO-only; Longhorn provides the RWX StorageClass that downstream
// charts (e.g. jupyterhub shared-storage for group dirs) need.
if hCfg.LonghornEnabled() {
kubeconfigBytes, err := os.ReadFile(kubeconfigPath) //nolint:gosec // Path constructed from known cache dir + project name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocker. AWS and existing both call p.GetKubeconfig(ctx, projectName, clusterConfig) here; only Hetzner reads kubeconfigPath directly via os.ReadFile. The path-variable approach is fragile to refactors and bypasses the abstraction. Provider.GetKubeconfig is already defined a few lines down (line 337) and does the same read with proper tracing. Use it for consistency.

span.RecordError(err)
return fmt.Errorf("failed to read kubeconfig for Longhorn install: %w", err)
}
if err := longhorn.Install(ctx, kubeconfigBytes, hCfg.Longhorn); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. ADR-0002 documents that "Longhorn must be uninstalled before the cluster is torn down. PVs backed by Longhorn can block node group deletion." Now that Hetzner installs Longhorn here, the Destroy path (line 245+) needs a symmetric uninstall before tofu destroy to avoid orphaned hcloud-volumes. The shared pkg/storage/longhorn doesn't currently export an Uninstall. Worth adding it and calling it from Destroy (and in existing's Destroy if Longhorn was installed). If asymmetry is intentional, please drop a comment.

Comment thread pkg/provider/aws/config.go Outdated
// LonghornConfig is kept as a package-local alias so existing yaml under
// `longhorn:` still unmarshals into the same shape; the underlying type now
// lives in pkg/storage/longhorn so non-AWS providers can share install logic.
type LonghornConfig = longhorn.Config
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. The comment says this alias exists "so existing yaml under longhorn: still unmarshals into the same shape", but YAML shape comes from the struct tags on longhorn.Config directly - the alias isn't required for unmarshaling and isn't referenced outside tests. Looks like dead surface area; consider removing.

// open-iscsi/iscsi-initiator-utils (Amazon Linux 2023, k3s minimal images).
func ensureISCSI(ctx context.Context, kubeconfigBytes []byte) error {
tracer := otel.Tracer("nebari-infrastructure-core")
ctx, span := tracer.Start(ctx, "longhorn.ensureISCSI")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. This skips iSCSI on Helm upgrade because "it was already provisioned on first install". That assumption breaks if someone runs helm uninstall longhorn while leaving the cluster intact - next deploy goes straight to Helm install with no iSCSI. The DaemonSet is idempotent and the wait is only ~3 minutes, so running ensureISCSI on the upgrade path too would be cheap and remove the implicit assumption. Intentional?

attribute.Bool("dry_run", opts.DryRun),
)

if opts.DryRun {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. Other providers' DryRun paths emit a status update (e.g. hetzner Destroy sends "Dry run: would destroy ..."). Should existing's Deploy DryRun similarly emit something visible like "Dry run: would install Longhorn on existing cluster" rather than returning silently?

@@ -0,0 +1,390 @@
package longhorn
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic. TestBuildHelmValuesNonDedicatedOmitsNodeSelector could be a row in the main TestBuildHelmValues table with a wantAbsent []string field, putting all the cases in one place. TestBuildHelmValuesDedicatedNodesStructure is fine standalone since the type assertions don't fit the table cleanly.

Comment thread pkg/storage/longhorn/iscsi.go Outdated
// kubernetes.Interface so unit tests can inject a fake client.
func ensureISCSIWithClient(ctx context.Context, client kubernetes.Interface) error {
tracer := otel.Tracer("nebari-infrastructure-core")
ctx, span := tracer.Start(ctx, "longhorn.ensureISCSIWithClient")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic. Both ensureISCSI (line 89) and ensureISCSIWithClient start spans for the same logical operation. The nested span will appear in traces as a child, which inflates timelines slightly. Either drop the inner span or make it clear the outer one is just a thin wrapper for the kubeconfig to client conversion.

Comment thread pkg/provider/aws/config.go Outdated
// LonghornConfig is kept as a package-local alias so existing yaml under
// `longhorn:` still unmarshals into the same shape; the underlying type now
// lives in pkg/storage/longhorn so non-AWS providers can share install logic.
type LonghornConfig = longhorn.Config
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic (re: LonghornReplicaCount below, line 63 - outside diff hunk). That method is called when LonghornEnabled() is true, which returns true even when c.Longhorn == nil. Replicas() is nil-safe so this works correctly, but it's non-obvious. A one-line comment on LonghornReplicaCount noting that Replicas() handles the nil case would save the next reader from adding a redundant nil check.

@tylerpotts
Copy link
Copy Markdown
Contributor

Ran into a Longhorn replica-scheduling issue while testing this PR on a fresh AWS cluster (m7i.xlarge, default 20 GB EBS root). Posting in case it informs the defaults.

What happened

Single-node cluster, no longhorn: block in user config → falls back to defaultReplicaCount = 2 (pkg/storage/longhorn/config.go:26). Longhorn defaults replica-soft-anti-affinity to false, so two replicas require two distinct nodes. Result on a 1-node cluster:

  • 3 of 5 PVCs became robustness: degraded, state: attached (1 of 2 replicas scheduled — works, but no redundancy).
  • 2 of 5 became robustness: faulted, state: detached — replica records existed but had no nodeID and no dataPath (never scheduled at all). Hub pod stuck in ContainerCreating indefinitely with AttachVolume.Attach failed ... volume is not ready for workloads.

Scaling the node group to 2 healed the 3 degraded volumes (rebuilt onto the new node). The 2 faulted volumes still couldn't schedule, this time with Scheduled=False reason=ReplicaSchedulingFailure msg=insufficient storage — at 14 GB scheduled / 21 GB total per node, the 25% min-available reserve leaves ~2 GB free, which can't fit the bigger replicas.

Suggestions

A few directions worth considering (not all need to land in this PR):

  1. Reconsider defaultReplicaCount = 2. It silently breaks single-node clusters (kind/k3d dev, minimal Hetzner, AWS with min_nodes: 1). Either default to 1 with a doc note on durability, or auto-pick based on observable node count at install time.
  2. Expose replica-soft-anti-affinity via Config. Lets single-node clusters keep replicaCount: 2 co-located on one node — weaker durability but at least the volume attaches. Today this is hardcoded off via Longhorn defaults.
  3. Document the AWS EBS root-disk size assumption. 20 GB is too small for any non-trivial Longhorn workload. A separate dedicated EBS volume mounted at /var/lib/longhorn is the standard practice; the Config.DedicatedNodes path could grow a dedicated_disk companion.
  4. Surface a helpful error. When cfg.Replicas() > observedNodes at install time, log a warning. The current failure mode is silent until a Pod tries to attach, which is a long time later.

Happy to send a follow-up PR for any of these if useful — let me know which direction you'd want.

Copy link
Copy Markdown
Contributor

@tylerpotts tylerpotts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While debugging an attach/detach loop on jupyterhub/shared-storage in our tyler-hetzner-dev cluster, I traced it to the longhorn-engine v1.8.1 nil-pointer panic in (*Controller).Snapshot (longhorn/longhorn#12081) — triggered by a snapshot storm at engine startup. The fix landed in v1.10.1 / v1.11.0.

Since this PR is what pins the chart version going forward, suggest bumping to the latest stable (v1.11.2, released 2026-05-05) so new clusters skip the bug entirely. Greenfield installs aren't affected by the v1beta1 → v1beta2 CR migration upstream calls out for v1.8 → v1.10 (no existing CRs to migrate). Heads up that v1.10.0 removed deploy/prerequisite/longhorn-iscsi-installation.yaml upstream — the embedded DaemonSet still works, but the source-URL comment needs to come along; see inline.

Comment thread pkg/storage/longhorn/config.go Outdated

// ChartVersion pins the upstream Longhorn Helm chart version. Bump
// together with iscsiDaemonSetYAML when upgrading.
ChartVersion = "1.8.1"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest stable, released 2026-05-05. Includes the (*Controller).Snapshot panic fix (longhorn/longhorn#12081).

Suggested change
ChartVersion = "1.8.1"
ChartVersion = "1.11.2"


// iscsiDaemonSetYAML is the Longhorn iSCSI prerequisite DaemonSet.
// Source: https://github.com/longhorn/longhorn/blob/v1.8.1/deploy/prerequisite/longhorn-iscsi-installation.yaml
// Embedded to avoid runtime HTTP fetches and support air-gapped installs.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up: deploy/prerequisite/longhorn-iscsi-installation.yaml was removed upstream in v1.10.0 (longhorn/longhorn@600801b5 — "removed outdated prerequisite installation scripts"), so this URL 404s on a v1.11.2 ref. The DaemonSet content itself still works; just need to point at the last release that shipped it.

Suggested change
// Embedded to avoid runtime HTTP fetches and support air-gapped installs.
// Source: https://github.com/longhorn/longhorn/blob/v1.9.2/deploy/prerequisite/longhorn-iscsi-installation.yaml (removed upstream in v1.10.0; see longhorn/longhorn@600801b5)

Inline review comments:
- #1  demote any other default StorageClass after Longhorn install (Hetzner
      ships hcloud-volumes pre-annotated; K8s undefined behaviour with two
      defaults) — new ensureSoleDefaultStorageClass{,WithClient} helper
- #2  doc nil-cfg precondition on Install; normalise nil cfg at entry to
      remove the inconsistent span-attr nil handling
- #3  Hetzner Deploy uses p.GetKubeconfig (matches AWS/existing) instead of
      os.ReadFile on the path variable
- #4  add longhorn.Uninstall (helm uninstall, idempotent); call it before
      tofu destroy (AWS) and hetzner-k3s delete (Hetzner) per ADR-0002
      §"Destroy Flow"
- #5  drop dead LonghornConfig alias in pkg/provider/aws/config.go;
      callers switch to longhorn.Config
- #6  run ensureISCSI on the upgrade path too (DaemonSet apply is
      idempotent; near-instant when healthy; protects against drift)
- #7  emit a status update on existing.Deploy DryRun (was silent)
- #8  collapse TestBuildHelmValuesNonDedicatedOmitsNodeSelector into the
      main table-driven test via wantAbsent field
- #9  drop redundant inner span on ensureISCSIWithClient; outer span
      records errors via the wrapped call
- #10 doc that LonghornReplicaCount is safe when c.Longhorn is nil
- #11 bump Longhorn chart 1.8.1 → 1.11.2 (includes (*Controller).Snapshot
      nil-pointer fix from longhorn/longhorn#12081)
- #12 update embedded iscsi DaemonSet YAML source URL to v1.9.2 (file was
      removed upstream in v1.10.0)

Review summary follow-up:
- ADR-0002 amendment block: scope expansion (AWS/Hetzner opt-out,
  existing opt-in), behavioural changes from this PR, chart pin update.

All package tests green; go vet clean.
@tylerpotts
Copy link
Copy Markdown
Contributor

tylerpotts commented May 18, 2026

Heads-up: deleting-confirmation-flag blocks destroy on Longhorn ≥1.6

Hit this while destroying a Hetzner cluster on this branch's flow. The longhorn-uninstall Job (using longhornio/longhorn-manager:v1.11.2) exits immediately with:

fatal: cannot uninstall Longhorn because deleting-confirmation-flag is set to `false`.
Please set it to `true` using Longhorn UI or
kubectl -n longhorn-system edit settings.longhorn.io deleting-confirmation-flag

Net effect: nic destroy fails with BackoffLimitExceeded and the cluster is half-torn-down.

This safety gate was added in Longhorn ~v1.6 and is enforced by the v1.11.2 uninstall image. Since this PR is standardizing on v1.11.2 going forward, the destroy flow should set the flag before launching the uninstall Job, e.g.:

kubectl -n longhorn-system patch settings.longhorn.io deleting-confirmation-flag \
  --type=merge -p '{"value":"true"}'

Worth doing in the Longhorn uninstall code path in nic destroy so users do not need to know the incantation.

@tylerpotts
Copy link
Copy Markdown
Contributor

Heads-up: cluster nodes need an NFS client for Longhorn RWX volumes

While testing the RWX restore flow on a Hetzner cluster from this branch, hit a confusing mount error:

MountVolume.MountDevice failed for volume "pvc-...": rpc error: code = Internal
desc = mount failed: exit status 32
Mounting command: /usr/local/sbin/nsmounter
Mounting arguments: mount -t nfs -o vers=4.1,noresvport,timeo=600,retrans=5,softerr ...
Output: mount: ...: bad option; for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program. dmesg(1) may have more information
after failed mount system call.

Two missing host-level pieces:

  1. nfs-common is not installed on the cluster nodes. Longhorn's nsmounter nsenters into the host's mount namespace and calls mount -t nfs, which needs /sbin/mount.nfs on the host. Without it, every RWX mount fails with the misleading "bad option" message above.
  2. NFS kernel modules don't auto-load. Even after installing nfs-common, Ubuntu 24.04's lazy auto-loading didn't kick in inside the nsmounter context. Worker pods continued to get the same error until modprobe nfs nfsv4 ran on the host, and /etc/modules-load.d/longhorn-nfs.conf was written to persist across reboots.

Result: RWO volumes work out of the box, RWX volumes silently fail at first mount. Operators have to debug a misleading error message and patch the host themselves.

Suggested fix in the storage layer: as part of installing Longhorn (this PR's flow), also ensure on every node that will run consumer pods:

  • nfs-common (or distro equivalent) installed
  • nfs and nfsv4 kernel modules configured to load at boot (e.g. via /etc/modules-load.d/longhorn-nfs.conf)

Could be done via a privileged DaemonSet that nsenters and runs apt-get install -y nfs-common plus writes the modules-load file — same pattern as Longhorn's own node-provisioning helpers.

This is the same category as the deleting-confirmation-flag and the orphaned-volume issues already on this PR: things that look like a Longhorn problem at the surface but are actually NIC storage-layer prerequisites the operator shouldn't have to know about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs: review 👀 This PR is complete and ready for reviewing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generalize Longhorn install to non-AWS providers

4 participants