NetApp® ONTAP® Snapshots — PegaProx Community Plugin

A PegaProx community plugin that adds VM-consistent NetApp® ONTAP® snapshot management directly to the PegaProx UI — for NFS, iSCSI, and NVMe-oF (NVMe/TCP, NVMe/FC) datastores.

Current stable: 1.1.2 · Development: 1.2.0 · Changelog · Known Issues

What this plugin does

This plugin connects PegaProx to one or more NetApp ONTAP systems and gives you full snapshot lifecycle management for Proxmox VE — without leaving the PegaProx interface:

Snapshot any VM or set of VMs on a shared ONTAP datastore — crash-consistent, app-consistent (QEMU guest agent), or suspend-based.
Restore individual VMs (SFSR for NFS, LV copy for SAN) or revert an entire datastore to a snapshot in seconds (volume revert).
Clone VMs from any snapshot to a new VMID with fresh MAC addresses.
Schedule automatic snapshots with retention policies, pre/post hooks, and email notifications.
Replicate snapshots to a secondary ONTAP cluster via SnapMirror® and restore or clone directly from the replica — without touching the primary.
Provision new SAN datastores end-to-end (iSCSI and NVMe-oF): ONTAP volume + LUN/namespace + iGroup/subsystem creation, host-side iSCSI/NVMe setup, LVM VG creation, and PVE storage registration — in a single wizard.
Import VMs from Datastore (Beta) — adopt an existing ONTAP volume with live VMs into the plugin without reprovisioning. Reads the snapmanifest from the volume, reconstructs VM inventory, reassigns VMIDs on conflicts, and registers the datastore. Covers cluster migrations, storage takeovers, and SnapMirror DR failover scenarios.
Self-update — check for new releases or dev builds directly from the Settings tab and apply updates with one click.

All operations run as background jobs with live log streaming. Every snapshot embeds a manifest (VM inventory + configs) that travels inside the ONTAP snapshot, making restores self-contained.

Feature Matrix

Feature	NFS	iSCSI	NVMe-oF
Auto-Discovery	✅	✅	✅
VM-consistent Snapshots (crash / app / suspend)	✅	✅	✅
Scheduled Snapshots	✅	✅	✅
Email notifications per schedule	✅	✅	✅
Manifest (VM inventory, disk layout, configs) rides inside ONTAP snapshot	✅	✅	✅
Restore — SFSR (Single-File Snapshot Restore of a VM disk, NFS only)	✅	❌ n/a	❌ n/a
Restore — Single VM (LV-copy via temp clone)	❌ n/a	🟡 Beta	🟡 Beta¹
Restore — Volume Revert (all VMs)	✅	🟡 Beta	🟡 Beta
VM Clone from snapshot	✅	🟡 Beta	🟡 Beta¹
Clone from ONTAP-native snapshots	✅	🟡 Beta	🟡 Beta
Multi-VM snapshot	✅	🟡 Beta	🟡 Beta
ONTAP-native snapshot visibility	✅	🟡 Beta	🟡 Beta
SnapMirror® visibility & DR restore/clone	✅	🟡 Beta	🟡 Beta
Storage Provisioning (auto-setup)	✅	🟡 Beta	🟡 Beta
Storage Resize	✅ grow & shrink	🟡 Beta grow only	🟡 Beta grow only
Job cancellation	✅	🟡 Beta	🟡 Beta
Import VMs from Datastore (adopt existing volumes with VMs)	🟡 Beta	🟡 Beta	🟡 Beta
Plugin self-update (from GitHub, release or dev)	✅	✅	✅
Full DR Scenario — Failover, Test-DR, Failback	🔵 In Development	🔵 In Development	🔵 In Development

Legend: ✅ Stable · 🟡 Beta · 🟠 Alpha · 🔵 In Development · 🔄 Planned · ❌ N/A

¹ NVMe Single VM Restore and Clone on ASA use a full volume clone via the ONTAP CLI bridge (private/cli/volume/clone). Direct namespace clone APIs are not available on ASA, but the volume clone approach achieves identical results (see platform table below).

Maturity levels:

✅ Stable — Tested in a lab environment and found to be reliable and stable under test conditions.

🟡 Beta — Implemented and partially tested. Occasional errors may still occur that require investigation. Use with caution.

🟠 Alpha — Implemented, but errors still occur regularly and may require manual intervention (e.g. a clone volume not cleaned up automatically). Not suitable for routine use.

🔵 In Development — Feature is implemented in code but has not been tested yet.

🔄 Planned — Not yet implemented.

❌ N/A — Not applicable for this protocol.

Protocol status:

🟢 NFS — Stable. All core workflows (snapshot, restore, clone, SnapMirror DR) are fully implemented and tested.

🟡 SAN — iSCSI — Beta. Auto-discovery, snapshots, schedules, single-VM restore, volume revert, VM clone, end-to-end provisioning, and SnapMirror DR restore/clone are fully implemented and tested.

🟡 SAN — NVMe-oF — Beta. Auto-discovery, snapshots, schedules, single-VM restore, volume revert, VM clone, end-to-end provisioning, and SnapMirror DR restore/clone are fully implemented and tested on NetApp ASA (NVMe/TCP, ONTAP 9.18.1) and AFF (NVMe/TCP, ONTAP 9.16.1).

Platform & Protocol Compatibility

The plugin auto-detects the ONTAP platform (san_optimized flag) and adapts the available restore methods accordingly. No manual configuration is needed.

Platform	Protocol	Snapshot	Single VM Restore	Volume Revert	Clone
FAS / AFF	NFS	✅	✅ SFSR	—	✅ FlexClone
FAS / AFF	iSCSI	✅	✅ LUN clone	✅	✅ LUN clone
FAS / AFF	NVMe-oF	✅	✅ NS clone	✅	✅ NS clone
ASA	iSCSI	✅	✅ LUN clone	✅	✅ LUN clone
ASA	NVMe-oF	✅	✅ Volume clone²	✅	✅ Volume clone²

How ASA NVMe single-VM restore/clone works: Direct namespace clone APIs are not available on ASA (POST protocols/nvme/namespaces → 404, POST storage/volumes FlexClone → 405). The plugin uses the ONTAP CLI bridge (POST private/cli/volume/clone) to create a full volume clone from the snapshot instead. The NVMe namespace inside the clone volume inherits the parent subsystem mapping and becomes immediately visible on the Proxmox hosts as a new block device — exactly what is needed for the LVM vgimportclone + dd restore/clone flow.

² ASA NVMe uses POST private/cli/volume/clone (CLI bridge) instead of the native REST namespace clone. The restore/clone result is identical to iSCSI/FAS/AFF.
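
The compatibility table above can be read as a simple lookup. The following is a minimal sketch, not the plugin's actual code: the dictionary keys, the method labels, and the `pick_methods` helper are hypothetical names, and it assumes that a true `san_optimized` flag identifies an ASA system.

```python
# Hypothetical lookup transcribed from the compatibility table above.
# Keys and method labels are illustrative, not taken from the plugin source.
RESTORE_METHODS = {
    ("fas_aff", "nfs"): {"single_vm": "sfsr", "clone": "flexclone"},
    ("fas_aff", "iscsi"): {"single_vm": "lun_clone", "clone": "lun_clone"},
    ("fas_aff", "nvme"): {"single_vm": "ns_clone", "clone": "ns_clone"},
    ("asa", "iscsi"): {"single_vm": "lun_clone", "clone": "lun_clone"},
    ("asa", "nvme"): {"single_vm": "volume_clone_cli", "clone": "volume_clone_cli"},
}


def pick_methods(san_optimized, protocol):
    """Return the restore/clone mechanism for a platform/protocol pair.

    Assumption: san_optimized == True means the system is an ASA.
    """
    platform = "asa" if san_optimized else "fas_aff"
    try:
        return RESTORE_METHODS[(platform, protocol)]
    except KeyError:
        raise ValueError(f"unsupported combination: {platform}/{protocol}") from None
```

For example, `pick_methods(True, "nvme")` selects the CLI-bridge volume clone, while `pick_methods(False, "nvme")` selects the namespace clone.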

Requirements

PegaProx

Version 0.9.9 or later.

ONTAP

All features are included in ONTAP One (ONTAP 9.10.1+) at no extra cost:

Feature	License	Included in ONTAP One
Volume Snapshots	Base	✓
Single-File Snapshot Restore (SFSR)	SnapRestore®	✓
Volume Snapshot Restore (revert)	SnapRestore®	✓
FlexClone	FlexClone®	✓
NVMe-oF / iSCSI	SAN	✓

Tested platforms: ONTAP 9.13+ (NFS/iSCSI), NetApp ASA (All-Flash SAN Array) with NVMe/TCP on ONTAP 9.18.1, NetApp AFF with NVMe/TCP on ONTAP 9.16.1 — including end-to-end provisioning, snapshot, restore, clone, and SnapMirror DR restore/clone.

Proxmox packages (PVE nodes)

For NFS — no additional packages required.

For iSCSI:

apt install open-iscsi multipath-tools lvm2

For NVMe-oF:

apt install nvme-cli lvm2
# Load NVMe/TCP kernel module and persist across reboots
modprobe nvme-tcp
echo nvme-tcp >> /etc/modules-load.d/nvme-tcp.conf

Network access from the PegaProx host

PegaProx  →  Proxmox API         TCP 8006
PegaProx  →  ONTAP cluster-mgmt  TCP 443
PegaProx  →  Proxmox nodes       TCP 22 (SSH)
PegaProx  →  SMTP server         TCP 25/465/587  (optional, for email notifications)

Installation

🚀 After the plugin is installed and enabled, the built-in Initial Setup Wizard guides you through every remaining step interactively — ONTAP connectivity check, dedicated user creation, SSH key setup, host package installation, and first discovery. The only thing the wizard cannot do is the initial git clone below.

1. Install the plugin

The plugin directory must be placed inside the plugins/ subdirectory of your PegaProx installation:

Install method	PegaProx base directory	Plugin destination
Source (default)	`/opt/PegaProx`	`/opt/PegaProx/plugins/netapp_storage`
APT package	`/var/lib/pegaprox`	`/var/lib/pegaprox/plugins/netapp_storage`

Clone the latest stable release from GitHub:

# Adjust the path to match your PegaProx installation
git clone --branch v1.1.2 --depth 1 \
    https://github.com/custosonlinux/netapp_storage \
    /opt/PegaProx/plugins/netapp_storage

# Fix ownership (only needed for package/service installs running as pegaprox)
chown -R pegaprox:pegaprox /opt/PegaProx/plugins/netapp_storage

Required: create a writable home directory for the PegaProx service user

The plugin generates an SSH keypair for PVE host access. It needs a writable home directory to store ~/.ssh/:

mkdir -p /home/pegaprox
chown pegaprox:pegaprox /home/pegaprox
chmod 750 /home/pegaprox
usermod -d /home/pegaprox pegaprox

Without this, the Initial Setup Wizard's SSH key generation step will fail with a permission error. The plugin itself still loads — you will see a clear error message in the wizard if this is missing.

Note: The GitHub repository root is the plugin directory — it contains manifest.json, __init__.py, api/, core/, etc. directly.

Update an existing installation:

cd /opt/PegaProx/plugins/netapp_storage
git fetch --tags
git checkout v1.1.2   # replace with the latest tag

Alternatively, use the built-in updater in the plugin UI: Settings → ⬆️ Plugin Update → Check for Update → Update Now. No shell access required; PegaProx must be restarted after the update.

2. Restart PegaProx

systemctl restart pegaprox

3. Enable the plugin

In the PegaProx UI: Settings → Plugins → NetApp Storage → Enable.

4. Run the Initial Setup Wizard

Open the plugin UI and click 🚀 Initial Setup in the Settings tab. The wizard walks you through:

PegaProx system check — verifies sshpass, SSH key availability, and the clone mount directory.
ONTAP connectivity — tests the API connection to your NetApp cluster.
Dedicated ONTAP user — optional: creates a pegaprox user with the correct role.
PVE host SSH access — verifies SSH connectivity to each Proxmox node; generates and deploys SSH keys if needed.
PVE host packages — installs open-iscsi, multipath-tools, nvme-cli, and lvm2 on each node as required for the selected protocol.
Initial Discovery — scans your Proxmox hosts for existing NFS, iSCSI, and NVMe-oF datastores and registers them automatically.

After the wizard completes, the plugin is fully operational.

The plugin adds its tables to the central PegaProx database on first load (/opt/PegaProx/config/pegaprox.db).

Setup

The 🚀 Initial Setup Wizard (Settings tab) automates steps 2–4 below interactively. Manual setup is only needed for advanced configurations or if you prefer CLI control.

1. ONTAP user

Create a dedicated ONTAP user. The required role depends on which features you use:

Snapshots and restore only (NFS) — a role limited to snapshot and file-clone commands is sufficient:

security login role create -role pegaprox-snap -cmddirname "volume snapshot"             -access all
security login role create -role pegaprox-snap -cmddirname "volume snapshot restore"      -access all
security login role create -role pegaprox-snap -cmddirname "volume snapshot restore-file" -access all
security login role create -role pegaprox-snap -cmddirname "storage/file/clone"           -access all

security login create -user-or-group-name pegaprox \
  -application http -authmethod password -role pegaprox-snap

Full feature set (SAN provisioning, iSCSI/NVMe, SnapMirror) — requires cluster-admin scope:

security login create -user-or-group-name pegaprox \
  -application http -authmethod password -role admin

The admin role is needed for provisioning operations: creating volumes, LUNs, NVMe subsystems/namespaces, iGroups, and SnapMirror management.

2. Add ONTAP endpoint

In the plugin UI under Settings → NetApp Systems → Add:

Field	Description
Name	Friendly label (e.g. `prod-cluster`)
Host	Cluster management LIF hostname or IP
Username / Password	ONTAP credentials
SSL Verify	Recommended: enabled

3. Add Proxmox host

Under Settings → Proxmox Hosts → Add — add each Proxmox node or cluster that has datastores backed by ONTAP. Standalone nodes (not in a PVE cluster) are supported.

4. Run Auto-Discovery

Under Settings → Discovery → Run — the plugin scans your Proxmox hosts for NFS, iSCSI, and NVMe datastores and matches them to ONTAP volumes automatically.

You can also add volume mappings manually if auto-discovery cannot identify the correct mapping.

SAN-specific setup (iSCSI / NVMe-oF)

snapmanifest LV

SAN datastores (LVM-over-iSCSI or LVM-over-NVMe) do not have a filesystem that can hold manifest files. The plugin therefore uses a small dedicated LV, the snapmanifest LV (named netapp_snapmanifest by default), that lives in the same VG as your VM disks. It is formatted as ext4 (64 MB by default) and is captured automatically by every ONTAP snapshot of the volume.

After discovery has found your SAN mapping, click "Setup snapmanifest" next to the mapping in the Settings tab. This creates and formats the LV. It is a one-time operation per VG.

Restore methods (SAN)

Two restore methods are available for SAN datastores. The plugin selects the correct options automatically based on platform and protocol.

Single VM Restore (iSCSI / NVMe on FAS·AFF, iSCSI / NVMe on ASA)

Restores only the target VM's logical volumes without affecting other VMs on the same datastore:

The target VM is stopped.
A temporary clone is created from the snapshot on ONTAP (LUN clone for iSCSI; namespace clone for FAS/AFF NVMe; volume clone via CLI bridge for ASA NVMe).
The clone is mapped to the Proxmox host.
vgimportclone imports the clone's LVM VG under a temporary name.
Each disk LV of the target VM is copied (dd bs=512M iflag=direct oflag=direct) from the temporary VG to the live VG.
The temporary clone is unmapped and deleted from ONTAP.
The VM config is restored from the plugin database.
The VM is started.

Other VMs on the same datastore remain running throughout.

VM Clone (SAN)

Creates a new VM from a snapshot with a new VMID and freshly generated MAC addresses:

A temporary ONTAP clone is created from the snapshot.
vgimportclone imports the clone VG and reads the snapmanifest to discover disk layout.
New LVs are created in the live VG and the disks are copied via dd.
A new VM config is written with remapped disk references and regenerated MACs.
The temporary clone is cleaned up.

The VMID is reserved in PVE immediately before the disk copy begins to prevent ID conflicts during long-running operations.
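
The config rewrite in the clone flow (remapped disk references, regenerated MACs) can be sketched as follows. This is an illustration under assumptions, not the plugin's implementation: `remap_config` and `random_mac` are hypothetical helpers, disk volumes are assumed to follow the usual Proxmox `vm-<vmid>-disk-<n>` naming, and the MACs are generated as locally administered unicast addresses rather than with any specific vendor prefix.

```python
import random
import re

# Matches a colon-separated MAC address such as BC:24:11:AA:BB:CC.
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\b")


def random_mac(rng):
    """Generate a locally administered, unicast MAC address."""
    octets = [rng.randrange(256) for _ in range(6)]
    octets[0] = (octets[0] | 0x02) & 0xFE  # set "local" bit, clear "multicast" bit
    return ":".join(f"{o:02X}" for o in octets)


def remap_config(text, old_vmid, new_vmid, rng=None):
    """Rewrite a VM config: new MACs on netN lines, new VMID in disk names."""
    rng = rng or random.Random()
    out = []
    for line in text.splitlines():
        key = line.split(":", 1)[0]
        if re.fullmatch(r"net\d+", key):
            line = MAC_RE.sub(lambda _m: random_mac(rng), line)
        else:
            line = re.sub(rf"\bvm-{old_vmid}-", f"vm-{new_vmid}-", line)
        out.append(line)
    return "\n".join(out)
```

Passing a seeded `random.Random` makes the output reproducible, which is convenient when testing the rewrite without touching a real VM.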

DR Restore from SnapMirror Secondary (iSCSI)

Restores a VM directly from a SnapMirror replicated snapshot on the secondary ONTAP cluster, without touching the primary:

The target VM is stopped.
A temporary VMID placeholder config is written to PVE immediately to reserve the VMID.
A FlexClone is created from the replicated snapshot on the secondary ONTAP cluster.
A temporary iGroup is created and the clone LUN is mapped.
The Proxmox host establishes a single-path iSCSI connection to a secondary LIF.
vgimportclone imports the clone VG under a temporary name.
Each target VM disk LV is copied (dd) from the temporary secondary VG to the primary VG.
The temporary iSCSI connection is disconnected and the clone and iGroup are removed from the secondary.
The VM config is restored and the VM is started.

Single-path iSCSI note: DR connections use a single LIF on the secondary — multipath is not active for this path. With find_multipaths yes in multipath.conf, no /dev/mapper/<WWID> device is created. The plugin detects the device via /dev/disk/by-id/scsi-<WWID> as fallback.
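
The fallback described in the note above amounts to checking two device paths in order. A minimal sketch, assuming a hypothetical helper `find_lun_device`; the `root` parameter exists only so the lookup can be exercised against a temporary directory:

```python
from pathlib import Path
from typing import Optional


def find_lun_device(wwid: str, root: Path = Path("/")) -> Optional[Path]:
    """Prefer the multipath device, fall back to the single-path by-id link."""
    candidates = [
        root / "dev" / "mapper" / wwid,                      # multipath map
        root / "dev" / "disk" / "by-id" / f"scsi-{wwid}",    # single-path fallback
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```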

DR Clone from SnapMirror Secondary (iSCSI)

Same as DR Restore, but instead of overwriting the original VM's disks, creates a new VM with a new VMID and freshly generated MAC addresses. Disk LVs are remapped to the new VMID automatically.

Volume Revert (all SAN, including ASA NVMe)

Reverts the entire ONTAP volume to the snapshot state — affects all VMs on that datastore:

The target VM is stopped.
The LVM VG is deactivated on the Proxmox host (vgchange -an).
ONTAP reverts the entire volume to the snapshot state.
The VG is re-scanned and reactivated (pvscan --cache && vgchange -ay).
The VM config is restored from the plugin database.
The VM is started.

⚠️ Volume Revert is destructive: all data written to the volume after the snapshot is permanently lost. All VMs on the same SAN datastore are affected.

Storage Provisioning (NFS / iSCSI / NVMe-oF)

The Provisioning tab automates the complete setup of a new datastore — from ONTAP object creation to PVE storage registration — across all cluster nodes in a single operation.

What is automated

NFS:

ONTAP side — create (or reuse) a volume, create a dedicated export policy, and add per-host export rules (the host's IP that routes to the NFS LIF is detected automatically).
PVE cluster — pvesm add nfs (cluster-wide, run once via pmxcfs).
Snapmanifest directory — .netapp-snapmanifest/ is created inside the mount point so snapshot manifests work immediately.

iSCSI:

ONTAP side — create (or reuse) a thin-provisioned SAN volume, a LUN, and an iGroup; add all selected host IQNs; map the LUN to the iGroup.
Per PVE host — iSCSI discovery (iscsiadm -m discovery), target login, multipath device detection (waits until /dev/mapper/<WWID> appears).
First host — pvcreate, vgcreate (linear or thin-provisioned LVM).
All hosts — pvscan --cache -aay to populate the LVM event cache so the VG activates on every node.
PVE cluster — pvesm add lvm / lvmthin (cluster-wide, run once).

NVMe-oF:

ONTAP side — create (or reuse) a namespace and NVMe subsystem; add all selected host NQNs; map the namespace. Supports both standard AFF/FAS platforms and ASA (All-Flash SAN Array) with automatic API fallback.
Per PVE host — nvme connect-all (with automatic timeout handling for non-DDC LIFs), namespace rescan, wait for block device.
First host — pvcreate, vgcreate, snapmanifest LV initialization.
All hosts — pvscan --cache -aay VG activation.
PVE cluster — pvesm add lvm / lvmthin (cluster-wide, run once).

Resize datastore

Resize runs as a background job and is non-disruptive for VMs that are running.

NFS — grow or shrink:

ONTAP volume is resized to the new size.
No host-side action needed — the NFS client sees the updated size immediately through the existing mount.

iSCSI — grow only:

ONTAP volume is resized to new_size × san_volume_multiplier (default 2.5×, configurable in config.json). The extra headroom accommodates ONTAP snapshots taken after the resize. On thin-provisioned volumes this headroom consumes no physical space until data is written to it.
LUN is resized to new_size.
Per PVE host — SCSI bus rescan (/sys/class/scsi_device/*/device/rescan, udevadm settle), multipath table reload, multipathd resize map, pvresize on the multipath device, pvscan --cache.

NVMe-oF — grow only:

ONTAP volume is resized to new_size × san_volume_multiplier (default 2.5×, same as iSCSI).
NVMe namespace is resized to new_size.
Per PVE host — NVMe namespace rescan, pvresize on each PV belonging to the VG, pvscan --cache.

The san_volume_multiplier (default 2.5) applies to both initial provisioning and resize. On thin-provisioned volumes the extra ONTAP volume capacity is not physically allocated until written, so the overhead is free until snapshots consume it.
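
As a worked example of the sizing rule, the sketch below computes the ONTAP volume size for a given LUN or namespace size. The function name and the round-up behaviour are assumptions; the source only states the multiplication.

```python
import math

GIB = 1024 ** 3


def ontap_volume_size(lun_size_bytes: int, multiplier: float = 2.5) -> int:
    """Volume size = LUN/namespace size x multiplier, rounded up to whole bytes."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return math.ceil(lun_size_bytes * multiplier)
```

With the default multiplier, a 100 GiB LUN lands in a 250 GiB volume: `ontap_volume_size(100 * GIB) == 250 * GIB`.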

After a SAN resize, pvresize makes the extra space available to LVM. To expose it to VMs, extend the desired LV (lvextend) and resize the filesystem inside the VM afterwards.

Remove datastore

The Provisioning tab also handles teardown: pvesm remove, VG deactivation and removal, iSCSI logout / NVMe disconnect — and optionally deletes the ONTAP LUN/namespace and volume.

Requirements

NFS: No additional packages required on PVE nodes.
iSCSI: open-iscsi, multipath-tools, lvm2 on all PVE nodes.
NVMe-oF: nvme-cli, lvm2, kernel module nvme-tcp on all PVE nodes.
A valid /etc/multipath.conf with NetApp settings on all PVE nodes (see template below; required for iSCSI).
SSH access from PegaProx to all PVE nodes (configured under Settings → Proxmox Hosts).

SAN datastore — multi-host manual setup (iSCSI / NVMe-oF)

If you prefer to configure SAN connectivity manually before using the Provisioning tab, follow these steps.

multipath.conf — NetApp recommended settings

Required on every PVE node for iSCSI. Add to /etc/multipath.conf:

defaults {
    find_multipaths    yes
    user_friendly_names yes
}
devices {
    device {
        vendor                "NETAPP"
        product               "LUN.*"
        path_grouping_policy  group_by_prio
        prio                  alua
        hardware_handler      "1 alua"
        failback              immediate
        path_checker          tur
        no_path_retry         queue
        features              "3 queue_if_no_path pg_init_retries 50"
        rr_weight             uniform
        rr_min_io_rq          1
    }
}

After writing: systemctl restart multipathd.

NVMe-oF — discovery.conf

NVMe/TCP connections are configured in /etc/nvme/discovery.conf, one entry per target/interface pair:

--transport=tcp --traddr=<target_ip> --host-iface=<nic> --host-traddr=<host_ip>

After editing, reconnect with nvme connect-all. The same LVM VG activation step (pvscan --cache -aay) applies for NVMe-backed LVM VGs on secondary hosts.

Note: Some ONTAP ASA deployments only expose a Direct Discovery Controller (DDC, port 8009) on a subset of LIFs. nvme connect-all may hang indefinitely while trying to discover on non-DDC LIFs. The plugin's provisioning flow handles this automatically via a timeout wrapper. If you run nvme connect-all manually, use timeout 30 nvme connect-all to prevent hangs.
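
A timeout wrapper of the kind mentioned in the note can be sketched with the standard library. This is not the plugin's code; `run_with_timeout` is a hypothetical helper and the 30-second default simply mirrors the manual `timeout 30` suggestion.

```python
import subprocess


def run_with_timeout(argv, timeout_s=30.0):
    """Run a command; return CompletedProcess, or None if it exceeded timeout_s.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
```

On a PVE node this would be called as `run_with_timeout(["nvme", "connect-all"])`; a `None` result means discovery hung and was aborted.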

Email notifications

Each schedule can send email notifications on snapshot job completion. Configure SMTP in Settings → SMTP first, then enable notifications per schedule:

Option	Description
Enable notifications	Master toggle per schedule
Notify on	All events / Failures only / Success only
Recipients	Comma-separated email addresses
Send test email	Sends a test email using the current SMTP settings and the entered recipients

Notifications are sent as HTML emails with a plain-text fallback. The email format includes:

Status banner (full-width, colour-coded): green for success, amber for success-with-warnings, red for failure.
Summary table — schedule name, snapshot name, datastore, status dot, and the list of snapshotted VMs. Each VM is shown as a colour-coded badge including VMID, display name, and type (QEMU / LXC).
Dark terminal log block — last 50 job log lines with per-line severity tags:
- [INFO] — informational
- [WARN] — warnings (amber)
- [ERR] — errors (red)
Plain-text fallback — included as a text/plain MIME part for clients that don't render HTML.

The banner colour is determined by the overall outcome: done with no warnings → green; done with at least one warning → amber; failed or any error → red.
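
The banner rule and the 50-line log excerpt can be expressed compactly. A minimal sketch, assuming the job status strings `done` and `failed` from the paragraph above and the `[WARN]` / `[ERR]` tags from the log block; both function names are hypothetical.

```python
def banner_colour(status, log_lines):
    """Map the overall job outcome to the banner colour."""
    has_error = any("[ERR]" in line for line in log_lines)
    has_warning = any("[WARN]" in line for line in log_lines)
    if status == "failed" or has_error:
        return "red"
    if has_warning:
        return "amber"
    return "green"


def log_tail(log_lines, limit=50):
    """Return the last `limit` log lines for the email body."""
    return log_lines[-limit:]
```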

Job management

All snapshot, restore, and clone operations run as background jobs and are visible under Jobs & History.

Cancel: Running jobs can be cancelled via the Cancel button. The job stops at the next safe checkpoint (between steps or between disk copies). Any partial work (temporary ONTAP clones, imported VGs, reserved VMIDs) is cleaned up automatically.
Delete: Completed, failed, or cancelled jobs can be deleted individually or in bulk via "Cleanup".
Stale jobs: If a job is stuck at "running" after a PegaProx restart (the job thread is gone but the DB entry was not updated), the Cancel button will detect the dead thread and immediately mark the job as cancelled.
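
Cancellation at safe checkpoints is a cooperative pattern: the worker checks a flag between steps and always runs its cleanup. The sketch below illustrates the general technique with `threading.Event`; it is an assumption about how such a job loop could look, not the plugin's actual job runner, and `run_job` is a hypothetical name.

```python
import threading


def run_job(steps, cancel, cleanup):
    """Run zero-argument step callables in order.

    `cancel` is a threading.Event checked before each step (the checkpoint).
    `cleanup` always runs, whether the job finishes, fails, or is cancelled.
    """
    try:
        for step in steps:
            if cancel.is_set():
                return "cancelled"
            step()
        return "done"
    except Exception:
        return "failed"
    finally:
        cleanup()
```

A step that is already running is never interrupted; the job only stops before the next step begins, which is what makes the checkpoint safe.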

Troubleshooting

Stale iSCSI clone LUN after a failed job

If a clone or restore job fails after the temporary ONTAP LUN has been mapped to the Proxmox host but before cleanup completes, the host may be left with a stale multipath device. Because the NetApp multipath configuration uses no_path_retry queue, any process that touches the lost device — including LVM (vgs, pvs) — will hang indefinitely.

Symptoms:

vgs, pvs, or any LVM command hangs on the affected host
multipath -ll shows a device with all paths in failed faulty running state
The ONTAP volume still exists (visible in System Manager or CLI) but the LUN is no longer mapped

Cleanup — run on every affected PVE host:

Identify the stale WWID:

multipath -ll | grep -E 'NETAPP|failed faulty'
# The map header line (containing NETAPP) printed above the failed paths carries the WWID,
# e.g.: 3600a098038323449383f5a38746e4842

Disable I/O queuing — this unblocks any hanging LVM commands immediately:
```
multipathd disablequeueing map 3600a098038323449383f5a38746e4842
```

Flush the multipath device:

multipath -f 3600a098038323449383f5a38746e4842

Remove the stale SCSI paths (replace sdl sdk sdm sdn with the actual device names shown by multipath -ll):
```
for d in sdl sdk sdm sdn; do
  echo 1 > /sys/block/$d/device/delete
done
```

Delete the temporary ONTAP clone volume (pgxclone_*) via ONTAP System Manager or CLI:

# ONTAP CLI:
vol delete -vserver <svm> -volume pgxclone_<uuid> -foreground true

Verify cleanup:

multipath -ll    # stale WWID must be gone
vgs              # must return immediately without hanging

Why does this happen? The queue_if_no_path feature in the NetApp multipath configuration keeps I/O queued in kernel memory when all paths to a LUN are lost — this prevents data loss during transient network outages but also means any process accessing the device blocks until paths return or the device is explicitly flushed. The plugin flushes stale devices automatically at the end of every job. This manual procedure is only needed if the automatic cleanup itself failed (e.g. due to an ONTAP API timeout or network error during the cleanup step).
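
Identifying stale maps can also be scripted. The sketch below parses captured `multipath -ll` output and returns the WWIDs whose paths are all in `failed faulty` state, together with their SCSI device names. It assumes the common output layout (a header line containing the WWID and `dm-N`, followed by path lines of the form `H:C:T:L sdX maj:min dm_state checker_state online_state`) and NAA-style WWIDs of `3` plus 32 hex digits; `stale_maps` is a hypothetical helper.

```python
import re

# Path line, e.g. "| |- 7:0:0:1 sdl 8:176 failed faulty running"
PATH_RE = re.compile(r"\d+:\d+:\d+:\d+\s+(\S+)\s+\d+:\d+\s+(\S+)\s+(\S+)\s+\S+")
# NAA WWID as shown in this README's example: "3" followed by 32 hex digits
WWID_RE = re.compile(r"\b(3[0-9a-f]{32})\b")


def stale_maps(output):
    """Return {wwid: [device, ...]} for maps whose paths are all failed/faulty."""
    maps = {}
    current = None
    for line in output.splitlines():
        header = WWID_RE.search(line) if " dm-" in line else None
        if header:
            current = header.group(1)
            maps[current] = []
            continue
        path = PATH_RE.search(line)
        if current and path:
            maps[current].append(path.groups())
    return {
        wwid: [dev for dev, _dm, _chk in paths]
        for wwid, paths in maps.items()
        if paths and all(dm == "failed" and chk == "faulty" for _dev, dm, chk in paths)
    }
```

The returned device names are the ones to feed into the `/sys/block/<dev>/device/delete` loop after the map has been flushed.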

Performance — SAN disk copy

The dd copy used during Single VM Restore and VM Clone is tuned for NVMe storage and high-bandwidth networks:

dd if=<src_lv> of=<dst_lv> bs=512M iflag=direct oflag=direct conv=fsync

bs=512M — large blocks minimize syscall overhead.
iflag=direct oflag=direct — O_DIRECT on both sides bypasses the page cache and lets NVMe saturate the full device bandwidth without wasting RAM.
Timeout: 4 hours — covers very large volumes even at constrained throughput.
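
For illustration, the copy command and its timeout can be assembled as below. This is a sketch under assumptions: `dd_copy_argv` is a hypothetical helper, and LVs are addressed through the standard `/dev/<vg>/<lv>` device paths.

```python
DD_TIMEOUT_S = 4 * 60 * 60  # 4 hours, as stated above


def dd_copy_argv(src_vg, dst_vg, lv_name):
    """Build the dd argument list for copying one LV between two VGs."""
    return [
        "dd",
        f"if=/dev/{src_vg}/{lv_name}",
        f"of=/dev/{dst_vg}/{lv_name}",
        "bs=512M",
        "iflag=direct",
        "oflag=direct",
        "conv=fsync",
    ]
```

The list form can be passed directly to `subprocess.run(argv, check=True, timeout=DD_TIMEOUT_S)`, which avoids shell quoting issues with LV names.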

DR iSCSI throughput: During DR restore/clone from a SnapMirror secondary, the dd copy runs across clusters — data flows from the secondary ONTAP cluster to the primary VG over the production network. Throughput is bounded by the inter-site link bandwidth, not by local NVMe/iSCSI speed. For large VMs over limited WAN links, DR operations can take significantly longer than primary restores.

Configuration (`config.json`)

See config.example.json for all options:

Key	Default	Description
`snapshot_prefix`	`"NPP_"`	Prefix added to all snapshot names
`default_consistency`	`"crash"`	Default consistency level (`crash`, `app`, `suspend`)
`default_restore_method`	`"sfsr"`	Default restore method (`sfsr`, `flexclone`, `san`)
`job_poll_interval_s`	`3`	How often to poll ONTAP job status (seconds)
`job_poll_timeout_s`	`300`	Max wait time for an ONTAP job (seconds)
`manifest_subdir`	`".netapp-snapmanifest"`	Directory inside the NFS mount for manifests
`flexclone_mount_base`	`"/mnt/pegaprox-clone"`	Temp mount point for FlexClone restores
`san_volume_multiplier`	`2.5`	ONTAP volume size = LUN/namespace size × this factor. Leaves headroom for snapshots. Applies to iSCSI and NVMe-oF provisioning and resize.
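
A minimal sketch of how these defaults could be merged with a user-supplied `config.json`. The `load_config` helper and its validation are hypothetical; the keys and default values are copied from the table above.

```python
import json
from pathlib import Path

DEFAULTS = {
    "snapshot_prefix": "NPP_",
    "default_consistency": "crash",
    "default_restore_method": "sfsr",
    "job_poll_interval_s": 3,
    "job_poll_timeout_s": 300,
    "manifest_subdir": ".netapp-snapmanifest",
    "flexclone_mount_base": "/mnt/pegaprox-clone",
    "san_volume_multiplier": 2.5,
}


def load_config(path):
    """Return the defaults overlaid with any keys found in the JSON file."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text(encoding="utf-8")))
    if cfg["default_consistency"] not in ("crash", "app", "suspend"):
        raise ValueError("default_consistency must be crash, app, or suspend")
    return cfg
```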

Naming conventions

All internal names created by the plugin follow the patterns below. This makes it easy to identify plugin-owned objects on ONTAP and on Proxmox hosts, and to clean up manually if needed.

ONTAP snapshot names

Type	Pattern	Example
Manual snapshot	`{prefix}{user_input}`	`NPP_before_update`
Scheduled snapshot	`{prefix}{YYYYMMDD}_{HHMM}[_{schedule_name}]`	`NPP_20260507_1400_nightly`

prefix defaults to NPP_ and is configurable via snapshot_prefix in config.json.
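
The two patterns can be reproduced with a pair of small helpers. The function names are hypothetical; the format strings follow the table above.

```python
from datetime import datetime


def manual_snapshot_name(user_input, prefix="NPP_"):
    """{prefix}{user_input}"""
    return f"{prefix}{user_input}"


def scheduled_snapshot_name(when, schedule_name="", prefix="NPP_"):
    """{prefix}{YYYYMMDD}_{HHMM}[_{schedule_name}]"""
    name = f"{prefix}{when:%Y%m%d_%H%M}"
    return f"{name}_{schedule_name}" if schedule_name else name
```

For instance, `scheduled_snapshot_name(datetime(2026, 5, 7, 14, 0), "nightly")` yields `NPP_20260507_1400_nightly`.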

Temporary ONTAP objects (FlexClone volumes, LUNs, namespaces)

All temporary objects that the plugin creates on ONTAP during a clone or restore operation use the same prefix: pgxclone_. They are deleted automatically when the job completes (or fails).

Object	Pattern	Example
NFS FlexClone volume	`pgxclone_{job_id[:8]}`	`pgxclone_ab12cd34`
NFS FlexClone junction path	`/{clone_name}`	`/pgxclone_ab12cd34`
iSCSI temporary LUN (primary restore/clone)	`pgxclone_{job_id[:8]}`	`pgxclone_ab12cd34`
NVMe temporary namespace	`pgxclone_{job_id[:8]}`	`pgxclone_ab12cd34`
Full ONTAP LUN/NS path	`/vol/{volume_name}/{clone_name}`	`/vol/proxvol01/pgxclone_ab12cd34`
iSCSI DR FlexClone volume (on secondary)	`pgxdrclone_{job_id[:8]}`	`pgxdrclone_ab12cd34`
iSCSI DR temporary iGroup (on secondary)	`pgxdr_{job_id[:8]}`	`pgxdr_ab12cd34`

The pgxclone_ prefix (short, no hyphens, no special characters) was chosen because ONTAP LUN path components do not reliably allow hyphens on all platforms (notably ASA). The pgxdrclone_ and pgxdr_ prefixes follow the same convention for DR objects created on the secondary cluster.
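
When cleaning up manually, it helps to recognise plugin-owned temporary objects by name. A small sketch based on the patterns in the table above; the helper names and the `kind` labels are hypothetical.

```python
TEMP_PREFIXES = {
    "clone": "pgxclone_",        # FlexClone volume, temporary LUN, namespace
    "dr_clone": "pgxdrclone_",   # DR FlexClone volume on the secondary
    "dr_igroup": "pgxdr_",       # DR temporary iGroup on the secondary
}


def temp_object_name(job_id, kind="clone"):
    """Prefix plus the first eight characters of the job ID."""
    return f"{TEMP_PREFIXES[kind]}{job_id[:8]}"


def lun_path(volume_name, clone_name):
    """Full ONTAP LUN/namespace path: /vol/{volume_name}/{clone_name}"""
    return f"/vol/{volume_name}/{clone_name}"


def is_plugin_temp(name):
    """True if the name carries one of the plugin's temporary-object prefixes."""
    return name.startswith(tuple(TEMP_PREFIXES.values()))
```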

Local temporary mount points on PVE nodes

These are local directories only — they never appear in ONTAP.

Purpose	Pattern	Example
FlexClone NFS mount	`{flexclone_mount_base}/{clone_name}`	`/mnt/pegaprox-clone/pgxclone_ab12cd34`
DR restore NFS mount	`{flexclone_mount_base}/dr-{job_id[:8]}`	`/mnt/pegaprox-clone/dr-ab12cd34`
DR clone NFS mount	`{flexclone_mount_base}/dr-clone-{job_id[:8]}`	`/mnt/pegaprox-clone/dr-clone-ab12cd34`

flexclone_mount_base defaults to /mnt/pegaprox-clone.

SAN: LVM objects on Proxmox

Object	Pattern	Example
snapmanifest LV	`netapp_snapmanifest`	default name, configurable via `snapmanifest_lv_name`
Temp mount point for snapmanifest write	`/tmp/.pgsi_{random[:10]}`	`/tmp/.pgsi_3f8a2c1b4e`
Imported temp VG (vgimportclone)	`{vg_name}` or `{vg_name}1`	`proxvg1` (suffix added by LVM if name collides)

The temp VG name after vgimportclone is chosen by LVM automatically: it tries the base VG name and appends an incrementing number on collision.

NFS manifest storage

Object	Pattern	Example
Manifest directory	`{nfs_mount}/{manifest_subdir}/{snap_name}/`	`/mnt/nfs/.netapp-snapmanifest/NPP_20260507_1400/`
Manifest file	`…/manifest.json`
VM config at snapshot time	`…/{vmid}.conf`	`…/100.conf`

manifest_subdir defaults to .netapp-snapmanifest. Configurable in config.json.

`manifest_path` prefixes (stored in DB)

The manifest_path column in netapp_snapshots uses a prefix to indicate where the manifest lives:

Prefix	Meaning
(plain file path)	NFS — manifest is on the NFS datastore
`db:{snapshot_id}`	DB-only fallback (NFS write failed, or not applicable)
`snapmanifest:{vg}/{lv}/{snap_name}`	SAN — manifest is on the snapmanifest LV and in the DB
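
Code that consumes the `manifest_path` column has to branch on these prefixes. A minimal parsing sketch; `parse_manifest_path` and the returned dictionary keys are hypothetical.

```python
def parse_manifest_path(value):
    """Classify a manifest_path value by its prefix."""
    if value.startswith("db:"):
        return {"kind": "db", "snapshot_id": value[len("db:"):]}
    if value.startswith("snapmanifest:"):
        vg, lv, snap_name = value[len("snapmanifest:"):].split("/", 2)
        return {"kind": "san", "vg": vg, "lv": lv, "snap_name": snap_name}
    # Anything else is treated as a plain file path on the NFS datastore.
    return {"kind": "nfs", "path": value}
```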

Default VM names for cloned VMs

When no name is provided by the user, the plugin generates a default:

Clone type	Default name
NFS clone	`clone-{original_vm_name}`
SAN clone	`san-clone-{original_vm_name}`
DR clone	`dr-clone-{original_vm_name}`

Consistency levels

Level	Behaviour
`crash`	Snapshot taken immediately — fastest, crash-consistent
`app`	QEMU Guest Agent `fsfreeze-freeze` before snapshot, `fsfreeze-thaw` after
`suspend`	VM suspended before snapshot, resumed after

LXC containers: only crash is supported (no guest agent).
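
The important property of the `app` and `suspend` levels is that the thaw or resume step runs even if the snapshot fails. The sketch below shows that ordering with `try`/`finally`. It assumes the `qm` CLI on the PVE node (`qm guest cmd`, `qm suspend`, `qm resume`); whether the plugin uses the CLI or the Proxmox API is not stated here, and both function names are hypothetical.

```python
def consistency_commands(vmid, level, vm_type="qemu"):
    """Return (pre, post) command lists for a consistency level."""
    if vm_type == "lxc" and level != "crash":
        raise ValueError("LXC containers only support crash consistency")
    vmid = str(vmid)
    if level == "crash":
        return [], []
    if level == "app":
        return ([["qm", "guest", "cmd", vmid, "fsfreeze-freeze"]],
                [["qm", "guest", "cmd", vmid, "fsfreeze-thaw"]])
    if level == "suspend":
        return [["qm", "suspend", vmid]], [["qm", "resume", vmid]]
    raise ValueError(f"unknown consistency level: {level}")


def snapshot_with_consistency(vmid, level, take_snapshot, run):
    """Run pre-commands, take the snapshot, and always run post-commands.

    `run` executes one command list; `take_snapshot` creates the ONTAP snapshot.
    """
    pre, post = consistency_commands(vmid, level)
    for cmd in pre:
        run(cmd)
    try:
        return take_snapshot()
    finally:
        for cmd in post:
            run(cmd)
```

Injecting `run` keeps the ordering logic testable without a hypervisor; in practice it could wrap an SSH call to the PVE node.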

⚠️ One-datastore-per-VM requirement: The plugin snapshots an entire ONTAP volume at once. A VM whose disks are spread across multiple datastores (backed by different ONTAP volumes) will only have the disks on the currently selected datastore included in the snapshot. The other volumes are not snapshotted simultaneously, so the resulting snapshot set is not crash-consistent across volumes. For reliable snapshots and restores, keep all disks of a VM on the same datastore.

Manifest

NFS

Every plugin-managed NFS snapshot stores metadata inside the NFS datastore:

<nfs_mount_path>/.netapp-snapmanifest/<snap-name>/
  manifest.json    snapshot metadata + VM inventory
  100.conf         Proxmox config of VM 100 at snapshot time
  101.conf         …

SAN (iSCSI / NVMe-oF)

The manifest is written to the snapmanifest LV (a dedicated 64 MB ext4 LV in the same VG) before each ONTAP snapshot is created. This ensures the manifest travels inside the snapshot and is available for restore:

/dev/{vg}/netapp_snapmanifest  (ext4, 64 MB)
  manifest.json
  vmconfigs/100.conf
  vmconfigs/101.conf

Additionally, the manifest is always stored in the plugin database as a fallback.

ONTAP-native snapshots (not created by the plugin) also contain the snapmanifest LV at the state it was in when the snapshot was taken (i.e. the last plugin-managed snapshot's manifest). The plugin reads this manifest during restore and clone to determine disk layout without relying on the current VM configuration.

API reference

All routes are relative to /api/plugins/netapp_storage/api/.

Method	Path	Description
GET	`endpoints`	List ONTAP endpoints
POST	`endpoints/add`	Add endpoint
POST	`endpoints/update`	Update endpoint (name, host, credentials, flags)
POST	`endpoints/delete`	Delete endpoint
POST	`endpoints/test`	Test connectivity
GET	`pve-hosts`	List Proxmox hosts
POST	`pve-hosts/add`	Add host
POST	`pve-hosts/delete`	Delete host
POST	`pve-hosts/test`	Test SSH connectivity
GET	`volume-mappings`	List volume mappings
POST	`volume-mappings/delete`	Delete a volume mapping
POST	`discover`	Run auto-discovery
GET	`snapshots`	List snapshots (last 200)
POST	`snapshots/create`	Create snapshot (async)
POST	`snapshots/delete`	Delete snapshot
GET	`snapshots/volumes`	List ONTAP volumes for an endpoint
GET	`snapshots/vms-for-mapping`	List VMs on a mapped datastore
GET	`snapshots/manifest`	Read snapshot manifest
POST	`san/snapmanifest-init`	Initialize snapmanifest LV on a SAN mapping
GET	`san/snapmanifest-check`	Check snapmanifest LV status
POST	`restore/start`	Start restore job (`method`: `sfsr` / `san_single` / `san` / `dr`)
GET	`restore/status`	Restore job status
POST	`clone/start`	Start clone job
POST	`clone/dr-start`	Start DR clone job
GET	`clone/nextid`	Suggest next free VMID
GET	`clone/nodes`	List available Proxmox nodes
GET	`schedules`	List schedules
POST	`schedules/add`	Create schedule
POST	`schedules/update`	Update schedule
POST	`schedules/delete`	Delete schedule
POST	`schedules/run-now`	Trigger schedule immediately
GET	`jobs/status`	List all jobs or single job (`?job_id=`)
POST	`jobs/cancel`	Cancel a running job
POST	`jobs/delete`	Delete a completed/failed/cancelled job
POST	`jobs/cleanup`	Delete all completed and failed jobs
GET	`snapmirror/relationships`	List SnapMirror relationships
POST	`snapmirror/scan`	Scan / refresh SnapMirror relationships
POST	`snapmirror/update`	Trigger a SnapMirror transfer
GET	`snapmirror/secondary-snapshots`	List snapshots on a secondary volume
POST	`snapmirror/ensure-export`	Ensure secondary volume is exported (NFS DR)
POST	`snapmirror/check-secondary`	Check secondary connectivity (NFS export / iSCSI LIF / NVMe LIF)
GET	`snapmirror/dr-snap-vms`	List VMs available in a replicated snapshot (reads from DB manifest)
GET	`provisioning/datastores`	List provisioned datastores
POST	`provisioning/datastores`	Create datastore (starts provisioning job)
POST	`provisioning/datastores/import`	Register an existing datastore in the Provisioning tab
POST	`provisioning/datastores/remove`	Remove datastore (starts removal job)
POST	`provisioning/datastores/resize`	Resize datastore
POST	`provisioning/datastores/add-host`	Add a PVE host to an existing datastore
POST	`provisioning/datastores/remove-host`	Remove a PVE host from a datastore
GET	`provisioning/ontap-resources`	Browse volumes/LUNs/iGroups on an endpoint (wizard)
GET	`provisioning/pve-hosts`	List configured PVE hosts (wizard)
GET	`settings/smtp`	Load SMTP configuration
POST	`settings/smtp/save`	Save SMTP configuration
POST	`settings/smtp/test`	Test SMTP connection
POST	`settings/notify-test`	Send a test notification email
GET	`settings/export`	Export all plugin config as JSON
POST	`settings/import`	Restore plugin config from JSON
GET	`settings/update/info`	Check GitHub for latest release / branch info
POST	`settings/update/apply`	Download and apply a plugin update from GitHub
GET	`provisioning/recovery/scan-volumes`	Scan ONTAP volumes for existing datastores
GET	`provisioning/recovery/manifests`	Read snapmanifest from an existing volume
POST	`provisioning/recovery/bind`	Bind (adopt) an existing volume as a plugin datastore
POST	`provisioning/recovery/restore-vms`	Import VM configs from a bound datastore
GET	`provisioning/recovery/used-vmids`	List VMIDs in use on the target cluster
GET	`ui`	Plugin management UI
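
The routes above can be called from any HTTP client. The sketch below only builds the requests, using the route prefix and the `method` values listed for `restore/start`. The host name, the JSON request body, and the helper names are assumptions, and authentication against PegaProx is not covered.

```python
import json
import urllib.parse
import urllib.request

API_PREFIX = "/api/plugins/netapp_storage/api/"
RESTORE_METHODS = {"sfsr", "san_single", "san", "dr"}  # from the table above


def api_url(base_url, path, **query):
    """Join the PegaProx base URL, the plugin prefix, and a route path."""
    url = base_url.rstrip("/") + API_PREFIX + path.lstrip("/")
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return url


def build_request(base_url, method, path, body=None, **query):
    """Build (but do not send) a request; the JSON body format is an assumption."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        api_url(base_url, path, **query), data=data, method=method
    )
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request
```

For example, `build_request("https://pegaprox.example", "GET", "jobs/status", job_id="42")` targets the single-job form of `jobs/status`. Passing the result to `urllib.request.urlopen` would send it once the required session or token headers have been added.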

Roadmap

v1.2 — Full Disaster Recovery (In Development — v1.2.0)

Building on SnapMirror DR restore/clone, v1.2.x adds a fully orchestrated DR workflow driven from the DR-side PegaProx — avoiding the chicken-and-egg problem of needing the primary site to trigger failover.

v1.2.0 — DR Configuration & Plan Management + UI Overhaul (implemented, pre-release)

Disaster Recovery:

DR Peer Sync — direct plugin-to-plugin HTTPS communication replaces the previous NFS config-volume approach; shared sync token (X-DR-Sync-Token); roles: PRIMARY / SECONDARY / STANDALONE; heartbeat every 30 s, DB sync every 60 s.
DR Plans — link source datastores to their SnapMirror destinations; dashboard shows lag and health per datastore.
VM Start Groups — define ordered startup groups (DNS/AD → app servers → optional services) with per-group startup delay and auto/manual release.
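
To make the DR Peer Sync item concrete, the sketch below shows one way a receiving peer could validate the shared token and decide which periodic task is due. The header name and the 30 s / 60 s intervals come from the list above. The constant-time comparison, the function names, and the tick-based scheduling are assumptions, not a description of the plugin's code.

```python
import hmac

SYNC_HEADER = "X-DR-Sync-Token"
HEARTBEAT_INTERVAL_S = 30
DB_SYNC_INTERVAL_S = 60


def token_is_valid(headers: dict, expected_token: str) -> bool:
    """Compare the presented token with the shared one in constant time."""
    presented = headers.get(SYNC_HEADER, "")
    return hmac.compare_digest(presented.encode(), expected_token.encode())


def due_tasks(tick_s: int) -> list:
    """Return the periodic tasks due at a whole-second tick since start."""
    tasks = []
    if tick_s % HEARTBEAT_INTERVAL_S == 0:
        tasks.append("heartbeat")
    if tick_s % DB_SYNC_INTERVAL_S == 0:
        tasks.append("db_sync")
    return tasks
```

`hmac.compare_digest` avoids leaking how many leading characters of the token matched. A plain dict is used for headers here, so the lookup is case-sensitive, whereas most web frameworks expose case-insensitive header mappings.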

UI — Snapshot Timeline:

Timeline ribbon — SVG-based horizontal timeline above the snapshot table; Day/Week/Month/Year zoom; ● dots for existing snapshots (hover tooltip, click jumps to row); △ triangles for scheduled future runs; orange retention band; "+N earlier" overflow indicator.

UI — Snapshot Table:

Virtual scrolling — windowed row rendering above 80 rows; sticky header; bulk selection (_vsSelectedKeys) preserved across viewport boundaries.
Mobile card layout — responsive @media ≤680px card view with data-label column headers.
Relative timestamps — created_at shown as 3 hours ago with absolute ISO-8601 tooltip.
Status vocabulary — unified badge (done / running / pending / failed) with consistent icon + colour.
Monospace identifiers — volume, SVM, VMID, node, snapshot name, and ONTAP job ID in <code> throughout.
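
The relative-timestamp rule is easy to state in code. The plugin renders this in the browser. The Python sketch below only illustrates the logic, with unit thresholds and the "just now" wording chosen for the example rather than taken from the UI.

```python
from datetime import datetime, timezone
from typing import Optional


def relative_time(created_at: str, now: Optional[datetime] = None) -> str:
    """Render an ISO-8601 timestamp as e.g. '3 hours ago'."""
    created = datetime.fromisoformat(created_at)
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assume UTC if naive
    now = now or datetime.now(timezone.utc)
    seconds = int((now - created).total_seconds())
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            return f"{count} {unit}{'s' if count != 1 else ''} ago"
    return "just now"
```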

UI — Safety & Confirmations:

Danger modals — Restore and Delete require typing the snapshot name; blast-radius VM list shown; optional safety snapshot before restore (default on).
Audit log — every destructive action (delete, restore) logged to netapp_audit_log with user, timestamp, target, result, and ONTAP job ID. Visible under Jobs → Audit.
Actionable error toasts — 12 s display, ✕ close, "Show details" toggle with full ONTAP JSON response in monospace.

UI — Schedule Wizard:

6-step wizard — ① Basics, ② Schedule & Retention (with live next-run preview), ③ VMs & Scripts, ④ SnapMirror (auto-skipped if no relationship), ⑤ Notifications, ⑥ Summary.
Retention warning — inline alert when reducing retention would immediately purge snapshots on the next run.
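
The retention warning can be reasoned about as follows, assuming count-based retention where each run creates one snapshot and then trims the oldest ones down to the configured count. Both that model and the function names are assumptions for this sketch.

```python
def purged_on_next_run(retention: int, existing: list) -> list:
    """Snapshots (oldest first) removed after the next run under `retention`."""
    if retention < 1:
        raise ValueError("retention must be at least 1")
    excess = max(0, len(existing) + 1 - retention)
    return existing[:excess]


def extra_purged_by_change(old_retention: int, new_retention: int, existing: list) -> list:
    """Snapshots that only the new, lower retention would purge on the next run."""
    before = set(purged_on_next_run(old_retention, existing))
    return [s for s in purged_on_next_run(new_retention, existing) if s not in before]
```

With five existing snapshots and retention lowered from 5 to 3, the next run would remove three snapshots instead of one, so the warning would concern the two extra ones.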

UI — Accessibility:

Keyboard navigation — trapFocus() on all modals; global Escape handler closes topmost modal; role="dialog" aria-modal="true" on all main modals; aria-live="polite" on toast region.

v1.2.1 — Failover (implemented, full end-to-end test pending)

Planned failover — final SnapMirror sync → break → mount volumes on DR PVE → restore VM configs from snapmanifest → start VMs in group order.
Emergency failover — immediate break without a final sync (for complete primary site loss).
Pre-checks cover plan completeness, SnapMirror health, and DR host reachability.
Per-datastore snapshot selection (automatic latest or manual pick from ONTAP snapshot list).
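
The "start VMs in group order" step can be sketched as a flat list of actions derived from the VM Start Groups. The data model, the step names, and the choice to apply the delay after a group has started are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class StartGroup:
    order: int
    name: str
    vmids: list
    delay_s: int = 0           # pause after this group before the next one
    auto_release: bool = True  # False: wait for an operator to release it


def startup_plan(groups):
    """Flatten start groups into an ordered list of (action, value) steps."""
    steps = []
    for group in sorted(groups, key=lambda g: g.order):
        if not group.auto_release:
            steps.append(("wait_for_release", group.name))
        steps.extend(("start", vmid) for vmid in group.vmids)
        if group.delay_s > 0:
            steps.append(("sleep", group.delay_s))
    return steps
```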

v1.2.2 — DR Test via FlexClone (planned)

Bring up a DR test environment without breaking SnapMirror: FlexClone each DR volume → mount clones on DR PVE with isolated storage IDs → optionally start VMs with a VMID offset to avoid conflicts → one-click cleanup.
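
The VMID offset idea can be illustrated with a small helper that maps each source VMID to a test VMID and refuses to proceed if any target is already taken. Since this release is still planned, the function name and the fail-fast behaviour are assumptions.

```python
def offset_vmids(vmids, offset, used_vmids):
    """Map each VMID to VMID + offset, failing if a target is already in use."""
    mapping = {vmid: vmid + offset for vmid in vmids}
    conflicts = sorted(set(mapping.values()) & set(used_vmids))
    if conflicts:
        raise ValueError(f"offset {offset} collides with existing VMIDs: {conflicts}")
    return mapping
```

For example, `offset_vmids([100, 101], 9000, {100, 101, 200})` yields `{100: 9100, 101: 9101}`, while an offset of 100 would be rejected because 200 is already in use.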

v1.2.3 — Failback (planned)

Guided return to primary: reverse-SnapMirror (DR → primary), final resync, re-mount on primary PVE, restore SnapMirror in the original direction. Covers the case where the primary ONTAP system was replaced entirely.

v1.3 — Deep PegaProx Integration (Planned)

VM context panel — snapshot status and single-VM restore directly in the PegaProx VM detail view, without switching tabs.
Datastore context — snapshot history, SnapMirror status, and resize in the PegaProx storage view.
Single host configuration — eliminate dual-entry of PVE hosts; plugin reads cluster topology directly from PegaProx.
Tamper-proof snapshots — ONTAP Snapshot Locking (compliance / governance) for ransomware protection and regulatory requirements (GDPR, BSI). Includes conflict validation between lock duration and schedule retention.

License

GNU Affero General Public License v3.0 (AGPLv3) — see LICENSE.

Trademarks

NetApp, ONTAP, SnapMirror, SnapVault, SnapRestore, and FlexClone are registered trademarks of NetApp, Inc. in the United States and/or other countries. All other trademarks are the property of their respective owners.

This project is an independent community plugin and is not affiliated with, endorsed by, or sponsored by NetApp, Inc.
