This document describes the end-to-end workflow for taking volume-level snapshots from the CloudStack Management Server, organized in the sequence that CloudStack orchestrates the operation.
A volume snapshot in CloudStack captures the state of a disk at a point in time. The snapshot can be stored on primary storage, secondary storage, or replicated across zones and storage pools. The workflow involves multiple layers: API, orchestration, storage engine, and storage-specific strategy plugins.
File: api/src/main/java/org/apache/cloudstack/api/command/user/snapshot/CreateSnapshotCmd.java
The user (or scheduler) calls the createSnapshot API. The command is a BaseAsyncCreateCmd, meaning snapshot allocation and execution happen in two separate phases (create and execute).
In the execute() phase:
snapshot = _volumeService.takeSnapshot(
getVolumeId(), getPolicyId(), getEntityId(),
getAccount(), getQuiescevm(), getLocationType(),
getAsyncBackup(), getTags(), getZoneIds(),
getStoragePoolIds(), useStorageReplication());Key parameters available to the caller:
volumeId– the volume to snapshotpolicyId– optional snapshot policy to applylocationType–PRIMARYorSECONDARYasyncBackup– whether to back up to secondary asynchronouslyzoneIds– destination zones to copy the snapshot tostoragePoolIds– specific primary storage pools to copy the snapshot touseStorageReplication– use native cross-zone storage replication (StorPool)
File: server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java
Before execute() is called, create() runs allocSnapshot() which:
- Verifies the caller has access to the volume.
- Validates resource limits (snapshot count, secondary storage quota).
- Generates a snapshot name in the format
<vmName>_<volumeName>_<timestamp>. - Creates a
SnapshotVOrecord in the database in stateAllocated. - Increments resource counters for the account (snapshot count, storage size).
The allocation returns the snapshot ID, which is then used by the execute() phase.
File: server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java
takeSnapshotInternal() performs pre-flight checks before dispatching work:
- Re-validates volume exists and is in
Readystate. - Rejects snapshots on
Externalhypervisor type volumes. - Resolves
zoneIdsandpoolIdsfrom snapshot policy details if apolicyIdis provided. - Validates each destination zone exists.
- Checks that the caller has access to both the volume and (if attached) the VM.
- If the storage pool is managed and
locationTypeis unset, defaults toLocationType.PRIMARY. - Calls
snapshotHelper.addStoragePoolsForCopyToPrimary()to resolve storage pool IDs whenuseStorageReplicationis enabled.
Path selection based on VM attachment:
Volume attached to running VM?
├── YES → Serialize via VM Work Job Queue
│ (Step 4a — job queue path)
└── NO → Direct execution
(Step 4b — direct path)
File: server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java
When the volume is attached to a VM, CloudStack serializes the operation using the VM Work Job queue. This prevents concurrent conflicting operations on the same VM.
Outcome<Snapshot> outcome = takeVolumeSnapshotThroughJobQueue(
vm.getId(), volumeId, policyId, snapshotId,
account.getId(), quiesceVm, locationType,
asyncBackup, zoneIds, poolIds);A VmWorkTakeVolumeSnapshot work item is created and dispatched. The job framework eventually calls orchestrateTakeVolumeSnapshot(VmWorkTakeVolumeSnapshot work) from within the VM work job dispatcher.
If the current thread is already running inside the job dispatcher (re-entrant case), a placeholder work record is created and orchestrateTakeVolumeSnapshot() is called directly to avoid deadlock.
VmWorkTakeVolumeSnapshot carries:
// engine/components-api/src/main/java/com/cloud/vm/VmWorkTakeVolumeSnapshot.java
new VmWorkTakeVolumeSnapshot(userId, accountId, vmId, handlerName,
volumeId, policyId, snapshotId, quiesceVm,
locationType, asyncBackup, zoneIds, poolIds);File: server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java
When the volume is not attached to a VM, a CreateSnapshotPayload is built and attached directly to the volume:
CreateSnapshotPayload payload = new CreateSnapshotPayload();
payload.setSnapshotId(snapshotId);
payload.setSnapshotPolicyId(policyId);
payload.setAccount(account);
payload.setQuiescevm(quiescevm);
payload.setLocationType(locationType);
payload.setAsyncBackup(asyncBackup);
payload.setZoneIds(zoneIds);
payload.setStoragePoolIds(poolIds);
volume.addPayload(payload);
return volService.takeSnapshot(volume);File: server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java
Whether coming from the job queue or directly, orchestrateTakeVolumeSnapshot() handles the final preparation:
- Re-validates the volume is still
Ready. - Detects whether the volume is encrypted and on a running VM; rejects such snapshots unless the storage is StorPool (which supports live encrypted volume snapshots).
- Builds the
CreateSnapshotPayloadwith all execution parameters. - Attaches the payload to the volume.
- Calls
volService.takeSnapshot(volume)— delegating toSnapshotManagerImpl.
StorPool encrypted volume exception:
boolean isSnapshotOnStorPoolOnly =
volume.getStoragePoolType() == StoragePoolType.StorPool &&
SnapshotInfo.BackupSnapshotAfterTakingSnapshot.value();
// Allow live snapshot of encrypted volumes on StorPool primary storageFile: server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java
This is the core snapshot execution method:
- Extracts
CreateSnapshotPayloadfrom the volume. - Determines whether to use KVM file-based storage path.
- Checks if backup to secondary storage is needed for this zone.
- For KVM file-based storage with secondary backup, allocates an image store.
- Selects the appropriate
SnapshotStrategyviaStorageStrategyFactory.getSnapshotStrategy(snapshot, TAKE).
Strategy selection priority (highest wins):
| Strategy | Priority | Handles |
|---|---|---|
StorPoolSnapshotStrategy |
HIGHEST (for DELETE/COPY) | DELETE, COPY on StorPool storage |
StorageSystemSnapshotStrategy |
HIGH | Managed storage (TAKE, DELETE) |
DefaultSnapshotStrategy |
DEFAULT | File-based hypervisor snapshots |
CephSnapshotStrategy |
HIGH | Ceph RBD snapshots |
ScaleIOSnapshotStrategy |
HIGH | ScaleIO/PowerFlex snapshots |
- Calls
snapshotStrategy.takeSnapshot(snapshot)which returns aSnapshotInfoon primary storage.
File: engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java
The storage engine creates the snapshot on primary storage:
- Creates a snapshot state object on the primary data store.
- Transitions snapshot state:
CreateRequested. - Transitions volume state:
Volume.Event.SnapshotRequested. - Issues an asynchronous command to the primary data store driver (
PrimaryDataStoreDriver.takeSnapshot()). - Waits for the async callback via
AsyncCallFuture<SnapshotResult>. - On success:
- Updates physical size from the driver response.
- Publishes
EVENT_SNAPSHOT_ON_PRIMARYusage event. - Transitions volume:
Volume.Event.OperationSucceeded.
- On failure:
- Transitions snapshot to
OperationFailed. - Transitions volume:
Volume.Event.OperationFailed.
- Transitions snapshot to
File: server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java
After the snapshot is created on primary, CloudStack decides whether to back it up:
BackupSnapshotAfterTakingSnapshot == true?
├── YES
│ ├── KVM file-based → postSnapshotDirectlyToSecondary()
│ │ (snapshot already on secondary — update DB reference only)
│ └── Otherwise → backupSnapshotToSecondary()
│ ├── asyncBackup == true → schedule BackupSnapshotTask
│ └── asyncBackup == false → synchronous backupSnapshot() + postSnapshotCreation()
└── NO
├── storagePoolIds provided AND asyncBackup → schedule BackupSnapshotTask for pool copy
└── Otherwise → markBackedUp() (snapshot stays on primary only)
BackupSnapshotTask (async retry runner):
- Retries backup up to
snapshot.backup.to.secondary.retriestimes. - On exhausting retries, calls
snapshotSrv.cleanupOnSnapshotBackupFailure()to remove the snapshot record.
File: plugins/storage/volume/storpool/src/main/java/org/apache/cloudstack/storage/snapshot/StorPoolSnapshotStrategy.java
When storagePoolIds are provided and the storage is StorPool, the snapshot is replicated natively between clusters:
- Export the snapshot from the local StorPool cluster to the remote location using
snapshotExport(). - Persist recovery information in
snapshot_detailstable with the exported name and location, so that partial cross-zone copies can be recovered. - Copy from remote on the destination StorPool cluster using
snapshotFromRemote(). - Reconcile the snapshot on the remote cluster using
snapshotReconcile(). - Update the
snapshot_store_ref.install_pathin the database to reflect the destination path. - Invoke the async callback with success or failure.
Recovery detail saved:
// Stored so incomplete exports can be cleaned up later
String detail = "~" + snapshotName + ";" + location;
new SnapshotDetailsVO(snapshot.getId(), SP_RECOVERED_SNAPSHOT, detail, true);File: server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java
After snapshot creation (and optional backup):
postCreateSnapshot(): Updates snapshot policy retention — removes the oldest snapshot if the retention count is exceeded.snapshotZoneDao.addSnapshotToZone(): Associates the snapshot with its origin zone.- Usage event: Publishes
EVENT_SNAPSHOT_CREATEwith the physical size of the snapshot. - Resource limit correction: For delta (incremental) snapshots, decrements the pre-allocated resource count by
(volumeSize − snapshotPhysicalSize)since the actual snapshot is smaller than the volume. copyNewSnapshotToZones()(synchronous backup path only): Copies the snapshot to secondary storage in additional destination zones.copyNewSnapshotToZonesOnPrimary()(synchronous backup path only): Copies the snapshot to additional primary storage pools.
File: server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java
The outer try/catch in takeSnapshot() ensures resource cleanup on any failure:
} catch (CloudRuntimeException | UnsupportedOperationException cre) {
ResourceType storeResourceType = getStoreResourceType(...);
_resourceLimitMgr.decrementResourceCount(snapshotOwner.getId(), ResourceType.snapshot);
_resourceLimitMgr.decrementResourceCount(snapshotOwner.getId(), storeResourceType, volumeSize);
throw cre;
} catch (Exception e) {
// Same resource rollback
throw new CloudRuntimeException("Failed to create snapshot", e);
}Additional cleanup methods:
| Method | Trigger | Action |
|---|---|---|
cleanupVolumeDuringSnapshotFailure() |
Snapshot creation fails completely | Removes snapshot_store_ref entries (non-Destroyed) and deletes the SnapshotVO record |
cleanupOnSnapshotBackupFailure() |
Async backup exhausts all retries | Transitions snapshot state, removes async job MS_ID, deletes snapshot record |
StorPoolSnapshotStrategy.deleteSnapshot() |
Snapshot DELETE operation on StorPool | Calls StorPool API snapshotDelete, transitions state, cleans up DB |
User/Scheduler
│
▼
CreateSnapshotCmd.create()
│ allocSnapshot() → SnapshotVO persisted (Allocated state)
▼
CreateSnapshotCmd.execute()
│
▼
VolumeApiServiceImpl.takeSnapshot()
│
▼
takeSnapshotInternal()
│ validate volume, account, zones, policies
│
├── [Volume attached to VM] ─────────────────────────────┐
│ takeVolumeSnapshotThroughJobQueue() │
│ VmWorkTakeVolumeSnapshot dispatched │
│ ← job queue serializes VM operations → │
│ ▼
└── [Volume not attached] ──► orchestrateTakeVolumeSnapshot()
│ build CreateSnapshotPayload
│ volume.addPayload(payload)
▼
SnapshotManagerImpl.takeSnapshot()
│
│ StorageStrategyFactory.getSnapshotStrategy(TAKE)
▼
snapshotStrategy.takeSnapshot(snapshot)
│
▼
SnapshotServiceImpl.takeSnapshot()
│ PrimaryDataStoreDriver.takeSnapshot() [async]
│ ← waits on AsyncCallFuture →
│ snapshot created on primary storage
▼
Backup decision
├── BackupSnapshotAfterTakingSnapshot=true
│ backupSnapshotToSecondary() [sync or async]
└── BackupSnapshotAfterTakingSnapshot=false
markBackedUp() / schedule pool copy
▼
postCreateSnapshot()
snapshotZoneDao.addSnapshotToZone()
UsageEventUtils.publishUsageEvent()
_resourceLimitMgr.decrementResourceCount()
copyNewSnapshotToZones() [if zoneIds]
copyNewSnapshotToZonesOnPrimary() [if poolIds]
▼
Return SnapshotInfo to caller
| Class | Package | Role |
|---|---|---|
CreateSnapshotCmd |
api/.../command/user/snapshot |
API command entry point; two-phase create+execute |
VolumeApiServiceImpl |
server/.../storage |
Validates, dispatches, and orchestrates snapshot requests |
VmWorkTakeVolumeSnapshot |
engine/components-api/.../vm |
Work item for job queue; carries all snapshot parameters |
SnapshotManagerImpl |
server/.../storage/snapshot |
Core business logic; strategy selection; resource accounting |
SnapshotHelper |
server/.../snapshot |
Resolves storage pool IDs for cross-zone replication |
SnapshotServiceImpl |
engine/storage/snapshot |
Interacts with primary data store driver asynchronously |
DefaultSnapshotStrategy |
engine/storage/snapshot |
Hypervisor-based (file) snapshot implementation |
StorageSystemSnapshotStrategy |
engine/storage/snapshot |
Managed storage native snapshot implementation |
StorPoolSnapshotStrategy |
plugins/storage/volume/storpool |
StorPool native snapshot; handles DELETE and cross-zone COPY |
StorageStrategyFactory |
engine/storage |
Selects the highest-priority strategy for each operation |
| Parameter | Default | Description |
|---|---|---|
backup.snapshot.after.taking.snapshot (BackupSnapshotAfterTakingSnapshot) |
true |
Whether to back up snapshot to secondary storage after creation |
snapshot.backup.retries |
3 |
Number of retry attempts for asynchronous snapshot backup |
snapshot.backup.retry.interval |
300 (seconds) |
Interval between retry attempts for async backup |
use.storage.replication |
false |
Use native storage replication (e.g., StorPool cross-zone copy) instead of secondary storage copy |
snapshot.copy.multiply.exp.backoff |
— | Exponential backoff configuration for snapshot copy retries |
CloudStack implements rollback at multiple layers to maintain consistency:
- Resource limit rollback — On any exception in
SnapshotManagerImpl.takeSnapshot(), snapshot count and storage quotas are decremented back to their original values. - Volume state rollback —
Volume.Event.OperationFailedis fired so the volume returns toReadystate. - Snapshot state machine — Snapshot transitions to
ErrororDestroyedso it can be cleaned up by the background expunge process. - Async backup failure cleanup — After exhausting all retries,
cleanupOnSnapshotBackupFailure()runs in a transaction to delete the snapshot record and associated job metadata. - StorPool cross-zone recovery — The exported (but not yet imported) snapshot name is persisted in
snapshot_detailswith the keySP_RECOVERED_SNAPSHOT, enabling manual or automated cleanup of partial cross-zone copies.