
adaptive: honor user-provided capacityBytes when provider stats are unavailable#13059

Open
genegr wants to merge 2 commits into apache:4.20 from genegr:feat/adaptive-lifecycle-honor-user-capacity

Conversation


@genegr genegr commented Apr 22, 2026

Description

Registering an adaptive-plugin-backed managed primary pool currently fails with "Capacity bytes not available from the storage provider, user provided capacity bytes must be specified" even when capacityBytes= is actually passed to createStoragePool, whenever the provider cannot report capacity at that moment (for example, a FlashArray pod with no quota and no footprint yet, or a transient probe failure).

The root cause lives in AdaptiveDataStoreLifeCycleImpl.initialize(): the user-supplied capacity was guarded behind a stats != null check, so null stats caused a fall-through to the "no user capacity either" error branch even when the user did provide one.

This change accepts the user-supplied value unconditionally and uses the provider stats only as an upper-bound sanity check when they are actually available. The "no user-provided capacity, no provider capacity" branch is preserved and still raises the same InvalidParameterValueException.
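The corrected flow can be sketched as follows. This is a self-contained illustration based on the PR description and the quoted diff, not the exact patch; the stub `Stats` and exception types below are assumptions standing in for the real CloudStack classes:

```java
// Sketch of the fixed capacity selection in initialize(), per the PR description.
// Assumption: stats may be null when the provider cannot report capacity yet.
public class CapacitySketch {
    static class InvalidParameterValueException extends RuntimeException {
        InvalidParameterValueException(String m) { super(m); }
    }
    interface Stats { long getCapacityInBytes(); }

    // Returns the capacity to persist, or throws as the lifecycle code does.
    static long selectCapacity(Long capacityBytes, Stats stats) {
        if (capacityBytes != null && capacityBytes > 0) {
            // Provider stats are only an upper-bound sanity check when available.
            if (stats != null && stats.getCapacityInBytes() > 0
                    && stats.getCapacityInBytes() < capacityBytes) {
                throw new InvalidParameterValueException(
                    "Provided capacity bytes exceed the capacity of the storage endpoint");
            }
            return capacityBytes; // honored even when stats == null (the fix)
        }
        if (stats != null && stats.getCapacityInBytes() > 0) {
            return stats.getCapacityInBytes(); // provider capacity, as before
        }
        // Preserved branch: neither user nor provider supplied a capacity.
        throw new InvalidParameterValueException(
            "Capacity bytes not available from the storage provider, "
            + "user provided capacity bytes must be specified");
    }

    public static void main(String[] args) {
        // User capacity honored with no provider stats (the failing case before).
        System.out.println(selectCapacity(1099511627776L, null)); // prints 1099511627776
    }
}
```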

Types of changes

  • Bugfix (non-breaking change which fixes an issue)

Feature/Enhancement Scale or Bug Severity

Major for any deployment that uses the adaptive storage framework against a provider which cannot report capacity synchronously at pool-register time — registration will always fail regardless of what capacityBytes is passed.

How Has This Been Tested?

Validated end-to-end on a 4.23-SNAPSHOT lab:

  • Registered a FlashArray primary pool (provider="Flash Array", transport=nvme-tcp) against an empty Purity pod with capacitybytes=1099511627776 and capacityiops=100000. Before this change, the registration failed with the error above; after this change, the pool enters the Up state using the user-provided capacity.
  • Registered a pool against a pod that does report stats; the pool's capacity comes from the provider as before, and the upper-bound check still rejects a user-supplied capacityBytes that exceeds the provider's capacity.
  • Exercised a 20 GiB volume end-to-end (create, attach to a Rocky 9 VM, mkfs.ext4 + SHA-256 write/verify, live-migrate between two KVM hosts with the data disk attached — no I/O gap across the migrate).

@winterhazel
Member

@genegr as this is a simple bug fix, could you rebase over branch 4.20 to include it on the next 4.20/4.22 minor versions?

adaptive: honor user-provided capacityBytes when provider stats are unavailable

AdaptiveDataStoreLifeCycleImpl.initialize() guarded the user-provided
capacityBytes behind stats != null when computing the pool capacity to
persist. As a consequence, any adaptive provider that could not report
capacity yet (for example a FlashArray pod that has not been given a
quota and has no footprint yet, or a transient probe failure) caused the
whole pool registration to fail with "Capacity bytes not available from
the storage provider, user provided capacity bytes must be specified"
even when the operator had passed capacityBytes= on createStoragePool.

Accept the user-supplied value unconditionally and use the provider
stats only as an upper-bound sanity check when they are actually
available. The "no user-provided capacity, no provider capacity" branch
is preserved and still raises the same InvalidParameterValueException.

Signed-off-by: Eugenio Grosso <eugenio.grosso@gmail.com>
@genegr genegr force-pushed the feat/adaptive-lifecycle-honor-user-capacity branch from e7d2553 to 25641c3 on April 23, 2026 at 00:23

```java
if (CollectionUtils.isNotEmpty(quarantinedAddressesIDs)) {
    sc.setParameters("id", quarantinedAddressesIDs.toArray());
}
```

@genegr IP related changes are not relevant to the PR description.


codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.26%. Comparing base (9f96c9d) to head (0970c23).

Files with missing lines Patch % Lines
...tore/lifecycle/AdaptiveDataStoreLifeCycleImpl.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #13059   +/-   ##
=========================================
  Coverage     16.26%   16.26%           
  Complexity    13435    13435           
=========================================
  Files          5665     5665           
  Lines        500556   500555    -1     
  Branches      60790    60789    -1     
=========================================
+ Hits          81416    81432   +16     
+ Misses       410036   410016   -20     
- Partials       9104     9107    +3     
Flag Coverage Δ
uitests 4.15% <ø> (ø)
unittests 17.12% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@genegr genegr changed the base branch from main to 4.20 on April 23, 2026 at 07:37

genegr commented Apr 23, 2026

Rebased onto apache/cloudstack:4.20 per @winterhazel's request — single commit 25641c3938 "adaptive: honor user-provided capacityBytes when provider stats are unavailable".

No content changes; only the base.

Contributor

@sureshanaparti sureshanaparti left a comment


clgtm

@sureshanaparti
Contributor

@genegr can you check/fix the build error.

Contributor

Copilot AI left a comment


Pull request overview

Fixes managed primary storage pool registration for adaptive plugin providers by honoring user-supplied capacityBytes even when provider capacity stats are unavailable at registration time.

Changes:

  • Use user-provided capacityBytes regardless of whether provider stats are returned.
  • When provider stats are available, keep an upper-bound sanity check to reject capacityBytes that exceed provider capacity.


Comment on lines 225 to 229
```java
if (capacityBytes != null && capacityBytes != 0) {
    if (stats != null && stats.getCapacityInBytes() > 0 && stats.getCapacityInBytes() < capacityBytes) {
        throw new InvalidParameterValueException("Capacity bytes provided exceeds the capacity of the storage endpoint: provided by user: " + capacityBytes + ", storage capacity from storage provider: " + stats.getCapacityInBytes());
    }
    parameters.setCapacityBytes(capacityBytes);
}
```
Address Copilot feedback on PR apache#13059:

- The capacityBytes check now uses > 0 instead of != 0, so a negative value no longer slips through and gets persisted as a storage capacity.

- Reworded the exception message for the exceeds-capacity check: Provided capacity bytes exceed ... (grammar).

Signed-off-by: Eugenio Grosso <eugenio.grosso@gmail.com>

genegr commented Apr 23, 2026

Pushed 0970c239c1 adaptive: address review on capacityBytes validation to address the new round of Copilot feedback:

  • capacityBytes > 0 instead of capacityBytes != 0 on the guarded block — a negative value now falls through to the "no user-provided capacity" branch rather than being persisted as garbage capacity (@Copilot line 229). The existing "zero == not provided" semantics are preserved.
  • Exception message reworded to "Provided capacity bytes exceed the capacity of the storage endpoint" (@Copilot line 227).
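The effect of the stricter guard can be sketched with a hypothetical helper (an illustration, not the actual CloudStack code):

```java
// Illustrates why the guard changed from != 0 to > 0: with != 0 a negative
// capacityBytes passes the guard and would be persisted as garbage capacity;
// with > 0 it falls through to the "not provided" handling. Zero still means
// "not provided" under both guards. Hypothetical helper, not the real patch.
public class GuardSketch {
    static boolean providedOld(Long capacityBytes) {
        return capacityBytes != null && capacityBytes != 0;
    }

    static boolean providedNew(Long capacityBytes) {
        return capacityBytes != null && capacityBytes > 0;
    }

    public static void main(String[] args) {
        System.out.println(providedOld(-1L)); // prints true  (negative slips through)
        System.out.println(providedNew(-1L)); // prints false (treated as "not provided")
        System.out.println(providedNew(0L));  // prints false (zero semantics preserved)
    }
}
```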

About the build check failure (@sureshanaparti)

I looked into the GHA build failure on the previous head 25641c3938 and it's a Maven file-lock contention timeout, unrelated to this PR's code:

```
org.eclipse.aether.SyncContext$FailedToAcquireLockException:
Could not acquire shared lock for artifacts: ... in 900 SECONDS;
consider using 'aether.syncContext.named.time' property to increase lock timeout
```

The artifacts it couldn't lock are a mix of 4.23.0.0-SNAPSHOT jars (which this 4.20-based PR doesn't produce) — a symptom of another concurrent job on the shared runner holding the ~/.m2/repository aether lock. I verified that the PR branch's root pom.xml is <version>4.20.4.0-SNAPSHOT</version>, matching upstream/4.20, and found no code-side reason for the failure. The new push above will kick a fresh CI run that should clear the flake.

Re: inline comment on server/.../IpAddressManagerImpl.java:905

That file isn't touched by this PR — gh api repos/apache/cloudstack/pulls/13059/files confirms the only file changed is plugins/storage/volume/adaptive/src/main/java/org/apache/cloudstack/storage/datastore/lifecycle/AdaptiveDataStoreLifeCycleImpl.java (+7/-6). I suspect the inline anchor got misfiled via GitHub; happy to address any IP-related concern if you can re-anchor on the intended diff hunk (or let me know what you'd like me to look at).

@sureshanaparti
Contributor

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17587
