Skip to content

Inserts timer in check detach volume#6508

Merged
DaanHoogland merged 8 commits intoapache:mainfrom
RodrigoDLopez:inserts-timer-in-check-detach-volume
Dec 16, 2022
Merged

Inserts timer in check detach volume#6508
DaanHoogland merged 8 commits intoapache:mainfrom
RodrigoDLopez:inserts-timer-in-check-detach-volume

Conversation

@RodrigoDLopez
Copy link
Copy Markdown
Contributor

@RodrigoDLopez RodrigoDLopez commented Jun 27, 2022

Description

When detaching a volume, if the volume is in use inside the VM, the hypervisor waits until it is possible to execute the task. This waiting period causes ACS to interpret that the disk could not be removed and presents an error message to the user. This behavior can generate inconsistencies because the volume detach is performed, but the ACS does not identify this information. If the user requests the volume detach again, a new error message is presented, saying that the volume in question is not contained in the target instance.

To work around the situation, the wait.detach.device global configuration was created. It determines, in milliseconds, the time ACS will wait before considering the detach command to have failed. The default value of this property is 10000 (10 seconds).

Also, changes have been made so that an exception is not thrown when the user tries to remove a volume that is no longer contained in the VM. That way, if the detach command fails to extrapolate the time configured by the wait.detach.device, but the volume is still removed from the instance by the hypervisor, the user can later remove it via the UI, without having to stop the VM to perform this procedure.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

A value was defined in the wait.detach.device configuration, after that we sent the detachVolume command. By watching the Agent log we could see if the command was successful or if the volume was busy during the detachVolume; if the Volume was busy, we will wait for the value defined in wait.detach.device before assuming that ACS failed to detach the Volume and then we will notify the operators that the volume could not be detached with the message: "Could not detach volume after sending the command and waiting for [wait.detach.device value] milliseconds. Probably the VM does not support the sent detach command or the device is busy at the moment. Try again in a couple of minutes.". Otherwise, the successful message will be shown to notify that the volume was successfully detached.

@sureshanaparti
Copy link
Copy Markdown
Contributor

Hi @RodrigoDLopez Can this detach wait time passed through detach cmd (DettachCommand) and handle the timeout in resource layer, later this might be useful to extend to other hypervisors if needed.

@nvazquez nvazquez added this to the 4.18.0.0 milestone Jun 30, 2022
Copy link
Copy Markdown
Contributor

@JoaoJandre JoaoJandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Daan that the attachOrDetachDevice method should be void. Apart from that, CLGTM.

@github-actions
Copy link
Copy Markdown

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@RodrigoDLopez can you look at the comment and the conflicts?

@acs-robot
Copy link
Copy Markdown

Found UI changes, kicking a new UI QA build
@blueorangutan ui

@blueorangutan
Copy link
Copy Markdown

@acs-robot a Jenkins job has been kicked to build UI QA env. I'll keep you posted as I make progress.

@RodrigoDLopez
Copy link
Copy Markdown
Contributor Author

@RodrigoDLopez can you look at the comment and the conflicts?

Sorry for the delay @daan. Now the wait.detach.device value is configurable via global settings as suggested by @sureshanaparti. And changed the signature of the methods to void. Thus, it is not necessary to return null.

@blueorangutan
Copy link
Copy Markdown

UI build: ✔️
Live QA URL: http://qa.cloudstack.cloud:8080/client/pr/6508 (SL-JID-2439)

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 29, 2022

Codecov Report

Merging #6508 (648f8e2) into main (48ffa5d) will increase coverage by 0.03%.
The diff coverage is 3.33%.

@@             Coverage Diff              @@
##               main    #6508      +/-   ##
============================================
+ Coverage     10.84%   10.87%   +0.03%     
- Complexity     7097     7117      +20     
============================================
  Files          2485     2485              
  Lines        245366   245531     +165     
  Branches      38318    38336      +18     
============================================
+ Hits          26606    26701      +95     
- Misses       215491   215560      +69     
- Partials       3269     3270       +1     
Impacted Files Coverage Δ
...om/cloud/storage/dao/GuestOSHypervisorDaoImpl.java 32.25% <0.00%> (-0.36%) ⬇️
...storage/image/deployasis/DeployAsIsHelperImpl.java 13.26% <0.00%> (-0.21%) ⬇️
...ain/java/com/cloud/network/vpc/VpcManagerImpl.java 4.12% <0.00%> (-0.01%) ⬇️
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java 4.99% <2.08%> (-0.01%) ⬇️
...n/java/com/cloud/storage/VolumeApiServiceImpl.java 12.90% <33.33%> (+0.03%) ⬆️
.../org/apache/cloudstack/quota/QuotaManagerImpl.java 44.92% <0.00%> (-1.08%) ⬇️
...n/java/com/cloud/vm/VirtualMachineProfileImpl.java 36.44% <0.00%> (-0.70%) ⬇️
...torage/allocator/AbstractStoragePoolAllocator.java 8.88% <0.00%> (-0.16%) ⬇️
...ava/com/cloud/api/query/dao/VolumeJoinDaoImpl.java 5.78% <0.00%> (-0.07%) ⬇️
...dstack/storage/datastore/PrimaryDataStoreImpl.java 2.23% <0.00%> (-0.02%) ⬇️
... and 13 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@RodrigoDLopez RodrigoDLopez requested review from DaanHoogland and JoaoJandre and removed request for DaanHoogland October 4, 2022 18:47
Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some sonar cloud comments, @RodrigoDLopez . I think we are about ready with this.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 4419

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian Build Failed (tid-5104)

1 similar comment
@blueorangutan
Copy link
Copy Markdown

Trillian Build Failed (tid-5104)

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@sonarqubecloud
Copy link
Copy Markdown

SonarCloud Quality Gate failed.    Quality Gate failed

Bug C 1 Bug
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

1.8% 1.8% Coverage
0.0% 0.0% Duplication

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 4547

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@RodrigoDLopez
Copy link
Copy Markdown
Contributor Author

code lgtm

Has anyone tested it ?

@RodrigoDLopez will you extend this to support detach ISO ?

This PR will only address Volume detach with delay. As we do not have a use case for applying a delay in the ISO detach process, we will not extend it right now; however, if the community presents a solid use case, we can address it in a new issue/PR.

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-5199)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39821 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6508-t5199-kvm-centos7.zip
Smoke tests completed. 103 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_08_upgrade_kubernetes_ha_cluster Failure 556.08 test_kubernetes_clusters.py

@DaanHoogland
Copy link
Copy Markdown
Contributor

@RodrigoDLopez will you apply @stephankruggg 's sugestions?
@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

address Stephan reviews

Co-authored-by: Stephan Krug <stekrug@icloud.com>
@apache apache deleted a comment from blueorangutan Dec 1, 2022
@apache apache deleted a comment from blueorangutan Dec 1, 2022
@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 4732

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-5345)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 43810 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6508-t5345-kvm-centos7.zip
Smoke tests completed. 102 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_08_upgrade_kubernetes_ha_cluster Failure 578.89 test_kubernetes_clusters.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 463.01 test_vpc_redundant.py

@DaanHoogland
Copy link
Copy Markdown
Contributor

code lgtm

Has anyone tested it ?

@RodrigoDLopez can you add a test scenario to the description?

@DaanHoogland
Copy link
Copy Markdown
Contributor

A value was defined in the wait.detach.device configuration and verified in the VR, if the LOG files were rotated after reaching the limit size and time configured to rotate the log files. In case the file size is not big enough, the files are not rotated.

I'm a bit confused @RodrigoDLopez . are you mixing two test scenarios in this description, one of which does not apply to this PR?

@RodrigoDLopez
Copy link
Copy Markdown
Contributor Author

A value was defined in the wait.detach.device configuration and verified in the VR, if the LOG files were rotated after reaching the limit size and time configured to rotate the log files. In case the file size is not big enough, the files are not rotated.

I'm a bit confused @RodrigoDLopez . are you mixing two test scenarios in this description, one of which does not apply to this PR?

I am so sorry @DaanHoogland. I mixed up two of my open PRs. I updated this test scenario description. Again, sorry about that.

@DaanHoogland
Copy link
Copy Markdown
Contributor

I am so sorry @DaanHoogland. I mixed up two of my open PRs. I updated this test scenario description. Again, sorry about that.

perfectly understandable, no problem, can happen.

@DaanHoogland DaanHoogland merged commit 2ed7868 into apache:main Dec 16, 2022
@RodrigoDLopez RodrigoDLopez deleted the inserts-timer-in-check-detach-volume branch January 9, 2023 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants