Skip to content

Linstor: add HA support and small cleanups#8407

Merged
yadvr merged 3 commits into
apache:4.19from
LINBIT:mydev-main
Feb 13, 2024
Merged

Linstor: add HA support and small cleanups#8407
yadvr merged 3 commits into
apache:4.19from
LINBIT:mydev-main

Conversation

@rp-

@rp- rp- commented Dec 27, 2023

Copy link
Copy Markdown
Contributor

Description

This PR add HA support for Linstor primary storage.
It checks output of drbd event and drbd status output, to check if it has quorum or connections to other hosts,
which in turn reveals if this node is isolated or not.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Nested HA Linstor cluster with virtualBMC IPMI:

  • Simulated node crash (kernel panic)
  • Blocking of drbd incoming and outgoing connections

How did you try to break this feature and the system with this change?

@codecov

codecov Bot commented Dec 27, 2023

Copy link
Copy Markdown

Codecov Report

Attention: 89 lines in your changes are missing coverage. Please review.

Comparison is base (af2e277) 30.93% compared to head (2d37f0e) 30.94%.
Report is 1 commits behind head on 4.19.

Files Patch % Lines
...oud/hypervisor/kvm/storage/LinstorStoragePool.java 0.00% 70 Missing ⚠️
...cloudstack/storage/datastore/util/LinstorUtil.java 0.00% 8 Missing ⚠️
...torage/datastore/provider/LinstorHostListener.java 0.00% 6 Missing ⚠️
.../hypervisor/kvm/storage/LinstorStorageAdaptor.java 0.00% 3 Missing ⚠️
...tore/driver/LinstorPrimaryDataStoreDriverImpl.java 0.00% 1 Missing ⚠️
.../provider/LinstorPrimaryDatastoreProviderImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19    #8407      +/-   ##
============================================
+ Coverage     30.93%   30.94%   +0.01%     
- Complexity    34200    34226      +26     
============================================
  Files          5346     5347       +1     
  Lines        375449   375493      +44     
  Branches      54605    54616      +11     
============================================
+ Hits         116140   116200      +60     
+ Misses       244020   243998      -22     
- Partials      15289    15295       +6     
Flag Coverage Δ
simulator-marvin-tests 24.84% <0.00%> (+0.01%) ⬆️
uitests 4.39% <ø> (ø)
unit-tests 16.54% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sureshanaparti

Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8149

@yadvr yadvr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - all changes are in the specific plugin - as long as the maintainer can certify testing/stability and smoketests pass - we should accept it.

@DaanHoogland

Copy link
Copy Markdown
Contributor

clgtm and i agree with rohit, but can you expand a bit on the testing @rp-? (as there is 0% coverage in this PR)

@rp-

rp- commented Feb 6, 2024

Copy link
Copy Markdown
Contributor Author

clgtm and i agree with rohit, but can you expand a bit on the testing @rp-? (as there is 0% coverage in this PR)

Well as I said, I setup a nested cluster with a VBMC IPMI and configured everything in CloudStack and
tested a few failover scenarios. Like trigger a kernel panic on a node with a running VM (HA enabled) and let it fail over to another node and also see if it got fenced correctly.
The same with the node still seen by CloudStack, but blocking any DRBD traffic and having it look like the node can't access its storage anymore.
This will also trigger the fence action.
Other actions like VM starting/stoping/deleting worked as before.

Do you want me to test any specific scenarios?

@DaanHoogland DaanHoogland left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm, trouwens

@rp-

rp- commented Feb 7, 2024

Copy link
Copy Markdown
Contributor Author

I made a small change, to don't specify the full path to hostname to remedy concerns from this issue: #8310

@rp- rp- force-pushed the mydev-main branch 2 times, most recently from 2c5bcd0 to ce06a0f Compare February 7, 2024 12:45
@github-actions

github-actions Bot commented Feb 8, 2024

Copy link
Copy Markdown

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@yadvr yadvr added this to the 4.18.2.0 milestone Feb 8, 2024
@yadvr

yadvr commented Feb 8, 2024

Copy link
Copy Markdown
Member

The hostname issue is probably a critical/blocker for some users, @rp- can you consider a smaller fix for 4.18 branch? Thanks.

@yadvr

yadvr commented Feb 8, 2024

Copy link
Copy Markdown
Member

@rp- could you help review/fix conflicts and change base branch if it's applicable for 4.18 or 4.19

@rp- rp- changed the base branch from main to 4.19 February 9, 2024 04:48
@rp-

rp- commented Feb 9, 2024

Copy link
Copy Markdown
Contributor Author

@rp- could you help review/fix conflicts and change base branch if it's applicable for 4.18 or 4.19

rebased to 4.19

@rp-

rp- commented Feb 9, 2024

Copy link
Copy Markdown
Contributor Author

@rohityadavcloud here #8633 is a PR for 4.18

@github-actions

github-actions Bot commented Feb 9, 2024

Copy link
Copy Markdown

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland

Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8627

@yadvr yadvr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - changes are strictly in the linstor plugin that our smoketests can't test. Github actions confirm other tests are passing with simulator.

@yadvr yadvr merged commit 70b634f into apache:4.19 Feb 13, 2024
@yadvr yadvr modified the milestones: 4.18.2.0, 4.19.1.0 Feb 13, 2024
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Feb 20, 2024
* linstor: Outline get storagepools from resourcegroup into function

* linstor: move getHostname() to kvm/Pool and reimplement

* linstor: implement CloudStack HA support
@yadvr

yadvr commented Feb 22, 2024

Copy link
Copy Markdown
Member

Hi @rp- @Philipp-Reisner quick question - does Linstor currently support VM HA with ACS 4.18/4.19? cc @harikrishna-patnala @borisstoyanov @NuxRo

@rp-

rp- commented Feb 22, 2024

Copy link
Copy Markdown
Contributor Author

@rohityadavcloud
Not in any released version, this patchset here is needed and also this fix here to get it working with larger setups:
#8670

rp- added a commit to LINBIT/cloudstack that referenced this pull request Feb 24, 2026
* linstor: Outline get storagepools from resourcegroup into function

* linstor: move getHostname() to kvm/Pool and reimplement

* linstor: implement CloudStack HA support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

No open projects
Status: No status

Development

Successfully merging this pull request may close these issues.

5 participants