Reconcile VMLocation when it's moved to different RP and set appropriate conditions#1558
Reconcile VMLocation when it's moved to different RP and set appropriate conditions#1558anwithaun-1 wants to merge 21 commits into
Conversation
anwithaun-1
left a comment
There was a problem hiding this comment.
Please take a look at the addressed comments.
anwithaun-1
left a comment
There was a problem hiding this comment.
Resolved all the comments. pls review again.
|
@aruneshpa @bryanv @hpannem , pls review |
| } | ||
|
|
||
| // 3. Validate Ancestry | ||
| // Check if the VM is in the Root RP or a Child RP (common for VKS) |
There was a problem hiding this comment.
what happens if the vm is in correct namespace however wrong child RP? Is this not under the PR's scope?
There was a problem hiding this comment.
what is considered as wrong Child RP?
If the VM is in correct namespace and moved to any of the child RP of that namespace, then it's considered as a valid location as of now
There was a problem hiding this comment.
ok. However we need to confirm this with Bryan or Andrew.
There was a problem hiding this comment.
Hmm. I am with Hemanth on this. The only valid use-case for child RP is VKS. But if the VM is not in the VKS child RP, then we need to report it as an error / invalid location.
There was a problem hiding this comment.
@aruneshpa How to ensure that if the child RP is VKS or not?
| return err | ||
| } | ||
|
|
||
| pkgcnd.Delete(vmCtx.VM, vmopv1.VirtualMachineInAuthorizedLocation) |
There was a problem hiding this comment.
So, absence of this condition has two scenarios.
- the reconcile has not yet happened.
- The reconcile happened and its in correct location.
Would it be better to distinguish these two scenarios?
True: In correct location.
False: not in correct location.
Absent: Not yet run or unable to determine.
There was a problem hiding this comment.
Correct. The way the condition is named right now, it must be set to True during every reconcile if the VM is in a valid location.
If it is only supposed to be set when the VM is in invalid location, then we can call it "VMInInvalidLocation" or something but that violates the principal that the condition types must have positive polarity.
There was a problem hiding this comment.
@hpannem @aruneshpa
ReconcileLocation is called when VM is created or when resourcePool is changed. So in my opinion (Absent: Not yet run or unable to determine ) is not required. We can distinguish between the VM in correct location and in incorrect location. Also "unable to determine" can happen for many reasons. if we are not able to reconcile VM's location and just remove the condition in this case, then it leads to the impression that the VM is normal.
Are these states Ok ?
True: In correct location (when vm is created or or after creation when it's moved to valid location )
False: not in correct location (when vm is moved to invalid location)
There was a problem hiding this comment.
Think of user perspective, who is monitoring the CR.
- VM reconcile is not yet run. There could be many reasons for this. Operator down... or CR is queued and reconciler didn't get a chance to this. Condition absence == reconciler not run. User waits until condition is inserted and checks the status.
- Once reconciled, condition is put. Usually, I believe conditions should not get deleted once they are inserted. Just change true or false or unknown. There may be exceptions in some cases like upgrade etc. However, here, it would be good to keep it.
One correction: "Unable to determine" will go as an unknown on the condition. Condition will be set as "unknown" with a reason and message. Standard k8s pattern. aka. metav1.ConditionUnknown.
True: In correct location.
False: not in correct location.
Unknown: unable to determine.
Absent: Not yet run
| ) | ||
| if err != nil { | ||
| logger.Error(err, "failed to get expected namespace resource pool") | ||
| return nil |
There was a problem hiding this comment.
Hmm. Should this error be ignored? If we are can't be certain that we are in the RP that we should be in, it is okay to error and requeue IMO.
There was a problem hiding this comment.
Reasons why I am not returning the Err here:
- In case if we fail to get namespace RP, then it'll block all the next VM reconcilations. That is the reason , I went ahead with ignoring this error. Still I have handled err in case if I am not able to validate the location or if the VM is in the RP that it shouldn't be in.
- Unit tests failure due to the empty zonename property which is expected by the GetNamespaceFolderAndRPMoID api.
@aruneshpa could you pls provide suggestions on this.
| return err | ||
| } | ||
|
|
||
| pkgcnd.Delete(vmCtx.VM, vmopv1.VirtualMachineInAuthorizedLocation) |
There was a problem hiding this comment.
Correct. The way the condition is named right now, it must be set to True during every reconcile if the VM is in a valid location.
If it is only supposed to be set when the VM is in invalid location, then we can call it "VMInInvalidLocation" or something but that violates the principal that the condition types must have positive polarity.
| "summary.runtime.connectionState", | ||
| "summary.runtime.host", | ||
| "summary.runtime.powerState", | ||
| "resourcePool", |
There was a problem hiding this comment.
Should we also watch the folder ("parent") here?
There was a problem hiding this comment.
@aruneshpa Initially I had added "parent" and removed it as @bryanv mentioned that "the VM managed object doesn't have a parent field" in of the review thread above. I am not sure about this. Could you please provide more clarity.
Minimum allowed line rate is |
What does this PR do, and why is it needed?
Problem:
Currently, the VM Operator does not set appropriate conditions or halt reconciliation when a VM is moved out of its designated resource pool. This leads to two major issues:
Permission Creep: DevOps admin privileges are inadvertently extended to reconfigure VMs that have drifted into different inventory locations.
Lack of Visibility: There is no "stop-gap" or clear status indicator telling administrators that the VM is in an orphaned state, leading to unpredictable reconciliation behavior.
Solution:
State Conflict: Two instances of the same workload existing simultaneously in different locations.
Which issue(s) is/are addressed by this PR? (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #
VMSVC-3539
VMSVC-3541
Are there any special notes for your reviewer:
Please add a release note if necessary: