DISCLAIMER: Wasn't sure where to file this; it somewhat seems like a bug, but it may be more of a feature request.
Current behavior
There are a couple of things at play here that all revolve around the status of a deployed resource and the behavior on failure. I refer to `ConsoleDefender` in most of this issue since that is what I am working with, but I assume these issues exist (and the improvements could be made) for the other objects as well.
When I deploy a `ConsoleDefender` and it fails on one of the tasks (for example, "Create Defender YAML file"), the operator outputs the failure log, then starts running the tasks again from the start. This can make it difficult to parse/follow the log output.
Additionally, there is no way (that I can see) to grab a status from the `ConsoleDefender` object; the only indication of failure for most tasks is the logs, which, as mentioned, are tricky to follow.
Finally, there appears to be poor handling of required values. I dug myself into a lot of holes while trying to get an initial deployment going because I was missing one of the required pieces of the spec, but no validation at `kubectl apply` time blocked me from applying it.
Steps to reproduce
A couple of scenarios to try:
The "poor validation" issue:
1. Deploy the operator
2. Deploy a `ConsoleDefender` that is missing the `orchestrator` value
3. Notice that, on the surface, everything in your cluster looks fine: the `ConsoleDefender` got created despite missing pieces of the spec (see the sketch below)
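To make this concrete, here is a minimal sketch of the kind of manifest I mean. The group/version and every spec field other than `orchestrator` are assumptions for illustration, not copied from a real deployment:

```yaml
# Hypothetical ConsoleDefender manifest; only the missing `orchestrator`
# field is the point of this example.
apiVersion: pcc.paloaltonetworks.com/v1alpha1
kind: ConsoleDefender
metadata:
  name: consoledefender-sample
  namespace: twistlock
spec:
  version: "22.01"
  # orchestrator: kubernetes   # required by the operator, but omitted here;
  # `kubectl apply` accepts the object anyway, because the CRD schema does
  # not declare the field as required.
```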
The "lack of status" issue:
1. Deploy the operator
2. Deploy a `ConsoleDefender` that is missing the `accessToken` value
3. Notice that again, on the surface, everything looks fine. This time the Console pod goes to Running as well, which might mislead someone into thinking everything is working despite the admin account/license not being set up.
4. Validate that there is no way to check status aside from the logs (I did a YAML dump of the `ConsoleDefender` and there is no status field; see the sketch below)
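For reference, this is roughly what that dump looks like (trimmed, and the group/version is an assumption, as above):

```yaml
# Output of: kubectl get consoledefender consoledefender-sample -o yaml
apiVersion: pcc.paloaltonetworks.com/v1alpha1
kind: ConsoleDefender
metadata:
  name: consoledefender-sample
  namespace: twistlock
spec:
  orchestrator: kubernetes
  # accessToken omitted, so the admin account/license setup fails
# ...and the object ends here: there is no `status:` stanza, so neither
# `kubectl get` nor `kubectl describe` gives any indication of the failure.
```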
The "messy logs" issue:
1. Follow steps 1-4 of the "lack of status" scenario above
2. Follow the logs of the operator pod
3. You should see that it continually loops, repeating all of the tasks despite the failure
Possible solutions
A couple of suggestions:
- Expose a `status` field on the `ConsoleDefender` that reports the current task and, when necessary, the most recent failed task plus the error "log" (see the first sketch after this list)
- Extend the current validation on the CRDs to mark as `required` any fields that the operator requires and whose absence will cause creation to fail (see the second sketch after this list)
- Implement better failure handling in the operator: the operator should have some knowledge of which failures can be fixed with a retry and which should result in a pause until the `ConsoleDefender` is updated (the `retryable` marker in the first sketch below gestures at this). Example: if the operator fails to deploy the license because it is invalid, it should not retry. For other failures where a retry might work, the operator should be smarter about how it retries: rather than repeating all tasks, it should (in general) retry just the failed task, unless the `ConsoleDefender` has changed.
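Roughly what I have in mind for the first and third suggestions, sketched below; every field name here is hypothetical, not a naming proposal:

```yaml
status:
  phase: Failed                            # e.g. Pending / Running / Failed
  currentTask: "Create Defender YAML file"
  lastFailedTask: "Create Defender YAML file"
  lastError: "<error output from the failed task>"
  retryable: false                         # terminal failure (e.g. invalid
                                           # license): pause reconciliation
                                           # until the spec changes
```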
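For the second suggestion, CRD structural schemas already support this via `required`; a sketch follows (the schema shape is illustrative, and only `orchestrator` and `accessToken` are taken from this issue):

```yaml
# Fragment of the CRD spec: with `required` set, `kubectl apply` rejects a
# ConsoleDefender that omits these fields instead of silently accepting it.
versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required:
              - orchestrator
              - accessToken
            properties:
              orchestrator:
                type: string
              accessToken:
                type: string
```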