I've observed a rare possibility that a node can receive a wake up violation for failing to boot within 30 minutes when the node is in fact shutting down.
Here's the sequence of events:
- Node boots due to farmerbot. Upon boot it sends an uptime report, resulting in both `power_managed` and `power_managed_boot` being set to `None`
- But, in the same block as that uptime event, there is also a power target change to `Up` for this node. Maybe this shouldn't happen in normal circumstances, but it can and actually has. Since the power state for this node is still `Down` at this point, `power_managed_boot` will be set
- The node only sets its power state to `Up` after its first uptime report, typically in the next block
- There is a power target change to `Down` for this node more than 30 minutes after the target change to `Up`
- When the node shuts down, it first sets its power state to `Down` and thus both `power_managed` and `power_managed_boot` are not `None`
- Next, the node sends a final uptime report before shutting down (usually in the next block after the power state change). At this point, minting interprets this uptime report as a wake up event and assigns the node a violation
If we accept that it's legitimate to send multiple power target changes until a node wakes up, then this definitely shouldn't result in a violation.
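
To make the sequence above concrete, here is a minimal Python sketch of the bookkeeping as I understand it. It is a simplified model built only from the description in this post, not the actual minting code: the class, the handler names, the 6-second block spacing, and the starting timestamps are all assumptions for illustration. Only `power_managed`, `power_managed_boot`, and the 30 minute limit come from the observed behavior.

```python
from dataclasses import dataclass
from typing import Optional

BOOT_LIMIT_SECONDS = 30 * 60  # the 30 minute wake up window


@dataclass
class NodeTracker:
    """Hypothetical stand-in for the per-node state that minting tracks."""
    power_state: str = "Down"
    power_managed: Optional[int] = None
    power_managed_boot: Optional[int] = None
    violations: int = 0

    def on_uptime_report(self, now: int) -> None:
        # With both fields set, the report is treated as a wake up event.
        if self.power_managed is not None and self.power_managed_boot is not None:
            if now - self.power_managed_boot > BOOT_LIMIT_SECONDS:
                self.violations += 1
        self.power_managed = None
        self.power_managed_boot = None

    def on_power_target_change(self, target: str, now: int) -> None:
        # A target of Up while the node is still Down starts the boot timer.
        if target == "Up" and self.power_state == "Down" and self.power_managed_boot is None:
            self.power_managed_boot = now

    def on_power_state_change(self, state: str, now: int) -> None:
        self.power_state = state
        if state == "Down":
            self.power_managed = now


def simulate() -> int:
    # Assumed starting point: node went down an hour ago, farmerbot asked
    # for a wake up ten minutes ago. Timestamps are in seconds.
    node = NodeTracker(power_state="Down", power_managed=-3600, power_managed_boot=-600)

    node.on_uptime_report(now=0)              # boots in time, both fields cleared
    node.on_power_target_change("Up", now=0)  # same block: boot timer set again
    node.on_power_state_change("Up", now=6)   # next block: node reports Up
    node.on_power_target_change("Down", now=2400)  # 40 minutes later
    node.on_power_state_change("Down", now=2406)   # both fields now set
    node.on_uptime_report(now=2412)           # final report, read as a late wake up
    return node.violations


if __name__ == "__main__":
    print(simulate())
```

In this model the node ends up with one violation even though it woke up on time, because the boot timer set by the redundant `Up` target change is never cleared until the final uptime report.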
Perhaps the solution would be to reorder the sequence of operations in Zos, but I guess that it was implemented this way for a reason, and of course rolling out changes to Zos is slow.