Nodes can receive a wake up violation when they are actually shutting down #29

@scottyeager

I've observed a rare case in which a node receives a wake-up violation for failing to boot within 30 minutes, even though the node is in fact shutting down.

Here's the sequence of events:

  1. The node boots due to farmerbot. Upon boot it sends an uptime report, which sets both power_managed and power_managed_boot to None.
  2. However, the same block as that uptime event also contains a power target change to Up for this node. Maybe this shouldn't happen under normal circumstances, but it can, and it actually has. Since the node's power state is still Down at this point, power_managed_boot gets set.
  3. The node typically sets its power state to Up only in the block after its first uptime report.
  4. More than 30 minutes after the target change to Up, there is a power target change to Down for this node.
  5. When the node shuts down, it first sets its power state to Down, which sets power_managed. Since power_managed_boot is still set from step 2, neither field is None at this point.
  6. Next, the node sends a final uptime report before shutting down (usually in the block after the power state change). Minting interprets this uptime report as a wake-up event and, because more than 30 minutes have passed since power_managed_boot was set, assigns the node a violation.
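The steps above can be replayed against a toy Python model of the bookkeeping they describe. This is a sketch under stated assumptions, not the actual minting implementation: the class `NodeMintingState`, the `replay` helper, the 6-second block spacing, the starting state, and the rule that an uptime report only clears the two fields when `power_managed` is set are all hypothetical. Only `power_managed`, `power_managed_boot`, the Up/Down states, and the 30-minute deadline come from this issue. Replaying the six steps yields one violation; swapping steps 5 and 6, so the final uptime report lands before the power state change, yields none.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The 30-minute boot deadline comes from this issue; the 6-second block
# spacing is a hypothetical value used only to space the events apart.
WAKE_UP_DEADLINE_SECS = 30 * 60
BLOCK_SECS = 6


@dataclass
class NodeMintingState:
    """Hypothetical toy model of per-node minting fields (not the real code)."""
    power_state: str = "Down"
    power_managed: Optional[int] = None       # set when the node goes Down
    power_managed_boot: Optional[int] = None  # set by an Up target while Down
    violations: int = 0

    def on_uptime(self, ts: int) -> None:
        # Assumption: a report arriving while power_managed is set is treated
        # as a wake-up, checked against the deadline, and clears both fields.
        if self.power_managed is not None:
            boot = self.power_managed_boot
            if boot is not None and ts - boot > WAKE_UP_DEADLINE_SECS:
                self.violations += 1
            self.power_managed = None
            self.power_managed_boot = None

    def on_power_target(self, ts: int, target: str) -> None:
        # An Up target while the recorded state is still Down starts the clock.
        if target == "Up" and self.power_state == "Down":
            self.power_managed_boot = ts

    def on_power_state(self, ts: int, state: str) -> None:
        # Switching to Down marks the node as power managed; switching to Up
        # does not clear power_managed_boot in this model.
        if state == "Down":
            self.power_managed = ts
        self.power_state = state


Event = Tuple[int, str, str]  # (timestamp in seconds, kind, value)


def replay(events: List[Event]) -> NodeMintingState:
    # Hypothetical starting point: farmerbot shut the node down at t=0 and
    # requested a wake-up at t=60.
    node = NodeMintingState(power_managed=0, power_managed_boot=60)
    for ts, kind, value in events:
        if kind == "uptime":
            node.on_uptime(ts)
        elif kind == "target":
            node.on_power_target(ts, value)
        elif kind == "state":
            node.on_power_state(ts, value)
    return node


BOOT = 600             # block containing the boot uptime report
DOWN = BOOT + 31 * 60  # Down target, more than 30 minutes later

ISSUE_SEQUENCE: List[Event] = [
    (BOOT, "uptime", ""),                   # 1. boot report clears both fields
    (BOOT, "target", "Up"),                 # 2. Up target in the same block
    (BOOT + BLOCK_SECS, "state", "Up"),     # 3. power state Up, next block
    (DOWN, "target", "Down"),               # 4. Down target > 30 min later
    (DOWN + BLOCK_SECS, "state", "Down"),   # 5. power state Down first
    (DOWN + 2 * BLOCK_SECS, "uptime", ""),  # 6. final report, read as wake-up
]

# Same events with steps 5 and 6 swapped (final report before the state change).
REORDERED: List[Event] = ISSUE_SEQUENCE[:4] + [
    (DOWN + BLOCK_SECS, "uptime", ""),
    (DOWN + 2 * BLOCK_SECS, "state", "Down"),
]

print(replay(ISSUE_SEQUENCE).violations)  # → 1
print(replay(REORDERED).violations)       # → 0
```

In this model the stale power_managed_boot from step 2 survives step 3 because switching the power state to Up does not clear it; whether the real minting logic behaves the same way is an assumption.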

If we accept that it's legitimate to send multiple power target changes until a node wakes up, then this definitely shouldn't result in a violation.

Perhaps the solution would be to reorder the sequence of operations in Zos, but I assume it was implemented this way for a reason, and of course rolling out changes to Zos is slow.
