bug: predicted machines are getting recreated again in Zero DPU #1862

@vinodchitraliNVIDIA

Description

Version

main

Describe the bug.

Predicted machines are being recreated in Zero DPU.

This happens immediately after ingestion.

The second predicted host reboots the machine and takes it to scout boot. When the real host machine ID is sent to the control plane, the state does not advance because of a DB violation. As a result, the state machine is stuck.
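
To make the suspected failure mode concrete, here is a toy model of it, not the controller's actual schema or code. It assumes (hypothetically) that predicted hosts live in a `machines` table keyed by `id`, and that the real machine ID replaces the predicted ID with a single update per physical host. The table, the column names, the `host_key` field, and all function names are invented for illustration; the real constraint that is violated may be a different one.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one row per machine, keyed by machine id.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE machines ("
        " id TEXT PRIMARY KEY,"
        " host_key TEXT NOT NULL,"   # assumed identifier of the physical host
        " state TEXT NOT NULL)"
    )
    return db


def ingest_predicted(db, predicted_id, host_key):
    # Ingestion creates a predicted machine for the physical host.
    db.execute(
        "INSERT INTO machines VALUES (?, ?, ?)",
        (predicted_id, host_key, "HostInitializing"),
    )
    db.commit()


def promote_to_real(db, host_key, real_id):
    # The real machine id arrives; swap it in for the predicted id.
    # With two predicted rows for one host, both rows would get the
    # same id, the primary key is violated, and nothing is changed.
    try:
        with db:  # commits on success, rolls back on exception
            db.execute(
                "UPDATE machines SET id = ?, state = 'Ready' WHERE host_key = ?",
                (real_id, host_key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def states(db):
    return sorted(db.execute("SELECT id, state FROM machines").fetchall())


if __name__ == "__main__":
    db = make_db()
    ingest_predicted(db, "predicted-1", "host-a")
    ingest_predicted(db, "predicted-2", "host-a")  # the unwanted recreation
    print(promote_to_real(db, "host-a", "real-a"))
    print(states(db))
```

In this model a single predicted row is promoted normally, while a recreated second row makes every promotion attempt fail the same way, so the host never leaves its initializing state. That matches the symptom described above, but whether it matches the actual root cause needs to be confirmed against the controller's database schema.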

Minimum reproducible example

Relevant log output

+===+=============================================================+=========================================================================+
| U | fm100ht5ldjmk2g8d4reoopaqu3ec4gtp6n6ds02tdjvfpt23bbr88loj40 | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100ht6ipagom8c1q317sbj20ci4h4ojv5hmrun9k1jrcd0t2atcc5hi90 | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100ht76l3lfm636jojjsta9nq8b3sk7imlv7c5n92i26r1oge5996k1q0 | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100htdcr4ldsaprnqhnnto4rqoelsrk321fbo6olnqbsnf04dg8md93b0 | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100htjftl03lsc13vit0ck6ep8duhgk0b8cj1h5a0e80sg0kr248go36g | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100htmmqoqqqbe41og7vsv5slhk3mofvv0he52ojggn00cm82nm8l4040 | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100htmqd91q0jniqpcc33vs21em5199359di60v6g6kmn264b2suh3q10 | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100httrhv8c050jd1ie9ki325s9va5gkdiid495mm8ird9amkhph65dhg | Ready                                                                   |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100ps20o2kknoj9td86bhb909mrpon5dd557e57rns95uhuaj90fkjp60 | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100ps3vglomg1s72pnrh6o6dn4s1t1ac3pug1kovc8sg6eka5f39mra3g | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100ps722crcqbedvuh1t0cvg3e3ofbtah95ldbg33insetue7ng3dmp9g | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100ps9d95mupbb46fsiglkd5q1jrci0g5pic662u0sa4bggjccp0dg7hg | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100pshuf636ofd49bfmukc3iuk6a3406bl960nbdg8aj3p0i4e785ikng | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100psmm915o2qfj44qlncfvlkqbsebr15dvdbqedrj3kcm0mk9tv8o4ag | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100psthtahc2pb01ib1ok65vjkjlprhncljr5c1car494hdpcu2d96210 | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+
| U | fm100psvspafs5t3f774shqqqsm093m4opl85qli192mj5u45u4vrlmc2lg | HostInitializing/SetBootOrder                                           |
|   |                                                             | { set_boot_order_info: Some(SetBootOrderInfo { set_boot_order_jid: None |
|   |                                                             | set_boot_order_state: SetBootOrder                                      |
|   |                                                             | retry_count: 0 }) }                                                     |
+---+-------------------------------------------------------------+-------------------------------------------------------------------------+

Other/Misc.

No response

Code of Conduct

  • I agree to follow NCX Infra Controller's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report

Metadata

Assignees

No one assigned

    Labels

    bug: A defect in existing software (deprecated - use issue type, but it's needed for reporting now)

    Type

    No fields configured for Bug.

    Projects

    Status

    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
