Fix tablet-report-parser.pl for K8s support bundle tablet reports #198

Open
jasonriddell wants to merge 1 commit into main from fix/197-k8s-support-bundle-tablet-report

@jasonriddell
Member

Summary

  • Fix a fatal error when replica IPs in the entities dump (pod IPs) cannot be matched to the universe nodes' private_ip values (pod FQDNs) in Kubernetes deployments. The die is now a warning, so the script continues.
  • Fix empty node_uuid in the tablet table by falling back to nodeName when nodeUuid is absent from K8s universe-details.json, which also resolves UNIQUE constraint failures for replicated tablets.
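
The nodeName fallback described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the `node_identifier` helper, the sample node records, and the UUID and FQDN values are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records shaped like entries in nodeDetailsSet.
# The first (K8s-style) entry has no nodeUuid; the second (VM-style) has one.
my @nodes = (
    {
        nodeName   => 'yb-tserver-0',
        private_ip => 'yb-tserver-0.yb-tservers.example-ns.svc.cluster.local',
    },
    {
        nodeName   => 'vm-node-1',
        nodeUuid   => 'hypothetical-uuid-0001',
        private_ip => '10.0.0.11',
    },
);

# Hypothetical helper: prefer nodeUuid, fall back to nodeName.
sub node_identifier {
    my ($node) = @_;
    my $id = $node->{nodeUuid};
    $id ||= $node->{nodeName};    # assigns only if $id is undef, '' or '0'
    return $id;
}

# Prints "yb-tserver-0", then "hypothetical-uuid-0001".
print node_identifier($_), "\n" for @nodes;
```

Note that `||=` tests for truthiness, so an empty-string nodeUuid also triggers the fallback; `//=` would fall back only when the value is undefined.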

Closes #197

Test plan

  • Tested with a K8s support bundle containing 36 tserver tablet reports, dump-entities.json, and universe-details.json
  • Verified the script completes without fatal errors
  • Verified node_uuid is populated in the tablet table (using nodeName as identifier)
  • Verified no UNIQUE constraint violations on tablet(node_uuid, tablet_uuid)
  • Non-K8s (VM) deployments are unaffected — nodeUuid is present so the ||= fallback is never triggered, and IP-based matching succeeds as before
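
For illustration, the uniqueness check on (node_uuid, tablet_uuid) could be reproduced outside the database along these lines. This is only a sketch: `duplicate_keys` and the sample rows are hypothetical and are not taken from the script or the support bundle.

```perl
use strict;
use warnings;

# Hypothetical helper: return every "node_uuid/tablet_uuid" key seen more than once.
sub duplicate_keys {
    my (@rows) = @_;
    my %count;
    for my $row (@rows) {
        my $key = join '/', ($row->{node_uuid} // ''), $row->{tablet_uuid};
        $count{$key}++;
    }
    my @dups = grep { $count{$_} > 1 } sort keys %count;
    return @dups;
}

# Two replicas of one tablet with an empty node_uuid collide on the same key ...
my @before = (
    { node_uuid => '', tablet_uuid => 'tablet-a' },
    { node_uuid => '', tablet_uuid => 'tablet-a' },
);

# ... while the same replicas keyed by nodeName stay distinct.
my @after = (
    { node_uuid => 'yb-tserver-0', tablet_uuid => 'tablet-a' },
    { node_uuid => 'yb-tserver-1', tablet_uuid => 'tablet-a' },
);

my @dups_before = duplicate_keys(@before);
my @dups_after  = duplicate_keys(@after);
printf "before: %d duplicate key(s), after: %d\n",
    scalar @dups_before, scalar @dups_after;
```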

Made with Cursor

K8s universe-details.json uses pod FQDNs as private_ip and omits
nodeUuid from nodeDetailsSet. This caused two issues:

1. Fatal error matching entities replica IPs to universe nodes, since
   pod IPs never match FQDNs.
2. Empty node_uuid in the tablet table, causing UNIQUE constraint
   failures for replicated tablets.

Fix: fall back to nodeName as node identifier when nodeUuid is absent,
and warn instead of dying when entities IPs can't be matched to
universe nodes.

Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor

@eugeneckim eugeneckim left a comment

@jasonriddell

I looked at this and these do appear to be low risk changes. I'm not married to the following list of things (hence approving):

  • Print an info/warn when the fallback to nodeName is being used (and what value is being used).
  • When the warning finds a mismatch, it might be helpful (if not too heavy a lift) to print the specific mismatches.
  • Finally, the previous behaviour was to die, and I think we'd no longer die with the new behaviour. Do we need to preserve any of the death-type behaviour within the if / else-if logic?
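
One way the last two suggestions could look, as a rough sketch rather than the script's actual logic. The helper names, the sample addresses, and the `$strict` switch that keeps the old fatal behaviour are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: list the replica addresses with no matching node private_ip.
sub unmatched_replicas {
    my ($replica_ips, $nodes) = @_;
    my %known;
    $known{ $_->{private_ip} } = 1 for @$nodes;
    my @missing = grep { !$known{$_} } @$replica_ips;
    return @missing;
}

# Hypothetical helper: name the mismatches; die only when $strict is set.
sub report_unmatched {
    my ($missing, $strict) = @_;
    return 0 unless @$missing;
    my $msg = sprintf "%d replica address(es) not found among universe nodes: %s\n",
        scalar @$missing, join(', ', @$missing);
    die "ERROR: $msg" if $strict;    # opt-in: preserves the previous fatal behaviour
    warn "WARNING: $msg";            # default: report the mismatches and continue
    return scalar @$missing;
}

# A pod IP from the entities dump never equals the pod FQDN in private_ip.
my @nodes = (
    {
        nodeName   => 'yb-tserver-0',
        private_ip => 'yb-tserver-0.yb-tservers.example-ns.svc.cluster.local',
    },
);
my @missing = unmatched_replicas( ['10.1.2.3'], \@nodes );
report_unmatched( \@missing, 0 );    # warns, naming 10.1.2.3, and returns 1
```

Keeping the fatal path behind an explicit switch would let VM deployments, where IP matching is expected to succeed, still fail fast if that is wanted.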

Development

Successfully merging this pull request may close these issues.

tablet-report-parser.pl add support for support bundle tablet reports for k8s environments

2 participants