
feat(network-nodes): add reliability controls #27

Merged
roderik merged 8 commits into main from feat/k8s-reliability-hardening
Sep 18, 2025
Conversation

Member

@roderik roderik commented Sep 18, 2025

## Summary

- add configurable PodDisruptionBudgets for validator and RPC StatefulSets with safeguards for min/max exclusivity
- ship opt-in NetworkPolicy with a default ingress/egress baseline for Besu pods, including secure namespace scoping and external peer egress via ipBlocks
- expose priorityClass hooks so validator/RPC pods can preempt lower tiers

## Testing

- bun run typecheck
- bun run check
- bun test
- bun run check:fix
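For illustration only, enabling both controls together might look like the values sketch below. The podDisruptionBudgets keys match the chart values documented later in this review; the networkPolicy.enabled flag is an assumption based on the policy being described as opt-in:

```yaml
podDisruptionBudgets:
  validator:
    enabled: true
    # Exactly one of minAvailable / maxUnavailable may be set.
    minAvailable: 1
  rpc:
    enabled: true
    maxUnavailable: 1   # accepts integers or percentages, e.g. "25%"
networkPolicy:
  enabled: true         # assumed flag name; the policy is opt-in
```

With maxUnavailable set on the rpc budget, minAvailable should be left unset, since the Kubernetes API accepts only one of the two fields in a PodDisruptionBudget.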

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Summary of Changes

Hello @roderik, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the operational reliability and security of network nodes by integrating Kubernetes-native controls. It provides mechanisms to manage pod disruptions gracefully and to define network access rules, contributing to a more robust and secure infrastructure for Besu nodes.

Highlights

  • PodDisruptionBudgets: Introduced configurable PodDisruptionBudgets for both validator and RPC StatefulSets, allowing for controlled voluntary disruptions and ensuring minimum availability during maintenance.
  • NetworkPolicy: Implemented an opt-in Kubernetes NetworkPolicy with a default ingress/egress baseline for Besu pods, enhancing network security by restricting traffic flow.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added the qa:running QA workflow is currently running label Sep 18, 2025
@github-actions

github-actions Bot commented Sep 18, 2025

To view in Slack, search for: 1758177120.620669

@github-actions github-actions Bot added the status:ready-for-review Pull request is ready for review label Sep 18, 2025
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey there - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@github-actions github-actions Bot added qa:success QA workflow passed successfully feat New feature and removed qa:running QA workflow is currently running labels Sep 18, 2025

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines +33 to +37

    {{- if $hasMin }}
    minAvailable: {{ $minRaw }}
    {{- end }}
    {{- if $hasMax }}
    maxUnavailable: {{ $maxRaw }}


[P1] Avoid rendering both minAvailable and maxUnavailable in PDB

When a user enables a PodDisruptionBudget and sets maxUnavailable, the template still emits the default minAvailable: 1 because both flags are checked independently. Kubernetes requires exactly one of these fields, so the manifest will be rejected unless callers remember to explicitly null out minAvailable. The chart should automatically suppress minAvailable whenever maxUnavailable is provided (or vice‑versa) to keep the rendered PDB valid.
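One way to implement that suppression in the template (a sketch, not the chart's actual code, assuming $hasMin/$hasMax and the raw values are defined as in the excerpt above) is to let maxUnavailable take precedence:

```
{{- if $hasMax }}
maxUnavailable: {{ $maxRaw }}
{{- else if $hasMin }}
minAvailable: {{ $minRaw }}
{{- end }}
```

This keeps a default minAvailable while guaranteeing the rendered manifest never contains both fields at once.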

Useful? React with 👍 / 👎.

Comment on lines +149 to +154

    # Allow outbound P2P connections to external peers (public networks).
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 30303


[P1] External egress rule blocks outbound peers despite comment

The default egress rule meant to "allow outbound P2P connections to external peers" specifies namespaceSelector: {}. A namespaceSelector only matches pods inside the cluster, so when this policy is enabled Besu pods are still unable to reach Internet peers and will fail to join the network. This rule should either omit the to block entirely or use an ipBlock (e.g. 0.0.0.0/0) to genuinely allow external connections.

Useful? React with 👍 / 👎.

@github-actions github-actions Bot added qa:running QA workflow is currently running and removed qa:success QA workflow passed successfully labels Sep 18, 2025
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces reliability controls by adding configurable PodDisruptionBudgets and NetworkPolicies for Besu nodes. The changes are a great step towards improving the resilience and security of deployments. My review focuses on ensuring the correctness and security of these new configurations.

I've found a critical issue in the poddisruptionbudgets.yaml template that could lead to deployment failures due to a missing validation check. There's also a high-severity issue in the default networkPolicy egress rule in values.yaml, which doesn't behave as described in its comment. Additionally, I've noted several medium-severity issues related to misleading comments and overly permissive default settings that could be improved for clarity and security. Overall, the changes are valuable, and addressing these points will make them more robust.

Comment on lines +14 to +16

    {{- if not (or $hasMin $hasMax) }}
    {{- fail (printf "podDisruptionBudgets.%s requires minAvailable or maxUnavailable" $component.name) -}}
    {{- end }}
Contributor


critical

A PodDisruptionBudget resource is invalid if both minAvailable and maxUnavailable are specified. This template currently lacks a check for this mutual exclusivity, which can lead to deployment failures if a user configures both values. You should add a check to fail the template rendering if both are set.

    {{- if and $hasMin $hasMax }}
      {{- fail (printf "podDisruptionBudgets.%s: minAvailable and maxUnavailable are mutually exclusive" $component.name) -}}
    {{- else if not (or $hasMin $hasMax) }}
      {{- fail (printf "podDisruptionBudgets.%s requires minAvailable or maxUnavailable" $component.name) -}}
    {{- end }}

Comment on lines +150 to +154

    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 30303
Contributor


high

The comment on line 149 states this rule is for allowing connections to "external peers (public networks)". However, the rule to: - namespaceSelector: {} allows egress traffic to all pods in all namespaces within the cluster on port 30303, not to external IPs outside the cluster.

To correctly allow egress to external networks, you should use an ipBlock rule. For example:

- to:
  - ipBlock:
      cidr: 0.0.0.0/0
      except:
        # Add your cluster's internal CIDRs here to avoid bypassing other policies
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
  ports:
    - protocol: TCP
      port: 30303

As the default rule does not match its description and implements incorrect behavior, this is a high-severity issue.

| livenessProbe.timeoutSeconds | int | `2` | Timeout in seconds before marking the probe as failed. |
| nameOverride | string | `""` | Override for the short chart name used in resource naming. |
| networkPolicy.annotations | object | `{}` | Additional annotations to add to the NetworkPolicy metadata. |
| networkPolicy.egress | list | `[{"ports":[{"port":53,"protocol":"UDP"}],"to":[{"namespaceSelector":{},"podSelector":{"matchLabels":{"k8s-app":"kube-dns"}}}]},{"ports":[{"port":30303,"protocol":"TCP"}],"to":[{"podSelector":{"matchLabels":{"app.kubernetes.io/name":"besu-statefulset"}}}]},{"ports":[{"port":30303,"protocol":"TCP"}],"to":[{"namespaceSelector":{}}]}]` | NetworkPolicy egress rules. Leave empty to deny all egress when enabled. |
Contributor


medium

The default values for networkPolicy.egress and networkPolicy.ingress (line 94) are very long and include complex JSON structures. This makes the documentation table difficult to read and parse visually. For better readability, consider replacing these long default values with a concise placeholder like [...] or See values.yaml and let users refer to the values.yaml file for the detailed default configuration.

| podDisruptionBudgets.rpc.enabled | bool | `false` | Enable the RPC PodDisruptionBudget. |
| podDisruptionBudgets.rpc.labels | object | `{}` | Additional labels applied to the RPC PodDisruptionBudget. |
| podDisruptionBudgets.rpc.maxUnavailable | string | `nil` | Maximum RPC pods that can be disrupted at once. Accepts integers or percentages. |
| podDisruptionBudgets.rpc.minAvailable | int | `1` | Minimum RPC pods that must remain available; ignored when maxUnavailable is set. |
Contributor


medium

The description states that minAvailable is "ignored when maxUnavailable is set". This is misleading because the Kubernetes API does not allow both fields to be specified in a PodDisruptionBudget. If both are set, the resource creation will fail. The documentation should be updated to clarify that minAvailable and maxUnavailable are mutually exclusive. This also applies to the validator PDB on line 138.

    validator:
      # -- Enable the validator PodDisruptionBudget.
      enabled: false
      # -- Minimum validator pods that must remain available; ignored when maxUnavailable is set.
Contributor


medium

This comment is misleading. minAvailable is not simply ignored when maxUnavailable is set; the Kubernetes API requires that only one of these fields be specified for a PodDisruptionBudget. Setting both will cause a deployment failure. The comment should be updated to reflect that they are mutually exclusive. This also applies to the comment for the RPC PDB on line 72.

    # -- Minimum validator pods that must remain available. Must not be set if maxUnavailable is set.

    rpc:
      # -- Enable the RPC PodDisruptionBudget.
      enabled: false
      # -- Minimum RPC pods that must remain available; ignored when maxUnavailable is set.
Contributor


medium

This comment is misleading. minAvailable is not simply ignored when maxUnavailable is set; the Kubernetes API requires that only one of these fields be specified for a PodDisruptionBudget. Setting both will cause a deployment failure. The comment should be updated to reflect that they are mutually exclusive.

    # -- Minimum RPC pods that must remain available. Must not be set if maxUnavailable is set.

    - podSelector:
        matchLabels:
          app.kubernetes.io/name: graph-node
    - podSelector: {}
Contributor


medium

The podSelector: {} rule allows ingress from any pod within the namespace. This is an overly permissive default that weakens the security posture of the NetworkPolicy, as it effectively bypasses the more specific podSelector rules for other in-namespace clients. Consider removing this from the default rules to enforce a more restrictive policy. Users who need broader namespace access can add it back to their custom values.

        # - podSelector: {}

@github-actions github-actions Bot added qa:success QA workflow passed successfully and removed qa:running QA workflow is currently running labels Sep 18, 2025
Comment on lines +33 to +38

    {{- if $hasMin }}
    minAvailable: {{ $minRaw }}
    {{- end }}
    {{- if $hasMax }}
    maxUnavailable: {{ $maxRaw }}
    {{- end }}


Potential bug: The `PodDisruptionBudget` template lacks mutual exclusion validation, allowing both minAvailable and maxUnavailable to be set, which generates an invalid Kubernetes manifest and causes deployment failure.
  • Description: The template for PodDisruptionBudget renders both minAvailable and maxUnavailable if a user provides values for both in their configuration. The template's validation logic correctly ensures at least one of the fields is present but fails to enforce their mutual exclusivity. The Kubernetes API specification states that a PodDisruptionBudget resource can only specify one of minAvailable or maxUnavailable. Consequently, if a user configures both, the generated manifest will be invalid, causing the Kubernetes API server to reject it and leading to a deployment failure.

  • Suggested fix: Add a validation check to the template to ensure minAvailable and maxUnavailable are not set simultaneously. If both are detected, the Helm chart should fail with an informative error message, for example: {{- if and $hasMin $hasMax }} {{- fail (printf "podDisruptionBudgets.%s cannot specify both minAvailable and maxUnavailable" $component.name) -}} {{- end }}.
    severity: 0.6, confidence: 0.95

Did we get this right? 👍 / 👎 to inform future reviews.

@github-actions github-actions Bot added qa:running QA workflow is currently running and removed qa:success QA workflow passed successfully labels Sep 18, 2025
@github-actions github-actions Bot added qa:success QA workflow passed successfully qa:running QA workflow is currently running and removed qa:running QA workflow is currently running qa:success QA workflow passed successfully labels Sep 18, 2025
@roderik roderik merged commit cb507dd into main Sep 18, 2025
7 checks passed
@roderik roderik deleted the feat/k8s-reliability-hardening branch September 18, 2025 06:45
@github-actions github-actions Bot added status:merged Pull request has been merged and removed status:ready-for-review Pull request is ready for review labels Sep 18, 2025

Labels

feat New feature qa:success QA workflow passed successfully status:merged Pull request has been merged
