Skip to content

Fix kubernetes exception identity#3070

Closed
Pr030304 wants to merge 2 commits into
Netflix:masterfrom
Pr030304:fix-kubernetes-exception-identity
Closed

Fix kubernetes exception identity#3070
Pr030304 wants to merge 2 commits into
Netflix:masterfrom
Pr030304:fix-kubernetes-exception-identity

Conversation

@Pr030304
Copy link
Copy Markdown

PR Type

  • Bug fix
  • New feature
  • Core Runtime change (higher bar -- see CONTRIBUTING.md)
  • Docs / tooling
  • Refactoring

Summary

Fix Kubernetes unit-test failures caused by KubernetesException being defined in two different modules. The Kubernetes plugin now uses a single canonical exception class across kubernetes.py and kube_utils.py.

Issue

Fixes #2956

Reproduction

Runtime: local

Commands to run:

python -m pytest test/unit/test_kubernetes.py -v

Where evidence shows up: test output in the local console

Before (error / log snippet)
metaflow.plugins.kubernetes.kube_utils.KubernetesException: ...

while the tests expect:

metaflow.plugins.kubernetes.kubernetes.KubernetesException

This happens because kube_utils.py and kubernetes.py define separate KubernetesException classes, so pytest.raises(KubernetesException) in test_kubernetes.py does not match exceptions raised from kube_utils.py.

After (evidence that fix works)
test/unit/test_kubernetes.py::test_kubernetes_exception_is_shared_between_modules PASSED
test/unit/test_kubernetes.py::test_kubernetes_decorator_validate_kube_labels_fail[labels0] PASSED
test/unit/test_kubernetes.py::test_kubernetes_parse_keyvalue_list[items0-True] PASSED
...
======================= 17 passed in 2.03s =======================
## Root Cause

The Kubernetes plugin defined KubernetesException in two places:

  • metaflow/plugins/kubernetes/kube_utils.py
  • metaflow/plugins/kubernetes/kubernetes.py
    Helper functions such as validate_kube_labels and parse_kube_keyvalue_list raise the class from kube_utils.py, while the tests import KubernetesException from kubernetes.py.

Even though both classes have the same name and base class, they are distinct Python types. As a result, pytest.raises(KubernetesException) does not catch exceptions raised by the helper utilities, causing the Kubernetes unit tests to fail.

Why This Fix Is Correct

The fix restores the intended invariant that the Kubernetes plugin should expose one shared KubernetesException type across its modules.

It is intentionally minimal:

  • kube_utils.py remains the home of the canonical exception class
  • kubernetes.py now imports and re-exports that same class
  • runtime behavior is unchanged apart from exception identity being consistent
    This avoids introducing a new exception module or reshaping the plugin layout for what is fundamentally a class identity bug.

Failure Modes Considered

  1. Backward compatibility for existing imports
    Code that imports KubernetesException from metaflow.plugins.kubernetes.kubernetes still works, because kubernetes.py continues to expose that name.

  2. Runtime behavior changes in the Kubernetes backend
    This change does not alter exception messages, error handling paths, or scheduling behavior. It only ensures that all Kubernetes-related code raises the same exception class.

Tests

  • Unit tests added/updated
  • Reproduction script provided (required for Core Runtime)
  • CI passes
  • If tests are impractical: explain why below and provide manual evidence above
  • [ ]
    Added a regression test in test/unit/test_kubernetes.py to verify that both modules reference the same KubernetesException class.

Locally validated the Kubernetes unit test file and confirmed all tests pass after the fix.

Non-Goals

This PR does not:

  • change Kubernetes job execution behavior
  • change validation logic in validate_kube_labels or parse_kube_keyvalue_list
  • introduce a new exception hierarchy for Kubernetes

AI Tool Usage

  • No AI tools were used in this contribution
  • AI tools were used (describe below)

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 31, 2026

Greptile Summary

This PR fixes two distinct bugs: (1) a class identity mismatch where KubernetesException was defined separately in both kubernetes.py and kube_utils.py, causing pytest.raises(KubernetesException) to miss exceptions raised from helper utilities; and (2) a StopIteration-inside-generator bug in card_server.py that would produce a RuntimeError in Python 3.7+ and also had an off-by-one, yielding max_cards - 1 cards instead of max_cards.

  • kubernetes.py: Removes the duplicate KubernetesException definition and re-exports the canonical class from kube_utils.py. All downstream importers (kubernetes_decorator.py, kubernetes_cli.py, test_kubernetes.py) continue to work unchanged because kubernetes.py still exposes the name.
  • kube_utils.py: Unchanged. The comment "avoid circular import by having the exception class contained here" confirms this was always the intended canonical home.
  • card_server.py: Reorders the curr_idx increment and max_cards guard, and replaces raise StopIteration with return. The old code raised StopIteration inside a generator which Python 3.7+ converts to RuntimeError (PEP 479), and also incremented the counter before the guard, meaning only max_cards - 1 items were ever yielded.
  • Tests: New regression test asserts KubernetesException is KubeUtilsKubernetesException; new test_card_server.py verifies the exact count and values returned at the card limit.

Confidence Score: 5/5

Safe to merge — both fixes are minimal, correct, and well-tested with no runtime-behavior regressions.

All changes are straightforward bug fixes with accompanying regression tests. The Kubernetes exception identity fix is a one-line import swap with no behavioral side-effects. The card server fix correctly addresses both the Python 3.7+ StopIteration-in-generator prohibition and the off-by-one count. No P0 or P1 issues found.

No files require special attention.

Important Files Changed

Filename Overview
metaflow/plugins/kubernetes/kubernetes.py Removes the duplicate local KubernetesException class and imports the canonical one from kube_utils.py, fixing the exception identity mismatch.
metaflow/plugins/cards/card_server.py Fixes a two-part bug: replaces raise StopIteration (breaks generators in Python 3.7+ with RuntimeError) with return, and also corrects an off-by-one that caused max_cards-1 cards to be yielded instead of max_cards.
test/unit/test_kubernetes.py Adds a regression test asserting that KubernetesException from both modules is the same object; also imports the alias for comparison.
test/unit/test_card_server.py New test file covering the cards_for_run limit behaviour: verifies exactly max_cards cards are yielded and that iteration stops cleanly when the limit is hit.

Reviews (1): Last reviewed commit: "Unify KubernetesException across kuberne..." | Re-trigger Greptile

@talsperre talsperre closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: KubernetesException defined in two modules causes unit test failures

2 participants