test: Resolve L0_backend_python test flakiness issues #8830

Merged
pskiran1 merged 19 commits into main from spolisetty/tri-1315-fix-ci-test-l0_backend_python-base on Jun 16, 2026

Conversation

@pskiran1 pskiran1 commented Jun 11, 2026

Member

What does the PR do?

| File | Changes |
| --- | --- |
| qa/python_models/dlpack_square/model.py | Added an is_cpu() check to use DLPack for GPU. Fixed the output tensor name from OUTPUT0 to OUT to match the model config. |
| qa/L0_backend_python/async_execute/concurrency_test.py | Added tearDown() to always close the gRPC stream and client, preventing indefinite hangs when a test assertion fails before stop_stream(). |
| qa/L0_backend_python/decoupled/decoupled_test.py | Increased execute_delay from 4s to 10s in test_decoupled_execute_cancel. The prior 1s margin between the model's first loop iteration and client cancellation was insufficient on loaded CI machines. |
| qa/L0_backend_python/test.sh | (1) Clean up orphaned /dev/shm/triton_python_backend_shm_region_* files before the test starts. (2) Filter out stale regions in the 4MB size-validation loop so that only regions from the current server are checked. |
| qa/L0_backend_python/examples/test.sh | Added retry logic (3 attempts) for the git clone of python_backend and for the instance_kind example, which downloads from GitHub at runtime and is prone to transient 504/timeout errors. |
| qa/L0_backend_python/model_control/model_control_test.py | Increased the curl subprocess timeout from 10s to 30s. The server serializes model load requests, and later models in the invalid-name validation sequence were timing out on loaded CI. |
| qa/python_models/bls_model_loading/model.py | Modified test_load_with_file_override to accept UnicodeDecodeError alongside TritonModelException on the expected-failure path, since binary ONNX data occasionally triggers a pybind11 decode error at the bytes→string boundary. Additionally updated the is-model-ready validation. |

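The retry logic described for examples/test.sh lives in a shell script; the sketch below restates the same idea in Python purely as an illustration. The helper names `retry` and `make_flaky` and the `delay_s` parameter are assumptions made for this example, and only the attempt count of 3 comes from the PR description.

```python
import time


def retry(action, attempts=3, delay_s=0.0, sleep=time.sleep):
    """Call action() up to `attempts` times and return its first success.

    If every attempt raises, the last exception is re-raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # pause before the next attempt
    raise last_error


def make_flaky(failures_before_success):
    """Build a hypothetical action that fails a fixed number of times first."""
    state = {"calls": 0}

    def action():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TimeoutError("simulated transient failure")
        return "ok"

    return action, state
```

In a real script the `action` would wrap the network step (for example a clone or download run through `subprocess.run(..., check=True)`), so that a transient failure costs one extra attempt rather than failing the whole test run.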
Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs: triton-inference-server/python_backend#439

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 54736980

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@pskiran1 pskiran1 added the "PR: test" label (Adding missing tests or correcting existing test) Jun 11, 2026
Comment thread qa/L0_backend_python/async_execute/concurrency_test.py Fixed
@pskiran1 pskiran1 merged commit 4ecff61 into main Jun 16, 2026
3 checks passed
@pskiran1 pskiran1 deleted the spolisetty/tri-1315-fix-ci-test-l0_backend_python-base branch June 16, 2026 08:45
4 participants