
Revert "ci: add runs-on/cache to skip C++ rebuild on PR CI (#469)" #481

Open
JavaPythonAIForBAT wants to merge 1 commit into sgl-project:main from JavaPythonAIForBAT:ci/revert-build-cache


@JavaPythonAIForBAT
Contributor

Summary

Revert #469 to unblock PR CI. The cache scheme it introduced has two issues that together break test-build-deepep-a2:

  1. A2 job builds the wrong wheel. push_build_cache.yml's build-cache-a2 runs a bare bash build.sh, whose defaults are BUILD_DEEPEP_OPS=ON and SOC_VERSION=Ascend910_9382 — i.e. an A3 wheel — and stores it under the A2 cache key. PR runs on A2 hardware then restore an A3 wheel and fail at test time.
  2. Cache key vs container CANN mismatch. Cache keys hard-code cann8.5.0, but pr-test-npu.yml containers were upgraded to CANN 9.0.0 in #476 ("cann9.0.0 Adapt"). Even a correctly-built wheel would be loaded inside a mismatched runtime.

This PR removes both:

  • push_build_cache.yml (deleted)
  • Get build hash / Restore build cache steps and the cache-hit branch in Prepare Deepep for the three pr-test-npu.yml jobs

Container images / other CI behavior are preserved as-is.

The cache mechanism can be reintroduced in a follow-up once the wheel-architecture and CANN-version coupling are fixed (see also #480 for the -a deepep2 and key-bump portion).
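
As a rough sketch of what that follow-up could look like (not what #469 or #480 actually do), the cache key could be derived from the same values that decide wheel compatibility, so that an A3 wheel or a CANN 8.5.0 wheel can never be stored or restored under a key meant for something else. The function name `build_cache_key`, the key layout, and the `abc123` hash below are hypothetical; only `Ascend910_9382` and the CANN versions come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: compose the cache key from the
# inputs that determine whether a wheel is usable on a given runner.
set -euo pipefail

build_cache_key() {
    local hw="$1"            # hardware label of the job, e.g. "a3"
    local soc_version="$2"   # SOC_VERSION actually passed to build.sh
    local cann_version="$3"  # CANN version of the container running the job
    local build_hash="$4"    # hash of the build inputs (assumed to exist)
    printf 'deepep-%s-%s-cann%s-%s\n' \
        "$hw" "$soc_version" "$cann_version" "$build_hash"
}

# Example: the A3 defaults named in this PR, built inside a CANN 9.0.0 container.
build_cache_key a3 Ascend910_9382 9.0.0 abc123
# prints: deepep-a3-Ascend910_9382-cann9.0.0-abc123
```

With a key built this way, bumping the container CANN version or building for a different SOC changes the key, so a stale or wrong-architecture wheel results in a cache miss and a full build instead of a test-time failure.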

Test plan

  • test-all-build, test-build-deepep-a3, test-build-deepep-a2 all run a full bash build.sh ... (no cache) and reach the test stages
  • internode / daily-build-test / build_and_release workflows untouched

🤖 Generated with Claude Code

…ct#469)"

This reverts commit f3f7d23 (sgl-project#469).

Removes the runs-on/cache-based wheel reuse from pr-test-npu.yml and
deletes push_build_cache.yml. The cache scheme has two issues that
together broke test-build-deepep-a2:

  1. push_build_cache.yml's a2 job ran a bare `bash build.sh`, whose
     defaults produce an A3 wheel (Ascend910_9382 + csrc/deepep/ops),
     and stored it under the A2 cache key — so PR runs on A2 hardware
     restored an A3 wheel.
  2. cache key strings hard-code cann8.5.0 while pr-test-npu.yml
     containers were upgraded to cann 9.0.0 in sgl-project#476, so even a
     correctly-built wheel would be loaded inside a mismatched runtime.

Reverting now to unblock PR CI; the cache mechanism can be reintroduced
once both issues are addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
