[MIOpen] Dapper validation phase#8879
Closed
randyspauldingamd wants to merge 63 commits into
CpuActivationPackedMultiThread sized per-thread work with two stacked ceilings (ceil(num_items / 16M), then ceil(num_jobs / num_threads) * 16M), so chunk_size * num_threads overshot num_items by whole chunks. Trailing threads received offsets and ends past the buffer -> heap OOB read -> SIGSEGV. The overshoot only occurs when the thread count does not divide the work evenly, so it was host core-count dependent and reproduced on some machines but not others.

Recompute the per-thread item counts with a proportional split: thread t processes [num_items * t / num_threads, num_items * (t + 1) / num_threads). Consecutive threads reuse the same boundary expression, so the ranges are contiguous, non-overlapping, and the final thread ends exactly at num_items. This fixes the item-count calculation directly and removes the separate remainder branch and every clamp -- there is no std::min and no special last-thread case. num_threads <= num_jobs <= ceil(num_items / 16M) guarantees num_items >= num_threads, so every launched thread has a non-empty range.

Return std::size_t from CpuActivationGetNumThreads as well: std::min already yields std::size_t, so the previous unsigned return type silently narrowed the value. The single caller deduces the type with auto and uses it only in std::size_t arithmetic, so widening the return type removes the narrowing without any other change.

Test-only; library and kernel code are untouched; no performance impact.
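The arithmetic above can be sketched as follows. This is an illustration only, not the code from the PR: the helper names ThreadRange and OldChunkSize are hypothetical, job_size stands in for the "16M" granule (its exact value is not given here), and the old sizing is reconstructed purely from the description of the two stacked ceilings.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical reconstruction of the old sizing as described: two stacked
// ceilings. job_size stands in for the "16M" granule.
inline std::size_t OldChunkSize(std::size_t num_items,
                                std::size_t num_threads,
                                std::size_t job_size)
{
    assert(num_threads > 0 && job_size > 0);
    const std::size_t num_jobs = (num_items + job_size - 1) / job_size;  // first ceiling
    return (num_jobs + num_threads - 1) / num_threads * job_size;       // second ceiling
}

// Hypothetical helper for the proportional split: returns the half-open
// range [begin, end) of items assigned to thread t.
inline std::pair<std::size_t, std::size_t>
ThreadRange(std::size_t num_items, std::size_t num_threads, std::size_t t)
{
    assert(num_threads > 0 && t < num_threads);
    // Both boundaries use the same expression, so the end of thread t is the
    // begin of thread t + 1, and the last thread ends exactly at num_items.
    // Assumes num_items * num_threads does not overflow std::size_t.
    const std::size_t begin = num_items * t / num_threads;
    const std::size_t end   = num_items * (t + 1) / num_threads;
    return {begin, end};
}
```

For example, with 5 jobs' worth of items and 4 threads, the old sizing gives each thread 2 jobs, so the last thread would start at job 6 of 5. With the proportional split, 10 items over 3 threads yields [0, 3), [3, 6), [6, 10).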
❌ PR Check — Action Required
📖 Need help? See the Policy FAQ for details on every check and how to fix failures.
🚫 Please fix the failed policies before requesting reviews. The following policy checks failed:
The |
Motivation
MIOpen has adapted Dependency Parser to run in CI. This will reduce CI test time dramatically for small PRs. It uses a naive mapping from changed files to test executables, so the time reduction depends on how many test files consume the modified files. Larger PRs and PRs that touch core files will see a smaller reduction.
Technical Details
This PR enables the Dapper selection and filter generation in MIOpen CI, but the filter is not used. Instead, it adds a ctest miopen_gtest_sharded_dapper that interrogates the test results from all shards to determine Dapper's efficacy. It also includes a validation stage which ensures that Dapper would have caught any test failures caused by the changes. If any expected tests did not run, miopen_gtest_sharded_dapper fails. Note that Dapper operates in a subtractive-only manner, meaning it ignores tests that either were not enabled by the user or were disabled via the base gtest_filter.

While this PR contains some groundwork for TheRock CI, the intent is to have no effect on TheRock at this time.
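The checks described above amount to set comparisons over test names. The sketch below is not the actual miopen_gtest_sharded_dapper implementation: the function names, the use of std::set, and the idea of passing test names as strings are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using TestSet = std::set<std::string>;

// Hypothetical helper: elements of a that are not in b.
inline std::vector<std::string> Difference(const TestSet& a, const TestSet& b)
{
    std::vector<std::string> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
    return out;
}

// Subtractive-only selection (assumed form): Dapper can only narrow the set
// of tests that were already enabled; it never adds a test back.
inline TestSet ApplySubtractive(const TestSet& enabled, const TestSet& selected)
{
    TestSet out;
    std::set_intersection(enabled.begin(), enabled.end(),
                          selected.begin(), selected.end(),
                          std::inserter(out, out.end()));
    return out;
}

// Validation (assumed form): every failure seen in the full run should be
// in the set Dapper would have selected. A non-empty result means Dapper
// would have missed a failure.
inline std::vector<std::string> MissedFailures(const TestSet& failed,
                                               const TestSet& selected)
{
    return Difference(failed, selected);
}
```

Under these assumptions, the "expected tests did not run" condition would be Difference(expected, ran) being non-empty.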
Test Plan
Ran a full MIOpen-CI run on gfx90A and gfx950.
Test Result
PASS on gfx90A: this run had 3 changes:
Overall Test Result and Dapper Test Result, denied Minimal Compliance and caused Dapper Compliance to FAIL.
Minimal Compliance and caused Dapper Compliance to FAIL.
The test itself PASSED since all failures that were tested were caught.
gfx950: TBD
Submission Checklist