docs(designs): import flexible_rank_assignments design from Discussion #635#716
Open
DLWoodruff wants to merge 11 commits into
Open
docs(designs): import flexible_rank_assignments design from Discussion #635#716DLWoodruff wants to merge 11 commits into
DLWoodruff wants to merge 11 commits into
Conversation
…Pyomo#635 The design was incubated in GitHub Discussion Pyomo#635 over several edit rounds; it is now stable enough to live in the repo where the implementation will follow. Subsequent revisions should go through PR review against this file rather than discussion edits. Content is a verbatim copy of the discussion body as of today, with the top-of-doc heading promoted from H2 to H1 to match the other docs in this directory.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #716 +/- ##
=======================================
Coverage 71.43% 71.43%
=======================================
Files 154 154
Lines 19463 19463
=======================================
Hits 13903 13903
Misses 5560 5560 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Option A's con about per-window MPI_Win_lock granularity applies equally to Option D (same single-window-on-fullcomm topology), but was previously only listed under A. Add it to D's cons, and explain why it's not a deal-breaker: reads use LOCK_SHARED so they don't serialize against each other, writes use LOCK_EXCLUSIVE only on a rank's own slot, and splitting a hot cylinder pair into its own window (à la Option B) is an incremental escape hatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier rewording of this bullet overstated the lock-granularity concern. MPI_Win_lock has per-target-rank semantics in the spec, so cross-cylinder serialization is not the issue; the residual is implementation-dependent progress-engine contention, which is minor. Also split "bookkeeping" into the MPI-side cost (unavoidable, modest) and the mpi-sppy-side cost (avoidable, already covered by the preceding bullet about computing layouts locally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original BX-2 bullet jumped straight from "per-scenario layout bad" to the writer rule, without explaining what the current layout duplicates, why non-leaf nodes are the right unit, where the canonical vector physically lives, or how certification works under single-writer publish. Add: - A "what the current layout looks like / what BX-2 changes it to" pair of paragraphs that make the redundancy concrete. - A "why non-leaf nodes specifically" paragraph anchoring on NAC. - A two-stage size example (100 scenarios x 20 nonants -> 100x reduction at ROOT). - A "certification and publish" walkthrough showing that the elected writer's knowledge comes from the existing cylinder- internal Allreduce, not from any new out-of-band mechanism. - A "why the writer has the value to transcribe" note pointing to NAC + the writer rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small navigation aids: a spoiler note before the window topology options pointing at the eventual Option D recommendation, and a note on the BEST_XHAT/RECENT_XHATS options indicating BX-2 is the preferred direction (with the longer write-up that follows). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
By definition all_scenario_names is the global full set. The asymmetric-ranks design changes how scenarios are distributed across ranks within each cylinder but does not change which scenarios each cylinder handles, so there is no open question here. If subset-of-scenarios support ever lands, it should use a different name rather than reinterpret all_scenario_names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No spoke in the current codebase has a hard >=2 rank requirement. The only n_proc constraint in spbase.py:108 is n_proc <= num_scenarios, which permits n_proc=1. MPI collectives are well-defined on single-rank communicators (Allreduce is identity, Barrier is no-op), so cylinder_comm.Allreduce-based machinery in spokes does not break at n_proc=1. The question had no real constraint behind it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The question was self-answering: rank ratios apply to the bundled scenario count, not the original count. That is a design statement, not an open question, so it does not belong in this section. If we want the substance preserved, it should land in Scenario Distribution or User Interface as a design note instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both halves of this question are already addressed in the Option D cons writeup: the per-rank window-metadata cost is covered by the "Bookkeeping" bullet (MPI-side, modest but unavoidable), and the "limit which ranks participate" sub-question is covered by the incremental escape hatch of splitting hot cylinder pairs into their own window a la Option B. Nothing left to ask here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The word "slab" slipped in from conversation as shorthand for a rank's local window memory. The rest of the doc uses "buffer" for the same concept (25 occurrences), so unify on that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These two terms are easy to conflate (window = addressing scope, not a contiguous block of memory; buffer = the per-rank local region exposed through the window). Define them up front so readers have the right mental model before hitting the dense architectural sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
doc/designs/.BEST_XHAT/RECENT_XHATS, bound-value-field classification, multi-rank overlap maps) that further revisions should go through PR review against this file rather than discussion-body edits.doc/designs/(e.g.jensens_bound_design.md,solver_options_redesign.md).What this PR is and isn't
Follow-up after merge
Test plan
🤖 Generated with Claude Code