docs(designs): import flexible_rank_assignments design from Discussion #635 by DLWoodruff · Pull Request #716 · Pyomo/mpi-sppy

DLWoodruff · 2026-05-20T20:55:31Z

Summary

Promotes the design that was incubating in Discussion #635 into a tracked file under doc/designs/.
The discussion is stable enough now (Options A–D for window topology, Options BX-1 / BX-2 for BEST_XHAT / RECENT_XHATS, bound-value-field classification, multi-rank overlap maps) that further revisions should go through PR review against this file rather than discussion-body edits.
Content is a verbatim copy of the discussion body as of today, with the top-of-doc heading promoted from H2 to H1 to match the style of the other docs in doc/designs/ (e.g. jensens_bound_design.md, solver_options_redesign.md).

What this PR is and isn't

Is: a docs-only import. No code changes, no behavior changes.
Isn't: a commitment to any particular Option (D, BX-1, BX-2, etc.); those remain open in the design.

Follow-up after merge

Replace the body of Discussion Design: Flexible Rank Assignments for Cylinders #635 with a pointer to this file so the canonical source is unambiguous.
The discussion's comment thread stays on GitHub (no migration needed there).

Test plan

No code changed; nothing to run.
Markdown renders on the GitHub PR preview.

🤖 Generated with Claude Code

…Pyomo#635 The design was incubated in GitHub Discussion Pyomo#635 over several edit rounds; it is now stable enough to live in the repo where the implementation will follow. Subsequent revisions should go through PR review against this file rather than discussion edits. Content is a verbatim copy of the discussion body as of today, with the top-of-doc heading promoted from H2 to H1 to match the other docs in this directory.

codecov · 2026-05-20T21:05:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.43%. Comparing base (265345b) to head (7dc9c96).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #716   +/-   ##
=======================================
  Coverage   71.43%   71.43%           
=======================================
  Files         154      154           
  Lines       19463    19463           
=======================================
  Hits        13903    13903           
  Misses       5560     5560

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Option A's con about per-window MPI_Win_lock granularity applies equally to Option D (same single-window-on-fullcomm topology), but was previously only listed under A. Add it to D's cons, and explain why it's not a deal-breaker: reads use LOCK_SHARED so they don't serialize against each other, writes use LOCK_EXCLUSIVE only on a rank's own slot, and splitting a hot cylinder pair into its own window (à la Option B) is an incremental escape hatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The earlier rewording of this bullet overstated the lock-granularity concern. MPI_Win_lock has per-target-rank semantics in the spec, so cross-cylinder serialization is not the issue; the residual is implementation-dependent progress-engine contention, which is minor. Also split "bookkeeping" into the MPI-side cost (unavoidable, modest) and the mpi-sppy-side cost (avoidable, already covered by the preceding bullet about computing layouts locally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The original BX-2 bullet jumped straight from "per-scenario layout bad" to the writer rule, without explaining what the current layout duplicates, why non-leaf nodes are the right unit, where the canonical vector physically lives, or how certification works under single-writer publish. Add: - A "what the current layout looks like / what BX-2 changes it to" pair of paragraphs that make the redundancy concrete. - A "why non-leaf nodes specifically" paragraph anchoring on NAC. - A two-stage size example (100 scenarios x 20 nonants -> 100x reduction at ROOT). - A "certification and publish" walkthrough showing that the elected writer's knowledge comes from the existing cylinder- internal Allreduce, not from any new out-of-band mechanism. - A "why the writer has the value to transcribe" note pointing to NAC + the writer rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two small navigation aids: a spoiler note before the window topology options pointing at the eventual Option D recommendation, and a note on the BEST_XHAT/RECENT_XHATS options indicating BX-2 is the preferred direction (with the longer write-up that follows). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

By definition all_scenario_names is the global full set. The asymmetric-ranks design changes how scenarios are distributed across ranks within each cylinder but does not change which scenarios each cylinder handles, so there is no open question here. If subset-of-scenarios support ever lands, it should use a different name rather than reinterpret all_scenario_names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

No spoke in the current codebase has a hard >=2 rank requirement. The only n_proc constraint in spbase.py:108 is n_proc <= num_scenarios, which permits n_proc=1. MPI collectives are well-defined on single-rank communicators (Allreduce is identity, Barrier is no-op), so cylinder_comm.Allreduce-based machinery in spokes does not break at n_proc=1. The question had no real constraint behind it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The question was self-answering: rank ratios apply to the bundled scenario count, not the original count. That is a design statement, not an open question, so it does not belong in this section. If we want the substance preserved, it should land in Scenario Distribution or User Interface as a design note instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Both halves of this question are already addressed in the Option D cons writeup: the per-rank window-metadata cost is covered by the "Bookkeeping" bullet (MPI-side, modest but unavoidable), and the "limit which ranks participate" sub-question is covered by the incremental escape hatch of splitting hot cylinder pairs into their own window a la Option B. Nothing left to ask here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The word "slab" slipped in from conversation as shorthand for a rank's local window memory. The rest of the doc uses "buffer" for the same concept (25 occurrences), so unify on that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

These two terms are easy to conflate (window = addressing scope, not a contiguous block of memory; buffer = the per-rank local region exposed through the window). Define them up front so readers have the right mental model before hitting the dense architectural sections. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DLWoodruff and others added 10 commits May 20, 2026 18:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(designs): import flexible_rank_assignments design from Discussion #635#716

docs(designs): import flexible_rank_assignments design from Discussion #635#716
DLWoodruff wants to merge 11 commits into
Pyomo:mainfrom
DLWoodruff:flexible_rank_assignments

DLWoodruff commented May 20, 2026

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DLWoodruff commented May 20, 2026

Summary

What this PR is and isn't

Follow-up after merge

Test plan

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented May 20, 2026 •

edited

Loading