Skip to content

Implement CMake build system for turbo-stack#233

Draft
hctorres wants to merge 64 commits into
mainfrom
192-task-implement-improved-build-system-within-turbo-stack
Draft

Implement CMake build system for turbo-stack#233
hctorres wants to merge 64 commits into
mainfrom
192-task-implement-improved-build-system-within-turbo-stack

Conversation

@hctorres
Copy link
Copy Markdown
Collaborator

Summary

  • Replaces the legacy mkmf-based build with a full CMake build system orchestrated by scripts/build.sh -> scripts/build_turbo_stack.sh
  • Infrastructure backend selected via --infra FMS2 (default, Spack-installed) or --infra TIM (pre-built from TIM_ROOT); both expose TURBO::infra_r8 so downstream targets are backend-agnostic
  • cmake/TurboCompilerFlags.cmake sets Fortran precision and optimization flags for GNU, Intel/IntelLLVM, NVHPC, and Flang compilers
  • cmake/add_mom_test.cmake provides add_mom_test() / add_mom_tests() wrappers for pFUnit CTest integration (40 MPI-aware unit tests)
  • marbl_build/CMakeLists.txt wires MARBL into the build via the MARBL::marbl target defined in MOM6's MARBLTargets.cmake
  • spack/create_spack_environment.sh automates Spack environment creation
  • Build scripts support --debug (clean rebuild), --ninja (opt-in; default is Unix Makefiles), --infra, and --build_dir

Environment

For now you will need to set up your own Spack environment using spack/spack.yaml (or one of the compiler-specific variants in spack/). A Derecho-specific environment configuration script is planned and will be added soon.

Test plan

  • scripts/build.sh — FMS2 Release, Unix Makefiles, 40/40 tests pass
  • scripts/build.sh --ninja — FMS2 Release, Ninja, 40/40 tests pass
  • scripts/build.sh --debug — FMS2 Debug clean rebuild, 40/40 tests pass
  • scripts/build.sh --infra TIM — TIM Release, Unix Makefiles, 40/40 tests pass
  • scripts/build.sh --infra TIM --ninja — TIM Release, Ninja, 40/40 tests pass
  • scripts/build.sh --infra TIM --debug — TIM Debug clean rebuild, 40/40 tests pass

hctorres and others added 28 commits April 13, 2026 17:36
CMake:
- TurboOptions: remove TURBO_INFRA/TURBO_MEMORY_MODE; now sourced from
  MOM6Options.cmake as the canonical definition for both repos.
- CMakeLists.txt: resolve MOM6_SOURCE_DIR early and include MOM6Options
  before TurboCompilerFlags so option declarations are never duplicated.
- TurboCompilerFlags: fix dead Clang branch (CMake never assigns "Clang"
  as Fortran compiler ID) to Flang|LLVMFlang; remove unreachable set()
  after FATAL_ERROR.
- add_mom_test: guard empty SUITE_LINK_LIBRARIES to avoid passing bare
  LINK_LIBRARIES keyword; guard BASE_MOM_PFUNIT_INFRA so an empty
  OTHER_SOURCES is never passed to add_pfunit_ctest.
- marbl_build: use CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR
  so the path stays valid if the directory is used outside this project.
- find_package(TIM REQUIRED COMPONENTS R8) to match updated TIMConfig.

Build scripts:
- set -eo pipefail in build.sh, build_turbo_stack.sh,
  create_spack_environment.sh.
- build_turbo_stack.sh: add TURBO_STACK_ROOT guard; set CMAKE_BUILD_TYPE
  to Release by default (Debug only when --debug); propagate build type
  to TIM pre-build; use rm -rf on TIM build dir for clean/stale-cache
  cases instead of --fresh (which leaves generated .cmake files behind);
  pass TIM_DIR to the config-file directory, not the build root; fix
  comment typo and trailing whitespace.
- create_spack_environment.sh: remove stale #--fresh --force comment.
- Trailing newlines added to build.sh and build_turbo_stack.sh.

CI:
- build-tests.yaml: note that ./build.sh is the legacy mkmf build, not
  scripts/build.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fault generator, stale comment updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 12, 2026 07:19
@hctorres hctorres linked an issue May 12, 2026 that may be closed by this pull request
1 task
hctorres and others added 30 commits May 18, 2026 16:44
Attach NetCDF::NetCDF_C to TURBO::infra_r8 so the absolute libnetcdf.so
path lands on every link line. FMS::fms_r8 only carries
NetCDF::NetCDF_Fortran in its INTERFACE_LINK_LIBRARIES plus bare -lnetcdf
flags from nf-config --flibs, which fail to resolve on NetCDF installs
with split lib/ vs lib64/ directories (e.g. NCAR Derecho module).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When TURBO_INFRA=TIM, the tests pull AMReX in transitively via
TURBO::infra_r8 and need the full C++ runtime at link time
(libstdc++, libgcc_s, etc -- libgcc_s carries the unwind support
that __cxa_call_terminate depends on for static-archive C++ exception
handling). gfortran as the link driver auto-adds -lgfortran but
nothing from the C++ side; g++ as the driver auto-adds the full C++
runtime and CMake still appends -lgfortran because Fortran objects
contribute to the target. Works for both FMS2 and TIM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add --parallel N / -j N to scripts/build_turbo_stack.sh and rename
scripts/build_dependencies_from_source.sh's existing --jobs N to the same
--parallel|-j spelling.  Plumb the flag through both orchestrators
(build_on_derecho.sh, build_with_spack.sh) so a single invocation
propagates the same parallelism to the deps step and the turbo-stack
build step.

Default in both child scripts is now 1 (serial), not nproc.  nproc
over-reports on shared login nodes and on PBS/SLURM allocations that
don't pin cpusets -- using it as a default produced silent
oversubscription on shared hosts.  Make the caller pass the value they
actually want.

Verified locally: build_with_spack.sh succeeds for --infra FMS2 and
--infra TIM, both at the new serial default and with --parallel 32;
40/40 ctest passes in both modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The parser was using `*) break ;;` for unrecognized args, which silently
exited the parse loop and left the unknown args in $@.  Nothing
downstream read $@, so anything unrecognized was dropped without a
warning -- a typo like `--paralel 32` would happily configure FMS2 with
serial build and never complain.

Change `*)` to print an error and exit 1.  Add `--) shift; break ;;` so
the explicit end-of-args sentinel is honored, and splice the remaining
$@ into `cmake --build` so callers can forward verbatim flags:

  build_turbo_stack.sh -- -v             # cmake's own --verbose
  build_turbo_stack.sh -- --target MOM6  # build a single target
  build_turbo_stack.sh -- -- -j 16       # second `--` goes to the generator

Happy path is unchanged: with no `--` on the command line, $@ is empty
and the spliced call is identical to before.  Verified locally:
build_with_spack.sh --parallel 32 still produces 40/40 ctest passes;
`--paralel 32` now exits 1 with a clear error pointing at `--`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three renames in scripts/setup_environment/:

  derecho.sh                              -> derecho_cpu_gcc_openmpi.sh
  with_spack.sh                           -> spack_local_environment.sh
  derecho_modules_emulation_with_spack.sh -> emulate_derecho_modules_locally_with_spack.sh

The first establishes a <machine>_<arch>_<compiler>_<mpi> convention
that scales as we add Derecho GPU and other variants.  The second is
more self-describing than the prior bare `with_spack.sh`.  The third
moves the verb to the front so the dev-harness nature is obvious -- it
no longer reads as just-another-machine sitting alongside
`derecho_cpu_gcc_openmpi.sh`.

Update every reference to the old names across the repo:

  - scripts/build_on_derecho.sh, scripts/build_with_spack.sh
    (the source paths that would have broken sourcing)
  - in-script doc comments and self-references in
    spack_local_environment.sh,
    emulate_derecho_modules_locally_with_spack.sh,
    build_dependencies_from_source.sh
  - scripts/README.md layout block + workflow examples
  - CLAUDE.md script-structure + architecture diagrams
  - spack/derecho_modules_emulation_with_spack.yaml header comment

The spack env name (`derecho_modules_emulation_with_spack`) and the
matching yaml filename were intentionally left as-is -- only the shell
scripts were renamed.

Fixed one unrelated stale flag reference noticed while sweeping
scripts/README.md: build_dependencies_from_source.sh now exposes
`--parallel|-j` rather than `--jobs`.

Verified locally: build_with_spack.sh --infra TIM --parallel 32 still
passes 40/40 ctest with all the new source paths in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the script exported TURBO_STACK_ROOT, MOM6_ROOT, TIM_ROOT,
and FMS_ROOT pinned to /glade/u/home/htorres/turbo_build_pr_tester/... .
That made the file unusable for anyone but the original author and
broke the symmetry with build_with_spack.sh, which requires
TURBO_STACK_ROOT (and SPACK_ROOT) from the shell profile.

Remove the hardcoded exports.  TURBO_STACK_ROOT is still validated at
the top -- with no fallback -- so misconfiguration fails fast with a
clear error.  The MOM6 / FMS / TIM / pFUnit / AMReX _ROOT vars stay
optional (build_dependencies_from_source.sh falls back to the
corresponding submodule when they are unset) and are now documented in
the script's usage block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`build_args+=(--ninja)` was guarded by $ninja, but $ninja was never
declared or parsed in this script -- the line could never fire.  It
gave the misleading impression that --ninja was an accepted flag on
build_on_derecho.sh when in fact it was silently dropped by the
parser's `*) break ;;` (now an explicit error after f43457f).

Just delete the dead line.  If a future Derecho user wants Ninja
builds, add --ninja to the parser at that point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Whitespace only.  Bodies of --no-fms / --no-pfunit / --no-amrex /
--no-tim now land at the same column 24 as --tag / --prefix /
--parallel|-j / --rebuild above them.  The asymmetry crept in when
--parallel|-j (longer than the previous --jobs) bumped the alignment
for the first four cases but the bottom four kept their original
indent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two flag-surface changes to scripts/build_dependencies_from_source.sh:

1. Drop the --no-fms / --no-pfunit / --no-amrex / --no-tim quartet.
   They duplicate --only LIST, which strictly subsumes them.  One way
   to select what to build (--only) is enough.

2. Generalize --rebuild from a global boolean to --rebuild [LIST].
   Bare --rebuild keeps the previous "rebuild every selected dep"
   semantics.  --rebuild amrex rebuilds only AMReX -- useful when
   iterating on a single dep on slow hosts like Derecho, where the
   prior all-or-nothing forced you to either wait through 10+ minutes
   of FMS/pFUnit/TIM you didn't need or hand-delete sentinels.  Bash
   "optional argument" handled by consuming $2 only when it looks like
   a value (not another --flag, not empty).

Validation for --rebuild LIST mirrors the existing --only validation
(unknown dep names error out at parse time).  New helper
_should_rebuild encapsulates the per-dep decision; the old "if
$_rebuild != true then SHA-check else rebuild" branch in
_build_and_install_dep collapses to a clearer if/elif.

Also expand the --tag help text to spell out that the slot can be
keyed on toolchain / GPU backend / build type -- the prior one-liner
called it "per-toolchain prefix tag" which is too narrow.

Verified locally:
  - `--no-fms` rejected as unknown option
  - `--rebuild bogus` rejected with valid-names hint
  - `--tag testname --only tim --parallel 32` installs to
    deps/testname/install/
  - `--rebuild tim` triggers the "force-rebuild requested" path
  - `build_with_spack.sh --infra TIM --parallel 32` still passes 40/40

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test driver for the cmake-build-system PRs.  Drives both the
FMS2 and TIM build flavors on Derecho via build_on_derecho.sh against
the matching feature branches in MOM6 / TIM / FMS.

Caller responsibilities:
  - Have turbo-stack already checked out at the desired branch and
    state.  Script does NOT pull, switch branches, or update submodules
    under TURBO_STACK_ROOT -- it trusts the caller's working tree.
  - Export TURBO_STACK_ROOT (required, no default).

Driver responsibilities:
  - Override clones for MOM6 / TIM / FMS into TURBO_BUILD_TEST_DIR
    (default $HOME/turbo_build_pr_tester).  Idempotent fetch + hard
    reset to origin/<branch> on subsequent runs.
  - --clean flag for a fresh-slate run: rm -rf the override clones,
    turbo-stack's build/ and deps/, and the per-flavor logs.  Guarded
    against TURBO_BUILD_TEST_DIR being "", /, or $HOME.
  - --only FMS2|TIM to run a single flavor while iterating.
  - --parallel N (-j N) forwarded to build_on_derecho.sh; default 128
    = one full Derecho compute node.
  - tee each build to TURBO_BUILD_TEST_DIR/logs/{fms2,tim}.log;
    exit code captured via ${PIPESTATUS[0]} so tee doesn't mask the
    real status.
  - Print the SHA + branch of every checkout up front for
    reproducibility, and a PASS/FAIL/SKIPPED summary at the end.  Both
    flavors run even when one fails (failure propagates only at the
    final exit).
  - Branch names centralized in a config block at the top so
    retargeting a different PR set is a one-block edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five fixes to test_new_build_system_on_derecho.sh -- two correctness
bugs, one fragility, two stale comments / drift:

1. --clean was silently broken.  The inline comment "# where deps build
   from submodules go" sat after a `\` line-continuation, which is NOT
   valid bash: `\<space>` escapes the space (literal arg), the `#`
   starts a comment terminating the line, and the next line's
   "$log_dir" becomes an orphan statement that bash tries to execute.
   Effects: $log_dir was never removed by --clean, and the script
   aborted on the orphan line before continuing.  Hoist the comment
   above the rm.

2. ${TURBO_BUILD_SYSTEM_TEST_DIR:=$TMPDIR/...} fails under `set -u`
   when $TMPDIR is unset (common on bare ssh shells, containers, some
   non-PBS Linux setups).  Derecho's PBS sets it so the script "worked
   on Derecho"; elsewhere it died at line 79 before doing anything.
   Use ${TMPDIR:-/tmp} as a fallback.

3. Renamed the default path component to turbo_build_system_test so it
   matches the variable name (was turbo_build_pr_tester from before
   the rename to TURBO_BUILD_SYSTEM_TEST_DIR).

4. Updated the --clean safety-guard comment that still referenced
   "$HOME defaults" -- default is under $TMPDIR now.

5. The TIM build block reassigned the global $build_dir while the
   FMS2 block correctly used a local-style $fms_build_dir.  Mirror the
   FMS2 pattern with $tim_build_dir.  Not a current correctness issue
   (nothing downstream reads $build_dir), but a footgun if anyone adds
   another build later.

Verified locally:
  - bash -n: syntax clean
  - $TMPDIR unset: script now reaches TURBO_STACK_ROOT validation
    instead of dying at the := expansion
  - --clean against a sandbox: both $log_dir and $TURBO_STACK_ROOT/deps
    now get cleaned in one rm -rf, and the script proceeds past
    --clean to the clone step (previously aborted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First of three commits implementing the deps-script refactor (plan in
/home/lalo/.claude/plans/toasty-dancing-teacup.md).

scripts/build_dep.sh defines a `build_dep` function -- one call per dep,
with three source-discovery modes:
  1. --source PATH                       explicit override
  2. --clone --url U --ref R             clone into --clone-dest
     --clone-dest DIR                    (also exports <NAME>_ROOT)
  3. $<NAME>_ROOT env var                set externally (e.g. by
                                         fetch_source.sh)
  4. Submodule fallback                  per-name table inside script
Caller passes per-call cmake args after a `--` separator (matches the
pattern build_turbo_stack.sh already uses).

Sentinel expanded beyond the old SHA-only format -- now a KV file with
sha + source path + install prefix + sha256 of sorted cmake args.
Skip-on-rerun fires only when all four match.  Catches the case where
an AMReX rebuild flips -DAMReX_GPU_BACKEND=CUDA on or off, which the
old sha-only sentinel would have silently ignored.  Atomic write via
temp file + mv so a SIGINT mid-write can't leave a half-baked sentinel.

Side effects preserved from the old script: appends $install_prefix to
CMAKE_PREFIX_PATH (with dedup guard); for name=pfunit also exports
PFUNIT_DIR via the versioned cmake-dir glob.  Log phrasing preserved
("[build_dep] $name @ ${sha:0:8} already installed -- skipping" etc.)
for CI-log grep continuity.

scripts/fetch_source.sh defines a `fetch_source` function -- narrowly
scoped to source-only-consumed deps (MOM6, MARBL) that turbo-stack's
own CMakeLists.txt reads via <NAME>_ROOT but does not build.  Same
clone-or-update logic as test_new_build_system_on_derecho.sh's inline
_clone_or_update helper, lifted into a reusable library.  --url is
required (no default): hardcoded name-to-URL map would duplicate
.gitmodules and silently rot when forks move.

No callers touched in this commit.  build_dependencies_from_source.sh
still in use; bash -n green on both new files; build_with_spack.sh
--parallel 32 still passes 40/40 ctest.  Trivially reversible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 2 of 3 in the deps-script refactor (plan in
/home/lalo/.claude/plans/toasty-dancing-teacup.md).  build_dep.sh and
fetch_source.sh were added in e08c29c with no callers touched.  This
commit moves every caller off the old monolithic deps script and onto
the new per-dep model.

Files migrated:
  - scripts/setup_environment/derecho_cpu_gcc_openmpi.sh: grew from
    7 lines (just module loads) to ~50 lines.  After module loads, sources
    build_dep.sh and calls build_dep four times -- one for FMS, pFUnit,
    AMReX, TIM -- with the cmake flags previously hardcoded inside
    build_dependencies_from_source.sh.  Forwards --parallel via the
    TURBO_DEP_PARALLEL env var.  Future per-machine env scripts (e.g.
    derecho_gpu_nvhpc_openmpi.sh) will differ from this only by adding
    -DAMReX_GPU_BACKEND=CUDA after `--` in the AMReX call.

  - scripts/setup_environment/emulate_derecho_modules_locally_with_spack.sh:
    same shape after the spack-env activation -- four build_dep calls.

  - scripts/build_on_derecho.sh: dropped the source-build_dependencies_from_source.sh
    block.  Sets TURBO_DEP_PARALLEL from --parallel, then sources the env
    script which now handles deps.

  - scripts/build_with_spack.sh: replaced the source-build_dependencies_from_source.sh
    --only tim block with a single inline build_dep tim call (still gated on
    --infra TIM, since that conditional is orchestrator-level).

  - test_new_build_system_on_derecho.sh: replaced the inline _clone_or_update
    helper (12 lines) with `source fetch_source.sh` and three fetch_source
    calls for MOM6/TIM/FMS overrides.  Identical behavior, less duplication.

build_dependencies_from_source.sh is now orphaned but still present;
commit 3 will delete it.

Verified locally:
  - bash -n green on all 7 touched scripts.
  - build_with_spack.sh --parallel 32: 40/40 ctest.
  - build_with_spack.sh --infra TIM --parallel 32: 40/40 ctest.
  - Sentinel-skip rerun: emits
    `[build_dep] tim @ ae308ea8 already installed -- skipping`
    (preserved phrasing from the old `[build_dependencies_from_source]`
    log lines, just renamed bracket tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 3 of 3 in the deps-script refactor (plan in
/home/lalo/.claude/plans/toasty-dancing-teacup.md).  build_dep.sh +
fetch_source.sh added in e08c29c; callers migrated in 6e5f2cd.  Now
delete the orphaned old script and reconcile docs/comments.

Deleted:
  - scripts/build_dependencies_from_source.sh (304 lines, fully
    subsumed by scripts/build_dep.sh).

Doc + comment updates:
  - scripts/README.md: full rewrite of the layout, workflow, and
    dependency-resolution sections.  Pipeline narrowed from 3 steps to 2
    (env scripts now own the build_dep call list).  New sections
    documenting the build_dep + fetch_source function signatures and
    the four-mode source resolution.
  - CLAUDE.md: script-structure table updated to list build_dep.sh and
    fetch_source.sh; component-relationships block expanded with the
    Derecho orchestrator path; source-override example updated to show
    multi-dep overrides via env vars.
  - scripts/build_turbo_stack.sh: header comment pointer updated from
    build_dependencies_from_source.sh to build_dep.sh.
  - CMakeLists.txt: two TURBO_INFRA comments updated.
  - spack/derecho_modules_emulation_with_spack.yaml: header comment
    updated.
  - test_new_build_system_on_derecho.sh: in-script comments referencing
    the old deps script + _clone_or_update helper updated.

Verified locally:
  - `git ls-files | xargs grep` finds no remaining references to
    `build_dependencies_from_source` or `_clone_or_update` in tracked
    files.
  - build_with_spack.sh --infra TIM --parallel 32: 40/40 ctest.
  - build_with_spack.sh --parallel 32 (separately, FMS2 path): also
    40/40 in earlier verification.

Note: setup_env_separation_plan.md is untracked and still contains the
old script references in its historical narrative.  Out of scope for
this commit since it's not in the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The boolean flags in test_new_build_system_on_derecho.sh were named
override_MOM6_submodule, override_TIM_submodule, override_FMS_submodule.
The naming was misleading: it implied "this flag controls whether we
override the submodule", but after the build_dep refactor that's no
longer true.  $<NAME>_ROOT exported in the calling environment also
overrides the submodule (via build_dep's source-resolution precedence
layer), and the script does nothing to enable or disable that path.

What the flags actually control is whether THIS SCRIPT does a fresh
clone via fetch_source.  Rename to fetch_MOM6 / fetch_TIM / fetch_FMS
to match the fetch_source function being called, and expand the
surrounding comment to spell out the two ways a caller can override a
submodule:

  1. Leave fetch_<NAME>=true (default) -- the script clones the
     PR branch from the configured URL and exports <NAME>_ROOT.
  2. Set fetch_<NAME>=false AND export <NAME>_ROOT before launching --
     the script doesn't touch the clone dir; downstream build_dep
     picks up the caller's <NAME>_ROOT.

The deps_that_override_submodules_dir directory name is unchanged --
it still genuinely holds submodule overrides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit renamed override_*_submodule -> fetch_* to better
describe what the flags actually do.  But the flags were direct
assignments (`fetch_MOM6=true`), which meant flipping one off required
editing the script -- the previous commit's documentation acknowledged
this with an awkward "(after editing)" caveat.

Promote each flag to `: "${fetch_<NAME>:=true}"` so callers can flip
them off via the environment without touching the file:

  fetch_MOM6=false MOM6_ROOT=$HOME/dev/MOM6 \
      ./test_new_build_system_on_derecho.sh

Pair an exported <NAME>_ROOT with fetch_<NAME>=false to test against a
local dev tree: the script stays out of the override dir, build_dep
(called from the env script) picks up the env-var path via its own
precedence layer.

Expand the header Configuration: block to list MOM6_ROOT / TIM_ROOT /
FMS_ROOT and the fetch_<NAME> flags with concrete examples.  Trim the
now-redundant inline comment block above the flag definitions; the
workflow lives in the header.

Verified: `fetch_MOM6=false bash -c '...'` correctly leaves the flag
at "false" while the := default fires when the var is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this commit, the per-machine env scripts and build_with_spack.sh
hardcoded the dep cmake build + install root as
$TURBO_STACK_ROOT/deps/default.  That put 10+ GB of generated artifacts
inside the source-controlled checkout, requiring the test driver's
--clean block to reach back into $TURBO_STACK_ROOT/deps to wipe them
between runs.  The pattern contradicted the earlier goal of keeping
all build artifacts under $TURBO_BUILD_SYSTEM_TEST_DIR.

Make the dep root configurable via the TURBO_DEPS_ROOT env var:

  - scripts/setup_environment/derecho_cpu_gcc_openmpi.sh:
    `: "${TURBO_DEPS_ROOT:=$TURBO_STACK_ROOT/deps/default}"` before
    computing _install_prefix / _build_root.
  - scripts/setup_environment/emulate_derecho_modules_locally_with_spack.sh:
    same.
  - scripts/build_with_spack.sh: same `:=` default at the top of the
    inline `build_dep tim` block.

  - test_new_build_system_on_derecho.sh: exports
    TURBO_DEPS_ROOT="$TURBO_BUILD_SYSTEM_TEST_DIR/turbo-stack-deps"
    so the env script's build_dep calls land their builds + installs
    under the test dir alongside everything else this driver creates.
    The --clean block drops the `$TURBO_STACK_ROOT/deps` line and
    nukes $TURBO_DEPS_ROOT instead.  Header docstring updated to spell
    out that all artifacts now live under $TURBO_BUILD_SYSTEM_TEST_DIR.

Behavior preserved for callers who don't set TURBO_DEPS_ROOT: the `:=`
default still resolves to $TURBO_STACK_ROOT/deps/default, matching
what the scripts did before.  Existing build_with_spack.sh users (no
test driver) see no change.

Verified locally:
  - bash -n green on all four touched scripts.
  - build_with_spack.sh --infra TIM --parallel 32 (default TURBO_DEPS_ROOT):
    40/40 ctest passes; artifacts at $TURBO_STACK_ROOT/deps/default/.
  - env TURBO_DEPS_ROOT=/tmp/test_deps_root_override
    bash scripts/build_with_spack.sh --infra TIM --parallel 32:
    40/40 ctest passes; artifacts land at /tmp/test_deps_root_override/
    (no $TURBO_STACK_ROOT/deps/ pollution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit on this file moved TURBO_DEPS_ROOT from a single
global export ($TURBO_BUILD_SYSTEM_TEST_DIR/turbo-stack-deps) to two
per-flavor exports (one under each $build_dir/turbo-stack-with-*/deps).
The original global export and its accompanying mkdir + --clean
references were left in place, creating dead code:

  - The global `export TURBO_DEPS_ROOT="$TURBO_BUILD_SYSTEM_TEST_DIR/turbo-stack-deps"`
    was overwritten by both per-flavor blocks before the env script
    ever read it.  No dep ever landed at the original path.
  - The mkdir created an empty unused directory there.
  - The --clean rm list referenced the stale value.
  - The block comment claimed TURBO_DEPS_ROOT was "exported below so
    the env script picks it up" -- accurate for the prior design but
    misleading after the per-flavor switch.

Cleanup:
  - Drop the global export.
  - Replace the block comment with one that explains the per-flavor
    isolation pattern.
  - Drop $TURBO_DEPS_ROOT from the mkdir line and the --clean rm list.
    Per-flavor deps live at $build_dir/turbo-stack-with-*/deps, so
    wiping $build_dir already wipes the from-source dep installs --
    documented inline in the --clean block for the next reader.

No behavior change for normal runs (the per-flavor exports already
won).  --clean now does fewer redundant rm-rf invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old parallel-forwarding mechanism was clunky:
  - Orchestrators parsed --parallel N, then exported a custom
    TURBO_DEP_PARALLEL env var for one consumer (env script's
    build_dep calls) AND passed --parallel N as a CLI arg to another
    (build_turbo_stack.sh).
  - Env scripts had to detect TURBO_DEP_PARALLEL, build a
    _parallel_args=() array, and splice "${_parallel_args[@]}" into
    each of four build_dep calls.
  - Leaf tools (build_dep, build_turbo_stack) always passed --parallel
    to cmake --build with a default of 1.

CMake already supports CMAKE_BUILD_PARALLEL_LEVEL as a native env var:
when --parallel isn't given to `cmake --build`, cmake reads
CMAKE_BUILD_PARALLEL_LEVEL as its own default.  Adopt that mechanism:

Orchestrators (build_on_derecho.sh, build_with_spack.sh, test driver):
  - When --parallel N is given, `export CMAKE_BUILD_PARALLEL_LEVEL=N`
    once.  No more TURBO_DEP_PARALLEL.  No more --parallel flag
    forwarding to child scripts.

Env scripts (derecho_cpu_gcc_openmpi.sh, emulate_*):
  - Drop the _parallel_args=() array and the TURBO_DEP_PARALLEL
    detection.  Each build_dep call shrinks by one line.

Leaf tools (build_dep.sh, build_turbo_stack.sh):
  - Default changes from _parallel="1" to _parallel="".  When unset,
    don't pass --parallel to cmake -- cmake's native env-var fallback
    handles it (CMAKE_BUILD_PARALLEL_LEVEL if set, generator default
    otherwise).  --parallel N on the CLI still works as a per-call
    override.

Behavior change to note:
  - No --parallel anywhere + no CMAKE_BUILD_PARALLEL_LEVEL set +
    Ninja generator (--ninja): now uses nproc (Ninja's native default)
    instead of 1 (the old explicit cap).  User accepted this.
  - Make-with-no-overrides behavior unchanged (still 1).
  - With --parallel N at the orchestrator: identical to before, but
    the value flows through CMAKE_BUILD_PARALLEL_LEVEL rather than
    being plumbed.
  - CMAKE_BUILD_PARALLEL_LEVEL=N exported externally (in shell
    profile, qsub directive, CI config) now works without any wrapper
    --parallel flag.

Verified locally:
  - bash -n green on all 7 touched scripts.
  - bash scripts/build_with_spack.sh --parallel 32: 40/40 ctest.
    bash -x confirms CMAKE_BUILD_PARALLEL_LEVEL=32 exported.
  - bash scripts/build_with_spack.sh --infra TIM --parallel 32:
    40/40 ctest.
  - env CMAKE_BUILD_PARALLEL_LEVEL=16 bash scripts/build_with_spack.sh
    --infra TIM: 40/40 ctest (external env var propagates without
    the orchestrator's --parallel flag).
  - grep across all tracked files confirms no TURBO_DEP_PARALLEL or
    _parallel_args references remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…--build_dir.

Bundles together the deps-build-root simplification + pre-PR doc refresh.

CLI surface simplification:
  The previous design exposed --deps-build-root as a CLI flag on both
  build_on_derecho.sh and build_with_spack.sh, with three-level
  precedence (CLI flag > env var > --build_dir derivation > default).
  In practice the only common use cases are "deps next to my build"
  and "deps in the default location" -- the explicit flag adds CLI
  surface for a niche use case.

  Drop the flag from both orchestrators.  Deps location now derives
  from --build_dir directly: $build_dir/deps when given, else
  $TURBO_STACK_ROOT/deps/default.  Power users who genuinely need
  deps in a path unrelated to the turbo-stack build_dir can source
  the env script directly with its --deps-build-root sourced flag,
  which is still supported.

Strict argument parsers:
  Both orchestrators previously had `*) break ;;` to silently accept
  unknown options.  Combined with bash's source-inherits-$@ behavior,
  this leaked stray args through to sourced env scripts -- e.g.
  build_with_spack.sh --deps-build-root /tmp/x errored inside
  spack_local_environment.sh, not at the orchestrator boundary.

  Switch both to strict error-on-unknown, matching build_turbo_stack.sh
  after f43457f.  Typos now fail loudly with a clear error at the
  orchestrator level.

Env-script parameter passing:
  Replace the env-var (TURBO_DEPS_BUILD_ROOT) plumbing with sourced
  args: orchestrators compute the resolved path and pass it via
  `source env.sh --deps-build-root "$path"`.  Env scripts parse `$@`
  for their own --deps-build-root flag and use a local
  $_deps_build_root variable; no exported state.  Defaults to
  $TURBO_STACK_ROOT/deps/default when neither orchestrator nor caller
  provides a value.

Test driver:
  Comment block updated to describe the new mechanism ($build_dir
  passed to build_on_derecho.sh auto-derives deps location).  No
  behavioral change -- the test driver already passes --build_dir, so
  the auto-derivation triggers the same per-flavor isolation as
  before.

Doc refresh:
  scripts/README.md: fix the "3-step pipeline" header (was 2 steps);
  add "Where do dep builds + installs land?" section explaining the
  --build_dir auto-derive and power-user --deps-build-root path; add
  "Parallel build jobs" section documenting CMAKE_BUILD_PARALLEL_LEVEL
  propagation; drop the reference to setup_env_separation_plan.md
  (that doc was local-only design scratch and is not part of the PR).
  CLAUDE.md: same pattern -- drop the plan-doc reference, add brief
  paragraphs on deps location and parallelism plumbing.

Verified locally:
  - bash -n green on all 7 touched scripts.
  - build_with_spack.sh --parallel 32: 40/40 ctest (default path).
  - build_with_spack.sh --infra TIM --parallel 32 --build_dir /tmp/X:
    40/40 ctest, deps land at /tmp/X/deps/install (auto-derived).
  - build_with_spack.sh --infra TIM --parallel 32: 40/40 ctest
    (default deps path = $TURBO_STACK_ROOT/deps/default).
  - build_with_spack.sh --foo bar: exits 1 with "unknown option" at
    the orchestrator's parser (no more silent swallow).
  - grep across tracked files: no TURBO_DEPS_(ROOT|BUILD_ROOT)
    references; no setup_env_separation_plan references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the test driver invoked build_on_derecho.sh once per flavor.
Each invocation sourced derecho_cpu_gcc_openmpi.sh, which derived its
deps location from the per-flavor --build_dir and rebuilt FMS / pFUnit
/ AMReX / TIM into a flavor-private install tree.  Net result: AMReX
alone got built twice per run, ~5-10 min wasted on Derecho per second
flavor.

Source the env script ONCE up front instead, with an explicit
--deps-build-root pointing at a shared location
($TURBO_BUILD_SYSTEM_TEST_DIR/shared-deps).  Deps build + install once;
the env script appends the install prefix to CMAKE_PREFIX_PATH; both
per-flavor build_turbo_stack.sh calls find the shared install
naturally.  This is safe because the dep cmake args don't depend on
TURBO_INFRA -- FMS / pFUnit / AMReX / TIM build the same regardless of
which infra turbo-stack itself will use.

Replace the per-flavor build_on_derecho.sh invocations with direct
build_turbo_stack.sh invocations.  The test driver now plays the
orchestrator role itself (env source + N turbo-stack builds), which is
the right shape for a multi-build harness -- build_on_derecho.sh
remains the simple "one-call-does-everything" path for individual
dev runs.

The env-script source happens with set -e active, so a module-load
failure or dep-build failure bails the whole test driver cleanly.
The per-flavor turbo-stack builds remain under set +e with
${PIPESTATUS[0]} capture, so a failure in one flavor still lets the
other run and the summary reports both verdicts.

--clean now wipes $shared_deps_dir explicitly (previously it was
covered by wiping $build_dir, but the deps are no longer under
$build_dir).

Header docstring updated:
  - --parallel description now says "exported as CMAKE_BUILD_PARALLEL_LEVEL"
    (no longer accurate to say "forwarded to build_on_derecho.sh" since
    we don't invoke it anymore).

Layout under $TURBO_BUILD_SYSTEM_TEST_DIR after this change:
  deps_that_override_submodules/   # MOM6 / TIM / FMS source clones
  shared-deps/                     # built-once FMS/pFUnit/AMReX/TIM
    build/<name>/
    install/
  turbo-stack-build/               # per-flavor turbo-stack builds
    turbo-stack-with-FMS2/
    turbo-stack-with-TIM/
  logs/turbo-stack-with-{FMS2,TIM}.log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2's.

Previously the top-level CMakeLists trusted that FMS2's and TIM's
FindNetCDF.cmake copies were identical and put only FMS2's on
CMAKE_MODULE_PATH. Move the append into each TURBO_INFRA branch so the
chosen backend supplies its own copy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_dep's CMAKE_PREFIX_PATH / PFUNIT_DIR exports lived only on the
rebuild path, so a re-source after the first successful build (when
the sentinel already matches) returned without exposing the install
and downstream find_package() calls fell through to whatever the
enclosing environment supplied. Extract the side-effect block into
_build_dep_expose_install and call it on the "already installed --
skipping" path as well.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
turbo-stack tracks features ahead of the released FMS package (e.g.
the omp_offload-aware mpp_do_group_update on FMS dev/turbo), so
linking against spack's FMS quietly used a stale version. Treat FMS
like TIM: remove it from spack/spack.yaml and have build_with_spack.sh
call build_dep fms unconditionally so the build always reflects
\$FMS_ROOT (or the submodule fallback). CLAUDE.md updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task]: Implement improved build system within turbo-stack

3 participants