Skip to content

[toolchain] Harden installation failure handling and RapidJSON CMake support#7583

Open
QuantumMisaka wants to merge 8 commits into
deepmodeling:developfrom
QuantumMisaka:toolchain-issues
Open

[toolchain] Harden installation failure handling and RapidJSON CMake support#7583
QuantumMisaka wants to merge 8 commits into
deepmodeling:developfrom
QuantumMisaka:toolchain-issues

Conversation

@QuantumMisaka

Copy link
Copy Markdown
Collaborator

Linked Issue

Fix #7565 #7566 #7569

Summary

This PR hardens the ABACUS toolchain installation flow for issues found during toolchain/install review.

Changes include:

  • Preserve non-zero installer failures through the top-level toolchain wrappers
  • Propagate argument parsing failures instead of continuing silently
  • Install RapidJSON with a CMake package config so ABACUS can consume an installed RapidJSON dependency reliably
  • Add focused shell/CMake regression tests for the fixed behavior

Root Cause

The wrapper scripts could mask failures from lower-level installers or argument validation paths, making a failed install appear successful.

RapidJSON was installed as headers only, which left CMake package discovery fragile for builds that expect an installed package layout.

Validation

Validated on the SAI CPU-MISC/rush-cpu environment from commit 583c5dc66.

Checks run:

  • toolchain/tests/test_wrapper_failure_propagation.sh
  • toolchain/tests/test_installer_argument_failures.sh
  • toolchain/tests/test_rapidjson_cmake.sh
  • Full toolchain/toolchain_gnu.sh -j 2 installation
  • ./build_abacus_gnu.sh -j 2 with RapidJSON enabled
  • Real ABACUS run of examples/02_scf/01_pw_Si2

The final real calculation job was Slurm 619964, completed with ExitCode=0:0, five SCF iterations (DS1-DS5), and TOTAL Time : 2 in ABACUS output.

Note: the full verification environment hit intermittent GitHub/proxy TLS EOFs while fetching some release tarballs. After supplying checksum-matching source archives, the same installation
continued and passed. These network failures also confirmed that wrapper-level failures now propagate as non-zero exits.

Comment thread cmake/AbacusRapidJSON.cmake Outdated

@Growl1234 Growl1234 Jul 3, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it duplicated effort of leveraging RapidJSONConfig.cmake (including RapidJSONTargets.cmake)?? There should not be any other fallbacks since the target exported is (and should be) totally decided on the RapidJSON side!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. The fallback helper was removed; ABACUS now relies on the exported RapidJSON CMake target and fails if the package config does not provide it.

@Growl1234

Growl1234 commented Jul 3, 2026

Copy link
Copy Markdown

RapidJSON was installed as headers only, which left CMake package discovery fragile for builds that expect an installed package layout.

If the system-installed package has a incomplete CMake configuration, for example, RapidJSON installed from dnf pulls in an invalid one, we should consider blocking system detection or report error when wanted target is not present, rather than adapting to the already fragile case.

Comment thread CMakeLists.txt
Comment on lines 162 to 165
if(ENABLE_RAPIDJSON)
find_package(RapidJSON CONFIG REQUIRED)
abacus_add_feature_definitions(__RAPIDJSON)
target_link_libraries(abacus_external_deps INTERFACE RapidJSON)
include(cmake/AbacusRapidJSON.cmake)
abacus_configure_rapidjson(abacus_external_deps)
endif()

This comment was marked as resolved.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. The fallback path was dropped; ENABLE_RAPIDJSON now requires the exported RapidJSON target and links it through abacus_external_deps.

@mohanchen mohanchen added the Compile & CICD & Docs & Dependencies Issues related to compiling ABACUS label Jul 3, 2026

@Growl1234 Growl1234 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I don't think it's worth bringing in too many helpers just for testing small case; the maintenance burden of toolchain is already high, and fully assigning it to an AI agent should have not been really a good idea...

At least, keeping only non-overlapping behavioral checks would be clearer and less brittle.

Comment thread toolchain/tests/test_rapidjson_cmake.sh Outdated
EOF
}

write_test_project() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we avoid duplicating the production RapidJSON block in this fixture? The test then needs the source-text checks below to ensure that CMakeLists.txt stays in sync with this copy. A behavioral test of the actual top-level configuration would leave a single source of truth.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 10f35c561. The duplicated fixture block was removed; the test now runs the real top-level configure and checks the generated codemodel.

Comment thread toolchain/tests/test_rapidjson_cmake.sh Outdated
rm -rf "$tmpdir"
}

test_variable_only_package_fails() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same variable-only negative case that test_top_level_rejects_variable_only_package() exercises immediately below. The latter covers the real consumer, so the standalone case can be removed without losing regression coverage.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 10f35c561. The standalone variable-only case was removed; coverage remains in the real top-level rejection test.

Comment thread CMakeLists.txt Outdated
if(ENABLE_RAPIDJSON)
find_package(RapidJSON CONFIG REQUIRED)
if(NOT TARGET RapidJSON)
message(FATAL_ERROR "RapidJSON package configuration did not define the RapidJSON target")

@Growl1234 Growl1234 Jul 4, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we can make message clearer:

Suggested change
message(FATAL_ERROR "RapidJSON package configuration did not define the RapidJSON target")
message(
FATAL_ERROR
"RapidJSON was found, but target RapidJSON is missing. Check if your RapidJSON installation provides a complete exported CMake configuration."
)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 10f35c561.

@Growl1234 Growl1234 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, just a minor concern :)

Consider using "Fix #7565, fix #7566, fix #7569" in the description so that all issues are actually linked.

>"${build_dir}.log" 2>&1
}

assert_abacus_target_has_rapidjson_usage() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering if this goes beyond the contract changed in this PR. The existing target_link_libraries(... RapidJSON) path is not modified here; the new behavior is only that a config package must define the canonical RapidJSON target. The two top-level configure cases already cover that contract, so could we drop the File API/Python inspection and keep this test focused?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Compile & CICD & Docs & Dependencies Issues related to compiling ABACUS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Code scan] Preserve installer failures through toolchain wrapper logging pipelines

3 participants