Nexus: Move divert_nexus() and file path operations out of testing.py#5943
Nexus: Move divert_nexus() and file path operations out of testing.py#5943brockdyer03 wants to merge 74 commits into
divert_nexus() and file path operations out of testing.py#5943Conversation
… to reduce `testing.py` dependency.
….py` dependency.
… to reduce `testing.py` dependency.
….py` dependency.
…to update-test-paths
…uce `testing.py` dependency.
…testing.py` dependency.
… to `create_pseudo_files()`
…es()`, switch over `test_pwscf_simulation.py`
…away from `testing.py`
There was a problem hiding this comment.
-
Standard usage of tmp_path does not appear to be safe if multiple invocations execute simultaneously. e.g. The documentation states
When distributing tests on the local machine using pytest-xdist, care is taken to automatically configure a basetemp directory for the sub processes such that all temporary data lands below a single per-test run temporary directorysuggesting that the default pytest execution is unsafe. Multiple simultaneous executions of Nexus testing are run daily, so we will 100% hit problems. The standard way to protect against these simultaneous execution problems is to include one of the relevant process IDs in a directory or filename, or "simply" ensure to generate a race-proof unique directory name for test execution. Based purely on the pytest tmp_path documentation, I think this PR needs modifying to use unique test directories or test names. However, perhaps the implementation is safer than the documentation indicates (???). -
We should configure tmp_path_retention_policy="failed" ; it is bad form to pollute /tmp on a shared machine. https://docs.pytest.org/en/stable/reference/reference.html#confval-tmp_path_retention_policy . (Accidentally filling or abusing /tmp on a shared machine will generate a call from the system administrators. Generally there are specialized temp and work directories for execution.)
-
I will note that without the ability to override the test directory we will probably run into problems in future for any attempted "real" executions of Nexus. There are machines where little can be put in /tmp or mpirun can not be executed. Since the current plan is to continue to test real workflows via the QMCPACK CMake I think we can live without an override for now. Only real workflows generate large intermediate files. However if anyone reports hitting limits or has strange errors, this will be a candidate. Pytest does support
--basetemp=mydirso improving this would only be a little plumbing.
One possibility to address nearly all of the above would be that when invoked via cmake/ctest we set --basetemp to the cmake build directory while for standalone execution we use whatever the default tmp_path gives us. This might be the minimal change and would definitely be OK on all the supercomputers.
Finally, I do wish to underscore that the switch to using a dedicated test directory is a very welcome improvement over the current situation and will avoid pollution of the git repo when run directly.
As you noted, this only applies to runs using
The defaults for Pytest mean that only three test instances are kept, the current and the previous 2 runs. As you run again and again, the oldest ones get deleted. I would be fine changing it to only retain tmp paths from failed tests though.
It is indeed quite simple to override the test directory. Additionally, should we ever need to switch from Pytest to another testing framework then it'd be quite simple to do an introspection on the function signature and pass it a temp directory manually. I doubt though that another test framework will come along without their own method/migration guide that would be even easier to implement.
I agree. A big part of the motivation of this PR was my annoyance with how the test directory gets gummed up with output files. |
|
Starting from the end of my last comment:
Are you able to implement this? This alone alleviates my blocking concerns. i.e. Modify the pytest invocation in the CMakeLists.txt to specify basetemp as the cmake build directory. Now back to the top:
According to the docs, the problem is the other way around. pytest-xdist takes care to generate unique directories while regular pytest does not. This can be verified by inspection, just in case the pytest docs are out of date (unlikely) or the actual code is safer and more conservative than documented (unlikely but not without precedent). Alternatively, if you can implement the CMakeLists.txt modification suggested at the top, this change is less critically needed and this PR can be promptly merged.
I prefer that only failed runs are kept by default to minimize per user usage of /tmp. We have machines that run 10-20 different invocations every night. It is also very bad form to keep these kinds of files around on a shared machine. Changing the default would be good. Alternatively, if you can implement the CMakeLists.txt modification suggested at the top, this change isn't as critically needed in my view (most invocations will be from cmake/ctest) and this PR can be promptly merged. |
jtkrogel
left a comment
There was a problem hiding this comment.
Made it up to test_pyscf. @brockdyer03 I think you get the flavor. Make these repeat changes everywhere.
Following that, I will resume review.
| from pathlib import Path | ||
| from copy import deepcopy | ||
| import functools | ||
| from nexus.nexus_base import ( |
There was a problem hiding this comment.
Single line, or multiple from nexus... lines, no dangling
| nexus_noncore_storage[key] = nexus_noncore[key] | ||
| nexus_noncore[key] = deepcopy(nexus_noncore[key]) | ||
|
|
||
| return nexus_core_storage, nexus_noncore_storage |
There was a problem hiding this comment.
Why are these passed around all the time? Just use a global.
Unless this function is called only in other internal functions, not in individual tests
There was a problem hiding this comment.
That function is never called in individual tests. I specifically made it not a global because I was concerned about potential contamination between test runs. Should something fail during a test that causes the cleanup step to not be run (which I am pretty sure is impossible), restricting this to local scope means that other tests would not be affected by an unintentionally shared global variable.
| logfile = FakeLog() | ||
| generic_settings.devlog = logfile | ||
| object_interface._logfile = logfile | ||
| return logfile, logging_storage |
There was a problem hiding this comment.
Similar, bind to global instead of passing around
There was a problem hiding this comment.
See above comment.
| logging_storage = { | ||
| 'devlog': generic_settings.devlog, | ||
| 'objlog': object_interface._logfile, | ||
| } |
There was a problem hiding this comment.
Indent, here and elsewhere
| tmp_dir: Path, | ||
| pseudos: list[str], | ||
| pseudo_strs: list[str | None] | None = None | ||
| ): |
| TEST_FILES = { | ||
| "pwf.in": TEST_DIR / "test_pwscf_postprocessor_analyzers_files/pwf.in", | ||
| "pwf.out": TEST_DIR / "test_pwscf_postprocessor_analyzers_files/pwf.out", | ||
| } |
| lowdin_file = os.path.join(tpath,'pwf.lowdin_long') | ||
| lowdin_file = tmp_path / 'pwf.lowdin_long' | ||
|
|
||
| pa.write_lowdin(lowdin_file,long=True) |
There was a problem hiding this comment.
protect. I really don't know if I can catch all these. Probably best just to str convert at the beginning of each test function
| infile_path = os.path.join(tpath,'projwfc.in') | ||
| open(infile_path,'w').write(projwfc_in) | ||
| infile_path = tmp_path / 'projwfc.in' | ||
| infile_path.write_text(projwfc_in) |
| write_path = tmp_path / 'projwfc_write.in' | ||
| pi_write = ProjwfcInput(infile_path) | ||
|
|
||
| pi_write.write(write_path) |
| if not os.path.exists(sim.locdir): | ||
| os.makedirs(sim.locdir) | ||
| #end if | ||
| sim_dir = Path(sim.locdir).resolve() |
There was a problem hiding this comment.
I don't like how we rely on Path conversion from the native str type to make a test. Revert.
No, the problem is that
I think it'd be best to do it in |
|
Thanks. The problem I am worried about is higher level, something like: [ repeat for N] i.e. There are many invocations of the entire test framework under the same ID running at the same time. This has similar problems to running tests in parallel. It is a real problem that we have seen in other contexts with the current test system. The test system has to generate unique paths for all the files to avoid this problem. Please make this CMakeLists.txt fix the default for invocations as we do for all QMCPACK builds. No changes to configuration required for the tests to run from the build directory. |
I ran a test yesterday where I ran ctest with 8 parallel processes, and I watched the Pytest temp directory to see how it handled it. That sort of run is like running 8 separate instances of Pytest with them all using the same If you ran like how you describe above, you only get 2 instances of Pytest, which would create two temp directories,
Could you provide a more specific path for the test outputs? I am not sure if you want a |
|
Noting what was discussed off-GitHub: test output should go somewhere in the cmake build directory i.e. CMAKE_BINARY_DIRECTORY when run from the QMCPACK cmake/ctest side. This solves any parallelism, simultaneous execution and use of /tmp concerns for the most frequently executed case. |
|
Wanted to add a note to my latest commit. Turns out that if I just create a single One downside is that, because it would be difficult/complicated to know which test modules actually need a temp directory, I just create a temp directory for every test module, not just those that need it. This currently is not a big problem since an empty directory doesn't really make a huge difference since an empty directory doesn't actually take up that much space on disk, but it can make it slightly annoying to navigate the temp directory, should someone need to. It also should be noted that the test module-specific temp directories are only for non-coverage runs. This is because the way that the coverage runs are done requires invocation of only a single instance of Pytest, which means that it can handle creation of temp directories with no problems. I did, however, still make sure that it is pointed to |
Proposed changes
This PR takes another big step towards the end of #5881 by taking two of the heaviest parts of
testing.pyand either migrating it tonexus/nexus/tests/__init__.py(referred to astests/__init__.py) or switching it to usepytest.Test Isolation Code (Ongoing)
The pieces that handle test isolation were migrated as such:
testing.py::divert_nexus_core()->tests/__init__.py::divert_nexus_core()testing.py::restore_nexus_core()->tests/__init__.py::restore_nexus_core()testing.py::divert_nexus_log()->tests/__init__.py::divert_nexus_log()testing.py::restore_nexus_log()->tests/__init__.py::restore_nexus_log()The two functions that collected their behavior are now combined into a single function decorator called
isolate_nexus_core. The name was chosen overisolate_nexusas it felt like that name had too much ambiguity surrounding it. To be clear, the decorator will divert and restore bothnexus_coreandnexus_log, despite what its name may suggest. The usage of this decorator has two cases, one where you only need to isolate the core/log, and one where you also will need to modifynexus_corewith a temp path.An example of an isolate w/out temp path can be found in
test_nexus_base.py::test_write_splash(), and an example with temp path can be found intest_pwscf_simulation.py::test_check_result().Dummy Pseudopotential Creation
Previously, the function
setup_unit_test_output_directory()accepted a set of arguments that would create dummy pseudopotential files in a directory specified by a test with some text inside them. The majority of this functionality has been migrated totests/__init__.py::create_pseudo_files(). This function takes in three arguments, the temporary test directory, a list of pseudopotential file names, and an optional list of strings corresponding to the text that goes in each. An example of usage can be found intest_vasp_simulation.py::test_check_result().File Path Handling (Ongoing)
This is perhaps the most significant change in this PR. The functions that existed in
testing.pythat contributed to file handling are listed below:nexus_path()executable_path()create_file()create_path()unit_test_file_path()collect_unit_test_file_paths()unit_test_output_path()setup_unit_test_output_directory()The largest change is the setup of unit test output directories, which is now handled by Pytest's built in temporary directory code. In short, simply adding the special argument
tmp_pathto a test's function signature will tell Pytest to create a temp directory and pass in its path to the test function. On my system, this temp directory is created at/tmp/pytest-of-brock/pytest-<N>/<func-name>Mwhere N and M are integers. N will increment every time thepytestcommand is invoked, and M increments if two tests have the same name (e.g.test_vasp_simulation.py::test_check_result()andtest_pwscf_simulation.py::test_check_result()get atmp_pathcalledtest_check_result0andtest_check_result1.). On the event of test failure, the first thing printed is the location of the temporary directory for that test. Additionally,pytestdefaults to storing the current test run, plus the previous 2 test runs, for a total of 3 test runs stored in/tmp/pytest-of-user/.The
nexus_path()andexecutable_path()functions were for getting the path to a specific Nexus directory, such as thebindirectory. This has been replaced by relative paths in each test, such as the following for getting the path to theqmcascript:The
unit_test_file_path()andcollect_unit_test_file_paths()functions were similarly replaced by hardcoded file paths in each test.Non-obvious Changes
There are some things that changed that might not have an obvious reason. For example, this PR has fewer tests than before, not because I have accidentally deleted tests, but because several test modules contained a function called
test_files()that was designed to collect the test module's related files and check that they existed. These functions were removed as the tests just have hardcoded dictionaries with the test paths, which naturally will lower the test count.TODO
isolate_nexus_coreto perform cleanup even on test failureisolate_nexus_coreto inspect the function signature and automatically handletmp_pathcases.tests/__init__.py.What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
Laptop, Fedora 43 KDE Plasma
Python 3.14.4
uv 0.11.8
Packages specified in
uv.lock.Checklist