[AutoTuner] Restore resume check by luarss · Pull Request #3070 · The-OpenROAD-Project/OpenROAD-flow-scripts

luarss · 2025-04-13T10:43:21Z

This pull request includes several changes to the AutoTuner tests to re-enable previously disabled tests, improve the test setup, and add new utility methods.

Fixes #3005

Re-enabling tests and improving setup:

flow/test/test_autotuner.sh: Re-enabled the test_tune_resume test in the AutoTuner script by uncommenting the test execution line.

Codebase simplification and improvements:

tools/AutoTuner/test/resume_check.py: Removed unnecessary directory changes and imported the glob module to facilitate file pattern matching.
tools/AutoTuner/test/resume_check.py: Introduced the check_trial_times method to check the modification times of trial iterations, improving the robustness of the test_tune_resume test.

vvbandeira · 2025-04-21T16:35:52Z

+        folders = glob.glob(os.path.join(experiment_dir, f"variant-*-or-{iteration}"))
+        return max((os.path.getmtime(folder) for folder in folders), default=9e99)


This is a little obfuscated:

1. The function's name does not match the expected return value type/format. A check should return True/False.
2. Returning 9e99 is not clear on what is going on.
3. Creating a folder does not confirm the run status. For this test to be true to its purpose, we need to guarantee that it has not finished, not just started.
4. We should consider using a get_experiment_status function to check if all iterations have finished running; this function would return at least "RUNNING" and "FINISHED", other states might be helpful but are not required now.
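One way the suggested get_experiment_status could look is sketched below. This is not code from the PR: the enum, the function signature, and the is_trial_finished callback are all assumptions made for illustration, and the thread does not say how a finished trial would actually be detected.

```python
from enum import Enum
from pathlib import Path
from typing import Callable, Iterable


class ExperimentStatus(str, Enum):
    """The two states named in the review; more could be added later."""

    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


def get_experiment_status(
    trial_dirs: Iterable[Path],
    is_trial_finished: Callable[[Path], bool],
) -> ExperimentStatus:
    """Return FINISHED only when at least one trial exists and all are finished.

    `is_trial_finished` is a hypothetical predicate supplied by the caller,
    because how completion is detected is not specified in the thread.
    """
    dirs = list(trial_dirs)
    if dirs and all(is_trial_finished(d) for d in dirs):
        return ExperimentStatus.FINISHED
    # No trials yet, or at least one trial still in progress.
    return ExperimentStatus.RUNNING
```

A caller would pass whatever completion test fits the flow, for example a predicate that looks for a marker file in each trial directory. Keeping the predicate separate means the status logic can be unit-tested without launching a real run.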

luarss · 2025-04-21 (edited)

> 1. The function's name does not match the expected return value type/format. A check should return True/False.

It could be renamed to get_trial_times.

> 2. Returning 9e99 is not clear on what is going on.

It is just a dummy value, used for comparison against the latest modified time in the while loop at lines 121-128:

    # Check if first config is complete
    while True:
        cur_modified_time = self.check_trial_times()
        print(f"Current modified time: {cur_modified_time}")
        print(f"Latest modified time: {latest_modified_time}")
        if abs(cur_modified_time - latest_modified_time) < 1e-3:
            break
        latest_modified_time = cur_modified_time
        time.sleep(10)

> 3. Creating a folder does not confirm the run status. For this test to be true to its purpose, we need to guarantee that it has not finished, not just started.

This function returns the latest modified time of a given iteration (folder names are matched using the iteration glob), so once a run has completed, its folder should no longer be modified.

> 4. We should consider using a get_experiment_status function to check if all iterations have finished running; this function would return at least "RUNNING" and "FINISHED", other states might be helpful but are not required now.

get_experiment_status is helpful in general, but might not be too useful in resume_check, because we need to check iteration completion rather than experiment completion (that is, completion of all iterations).
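The modified-time approach discussed above can be sketched without the 9e99 sentinel by returning None when no folder matches, and by bounding the wait with a timeout. The glob pattern is taken from the diff; the function names, the timeout, and the injectable sleep parameter are assumptions for illustration, not the PR's code.

```python
import glob
import os
import time
from typing import Callable, Optional


def get_latest_trial_mtime(experiment_dir: str, iteration: int) -> Optional[float]:
    """Newest mtime among folders for this iteration, or None if none exist yet."""
    pattern = os.path.join(experiment_dir, f"variant-*-or-{iteration}")
    mtimes = [os.path.getmtime(folder) for folder in glob.glob(pattern)]
    return max(mtimes) if mtimes else None


def wait_until_stable(
    get_mtime: Callable[[], Optional[float]],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until two consecutive readings agree; return False on timeout."""
    previous: Optional[float] = None
    waited = 0.0
    while waited <= timeout:
        current = get_mtime()
        # A missing folder (None) never counts as "stable".
        if current is not None and previous is not None:
            if abs(current - previous) < 1e-3:
                return True
        previous = current
        sleep(interval)
        waited += interval
    return False
```

One caveat with this heuristic: a directory's modification time changes only when entries are created, removed, or renamed directly inside it, so writes to files in nested subdirectories do not update it. An unchanged folder mtime therefore suggests, but does not prove, that the iteration has finished.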


Replace the fixed time.sleep(120) with ExperimentAnalysis-based polling to reliably detect when trials complete before stopping the initial run. This addresses the flakiness reported in issue The-OpenROAD-Project#3005 and the review feedback from draft PR The-OpenROAD-Project#3070.

Key changes:
- Use Ray Tune ExperimentAnalysis to poll experiment status instead of fixed sleep
- Add managed_process context manager for safe subprocess cleanup
- Add stop_ray_cluster helper that retries until Ray shuts down cleanly
- Re-enable the resume check test in test_autotuner.sh

Signed-off-by: Harsh <harshkumar3446@gmail.com>
Signed-off-by: Harsh Kumar <harshkumar3446@gmail.com>
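The commit message names a managed_process context manager but does not show it. A minimal sketch of what such a helper might look like follows, using only the standard library; the signature, the grace_period parameter, and the terminate-then-kill policy are assumptions, not the referenced commit's implementation.

```python
import subprocess
import sys
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def managed_process(
    cmd: Sequence[str], grace_period: float = 10.0
) -> Iterator[subprocess.Popen]:
    """Start `cmd` and make sure it is stopped when the block exits."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        # Only intervene if the child is still alive.
        if proc.poll() is None:
            proc.terminate()  # polite request first
            try:
                proc.wait(timeout=grace_period)
            except subprocess.TimeoutExpired:
                proc.kill()  # force-stop if it ignored the request
                proc.wait()
```

A test could wrap the initial tuning run in `with managed_process([...]) as proc:` and do its polling inside the block, so the child is cleaned up even if an assertion fails partway through.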

github-actions · 2026-03-24T22:06:48Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs. Remove the Stale label or comment to keep it open.

luarss added the autotuner label Apr 13, 2025

luarss requested a review from vvbandeira April 14, 2025 01:00

luarss force-pushed the topic/resume-unit-test branch from 99ea816 to 8e18ff7 April 17, 2025 13:07

vvbandeira requested changes Apr 21, 2025

luarss closed this Apr 22, 2025

luarss reopened this Apr 22, 2025

luarss added 4 commits May 7, 2025 16:31

restore resume check using last modified filetime

75f399c

Signed-off-by: Jack Luar <jluar@precisioninno.com>

refactor resume check: rename exec variable and improve subprocess error handling

0dab7e3

Signed-off-by: Jack Luar <jluar@precisioninno.com>

make error clearer

3009894

Signed-off-by: Jack Luar <jluar@precisioninno.com>

clarify function name

cf5ed3d

Signed-off-by: Jack Luar <jluar@precisioninno.com>

luarss force-pushed the topic/resume-unit-test branch from 8e18ff7 to cf5ed3d May 7, 2025 16:33

luarss added 2 commits May 9, 2025 16:55

fix function call

ab31e29

Signed-off-by: Jack Luar <jluar@precisioninno.com>

revert list comprehension into for loop for better readability

e8cdcfe

Signed-off-by: Jack Luar <jluar@precisioninno.com>

luarss requested a review from vvbandeira May 15, 2025 12:53

vvbandeira marked this pull request as draft August 22, 2025 12:55

harsh-kumar-patwa mentioned this pull request Mar 8, 2026

Re-enable AutoTuner ResumeCheck tests #3005

Closed

harsh-kumar-patwa mentioned this pull request Mar 8, 2026

[AutoTuner] Fix flaky resume check test #3966

Closed

2 tasks

github-actions Bot added the Stale label Mar 24, 2026

vvbandeira closed this Mar 25, 2026

luarss wants to merge 6 commits into The-OpenROAD-Project:master from luarss:topic/resume-unit-test
