Skip to content

[AutoTuner] Restore resume check #3070

Closed
luarss wants to merge 6 commits into
The-OpenROAD-Project:masterfrom
luarss:topic/resume-unit-test
Closed

[AutoTuner] Restore resume check #3070
luarss wants to merge 6 commits into
The-OpenROAD-Project:masterfrom
luarss:topic/resume-unit-test

Conversation

@luarss

@luarss luarss commented Apr 13, 2025

Copy link
Copy Markdown
Contributor

This pull request includes several changes to the AutoTuner tests to re-enable previously disabled tests, improve the test setup, and add new utility methods.

Fixes #3005

Re-enabling tests and improving setup:

Codebase simplification and improvements:

@luarss luarss added the autotuner Flow autotuner label Apr 13, 2025
@luarss luarss requested a review from vvbandeira April 14, 2025 01:00
@luarss luarss force-pushed the topic/resume-unit-test branch from 99ea816 to 8e18ff7 Compare April 17, 2025 13:07
Comment thread tools/AutoTuner/test/resume_check.py Outdated
Comment thread tools/AutoTuner/test/resume_check.py Outdated
Comment on lines +108 to +109
folders = glob.glob(os.path.join(experiment_dir, f"variant-*-or-{iteration}"))
return max((os.path.getmtime(folder) for folder in folders), default=9e99)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little obfuscated:

  1. The function's name does not match the expected return value type/format. A check should return True/False.
  2. Returning 9e99 is not clear on what is going on.
  3. Creating a folder does not confirm the run status. For this test to be true to its purpose, we need to guarantee that it has not finished, not just started.

We should consider using a get_experiment_status function to check if all iterations have finished running; this function would return at least "RUNNING" and "FINISHED", other states might be helpful but are not required now.

@luarss luarss Apr 21, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The function's name does not match the expected return value type/format. A check should return True/False.

Can possibly change it to get_trial_times

  1. Returning 9e99 is not clear on what is going on.

It is just a dummy value, to compare for latest modified time in while loop line 121-128

            # Check if first config is complete
            while True:
                cur_modified_time = self.check_trial_times()
                print(f"Current modified time: {cur_modified_time}")
                print(f"Latest modified time: {latest_modified_time}")
                if abs(cur_modified_time - latest_modified_time) < 1e-3:
                    break
                latest_modified_time = cur_modified_time
                time.sleep(10)
  1. Creating a folder does not confirm the run status. For this test to be true to its purpose, we need to guarantee that it has not finished, not just started.

This function returns the latest modified time of a given iteration (folder names are matched using iteration glob) - so if a run is completed the folder should no longer be modified.

4, We should consider using a get_experiment_status function to check if all iterations have finished running; this function would return at least "RUNNING" and "FINISHED", other states might be helpful but are not required now.

get_experiment_status is helpful in general, but might not be too useful in resume_check because we need to check iteration completion, as opposed to experiment completion (or all_iterations completion)

@luarss luarss closed this Apr 22, 2025
@luarss luarss reopened this Apr 22, 2025
luarss added 4 commits May 7, 2025 16:31
Signed-off-by: Jack Luar <jluar@precisioninno.com>
…ror handling

Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss luarss force-pushed the topic/resume-unit-test branch from 8e18ff7 to cf5ed3d Compare May 7, 2025 16:33
luarss added 2 commits May 9, 2025 16:55
Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
@luarss luarss requested a review from vvbandeira May 15, 2025 12:53
@vvbandeira vvbandeira marked this pull request as draft August 22, 2025 12:55
harsh-kumar-patwa added a commit to harsh-kumar-patwa/OpenROAD-flow-scripts that referenced this pull request Mar 8, 2026
Replace the fixed time.sleep(120) with ExperimentAnalysis-based polling
to reliably detect when trials complete before stopping the initial run.
This addresses the flakiness reported in issue The-OpenROAD-Project#3005 and the review
feedback from draft PR The-OpenROAD-Project#3070.

Key changes:
- Use Ray Tune ExperimentAnalysis to poll experiment status instead of
  fixed sleep
- Add managed_process context manager for safe subprocess cleanup
- Add stop_ray_cluster helper that retries until Ray shuts down cleanly
- Re-enable the resume check test in test_autotuner.sh

Signed-off-by: Harsh <harshkumar3446@gmail.com>
harsh-kumar-patwa added a commit to harsh-kumar-patwa/OpenROAD-flow-scripts that referenced this pull request Mar 8, 2026
Replace the fixed time.sleep(120) with ExperimentAnalysis-based polling
to reliably detect when trials complete before stopping the initial run.
This addresses the flakiness reported in issue The-OpenROAD-Project#3005 and the review
feedback from draft PR The-OpenROAD-Project#3070.

Key changes:
- Use Ray Tune ExperimentAnalysis to poll experiment status instead of
  fixed sleep
- Add managed_process context manager for safe subprocess cleanup
- Add stop_ray_cluster helper that retries until Ray shuts down cleanly
- Re-enable the resume check test in test_autotuner.sh

Signed-off-by: Harsh <harshkumar3446@gmail.com>
Signed-off-by: Harsh Kumar <harshkumar3446@gmail.com>
@github-actions

Copy link
Copy Markdown
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs. Remove the Stale label or comment to keep it open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

autotuner Flow autotuner Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-enable AutoTuner ResumeCheck tests

2 participants