test(windows): test infrastructure + Windows production fixes (stacked on #946)#947
test(windows): test infrastructure + Windows production fixes (stacked on #946)#947lkomali wants to merge 7 commits into
Conversation
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@82b0b193793c43c9ebd3431b3313017de82cadf1Recommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@82b0b193793c43c9ebd3431b3313017de82cadf1Last updated for commit: |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
4bb3ba9 to
bff6081
Compare
| f"{tempfile.gettempdir()}{os.sep}" | ||
| f"aiperf_child_{os.getpid()}_{uuid.uuid4().hex[:8]}_stderr.log" | ||
| ) | ||
| err_fd = os.open(err_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC) |
There was a problem hiding this comment.
Creating the stderr crash log with os.open(..., O_CREAT, ...) and no mode uses the default permissive file mode in a shared temp directory, which can expose tracebacks or secrets. Fix: create it with restrictive permissions such as 0o600 via os.open(err_path, flags, 0o600).
| 1: Benchmark failed (errors occurred) | ||
| Other: Unexpected error or signal | ||
| """ | ||
| if IS_WINDOWS: |
There was a problem hiding this comment.
This redirects Windows subprocess stdout and stderr to NUL unconditionally before validation and error handling, so normal sweep output and parent-captured diagnostics are silently lost. Fix: only release inherited handles when stdio is actually a non-TTY pipe and preserve stderr in the parent capture or a per-run log.
dynamo-ops
left a comment
There was a problem hiding this comment.
Previous review comments (bootstrap.py file permissions, subprocess_runner.py NUL redirect) are still pending. No new high-confidence issues found in this revision — the Windows porting changes are well-reasoned with good test coverage.
…of windows-port Signed-off-by: lkomali <lkomali@nvidia.com>
268ac11 to
5d4e35f
Compare
…x-only Signed-off-by: lkomali <lkomali@nvidia.com>
… timeout multiplier Signed-off-by: lkomali <lkomali@nvidia.com>
…nk/chmod gating, CSV encoding, path comparison, socket buffer expectations, and matplotlib backend Signed-off-by: lkomali <lkomali@nvidia.com>
…r disclosure Signed-off-by: lkomali <lkomali@nvidia.com>
… scheduler precision at high QPS Signed-off-by: lkomali <lkomali@nvidia.com>
Summary
Adds the Windows production fixes that surfaced through testing + the test infrastructure changes to make unit/integration tests pass on Windows.
Production fixes (uncovered through testing)
--concurrency 6on Py 3.13signal.SIGKILL→SIGTERMfallback on Windows (no SIGKILL on Windows; would raiseAttributeError)atexit-cleaned if empty)path.is_absolute()→path.anchorfor Windows path detection (WindowsPath("/tmp/foo").is_absolute() == False)--gpu-telemetrynot passed (DCGM is Linux-only; saves ~200MB per run)--ui dashboardfallback to--ui simpleon non-TTY (Textual hangs on piped stdout on Windows)urlparse("C:\\...")would crash)Test infrastructure
CTRL_C_EVENTmapping), 100-worker stress (WinError 1450), DCGM testsIS_WINDOWSconditional timeout intest_request_cancellation(300s on Win, 120s elsewhere)_remove_if_empty,SocketDefaultsplatform gating, harness shlex modesTest plan
uv run pytest tests/unit/— 10697 pass on macOSuv run pytest tests/unit/transports/test_socket_defaults.py+tests/unit/common/test_bootstrap_macos.py— covers new platform gating