fix: Parallel unit test runner with file-level subprocess isolation by JasonW404 · Pull Request #3285 · ModelEngine-Group/nexent

JasonW404 · 2026-06-23T01:36:32Z

Summary

Rewrites the unit test runner to support file-level parallel execution via subprocess isolation, achieving ~7x speedup while maintaining 100% pass rate.

Results (229 test files, ~6,200+ tests)

Mode	Wall Clock	Pass Rate	Workers
Serial	7m7s	229/229 (100%)	1
Parallel	1m0.76s	229/229 (100%)	20 (auto)

Changes

`test/run_all_test.py` — Parallel runner

File-level parallel execution using ThreadPoolExecutor
Each test file runs in its own subprocess (preserves isolation)
Configurable via environment variables:
- NEXENT_PYTEST_WORKERS: auto (default, uses CPU count), 0 (serial), or integer
- NEXENT_PYTEST_FILE_TIMEOUT: per-file timeout in seconds (default: 600)
- NEXENT_PYTEST_TARGETS: override test directories/files
Coverage data per file (.coverage.N), combined via coverage combine

`test/backend/services/test_mcp_service.py` — Bug fix

Clear proxy env vars (socks://) in autouse fixture to prevent httpx.AsyncClient ValueError (8 test failures)

`test/backend/services/test_remote_mcp_service.py` — Bug fix

Mock check_runtime_host_port_available to prevent port conflict in container enable test (1 test failure)

`test/sdk/core/models/test_openai_llm.py` — Memory fix

Reduce memory leak from repeated module imports (~20MB RSS reduction)

`.github/workflows/auto-unit-test.yml` — CI optimization

Default to parallel mode (NEXENT_PYTEST_WORKERS=auto)
Per-file timeout: 300s (configurable via workflow_dispatch)
Dispatch inputs for worker count and timeout override

Why file-level parallel instead of pytest-xdist?

Direct pytest -n auto across the full test suite fails because many tests use global sys.modules mocking and import patching that conflict during a single pytest collection phase. File-level subprocess isolation preserves the existing per-file isolation guarantee while parallelizing across files.

Note: This PR does NOT add pytest-xdist as a dependency. The parallel runner uses Python's built-in concurrent.futures.ThreadPoolExecutor with per-file subprocess.run() calls — no xdist needed.

- Rewrite test/run_all_test.py as file-level parallel runner using ThreadPoolExecutor with configurable workers (NEXENT_PYTEST_WORKERS) and per-file timeout (NEXENT_PYTEST_FILE_TIMEOUT) - Add pytest-xdist to backend test extras - Fix test_mcp_service.py: clear proxy env vars (socks://) in fixture to prevent httpx.AsyncClient ValueError - Fix test_remote_mcp_service.py: mock check_runtime_host_port_available to prevent port conflict in container enable test - Fix test_openai_llm.py: reduce memory leak from repeated module imports - Update CI workflow: default to parallel mode, add dispatch inputs for worker count and per-file timeout Serial: 229/229 pass (7m7s). Parallel: 229/229 pass (1m1s, ~7x speedup).

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

The parallel runner uses ThreadPoolExecutor with per-file subprocess isolation, not pytest-xdist. The xdist package was added but never used due to sys.modules mock conflicts during pytest collection.

JasonW404 · 2026-06-24T04:34:06Z

+    if xml.returncode != 0:
+        logger.error("Coverage XML generation failed:\n%s\n%s", xml.stdout, xml.stderr)
+        return False
+    logger.info("Coverage XML file generated: %s", coverage_xml_file)


coverage combine 使用 str(current_dir) 作为数据源目录，但每个 worker 的 .coverage.{index} 文件是通过 COVERAGE_FILE 环境变量写入 current_dir 的。coverage combine 默认在源目录中查找 .coverage.* 文件，但 --data-file 参数指定的是合并后的输出文件路径，不是源目录。如果 coverage combine 的 positional argument 被解释为数据文件列表而非目录，可能导致合并失败。建议验证 CI 中此路径是否正确工作。

YehongPan · 2026-06-24T04:49:01Z

🔍 Code Review Comments

1. [逻辑漏洞] Windows 路径分隔符与 coverage combine
_run_test_file 中 env["PYTHONPATH"] 使用 path_separator 拼接，但 coverage_dir 下的 .coverage.{index} 文件名在 Windows 上可能因路径分隔符问题导致 coverage combine 失败。

2. [代码规范] 容器环境下 CPU 计数不准确
_worker_count 中 os.cpu_count() 在容器环境中可能返回宿主机的 CPU 数量而非容器限制值，应使用 len(os.sched_getaffinity(0)) 或读取 cgroup 限制。

3. [逻辑漏洞] coverage combine 参数问题
_combine_coverage 中 coverage combine 传入 str(current_dir) 作为参数，但 coverage combine 期望的是 coverage data 文件路径而非目录路径，可能导致 combine 找不到 .coverage.* 文件。

YehongPan

Code Review

[逻辑漏洞] _run_test_file 中 env["PYTHONPATH"] 使用 path_separator 拼接，但 coverage_dir 下的 .coverage.{index} 文件名在 Windows 上可能因路径分隔符问题导致 coverage combine 失败。
[代码规范] _worker_count 中 os.cpu_count() 在容器环境中可能返回宿主机的 CPU 数量而非容器限制值，应使用 len(os.sched_getaffinity(0)) 或读取 cgroup 限制。
[逻辑漏洞] _combine_coverage 中 coverage combine 传入 str(current_dir) 作为参数，但 coverage combine 期望的是 coverage data 文件路径而非目录路径，可能导致 combine 找不到 .coverage.* 文件。

xuyaqist · 2026-06-25T08:40:26Z

+    if raw_workers in {"", "0", "false", "none", "off"}:
+        return 1
+    if raw_workers == "auto":
+        return max(1, min(os.cpu_count() or 1, total_files))


os.cpu_count() 在容器环境中可能返回宿主机的 CPU 数量而非容器实际分配的 CPU 配额。建议使用 len(os.sched_getaffinity(0)) 获取实际可用 CPU 数（Linux），或读取 cgroup 的 cpu.cfs_quota_us 限制，避免在 CI 容器中过度并行导致 OOM。

xuyaqist · 2026-06-25T08:40:29Z

+    try:
+        result = subprocess.run(
+            cmd,
+            capture_output=True,


subprocess.run 使用 capture_output=True，当测试输出非常大时会全部加载到内存中。建议对 stdout/stderr 设置最大缓冲区大小（如 max_output_size），超出部分截断并写入临时文件，防止单个测试文件的巨量输出耗尽内存。

hhhhsc701 · 2026-06-25T10:41:06Z

+    if raw_workers in {"", "0", "false", "none", "off"}:
+        return 1
+    if raw_workers == "auto":
+        return max(1, min(os.cpu_count() or 1, total_files))


[P2] auto 模式直接用 os.cpu_count() 作为并发上限，在 16/32 核 runner 上会同时起大量 pytest+coverage 子进程，内存和文件句柄都容易被打满。建议默认 cap 到较小值，只有显式 env 才放大。

hhhhsc701 · 2026-06-25T10:41:06Z

+    if cov_config.exists():
+        cmd.append("--cov-config=test/.coveragerc")
+
+    env = os.environ.copy()


[P2] 每个测试子进程继承完整 os.environ，代理变量、服务端点或凭据会泄漏进所有测试。这个 PR 只在单个 mcp 测试里清代理，runner 层仍会导致其他 httpx/requests 测试不稳定；建议统一清理/白名单 env。

hhhhsc701 · 2026-06-25T10:41:06Z

+        logger.error("No test files found")
+        return False
+
+    for coverage_artifact in [current_dir / ".coverage", *current_dir.glob(".coverage.*")]:


[P2] 这里删除 test/.coverage* 和 coverage.xml，是 repo 内共享路径。两个 CI job 或本地同时跑这个脚本时会互相删覆盖率文件；建议为每次运行创建独立 coverage 目录或带 run id 的 data_file。

hhhhsc701 · 2026-06-25T10:41:06Z

+            for index, test_file in enumerate(test_files)
        ]
+        for future in as_completed(futures):
+            result = future.result()


[P2] future.result() 没有兜底，_run_test_file 里任何未捕获异常都会中断 as_completed 循环，剩余测试结果和 coverage combine 都不会执行。建议把 executor 异常转成失败结果继续汇总。

hhhhsc701 · 2026-06-25T10:41:06Z

+    if failed:
+        logger.error("\nFailed test files: %s", len(failed))
+        for result in failed[:10]:
+            logger.error("\n%s\n%s\n%s", result["file"], result["stdout"][-4000:], result["stderr"][-2000:])


[P2] 失败时直接把 stdout/stderr 尾部写入 CI 日志，测试输出里如果包含 token、连接串或 presigned URL 会被持久化。建议增加敏感字段脱敏或只输出 pytest short summary。

hhhhsc701 · 2026-06-26T02:09:55Z

+    if raw_workers in {"", "0", "false", "none", "off"}:
+        return 1
+    if raw_workers == "auto":
+        return max(1, min(os.cpu_count() or 1, total_files))


[P2] auto 模式直接用 os.cpu_count() 作为并发上限，在 16/32 核 runner 上会同时起大量 pytest+coverage 子进程，内存和文件句柄都容易被打满。建议默认 cap 到较小值，只有显式 env 才放大。

hhhhsc701 · 2026-06-26T02:10:17Z

+    if cov_config.exists():
+        cmd.append("--cov-config=test/.coveragerc")
+
+    env = os.environ.copy()


[P2] 每个测试子进程继承完整 os.environ，代理变量、服务端点或凭据会泄漏进所有测试。这个 PR 只在单个 mcp 测试里清代理，runner 层仍会导致其他 httpx/requests 测试不稳定；建议统一清理/白名单 env。

hhhhsc701 · 2026-06-26T02:10:39Z

+        logger.error("No test files found")
+        return False
+
+    for coverage_artifact in [current_dir / ".coverage", *current_dir.glob(".coverage.*")]:


[P2] 这里删除 test/.coverage* 和 coverage.xml，是 repo 内共享路径。两个 CI job 或本地同时跑这个脚本时会互相删覆盖率文件；建议为每次运行创建独立 coverage 目录或带 run id 的 data_file。

hhhhsc701 · 2026-06-26T02:11:03Z

+            for index, test_file in enumerate(test_files)
        ]
+        for future in as_completed(futures):
+            result = future.result()


[P2] future.result() 没有兜底，_run_test_file 里任何未捕获异常都会中断 as_completed 循环，剩余测试结果和 coverage combine 都不会执行。建议把 executor 异常转成失败结果继续汇总。

hhhhsc701 · 2026-06-26T02:11:25Z

+    if failed:
+        logger.error("\nFailed test files: %s", len(failed))
+        for result in failed[:10]:
+            logger.error("\n%s\n%s\n%s", result["file"], result["stdout"][-4000:], result["stderr"][-2000:])


[P2] 失败时直接把 stdout/stderr 尾部写入 CI 日志，测试输出里如果包含 token、连接串或 presigned URL 会被持久化。建议增加敏感字段脱敏或只输出 pytest short summary。

* Revert "fix: Parallel unit test runner with file-level subprocess isolation (#3285)" This reverts commit b6e59cf. * fix: restore parallel UT runner with coverage reporting * fix: pin codecov action for sonar * ci: add repo controlled codecov gates

Copilot AI review requested due to automatic review settings June 23, 2026 01:36

JasonW404 requested review from Dallas98 and WMC001 as code owners June 23, 2026 01:36

Copilot AI reviewed Jun 23, 2026

chore: remove unused pytest-xdist dependency

b8129a0

The parallel runner uses ThreadPoolExecutor with per-file subprocess isolation, not pytest-xdist. The xdist package was added but never used due to sys.modules mock conflicts during pytest collection.

WMC001 approved these changes Jun 23, 2026

View reviewed changes

WMC001 merged commit b6e59cf into develop Jun 23, 2026
18 checks passed

JasonW404 commented Jun 24, 2026

View reviewed changes

YehongPan reviewed Jun 24, 2026

View reviewed changes

xuyaqist reviewed Jun 25, 2026

View reviewed changes

hhhhsc701 reviewed Jun 25, 2026

View reviewed changes

hhhhsc701 reviewed Jun 26, 2026

View reviewed changes

This was referenced Jun 27, 2026

[codex] fix unit test coverage reporting #3314

Merged

[codex] fix unit test coverage reporting #3319

Closed

fix: add parallel UT coverage reporting and recover Codecov #3323

Merged

Uh oh!

Conversation

JasonW404 commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Results (229 test files, ~6,200+ tests)

Changes

test/run_all_test.py — Parallel runner

test/backend/services/test_mcp_service.py — Bug fix

test/backend/services/test_remote_mcp_service.py — Bug fix

test/sdk/core/models/test_openai_llm.py — Memory fix

.github/workflows/auto-unit-test.yml — CI optimization

Why file-level parallel instead of pytest-xdist?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

YehongPan commented Jun 24, 2026

🔍 Code Review Comments

Uh oh!

YehongPan left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

JasonW404 commented Jun 23, 2026 •

edited

Loading

`test/run_all_test.py` — Parallel runner

`test/backend/services/test_mcp_service.py` — Bug fix

`test/backend/services/test_remote_mcp_service.py` — Bug fix

`test/sdk/core/models/test_openai_llm.py` — Memory fix

`.github/workflows/auto-unit-test.yml` — CI optimization