
Commit cb1fbf8

update pytest config to add xdist / timeouts + ci improvements (#17)
* update pytest config to add xdist / timeouts
* fix ci
* update ci

---------

Co-authored-by: Tapan Chugh <tapanc@cs.washington.edu>
1 parent eeefb00 commit cb1fbf8

7 files changed

Lines changed: 66 additions & 23 deletions

.github/actions/setup-environment/ (composite action definition)

Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+name: Setup Environment
+description: Setup Python, uv, and install dev dependencies
+
+runs:
+  using: composite
+  steps:
+    - name: Setup uv
+      uses: astral-sh/setup-uv@v3
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: "3.13"
+
+    - name: Install dependencies
+      shell: bash
+      run: uv pip install --system -e ".[dev]"
```

.github/workflows/ci.yml

Lines changed: 20 additions & 22 deletions
```diff
@@ -6,51 +6,49 @@ on:
     branches: [main]
 
 jobs:
-  pre-commit:
+  lint:
     runs-on: ubuntu-latest
+    env:
+      UV_NO_SYNC: 1
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
 
-      - name: Setup uv
-        uses: astral-sh/setup-uv@v3
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.13"
-
-      - name: Install dependencies
-        run: uv pip install --system -e ".[dev]"
+      - name: Setup environment
+        uses: ./.github/actions/setup-environment
 
       - name: Install pre-commit
         run: uv pip install --system pre-commit
 
-      - name: Run pre-commit
+      - name: Run linting
         run: SKIP=mypy pre-commit run --all-files
 
+  typecheck:
+    runs-on: ubuntu-latest
+    env:
+      UV_NO_SYNC: 1
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Setup environment
+        uses: ./.github/actions/setup-environment
+
       - name: Run mypy
         run: uv run mypy src/ tests/ --exclude 'tests/benchmarks/appworld/'
 
   test:
     runs-on: ubuntu-latest
     env:
       CI: 1
+      UV_NO_SYNC: 1
 
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
 
-      - name: Setup uv
-        uses: astral-sh/setup-uv@v3
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.13"
-
-      - name: Install dependencies
-        run: uv pip install --system -e ".[dev]"
+      - name: Setup environment
+        uses: ./.github/actions/setup-environment
 
       - name: Run unit tests
         run: uv run pytest tests/unit/ -v -p no:warnings
```

docs/evals.md

Lines changed: 15 additions & 0 deletions
````diff
@@ -87,6 +87,21 @@ Validate existing logs without running new tests:
 .venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py --validate-only --log-dir outputs/experiment1/raw
 ```
 
+### Parallel Execution
+
+Run tests in parallel using multiple workers:
+
+```bash
+# Run with 4 workers
+.venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py -n 4
+
+# Run with 8 workers
+.venv/bin/pytest tests/benchmarks/appworld/test_appworld.py --dataset train -n 8
+
+# Auto-detect number of CPUs
+.venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py -n auto
+```
+
 ## Further Reading
 
 ### BFCL
````
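The `-n auto` form asks pytest-xdist to choose the number of worker processes from the machine's CPUs. The sketch below is a rough illustration only: `resolve_worker_count` is a hypothetical helper, not part of pytest-xdist, and it assumes `auto` means the logical CPU count from `os.cpu_count()`. pytest-xdist may count physical cores instead when `psutil` is installed, so the real number can be lower.

```python
import os


def resolve_worker_count(n: str) -> int:
    """Hypothetical helper: map an ``-n`` value to a number of worker processes.

    Assumption: "auto" falls back to the logical CPU count; pytest-xdist may
    prefer physical cores when psutil is available.
    """
    if n == "auto":
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    count = int(n)
    if count < 0:
        raise ValueError(f"worker count must be non-negative, got {count}")
    return count


if __name__ == "__main__":
    for value in ("4", "8", "auto"):
        print(value, "->", resolve_worker_count(value))
```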

pyproject.toml

Lines changed: 4 additions & 0 deletions
```diff
@@ -27,6 +27,8 @@ where = ["src"]
 dev = [
     "pytest>=7.0",
     "pytest-asyncio>=0.21",
+    "pytest-xdist>=3.0",
+    "pytest-timeout>=2.0",
     "ruff",
     "mypy",
 ]
@@ -86,6 +88,8 @@ filterwarnings = [
     "ignore::sqlalchemy.exc.SADeprecationWarning:appworld.apps.lib.models.db",
     "ignore::pydantic.warnings.PydanticDeprecatedSince20:appworld.apps.*",
 ]
+timeout = 0
+timeout_method = "thread" # Use thread-based timeout for better async compatibility
 
 [tool.mypy]
 warn_return_any = true
```
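`timeout_method = "thread"` selects pytest-timeout's timer-thread mechanism instead of the `SIGALRM`-based `signal` method (the plugin's default where `SIGALRM` is available). The following is a simplified, hypothetical illustration of the idea, not the plugin's code: `run_with_watchdog` is an invented name, and where pytest-timeout's thread method dumps all thread stacks and terminates the process on expiry, this sketch only records that the deadline passed and lets the function finish.

```python
import threading
import time
from typing import Callable


def run_with_watchdog(func: Callable[[], object], timeout: float) -> bool:
    """Run ``func`` under a timer-thread watchdog; return True if the deadline passed.

    Simplified illustration: the watchdog cannot interrupt ``func``; it only
    notices the overrun. pytest-timeout's thread method would instead dump
    stacks and terminate the whole process.
    """
    fired = threading.Event()
    # The timer runs on its own thread, so no signal handler is involved.
    watchdog = threading.Timer(timeout, fired.set)
    watchdog.daemon = True
    watchdog.start()
    try:
        func()
    finally:
        # Cancelling is harmless if the timer has already fired.
        watchdog.cancel()
    return fired.is_set()


if __name__ == "__main__":
    print(run_with_watchdog(lambda: time.sleep(0.01), timeout=1.0))   # fast test
    print(run_with_watchdog(lambda: time.sleep(0.3), timeout=0.05))   # slow test
```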

tests/benchmarks/appworld/README.md

Lines changed: 9 additions & 0 deletions
````diff
@@ -54,6 +54,15 @@ pytest tests/benchmarks/appworld/test_appworld.py --validate-only
 --temperature 0.001 # Temperature for sampling (default: 0.001)
 ```
 
+### Parallel Execution
+```bash
+-n 4 # Run with 4 workers
+-n 8 # Run with 8 workers
+-n auto # Auto-detect number of CPUs
+```
+
+Example: `pytest tests/benchmarks/appworld/test_appworld.py --dataset train -n 4`
+
 ## File Structure
 
 ```
````

tests/benchmarks/appworld/test_appworld.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -50,6 +50,7 @@ def pytest_generate_tests(metafunc: pytest.Metafunc) -> None:
 
 
 @pytest.mark.asyncio
+@pytest.mark.timeout(300)
 async def test_appworld(
     task_id: str,
     model: str,
```
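Combined with `timeout = 0` in `pyproject.toml`, this marker gives only `test_appworld` a 300-second limit: pytest-timeout treats 0 as "no timeout", and a `timeout` marker on a test overrides the global setting. The hypothetical helper below mirrors that rule as a sketch; it is not pytest-timeout's code and deliberately ignores the `--timeout` command-line option and the `PYTEST_TIMEOUT` environment variable, which the plugin also accepts.

```python
from typing import Optional


def effective_timeout(ini_timeout: float, marker_timeout: Optional[float] = None) -> Optional[float]:
    """Hypothetical helper: resolve a test's timeout in seconds.

    Assumed rule: a per-test marker value wins over the ini value, and 0
    means "no timeout" (returned here as None).
    """
    value = marker_timeout if marker_timeout is not None else ini_timeout
    return value if value > 0 else None


if __name__ == "__main__":
    # pyproject.toml sets timeout = 0; test_appworld adds @pytest.mark.timeout(300).
    print(effective_timeout(0, 300))  # test_appworld: prints 300
    print(effective_timeout(0))       # every other test: prints None
```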

tests/conftest.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -50,7 +50,6 @@ def pytest_addoption(parser: pytest.Parser) -> None:
     parser.addoption("--output-dir", default="outputs", help="Output directory for results")
     parser.addoption("--validate-only", action="store_true", help="Only validate existing logs")
     parser.addoption("--log-dir", default="outputs/raw", help="Directory with logs (for validate mode)")
-    parser.addoption("--max-workers", default=4, type=int, help="Max concurrent tests (default: 4)")
 
 
 def pytest_configure(config: pytest.Config) -> None:
```
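With `--max-workers` removed, the degree of parallelism is set by pytest-xdist's `-n` option, which runs tests in separate worker processes. If test code needs to know which worker it is on, pytest-xdist sets the `PYTEST_XDIST_WORKER` (for example `gw0`) and `PYTEST_XDIST_WORKER_COUNT` environment variables in each worker. A minimal sketch, where `xdist_worker_info` and its `"main"` fallback are hypothetical choices of this example:

```python
import os
from typing import Mapping, Optional, Tuple


def xdist_worker_info(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Hypothetical helper: return (worker id, worker count) for this process.

    Outside pytest-xdist neither variable is set, and this sketch reports
    ("main", 1) as its own default.
    """
    env = os.environ if env is None else env
    worker = env.get("PYTEST_XDIST_WORKER", "main")
    count = int(env.get("PYTEST_XDIST_WORKER_COUNT", "1"))
    return worker, count


if __name__ == "__main__":
    print(xdist_worker_info())
```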
