Commit 8e653c4

mldangelo and claude authored
feat: add smoke tests for CLI integration testing (#14)
* feat: add smoke tests for CLI integration testing

  - Add smoke tests that verify end-to-end CLI functionality
  - Test basic CLI operations (--version, --help, error handling)
  - Test eval command with echo provider (no external dependencies)
  - Test output formats (JSON, YAML, CSV)
  - Test CLI flags (--repeat, --max-concurrency, --verbose, --no-cache)
  - Test exit codes (0 for success, 100 for failures, 1 for errors)
  - Test assertions (contains, icontains, failing assertions)
  - Add pytest configuration with 'smoke' marker for selective testing
  - Add comprehensive README documenting smoke test purpose and usage

  Total: 20 smoke tests, all passing ✅

  Smoke tests run against the installed promptfoo CLI via subprocess,
  testing the Python wrapper integration with the Node.js CLI.

  Run smoke tests:
    pytest tests/smoke/           # Run all smoke tests
    pytest tests/ -m smoke        # Run only smoke-marked tests
    pytest tests/ -m 'not smoke'  # Skip smoke tests (unit tests only)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* ci: run unit tests and smoke tests in CI

  Previously the CI was only testing CLI invocation but not running pytest.

  Changes:
  - Install dev dependencies (pytest, mypy, ruff) in test jobs
  - Run unit tests with: pytest tests/ -v -m 'not smoke'
  - Run smoke tests with: pytest tests/smoke/ -v
  - Both 'test' and 'test-npx-fallback' jobs now run full test suite

  This ensures:
  ✅ Unit tests run on all platforms (ubuntu, windows) and Python versions (3.9, 3.13)
  ✅ Smoke tests verify end-to-end CLI functionality
  ✅ Both global install and npx fallback paths are tested

* fix: use Optional for Python 3.9 compatibility in smoke tests

* fix: make platform-specific tests work on both Unix and Windows

  - Split test_split_path into platform-specific versions (Unix/Windows)
  - Split test_find_external_promptfoo_prevents_recursion for platform paths
  - Use platform-appropriate node path in test_main_exits_when_neither_external_nor_npx_available
  - Tests now skip appropriately on incompatible platforms

* fix: increase smoke test timeout for npx fallback scenarios

  The first npx call can be slow as it downloads promptfoo.
  Increased timeout from 60s to 120s to accommodate this.

* fix: handle None stdout/stderr in smoke tests

  Add safety checks for None values from subprocess.run() output,
  which can occur on Windows in certain error conditions.

* fix: address linting issues and add temp output to gitignore

  - Fix line too long (123 > 120) in test_cli.py
  - Run ruff format on test files
  - Add tests/smoke/.temp-output/ to .gitignore

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update AGENTS.md with smoke test documentation

  - Add comprehensive testing strategy section with unit vs smoke tests
  - Document test directory structure
  - Add smoke test details and commands
  - Update CI/CD section to mention both test types
  - Update project structure to include tests directory

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: add return type annotations and fix documentation wording

  - Add `-> None` return type annotations to all smoke test methods
  - Add Generator return type to setup_and_teardown fixture
  - Update documentation to clarify tests run via Python wrapper (not just npx)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve Windows CI test failures

  - Add os.path.isfile mock to unit test to prevent _find_windows_promptfoo()
    from finding real promptfoo installations on Windows CI runners
  - Add UTF-8 encoding with error replacement to smoke tests to handle
    Windows cp1252 encoding issues with npx output
  - Add warmup_npx fixture to pre-download promptfoo via npx before tests,
    preventing timeout on first test when npx needs to download package

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: mock telemetry in CLI unit tests

  Add record_wrapper_used mock to tests that mock subprocess.run to prevent
  PostHog telemetry calls from interfering with mock call counts.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 80b5c67 commit 8e653c4
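
The commit message describes three defensive measures in the smoke-test subprocess calls: a 120-second timeout, UTF-8 decoding with error replacement, and guarding against `None` stdout/stderr. The body of `test_smoke.py` is not shown on this page, so the following is only a minimal sketch of how those measures could be combined. The helper name `run_cli` and its return shape are hypothetical, not taken from the commit.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_cli(
    args: List[str],
    cwd: Optional[str] = None,  # Optional[...] keeps this valid on Python 3.9
    timeout: int = 120,  # generous, since a first npx call may download the package
) -> Tuple[str, str, int]:
    """Run a command and return (stdout, stderr, exit_code) as plain strings."""
    result = subprocess.run(
        args,
        cwd=cwd,
        capture_output=True,
        text=True,
        encoding="utf-8",  # avoid platform defaults such as cp1252 on Windows
        errors="replace",  # undecodable bytes become U+FFFD instead of raising
        timeout=timeout,
    )
    # Normalize None to "" so callers can always use `in` checks on the output.
    return (result.stdout or "", result.stderr or "", result.returncode)


if __name__ == "__main__":
    # Hypothetical usage: any executable works; the Python interpreter is used
    # here so the sketch does not depend on promptfoo being installed.
    out, err, code = run_cli([sys.executable, "-c", "print('hello')"])
    print(code, out.strip())
```

Returning plain strings rather than the `CompletedProcess` object keeps assertions in tests short, at the cost of hiding the original command from the caller.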

11 files changed: +731 -16 lines changed

.github/workflows/test.yml

Lines changed: 16 additions & 4 deletions
@@ -126,8 +126,14 @@ jobs:
       - name: Pin Python version
         run: uv python pin ${{ matrix.python-version }}

-      - name: Install package
-        run: uv sync
+      - name: Install package with dev dependencies
+        run: uv sync --extra dev
+
+      - name: Run unit tests
+        run: uv run pytest tests/ -v -m 'not smoke'
+
+      - name: Run smoke tests
+        run: uv run pytest tests/smoke/ -v

       - name: Test CLI can be invoked
         run: uv run promptfoo --version
@@ -192,8 +198,14 @@ jobs:
       - name: Pin Python version
         run: uv python pin ${{ matrix.python-version }}

-      - name: Install package
-        run: uv sync
+      - name: Install package with dev dependencies
+        run: uv sync --extra dev
+
+      - name: Run unit tests
+        run: uv run pytest tests/ -v -m 'not smoke'
+
+      - name: Run smoke tests (with npx fallback)
+        run: uv run pytest tests/smoke/ -v

       - name: Test CLI fallback to npx (no global install)
         run: uv run promptfoo --version

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ htmlcov/
4242
.tox/
4343
.mypy_cache/
4444
.ruff_cache/
45+
tests/smoke/.temp-output/
4546

4647
# Distribution
4748
dist/
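
The newly ignored `tests/smoke/.temp-output/` directory, together with the `setup_and_teardown` fixture mentioned in the commit message, suggests that smoke tests write output files to a scratch directory that is removed afterwards. The fixture itself is not shown on this page, so the sketch below only illustrates the pattern with a plain context manager. The name `temp_output_dir` and its parameters are hypothetical.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def temp_output_dir(base: Path, name: str = ".temp-output") -> Iterator[Path]:
    """Create base/name for test artifacts and remove it when the block exits."""
    out = base / name
    out.mkdir(parents=True, exist_ok=True)
    try:
        yield out
    finally:
        # Clean up even if the test body raised an exception.
        shutil.rmtree(out, ignore_errors=True)
```

Ignoring the directory in `.gitignore` still matters with cleanup in place, because an interrupted test run can leave files behind.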

AGENTS.md

Lines changed: 70 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -135,9 +135,12 @@ Runs on every PR and push to main:
135135
- **Lint**: Ruff linting (`uv run ruff check src/`)
136136
- **Format Check**: Ruff formatting (`uv run ruff format --check src/`)
137137
- **Type Check**: mypy static analysis (`uv run mypy src/promptfoo/`)
138-
- **Tests**: pytest on multiple Python versions (3.9, 3.13) and OSes (Ubuntu, Windows)
138+
- **Unit Tests**: Fast tests with mocked dependencies (`uv run pytest -m 'not smoke'`)
139+
- **Smoke Tests**: Integration tests against real CLI (`uv run pytest tests/smoke/`)
139140
- **Build**: Package build validation
140141

142+
Tests run on multiple Python versions (3.9, 3.13) and OSes (Ubuntu, Windows).
143+
141144
### Release Workflow (`.github/workflows/release-please.yml`)
142145

143146
Triggered on push to main:
@@ -214,7 +217,38 @@ uv run pytest

 ### Test Structure

-Tests are located in the root directory (not yet created, but should be in `tests/` when added).
+Tests are organized in the `tests/` directory:
+
+```
+tests/
+├── __init__.py
+├── test_cli.py # Unit tests for CLI wrapper logic
+├── test_environment.py # Unit tests for environment detection
+├── test_instructions.py # Unit tests for installation instructions
+└── smoke/
+    ├── __init__.py
+    ├── README.md # Smoke test documentation
+    ├── test_smoke.py # Integration tests against real CLI
+    └── fixtures/
+        └── configs/ # YAML configs for smoke tests
+            ├── basic.yaml
+            ├── assertions.yaml
+            └── failing-assertion.yaml
+```
+
+### Test Types
+
+**Unit Tests** (`tests/test_*.py`):
+- Fast, isolated tests for individual functions
+- Mock external dependencies
+- Run on every PR
+
+**Smoke Tests** (`tests/smoke/`):
+- Integration tests that run the actual CLI via subprocess
+- Use the `echo` provider (no external API dependencies)
+- Test the full Python → Node.js integration
+- Slower but verify end-to-end functionality
+- Marked with `@pytest.mark.smoke`

 ### Test Matrix

@@ -229,16 +263,36 @@ CI tests across:
 # Install dependencies with dev extras
 uv sync --extra dev

-# Run all tests
+# Run all tests (unit + smoke)
 uv run pytest

+# Run only unit tests (fast)
+uv run pytest -m 'not smoke'
+
+# Run only smoke tests (slow, requires Node.js)
+uv run pytest tests/smoke/
+
 # Run with coverage
 uv run pytest --cov=src/promptfoo

+# Run specific test class
+uv run pytest tests/test_cli.py::TestMainFunction
+
 # Run specific test
-uv run pytest tests/test_cli.py::test_wrapper_detection
+uv run pytest tests/smoke/test_smoke.py::TestEvalCommand::test_basic_eval
 ```

+### Smoke Test Details
+
+Smoke tests verify critical CLI functionality:
+- **Basic CLI**: `--version`, `--help`, unknown commands, missing files
+- **Eval Command**: Output formats (JSON, YAML, CSV), flags (`--repeat`, `--verbose`)
+- **Exit Codes**: 0 for success, 100 for assertion failures, 1 for errors
+- **Echo Provider**: Variable substitution, multiple variables
+- **Assertions**: `contains`, `icontains`, failing assertions
+
+The smoke tests use a 120-second timeout to accommodate the first `npx` call which downloads promptfoo.
+
 ## Security Practices

 ### 1. No Credentials in Repository
@@ -365,14 +419,23 @@ promptfoo-python/
 ├── src/
 │   └── promptfoo/
 │       ├── __init__.py # Package exports
-│       └── cli.py # Main wrapper implementation
+│       ├── cli.py # Main wrapper implementation
+│       ├── environment.py # Environment detection
+│       └── instructions.py # Node.js installation instructions
+├── tests/
+│   ├── test_cli.py # Unit tests for CLI
+│   ├── test_environment.py # Unit tests for environment detection
+│   ├── test_instructions.py # Unit tests for instructions
+│   └── smoke/
+│       ├── test_smoke.py # Integration smoke tests
+│       └── fixtures/configs/ # Test configuration files
 ├── AGENTS.md # This file (agent documentation)
 ├── CHANGELOG.md # Auto-generated by release-please
 ├── CLAUDE.md # Points to AGENTS.md
 ├── LICENSE # MIT License
 ├── README.md # User-facing documentation
 ├── pyproject.toml # Package configuration
-├── release-please-config.json # Release-please configuration
+├── release-please-config.json # Release-please configuration
 └── .release-please-manifest.json # Release version tracking
 ```

@@ -443,5 +506,5 @@ git push --force

 ---

-**Last Updated**: 2026-01-05
+**Last Updated**: 2026-01-11
 **Maintained By**: @promptfoo/engineering

pyproject.toml

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,3 +102,16 @@ show_error_codes = true
102102
pretty = true
103103
check_untyped_defs = true
104104
disallow_incomplete_defs = true
105+
106+
[tool.pytest.ini_options]
107+
testpaths = ["tests"]
108+
python_files = ["test_*.py"]
109+
python_classes = ["Test*"]
110+
python_functions = ["test_*"]
111+
addopts = [
112+
"-v",
113+
"--strict-markers",
114+
]
115+
markers = [
116+
"smoke: smoke tests that run the full CLI (slow, requires Node.js)",
117+
]

tests/smoke/README.md

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
# Smoke Tests
2+
3+
These smoke tests verify that the core promptfoo CLI functionality works correctly through the Python wrapper.
4+
5+
## What are Smoke Tests?
6+
7+
Smoke tests are high-level integration tests that verify the most critical functionality works end-to-end. They:
8+
9+
- Run against the actual installed CLI via the Python wrapper (using either global promptfoo or npx)
10+
- Test the Python wrapper integration with the Node.js CLI
11+
- Use the `echo` provider to avoid external API dependencies
12+
- Verify command-line arguments, file I/O, and output formats
13+
- Check exit codes and error handling
14+
15+
## Running Smoke Tests
16+
17+
```bash
18+
# Run all smoke tests
19+
pytest tests/smoke/
20+
21+
# Run with verbose output
22+
pytest tests/smoke/ -v
23+
24+
# Run a specific test class
25+
pytest tests/smoke/test_smoke.py::TestEvalCommand
26+
27+
# Run a specific test
28+
pytest tests/smoke/test_smoke.py::TestEvalCommand::test_basic_eval
29+
```
30+
31+
## Test Structure
32+
33+
- `test_smoke.py` - Main smoke test suite
34+
- `fixtures/` - Test configuration files
35+
- `configs/` - YAML configuration files for testing
36+
37+
## Test Coverage
38+
39+
### Basic CLI Operations
40+
- Version flag (`--version`)
41+
- Help output (`--help`, `eval --help`)
42+
- Unknown command handling
43+
- Missing file error handling
44+
45+
### Eval Command
46+
- Basic evaluation with echo provider
47+
- Output formats (JSON, YAML, CSV)
48+
- Command-line flags (`--max-concurrency`, `--repeat`, `--verbose`)
49+
- Cache control (`--no-cache`)
50+
51+
### Exit Codes
52+
- Exit code 0 for success
53+
- Exit code 100 for assertion failures
54+
- Exit code 1 for configuration errors
55+
56+
### Echo Provider
57+
- Basic prompt echoing
58+
- Variable substitution
59+
- Multiple variable handling
60+
61+
### Assertions
62+
- `contains` assertion
63+
- `icontains` assertion (case-insensitive)
64+
- Multiple assertions per test
65+
- Failing assertion behavior
66+
67+
## Why Echo Provider?
68+
69+
The `echo` provider is perfect for smoke tests because:
70+
71+
1. **No external dependencies** - Doesn't require API keys or network calls
72+
2. **Deterministic** - Always returns the same output for the same input
73+
3. **Fast** - No network latency
74+
4. **Predictable** - Easy to write assertions against
75+
76+
## Adding New Smoke Tests
77+
78+
1. Create a new test config in `fixtures/configs/` if needed
79+
2. Add test methods to the appropriate test class in `test_smoke.py`
80+
3. Use the `run_promptfoo()` helper to execute CLI commands
81+
4. Make assertions on stdout, stderr, exit codes, and output files
82+
83+
## Notes
84+
85+
- Smoke tests run slower than unit tests (they spawn subprocesses)
86+
- They require Node.js and promptfoo to be installed
87+
- They test the integration between Python and Node.js
88+
- They should be kept focused on critical functionality
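
The README's "Adding New Smoke Tests" steps name a `run_promptfoo()` helper, but its signature is not shown on this page. As a rough illustration of step 3, the sketch below builds the argument list such a helper might receive for an `eval` run. The function `build_eval_args` is hypothetical, and the short flags `-c` (config) and `-o` (output) are assumptions about the promptfoo CLI. The long flags are the ones the README lists.

```python
from typing import List, Optional


def build_eval_args(
    config: str,
    output: Optional[str] = None,
    repeat: Optional[int] = None,
    max_concurrency: Optional[int] = None,
    no_cache: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build the argument list for an `eval` invocation (sketch, not the repo's code)."""
    args = ["eval", "-c", config]  # `-c` is assumed to select the config file
    if output is not None:
        args += ["-o", output]  # `-o` is assumed to select the output file
    if repeat is not None:
        args += ["--repeat", str(repeat)]
    if max_concurrency is not None:
        args += ["--max-concurrency", str(max_concurrency)]
    if no_cache:
        args.append("--no-cache")
    if verbose:
        args.append("--verbose")
    return args
```

Keeping argument construction separate from process execution means the flag logic can be checked without Node.js installed, while the smoke tests themselves still exercise the real CLI.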

tests/smoke/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
"""Smoke tests for promptfoo CLI."""
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
+description: 'Smoke test - multiple assertions'
+
+providers:
+  - echo
+
+prompts:
+  - 'Hello {{name}}, welcome to {{place}}'
+
+tests:
+  - vars:
+      name: Alice
+      place: Wonderland
+    assert:
+      - type: contains
+        value: Hello
+      - type: contains
+        value: Alice
+      - type: contains
+        value: Wonderland
+      - type: icontains
+        value: WELCOME
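
To see why all four assertions in the config above are expected to pass, it can help to model the behaviour in a few lines of Python. This is a toy model, not promptfoo's implementation: it assumes the `echo` provider returns the rendered prompt unchanged and that `{{var}}` placeholders are plain substitutions. The function names are hypothetical.

```python
import re
from typing import Dict


def render(template: str, variables: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder; raises KeyError if a variable is missing."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: variables[m.group(1)], template)


def contains(output: str, value: str) -> bool:
    """Case-sensitive substring check."""
    return value in output


def icontains(output: str, value: str) -> bool:
    """Case-insensitive substring check."""
    return value.lower() in output.lower()


# Under the echo assumption, the provider output is just the rendered prompt.
output = render(
    "Hello {{name}}, welcome to {{place}}",
    {"name": "Alice", "place": "Wonderland"},
)
results = [
    contains(output, "Hello"),
    contains(output, "Alice"),
    contains(output, "Wonderland"),
    icontains(output, "WELCOME"),
]
```

In this model a case-sensitive `contains` check for `WELCOME` would fail, which is why the config uses `icontains` for that value.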
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
+description: 'Smoke test - basic config validation'
+
+providers:
+  - echo
+
+prompts:
+  - 'Hello {{name}}'
+
+tests:
+  - vars:
+      name: World
+    assert:
+      - type: contains
+        value: Hello
+      - type: contains
+        value: World
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
+description: 'Smoke test - config with failing assertion'
+
+providers:
+  - echo
+
+prompts:
+  - 'Hello {{name}}'
+
+tests:
+  - vars:
+      name: World
+    assert:
+      # This assertion will fail because echo returns "Hello World"
+      # but we're asserting it contains "IMPOSSIBLE_STRING_NOT_IN_OUTPUT"
+      - type: contains
+        value: IMPOSSIBLE_STRING_NOT_IN_OUTPUT_12345
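
The failing-assertion config above exists to exercise the exit-code contract stated in the commit: 0 for success, 100 for assertion failures, and 1 for errors. A smoke test can make that contract explicit with a small lookup, sketched below. The function name and the returned labels are hypothetical; only the three numeric codes come from the commit.

```python
def describe_exit_code(code: int) -> str:
    """Map a CLI exit code to the outcome documented in this commit."""
    if code == 0:
        return "success"
    if code == 100:
        return "assertion failure"  # the eval ran, but at least one assertion failed
    if code == 1:
        return "error"  # for example a missing or invalid config file
    return "unexpected"  # anything else is outside the documented contract
```

Distinguishing 100 from 1 lets a test tell "the CLI ran and reported a failed assertion" apart from "the CLI could not run the evaluation at all".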
