feat: add smoke tests for CLI integration testing (#14)

mldangelo · claude · web-flow · commit 8e653c44bee6 · 2026-01-11T04:18:47.000-05:00
* feat: add smoke tests for CLI integration testing

  - Add smoke tests that verify end-to-end CLI functionality
  - Test basic CLI operations (--version, --help, error handling)
  - Test eval command with echo provider (no external dependencies)
  - Test output formats (JSON, YAML, CSV)
  - Test CLI flags (--repeat, --max-concurrency, --verbose, --no-cache)
  - Test exit codes (0 for success, 100 for failures, 1 for errors)
  - Test assertions (contains, icontains, failing assertions)
  - Add pytest configuration with 'smoke' marker for selective testing
  - Add comprehensive README documenting smoke test purpose and usage

  Total: 20 smoke tests, all passing ✅

  Smoke tests run against the installed promptfoo CLI via subprocess, testing the Python wrapper integration with the Node.js CLI.

  Run smoke tests:

    pytest tests/smoke/           # Run all smoke tests
    pytest tests/ -m smoke        # Run only smoke-marked tests
    pytest tests/ -m 'not smoke'  # Skip smoke tests (unit tests only)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* ci: run unit tests and smoke tests in CI

  Previously the CI was only testing CLI invocation but not running pytest.

  Changes:
  - Install dev dependencies (pytest, mypy, ruff) in test jobs
  - Run unit tests with: pytest tests/ -v -m 'not smoke'
  - Run smoke tests with: pytest tests/smoke/ -v
  - Both 'test' and 'test-npx-fallback' jobs now run full test suite

  This ensures:
  ✅ Unit tests run on all platforms (ubuntu, windows) and Python versions (3.9, 3.13)
  ✅ Smoke tests verify end-to-end CLI functionality
  ✅ Both global install and npx fallback paths are tested

* fix: use Optional for Python 3.9 compatibility in smoke tests

* fix: make platform-specific tests work on both Unix and Windows

  - Split test_split_path into platform-specific versions (Unix/Windows)
  - Split test_find_external_promptfoo_prevents_recursion for platform paths
  - Use platform-appropriate node path in test_main_exits_when_neither_external_nor_npx_available
  - Tests now skip appropriately on incompatible platforms

* fix: increase smoke test timeout for npx fallback scenarios

  The first npx call can be slow as it downloads promptfoo. Increased timeout from 60s to 120s to accommodate this.

* fix: handle None stdout/stderr in smoke tests

  Add safety checks for None values from subprocess.run() output, which can occur on Windows in certain error conditions.

* fix: address linting issues and add temp output to gitignore

  - Fix line too long (123 > 120) in test_cli.py
  - Run ruff format on test files
  - Add tests/smoke/.temp-output/ to .gitignore

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update AGENTS.md with smoke test documentation

  - Add comprehensive testing strategy section with unit vs smoke tests
  - Document test directory structure
  - Add smoke test details and commands
  - Update CI/CD section to mention both test types
  - Update project structure to include tests directory

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: add return type annotations and fix documentation wording

  - Add `-> None` return type annotations to all smoke test methods
  - Add Generator return type to setup_and_teardown fixture
  - Update documentation to clarify tests run via Python wrapper (not just npx)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve Windows CI test failures

  - Add os.path.isfile mock to unit test to prevent _find_windows_promptfoo() from finding real promptfoo installations on Windows CI runners
  - Add UTF-8 encoding with error replacement to smoke tests to handle Windows cp1252 encoding issues with npx output
  - Add warmup_npx fixture to pre-download promptfoo via npx before tests, preventing timeout on first test when npx needs to download package

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: mock telemetry in CLI unit tests

  Add record_wrapper_used mock to tests that mock subprocess.run to prevent PostHog telemetry calls from interfering with mock call counts.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
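
The body of tests/smoke/test_smoke.py is not included in this diff, so the following is only a hypothetical sketch of how a subprocess helper could combine three of the fixes named in the commit message: the 120-second timeout, UTF-8 decoding with error replacement, and the guard against None stdout/stderr. The name `run_cli_sketch`, its `base_cmd` parameter, and the assumption that the wrapper is reachable as `promptfoo` on PATH are illustrative and not taken from the actual test suite.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# The commit message says the timeout was raised from 60s to 120s.
DEFAULT_TIMEOUT = 120


def run_cli_sketch(
    args: List[str],
    base_cmd: Optional[List[str]] = None,  # Optional[...] keeps Python 3.9 happy
    timeout: int = DEFAULT_TIMEOUT,
) -> Tuple[str, str, int]:
    """Run a command and return (stdout, stderr, exit_code)."""
    # Assumption: the Python wrapper is installed as `promptfoo` on PATH.
    cmd = (base_cmd if base_cmd is not None else ["promptfoo"]) + args
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        encoding="utf-8",   # avoid cp1252 decoding problems on Windows
        errors="replace",   # replace undecodable bytes instead of raising
        timeout=timeout,
    )
    # Normalize None to "" so callers can always do substring checks.
    return result.stdout or "", result.stderr or "", result.returncode


if __name__ == "__main__":
    # Demonstrated with the Python interpreter so no Node.js is needed.
    out, err, code = run_cli_sketch(["-c", "print('ok')"], base_cmd=[sys.executable])
    print(code, out.strip())
```

A smoke test built on such a helper would call it with the default `base_cmd`, for example with `["--version"]`, and assert on the returned exit code and output.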
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -126,8 +126,14 @@ jobs:
       - name: Pin Python version
         run: uv python pin ${{ matrix.python-version }}
 
-      - name: Install package
-        run: uv sync
+      - name: Install package with dev dependencies
+        run: uv sync --extra dev
+
+      - name: Run unit tests
+        run: uv run pytest tests/ -v -m 'not smoke'
+
+      - name: Run smoke tests
+        run: uv run pytest tests/smoke/ -v
 
       - name: Test CLI can be invoked
         run: uv run promptfoo --version
@@ -192,8 +198,14 @@ jobs:
       - name: Pin Python version
         run: uv python pin ${{ matrix.python-version }}
 
-      - name: Install package
-        run: uv sync
+      - name: Install package with dev dependencies
+        run: uv sync --extra dev
+
+      - name: Run unit tests
+        run: uv run pytest tests/ -v -m 'not smoke'
+
+      - name: Run smoke tests (with npx fallback)
+        run: uv run pytest tests/smoke/ -v
 
       - name: Test CLI fallback to npx (no global install)
         run: uv run promptfoo --version
diff --git a/.gitignore b/.gitignore
@@ -42,6 +42,7 @@ htmlcov/
 .tox/
 .mypy_cache/
 .ruff_cache/
+tests/smoke/.temp-output/
 
 # Distribution
 dist/
diff --git a/AGENTS.md b/AGENTS.md
@@ -135,9 +135,12 @@ Runs on every PR and push to main:
 - **Lint**: Ruff linting (`uv run ruff check src/`)
 - **Format Check**: Ruff formatting (`uv run ruff format --check src/`)
 - **Type Check**: mypy static analysis (`uv run mypy src/promptfoo/`)
-- **Tests**: pytest on multiple Python versions (3.9, 3.13) and OSes (Ubuntu, Windows)
+- **Unit Tests**: Fast tests with mocked dependencies (`uv run pytest -m 'not smoke'`)
+- **Smoke Tests**: Integration tests against real CLI (`uv run pytest tests/smoke/`)
 - **Build**: Package build validation
 
+Tests run on multiple Python versions (3.9, 3.13) and OSes (Ubuntu, Windows).
+
 ### Release Workflow (`.github/workflows/release-please.yml`)
 
 Triggered on push to main:
@@ -214,7 +217,38 @@ uv run pytest
 
 ### Test Structure
 
-Tests are located in the root directory (not yet created, but should be in `tests/` when added).
+Tests are organized in the `tests/` directory:
+
+```
+tests/
+├── __init__.py
+├── test_cli.py              # Unit tests for CLI wrapper logic
+├── test_environment.py      # Unit tests for environment detection
+├── test_instructions.py     # Unit tests for installation instructions
+└── smoke/
+    ├── __init__.py
+    ├── README.md            # Smoke test documentation
+    ├── test_smoke.py        # Integration tests against real CLI
+    └── fixtures/
+        └── configs/         # YAML configs for smoke tests
+            ├── basic.yaml
+            ├── assertions.yaml
+            └── failing-assertion.yaml
+```
+
+### Test Types
+
+**Unit Tests** (`tests/test_*.py`):
+- Fast, isolated tests for individual functions
+- Mock external dependencies
+- Run on every PR
+
+**Smoke Tests** (`tests/smoke/`):
+- Integration tests that run the actual CLI via subprocess
+- Use the `echo` provider (no external API dependencies)
+- Test the full Python → Node.js integration
+- Slower but verify end-to-end functionality
+- Marked with `@pytest.mark.smoke`
 
 ### Test Matrix
 
@@ -229,16 +263,36 @@ CI tests across:
 # Install dependencies with dev extras
 uv sync --extra dev
 
-# Run all tests
+# Run all tests (unit + smoke)
 uv run pytest
 
+# Run only unit tests (fast)
+uv run pytest -m 'not smoke'
+
+# Run only smoke tests (slow, requires Node.js)
+uv run pytest tests/smoke/
+
 # Run with coverage
 uv run pytest --cov=src/promptfoo
 
+# Run specific test class
+uv run pytest tests/test_cli.py::TestMainFunction
+
 # Run specific test
-uv run pytest tests/test_cli.py::test_wrapper_detection
+uv run pytest tests/smoke/test_smoke.py::TestEvalCommand::test_basic_eval
 ```
 
+### Smoke Test Details
+
+Smoke tests verify critical CLI functionality:
+- **Basic CLI**: `--version`, `--help`, unknown commands, missing files
+- **Eval Command**: Output formats (JSON, YAML, CSV), flags (`--repeat`, `--verbose`)
+- **Exit Codes**: 0 for success, 100 for assertion failures, 1 for errors
+- **Echo Provider**: Variable substitution, multiple variables
+- **Assertions**: `contains`, `icontains`, failing assertions
+
+The smoke tests use a 120-second timeout to accommodate the first `npx` call, which can be slow because it downloads promptfoo.
+
 ## Security Practices
 
 ### 1. No Credentials in Repository
@@ -365,14 +419,23 @@ promptfoo-python/
 ├── src/
 │   └── promptfoo/
 │       ├── __init__.py         # Package exports
-│       └── cli.py              # Main wrapper implementation
+│       ├── cli.py              # Main wrapper implementation
+│       ├── environment.py      # Environment detection
+│       └── instructions.py     # Node.js installation instructions
+├── tests/
+│   ├── test_cli.py             # Unit tests for CLI
+│   ├── test_environment.py     # Unit tests for environment detection
+│   ├── test_instructions.py    # Unit tests for instructions
+│   └── smoke/
+│       ├── test_smoke.py       # Integration smoke tests
+│       └── fixtures/configs/   # Test configuration files
 ├── AGENTS.md                   # This file (agent documentation)
 ├── CHANGELOG.md                # Auto-generated by release-please
 ├── CLAUDE.md                   # Points to AGENTS.md
 ├── LICENSE                     # MIT License
 ├── README.md                   # User-facing documentation
 ├── pyproject.toml              # Package configuration
-├── release-please-config.json # Release-please configuration
+├── release-please-config.json  # Release-please configuration
 └── .release-please-manifest.json # Release version tracking
 ```
 
@@ -443,5 +506,5 @@ git push --force
 
 ---
 
-**Last Updated**: 2026-01-05
+**Last Updated**: 2026-01-11
 **Maintained By**: @promptfoo/engineering
diff --git a/pyproject.toml b/pyproject.toml
@@ -102,3 +102,16 @@ show_error_codes = true
 pretty = true
 check_untyped_defs = true
 disallow_incomplete_defs = true
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+python_files = ["test_*.py"]
+python_classes = ["Test*"]
+python_functions = ["test_*"]
+addopts = [
+    "-v",
+    "--strict-markers",
+]
+markers = [
+    "smoke: smoke tests that run the full CLI (slow, requires Node.js)",
+]
diff --git a/tests/smoke/README.md b/tests/smoke/README.md
@@ -0,0 +1,88 @@
+# Smoke Tests
+
+These smoke tests verify that the core promptfoo CLI functionality works correctly through the Python wrapper.
+
+## What are Smoke Tests?
+
+Smoke tests are high-level integration tests that verify the most critical functionality works end-to-end. They:
+
+- Run against the actual installed CLI via the Python wrapper (using either global promptfoo or npx)
+- Test the Python wrapper integration with the Node.js CLI
+- Use the `echo` provider to avoid external API dependencies
+- Verify command-line arguments, file I/O, and output formats
+- Check exit codes and error handling
+
+## Running Smoke Tests
+
+```bash
+# Run all smoke tests
+pytest tests/smoke/
+
+# Run with verbose output
+pytest tests/smoke/ -v
+
+# Run a specific test class
+pytest tests/smoke/test_smoke.py::TestEvalCommand
+
+# Run a specific test
+pytest tests/smoke/test_smoke.py::TestEvalCommand::test_basic_eval
+```
+
+## Test Structure
+
+- `test_smoke.py` - Main smoke test suite
+- `fixtures/` - Test configuration files
+  - `configs/` - YAML configuration files for testing
+
+## Test Coverage
+
+### Basic CLI Operations
+- Version flag (`--version`)
+- Help output (`--help`, `eval --help`)
+- Unknown command handling
+- Missing file error handling
+
+### Eval Command
+- Basic evaluation with echo provider
+- Output formats (JSON, YAML, CSV)
+- Command-line flags (`--max-concurrency`, `--repeat`, `--verbose`)
+- Cache control (`--no-cache`)
+
+### Exit Codes
+- Exit code 0 for success
+- Exit code 100 for assertion failures
+- Exit code 1 for configuration errors
+
+### Echo Provider
+- Basic prompt echoing
+- Variable substitution
+- Multiple variable handling
+
+### Assertions
+- `contains` assertion
+- `icontains` assertion (case-insensitive)
+- Multiple assertions per test
+- Failing assertion behavior
+
+## Why Echo Provider?
+
+The `echo` provider is perfect for smoke tests because:
+
+1. **No external dependencies** - Doesn't require API keys or network calls
+2. **Deterministic** - Always returns the same output for the same input
+3. **Fast** - No network latency
+4. **Predictable** - Easy to write assertions against
+
+## Adding New Smoke Tests
+
+1. Create a new test config in `fixtures/configs/` if needed
+2. Add test methods to the appropriate test class in `test_smoke.py`
+3. Use the `run_promptfoo()` helper to execute CLI commands
+4. Make assertions on stdout, stderr, exit codes, and output files
+
+## Notes
+
+- Smoke tests run slower than unit tests (they spawn subprocesses)
+- They require Node.js and either a global promptfoo install or `npx` (which downloads promptfoo on first use)
+- They test the integration between Python and Node.js
+- They should be kept focused on critical functionality
diff --git a/tests/smoke/__init__.py b/tests/smoke/__init__.py
@@ -0,0 +1 @@
+"""Smoke tests for promptfoo CLI."""
diff --git a/tests/smoke/fixtures/configs/assertions.yaml b/tests/smoke/fixtures/configs/assertions.yaml
@@ -0,0 +1,22 @@
+# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
+description: 'Smoke test - multiple assertions'
+
+providers:
+  - echo
+
+prompts:
+  - 'Hello {{name}}, welcome to {{place}}'
+
+tests:
+  - vars:
+      name: Alice
+      place: Wonderland
+    assert:
+      - type: contains
+        value: Hello
+      - type: contains
+        value: Alice
+      - type: contains
+        value: Wonderland
+      - type: icontains
+        value: WELCOME
diff --git a/tests/smoke/fixtures/configs/basic.yaml b/tests/smoke/fixtures/configs/basic.yaml
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
+description: 'Smoke test - basic config validation'
+
+providers:
+  - echo
+
+prompts:
+  - 'Hello {{name}}'
+
+tests:
+  - vars:
+      name: World
+    assert:
+      - type: contains
+        value: Hello
+      - type: contains
+        value: World
diff --git a/tests/smoke/fixtures/configs/failing-assertion.yaml b/tests/smoke/fixtures/configs/failing-assertion.yaml
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
+description: 'Smoke test - config with failing assertion'
+
+providers:
+  - echo
+
+prompts:
+  - 'Hello {{name}}'
+
+tests:
+  - vars:
+      name: World
+    assert:
+      # This assertion will fail because echo returns "Hello World"
+      # but we're asserting it contains "IMPOSSIBLE_STRING_NOT_IN_OUTPUT_12345"
+      - type: contains
+        value: IMPOSSIBLE_STRING_NOT_IN_OUTPUT_12345
diff --git a/tests/smoke/test_smoke.py b/tests/smoke/test_smoke.py
diff --git a/tests/test_cli.py b/tests/test_cli.py