fix: restore example collection during directory traversal (#794) (#795)

planetf1 · web-flow · commit da53bb0918ed · 2026-04-09T14:15:06.000Z
* fix: restore example collection during directory traversal (#794)

  pytest_pycollect_makemodule only fires for files matching python_files (test_*.py) — return ExampleModule from pytest_collect_file so directory traversal collects examples.

* fix: require # pytest: marker for example collection (#796)

  Examples without a marker are now silently ignored instead of being collected as tests. Removes spurious marker from helper/helpers.py and documents opt-in behaviour in AGENTS.md and MARKERS_GUIDE.md.

* docs: expand required Ollama models list in CONTRIBUTING.md

  The previous list only covered 4 models (CI subset). Expanded to include all models needed by examples and tests, grouped by context (CI, examples, test suite) with a one-liner to pull the lot.

* fix: harden example collection hooks against silent failures

  - pytest_pycollect_makemodule: return SkippedFile (not None) for markerless files, closing the direct-specification bypass
  - pytest_collect_file: narrow except to OSError, return None on failure (exclude) instead of falling through to collect
  - Fix stale docstring and comments from the old collection model

* fix: harden collection hooks and add regression test

  Address code review findings from PR #795:

  - Fix duplicate collection: guard pytest_collect_file with isinitpath() so directly-specified files defer to pytest_pycollect_makemodule
  - Remove dead try/except OSError (_extract_markers_from_file is self-contained)
  - Key examples_to_skip by full path instead of basename to prevent collisions across subdirectories
  - Use file_path.parts instead of str() substring match in pytest_pycollect_makemodule for correct path filtering
  - Add test/test_example_collection.py regression test verifying support files are excluded, examples are collected, and no duplicates occur

  Signed-off-by: Nigel Jones <nigelgj@ie.ibm.com>
  Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

---------

Signed-off-by: Nigel Jones <nigelgj@ie.ibm.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
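
The opt-in rule and the path filter described in the commit message can be illustrated with a small standalone sketch. It is not the code from `docs/examples/conftest.py`: `extract_markers`, `is_example_path`, and the `max_lines` limit are hypothetical names and values chosen for illustration, and the real `_extract_markers_from_file` may differ in detail. The sketch only assumes what the message and docs state: the `# pytest:` comment must precede any non-comment code, and path filtering compares whole path components rather than substrings.

```python
from __future__ import annotations

from pathlib import PurePosixPath

MARKER_PREFIX = "# pytest:"


def extract_markers(source: str, max_lines: int = 10) -> list[str]:
    """Return marker names from a leading '# pytest:' comment, else [].

    Hypothetical stand-in for _extract_markers_from_file; max_lines is an
    assumed cut-off for "the first few lines".
    """
    for line in source.splitlines()[:max_lines]:
        stripped = line.strip()
        if stripped.startswith(MARKER_PREFIX):
            names = stripped[len(MARKER_PREFIX):].split(",")
            return [name.strip() for name in names if name.strip()]
        if stripped and not stripped.startswith("#"):
            break  # first non-comment line ends the search
    return []  # no marker: the file is not a runnable example


def is_example_path(path: PurePosixPath) -> bool:
    """Match on whole path components, not on substrings of str(path)."""
    return (
        path.suffix == ".py"
        and "docs" in path.parts
        and "examples" in path.parts
    )


if __name__ == "__main__":
    marked = '# pytest: e2e, ollama, qualitative\n"""Example description..."""\n'
    helper = "from textwrap import fill\nfrom typing import Any\n"
    print(extract_markers(marked))
    print(extract_markers(helper))
    # A substring test on str(path) would accept this unrelated directory.
    print(is_example_path(PurePosixPath("/home/me/mydocs_examples/run.py")))
```

A file such as `helper/helpers.py`, once its marker comment is removed, yields an empty list and would therefore be treated as a support file rather than a test.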
diff --git a/AGENTS.md b/AGENTS.md
@@ -51,7 +51,7 @@ Tests use a four-tier granularity system (`unit`, `integration`, `e2e`, `qualita
 
 See **[test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md)** for the full marker reference (tier definitions, backend markers, resource gates, auto-skip logic, common patterns).
 
-**Examples in `docs/examples/`** use comment-based markers:
+**Examples in `docs/examples/`** are opt-in — unlike `test/` files (auto-collected, default `unit`), examples require an explicit `# pytest:` comment to be collected. Files without this comment are silently ignored (they won't appear in skip summaries either). This is because examples have variable dependencies and limited setup:
 ```python
 # pytest: e2e, ollama, qualitative
 """Example description..."""
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -354,12 +354,46 @@ uv run ruff check .
 ### Required Models
 
 #### Ollama
-- `granite4:micro-h`
-- `granite3.2-vision`
-- `granite4:micro`
-- `qwen2.5vl:7b`
 
-_Note: ollama models can be obtained by running `ollama pull <model>`_
+HuggingFace and cloud backends download or host models automatically. Ollama
+models must be pulled locally before running the tests that need them.
+
+**CI (unit + integration tests):**
+
+- `granite4:micro` — default model for `start_session()` and most examples
+- `granite4:micro-h` — hybrid variant used by conftest fixtures
+
+**Examples (`docs/examples/`):**
+
+- `deepseek-r1:8b` — safety / guardian examples
+- `granite3-guardian:2b` — mini-researcher guardian backend
+- `granite3.2-vision` — vision (Ollama chat) example
+- `granite3.3:8b` — m\_decompose example
+- `granite4:latest` — melp examples
+- `llama3.2` — repair-with-guardian example
+- `llama3.2:3b` — tutorial / mify examples (via `META_LLAMA_3_2_3B`)
+- `phi:2.7b` — SOFAI graph-colouring example
+- `pielee/qwen3-4b-thinking-2507_q8:latest` — SOFAI S2 solver
+- `qwen2.5vl:7b` — vision (OpenAI-via-Ollama) example
+
+**Additional test models (`test/`):**
+
+- `granite4:small-h` — hybrid-small tests
+- `llama3.2:1b` — lightweight inference tests
+- `llama3:8b` — legacy Llama 3 tests
+- `llava` — multimodal tests
+- `mistral:7b` — Mistral backend tests
+- `smollm2:1.7b` — SmolLM tests
+
+Pull everything:
+
+```bash
+for m in granite4:micro granite4:micro-h deepseek-r1:8b \
+  granite3-guardian:2b granite3.2-vision granite3.3:8b granite4:latest \
+  llama3.2 llama3.2:3b phi:2.7b pielee/qwen3-4b-thinking-2507_q8:latest \
+  qwen2.5vl:7b granite4:small-h llama3.2:1b llama3:8b llava mistral:7b \
+  smollm2:1.7b; do ollama pull "$m"; done
+```
 
 ### Test Markers
 
diff --git a/docs/examples/conftest.py b/docs/examples/conftest.py
@@ -282,22 +282,22 @@ def pytest_collection_finish(session):
 
 
 def pytest_terminal_summary(terminalreporter, exitstatus, config):
-    # Append the skipped examples if needed.
-    if len(examples_to_skip) == 0:
-        return
-
-    terminalreporter.ensure_newline()
-    terminalreporter.section("Skipped Examples", sep="=", blue=True, bold=True)
-    terminalreporter.line("The following examples were skipped during collection:\n")
-    for filename, reason in examples_to_skip.items():
-        terminalreporter.line(f"  • {filename}: {reason}")
+    if examples_to_skip:
+        terminalreporter.ensure_newline()
+        terminalreporter.section("Skipped Examples", sep="=", blue=True, bold=True)
+        terminalreporter.line(
+            "The following examples were skipped during collection:\n"
+        )
+        for filepath, reason in examples_to_skip.items():
+            terminalreporter.line(f"  • {pathlib.Path(filepath).name}: {reason}")
 
 
 def pytest_pycollect_makemodule(module_path, parent):
     """Intercepts Module creation to skip files before import.
 
-    Runs for both directory traversal and direct file specification.
-    Returning a SkippedFile prevents pytest from importing the file,
+    Only fires for files matching python_files (default test_*.py) during
+    directory traversal, or for any file specified directly on the command
+    line. Returning a SkippedFile prevents pytest from importing the file,
     which is necessary when files contain unavailable dependencies.
 
     Args:
@@ -307,7 +307,7 @@ def pytest_pycollect_makemodule(module_path, parent):
     file_path = module_path
 
     # Limit scope to docs/examples directory
-    if "docs" not in str(file_path) or "examples" not in str(file_path):
+    if "docs" not in file_path.parts or "examples" not in file_path.parts:
         return None
 
     if file_path.name == "conftest.py":
@@ -319,14 +319,14 @@ def pytest_pycollect_makemodule(module_path, parent):
         config._example_capabilities = get_system_capabilities()
 
     # Check manual skip list
-    if file_path.name in examples_to_skip:
+    if str(file_path) in examples_to_skip:
         return SkippedFile.from_parent(parent, path=file_path)
 
     # Extract and evaluate markers
     markers = _extract_markers_from_file(file_path)
 
     if not markers:
-        return None
+        return SkippedFile.from_parent(parent, path=file_path)
 
     should_skip, _reason = _should_skip_collection(markers)
 
@@ -365,16 +365,19 @@ def pytest_ignore_collect(collection_path, config):
         and "examples" in abs_path.parts
     ):
         # Skip files in the manual skip list
-        if collection_path.name in examples_to_skip:
+        if str(collection_path) in examples_to_skip:
             return True
 
         # Extract markers and check if we should skip
         try:
             markers = _extract_markers_from_file(collection_path)
+            # No markers → not a runnable example (e.g. __init__.py, helpers)
+            if not markers:
+                return True
             should_skip, reason = _should_skip_collection(markers)
             if should_skip and reason:
                 # Add to skip list with reason for terminal summary
-                examples_to_skip[collection_path.name] = reason
+                examples_to_skip[str(collection_path)] = reason
                 # Return True to ignore this file completely
                 return True
         except Exception as e:
@@ -389,36 +392,35 @@ def pytest_ignore_collect(collection_path, config):
     return False
 
 
-# This doesn't replace the existing pytest file collection behavior.
 def pytest_collect_file(parent: pytest.Dir, file_path: pathlib.PosixPath):
-    # Do a quick check that it's a .py file in the expected `docs/examples` folder. We can make
-    # this more exact if needed.
+    """Provide an explicit collector for example files in docs/examples/."""
     if (
         file_path.suffix == ".py"
         and "docs" in file_path.parts
         and "examples" in file_path.parts
     ):
-        # Skip this test. It requires additional setup.
-        if file_path.name in examples_to_skip:
-            return
+        # Directly-specified files are handled by pytest_pycollect_makemodule —
+        # only provide an explicit collector during directory traversal.
+        if parent.session.isinitpath(file_path):
+            return None
 
-        # Check markers first - if file has skip marker, return SkippedFile
-        try:
-            markers = _extract_markers_from_file(file_path)
-            should_skip, _reason = _should_skip_collection(markers)
-            if should_skip:
-                # FIX: Return a dummy collector instead of None.
-                # This prevents pytest from falling back to the default Module collector
-                # which would try to import the file.
-                return SkippedFile.from_parent(parent, path=file_path)
-        except Exception:
-            # If we can't read markers, continue with other checks
-            pass
+        # Already flagged for skipping (missing system capability)
+        if str(file_path) in examples_to_skip:
+            return
 
-        # ExampleModule (returned by pytest_pycollect_makemodule) handles
-        # collection for files that should run — return None here to avoid
-        # creating a duplicate collector from this hook.
-        return None
+        # Check markers — no markers means not a runnable example.
+        # _extract_markers_from_file is self-contained (returns [] on error),
+        # so no try/except needed here.
+        markers = _extract_markers_from_file(file_path)
+        if not markers:
+            return None
+        should_skip, _reason = _should_skip_collection(markers)
+        if should_skip:
+            return SkippedFile.from_parent(parent, path=file_path)
+
+        # pytest_pycollect_makemodule only fires for files matching python_files
+        # (test_*.py) — examples need an explicit collector for directory traversal.
+        return ExampleModule.from_parent(parent, path=file_path)
 
 
 class SkippedFile(pytest.File):
diff --git a/docs/examples/helper/helpers.py b/docs/examples/helper/helpers.py
@@ -1,5 +1,3 @@
-# pytest: ollama, e2e
-
 from textwrap import fill
 from typing import Any
 
diff --git a/test/MARKERS_GUIDE.md b/test/MARKERS_GUIDE.md
@@ -270,16 +270,20 @@ pytestmark = [pytest.mark.e2e, pytest.mark.huggingface,
 
 ## Example Files (`docs/examples/`)
 
-Examples use a comment-based marker format instead of `pytestmark`:
+Unlike `test/` files (which are auto-collected and default to `unit`), examples
+require an explicit `# pytest:` comment to be collected. This opt-in approach
+reflects that examples often have variable dependencies and limited setup, so
+only files that declare themselves runnable should be executed.
 
 ```python
 # pytest: e2e, ollama, qualitative
 """Example description..."""
 ```
 
 Same classification rules apply. The comment must appear in the first few
-lines before non-comment code. Parser: `docs/examples/conftest.py`
-(`_extract_markers_from_file`).
+lines before non-comment code. Files without this comment are silently
+ignored — they won't appear in skip summaries or collection output.
+Parser: `docs/examples/conftest.py` (`_extract_markers_from_file`).
 
 ## Adding Markers to New Tests
 
diff --git a/test/test_example_collection.py b/test/test_example_collection.py
@@ -0,0 +1,42 @@
+"""Regression tests for docs/examples/ collection hooks.
+
+These hooks have regressed twice (#794, #796). This test ensures:
+- Support files (__init__.py, helpers.py, conftest.py) are never collected
+- Real examples with markers ARE collected
+- No example is collected twice (duplicate guard)
+"""
+
+import subprocess
+
+
+def test_example_collection_sanity():
+    """Verify example collection excludes support files and avoids duplicates."""
+    result = subprocess.run(
+        ["uv", "run", "pytest", "docs/examples/", "--collect-only", "-q"],
+        capture_output=True,
+        text=True,
+        timeout=120,
+    )
+
+    lines = result.stdout.splitlines()
+    # Collected test IDs are lines before the blank/summary lines
+    collected = [line for line in lines if "::" in line]
+
+    # Support files must never appear as collected tests
+    for item in collected:
+        filename = item.split("::")[0].rsplit("/", 1)[-1]
+        assert filename != "__init__.py", f"__init__.py collected as test: {item}"
+        assert filename != "helpers.py", f"helpers.py collected as test: {item}"
+        assert filename != "conftest.py", f"conftest.py collected as test: {item}"
+
+    # Sanity floor — we have ~79 examples today; 50 is a safe lower bound
+    assert len(collected) >= 50, (
+        f"Only {len(collected)} examples collected — expected at least 50. "
+        "Collection hooks may be broken."
+    )
+
+    # No duplicates — each test ID should appear exactly once
+    seen = set()
+    for item in collected:
+        assert item not in seen, f"Duplicate collection detected: {item}"
+        seen.add(item)
