test: run example files as CI tests; fix bugs they surface (PRI-328) #327
Rewrite the example-test harness to run each example as a subprocess, with fast per-PR smoke semantics (a timeout passes; only an error fails) and a strict mode for the scheduled GPU run (PRI-330). Add a path-filtered CPU smoke workflow that runs it on PRs touching examples/.

Running the examples for real surfaced several breakages, fixed here:
- shapiq examples: bump the interpretability floor to shapiq>=1.2.0. The old >=1.1.0 floor let lowest-direct install 1.1.0, which predates class_index (1.1.1) and TabPFNExplainer (1.2.0); the example crashed on both.
- shap_example: add shap to a new "examples" dependency group, floored at 0.46.0 (0.41.0 pulled an ancient, unbuildable numba).
- generate_data_following_dag: import the unsupervised.experiments submodule explicitly instead of relying on attribute access.
- tabebm example: xfail with a tracking note; broken with current TabPFN's inference_config API (#225).

Closes #51. Supersedes #321.

Co-Authored-By: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
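The smoke and strict semantics described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in tests/test_examples.py: the function name `run_example`, its `timeout` and `strict` parameters, and the three outcome labels are all hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_example(path: Path, timeout: float, strict: bool = False) -> str:
    """Run one example script in its own subprocess and classify the outcome.

    Hypothetical helper: returns "passed", "timed_out_ok", or "failed".
    """
    try:
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # inspect the return code ourselves
        )
    except subprocess.TimeoutExpired:
        # Smoke mode: an example that is still running without having
        # errored counts as a pass. Strict mode: it must run to completion.
        return "failed" if strict else "timed_out_ok"
    # A non-zero exit status (e.g. an uncaught exception) is always a failure.
    return "passed" if result.returncode == 0 else "failed"
```

In a pytest harness, a function like this would typically be called once per example file, with `strict` driven by a command-line option such as the `--example-strict` flag this PR mentions.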
Code Review
This pull request refactors the example testing framework to run each example script in its own subprocess with configurable timeouts, strictness, and environment handling. It also updates dependencies in pyproject.toml (such as upgrading shapiq and adding a dedicated examples dependency group) and adds a warning to the TabEBM example. The reviewer feedback suggests prepending the local src directory to PYTHONPATH for the subprocesses to ensure the local package version is tested, and increasing the lines of captured traceback output on failure from 25 to 100 to aid in debugging.
- Prepend the in-repo src/ to the example subprocess PYTHONPATH so it imports this checkout's tabpfn_extensions even without an editable install.
- Widen the failure-output tail from 25 to 100 lines for easier CI debugging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
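The two review follow-ups above could look something like the sketch below. The helper names `build_example_env` and `tail` are hypothetical and are not taken from the actual harness; only the behaviour (src/ first on PYTHONPATH, last 100 lines of output kept) comes from the commit message.

```python
import os
from pathlib import Path


def build_example_env(src_dir: Path, base_env=None) -> dict:
    """Return a copy of the environment with src_dir first on PYTHONPATH.

    Hypothetical helper: putting the checkout's src/ first means the
    subprocess imports the local package even without an editable install.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    parts = [str(src_dir)] + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def tail(output: str, max_lines: int = 100) -> str:
    """Keep only the last max_lines lines of captured output.

    Assumes max_lines is positive.
    """
    lines = output.splitlines()
    return "\n".join(lines[-max_lines:])
```

The resulting dictionary would be passed as `env=` to `subprocess.run`, and `tail` applied to the captured stdout/stderr before it is attached to the failure message.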
…fix-bugs-skip-if-missing-deps
Closes #51.
What
Re-enables example-file testing in CI, replacing the old 5s-timeout/XFAIL smoke harness with a real one.
- Harness (tests/test_examples.py): each example runs as a subprocess. Two modes:
  - smoke (per-PR): a timeout passes; only an error fails.
  - strict (--example-strict, for the scheduled GPU run): a timeout is a failure; run to completion.
- CPU smoke workflow (.github/workflows/example_tests.yml): path-filtered to examples/** + the harness; runs the smoke gate with -n auto.
- Fixes surfaced by running the examples:
  - shapiq examples → bump the interpretability floor to shapiq>=1.2.0 (lowest-direct was installing 1.1.0, which predates class_index (1.1.1) and TabPFNExplainer (1.2.0)).
  - shap_example → shap added to a new examples dependency group, floored >=0.46.0 (0.41.0 pulled an ancient, unbuildable numba).
  - generate_data_following_dag → import the unsupervised.experiments submodule explicitly.
  - tabebm example → xfail + documented; broken with current TabPFN's inference_config API (#225, "TabEBM broken with recent TabPFN + test broken").

Verified
Built the env on a GPU node and ran the examples: shapiq_example + generate_data_following_dag pass, tabebm xfails. Full CPU smoke suite: 16 passed, 2 skipped (survival: GPL; get_embeddings: GPU-only), 1 xfailed.

Follow-up
Nightly GPU run-to-completion (strict) + making slow examples test-mode-aware is tracked in PRI-330.
Notes
Supersedes #321 (the prior attempt at #51) — builds on @mvanhorn's work there, credited as co-author.
🤖 Generated with Claude Code