test: run example files as CI tests; fix bugs they surface (PRI-328) by adrian-prior · Pull Request #327 · PriorLabs/tabpfn-extensions

adrian-prior · 2026-06-25T16:41:32Z

Closes #51.

What

Re-enables example-file testing in CI, replacing the old 5s-timeout/XFAIL smoke harness with one that actually runs each example.

- Harness rewrite (tests/test_examples.py): each example runs as a subprocess. Two modes:
  - Smoke (per-PR, default): short timeout; an example that errors fails, one that times out passes (we only assert that it starts and runs without crashing).
  - Strict (--example-strict, for the scheduled GPU run): a timeout is a failure; the example must run to completion.
- CPU smoke workflow (.github/workflows/example_tests.yml): path-filtered to examples/** + the harness; runs the smoke gate with -n auto.
- Bug fixes (running the examples for real surfaced these):
  - shapiq examples → bump interpretability floor to shapiq>=1.2.0 (lowest-direct was installing 1.1.0, which predates class_index (1.1.1) and TabPFNExplainer (1.2.0)).
  - shap_example → shap added to a new examples dependency group, floored >=0.46.0 (0.41.0 pulled an ancient, unbuildable numba).
  - generate_data_following_dag → import the unsupervised.experiments submodule explicitly.
  - tabebm example → xfail + documented; broken with current TabPFN's inference_config API (TabEBM broken with recent TabPFN + test broken. #225).
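The smoke/strict distinction described above can be sketched as follows. This is a minimal illustration, not the code in tests/test_examples.py: the helper name `run_example`, its parameters, and the "passed"/"failed" return values are assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def run_example(path: Path, timeout_s: float, strict: bool = False) -> str:
    """Run one example script in a subprocess and classify the outcome.

    Hypothetical helper for illustration; returns "passed" or "failed".
    """
    try:
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # Smoke mode: still running at the deadline means the example
        # started and did not crash, so it counts as a pass.
        # Strict mode: the example must finish, so a timeout is a failure.
        return "failed" if strict else "passed"
    # The script finished within the timeout: a non-zero exit code is an
    # error in both modes.
    return "passed" if result.returncode == 0 else "failed"
```

With this shape, a slow but healthy example passes the per-PR gate and only the scheduled strict run requires it to finish, while a crashing example fails in both modes.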

Verified

Built the env on a GPU node and ran the examples — shapiq_example + generate_data_following_dag pass, tabebm xfails. Full CPU smoke suite: 16 passed, 2 skipped (survival: GPL; get_embeddings: GPU-only), 1 xfailed.

Follow-up

Nightly GPU run-to-completion (strict) + making slow examples test-mode-aware is tracked in PRI-330.

Notes

Supersedes #321 (the prior attempt at #51) — builds on @mvanhorn's work there, credited as co-author.

🤖 Generated with Claude Code

Rewrite the example-test harness to run each example as a subprocess, with fast per-PR smoke semantics (a timeout passes; only an error fails) and a strict mode for the scheduled GPU run (PRI-330). Add a path-filtered CPU smoke workflow that runs it on PRs touching examples/.

Running the examples for real surfaced several breakages, fixed here:
- shapiq examples: bump the interpretability floor to shapiq>=1.2.0. The old >=1.1.0 floor let lowest-direct install 1.1.0, which predates class_index (1.1.1) and TabPFNExplainer (1.2.0); the example crashed on both.
- shap_example: add shap to a new "examples" dependency group, floored at 0.46.0 (0.41.0 pulled an ancient, unbuildable numba).
- generate_data_following_dag: import the unsupervised.experiments submodule explicitly instead of relying on attribute access.
- tabebm example: xfail with a tracking note; broken with current TabPFN's inference_config API (#225).

Closes #51. Supersedes #321.

Co-Authored-By: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
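For context on the generate_data_following_dag fix: in Python, importing a package does not load its submodules unless the package's `__init__.py` (or some other import) pulls them in, so attribute access such as `package.submodule` can raise AttributeError. The sketch below reproduces that behaviour with a throwaway package; `demo_pkg` and `experiments.py` are made-up names for illustration and say nothing about how tabpfn_extensions itself is laid out.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names):
#   demo_pkg/__init__.py     (empty, so it imports no submodules)
#   demo_pkg/experiments.py
tmp = tempfile.mkdtemp()
pkg_dir = Path(tmp) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "experiments.py").write_text("VALUE = 42\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()

import demo_pkg

# Importing the package alone does not bind the submodule as an attribute.
attribute_access_works = hasattr(demo_pkg, "experiments")

# An explicit submodule import loads it and binds it on the package.
import demo_pkg.experiments

explicit_import_works = demo_pkg.experiments.VALUE == 42
```

Attribute access can appear to work when some other module happens to import the submodule first, which is why an explicit import is the more robust choice in a standalone example script.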


gemini-code-assist

Code Review

This pull request refactors the example testing framework to run each example script in its own subprocess with configurable timeouts, strictness, and environment handling. It also updates dependencies in pyproject.toml (such as upgrading shapiq and adding a dedicated examples dependency group) and adds a warning to the TabEBM example. The reviewer feedback suggests prepending the local src directory to PYTHONPATH for the subprocesses to ensure the local package version is tested, and increasing the lines of captured traceback output on failure from 25 to 100 to aid in debugging.


- Prepend the in-repo src/ to the example subprocess PYTHONPATH so it imports this checkout's tabpfn_extensions even without an editable install.
- Widen the failure-output tail from 25 to 100 lines for easier CI debugging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
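The two review follow-ups can be sketched roughly as below. The helper names `subprocess_env` and `tail` are assumptions made for this example and may not match the harness; only the behaviour (src/ first on PYTHONPATH, last 100 lines of output kept) comes from the commit message.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def subprocess_env(src_dir: Path, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with src_dir first on PYTHONPATH.

    Hypothetical helper: putting the checkout's src/ first means the
    subprocess imports the in-repo package rather than an installed copy.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = os.pathsep.join([str(src_dir), existing])
    else:
        env["PYTHONPATH"] = str(src_dir)
    return env


def tail(output: str, max_lines: int = 100) -> str:
    """Keep only the last max_lines lines of captured output."""
    return "\n".join(output.splitlines()[-max_lines:])
```

The returned mapping would be passed as `env=` to `subprocess.run`, and `tail` applied to the captured stdout/stderr before it is included in the failure message.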


adrian-prior and others added 2 commits June 25, 2026 18:41

docs: add changelog entries for #327

83b4dbc

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 25, 2026


Comment thread tests/test_examples.py

Comment thread tests/test_examples.py

adrian-prior mentioned this pull request Jun 25, 2026

test: run example-file tests in pull request CI #321

Closed

adrian-prior requested a review from oscarkey June 25, 2026 16:54

Merge branch 'main' into adrian/pri-328-run-example-file-tests-in-ci-…

2a28027

…fix-bugs-skip-if-missing-deps

oscarkey marked this pull request as ready for review July 2, 2026 14:08

oscarkey requested a review from a team as a code owner July 2, 2026 14:08

adrian-prior wants to merge 4 commits into main from adrian/pri-328-run-example-file-tests-in-ci-fix-bugs-skip-if-missing-deps
