test: run example files as CI tests; fix bugs they surface (PRI-328) #327

Open
adrian-prior wants to merge 4 commits into main from adrian/pri-328-run-example-file-tests-in-ci-fix-bugs-skip-if-missing-deps

Conversation

adrian-prior commented Jun 25, 2026

Collaborator

Closes #51.

What

Reenables example-file testing in CI, replacing the old 5s-timeout/XFAIL smoke harness with a real one.

  • Harness rewrite (tests/test_examples.py): each example runs as a subprocess. Two modes:
    • Smoke (per-PR, default): short timeout; an example that errors fails, one that times out passes (we only assert it starts and runs without crashing).
    • Strict (--example-strict, for the scheduled GPU run): a timeout is a failure — run to completion.
  • CPU smoke workflow (.github/workflows/example_tests.yml): path-filtered to examples/** + the harness; runs the smoke gate with -n auto.
  • Bug fixes — running the examples for real surfaced these:
    • shapiq examples → bump interpretability floor to shapiq>=1.2.0 (lowest-direct was installing 1.1.0, which predates class_index (1.1.1) and TabPFNExplainer (1.2.0)).
    • shap_example → shap added to a new examples dependency group, floored >=0.46.0 (0.41.0 pulled an ancient, unbuildable numba).
    • generate_data_following_dag → import the unsupervised.experiments submodule explicitly.
    • tabebm example → xfail + documented; broken with current TabPFN's inference_config API (see #225, "TabEBM broken with recent TabPFN + test broken").
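
The smoke and strict semantics described above can be sketched roughly as follows. This is an illustration only, not the code in tests/test_examples.py: the function name `run_example` and the parameters `timeout_s` and `strict` are hypothetical, and the real harness may differ in how it reports outcomes.

```python
import subprocess
import sys


def run_example(argv, timeout_s, strict=False):
    """Run one example as a subprocess and classify the outcome.

    Hypothetical helper: smoke mode (strict=False) treats a timeout as a
    pass, strict mode treats it as a failure. A non-zero exit code fails
    in both modes.
    """
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "failed" if strict else "passed"
    return "passed" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # A stand-in for a slow example: it outlives the 1-second budget.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_example(slow, timeout_s=1))               # smoke mode
    print(run_example(slow, timeout_s=1, strict=True))  # strict mode
```

Under these assumptions, the same slow script is accepted by the per-PR gate and rejected by the scheduled run, which is the distinction the two modes are meant to draw.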

Verified

Built the env on a GPU node and ran the examples — shapiq_example + generate_data_following_dag pass, tabebm xfails. Full CPU smoke suite: 16 passed, 2 skipped (survival: GPL; get_embeddings: GPU-only), 1 xfailed.

Follow-up

Nightly GPU run-to-completion (strict) + making slow examples test-mode-aware is tracked in PRI-330.

Notes

Supersedes #321 (the prior attempt at #51) — builds on @mvanhorn's work there, credited as co-author.

🤖 Generated with Claude Code

adrian-prior and others added 2 commits June 25, 2026 18:41
Rewrite the example-test harness to run each example as a subprocess, with
fast per-PR smoke semantics (a timeout passes; only an error fails) and a
strict mode for the scheduled GPU run (PRI-330). Add a path-filtered CPU
smoke workflow that runs it on PRs touching examples/.

Running the examples for real surfaced several breakages, fixed here:

- shapiq examples: bump the interpretability floor to shapiq>=1.2.0. The old
  >=1.1.0 floor let lowest-direct install 1.1.0, which predates class_index
  (1.1.1) and TabPFNExplainer (1.2.0); the example crashed on both.
- shap_example: add shap to a new "examples" dependency group, floored at
  0.46.0 (0.41.0 pulled an ancient, unbuildable numba).
- generate_data_following_dag: import the unsupervised.experiments submodule
  explicitly instead of relying on attribute access.
- tabebm example: xfail with a tracking note; broken with current TabPFN's
  inference_config API (#225).
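
For context on the generate_data_following_dag fix: importing a package does not by itself load its submodules, so attribute access only works if something else has already imported the submodule. The sketch below shows this with a throwaway package; `demo_pkg` and `check_submodule_access` are hypothetical names and stand in for the real unsupervised.experiments layout, which is not shown in this PR page.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def check_submodule_access():
    """Build a tiny package on disk and compare the two access styles.

    Returns (attribute_access_worked, explicit_import_worked).
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        (pkg / "experiments").mkdir(parents=True)
        (pkg / "__init__.py").write_text("")  # does not import submodules
        (pkg / "experiments" / "__init__.py").write_text("VALUE = 42\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import demo_pkg

            # The parent package is loaded, the submodule is not yet bound.
            attribute_access_worked = hasattr(demo_pkg, "experiments")

            # The explicit import loads the submodule and binds it on the parent.
            import demo_pkg.experiments

            explicit_import_worked = demo_pkg.experiments.VALUE == 42
        finally:
            sys.path.remove(tmp)
            for name in ("demo_pkg.experiments", "demo_pkg"):
                sys.modules.pop(name, None)
    return attribute_access_worked, explicit_import_worked


if __name__ == "__main__":
    print(check_submodule_access())
```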

Closes #51. Supersedes #321.

Co-Authored-By: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request refactors the example testing framework to run each example script in its own subprocess with configurable timeouts, strictness, and environment handling. It also updates dependencies in pyproject.toml (such as upgrading shapiq and adding a dedicated examples dependency group) and adds a warning to the TabEBM example. The reviewer feedback suggests prepending the local src directory to PYTHONPATH for the subprocesses to ensure the local package version is tested, and increasing the lines of captured traceback output on failure from 25 to 100 to aid in debugging.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tests/test_examples.py
Comment thread tests/test_examples.py
- Prepend the in-repo src/ to the example subprocess PYTHONPATH so it imports
  this checkout's tabpfn_extensions even without an editable install.
- Widen the failure-output tail from 25 to 100 lines for easier CI debugging.
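
A minimal sketch of what these two changes could look like, assuming the harness builds an environment dict for each subprocess and trims captured output before reporting it. The helper names `build_example_env` and `tail` are hypothetical and are not taken from tests/test_examples.py.

```python
import os


def build_example_env(src_dir, base_env=None):
    """Return a copy of the environment with src_dir first on PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    # Prepending (not appending) makes the checkout win over an installed copy.
    env["PYTHONPATH"] = (
        str(src_dir) if not existing else str(src_dir) + os.pathsep + existing
    )
    return env


def tail(text, max_lines=100):
    """Keep only the last max_lines lines of captured output."""
    return "\n".join(text.splitlines()[-max_lines:])


if __name__ == "__main__":
    env = build_example_env("src", {"PYTHONPATH": "site"})
    print(env["PYTHONPATH"])
    print(len(tail("\n".join(str(i) for i in range(150))).splitlines()))
```

The resulting dict would be passed as `env=` to `subprocess.run`, and the trimmed text would go into the failure message.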

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@oscarkey oscarkey marked this pull request as ready for review July 2, 2026 14:08
@oscarkey oscarkey requested a review from a team as a code owner July 2, 2026 14:08

Development

Successfully merging this pull request may close these issues.

Improve testing: reenable testing example files

2 participants