Fixing #872 #892

Merged
psinger-prior merged 5 commits into main from claude/fix-tabpfn-872-2ElLU
Apr 27, 2026

Conversation

@psinger-prior psinger-prior commented Apr 23, 2026

Issue

Closes #872

We can simply store the original data as `self.X_ = X` instead of the output of `ensure_compatible_fit_inputs_sklearn`. The estimator is refit later anyway via `self.finetuned_inference_classifier_.fit(self.X_, self.y_)`, which internally re-runs `ensure_compatible_fit_inputs_sklearn`.

I could not identify why this would only be an issue when `X_val` is provided, as stated in the linked issue.

claude added 4 commits April 23, 2026 09:46
`FinetunedTabPFNBase._fit` was overwriting `self.X_`/`self.y_` with the
numpy arrays returned by sklearn validation, so the final inference
estimator was refit on numpy inputs and never recorded the original
DataFrame's feature names. Retain the raw inputs before validation so
the inference model sees the DataFrame and sets `feature_names_in_`,
avoiding spurious "X does not have valid feature names" warnings when
predicting on DataFrames.

Fixes #872.
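
The behaviour described in the commit message can be illustrated with a minimal, library-free sketch. Everything here is a hypothetical stand-in, not the TabPFN or scikit-learn code: `Frame` plays the role of a DataFrame, `drop_names` plays the role of the validation step that returns plain arrays, and `ToyFinetuned` only mimics the refit of an inner inference model on `self.X_`. The warning text is made up for the example.

```python
import warnings


class Frame:
    """Tiny stand-in for a DataFrame: column names plus row data."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]


def drop_names(X):
    """Hypothetical validation step: returns plain rows, losing column names."""
    return X.rows if isinstance(X, Frame) else [list(r) for r in X]


class ToyInferenceModel:
    """Records feature names only if it is fitted on a Frame."""

    def fit(self, X, y):
        if isinstance(X, Frame):
            self.feature_names_in_ = list(X.columns)
        self.n_features_in_ = len(drop_names(X)[0])
        return self

    def predict(self, X):
        # Warn when named input meets a model that never saw names.
        if isinstance(X, Frame) and not hasattr(self, "feature_names_in_"):
            warnings.warn("toy warning: model was fitted without feature names")
        return [0 for _ in drop_names(X)]


class ToyFinetuned:
    def __init__(self, keep_raw_inputs):
        self.keep_raw_inputs = keep_raw_inputs

    def fit(self, X, y):
        X_checked = drop_names(X)
        # keep_raw_inputs=False mimics the old behaviour (validated arrays
        # overwrite the inputs); True mimics the fix (raw inputs retained).
        self.X_ = X if self.keep_raw_inputs else X_checked
        self.y_ = y
        # The inner model is refit on whatever was stored in self.X_.
        self.inference_ = ToyInferenceModel().fit(self.X_, self.y_)
        return self

    def predict(self, X):
        return self.inference_.predict(X)


def warnings_when_predicting(keep_raw_inputs):
    """Fit and predict on the same Frame; return how many warnings fired."""
    frame = Frame(["age", "income"], [[30, 1.0], [40, 2.0]])
    model = ToyFinetuned(keep_raw_inputs).fit(frame, [0, 1])
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.predict(frame)
    return len(caught)
```

Calling `warnings_when_predicting(False)` exercises the old path, where the inner model never sees the column names, and `warnings_when_predicting(True)` exercises the path where the raw inputs are kept.
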
@psinger-prior psinger-prior requested a review from a team as a code owner April 23, 2026 11:07
@psinger-prior psinger-prior requested review from alanprior and removed request for a team April 23, 2026 11:07
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

CLAassistant commented Apr 23, 2026

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request aims to fix an issue where pandas feature names were dropped from the final inference model in FinetunedTabPFN estimators. The changes involve moving the assignment of raw training inputs before the validation step to ensure feature names are retained. A review comment suggests refactoring this logic to avoid setting fitted attributes before validation is complete, which could lead to inconsistent estimator states, and recommends explicitly capturing feature names and counts for better scikit-learn API compliance.
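
One way to read the review suggestion is sketched below, under stated assumptions: validation runs first, and fitted attributes are assigned only after it succeeds, with feature names and the feature count captured explicitly. The helper names `check_rows` and `fit_with_late_assignment` are hypothetical and are not part of TabPFN or scikit-learn; inputs are modelled as plain objects with optional `columns` and `rows` attributes.

```python
from types import SimpleNamespace


def check_rows(X):
    """Hypothetical validator: returns plain rows or raises ValueError."""
    rows = [list(r) for r in getattr(X, "rows", X)]
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("X must be a non-empty, rectangular table")
    return rows


def fit_with_late_assignment(estimator, X, y):
    """Validate first, then set fitted attributes in one place."""
    # Capture names from the raw input before validation discards them.
    names = list(getattr(X, "columns", [])) or None
    X_checked = check_rows(X)  # may raise; the estimator is untouched so far

    # Only reached when validation succeeded.
    estimator.X_ = X
    estimator.y_ = y
    estimator.n_features_in_ = len(X_checked[0])
    if names is not None:
        estimator.feature_names_in_ = names
    return X_checked


# Usage: a named table populates both attributes.
est = SimpleNamespace()
table = SimpleNamespace(columns=["age", "income"], rows=[[30, 1.0], [40, 2.0]])
fit_with_late_assignment(est, table, [0, 1])
```

With this ordering, a failed validation leaves the estimator without any fitted attributes, which is the inconsistent-state concern raised in the review.
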

Comment thread src/tabpfn/finetuning/finetuned_base.py Outdated

@alanprior alanprior left a comment

LGTM

@psinger-prior psinger-prior added this pull request to the merge queue Apr 27, 2026
Merged via the queue into main with commit 6158b2c Apr 27, 2026
12 checks passed

Development

Successfully merging this pull request may close these issues.

FinetunedTabPFNClassifier drops Pandas feature names if X_val is used

4 participants