WIP: add preprocessing presets kwarg to TabPFNTSPipeline by LeoGrin · Pull Request #113 · PriorLabs/tabpfn-time-series

LeoGrin · 2026-04-22T13:33:37Z

Summary

Expose tabpfn's inference-time X/y preprocessing pipeline via a simple string enum on TabPFNTSPipeline. Today, changing it requires threading inference_config through tabpfn_model_config with PreprocessorConfig objects imported from a nested tabpfn module, which is hard to discover.

New kwarg: preprocessing: Literal["default", "none", "squashing_scaler"] = "default".

"default" keeps current behaviour (tabpfn library defaults).
"none" disables X/y preprocessing entirely (PREPROCESS_TRANSFORMS=[PreprocessorConfig("none")], REGRESSION_Y_PREPROCESS_TRANSFORMS=[None]).
"squashing_scaler" uses squashing_scaler_max10 + svd_quarter_components followed by a numeric "none" config.

An explicit inference_config in tabpfn_model_config still wins over the preset; a warning is emitted if both are supplied.
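
A minimal sketch of how the three presets could map onto an inference_config-style dict. This is not the code from this PR: `StubPreprocessorConfig` is a hypothetical stand-in for tabpfn's PreprocessorConfig, its field names are assumptions, and `build_preset_sketch` is an illustrative name. Only the preset names and the "none" values are taken from the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StubPreprocessorConfig:
    """Hypothetical stand-in for tabpfn's PreprocessorConfig (not the real class)."""

    name: str
    # Assumed field names, used here only to label the two extra settings
    # mentioned in the PR description.
    categorical_name: Optional[str] = None
    global_transformer_name: Optional[str] = None


def build_preset_sketch(preset: str) -> Optional[dict[str, Any]]:
    """Return an inference_config-style dict, or None to keep library defaults."""
    if preset == "default":
        return None
    if preset == "none":
        return {
            "PREPROCESS_TRANSFORMS": [StubPreprocessorConfig("none")],
            "REGRESSION_Y_PREPROCESS_TRANSFORMS": [None],
        }
    if preset == "squashing_scaler":
        # The description does not say what happens to the y transforms for
        # this preset, so the sketch leaves that key out.
        return {
            "PREPROCESS_TRANSFORMS": [
                StubPreprocessorConfig(
                    "squashing_scaler_max10",
                    global_transformer_name="svd_quarter_components",
                ),
                StubPreprocessorConfig("none", categorical_name="numeric"),
            ],
        }
    raise ValueError(f"Unknown preprocessing preset: {preset!r}")
```

Returning None for "default" matches the `if preset_cfg is None` early return in the reviewed diff, so the default preset never touches the user's configuration.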

Motivation

Empirical sweep on fev-bench (3 checkpoints × 2 splits × 3 preprocessing presets, 846 per-task results): on the non-lite/small-datasets split, "none" and "squashing_scaler" each give a +0.05 SQL skill improvement over the defaults, for both the library default checkpoint and the OOD-finetuned variants. On the lite split the library defaults usually win, so removing preprocessing causes a small regression there. Since neither choice wins on both splits, this is worth exposing as a user-facing knob.

Changes

tabpfn_time_series/preprocessing_presets.py (new) — PreprocessingPreset type + build_preprocessing_inference_config() helper.
tabpfn_time_series/pipeline.py — add preprocessing kwarg + _apply_preprocessing_preset() merging logic.
tests/test_preprocessing_presets.py — 6 tests covering the helper and the pipeline integration (both with and without user-supplied inference_config).
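
As an illustration of the pipeline-integration case with a user-supplied inference_config, a check along these lines could be written. This is a sketch, not the tests from the PR: `apply_preset` is a hypothetical free-function mirror of the merging logic, and the dict contents are placeholders.

```python
import warnings
from typing import Any, Optional


def apply_preset(
    tabpfn_model_config: dict[str, Any],
    preset_cfg: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical mirror of the merge rule: explicit inference_config wins."""
    if preset_cfg is None:
        return dict(tabpfn_model_config)
    if "inference_config" in tabpfn_model_config:
        warnings.warn(
            "Both a preset and an explicit inference_config were provided; "
            "ignoring the preset.",
            stacklevel=2,
        )
        return dict(tabpfn_model_config)
    return {**tabpfn_model_config, "inference_config": preset_cfg}


def check_explicit_config_wins() -> None:
    explicit = {"inference_config": {"source": "user"}}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        merged = apply_preset(explicit, {"source": "preset"})
    assert merged["inference_config"] == {"source": "user"}
    assert len(caught) == 1
    assert issubclass(caught[0].category, UserWarning)


def check_preset_is_injected() -> None:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        merged = apply_preset({"other": 1}, {"source": "preset"})
    assert merged == {"other": 1, "inference_config": {"source": "preset"}}
    assert caught == []


check_explicit_config_wins()
check_preset_is_injected()
```

Under pytest the same assertions could use `pytest.warns(UserWarning)` instead of `warnings.catch_warnings`.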

Test plan

New unit tests pass (pytest tests/test_preprocessing_presets.py).
Existing test_preprocessing.py, test_predictor.py still pass.
test_pipeline.py::test_predict_client_mode requires TabPFN-cloud credentials (prompts for login interactively), unrelated to this change.
User-facing example in README / quickstart notebook — left for follow-up.

🤖 Generated with Claude Code

Exposes tabpfn's inference-time X/y preprocessing pipeline via a simple string enum on TabPFNTSPipeline. Today, changing it requires threading `inference_config` through `tabpfn_model_config` with PreprocessorConfig objects imported from a nested tabpfn module, which is hard to discover.

New kwarg: preprocessing in {"default", "none", "squashing_scaler"}.

- "default" keeps current behaviour (tabpfn library defaults).
- "none" disables X/y preprocessing entirely (PREPROCESS_TRANSFORMS=[none], REGRESSION_Y_PREPROCESS_TRANSFORMS=[None]).
- "squashing_scaler" uses squashing_scaler_max10 + svd_quarter_components followed by a numeric "none" config.

Empirically on the fev-bench small/non-lite split, "none" and "squashing_scaler" give a +0.05 SQL skill boost over defaults for both the library default checkpoint and OOD-finetuned variants. On the fev-bench lite split the defaults usually win (small regression from removing preprocessing). Exposing the knob lets users try both.

An explicit `inference_config` in `tabpfn_model_config` still wins over the preset; a warning is emitted if both are supplied.

CLAassistant · 2026-04-22T13:33:44Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request introduces a preprocessing parameter to the TabPFNTSPipeline constructor, allowing users to select from predefined inference-time preprocessing presets. These presets are defined in a new preprocessing_presets module. A review comment pointed out that the _apply_preprocessing_preset method should consistently return a copy of the configuration dictionary to prevent accidental mutation of the global default settings and to match the method's docstring.

gemini-code-assist · 2026-04-22T13:35:08Z

+        preset_cfg = build_preprocessing_inference_config(preprocessing)
+        if preset_cfg is None:
+            return tabpfn_model_config
+        if "inference_config" in tabpfn_model_config:
+            # User-supplied inference_config takes precedence; warn so the
+            # mismatch between kwargs is discoverable.
+            warnings.warn(
+                "Both `preprocessing` and `tabpfn_model_config['inference_config']` "
+                "were provided. Using the explicit `inference_config` from "
+                "`tabpfn_model_config` and ignoring the preset.",
+                stacklevel=3,
+            )
+            return tabpfn_model_config
+        return {**tabpfn_model_config, "inference_config": preset_cfg}


The docstring for _apply_preprocessing_preset states that it injects the configuration into a copy of the input, but the implementation returns the original tabpfn_model_config object in several branches (lines 293 and 303). Since the default value for tabpfn_model_config in the constructor is a shared global dictionary (TABPFN_DEFAULT_CONFIG), returning it directly can lead to accidental mutation of the default configuration if the predictor or other components modify it in-place.

Always returning a copy ensures the original configuration remains immutable and consistent with the docstring.

        config = tabpfn_model_config.copy()
        preset_cfg = build_preprocessing_inference_config(preprocessing)
        if preset_cfg is None:
            return config
        if "inference_config" in config:
            # User-supplied inference_config takes precedence; warn so the
            # mismatch between kwargs is discoverable.
            warnings.warn(
                "Both `preprocessing` and `tabpfn_model_config['inference_config']` "
                "were provided. Using the explicit `inference_config` from "
                "`tabpfn_model_config` and ignoring the preset.",
                stacklevel=3,
            )
            return config
        config["inference_config"] = preset_cfg
        return config
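
The aliasing hazard raised in this review comment can be reproduced in isolation. The sketch below uses a hypothetical `shared_default` dict in place of TABPFN_DEFAULT_CONFIG, with placeholder keys. It also shows a caveat of the suggested fix: `dict.copy()` is shallow, so nested values remain shared unless a deep copy is taken.

```python
import copy


def return_same(config: dict) -> dict:
    """Mirrors the early-return branches that hand back the caller's dict."""
    return config


def return_shallow_copy(config: dict) -> dict:
    """Mirrors the suggested fix: always return a (shallow) copy."""
    return config.copy()


# Hypothetical stand-in for a module-level default configuration.
shared_default = {"some_option": 1, "nested": {"k": 1}}

# Without a copy, a downstream in-place write leaks into the shared default.
aliased = return_same(shared_default)
aliased["inference_config"] = {"injected": True}
leaked = "inference_config" in shared_default
shared_default.pop("inference_config")  # undo the leak for the next step

# With a shallow copy, top-level writes stay local to the returned dict.
copied = return_shallow_copy(shared_default)
copied["inference_config"] = {"injected": True}
protected = "inference_config" not in shared_default

# Nested objects are still shared by a shallow copy.
copied["nested"]["k"] = 2
nested_still_shared = shared_default["nested"]["k"] == 2

# copy.deepcopy isolates nested values as well.
isolated = copy.deepcopy(shared_default)
isolated["nested"]["k"] = 3
deep_copy_isolated = shared_default["nested"]["k"] == 2

print(leaked, protected, nested_still_shared, deep_copy_isolated)
```

Whether the shallow copy is sufficient depends on whether any downstream component mutates nested values in place, which this page does not establish.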

LeoGrin wants to merge 1 commit into main from leo/preprocessing-options
