|
1 | | -# 📊 Telemetry |
| 1 | +# Telemetry |
2 | 2 |
|
3 | 3 | This project includes lightweight, anonymous telemetry to help us improve TabPFN. If you'd rather not send telemetry, you can always opt out (see **Opting out**). |
4 | 4 |
|
5 | 5 | --- |
6 | 6 |
|
7 | | -## 🔍 What we collect |
| 7 | +## What we collect |
8 | 8 |
|
9 | | -We only gather **very high-level usage signals** — enough to guide development, never enough to identify you or your data. |
| 9 | +We only gather **very high-level usage signals** — enough to guide development, never enough to identify you or your data. |
10 | 10 |
|
11 | 11 | Here's the full list: |
12 | 12 |
|
13 | 13 | ### Events |
14 | | -- `ping` – sent when models initialize, used to check liveness |
15 | | -- `fit_called` – sent when you call `fit` |
| 14 | +- `ping` – periodic liveness heartbeat (daily / weekly / monthly cadence) |
| 15 | +- `session` – sent when you initialize a TabPFN estimator (`TabPFNClassifier`, `TabPFNRegressor`) |
| 16 | +- `model_load` – sent when TabPFN attempts to load model weights (reports `success` / `failed`) |
| 17 | +- `dataset` – sent when a dataset is passed to `fit` or `predict` (no dataset content; shape only) |
| 18 | +- `fit_called` – sent when you call `fit` |
16 | 19 | - `predict_called` – sent when you call `predict` |
17 | | -- `session` - sent whenever a user initializes a TabPFN estimator. |
| 20 | +- `extension_entry` – sent when a TabPFN extension entry point (e.g. from `tabpfn-extensions`, `tabpfn-time-series`) is invoked |
18 | 21 |
|
19 | 22 | ### Metadata (all events) |
20 | | -- `python_version` – version of Python you're running |
| 23 | +- `python_version` – Python version you're running |
21 | 24 | - `tabpfn_version` – TabPFN package version |
| 25 | +- `numpy_version` – local NumPy version |
| 26 | +- `pandas_version` – local pandas version |
| 27 | +- `gpu_type` – type of GPU TabPFN is running on |
| 28 | +- `platform_os` – operating system |
| 29 | +- `runtime_kernel` – runtime kernel (e.g. CPython) |
| 30 | +- `runtime_environment` – runtime environment (e.g. notebook / script / CI) |
22 | 31 | - `timestamp` – time of the event |
23 | | -- `numpy_vesion` - local Numpy version |
24 | | -- `pandas_version` - local Pandas version |
25 | | -- `gpu_type` - type of GPU TabPFN is running on. |
26 | | -- `install_date` - `year-month-day` when TabPFN was used for the first time |
27 | | -- `install_id` - unique, random and anonymous installation ID. |
28 | | - |
29 | | -### Extra metadata (`fit` / `predict` only) |
30 | | -- `task` – whether the call was for **classification** or **regression** |
31 | | -- `num_rows` – *rounded* number of rows in your dataset |
32 | | -- `num_columns` – *rounded* number of columns in your dataset |
33 | | -- `duration_ms` – time it took to complete the call |
| 32 | +- `install_date` – `year-month-day` when TabPFN was used for the first time |
| 33 | +- `install_id` – unique, random and anonymous installation ID |
| 34 | + |
| 35 | +### Extra metadata (per-event) |
| 36 | +- `fit_called` / `predict_called`: `task` (classification or regression), `num_rows` (*rounded*), `num_columns` (*rounded*), `duration_ms` |
| 37 | +- `model_load`: `model_name` (HuggingFace repo id), `status` |
| 38 | +- `dataset`: `task`, `role` (train / test), `num_rows` (*rounded*), `num_columns` (*rounded*) |
| 39 | +- `extension_entry`: `extension_name` |
34 | 40 |
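To make the field lists above concrete, here is a minimal sketch of what a `fit_called` payload might look like once the shared and per-event metadata are combined. The field names come from the lists above; the function name `build_fit_event`, the flat dictionary layout, the placeholder values, and the use of `uuid4` for `install_id` are assumptions for illustration only and do not describe TabPFN's actual implementation.

```python
import platform
import uuid
from datetime import datetime, timezone


def build_fit_event(task: str, num_rows: int, num_columns: int, duration_ms: int) -> dict:
    """Assemble an illustrative `fit_called` event (hypothetical helper).

    `num_rows` and `num_columns` are expected to be rounded already.
    """
    now = datetime.now(timezone.utc)
    return {
        "event": "fit_called",
        # Metadata listed for all events
        "python_version": platform.python_version(),
        "tabpfn_version": "x.y.z",          # placeholder, not a real version
        "platform_os": platform.system(),
        "runtime_kernel": platform.python_implementation(),
        "timestamp": now.isoformat(),
        "install_date": now.strftime("%Y-%m-%d"),  # placeholder: first-use date
        "install_id": str(uuid.uuid4()),    # assumption: random, anonymous ID
        # Extra metadata for fit_called / predict_called
        "task": task,
        "num_rows": num_rows,
        "num_columns": num_columns,
        "duration_ms": duration_ms,
    }


event = build_fit_event("classification", 1000, 20, 350)
print(sorted(event))
```

Nothing in this sketch carries dataset content: only the task type, rounded shape, and timing appear alongside the environment metadata.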
|
35 | 41 | --- |
36 | 42 |
|
37 | | -## 🛡️ How we protect your privacy |
| 43 | +## How we protect your privacy |
38 | 44 |
|
39 | | -- **No inputs, no outputs, no code** ever leave your machine. |
40 | | -- **No personal data** is collected. |
41 | | -- Dataset shapes are **rounded into ranges** (e.g. `(953, 17)` → `(1000, 20)`) so exact dimensionalities can't be linked back to you. |
42 | | -- The data is strictly anonymous — it cannot be tied to individuals, projects, or datasets. |
| 45 | +- **No inputs, no outputs, no code** ever leave your machine. |
| 46 | +- **No personal data** is collected. |
| 47 | +- Dataset shapes are **rounded into ranges** (e.g. `(953, 17)` → `(1000, 20)`) so exact dimensionalities can't be linked back to you. |
| 48 | +- The data is strictly anonymous — it cannot be tied to individuals, projects, or datasets. |
43 | 49 |
|
44 | | -This approach lets us understand dataset *patterns* (e.g. "most users run with ~1k features") while ensuring no one's data is exposed. |
| 50 | +This approach lets us understand dataset *patterns* (e.g. "most users run with ~1k features") while ensuring no one's data is exposed. |
45 | 51 |
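The rounding described above can be sketched as follows. This is one possible scheme, rounding each dimension to a single significant digit, chosen only because it reproduces the `(953, 17)` → `(1000, 20)` example; the actual rounding rule used by TabPFN is not stated here, and the function names are hypothetical.

```python
def round_to_one_significant_digit(n: int) -> int:
    """Round a non-negative integer to one significant digit (half rounds up)."""
    if n < 10:
        return max(n, 0)
    # Largest power of ten not exceeding n, e.g. 100 for 953
    magnitude = 10 ** (len(str(n)) - 1)
    # Integer arithmetic avoids floating-point surprises
    return (n + magnitude // 2) // magnitude * magnitude


def round_shape(shape: tuple[int, int]) -> tuple[int, int]:
    """Apply the rounding to (num_rows, num_columns)."""
    rows, cols = shape
    return (round_to_one_significant_digit(rows),
            round_to_one_significant_digit(cols))


print(round_shape((953, 17)))  # (1000, 20)
```

Under this scheme every dataset with between 950 and 1499 rows reports `1000`, so the exact row count cannot be recovered from the reported value.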
|
46 | 52 | --- |
47 | 53 |
|
48 | | -## 🤔 Why collect telemetry? |
| 54 | +## Why collect telemetry? |
49 | 55 |
|
50 | | -Open-source projects don't get much feedback unless people file issues. Telemetry helps us: |
51 | | -- See which parts of TabPFN are most used (fit vs predict, classification vs regression) |
52 | | -- Detect performance bottlenecks and stability issues |
53 | | -- Prioritize improvements that benefit the most users |
| 56 | +Open-source projects don't get much feedback unless people file issues. Telemetry helps us: |
| 57 | +- See which parts of TabPFN are most used (fit vs predict, classification vs regression) |
| 58 | +- Detect performance bottlenecks and stability issues |
| 59 | +- Prioritize improvements that benefit the most users |
54 | 60 |
|
55 | | -This information goes directly into **making TabPFN better** for the community. |
| 61 | +This information goes directly into **making TabPFN better** for the community. |
56 | 62 |
|
57 | 63 | --- |
58 | 64 |
|
59 | | -## 🚫 Opting out |
| 65 | +## Opting out |
60 | 66 |
|
61 | 67 | Don't want to send telemetry? No problem — just set the environment variable: |
62 | 68 |
|
|