Commit b24c02f

docs: update TELEMETRY.md to reflect current event set (#893)
1 parent 135d29f commit b24c02f

1 file changed

Lines changed: 37 additions & 31 deletions

TELEMETRY.md

@@ -1,62 +1,68 @@
1-
# 📊 Telemetry
1+
# Telemetry
22

33
This project includes lightweight, anonymous telemetry to help us improve TabPFN. If you'd rather not send telemetry, you can always opt out (see **Opting out**).
44

55
---
66

7-
## 🔍 What we collect
7+
## What we collect
88

9-
We only gather **very high-level usage signals** — enough to guide development, never enough to identify you or your data.
9+
We only gather **very high-level usage signals** — enough to guide development, never enough to identify you or your data.
1010

1111
Here's the full list:
1212

1313
### Events
14-
- `ping` – sent when models initialize, used to check liveness
15-
- `fit_called` – sent when you call `fit`
14+
- `ping` – periodic liveness heartbeat (daily / weekly / monthly cadence)
15+
- `session` – sent when you initialize a TabPFN estimator (`TabPFNClassifier`, `TabPFNRegressor`)
16+
- `model_load` – sent when TabPFN attempts to load model weights (reports `success` / `failed`)
17+
- `dataset` – sent when a dataset is passed to `fit` or `predict` (no dataset content; shape only)
18+
- `fit_called` – sent when you call `fit`
1619
- `predict_called` – sent when you call `predict`
17-
- `session` - sent whenever a user initializes a TabPFN estimator.
20+
- `extension_entry` – sent when a TabPFN extension entry point (e.g. from `tabpfn-extensions`, `tabpfn-time-series`) is invoked
1821

1922
### Metadata (all events)
20-
- `python_version` – version of Python you're running
23+
- `python_version` – Python version you're running
2124
- `tabpfn_version` – TabPFN package version
25+
- `numpy_version` – local NumPy version
26+
- `pandas_version` – local pandas version
27+
- `gpu_type` – type of GPU TabPFN is running on
28+
- `platform_os` – operating system
29+
- `runtime_kernel` – runtime kernel (e.g. CPython)
30+
- `runtime_environment` – runtime environment (e.g. notebook / script / CI)
2231
- `timestamp` – time of the event
23-
- `numpy_vesion` - local Numpy version
24-
- `pandas_version` - local Pandas version
25-
- `gpu_type` - type of GPU TabPFN is running on.
26-
- `install_date` - `year-month-day` when TabPFN was used for the first time
27-
- `install_id` - unique, random and anonymous installation ID.
28-
29-
### Extra metadata (`fit` / `predict` only)
30-
- `task` – whether the call was for **classification** or **regression**
31-
- `num_rows` – *rounded* number of rows in your dataset
32-
- `num_columns` – *rounded* number of columns in your dataset
33-
- `duration_ms` – time it took to complete the call
32+
- `install_date` – `year-month-day` when TabPFN was used for the first time
33+
- `install_id` – unique, random and anonymous installation ID
34+
35+
### Extra metadata (per-event)
36+
- `fit_called` / `predict_called`: `task` (classification or regression), `num_rows` (*rounded*), `num_columns` (*rounded*), `duration_ms`
37+
- `model_load`: `model_name` (HuggingFace repo id), `status`
38+
- `dataset`: `task`, `role` (train / test), `num_rows` (*rounded*), `num_columns` (*rounded*)
39+
- `extension_entry`: `extension_name`
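
The per-event list above reads as a small schema, which can be sketched as a lookup table plus a check. This is only an illustration of the documented field names: `EXTRA_FIELDS` and `unexpected_fields` are hypothetical names and not part of TabPFN, and events without an entry in the per-event list are assumed here to carry no extra fields.

```python
# Hypothetical sketch: field names are taken from the list above; the helper
# itself is an illustration, not TabPFN's actual telemetry code.
EXTRA_FIELDS = {
    "fit_called": frozenset({"task", "num_rows", "num_columns", "duration_ms"}),
    "predict_called": frozenset({"task", "num_rows", "num_columns", "duration_ms"}),
    "model_load": frozenset({"model_name", "status"}),
    "dataset": frozenset({"task", "role", "num_rows", "num_columns"}),
    "extension_entry": frozenset({"extension_name"}),
}


def unexpected_fields(event: str, payload: dict) -> set:
    """Return payload keys that the documented schema does not list for `event`.

    Assumption: events absent from EXTRA_FIELDS carry no extra fields.
    """
    allowed = EXTRA_FIELDS.get(event, frozenset())
    return set(payload) - allowed


# A `dataset` payload with only the documented fields passes the check.
ok = unexpected_fields(
    "dataset",
    {"task": "classification", "role": "train", "num_rows": 1000, "num_columns": 20},
)

# An undocumented field (here a made-up "path") is flagged.
flagged = unexpected_fields(
    "model_load", {"model_name": "example/repo", "status": "success", "path": "/tmp"}
)
```

A check like this could make it easy to confirm that a payload carries shape-level information only, as the document states.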
3440

3541
---
3642

37-
## 🛡️ How we protect your privacy
43+
## How we protect your privacy
3844

39-
- **No inputs, no outputs, no code** ever leave your machine.
40-
- **No personal data** is collected.
41-
- Dataset shapes are **rounded into ranges** (e.g. `(953, 17)` → `(1000, 20)`) so exact dimensionalities can't be linked back to you.
42-
- The data is strictly anonymous — it cannot be tied to individuals, projects, or datasets.
45+
- **No inputs, no outputs, no code** ever leave your machine.
46+
- **No personal data** is collected.
47+
- Dataset shapes are **rounded into ranges** (e.g. `(953, 17)` → `(1000, 20)`) so exact dimensionalities can't be linked back to you.
48+
- The data is strictly anonymous — it cannot be tied to individuals, projects, or datasets.
4349

44-
This approach lets us understand dataset *patterns* (e.g. "most users run with ~1k features") while ensuring no one's data is exposed.
50+
This approach lets us understand dataset *patterns* (e.g. "most users run with ~1k features") while ensuring no one's data is exposed.
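
The document does not spell out the exact rounding rule. As a minimal sketch, rounding each dimension to one significant figure reproduces the documented example `(953, 17)` to `(1000, 20)`. The function names `round_dim` and `round_shape` are hypothetical, and the real bucketing may differ.

```python
def round_dim(n: int) -> int:
    """Round a non-negative count to one significant figure (assumed scheme)."""
    if n <= 0:
        return 0
    # Largest power of ten not exceeding n, e.g. 953 -> 100, 17 -> 10.
    magnitude = 10 ** (len(str(n)) - 1)
    # Integer round-half-up to the nearest multiple of `magnitude`.
    return (n + magnitude // 2) // magnitude * magnitude


def round_shape(shape: tuple) -> tuple:
    """Apply the assumed rounding to every dimension of a dataset shape."""
    return tuple(round_dim(d) for d in shape)


rounded = round_shape((953, 17))
```

Under this assumed scheme, many nearby shapes collapse to the same reported value, which is why the exact dimensions cannot be recovered from the telemetry.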
4551

4652
---
4753

48-
## 🤔 Why collect telemetry?
54+
## Why collect telemetry?
4955

50-
Open-source projects don't get much feedback unless people file issues. Telemetry helps us:
51-
- See which parts of TabPFN are most used (fit vs predict, classification vs regression)
52-
- Detect performance bottlenecks and stability issues
53-
- Prioritize improvements that benefit the most users
56+
Open-source projects don't get much feedback unless people file issues. Telemetry helps us:
57+
- See which parts of TabPFN are most used (fit vs predict, classification vs regression)
58+
- Detect performance bottlenecks and stability issues
59+
- Prioritize improvements that benefit the most users
5460

55-
This information goes directly into **making TabPFN better** for the community.
61+
This information goes directly into **making TabPFN better** for the community.
5662

5763
---
5864

59-
## 🚫 Opting out
65+
## Opting out
6066

6167
Don't want to send telemetry? No problem — just set the environment variable:
6268
