In addition to #24, this one adds full support for Python 3.13 and Python
3.14 and drops support for 3.9 (because it is EOL).
Pandas now supports up to Python 3.14, which was the blocker for
supporting these versions in `tab-err`.
See:
- https://pyreadiness.org/3.9/
- https://pyreadiness.org/3.13
- https://pyreadiness.org/3.14
---------
Signed-off-by: Sebastian Jäger <git@sebastian-jaeger.me>
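The support window described in the commit message can be sketched as a small runtime check. This is only an illustration: the bounds below assume that dropping 3.9 makes 3.10 the oldest supported interpreter and that 3.14 is the newest tested one; neither value is quoted from the project's packaging metadata, and `support_status` is a hypothetical helper.

```python
import sys

# Assumption: dropping 3.9 makes 3.10 the oldest supported interpreter,
# and 3.14 is the newest tested one. Neither bound is taken from the
# project's packaging metadata.
MIN_SUPPORTED = (3, 10)
MAX_TESTED = (3, 14)


def support_status(version: tuple[int, int]) -> str:
    """Classify a (major, minor) version against the assumed support window."""
    if version < MIN_SUPPORTED:
        return "unsupported"
    if version > MAX_TESTED:
        return "untested"
    return "supported"


if __name__ == "__main__":
    current = sys.version_info[:2]
    print(f"Python {current[0]}.{current[1]}: {support_status(current)}")
```

In a real project this constraint would normally live in the packaging metadata rather than in code; the function form is used here only because it is easy to run and inspect.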
 `tab_err` injects realistic errors into tabular data such as database tables and DataFrames.
 The library is developed and maintained by the [Cognitive Algorithms Lab](https://calgo-lab.de/) at BHT Berlin.
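As a rough illustration of what "injecting errors into tabular data" can mean, the sketch below corrupts a randomly chosen subset of cells in one column with adjacent-character typos. It uses only the standard library and a list of dicts instead of a DataFrame. `inject_typos` and its parameters are hypothetical names chosen for this example and are not part of the `tab_err` API.

```python
import random


def inject_typos(rows, column, error_rate, seed=0):
    """Return (corrupted_rows, target_indices) without modifying `rows`.

    A fraction `error_rate` of the rows is picked uniformly at random,
    independent of the cell values, and two adjacent characters of the
    value in `column` are swapped in each picked row.
    """
    rng = random.Random(seed)
    n_errors = round(error_rate * len(rows))
    targets = sorted(rng.sample(range(len(rows)), n_errors))

    corrupted = [dict(row) for row in rows]  # shallow copy of every row
    for i in targets:
        value = str(corrupted[i][column])
        if len(value) >= 2:
            pos = rng.randrange(len(value) - 1)
            # Swap the characters at pos and pos + 1.
            value = value[:pos] + value[pos + 1] + value[pos] + value[pos + 2:]
        corrupted[i][column] = value
    return corrupted, targets


if __name__ == "__main__":
    table = [
        {"id": 1, "species": "setosa"},
        {"id": 2, "species": "versicolor"},
        {"id": 3, "species": "virginica"},
        {"id": 4, "species": "setosa"},
    ]
    dirty, changed = inject_typos(table, "species", error_rate=0.5, seed=42)
    for row in dirty:
        print(row)
    print("perturbed row indices:", changed)
```

Because the perturbed rows are chosen without looking at the data, this sketch is closest in spirit to the "completely at random" case; mechanisms that depend on cell values would need a different selection step.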
@@ -28,13 +28,15 @@ Researchers and data practitioners can generate errors in a controlled way, eval
 ## How it Works

 The library's building blocks are `ErrorMechanism`s, `ErrorType`s, and `ErrorModel`s.
-- An `ErrorMechanism` describes the error's distribution - that's *where* incorrect cells appear in the table. We support *erroneous at random* (EAR), *erroneous not at random* (ENAR) and *erroneous completely at random* (ECAR).
-- An `ErrorType` describes *how* the value is wrong: a typo, an outlier, a category swap, and so on. Read the documentation for a [full list of supported error types](https://tab-err.readthedocs.io/latest/api/tab_err/error_type/index.html).
+
+- An `ErrorMechanism` describes the error's distribution - that's _where_ incorrect cells appear in the table. We support _erroneous at random_ (EAR), _erroneous not at random_ (ENAR) and _erroneous completely at random_ (ECAR).
+- An `ErrorType` describes _how_ the value is wrong: a typo, an outlier, a category swap, and so on. Read the documentation for a [full list of supported error types](https://tab-err.readthedocs.io/latest/api/tab_err/error_type/index.html).
 - An `ErrorModel` is a set of mechanisms and types to perturb existing data with realistic errors. It is shareable as metadata.

 `tab_err` is supported by a `pandas` backend.

 ## Examples
+
 ```python
 from sklearn.datasets import load_iris
@@ -77,7 +79,7 @@ For a detailed guide and more examples, see our [Getting Started Notebook](https

 ## Where to get it

-The source code is hosted on GitHub at <https://github.com/calgo-lab/tab_err>.
+The source code is hosted on GitHub at <https://github.com/calgo-lab/tab_err>.
 Binary installers for the latest releases are available at the Python Package Index (PyPI) <https://pypi.org/project/tab-err>.

 ```sh
@@ -96,25 +98,26 @@ Develop on feature branches and open pull requests when you're ready.
 Make sure that your changes are tested, documented, and clearly described in the pull request.

 ## Citation
+
 If you use the error model that's underlying `tab_err` for a scientific publication, we would appreciate your citation.

-```
+```bib
 @article{10.1145/3774914,
-author = {Jung, Philipp and J\"{a}ger, Sebastian and Chandler, Nicholas and Biessmann, Felix},
-title = {Towards Realistic Error Models for Tabular Data},
-year = {2025},
-issue_date = {December 2025},
-publisher = {Association for Computing Machinery},
-address = {New York, NY, USA},
-volume = {17},
-number = {4},
-issn = {1936-1955},
-url = {https://doi.org/10.1145/3774914},
-doi = {10.1145/3774914},
-journal = {J. Data and Information Quality},
-month = dec,
-articleno = {28},
-numpages = {27},
-keywords = {Tabular data, data quality, data errors, data error generation, error model, realistic error model, error type}
+author = {Jung, Philipp and J\"{a}ger, Sebastian and Chandler, Nicholas and Biessmann, Felix},
+title = {Towards Realistic Error Models for Tabular Data},
+year = {2025},
+issue_date = {December 2025},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+volume = {17},
+number = {4},
+issn = {1936-1955},
+url = {https://doi.org/10.1145/3774914},
+doi = {10.1145/3774914},
+journal = {J. Data and Information Quality},
+month = dec,
+articleno = {28},
+numpages = {27},
+keywords = {Tabular data, data quality, data errors, data error generation, error model, realistic error model, error type}