A linting library and tools for machine learning, statistical modelling, data and code.
- Introduction
- Quick Start
- Installation
- Configuration
- Integrations
- Lint Catalog
- License
- Contributing
- References
- Acknowledgements
DataLinter is a library for contextual linting of data and code. Development started as a Julia rewrite of Google's data linter. The redesign aims to provide a richer and faster experience while retaining the baseline benefits outlined in the original paper. On top of that, DataLinter adds support for data contexts, such as code snippets or information about the type of analysis, which can lead to the detection of more complex, conceptual issues in data and code quality.
- 28 data+code linters (including the Google linters)
- Zero-config CLI and HTTP server modes
- Production-ready Docker image and GitHub Actions integration
- Flexible code querying through ParSitter.jl
- First-class R language support through tree-sitter-based code parsing
- Fully customizable rule engine (see configuration docs)
Try it in seconds with Docker (no installation required):
# Lint a dataset (from the root directory of the repository)
datalinter ./test/data/imbalanced_data.csv \
--code-path ./test/code/r_snippet_imbalanced.r \
--config-path ./config/r_modelling_config.toml \
--log-level error
# Or run the server for HTTP API use
./datalinterserver \
-p 10000 \
--config-path ./config/r_modelling_config.toml \
--log-level debug
The latest Docker image can be downloaded with
docker pull ghcr.io/zgornel/datalinter-compiled:latest
Specific versions are also tagged and accessible with (example for v0.1.2)
docker pull ghcr.io/zgornel/datalinter-compiled:v0.1.2
Download the latest datalinter-compiled-binary.zip from the Releases page. Contains both CLI and server binaries.
Note: Windows and macOS users should use Docker or install via Julia.
DataLinter can also be installed from the Julia REPL with
using Pkg; Pkg.add(url="https://github.com/zgornel/DataLinter")
Check out the documentation for information on configuring, running and integrating the linters.
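Once the `datalinter` binary is on the PATH, the Quick Start invocation can be wrapped to lint several datasets against the same code snippet and configuration. The following is a minimal sketch, not part of DataLinter: the helper name `lint_all` and the `DATALINTER_BIN` override are hypothetical, and it assumes only that a failed run is signalled by a non-zero exit status. It uses only the flags shown in the Quick Start.

```shell
#!/usr/bin/env bash
# Sketch: run the datalinter CLI once per CSV file in a directory.
# Hypothetical names (not from DataLinter): lint_all, DATALINTER_BIN.
# Assumption: a non-zero exit status from the CLI means the run failed.

lint_all() {
    local data_dir="$1"      # directory containing .csv files
    local code_path="$2"     # code snippet giving the linting context
    local config_path="$3"   # TOML configuration file
    local bin="${DATALINTER_BIN:-datalinter}"
    local failures=0
    local csv

    for csv in "$data_dir"/*.csv; do
        # Skip the unexpanded pattern when the directory has no CSV files.
        [ -e "$csv" ] || continue
        echo "== $csv"
        # Same flags as in the Quick Start example.
        if ! "$bin" "$csv" \
            --code-path "$code_path" \
            --config-path "$config_path" \
            --log-level error; then
            failures=$((failures + 1))
        fi
    done

    # Succeed only if every invocation succeeded.
    [ "$failures" -eq 0 ]
}
```

For example, from the root directory of the repository, `lint_all ./test/data ./test/code/r_snippet_imbalanced.r ./config/r_modelling_config.toml` would run the linter once for each CSV file under `./test/data`.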
Available integrations:
- RStudio
- Jupyter Notebooks
- GitHub Actions
- GitLab CI (upcoming)
- VS Code (upcoming)
DataLinter ships with 28 built-in linters. Descriptions are available here.
This code has an MIT license.
Please see CONTRIBUTING.md on how to contribute.
To report a bug or request a feature, please file an issue.
Recent changes can be found in CHANGELOG.md.
[1] https://en.wikipedia.org/wiki/Lint_(software)
[2] N. Hynes, D. Sculley, M. Terry, "The data linter: Lightweight, automated sanity checking for ML data sets", NIPS MLSys Workshop, 2017; paper
[3] The data-linter code repository
The initial version of DataLinter was inspired by this work from Google Brain research.
