zgornel/DataLinter

data | linter

Linting library and tools for machine learning, statistical modelling, data, code.

Introduction

DataLinter is a library for contextual linting [1] of data and code. Development began as a Julia rewrite of Google's data linter [3]. The redesign aimed to provide a richer and faster experience while retaining the baseline benefits outlined in the original paper [2]. On top of this, DataLinter adds support for data contexts, such as code snippets or information about the type of analysis, which can help detect more complex, conceptual issues relating to data and code quality.

Key Features

Quick Start

Try it in seconds with Docker (no installation required):

# Lint a dataset (from the root directory of the repository)
datalinter ./test/data/imbalanced_data.csv \
    --code-path ./test/code/r_snippet_imbalanced.r \
    --config-path ./config/r_modelling_config.toml \
    --log-level error

# Or run the server for HTTP API use
./datalinterserver \
    -p 10000 \
    --config-path ./config/r_modelling_config.toml \
    --log-level debug
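
The CLI invocation above can be wrapped to lint several datasets with one configuration. The following is a minimal sketch, not part of DataLinter: `lint_dir` and the `DATALINTER_CMD` variable are hypothetical names, only flags shown in the Quick Start are used, and it assumes `--code-path` may be omitted when no code snippet accompanies the data.

```shell
# Hypothetical helper (not shipped with DataLinter): lint every CSV file
# in a directory using a single config file.
# Assumption: `--code-path` is optional and can be left out.
# Set DATALINTER_CMD="echo datalinter" for a dry run that only prints commands.
lint_dir() {
    local data_dir="$1"
    local config_path="$2"
    local cmd="${DATALINTER_CMD:-datalinter}"
    local csv
    for csv in "$data_dir"/*.csv; do
        [ -e "$csv" ] || continue  # skip when the glob matches nothing
        # $cmd is intentionally unquoted so "echo datalinter" splits into words
        $cmd "$csv" --config-path "$config_path" --log-level error
    done
}
```

For example, `DATALINTER_CMD="echo datalinter" lint_dir ./test/data ./config/r_modelling_config.toml` prints one command per CSV file without running the linter.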

Installation

Docker image

The latest Docker image can be downloaded with

docker pull ghcr.io/zgornel/datalinter-compiled:latest

Specific versions are also tagged and can be pulled by tag (example for v0.1.2):

docker pull ghcr.io/zgornel/datalinter-compiled:v0.1.2
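
To keep the pinned version in one place, for instance in a CI script, the image reference can be built from a tag. This is a small sketch: `datalinter_image` is a hypothetical helper name, and only the registry path and tags shown above are taken from this README.

```shell
# Hypothetical helper: print the image reference for a given tag,
# defaulting to "latest" when no tag is passed.
datalinter_image() {
    local tag="${1:-latest}"
    printf 'ghcr.io/zgornel/datalinter-compiled:%s\n' "$tag"
}

# Usage (requires Docker): docker pull "$(datalinter_image v0.1.2)"
```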

Pre-compiled binaries (Linux x86-64)

Download the latest datalinter-compiled-binary.zip from the Releases page. The archive contains both the CLI and server binaries.

Note: Windows and macOS users should use Docker or install via Julia.

Julia

DataLinter can also be installed from the Julia REPL with

using Pkg; Pkg.add(url="https://github.com/zgornel/DataLinter")

Configuration

Check out the documentation for information on configuring, running and integrating the linters.

Integrations

Available integrations:

Lint Catalog

DataLinter ships with 28 built-in linters. Descriptions are available here.

License

This code is released under the MIT license.

Contributing

Please see CONTRIBUTING.md for how to contribute.

To report a bug or request a feature, please file an issue.

Recent changes can be found in CHANGELOG.md.

References

[1] https://en.wikipedia.org/wiki/Lint_(software)

[2] N. Hynes, D. Sculley, M. Terry, "The data linter: Lightweight, automated sanity checking for ML data sets", NIPS MLSys Workshop, 2017

[3] The data-linter code repository

Acknowledgements

The initial version of DataLinter was inspired by the data linter work [2, 3] from Google Brain.
