data-curation

Here are 160 public repositories matching this topic...

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Updated Jan 13, 2026
  • Python

fastdup is a free tool for rapidly generating insights from image and video datasets. It helps improve the quality of both images and labels while reducing data operation costs at scale.

  • Updated Apr 14, 2026
  • Python

PyTorch dataset debugger for computer vision — pause training, mine live loss signals to surface mislabels, class imbalance & outliers, then curate your image, video & LiDAR data without restarting

  • Updated Jun 26, 2026
  • Python

A tool for downloading from public image boards that allow scraping, previewing your images and tags, and editing them. Additional tabs let you download other code repositories as well as state-of-the-art (S.O.T.A.) diffusion and auto-tag/caption models. Custom datasets can be added.

  • Updated Jun 27, 2026
  • Python
