Commit c50489b
feat: add classify, crawls, export CLI commands
Unify the full pipeline under the corpus CLI:
- `corpus crawls` lists available CDX-filtered crawls from R2
- `corpus classify` wraps Python ML classification as a subprocess
- `corpus export` wraps HuggingFace parquet export via uv
- `corpus status` now shows LLM classification breakdown by type
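
A minimal sketch of the subprocess-wrapping pattern that `corpus classify` and `corpus export` describe, not the code in this commit. The helper names `build_command` and `run_step`, the script name `classify.py`, and the `--limit` flag are hypothetical; only the `uv run` launcher and the standard-library `subprocess` call are real:

```python
import subprocess
import sys
from typing import Sequence


def build_command(script: str, args: Sequence[str] = (), use_uv: bool = False) -> list[str]:
    """Build the argv for a wrapped Python step (hypothetical helper).

    With use_uv=True the script is launched through `uv run`; otherwise the
    interpreter running this process is used.
    """
    launcher = ["uv", "run"] if use_uv else [sys.executable]
    return [*launcher, script, *args]


def run_step(argv: Sequence[str]) -> int:
    """Run the child process and return its exit code.

    stdout/stderr are inherited, so the child's output streams straight to
    the terminal; check=False leaves error handling to the caller.
    """
    completed = subprocess.run(list(argv), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical invocation: script name and flag are placeholders.
    cmd = build_command("classify.py", ["--limit", "10"], use_uv=True)
    print("would run:", " ".join(cmd))
```

Returning the child's exit code lets the CLI propagate failures from the wrapped step instead of hiding them.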
Rewrite README to reflect the complete pipeline flow.
1 parent 5b2fedd
6 files changed
Lines changed: 553 additions & 265 deletions