
Commit c50489b

feat: add classify, crawls, export CLI commands
Unify the full pipeline under the corpus CLI:
- `corpus crawls` lists available CDX-filtered crawls from R2
- `corpus classify` wraps Python ML classification as subprocess
- `corpus export` wraps HuggingFace parquet export via uv
- `corpus status` now shows LLM classification breakdown by type

Rewrite README to reflect the complete pipeline flow.
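
The subprocess-wrapping pattern named in the commit message can be sketched roughly as below. This is an illustration only, not the code in this commit: the helper names `build_uv_command` and `run_wrapped` are hypothetical, and the sketch is in Python for convenience even though the language of the `corpus` CLI itself is not shown here. It assumes `uv run <script>` is how the Python side is launched.

```python
import subprocess
import sys
from typing import Sequence


def build_uv_command(script: str, extra_args: Sequence[str] = ()) -> list[str]:
    """Build the argv that runs a Python script through `uv run`.

    Hypothetical helper: the script path and arguments are supplied by the caller.
    """
    return ["uv", "run", script, *extra_args]


def run_wrapped(argv: Sequence[str]) -> int:
    """Run a child process and return its exit code.

    stdout/stderr are inherited, so the child's output streams straight to the
    terminal. check=False lets the caller decide how to treat a non-zero exit.
    """
    completed = subprocess.run(list(argv), check=False)
    return completed.returncode
```

A wrapper command would then typically propagate the child's status, for example `sys.exit(run_wrapped(build_uv_command("some_script.py", ["--flag"])))`, where the script name and flag are placeholders.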
1 parent 5b2fedd commit c50489b

6 files changed

Lines changed: 553 additions & 265 deletions
