derekallman
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 22 additions & 0 deletions
diff --git a/Cargo.lock b/Cargo.lock
Lines changed: 3 additions & 3 deletions
diff --git a/Cargo.toml b/Cargo.toml
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 5 additions & 5 deletions
diff --git a/ROADMAP.md b/ROADMAP.md
Lines changed: 1 addition & 9 deletions
diff --git a/crates/hotcoco-cli/Cargo.toml b/crates/hotcoco-cli/Cargo.toml
Lines changed: 1 addition & 1 deletion
diff --git a/crates/hotcoco-pyo3/Cargo.toml b/crates/hotcoco-pyo3/Cargo.toml
Lines changed: 1 addition & 1 deletion
@@ -133,6 +133,8 @@ site/
 crates/hotcoco-pyo3/data/annotations/
 crates/hotcoco-pyo3/data/*.json
 crates/hotcoco-pyo3/data/__pycache__/
+# Adversarial harness output (generated, not committed)
+crates/hotcoco-pyo3/data/parity_failures/
 
 # uv lock file (transient dev env)
 crates/hotcoco-pyo3/uv.lock
 
@@ -7,8 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
 
 ## [Unreleased]
 
+## [0.2.0] - 2026-03-11
+
 ### Added
 
+- Objects365 benchmark results (80k images, 365 categories, ~1.2M detections): hotcoco **39×** vs pycocotools and **14×** vs faster-coco-eval; peak committed RAM 8 GB vs 24–30 GB for alternatives
+- `bench_objects365.py` now includes pycocotools as a third runner; Windows support (`peak_wset` + pagefile for memory measurement, `.exe` binary name); `_bench_python_runner` shared helper; process-tree memory tracking via psutil
+
+### Changed
+
+- Feature comparison table in `docs/benchmarks.md` corrected: faster-coco-eval installation (prebuilt wheels available), metric parity (exact vs pycocotools), LVIS support (`lvis_style=True`), per-class AP (`extended_metrics`), Python version floor (3.7+)
+- Parity tolerance claim updated from flat "≤1e-4" to per-type breakdown: bbox ≤1e-4, segm ≤2e-4, keypoints exact
+- Benchmark numbers in `README.md` and `docs/index.md` synced to current bench.py output (bbox 0.41s 23×, segm 0.49s 18.6×, kpts 0.21s 12.7×); corrected detection count from ~43,700 to 36,781
+- Documentation: added paper citations for COCO eval (Lin et al. ECCV 2014), OKS (cocodataset.org), LVIS (Gupta et al. ECCV 2019), and TIDE (Bolya et al. ECCV 2020, arXiv); area range notation clarified to square pixels (px²); LVIS frequency definition corrected from instance count to training image count
+
+### Added
+
+- `ConfusionMatrix.cat_names` / `confusion_matrix()` dict now includes `"cat_names"` — category names parallel to `cat_ids`, eliminating a manual `load_cats` lookup after computing a confusion matrix
+- `EvalResults.hotcoco_version` — records the library version that produced the results file; included in the `results()` dict and saved JSON
+- `TideErrors` now derives `Serialize` (Rust) — can be serialized directly with `serde_json`
+
+### Changed
+
+- `EvalResults::to_json_string()` renamed to `to_json()` for consistency with Rust naming conventions
+
 - `COCOeval.results(per_class=False)` — return serializable evaluation results as a dict; `save_results(path, per_class=False)` writes the same structure as pretty-printed JSON
 - `coco-eval --output / -o <path>` — CLI flag to write evaluation results JSON after evaluation (always includes per-category AP)
 - `AreaRange` struct in `hotcoco::params` (re-exported from crate root) — replaces the two parallel `area_rng` / `area_rng_lbl` vecs in `Params` with a single `Vec<AreaRange { label, range }>`
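
The per-type parity tolerances in the 0.2.0 entry above (bbox ≤1e-4, segm ≤2e-4, keypoints exact) can be illustrated with a minimal sketch. This is not hotcoco's parity test suite: `PARITY_TOLERANCE` and `within_parity` are hypothetical names, the inputs are plain lists of floats rather than hotcoco objects, and the numbers are placeholders chosen only to exercise each threshold.

```python
# Per-type absolute tolerances quoted in the changelog entry.
# A tolerance of 0.0 means the two values must match exactly.
PARITY_TOLERANCE = {"bbox": 1e-4, "segm": 2e-4, "keypoints": 0.0}


def within_parity(iou_type, ours, reference):
    """Return True if every metric in `ours` is within tolerance of `reference`.

    Hypothetical helper: both arguments are plain sequences of floats
    (for example summary AP/AR values), not hotcoco objects.
    """
    tol = PARITY_TOLERANCE[iou_type]
    if len(ours) != len(reference):
        return False
    return all(abs(a - b) <= tol for a, b in zip(ours, reference))


# Placeholder numbers, not measured results.
print(within_parity("bbox", [0.50005], [0.5]))    # difference of about 5e-5
print(within_parity("bbox", [0.50015], [0.5]))    # difference of about 1.5e-4
print(within_parity("segm", [0.50015], [0.5]))    # same difference, looser bound
print(within_parity("keypoints", [0.7], [0.7]))   # exact match required
```

A difference of about 1.5e-4 fails the bbox bound but passes the segm bound, which is the distinction the per-type breakdown makes explicit compared with the earlier flat "≤1e-4" claim.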
 
@@ -3,7 +3,7 @@ members = ["crates/hotcoco", "crates/hotcoco-cli", "crates/hotcoco-pyo3"]
 resolver = "2"
 
 [workspace.package]
-version = "0.1.0"
+version = "0.2.0"
 edition = "2021"
 rust-version = "1.74"
 authors = ["Derek Allman <derek.allman@yahoo.com>"]
 
@@ -5,21 +5,21 @@
 [![Crates.io](https://img.shields.io/crates/v/hotcoco)](https://crates.io/crates/hotcoco)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 
-11-26x faster COCO evaluation — a drop-in replacement for [pycocotools](https://github.com/ppwwyyxx/cocoapi) that works with Ultralytics YOLO, Detectron2, mmdetection, RF-DETR, and any pycocotools-based pipeline.
+Fast enough for every epoch, lean enough for every dataset. A drop-in replacement for [pycocotools](https://github.com/ppwwyyxx/cocoapi) that doesn't become the bottleneck — in your training loop or at foundation model scale. Up to 23× faster on standard COCO, 39× faster on Objects365, and fits comfortably in memory where alternatives run out.
 
 Available as a **Python package**, **CLI tool**, and **Rust library**. Pure Rust — no Cython, no C compiler, no Microsoft Build Tools. Prebuilt wheels for Linux, macOS, and Windows.
 
 **[Documentation](https://derekallman.github.io/hotcoco/)** | **[Changelog](CHANGELOG.md)** | **[Roadmap](ROADMAP.md)**
 
 ## Performance
 
-Benchmarked on COCO val2017 (5,000 images, 36,781 ground truth annotations, ~43,700 detections), Apple M1 MacBook Air:
+Benchmarked on COCO val2017 (5,000 images, 36,781 synthetic detections), Apple M1 MacBook Air:
 
 | Eval Type | pycocotools | faster-coco-eval | hotcoco |
 |-----------|-------------|------------------|-----------|
-| bbox      | 11.79s      | 3.47s (3.4x)    | 0.74s (15.9x) |
-| segm      | 19.49s      | 10.52s (1.9x)   | 1.58s (12.3x) |
-| keypoints | 4.79s       | 3.08s (1.6x)    | 0.19s (25.0x) |
+| bbox      | 9.46s  | 2.45s (3.9x)  | **0.41s (23.0x)** |
+| segm      | 9.16s  | 4.36s (2.1x)  | **0.49s (18.6x)** |
+| keypoints | 2.62s  | 1.78s (1.5x)  | **0.21s (12.7x)** |
 
 Speedups in parentheses are vs pycocotools. Results verified against pycocotools on COCO val2017 with a 10,000+ case parity test suite — your AP scores won't change.
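
As a sanity check on the updated table, the speedups can be recomputed from the displayed timings. The sketch below is illustrative and independent of `bench.py`; `TIMINGS` and `speedups` are hypothetical names, and the values are copied from the new table rows.

```python
# Wall-clock seconds copied from the updated table:
# (pycocotools, faster-coco-eval, hotcoco).
TIMINGS = {
    "bbox": (9.46, 2.45, 0.41),
    "segm": (9.16, 4.36, 0.49),
    "keypoints": (2.62, 1.78, 0.21),
}


def speedups(timings):
    """Map each eval type to (faster-coco-eval, hotcoco) speedups vs pycocotools."""
    return {
        eval_type: (base / fce, base / hot)
        for eval_type, (base, fce, hot) in timings.items()
    }


for eval_type, (fce, hot) in speedups(TIMINGS).items():
    print(f"{eval_type:<9} faster-coco-eval {fce:.1f}x  hotcoco {hot:.1f}x")
```

Ratios recomputed from two-decimal timings can differ from the quoted figures in the last digit. That would be consistent with the table's speedups having been computed from unrounded timings, though the diff does not say so.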
 
 
@@ -56,7 +56,7 @@ All implemented in Rust core, exposed via Python CLI and Python API.
 
 **Shipped.**
 
-~~Standard COCO evaluation protocol over 365 categories and ~2M images.~~ Verified working on real O365 annotation data. Published benchmark numbers coming soon — pending runs on more capable hardware.
+~~Standard COCO evaluation protocol over 365 categories and ~2M images.~~ Verified working on real O365 annotation data. Benchmark numbers published: 39× vs pycocotools, 14× vs faster-coco-eval on 80k images / 365 categories / 1.2M detections, using 8 GB committed vs 24–30 GB for alternatives.
 
 ### LVIS
 
@@ -72,14 +72,6 @@ All implemented in Rust core, exposed via Python CLI and Python API.
 
 ---
 
-## Tier 1 — Next
-
-### O365 Benchmark Numbers
-
-Publish benchmark results for Objects365-scale evaluation once runs on more capable hardware are complete.
-
----
-
 ## Tier 2 — Medium Term
 
 ### Format Conversion
 
@@ -17,5 +17,5 @@ name = "coco-eval"
 path = "src/main.rs"
 
 [dependencies]
-hotcoco = { version = "0.1.0", path = "../hotcoco" }
+hotcoco = { version = "0.2.0", path = "../hotcoco" }
 clap = { version = "4", features = ["derive"] }
@@ -15,7 +15,7 @@ name = "hotcoco"
 crate-type = ["cdylib"]
 
 [dependencies]
-hotcoco-core = { version = "0.1.0", path = "../hotcoco", package = "hotcoco" }
+hotcoco-core = { version = "0.2.0", path = "../hotcoco", package = "hotcoco" }
 pyo3 = { version = "0.23", features = ["extension-module", "abi3-py39"] }
 numpy = "0.23"
 serde_json = "1"