
Commit f6e8a0a

Finish clostera rebrand cleanup
1 parent 22dcdef commit f6e8a0a

16 files changed

Lines changed: 203 additions & 203 deletions
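A rename of this shape (203 additions against 203 deletions across 16 files) is easy to leave half-finished. The following is a minimal sketch of a leftover check, not something that ships with the repository: `scan_text`, `scan_tree`, the pattern list, and the suffix list are all hypothetical names chosen for illustration. The pattern covers only the legacy spellings visible in this diff and deliberately skips lowercase `pqkmeans`, which the new text uses to credit the original project.

```python
import re
from pathlib import Path

# Legacy spellings replaced in this commit (assumed complete only for the
# hunks shown here). Lowercase "pqkmeans" is intentionally NOT matched.
LEGACY_PATTERN = re.compile(r"PQk-means|PQKMeans|PQK_|PQK ENGINE")


def scan_text(text):
    """Return (line_number, stripped_line) for each line with a legacy name."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if LEGACY_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root, suffixes=(".md", ".py", ".toml", ".ipynb")):
    """Walk `root` and return (path, line_number, line) for every leftover."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in scan_text(text):
            results.append((path, number, line))
    return results
```

Calling `scan_tree(".")` from the repository root would list any file that still carries an old name; an empty list suggests the cleanup reached every text file with one of the listed suffixes.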

README.md

Lines changed: 6 additions & 6 deletions
@@ -42,7 +42,7 @@ clusterer = clostera.Clusterer(k=256, fastest=True) # K = number of clusters
 labels = clusterer.fit_transform(vectors)
 ```
 
-`fastest=True` turns off OPQ and uses the plain PQ path. That is the right choice when end-to-end throughput matters more than reconstruction quality. The main speed win is in encoder training and encoding; the final PQk-means assignment stage itself is already fast in both modes.
+`fastest=True` turns off OPQ and uses the plain PQ path. That is the right choice when end-to-end throughput matters more than reconstruction quality. The main speed win is in encoder training and encoding; the final compressed assignment stage itself is already fast in both modes.
 
 ### Out-of-core from parquet
 
@@ -59,7 +59,7 @@ The original repository proved a powerful idea: by clustering in PQ code space i
 
 `clostera` asks the obvious follow-up question:
 
-> what happens if you rebuild PQk-means properly for modern hardware and modern Python workflows?
+> what happens if you rebuild the original `pqkmeans` project properly for modern hardware and modern Python workflows?
 
 On the committed deterministic `10M x 2048` checkpoint, the answer is not subtle.
 
@@ -524,7 +524,7 @@ The classes below expose the encoder/clusterer split directly. Reach for them wh
 | --- | --- | --- | --- |
 | `encoder` | `PQEncoder` | `required` | Trained encoder that defines the codebooks. |
 | `k` | `int \| None` | `None` | Number of target clusters. Here `K` means the number of clusters. `None` enables Rust-side automatic number-of-clusters selection over candidate values in PQ code space. |
-| `iterations` | `int` | `20` | Number of PQk-means update rounds. |
+| `iterations` | `int` | `20` | Number of clustering update rounds. |
 | `seed` | `int` | `0` | Deterministic seed for cluster-center initialization. |
 | `verbose` | `bool` | `False` | Emit inertia diagnostics during fitting. |
 | `lookup_table_bytes` | `int` | `1 << 30` | Memory budget for code-domain lookup tables. Larger budgets favor faster assignment. |
@@ -545,10 +545,10 @@ The classes below expose the encoder/clusterer split directly. Reach for them wh
 | `num_subquantizers` | `int \| None` | `None` | Optional encoder-side PQ subspace count when `encoder` is omitted. |
 | `codebook_size` | `int` | `256` | Optional encoder-side codebook size when `encoder` is omitted. |
 | `encoder_iterations` | `int` | `20` | Encoder training iterations used when `encoder` is omitted. |
-| `seed` | `int` | `0` | Deterministic seed shared by the implicit encoder and the PQk-means clusterer. |
+| `seed` | `int` | `0` | Deterministic seed shared by the implicit encoder and the clusterer. |
 | `opq_iterations` | `int` | `3` | OPQ refinement steps used by the implicit encoder. |
 | `k` | `int \| None` | `None` | Number of target clusters. Here `K` means the number of clusters. `None` enables Rust-side automatic number-of-clusters selection over candidate values in PQ code space. |
-| `iterations` | `int` | `20` | Number of PQk-means update rounds. |
+| `iterations` | `int` | `20` | Number of clustering update rounds. |
 | `verbose` | `bool` | `False` | Emit inertia diagnostics during fitting. |
 | `lookup_table_bytes` | `int` | `1 << 30` | Memory budget for code-domain lookup tables. Larger budgets favor faster assignment. |
 | `auto_k_method` | `str` | `"centroid_silhouette"` | Automatic-number-of-clusters (`K`) scoring rule. Supported values are `"centroid_silhouette"`, `"davies_bouldin"`, `"elbow"`, and `"bic"`. |
@@ -579,7 +579,7 @@ When `k=None`, fitting also populates:
 
 | Environment variable | Meaning |
 | --- | --- |
-| `PQK_ROTATION_BATCH_MIB` | Override the default OPQ rotation batch target in MiB for benchmarking or machine-specific tuning. |
+| `CLOSTERA_ROTATION_BATCH_MIB` | Override the default OPQ rotation batch target in MiB for benchmarking or machine-specific tuning. |
 
 ## Reproducing the benchmark artifacts
 
docs/assets/benchmark_hero.png

6.35 KB

docs/assets/clostera_hero.png

13 KB
-718 Bytes

notebooks/clostera_showcase.ipynb

Lines changed: 176 additions & 176 deletions
Large diffs are not rendered by default.

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ build-backend = "maturin"
 [project]
 name = "clostera"
 version = "1.0.0"
-description = "Modern Rust implementation of PQk-means for large-scale clustering with numpy and parquet workflows"
+description = "Modern Rust rewrite of the original pqkmeans project for large-scale clustering with numpy and parquet workflows"
 readme = "README.md"
 requires-python = ">=3.10"
 license = { file = "LICENSE" }

scripts/generate_demo_notebook.py

Lines changed: 2 additions & 2 deletions
@@ -57,7 +57,7 @@ def build_notebook() -> dict:
 markdown_cell(
 """# clostera Tutorial
 
-This notebook is a **hands-on tutorial** for using `clostera`, the Rust implementation of PQk-means. It focuses on the public API and the workflows you are most likely to use in practice:
+This notebook is a **hands-on tutorial** for using `clostera`, the Rust rewrite of the original `pqkmeans` project. It focuses on the public API and the workflows you are most likely to use in practice:
 
 1. Use the high-level `Clusterer` API
 2. Cluster with a known number of clusters (`K`)
@@ -185,7 +185,7 @@ def build_notebook() -> dict:
 markdown_cell(
 """## 4. Need maximum throughput? Use `fastest=True`
 
-`fastest=True` turns off OPQ and uses the plain PQ path. That usually gives the best end-to-end throughput, at the cost of somewhat worse reconstruction quality. The main speed win is in encoder training and encoding, not in the final PQk-means assignment loop itself.
+`fastest=True` turns off OPQ and uses the plain PQ path. That usually gives the best end-to-end throughput, at the cost of somewhat worse reconstruction quality. The main speed win is in encoder training and encoding, not in the final compressed assignment loop itself.
 """
 ),
 code_cell(

scripts/render_benchmark_assets.py

Lines changed: 2 additions & 2 deletions
@@ -615,7 +615,7 @@ def render_hero_asset(args: argparse.Namespace, suite_payload: dict, large_paylo
 ax.text(
 0.05,
 0.695,
-"A from-scratch Rust rebuild of PQk-means with deterministic initialization,\n"
+"A from-scratch Rust rebuild of the original pqkmeans project with deterministic initialization,\n"
 "full-core CPU execution, parquet-native ingestion, out-of-core raw-vector workflows, and automatic K selection.",
 fontsize=17,
 color=phosphor_dim,
@@ -968,7 +968,7 @@ def toy_visualization(output_path: Path) -> None:
 axes[0].scatter(vectors[:, 0], vectors[:, 1], c=truth, s=10, cmap=cmap, alpha=0.8)
 axes[0].set_title("Ground truth clusters")
 axes[1].scatter(vectors[:, 0], vectors[:, 1], c=predicted, s=10, cmap=cmap, alpha=0.8)
-axes[1].set_title("PQKMeans prediction")
+axes[1].set_title("clostera prediction")
 for axis in axes:
 axis.set_xlabel("x0")
 axis.set_ylabel("x1")

scripts/render_sexy_hero.py

Lines changed: 2 additions & 2 deletions
@@ -53,10 +53,10 @@ def render_hero():
 ax.imshow(Z, extent=(0, 1, 0, 1), cmap="Blues", alpha=0.1, aspect="auto", zorder=0)
 
 # --- Header Area ---
-ax.text(0.05, 0.88, "PQK ENGINE", color=THEME["blue"], fontsize=12, weight="bold")
+ax.text(0.05, 0.88, "CLOSTERA ENGINE", color=THEME["blue"], fontsize=12, weight="bold")
 ax.text(0.05, 0.74, "Billion-Scale Clustering\nOn Your Laptop",
 color=THEME["text"], fontsize=36, weight="bold", linespacing=1.1, va="top")
-ax.text(0.05, 0.44, "The high-performance Rust rebuild of PQk-means.\nOptimized for 2026 hardware. Engineered for scale.",
+ax.text(0.05, 0.44, "The high-performance Rust rebuild of the original pqkmeans project.\nOptimized for 2026 hardware. Engineered for scale.",
 color=THEME["subtext"], fontsize=14, va="top", linespacing=1.4)
 
 # Tech Stack Badges

scripts/run_impl_eval.py

Lines changed: 2 additions & 2 deletions
@@ -92,9 +92,9 @@ def apply_thread_settings(args: argparse.Namespace) -> dict[str, int | None]:
 os.environ.pop("RAYON_NUM_THREADS", None)
 
 if args.rotation_batch_mib > 0:
-os.environ["PQK_ROTATION_BATCH_MIB"] = str(args.rotation_batch_mib)
+os.environ["CLOSTERA_ROTATION_BATCH_MIB"] = str(args.rotation_batch_mib)
 else:
-os.environ.pop("PQK_ROTATION_BATCH_MIB", None)
+os.environ.pop("CLOSTERA_ROTATION_BATCH_MIB", None)
 
 return {
 "blas_threads": args.blas_threads if args.blas_threads > 0 else None,

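The hunk above follows a set-or-clear pattern: a positive value exports the override, anything else removes it so the library default applies. A standalone sketch of that pattern is shown here for readers who script their own runs. The helper names `apply_rotation_batch` and `stale_legacy_override` are hypothetical, and the stale-variable check assumes the old `PQK_ROTATION_BATCH_MIB` name is no longer read after this commit, which the Python hunks suggest but the Rust side (not shown on this page) would have to confirm.

```python
import os
from typing import MutableMapping, Optional

NEW_NAME = "CLOSTERA_ROTATION_BATCH_MIB"  # name introduced by this commit
OLD_NAME = "PQK_ROTATION_BATCH_MIB"       # name removed by this commit


def apply_rotation_batch(
    rotation_batch_mib: int,
    environ: Optional[MutableMapping[str, str]] = None,
) -> Optional[str]:
    """Export the override when positive, otherwise clear it.

    Returns the value now visible under NEW_NAME, or None if unset.
    """
    env = os.environ if environ is None else environ
    if rotation_batch_mib > 0:
        env[NEW_NAME] = str(rotation_batch_mib)
    else:
        env.pop(NEW_NAME, None)
    return env.get(NEW_NAME)


def stale_legacy_override(
    environ: Optional[MutableMapping[str, str]] = None,
) -> bool:
    """True when only the old name is set, i.e. the override is likely ignored."""
    env = os.environ if environ is None else environ
    return OLD_NAME in env and NEW_NAME not in env
```

Passing a plain `dict` as `environ` keeps the helpers easy to exercise without touching the real process environment; shell profiles or CI configs that still export the old name are the most likely place for `stale_legacy_override` to return `True`.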