README.md: 32 additions & 0 deletions
@@ -18,6 +18,38 @@ This is not a thin wrapper around old code. It is a modern rewrite with a new Ru
 pip install clostera
 ```
 
+<details>
+<summary>Why billion-scale clustering?</summary>
+
+The short answer is that it is genuinely useful. If you work with embeddings, recommendations, retrieval, representation learning, semantic search, or large behavioral datasets, clustering at very large scale is not academic theater. It is operationally important.
+
+But for 🦋 `clostera`, that is only part of the story.
+
+The deeper reason is historical and conceptual. The extreme efficiency and mathematical elegance of the original [`pqkmeans`](https://github.com/DwangoMediaVillage/pqkmeans) algorithm indirectly helped inspire the development of [EMDE](https://arxiv.org/abs/2006.01894), and later a much stronger internal family of TREMDE algorithms. Together with internal proprietary evolutions of 🦋 [Cleora](https://github.com/BaseModelAI/cleora), those ideas form a major part of the conceptual foundation behind [BaseModel.AI](https://basemodel.ai), the flagship product of [Synerise](https://synerise.com).
+
+That is why this rewrite exists. The original project mattered. It influenced real systems, real products, and real lines of research. Left unmaintained, it deserved a modern successor: faster, cleaner, easier to install, easier to use, and built for current hardware rather than the hardware of the past.
+
+</details>
+
+<details>
+<summary>Origins of the Clostera name</summary>
+
+At Synerise, we have a tradition of finding algorithmic inspiration in the natural world, specifically the quiet, hyper-efficient mechanics of the moth.
+
+Just as we look to 🦋 [Cleora](https://github.com/BaseModelAI/cleora) to capture the geometry and distance calculations of our hyperspherical embeddings, we turned to the **🦋 Clostera** moth to represent the colossal mechanics of billion-scale clustering.
+
+In taxonomy, *🦋 Clostera* is a genus of prominent moths known for their robust build and rapid flight. But the true magic lies in the origin of the name. Derived from the ancient Greek word *klōstēr* (κλωστήρ), the name "🦋 Clostera" means **the spindle**.
+
+A spindle's sole purpose is to take raw, chaotic, disconnected fibers and rapidly rotate them, pulling them tightly around a central core to spin them into structured, organized threads.
+
+In machine learning, your billion-scale dataset is that chaotic fleece.
+
+**🦋 Clostera** is your algorithmic spindle. It acts as a high-speed rotational force, drawing billions of isolated vectors toward a shared center of mass, the centroid. It takes the noise, finds the pattern, and binds your scattered data into structured clusters.
+
+Fast, robust, and mathematically grounded. Welcome to the **🦋 Clostera** era.