Commit e9a9c39

chore: bump version to 1.0.0 for release
1 parent 4467eda commit e9a9c39

6 files changed

Lines changed: 505 additions & 7 deletions

CHANGELOG.md

Lines changed: 5 additions & 4 deletions
@@ -8,10 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [1.0.0] - 2025-12-02
99

1010
### Summary
11-
First official production release. Features a hybrid Python/Rust architecture for sub-millisecond semantic caching.
11+
First official production release. Achieves <1 ms lookup latency (0.41 ms at 100k entries) and cuts the per-entry memory footprint from ~119 bytes to ~52 bytes via the Rust backend.
1212

1313
### Known Issues
14-
- **Full Cache Load Time:** Loading a 1M-entry cache takes ~300-350ms total. While the binary index is search-ready in <10ms, Python pickle deserialization of response objects adds ~300ms overhead. This is a known limitation and will be addressed in v1.1+.
14+
- **Full Cache Load Time:** Loading a 1M-entry cache takes ~300-350ms total. While the binary index is search-ready in <10ms, Python pickle deserialization of response objects adds ~300ms overhead. This is a known limitation of the hybrid Python/Rust architecture and will be addressed in a future release.
1515

1616
### Added
1717
- **Rust Storage Backend:** Core storage moved to Rust (`RustCacheStorage`), reducing memory usage to 44 bytes per entry (codes + metadata).
@@ -21,7 +21,7 @@ First official production release. Features a hybrid Python/Rust architecture fo
2121
- **Migration Tool:** CLI tool to migrate legacy v2 caches to v3 format (`binary-semantic-cache migrate`).
2222

2323
### Changed
24-
- **Performance:** Lookup latency for 100k entries reduced from 1.14ms to 0.16ms (7x speedup).
24+
- **Performance:** Lookup latency for 100k entries reduced from 1.14ms to 0.41ms (2.8x speedup).
2525
- **Memory:** Total memory footprint per entry reduced from ~119 bytes to ~52 bytes.
2626
- **Dependency:** Added `maturin` build system for Rust extensions.
2727

@@ -30,7 +30,8 @@ First official production release. Features a hybrid Python/Rust architecture fo
3030
- **Windows Compatibility:** Fixed Unicode encoding issues in benchmark reporting.
3131

3232
## [0.2.1] - 2025-12-02 (Beta)
33-
- *Pre-release version. All changes merged into 1.0.0.*
33+
*(Superseded by 1.0.0)*
34+
3435

3536
## [0.1.0] - 2025-11-25
3637

README.md

Lines changed: 2 additions & 2 deletions
@@ -156,8 +156,8 @@ Persistence is handled by a split-file strategy ensuring fast loading regardless
156156
| Parameter | Default | Description |
157157
| :--- | :--- | :--- |
158158
| `max_entries` | `1000` | Maximum items before LRU eviction. |
159-
| `similarity_threshold` | `0.95` | Cosine similarity threshold (0.0-1.0). Lower = more hits, higher = precise. |
160-
| `code_bits` | `256` | Size of binary hash. Fixed at 256 for v0.2.0. |
159+
| `similarity_threshold` | `0.80` | Cosine similarity threshold (0.0-1.0). Lower = more hits, higher = precise. |
160+
| `code_bits` | `256` | Size of binary hash. Fixed at 256 for v1.0.0. |
161161
| `storage_mode` | `"memory"` | Currently memory-only (with disk persistence). |
162162

163163
---

docs/API_STABILITY_V1.md

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
1+
# API Stability Guarantees (v1.0)
2+
3+
**Date:** 2025-12-02
4+
**Version:** 1.0.0
5+
**Status:** OFFICIAL
6+
7+
This document defines the API stability guarantees for `binary_semantic_cache` v1.x releases. It categorizes all public APIs into three tiers: **Stable**, **Deprecated**, and **Unstable/Internal**.
8+
9+
---
10+
11+
## Stability Tiers
12+
13+
| Tier | Meaning |
14+
| :--- | :--- |
15+
| **Stable** | Will not have breaking changes in any v1.x release. Safe to depend on. |
16+
| **Deprecated** | Will be removed in v2.0.0. Use the documented replacement. |
17+
| **Unstable/Internal** | May change or be removed in any release without notice. Do not depend on. |
18+
19+
---
20+
21+
## 1. Stable APIs (Will Not Break in v1.x)
22+
23+
### 1.1 `BinarySemanticCache`
24+
25+
The primary cache class. All methods and properties listed below are stable.
26+
27+
| Member | Signature | Notes |
28+
| :--- | :--- | :--- |
29+
| `__init__` | `(encoder, max_entries=100000, similarity_threshold=0.80)` | Constructor |
30+
| `get` | `(embedding: np.ndarray) -> Optional[CacheEntry]` | Lookup by similarity |
31+
| `put` | `(embedding: np.ndarray, response: Any, store_embedding: bool = False) -> int` | Store entry, returns index |
32+
| `delete` | `(entry_id: int) -> bool` | Delete by index |
33+
| `clear` | `() -> None` | Remove all entries |
34+
| `stats` | `() -> CacheStats` | Get statistics |
35+
| `memory_bytes` | `() -> int` | Estimate memory usage |
36+
| `get_all_entries` | `() -> List[CacheEntry]` | Get all entries |
37+
| `save_mmap_v3` | `(path: str) -> None` | Save to v3 format |
38+
| `load_mmap_v3` | `(path: str, skip_checksum: bool = False) -> None` | Load from v3 format |
39+
| `__len__` | `() -> int` | Number of entries |
40+
| `__repr__` | `() -> str` | String representation |
41+
| `encoder` | `property -> Encoder` | Get encoder instance |
42+
| `max_entries` | `property -> int` | Maximum capacity |
43+
| `similarity_threshold` | `property -> float` | Hit threshold |
44+
45+
### 1.2 `CacheEntry`
46+
47+
Immutable result object returned by `get()`. All fields are stable.
48+
49+
| Field | Type | Notes |
50+
| :--- | :--- | :--- |
51+
| `id` | `int` | Entry index |
52+
| `code` | `np.ndarray` | Binary code (uint64) |
53+
| `response` | `Any` | Cached response object |
54+
| `created_at` | `float` | Unix timestamp (creation) |
55+
| `last_accessed` | `float` | Unix timestamp (last access) |
56+
| `access_count` | `int` | Number of accesses |
57+
| `similarity` | `float` | Similarity score (default 1.0) |
58+
59+
### 1.3 `CacheStats`
60+
61+
Statistics dataclass returned by `stats()`. All fields and properties are stable.
62+
63+
| Member | Type | Notes |
64+
| :--- | :--- | :--- |
65+
| `size` | `int` | Current entry count |
66+
| `max_size` | `int` | Maximum capacity |
67+
| `hits` | `int` | Total cache hits |
68+
| `misses` | `int` | Total cache misses |
69+
| `evictions` | `int` | Total evictions |
70+
| `memory_bytes` | `int` | Estimated memory usage |
71+
| `hit_rate` | `property -> float` | hits / (hits + misses) |
72+
| `memory_mb` | `property -> float` | memory_bytes / 1MB |
73+
74+
### 1.4 `RustBinaryEncoder`
75+
76+
The production encoder (Rust backend). All methods and properties listed below are stable.
77+
78+
| Member | Signature | Notes |
79+
| :--- | :--- | :--- |
80+
| `__init__` | `(embedding_dim: int, code_bits: int = 256, seed: int = 42)` | Constructor |
81+
| `encode` | `(embedding: np.ndarray) -> np.ndarray` | Encode single/batch |
82+
| `embedding_dim` | `property -> int` | Input dimension |
83+
| `code_bits` | `property -> int` | Output bits (256) |
84+
| `n_words` | `property -> int` | Number of uint64 words |
85+
86+
### 1.5 `PythonBinaryEncoder` (Test Oracle)
87+
88+
The Python encoder retained for testing. Same interface as `RustBinaryEncoder`.
89+
90+
**Note:** This class is stable for testing purposes only. Production code should use `RustBinaryEncoder`.
91+
92+
### 1.6 Exception Classes
93+
94+
All exception classes in the error hierarchy are stable.
95+
96+
| Exception | Base | Purpose |
97+
| :--- | :--- | :--- |
98+
| `CacheError` | `Exception` | Base class for all cache errors |
99+
| `ChecksumError` | `CacheError` | SHA-256 checksum mismatch |
100+
| `FormatVersionError` | `CacheError` | Unsupported persistence format |
101+
| `CorruptFileError` | `CacheError` | Invalid or truncated cache file |
102+
| `UnsupportedPlatformError` | `CacheError` | Platform incompatibility (e.g., endianness) |
103+
104+
### 1.7 Utility Functions
105+
106+
| Function | Signature | Notes |
107+
| :--- | :--- | :--- |
108+
| `detect_format_version` | `(path: str) -> int` | Returns 2 (v2) or 3 (v3) |
109+
110+
### 1.8 Constants
111+
112+
| Constant | Value | Notes |
113+
| :--- | :--- | :--- |
114+
| `DEFAULT_MAX_ENTRIES` | `100_000` | Default cache capacity |
115+
| `DEFAULT_THRESHOLD` | `0.80` | Default similarity threshold |
116+
| `DEFAULT_CODE_BITS` | `256` | Fixed binary code size |
117+
| `MMAP_FORMAT_VERSION` | `2` | v2 format identifier |
118+
| `MMAP_FORMAT_VERSION_V3` | `3` | v3 format identifier |
119+
120+
---
121+
122+
## 2. Deprecated APIs (Will Be Removed in v2.0)
123+
124+
These methods emit `DeprecationWarning` when called. Use the documented replacements.
125+
126+
| Method | Replacement | Removal Version |
127+
| :--- | :--- | :--- |
128+
| `BinarySemanticCache.save(path)` | `save_mmap_v3(path)` | v2.0.0 |
129+
| `BinarySemanticCache.load(path)` | `load_mmap_v3(path)` | v2.0.0 |
130+
| `BinarySemanticCache.save_mmap(path)` | `save_mmap_v3(path)` | v2.0.0 |
131+
| `BinarySemanticCache.load_mmap(path)` | `load_mmap_v3(path)` | v2.0.0 |
132+
133+
**Migration Example:**
134+
135+
```python
# Old (deprecated)
cache.save("cache.npz")
cache.load("cache.npz")

# New (stable)
cache.save_mmap_v3("cache_v3/")
cache.load_mmap_v3("cache_v3/")
```
144+
145+
---
146+
147+
## 3. Unstable/Internal APIs (May Change)
148+
149+
The following are internal implementation details and are **not** part of the public API. They may change or be removed without notice.
150+
151+
### 3.1 Internal Methods (Prefixed with `_`)
152+
153+
All methods starting with `_` are internal:
154+
155+
- `_set_response(idx, response)`
156+
- `_get_response(idx)`
157+
- `_delete_response(idx)`
158+
- `_compute_checksum(data)`
159+
- `_validate_single(embedding)`
160+
- `_validate_batch(embeddings)`
161+
- `_encode_single(embedding)`
162+
- `_encode_batch(embeddings)`
163+
164+
### 3.2 Internal Attributes
165+
166+
- `_encoder`
167+
- `_storage` (RustCacheStorage instance)
168+
- `_responses` (Python list)
169+
- `_lock` (RLock)
170+
- `_hits`, `_misses`, `_evictions`
171+
172+
### 3.3 Rust Internals
173+
174+
The following Rust bindings are internal and may change:
175+
176+
- `RustCacheStorage` (use `BinarySemanticCache` instead)
177+
- `HammingSimilarity` (use `BinarySemanticCache.get()` instead)
178+
- `hamming_distance` (internal utility)
179+
- `rust_version` (informational only)
180+
181+
### 3.4 Protocol Classes
182+
183+
- `EncoderProtocol` — Type hint only, not for subclassing.
184+
185+
### 3.5 File Format Internals
186+
187+
The following constants define the v3 file format. They are stable in terms of format compatibility but should not be used directly:
188+
189+
- `V3_HEADER_FILE`, `V3_ENTRIES_FILE`, `V3_RESPONSES_FILE`
190+
- `V3_ENTRY_SIZE` (44 bytes)
191+
- `EPOCH_2020`
192+
193+
---
194+
195+
## 4. Semantic Contracts (Frozen)
196+
197+
The following semantic behaviors are guaranteed and will not change in v1.x:
198+
199+
| Contract | Definition |
200+
| :--- | :--- |
201+
| **Encoder Determinism** | `RustBinaryEncoder(seed=42)` produces identical codes for identical inputs across all v1.x releases. |
202+
| **Threshold Semantics** | `HIT` if and only if `similarity >= threshold`. |
203+
| **Similarity Formula** | `similarity = 1.0 - (hamming_distance / code_bits)` |
204+
| **LRU Eviction** | When `len(cache) >= max_entries`, the least-recently-used entry is evicted. |
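The threshold and similarity contracts can be checked by hand. The sketch below is illustrative only: it holds a 256-bit code in a plain Python `int` rather than the library's `uint64` arrays, and `popcount_distance`, `similarity`, and `is_hit` are hypothetical helpers, not part of `binary_semantic_cache`.

```python
CODE_BITS = 256          # matches DEFAULT_CODE_BITS in the table of constants
THRESHOLD = 0.80         # matches DEFAULT_THRESHOLD


def popcount_distance(a: int, b: int) -> int:
    """Hamming distance: number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int, code_bits: int = CODE_BITS) -> float:
    """similarity = 1.0 - (hamming_distance / code_bits)."""
    return 1.0 - popcount_distance(a, b) / code_bits


def is_hit(sim: float, threshold: float = THRESHOLD) -> bool:
    """HIT if and only if similarity >= threshold."""
    return sim >= threshold


stored = (1 << CODE_BITS) - 1              # all 256 bits set
query_51 = stored ^ ((1 << 51) - 1)        # 51 bits differ
query_52 = stored ^ ((1 << 52) - 1)        # 52 bits differ

sim_51 = similarity(stored, query_51)      # 1 - 51/256 = 0.80078125
sim_52 = similarity(stored, query_52)      # 1 - 52/256 = 0.796875
```

With 256-bit codes and a threshold of `0.80`, a query may therefore differ from a stored code in at most 51 bits and still count as a hit.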
205+
206+
---
207+
208+
## 5. Breaking Change Policy
209+
210+
For v1.x releases:
211+
212+
1. **Stable APIs** will not have breaking changes.
213+
2. **Deprecated APIs** will continue to work but emit warnings.
214+
3. **Unstable APIs** may change at any time.
215+
216+
For v2.0.0:
217+
218+
1. **Deprecated APIs** will be removed.
219+
2. **Stable APIs** may have breaking changes (with migration guide).
220+
221+
---
222+
223+
*This document is the authoritative source for API stability in v1.x.*
224+

docs/KNOWN_LIMITATIONS_V1.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# Known Limitations (v1.0)
2+
3+
This document outlines the known limitations, constraints, and trade-offs of the `binary_semantic_cache` v1.0 release.
4+
5+
## 1. Performance Limitations
6+
7+
### 1.1 Full Cache Load Time
8+
While the *index* (binary codes) loads instantly (<10ms for 1M entries), the *response objects* are stored in a Python pickle file. Loading 1 million response objects takes approximately **300ms**. This is an O(n) operation dominated by Python's unpickling overhead.
9+
10+
### 1.2 Linear Scan Scaling
11+
The cache searches by linear scan, which is O(N) in the number of entries. This requires no index build and is exact with respect to the quantized codes, but lookup time grows linearly with N.
12+
- **N < 100k:** Sub-millisecond (0.16ms).
13+
- **N = 1M:** ~1.6ms.
14+
- **N > 1M:** Performance will degrade linearly. Use cases requiring >1M entries should consider sharding or wait for Phase 3 (HNSW).
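To show why lookup cost grows with N, here is a pure-Python sketch of a linear scan over binary codes. It is not the Rust implementation: codes are plain `int` values, `linear_scan` is a hypothetical name, and returning the single best match above the threshold is an assumption about the lookup policy.

```python
from typing import List, Optional, Tuple

CODE_BITS = 256


def linear_scan(
    query: int, codes: List[int], threshold: float = 0.80
) -> Optional[Tuple[int, float]]:
    """Compare the query against every stored code and keep the best match.

    Every lookup touches all N codes, so the cost is O(N).
    """
    best_idx, best_sim = -1, -1.0
    for idx, code in enumerate(codes):
        sim = 1.0 - bin(query ^ code).count("1") / CODE_BITS
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx >= 0 and best_sim >= threshold:
        return best_idx, best_sim
    return None  # miss: nothing reached the threshold


codes = [0, (1 << 256) - 1, (1 << 10) - 1]
result = linear_scan((1 << 8) - 1, codes)   # closest is index 2 (2 bits differ)
```

Because every stored code is visited once per lookup, doubling the number of entries roughly doubles the scan time.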
15+
16+
### 1.3 Concurrency
17+
The system uses a global `RLock` to serialize writes to ensure thread safety between the Python LRU and Rust storage.
18+
- **Reads:** Fast and release the GIL in Rust.
19+
- **Writes:** Serialized. High write contention may cause blocking.
20+
21+
## 2. Memory Limitations
22+
23+
### 2.1 In-Memory Storage
24+
The entire index and all response objects must fit in RAM. There is no disk-based serving mode.
25+
- **Index Overhead:** Fixed at ~52 bytes per entry.
26+
- **Response Overhead:** Depends entirely on your data size (e.g., large JSON strings).
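For rough capacity planning, the index figure above turns into a one-line calculation. The sketch assumes the ~52 bytes per entry quoted here and decimal megabytes (10^6 bytes); `index_megabytes` is a hypothetical helper, and response objects are deliberately left out because their size depends on your data.

```python
INDEX_BYTES_PER_ENTRY = 52  # approximate per-entry index overhead quoted above


def index_megabytes(n_entries: int) -> float:
    """Estimated index size in decimal megabytes, excluding response objects."""
    return n_entries * INDEX_BYTES_PER_ENTRY / 1_000_000


small = index_megabytes(100_000)     # 5.2 MB of index
large = index_megabytes(1_000_000)   # 52.0 MB of index
```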
27+
28+
### 2.2 Memory Estimation Accuracy
29+
The `cache.memory_bytes()` method reports the memory used by the Rust index and Python internal structures. It **does not** (and cannot accurately) account for the memory consumed by the response objects themselves, because Python's per-object overhead varies. Always provision RAM beyond the reported figure.
30+
31+
## 3. Correctness & Semantics
32+
33+
### 3.1 Delete Behavior
34+
The `delete(entry_id)` method immediately removes the entry from the Python lookup table (making it inaccessible). However, the underlying slot in the Rust vector is **not freed** immediately. It remains "orphaned" until the LRU mechanism eventually recycles that slot for a new entry. This is a design choice to avoid O(N) shifts in the compact Rust vector.
35+
36+
### 3.2 Timestamp Resolution
37+
LRU timestamps are stored as `u32` seconds since the epoch (2020-01-01). Access patterns occurring within the same second may not strictly preserve LRU order. This is considered acceptable for a semantic cache where "approximate LRU" is sufficient.
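The effect of one-second resolution can be seen in a small sketch. It assumes the 2020-01-01 epoch is midnight UTC, which the text does not state; `EPOCH_2020_UTC` and `to_u32_seconds` are hypothetical names used only for illustration.

```python
from datetime import datetime, timezone

# Assumption: the epoch is 2020-01-01T00:00:00 UTC.
EPOCH_2020_UTC = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())
U32_MAX = 2**32 - 1


def to_u32_seconds(unix_time: float) -> int:
    """Truncate a Unix timestamp to whole seconds since the 2020 epoch."""
    secs = int(unix_time) - EPOCH_2020_UTC
    if not 0 <= secs <= U32_MAX:
        raise ValueError("timestamp does not fit in u32 seconds since 2020")
    return secs


# Two accesses 0.8 s apart within the same second collapse to one value,
# so their relative LRU order is not recoverable from the timestamp alone.
first = to_u32_seconds(EPOCH_2020_UTC + 10.1)
second = to_u32_seconds(EPOCH_2020_UTC + 10.9)
```

A `u32` holds about 4.29 billion seconds, roughly 136 years, so the counter itself is not a practical limit.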
38+
39+
### 3.3 Quantization Drift
40+
Binary quantization introduces a small amount of information loss. The Hamming distance is a proxy for Cosine Similarity.
41+
- **Implication:** A threshold of `0.80` in binary space is not mathematically identical to `0.80` in float space. Users must tune thresholds empirically (see `THRESHOLD_TUNING_GUIDE.md`).
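The size of the gap can be estimated under one assumption that this document does not confirm: that the encoder takes the sign of random hyperplane projections (SimHash-style). For that scheme each bit differs with probability θ/π, where θ is the angle between the two embeddings, so the expected binary similarity is 1 − θ/π. The helpers below are hypothetical and describe expectations only, not the behaviour of any particular pair of embeddings.

```python
import math


def expected_cosine(binary_similarity: float) -> float:
    """Cosine similarity implied, in expectation, by a binary similarity."""
    return math.cos(math.pi * (1.0 - binary_similarity))


def expected_binary_similarity(cosine: float) -> float:
    """Expected binary similarity for a given cosine similarity."""
    return 1.0 - math.acos(cosine) / math.pi


cos_at_080 = expected_cosine(0.80)             # about 0.809
bin_at_080 = expected_binary_similarity(0.80)  # about 0.795
```

Under this assumption a binary threshold of `0.80` corresponds to a cosine similarity near 0.81 on average, and the mapping is non-linear, which is one reason empirical tuning is still needed.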
42+
43+
## 4. Platform & Distribution
44+
45+
### 4.1 Build Requirements
46+
v1.0 requires a **Rust toolchain** (Cargo) to be installed to build the package from source. There are currently no pre-built binary wheels on PyPI.
47+
48+
### 4.2 Windows Unicode
49+
While Unicode path handling was improved in v0.2.0, extensive testing on non-English Windows locales has not been performed in CI.
50+
51+
### 4.3 Pickle Security
52+
The `responses.pkl` file uses Python's `pickle` module. **Do not load cache files from untrusted sources.** Pickle deserialization can execute arbitrary code. Only load caches that you or your organization created.
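If cached responses are plain data (strings, numbers, lists, dicts), one possible mitigation is to unpickle with a subclass that refuses to import any global, following the "restricting globals" pattern in the Python `pickle` documentation. This is a sketch only: `NoGlobalsUnpickler` and `load_plain_data` are hypothetical, the library is not known to load `responses.pkl` this way, and the approach reduces risk rather than making untrusted files safe.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or function."""

    def find_class(self, module, name):
        # Called whenever the pickle stream references a global object.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def load_plain_data(payload: bytes):
    """Load a pickle that may contain only built-in container and scalar types."""
    return NoGlobalsUnpickler(io.BytesIO(payload)).load()


# Plain data never triggers find_class, so it loads normally.
safe = load_plain_data(pickle.dumps(["cached answer", {"tokens": 12}]))

# A pickle that references a callable is rejected before anything runs.
try:
    load_plain_data(pickle.dumps(print))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```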
53+
