|
| 1 | +# Valori: The Flight Recorder for AI Memory |
| 2 | + |
| 3 | +**Version:** 0.1.0-mvp | **License:** MIT | **Status:** Production Ready (Phase 9) |
| 4 | + |
| 5 | +> "The only vector database that guarantees your AI behaves exactly the same way today as it did yesterday." |
| 6 | + |
| 7 | +**Valori** is a **deterministic, forensic AI substrate**. Unlike standard vector databases (Pinecone, Qdrant) which prioritize speed and fuzzy search, Valori prioritizes **Truth** and **Reproducibility**. It captures the entire evolution of your AI's memory, allowing you to rewind time, replay decisions, and prove exactly why your agent's behavior changed. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## 🎯 Why Valori? |
| 12 | + |
| 13 | +The modern AI stack is built on **Probabilistic Foundations** (Float32, Random Seeds, Approximate Nearest Neighbor). This makes it impossible to audit. |
| 14 | + |
| 15 | +If your autonomous agent, trading bot, or retrieval system makes a different decision today than it did yesterday, you cannot know *why*. Was it the model? Was it a new vector? Was it a race condition in the database? |
| 16 | + |
| 17 | +**Valori solves this by enforcing strict determinism:** |
| 18 | +* **Bit-for-Bit Reproducibility:** `Insert A` followed by `Delete A` leaves the database in exactly the state it started in. |
| 19 | +* **Deterministic Math:** Uses Q16.16 Fixed Point arithmetic instead of floating point. `1.0` is always `1.0`. |
| 20 | +* **Proven Topology:** Uses a deterministic HNSW graph structure derived from data entropy, not random seeds. |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## 🚀 Quick Start |
| 25 | + |
| 26 | +### Installation |
| 27 | + |
| 28 | +**From Source:** |
| 29 | +```bash |
| 30 | +# Clone the repo |
| 31 | +git clone https://github.com/your-org/valori.git |
| 32 | +cd valori |
| 33 | + |
| 34 | +# Build the CLI |
| 35 | +cargo install --path crates/cli |
| 36 | + |
| 37 | +# Verify Installation |
| 38 | +valori --version |
| 39 | +``` |
| 40 | + |
| 41 | +### Basic Workflow |
| 42 | + |
| 43 | +In this example, we simulate an AI system inserting memory vectors, and then perform a forensic investigation. |
| 44 | + |
| 45 | +**1. Create a Database (Mock)** |
| 46 | +*Assume you have a directory `data/` with `snapshot.val`, `events.log`, and `metadata.idx`.* |
| 47 | + |
| 48 | +**2. Inspect the State** |
| 49 | +```bash |
| 50 | +valori inspect --dir ./data |
| 51 | +``` |
| 52 | +*Output:* |
| 53 | +```text |
| 54 | +╔════════════════════════════════════════════╗ |
| 55 | +║ VALORI FORENSIC CLI v0.1.0-mvp ║ |
| 56 | +╚════════════════════════════════════════════╝ |
| 57 | + |
| 58 | +Valori Status Report |
| 59 | +-------------------- |
| 60 | +File | Status | Details |
| 61 | +----------|---------|------------------------------------------------ |
| 62 | +Snapshot | FOUND | Format: V1, Magic: VALO, Ver: 1, Idx: 100 |
| 63 | +WAL | FOUND | 105 events |
| 64 | +Index | FOUND | 3 labeled entries |
| 65 | +``` |
| 66 | + |
| 67 | +**3. Rewind Time (Replay)** |
| 68 | +Replay the event log up to a specific event to see what the state looked like at that point. |
| 69 | +```bash |
| 70 | +valori replay-query --dir ./data --at 102 --query "[10, 20, 30]" |
| 71 | +``` |
| 72 | + |
| 73 | +**4. The "Money" Feature: Semantic Diff** |
| 74 | +Compare the search results between two different time points. |
| 75 | +*Did a new vector enter the Top 5? Did the ranking shift?* |
| 76 | +```bash |
| 77 | +valori diff --dir ./data --from 100 --to 105 --query "[10, 20, 30]" |
| 78 | +``` |
| 79 | +*Output:* |
| 80 | +```text |
| 81 | +State Comparison |
| 82 | +---------------- |
| 83 | +Property | Value |
| 84 | +-------------|------------------ |
| 85 | +From Index | 100 |
| 86 | +From Hash | 0x1a2b3c... |
| 87 | +To Index | 105 |
| 88 | +To Hash | 0x9f8e7d... |
| 89 | +Status | DRIFTED |
| 90 | + |
| 91 | +Semantic Diff (Top-5) |
| 92 | +-------------------- |
| 93 | +ID | Change | Detail |
| 94 | +-----|-----------------|---------------------------------- |
| 95 | +102 | ~ Rank Change | 1 -> 3 |
| 96 | +105 | + Entered Top-5 | Rank 4 |
| 97 | +``` |
| 98 | + |
| 99 | +--- |
| 100 | + |
| 101 | +## 🛠️ The Architecture |
| 102 | + |
| 103 | +Valori is not a monolithic server. It is a **Workspace of Crates**: |
| 104 | + |
| 105 | +### 1. `valori-kernel` (The Brain) |
| 106 | +The `no_std` pure Rust library containing the AI logic. |
| 107 | +* **Math:** Q16.16 Fixed Point Arithmetic. |
| 108 | +* **Index:** Deterministic HNSW (Graph Structure). |
| 109 | +* **State:** `BTreeMap` storage for determinism. |
| 110 | +* **Philosophy:** Zero heap allocators (optional), zero floating points. |
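
To illustrate why ordered storage matters for reproducibility, here is a minimal sketch, not Valori's actual code. It assumes vectors are stored as raw fixed-point integers keyed by a `u64` id, and it uses a hand-written FNV-1a hash as a stand-in for whatever state hash the kernel really computes. The names `fnv1a` and `state_hash` are hypothetical.

```rust
use std::collections::BTreeMap;

/// FNV-1a over a byte slice, continuing from `hash`.
/// Stand-in only: the real state hash used by Valori is not specified here.
fn fnv1a(hash: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(hash, |h, &b| (h ^ b as u64).wrapping_mul(0x0000_0100_0000_01B3))
}

/// Hash the whole store. A `BTreeMap` iterates in ascending key order,
/// so the result depends only on the contents, never on insertion order.
fn state_hash(store: &BTreeMap<u64, Vec<i32>>) -> u64 {
    let mut h: u64 = 0xCBF2_9CE4_8422_2325; // FNV-1a 64-bit offset basis
    for (id, vector) in store {
        h = fnv1a(h, &id.to_le_bytes());
        for component in vector {
            h = fnv1a(h, &component.to_le_bytes());
        }
    }
    h
}
```

With this shape, inserting an entry and then removing it returns `state_hash` to its earlier value, and two stores built from the same entries in a different order hash identically. A `HashMap` would not give that guarantee, because its iteration order is unspecified.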
| 111 | + |
| 112 | +### 2. `valori-persistence` (The Storage) |
| 113 | +The binary format layer. |
| 114 | +* **Format:** `snapshot.val` (Graph Topology) + `events.log` (Append-Only). |
| 115 | +* **Integrity:** CRC64 checksums covering every byte. Fail-closed validation. |
| 116 | + |
| 117 | +### 3. `valori-cli` (The Flight Recorder Interface) |
| 118 | +The command-line tool for engineers. |
| 119 | +* **Offline Forensics:** Reads disk directly. No daemon required. |
| 120 | +* **Time Travel:** `replay-query`, `diff`, `verify`. |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## 📚 Commands Reference |
| 125 | + |
| 126 | +### `valori inspect` |
| 127 | +Inspect the health and metadata of a database volume. |
| 128 | +* **Usage:** `valori inspect --dir <path>` |
| 129 | +* **Output:** Snapshot version, WAL event counts, Integrity status. |
| 130 | + |
| 131 | +### `valori verify` |
| 132 | +Verify the integrity of a snapshot file against its checksums. |
| 133 | +* **Usage:** `valori verify snapshot.val` |
| 134 | +* **Output:** `✅ VERIFIED` or `❌ CORRUPTED`. |
| 135 | +* **Use Case:** Validating backups before an incident response. |
| 136 | + |
| 137 | +### `valori timeline` |
| 138 | +List labeled checkpoints in the event log. |
| 139 | +* **Usage:** `valori timeline metadata.idx` |
| 140 | +* **Output:** Human-readable timeline of `ingest:batch_01`, `experiment:v2`, etc. |
| 141 | + |
| 142 | +### `valori replay-query` |
| 143 | +Replay the WAL to a specific event ID and execute a search. |
| 144 | +* **Usage:** `valori replay-query --dir <path> --at <event_id> --query "[...]"` |
| 145 | +* **Use Case:** "What did the top-5 neighbors look like *right before* the crash?" |
| 146 | + |
| 147 | +### `valori diff` |
| 148 | +Compare search results (Topology) between two points in time. |
| 149 | +* **Usage:** `valori diff --dir <path> --from <id_a> --to <id_b> --query "[...]"` |
| 150 | +* **Output:** Delta of neighbors (+ Entry, - Exit, ~ Rank Shift). |
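
The delta described above can be sketched as a comparison of two ranked id lists. This is an illustrative sketch only: the `Change` enum and `semantic_diff` function are hypothetical names, and the real CLI may compute and report the diff differently.

```rust
use std::collections::BTreeMap;

/// One entry of the delta. Ranks are 1-based, best match first.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Entered { rank: usize },
    Exited { rank: usize },
    RankShift { from: usize, to: usize },
}

/// Map each id to its 1-based rank in a result list.
fn rank_of(ids: &[u64]) -> BTreeMap<u64, usize> {
    ids.iter().enumerate().map(|(i, &id)| (id, i + 1)).collect()
}

/// Compare the top-k results at two points in time.
/// Ids whose rank did not change are omitted from the output.
pub fn semantic_diff(before: &[u64], after: &[u64]) -> BTreeMap<u64, Change> {
    let (old, new) = (rank_of(before), rank_of(after));
    let mut out = BTreeMap::new();
    for (&id, &to) in &new {
        match old.get(&id) {
            None => {
                out.insert(id, Change::Entered { rank: to });
            }
            Some(&from) if from != to => {
                out.insert(id, Change::RankShift { from, to });
            }
            Some(_) => {} // same rank in both lists: no change
        }
    }
    for (&id, &rank) in &old {
        if !new.contains_key(&id) {
            out.insert(id, Change::Exited { rank });
        }
    }
    out
}
```

Returning a `BTreeMap` keeps the report itself in a stable, id-sorted order, so two runs over the same inputs print the same lines.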
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +## 🧬 Technical Specifications |
| 155 | + |
| 156 | +### Deterministic Math |
| 157 | +Valori uses **Q16.16 Fixed Point** arithmetic instead of IEEE 754 Float32. |
| 158 | +* **Range:** [-32768.0, 32767.99998] |
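| 159 | +* **Behavior:** No NaN, no Infinity, and no order-dependent sums: floating-point addition is not associative, so `(a + b) + c` can differ from `a + (b + c)`, whereas fixed-point addition always gives the same result. |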
| 160 | +* **Overflow:** Hard failure (Clamped/Rejected) rather than silent wrapping. |
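
As a rough illustration of the properties listed above, the following sketch stores a Q16.16 value in an `i32` (16 integer bits, 16 fractional bits) and rejects overflow instead of wrapping. The `Q16` type and its methods are hypothetical, not the kernel's API, and the sketch shows only the "rejected" overflow policy, not clamping.

```rust
/// Hypothetical Q16.16 value: the real number `x` is stored as `x * 65536`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Q16(i32);

impl Q16 {
    pub const FRAC_BITS: u32 = 16;
    pub const ONE: Q16 = Q16(1 << 16);

    /// Build from a whole number; `None` if it is outside [-32768, 32767].
    pub fn from_int(n: i32) -> Option<Q16> {
        n.checked_mul(1 << Self::FRAC_BITS).map(Q16)
    }

    /// The underlying integer representation.
    pub fn raw(self) -> i32 {
        self.0
    }

    /// Addition is plain integer addition, so it is exact and associative.
    /// Overflow is reported as `None` rather than wrapping silently.
    pub fn checked_add(self, rhs: Q16) -> Option<Q16> {
        self.0.checked_add(rhs.0).map(Q16)
    }

    /// Multiply in 64 bits, then shift back down to 16 fractional bits.
    /// The arithmetic shift rounds toward negative infinity.
    pub fn checked_mul(self, rhs: Q16) -> Option<Q16> {
        let wide = (self.0 as i64) * (rhs.0 as i64);
        i32::try_from(wide >> Self::FRAC_BITS).ok().map(Q16)
    }
}
```

Because every operation is integer arithmetic on a fixed-width type, the same inputs produce the same bits on any target, which is the property the section relies on.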
| 161 | + |
| 162 | +### Deterministic HNSW |
| 163 | +The graph index is not stochastic. |
| 164 | +* **Entry Points:** Derived from `trailing_zeros(hash(id))`, creating a natural geometric distribution without RNG. |
| 165 | +* **Neighbor Selection:** Strict `Distance ASC -> ID ASC` sorting. |
| 166 | +* **Result:** The graph structure on an x86 server is **identical** to the graph on an ARM microcontroller. |
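
The two rules above can be sketched as follows. Several things here are assumptions made for illustration: the hash is the SplitMix64 finalizer (Valori's actual hash function is not stated), the trailing-zero count is interpreted as the layer a node is assigned to, and distances are raw fixed-point integers. The names `mix64`, `level_for`, and `select_neighbors` are hypothetical.

```rust
/// Stand-in 64-bit mixer (the SplitMix64 finalizer).
/// Assumption: any well-mixed, platform-independent hash would serve here.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Derive a layer from the id alone, with no RNG involved.
/// Each additional trailing zero bit is half as likely as the previous one,
/// which gives the geometric distribution HNSW layers normally draw at random.
fn level_for(id: u64, max_level: u32) -> u32 {
    mix64(id).trailing_zeros().min(max_level)
}

/// Keep the `m` best candidates, ordered by distance and then by id.
/// Tuples compare lexicographically, so ties on distance are broken by id
/// and the result does not depend on the order candidates arrived in.
fn select_neighbors(mut candidates: Vec<(i32, u64)>, m: usize) -> Vec<u64> {
    candidates.sort_unstable();
    candidates.truncate(m);
    candidates.into_iter().map(|(_, id)| id).collect()
}
```

Since both functions are pure and use only integer operations, feeding the same ids and distances in any order yields the same layers and the same neighbor lists.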
| 167 | + |
| 168 | +### Serialization Format |
| 169 | +* **Header:** `VALO` + Version + EventIndex + Timestamp. |
| 170 | +* **Body:** Vectors + Graph Topology (Layers, Neighbors). |
| 171 | +* **Verification:** Body checksums must match header checksums. |
| 172 | + |
| 173 | +--- |
| 174 | + |
| 175 | +## 🚧 Development |
| 176 | + |
| 177 | +**Testing:** |
| 178 | +```bash |
| 179 | +# Run all unit and integration tests |
| 180 | +cargo test --workspace |
| 181 | + |
| 182 | +# Run with output |
| 183 | +cargo test --workspace -- --nocapture |
| 184 | +``` |
| 185 | + |
| 186 | +**Build:** |
| 187 | +```bash |
| 188 | +# Release build (optimized) |
| 189 | +cargo build --release |
| 190 | +``` |
| 191 | + |
| 192 | +--- |
| 193 | + |
| 194 | +## 🗺️ Roadmap |
| 195 | + |
| 196 | +* **v0.1.0 (Current):** MVP Release. CLI, Deterministic Kernel, Snapshotting. |
| 197 | +* **v0.2.0:** Performance Tuning. Neighbor Pruning, `ef_search` optimization. |
| 198 | +* **v0.3.0:** `valori-node`. HTTP Server & Network Layer. |
| 199 | +* **v0.4.0:** Distributed Consensus. "God Mode" state sync across nodes. |
| 200 | + |
| 201 | +--- |
| 202 | + |
| 203 | +## ⚖️ License |
| 204 | + |
| 205 | +MIT License - See LICENSE file for details. |
| 206 | + |
| 207 | +**Valori.** *Operate on Truth.* |