Commit 2154b4f

docs: Add forensic toolset documentation
1 parent 1b69fc6 commit 2154b4f

1 file changed

Lines changed: 207 additions & 0 deletions

crates/README.md

# Valori: The Flight Recorder for AI Memory

**Version:** 0.1.0-mvp | **License:** MIT | **Status:** Production Ready (Phase 9)

> "The only vector database that guarantees your AI behaves exactly the same way today as it did yesterday."

**Valori** is a **deterministic, forensic AI substrate**. Unlike standard vector databases (Pinecone, Qdrant), which prioritize speed and fuzzy search, Valori prioritizes **Truth** and **Reproducibility**. It captures the entire evolution of your AI's memory, allowing you to rewind time, replay decisions, and prove exactly why your agent's behavior changed.

---

## 🎯 Why Valori?

The modern AI stack is built on **Probabilistic Foundations** (Float32, Random Seeds, Approximate Nearest Neighbor). This makes it effectively impossible to audit: the same inputs are not guaranteed to produce the same results across runs or machines.

If your autonomous agent, trading bot, or retrieval system makes a different decision today than it did yesterday, you cannot know *why*. Was it the model? Was it a new vector? Was it a race condition in the database?

**Valori solves this by enforcing strict determinism:**
* **Bit-for-Bit Reproducibility:** Applying `Insert A` and then `Delete A` returns the database to exactly the state it started in.
* **Deterministic Math:** Uses Q16.16 Fixed Point arithmetic instead of floating point. `1.0` is always `1.0`.
* **Proven Topology:** Uses a deterministic HNSW graph structure derived from the data itself (a hash of each ID), not from random seeds.

---

## 🚀 Quick Start

### Installation

**From Source:**
```bash
# Clone the repo
git clone https://github.com/your-org/valori.git
cd valori

# Build the CLI
cargo install --path crates/cli

# Verify Installation
valori --version
```

### Basic Workflow

In this example, we simulate an AI system inserting memory vectors, and then perform a forensic investigation.

**1. Create a Database (Mock)**
*Assume you have a directory `data/` with `snapshot.val`, `events.log`, and `metadata.idx`.*

**2. Inspect the State**
```bash
valori inspect --dir ./data
```
*Output:*
```text
╔════════════════════════════════════════════╗
║ VALORI FORENSIC CLI v0.1.0-mvp ║
╚════════════════════════════════════════════╝

Valori Status Report
--------------------
File | Status | Details
----------|---------|------------------------------------------------
Snapshot | FOUND | Format: V1, Magic: VALO, Ver: 1, Idx: 100
WAL | FOUND | 105 events
Index | FOUND | 3 labeled entries
```

**3. Rewind Time (Replay)**
Replay the event log up to a specific event ID to see what the state looked like at that point, then run a query against it.
```bash
valori replay-query --dir ./data --at 102 --query "[10, 20, 30]"
```
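
Conceptually, replay means applying the event log in order and stopping at the requested index. The Rust sketch below illustrates that idea only. The `Event` enum, the `replay` function, and the raw `i32` vector representation are hypothetical names chosen for illustration, not Valori's actual types, and the sketch assumes that "at N" means "after the first N events", which this README does not specify.

```rust
use std::collections::BTreeMap;

/// Hypothetical event record. The real `events.log` layout is not
/// described in this README.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Insert { id: u64, vector: Vec<i32> },
    Delete { id: u64 },
}

/// Rebuild the state by applying the first `at` events in log order.
/// (Assumption: "at N" means "after N events have been applied".)
pub fn replay(log: &[Event], at: usize) -> BTreeMap<u64, Vec<i32>> {
    let mut state = BTreeMap::new();
    for event in log.iter().take(at) {
        match event {
            Event::Insert { id, vector } => {
                state.insert(*id, vector.clone());
            }
            Event::Delete { id } => {
                state.remove(id);
            }
        }
    }
    state
}
```

Because the state is a pure function of the log prefix, calling `replay` twice with the same index yields the same map, which is what makes a query at a past point repeatable.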

**4. The "Money" Feature: Semantic Diff**
Compare the search results between two different time points.
*Did a new vector enter the Top 5? Did the ranking shift?*
```bash
valori diff --dir ./data --from 100 --to 105 --query "[10, 20, 30]"
```
*Output:*
```text
State Comparison
----------------
Property | Value
-------------|------------------
From Index | 100
From Hash | 0x1a2b3c...
To Index | 105
To Hash | 0x9f8e7d...
Status | DRIFTED

Semantic Diff (Top-5)
---------------------
ID | Change | Detail
-----|-----------------|----------------------------------
102 | ~ Rank Change | 1 -> 3
105 | + Entered Top-5 | Rank 4
```
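
The comparison itself can be pictured as a diff between two ranked ID lists. The following Rust sketch is one possible way to compute the three change kinds; `Change` and `semantic_diff` are hypothetical names, and nothing here is taken from the actual `valori diff` implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical change kinds, mirroring the "+", "-" and "~" rows above.
#[derive(Debug, PartialEq)]
pub enum Change {
    Entered { rank: usize },
    Exited { rank: usize },
    RankChange { from: usize, to: usize },
}

/// Compare two top-k result lists (best match first, ranks are 1-based).
/// Returns only the IDs whose status changed, keyed by ID so the output
/// order is itself deterministic.
pub fn semantic_diff(before: &[u64], after: &[u64]) -> BTreeMap<u64, Change> {
    fn rank_of(list: &[u64]) -> BTreeMap<u64, usize> {
        list.iter().enumerate().map(|(i, id)| (*id, i + 1)).collect()
    }
    let old = rank_of(before);
    let new = rank_of(after);
    let mut changes = BTreeMap::new();

    for (id, &to) in &new {
        match old.get(id) {
            None => {
                changes.insert(*id, Change::Entered { rank: to });
            }
            Some(&from) if from != to => {
                changes.insert(*id, Change::RankChange { from, to });
            }
            Some(_) => {} // same rank in both lists: no change
        }
    }
    for (id, &from) in &old {
        if !new.contains_key(id) {
            changes.insert(*id, Change::Exited { rank: from });
        }
    }
    changes
}
```

With `before = [102, 7, 9]` and `after = [7, 9, 102, 105]`, this reports ID 102 moving from rank 1 to rank 3 and ID 105 entering at rank 4, along with the IDs that moved up as a consequence.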

---

## 🛠️ The Architecture

Valori is not a monolithic server. It is a **Workspace of Crates**:

### 1. `valori-kernel` (The Brain)
The `no_std`, pure-Rust library containing the AI logic.
* **Math:** Q16.16 Fixed Point Arithmetic.
* **Index:** Deterministic HNSW (Graph Structure).
* **State:** `BTreeMap` storage. Keys iterate in sorted order, so traversal is deterministic.
* **Philosophy:** Zero heap allocation (optional), zero floating point.
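
To illustrate why an ordered map helps, the sketch below hashes a `BTreeMap` by walking it in key order. The `state_hash` function, the FNV-1a hash, and the `Vec<i32>` value type are stand-ins chosen for this example; the README does not say how Valori computes its state hash.

```rust
use std::collections::BTreeMap;

/// FNV-1a (64-bit) folding step. Used only as a simple stand-in hash.
fn fnv1a(bytes: &[u8], mut h: u64) -> u64 {
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hash the whole state by visiting entries in ascending key order.
/// `BTreeMap` guarantees that order regardless of insertion history.
pub fn state_hash(state: &BTreeMap<u64, Vec<i32>>) -> u64 {
    let mut h = 0xcbf2_9ce4_8422_2325u64; // FNV offset basis
    for (id, vector) in state {
        h = fnv1a(&id.to_le_bytes(), h);
        for raw in vector {
            h = fnv1a(&raw.to_le_bytes(), h);
        }
    }
    h
}
```

Under these assumptions, inserting an entry and then removing it brings `state_hash` back to its previous value, and two maps built in different insertion orders hash identically. A `HashMap` would not give the second property without an extra sorting step.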

### 2. `valori-persistence` (The Storage)
The binary format layer.
* **Format:** `snapshot.val` (Graph Topology) + `events.log` (Append-Only).
* **Integrity:** CRC64 checksums covering every byte. Fail-closed validation.
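
"Fail-closed" here means that a checksum mismatch is an error, never a warning. A minimal Rust sketch of that pattern follows. The README only says "CRC64", so the ECMA-182 polynomial, the zero initial value, and the `crc64` and `verify` names are assumptions made for illustration.

```rust
/// Bitwise CRC-64 using the ECMA-182 polynomial, MSB-first, zero init.
/// (Assumption: the README does not name the CRC64 variant Valori uses.)
pub fn crc64(data: &[u8]) -> u64 {
    const POLY: u64 = 0x42F0_E1EB_A9EA_3693;
    let mut crc = 0u64;
    for &byte in data {
        crc ^= (byte as u64) << 56;
        for _ in 0..8 {
            crc = if (crc & (1u64 << 63)) != 0 {
                (crc << 1) ^ POLY
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Fail-closed check: any mismatch is returned as an error, so the caller
/// cannot proceed with unverified bytes by accident.
pub fn verify(body: &[u8], expected: u64) -> Result<(), String> {
    let actual = crc64(body);
    if actual == expected {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch: expected {:#018x}, got {:#018x}",
            expected, actual
        ))
    }
}
```

A production implementation would normally use a table-driven CRC for speed; the bitwise loop is used here because it is short enough to read in full.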

### 3. `valori-cli` (The Flight Recorder Interface)
The command-line tool for engineers.
* **Offline Forensics:** Reads disk directly. No daemon required.
* **Time Travel:** `replay-query`, `diff`, `verify`.

---

## 📚 Commands Reference

### `valori inspect`
Inspect the health and metadata of a database volume.
* **Usage:** `valori inspect --dir <path>`
* **Output:** Snapshot version, WAL event counts, Integrity status.

### `valori verify`
Verify the integrity of a snapshot file against its checksums.
* **Usage:** `valori verify snapshot.val`
* **Output:** `✅ VERIFIED` or `❌ CORRUPTED`.
* **Use Case:** Validating backups before an incident response.

### `valori timeline`
List labeled checkpoints in the event log.
* **Usage:** `valori timeline metadata.idx`
* **Output:** Human-readable timeline of `ingest:batch_01`, `experiment:v2`, etc.

### `valori replay-query`
Replay the WAL to a specific event ID and execute a search.
* **Usage:** `valori replay-query --dir <path> --at <event_id> --query "[...]"`
* **Use Case:** "What did the top-5 neighbors look like *right before* the crash?"

### `valori diff`
Compare search results (Topology) between two points in time.
* **Usage:** `valori diff --dir <path> --from <id_a> --to <id_b> --query "[...]"`
* **Output:** Delta of neighbors (+ Entry, - Exit, ~ Rank Shift).

---

## 🧬 Technical Specifications

### Deterministic Math
Valori uses **Q16.16 Fixed Point** arithmetic instead of IEEE 754 Float32.
* **Range:** [-32768.0, 32767.99998], in steps of 1/65536.
* **Behavior:** No NaN, no Infinity, and no order-dependent rounding: `(a + b) + c` always equals `a + (b + c)` when neither side overflows, which is not true of floating point.
* **Overflow:** Handled explicitly (clamped or rejected) rather than wrapping silently.
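
As a rough illustration of the points above, here is a minimal Q16.16 type in Rust. The `Q16` name and its methods are hypothetical and are not Valori's API. The sketch takes the "rejected" route for overflow by returning `None`; a clamping variant would saturate to `MIN` or `MAX` instead.

```rust
/// Q16.16 fixed point: a signed 32-bit integer with 16 fractional bits.
/// The real value is `raw / 65536`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Q16(pub i32);

impl Q16 {
    pub const ONE: Q16 = Q16(1 << 16);
    pub const MAX: Q16 = Q16(i32::MAX); // 32767 + 65535/65536
    pub const MIN: Q16 = Q16(i32::MIN); // -32768.0

    /// Every `i16` fits in the integer part, so this cannot overflow.
    pub fn from_int(n: i16) -> Q16 {
        Q16((n as i32) << 16)
    }

    /// Plain integer addition. Returns `None` on overflow instead of wrapping.
    pub fn checked_add(self, rhs: Q16) -> Option<Q16> {
        self.0.checked_add(rhs.0).map(Q16)
    }

    /// Multiply in 64 bits, then drop the extra 16 fractional bits.
    /// The shift rounds toward negative infinity.
    pub fn checked_mul(self, rhs: Q16) -> Option<Q16> {
        let wide = (self.0 as i64 * rhs.0 as i64) >> 16;
        if wide < i64::from(i32::MIN) || wide > i64::from(i32::MAX) {
            None
        } else {
            Some(Q16(wide as i32))
        }
    }
}
```

Addition here is ordinary integer addition, so it is exact and associative whenever it does not overflow. Multiplication still rounds, but it rounds the same way on every platform.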

### Deterministic HNSW
The graph index is not stochastic.
* **Entry Points:** Derived from `trailing_zeros(hash(id))`, creating a natural geometric distribution without RNG.
* **Neighbor Selection:** Strict `Distance ASC -> ID ASC` sorting.
* **Result:** The graph structure on an x86 server is **identical** to the graph on an ARM microcontroller.
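
The two rules above can be sketched in a few lines of Rust. This is an interpretation, not Valori's code: `mix` is a SplitMix64-style finalizer standing in for the unspecified `hash`, `level_for` treats the trailing-zero count as a node's top layer, and `max_level` is a hypothetical cap. Distances are shown as raw integers, consistent with fixed-point math.

```rust
/// SplitMix64-style finalizer, used here as a stand-in for `hash(id)`.
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Layer for a node, derived only from its ID. For a well-mixed hash,
/// about half the IDs get level 0, a quarter level 1, and so on, which is
/// the geometric shape HNSW usually gets from a random number generator.
pub fn level_for(id: u64, max_level: u32) -> u32 {
    mix(id).trailing_zeros().min(max_level)
}

/// Order candidates by distance ascending, breaking ties by ID ascending,
/// so equal distances can never produce two different orderings.
pub fn order_candidates(candidates: &mut [(i64, u64)]) {
    candidates.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
}
```

Both functions depend only on their inputs and on integer operations, so they give the same answer on any architecture.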

### Serialization Format
* **Header:** `VALO` + Version + EventIndex + Timestamp.
* **Body:** Vectors + Graph Topology (Layers, Neighbors).
* **Verification:** Body checksums must match header checksums.
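
For readers who want a concrete picture, the sketch below parses a header with the four fields listed above. Only the `VALO` magic and the field order come from this README. The field widths (`u32` version, `u64` event index, `u64` timestamp), the little-endian encoding, and the `Header` and `parse_header` names are assumptions, and the checksum fields are left out because their layout is not described.

```rust
/// Hypothetical in-memory view of the snapshot header.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u32,
    pub event_index: u64,
    pub timestamp: u64,
}

fn read_u32(b: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[..4]);
    u32::from_le_bytes(buf)
}

fn read_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

/// Parse the assumed 24-byte header: magic (4) + version (4) +
/// event index (8) + timestamp (8). Rejects anything unexpected.
pub fn parse_header(bytes: &[u8]) -> Result<Header, &'static str> {
    if bytes.len() < 24 {
        return Err("header truncated");
    }
    if &bytes[0..4] != b"VALO" {
        return Err("bad magic");
    }
    Ok(Header {
        version: read_u32(&bytes[4..8]),
        event_index: read_u64(&bytes[8..16]),
        timestamp: read_u64(&bytes[16..24]),
    })
}
```

Checking the length and magic before reading any field keeps the parser fail-closed: a truncated or foreign file is rejected rather than partially decoded.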

---

## 🚧 Development

**Testing:**
```bash
# Run all unit and integration tests
cargo test --workspace

# Run with output
cargo test --workspace -- --nocapture
```

**Build:**
```bash
# Release build (optimized)
cargo build --release
```

---

## 🗺️ Roadmap

* **v0.1.0 (Current):** MVP Release. CLI, Deterministic Kernel, Snapshotting.
* **v0.2.0:** Performance Tuning. Neighbor Pruning, `ef_search` optimization.
* **v0.3.0:** `valori-node`. HTTP Server & Network Layer.
* **v0.4.0:** Distributed Consensus. "God Mode" state sync across nodes.

---

## ⚖️ License

MIT License - See LICENSE file for details.

**Valori.** *Operate on Truth.*
