Commit a401c0c
committed
Merge torrust#832: Introduce Mudlark — an adaptive dual-tree spatial index
5286152 mudlark: clarify LINKING-EXCEPTION scope to original copyright holders' API (Cameron Garnham)
fbe2975 mudlark: add depth-limited PEWEI extraction (ADR-M-041) (Cameron Garnham)
2b44663 mudlark: introduce generational GNodeId handle (ADR-M-040) (Cameron Garnham)
8a3d996 mudlark: close mutation-testing gaps; rewrite README with doc-tested examples (Cameron Garnham)
ed73f1d mudlark: polish API surface and fix decay depth bug for 1.0.0 (Cameron Garnham)
57639d3 feat(package): mudlark, a φ-bounded geometric-value graph (Cameron Garnham)
Pull request description:
The Magpie-lark is a small Australian bird. Common, unassuming, often spotted fossicking in the mud — often affectionately called a "mudlark". What makes it distinctive is that mated pairs sing together in precisely coordinated duets, each bird contributing its part to a shared song.
That's the metaphor behind this crate's name. Mudlark is built around two trees that work in concert: one handles spatial geometry, the other handles significance ranking. They each have their own form and uses, and when they co-operate, they duet.
---
### Summary
This PR adds mudlark to the workspace: a library crate shipping a single core type, `GvGraph`, that acts as an adaptive spatial index over a bounded domain. You feed it observations — a coordinate and a value — and it figures out where to pay attention. Regions that receive lots of activity earn finer resolution. Quiet regions stay coarse. As the distribution shifts, the structure re-evaluates, promoting what's become important and shedding what's gone cold.
The simplest way to think about it: a histogram that decides its own bin widths, continuously, based on what it's seen.
The spec calls it a **φ-Bounded Geometric-Value Graph** — φ (the golden ratio) because the depth bound follows a Fibonacci recurrence, *bounded* because the node budget is hard-capped, *geometric* for the spatial partition tree, *value* for the significance tournament. `GvGraph` in the code.
Most adaptive structures make one tree do everything — spatial partitioning, importance ranking, rebalancing — all tangled together. Mudlark separates those concerns cleanly. The **G-Tree** (Geometric) is a binary partition of the coordinate space. It knows about intervals, accumulated values, and routing. The **V-Tree** (Value) is a competitive tournament that ranks entries by significance. The two share the same underlying nodes but see them from completely different angles — one spatial, one evaluative. Neither alone is sufficient. Together they produce an index whose resolution tracks the data's actual structure, with formal efficiency guarantees rooted in information theory.
The everyday surface is straightforward: observe into it, query it (point lookups, range sums, O(1) contour projection), sample from it proportionally, extract significance-ordered snapshots, and apply temporal decay so it tracks the present rather than accumulating the past forever. The sampling cost adapts to the distribution — hotspots are cheap, uniform distributions cost more — and the bound is provably within 1.44× of the information-theoretic optimum.
This data structure is general and should find use in future developments.
### Contents
The addition is substantial (~79k lines) but almost entirely self-contained within mudlark.
- **37 source modules** with a clean internal/external boundary (`pub(crate)` for internals, flat re-exports for the public API) and `#![forbid(unsafe_code)]`
- **1 311 tests** — 651 unit, 543 integration, 117 doc-tests — including adversarial stress patterns, a full invariant checker, and two narrative pedagogy tests that walk through every public operation step by step
- **143 Criterion benchmarks** across six families with published scaling exponents
- **~10k-line formal specification** (`idea.md`) developing the structure from first principles in coding theory — the implementation references it throughout via `§IDEA N` annotations
- **38 architecture decision records** documenting every significant design choice, including a mutation-testing gap analysis (ADR-M-039)
- **README with runnable examples** — every code block in the Quick Start and Advanced Usage sections compiles as a doc-test
- **Companion documentation** — API reference, architecture guide, performance analysis, testing guide, and standalone theory document
- **AGPL-3.0-only with Linking Exception** — a formal LINKING-EXCEPTION file permits unmodified use through the public API (Surfaces 1 and 2) without copyleft obligations on downstream code
### Changes outside mudlark
- Cargo.toml / Cargo.lock — workspace member added, lock file regenerated
- project-words.txt — spellcheck dictionary entries for mudlark terminology
### Reviewing this
The public API is defined in `docs/api.md` and surfaced through the flat re-exports in lib.rs — that's the best starting point. The formal spec (`docs/idea.md`) is reference material; you don't need to read it to review the code, but the source cross-references it via `§IDEA N` annotations if you want to trace a design choice back to its rationale.
For a focused pass: lib.rs (crate root and re-exports) → `src/graph.rs` (the main type) → `src/traits/` (the public trait surface) → tests (integration tests exercise only the public API).
The two pedagogy tests (`tests/pedagogy.rs` and `tests/pedagogy_advanced.rs`) are designed to be read end-to-end — they walk through the worked example from the spec, printing a teaching narrative at each step. Running `cargo test -p torrust-mudlark --test pedagogy -- --nocapture` produces a readable tutorial.
### Notes
- **Ships at 1.0.0.** The public API surface is defined, documented, and semver-locked. `Config` and `GvGraph` have no `Default` impl because the parameters are genuinely domain-dependent. The API surface is intentionally narrow and documented in `docs/api.md`.
- **The formal spec is the single source of truth for the design** — the ADRs are the full implementation notes.
- **AGPL-3.0-only** with a formal LINKING-EXCEPTION: using the public API (Surfaces 1 and 2) from your own code doesn't trigger copyleft on downstream code. Modifications or use of internal APIs (Surface 3) are subject to the full AGPL.
- **MSRV 1.88**, inherited from the workspace.
- **No `unsafe` code.** Forbidden at the crate root.
ACKs for top commit:
da2ce7:
ACK 5286152
Tree-SHA512: 7a3c2be703d856c5c232dceb51db85d2f186805d10001c352f8bd681de7a16ecbc01f6c2fd61373d3071e0e2b6a297aa1be73b2c303f441a09da99133d0de3dc164 files changed
Lines changed: 80644 additions & 12 deletions
File tree
- packages/mudlark
- adr
- benches
- docs
- src
- testing
- tests
- traits
- tests
- bench_mirrors
- support
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
38 | | - | |
| 38 | + | |
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
43 | | - | |
44 | 43 | | |
45 | 44 | | |
46 | 45 | | |
| |||
54 | 53 | | |
55 | 54 | | |
56 | 55 | | |
57 | | - | |
58 | | - | |
59 | | - | |
60 | | - | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| |||
67 | 67 | | |
68 | 68 | | |
69 | 69 | | |
70 | | - | |
| 70 | + | |
| 71 | + | |
71 | 72 | | |
72 | | - | |
| 73 | + | |
73 | 74 | | |
74 | 75 | | |
75 | 76 | | |
| |||
0 commit comments