Commit 6d53216

Merge remote-tracking branch 'origin/master' into claude/splat3d-cpu-simd-renderer-MAOO0
2 parents 8749a15 + f054bc7 commit 6d53216

225 files changed

Lines changed: 66392 additions & 3813 deletions

.cargo/config-apple-m2.toml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+[build]
+# Apple M2 / M3 / M4 — ARMv8.6-A+ with BF16, dotprod, fp16, i8mm.
+# Use with:
+# cargo --config .cargo/config-apple-m2.toml build --target=aarch64-apple-darwin
+#
+# Targets the BF16 tier — see `src/simd_neon_bf16.rs` for the silicon
+# table, runtime detection (`sysctl hw.optional.arm.FEAT_BF16`), the
+# BFMMLA / BFDOT intrinsic family, and the asm-byte fallback path that
+# stable Rust 1.95 must use until `vbfdotq_f32` stabilizes (issue
+# #117222).
+#
+# Also works on:
+# - Apple M3 (target-cpu=apple-m3) — same ARMv8.6-A baseline
+# - Apple M4 — adds SME (not SVE2), can override with -Ctarget-cpu=apple-m4
+# - Snapdragon X Elite / X Plus on macOS-like targets (use cortex-x4)
+#
+# DOES NOT target Apple M1 — M1 is ARMv8.5-A and lacks BF16. M1 should
+# use the dotprod tier (config-pi5.toml-shaped, target-cpu=apple-m1).
+[target.aarch64-apple-darwin]
+rustflags = ["-Ctarget-cpu=apple-m2", "-Ctarget-feature=+bf16,+dotprod,+fp16,+i8mm"]
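
The comments above distinguish a BF16 tier (M2 and later) from a dotprod tier (M1). As a minimal sketch of how a binary might check at run time which tier the host supports, the example below uses the standard library's `is_aarch64_feature_detected!` macro. The `NeonTier` enum, `classify_tier`, and `detect_tier` are hypothetical names for illustration only; they are not taken from `src/simd_neon_bf16.rs`, and the exact feature set that defines each tier is an assumption based on the `rustflags` in these configs.

```rust
/// Hypothetical tier names for illustration; the real module layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NeonTier {
    /// bf16 + i8mm + dotprod + fp16 (the features config-apple-m2.toml enables).
    Bf16,
    /// dotprod + fp16 only (the tier the comment recommends for M1).
    DotProd,
    /// Plain NEON without the optional extensions above.
    Baseline,
}

/// Pure classification step, kept separate so it can be tested on any host.
pub fn classify_tier(bf16: bool, i8mm: bool, dotprod: bool, fp16: bool) -> NeonTier {
    if bf16 && i8mm && dotprod && fp16 {
        NeonTier::Bf16
    } else if dotprod && fp16 {
        NeonTier::DotProd
    } else {
        NeonTier::Baseline
    }
}

/// Runtime probe on aarch64 via the standard library's detection macro.
#[cfg(target_arch = "aarch64")]
pub fn detect_tier() -> Option<NeonTier> {
    use std::arch::is_aarch64_feature_detected;
    Some(classify_tier(
        is_aarch64_feature_detected!("bf16"),
        is_aarch64_feature_detected!("i8mm"),
        is_aarch64_feature_detected!("dotprod"),
        is_aarch64_feature_detected!("fp16"),
    ))
}

/// On other architectures there is no NEON tier to report.
#[cfg(not(target_arch = "aarch64"))]
pub fn detect_tier() -> Option<NeonTier> {
    None
}
```

A binary built with this config could call `detect_tier()` at startup and refuse to run unless it returns `Some(NeonTier::Bf16)`, which would turn a SIGILL on an M1 into a readable error.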

.cargo/config-avx512.toml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+[build]
+# Explicit AVX-512 config — Sapphire Rapids baseline. Use with:
+# cargo --config .cargo/config-avx512.toml build
+# cargo --config .cargo/config-avx512.toml test
+#
+# `-Ctarget-cpu=sapphirerapids` enables, in addition to the
+# `x86-64-v4` AVX-512 baseline (F + BW + CD + DQ + VL):
+#
+# - AVX-512 VNNI (VPDPBUSD u8×i8 → i32)
+# - AVX-512 BF16 (VDPBF16PS, VCVTNE2PS2BF16)
+# - AVX-512 FP16 (16-wide native FP16 arithmetic)
+# - AVX-512 VBMI / VBMI2 (byte permute)
+# - AVX-512 IFMA, BITALG, VPOPCNTDQ, GFNI, VAES, VPCLMUL
+# - AVX-VNNI (ymm VPDPBUSD on Alder/Sapphire client)
+# - AMX-TILE + AMX-INT8 + AMX-BF16 (16×16×k tile kernels)
+#
+# Effect on the agnostic surfaces in `src/simd_*ops.rs`:
+#
+# - `simd_int_ops::gemm_u8_i8` resolves to the AVX-512 VNNI `VPDPBUSD`
+# zmm kernel (`hpc::vnni_gemm::int8_gemm_vnni_avx512`). When the
+# planned `amx-int8` arm lands, it will preempt this one and route
+# to `TDPBUSD` instead — same source, no caller changes.
+# - BF16 / FP16 lane ops in `src/simd_avx512.rs` light up.
+# - `simd_amx::*` tile primitives are usable without further gating.
+#
+# Pure `x86-64-v4` is NOT used here — the Skylake-X family is the only
+# mainstream AVX-512 silicon without VNNI and the project's design pins
+# VNNI as the lowest common denominator above the scalar reference. SKX
+# users either build with `-Ctarget-cpu=x86-64-v4` explicitly (and accept
+# the scalar arm for `gemm_u8_i8`) or run a runtime-LazyLock dispatch binary.
+#
+# Binary produced here will SIGILL on CPUs that lack any of the
+# enabled feature sets — i.e. anything pre-Sapphire-Rapids on x86_64:
+#
+# - Cooper Lake / Cascade Lake / Ice Lake-SP (no FP16 or AMX; BF16 only on Cooper Lake)
+# - Skylake-X / Skylake-SP / Skylake-W (no VNNI either)
+# - Zen 4 / Zen 5 (no AMX or FP16)
+# - Alder Lake / Arrow Lake (no AVX-512 at all)
+# - Haswell ⇢ Coffee Lake (AVX2 only)
+#
+# Only deploy artifacts built with this config to hosts that report
+# `amx_int8 amx_bf16 avx512_bf16 avx512_fp16 avx512_vnni` in
+# `/proc/cpuinfo`. For Cascade Lake → Ice Lake-SP → Zen 4 silicon
+# (AVX-512 + VNNI but no AMX/FP16), build with
+# `-Ctarget-cpu=cascadelake` or `-Ctarget-cpu=znver4` instead. For
+# shipping a single release artifact that adapts at process start,
+# see the LazyLock runtime dispatch path in § 7.1 of the architecture
+# doc instead.
+[target.'cfg(target_arch = "x86_64")']
+rustflags = ["-Ctarget-cpu=sapphirerapids"]
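
The config comments mention a scalar arm for `gemm_u8_i8` and a LazyLock runtime dispatch path as the alternative to a pinned `target-cpu`. The sketch below shows the general shape of that pattern for a u8×i8 → i32 dot product (the arithmetic `VPDPBUSD` performs per lane group). It is not the project's § 7.1 implementation: every function name here is hypothetical, AVX2 stands in for the wider ISA tiers, and the "AVX2 kernel" is just the scalar loop compiled with AVX2 enabled rather than hand-written intrinsics.

```rust
use std::sync::LazyLock;

/// Signature shared by all kernels (an assumption for this sketch).
type DotFn = fn(&[u8], &[i8]) -> i32;

/// Scalar reference: widen each u8 × i8 product to i32 and accumulate
/// without saturation. Extra elements in the longer slice are ignored.
pub fn dot_u8_i8_scalar(a: &[u8], b: &[i8]) -> i32 {
    a.iter()
        .zip(b)
        .fold(0i32, |acc, (&x, &y)| acc.wrapping_add(i32::from(x) * i32::from(y)))
}

/// Stand-in for a SIMD kernel: the same arithmetic, compiled with AVX2
/// enabled so the optimizer is free to vectorize it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_u8_i8_avx2_impl(a: &[u8], b: &[i8]) -> i32 {
    dot_u8_i8_scalar(a, b)
}

/// Safe wrapper so the kernel fits the plain `DotFn` pointer type.
#[cfg(target_arch = "x86_64")]
fn dot_u8_i8_avx2(a: &[u8], b: &[i8]) -> i32 {
    // SAFETY: `select_kernel` only hands out this function after
    // `is_x86_feature_detected!("avx2")` returned true on this host.
    unsafe { dot_u8_i8_avx2_impl(a, b) }
}

/// Runs once, on first use, and picks the best kernel the host supports.
fn select_kernel() -> DotFn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return dot_u8_i8_avx2;
        }
    }
    dot_u8_i8_scalar
}

/// Process-wide kernel pointer, resolved lazily at first call.
static DOT_U8_I8: LazyLock<DotFn> = LazyLock::new(select_kernel);

/// Public entry point: callers never see which kernel was chosen.
pub fn dot_u8_i8(a: &[u8], b: &[i8]) -> i32 {
    (*DOT_U8_I8)(a, b)
}
```

The trade-off against the pinned config above is one indirect call per invocation in exchange for a single artifact that does not SIGILL on older hosts.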

.cargo/config-graviton.toml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+[build]
+# AWS Graviton 3 / 3E / 4 (Neoverse V1 / V2) — ARMv8.4-A+ with BF16
+# (V1: optional, V2: mandatory) + SVE / SVE2.
+# Use with:
+# cargo --config .cargo/config-graviton.toml build --target=aarch64-unknown-linux-gnu
+#
+# Targets the BF16 tier — see `src/simd_neon_bf16.rs`. Graviton 3 (V1)
+# also adds SVE 256-bit; Graviton 4 (V2) adds SVE2 + BFMMLA + i8mm.
+#
+# Also works on:
+# - Cortex-X3 / X4 / X925 generic Linux servers
+# - Ampere Altra is NOT covered (Neoverse N1: ARMv8.2-A, no BF16)
+# - NVIDIA Grace (V2 — same as Graviton 4)
+#
+# For ARMv9 cores with SVE2 you may want a separate config-sve2.toml
+# later that adds `+sve2` and routes through a future
+# `src/simd_neon_sve2.rs` (not in Phase 3 scope).
+[target.aarch64-unknown-linux-gnu]
+rustflags = ["-Ctarget-cpu=neoverse-v2", "-Ctarget-feature=+bf16,+dotprod,+fp16,+i8mm"]

.cargo/config-native.toml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+[build]
+# Native build config — `target-cpu = "native"`. Use with:
+# cargo --config .cargo/config-native.toml build
+# cargo --config .cargo/config-native.toml test
+#
+# rustc resolves the build host's CPUID at invocation and enables every
+# `target_feature` the host CPU advertises. `simd.rs` then picks the
+# matching backend (typically `simd_avx512` on modern dev machines).
+#
+# Produces a binary tuned for the developer's exact silicon. The result
+# is NOT portable: do not distribute artifacts built with this config.
+[target.'cfg(target_arch = "x86_64")']
+rustflags = ["-Ctarget-cpu=native"]

.cargo/config-pi5.toml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+[build]
+# Raspberry Pi 5 (BCM2712, Cortex-A76) — ARMv8.2-A with dotprod + fp16.
+# Use with:
+# cargo --config .cargo/config-pi5.toml build --target=aarch64-unknown-linux-gnu
+#
+# Targets the dotprod/fp16 tier — see `src/simd_neon_dotprod.rs` for the
+# silicon table, runtime detection, and stub map. Also works on:
+# - Orange Pi 5 (Rockchip RK3588, Cortex-A76)
+# - Anything reporting `Features: ... asimddp asimdhp ...` in
+# /proc/cpuinfo without `bf16`.
+#
+# For Apple M2+ / Snapdragon X, use config-apple-m2.toml; for Graviton 4,
+# use config-graviton.toml (BF16 tier — see src/simd_neon_bf16.rs).
+[target.aarch64-unknown-linux-gnu]
+rustflags = ["-Ctarget-cpu=cortex-a76", "-Ctarget-feature=+dotprod,+fp16"]
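
The rule of thumb in the comment above (`asimddp` and `asimdhp` present in `/proc/cpuinfo`, `bf16` absent) can be checked mechanically before choosing a config. The following is a rough sketch under stated assumptions: `tier_from_cpuinfo` and the tier labels it returns are hypothetical, and the function takes the file contents as a string so it can be exercised without a Linux/aarch64 host.

```rust
/// Suggests a tier from the text of /proc/cpuinfo.
/// Returns None when no `Features` line is present (e.g. on x86_64).
pub fn tier_from_cpuinfo(cpuinfo: &str) -> Option<&'static str> {
    // Take the first `Features : ...` line and keep the part after the colon.
    let features = cpuinfo
        .lines()
        .find(|line| line.trim_start().starts_with("Features"))?
        .split_once(':')?
        .1;

    // Exact token match, so "bf16" does not match a longer flag name.
    let has = |flag: &str| features.split_whitespace().any(|f| f == flag);

    if has("asimddp") && has("asimdhp") {
        // dotprod + fp16 present; bf16 decides between the two tiers.
        Some(if has("bf16") { "bf16" } else { "dotprod" })
    } else {
        Some("baseline")
    }
}
```

On a real host the input would typically come from `std::fs::read_to_string("/proc/cpuinfo")`.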

.cargo/config.toml

Lines changed: 25 additions & 3 deletions
@@ -1,4 +1,26 @@
 [build]
-# No global target-cpu. Each kernel uses #[target_feature(enable = "avx512f")]
-# per-function, with LazyLock runtime detection. One binary, all ISAs.
-# Railway (AVX-512) and GitHub CI (AVX2) use the same binary.
+# Default cargo config — x86-64-v3 (AVX2) baseline. Portable across
+# x86_64 silicon with AVX2 (Haswell, ~2013, and later). This is what GitHub CI
+# runs against and what `cargo build` produces for general distribution.
+#
+# Why v3 and not "no target-cpu":
+# `src/simd_avx2.rs` composes `F32x16` as two `__m256` halves (AVX
+# intrinsics), and the `simd_avx2_*` op funcs use `__m256i` (AVX2).
+# Without a global v3 baseline, rustc compiles to x86-64 generic (SSE2)
+# and those intrinsics emit instructions the CPU never executes →
+# SIGILL at run time, exactly the PR #170 CI failure mode.
+#
+# AVX-512 builds: use `--config .cargo/config-avx512.toml` (or
+# `CARGO_BUILD_RUSTFLAGS='-Ctarget-cpu=x86-64-v4'`). The simd.rs dispatch
+# arms key off `target_feature = "avx512f"`; under v4 they pick the
+# `simd_avx512` backend (native `__m512` / `__m512d` / `__m512i`).
+#
+# Build-machine-tuned binaries: use `--config .cargo/config-native.toml`
+# (`target-cpu = "native"`); rustc resolves the host CPUID at compile.
+#
+# Runtime LazyLock dispatch (one release binary, heterogeneous deployment
+# silicon) is a fifth opt-in mode — see § 7.1 of
+# .claude/knowledge/simd-dispatch-architecture.md. Reserved for the
+# release-binary distribution path; never the dev / CI default.
+[target.'cfg(target_arch = "x86_64")']
+rustflags = ["-Ctarget-cpu=x86-64-v3"]
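
The comment says the `simd.rs` dispatch arms key off `target_feature = "avx512f"`, so the backend is fixed at compile time by whichever config supplied the `rustflags`. A minimal sketch of that kind of compile-time selection is shown below. The function name `compiled_backend` is hypothetical, the `"scalar"` fallback label is an assumption, and the real `simd.rs` may well use `#[cfg]` attributes on modules rather than the `cfg!` macro.

```rust
/// Reports which backend the current build was compiled for.
/// `cfg!` is evaluated at compile time from the active target features,
/// so this reflects `-Ctarget-cpu` / `-Ctarget-feature`, not the host CPU.
pub const fn compiled_backend() -> &'static str {
    if cfg!(target_feature = "avx512f") {
        // e.g. built with config-avx512.toml or -Ctarget-cpu=x86-64-v4
        "simd_avx512"
    } else if cfg!(target_feature = "avx2") {
        // e.g. built with the default x86-64-v3 config
        "simd_avx2"
    } else {
        // hypothetical label for a build without AVX2 enabled
        "scalar"
    }
}
```

Logging this value at startup is one cheap way to confirm that the intended `--config` file was actually picked up by the build.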

.claude/ATT/DE/README.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+> **Language:** German · see `../README.md` for the English source version.
+
+# `.claude/ATT/`: NLSpecs of our kit in the Attractor style
+
+> **Status:** DRAFT · **Version:** 0.1.0 · **As of:** 2026-05-17
+>
+> **What this is.** The ideas of our `.claude/EN/` kit, restated
+> in the [strongdm/attractor](https://github.com/strongdm/attractor) NLSpec format
+> so that a coding agent can implement them directly.
+> [NLSpec](https://github.com/strongdm/attractor#terminology) =
+> "human-readable spec intended to be directly usable by coding agents
+> to implement/validate behavior."
+>
+> **What this is NOT.** Not a replacement for `.claude/EN/`. The two are
+> complementary: `.claude/EN/` is the cheat sheet for operators
+> (prose, in-session use); `.claude/ATT/` is the engineering spec
+> (NLSpec, build-time use).
+
+## The three specs
+
+| File | Mirrors Attractor's | Adds our innovation |
+|---|---|---|
+| [`autoattended-orchestrator-spec.md`](./autoattended-orchestrator-spec.md) | [`attractor-spec.md`](https://github.com/strongdm/attractor/blob/main/attractor-spec.md) (DOT-graph pipeline runner) | Wave-based 12-worker fan-out; 4-savant verdict gates (PP-13/14/15/16); 6 worker iron rules; sprint token budget (~300k/wave); multi-file board pattern with a single-mutable-file invariant |
+| [`anti-skim-agent-spec.md`](./anti-skim-agent-spec.md) | [`coding-agent-loop-spec.md`](https://github.com/strongdm/attractor/blob/main/coding-agent-loop-spec.md) (the per-LLM-call agent library) | Reading-depth ladder (grep→read→thorough→fan-out); lie detector LD-1..5 (sentinel token / proof-of-read SHA / 3-section challenge / negative-knowledge test / line-range quote); typed stuck-protocol blockers (AMBIGUITY / MISSING_INVARIANT / SPEC_SOURCE_MISMATCH / BEHAVIOUR_QUESTION / EXTERNAL_DEPENDENCY) |
+| [`agent-coordination-mcp-spec.md`](./agent-coordination-mcp-spec.md) | [`unified-llm-spec.md`](https://github.com/strongdm/attractor/blob/main/unified-llm-spec.md) (provider-agnostic LLM SDK) | Three coordination layers (role teleport / file blackboard / branch pub-sub) as a native A2A MCP server should expose them; structured handover schema; decision matrix for when each layer fits |
+
+## What we adopted from Attractor (the five wins)
+
+These are concrete things that Attractor's specs nail down and that our
+prose docs in `.claude/EN/` were missing; they are now incorporated:
+
+| # | Attractor concept | Lands in our NLSpec | Adoption |
+|---|---|---|---|
+| 1 | Typed `status.json` schema + 5-value `StageStatus` enum (Attractor Appendix C: `{outcome, preferred_label, suggested_next_ids, context_updates, notes}`) | [`autoattended-orchestrator-spec.md` §9](./autoattended-orchestrator-spec.md#9-status-file-schema) | Adopted **with mandatory `auto_status=false`** (see "Where it collides" below). |
+| 2 | DOT-graph DSL for the workflow + lint rules (Attractor §2 grammar + §7 validation: `reachability`, `start_no_incoming`, `goal_gate_has_retry`, `condition_syntax`) | [`autoattended-orchestrator-spec.md` §6](./autoattended-orchestrator-spec.md#6-sprint-plan-format) (DOT + YAML mirror) + [§7](./autoattended-orchestrator-spec.md#7-validation-rules) (WAVE-001..WAVE-017 with ERROR/WARNING severity) | Adopted with three wave-specific additions: `unique-write`, `declared-shared`, `auto-status-false`. |
+| 3 | Context-fidelity ladder (Attractor §5.4: `full` / `truncate` / `compact` / `summary:low/medium/high` with token budgets + precedence edge > node > graph > default) | [`autoattended-orchestrator-spec.md` §11](./autoattended-orchestrator-spec.md#11-context-fidelity) | Adopted with one tightening: `fidelity=truncate` does NOT exempt a worker from the §3.3 reading-depth ladder in `anti-skim-agent-spec.md`. |
+| 4 | In-loop tool-call loop detection (Attractor coding-agent §2.10: scan the last 10 calls for repeating patterns of length 1/2/3 → inject a steering warning) | [`anti-skim-agent-spec.md` §6](./anti-skim-agent-spec.md#6-tool-call-loop-detection) + AP9 in [§9](./anti-skim-agent-spec.md#9-anti-pattern-catalog-ap1ap9) | Adopted verbatim; elevated to a system-level invariant. PP-13's post-hoc AP9 catches what the in-loop detector misses. |
+| 5 | Definition-of-done checklists + cross-provider parity matrix per spec (conformance tables in the style of Attractor §10) | Every NLSpec ends with `§ Definition of Done` + `§ Cross-{Language,Provider}-Parity-Matrix` | Adopted as a structural template. The 26-repo rollout is now machine-checkable. |
+
+## Why this format
+
+Three properties that we get from Attractor's NLSpec format and
+that our prose docs in `.claude/EN/` lack:
+
+1. **Definition-of-done checklists** at the end of every spec give us
+a conformance test for "is this implementation done?"
+2. **Cross-provider parity matrix** tables give us a per-language /
+per-runtime mapping, so the same NLSpec can land in Rust, Python, TypeScript,
+and Go without three-way drift.
+3. **Validation rules with ERROR/WARNING/INFO severity** turn
+linting into a deterministic process rather than a gut feeling.
+
+## Where it collides with Attractor's stance (and why we keep our position)
+
+| Attractor's default | Our position | Why |
+|---|---|---|
+| `auto_status=true` (§4.5 + Appendix C: "if the handler writes no status, auto-generate SUCCESS") | `auto_status=false` is mandatory | This is exactly the silent-skim failure mode that our lie detector LD-1..5 exists to prevent. Missing status = FAIL, not SUCCESS. |
+| Single-threaded graph traversal (§3.8: "only one node runs at a time") | Wave fan-out is the baseline, not a special-case `parallel` node | Our token budget is per wave (~300k for 12 workers), not per node. Modeling waves as one giant `parallel` node is syntactically ugly. |
+| Engine-level retries with exponential backoff (§3.5-3.6) | Stuck agents file typed blockers in REQUESTS-FROM-AGENTS.md; the meta-agent decides | Retries should be contextual and inspected, not silent and uniform. |
+| `wait.human` as the default; LLM gates are test fixtures (§6.4: `AutoApproveInterviewer` "Used for automated testing") | The LLM meta-agent is the production gate | Our meta-agent performs plan review + P0/P1 PR review + inbox drain as a production role, not as a test fixture. |
+| Subagent depth = 1 by default (coding-agent §7.3) | Wave fan-out routinely runs 12 workers from one orchestrator | We override the depth to ≥2; workers should be allowed to spawn sub-investigations. |
+
+## Conformance
+
+A repo that includes these NLSpecs is conformant if it satisfies the
+"Definition of Done" checklist at the end of every spec. The
+lance-graph implementation (see [`AdaWorldAPI/lance-graph` `.claude/agents/`](https://github.com/AdaWorldAPI/lance-graph/tree/main/.claude/agents))
+is the most mature reference; the WoA + woa-rs implementations are the
+wave-based reference.
+
+## Provenance
+
+- Source kit: `.claude/EN/` in this repo (see [`.claude/EN/README.md`](../../EN/README.md))
+- Format inspiration: [strongdm/attractor](https://github.com/strongdm/attractor) (MIT-style NLSpecs)
+- Distillation handover: [`META/HANDOVER-AGENTKIT-CONSOLIDATION-2026-05-17.md`](https://github.com/AdaWorldAPI/WoA/blob/main/META%2FHANDOVER-AGENTKIT-CONSOLIDATION-2026-05-17.md) (in `AdaWorldAPI/WoA`)
+- Sister handover (Rust hardening pass): [`META/HANDOVER-WOA-RS-AGENT-HARDENING-2026-05-17.md`](https://github.com/AdaWorldAPI/WoA/blob/main/META%2FHANDOVER-WOA-RS-AGENT-HARDENING-2026-05-17.md)
+
+## Build
+
+To turn these NLSpecs into a runnable implementation,
+give a modern coding agent (Claude Code, Codex, OpenCode, Amp,
+Cursor, ...) this prompt:
+
+```
+codeagent> Implement the Autoattended Orchestrator + Anti-Skim Agent
++ Agent Coordination MCP as described by
+https://github.com/AdaWorldAPI/<repo>/tree/main/.claude/ATT
+together with strongdm/attractor as the substrate.
+```
+
+*End of file README.md.*
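
Row 4 of the adoption table in this README describes in-loop tool-call loop detection: scan the last 10 calls for repeating patterns of length 1, 2, or 3. As a rough illustration of that idea, the sketch below treats each call as an opaque signature string. The function name `detect_tool_loop`, the string representation of a call, and the exact matching rule (every call in the window equals the call one period earlier) are assumptions for this example; neither spec is quoted here, and the actual rule in the referenced §2.10 may differ.

```rust
/// Looks at the most recent 10 tool-call signatures and returns the
/// smallest period (1, 2, or 3) with which the whole window repeats.
/// Returns None if there are fewer than 10 calls or no such period.
pub fn detect_tool_loop(calls: &[&str]) -> Option<usize> {
    const WINDOW: usize = 10;
    if calls.len() < WINDOW {
        return None;
    }
    // Only the tail of the history matters for in-loop detection.
    let recent = &calls[calls.len() - WINDOW..];

    // A window repeats with period p when every element equals the
    // element p positions before it.
    (1..=3).find(|&period| (period..WINDOW).all(|i| recent[i] == recent[i - period]))
}
```

A caller would typically react to `Some(_)` by injecting the steering warning the table mentions, rather than by aborting the agent.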
