A falsifiable test of LLM-augmented systems engineering at one-person scale.
The hypothesis is that a single developer, in the role of contract architect, can build a non-trivial systems-software artifact through structured work with a tiered LLM pipeline, where the LLMs operate as executors inside contract boundaries rather than as substitutes for engineering judgement.
The claim is operationalized as the production of a moddable simulation engine with declared invariants (a multithreaded ECS, capability-based mod isolation, and a replaceable native core), built solo, with measured pipeline throughput and a recorded defect rate. The colony-simulator content sits on top of the engine as a realistic load; the engine exists to stress-test the methodology under a non-trivial workload.
The claim is rejected if any of the following hold over a sustained development window:
- Defect rate. The shipped artifact accumulates production-class defects that the contract-and-test infrastructure was supposed to prevent. Current state: 0 known production bugs across the closed phases; full test counts and acceptance criteria are recorded in docs/ROADMAP.md.
- Architectural integrity. The architecture drifts under sustained activity — locked specifications stop reflecting the code, contracts weaken to accommodate executor limitations, or isolation guarantees erode. Current state: architectural decisions and their rejected alternatives are recorded in docs/architecture/ARCHITECTURE.md, docs/architecture/MOD_OS_ARCHITECTURE.md, and the normalization audit in docs/reports/NORMALIZATION_REPORT.md.
- Pipeline economics. The pipeline cannot sustain its own throughput under a fixed monthly subscription and spills into pay-as-you-go API consumption to keep moving. Current state: two consecutive weekly windows under different operational profiles converge on the same headroom band; measurements are recorded in docs/methodology/PIPELINE_METRICS.md §3.
Each condition has a documented source of truth; the present README does not restate the numbers.
The pipeline configures N agents in an architect-executor split with rigid contracts at the boundaries: a human as direction owner, and one or more LLM instances operating as architect (deliberation, brief authoring, QA review) and executor (mechanical application of authored briefs). The architect-executor split with contracts at the boundaries is invariant across configurations; the specific N, the boundary type between architect and executor (model-tier, session-mode, or mixed), and the tier mix vary by pipeline configuration.
The agents do not communicate directly; coordination happens through LOCKED documents in the repository and through the human as session router.
Current configuration (v1.6, 2026-05-10). N=2: Crystalka (direction owner) plus a unified Claude Desktop session that switches between deliberation mode (chat interface, architectural decision recording per K8.0 / K-L3.1 / A'.0.7 precedent) and execution mode (Claude Code agent, autonomous tool-loop per A'.0.5 precedent). Boundary type: session-mode.
The v1.x era (Phase 0–8, ending 2026-05-09) used a model-tier boundary with N=4 (local quantized Gemma executor, cloud Sonnet prompt generator, cloud Opus architect, and human direction owner). The empirical record is preserved in docs/methodology/PIPELINE_METRICS.md with per-metric transferability annotations.
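The invariant above and the two configurations can be modeled as plain data. The sketch below is minimal and assumes a hypothetical Python model. Role, Boundary, Agent, PipelineConfig, and check_invariants are illustrative names, not code from the repository.

The model checks the split as two conditions:
- exactly one human direction owner;
- at least one LLM agent holding each of the architect and executor roles.

N and the boundary type stay free parameters. Placing the v1.x Sonnet prompt generator on the architect side is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DIRECTION_OWNER = "direction owner"  # human; owns direction, routes sessions
    ARCHITECT = "architect"              # deliberation, brief authoring, QA review
    EXECUTOR = "executor"                # mechanical application of authored briefs


class Boundary(Enum):
    MODEL_TIER = "model-tier"
    SESSION_MODE = "session-mode"
    MIXED = "mixed"


@dataclass(frozen=True)
class Agent:
    name: str
    human: bool
    roles: frozenset


@dataclass(frozen=True)
class PipelineConfig:
    version: str
    boundary: Boundary
    agents: tuple

    @property
    def n(self) -> int:
        return len(self.agents)


def check_invariants(cfg: PipelineConfig) -> None:
    """Raise ValueError if the architect-executor split is violated (hypothetical rule set)."""
    owners = [a for a in cfg.agents if Role.DIRECTION_OWNER in a.roles]
    if len(owners) != 1 or not owners[0].human:
        raise ValueError(f"{cfg.version}: need exactly one human direction owner")
    llms = [a for a in cfg.agents if not a.human]
    for role in (Role.ARCHITECT, Role.EXECUTOR):
        if not any(role in a.roles for a in llms):
            raise ValueError(f"{cfg.version}: no LLM agent holds role {role.value}")


HUMAN = Agent("Crystalka", True, frozenset({Role.DIRECTION_OWNER}))

# Current configuration: one Claude Desktop session that switches modes.
V1_6 = PipelineConfig("v1.6", Boundary.SESSION_MODE, (
    HUMAN,
    Agent("Claude Desktop session", False, frozenset({Role.ARCHITECT, Role.EXECUTOR})),
))

# Earlier model-tier configuration. Assumption: the Sonnet prompt generator
# authors briefs, so it is counted on the architect side.
V1_X = PipelineConfig("v1.x", Boundary.MODEL_TIER, (
    HUMAN,
    Agent("local quantized Gemma", False, frozenset({Role.EXECUTOR})),
    Agent("cloud Sonnet prompt generator", False, frozenset({Role.ARCHITECT})),
    Agent("cloud Opus", False, frozenset({Role.ARCHITECT})),
))

for cfg in (V1_6, V1_X):
    check_invariants(cfg)
    print(cfg.version, cfg.n, cfg.boundary.value)
```

Both configurations pass the same check while differing in N and boundary type. That is the sense in which the split, and not the agent count, is the invariant.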
Full pipeline configuration, empirical task metrics, subscription headroom data, and reproducibility requirements are documented in docs/methodology/PIPELINE_METRICS.md. The full methodology is documented in docs/methodology/METHODOLOGY.md. The methodology and deeper documents are authored under an agent-as-primary-reader assumption. Readers unfamiliar with the project's cross-reference density should use AI tooling to navigate the documentation corpus.
If a contract is rigid enough that an executor produces correct code under it on the first build at a measurable rate (target: fewer than 30% of tasks requiring a second execution), the contract will hold under any stronger executor or any restructured boundary type. Isolation from executor errors is a structural property of the contract, not of the executor's specific capacity.
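The <30% target is a simple ratio over executed tasks. The sketch below is minimal and assumes a hypothetical per-task record of how many executor runs each brief needed. The TaskRecord type and the sample counts are illustrative, not measured data from PIPELINE_METRICS.md.

```python
from dataclasses import dataclass

REEXECUTION_TARGET = 0.30  # README target: fewer than 30% of tasks need a second execution


@dataclass(frozen=True)
class TaskRecord:
    brief_id: str
    executions: int  # executor runs until the build was correct (hypothetical field)


def reexecution_rate(tasks: list[TaskRecord]) -> float:
    """Fraction of tasks that needed more than one execution."""
    if not tasks:
        raise ValueError("no tasks recorded")
    return sum(1 for t in tasks if t.executions > 1) / len(tasks)


def contract_meets_target(tasks: list[TaskRecord]) -> bool:
    # Strict inequality: exactly 30% does not meet the "<30%" target.
    return reexecution_rate(tasks) < REEXECUTION_TARGET


# Illustrative sample only: 10 tasks, 2 of which needed a second execution.
sample = [TaskRecord(f"brief-{i}", 2 if i in (3, 7) else 1) for i in range(10)]
print(reexecution_rate(sample), contract_meets_target(sample))  # 0.2 True
```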
The engine is the stress test. Without a non-trivial workload, the pipeline claim reduces to a statement about toy problems. The engine carries three properties that make the workload non-trivial:
- A multithreaded ECS with declarative system access ([SystemAccess]), a Kahn-sorted dependency graph, and compile-time isolation enforcement. [SystemAccess] declarations are consumed by DependencyGraph for edge-building; the future A'.9 Roslyn analyzer will extend this enforcement to call sites. The runtime guard methods that previously threw IsolationViolationException were deleted in K8.3+K8.4 (A'.5 closure, 2026-05-14); the safety model is compile-time plus analyzer, not runtime.
- Capability-based mod isolation: each mod loads into its own AssemblyLoadContext, sees only DualFrontier.Contracts, and interacts with the kernel through reflection-scanned capabilities. The architecture is documented as an OS-style design in docs/architecture/MOD_OS_ARCHITECTURE.md.
- A native ECS storage backend (NativeWorld), the sole production component-storage path after A'.5 K8.3+K8.4 (2026-05-14). The prior managed World is retired from production and survives only as a test fixture (ManagedTestWorld). An earlier exploration of a separate C++ kernel as a replaceable boundary produced a measured negative result with criterion reformulation, recorded in docs/reports/NATIVE_CORE_EXPERIMENT.md.
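The following is a minimal Python sketch of how read/write access declarations can become dependency edges that are then ordered with Kahn's algorithm.

Its edge rule is an assumption for illustration, and the actual DependencyGraph rules in the C# code may differ:
- a system that writes a component runs before any system that only reads it;
- two writers of the same component keep their declaration order.

The system and component names are hypothetical. Compile-time enforcement is not modeled.

```python
def build_edges(systems: dict[str, tuple[set[str], set[str]]]) -> dict[str, set[str]]:
    """Map each system to the systems that must run after it.

    systems: name -> (reads, writes), in declaration order (dicts keep insertion order).
    """
    names = list(systems)
    succ: dict[str, set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):
        reads_a, writes_a = systems[a]
        read_only_a = reads_a - writes_a
        for b in names[i + 1:]:
            reads_b, writes_b = systems[b]
            read_only_b = reads_b - writes_b
            if writes_a & writes_b or writes_a & read_only_b:
                succ[a].add(b)  # shared writer (declaration order) or a feeds b
            if writes_b & read_only_a:
                succ[b].add(a)  # b feeds a
    return succ


def kahn_batches(succ: dict[str, set[str]]) -> list[list[str]]:
    """Kahn's algorithm, grouped into levels; raises ValueError on a cycle."""
    indeg = {name: 0 for name in succ}
    for name in succ:
        for nxt in succ[name]:
            indeg[nxt] += 1
    ready = [name for name in succ if indeg[name] == 0]
    batches: list[list[str]] = []
    seen = 0
    while ready:
        batches.append(sorted(ready))
        seen += len(ready)
        next_ready = []
        for name in ready:
            for nxt in succ[name]:
                indeg[nxt] -= 1
                if indeg[nxt] == 0:
                    next_ready.append(nxt)
        ready = next_ready
    if seen != len(succ):
        blocked = sorted(n for n, d in indeg.items() if d > 0)
        raise ValueError(f"dependency cycle involving: {blocked}")
    return batches


SYSTEMS = {
    "Hunger":   (set(), {"Needs"}),
    "Movement": ({"Velocity"}, {"Position"}),
    "Mood":     ({"Needs"}, {"MoodState"}),
    "Render":   ({"Position", "MoodState"}, set()),
}
print(kahn_batches(build_edges(SYSTEMS)))
# [['Hunger', 'Movement'], ['Mood'], ['Render']]
```

Under this rule every read-write or write-write pair gets an edge, so no two systems inside one batch conflict. Each batch could therefore be dispatched to worker threads in parallel. A mutual dependency (A writes X and reads Y, while B writes Y and reads X) leaves nodes with non-zero in-degree and is reported as a cycle instead of being silently ordered.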
This repository is not a game release, a competitor to Bevy or Unity DOTS, or a claim that LLM pipelines can replace software engineers. It is also not a claim about generalizability beyond systems software with formal, machine-checkable contracts. The boundaries of applicability are recorded in docs/methodology/METHODOLOGY.md §6.
Dual Frontier targets modern GPU hardware with Vulkan 1.3 and async compute queue family support, per the K-L19 architectural commitment (docs/architecture/KERNEL_ARCHITECTURE.md Part 0, K-L19).
Minimum tier:
- NVIDIA: Turing or newer (GeForce GTX 1660 / RTX 20-series and later)
- AMD: RDNA 1 or newer (Radeon RX 5500 and later)
- Intel: Arc Alchemist or newer (Arc A380 and later)
- Integrated GPUs: most are NOT supported (they lack an async compute queue family)
Pre-Turing NVIDIA, pre-RDNA AMD, and pre-Arc Intel hardware will fail at startup with a clear diagnostic message. This is an intentional architectural choice supporting a clean implementation per K-L14 ("performance derives from architectural cleanliness"); it is not a hardware-discrimination decision. By the Dual Frontier release timeline, the target hardware tier is expected to represent the majority of gaming hardware.
Verification: launch Dual Frontier. If startup fails with HardwareCapabilityException, upgrade the GPU driver or hardware. Run vulkaninfo.exe (from the Vulkan SDK or GPU driver) to verify that the host hardware supports Vulkan 1.3 and a compute-capable queue family.
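The startup check described above reduces to two tests on data that Vulkan already reports:
- the device's apiVersion, from vkGetPhysicalDeviceProperties;
- the queueFlags of each queue family, from vkGetPhysicalDeviceQueueFamilyProperties.

The sketch below is minimal and works over plain values. DeviceInfo, find_async_compute_family, check_device, and the example queue layouts are hypothetical. Treating "async compute" as a family with VK_QUEUE_COMPUTE_BIT but without VK_QUEUE_GRAPHICS_BIT is an assumption about how the engine defines it.

```python
from dataclasses import dataclass
from typing import Optional

# Bit values from the Vulkan specification (VkQueueFlagBits).
VK_QUEUE_GRAPHICS_BIT = 0x1
VK_QUEUE_COMPUTE_BIT = 0x2


def vk_make_api_version(variant: int, major: int, minor: int, patch: int) -> int:
    """Python equivalent of the VK_MAKE_API_VERSION macro."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


VK_API_VERSION_1_3 = vk_make_api_version(0, 1, 3, 0)


class HardwareCapabilityException(Exception):
    """Name taken from the README; this Python class is only a stand-in."""


@dataclass(frozen=True)
class DeviceInfo:
    name: str
    api_version: int                    # VkPhysicalDeviceProperties.apiVersion
    queue_family_flags: tuple[int, ...]  # VkQueueFamilyProperties.queueFlags per family


def find_async_compute_family(queue_flags: tuple[int, ...]) -> Optional[int]:
    """Index of the first compute-capable family without graphics, else None (assumed definition)."""
    for index, flags in enumerate(queue_flags):
        if (flags & VK_QUEUE_COMPUTE_BIT) and not (flags & VK_QUEUE_GRAPHICS_BIT):
            return index
    return None


def check_device(dev: DeviceInfo) -> int:
    """Return the async compute family index or raise HardwareCapabilityException."""
    if dev.api_version < VK_API_VERSION_1_3:
        major = (dev.api_version >> 22) & 0x7F
        minor = (dev.api_version >> 12) & 0x3FF
        raise HardwareCapabilityException(
            f"{dev.name}: Vulkan {major}.{minor} reported, 1.3 required")
    family = find_async_compute_family(dev.queue_family_flags)
    if family is None:
        raise HardwareCapabilityException(f"{dev.name}: no async compute queue family")
    return family


# Hypothetical queue layout: graphics+compute+transfer, compute+transfer, transfer.
discrete = DeviceInfo("example-discrete", vk_make_api_version(0, 1, 3, 250), (0x7, 0x6, 0x4))
print(check_device(discrete))  # 1
```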
OS support: Windows 10/11 x64. Linux/macOS deferred per docs/architecture/VULKAN_SUBSTRATE.md L7 LOCKED.
- docs/methodology/METHODOLOGY.md — the methodology as designed: pipeline architecture, contracts as inter-agent IPC, verification cycle, threat model, boundaries of applicability.
- docs/architecture/MOD_OS_ARCHITECTURE.md — the capability-based mod isolation as an OS-style architecture; v1.0 LOCKED.
- docs/reports/NATIVE_CORE_EXPERIMENT.md — a measured negative result with explicit criterion reformulation.
The full documentation index is in docs/README.md. Source layout is described in docs/architecture/ARCHITECTURE.md; without it the assembly structure looks excessive.
This project is distributed under the PolyForm Noncommercial 1.0.0 license. Commercial use of the engine code requires a separate agreement.
- docs/methodology/METHODOLOGY.md: pipeline and methodology
- docs/methodology/CODING_STANDARDS.md: coding conventions
- docs/architecture/MOD_OS_ARCHITECTURE.md: modding architecture
- docs/architecture/VULKAN_SUBSTRATE.md: Vulkan substrate (V); rendering and compute use cases unified per Q-G-1 LOCK
- docs/architecture/KERNEL_ARCHITECTURE.md: native ECS kernel layer (K0-K8)
- docs/reports/CPP_KERNEL_BRANCH_REPORT.md: Discovery report (experimental branch)