Spruill-1
diff --git a/.context/resume.md b/.context/resume.md
Lines changed: 4 additions & 3 deletions
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 41 additions & 1161 deletions
diff --git a/docs/README.md b/docs/README.md
Lines changed: 59 additions & 0 deletions
diff --git a/docs/architecture/compute-analysis-pipeline.md b/docs/architecture/compute-analysis-pipeline.md
Lines changed: 54 additions & 0 deletions
diff --git a/docs/architecture/d2d-d3d11-hybrid-compute.md b/docs/architecture/d2d-d3d11-hybrid-compute.md
Lines changed: 167 additions & 0 deletions
diff --git a/docs/architecture/display-monitoring.md b/docs/architecture/display-monitoring.md
Lines changed: 34 additions & 0 deletions
@@ -11,7 +11,7 @@
 - **Language**: C++/WinRT — direct COM access to `ID2D1EffectImpl`, `ID2D1DrawTransform`, `ID2D1ComputeTransform`.
 - **Branch / repo state**: `main`, tagged `v1.5.0`. Working tree clean; the 4 stacked commits are Phase 8 prep + cleanup, not yet pushed.
 
-> Authoritative sources of truth: `README.md` (architecture + decision log, **63 entries**), `CHANGELOG.md` (per-version diffs, including `[Unreleased]` for the post-1.5 work), `Version.h` (numeric version), `.github/copilot-instructions.md` (AI agent rules). This file is a fast-orientation summary; it can drift — re-check the README before relying on details.
+> Authoritative sources of truth: [`docs/`](../docs/README.md) (architecture tree + per-file references) and especially [`docs/history/decision-log.md`](../docs/history/decision-log.md) (**63 entries**), `CHANGELOG.md` (per-version diffs, including `[Unreleased]` for the post-1.5 work), `Version.h` (numeric version), `.github/copilot-instructions.md` (AI agent rules). This file is a fast-orientation summary; it can drift — re-check the docs tree before relying on details.
 
 ---
 
@@ -199,9 +199,10 @@ ShaderLab\
 ├── app.manifest                    # DPI awareness, heap type
 ├── EngineExport.h / .cpp           # SHADERLAB_API + ABI version + ShaderLab_GetAbiVersion C export
 ├── Version.h                       # App 1.5.0, graph format 2
-├── README.md                       # Living architecture doc + decision log (63 entries)
+├── README.md                       # Slim repo intro + pointer to docs/
+├── docs/                           # Architecture tree (architecture / effects / ui-ux / hosts / development / history)
+├── docs/effects/new-effect-defaults.md  # D2D effect default-property reference
 ├── CHANGELOG.md                    # Version history
-├── NewEffectDefaults.md            # D2D effect default-property reference
 ├── Bootstrap.ps1                   # One-command fresh-clone setup
 │
 ├── pch.h / pch.cpp                 # App PCH
 
@@ -7,7 +7,7 @@ ShaderLab is a WinUI 3 desktop application (C++/WinRT) for developing, testing,
 ## Hard Rules
 
 - **C++/WinRT only** — never generate C# code. Direct COM access to `ID2D1EffectImpl`, `ID2D1DrawTransform`, `ID2D1ComputeTransform` is the reason this project exists.
-- **README.md is a living architecture doc** — update it with Mermaid diagrams and a decision log entry whenever a significant architectural decision is made.
+- **`docs/` is the living architecture documentation tree.** Update the relevant file under `docs/architecture/`, `docs/effects/`, etc. when a significant architectural change lands. Append a new entry to [`docs/history/decision-log.md`](../docs/history/decision-log.md) with a Mermaid diagram for any choice that future contributors will need to understand the *why* of. The repo-root `README.md` is intentionally slim — install + build + a pointer to `docs/` — and should not be expanded with technical detail.
 - **All new `.cpp` files must `#include "pch.h"` as the first include** — precompiled header is mandatory (`pch.h` aggregates WinRT, D2D, D3D, Win2D, DXGI, WIC, and STL headers).
 
 ## Build
 
@@ -0,0 +1,59 @@
+# ShaderLab Documentation
+
+The repo root [README.md](../README.md) covers identity, install, and getting-started. Everything below is the deeper technical reference, organized by audience.
+
+> **Looking for the changelog?** [`CHANGELOG.md`](../CHANGELOG.md) at the repo root tracks every version. The [decision log](history/decision-log.md) below tracks architectural choices with rationale (independent of release boundaries).
+
+---
+
+## Architecture
+
+Reference for "how does ShaderLab work under the hood".
+
+- [Architecture Overview](architecture/overview.md) — high-level diagram of the components (rendering, graph, effects, UI controllers, hosts).
+- [Pipeline Format Strategy](architecture/pipeline-format.md) — why the pipeline is always scRGB FP16 and how DWM/ACM handles the final display conversion.
+- [Effect Graph Model](architecture/effect-graph-model.md) — `EffectGraph` / `EffectNode` / `EffectEdge` / `PropertyValue`, JSON serialization, dirty tracking.
+- [Topological Evaluation](architecture/topological-evaluation.md) — Kahn's algorithm, evaluation order, cycle detection.
+- [Display Monitoring](architecture/display-monitoring.md) — DXGI adapter-change events, `WM_DISPLAYCHANGE`, ICC profile parsing, SDR white level.
+- [Display Profile Mocking](architecture/display-profile-mocking.md) — simulated SDR/HDR/WCG environments and the testing harness.
+- [Compute Shader Analysis Pipeline](architecture/compute-analysis-pipeline.md) — D2D compute conventions, CPU readback, analysis output schema.
+- [D2D / D3D11 Hybrid Compute System](architecture/d2d-d3d11-hybrid-compute.md) — `CustomComputeBridgeEffect`, `D3D11ComputeRunner`, GPU-binding routing, COM class hierarchy.
+- [Engine / Host Split](architecture/engine-host-split.md) — `ShaderLabEngine.dll` ABI, `IEngineCommandSink`, host event hooks.
+
+## Effects
+
+Reference for the effect catalog and per-effect mechanics.
+
+- [Built-in Effect Catalog](effects/builtin-catalog.md) — the ~35 ShaderLab effects (Analysis, Color, Source, Tone Mapping, Parameter).
+- [Effect Versioning System](effects/effect-versioning.md) — how `effectVersion` bumps are detected on graph load.
+- [Effect Designer](effects/effect-designer.md) — the modal window for authoring custom pixel/compute shaders.
+- [Numeric Expression Node (ExprTk)](effects/numeric-expression.md) — single-input math expression parameter node.
+- [Parameter Nodes](effects/parameter-nodes.md) — Float / Integer / Toggle / Gamut Parameter and Clock.
+- [Property Bindings (Data Pins)](effects/property-bindings.md) — wiring analysis fields to downstream parameters.
+- [Working Space Integration](effects/working-space.md) — host-driven node mirroring the active display profile.
+- [New Effect Defaults](effects/new-effect-defaults.md) — default property values for every newly added D2D effect.
+
+## UI / UX
+
+- [Graph Editor UX](ui-ux/graph-editor.md) — keyboard / mouse shortcuts, color coding, inline data display.
+- [Multi-Output Windows](ui-ux/multi-output-windows.md) — the secondary preview windows.
+- [Animation System](ui-ux/animation-system.md) — Clock-driven dirty propagation.
+- [Conditional Parameter Visibility](ui-ux/conditional-parameter-visibility.md) — `visibleWhen` expressions on parameters.
+
+## Hosts
+
+- [ShaderLabHeadless (Console Host)](hosts/headless.md) — the no-UI host for CI / batch scripting.
+- [MCP Server (AI Agent Integration)](hosts/mcp-server.md) — JSON-RPC 2.0 server, tool catalog, route distribution.
+
+## Development
+
+- [Build Instructions](development/build.md) — prerequisites, configurations, dependency map.
+- [Project Structure](development/project-structure.md) — full file tree with per-file descriptions.
+
+## History
+
+- [Decision Log](history/decision-log.md) — chronological architectural decisions with rationale (60+ entries).
+
+---
+
+Back to [Repo root](../README.md).
@@ -0,0 +1,54 @@
+# Compute Shader Analysis Pipeline
+
+Custom compute shaders can act as analysis effects, producing typed output fields that are read back to the CPU and can drive downstream effect properties via data bindings.
+
+## Analysis Output Types
+
+| Type | Pixels | Packing |
+|------|--------|---------|
+| `float` | 1 | `.x` used |
+| `float2` | 1 | `.xy` used |
+| `float3` | 1 | `.xyz` used |
+| `float4` | 1 | all 4 components |
+| `floatarray` | ceil(N/4) | 4 floats packed per pixel |
+| `float2array` | N | `.xy` per pixel |
+| `float3array` | N | `.xyz` per pixel |
+| `float4array` | N | all 4 per pixel |
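The packing rules in the table above can be sketched as a small host-side helper. This is a minimal illustration, not code from the ShaderLab sources: the function name `AnalysisFieldPixelCount` and the use of type-name strings are assumptions made for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical helper (not part of ShaderLab): number of output pixels an
// analysis field occupies, following the packing table.
// `count` is the array length N; it is ignored for non-array types.
inline std::size_t AnalysisFieldPixelCount(std::string_view type, std::size_t count)
{
    if (type == "float" || type == "float2" || type == "float3" || type == "float4")
        return 1;                    // one pixel, 1 to 4 components used
    if (type == "floatarray")
        return (count + 3) / 4;      // ceil(N / 4): four floats packed per pixel
    if (type == "float2array" || type == "float3array" || type == "float4array")
        return count;                // one pixel per array element
    throw std::invalid_argument("unknown analysis field type");
}
```

For example, a `floatarray` of 5 values needs two pixels, while a `float3array` of 5 values needs five.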
+
+## D2D Compute Shader Conventions
+
+D2D evaluates compute effects in **tiles**. Key conventions:
+
+- **`_TileOffset`** (int2): Auto-injected at cbuffer offset 0 in `CalculateThreadgroups`. Gives the tile's origin in the full image.
+- **`Source.GetDimensions()`**: Returns the full source image size (not tile size).
+- **`SampleLevel()`**: Sample the source with normalized UVs via `SampleLevel()`; `Load()` is not available in D2D compute shaders.
+- **`Output.GetDimensions()`**: Returns the tile size, not the full image.
+- **Constant buffer upload**: Done in `CalculateThreadgroups` (not `PrepareForRender`) for correct per-tile values.
+
+## Shader Pattern
+
+```hlsl
+cbuffer Constants : register(b0) {
+    int2 _TileOffset;  // Auto-injected per tile
+    // User parameters here...
+};
+Texture2D Source : register(t0);
+RWTexture2D<float4> Output : register(u0);
+SamplerState Sampler0 : register(s0);
+
+[numthreads(8, 8, 1)]
+void main(uint3 DTid : SV_DispatchThreadID) {
+    uint srcW, srcH;
+    Source.GetDimensions(srcW, srcH);
+    uint2 globalPos = DTid.xy + uint2(_TileOffset);
+    if (globalPos.x >= srcW || globalPos.y >= srcH) return;
+    
+    float2 uv = (float2(globalPos) + 0.5) / float2(srcW, srcH);
+    float4 color = Source.SampleLevel(Sampler0, uv, 0);
+    Output[DTid.xy] = color;
+}
+```
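The addressing arithmetic in the shader pattern can be checked on the CPU. The following is a sketch that mirrors the HLSL expressions in plain C++; the `Int2`/`Float2` structs and function names are illustrative assumptions, not ShaderLab types.

```cpp
#include <cstdint>

// Illustrative stand-ins for HLSL int2 / float2 (assumed for this sketch).
struct Int2 { std::int32_t x, y; };
struct Float2 { float x, y; };

// Mirrors `DTid.xy + _TileOffset`: tile-local thread ID to full-image pixel.
inline Int2 GlobalPixel(Int2 threadId, Int2 tileOffset)
{
    return { threadId.x + tileOffset.x, threadId.y + tileOffset.y };
}

// Mirrors the bounds check that makes out-of-range threads return early.
inline bool InsideSource(Int2 globalPos, Int2 sourceSize)
{
    return globalPos.x >= 0 && globalPos.y >= 0 &&
           globalPos.x < sourceSize.x && globalPos.y < sourceSize.y;
}

// Mirrors `(float2(globalPos) + 0.5) / float2(srcW, srcH)`:
// the normalized UV of the pixel center, as needed by SampleLevel().
inline Float2 PixelCenterUv(Int2 globalPos, Int2 sourceSize)
{
    return { (globalPos.x + 0.5f) / sourceSize.x,
             (globalPos.y + 0.5f) / sourceSize.y };
}
```

Note that the output write in the shader uses the tile-local `DTid.xy`, while sampling uses the global position; only the latter is modeled by `PixelCenterUv`.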
+
+
+---
+
+Back to [docs/](../README.md) • [Repo root](../../README.md)
@@ -0,0 +1,167 @@
+# D2D / D3D11 Hybrid Compute System
+
+## Problem
+
+D2D's custom compute shader API (`ID2D1ComputeTransform`) has fundamental limitations that prevent full-image reduction operations:
+
+| Limitation | Impact |
+|-----------|--------|
+| **Per-tile UAV clearing** | D2D clears the output `RWTexture2D<float4>` before each tile dispatch. Scatter writes don't accumulate across tiles. |
+| **No custom UAV binding** | `ID2D1ComputeInfo::SetResourceTexture` binds read-only `ID2D1ResourceTexture` (register t), not UAVs (register u). |
+| **No uint atomics on output** | The output UAV is `RWTexture2D<float4>`. `InterlockedMin`/`InterlockedMax`/`InterlockedAdd` require an integer-typed resource such as `RWBuffer<uint>`. |
+| **No input as D3D11 texture** | `PrepareForRender` doesn't expose the input image as a D3D11 surface. The effect context is deliberately isolated from the device. |
+
+The built-in `CLSID_D2D1Histogram` effect works around these via private D2D internals not exposed through the public API.
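To make the first limitation concrete, here is a toy CPU model of a whole-image sum. It is an assumption-laden sketch rather than D2D code: each "tile" is a contiguous slice of a 1-D pixel array, and a single accumulator stands in for the output UAV that gets cleared before every tile dispatch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model (not D2D): the accumulator is reset before each tile, so only
// the last tile's contribution survives.
inline float SumWithPerTileClear(const std::vector<float>& pixels, std::size_t tileSize)
{
    if (tileSize == 0) return 0.0f;
    float output = 0.0f;
    for (std::size_t start = 0; start < pixels.size(); start += tileSize) {
        output = 0.0f;  // models the per-tile UAV clear
        const std::size_t end = std::min(start + tileSize, pixels.size());
        for (std::size_t i = start; i < end; ++i)
            output += pixels[i];
    }
    return output;
}

// Same reduction with one dispatch over the whole image and no clearing.
inline float SumSingleDispatch(const std::vector<float>& pixels)
{
    float output = 0.0f;
    for (float p : pixels)
        output += p;
    return output;
}
```

With six pixels `{1, 2, 3, 4, 5, 6}` and a tile size of 2, the tiled version returns only the last tile's sum, which is why a full-image reduction needs a dispatch path that is not tiled.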
+
+## Solution: Evaluator-Owned D3D11 Dispatch
+
+The graph evaluator owns a **raw D3D11 compute dispatch path** that bypasses D2D's tiling entirely. D2D handles the effect graph wiring (input/output connections), while D3D11 handles the actual computation.
+
+## COM Class Hierarchy
+
+```mermaid
+classDiagram
+    class ID2D1EffectImpl {
+        <<interface>>
+        +Initialize(effectContext, transformGraph)
+        +PrepareForRender(changeType)
+        +SetGraph(transformGraph)
+    }
+
+    class ID2D1DrawTransform {
+        <<interface>>
+        +SetDrawInfo(drawInfo)
+        +MapInputRectsToOutputRect()
+        +MapOutputRectToInputRects()
+        +MapInvalidRect()
+        +GetInputCount()
+    }
+
+    class D3D11ComputeRunner {
+        <<RWStructuredBuffer<float4> path>>
+        -ID3D11ComputeShader* m_shader
+        -ID3D11Buffer* m_resultBuffer
+        +CompileShader(hlsl)
+        +Dispatch(input, cbuffer, resultCount) vector~float~
+    }
+
+    class CustomPixelShaderEffect {
+        <<PixelShader path>>
+        +LoadShaderBytecode()
+        +SetConstantBufferData()
+    }
+
+    class CustomComputeShaderEffect {
+        <<ComputeShader / D3D11ComputeShader paths>>
+        +SetThreadGroupSize()
+        +CalculateThreadgroups()
+    }
+
+    ID2D1EffectImpl <|.. CustomPixelShaderEffect
+    ID2D1DrawTransform <|.. CustomPixelShaderEffect
+    ID2D1EffectImpl <|.. CustomComputeShaderEffect
+
+    note for CustomComputeShaderEffect "D2D-tiled compute\nUAV cleared per tile\nNo atomics\n+ D3D11 hybrid mode dispatched\n by GraphEvaluator via D3D11ComputeRunner"
+```
+
+## Data Flow: D2D → D3D11 Handoff
+
+```mermaid
+flowchart TD
+    subgraph D2D_Graph["D2D Effect Graph (Evaluator)"]
+        SRC[Source Node<br/>ID2D1Image*] --> FX[Upstream Effect<br/>Gamut Map / Delta E / etc.]
+        FX --> CACHE["cachedOutput<br/>(deferred ID2D1Image*)"]
+    end
+
+    subgraph Realize["Realize to D3D11 Texture"]
+        CACHE --> CREATE["dc->CreateBitmap()<br/>DXGI_FORMAT_R32G32B32A32_FLOAT"]
+        CREATE --> DRAW["dc->SetTarget(bitmap)<br/>dc->DrawImage(cachedOutput)<br/>dc->SetTarget(prev)"]
+        DRAW --> FLUSH["dc->Flush()<br/>⚠ Required — D2D batches<br/>commands until Flush/EndDraw"]
+        FLUSH --> SURFACE["bitmap->GetSurface()<br/>→ IDXGISurface"]
+        SURFACE --> QI["surface->QueryInterface()<br/>→ ID3D11Texture2D"]
+    end
+
+    subgraph D3D11_Compute["D3D11 Compute Dispatch"]
+        QI --> SRV["CreateShaderResourceView()<br/>register(t0)"]
+        SRV --> CBUF["Update Constant Buffer<br/>(Width, Height, Channel, NonzeroOnly)"]
+        CBUF --> CLEAR["ClearUnorderedAccessViewUint()<br/>(reset result buffer)"]
+        CLEAR --> DISPATCH["ctx->Dispatch(1, 1, 1)<br/>32×32 = 1024 threads"]
+    end
+
+    subgraph GPU_Reduction["GPU Reduction (groupshared)"]
+        DISPATCH --> STRIDE["Each thread strides<br/>across entire image"]
+        STRIDE --> LOCAL["Per-thread accumulators<br/>min, max, sum, count"]
+        LOCAL --> SHARED["groupshared parallel reduction<br/>log2(1024) = 10 steps"]
+        SHARED --> WRITE["Thread 0 writes<br/>8 uints to RWBuffer"]
+    end
+
+    subgraph Readback["Result Readback (32 bytes)"]
+        WRITE --> COPY["CopyResource()<br/>→ staging buffer"]
+        COPY --> MAP["Map() + read 8 uints"]
+        MAP --> STATS["ImageStats struct<br/>min, max, mean, samples, nonzero"]
+        STATS --> ANALYSIS["node.analysisOutput.fields<br/>(data pins on graph)"]
+    end
+```
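The "GPU Reduction" stage of the diagram can be simulated on the CPU to sanity-check the step count. This is a sketch under stated assumptions: the `Stats` layout and function name are invented for the example, threads run sequentially instead of in parallel, and `threads` must be a power of two of at least 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative per-thread accumulator (layout assumed, not ShaderLab's).
struct Stats {
    float minValue;
    float maxValue;
    double sum;
    std::size_t count;
};

struct ReductionResult {
    Stats stats;
    int steps;  // number of tree-reduction steps performed
};

inline ReductionResult SimulateGroupReduction(const std::vector<float>& pixels,
                                              std::size_t threads = 1024)
{
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<Stats> shared(threads, Stats{ inf, -inf, 0.0, 0 });

    // Phase 1: each "thread" strides across the entire image.
    for (std::size_t t = 0; t < threads; ++t) {
        for (std::size_t i = t; i < pixels.size(); i += threads) {
            Stats& s = shared[t];
            s.minValue = std::min(s.minValue, pixels[i]);
            s.maxValue = std::max(s.maxValue, pixels[i]);
            s.sum += pixels[i];
            ++s.count;
        }
    }

    // Phase 2: groupshared-style tree reduction, halving the stride each step.
    int steps = 0;
    for (std::size_t stride = threads / 2; stride > 0; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            const Stats& other = shared[t + stride];
            shared[t].minValue = std::min(shared[t].minValue, other.minValue);
            shared[t].maxValue = std::max(shared[t].maxValue, other.maxValue);
            shared[t].sum += other.sum;
            shared[t].count += other.count;
        }
        ++steps;
    }
    return { shared[0], steps };  // "thread 0" holds the final result
}
```

With the default 1024 threads the loop runs for strides 512 down to 1, which matches the 10 steps noted in the diagram.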
+
+## Three Effect Types Compared
+
+| | D2D Pixel Shader | D2D Compute Shader | D3D11 Hybrid Compute |
+|---|---|---|---|
+| **COM class** | `CustomPixelShaderEffect` | `CustomComputeShaderEffect` (D2D-tiled mode) | `CustomComputeShaderEffect` (D3D11 mode) |
+| **D2D interface** | `ID2D1DrawTransform` | `ID2D1ComputeTransform` | `ID2D1DrawTransform` (pass-through) |
+| **Shader target** | `ps_5_0` | `cs_5_0` | `cs_5_0` (dispatched by host) |
+| **Execution** | D2D renders directly | D2D dispatches per-tile | Evaluator dispatches via D3D11 |
+| **Tiling** | D2D-managed | D2D-managed (UAV cleared) | **None** — single dispatch |
+| **Atomics** | N/A | No (float4 UAV only) | **Yes** (RWStructuredBuffer / RWBuffer) |
+| **groupshared** | N/A | Yes (per-tile only) | **Yes** (full image) |
+| **Shader linking** | Yes (D2D optimizes) | No | No |
+| **Image output** | Yes | Yes | Optional (pass-through or none) |
+| **Analysis output** | Via pixel readback | Via pixel readback | Via `RWStructuredBuffer<float4> Result` |
+| **`CustomShaderType`** | `PixelShader` | `ComputeShader` | `D3D11ComputeShader` |
+
+The `D3D11ComputeShader` mode is what powers Channel / Luminance / Chromaticity Statistics, the gamut analysis effects, and any user-authored "analyze the whole image" shader created via the Effect Designer. Internally it dispatches through `Rendering::D3D11ComputeRunner`.
+
+## Usage: ShaderLab Evaluator (Optimized Path)
+
+```cpp
+// In GraphEvaluator::ProcessDeferredCompute(), for D3D11ComputeShader nodes:
+
+// 1. Render upstream D2D output to FP32 bitmap
+winrt::com_ptr<ID2D1Bitmap1> gpuTarget;
+dc->CreateBitmap(D2D1::SizeU(w, h), nullptr, 0, fp32Props, gpuTarget.put());
+winrt::com_ptr<ID2D1Image> prevTarget;
+dc->GetTarget(prevTarget.put());
+dc->SetTarget(gpuTarget.get());
+dc->Clear(D2D1::ColorF(0, 0, 0, 0));
+dc->DrawImage(upstreamNode->cachedOutput);
+dc->SetTarget(prevTarget.get());
+
+// 2. Flush D2D command batch — CRITICAL for D2D→D3D11 handoff.
+//    D2D batches DrawImage commands until EndDraw() or Flush().
+//    Without this, D3D11 reads uninitialized zeros from the texture.
+dc->Flush();
+
+// 3. Get D3D11 texture (zero-copy — same DXGI surface)
+winrt::com_ptr<IDXGISurface> surface;
+gpuTarget->GetSurface(surface.put());
+winrt::com_ptr<ID3D11Texture2D> d3dTexture;
+surface->QueryInterface(d3dTexture.put());
+
+// 4. Dispatch GPU reduction (single call)
+auto stats = m_gpuReduction.Reduce(d3dCtx, d3dTexture.get(), channel, nonzeroOnly);
+
+// 5. Populate analysis output for graph data pins
+node->analysisOutput.fields = { {"Min", stats.min}, {"Max", stats.max}, ... };
+```
+
+## Known Limitations
+
+- **D2D→D3D11 flush required**: When rendering a D2D effect chain to a bitmap and then reading it with D3D11, `dc->Flush()` **must** be called between `DrawImage` and any D3D11 access to the underlying texture. D2D batches draw commands until `EndDraw()` or `Flush()` — without an explicit flush, D3D11 reads zeros from the texture. Applied in `DispatchUserD3D11Compute` in `GraphEvaluator`.
+- **D2D draw session required**: `ProcessDeferredCompute` must run inside an active `BeginDraw`/`EndDraw` session because `DispatchUserD3D11Compute` calls `dc->DrawImage` internally to pre-render the upstream chain into an FP32 bitmap. Outside a draw session, that `DrawImage` call silently no-ops and the compute shader reads a black input texture. The GUI's `RenderFrame`, the headless host's `runEval` / `RunRender`, and the test bench all wrap the call accordingly.
+- **No shader linking**: D3D11 compute shaders are opaque to D2D. They don't participate in D2D's shader linking optimization for chained pixel shader effects.
+- **Single thread group per dispatch**: `D3D11ComputeRunner` dispatches `(1,1,1)`, i.e. one group of 1024 threads. For images larger than ~33 megapixels (roughly 32K pixels per thread), a multi-dispatch pyramid would be needed.
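The per-thread workload of a single-group dispatch is simple arithmetic and can be estimated as below. This is a sketch, assuming the 32×32 group size from the data-flow diagram and an even strided split of pixels across threads; the names are hypothetical.

```cpp
#include <cstdint>

// 32 x 32 threads in the single dispatched group (from the diagram).
constexpr std::uint64_t kGroupThreads = 32 * 32;

// Upper bound on pixels visited by one thread when the whole image is
// strided across a single thread group.
constexpr std::uint64_t PixelsPerThread(std::uint64_t width,
                                        std::uint64_t height,
                                        std::uint64_t threads = kGroupThreads)
{
    return (width * height + threads - 1) / threads;  // ceil(pixels / threads)
}

static_assert(PixelsPerThread(8192, 4096) == 32768, "about 33.5 MP image");
static_assert(PixelsPerThread(3840, 2160) == 8100, "4K UHD image");
```

An 8192×4096 image (about 33.5 megapixels) therefore costs each thread 32,768 sequential reads, which gives a feel for where a single group stops scaling.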
+
+
+---
+
+Back to [docs/](../README.md) • [Repo root](../../README.md)
@@ -0,0 +1,34 @@
+# Display Monitoring
+
+```mermaid
+sequenceDiagram
+    participant App as ShaderLab
+    participant DXGI as DXGI Output
+    participant DM as DisplayMonitor
+    participant PF as PipelineFormat
+    participant SC as SwapChain
+
+    App->>DXGI: IDXGIOutput6::GetDesc1()
+    DXGI-->>App: DXGI_OUTPUT_DESC1
+    App->>DM: Initialize(hWnd)
+    DM->>DM: Register WM_DISPLAYCHANGE
+    DM->>DM: Register IDXGIFactory7::RegisterAdaptersChangedEvent
+
+    Note over DM: Display change detected
+    DM->>DXGI: Re-query IDXGIOutput6::GetDesc1()
+    DXGI-->>DM: Updated capabilities
+    DM->>PF: NotifyDisplayChanged(newCaps)
+    PF->>SC: Recreate with new format if needed
+    DM->>App: Update status bar
+```
+
+## SDR white level
+
+`DisplayCapabilities::sdrWhiteLevelNits` is queried from the OS via `DisplayConfigGetDeviceInfo(DISPLAYCONFIG_DEVICE_INFO_GET_SDR_WHITE_LEVEL)` and decoded as `nits = SDRWhiteLevel / 1000 * 80`, so a raw value of 1000 corresponds to 80 nits. This value tracks the user's **Settings → Display → HDR → "SDR content brightness"** slider when HDR is on; when HDR is off it falls back to 80 nits.
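The decoding step can be written as a one-line conversion. A minimal sketch: the function name is hypothetical, and the actual `DisplayConfigGetDeviceInfo` call is omitted so the snippet stays platform-independent.

```cpp
#include <cstdint>

// Converts the raw SDRWhiteLevel value reported by the OS to nits.
// The raw value is a multiplier of 80 nits scaled by 1000.
constexpr float SdrWhiteLevelToNits(std::uint32_t sdrWhiteLevel)
{
    return static_cast<float>(sdrWhiteLevel) / 1000.0f * 80.0f;
}
```

A raw value of 1000 maps to 80 nits and 2500 maps to 200 nits.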
+
+The value is exposed to graphs through the **`Working Space` parameter node** (see [Working Space Integration](../effects/working-space.md)) on its `SdrWhiteNits` analysis output. Effects that need to know the nit value of scRGB 1.0 (the entire ICtCp suite) consume it via property bindings: wire `working_space.SdrWhiteNits` into the effect's nit-target parameter and it tracks both the OS slider and any simulated `DisplayProfile` preset automatically. There is no longer any per-effect "follow the live monitor" or "follow the working space" host-side plumbing; the Working Space node is the single explicit path.
+
+
+---
+
+Back to [docs/](../README.md) • [Repo root](../../README.md)