Commit 0a7f9fc

Spruill-1 and Copilot committed
Bump version to 1.6.0
Phase 8 GPU-binding stack + bug-fix flurry. See CHANGELOG.md [1.6.0] for the full list.

Highlights:
* Compute outputs route directly to compute consumers as SRVs (no CPU readback round-trip)
* Disk-persistent bytecode cache with eager-precompile of GPU-binding variants
* CustomComputeBridgeEffect unifies D3D11 compute custom-effect discovery
* Gamut Coverage / Vectorscope / Waveform Monitor migrated to single-group D3D11 compute scatter
* Skip-readback default-on with per-frame CpuAnalysisInterest hints and 0.5 Hz throttle
* Split Comparison gets host-injected output dimensions to dodge D2D's atlas allocation behavior on multi-branch graphs
* CIE Histogram register(u0)->register(u1) UAV slot fix
* Properties panel skips its 4 Hz binding-value rebuild while a descendant control has focus

154/154 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent ff7cff0 commit 0a7f9fc

3 files changed

Lines changed: 17 additions & 4 deletions

CHANGELOG.md

Lines changed: 14 additions & 1 deletion
@@ -5,7 +5,11 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).
 
 ## [Unreleased]
 
-### Added
+## [1.6.0] - 2026-05-08
+
+The "Phase 8 GPU-binding" release. Compute outputs now route directly to downstream compute consumers as SRVs (no CPU readback round-trip), with a per-frame skip-readback policy that avoids `Map()` calls when no consumer needs CPU values. Disk-persistent bytecode cache, eager precompile of GPU-binding variants, and a new `CustomComputeBridgeEffect` D2D wrapper unify the discovery channel for D3D11 compute custom effects. Two slow analysis viewers (Gamut Coverage at 4K, plus Vectorscope/Waveform Monitor when present) migrated to single-group D3D11 compute scatter. Plus a flurry of bug fixes from real-world heavy-graph testing.
+
+### Added (since 1.5.0)
 
 - **Phase 8 GPU-binding rollout (decisions #62, #63 in README; v1.6 work-in-progress).** The full Phase 8 stack is now in place:
   - `Effects/IEngineComputeOutput.h` — engine-internal COM interface (IID `831B9291-CCAB-40A2-B0BA-E847F5B9FA6C`) with `GetAnalysisSrv` + `GetLastEvaluatedFrame`. Pure POD ABI, no STL across the COM boundary. Field layout stays in the graph data model on `EffectNode::analysisOutput.fields` so the metadata isn't duplicated.
@@ -32,6 +36,15 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).
 ### Fixed
 
 - **D3D11 compute analysis nodes silently produced zero stats in headless contexts.** `DispatchUserD3D11Compute`'s internal `dc->DrawImage` requires an active D2D draw session; without one, the compute shader read a black input texture and emitted Min/Max/Mean = 0 with the histogram floor at bin 0. Caught by the new `TestLuminanceStatistics` unit test and fixed by wrapping the headless host's `runEval` / `RunRender` and the test bench's `Evaluate` shim in `dc->BeginDraw/EndDraw`. The latent bug had probably been there since the D3D11 hybrid compute landed in v1.4-era; it never manifested because the GUI's `RenderFrame` always wraps correctly and the GUI was the only host running stats analysis.
+- **Split Comparison pivot off-center on multi-branch graphs.** D2D atlas-pads pixel-shader input intermediates to 4096×4096 when an upstream effect has multiple downstream consumers, so HLSL `Texture2D::GetDimensions()` returns the atlas size, not the output rect. The shader's `uv0 - (W*0.5, H*0.5)` recentering was pivoting at (2048, 2048) on a 3840×2160 video, putting the seam at the bottom edge. Fix: added `OutputW`/`OutputH` hidden cbuffer fields populated by the host from `GetImageLocalBounds(srcNode->cachedOutput)`. Generic mechanism — any pixel shader declaring `OutputW`/`OutputH` parameters gets host-injected dimensions.
+- **`D2D1_PIXEL_OPTIONS_TRIVIAL_SAMPLING` now passed on every `CustomPixelShaderEffect::SetPixelShader` call.** Disables D2D's intermediate-atlas allocation so `uv0`/`SV_POSITION` report true output coordinates. Contract: every ShaderLab pixel shader reads inputs at the same coord as the output (cross-texel sampling effects belong on the compute path).
+- **CIE Histogram produced an empty image output.** The shader declared `RWTexture2D<float4> Output : register(u0)`, but the runner binds the analysis structured buffer at `u0` and the image output at `u1`. Histogram writes were silently going to the (unused) analysis buffer. Fixed: `register(u0)` → `register(u1)`. Downstream CIE Chromaticity Plot now correctly shows the histogram scatter overlay.
+- **Properties panel was un-editable on Clock-driven graphs.** The 4 Hz binding-value refresh was unconditionally calling `UpdatePropertiesPanel()` (which `Clear()`s the entire control tree) whenever the selected node had any property binding and any graph node was dirty — both true continuously while a Clock plays. Mid-edit clicks lost focus on every 250 ms tick. Fix: walk the focused element's parent chain via `VisualTreeHelper::GetParent` and skip the periodic rebuild while any descendant of `PropertiesPanel` holds keyboard focus. User-driven rebuild paths (selection change, explicit property mutation) still rebuild immediately.
+- **Vectorscope and Waveform Monitor analysis viewers removed.** No clear use case after the migration to compute scatter; 318 lines net.
+- **Gamut Coverage GPU lockup at 4K.** Pixel-shader path was 65K-iter inner loop × full source resolution = ~543B ops on a 4K HDR source. Migrated to single-group D3D11 compute scatter (numthreads(32,32,1), dispatch (1,1,1)) — ~46× speedup.
+- **D2D HDR Tone Map post-PDC eval pass leaked compute dispatches into the next frame.** Caused two-frame flicker on tone-mapper output. Fixed via `SetDeferredComputeFrozen(bool)` flag gated around pass 3.
+- **Skip-readback compute consumers stopped re-dispatching when bindings hadn't changed.** Broadened force-redispatch condition: when the skip-readback flag is on, ALL compute nodes redispatch every frame regardless of dirty state (bindings can be CPU-throttled while keeping image-output texture fresh).
+- **`MapInputRectsToOutputRect` clamps to 4096×4096** so single-input pixel shaders with a hidden 1×1 dummy bitmap don't degenerate to a zero-size output rect.
 
 ## [1.5.0] - 2026-05-06
 
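Two figures quoted in the Fixed entries above can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not taken from the ShaderLab sources, "65K" is assumed to mean 65,536 iterations, and "4K" is assumed to mean the 3840×2160 frame mentioned in the Split Comparison entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for checking changelog figures; not ShaderLab code.

struct Float2 {
    float x;
    float y;
};

// Centre of a width x height rectangle, i.e. the pivot implied by a
// `uv0 - (W*0.5, H*0.5)` recentering in the shader.
constexpr Float2 RecenterPivot(float width, float height) {
    return Float2{width * 0.5f, height * 0.5f};
}

// How far down the visible frame a pivot row sits (0 = top, 1 = bottom).
inline float PivotRowFraction(float pivotY, float outputHeight) {
    assert(outputHeight > 0.0f);
    return pivotY / outputHeight;
}

// Total iterations when every pixel runs a fixed-length inner loop.
constexpr std::uint64_t PerPixelLoopOps(std::uint64_t width,
                                        std::uint64_t height,
                                        std::uint64_t itersPerPixel) {
    return width * height * itersPerPixel;
}

// Threads in one compute group declared as numthreads(x, y, z).
constexpr std::uint32_t ThreadsPerGroup(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return x * y * z;
}

// Atlas-sized input (4096x4096) pivots at (2048, 2048)...
static_assert(RecenterPivot(4096.0f, 4096.0f).x == 2048.0f, "atlas pivot x");
static_assert(RecenterPivot(4096.0f, 4096.0f).y == 2048.0f, "atlas pivot y");
// ...while the true 3840x2160 output rect pivots at (1920, 1080).
static_assert(RecenterPivot(3840.0f, 2160.0f).x == 1920.0f, "output pivot x");
static_assert(RecenterPivot(3840.0f, 2160.0f).y == 1080.0f, "output pivot y");
// 3840 * 2160 * 65536 iterations, in line with the "~543B ops" figure.
static_assert(PerPixelLoopOps(3840, 2160, 65536) == 543581798400ULL, "op count");
// numthreads(32,32,1) with dispatch (1,1,1) runs a single 1024-thread group.
static_assert(ThreadsPerGroup(32, 32, 1) == 1024, "group size");
```

With the atlas-derived pivot, row 2048 of a 2160-row frame lies roughly 95% of the way down, which is consistent with the seam appearing near the bottom edge before the host-injected `OutputW`/`OutputH` fix.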

Package.appxmanifest

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 <Identity
     Name="ShaderLab"
     Publisher="CN=ShaderLab"
-    Version="1.5.0.0" />
+    Version="1.6.0.0" />
 
 <mp:PhoneIdentity PhoneProductId="a1b2c3d4-e5f6-7890-abcd-ef1234567890" PhonePublisherId="00000000-0000-0000-0000-000000000000"/>
 
Version.h

Lines changed: 2 additions & 2 deletions
@@ -13,11 +13,11 @@
 namespace ShaderLab
 {
     constexpr uint32_t VersionMajor = 1;
-    constexpr uint32_t VersionMinor = 5;
+    constexpr uint32_t VersionMinor = 6;
     constexpr uint32_t VersionPatch = 0;
 
     // Human-readable version string.
-    constexpr const wchar_t* VersionString = L"1.5.0";
+    constexpr const wchar_t* VersionString = L"1.6.0";
 
     // Graph format version. Increment when serialization format changes.
     // Graphs saved with a higher format version cannot be loaded by older apps.
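
This commit updates the version in three places by hand: the changelog heading, the manifest's `Identity` `Version` attribute, and the constants in `Version.h`. As a rough sketch of how the numeric components and the string forms could be cross-checked, the example below copies the constants from the hunk above into a hypothetical `VersionSketch` namespace. `FormatVersion` and `FormatPackageVersion` are illustrative names that do not appear in the commit, and the fourth manifest field is assumed to stay at 0.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace VersionSketch  // hypothetical; mirrors the constants shown in the diff
{
    constexpr std::uint32_t VersionMajor = 1;
    constexpr std::uint32_t VersionMinor = 6;
    constexpr std::uint32_t VersionPatch = 0;
    constexpr const wchar_t* VersionString = L"1.6.0";

    // Build "major.minor.patch" from the numeric components.
    inline std::wstring FormatVersion(std::uint32_t major,
                                      std::uint32_t minor,
                                      std::uint32_t patch)
    {
        return std::to_wstring(major) + L"." +
               std::to_wstring(minor) + L"." +
               std::to_wstring(patch);
    }

    // Four-part form as written in the manifest, assuming the last field is 0.
    inline std::wstring FormatPackageVersion(std::uint32_t major,
                                             std::uint32_t minor,
                                             std::uint32_t patch)
    {
        return FormatVersion(major, minor, patch) + L".0";
    }

    // True when the hand-written string agrees with the numeric constants.
    inline bool VersionStringMatchesComponents()
    {
        return FormatVersion(VersionMajor, VersionMinor, VersionPatch) == VersionString;
    }
}
```

A check like `VersionStringMatchesComponents()` could run in a unit test so that a bump to only one of the constants is caught early; whether the project does anything similar is not stated in the commit.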
