Implement opt-in compute-based skin caching. by pcwalton · Pull Request #23255 · bevyengine/bevy

pcwalton · 2026-03-07T18:49:53Z

At the moment, Bevy performs skinning and morph target deformation in the vertex shader. This is not what most production engines do, for multiple reasons:

1. Skinning each mesh in the vertex shader results in redundant work being done during multipass rendering, for example for prepasses, multiview, and shadow map rasterization.
2. Building raytracing acceleration structures for skinned meshes requires the skinning to be done ahead of time, because the only transform that's allowed to be done when building an acceleration structure is a single fixed-function matrix multiply.

However, there is also an upside to skinning in the vertex shader as Bevy does: it results in decreased memory usage, especially when drawing multiple instances of a skinned mesh. In my tests, I've found that the performance of skinning in the vertex shader is extremely close to that of skinning in a compute shader, even when a prepass and four shadow cascades are being rendered. Because of this, the primary benefit of skin caching in practice, at least on high-end hardware, seems to be with regard to point (2): Solari can't render skinned meshes with Bevy as it currently stands.

This PR fixes the issue by implementing skin caching: ahead-of-time evaluation of skins and morph targets in a compute shader. To opt in to skin caching for a mesh with skins and/or morph targets, add the `CacheSkin` component to it. This causes Bevy to perform one compute shader dispatch per vertex slab in order to skin all the skinned meshes in that vertex slab at once.

Perhaps surprisingly, the skinning shader doesn't write the skinned vertices into standard vertex slabs and instead stores them in a different buffer. This is by design, because not all vertex attributes can be skinned: for instance, UVs are unaffected by skinning. It would be wasteful for memory consumption if UVs were duplicated into every instance of a skinned mesh. Therefore, Bevy stores only the data that's affected by skins and/or morph targets in a separate buffer. The vertex shader consults the original vertex slab for the data unaffected by skinning and the skin cache for the data affected by skinning. This is compatible with raytracing acceleration structure building.
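To put rough numbers on that trade-off, here is a minimal sketch in standalone Rust (not Bevy code). The attribute sizes and the function names `full_copy_bytes` and `split_cache_bytes` are assumptions made for illustration; the real layout depends on the vertex attributes each mesh actually has.

```rust
// Hypothetical per-vertex attribute sizes in bytes (assumed for illustration).
const POSITION_BYTES: u64 = 12; // vec3<f32>
const NORMAL_BYTES: u64 = 12; // vec3<f32>
const TANGENT_BYTES: u64 = 16; // vec4<f32>
const UV_BYTES: u64 = 8; // vec2<f32>, unaffected by skinning

/// Bytes used if every instance stored a full copy of each vertex, UVs included.
pub fn full_copy_bytes(vertex_count: u64, instance_count: u64) -> u64 {
    let per_vertex = POSITION_BYTES + NORMAL_BYTES + TANGENT_BYTES + UV_BYTES;
    per_vertex * vertex_count * instance_count
}

/// Bytes used if only the skinned attributes are stored per instance and the
/// UVs stay in the shared vertex slab, stored once.
pub fn split_cache_bytes(vertex_count: u64, instance_count: u64) -> u64 {
    let skinned_per_vertex = POSITION_BYTES + NORMAL_BYTES + TANGENT_BYTES;
    skinned_per_vertex * vertex_count * instance_count + UV_BYTES * vertex_count
}
```

Under these assumed sizes, the saving per instance is exactly the size of the attributes that skinning leaves untouched, so it grows with the number of instances and with the number of unskinned attributes on the mesh.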

The `many_foxes` example, alongside several others, has been updated with a new `--cache-skins` switch. Note that *this is a performance regression* on the `many_foxes` example in particular. That's because that demo consists entirely of many instances of the same mesh, and so memory usage is greatly increased with skin caching. Skin caching is a double-edged sword, and while it's needed for some use cases, like raytracing, it's undesirable for others.


greeble-dev

I've done a shallow review - the overall design and a lot of the details are above my pay grade.

There were a bunch of issues with WebGL even without skin caching enabled. I was able to get it mostly working by hacking out a few things - see various comments. There clearly needs to be some extra logic to handle this, but I wasn't sure where that logic should go. Should individual systems handle it or should SkinCachePlugin never register those systems in the first place? Does there need to be a global flag similar to skins_use_uniform_buffers?

Separately, previous skinned vertices seem broken except for the first instance. Tested on Win10/Nvidia, DX12 and Vulkan: cargo run --example many_foxes -- --count 6 --cache-skins --motion-blur.

Maybe worth noting that #21926 will collide with this PR - skin_cache.wgsl will need to handle compressed vertices. Or maybe the practical short-term step is to only allow skin caching if the mesh has an uncompressed layout, similar to Solari's is_mesh_raytracing_compatible?

greeble-dev · 2026-03-20T10:57:43Z

+/// plays an animation on a skinned glTF model of a fox
+#[derive(FromArgs, Resource)]
+struct Args {
+    /// enable skin caching
+    #[argh(switch)]
+    cache_skins: bool,
+}
+


I don't think skin caching options should be added to the animated_mesh and morph_targets examples. Their role is to be minimal examples of animation APIs.

greeble-dev · 2026-03-20T13:13:10Z

+                    &model,
+                    skin,
+                    prev_skin,
+                    Some(skin_cache_buffers),


The Some(skin_cache_buffers) should depend on whether skin caching is supported? I could only get WebGL working by hacking this to be None.

greeble-dev · 2026-03-20T13:44:02Z

+// If there are no morph targets, this will be a dummy buffer.
+@group(0) @binding(7) var<storage> morph_descriptors: array<MorphDescriptor>;
+
+@compute @workgroup_size(64, 1, 1)


Suggested change

@compute @workgroup_size(64, 1, 1)

// The workgroup size should match `SKIN_CACHE_WORKGROUP_SIZE`.

@compute @workgroup_size(64, 1, 1)

jasmine-nominal · 2026-03-24T17:18:44Z

+        // caching shader.
        match *self {
-            ElementClass::Vertex => BufferUsages::VERTEX,
+            ElementClass::Vertex => BufferUsages::VERTEX | BufferUsages::STORAGE,


This can hurt some GPUs iirc. Hence why I added extra_buffer_usages to MeshAllocator for Solari. Maybe we can set this automatically when SkinCache is used?

That would mean we would either have to:

1. Copy all the mesh data for every single mesh in the system whenever SkinCache is added to any mesh, and also when the last one is removed.

2. Put skin-cached meshes in different slabs, increasing drawcall count, increasing fragmentation, and regressing memory usage. And then have to duplicate the mesh data if the same mesh is used in a skin-cached instance and simultaneously in a non-skin-cached instance, which would break the assumption used throughout rendering that meshes and their mesh buffers are in 1:1.

3. Put skinned meshes in different slabs, increasing drawcall count, increasing fragmentation, and regressing memory usage, even if the skin cache isn't used.

I didn't think any of these 3 options were worth the cost unless we actually observe a major regression (and if we ever find such a GPU maybe we should just disable skin caching entirely on that GPU instead of trying to do fine-grained gymnastics on the buffers).

I think it should not be added unconditionally, as it may be slower on mobile GPUs.

If users need compute skinning, this usage can be added to extra_buffer_usages manually. We just need to emit an error or panic if it will fail to use cached skin.

The only realistic way I can see to do this would be to make the skin caching plugin not part of DefaultPlugins, so you have to add it manually. Does that work for people?
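One way to picture the opt-in approach discussed in this thread is the sketch below. It is standalone Rust with a hypothetical `Usages` type standing in for flags like `BufferUsages::VERTEX` and `BufferUsages::STORAGE`; the bit values and the function names `vertex_slab_usages` and `check_can_cache_skins` are assumptions, not Bevy or wgpu API.

```rust
/// Hypothetical stand-in for a buffer usage bitset. Bit values are illustrative.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Usages(u32);

impl Usages {
    pub const VERTEX: Usages = Usages(1 << 0);
    pub const STORAGE: Usages = Usages(1 << 1);

    /// Returns the combination of both usage sets.
    pub const fn union(self, other: Usages) -> Usages {
        Usages(self.0 | other.0)
    }

    /// Returns true if every bit in `other` is also set in `self`.
    pub const fn contains(self, other: Usages) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Only request STORAGE on vertex slabs when skin caching was opted into,
/// so GPUs that dislike the extra usage are unaffected by default.
pub fn vertex_slab_usages(skin_cache_enabled: bool) -> Usages {
    if skin_cache_enabled {
        Usages::VERTEX.union(Usages::STORAGE)
    } else {
        Usages::VERTEX
    }
}

/// Report an error instead of silently failing when a skin-cached mesh lands
/// in a slab the compute shader cannot read.
pub fn check_can_cache_skins(slab_usages: Usages) -> Result<(), String> {
    if slab_usages.contains(Usages::STORAGE) {
        Ok(())
    } else {
        Err("skin caching requires STORAGE usage on vertex slabs".to_string())
    }
}
```

The design choice here is that the decision is made once, globally, before any slab is allocated, which avoids the per-mesh copying and slab splitting listed above.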

jasmine-nominal · 2026-03-24T17:20:17Z

 }

 #[derive(ShaderType, Clone)]
 pub struct MeshUniform {


Shame to make this even more expensive :/.

We might want to consider splitting this up in the future.

I was wondering yesterday if we could pack the new offsets into current_skin_index and morph_descriptor_index, capping everything at 65k. It's probably a bit too limiting though. Maybe we could push this new data into something indexed by current_skin_index?

MeshUniform (as opposed to MeshInputUniform) is solely read and written on GPU in the GPU driven path. Mesh preprocessing is never a bottleneck in the profiles I've seen. The GPU's enormous memory bandwidth eats it up.
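For reference, the packing idea floated above would look roughly like this. It is a standalone sketch, not code from the PR; `pack_u16_pair` and `unpack_u16_pair` are hypothetical names, and the 65k cap mentioned in the comment shows up as the `None` case.

```rust
/// Packs two values into one 32-bit word, 16 bits each.
/// Returns `None` if either value does not fit in 16 bits (the "65k" cap).
pub fn pack_u16_pair(low: u32, high: u32) -> Option<u32> {
    let max = u16::MAX as u32;
    if low > max || high > max {
        return None;
    }
    Some(low | (high << 16))
}

/// Recovers the two 16-bit values from a packed word as `(low, high)`.
pub fn unpack_u16_pair(packed: u32) -> (u32, u32) {
    (packed & 0xFFFF, packed >> 16)
}
```

The saving is one 32-bit word per packed pair, at the cost of capping each value at 65,535, which is the limitation the comment is worried about.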

atlv24

I'm gonna give this a round of testing and then approve I think.

atlv24 · 2026-03-24T17:27:05Z

 }

 #[derive(ShaderType, Clone)]
 pub struct MeshUniform {


I was wondering yesterday if we could pack the new offsets into current_skin_index and morph_descriptor_index, capping everything at 65k. It's probably a bit too limiting though. Maybe we could push this new data into something indexed by current_skin_index?

atlv24 · 2026-03-24T17:32:04Z

    }
 }
+
+/// Adds `CacheSkin` components to skinned meshes if skin caching was requested


Suggested change

/// Adds `CacheSkin` components to skinned meshes if skin caching was requested

/// Adds [`CacheSkin`] components to skinned meshes if skin caching was requested

Do you have an opinion as to whether we should have the --cache-skins option at all? @greeble-dev objected to it. I don't have an opinion, I just want to make sure my reviewers are in agreement before I decide what to do.

Just to clarify, I think adding it to the animated_mesh and morph_targets examples is bad. Adding it to many_foxes and other stress tests is good.

Addressed comment

atlv24 · 2026-03-24T20:46:39Z

+    /// The vertex slab ID.
+    pub vertex_slab_id: SlabId,


can you avoid writing tautological comments like this please? they're just noise

I made the comment longer so that it adds information that's maybe not obvious from the name.

atlv24 · 2026-03-24T20:48:14Z

+    /// The layout for the bind group.
+    bind_group_layout: &'a BindGroupLayout,


I made the comment longer so it doesn't just restate things that are in the names.

atlv24

I tested this and it looks like a regression in many_foxes, both with and without --release. Frame time is 3.4-4.1ms without skin caching, and 4.3-5.3ms with, on my RTX 5090. But at least without --cache-skins, performance matches main, so you can just not turn it on if it's slower. Edit: this is noted in the PR description

pcwalton · 2026-03-25T22:28:21Z

Yep, skin caching is basically the worst case for many_foxes. Honestly, I feel like the biggest win for skin caching in general is going to be the compatibility with Solari more than anything performance related. I think it should probably be off by default.

greeble-dev · 2026-03-26T08:14:22Z

In case there's any doubt about the utility of skin caching, I'm just gonna note that it sets the groundwork for other features and optimisations in addition to ray-tracing - so anything that wants to run compute on a whole vertex buffer rather than per-vertex. High-end face animation wants it for tangent recalcs and sparse morphs. Physics wants it for GPU cloth sims and deformations. High-end rigging wants it for muscle sims. Lots of fun stuff.

pcwalton · 2026-04-13T00:44:26Z

Fixed motion blur. The problem was that we didn't copy over prev_cached_skin_offset in mesh_preprocess.wgsl.

pcwalton · 2026-04-19T06:40:11Z

@atlv24 Regarding your comment about MeshUniform, as I mentioned before, I think its size doesn't matter much as it's entirely read and written on GPU which has enormous memory bandwidth. I don't really want to make a whole separate buffer and add extra indexing to it just to save a word or two on GPU, especially since I think we're already coming close to WebGPU storage limits.

pcwalton · 2026-04-19T06:54:54Z

OK, I've addressed or responded to every review comment. WebGL 2 is currently broken on main, so I can't test it, but I verified that it isn't broken any worse than it already is. This patch is now properly gated off when we're using WebGL 2, so I don't anticipate any problems.

greeble-dev

Added a few minor comments.

greeble-dev · 2026-04-19T08:12:42Z

+/// References to the buffers that contain cached skinned vertices for the mesh
+/// instances corresponding to a vertex/morph target slab pair.
+#[derive(Clone, Copy)]
+pub struct SkinCacheBuffers<'a> {


I found it confusing to have SkinCacheBuffers when there's also CachedSkinBuffers. Not sure what a good rename would be - maybe SkinCacheBuffers -> CachedSkinBuffersForBindGroup?

I'll rename CachedSkinBuffers to GlobalSkinCacheBuffers.

beicause · 2026-04-20T01:09:51Z

I left a comment in #23255 (comment)


greeble-dev

Did a fuller review - got some nitpicky/speculative/non-blocking comments, and one case where I think a binary search isn't quite right.

greeble-dev · 2026-04-20T10:03:05Z

    >,
    skinned_mesh_inverse_bindposes: &Assets<SkinnedMeshInverseBindposes>,
    joints: &Query<&GlobalTransform>,
+    removed_cache_skin_query: &mut RemovedComponents<CacheSkin>,


I found that skins would break if I removed the CacheSkin component after the mesh had been rendered at least once. Smelt like stale specialization as I could see that no specializations occurred after the remove.

I tried changing the add_or_delete_skins within extract_skins to get the DirtySpecializations resource and do dirty_specialization.changed_rendered.insert(skinned_mesh_entity), but this alone did not fix it. The system order is extract_skins -> clear_dirty_specializations -> queue_material_meshes, so the dirty entry gets cleared before it's processed by queue_material_meshes.

I tried slapping in an extract_skins.after(DirtySpecializationSystems::CheckForRemovals). That did fix the issue, but doesn't feel like the right solution. Seems more like clear_dirty_specializations should be at the very beginning of extraction, but I'm not familiar enough with ECS scheduling to suggest a change.

In case it's useful, here's a commit with the probably wrong fix + debugging hacks: greeble-dev@690408a

I don't think this bug should be a blocker. Reasoning:

Removing the CacheSkin component is a niche case.

Skin caching is a new opt-in feature and arguably experimental.

I suspect there are similar specialization bugs with skinned meshes that will be fixed by adding dirties to extract_skins, so the problem might not be specific to skin caching.

greeble-dev · 2026-04-20T12:06:30Z

+    );
+#endif  // VERTEX_NORMALS
+
+    // Skin the mikktspace tangent of the vertex, if applicable.


Suggested change

// Skin the mikktspace tangent of the vertex, if applicable.

// Skin the tangent of the vertex, if applicable.

Nitpicky, but there's no guarantee that the mesh is mikktspace.

greeble-dev · 2026-04-20T12:37:19Z

+        }
+
+        // Do this check because we don't want to read past the end of the buffer.
+        if (skin_task_mid == arrayLength(&skin_tasks)) {


Suggested change

if (skin_task_mid == arrayLength(&skin_tasks)) {

if (skin_task_mid == (arrayLength(&skin_tasks) - 1)) {

I think this is needed? If skin_task_mid == arrayLength then the next statement's skin_tasks[skin_task_mid + 1u] will read out of bounds?
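The bounds concern can be illustrated with a CPU-side sketch of the same kind of search. This is standalone Rust, not the shader from the PR: it assumes a hypothetical sorted array `task_starts` holding the first vertex index of each task, and looks for the task whose range contains a given vertex. The point is that `mid + 1` is only read after checking that `mid` is not the last index.

```rust
/// Finds the index of the task whose range contains `vertex`, assuming
/// `task_starts` is sorted ascending and task `i` covers
/// `task_starts[i] .. task_starts[i + 1]` (the last task is open-ended).
pub fn find_task(task_starts: &[u32], vertex: u32) -> Option<usize> {
    if task_starts.is_empty() || vertex < task_starts[0] {
        return None;
    }
    let last = task_starts.len() - 1;
    let (mut lo, mut hi) = (0usize, last);
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        if task_starts[mid] > vertex {
            // `mid` is at least 1 here because `task_starts[0] <= vertex`.
            hi = mid - 1;
        } else if mid == last || task_starts[mid + 1] > vertex {
            // Compare against the LAST INDEX, not the length, before
            // touching `mid + 1`; otherwise the read goes out of bounds.
            return Some(mid);
        } else {
            lo = mid + 1;
        }
    }
    None
}
```

With starts of `[0, 10, 25]`, vertex 24 maps to task 1 and any vertex from 25 upward maps to task 2 without reading past the end of the slice.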

greeble-dev · 2026-04-20T18:11:57Z

+    sorted_cached_skin_entities.sort_unstable_by_key(|main_entity| match render_mesh_instances_gpu
+        .get(main_entity)
+    {
+        Some(render_mesh_instance) => render_mesh_instance.gpu_specific.current_uniform_index(),
+        None => u32::MAX,
+    });


I was a bit worried about this per-entity sort, particularly with a hash lookup in the sort function. Although in many_foxes with 1000 meshes, the cost is dwarfed by the joint animation and transform propagation costs:

Animation+propagation = ~23ms (total core time, not end to end)

prepare_skin_cache_buffer = 0.13ms, of which sorting is 0.054ms.

I do wonder if it would be better to make skin caching a property of the Mesh rather than per-instance. From the content side, I'm struggling to think of a case where I'd want different instances of the same mesh to make different skin cache choices - so it's arguably simpler and more robust to choose per-asset and not have to worry about getting the component right per-instance. And from the render side, making the caching choice per-asset seems like it would simplify a few things. So overall it could be both a performance and UX win.

That said, I don't feel strongly either way and don't think it should block this PR, but maybe worth exploring later.

(EDIT: Should clarify - I don't think making it per-Mesh would have a direct impact on that particular sort since the goal is to find the mesh instance, not the mesh. But making things coarser grained would probably be a win somewhere.)

(EDIT: Separately, would suggest adding a comment to the sort explaining why it's there - if I hadn't seen the compute shader first then I would have been confused.)
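For readers unfamiliar with the pattern under discussion, here is a reduced, standalone version of that sort. Entities are plain `u64` IDs and the render mesh instance table is a `HashMap`; both are simplifications assumed for this sketch, and `sort_by_uniform_index` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Sorts entities by their uniform index so that later lookups can binary
/// search on it. Entities with no entry sort to the end via `u32::MAX`.
pub fn sort_by_uniform_index(entities: &mut [u64], uniform_indices: &HashMap<u64, u32>) {
    entities.sort_unstable_by_key(|entity| {
        uniform_indices.get(entity).copied().unwrap_or(u32::MAX)
    });
}
```

The key closure runs a hash lookup on every comparison. If that ever showed up in profiles, the standard library's `sort_by_cached_key` computes each key only once, at the cost of a temporary allocation and a stable rather than unstable sort.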

greeble-dev · 2026-04-21T00:59:02Z

Oops, forgot one thing - hit a limit which I think means the compute dispatch has to be broken up into chunks? Tested on Win10/Nvidia, Vulkan and DX12.

cargo run --example many_foxes --profile bench --features "debug" -- --count 3000 --cache-skins

ERROR RenderContextState::apply{system=bevy_pbr::render::skin::cache::skin_cache}: bevy_render::error_handler: Caught rendering error: Validation Error

Caused by:
  In a CommandEncoder
    In a dispatch command, indirect:false
      Each current dispatch group size dimension ([81000, 1, 1]) must be less or equal to 65535
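One possible shape for the chunking mentioned above is sketched here in standalone Rust. The 65,535 limit comes from the validation error and the workgroup size of 64 from the shader attribute quoted earlier; the function names and the idea of passing each chunk's first workgroup to the shader as a base offset are assumptions, not what the PR does.

```rust
/// Per-dimension dispatch limit reported by the validation error above.
pub const MAX_WORKGROUPS_PER_DIMENSION: u32 = 65_535;
/// Workgroup size used by the skin caching shader (`@workgroup_size(64, 1, 1)`).
pub const WORKGROUP_SIZE: u32 = 64;

/// Number of workgroups needed to cover `item_count` items, rounding up.
pub fn workgroups_for(item_count: u32) -> u32 {
    item_count.div_ceil(WORKGROUP_SIZE)
}

/// Splits one oversized 1D dispatch into `(first_workgroup, workgroup_count)`
/// chunks that each respect the per-dimension limit.
pub fn dispatch_chunks(total_workgroups: u32) -> Vec<(u32, u32)> {
    let mut chunks = Vec::new();
    let mut first = 0;
    while first < total_workgroups {
        let count = (total_workgroups - first).min(MAX_WORKGROUPS_PER_DIMENSION);
        chunks.push((first, count));
        first += count;
    }
    chunks
}
```

For the failing case of 81,000 workgroups this yields two dispatches. An alternative would be a single 2D dispatch with the shader flattening the workgroup ID, which avoids extra dispatch calls but needs a bounds check for the unused tail.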

pcwalton requested review from JMS55, atlv24 and tychedelia March 7, 2026 18:50

pcwalton added the A-Rendering Drawing game state to the screen label Mar 7, 2026

github-project-automation bot added this to Rendering Mar 7, 2026

github-project-automation bot moved this to Needs SME Triage in Rendering Mar 7, 2026

pcwalton added the A-Animation Make things move and change over time label Mar 7, 2026

github-project-automation bot added this to Animation Mar 7, 2026

github-project-automation bot moved this to Needs SME Triage in Animation Mar 7, 2026

pcwalton added S-Needs-Review Needs reviewer attention (from anyone!) to move forward C-Feature A new feature, making something new possible C-Performance A change motivated by improving speed, memory usage or compile times labels Mar 7, 2026

pcwalton force-pushed the skin-caching branch from e9066d7 to 44aa9b3 Compare March 7, 2026 18:52

pcwalton added 4 commits March 7, 2026 13:12

Ambiguity police

7fc1cf6

Merge remote-tracking branch 'origin/main' into skin-caching

a599ee8

Merge remote-tracking branch 'origin/main' into skin-caching

b23524f

Ambiguity police

a8bea23

greeble-dev reviewed Mar 20, 2026

jasmine-nominal suggested changes Mar 24, 2026

atlv24 reviewed Mar 24, 2026

Comment thread crates/bevy_pbr/src/render/mesh.rs Outdated

atlv24 reviewed Mar 24, 2026

Comment thread crates/bevy_pbr/src/render/mesh.rs Outdated

atlv24 reviewed Mar 24, 2026

Comment thread crates/bevy_pbr/src/render/skin/cache.rs Outdated

atlv24 reviewed Mar 24, 2026

atlv24 reviewed Mar 25, 2026

atlv24 approved these changes Mar 25, 2026

Merge remote-tracking branch 'origin/main' into skin-caching

ac7e695

pcwalton added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Apr 9, 2026

pcwalton added 3 commits April 12, 2026 16:52

Merge remote-tracking branch 'origin/main' into skin-caching

2f27a54

Merge remote-tracking branch 'origin/main' into skin-caching

8744345

Copy over prev_cached_skin_offset

7479f40

pcwalton added 5 commits April 12, 2026 17:46

Address some review comments

4cd9101

Reword comment

d3f9fa8

Partially fix WebGL 2

4e60beb

Merge remote-tracking branch 'origin/main' into skin-caching

8b37bcf

Merge remote-tracking branch 'origin/main' into skin-caching

0cbd7e4

pcwalton added 2 commits April 18, 2026 23:42

Address review comment

36d60b4

Address review comments

ffde1c1

pcwalton added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Apr 19, 2026

greeble-dev reviewed Apr 19, 2026

pcwalton added 3 commits April 19, 2026 14:44

Address review comments

9a6e180

Merge remote-tracking branch 'origin/main' into skin-caching

b2c13d7

Fix LAST_FLAG definition

ca88f07

pcwalton requested a review from greeble-dev April 20, 2026 00:49

Rustfmt police

d2c0f7a

greeble-dev reviewed Apr 20, 2026
