[rlsw] RenderTexture support#5655

Merged
raysan5 merged 19 commits into raysan5:master from Bigfoot71:rlsw-surface
Mar 17, 2026
@Bigfoot71
Contributor

Rlsw refactor to support RenderTexture

This draft starts the rework of rlsw's internal framebuffer system in order to support RenderTexture. The goal is to allow offscreen rendering via framebuffers with a single color attachment and an optional depth buffer.

What has been done so far:

  • Expanded texture format support to cover packed color formats (R3G3B2, R5G6B5, R4G4B4A4, R5G5B5A1) as well as depth formats.
  • Textures are no longer pre-converted to RGBA8, so each texel read now goes through a format switch.
  • The internal framebuffer now relies on the existing texture system rather than its own pixel storage.
  • Framebuffers now support all texture formats already available in the library.
  • The 24-bit depth format has been removed as it is no longer needed.
  • Framebuffer format is still defined at compile time.
  • The allocated texture size is now preserved across resizes, avoiding frequent reallocations when resizing the framebuffer and enabling possible support of glTexSubImage2D.
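
The last point in the list above, keeping the allocation across resizes, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the rlsw code: `SketchTexture`, its `capacity` field and `sketch_texture_resize()` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical texture record: 'capacity' is the number of bytes actually
// allocated, which may exceed what the current width/height require.
typedef struct {
    uint8_t *pixels;
    int width, height;
    int bytesPerPixel;
    size_t capacity;
} SketchTexture;

// Change the logical size; reallocate only when the new size does not fit
// in the storage already owned. Returns false on allocation failure and
// leaves the texture untouched in that case.
static bool sketch_texture_resize(SketchTexture *tex, int width, int height)
{
    assert(tex != NULL && width > 0 && height > 0);

    size_t needed = (size_t)width*(size_t)height*(size_t)tex->bytesPerPixel;
    if (needed > tex->capacity) {
        uint8_t *grown = realloc(tex->pixels, needed);
        if (grown == NULL) return false;
        tex->pixels = grown;
        tex->capacity = needed;
    }

    tex->width = width;
    tex->height = height;
    return true;
}
```

Shrinking and then growing back to a previous size never touches the allocator in this sketch, which is the behaviour the bullet describes for repeated framebuffer resizes.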

The FBO API is the only missing piece and doesn't require much more, but I can see quite a few improvement opportunities along the way; I'll document them here as I go.

I was considering restricting framebuffers to formats matching those defined at compile time, but adding full runtime format support should be fairly straightforward, at the cost of a switch per pixel read/write. It could also be a compile-time option. I'll let @raysan5 decide on this.

I'll also run a performance comparison on desktop between the old and new code to get an idea of the overall impact. So far it seems fairly minimal on my end in debug builds, but things are still a bit rough at this stage.

Added support for `R3G3B2`, `R5G6B5`, `R4G4B4A4` and `R5G5B5A1`
Added depth formats
- Framebuffers can now use all texture types that are already available.
- The 24-bit depth format has been removed as it is no longer needed.
- Framebuffer formats are still defined at compile time.
- The allocated texture size is now preserved, which avoids frequent reallocations when resizing framebuffers and will allow the use of `glTexSubImage2D`.
This greatly simplifies the framebuffer blit/copy logic while now supporting all pixel formats. It is slightly slower in debug builds, but this path is mainly kept for compatibility anyway. The `copy_fast` version is still used for the "normal" cases when presenting to the screen.
less ops for certain formats + fixes
I made the pointer parameters `restrict` for reading/writing textures, which resulted in a slight improvement.
And I reviewed the `static inline` statements, which could potentially bias the compiler; no difference, but it's cleaner.
@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 15, 2026

I managed to get a fairly clean base. I'm going to start adding the FBO API, and the changes I made even gave me a 10 FPS gain on the models_first_person_maze example, that's a really nice bonus!

I also ran some tests and got a bit more performance in a few other areas, especially color blending. But since that's outside the scope of this PR I'll move it to another one.

Also, I had written a comment about the conversion between RGBA8 and RGBA32. That has been resolved in the meantime, so I reintroduced the SIMD functions. The performance gain seems smaller than before, but it's still significant.

Just as a note, I also fixed a few UB issues in the SIMD functions. I also removed the clamp in the RISC-V version of 'set_color' and added a comment noting that this version currently does more than necessary, but I'd rather not touch it any further.

will allow management of both textures and framebuffers
added support for `glTexSubImage2D`
added handling of 'GL_OUT_OF_MEMORY' errors
removed the default internal texture (unused)
@Bigfoot71
Contributor Author

I took the opportunity to add support for glTexSubImage2D and was able to test the textures_image_processing example, which now works correctly.

@raysan5
Owner

raysan5 commented Mar 15, 2026

@Bigfoot71 It looks amazing!

I was considering restricting framebuffers to formats matching those defined at compile time, but adding full runtime format support should be fairly straightforward, at the cost of a switch per pixel read/write. It could also be a compile-time option. I'll let @raysan5 decide on this.

I think a compile option can be added as a fast path; a switch per pixel read/write can mean a performance hit.

I'll also run a performance comparison on desktop between the old and new code to get an idea of the overall impact. So far it seems fairly minimal on my end in debug builds, but things are still a bit rough at this stage.

Definitely, it would be nice to get some numbers in that regard.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 15, 2026

I think a compile option can be added as a fast path; a switch per pixel read/write can mean a performance hit.

Yes, that's what I've been doing so far for the framebuffer.

For textures there's still one switch per read. Avoiding it would require further specialization of the rasterization functions, but keeping that in a single header quickly becomes unmanageable.

Just an idea: we could implement the rasterization functions in separate files and include them multiple times with different #define configurations. That would allow deeper specialization and platform-specific optimizations. The downside is that it would no longer be a single header...

@raysan5
Owner

raysan5 commented Mar 15, 2026

@Bigfoot71 I'd prefer to keep it as a single-file, it's more self-contained and portable. I saw the multiple functions split, still, it looks simple and readable.

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 17, 2026

Framebuffers are officially supported!

Enregistrement.d.ecran_20260317_010651.webm

Well, there's a lot to say, sorry in advance for the length.

New supported functions

The following rlgl functions are now supported with rlsw:

  • rlLoadFramebuffer()
  • rlFramebufferAttach()
  • rlFramebufferComplete()
  • rlUnloadFramebuffer()
  • rlEnableFramebuffer()
  • rlGetActiveFramebuffer()
  • rlDisableFramebuffer()
  • rlBindFramebuffer()
  • rlLoadTextureDepth()

rlBlitFramebuffer is not yet supported, let me know if that's really needed.

How it works

The API only supports the equivalent of the GL_FRAMEBUFFER target, with a single color attachment at GL_COLOR_ATTACHMENT0 (required) and an optional depth buffer. If no depth buffer is attached, depth testing is implicitly disabled.

If the bound framebuffer is incomplete, rendering is simply skipped.

Renderbuffer binding has been added; note that returned IDs are actually texture IDs
internally, which keeps the rlgl side simple.

Only the framebuffer format defined at compile time on the rlsw side is accepted as a
valid attachment; anything else will be treated as incomplete.

Default formats have been updated to R8G8B8A8 and D32 to match what rlgl expects
by default. Note that GL_DEPTH_COMPONENT24 is considered valid and maps to D32 internally.
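
The rules in this section can be summarized in a small sketch. Everything here is hypothetical (`SketchFramebuffer`, `SKETCH_COLOR_FORMAT`, `sketch_framebuffer_complete()` and so on); it only restates the behaviour described above and makes no claim about how rlsw implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SKETCH_FMT_R5G6B5,
    SKETCH_FMT_R8G8B8A8,
    SKETCH_FMT_D16,
    SKETCH_FMT_D32
} SketchPixelFormat;

// Stand-ins for the formats fixed at compile time (the PR's new defaults).
#define SKETCH_COLOR_FORMAT SKETCH_FMT_R8G8B8A8
#define SKETCH_DEPTH_FORMAT SKETCH_FMT_D32

typedef struct { SketchPixelFormat format; } SketchAttachment;

typedef struct {
    const SketchAttachment *color;  // required
    const SketchAttachment *depth;  // optional, may be NULL
} SketchFramebuffer;

// Complete means: a color attachment exists and every attachment uses the
// compile-time format. Any other combination is treated as incomplete.
static bool sketch_framebuffer_complete(const SketchFramebuffer *fb)
{
    assert(fb != NULL);
    if (fb->color == NULL) return false;
    if (fb->color->format != SKETCH_COLOR_FORMAT) return false;
    if ((fb->depth != NULL) && (fb->depth->format != SKETCH_DEPTH_FORMAT)) return false;
    return true;
}

// Depth testing only happens when it is requested AND a depth buffer exists.
static bool sketch_depth_test_active(const SketchFramebuffer *fb, bool requested)
{
    assert(fb != NULL);
    return requested && (fb->depth != NULL);
}

// Draw calls on an incomplete framebuffer are skipped rather than reported.
static bool sketch_should_render(const SketchFramebuffer *fb)
{
    return sketch_framebuffer_complete(fb);
}
```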

Performance

I didn't do precise measurements for one simple reason: despite all these changes,
it's clearly faster. Here are some quick comparisons on my machine
(AMD Ryzen 5 3600, O2 on GCC 15.2.1, SDL backend, no manual SIMD paths enabled):

| Scenario | Before | After |
| --- | --- | --- |
| textures_bunnymark, idle | ~400 FPS | ~540 FPS |
| textures_bunnymark, 30 FPS threshold | 900 bunnies | 1500 bunnies |
| models_first_person_maze, heavy overdraw zone | ~29 FPS | ~41 FPS |

shapes_bullet_hell and models_waving_cubes also occasionally dipped below 60 FPS
before, both now run at a stable 60. With SSE2 SIMD paths enabled the gains are even
slightly higher.

Given how large the changes are, pinning down a specific measurement doesn't feel
particularly meaningful here.

Notes

A few things I noticed that could be cleaned up in separate PRs:

  • rlResizeFramebuffer is never called by any platform, it should be called on window
    resize. Worth a dedicated PR as this one is already large enough.
  • GRAPHICS_API_OPENGL_11_SOFTWARE might want a rename, since the software backend now
    takes code paths in rlgl that have nothing to do with GL 1.1.
  • Color blending is currently one of the biggest bottlenecks (~12% of frame time according
    to my profiling, which is significant, especially since raylib enables alpha blending
    constantly by default). A big part of the issue is the double function pointer
    indirection. Happy to propose a dedicated PR for that.
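
As an illustration of the blending note, one plausible reading of "double function pointer indirection" is a generic blend that calls one function pointer for the source factor and another for the destination factor on every pixel. That reading is an assumption, and all names below are hypothetical. The sketch contrasts it with a fused alpha-blend path that needs no indirect calls.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } SketchColor;

// A blend factor computed from the source and destination colors.
typedef SketchColor (*SketchFactorFn)(SketchColor src, SketchColor dst);

static SketchColor sketch_factor_src_alpha(SketchColor src, SketchColor dst)
{
    (void)dst;
    return (SketchColor){ src.a, src.a, src.a, src.a };
}

static SketchColor sketch_factor_one_minus_src_alpha(SketchColor src, SketchColor dst)
{
    (void)dst;
    float ia = 1.0f - src.a;
    return (SketchColor){ ia, ia, ia, ia };
}

// Generic path: two indirect calls per blended pixel.
static SketchColor sketch_blend_generic(SketchColor src, SketchColor dst,
                                        SketchFactorFn srcFactor, SketchFactorFn dstFactor)
{
    assert(srcFactor != 0 && dstFactor != 0);
    SketchColor fs = srcFactor(src, dst);
    SketchColor fd = dstFactor(src, dst);
    return (SketchColor){
        src.r*fs.r + dst.r*fd.r,
        src.g*fs.g + dst.g*fd.g,
        src.b*fs.b + dst.b*fd.b,
        src.a*fs.a + dst.a*fd.a
    };
}

// Fused path for the common (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) case:
// same arithmetic, no function pointers, easy for the compiler to inline.
static SketchColor sketch_blend_alpha(SketchColor src, SketchColor dst)
{
    float a = src.a;
    float ia = 1.0f - src.a;
    return (SketchColor){
        src.r*a + dst.r*ia,
        src.g*a + dst.g*ia,
        src.b*a + dst.b*ia,
        src.a*a + dst.a*ia
    };
}
```

Whether a fused path like this would actually recover the ~12% mentioned above would have to be measured; the sketch only shows where the indirect calls disappear.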

A note on the rasterizer specialization approach

I'd prefer to keep it as a single-file, it's more self-contained and portable. I saw the multiple functions split, still, it looks simple and readable.

I'd like to follow up on the idea I proposed earlier.

While working on this I hit a crash inside the rasterizer functions that was really painful to debug: no debugger could point to the right line, and making temporary test changes was really tedious.

Thinking about this idea more, instead of separate files re-included with specific #define (a pattern found in several software rasterizers including mesa), I realized the header could simply include itself multiple times, which produces exactly the same result.

I went ahead and did that here, sorry if this was outside the scope of the PR, but clearly it was worth it and it helped me a lot.

The advantages are significant:

  • Debugger now points to the exact line when something breaks
  • Modifications and testing are MUCH simpler
  • Specialization of functions also much simpler
  • Finer control over what each specialization does or doesn't include
  • More confidence in what the compiler can actually generate

The only downside is that declaring each specialization is slightly more verbose, but it stays very manageable. To help with that, a bit of macro work generates the dispatch tables automatically from the pipeline state, which also simplified the render call sites considerably.
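
A rough sketch of the dispatch-table idea follows. To stay in one short snippet it stamps out the specializations with a function-generating macro rather than with the self-inclusion approach described above, so it does not share the debugging benefit (debuggers step poorly through multi-line macros, which is the problem the self-include avoids). All names, the state flags and the placeholder "texturing" are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical pipeline-state bits; their combination indexes the table.
enum {
    SKETCH_STATE_TEXTURE = 1 << 0,
    SKETCH_STATE_DEPTH   = 1 << 1,
    SKETCH_STATE_COUNT   = 1 << 2
};

typedef struct {
    uint32_t color;       // interpolated vertex color
    uint32_t texel;       // sampled texture color
    float depth;          // incoming fragment depth
    float storedDepth;    // depth currently in the buffer
} SketchFragment;

// Returns 1 when the fragment was written to 'out', 0 when it was rejected.
typedef int (*SketchRasterFn)(const SketchFragment *frag, uint32_t *out);

// One specialization per flag combination. The flags are compile-time
// constants inside each function, so unused branches can be dropped.
#define SKETCH_DEFINE_RASTER(NAME, USE_TEXTURE, USE_DEPTH)                  \
    static int NAME(const SketchFragment *frag, uint32_t *out)              \
    {                                                                       \
        if (USE_DEPTH && !(frag->depth < frag->storedDepth)) return 0;      \
        /* bitwise AND is only a placeholder for real texture modulation */ \
        *out = USE_TEXTURE ? (frag->color & frag->texel) : frag->color;     \
        return 1;                                                           \
    }

// Single list of specializations, reused for definitions and for the table.
#define SKETCH_RASTER_LIST(X)           \
    X(sketch_raster_plain,     0, 0)    \
    X(sketch_raster_tex,       1, 0)    \
    X(sketch_raster_depth,     0, 1)    \
    X(sketch_raster_tex_depth, 1, 1)

SKETCH_RASTER_LIST(SKETCH_DEFINE_RASTER)

// Each entry lands at the index formed by its own flags.
#define SKETCH_TABLE_ENTRY(NAME, USE_TEXTURE, USE_DEPTH)                    \
    [((USE_TEXTURE) ? SKETCH_STATE_TEXTURE : 0) |                           \
     ((USE_DEPTH) ? SKETCH_STATE_DEPTH : 0)] = NAME,

static const SketchRasterFn sketchRasterTable[SKETCH_STATE_COUNT] = {
    SKETCH_RASTER_LIST(SKETCH_TABLE_ENTRY)
};

// Call sites only need the current state bits to pick a specialization.
static SketchRasterFn sketch_select_raster(unsigned int state)
{
    assert(state < SKETCH_STATE_COUNT);
    return sketchRasterTable[state];
}
```

Adding a specialization in this sketch means adding one line to `SKETCH_RASTER_LIST`; the table entry follows automatically.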

@Bigfoot71 Bigfoot71 marked this pull request as ready for review March 17, 2026 02:06
@Bigfoot71
Contributor Author

Ah, and I also went through a large number of examples, everything seems to work well. Only one blend mode example looks a bit off to me, but I haven't taken the time to compare it against the GPU version yet, given the note I mentioned about blending.

@raysan5
Owner

raysan5 commented Mar 17, 2026

@Bigfoot71 Amazing work! Definitely a big list of changes! My answers to the concerns raised:

rlBlitFramebuffer is not yet supported, let me know if that's really needed.

That function was added for one very specific use case (afair, for the deferred renderer and G-buffers). It's not really needed at the moment, but considering it is a data memcpy between two buffers, I think it can be useful. Actually rlCopyFramebuffer is already available.

rlResizeFramebuffer is never called by any platform, it should be called on window resize. Worth a dedicated PR as this one is already large enough.

Agreed, separate PR for the future. Note that I think most platforms using the software renderer (like embedded devices) would rarely need framebuffer resizing after initialization... unless they render to a smaller FB for optimization... 🤔

GRAPHICS_API_OPENGL_11_SOFTWARE might want a rename, since the software backend now takes code paths in rlgl that have nothing to do with GL 1.1.

Absolutely, what about just GRAPHICS_API_OPENGL_SOFTWARE?

Color blending is currently one of the biggest bottlenecks (~12% of frame time according to my profiling, which is significant, especially since raylib enables alpha blending constantly by default). A big part of the issue is the double function pointer indirection. Happy to propose a dedicated PR for that.

Any ideas are welcome; maybe there could be a fast path with a flag to disable alpha blending. Checking the current use case, it seems performance is more important than blending in some scenarios:

image

I realized the header could simply include itself multiple times, which produces exactly the same result.

Wow! What an approach! Still, sounds fine to me if it helps to keep it self-contained.

The only downside is that declaring each specialization is slightly more verbose, but it stays very manageable.

Verbosity has never been a problem for raylib, I think it even helps in many scenarios.

also went through a large number of examples, everything seems to work well.

I tried the changes with multiple examples in VS2022, using the new (experimental) rcore_desktop_win32 backend, and got some issues that seem related to the double-macro preprocessing order:

image

I also tried multiple examples with the current (old) rlsw implementation and got some crashes on depth buffer accesses, specifically when rendering lines... but I'll try again with this big update.

In any case, this improvement is fantastic, thank you very much for all the hard work put on this new module, definitely the key new big feature for the new raylib 6.0 release! Thanks!

@Bigfoot71
Contributor Author

Bigfoot71 commented Mar 17, 2026

I tried the changes with multiple examples in VS2022, using the new (experimental) rcore_desktop_win32 backend, and got some issues that seem related to the double-macro preprocessing order:

That's been fixed: 0754c12
MSVC doesn't seem to support VLA, even though that's C99.
We can just allocate 16 bytes for colors and 4 for depth, there's never going to be anything larger anyway.
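
A minimal sketch of that kind of fix, assuming a helper that needs a per-pixel scratch buffer: the variable-length array is replaced with a fixed array sized for the largest pixel. The function name and constants are hypothetical; the 16-byte and 4-byte bounds are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_COLOR_BYTES 16   // upper bound quoted above for color pixels
#define SKETCH_MAX_DEPTH_BYTES 4    // upper bound quoted above for depth values

// Swap two color pixels of 'pixelSize' bytes each.
// A C99 VLA would read 'uint8_t scratch[pixelSize];', which MSVC rejects,
// so the scratch space is sized for the worst case instead.
static void sketch_swap_color_pixels(uint8_t *a, uint8_t *b, size_t pixelSize)
{
    uint8_t scratch[SKETCH_MAX_COLOR_BYTES];
    assert(a != NULL && b != NULL && pixelSize <= sizeof scratch);

    memcpy(scratch, a, pixelSize);
    memcpy(a, b, pixelSize);
    memcpy(b, scratch, pixelSize);
}
```

The assert documents the assumption the fixed size depends on: no pixel format larger than the bound is ever passed in.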

I also tried multiple examples with the current (old) rlsw implementation and got some crashes on depth buffer accesses, specifically when rendering lines... but I'll try again with this big update.

Yeah, that's also been fixed, I noticed that too while doing comparisons. There was an issue with line projection, they could jitter between two pixels and sometimes land exactly on the buffer boundary.

They're now properly centered and stable. I also added a post-projection clamp that costs nothing just to be safe.
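
To show why a coordinate landing exactly on the boundary is a problem, here is a minimal sketch of a viewport mapping with a post-projection clamp. The mapping and the name `sketch_ndc_to_pixel()` are assumptions for illustration; the actual centering logic in rlsw is not reproduced here.

```c
#include <assert.h>

// Map a normalized device coordinate in [-1, 1] to a pixel index in
// [0, size - 1]. Without the clamp, ndc == 1.0f maps to index 'size',
// one element past the end of the row or column.
static int sketch_ndc_to_pixel(float ndc, int size)
{
    assert(size > 0);

    float screen = (ndc*0.5f + 0.5f)*(float)size;
    float maxCoord = (float)(size - 1);

    // Clamp in floating point first so the integer conversion stays in range
    if (screen < 0.0f) screen = 0.0f;
    if (screen > maxCoord) screen = maxCoord;

    return (int)screen;
}
```

The clamp is two comparisons per endpoint, which fits the "costs nothing" remark above.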

Any ideas are welcome; maybe there could be a fast path with a flag to disable alpha blending. Checking the current use case, it seems performance is more important than blending in some scenarios:

Yes, we could disable it when the texture format has no alpha channel, and for formats that do, analyze the texture at load time. But we'd also need to check the vertex colors. I'll think about it.
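
The load-time analysis idea could look roughly like this for an RGBA8 texture. This is a sketch of the proposal only, with hypothetical names; nothing like it is claimed to exist in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Scan an RGBA8 image once (at load time) and report whether every pixel
// is fully opaque. The result could be cached alongside the texture.
static bool sketch_rgba8_is_opaque(const uint8_t *pixels, size_t pixelCount)
{
    assert(pixels != NULL || pixelCount == 0);
    for (size_t i = 0; i < pixelCount; i++) {
        if (pixels[i*4 + 3] != 255) return false;
    }
    return true;
}

// Blending can only be skipped when both the texture and the vertex color
// are opaque, which is why the vertex colors need checking as well.
static bool sketch_needs_blending(bool textureOpaque, uint8_t vertexAlpha)
{
    return !textureOpaque || (vertexAlpha != 255);
}
```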

Absolutely, what about just GRAPHICS_API_OPENGL_SOFTWARE?

Yeah, just dropping the version hint sounds good to me.

Edit: Though in hindsight, keeping 11 implies no extra features are supported, but dropping it doesn't really say anything either. Not critical, just an observation.

Otherwise, I had initially thought about handling framebuffers by exposing `GL_NUM_EXTENSIONS` and related. That could have been standard even for GL 1.1 context, though we'd have assumed it present anyway for simplicity, and it still wouldn't have followed the same paths as `OPENGL_11` anyway.

@Bigfoot71
Contributor Author

Here are some quick comparisons on my machine
(AMD Ryzen 5 3600, SDL backend, no manual SIMD paths enabled):

I forgot to mention, it was tested in O2 on GCC 15.2.1
O3 also gives better results by the way.

@raysan5
Owner

raysan5 commented Mar 17, 2026

Otherwise, I had initially thought about handling framebuffers by exposing GL_NUM_EXTENSIONS and related. That could have been standard even for GL 1.1 context, though we'd have assumed it present anyway for simplicity, and it still wouldn't have followed the same paths as OPENGL_11 anyway.

Afaik, extensions mechanism was introduced in OpenGL 2.0, by design OpenGL 1.1 did not allow extensions, I think. In any case, I prefer to avoid that extensions route (never liked it on OpenGL), and rlsw can keep growing on its own with a versioning system and documented features.

I'm doing some more tests but I think this PR is ready for merge. It's already quite a big one, and further improvements can be added later. It will be included in raylib 6.0; actually it's one of the key additions of this new version!

@raysan5 raysan5 merged commit e7d999e into raysan5:master Mar 17, 2026
16 checks passed
@raysan5
Owner

raysan5 commented Mar 17, 2026

@Bigfoot71 I tested some examples and it works great! Merging for further review!

@Bigfoot71
Contributor Author

In any case, I prefer to avoid that extensions route (never liked it on OpenGL), and rlsw can keep growing on its own with a versioning system and documented features.

100% agree with avoiding that route! I'll move on to the problem of blend mode!

Small nit: extensions already existed in 1.1 (via glGetString(GL_EXTENSIONS), and glext.h often shipped), so that would have been consistent, but yeah, GL_NUM_EXTENSIONS only came in 3.0.

@raysan5
Owner

raysan5 commented Mar 17, 2026

@Bigfoot71 Just finished reviewing rlsw, applied some format tweaks and updated the version number to rlsw 1.5. rlsw 1.0 was already in use by the ESP32 port, so I updated it to avoid confusion; also, 1.1 could be confused with OpenGL 1.1. I just chose 1.5, we can change to other numbering if you prefer.

@Bigfoot71
Contributor Author

we can change to other numbering if you prefer.

That's correct, no objections!
