[rlsw] RenderTexture support #5655
Added support for `R3G3B2`, `R5G6B5`, `R4G4B4A4` and `R5G5B5A1`
Added depth formats
- Framebuffers can now use all texture types that are already available.
- The 24-bit depth format has been removed as it is no longer needed.
- Framebuffer formats are still defined at compile time.
- The allocated texture size is now preserved, which avoids frequent reallocations when resizing framebuffers and will allow the use of `glTexSubImage2D`.
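To illustrate the point about preserving the allocated texture size, here is a minimal sketch in C. It assumes a hypothetical `DemoTexture` structure and `DemoTextureResize` helper, neither of which is taken from rlsw; the only idea shown is that a resize reallocates storage only when the requested size exceeds what is already allocated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical texture storage for illustration; not rlsw's actual structure
typedef struct {
    unsigned char *pixels;  // Backing storage
    size_t capacity;        // Allocated size in bytes, preserved across resizes
    int width, height;      // Current logical size
    int bytesPerPixel;      // Size of one pixel in the storage format
} DemoTexture;

// Change the logical size, reallocating only when the request exceeds the capacity
// Returns false on allocation failure and leaves the texture untouched
static bool DemoTextureResize(DemoTexture *tex, int width, int height)
{
    size_t needed = (size_t)width*(size_t)height*(size_t)tex->bytesPerPixel;

    if (needed > tex->capacity)
    {
        unsigned char *ptr = (unsigned char *)realloc(tex->pixels, needed);
        if (ptr == NULL) return false;  // A GL-style API could report an out-of-memory error here
        tex->pixels = ptr;
        tex->capacity = needed;
    }

    tex->width = width;
    tex->height = height;
    return true;
}
```

With this layout, shrinking and then growing back to a previous size never touches the allocator, which is the behavior the list above describes for framebuffer resizes.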
This greatly simplifies the framebuffer blit/copy logic while now supporting all pixel formats. It is slightly slower in debug builds, but this path is mainly kept for compatibility anyway. The `copy_fast` version is still used for the "normal" cases when presenting to the screen.
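As a rough illustration of a format-agnostic copy with a fast path, the sketch below converts through an intermediate RGBA8 value using one `switch` per pixel read and write, and falls back to a plain `memcpy` when both formats match. All names (`DemoFormat`, `DemoReadPixel`, `DemoWritePixel`, `DemoCopyPixels`) are hypothetical, and the bit layouts are assumed to place red in the highest bits; this is not rlsw's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical format tags for this sketch; not rlsw's actual enum
typedef enum { DEMO_R5G6B5, DEMO_R5G5B5A1, DEMO_R8G8B8A8 } DemoFormat;

static size_t DemoPixelSize(DemoFormat f) { return (f == DEMO_R8G8B8A8)? 4 : 2; }

// Read one pixel and expand it to 8 bits per channel (bit replication)
static void DemoReadPixel(const void *src, DemoFormat f, uint8_t rgba[4])
{
    uint16_t p = 0;
    unsigned r, g, b;

    switch (f)
    {
        case DEMO_R5G6B5:
            memcpy(&p, src, sizeof(p));
            r = (p >> 11) & 0x1F; g = (p >> 5) & 0x3F; b = p & 0x1F;
            rgba[0] = (uint8_t)((r << 3) | (r >> 2));
            rgba[1] = (uint8_t)((g << 2) | (g >> 4));
            rgba[2] = (uint8_t)((b << 3) | (b >> 2));
            rgba[3] = 255;
            break;
        case DEMO_R5G5B5A1:
            memcpy(&p, src, sizeof(p));
            r = (p >> 11) & 0x1F; g = (p >> 6) & 0x1F; b = (p >> 1) & 0x1F;
            rgba[0] = (uint8_t)((r << 3) | (r >> 2));
            rgba[1] = (uint8_t)((g << 3) | (g >> 2));
            rgba[2] = (uint8_t)((b << 3) | (b >> 2));
            rgba[3] = (p & 1)? 255 : 0;
            break;
        case DEMO_R8G8B8A8:
            memcpy(rgba, src, 4);
            break;
    }
}

// Truncate an RGBA8 value to the target format and store it
static void DemoWritePixel(void *dst, DemoFormat f, const uint8_t rgba[4])
{
    uint16_t p = 0;

    switch (f)
    {
        case DEMO_R5G6B5:
            p = (uint16_t)(((rgba[0] >> 3) << 11) | ((rgba[1] >> 2) << 5) | (rgba[2] >> 3));
            memcpy(dst, &p, sizeof(p));
            break;
        case DEMO_R5G5B5A1:
            p = (uint16_t)(((rgba[0] >> 3) << 11) | ((rgba[1] >> 3) << 6) | ((rgba[2] >> 3) << 1) | (rgba[3] >> 7));
            memcpy(dst, &p, sizeof(p));
            break;
        case DEMO_R8G8B8A8:
            memcpy(dst, rgba, 4);
            break;
    }
}

// Copy 'count' pixels: memcpy when formats match, per-pixel conversion otherwise
static void DemoCopyPixels(void *dst, DemoFormat dstFormat, const void *src, DemoFormat srcFormat, size_t count)
{
    if (dstFormat == srcFormat)
    {
        memcpy(dst, src, count*DemoPixelSize(srcFormat));
        return;
    }

    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    size_t srcSize = DemoPixelSize(srcFormat);
    size_t dstSize = DemoPixelSize(dstFormat);

    for (size_t i = 0; i < count; i++)
    {
        uint8_t rgba[4];
        DemoReadPixel(s + i*srcSize, srcFormat, rgba);
        DemoWritePixel(d + i*dstSize, dstFormat, rgba);
    }
}
```

The generic loop pays for two branches per pixel, which is why keeping a dedicated fast path for the common presentation case, as the comment above mentions for `copy_fast`, is a reasonable trade-off.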
less ops for certain formats + fixes
I made the pointer parameters `restrict` for reading/writing textures, which resulted in a slight improvement. I also reviewed the uses of `static inline`, which could potentially bias the compiler; no difference, but it's cleaner.
|
I managed to get a fairly clean base. I'm going to start adding the FBO API, and the changes I made even gave me a 10 FPS gain on the

I also ran some tests and got a bit more performance in a few other areas, especially color blending. But since that's outside the scope of this PR I'll move it to another one.

Also, I had written a comment about the conversion between RGBA8 and RGBA32. That has been resolved in the meantime, so I reintroduced the SIMD functions. The performance gain seems smaller than before, but it's still significant.

Just as a note, I also fixed a few UB issues in the SIMD functions. I removed the clamp in the 'set_color' RISC-V version and added a comment mentioning that this version currently does more than necessary, but I'd rather not touch it any further. |
will allow management of both textures and framebuffers
added support for `glTexSubImage2D`
added handling of 'GL_OUT_OF_MEMORY' errors
removed the default internal texture (unused)
|
I took the opportunity to add support for |
|
@Bigfoot71 It looks amazing!
I think a compile option can be added as a fast path; a switch per pixel read/write can mean a hit on performance.
Definitely, it would be nice to get some numbers in that regard. |
Yes, that's what I've been doing so far for the framebuffer. For textures there's still one switch per read. Avoiding it would require further specialization of the rasterization functions, but keeping that in a single header quickly becomes unmanageable. Just an idea: we could implement the rasterization functions in separate files and include them multiple times with different |
|
@Bigfoot71 I'd prefer to keep it as a single file; it's more self-contained and portable. I saw the split into multiple functions; still, it looks simple and readable. |
and rename rlsw's resize/copy/blit
+ tweaks and fixes
my bad, an oversight in my previous fix. This offset should have been moved here rather than per pixel during truncation.
|
Framebuffers are officially supported!

Well, there's a lot to say, sorry in advance for the length.

New supported functions

The following

How it works

- The API only supports the equivalent of
- If the bound framebuffer is incomplete, rendering is simply skipped.
- Renderbuffer binding has been added; note that returned IDs are actually texture IDs
- Only the framebuffer format defined at compile time on the
- Default formats have been updated to R8G8B8A8 and D32 to match what

Performance

I didn't do precise measurements for one simple reason: despite all these changes,
Given how large the changes are, pinning down a specific measurement doesn't feel

Notes

A few things I noticed that could be cleaned up in separate PRs:
A note on the rasterizer specialization approach
I would like to give an update on the idea I had proposed here. While working on this I hit a crash inside the rasterizer functions that was really painful to debug: no debugger could point to the right line, and making temporary test changes was really tedious. Thinking about this idea more, instead of separate files re-included with specific

I went ahead and did that here. Sorry if this was outside the scope of the PR, but it was clearly worth it and it helped me a lot. The advantages are significant:
The only downside is that declaring each specialization is slightly more verbose, but it stays very manageable. To help with that, a bit of macro work generates the dispatch tables automatically from the pipeline state, which also simplified the render call sites considerably. |
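As a rough sketch of what macro-generated dispatch tables indexed by pipeline state can look like, the example below uses an X-macro list to declare thin specializations of one generic inline function and to build the lookup table from the same list. Every name here (`DEMO_STATE_*`, `DemoShadeGeneric`, `demoShadeTable`) is hypothetical, the shading math is a placeholder, and the actual rlsw macros may be organized quite differently.

```c
#include <stdint.h>

// Hypothetical pipeline state bits used to index the dispatch table
#define DEMO_STATE_TEXTURE  0x1
#define DEMO_STATE_BLEND    0x2
#define DEMO_STATE_COUNT    4

typedef uint32_t (*DemoShadeFunc)(uint32_t src, uint32_t dst, uint32_t texel);

// Generic body written as ordinary code, so a debugger can step through it
// 'tex' and 'blend' are constants at every call site below, letting the
// compiler remove the unused branches in each specialization
static inline uint32_t DemoShadeGeneric(uint32_t src, uint32_t dst, uint32_t texel, int tex, int blend)
{
    uint32_t color = src;
    if (tex) color &= texel;                        // Placeholder for texture modulation
    if (blend) color = (color >> 1) + (dst >> 1);   // Placeholder for blending
    return color;
}

// One entry per specialization: (name, texture enabled, blend enabled)
#define DEMO_SPECIALIZATIONS(X) \
    X(Flat,          0, 0) \
    X(Textured,      1, 0) \
    X(Blended,       0, 1) \
    X(TexturedBlend, 1, 1)

// Declare each specialization as a thin wrapper around the generic body
#define DEMO_DECLARE(NAME, TEX, BLEND) \
    static uint32_t DemoShade##NAME(uint32_t src, uint32_t dst, uint32_t texel) \
    { \
        return DemoShadeGeneric(src, dst, texel, TEX, BLEND); \
    }
DEMO_SPECIALIZATIONS(DEMO_DECLARE)

// Build the table from the same list, indexed by the combined state bits
#define DEMO_ENTRY(NAME, TEX, BLEND) \
    [((TEX)? DEMO_STATE_TEXTURE : 0) | ((BLEND)? DEMO_STATE_BLEND : 0)] = DemoShade##NAME,
static const DemoShadeFunc demoShadeTable[DEMO_STATE_COUNT] = {
    DEMO_SPECIALIZATIONS(DEMO_ENTRY)
};
```

A render call site then reduces to something like `demoShadeTable[state](src, dst, texel)`, where `state` is the OR of the currently enabled bits, so adding a specialization means adding one line to the list.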
|
Ah, and I also went through a large number of examples, everything seems to work well. Only one blend mode example looks a bit off to me, but I haven't taken the time to compare it against the GPU version yet, given the note I mentioned about blending. |
|
@Bigfoot71 Amazing work! Definitely a big list of changes! My answer to exposed concerns:
That function was added for one very specific use case (afair, for the deferred renderer and G-buffers), not really needed at the moment but considering it is a data
Agree, separate PR for the future. Note that I think most platforms using the software renderer (like embedded devices) would rarely need framebuffer resizing after initialization... unless they render to a smaller FB for optimization... 🤔
Absolutely, what about just
Any ideas are welcome, maybe there could be a fast path with a flag to disable alpha blending. Checking the current use case, it seems performance is more important than blending in some scenarios:
Wow! What an approach! Still, sounds fine to me if it helps to keep it self-contained.
Verbosity has never been a problem for raylib, I think it even helps in many scenarios.
I tried the changes with multiple examples and VS2022, using the new (experimental)
I also tried multiple examples with the current (old) rlsw implementation and got some crashes on depth buffer accesses, specifically when rendering lines... but I'll try again with this big update. In any case, this improvement is fantastic, thank you very much for all the hard work put into this new module, definitely the key new big feature for the new |
That's been fixed: 0754c12
Yeah, that's also been fixed, I noticed that too while doing comparisons. There was an issue with line projection, they could jitter between two pixels and sometimes land exactly on the buffer boundary. They're now properly centered and stable. I also added a post-projection clamp that costs nothing just to be safe.
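To make the boundary case concrete, here is a minimal sketch of a post-projection clamp. It assumes normalized device coordinates in [-1, 1] mapped to a buffer of `width` pixels, and the helper names `DemoClampInt` and `DemoProjectToColumn` are hypothetical; how rlsw actually centers and clamps line endpoints is not shown in this thread.

```c
// Clamp an integer to the inclusive range [lo, hi]
static int DemoClampInt(int v, int lo, int hi)
{
    return (v < lo)? lo : ((v > hi)? hi : v);
}

// Map a normalized coordinate in [-1, 1] to a pixel column in [0, width - 1]
// A coordinate of exactly 1.0 maps to 'width', one past the last valid
// column, so the result is clamped after projection
static int DemoProjectToColumn(float ndcX, int width)
{
    float fx = (ndcX*0.5f + 0.5f)*(float)width;
    return DemoClampInt((int)fx, 0, width - 1);
}
```

The clamp is two comparisons per endpoint rather than per pixel, which is consistent with the remark that it costs essentially nothing.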
Yes, we could disable it when the texture format has no alpha channel in its format, and for formats that do, analyze it at load time, but we'd also need to check the vertex colors. I'll think about it.
Yeah, just dropping the version hint sounds good to me. Edit: Though in hindsight, keeping

Otherwise, I had initially thought about handling framebuffers by exposing `GL_NUM_EXTENSIONS` and related. That could have been standard even for a GL 1.1 context, though we'd have assumed it present anyway for simplicity, and it still wouldn't have followed the same paths as `OPENGL_11` anyway. |
I forgot to mention, it was tested in O2 on GCC 15.2.1 |
Afaik, the extensions mechanism was introduced in OpenGL 2.0; by design OpenGL 1.1 did not allow extensions, I think. In any case, I prefer to avoid that extensions route (never liked it on OpenGL). I'm doing some more tests but I think this PR is ready for merge; it's already quite a big one, and further improvements can be added later. It will be included into |
|
@Bigfoot71 I tested some examples and it works great! Merging for further review! |
100% agree with avoiding that route! I'll move on to the blend mode problem!

Small nit: extensions already existed in 1.1 (via glGetString(GL_EXTENSIONS), and glext.h often shipped), so it would have been consistent, but yeah, GL_NUM_EXTENSIONS only came in 3.0 |
|
@Bigfoot71 Just finished reviewing |
That's correct, no objections! |


Rlsw refactor to support RenderTexture

This draft starts the rework of rlsw's internal framebuffer system in order to support RenderTexture. The goal is to allow offscreen rendering via framebuffers with a single color attachment and an optional depth buffer.

What has been done so far:
- R3G3B2, R5G6B5, R4G4B4A4, R5G5B5A1) as well as depth formats.
- glTexSubImage2D.

The FBO API is the only missing piece and doesn't require much more, but I can see quite a few improvement opportunities along the way; I'll document them here as I go.
I was considering restricting framebuffers to formats matching those defined at compile time, but adding full runtime format support should be fairly straightforward at the cost of a switch per pixel read/write; it could be a compile-time option as well. I'll let @raysan5 decide on this.
I'll also run a performance comparison on desktop between the old and new code to get an idea of the overall impact. So far it seems fairly minimal on my end in debug builds, but things are still a bit rough at this stage.
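The framebuffer model described in this thread (a single color attachment, an optional depth buffer, and rendering skipped when the bound framebuffer is incomplete) can be sketched as follows. The types and functions (`DemoAttachment`, `DemoFramebuffer`, `DemoFramebufferIsComplete`, `DemoDraw`) are hypothetical, and the rule that a depth attachment must match the color size is an assumption made for this sketch, not something confirmed for rlsw.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical attachment: a block of pixels with a size
typedef struct {
    int width, height;
    void *pixels;
} DemoAttachment;

// Hypothetical framebuffer: one color attachment, optional depth (NULL when absent)
typedef struct {
    DemoAttachment *color;
    DemoAttachment *depth;
} DemoFramebuffer;

// A framebuffer is treated as complete when it has a valid color attachment
// and, if depth is attached, a valid depth buffer of the same size (assumed rule)
static bool DemoFramebufferIsComplete(const DemoFramebuffer *fb)
{
    if ((fb->color == NULL) || (fb->color->pixels == NULL)) return false;
    if ((fb->color->width <= 0) || (fb->color->height <= 0)) return false;

    if (fb->depth != NULL)
    {
        if (fb->depth->pixels == NULL) return false;
        if ((fb->depth->width != fb->color->width) ||
            (fb->depth->height != fb->color->height)) return false;
    }

    return true;
}

// Placeholder draw call: returns the number of primitives processed,
// which is zero when the bound framebuffer is incomplete
static int DemoDraw(const DemoFramebuffer *fb, int primitiveCount)
{
    if (!DemoFramebufferIsComplete(fb)) return 0;   // Rendering is simply skipped
    return primitiveCount;                          // Rasterization would happen here
}
```

Checking completeness once per draw call keeps the test out of the per-pixel path, so an incomplete framebuffer costs nothing beyond the early return.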