[rlsw] RenderTexture support by Bigfoot71 · Pull Request #5655 · raysan5/raylib

Bigfoot71 · 2026-03-15T00:15:40Z

Rlsw refactor to support RenderTexture

This draft starts the rework of rlsw's internal framebuffer system in order to support RenderTexture. The goal is to allow offscreen rendering via framebuffers with a single color attachment and an optional depth buffer.

What has been done so far:

Expanded texture format support to cover packed color formats (R3G3B2, R5G6B5, R4G4B4A4, R5G5B5A1) as well as depth formats.
Textures are no longer pre-converted to RGBA8 and therefore require a switch per read.
The internal framebuffer now relies on the existing texture system rather than its own pixel storage
Framebuffers now support all texture formats already available in the library.
The 24-bit depth format has been removed as it is no longer needed.
Framebuffer format is still defined at compile time.
The allocated texture size is now preserved across resizes, avoiding frequent reallocations when resizing the framebuffer and enabling possible support of glTexSubImage2D.
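
To illustrate the "switch per read" mentioned above, here is a minimal sketch of decoding packed texels to RGBA8 on the fly. The enum, the function names, and the bit layouts (red in the high bits, alpha in the low bits) are assumptions made for illustration, not rlsw's actual code.

```c
#include <stdint.h>

/* Hypothetical format tags; rlsw's real names may differ. */
typedef enum {
    PIXFMT_R5G6B5,
    PIXFMT_R5G5B5A1,
    PIXFMT_R4G4B4A4,
    PIXFMT_R8G8B8A8
} PixelFormat;

/* Expand an n-bit channel to 8 bits by replicating its high bits. */
static inline uint8_t expand4(uint32_t v) { return (uint8_t)((v << 4) | v); }
static inline uint8_t expand5(uint32_t v) { return (uint8_t)((v << 3) | (v >> 2)); }
static inline uint8_t expand6(uint32_t v) { return (uint8_t)((v << 2) | (v >> 4)); }

/* Read texel `i` and store it as RGBA8 in out[4]: one switch per read. */
static void read_texel_rgba8(const void *pixels, PixelFormat fmt, uint32_t i, uint8_t out[4])
{
    switch (fmt) {
    case PIXFMT_R5G6B5: {
        uint16_t p = ((const uint16_t *)pixels)[i];
        out[0] = expand5((p >> 11) & 0x1Fu);
        out[1] = expand6((p >> 5) & 0x3Fu);
        out[2] = expand5(p & 0x1Fu);
        out[3] = 255;                       /* no alpha channel: opaque */
    } break;
    case PIXFMT_R5G5B5A1: {
        uint16_t p = ((const uint16_t *)pixels)[i];
        out[0] = expand5((p >> 11) & 0x1Fu);
        out[1] = expand5((p >> 6) & 0x1Fu);
        out[2] = expand5((p >> 1) & 0x1Fu);
        out[3] = (p & 0x1u) ? 255 : 0;      /* 1-bit alpha */
    } break;
    case PIXFMT_R4G4B4A4: {
        uint16_t p = ((const uint16_t *)pixels)[i];
        out[0] = expand4((p >> 12) & 0xFu);
        out[1] = expand4((p >> 8) & 0xFu);
        out[2] = expand4((p >> 4) & 0xFu);
        out[3] = expand4(p & 0xFu);
    } break;
    case PIXFMT_R8G8B8A8: {
        const uint8_t *p = (const uint8_t *)pixels + 4u * i;
        out[0] = p[0]; out[1] = p[1]; out[2] = p[2]; out[3] = p[3];
    } break;
    }
}
```

The trade-off is the one described in the list: no up-front conversion pass and less memory per texture, in exchange for a branch on every texel fetch.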

The FBO API is the only missing piece and doesn't require much more, but I can see quite a few improvement opportunities along the way, I'll document them here as I go.

I was considering restricting framebuffers to formats matching those defined at compile time, but adding full runtime format support should be fairly straightforward at the cost of a switch per pixel read/write, it can be an option at compile time as well. I'll let @raysan5 decide on this.

I'll also run a performance comparison on desktop between the old and new code to get an idea of the overall impact. So far it seems fairly minimal on my end in debug builds, but things are still a bit rough at this stage.

- Framebuffers can now use all texture types that are already available.
- The 24-bit depth format has been removed as it is no longer needed.
- Framebuffer formats are still defined at compile time.
- The allocated texture size is now preserved, which avoids frequent reallocations when resizing framebuffers and will allow the use of `glTexSubImage2D`.
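
As a rough sketch of the "allocated size is preserved" idea, the storage below only grows and a smaller resize reuses the existing allocation. The `Surface` struct and `surface_resize` function are hypothetical names, not rlsw's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical surface: logical size plus the number of bytes actually allocated. */
typedef struct {
    unsigned char *pixels;
    int width, height;
    size_t capacity;
} Surface;

/* Resize the logical dimensions; reallocate only when the buffer must grow. */
static bool surface_resize(Surface *s, int width, int height, size_t bytesPerPixel)
{
    if (width <= 0 || height <= 0 || bytesPerPixel == 0) return false;

    size_t needed = (size_t)width * (size_t)height * bytesPerPixel;
    if (needed > s->capacity) {
        unsigned char *p = realloc(s->pixels, needed);
        if (p == NULL) return false;    /* the caller could report an out-of-memory error here */
        s->pixels = p;
        s->capacity = needed;
    }
    s->width = width;
    s->height = height;
    return true;
}
```

Starting from a zero-initialized `Surface`, growing to 64x64 allocates once; shrinking to 32x32 afterwards keeps the same pointer and capacity.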

This greatly simplifies the framebuffer blit/copy logic while now supporting all pixel formats. It is slightly slower in debug builds, but this path is mainly kept for compatibility anyway. The `copy_fast` version is still used for the "normal" cases when presenting to the screen.

Bigfoot71 · 2026-03-15T16:56:42Z

I managed to get a fairly clean base. I'm going to start adding the FBO API, and the changes I made even gave me a 10 FPS gain on the models_first_person_maze example, that's a really nice bonus!

I also ran some tests and got a bit more performance in a few other areas, especially color blending. But since that's outside the scope of this PR I'll move it to another one.

Also, I had written a comment about the conversion between RGBA8 and RGBA32. That has been resolved in the meantime, so I reintroduced the SIMD functions. The performance gain seems smaller than before, but it's still significant.
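
For reference, a scalar version of such a conversion might look like the sketch below. It assumes "RGBA32" means one 32-bit float per channel in the [0, 1] range; the function names are made up for illustration, and the SIMD paths in the PR are not reproduced here.

```c
#include <stdint.h>

/* Map 0..255 to 0.0..1.0. */
static inline float u8_to_f32(uint8_t v)
{
    return (float)v * (1.0f / 255.0f);
}

/* Map 0.0..1.0 to 0..255 with rounding; out-of-range and NaN inputs saturate. */
static inline uint8_t f32_to_u8(float v)
{
    if (!(v > 0.0f)) return 0;      /* also catches NaN */
    if (v >= 1.0f) return 255;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* Convert a whole RGBA8 pixel to four floats and back. */
static inline void rgba8_to_rgba32f(const uint8_t in[4], float out[4])
{
    for (int k = 0; k < 4; k++) out[k] = u8_to_f32(in[k]);
}

static inline void rgba32f_to_rgba8(const float in[4], uint8_t out[4])
{
    for (int k = 0; k < 4; k++) out[k] = f32_to_u8(in[k]);
}
```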

Just as a note, I also fixed a few UB issues in the SIMD functions. I removed the clamp in the RISC-V version of 'set_color' and added a comment mentioning that this version currently does more than necessary, but I'd rather not touch it any further.

Bigfoot71 · 2026-03-15T21:45:18Z

I took the opportunity to add support for glTexSubImage2D. I was able to test the textures_image_processing example, which now works correctly.

raysan5 · 2026-03-15T22:32:23Z

@Bigfoot71 It looks amazing!

> I was considering restricting framebuffers to formats matching those defined at compile time, but adding full runtime format support should be fairly straightforward at the cost of a switch per pixel read/write, it can be an option at compile time as well. I'll let @raysan5 decide on this.

I think a compile option can be added as a fast path; a switch per pixel read/write can mean a performance hit.

> I'll also run a performance comparison on desktop between the old and new code to get an idea of the overall impact. So far it seems fairly minimal on my end in debug builds, but things are still a bit rough at this stage.

Definitely, it would be nice to get some numbers in that regard.

Bigfoot71 · 2026-03-15T23:00:00Z

> I think a compile option can be added as a fast path; a switch per pixel read/write can mean a performance hit.

Yes, that's what I've been doing so far for the framebuffer.

For textures there's still one switch per read. Avoiding it would require further specialization of the rasterization functions, but keeping that in a single header quickly becomes unmanageable.
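
A minimal sketch of the compile-time fast path discussed here: by default the framebuffer write is hard-wired to one format, and an opt-in macro enables the per-pixel switch. `FB_RUNTIME_FORMAT` and every other name below are hypothetical, not rlsw's real configuration options.

```c
#include <stdint.h>

/* Hypothetical opt-in: uncomment to pay one switch per pixel write. */
/* #define FB_RUNTIME_FORMAT */

typedef enum { FB_R5G6B5, FB_R8G8B8A8 } FbFormat;

static inline void write_r5g6b5(void *dst, uint32_t i, const uint8_t c[4])
{
    ((uint16_t *)dst)[i] = (uint16_t)(((c[0] >> 3) << 11) | ((c[1] >> 2) << 5) | (c[2] >> 3));
}

static inline void write_r8g8b8a8(void *dst, uint32_t i, const uint8_t c[4])
{
    uint8_t *p = (uint8_t *)dst + 4u * i;
    p[0] = c[0]; p[1] = c[1]; p[2] = c[2]; p[3] = c[3];
}

#ifdef FB_RUNTIME_FORMAT
/* Flexible path: the format is chosen per framebuffer at run time. */
static inline void fb_write(void *dst, FbFormat fmt, uint32_t i, const uint8_t c[4])
{
    switch (fmt) {
    case FB_R5G6B5:   write_r5g6b5(dst, i, c);   break;
    case FB_R8G8B8A8: write_r8g8b8a8(dst, i, c); break;
    }
}
#else
/* Fast path: the format is fixed at compile time, so no branch per pixel. */
static inline void fb_write(void *dst, FbFormat fmt, uint32_t i, const uint8_t c[4])
{
    (void)fmt;
    write_r8g8b8a8(dst, i, c);
}
#endif
```

Both variants share the same signature, so call sites do not change when the option is toggled.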

Just an idea: we could implement the rasterization functions in separate files and include them multiple times with different #define configurations. That would allow deeper specialization and platform-specific optimizations. The downside is that it would no longer be a single header...

raysan5 · 2026-03-15T23:09:34Z

@Bigfoot71 I'd prefer to keep it as a single file; it's more self-contained and portable. I saw the split into multiple functions; still, it looks simple and readable.

Bigfoot71 · 2026-03-17T02:06:39Z

Framebuffers are officially supported!

(Attached screen recording: Enregistrement.d.ecran_20260317_010651.webm)

Well, there's a lot to say, sorry in advance for the length.

New supported functions

The following rlgl functions are now supported with rlsw:

rlLoadFramebuffer()
rlFramebufferAttach()
rlFramebufferComplete()
rlUnloadFramebuffer()
rlEnableFramebuffer()
rlGetActiveFramebuffer()
rlDisableFramebuffer()
rlBindFramebuffer()
rlLoadTextureDepth()

rlBlitFramebuffer is not yet supported, let me know if that's really needed.

How it works

The API only supports the equivalent of the GL_FRAMEBUFFER target, with a single color attachment
at GL_COLOR_ATTACHMENT0 (required) and an optional depth buffer. If no depth buffer is
attached, depth testing is implicitly disabled.

If the bound framebuffer is incomplete, rendering is simply skipped.

Renderbuffer binding has been added; note that returned IDs are actually texture IDs
internally, which keeps the rlgl side simple.

Only the framebuffer format defined at compile time on the rlsw side is accepted as a
valid attachment, anything else will be treated as incomplete.

Default formats have been updated to R8G8B8A8 and D32 to match what rlgl expects
by default. Note that GL_DEPTH_COMPONENT24 is considered valid and maps to D32 internally.
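
The rules above can be sketched as a small completeness check. All types, macros, and function names here are hypothetical stand-ins for whatever rlsw uses internally; only the rules themselves come from the description.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { FMT_R5G6B5, FMT_R8G8B8A8, FMT_D16, FMT_D32 } Format;

typedef struct { Format format; int width, height; } Texture;

/* Single color attachment (required) plus an optional depth attachment. */
typedef struct { const Texture *color; const Texture *depth; } Framebuffer;

/* Formats fixed at compile time, mirroring the defaults mentioned above. */
#define FB_COLOR_FORMAT FMT_R8G8B8A8
#define FB_DEPTH_FORMAT FMT_D32

/* Complete only if a color attachment exists and every attachment has the compiled format. */
static bool framebuffer_complete(const Framebuffer *fb)
{
    if (fb->color == NULL) return false;
    if (fb->color->format != FB_COLOR_FORMAT) return false;
    if (fb->depth != NULL && fb->depth->format != FB_DEPTH_FORMAT) return false;
    return true;
}

/* Depth testing is implicitly off when no depth buffer is attached. */
static bool depth_test_active(const Framebuffer *fb, bool depthTestEnabled)
{
    return depthTestEnabled && fb->depth != NULL;
}

/* Rendering into an incomplete framebuffer is skipped. */
static bool should_draw(const Framebuffer *fb)
{
    return framebuffer_complete(fb);
}
```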

Performance

I didn't do precise measurements for one simple reason: despite all these changes,
it's clearly faster. Here are some quick comparisons on my machine
(AMD Ryzen 5 3600, O2 on GCC 15.2.1, SDL backend, no manual SIMD paths enabled):

| Scenario | Before | After |
|---|---|---|
| `textures_bunnymark` idle | ~400 FPS | ~540 FPS |
| `textures_bunnymark` 30 FPS threshold | 900 bunnies | 1500 bunnies |
| `models_first_person_maze` heavy overdraw zone | ~29 FPS | ~41 FPS |

shapes_bullet_hell and models_waving_cubes also occasionally dipped below 60 FPS
before, both now run at a stable 60. With SSE2 SIMD paths enabled the gains are even
slightly higher.

Given how large the changes are, pinning down a specific measurement doesn't feel
particularly meaningful here.

Notes

A few things I noticed that could be cleaned up in separate PRs:

- rlResizeFramebuffer is never called by any platform, it should be called on window
  resize. Worth a dedicated PR as this one is already large enough.
- GRAPHICS_API_OPENGL_11_SOFTWARE might want a rename, since the software backend now
  takes code paths in rlgl that have nothing to do with GL 1.1.
- Color blending is currently one of the biggest bottlenecks (~12% of frame time according
  to my profiling, which is significant, especially since raylib enables alpha blending
  constantly by default). A big part of the issue is the double function pointer
  indirection. Happy to propose a dedicated PR for that.

A note on the rasterizer specialization approach

> I'd prefer to keep it as a single file; it's more self-contained and portable. I saw the split into multiple functions; still, it looks simple and readable.

I would like to give an update on the idea I had proposed here.

While working on this I hit a crash inside the rasterizer functions that was really painful to debug: no debugger could point to the right line, and making temporary test changes was really tedious.

Thinking about this idea more: instead of separate files re-included with specific #define settings (a pattern found in several software rasterizers, including Mesa), I realized the header could simply include itself multiple times, which produces exactly the same result.

I went ahead and did that here, sorry if this was outside the scope of the PR, but clearly it was worth it and it helped me a lot.

The advantages are significant:

Debugger now points to the exact line when something breaks
Modifications and testing are MUCH simpler
Specializing functions is also much simpler
Finer control over what each specialization does or doesn't include
More confidence in what the compiler actually generates

The only downside is that declaring each specialization is slightly more verbose, but it stays very manageable. To help with that, a bit of macro work generates the dispatch tables automatically from the pipeline state, which also simplified the render call sites considerably.
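
To give a feel for a dispatch table generated from pipeline state, here is a small sketch. It deliberately swaps in an X-macro list for the self-including header so the example fits in one snippet; the state flags, function names, and the trivial function bodies are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical pipeline state bits; each combination gets its own specialization. */
enum {
    STATE_TEXTURE = 1 << 0,
    STATE_DEPTH   = 1 << 1,
    STATE_BLEND   = 1 << 2,
    STATE_COUNT   = 1 << 3
};

typedef int (*RasterFn)(void);

/* One line per specialization: name and the state it is compiled for. */
#define RASTER_VARIANTS(X)                                               \
    X(raster_flat,            0)                                         \
    X(raster_tex,             STATE_TEXTURE)                             \
    X(raster_depth,           STATE_DEPTH)                               \
    X(raster_tex_depth,       STATE_TEXTURE | STATE_DEPTH)               \
    X(raster_blend,           STATE_BLEND)                               \
    X(raster_tex_blend,       STATE_TEXTURE | STATE_BLEND)               \
    X(raster_depth_blend,     STATE_DEPTH | STATE_BLEND)                 \
    X(raster_tex_depth_blend, STATE_TEXTURE | STATE_DEPTH | STATE_BLEND)

/* Define each function; a real body would branch on the constant FLAGS so that
   unused features fold away. Here it just reports which variant ran. */
#define X_DEFINE(NAME, FLAGS) static int NAME(void) { return (FLAGS); }
RASTER_VARIANTS(X_DEFINE)

/* Build the table from the same list, indexed directly by the state bits. */
#define X_ENTRY(NAME, FLAGS) [FLAGS] = NAME,
static const RasterFn rasterTable[STATE_COUNT] = { RASTER_VARIANTS(X_ENTRY) };

/* A render call site only needs the current state to pick its specialization. */
static RasterFn select_raster(unsigned state)
{
    return rasterTable[state & (STATE_COUNT - 1)];
}
```

Because the function definitions and the table come from the same list, adding a specialization is a one-line change and the table cannot fall out of sync with it.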

Bigfoot71 · 2026-03-17T02:10:01Z

Ah, and I also went through a large number of examples, everything seems to work well. Only one blend mode example looks a bit off to me, but I haven't taken the time to compare it against the GPU version yet, given the note I mentioned about blending.

raysan5 · 2026-03-17T08:02:16Z

@Bigfoot71 Amazing work! Definitely a big list of changes! My answers to the concerns raised:

> rlBlitFramebuffer is not yet supported, let me know if that's really needed.

That function was added for one very specific use case (AFAIR, for the deferred renderer and G-buffers). It's not really needed at the moment, but considering it is a memcpy of data between two buffers, I think it can be useful. Actually, rlCopyFramebuffer is already available.

> rlResizeFramebuffer is never called by any platform, it should be called on window resize. Worth a dedicated PR as this one is already large enough.

Agreed, a separate PR for the future. Note that most platforms using the software renderer (like embedded devices) would, I think, rarely need framebuffer resizing after initialization... unless they render to a smaller FB for optimization... 🤔

> GRAPHICS_API_OPENGL_11_SOFTWARE might want a rename, since the software backend now takes code paths in rlgl that have nothing to do with GL 1.1.

Absolutely, what about just GRAPHICS_API_OPENGL_SOFTWARE?

> Color blending is currently one of the biggest bottlenecks (~12% of frame time according to my profiling, which is significant, especially since raylib enables alpha blending constantly by default). A big part of the issue is the double function pointer indirection. Happy to propose a dedicated PR for that.

Any ideas are welcome; maybe there could be a fast path with a flag to disable alpha blending. Checking the current use case, it seems performance is more important than blending in some scenarios:

> I realized the header could simply include itself multiple times, which produces exactly the same result.

Wow! What an approach! Still, sounds fine to me if it helps to keep it self-contained.

> The only downside is that declaring each specialization is slightly more verbose, but it stays very manageable.

Verbosity has never been a problem for raylib, I think it even helps in many scenarios.

> also went through a large number of examples, everything seems to work well.

I tried the changes with multiple examples in VS2022, using the new (experimental) rcore_desktop_win32 backend, and got some issues that seem related to the double-macro preprocessing order:

I also tried multiple examples with the current (old) rlsw implementation and got some crashes on depth buffer accesses, specifically when rendering lines... but I'll try again with this big update.

In any case, this improvement is fantastic, thank you very much for all the hard work put into this new module, definitely the key new big feature for the new raylib 6.0 release! Thanks!

Bigfoot71 · 2026-03-17T13:27:03Z

> I tried the changes with multiple examples in VS2022, using the new (experimental) rcore_desktop_win32 backend, and got some issues that seem related to the double-macro preprocessing order:

That's been fixed: 0754c12
MSVC doesn't seem to support VLA, even though that's C99.
We can just allocate 16 bytes for colors and 4 for depth, there's never going to be anything larger anyway.
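
As an illustration of that fix, a fill routine can use a fixed 16-byte scratch buffer instead of a VLA sized by the pixel format. The function below is a hypothetical sketch and not the code from commit 0754c12.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Upper bound taken from the comment above: no color pixel exceeds 16 bytes. */
#define MAX_PIXEL_BYTES 16

/* Fill `count` pixels of `pixelSize` bytes each with the pattern in `pixel`.
   Returns 0 if the pixel size does not fit the fixed scratch buffer. */
static int fill_pixels(void *dst, size_t count, const void *pixel, size_t pixelSize)
{
    uint8_t scratch[MAX_PIXEL_BYTES];   /* fixed size, so no VLA is required */
    uint8_t *out = (uint8_t *)dst;

    if (pixelSize == 0 || pixelSize > sizeof scratch) return 0;

    /* Copy the pattern once so `pixel` may safely alias `dst`. */
    memcpy(scratch, pixel, pixelSize);
    for (size_t i = 0; i < count; i++) {
        memcpy(out + i * pixelSize, scratch, pixelSize);
    }
    return 1;
}
```

The same routine covers the depth fill by passing a 4-byte pattern.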

> I also tried multiple examples with the current (old) rlsw implementation and got some crashes on depth buffer accesses, specifically when rendering lines... but I'll try again with this big update.

Yeah, that's also been fixed, I noticed that too while doing comparisons. There was an issue with line projection, they could jitter between two pixels and sometimes land exactly on the buffer boundary.

They're now properly centered and stable. I also added a post-projection clamp that costs nothing just to be safe.
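
A post-projection clamp of this kind can be sketched as follows. The mapping from normalized device coordinates to pixels is a generic one chosen for illustration, not necessarily the one rlsw uses, and the function names are hypothetical. Inputs are assumed to be finite and of moderate magnitude.

```c
/* Clamp v to the inclusive range [lo, hi]. */
static inline int clamp_int(int v, int lo, int hi)
{
    return (v < lo) ? lo : (v > hi) ? hi : v;
}

/* Map an NDC coordinate in [-1, 1] to a pixel index in [0, size - 1].
   Without the clamp, ndc == 1.0f maps to index `size`, one past the buffer. */
static int ndc_to_pixel(float ndc, int size)
{
    float f = (ndc * 0.5f + 0.5f) * (float)size;
    return clamp_int((int)f, 0, size - 1);
}
```

For a 640-pixel-wide buffer, an endpoint at exactly `1.0f` resolves to pixel 639 rather than 640, which is the boundary case described above.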

> Any ideas are welcome; maybe there could be a fast path with a flag to disable alpha blending. Checking the current use case, it seems performance is more important than blending in some scenarios:

Yes, we could disable it when the texture format has no alpha channel in its format, and for formats that do, analyze it at load time, but we'd also need to check the vertex colors. I'll think about it.
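
A rough sketch of that idea: scan an RGBA8 texture once at load time, then combine the result with the vertex alpha to decide whether blending can be skipped. The function names and the decision rule are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load-time analysis: true if every texel of an RGBA8 image has alpha == 255. */
static bool rgba8_is_opaque(const uint8_t *pixels, size_t pixelCount)
{
    for (size_t i = 0; i < pixelCount; i++) {
        if (pixels[4 * i + 3] != 255) return false;
    }
    return true;
}

/* Draw-time decision: blending is needed if the vertex colors are translucent,
   or if the texture has an alpha channel that is not fully opaque. */
static bool needs_blending(bool formatHasAlpha, bool textureIsOpaque, uint8_t minVertexAlpha)
{
    if (minVertexAlpha != 255) return true;
    return formatHasAlpha && !textureIsOpaque;
}
```

The scan costs one pass per texture upload, while the draw-time check is a couple of comparisons per primitive.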

> Absolutely, what about just GRAPHICS_API_OPENGL_SOFTWARE?

Yeah, just dropping the version hint sounds good to me.

Edit: Though in hindsight, keeping 11 implies no extra features are supported, but dropping it doesn't really say anything either. Not critical, just an observation.

Otherwise, I had initially thought about handling framebuffers by exposing `GL_NUM_EXTENSIONS` and related. That could have been standard even for GL 1.1 context, though we'd have assumed it present anyway for simplicity, and it still wouldn't have followed the same paths as `OPENGL_11` anyway.

Bigfoot71 · 2026-03-17T13:28:51Z

> Here are some quick comparisons on my machine
> (AMD Ryzen 5 3600, SDL backend, no manual SIMD paths enabled):

I forgot to mention, it was tested in O2 on GCC 15.2.1
O3 also gives better results by the way.

raysan5 · 2026-03-17T16:25:24Z

> Otherwise, I had initially thought about handling framebuffers by exposing GL_NUM_EXTENSIONS and related. That could have been standard even for GL 1.1 context, though we'd have assumed it present anyway for simplicity, and it still wouldn't have followed the same paths as OPENGL_11 anyway.

AFAIK, the extension mechanism was introduced in OpenGL 2.0; by design, OpenGL 1.1 did not allow extensions, I think. In any case, I prefer to avoid that extensions route (never liked it on OpenGL), and rlsw can keep growing on its own with a versioning system and documented features.

I'm doing some more tests but I think this PR is ready for merge; it's already quite a big one, and further improvements can be added later. It will be included in raylib 6.0, actually it's one of the key additions of this new version!

raysan5 · 2026-03-17T16:50:38Z

@Bigfoot71 I tested some examples and it works great! Merging for further review!

Bigfoot71 · 2026-03-17T16:56:32Z

> In any case, I prefer to avoid that extensions route (never liked it on OpenGL), and rlsw can keep growing on its own with a versioning system and documented features.

100% agree with avoiding that route! I'll move on to the problem of blend mode!

Small nit: extensions already existed in 1.1 (via glGetString(GL_EXTENSIONS), and glext.h often shipped), so that would have been consistent, but yeah, GL_NUM_EXTENSIONS only came in 3.0.

raysan5 · 2026-03-17T17:37:02Z

@Bigfoot71 Just finished reviewing rlsw, applied some format tweaks and updated the version number to rlsw 1.5; rlsw 1.0 was already in use by the ESP32 port so I updated to avoid confusion, and 1.1 could be confused with OpenGL 1.1. I just chose 1.5, we can change to other numbering if you prefer.

Bigfoot71 · 2026-03-17T17:45:02Z

> we can change to other numbering if you prefer.

That's correct, no objections!

Bigfoot71 added 8 commits March 14, 2026 19:10

review texture formats

30d8fed

Added support for `R3G3B2`, `R5G6B5`, `R4G4B4A4` and `R5G5B5A1`. Added depth formats.

review pixel get/set

10ca5b7

less ops for certain formats + fixes

fix depth write

3f141c1

texture read/write cleanup + tweaks

08a8c6c

I made the pointers parameters `restrict` for reading/writing textures, which resulted in a slight improvement. And I reviewed the `static inline` statements, which could potentially bias the compiler; no difference, but it's cleaner.

style tweaks

f73909a

review uint8_t <-> float conversion

52d4fba

added a reusable object pool system

fae2e88

- will allow management of both textures and framebuffers
- added support for `glTexSubImage2D`
- added handling of `GL_OUT_OF_MEMORY` errors
- removed the default internal texture (unused)

Bigfoot71 added 9 commits March 16, 2026 21:05

added FBO API + refactored rasterizer dispatch logic

9654fff

fix ndc projection + review presentation

e89c8a4

and rename rlsw's resize/copy/blit

add glRenderbufferStorage binding

dc9d2c9

+ tweaks and fixes

fix quad sorting + simplify quad rasterization part

4928008

fix line shaking issue

5791f53

support of GL_DRAW_FRAMEBUFFER_BINDING

42ddf55

update rlgl - support of rlsw's framebuffers

fd6fdcc

fix pixel origin in line rasterization

7ac973b

my bad, an oversight in my previous fix. This offset should have been moved here rather than per pixel during truncation.

style tweaks

f9824eb

Bigfoot71 marked this pull request as ready for review March 17, 2026 02:06

fix vla issue with msvc - fill depth / fill color

0754c12

raysan5 merged commit e7d999e into raysan5:master Mar 17, 2026
16 checks passed

Bigfoot71 mentioned this pull request Apr 3, 2026

[rlsw] Window resize not supported, segmentation fault on sw_color8_to_color() #5715

Closed

Bigfoot71 deleted the rlsw-surface branch April 4, 2026 12:08
