Skip to content

Optimize stack management, fix preemption, and add automated QEMU testing for RISC-V32 port#513

Open
Winstonllllai wants to merge 11 commits intoeclipse-threadx:masterfrom
Winstonllllai:rv32-fix-feature
Open

Optimize stack management, fix preemption, and add automated QEMU testing for RISC-V32 port#513
Winstonllllai wants to merge 11 commits intoeclipse-threadx:masterfrom
Winstonllllai:rv32-fix-feature

Conversation

@Winstonllllai
Copy link
Copy Markdown

This PR focuses on technical debt cleanup and functional enhancements for the ThreadX RISC-V 32-bit (RV32) port.

  • Symbolic Stack Offsets:
    • Replaced hardcoded magic numbers in .S files with symbolic macros in tx_port.h.
    • Decoupled stack layout for easier future updates (e.g., Vector or Lazy FPU extensions).
  • Scheduler & Preemption Fixes:
    • Refactored _tx_thread_context_restore to resolve potential race conditions in high-priority task preemption.
    • Integrated the wfi (Wait For Interrupt) instruction to optimize CPU idle cycles and power efficiency.
  • Automated Testing Framework:
    • Introduced a Python-based E2E test harness using QEMU + GDB.
    • Validates multitasking, timer interrupts, FPU context preservation, and priority consistency.
  • Linker & Performance Optimization:
    • Defined __global_pointer$ in link.lds to enable GP Relaxation.
    • Reduced code size and improved global variable access efficiency.

How to Run Tests

To verify the changes using the automated framework, please run the following commands:

cd build_qemu
make check-functional-riscv32

This is my first PR in Threadx. Please let me know if there’s anything I should improve or if I missed anything.

Optimized system reliability and interrupt latency by implementing wfi in the idle loop
and enforcing 16-byte alignment for the system stack.

Details:
1. Reliability (WFI) Implementation: Modified tx_thread_schedule.S to include the
wfi (Wait For Interrupt) instruction in the idle loop.
The scheduler now enters low-power sleep mode when no threads are ready,
instead of busy-waiting, reducing significant power consumption.
2. Interrupt Latency (Shadow Stack Alignment): Updated tx_initialize_low_level.S to
enforce 16-byte alignment on the system stack pointer (sp) before saving it to
_tx_thread_system_stack_ptr.
Ensures strict adherence to the RISC-V ABI and prevents misaligned access faults
or performance penalties during context switches to the interrupt stack.
- Add Python-based test runner automating QEMU launch and GDB control.
  - Verifies Context Switching, FPU Context Preservation, and Timer Interrupts.
  - Implements robust cleanup using try-finally blocks to prevent orphaned QEMU processes.
- Update CMake configuration to include the `check-functional-riscv32` target.
- Add floating-point arithmetic test logic to thread entry function.
- Add global pointer definition to linker script to resolve relocation errors.
- Enabled `-mrelax` compiler flag in CMake configuration.
- Updated entry.s to correctly initialize the Global Pointer (`gp`) using
the `__global_pointer$` symbol.
- Verified that `gp` is treated as a constant and no longer saved/restored
during context switches, improving performance.
- Implement symbolic stack offsets for RISC-V 32-bit GNU port.
- Harmonize context restoration across all scheduling paths.
- Clean up assembly files within the risc-v32/gnu source directory.
Standardized stack frame access using sp instead of t0 in tx_thread_context_restore.S.
Fixed global pointer address loading using la for _tx_thread_current_ptr and others.
Implemented dynamic mstatus recovery from stack to preserve FPU and interrupt states.
Optimized register storage sequence in the preemption path for better atomicity.
Adjusted thread priorities in demo_threadx.c to facilitate preemption verification.
@fdesbiens
Copy link
Copy Markdown
Contributor

Hi @Winstonllllai!

Thank you for this contribution. I will ask someone to review it and provide feedback. Since most senior team members were at Embedded World this week, we expect our feedback to be delayed.

Before we can accept your code, you need to sign the Eclipse Contributor Agreement (ECA). The purpose of the ECA is to provide a written record that you have agreed to provide your code and documentation contributions under the licenses used by the Eclipse ThreadX project. It also makes it clear that you are promising that what you are contributing to Eclipse is code you wrote, and you have the necessary rights to contribute it to our projects. And finally, it documents a commitment from you that your open source contributions will be permanently on the public record.

Signing the ECA requires an Eclipse Foundation account if you do not already have one. You can create one for free at https://accounts.eclipse.org.

Be sure to use the same email address when you register for the account that you intend to use on Git commit records. Also, please add your GitHub ID to your Eclipse account. This enables synchronisation between Eclipse-owned infrastructure and GitHub.

Here is the link to sign the ECA:
https://accounts.eclipse.org/user/login?destination=user/eca

@Winstonllllai
Copy link
Copy Markdown
Author

Hi @fdesbiens ,
Thanks for the information. I’ve completed the registration and signed the Eclipse Contributor Agreement (ECA). Please let me know if anything else is needed.

@fdesbiens
Copy link
Copy Markdown
Contributor

Hi @Winstonllllai.

Thank you. The ECA check passes. I will ask a team member to have a look next week (March 23), when I am back from vacation.

@Winstonllllai
Copy link
Copy Markdown
Author

Hi @fdesbiens ,
Thanks! Looking forward to the review next week. Enjoy your vacation!

@fdesbiens fdesbiens moved this to In review in ThreadX Roadmap Mar 24, 2026
@fdesbiens fdesbiens requested a review from akifejaz March 25, 2026 20:27
@fdesbiens
Copy link
Copy Markdown
Contributor

@akifejaz Would you mind taking a look, please?

Copy link
Copy Markdown
Contributor

@akifejaz akifejaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Winstonllllai, Thanks for the PR. Firstly, I'm little late in review, I apologize for that.

Here is my feedback on the PR.

-- we should not be using the tx_port.h in any of the .S file
-- hardcoded values in src/*.S are fine you can revert those back
-- can you please add more comments in entry.s & link.lds, the changes there are not justified.
-- almost all the files needs cleanup in terms of comments, alignment etc, you can refer to ThreadX coding standards for more details.
-- all changes in tx_port.h seems irrelevant you can revert these.
-- lets move the build instructions from project root CMakeLists.txt to port level CMakeLists.txt

moreover, I also added inline comments in some of the files, I didn't reviewed rest of the files/changes, but they also seem to have similar issues. Let me know if you get chance to cleanup the PR, and I'm happy to review it again.

Thanks!

tx_thread_create(&thread_2, "thread 2", thread_2_entry, 2,
pointer, DEMO_STACK_SIZE,
16, 16, 4, TX_AUTO_START);
10, 10, 4, TX_AUTO_START);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need this?

}
}

float fpu_test_val = 0.0f;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be moved to top after other #defines.

Comment on lines +14 to +17
.option push
.option norelax
la gp, __global_pointer$
.option pop
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x3 register is missing. also I'm not sure why we need to change .text to .init.

Comment on lines +65 to +71
#if defined(__riscv_flen) && ((__riscv_flen == 32)||(__riscv_flen == 64))
addi sp, sp, -65*REGBYTES // Allocate space for all registers - with floating point enabled
#else
addi sp, sp, -128 // Allocate space for all registers - without floating point enabled (32*4)
addi sp, sp, -32*REGBYTES // Allocate space for all registers - without floating point enabled
#endif

sw x1, 112(sp) // Store RA (28*4 = 112, because call will override ra [ra is a callee register in riscv])
STORE x1, 28*REGBYTES(sp) // Store RA, 28*REGBYTES(because call will override ra [ra is a calle register in riscv])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we intentionally used the hardcoded numbers, so we do not have to include .h file in .S files.

Comment on lines +51 to +69
#ifdef __ASSEMBLER__


#if __riscv_xlen == 64
# define SLL32 sllw
# define STORE sd
# define LOAD ld
# define LWU lwu
# define LOG_REGBYTES 3
#else
# define SLL32 sll
# define STORE sw
# define LOAD lw
# define LWU lw
# define LOG_REGBYTES 2
#endif
#define REGBYTES (1 << LOG_REGBYTES)

/* Define stack frame offsets for thread context save/restore.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not recommended as per ThreadX coding standards.

Comment on lines +189 to +198
la t0, _tx_thread_current_ptr
LOAD t1, 0(t0) // Pickup current thread pointer
beqz t1, _tx_thread_idle_system_restore // If NULL, idle system restore


la t0, _tx_thread_preempt_disable // Pickup preempt disable flag address
lw t2, 0(t0) // Pickup preempt disable flag (UINT)

la t0, _tx_thread_preempt_disable
LOAD t2, 0(t0) // Pickup preempt disable flag
bgtz t2, _tx_thread_no_preempt_restore // If set, restore interrupted thread


la t0, _tx_thread_execute_ptr // Pickup thread execute pointer address
lw t2, 0(t0) // Pickup thread execute pointer

la t0, _tx_thread_execute_ptr
LOAD t2, 0(t0) // Pickup thread execute pointer
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, No changes needed here.

Comment on lines +71 to +90
la x5, _tx_thread_system_state // Pickup address of system state
LOAD x6, 0(x5) // Pickup system state

/* Check for a nested interrupt. */
/* Check for a nested interrupt condition. */
/* if (_tx_thread_system_state++)
{ */
beqz x6, _tx_thread_not_nested_save // If 0, first interrupt condition
addi x6, x6, 1 // Increment the interrupt counter
STORE x6, 0(x5) // Store the interrupt counter

/* Nested interrupt condition.
Save the reset of the scratch registers on the stack and return to the
calling ISR. */

la t0, _tx_thread_system_state // Pickup addr of system state var
lw t1, 0(t0) // Pickup system state
addi t1, t1, 1 // Increment system state
sw t1, 0(t0) // Store system state
li t0, 1
bgt t1, t0, _tx_thread_nested_save // If it's more than 1, nested interrupt

/* First level interrupt, save the rest of the scratch registers and
check for a thread to preempt. */

sw t2, 17*4(sp) // Store t2
sw s0, 12*4(sp) // Store s0
sw a0, 27*4(sp) // Store a0
sw a1, 26*4(sp) // Store a1
sw a2, 25*4(sp) // Store a2
sw a3, 24*4(sp) // Store a3
sw a4, 23*4(sp) // Store a4
sw a5, 22*4(sp) // Store a5
sw a6, 21*4(sp) // Store a6
sw a7, 20*4(sp) // Store a7
sw t3, 16*4(sp) // Store t3
sw t4, 15*4(sp) // Store t4
sw t5, 14*4(sp) // Store t5
sw t6, 13*4(sp) // Store t6

/* Save floating point registers. */
#if defined(__riscv_float_abi_single)
fsw f0, 31*4(sp) // Store ft0
fsw f1, 32*4(sp) // Store ft1
fsw f2, 33*4(sp) // Store ft2
fsw f3, 34*4(sp) // Store ft3
fsw f4, 35*4(sp) // Store ft4
fsw f5, 36*4(sp) // Store ft5
fsw f6, 37*4(sp) // Store ft6
fsw f7, 38*4(sp) // Store ft7
fsw f10, 41*4(sp) // Store fa0
fsw f11, 42*4(sp) // Store fa1
fsw f12, 43*4(sp) // Store fa2
fsw f13, 44*4(sp) // Store fa3
fsw f14, 45*4(sp) // Store fa4
fsw f15, 46*4(sp) // Store fa5
fsw f16, 47*4(sp) // Store fa6
fsw f17, 48*4(sp) // Store fa7
fsw f28, 59*4(sp) // Store ft8
fsw f29, 60*4(sp) // Store ft9
fsw f30, 61*4(sp) // Store ft10
fsw f31, 62*4(sp) // Store ft11
STORE x7, TX_STACK_OFFSET_X7(sp) // Store t2
STORE x8, TX_STACK_OFFSET_X8(sp) // Store s0
STORE x10, TX_STACK_OFFSET_X10(sp) // Store a0
STORE x11, TX_STACK_OFFSET_X11(sp) // Store a1
STORE x12, TX_STACK_OFFSET_X12(sp) // Store a2
STORE x13, TX_STACK_OFFSET_X13(sp) // Store a3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, No changes needed here.

Comment on lines +63 to +68
/* Pickup current interrupt lockout posture. */

/* Pickup current interrupt posture. */
csrr t0, mstatus
mv t1, t0 // Save original mstatus for return

csrr a1, mstatus // Pickup mstatus
andi a1, a1, 0x08 // Mask out all but MIE

/* Check for the new posture. */

beqz a0, _tx_thread_interrupt_disable // If 0, disable interrupts

/* Enable interrupts. */

csrsi mstatus, 0x08 // Enable interrupts (MIE bit 3)
j _tx_thread_interrupt_control_exit // Return to caller

_tx_thread_interrupt_disable:

/* Disable interrupts. */

csrci mstatus, 0x08 // Disable interrupts (MIE bit 3)

_tx_thread_interrupt_control_exit:

/* Return the old interrupt posture. */

mv a0, a1 // Setup return value
ret // Return to caller
/* Apply the new interrupt posture. */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the _tx_thread_interrupt_control_exit, _tx_thread_interrupt_disable symbols.

Comment on lines +1 to +17
import subprocess
import sys
import time
import os
import argparse
import socket
import select

def print_content(content):
"""Prints content using os.write to handle non-blocking stdout robustly."""
try:
msg = f"{content}\n".encode('utf-8')
total_len = len(msg)
written = 0
fd = sys.stdout.fileno()
while written < total_len:
try:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this script should be added under ports/ or not.

Comment on lines +72 to +87
include(CPack)

# Enable the QEMU Virt demo application (Target: kernel.elf)
set(QEMU_DEMO_DIR ${CMAKE_CURRENT_LIST_DIR}/ports/risc-v32/gnu/example_build/qemu_virt)

add_executable(kernel.elf
${QEMU_DEMO_DIR}/demo_threadx.c
${QEMU_DEMO_DIR}/entry.s
${QEMU_DEMO_DIR}/uart.c
${QEMU_DEMO_DIR}/plic.c
${QEMU_DEMO_DIR}/hwtimer.c
${QEMU_DEMO_DIR}/trap.c
${QEMU_DEMO_DIR}/board.c
${QEMU_DEMO_DIR}/tx_initialize_low_level.S
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is port/ level cmake, lets move this under ports/riscv32/ if possible.

@fdesbiens fdesbiens added this to the 2026-02 milestone Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants