Commit 507d3ad

Loongarch: optimize syscall reg save
deepin inclusion
category: performance

This removes one st.d from the hot syscall entry path. Zeroing regs->regs[0] in do_syscall() instead of in the handle_syscall assembly makes the store visible to the compiler, which can then optimize it, and improves syscall performance slightly.

Tested on a 3A6000.

After patch:

Benchmark Run: Mon Jun 16 2025 20:38:10 - 20:47:09
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables    47066632.8 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                  5036.1 MWIPS (10.0 s, 2 samples)
Execl Throughput                            4484.2 lps   (29.2 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks     656586.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks       175086.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks    1998702.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                          1365130.7 lps   (10.0 s, 2 samples)
Pipe-based Context Switching              126232.9 lps   (10.0 s, 2 samples)
Process Creation                            9202.7 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)               12501.2 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                4974.9 lpm   (60.0 s, 1 samples)
System Call Overhead                     1467021.7 lps   (10.0 s, 2 samples)

System Benchmarks Index Values          BASELINE       RESULT     INDEX
Dhrystone 2 using register variables    116700.0   47066632.8    4033.1
Double-Precision Whetstone                  55.0       5036.1     915.7
Execl Throughput                            43.0       4484.2    1042.8
File Copy 1024 bufsize 2000 maxblocks     3960.0     656586.0    1658.0
File Copy 256 bufsize 500 maxblocks       1655.0     175086.0    1057.9
File Copy 4096 bufsize 8000 maxblocks     5800.0    1998702.0    3446.0
Pipe Throughput                          12440.0    1365130.7    1097.4
Pipe-based Context Switching              4000.0     126232.9     315.6
Process Creation                           126.0       9202.7     730.4
Shell Scripts (1 concurrent)                42.4      12501.2    2948.4
Shell Scripts (8 concurrent)                 6.0       4974.9    8291.5
System Call Overhead                     15000.0    1467021.7     978.0
                                                               ========
System Benchmarks Index Score                                    1510.2

------------------------------------------------------------------------
Benchmark Run: Mon Jun 16 2025 20:47:09 - 20:56:08
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables   221748966.2 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                 37218.5 MWIPS (10.0 s, 2 samples)
Execl Throughput                           24364.4 lps   (29.0 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks    3681637.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks      1020033.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks    8054794.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                          8209249.1 lps   (10.0 s, 2 samples)
Pipe-based Context Switching             1058150.7 lps   (10.0 s, 2 samples)
Process Creation                           49636.4 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)               43521.6 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                5672.4 lpm   (60.0 s, 1 samples)
System Call Overhead                     9407101.4 lps   (10.0 s, 2 samples)

System Benchmarks Index Values          BASELINE       RESULT     INDEX
Dhrystone 2 using register variables    116700.0  221748966.2   19001.6
Double-Precision Whetstone                  55.0      37218.5    6767.0
Execl Throughput                            43.0      24364.4    5666.2
File Copy 1024 bufsize 2000 maxblocks     3960.0    3681637.0    9297.1
File Copy 256 bufsize 500 maxblocks       1655.0    1020033.0    6163.3
File Copy 4096 bufsize 8000 maxblocks     5800.0    8054794.0   13887.6
Pipe Throughput                          12440.0    8209249.1    6599.1
Pipe-based Context Switching              4000.0    1058150.7    2645.4
Process Creation                           126.0      49636.4    3939.4
Shell Scripts (1 concurrent)                42.4      43521.6   10264.5
Shell Scripts (8 concurrent)                 6.0       5672.4    9454.0
System Call Overhead                     15000.0    9407101.4    6271.4
                                                               ========
System Benchmarks Index Score                                    7335.3

Before patch:

Benchmark Run: Mon Jun 16 2025 22:58:12 - 23:07:11
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables    41001790.5 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                  5036.1 MWIPS (10.0 s, 2 samples)
Execl Throughput                            4482.0 lps   (29.6 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks     654904.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks       173158.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks    2008222.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                          1370314.7 lps   (10.0 s, 2 samples)
Pipe-based Context Switching              126314.0 lps   (10.0 s, 2 samples)
Process Creation                            9063.9 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)               12506.3 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                4972.7 lpm   (60.0 s, 1 samples)
System Call Overhead                     1448942.6 lps   (10.0 s, 2 samples)

System Benchmarks Index Values          BASELINE       RESULT     INDEX
Dhrystone 2 using register variables    116700.0   41001790.5    3513.4
Double-Precision Whetstone                  55.0       5036.1     915.7
Execl Throughput                            43.0       4482.0    1042.3
File Copy 1024 bufsize 2000 maxblocks     3960.0     654904.0    1653.8
File Copy 256 bufsize 500 maxblocks       1655.0     173158.0    1046.3
File Copy 4096 bufsize 8000 maxblocks     5800.0    2008222.0    3462.5
Pipe Throughput                          12440.0    1370314.7    1101.5
Pipe-based Context Switching              4000.0     126314.0     315.8
Process Creation                           126.0       9063.9     719.4
Shell Scripts (1 concurrent)                42.4      12506.3    2949.6
Shell Scripts (8 concurrent)                 6.0       4972.7    8287.8
System Call Overhead                     15000.0    1448942.6     966.0
                                                               ========
System Benchmarks Index Score                                    1488.9

------------------------------------------------------------------------
Benchmark Run: Mon Jun 16 2025 23:07:11 - 23:16:11
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables   221753204.3 lps   (10.0 s, 2 samples)
Double-Precision Whetstone                 37215.6 MWIPS (10.0 s, 2 samples)
Execl Throughput                           24319.0 lps   (30.0 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks    3656936.0 KBps  (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks      1016886.0 KBps  (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks    7966493.0 KBps  (30.0 s, 1 samples)
Pipe Throughput                          8211487.8 lps   (10.0 s, 2 samples)
Pipe-based Context Switching             1066013.7 lps   (10.0 s, 2 samples)
Process Creation                           50743.5 lps   (30.0 s, 1 samples)
Shell Scripts (1 concurrent)               43664.4 lpm   (60.0 s, 1 samples)
Shell Scripts (8 concurrent)                5674.7 lpm   (60.0 s, 1 samples)
System Call Overhead                     9320000.0 lps   (10.0 s, 2 samples)

System Benchmarks Index Values          BASELINE       RESULT     INDEX
Dhrystone 2 using register variables    116700.0  221753204.3   19002.0
Double-Precision Whetstone                  55.0      37215.6    6766.5
Execl Throughput                            43.0      24319.0    5655.6
File Copy 1024 bufsize 2000 maxblocks     3960.0    3656936.0    9234.7
File Copy 256 bufsize 500 maxblocks       1655.0    1016886.0    6144.3
File Copy 4096 bufsize 8000 maxblocks     5800.0    7966493.0   13735.3
Pipe Throughput                          12440.0    8211487.8    6600.9
Pipe-based Context Switching              4000.0    1066013.7    2665.0
Process Creation                           126.0      50743.5    4027.3
Shell Scripts (1 concurrent)                42.4      43664.4   10298.2
Shell Scripts (8 concurrent)                 6.0       5674.7    9457.8
System Call Overhead                     15000.0    9320000.0    6213.3
                                                               ========
System Benchmarks Index Score                                    7336.1

Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
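To put the "System Call Overhead" rows in proportion, the sketch below computes the relative change between the before and after results and reproduces the INDEX column. The formula `result / baseline * 10` is inferred from the figures in the tables above, not taken from the benchmark's source, and the helper names `index_value` and `percent_change` are hypothetical.

```c
#include <assert.h>

/*
 * Index as it appears in the tables above: the result relative to the
 * baseline, scaled by 10. (Inferred from the reported numbers.)
 */
static double index_value(double baseline, double result)
{
	assert(baseline > 0.0);
	return result / baseline * 10.0;
}

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

With the reported figures, `percent_change(1448942.6, 1467021.7)` gives roughly 1.2 % for the single-copy run and `percent_change(9320000.0, 9407101.4)` gives roughly 0.9 % for the 8-copy run, which is consistent with the commit message's description of a small improvement.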
1 parent e83cbc9 commit 507d3ad

2 files changed

Lines changed: 4 additions & 1 deletion

arch/loongarch/kernel/entry.S

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ SYM_CODE_START(handle_syscall)
 	addi.d		sp, sp, -PT_SIZE
 	cfi_st		t2, PT_R3
 	cfi_rel_offset	sp, PT_R3
-	st.d		zero, sp, PT_R0
+	# Note it will be set in do_syscall regs->regs[0] = 0;
+	# st.d		zero, sp, PT_R0
 	csrrd		t2, LOONGARCH_CSR_PRMD
 	st.d		t2, sp, PT_PRMD
 	csrrd		t2, LOONGARCH_CSR_CRMD

arch/loongarch/kernel/syscall.c

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ void noinstr do_syscall(struct pt_regs *regs)
 	sys_call_fn syscall_fn;
 
 	nr = regs->regs[11];
+	// Move from handle_syscall macro to save a memio
+	regs->regs[0] = 0;
 	/* Set for syscall restarting */
 	if (nr < NR_syscalls)
 		regs->regs[0] = nr + 1;
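As a rough illustration of why doing the zeroing in C may be cheaper than a separate `st.d` in the entry assembly: once both writes to `regs->regs[0]` sit in the same function, a compiler is free to merge them into a single store of a selected value. The sketch below is a user-space stand-in, not kernel code. `struct demo_regs` and `DEMO_NR_SYSCALLS` are hypothetical substitutes for `struct pt_regs` and `NR_syscalls`, and whether a particular compiler actually performs the merge depends on its version and flags, which the commit does not state.

```c
/* Hypothetical stand-in for NR_syscalls; the real value is not shown here. */
#define DEMO_NR_SYSCALLS 450UL

/* Hypothetical stand-in for struct pt_regs: only the GPR array is modeled. */
struct demo_regs {
	unsigned long regs[32];
};

/* Shape of the patched code: unconditional zero, then conditional overwrite. */
static void set_restart_marker(struct demo_regs *r)
{
	unsigned long nr = r->regs[11];

	r->regs[0] = 0;
	if (nr < DEMO_NR_SYSCALLS)
		r->regs[0] = nr + 1;
}

/* Equivalent single-store form that a compiler may generate from the above. */
static void set_restart_marker_folded(struct demo_regs *r)
{
	unsigned long nr = r->regs[11];

	r->regs[0] = (nr < DEMO_NR_SYSCALLS) ? nr + 1 : 0;
}
```

Both functions leave `regs[0]` at `nr + 1` for an in-range syscall number and at 0 otherwise. When the zeroing lives in hand-written assembly, the compiler cannot see it, so the merge is not available.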
