Commit 507d3ad
Loongarch: optimize syscall reg save
deepin inclusion
category: performance
This saves one st.d in the hot syscall path and lets
the compiler optimize the register save in asm,
which slightly improves syscall performance.
Tested on a 3A6000.
After patch:
Benchmark Run: Mon Jun 16 2025 20:38:10 - 20:47:09
8 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 47066632.8 lps (10.0 s, 2 samples)
Double-Precision Whetstone 5036.1 MWIPS (10.0 s, 2 samples)
Execl Throughput 4484.2 lps (29.2 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 656586.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 175086.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 1998702.0 KBps (30.0 s, 1 samples)
Pipe Throughput 1365130.7 lps (10.0 s, 2 samples)
Pipe-based Context Switching 126232.9 lps (10.0 s, 2 samples)
Process Creation 9202.7 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 12501.2 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 4974.9 lpm (60.0 s, 1 samples)
System Call Overhead 1467021.7 lps (10.0 s, 2 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 47066632.8 4033.1
Double-Precision Whetstone 55.0 5036.1 915.7
Execl Throughput 43.0 4484.2 1042.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 656586.0 1658.0
File Copy 256 bufsize 500 maxblocks 1655.0 175086.0 1057.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 1998702.0 3446.0
Pipe Throughput 12440.0 1365130.7 1097.4
Pipe-based Context Switching 4000.0 126232.9 315.6
Process Creation 126.0 9202.7 730.4
Shell Scripts (1 concurrent) 42.4 12501.2 2948.4
Shell Scripts (8 concurrent) 6.0 4974.9 8291.5
System Call Overhead 15000.0 1467021.7 978.0
========
System Benchmarks Index Score 1510.2
------------------------------------------------------------------------
Benchmark Run: Mon Jun 16 2025 20:47:09 - 20:56:08
8 CPUs in system; running 8 parallel copies of tests
Dhrystone 2 using register variables 221748966.2 lps (10.0 s, 2 samples)
Double-Precision Whetstone 37218.5 MWIPS (10.0 s, 2 samples)
Execl Throughput 24364.4 lps (29.0 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 3681637.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 1020033.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 8054794.0 KBps (30.0 s, 1 samples)
Pipe Throughput 8209249.1 lps (10.0 s, 2 samples)
Pipe-based Context Switching 1058150.7 lps (10.0 s, 2 samples)
Process Creation 49636.4 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 43521.6 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5672.4 lpm (60.0 s, 1 samples)
System Call Overhead 9407101.4 lps (10.0 s, 2 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 221748966.2 19001.6
Double-Precision Whetstone 55.0 37218.5 6767.0
Execl Throughput 43.0 24364.4 5666.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 3681637.0 9297.1
File Copy 256 bufsize 500 maxblocks 1655.0 1020033.0 6163.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 8054794.0 13887.6
Pipe Throughput 12440.0 8209249.1 6599.1
Pipe-based Context Switching 4000.0 1058150.7 2645.4
Process Creation 126.0 49636.4 3939.4
Shell Scripts (1 concurrent) 42.4 43521.6 10264.5
Shell Scripts (8 concurrent) 6.0 5672.4 9454.0
System Call Overhead 15000.0 9407101.4 6271.4
========
System Benchmarks Index Score 7335.3
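For readers unfamiliar with the INDEX column: each value in the tables above equals RESULT / BASELINE * 10, and the reported Index Score matches the geometric mean of the per-test indices. The sketch below reproduces the 1-copy "After patch" score from the figures quoted above. The names `ROWS`, `index` and `index_score` are illustrative and not part of UnixBench or of this patch.

```python
import math

# (baseline, result) pairs copied from the 1-copy "After patch" index table.
ROWS = {
    "Dhrystone 2 using register variables": (116700.0, 47066632.8),
    "Double-Precision Whetstone": (55.0, 5036.1),
    "Execl Throughput": (43.0, 4484.2),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 656586.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 175086.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 1998702.0),
    "Pipe Throughput": (12440.0, 1365130.7),
    "Pipe-based Context Switching": (4000.0, 126232.9),
    "Process Creation": (126.0, 9202.7),
    "Shell Scripts (1 concurrent)": (42.4, 12501.2),
    "Shell Scripts (8 concurrent)": (6.0, 4974.9),
    "System Call Overhead": (15000.0, 1467021.7),
}


def index(baseline, result):
    """Per-test index: result relative to the baseline, scaled by 10."""
    return result / baseline * 10.0


def index_score(rows):
    """Geometric mean of the per-test indices."""
    logs = [math.log(index(b, r)) for b, r in rows.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in ROWS.items():
        print(f"{name:40s} {index(baseline, result):8.1f}")
    print(f"Index Score: {index_score(ROWS):.1f}")
```

Because the score is a geometric mean over all twelve tests, a change in a single test such as System Call Overhead moves the overall score only slightly.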
Before patch:
Benchmark Run: Mon Jun 16 2025 22:58:12 - 23:07:11
8 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 41001790.5 lps (10.0 s, 2 samples)
Double-Precision Whetstone 5036.1 MWIPS (10.0 s, 2 samples)
Execl Throughput 4482.0 lps (29.6 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 654904.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 173158.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 2008222.0 KBps (30.0 s, 1 samples)
Pipe Throughput 1370314.7 lps (10.0 s, 2 samples)
Pipe-based Context Switching 126314.0 lps (10.0 s, 2 samples)
Process Creation 9063.9 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 12506.3 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 4972.7 lpm (60.0 s, 1 samples)
System Call Overhead 1448942.6 lps (10.0 s, 2 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 41001790.5 3513.4
Double-Precision Whetstone 55.0 5036.1 915.7
Execl Throughput 43.0 4482.0 1042.3
File Copy 1024 bufsize 2000 maxblocks 3960.0 654904.0 1653.8
File Copy 256 bufsize 500 maxblocks 1655.0 173158.0 1046.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 2008222.0 3462.5
Pipe Throughput 12440.0 1370314.7 1101.5
Pipe-based Context Switching 4000.0 126314.0 315.8
Process Creation 126.0 9063.9 719.4
Shell Scripts (1 concurrent) 42.4 12506.3 2949.6
Shell Scripts (8 concurrent) 6.0 4972.7 8287.8
System Call Overhead 15000.0 1448942.6 966.0
========
System Benchmarks Index Score 1488.9
------------------------------------------------------------------------
Benchmark Run: Mon Jun 16 2025 23:07:11 - 23:16:11
8 CPUs in system; running 8 parallel copies of tests
Dhrystone 2 using register variables 221753204.3 lps (10.0 s, 2 samples)
Double-Precision Whetstone 37215.6 MWIPS (10.0 s, 2 samples)
Execl Throughput 24319.0 lps (30.0 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 3656936.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 1016886.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 7966493.0 KBps (30.0 s, 1 samples)
Pipe Throughput 8211487.8 lps (10.0 s, 2 samples)
Pipe-based Context Switching 1066013.7 lps (10.0 s, 2 samples)
Process Creation 50743.5 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 43664.4 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5674.7 lpm (60.0 s, 1 samples)
System Call Overhead 9320000.0 lps (10.0 s, 2 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 221753204.3 19002.0
Double-Precision Whetstone 55.0 37215.6 6766.5
Execl Throughput 43.0 24319.0 5655.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 3656936.0 9234.7
File Copy 256 bufsize 500 maxblocks 1655.0 1016886.0 6144.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 7966493.0 13735.3
Pipe Throughput 12440.0 8211487.8 6600.9
Pipe-based Context Switching 4000.0 1066013.7 2665.0
Process Creation 126.0 50743.5 4027.3
Shell Scripts (1 concurrent) 42.4 43664.4 10298.2
Shell Scripts (8 concurrent) 6.0 5674.7 9457.8
System Call Overhead 15000.0 9320000.0 6213.3
========
System Benchmarks Index Score 7336.1
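To put the "slight" improvement in numbers, the following sketch compares the System Call Overhead results (lps) before and after the patch, using only the figures quoted above. The helper `percent_change` and the `SYSCALL_LPS` table are illustrative names, not part of the patch or of UnixBench.

```python
# System Call Overhead in lps, copied from the runs above: (before, after).
SYSCALL_LPS = {
    "1 copy": (1448942.6, 1467021.7),
    "8 copies": (9320000.0, 9407101.4),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for config, (before, after) in SYSCALL_LPS.items():
        print(f"{config}: {percent_change(before, after):+.2f}%")
```

This gives roughly +1.25% for 1 copy and +0.93% for 8 copies. Each figure comes from a single run with 2 samples, and other tests differ by more than that between the two runs (the 1-copy Dhrystone result differs by about 15%), so the gain is best read as indicative and not as a precise measurement.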
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
1 parent e83cbc9 commit 507d3ad
2 files changed
Lines changed: 4 additions & 1 deletion
(Diff body not preserved. File 1: original line 33 replaced by new lines 33-34. File 2: two lines added after original line 46, as new lines 47-48.)