This is the Opus encoder/decoder adapted for the Nuclei CPU.
The original source code is available here; the current version is v1.5.2.
We did not compile the code into a library as in the original repository, but instead compiled the executable directly from the source code. opus_demo is a test program we designed, which is suitable for running directly on the bare-metal Nuclei CPU.
In opus_demo, we prepare a piece of audio data and first encode it by calling opus_encode, and then decode the encoded data by calling opus_decode to obtain the processed audio data.
- We analyze the processed audio data to ensure the correctness of the opus encoding and decoding functionality;
- We measure the execution time of opus_encode and opus_decode to assess their performance.
| Directory | Description |
|---|---|
| celt | CELT source files, which are part of Opus |
| silk | SILK source files, which are part of Opus |
| include | Opus header files |
| src | Opus source files |
| data | data manipulation source files and some test results |
| reference | the same test code, built to run on an operating system as a reference |
Please refer to the Prerequests section in the README.md of the parent directory.
Opus has both floating-point and fixed-point implementations. Nuclei CPUs support several extensions, such as the B (Bitmanip) extension and the P extension, which can improve the performance of the codec. There are therefore different build options for the floating-point and fixed-point versions.
First, change to the directory where the Makefile is located. We take the Nuclei N300 CPU as an example.
To build the fixed-point version without extensions:

    make CORE=n300 ARCH_EXT= FIXED_POINT=1 all

To build the floating-point version without extensions:

    make CORE=n300fd ARCH_EXT= FIXED_POINT=0 all

To build the fixed-point version with the B and P extensions:

    make CORE=n300 ARCH_EXT=_zba_zbb_zbc_zbs_xxldspn3x FIXED_POINT=1 all

To build the floating-point version with the B and P extensions:

    make CORE=n300fd ARCH_EXT=_zba_zbb_zbc_zbs_xxldspn3x FIXED_POINT=0 all

For more information about Nuclei CPU architecture extensions, please refer to the ARCH_EXT section in the Nuclei SDK documentation.
There are two opus_demo.c files: opus_demo.c is for the Nuclei CPU, and reference/opus_demo.c is for running on an operating system with file I/O.
The test audio is in_1s.wav, a 1-second audio file with a 16 kHz sample rate in PCM_S16LE format. We extract the data from in_1s.wav to obtain in_1s.raw.
For the Nuclei CPU bare-metal environment, we use the xxd tool to convert the raw file in_1s.raw into data stored in data/in_1s.h.
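As a rough illustration of this conversion step, the sketch below imitates the kind of output `xxd -i` produces: a C byte array plus a length variable. The function name `raw_to_c_array` and the array name `in_1s_raw` are assumptions made for this example; the actual layout of data/in_1s.h may differ.

```python
def raw_to_c_array(data: bytes, name: str = "in_1s_raw", per_line: int = 12) -> str:
    """Render bytes as a C unsigned char array, similar in spirit to `xxd -i`.

    `name` and `per_line` are illustrative defaults, not values taken from the repo.
    """
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )


if __name__ == "__main__":
    # Two little-endian 16-bit samples (1 and -1) as a tiny stand-in for in_1s.raw.
    print(raw_to_c_array(b"\x01\x00\xff\xff"))
```

Embedding the samples in a header this way lets the bare-metal program read the audio from memory without any file system.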
When running on the Nuclei CPU, the decoded output data is printed to the console if the DUMP_DEC_RESULT macro in opus_demo.c is uncommented. We then manually save this data to the log files data/test/n300_fixed.txt and data/test/n300_float.txt. You can convert the data to raw files with to_raw.py. The raw files can be easily loaded into Audacity to show the waveforms, as below:
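The log-to-raw conversion might look roughly like the sketch below. It is not the actual to_raw.py: the log format is not described here, so the sketch assumes the log contains only signed decimal sample values separated by whitespace or commas. The function name `samples_to_raw` is hypothetical.

```python
import re
import struct


def samples_to_raw(text: str) -> bytes:
    """Pack every signed decimal integer found in `text` as PCM_S16LE.

    Assumption: the log holds nothing but sample values; any other numbers
    in the text would be misread as samples.
    """
    values = [int(tok) for tok in re.findall(r"-?\d+", text)]
    for v in values:
        if not -32768 <= v <= 32767:
            raise ValueError(f"sample {v} does not fit in 16 bits")
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(values)}h", *values)


if __name__ == "__main__":
    raw = samples_to_raw("1, -1\n256")
    print(raw.hex())
```

The resulting bytes can be written to a file and imported into Audacity as raw data with a signed 16-bit little-endian encoding at 16 kHz.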
Although these four audio clips are not exactly the same, they are very close to each other, and people can hardly tell the difference between them by ear.
The encoder processes 20 ms of audio data at a time, so the 1 s of audio data is divided into 50 frames. The decoder performs the inverse operation, so it also decodes 50 frames. We record the CPU cycles consumed to process each frame and calculate the average, as shown in the following tables.
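The frame arithmetic can be checked with a few lines of Python. The sketch assumes mono audio, which is not stated above, and `split_frames` is a hypothetical helper used only for illustration.

```python
SAMPLE_RATE_HZ = 16_000   # from the test audio description
FRAME_MS = 20             # frame duration handled per encoder call
DURATION_MS = 1_000       # length of in_1s.wav
BYTES_PER_SAMPLE = 2      # PCM_S16LE
CHANNELS = 1              # assumption: mono audio

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
frame_count = DURATION_MS // FRAME_MS
bytes_per_frame = samples_per_frame * BYTES_PER_SAMPLE * CHANNELS


def split_frames(pcm: bytes) -> list:
    """Split raw PCM bytes into complete frames, dropping any trailing partial frame."""
    return [
        pcm[i:i + bytes_per_frame]
        for i in range(0, len(pcm) - bytes_per_frame + 1, bytes_per_frame)
    ]


if __name__ == "__main__":
    silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)  # 1 s of zeros
    print(samples_per_frame, frame_count, len(split_frames(silence)))
```

Under these assumptions each frame holds 320 samples (640 bytes), and one second of audio yields 50 frames.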
To show the benefit of the Nuclei CPU extensions, we compare the CPU cycles consumed with and without the extensions. Without extensions, the build option is ARCH_EXT=; with extensions, the build option is ARCH_EXT=_zba_zbb_zbc_zbs_xxldspn3x.
Test bitstream: n300_dual_best_config_ku060_16M_7cd945994_18d811786_202408191002.bit
These results can be easily calculated by data/bench/cmp.py.
| case | w/o ext (avg cycles) | w/ ext (avg cycles) | speedup ratio |
|---|---|---|---|
| encode | 6951502.84 | 5716036.98 | 1.22 |
| decode | 112640.22 | 106151.44 | 1.06 |
| case | w/o ext (avg cycles) | w/ ext (avg cycles) | speedup ratio |
|---|---|---|---|
| encode | 2889317.46 | 2861932.38 | 1.01 |
| decode | 133390.06 | 126119.08 | 1.06 |
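The speedup ratio in the tables above is the average cycle count without extensions divided by the average with extensions. The sketch below reproduces that arithmetic from the published averages; it is not the actual data/bench/cmp.py, and the helper names `average` and `speedup` are assumptions.

```python
def average(cycles):
    """Mean of the per-frame cycle counts."""
    return sum(cycles) / len(cycles)


def speedup(cycles_without_ext, cycles_with_ext):
    """Ratio greater than 1 means the extension build needs fewer cycles."""
    return cycles_without_ext / cycles_with_ext


# Averages copied from the two tables above (first table, then second table).
results = {
    "table 1 encode": (6951502.84, 5716036.98),
    "table 1 decode": (112640.22, 106151.44),
    "table 2 encode": (2889317.46, 2861932.38),
    "table 2 decode": (133390.06, 126119.08),
}

if __name__ == "__main__":
    for case, (without_ext, with_ext) in results.items():
        print(f"{case}: {speedup(without_ext, with_ext):.2f}")
```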
| operator/function | description | file |
|---|---|---|
| SATURATE16 | replaced with SCLIP32 | fixed_riscv.h:11 |
| MAX32 | replaced with MAXW | fixed_riscv.h:15 |
| MIN32 | replaced with MINW | fixed_riscv.h:18 |
| VSHR32 | replaced with KSLRAW | fixed_riscv.h:21 |
| MULT16_16_Q15 | replaced with KHMBB | fixed_riscv.h:24 |
| PSHR32 | replaced with SRA_U | fixed_riscv.h:27 |
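For readers unfamiliar with these operators, the sketch below models what each one is intended to compute. It follows my reading of the generic C macros in upstream Opus, not the contents of fixed_riscv.h. Python integers are unbounded, so the model ignores 32-bit overflow and any saturation behaviour specific to the Nuclei instructions.

```python
def saturate16(x):
    """Clamp to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def max32(a, b):
    return a if a > b else b


def min32(a, b):
    return a if a < b else b


def vshr32(a, shift):
    """Variable shift: right for positive `shift`, left for negative."""
    return a >> shift if shift > 0 else a << -shift


def mult16_16_q15(a, b):
    """Multiply two Q15 values and return a Q15 result."""
    return (a * b) >> 15


def pshr32(a, shift):
    """Right shift with rounding: add half of the divisor before shifting."""
    return (a + ((1 << shift) >> 1)) >> shift


if __name__ == "__main__":
    # 0.5 * 0.5 in Q15 (16384 represents 0.5) gives 0.25, i.e. 8192.
    print(mult16_16_q15(16384, 16384))
```

A model like this can serve as a reference when checking that an instruction-based replacement returns the same values as the generic macro for typical inputs.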
