49 commits
09d19a0
Add benchmark/test harness and CI for FEC and CRC32 vectorization
connollydavid Feb 20, 2026
eed6419
Replace CRC32 (zlib) with CRC32C (Castagnoli) in production packet path
connollydavid Feb 20, 2026
d14f368
Vectorize addmul1 with SSSE3 (x86_64) and NEON (aarch64) nibble lookup
connollydavid Feb 20, 2026
68b17d9
Document scalar fallback as deliberately not auto-vectorizable
connollydavid Feb 20, 2026
3ef8b99
Add AVX2 addmul1 path with runtime CPUID dispatch
connollydavid Feb 20, 2026
385ee3d
Refactor packet cook pipeline into context struct for benchmarking
connollydavid Feb 20, 2026
7488a5b
Vectorize XOR loops in packet cook with SSE2 and NEON
connollydavid Feb 20, 2026
30bb2e2
Eliminate per-call malloc in fec_decode
connollydavid Feb 20, 2026
241041d
Add OPTIMIZATION.md documenting SIMD vectorization results
connollydavid Feb 20, 2026
107d3e4
Add CI static builds and profiling script for target hardware
connollydavid Feb 20, 2026
aac95f8
Add GitHub Releases workflow and project plan
connollydavid Feb 25, 2026
d95ea86
Add .gitignore to exclude private .claude directory
connollydavid Feb 25, 2026
6763602
Add end-to-end tunnel throughput harness and CI workflow
connollydavid Feb 25, 2026
e7410a4
Fix throughput.sh permission denied in CI
connollydavid Feb 25, 2026
58fe765
Fix JSON output in throughput harness
connollydavid Feb 25, 2026
e080e0f
Fix throughput.sh: wait for receiver instead of killing it
connollydavid Feb 25, 2026
3965702
Add baseline results to throughput benchmark tracking
connollydavid Feb 25, 2026
37808a2
Switch throughput unit from MB/s to Mbps
connollydavid Feb 25, 2026
a636b45
Improve benchmark harness stability and accuracy
connollydavid Feb 25, 2026
649fd4d
Reduce per-packet overhead: zero-copy conv header, skip delay memcpy,…
connollydavid Feb 25, 2026
00ba006
Fix inverted throughput alert threshold
connollydavid Feb 25, 2026
77eb8d3
Fix const sockaddr* build error in extracted functions
connollydavid Feb 25, 2026
452493e
Add io_uring multishot receive integration
connollydavid Feb 25, 2026
a623c5d
Fix io_uring build: C++ atomics, remove struct fallbacks
connollydavid Feb 25, 2026
a4d32c3
Fix crash: validate IV length in de_obscure against configured range
connollydavid Feb 26, 2026
bb36496
Fix io_uring recvmsg parsing, optimize CQ/buffer batching
connollydavid Feb 26, 2026
7b8b1ee
Document io_uring optimization: +27% throughput over recvfrom
connollydavid Feb 26, 2026
a563c5e
Batch FEC output sends with sendmmsg to reduce syscall overhead
connollydavid Feb 26, 2026
de1d99f
Replace std::map with flat array in FEC decode hot path
connollydavid Feb 26, 2026
fd7c555
Zero-copy io_uring recv: eliminate per-packet memcpy for conv header
connollydavid Feb 26, 2026
f9bb478
Replace anti_replay unordered_map with direct-mapped table
connollydavid Feb 26, 2026
a0e8d1f
Replace fec_group unordered_map with direct-mapped flat table
connollydavid Feb 26, 2026
6044a17
Update OPTIMIZATION.md with full optimization series results
connollydavid Feb 26, 2026
2f0ac6a
Add PowerPC e500v2 SPE support for XOR cook pipeline
connollydavid Feb 26, 2026
2eac606
Document PPC e500v2 results and cross-architecture notes
connollydavid Feb 26, 2026
b7ebb37
Harden codebase: fix UB, validate inputs, expand test coverage
connollydavid Feb 27, 2026
fbdbbb8
Add cross-architecture interop tests with MIPS and RISC-V support
connollydavid Feb 27, 2026
d3470d5
Eliminate remaining hot-path overhead: targeted sendmmsg init, skip r…
connollydavid Feb 27, 2026
6e63bba
Document final micro-optimizations and RS rewrite analysis
connollydavid Feb 27, 2026
1f22e31
Fix __BYTE_ORDER redefinition warning on musl toolchains
connollydavid Feb 27, 2026
ec732d1
Add -lgcc_eh to cross targets for musl static linking
connollydavid Feb 27, 2026
ba80f87
Auto-dump tunnel logs on interop test failure
connollydavid Feb 27, 2026
23cf4a2
Fix interop log capture: UDPspeeder logs to stdout, not stderr
connollydavid Feb 27, 2026
41ed115
Fix SPE xor_tile alignment bug + diagnostic CI for PPC interop
connollydavid Feb 27, 2026
09eb177
Remove diagnostic trace logging for PPC interop tests
connollydavid Feb 27, 2026
73a6487
Add AVX-512BW addmul1 and XOR cook with runtime CPUID dispatch
connollydavid Feb 28, 2026
f16fa22
Print detected SIMD tier in benchmark output
connollydavid Feb 28, 2026
979b8f5
Unroll NEON addmul1 and XOR cook loops 2x for ILP
connollydavid Feb 28, 2026
3374e3b
Add workflow_dispatch trigger to CI for manual runs
connollydavid Feb 28, 2026
300 changes: 300 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,300 @@
name: CI

on:
  push:
    branches: [branch_libev, master]
  pull_request:
    branches: [branch_libev, master]
  workflow_dispatch:

permissions:
  contents: write
  deployments: write

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run correctness tests
        run: make test

      - name: Build benchmarks
        run: make bench

      - name: Run benchmarks
        run: ./bench_udpspeeder

      - name: Run benchmarks (JSON)
        run: taskset -c 0 ./bench_udpspeeder --json

      - name: Build production binary
        run: make all

      - name: Throughput test (io_uring)
        run: bash bench/throughput.sh ./speederv2 --iterations 3 --duration 5

      - name: Throughput test (recvfrom baseline)
        run: UDPSPEEDER_NO_URING=1 bash bench/throughput.sh ./speederv2 --iterations 3 --duration 5

      - name: Store benchmark results
        if: github.ref == 'refs/heads/branch_libev'
        uses: benchmark-action/github-action-benchmark@v1
        with:
          name: UDPspeeder Benchmarks
          tool: customSmallerIsBetter
          output-file-path: bench_results.json
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: true
          alert-threshold: '115%'
          comment-on-alert: true
          fail-on-alert: false
          benchmark-data-dir-path: dev/bench

  build-static:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - name: x86_64
            packages: ""
            toolchain_url: ""
            bench_target: bench-static
            test_target: test-static
            prod_target: all
            make_args: ""
            bench_bin: bench_udpspeeder_static
            test_bin: test_udpspeeder_static
            prod_bin: speederv2
            qemu_cmd: ""
          - name: aarch64
            packages: g++-aarch64-linux-gnu qemu-user-static
            toolchain_url: ""
            bench_target: bench-cross
            test_target: test-cross
            prod_target: all-cross
            make_args: "CC=aarch64-linux-gnu-g++"
            bench_bin: bench_udpspeeder_cross
            test_bin: test_udpspeeder_cross
            prod_bin: speederv2_cross
            qemu_cmd: qemu-aarch64-static
          - name: mips
            packages: g++-mips-linux-gnu qemu-user-static
            toolchain_url: ""
            bench_target: bench-cross
            test_target: test-cross
            prod_target: all-cross
            make_args: "CC=mips-linux-gnu-g++"
            bench_bin: bench_udpspeeder_cross
            test_bin: test_udpspeeder_cross
            prod_bin: speederv2_cross
            qemu_cmd: qemu-mips-static
          - name: powerpc
            packages: qemu-user-static zstd
            toolchain_url: "https://downloads.openwrt.org/releases/25.12.0-rc5/targets/mpc85xx/p1010/openwrt-toolchain-25.12.0-rc5-mpc85xx-p1010_gcc-14.3.0_musl.Linux-x86_64.tar.zst"
            bench_target: bench-cross
            test_target: test-cross
            prod_target: all-cross
            make_args: "SPE=1"
            bench_bin: bench_udpspeeder_cross
            test_bin: test_udpspeeder_cross
            prod_bin: speederv2_cross
            qemu_cmd: "qemu-ppc-static -cpu e500v2"
          - name: riscv64
            packages: qemu-user-static zstd
            toolchain_url: "https://downloads.openwrt.org/releases/24.10.0/targets/sifiveu/generic/openwrt-toolchain-24.10.0-sifiveu-generic_gcc-13.3.0_musl.Linux-x86_64.tar.zst"
            bench_target: bench-cross
            test_target: test-cross
            prod_target: all-cross
            make_args: ""
            bench_bin: bench_udpspeeder_cross
            test_bin: test_udpspeeder_cross
            prod_bin: speederv2_cross
            qemu_cmd: qemu-riscv64-static
    steps:
      - uses: actions/checkout@v4

      - name: Install packages
        if: matrix.packages != ''
        run: sudo apt-get update && sudo apt-get install -y ${{ matrix.packages }}

      - name: Download OpenWrt toolchain
        if: matrix.toolchain_url != ''
        run: |
          curl -fSL "${{ matrix.toolchain_url }}" -o toolchain.tar.zst
          mkdir -p /tmp/openwrt-toolchain
          tar --zstd -xf toolchain.tar.zst -C /tmp/openwrt-toolchain --strip-components=1
          OPENWRT_GXX=$(find /tmp/openwrt-toolchain -name '*-g++' -path '*/bin/*' | head -1)
          echo "OPENWRT_GXX=${OPENWRT_GXX}" >> "$GITHUB_ENV"
          echo "STAGING_DIR=/tmp/openwrt-toolchain" >> "$GITHUB_ENV"
          echo "Found toolchain: ${OPENWRT_GXX}"

      - name: Build bench (${{ matrix.name }})
        run: |
          ARGS="${{ matrix.make_args }}"
          if [ -n "${OPENWRT_GXX:-}" ]; then
            ARGS="CC=${OPENWRT_GXX} ${ARGS}"
          fi
          make ${{ matrix.bench_target }} ${ARGS}

      - name: Build test (${{ matrix.name }})
        run: |
          ARGS="${{ matrix.make_args }}"
          if [ -n "${OPENWRT_GXX:-}" ]; then
            ARGS="CC=${OPENWRT_GXX} ${ARGS}"
          fi
          make ${{ matrix.test_target }} ${ARGS}

      - name: Build production (${{ matrix.name }})
        run: |
          ARGS="${{ matrix.make_args }}"
          if [ -n "${OPENWRT_GXX:-}" ]; then
            ARGS="CC=${OPENWRT_GXX} ${ARGS}"
          fi
          make ${{ matrix.prod_target }} ${ARGS}

      - name: Verify binaries
        run: file ${{ matrix.bench_bin }} ${{ matrix.test_bin }} ${{ matrix.prod_bin }}

      - name: Run tests (QEMU)
        if: matrix.qemu_cmd != ''
        run: ${{ matrix.qemu_cmd }} ./${{ matrix.test_bin }}

      - name: Run benchmarks (QEMU)
        if: matrix.qemu_cmd != ''
        run: ${{ matrix.qemu_cmd }} ./${{ matrix.bench_bin }}

      - name: Run benchmarks JSON (QEMU)
        if: matrix.qemu_cmd != '' && matrix.name == 'powerpc'
        run: ${{ matrix.qemu_cmd }} ./${{ matrix.bench_bin }} --json

      - name: Store PPC benchmark results
        if: matrix.name == 'powerpc' && github.ref == 'refs/heads/branch_libev'
        uses: benchmark-action/github-action-benchmark@v1
        with:
          name: UDPspeeder Benchmarks (PowerPC e500v2 via QEMU)
          tool: customSmallerIsBetter
          output-file-path: bench_results.json
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: true
          alert-threshold: '200%'
          comment-on-alert: true
          fail-on-alert: false
          benchmark-data-dir-path: dev/bench-powerpc

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: udpspeeder-${{ matrix.name }}
          path: |
            ${{ matrix.bench_bin }}
            ${{ matrix.test_bin }}
            ${{ matrix.prod_bin }}
            bench/profile.sh

  interop:
    runs-on: ubuntu-latest
    needs: [build-static]
    steps:
      - uses: actions/checkout@v4

      - name: Install QEMU
        run: sudo apt-get update && sudo apt-get install -y qemu-user-static

      - name: Download x86_64 artifact
        uses: actions/download-artifact@v4
        with:
          name: udpspeeder-x86_64
          path: bin/x86_64

      - name: Download aarch64 artifact
        uses: actions/download-artifact@v4
        with:
          name: udpspeeder-aarch64
          path: bin/aarch64

      - name: Download mips artifact
        uses: actions/download-artifact@v4
        with:
          name: udpspeeder-mips
          path: bin/mips

      - name: Download powerpc artifact
        uses: actions/download-artifact@v4
        with:
          name: udpspeeder-powerpc
          path: bin/powerpc

      - name: Download riscv64 artifact
        uses: actions/download-artifact@v4
        with:
          name: udpspeeder-riscv64
          path: bin/riscv64

      - name: Set executable permissions
        run: chmod +x bin/*/speederv2 bin/*/speederv2_cross

      - name: Run cross-architecture interop tests
        run: |
          set -e
          PASS=0
          FAIL=0

          X86="bin/x86_64/speederv2"
          ARM="qemu-aarch64-static bin/aarch64/speederv2_cross"
          MIPS="qemu-mips-static bin/mips/speederv2_cross"
          PPC="qemu-ppc-static -cpu e500v2 bin/powerpc/speederv2_cross"
          RV64="qemu-riscv64-static bin/riscv64/speederv2_cross"

          TESTS=(
            "x86-server_arm-client|$X86|$ARM"
            "arm-server_x86-client|$ARM|$X86"
            "x86-server_mips-client|$X86|$MIPS"
            "mips-server_x86-client|$MIPS|$X86"
            "x86-server_ppc-client|$X86|$PPC"
            "ppc-server_x86-client|$PPC|$X86"
            "x86-server_rv64-client|$X86|$RV64"
            "rv64-server_x86-client|$RV64|$X86"
          )

          CONFIGS=(
            "--disable-fec|no-fec"
            "--disable-fec --key testkey123|no-fec-key"
            "--fec 20:10|fec-20-10"
            "--fec 20:10 --key testkey123|fec-20-10-key"
          )

          for entry in "${TESTS[@]}"; do
            IFS='|' read -r pair_name server_cmd client_cmd <<< "$entry"
            for cfg in "${CONFIGS[@]}"; do
              IFS='|' read -r cfg_args cfg_label <<< "$cfg"
              label="${pair_name}/${cfg_label}"

              echo ""
              echo "=========================================="
              echo " TESTING: $label"
              echo "=========================================="

              if bash bench/interop.sh \
                  --server-cmd "$server_cmd" \
                  --client-cmd "$client_cmd" \
                  $cfg_args \
                  --label "$label" \
                  --packets 200; then
                PASS=$((PASS + 1))
              else
                FAIL=$((FAIL + 1))
                echo "^^^ FAILED: $label ^^^"
              fi
            done
          done

          echo ""
          echo "=========================================="
          echo " INTEROP RESULTS: $PASS passed, $FAIL failed"
          echo "=========================================="

          if [[ $FAIL -ne 0 ]]; then
            exit 1
          fi
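
The interop step packs each test case into one pipe-delimited string and splits it with `IFS='|' read -r`, so fields that contain spaces (a QEMU command line, a set of flags) survive intact. A minimal sketch of that splitting follows. It is written in portable sh with newline-separated lists rather than the workflow's bash arrays, and the function name `expand_matrix` and the `server-x86` / `client-arm` command strings are placeholders for illustration, not taken from the workflow.

```shell
# Sketch only: shows how pipe-delimited entries expand into one run per
# (pair, config) combination. Nothing is launched; each run is just printed.
expand_matrix() {
  # Hypothetical, shortened stand-ins for the TESTS and CONFIGS arrays.
  tests='x86-server_arm-client|server-x86|client-arm
arm-server_x86-client|server-arm|client-x86'
  configs='--disable-fec|no-fec
--fec 20:10 --key testkey123|fec-20-10-key'

  printf '%s\n' "$tests" |
  while IFS='|' read -r pair_name server_cmd client_cmd; do
    printf '%s\n' "$configs" |
    while IFS='|' read -r cfg_args cfg_label; do
      # IFS is '|' only for the read, so spaces inside a field are kept.
      printf '%s/%s server=[%s] client=[%s] args=[%s]\n' \
        "$pair_name" "$cfg_label" "$server_cmd" "$client_cmd" "$cfg_args"
    done
  done
}

expand_matrix
```

With two pairs and two configurations the sketch prints four lines. By the same arithmetic, the eight pairs and four configurations listed in the workflow give 32 calls to `bench/interop.sh`.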
73 changes: 73 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,73 @@
name: Release

on:
  push:
    tags:
      - 'v*'

permissions:
  contents: write

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run tests
        run: make test

  build:
    needs: test
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          - name: x86_64
            packages: ""
            target: all
            make_args: ""
            src_bin: speederv2
            release_bin: speederv2_linux_x86_64
          - name: aarch64
            packages: g++-aarch64-linux-gnu
            target: all-cross
            make_args: "CC=aarch64-linux-gnu-g++"
            src_bin: speederv2_cross
            release_bin: speederv2_linux_aarch64
    steps:
      - uses: actions/checkout@v4

      - name: Install toolchain
        if: matrix.packages != ''
        run: sudo apt-get update && sudo apt-get install -y ${{ matrix.packages }}

      - name: Build production binary
        run: make ${{ matrix.target }} ${{ matrix.make_args }}

      - name: Verify binary
        run: file ${{ matrix.src_bin }}

      - name: Rename binary
        run: mv ${{ matrix.src_bin }} ${{ matrix.release_bin }}

      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.release_bin }}
          path: ${{ matrix.release_bin }}

  release:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Download all artifacts
        uses: actions/download-artifact@v4
        with:
          path: artifacts

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          files: artifacts/**/*
          generate_release_notes: true
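
The release job downloads every artifact into `artifacts` without naming one, then attaches `artifacts/**/*`. Assuming the `actions/download-artifact@v4` default of one subdirectory per artifact name when no `name` is given, the layout can be simulated locally as below. The function name `simulate_release_files` is hypothetical, and the files are empty placeholders rather than real binaries.

```shell
# Sketch only: recreates the assumed post-download layout in a temp directory
# and lists the files a recursive glob such as artifacts/**/* would reach.
simulate_release_files() {
  workdir=$(mktemp -d)
  for name in speederv2_linux_x86_64 speederv2_linux_aarch64; do
    # Assumption: each artifact lands in a directory named after the artifact.
    mkdir -p "$workdir/artifacts/$name"
    : > "$workdir/artifacts/$name/$name"
  done
  (cd "$workdir" && find artifacts -type f | sort)
  rm -rf "$workdir"
}

simulate_release_files
```

Because the build job renames each binary to its `release_bin` before upload, the file names stay unique across artifacts even though each file sits in its own subdirectory.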