Performance Philosophy

RustAPI is built on a simple premise: Abstractions shouldn't cost you runtime performance.

We leverage Rust's ownership system and its modern async ecosystem (Tokio, Hyper) to deliver performance that rivals C++ servers while keeping Rust's safety guarantees.

The Pillars of Speed

1. Zero-Copy Networking

Where possible, RustAPI avoids copying memory. When you receive a large JSON payload or file upload, we aim to pass pointers to the underlying memory buffer rather than cloning the data.

Bytes over Vec<u8>: We use the bytes crate extensively. Cloning a Bytes handle is O(1) (it copies a pointer and length and bumps a reference count), whereas cloning a Vec<u8> is O(n).
String View: Extractors like Path and Query often leverage Cow<'_, str> (clone-on-write) to avoid allocations if the data doesn't need to be modified.
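
As a rough illustration of both bullets, the sketch below uses only the standard library. `decode_plus` and `share` are hypothetical helpers, not RustAPI functions, and `Arc<[u8]>` stands in for `Bytes` here only to show the shared-buffer idea without pulling in the bytes crate.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Decodes '+' as a space, as in form-encoded query strings.
/// Borrows the input when nothing needs rewriting; allocates only otherwise.
fn decode_plus(raw: &str) -> Cow<'_, str> {
    if raw.contains('+') {
        Cow::Owned(raw.replace('+', " "))
    } else {
        Cow::Borrowed(raw)
    }
}

/// Hands out a second handle to the same buffer without copying the bytes.
/// Only the reference count changes, so the cost does not depend on length.
fn share(buf: &Arc<[u8]>) -> Arc<[u8]> {
    Arc::clone(buf)
}
```

Calling `decode_plus("rust")` yields a borrowed value that points at the caller's string, while `decode_plus("a+b")` has to allocate a new `String`. That is the trade Cow makes: the common unmodified case stays allocation-free.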

2. Multi-Core Async Runtime

RustAPI runs on Tokio, a work-stealing, multi-threaded runtime.

Non-blocking I/O: A single thread can handle thousands of concurrent idle connections (e.g., WebSockets waiting for messages) with minimal memory overhead.
Work Stealing: If one CPU core is overloaded with tasks, other idle cores will "steal" work from its queue, ensuring balanced utilization of your hardware.

3. Compile-Time Router

Our router (matchit) is based on a Radix Trie structure.

O(k) Lookup: Route matching time scales with the length of the URL path (k), not with the number of routes defined (n). Having 10 routes or 10,000 routes has negligible impact on routing latency.
Allocation-Free Matching: For standard paths, routing decisions happen without heap allocations.
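
To make the lookup-cost claim concrete, here is a deliberately simplified segment trie. It is a sketch, not matchit's implementation: a real radix trie compresses shared prefixes and supports path parameters, and the `Router` and `Node` types below are invented for illustration.

```rust
use std::collections::HashMap;

/// One node per path segment; a simplified stand-in for a radix trie.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    /// Name of the handler registered at this exact path, if any.
    handler: Option<&'static str>,
}

#[derive(Default)]
struct Router {
    root: Node,
}

impl Router {
    /// Registers `handler` under `path`, creating intermediate nodes as needed.
    fn insert(&mut self, path: &str, handler: &'static str) {
        let mut node = &mut self.root;
        for seg in path.split('/').filter(|s| !s.is_empty()) {
            node = node.children.entry(seg.to_string()).or_default();
        }
        node.handler = Some(handler);
    }

    /// Visits one node per segment of `path`, so the work is bounded by the
    /// path length rather than by how many routes were registered.
    /// Splitting is lazy and borrows from `path`, so lookup does not allocate.
    fn lookup(&self, path: &str) -> Option<&'static str> {
        let mut node = &self.root;
        for seg in path.split('/').filter(|s| !s.is_empty()) {
            node = node.children.get(seg)?;
        }
        node.handler
    }
}
```

Registering thousands of additional routes widens the tree but does not add steps to `lookup` for an existing path, which is the property the bullets above describe.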

Memory Management

Stack vs. Heap

RustAPI encourages stack allocation for small, short-lived data.

Extractors are often allocated on the stack.
Response bodies are streamed, meaning a 1 GB file download doesn't require 1 GB of RAM. The data flows through a small, constant-sized buffer.
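
The constant-memory idea can be sketched with synchronous standard-library I/O. This is not RustAPI's async body type; `stream_copy` and the 8 KiB buffer size are assumptions chosen for illustration.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Copies everything from `src` to `dst` through a fixed 8 KiB stack buffer
/// and returns the number of bytes copied. Memory use stays constant no
/// matter how large the source is.
fn stream_copy<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<u64> {
    let mut buf = [0u8; 8 * 1024];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // A read interrupted by a signal is safe to retry.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

With a `File` as the source and a socket as the destination, the same loop would move a multi-gigabyte file while only ever holding one buffer's worth of it in memory.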

Connection Pooling

For database performance, we strongly recommend using connection pooling (e.g., sqlx::Pool).

Reuse: Establishing a TCP connection and performing a TLS handshake for every request is slow. Pooling keeps connections open and ready.
Multiplexing: Some drivers allow multiple queries to be in-flight on a single connection simultaneously.
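
The reuse idea can be shown with a toy pool built on the standard library. This is only a conceptual sketch with invented `Pool` and `Pooled` types. In a real application, use your driver's pool (such as the sqlx pool mentioned above), which also handles timeouts, health checks, and async waiting.

```rust
use std::ops::Deref;
use std::sync::{Arc, Mutex};

/// A fixed set of already-open connections shared between callers.
struct Pool<T> {
    idle: Arc<Mutex<Vec<T>>>,
}

/// A checked-out connection that returns itself to the pool when dropped.
struct Pooled<T> {
    conn: Option<T>,
    idle: Arc<Mutex<Vec<T>>>,
}

impl<T> Pool<T> {
    fn new(conns: Vec<T>) -> Self {
        Pool { idle: Arc::new(Mutex::new(conns)) }
    }

    /// Takes an idle connection, or returns None if all are in use.
    fn checkout(&self) -> Option<Pooled<T>> {
        let conn = self.idle.lock().unwrap().pop()?;
        Some(Pooled { conn: Some(conn), idle: Arc::clone(&self.idle) })
    }

    fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

impl<T> Deref for Pooled<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.conn.as_ref().expect("connection is present until drop")
    }
}

impl<T> Drop for Pooled<T> {
    fn drop(&mut self) {
        // Put the connection back instead of closing it.
        if let (Some(conn), Ok(mut idle)) = (self.conn.take(), self.idle.lock()) {
            idle.push(conn);
        }
    }
}
```

The connection is opened once and handed out many times, so the setup cost described under "Reuse" is paid once per connection rather than once per request.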

Optimizing Your App

To get the most out of RustAPI, follow these guidelines:

Avoid Blocking the Async Executor: Never run CPU-intensive tasks (cryptography, image processing) or blocking I/O (std::fs::read) directly in an async handler.

Solution: Use tokio::task::spawn_blocking to offload these to a dedicated thread pool.

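// `tough_crypto_hash` is a placeholder for any CPU-heavy function, and the
// `data: Vec<u8>` parameter is an assumed input used only for illustration.

// BAD: Blocks the executor thread, potentially stalling other requests
async fn handler(data: Vec<u8>) {
    let digest = tough_crypto_hash(data);
}

// GOOD: Runs on a thread meant for blocking work
async fn handler(data: Vec<u8>) {
    let digest = tokio::task::spawn_blocking(move || {
        tough_crypto_hash(data)
    })
    .await
    .unwrap(); // Panics only if the blocking task itself panicked
}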

JSON Serialization: While serde is fast, JSON text processing is CPU-heavy. For extremely high-throughput endpoints, consider binary formats like Protobuf or MessagePack if the client supports them.

Keep State Light: Your State struct is cloned for every request. Wrap large shared data in Arc<T> so only the pointer is cloned, not the data itself.

// Fast: cloning AppState only bumps reference counts
#[derive(Clone)]
struct AppState {
    db: PgPool,                // Internally uses Arc
    config: Arc<Config>,       // Wrapped in Arc manually
}
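
A quick way to convince yourself that an Arc-wrapped field is shared rather than copied is to compare pointers after a clone. The sketch below uses only the standard library. The `Config` fields and the `per_request_clone` helper are made up for illustration, and the `db` field is left out so the example needs no database crate.

```rust
use std::sync::Arc;

/// Stand-in for a large, read-only configuration value.
struct Config {
    allowed_origins: Vec<String>,
}

#[derive(Clone)]
struct AppState {
    config: Arc<Config>,
}

/// Mimics what happens when the framework clones state for a request:
/// the Arc is cloned, the Config behind it is not.
fn per_request_clone(state: &AppState) -> AppState {
    state.clone()
}
```

After `per_request_clone`, `Arc::ptr_eq(&original.config, &copy.config)` holds, which shows that both states point at the same `Config` allocation.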

Benchmarking

Performance is not a guessing game, but it is very easy to misquote stale numbers.

For that reason, RustAPI keeps its benchmark publication policy and canonical claims in docs/PERFORMANCE_BENCHMARKS.md.

Use that document for:

the current benchmark source of truth,
publication rules for new public claims,
local and CI benchmark entry points, and
historical-vs-current benchmark context.

Run benchmarks locally

From the repository root:

./scripts/bench.ps1

That currently executes cargo bench --workspace.

CI benchmark path

The repository also includes .github/workflows/benchmark.yml, which runs the same benchmark command and uploads the raw benchmark output as an artifact.

What to publish with benchmark results

Whenever you publish new numbers, include at minimum:

hardware and OS
Rust toolchain version
command and workload description
enabled feature flags
throughput plus p50, p95, and p99 latency
memory usage when available
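
If your load tool reports raw latency samples rather than percentiles, the figures above can be derived with the nearest-rank method. The helper below is a minimal sketch, and `percentile` is a hypothetical name. Other tools may interpolate between samples and report slightly different values, so state the method you used alongside the numbers.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of all samples are less than or equal to it. For `p = 0` this
/// returns the minimum. Returns None for empty input or when `p` exceeds 100.
fn percentile(samples: &[u64], p: usize) -> Option<u64> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Integer ceiling of p * n / 100, clamped so that rank is at least 1.
    let rank = ((p * sorted.len() + 99) / 100).max(1);
    Some(sorted[rank - 1])
}
```

Integer arithmetic is used for the rank so that the result does not depend on floating-point rounding at exact boundaries such as p95 of 100 samples.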

Why So Fast?

| Optimization | Description |
| --- | --- |
| ⚡ SIMD-JSON | 2-4x faster JSON parsing with `core-simd-json` feature |
| 🔄 Zero-copy parsing | Direct memory access for path/query params |
| 📦 SmallVec PathParams | Stack-optimized path parameters |
| 🎯 Compile-time dispatch | All extractors resolved at compile time |
| 🌊 Streaming bodies | Handle large uploads without memory bloat |

Remember: RustAPI provides the capability for high performance, but your application logic ultimately dictates the speed. Use tools like wrk, k6, or drill to stress-test your specific endpoints.
