Name	Name	Last commit message	Last commit date
parent directory ..
producer	producer
README.md	README.md
consumer.py	consumer.py

cpp_producer_python_consumer

C++ process pushes f32[256] feature vectors into a Tachyon ring buffer. Python process drains them into a pre-allocated NumPy buffer and runs batched inference. Identical wire format and consumer pattern to rust_producer_python_consumer — the difference is the producer.

Results

Hardware: i7-12650H · DDR5-5600 · Fedora 43 · Linux 6.19.11 · no CPU isolation.

Metric	Value
Frames	500 000
Payload per frame	1 024 bytes (`f32[256]`)
Batch size (compute)	256 frames
Producer throughput	374 858 frames/sec
Consumer throughput	370 877 frames/sec
Bandwidth	0.38 GB/s

Wire format

FeatureVector  1024 bytes  little-endian
  [0..1024]    f32[256]    value = frame_idx + i * 0.001

Sentinel       1024 bytes  all zeros  type_id = 0

Build

# Python consumer
pipenv install torch   # or: pipenv install numpy

# C++ producer — requires GCC ≥ 14 or Clang ≥ 17
cd examples/cpp_producer_python_consumer/producer
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

Run

# terminal 1 — consumer first (owns the socket)
pipenv run python examples/cpp_producer_python_consumer/consumer.py

# terminal 2
./examples/cpp_producer_python_consumer/producer/build/tachyon_cpp_producer

C++ vs Rust producer

The C++ producer is faster than the Rust producer (no FFI layer, no safety checks at the call site), which means it saturates the ring buffer more frequently. Saturation forces the producer into the 10 µs sleep branch of the backpressure loop more often, intermittently starving the Python consumer of CPU time. This is why throughput is lower (370K vs 466K f/s) despite the producer being intrinsically faster.

The +73% gain over the naive implementation (214K → 370K f/s) came from a single fix: removing tachyon_flush() from the backpressure loop. tachyon_flush() acquires consumer_lock internally; calling it while the buffer is full prevented acquire_rx() from draining the ring.

// Correct backpressure — no lock contention
static void *acquire_with_backpressure(tachyon_bus_t *bus, size_t size) {
    for (unsigned spins = 0;;) {
        void *ptr = tachyon_acquire_tx(bus, size);
        if (ptr) return ptr;
        // Do NOT call tachyon_flush() here.
        if (++spins < 32) { __asm__ volatile("pause" ::: "memory"); }
        else { std::this_thread::sleep_for(std::chrono::microseconds(10)); spins = 0; }
    }
}

Receive pattern

Same two-phase pattern as the Rust example. consumer_lock held for ~100 ns per frame; inference runs outside the SHM context.

with raw_bus.acquire_rx() as rx:  # consumer_lock held ~100 ns
    with memoryview(rx) as mv:
        accum[batch_idx] = np.frombuffer(mv, dtype=np.float32)

if batch_idx == BATCH_SIZE:  # inference outside SHM
    out = torch.from_numpy(accum) @ WEIGHTS

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

cpp_producer_python_consumer

Results

Wire format

Build

Run

C++ vs Rust producer

Receive pattern

FilesExpand file tree

cpp_producer_python_consumer

Directory actions

More options

Directory actions

More options

Latest commit

History

cpp_producer_python_consumer

Folders and files

parent directory

README.md

cpp_producer_python_consumer

Results

Wire format

Build

Run

C++ vs Rust producer

Receive pattern