feat: Add json+cuda_ipc array encoding for GPU-direct tensor transfer #588

Draft: dionhaefner wants to merge 1 commit into main from dion/gpu-go-brr
@dionhaefner
Contributor

Summary

  • New cuda_ipc array encoding that passes CUDA IPC memory handles instead of serialized tensor bytes
  • Framework-agnostic: works with any GPU array that implements __cuda_array_interface__ (PyTorch, CuPy, JAX, Numba)
  • Uses ctypes calls to libcudart directly — no PyTorch or CuPy dependency in the encode path
  • Decode path requires CuPy (returns cupy.ndarray); consumers convert via torch.as_tensor() or DLPack as needed
  • Containers launched with json+cuda_ipc automatically get --ipc=host
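
The encode path described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`CudaIpcMemHandle`, `device_pointer`, `load_cudart`, `export_ipc_handle`) are hypothetical. The only library call assumed is the CUDA runtime function `cudaIpcGetMemHandle`, which fills an opaque 64-byte handle for a device pointer.

```python
import ctypes
import ctypes.util

CUDA_IPC_HANDLE_SIZE = 64  # size of cudaIpcMemHandle_t in the CUDA runtime headers


class CudaIpcMemHandle(ctypes.Structure):
    """Mirror of cudaIpcMemHandle_t: an opaque 64-byte blob."""

    _fields_ = [("reserved", ctypes.c_char * CUDA_IPC_HANDLE_SIZE)]


def device_pointer(gpu_array) -> int:
    """Read the raw device pointer from __cuda_array_interface__."""
    ptr, _read_only = gpu_array.__cuda_array_interface__["data"]
    return ptr


def load_cudart() -> ctypes.CDLL:
    """Load the CUDA runtime (hypothetical helper; lookup may differ per system)."""
    name = ctypes.util.find_library("cudart") or "libcudart.so"
    return ctypes.CDLL(name)


def export_ipc_handle(gpu_array, cudart=None) -> bytes:
    """Return the 64-byte IPC handle for the allocation backing gpu_array."""
    if cudart is None:
        cudart = load_cudart()
    fn = cudart.cudaIpcGetMemHandle
    fn.argtypes = [ctypes.POINTER(CudaIpcMemHandle), ctypes.c_void_p]
    fn.restype = ctypes.c_int
    handle = CudaIpcMemHandle()
    err = fn(ctypes.byref(handle), ctypes.c_void_p(device_pointer(gpu_array)))
    if err != 0:  # 0 is cudaSuccess
        raise RuntimeError(f"cudaIpcGetMemHandle failed with CUDA error code {err}")
    # bytes() on the structure copies all 64 bytes, including embedded zeros
    return bytes(handle)
```

The sketch ignores details a real implementation would likely need to handle, such as frameworks with pooled allocators that hand out pointers into a larger allocation, and keeping the source array alive while a consumer has the handle open.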

What problem does this solve?

Tesseract's data path is CPU-bound. Array encoding serializes via JSON, base64, or binref, and copies data to the CPU at every boundary. For tight composition loops (optimization, MCMC), the GPU→CPU→serialize→network→CPU→GPU round-trip dominates wall time.

With cuda_ipc, tensors stay on the GPU, while the CPU only handles metadata (like shape and dtype) – this is essentially binref for GPU memory.
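
To make the "metadata only" idea concrete, here is a small sketch of what such a payload could look like. The field names and layout are hypothetical and are not taken from this PR. The point is only that the JSON carries a fixed-size handle plus shape and dtype, so its size does not grow with the array.

```python
import base64
import json
import math


def encode_cuda_ipc_metadata(handle: bytes, shape, typestr: str, device_id: int = 0) -> str:
    """Build a JSON payload from a 64-byte IPC handle (hypothetical schema)."""
    if len(handle) != 64:
        raise ValueError("a CUDA IPC memory handle is expected to be 64 bytes")
    return json.dumps(
        {
            "object_type": "array",
            "shape": list(shape),
            "dtype": typestr,  # e.g. "<f4", as in __cuda_array_interface__
            "data": {
                "encoding": "cuda_ipc",
                "handle": base64.b64encode(handle).decode("ascii"),
                "device_id": device_id,
            },
        }
    )


def decode_cuda_ipc_metadata(text: str):
    """Recover handle, shape, dtype string, and the GPU buffer size in bytes."""
    doc = json.loads(text)
    handle = base64.b64decode(doc["data"]["handle"])
    shape = tuple(doc["shape"])
    itemsize = int(doc["dtype"][2:])  # "<f4" -> 4 bytes per element
    nbytes = math.prod(shape) * itemsize
    return handle, shape, doc["dtype"], nbytes


# A 10,000,000-element float32 array occupies 40 MB on the GPU,
# while the payload that crosses the process boundary stays tiny.
payload = encode_cuda_ipc_metadata(bytes(64), (10_000_000,), "<f4")
handle, shape, typestr, nbytes = decode_cuda_ipc_metadata(payload)
```

A consumer would pass the decoded handle to the CUDA runtime to map the producer's memory, then wrap the resulting pointer using the shape and dtype.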

Usage

from tesseract_core import Tesseract

# Container path
t = Tesseract.from_image("my_gpu_tesseract", gpus=["0"], output_format="json+cuda_ipc")
t.serve()
# cupy_array is a placeholder: a CuPy array, a torch tensor, or any __cuda_array_interface__ object
result = t.apply({"x": cupy_array})

# Local path
t = Tesseract.from_tesseract_api("tesseract_api.py", output_format="json+cuda_ipc")

Requirements

  • CUDA runtime (libcudart.so) on both producer and consumer
  • CuPy for decoding (pip install cupy-cuda12x)
  • --ipc=host for cross-container IPC (handled automatically by engine.py)
  • Both processes must see the same physical GPU
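
The first two requirements can be checked up front with a small preflight, sketched below. The function name `check_cuda_ipc_requirements` is hypothetical and not part of this PR. The library lookup is only a heuristic: `ctypes.util.find_library` may miss a libcudart that is installed outside the standard linker paths.

```python
import ctypes.util
import importlib.util


def check_cuda_ipc_requirements() -> dict:
    """Report whether the CUDA runtime and CuPy appear to be available."""
    return {
        # None means the linker search did not find a libcudart
        "libcudart": ctypes.util.find_library("cudart") is not None,
        # find_spec locates the package without importing it
        "cupy": importlib.util.find_spec("cupy") is not None,
    }


status = check_cuda_ipc_requirements()
missing = [name for name, ok in status.items() if not ok]
if missing:
    print("cuda_ipc preflight: missing " + ", ".join(missing))
```

The remaining two requirements (shared IPC namespace and a common physical GPU) depend on how the containers are launched and cannot be verified from inside a single process this way.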

@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 22.40000% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.48%. Comparing base (04b7550) to head (67c3150).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tesseract_core/runtime/array_encoding.py | 21.35% | 80 Missing and 1 partial ⚠️ |
| tesseract_core/sdk/tesseract.py | 31.25% | 10 Missing and 1 partial ⚠️ |
| tesseract_core/sdk/engine.py | 0.00% | 2 Missing and 1 partial ⚠️ |
| tesseract_core/runtime/file_interactions.py | 33.33% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
- Coverage   77.08%   75.48%   -1.61%     
==========================================
  Files          32       32              
  Lines        4487     4606     +119     
  Branches      738      760      +22     
==========================================
+ Hits         3459     3477      +18     
- Misses        725      822      +97     
- Partials      303      307       +4     

@PasteurBot
Contributor

Benchmark Results

Benchmarks use a no-op Tesseract to measure pure framework overhead.

🚀 0 faster, ⚠️ 0 slower, ✅ 36 unchanged

✅ No significant performance changes detected.

Full results

| Benchmark | Baseline | Current | Change | Status |
|---|---|---|---|---|
| api/apply_1,000 | 0.402ms | 0.393ms | -2.2% | ✅ |
| api/apply_100,000 | 0.408ms | 0.400ms | -1.9% | ✅ |
| api/apply_10,000,000 | 0.407ms | 0.397ms | -2.5% | ✅ |
| cli/apply_1,000 | 1672.317ms | 1648.496ms | -1.4% | ✅ |
| cli/apply_100,000 | 1702.285ms | 1680.354ms | -1.3% | ✅ |
| cli/apply_10,000,000 | 1766.081ms | 1730.622ms | -2.0% | ✅ |
| decoding/base64_1,000 | 0.027ms | 0.027ms | -1.5% | ✅ |
| decoding/base64_100,000 | 0.751ms | 0.758ms | +0.9% | ✅ |
| decoding/base64_10,000,000 | 139.384ms | 139.584ms | +0.1% | ✅ |
| decoding/binref_1,000 | 0.170ms | 0.172ms | +1.3% | ✅ |
| decoding/binref_100,000 | 0.262ms | 0.264ms | +0.8% | ✅ |
| decoding/binref_10,000,000 | 27.732ms | 28.095ms | +1.3% | ✅ |
| decoding/json_1,000 | 0.089ms | 0.089ms | -0.0% | ✅ |
| decoding/json_100,000 | 8.255ms | 8.280ms | +0.3% | ✅ |
| decoding/json_10,000,000 | 1116.804ms | 1121.226ms | +0.4% | ✅ |
| encoding/base64_1,000 | 0.034ms | 0.034ms | -0.7% | ✅ |
| encoding/base64_100,000 | 0.204ms | 0.206ms | +0.7% | ✅ |
| encoding/base64_10,000,000 | 66.182ms | 65.200ms | -1.5% | ✅ |
| encoding/binref_1,000 | 0.236ms | 0.242ms | +2.2% | ✅ |
| encoding/binref_100,000 | 0.410ms | 0.412ms | +0.5% | ✅ |
| encoding/binref_10,000,000 | 30.667ms | 30.439ms | -0.7% | ✅ |
| encoding/json_1,000 | 0.117ms | 0.119ms | +1.2% | ✅ |
| encoding/json_100,000 | 10.918ms | 11.576ms | +6.0% | ✅ |
| encoding/json_10,000,000 | 1296.752ms | 1317.191ms | +1.6% | ✅ |
| http/apply_1,000 | 2.856ms | 2.910ms | +1.9% | ✅ |
| http/apply_100,000 | 8.743ms | 9.234ms | +5.6% | ✅ |
| http/apply_10,000,000 | 930.312ms | 941.730ms | +1.2% | ✅ |
| roundtrip/base64_1,000 | 0.071ms | 0.070ms | -1.0% | ✅ |
| roundtrip/base64_100,000 | 1.122ms | 1.127ms | +0.4% | ✅ |
| roundtrip/base64_10,000,000 | 206.272ms | 208.074ms | +0.9% | ✅ |
| roundtrip/binref_1,000 | 0.421ms | 0.418ms | -0.7% | ✅ |
| roundtrip/binref_100,000 | 0.661ms | 0.662ms | +0.2% | ✅ |
| roundtrip/binref_10,000,000 | 59.083ms | 59.370ms | +0.5% | ✅ |
| roundtrip/json_1,000 | 0.220ms | 0.217ms | -1.1% | ✅ |
| roundtrip/json_100,000 | 18.543ms | 17.888ms | -3.5% | ✅ |
| roundtrip/json_10,000,000 | 2413.511ms | 2415.169ms | +0.1% | ✅ |

  • Runner: Linux 6.17.0-1010-azure x86_64
