yscv Ecosystem Capability Matrix

This document is the canonical capability inventory for replacing the Python CV/DL ecosystem with yscv.

Purpose

Lock the target surface needed to replace practical Python CV/DL workflows.
Track implementation status by capability area.
Make gaps explicit and tie each gap to concrete framework work.

Status Legend

Implemented: a production-ready baseline exists in the workspace crates, with tests.
Partial: a baseline exists, but parity, scale, or performance is incomplete.
Planned: no production-grade implementation exists yet.

Capability Matrix

Area	Python Parity Target	Status	yscv Surface Today	Primary Gap To Close
Tensor core and layout	NumPy tensor creation/layout/broadcast/reduction baseline	Implemented	`yscv-tensor` (115 ops on `Tensor` in `ops/`: creation, broadcast, add/sub/mul/div/pow/min/max elementwise, neg/abs/exp/ln/sqrt/reciprocal/sign/floor/ceil/round/clamp/scale/add_scalar unary, sum/mean/max_value/min_value/argmax/argmin/var/std_dev + axis reductions, transpose_2d/permute/unsqueeze/squeeze/cat/stack/select/narrow shape ops, eq/gt/lt/all_finite comparisons, where_cond/masked_fill/scatter/gather/scatter_add/topk/triu/tril/eye/repeat advanced ops, FP16/BF16 native dtype storage)	—
CPU numeric kernels	NumPy/SciPy/PyTorch CPU operator runtime	Implemented	`yscv-kernels` (adaptive + configurable threaded CPU matmul, elementwise/relu/sigmoid/gelu/silu/mish/dropout, softmax/log-softmax/logsumexp/layer-norm/rms-norm/group-norm/batch-norm, NHWC pooling/convolution/depthwise-conv/separable-conv/deformable-conv/transposed-conv, scaled dot-product attention, embedding lookup; GPU backend via wgpu: tiled matmul, elementwise, relu, sigmoid, softmax, conv2d, pooling, batch_norm, layer_norm, transpose compute shaders; GPU buffer pool; multi-device scheduling)	Deepen SIMD/zero-copy hot-path coverage
Autograd	PyTorch/TensorFlow eager grad baseline	Implemented	`yscv-autograd` (61 `Op` enum variants in `node.rs`: add/sub/mul/div/neg/exp/log/sqrt/sigmoid/tanh/abs/pow/clamp/leaky_relu/mean/relu/matmul_2d/sum/softmax/log_softmax/transpose_2d/reshape/unsqueeze/squeeze/conv1d/conv2d/conv3d/conv_transpose/depthwise_conv2d/batch_norm/layer_norm/group_norm/instance_norm/max_pool2d/avg_pool2d/global_avg_pool/adaptive_pool/multi_head_attention/embedding/lstm/gru/rnn/dropout/upsample/pixel_shuffle/residual/einsum/grid_sample/pad/flip/repeat/cat/select/narrow/gather/gather_elements/scatter_add/sum_axis/mean_axis/flatten/expand/slice, gradient checkpointing, GPU backward ops)	—
Optimizers	Core optimizer families used in CV training	Implemented	`yscv-optim` (8 optimizers: `Sgd`, `Adam`, `AdamW`, `RAdam`, `RmsProp`, `Adagrad`, `Lamb`, `Lars`; `Lookahead` meta-optimizer; 11 LR schedulers: `StepLr`, `CosineAnnealingLr`, `LinearWarmupLr`, `OneCycleLr`, `ExponentialLr`, `MultiStepLr`, `ReduceOnPlateauLr`, `CyclicLr`, `CosineWarmRestartLr`, `PolynomialLr`, `SequentialLr`; gradient clipping utilities)	—
Model/layer runtime	Typical CV model execution stack	Implemented	`yscv-model` (39 `ModelLayer` variants in `layers/mod.rs`: Linear, ReLU, LeakyReLU, Sigmoid, Tanh, Dropout, Conv1d, Conv2d, Conv3d, DepthwiseConv2d, SeparableConv2d, ConvTranspose2d, DeformableConv2d, BatchNorm2d, LayerNorm, GroupNorm, InstanceNorm, MaxPool2d, AvgPool2d, GlobalAvgPool2d, AdaptiveAvgPool2d, AdaptiveMaxPool2d, Flatten, Softmax, Embedding, LoraLinear, PixelShuffle, Upsample, GELU, SiLU, Mish, PReLU, ResidualBlock, Rnn, Lstm, Gru, MultiHeadAttention, TransformerEncoder, FeedForward; architecture blocks: BottleneckBlock, SqueezeExciteBlock, MbConvBlock, UNetEncoder/Decoder, FpnNeck, AnchorFreeHead, PatchEmbedding, VisionTransformer; quantization: INT8 symmetric/asymmetric/per-channel; inference: batched_inference, BatchCollector; weight management: save/load/inspect; model zoo: 17 architectures, including ResNet18/34/50/101, VGG16/19, MobileNetV2, EfficientNetB0, AlexNet, ViTTiny/Base/Large, and DeiTTiny, with ModelHub remote download)	—
Training loop stack	Practical supervised CV training workflows	Implemented	`yscv-model` + `yscv-optim` + `yscv-autograd` (high-level `Trainer` with `TrainerConfig`, validation split, `EarlyStopping`/`BestModelCheckpoint` callbacks; 17 loss functions: mse, mae, huber, hinge, bce, nll, cross_entropy, label_smoothing_cross_entropy, focal, dice, triplet, contrastive, cosine_embedding, ctc, smooth_l1, kl_div, distillation; per-optimizer train-step/epoch helpers; deterministic NHWC augmentation; weighted/shuffled/class-balanced batch pipeline; batch mixup/cutmix; eval/train mode; mixed-precision training with `DynamicLossScaler`; distributed training with `AllReduceAggregator`/`ParameterServer`/`TopKCompressor`/TCP transport; LoRA fine-tuning; EMA; LR finder; TensorBoard writer; StreamingDataLoader; gradient clipping)	—
Data transforms	torchvision.transforms-style composable preprocessing/augmentation	Implemented	`yscv-model` (Compose pipeline with 8 transforms, including Normalize, ScaleValues, PermuteDims, Resize, CenterCrop, RandomHorizontalFlip, and GaussianBlur; ImageAugmentationPipeline with 12+ ops; DataLoader with prefetch and samplers)	Expand transform catalogue
Image processing primitives	OpenCV/scikit-image/Pillow core transforms	Implemented	`yscv-imgproc` (159 free public functions in `ops/`: grayscale/HSV/BGR/LAB/YUV conversions, nearest+bilinear resize, box/Gaussian/median/bilateral blur, morphology suite, Sobel/Laplacian/Canny edges, binary/Otsu/adaptive thresholding, connected-component labeling, template matching (SSD+NCC), contour detection, filter2d, pad/crop, histogram+equalize+CLAHE, flip/rotate, affine+perspective warp, Harris/FAST corners, Hough lines/circles, Gaussian pyramid, distance transform, integral image, NMS, ORB/SIFT/SURF/BRIEF descriptors, Lucas-Kanade+Farneback optical flow, watershed, HOG descriptors, convex hull, min-area rect, homography (4pt DLT + RANSAC), ellipse fitting, Douglas-Peucker, inpainting, camera calibration, stereo matching, drawing primitives)	—
Video/Camera I/O	OpenCV-like cross-platform capture and frame pipeline	Implemented	`yscv-video` (native camera I/O via `nokhwa` (V4L2/AVFoundation/MediaFoundation); H.264 decoder: full Baseline/Main/High profile, I/P/B slices, CAVLC, CABAC (branchless), DCT + dequant, motion compensation, deblocking, weighted prediction, sub-MB partitions, scaling lists; HEVC/H.265 decoder: Main + Main10 (10-bit), I/P/B slices, CTU quad-tree, 35 intra modes, chroma motion compensation, CABAC, SAO, deblocking, tiles; hardware decode backends: VideoToolbox, VAAPI, NVDEC, MediaFoundation; MP4 + MKV/WebM container parsing; streaming `Mp4VideoReader` + `FrameStream`; audio metadata for AAC/ALAC/Opus/Vorbis/MP3/FLAC; YUV420→RGB8 BT.601 conversion with NEON+SSE2+AVX2 paths)	—
Detection runtime	Practical object/face detection baseline	Implemented	`yscv-detect` (full YOLOv8 + YOLOv11 ONNX inference pipelines: `detect_yolov8_from_rgb`, `decode_yolov8_output`, `decode_yolov11_output`, `yolov8_coco_config`, `yolov11_coco_config`, `letterbox_preprocess`; `ModelDetector` config with `postprocess_detections` and `preprocess_rgb8_for_model`; heatmap detection (`detect_from_heatmap`); NMS hard/soft/batched; anchor generation; RoI pool/align; scratch buffer reuse; `detect_people_from_rgb8` / `detect_faces_from_rgb8` helpers)	—
Recognition runtime	Embedding-based identity matching	Implemented	`yscv-recognize` (Recognizer with cosine similarity, gallery management, enroll/remove/recognize API, VP-Tree ANN indexing via `build_index()`/`search_indexed()`, snapshot JSON persistence, slice-based recognition)	—
Multi-object tracking	Stable ID tracking and counting baseline	Implemented	`yscv-track` (DeepSortTracker, ByteTracker, KalmanFilter 8-state, Hungarian assignment, TrackerConfig, re-identification: `ReIdExtractor` trait, `ColorHistogramReId`, `ReIdGallery` with distance threshold matching)	—
Evaluation stack	Detection/tracking/counting metric and benchmark tooling	Implemented	`yscv-eval` (37 public metric/eval functions: mAP/COCO detection, HOTA/IDF1/MOTA/MOTP tracking, accuracy/precision/recall/F1/ROC/AUC classification, MAE/RMSE/MAPE/R2 regression, DICE/IoU/PSNR/SSIM/top-k general, counting metrics; 8 dataset adapters under `dataset/`: COCO, JSONL, OpenImages, YOLO, VOC, KITTI, WIDERFACE, MOT; camera diagnostics; timing stats; pipeline benchmark thresholds)	—
Dataset adapters	Typical annotation formats used in CV training/eval	Implemented	Training: JSONL/CSV/ImageManifest/ImageFolder in `yscv-model`; Evaluation: JSONL/COCO/OpenImages/YOLO/VOC/KITTI/WIDERFACE detection + JSONL/MOT tracking in `yscv-eval` (8 formats in `dataset/`)	—
Model portability	ONNX import/export compatibility	Implemented	`yscv-onnx` (122 CPU op match arms in `runner/dispatch.rs` covering opset 22; quantized runtime ops, trig/hyperbolic, logical, activation, spatial, misc; NCHW<->NHWC conversion; graph optimizer with Conv+ReLU/BN+ReLU fusion; dynamic shapes; `OnnxDtype` enum with Float32/Float16/Int8/UInt8/Int32/Int64/Bool; `OnnxTensorData` with quantize/dequantize; CPU performance varies by model and hardware: ~1.07× slower than single-threaded ORT on x86 Zen 4, and ~1.5-1.6× faster than ORT on the Siamese tracker running on an Orange Pi Zero 3 (Cortex-A53) (see `performance-benchmarks.md`); runs YOLO11n at opset 22, where onnxruntime and tract fail); `yscv-model` ONNX export bridge	—
Inference deployment ops	Graph optimization, batching, quantization, packaging	Implemented	ONNX graph optimization (Conv+BN folding, constant folding, Conv+ReLU/BN+ReLU fusion), quantized ONNX runtime, INT8 quantization, dynamic batching (`batched_inference`, `BatchCollector`), binary weight save/load, SafeTensors support	Model packaging/export tooling
Accelerator backends	CUDA/Metal/Vulkan/other acceleration parity track	Implemented	`yscv-kernels` `gpu` feature: wgpu compute backend (Vulkan/Metal/DX12) with tiled matmul, elementwise, activation, conv2d, pooling, batch_norm, layer_norm, softmax, transpose shaders; multi-device scheduling (`MultiGpuBackend`, `SchedulingStrategy::RoundRobin/DataParallel/Manual`); GPU buffer pool; Metal-native pipeline (`metal-backend` feature): compiled execution plans with tiled f16 conv GEMM (BM=64, BN=64, BK=16, 16 accumulators/thread), f16 weight pre-packing (halving weight bandwidth), parallel softmax with shared-memory reduction, fast_divmod im2col, zero-cost buffer aliasing, fused command buffers; MPSGraph pipelined `submit`/`wait` API (triple-buffered by default, configurable via `YSCV_MPS_PIPELINE`; multi-input models; f16 end-to-end with NEON `vcvt_f32_f16` widening; zero Objective-C allocations in the hot path; measured at ~4.4× the throughput of ORT's CoreML execution provider on the Siamese tracker, Apple M1)	f16 inter-op buffers, further CoreML parity
Distributed training	Multi-node gradient synchronization	Implemented	`yscv-model` (`DistributedConfig`, `GradientAggregator` trait, `AllReduceAggregator` ring all-reduce, `ParameterServer` centralized aggregation, `InProcessTransport`, `TcpTransport` with coordinator/worker roles and length-prefixed protocol, `TopKCompressor` gradient compression, `distributed_train_step` helper)	Scale testing on real clusters
API/release governance	Semver policy + compatibility notes + publish automation	Implemented	`docs/api-stability.md` (stability tiers, semver policy, release checklist, publish order), `CHANGELOG.md`, `scripts/publish.sh` (dependency-ordered publish), `scripts/bump-version.sh`	—
End-to-end reference apps	Usable Rust apps proving framework viability	Implemented	`yscv-cli`, `apps/bench`, `apps/camera-face-tool`, `apps/llm-bench`, 26 examples in `examples/src/` (incl. `train_linear`, `train_cnn`, `image_pipeline`, `yolo_detect`, `yolo_finetune`, GPU/Metal/MPSGraph benches)	Expand real-world reference scenarios
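
The Detection runtime row lists hard, soft, and batched NMS among the `yscv-detect` post-processing ops. For readers coming from `torchvision.ops.nms`, the following is a minimal, generic sketch of greedy hard NMS in Rust. The `Scored` type and the `iou`/`hard_nms` functions are hypothetical illustrations, not the `yscv-detect` API, whose exact names and signatures are not documented in this matrix.

```rust
/// Hypothetical box type for illustration (not a yscv-detect type):
/// axis-aligned corners (x1, y1, x2, y2) plus a confidence score.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Scored {
    pub x1: f32,
    pub y1: f32,
    pub x2: f32,
    pub y2: f32,
    pub score: f32,
}

/// Intersection-over-union of two boxes; 0.0 for disjoint or degenerate boxes.
fn iou(a: &Scored, b: &Scored) -> f32 {
    let iw = (a.x2.min(b.x2) - a.x1.max(b.x1)).max(0.0);
    let ih = (a.y2.min(b.y2) - a.y1.max(b.y1)).max(0.0);
    let inter = iw * ih;
    let area_a = (a.x2 - a.x1).max(0.0) * (a.y2 - a.y1).max(0.0);
    let area_b = (b.x2 - b.x1).max(0.0) * (b.y2 - b.y1).max(0.0);
    let union = area_a + area_b - inter;
    if union <= 0.0 { 0.0 } else { inter / union }
}

/// Greedy hard NMS: visit boxes from highest to lowest score, keep each box
/// that has not been suppressed, and suppress every lower-scoring box whose
/// IoU with it exceeds `iou_threshold`. Returns kept indices in score order.
pub fn hard_nms(boxes: &[Scored], iou_threshold: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..boxes.len()).collect();
    // Descending by score; total_cmp gives a total order even if a score is NaN.
    order.sort_by(|&i, &j| boxes[j].score.total_cmp(&boxes[i].score));
    let mut suppressed = vec![false; boxes.len()];
    let mut keep = Vec::new();
    for (pos, &i) in order.iter().enumerate() {
        if suppressed[i] {
            continue;
        }
        keep.push(i);
        for &j in &order[pos + 1..] {
            if !suppressed[j] && iou(&boxes[i], &boxes[j]) > iou_threshold {
                suppressed[j] = true;
            }
        }
    }
    keep
}

fn main() {
    let boxes = [
        Scored { x1: 0.0, y1: 0.0, x2: 10.0, y2: 10.0, score: 0.9 },
        Scored { x1: 1.0, y1: 1.0, x2: 11.0, y2: 11.0, score: 0.8 }, // IoU with box 0 is about 0.68
        Scored { x1: 20.0, y1: 20.0, x2: 30.0, y2: 30.0, score: 0.7 }, // disjoint from both
    ];
    // Box 1 overlaps box 0 above the 0.5 threshold and is suppressed.
    println!("{:?}", hard_nms(&boxes, 0.5)); // prints "[0, 2]"
}
```

Soft NMS differs only in the inner loop: instead of discarding an overlapping box, it decays that box's score as a function of IoU and re-sorts, which is why it is listed separately from hard NMS.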

Update Protocol

After every feature change, update this matrix if coverage status or gaps have changed.
Do not mark Implemented without executable evidence (tests/benchmarks/repro run).
Keep `context.md` and `AGENTS.md` aligned with this matrix.
