Pinned Loading
-
sglang
sglang PublicForked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python
-
TensorRT-LLM
TensorRT-LLM PublicForked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
C++
-
vllm
vllm PublicForked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
-
NVIDIA/Model-Optimizer
NVIDIA/Model-Optimizer PublicA unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downs…
If the problem persists, check the GitHub status page or contact support.


