LLMKube is a Kubernetes operator for llama.cpp-native LLM inference.
- GitHub: https://github.com/defilantech/llmkube
- Apache 2.0 license
- CRD-based model and inference service management
- NVIDIA CUDA and Apple Silicon Metal GPU support
- Multi-GPU layer sharding, pre-flight memory validation
- Helm chart, Prometheus metrics, OpenAI-compatible API
I'm the creator and maintainer. Happy to provide any additional info needed.