🔭 I’m currently working on: Engineering AI-powered RAG chatbots and agentic NLP pipelines to improve research efficiency.
👯 I’m looking to collaborate on: Open-source projects focused on deep learning inference optimization and high-performance model deployment.
🤝 I’m looking to: Master advanced multi-agent orchestration and stay ahead of the curve in 2026's LLM research trends.
🌱 I’m currently learning: Advanced local-first RAG techniques, dense vector indexing with FAISS, and scaling workflows in Slurm-based HPC environments.
💬 Ask me about: Model pruning, quantization, NLP preprocessing pipelines, and deploying machine learning on embedded ARM systems.
⚡ Fun fact: I have a knack for speed. I've previously sped up model inference by 2-4x through custom quantization and graph transformations.
Pinned
- BentoVLLM (public, forked from bentoml/BentoVLLM): Self-host LLMs with vLLM and BentoML. Python.