---
title: "Highlights of SGLang at NVIDIA GTC 2026"
author: "The SGLang Team"
date: "March 31, 2026"
previewImg: /images/blog/gtc2026/happyhour-crowd.jpg
type: news
---

SGLang came to NVIDIA GTC 2026 with panels, a happy hour, a 200-person meetup, and a hands-on training lab. Three days, five events, and one packed week at the center of the LLM ecosystem; we left with a lot to share. If you missed it, here's the full recap.

<img src="/images/blog/gtc2026/hero-collage.jpg" style="display: block; margin: 20px auto 0; width: 100%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">SGLang at GTC 2026: five events, three days.</p>

## At the Main Conference

### SGLang Featured in the GTC Keynote

SGLang was featured on the NVIDIA AI ecosystem slide during Jensen Huang's GTC keynote. We are honored to be recognized as part of the infrastructure stack behind AI-native applications.

<img src="/images/blog/gtc2026/keynote-slide.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">SGLang on NVIDIA's AI ecosystem slide during the GTC 2026 keynote.</p>

📝 [X recap post](https://x.com/lmsysorg/status/2033675233765499277)

### Open-Source AI Panel at GTC

On Tuesday, **Ying Sheng** joined the GTC panel **"The State of Open-Source AI"** alongside Vartika Singh (Strategic AI Lead, NVIDIA), Jonathan Cohen (VP of Applied Research, NVIDIA), Ion Stoica (Professor, EECS, UC Berkeley), Jeff Boudier (VP of Product, Hugging Face), and Ranjay Krishna (Director of Multimodal and Embodied AI, Ai2).

The panel examined open-source AI's growing role as the primary R&D engine for sophisticated AI systems: what makes open ecosystems trustworthy, scalable, and production-ready, and the community infrastructure enabling reproducible, auditable research.

<img src="/images/blog/gtc2026/panel-ying.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">Ying Sheng (second from left) on the "State of Open-Source AI" panel at GTC 2026.</p>

🎬 [Watch the recording on NVIDIA On-Demand](https://www.nvidia.com/en-us/on-demand/session/gtc26-s81791/)

### SGLang Training Lab at GTC 2026

On Thursday morning, the **RadixArk team** led an official GTC training lab: **"High-Performance LLM Serving and Training with SGLang"**.

The lab covered three areas:

1. **Performance tuning with the SGLang Cookbook**: practical techniques for improving serving throughput and latency in real deployments
2. **Profiling and bottleneck analysis**: a developer-oriented walkthrough of identifying and resolving performance bottlenecks in LLM serving systems
3. **SGLang × Miles RL integration**: a live demonstration of running SGLang as the inference backend inside a real RL training loop using the Miles framework
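
For context, the kind of serving launch tuned in that first segment looks roughly like the sketch below. This is not the lab's actual configuration; the model path and every flag value are illustrative placeholders to show where the main tuning knobs live, and the right values depend entirely on your hardware and workload.

```shell
# Launch an SGLang server (all values illustrative, tune per deployment):
#   --tp-size               tensor-parallel degree across GPUs
#   --mem-fraction-static   fraction of GPU memory reserved for weights + KV cache
#   --chunked-prefill-size  prefill chunk cap, trading first-token latency vs throughput
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --tp-size 2 \
  --mem-fraction-static 0.85 \
  --chunked-prefill-size 8192
```

Raising `--mem-fraction-static` generally grows the KV cache budget (and thus batch size), at the cost of headroom for activation memory.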

<img src="/images/blog/gtc2026/traininglab-instructor.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">The SGLang Training Lab at GTC 2026: hands-on LLM performance tuning and RL training.</p>

🎬 [Watch the full recording on NVIDIA On-Demand](https://www.nvidia.com/en-us/on-demand/session/gtc26-dlit82143/)

📁 [Download the training lab materials](https://drive.google.com/drive/folders/1ByXsdu0n03-sYR8MfxFv1dK_2GGfB0Ku?usp=drive_link)

## Side Events

### SGLang × RadixArk GTC Happy Hour

On Tuesday evening, SGLang and RadixArk co-hosted a GTC Happy Hour that brought together builders, researchers, and founders from across the inference and training ecosystem, including friends from OpenAI, xAI, DeepMind, Meta, NVIDIA, Ollama, and more.

<img src="/images/blog/gtc2026/happyhour-crowd.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">SGLang × RadixArk Happy Hour.</p>

The evening featured two technical spotlights:

- **Banghua Zhu (RadixArk)** introduced RadixArk and **Miles**, SGLang's native RL training framework purpose-built for large-scale MoE post-training workloads.
- **Jason Zhao (ScitiX)** presented **SiMM**, an open-source in-memory KV cache engine integrated with SGLang for long-context serving.

<img src="/images/blog/gtc2026/happyhour-banghua.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">Banghua introducing RadixArk and the Miles RL framework.</p>

Thank you to **Z Potentials** and **ScitiX** for sponsoring the event and making it possible.

📝 [X recap post](https://x.com/lmsysorg/status/2034398140510720289?s=20)

### Banghua at Novita's GTC Event

**Banghua Zhu** joined Novita's GTC event with over 700 attendees. The discussion covered Jensen Huang's remarks on the inflection point between inference cost and demand, the key drivers behind the agentic AI movement, and what it takes for AI products to deliver real value. Banghua shared his perspective on how SGLang is shaping the future of inference infrastructure, enabling next-generation use cases from OpenClaw to agentic inference, and driving the evolution of open models and open infrastructure.

<img src="/images/blog/gtc2026/novita-banghua.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">Banghua presenting at Novita's GTC event.</p>

Partners represented included NVIDIA, RadixArk, OpenRouter, Google DeepMind, Kimi (Moonshot AI), Alibaba Cloud, MiniMax, Z.ai, Hugging Face, and Kilo Code.

### LinkedIn × SGLang Meetup: LLMs for Search & Recommendation

On Wednesday evening, we hosted approximately 200 engineers at LinkedIn's Mountain View headquarters alongside teams from LinkedIn, TikTok, Meta, and NVIDIA for a deep dive into production LLM systems for search and recommendation.

<img src="/images/blog/gtc2026/linkedin-venue.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">SGLang swag at the LinkedIn meetup.</p>

#### LinkedIn Engineering Talks

LinkedIn opened with three engineering presentations:

- **Fedor Borisyuk**: Semantic search at scale
- **Zhipeng Wang**: Modeling optimizations for LLM-driven ranking
- **Sundara Raman Ramachandran**: LLM inference infrastructure optimizations, including a prefill-only serving path delivering **2–3× throughput gains on H100s**, upstreamed back to SGLang

<img src="/images/blog/gtc2026/linkedin-talks.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">LinkedIn engineers presenting on semantic search, ranking, and inference infrastructure.</p>

Relevant work from LinkedIn's engineering team:
[[1]](https://arxiv.org/abs/2502.14305) [[2]](https://arxiv.org/abs/2602.07309) [[3]](https://arxiv.org/abs/2510.22101) [[4]](https://arxiv.org/abs/2512.07846) [[5]](https://github.com/linkedin/fmchisel) [[6]](https://openreview.net/forum?id=tyGfwG6xTh) [[7]](https://www.linkedin.com/blog/engineering/ai/scaling-llm-based-ranking-systems-with-sglang-at-linkedin)

#### SGLang: Roadmap and Miles Framework

SGLang core developer **Liangsheng Yin** walked through SGLang's H1 2026 roadmap.

**Mao Cheng** then presented the **Miles RL framework**, addressing training–inference mismatch in production through three core techniques:

1. **Importance sampling corrections**: compensating for distribution shift between training and inference
2. **Inference-training alignment**: ensuring consistency between rollout behavior and gradient updates
3. **Rollout Routing Replay (R3)**: replay-based routing for efficient use of generated rollout data
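
To make the first technique concrete: when rollouts come from a serving engine and gradients from a separate trainer, the two can assign slightly different log-probabilities to the same tokens (different kernels, precision, batching), so the rollout data is mildly off-policy. A standard remedy is to reweight each token by the (truncated) probability ratio between the two policies. The sketch below is an illustrative, simplified version of that idea, not Miles's actual implementation; the function name and clipping constant are hypothetical.

```python
import math

def is_corrected_loss(logp_train, logp_rollout, advantages, clip=2.0):
    """REINFORCE-style per-token loss with a truncated importance-sampling
    correction: each token is reweighted by the probability ratio between
    the training policy and the rollout (inference) policy, clipped to
    bound variance when the two distributions drift apart."""
    terms = []
    for lt, lr, adv in zip(logp_train, logp_rollout, advantages):
        ratio = min(math.exp(lt - lr), clip)  # pi_train / pi_rollout, truncated
        terms.append(-ratio * adv * lt)       # reweighted policy-gradient term
    return sum(terms) / len(terms)
```

When the trainer and the serving engine agree exactly, every ratio is 1 and the loss reduces to the ordinary on-policy estimator; the clip only kicks in where the mismatch is large.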

<img src="/images/blog/gtc2026/linkedin-miles.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">Mao Cheng presenting the Miles RL framework and its approach to training–inference alignment.</p>

#### Industry Speakers

- **Hongyu Lu (TikTok)**: LLM search at scale
- **Luke Simon and Xi Liu (Meta)**: Generative Reasoning Reranker [[paper link]](https://lnkd.in/gGFwdkJw)
- **Anish Maddipoti (NVIDIA)**: Dynamo + NeMoRL

#### Panel Discussion

The closing panel, hosted by Qing Lan, featured Wenfeng Zhuo, Fedor Borisyuk, Luke Simon, and Mao Cheng. Topics included:

- Semantic ID vs. embedding retrieval
- Whether unified retrieval + ranking (OneRec-style systems) is production-ready
- Inference and training challenges in LLM recsys
- Recent breakthroughs accelerating LLM adoption for recommendations
- The role of continuous learning in production recommendation systems

<img src="/images/blog/gtc2026/linkedin-panel.jpg" style="display: block; margin: 20px auto 0; width: 75%; max-width: 100%; height: auto;" />
<p style="text-align: center; color: #666; font-style: italic;">The closing panel: Wenfeng Zhuo, Fedor Borisyuk, Luke Simon, and Mao Cheng, moderated by Qing Lan.</p>

This is exactly the kind of collaboration that will define the next generation of recommendation systems: production teams and open-source infrastructure co-evolving.

📝 [LinkedIn recap post](https://www.linkedin.com/feed/update/urn:li:activity:7440847574133190656)

## Looking Ahead

GTC 2026 made clear how much the production ecosystem is converging around open-source infrastructure. From semantic search at LinkedIn scale to RL post-training for frontier MoE models, SGLang is increasingly the shared layer underneath.

We'll keep building in the open. Follow our [Luma calendar](https://luma.com/SGLang) for future meetups, office hours, and community events.