Pull requests: sgl-project/mini-sglang
[Feature] Vanilla speculative decoding (standalone, opt-in offline interface)
#134
opened May 24, 2026 by
javierlimt6
5 tasks done
[Feature] Add optional FP8 (float8_e4m3fn) KV cache pool
#132
opened May 16, 2026 by
javierlimt6
6 of 7 tasks
[Feature] Add usage token counting and finish_reason to API responses
observability
Tools and changes for debugging, profiling, tracing, logging, metrics, and runtime diagnostics
#126
opened May 8, 2026 by
abinggo
Contributor
Feature/support more sampling parameters
enhancement
New feature or request
#121
opened Apr 20, 2026 by
Alise-svg
feat: add configurable scheduling policy
enhancement
New feature or request
#119
opened Apr 20, 2026 by
Alise-svg
feat: implement batch tokenization for TokenizeManager
duplicate
This issue or pull request already exists
enhancement
New feature or request
#117
opened Apr 19, 2026 by
Alise-svg
Feat: Enable structured output in Mini-SGLang
enhancement
New feature or request
#115
opened Apr 2, 2026 by
YzXiao101
Contributor
4 of 6 tasks
Add text-only Llama 4 support to mini-sglang.
enhancement
New feature or request
#114
opened Apr 1, 2026 by
sheepfish5
[Fix] Avoid double-free in overlap_loop if Req is aborted while finishing
bugfix
Fixes incorrect behavior, runtime errors, or regressions.
#111
opened Mar 25, 2026 by
MisakaVan
Contributor
[refactor] enable page size > 1 for fi backend
enhancement
New feature or request
#110
opened Mar 24, 2026 by
zzh-stable
Add request-scoped profiler benchmark flag and compressed trace export
observability
Tools and changes for debugging, profiling, tracing, logging, metrics, and runtime diagnostics
#109
opened Mar 21, 2026 by
CrazyDave999
Optimize load_weight with per-file batch H2D and zero-copy CPU sharding
misc
Minor maintenance changes that do not alter core behavior.
#108
opened Mar 20, 2026 by
staryxchen
Contributor
[Feature] Add reasoning-parser
enhancement
New feature or request
#107
opened Mar 19, 2026 by
jiahe7ay
Contributor
[Fix] Pad the last rank if vocab size is not divisible by tp_size
bugfix
Fixes incorrect behavior, runtime errors, or regressions.
#100
opened Mar 9, 2026 by
cswuyg
Contributor
[Feature] Better estimation policy
enhancement
New feature or request
#97
opened Mar 8, 2026 by
YzXiao101
Contributor
7 tasks done
[Feature] Expert parallelism support for MoE models
enhancement
New feature or request
#96
opened Mar 6, 2026 by
NikitosKh
Contributor
refactor(tests): convert to pytest-style with integration markers
tests
Tests, benchmarks, and other checks for validating correctness, performance, or regressions.
[Feature] Support hierarchical cache
enhancement
New feature or request
poc
Proof-of-concept or experimental changes that are not intended for merge.
#82
opened Feb 24, 2026 by
DarkSharpness
Collaborator
Add graph replay dump tensor tool
observability
Tools and changes for debugging, profiling, tracing, logging, metrics, and runtime diagnostics
#72
opened Jan 30, 2026 by
wlc952
feat: Add INT8 quantization support
enhancement
New feature or request
#57
opened Dec 30, 2025 by
louiswang524
Contributor
perf: Optimize CUDA graph batch size selection and padding
misc
Minor maintenance changes that do not alter core behavior.
#56
opened Dec 30, 2025 by
louiswang524
Contributor
feat: Implement batch tokenization for improved throughput
enhancement
New feature or request
#55
opened Dec 30, 2025 by
louiswang524
Contributor
[Refactor] Restructure test suite to match source layout and isolate benchmarks
tests
Tests, benchmarks, and other checks for validating correctness, performance, or regressions.
#53
opened Dec 29, 2025 by
DhiraPT