perf: optimize core hot paths (chain, context, binding, responses) #3008
vishr wants to merge 3 commits
@@ -0,0 +1,79 @@
# Echo sonic JSON serializer
This could just be an example in the cookbook (https://echo.labstack.com/docs/category/cookbook). I think a PR to https://github.com/labstack/echox/tree/master/cookbook would make more sense than adding this submodule.
Agreed — removed the sonic submodule from this PR. It'll go to echox/cookbook as a runnable example (with the decode-wins / arm64-encode caveat that's the genuinely useful part). This PR is now purely core hot-path perf. Thanks!
@vishr, please check your email and respond to me.
- echo: compile global/pre middleware chains once instead of per request, eliminating per-request closure allocations (5 mw: 101ns/5allocs -> 34ns/0allocs)
- context: zero-copy String/HTML/JSONP writes, reuse delayedStatusWriter (guarded against re-entrant c.JSON) and the store map across requests, drop deferred unlock on Get/Set, single-key QueryParam fast path (199ns/4allocs -> 41ns/0allocs)
- bind: cache per-type struct field metadata (bindData -48%, query Bind -28%)
- add hot-path benchmark suite and pooling/dispatch regression tests

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
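The compile-once idea in this commit can be illustrated with a minimal sketch. The `Context`, `Handler`, `Middleware`, `tag`, `compile`, and `dispatch` names below are hypothetical stand-ins and not Echo's actual types or the code in `buildRouterChains`. The sketch assumes the chain ends in a fixed terminal step that calls whatever handler routing stored on the context, which is what lets a single chain serve every route.

```go
package main

import "fmt"

// Context, Handler and Middleware are simplified stand-ins, not Echo's real types.
type Context struct {
	Path    string
	Matched Handler // filled in by routing before the terminal step runs
}

type Handler func(c *Context) string
type Middleware func(next Handler) Handler

// wraps counts how often a middleware wraps a handler (one closure each time).
var wraps int

// tag returns a hypothetical middleware that prefixes the response with name.
func tag(name string) Middleware {
	return func(next Handler) Handler {
		wraps++
		return func(c *Context) string { return name + ">" + next(c) }
	}
}

// compile applies middleware in reverse so the first one ends up outermost.
func compile(h Handler, mws ...Middleware) Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// dispatch is the fixed terminal step: it calls whatever handler routing matched.
func dispatch(c *Context) string { return c.Matched(c) }

func main() {
	mws := []Middleware{tag("a"), tag("b")}
	echoPath := func(c *Context) string { return c.Path }

	// Per-request wrapping: every request rebuilds the closures.
	for i := 0; i < 3; i++ {
		compile(dispatch, mws...)(&Context{Path: "/x", Matched: echoPath})
	}
	fmt.Println("per-request wraps:", wraps)

	// Compile once, then reuse the same chain for every request.
	wraps = 0
	chain := compile(dispatch, mws...)
	for i := 0; i < 3; i++ {
		chain(&Context{Path: "/x", Matched: echoPath})
	}
	fmt.Println("compile-once wraps:", wraps)
}
```

In this toy version, the wrap count grows with the number of requests in the first loop and stays at the number of middleware in the second, which is the allocation pattern the commit message describes.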
96d6496 to 34b33be
The 4 sub-tests reused a hard-coded port (1323) sequentially; the next bind raced with the prior server's shutdown/socket release and failed on CI. Use :0 and dial the address reported by ListenerAddrFunc, preserving the network-family (tcp/tcp4/tcp6) intent without the fixed-port race. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t, test cached bind errors
- context: correct newContext comment (Reset clears the store map, no longer nils it); document that only json()'s nested-guard may point the response at &c.dsw
- test: deterministic cold-then-warm bind ensures the per-type cache preserves field-name conversion errors regardless of suite ordering

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
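The per-type cache that this test exercises can be sketched roughly as follows. This is an illustration under assumptions, not the code in `bind.go`: `fieldMeta`, `fieldsOf`, `bindQuery`, and `Filter` are hypothetical names, only string fields with a `query` tag are handled, and the error wrapping mentioned in the commit is left out.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// fieldMeta is the per-field data worth computing only once per struct type.
type fieldMeta struct {
	Index int    // position of the field in the struct
	Name  string // value of the `query` tag
}

var metaCache sync.Map // reflect.Type -> []fieldMeta

// parses counts cache misses; demo only, not safe for concurrent use.
var parses int

// fieldsOf returns cached metadata for t, parsing struct tags on first use.
func fieldsOf(t reflect.Type) []fieldMeta {
	if cached, ok := metaCache.Load(t); ok {
		return cached.([]fieldMeta)
	}
	parses++
	var metas []fieldMeta
	for i := 0; i < t.NumField(); i++ {
		if tag, ok := t.Field(i).Tag.Lookup("query"); ok && tag != "-" {
			metas = append(metas, fieldMeta{Index: i, Name: tag})
		}
	}
	actual, _ := metaCache.LoadOrStore(t, metas)
	return actual.([]fieldMeta)
}

// Filter is a hypothetical bind target.
type Filter struct {
	Name string `query:"name"`
	Page string `query:"page"`
}

// bindQuery copies matching values into dst, which must point to a struct
// whose tagged fields are strings.
func bindQuery(dst any, values map[string]string) {
	v := reflect.ValueOf(dst).Elem()
	for _, m := range fieldsOf(v.Type()) {
		if s, ok := values[m.Name]; ok {
			v.Field(m.Index).SetString(s)
		}
	}
}

func main() {
	var cold, warm Filter
	bindQuery(&cold, map[string]string{"name": "a", "page": "1"}) // cold: parses tags
	bindQuery(&warm, map[string]string{"name": "b", "page": "2"}) // warm: cache hit
	fmt.Println(cold, warm, "tag parses:", parses)
}
```

A cold-then-warm pair like the one in `main` is the shape of test the commit describes: the second bind must behave the same as the first even though it never re-reads the struct tags.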
@vishr, please check your emails: v@labstack.com and vr@labstack.com
I did. Are you referring to these comments or something else?
about May 23/26/27 2026 emails to
Yes, let me reply.
Summary
Optimizes Echo's per-request hot paths to remove avoidable allocations and CPU work. No public API changes; the standard-library JSON serializer remains the default. All numbers are `benchstat` medians (n=8, Apple M3 Max / arm64, Go 1.26).
What changed
Core
- Global/pre middleware chains are compiled once (`echo.go`, `buildRouterChains`) and reused, instead of re-wrapping closures on every request. Routing stores the matched handler on the `Context`.
- context (`context.go`): zero-copy `String`/`HTML`/`JSONP` writes (write-only `unsafe` view), reuse of `delayedStatusWriter` (guarded against re-entrant `c.JSON`) and the store map across requests, inline `Get`/`Set` unlock, and a single-key `QueryParam` fast path proven byte-for-byte equal to `url.ParseQuery().Get` (incl. malformed escapes / `;` / `+`).
- bind (`bind.go`): per-`reflect.Type` field-metadata cache so struct tags are parsed once per type, not per request. Preserves the field-name error wrapping from "fix(binder): include field name in bind conversion errors (#2629)" #3005.
- … (`secure.go`); pool the request-ID `randomString` scratch buffers (`util.go`).
- test: de-flake `TestStartConfig_WithListenerNetwork` (ephemeral port instead of a hard-coded one). Separable commit; fixes a pre-existing CI flake.
Performance (before → after)
Benchmarked rows: `Set` per request, `QueryParam` (single key), `String()` response, `JSON()` response, `Bind` query (5 fields), `bindData` w/ tags.
Headline: the middleware path and the `Set`/`QueryParam` paths are now allocation-free; binding is 28–48% faster.
Router: profiled, intentionally untouched
`-cpuprofile` shows the router is already 0 allocs/op, with time dominated by the irreducible LCP byte-loop (58%) and method switch (11%). I implemented the httprouter `indices`/`IndexByte` trick for `findStaticChild` and measured a 30–37% regression on hits. Echo's nodes have small fan-out, where the inlined linear scan beats a non-inlined `IndexByte` call, so it was reverted. No router change.
Using a faster JSON encoder (e.g. sonic)
This PR does not bundle sonic. The `echo.JSONSerializer` interface already lets any app swap encoders in ~10 lines.
Measured (this machine, arm64): sonic decode −44% (a clear win on any arch), encode +43% (arm64 is sonic's weak arch; usually a win on amd64). A full cookbook example with these caveats will be a separate PR to labstack/echox.
Testing
- `go test ./...` + `-race` pass; `gofmt` + `go vet` clean.
- … `Reset`, JSON status across `Reset`, nested `c.JSON`, global/pre middleware on 404/405/OPTIONS, `randomString` concurrency, query fast-path stdlib-equivalence.
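A stdlib-equivalence check for a query fast path might look like the sketch below. This is not Echo's implementation: `queryParam` and `slowQueryParam` are hypothetical helpers, and the fast path here is deliberately conservative, handing anything containing `&`, `;`, `%`, or `+` to `url.ParseQuery` instead of handling those cases itself.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// slowQueryParam is the reference behavior: parse everything, keep whatever
// parsed successfully, and return the first value for name.
func slowQueryParam(rawQuery, name string) string {
	values, _ := url.ParseQuery(rawQuery)
	return values.Get(name)
}

// queryParam takes a shortcut for a single key=value pair that needs no
// unescaping, and falls back to the reference path for everything else.
func queryParam(rawQuery, name string) string {
	if rawQuery != "" && !strings.ContainsAny(rawQuery, "&;%+") {
		key, value, _ := strings.Cut(rawQuery, "=")
		if key == name {
			return value
		}
		return ""
	}
	return slowQueryParam(rawQuery, name)
}

func main() {
	queries := []string{"", "a", "a=", "a=1", "a=b=c", "=v", "b=2", "a=%zz", "a=1;b=2", "a=x+y", "a=1&a=2"}
	names := []string{"a", "b", ""}

	mismatches := 0
	for _, q := range queries {
		for _, n := range names {
			if fast, slow := queryParam(q, n), slowQueryParam(q, n); fast != slow {
				mismatches++
				fmt.Printf("mismatch for %q / %q: %q vs %q\n", q, n, fast, slow)
			}
		}
	}
	fmt.Println("mismatches:", mismatches)
}
```

Comparing against `url.ParseQuery` over a table that includes malformed escapes and separators is what makes a shortcut like this safe to keep: any divergence shows up as a mismatch rather than as a behavior change in production.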