Update Netty to 4.2.10.Final#779
Conversation
BREAKING CHANGE: Netty 4.2 promotes io_uring from incubator, replaces netty-codec with netty-codec-compression, changes the default allocator from pooled to adaptive, and removes deprecated ChannelOption/MAX_MESSAGES_PER_READ. - Add kqueue osx-aarch_64 classifier for Apple Silicon - Add netty-pkitesting as dev/test dependency - Exclude legacy BouncyCastle jdk15on artifacts globally - Pin -Dio.netty.allocator.type=pooled during stabilization - Add test-unsafe-deny Lein profile Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
Replace legacy EventLoopGroup constructors with MultiThreadIoEventLoopGroup and IoHandler factories (NioIoHandler, EpollIoHandler, KQueueIoHandler, IoUringIoHandler). Extract channel classes to vars to work around CLJ-2842 eager static init in case expressions. Rename IOUring → IoUring to match Netty 4.2's graduated io_uring package. Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
Add try-certificate-builder that uses Netty 4.2's CertificateBuilder from netty-pkitesting via reflection, with SelfSignedCertificate fallback. Disable builder-level endpointIdentificationAlgorithm to prevent double hostname verification. Replace SelfSignedCertificate in test fixtures with pre-generated EDN certificates. Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
…ests Add test-self-signed-ssl-context covering CertificateBuilder path. Add pipeline-stress-test namespace with 4 stress tests for Netty 4.2 pipeline behavior under concurrent HTTP/2 streams. Stress tests are tagged :stress and excluded from default test runs. Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
There was a problem hiding this comment.
Pull request overview
This pull request upgrades Aleph to Netty 4.2.10.Final, which is a major version upgrade from 4.1.130.Final. The changes modernize transport support, fix KQueue class loading issues on non-macOS systems, and update SSL certificate generation logic for compatibility with newer JDKs (24+).
Changes:
- Upgrades Netty from 4.1.130.Final to 4.2.10.Final (breaking change)
- Refactors event loop groups to use new Netty 4.2 MultiThreadIoEventLoopGroup and IoHandler APIs
- Implements lazy initialization pattern for transport channel classes to prevent KQueue class loading errors during builds on Linux (fixes #756)
- Updates self-signed-ssl-context to use CertificateBuilder (via reflection) with SelfSignedCertificate fallback for JDK 24+ compatibility
- Promotes io_uring transport from incubator package to stable io.netty package
- Adds kqueue native transport for Apple Silicon (osx-aarch_64)
- Disables default HTTPS endpoint identification in Netty 4.2 SSL client context builder (Aleph handles this independently)
- Removes deprecated MAX_MESSAGES_PER_READ channel option
- Adds stress tests for HTTP/2 pipeline behavior under concurrent load
- Updates Bouncy Castle exclusions and adds netty-pkitesting dependency
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| project.clj | Updates Netty version to 4.2.10.Final, adds kqueue osx-aarch_64 support, updates io_uring package, adds netty-pkitesting dependency, adds Bouncy Castle exclusions, and configures pooled allocator during stabilization |
| deps.edn | Mirrors project.clj dependency version updates (auto-generated) |
| src/aleph/netty.clj | Core implementation changes: new imports for Netty 4.2 APIs (IoHandler, MultiThreadIoEventLoopGroup, stable io_uring), refactored event loop group creation, lazy channel class initialization, updated SSL context builder with endpoint identification disabled, new CertificateBuilder reflection logic with fallback |
| test/aleph/tcp_ssl_test.clj | Adds comprehensive tests for self-signed-ssl-context function |
| test/aleph/ssl.clj | Replaces SelfSignedCertificate usage with pre-generated certificate and key for wrong_hostname test scenarios |
| test/wrong_hostname_cert.edn | Pre-generated test certificate data (base64-encoded) |
| test/wrong_hostname_key.edn | Pre-generated test RSA private key data |
| test/aleph/http/pipeline_stress_test.clj | New stress tests for Netty 4.2 pipeline behavior under concurrent HTTP/2 load |
| CHANGES.md | Documents breaking changes and major updates |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ewInstance() Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
… compat Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
…ss.newInstance() fix Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
…nnection pool Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
|
Note on the single failing check: I don't think this is an actual test failure, but I could be mistaken. It appears to be the CircleCI job that patches the GitHub commit status for the manual approval gate, and it seems to be getting a As far as I can tell, all the actual test jobs are green (leak detection and dependency cache on JDK 8 and 21). Would refreshing the token in CircleCI likely resolve this? If helpful, I could also add a small guard so a missing or expired token doesn't fail the build. Please let me know if you think anything is needed on the code side, or if I'm missing something. |
Netty 4.2 changed the default allocator from pooled to adaptive. The pooled pin was added to isolate transport-layer changes during the 4.2 migration. All CI jobs now pass on all platforms, so the pin is removed to adopt the new adaptive allocator. The leak-detection profile retains its unpooled override. Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
When Http2FrameCodec is added to an already-active pipeline (the normal case for both client post-SSL-handshake and server ALPN paths), channelActive is never replayed, so the flushPreface guard in PrefaceDecoder.channelActive never fires. Without an explicit flush, the preface sits in SslHandler.pendingUnencryptedWrites and under certain allocator/timing conditions (e.g., unpooled allocator on JDK 21), a SETTINGS ACK can be flushed before the initial SETTINGS, causing the peer to reject the connection with Http2Exception. Fixes test-compressed-response failure on the leak-detection JDK 21 CI job after the pooled allocator pin was removed. See: netty/netty#12089 Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
…obe detection Replace naive `System/gc` + `System/runFinalization` with a loop using a WeakReference sentinel to confirm GC completion. This ensures reference-processing threads have enough time to run, improving accuracy in detecting leaked resources. Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
Increase `max-probe-gc-runs`, decrease `max-gc-polls`, and introduce a delay in the leak detection loop to improve reliability. Signed-off-by: Robin Lahtinen <robin.lahtinen@gmail.com>
DerGuteMoritz
left a comment
There was a problem hiding this comment.
Thanks a lot for your contribution 🙏 Let me know if you have the capacity to address the feedback - otherwise I'll see to it myself!
| - restore_deps_cache | ||
| - run: | ||
| name: Run tests with leak detection (JDK 21) | ||
| command: lein do clean, with-profile +leak-detection,+jdk-21 test :default+leak |
There was a problem hiding this comment.
Ah very good, updating the CI pipeline to use more recent versions was on my radar, as well. Ideally, we'd use matrix jobs for testing combinations of supported Clojure and JDK versions but we can do that in a follow-up. Filed #780 as a placeholder.
| ;; See https://github.com/netty/netty/issues/12089 | ||
| (let [ch (.channel pipeline)] | ||
| (when (.isActive ch) | ||
| (.flush ch))) |
There was a problem hiding this comment.
Hm, how did you figure this out and are you sure this is indeed a possible explanation? FYI: I've shared this with Netty upstream and they are not convinced so far 🤔
Note that we have #772 to track this particular issue.
| :dropped-error-deferred-detection {:jvm-opts ["-Dorg.slf4j.simpleLogger.log.manifold.debug=warn" | ||
| "-Daleph.testutils.detect-dropped-error-deferreds=true"]} | ||
| :test-unsafe-deny {:jvm-opts ["--sun-misc-unsafe-memory-access=deny"]} | ||
| :jdk-21 {:javac-options ^:replace ["--release" "11"]} |
There was a problem hiding this comment.
What's the reason for this?
| certificate uses weak algorithms and is only for testing. | ||
|
|
||
| Prefers Netty 4.2's `CertificateBuilder` from `io.netty/netty-pkitesting` | ||
| when available (works on all JDKs including 24+). Falls back to Netty's |
There was a problem hiding this comment.
No need to keep around the SelfSignedCertificate fallback - AFAIUI io.netty/netty-pkitesting also works with older JDK versions, right? See also my other comment.
| nil) | ||
| (catch Exception e | ||
| (log/debug e "CertificateBuilder reflection failed, falling back to SelfSignedCertificate") | ||
| nil))) |
There was a problem hiding this comment.
Since this will be a breaking change release anyway, I suggest we move self-signed-ssl-context into a separate namespace aleph.netty.test which people can separately load when needed and then be expected to provide io.netty/netty-pkitesting. That saves us from having to do this cumbersome reflection dance and also makes it more explicit that this function is not intended for production use at the same time.
| :kqueue (KQueueIoHandler/newFactory) | ||
| :io-uring (let [cls (Class/forName "io.netty.channel.uring.IoUringIoHandler") | ||
| m (.getMethod cls "newFactory" (into-array Class []))] | ||
| (.invoke m nil (object-array []))))] |
There was a problem hiding this comment.
I suggest to change it so that we make the caller pass in the factory instance directly instead of re-dispatching on the transport key here, thus keeping create-io-event-loop-group decoupled from the transport-specific shenanigans.
| `(let [server# (http/start-server ~handler (merge server-options ~opts)) | ||
| pool# (make-pool)] | ||
| (try | ||
| (let [~'pool pool#] |
There was a problem hiding this comment.
Holy hygiene breakage 🦇 😬 Please pull out into a dynamic variable like in aleph.http-test or make it an explicit binding passed in by the caller 🙏 In fact, couldn't you simply move these tests to aleph.http-test, as well? I appreciate the impetus to split this monster test namespace up but if we do it, we should use a common set of test utilities.
| (http/get url | ||
| {:pool pool | ||
| :insecure? true | ||
| :body (str "req-" i)})))] |
There was a problem hiding this comment.
Note that this will max out at 8 concurrent requests due to the default connections-per-host. Might be worth making explicit here.
| (System/gc) | ||
| (System/runFinalization) | ||
| ;; Use a WeakReference sentinel to confirm GC has actually run and | ||
| ;; processed references. This is the standard OpenJDK ForceGC pattern. |
There was a problem hiding this comment.
Interesting, I didn't come across this pattern during my research for the initial implementation of the leak detector. In what sense is it a "standard"? A quick web search pointed me to a jlibs project but that seems more like that guy's bag of utils.
In any case, AFAICT it basically amounts to the same idea, just leaking a WeakReference instead of a Netty buffer? What do we gain from doing this in addition? Shouldn't it also suffice to just bump max-probe-gc-runs instead to achieve better detection probability when needed?
| * Load io_uring transport classes lazily via reflection (requires Java 9+; gracefully unavailable on Java 8) | ||
| * Add kqueue `osx-aarch_64` native transport for Apple Silicon | ||
| * Update `self-signed-ssl-context` for JDK 24+ compatibility via Netty `CertificateBuilder`, with `SelfSignedCertificate` fallback | ||
| * Fix deprecated `Class.newInstance()` usage in `CertificateBuilder` reflection (use `Constructor.newInstance()`) |
There was a problem hiding this comment.
No need to mention a fix for something that was never released.
Ah I think the problem is that the environment variable is only available when the pipeline was triggered by a branch on the main repository or so. I just tried to manually re-run it. But yeah, this is indeed not a test failure. It only serves as a workaround to mark the pipeline as completed. Otherwise, it would be indefinitely pending. See https://federicoterzi.com/blog/solving-github-status-stuck-on-pending-with-circlecis-approvals/ if you are interested in the gory details 😅
👍
Right, we could make it not fail in this case but then the pipeline would instead be considered pending (see above). Not sure if that's really an improvement 🤷 |
This pull request upgrades Aleph to
Netty 4.2.10.Final, modernizes transport support, and updates SSL certificate logic for compatibility with newer JDKs.Resolves #760 and fixes #756.
Changes
Netty version upgrade and transport improvements
4.1.130.Finalto4.2.10.Final. This is a breaking change.io_uringtransport from the incubator package to the stableio.nettypackage and updates imports.kqueuenative transport support for macOS on Apple Silicon (osx-aarch_64).IoHandlerandMultiThreadIoEventLoopGroupAPIs in Netty 4.2.SSL context and certificate generation
self-signed-ssl-contextto useCertificateBuilderfor compatibility with JDK 24+.SelfSignedCertificateon older JDKs.Deprecations and cleanup
ChannelOption/MAX_MESSAGES_PER_READoption.Build and testing
:test-unsafe-denyprofile for testing environments with strict memory access flags.WeakReferencesentinel pattern inforce-leak-detection!for deterministic GC confirmation across JDK 8 and JDK 21.CHANGES.mdwith a summary of breaking changes.