fix(telemetry): synchronous send + version-alignment CI #136
Merged
Java SDK had the same telemetry-delivery bug as Python (#1692) and Go (#1693): CompletableFuture.runAsync(lambda) without an explicit executor submits to ForkJoinPool.commonPool(), whose threads are daemon by default since Java 8. When the JVM's main thread exits, those daemon threads are killed mid-flight, and the OkHttpClient.newCall().execute() inside the lambda is abandoned. CLI Java binaries, AWS Lambda Java handlers, serverless cold-starts, and quickstart scripts silently drop telemetry pings with no error visible to the caller. Per the 2026-04-24 SDK-telemetry investigation, this is a likely major contributor to the 0 external-confirmed Java records we've observed to date.

## Fix (TelemetryReporter.java)

- Replaced CompletableFuture.runAsync(...) with synchronous execution. Blocks the caller briefly: ~350ms warm / ~1.3s cold on a reachable checkpoint, bounded at TIMEOUT_SECONDS on an unreachable one. Acceptable for a control-plane SDK's construction path, matching the Go SDK pattern shipped in #1693.
- Shared monotonic deadline across /health probe and checkpoint POST. Previously detectPlatformVersion (2s timeout) and the checkpoint POST (3s timeout) had independent timeouts that could stack to ~5s. detectPlatformVersion now takes a budgetMs parameter derived from the shared deadline; POST uses whatever is left.
- Both operations skip when remaining budget is below MIN_BUDGET_MS (100ms) to avoid issuing calls that are guaranteed to time out.

## Regression test (TelemetryReporterShortLivedTest.java)

Verifies the core invariant: when sendPing returns, the HTTP round-trip has already completed (not still-pending on a dying daemon thread). Uses a WireMock server with a fixed-delay response; measures elapsed time to confirm sendPing actually blocked.

Verified:

- FAIL at 0.070s when reverted to CompletableFuture.runAsync
- PASS at 0.971s with the fix

So a future regression to fire-and-forget is caught by CI, not by missing telemetry in production.

## CHANGELOG

Added [Unreleased] section with terse one-line entries per the telemetry-CHANGELOG-minimal rule: delivery fix + shared-deadline bound + the alignment-check addition below.

## Version alignment CI (.github/scripts + .github/workflows)

Mirrors the pattern just added in axonflow-sdk-go PR #130 and already present in the platform repo. Script compares pom.xml <version> against the first released '## [X.Y.Z]' section in CHANGELOG.md; CI runs on any PR or push to main that touches either file. Prevents the drift pattern where a release ships to Maven Central but the repo's pom stays behind (and the inverse: pom bumped but CHANGELOG still shows the prior version as latest released).

Verified the script locally: PASS on current state (pom 5.7.0 == CHANGELOG 5.7.0); FAIL when pom is manually mismatched to 5.8.0.

Full test suite: 1,200 tests, 0 failures.

Closes getaxonflow/axonflow-enterprise#1706.
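The shared-deadline bound described under "Fix" can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in TelemetryReporter.java: the class name `SharedDeadlineSketch`, its methods, the 3-second total budget, and the 2-second cap on the probe are all hypothetical. Only `MIN_BUDGET_MS = 100` comes from the description above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: one monotonic deadline shared by several sequential calls.
class SharedDeadlineSketch {
    // Value taken from the commit message.
    static final long MIN_BUDGET_MS = 100;

    private final long deadlineNanos;

    SharedDeadlineSketch(long totalBudgetMs) {
        // System.nanoTime() is monotonic, so wall-clock adjustments cannot stretch the budget.
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalBudgetMs);
    }

    /** Milliseconds left before the shared deadline, never negative. */
    long remainingMs() {
        return Math.max(0L, TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime()));
    }

    /** True when enough budget remains for a call to have a realistic chance of finishing. */
    boolean hasBudget() {
        return remainingMs() >= MIN_BUDGET_MS;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedDeadlineSketch deadline = new SharedDeadlineSketch(3_000); // assumed total budget

        if (deadline.hasBudget()) {
            // Assumed cap: the probe never takes more than 2 s of the shared budget.
            long probeBudgetMs = Math.min(2_000, deadline.remainingMs());
            System.out.println("health probe may use up to " + probeBudgetMs + " ms");
            Thread.sleep(250); // stand-in for the /health round-trip
        }
        if (deadline.hasBudget()) {
            System.out.println("POST gets the remaining " + deadline.remainingMs() + " ms");
        } else {
            System.out.println("POST skipped: budget exhausted");
        }
    }
}
```

Because both calls read the same deadline, the worst case is the total budget rather than the sum of two independent timeouts.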
Closes getaxonflow/axonflow-enterprise#1706
Summary
Java SDK had the same telemetry-delivery bug as Python (#1692) and Go (#1693): `CompletableFuture.runAsync(lambda)` without an explicit executor submits to `ForkJoinPool.commonPool()`, whose threads are daemon by default since Java 8. When the JVM's main thread exits, those daemon threads are killed mid-flight — the HTTP POST is abandoned. CLI binaries, AWS Lambda handlers, serverless cold-starts, and quickstart scripts silently drop their telemetry.
Missed yesterday because I tested Python/Go/TS but not Java; surfaced today when reviewing why TS/Java showed near-zero external pings.
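The daemon-thread behaviour behind this bug can be observed with a small probe. This is a sketch, not SDK code: the class and method names are hypothetical, and it submits to `ForkJoinPool.commonPool()` directly, which is the pool the no-executor `runAsync` overload uses when the common pool supports a parallelism of at least two.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical probe: reports whether common-pool workers are daemon threads.
class CommonPoolDaemonProbe {
    /** Runs a trivial task on the common pool and records the worker's daemon flag. */
    static boolean commonPoolWorkerIsDaemon() {
        AtomicBoolean daemon = new AtomicBoolean();
        CountDownLatch done = new CountDownLatch(1);
        ForkJoinPool.commonPool().execute(() -> {
            daemon.set(Thread.currentThread().isDaemon());
            done.countDown();
        });
        try {
            // Wait on a latch rather than joining the task, so the caller never runs it itself.
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("common-pool task did not run");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return daemon.get();
    }

    public static void main(String[] args) {
        System.out.println("common-pool worker is daemon: " + commonPoolWorkerIsDaemon());
        System.out.println("main thread is daemon: " + Thread.currentThread().isDaemon());
    }
}
```

The JVM does not wait for daemon threads at shutdown, so any work still running on such a worker when the last non-daemon thread finishes is simply discarded.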
Fix
`TelemetryReporter.sendPing()`:
Matches the shape shipped for Go SDK in #128 (axonflow-enterprise#1693).
Regression test
`TelemetryReporterShortLivedTest.java` verifies the core invariant: when `sendPing` returns, the HTTP round-trip has already completed — not still pending on a dying daemon thread. Uses a WireMock server with a 200ms fixed-delay response; asserts elapsed time >= 150ms (sync must have blocked). Under revert to `runAsync`: FAIL at 0.070s (returns immediately). With fix: PASS at 0.971s (blocks for the delay).
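The invariant can be illustrated without WireMock or the SDK. The sketch below is an assumption-laden stand-in, not the actual test: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` in place of WireMock, `HttpURLConnection` in place of OkHttp, and a hypothetical `sendPingSync` method and `/checkpoint` path.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the regression test's core check.
class BlockingSendSketch {
    /** Synchronous POST with an empty body: returns only once the response status is known. */
    static int sendPingSync(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
        conn.setRequestMethod("POST");
        conn.setConnectTimeout(3_000);
        conn.setReadTimeout(3_000);
        conn.setDoOutput(true);
        conn.getOutputStream().close(); // send a zero-length body
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Returns {status, elapsedMs, requestsCompletedByServer} for one ping against a slow server. */
    static long[] runScenario(long delayMs) {
        try {
            AtomicInteger hits = new AtomicInteger();
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/checkpoint", exchange -> {
                try {
                    Thread.sleep(delayMs); // fixed-delay response
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                hits.incrementAndGet();
                exchange.sendResponseHeaders(200, -1); // 200 with no response body
                exchange.close();
            });
            server.start();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/checkpoint";
                long start = System.nanoTime();
                int status = sendPingSync(url);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                return new long[] {status, elapsedMs, hits.get()};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = runScenario(200);
        System.out.println("status=" + r[0] + " elapsedMs=" + r[1] + " serverHits=" + r[2]);
    }
}
```

A fire-and-forget send would return well before the server's delay has elapsed, so an elapsed-time floor below the configured delay separates the two behaviours.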
Also included: version alignment CI
New `.github/scripts/validate-version-alignment.sh` + `.github/workflows/validate-version-alignment.yml`.
Mirrors the pattern in the platform repo and the Go SDK's PR #130. Compares `pom.xml` `<version>` against the first released `## [X.Y.Z]` section in `CHANGELOG.md`; CI runs on any PR or push to main that touches either file. Prevents both drift patterns: pom behind CHANGELOG (manifest didn't get bumped) and CHANGELOG behind pom (tag shipped without CHANGELOG entry).
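The comparison itself is simple enough to sketch. The real check is a shell script; the Java below only illustrates the logic and rests on assumptions: the class name and regexes are hypothetical, and it takes the first `<version>` element in the pom, which would be wrong for a pom that declares a `<parent>` version first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the alignment check; the actual CI script is written in shell.
class VersionAlignmentSketch {
    // Assumption: the project's own <version> is the first one in pom.xml.
    private static final Pattern POM_VERSION =
            Pattern.compile("<version>\\s*([^<\\s]+)\\s*</version>");
    // Matches "## [X.Y.Z]" headings only, so "## [Unreleased]" is skipped.
    private static final Pattern CHANGELOG_RELEASE =
            Pattern.compile("(?m)^## \\[(\\d+\\.\\d+\\.\\d+)\\]");

    /** First capture group of the first match, or null when nothing matches. */
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** True when the pom version equals the newest released CHANGELOG version. */
    static boolean aligned(String pomXml, String changelog) {
        String pomVersion = firstMatch(POM_VERSION, pomXml);
        String releasedVersion = firstMatch(CHANGELOG_RELEASE, changelog);
        return pomVersion != null && pomVersion.equals(releasedVersion);
    }

    public static void main(String[] args) {
        String changelog = "# Changelog\n\n## [Unreleased]\n\n## [5.7.0]\n\n## [5.6.0]\n";
        String pom = "<project><version>5.7.0</version></project>";
        System.out.println("aligned: " + aligned(pom, changelog));
    }
}
```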
Verified locally:
Test plan
Post-merge
CHANGELOG has `[Unreleased]` entries. Bundle into the next Java SDK release (v5.7.1 patch or later minor) per the "commit per published version" rule.