Skip to content

review fixes: integration.yml, basic example, surefire binding#149

Merged
saurabhjain1592 merged 2 commits into
mainfrom
feature/qf-10-java-review-fixes
Apr 28, 2026
Merged

review fixes: integration.yml, basic example, surefire binding#149
saurabhjain1592 merged 2 commits into
mainfrom
feature/qf-10-java-review-fixes

Conversation

@saurabhjain1592

Copy link
Copy Markdown
Member

Summary

Deep-review pass over the QF-14 Java workstream (#147) caught several P0/P1 issues. All addressed in this single follow-up. CHANGELOG updated.

P0 — real bugs

  1. Basic.java never closed AxonFlow — the class implements Closeable and shuts down OkHttp's dispatcher + connection pool in close(). Without try-with-resources, OkHttp's non-daemon threads keep the JVM alive ~60s after main returns; the workflow's timeout 90 mvn -q compile exec:java was burning most of its budget waiting for JVM exit. Wrapped the client lifecycle in try-with-resources.

  2. userToken("demo-user") was rejected by the agent's JWT middleware. Real stacks validate user_token against a JWT secret; sending a literal non-JWT string returns 401. The catch-all catch (Exception e) then printed it as "expected on community without LLM", masking auth failure as success. Removed .userToken(...) from the ClientRequest builder — the SDK auto-populates user_token from clientId when omitted (AxonFlow.java:1304-1307).

  3. Catch-all Exception swallowed everything in Step 2 — AuthenticationException, ConnectionException, even NullPointerException from a SDK regression. Narrowed to specific handlers: Authentication/Connection exit non-zero, PolicyViolation is a valid outcome, only RuntimeException not in those classes is treated as community-fail-open. The smoke is no longer the same kind of theater that the old continue-on-error: true was.

P1 — workflow / pom hygiene

  1. -DskipUnitTests was an unbound property — running unit tests redundantly. pom.xml had no binding for skipUnitTests, so mvn verify -DskipUnitTests -B -U ran the 1211 surefire unit tests AND the 12 failsafe integration tests every PR. Bound to surefire's <skipTests> via the new <skipUnitTests>false</skipUnitTests> property. Verified locally: mvn verify -DskipUnitTests=true now runs only the 12 failsafe tests.

  2. mvn install -DskipTests had no retry in the live-integration job, while every other Maven step in ci.yml has a 3-attempt retry loop for transient Central flakes. Single Central blip would kill the whole job. Added the same loop.

  3. examples/basic/pom.xml version pin would silently drift on parent bump. It pinned <version>6.1.0</version>, and 6.1.0 is on Maven Central. When the parent bumps to 6.2.0, the example would keep resolving the OLD 6.1.0 from Central instead of the freshly-installed local SNAPSHOT — testing the wrong SDK against the new agent. Added a workflow step that rewrites the example pom's version to match mvn help:evaluate -Dexpression=project.version before running. Pin in the file is now informational; CI guarantees the dep matches.

  4. Contract-integration was JDK 17 only; the unit-test matrix in ci.yml is 11/17/21. Matrixed contract-integration the same way to catch JDK-specific WireMock drift at PR time. Cheap (~3 minutes per JDK).

  5. No artifact upload for triage. Added actions/upload-artifact@v4 for failsafe-reports (per JDK) on contract-integration failure, and docker-compose-logs on live-integration failure. Logs are now persisted to disk before teardown, captured even when triage happens hours later.

P1 — workflow correctness

  1. Show docker logs on failure ran AFTER Stop community stackcompose down --volumes --remove-orphans destroys the containers; subsequent compose logs returns nothing. Reordered: capture/upload first (if: failure()), then teardown (if: always()).

P2 — hardening

  • docker compose up -d --wait --wait-timeout 120 so compose's own healthcheck does part of the wait work.
  • concurrency.group: integration-${{ github.ref }} so back-to-back pushes don't spawn parallel docker stacks.
  • Cron moved from '0 6 * * 1' (Monday 06:00 UTC) to '0 6 * * 2' (Tuesday 06:00 UTC) — failures land in EU/IN working hours.
  • Basic.java Step 3: added listConnectors() — free coverage of a 4th SDK surface, read-only, works on community.

CHANGELOG

User-facing entries under [Unreleased]:

  • Added: examples/basic/ (was implicit before — example existed but wasn't in CHANGELOG)
  • Fixed: mvn verify -DskipUnitTests=true now actually skips unit tests

CI / workflow plumbing intentionally NOT in CHANGELOG per feedback_changelog_no_internal_entries.md.

Self-review checklist

  • Action versions pinned (@v4)
  • permissions: contents: read scoped tight
  • No new secret refs
  • No continue-on-error: true
  • No pom.xml version bump
  • No AI attribution

Test plan

  • mvn install -DskipTests -B clean (parent + example)
  • mvn test -B (full unit suite): BUILD SUCCESS — 1211 tests, 0 fail
  • mvn verify -DskipUnitTests=true -B: BUILD SUCCESS, only 12 failsafe tests run (was running 1211+12 redundantly before this PR)
  • cd examples/basic && mvn -B compile clean
  • Version-injection regex tested locally on a synthetic pom — rewrites 5.5.0 → parent version correctly across whitespace boundaries
  • CI: contract-integration green on JDKs 11/17/21
  • CI: live-integration green on push to main (basic example smoke + version sync + listConnectors)

Deep-review pass over QF-10 Java workstream (#147) caught several issues.
All P0+P1 findings addressed here.

Workflow correctness:
- `mvn verify -DskipUnitTests` was running unit tests AND integration
  tests redundantly because `skipUnitTests` was an unbound property —
  the flag was a no-op. Bound to surefire's <skipTests> via
  ${skipUnitTests} property in pom.xml; default false. Verified locally:
  `mvn verify -DskipUnitTests=true` now runs only the 12 failsafe
  integration tests, not 1211 surefire tests as well.
- Contract-integration job now matrixed across Java 11/17/21 (was JDK
  17 only). Matches the unit-test matrix in ci.yml so JDK-specific drift
  in WireMock contract tests fails at PR time.
- `mvn install -DskipTests` in live-integration now wrapped in a
  3-attempt retry loop, matching the pattern already used in ci.yml's
  Build/Test steps for transient Maven Central flakes.
- Example pom version pinning closed: live-integration now rewrites
  `examples/basic/pom.xml`'s axonflow-sdk version to match the parent
  pom's project.version before running. Previously the example pinned
  6.1.0 literally, so when the parent bumped (e.g. to 6.2.0) the example
  silently kept resolving 6.1.0 from Maven Central instead of the
  freshly-installed local SNAPSHOT.
- Logs ordering: capture docker logs BEFORE compose down (compose down
  destroys the containers, so the previous step ran AFTER teardown
  always returned empty).
- Persist logs to disk + upload as actions/upload-artifact so failure
  triage doesn't require re-running.
- Failsafe-reports artifact also uploaded on contract-integration
  failure (per JDK).
- docker compose up -d --wait --wait-timeout 120 — let compose's
  healthcheck do the polling work; kept curl /health loop as
  belt-and-suspenders.
- concurrency.group on workflow — back-to-back pushes won't spawn
  parallel docker stacks.
- Cron moved from Monday 06:00 UTC to Tuesday 06:00 UTC — failures
  land in EU/IN working hours, not weekend handover.

examples/basic/Basic.java:
- AxonFlow now created in try-with-resources. Without this OkHttp's
  dispatcher + connection pool keep the JVM alive ~60s after main()
  returns, eating into the workflow's 90s exec budget.
- userToken("demo-user") removed from the ClientRequest builder. The
  agent's JWT middleware rejects literal non-JWT strings; the SDK
  auto-populates user_token from clientId when omitted, which is what
  the example wanted.
- Catch-all `catch (Exception e)` replaced with narrow handlers:
  AuthenticationException + ConnectionException now exit non-zero
  (real failures), PolicyViolationException is logged as a valid
  outcome, only RuntimeException not in those classes is treated as
  community-fail-open. Smoke no longer masks regressions as success.
- Reads AXONFLOW_CLIENT_ID/SECRET from env without silent default —
  matches the TS basic example's pattern, exits 1 if missing.
- Added Step 3: listConnectors() — read-only, works on community,
  exercises a 4th SDK surface for free coverage.

CHANGELOG: example added + skipUnitTests fix listed under
[Unreleased] / Added + Fixed (user-facing).
The contract-integration job runs `mvn verify -DskipUnitTests=true`,
which now actually skips surefire (per the pom binding in this PR).
But jacoco:check (bound to verify) still ran and failed: "branches
covered ratio is 0.04, but expected minimum is 0.35" — coverage data
came only from the 12 failsafe tests because surefire was skipped.

Coverage gating is the unit-test job's responsibility (ci.yml
`build (17)`). Pass `-Djacoco.skip=true` here so the contract job
doesn't double up on coverage enforcement.
@saurabhjain1592 saurabhjain1592 merged commit 6fa1e62 into main Apr 28, 2026
14 checks passed
@saurabhjain1592 saurabhjain1592 deleted the feature/qf-10-java-review-fixes branch April 28, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant