ci: live community-stack integration + minimal basic example #147

Merged
saurabhjain1592 merged 1 commit into main from feature/qf-10-java-integration on Apr 28, 2026
@saurabhjain1592


Summary

Closes the QF-14 (Java SDK) row in the Phase 1 quality-freeze epic
(axonflow-business-docs/engineering/QUALITY_FREEZE_EPIC_2026-04-24.md).

The integration-test job in ci.yml was running mvn verify -DskipUnitTests
against WireMock with continue-on-error: true, which made it non-blocking
theater. It never exercised a real agent, and when the WireMock stubs
drifted from the real wire shape (they had), the suppression hid the
failure.

This PR replaces it with the Go SDK pattern: a dedicated integration.yml
that runs the WireMock contract layer as a real blocking job AND adds a
live-stack job that clones the community repo, brings up docker compose,
and runs a real example.

Ported from: axonflow-sdk-go/.github/workflows/integration.yml.

What this PR does

CI

  • contract-integration (every PR + push, NO continue-on-error):
    mvn verify -DskipUnitTests. Same WireMock test suite as before, but
    now blocks merges if it fails.
  • live-integration (push to main + schedule + workflow_dispatch;
    skipped on PR per Go pattern): clones community repo, docker compose up, installs the local SNAPSHOT to ~/.m2, runs
    examples/basic end-to-end against http://localhost:8080.
  • ci.yml loses its stale integration-test job — same mvn verify
    step now lives in integration.yml without the continue-on-error
    suppression.

New example: examples/basic/

One pom.xml + one Java class (Basic.java). Mirrors the Go SDK's
examples/basic/main.go structure: client init from env, health check,
single proxyLLMCall round-trip. The live-integration job runs it
against the live community stack as a smoke test. Issue #146 tracks the
follow-up planning + connectors example dirs, which were scoped out
per the QF-10 "trivially small" constraint.

Pre-existing test fix surfaced by removing the suppression

AxonFlowIntegrationTest#generatePlanShouldReturnPlan stubbed
/api/v1/orchestrator/plan, but the Java SDK actually POSTs to
/api/request with request_type=multi-agent-plan and parses the
agent's enveloped {success, plan_id, data: {steps, domain, ...}, metadata} shape. Updated stub URL + response body to match the real
wire contract. Per the freeze policy ("no bug found gets postponed"),
this is fixed inside the same PR — it's exactly the class of drift the
old continue-on-error: true was hiding.
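
As a rough illustration of the corrected contract, the sketch below stands in for the WireMock stub using only the JDK's built-in HttpServer (swapped in so the snippet runs without test dependencies; the actual test uses WireMock). The class name PlanContractStub, the JSON layout of the request body, and every field value are assumptions made for illustration. Only the path /api/request, request_type=multi-agent-plan, and the envelope keys come from the description above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the WireMock stub; not the SDK's real test code.
final class PlanContractStub {
    // Envelope keys follow the PR description; the values are made up.
    static final String ENVELOPE =
        "{\"success\":true,\"plan_id\":\"plan-123\","
        + "\"data\":{\"steps\":[],\"domain\":\"generic\"},\"metadata\":{}}";

    /** Starts a stub on an ephemeral port that only serves /api/request. */
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/api/request", exchange -> {
            String body = new String(
                exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            // Assumption: request_type travels as a JSON field in the POST body.
            boolean ok = "POST".equals(exchange.getRequestMethod())
                && body.contains("\"request_type\":\"multi-agent-plan\"");
            byte[] reply = (ok ? ENVELOPE : "{\"success\":false}")
                .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(ok ? 200 : 400, reply.length);
            try (var out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    /** POSTs a JSON body to the given path on the stub and returns the response. */
    static HttpResponse<String> post(HttpServer server, String path, String json)
            throws IOException, InterruptedException {
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + path);
        HttpRequest request = HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

A request to the old path /api/v1/orchestrator/plan has no registered context in this stub, so it is answered with 404. That is the kind of mismatch a blocking contract job surfaces immediately.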

Self-review checklist

  • Action versions pinned (@v4)
  • permissions: contents: read scoped tight
  • No new secret refs introduced
  • No continue-on-error: true anywhere
  • Contract-integration job runs in <10 min (PR-blocking budget)
  • Live-integration timeout 25 min matches Go SDK + Java's slower start
  • No pom.xml version bump (per feedback_no_per_pr_version_bumps.md)
  • No AI attribution

Test plan

  • mvn test -B (full unit suite): BUILD SUCCESS — 1204 tests, 0 fail
  • mvn verify -DskipUnitTests -B (WireMock contract suite): BUILD
    SUCCESS — 12/12 in AxonFlowIntegrationTest, after fixing
    generatePlanShouldReturnPlan
  • mvn install -DskipTests -B then cd examples/basic && mvn -B compile: BUILD SUCCESS
  • CI: contract-integration green on this PR
  • CI: live-integration green on push to main (post-merge)

Out of scope (filed separately)

The integration-test job in ci.yml ran `mvn verify -DskipUnitTests` against
WireMock with `continue-on-error: true`, which masked failures and never
exercised a real agent. Replaces it with a Go-pattern dedicated workflow:
the WireMock contract layer becomes its own blocking job with no suppression,
plus a new live-stack job that clones the community repo, brings up
docker compose, and runs a real example.

Mirrors axonflow-sdk-go/.github/workflows/integration.yml.

Changes:

- .github/workflows/integration.yml (new):
  - contract-integration: `mvn verify -DskipUnitTests` (WireMock).
    Runs on every PR + push. NO continue-on-error.
  - live-integration: clone community stack, docker compose up, run
    examples/basic against http://localhost:8080. Skipped on PR
    (Go pattern); runs on push to main + schedule + workflow_dispatch.
- .github/workflows/ci.yml: drop the stale `integration-test` job
  (continue-on-error wrapper around `mvn verify -DskipUnitTests`).
  The same step now lives in integration.yml without the suppression.
- examples/basic/: new minimal example exercising healthCheck() +
  proxyLLMCall() against a running agent. One pom.xml + one Java class.
  pom.xml declares axonflow-sdk 6.1.0; the live-integration job
  installs the local SNAPSHOT before running.

Pre-existing test fix (caught and fixed when the suppression came off):

- AxonFlowIntegrationTest#generatePlanShouldReturnPlan stubbed
  /api/v1/orchestrator/plan, but the Java SDK actually POSTs to
  /api/request with request_type=multi-agent-plan and reads the
  agent's enveloped {success, plan_id, data: {steps, domain, ...}}
  shape. Updated stub URL + response body to match the real wire
  contract. This is the kind of drift `continue-on-error: true` was
  hiding.

Issue #146 tracks the follow-up planning + connectors example dirs
(scoped out of this PR per QF-10's "trivially small" constraint).

Part of the Phase 1 quality-freeze QF-14 row.
@saurabhjain1592 saurabhjain1592 merged commit 4684ffa into main Apr 28, 2026
11 checks passed
@saurabhjain1592 saurabhjain1592 deleted the feature/qf-10-java-integration branch April 28, 2026 14:45
saurabhjain1592 added a commit that referenced this pull request Apr 28, 2026
* review fixes: integration.yml, basic example, surefire binding

Deep-review pass over QF-10 Java workstream (#147) caught several issues.
All P0+P1 findings addressed here.

Workflow correctness:
- `mvn verify -DskipUnitTests` was running unit tests AND integration
  tests redundantly because `skipUnitTests` was an unbound property —
  the flag was a no-op. Bound to surefire's <skipTests> via
  ${skipUnitTests} property in pom.xml; default false. Verified locally:
  `mvn verify -DskipUnitTests=true` now runs only the 12 failsafe
  integration tests, not 1211 surefire tests as well.
- Contract-integration job now matrixed across Java 11/17/21 (was JDK
  17 only). Matches the unit-test matrix in ci.yml so JDK-specific drift
  in WireMock contract tests fails at PR time.
- `mvn install -DskipTests` in live-integration now wrapped in a
  3-attempt retry loop, matching the pattern already used in ci.yml's
  Build/Test steps for transient Maven Central flakes.
- Example pom version pinning closed: live-integration now rewrites
  `examples/basic/pom.xml`'s axonflow-sdk version to match the parent
  pom's project.version before running. Previously the example pinned
  6.1.0 literally, so when the parent bumped (e.g. to 6.2.0) the example
  silently kept resolving 6.1.0 from Maven Central instead of the
  freshly-installed local SNAPSHOT.
- Logs ordering: capture docker logs BEFORE compose down (compose down
  destroys the containers, so the previous step, which ran AFTER
  teardown, always returned empty).
- Persist logs to disk + upload as actions/upload-artifact so failure
  triage doesn't require re-running.
- Failsafe-reports artifact also uploaded on contract-integration
  failure (per JDK).
- docker compose up -d --wait --wait-timeout 120 — let compose's
  healthcheck do the polling work; kept curl /health loop as
  belt-and-suspenders.
- concurrency.group on workflow — back-to-back pushes won't spawn
  parallel docker stacks.
- Cron moved from Monday 06:00 UTC to Tuesday 06:00 UTC — failures
  land in EU/IN working hours, not weekend handover.
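
The 3-attempt retry around `mvn install` lives in the workflow's shell step. For readers more at home in Java, the same shape can be sketched as a small helper. The class and method names (Retry, withAttempts) and the fixed pause between attempts are assumptions for illustration, not code from this repository.

```java
import java.util.concurrent.Callable;

// Hypothetical helper illustrating a bounded retry; not part of the SDK.
final class Retry {
    /**
     * Calls the action up to maxAttempts times, pausing between failures.
     * Returns the first successful result, or rethrows the last failure.
     */
    static <T> T withAttempts(int maxAttempts, long pauseMillis, Callable<T> action)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt remains.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

Capping the attempts keeps a genuinely broken build from looping until the job timeout, while still absorbing a transient download failure.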

examples/basic/Basic.java:
- AxonFlow now created in try-with-resources. Without this OkHttp's
  dispatcher + connection pool keep the JVM alive ~60s after main()
  returns, eating into the workflow's 90s exec budget.
- userToken("demo-user") removed from the ClientRequest builder. The
  agent's JWT middleware rejects literal non-JWT strings; the SDK
  auto-populates user_token from clientId when omitted, which is what
  the example wanted.
- Catch-all `catch (Exception e)` replaced with narrow handlers:
  AuthenticationException + ConnectionException now exit non-zero
  (real failures), PolicyViolationException is logged as a valid
  outcome, only RuntimeException not in those classes is treated as
  community-fail-open. Smoke no longer masks regressions as success.
- Reads AXONFLOW_CLIENT_ID/SECRET from env without silent default —
  matches the TS basic example's pattern, exits 1 if missing.
- Added Step 3: listConnectors() — read-only, works on community,
  exercises a 4th SDK surface for free coverage.
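
The error-handling rules above can be sketched as follows. This is a minimal illustration, not the contents of Basic.java: the Client interface and the three exception classes are local stand-ins named after the SDK types mentioned in the list, and BasicSmokeSketch, requireEnv, and run are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch of the smoke example's exit-code policy.
final class BasicSmokeSketch {
    // Local stand-ins for the SDK exception types named in the review notes.
    static class AuthenticationException extends RuntimeException {
        AuthenticationException(String message) { super(message); }
    }
    static class ConnectionException extends RuntimeException {
        ConnectionException(String message) { super(message); }
    }
    static class PolicyViolationException extends RuntimeException {
        PolicyViolationException(String message) { super(message); }
    }

    /** Stand-in for the SDK client; being AutoCloseable is the relevant part. */
    interface Client extends AutoCloseable {
        String proxyLLMCall(String query);
        @Override void close();
    }

    /** Reads a required variable with no silent default. */
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(name + " must be set");
        }
        return value;
    }

    /** Returns the process exit code the smoke run should report. */
    static int run(Client client) {
        // try-with-resources closes the client on every path.
        try (Client c = client) {
            c.proxyLLMCall("hello");
            return 0;
        } catch (AuthenticationException | ConnectionException e) {
            return 1; // real failure: bad credentials or unreachable agent
        } catch (PolicyViolationException e) {
            return 0; // a policy block is a valid outcome for a smoke run
        } catch (RuntimeException e) {
            return 0; // anything else is treated as community fail-open
        }
    }
}
```

In a real main(), the value returned by run would be passed to System.exit, and requireEnv would be called with System.getenv() before the client is built.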

CHANGELOG: example added + skipUnitTests fix listed under
[Unreleased] / Added + Fixed (user-facing).

* ci: skip jacoco when skipUnitTests is set

The contract-integration job runs `mvn verify -DskipUnitTests=true`,
which now actually skips surefire (per the pom binding in this PR).
But jacoco:check (bound to verify) still ran and failed: "branches
covered ratio is 0.04, but expected minimum is 0.35" — coverage data
came only from the 12 failsafe tests because surefire was skipped.

Coverage gating is the unit-test job's responsibility (ci.yml
`build (17)`). Pass `-Djacoco.skip=true` here so the contract job
doesn't double up on coverage enforcement.