test(e2e-http): parametrize compose runner by VECTOR_DB_TYPE / GRAPH_DB_TYPE + 2-combo CI matrix by earayu · Pull Request #1740 · apecloud/ApeRAG

earayu · 2026-04-27T11:42:14Z

Summary

The HTTP E2E suite previously only proved one fixed deployment shape (Qdrant + PG-graph) actually boots end-to-end. This PR parametrizes the compose runner by backend selection and runs the smoke suite under two production-shaped combos:

Lite — VECTOR_DB_TYPE=pgvector, GRAPH_DB_TYPE=postgresql (single-PG ApeRAG-Lite deployment)
Full — VECTOR_DB_TYPE=qdrant, GRAPH_DB_TYPE=neo4j (distributed deployment)
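The two combos reduce to a pair of environment variables each. A minimal sketch of that mapping follows; the helper name `combo_env` is hypothetical and is not taken from the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the backend selection for a named combo.
# The values are the ones listed for Lite and Full in the summary.
combo_env() {
  case "$1" in
    lite) echo "VECTOR_DB_TYPE=pgvector GRAPH_DB_TYPE=postgresql" ;;
    full) echo "VECTOR_DB_TYPE=qdrant GRAPH_DB_TYPE=neo4j" ;;
    *)
      echo "unknown combo: $1" >&2
      return 1
      ;;
  esac
}

combo_env lite
combo_env full
```

Assuming the runner reads both variables from its environment, something like `env $(combo_env lite) tests/e2e_http/runners/compose/up.sh` would select the Lite shape.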

Connector-level swap correctness (PG/Neo4j/Nebula × Qdrant/pgvector) is already covered by tests/integration/compat/* (matrixed in compat-test.yml); this PR exercises the full FastAPI lifespan + bootstrap + Hurl suite under each shape.

Why 2 combos and not 3?

Neo4j and Nebula are mutually-exclusive graph backends — full E2E doesn't need both. Picked Neo4j for the Full combo; Nebula remains usable via GRAPH_DB_TYPE=nebula.
The compat-test matrix already covers the third dimension. Running three full E2Es per PR would mostly duplicate that signal at much higher CI cost.

Changes

tests/e2e_http/runners/compose/up.sh — accept VECTOR_DB_TYPE (qdrant|pgvector) and GRAPH_DB_TYPE (postgresql|neo4j|nebula); validate, idempotently rewrite .env so the api container picks them up, activate matching --profile. Defaults preserve historical behavior. Qdrant container always runs because the api's depends_on requires it healthy regardless of vector backend selection (avoids docker-compose surgery).
envs/docker.env.overrides — set PGVECTOR_DATABASE_URL to the same DSN as DATABASE_URL (required when VECTOR_DB_TYPE=pgvector, harmless when qdrant).
Makefile — add test-http-smoke-compose-lite and test-http-smoke-compose-full shortcuts.
.github/workflows/e2e-http-smoke.yml — convert e2e-http-smoke job to a fail-fast=false matrix over [lite, full]. Diagnostic dump now includes --profile neo4j --profile nebula to capture optional service logs. The provider-aware job (e2e-http-provider) is unchanged on default backends, since LLM-provider coverage is orthogonal to graph backend choice and matrixing it would double API cost.
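The validate-then-rewrite behavior described for up.sh could look roughly like the sketch below. This is not the actual script: the function names `validate` and `set_env_var` and the `ENV_FILE` variable are assumptions, and a temp file stands in for the real .env so the sketch has no side effects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Defaults mirror the historical shape (qdrant + postgresql).
VECTOR_DB_TYPE="${VECTOR_DB_TYPE:-qdrant}"
GRAPH_DB_TYPE="${GRAPH_DB_TYPE:-postgresql}"

# validate <name> <value> <allowed...>: fail unless value is in the allowed set.
validate() {
  local name="$1" value="$2" allowed
  shift 2
  for allowed in "$@"; do
    if [[ "$value" == "$allowed" ]]; then
      return 0
    fi
  done
  echo "invalid $name: $value (allowed: $*)" >&2
  return 1
}

# set_env_var <file> <key> <value>: drop any existing KEY= line, then append
# the new one, so repeated runs never leave duplicate entries.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  grep -v "^${key}=" "$file" > "$tmp" || true  # grep exits 1 if no lines remain
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"                         # keeps the file's permissions
  rm -f "$tmp"
}

validate VECTOR_DB_TYPE "$VECTOR_DB_TYPE" qdrant pgvector
validate GRAPH_DB_TYPE "$GRAPH_DB_TYPE" postgresql neo4j nebula

# Assumption: a temp file stands in for the compose .env file.
ENV_FILE="${ENV_FILE:-$(mktemp)}"
set_env_var "$ENV_FILE" VECTOR_DB_TYPE "$VECTOR_DB_TYPE"
set_env_var "$ENV_FILE" GRAPH_DB_TYPE "$GRAPH_DB_TYPE"
cat "$ENV_FILE"
```

Removing the old line before appending, rather than appending only when the key is missing, also handles the case where a previous run wrote a different backend value.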

Test plan

Local sanity-check of up.sh: validation rejects bogus values, env-var rewrite is idempotent (no duplication after re-runs), empty profile-flag array works under set -u, defaults preserve historical behavior
CI green on the new 2-combo matrix (lite + full)
CI green on the unchanged provider-aware job
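The "empty profile-flag array works under set -u" check refers to a known bash pitfall: before bash 4.4, expanding an empty array with `"${arr[@]}"` under `set -u` raises an unbound-variable error. A sketch of one common workaround, with hypothetical function names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build compose --profile flags for a graph backend.
# postgresql needs no profile, so the array stays empty in that case.
build_profile_flags() {
  profile_flags=()
  case "$1" in
    neo4j|nebula) profile_flags+=(--profile "$1") ;;
  esac
}

# Print the compose command that would run for a given graph backend.
compose_cmd() {
  build_profile_flags "$1"
  # ${arr[@]+"${arr[@]}"} expands to nothing when the array is empty,
  # which stays safe under set -u on older bash versions.
  echo docker compose ${profile_flags[@]+"${profile_flags[@]}"} up -d
}

compose_cmd postgresql
compose_cmd neo4j
```

The first call prints the command with no `--profile` flag; the second includes `--profile neo4j`.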

Notes

Branch feat/e2e-db-combos is based on origin/main, no dependencies on other open work.
Closes context for an internal task (#22, "test task enabling CI to check code formatting and lint errors").

🤖 Generated with Claude Code

…DB_TYPE + 2-combo CI matrix

The compose runner ran a single fixed backend shape (Qdrant + PG-graph), so the HTTP E2E suite only proved one of the two production deployment shapes actually boots end-to-end. Connector-level swap correctness is covered by tests/integration/compat (matrixed in compat-test.yml), but that doesn't exercise the full FastAPI lifespan against the alternate backends.

Changes:

- runners/compose/up.sh: accept VECTOR_DB_TYPE (qdrant|pgvector) and GRAPH_DB_TYPE (postgresql|neo4j|nebula); validate, idempotently rewrite .env so the api container picks them up, activate matching --profile. Defaults preserve historical behavior (qdrant + postgresql). qdrant is always included in the service set because the api container's depends_on requires it healthy regardless of vector backend selection.
- envs/docker.env.overrides: set PGVECTOR_DATABASE_URL to the same DSN as DATABASE_URL (required when VECTOR_DB_TYPE=pgvector, harmless when qdrant).
- Makefile: add test-http-smoke-compose-{lite,full} shortcuts for the two shapes (Lite = pgvector + postgresql, Full = qdrant + neo4j).
- .github/workflows/e2e-http-smoke.yml: convert e2e-http-smoke job to a fail-fast=false matrix over [lite, full]. Provider-aware job is unchanged (defaults), since LLM-provider coverage is orthogonal to graph backend choice and the matrix would double API cost.

…ds_on (#1743)

PR #1740 added a vector/graph backend matrix to e2e-http-smoke, but the infrastructure had two structural shortcomings that meant the matrix didn't deliver real coverage:

1. The api container's ``depends_on`` hard-required qdrant healthy regardless of vector backend choice, which forced every "deployment shape" to start qdrant. Lite (pgvector) actually ran with a live qdrant container next to it, masking any "code accidentally still calls qdrant" regression and making the matrix lanes structurally identical at the container level.
2. Each shape's (vector backend, graph backend, service set, profile set) was assembled inline in up.sh from two loose env vars, with no single name for the combination. Adding nebula or any new shape required edits across runner script, Makefile, and CI.

Changes:

- ``docker-compose.yml`` removes qdrant from api's ``depends_on``. ``aperag/vectorstore/connector.py``'s VectorStoreConnectorAdaptor is lazy-import + lazy-instantiate, so the api boots cleanly with ``VECTOR_DB_TYPE=pgvector`` even when no qdrant container is running. Local dev (``make stack-up``, bare ``docker compose up -d``) still starts qdrant unchanged.
- ``tests/e2e_http/shapes/{lite,full-neo4j,full-nebula}.env`` are new shape definition files. Each declares the canonical (vector, graph, services, profiles) combo as one named form. Adding a new shape = adding one new file.
- ``tests/e2e_http/runners/compose/up.sh`` accepts ``SHAPE=<name>`` as the canonical knob; it sources the shape file and applies its values. Direct ``VECTOR_DB_TYPE`` / ``GRAPH_DB_TYPE`` overrides are kept for backward compatibility with PR #1740's CI matrix and for one-off experimentation. In both modes the service list is now derived from the vector backend choice: lite shapes do not start qdrant anymore, which is the intended behavior.
- ``Makefile``'s ``test-http-smoke-compose-{lite,full}`` targets switch to ``SHAPE=lite`` / ``SHAPE=full-neo4j`` so the CLI vocabulary lines up with the shape file names.
- ``tests/e2e_http/runners/compose/README.md`` documents SHAPE.

CI is intentionally NOT touched in this PR: the matrix from PR #1740 keeps working through the back-compat path. A follow-up PR replaces that matrix with two thin caller workflows (e2e-http-lite, e2e-http-full) that invoke a shared reusable workflow keyed on SHAPE.
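The shape-file mechanism from #1743 can be sketched as follows. The shape file contents, the function names, and the base service name `api` are assumptions made for illustration; the commit message does not show the real files, and a temp directory stands in for ``tests/e2e_http/shapes``.

```shell
#!/usr/bin/env bash
set -euo pipefail

SHAPES_DIR="$(mktemp -d)"  # stand-in for tests/e2e_http/shapes

# Hypothetical shape files; the real variable names may differ.
cat > "$SHAPES_DIR/lite.env" <<'EOF'
VECTOR_DB_TYPE=pgvector
GRAPH_DB_TYPE=postgresql
EOF
cat > "$SHAPES_DIR/full-neo4j.env" <<'EOF'
VECTOR_DB_TYPE=qdrant
GRAPH_DB_TYPE=neo4j
EOF

# Source the named shape file, failing on unknown names.
load_shape() {
  local file="$SHAPES_DIR/$1.env"
  if [[ ! -f "$file" ]]; then
    echo "unknown shape: $1" >&2
    return 1
  fi
  # shellcheck disable=SC1090
  source "$file"
}

# Derive the service list from the vector backend: qdrant starts only
# when it is actually selected. "api" is a placeholder base service.
services_for() {
  local services="api"
  if [[ "$1" == "qdrant" ]]; then
    services="$services qdrant"
  fi
  echo "$services"
}

# SHAPE is the canonical knob; without it, fall back to the loose env vars.
describe() {
  if [[ -n "${1:-}" ]]; then
    load_shape "$1"
  else
    VECTOR_DB_TYPE="${VECTOR_DB_TYPE:-qdrant}"
    GRAPH_DB_TYPE="${GRAPH_DB_TYPE:-postgresql}"
  fi
  echo "vector=$VECTOR_DB_TYPE graph=$GRAPH_DB_TYPE services=$(services_for "$VECTOR_DB_TYPE")"
}

describe lite
describe full-neo4j
```

Under these assumptions the lite shape yields a service list without qdrant, which is the behavior the commit message calls out as intended.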

…ng reusable shape workflow (PR-2 of 2) (#1744)

* ci(e2e-http): split into 2 PR-triggered workflows (lite + full) sharing reusable shape workflow

Follow-up to #1740 / #1743. Replaces #1740's intra-job matrix on ``e2e-http-smoke`` with two separate top-level workflows, each PR-triggered and each pinned to one deployment shape.

The matrix in #1740 ran identical lanes through the smoke layer; the only difference between lanes was which lazy-init backend the api picked, and the smoke hurl files all set ``enable_*: false`` so neither lane actually exercised pgvector / PG-graph / qdrant / neo4j data paths. This PR moves the shape variation up one layer so each PR runs *both* the smoke layer *and* the provider-aware layer under each shape. The provider layer is where real ingest (embedding + KG extraction) and recall happen, so under this structure the lite job actually exercises the pgvector + PG-graph data path and the full job actually exercises qdrant + neo4j. LLM token cost per PR doubles; this was confirmed acceptable.

Files:

- ``e2e-http-shape.yml`` (NEW, reusable): takes ``shape`` input, runs smoke + provider-preflight + provider-aware suite for that shape. ``SHAPE`` env var threads through to ``up.sh``, which sources the matching ``tests/e2e_http/shapes/<shape>.env`` (introduced in #1743). Artifact upload names include the shape so multi-shape runs do not clash.
- ``e2e-http-lite.yml`` (NEW, top-level): PR-triggered + manual; calls ``e2e-http-shape.yml`` with ``shape: lite`` (pgvector + PG-graph).
- ``e2e-http-full.yml`` (NEW, top-level): PR-triggered + manual; calls ``e2e-http-shape.yml`` with ``shape: full-neo4j`` (Qdrant + Neo4j).
- ``e2e-http-smoke.yml`` (DELETED): superseded.
- ``cicd-pull-request.yml`` (DELETED): superseded; the lite/full workflows now PR-trigger directly.

Adding a new shape (e.g. nebula) is one new ``e2e-http-<name>.yml`` caller file plus the matching ``shapes/<name>.env`` from #1743.

Doc references to the deleted workflow file names (e.g. ``docs/modularization/hurl-coverage-matrix.md``) are deferred to a follow-up sweep. The ``e2e-http-smoke`` *job* name still exists inside ``e2e-http-shape.yml``, so most of those references remain semantically correct.

* ci/test: rename shapes to <vector>-<graph>; add qdrant-nebula workflow

"full" was misleading because Qdrant+Neo4j and Qdrant+Nebula are equally "full" deployments. Switch to ``<vector>-<graph>`` naming with ``lite`` kept as the special name for the single-PG (pgvector + PG-graph) ApeRAG-Lite deployment:

- ``full-neo4j`` → ``qdrant-neo4j``
- ``full-nebula`` → ``qdrant-nebula``
- ``lite`` unchanged

The new naming scales: a future hybrid like ``pgvector-neo4j`` slots in cleanly without needing a "full / fuller / fullest" hierarchy.

Also wires Nebula into PR-triggered CI per request: every PR now runs all three shapes through both the smoke and the provider-aware layer. LLM token cost per PR triples; this is acceptable for now and a fake provider is on the roadmap to drop the cost back down.

Files:

- ``tests/e2e_http/shapes/qdrant-{neo4j,nebula}.env`` (renamed; header comments updated)
- ``.github/workflows/e2e-http-qdrant-neo4j.yml`` (renamed from ``e2e-http-full.yml``; display name "E2E HTTP Qdrant + Neo4j"; ``shape: qdrant-neo4j``)
- ``.github/workflows/e2e-http-qdrant-nebula.yml`` (NEW; display name "E2E HTTP Qdrant + Nebula"; ``shape: qdrant-nebula``)
- ``Makefile`` shortcuts: ``test-http-smoke-compose-qdrant-neo4j`` / ``-qdrant-nebula`` (old ``-full`` target dropped; it only existed in #1743 / origin/main and was not yet referenced anywhere external)
- ``tests/e2e_http/runners/compose/README.md`` reflects new names

* test(unit): update e2e-http workflow contract test for new shape structure
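To illustrate why the ``<vector>-<graph>`` naming scales, here is a small sketch that reads the backends straight off a shape name. It is purely illustrative: per #1743 the shape files declare their backends explicitly, and the function `parse_shape` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split a shape name into "<vector> <graph>".
# "lite" is the one special name (pgvector + postgresql).
parse_shape() {
  local name="$1"
  if [[ "$name" == "lite" ]]; then
    echo "pgvector postgresql"
    return 0
  fi
  if [[ "$name" != *-* ]]; then
    echo "shape name must be lite or <vector>-<graph>: $name" >&2
    return 1
  fi
  # Text before the first hyphen is the vector backend, the rest is the graph.
  echo "${name%%-*} ${name#*-}"
}

for shape in lite qdrant-neo4j qdrant-nebula pgvector-neo4j; do
  read -r vector graph <<< "$(parse_shape "$shape")"
  echo "$shape -> vector=$vector graph=$graph"
done
```

A name like ``full`` carries no such information, which is the ambiguity the rename removes.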

earayu merged commit 0c73b91 into main Apr 27, 2026
5 checks passed

earayu deleted the feat/e2e-db-combos branch April 27, 2026 11:48

earayu mentioned this pull request Apr 27, 2026

test(e2e-http): introduce SHAPE concept + remove qdrant from api depends_on (PR-1 of 2) #1743

Merged

3 tasks

earayu mentioned this pull request Apr 27, 2026

ci(e2e-http): split into 2 PR-triggered workflows (lite + full) sharing reusable shape workflow (PR-2 of 2) #1744

Merged

5 tasks

earayu merged 1 commit into main from feat/e2e-db-combos
