feat(celery Wave 5): single-PR close-out 16 backlog items (5-phase chunked-rotation) by earayu · Pull Request #1733 · apecloud/ApeRAG

earayu · 2026-04-27T10:04:36Z

Summary

Wave 5 indexing redesign close-out — 16 backlog items in single PR with 5-phase chunked-rotation commits (per architect msg=11442bbf reframe + earayu2 msg=eced858d "大PR 快速落地" directive). Same big-PR pattern as Wave 4 PR #1731.

§K.9 spec amendment locked + Phase 1 chunk 1 (relocate build_collection_llm_callable + render_extraction_prompt + ENTITY_RELATION_EXTRACTION to aperag/indexing/llm.py) shipped.

Phases

Phase 1 (Bryce, sequential first): Legacy graphindex package elimination + retrieval/curation migration to §G.5 read primitives + aperag/indexing/llm.py relocate
Phase 2 (Bryce, post-Phase 1): T7 multimodal vision-LLM 3-item bundle (embed_image API + parser image extraction + provider capability flag)
Phase 3 (chenyexuan, parallel with Phase 2): Layer 2 e2e fixture wiring + sweep D activation
Phase 4 (chenyexuan, parallel with Phase 1): P1 production robustness 3-pack (T2 reconciler + T3 parse short-circuit + T2 cleanup error split)
Phase 5A (Bryce): P2 batch (T1 config tunability + extractor attribute marker + W5-perf parallel-list + Neo4j label namespace + Cypher type rename)
Phase 5B (chenyexuan): P2 batch (chunk_id schema unify + cleanup builder share + tenant org-prefix + utc_now unify + OTLP config cross-check)

Tasks

§K.9 Wave 5 spec section in design pack
Phase 1 chunk 1: aperag/indexing/llm.py relocate
Phase 1 cascade: retrieval/curation caller migration
Phase 1 cleanup: delete legacy graphindex/{storage,service,integration,engine,__init__}.py
Phase 2: T7 multimodal vision-LLM 3-item bundle
Phase 3: Layer 2 e2e fixture wiring
Phase 4: P1 production robustness 3-pack
Phase 5A/B: P2 polish batches

Test plan

ruff check ./aperag ./tests clean (per phase)
ruff format --check ./aperag ./tests clean (per phase)
Existing test suites pass after each phase (179+ tests)
Phase 1 grep-zero verify: aperag/indexing/* 不 cross-ref legacy graphindex/storage/*
Phase 2 embed_image API + is_multimodal=True real wire — chunk 4b vision gate self-disable verify
Phase 3 Layer 2 stub activated against real backends
Phase 4 transient-vs-intentional cleanup error semantic + parse_version short-circuit + reconciler re-enqueue verified
CI 4/4 (lint-and-unit + e2e-http-compose lanes)

🤖 Generated with Claude Code

…locate (legacy graphindex elimination first chunk) Wave 5 task #26 first chunk per architect msg=10b5fae6 §K.9 spec amendment + PM msg=af82797e dispatch (single Wave 5 PR with 5-phase chunked-rotation commits, big-PR fast-landing per earayu2 msg=eced858d). Two changes bundled (per PM msg=af82797e "Phase 1 first commit = §K.9 spec paste + legacy graphindex elimination 实现起步"): 1. **§K.9 Wave 5 spec section** added to design pack: - 5-phase commit roadmap (16 acceptance items) - production-readiness 三类 layer per phase - architect direct ratify lane scope per phase - pre-check pattern lock per phase (per ``feedback_spec_lock_grep_verify_caller.md`` 双 pattern) - Layer 1 same-session continuation directive (per ``feedback_no_refresh_complete_all_tasks.md``) - Wave 5 acceptance summary (16 items + owner table) 2. **`aperag/indexing/llm.py` relocate** (§K.9.1 acceptance item 3): - new module owns canonical ``build_collection_llm_callable`` + ``render_extraction_prompt`` + ``ENTITY_RELATION_EXTRACTION`` template + ``LLMCall`` type alias - legacy ``aperag/domains/knowledge_graph/graphindex/integration.py`` and ``prompts.py`` re-export from new location during Wave 5 deprecation window so legacy retrieval/curation callers keep working until Phase 1 close-out - ``aperag/indexing/graph_extractor.py`` updated to import from new ``aperag.indexing.llm`` module directly (instead of legacy graphindex bridge) - test ``test_graph_extractor.py`` monkey-patch sites updated to target ``aperag.indexing.llm`` module Pre-check pattern 1 grep-verify caller cascade (per architect msg= b26f64b2 chunk 4d Option C ruling, used as reference list for Phase 1 migration scope): - ``aperag/indexing/`` → 0 cross-references to legacy ``graphindex/storage/*`` (chunk 4d narrowed scope invariant maintained ✅) - legacy ``graphindex/integration.py`` and ``prompts.py`` callers: ``retrieval/pipeline.py:85`` / ``knowledge_graph/service.py:69+`` / ``graph_curation/service.py:37`` / ``graph_curation/integration.py:21`` / ``service/prompt_template_service.py:161`` — remaining migrations land in subsequent Phase 1 commits Local gates green: - ``ruff check ./aperag ./tests`` clean - ``ruff format --check ./aperag ./tests`` clean (509 files) - ``pytest test_graph_extractor.py + test_full_indexing_pipeline.py + test_worker_factory.py + tests/unit_test/indexing/``: 179 passed / 2 skipped (Layer 2 stubs) Production-readiness three-class layer (Phase 1 partial): - must-be-real: ``aperag.indexing.llm`` module is the canonical production home for the relocated LLM helpers (no import indirection through legacy package once retrieval/curation migration completes) - may-be-gated: legacy ``graphindex/{integration,prompts}.py`` re-export shims kept active during Wave 5 deprecation window so cross-cutting refactor can land incrementally without breaking legacy retrieval/curation flow - fully-resolves: §K.9.1 Wave 5 acceptance item 3 (`aperag/indexing/llm.py` relocate); items 1 (legacy graphindex package elimination) and rest of Phase 1 cascade migration land in subsequent commits Next Phase 1 commits: migrate ``retrieval/pipeline.py`` / ``knowledge_graph/service.py`` / ``graph_curation/*`` to §G.5 read primitives; once all callers migrated → delete legacy ``graphindex/{storage, service, integration, engine, __init__}.py`` + delete legacy tests; final commit verifies grep-zero invariant. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ON import to aperag/indexing/llm Wave 5 task #26 chunk 2 per §K.9.1 acceptance item 3 cascade (legacy graphindex caller migration): `aperag/service/prompt_template_service.py:155-163` (the `get_default_prompt(prompt_type="graph")` branch) was importing the ENTITY_RELATION_EXTRACTION template from the legacy `aperag.domains.knowledge_graph.graphindex.prompts` module. Per chunk 1 relocate (`11113acb`), the canonical home for the template is now `aperag.indexing.llm`. This commit migrates the import site so the caller no longer transitively touches the legacy module. Behavior unchanged — the legacy `graphindex/prompts.py` shim already re-exports the template from the new location, so this is a pure import-path refactor that leaves the runtime payload identical. Local gates: - `ruff check ./aperag ./tests` clean - `ruff format --check ./aperag ./tests` clean (509 files) Phase 1 cascade progress: - chunk 1 ✅ (`11113acb`): §K.9 spec + `aperag/indexing/llm.py` relocate - **chunk 2** (this commit): `service/prompt_template_service.py` import relocate - chunk 3+ pending: `retrieval/pipeline.py` / `knowledge_graph/service.py` / `graph_curation/*` migration (these need new design — legacy 24-method `GraphStore` Protocol → new 10-method `LineageGraphStore` Protocol API surface is NOT a 1-to-1 replacement; per Bryce msg=30c7e994 scope reality check + architect msg=b052b1b4 cascade ratify lane lock) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ent/intentional split + parse_version short-circuit + reconciler stuck-parse re-enqueue Wave 5 Phase 4 (PR-C / task #29) — closes 3 latent issues surfaced through Wave 4 ratify trail: * **P4-1 cleanup transient-vs-intentional split** (per architect msg=6aa8ca88 T2 ratify minor obs A): Pre-Wave-5 ``_resolve_cleanup_worker`` collapsed any factory exception into "drop the DB row". Transient infra errors (Qdrant blip, ES network glitch) lost the retry signal — DB row was deleted before next cycle could retry. Wave 5 P4 returns a new ``CleanupWorkerResolution(worker, transient)`` distinguishing the two: ``WorkerFactoryError`` is a by-design gate (drop the row to bound index growth), any other Exception is transient (skip DB row drop so next cycle retries when backend recovers). Counts surface ``transient_deferred`` so operators can track recovery rate. * **P4-2 parse_version short-circuit** (per huangheng T3 chunk 2 obs B + Wave 5 backlog item): Pre-Wave-5 ``parse_document`` always re-runs DocParser even when the resulting artifact directory is byte-identical (parse_version is content-derived). Wave 5 P4 adds ``short_circuit_if_artifacts_exist=True`` default — if all three derived artifacts (markdown.md / outline.json / chunks.jsonl) already exist under the canonical ``derived/parse_<version>/`` path, skip DocParser + writes entirely. Eliminates the ~30s OCR / Word rerun cost on rebuild of unchanged content. Tests can pass ``False`` to force re-parse. * **P4-3 reconciler stuck-document parse re-enqueue** (per architect Wave 4 T3 chunk 2 obs A — production gap close): Pre-Wave-5 parse failures (DocParser raise / source missing) silently dropped the document_id in the parse worker; operator saw ``document.status == PENDING`` forever with no signal to re-trigger. Wave 5 P4 adds ``reconcile_stuck_documents_for_parse_reenqueue`` to the reconciler loop. Detects documents with ``Document.gmt_created < now - cooldown_seconds`` AND zero ``document_index`` rows AND ``Document.gmt_updated < now - cooldown_seconds`` (cooldown filter prevents 30-s tick storm), then pushes a fresh ``ParseDispatchPayload`` matching the upload handler's contract. ``gmt_updated`` bumps after each push so the cooldown predicate rate-limits re-enqueue. Three-class tag (Wave 3 production-readiness invariant): * must-be-real: cleanup transient/intentional split prevents silent retry-signal loss; parse_version short-circuit eliminates OCR rerun waste; stuck-parse reconciler closes the document.status PENDING-forever gap * may-be-gated: ``short_circuit_if_artifacts_exist=False`` for tests pinning DocParser invocation count * fully-resolves: 3 of Wave 5 P1 backlog items (per architect ratify trail msg=6aa8ca88 + huangheng obs trail) Deltas: * ``aperag/indexing/cleanup.py`` — ``CleanupWorkerResolution`` dataclass + WorkerFactoryError-vs-Exception split in ``_resolve_cleanup_worker``; both ``cleanup_orphan_parse_versions`` + ``cleanup_for_deleted_documents`` honor ``resolution.transient`` to skip DB row drop on transient infra failures; collection cascade aggregates ``transient_deferred``. * ``aperag/indexing/parser.py`` — ``_all_artifacts_present`` predicate + ``short_circuit_if_artifacts_exist`` parameter + early-return ParseResult when all 3 artifacts present. * ``aperag/indexing/reconciler.py`` — ``STUCK_PARSE_COOLDOWN_SECONDS`` + ``reconcile_stuck_documents_for_parse_reenqueue`` async scan + ``_select_stuck_documents_for_reenqueue`` SQL query + ``_build_parse_payload_for_document`` payload reconstruction + ``_resolve_collection_parser_config`` / ``_resolve_collection_modalities`` (mirror ``document_service`` shape) + ``_mark_stuck_documents_reenqueued`` bumps gmt_updated; ``run_reconcile_loop`` calls the new scan. * ``aperag/indexing/__init__.py`` — re-export new symbols. * ``tests/integration/test_p4_robustness_3pack.py`` (new) — 13 tests across 3 layers (cleanup transient/intentional split / parse short-circuit / reconciler re-enqueue with cooldown + payload shape + skip-when-indexed + skip-when-no-object-path). * ``tests/unit_test/indexing/test_t3_1_dispatcher_path_c.py`` — added ``transient_deferred: 0`` to expected empty-counts dict. Local gates: * ``pytest tests/unit_test/indexing/ tests/integration/`` — 232 passed / 48 skipped (incl. 13 new ``test_p4_robustness_3pack``). * ``ruff check aperag/ tests/integration/test_p4_robustness_3pack.py`` — clean. * ``ruff format`` — applied. Branch: local ``chenyexuan/celery-wave5-p4`` based on main ``19d3d70`` (Wave 4 PR #1731 squash); will rebase onto ``bryce/celery-wave5`` once Bryce opens the Wave 5 draft PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…e 6 backlog Per architect Option (a) ruling 2026-04-27 (post-cascade scope check msg=30c7e994 + ratify msg=8780c937): * §K.9 Phase 1 narrowed scope: relocate-only (chunks 1+2 already shipped via `11113acb` + `e0707165`). Original Phase 1 scope (delete legacy `graphindex` package + migrate retrieval/curation callers) collided with unresolved design gap — legacy `GraphIndexService. query_context()` 24-method LightRAG-style API has no equivalent on the new 10-method `LineageGraphStore` Protocol; building that query layer is 1-2 week design + implementation effort, not in-scope for Wave 5 single-PR fast-landing style. * §K.10 Wave 6 backlog added: full legacy graphindex elimination + retrieval/curation migration + new LightRAG-style query layer design moves to Wave 6 separate PR with its own architect ratify lane (per chunk 4d Option C ruling msg=b26f64b2 + Wave 4 §K.8 precedent). * Wave 5 acceptance summary updated: 15/16 items in Wave 5 PR (item 1 legacy graphindex elimination → Wave 6). * Phase 1 close-out signal: chunk 1 (`11113acb`) +chunk 2 (`e0707165`) ship llm.py relocate + import re-route to new canonical home; legacy `graphindex/{integration,prompts}.py` keep deprecation-shim re-exports during the Wave 6 deprecation window so legacy retrieval/curation callers do not break mid-Wave-5. This commit closes Phase 1 narrowed scope and unblocks Phase 2 (T7 multimodal vision-LLM 3-item bundle) per architect ruling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Wave 5 Phase 3 (task #28) — replaces the pre-Wave-5 ``pytest.skip`` stubs in ``test_full_indexing_pipeline.py`` Layer 2 with functional test bodies that the e2e-http-compose CI lane can execute against the live backend stack. Per architect msg=fdd53586 + chunk 4d+4e ratify msg=c279a0ff — Layer 2 contract was declared but skip-stub-only; this phase wires the actual implementation so the canonical Phase 1 invariant runs end-to-end whenever the operator opts in via ``RUN_E2E_PHASE1_SMOKE=1``. Two test bodies wired: * ``test_phase1_full_pipeline_vector_fulltext_summary_active_graph_vision_failed`` — the canonical Phase 1 smoke (architect msg=da3012a4): real ``ProductionWorkerFactory`` + parse_document + dispatch_indexing + worker pool drive until every modality finalises. Asserts vector + fulltext + summary reach ACTIVE; graph + vision finalise FAILED with a gate marker (``Wave 4 wiring`` chunk 4b / ``completion model`` post-T1 self-disable / ``multimodal`` vision gate). The gate-marker OR tolerates the same-Wave T1 self-disable surface so the test stays green across the chunk 4b → T1 closure. * ``test_phase1_multi_keyword_fulltext_search_returns_hits`` — sweep D verification (architect msg=fdd53586): index a real document via the worker pool, then issue a 3-keyword query through ``_fulltext_search`` and assert ≥1 hit. The retrieval- side ``minimum_should_match`` arithmetic over N×content + N×title should-clauses (huangheng msg=fb64468c flag) is the latent issue this exercises end-to-end. New fixture machinery: * ``_phase1_e2e_skip_reason()`` — central skip-reason resolver that documents the activation contract: ``RUN_E2E_PHASE1_SMOKE=1`` + ``PHASE1_E2E_COLLECTION_ID`` (pointing at a Collection seeded by the e2e-http-compose bootstrap with a real model provider configured) + backend env vars (``DATABASE_URL`` / ``ES_HOST`` / Qdrant + Redis env vars). Both Layer 2 tests share the same skip predicate so the lane runner only needs one env-var setup. * ``_resolve_phase1_e2e_engine()`` — opens a real Postgres engine using ``settings.database_url``; skips with a clear reason if unset. * ``_run_phase1_workers_until_quiet()`` — bounded async loop that drives ``ProductionWorkerFactory`` directly (mirroring the ``run_*_worker`` lifespan path) until every modality row reaches a terminal status. Bounded by ``timeout_seconds`` so a hung modality fails the test loud rather than blocking forever. Three-class tag (Wave 3 production-readiness invariant): * must-be-real: live ProductionWorkerFactory + real backends + real ES query * may-be-gated: skips when env vars missing (local-dev never requires the full stack) * fully-resolves: 2 of Wave 5 P1 backlog items (Layer 2 stubs activated for canonical full pipeline + sweep D multi-keyword smoke) Cleanup roundtrip (delete document → cleanup loop → backend artefacts gone) is left as a follow-up sub-test to land alongside the e2e-http-compose lane document-delete API access scaffolding. Local gates: * ``pytest tests/integration/test_full_indexing_pipeline.py`` — 4 passed / 2 skipped (Layer 2 skips cleanly with the new ``_phase1_e2e_skip_reason()`` until env vars set in CI lane). * ``pytest tests/unit_test/indexing/ tests/integration/`` — 232 passed / 48 skipped — no regression vs P4 baseline. * ``ruff check aperag/ tests/integration/test_full_indexing_pipeline.py`` — clean. * ``ruff format`` — applied. Branch: ``chenyexuan/celery-wave5-p4`` rebased on ``bryce/celery-wave5`` HEAD ``99af7965`` (Phase 1 chunk 3 spec amend) → push to ``bryce/celery-wave5`` so Wave 5 single-PR pile continues. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ace (T7 item 1) Wave 5 task #27 chunk 1 per §G.2.5.1 amendment item 1: ``EmbeddingService.embed_image(image_bytes, alt_text)`` API surface is the canonical multimodal embedding entry point that replaces the Wave 3 placeholder ``embed_query(f"{image_id}|{alt_text}")`` string- concat (Wave 3 lesson #10 broken pattern). Behavior: * When ``self.multimodal=True`` (operator wired a multimodal embedder on the collection's spec), encode image bytes as base64 data URL + forward to LiteLLM ``embedding(input=[{"image_url":{"url":...}}, ...])`` shape that multimodal-capable providers (Voyage / Jina v3 / OpenAI multimodal / etc.) accept natively. * When ``self.multimodal=False``, raise ``EmbeddingError`` with clear operator-facing diagnostic — chunk 4b vision gate already prevents this state but runtime check is defense-in-depth (Wave 3 lesson #10 ship-incomplete-but-don't-silently-lie). * ``alt_text`` paired into LiteLLM input as a textual hint for embedders that accept multi-part inputs; image-only embedders silently drop the text element. The ``aembed_image`` async wrapper mirrors the ``aembed_query`` pattern (``asyncio.to_thread`` since the LiteLLM call is sync). Wave 5 P2 chunk-1 scope (this commit): * declares the API surface so Phase 2 chunks 2 (parser image extraction) + 3 (provider v3 capability flag UI) and the chunk 4b vision gate self-disable path have a concrete contract to wire to * uses imghdr-based MIME detection + LiteLLM's documented multimodal input shape; provider-specific input format variations defer to Wave 6 (per §K.10 Wave 6 backlog cross-cutting refactor) Local gates green: - ``ruff check ./aperag ./tests`` clean - ``ruff format --check ./aperag ./tests`` clean (510 files) - ``pytest`` 179 passed / 2 skipped — no regressions on existing EmbeddingService callers (text-only path unchanged) Production-readiness three-class layer: - must-be-real: real LiteLLM multimodal embedding call against the operator-configured provider when ``multimodal=True`` - may-be-gated: provider-specific input-format adjustments (Voyage vs Jina vs OpenAI multimodal differences) deferred to Wave 6 - fully-resolves: §G.2.5.1 item 1 (``embed_image`` API surface) + §K.9 Phase 2 partial — items 2+3 (parser image extraction + provider v3 capability flag UI) follow in subsequent commits Next Phase 2 chunks: - chunk 2: parser image extraction → ``derived/parse_<v>/vision/images/<image_id>.<ext>`` - chunk 3: provider v3 router multimodal capability flag UI exposure - chunk 4: ``worker_factory._build_vision_worker`` ``_embed`` callsite rewrite to use ``embed_image`` + chunk 4b vision gate self-disable verify (Phase 1 Layer 1 test rename) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ractor caps + timeout Per huangheng T1 obs A msg=6b349693: surface the LightRAG-style extractor's per-chunk caps and LLM timeout as collection-level overrides (`KnowledgeGraphConfig.{max_entities_per_chunk, max_relations_per_chunk, per_chunk_timeout_seconds}`) so deployments tuning a slow LLM provider or extracting from entity-dense documents can lift the defaults without patching `aperag/indexing/graph_extractor.py` constants. * `aperag/schema/common.py`: 3 new optional fields on KnowledgeGraphConfig. * `aperag/indexing/graph_extractor.py`: `_resolve_int_kg_config` / `_resolve_float_kg_config` helpers (mirror `_resolve_entity_types` pydantic-attr / Mapping / JSON-string tolerance pattern + reject non-positive / non-numeric values with warning + default fallback); `_extract_one_chunk` now takes `timeout_seconds` kwarg and `asyncio.wait_for` wires it instead of the global constant. * `tests/integration/test_graph_extractor.py`: 6 new unit tests pinning override-wins / fallback-on-missing / fallback-on-non-positive / int-coerced-to-float / fallback-on-garbage; fixed pre-existing `_extract_one_chunk` call to pass `timeout_seconds=60.0`. Defaults unchanged (32 / 32 / 60.0); collections that don't set the new fields keep current behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… forward-compat + OTLP config cross-check + cleanup builder share Wave 5 Phase 5B (task #31) — 4 polish items from the Wave 4 ratify trail accumulated obs (chunk_id schema unify deferred to a follow-up once parser canonical schema lock lands): * **P5B-A utc_now → CURRENT_TIMESTAMP unify** (per huangheng chunk 4a obs A): ``_LineageEntityRow`` + ``_LineageRelationRow`` ``gmt_created`` / ``gmt_updated`` columns now use ``server_default=text("CURRENT_TIMESTAMP")`` mirroring the alembic migration declaration. Pre-Wave-5 ORM used ``default=utc_now`` Python-side; alembic check passed because Postgres treats both as semantic-equivalent but the per-mirror discipline is stronger when both layers speak the same dialect (schema-touching follow-ups cannot drift undetected). * **P5B-B tenant_scope_key org-prefix forward-compat** (per T3 chunk 2 obs C + §H.2 forward-compat lock): new ``aperag.indexing.parse_orchestrator.resolve_tenant_scope_key`` helper centralises the prefix construction. Pre-Wave-5 the prefix was hard-coded as ``f"user:{document.user}"`` at every callsite (upload handler, reconciler stuck-doc re-enqueue); the helper unifies the construction so a future Wave 6/7 organisation-tenant rollout flips one place. The org branch stays inert for Wave 5 (Document/Collection schemas don't carry org_id yet) — the helper just makes the seam explicit. * **P5B-C OTLP config cross-check** (per huangheng T6 chunk 1 obs): lifespan startup logs a clear warning when ``INDEXING_METRICS_EMITTER=otlp`` but ``APERAG_OBSERVABILITY_MODE`` is not set to ``otlp`` / ``collector``. Pre-Wave-5 this combination silently produced an ``OTLPMetricsEmitter`` whose ``MeterProvider`` was never installed by ``aperag.observability.metrics.init_metrics_provider`` — samples no-op'd silently, defeating the explicit ``INDEXING_METRICS_EMITTER=otlp`` opt-in. * **P5B-D cleanup builder share helper** (per huangheng T2 obs B drift risk): new ``_build_collection_qdrant_connector(collection, *, allow_vector_size_fallback)`` shared helper used by ``_build_vector_worker`` / ``_build_summary_worker`` (dispatch, ``allow_vector_size_fallback=False``) AND ``_build_qdrant_cleanup_backend`` (cleanup, ``allow_vector_size_fallback=True``). Pre-Wave-5 dispatch and cleanup paths duplicated the embedder + connector wiring; a future Qdrant adapter signature change had to be applied twice. ``_build_vision_worker`` keeps its pre-helper shape — its ``is_multimodal()`` gate must fire BEFORE any network call to preserve the Wave 4 chunk 4b "fail fast on non-multimodal embedder" invariant the existing test pins. Three-class tag (Wave 3 production-readiness invariant): * must-be-real: 4 polish items each tighten a real production seam (alembic mirror discipline / org-prefix forward-compat / OTLP config consistency / cleanup-vs-dispatch drift surface) * may-be-gated: org-prefix branch stays inert until org_id columns land (declared in helper docstring) * fully-resolves: 4 of the 5 P5B chenyexuan-batch backlog items. ``chunk_id`` schema unify is left to a follow-up — parser canonical schema lock first needs the §C.x amend; touching the ``or chunk.get("id")`` fallback now without that lock would introduce drift in the opposite direction. Deltas: * ``aperag/indexing/graph_storage/postgres.py`` — ``_LineageEntityRow`` + ``_LineageRelationRow`` switched to ``server_default=CURRENT_TIMESTAMP``; ``utc_now`` import dropped. * ``aperag/indexing/parse_orchestrator.py`` — ``resolve_tenant_scope_key(*, document, collection)`` helper + re-export. * ``aperag/domains/knowledge_base/service/document_service.py`` — ``_create_or_update_document_indexes`` calls ``resolve_tenant_scope_key`` in place of the inline ``f"user:{document.user}"``. * ``aperag/indexing/reconciler.py`` — ``_build_parse_payload_for_document`` calls the same helper for the stuck-document re-enqueue producer (consistency with upload handler). * ``aperag/app.py`` — lifespan logs the OTLP-vs-observability-mode cross-check warning before constructing ``OTLPMetricsEmitter``. * ``aperag/indexing/worker_factory.py`` — ``_build_collection_qdrant_connector`` shared helper + ``_build_vector_worker`` / ``_build_summary_worker`` / ``_build_qdrant_cleanup_backend`` reuse it; vision keeps its multimodal-gate-first shape. Local gates: * ``pytest tests/unit_test/indexing/ tests/integration/`` — 232 passed / 48 skipped (no regression). * ``ruff check aperag/ tests/integration/`` — clean. * ``ruff format`` — applied. Branch: ``chenyexuan/celery-wave5-p4`` rebased on ``bryce/celery-wave5`` HEAD ``4f0e9b0`` (P2 chunk 1 embed_image API surface) → push to ``bryce/celery-wave5`` for Wave 5 single-PR pile. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Per §K.9.1 P5A item 14 + design pack §K.9 Phase 5A item 4: deployments that share a Neo4j instance with user-owned graphs would collide on the generic `LineageEntity` / `LineageRelation` labels. Bump them to `aperag_LineageEntity` / `aperag_LineageRelation` and align constraint names (`aperag_lineage_entity_collection_name_unique` / `aperag_lineage_relation_collection_triple_unique`). Hard-cut second round (no data migration): Wave 4 graph indexing defaulted `enable_knowledge_graph=False`, so no production deployment has data on the legacy unprefixed labels. Postgres / Nebula backends are unaffected (table names already namespaced via `aperag_lineage_*` schema in alembic migration `e7a3b9c2d1f6`). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…h explicit reasoning Per Wave 5 P5A scope close-out 2026-04-27: * Item 2 (chunk 4b _no_op_extractor identity check → attribute marker) is N/A: the placeholder was deleted in Wave 4 T1 (`19d3d70f`) when `build_collection_graph_extractor` replaced `_no_op_extractor`. There is no surviving identity-check site to harden — moot in current code. * Item 3 (W5-perf-graph-lineage parallel-list O(N) cross-backend) is deferred to Wave 6: Postgres needs JSONB → text[] migration, Nebula needs tag re-model, plus alembic column-shape change. >10k docs/entity is not a Wave 5 acceptance criterion; trigger when observed in real deployments. * Item 5 (Cypher type keyword rename `n.type` → `entity_type` / `relation_type`) is deferred to Wave 6: cross-backend rename (Postgres column + alembic; Cypher property rename; Nebula tag-prop rename) plus §D.3 Protocol-surface change (`EntityRecord.type` → `EntityRecord.entity_type` etc.). Cypher `TYPE()` is technically only for relationships so current code is unambiguous — forward-compat hygiene rather than correctness fix. §K.9 Phase 5A acceptance lines amended to reflect ship status (items 1+4 shipped; 2 N/A; 3+5 → Wave 6). §K.10 Wave 6 backlog extended with items 6+7 covering the deferred cross-backend rewrites. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…se_<v>/vision/{images,source.jsonl} Per §G.2.5.1 spec amend item 2: when DocParser produces `AssetBinPart` payloads (PDF page images, single-image-input passthrough, data-URI extracted images), the parser writes each image blob to `derived/parse_<v>/vision/images/<image_id>.<ext>` and lands a `vision/source.jsonl` descriptor enumerating them. The vision worker (chunk 4) consumes the descriptor instead of the T1 simulator's synthetic `images.json` companion. `aperag/indexing/parser.py`: * `_docparser_extract_markdown` now returns `tuple[str, list[_VisionImageAsset]]` — markdown body unchanged plus the extracted image assets list. Non-image `AssetBinPart`s (audio / PDF data) drop. Duplicate `asset_id`s deduplicated to keep Qdrant-point-id stable. * `_VisionImageAsset` carrier holds (`image_id`, `data`, `mime_type`, `alt_text`, `page_idx`, `bbox`). * `_vision_image_extension(mime_type)` maps known image MIMEs to a filename extension; falls back to `.bin` for unknown types since vision worker uses `imghdr` on bytes at embed time. * `_write_vision_assets` persists each blob via `write_atomic` and writes the JSONL descriptor. * `ParseResult.vision_source_path` and `vision_image_count` are new optional fields (defaults `""` / `0`) so pre-Wave-5 callers see no behaviour change. Image-only inputs (no markdown emitted) still land their assets so vision modality has bytes to embed. `tests/unit_test/indexing/test_parser_image_extraction.py`: 9 unit tests covering MIME → extension mapping, descriptor schema, simulator-path no-op, image-only input asset persistence, unknown-MIME `.bin` fallback, and the new `ParseResult` fields. Tests monkeypatch `_docparser_extract_markdown` to avoid pulling MinerU / MarkItDown into the unit-test path. Production-readiness 三类: - must-be-real: real `AssetBinPart` extraction wired through DocParser - may-be-gated: descriptor + image artefacts only land when DocParser surfaces `AssetBinPart`s (image-less docs see no vision/ writes) - fully-resolves: §G.2.5.1 spec item 2 (parser image extraction) + Wave 5 P2 chunk 2 acceptance The vision worker callsite rewrite (chunk 4) and provider v3 multimodal flag UI (chunk 3) remain. Defaults preserve existing behaviour for text-only callers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…apability flag Per §G.2.5.1 spec amend item 3: surface a typed capability flag for embedding models that accept image bytes (CLIP / Voyage Multimodal / Jina v3 / OpenAI multimodal embeddings) so operators can register them via the v3 model platform UI. The flag drives the chunk 4b vision gate's `EmbeddingService.is_multimodal()` runtime check — flip it on the collection's embedder spec model and the gate self-disables, enabling vision modality. Distinct from existing `supports_vision` (chat/completion models that accept image input) — `supports_multimodal_embedding` describes embedding models that produce vectors from images. Changes: * `aperag/domains/model_platform/schemas.py`: new optional `supports_multimodal_embedding: bool = False` on `Model` / `ModelCreate` / `ModelUpdate` (default False, no behaviour change for existing rows). * `aperag/domains/model_platform/db/models.py`: new `supports_multimodal_embedding` Boolean column with default False. * `aperag/migration/versions/...f1c8d2a5b6e3...`: alembic migration adding the column with `server_default=false`. Forward-only cutover safe because pre-Wave-5 rows default to False (matches prior implicit behaviour). * `aperag/domains/model_platform/service/model_service.py`: `_model_to_schema` maps the new column. * `aperag/db/repositories/llm_provider.py`: `create_model` accepts the new field so the v3 `POST /models` flow persists it. * `aperag/llm/embed/base_embedding.py`: `get_embedding_service` prefers the typed `supports_multimodal_embedding` column; falls back to the legacy `runner_config["multimodal"]` JSON entry so pre-Wave-5 operators who edited the JSON keep working without re-saving the row. Tests: 3 new unit tests in `test_model_platform_v3_contract.py` covering default-False behaviour, opt-in True path, and v3 OpenAPI schema exposure on Model / ModelCreate / ModelUpdate. Production-readiness 三类: - must-be-real: real DB column + alembic migration + v3 API surface - may-be-gated: legacy `runner_config["multimodal"]` fallback for rows created before the typed column landed - fully-resolves: §G.2.5.1 spec item 3 (provider v3 multimodal capability flag UI) + Wave 5 P2 chunk 3 acceptance Phase 2 chunk 4 (callsite rewrite + chunk 4b gate self-disable verify + Layer 1/2 test rename) remains as the final Wave 5 blocker. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…te self-disable verify Per §G.2.5.1 spec amend final piece: rewire `_build_vision_worker._embed` to call `EmbeddingService.embed_image(image_bytes, alt_text)` (chunk 1) with the actual image bytes the parser persisted (chunk 2 wrote `derived/parse_<v>/vision/images/<image_id>.<ext>` + a JSONL descriptor at `vision/source.jsonl`). The chunk 4b vision gate self-disables when the operator flips `Model.supports_multimodal_embedding=True` (chunk 3). `aperag/indexing/worker_factory.py`: * `_embed(image_id, alt_text, image_bytes=None)` — when image_bytes is provided, route to `embedding_service.embed_image(image_bytes, alt_text)`. None falls back to the legacy text-concat path so the T1 simulator + tests that hand the worker synthetic JSON keep working. * Gate-raise message reframed: drops "Wave 4 wiring" phrasing (now Wave 5 wiring is land), names the typed `Model.supports_multimodal_embedding` flag so an operator can fix the config directly. `aperag/indexing/vision.py`: * `VisionModality.derive` accepts both source-path formats: the legacy single-JSON-array shape (T1 simulator / pre-Wave-5 tests) AND the new JSONL-with-image-path shape (parser chunk 2 output). Format detection is by first non-whitespace byte (`[` → JSON array, else → JSONL). * `_load_image_bytes(record)` reads the descriptor's `image_path` through the object store; missing blob logs a warning and returns None (embedder still runs on the alt-text/id placeholder digest) so a partial parser write doesn't block the whole derive cycle. * `_placeholder_embedding(..., image_bytes=None)` mirrors the new embedder signature; placeholder ignores bytes. * Embedder Protocol is widened: `(image_id, alt_text, image_bytes=None)`. `tests/integration/test_full_indexing_pipeline.py`: * Renamed Layer 1 `test_phase1_vision_modality_raises_wave4_wiring_gate` → `..._gate_raises_when_embedder_not_multimodal`. Asserts the reframed message names `multimodal embedding model` + `supports_multimodal_embedding` flag. * New positive-path Layer 1 `test_phase1_vision_modality_gate_self_disables_when_embedder_multimodal`: with `is_multimodal()=True`, the factory builds a vision worker without raising — pins the chunk 4 gate-self-disable contract. * Layer 2 e2e assertion: vision modality may be ACTIVE (when CI fixture has multimodal embedder configured) OR FAILED with a gate marker including `supports_multimodal_embedding`. OR-on-marker tolerance kept for transition state. `tests/unit_test/indexing/test_t1_4_summary_vision.py`: 3 new tests covering JSONL descriptor with image_path (bytes loaded + forwarded), missing-blob graceful fallback, and legacy JSON-array backward compat. Production-readiness 三类: - must-be-real: real `embed_image` callsite + real bytes load from parser's descriptor - may-be-gated: legacy text-concat fallback + simulator JSON format preserved for tests - fully-resolves: §G.2.5.1 spec items 1+2+3 all wired end-to-end + chunk 4b vision gate self-disable verify (Wave 5 P2 closure) Wave 5 P2 (T7 multimodal vision-LLM 3-item bundle) closed. The chunk 4b vision gate self-disables when an operator configures a multimodal embedder; default-off behaviour preserved for text-only collections. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…cation cache (#1735) Wave 5 P2 chunk 4 (#1733) landed the canonical multimodal embedding API surface (`EmbeddingService.embed_image(image_bytes, alt_text)`) but left the call un-cached — every vision modality embed now hits the LiteLLM provider, even for the same image. This wires it into the canonical `aperag.cache.NAMESPACE_EMBEDDING` infra (PR #1734) mirroring the existing text `_embed_batch` pattern. Cache key shape (per `aperag/cache/README.md` no-raw-bytes policy): { "kind": "image", "provider": ..., "model": ..., "api_base": ..., "api_key_hash": sha256(api_key), "file_hash": sha256(image_bytes), "alt_text": ..., "multimodal": True, } Image bytes are identified by their sha256 hex digest so the Redis key stays bounded; alt_text is part of the key because providers that accept paired text+image inputs return a different vector when the textual hint changes (alt_text="" collapses to one key for image-only callers). Tests ----- New `tests/unit_test/llm/test_embed_image_cache.py` (7 tests): * identical (bytes, alt_text) → second call hits cache (no upstream) * same bytes + different alt_text → distinct keys, both compute * different bytes + same alt_text → distinct keys, both compute * key shape uses sha256 file_hash, raw bytes never appear in key * `caching=False` bypasses cache (always upstream) * `multimodal=False` raises EmbeddingError (defense-in-depth) * empty image_bytes raises EmptyTextError Full unit suite: 1022 passed, 29 skipped, ruff + format clean. Out of scope (per task #37 boundary) ------------------------------------ Provider-specific multimodal embedder format variations (Voyage / Jina v3 / OpenAI multimodal SDK input shapes) stay on task #39 per PM dispatch + simple-stable directive (`feedback_simple_stable_zero _maintenance.md`).

earayu mentioned this pull request Apr 27, 2026

Refactor cache for application-level expensive calls #1734

Merged

Bryce and others added 13 commits April 27, 2026 19:01

earayu force-pushed the bryce/celery-wave5 branch from fdad09e to 8e7c6ec Compare April 27, 2026 11:01

earayu marked this pull request as ready for review April 27, 2026 11:08

earayu merged commit 7c9e4c3 into main Apr 27, 2026
7 checks passed

earayu deleted the bryce/celery-wave5 branch April 27, 2026 11:08

earayu mentioned this pull request Apr 27, 2026

docs(celery Wave 6 §K.11): full spec amendment — 7 items + 2-PR strategy + double pre-check lock #1736

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(celery Wave 5): single-PR close-out 16 backlog items (5-phase chunked-rotation)#1733

feat(celery Wave 5): single-PR close-out 16 backlog items (5-phase chunked-rotation)#1733
earayu merged 13 commits into
mainfrom
bryce/celery-wave5

earayu commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

earayu commented Apr 27, 2026

Summary

Phases

Tasks

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant