Commit 0711647
* feat(celery Wave 6 #34): chunk_id schema canonical unification
Per docs/modularization/indexing-redesign-design-pack.md §K.11.1 #34
(huangheng T1 obs B + Wave 5 P5B chunk_id 5th item deferred).
## Changes
aperag/indexing/vector.py
- VectorBackend.upsert_point(chunk_id=) → upsert_point(point_id=)
(canonical Qdrant naming, aligned with summary/vision protocols)
- InMemoryVectorBackend record key renamed chunk_id → point_id
- Vector worker callsite passes point_id=chunk["chunk_id"]; chunk_id
remains in payload for hybrid-dedup with fulltext (§C.6 trade-off lock)
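A minimal sketch of the renamed vector contract follows. It assumes a `typing.Protocol`-based `VectorBackend` and a list-of-dicts in-memory store. The `vector` and `payload` parameters, the keyword-only signature, and the record layout (`point_id` at the top level, `chunk_id` inside `payload`, matching the test reads described below) are illustrative assumptions, not the actual aperag code.

```python
from typing import Any, Protocol


class VectorBackend(Protocol):
    """Assumed shape of the protocol: points are addressed by point_id only."""

    def upsert_point(
        self, *, point_id: str, vector: list[float], payload: dict[str, Any]
    ) -> None: ...


class InMemoryVectorBackend:
    """Test double (sketch): records keyed by point_id; chunk_id lives in the payload."""

    def __init__(self) -> None:
        self.points: list[dict[str, Any]] = []

    def upsert_point(
        self, *, point_id: str, vector: list[float], payload: dict[str, Any]
    ) -> None:
        # Upsert semantics: drop any existing record with the same point_id first.
        self.points = [p for p in self.points if p["point_id"] != point_id]
        self.points.append(
            {"point_id": point_id, "vector": list(vector), "payload": dict(payload)}
        )


# Hypothetical vector worker callsite: the chunk's id becomes the point id, and
# chunk_id stays in the payload so hybrid search can dedup against fulltext hits.
backend = InMemoryVectorBackend()
chunk = {"chunk_id": "doc1-0003", "text": "hello", "embedding": [0.1, 0.2]}
backend.upsert_point(
    point_id=chunk["chunk_id"],
    vector=chunk["embedding"],
    payload={"chunk_id": chunk["chunk_id"], "text": chunk["text"]},
)
```

Because the parameters are keyword-only, a stale `upsert_point(chunk_id=...)` call fails immediately with `TypeError` instead of silently binding to the wrong argument.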
aperag/indexing/worker_factory.py
- _QdrantPointBackend.upsert_point(): single point_id keyword (drop
pre-Wave-6 dual chunk_id|point_id transition shim)
- Drop merged_payload.setdefault("chunk_id", identifier): the adapter no
longer injects a misleading chunk_id into summary/vision payloads
(their points are not chunks); each modality now controls its own payload
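To illustrate the payload change, here is a hedged sketch of an adapter that passes each modality's payload through without adding fields. The class name comes from this commit, but the `write` callable (a stand-in for the real Qdrant upsert), the `base_payload` merge, and the `collection_id`/`kind` fields are hypothetical; no real `qdrant_client` API is used.

```python
from typing import Any, Callable, Optional


class _QdrantPointBackend:
    """Hypothetical adapter sketch; `write` stands in for the real Qdrant upsert call."""

    def __init__(
        self,
        write: Callable[[str, list[float], dict[str, Any]], None],
        base_payload: Optional[dict[str, Any]] = None,
    ) -> None:
        self._write = write
        self._base_payload = dict(base_payload or {})

    def upsert_point(
        self, *, point_id: str, vector: list[float], payload: dict[str, Any]
    ) -> None:
        # Merge adapter-level fields with the caller's payload; the caller wins.
        merged_payload = {**self._base_payload, **payload}
        # No merged_payload.setdefault("chunk_id", ...) here any more, so summary
        # and vision points carry only the fields their own worker sends.
        self._write(point_id, vector, merged_payload)


written: list[tuple[str, list[float], dict[str, Any]]] = []
adapter = _QdrantPointBackend(
    lambda pid, vec, pl: written.append((pid, vec, pl)),
    base_payload={"collection_id": "c1"},
)
adapter.upsert_point(point_id="summary-doc1", vector=[0.5], payload={"kind": "summary"})
```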
aperag/indexing/reconciler.py
- Add a utc_now audit comment: the reconciler's utc_now usage is an
application-level wall clock for lease comparison and gmt_updated
stamps, distinct from the Wave 5 P5B server_default=CURRENT_TIMESTAMP
ORM creation defaults. No further migration needed.
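For context on what the audit covers, a minimal sketch of application-level wall-clock lease handling. It assumes `utc_now()` returns a timezone-aware UTC `datetime` and that a lease is a row with a `lease_expires_at` field; `lease_expired` and `claim_lease` are hypothetical helpers, not reconciler functions.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def utc_now() -> datetime:
    """Application wall clock (assumed behavior): timezone-aware UTC now."""
    return datetime.now(timezone.utc)


def lease_expired(lease_expires_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """A missing lease counts as expired; otherwise compare against the app clock."""
    if lease_expires_at is None:
        return True
    current = now if now is not None else utc_now()
    return current >= lease_expires_at


def claim_lease(row: dict[str, Any], ttl: timedelta, now: Optional[datetime] = None) -> dict[str, Any]:
    """Stamp lease expiry and gmt_updated from one application clock reading."""
    current = now if now is not None else utc_now()
    return {**row, "lease_expires_at": current + ttl, "gmt_updated": current}
```

Both the comparison and the stamp are computed in Python at call time, which is why they are unaffected by moving row-creation defaults to `server_default=CURRENT_TIMESTAMP` on the database side.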
tests/unit_test/indexing/test_chunk_id_schema_canonical.py (new)
- 7 contract tests pin canonical naming:
* VectorBackend / SummaryBackend / VisionBackend Protocols use point_id
* InMemoryVectorBackend round-trips point_id at record level + chunk_id at payload level
* Legacy chunk_id= keyword raises TypeError on InMemoryVectorBackend
* _QdrantPointBackend.upsert_point uses single point_id param
* _QdrantPointBackend does not inject chunk_id into payload
* Parser chunks.jsonl chunk_id field naming preserved
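Contract tests of this kind can be little more than signature checks. A hedged sketch using only `inspect`, written against stand-in classes so it runs on its own; in the real suite these would be the aperag protocols and backends, and the parameter lists shown are assumptions.

```python
import inspect
from typing import Any, Protocol


# Stand-ins for the three protocols (parameter lists are assumptions).
class VectorBackend(Protocol):
    def upsert_point(self, *, point_id: str, vector: list[float], payload: dict[str, Any]) -> None: ...


class SummaryBackend(Protocol):
    def upsert_point(self, *, point_id: str, vector: list[float], payload: dict[str, Any]) -> None: ...


class VisionBackend(Protocol):
    def upsert_point(self, *, point_id: str, vector: list[float], payload: dict[str, Any]) -> None: ...


class InMemoryVectorBackend:
    """Minimal concrete stand-in used for the TypeError check."""

    def __init__(self) -> None:
        self.points: list[dict[str, Any]] = []

    def upsert_point(self, *, point_id: str, vector: list[float], payload: dict[str, Any]) -> None:
        self.points.append({"point_id": point_id, "vector": vector, "payload": payload})


def test_protocols_use_point_id() -> None:
    for proto in (VectorBackend, SummaryBackend, VisionBackend):
        params = inspect.signature(proto.upsert_point).parameters
        assert "point_id" in params, proto.__name__
        assert "chunk_id" not in params, proto.__name__


def test_legacy_chunk_id_keyword_raises_type_error() -> None:
    backend = InMemoryVectorBackend()
    try:
        backend.upsert_point(chunk_id="c1", vector=[0.0], payload={})  # type: ignore[call-arg]
    except TypeError:
        return
    raise AssertionError("legacy chunk_id= keyword was accepted")
```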
tests/integration/test_cleanup_fan_out.py
tests/unit_test/indexing/test_t1_3_vector_fulltext.py
tests/unit_test/indexing/test_t2_1_runtime.py
tests/unit_test/indexing/test_t3_1_dispatcher_path_c.py
- backend.upsert_point(chunk_id=...) → point_id=...
- record reads p["chunk_id"] → p["point_id"] / p["payload"]["chunk_id"]
## Production-readiness: three layer categories (per §K.11.4)
- must-be-real: parser-layer chunk_id field schema unified across
vector/summary/vision backend protocol surfaces; remaining utc_now
usage audited (reconciler.py only, application-level, distinct from
the ORM defaults that Wave 5 P5B already migrated)
- may-be-gated: vector worker still keeps chunk_id in payload for
hybrid-dedup with fulltext (§C.6 contract preserved)
- fully-resolves: huangheng T1 obs B + Wave 5 P5B chunk_id 5th item
deferred (per spec §K.11.1 row 34)
## simple-stable 4 guardrails (per feedback_simple_stable_zero_maintenance.md)
1. No unbounded scope creep ✅: scope limited to the vector protocol + adapter,
no cross-cutting changes
2. Real functionality ✅: a real schema rename, with no transitional shim left behind
3. Simple and stable ✅: drops the dual-API polymorphism and the hidden payload
side effect (setdefault chunk_id); each layer's contract is now
self-evident from its signature
4. Zero maintenance for private deployments ✅: an operator or developer reading
the code sees canonical naming without needing to track which argument
name belongs to which modality
## hard-cut policy (per earayu2 msg=30c81478)
There is no production data, so the breaking schema change is applied
directly: no backward-compat shim, no deprecation window. A test
asserting that the legacy chunk_id= keyword raises TypeError catches
accidental regressions.
## Pre-check (per K.11.5 Pattern 2)
grep upsert_point across aperag/indexing/ + tests/ — all callsites
audited, all callers migrated, contract test pins canonical name.
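The grep pre-check can also be scripted so it becomes a repeatable gate. A sketch using the standard `ast` module rather than a text grep, so multi-line calls are caught and `chunk["chunk_id"]` subscripts are not false positives; the function names and default roots are hypothetical, not a script from this repository.

```python
import ast
from pathlib import Path


def legacy_upsert_calls(source: str, filename: str = "<string>") -> list[int]:
    """Return line numbers of upsert_point(...) calls that still pass chunk_id=."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both obj.upsert_point(...) and bare upsert_point(...).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == "upsert_point" and any(kw.arg == "chunk_id" for kw in node.keywords):
            hits.append(node.lineno)
    return sorted(hits)


def scan(roots: list[str]) -> list[str]:
    """Scan every .py file under the given roots and report hits as path:line."""
    findings: list[str] = []
    for root in roots:
        for path in sorted(Path(root).rglob("*.py")):
            for line in legacy_upsert_calls(path.read_text(encoding="utf-8"), str(path)):
                findings.append(f"{path}:{line}")
    return findings


def main(argv: list[str]) -> int:
    """Print findings and return a CI-friendly exit code (1 if anything is left)."""
    problems = scan(argv or ["aperag/indexing", "tests"])
    print("\n".join(problems) or "no legacy upsert_point(chunk_id=...) callsites")
    return 1 if problems else 0
```

Wiring `main` into an `if __name__ == "__main__":` block with `sys.exit(main(sys.argv[1:]))` turns it into a CI check.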
## Local gates
- 152/152 indexing unit tests pass
- 21/21 indexing integration tests pass (cleanup_fan_out +
dispatch_with_parse + inline_mode_smoke + parse_async_roundtrip)
- 7/7 new contract tests pass
- ruff check + ruff format clean
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(celery Wave 6 #38): application cache cross-loop lazy-rebuild + WARN log
Per docs/modularization/indexing-redesign-design-pack.md §K.11.1 #38
(huangheng PR #1734 sync point 7 + feedback_announce_equals_landed.md
narrative-truth invariant).
## Problem (pre-Wave-6 narrative-truth violation)
`aperag.cache.application_runtime.get_application_cache()` keyed the
cached `ApplicationCache` instance on the running asyncio event loop
(the Redis async client is loop-bound). When the running loop
changed (test process restart, worker reinit, asyncio
re-initialisation, etc.), the function silently swapped the cache
for a `NoopApplicationCacheBackend` with `enabled=False`. From that
point on the worker paid the full LiteLLM / embedding round-trip on
every request, with no log line, no metric, and no operator-visible
signal that caching had degraded to zero.
## Fix
When the loop-switch branch fires:
- emit a WARN log with operator-actionable phrasing
- bump
`application_cache_metrics["application_runtime"]["loop_switch_rebuild"]`
(uses the existing ApplicationCacheMetrics instance — no new
metrics infra)
- reset `_async_cache=None` and fall through to the normal
initialisation path (which re-establishes the real Redis client on
the new loop)
The Noop fallback is now reached only when Redis is genuinely
unreachable on the new loop — never as a silent loop-switch
downgrade.
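The control flow of the fix can be sketched as follows. This is a hedged, self-contained approximation: `build_backend` (standing in for constructing and pinging the Redis client), the `Counter`-based metrics, and the simplified cache classes are assumptions, since `application_runtime.py` internals are not shown here. Only the "rebuilding for new event loop" log phrase and the `loop_switch_rebuild` counter name come from this message.

```python
import asyncio
import logging
from collections import Counter
from typing import Any, Callable, Optional

logger = logging.getLogger("application_runtime")
loop_switch_metrics = Counter()  # stand-in for application_cache_metrics[...]


class NoopApplicationCacheBackend:
    enabled = False


class ApplicationCache:
    def __init__(self, backend: Any) -> None:
        self.backend = backend


_async_cache: Optional[ApplicationCache] = None
_cache_loop: Optional[asyncio.AbstractEventLoop] = None


def get_application_cache(build_backend: Callable[[], Any]) -> ApplicationCache:
    """Return the per-loop cache, rebuilding (not downgrading) on a loop switch."""
    global _async_cache, _cache_loop
    loop = asyncio.get_running_loop()
    if _async_cache is not None and _cache_loop is not loop:
        logger.warning(
            "application cache bound to a different event loop; "
            "rebuilding for new event loop (check worker event-loop setup)"
        )
        loop_switch_metrics["loop_switch_rebuild"] += 1
        _async_cache = None  # fall through to the normal initialisation path
    if _async_cache is None:
        try:
            backend = build_backend()  # stand-in: create + ping Redis on this loop
        except ConnectionError:
            logger.warning("Redis unreachable; falling back to no-op application cache")
            backend = NoopApplicationCacheBackend()
        _async_cache = ApplicationCache(backend)
        _cache_loop = loop
    return _async_cache
```

Keeping a reference to the bound loop means a closed loop can never be mistaken for the new one, and the Noop branch is reached only from the genuine connection-failure path.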
## Tests (4 new)
tests/unit_test/cache/test_application_cache.py
- test_application_cache_rebuilds_on_loop_switch_instead_of_silent_noop:
two `_run_in_new_loop()` calls produce distinct ApplicationCache
instances, second wraps real `ApplicationRedisCacheBackend` (not
`NoopApplicationCacheBackend`); metric counter incremented to 1;
WARN log captured.
- test_application_cache_does_not_increment_loop_switch_metric_on_first_build:
first-ever build does not bump the loop-switch counter.
- test_application_cache_repeat_call_in_same_loop_is_singleton_no_metric_bump:
same-loop repeat returns identical singleton, no metric bump.
- test_application_cache_falls_back_to_noop_when_redis_unreachable_on_new_loop:
if Redis genuinely fails on the new loop, rebuild attempts the
real client, falls back to Noop, but metric still records the
loop switch (operators see the rebuild attempt + the Redis
failure WARN that was already there).
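The tests above rely on a `_run_in_new_loop()` helper whose body is not part of this message. One plausible sketch, assuming it runs a coroutine factory on a fresh event loop and then closes that loop, makes the loop switch explicit; `current_loop` is a hypothetical probe coroutine.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def _run_in_new_loop(make_coro: Callable[[], Awaitable[T]]) -> T:
    """Run a coroutine on a brand-new event loop, then close that loop (assumed helper)."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(make_coro())
    finally:
        loop.close()


async def current_loop() -> asyncio.AbstractEventLoop:
    return asyncio.get_running_loop()


# Each call sees a different running loop: exactly the condition that triggers
# the loop-switch branch in get_application_cache().
loop_a = _run_in_new_loop(current_loop)
loop_b = _run_in_new_loop(current_loop)
```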
## Production-readiness: three layer categories (per §K.11.4)
- must-be-real: real WARN log + real metric counter + real Redis
client rebuild on cross-loop call (no silent zero-cache)
- may-be-gated: first cross-loop request may pay rebuild cost
(re-ping Redis) — observable via metric, not silent
- fully-resolves: huangheng PR #1734 sync point 7 +
feedback_announce_equals_landed.md narrative-truth lock
## simple-stable 4 guardrails
1. No unbounded scope creep ✅: single-function fix in
application_runtime.py, reuses the existing
`application_cache_metrics` infra, no new public API
2. Real functionality ✅: a real cache rebuild, not a placeholder
3. Simple and stable ✅: fewer paths than before (one rebuild path replaces
the silent-Noop branch), and each branch is logged
4. Zero maintenance for private deployments ✅: operators can grep
`loop_switch_rebuild` in metrics and the log line "rebuilding for new event
loop" to diagnose worker setup issues
## Local gates
- 43/43 cache unit tests pass (39 existing + 4 new)
- ruff check + ruff format clean
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Parent: 0edc82a
10 files changed, 452 additions, 55 deletions