@@ -394,17 +394,323 @@ podDisruptionBudget:
394394
395395---
396396
397- ## 10. Sections pending from @ziang
397+ ## 9. K8s production deployment parameters (per earayu2 msg=caf5c760 / msg=4e9c909c: K8s is the prod target, docker-compose is e2e only)
398398
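399- - §11 Read path + GraphVectors/chunks.jsonl reuse boundaries (duplicate IO / duplicate derivation / cache key invalidation)
400- - §12 cleanup / deletion / reconciler (batch-delete SQL / failure retry / stale reclaim)
401- - §13 End-to-end bottleneck ranking for long documents and large document counts (per-layer attribution across parser / artifact / DB / queue / LLM / graph store)
402- - §14 Joint acceptance checklist (merging the §6 implementation slices with ziang's additions)
399+ ### 9.1 ApeRAG deployment targets (confirmed)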
400+
401+ | Deployment target | Purpose | Current default |
402+ |---|---|---|
403+ | **Kubernetes (KubeBlocks for DB)** | **Production** | Helm chart `deploy/aperag/` |
404+ | docker-compose | e2e / local development / SOHO single host | `docker-compose.yml` |
405+
406+ All perf-relevant parameters for K8s prod live in `deploy/aperag/values.yaml`. This section lists, layer by layer, the current defaults and the Tier 1/2/3 recommendations.
407+
408+ ### 9.2 Resource requests/limits
409+
410+ | Component | Current | Tier 2 (medium, 1K-10K docs) | Tier 3 (large, 10K+ docs / long documents) |
411+ |---|---|---|---|
412+ | `api` | `resources: {}` (BestEffort QoS) | `requests: cpu=500m mem=1Gi` `limits: cpu=2 mem=4Gi` | `requests: cpu=1 mem=2Gi` `limits: cpu=4 mem=8Gi` |
413+ | `indexingWorker` | `resources: {}` | `requests: cpu=2 mem=4Gi` `limits: cpu=4 mem=8Gi` | `requests: cpu=4 mem=8Gi` `limits: cpu=8 mem=16Gi` |
414+ | `frontend` | `resources: {}` | `requests: cpu=100m mem=256Mi` `limits: cpu=500m mem=512Mi` | Same as Tier 2 |
415+
416+ **P0-Helm-1**: add default `resources.requests` to values.yaml (sane Tier 2 defaults). Rationale:
417+ - `resources: {}` puts the pod in the BestEffort QoS class in K8s: it is the first to be killed under memory pressure and has no CPU reservation, so it is not viable in production.
418+ - The defaults deliberately do not hard-code limits (to avoid CPU throttling that shows up as unexplained latency); they set only requests, so the scheduler places pods on nodes with enough capacity.
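
To make the QoS argument concrete, here is a minimal sketch of how Kubernetes classifies a single-container pod from its `resources` block. The function name `qos_class` is hypothetical, and the sketch compares quantities as plain strings, so it assumes requests and limits are written in the same notation (it would not treat `1` and `1000m` as equal).

```python
from typing import Mapping, Optional

Resources = Mapping[str, str]


def qos_class(requests: Optional[Resources], limits: Optional[Resources]) -> str:
    """Approximate the Kubernetes QoS class for a single-container pod."""
    requests = dict(requests or {})
    limits = dict(limits or {})
    # No requests and no limits at all: BestEffort (what `resources: {}` gives).
    if not requests and not limits:
        return "BestEffort"
    # Kubernetes defaults a missing request to the limit of the same resource.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        res in limits and effective_requests.get(res) == limits[res]
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"
```

Under this sketch, the requests-only defaults proposed in P0-Helm-1 move a pod from BestEffort to Burstable, which is the intended effect.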
419+
420+ ### 9.3 Replica count + HPA
421+
422+ **Current**: `api.replicaCount=1`, `indexingWorker.replicaCount=1`, no HPA.
423+
424+ **P1-Helm-2** HPA template (scaling on `q:parse` + `q:indexing:*` queue depth via KEDA):
425+
426+ ```yaml
427+ # New in values.yaml
428+ hpa:
429+   api:
430+     enabled: false  # default off; the user decides whether to turn it on
431+     minReplicas: 2
432+     maxReplicas: 10
433+     targetCPUUtilizationPercentage: 70
434+   indexingWorker:
435+     enabled: false
436+     minReplicas: 2
437+     maxReplicas: 8
438+     # KEDA-based scaling on Redis queue depth
439+     keda:
440+       enabled: false
441+       triggers:
442+         - type: redis
443+           metadata:
444+             address: "{redis-host}:6379"
445+             listName: "q:parse"
446+             listLength: "10"  # scale up if >10 items pending
447+         - type: redis
448+           metadata:  # this trigger also needs the same address entry as the one above
449+             listName: "q:indexing:vector"
450+             listLength: "100"
451+ ```
452+
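453+ **Leader-election boundary** (must be confirmed before running multiple replicas):
454+ - `run_reconcile_loop` running in every pod scans the tables repeatedly (every 30s it scans PENDING + FAILED + RUNNING + stuck + collection_regen + graph_vectors_enqueue)
455+ - `run_cleanup_loop` running in every pod scans for orphans repeatedly
456+ - Short term: no problem while `indexingWorker.replicaCount=1`. Before adding replicas, leader election must be added (Redis Lua SETNX + lease); otherwise the reconciler pushes duplicates (idempotent, but it wastes Redis IOPS and DB scan cost)
457+
458+ **P1-Helm-3** Simple leader-election implementation:
459+ - At startup each pod runs `SET indexing:leader:<lane> <pod_name> NX EX 60` (plain `SETNX` takes no expiry argument); the pod that wins runs the reconciler / cleanup
460+ - Renew the lease every 20s (`SET ... XX EX 60`); on losing the lease, stop the reconciler / cleanup loops immediately. The renew should also verify that the stored value is still this pod's name (for example in a Lua script), because `XX` alone would refresh a lease that another pod has since taken
461+ - Worker lanes (vector/fulltext/graph_*/summary/vision/parse/graph_curation) are safe with multiple replicas (Redis BLPOP hands each item to exactly one consumer), so they need no leader election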
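
The lease rules above can be sketched without a Redis server. The class below is an in-memory stand-in (an assumption for illustration, not ApeRAG code): `acquire` models `SET key owner NX EX ttl`, and `renew` models an owner-checked extension of the kind a Lua script would perform. `LeaseStore` and its method names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LeaseStore:
    """In-memory model of a Redis-backed leader lease (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def _current_owner(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired keys behave as if absent
            return None
        return entry[0]

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        """Models `SET key owner NX EX ttl`: succeeds only if no live lease exists."""
        if self._current_owner(key) is not None:
            return False
        self._data[key] = (owner, self._clock() + ttl)
        return True

    def renew(self, key: str, owner: str, ttl: float) -> bool:
        """Extends the lease only if `owner` still holds it (compare, then expire)."""
        if self._current_owner(key) != owner:
            return False
        self._data[key] = (owner, self._clock() + ttl)
        return True
```

A pod would call `acquire` at startup, call `renew` every 20s, and stop its reconciler / cleanup loops as soon as `renew` returns `False`.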
462+
463+ ### 9.4 PVC / persistence
464+
465+ | Volume | Purpose | Tier 2 | Tier 3 |
466+ |---|---|---|---|
467+ | `api-data` (`/data/aperag`) | API temporary cache | 10 GiB | 50 GiB |
468+ | `indexingWorker-objects` (`/data/objects`) | parser-derived artifacts + user-uploaded sources | 100 GiB | 1 TiB (or switch to S3/MinIO) |
469+ | postgres-data (managed by KubeBlocks) | main DB + DocumentIndex + pgvector | 50 GiB | 200 GiB |
470+ | qdrant-data (managed by KubeBlocks) | vector store | 50 GiB | 500 GiB |
471+ | es-data (managed by KubeBlocks) | full-text index | 30 GiB | 200 GiB |
472+ | redis-data (managed by KubeBlocks) | queues + cache + quota | 5 GiB | 20 GiB |
473+
474+ **P1-Helm-4**: `OBJECT_STORE_TYPE=local` cannot be used in a multi-replica deployment (each pod has its own disk and cannot see artifacts written by the others). Tier 2+ must switch to `OBJECT_STORE_TYPE=s3` (MinIO included). Add enforcement for values.yaml: `indexingWorker.replicaCount > 1 && OBJECT_STORE_TYPE == "local"` → fail with a Helm template error at render time.
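
The same guard could also run at worker startup as a second line of defence. This is a minimal sketch of the rule stated in P1-Helm-4; the function name `validate_object_store` and the idea of checking it in Python (rather than only in the Helm template) are assumptions.

```python
def validate_object_store(replica_count: int, object_store_type: str) -> None:
    """Reject a local object store when more than one worker replica is configured."""
    if replica_count > 1 and object_store_type.strip().lower() == "local":
        raise ValueError(
            "OBJECT_STORE_TYPE=local is not usable with "
            f"indexingWorker.replicaCount={replica_count}: each pod has its own "
            "disk. Switch to OBJECT_STORE_TYPE=s3 (MinIO included)."
        )
```

A single replica with a local store still passes, which matches the current default deployment.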
475+
476+ ### 9.5 PodDisruptionBudget + rolling upgrades
477+
478+ **P2-Helm-5**:
479+ ```yaml
480+ podDisruptionBudget:
481+   api:
482+     enabled: true
483+     minAvailable: 1
484+   indexingWorker:
485+     enabled: true
486+     minAvailable: 1
487+ ```
488+
489+ `indexing-worker-deployment.yaml` already uses `livenessProbe.exec: pgrep -f aperag.cli.indexing_worker` plus a 25s graceful drain (per task #17, cuiwenbo msg=f7868d2c), so a rolling update does not lose in-flight tasks.
490+
491+ ### 9.6 Monitoring + alerting (required for K8s prod)
492+
493+ - `process_resident_memory_bytes` per pod; alert when p95 exceeds 80% of the memory request
494+ - `q:parse` / `q:indexing:*` queue depth (KEDA / Prometheus)
495+ - `document_index` table size + dead tuple ratio
496+ - `pg_stat_activity` to track worker DB connection usage against the pool budget
497+ - pgvector / Qdrant write latency p99
498+ - Per-modality `derive` + `sync` duration p50/p95/p99 (OTLP support exists, but the default is emitter=noop; prod must switch to otlp)
499+
500+ ---
501+
502+ ## 10. PG + KubeBlocks + pgbouncer (per earayu2 msg=e6e4d366 / msg=2f9b062f)
503+
504+ ### 10.1 Current state
505+
506+ - `deploy/databases/postgresql/values.yaml` deploys the PG cluster through KubeBlocks (`pg-cluster-postgresql-postgresql` SVC)
507+ - No pgbouncer (ApeRAG connects to PG directly)
508+ - API pod `dbPoolSize=5/dbMaxOverflow=5`, indexingWorker `dbPoolSize=10/dbMaxOverflow=10`
509+ - The pool budget is computed by hand (values.yaml L324-333): `sum(replicas * (pool+overflow)) + surge + reserved < max_connections * 0.7`
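
The hand-computed budget can be turned into a small check. This is a sketch of the formula quoted above; `PoolSpec`, `within_budget`, and the `surge` / `reserved` values used in the usage note are assumptions, since the document does not state them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PoolSpec:
    replicas: int
    pool_size: int
    max_overflow: int

    @property
    def peak_connections(self) -> int:
        # Worst case: every replica has its pool and overflow fully in use.
        return self.replicas * (self.pool_size + self.max_overflow)


def within_budget(
    specs: Iterable[PoolSpec],
    max_connections: int,
    surge: int = 0,
    reserved: int = 0,
    ratio: float = 0.7,
) -> bool:
    """sum(replicas * (pool + overflow)) + surge + reserved < max_connections * ratio"""
    total = sum(spec.peak_connections for spec in specs) + surge + reserved
    return total < max_connections * ratio
```

With the current defaults (`PoolSpec(1, 5, 5)` for the API and `PoolSpec(1, 10, 10)` for the indexing worker) the peak is 30 connections, which fits under `max_connections=100` even with an assumed surge of 10 and 10 reserved. Scaling the worker to 4 replicas raises the peak to 90 and breaks the budget, which is the case pgbouncer is meant to address.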
510+
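511+ ### 10.2 Benefits of introducing pgbouncer
512+
513+ 1. **Replica scaling is no longer capped by max_connections**: pgbouncer multiplexes `client_conn` onto a fixed `pool_size` of server connections; with PG max_connections at 100, the ApeRAG side can hold 200-500 clients.
514+ 2. **More stable under cold start / congestion**: several replicas connecting at once during a rolling update no longer overwhelm PG momentarily.
515+ 3. **PG max_connections no longer needs frequent increases** (avoiding PG memory overhead, roughly 10 MB per connection).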
516+
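517+ ### 10.3 Key decision: pooling mode
518+
519+ | Mode | Compatibility | Notes |
520+ |---|---|---|
521+ | `session pooling` | 100% compatible with ApeRAG | Each client holds one server connection until it disconnects; little different from running without pgbouncer, **not recommended** |
522+ | `transaction pooling` (earayu2 directive) | **Needs item-by-item verification** | A server connection is held only for the duration of a transaction, so pool_size can be far smaller than client_conn |
523+ | `statement pooling` | Incompatible with ApeRAG | Consecutive statements are not guaranteed to reach the same server, which breaks session state |
524+
525+ **Transaction pooling compatibility audit checklist** (every item must be ✅ before the PR):
526+
527+ - [ ] **prepared statements**: transaction pooling does not support reusing a prepared statement across transactions. SQLAlchemy does not use server-side prepared statements by default, but the asyncpg dialect may enable them → grep to confirm `prepare_threshold` / `statement_cache_size=0`
528+ - [ ] **`SET LOCAL`**: transaction-scoped, safe. Grep `session.execute("SET LOCAL ...")` to confirm every call is wrapped in a begin block.
529+ - [ ] **`SET` (session-scoped)**: leaks across transactions into the next client; `SET LOCAL` must be used instead. Grep `session.execute("SET ...")` for any non-LOCAL usage.
530+ - [ ] **temporary tables**: not visible across transactions; ApeRAG does not use them, ✅
531+ - [ ] **listen/notify**: not visible across transactions; ApeRAG does not use them, ✅
532+ - [ ] **advisory locks**: `pg_advisory_lock(...)` is session-scoped and leaks; `pg_advisory_xact_lock(...)` (transaction-scoped) must be used. A grep of the ApeRAG code found no advisory lock usage, ✅
533+ - [ ] **`reset` behavior**: set `server_reset_query = DISCARD ALL` in `pgbouncer.ini` as a fallback (it is the default value). Note that in transaction pooling mode pgbouncer runs this query only when `server_reset_query_always = 1` is also set, so it is not a safety net on its own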
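
Part of this checklist can be automated as a heuristic scan over SQL strings collected from the codebase. The sketch below is an assumption about how such an audit might look, not an existing ApeRAG tool; `find_session_state` and the pattern labels are hypothetical, and regex matching will miss dynamically built SQL.

```python
import re
from typing import Dict, List, Pattern

# Heuristic patterns for features that rely on session state and therefore
# misbehave under pgbouncer transaction pooling.
SESSION_STATE_PATTERNS: Dict[str, Pattern[str]] = {
    # SET at statement start, unless it is SET LOCAL or SET TRANSACTION.
    "session-scoped SET": re.compile(
        r"^\s*SET\s+(?!LOCAL\b|TRANSACTION\b)\w", re.IGNORECASE
    ),
    # pg_advisory_lock / pg_try_advisory_lock (and _shared), not the _xact_ forms.
    "session-level advisory lock": re.compile(
        r"\bpg_(?:try_)?advisory_lock(?:_shared)?\s*\(", re.IGNORECASE
    ),
    "LISTEN": re.compile(r"^\s*LISTEN\b", re.IGNORECASE),
}


def find_session_state(sql: str) -> List[str]:
    """Return the labels of all session-state patterns found in one SQL statement."""
    return [
        label
        for label, pattern in SESSION_STATE_PATTERNS.items()
        if pattern.search(sql)
    ]
```

For example, `find_session_state("SET statement_timeout = '5s'")` reports a session-scoped `SET`, while the `SET LOCAL` and `pg_advisory_xact_lock` forms pass cleanly.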
534+
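535+ ### 10.4 Recommended pgbouncer parameters
536+
537+ ```ini
538+ # pgbouncer.ini, medium private deployment (Tier 2); pgbouncer only treats # or ; at the start of a line as a comment, so move the trailing annotations below onto their own lines in the real file
539+ [databases]
540+ aperag = host=pg-cluster-postgresql-postgresql port=5432 dbname=postgres
541+
542+ [pgbouncer]
543+ pool_mode = transaction
544+ listen_port = 6432
545+ max_client_conn = 500 # the ApeRAG side can hold up to 500 clients
546+ default_pool_size = 25 # 25 server connections per db by default
547+ reserve_pool_size = 5 # 5 extra emergency connections under congestion
548+ reserve_pool_timeout = 3 # use the reserve only after waiting 3s without a connection
549+ server_idle_timeout = 600 # close after 10 min idle to keep PG memory stable
550+ server_reset_query = DISCARD ALL # in transaction mode, applied only with server_reset_query_always = 1
551+ ignore_startup_parameters = extra_float_digits,application_name
552+ log_connections = 0 # off in prod to reduce load
553+ log_disconnections = 0
554+ ```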
555+
556+ Matching settings on the PG side:
557+
558+ ```yaml
559+ # KubeBlocks PG cluster values
560+ postgresql:
561+   parameters:
562+     max_connections: "100"  # pgbouncer's share is reserved within this (25*N + 10 maintenance)
563+     shared_buffers: "2GB"  # 25% of the 8GB request
564+     work_mem: "16MB"  # complex vector / graph queries
565+     maintenance_work_mem: "256MB"
566+     max_wal_size: "2GB"
567+ ```
568+
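569+ ### 10.5 Changes on the ApeRAG side
570+
571+ ```yaml
572+ # values.yaml
573+ api:
574+   dbPoolSize: "20"  # 5 → 20 (pgbouncer absorbs the extra connections; PG does not see them)
575+   dbMaxOverflow: "10"
576+ indexingWorker:
577+   dbPoolSize: "30"  # 10 → 30
578+   dbMaxOverflow: "10"
579+ postgres:
580+   POSTGRES_HOST: "pgbouncer-svc"  # go through pgbouncer instead of connecting to pg-cluster directly
581+   POSTGRES_PORT: "6432"  # pgbouncer port
582+ ```
583+
584+ At the application layer, a replica with `dbPoolSize=20` and `dbMaxOverflow=10` can open up to `20+10=30` connections, so 4 replicas present 120 clients to pgbouncer, which multiplexes them onto `default_pool_size=25` server connections to PG. With PG max_connections=100, the remaining 75 are left for KubeBlocks maintenance, backup, monitoring, and emergencies.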
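
The sizing arithmetic above can be written out as a small helper. This is a sketch under the assumption of a single database/user pair, where pgbouncer opens at most `default_pool_size + reserve_pool_size` server connections; the function name `pgbouncer_fan_in` and the returned keys are hypothetical.

```python
from typing import Dict, Union


def pgbouncer_fan_in(
    replicas: int,
    pool_size: int,
    max_overflow: int,
    max_client_conn: int,
    default_pool_size: int,
    reserve_pool_size: int,
    pg_max_connections: int,
) -> Dict[str, Union[int, bool]]:
    """Compare client-side demand with pgbouncer limits and PG headroom."""
    client_connections = replicas * (pool_size + max_overflow)
    # Worst case on the PG side when the reserve pool is fully in use.
    server_peak = default_pool_size + reserve_pool_size
    return {
        "client_connections": client_connections,
        "fits_max_client_conn": client_connections <= max_client_conn,
        "pg_headroom": pg_max_connections - server_peak,
    }
```

With the values from this section (4 replicas, 20+10 per replica, `max_client_conn=500`, `default_pool_size=25`, `reserve_pool_size=5`, `max_connections=100`) this gives 120 client connections, well under the pgbouncer limit. Counting the reserve pool, the worst-case PG headroom is 70 rather than 75.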
585+
586+ **P1-Helm-6** Helm templates:
587+
588+ 1. Add `deploy/databases/pgbouncer/values.yaml` (KubeBlocks pgbouncer addon)
589+ 2. Add `postgres.via_pgbouncer: true` to `deploy/aperag/values.yaml`, enabled by default
590+ 3. Change `aperag-secret.yaml` to inject `DATABASE_URL` using the pgbouncer SVC
591+ 4. Startup self-check: when going through pgbouncer, `SHOW pool_mode` must return `transaction`
592+
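593+ ### 10.6 Verification
594+
595+ - Unit tests: mock pgbouncer with `pool_mode=transaction` and run every case under `tests/db/` (especially the advisory-lock tests in `test_collection_regen_lease.py`)
596+ - Load test: start 4 replicas × `dbPoolSize=30` simultaneously → pgbouncer must not run out of client connections
597+ - Regression: re-run the task #61 P2-S2 N-seed PG connection saturation test (the fix has already shipped) to confirm it stays stable after pgbouncer is introduced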
598+
599+ ---
600+
601+ ## 11. Configurable-parameter inventory (recommendations for admin UI exposure, per earayu2 msg=99c1d23a)
602+
603+ ### 11.1 Current state
604+
605+ The `/admin/configuration` page (`web/src/app/admin/configuration/page.tsx`) already has:
606+ - `ParserSettings`: `use_markitdown` / `use_mineru` / `mineru_api_token`
607+ - `QuotaSettings`: per-user / per-collection quotas
608+
609+ Settings are persisted through `aperag/domains/governance/service/setting_service.py` (a DB key-value table); a single entry is updated with `update_setting(key, value)`.
610+
611+ ### 11.2 New parameters found by the audit: recommended for the admin UI
612+
613+ #### Class A: runtime perf (**strongly recommended**; not ops-only, both operators and developers care about these)
614+
615+ | Parameter | Current default | Suggested admin UI range | Card |
616+ |---|---|---|---|
617+ | `embedding_max_chunks_in_batch` | 10 | 8-128 | **New IndexingSettings card** |
618+ | `embedding_max_workers` | 1 | 1-8 | IndexingSettings |
619+ | `indexing_vector_concurrency` | 16 | 4-64 | IndexingSettings |
620+ | `indexing_fulltext_concurrency` | 32 | 8-128 | IndexingSettings |
621+ | `indexing_graph_facts_concurrency` | 4 | 1-16 | IndexingSettings |
622+ | `indexing_graph_vectors_concurrency` | 4 | 1-16 | IndexingSettings |
623+ | `indexing_summary_concurrency` | 4 | 1-16 | IndexingSettings |
624+ | `indexing_vision_concurrency` | 4 | 1-16 | IndexingSettings |
625+ | `indexing_parse_concurrency` | 8 | 1-32 | IndexingSettings |
626+ | `indexing_reconcile_interval_seconds` | 30 | 10-300 | IndexingSettings (advanced) |
627+ | `indexing_reconcile_batch_size` | 100 | 10-1000 | IndexingSettings (advanced) |
628+ | `indexing_cleanup_interval_seconds` | 300 | 60-3600 | IndexingSettings (advanced) |
629+ | `chunk_size` | 400 | 100-2000 | ParserSettings (existing card, new field) |
630+ | `chunk_overlap_size` | 20 | 0-200 | ParserSettings |
632+ #### Class B: collection-level (**already in the collection config**; the admin UI would only change the defaults)
633+
634+ | Parameter | Purpose | Admin UI card |
635+ |---|---|---|
636+ | `enable_vector` / `enable_fulltext` / `enable_knowledge_graph` / `enable_summary` / `enable_vision` | on/off for the 5 lanes | Already on the collection creation page; admin sets the global default |
637+ | `graph_extraction_window_size` | graph chunk window | Already in collection.config; no need to duplicate it in the admin UI |
638+ | `graph_extraction_llm_concurrency` | graph LLM concurrency (P2-7 proposes extracting it) | Same as above |
639+
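640+ #### Class C: infra ops (**not exposed in the admin UI**; managed via Helm/env to avoid changing K8s configuration at runtime)
641+
642+ | Parameter | Reason |
643+ |---|---|
644+ | `db_pool_size` / `db_max_overflow` / `db_pool_timeout` | Changing them requires a process restart; these are deploy-time parameters, not runtime ones |
645+ | `indexing_queue_redis_url` / `indexing_quota_redis_url` | Deploy-time parameters |
646+ | `pgbouncer pool_size / max_client_conn / server_idle_timeout` | pgbouncer's own configuration, unrelated to ApeRAG |
647+ | `K8s resources.requests/limits` | Changed through Helm, followed by a rolling upgrade |
648+ | `HPA min/max replicas` | Changed through Helm, followed by a rolling upgrade |
649+ | `qdrant_quantization_*` / `qdrant_hnsw_on_disk` | Qdrant collection creation parameters; changing them requires rebuilding the collection |
650+ | `pgvector_hnsw_m / ef_construction` | Same as above |
651+ | `MAX_DOCUMENT_SIZE` | Upload limit; the admin UI already has a quota card, so it belongs there rather than on the indexing card |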
652+
653+ ### 11.3 Recommended UI rollout
654+
655+ **P2-Admin-1** Create a new `IndexingSettings` card (`web/src/app/admin/configuration/indexing-settings.tsx`):
656+
657+ ```
658+ ┌─ Indexing Settings ───────────────────────────────────┐
659+ │ [ Embedding] │
660+ │ Max chunks per batch: [ ____ 10__ ] (8-128) │
661+ │ Max parallel workers: [ _____ 1__ ] (1-8) │
662+ │ │
663+ │ [ Worker concurrency] (per modality, asyncio Semaphore) │
664+ │ Vector: [ ____ 16__ ] (4-64) │
665+ │ Fulltext: [ ____ 32__ ] (8-128) │
666+ │ Graph facts: [ _____ 4__ ] (1-16) │
667+ │ Graph vectors: [ _____ 4__ ] (1-16) │
668+ │ Summary: [ _____ 4__ ] (1-16) │
669+ │ Vision: [ _____ 4__ ] (1-16) │
670+ │ Parse: [ _____ 8__ ] (1-32) │
671+ │ │
672+ │ [ Advanced: reconciler / cleanup] │
673+ │ Reconcile interval (s): [ ____ 30__ ] (10-300) │
674+ │ Reconcile batch size: [ ___ 100__ ] (10-1000) │
675+ │ Cleanup interval (s): [ ___ 300__ ] (60-3600) │
676+ │ Cleanup batch size: [ ___ 200__ ] (10-1000) │
677+ │ │
678+ │ [ ⚠️ Restart the indexing-worker pod to apply changes] │
679+ │ │
680+ │ [ Save] [ Reset to defaults] │
681+ └───────────────────────────────────────────────────────┘
682+ ```
683+
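684+ **Key UX decision**: a change is only written to the DB (settings table) and does not take effect immediately. On startup the worker reads the settings, which override the built-in defaults of the env-based config. As a result:
685+ - ops do not need to edit values.yaml and run a rolling upgrade
686+ - admin UI change → `kubectl rollout restart deployment/indexing-worker` → the new values take effect within 30s
687+ - the current env-based deployment stays compatible: explicitly set env vars take precedence over DB settings (so an accidental admin UI change cannot undo an emergency mitigation)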
688+
689+ **P2-Admin-2** Backend changes:
690+ - `aperag/config.py`: add a helper `get_indexing_setting(key, default)` for the indexing_* parameters; at startup it reads env first and falls back to the DB settings table when the env var is not set
691+ - `aperag/cli/indexing_worker.py:_amain()`: inject the settings into `OrchestratorConfig` / `ParseOrchestratorConfig` at startup
692+ - `aperag/domains/governance/service/setting_service.py`: add `get_indexing_settings()` / `update_indexing_settings(...)` helpers
693+ - New OpenAPI route `GET/PUT /api/v1/admin/configuration/indexing` (modelled on the existing `/admin/configuration/parser`)
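
The precedence rule (env first, then DB settings, then the built-in default) can be sketched as follows. The document only names `get_indexing_setting(key, default)`; the extra `db_settings` and `env` parameters and the convention that the env var name is the upper-cased key are assumptions made to keep the example self-contained and testable.

```python
import os
from typing import Mapping, Optional


def get_indexing_setting(
    key: str,
    default: int,
    db_settings: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Resolve one integer setting: explicit env var > DB settings row > default."""
    env = os.environ if env is None else env
    raw = env.get(key.upper())  # assumed naming convention for env vars
    if raw is not None and raw.strip() != "":
        return int(raw)
    if key in db_settings:
        return int(db_settings[key])  # value saved through the admin UI
    return default
```

Because the env var wins, an operator can still pin a value through Helm during an incident regardless of what was saved in the admin UI.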
694+
695+ ### 11.4 Hooks for the frontend (@dongdong)
696+
697+ - New `web/src/features/admin/indexing-settings/`: i18n keys `admin_config.indexing.*`, schema validation (zod)
698+ - Add a sidebar menu item `Indexing`, either as its own card or grouped on the ParserSettings card
699+ - e2e: `web-e2e/admin/configuration-indexing.spec.ts`
700+
701+ ---
702+
703+ ## 12. (pending @ziang) Read path + GraphVectors/chunks.jsonl reuse boundaries
704+ ## 13. (pending @ziang) cleanup / deletion / reconciler batch-delete SQL
705+ ## 14. (pending @ziang) End-to-end bottleneck attribution (your perspective, complementing §5)
706+ ## 15. (pending @ziang) KubeBlocks research (kubeblocks-skills repo) → fold into the §10 PG/pgbouncer section
707+ ## 16. (pending, joint) Acceptance checklist + Wave 1-4 schedule wrap-up
403708
404709---
405710
406- > Document version: v1
711+ > Document version: v1 (§1-§11 by 符炫炜)
407712> Author: @符炫炜 (architect)
408- > Reviewer: @ziang (to add §11-13)
713+ > Pending: @ziang §12-§16
409714> Acceptance: @earayu2 (msg=718c79ba directive)
410715> Tracking: @不穷 (PM)
716+ > main HEAD pin: `eb4c4f3d` (2026-04-30 18:46)