**Current state:** `pyproject.toml` requires FastAPI, FAISS, OpenAI, Gemini, numpy, uvicorn, and watchdog for *every* install, even for users who only need `knowcode analyze` + `knowcode query`.

**Impact:** Slow installs, platform-specific failures (FAISS wheels, numpy ABI), increased vulnerability surface, and import-time latency for CLI-only users.

| `knowcode[watch]` | `watchdog` | `knowcode server --watch` |
| `knowcode[all]` | All of the above | Batteries-included (preserves backward compatibility) |

Commands invoked without the required extra should fail fast with: *"Install knowcode[server] to use `knowcode server`"*.
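One way to get the fail-fast behaviour described above is to probe for an extra's modules before a command runs. The sketch below is illustrative only: `MissingExtraError`, `EXTRA_MODULES`, `missing_modules`, and `require_extra` are hypothetical names, and the `server` module list is an assumption drawn from the dependency list in the current state, not from the actual packaging metadata.

```python
import importlib.util


class MissingExtraError(RuntimeError):
    """Raised when a command needs an optional extra that is not installed."""


# Assumed mapping of extras to importable top-level modules (illustrative only).
EXTRA_MODULES = {
    "watch": ["watchdog"],
    "server": ["fastapi", "uvicorn"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the modules of `extra` that cannot be located, without importing them."""
    return [
        name
        for name in EXTRA_MODULES[extra]
        if importlib.util.find_spec(name) is None
    ]


def require_extra(extra: str, command: str) -> None:
    """Fail fast with an install hint when `extra` is not fully installed."""
    if missing_modules(extra):
        raise MissingExtraError(f"Install knowcode[{extra}] to use `{command}`")
```

A command handler could call `require_extra("server", "knowcode server")` as its first statement, so the heavy imports only happen after the check passes.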
### **AD-2: Hidden Side Effects in Query Paths** *(Priority: Critical)*

**Current state:** `KnowCodeService.retrieve_context_for_query()` auto-triggers `analyze()` and `_build_index()` if artifacts are missing. A read operation silently performs expensive writes.

**Impact:** Unpredictable latency in API/MCP server calls; surprises in CI/CD pipelines; makes the system non-deterministic from the caller's perspective.

**Target state:** Query methods fail fast with actionable errors when prerequisites are missing (e.g., *"Knowledge store not found. Run `knowcode analyze <dir>` first."*). Opt-in helpers `ensure_store()` and `ensure_index()` are available for callers who want the auto-build behavior.
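The target control flow can be sketched as follows. `QueryService` and `PrerequisiteError` are hypothetical stand-ins, not the real `KnowCodeService`; the placeholder `analyze()` only writes an empty file so the example stays self-contained, and `ensure_index()` is left out because it would follow the same pattern.

```python
from pathlib import Path


class PrerequisiteError(RuntimeError):
    """A query was attempted before the required artifact was built."""


class QueryService:
    """Hypothetical stand-in for KnowCodeService; only the control flow matters."""

    def __init__(self, store_path: Path) -> None:
        self.store_path = store_path

    def analyze(self) -> None:
        # Placeholder for the expensive write; here it only creates an empty store.
        self.store_path.write_text("{}", encoding="utf-8")

    def ensure_store(self) -> None:
        """Opt-in helper: build the store only if it is missing."""
        if not self.store_path.exists():
            self.analyze()

    def retrieve_context_for_query(self, query: str) -> str:
        """Read-only path: never triggers analyze()."""
        if not self.store_path.exists():
            raise PrerequisiteError(
                "Knowledge store not found. Run `knowcode analyze <dir>` first."
            )
        return f"context for {query!r}"
```

Callers that still want the old convenience call `ensure_store()` explicitly before querying, which keeps the expensive write visible at the call site.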
### **AD-3: No Schema Versioning on Persisted Artifacts** *(Priority: High)*

**Current state:** The JSON knowledge store and FAISS index contain no `schema_version` field. Data model changes silently corrupt existing stores.

**Impact:** No safe migration path; users must manually delete and rebuild after upgrades.

**Target state:** Top-level `schema_version` field in both the knowledge store JSON and the index metadata. A minimal migration shim validates version on load and either migrates or emits a clear error.
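A minimal sketch of such a shim, assuming an integer `schema_version` and a chain of single-step migrations. The version numbers, the `nodes` to `entities` rename, and all function names here are invented for illustration and do not describe the real store layout.

```python
import json

CURRENT_SCHEMA_VERSION = 2  # hypothetical current version


class SchemaVersionError(RuntimeError):
    """The persisted store cannot be loaded by this version of the code."""


def _migrate_v1_to_v2(doc: dict) -> dict:
    # Hypothetical migration: v2 renames the "nodes" key to "entities".
    doc = dict(doc)
    doc["entities"] = doc.pop("nodes", [])
    doc["schema_version"] = 2
    return doc


# Each entry upgrades a document by exactly one version.
MIGRATIONS = {1: _migrate_v1_to_v2}


def load_store(raw: str) -> dict:
    """Parse a store, migrating old versions and rejecting unknown ones."""
    doc = json.loads(raw)
    version = doc.get("schema_version")
    if not isinstance(version, int):
        raise SchemaVersionError(
            "Store has no schema_version. Re-run `knowcode analyze <dir>`."
        )
    if version > CURRENT_SCHEMA_VERSION:
        raise SchemaVersionError(
            f"Store schema v{version} is newer than supported "
            f"v{CURRENT_SCHEMA_VERSION}. Upgrade knowcode."
        )
    while version < CURRENT_SCHEMA_VERSION:
        if version not in MIGRATIONS:
            raise SchemaVersionError(f"No migration path from schema v{version}.")
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc
```

Keeping each migration a single-version step means a store several versions behind is upgraded by chaining the steps, rather than by maintaining one function per version pair.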
### **AD-4: Metadata Type Restriction** *(Priority: High)*

**Current state:** `Entity.metadata`, `Relationship.metadata`, and `CodeChunk.metadata` are typed as `dict[str, str]`, forcing stringification of booleans, integers, and lists.

**Target state:** Change to `dict[str, Any]` across all data models. Serialization/deserialization handles mixed types natively.
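To illustrate what the wider type buys, the sketch below round-trips mixed-type metadata through JSON without stringification. The `Entity` class is a trimmed-down, hypothetical version of the real model, and the metadata keys are made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Entity:
    """Trimmed-down, hypothetical version of the Entity model."""

    id: str
    metadata: dict[str, Any] = field(default_factory=dict)


entity = Entity(
    id="src/app.py::App.run",
    metadata={"is_async": True, "line_count": 42, "decorators": ["staticmethod"]},
)

# Booleans, integers, and lists survive the JSON round trip with their types intact.
restored = Entity(**json.loads(json.dumps(asdict(entity))))
```

This only holds for JSON-representable values; anything else placed in `metadata` (a `set` or a `datetime`, for example) would still need an explicit encoding step.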

**Current state:** `AppConfig._load_from_yaml()` catches all exceptions, prints to stdout, and silently falls back to defaults. No schema validation on YAML keys.

**Target state:** Use `logging.warning()` instead of `print()`. In server/MCP contexts, raise on invalid configuration. Validate known config keys and warn on unrecognised ones.
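The validation step could look roughly like this, operating on already-parsed YAML so the sketch needs no third-party parser. `KNOWN_KEYS`, `ConfigError`, `validate_config`, and the `strict` flag are assumptions made for the example; the real key set lives in `AppConfig`.

```python
import logging

logger = logging.getLogger("knowcode.config")

# Hypothetical set of recognised top-level keys.
KNOWN_KEYS = {"store_path", "embedding_provider", "log_level"}


class ConfigError(ValueError):
    """The configuration file is structurally invalid."""


def validate_config(raw: object, *, strict: bool) -> dict:
    """Validate parsed YAML content.

    strict=True models server/MCP contexts (raise on invalid input);
    strict=False models the CLI (log a warning and fall back to defaults).
    """
    if not isinstance(raw, dict):
        message = f"Config root must be a mapping, got {type(raw).__name__}"
        if strict:
            raise ConfigError(message)
        logger.warning("%s; falling back to defaults", message)
        return {}
    # Unknown keys are reported but never fatal, so typos become visible.
    for key in sorted(set(raw) - KNOWN_KEYS, key=str):
        logger.warning("Unrecognised config key: %r", key)
    return {key: value for key, value in raw.items() if key in KNOWN_KEYS}
```

Routing messages through `logging` lets server deployments capture them in their normal log pipeline, which a bare `print()` to stdout does not.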
### **AD-6: Service Layer Cohesion** *(Priority: Medium)*

**Current state:** `KnowCodeService` handles orchestration, caching, persistence, query classification, retrieval strategy selection, index validation, and auto-building, which gives it too many reasons to change.

**Target state:** Extract retrieval orchestration into a dedicated `RetrievalOrchestrator` class. `KnowCodeService` delegates to specialised components. Define `Protocol` interfaces for `EmbeddingProvider`, `VectorStore`, and `KnowledgeStoreProtocol` to decouple layers.
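A rough sketch of how the `Protocol` boundaries might look. The method names `embed` and `search`, their signatures, and the two test doubles are assumptions for illustration; `KnowledgeStoreProtocol` is omitted for brevity and would follow the same shape.

```python
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    def search(self, vector: list[float], k: int) -> list[str]: ...


class RetrievalOrchestrator:
    """Depends only on the protocols, not on FAISS or a specific embedding API."""

    def __init__(self, embedder: EmbeddingProvider, vectors: VectorStore) -> None:
        self._embedder = embedder
        self._vectors = vectors

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        # Embed the single query, then ask the store for the k nearest chunk IDs.
        [vector] = self._embedder.embed([query])
        return self._vectors.search(vector, k)


class LengthEmbedder:
    """Toy embedder for tests: one dimension, the text length."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


class FixedStore:
    """Toy vector store for tests: ignores the vector, returns the first k IDs."""

    def __init__(self, ids: list[str]) -> None:
        self._ids = ids

    def search(self, vector: list[float], k: int) -> list[str]:
        return self._ids[:k]
```

Because `Protocol` uses structural typing, the toy classes satisfy the interfaces without inheriting from them, which is what makes the golden-query and contract tests cheap to write.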

**Current state:** Entity IDs use `file_path::qualified_name`. File renames or moves break identity, poisoning temporal history and cached indexes.

**Target state:** Retain `file_path::qualified_name` as the primary ID but add a `content_hash` (SHA-256 of canonical source snippet) to entity metadata for rename-resilient correlation.
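The hash could be computed along these lines. What counts as "canonical" is not specified above, so the normalisation here (unified line endings, trailing whitespace stripped, surrounding blank lines dropped) is an assumption, as are the helper names.

```python
import hashlib


def canonicalize(snippet: str) -> str:
    """Assumed canonical form: LF line endings, no trailing whitespace or outer blank lines."""
    lines = snippet.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip("\n")


def content_hash(snippet: str) -> str:
    """SHA-256 hex digest of the canonicalised source snippet."""
    return hashlib.sha256(canonicalize(snippet).encode("utf-8")).hexdigest()


def entity_id(file_path: str, qualified_name: str) -> str:
    """Primary ID, unchanged from the current scheme."""
    return f"{file_path}::{qualified_name}"
```

After a file move, `entity_id` changes but `content_hash` of an untouched function does not, so the old and new entities can be correlated by matching hashes. Identical snippets in different places would collide, which is why the hash complements the primary ID rather than replacing it.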

**Current state:** NetworkX in-memory graph + full JSON serialization. Adequate for small/medium repos but will hit memory and load-time walls on large monorepos (>100k entities).

**Target state:** Evaluate SQLite-backed storage for entities/edges/chunks with FTS, enabling incremental loads and partial queries. This is a Phase 6 concern.
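As a starting point for that evaluation, a partial query against a SQLite-backed graph might look like the sketch below. The table layout, column names, the `calls` edge kind, and the helper functions are all hypothetical, and full-text search is deliberately left out to keep the example small.

```python
import sqlite3

# Hypothetical minimal schema: one row per entity, one row per directed edge.
SCHEMA = """
CREATE TABLE entities (
    id   TEXT PRIMARY KEY,
    kind TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE edges (
    source TEXT NOT NULL REFERENCES entities(id),
    target TEXT NOT NULL REFERENCES entities(id),
    kind   TEXT NOT NULL
);
CREATE INDEX edges_by_source ON edges(source);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a fresh store; a real implementation would also handle existing files."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def callees(conn: sqlite3.Connection, entity_id: str) -> list[str]:
    """Partial query: fetch only the outgoing 'calls' edges of one entity."""
    rows = conn.execute(
        "SELECT target FROM edges WHERE source = ? AND kind = 'calls' ORDER BY target",
        (entity_id,),
    )
    return [row[0] for row in rows]
```

The point of the comparison is that `callees` touches only the indexed rows for one entity, whereas the current approach must deserialize the whole JSON graph before answering any question.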
### **AD-9: `[HARDENED]` Tag Clarity** *(Priority: Low)*

**Current state:** Layer descriptions throughout this document include `[HARDENED]` items that represent aspirational capabilities, not shipped features. This can mislead readers about the system's current state.

**Target state:** All `[HARDENED]` items are clearly labelled as *"ASPIRATIONAL: not yet implemented"* where they first appear (Section 1 preamble). Individual items are not removed; they remain as the north-star design.

---

> **Note on `[HARDENED]` tags:** Throughout the layer descriptions above, items marked `[HARDENED]` represent the *target design* for a production-grade system. They are **not yet implemented** in the current codebase. See the roadmap below for the phased plan to address them.

## **Implementation Status & Roadmap**

### **Phase 1: Foundation (COMPLETED)**

13. **[x] Markdown Export (MVP)**: CLI `export` produces an index-style Markdown doc (see `docs_test/index.md`).
16. **[x] Side-Effect-Free Query Paths (AD-2)**: Remove auto-analyze/index from `retrieve_context_for_query()`. Fail fast with actionable errors. Add explicit `ensure_store()` / `ensure_index()` helpers.
17. **[ ] Schema Versioning (AD-3)**: Add `schema_version` to knowledge store JSON and index metadata. Write migration shim for version validation on load.
18. **[ ] Data Model Fixes (AD-4)**: Change `metadata: dict[str, str]` to `dict[str, Any]` across `Entity`, `Relationship`, and `CodeChunk`.
19. **[ ] Configuration Hardening (AD-5)**: Replace `print()` with `logging`; raise on invalid config in server contexts; validate YAML schema.
20. **[ ] Service Layer Decomposition (AD-6)**: Extract `RetrievalOrchestrator` from `KnowCodeService`. Define `Protocol` interfaces for `EmbeddingProvider`, `VectorStore`, `KnowledgeStoreProtocol`.
21. **[ ] Entity Identity Resilience (AD-7)**: Add `content_hash` to entity metadata for rename-resilient correlation.
22. **[ ] Layer Contract Tests**: Parser → `ParseResult` contract tests; store save/load roundtrip with schema version; retrieval golden-query tests; CLI smoke tests (Click runner); API endpoint contract tests (conditional on `server` extra).

### **Phase 5: Deep Analysis**

23. **[ ] Static Behavioral Analysis (Layer 4)**: Data flow, state transitions, side-effect classification.