This plan solidifies the next-stage evolution of NoteConnection from a "knowledge visualization system" into a local-first "knowledge parsing + mastery loop + divergence thinking + pluggable LLM tutor" platform.
This document is implementation-facing and decision-complete for the next 6-9 months.
- Deployment priority: local-first and privacy-preserving by default.
- Learning objective: dual-core strategy, mastery closure + divergence thinking.
- LLM strategy: pluggable adapter for both local and cloud models.
- Graph backbone: introduce a local graph database as advanced engine.
- Delivery cadence: three phases in 6-9 months.
- Primary success metric: mastery improvement.
- Knowledge Atom: smallest independently assessable unit of knowledge.
- Evidence Span: traceable source segment backing a knowledge atom.
- Relation Edge: prerequisite/analogy/contrast/causal/application relationship between atoms.
- Temporal Evolution: versioned state change and validity window of atoms and relations.
- Mastery State: estimated probability, inferred from observed learner behavior, that the user has mastered an atom.
- Divergence Graph: graph of cross-domain expansion around a current topic.
- Learning Action: executable next step (quiz, explanation, analysis, reflection, transfer task).
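
The glossary terms above can be pictured as plain data types. The following is a minimal sketch, not the project's actual schema: every field name (`atom_id`, `source_path`, `start_line`, `p_mastery`, and so on) is an assumption made for illustration, and only the five relation kinds come from the definition of Relation Edge.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    # The five relation kinds named in the glossary.
    PREREQUISITE = "prerequisite"
    ANALOGY = "analogy"
    CONTRAST = "contrast"
    CAUSAL = "causal"
    APPLICATION = "application"


@dataclass(frozen=True)
class EvidenceSpan:
    source_path: str  # hypothetical: path of the note the span comes from
    start_line: int   # hypothetical: inclusive, 1-based
    end_line: int     # hypothetical: inclusive


@dataclass(frozen=True)
class KnowledgeAtom:
    atom_id: str
    statement: str
    evidence: tuple[EvidenceSpan, ...]  # provenance backing this atom


@dataclass(frozen=True)
class RelationEdge:
    source_id: str
    target_id: str
    kind: RelationKind


@dataclass
class MasteryState:
    atom_id: str
    p_mastery: float  # probability in [0, 1]

    def __post_init__(self) -> None:
        # Reject values that cannot be interpreted as a probability.
        if not 0.0 <= self.p_mastery <= 1.0:
            raise ValueError("p_mastery must be within [0, 1]")
```

Keeping atoms and edges immutable (`frozen=True`) is one possible choice here; it makes versioned copies the natural way to express change over time, while `MasteryState` stays mutable because it is updated after every observation.
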
- Parse Markdown, code blocks, formulas, and Mermaid blocks into `KnowledgeAtom + EvidenceSpan`.
- Every atom must keep source provenance for explainable retrieval.
- Build static graph, process graph, and temporal graph from atoms and relation edges.
- Distinguish fact edges from inferred edges to prevent path hallucination.
- Hybrid retrieval: keyword + vector + graph traversal + temporal filtering.
- Every answer must include evidence, relation path, and temporal validity.
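
As a rough illustration of hybrid retrieval with temporal filtering, the sketch below drops candidates whose validity window does not cover the query date and then ranks the rest by a weighted sum of keyword, vector, and graph scores. The `Candidate` type, its fields, and the weights `(0.3, 0.4, 0.3)` are all assumptions; the plan does not prescribe a fusion formula.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    atom_id: str
    keyword_score: float
    vector_score: float
    graph_score: float
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"


def is_valid_at(c: Candidate, when: date) -> bool:
    # Half-open validity window: [valid_from, valid_to).
    return c.valid_from <= when and (c.valid_to is None or when < c.valid_to)


def hybrid_rank(
    candidates: list[Candidate],
    when: date,
    weights: tuple[float, float, float] = (0.3, 0.4, 0.3),  # assumed weights
) -> list[Candidate]:
    wk, wv, wg = weights
    # Temporal filtering happens before scoring, so expired atoms never rank.
    live = [c for c in candidates if is_valid_at(c, when)]
    return sorted(
        live,
        key=lambda c: wk * c.keyword_score + wv * c.vector_score + wg * c.graph_score,
        reverse=True,
    )
```

Filtering before scoring means an outdated atom cannot surface on the strength of a high similarity score alone, which is the behaviour the temporal validity requirement asks for.
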
- Decide next learning action using `MasteryState + DivergenceGraph`.
- Do not consume raw black-box LLM output without evidence binding.
- Provide learning workspace + tutor action APIs + evaluation feedback loop.
- Support local model and cloud model through one adapter contract.
- Enforce freshness checks, API contracts, rollback switches, quality gates, and privacy boundaries.
- Gate the full chain from L0 through L4.
- Fast-GraphRAG: absorb stateful insertion/query and high-speed local retrieval pipeline.
- LightRAG: absorb dual-level retrieval and incremental update orientation.
- Graphiti: absorb temporal knowledge graph concepts for evolving context.
- Neo4j GraphRAG: absorb graph-driven explainable retrieval patterns and tool contract discipline.
- MemOS: absorb layered memory policy (session/unit/long-term) and memory scheduling.
- GitNexus: absorb process-context, staleness discipline, and agent-consumable interface patterns.
- No cloud-first multi-tenant architecture.
- No deep distributed complexity at v1.
- No direct one-to-one reuse of code intelligence patterns as learning intelligence.
- Build unified parser pipeline for `KnowledgeAtom + EvidenceSpan`.
- Introduce local graph database as advanced engine, keep lightweight path for compatibility.
- Implement temporal model for atom/relation versioning and validity.
- Add staleness detection by source hash binding.
- Deliverables:
  - Incremental rebuildable knowledge graph service.
  - Evidence-traceable query interface.
  - Temporal validity annotations.
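
Staleness detection by source hash binding could work roughly as follows: each atom records the hash of its source at extraction time, and an atom is flagged stale when the source is missing or its current hash differs. This is a sketch under assumptions; the function names, the choice of SHA-256, and the shape of the `atom_bindings` mapping are hypothetical.

```python
from __future__ import annotations

import hashlib


def content_hash(text: str) -> str:
    # SHA-256 over the UTF-8 bytes of the source text (assumed hash choice).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale_atoms(
    atom_bindings: dict[str, tuple[str, str]],
    current_sources: dict[str, str],
) -> list[str]:
    """Return ids of atoms whose bound source hash no longer matches.

    atom_bindings maps atom_id -> (source_path, hash recorded at extraction).
    current_sources maps source_path -> current text of that source.
    """
    stale = []
    for atom_id, (path, bound_hash) in atom_bindings.items():
        text = current_sources.get(path)
        # A deleted source and a changed source both invalidate the atom.
        if text is None or content_hash(text) != bound_hash:
            stale.append(atom_id)
    return sorted(stale)
```

Only the atoms returned by `find_stale_atoms` would need re-parsing, which is what makes an incremental rebuild cheaper than a full one.
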
- Introduce `LearnerConceptState` per atom (mastery, error tags, retest outcomes).
- Build mastery closure loop: diagnose -> classify errors -> personalized practice -> retest update.
- Build divergence engine for same-level expansion, cross-level transfer, and counter-example exploration.
- Support dual output paths: `MasteryPath[]` and `DivergencePath[]`.
- Deliverables:
  - Learning path orchestrator.
  - Error taxonomy knowledge base.
  - Dual-core learning panel.
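
One way to picture the "retest update" step of the mastery closure loop is a Bayesian Knowledge Tracing (BKT) update. BKT is a technique swapped in here purely for illustration; the plan does not name a specific estimator. The `slip`, `guess`, and `learn` parameters and their default values are assumptions, as are the field names on `LearnerConceptState` beyond the three the plan lists.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LearnerConceptState:
    atom_id: str
    mastery: float = 0.3  # assumed prior probability of mastery
    error_tags: list[str] = field(default_factory=list)
    retest_outcomes: list[bool] = field(default_factory=list)


def bkt_update(p: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2, learn: float = 0.1) -> float:
    """Standard BKT step: Bayes posterior given the answer, then a learning transition."""
    if correct:
        posterior = p * (1 - slip) / (p * (1 - slip) + (1 - p) * guess)
    else:
        posterior = p * slip / (p * slip + (1 - p) * (1 - guess))
    return posterior + (1 - posterior) * learn


def record_attempt(state: LearnerConceptState, correct: bool,
                   error_tag: Optional[str] = None) -> LearnerConceptState:
    # Classify the error (if any), log the outcome, then update mastery.
    if not correct and error_tag is not None:
        state.error_tags.append(error_tag)
    state.retest_outcomes.append(correct)
    state.mastery = bkt_update(state.mastery, correct)
    return state
```

The accumulated `error_tags` are what a personalized practice step could draw on, while `mastery` is the quantity a path orchestrator would read when prioritizing atoms.
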
- Build unified LLM adapter for local and cloud providers.
- Implement tutor actions: quiz generation, probing questions, answer analysis, misconception diagnosis, transfer-task generation, recap synthesis.
- Implement layered memory: session memory, unit memory, long-term mastery memory.
- Add safety guardrails: evidence-first responses, source traceability, confidence-based downgrade.
- Deliverables:
  - LLM tutor orchestration layer.
  - Memory policy engine.
  - Learning quality dashboard.
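
The adapter contract and the evidence-first guardrail might look something like the sketch below. Everything here is hypothetical: `LLMAdapter`, `TutorReply`, the two stand-in model classes, and the `min_confidence` threshold of 0.6 are illustrative names and values, and no real model runtime or cloud API is called.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TutorReply:
    text: str
    evidence_ids: tuple[str, ...]  # atoms or spans the reply is bound to
    confidence: float


class LLMAdapter(Protocol):
    """One contract that local and cloud providers would both implement."""

    def complete(self, prompt: str, evidence_ids: tuple[str, ...]) -> TutorReply: ...


class EchoLocalModel:
    """Stand-in for a local model; a real adapter would call a local runtime."""

    def complete(self, prompt: str, evidence_ids: tuple[str, ...]) -> TutorReply:
        return TutorReply(f"[local] {prompt}", evidence_ids, confidence=0.9)


class UnboundCloudModel:
    """Stand-in for a cloud model that returns an answer without evidence."""

    def complete(self, prompt: str, evidence_ids: tuple[str, ...]) -> TutorReply:
        return TutorReply(f"[cloud] {prompt}", (), confidence=0.95)


FALLBACK_TEXT = "I cannot support an answer from your notes yet."


def guarded_reply(adapter: LLMAdapter, prompt: str, evidence_ids: tuple[str, ...],
                  min_confidence: float = 0.6) -> TutorReply:
    reply = adapter.complete(prompt, evidence_ids)
    # Downgrade when the reply has no evidence binding or low confidence.
    if not reply.evidence_ids or reply.confidence < min_confidence:
        return TutorReply(FALLBACK_TEXT, reply.evidence_ids, reply.confidence)
    return reply
```

Because `guarded_reply` depends only on the `LLMAdapter` protocol, swapping a local provider for a cloud one does not change the guardrail logic.
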
- `KnowledgeIngestAPI`
  - Input: document payload + incremental change metadata.
  - Output: atom/evidence/relation/temporal metadata.
- `KnowledgeQueryAPI`
  - Unified retrieval entry with evidence-first response contract.
- `MasteryDiagnosticsAPI`
  - Input: learner answer/behavior events.
  - Output: mastery updates + error labels.
- `LearningPathAPI`
  - Output: prioritized `MasteryPath[]` and `DivergencePath[]`.
- `TutorActionAPI`
  - Unified tutor action contract (ask/analyze/feedback/recap).
- `MemoryPolicyAPI`
  - Session/unit/long-term memory write and eviction policy management.
- `KnowledgeAtom`
- `EvidenceSpan`
- `RelationEdge`
- `TemporalEdge`
- `LearnerConceptState`
- `LearningAction`
- `TutorTrace`
- Parsing correctness: atom extraction, evidence alignment, relation consistency.
- Retrieval trust: evidence traceability, path explainability, temporal validity hit rate.
- Learning effectiveness: mastery uplift, misconception recurrence decline, retest pass-rate uplift.
- Divergence quality: cross-topic linkage quality, counter-example quality, transfer-task quality.
- Performance: p95 query latency and rebuild duration at 10k atom scale.
- Privacy/security: no external leakage by default, model-call auditability, boundary enforcement.
- Retest pass-rate uplift >= 20%.
- High-frequency misconception recurrence reduction >= 25%.
- Evidence-backed learning suggestion ratio >= 90%.
- Recommended-path effectiveness is statistically significantly better than a random-path baseline.
- Key p95 interactions remain at interactive latency and gates stay green.
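
The three numeric thresholds above lend themselves to an automated check. The sketch below assumes that "uplift" and "reduction" are relative changes against a baseline; the plan does not say whether they are relative or absolute, so treat that reading, along with all type and field names, as an assumption. The path-effectiveness and latency criteria are not covered because the plan gives no numeric threshold for them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LearningMetrics:
    retest_pass_rate_before: float
    retest_pass_rate_after: float
    recurrence_before: float  # high-frequency misconception recurrence rate
    recurrence_after: float
    evidence_backed_suggestions: int
    total_suggestions: int


def relative_change(before: float, after: float) -> float:
    # Assumption: thresholds are relative to the baseline value.
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before


def failed_gates(m: LearningMetrics) -> list[str]:
    """Return the names of the gates that are not met (empty list means green)."""
    failures = []
    if relative_change(m.retest_pass_rate_before, m.retest_pass_rate_after) < 0.20:
        failures.append("retest_pass_rate_uplift")
    if -relative_change(m.recurrence_before, m.recurrence_after) < 0.25:
        failures.append("misconception_recurrence_reduction")
    if (m.total_suggestions == 0
            or m.evidence_backed_suggestions / m.total_suggestions < 0.90):
        failures.append("evidence_backed_ratio")
    return failures
```

Returning the names of failed gates, rather than a single boolean, keeps the result usable both as a release gate and as input to a quality dashboard.
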
- Learning is a state-estimation + intervention-control problem, not only a content presentation problem.
- Without atomization, mastery cannot be measured or improved robustly.
- Without explainable retrieval, feedback loops are not trustworthy.
- Without temporal and memory layers, forgetting and transfer cannot be modeled correctly.
- Without governance gates, quality drifts and model hallucinations will erode learning reliability.
- State-space loop: `Knowledge State -> Observation -> Update -> Policy`.
- Dual-objective optimization: mastery gain and divergence quality under explicit constraints.
- Evidence-first orchestration: every tutor action must map to source evidence and relation path.
- Layered memory: separate short-term interaction memory from long-term mastery memory.
- Controlled evolution: each capability expansion is contract-tested and gate-verified.
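
The policy step of the state-space loop can be sketched as a small rule-based selector: close prerequisite gaps first, then consolidate the current atom, and only then branch out along the divergence graph. The priority order, the 0.8 mastery threshold, the action labels, and the dictionary-based inputs are all assumptions chosen to keep the example short; a real policy would likely optimize both objectives jointly.

```python
from __future__ import annotations


def choose_next_action(
    mastery: dict[str, float],            # atom_id -> estimated mastery
    prerequisites: dict[str, list[str]],  # atom_id -> prerequisite atom ids
    divergence: dict[str, list[str]],     # atom_id -> cross-domain neighbours
    current: str,
    threshold: float = 0.8,               # assumed "mastered" cut-off
) -> tuple[str, str]:
    """Return (action, atom_id) for the learner's next step."""
    # 1. Mastery objective: unmastered prerequisites come first.
    for pre in prerequisites.get(current, []):
        if mastery.get(pre, 0.0) < threshold:
            return ("quiz", pre)
    # 2. Then the current atom itself.
    if mastery.get(current, 0.0) < threshold:
        return ("quiz", current)
    # 3. Divergence objective: expand to an unmastered neighbour.
    for neighbour in divergence.get(current, []):
        if mastery.get(neighbour, 0.0) < threshold:
            return ("transfer_task", neighbour)
    # 4. Nothing left to close or explore around this atom.
    return ("reflection", current)
```

Each returned action would produce a new observation, which feeds the mastery update and closes the loop before the policy is consulted again.
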
- Optimizing vector recall only without graph and evidence chains.
- Treating raw LLM output as ground truth.
- Recommending paths without updating mastery state and retest loops.
- Pursuing full-modality scope too early and increasing architecture risk.
- Ignoring local privacy and auditability until late-stage.
- The direction shift to a verifiable learning system is feasible and strategically sound.
- Local-first plus graph-database backbone is foundational for long-term capability ceiling.
- Dual-core value requires mastery loop and divergence engine to be implemented together.
- Pluggable LLM works only when built on evidence-first retrieval and layered memory.
- A 6-9 month phased roadmap can generate measurable value while controlling risk.
- GitNexus: https://github.com/abhigyanpatwari/GitNexus
- Fast-GraphRAG: https://github.com/circlemind-ai/fast-graphrag
- LightRAG: https://github.com/HKUDS/LightRAG
- Graphiti: https://github.com/getzep/graphiti
- Neo4j GraphRAG Python: https://github.com/neo4j/neo4j-graphrag-python
- MemOS: https://github.com/MemTensor/MemOS
- Neo4j GraphRAG Docs: https://neo4j.com/docs/neo4j-graphrag-python/current/