- Project Overview
- Project Principles
- Project Conventions & Patterns
- High-Level Architecture
- Subsystems Reference
- Code Organization
- Testing Approach
- Development Commands
Agentic Memorizer is an automated knowledge graph builder that monitors user-configured filesystem paths, applies a set of filters, analyzes content using AI providers, and maintains a searchable graph. The daemon watches and walks registered directories for changes and automatically processes files through format-specific chunkers, semantic analysis, and embeddings generation. Results are exposed to AI assistants via the Model Context Protocol (MCP), Hooks, and Plugins.
Key capabilities:
- Filesystem Monitoring: Watches registered directories for file changes with event coalescing
- Intelligent Chunking: 22 format-specific chunkers for code (Tree-sitter AST with 8 languages), documents (PDF, DOCX, ODT), markup (Markdown, LaTeX, HTML), configuration (TOML, HCL, Dockerfile), data formats (JSON, YAML, SQL), and notebooks (Jupyter)
- Semantic Analysis: Pluggable AI providers (Anthropic, OpenAI, Google) extract topics, entities, and summaries
- Vector Embeddings: OpenAI, Voyage AI, and Google providers for semantic similarity search
- Knowledge Graph: FalkorDB (Redis Graph) backend with typed metadata relationships
- AI Tool Integration: MCP server and hooks for Claude Code, Gemini CLI, Codex, and OpenCode
- Unix Philosophy: Each component does one thing well. Data flows through text-based formats (JSON, YAML). Components are composable and can be scripted. Output is silent by default; verbosity is opt-in. All state is inspectable and human-readable.
- Graceful Component Degradation: The system continues operating with reduced functionality when external services fail. If the graph connection fails, the daemon enters degraded mode but continues processing. If a provider is unavailable, analysis proceeds without that capability. Failures are logged and surfaced via health endpoints, never silently ignored.
- Loose Coupling: Components communicate via an event bus rather than direct method calls. The watcher publishes events; the queue subscribes. The cleaner subscribes to deletion events. Components can be replaced or extended without modifying their consumers.
- Eventual Consistency: The filesystem is the source of truth. Changes produce events that propagate asynchronously through the system. The knowledge graph reflects filesystem state only after processing completes. Queries may return stale data during processing; this is acceptable.
- Observability: Comprehensive logging, health checks, and status commands provide visibility into system state. Each component logs key events and errors with context. Health endpoints report component status and degradation.
- Extensibility: Interface-first design, registry patterns, and event-driven architecture make it easy to add or replace the concrete implementations of individual components.
- Unix Philosophy:
  - New components should have a single, clear responsibility
  - Prefer text-based serialization (JSON, YAML) over binary formats
  - Support `--quiet` and `--verbose` flags where applicable
  - Expose internal state via health endpoints or status commands
  - Design for scriptability: predictable exit codes, machine-parseable output options
- Graceful Component Degradation:
  - Initialize optional components (graph, providers, MCP) with error handling that logs warnings and continues
  - Track degraded state via boolean flags (e.g., `graphDegraded`, `mcpDegraded`)
  - Surface degradation in health endpoints and status commands
  - Never crash the daemon due to external service failures
  - Use retry logic with configurable backoff for transient failures
- Loose Coupling:
  - Components should depend on interfaces, not concrete implementations
  - Use the event bus for cross-component communication instead of direct calls
  - New functionality should subscribe to existing events rather than modify publishers
  - Registries (chunkers, handlers, providers) enable runtime component selection
  - Avoid circular dependencies between packages
- Eventual Consistency:
  - Accept that queries may return stale data during processing
  - Design reconciliation logic to handle filesystem changes that occurred during walks
  - Use the cleaner to remove stale graph entries after walks complete
  - Emit events when state changes so dependent components can react
  - Avoid synchronous dependencies between the filesystem state and graph state
- Observability:
  - Implement health checks via `ComponentHealth` struct with status (running/degraded/failed), error message, and timestamps
  - Register components with `ComponentHealthCollector` to participate in aggregate health reporting
  - Expose metrics via the `MetricsProvider` interface (`CollectMetrics(ctx) error`)
  - Use structured logging with `slog` and component context: `slog.Default().With("component", "name")`
  - Populate health `Details` map with actionable diagnostics (queue depths, drop rates, counts)
  - Support `/healthz` (liveness), `/readyz` (readiness with component breakdown), and `/metrics` endpoints
- Extensibility:
  - Define component behavior as interfaces before implementation (see `Chunker`, `Provider`, `Graph`, `Bus`)
  - Use registry pattern with priority ordering for pluggable components (chunkers, providers)
  - Implement `CanHandle()` for capability-based selection and `Priority()` for ordering
  - Use functional options pattern (`WithXXX` functions) for configurable constructors
  - Subscribe to events rather than modifying publishers when adding new functionality
  - New chunkers: implement interface, set priority, register in `DefaultRegistry()`
  - New providers: implement interface, call `RegisterSemantic()` or `RegisterEmbeddings()`
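
The `CanHandle()` and `Priority()` guidance above can be illustrated with a small registry. This is a sketch under stated assumptions: the simplified `Chunker` interface, the `extChunker` type, and the `Registry` type with `Register` and `Select` are hypothetical and are not the project's actual signatures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Chunker is a hypothetical, simplified interface; the project's real
// Chunker interface may differ in method names and signatures.
type Chunker interface {
	Name() string
	CanHandle(path string) bool
	Priority() int
}

// extChunker matches files by suffix. An empty suffix matches every
// path, which makes it usable as a catch-all fallback.
type extChunker struct {
	name     string
	ext      string
	priority int
}

func (c extChunker) Name() string               { return c.name }
func (c extChunker) CanHandle(path string) bool { return strings.HasSuffix(path, c.ext) }
func (c extChunker) Priority() int              { return c.priority }

// Registry keeps chunkers sorted so that higher priority is tried first.
type Registry struct {
	chunkers []Chunker
}

// Register adds c and re-sorts by descending priority.
func (r *Registry) Register(c Chunker) {
	r.chunkers = append(r.chunkers, c)
	sort.SliceStable(r.chunkers, func(i, j int) bool {
		return r.chunkers[i].Priority() > r.chunkers[j].Priority()
	})
}

// Select returns the highest-priority chunker that can handle path.
func (r *Registry) Select(path string) (Chunker, bool) {
	for _, c := range r.chunkers {
		if c.CanHandle(path) {
			return c, true
		}
	}
	return nil, false
}

func main() {
	r := &Registry{}
	r.Register(extChunker{name: "fallback", ext: "", priority: 0})
	r.Register(extChunker{name: "markdown", ext: ".md", priority: 50})
	r.Register(extChunker{name: "go", ext: ".go", priority: 100})
	for _, p := range []string{"main.go", "README.md", "data.bin"} {
		if c, ok := r.Select(p); ok {
			fmt.Printf("%s -> %s\n", p, c.Name())
		}
	}
}
```

Because selection depends only on the interface, adding a new format means registering one more implementation; callers of `Select` do not change.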
- Typed Configuration & User Input: Access configuration via typed structs, never string keys. CLI flags use variable-based storage with `{commandName}{FlagName}` naming. All user input is validated in `PreRunE` hooks before business logic executes.

  ```go
  // Typed config access
  cfg := config.Get()
  port := cfg.Daemon.HTTPPort
  pidFile := config.ExpandPath(cfg.Daemon.PIDFile)

  // Variable-based flag storage
  var rebuildForce bool

  func init() {
      RebuildCmd.Flags().BoolVar(&rebuildForce, "force", false, "Force rebuild")
  }

  // PreRunE validation pattern
  var MyCmd = &cobra.Command{
      PreRunE: validateMy,
      RunE:    runMy,
  }

  func validateMy(cmd *cobra.Command, args []string) error {
      // Validate input...
      cmd.SilenceUsage = true // Set AFTER validation passes
      return nil
  }
  ```
- Interface-First Design: Major subsystems define behavior through interfaces before implementation. `Graph`, `Walker`, `Watcher`, `Registry`, `Bus`, `Chunker`, and `Provider` are all interfaces with concrete implementations. This enables testing via mocks, component substitution, and clear contracts between packages.
- Functional Options Pattern: Constructors use `WithXXX` option functions for configuration:

  ```go
  q := analysis.NewQueue(bus,
      analysis.WithWorkerCount(4),
      analysis.WithLogger(slog.Default()),
  )
  ```
- Registry Pattern with Priority Selection: Pluggable components use centralized registries with priority ordering. When a component fails, the system falls through to lower-priority alternatives. Chunkers, handlers, and providers all use this pattern.
- Ordered Component Lifecycle: Components follow Initialize→Start→Stop lifecycle. The Orchestrator initializes in dependency order and shuts down in reverse order. This ensures pending work drains before dependencies close.
- Panic Recovery in Concurrent Code: Event handlers and worker goroutines recover from panics to prevent cascading failures. Panics are logged with context but don't crash the daemon.
- Error Handling: Use semicolons (not colons) when wrapping errors for cleaner CLI output:

  ```go
  return fmt.Errorf("failed to initialize config; %w", err) // Correct
  return fmt.Errorf("failed to initialize config: %w", err) // Incorrect
  ```
- Structured Logging: Use `log/slog` with key-value pairs. Add component context via `.With()`:

  ```go
  slog.Info("starting daemon", "http_port", cfg.Daemon.HTTPPort)
  logger := slog.Default().With("component", "graph")
  ```
- CLI Command Organization: Commands are organized by domain, with subcommands in nested directories:

  ```
  cmd/{parent}/
  ├── {parent}.go           # Parent command definition
  └── subcommands/
      ├── {subcommand}.go   # One file per subcommand
      └── helpers.go        # Shared utilities
  ```
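
The Ordered Component Lifecycle convention can be sketched as below. This is a minimal illustration, not the real Orchestrator: the `Component` interface, `NewOrchestrator`, and the `recorder` test double are hypothetical, and the choice to initialize every component before starting any of them is an assumption.

```go
package main

import "fmt"

// Component is a hypothetical lifecycle interface based on the
// Initialize→Start→Stop description; real signatures may differ.
type Component interface {
	Name() string
	Initialize() error
	Start() error
	Stop() error
}

// Orchestrator holds components in dependency order.
type Orchestrator struct {
	components []Component
	started    []Component
}

// NewOrchestrator takes components listed so that dependencies come first.
func NewOrchestrator(inOrder ...Component) *Orchestrator {
	return &Orchestrator{components: inOrder}
}

// Start initializes all components in order, then starts them in order.
func (o *Orchestrator) Start() error {
	for _, c := range o.components {
		if err := c.Initialize(); err != nil {
			return fmt.Errorf("failed to initialize %s; %w", c.Name(), err)
		}
	}
	for _, c := range o.components {
		if err := c.Start(); err != nil {
			return fmt.Errorf("failed to start %s; %w", c.Name(), err)
		}
		o.started = append(o.started, c)
	}
	return nil
}

// Stop shuts down started components in reverse order so dependents
// stop before the components they rely on. It keeps going after an
// error and returns the first one encountered.
func (o *Orchestrator) Stop() error {
	var first error
	for i := len(o.started) - 1; i >= 0; i-- {
		c := o.started[i]
		if err := c.Stop(); err != nil && first == nil {
			first = fmt.Errorf("failed to stop %s; %w", c.Name(), err)
		}
	}
	o.started = nil
	return first
}

// recorder is a test double that records the order of lifecycle calls.
type recorder struct {
	name  string
	calls *[]string
}

func (r recorder) Name() string      { return r.name }
func (r recorder) Initialize() error { *r.calls = append(*r.calls, "init "+r.name); return nil }
func (r recorder) Start() error      { *r.calls = append(*r.calls, "start "+r.name); return nil }
func (r recorder) Stop() error       { *r.calls = append(*r.calls, "stop "+r.name); return nil }

func main() {
	var calls []string
	o := NewOrchestrator(recorder{"graph", &calls}, recorder{"queue", &calls})
	if err := o.Start(); err != nil {
		fmt.Println(err)
		return
	}
	if err := o.Stop(); err != nil {
		fmt.Println(err)
	}
	fmt.Println(calls)
}
```

Here the queue depends on the graph, so the queue stops first and can drain pending work while the graph connection is still open.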
```
┌─────────────────────────────────────────────────────────────────────┐
│                          CLI Layer (Cobra)                          │
│  [version] [initialize] [daemon] [remember] [forget] [list] [read]  │
│  [integrations] [providers] [config]                                │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────────┐
│                             Daemon Core                             │
│      Component Lifecycle │ Health Manager │ HTTP Server (7600)      │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
      ┌────────────────────────┼────────────────────────┐
      │                        │                        │
┌─────▼─────┐  ┌───────────────▼───────────────┐  ┌─────▼───────┐
│  Watcher  │  │       Analysis Pipeline       │  │    Graph    │
│ (fsnotify)│  │  Queue → Workers → Handlers   │  │ (FalkorDB)  │
└─────┬─────┘  └───────────────┬───────────────┘  └─────────────┘
      │                        │
      │         ┌──────────────┼──────────────┐
      │         │              │              │
      │    ┌────▼────┐   ┌─────▼─────┐  ┌─────▼──────┐
      │    │Chunkers │   │ Semantic  │  │ Embeddings │
      │    │  (22)   │   │ Providers │  │ Providers  │
      │    └─────────┘   └───────────┘  └────────────┘
      │
      └──→ Event Bus ──→ Analysis Queue
```
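
The Watcher → Event Bus → Analysis Queue path at the bottom of the diagram can be illustrated with a minimal in-process publish/subscribe bus. This is a sketch under stated assumptions, not the project's `Bus` interface: the `Event` fields, the topic strings, and the synchronous delivery in `Publish` are hypothetical and chosen only to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical payload; the real event types live in
// internal/events and may look different.
type Event struct {
	Topic string
	Path  string
}

// Handler reacts to a published event.
type Handler func(Event)

// Bus is a minimal synchronous publish/subscribe bus. A production bus
// would likely deliver asynchronously; that is omitted here.
type Bus struct {
	mu       sync.RWMutex
	handlers map[string][]Handler
}

func NewBus() *Bus {
	return &Bus{handlers: make(map[string][]Handler)}
}

// Subscribe registers h for every event published on topic.
func (b *Bus) Subscribe(topic string, h Handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[topic] = append(b.handlers[topic], h)
}

// Publish delivers e to the handlers subscribed to e.Topic and returns
// how many handlers were called.
func (b *Bus) Publish(e Event) int {
	b.mu.RLock()
	hs := append([]Handler(nil), b.handlers[e.Topic]...) // copy so handlers run without the lock
	b.mu.RUnlock()
	for _, h := range hs {
		h(e)
	}
	return len(hs)
}

func main() {
	bus := NewBus()
	var queued, cleaned []string

	// The queue and the cleaner subscribe; the watcher only publishes.
	bus.Subscribe("file.changed", func(e Event) { queued = append(queued, e.Path) })
	bus.Subscribe("file.deleted", func(e Event) { cleaned = append(cleaned, e.Path) })

	bus.Publish(Event{Topic: "file.changed", Path: "notes.md"})
	bus.Publish(Event{Topic: "file.deleted", Path: "old.md"})
	fmt.Println(queued, cleaned)
}
```

The publisher never references the queue or the cleaner directly, so either subscriber can be replaced without touching the watcher.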
Data Flow:
- Watcher detects filesystem changes in remembered directories
- Events are coalesced and published to the event bus
- Analysis workers process files through chunkers and AI providers
- Results are persisted to the FalkorDB knowledge graph
- CLI, MCP server, and integrations query the graph
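
The coalescing step in the data flow above can be sketched as a pure function over a burst of events. This is a simplification under stated assumptions: the `Event` and `Op` types and the `Coalesce` function are hypothetical, and the time window a real watcher would use to delimit a burst is left out so the example stays deterministic.

```go
package main

import "fmt"

// Op is a hypothetical operation kind for the example.
type Op string

const (
	OpWrite  Op = "write"
	OpRemove Op = "remove"
)

// Event is a hypothetical filesystem event.
type Event struct {
	Path string
	Op   Op
}

// Coalesce collapses a burst of events into one event per path. It keeps
// the most recent operation for each path and preserves the order in
// which paths first appeared.
func Coalesce(burst []Event) []Event {
	index := make(map[string]int, len(burst))
	out := make([]Event, 0, len(burst))
	for _, e := range burst {
		if i, seen := index[e.Path]; seen {
			out[i] = e // later event for the same path wins
			continue
		}
		index[e.Path] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	burst := []Event{
		{Path: "a.md", Op: OpWrite},
		{Path: "b.md", Op: OpWrite},
		{Path: "a.md", Op: OpWrite},
		{Path: "b.md", Op: OpRemove},
	}
	for _, e := range Coalesce(burst) {
		fmt.Println(e.Path, e.Op)
	}
}
```

Four raw events become two published events, so an editor that saves a file several times in quick succession triggers one analysis run instead of several.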
Key External Dependencies:
- FalkorDB: Redis Graph for knowledge storage
- SQLite: Registry database for remembered paths
- AI Providers: Anthropic, OpenAI, Google, Voyage AI
No subsystem documentation exists yet at docs/subsystems/. Key internal packages:
Core Subsystems:
| Package | Purpose |
|---|---|
| `internal/daemon` | Daemon lifecycle, health monitoring, component orchestration |
| `internal/events` | Event bus, event type definitions, critical queue for dropped events |
| `internal/watcher` | Real-time filesystem monitoring with fsnotify and event coalescing |
| `internal/walker` | Directory traversal and file discovery for initial/incremental walks |
| `internal/analysis` | Work queue, workers, pipeline stages, and analysis coordination |
| `internal/chunkers` | 22 format-specific file chunkers with Tree-sitter code support |
| `internal/providers` | Semantic analysis and embeddings provider implementations |
| `internal/graph` | FalkorDB client, schema definitions, and graph operations |
| `internal/cleaner` | Removes stale cache and graph entries after walks complete |
| `internal/cache` | Caching layer for semantic and embeddings results |
Infrastructure:
| Package | Purpose |
|---|---|
| `internal/config` | Typed configuration with Viper, validation, and defaults |
| `internal/registry` | SQLite registry for remembered paths and file states |
| `internal/logging` | Structured logging setup and configuration |
| `internal/metrics` | Prometheus metrics collection and recording |
| `internal/container` | Dependency injection container for component bootstrapping |
Integration:
| Package | Purpose |
|---|---|
| `internal/mcp` | Model Context Protocol server implementation |
| `internal/integrations` | Hook, MCP, and plugin integrations for AI tools |
| `internal/export` | Data export functionality for CLI |
| `internal/daemonclient` | HTTP client for CLI-to-daemon communication |
Utilities:
| Package | Purpose |
|---|---|
| `internal/fsutil` | Common filesystem operations and helpers |
| `internal/cmdutil` | CLI command utilities and helpers |
| `internal/testutil` | Testing utilities with isolated environments |
| `internal/tui` | Terminal UI components for interactive commands |
| `internal/version` | Build version and metadata |
```
agentic-memorizer/
├── main.go                 # Entry point
├── cmd/                    # CLI commands (Cobra)
│   ├── root.go             # Root command with PersistentPreRunE
│   ├── daemon/             # Daemon subcommands
│   ├── remember/           # Path registration
│   └── ...                 # Other command groups
├── internal/               # Internal packages
│   ├── config/             # Configuration (types, defaults, validate, load)
│   ├── daemon/             # Daemon lifecycle management
│   ├── chunkers/           # File chunkers by format
│   │   └── code/           # Tree-sitter code chunkers
│   ├── providers/          # AI provider implementations
│   │   ├── semantic/       # Anthropic, OpenAI, Google
│   │   └── embeddings/     # OpenAI, Voyage, Google
│   ├── graph/              # FalkorDB client and models
│   └── ...                 # Other subsystems
├── testdata/               # Test fixtures organized by type
├── Makefile                # Build automation
└── config.yaml.example     # Example configuration
```
Framework: Go stdlib testing package only (no testify, ginkgo, etc.)
Table-Driven Tests: Standard pattern throughout:

```go
tests := []struct {
    name string
    // fields...
}{
    {"case 1", ...},
    {"case 2", ...},
}
for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        // test logic
    })
}
```

Test Utilities (`internal/testutil/`):
- `TestEnv` provides isolated test environments with temp directories
- Automatic cleanup via `t.Cleanup()`
- Environment variable isolation via `t.Setenv()`
Coverage: 90+ test files across all major subsystems including config, chunkers, graph, commands, and providers.
```bash
# Build the binary
make build

# Build and install to ~/.local/bin
make install

# Run all tests
make test

# Run tests with race detector
make test-race

# Run linter
make lint

# Run linter with auto-fix
make lint-fix

# Clean build artifacts
make clean
```

```bash
# Run interactive setup wizard
memorizer initialize

# Start the daemon (foreground)
memorizer daemon start

# Stop the daemon
memorizer daemon stop

# Check daemon status
memorizer daemon status

# Remember a directory
memorizer remember ~/projects/myapp

# List remembered directories
memorizer list

# Export knowledge graph
memorizer read --format json

# Setup an integration
memorizer integrations setup claude-code-mcp
```