Skip to content

Latest commit

 

History

History
28 lines (20 loc) · 3.66 KB

File metadata and controls

28 lines (20 loc) · 3.66 KB

FastMemory Enterprise Architecture

Overview

FastMemory is architected to integrate seamlessly into complex, high-throughput enterprise data ecosystems. While it natively clusters Markdown-based Atomic Text Functions (ATFs) via rust-louvain, in an enterprise environment where data is distributed across Data Warehouses, Data Lakes, and specialized analytics platforms (e.g., Databricks, Microsoft Fabric, AWS Glue), FastMemory acts as an ontological orchestrator and agentic query engine bridging structured pipelines and autonomous AI logic.

Integration Patterns

1. Data Warehouses (Snowflake, BigQuery, Redshift)

  • ETL to ATFs: Structured relational data and unstructured text streams reside in data warehouses. Use standard ETL/ELT pipelines (e.g., dbt, Airflow) to extract relevant schemas, logs, or reports and systematically format them into FastMemory's ATF Markdown syntax (Data_Connections, Events, Access, Action).
  • Micro-Batching: Run fastmemory build on micro-batches of generated ATFs to continuously update the output.json cluster maps representing the exact state of the warehouse.

2. Data Lakes & Databricks (Delta Lake)

  • PySpark Parsing: Within Databricks or Delta Lake environments, utilize PySpark notebooks to run distributed NLP processing (similar to our build_huggingface_examples.py pipeline) over petabytes of raw text. Extract entities, roles, and events, then serialize them into FastMemory ATFs in bulk.
  • Vector Store Synergy (Hybrid RAG): FastMemory is not a replacement for vector DBs (e.g., Pinecone, Milvus), but a Structural Reasoning Layer on top. Vector databases handle nearest-neighbor fuzzy matching, while FastMemory's Topology clusters provide the logical, rule-based contexts (Components and Blocks) required for deterministic AI orchestration.

3. AWS Ecosystem (Glue, S3, Athena)

  • AWS Glue Crawlers & Jobs: Configure AWS Glue Python Shell or Spark jobs to crawl raw S3 buckets, parse documents, and trigger the FastMemory binary as a containerized ECS task.
  • S3 Persistence: Store the generated output.json Topology graphs back into Amazon S3. The FastMemory HTTP API (fastmemory serve) can be wrapped inside an AWS AppRunner or ECS Fargate service, securely serving queries to downstream applications.

4. Microsoft Fabric & OneLake

  • OneLake Centralization: Microsoft Fabric's OneLake can serve as the unified storage layer for both the raw ATF inputs and the clustered JSON outputs.
  • Copilot Agent Integration: FastMemory’s Model Context Protocol (fastmemory mcp) is the ideal interface for Microsoft Fabric's Copilots. By pointing Copilot to the MCP stdio stream, the AI can independently execute query_memory and get_block tools across the Fabric ecosystem securely.

Enterprise Security & Compliance

  • Access Mappings (The A_ Node): Enterprise environments demand strict Role-Based Access Control (RBAC). FastMemory physically embeds access limits by mapping IAM roles (e.g., AWS_Role_Finance, AzureAD_Group_Admin) directly into the graph as A_ (Access) nodes. Any AI parsing the block intrinsically sees the security boundaries attached to the functions.
  • Audit Triggers (The E_ Node): Map enterprise audit logs, webhook triggers, and Airflow DAG success states to Event (E_ nodes). This produces a highly traceable, event-driven memory graph where every function is visibly linked to its instigating enterprise trigger.
  • Isolated Domain Clusters: For large-scale multi-tenant architectures, avoid monolithic graphs. Run segmented fastmemory serve instances per domain (e.g., HR, Engineering, Operations) to prevent cross-contamination and bound memory usage.