# Roadmap

Meteor started as a metadata collector. It did one thing well: pull metadata from data infrastructure and push it to a catalog. That was the right tool for the catalog era, where the consumer was a human browsing a UI.

The consumer has changed. AI agents, copilots, and autonomous systems now need to understand an organization's data landscape to do useful work. They need structured context, not catalog pages. They need a graph they can traverse, not a search index they can query. Meteor is already most of the way there. The extractors, the plugin system, the asset model, the lineage support — all of that is foundational. What's missing is richer extraction and smarter graph construction before the data reaches its destination.

This document describes where Meteor goes next.

## The Shift

The metadata catalog era was about **collecting and displaying**. The AI era is about **collecting and grounding**. The difference matters because it changes what needs to be extracted, how entities are resolved, what relationships are captured, and how fresh the data needs to be.

Today, Meteor extracts assets and pushes them to sinks as flat records. Each asset carries some lineage, some ownership, some schema information. But there is no entity resolution across sources. There is no cross-source relationship inference. The extraction model is batch-only, and the relationship vocabulary is limited to upstream/downstream lineage.

Meteor's job is to produce the richest, most connected, most current representation of an organization's metadata — and deliver it to Compass, which stores, indexes, and serves it. Meteor owns the supply side: collection, resolution, and graph construction. Everything that happens before the data reaches the store.

## What Changes

### From Records to Graph

Meteor currently treats each extracted asset as an independent record flowing through a pipeline. The new model treats extraction as contributing nodes and edges to a unified graph.

This means:

- **Entity resolution across sources.** The same logical table appears in BigQuery, dbt, Tableau, and Airflow. Today, Meteor emits four separate assets. It should recognize these as one entity with four facets and merge them before delivery. Compass stores resolved entities, but Meteor is the one that sees all sources and does the resolution.
- **Richer relationship types.** Lineage (upstream/downstream) is one relationship. Ownership, read-access, produced-by, documented-in, tested-by, derived-from — these all matter for AI reasoning. Extractors should capture these relationships at the source, and the asset model should carry them.
- **Temporal awareness.** Meteor should track what changed between extraction runs. Schema evolution, ownership transfers, new assets, removed assets. Delivering deltas rather than full snapshots makes the downstream graph fresher and the pipeline more efficient.

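The resolution step described above can be sketched in Go. Everything here is illustrative: the `Asset` fields, the `ResolvedEntity` shape, and the lowercase-name resolution rule are assumptions, not Meteor's actual model. A real resolver would use configurable rules (name normalization, lineage hints, explicit mappings).

```go
package main

import (
	"fmt"
	"strings"
)

// Asset is a simplified stand-in for an extracted asset record.
type Asset struct {
	Source string // e.g. "bigquery", "dbt", "tableau"
	URN    string // source-scoped URN
	Name   string // logical name, e.g. "shop.orders"
}

// ResolvedEntity is one logical entity with one facet per source.
type ResolvedEntity struct {
	Key    string           // resolution key shared by all facets
	Facets map[string]Asset // source -> the asset seen in that source
}

// resolutionKey derives a cross-source identity. Here we simply
// lowercase the logical name; this is the illustrative assumption.
func resolutionKey(a Asset) string {
	return strings.ToLower(a.Name)
}

// Resolve groups assets from different sources into resolved entities.
func Resolve(assets []Asset) map[string]*ResolvedEntity {
	entities := make(map[string]*ResolvedEntity)
	for _, a := range assets {
		key := resolutionKey(a)
		e, ok := entities[key]
		if !ok {
			e = &ResolvedEntity{Key: key, Facets: map[string]Asset{}}
			entities[key] = e
		}
		e.Facets[a.Source] = a
	}
	return entities
}

func main() {
	assets := []Asset{
		{Source: "bigquery", URN: "urn:bigquery:orders", Name: "Shop.Orders"},
		{Source: "dbt", URN: "urn:dbt:orders", Name: "shop.orders"},
		{Source: "tableau", URN: "urn:tableau:orders", Name: "shop.orders"},
	}
	resolved := Resolve(assets)
	fmt.Println(len(resolved), len(resolved["shop.orders"].Facets)) // 1 3
}
```

The key design point is that resolution happens before delivery: the sink receives one entity with three facets, not three unrelated records.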
### From Data-Infra Only to Full Organizational Context

Meteor's current extractors cover databases, warehouses, BI tools, message queues, orchestrators, and cloud storage. That is the data infrastructure layer. For AI context to be genuinely useful, coverage needs to expand well beyond it.

- **Code and version control.** Repositories, pull requests, CI/CD pipelines. An AI agent debugging a data issue needs to know what code produces a table, when it last changed, and what the deploy pipeline looks like.
- **Documentation.** Confluence, Notion, internal wikis. The business context that explains why a table exists or what a metric means often lives outside the data stack.
- **API schemas.** OpenAPI specs, gRPC definitions. AI agents building integrations need to know what endpoints exist and what they accept.
- **Infrastructure topology.** Service dependencies, Kubernetes deployments, cloud resource relationships. The operational context that connects data assets to the systems that serve them.
- **Incidents and operations.** PagerDuty, OpsGenie, on-call schedules. When an AI agent is investigating a data quality issue, knowing that the upstream service had an incident last night is essential context.
- **Access and permissions.** Who can access what. AI agents need to respect and communicate boundaries.

Each new source type makes the context graph more complete. The extractor plugin system is Meteor's core strength — this is where it should invest most aggressively.

### From Batch Pull to Incremental and Event-Driven

Meteor runs on a batch schedule: execute a recipe, extract everything, push to sinks. For a context graph powering real-time AI agents, this model has limits.

- **Change detection.** Extractors should track watermarks and emit only what changed since the last run. This reduces load on sources, shrinks payloads, and enables faster refresh cycles.
- **Event receivers.** A lightweight webhook/event ingestion layer that can receive notifications — schema change deployed, incident opened, ownership transferred — and convert them into graph updates without a full extraction cycle. Not every source needs to be polled.
- **Incremental delivery.** Meteor should deliver deltas to sinks rather than full snapshots. When it detects that only 3 out of 500 tables changed, it should deliver 3 updates, not 500.

### From Flat Delivery to Rich Graph Delivery

Today, Meteor pushes individual asset records to sinks. Each record is self-contained. The sink doesn't get the big picture — it gets one asset at a time.

For Compass to build a proper graph, Meteor should deliver richer payloads:

- **Resolved entities.** When Meteor determines that assets from different sources are the same logical entity, it should communicate that resolution, not leave it to the sink to figure out.
- **Typed relationships.** Beyond upstream/downstream edges, Meteor should deliver the full set of relationships it discovered during extraction — ownership, read/write patterns, documentation links, test coverage.
- **Cross-source edges.** Some relationships only become visible when you see multiple sources together. A BigQuery table is produced by an Airflow job and consumed by a Tableau dashboard. No single extractor sees this full chain. Meteor, seeing all sources, can infer and deliver these cross-source edges.

## Architecture Direction

The pipeline model stays. Extract, process, deliver. But the internals evolve:

```
Sources (extractors + event receivers)
    │
    ▼
Graph Builder
    - entity resolution
    - cross-source edge inference
    - deduplication
    - change detection
    │
    ▼
Sinks
    - Compass (primary: rich graph delivery)
    - Kafka, GCS, HTTP (secondary: streaming, storage)
    - traditional sinks (backward compatible)
```

**Graph Builder** replaces the current flat record stream as the core processing layer. It sees assets from across sources, resolves entities, infers cross-source relationships, deduplicates, and detects changes. This is the new center of Meteor — the intelligence layer between extraction and delivery.

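The Graph Builder's contract could be as small as two methods: accumulate assets across sources, then emit a resolved graph. This is a minimal sketch; the interface, method names, and the trivial in-memory implementation are assumptions, not an actual Meteor API.

```go
package main

import "fmt"

// Graph is a placeholder result type for the built graph.
type Graph struct {
	Nodes int
	Edges int
}

// GraphBuilder sketches the processing layer: it ingests assets from
// many sources, then resolves, infers edges, and dedupes in Build.
type GraphBuilder interface {
	Add(sourceID string, asset any) error
	Build() (Graph, error)
}

// memBuilder is a trivial in-memory implementation for illustration;
// a real builder would do entity resolution and edge inference here.
type memBuilder struct {
	assets int
}

func (b *memBuilder) Add(sourceID string, asset any) error {
	b.assets++
	return nil
}

func (b *memBuilder) Build() (Graph, error) {
	// Placeholder: one node per asset, no inferred edges.
	return Graph{Nodes: b.assets}, nil
}

func main() {
	var b GraphBuilder = &memBuilder{}
	b.Add("bigquery", "orders")
	b.Add("dbt", "orders")
	g, _ := b.Build()
	fmt.Println(g.Nodes) // 2
}
```

The important property is that `Build` runs after assets from multiple sources have accumulated, which is what breaks today's per-recipe isolation.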
**Event Receivers** complement extractors. Instead of only pulling metadata on a schedule, Meteor can also receive events and convert them into graph updates. A webhook from GitHub on a schema migration merge. A notification from PagerDuty when an incident opens. These become nodes and edges in the graph without waiting for the next batch run.

**Compass as the primary sink.** While Meteor retains its multi-sink architecture, Compass becomes the primary destination. The delivery protocol between Meteor and Compass should evolve from flat asset upserts to rich graph payloads — resolved entities, typed relationships, and change deltas.

## Priorities

Not all of this happens at once. The ordering reflects what delivers value fastest with the least disruption.

**First: Strengthen the graph model.** Enrich the relationship model beyond lineage. Add entity resolution across sources. Deliver resolved entities and typed relationships to Compass. Everything else builds on this, and it can be done without breaking existing recipes.

**Second: Expand coverage.** Add extractors for code repositories, documentation systems, API schemas, and infrastructure topology. Each new source makes the context graph more complete and more valuable. This is where Meteor's plugin architecture pays off.

**Third: Go incremental.** Add change detection to existing extractors. Build the event receiver framework. Support delta delivery to sinks. This makes the graph fresher without increasing load.

**Fourth: Cross-source intelligence.** Build the graph builder layer that sees assets from multiple sources together, infers cross-source edges, and detects patterns no single extractor can see. This is the hardest piece — it requires Meteor to move beyond per-recipe isolation.

## What Stays the Same

- **Single binary, no dependencies.** Meteor's operational simplicity is a feature.
- **Plugin architecture.** New extractors, processors, and sinks should still be easy to build and register.
- **Recipe-based configuration.** Recipes continue to work for defining extraction jobs. New modes (auto-discovery, event-driven) complement recipes; they don't replace them.
- **Existing extractors and sinks.** Everything that works today keeps working. The graph builder is an additive layer, not a replacement.

## Sink Strategy

Meteor currently ships nine sinks. With Compass as the primary graph store and serving layer, most of the sink surface becomes unnecessary. The investment goes into making the Compass sink richer, not into adding more destinations.

**Compass** is the primary sink. The delivery protocol evolves from flat asset upserts to rich graph payloads — resolved entities, typed relationships, change deltas, cross-source edges. This is where most of the sink investment goes. The sophistication is in what Meteor delivers to Compass, not in how many places it can deliver to.

**Kafka** stays for event streaming. Other systems need to react to metadata changes in real time — a governance tool notified when a new PII column appears, a cost system alerted when a new dataset is created, an observability platform tracking lineage shifts. These consumers don't want to poll Compass. They want events on a bus. Kafka becomes more valuable as Meteor moves to incremental extraction — instead of dumping the full catalog every run, Meteor publishes change events.

**Object storage** stays for archival and compliance. Raw metadata snapshots for audit trails, regulatory reviews, or historical analysis. "Show me what the schema looked like six months ago" or "prove we tracked PII lineage for this period." Compass keeps current state and some version history, but long-term archival is a different concern. The current GCS sink should generalize to support S3 and Azure Blob as well.

**HTTP** stays as the generic escape hatch. Someone always has a use case you didn't anticipate — a custom internal system, a third-party tool, a one-off migration. Rather than building a dedicated sink for every edge case, a configurable HTTP sink covers it.

**Console and File** stay for development and debugging. Zero cost to maintain, essential for local development and testing recipes.

**Frontier and Stencil** should be retired or frozen. They are tightly coupled to specific Raystack services and serve niche use cases that the HTTP sink can handle. Maintaining dedicated sinks for them is not justified going forward.

The principle is simple: Compass is the graph store, Kafka is the event bus, object storage is the archive, HTTP is the escape hatch. Everything else is either Compass's responsibility or too niche for Meteor to own.

## What Gets Simplified

- **Per-source isolation.** Today each recipe runs one source in isolation. For entity resolution and cross-source relationships, Meteor needs a mode where it can see assets from multiple sources together.
- **Manual enrichment.** Much of what processors do today — adding labels, enriching fields — should eventually be inferrable from the graph itself. The script processor remains for custom logic, but the common cases should be automatic.

## What Doesn't Belong in Meteor

Meteor is the collection and graph construction layer. It does not own persistence, querying, or serving. Specifically:

- **MCP server, context composition, and AI serving** belong in Compass. Compass is the always-on service with the full persisted graph. It is the natural interface for AI agents to query.
- **Semantic search and embeddings** belong in Compass. Indexing and retrieval are query-side concerns, not collection-side.
- **Change feeds and subscriptions** belong in Compass. Consumers subscribe to the store, not the pipeline.
- **Usage tracking, quality scoring, and impact analysis** belong in Compass. These require the full graph state and query history that only the persistent store has.

Meteor's job is to make Compass's graph as rich, connected, and current as possible. Compass's job is to make that graph useful. The boundary is delivery.

## The Bet

Every team building AI agents will need a way to ground those agents in organizational context. The metadata is scattered across dozens of systems. Someone needs to collect it, connect it, and deliver it in a form that a graph store can serve.

Meteor already knows how to collect. The next step is to connect — richer extraction, entity resolution, cross-source relationships, and incremental delivery. That is the roadmap.