Commit ca7d89e

docs: rewrite documentation to reflect context engine vision and current architecture
Update all documentation to align with Compass as a context engine building a knowledge graph of organizational metadata. Remove all Elasticsearch references (fully removed from codebase), update CLI reference to reflect entity-based commands, rewrite API reference for Connect RPC endpoints and MCP server, and fix stale versions and configuration keys.
1 parent a1d1776 commit ca7d89e

14 files changed

+340
-2518
lines changed

README.md

Lines changed: 17 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -6,20 +6,21 @@
66
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?logo=apache)](LICENSE)
77
[![Version](https://img.shields.io/github/v/release/raystack/compass?logo=semantic-release)](Version)
88

9-
Compass is a search and discovery engine built for querying application deployments, datasets and meta resources. It can also optionally track data flow relationships between these resources and allow the user to view a representation of the data flow graph.
9+
Compass is a context engine that builds a knowledge graph of your organization's metadata, capturing entities, relationships, and lineage across systems and time, making it discoverable and queryable for both humans and AI agents.
10+
11+
Critical organizational knowledge lives scattered across dozens of systems: services, datasets, applications, teams, configurations, decisions, and the relationships between them. Compass resolves observations from these sources into unified entities, constructs a temporal graph of their relationships, and indexes everything for both keyword and semantic search. The result is a context graph that stitches together what exists, who owns it, how it connects, and what changed over time, so both humans and AI agents can discover, traverse, and reason over the full picture.
1012

1113
<p align="center"><img src="./docs/static/assets/overview.svg" /></p>
1214

1315
## Key Features
1416

15-
Discover why users choose Compass as their main data discovery and lineage service
16-
17-
- **Full text search** Faster and better search results powered by ElasticSearch full text search capability.
18-
- **Search Tuning** Narrow down your search results by adding filters, getting your crisp results.
19-
- **Data Lineage** Understand the relationship between metadata with data lineage interface.
20-
- **Scale:** Compass scales in an instant, both vertically and horizontally for high performance.
21-
- **Extensibility:** Add your own metadata types and resources to support wide variety of metadata.
22-
- **Runtime:** Compass can run inside VMs or containers in a fully managed runtime environment like kubernetes.
17+
- **Entity Resolution:** Resolve and deduplicate metadata observations from multiple sources into unified entities with stable identity.
18+
- **Knowledge Graph:** Store typed, directed relationships between entities including lineage, ownership, documentation, and custom edge types.
19+
- **Hybrid Search:** Combine keyword precision with semantic similarity using Postgres-native full-text search and pgvector embeddings.
20+
- **Graph Traversal:** Multi-hop traversal queries across the entity graph for impact analysis, dependency tracking, and path discovery.
21+
- **Context Composition:** Assemble schema, lineage, ownership, and quality signals into context documents ready for LLM consumption.
22+
- **AI Serving:** Expose the full graph as an MCP server so AI agents can discover, traverse, and reason over organizational knowledge.
23+
- **Extensibility:** Open type system for entities and relationships to support any kind of metadata across your infrastructure.
2324

2425
## Documentation
2526

@@ -95,13 +96,11 @@ alias compass="docker run -e HOME=/tmp -v $HOME/.config/raystack:/tmp/.config/ra
9596

9697
## Usage
9798

98-
Compass is purely API-driven. It is very easy to get started with Compass. It provides CLI and HTTP APIs for simpler developer experience.
99+
Compass provides a CLI, Connect RPC API (HTTP + gRPC), and an MCP server for AI agents.
99100

100101
#### CLI
101102

102-
Compass CLI is fully featured and simple to use, even for those who have very limited experience working from the command line. Run `compass --help` to see list of all available commands and instructions to use.
103-
104-
List of commands
103+
Compass CLI is fully featured and simple to use. Run `compass --help` to see all available commands.
105104

106105
```
107106
compass --help
@@ -115,7 +114,11 @@ compass reference
115114

116115
#### API
117116

118-
Compass provides a fully-featured HTTP API to interact with Compass server. The API is built with [Connect RPC](https://connectrpc.com/) and supports both Connect and gRPC protocols. Please refer to [proton](https://github.com/raystack/proton/tree/main/raystack/compass/v1beta1) for API definitions.
117+
Compass provides a Connect RPC API that supports both Connect (HTTP) and gRPC protocols. Please refer to [proton](https://github.com/raystack/proton/tree/main/raystack/compass/v1beta1) for API definitions.
118+
119+
#### MCP Server
120+
121+
Compass exposes an MCP server at `/mcp` for AI agent integration. MCP-compatible systems can connect and use tools like `search_entities`, `get_context`, and `impact`.
119122

120123
## Contribute
121124

docs/docs/concepts/architecture.md

Lines changed: 31 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,47 @@
11
# Architecture
22

3-
Compass' architecture is pretty simple. It has a client-server architecture backed by PostgreSQL as a main storage and Elasticsearch as a secondary storage and provides HTTP & gRPC interface to interact with.
3+
Compass is built as a context engine with a layered architecture. Raw metadata observations flow in, get resolved into unified entities, are stored in a temporal knowledge graph, indexed for hybrid search, and served to both human interfaces and AI agents.
44

55
![Compass Architecture](/assets/architecture.png)
66

77
## System Design
88

99
### Components
1010

11-
#### gRPC Server
11+
#### Entity Resolver
1212

13-
- gRPC server is the main interface to interact with Compass.
14-
- The protobuf file to define the interface is centralized in [raystack/proton](https://github.com/raystack/proton/tree/main/raystack/compass/v1beta1)
13+
Incoming metadata observations from collection systems like Meteor are resolved against the existing graph. The resolver deduplicates, merges facets from multiple sources, and maintains stable entity identity. The same logical entity appearing across different systems is recognized and unified.
1514

16-
#### gRPC-gateway Server
15+
#### Graph Store (PostgreSQL)
1716

18-
- gRPC-gateway server transcodes HTTP call to gRPC call and allows client to interact with Compass using RESTful HTTP request.
17+
PostgreSQL is the primary store for the knowledge graph. Entities, typed directed edges, and temporal metadata (valid_from/valid_to) are stored relationally. Recursive CTE queries power multi-hop graph traversal for impact analysis and dependency tracking. Row Level Security enforces multi-tenant isolation at the database level.
1918

20-
#### PostgreSQL
19+
#### Vector Index (pgvector)
2120

22-
- Compass uses PostgreSQL as it is main storage for storing all of its metadata.
21+
Semantic search is powered by pgvector embeddings stored alongside entities. When an entity is created or updated, Compass generates vector embeddings of its semantic content. This enables similarity-based discovery where keyword search falls short.
2322

24-
#### Elasticsearch
23+
#### Search Engine
2524

26-
- Compass uses Elasticsearch as it is secondary storage to power search of metadata.
25+
Compass supports hybrid search combining multiple strategies:
26+
27+
- **Keyword search:** Postgres tsvector full-text search with weighted fields (URN and name weighted highest, descriptions next, source metadata lowest).
28+
- **Fuzzy matching:** pg_trgm trigram indexes for typo-tolerant and partial matching.
29+
- **Semantic search:** pgvector cosine similarity for conceptual matching.
30+
- **Hybrid ranking:** Reciprocal Rank Fusion combines results from keyword and semantic search into a single ranked list.
31+
32+
All search is Postgres-native. There are no external search engine dependencies.
33+
34+
#### Query Engine
35+
36+
The query engine orchestrates graph traversal, hybrid search, and context composition. It handles:
37+
38+
- Multi-hop lineage and dependency traversal
39+
- Impact analysis (what breaks if this changes)
40+
- Context assembly (composing schema, lineage, ownership, and quality signals into a single response)
41+
42+
#### Serving Layer
43+
44+
Compass exposes its capabilities through multiple interfaces:
45+
46+
- **Connect RPC:** The primary API interface, supporting both Connect (HTTP) and gRPC protocols. API definitions are maintained in [raystack/proton](https://github.com/raystack/proton/tree/main/raystack/compass/v1beta1).
47+
- **MCP Server:** Model Context Protocol interface for AI agents. Any MCP-compatible system can connect and use tools like search, lineage traversal, and context assembly.

docs/docs/concepts/internals.md

Lines changed: 43 additions & 131 deletions
Original file line numberDiff line numberDiff line change
@@ -1,152 +1,64 @@
11
# Internals
22

3-
This document details information about how Compass interfaces with elasticsearch. It is meant to give an overview of how some concepts work internally, to help streamline understanding of how things work under the hood.
4-
5-
## Index Setup
6-
7-
There is a migration command in compass to setup all storages. The indices are configured with a camel case tokenizer, to support proper lexing of some resources that use camel case in their nomenclature \(protobuf names for instance\). Given below is a sample of the index settings that are used:
8-
9-
```javascript
10-
// PUT http://${ES_HOST}/{index}
11-
{
12-
"mappings": {}, // used for boost
13-
"aliases": { // all indices are aliased to the "universe" index
14-
"universe": {}
15-
},
16-
"settings": { // configuration for handling camel case text
17-
"analysis": {
18-
"analyzer": {
19-
"default": {
20-
"type": "pattern",
21-
"pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
22-
}
23-
}
24-
}
25-
}
26-
}
27-
```
3+
This document details how Compass works under the hood. It covers the search architecture, storage internals, and multi-tenancy model.
284

29-
One shared index is created for all services and tenants but each request(read/write) is routed to a unique shard for each tenant. Compass categorize tenants into two tires, `shared` and `dedicated`. For shared tenants, all the requests will be routed by namespace id over a single shard in an index. For dedicated tenants, each tenant will have its own index. Note, a single index will have N number of `types` same as the number of `Services` supported in Compass. This design will ensure, all the document insert/query requests are only confined to a single shard(in case of shared) or a single index(in case of dedicated).
30-
Details on why we did this is available at [issue #208](https://github.com/raystack/compass/issues/208).
5+
## Search Architecture
316

32-
## Postgres
7+
All search in Compass is Postgres-native, combining keyword, fuzzy, and semantic strategies with no external search engine dependencies.
338

34-
To enforce multi-tenant restrictions at the database level, [Row Level Security](https://www.postgresql.org/docs/current/ddl-rowsecurity.html) is used. RLS requires Postgres users used for application database connection not to be a table owner or a superuser else all RLS are bypassed by default. That means a Postgres user that is migrating the application and a user that is used to serve the app should both be different.
9+
### Postgres-Native Search
3510

36-
To create a postgres user
11+
#### Full-Text Search (tsvector)
3712

38-
```sql
39-
CREATE USER "compass_user" WITH PASSWORD 'compass';
40-
GRANT CONNECT ON DATABASE "compass" TO "compass_user";
41-
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO "compass_user";
42-
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO "compass_user";
43-
GRANT ALL ON ALL FUNCTIONS IN SCHEMA public TO "compass_user";
13+
Entities are indexed using PostgreSQL's built-in full-text search. A `search_vector` generated column is maintained on the entities table with weighted fields:
4414

45-
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES
46-
ON TABLES TO "compass_user";
47-
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT USAGE ON SEQUENCES TO "compass_user";
48-
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT EXECUTE ON FUNCTIONS TO "compass_user";
49-
```
15+
- **Weight A:** URN and name (highest relevance)
16+
- **Weight B:** Description
17+
- **Weight C:** Source and service metadata
5018

51-
A middleware looks for `x-namespace` header to extract tenant id if not found falls back to `default` namespace.
52-
Same could be passed in a `jwt token` of Authentication Bearer with `namespace_id` as a claim.
19+
GIN indexes on the search vector enable fast full-text queries.
5320

54-
## Search
21+
#### Fuzzy Matching (pg_trgm)
5522

56-
We use elasticsearch's `multi_match` search for running our queries. Depending on whether there are additional filter's specified during search, we augment the query with a custom script query that filter's the result set.
23+
Trigram indexes powered by the `pg_trgm` extension support typo-tolerant and partial matching. This handles cases where users misspell entity names or search with partial terms.
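A rough illustration of why trigram matching tolerates typos, written in plain Python rather than SQL. It approximates the way `pg_trgm` extracts trigrams (lowercase, split on non-alphanumerics, pad each word with two leading spaces and one trailing space) and scores similarity as shared trigrams over total distinct trigrams. The function names and sample entity names are hypothetical, and this is a sketch of the idea, not Compass's query code.

```python
import re


def trigrams(text):
    """Return the set of padded, lowercased trigrams for each word in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A misspelled query still overlaps with the intended name more than
# with an unrelated one.
typo_score = similarity("ordres", "orders")
unrelated_score = similarity("customers", "orders")
```

A misspelling keeps most of a word's trigrams intact, so it scores higher against the intended name than an unrelated name does.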
5724

58-
The script filter is designed to match a document if:
25+
#### Semantic Search (pgvector)
5926

60-
- the document contains the filter key and it's value matches the filter value OR
61-
- the document doesn't contain the filter key at all
27+
Vector embeddings are stored in a chunks table and indexed for cosine similarity search using pgvector. When an entity is created or updated, its semantic content (description, properties, labels) is embedded and stored. Semantic search finds conceptually related entities even when the exact terms don't overlap.
6228

63-
To demonstrate, the following API call:
29+
#### Hybrid Ranking
6430

65-
```text
66-
$ curl http://localhost:8080/v1beta1/search?text=log&filter[landscape]=id
67-
```
31+
Results from keyword and semantic search are combined using Reciprocal Rank Fusion (RRF). This produces a single ranked list that balances keyword precision with semantic recall.
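As a minimal sketch of Reciprocal Rank Fusion, the snippet below merges two ranked lists by summing `1 / (k + rank)` for each item across the lists. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here; the diff does not state which constant or tie-breaking rule Compass uses. The URNs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from the keyword and semantic searches.
keyword_hits = ["urn:orders", "urn:order_items", "urn:customers"]
semantic_hits = ["urn:purchases", "urn:orders", "urn:customers"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Items that appear in both lists accumulate two contributions, so they tend to outrank items that only one strategy found, without requiring the two score scales to be comparable.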
6832

69-
is internally translated to the following elasticsearch query
70-
71-
```javascript
72-
{
73-
"query": {
74-
"bool": {
75-
"must": {
76-
"multi_match": {
77-
"query": "log"
78-
}
79-
},
80-
"filter": [{
81-
"script": {
82-
"script": {
83-
"source": "doc.containsKey(\"landscape.keyword\") == false || doc[\"landscape.keyword\"].value == \"id\""
84-
}
85-
}
86-
}]
87-
}
88-
}
89-
}
90-
```
33+
## Entity Storage
9134

92-
Compass also supports filter with fuzzy match with `query` query params. The script query is designed to match a document if:
35+
### Temporal Model
9336

94-
- the document contains the filter key and it's value is fuzzily matches the `query` value
37+
Entities in Compass are temporal. Each entity version carries `valid_from` and `valid_to` timestamps, allowing Compass to track how entities and their properties evolve over time. This supports queries like "what did this entity look like last week" and "what changed in the last 24 hours."
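An "as of" lookup over `valid_from` / `valid_to` can be sketched as follows. The example uses an in-memory SQLite database so it runs anywhere. The table name `entity_versions`, its columns other than `valid_from` and `valid_to`, and the convention that the current version has a NULL `valid_to` are all assumptions for illustration, not Compass's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_versions ("
    "urn TEXT, description TEXT, valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO entity_versions VALUES (?, ?, ?, ?)",
    [
        ("urn:orders", "Raw orders", "2024-01-01T00:00:00Z", "2024-03-01T00:00:00Z"),
        # Assumed convention: NULL valid_to marks the current version.
        ("urn:orders", "Orders, deduplicated", "2024-03-01T00:00:00Z", None),
    ],
)


def as_of(conn, urn, ts):
    """Return the description that was valid at timestamp ts, or None."""
    row = conn.execute(
        "SELECT description FROM entity_versions "
        "WHERE urn = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (urn, ts, ts),
    ).fetchone()
    return row[0] if row else None
```

Treating each version as a half-open interval (`valid_from` inclusive, `valid_to` exclusive) keeps adjacent versions from overlapping at the boundary timestamp.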
9538

96-
```text
97-
$ curl http://localhost:8080/v1beta1/search?text=log&filter[landscape]=id
98-
```
39+
### Graph Edges
40+
41+
Relationships between entities are stored as typed, directed edges. Each edge has a type (lineage, ownership, documentation, etc.) and optional properties. Edges are also temporal, capturing when relationships were established and when they ended.
9942

100-
is internally translated to the following elasticsearch query
101-
102-
```javascript
103-
{
104-
"query":{
105-
"bool":{
106-
"filter":{
107-
"match":{
108-
"description":{
109-
"fuzziness":"AUTO",
110-
"query":"test"
111-
}
112-
}
113-
},
114-
"should":{
115-
"bool":{
116-
"should":[
117-
{
118-
"multi_match":{
119-
"fields":[
120-
"urn^10",
121-
"name^5"
122-
],
123-
"query":"log"
124-
}
125-
},
126-
{
127-
"multi_match":{
128-
"fields":[
129-
"urn^10",
130-
"name^5"
131-
],
132-
"fuzziness":"AUTO",
133-
"query":"log"
134-
}
135-
},
136-
{
137-
"multi_match":{
138-
"fields":[
139-
140-
],
141-
"fuzziness":"AUTO",
142-
"query":"log"
143-
}
144-
}
145-
]
146-
}
147-
}
148-
}
149-
},
150-
"min_score":0.01
151-
}
43+
Graph traversal uses recursive Common Table Expressions (CTEs) in PostgreSQL, enabling multi-hop queries without external graph database dependencies.
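The recursive CTE approach can be sketched with a small downstream-lineage query. It runs against in-memory SQLite for portability; the `WITH RECURSIVE` form is also valid in PostgreSQL apart from the parameter placeholders. The `edges` table, its column names, and the sample URNs are hypothetical and do not reflect Compass's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source_urn TEXT, target_urn TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("urn:raw_orders", "urn:orders", "lineage"),
        ("urn:orders", "urn:daily_revenue", "lineage"),
        ("urn:daily_revenue", "urn:revenue_dashboard", "lineage"),
        ("urn:orders", "urn:team_sales", "ownership"),
    ],
)

DOWNSTREAM_SQL = """
WITH RECURSIVE downstream(urn, depth) AS (
    -- Seed: direct lineage targets of the starting entity.
    SELECT target_urn, 1 FROM edges
    WHERE source_urn = :start AND type = 'lineage'
    UNION
    -- Step: follow lineage edges one more hop, up to max_depth.
    SELECT e.target_urn, d.depth + 1
    FROM edges e JOIN downstream d ON e.source_urn = d.urn
    WHERE e.type = 'lineage' AND d.depth < :max_depth
)
SELECT urn, MIN(depth) FROM downstream GROUP BY urn ORDER BY MIN(depth), urn
"""


def downstream(conn, start, max_depth=5):
    """Return (urn, hops) for every entity reachable via lineage edges."""
    params = {"start": start, "max_depth": max_depth}
    return conn.execute(DOWNSTREAM_SQL, params).fetchall()
```

The depth bound guarantees termination even if the edge data contains a cycle, and filtering on the edge type keeps ownership edges out of an impact-analysis result.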
44+
45+
## PostgreSQL Multi-Tenancy
46+
47+
To enforce multi-tenant restrictions at the database level, Compass uses [Row Level Security](https://www.postgresql.org/docs/current/ddl-rowsecurity.html) (RLS). By default, RLS policies are bypassed for superusers and table owners, so the Postgres user that serves the application must be neither. In practice, the user that runs migrations and the user that serves the app should be different.
48+
49+
To create a postgres user:
50+
51+
```sql
52+
CREATE USER "compass_user" WITH PASSWORD 'compass';
53+
GRANT CONNECT ON DATABASE "compass" TO "compass_user";
54+
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO "compass_user";
55+
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO "compass_user";
56+
GRANT ALL ON ALL FUNCTIONS IN SCHEMA public TO "compass_user";
57+
58+
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES
59+
ON TABLES TO "compass_user";
60+
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT USAGE ON SEQUENCES TO "compass_user";
61+
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT EXECUTE ON FUNCTIONS TO "compass_user";
15262
```
63+
64+
A middleware reads the `x-namespace` header to extract the tenant ID. If the header is absent, it falls back to the `default` namespace. The namespace can also be passed as a `namespace_id` claim in a JWT sent as a bearer token in the `Authorization` header.
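The namespace resolution described above might look roughly like this. It is a sketch only: the function name is hypothetical, the JWT is assumed to have been verified and decoded elsewhere, and giving the header precedence over the claim is an assumption, since the diff does not say which source wins when both are present.

```python
DEFAULT_NAMESPACE = "default"


def resolve_namespace(headers, jwt_claims=None):
    """Pick the tenant namespace for a request.

    headers: mapping of HTTP header names to values.
    jwt_claims: already-verified JWT claims, or None if no token was sent.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    header_value = normalized.get("x-namespace", "").strip()
    if header_value:
        return header_value
    # Assumed precedence: the claim is consulted only if the header is absent.
    if jwt_claims and jwt_claims.get("namespace_id"):
        return str(jwt_claims["namespace_id"])
    return DEFAULT_NAMESPACE
```

For example, `resolve_namespace({"X-Namespace": "acme"})` yields the header value, while a request with neither a header nor a claim resolves to the `default` namespace.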

docs/docs/concepts/overview.mdx

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Overview
22

3-
Compass is an organizational context engine that builds a temporal entity graph of your systems and serves it to AI agents via MCP.
3+
Compass is a context engine that builds a knowledge graph of your organization's metadata, capturing entities, relationships, and lineage across systems and time, making it discoverable and queryable for both humans and AI agents.
44

55
## Core Concepts
66

@@ -21,7 +21,7 @@ Compass is an organizational context engine that builds a temporal entity graph
2121

2222
## Architecture
2323

24-
All search is Postgres-native — no Elasticsearch dependency:
24+
All search is Postgres-native:
2525

2626
| Mode | Engine | Purpose |
2727
|---|---|---|
