| 1 | +# Projections (Developer Guide) |
| 2 | + |
| 3 | +This document is a *developer-facing* guide for implementing projections in Continuum. |
| 4 | +It focuses on the practical decisions that determine whether your projections stay correct over time: |
| 5 | + |
| 6 | +- Choosing a projection key (`extractKey`) |
| 7 | +- Designing events so the key is derivable |
| 8 | +- Handling “multi-stream joins” without loading aggregates |
| 9 | + |
| 10 | +## What a projection is (in Continuum terms) |
| 11 | + |
| 12 | +A projection is a pure event consumer that transforms a sequence of events into a read model. |
| 13 | + |
| 14 | +Key constraints: |
| 15 | + |
| 16 | +- A projection **must not load aggregates**. |
| 17 | +- A projection **must not issue commands**. |
| 18 | +- A projection should be **deterministic**: applying the same event to the same current read model must always yield the same result. |
| 19 | + |
| 20 | +In code, this is represented by `ProjectionBase<TReadModel, TKey>`. |
| 21 | +The critical method for correctness is: |
| 22 | + |
| 23 | +- `TKey extractKey(StoredEvent event)` |
| 24 | + |
| 25 | +## The meaning of the key |
| 26 | + |
| 27 | +The key is **the identity of the read model instance** that should be updated by a given event. |
| 28 | + |
| 29 | +Think of it as the primary key of the table/record that stores your read model: |
| 30 | + |
| 31 | +- `UserId` for a “user profile” read model |
| 32 | +- `TenantId` for a “tenant dashboard” read model |
| 33 | +- `ConversationId` for a “conversation summary” read model |
| 34 | + |
| 35 | +### Rule of thumb |
| 36 | + |
| 37 | +- **Single-stream projections**: key is often the stream ID. |
| 38 | +- **Multi-stream projections**: key is usually a *domain identifier shared across events*, not the event stream ID. |
| 39 | + |
| 40 | +## SingleStreamProjection: typical key strategy |
| 41 | + |
| 42 | +A single-stream projection consumes events from exactly one stream per read model instance. |
| 43 | + |
| 44 | +Common choice: |
| 45 | + |
| 46 | +- `extractKey(event) => event.streamId` |
| 47 | + |
| 48 | +This works because: |
| 49 | + |
| 50 | +- all events for that read model instance come from the same stream |
| 51 | +- the stream’s identity is the read model’s identity |
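
The single-stream strategy above can be sketched as follows. This is a minimal TypeScript illustration, not Continuum's actual API: the `StoredEvent` shape, the `eventType` discriminator, the `UserRegistered`/`UserRenamed` events, and the in-memory store are all assumptions made for the example.

```typescript
// Hypothetical stand-in for Continuum's StoredEvent; only the fields used here are modeled.
interface StoredEvent {
  streamId: string;
  eventType: string; // assumed discriminator, not confirmed by this guide
  data: Record<string, unknown>;
}

interface UserProfile {
  displayName: string;
  renameCount: number;
}

// Single-stream strategy: the stream's identity is the read model's identity.
function extractKey(event: StoredEvent): string {
  return event.streamId;
}

// Pure apply step: the same event on the same current model always gives the same result.
function applyEvent(current: UserProfile | undefined, event: StoredEvent): UserProfile {
  const model: UserProfile = current ?? { displayName: "", renameCount: 0 };
  switch (event.eventType) {
    case "UserRegistered":
      return { displayName: String(event.data["displayName"]), renameCount: 0 };
    case "UserRenamed":
      return { displayName: String(event.data["displayName"]), renameCount: model.renameCount + 1 };
    default:
      return model; // unknown event types leave the read model untouched
  }
}

// Folds events into a store keyed by extractKey (an in-memory stand-in for real persistence).
function project(events: StoredEvent[]): Record<string, UserProfile> {
  const store: Record<string, UserProfile> = {};
  for (const event of events) {
    const key = extractKey(event);
    store[key] = applyEvent(store[key], event);
  }
  return store;
}

const profiles = project([
  { streamId: "user-1", eventType: "UserRegistered", data: { displayName: "Ada" } },
  { streamId: "user-2", eventType: "UserRegistered", data: { displayName: "Grace" } },
  { streamId: "user-1", eventType: "UserRenamed", data: { displayName: "Ada L." } },
]);
```

Each stream ends up with exactly one read model instance, which is the property that makes `event.streamId` a safe key here.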
| 52 | + |
| 53 | +## MultiStreamProjection: how to build the key correctly |
| 54 | + |
| 55 | +A multi-stream projection intentionally merges events from **multiple streams** into **one** read model instance. |
| 56 | + |
| 57 | +This is exactly why “use `event.streamId` as the key” is usually wrong for multi-stream projections: |
| 58 | + |
| 59 | +- Different aggregates (different streams) would produce different keys |
| 60 | +- You would accidentally create multiple read models where you wanted one |
| 61 | + |
| 62 | +### Correct mental model |
| 63 | + |
| 64 | +For multi-stream projections, `extractKey` must answer: |
| 65 | + |
| 66 | +> “Which read model row does this event belong to?” |
| 67 | + |
| 68 | +Not: |
| 69 | + |
| 70 | +> “Which stream did this event come from?” |
| 71 | + |
| 72 | +### Practical key sources (in order of preference) |
| 73 | + |
| 74 | +#### 1) The key is present in the event payload |
| 75 | + |
| 76 | +This is the cleanest design: |
| 77 | + |
| 78 | +- Events emitted by different aggregates include a shared identifier |
| 79 | +- Your projection key is that identifier |
| 80 | + |
| 81 | +Example idea: |
| 82 | + |
| 83 | +- `OrderPlaced(orderId, customerId)` and `CustomerEmailChanged(customerId, ...)` |
| 84 | +- Read model is “CustomerSummary” keyed by `customerId` |
| 85 | + |
| 86 | +Then: |
| 87 | + |
| 88 | +- For `OrderPlaced`, key is `customerId` (not the order stream ID) |
| 89 | +- For `CustomerEmailChanged`, key is `customerId` (often matches the customer stream ID, but you still use the field) |
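
A small sketch of this payload-based routing, contrasted with routing by stream ID. The `StoredEvent` shape, the `eventType` field, and the sample stream IDs are assumptions for illustration; only the event names and the `customerId` field come from the example above.

```typescript
// Hypothetical stand-in for Continuum's StoredEvent.
interface StoredEvent {
  streamId: string;
  eventType: string; // assumed discriminator
  data: Record<string, unknown>;
}

// Multi-stream strategy: route by the shared identifier carried in the payload.
function extractKey(event: StoredEvent): string {
  switch (event.eventType) {
    case "OrderPlaced":
    case "CustomerEmailChanged": {
      const customerId = event.data["customerId"];
      if (typeof customerId !== "string") {
        throw new Error("Missing customerId on " + event.eventType);
      }
      return customerId;
    }
    default:
      throw new Error("No key rule for event type " + event.eventType);
  }
}

// Returns the values with duplicates removed, preserving first-seen order.
function distinct(values: string[]): string[] {
  return values.filter((value, position) => values.indexOf(value) === position);
}

const sampleEvents: StoredEvent[] = [
  { streamId: "order-7", eventType: "OrderPlaced", data: { orderId: "7", customerId: "42" } },
  {
    streamId: "customer-42",
    eventType: "CustomerEmailChanged",
    data: { customerId: "42", email: "a@example.com" },
  },
];

// Payload routing yields one key; stream-ID routing would yield one key per stream.
const keysFromPayload = distinct(sampleEvents.map(extractKey));
const keysFromStreamId = distinct(sampleEvents.map((e) => e.streamId));
```

Both events resolve to the single key `42`, whereas keying by stream ID would split them across `order-7` and `customer-42`.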
| 90 | + |
| 91 | +#### 2) The key is derivable from event metadata/stream naming |
| 92 | + |
| 93 | +Sometimes you can derive a domain identifier from the stream ID, |
| 94 | +for example when stream IDs are structured like `customer-<customerId>`. |
| 95 | + |
| 96 | +This can work, but it’s brittle unless your stream ID format is treated as a stable API. |
| 97 | + |
| 98 | +Prefer explicit IDs in the event payload when you can. |
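
If you do derive the key from the stream ID, keep the parsing in one place so the format dependency is visible. A minimal sketch, assuming the `customer-<customerId>` format mentioned above; the function and constant names are hypothetical.

```typescript
// Assumed stream naming convention; treat it as a stable API if you depend on it.
const CUSTOMER_STREAM_PREFIX = "customer-";

// Returns the customer ID encoded in the stream ID, or undefined if the format does not match.
function customerIdFromStreamId(streamId: string): string | undefined {
  if (streamId.indexOf(CUSTOMER_STREAM_PREFIX) !== 0) {
    return undefined;
  }
  const customerId = streamId.slice(CUSTOMER_STREAM_PREFIX.length);
  return customerId.length > 0 ? customerId : undefined;
}
```

Returning `undefined` for non-matching stream IDs forces the caller to decide what to do with events from other streams instead of silently producing a wrong key.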
| 99 | + |
| 100 | +#### 3) The key is resolvable via a projection-maintained mapping (“join/index”) |
| 101 | + |
| 102 | +This is the common case when later events do not contain the grouping key. |
| 103 | + |
| 104 | +Example: |
| 105 | + |
| 106 | +- `OrderPlaced(orderId, customerId)` contains both IDs |
| 107 | +- later `OrderShipped(orderId)` does *not* contain `customerId` |
| 108 | + |
| 109 | +If your read model is keyed by `customerId`, you need a mapping: |
| 110 | + |
| 111 | +- When you see `OrderPlaced`, store `orderId -> customerId` |
| 112 | +- When you see `OrderShipped`, look up `orderId -> customerId` and route to that read model key |
| 113 | + |
| 114 | +Important constraint: |
| 115 | + |
| 116 | +- The mapping must be stored in your read model (or a dedicated auxiliary read model) |
| 117 | +- You still do not load aggregates |
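
The mapping steps above can be sketched as two pure functions: one that records `orderId -> customerId`, and one that resolves the key. The `StoredEvent` shape, the `eventType` field, and the function names are assumptions for illustration, not Continuum's API.

```typescript
// Hypothetical stand-in for Continuum's StoredEvent.
interface StoredEvent {
  streamId: string;
  eventType: string; // assumed discriminator
  data: Record<string, unknown>;
}

// orderId -> customerId, maintained by the projection itself.
type OrderToCustomerIndex = Record<string, string>;

// Returns a new index including the mapping carried by OrderPlaced; other events change nothing.
function recordMapping(index: OrderToCustomerIndex, event: StoredEvent): OrderToCustomerIndex {
  if (event.eventType !== "OrderPlaced") {
    return index;
  }
  const orderId = String(event.data["orderId"]);
  const customerId = String(event.data["customerId"]);
  return { ...index, [orderId]: customerId };
}

// Resolves the read model key, or undefined when no mapping has been recorded yet.
function resolveCustomerId(index: OrderToCustomerIndex, event: StoredEvent): string | undefined {
  if (event.eventType === "OrderPlaced") {
    return String(event.data["customerId"]);
  }
  if (event.eventType === "OrderShipped") {
    const orderId = String(event.data["orderId"]);
    return Object.prototype.hasOwnProperty.call(index, orderId) ? index[orderId] : undefined;
  }
  return undefined;
}

const placed: StoredEvent = {
  streamId: "order-7",
  eventType: "OrderPlaced",
  data: { orderId: "7", customerId: "42" },
};
const shipped: StoredEvent = { streamId: "order-7", eventType: "OrderShipped", data: { orderId: "7" } };

const orderIndex = recordMapping({}, placed);
const routedKey = resolveCustomerId(orderIndex, shipped);
const unknownKey = resolveCustomerId(orderIndex, {
  streamId: "order-99",
  eventType: "OrderShipped",
  data: { orderId: "99" },
});
```

The `undefined` result for an unmapped order is deliberate in this sketch: how to handle an event whose mapping has not been seen yet is a policy decision the projection has to make explicitly.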
| 118 | + |
| 119 | +### What about the very first event? |
| 120 | + |
| 121 | +It’s normal that the first event you see comes from exactly one stream. |
| 122 | +That does **not** mean the key should be that stream ID. |
| 123 | + |
| 124 | +The key should still be the read model’s true identity. |
| 125 | + |
| 126 | +If the first event does not contain enough information to determine the key, you have two options: |
| 127 | + |
| 128 | +1) **Change the event schema** to include the required identifier. |
| 129 | +2) **Maintain a mapping** seeded by earlier events that *do* contain the identifier (for example, events from another stream that were already consumed into an index). |
| 130 | + |
| 131 | +If neither is possible, you cannot build a correct multi-stream projection. |
| 132 | + |
| 133 | +## “Join” patterns that work well |
| 134 | + |
| 135 | +### Pattern A: Emit correlation IDs in every event |
| 136 | + |
| 137 | +If multiple aggregates contribute to the same read model, ensure each event includes the grouping key. |
| 138 | + |
| 139 | +Pros: |
| 140 | + |
| 141 | +- simplest `extractKey` |
| 142 | +- no extra read model/index |
| 143 | + |
| 144 | +Cons: |
| 145 | + |
| 146 | +- requires careful event design discipline |
| 147 | + |
| 148 | +### Pattern B: Maintain an index read model |
| 149 | + |
| 150 | +Create a small read model dedicated to joins. |
| 151 | + |
| 152 | +Example: |
| 153 | + |
| 154 | +- `OrderToCustomerIndex` keyed by `orderId` containing `customerId` |
| 155 | + |
| 156 | +Then other projections can: |
| 157 | + |
| 158 | +- resolve `customerId` from `orderId` deterministically |
| 159 | + |
| 160 | +Pros: |
| 161 | + |
| 162 | +- handles events that only carry local IDs |
| 163 | + |
| 164 | +Cons: |
| 165 | + |
| 166 | +- increases projection surface area |
| 167 | + |
| 168 | +### Pattern C: Use a “root stream” key |
| 169 | + |
| 170 | +Pick one aggregate as the “root” identity of the read model. |
| 171 | + |
| 172 | +Example: |
| 173 | + |
| 174 | +- Read model “CustomerSummary” is keyed by `customerId` |
| 175 | +- Customer aggregate is the root |
| 176 | +- Order events must carry `customerId` to join |
| 177 | + |
| 178 | +This is essentially Pattern A with an explicit “owner”. |
| 179 | + |
| 180 | +## Type routing vs persistence shape (important for generated projections) |
| 181 | + |
| 182 | +Continuum stores event payloads in `StoredEvent.data` as a serialized map. |
| 183 | + |
| 184 | +Generated projection handlers dispatch on the typed domain event (when available). |
| 185 | +Practically: |
| 186 | + |
| 187 | +- Inline paths can provide a typed `domainEvent` |
| 188 | +- Persisted events loaded from storage may only have serialized `data` unless your store/executor provides a way to deserialize |
| 189 | + |
| 190 | +Developer takeaway: |
| 191 | + |
| 192 | +- If you rely on typed dispatch in projections, ensure the execution path provides domain events (or a deserialization step) consistently. |
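
One way to get consistent typed dispatch is a single normalization step that every execution path goes through. The sketch below is an assumption-heavy illustration: the `DomainEvent` union, the optional `domainEvent` field, the `eventType` discriminator, and both function names are hypothetical and do not describe what Continuum's generated code does.

```typescript
// Hypothetical typed domain events.
type DomainEvent =
  | { kind: "OrderPlaced"; orderId: string; customerId: string }
  | { kind: "OrderShipped"; orderId: string };

// Hypothetical stand-in for StoredEvent: `data` is always present, `domainEvent` only on inline paths.
interface StoredEvent {
  streamId: string;
  eventType: string; // assumed discriminator
  data: Record<string, unknown>;
  domainEvent?: DomainEvent;
}

// Rebuilds a typed event from the serialized map, or returns undefined if the shape is invalid.
function deserialize(eventType: string, data: Record<string, unknown>): DomainEvent | undefined {
  const orderId = data["orderId"];
  if (typeof orderId !== "string") {
    return undefined;
  }
  if (eventType === "OrderShipped") {
    return { kind: "OrderShipped", orderId };
  }
  if (eventType === "OrderPlaced") {
    const customerId = data["customerId"];
    return typeof customerId === "string" ? { kind: "OrderPlaced", orderId, customerId } : undefined;
  }
  return undefined;
}

// Single entry point: prefer the typed event when present, otherwise deserialize from `data`.
function toDomainEvent(event: StoredEvent): DomainEvent | undefined {
  return event.domainEvent ?? deserialize(event.eventType, event.data);
}

const inlineEvent: StoredEvent = {
  streamId: "order-7",
  eventType: "OrderShipped",
  data: { orderId: "7" },
  domainEvent: { kind: "OrderShipped", orderId: "7" },
};
const persistedEvent: StoredEvent = { streamId: "order-7", eventType: "OrderShipped", data: { orderId: "7" } };

const fromInline = toDomainEvent(inlineEvent);
const fromPersisted = toDomainEvent(persistedEvent);
```

With this shape, handlers only ever see `DomainEvent`, so the inline and persisted paths cannot diverge in how they dispatch.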
| 193 | + |
| 194 | +## Example (conceptual): multi-stream projection keyed by CustomerId |
| 195 | + |
| 196 | +Below is a conceptual sketch (names are illustrative). |
| 197 | + |
| 198 | +- Read model key: `CustomerId` |
| 199 | +- Streams: |
| 200 | + - `customer-<customerId>` emits customer events |
| 201 | + - `order-<orderId>` emits order events |
| 202 | + |
| 203 | +Events: |
| 204 | + |
| 205 | +- `CustomerRegistered(customerId, email)` |
| 206 | +- `OrderPlaced(orderId, customerId, total)` |
| 207 | +- `OrderShipped(orderId)` |
| 208 | + |
| 209 | +Key strategy: |
| 210 | + |
| 211 | +- `CustomerRegistered` → key is `customerId` (present) |
| 212 | +- `OrderPlaced` → key is `customerId` (present) |
| 213 | +- `OrderShipped` → requires `orderId -> customerId` mapping |
| 214 | + |
| 215 | +The mapping can be stored in the read model or in an auxiliary index. |
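
The key strategy above can be put together in one runnable sketch. Everything beyond the three event names and their fields is an assumption for illustration: the `StoredEvent` shape, the `eventType` discriminator, the `CustomerSummary` fields, and the choice to keep the mapping inside the projection state. It also chooses to fail loudly when `OrderShipped` arrives without a known mapping, which is one possible policy, not a recommendation from this guide.

```typescript
// Hypothetical stand-in for Continuum's StoredEvent.
interface StoredEvent {
  streamId: string;
  eventType: string; // assumed discriminator
  data: Record<string, unknown>;
}

interface CustomerSummary {
  email: string;
  placedOrders: number;
  shippedOrders: number;
}

// Read models keyed by customerId, plus the orderId -> customerId mapping.
interface ProjectionState {
  summaries: Record<string, CustomerSummary>;
  orderToCustomer: Record<string, string>;
}

function extractKey(state: ProjectionState, event: StoredEvent): string {
  switch (event.eventType) {
    case "CustomerRegistered":
    case "OrderPlaced":
      return String(event.data["customerId"]); // key is present in the payload
    case "OrderShipped": {
      const orderId = String(event.data["orderId"]);
      const customerId = state.orderToCustomer[orderId];
      if (customerId === undefined) {
        throw new Error("No customer mapping for order " + orderId);
      }
      return customerId; // key resolved through the mapping
    }
    default:
      throw new Error("No key rule for event type " + event.eventType);
  }
}

// Pure step: returns a new state and never mutates the input.
function applyEvent(state: ProjectionState, event: StoredEvent): ProjectionState {
  const key = extractKey(state, event);
  const current: CustomerSummary =
    state.summaries[key] ?? { email: "", placedOrders: 0, shippedOrders: 0 };
  let next = current;
  let orderToCustomer = state.orderToCustomer;
  switch (event.eventType) {
    case "CustomerRegistered":
      next = { ...current, email: String(event.data["email"]) };
      break;
    case "OrderPlaced":
      next = { ...current, placedOrders: current.placedOrders + 1 };
      orderToCustomer = { ...orderToCustomer, [String(event.data["orderId"])]: key };
      break;
    case "OrderShipped":
      next = { ...current, shippedOrders: current.shippedOrders + 1 };
      break;
  }
  return { summaries: { ...state.summaries, [key]: next }, orderToCustomer };
}

function project(events: StoredEvent[]): ProjectionState {
  let state: ProjectionState = { summaries: {}, orderToCustomer: {} };
  for (const event of events) {
    state = applyEvent(state, event);
  }
  return state;
}

const finalState = project([
  { streamId: "customer-c1", eventType: "CustomerRegistered", data: { customerId: "c1", email: "c1@example.com" } },
  { streamId: "order-o1", eventType: "OrderPlaced", data: { orderId: "o1", customerId: "c1", total: 30 } },
  { streamId: "order-o2", eventType: "OrderPlaced", data: { orderId: "o2", customerId: "c1", total: 12 } },
  { streamId: "order-o1", eventType: "OrderShipped", data: { orderId: "o1" } },
]);
```

Events from three different streams land in a single `CustomerSummary` under key `c1`, and `event.streamId` is never consulted when choosing the key.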
| 216 | + |
| 217 | +## Checklist for adding a MultiStreamProjection |
| 218 | + |
| 219 | +- Decide the read model identity (the key) first. |
| 220 | +- Verify that **every event type** you plan to consume can be mapped to that key: |
| 221 | + - directly (event includes the key), or |
| 222 | + - indirectly via deterministic mapping/index. |
| 223 | +- Avoid “key = first event’s stream ID” unless the stream is genuinely the read model identity. |
| 224 | +- Keep the projection pure: no aggregate loads, no commands, no external IO. |
| 225 | + |
| 226 | +## Common pitfalls |
| 227 | + |
| 228 | +- Using `event.streamId` as the key for multi-stream projections and accidentally creating one read model per aggregate stream. |
| 229 | +- Depending on arrival order of events to decide identity. |
| 230 | +- Needing the key but not encoding it anywhere (no payload field, no deterministic mapping). |
| 231 | +- Treating serialized payload (`StoredEvent.data`) as a typed event object. |