# Gather Step Plan: Projection Impact

## Summary

Add a projection/data-model impact capability to Gather Step so planning packs can trace derived data fields across storage, event consumers, API mappers, filters, indexes, migrations, and frontend readers before implementation starts.

This plan is anchored by the REG-14003 failure mode. A UI column displayed `tasksCount`, but the real defect was a backend denormalized projection that counted only `workflow.taskIds` and ignored subtasks. The fix needed a new `workflow.subtaskIds` field, event-handler updates, and a source-of-truth backfill, and it created DB/search-index rollout risk.

**ELI5:** A normal code graph can show "this UI reads this API." A projection graph must also show "this API field is a stored shortcut, here is what keeps it fresh, here is what existing data needs, and here are the indexes/search mappings that can break if we change it."

## Goals

- Identify field-level projection chains such as `workflow.taskIds -> tasksCount -> datagrid column`.
- Surface write/update paths for projected fields: event handlers, use cases, migrations, and update operators.
- Surface read/query paths: mappers, DTO/API responses, filters, search/index mappings, and frontend consumers.
- Add a task-shaped `projection_impact` capability usable from CLI/MCP and pack orchestration.
- Add oracle coverage for the REG-14003 class of mistakes.

## Non-Goals

- Full ERD generation.
- Runtime database inspection by default.
- General-purpose schema migration generation.
- Guessing business semantics without code, ticket, or memory evidence.
- Supporting every ORM in the first release.

## Phase 1: Oracle And Fixture First

Create a small synthetic workspace fixture that models the REG-14003 pattern without depending on private RegASK code.

**ELI5:** Before teaching Gather Step the new trick, write the exam. The oracle should fail until the graph can see the important projection surfaces.

Steps:

1. Add fixture repos under `tests/fixtures/workspace/` for:
   - `frontend_projection`: reads `row.tasksCount`.
   - `alert_projection`: owns the alert entity/workflow, mapper, event handlers, filters, migration, and Atlas/search mapping.
   - `task_projection`: emits generic task lifecycle events with a parent/subtask discriminator.
   - `shared_contracts_projection`: owns event body/type definitions.
   → verify: fixture files parse under the existing indexer without new extraction logic.

2. Add an oracle scenario under `tests/fixtures/oracle/projection_impact_reg_14003_like/`.
   Required evidence should include:
   - alert workflow model/entity file.
   - mapper that derives `tasksCount`.
   - parent-only `taskIds` field.
   - missing or new `subtaskIds` field, depending on fixture stage.
   - task-created/task-deleted consumer handlers.
   - task event contract with discriminator.
   - backfill migration.
   - datagrid/filter code using the projected field.
   - Atlas/search-index mapping or DB index file.
   → verify: the oracle fails before implementation and clearly reports the missing expected files/edge kinds.

3. Extend oracle assertions only as needed for projection-specific expectations.
   Candidate additions:
   - `expected_projection_fields`
   - `expected_projection_writers`
   - `expected_projection_readers`
   - `expected_projection_rollout_risks`
   → verify: existing oracle scenarios remain unchanged or require only additive defaults.
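
Step 3 can be sketched in Rust, assuming scenario files deserialize into a struct whose new fields default to empty. With that default, existing scenarios assert nothing new. `ProjectionExpectations`, `Observed`, and `missing_expectations` are hypothetical names, not existing Gather Step types. The point is that a failing oracle should name each missing field, writer, reader, or risk.

```rust
use std::collections::BTreeSet;

/// Hypothetical projection-specific oracle expectations. Every field defaults
/// to empty, so scenarios that omit them are unaffected (additive defaults).
#[derive(Debug, Default)]
pub struct ProjectionExpectations {
    pub expected_projection_fields: Vec<String>,
    pub expected_projection_writers: Vec<String>,
    pub expected_projection_readers: Vec<String>,
    pub expected_projection_rollout_risks: Vec<String>,
}

/// What the projection analysis actually reported for the scenario target.
#[derive(Debug, Default)]
pub struct Observed {
    pub fields: BTreeSet<String>,
    pub writers: BTreeSet<String>,
    pub readers: BTreeSet<String>,
    pub risks: BTreeSet<String>,
}

/// Returns one readable line per unmet expectation, in a fixed order, so a
/// failing oracle names exactly what is missing.
pub fn missing_expectations(exp: &ProjectionExpectations, obs: &Observed) -> Vec<String> {
    let checks: [(&str, &Vec<String>, &BTreeSet<String>); 4] = [
        ("projection field", &exp.expected_projection_fields, &obs.fields),
        ("writer", &exp.expected_projection_writers, &obs.writers),
        ("reader", &exp.expected_projection_readers, &obs.readers),
        ("rollout risk", &exp.expected_projection_rollout_risks, &obs.risks),
    ];
    let mut missing = Vec::new();
    for &(label, expected, observed) in checks.iter() {
        for item in expected {
            if !observed.contains(item) {
                missing.push(format!("missing expected {}: {}", label, item));
            }
        }
    }
    missing
}
```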

## Phase 2: Core Graph Vocabulary

Add enough graph vocabulary to represent projection chains without over-modeling databases.

**ELI5:** Add labels for the things Gather Step needs to connect: field, writer, reader, source of truth, derived output, index, and migration.

Candidate node kinds:

- `DataField`: stable field path such as `Alert.workflow.taskIds` or `workflow.subtaskIds`.
- `Projection`: derived output such as `tasksCount`.
- `DataIndex`: DB/search index or Atlas mapping surface.
- `Migration`: one-shot data repair or schema/data migration.

Candidate edge kinds:

- `ReadsField`
- `WritesField`
- `DerivesFieldFrom`
- `ProjectsAs`
- `FiltersOnField`
- `IndexesField`
- `BackfillsField`
- `UsesSourceOfTruth`

Implementation notes:

- Keep external IDs deterministic and field-path based.
- Reuse existing `File`/symbol nodes instead of creating redundant code-location nodes.
- Treat field-path evidence as best-effort; graph edges should carry metadata for confidence/source pattern.

→ verify: graph serialization, storage, search, schema summary, and existing tests pass with additive variants.
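
The additive vocabulary can be sketched as Rust enums that would sit alongside the existing node/edge kinds. The variants come from the lists above. `EdgeEvidence`, `external_id`, and the `kind:scope:path` ID format are hypothetical. Scoping IDs by a `scope` string is only one possible answer to the open question about global versus entity-scoped field nodes.

```rust
/// Hypothetical additive node kinds. Code locations keep using the existing
/// File/symbol nodes, so none of these duplicate them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum ProjectionNodeKind {
    DataField,
    Projection,
    DataIndex,
    Migration,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum ProjectionEdgeKind {
    ReadsField,
    WritesField,
    DerivesFieldFrom,
    ProjectsAs,
    FiltersOnField,
    IndexesField,
    BackfillsField,
    UsesSourceOfTruth,
}

/// Best-effort evidence metadata carried by every projection edge.
#[derive(Debug, Clone, PartialEq)]
pub struct EdgeEvidence {
    /// 0.0..=1.0; weak matches are expected to surface as warnings downstream.
    pub confidence: f32,
    /// Extractor pattern that produced the edge, e.g. "mongo_set_dotted_path".
    pub source_pattern: &'static str,
}

/// Deterministic, field-path-based external ID: the same inputs always yield
/// the same ID, independent of indexing order.
pub fn external_id(kind: ProjectionNodeKind, scope: &str, field_path: &str) -> String {
    let prefix = match kind {
        ProjectionNodeKind::DataField => "field",
        ProjectionNodeKind::Projection => "projection",
        ProjectionNodeKind::DataIndex => "index",
        ProjectionNodeKind::Migration => "migration",
    };
    // Normalize optional chaining so `workflow?.taskIds` and `workflow.taskIds`
    // resolve to the same node.
    let normalized = field_path.replace("?.", ".");
    format!("{}:{}:{}", prefix, scope, normalized)
}
```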

## Phase 3: TypeScript/Nest/Mongo Extraction MVP

Add parser/semantic augmentation for common projection patterns in TypeScript services.

**ELI5:** Scan code for the shapes engineers already write: class fields, `alert.workflow?.taskIds`, Mongo update operators, filters, migrations, and search mappings.

Extraction targets:

- Class/interface/entity fields:
  - `workflow.taskIds`
  - `workflow.subtaskIds`
  - `tasksCount`
- Property reads:
  - optional chaining and member access.
  - string paths like `"workflow.subtaskIds"`.
- Derived assignments:
  - `tasksCount = taskIds.length + subtaskIds.length`
  - mapper object literals returning API rows.
- Mongo-style writes:
  - `$set`, `$addToSet`, `$pull`, `$push`, `$unset`
  - nested/dotted paths.
- Filter/query use:
  - `$or`, `$exists`, `$size`, `Contains`, search-filter builders.
- Migration/backfill markers:
  - files under migration directories.
  - aggregation pipelines.
  - collection names used as source of truth.
- Atlas/search-index mappings:
  - `dynamic: false`
  - explicit field mapping blocks.
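
The property-read and string-path targets can be sketched with std-only Rust. To stay self-contained, the sketch scans raw text. This is an assumption: the real extractor would walk the TypeScript syntax tree. `member_paths` and `string_paths` are hypothetical helpers.

```rust
/// JS/TS identifier character (ASCII subset, enough for a sketch).
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '$'
}

/// Extracts member-access chains such as `alert.workflow?.taskIds`, normalizing
/// `?.` to `.`. A trailing `length` segment is dropped because it reads an array
/// size, not a stored field. Single-segment names are skipped.
pub fn member_paths(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let mut paths = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '"' || c == '\'' {
            // Skip string literals; `string_paths` handles quoted paths.
            i += 1;
            while i < chars.len() && chars[i] != c {
                i += 1;
            }
            i += 1;
            continue;
        }
        if c.is_ascii_digit() {
            // Skip numeric literals so `0.5` is not read as a path.
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '.') {
                i += 1;
            }
            continue;
        }
        if !is_ident_char(c) {
            i += 1;
            continue;
        }
        let mut segments: Vec<String> = Vec::new();
        let mut current = String::new();
        while i < chars.len() {
            let ch = chars[i];
            if is_ident_char(ch) {
                current.push(ch);
                i += 1;
            } else if ch == '.' && !current.is_empty() {
                segments.push(std::mem::take(&mut current));
                i += 1;
            } else if ch == '?' && !current.is_empty() && chars.get(i + 1) == Some(&'.') {
                segments.push(std::mem::take(&mut current));
                i += 2;
            } else {
                break;
            }
        }
        if !current.is_empty() {
            segments.push(current);
        }
        if segments.last().map_or(false, |s| s.as_str() == "length") {
            segments.pop();
        }
        if segments.len() >= 2 {
            paths.push(segments.join("."));
        }
    }
    paths
}

/// Extracts quoted dotted paths such as `"workflow.subtaskIds"`, the form used as
/// keys in Mongo update documents and filters. Naive: assumes balanced, unescaped quotes.
pub fn string_paths(src: &str) -> Vec<String> {
    src.split(|c: char| c == '"' || c == '\'')
        .enumerate()
        // Odd-indexed pieces sit between an opening and a closing quote.
        .filter(|&(idx, part)| {
            idx % 2 == 1
                && part.contains('.')
                && part.split('.').all(|seg| !seg.is_empty() && seg.chars().all(is_ident_char))
        })
        .map(|(_, part)| part.to_string())
        .collect()
}
```

Consider a mapper line such as `tasksCount: (alert.workflow?.taskIds?.length ?? 0) + (alert.workflow?.subtaskIds?.length ?? 0)`. On that line, `member_paths` yields both input paths, which is the evidence a `DerivesFieldFrom` edge would need.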

Initial heuristics:

- Prefer strong evidence over broad matching.
- Only emit projection edges for nested field paths or fields that cross API boundaries.
- Record unresolved/weak evidence as warnings, not primary facts.
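
The heuristics could gate Mongo-style writes roughly as follows. The sketch assumes an earlier syntax-tree pass already yields `(operator, key)` pairs from update documents and knows whether the field crosses an API boundary. `classify_update_key` and `WriteEvidence` are hypothetical. Applying the nesting/API-boundary rule to write edges is one reading of the heuristics, not a settled design.

```rust
/// Mongo update operators the MVP recognizes (from the extraction targets).
const UPDATE_OPERATORS: [&str; 5] = ["$set", "$addToSet", "$pull", "$push", "$unset"];

#[derive(Debug, PartialEq)]
pub enum WriteEvidence {
    /// Strong evidence: becomes a `WritesField` edge.
    Edge { operator: String, field_path: String },
    /// Weak or unresolved evidence: surfaced as a warning, never as a fact.
    Warning(String),
}

/// Classifies one `(operator, key)` pair from an update document.
/// Returns `None` for operators this extractor does not handle.
pub fn classify_update_key(
    operator: &str,
    key: &str,
    crosses_api_boundary: bool,
) -> Option<WriteEvidence> {
    if !UPDATE_OPERATORS.contains(&operator) {
        return None;
    }
    // Template-literal or computed keys cannot be resolved statically.
    if key.contains("${") || key.starts_with('[') {
        return Some(WriteEvidence::Warning(format!(
            "{} writes an unresolved path: {}",
            operator, key
        )));
    }
    if key.contains('.') || crosses_api_boundary {
        Some(WriteEvidence::Edge {
            operator: operator.to_string(),
            field_path: key.to_string(),
        })
    } else {
        Some(WriteEvidence::Warning(format!(
            "{} writes top-level `{}` with no nesting or API evidence",
            operator, key
        )))
    }
}
```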

→ verify: parser unit tests cover each extraction pattern with minimal source snippets.
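
For the Atlas/search-index target, here is a hypothetical coverage check over a simplified mapping tree. It assumes the extractor has already turned the index definition into `SearchMapping` nodes. With `dynamic: false`, Atlas Search indexes only explicitly mapped fields. The sketch treats any `dynamic: true` level as covering everything beneath it, which approximates dynamic mappings rather than modeling every field type.

```rust
use std::collections::BTreeMap;

/// Simplified Atlas Search mapping node: a `dynamic` flag plus explicit
/// child `fields`. Field types are intentionally not modeled.
#[derive(Debug, Default)]
pub struct SearchMapping {
    pub dynamic: bool,
    pub fields: BTreeMap<String, SearchMapping>,
}

/// Returns true if a dotted `path` (e.g. `workflow.subtaskIds`) would be indexed:
/// either every segment is explicitly mapped, or some level on the way is dynamic.
pub fn is_path_mapped(root: &SearchMapping, path: &str) -> bool {
    let mut node = root;
    for segment in path.split('.') {
        if node.dynamic {
            return true;
        }
        match node.fields.get(segment) {
            Some(child) => node = child,
            None => return false,
        }
    }
    true
}
```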

## Phase 4: Projection Impact Analysis

Build an analysis pass that starts from a target field/symbol/file and assembles the projection chain.

**ELI5:** Given `tasksCount`, walk backward to the data that produces it and forward to whatever reads or queries it.

Traversal rules:

- From an API/output field, follow `ProjectsAs` and `DerivesFieldFrom` edges.
- From source fields, find writers, event consumers, filters, indexes, and migrations.
- From event consumers, join through the existing event topology to producers and payload contracts.
- From queried fields, check for nearby `DataIndex` / search mappings.
- From new/changed fields, check for `Migration` / `BackfillsField`.
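
The first two traversal rules can be sketched over an in-memory edge list. The edge-direction conventions, `Edge`, and `projection_chain` are assumptions, and the event-topology join and index lookups are omitted. `BTreeSet`/`BTreeMap` keep the output ordered, which is one way to get the deterministic chains the verify step asks for.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    ProjectsAs,
    DerivesFieldFrom,
    WritesField,
    ReadsField,
    FiltersOnField,
    IndexesField,
    BackfillsField,
}

/// Directed edge `from -[kind]-> to`. Assumed conventions: `DerivesFieldFrom`
/// points from a derived field to one input, `ProjectsAs` from a source field
/// to its output, and role edges point from a code symbol to the field it touches.
pub struct Edge<'a> {
    pub from: &'a str,
    pub kind: EdgeKind,
    pub to: &'a str,
}

/// Walks upstream from `target` to every contributing field, then groups the
/// symbols touching any field in the chain by role.
pub fn projection_chain<'a>(
    edges: &[Edge<'a>],
    target: &'a str,
) -> BTreeMap<&'static str, BTreeSet<&'a str>> {
    let mut fields: BTreeSet<&'a str> = BTreeSet::new();
    let mut queue: VecDeque<&'a str> = VecDeque::new();
    fields.insert(target);
    queue.push_back(target);
    // Breadth-first walk backward through derivation/projection edges.
    while let Some(field) = queue.pop_front() {
        for e in edges {
            let upstream = match e.kind {
                EdgeKind::DerivesFieldFrom if e.from == field => Some(e.to),
                EdgeKind::ProjectsAs if e.to == field => Some(e.from),
                _ => None,
            };
            if let Some(up) = upstream {
                // `insert` returns false for visited fields, which also stops cycles.
                if fields.insert(up) {
                    queue.push_back(up);
                }
            }
        }
    }
    let mut chain: BTreeMap<&'static str, BTreeSet<&'a str>> = BTreeMap::new();
    for e in edges {
        if !fields.contains(e.to) {
            continue;
        }
        let role = match e.kind {
            EdgeKind::WritesField => "writers",
            EdgeKind::ReadsField => "readers",
            EdgeKind::FiltersOnField => "filters",
            EdgeKind::IndexesField => "indexes",
            EdgeKind::BackfillsField => "migrations",
            EdgeKind::ProjectsAs | EdgeKind::DerivesFieldFrom => continue,
        };
        chain.entry(role).or_default().insert(e.from);
    }
    fields.remove(target);
    chain.insert("source_fields", fields);
    chain
}
```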

Risk hints:

- `missing_backfill`: projected field added or semantics changed without a migration/backfill.
- `missing_index`: field used in a high-volume query/filter without a known index.
- `missing_search_mapping`: field queried through an Atlas/search index whose mapping uses `dynamic: false` but has no explicit entry for that field.
- `event_discriminator_gap`: consumer handles a generic event but filters out known discriminator values (for example, subtask events).
- `future_events_only`: event handlers repair future state but existing records remain stale.
- `projection_semantics_gap`: derived name suggests an aggregate/total but inputs cover a narrower source.
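
The risk hints can be sketched as a pure function over per-field facts that the traversal would collect. `FieldFacts` and `risk_hints` are hypothetical. Query volume is not modeled, so `missing_index` here fires on any filtered field without a known index.

```rust
/// Hypothetical per-field facts gathered by the traversal.
#[derive(Debug, Default)]
pub struct FieldFacts {
    pub added_or_semantics_changed: bool,
    pub has_backfill: bool,
    pub has_event_writers: bool,
    pub queried_in_filters: bool,
    pub has_known_index: bool,
    /// Field is queried through a search index whose mapping is `dynamic: false`.
    pub behind_static_search_mapping: bool,
    pub search_mapping_covers_field: bool,
    /// Discriminator values the event contract emits, e.g. ["task", "subtask"].
    pub discriminators_emitted: Vec<String>,
    /// Discriminator values the consumer actually handles.
    pub discriminators_handled: Vec<String>,
    pub name_suggests_total: bool,
    pub inputs_cover_all_sources: bool,
}

/// Maps facts to the risk codes above, always in the same order.
pub fn risk_hints(f: &FieldFacts) -> Vec<&'static str> {
    let mut risks = Vec::new();
    if f.added_or_semantics_changed && !f.has_backfill {
        risks.push("missing_backfill");
        // Without a backfill, event handlers only repair records touched after deploy.
        if f.has_event_writers {
            risks.push("future_events_only");
        }
    }
    // Query volume is not modeled in this sketch.
    if f.queried_in_filters && !f.has_known_index {
        risks.push("missing_index");
    }
    if f.behind_static_search_mapping && !f.search_mapping_covers_field {
        risks.push("missing_search_mapping");
    }
    let unhandled = f
        .discriminators_emitted
        .iter()
        .any(|d| !f.discriminators_handled.contains(d));
    if unhandled {
        risks.push("event_discriminator_gap");
    }
    if f.name_suggests_total && !f.inputs_cover_all_sources {
        risks.push("projection_semantics_gap");
    }
    risks
}
```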

→ verify: analysis unit tests produce deterministic chains and risk hints for the fixture graph.

## Phase 5: CLI And MCP Surface

Expose projection impact as a first-class workflow without disrupting existing pack modes.

**ELI5:** Make it easy for an AI client to ask the right question: "what does changing this derived field affect?"

Options:

- Add `projection_impact` as a new pack mode.
- Add a dedicated CLI command such as `gather-step projection impact <target>`.
- Add an MCP tool such as `projection_impact` if its response shape differs materially from existing packs.

Preferred first cut:

- Add a dedicated CLI/MCP tool returning a projection-specific response.
- Feed a short projection summary into `planning_pack` and `change_impact_pack` when relevant.

Response should include:

- `target`
- `projection_fields`
- `source_fields`
- `writers`
- `readers`
- `filters`
- `indexes`
- `migrations`
- `event_links`
- `risks`
- `next_steps`
- `confidence`
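
The response can be sketched as a Rust struct, assuming string lists for each section. Field names follow the list above. The types are assumptions, and so is `planning_summary`, a short digest that could feed `planning_pack` and `change_impact_pack` without bloating them.

```rust
/// Hypothetical response for the dedicated projection-impact tool.
#[derive(Debug, Default, Clone)]
pub struct ProjectionImpactResponse {
    pub target: String,
    pub projection_fields: Vec<String>,
    pub source_fields: Vec<String>,
    pub writers: Vec<String>,
    pub readers: Vec<String>,
    pub filters: Vec<String>,
    pub indexes: Vec<String>,
    pub migrations: Vec<String>,
    pub event_links: Vec<String>,
    pub risks: Vec<String>,
    pub next_steps: Vec<String>,
    /// 0.0..=1.0; how it aggregates edge evidence is an open design choice.
    pub confidence: f32,
}

impl ProjectionImpactResponse {
    /// One-line digest for existing packs: a hint, not the full response.
    pub fn planning_summary(&self) -> String {
        let risks = if self.risks.is_empty() {
            "none".to_string()
        } else {
            self.risks.join(", ")
        };
        format!(
            "`{}` is a projection of [{}]; {} writer(s), {} reader(s); risks: {}",
            self.target,
            self.source_fields.join(", "),
            self.writers.len(),
            self.readers.len(),
            risks
        )
    }
}
```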

→ verify: MCP schema snapshot or response-shape tests cover the new response without bloating existing pack responses.

## Phase 6: REG-14003-Like Planning Guardrail

Add a planning guardrail that detects when a user-facing field is likely a backend projection and nudges agents toward projection impact before planning implementation.

**ELI5:** If the user says "datagrid count is wrong," Gather Step should say "this looks like a derived backend value; inspect the projection chain first."

Guardrail triggers:

- Field names ending in `Count`, `Total`, or `Status`, names starting with `has`, or other derived-sounding names.
- Frontend reads a field with no local computation.
- Backend mapper returns the same field.
- Field also appears in filters/search/index paths.
- Field is maintained by event handlers or update operators.
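
One way to combine the triggers is a simple hit count. The two-hit threshold, `FieldSignals`, `derived_sounding`, and `should_suggest_projection_impact` are assumptions for illustration; real weights would be tuned against the oracle.

```rust
/// Hypothetical evidence for one user-facing field; each flag mirrors a trigger.
#[derive(Debug, Default)]
pub struct FieldSignals {
    pub frontend_reads_without_local_computation: bool,
    pub backend_mapper_returns_field: bool,
    pub in_filter_search_or_index_paths: bool,
    pub maintained_by_events_or_update_operators: bool,
}

/// Name trigger: `*Count`, `*Total`, `*Status`, or `hasX` (camelCase after `has`).
pub fn derived_sounding(name: &str) -> bool {
    let has_prefix =
        name.starts_with("has") && name[3..].starts_with(|c: char| c.is_ascii_uppercase());
    has_prefix || ["Count", "Total", "Status"].iter().any(|s| name.ends_with(*s))
}

/// Counts triggers; two or more means the planner should run projection impact
/// before proposing frontend changes.
pub fn should_suggest_projection_impact(name: &str, s: &FieldSignals) -> bool {
    let hits = [
        derived_sounding(name),
        s.frontend_reads_without_local_computation,
        s.backend_mapper_returns_field,
        s.in_filter_search_or_index_paths,
        s.maintained_by_events_or_update_operators,
    ]
    .iter()
    .filter(|&&hit| hit)
    .count();
    hits >= 2
}
```

With this threshold, two signals are enough to trigger the nudge: a derived-sounding name and a frontend read with no local computation. That matches the "datagrid count is wrong" case.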

→ verify: planning pack for the oracle target includes projection impact next steps and does not over-focus on the frontend reader.

## Phase 7: Documentation And Operator Workflow

Document when to use projection impact and how to interpret risks.

**ELI5:** Give agents and humans a checklist for derived-data bugs: source, projection, existing data, indexes, rollout.

Docs to update:

- README feature list.
- MCP tool reference.
- CLI reference.
- Context-pack/operator workflow docs.
- Memory-backed planning docs, with Braingent as the prior-learning source and Gather Step as the current-code graph source.

→ verify: docs mention scope boundaries and do not imply runtime DB inspection or automatic migration generation.

## Rollout Strategy

1. Ship the CLI command as hidden/experimental, discoverable only through the docs.
   → verify: oracle passes; no existing pack behavior changes.

2. Enable the MCP tool once the response shape stabilizes.
   → verify: schema tests and budget tests pass.

3. Feed concise projection hints into `planning_pack`.
   → verify: existing planning oracles stay stable; the REG-14003-like oracle requires the projection hint.

4. Add more framework packs only after observing real misses.
   → verify: new framework support adds fixture/oracle coverage before extractor code.

## Test Plan

- Parser unit tests for field-path extraction and Mongo/search patterns.
- Analysis unit tests for projection-chain assembly and risk classification.
- Oracle scenario for REG-14003-like planning.
- MCP response-shape tests for projection impact.
- CLI integration test for `projection impact`.
- Regression pass for existing pack oracles.

Commands:

```bash
cargo test -p gather-step-parser projection
cargo test -p gather-step-analysis projection
cargo test -p gather-step-mcp projection
cargo test -p gather-step-cli --test pack_oracle projection_impact_reg_14003_like
cargo test --workspace
```

## Complexity Estimate

- MVP oracle + graph vocabulary + limited extractor: 1-2 weeks.
- REG-14003-grade Mongo/Nest/Atlas support: 2-4 weeks.
- General framework coverage across ORMs/databases: 1-2 months.

The main complexity is not the tool plumbing; it is extracting projection semantics reliably from real service code without noisy false positives.

## Open Questions

- Should `projection_impact` be a pack mode, a dedicated tool, or both?
- Should field-path nodes be global by path or scoped to entity/model?
- How should the tool distinguish DB indexes from search indexes in response shape?
- How much should Braingent learnings influence risk labels versus only planning prose?
- Should the first release support runtime schema snapshots, or keep strictly static-code only?

## Done Criteria

- REG-14003-like oracle fails without projection support and passes after implementation.
- Planning output identifies backend projection ownership before frontend change suggestions.
- Projection impact output surfaces source fields, writers, readers, filters, migrations, and index/search risks.
- Existing route/event/shared-contract packs do not regress.
- Documentation clearly states scope and limitations.