Commit 525bf4e

vmehtacode and claude committed
docs(01): create phase plan for GNN Verifier Foundation
Phase 01: gnn-verifier-foundation
- 3 plans in 3 waves (sequential)
- Plan 01: Setup PyG + Graph Builder
- Plan 02: Temporal Encoder + GAT Verifier Model
- Plan 03: Unit Tests + Integration Verification
- Ready for execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 815195e commit 525bf4e

4 files changed

Lines changed: 1402 additions & 1 deletion

.planning/ROADMAP.md

Lines changed: 8 additions & 1 deletion
@@ -12,6 +12,13 @@

 **Requirements:** GNN-01, GNN-02

+**Plans:** 3 plans
+
+Plans:
+- [ ] 01-01-PLAN.md — Setup PyG + Graph Builder (SSEN metadata to PyG Data)
+- [ ] 01-02-PLAN.md — Temporal Encoder + GAT Verifier Model
+- [ ] 01-03-PLAN.md — Unit Tests + Integration Verification
+
 **Success Criteria:**
 1. Graph construction pipeline transforms SSEN metadata into PyTorch Geometric Data batches with nodes (households, feeders, substations) and edges (physical topology)
 2. GNN model (GAT/GraphSAGE + temporal layer) processes graph-structured input and outputs per-node anomaly scores
@@ -75,7 +82,7 @@

 | Phase | Name | Status | Requirements | Coverage |
 |-------|------|--------|--------------|----------|
-| 1 | GNN Verifier Foundation | Not started | GNN-01, GNN-02 | 2/10 |
+| 1 | GNN Verifier Foundation | Planned | GNN-01, GNN-02 | 2/10 |
 | 2 | Hybrid Verifier Integration | Not started | GNN-03, ENS-01, ENS-02 | 3/10 |
 | 3 | Graph-Aware Proposer | Not started | SELF-01, SELF-02 | 2/10 |
 | 4 | Evaluation Framework | Not started | EVAL-01, EVAL-02, EVAL-03 | 3/10 |
Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
---
phase: 01-gnn-verifier-foundation
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - pyproject.toml
  - src/fyp/gnn/__init__.py
  - src/fyp/gnn/graph_builder.py
autonomous: true

must_haves:
  truths:
    - "PyTorch Geometric is installed and importable"
    - "SSEN feeder metadata can be transformed into PyG Data objects"
    - "Graph has three node types: substations, feeders, households"
    - "Edges are bidirectional representing physical connectivity"
  artifacts:
    - path: "src/fyp/gnn/__init__.py"
      provides: "GNN module initialization"
      exports: ["GridGraphBuilder"]
    - path: "src/fyp/gnn/graph_builder.py"
      provides: "SSEN metadata to PyG Data transformation"
      min_lines: 100
      contains: "class GridGraphBuilder"
  key_links:
    - from: "src/fyp/gnn/graph_builder.py"
      to: "torch_geometric.data.Data"
      via: "import and instantiation"
      pattern: "from torch_geometric.data import Data"
    - from: "src/fyp/gnn/graph_builder.py"
      to: "src/fyp/ingestion/ssen_ingestor.py"
      via: "compatible data format"
      pattern: "feeder_id|secondary_substation|primary_substation"
---

<objective>
Set up the PyTorch Geometric dependency and implement the graph construction pipeline that transforms SSEN feeder/substation metadata into PyG Data objects.

Purpose: Establishes the foundation for GNN-based anomaly detection by representing the UK distribution network as a graph structure that captures physical topology.

Output: Working graph builder that produces properly formatted PyG Data with a three-level node hierarchy.
</objective>

<execution_context>
@/Users/vatsalmehta/.claude/get-shit-done/workflows/execute-plan.md
@/Users/vatsalmehta/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-gnn-verifier-foundation/01-CONTEXT.md
@.planning/phases/01-gnn-verifier-foundation/01-RESEARCH.md

# Existing code to understand SSEN data format
@src/fyp/ingestion/ssen_ingestor.py
</context>
62+
<tasks>
63+
64+
<task type="auto">
65+
<name>Task 1: Add PyTorch Geometric dependency</name>
66+
<files>pyproject.toml</files>
67+
<action>
68+
Add torch-geometric ^2.7.0 to the project dependencies in pyproject.toml under [tool.poetry.dependencies].
69+
70+
Place it in the "Machine learning and statistics" section after torch.
71+
72+
Run `poetry add torch-geometric` to install and update the lock file.
73+
74+
Do NOT install torch-scatter or torch-sparse yet - these are optional performance optimizations.
75+
</action>
76+
<verify>
77+
```bash
78+
poetry run python -c "import torch_geometric; print(f'PyG version: {torch_geometric.__version__}')"
79+
```
80+
Should print PyG version 2.7.x without errors.
81+
</verify>
82+
<done>torch-geometric is in pyproject.toml and importable in the project environment</done>
83+
</task>
84+
85+
<task type="auto">
86+
<name>Task 2: Create GNN module structure</name>
87+
<files>src/fyp/gnn/__init__.py</files>
88+
<action>
89+
Create the new `src/fyp/gnn/` directory and its `__init__.py`.
90+
91+
The __init__.py should:
92+
1. Import and export GridGraphBuilder from graph_builder (will be created next)
93+
2. Define __all__ with public exports
94+
3. Include module docstring explaining GNN verifier purpose
95+
96+
Module docstring should mention:
97+
- Graph-based anomaly detection for UK distribution networks
98+
- Uses SSEN topology (substations -> feeders -> households)
99+
- PyTorch Geometric-based implementation
100+
</action>
101+
<verify>
102+
```bash
103+
ls -la src/fyp/gnn/
104+
```
105+
Directory exists with __init__.py file.
106+
</verify>
107+
<done>src/fyp/gnn/ module exists with proper __init__.py</done>
108+
</task>

<task type="auto">
<name>Task 3: Implement GridGraphBuilder</name>
<files>src/fyp/gnn/graph_builder.py</files>
<action>
Implement GridGraphBuilder class that transforms SSEN metadata into PyG Data objects.

**Class structure:**
```python
class GridGraphBuilder:
    """Build PyG graphs from SSEN distribution network topology."""

    def __init__(self, exclude_incomplete: bool = True):
        """Initialize builder.

        Args:
            exclude_incomplete: If True, skip nodes with missing metadata (recommended)
        """

    def build_from_metadata(
        self,
        metadata_df: pd.DataFrame,
        node_features: dict[str, torch.Tensor] | None = None,
    ) -> Data:
        """Build graph from SSEN metadata DataFrame.

        Expects columns: lv_feeder_id, secondary_substation_id, primary_substation_id
        Optional: total_mpan_count, postcode, etc.

        Returns:
            PyG Data with:
            - x: node features [num_nodes, num_features]
            - edge_index: COO format [2, num_edges]
            - node_type: 0=substation (primary), 1=feeder (secondary substation), 2=household level (LV feeder)
            - node_ids: original IDs for each node
        """

    def build_from_parquet(self, parquet_path: Path | str) -> Data:
        """Convenience method to load from SSEN metadata parquet."""
```

**Implementation details:**

1. **Node extraction:**
   - Extract unique primary_substation_id -> type 0
   - Extract unique secondary_substation_id -> type 1 (these are "feeders" in the hierarchy)
   - Extract unique lv_feeder_id -> type 2 (these connect to households)
   - Create node_to_idx mapping

2. **Edge construction (COO format):**
   - primary_substation <-> secondary_substation (bidirectional)
   - secondary_substation <-> lv_feeder (bidirectional)
   - Use torch.tensor(...).t().contiguous() pattern from research

3. **Node features (if not provided):**
   - Default: one-hot node type encoding [3 dims]
   - Add log(total_mpan_count + 1) if available [1 dim]
   - Total default: 4-dimensional features

4. **Data object:**
   - Always set num_nodes explicitly (handles isolated nodes)
   - Store node_ids as string list for reverse lookup
   - Store node_type as torch.long tensor

5. **Error handling:**
   - Log warning for missing columns
   - Skip rows with NaN in required ID columns
   - Track and log number of nodes/edges created
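
The indexing, edge, and feature steps above can be sketched in plain Python, without torch or PyG, to make the expected shapes concrete. This is only an illustration under stated assumptions: `build_plain_graph` and `LEVELS` are hypothetical names, the rows reuse the mock values from this plan's verify step, and attaching `total_mpan_count` only to LV feeder nodes (with 0 elsewhere) is an assumption the plan does not spell out.

```python
import math

# Hypothetical rows using the SSEN metadata column names from the plan.
rows = [
    {"primary_substation_id": "PS1", "secondary_substation_id": "SS1",
     "lv_feeder_id": "LV1", "total_mpan_count": 50},
    {"primary_substation_id": "PS1", "secondary_substation_id": "SS1",
     "lv_feeder_id": "LV2", "total_mpan_count": 30},
    {"primary_substation_id": "PS1", "secondary_substation_id": "SS2",
     "lv_feeder_id": "LV3", "total_mpan_count": 20},
]

# Column order defines the node type: 0, 1, 2.
LEVELS = ["primary_substation_id", "secondary_substation_id", "lv_feeder_id"]


def build_plain_graph(rows):
    """Return (node_ids, node_type, edge_index, x) as plain Python lists."""
    node_to_idx = {}
    node_ids, node_type = [], []
    # 1. Node extraction: one index per unique (type, ID) pair, so equal IDs
    #    at different levels cannot collide.
    for type_id, column in enumerate(LEVELS):
        for row in rows:
            key = (type_id, row[column])
            if key not in node_to_idx:
                node_to_idx[key] = len(node_ids)
                node_ids.append(row[column])
                node_type.append(type_id)

    # 2. Edge construction: collect each physical link once, then emit it
    #    in both directions.
    links = set()
    for row in rows:
        ps = node_to_idx[(0, row["primary_substation_id"])]
        ss = node_to_idx[(1, row["secondary_substation_id"])]
        lv = node_to_idx[(2, row["lv_feeder_id"])]
        links.add((ps, ss))
        links.add((ss, lv))
    pairs = sorted(links)
    sources = [u for u, v in pairs] + [v for u, v in pairs]
    targets = [v for u, v in pairs] + [u for u, v in pairs]
    edge_index = [sources, targets]  # COO layout: [2, num_edges]

    # 3. Default features: one-hot type (3 dims) + log(count + 1) (1 dim).
    #    Assumption: counts belong to LV feeder nodes; other nodes get 0.
    mpan = {node_to_idx[(2, r["lv_feeder_id"])]: r["total_mpan_count"] for r in rows}
    x = []
    for idx, type_id in enumerate(node_type):
        one_hot = [1.0 if type_id == t else 0.0 for t in range(3)]
        x.append(one_hot + [math.log(mpan.get(idx, 0) + 1)])
    return node_ids, node_type, edge_index, x


node_ids, node_type, edge_index, x = build_plain_graph(rows)
print(len(node_ids), len(edge_index[0]), len(x[0]))  # 6 10 4
```

With the mock rows there are 5 physical links (2 primary-to-secondary, 3 secondary-to-LV), hence 10 directed edges. In the real builder the two index lists would be handed to `torch.tensor` instead of being kept as Python lists.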

**Key patterns to follow (from research):**
```python
# COO format construction
edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()

# Always explicit num_nodes
data = Data(x=x, edge_index=edge_index, num_nodes=x.size(0))
```

**Anti-patterns to avoid:**
- Do NOT use adjacency matrix format
- Do NOT forget .contiguous() after transpose
- Do NOT let PyG infer num_nodes from edge_index (breaks isolated nodes)
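
To see why the last anti-pattern matters, the sketch below mimics inferring the node count from the edge list alone as "largest index + 1". It is plain Python for illustration, not PyG's own code; `infer_num_nodes` and the sample IDs are hypothetical.

```python
def infer_num_nodes(edge_index):
    """Guess the node count from edges only: largest index seen, plus one."""
    sources, targets = edge_index
    if not sources:
        return 0
    return max(max(sources), max(targets)) + 1


# Four nodes, but the last one (index 3) has no edges, for example a
# substation whose feeder rows were skipped because of missing IDs.
node_ids = ["PS1", "SS1", "LV1", "PS2"]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]

inferred = infer_num_nodes(edge_index)
explicit = len(node_ids)
print(inferred, explicit)  # 3 4
```

The inferred count silently drops the isolated node, which is the failure the explicit `num_nodes` argument is meant to prevent.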
191+
</action>
192+
<verify>
193+
```bash
194+
poetry run python -c "
195+
from fyp.gnn import GridGraphBuilder
196+
import pandas as pd
197+
198+
# Test with minimal mock data
199+
df = pd.DataFrame({
200+
'primary_substation_id': ['PS1', 'PS1', 'PS1'],
201+
'secondary_substation_id': ['SS1', 'SS1', 'SS2'],
202+
'lv_feeder_id': ['LV1', 'LV2', 'LV3'],
203+
'total_mpan_count': [50, 30, 20],
204+
})
205+
206+
builder = GridGraphBuilder()
207+
data = builder.build_from_metadata(df)
208+
print(f'Nodes: {data.num_nodes}')
209+
print(f'Edges: {data.edge_index.size(1)}')
210+
print(f'Node types: {data.node_type.unique().tolist()}')
211+
print(f'Features shape: {data.x.shape}')
212+
assert data.num_nodes == 6 # 1 PS + 2 SS + 3 LV
213+
assert 0 in data.node_type.tolist() # has substations
214+
assert 2 in data.node_type.tolist() # has feeders
215+
print('SUCCESS')
216+
"
217+
</verify>
218+
<done>GridGraphBuilder correctly transforms SSEN metadata into PyG Data with proper node hierarchy and bidirectional edges</done>
219+
</task>

</tasks>

<verification>
After all tasks complete:

1. **Dependency verification:**
```bash
poetry run python -c "import torch_geometric; from torch_geometric.data import Data; print('OK')"
```

2. **Module import verification:**
```bash
poetry run python -c "from fyp.gnn import GridGraphBuilder; print('OK')"
```

3. **Graph structure verification with real SSEN data (if available):**
```bash
poetry run python -c "
from fyp.gnn import GridGraphBuilder
from pathlib import Path

parquet_path = Path('data/processed/ssen_metadata.parquet')
if parquet_path.exists():
    builder = GridGraphBuilder()
    data = builder.build_from_parquet(parquet_path)
    print(f'Real SSEN graph: {data.num_nodes} nodes, {data.edge_index.size(1)} edges')
else:
    print('SSEN metadata not available, skipping real data test')
"
```
</verification>

<success_criteria>
- [ ] torch-geometric ^2.7.0 in pyproject.toml and importable
- [ ] src/fyp/gnn/ module exists with proper structure
- [ ] GridGraphBuilder transforms SSEN metadata to PyG Data
- [ ] Graph has correct three-level hierarchy (substation -> feeder -> household)
- [ ] Edges are bidirectional (edge count = 2 * physical connections)
- [ ] Node features include type encoding and optional metadata
- [ ] num_nodes explicitly set (handles isolated nodes)
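
The bidirectionality criterion could be checked mechanically along the following lines. This is a plain-Python sketch over a `[2, num_edges]` list of lists; `is_bidirectional` and `physical_link_count` are hypothetical helpers, not part of the planned module. For a PyG tensor, `data.edge_index.tolist()` yields the same layout.

```python
def is_bidirectional(edge_index):
    """True if every directed edge (u, v) has a matching reverse edge (v, u)."""
    sources, targets = edge_index
    directed = set(zip(sources, targets))
    return all((v, u) in directed for u, v in directed)


def physical_link_count(edge_index):
    """Count undirected links by collapsing (u, v) and (v, u) into one."""
    sources, targets = edge_index
    return len({frozenset((u, v)) for u, v in zip(sources, targets)})


# Edge list for the 6-node mock graph: 5 physical links, both directions.
edge_index = [
    [0, 0, 1, 1, 2, 1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5, 0, 0, 1, 1, 2],
]
num_edges = len(edge_index[0])
print(is_bidirectional(edge_index), num_edges == 2 * physical_link_count(edge_index))

# A one-way edge fails the check.
print(is_bidirectional([[0], [1]]))
```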
</success_criteria>

<output>
After completion, create `.planning/phases/01-gnn-verifier-foundation/01-01-SUMMARY.md`
</output>