Skip to content

Commit 0ae8c80

Browse files
docs: update design docs to reflect actual implementation
- Add prune() method to both spec and design docs - Rename _propagate_to_children → _propagate_restrictions + _apply_propagation_rule - Fix delete() part_integrity: post-check with rollback, not pre-check - Add _part_integrity instance attribute - Update files affected, verification, and implementation phases - Mark open questions as resolved with actual decisions - Mark export/restore as future work Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 0bede1d commit 0ae8c80

File tree

2 files changed

+116
-88
lines changed

2 files changed

+116
-88
lines changed

docs/design/restricted-diagram-spec.md

Lines changed: 78 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,7 @@ self._connection # Connection — stored during __init__
3434
self._cascade_restrictions # dict[str, list] — per-node OR restrictions
3535
self._restrict_conditions # dict[str, AndList] — per-node AND restrictions
3636
self._restriction_attrs # dict[str, set] — restriction attribute names per node
37+
self._part_integrity # str — "enforce", "ignore", or "cascade"
3738
```
3839

3940
Initialized empty in `__init__`. Copied in the copy constructor (`Diagram(other_diagram)`).
@@ -105,29 +106,34 @@ rd = dj.Diagram(schema).restrict(Session & cond).restrict(Stimulus & cond2)
105106
1. Verify no existing cascade restrictions (raise if present)
106107
2. Same algorithm as `cascade` but accumulates into `_restrict_conditions` using `AndList`
107108

108-
### `_propagate_to_children(self, parent_node, mode)`
109+
### `_propagate_restrictions(self, start_node, mode, part_integrity="enforce")`
109110

110-
Internal. Propagates restriction from one node to its children.
111+
Internal. Propagates restrictions from `start_node` to all its descendants in topological order. Only processes descendants of `start_node` to avoid duplicate propagation when chaining `restrict()`.
111112

112-
For each `out_edge(parent_node)`:
113+
Uses multiple passes (up to 10) to handle `part_integrity="cascade"` upward propagation, which can add new restricted nodes that need further propagation.
113114

114-
1. Get `child_name, edge_props` from edge
115-
2. If child is an alias node (`.isdigit()`), follow through to the real child
116-
3. Get `attr_map`, `aliased` from `edge_props`
117-
4. Build parent `FreeTable` with current restriction
118-
5. Compute child restriction using propagation rules:
115+
For each restricted node, iterates over `out_edges(node)`:
116+
117+
1. If target is an alias node (`.isdigit()`), follow through to real child via `out_edges(alias_node)`
118+
2. Delegate to `_apply_propagation_rule()` for the actual restriction computation
119+
3. Track propagated edges to avoid duplicate work
120+
4. Handle `part_integrity="cascade"`: if child is a part table and its master is not already restricted, propagate upward from part to master using `make_condition(master, (master.proj() & part.proj()).to_arrays(), ...)`, expand the allowed node set, and continue to next pass
121+
122+
### `_apply_propagation_rule(self, parent_ft, parent_attrs, child_node, attr_map, aliased, mode, restrictions)`
123+
124+
Internal. Applies one of three propagation rules to a parent→child edge:
119125

120126
| Condition | Child restriction |
121127
|-----------|-------------------|
122128
| Non-aliased AND `parent_restriction_attrs ⊆ child.primary_key` | Copy parent restriction directly |
123129
| Aliased FK (`attr_map` renames columns) | `parent_ft.proj(**{fk: pk for fk, pk in attr_map.items()})` |
124130
| Non-aliased AND `parent_restriction_attrs ⊄ child.primary_key` | `parent_ft.proj()` |
125131

126-
6. Accumulate on child:
127-
- `cascade` mode: `_cascade_restrictions[child].extend(child_restr)` — list = OR
128-
- `restrict` mode: `_restrict_conditions[child].extend(child_restr)` — AndList = AND
132+
Accumulates on child:
133+
- `cascade` mode: `restrictions.setdefault(child, []).extend(...)` — list = OR
134+
- `restrict` mode: `restrictions.setdefault(child, AndList()).extend(...)` — AndList = AND
129135

130-
7. Handle `part_integrity="cascade"`: if child is a part table and its master is not already restricted, propagate upward from part to master using `make_condition(master, (master.proj() & part.proj()).to_arrays(), ...)`, then re-propagate from the master.
136+
Updates `_restriction_attrs` for the child with the relevant attribute names.
131137

132138
### `delete(self, transaction=True, prompt=None) -> int`
133139

@@ -140,15 +146,16 @@ rd.delete()
140146

141147
**Algorithm:**
142148

143-
1. Pre-check `part_integrity="enforce"`: for each node in `_cascade_restrictions`, if it's a part table and its master is not restricted, raise `DataJointError`
144-
2. Get nodes with restrictions in topological order
145-
3. If `prompt`: show preview (table name + row count for each)
146-
4. Start transaction (if `transaction=True`)
147-
5. Iterate in **reverse** topological order (leaves first):
149+
1. Get non-alias nodes with restrictions in topological order
150+
2. If `prompt`: show preview (table name + row count for each)
151+
3. Start transaction (if `transaction=True`)
152+
4. Iterate in **reverse** topological order (leaves first):
148153
- `ft = FreeTable(conn, table_name)`
149-
- `ft._restriction = self._cascade_restrictions[table_name]`
154+
- `ft.restrict_in_place(self._cascade_restrictions[table_name])`
150155
- `ft.delete_quick(get_count=True)`
151-
6. On `IntegrityError`: diagnostic fallback — parse FK error for actionable message about unloaded schemas
156+
- Track which tables had rows deleted
157+
5. On `IntegrityError`: cancel transaction, diagnostic fallback — parse FK error for actionable message about unloaded schemas
158+
6. Post-check `part_integrity="enforce"`: if any part table had rows deleted but its master did not, cancel transaction and raise `DataJointError`
152159
7. Confirm/commit transaction (same logic as current `Table.delete`)
153160
8. Return count from the root table
154161

@@ -180,6 +187,34 @@ rd.preview() # logs and returns {table_name: count}
180187

181188
Returns dict of `{full_table_name: row_count}` for each node that has a cascade or restrict restriction.
182189

190+
### `prune(self) -> Diagram`
191+
192+
Removes tables with zero matching rows from the diagram. Returns a new `Diagram`.
193+
194+
```python
195+
# Unrestricted: remove physically empty tables
196+
dj.Diagram(schema).prune()
197+
198+
# After restrict: remove tables with zero matching rows
199+
dj.Diagram(schema).restrict(Session & cond).prune()
200+
```
201+
202+
**Algorithm:**
203+
204+
1. `result = Diagram(self)` — copy
205+
2. If restrictions exist (`_cascade_restrictions` or `_restrict_conditions`):
206+
- For each restricted node, build `FreeTable` with restriction applied
207+
- If `len(ft) == 0`: remove node from restrictions dict, `_restriction_attrs`, and `nodes_to_show`
208+
3. If no restrictions (unrestricted diagram):
209+
- For each node in `nodes_to_show`, check `len(FreeTable(conn, node))`
210+
- If 0: remove from `nodes_to_show`
211+
4. Return `result`
212+
213+
**Properties:**
214+
- Idempotent — pruning twice yields the same result
215+
- Chainable — `restrict()` can be called after `prune()`
216+
- Skips alias nodes (`.isdigit()`)
217+
183218
### Visualization methods (gated)
184219

185220
All existing visualization methods (`draw`, `make_dot`, `make_svg`, `make_png`, `make_image`, `make_mermaid`, `save`, `_repr_svg_`) raise `DataJointError("Install matplotlib and pygraphviz...")` when `diagram_active is False`. When active, they work as before.
@@ -307,23 +342,24 @@ For `_restrict_conditions`: values are `AndList` (AND). Each `.restrict()` call
307342

308343
| File | Change |
309344
|------|--------|
310-
| `src/datajoint/diagram.py` | Restructure: single `Diagram(nx.DiGraph)` class, gate only visualization. Add `_connection`, restriction dicts, `cascade()`, `restrict()`, `_propagate_to_children()`, `delete()`, `drop()`, `preview()`, `_from_table()` |
345+
| `src/datajoint/diagram.py` | Restructure: single `Diagram(nx.DiGraph)` class, gate only visualization. Add `_connection`, restriction dicts, `_part_integrity`, `cascade()`, `restrict()`, `_propagate_restrictions()`, `_apply_propagation_rule()`, `delete()`, `drop()`, `preview()`, `prune()`, `_from_table()` |
311346
| `src/datajoint/table.py` | Rewrite `Table.delete()` (~200 → ~10 lines), `Table.drop()` (~35 → ~10 lines). Remove error-driven cascade code |
312347
| `src/datajoint/user_tables.py` | `Part.drop()`: pass `part_integrity` through to `super().drop()` |
313-
| `tests/integration/test_diagram_operations.py` | **New**tests for `cascade`, `delete`, `drop`, `preview` |
348+
| `tests/integration/test_erd.py` | Add 5 `prune()` integration tests: unrestricted, after restrict, after cascade, idempotency, prune-then-restrict chaining |
314349

315350
## Verification
316351

352+
All phases complete. Tests passing:
353+
317354
1. All existing tests pass unchanged:
318-
- `pytest tests/integration/test_cascading_delete.py -v`
319-
- `pytest tests/integration/test_cascade_delete.py -v`
320-
- `pytest tests/integration/test_erd.py -v`
321-
2. New tests pass: `pytest tests/integration/test_diagram_operations.py -v`
322-
3. Manual: `(Session & 'subject_id=1').delete()` behaves identically
323-
4. Manual: `dj.Diagram(schema).cascade(Session & cond).preview()` shows correct counts
324-
5. `dj.Diagram` works without matplotlib/pygraphviz for operational methods
355+
- `pytest tests/integration/test_cascading_delete.py -v` — 12 tests
356+
- `pytest tests/integration/test_cascade_delete.py -v` — 6 tests (3 MySQL + 3 PostgreSQL)
357+
- `pytest tests/integration/test_erd.py -v` — 7 existing + 5 new prune tests
358+
2. Manual: `(Session & 'subject_id=1').delete()` behaves identically
359+
3. Manual: `dj.Diagram(schema).cascade(Session & cond).preview()` shows correct counts
360+
4. `dj.Diagram` works without matplotlib/pygraphviz for operational methods
325361

326-
## Open questions resolved
362+
## Resolved design decisions
327363

328364
| Question | Resolution |
329365
|----------|------------|
@@ -334,23 +370,23 @@ For `_restrict_conditions`: values are `AndList` (AND). Each `.restrict()` call
334370
| Upward cascade scope? | Master's restriction propagates to all its descendants (natural from re-running propagation) |
335371
| Can cascade and restrict be mixed? | No. Mutually exclusive modes. `cascade` applied once; `restrict` chainable |
336372

337-
## Implementation phases
373+
## Implementation phases (all complete)
338374

339-
### Phase 1: Restructure `Diagram` class
340-
Remove the `if/else` gate. Single class. Gate only visualization methods.
341-
Store `_connection` and restriction dicts. Adjust copy constructor.
375+
### Phase 1: Restructure `Diagram` class
376+
Single class. Gate only visualization methods.
377+
Store `_connection`, restriction dicts, `_part_integrity`. Copy constructor copies all.
342378

343-
### Phase 2: Restriction propagation
344-
`cascade()`, `restrict()`, `_propagate_to_children()`.
379+
### Phase 2: Restriction propagation
380+
`cascade()`, `restrict()`, `_propagate_restrictions()`, `_apply_propagation_rule()`.
345381
Propagation rules, alias node handling, `part_integrity="cascade"` upward propagation.
346382

347-
### Phase 3: Diagram operations
348-
`delete()`, `drop()`, `preview()`, `_from_table()`.
383+
### Phase 3: Diagram operations
384+
`delete()`, `drop()`, `preview()`, `prune()`, `_from_table()`.
349385
Diagnostic fallback for unloaded schemas. Transaction handling.
350386

351-
### Phase 4: Migrate `Table.delete()` and `Table.drop()`
352-
Rewrite to delegate to `Diagram`. Update `Part.drop()`.
353-
Remove dead cascade code from `table.py`.
387+
### Phase 4: Migrate `Table.delete()` and `Table.drop()`
388+
Rewritten to delegate to `Diagram`. Updated `Part.drop()`.
389+
Dead cascade code removed from `table.py`.
354390

355-
### Phase 5: Tests
356-
Run existing tests. Add `test_diagram_operations.py`.
391+
### Phase 5: Tests
392+
Existing tests pass. 5 prune integration tests added to `test_erd.py`.

docs/design/restricted-diagram.md

Lines changed: 38 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -276,7 +276,17 @@ rd = (dj.Diagram(schema)
276276
.restrict(Session & 'subject_id=1')
277277
.restrict(Stimulus & 'type="visual"'))
278278
rd.preview() # show selected tables and row counts
279-
rd.export(path) # includes upstream context, AND at convergence
279+
280+
# prune: remove tables with zero matching rows
281+
rd = (dj.Diagram(schema)
282+
.restrict(Subject & {'species': 'mouse'})
283+
.restrict(Session & 'session_date > "2024-01-01"')
284+
.prune())
285+
rd.preview() # only tables with matching rows
286+
rd # visualize the export subgraph
287+
288+
# unrestricted prune: remove physically empty tables
289+
dj.Diagram(schema).prune()
280290

281291
# drop: no restriction, drops entire tables
282292
rd = dj.Diagram(Session) # Session + all descendants
@@ -296,9 +306,6 @@ rd.delete()
296306
Session.drop()
297307
# equivalent to:
298308
# dj.Diagram(Session).drop()
299-
300-
# Preview and visualization
301-
rd.draw() # visualize with restricted/affected nodes highlighted
302309
```
303310

304311
## Advantages over current approach
@@ -313,64 +320,49 @@ rd.draw() # visualize with restricted/affected nodes highlighted
313320
| Reusability | Delete-only | Delete, export, backup, sharing |
314321
| Inspectability | Opaque recursive cascade | Preview affected data before executing |
315322

316-
## Implementation plan
323+
## Implementation status
317324

318-
### Phase 1: RestrictedDiagram core
325+
### Phase 1: Diagram restructure and restriction propagation ✓
319326

320-
1. Add `_cascade_restrictions` and `_restrict_conditions` to `Diagram` — per-node restriction storage
321-
2. Implement `_propagate_downstream(mode)` — walk edges in topo order, compute child restrictions via `attr_map`, accumulate as OR (cascade) or AND (restrict)
322-
3. Implement `cascade(table_expr)` — OR propagation entry point
323-
4. Implement `restrict(table_expr)` — AND propagation entry point
324-
5. Handle alias nodes during propagation (always OR for multiple FK paths from same parent)
325-
6. Handle `part_integrity` during cascade propagation (upward cascade from part to master)
327+
Single `Diagram(nx.DiGraph)` class with `_cascade_restrictions`, `_restrict_conditions`, `_restriction_attrs`, `_part_integrity`. `cascade()`, `restrict()`, `_propagate_restrictions()`, `_apply_propagation_rule()`. Alias node handling, `part_integrity="cascade"` upward propagation.
326328

327-
### Phase 2: Graph-driven delete and drop
329+
### Phase 2: Graph-driven operations ✓
328330

329-
1. Implement `Diagram.delete()` — reverse topo order, `delete_quick()` at each cascade-restricted node
330-
2. Implement `Diagram.drop()` — reverse topo order, `drop_quick()` at each node (no restrictions)
331-
3. Shared: unloaded-schema fallback error handling, `part_integrity` pre-checks
332-
4. Migrate `Table.delete()` to construct a diagram + `cascade()` internally
333-
5. Migrate `Table.drop()` to construct a diagram + `drop()` internally
334-
6. Preserve `Part.delete()` and `Part.drop()` behavior with diagram-based `part_integrity`
335-
7. Remove error-message parsing from the delete critical path (retain as diagnostic fallback)
331+
`delete()`, `drop()`, `preview()`, `prune()`, `_from_table()`. Unloaded-schema fallback error handling. `Table.delete()` and `Table.drop()` rewritten to delegate to `Diagram`. Dead cascade code removed.
336332

337-
### Phase 3: Preview and visualization
333+
### Phase 3: Tests ✓
338334

339-
1. `Diagram.preview()` — show restricted nodes with row counts
340-
2. `Diagram.draw()` — highlight restricted nodes, show restriction labels
335+
All existing tests pass. 5 prune integration tests added to `test_erd.py`.
341336

342337
### Phase 4: Export and backup (future, #864/#560)
343338

344-
1. `Diagram.export(path)` — forward topo order, fetch + write at each restrict-restricted node
345-
2. Upward pass to include referenced parent rows (referential context)
346-
3. `Diagram.restore(path)` — forward topo order, insert at each node
339+
Not yet implemented. See "Future work" above.
347340

348-
## Files affected
341+
## Files changed
349342

350343
| File | Change |
351344
|------|--------|
352-
| `src/datajoint/diagram.py` | Add `cascade()`, `restrict()`, `_propagate_downstream()`, `delete()`, `drop()`, `preview()` |
353-
| `src/datajoint/table.py` | Rewrite `Table.delete()` and `Table.drop()` to use diagram internally |
354-
| `src/datajoint/user_tables.py` | Update `Part.delete()` and `Part.drop()` to use diagram-based part_integrity |
355-
| `src/datajoint/dependencies.py` | Possibly add helper methods for edge traversal with attr_map |
356-
| `tests/integration/test_cascading_delete.py` | Update tests, add graph-driven cascade tests |
357-
| `tests/integration/test_diagram.py` | New tests for restricted diagram |
345+
| `src/datajoint/diagram.py` | Single `Diagram(nx.DiGraph)` class with `cascade()`, `restrict()`, `_propagate_restrictions()`, `_apply_propagation_rule()`, `delete()`, `drop()`, `preview()`, `prune()`, `_from_table()` |
346+
| `src/datajoint/table.py` | `Table.delete()` (~200 → ~10 lines) and `Table.drop()` (~35 → ~10 lines) rewritten to delegate to `Diagram`. Dead cascade code removed |
347+
| `src/datajoint/user_tables.py` | `Part.drop()`: pass `part_integrity` through to `super().drop()` |
348+
| `tests/integration/test_erd.py` | 5 prune integration tests added |
358349

359-
## Open questions
350+
## Resolved design decisions
360351

361-
1. **Should `cascade()`/`restrict()` return a new object or mutate in place?**
362-
Returning a new object enables chaining (`diagram.restrict(A).restrict(B)`) and keeps the original diagram reusable. Mutating in place is simpler but prevents reuse.
352+
| Question | Resolution |
353+
|----------|------------|
354+
| Return new or mutate? | Return new `Diagram` — enables chaining and keeps original reusable |
355+
| Upward propagation scope? | Master's restriction propagates to all its descendants (natural from re-running `_propagate_restrictions`) |
356+
| Transaction boundaries? | Build diagram (read-only), preview, confirm, execute all deletes in one transaction |
357+
| Lazy vs eager propagation? | Eager — propagate when `cascade()`/`restrict()` is called. Restrictions are `QueryExpression` objects, not executed until `preview()`/`delete()` |
358+
| Export upward context? | Deferred to future work (Phase 4) |
363359

364-
2. **Upward propagation scope for `part_integrity="cascade"`:**
365-
When a restriction propagates up from part to master, should the master's restriction then propagate to the master's *other* parts and descendants? The current implementation does this (lines 1098–1108 of `table.py`). The diagram approach would naturally do the same — restricting the master triggers downstream propagation to all its children.
360+
## Future work
366361

367-
3. **Transaction boundaries:**
368-
The current `Table.delete()` wraps everything in a single transaction with user confirmation. The diagram-based delete should preserve this: build the restricted diagram (read-only), show preview, get confirmation, then execute all deletes in one transaction.
362+
### Export and backup (#864/#560)
369363

370-
4. **Lazy vs eager restriction propagation:**
371-
Eager: propagate all restrictions when `restrict()` is called (computes row counts immediately).
372-
Lazy: store parent restrictions and propagate during `delete()`/`export()` (defers queries).
373-
Eager is better for preview but may issue many queries upfront. Lazy is more efficient when the user just wants to delete without preview.
364+
Not yet implemented. Planned:
374365

375-
5. **Export: upward context scope.**
376-
When exporting, non-downstream tables should be included for referential integrity. How far upstream? Options: (a) all ancestors of restricted nodes, (b) only directly referenced parents, (c) full referential closure. Full closure is safest but may pull in large amounts of unrestricted data.
366+
- `Diagram.export(path)` — forward topo order, fetch + write at each restrict-restricted node
367+
- Upward pass to include referenced parent rows (referential context)
368+
- `Diagram.restore(path)` — forward topo order, insert at each node

0 commit comments

Comments
 (0)