|
1 | 1 | # What's New in DataJoint 2.2 |
2 | 2 |
|
3 | | -DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing. |
| 3 | +DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph. |
4 | 4 |
|
5 | 5 | > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive. |
6 | 6 |
|
@@ -201,9 +201,72 @@ class MyTable(dj.Manual): |
201 | 201 |
|
202 | 202 | Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema. |
203 | 203 |
|
| 204 | +## Graph-Driven Diagram Operations |
| 205 | + |
| 206 | +DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting. |
| 207 | + |
| 208 | +### From Visualization to Operations |
| 209 | + |
| 210 | +In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed. |
| 211 | + |
| 212 | +In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios. |
| 213 | + |
| 214 | +### The Preview-Then-Execute Pattern |
| 215 | + |
| 216 | +The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute: |
| 217 | + |
| 218 | +```python |
| 219 | +# Build the dependency graph |
| 220 | +diag = dj.Diagram(schema) |
| 221 | + |
| 222 | +# Apply cascade restriction — nothing is deleted yet |
| 223 | +restricted = diag.cascade(Session & {'subject_id': 'M001'}) |
| 224 | + |
| 225 | +# Inspect: what tables and how many rows would be affected? |
| 226 | +counts = restricted.preview() |
| 227 | +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} |
| 228 | + |
| 229 | +# Execute only after reviewing the blast radius |
| 230 | +restricted.delete(prompt=False) |
| 231 | +``` |
| 232 | + |
| 233 | +This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious. |
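The preview-then-execute idea can be illustrated without a database. The following is a minimal sketch in plain Python, not DataJoint's implementation: the `tables` and `foreign_keys` structures and the `preview_cascade` helper are hypothetical stand-ins. It shows that computing the affected rows is a read-only graph walk, separate from any deletion.

```python
from collections import deque

# Hypothetical in-memory tables; each row is a dict.
tables = {
    "session": [
        {"session_id": 1, "subject_id": "M001"},
        {"session_id": 2, "subject_id": "M001"},
        {"session_id": 3, "subject_id": "M002"},
    ],
    "trial": [
        {"trial_id": 10, "session_id": 1},
        {"trial_id": 11, "session_id": 1},
        {"trial_id": 12, "session_id": 3},
    ],
}

# Hypothetical dependency map: child table -> (parent table, shared key).
foreign_keys = {"trial": ("session", "session_id")}


def preview_cascade(start_table, predicate):
    """Return {table: affected row count} without modifying any table."""
    affected = {start_table: [r for r in tables[start_table] if predicate(r)]}
    queue = deque([start_table])
    while queue:
        parent = queue.popleft()
        for child, (ref_parent, key) in foreign_keys.items():
            if ref_parent != parent:
                continue
            # A child row is affected if it references an affected parent row.
            parent_keys = {r[key] for r in affected[parent]}
            affected[child] = [r for r in tables[child] if r[key] in parent_keys]
            queue.append(child)
    return {name: len(rows) for name, rows in affected.items()}


counts = preview_cascade("session", lambda r: r["subject_id"] == "M001")
print(counts)
```

In this toy model the preview touches nothing: the tables still hold all of their rows afterwards, and a separate step would be needed to remove anything.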
| 234 | + |
| 235 | +### Two Propagation Modes |
| 236 | + |
| 237 | +The diagram supports two restriction propagation modes with different convergence semantics: |
| 238 | + |
| 239 | +**`cascade()` uses OR at convergence.** When a child table has multiple restricted ancestors, the child row is affected if *any* parent path reaches it. This is the right semantics for delete — if any reason exists to remove a row, it should be removed. `cascade()` is one-shot: it can only be called once on an unrestricted diagram. |
| 240 | + |
| 241 | +**`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables. |
| 242 | + |
| 243 | +The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics. |
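The difference between the two convergence rules comes down to set union versus set intersection. The sketch below is a toy model in plain Python, assuming a hypothetical `trial_rows` child table with two parents (`subject` and `rig`). It does not call any DataJoint API.

```python
# Hypothetical child table whose rows reference two parent tables.
trial_rows = [
    {"trial_id": 1, "subject": "M001", "rig": "A"},
    {"trial_id": 2, "subject": "M001", "rig": "B"},
    {"trial_id": 3, "subject": "M002", "rig": "A"},
    {"trial_id": 4, "subject": "M002", "rig": "B"},
]

# Restrictions applied to each parent independently.
restricted_subjects = {"M001"}
restricted_rigs = {"A"}

# Child rows reached through each parent path.
via_subject = {r["trial_id"] for r in trial_rows if r["subject"] in restricted_subjects}
via_rig = {r["trial_id"] for r in trial_rows if r["rig"] in restricted_rigs}

# OR at convergence (cascade-like): any path reaching the row is enough.
cascade_like = via_subject | via_rig

# AND at convergence (restrict-like): every path must reach the row.
restrict_like = via_subject & via_rig

print(sorted(cascade_like))
print(sorted(restrict_like))
```

Here the OR rule selects trials 1, 2, and 3, while the AND rule selects only trial 1. That is why the former suits deletion and the latter suits subsetting.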
| 244 | + |
| 245 | +### Pruning Empty Tables |
| 246 | + |
| 247 | +After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data: |
| 248 | + |
| 249 | +```python |
| 250 | +export = (dj.Diagram(schema) |
| 251 | + .restrict(Subject & {'species': 'mouse'}) |
| 252 | + .restrict(Session & 'session_date > "2024-01-01"') |
| 253 | + .prune()) |
| 254 | + |
| 255 | +export.preview() # only tables with matching rows |
| 256 | +export # visualize the export subgraph |
| 257 | +``` |
| 258 | + |
| 259 | +Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated. |
| 260 | + |
| 261 | +### Architecture |
| 262 | + |
| 263 | +`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed. |
| 264 | + |
204 | 265 | ## See Also |
205 | 266 |
|
206 | 267 | - [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide |
207 | 268 | - [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial |
208 | 269 | - [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings |
209 | 270 | - [Configure Database](../how-to/configure-database.md/) — Connection setup |
| 271 | +- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations |
| 272 | +- [Delete Data](../how-to/delete-data.md/) — Task-oriented delete guide |