DataJoint 2.1 adds **PostgreSQL as a production backend**, **enhanced diagram visualization**, and **singleton tables**.
4
+
5
+
> **Upgrading from 2.0?** No breaking changes. All existing code continues to work. New features are purely additive.
6
+
7
+
> **Citation:** Yatsenko D, Nguyen TT. *DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows.* arXiv:2602.16585. 2026. [doi:10.48550/arXiv.2602.16585](https://doi.org/10.48550/arXiv.2602.16585)
8
+
9
+
## PostgreSQL Backend
10
+
11
+
DataJoint now supports PostgreSQL 15+ as a production database backend alongside MySQL 8+. The adapter architecture generates backend-specific SQL while maintaining a consistent API — the same table definitions, queries, and pipeline logic work on both backends.
12
+
13
+
```bash
export DJ_BACKEND=postgresql
export DJ_HOST=localhost
export DJ_PORT=5432
```
18
+
19
+
Or configure programmatically:
20
+
21
+
```python
import datajoint as dj

dj.config['database.backend'] = 'postgresql'
```
24
+
25
+
All core types (`int32`, `float64`, `varchar`, `uuid`, `json`), codec types (`<blob>`, `<attach>`, `<object@>`), query operations, foreign keys, indexes, and auto-populate work identically across backends. Backend-specific differences are handled internally by the adapter layer.
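To make the adapter idea concrete, here is a minimal sketch of how one logical operation can render to backend-specific SQL. The names `MySQLAdapter`, `PostgreSQLAdapter`, and `select_all` are hypothetical and are not DataJoint's internal API; the only backend fact relied on is that MySQL quotes identifiers with backticks while PostgreSQL uses double quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLAdapter:
    """Hypothetical adapter: MySQL quotes identifiers with backticks."""
    name: str = "mysql"

    def quote(self, identifier: str) -> str:
        return "`" + identifier.replace("`", "``") + "`"


@dataclass(frozen=True)
class PostgreSQLAdapter:
    """Hypothetical adapter: PostgreSQL quotes identifiers with double quotes."""
    name: str = "postgresql"

    def quote(self, identifier: str) -> str:
        return '"' + identifier.replace('"', '""') + '"'


def select_all(adapter, schema: str, table: str) -> str:
    """Backend-neutral call site; only the adapter knows the SQL dialect."""
    return f"SELECT * FROM {adapter.quote(schema)}.{adapter.quote(table)}"


print(select_all(MySQLAdapter(), "lab", "subject"))
print(select_all(PostgreSQLAdapter(), "lab", "subject"))
```

The call site is identical for both backends, which is the property the paragraph above describes: dialect differences stay inside the adapter.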
26
+
27
+
See [Database Backends](../reference/specs/database-backends.md) for the full specification.
28
+
29
+
## Diagram Enhancements
30
+
31
+
`dj.Diagram` gains several visualization features for working with complex, multi-schema pipelines.
32
+
33
+
### Layout Direction
34
+
35
+
Control the flow direction of diagrams:
36
+
37
+
```python
# Horizontal layout
dj.config.display.diagram_direction = "LR"

# Or temporarily
with dj.config.override(display__diagram_direction="LR"):
    dj.Diagram(schema).draw()
```
45
+
46
+
| Value | Description |
|-------|-------------|
| `"TB"` | Top to bottom (default) |
| `"LR"` | Left to right |
50
+
51
+
### Mermaid Output
52
+
53
+
Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown, GitHub, or web documentation:
54
+
55
+
```python
print(dj.Diagram(schema).make_mermaid())
```
58
+
59
+
Save directly to `.mmd` or `.mermaid` files:
60
+
61
+
```python
dj.Diagram(schema).save("pipeline.mmd")
```
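For readers unfamiliar with Mermaid, the following sketch shows the general shape of flowchart syntax for a small dependency graph. The helper `to_mermaid` is hypothetical and written only for illustration; the exact text emitted by `make_mermaid()` may differ (for example in node styling or schema grouping).

```python
def to_mermaid(edges, direction="TB"):
    """Render (parent, child) pairs as a Mermaid flowchart string."""
    if direction not in ("TB", "LR"):
        raise ValueError(f"unsupported direction: {direction!r}")
    lines = [f"flowchart {direction}"]
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)


# Table names here are placeholders for a small pipeline.
print(to_mermaid([("Subject", "Session"), ("Session", "Trial")], direction="LR"))
```

Pasting text of this form into a Markdown code fence labeled `mermaid` is enough for GitHub to render it as a diagram.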
64
+
65
+
### Schema Grouping
66
+
67
+
Multi-schema diagrams automatically group tables into visual clusters by database schema. The cluster label shows the Python module name when available, following the DataJoint convention of one module per schema.
The **"expanded wins" rule** applies: if a table appears in both a collapsed and non-collapsed diagram, it stays expanded. This allows showing specific tables while collapsing the rest:
84
+
85
+
```python
86
+
# Subject is expanded, rest of analysis schema is collapsed
File: src/about/whats-new-22.md (69 additions, 15 deletions)
@@ -213,19 +213,15 @@ In prior versions, `dj.Diagram` existed solely for visualization — drawing the
213
213
- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
214
214
- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats.
215
215
216
-
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
216
+
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `counts()` methods are available as a public inspection API for understanding cascade impact before executing.
217
217
218
218
### The Preview-Then-Execute Pattern
219
219
220
-
The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`:
220
+
`Diagram.cascade()` is a class method that builds a complete cascade diagram from a table expression — including all descendants across all loaded schemas — in a single call:
221
221
222
222
```python
223
-
# Build the dependency graph and inspect the cascade
# Execute via Table.delete() after reviewing the blast radius
@@ -238,33 +234,89 @@ This is valuable when working with unfamiliar pipelines, large datasets, or mult
238
234
239
235
The diagram supports two restriction propagation modes designed for fundamentally different tasks.
240
236
241
-
**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`.
237
+
**`Diagram.cascade(table_expr)`** is a class method that creates a cascade diagram for delete. It takes a (possibly restricted) table expression, includes all descendants across loaded schemas, propagates the restriction downstream, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed.
242
238
243
239
When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows.
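The two `part_integrity` behaviors can be modeled with plain sets. This is a toy sketch, not DataJoint's implementation: the function `expand_to_masters`, the row representation as `(master_id, part_id)` tuples, and the table names are all assumptions made for illustration.

```python
def expand_to_masters(matched_part_rows, parts, part_integrity="enforce"):
    """Toy model of deleting part rows whose master is not in the cascade.

    matched_part_rows: set of (master_id, part_id) rows initially selected
        from one part table.
    parts: dict mapping part-table name -> set of (master_id, part_id) rows.
    Returns (master_ids_to_delete, {part_table: rows_to_delete}).
    """
    if part_integrity == "enforce":
        raise ValueError("part rows would be deleted without their master")
    if part_integrity != "cascade":
        raise ValueError(f"unknown part_integrity: {part_integrity!r}")
    # Upward step: the matched part rows identify the affected masters.
    masters = {master_id for master_id, _ in matched_part_rows}
    # Downward step: every sibling part row of those masters is included.
    to_delete = {
        name: {row for row in rows if row[0] in masters}
        for name, rows in parts.items()
    }
    return masters, to_delete


parts = {
    "Session.Trial": {(1, 1), (1, 2), (2, 1)},
    "Session.Stimulus": {(1, 10), (2, 20)},
}
masters, to_delete = expand_to_masters({(1, 2)}, parts, part_integrity="cascade")
print(masters)
print(to_delete)
```

Selecting a single trial row of master 1 ends up removing master 1 together with all of its trials and stimuli, which is the "entire compositional unit" behavior described above.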
244
240
245
-
**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally, for example by restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
241
+
**`diagram.restrict(table_expr)`** is an instance method that selects a data subset. It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `counts()` to inspect the result.
246
242
247
-
The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.
243
+
The two modes are mutually exclusive — `restrict()` raises an error if called on a Diagram produced by `cascade()`. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting.
248
244
249
245
### Pruning Empty Tables
250
246
251
-
After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:
247
+
After applying restrictions with `restrict()`, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:
252
248
253
249
```python
254
250
export = (dj.Diagram(schema)
255
251
.restrict(Subject & {'species': 'mouse'})
256
252
.restrict(Session & 'session_date > "2024-01-01"')
257
253
.prune())
258
254
259
-
export.preview()  # only tables with matching rows
255
+
export.counts() # only tables with matching rows
260
256
export # visualize the export subgraph
261
257
```
262
258
263
259
Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
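Conceptually, pruning is a filter over the graph by row count. The sketch below models that idea with a hypothetical `prune` function over plain Python data; it is not the library's implementation, and the table names and counts are made up.

```python
def prune(edges, row_counts):
    """Drop tables with zero rows, and any edge touching a dropped table.

    edges: list of (parent, child) table-name pairs.
    row_counts: dict mapping table name -> number of (matching) rows.
    """
    keep = {table for table, n in row_counts.items() if n > 0}
    kept_edges = [(p, c) for p, c in edges if p in keep and c in keep]
    return keep, kept_edges


edges = [("Subject", "Session"), ("Session", "Trial"), ("Session", "Video")]
row_counts = {"Subject": 3, "Session": 5, "Trial": 120, "Video": 0}

tables, kept_edges = prune(edges, row_counts)
print(sorted(tables))
print(kept_edges)
```

With restrictions applied, `row_counts` would hold matching-row counts; without them, it would hold physical row counts.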
264
260
261
+
`prune()` cannot be used on cascade Diagrams — cascade retains all descendant tables to handle concurrent inserts safely (a table empty at cascade time could have rows by the time `delete()` executes).
262
+
263
+
### Restriction Propagation Rules
264
+
265
+
When `cascade()` or `restrict()` propagates a restriction from a parent to a child, one of three rules applies depending on the foreign key relationship:
266
+
267
+
| Rule | Condition | Child restriction |
|------|-----------|-------------------|
| **Direct copy** | Non-aliased FK, restriction attributes are a subset of child's primary key | Restriction copied directly |
| **Aliased projection** | FK uses attribute renaming (e.g., `subject_id` → `animal_id`) | Parent projected with attribute mapping |
| **Full projection** | Non-aliased FK, restriction uses attributes not in child's primary key | Parent projected (all attributes) as restriction |
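The rule selection in the table can be read as a small decision function. The sketch below is an interpretation of the table, not code from DataJoint; the function name `propagation_rule` and its parameters are hypothetical.

```python
def propagation_rule(fk_aliased, restriction_attrs, child_primary_key):
    """Pick the propagation rule for one parent-to-child foreign key.

    fk_aliased: True if the FK renames attributes.
    restriction_attrs: set of attribute names used by the restriction.
    child_primary_key: set of the child's primary-key attribute names.
    """
    if fk_aliased:
        return "aliased projection"
    if set(restriction_attrs) <= set(child_primary_key):
        return "direct copy"
    return "full projection"


child_pk = {"subject_id", "session_id"}
print(propagation_rule(False, {"subject_id"}, child_pk))
print(propagation_rule(True, {"subject_id"}, child_pk))
print(propagation_rule(False, {"species"}, child_pk))
```

Checking aliasing first mirrors the table: the two non-aliased rules differ only in whether the restriction's attributes are all available in the child's primary key.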
272
+
273
+
When a child has multiple restricted ancestors, convergence depends on the mode: `cascade()` uses OR (any path marks a row for deletion), `restrict()` uses AND (all conditions must match).
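The difference between the two convergence modes reduces to set union versus set intersection. A minimal sketch, assuming each restricted ancestor path has already been resolved to a set of matching child row IDs (the `converge` helper and the row IDs are illustrative only):

```python
def converge(path_matches, mode):
    """Combine per-ancestor matches for one child table.

    path_matches: non-empty list of sets of child row ids,
        one set per restricted ancestor.
    mode: "cascade" (OR / union) or "restrict" (AND / intersection).
    """
    if mode == "cascade":
        return set().union(*path_matches)
    if mode == "restrict":
        return set.intersection(*path_matches)
    raise ValueError(f"unknown mode: {mode!r}")


by_species = {1, 2, 3}   # child rows reached via a species restriction
by_date = {2, 3, 4}      # child rows reached via a date restriction

print(converge([by_species, by_date], "cascade"))
print(converge([by_species, by_date], "restrict"))
```

Cascade affects rows 1 through 4 because any path is reason enough to delete, while restrict keeps only rows 2 and 3, which satisfy both conditions.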
274
+
275
+
When a child references the same parent through multiple foreign keys (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`), these paths always combine with OR regardless of the mode — each FK path is an independent reason for the child row to be affected.
276
+
277
+
### Safe Delete Workflow
278
+
279
+
With `safemode=True` (the default), `delete()` provides a built-in preview-and-confirm workflow:
280
+
281
+
1. Builds the cascade diagram and computes all affected tables
282
+
2. Executes the deletes inside a transaction
283
+
3. Logs every affected table and its row count
284
+
4. Asks **"Commit deletes?"** — declining **rolls back** all changes
285
+
286
+
This is safer than a pre-transaction preview because it reflects the actual database state at delete time, including triggers and concurrent changes.
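The delete-then-confirm pattern can be demonstrated with the standard library's `sqlite3` module. This is a sketch of the general transaction technique, not DataJoint's code: `safe_delete`, the `confirm` callback, and the two-table schema are assumptions for illustration.

```python
import sqlite3


def safe_delete(conn, statements, confirm):
    """Run deletes in one transaction; commit only if confirm() agrees.

    statements: list of (table_name, delete_sql), leaves first.
    confirm: callable taking {table: deleted_row_count}, returning bool.
    """
    counts = {}
    conn.execute("BEGIN")
    try:
        for table, sql in statements:
            counts[table] = conn.execute(sql).rowcount
        committed = bool(confirm(counts))
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT" if committed else "ROLLBACK")
    return committed, counts


# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE session (session_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE trial (trial_id INTEGER PRIMARY KEY, session_id INTEGER)")
conn.executemany("INSERT INTO session VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO trial VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

plan = [
    ("trial", "DELETE FROM trial WHERE session_id = 1"),
    ("session", "DELETE FROM session WHERE session_id = 1"),
]

# Declining the prompt rolls everything back.
declined, counts = safe_delete(conn, plan, confirm=lambda c: False)
rows_after_decline = conn.execute("SELECT COUNT(*) FROM trial").fetchone()[0]

# Accepting commits the same deletes.
accepted, _ = safe_delete(conn, plan, confirm=lambda c: True)
rows_after_accept = conn.execute("SELECT COUNT(*) FROM trial").fetchone()[0]
print(counts, rows_after_decline, rows_after_accept)
```

Because the row counts come from the deletes themselves, the numbers shown at the prompt are exactly what a commit would remove.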
287
+
288
+
For programmatic preview without executing, use `Diagram.cascade()`:
The `drop()` method follows the same safemode pattern — previewing affected tables and asking for confirmation before proceeding.
296
+
297
+
### Unloaded Schema Detection
298
+
299
+
If a descendant table lives in a schema that hasn't been activated, the graph-driven delete won't know about it. When the final `DELETE` fails with a foreign key error, DataJoint catches it and produces an actionable error message identifying which schema needs to be activated — rather than the opaque crash of the prior implementation.
300
+
301
+
### Iteration API
302
+
303
+
Diagrams support Python's iteration protocol, yielding `FreeTable` objects in topological order:
304
+
305
+
```python
# Forward iteration (parents first): useful for export/inspection
for ft in diagram:
    print(ft.full_table_name, len(ft))

# Reverse iteration (leaves first): used by delete and drop
for ft in reversed(diagram):
    ft.delete_quick()
```
314
+
315
+
Each yielded `FreeTable` carries any cascade or restrict conditions that have been applied. `Table.delete()` and `Table.drop()` use `reversed(diagram)` internally, replacing the manual `topo_sort()` loops from prior implementations.
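The iteration contract (parents first when iterating forward, leaves first when reversed) can be sketched with the standard library's `graphlib`. `ToyDiagram` below is a hypothetical stand-in that yields table names rather than `FreeTable` objects; it illustrates the protocol only and says nothing about how `dj.Diagram` is implemented.

```python
from graphlib import TopologicalSorter


class ToyDiagram:
    """Minimal iterable over a dependency graph in topological order."""

    def __init__(self, parents):
        # parents: dict mapping each table name to the set of its parents.
        self._order = list(TopologicalSorter(parents).static_order())

    def __iter__(self):
        # Parents before children: a safe order for export or insertion.
        return iter(self._order)

    def __reversed__(self):
        # Children before parents: a safe order for delete or drop.
        return reversed(self._order)


diagram = ToyDiagram({"Session": {"Subject"}, "Trial": {"Session"}})
print(list(diagram))
print(list(reversed(diagram)))
```

Defining `__reversed__` explicitly is what allows `reversed(diagram)` to work on an object that is not a sequence.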
316
+
265
317
### Architecture
266
318
267
-
`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool: it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
319
+
`Table.delete()` uses `Diagram.cascade(self)` internally to compute the affected subgraph, then iterates `reversed(diagram)` to delete leaves first. `Table.drop()` builds a Diagram with all descendants and drops in the same order. The Diagram is purely a graph computation and inspection tool: it computes the cascade and provides `counts()` and iteration, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
268
320
269
321
### Advantages over Error-Driven Cascade
270
322
@@ -278,10 +330,12 @@ The graph-driven approach resolves every known limitation of the prior error-dri
278
330
| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |