
Commit 2f17bda

Merge pull request #157 from datajoint/docs/whats-new-pages
2 parents 32c4a40 + 72e1c75 commit 2f17bda

27 files changed

+603
-448
lines changed

mkdocs.yaml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,6 @@ nav:
1010
- explanation/index.md
1111
- Overview:
1212
- Data Pipelines: explanation/data-pipelines.md
13-
- What's New in 2.0: explanation/whats-new-2.md
14-
- What's New in 2.2: explanation/whats-new-22.md
1513
- FAQ: explanation/faq.md
1614
- Data Model:
1715
- Relational Workflow Model: explanation/relational-workflow-model.md
@@ -127,6 +125,9 @@ nav:
127125
- API: api/ # Auto-generated via gen-files + literate-nav
128126
- About:
129127
- about/index.md
128+
- What's New in 2.2: about/whats-new-22.md
129+
- What's New in 2.1: about/whats-new-21.md
130+
- What's New in 2.0: about/whats-new-2.md
130131
- History: about/history.md
131132
- Documentation Versioning: about/versioning.md
132133
- Platform: https://www.datajoint.com/sign-up
Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
{% if config.extra.datajoint_version %}
2-
<a href="{{ 'explanation/whats-new-2/' | url }}">
2+
<a href="{{ 'about/whats-new-2/' | url }}">
33
Documentation for DataJoint {{ config.extra.datajoint_version }}
44
</a>
55
{% endif %}

src/about/versioning.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,7 @@ print(dj.__version__)
8787

8888
If you're upgrading from legacy DataJoint (pre-2.0):
8989

90-
1. **Review** the [What's New in 2.0](../explanation/whats-new-2.md) page to understand major changes
90+
1. **Review** the [What's New in 2.0](whats-new-2.md) page to understand major changes
9191
2. **Follow** the [Migration Guide](../how-to/migrate-to-v20.md) for step-by-step upgrade instructions
9292
3. **Reference** this documentation for updated syntax and APIs
9393

Lines changed: 9 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -274,20 +274,12 @@ Most users complete Phases 1-2 in a single session. Phases 3-4 only apply if you
274274

275275
## See Also
276276

277-
### Migration
278-
- **[Migration Guide](../how-to/migrate-to-v20.md/)** — Complete upgrade instructions
279-
- [Configuration](../how-to/configure-database.md/) — Setup new configuration system
280-
281-
### Core Concepts
282-
- [Type System](type-system.md) — Understand the three-tier type architecture
283-
- [Computation Model](computation-model.md) — Jobs 2.0 and AutoPopulate
284-
- [Query Algebra](query-algebra.md) — Semantic matching and operators
285-
286-
### Getting Started
287-
- [Installation](../how-to/installation.md/) — Install DataJoint 2.0
288-
- [Tutorials](../tutorials/index.md/) — Learn by example
289-
290-
### Reference
291-
- [Type System Specification](../reference/specs/type-system.md/) — Complete type system details
292-
- [Codec API](../reference/specs/codec-api.md/) — Build custom codecs
293-
- [AutoPopulate Specification](../reference/specs/autopopulate.md/) — Jobs 2.0 reference
277+
- [What's New in 2.1](whats-new-21.md) — Next release
278+
- [Release Notes (v2.0.0)](https://github.com/datajoint/datajoint-python/releases/tag/v2.0.0) — GitHub changelog
279+
- **[Migration Guide](../how-to/migrate-to-v20.md)** — Complete upgrade instructions
280+
- [Configuration](../how-to/configure-database.md) — Set up the new configuration system
281+
- [Type System](../explanation/type-system.md) — Understand the three-tier type architecture
282+
- [Computation Model](../explanation/computation-model.md) — Jobs 2.0 and AutoPopulate
283+
- [Query Algebra](../explanation/query-algebra.md) — Semantic matching and operators
284+
- [Installation](../how-to/installation.md) — Install DataJoint 2.0
285+
- [Tutorials](../tutorials/index.md) — Learn by example

src/about/whats-new-21.md

Lines changed: 125 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,125 @@
1+
# What's New in DataJoint 2.1
2+
3+
DataJoint 2.1 adds **PostgreSQL as a production backend**, **enhanced diagram visualization**, and **singleton tables**.
4+
5+
> **Upgrading from 2.0?** No breaking changes. All existing code continues to work. New features are purely additive.
6+
7+
> **Citation:** Yatsenko D, Nguyen TT. *DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows.* arXiv:2602.16585. 2026. [doi:10.48550/arXiv.2602.16585](https://doi.org/10.48550/arXiv.2602.16585)
8+
9+
## PostgreSQL Backend
10+
11+
DataJoint now supports PostgreSQL 15+ as a production database backend alongside MySQL 8+. The adapter architecture generates backend-specific SQL while maintaining a consistent API — the same table definitions, queries, and pipeline logic work on both backends.
12+
13+
```bash
14+
export DJ_BACKEND=postgresql
15+
export DJ_HOST=localhost
16+
export DJ_PORT=5432
17+
```
18+
19+
Or configure programmatically:
20+
21+
```python
22+
dj.config['database.backend'] = 'postgresql'
23+
```
24+
25+
All core types (`int32`, `float64`, `varchar`, `uuid`, `json`), codec types (`<blob>`, `<attach>`, `<object@>`), query operations, foreign keys, indexes, and auto-populate work identically across backends. Backend-specific differences are handled internally by the adapter layer.
26+
27+
See [Database Backends](../reference/specs/database-backends.md) for the full specification.
28+
29+
## Diagram Enhancements
30+
31+
`dj.Diagram` gains several visualization features for working with complex, multi-schema pipelines.
32+
33+
### Layout Direction
34+
35+
Control the flow direction of diagrams:
36+
37+
```python
38+
# Horizontal layout
39+
dj.config.display.diagram_direction = "LR"
40+
41+
# Or temporarily
42+
with dj.config.override(display__diagram_direction="LR"):
43+
dj.Diagram(schema).draw()
44+
```
45+
46+
| Value | Description |
47+
|-------|-------------|
48+
| `"TB"` | Top to bottom (default) |
49+
| `"LR"` | Left to right |
50+
51+
### Mermaid Output
52+
53+
Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown, GitHub, or web documentation:
54+
55+
```python
56+
print(dj.Diagram(schema).make_mermaid())
57+
```
58+
59+
Save directly to `.mmd` or `.mermaid` files:
60+
61+
```python
62+
dj.Diagram(schema).save("pipeline.mmd")
63+
```
64+
65+
### Schema Grouping
66+
67+
Multi-schema diagrams automatically group tables into visual clusters by database schema. The cluster label shows the Python module name when available, following the DataJoint convention of one module per schema.
68+
69+
```python
70+
combined = dj.Diagram(schema1) + dj.Diagram(schema2)
71+
combined.draw() # tables grouped by schema
72+
```
73+
74+
### Collapsing Schemas
75+
76+
For high-level pipeline views, collapse entire schemas into single nodes:
77+
78+
```python
79+
# Show schema1 expanded, schema2 as a single node with table count
80+
dj.Diagram(schema1) + dj.Diagram(schema2).collapse()
81+
```
82+
83+
The **"expanded wins" rule** applies: if a table appears in both a collapsed and non-collapsed diagram, it stays expanded. This allows showing specific tables while collapsing the rest:
84+
85+
```python
86+
# Subject is expanded, rest of analysis schema is collapsed
87+
dj.Diagram(Subject) + dj.Diagram(analysis).collapse()
88+
```
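
The "expanded wins" rule can be pictured with plain sets. This is only an illustrative sketch, not DataJoint's implementation: `merge_diagrams` and the `schema.table` naming are hypothetical, and it assumes the collapsed node counts only the tables that are not shown individually.

```python
from collections import Counter


def merge_diagrams(expanded, collapsed):
    """Return (tables shown individually, {schema: number of tables folded into one node})."""
    # Any table present in an expanded diagram stays expanded.
    shown = set(expanded)
    # Remaining collapsed tables are folded into one node per schema.
    folded = Counter(
        table.split(".")[0] for table in collapsed if table not in shown
    )
    return shown, dict(folded)


# Hypothetical table names in "schema.table" form.
expanded = {"analysis.subject"}
collapsed = {"analysis.subject", "analysis.session", "analysis.trial"}
shown, folded = merge_diagrams(expanded, collapsed)
print(shown, folded)
```

Here `analysis.subject` appears in both inputs and therefore stays visible, while the other two tables collapse into a single `analysis` node.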
89+
90+
See [Diagram Specification](../reference/specs/diagram.md) for the full reference.
91+
92+
## Singleton Tables
93+
94+
A **singleton table** holds at most one row. Declare it with no attributes in the primary key section:
95+
96+
```python
97+
@schema
98+
class Config(dj.Lookup):
99+
definition = """
100+
# Global configuration
101+
---
102+
setting1 : varchar(100)
103+
setting2 : int32
104+
"""
105+
```
106+
107+
| Operation | Result |
108+
|-----------|--------|
109+
| Insert | Works without specifying a key |
110+
| Second insert | Raises `DuplicateError` |
111+
| `fetch1()` | Returns the single row |
112+
113+
Useful for global configuration, pipeline parameters, and summary statistics.
114+
115+
See [Table Declaration](../reference/specs/table-declaration.md#25-singleton-tables-empty-primary-keys) for details.
116+
117+
## See Also
118+
119+
- [Database Backends](../reference/specs/database-backends.md) — Full backend specification
120+
- [Diagram Specification](../reference/specs/diagram.md) — Diagram reference
121+
- [Table Declaration](../reference/specs/table-declaration.md) — Singleton tables
122+
- [Configure Database](../how-to/configure-database.md) — Connection setup for both backends
123+
- [What's New in 2.0](whats-new-2.md) — Previous release
124+
- [What's New in 2.2](whats-new-22.md) — Next release
125+
- [Release Notes (v2.1.0)](https://github.com/datajoint/datajoint-python/releases/tag/v2.1.0) — GitHub changelog
Lines changed: 69 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -213,19 +213,15 @@ In prior versions, `dj.Diagram` existed solely for visualization — drawing the
213213
- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
214214
- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats.
215215

216-
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
216+
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `counts()` methods are available as a public inspection API for understanding cascade impact before executing.
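
As a rough illustration of "reverse topological order", the following standalone sketch uses Python's standard `graphlib` on a hypothetical dependency graph. The table names are made up, and this is not how `dj.Diagram` is implemented; it only shows why leaves come first.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each table maps to the parents it references.
dependencies = {
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}

# static_order() yields parents before their children (topological order).
forward = list(TopologicalSorter(dependencies).static_order())

# Reversed, the order visits leaves first, so a row is never deleted
# while a child row still references it.
delete_order = list(reversed(forward))
print(delete_order)
# ['processed_data', 'trial', 'session', 'subject']
```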
217217

218218
### The Preview-Then-Execute Pattern
219219

220-
The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`:
220+
`Diagram.cascade()` is a class method that builds a complete cascade diagram from a table expression — including all descendants across all loaded schemas — in a single call:
221221

222222
```python
223-
# Build the dependency graph and inspect the cascade
224-
diag = dj.Diagram(schema)
225-
restricted = diag.cascade(Session & {'subject_id': 'M001'})
226-
227-
# Inspect: what tables and how many rows would be affected?
228-
counts = restricted.preview()
223+
# Preview: what tables and how many rows would be affected?
224+
dj.Diagram.cascade(Session & {'subject_id': 'M001'}).counts()
229225
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
230226

231227
# Execute via Table.delete() after reviewing the blast radius
@@ -238,33 +234,89 @@ This is valuable when working with unfamiliar pipelines, large datasets, or mult
238234

239235
The diagram supports two restriction propagation modes designed for fundamentally different tasks.
240236

241-
**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`.
237+
**`Diagram.cascade(table_expr)`** is a class method that creates a cascade diagram for delete. It takes a (possibly restricted) table expression, includes all descendants across loaded schemas, propagates the restriction downstream, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed.
242238

243239
When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows.
244240

245-
**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
241+
**`diagram.restrict(table_expr)`** is an instance method that selects a data subset. It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `counts()` to inspect the result.
246242

247-
The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.
243+
The two modes are mutually exclusive: `restrict()` raises an error if called on a Diagram produced by `cascade()`. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting.
248244

249245
### Pruning Empty Tables
250246

251-
After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:
247+
After applying restrictions with `restrict()`, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:
252248

253249
```python
254250
export = (dj.Diagram(schema)
255251
.restrict(Subject & {'species': 'mouse'})
256252
.restrict(Session & 'session_date > "2024-01-01"')
257253
.prune())
258254

259-
export.preview() # only tables with matching rows
255+
export.counts() # only tables with matching rows
260256
export # visualize the export subgraph
261257
```
262258

263259
Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
264260

261+
`prune()` cannot be used on cascade Diagrams — cascade retains all descendant tables to handle concurrent inserts safely (a table empty at cascade time could have rows by the time `delete()` executes).
262+
263+
### Restriction Propagation Rules
264+
265+
When `cascade()` or `restrict()` propagates a restriction from a parent to a child, one of three rules applies depending on the foreign key relationship:
266+
267+
| Rule | Condition | Child restriction |
268+
|------|-----------|-------------------|
269+
| **Direct copy** | Non-aliased FK, restriction attributes are a subset of child's primary key | Restriction copied directly |
270+
| **Aliased projection** | FK uses attribute renaming (e.g., `subject_id` → `animal_id`) | Parent projected with attribute mapping |
271+
| **Full projection** | Non-aliased FK, restriction uses attributes not in child's primary key | Parent projected (all attributes) as restriction |
272+
273+
When a child has multiple restricted ancestors, convergence depends on the mode: `cascade()` uses OR (any path marks a row for deletion), `restrict()` uses AND (all conditions must match).
274+
275+
When a child references the same parent through multiple foreign keys (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`), these paths always combine with OR regardless of the mode — each FK path is an independent reason for the child row to be affected.
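
The difference between the two convergence modes can be sketched with set operations. The row keys and the `converge` helper below are hypothetical and stand in for the SQL restrictions DataJoint would actually build.

```python
# Hypothetical child-row keys reached through two different restricted ancestors.
via_subject = {1, 2, 3}  # rows matching the restriction propagated from Subject
via_session = {2, 3, 4}  # rows matching the restriction propagated from Session


def converge(paths, mode):
    """Combine per-path row sets: OR (union) for cascade, AND (intersection) for restrict."""
    if mode == "cascade":
        return set().union(*paths)
    if mode == "restrict":
        return set.intersection(*paths)
    raise ValueError(f"unknown mode: {mode!r}")


cascade_rows = converge([via_subject, via_session], "cascade")
restrict_rows = converge([via_subject, via_session], "restrict")
print(cascade_rows, restrict_rows)
```

In cascade mode every row reached by either path is affected, while in restrict mode only rows 2 and 3, which satisfy both conditions, survive.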
276+
277+
### Safe Delete Workflow
278+
279+
With `safemode=True` (the default), `delete()` provides a built-in preview-and-confirm workflow:
280+
281+
1. Builds the cascade diagram and computes all affected tables
282+
2. Executes the deletes inside a transaction
283+
3. Logs every affected table and its row count
284+
4. Asks **"Commit deletes?"** — declining **rolls back** all changes
285+
286+
This is safer than a pre-transaction preview because it reflects the actual database state at delete time, including triggers and concurrent changes.
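
The delete-inside-a-transaction-then-confirm idea can be sketched with the standard `sqlite3` module. This is an analogy under stated assumptions, not DataJoint code: `delete_with_confirmation`, the tables, and the `confirm` callback are all hypothetical.

```python
import sqlite3


def delete_with_confirmation(conn, tables, where, confirm):
    """Delete from `tables` (leaves first) in one transaction; roll back unless confirm() is true."""
    counts = {}
    cur = conn.cursor()
    cur.execute("BEGIN")
    for table in tables:
        # Table names and the WHERE clause are trusted inputs in this sketch.
        cur.execute(f"DELETE FROM {table} WHERE {where}")
        counts[table] = cur.rowcount  # rows actually deleted inside the transaction
    if confirm(counts):
        cur.execute("COMMIT")
        return counts, True
    cur.execute("ROLLBACK")  # declining undoes every delete above
    return counts, False


# isolation_level=None disables implicit transactions so BEGIN/COMMIT/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE session (subject_id TEXT)")
conn.execute("CREATE TABLE trial (subject_id TEXT)")
conn.executemany("INSERT INTO session VALUES (?)", [("M001",), ("M002",)])
conn.executemany("INSERT INTO trial VALUES (?)", [("M001",), ("M001",), ("M002",)])

# Simulate a user who reviews the counts and declines.
counts, committed = delete_with_confirmation(
    conn, ["trial", "session"], "subject_id = 'M001'", confirm=lambda c: False
)
remaining = conn.execute("SELECT COUNT(*) FROM trial").fetchone()[0]
print(counts, committed, remaining)
```

Because the counts come from deletes that really ran, they reflect the database state at delete time; declining restores all three `trial` rows.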
287+
288+
For programmatic preview without executing, use `Diagram.cascade()`:
289+
290+
```python
291+
dj.Diagram.cascade(Session & {'subject_id': 'M001'}).counts()
292+
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
293+
```
294+
295+
The `drop()` method follows the same safemode pattern — previewing affected tables and asking for confirmation before proceeding.
296+
297+
### Unloaded Schema Detection
298+
299+
If a descendant table lives in a schema that hasn't been activated, the graph-driven delete won't know about it. When the final `DELETE` fails with a foreign key error, DataJoint catches it and produces an actionable error message identifying which schema needs to be activated — rather than the opaque crash of the prior implementation.
300+
301+
### Iteration API
302+
303+
Diagrams support Python's iteration protocol, yielding `FreeTable` objects in topological order:
304+
305+
```python
306+
# Forward iteration (parents first) — useful for export/inspection
307+
for ft in diagram:
308+
print(ft.full_table_name, len(ft))
309+
310+
# Reverse iteration (leaves first) — used by delete and drop
311+
for ft in reversed(diagram):
312+
ft.delete_quick()
313+
```
314+
315+
Each yielded `FreeTable` carries any cascade or restrict conditions that have been applied. `Table.delete()` and `Table.drop()` use `reversed(diagram)` internally, replacing the manual `topo_sort()` loops from prior implementations.
316+
265317
### Architecture
266318

267-
`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
319+
`Table.delete()` uses `Diagram.cascade(self)` internally to compute the affected subgraph, then iterates `reversed(diagram)` to delete leaves first. `Table.drop()` builds a Diagram with all descendants and drops in the same order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `counts()` and iteration, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
268320

269321
### Advantages over Error-Driven Cascade
270322

@@ -278,10 +330,12 @@ The graph-driven approach resolves every known limitation of the prior error-dri
278330
| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
279331
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
280332
| Reusability | Delete-only | Delete, drop, export, prune |
281-
| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing |
333+
| Inspectability | Opaque recursive cascade | `counts()` preview + safemode confirmation before commit |
282334

283335
## See Also
284336

337+
- [What's New in 2.1](whats-new-21.md) — Previous release
338+
- [Release Notes (v2.2.0)](https://github.com/datajoint/datajoint-python/releases) — GitHub changelog
285339
- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
286340
- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
287341
- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings

src/explanation/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ and scalable.
5353

5454
How DataJoint ensures safe joins through attribute lineage tracking.
5555

56-
- :material-new-box: **[What's New in 2.0](whats-new-2.md)**
56+
- :material-new-box: **[What's New in 2.0](../about/whats-new-2.md)**
5757

5858
Major changes, new features, and migration guidance for DataJoint 2.0.
5959
