73 changes: 68 additions & 5 deletions src/explanation/whats-new-22.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# What's New in DataJoint 2.2

DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph.

> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.

@@ -201,9 +201,72 @@ class MyTable(dj.Manual):

Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.

## Graph-Driven Diagram Operations

DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting.

### From Visualization to Operations

In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed.

In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios.

### The Preview-Then-Execute Pattern

The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute:

```python
# Build the dependency graph
diag = dj.Diagram(schema)

# Apply cascade restriction — nothing is deleted yet
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# Inspect: what tables and how many rows would be affected?
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Execute only after reviewing the blast radius
restricted.delete(prompt=False)
```

This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious.
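
As a small illustration of the inspection step, the sketch below turns a preview result into a short report before any decision is made. It assumes only that `preview()` returns a dict mapping full table names to row counts, as in the comment above; `summarize_preview` is a hypothetical helper and not part of DataJoint.

```python
def summarize_preview(counts):
    """Return (total_rows, report_lines) for a {table_name: row_count} dict."""
    # Largest tables first, so the biggest impact sits at the top of the report.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{name}: {n} rows" for name, n in ordered]
    return sum(counts.values()), lines


# Example data copied from the preview output shown above.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}
total, report = summarize_preview(counts)
print(f"{total} rows in {len(counts)} tables")
for line in report:
    print(line)
```

The helper works on the plain dict only, so it can be tried without a database connection.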

### Two Propagation Modes

The diagram supports two restriction propagation modes designed for fundamentally different tasks.

**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`.

**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.

The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics — a delete diagram should never be reused for subsetting, and vice versa.
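
The difference between OR and AND convergence can be sketched in plain Python. This is a toy model and not DataJoint's implementation: the `trials` rows, the `matched` sets, and the `converge` function are all hypothetical. Two matched parent sets stand in for two paths arriving at the same descendant table; in a real `cascade()` those paths would originate from a single seed.

```python
# Toy child rows, each referencing one row in two parent tables.
trials = [
    {"trial": 1, "subject": "M001", "session": "S1"},
    {"trial": 2, "subject": "M001", "session": "S2"},
    {"trial": 3, "subject": "M002", "session": "S2"},
    {"trial": 4, "subject": "M002", "session": "S3"},
]

# Parent keys reached by the restriction along each path.
matched = {"subject": {"M001"}, "session": {"S2"}}


def converge(rows, matched, combine):
    """Keep rows whose parent references satisfy `combine` (any or all)."""
    return [
        row["trial"]
        for row in rows
        if combine(row[parent] in keys for parent, keys in matched.items())
    ]


cascade_like = converge(trials, matched, any)   # OR: any path reaches the row
restrict_like = converge(trials, matched, all)  # AND: every condition must hold
print(cascade_like)   # [1, 2, 3]
print(restrict_like)  # [2]
```

With the same inputs, OR selects every row touched by at least one path, while AND keeps only the row that satisfies both conditions.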

### Pruning Empty Tables

After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:

```python
export = (
    dj.Diagram(schema)
    .restrict(Subject & {'species': 'mouse'})
    .restrict(Session & 'session_date > "2024-01-01"')
    .prune()
)

export.preview()  # only tables with matching rows
export            # visualize the export subgraph
```

Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
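
Conceptually, pruning is a filter over the graph. The sketch below models that idea with a plain dict of row counts and a list of edges; the `prune` function here is a hypothetical stand-in written for illustration, not the `Diagram.prune()` implementation.

```python
def prune(counts, edges):
    """Drop tables with zero matching rows and any edges that touch them."""
    kept = {name: n for name, n in counts.items() if n > 0}
    kept_edges = [
        (parent, child)
        for parent, child in edges
        if parent in kept and child in kept
    ]
    return kept, kept_edges


# Toy graph: row counts per table after restriction, and parent -> child edges.
counts = {"subject": 2, "session": 5, "trial": 0, "analysis": 0}
edges = [("subject", "session"), ("session", "trial"), ("trial", "analysis")]

kept, kept_edges = prune(counts, edges)
print(kept)        # {'subject': 2, 'session': 5}
print(kept_edges)  # [('subject', 'session')]
```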

### Architecture

`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed.
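
The delegation described above can be pictured with two toy classes. Everything in this sketch is hypothetical (`ToyDiagram`, `ToyTable`, and the `children` mapping), and the children-first ordering is an assumption made here so that no row is removed while something still references it; the actual internals may differ.

```python
class ToyDiagram:
    """Stand-in for a dependency graph: parent table -> list of child tables."""

    def __init__(self, children):
        self.children = children
        self.selected = []

    def cascade(self, seed):
        """Select `seed` and all of its descendants, listed children-first."""
        seen = set()
        order = []

        def visit(table):
            if table in seen:
                return
            seen.add(table)
            for child in self.children.get(table, []):
                visit(child)
            order.append(table)  # appended only after all of its descendants

        visit(seed)
        self.selected = order
        return self

    def delete(self):
        """Pretend to delete; return tables in the order they would be handled."""
        return list(self.selected)


class ToyTable:
    def __init__(self, name, children):
        self.name = name
        self._children = children

    def delete(self):
        # Table-level delete delegates to the diagram: build, cascade, delete.
        return ToyDiagram(self._children).cascade(self.name).delete()


children = {"session": ["trial", "note"], "trial": ["processed_data"]}
print(ToyTable("session", children).delete())
# ['processed_data', 'trial', 'note', 'session']
```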

## See Also

- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md/) — Connection setup
- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md) — Connection setup
- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
- [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide
35 changes: 35 additions & 0 deletions src/how-to/delete-data.md
@@ -189,8 +189,43 @@ count = (Subject & restriction).delete(prompt=False)
print(f"Deleted {count} subjects")
```

## Diagram-Level Delete

!!! version-added "New in 2.2"
    Diagram-level delete was added in DataJoint 2.2.

For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing.

### Build, Preview, Execute

```python
import datajoint as dj

# 1. Build the dependency graph
diag = dj.Diagram(schema)

# 2. Apply cascade restriction (nothing deleted yet)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# 3. Preview: see affected tables and row counts
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# 4. Execute only after reviewing
restricted.delete(prompt=False)
```

### When to Use

- **Preview blast radius**: Understand what a cascade delete will affect before committing
- **Multi-schema cascades**: Build a diagram spanning multiple schemas and delete across them in one operation
- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows

For routine deletes that start from one table, `(Table & restriction).delete()` remains the simplest approach. The diagram-level API is for cases that need more visibility or control.
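
For the programmatic-control case, an automated workflow might gate the delete on the previewed counts. The sketch below assumes only that `preview()` returns a `{table_name: row_count}` dict, as in the example above; `cascade_is_acceptable`, `max_total_rows`, and `protected` are hypothetical names chosen for illustration.

```python
def cascade_is_acceptable(counts, max_total_rows, protected=()):
    """Decide whether a previewed cascade may run without human review."""
    # Refuse if any protected table would lose rows.
    touches_protected = any(counts.get(name, 0) > 0 for name in protected)
    return not touches_protected and sum(counts.values()) <= max_total_rows


# Example data copied from the preview output shown above.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

if cascade_is_acceptable(counts, max_total_rows=100):
    print("within limits: safe to call restricted.delete(prompt=False)")
else:
    print("cascade too large: ask a person to review first")
```

The check runs on the dict alone, so the policy can be unit-tested separately from the database.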

## See Also

- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
- [Master-Part Tables](master-part.ipynb) — Compositional data patterns
- [Model Relationships](model-relationships.ipynb) — Foreign key patterns
- [Insert Data](insert-data.md) — Adding data to tables
84 changes: 7 additions & 77 deletions src/how-to/read-diagrams.ipynb
@@ -1325,39 +1325,13 @@
"cell_type": "markdown",
"id": "cell-ops-ref",
"metadata": {},
"source": [
"**Operation Reference:**\n",
"\n",
"| Operation | Meaning |\n",
"|-----------|--------|\n",
"| `dj.Diagram(schema)` | Entire schema |\n",
"| `dj.Diagram(Table) - N` | Table + N levels upstream |\n",
"| `dj.Diagram(Table) + N` | Table + N levels downstream |\n",
"| `D1 + D2` | Union of two diagrams |\n",
"| `D1 * D2` | Intersection (common nodes) |\n",
"\n",
"**Finding paths:** Use intersection to find connection paths:\n",
"```python\n",
"(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n",
"```"
]
"source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(New in 2.2)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```"
},
{
"cell_type": "markdown",
"id": "2lmw6tar3w8",
"metadata": {},
"source": [
"## Layout Direction\n",
"\n",
"*New in DataJoint 2.1*\n",
"\n",
"Control the flow direction of diagrams via configuration:\n",
"\n",
"| Direction | Description |\n",
"|-----------|-------------|\n",
"| `\"TB\"` | Top to bottom (default) |\n",
"| `\"LR\"` | Left to right |"
]
"source": "## Layout Direction\n\n!!! version-added \"New in 2.1\"\n Configurable layout direction was added in DataJoint 2.1.\n\nControl the flow direction of diagrams via configuration:\n\n| Direction | Description |\n|-----------|-------------|\n| `\"TB\"` | Top to bottom (default) |\n| `\"LR\"` | Left to right |"
},
{
"cell_type": "code",
@@ -1634,13 +1608,7 @@
"cell_type": "markdown",
"id": "ogpr8cqsife",
"metadata": {},
"source": [
"## Mermaid Output\n",
"\n",
"*New in DataJoint 2.1*\n",
"\n",
"Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
]
"source": "## Mermaid Output\n\n!!! version-added \"New in 2.1\"\n Mermaid output was added in DataJoint 2.1.\n\nGenerate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
},
{
"cell_type": "code",
@@ -1700,13 +1668,7 @@
"cell_type": "markdown",
"id": "pqet0vo8pwp",
"metadata": {},
"source": [
"## Multi-Schema Pipelines\n",
"\n",
"Real-world pipelines often span multiple schemas (modules). \n",
"\n",
"*New in DataJoint 2.1:* Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
]
"source": "## Multi-Schema Pipelines\n\nReal-world pipelines often span multiple schemas (modules).\n\n!!! version-added \"New in 2.1\"\n Automatic schema grouping was added in DataJoint 2.1. Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
},
{
"cell_type": "code",
@@ -2104,13 +2066,7 @@
"cell_type": "markdown",
"id": "ncl6hafwbjt",
"metadata": {},
"source": [
"## Collapsing Schemas\n",
"\n",
"*New in DataJoint 2.1*\n",
"\n",
"For high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
]
"source": "## Collapsing Schemas\n\n!!! version-added \"New in 2.1\"\n The `collapse()` method was added in DataJoint 2.1.\n\nFor high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
},
{
"cell_type": "code",
@@ -3322,33 +3278,7 @@
"cell_type": "markdown",
"id": "cell-summary-md",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"| Visual | Meaning |\n",
"|--------|--------|\n",
"| **Thick solid** | One-to-one extension |\n",
"| **Thin solid** | One-to-many containment |\n",
"| **Dashed** | Reference (independent identity) |\n",
"| **Underlined** | Introduces new dimension |\n",
"| **Orange dots** | Renamed FK via `.proj()` |\n",
"| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n",
"| **Grouped boxes** | Tables grouped by schema/module |\n",
"| **3D box (gray)** | Collapsed schema *(2.1+)* |\n",
"\n",
"| Feature | Method |\n",
"|---------|--------|\n",
"| Layout direction | `dj.config.display.diagram_direction` |\n",
"| Mermaid output | `.make_mermaid()` |\n",
"| Collapse schema | `.collapse()` *(2.1+)* |\n",
"\n",
"## Related\n",
"\n",
"- [Diagram Specification](../reference/specs/diagram.md)\n",
"- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n",
"- [Semantic Matching](../reference/specs/semantic-matching.md)\n",
"- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
]
"source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(New in 2.1)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(New in 2.1)* |\n| Prune empty tables | `.prune()` *(New in 2.2)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
},
{
"cell_type": "code",
@@ -3397,4 +3327,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
3 changes: 3 additions & 0 deletions src/reference/specs/data-manipulation.md
@@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables:
2. Recursively delete matching rows in child tables
3. Delete rows in target table

!!! version-added "New in 2.2"
    `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged: the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods).

### 4.3 Basic Usage

```python