Commit 3ba5256

Merge pull request #123 from datajoint/docs/plugin-codecs-guide
2 parents fa22cf7 + daf7eac commit 3ba5256

8 files changed

+20
-346
lines changed

src/tutorials/basics/01-first-pipeline.ipynb

Lines changed: 3 additions & 43 deletions
@@ -3,22 +3,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"# A Simple Pipeline\n",
-"\n",
-"This tutorial introduces DataJoint by building a simple research lab database. You'll learn to:\n",
-"\n",
-"- Define tables with primary keys and dependencies\n",
-"- Insert and query data\n",
-"- Use the four core operations: restriction, projection, join, aggregation\n",
-"- Understand the schema diagram\n",
-"\n",
-"We'll work with **Manual tables** only—tables where you enter data directly. Later tutorials introduce automated computation.\n",
-"\n",
-"For complete working examples, see:\n",
-"- [University Database](../examples/university.ipynb) — Academic records with complex queries\n",
-"- [Blob Detection](../examples/blob-detection.ipynb) — Image processing with computation"
-]
+"source": "# A Simple Pipeline\n\nThis tutorial introduces DataJoint by building a simple research lab database. You'll learn to:\n\n- Define tables with primary keys and dependencies\n- Insert and query data\n- Use the four core operations: restriction, projection, join, aggregation\n- Understand the schema diagram\n\nWe'll work with **Manual tables** only—tables where you enter data directly. Later tutorials introduce automated computation.\n\nFor complete working examples, see:\n- [University Database](../examples/university/) — Academic records with complex queries\n- [Blob Detection](../examples/blob-detection/) — Image processing with computation"
 },
 {
 "cell_type": "markdown",
@@ -2698,32 +2683,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"## Summary\n",
-"\n",
-"You've learned the fundamentals of DataJoint:\n",
-"\n",
-"| Concept | Description |\n",
-"|---------|-------------|\n",
-"| **Tables** | Python classes with a `definition` string |\n",
-"| **Primary key** | Above `---`, uniquely identifies rows |\n",
-"| **Dependencies** | `->` creates foreign keys |\n",
-"| **Restriction** | `&` filters rows |\n",
-"| **Projection** | `.proj()` selects/computes columns |\n",
-"| **Join** | `*` combines tables |\n",
-"| **Aggregation** | `.aggr()` summarizes groups |\n",
-"\n",
-"### Next Steps\n",
-"\n",
-"- [Schema Design](02-schema-design.ipynb) — Primary keys, relationships, table tiers\n",
-"- [Queries](04-queries.ipynb) — Advanced query patterns\n",
-"- [Computation](05-computation.ipynb) — Automated processing with Imported/Computed tables\n",
-"\n",
-"### Complete Examples\n",
-"\n",
-"- [University Database](../examples/university.ipynb) — Complex queries on academic records\n",
-"- [Blob Detection](../examples/blob-detection.ipynb) — Image processing pipeline with computation"
-]
+"source": "## Summary\n\nYou've learned the fundamentals of DataJoint:\n\n| Concept | Description |\n|---------|-------------|\n| **Tables** | Python classes with a `definition` string |\n| **Primary key** | Above `---`, uniquely identifies rows |\n| **Dependencies** | `->` creates foreign keys |\n| **Restriction** | `&` filters rows |\n| **Projection** | `.proj()` selects/computes columns |\n| **Join** | `*` combines tables |\n| **Aggregation** | `.aggr()` summarizes groups |\n\n### Next Steps\n\n- [Schema Design](02-schema-design/) — Primary keys, relationships, table tiers\n- [Queries](04-queries/) — Advanced query patterns\n- [Computation](05-computation/) — Automated processing with Imported/Computed tables\n\n### Complete Examples\n\n- [University Database](../examples/university/) — Complex queries on academic records\n- [Blob Detection](../examples/blob-detection/) — Image processing pipeline with computation"
 },
 {
 "cell_type": "code",
@@ -2764,4 +2724,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}

src/tutorials/basics/02-schema-design.ipynb

Lines changed: 3 additions & 108 deletions
@@ -1299,39 +1299,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"### Reading the Diagram\n",
-"\n",
-"DataJoint diagrams show tables as nodes and foreign keys as edges. The notation conveys relationship semantics at a glance.\n",
-"\n",
-"**Line Styles:**\n",
-"\n",
-"| Line | Style | Relationship | Meaning |\n",
-"|------|-------|--------------|---------|\n",
-"| ━━━ | Thick solid | Extension | FK **is** entire PK (one-to-one) |\n",
-"| ─── | Thin solid | Containment | FK **in** PK with other fields (one-to-many) |\n",
-"| ┄┄┄ | Dashed | Reference | FK in secondary attributes (one-to-many) |\n",
-"\n",
-"**Visual Indicators:**\n",
-"\n",
-"| Indicator | Meaning |\n",
-"|-----------|---------|\n",
-"| **Underlined name** | Introduces new dimension (new PK attributes) |\n",
-"| Non-underlined name | Inherits all dimensions (PK entirely from FKs) |\n",
-"| **Green** | Manual table |\n",
-"| **Gray** | Lookup table |\n",
-"| **Red** | Computed table |\n",
-"| **Blue** | Imported table |\n",
-"| **Orange dots** | Renamed foreign keys (via `.proj()`) |\n",
-"\n",
-"**Key principle:** Solid lines mean the parent's identity becomes part of the child's identity. Dashed lines mean the child maintains independent identity.\n",
-"\n",
-"**Note:** Diagrams do NOT show `[nullable]` or `[unique]` modifiers—check table definitions for these constraints.\n",
-"\n",
-"See [How to Read Diagrams](../../how-to/read-diagrams.ipynb) for diagram operations and comparison to ER notation.\n",
-"\n",
-"## Insert Test Data and Populate"
-]
+"source": "### Reading the Diagram\n\nDataJoint diagrams show tables as nodes and foreign keys as edges. The notation conveys relationship semantics at a glance.\n\n**Line Styles:**\n\n| Line | Style | Relationship | Meaning |\n|------|-------|--------------|---------|\n| ━━━ | Thick solid | Extension | FK **is** entire PK (one-to-one) |\n| ─── | Thin solid | Containment | FK **in** PK with other fields (one-to-many) |\n| ┄┄┄ | Dashed | Reference | FK in secondary attributes (one-to-many) |\n\n**Visual Indicators:**\n\n| Indicator | Meaning |\n|-----------|---------|\n| **Underlined name** | Introduces new dimension (new PK attributes) |\n| Non-underlined name | Inherits all dimensions (PK entirely from FKs) |\n| **Green** | Manual table |\n| **Gray** | Lookup table |\n| **Red** | Computed table |\n| **Blue** | Imported table |\n| **Orange dots** | Renamed foreign keys (via `.proj()`) |\n\n**Key principle:** Solid lines mean the parent's identity becomes part of the child's identity. Dashed lines mean the child maintains independent identity.\n\n**Note:** Diagrams do NOT show `[nullable]` or `[unique]` modifiers—check table definitions for these constraints.\n\nSee [How to Read Diagrams](../../how-to/read-diagrams/) for diagram operations and comparison to ER notation.\n\n## Insert Test Data and Populate"
 },
 {
 "cell_type": "code",
@@ -1562,80 +1530,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"## Best Practices\n",
-"\n",
-"### 1. Choose Meaningful Primary Keys\n",
-"- Use natural identifiers when possible (`subject_id = 'M001'`)\n",
-"- Keep keys minimal but sufficient for uniqueness\n",
-"\n",
-"### 2. Use Appropriate Table Tiers\n",
-"- **Manual**: Data entered by operators or instruments\n",
-"- **Lookup**: Configuration, parameters, reference data\n",
-"- **Imported**: Data read from files (recordings, images)\n",
-"- **Computed**: Derived analyses and summaries\n",
-"\n",
-"### 3. Normalize Your Data\n",
-"- Don't repeat information across rows\n",
-"- Create separate tables for distinct entities\n",
-"- Use foreign keys to link related data\n",
-"\n",
-"### 4. Use Core DataJoint Types\n",
-"\n",
-"DataJoint has a three-layer type architecture (see [Type System Specification](../reference/specs/type-system.md)):\n",
-"\n",
-"1. **Native database types** (Layer 1): Backend-specific types like `INT`, `FLOAT`, `TINYINT UNSIGNED`. These are **discouraged** but allowed for backward compatibility.\n",
-"\n",
-"2. **Core DataJoint types** (Layer 2): Standardized, scientist-friendly types that work identically across MySQL and PostgreSQL. **Always prefer these.**\n",
-"\n",
-"3. **Codec types** (Layer 3): Types with `encode()`/`decode()` semantics like `<blob>`, `<attach>`, `<object@>`.\n",
-"\n",
-"**Core types used in this tutorial:**\n",
-"\n",
-"| Type | Description | Example |\n",
-"|------|-------------|---------|\n",
-"| `uint8`, `uint16`, `int32` | Sized integers | `session_idx : uint16` |\n",
-"| `float32`, `float64` | Sized floats | `reaction_time : float32` |\n",
-"| `varchar(n)` | Variable-length string | `name : varchar(100)` |\n",
-"| `bool` | Boolean | `correct : bool` |\n",
-"| `date` | Date only | `date_of_birth : date` |\n",
-"| `datetime` | Date and time (UTC) | `created_at : datetime` |\n",
-"| `enum(...)` | Enumeration | `sex : enum('M', 'F', 'U')` |\n",
-"| `json` | JSON document | `task_params : json` |\n",
-"| `uuid` | Universally unique ID | `experimenter_id : uuid` |\n",
-"\n",
-"**Why native types are allowed but discouraged:**\n",
-"\n",
-"Native types (like `int`, `float`, `tinyint`) are passed through to the database but generate a **warning at declaration time**. They are discouraged because:\n",
-"- They lack explicit size information\n",
-"- They are not portable across database backends\n",
-"- They are not recorded in field metadata for reconstruction\n",
-"\n",
-"If you see a warning like `\"Native type 'int' used; consider 'int32' instead\"`, update your definition to use the corresponding core type.\n",
-"\n",
-"### 5. Document Your Tables\n",
-"- Add comments after `#` in definitions\n",
-"- Document units in attribute comments\n",
-"\n",
-"## Key Concepts Recap\n",
-"\n",
-"| Concept | Description |\n",
-"|---------|-------------|\n",
-"| **Primary Key** | Attributes above `---` that uniquely identify rows |\n",
-"| **Secondary Attributes** | Attributes below `---` that store additional data |\n",
-"| **Foreign Key** (`->`) | Reference to another table, imports its primary key |\n",
-"| **One-to-Many** | FK in primary key: parent has many children |\n",
-"| **One-to-One** | FK is entire primary key: exactly one child per parent |\n",
-"| **Master-Part** | Compositional integrity: master and parts inserted/deleted atomically |\n",
-"| **Nullable FK** | `[nullable]` makes the reference optional |\n",
-"| **Lookup Table** | Pre-populated reference data |\n",
-"\n",
-"## Next Steps\n",
-"\n",
-"- [Data Entry](03-data-entry.ipynb) — Inserting, updating, and deleting data\n",
-"- [Queries](04-queries.ipynb) — Filtering, joining, and projecting\n",
-"- [Computation](05-computation.ipynb) — Building computational pipelines"
-]
+"source": "## Best Practices\n\n### 1. Choose Meaningful Primary Keys\n- Use natural identifiers when possible (`subject_id = 'M001'`)\n- Keep keys minimal but sufficient for uniqueness\n\n### 2. Use Appropriate Table Tiers\n- **Manual**: Data entered by operators or instruments\n- **Lookup**: Configuration, parameters, reference data\n- **Imported**: Data read from files (recordings, images)\n- **Computed**: Derived analyses and summaries\n\n### 3. Normalize Your Data\n- Don't repeat information across rows\n- Create separate tables for distinct entities\n- Use foreign keys to link related data\n\n### 4. Use Core DataJoint Types\n\nDataJoint has a three-layer type architecture (see [Type System Specification](../../reference/specs/type-system/)):\n\n1. **Native database types** (Layer 1): Backend-specific types like `INT`, `FLOAT`, `TINYINT UNSIGNED`. These are **discouraged** but allowed for backward compatibility.\n\n2. **Core DataJoint types** (Layer 2): Standardized, scientist-friendly types that work identically across MySQL and PostgreSQL. **Always prefer these.**\n\n3. **Codec types** (Layer 3): Types with `encode()`/`decode()` semantics like `<blob>`, `<attach>`, `<object@>`.\n\n**Core types used in this tutorial:**\n\n| Type | Description | Example |\n|------|-------------|---------|\n| `uint8`, `uint16`, `int32` | Sized integers | `session_idx : uint16` |\n| `float32`, `float64` | Sized floats | `reaction_time : float32` |\n| `varchar(n)` | Variable-length string | `name : varchar(100)` |\n| `bool` | Boolean | `correct : bool` |\n| `date` | Date only | `date_of_birth : date` |\n| `datetime` | Date and time (UTC) | `created_at : datetime` |\n| `enum(...)` | Enumeration | `sex : enum('M', 'F', 'U')` |\n| `json` | JSON document | `task_params : json` |\n| `uuid` | Universally unique ID | `experimenter_id : uuid` |\n\n**Why native types are allowed but discouraged:**\n\nNative types (like `int`, `float`, `tinyint`) are passed through to the database but generate a **warning at declaration time**. They are discouraged because:\n- They lack explicit size information\n- They are not portable across database backends\n- They are not recorded in field metadata for reconstruction\n\nIf you see a warning like `\"Native type 'int' used; consider 'int32' instead\"`, update your definition to use the corresponding core type.\n\n### 5. Document Your Tables\n- Add comments after `#` in definitions\n- Document units in attribute comments\n\n## Key Concepts Recap\n\n| Concept | Description |\n|---------|-------------|\n| **Primary Key** | Attributes above `---` that uniquely identify rows |\n| **Secondary Attributes** | Attributes below `---` that store additional data |\n| **Foreign Key** (`->`) | Reference to another table, imports its primary key |\n| **One-to-Many** | FK in primary key: parent has many children |\n| **One-to-One** | FK is entire primary key: exactly one child per parent |\n| **Master-Part** | Compositional integrity: master and parts inserted/deleted atomically |\n| **Nullable FK** | `[nullable]` makes the reference optional |\n| **Lookup Table** | Pre-populated reference data |\n\n## Next Steps\n\n- [Data Entry](03-data-entry/) — Inserting, updating, and deleting data\n- [Queries](04-queries/) — Filtering, joining, and projecting\n- [Computation](05-computation/) — Building computational pipelines"
 },
 {
 "cell_type": "code",
@@ -1676,4 +1571,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}

src/tutorials/basics/03-data-entry.ipynb

Lines changed: 2 additions & 20 deletions
@@ -1588,25 +1588,7 @@
 "cell_type": "markdown",
 "id": "cell-42",
 "metadata": {},
-"source": [
-"## Quick Reference\n",
-"\n",
-"| Operation | Method | Use Case |\n",
-"|-----------|--------|----------|\n",
-"| Insert one | `insert1(row)` | Adding single entity |\n",
-"| Insert many | `insert(rows)` | Bulk data loading |\n",
-"| Update one | `update1(row)` | Surgical corrections only |\n",
-"| Delete | `delete()` | Removing entities (cascades) |\n",
-"| Delete quick | `delete_quick()` | Internal cleanup (no cascade) |\n",
-"| Validate | `validate(rows)` | Pre-insert check |\n",
-"\n",
-"See the [Data Manipulation Specification](../reference/specs/data-manipulation.md) for complete details.\n",
-"\n",
-"## Next Steps\n",
-"\n",
-"- [Queries](04-queries.ipynb) — Filtering, joining, and projecting data\n",
-"- [Computation](05-computation.ipynb) — Building computational pipelines"
-]
+"source": "## Quick Reference\n\n| Operation | Method | Use Case |\n|-----------|--------|----------|\n| Insert one | `insert1(row)` | Adding single entity |\n| Insert many | `insert(rows)` | Bulk data loading |\n| Update one | `update1(row)` | Surgical corrections only |\n| Delete | `delete()` | Removing entities (cascades) |\n| Delete quick | `delete_quick()` | Internal cleanup (no cascade) |\n| Validate | `validate(rows)` | Pre-insert check |\n\nSee the [Data Manipulation Specification](../../reference/specs/data-manipulation/) for complete details.\n\n## Next Steps\n\n- [Queries](04-queries/) — Filtering, joining, and projecting data\n- [Computation](05-computation/) — Building computational pipelines"
 },
 {
 "cell_type": "code",
@@ -1648,4 +1630,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
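
Note on the change pattern: in each markdown cell shown above, the `source` array of lines becomes a single string, and relative links ending in `.ipynb` or `.md` become trailing-slash paths. The following is a minimal sketch of how such a rewrite could be scripted. It is not taken from this commit; the function names `rewrite_links` and `normalize_notebook` are hypothetical, and it assumes the notebook is already loaded as a dict (for example via `json.load`). It leaves relative depth untouched, whereas the diff also changes `../reference/` to `../../reference/` in two links.

```python
import re

# Relative Markdown link targets ending in .ipynb or .md, e.g.
# "](04-queries.ipynb)". Absolute http(s) links are skipped.
_LINK = re.compile(r"\]\((?!https?://)([^)\s#]+?)\.(?:ipynb|md)\)")


def rewrite_links(text: str) -> str:
    """Replace a trailing .ipynb/.md in relative links with a slash."""
    return _LINK.sub(r"](\1/)", text)


def normalize_notebook(nb: dict) -> dict:
    """Join markdown cell sources into one string and rewrite their links.

    Hypothetical helper: mutates and returns the notebook dict.
    Code cells are left exactly as they are.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat 4 allows either a string or a list of strings here.
            source = "".join(source)
        cell["source"] = rewrite_links(source)
    return nb
```

To apply it to a file, one would load the notebook with `json.load`, pass the dict to `normalize_notebook`, and write it back with `json.dump`; the output's indentation and trailing-newline handling then decide whether the final `}` line also shows up as changed.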
