docs: DataJoint 2.0 Documentation - Diátaxis structure, specs, and concepts #94
Merged
- Create new directory structure: explanation/, tutorials/, how-to/, reference/specs/, api/
- Add index pages for each section with content outlines
- Update mkdocs.yaml with new navigation (removed partnerships/publications)
- Add mkdocs-jupyter for notebook support
- Update README with comprehensive project description
- Add about/index.md and about/contributing.md
- Update license references to Apache 2.0
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Migrated spec documents:
- primary-keys.md - Primary key rules in query operators
- semantic-matching.md - Attribute lineage and join compatibility
- type-system.md - Three-layer type architecture
- codec-api.md - Custom codec implementation
- fetch-api.md - Data retrieval methods
- autopopulate.md - Jobs 2.0 specification
- job-metadata.md - Hidden job tracking columns
Updated specs/index.md with proper categorization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created explanation pages based on datajoint-book concepts:
- relational-workflow-model.md - Core paradigm, three approaches compared
- entity-integrity.md - Primary keys, three questions framework
- normalization.md - Workflow normalization principle
- query-algebra.md - Five operators with examples
- type-system.md - Three-layer architecture, codecs
- computation-model.md - AutoPopulate, Jobs 2.0
Updated explanation/index.md with grid card layout.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added explanation/custom-codecs.md covering codec extensibility
- Updated TERMINOLOGY.md with codec extensibility terms
- Updated mkdocs.yaml navigation
- Updated explanation/index.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added mkdocstrings, gen-files, literate-nav plugins
- Created scripts/gen_api_pages.py for auto-generating API docs
- Updated mkdocs.yaml with API generation configuration
- Created reference pages: configuration.md, definition-syntax.md, errors.md
- Updated api/index.md with module links
- Added pip requirements for doc generation
API docs are auto-generated from datajoint-python/src docstrings using NumPy-style format.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
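The core of an API-page generator like scripts/gen_api_pages.py is mapping each source module to a dotted identifier and a page path. The sketch below is a hypothetical reconstruction of that mapping step only, assuming a `src/` layout and an `api/` output folder; it is not the script from this PR. In a real mkdocs-gen-files script each page would be written with `mkdocs_gen_files.open(path, "w")` and would contain a mkdocstrings `::: identifier` directive.

```python
from pathlib import PurePosixPath


def api_page_for(module_file: PurePosixPath, src_root: PurePosixPath,
                 out_dir: str = "api"):
    """Return (dotted identifier, doc page path) for a module, or None to skip it.

    Hypothetical helper: assumes package sources live under ``src_root``.
    """
    rel = module_file.relative_to(src_root).with_suffix("")
    parts = list(rel.parts)
    doc_path = PurePosixPath(out_dir, *rel.parts).with_suffix(".md")
    if parts[-1] == "__init__":
        # A package's __init__ becomes the index page of its folder.
        parts = parts[:-1]
        doc_path = doc_path.with_name("index.md")
    elif parts[-1].startswith("_"):
        # Private modules (including __main__) get no page.
        return None
    return ".".join(parts), doc_path


def page_body(identifier: str) -> str:
    """mkdocstrings renders the docstrings of the object named after ':::'."""
    return f"::: {identifier}\n"


if __name__ == "__main__":
    for f in ["src/datajoint/__init__.py", "src/datajoint/table.py",
              "src/datajoint/_private.py"]:
        print(f, "->", api_page_for(PurePosixPath(f), PurePosixPath("src")))
```

Handling `__init__` as `index.md` keeps each package's overview page at its folder root, which suits navigation plugins such as literate-nav.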
- Archive elements/ (to be documented separately)
- Archive partnerships/ and projects/ (handled elsewhere)
- Archive support-events.md and additional-resources.md
- Remove redundant about/ files (about.md, contribute.md, datajoint-team.md)
- Update index.md to remove Elements reference
- Update nav to remove Elements section
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive spec covering:
- Table tiers and class structure
- Definition string grammar
- Attribute types (core, string, temporal, codec)
- Default values and nullable attributes
- Foreign key references and options
- Index declarations
- Part tables
- Auto-populated tables
- Validation rules
- SQL generation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive spec covering all query operators:
- Restriction (& and -): condition types, semantic matching
- Projection (.proj): selection, renaming, computed attributes
- Join (*): functional dependencies, PK determination, left join
- Aggregation (.aggr): grouping, aggregate functions, HAVING
- Extension (.extend): left join with A→B requirement
- Union (+): combining entity sets, PK requirements
- Universal sets (dj.U): unique values, global aggregation
Also covers:
- Semantic matching rules and lineage
- Operator precedence
- Subquery generation rules
- Quick reference table
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive spec covering insert, update1, and delete operations:
- Workflow normalization philosophy: insert/delete as primary ops
- Updates as surgical corrections (update1 only, by design)
- The recomputation pattern for data corrections
Insert operations:
- insert() with all parameters and input formats
- insert1() convenience method
- staged_insert1 for large objects (Zarr, HDF5)
- Handling duplicates, extra fields, auto-populated tables
Update operations:
- update1() requirements and constraints
- When to use it vs. when to delete/reinsert
- Why no bulk update (by design)
Delete operations:
- Cascade behavior to dependent tables
- Safe mode and transaction control
- Part table constraints
- delete_quick() for internal use
Also covers validation, transactions, error handling, best practices.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restructured to present DataJoint 2.0 as the status quo:
- Starts with fundamentals: table types, make() method, key_source
- Explains populate() method and operating modes
- Describes per-table jobs system as native feature
- Covers priority, scheduling, distributed computing
- Migration from 1.x moved to brief section at end
Removed problem/solution framing that assumed 1.x knowledge.
Now readable as standalone 2.0 documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added comprehensive coverage of:
- Key source calculation: automatic derivation from FK joins, custom key sources
- The populate process: execution flow, direct mode behavior, return values
- The make() method: basic pattern, requirements, tripartite make (generator and method-based)
- Transaction management: automatic transactions, atomicity, scope diagrams
- Part tables: computed results with parts, transaction behavior, cascading deletes
- Progress monitoring: progress() method, display_progress parameter
- Direct vs distributed mode comparison
Reorganized to present basic populate first, job reservation as an extension.
Tripartite make pattern documented with both generator and method approaches.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
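The populate process described in this commit (compute every key in the key source that is not yet in the table, one transaction per key) can be sketched in plain Python. This is a hypothetical in-memory model, not DataJoint's implementation: `key_source` is a list of hashable keys, the target table is a dict, and `make` writes into a staging dict that stands in for an open transaction.

```python
from collections.abc import Callable, Hashable


def populate(key_source: list, table: dict,
             make: Callable[[Hashable, dict], None]) -> dict:
    """Hypothetical sketch: run make() once per pending key; return errors by key."""
    pending = [key for key in key_source if key not in table]  # key_source - table
    errors = {}
    for key in pending:
        staged: dict = {}           # stand-in for a per-key transaction
        try:
            make(key, staged)       # make() inserts its results (master and parts)
        except Exception as exc:    # a failed make() commits nothing
            errors[key] = repr(exc)
            continue
        table.update(staged)        # "commit": all rows for this key appear at once
    return errors


# Hypothetical key source: every combination of two upstream tables.
sessions = [1, 2]
params = ["fast", "slow"]
key_source = [(s, p) for s in sessions for p in params]


def make(key, staged):
    session, param = key
    if (session, param) == (2, "slow"):
        raise ValueError("bad input")   # simulated failure for one key
    staged[key] = {"score": session * 10}


table: dict = {}
print(populate(key_source, table, make))  # one failure, for key (2, 'slow')
print(sorted(table))                      # the three keys that succeeded
```

Calling `populate` again only retries the failed key, because every successfully computed key is already in the table.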
Tutorials:
- 01-getting-started: Blob detection pipeline example
- 02-schema-design: Table tiers, keys, relationships, core types
- 03-data-entry: Insert, update, delete operations
- 04-queries: Restriction, projection, join, aggregation, fetch
- 05-computation: Computed tables, make(), populate()
Updates:
- Home page: Relational Workflow Model explanation
- Type system: Core types vs native types distinction
- Schema design: Master-part relationships, compositional integrity
- All tutorials use DataJoint 2.0 API (to_arrays, to_dicts, keys)
- Dates updated to January 2026
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Explain OAS: unified architecture for relational + object storage
- Clarify "object" terminology (data objects, not OOP)
- Emphasize that object storage is managed with the same rigor as the database
- List key OAS features: transparent access, lifecycle, deduplication
- Update Quick Start dates to 2026
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
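Deduplication in hash-addressed storage follows from using a content hash as the object's address: storing the same bytes twice yields the same address and a single stored copy. The sketch below illustrates the idea with SHA-256 and an in-memory dict; the hash function, the class name, and the absence of any path layout are assumptions for illustration, not DataJoint's storage format.

```python
import hashlib


class HashAddressedStore:
    """Hypothetical in-memory store keyed by the SHA-256 of each object's bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(address, data)  # identical content is stored once
        return address

    def get(self, address: str) -> bytes:
        data = self.objects[address]
        # Re-hashing on read detects silent corruption of the stored bytes.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"object {address} is corrupted")
        return data


store = HashAddressedStore()
a = store.put(b"raw recording")
b = store.put(b"raw recording")   # same bytes, same address, no new copy
print(a == b, len(store.objects))
```

Because the address is derived from the content, a row in the database only needs to store the address to reference the object unambiguously.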
- Remove replace=True example, add caveat about breaking immutability
- Introduce master-part with transactions for compositional integrity
- Explain that auto-populated tables enforce transactions automatically
- Manual tables need explicit transactions for master-part inserts
- All session+trial inserts now use transactions
- Update best practices to emphasize transaction usage
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
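Why master and part rows belong in one transaction can be shown with any transactional database. The sketch uses Python's built-in sqlite3, not DataJoint: `with conn:` commits if the block succeeds and rolls back otherwise, so a failing part insert also undoes its master row. In DataJoint the analogous construct is the connection's transaction context manager (for example `with dj.conn().transaction:`); the table names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (session_id INTEGER PRIMARY KEY);
    CREATE TABLE session_trial (
        session_id INTEGER NOT NULL REFERENCES session (session_id),
        trial_id   INTEGER NOT NULL,
        PRIMARY KEY (session_id, trial_id)
    );
""")


def insert_session(session_id: int, trial_ids: list) -> bool:
    """Insert a master row and its part rows atomically; return True on commit."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute("INSERT INTO session VALUES (?)", (session_id,))
            conn.executemany(
                "INSERT INTO session_trial VALUES (?, ?)",
                [(session_id, t) for t in trial_ids],
            )
    except sqlite3.IntegrityError:
        return False
    return True


print(insert_session(1, [1, 2, 3]))  # committed
print(insert_session(2, [1, 1]))     # duplicate trial: the whole session is rolled back
```

Without the transaction, the failed call would leave session 2 in the master table with an incomplete set of trials.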
Fixes:
- 02-schema-design: Add task_params=None for consistent field sets
- 03-data-entry: Fix to_arrays() usage for single column
- 05-computation: Cast numpy bool to Python bool for is_fast
All 5 tutorials now execute successfully with outputs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
API updates:
- Replace safemode parameter with prompt in delete()
- Remove download_path from fetch methods (use config.override instead)
- Update fetch-api spec with config-based download path
All tutorials re-executed and pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The fetch module was removed in the modern-fetch-api merge.
Fetch methods are now on QueryExpression directly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These terms are misnomers: they are restriction operations, not joins.
Replaced with:
- "Restriction by Query Expression"
- "restriction" / "anti-restriction"
Added reference to semantic matching spec for attribute matching.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Explain that semantic matching prevents accidental matches on unrelated attributes that happen to share names.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
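The difference between matching on names and matching on lineage can be sketched without a database. In this hypothetical model each heading maps an attribute name to its lineage (the table attribute it originates from), and restriction keeps the rows whose values on the semantically matched attributes appear in the other operand. Namesake attributes with different lineage are simply ignored here; consult the semantic matching spec for how DataJoint actually treats such collisions.

```python
Heading = dict   # attribute name -> lineage, e.g. "session.session_id"


def matched_attributes(a: Heading, b: Heading) -> list:
    """Attributes shared by name AND lineage (hypothetical semantic matching)."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


def restrict(rows_a: list, head_a: Heading, rows_b: list, head_b: Heading,
             anti: bool = False) -> list:
    """A & B (anti=False) or A - B (anti=True) on semantically matched attributes."""
    attrs = matched_attributes(head_a, head_b)
    values_b = {tuple(row[n] for n in attrs) for row in rows_b}
    return [row for row in rows_a
            if (tuple(row[n] for n in attrs) in values_b) != anti]


trial_head = {"session_id": "session.session_id", "trial_id": "trial.trial_id",
              "note": "trial.note"}
trials = [{"session_id": 1, "trial_id": 1, "note": "ok"},
          {"session_id": 2, "trial_id": 1, "note": "ok"}]

# Hypothetical flag table: "note" shares a name with trial.note but not its origin.
flag_head = {"session_id": "session.session_id", "note": "flag.note"}
flags = [{"session_id": 1, "note": "noisy"}]

print(restrict(trials, trial_head, flags, flag_head))             # session 1 trial
print(restrict(trials, trial_head, flags, flag_head, anti=True))  # session 2 trial
```

Matching by name alone would compare both `session_id` and `note`, and because "ok" never equals "noisy" the restriction would silently return nothing.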
- Replace keep_all_rows with exclude_nonmatching (inverted logic)
- Default behavior now keeps all rows (LEFT JOIN)
- Update query-algebra.md and primary-keys.md specs
- Expand queries tutorial with:
  - Join primary key determination via functional dependencies
  - Entity-to-entity aggregation concept
  - Extension operator (.extend())
  - Universal set (dj.U()) for ad-hoc groupings
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Explain that the default behavior keeps all entities (even without matches)
- Show count(pk_attr) vs count(*) for correct zero counts
- Add exclude_nonmatching=True example for filtering
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
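The count(pk_attr) versus count(*) distinction comes from SQL's LEFT JOIN semantics, which the default aggregation relies on. The sketch shows it with Python's sqlite3 and hypothetical session/trial tables, not DataJoint; in DataJoint the corresponding query would be along the lines of `Session.aggr(Trial, n='count(trial_id)')`. A session with no trials still yields one joined row padded with NULLs, so `COUNT(*)` reports 1 while `COUNT(trial_id)` reports 0, and an inner join (the analogue of `exclude_nonmatching=True`) drops that session entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (session_id INTEGER PRIMARY KEY);
    CREATE TABLE trial (
        session_id INTEGER NOT NULL,
        trial_id   INTEGER NOT NULL,
        PRIMARY KEY (session_id, trial_id)
    );
    INSERT INTO session VALUES (1), (2);   -- session 2 has no trials
    INSERT INTO trial VALUES (1, 1), (1, 2);
""")


def trial_counts(count_expr: str, join: str = "LEFT JOIN") -> list:
    """Count trials per session with the given COUNT expression and join type."""
    return conn.execute(f"""
        SELECT s.session_id, {count_expr}
        FROM session AS s {join} trial AS t ON s.session_id = t.session_id
        GROUP BY s.session_id
        ORDER BY s.session_id
    """).fetchall()


print(trial_counts("COUNT(*)"))               # [(1, 2), (2, 1)]  wrong for session 2
print(trial_counts("COUNT(t.trial_id)"))      # [(1, 2), (2, 0)]
print(trial_counts("COUNT(*)", join="JOIN"))  # [(1, 2)]  session 2 dropped
```

`COUNT(column)` skips NULLs, which is why counting a primary-key attribute of the aggregated table gives correct zeros.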
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Clarify that prompt default is determined by config['safemode']
- Not hardcoded to True or "interactive mode"
- Update best practices section accordingly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
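The rule that `prompt` falls back to `config['safemode']` when not given can be expressed as a small resolution function. This is a hypothetical sketch of the described behavior, assuming `prompt=None` means "not specified"; the function name and the plain-dict config are illustrative, not DataJoint's code.

```python
from typing import Optional


def resolve_prompt(prompt: Optional[bool], config: dict) -> bool:
    """Hypothetical: an explicit prompt wins; otherwise use config['safemode']."""
    return bool(config["safemode"]) if prompt is None else prompt


print(resolve_prompt(None, {"safemode": True}))   # follows the config
print(resolve_prompt(False, {"safemode": True}))  # explicit argument overrides it
```

Deferring to the config keeps scripts and interactive sessions consistent without hardcoding a default in each call.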
User-friendly reference covering all query operators:
- Restriction (&) and anti-restriction (-)
- Projection (.proj())
- Join (*)
- Extension (.extend())
- Aggregation (.aggr())
- Union (+)
- Universal set (dj.U())
- Operator precedence
- Semantic matching explanation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
entity-integrity.md:
- Fix surrogate key definition: used inside database, not exposed to users
- Replace auto_increment with UUID (no auto-increment in DataJoint)
- Update all examples to use core DataJoint types (uint32, float32, etc.)
- Use <blob> for blob storage type
- Use datetime(3) for millisecond, datetime(6) for microsecond precision
computation-model.md:
- Add three-part make model for long-running computations
- Explain make_fetch, make_compute, make_insert pattern
- Document re-fetch verification for referential integrity
- Explain when to use standard vs three-part make
- Fix int to uint32 in Segmentation example
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
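The three-part make pattern with re-fetch verification can be modeled in plain Python. In this hypothetical sketch, `make_fetch`, `make_compute`, and `make_insert` are ordinary functions over an in-memory store whose `transaction()` restores a snapshot on error; in DataJoint they are methods of an auto-populated table, the transaction is the database's, and the exact verification mechanism may differ. What it shows: the slow compute step runs outside any transaction, and the inputs are fetched again inside the transaction so results are inserted only if their inputs did not change in the meantime.

```python
import copy
from contextlib import contextmanager


class Store:
    """Hypothetical in-memory database with snapshot-based transactions."""

    def __init__(self) -> None:
        self.tables: dict = {"raw": {}, "result": {}}

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy(self.tables)
        try:
            yield
        except Exception:
            self.tables = snapshot  # roll back every change made inside the block
            raise


def make_fetch(store, key):
    # Copy the inputs so a later change in the store shows up as a difference.
    return (copy.deepcopy(store.tables["raw"][key]),)


def make_compute(key, samples, on_compute=None):
    if on_compute is not None:  # hook used below to simulate a concurrent edit
        on_compute()
    return (sum(samples) / len(samples),)  # the slow step holds no transaction


def make_insert(store, key, mean):
    store.tables["result"][key] = mean


def make(store, key, on_compute=None):
    fetched = make_fetch(store, key)
    computed = make_compute(key, *fetched, on_compute=on_compute)
    with store.transaction():
        if make_fetch(store, key) != fetched:  # re-fetch and verify inputs
            raise RuntimeError(f"inputs for {key!r} changed during compute")
        make_insert(store, key, *computed)


store = Store()
store.tables["raw"] = {"s1": [1.0, 2.0, 3.0], "s2": [1.0, 1.0]}
make(store, "s1")  # inputs unchanged: result inserted


def edit_s2():  # simulated correction made while s2 is being computed
    store.tables["raw"]["s2"] = [5.0, 5.0]


try:
    make(store, "s2", on_compute=edit_s2)
except RuntimeError as err:
    print(err)  # nothing is inserted for s2
```

A later run for `s2` would fetch the corrected inputs and insert a consistent result.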
- installation.md - Install DataJoint and set up environment
- configure-database.md - Database connection with secrets separation
- define-tables.md - Table definitions with core DataJoint types
- insert-data.md - Insert patterns including transactions
- query-data.md - Query operators quick reference
- fetch-results.md - Output methods and formats
- run-computations.md - populate() and three-part make
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove unnecessary int() and bool() wrappers around boolean values now that datajoint-python properly handles np.bool_ types.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update LICENSE from MIT to Apache 2.0 with copyright:
Copyright 2014-2026 DataJoint Inc. and contributors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tutorials:
- Add tutorial 06: Object Storage (externals, attachments, file stores)
- Add advanced tutorials: custom codecs, distributed computing, migration
- Fix distributed.ipynb multiprocessing demo (explain module requirement)
- Minor updates to tutorials 01-03 for consistency
How-to guides:
- Add 14 new task-oriented guides covering common operations
- Expand index with full guide listing
Explanation:
- Expand entity integrity section
Config:
- Update mkdocs.yaml navigation for new content
- Add new images for documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
installation.md:
- Change mysql-connector-python to pymysql
- Update Python requirement to 3.10+
- Add DataJoint.com as recommended managed service
define-tables.md:
- Add Schema creation explanation
- Separate core types from built-in codecs
- Add json as core type (no angle brackets)
- Document built-in codecs: blob, attach, object@store
- Move indexes to end of definition examples
- Clarify that tables are declared when the @schema decorator is applied
- Add schema.drop() and table.drop() for prototyping
- Use uint16 instead of int in examples
configure-database.md:
- Remove untested multiple connections section
- Add DataJoint.com tip
configure-storage.md:
- Add DataJoint.com tip for pre-configured storage
backup-restore.md:
- Add DataJoint.com tip for automatic backups
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New pages:
- explanation/whats-new-2.md: Summary of DataJoint 2.0 features
- Rename migrate-from-1x.md to migrate-from-0x.md (version jump was 0.14 → 2.0)
define-tables.md:
- Add default values and nullable attributes section
- Document timestamp precision options
- Condense auto-increment section, cross-reference design-primary-keys
- Use "object storage" terminology (not "external storage")
- Improve ObjectRef codec description
use-object-storage.md:
- Define Object-Augmented Schema (OAS)
- Explain internal, hash-addressed, path-addressed storage sections
manage-pipeline-project.md:
- Update platform architecture diagram
- Replace "DataJoint Works" with "DataJoint Platform"
- Update links to datajoint.com/sign-up
Formatting fixes (blank lines before bullet lists):
- model-relationships, define-tables, migrate-from-0x
- monitor-progress, distributed-computing, design-primary-keys
- create-custom-codec, use-object-storage, manage-pipeline-project
Specs: Update dates from 2025 to 2026
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add datajoint_version: "2.0" to mkdocs.yaml extra config
- Create announce.html partial to display version in header banner
- Enable announce.dismiss feature for dismissible banner
- Banner links to "What's New in 2.0" page
This provides a simple version coordination mechanism without full multi-version documentation. When datajoint-python releases a new version, update datajoint_version in mkdocs.yaml.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
manage-large-data.md:
- Replace deprecated .fetch() with .to_arrays()
- Use lazy iteration (for row in table) for streaming
- Use keys() with batch slicing for ID-range batching
whats-new-2.md:
- Add links to relevant specs for each feature section
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
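ID-range batching with `keys()` amounts to slicing the list of primary keys and restricting the table by one slice at a time. The helper below is a hypothetical, database-free sketch of the slicing step; the DataJoint usage in the comment assumes `keys()` returns a list of key dicts and that restricting by a list of dicts selects the union of those keys (`Recording` and `process` are hypothetical names).

```python
from collections.abc import Iterator


def key_batches(keys: list, size: int) -> Iterator:
    """Yield consecutive slices of at most `size` primary-key dicts."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


# Assumed DataJoint usage (needs a live pipeline; Recording/process are hypothetical):
# for batch in key_batches(Recording.keys(), 500):
#     for row in (Recording & batch).to_dicts():
#         process(row)

keys = [{"recording_id": i} for i in range(7)]
print([len(batch) for batch in key_batches(keys, 3)])  # [3, 3, 1]
```

Bounding each restriction to one slice keeps both the generated query and the fetched result to a predictable size.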
- Remove early adopters and contributors paragraph
- Link to original Google Code archive (2009-2011)
- Link to 2015 foundational publication
- Add 2018 theoretical framework paper
- Remove MICrONS paragraph
- Add Vathes LLC re-incorporation as DataJoint Inc. (2024)
- Reorder chronologically
- Add note about hundreds of research labs using DataJoint
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove datajoint.slack.com link, add link to datajoint-python discussions for community support.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updated across all docs (except fetch-api.md migration guide):
- .fetch() → .to_dicts() or .to_arrays()
- .fetch(as_dict=True) → .to_dicts()
- .fetch('KEY') → .keys()
- .fetch(attr) → .to_arrays(attr)
- for row in table.fetch() → for row in table (lazy iteration)
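The mapping above is mechanical enough that simple call sites could be rewritten automatically. The following is a hypothetical regex-based helper, not a tool shipped with DataJoint: it handles only single-line calls with quoted attribute names, leaves a bare `.fetch()` untouched because it maps to either `.to_dicts()` or `.to_arrays()` depending on intent, and anything with other keyword arguments still needs manual review.

```python
import re

# Hypothetical codemod for the fetch-to-2.0 mapping. Order matters: the
# specific forms are rewritten before the generic attribute form.
REWRITES = [
    # for row in table.fetch():  ->  for row in table:   (lazy iteration)
    (re.compile(r"for (\w+) in (\w+)\.fetch\(\):"), r"for \1 in \2:"),
    # .fetch(as_dict=True)  ->  .to_dicts()
    (re.compile(r"\.fetch\(as_dict=True\)"), ".to_dicts()"),
    # .fetch('KEY')  ->  .keys()
    (re.compile(r"""\.fetch\((['"])KEY\1\)"""), ".keys()"),
    # .fetch('a', 'b')  ->  .to_arrays('a', 'b')
    (re.compile(r"""\.fetch\(((?:['"]\w+['"](?:,\s*)?)+)\)"""), r".to_arrays(\1)"),
]


def migrate_line(line: str) -> str:
    """Apply each rewrite in order; unrecognized fetch calls are left as-is."""
    for pattern, replacement in REWRITES:
        line = pattern.sub(replacement, line)
    return line


for old in ["rows = Session.fetch(as_dict=True)",
            "keys = Session.fetch('KEY')",
            "ids, dates = Session.fetch('session_id', 'session_date')",
            "for row in Session.fetch():",
            "data = Session.fetch()"]:
    print(f"{old:58} -> {migrate_line(old)}")
```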
Files updated:
- reference/errors.md
- reference/specs/autopopulate.md, data-manipulation.md, query-algebra.md
- reference/specs/job-metadata.md, type-system.md
- explanation/query-algebra.md, computation-model.md, whats-new-2.md
- how-to/monitor-progress.md, distributed-computing.md, handle-errors.md
- how-to/alter-tables.md, backup-restore.md, define-tables.md
- tutorials/advanced/migration.ipynb (also fixed 1.x → 0.x)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update LICENSE file to CC BY 4.0 (matching README)
- Fix README license link (main -> master)
- Remove `text` from core types lists (it's a native type, not core)
- Remove `time` and `timestamp` from core types in define-tables.md
- Add `bytes` to core types list in define-tables.md
- Ensure consistent core type documentation across all files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Complete restructuring of DataJoint documentation following the Diátaxis framework for DataJoint 2.0.
Documentation Structure
Content
Tutorials (Learning-Oriented)
6 core tutorials + 3 advanced, all tested via `pytest --nbmake`:
How-To Guides (14 guides)
Explanation (Conceptual)
Specifications (10 formal specs)
Table Declaration, Query Algebra, Data Manipulation, Primary Keys,
Semantic Matching, Type System, Codec API, AutoPopulate, Fetch API, Job Metadata
Key Updates
.to_dicts(), .keys(), etc.)
Test Plan
`mkdocs build` succeeds
`pytest --nbmake`)
Related
🤖 Generated with Claude Code