This document specifies the table declaration mechanism in DataJoint Python. Table declarations define the schema structure using a domain-specific language (DSL) embedded in Python class definitions.
@schema
class TableName(dj.Manual):
definition = """
# table comment
primary_attr : int32
---
secondary_attr : float64
"""| Tier | Base Class | Table Prefix | Purpose |
|---|---|---|---|
| Manual | dj.Manual |
(none) | User-entered data |
| Lookup | dj.Lookup |
# |
Reference/enumeration data |
| Imported | dj.Imported |
_ |
Data from external sources |
| Computed | dj.Computed |
__ |
Derived from other tables |
| Part | dj.Part |
master__ |
Detail records of master table |
- Format: Strict CamelCase (e.g.,
MyTable,SessionAnalysis) - Pattern:
^[A-Z][A-Za-z0-9]*$ - Conversion: CamelCase to snake_case for SQL table name
- Examples:
SessionTrial->session_trialProcessedEMG->processed_emg
- Maximum length: 64 characters (MySQL limit)
- Final name: prefix + snake_case(class_name)
- Validation: Checked at declaration time
[table_comment]
primary_key_section
---
secondary_section
# Free-form description of the table purpose
- Must be first non-empty line if present
- Starts with
# - Cannot start with
#: - Stored in MySQL table COMMENT
---
- Three or more dashes
- Separates primary key attributes (above) from secondary attributes (below)
- Required if table has secondary attributes
Each non-empty, non-comment line is one of:
- Attribute definition
- Foreign key reference
- Index declaration
!!! version-added "New in 2.1"
Singleton tables were introduced in DataJoint 2.1.
A singleton table can hold at most one row. It is declared with no attributes in the primary key section:
@schema
class Config(dj.Lookup):
definition = """
# Global configuration
---
setting1 : varchar(100)
setting2 : int32
"""Behavior:
| Operation | Result |
|---|---|
| Insert | Works without specifying a key |
| Second insert | Raises DuplicateError |
fetch1() |
Returns the single row |
heading.primary_key |
Returns [] (empty) |
Use cases:
- Global configuration settings
- Pipeline parameters
- Summary statistics
- State tracking
Implementation:
Internally, singleton tables use a hidden _singleton attribute of type bool as the primary key. This attribute is:
- Automatically created and populated
- Excluded from
heading.attributes - Excluded from
fetch()results - Excluded from join matching
attribute_name [= default_value] : type [# comment]
| Component | Required | Description |
|---|---|---|
attribute_name |
Yes | Identifier for the column |
default_value |
No | Default value (before colon) |
type |
Yes | Data type specification |
comment |
No | Documentation (after #) |
- Pattern:
^[a-z_][a-z0-9_]*$ - Start: Lowercase letter or underscore
- Contains: Lowercase letters, digits, underscores
- Convention: snake_case
3.4 Hidden Attributes
Attributes with names starting with an underscore (_) are hidden. The hidden-attribute mechanism is reserved for platform-managed columns — bookkeeping that DataJoint itself adds to support the data pipeline — and is intentionally not exposed for user-defined attributes. Attempting to declare an attribute name with a leading underscore raises:
DataJointError: Attribute name in line "_hidden: bool" starts with an underscore.
Names with leading underscore are reserved for platform-managed columns
(e.g. _job_start_time, _singleton). Use a regular attribute name; if you
need to control visibility at the call site, use proj().
Platform-managed hidden attributes are added automatically when DataJoint declares certain table types. Users do not write these in the definition; the framework injects them programmatically after parsing.
| Hidden attribute | Added to | Purpose |
|---|---|---|
_job_start_time |
Computed, Imported |
Wall-clock start of the populate call |
_job_duration |
Computed, Imported |
Elapsed seconds for the populate call |
_job_version |
Computed, Imported |
Library version that produced the row |
_singleton |
Singleton tables | Implementation detail of the singleton pattern |
These columns are populated by DataJoint internals via raw SQL during the populate() lifecycle, not via insert/update1. They are filtered out of every public API surface so they don't clutter joins, fetches, or displays.
Behavior. The filter is implemented in Heading.attributes, which all visible code paths consume; raw SQL strings bypass it.
| Context | Hidden attributes |
|---|---|
heading.attributes, heading.names, heading.primary_key |
Excluded |
heading._attributes (internal) |
Included |
Table display / repr / _repr_html_ |
Excluded |
fetch(), fetch1(), to_dicts(), to_pandas() (default) |
Excluded |
fetch("_name") / fetch1("_name") (explicit) |
Rejected (Attribute not found) — use raw SQL via conn.query(...) |
proj("_name") (explicit) |
Rejected (same reason) |
| Natural-join namesake matching | Excluded |
Dict restriction Table & {"_name": value} |
Silently ignored |
String restriction Table & "_name = ..." |
Included (passes to SQL) |
insert(), insert1() |
Rejected — KeyError("`_name` is not in the table heading") |
update1() |
Rejected — DataJointError("Attribute `_name` not found.") |
insert(..., ignore_extra_fields=True) |
Silently dropped (key not written) |
describe() / reverse-engineered definition |
Excluded |
unique index (..., _name) |
Allowed |
Why users can't declare them. Allowing user-defined hidden attributes would expose a feature with no public-API write path (insert/update1 reject the keys; ignore_extra_fields=True drops them silently), no describe() round-trip (the regenerated definition would be missing the column), and silent filtering on dict restrictions. The cases users typically reach for hidden attributes — most commonly an index-backing derived column — are better served by a regular attribute.
Inspecting platform-managed hidden columns:
# Default fetch — hidden columns excluded
results = MyTable.to_dicts()
# To inspect platform-managed hidden columns, query raw SQL.
# The public API (fetch / proj) intentionally rejects them.
conn = MyTable.connection
rows = conn.query(
f"SELECT _job_start_time, _job_duration, _job_version "
f"FROM {MyTable.full_table_name}"
).fetchall()
# String restriction works (passes through to SQL)
MyTable & "_job_start_time > '2024-01-01'"
# Dict restriction is silently dropped — does NOT filter
MyTable & {'_job_start_time': some_date} # ⚠ ignoredUse a regular attribute instead. When you want a column that's part of the schema-level contract (backing an index, storing a derived value, etc.) but isn't featured in default displays, declare it as a regular attribute and use proj() at the call site if you want to omit it from a particular query result. For example, a hash column backing a unique index:
@schema
class TaskParams(dj.Manual):
definition = """
task_id : int32
---
tool : varchar(32)
params : json
params_hash : varchar(32)
unique index (tool, params_hash)
"""
# Inserts work directly:
TaskParams.insert1({'task_id': 1, 'tool': 't', 'params': {...}, 'params_hash': h})
# Dict restrictions work:
TaskParams & {'params_hash': h}
# Hide from a specific result set with proj() if needed:
TaskParams.proj('tool', 'params').fetch()definition = """
# Experimental session with subject and timing info
session_id : int32 # auto-assigned
---
subject_name : varchar(100) # subject identifier
trial_number = 1 : int32 # default to 1
score = null : float32 # nullable
timestamp = CURRENT_TIMESTAMP : datetime # auto-timestamp
notes = '' : varchar(4000) # empty default
"""Scientist-friendly type names with guaranteed semantics:
| Type | SQL Mapping | Size | Description |
|---|---|---|---|
int8 |
tinyint |
1 byte | 8-bit signed integer |
int16 |
tinyint unsigned |
1 byte | 8-bit unsigned integer |
int16 |
smallint |
2 bytes | 16-bit signed integer |
int32 |
smallint unsigned |
2 bytes | 16-bit unsigned integer |
int32 |
int |
4 bytes | 32-bit signed integer |
int64 |
int unsigned |
4 bytes | 32-bit unsigned integer |
int64 |
bigint |
8 bytes | 64-bit signed integer |
int64 |
bigint unsigned |
8 bytes | 64-bit unsigned integer |
float32 |
float |
4 bytes | 32-bit IEEE 754 float |
float64 |
double |
8 bytes | 64-bit IEEE 754 float |
bool |
tinyint |
1 byte | Boolean (0 or 1) |
uuid |
binary(16) |
16 bytes | UUID stored as binary |
bytes |
longblob |
Variable | Binary data (up to 4GB) |
| Type | SQL Mapping | Description |
|---|---|---|
char(N) |
char(N) |
Fixed-length string |
varchar(N) |
varchar(N) |
Variable-length string (max N) |
enum('a','b',...) |
enum(...) |
Enumerated values |
| Type | SQL Mapping | Description |
|---|---|---|
date |
date |
Date (YYYY-MM-DD) |
datetime |
datetime |
Date and time |
datetime(N) |
datetime(N) |
With fractional seconds (0-6) |
| Type | SQL Mapping | Description |
|---|---|---|
json |
json |
JSON document |
decimal(P,S) |
decimal(P,S) |
Fixed-point decimal |
These SQL types are accepted but generate a warning recommending core types:
- Integer variants:
tinyint,smallint,mediumint,bigint,integer,serial - Float variants:
float,double,real(with size specifiers) - Text variants:
tinytext,mediumtext,longtext - Blob variants:
tinyblob,smallblob,mediumblob,longblob - Temporal:
time,timestamp,year - Numeric:
numeric(P,S)
Format: <codec_name> or <codec_name@store>
| Codec | In-table dtype | In-store dtype | Purpose |
|---|---|---|---|
<blob> |
bytes |
<hash> |
Serialized Python objects |
<hash> |
N/A (in-store only) | json |
Hash-addressed deduped storage |
<attach> |
bytes |
<hash> |
File attachments with filename |
<filepath> |
N/A (in-store only) | json |
Reference to managed file |
<object> |
N/A (in-store only) | json |
Object storage (Zarr, HDF5) |
In-store storage syntax:
<blob@>- default store<blob@store_name>- named store
Core types and codecs are stored in the SQL COMMENT field for reconstruction:
COMMENT ':float32:user comment here'
COMMENT ':<blob@store>:user comment'attribute_name = default_value : type
| Value | Meaning | SQL |
|---|---|---|
null |
Nullable attribute | DEFAULT NULL |
CURRENT_TIMESTAMP |
Server timestamp | DEFAULT CURRENT_TIMESTAMP |
"string" or 'string' |
String literal | DEFAULT "string" |
123 |
Numeric literal | DEFAULT 123 |
true/false |
Boolean | DEFAULT 1/DEFAULT 0 |
These values are used without quotes in SQL:
NULLCURRENT_TIMESTAMP
score = null : float32
- The special default
null(case-insensitive) makes the attribute nullable - Nullable attributes can be omitted from INSERT
- Primary key attributes CANNOT be nullable
Blob and JSON attributes can only have null as default:
# Valid
data = null : <blob>
# Invalid - raises DataJointError
data = '' : <blob>-> [options] ReferencedTable
| Option | Effect |
|---|---|
nullable |
All inherited attributes become nullable |
unique |
Creates UNIQUE INDEX on FK attributes |
Options are comma-separated in brackets:
-> [nullable, unique] ParentTable
Foreign keys automatically inherit all primary key attributes from the referenced table:
# Parent
class Subject(dj.Manual):
definition = """
subject_id : int32
---
name : varchar(100)
"""
# Child - inherits subject_id
class Session(dj.Manual):
definition = """
-> Subject
session_id : int32
---
session_date : date
"""| Position | Effect |
|---|---|
Before --- |
FK attributes become part of primary key |
After --- |
FK attributes are secondary |
-> [nullable] OptionalParent
- Only allowed after
---(secondary) - Primary key FKs cannot be nullable
- Creates optional relationship
-> [unique] ParentTable
- Creates UNIQUE INDEX on inherited attributes
- Enforces one-to-one relationship from child perspective
-> [nullable, unique] ParentTable
- Combines nullable and unique constraints
- Multiple rows can have NULL values (SQL standard: NULLs are not considered equal in UNIQUE constraints)
- At most one row per non-NULL parent reference
- Use case: optional one-to-one relationships where the child may not reference any parent
-> Parent.proj(alias='original_name')
- Reference same table multiple times with different attribute names
- Useful for self-referential or multi-reference patterns
All foreign keys use:
ON UPDATE CASCADE- Parent key changes propagateON DELETE RESTRICT- Cannot delete parent with children
Foreign key relationships are recorded in the ~lineage table:
{
'child_attr': ('parent_schema.parent_table', 'parent_attr')
}Used for semantic attribute matching in queries.
index(attr1, attr2, ...)
unique index(attr1, attr2, ...)
definition = """
# User contact information
user_id : int32
---
first_name : varchar(50)
last_name : varchar(50)
email : varchar(100)
index(last_name, first_name)
unique index(email)
"""Indexes can include SQL expressions:
index(last_name, (YEAR(birth_date)))
- Cannot be altered after table creation (via
table.alter()) - Must reference existing attributes
@schema
class Master(dj.Manual):
definition = """
master_id : int32
"""
class Detail(dj.Part):
definition = """
-> master
detail_id : int32
---
value : float32
"""- SQL name:
master_table__part_name - Example:
experiment__trial
Within Part definition, use:
-> master(lowercase keyword)-> MasterClassName(class name)
- Parts must reference their master
- Cannot delete Part records directly (use master)
- Cannot drop Part table directly (use master)
- Part inherits master's primary key
dj.Imported- Data from external sourcesdj.Computed- Derived from other DataJoint tables
All primary key attributes must come from foreign key references.
Valid:
class Analysis(dj.Computed):
definition = """
-> Session
-> Parameter
---
result : float64
"""Invalid (by default):
class Analysis(dj.Computed):
definition = """
-> Session
analysis_id : int32 # ERROR: non-FK primary key
---
result : float64
"""Override:
dj.config['jobs.allow_new_pk_fields_in_computed_tables'] = TrueWhen config['jobs.add_job_metadata'] = True, auto-populated tables receive:
| Column | Type | Description |
|---|---|---|
_job_start_time |
datetime(3) |
Job start timestamp |
_job_duration |
float64 |
Duration in seconds |
_job_version |
varchar(64) |
Code version |
| Check | Error |
|---|---|
| Unknown type | DataJointError: Unsupported attribute type |
| Invalid attribute name | DataJointError: Declaration error |
Comment starts with : |
DataJointError: comment must not start with colon |
| Non-null blob default | DataJointError: default value for blob can only be NULL |
| Check | Error |
|---|---|
| Table name > 64 chars | DataJointError: Table name exceeds max length |
| No primary key | DataJointError: Table must have a primary key |
| Nullable primary key attr | DataJointError: Primary key attributes cannot be nullable |
| Invalid CamelCase | DataJointError: Invalid table name |
| FK resolution failure | DataJointError: Foreign key reference could not be resolved |
The table.validate() method checks:
- Required fields present
- NULL constraints satisfied
- Primary key completeness
- Codec validation (if defined)
- UUID format
- JSON serializability
CREATE TABLE `schema`.`table_name` (
`attr1` TYPE1 NOT NULL COMMENT "...",
`attr2` TYPE2 DEFAULT NULL COMMENT "...",
PRIMARY KEY (`pk1`, `pk2`),
FOREIGN KEY (`fk_attr`) REFERENCES `parent` (`pk`)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (`idx_attr`),
UNIQUE INDEX (`uniq_attr`)
) ENGINE=InnoDB COMMENT="table comment"Core types and codecs are preserved in comments:
`value` float NOT NULL COMMENT ":float32:measurement value"
`data` longblob DEFAULT NULL COMMENT ":<blob>:serialized data"
`archive` json DEFAULT NULL COMMENT ":<blob@cold>:in-store data"| File | Purpose |
|---|---|
declare.py |
Definition parsing, SQL generation |
heading.py |
Attribute metadata, type reconstruction |
table.py |
Base Table class, declaration interface |
user_tables.py |
Tier classes (Manual, Computed, etc.) |
schemas.py |
Schema binding, table decoration |
codecs.py |
Codec registry and resolution |
lineage.py |
Attribute lineage tracking |
Potential improvements identified for the declaration system:
- Better error messages with suggestions and context
- Import-time validation via
__init_subclass__ - Parser alternatives (regex-based for simpler grammar)
- SQL dialect abstraction for multi-database support
- Extended constraints (CHECK, custom validation)
- Migration support for schema evolution
- Definition caching for performance
- IDE tooling support via structured intermediate representation