Commit ae0b01c

Merge pull request #132 from datajoint/pre/v2.1
2 parents fe7d5d3 + bb774bc commit ae0b01c

2 files changed, with 63 additions and 35 deletions

src/explanation/type-system.md

Lines changed: 62 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -12,49 +12,69 @@ database efficiency with Python convenience.
1212
graph TB
1313
subgraph "Layer 3: Codecs"
1414
blob["‹blob›"]
15+
blob_at["‹blob@›"]
1516
attach["‹attach›"]
17+
attach_at["‹attach@›"]
1618
npy["‹npy@›"]
1719
object["‹object@›"]
20+
filepath["‹filepath@›"]
1821
hash["‹hash@›"]
19-
custom["‹custom›"]
22+
plugin["‹plugin›"]
2023
end
2124
subgraph "Layer 2: Core Types"
2225
int32
2326
float64
2427
varchar
2528
json
2629
bytes
30+
uuid
2731
end
28-
subgraph "Layer 1: Native"
29-
INT["INT"]
30-
DOUBLE["DOUBLE"]
31-
VARCHAR["VARCHAR"]
32+
subgraph "Layer 1: Native Types (MySQL / PostgreSQL)"
33+
INT["INT / INTEGER"]
34+
DOUBLE["DOUBLE / DOUBLE PRECISION"]
35+
VARCHAR_N["VARCHAR"]
3236
JSON_N["JSON"]
33-
BLOB["LONGBLOB"]
37+
BYTES_N["LONGBLOB / BYTEA"]
38+
UUID_N["BINARY(16) / UUID"]
3439
end
3540
3641
blob --> bytes
42+
blob_at --> hash
3743
attach --> bytes
44+
attach_at --> hash
45+
hash --> json
3846
npy --> json
3947
object --> json
40-
hash --> json
41-
bytes --> BLOB
48+
filepath --> json
49+
50+
bytes --> BYTES_N
4251
json --> JSON_N
4352
int32 --> INT
4453
float64 --> DOUBLE
45-
varchar --> VARCHAR
54+
varchar --> VARCHAR_N
55+
uuid --> UUID_N
4656
```
4757

58+
Core types provide **portability** — the same table definition works on both MySQL and PostgreSQL. For example, `bytes` maps to `LONGBLOB` on MySQL but `BYTEA` on PostgreSQL; `uuid` maps to `BINARY(16)` on MySQL but native `UUID` on PostgreSQL. Native types can be used directly but sacrifice cross-backend compatibility.
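
The layering in the diagram can be read as a lookup chain: a codec resolves to another codec or to a core type, and a core type resolves to a backend-specific native type. The following is a minimal sketch of that idea, transcribed from the edges and node labels in the diagram. The names `CODEC_TO_NEXT`, `CORE_TO_NATIVE`, and `resolve_native` are hypothetical and chosen for illustration; they are not part of DataJoint's API.

```python
# Illustrative lookup tables copied from the diagram; NOT DataJoint's API.

# Layer 3 -> next step (another codec or a Layer 2 core type)
CODEC_TO_NEXT = {
    "<blob>": "bytes",
    "<blob@>": "<hash@>",
    "<attach>": "bytes",
    "<attach@>": "<hash@>",
    "<hash@>": "json",
    "<npy@>": "json",
    "<object@>": "json",
    "<filepath@>": "json",
}

# Layer 2 -> Layer 1, per backend
CORE_TO_NATIVE = {
    "int32": {"mysql": "INT", "postgresql": "INTEGER"},
    "float64": {"mysql": "DOUBLE", "postgresql": "DOUBLE PRECISION"},
    "varchar": {"mysql": "VARCHAR", "postgresql": "VARCHAR"},
    "json": {"mysql": "JSON", "postgresql": "JSON"},
    "bytes": {"mysql": "LONGBLOB", "postgresql": "BYTEA"},
    "uuid": {"mysql": "BINARY(16)", "postgresql": "UUID"},
}


def resolve_native(type_name: str, backend: str) -> str:
    """Follow codec links down to a core type, then return the native type."""
    while type_name in CODEC_TO_NEXT:
        type_name = CODEC_TO_NEXT[type_name]
    return CORE_TO_NATIVE[type_name][backend]


print(resolve_native("<blob>", "mysql"))        # LONGBLOB
print(resolve_native("<blob@>", "postgresql"))  # JSON
print(resolve_native("uuid", "postgresql"))     # UUID
```

Note that `<blob@>` ends at `JSON` rather than a binary column: per the diagram, the database holds only metadata for object-store codecs, while the payload lives in the store.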
59+
4860
## Layer 1: Native Database Types
4961

50-
Backend-specific types (MySQL, PostgreSQL). **Discouraged for direct use.**
62+
Backend-specific types. **Can be used directly at the cost of portability.**
5163

5264
```python
53-
# Native types (avoid)
54-
column : TINYINT UNSIGNED
55-
column : MEDIUMBLOB
65+
# Native types — work but not portable
66+
column : TINYINT UNSIGNED # MySQL only
67+
column : MEDIUMBLOB # MySQL only (use BYTEA on PostgreSQL)
68+
column : SERIAL # PostgreSQL only
5669
```
5770

71+
| MySQL | PostgreSQL | Portable Alternative |
72+
|-------|------------|---------------------|
73+
| `LONGBLOB` | `BYTEA` | `bytes` |
74+
| `BINARY(16)` | `UUID` | `uuid` |
75+
| `SMALLINT` | `SMALLINT` | `int16` |
76+
| `DOUBLE` | `DOUBLE PRECISION` | `float64` |
77+
5878
## Layer 2: Core DataJoint Types
5979

6080
Standardized, scientist-friendly types that work identically across backends.
@@ -106,24 +126,24 @@ Codec types use angle bracket notation:
106126

107127
### Built-in Codecs
108128

109-
| Codec | Database | Object Store | Returns |
110-
|-------|----------|--------------|---------|
111-
| `<blob>` ||`<blob@>` | Python object |
112-
| `<attach>` ||`<attach@>` | Local file path |
113-
| `<npy@>` ||| NpyRef (lazy) |
114-
| `<object@>` ||| ObjectRef |
115-
| `<hash@>` ||| bytes |
116-
| `<filepath@>` ||| ObjectRef |
129+
| Codec | Database | Object Store | Addressing | Returns |
130+
|-------|----------|--------------|------------|---------|
131+
| `<blob>` ||`<blob@>` | Hash | Python object |
132+
| `<attach>` ||`<attach@>` | Hash | Local file path |
133+
| `<npy@>` ||| Schema | NpyRef (lazy) |
134+
| `<object@>` ||| Schema | ObjectRef |
135+
| `<hash@>` ||| Hash | bytes |
136+
| `<filepath@>` ||| | ObjectRef |
117137

118138
### Plugin Codecs
119139

120-
Additional codecs are available as separately installed packages. This ecosystem is actively expanding—new codecs are added as community needs arise.
140+
Additional schema-addressed codecs are available as separately installed packages. This ecosystem is actively expanding—new codecs are added as community needs arise.
121141

122142
| Package | Codec | Description | Repository |
123143
|---------|-------|-------------|------------|
124-
| `dj-zarr-codecs` | `<zarr@>` | Zarr arrays with lazy chunked access | [datajoint/dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs) |
125-
| `dj-figpack-codecs` | `<figpack@>` | Interactive browser visualizations | [datajoint/dj-figpack-codecs](https://github.com/datajoint/dj-figpack-codecs) |
126-
| `dj-photon-codecs` | `<photon@>` | Photon imaging data formats | [datajoint/dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs) |
144+
| `dj-zarr-codecs` | `<zarr@>` | Schema-addressed Zarr arrays with lazy chunked access | [datajoint/dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs) |
145+
| `dj-figpack-codecs` | `<figpack@>` | Schema-addressed interactive browser visualizations | [datajoint/dj-figpack-codecs](https://github.com/datajoint/dj-figpack-codecs) |
146+
| `dj-photon-codecs` | `<photon@>` | Schema-addressed photon imaging data formats | [datajoint/dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs) |
127147

128148
**Installation and discovery:**
129149

@@ -208,7 +228,7 @@ class Config(dj.Manual):
208228

209229
### `<npy@>` — NumPy Arrays as .npy Files
210230

211-
Stores NumPy arrays as standard `.npy` files with lazy loading. Returns `NpyRef` which provides metadata access (shape, dtype) without downloading.
231+
Schema-addressed storage for NumPy arrays as standard `.npy` files. Returns `NpyRef` which provides metadata access (shape, dtype) without downloading.
212232

213233
```python
214234
class Recording(dj.Computed):
@@ -241,19 +261,21 @@ result = np.mean(ref) # Downloads automatically
241261
- **Safe bulk fetch**: Fetching many rows doesn't download until needed
242262
- **Memory mapping**: `ref.load(mmap_mode='r')` for random access to large arrays
243263

244-
### `<object@>` — Path-Addressed Storage
264+
### `<object@>` — Schema-Addressed Storage
245265

246-
For large/complex file structures (Zarr, HDF5). Path derived from primary key.
266+
Schema-addressed storage for files and folders. Path mirrors the database structure: `{schema}/{table}/{pk}/{attribute}`.
247267

248268
```python
249269
class ProcessedData(dj.Computed):
250270
definition = """
251271
-> Recording
252272
---
253-
zarr_data : <object@> # Stored at {schema}/{table}/{pk}/
273+
results : <object@> # Stored at {schema}/{table}/{pk}/results/
254274
"""
255275
```
256276

277+
Accepts files, folders, or bytes. Returns `ObjectRef` for lazy access.
278+
257279
### `<filepath@store>` — Portable References
258280

259281
References to independently-managed files with portable paths.
@@ -269,11 +291,17 @@ class RawData(dj.Manual):
269291

270292
## Storage Modes
271293

272-
| Mode | Database | Object Store | Use Case |
273-
|------|----------|--------------|----------|
274-
| Database | Data || Small data |
275-
| Hash-addressed | Metadata | Deduplicated | Large/repeated data |
276-
| Path-addressed | Metadata | PK-based path | Complex files |
294+
Object store codecs use one of two addressing schemes:
295+
296+
**Hash-addressed** — Path derived from content hash (e.g., `_hash/ab/cd/abcd1234...`). Provides automatic deduplication—identical content stored once. Used by `<blob@>`, `<attach@>`, `<hash@>`.
297+
298+
**Schema-addressed** — Path mirrors database structure: `{schema}/{table}/{pk}/{attribute}`. Human-readable, browsable paths that reflect your data organization. No deduplication. Used by `<object@>`, `<npy@>`, and plugin codecs (`<zarr@>`, `<figpack@>`, `<photon@>`).
299+
300+
| Mode | Database | Object Store | Deduplication | Use Case |
301+
|------|----------|--------------|---------------|----------|
302+
| Database | Data ||| Small data |
303+
| Hash-addressed | Metadata | Content hash path | ✅ Automatic | Large/repeated data |
304+
| Schema-addressed | Metadata | Schema-mirrored path | ❌ None | Complex files, browsable storage |
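
The difference between the two addressing schemes can be sketched with two small path builders. This is an illustration under stated assumptions, not DataJoint's implementation: the function names are hypothetical, SHA-256 is assumed as the content hash (the docs do not name the algorithm here), and the `name=value` primary-key segment is assumed from the example path `my_schema/recording/recording_id=1/waveform.npy` in the npy codec spec.

```python
import hashlib


def hash_addressed_path(content: bytes) -> str:
    """Build a path from the content hash, following `_hash/ab/cd/abcd1234...`.

    Assumption: SHA-256 hex digest; the real hash function may differ.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"_hash/{digest[:2]}/{digest[2:4]}/{digest}"


def schema_addressed_path(schema: str, table: str, pk: dict, attribute: str) -> str:
    """Build a path of the form {schema}/{table}/{pk}/{attribute}.

    Assumption: each primary-key field is rendered as `name=value`, and
    multiple fields are joined with `/`.
    """
    pk_part = "/".join(f"{name}={value}" for name, value in pk.items())
    return f"{schema}/{table}/{pk_part}/{attribute}"


# Identical content always maps to the same path, so it is stored once.
a = hash_addressed_path(b"same bytes")
b = hash_addressed_path(b"same bytes")
print(a == b)  # True

# The schema-addressed path depends on where the row lives, not on content.
print(schema_addressed_path("my_schema", "recording", {"recording_id": 1}, "waveform"))
# my_schema/recording/recording_id=1/waveform
```

Because a hash-addressed path depends only on content, two rows holding the same bytes share one stored object. A schema-addressed path depends only on the row's location, so identical content in two rows is stored twice, which is the "no deduplication" trade-off in the table above.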
277305

278306
## Custom Codecs
279307

src/reference/specs/npy-codec.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -287,4 +287,4 @@ arr = np.load('/path/to/store/my_schema/recording/recording_id=1/waveform.npy')
287287

288288
- [Type System Specification](type-system.md) - Complete type system overview
289289
- [Codec API](codec-api.md) - Creating custom codecs
290-
- [Object Storage](type-system.md#object--path-addressed-storage) - Path-addressed storage details
290+
- [Object Storage](type-system.md#object--schema-addressed-storage) - Schema-addressed storage details
