Commit cb14686

Merge pull request #140 from datajoint/feat/diagram-improvements
fix: update datajoint-python ref to pre/v2.1
2 parents 200b0c4 + bb3ad66 commit cb14686

6 files changed: +317 -5 lines changed

.github/workflows/development.yml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ jobs:
         uses: actions/checkout@v2
         with:
           repository: datajoint/datajoint-python
-          ref: pre/v2.0
+          ref: pre/v2.1
           path: datajoint-python
       - name: Compile docs static artifacts
         run: |

mkdocs.yaml

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ nav:
       - Read Diagrams: how-to/read-diagrams.ipynb
     - Project Management:
       - Manage Pipeline Project: how-to/manage-pipeline-project.md
+      - Deploy to Production: how-to/deploy-production.md
     - Data Operations:
       - Insert Data: how-to/insert-data.md
       - Query Data: how-to/query-data.md

src/how-to/deploy-production.md

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
# Deploy to Production

Configure DataJoint for production environments with controlled schema changes and project isolation.

## Overview

Development and production environments have different requirements:

| Concern | Development | Production |
|---------|-------------|------------|
| Schema changes | Automatic table creation | Controlled, explicit changes only |
| Naming | Ad-hoc schema names | Consistent project prefixes |
| Configuration | Local settings | Environment-based |

DataJoint 2.0 provides settings that enforce production discipline: `database.create_tables` controls automatic table creation, and `database.schema_prefix` sets a project-wide schema prefix.

## Prevent Automatic Table Creation

By default, DataJoint creates tables automatically when you first access them. This is convenient during development but dangerous in production: a typo or code bug could create unintended tables.

### Enable Production Mode

Set `create_tables=False` to prevent automatic table creation:

```python
import datajoint as dj

# Production mode: no automatic table creation
dj.config.database.create_tables = False
```

Or via environment variable:

```bash
export DJ_CREATE_TABLES=false
```

Or in `datajoint.json`:

```json
{
  "database": {
    "create_tables": false
  }
}
```

### What Changes

The table below compares behavior with the setting enabled (development) and disabled (production):

| Action | Development (True) | Production (False) |
|--------|-------------------|-------------------|
| Access existing table | Works | Works |
| Access missing table | Creates it | **Raises error** |
| Explicit `Schema(create_tables=True)` | Creates | Creates (override) |

### Example: Production Safety

```python
import datajoint as dj

dj.config.database.create_tables = False
schema = dj.Schema('myproject_ephys')

@schema
class Recording(dj.Manual):
    definition = """
    recording_id : int
    ---
    path : varchar(255)
    """

# If the table does not exist in the database:
Recording()  # Raises DataJointError: Table not found
```

### Override for Migrations

To create tables during a controlled migration, override the setting for a single schema:

```python
# Explicit override for this schema only
schema = dj.Schema('myproject_ephys', create_tables=True)

@schema
class NewTable(dj.Manual):
    definition = """..."""

NewTable()  # Creates the table
```
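
The rules in this section reduce to a small decision function: an explicit per-schema argument wins, otherwise the global setting applies. The sketch below models only that documented behavior and is not DataJoint's implementation; `may_create_table` and its parameter names are hypothetical.

```python
from typing import Optional


def may_create_table(global_create_tables: bool,
                     schema_override: Optional[bool] = None) -> bool:
    """Return True if a missing table would be created on first access.

    Hypothetical model: `global_create_tables` stands for
    `dj.config.database.create_tables`; `schema_override` stands for the
    `create_tables=` argument of `dj.Schema` (None when it is not passed).
    """
    if schema_override is not None:
        # An explicit per-schema argument wins over the global default.
        return schema_override
    return global_create_tables


# Development default: missing tables are created.
print(may_create_table(True))
# Production mode: accessing a missing table raises an error instead.
print(may_create_table(False))
# Controlled migration: explicit override for one schema.
print(may_create_table(False, schema_override=True))
```

Keeping the override scoped to one `Schema` object means the rest of the pipeline stays in production mode while the migration runs.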
92+
93+
## Use Schema Prefixes
94+
95+
When multiple projects share a database server, use prefixes to avoid naming collisions and organize schemas.
96+
97+
### Configure Project Prefix
98+
99+
```python
100+
import datajoint as dj
101+
102+
dj.config.database.schema_prefix = 'myproject_'
103+
```
104+
105+
Or via environment variable:
106+
107+
```bash
108+
export DJ_SCHEMA_PREFIX=myproject_
109+
```
110+
111+
Or in `datajoint.json`:
112+
113+
```json
114+
{
115+
"database": {
116+
"schema_prefix": "myproject_"
117+
}
118+
}
119+
```
120+
121+
### Apply Prefix to Schemas
122+
123+
Use the prefix when creating schemas:
124+
125+
```python
126+
import datajoint as dj
127+
128+
prefix = dj.config.database.schema_prefix # 'myproject_'
129+
130+
# Schema names include prefix
131+
subject_schema = dj.Schema(prefix + 'subject') # myproject_subject
132+
session_schema = dj.Schema(prefix + 'session') # myproject_session
133+
ephys_schema = dj.Schema(prefix + 'ephys') # myproject_ephys
134+
```
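
To avoid repeating the concatenation in every module, a project might wrap it in a helper. The sketch below reads the prefix from the `DJ_SCHEMA_PREFIX` environment variable so that it runs without a database connection; `schema_name` is a hypothetical helper, not part of DataJoint.

```python
import os
from typing import Optional


def schema_name(base: str, prefix: Optional[str] = None) -> str:
    """Build a full schema name from a project prefix and a base name.

    Hypothetical helper: when `prefix` is None, fall back to the
    DJ_SCHEMA_PREFIX environment variable (empty string if unset).
    """
    if not base:
        raise ValueError("base schema name must not be empty")
    if prefix is None:
        prefix = os.environ.get("DJ_SCHEMA_PREFIX", "")
    # Avoid double-prefixing if the caller already passed a full name.
    if prefix and base.startswith(prefix):
        return base
    return prefix + base


print(schema_name("ephys", prefix="myproject_"))  # myproject_ephys
```

The returned string can be passed to `dj.Schema(...)` in place of the manual `prefix + 'ephys'` expression.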

### Benefits

- **Isolation**: Multiple projects coexist without conflicts
- **Visibility**: Easy to identify which schemas belong to which project
- **Permissions**: Grant access by prefix pattern (`myproject\_%` in MySQL grant syntax)
- **Cleanup**: Drop all project schemas by prefix

### Database Permissions by Prefix

```sql
-- Grant access to all schemas with prefix
GRANT ALL PRIVILEGES ON `myproject\_%`.* TO 'developer'@'%';

-- Read-only access to another project
GRANT SELECT ON `otherproject\_%`.* TO 'developer'@'%';
```
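
The backslash in these patterns matters: in a MySQL database-level `GRANT`, `_` and `%` are wildcards, so an unescaped `myproject_%` would also match a name such as `myprojectX1`. A minimal sketch of deriving the escaped pattern from a prefix follows; `grant_pattern` is a hypothetical helper.

```python
def grant_pattern(prefix: str) -> str:
    """Return a MySQL GRANT database pattern for schemas starting with `prefix`.

    Literal `_` and `%` in the prefix are escaped with a backslash so they
    are not treated as wildcards; the trailing `%` matches the rest of the name.
    """
    escaped = prefix.replace("_", "\\_").replace("%", "\\%")
    return escaped + "%"


print(grant_pattern("myproject_"))  # myproject\_%
```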

## Environment-Based Configuration

Use different configurations for development, staging, and production.

### Configuration Hierarchy

DataJoint loads settings in priority order:

1. **Environment variables** (highest priority)
2. **Secrets directory** (`.secrets/`)
3. **Config file** (`datajoint.json`)
4. **Defaults** (lowest priority)
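
The priority order can be illustrated with a small resolver. This is a sketch of the lookup order only, not DataJoint's loader: `resolve_setting`, the dotted `database.user`-style keys, and the one-file-per-setting layout of the secrets directory (modeled on the `.secrets/database.user` example in this guide) are assumptions.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional


def resolve_setting(key: str,
                    env_var: Optional[str],
                    secrets_dir: Path,
                    config_file: Path,
                    defaults: Mapping[str, Any]) -> Any:
    """Resolve one setting: environment > secrets dir > config file > defaults."""
    # 1. Environment variable (highest priority).
    if env_var and env_var in os.environ:
        return os.environ[env_var]
    # 2. Secrets directory: one file per setting, named after the key.
    secret = secrets_dir / key
    if secret.is_file():
        return secret.read_text().strip()
    # 3. Config file: nested JSON addressed by the dotted key.
    if config_file.is_file():
        node: Any = json.loads(config_file.read_text())
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                break
            node = node[part]
        else:
            # Every part of the key was found.
            return node
    # 4. Built-in default (lowest priority).
    return defaults.get(key)
```

Note that values read from the environment or from secret files are strings, while values from the JSON file keep their JSON types; a real loader would need to coerce them.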

### Development Setup

**datajoint.json** (committed):
```json
{
  "database": {
    "host": "localhost",
    "create_tables": true
  }
}
```

**.secrets/database.user**:
```
dev_user
```

### Production Setup

Override the committed settings with environment variables:

```bash
# Production database
export DJ_HOST=prod-db.example.com
export DJ_USER=prod_user
export DJ_PASS=prod_password

# Production mode
export DJ_CREATE_TABLES=false
export DJ_SCHEMA_PREFIX=myproject_

# Disable interactive prompts
export DJ_SAFEMODE=false
```

### Docker/Kubernetes Example

```yaml
# docker-compose.yaml
# (top-level `secrets:` definitions omitted)
services:
  worker:
    image: my-pipeline:latest
    environment:
      - DJ_HOST=db.example.com
      - DJ_USER_FILE=/run/secrets/db_user
      - DJ_PASS_FILE=/run/secrets/db_password
      - DJ_CREATE_TABLES=false
      - DJ_SCHEMA_PREFIX=prod_
    secrets:
      - db_user
      - db_password
```
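
The `DJ_USER_FILE` and `DJ_PASS_FILE` entries follow the common container convention of pointing a `*_FILE` variable at a mounted secret. Whether DataJoint resolves these variables itself is not confirmed here; if it does not, a container entrypoint could resolve them before importing DataJoint. The helper below is a hypothetical sketch of that step.

```python
import os
from pathlib import Path
from typing import MutableMapping, Sequence


def load_file_secrets(names: Sequence[str] = ("DJ_USER", "DJ_PASS"),
                      env: MutableMapping[str, str] = os.environ) -> None:
    """For each NAME, copy the contents of the file at NAME_FILE into NAME.

    A directly set NAME takes precedence and is left untouched.
    """
    for name in names:
        file_var = f"{name}_FILE"
        if name in env or file_var not in env:
            continue
        # Secret files usually end with a newline; strip it.
        env[name] = Path(env[file_var]).read_text().strip()
```

One way to use it is to call `load_file_secrets()` at the top of the worker entrypoint, before `import datajoint`, so the library sees ordinary `DJ_USER` and `DJ_PASS` values.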

## Complete Production Configuration

### datajoint.json (committed)

```json
{
  "database": {
    "host": "localhost",
    "port": 3306
  },
  "stores": {
    "default": "main",
    "main": {
      "protocol": "s3",
      "endpoint": "s3.amazonaws.com",
      "bucket": "my-org-data",
      "location": "myproject"
    }
  }
}
```

### Production Environment Variables

```bash
# Database
export DJ_HOST=prod-mysql.example.com
export DJ_USER=prod_service
export DJ_PASS="<from-secret-manager>"  # placeholder: inject the real value from your secret manager

# Production behavior
export DJ_CREATE_TABLES=false
export DJ_SCHEMA_PREFIX=prod_
export DJ_SAFEMODE=false

# Logging
export DJ_LOG_LEVEL=WARNING
```

### Verification Script

```python
#!/usr/bin/env python
"""Verify production configuration before deployment."""
import sys

import datajoint as dj


def verify_production_config():
    """Check that production settings are correctly applied."""
    errors = []

    # Check create_tables is disabled
    if dj.config.database.create_tables:
        errors.append("create_tables should be False in production")

    # Check schema prefix is set
    if not dj.config.database.schema_prefix:
        errors.append("schema_prefix should be set in production")

    # Check not pointing to localhost
    if dj.config.database.host == 'localhost':
        errors.append("database.host is localhost - expected production host")

    if errors:
        for e in errors:
            print(f"ERROR: {e}")
        return False

    print("Production configuration verified")
    return True


if __name__ == '__main__':
    # Exit non-zero so a deployment pipeline can stop on failure
    sys.exit(0 if verify_production_config() else 1)
```
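
Because the script reads `dj.config` directly, it can only be exercised with DataJoint installed and configured. For unit tests, the same three checks can be written against plain values. The function below is a sketch under that assumption; `check_production_settings` is a hypothetical name.

```python
from typing import List


def check_production_settings(create_tables: bool,
                              schema_prefix: str,
                              host: str) -> List[str]:
    """Return a list of problems; an empty list means all checks pass."""
    errors = []
    if create_tables:
        errors.append("create_tables should be False in production")
    if not schema_prefix:
        errors.append("schema_prefix should be set in production")
    if host == "localhost":
        errors.append("database.host is localhost - expected production host")
    return errors


print(check_production_settings(False, "prod_", "prod-mysql.example.com"))  # []
```

Returning the list instead of printing keeps the checks reusable: a CLI wrapper can print each entry, while a test can assert on the exact messages.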

## Summary

| Setting | Development | Production |
|---------|-------------|------------|
| `database.create_tables` | `true` | `false` |
| `database.schema_prefix` | `""` or `dev_` | `prod_` |
| `safemode` | `true` | `false` (automated) |
| `loglevel` | `DEBUG` | `WARNING` |

## See Also

- [Manage Pipeline Project](manage-pipeline-project.md): Project organization
- [Configuration Reference](../reference/configuration.md): All settings
- [Manage Secrets](manage-secrets.md): Credential management

src/how-to/index.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ they assume you understand the basics and focus on getting things done.
 ## Project Management

 - [Manage a Pipeline Project](manage-pipeline-project.md): Multi-schema pipelines, team collaboration
+- [Deploy to Production](deploy-production.md): Production mode, schema prefixes, environment config

 ## Data Operations

src/how-to/manage-pipeline-project.md

Lines changed: 1 addition & 0 deletions
@@ -368,6 +368,7 @@ These challenges grow with team size and pipeline complexity. The [DataJoint Pla

 ## See Also

+- [Deploy to Production](deploy-production.md): Production mode and environment configuration
 - [Data Pipelines](../explanation/data-pipelines.md): Conceptual overview and architecture
 - [Configure Object Storage](configure-storage.md): Storage setup
 - [Distributed Computing](distributed-computing.md): Multi-worker pipelines

src/reference/configuration.md

Lines changed: 6 additions & 4 deletions
@@ -15,12 +15,14 @@ Configuration is loaded in priority order:

 | Setting | Environment | Default | Description |
 |---------|-------------|---------|-------------|
-| `database.host` | `DJ_HOST` | `localhost` | MySQL server hostname |
-| `database.port` | `DJ_PORT` | `3306` | MySQL server port |
-| `database.user` | `DJ_USER` || Database username |
-| `database.password` | `DJ_PASS` || Database password |
+| `database.host` | `DJ_HOST` | `localhost` | Database server hostname |
+| `database.port` | `DJ_PORT` | `3306` | Database server port |
+| `database.user` | `DJ_USER` || Database username (required) |
+| `database.password` | `DJ_PASS` || Database password (required) |
 | `database.reconnect` || `True` | Auto-reconnect on connection loss |
 | `database.use_tls` || `None` | Enable TLS encryption |
+| `database.schema_prefix` | `DJ_SCHEMA_PREFIX` | `""` | Project-specific prefix for schema names |
+| `database.create_tables` | `DJ_CREATE_TABLES` | `True` | Default for `Schema(create_tables=)`. Set `False` for production mode |

 ## Connection Settings