The following document describes how to set up a development environment and connect to a database so that you can use the DataJoint Elements to build and run a workflow on your local machine.
Any of the DataJoint Elements can be combined together to create a workflow that matches your experimental setup. We have a number of example workflows to get you started. Each focuses on a specific modality, but they can be adapted for your custom workflow.
- Getting up and running requires a few items for a good development environment. If any of these items are already familiar to you and installed on your machine, you can skip the corresponding section.
- Next, you'll need to download one of the example workflows and the corresponding example data.
- Finally, there are several approaches to connecting to a database. Here, we highlight three:
    - **First Time**: Beginner. Temporary storage to learn the ropes.
    - **Local Database**: Intermediate. Deployed on local hardware, managed by you.
    - **Central Database**: Advanced. Deployed on dedicated hardware.
This diagram describes the general components for a local DataJoint environment.

```mermaid
flowchart LR
    %% Nodes
    py_interp["Python Interpreter"]
    db_server["Database Server<br>(e.g., MySQL)"]
    conda_env["Conda Environment"]
    terminal["Terminal or Jupyter Notebook"]
    %% Edges
    py_interp -->|DataJoint| db_server
    terminal --> conda_env
    conda_env --> py_interp
    %% Styling
    classDef boxes fill:#ddd,stroke:#333;
    class py_interp,db_server,conda_env,terminal boxes;
```
DataJoint Elements are written in Python. The DataJoint Python API supports Python versions 3.7 and up. We recommend downloading the latest stable release of 3.9 here and following the install instructions.
Python projects each rely on different dependencies, which may conflict across projects. We recommend working in a Conda environment for each project to isolate the dependencies. For more information on why Conda, and setting up the version of Conda that best suits your needs, see this article.
To get going quickly, we recommend you ...

- Download Miniconda and go through the setup, including adding Miniconda to your `PATH` (full instructions here).
- Declare and initialize a new Conda environment with the following commands. Edit `<name>` to reflect your project.

    ```console
    conda create --name datajoint-workflow-<name> python=3.9
    conda activate datajoint-workflow-<name>
    ```
??? Warning "Apple M1 users: Click to expand"
Running analyses with Element DeepLabCut or Element Calcium Imaging may require
tensorflow, which can cause issues on M1 machines. By saving the <code>yaml</code>
file below, this environment can be loaded with <code>conda env create -f
my-file.yaml</code>. If you encounter errors related to <code>clang</code>, try
launching Xcode and retrying.
```yaml
name: dj-workflow-<name>
channels:
- apple
- conda-forge
- defaults
dependencies:
- tensorflow-deps
- opencv
- python=3.9
- pip>=19.0
- pip:
- tensorflow-macos
- tensorflow-metal
- datajoint
```
Development and use can be done with a plain text editor in the terminal. However, an integrated development environment (IDE) can improve your experience. Several IDEs are available. We recommend Microsoft's Visual Studio Code, also called VS Code. To set up VS Code with Python for the first time, follow this tutorial.
Table definitions and analysis code can change over time, especially with multiple collaborators working on the same project. Git is an open-source, distributed version control system that helps keep track of what changes were made, when, and by whom. GitHub is a platform that hosts projects managed with git. The example DataJoint Workflows are hosted on GitHub; we will use git to clone (i.e., download) these repositories.
- Check if you already have git by typing `git --version` in a terminal window.
- If git is not installed on your system, please install git.
- You can read more about git basics here.
To run the demo notebooks and generate visualizations associated with an example workflow, you'll need a couple extra packages.
Jupyter Notebooks help structure code (see here for full instructions on Jupyter within VS Code).
- Install Jupyter packages

    ```console
    conda install jupyter ipykernel nb_conda_kernels
    ```

- Ensure your VS Code python interpreter is set to your Conda environment path.

    Click to expand more details.
    - View > Command Palette
    - Type "Python: Select Interpreter", hit enter.
    - If asked, select the workspace where you plan to download the workflow.
    - If present, select your Conda environment. If not present, enter the path.
DataJoint Diagrams rely on additional packages. To install these packages, enter the following command:

```console
conda install graphviz python-graphviz pydotplus
```
Of the options below, pick the workflow that best matches your needs.
- Change the directory to where you want to download the workflow.

    ```console
    cd ~/Projects
    ```

- Clone the relevant repository, and change directories to this new directory.

    ```console
    git clone https://github.com/datajoint/<repository>
    cd <repository>
    ```

- Install this directory as editable with the `-e` flag.

    ```console
    pip install -e .
    ```

    Why editable? Click for details.
    This lets you modify the code after installation and experiment with different designs or additional tables. You may wish to edit `pipeline.py` or `paths.py` to better suit your needs. If no modification is required, `pip install .` is sufficient.

- Install `element-interface`, which has utilities used across different Elements and Workflows.

    ```console
    pip install "element-interface @ git+https://github.com/datajoint/element-interface"
    ```

- Set up a local DataJoint config file by saving the following block as a json file in your workflow directory, named `dj_local_conf.json`. Not sure what to put for the `< >` values below? We'll cover this when we connect to the database.

    ```json
    {
        "database.host": "<hostname>",
        "database.user": "<username>",
        "database.password": "<password>",
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "<username_>"
        }
    }
    ```
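If you prefer to generate the config file programmatically, here is a minimal sketch using only the Python standard library. The placeholder `< >` values are assumptions to be replaced with your own connection details.

```python
import json
from pathlib import Path

# Placeholder values -- replace with your actual database credentials.
config = {
    "database.host": "<hostname>",
    "database.user": "<username>",
    "database.password": "<password>",
    "loglevel": "INFO",
    "safemode": True,
    "display.limit": 7,
    "display.width": 14,
    "display.show_tuple_count": True,
    "custom": {"database.prefix": "<username_>"},
}

# Write the config into the current (workflow) directory.
Path("dj_local_conf.json").write_text(json.dumps(config, indent=4))
```

DataJoint loads `dj_local_conf.json` from the directory where Python is launched, so save it at the root of your workflow directory.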
- :fontawesome-brands-python:{ .lg .middle } **Workflow Session**: An example workflow for session management.
- :fontawesome-brands-python:{ .lg .middle } **Workflow Array Electrophysiology**: An example workflow for Neuropixels probes.
- :fontawesome-brands-python:{ .lg .middle } **Workflow Calcium Imaging**: An example workflow for calcium imaging microscopy.
- :fontawesome-brands-python:{ .lg .middle } **Workflow Miniscope**: An example workflow for miniscope calcium imaging.
- :fontawesome-brands-python:{ .lg .middle } **Workflow DeepLabCut**: An example workflow for pose estimation with DeepLabCut.
The first notebook in each workflow will guide you through downloading example data from DataJoint's AWS storage archive. You can also process your own data. To use the example data, you would ...
- Install `djarchive-client`.

    ```console
    pip install git+https://github.com/datajoint/djarchive-client.git
    ```

- Use a python terminal to import the djarchive client and view available datasets and revisions.

    ```python
    import djarchive_client
    client = djarchive_client.client()
    list(client.datasets())   # List available datasets, select one
    list(client.revisions())  # List available revisions, select one
    ```

- Prepare a directory to store the downloaded data, for example in `/tmp`, then download the data with the djarchive client. This may take some time with larger datasets.

    ```python
    import os
    os.makedirs('/tmp/example_data/', exist_ok=True)
    client.download(
        '<workflow-dataset>',
        target_directory='/tmp/example_data',
        revision='<revision>'
    )
    ```
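Some of the example datasets are large (hundreds of GB), so it is worth checking available disk space before starting a download. A minimal standard-library sketch; the 300 GB threshold is an arbitrary example, and `enough_space` is our helper, not part of `djarchive-client`.

```python
import shutil

def enough_space(path, required_gb):
    """Return True if the filesystem containing `path` has at least
    `required_gb` gigabytes free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3

# The ephys benchmark dataset is roughly 293 GB; check before downloading.
if not enough_space('/tmp', 300):
    print("Not enough free space for this dataset; choose another target.")
```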
??? Note "Array Ephys: Click to expand details"
- **Dataset**: workflow-array-ephys-benchmark
- **Revision**: 0.1.0a4
- **Size**: 293 GB
The example <code>subject6/session1</code> data was recorded with SpikeGLX and
processed with Kilosort2.
```
/tmp/example_data/
- subject6
- session1
- towersTask_g0_imec0
- towersTask_g0_t0_nidq.meta
- towersTask_g0_t0.nidq.bin
```
Element and Workflow Array Ephys also support data recorded with
OpenEphys.
??? Note "Calcium Imaging: Click to expand details"
- **Dataset**: workflow-array-calcium-imaging-test-set
- **Revision**: 0_1_0a2
- **Size**: 142 GB
The example `subject3` data was recorded with Scanbox.
The example `subject7` data was recorded with ScanImage.
Both datasets were processed with Suite2p.
```
/tmp/example_data/
- subject3/
- 210107_run00_orientation_8dir/
- run00_orientation_8dir_000_000.sbx
- run00_orientation_8dir_000_000.mat
- suite2p/
- combined
- plane0
- plane1
- plane2
- plane3
- subject7/
- session1
- suite2p
- plane0
```
Element and Workflow Calcium Imaging also support data collected with ...
- Nikon
- Prairie View
- CaImAn
??? Note "DeepLabCut: Click to expand details"
- **Dataset**: workflow-dlc-data
- **Revision**: v1
- **Size**: 0.3 GB
The example data includes both training data and pretrained models.
```
/tmp/test_data/from_top_tracking/
- config.yml
- dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/
- test/pose_cfg.yaml
- train/
- checkpoint
- checkpoint_orig
- learning_stats.csv
- log.txt
- pose_cfg.yaml
- snapshot-10300.data-00000-of-00001
- snapshot-10300.index
- snapshot-10300.meta # same for 103000
- labeled-data/
- train1/
- CollectedData_DJ.csv
- CollectedData_DJ.h5
- img00674.png # and others
- train2/ # similar to above
- videos/
- test.mp4
- train1.mp4
```
??? Note "FaceMap: Click to expand details"
**Associated workflow still under development**
- **Dataset**: workflow-facemap
- **Revision**: 0.0.0
- **Size**: 0.3 GB
Some of the workflows make assumptions about how your file directory is organized and how some files are named.
??? Note "Array Ephys: Click to expand details"
- In your [DataJoint config](#config), add another item under `custom`,
`ephys_root_data_dir`, for your local root data directory. This can include
multiple roots.
```json
"custom": {
"database.prefix": "<username_>",
"ephys_root_data_dir": ["/local/root/dir1", "/local/root/dir2"]
}
```
- The `subject` directory names must match the subject IDs in your subjects table.
The `ingest.py` script ([demo ingestion notebook](https://github.com/datajoint/workflow-array-ephys/blob/main/notebooks/04-automate-optional.ipynb)) can help load these values from `./user_data/subjects.csv`.
- The `session` directories can have any naming convention, but must be specified
in the session table (see also
[demo ingestion notebook](https://github.com/datajoint/workflow-array-ephys/blob/main/notebooks/04-automate-optional.ipynb)
).
- Each session can have multiple probes.
- The `probe` directory names must end in a one-digit number corresponding to the
probe number.
- Each `probe` directory should contain:
- One neuropixels meta file named `*[0-9].ap.meta`
- Optionally, one Kilosort output folder
Folder structure:
```
<ephys_root_data_dir>/
└───<subject1>/ # Subject name in `subjects.csv`
│ └───<session0>/ # Session directory in `sessions.csv`
│ │ └───imec0/
│ │ │ │ *imec0.ap.meta
│ │ │ └───ksdir/
│ │ │ │ spike_times.npy
│ │ │ │ templates.npy
│ │ │ │ ...
│ │ └───imec1/
│ │ │ *imec1.ap.meta
│ │ └───ksdir/
│ │ │ spike_times.npy
│ │ │ templates.npy
│ │ │ ...
│ └───<session1>/
│ │ │ ...
└───<subject2>/
│ │ ...
```
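The probe-directory naming rules above can be checked programmatically before ingestion. A minimal standard-library sketch, assuming the layout described here; the function name `validate_probe_dirs` is ours, not part of any Element.

```python
import re
from pathlib import Path

def validate_probe_dirs(session_dir):
    """Check each probe directory in a session: its name must end in a
    one-digit number, and it must contain exactly one neuropixels meta
    file matching `*[0-9].ap.meta`."""
    problems = []
    for probe_dir in Path(session_dir).iterdir():
        if not probe_dir.is_dir():
            continue
        if not re.search(r"\d$", probe_dir.name):
            problems.append(f"{probe_dir.name}: does not end in a digit")
        meta_files = list(probe_dir.glob("*[0-9].ap.meta"))
        if len(meta_files) != 1:
            problems.append(
                f"{probe_dir.name}: expected 1 .ap.meta file, "
                f"found {len(meta_files)}"
            )
    return problems
```

Running this over each `<session>` directory before ingestion can surface layout mistakes earlier than a failed populate.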
??? Note "Calcium Imaging: Click to expand details"
**Note:** While Element Calcium Imaging can accommodate multiple scans per
session, Workflow Calcium Imaging assumes there is only one scan per session.
- In your [DataJoint config](#config), add another item under `custom`,
`imaging_root_data_dir`, for your local root data directory.
```json
"custom": {
"database.prefix": "<username_>",
"imaging_root_data_dir": "/local/root/dir1"
}
```
- The `subject` directory names must match the subject IDs in your subjects table.
The `ingest.py` script ([tutorial notebook](https://github.com/datajoint/element-calcium-imaging/blob/main/notebooks/tutorial.ipynb)) can help load these values from `./user_data/subjects.csv`.
- The `session` directories can have any naming convention, but must be specified
in the session table (see also the
[tutorial notebook](https://github.com/datajoint/element-calcium-imaging/blob/main/notebooks/tutorial.ipynb)).
- Each `session` directory should contain:
- All `.tif` or `.sbx` files for the scan, with any naming convention.
- One `suite2p` subfolder, containing the analysis outputs in the default naming
convention.
- One `caiman` subfolder, containing the analysis output `.hdf5` file, with any
naming convention.
Folder structure:
```
imaging_root_data_dir/
└───<subject1>/ # Subject name in `subjects.csv`
│ └───<session0>/ # Session directory in `sessions.csv`
│ │ │ scan_0001.tif
│ │ │ scan_0002.tif
│ │ │ scan_0003.tif
│ │ │ ...
│ │ └───suite2p/
│ │ │ ops1.npy
│ │ └───plane0/
│ │ │ │ ops.npy
│ │ │ │ spks.npy
│ │ │ │ stat.npy
│ │ │ │ ...
│ │ └───plane1/
│ │ │ ops.npy
│ │ │ spks.npy
│ │ │ stat.npy
│ │ │ ...
│ │ └───caiman/
│ │ │ analysis_results.hdf5
│ └───<session1>/ # Session directory in `sessions.csv`
│ │ │ scan_0001.tif
│ │ │ scan_0002.tif
│ │ │ ...
└───<subject2>/ # Subject name in `subjects.csv`
│ │ ...
```
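Since subject directory names must match the subject IDs ingested from `./user_data/subjects.csv`, it can help to compare the two before running ingestion. A minimal standard-library sketch; the `subject` column name is an assumption, so check your workflow's actual csv header.

```python
import csv
from pathlib import Path

def read_subject_ids(csv_path):
    """Return the subject IDs listed in a subjects csv file.

    Assumes a header row with a `subject` column, e.g.:
        subject,sex
        subject3,F
    """
    with open(csv_path, newline="") as f:
        return [row["subject"] for row in csv.DictReader(f)]

# Example comparison against subject directories under an imaging root:
# missing = set(read_subject_ids("user_data/subjects.csv")) - \
#           {p.name for p in Path("/local/root/dir1").iterdir() if p.is_dir()}
```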
??? Note "DeepLabCut: Click to expand details"
**Note:** Element DeepLabCut assumes you've already used the DeepLabCut GUI to
set up your project and label your data.
- In your [DataJoint config](#config), add another item under `custom`,
`dlc_root_data_dir`, for your local root data directory. This can include
multiple roots.
```json
"custom": {
"database.prefix": "<username_>",
"dlc_root_data_dir": ["/local/root/dir1", "/local/root/dir2"]
}
```
- You have preserved the default DeepLabCut project directory, shown below.
- The paths in your various `yaml` files reflect the current folder structure.
- You have generated the `pickle` and `mat` training files. If not, follow the
DeepLabCut guide to
[create a training dataset](https://github.com/DeepLabCut/DeepLabCut/blob/master/docs/standardDeepLabCut_UserGuide.md#f-create-training-datasets).
Folder structure:
```
/dlc_root_data_dir/your_project/
- config.yaml # Including correct path information
- dlc-models/iteration-*/your_project_date-trainset*shuffle*/
- test/pose_cfg.yaml # Including correct path information
- train/pose_cfg.yaml # Including correct path information
- labeled-data/any_names/*{csv,h5,png}
- training-datasets/iteration-*/UnaugmentedDataSet_your_project_date/
- your_project_*shuffle*.pickle
- your_project_scorer*shuffle*.mat
- videos/any_names.mp4
```
??? Note "Miniscope: Click to expand details"
- In your [DataJoint config](#config), add another item under `custom`,
`miniscope_root_data_dir`, for your local root data directory.
```json
"custom": {
"database.prefix": "<username_>",
"miniscope_root_data_dir": "/local/root/dir"
}
```
DataJoint helps you connect to a database server from your programming environment (i.e., Python or MATLAB), granting a number of benefits over traditional file hierarchies (see YouTube Explainer). We offer three options:
- The First Time beginner approach loads example data to a temporary existing database, saving you setup time. But, because this data will be purged intermittently, it should not be used in a true experiment.
- The Local Database intermediate approach will walk you through setting up your own database on your own hardware. While easier to manage, it may be difficult to expose this to outside collaborators.
- The Central Database advanced approach has the benefits of running on dedicated hardware, but may require significant IT expertise and infrastructure depending on your needs.
Temporary storage. Not for production use.
- Make an account at accounts.datajoint.io.
- In a workflow directory, make a config `json` file called `dj_local_conf.json` using your DataJoint account information and `tutorial-db.datajoint.io` as the host.

    **Note**: Your database prefix must begin with your username in order to have permission to declare new tables.

    ```json
    {
        "database.host": "tutorial-db.datajoint.io",
        "database.user": "<datajoint-username>",
        "database.password": "<datajoint-password>",
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "<datajoint-username_>"
        }
    }
    ```

- Launch a Python terminal and start interacting with the workflow.
- Install Docker.

    Why Docker? Click for details.
    Docker makes it easy to package a program, including the file system and related code libraries, in a <i>container</i>. This container can be distributed to any machine, both automating and standardizing the setup process.

- Test that Docker has been installed by running the following command:

    ```console
    docker run --rm hello-world
    ```

- Launch the DataJoint MySQL server with the following command:

    ```console
    docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=tutorial datajoint/mysql
    ```

    What's this doing? Click for details.
    - Downloads a container image called datajoint/mysql, a MySQL database pre-installed and configured with appropriate settings for use with DataJoint.
    - Opens up port 3306 (the MySQL default) on your computer so that your database server can accept connections.
    - Sets the password for the root database user to tutorial, which is then used in the config file.

- In a workflow directory, make a config `json` file called `dj_local_conf.json` using the following details. The prefix can be set to any value.

    ```json
    {
        "database.host": "localhost",
        "database.password": "tutorial",
        "database.user": "root",
        "database.port": 3306,
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "neuro_"
        }
    }
    ```
??? Note "Already familiar with Docker? Click here for details."
This document is written to apply to all example workflows. Many have a docker
folder used by developers to set up both a database and a local environment for
integration tests. Simply `docker compose up` the relevant file and
`docker exec` into the relevant container.
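For orientation, a Compose file along these lines mirrors the `docker run` flags used above. This is a sketch, not the file shipped in any particular workflow's docker folder, so check the repository for the authoritative version.

```yaml
# Sketch: Compose equivalent of the `docker run` command above.
services:
  db:
    image: datajoint/mysql           # MySQL pre-configured for DataJoint
    ports:
      - "3306:3306"                  # expose the MySQL default port
    environment:
      - MYSQL_ROOT_PASSWORD=tutorial # matches the password in dj_local_conf.json
```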
A database on dedicated hardware may require significant expertise to set up and maintain. DataJoint's MySQL Docker image project provides all the information required to set up a dedicated database.
- Connect to the database and import tables.

    ```python
    from <relevant-workflow>.pipeline import *
    ```

- View the declared tables. For a more in-depth explanation of how to run the workflow and explore the data, refer to the Jupyter notebooks in the workflow directory.

    Array Ephys: Click to expand details

    ```python
    subject.Subject()
    session.Session()
    ephys.ProbeInsertion()
    ephys.EphysRecording()
    ephys.Clustering()
    ephys.Clustering.Unit()
    ```

    Calcium Imaging: Click to expand details

    ```python
    subject.Subject()
    session.Session()
    scan.Scan()
    scan.ScanInfo()
    imaging.ProcessingParamSet()
    imaging.ProcessingTask()
    ```

    DeepLabCut: Click to expand details

    ```python
    subject.Subject()
    session.Session()
    train.TrainingTask()
    model.VideoRecording.File()
    model.Model()
    model.PoseEstimation.BodyPartPosition()
    ```
DataJoint LabBook is a graphical user interface to facilitate data entry for existing DataJoint tables.
- LabBook Website: If a database is public (e.g., `tutorial-db`) and you have access, you can view the contents here.
- DataJoint LabBook Documentation, including prerequisites, installation, and running the application.