(getting-started-walkthrough)=

# Walkthrough `datashuttle`

:::{note}

This walkthrough was written at datashuttle version 0.3.0. While the interface may look slightly different, core functionality remains the same.

:::

This tutorial will give a full introduction to managing a neuroscience project with datashuttle.

We will highlight datashuttle's key features by creating a 'mock' experiment, standardised to the NeuroBlueprint style.

We will upload data to a central data storage machine, as you would do at the end of a real acquisition session.

Finally, we will download data from the central storage to a local machine, as you would do during analysis.

## Installing `datashuttle`

The first step is to install datashuttle by following the instructions on the How to Install page.

::::{tab-set}

:::{tab-item} Graphical Interface
:sync: gui

Entering `datashuttle launch` after installation will launch the application in your terminal.

:::

:::{tab-item} Python API
:sync: python

We can check datashuttle has installed correctly by importing it into Python without error:

```python
from datashuttle import DataShuttle
```
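
If you would rather check for the package without triggering an import error, a minimal sketch using only the standard library is shown below. The helper name `is_installed` is our own and is not part of datashuttle.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found by the import system."""
    return find_spec(package_name) is not None


# Hypothetical helper: reports whether datashuttle is importable
# in the current environment without actually importing it.
print("datashuttle installed:", is_installed("datashuttle"))
```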

:::

::::

:::{note}

This walkthrough does not include the recently-added validation feature (it will be updated soon). Please see the validation guide for how to validate your project format.

:::

## Make a new project

The first thing to do when using datashuttle on a new machine is to set up your project.

We need to set the:

- project name
- location of the project on our local machine (where the acquired data will be saved)
- location of the project on the central data storage (where we will upload the acquired data)

datashuttle supports connecting to the central storage machine either as a mounted drive or through SSH.
See Set up configs for transfer for detailed instructions on connecting via a mounted drive or SSH.

In this walkthrough, we will set our central storage as a folder on our local machine for simplicity.

::::{tab-set}

:::{tab-item} Graphical Interface
:sync: gui

Click Make New Project and you will be taken to the project set up page.

We'll call our project my_first_project and can type this into the first input box on the page:

Next we need to specify the local path, the location on our machine where acquired data will be saved. Choose any directory that is convenient.

In this example we will add the folder "local" to the end of the filepath for clarity:

The filepath can be typed into the input, copied in with CTRL+V or selected from a directory tree using the Select button.

Finally, we need to select the central path. Usually this would be a path to a mounted central storage drive or, if connecting via SSH, a path relative to the server path.

In this tutorial, we will set this next to the local path for convenience.

Copy the contents of the local path input by clicking it, hovering over it and pressing `CTRL+Q`.
Paste it into the central path input with `CTRL+V` and change "local" to "central".

You can now click Save to set up the project.

Once the project is created, the Go to Project Screen button will appear. Click to move on to the Create Project page.

:::

:::{tab-item} Python API
:sync: python

First, we must initialise the DataShuttle object with our chosen project_name.

We will call our project "my_first_project":

```python
from datashuttle import DataShuttle

project = DataShuttle("my_first_project")
```

Next, we will use the `make_config_file()` method to set the configurations ('configs') for our project.

First, we need to specify the `local_path` as the location on our machine where the project (and acquired data) will be located.

Next, we set the central_path to the project location on the central storage machine.

In this tutorial, we will set this next to the local_path for convenience.

Finally, we will set the connection_method to "local_filesystem" as we are not using SSH in this example.

```python
project.make_config_file(
    local_path=r"C:\Users\Joe\data\local\my_first_project",
    central_path=r"C:\Users\Joe\data\central\my_first_project",
    connection_method="local_filesystem",
)
```
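
The two paths above differ only in the `local` and `central` folder. As a rough sketch, they could be built from a shared base folder with `pathlib` so that the same script works across operating systems. The helper `sibling_project_paths` and the base folder are illustrative assumptions, not part of datashuttle.

```python
from pathlib import Path


def sibling_project_paths(base: Path, project_name: str) -> tuple[Path, Path]:
    """Return (local_path, central_path) placed side by side under `base`."""
    local_path = base / "local" / project_name
    central_path = base / "central" / project_name
    return local_path, central_path


# Hypothetical base folder; any convenient directory will do.
base = Path.home() / "data"
local_path, central_path = sibling_project_paths(base, "my_first_project")

# Only the last two path components are printed, so the output is portable.
print(Path(*local_path.parts[-2:]))
print(Path(*central_path.parts[-2:]))
```

The resulting paths can then be supplied as the `local_path` and `central_path` configs, converted with `str()` if a string is required.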

If you want to change any config in the future, use the `update_config_file()` method:

```python
project.update_config_file(
    local_path=r"C:\a\new\path"
)
```

We are now ready to create our standardised project folders.

:::

::::

## Creating folders

Let's imagine today is our first day of data collection, and we are acquiring behaviour (behav) and electrophysiology (ephys) data.

We will create standardised subject, session and datatype folders in which to store the acquired data.

::::{tab-set}

:::{tab-item} Graphical Interface
:sync: gui

We will create standardised project folders using the Create tab.

Following the NeuroBlueprint style we will call the first subject sub-001. Additional key-value pairs could be included if desired (see the NeuroBlueprint specification for details).

In the session name we will include today's date as an extra key-value pair. Our first session will be ses-001_date-<todays_date>.

We could start by typing sub-001 into the subject input box, but it is more convenient to simply double-left-click it. This will suggest the next subject number based on the current subjects in the project:

The subject and session folder input boxes have live validation.
This will flag any
[NeuroBlueprint](https://neuroblueprint.neuroinformatics.dev/)
errors with a red border. Hover over the input box with the mouse
to see the nature of the error.

Next, we can input the session name. Double-left-click on the session input to automatically fill with ses-001. We can then add today's date with the @DATE@ convenience tag:

Today's date will be automatically added when the session folder is created.

The datatype folders to create can be set with the Datatype(s) checkboxes. Uncheck the funcimg and anat datatype boxes to ensure we only create behav and ephys folders.

Finally, click Create Folders to create the project folders.

:::

:::{tab-item} Python API
:sync: python

We will create project folders with the `create_folders()` method.

Following the NeuroBlueprint style we will call the first subject sub-001. Additional key-value pairs could be included if desired (see the NeuroBlueprint specification for details).

In the session name we will include today's date as an extra key-value pair. Our first session will be ses-001_date-<todays_date>.

Finally, we will tell datashuttle to create behav and ephys datatype folders only:

```python
project.create_folders(
    top_level_folder="rawdata",
    sub_names="sub-001",
    ses_names="ses-001_@DATE@",
    datatype=["behav", "ephys"]
)
```

Navigate to the `local_path` in your system file browser to see the created folders.
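
To illustrate what the `@DATE@` tag does, the sketch below expands it by hand. It assumes the tag becomes a `date-` key followed by the date written as `YYYYMMDD`. Both this format and the helper `expand_date_tag` are assumptions for illustration, so check the created folder name for the format your version uses.

```python
from datetime import date


def expand_date_tag(name: str, today: date) -> str:
    """Replace the @DATE@ tag with an assumed date-YYYYMMDD key-value pair."""
    return name.replace("@DATE@", f"date-{today:%Y%m%d}")


# A fixed date keeps the example reproducible.
print(expand_date_tag("ses-001_@DATE@", date(2024, 5, 16)))
# ses-001_date-20240516
```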

The names of the folders to be created are validated on the fly against
[NeuroBlueprint](https://neuroblueprint.neuroinformatics.dev/latest/specification.html).
An error will be raised if names break with the specification and
the folders will not be created.
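
As a rough picture of what such a check involves, the sketch below tests names against a simplified pattern: a `sub-` or `ses-` prefix with a numeric value, followed by optional underscore-separated key-value pairs. This is an assumed approximation for illustration only. It is not datashuttle's validator and does not cover the full specification.

```python
import re

# Simplified, assumed pattern: prefix-digits, then optional _key-value pairs
# whose keys and values are alphanumeric.
NAME_PATTERN = re.compile(r"(sub|ses)-\d+(_[a-zA-Z0-9]+-[a-zA-Z0-9]+)*")


def looks_valid(name: str) -> bool:
    """Return True if the whole of `name` matches the simplified pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


for name in ["sub-001", "ses-001_date-20240516", "sub_001", "ses-001_date"]:
    print(name, looks_valid(name))
```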

Two useful methods to automate folder creation are `get_next_sub()` and `get_next_ses()`. These can be used to automatically get the next subject and session names in a project.

To get the next subject in this project (sub-002) and the next session for that subject (ses-001) we can run:

```python
next_sub = project.get_next_sub("rawdata")                # returns "sub-002"
next_ses = project.get_next_ses("rawdata", sub=next_sub)  # returns "ses-001"

project.create_folders(
    "rawdata",
    next_sub,
    f"{next_ses}_@DATE@",
    datatype=["behav", "ephys"]
)
```

These functions also take an include_central argument which extends the search for the next subject and session to the central project folder (False by default).
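
Conceptually, the next name is the highest existing number plus one, padded to a fixed width. The sketch below shows that idea on a plain list of folder names. `next_name` is a hypothetical helper and is not how datashuttle implements these methods.

```python
import re


def next_name(existing: list[str], prefix: str, width: int = 3) -> str:
    """Return the next `prefix-XXX` name given existing folder names."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)")
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := pattern.match(name))
    ]
    # With no existing folders, numbering starts at 1.
    next_number = max(numbers, default=0) + 1
    return f"{prefix}-{next_number:0{width}d}"


print(next_name(["sub-001"], "sub"))                # sub-002
print(next_name([], "ses"))                         # ses-001
print(next_name(["ses-001_date-20240516"], "ses"))  # ses-002
```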

:::

::::

This was a quick overview of creating folders. See the guide on creating folders and How to use Create Folder Tags for full details, including additional customisation with Name Templates.

## Exploring folders

In our imagined experiment, we will now want to save data from acquisition software into our newly created, standardised folders. datashuttle provides some quick methods to pass the created folder paths to acquisition software.

::::{tab-set}

:::{tab-item} Graphical Interface
:sync: gui

When folders are created the Directory Tree on the left-hand side will update to display the new folders:

By hovering over a folder with the mouse we can quickly copy the full path to the folder by pressing `CTRL+Q` (you may need to click the `Directory Tree` first).

Alternatively, pressing `CTRL+O` will open the folder in your file browser.

Hover the mouse over the `Directory Tree` for a tooltip indicating all possible shortcuts.


To continue with our experiment we will need to create 'mock'
acquired data to transfer to central storage. These will
take the form of simple text files with their extensions changed.

You can download these files from
[this link](https://gin.g-node.org/joe-ziminski/datashuttle/src/master/docs/tutorial-mock-data-files),
by right-clicking each file and selecting "Download (or) Save Link As...".
Alternatively you can create them in your favourite text editor.

Next, hover over the `behav` folder in the `Directory Tree` with your
mouse and press `CTRL+O` to open the folder in your file browser.
Move the mock behavioural data file (`sub-001_ses-001_camera-top.mp4`)
into the `behav` datatype folder.

Next, repeat this for the `ephys` datatype by moving the remaining
electrophysiology files to the `ephys` folder.

Finally, hover the mouse over the `Directory Tree` and press `CTRL+R` to refresh.

:::

:::{tab-item} Python API
:sync: python

`create_folders()` returns the full filepaths of created datatype folders.

These can be used in acquisition scripts to save data to these folders:

```python
folder_path_dict = project.create_folders(
    top_level_folder="rawdata",
    sub_names=["sub-001"],
    ses_names=["ses-001_@DATE@"],
    datatype=["behav", "ephys"]
)

print([path_ for path_ in folder_path_dict["behav"]])
# e.g. a list holding the path C:\Users\Joe\data\local\my_first_project\rawdata\sub-001\ses-001_date-<todays_date>\behav
```


To continue with our experiment we will need to create 'mock'
acquired data to transfer to central storage. These will
take the form of simple text files with their extensions changed.

You can download these files from
[this link](https://gin.g-node.org/joe-ziminski/datashuttle/src/master/docs/tutorial-mock-data-files),
by right-clicking each file and selecting "Download (or) Save Link As...".
Alternatively you can create them in your favourite text editor.

Move the mock behavioural data file (`sub-001_ses-001_camera-top.mp4`)
into the `behav` datatype folder and the remaining
electrophysiology files to the `ephys` folder.
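
If you prefer to create the mock files from a script, a minimal sketch is given below. It writes a short text file under the given name. `make_mock_file` is a hypothetical helper, and the demonstration writes into a temporary folder rather than your project.

```python
import tempfile
from pathlib import Path


def make_mock_file(folder: Path, filename: str) -> Path:
    """Create `folder` if needed and write a small placeholder text file."""
    folder.mkdir(parents=True, exist_ok=True)
    file_path = folder / filename
    file_path.write_text("mock data\n")
    return file_path


# Demonstrate in a temporary folder standing in for the `behav` folder.
with tempfile.TemporaryDirectory() as tmp:
    created = make_mock_file(Path(tmp) / "behav", "sub-001_ses-001_camera-top.mp4")
    print(created.name, created.exists())
```

To use it on the real project, pass one of the folder paths returned when the folders were created in place of the temporary folder.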

:::

::::

## Uploading to central storage

We have now 'acquired' behav and ephys data onto our local machine. The next step is to upload it to central data storage.

In this walkthrough we set the central storage on our local machine for convenience. Typically, this would be an external central storage machine connected as a mounted drive or through SSH.

The **overwrite existing files** setting is very important.
It can be set to **never**, **always** or **if source newer**.

See the [transfer options](transfer-options) section for full details.
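
The three options can be read as a simple decision rule applied to each file. The sketch below is our own reading of the option names, assuming "newer" is judged by modification time. `should_transfer` is a hypothetical function, and the transfer options page remains the reference for the actual behaviour.

```python
def should_transfer(
    policy: str,
    destination_exists: bool,
    source_mtime: float,
    destination_mtime: float,
) -> bool:
    """Decide whether a file is copied under an assumed overwrite policy."""
    if not destination_exists:
        return True  # nothing to overwrite, so the file is always copied
    if policy == "never":
        return False
    if policy == "always":
        return True
    if policy == "if source newer":
        return source_mtime > destination_mtime
    raise ValueError(f"Unknown policy: {policy!r}")


# An existing destination file that is older than the source.
for policy in ["never", "always", "if source newer"]:
    print(policy, should_transfer(policy, True, source_mtime=200.0, destination_mtime=100.0))
```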

::::{tab-set}

:::{tab-item} Graphical Interface
:sync: gui

Switch to the Transfer tab. On the left we again see a Directory Tree displaying the local version of the project:

The first page on the Transfer tab allows us to upload the entire project, both the rawdata and derivatives—see the NeuroBlueprint specification for details.

We only have acquired data in the rawdata folder. We can simply click Transfer to upload everything to central storage.

The data from local will now appear in the "central" folder (an easy way to navigate to the folder to check is to go to the Config tab and press CTRL+O on the central path input box).

See the How to Transfer Data page for full details on transfer options, as well as Top Level Folder and Custom transfers.

Next, we will use Custom transfers to move only a subset of the dataset.

:::

:::{tab-item} Python API
:sync: python

`upload_entire_project()` is a high-level method that uploads all files in the project. This includes both the rawdata and derivatives top-level folders (see the NeuroBlueprint specification for details).

As we only have a rawdata folder we can simply run:

```python
project.upload_entire_project()
```

All files will be uploaded from the local version of the project to central storage.

Navigate to the `central_path` in your system's file browser and you will see that the newly transferred data has appeared.
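
The same check can be scripted. The sketch below lists files relative to two project roots and reports any that are present locally but absent centrally. `relative_files` and `missing_on_central` are hypothetical helpers, and the demonstration uses temporary folders in place of real `local_path` and `central_path` values.

```python
import tempfile
from pathlib import Path


def relative_files(root: Path) -> set[Path]:
    """Return all file paths under `root`, relative to `root`."""
    return {path.relative_to(root) for path in root.rglob("*") if path.is_file()}


def missing_on_central(local_root: Path, central_root: Path) -> set[Path]:
    """Return files that exist under `local_root` but not `central_root`."""
    return relative_files(local_root) - relative_files(central_root)


with tempfile.TemporaryDirectory() as tmp:
    local_root = Path(tmp) / "local"
    central_root = Path(tmp) / "central"
    for root in (local_root, central_root):
        (root / "rawdata").mkdir(parents=True)
    # Only the local copy contains a file, so it is reported as missing.
    (local_root / "rawdata" / "example.txt").write_text("mock")
    print(missing_on_central(local_root, central_root))
```

An empty set after a transfer suggests every local file has a counterpart on central storage.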

Other methods (e.g. `upload_custom()`) provide customisable transfers (and every upload method has an equivalent download method).

See the How to Transfer Data page for full details on transfer methods and arguments.

Next, we will use Custom transfers to move only a subset of the dataset.

:::

::::

## Downloading from central storage

Next, let's imagine we are using an analysis machine on which there is no data. We want to download a subset of data from central storage for further processing.

In this example we will download only the behavioural data from the first session.

In practice datashuttle's custom data transfers work well when there are many subjects and sessions. For example, we may want to download only the behavioural 'test' sessions from a specific range of subjects.

To replicate starting on a new local machine, delete the `rawdata` folder
from your **local path**.
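
Deleting the folder can also be done from Python. The sketch below removes a `rawdata` folder only if it sits directly under the given path. `remove_local_rawdata` is a hypothetical helper, shown here on a temporary folder standing in for the local path.

```python
import shutil
import tempfile
from pathlib import Path


def remove_local_rawdata(local_path: Path) -> bool:
    """Delete `local_path / "rawdata"` if it exists; return True if removed."""
    rawdata = local_path / "rawdata"
    if not rawdata.is_dir():
        return False
    shutil.rmtree(rawdata)
    return True


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my_first_project"
    (project_root / "rawdata" / "sub-001").mkdir(parents=True)
    print(remove_local_rawdata(project_root))  # True
    print(remove_local_rawdata(project_root))  # False
```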

We will next download data from the **central path** to our now-empty local project.

In practice, when setting up `datashuttle` on a new machine, you would
again [make a new project](set-up-a-project-for-transfer).

We will look at a small subset of possible options here—see How to make Custom Transfers for all possibilities.

::::{tab-set}

:::{tab-item} Graphical Interface
:sync: gui

The Custom transfer screen has options for selecting specific combinations of subjects, sessions and datatypes.

In the subject input, we can simply type all (in this case, we only have one subject anyway).

Next, let's specify what session to download. We can use the wildcard tag to avoid typing the exact date—ses-001_@*@:

This is useful if you want to download many sessions, all with different dates.

Then, select only the behav datatype from the datatype checkboxes.

Finally, we can select Download from the upload / download switch, and click Transfer.

The transfer will complete, and the custom selection of files will now be available in the local path.

:::

:::{tab-item} Python API
:sync: python

We will use the `download_custom()` method (the download equivalent of `upload_custom()`).

Convenience tags can be used to make downloading subsets of data easier:

```python
project.download_custom(
    top_level_folder="rawdata",
    sub_names="all",
    ses_names="ses-001_@*@",
    datatype="behav"
)
```

The "all" keyword will download every subject in the project (in this case, we only have one subject anyway).

The @*@ wildcard tag can be used to match any part of a subject or session name—in this case we use it to avoid typing out the date. This is also useful if you want to download many sessions, all with different dates.
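
The matching behaviour can be pictured with the standard library's `fnmatch` module. The sketch below treats `@*@` as a shell-style `*`. This is an assumed analogy for illustration, and `matches_tag` is a hypothetical helper rather than datashuttle's own matching code.

```python
from fnmatch import fnmatchcase


def matches_tag(pattern: str, name: str) -> bool:
    """Match `name` against `pattern`, treating @*@ as a `*` wildcard."""
    return fnmatchcase(name, pattern.replace("@*@", "*"))


sessions = ["ses-001_date-20240516", "ses-002_date-20240517"]
print([ses for ses in sessions if matches_tag("ses-001_@*@", ses)])
# ['ses-001_date-20240516']
```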

Finally, we chose to download only the behav data for the session.

:::

::::

Detailed information on data transfers can be found in the `Logs`.
Visit [How to Read the Logs](how-to-read-the-logs) for more information.

The transfer will complete, and the custom selection of files will now be available in the local path.

## Summary

That final transfer marks the end of our datashuttle tutorial!

Now you can:

- set up a new project
- upload your acquired data to a central storage machine
- download subsets of data for analysis

We are always keen to improve datashuttle, so please don't hesitate to get in contact with any Issues or drop in to our Zulip Chat with any questions or feedback.

Have a great day!
