This repository contains a very basic recommended directory layout for analysis workflows in the TELOS Collaboration.
- Clone this template to the computer where you will perform the analysis, giving it an appropriate name. Rename the default remote so that it is clear that it is the template.

      git clone https://github.com/telos-collaboration/workflow_template new_repository_name_202X
      cd new_repository_name_202X
      git remote rename origin template
- Create a new, empty repository on GitHub, either under the TELOS Collaboration organisation or under your personal account. (If the latter, it must be moved to the TELOS Collaboration organisation before publishing.) Add this repository as the `origin` remote of your clone, then push your clone to it.

      git remote add origin git@github.com:telos-collaboration/new_repository_name_202X
      git push -u origin main

- Ensure that you have Snakemake installed, by following the instructions in the template README below.
- Ensure that you have pre-commit installed in your Snakemake environment.

      conda activate snakemake
      pip install pre-commit
- Install pre-commit into your working copy of the repository.

      pre-commit install

- Open the file `CITATION.cff` and ensure that it represents all authors of the work being prepared. If not, edit it such that it does, for example by using cffinit.
- Edit the template README below to fill in anything marked `TODO`. (The arXiv identifier cannot be filled in until the preprint is submitted; however, DOIs may be reserved in advance on Zenodo.)
- Remove everything above the line `# TODO: Release name` from this file.
- Commit these initial changes:

      git add README.md CITATION.cff
      git commit -m "Initial commit: set basic metadata"
- Begin working on your analysis. The pre-created directories contain additional information on what to place in them, but to summarise:
  - Place raw data (and only raw data) in the `data` directory. Your workflow must not modify the files in this directory. These files will not be committed to the repository.
  - Add any non-PyPI Python libraries as Git submodules in the `libs` directory.
  - Place ensemble metadata, fit parameters, etc. in the `metadata` directory. (CSV and YAML are typical choices for this.) One file (e.g. `ensemble_metadata.csv`) is typically sufficient for this; do not use separate files for each ensemble. In general, numbers in this file should not need error bars, or many decimal places; if you need to put numbers with error bars in here, something has likely gone wrong with your analysis. These files will not be committed to the repository.
  - If you are quoting numbers from other work that does not have its own data release, place these in the `external_data` directory. Otherwise, delete this directory with `git rm -r external_data`.
  - If adding new Matplotlib styles, add them to the `styles` directory. (For most work, this should not be needed.)
  - Add your workflow definition to `workflow/Snakefile`. If your workflow has many moving parts, break it up into modules and place these in `workflow/rules`.
  - Add environment definitions to `workflow/envs`.
  - Put the code itself in the `src` directory.
  - Your workflow should place files in the following locations:
    - Intermediary data in `intermediary_data`
    - Plots in `assets/plots`
    - Tables in `assets/tables`
    - Definitions in `assets/definitions`
    - Data products (such as summary CSVs) to be uploaded to Zenodo as part of the data release in `data_assets`

    These directories will be created automatically by Snakemake as needed. Their contents will not be committed to the repository.
- For more information, please see the TELOS Collaboration Reproducibility/Open Science Strategy.
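The requirement in the list above that the workflow must not modify raw data can be checked mechanically. The following is a minimal sketch, not part of the template: the function names `snapshot` and `changed_files` are hypothetical, and the directory to check is passed in as an argument rather than assumed.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file under `directory` to the SHA-256 digest of its contents.

    Reads each file fully into memory, which is fine for a sketch but may
    need chunked reads for very large raw data files.
    """
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return the sorted relative paths that were added, removed, or modified."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )


# Intended use (hypothetical): take a snapshot of the raw data directory,
# run the workflow, take a second snapshot, and confirm that
# changed_files(first, second) is an empty list.
```

Taking one snapshot before running the workflow and one after, then comparing them, gives a quick indication of whether any rule has written into the raw data directory.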
You can pull the latest changes from this template by running:

    git pull --no-ff template main

This may be useful to, for example, add more recommended sections to the README, or more plot styles. It will not make any changes to the directory structure, other than moving the various README files to new locations.
[DOI](https://doi.org/TODO DOI)
The workflow in this repository performs the analyses presented in the paper TODO: paper title.
- Conda, for example, installed from Miniforge
- Snakemake, which may be installed using Conda
- LaTeX, for example, from TeX Live
- Install the dependencies above.
- Clone this repository including submodules (or download its Zenodo release and `unzip` it) and `cd` into it:

      git clone --recurse-submodules https://github.com/telos-collaboration/TODO REPO NAME
      cd TODO REPO NAME

- TODO Add instructions on which files to download from data release, and where to place them.
The workflow is run using Snakemake:

    snakemake --cores 1 --use-conda

where the number `1` may be replaced by the number of CPU cores you wish to allocate to the computation.
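As a rough illustration of choosing the core count, the Python sketch below builds the same command line with a configurable number of cores, defaulting to the number of cores the machine reports. The helper name `snakemake_command` is hypothetical; only the `--cores` and `--use-conda` flags shown above are used.

```python
import os
import shlex


def snakemake_command(cores=None):
    """Build the Snakemake invocation shown above as an argument list.

    If `cores` is not given, fall back to the core count reported by the
    operating system (or 1 if it cannot be determined).
    """
    if cores is None:
        cores = os.cpu_count() or 1
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return ["snakemake", "--cores", str(cores), "--use-conda"]


# Prints: snakemake --cores 1 --use-conda
print(shlex.join(snakemake_command(1)))
```

The resulting list can be passed to `subprocess.run` from the repository root if you prefer to launch the workflow from a script.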
Snakemake will automatically download and install all required Python packages. This requires an Internet connection; if you are running in an HPC environment where you would need to run the workflow without Internet access, details on how to preinstall the environment can be found in the Snakemake documentation.
TODO Add estimate of how long the analysis takes end-to-end, and on what hardware.
Output plots, tables, and definitions
are placed in the assets/plots, assets/tables, and assets/definitions directories.
Output data assets are placed into the data_assets directory.
Intermediary data are placed in the intermediary_data directory.
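To see at a glance which of these output locations have been populated after a run, something like the following sketch could be used. It is an illustration only: the function name `summarise_outputs` is hypothetical, and the directory names are those listed above.

```python
from pathlib import Path

# Output locations named in this README, relative to the repository root.
OUTPUT_DIRS = [
    "assets/plots",
    "assets/tables",
    "assets/definitions",
    "data_assets",
    "intermediary_data",
]


def summarise_outputs(root="."):
    """Count the files in each output directory.

    Returns a dict mapping each directory name to its file count, or to
    None if the directory has not been created yet.
    """
    summary = {}
    for name in OUTPUT_DIRS:
        directory = Path(root) / name
        if directory.is_dir():
            summary[name] = sum(1 for path in directory.rglob("*") if path.is_file())
        else:
            summary[name] = None
    return summary
```

Calling `summarise_outputs()` from the repository root after the workflow finishes reports `None` for any directory that no rule has written to.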
This workflow is relatively tailored to the data
which it was originally written to analyse.
Additional ensembles may be added to the analysis
by adding relevant files to the raw_data directory,
and adding corresponding entries to the files in the metadata directory.
However,
extending the analysis in this way
has not been as fully tested as the rest of the workflow,
and is not guaranteed to be trivial for someone not already familiar with the code.
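As an illustration of adding a corresponding metadata entry, the sketch below appends one ensemble to a single CSV file while checking that the new row matches the existing columns. It is not taken from this workflow: the function name `add_ensemble` is hypothetical, as is the assumption that the first column identifies the ensemble, and the real metadata files may be structured differently.

```python
import csv
from pathlib import Path


def add_ensemble(metadata_file, entry):
    """Add one row to a metadata CSV that holds all ensembles in one file.

    Assumes (hypothetically) that the first column identifies the ensemble.
    """
    path = Path(metadata_file)
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames
        rows = list(reader)
    if not columns:
        raise ValueError(f"{path} has no header row")

    # Require exactly the existing columns, so that rows stay consistent.
    if set(entry) != set(columns):
        raise ValueError(f"entry must have exactly these columns: {columns}")

    key = columns[0]
    new_row = {column: str(entry[column]) for column in columns}
    if any(row[key] == new_row[key] for row in rows):
        raise ValueError(f"ensemble {new_row[key]!r} is already listed")

    # Rewrite the whole file so the header and line endings stay uniform.
    rows.append(new_row)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

For a file with hypothetical columns `name` and `beta`, a call such as `add_ensemble("metadata/ensemble_metadata.csv", {"name": "B", "beta": 6.7})` would add one row and raise `ValueError` if an ensemble named `B` were already present.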