The work directory of a task is initially created empty, and then CBRAIN prepares it for submission to a computing cluster. The preparation consists of setting up a number of files, which generally fall into two groups:
- data files prepared as inputs for the tool
- special CBRAIN support and administrative files
Normally, the CBRAIN support files are all created with names that start with a period, to make them invisible to casual "ls" commands and tools that perform basic globs (e.g. with '*').
As an example, here are the contents of the work directory of a completed task, as inspected with shell commands. There are 5 input files and 15 CBRAIN support files.
# cbuser-Ensemblex-T3249731$ ls
pooled_bam.bam      pooled_barcodes.tsv  reference.vcf
pooled_bam.bam.bai  pooled_samples.vcf
# cbuser-Ensemblex-T3249731$ ls -A
.boutiques.3249731-1.json               .runtime_info.sh
.container-3249731.img                  .science.Ensemblex.3249731-1.sh
.invoke.3249731-1.json                  .science.err.Ensemblex.3249731-1
.qsub.Ensemblex.3249731-1.sh            .science.out.Ensemblex.3249731-1
.qsub.err.Ensemblex.3249731-1           .singularity.3249731-1.sh
.qsub.err.Ensemblex.3249731-1-combined  pooled_bam.bam
.qsub.exit.Ensemblex.3249731-1          pooled_bam.bam.bai
.qsub.out.Ensemblex.3249731-1           pooled_barcodes.tsv
.qsub.out.Ensemblex.3249731-1-combined  pooled_samples.vcf
.runtime_info.Ensemblex.3249731-1.kv    reference.vcf
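To illustrate how the leading period hides the support files from a plain '*' glob, here is a minimal sketch that counts the visible and hidden entries of a directory separately. The function name count_entries is made up for this example and is not part of CBRAIN. It also assumes the shell's default glob settings (e.g. bash without "dotglob").

```shell
# Hypothetical helper (not part of CBRAIN): print "<visible> <hidden>"
# entry counts for the directory given as first argument.
count_entries() {
    dir=$1
    visible=0
    hidden=0

    # A plain '*' glob skips names that start with a period.
    for path in "$dir"/*; do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$path" ] || [ -L "$path" ] || continue
        visible=$((visible + 1))
    done

    # '.*' matches the dotfiles; '.' and '..' may also match and are skipped.
    for path in "$dir"/.*; do
        case ${path##*/} in
            .|..) continue ;;
        esac
        [ -e "$path" ] || [ -L "$path" ] || continue
        hidden=$((hidden + 1))
    done

    echo "$visible $hidden"
}
```

Run on the work directory listed above, this should report 5 visible entries and 15 hidden ones, matching the two groups of files.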
I suggest that we create a folder named ".cbrain" inside the work directory, and put all these support files there instead.
Compatibility consideration: changing this convention would mean that older archived tasks could not be re-activated as they are. That could be fixed if the unarchiving code for tasks were extended to detect that the admin files are stored directly in the task folder, and then to create a ".cbrain" folder and move these older files into it.
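The migration step could look roughly like the sketch below. It is written in shell only to match the examples in this post. The real unarchiving code lives in CBRAIN itself and would be written differently. The function name migrate_support_files is hypothetical, and the sketch assumes that every top-level entry whose name starts with a period is a CBRAIN support file. The real code might need a stricter rule.

```shell
# Hypothetical helper (not part of CBRAIN): move legacy top-level support
# files of a task work directory into a ".cbrain" subfolder.
# Assumption: every top-level dotfile is a CBRAIN support file.
migrate_support_files() {
    workdir=$1

    # A task that already has ".cbrain" uses the new layout: nothing to do.
    if [ -d "$workdir/.cbrain" ]; then
        return 0
    fi

    mkdir "$workdir/.cbrain" || return 1

    for path in "$workdir"/.*; do
        # Leave '.', '..' and the new folder itself alone.
        case ${path##*/} in
            .|..|.cbrain) continue ;;
        esac
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$path" ] || [ -L "$path" ] || continue
        mv -- "$path" "$workdir/.cbrain/" || return 1
    done
}
```

Calling it a second time on the same directory does nothing, so it would be safe to run on every unarchive. The input files are left untouched at the top level.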
Anyway, the end result would look like this, again from a shell perspective:
# cbuser-Ensemblex-T3249731$ ls -A
.cbrain             pooled_barcodes.tsv
pooled_bam.bam      pooled_samples.vcf
pooled_bam.bam.bai  reference.vcf
# cbuser-Ensemblex-T3249731$ ls -A .cbrain
.boutiques.3249731-1.json               .qsub.out.Ensemblex.3249731-1-combined
.container-3249731.img                  .runtime_info.Ensemblex.3249731-1.kv
.invoke.3249731-1.json                  .runtime_info.sh
.qsub.Ensemblex.3249731-1.sh            .science.Ensemblex.3249731-1.sh
.qsub.err.Ensemblex.3249731-1           .science.err.Ensemblex.3249731-1
.qsub.err.Ensemblex.3249731-1-combined  .science.out.Ensemblex.3249731-1
.qsub.exit.Ensemblex.3249731-1          .singularity.3249731-1.sh
.qsub.out.Ensemblex.3249731-1