Re-organize CBRAIN support files of task workdirs in a subfolder #1625

@prioux

The work directory of a task is initially created empty, and then CBRAIN goes on to prepare it for submission to a computing cluster. The preparation consists of setting up a number of files. These files generally fall into two groups:

  1. data files prepared as inputs for the tool
  2. special CBRAIN support and administrative files

Normally, the CBRAIN support files are all created with names that start with a period, to make them invisible to casual "ls" commands and tools that perform basic globs (e.g. with '*').

As an example, here's the content of a work directory for a completed task, as inspected by shell commands. There are 5 input files, and 15 CBRAIN support files.

# cbuser-Ensemblex-T3249731$ ls

pooled_bam.bam      pooled_barcodes.tsv  reference.vcf
pooled_bam.bam.bai  pooled_samples.vcf

# cbuser-Ensemblex-T3249731$ ls -A

.boutiques.3249731-1.json               .runtime_info.sh
.container-3249731.img                  .science.Ensemblex.3249731-1.sh
.invoke.3249731-1.json                  .science.err.Ensemblex.3249731-1
.qsub.Ensemblex.3249731-1.sh            .science.out.Ensemblex.3249731-1
.qsub.err.Ensemblex.3249731-1           .singularity.3249731-1.sh
.qsub.err.Ensemblex.3249731-1-combined  pooled_bam.bam
.qsub.exit.Ensemblex.3249731-1          pooled_bam.bam.bai
.qsub.out.Ensemblex.3249731-1           pooled_barcodes.tsv
.qsub.out.Ensemblex.3249731-1-combined  pooled_samples.vcf
.runtime_info.Ensemblex.3249731-1.kv    reference.vcf
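
To make the split between the two groups explicit, here is a minimal Python sketch that partitions the entries listed above by the leading-period convention. The function name `split_workdir_entries` is hypothetical and is not part of CBRAIN; it only illustrates the naming rule.

```python
def split_workdir_entries(names):
    """Partition entry names into (input_files, support_files).

    Assumption for this sketch: any name starting with a period is a
    CBRAIN support file, and everything else is a tool input file.
    """
    inputs = sorted(n for n in names if not n.startswith("."))
    support = sorted(n for n in names if n.startswith("."))
    return inputs, support


# Entries copied from the "ls -A" listing above.
entries = [
    ".boutiques.3249731-1.json", ".container-3249731.img",
    ".invoke.3249731-1.json", ".qsub.Ensemblex.3249731-1.sh",
    ".qsub.err.Ensemblex.3249731-1", ".qsub.err.Ensemblex.3249731-1-combined",
    ".qsub.exit.Ensemblex.3249731-1", ".qsub.out.Ensemblex.3249731-1",
    ".qsub.out.Ensemblex.3249731-1-combined",
    ".runtime_info.Ensemblex.3249731-1.kv", ".runtime_info.sh",
    ".science.Ensemblex.3249731-1.sh", ".science.err.Ensemblex.3249731-1",
    ".science.out.Ensemblex.3249731-1", ".singularity.3249731-1.sh",
    "pooled_bam.bam", "pooled_bam.bam.bai", "pooled_barcodes.tsv",
    "pooled_samples.vcf", "reference.vcf",
]

inputs, support = split_workdir_entries(entries)
print(len(inputs), len(support))  # prints "5 15"
```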

I suggest we create a folder named ".cbrain" and put all these support files in it instead.

Compatibility consideration: changing this convention would mean that older archived tasks could not be re-activated as-is. That could be fixed by extending the task unarchiving code to detect admin files stored at the top level of the task folder, create a ".cbrain" folder, and move those older files into it.
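
The migration step described above could look roughly like the following Python sketch. This is not CBRAIN code: the function name `migrate_support_files` is hypothetical, and the sketch assumes that every dot-prefixed entry at the top level of the work directory is a CBRAIN support file. A real implementation would probably match only the known support file names, since a tool could create dot-files of its own.

```python
import shutil
import tempfile
from pathlib import Path

SUPPORT_DIR = ".cbrain"  # folder name proposed in this issue


def migrate_support_files(workdir):
    """Move legacy top-level dot-entries into workdir/.cbrain.

    Returns the names that were moved. Assumption for this sketch:
    every top-level entry starting with a period is a support file.
    """
    workdir = Path(workdir)
    support_dir = workdir / SUPPORT_DIR
    legacy = sorted(
        p for p in workdir.iterdir()
        if p.name.startswith(".") and p.name != SUPPORT_DIR
    )
    if not legacy:
        return []  # already in the new layout, or nothing to move
    support_dir.mkdir(exist_ok=True)
    moved = []
    for path in legacy:
        target = support_dir / path.name
        if target.exists():
            continue  # never overwrite a file already in .cbrain
        shutil.move(str(path), str(target))
        moved.append(path.name)
    return moved


# Small demonstration on a throwaway directory with an old-style layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pooled_bam.bam", ".runtime_info.sh", ".invoke.3249731-1.json"):
        (Path(tmp) / name).touch()
    moved = migrate_support_files(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
    print(moved)
    print(remaining)
```

Calling the function a second time on the same directory returns an empty list, so it would be safe to run on every unarchive regardless of which layout the task was archived with.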

Anyway, the end result would look like this, again from a shell perspective:

# cbuser-Ensemblex-T3249731$ ls -A

.cbrain             pooled_barcodes.tsv
pooled_bam.bam      pooled_samples.vcf
pooled_bam.bam.bai  reference.vcf

# cbuser-Ensemblex-T3249731$ ls -A .cbrain

.boutiques.3249731-1.json               .qsub.out.Ensemblex.3249731-1-combined
.container-3249731.img                  .runtime_info.Ensemblex.3249731-1.kv
.invoke.3249731-1.json                  .runtime_info.sh
.qsub.Ensemblex.3249731-1.sh            .science.Ensemblex.3249731-1.sh
.qsub.err.Ensemblex.3249731-1           .science.err.Ensemblex.3249731-1
.qsub.err.Ensemblex.3249731-1-combined  .science.out.Ensemblex.3249731-1
.qsub.exit.Ensemblex.3249731-1          .singularity.3249731-1.sh
.qsub.out.Ensemblex.3249731-1
