Hi Alon,
I tried out the package today. Overall it's really nice and helpful, and I will definitely be using it to manage my future PhIP-Seq projects. I have a few comments and suggestions that I think would add significant value to what you already have:
- Data import:
Currently you have to manually create the data directory, then manually add the data to it, then add the inputs to project.yaml and the absolute paths to inputs.local.yaml, and finally add the directory you just created to .gitignore. At the end you can also verify the configuration. That's a lot of steps for a data import. I would suggest a single function, let's say pm$create_dataset(data_path), that does all of this. It's all conceptually related to a single "task/operation", so it would make logical sense to me. I also suggest using a standardized name for the data directory ("data" or "raw-data"). The function should:
- Create the "data" directory in the project's root, when it's not created yet,
- Add the "data" directory to the .gitignore, when it's not there yet,
- Take the data from the absolute data_path and move it to the "data" directory,
- Add the input to project.yaml,
- Add the path to inputs.local.yaml,
- Check the configuration
I know this is probably a lot of abstraction for a single function, but we could define reusable helpers for each step. And conceptually, in the process of analyzing the data, the import step is a single task for me, so it would be really nice to have a single function which does this.
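To make the idea concrete, here is a minimal sketch of the file-system part in base R. All names here (ensure_dir, ensure_gitignored, create_dataset and its arguments) are hypothetical and not part of the package. It copies the data instead of moving it, so the original stays in place if something fails, and it leaves the project.yaml / inputs.local.yaml steps as comments, because those depend on the package's own config format.

```r
# Hypothetical helper: create a directory only if it does not exist yet.
ensure_dir <- function(path) {
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  invisible(path)
}

# Hypothetical helper: add an entry to .gitignore unless it is already listed.
ensure_gitignored <- function(project_root, entry) {
  gitignore <- file.path(project_root, ".gitignore")
  lines <- if (file.exists(gitignore)) readLines(gitignore, warn = FALSE) else character()
  if (!entry %in% lines) writeLines(c(lines, entry), gitignore)
  invisible(gitignore)
}

# Hypothetical sketch of the proposed single-step import.
create_dataset <- function(data_path, project_root = ".", data_dir = "data") {
  stopifnot(file.exists(data_path))

  # 1. Create the standardized data directory in the project root.
  target_dir <- ensure_dir(file.path(project_root, data_dir))

  # 2. Keep the data directory out of git.
  ensure_gitignored(project_root, paste0(data_dir, "/"))

  # 3. Copy the file or directory into the data directory.
  #    file.copy() does not overwrite by default, so an existing target is an error here.
  ok <- file.copy(data_path, target_dir, recursive = TRUE)
  if (!ok) stop("Could not copy ", data_path, " into ", target_dir)

  # 4-6. Not shown, since they depend on the package's config format:
  #      register the input in project.yaml, record the path in
  #      inputs.local.yaml, and run the configuration check.

  invisible(normalizePath(file.path(target_dir, basename(data_path))))
}
```

Splitting it like this keeps each step reusable on its own, while the user still only calls one function.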
- R project:
Currently pm_create_project creates README.md, project.yaml, inputs.local.yaml, the analyses directory, and .gitignore. I think it would be good to also add a LICENSE and create an R project (.Rproj file) on the fly, because you will need them anyway in the end. Or at least add an option to pm_create_project to create them.
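For the .Rproj part, a rough sketch in base R could look like the following. The function name write_rproj and its arguments are hypothetical, and the field values are just RStudio's neutral "Default" settings. The LICENSE is left out on purpose, since which license to use is the project owner's choice.

```r
# Hypothetical helper: write a minimal RStudio project file if none exists yet.
write_rproj <- function(project_root = ".",
                        name = basename(normalizePath(project_root))) {
  rproj <- file.path(project_root, paste0(name, ".Rproj"))
  if (!file.exists(rproj)) {
    writeLines(
      c(
        "Version: 1.0",
        "",
        "RestoreWorkspace: Default",
        "SaveWorkspace: Default",
        "AlwaysSaveHistory: Default",
        "",
        "Encoding: UTF-8"
      ),
      rproj
    )
  }
  invisible(rproj)
}
```

pm_create_project could call something like this behind an optional flag, so people who don't use RStudio are not forced to have the file.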
- .gitignore template:
Since this is an R package, I would always add the usual R entries to .gitignore, such as .Rhistory, .RData, etc. There are a number of files you don't want in git, and there are ready-made R templates to use. Pretty easy fix.
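As an illustration, something along these lines would do it. The function name add_r_gitignore is hypothetical, and the entry list is only a small subset of what the common R .gitignore templates contain.

```r
# A small, non-exhaustive set of R-related entries (assumed subset of the usual templates).
r_gitignore_entries <- c(".Rproj.user/", ".Rhistory", ".RData", ".Ruserdata")

# Hypothetical helper: append the R entries that are not already in .gitignore.
add_r_gitignore <- function(project_root = ".") {
  gitignore <- file.path(project_root, ".gitignore")
  existing <- if (file.exists(gitignore)) readLines(gitignore, warn = FALSE) else character()
  missing <- setdiff(r_gitignore_entries, existing)
  # Existing lines are kept untouched; only missing entries are appended.
  writeLines(c(existing, missing), gitignore)
  invisible(gitignore)
}
```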
- git initialization:
Honestly, this should be fairly easy too. The gitcreds package in R handles the credential side (GitHub login with a token), and packages such as usethis (use_git(), use_github()) or gert could cover the rest: initializing the git repo, adding files, the initial commit, creating an empty GitHub repo, connecting to the remote, and the initial push. It would be really great, because I personally like to have all my projects on GitHub for sharing the code with others, and this would remove a lot of setup and repetition.
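A rough sketch of the local part, assuming the gert package is installed and git's user.name and user.email are already configured. The function name init_project_repo is hypothetical. The GitHub side is only indicated in comments, since it needs a token and network access.

```r
# Hypothetical helper: initialize a repo and make the first commit with gert.
init_project_repo <- function(project_root = ".", message = "Initial commit") {
  if (!requireNamespace("gert", quietly = TRUE)) {
    stop("Package 'gert' is required for this sketch.")
  }
  gert::git_init(project_root)              # create the repository
  gert::git_add(".", repo = project_root)   # stage everything not ignored
  gert::git_commit(message, repo = project_root)

  # Remote setup is not run here. One option would be:
  #   gitcreds::gitcreds_set()   # store a GitHub token
  #   usethis::use_github()      # create the GitHub repo, add the remote, push
  invisible(project_root)
}
```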