First run comments (multiple tasks) #17

@Polymerase3

Hi Alon,

I tried out the package today. Overall it's really nice and helpful, and I will definitely be using it to manage my future PhIP-Seq projects. I have some comments and suggestions that I think would add significant value to what you already have:

  1. Data import:
    Currently you have to manually create the data directory, then manually add the data to it, then add the inputs to project.yaml and the absolute paths to inputs.local.yaml, and finally add the directory you just created to .gitignore. At the end you can also verify the configuration. That's a lot of steps for a data import. I would suggest a single function, let's say pm$create_dataset(data_path), that does all of this. It's all conceptually related to a single "task/operation", so it would logically make sense to me. I also suggest using a standardized name for the data directory ("data" or "raw-data"). The function should:
  • Create the "data" directory in the project's root if it does not exist yet
  • Add the "data" directory to .gitignore if it is not listed there yet
  • Take the data from the absolute data_path and move it to the "data" directory
  • Add the input to project.yaml
  • Add the path to inputs.local.yaml
  • Check the configuration

I know this is probably a lot of abstraction for a single function, but we could define reusable helpers for each step. And conceptually, in the process of analyzing the data, the import step is a single task for me, so it would be really nice to have a single function which does this.
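The helper-per-step idea could be sketched roughly as follows. This is only an illustration in base R, not the package's code: all function names (`ensure_data_dir`, `add_to_gitignore`, `move_into_data_dir`, `create_dataset`) are hypothetical, it assumes `data_path` points to a single file, and the project.yaml / inputs.local.yaml updates are left as placeholders because their schema is not described here.

```r
# Hypothetical helpers, one per step; none of these exist in the package.

# Create the standardized data directory if it is missing.
ensure_data_dir <- function(root, dir_name = "data") {
  data_dir <- file.path(root, dir_name)
  if (!dir.exists(data_dir)) dir.create(data_dir, recursive = TRUE)
  data_dir
}

# Append an entry to .gitignore only if it is not already listed.
add_to_gitignore <- function(root, entry) {
  gi <- file.path(root, ".gitignore")
  lines <- if (file.exists(gi)) readLines(gi, warn = FALSE) else character()
  if (!entry %in% lines) writeLines(c(lines, entry), gi)
  invisible(gi)
}

# Move a single file into the data directory (copy, then delete the source,
# so the move also works across file systems).
move_into_data_dir <- function(data_path, data_dir) {
  target <- file.path(data_dir, basename(data_path))
  if (!file.copy(data_path, target, overwrite = FALSE)) {
    stop("Could not copy ", data_path, " to ", target)
  }
  unlink(data_path)
  target
}

# Single entry point that chains the helpers (assumption: one input file).
create_dataset <- function(data_path, root = ".", name = NULL) {
  stopifnot(file.exists(data_path), !dir.exists(data_path))
  if (is.null(name)) name <- tools::file_path_sans_ext(basename(data_path))
  data_dir <- ensure_data_dir(root)
  add_to_gitignore(root, "data/")
  target <- move_into_data_dir(data_path, data_dir)
  # Placeholders: register `name` in project.yaml, write the absolute path
  # to inputs.local.yaml, then run the existing configuration check.
  list(name = name, path = normalizePath(target))
}

# Usage in a throwaway project directory
root <- tempfile("proj")
dir.create(root)
src <- tempfile(fileext = ".csv")
writeLines("a,b", src)
res <- create_dataset(src, root = root)
```

Keeping each step in its own small function means the one-call import stays thin, and the individual steps remain usable for people who prefer the manual workflow.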

  2. R project:
    Currently pm_create_project creates README.md, project.yaml, inputs.local.yaml, the analyses directory and .gitignore. I think it would be good to also add a LICENSE and create an R project (.Rproj file) on the fly, because you would need them in the end anyway. Or at least add an option for this to pm_create_project.

  3. .gitignore template:
    As this is an R package, I would always add the standard R entries to .gitignore, so .Rhistory, .RData etc. There are a number of files you do not want in git, and ready-made R templates exist. Pretty easy fix.

  4. git initialization:
    Honestly, this should be fairly easy too. There is the gitcreds package in R for handling the GitHub token, and packages such as gert or usethis could cover the rest (initialize the git repo, add files, initial commit, create an empty GitHub repo, connect to the remote and do the initial push). It would be really great, because I personally like to have all my projects on GitHub for sharing the code with others, and this would remove a lot of setup and repetition.
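For the .Rproj and .gitignore points, a rough base R sketch of what an optional step in project creation might look like is shown below. The function name `add_project_extras` and the chosen ignore entries are assumptions for illustration; the LICENSE and the git/GitHub setup are deliberately left out because they depend on the chosen license and on credentials.

```r
# Hypothetical helper; not part of the package.
# A small subset of the entries commonly found in R .gitignore templates.
r_ignores <- c(".Rhistory", ".RData", ".Ruserdata", ".Rproj.user/")

add_project_extras <- function(root,
                               project_name = basename(normalizePath(root))) {
  # Write a minimal RStudio project file unless one already exists.
  rproj <- file.path(root, paste0(project_name, ".Rproj"))
  if (!file.exists(rproj)) {
    writeLines(c(
      "Version: 1.0",
      "",
      "RestoreWorkspace: No",
      "SaveWorkspace: No",
      "AlwaysSaveHistory: No",
      "",
      "Encoding: UTF-8"
    ), rproj)
  }

  # Append only the ignore entries that are not listed yet.
  gi <- file.path(root, ".gitignore")
  existing <- if (file.exists(gi)) readLines(gi, warn = FALSE) else character()
  to_add <- setdiff(r_ignores, existing)
  if (length(to_add) > 0) writeLines(c(existing, to_add), gi)

  invisible(list(rproj = rproj, gitignore = gi))
}

# Usage in a throwaway directory that already ignores .Rhistory
demo_root <- tempfile("proj")
dir.create(demo_root)
writeLines(".Rhistory", file.path(demo_root, ".gitignore"))
out <- add_project_extras(demo_root)
```

If a dependency on usethis is acceptable, `usethis::use_rstudio()` and `usethis::use_git_ignore()` provide similar functionality and might be preferable to hand-written helpers.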
