Parallelization - 144 break uprecombine larger genomes #154
Okay, one general comment: it's coming together. Let's try a slight change to the structure. Instead of putting it under read_simulator/utils, this could be its own submodule, as we have done with model_fragment_lengths and gen_mut_model. It already works as a primary submodule, which is what we want. But for structure and maintainability, let's move it out and group the scripts together:
├── cli
├── common
├── gen_mut_model
├── __init__.py
├── __main__.py
├── model_fragment_lengths
├── models
├── model_sequencing_error
├── __pycache__
├── read_simulator
├── parallel_read_simulator
└── variants
I think explicitly calling it parallel_read_simulator would be clearer as well. In the parallel_read_simulator folder, you'd add __init__.py first, then parallelize.py, split_inputs.py, and splice_inputs.py. Then check the other __init__.py files for the pattern; basically it's just an import, which signals to Poetry to add that feature to the application. The cli/commands/parallel.py is fine where it is. That way, when we come back to this in six months, it will be clear which parts are specific to that feature.
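To make the "it's just an import" point concrete, here is a minimal sketch of the pattern. It builds a throwaway copy of the proposed folder in a temp directory and imports it. The file contents are placeholders, and the `main` function name is an assumption for illustration only, not what parallelize.py actually exposes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder file contents. Only the file names come from the comment above;
# `main` is a hypothetical name used purely for illustration.
FILES = {
    "__init__.py": "from .parallelize import *\n",
    "parallelize.py": (
        "__all__ = ['main']\n\n"
        "def main():\n"
        "    return 'parallel run placeholder'\n"
    ),
}


def build_skeleton(root: Path) -> Path:
    """Write a throwaway parallel_read_simulator package under root."""
    pkg = root / "parallel_read_simulator"
    pkg.mkdir()
    for name, body in FILES.items():
        (pkg / name).write_text(body)
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    build_skeleton(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        # The package-level import re-exports whatever parallelize.py lists in __all__.
        pkg = importlib.import_module("parallel_read_simulator")
        result = pkg.main()
    finally:
        sys.path.remove(tmp)
```

The only line that matters for the real package is the one in `__init__.py`; everything else here is scaffolding so the sketch can run on its own.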
joshfactorial
left a comment
One more general comment. It would be great if we could add some unit tests for these new functions.
There's an existing tests folder that you can add to, or doctests are fine too. Tests for parallelize, split_inputs, and stitch_outputs are the most important.
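As a rough idea of the shape such tests could take, here is a sketch of a split/stitch round-trip check. The real signatures of split_inputs and stitch_outputs are not shown in this thread, so the two functions below are hypothetical stand-ins that work on in-memory strings. Swap in the actual imports and adapt the arguments.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, int, str]  # (contig name, start offset, sequence)


def split_inputs(contigs: Dict[str, str], chunk_size: int) -> List[Chunk]:
    """Hypothetical stand-in: break each contig into fixed-size chunks."""
    chunks: List[Chunk] = []
    for name, seq in contigs.items():
        for start in range(0, len(seq), chunk_size):
            chunks.append((name, start, seq[start:start + chunk_size]))
    return chunks


def stitch_outputs(chunks: List[Chunk]) -> Dict[str, str]:
    """Hypothetical stand-in: reassemble chunks, ordered by contig and offset."""
    stitched: Dict[str, str] = {}
    for name, _start, seq in sorted(chunks, key=lambda c: (c[0], c[1])):
        stitched[name] = stitched.get(name, "") + seq
    return stitched


def test_split_respects_chunk_size():
    # A 10-base contig with chunk_size=4 yields chunks of 4, 4, and 2 bases.
    chunks = split_inputs({"chr1": "ACGTACGTAC"}, chunk_size=4)
    assert [c[2] for c in chunks] == ["ACGT", "ACGT", "AC"]


def test_split_then_stitch_round_trips():
    # Stitching should recover the input even if chunks arrive out of order,
    # as they might when workers finish at different times.
    genome = {"chr1": "ACGTACGTAC", "chr2": "TTGCA"}
    chunks = split_inputs(genome, chunk_size=3)
    assert stitch_outputs(list(reversed(chunks))) == genome
```

The round-trip property (split, then stitch, equals the original) is the useful part to carry over, since it exercises both functions together without needing a full simulator run.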
Solved parallelization issues!
You can run this command if additional testing is wanted:
neat parallel -c config_template/keshav_config.yml
More information is in README.md.