Skip to content

Parallelization - 144 break uprecombine larger genomes#154

Merged
joshfactorial merged 12 commits into
mainfrom
144-break-uprecombine-larger-genomes
Sep 9, 2025
Merged

Parallelization - 144 break uprecombine larger genomes#154
joshfactorial merged 12 commits into
mainfrom
144-break-uprecombine-larger-genomes

Conversation

@keshav-gandhi
Copy link
Copy Markdown
Collaborator

Solved parallelization issues!

You can run this command if additional testing is wanted:
neat parallel -c config_template/keshav_config.yml

More information is in README.md.

@keshav-gandhi keshav-gandhi linked an issue Aug 29, 2025 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@joshfactorial joshfactorial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, one general comment, it's coming together. Let's try a slight change to the structure. Instead of putting it under read_simulator/utils, this could be it's own submodule, as we have done with model_fragment_lengths and gen_mut_model. It's already working now as a primary submodule, which is what we want. But in terms of structure and maintainability, let's move it out and group the scripts together:

├── cli
├── common
├── gen_mut_model
├── __init__.py
├── __main__.py
├── model_fragment_lengths
├── models
├── model_sequencing_error
├── __pycache__
├── read_simulator
├── parallel_read_simulator
└── variants

I think explicitly calling it parallel_read_simulator would be more clear as well. But in the parallel_read_simulator folder, you'd add first __init__.py, then parallelize.py, split_inpts.py, and splice_inputs.py. Then, check the other __init__.py commands, but basically it's just an import, which signals to poetry to add that feature to the application. The cli/commands/parallel.py is fine where it is. Then it will be more clear when we come back to this in six months which parts are specific to that feature.

Comment thread neat/cli/commands/parallel.py Outdated
Comment thread neat/cli/commands/parallel.py
Comment thread neat/read_simulator/utils/parallelize.py
Comment thread neat/read_simulator/utils/parallelize.py
Comment thread neat/read_simulator/utils/parallelize.py Outdated
Comment thread neat/read_simulator/utils/split_inputs.py
Comment thread neat/read_simulator/utils/split_inputs.py Outdated
Comment thread neat/cli/commands/parallel.py
Copy link
Copy Markdown
Collaborator

@joshfactorial joshfactorial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more general comment. It would be great if we could add some unit tests to these new functions.

There's an existing tests folder that you can add to, or doctests are fine too. A couple of tests for parallelize, split_inputs and stitch_outputs especially are what is important.

@joshfactorial
Copy link
Copy Markdown
Collaborator

Do a git fetch and then git pull origin main you may need to add the --rebase flag to finalize changes. There were a couple of edits to the readme. That will ensure that your readme commits won't get lost in the merge.

Copy link
Copy Markdown
Collaborator

@joshfactorial joshfactorial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

@joshfactorial joshfactorial merged commit ca2fc5e into main Sep 9, 2025
1 check passed
@joshfactorial joshfactorial deleted the 144-break-uprecombine-larger-genomes branch September 9, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Break up/recombine larger genomes

2 participants