Resources and limitations

zburkett edited this page Jun 2, 2022 · 2 revisions

What are the resources that I need and what are the limits?

Type of setup:

There are a number of ways to fine-tune the usage of the SEQuoia Express Toolkit. The right setup depends on the system you intend to use and the size of your data.

  • Big data (large data sets of large files): a cluster or cloud computing is a good path forward
  • Normal data sets (~50M reads): cloud computing or a local machine with a good amount of resources: 8-16 cores and 32-64 GB RAM

A laptop is not recommended for this computational work unless it is only being used to submit the job to a cluster or cloud computing platform and retrieve the results. In that case, you just need to make sure you have plenty of hard drive space.

What have you tested?

For development of the SEQuoia Express Toolkit, we used a variety of ways to test the upper limits of the software.

For large single-file data sets, we tested up to 125 million reads. Experiments of this size require large amounts of RAM (approximately 80 GB) because deduplication has to build the graph. We recommend running samples of this size on a large cluster or cloud platform.

Fine tuning

Each process in the main.nf file is tagged with a resource allocation. If a process is untagged, it falls back to the defaults in conf/base.config. The tags indicate whether a process requires more than the minimum CPU or RAM: some need a lot of RAM but not a lot of CPU (such as deduplication), some the reverse, and others need a lot of both. These allocations are already included in the main.nf file, but they can always be changed to suit your needs or system size.
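
As a rough illustration of what such an adjustment could look like, here is a minimal Nextflow configuration sketch. It assumes the tags are Nextflow process labels. The label names `high_memory` and `high_cpu`, the file name `custom.config`, and the specific values are hypothetical (80 GB echoes the deduplication figure from our testing), so check main.nf and conf/base.config for the names and defaults the toolkit actually uses.

```groovy
// custom.config: a hypothetical override file, not part of the toolkit.
// Label names below are placeholders; use the labels found in main.nf.
process {
    // Fallback for any process without a matching label (assumed values)
    cpus   = 2
    memory = 8.GB

    // RAM-heavy, CPU-light steps such as deduplication
    withLabel: 'high_memory' {
        cpus   = 2
        memory = 80.GB
    }

    // CPU-heavy steps that need comparatively little RAM
    withLabel: 'high_cpu' {
        cpus   = 16
        memory = 16.GB
    }
}
```

A file like this could be supplied with Nextflow's `-c` option (for example, `nextflow run main.nf -c custom.config`), which layers it on top of the existing configuration instead of requiring edits to main.nf itself.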
