Chunking in `future.apply`
`future.apply` currently relies on the internal `makeChunks()` function (defined in `makeChunks.R`) to partition the elements of the input object into "chunks" that are sent to workers for processing. `makeChunks()` returns a list of integer vectors; each vector is one chunk, and its elements are the indices of the elements to be processed in the input object (often a list).
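To make the shape of that return value concrete, here is a minimal base-R sketch of a chunk list (the index values are illustrative, not what `makeChunks()` necessarily produces):

```r
# A chunk list: each element is an integer vector of indices into the input.
chunks <- list(c(1L, 2L, 3L), c(4L, 5L), c(6L, 7L, 8L))

# Worker k then processes X[chunks[[k]]]:
X <- as.list(letters[1:8])
X[chunks[[2]]]   # worker 2 receives the elements "d" and "e"
```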
Users have some control over the generation of chunks via the `future.apply` arguments `future.chunk.size` (which specifies the average number of elements per chunk a user prefers) and `future.scheduling` (which specifies the average number of chunks per worker). Furthermore, they can control the processing order of chunks with the `ordering` attribute of `future.chunk.size` or `future.scheduling`.
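For reference, a sketch of how these existing controls are used (assumes `future.apply` is installed; the input `X` and worker count are illustrative):

```r
library(future.apply)          # also attaches the future package
plan(multisession, workers = 4)

X <- replicate(100, rnorm(1e4), simplify = FALSE)

# Prefer ~25 elements per chunk, with chunks processed in random order:
y <- future_lapply(X, mean,
                   future.chunk.size = structure(25L, ordering = "random"))

# Alternatively, one chunk per worker (the default, future.scheduling = 1):
y <- future_lapply(X, mean, future.scheduling = 1.0)
```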
Nevertheless, this control is limited, and even the sensible defaults of `makeChunks` can produce substantial load imbalance across workers, with resulting inefficiency. Some of this inefficiency could be reduced if users had better control over chunk generation. While some of it may be averted by the dynamic balancing of `future.apply`, the costs of dynamic balancing itself can be non-trivial.
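To illustrate the imbalance, here is a base-R sketch (using hypothetical per-element costs, not measured times) of what happens when elements whose cost grows with their index are split into equal-length contiguous chunks:

```r
# Hypothetical costs: element i takes i time units to process.
costs  <- as.numeric(1:20)

# Four contiguous, equal-length chunks (roughly what a naive split produces).
chunks <- split(seq_along(costs), rep(1:4, each = 5))

# Total work per chunk -- and hence per worker under static scheduling:
load <- vapply(chunks, function(idx) sum(costs[idx]), numeric(1))
load
#>  1  2  3  4
#> 15 40 65 90
```

Every chunk has the same number of elements, yet the last worker does six times the work of the first.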
The Purpose of `makeChunks`
Currently, `makeChunks` accomplishes two tasks:
- Generates chunks by partitioning elements to be processed.
- Specifies the order in which chunks are processed.
Ideally, both tasks would be redundant: the former because the elements of the object users pass to `future.apply` would already be the chunks they want processed (with `nbrOfElements == nbrOfWorkers`), and the latter because the chunks would already be indexed in the order in which they should be processed. This allows efficient static load balancing, with chunks pre-balanced and one chunk per worker. However, users often pass objects whose ordering is ad hoc and whose chunking is unplanned.
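When per-element costs can be estimated, that ideal one-pre-balanced-chunk-per-worker setup can be approximated outside `future.apply`. This base-R sketch uses a greedy longest-processing-time heuristic; `balance_chunks` is a hypothetical helper, not part of any package:

```r
# Assign each element (heaviest first) to the currently least-loaded worker.
balance_chunks <- function(costs, workers) {
  chunks <- vector("list", workers)
  load   <- numeric(workers)
  for (i in order(costs, decreasing = TRUE)) {
    w           <- which.min(load)          # least-loaded worker so far
    chunks[[w]] <- c(chunks[[w]], i)
    load[w]     <- load[w] + costs[i]
  }
  chunks
}

costs  <- as.numeric(1:20)                  # hypothetical per-element costs
chunks <- balance_chunks(costs, 4L)
vapply(chunks, function(idx) sum(costs[idx]), numeric(1))
#> [1] 54 53 52 51
```

Compared with the naive contiguous split, the per-worker loads are now all within a few percent of the mean (210 / 4 = 52.5). A list chunked like this is exactly what one might want to hand to `future.apply` directly.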
Adding `customChunks`
I envision two approaches to improving the flexibility of chunking in future.apply:
- Add a `customChunks` argument to `future.apply` functions

  Users could pass a list to `customChunks`, and `future.apply` would use this list instead of the list that `makeChunks` returns. If `is.null(customChunks) == TRUE`, then the status quo internal `makeChunks` function is used. If `is.null(customChunks) == FALSE`, `makeChunks` is not executed and all other chunk-related arguments are ignored.

  The primary motivation for this is that users may wish to (a) have complete control over chunking and the ordering of chunks, (b) do so without modifying their input to `future.apply` (i.e., avoid creating deeper objects or repeatedly rearranging the elements of their object just for processing), and (c) write more interpretable code that distinguishes between the input object, the plan for processing, and the processing itself. This also helps decouple functions for working in parallel from functions for serial pre-processing.
- Add a `customChunks` argument to `future.apply` functions and export `makeChunks`

  Users could pass their object to `makeChunks` and pass the result to the `customChunks` argument of `future.apply`. In the event that `customChunks` is `NULL`, `future.apply` would call `makeChunks` as usual. This would allow users to generate chunks with `makeChunks` either inside or outside of `future.apply`. The upside is that users can directly observe and edit the output of `makeChunks`.
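Under either approach, usage might look like the following sketch. Note that `customChunks` and an exported `makeChunks` do not exist today; the call signatures, `my_tasks`, and `FUN` are purely illustrative:

```r
X <- my_tasks  # placeholder input list

# Approach 1: hand-built chunks, listed in the order they should run
chunks <- list(c(7L, 8L), c(1L, 2L, 3L), c(4L, 5L, 6L))
y <- future_lapply(X, FUN, customChunks = chunks)

# Approach 2: generate chunks with an exported makeChunks (illustrative
# signature), inspect and edit them, then pass them back in
chunks <- makeChunks(nbrOfElements = length(X), nbrOfWorkers = nbrOfWorkers())
chunks[[1]] <- rev(chunks[[1]])  # e.g., tweak the within-chunk order
y <- future_lapply(X, FUN, customChunks = chunks)
```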
I could submit a pull request implementing this, but I'm not sure when/how `makeChunks` is called internally; I don't see it in the makefiles or in the definitions of the `future.apply` functions.