Hi Amit,
I'm finally getting around to trying the approach in your gist. Many thanks for that example; it makes the approach very easy to use.
So far I'm finding that for small-ish datasets (all I have room for on my laptop), using pfork with 4 workers is slower than just doing it the regular way. I tried profiling, but unfortunately the profiler doesn't seem to work with pfork. Given the cautions about I/O, I'm a little unsure how best to figure out where the time is going: is it the fork itself, or some other aspect?
The overall task takes a single 3D array as input, and each worker should work on a separate chunk of a 2D output (this is a 3D image-rendering problem). I'm allocating each output chunk with your anon_map, and just using a plain array (optionally mmapped to a file) as the input. The running time for a small data set in single-threaded mode is 0.2 s, whereas a pfork with 4 workers is approximately 4 times slower.
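
To make the setup concrete, and to show one way of separating the cost of forking from the cost of the work, here is a rough sketch. It is written in Python with os.fork and an anonymous mmap purely as a stand-in for pfork and anon_map; it does not reproduce the gist's API. The names run_forked, render_chunk, and do_work are hypothetical, the workload is a placeholder rather than the real renderer, and the sketch assumes a Unix system where fork is available.

```python
import mmap
import os
import struct
import time


def render_chunk(buf, start, stop):
    """Placeholder workload: write one double per output cell in [start, stop)."""
    for i in range(start, stop):
        struct.pack_into("d", buf, 8 * i, 0.5 * i)


def run_forked(n_cells, n_workers, do_work=True):
    """Fork n_workers children, each owning one contiguous chunk of the output.

    Returns (elapsed_seconds, output_values). With do_work=False the children
    exit immediately, so the elapsed time is roughly fork + wait overhead only.
    """
    # Anonymous mapping; on Unix it is MAP_SHARED by default, so writes made
    # by the children are visible to the parent.
    buf = mmap.mmap(-1, 8 * n_cells)
    bounds = [(w * n_cells // n_workers, (w + 1) * n_cells // n_workers)
              for w in range(n_workers)]

    t0 = time.perf_counter()
    pids = []
    for start, stop in bounds:
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                if do_work:
                    render_chunk(buf, start, stop)
                status = 0
            finally:
                os._exit(status)  # never fall through into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - t0

    values = list(struct.unpack_from(f"{n_cells}d", buf, 0))
    buf.close()
    return elapsed, values


if __name__ == "__main__":
    t_full, _ = run_forked(100_000, 4)
    t_fork, _ = run_forked(100_000, 4, do_work=False)
    print(f"fork + work: {t_full:.4f} s, fork only: {t_fork:.4f} s")
```

Comparing the two timings gives a crude estimate of how much of the total is fork and wait overhead versus the work itself, which may be enough when a profiler cannot follow the forked workers.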