Could consider schemes to reduce memory footprint #2655

@bhaller

I recently spent some time head-scratching over a strange observation. A simple neutral SLiM model that just forward-simulated 10,000 individuals with tree-sequence recording, but without doing any simplification at all, did not exhibit the memory-usage dynamics I expected. I thought the memory footprint of the process would grow linearly with the number of generations simulated, since the tree-sequence tables would just grow and grow. Instead, the memory footprint displayed a sawtooth pattern: usage would grow linearly for some number of generations, then suddenly fall by a factor of two or more, and then resume linear growth.

Eventually I realized that this was due to a relatively new feature in the macOS kernel: memory compression. Basically, the kernel notices when a given block of memory has not been accessed for a long time, and compresses it so that it takes less memory. If anything tries to access the memory block, the kernel decompresses it on demand. This is all invisible to the process, which only ever sees the memory in its decompressed state. It's also quite fast. Rather remarkable.

Once I understood that this was what was causing the sawtooth memory-usage pattern, it struck me that the effectiveness of this memory compression scheme in reducing the memory usage of the tree sequence was actually pretty interesting. The kernel was apparently compressing the tree sequence by as much as 10x, with very little performance cost! It made me wonder: could tskit do the same sort of thing under the hood? Compress particular buffers in the tree sequence behind the scenes, and decompress them when they need to be accessed? If it resulted in a 10x reduction in memory and disk footprint, that would be pretty significant, right? And it might not actually be very hard to implement. Food for thought?
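
To make the idea a bit more concrete, here is a minimal sketch of the "compress a buffer, decompress it on access" pattern, written in Python with only the standard library. Everything in it is an assumption for illustration: `LazyCompressedColumn` is a hypothetical class, not anything that exists in tskit, zlib is just a stand-in for whatever compressor would actually be appropriate, and the toy `parent` column is made-up data rather than a real edge table. The compression ratio it prints says nothing about what real tree-sequence tables would achieve.

```python
import zlib
from array import array


class LazyCompressedColumn:
    """Hypothetical wrapper: keeps a column of C ints compressed in memory
    and only decompresses it when somebody asks to read it."""

    def __init__(self, values):
        raw = array("i", values).tobytes()
        self.raw_nbytes = len(raw)
        # Low compression level: favour speed over ratio (an arbitrary choice).
        self._packed = zlib.compress(raw, 1)
        self._cache = None

    @property
    def packed_nbytes(self):
        return len(self._packed)

    def get(self):
        """Decompress on demand and keep the result until release()."""
        if self._cache is None:
            col = array("i")
            col.frombytes(zlib.decompress(self._packed))
            self._cache = col
        return self._cache

    def release(self):
        """Drop the decompressed copy, keeping only the compressed bytes."""
        self._cache = None


# Toy stand-in for a table column: long runs of repeated IDs (made-up data).
parent = [i // 50 for i in range(100_000)]
column = LazyCompressedColumn(parent)
print("raw bytes:", column.raw_nbytes, "compressed bytes:", column.packed_nbytes)
```

A caller would use `column.get()` when the data is needed and `column.release()` once it is cold again, which is roughly the job the kernel does automatically. The sketch ignores writes (appending to a compressed column), and deciding when a buffer counts as "cold" is the part the kernel gets for free and a library would have to work out for itself.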
