@@ -6,22 +6,22 @@ title: 'Fixed-Size Chunking'
 
 Fixed-size chunking is currently a proof of concept and is in alpha status.
 It is not recommended for production use.
-Use [CDC Chunking](chunking_cdc.md) instead unless you have a specific reason to use fixed-size chunking.
+Use [CDC Chunking](chunking_cdc.md) instead unless you have a specific reason to use fixed-size chunking
 
 Fixed-size chunking splits files at predictable byte-offset boundaries, with every chunk being exactly the configured size
-(the last chunk may be smaller if the file size is not a multiple of the chunk size).
+(the last chunk may be smaller if the file size is not a multiple of the chunk size)
 
 ## What Fixed-Size Chunking Is
 
 Fixed-size chunking is conceptually simple: the file is divided into equal-sized pieces from start to end.
-Each piece is hashed and stored independently, just like CDC chunks.
+Each piece is hashed and stored independently, just like CDC chunks
 
 Unlike CDC, chunk boundaries do not shift when content is inserted or deleted in the middle of the file.
 Any edit before the end of a chunk changes that chunk's hash entirely, and any insertion or deletion causes all subsequent chunks to shift,
-potentially invalidating a large number of previously stored chunks.
+potentially invalidating a large number of previously stored chunks
 
 This means fixed-size chunking is generally inferior to CDC for files with arbitrary edits.
-Its benefit is only realized in scenarios where the file's write pattern is well-aligned to chunk boundaries.
+Its benefit is only realized in scenarios where the file's write pattern is well-aligned to chunk boundaries
 
 For example, with `fixed_4k` applied to a Minecraft region file:
 
@@ -36,7 +36,7 @@ For example, with `fixed_4k` applied to a Minecraft region file:
 
 Each 4 KiB chunk corresponds to one internal page of the region file.
 When only a few game chunks change between backups, only the corresponding pages are dirtied,
-and the rest of the chunks are identical to those already stored.
+and the rest of the chunks are identical to those already stored
 
 ## Available Algorithms
 
@@ -50,25 +50,25 @@ and the rest of the chunks are identical to those already stored.
 
 The 4KiB chunk size aligns with the internal page structure of Minecraft's Anvil region files (`.mca`).
 In theory, modifying a small number of chunks in the game only dirties a limited number of 4 KiB pages,
-making `fixed_4k` capable of the finest-grained deduplication for region files.
+making `fixed_4k` capable of the finest-grained deduplication for region files
 
 However, `fixed_4k` has serious practical drawbacks:
 
 - extremely high metadata overhead: a 1 GiB file requires roughly 262 144 chunk records
 - poor I/O performance: each chunk requires a separate read-write cycle during backup
 
-Unless the file is very large and only a tiny number of pages change per backup, `fixed_4k` is unlikely to be worth the cost.
+Unless the file is very large and only a tiny number of pages change per backup, `fixed_4k` is unlikely to be worth the cost
 
 ### fixed_32k
 
-A middle-ground option. Metadata overhead is 32× lower than `fixed_4k` but granularity is also much coarser.
+A middle-ground option. Metadata overhead is 32× lower than `fixed_4k` but granularity is also much coarser
 
 ### fixed_128k
 
 The 128 KiB chunk size is well-suited for files that grow by appending data at the end.
-When new data is appended, only the trailing chunks change; all preceding chunks retain the same hash and are reused.
+When new data is appended, only the trailing chunks change; all preceding chunks retain the same hash and are reused
 
-This makes `fixed_128k` a reasonable alternative to CDC for pure append-write files.
+This makes `fixed_128k` a reasonable alternative to CDC for pure append-write files
 
 ## Poor Candidates
 
@@ -81,5 +81,5 @@ Fixed-size chunking is a poor choice for:
 ## No Extra Dependencies
 
 Fixed-size chunking has no additional Python dependency requirements.
-It is available as long as Prime Backup is installed.
+It is available as long as Prime Backup is installed
 
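The behaviour described in the documentation touched by this diff can be illustrated with a minimal Python sketch. It is not Prime Backup's implementation: the helper names (`fixed_size_chunks`, `chunk_hashes`, `reused_chunks`, `sample_bytes`, `chunk_count`) are hypothetical, SHA-256 is only a stand-in for whatever hash the tool actually uses, and a 4 KiB chunk size is assumed to mirror `fixed_4k`.

```python
import hashlib


def fixed_size_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data at fixed byte offsets; only the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """Hash every chunk independently (SHA-256 is an assumption for this sketch)."""
    return [hashlib.sha256(c).hexdigest() for c in fixed_size_chunks(data, chunk_size)]


def reused_chunks(old: bytes, new: bytes, chunk_size: int) -> int:
    """Count chunks of `new` whose hash already exists among the chunks of `old`."""
    known = set(chunk_hashes(old, chunk_size))
    return sum(1 for h in chunk_hashes(new, chunk_size) if h in known)


def chunk_count(file_size: int, chunk_size: int) -> int:
    """Number of chunk records needed for a file (ceiling division)."""
    return -(-file_size // chunk_size)


def sample_bytes(n: int) -> bytes:
    """Deterministic, non-repeating test data, so that every chunk is distinct."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


CHUNK = 4096                      # assumed to correspond to fixed_4k
base = sample_bytes(8 * CHUNK)    # a file of exactly 8 chunks

# Append-only write: all 8 existing chunks keep their hashes.
appended = base + b"new log line\n" * 10

# In-place overwrite of one aligned page: 7 of 8 chunks are still reusable.
overwritten = base[:3 * CHUNK] + bytes(CHUNK) + base[4 * CHUNK:]

# One byte inserted near the start: every later chunk shifts, nothing is reused.
inserted = base[:10] + b"X" + base[10:]

print(reused_chunks(base, appended, CHUNK))
print(reused_chunks(base, overwritten, CHUNK))
print(reused_chunks(base, inserted, CHUNK))
print(chunk_count(2**30, CHUNK))  # chunk records for a 1 GiB file
```

The three edits correspond to the cases the text distinguishes: appends and page-aligned overwrites preserve most chunk hashes, while a single inserted byte shifts all later content across the fixed boundaries. The last line reproduces the 262 144 chunk records quoted for a 1 GiB file under `fixed_4k`.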