docs/config.md

Lines changed: 38 additions & 32 deletions
```diff
@@ -225,10 +225,15 @@ Configs on how the backup is made
 ],
 "mutating_file_patterns": [],
 
-"cdc_enabled": false,
-"cdc_file_size_threshold": 104857600,
-"cdc_patterns": [
-    "**/*.db"
+"chunking_enabled": false,
+"chunking_rules": [
+    {
+        "algorithm": "cdc",
+        "file_size_threshold": 104857600,
+        "patterns": [
+            "**/*.db"
+        ]
+    }
 ],
 
 "hash_method": "blake3",
```
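The `"algorithm": "cdc"` entry in the new default rule selects content-defined chunking (CDC). The minimal Python sketch below shows why CDC suits files that are modified in place. It is a simplified Gear-hash chunker. It is not the code Prime Backup or the optional `pyfastcdc` dependency actually run, and the gear table, `gear_chunks` and its `avg_bits`, `min_size` and `max_size` parameters are all hypothetical.

```python
import random
from typing import List

# Hypothetical gear table: 256 pseudo-random 64-bit values, seeded so runs are reproducible
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def gear_chunks(data: bytes, avg_bits: int = 12, min_size: int = 1024, max_size: int = 65536) -> List[bytes]:
    """Split data into content-defined chunks using a simple Gear rolling hash.

    Once a chunk holds at least min_size bytes, it ends at the first byte where
    the top avg_bits bits of the hash are all zero (about 1 in 2**avg_bits
    bytes), or it is cut forcibly at max_size bytes.
    """
    mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add: a byte's contribution leaves the 64-bit hash after 64 steps,
        # so h never depends on more than the most recent 64 bytes
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    data = random.Random(1).randbytes(256 * 1024)
    edited = data[:5000] + b"some inserted bytes" + data[5000:]
    before, after = gear_chunks(data), gear_chunks(edited)
    reused = set(before) & set(after)
    print(f"{len(reused)} of {len(before)} original chunks reused after the insertion")
```

A cut point depends only on nearby content, not on absolute offsets. Inserting a few bytes therefore usually changes just the chunk or two around the edit, and the later chunks come out byte-identical and can be reused. Fixed-size chunking has no such property: every chunk after the insertion point shifts and changes.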
```diff
@@ -441,52 +446,53 @@ and can speed up the processing of such files during backup creation
 - Type: `List[str]`
 - Default: `[]`
 
-#### cdc_enabled
+#### chunking_enabled
 
-Whether to enable content-defined chunking (CDC) for large files during backup creation
+Whether to enable file chunking during backup creation
 
-CDC stands for `Content-Defined Chunking`.
-Unlike fixed-size chunking, CDC determines chunk boundaries from the file content itself,
-so when data is inserted, deleted, or modified locally, many unchanged regions can still be cut into the same chunks and be reused across backups
+When enabled, Prime Backup iterates through [chunking_rules](#chunking_rules) in order for each file.
+The first rule whose `patterns` match the file path and whose `file_size_threshold` is met will be applied.
+If no rule matches, the file is stored as a regular direct blob without chunking
 
 Changing this option only affects files newly stored in future backups.
 Existing direct blobs or chunked blobs will not be converted automatically
 
-!!! note
-
-    CDC chunking requires the optional `pyfastcdc` dependency.
-    You can install all optional dependencies with `pip3 install -r requirements.optional.txt`,
-    or install `pyfastcdc` manually
-
 - Type: `bool`
 - Default: `false`
 
-#### cdc_file_size_threshold
+#### chunking_rules
 
-The minimum file size in bytes for a file to be considered for CDC chunking
+A list of chunking rules evaluated in order when [chunking_enabled](#chunking_enabled) is `true`
 
-Files smaller than this threshold will continue to use the regular direct blob storage flow,
-even if [cdc_enabled](#cdc_enabled)is enabled and the path matches [cdc_patterns](#cdc_patterns)
+For each file, Prime Backup walks through this list and applies the first rule whose `patterns` match the file path and whose `file_size_threshold` is met.
+If no rule matches, the file is stored as a regular direct blob
 
-Changing this option only affects files newly stored in future backups.
-Existing stored data will not be repartitioned automatically
+Each rule contains the following fields:
 
-- Type: `int`
-- Default: `104857600` (`100 MiB`)
+- `algorithm`: The chunking algorithm to use. Currently only `"cdc"` is available
 
-#### cdc_patterns
+CDC stands for Content-Defined Chunking. Unlike fixed-size chunking, CDC determines chunk boundaries from the file content itself,
+so when data is inserted, deleted, or modified locally, many unchanged regions can still be cut into the same chunks and be reused across backups
 
-A list of [gitignore flavor](http://git-scm.com/docs/gitignore) pattern strings,
-matched against file paths relative to [source_root](#source_root)
+!!! note
 
-CDC chunking will only be applied when the file path matches one of these patterns,
-the file size reaches [cdc_file_size_threshold](#cdc_file_size_threshold),
-and [cdc_enabled](#cdc_enabled) is enabled
+    CDC chunking requires the optional `pyfastcdc` dependency.
+    You can install all optional dependencies with `pip3 install -r requirements.optional.txt`,
+    or install `pyfastcdc` manually
 
-The default value is `["**/*.db"]`.
-It is recommended to keep this list narrow and only include large files that are often modified locally and really need to be backed up
+- `file_size_threshold`: The minimum file size in bytes for a file to be eligible for this rule.
+Files smaller than this value will not match this rule, even if their path matches `patterns`
 
-- Type: `List[str]`
+- `patterns`: A list of [gitignore flavor](http://git-scm.com/docs/gitignore) pattern strings,
+matched against file paths relative to [source_root](#source_root)
+
+The default value contains one rule that applies CDC chunking to `.db` files larger than 100 MiB.
+It is recommended to keep the rules narrow and only cover large files that are often modified locally and really need to be backed up
+
+Changing this option only affects files newly stored in future backups.
+Existing stored data will not be repartitioned automatically
```
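The first-match evaluation described for `chunking_enabled` and `chunking_rules` can be sketched in Python as below. This is a minimal illustration, not Prime Backup's implementation, and it rests on several assumptions:

- `ChunkingRule` and `select_rule` are hypothetical names.
- "Threshold is met" is taken to mean `file_size >= file_size_threshold`.
- `_glob_match` is a deliberately simplified stand-in for gitignore matching. It handles `*`, `?`, `**` and slash-less patterns, but not negation, character classes or directory-only patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ChunkingRule:
    # Mirrors the three fields of a chunking_rules entry
    algorithm: str
    file_size_threshold: int
    patterns: Sequence[str]


def _glob_match(pattern: str, rel_path: str) -> bool:
    """Simplified gitignore-style matching (no negation, no [...] classes)."""
    if "/" not in pattern:
        # A slash-less pattern such as "*.db" matches at any directory depth
        pattern = "**/" + pattern
    else:
        # Patterns containing a slash are relative to source_root
        pattern = pattern.lstrip("/")
    regex, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            regex.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            regex.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            regex.append("[^/]*")  # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            regex.append("[^/]")
            i += 1
        else:
            regex.append(re.escape(pattern[i]))
            i += 1
    return re.fullmatch("".join(regex), rel_path) is not None


def select_rule(enabled: bool, rules: Sequence[ChunkingRule], rel_path: str, file_size: int) -> Optional[ChunkingRule]:
    """Return the first applicable rule, or None to store the file as a direct blob."""
    if not enabled:
        return None
    for rule in rules:
        # Both conditions must hold; the first rule that satisfies them wins
        if file_size >= rule.file_size_threshold and any(_glob_match(p, rel_path) for p in rule.patterns):
            return rule
    return None
```

With the default rule, `select_rule(True, [ChunkingRule("cdc", 104857600, ("**/*.db",))], "world/data/stats.db", 200 * 1024 * 1024)` returns that rule. The same path at 1 KiB returns `None`, so the file falls back to a regular direct blob. Because evaluation stops at the first match, a narrower rule should be placed before a broader one that would also match.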