TISUnion
diff --git a/docs/chunking/fixed_size.md b/docs/chunking/fixed_size.md
Lines changed: 17 additions & 5 deletions
diff --git a/docs/chunking/fixed_size.zh.md b/docs/chunking/fixed_size.zh.md
Lines changed: 17 additions & 5 deletions
diff --git a/docs/chunking/index.md b/docs/chunking/index.md
Lines changed: 11 additions & 2 deletions
diff --git a/docs/chunking/index.zh.md b/docs/chunking/index.zh.md
Lines changed: 11 additions & 2 deletions
diff --git a/docs/config.md b/docs/config.md
Lines changed: 13 additions & 4 deletions
diff --git a/docs/config.zh.md b/docs/config.zh.md
Lines changed: 13 additions & 4 deletions
diff --git a/prime_backup/action/create_backup_action.py b/prime_backup/action/create_backup_action.py
Lines changed: 53 additions & 8 deletions
@@ -40,11 +40,12 @@ and the rest of the chunks are identical to those already stored
 
 ## Available Algorithms
 
-| Algorithm    | Chunk Size | Typical Use Case                                                                                                                       |
-|--------------|------------|----------------------------------------------------------------------------------------------------------------------------------------|
-| `fixed_4k`   | 4 KiB      | Minecraft region files (`.mca`): each region file is organized in 4 KiB pages, so changes in one chunk only invalidate that 4 KiB page |
-| `fixed_32k`  | 32 KiB     | General intermediate granularity                                                                                                       |
-| `fixed_128k` | 128 KiB    | Append-only files: growth at the tail only creates new trailing chunks, leaving all previous chunks intact                             |
+| Algorithm    | Chunk Size      | Typical Use Case                                                                                                                            |
+|--------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+| `fixed_4k`   | 4 KiB           | Minecraft region files (`.mca`): each region file is organized in 4 KiB pages, so changes in one chunk only invalidate that 4 KiB page      |
+| `fixed_32k`  | 32 KiB          | General intermediate granularity                                                                                                            |
+| `fixed_128k` | 128 KiB         | Append-only files: growth at the tail only creates new trailing chunks, leaving all previous chunks intact                                  |
+| `fixed_auto` | 128 KiB / 4 KiB | Adaptive fixed-size strategy that uses the previous backup's same-path chunk layout to limit metadata growth while keeping some 4 KiB reuse |
 
 ### fixed_4k
 
@@ -70,6 +71,17 @@ When new data is appended, only the trailing chunks change; all preceding chunks
 
 This makes `fixed_128k` a reasonable alternative to CDC for pure append-write files
 
+### fixed_auto
+
+`fixed_auto` walks the file in 128 KiB windows.
+For each full window, it checks the previous backup's same-path chunk layout at the same offset:
+
+- if the previous window was one 128 KiB chunk and the current content is unchanged, it keeps one 128 KiB chunk
+- if the previous window was one 128 KiB chunk and the current content changed, it stores the current window as thirty-two 4 KiB chunks
+- if the previous window was thirty-two 4 KiB chunks, it compares the 4 KiB hashes first; when none changed, it stores one 128 KiB chunk, otherwise it keeps thirty-two 4 KiB chunks
+
+When the previous data is missing, the previous file was stored as a direct blob, the previous layout is irregular, or the window is an incomplete tail, that window is stored as a single chunk.
+
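The window rules described for `fixed_auto` can be sketched roughly as follows. This is a minimal illustration and not the code added by this change: the `Chunk` tuple, the helpers `chunk_window` and `chunk_file`, and the use of SHA-256 are all assumptions made for the example, and the sketch assumes the previous chunks are sorted by offset.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

WINDOW = 128 * 1024  # 128 KiB window
PAGE = 4 * 1024      # 4 KiB page, 32 pages per window

# Hypothetical stand-in for a stored chunk record: (offset, length, hash)
Chunk = Tuple[int, int, str]


def _hash(data: bytes) -> str:
    # The hash function is an assumption; the real one depends on configuration
    return hashlib.sha256(data).hexdigest()


def _pages(offset: int, window: bytes) -> List[Chunk]:
    # Split one full window into thirty-two 4 KiB chunks
    return [(offset + i, PAGE, _hash(window[i:i + PAGE])) for i in range(0, WINDOW, PAGE)]


def chunk_window(offset: int, window: bytes, prev: Optional[List[Chunk]]) -> List[Chunk]:
    whole = [(offset, len(window), _hash(window))]
    if len(window) < WINDOW or not prev:
        return whole  # incomplete tail window, or no previous data
    if len(prev) == 1 and prev[0][1] == WINDOW:
        # Previous window was one 128 KiB chunk: keep it if unchanged, else split
        return whole if prev[0][2] == whole[0][2] else _pages(offset, window)
    if len(prev) == WINDOW // PAGE and all(c[1] == PAGE for c in prev):
        # Previous window was thirty-two 4 KiB chunks: merge back only if none changed
        pages = _pages(offset, window)
        return whole if pages == prev else pages
    return whole  # irregular previous layout


def chunk_file(data: bytes, prev_chunks: Optional[List[Chunk]] = None) -> List[Chunk]:
    # Group the previous same-path chunks by the window they start in
    by_window: Dict[int, List[Chunk]] = {}
    for c in prev_chunks or []:
        by_window.setdefault(c[0] // WINDOW * WINDOW, []).append(c)
    out: List[Chunk] = []
    for off in range(0, len(data), WINDOW):
        out += chunk_window(off, data[off:off + WINDOW], by_window.get(off))
    return out
```

In this sketch a window that changes once is split into 4 KiB chunks on the next backup, and it is merged back into a single 128 KiB chunk only after a backup in which none of its 4 KiB hashes changed.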
 ## Poor Candidates
 
 Fixed-size chunking is a poor choice for:
 
@@ -41,11 +41,12 @@ title: '固定大小分块'
 
 ## 可用算法
 
-| 算法           | 块大小     | 典型适用场景                                                                        |
-|--------------|---------|-------------------------------------------------------------------------------|
-| `fixed_4k`   | 4 KiB   | Minecraft region 文件（`.mca`）：region 文件以 4 KiB 页为内部组织单位，修改少量游戏区块只会脏化有限的 4 KiB 页 |
-| `fixed_32k`  | 32 KiB  | 一般性的中等粒度场景                                                                    |
-| `fixed_128k` | 128 KiB | 追加写文件：尾部追加的数据只会产生新的末尾数据块，之前的所有数据块保持不变                                         |
+| 算法           | 块大小             | 典型适用场景                                                                        |
+|--------------|-----------------|-------------------------------------------------------------------------------|
+| `fixed_4k`   | 4 KiB           | Minecraft region 文件（`.mca`）：region 文件以 4 KiB 页为内部组织单位，修改少量游戏区块只会脏化有限的 4 KiB 页 |
+| `fixed_32k`  | 32 KiB          | 一般性的中等粒度场景                                                                    |
+| `fixed_128k` | 128 KiB         | 追加写文件：尾部追加的数据只会产生新的末尾数据块，之前的所有数据块保持不变                                         |
+| `fixed_auto` | 128 KiB / 4 KiB | 根据上一次备份中同路径文件的分块布局自适应，在控制元数据增长的同时保留部分 4 KiB 复用能力                              |
 
 ### fixed_4k
 
@@ -71,6 +72,17 @@ title: '固定大小分块'
 
 对于纯追加写入的文件，`fixed_128k` 是 CDC 的一个合理替代选项
 
+### fixed_auto
+
+`fixed_auto` 会按 128 KiB 窗口遍历文件。
+对于每个完整窗口，它会检查上一次备份中同路径文件在相同 offset 的分块布局：
+
+- 如果上一版窗口是 1 个 128 KiB chunk，且当前内容未变化，则继续使用 1 个 128 KiB chunk
+- 如果上一版窗口是 1 个 128 KiB chunk，但当前内容已变化，则将当前窗口切成 32 个 4 KiB chunk
+- 如果上一版窗口是 32 个 4 KiB chunk，则先比较 4 KiB hash；当变化数量为 0 时，存成 1 个 128 KiB chunk，否则继续使用 32 个 4 KiB chunk
+
+上一版数据缺失、上一版是 direct blob、上一版布局不规则，或当前窗口是不完整尾块时，该窗口会作为单个 chunk 存储。
+
 ## 不适用场景
 
 固定大小分块在以下情况通常效果不佳：
 
@@ -35,9 +35,17 @@ The default configuration is:
     "chunking_rules": [
         {
             "algorithm": "fastcdc_32k",
-            "file_size_threshold": 104857600,
+            "file_size_threshold": 20971520,
             "patterns": [
-                "**/*.db"
+                "**/*.db",
+                "**/*.log"
+            ]
+        },
+        {
+            "algorithm": "fixed_auto",
+            "file_size_threshold": 262144,
+            "patterns": [
+                "**/*.mca"
             ]
         }
     ]
@@ -105,6 +113,7 @@ The benefit becomes apparent on subsequent backups where many chunks can be reus
 | `fixed_4k`     | Fixed | 4 KiB          | MC region files (matches 4 KiB page boundaries); note: causes severe metadata bloat |
 | `fixed_32k`    | Fixed | 32 KiB         | medium fixed-size use cases                                                         |
 | `fixed_128k`   | Fixed | 128 KiB        | append-write files with predictable end-growth                                      |
+| `fixed_auto`   | Fixed | 128 KiB / 4 KiB | adaptive fixed-size chunks based on the previous same-path backup                  |
 
 See the detailed pages for each approach:
 
 
@@ -35,9 +35,17 @@ title: '文件分块'
     "chunking_rules": [
         {
             "algorithm": "fastcdc_32k",
-            "file_size_threshold": 104857600,
+            "file_size_threshold": 20971520,
             "patterns": [
-                "**/*.db"
+                "**/*.db",
+                "**/*.log"
+            ]
+        },
+        {
+            "algorithm": "fixed_auto",
+            "file_size_threshold": 262144,
+            "patterns": [
+                "**/*.mca"
             ]
         }
     ]
@@ -105,6 +113,7 @@ Prime Backup 仍会为整个文件创建一条数据对象（blob）记录，但
 | `fixed_4k`     | 固定大小 | 4 KiB   | MC region 文件（与 4 KiB 页边界对齐）；注意：会导致严重的元数据膨胀 |
 | `fixed_32k`    | 固定大小 | 32 KiB  | 中等粒度的固定大小场景                                |
 | `fixed_128k`   | 固定大小 | 128 KiB | 以追加写为主的文件                                  |
+| `fixed_auto`   | 固定大小 | 128 KiB / 4 KiB | 根据上一次同路径备份自适应切块                         |
 
 各方式的详细说明见独立文档：
 
 
@@ -229,9 +229,17 @@ Configs on how the backup is made
     "chunking_rules": [
         {
             "algorithm": "fastcdc_32k",
-            "file_size_threshold": 104857600,
+            "file_size_threshold": 20971520,
             "patterns": [
-                "**/*.db"
+                "**/*.db",
+                "**/*.log"
+            ]
+        },
+        {
+            "algorithm": "fixed_auto",
+            "file_size_threshold": 262144,
+            "patterns": [
+                "**/*.mca"
             ]
         }
     ],
@@ -478,6 +486,7 @@ Each rule contains the following fields:
     | `fixed_4k` | Fixed (alpha) | 4 KiB chunks; aligns with MC region file pages, but causes heavy metadata overhead |
     | `fixed_32k` | Fixed (alpha) | 32 KiB chunks; intermediate fixed-size option |
     | `fixed_128k` | Fixed (alpha) | 128 KiB chunks; well-suited for append-write files |
+    | `fixed_auto` | Fixed (alpha) | Adaptive 128 KiB / 4 KiB chunks based on the previous backup's same-path chunk layout |
 
     CDC algorithms determine chunk boundaries from file content, so local insertions, deletions, or in-place edits leave many chunks unchanged for reuse.
     See [CDC Chunking](chunking/chunking_cdc.md) for details.
@@ -487,7 +496,7 @@ Each rule contains the following fields:
 
     !!! warning
 
-        Fixed-size algorithms (`fixed_4k`, `fixed_32k`, `fixed_128k`) are in alpha status and not recommended for production use.
+        Fixed-size algorithms (`fixed_4k`, `fixed_32k`, `fixed_128k`, `fixed_auto`) are in alpha status and not recommended for production use.
 
     !!! note
 
@@ -502,7 +511,7 @@ Each rule contains the following fields:
 - `patterns`: A list of [gitignore flavor](http://git-scm.com/docs/gitignore) pattern strings,
   matched against file paths relative to [source_root](#source_root)
 
-The default value contains one rule that applies `fastcdc_32k` CDC chunking to `.db` files larger than 100 MiB.
+The default value contains two rules: `fastcdc_32k` CDC chunking for `.db` and `.log` files larger than 20 MiB, and `fixed_auto` chunking for `.mca` files larger than 256 KiB.
 It is recommended to keep the rules narrow and only cover large files that are often modified locally and really need to be backed up
 
 Changing this option only affects files newly stored in future backups.
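
As a quick sanity check on the byte values used in the default rules, the following snippet converts them to binary units. The helper name `human` is hypothetical and exists only for this illustration.

```python
def human(n: int) -> str:
    """Render a byte count as a whole number of binary units when it divides evenly."""
    for unit, size in (("GiB", 1024 ** 3), ("MiB", 1024 ** 2), ("KiB", 1024)):
        if n >= size and n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} B"


# Thresholds that appear in the old and new default chunking_rules
for threshold in (104857600, 20971520, 262144):
    print(threshold, "=", human(threshold))
```

The new `.mca` threshold of 262144 bytes corresponds to two full 128 KiB windows.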
 
@@ -229,9 +229,17 @@ Prime Backup 在创建备份时的操作时序如下：
     "chunking_rules": [
         {
             "algorithm": "fastcdc_32k",
-            "file_size_threshold": 104857600,
+            "file_size_threshold": 20971520,
             "patterns": [
-                "**/*.db"
+                "**/*.db",
+                "**/*.log"
+            ]
+        },
+        {
+            "algorithm": "fixed_auto",
+            "file_size_threshold": 262144,
+            "patterns": [
+                "**/*.mca"
             ]
         }
     ],
@@ -479,6 +487,7 @@ Prime Backup 会检查文件的如下这些信息。下述这些信息完全一
     | `fixed_4k` | 固定大小（alpha） | 4 KiB 数据块；与 MC region 文件的页边界对齐，但元数据开销极大 |
     | `fixed_32k` | 固定大小（alpha） | 32 KiB 数据块；中等粒度的固定大小选项 |
     | `fixed_128k` | 固定大小（alpha） | 128 KiB 数据块；适合追加写入为主的文件 |
+    | `fixed_auto` | 固定大小（alpha） | 基于上一次备份中同路径文件的分块布局，在 128 KiB 与 4 KiB 粒度间自适应 |
 
     CDC 算法根据文件内容确定数据块边界，因此局部插入、删除或原地修改不会影响其他数据块的哈希，这些数据块可直接复用。
     详见 [CDC 分块](chunking/chunking_cdc.zh.md)
@@ -488,7 +497,7 @@ Prime Backup 会检查文件的如下这些信息。下述这些信息完全一
 
     !!! warning
 
-        固定大小算法（`fixed_4k`、`fixed_32k`、`fixed_128k`）处于 alpha 状态，不建议在生产环境中使用
+        固定大小算法（`fixed_4k`、`fixed_32k`、`fixed_128k`、`fixed_auto`）处于 alpha 状态，不建议在生产环境中使用
 
     !!! note
 
@@ -503,7 +512,7 @@ Prime Backup 会检查文件的如下这些信息。下述这些信息完全一
 - `patterns`：一个 [gitignore 风格](http://git-scm.com/docs/gitignore) 的模板串列表，
   匹配对象是相对于 [source_root](#source_root) 的文件路径
 
-默认值中包含一条规则，对大于 100 MiB 的 `.db` 文件启用 `fastcdc_32k` CDC 分块。
+默认值中包含两条规则：对大于 20 MiB 的 `.db` 和 `.log` 文件启用 `fastcdc_32k` CDC 分块；对大于 256 KiB 的 `.mca` 文件启用 `fixed_auto` 分块。
 建议将规则控制得尽量精确，只包含那些体积大、经常发生局部修改、且确实需要备份的文件
 
 修改此选项只会影响后续备份中新写入的文件。
 
@@ -18,11 +18,13 @@
 from prime_backup.db import schema
 from prime_backup.db.access import DbAccess
 from prime_backup.db.session import DbSession
-from prime_backup.db.values import FileRole
+from prime_backup.db.values import FileRole, BlobStorageMethod
 from prime_backup.exceptions import UnsupportedFileFormat
 from prime_backup.types.backup_info import BackupInfo
 from prime_backup.types.backup_tags import BackupTags
 from prime_backup.types.blob_info import BlobDeltaSummary
+from prime_backup.types.chunk_method import ChunkMethod
+from prime_backup.types.chunker import PrettyChunk
 from prime_backup.types.operator import Operator
 from prime_backup.types.units import ByteCount
 from prime_backup.utils import sqlalchemy_utils
@@ -60,6 +62,8 @@ class _PreCalculationResult:
 	stats: Dict[Path, os.stat_result] = dataclasses.field(default_factory=dict)
 	hashes_and_chunks: Dict[Path, BlobPrecalculateResult] = dataclasses.field(default_factory=dict)
 	reused_files: Dict[Path, schema.File] = dataclasses.field(default_factory=dict)
+	previous_backup_files: Dict[str, schema.File] = dataclasses.field(default_factory=dict)
+	previous_file_chunks: Dict[Path, List[PrettyChunk]] = dataclasses.field(default_factory=dict)
 
 
 class CreateBackupAction(Action[BackupInfo]):
@@ -154,12 +158,19 @@ def __pre_calculate_stats(self, scan_result: _ScanResult):
 		for file_entry in scan_result.all_files:
 			stats[file_entry.path] = file_entry.stat
 
-	def __reuse_unchanged_files(self, session: DbSession, scan_result: _ScanResult):
+	def __load_previous_backup_files(self, session: DbSession):
+		previous_backup_files = self.__pre_calc_result.previous_backup_files
+		previous_backup_files.clear()
 		with self.__time_costs.measure_time_cost(CreateBackupTimeCostKey.kind_db):
 			backup = session.get_last_backup()
 		if backup is None:
 			return
 
+		with self.__time_costs.measure_time_cost(CreateBackupTimeCostKey.kind_db):
+			for file in session.get_backup_files(backup):
+				previous_backup_files[file.path] = file
+
+	def __reuse_unchanged_files(self, scan_result: _ScanResult):
 		@dataclasses.dataclass(frozen=True)
 		class StatKey:
 			path: str
@@ -169,11 +180,8 @@ class StatKey:
 			gid: int
 			mtime_ns: int
 
-		with self.__time_costs.measure_time_cost(CreateBackupTimeCostKey.kind_db):
-			backup_files = session.get_backup_files(backup.id)
-
 		stat_to_files: Dict[StatKey, schema.File] = {}
-		for file in backup_files:
+		for file in self.__pre_calc_result.previous_backup_files.values():
 			if stat.S_ISREG(file.mode):
 				if file.uid is None or file.gid is None or file.mtime is None:
 					raise AssertionError('file {!r} with ISREG mode has missing fields'.format(file))
@@ -200,6 +208,32 @@ class StatKey:
 				if (file_opt := stat_to_files.get(key)) is not None:
 					self.__pre_calc_result.reused_files[file_entry.path] = file_opt
 
+	def __cache_previous_chunks_for_fixed_auto(self, session: DbSession, scan_result: _ScanResult):
+		previous_file_chunks = self.__pre_calc_result.previous_file_chunks
+		previous_file_chunks.clear()
+
+		for file_entry in scan_result.all_files:
+			if not file_entry.is_file() or file_entry.path in self.__pre_calc_result.reused_files:
+				continue
+
+			rel_path = file_entry.path.relative_to(self.__source_path)
+			if ChunkMethod.get_for_file(rel_path, file_entry.stat.st_size) != ChunkMethod.fixed_auto:
+				continue
+
+			previous_file = self.__pre_calc_result.previous_backup_files.get(rel_path.as_posix())
+			if (
+					previous_file is None or
+					previous_file.blob_id is None or
+					previous_file.blob_storage_method != BlobStorageMethod.chunked.value
+			):
+				continue
+
+			with self.__time_costs.measure_time_cost(CreateBackupTimeCostKey.kind_db):
+				previous_file_chunks[file_entry.path] = [
+					PrettyChunk(offset=offset_chunk.offset, length=offset_chunk.chunk.raw_size, hash=offset_chunk.chunk.hash)
+					for offset_chunk in session.get_blob_chunks(previous_file.blob_id)
+				]
+
 	def __pre_calculate_hash_and_chunks(self, session: DbSession, blob_allocator: BlobAllocator, scan_result: _ScanResult):
 		hashes_and_chunks = self.__pre_calc_result.hashes_and_chunks
 		hashes_and_chunks.clear()
@@ -220,7 +254,12 @@ def __pre_calculate_hash_and_chunks(self, session: DbSession, blob_allocator: Bl
 		def hash_worker(pth: Path, pth_size: int):
 			rel_path = pth.relative_to(self.__source_path)
 			try:
-				result = BlobPrecalculateResult.from_file(pth, rel_path, pth_size)
+				result = BlobPrecalculateResult.from_file(
+					pth,
+					rel_path,
+					pth_size,
+					previous_chunks=self.__pre_calc_result.previous_file_chunks.get(pth),
+				)
 			except BlobPrecalculateResult.SizeMismatched:
 				return  # the file keeps changing, so it's not good to create a pre-calc result for it
 			hashes_and_chunks[pth] = result
@@ -304,13 +343,17 @@ def __create_backup(self, session_context: ContextManager[DbSession], session: D
 		def pre_calc_result_getter(src_path: Path) -> Optional[BlobPrecalculateResult]:
 			return self.__pre_calc_result.hashes_and_chunks.pop(src_path, None)  # one-time use
 
+		def previous_chunks_getter(src_path: Path) -> Optional[List[PrettyChunk]]:
+			return self.__pre_calc_result.previous_file_chunks.get(src_path)
+
 		blob_allocator = BlobAllocator(
 			session=session,
 			time_costs=self.__time_costs,
 			blob_recorder=blob_recorder,
 			source_path=self.__source_path,
 			temp_path=self.__temp_path,
 			pre_calc_result_getter=pre_calc_result_getter,
+			previous_chunks_getter=previous_chunks_getter,
 		)
 
 		self.logger.info('Scanning file for backup creation at path {!r}, targets: {}'.format(
@@ -334,10 +377,12 @@ def pre_calc_result_getter(src_path: Path) -> Optional[BlobPrecalculateResult]:
 		))
 
 		self.__pre_calculate_stats(scan_result)
+		self.__load_previous_backup_files(session)
 		if self.config.backup.reuse_stat_unchanged_file:
 			with self.__time_costs.measure_time_cost(CreateBackupTimeCostKey.stage_reuse_unchanged_files):
-				self.__reuse_unchanged_files(session, scan_result)
+				self.__reuse_unchanged_files(scan_result)
 			self.logger.info('Reused {} / {} stat unchanged files'.format(len(self.__pre_calc_result.reused_files), len(scan_result.all_files)))
+		self.__cache_previous_chunks_for_fixed_auto(session, scan_result)
 		if self.config.get_effective_concurrency() > 1:
 			with self.__time_costs.measure_time_cost(CreateBackupTimeCostKey.stage_pre_calculate_hash):
 				self.__pre_calculate_hash_and_chunks(session, blob_allocator, scan_result)