Locus 性能基准测试

本项目使用 BenchmarkDotNet 进行性能基准测试。

运行基准测试

运行所有基准测试

cd tests/Locus.Benchmarks
dotnet run -c Release

运行特定的基准测试类

# 元数据仓库基准测试
dotnet run -c Release --filter "Locus.Benchmarks.MetadataRepositoryBenchmarks*"

# 并发操作基准测试
dotnet run -c Release --filter "Locus.Benchmarks.ConcurrentOperationsBenchmarks*"

# 目录配额基准测试
dotnet run -c Release --filter "Locus.Benchmarks.DirectoryQuotaBenchmarks*"

# 租户管理基准测试
dotnet run -c Release --filter "Locus.Benchmarks.TenantManagerBenchmarks*"

# StoragePool 写入吞吐量基准测试
dotnet run -c Release --filter "Locus.Benchmarks.StoragePoolWriteThroughputBenchmarks*"

# StoragePool 并发写入基准测试
dotnet run -c Release --filter "Locus.Benchmarks.StoragePoolConcurrencyBenchmarks*"

# Volume 健康检查基准测试
dotnet run -c Release --filter "Locus.Benchmarks.VolumeHealthCheckBenchmarks*"

# 待处理文件分配锁竞争（第7点）基准测试
dotnet run -c Release --filter "Locus.Benchmarks.PendingAllocationContentionBenchmarks*"

# 第5/6点基准测试（脏键刷盘 + known-dir 裁剪）
dotnet run -c Release --filter "Locus.Benchmarks.DirectoryQuotaDirtyFlushBenchmarks*"
dotnet run -c Release --filter "Locus.Benchmarks.KnownDirectoryTrimBenchmarks*"

# 第1~4点基准测试（FileWatcher 主路径）
dotnet run -c Release --filter "Locus.Benchmarks.FileWatcherLargeDirectoryBenchmarks*"
dotnet run -c Release --filter "Locus.Benchmarks.FileWatcherConfigurationDispatchBenchmarks*"
dotnet run -c Release --filter "Locus.Benchmarks.FileWatcherMultiTenantStatusBenchmarks*"
dotnet run -c Release --filter "Locus.Benchmarks.FileWatcherPrunePressureBenchmarks*"

# Metadata 冷启动内存验证（SQLite 冷状态不进内存）
dotnet run -c Release -- metadata-cold-start-memory --cold 900000 --hot 100 --cold-query-limit 100

# 生成第5/6点参数矩阵对比面板（会自动跑基准并输出 Markdown）
pwsh ./tests/Locus.Benchmarks/build-phase56-matrix-panel.ps1

# 生成第1~4点基准对比面板（会自动跑基准并输出 Markdown）
pwsh ./tests/Locus.Benchmarks/build-phase14-benchmark-panel.ps1

注意: BenchmarkDotNet 过滤器使用全限定名 namespace.typeName.methodName，不支持 | OR 语法。每次只能过滤一个类。

运行特定的基准测试方法

# 只运行并发写入测试
dotnet run -c Release --filter "*ConcurrentWrites*"

# 只运行元数据添加测试
dotnet run -c Release --filter "*AddOrUpdateAsync_Single*"

基准测试类说明

1. MetadataRepositoryBenchmarks

测试文件元数据仓库的性能（Write-Behind + _pendingKeys 索引架构）:

AddOrUpdate single file metadata: 单个文件元数据的添加/更新性能（内存优先，O(1)）
Get file metadata (cache hit): 缓存命中时的元数据查询性能（纯 ConcurrentDictionary 查找）
Get non-existent file (returns null): 查询不存在的 key 的性能（无 LiteDB fallback，直接返回 null）
Batch insert 100 files: 批量插入100个文件的性能
Get next pending file (100-file pool, stable state): 从固定100文件池中获取下一个待处理文件（_pendingKeys O(n_pending) 扫描）

2. DirectoryQuotaBenchmarks

测试目录配额管理的性能:

Check can add file (no limit): 无配额限制时检查是否可添加文件
Check can add file (with limit): 有配额限制时检查是否可添加文件
Increment file count: 原子递增文件计数
Decrement file count: 原子递减文件计数
Set directory limit: 设置目录限制
Get file count: 获取文件计数

3. TenantManagerBenchmarks

测试租户管理的性能:

Create tenant: 创建租户的性能
Get tenant (cache hit): 缓存命中时获取租户
Get tenant (cache miss): 缓存未命中时获取租户
Get tenant with auto-create: 自动创建租户的性能
Check tenant enabled: 检查租户是否启用
Enable tenant: 启用租户
Disable tenant: 禁用租户

4. ConcurrentOperationsBenchmarks

测试通过 StoragePool 的并发端到端场景:

10/50/100 concurrent writes: 10/50/100个并发写入操作
10 concurrent reads (pure read): 仅并发读取预置文件（不含预写成本）
10 concurrent reads (write+read): 每次基准先写入再并发读取（端到端准备+读取）
Mixed read/write operations: 混合读写操作 (10写+10读)

5. VolumeHealthCheckBenchmarks

测试 Volume 健康检查与磁盘空间查询（TTL 缓存路径）:

IsHealthy: 缓存命中时的健康检查延迟
AvailableSpace: 缓存命中时的可用空间查询延迟
TotalCapacity: 缓存命中时的总容量查询延迟

6. StoragePoolWriteThroughputBenchmarks

测试 StoragePool.WriteFileAsync 端到端单线程写入吞吐量:

Single-threaded write (100 KB): 100 KB 文件顺序写入
Single-threaded write (1 MB): 1 MB 文件顺序写入
Single-threaded write (10 MB): 10 MB 文件顺序写入

7. StoragePoolConcurrencyBenchmarks

测试 StoragePool.WriteFileAsync 并发写入扩展性:

1 writer (baseline): 单线程 baseline
10 concurrent writers: 10 线程并发写入
50 concurrent writers: 50 线程并发写入
100 concurrent writers: 100 线程并发写入

8. PendingAllocationContentionBenchmarks

测试 MetadataRepository 在单租户高并发下的待处理文件分配竞争:

pending allocation contention (single-file API): 并发 GetNextPendingFileAsync + 回填 Pending
pending allocation contention (batch API): 并发 GetNextPendingBatchAsync + 回填 Pending

9. DirectoryQuotaDirtyFlushBenchmarks

测试 DirectoryQuotaRepository 在大目录总量下的“脏键索引”刷盘路径:

directory quota flush (indexed sparse dirty set): 仅刷脏目录的后台 flush
参数矩阵：TotalDirectories = [20000, 100000] × DirtyDirectories = [32, 128, 512]

10. KnownDirectoryTrimBenchmarks

测试 LocalFileSystemVolume known-directory 缓存在高 churn 下的裁剪路径:

known-directory cache trim under churn: 高频新增目录触发缓存裁剪
参数矩阵：CacheMaxEntries = [256, 512, 2048] × DirectoryAddsPerOperation = [2048, 4096, 8192]

11. Phase56 Matrix Panel（对比面板）

用于把第5/6点矩阵结果汇总成一个可对比的 Markdown 面板：

输出文件：BenchmarkDotNet.Artifacts/results/Locus.Benchmarks.Phase56-Matrix-Panel.md
源数据：Locus.Benchmarks.DirectoryQuotaDirtyFlushBenchmarks-report.csv + Locus.Benchmarks.KnownDirectoryTrimBenchmarks-report.csv
明细表包含 RatioVsBaseline（相对基线倍率，x）

12. FileWatcherLargeDirectoryBenchmarks

测试第1点（流式扫描 + 小队列）在大目录下的扫描性能:

point1 large-directory streamed scan: 单租户大目录扫描与导入
参数矩阵：FileCount = [2000, 10000] × MaxConcurrentImports = [1, 8]

13. FileWatcherConfigurationDispatchBenchmarks

测试第2点（按配置对象扫描）调度路径:

point2 scan dispatch by watcher id (forced config load): 强制按 watcherId 重载配置
point2 scan dispatch by configuration snapshot: 直接按配置对象扫描

14. FileWatcherMultiTenantStatusBenchmarks

测试第3点（复用 GetTenantAsync 状态）在多租户扫描下的表现:

point3 multi-tenant scan with status reuse
参数矩阵：TenantDirectoryCount = [40, 160] × DisabledRatioPercent = [0, 50]

15. FileWatcherPrunePressureBenchmarks

测试第4点（prune 脱离热路径）在历史压力下的扫描表现:

point4 scan under imported-history prune pressure
参数矩阵：ExistingImportedEntries = [60000, 120000] × NewFiles = [256, 1024]

16. Phase14 Benchmark Panel（对比面板）

用于把第1~4点基准结果汇总成一个可对比的 Markdown 面板：

输出文件：BenchmarkDotNet.Artifacts/results/Locus.Benchmarks.Phase14-Benchmark-Panel.md
明细表包含 RatioVsBaseline（相对基线倍率，x）

性能指标说明

BenchmarkDotNet 会输出以下关键指标:

Mean: 平均执行时间
Error: 误差范围 (99.9% 置信区间的一半)
StdDev: 标准差
Median: 中位数执行时间
Gen0/Gen1/Gen2: GC 回收次数 (每1000次操作)
Allocated: 每次操作分配的内存

实际基准测试结果

测试环境: Intel Core i5-9400 CPU 2.90GHz (Coffee Lake), 6 cores, .NET 10.0.0, Windows 11 24H2

StoragePool 写入吞吐量（单线程）

FileSize	Mean	StdDev	Allocated
100 KB	1.074 ms	0.086 ms	7.34 KB
1 MB	1.296 ms	0.023 ms	7.34 KB
10 MB	4.736 ms	0.400 ms	12.55 KB

StoragePool 并发写入扩展性（1 MB/文件，iterationCount=10）

Concurrency	Mean	StdDev	Allocated
1 writer (baseline)	2.839 ms	0.350 ms	77.04 KB
10 concurrent writers	26.368 ms	6.849 ms	273.52 KB
50 concurrent writers	113.463 ms	10.608 ms	701.02 KB
100 concurrent writers	228.952 ms	11.859 ms	1163.76 KB

端到端并发场景（iterationCount=10）

Method	threadCount	Mean	StdDev	Allocated
10 concurrent reads (write+read)	—	14.413 ms	4.913 ms	1520.69 KB
Mixed read/write (20 ops)	—	8.312 ms	0.469 ms	1157.40 KB
Concurrent writes	10	3.076 ms	0.174 ms	380.72 KB
Concurrent writes	50	14.249 ms	1.270 ms	1644.43 KB
Concurrent writes	100	25.319 ms	1.970 ms	2781.31 KB

ConcurrentReads_PureRead 已新增，用于单独测量纯读取延迟；请运行最新基准获取独立数值。

目录配额操作（Lock-Free CAS）

Method	Mean	StdDev	Allocated
Check can add file (no limit)	141.25 ns	2.966 ns	176 B
Check can add file (with limit)	139.07 ns	1.272 ns	176 B
Increment file count	94.80 ns	0.906 ns	72 B
Decrement file count	231.05 ns	0.968 ns	248 B
Set directory limit	2.916 ms	11.215 μs	142.2 KB
Get file count	145.41 ns	0.194 ns	232 B

Volume 健康检查（30s TTL 缓存）

Method	Mean	StdDev
IsHealthy (cached)	17.88 ns	0.065 ns
AvailableSpace (cached)	22.33 ns	0.029 ns
TotalCapacity (cached)	22.32 ns	0.022 ns

元数据操作（Write-Behind + `_pendingKeys` 索引）

Method	Mean	StdDev	Allocated
AddOrUpdate single file	1.878 μs	121.9 ns	2.4 KB
Get file (cache hit)	40.91 ns	0.062 ns	72 B
Get non-existent file (returns null)	34.87 ns	0.017 ns	0 B
Batch insert 100 files	237.9 μs	17.73 μs	63.4 KB
Get next pending file (100-file pool)	2.975 μs	316.1 ns	1.4 KB

租户管理（5分钟缓存）

Method	Mean	StdDev	Allocated
Create tenant	2.589 ms	1.854 ms	7.0 KB
Get tenant (cache hit)	81.95 ns	0.230 ns	104 B
Get tenant (cache miss)	24.23 μs	105.8 ns	1.28 KB
Get tenant (auto-create)	2.116 ms	1.177 ms	9.3 KB
Check tenant enabled (cache hit)	89.86 ns	2.417 ns	104 B
Enable tenant	3.060 ms	1.267 ms	21.1 KB
Disable tenant	1.864 ms	274.2 μs	14.3 KB

关键结论

⚡ 目录配额 CAS 无锁: 94.80 ns 每次计数递增（vs 旧版 SemaphoreSlim + LiteDB 同步写入 ~200 μs，提升约 2000x）
⚡ Volume 健康/空间缓存: 17–22 ns，消除每次写入的额外磁盘 I/O（旧版每次写入触发临时文件创建+删除）
⚡ 元数据缓存命中: 40.91 ns（纯 ConcurrentDictionary 查找，旧测量值 317.7 ns 因错误地把 AddOrUpdateAsync 计入测量范围而虚高）
⚡ 元数据 Write-Behind: 1.878 μs 每文件（内存优先，LiteDB 后台异步落盘）
⚡ _pendingKeys 索引: GetNextPendingFileAsync 从 O(n_active) 全量扫描降为 O(n_pending) 线性扫描，旧测量值 5.886 ms 因状态累积严重失真，实际为 2.975 μs（100 文件池）
⚡ 租户缓存: 82–90 ns（5分钟缓存，远低于旧版 JSON 文件读取 ~24 μs）
✅ 100 KB 文件写入: ~1.1 ms 端到端（配额检查 + 磁盘写入 + 元数据）
✅ 100 并发写入（1 MB/文件）: 229 ms，StdDev 5.2%（iterationCount=10，统计更可靠）

注意: 并发写入 StdDev 受系统调度抖动影响。实际性能取决于硬件配置（CPU、磁盘类型: HDD vs SSD）。 ConcurrentOperationsBenchmarks 的 "10 concurrent reads" StdDev 较大（4.9 ms），原因是读取场景涉及磁盘 I/O，受操作系统缓存和调度影响较大；绝对误差仍在可接受范围。

优化建议

根据基准测试结果，如遇到性能问题可考虑以下方向:

如果元数据查询慢（cache hit > 100 ns，或 GetNextPendingFileAsync > 10 μs）:
- 检查 _pendingKeys 索引是否正常工作（应只扫描 Pending 文件，而非全量 active 文件）
- 如果 active 文件数量巨大（>10 万），考虑加快清理服务运行频率，及时清理 PermanentlyFailed 文件
- 检查是否有大量 Completed 文件未及时清理，导致 LiteDB 数据库过大
如果并发写入慢（> 250 ms / 100 写入）:
- 首要检查磁盘 I/O 吞吐量（100 KB 文件顺序写入 < 1.1 ms 是正常水平）
- 考虑挂载多个存储 Volume 以分散 I/O 压力
- 考虑使用 SSD 存储
如果 Volume 健康/空间查询慢（> 100 ns）:
- 检查 LocalFileSystemVolume 的 TTL 缓存是否生效（默认 30 秒）
- 如果每次都触发文件写删测试，说明缓存失效过快
如果配额检查慢（> 200 ns）:
- 检查 AtomicQuotaState 是否被正确初始化（GlobalSetup 应预热热路径）
- 如果每次都需要创建新路径，会触发 Write-Behind flush
如果租户操作慢（缓存命中 > 200 ns）:
- 增加租户缓存时间（默认 5 分钟）
- 检查是否频繁调用 EnableTenantAsync/DisableTenantAsync 导致缓存失效

持续集成

可以将基准测试集成到 CI 流程中，跟踪性能变化:

# 生成 Markdown 格式的性能报告
dotnet run -c Release --exporters markdown

# 生成 JSON + Markdown 格式报告
dotnet run -c Release --exporters json markdown

# 导出结果到指定目录
dotnet run -c Release --exporters markdown --artifacts ./benchmark-results

结果文件保存在 BenchmarkDotNet.Artifacts/results/ 目录，文件名格式:

Locus.Benchmarks.<ClassName>-report-github.md — GitHub Flavored Markdown 表格
Locus.Benchmarks.<ClassName>-report.csv — CSV 格式（适合自动化比较）
Locus.Benchmarks.<ClassName>-report.html — HTML 可视化报告

P0-2 Peak Memory Validation (100k+ files)

To validate the peak managed memory target for orphan-rebuild (not only BDN Allocated), run:

pwsh ./tests/Locus.Benchmarks/measure-orphan-peak-memory.ps1 `
  -Mode both `
  -FileCount 120000 `
  -MetadataRatio 0.90 `
  -AttachDelaySec 15 `
  -PostRunDelaySec 5

Or run directly:

dotnet run -c Release --project tests/Locus.Benchmarks -- `
  --scenario orphan-rebuild-peak `
  --mode both `
  --file-count 120000 `
  --metadata-ratio 0.90 `
  --attach-delay-sec 15 `
  --post-run-delay-sec 5

During attach delay, use dotnet-counters (if installed) for external EventPipe verification:

dotnet-counters monitor -p <pid> --counters System.Runtime[gc-heap-size]

The scenario prints:

baseline: legacy full-HashSet style path check
current: current on-demand orphan rebuild path
peak values and Delta(current-vs-baseline)

If dotnet-counters is unavailable, the command falls back to an in-process heap sampler and still reports a peak-bytes estimate for comparison.

Phase D Cold-Start Benchmark

Run cold-start active-index loading benchmark:

dotnet run -c Release --project tests/Locus.Benchmarks --filter "Locus.Benchmarks.MetadataRepositoryColdStartBenchmarks*"

Architecture-Sensitive Benchmark Update (2026-04-09)

The benchmark suite was refreshed after the queue/cleanup architecture changes introduced dead-letter disposition and made DeadLettered part of the active metadata lifecycle.

Benchmark adjustments

CleanupLargeTenantBenchmarks now seeds real permanently failed files on a mounted volume instead of benchmarking the already-missing-file fallback path.
The cleanup benchmark now compares the current default MoveToDeadLetter path with the optional Delete disposition so we can track the extra cost of preserving dead-letter artifacts.
MetadataRepositoryColdStartBenchmarks now includes DeadLettered rows in the 20,000-file startup dataset and probes a dead-lettered record after load to keep the new state in the measured path.

Latest results

Environment:

Ran on 2026-04-09 outside the sandbox.
BenchmarkDotNet v0.15.8, .NET 10.0.5, Windows 11 25H2.
Intel Core Ultra 9 185H, 22 logical cores / 16 physical cores.

Cleanup Status Path

Scenario	Processing	PermanentlyFailed	Mean	StdDev	Allocated
Move to dead letter	1200	800	848.0 ms	21.56 ms	19.28 MB
Delete permanently failed files	1200	800	559.8 ms	76.37 ms	13.97 MB

Notes:

The current default MoveToDeadLetter path is about 51.5% slower than Delete in this synthetic run.
The additional cost comes from moving physical files and preserving metadata in DeadLettered state.

Metadata Cold Start (includes `DeadLettered`)

Active files	Startup batch size	Mean	StdDev	Allocated
20000	512	114.6 ms	22.38 ms	29.94 MB
20000	2000	111.9 ms	14.62 ms	29.55 MB