This guide explains the parallel chunking feature available in the rules compilers for improved performance when processing large filter lists.
When compiling filter lists with many sources or millions of rules, sequential compilation can become slow. Chunking addresses this by:
- Splitting sources into chunks - Distributes sources across multiple parallel workers
- Compiling chunks in parallel - Uses multiple CPU cores simultaneously
- Merging results - Combines chunk outputs with deduplication
| Scenario | Sources | Rules | Sequential Time | Chunked Time (4 cores) | Speedup |
|---|---|---|---|---|---|
| Small | 10 | ~50k | 15s | 12s | 1.25x |
| Medium | 50 | ~250k | 75s | 25s | 3x |
| Large | 200 | ~1M | 300s | 85s | 3.5x |
Times are approximate and depend on source download speed and hardware.
| Compiler | Chunking Support | Status |
|---|---|---|
| TypeScript | Full | Production |
| .NET | Full | Production |
| Python | Full | Production |
| Rust | Full | Production |
{
  "name": "My Filter List",
  "sources": [...],
  "chunking": {
    "enabled": true,
    "chunkSize": 100000,
    "maxParallel": 4,
    "strategy": "source"
  }
}

deno task compile -- --enable-chunking --chunk-size 100000 --max-parallel 4

var options = new CompilerOptions
{
    ConfigPath = "config.yaml",
    Chunking = new ChunkingOptions
    {
        Enabled = true,
        ChunkSize = 100_000,
        MaxParallel = Environment.ProcessorCount,
        Strategy = ChunkingStrategy.Source
    }
};
var result = await compiler.CompileAsync(options);

// For small lists (chunking disabled)
var smallListOptions = CompilerOptions.Default;
// For large lists (chunking enabled with optimal settings)
var largeListOptions = CompilerOptions.ForLargeLists;

services.AddRulesCompiler();
// The IChunkingService is automatically registered
var chunkingService = serviceProvider.GetRequiredService<IChunkingService>();

import os

from rules_compiler import RulesCompiler
from rules_compiler.chunking import ChunkingOptions, ChunkingStrategy

# Create chunking options
chunking_options = ChunkingOptions(
    enabled=True,
    chunk_size=100_000,
    max_parallel=os.cpu_count() or 4,
    strategy=ChunkingStrategy.SOURCE
)

# Use preset for large lists
chunking_options = ChunkingOptions.for_large_lists()

# Compile with chunking
compiler = RulesCompiler(chunking=chunking_options)
result = await compiler.compile_async("config.yaml")

rules-compiler -c config.yaml --chunking --max-parallel 4

use rules_compiler::{
    ChunkingOptions, ChunkingStrategy, CompilerConfig,
    should_enable_chunking, split_into_chunks, compile_chunks_async, merge_chunks
};

// Create chunking options
let options = ChunkingOptions::new()
    .with_enabled(true)
    .with_chunk_size(100_000)
    .with_max_parallel(8)
    .with_strategy(ChunkingStrategy::Source);

// Use preset for large lists
let options = ChunkingOptions::for_large_lists();

// Split, compile, and merge
if should_enable_chunking(&config, Some(&options)) {
    let chunks = split_into_chunks(&config, &options);
    let result = compile_chunks_async(chunks, &options, false).await?;
    println!("Speedup: {:.2}x", result.estimated_speedup());
}

rules-compiler -c config.yaml --chunking --max-parallel 4

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable parallel chunking |
| `chunkSize` | number | `100000` | Maximum estimated rules per chunk |
| `maxParallel` | number | CPU cores | Maximum parallel workers |
| `strategy` | string | `"source"` | Chunking strategy |
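
As a rough illustration of how these defaults might be applied to a partially specified `chunking` section, consider the sketch below. The `resolve_chunking` helper is hypothetical and not part of any of the compilers; only the option names and default values come from the table above.

```python
import os
from typing import Optional

# Defaults copied from the options table. The helper itself is a
# hypothetical illustration, not an API of the rules compilers.
DEFAULTS = {
    "enabled": False,
    "chunkSize": 100_000,
    "maxParallel": os.cpu_count() or 4,  # "CPU cores", with a fallback of 4
    "strategy": "source",
}


def resolve_chunking(section: Optional[dict]) -> dict:
    """Return a chunking section with every missing key set to its default."""
    section = section or {}
    unknown = set(section) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown chunking option(s): {sorted(unknown)}")
    return {**DEFAULTS, **section}


resolved = resolve_chunking({"enabled": True, "maxParallel": 4})
print(resolved["chunkSize"], resolved["maxParallel"], resolved["strategy"])
```

Rejecting unknown keys is a design choice of this sketch; it makes typos such as `chunksize` fail loudly instead of being silently ignored.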
| Strategy | Description | Best For |
|---|---|---|
| `source` | Distributes sources evenly across chunks | Most use cases |
| `line-count` | Balances by estimated line count | (Planned) |
- Calculate chunks: Sources are distributed evenly
  - Total sources: 20
  - Max parallel: 4
  - → 4 chunks with 5 sources each
- Batch processing: Chunks run in parallel batches
  - Batch 1: Chunks 1-4 (parallel)
  - Batch 2: Chunks 5-8 (parallel) [if needed]
- Merge results: All outputs combined with deduplication
  - Chunk 1: 25,000 rules
  - Chunk 2: 30,000 rules
  - Chunk 3: 28,000 rules
  - Chunk 4: 27,000 rules
  - Total: 110,000 rules
  - After dedup: 95,000 rules (removed 15,000 duplicates)
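
The even distribution in the first step can be sketched as follows. This is a minimal illustration that assumes the number of chunks equals the worker limit, as in the 20-source example; the real compilers may also take `chunkSize` into account. The `distribute_sources` name is hypothetical.

```python
def distribute_sources(sources: list, max_parallel: int) -> list:
    """Split sources into at most max_parallel chunks of near-equal size."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    chunk_count = min(max_parallel, len(sources))
    if chunk_count == 0:
        return []
    base, extra = divmod(len(sources), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        # The first `extra` chunks take one additional source each.
        size = base + (1 if i < extra else 0)
        chunks.append(sources[start:start + size])
        start += size
    return chunks


sources = [f"source-{n}" for n in range(20)]
chunks = distribute_sources(sources, max_parallel=4)
print([len(c) for c in chunks])  # [5, 5, 5, 5]
```

When the source count does not divide evenly, the remainder is spread over the leading chunks, so chunk sizes never differ by more than one.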
When enabled is not explicitly set:
- Multiple sources + Source strategy → Chunking enabled automatically
- Single source → Chunking disabled (no benefit)
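
The two rules above amount to a small decision function. The sketch below is an assumption about how they combine, in particular that an explicit `enabled` value always takes precedence; `auto_enable_chunking` is a hypothetical name, not one of the documented functions.

```python
from typing import Optional


def auto_enable_chunking(
    source_count: int,
    enabled: Optional[bool] = None,
    strategy: str = "source",
) -> bool:
    """Decide whether to chunk when `enabled` may be unset (None)."""
    if enabled is not None:
        # Assumption: an explicit setting overrides automatic detection.
        return enabled
    # Unset: chunk only when there are multiple sources and the
    # source strategy is selected.
    return source_count > 1 and strategy == "source"


print(auto_enable_chunking(20))  # True
print(auto_enable_chunking(1))   # False
```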
The merge process:
- Flattens all chunk outputs into a single list
- Deduplicates actual filter rules while preserving order
- Preserves comments (`!` and `#` prefixed lines)
- Preserves empty lines for readability
- Reports duplicate count in logs
[INFO] Merging 4 chunks...
[DEBUG] Total rules before deduplication: 110000
[INFO] Merged to 95000 rules (removed 15000 duplicates)
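
A minimal sketch of a merge with these properties is shown below. It is not the compilers' `merge_chunks` implementation; it assumes rules are compared after trimming whitespace and that any line starting with `!` or `#` is a comment, as described above.

```python
def merge_chunks_sketch(chunk_results):
    """Flatten chunk outputs, dropping repeated rules but keeping
    comments and blank lines. Returns (merged_lines, duplicates_removed)."""
    seen = set()
    merged = []
    duplicates = 0
    for chunk in chunk_results:
        for line in chunk:
            stripped = line.strip()
            is_rule = bool(stripped) and not stripped.startswith(("!", "#"))
            if is_rule:
                if stripped in seen:
                    duplicates += 1
                    continue  # skip a rule that was already emitted
                seen.add(stripped)
            merged.append(line)  # first occurrence, comment, or blank line
    return merged, duplicates


merged, removed = merge_chunks_sketch([
    ["! List A", "||ads.example.com^", "||tracker.example.net^"],
    ["! List B", "", "||ads.example.com^", "||cdn.example.org^"],
])
print(removed)  # 1
```

Keeping the first occurrence of each rule and appending in input order is what preserves the original ordering across chunks.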
The compilation result includes chunking metrics:
var result = await compiler.CompileAsync(options);
// ChunkedCompilationResult properties:
result.TotalRules // Sum of all chunk rules
result.FinalRuleCount // After deduplication
result.DuplicatesRemoved // Number removed
result.TotalElapsedMs // Wall clock time
result.EstimatedSpeedup // Ratio of sequential/parallel time
result.Chunks // Individual chunk metadata

| Sources | Recommendation |
|---|---|
| 1-5 | Disable chunking (overhead not worth it) |
| 6-20 | Enable with default settings |
| 20+ | Enable with maxParallel matching CPU cores |
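
The table can be read as a simple lookup on the source count. The sketch below is one hypothetical encoding of it; the `recommend_chunking` helper is not part of the compilers, and it assumes that exactly 20 sources falls into the middle row.

```python
import os


def recommend_chunking(source_count: int) -> dict:
    """Map a source count to the chunking settings suggested in the table."""
    if source_count <= 5:
        return {"enabled": False}  # overhead not worth it
    if source_count <= 20:
        return {"enabled": True}  # rely on default settings
    # Larger lists: match the worker count to the available CPU cores.
    return {"enabled": True, "maxParallel": os.cpu_count() or 4}


print(recommend_chunking(3))
print(recommend_chunking(12))
```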
// Recommended for most large filter lists
var options = new ChunkingOptions
{
    Enabled = true,
    ChunkSize = 100_000,
    MaxParallel = Math.Max(2, Environment.ProcessorCount),
    Strategy = ChunkingStrategy.Source
};

- Each chunk runs a separate `hostlist-compiler` process
- Memory usage scales with `maxParallel`
- For memory-constrained systems, reduce `maxParallel` to 2-4
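
To see why lowering the worker limit caps memory use, the sketch below bounds the number of chunk jobs in flight. It uses an `asyncio.Semaphore` rather than fixed batches, and `fake_chunk` merely stands in for launching a compiler process; both are assumptions made for illustration, not how the compilers are implemented.

```python
import asyncio


async def run_bounded(jobs, max_parallel):
    """Run coroutine factories with at most max_parallel active at once.
    Returns (results in job order, peak number of concurrent jobs)."""
    semaphore = asyncio.Semaphore(max_parallel)
    active = 0
    peak = 0

    async def guarded(job):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_chunk(index):
    await asyncio.sleep(0.01)  # placeholder for one chunk compilation
    return index


def make_job(index):
    return lambda: fake_chunk(index)


results, peak = asyncio.run(
    run_bounded([make_job(i) for i in range(8)], max_parallel=2)
)
print(results, peak)
```

With eight jobs and a limit of two, no more than two jobs hold resources at any moment, which is the effect that reducing `maxParallel` aims for.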
- Network-bound sources: Chunking helps less when sources are slow to download
- Single large source: Cannot parallelize a single source file
- Transformation order: Global transformations run after merge, not per-chunk
Check that:
- `enabled: true` in configuration
- Multiple sources exist (for automatic enablement)
- `ChunkingService` is registered (DI scenarios)
Possible causes:
- Sources are network-bound (download time dominates)
- Too few sources to benefit from parallelism
- `maxParallel` set too low
Solutions:
- Reduce `maxParallel` to 2-4
- Ensure sufficient RAM (2GB+ recommended for large lists)
// Options
public class ChunkingOptions
{
    public bool Enabled { get; set; }
    public int ChunkSize { get; set; }
    public int MaxParallel { get; set; }
    public ChunkingStrategy Strategy { get; set; }
}

// Result
public class ChunkedCompilationResult
{
    public bool Success { get; set; }
    public long TotalElapsedMs { get; set; }
    public List<ChunkMetadata> Chunks { get; set; }
    public int TotalRules { get; set; }
    public int FinalRuleCount { get; set; }
    public int DuplicatesRemoved { get; set; }
    public double EstimatedSpeedup { get; }
}

// Service interface
public interface IChunkingService
{
    bool ShouldEnableChunking(CompilerConfiguration config, ChunkingOptions? options);
    List<(CompilerConfiguration Config, ChunkMetadata Metadata)> SplitIntoChunks(...);
    Task<ChunkedCompilationResult> CompileChunksAsync(...);
    (string[] Rules, int DuplicatesRemoved) MergeChunks(List<string[]> chunkResults);
    double EstimateSpeedup(int totalRules, ChunkingOptions options);
}

import os
from dataclasses import dataclass, field
from enum import Enum


class ChunkingStrategy(Enum):
    SOURCE = "source"
    LINE_COUNT = "line_count"


@dataclass
class ChunkingOptions:
    enabled: bool = False
    chunk_size: int = 100_000
    max_parallel: int = os.cpu_count() or 4
    strategy: ChunkingStrategy = ChunkingStrategy.SOURCE

    @classmethod
    def default(cls) -> "ChunkingOptions": ...

    @classmethod
    def for_large_lists(cls) -> "ChunkingOptions": ...


@dataclass
class ChunkMetadata:
    index: int
    total: int
    estimated_rules: int = 0
    actual_rules: int | None = None
    sources: list[FilterSource] = field(default_factory=list)
    elapsed_ms: int | None = None
    success: bool = False


@dataclass
class ChunkedCompilationResult:
    success: bool = False
    total_elapsed_ms: int = 0
    chunks: list[ChunkMetadata] = field(default_factory=list)
    total_rules: int = 0
    final_rule_count: int = 0
    duplicates_removed: int = 0

    @property
    def estimated_speedup(self) -> float: ...


# Functions
def should_enable_chunking(config: CompilerConfiguration, options: ChunkingOptions | None) -> bool: ...
def split_into_chunks(config: CompilerConfiguration, options: ChunkingOptions) -> list[tuple[CompilerConfiguration, ChunkMetadata]]: ...
async def compile_chunks_async(chunks: list, options: ChunkingOptions, debug: bool = False) -> ChunkedCompilationResult: ...
def merge_chunks(chunk_results: list[list[str]]) -> tuple[list[str], int]: ...
def estimate_speedup(total_rules: int, options: ChunkingOptions) -> float: ...

// Strategy enum
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ChunkingStrategy {
    #[default]
    Source,
    LineCount,
}

// Options
pub struct ChunkingOptions {
    pub enabled: bool,
    pub chunk_size: usize,
    pub max_parallel: usize,
    pub strategy: ChunkingStrategy,
}

impl ChunkingOptions {
    pub fn new() -> Self;
    pub fn for_large_lists() -> Self;
    pub fn with_enabled(self, enabled: bool) -> Self;
    pub fn with_chunk_size(self, chunk_size: usize) -> Self;
    pub fn with_max_parallel(self, max_parallel: usize) -> Self;
    pub fn with_strategy(self, strategy: ChunkingStrategy) -> Self;
}

// Metadata
pub struct ChunkMetadata {
    pub index: usize,
    pub total: usize,
    pub estimated_rules: usize,
    pub actual_rules: Option<usize>,
    pub sources: Vec<FilterSource>,
    pub elapsed_ms: Option<u64>,
    pub success: bool,
    pub error_message: Option<String>,
    pub output_path: Option<PathBuf>,
}

// Result
pub struct ChunkedCompilationResult {
    pub success: bool,
    pub total_elapsed_ms: u64,
    pub chunks: Vec<ChunkMetadata>,
    pub total_rules: usize,
    pub final_rule_count: usize,
    pub duplicates_removed: usize,
    pub merged_rules: Option<Vec<String>>,
    pub errors: Vec<String>,
}

impl ChunkedCompilationResult {
    pub fn estimated_speedup(&self) -> f64;
}

// Functions
pub fn should_enable_chunking(config: &CompilerConfig, options: Option<&ChunkingOptions>) -> bool;
pub fn split_into_chunks(config: &CompilerConfig, options: &ChunkingOptions) -> Vec<(CompilerConfig, ChunkMetadata)>;
pub async fn compile_chunks_async(chunks: Vec<(CompilerConfig, ChunkMetadata)>, options: &ChunkingOptions, debug: bool) -> Result<ChunkedCompilationResult>;
pub fn merge_chunks(chunk_results: &[Vec<String>]) -> (Vec<String>, usize);
pub fn estimate_speedup(total_rules: usize, options: &ChunkingOptions) -> f64;

The repository includes a comprehensive benchmark suite to measure chunking performance.
Run a quick simulation to see expected speedups on your system:
cd benchmarks
# Run comparison suite (recommended)
python quick_benchmark.py --suite
# Run parallel scaling test
python quick_benchmark.py --scaling
# Custom benchmark
python quick_benchmark.py --rules 500000 --parallel 8
# Interactive mode
python quick_benchmark.py --interactive

Example output:
======================================================================
CHUNKING PERFORMANCE COMPARISON SUITE
======================================================================
CPU cores available: 8
Max parallel workers: 8
Size Sequential Parallel Speedup Efficiency
----------------------------------------------------------------------
10K rules 150 ms 70 ms 2.14x 27%
50K rules 570 ms 130 ms 4.38x 55%
200K rules 2,350 ms 350 ms 6.71x 84%
500K rules 5,400 ms 800 ms 6.75x 84%
----------------------------------------------------------------------
Average speedup: 5.00x
Maximum speedup: 6.75x
Generate synthetic test data and run actual compilation benchmarks:
cd benchmarks
# Generate test data (small, medium, large, xlarge filter lists)
python generate_synthetic_data.py --all
# Run benchmarks across all compilers
python run_benchmarks.py
# Run specific compiler only
python run_benchmarks.py --compiler python --iterations 5
# Run specific size only
python run_benchmarks.py --size large

Based on synthetic benchmarks:
| Rule Count | Sequential | 4 Workers | 8 Workers | Speedup (8w) |
|---|---|---|---|---|
| 10,000 | ~150ms | ~60ms | ~40ms | 3.75x |
| 50,000 | ~600ms | ~200ms | ~120ms | 5.0x |
| 200,000 | ~2.5s | ~800ms | ~400ms | 6.25x |
| 500,000 | ~6s | ~1.8s | ~900ms | 6.67x |
Actual times vary by hardware, I/O speed, and network latency for remote sources.
Speedup scales with CPU cores but with diminishing returns:
| Workers | Theoretical Max | Typical Efficiency |
|---|---|---|
| 2 | 2.0x | 90-100% |
| 4 | 4.0x | 85-95% |
| 8 | 8.0x | 75-90% |
| 16 | 16.0x | 60-80% |
Efficiency decreases due to:
- Process startup overhead
- Merge/deduplication time
- Memory bandwidth limits
- I/O contention
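
Efficiency here is speedup divided by the number of workers. The short sketch below applies that definition to the 200K-rule row of the example benchmark output (2,350 ms sequential, 350 ms parallel, 8 workers); the `parallel_efficiency` helper is hypothetical.

```python
def parallel_efficiency(sequential_ms: float, parallel_ms: float, workers: int):
    """Return (speedup, efficiency), where efficiency = speedup / workers."""
    speedup = sequential_ms / parallel_ms
    return speedup, speedup / workers


speedup, efficiency = parallel_efficiency(2350, 350, workers=8)
print(f"{speedup:.2f}x at {efficiency:.0%} efficiency")  # 6.71x at 84% efficiency
```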
- Line-count strategy: Balance chunks by estimated rule count
- Streaming merge: Reduce memory usage for very large outputs
- Source caching: Cache downloaded sources across chunks
- Progress callbacks: Real-time progress reporting