🔒 Security Note: All files in this directory are automatically validated for tampering and malicious content.
This directory contains local filter rule lists that will be compiled into the final output filter list.
The data/input/ directory serves as the source location for:
- Local rule files: Filter rules in various formats (adblock, hosts, etc.)
- Internet list references: Text files containing URLs to remote filter lists
Place your local filter rule files in this directory. Supported formats:

- AdBlock format (`.txt`): Standard AdGuard/uBlock syntax

  ```
  ! Comment
  ||example.com^
  @@||allowed.com^
  ```

- Hosts format (`.hosts`, `.txt`): Traditional hosts file syntax

  ```
  # Comment
  127.0.0.1 blocked-domain.com
  0.0.0.0 another-blocked.com
  ```
Create a file named `internet-sources.txt` (or similar) containing one URL per line:

```
https://easylist.to/easylist/easylist.txt
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
https://filters.adtidy.org/extension/chromium/filters/2.txt
```

Lines starting with `#` are treated as comments.
All input files are verified using SHA-384 hashes to ensure data integrity both at rest and in transit:
Purpose: Detect unauthorized modifications to local filter files
Process:
- On first compilation, compute a SHA-384 hash for each local file
- Store hashes in the `.hashes.json` database (gitignored)
- On subsequent compilations, verify files haven't changed unexpectedly
- Alert the user if a hash mismatch is detected (potential tampering)
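The local verification steps above can be sketched in Python. This is a minimal illustration, not the project's actual implementation; the function names and the in-memory `db` dict are hypothetical stand-ins for the `.hashes.json` database:

```python
import hashlib
from pathlib import Path

def sha384_of_file(path: Path) -> str:
    """Compute the SHA-384 hex digest of a file, reading in chunks."""
    h = hashlib.sha384()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_local_file(path: Path, db: dict) -> bool:
    """Return True if the file's hash matches the stored entry (or is new)."""
    digest = sha384_of_file(path)
    entry = db.get(path.name)
    if entry is None:
        db[path.name] = {"hash": digest}  # first compilation: record the hash
        return True
    return entry["hash"] == digest
```

A mismatch (`False`) is where the compiler would raise its tampering alert.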
Hash Database (`.hashes.json`):

```json
{
  "custom-rules.txt": {
    "hash": "abc123def456...",
    "size": 1234,
    "lastModified": "2024-12-27T10:30:00Z",
    "lastVerified": "2024-12-27T14:45:00Z"
  }
}
```

Purpose: Prevent man-in-the-middle attacks and ensure download integrity
Process:
- User specifies the expected hash in the URL: `https://example.com/list.txt#sha384=hash`
- Download the file over HTTPS (encrypted channel)
- Compute SHA-384 hash of downloaded content
- Compare with expected hash from URL
- Reject if mismatch (compilation fails with security error)
Hash Format Options:

```
# Option 1: Inline hash in URL (recommended)
https://easylist.to/easylist/easylist.txt#sha384=abc123...

# Option 2: Separate hash file
https://easylist.to/easylist/easylist.txt
https://easylist.to/easylist/easylist.txt.sha384

# Option 3: Hash database (internet-sources-hashes.json)
{
  "https://easylist.to/easylist/easylist.txt": "abc123..."
}
```
Behavior:
- First download: Store hash automatically for future verification
- Subsequent downloads: Verify against stored hash
- Hash mismatch:
  - Warn the user of potential tampering or a list update
  - Require explicit confirmation to update the hash
  - Log old and new hashes for an audit trail
Configure via environment variable or config file:
1. Strict Mode (production recommended):

   ```json
   {
     "hashVerification": {
       "mode": "strict",
       "requireHashesForRemote": true,
       "failOnMismatch": true
     }
   }
   ```

   - All remote sources must have hashes
   - Any mismatch fails compilation
   - Manual hash update required

2. Warning Mode (default):

   ```json
   {
     "hashVerification": {
       "mode": "warning",
       "requireHashesForRemote": false,
       "failOnMismatch": false
     }
   }
   ```

   - Hash mismatches generate warnings
   - Compilation continues
   - New hashes stored automatically

3. Disabled Mode (not recommended):

   ```json
   {
     "hashVerification": {
       "mode": "disabled"
     }
   }
   ```

   - No hash verification
   - Security risk; only for testing
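The three modes reduce to a small decision rule. A sketch, assuming only the config keys shown above (the function name and return values are illustrative):

```python
def on_hash_mismatch(config: dict) -> str:
    """Decide what a hash mismatch does under the configured mode.

    Returns one of: 'fail', 'warn', 'ignore'.
    """
    hv = config.get("hashVerification", {})
    mode = hv.get("mode", "warning")  # warning mode is the documented default
    if mode == "disabled":
        return "ignore"
    if mode == "strict" or hv.get("failOnMismatch", False):
        return "fail"
    return "warn"
```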
Before compilation, all input files undergo:
- Format detection: Automatic detection of adblock vs hosts syntax
- Syntax validation: Verification of rule syntax according to format
- Error reporting: Clear messages for invalid rules with line numbers
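Format detection can be approximated with a simple heuristic; this is an illustrative sketch, not the compiler's actual detector:

```python
def detect_format(lines: list[str]) -> str:
    """Heuristically classify rule text as 'hosts' or 'adblock'.

    Hosts lines start with a redirect IP (0.0.0.0 / 127.0.0.1) and a domain;
    adblock lines carry markers such as ||, @@, or a trailing ^.
    """
    hosts = adblock = 0
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blanks and comments in either format
        if line.startswith(("0.0.0.0 ", "127.0.0.1 ", "::1 ")):
            hosts += 1
        elif line.startswith(("||", "@@")) or line.endswith("^"):
            adblock += 1
    return "hosts" if hosts > adblock else "adblock"
```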
```
data/input/
├── README.md               # This file
├── custom-rules.txt        # Your custom adblock rules
├── company-blocklist.txt   # Organization-specific rules
├── hosts-additions.hosts   # Additional hosts entries
├── internet-sources.txt    # URLs to remote lists
└── .gitignore              # Ignore sensitive/large files
```
- Discovery: Scan `data/input/` for all supported files
- Validation: Lint and verify the syntax of each file
- Hashing: Compute a SHA-384 hash for integrity verification
- Remote fetch (if applicable): Download internet lists with hash verification
- Compilation: Merge all sources using `@jk-com/adblock-compiler`
- Output: Write the final adblock-format list to `data/output/adguard_user_filter.txt`
- Hash verification: Detects file tampering
- Syntax validation: Prevents injection of malformed rules
- Format enforcement: Final output is always in adblock syntax
- Source tracking: Maintains provenance of each rule
- URL validation: Internet sources undergo security checks before download
When using internet-sources.txt, all URLs are validated for security:
1. Protocol Verification
   - ✅ HTTPS required: Only `https://` URLs are allowed (enforced)
   - ❌ HTTP blocked: Insecure `http://` URLs are rejected
   - ❌ Other protocols blocked: `ftp://`, `file://`, etc. are rejected
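The protocol check is a one-liner with the standard library; a sketch (the function name is made up):

```python
from urllib.parse import urlparse

def check_protocol(url: str) -> bool:
    """Accept only https:// URLs; reject http, ftp, file, and anything else."""
    return urlparse(url).scheme == "https"
```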
2. Domain Validation
- Verify domain is resolvable via DNS
- Check against known malicious domain lists
- Ensure domain is not an IP address (prefer named domains)
- Validate domain follows proper DNS naming conventions
3. Content-Type Verification
   - Verify the HTTP response has `Content-Type: text/plain` or a similar text format
   - Reject binary content, executables, or unexpected MIME types
   - Check the `Content-Length` header for reasonable file sizes
4. Content Validation
- Download and scan first 1KB to verify it contains filter rules
- Check for valid rule syntax (adblock or hosts format)
- Reject files that don't match expected patterns
- Scan for suspicious patterns (scripts, embedded content, etc.)
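The first-1KB scan can be sketched like this; the rule patterns are an illustrative approximation, not the project's actual validator:

```python
import re

# Lines that start like adblock rules (||, @@) or hosts entries (redirect IPs).
RULE_PATTERN = re.compile(r"^(\|\||@@|0\.0\.0\.0\s|127\.0\.0\.1\s)")

def looks_like_filter_list(head: bytes) -> bool:
    """Check the first 1KB of a download for filter-rule-shaped lines.

    Rejects content that is not valid UTF-8 text (likely binary) and
    content where no non-comment line matches a known rule pattern.
    """
    try:
        text = head[:1024].decode("utf-8")
    except UnicodeDecodeError:
        return False  # binary content: reject
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # comments are fine but prove nothing
        if RULE_PATTERN.match(line):
            return True
    return False
```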
5. Hash Verification
- Optionally specify expected SHA-384 hash for each URL
- Fail if downloaded content doesn't match expected hash
- Store hash database for known-good sources
Example with hash verification:

```
# internet-sources.txt with hashes
https://easylist.to/easylist/easylist.txt#sha384=abc123...
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts#sha384=def456...
```
Security warnings:

- ⚠️ Untrusted sources may contain malicious or overly aggressive rules
- ⚠️ Always verify the legitimacy of the source domain
- ⚠️ Review downloaded lists before deploying to production
- ⚠️ Use hash verification for critical sources
- ⚠️ Monitor for unexpected changes in list content or size
Adding local rules:

1. Create a `.txt` file in `data/input/`
2. Add your filter rules in adblock or hosts format
3. Run the compiler; it will automatically discover and include the file

Adding internet sources:

1. Create or edit `internet-sources.txt`
2. Add one URL per line
3. Run the compiler with internet source support enabled
```sh
# TypeScript compiler
cd src/rules-compiler-typescript
deno task compile

# .NET compiler
cd src/rules-compiler-dotnet
dotnet run

# Python compiler
cd src/rules-compiler-python
rules-compiler --input-dir ../../data/input --output ../../data/output/adguard_user_filter.txt

# Rust compiler
cd src/rules-compiler-rust
cargo run --release
```

- Organize by purpose: Group related rules in separate files
- Add comments: Use `!` or `#` to document rule purposes
- Test incrementally: Add rules gradually and verify behavior
- Keep backups: Maintain copies before making bulk changes
- Track hashes: Note hash values for important reference files
- Review internet sources: Verify legitimacy of remote list URLs
Hash mismatch warning:
- File was modified since the last compilation
- Verify the changes were intentional
- Recompile to update the hash database

Invalid rule syntax:
- Check that the rule format matches the file type
- Ensure proper adblock or hosts syntax
- See the format examples above

File too large:
- Input file exceeds the size limit
- Split it into multiple smaller files
- Review and remove unnecessary rules