FSharp.Data Performance Research and Improvement Plan
Executive Summary
FSharp.Data is a mature F# library providing type providers and data access tools for CSV, HTML, JSON, and XML. Based on an analysis of the codebase, this plan outlines performance improvement opportunities, with a focus on JSON parsing, string processing, and memory allocation.
Current Performance Infrastructure
Benchmarking Setup ✅
- Location: tests/FSharp.Data.Benchmarks/
- Framework: BenchmarkDotNet with memory diagnostics
- Coverage: JSON parsing benchmarks (Simple, Nested, GitHub, Twitter, WorldBank)
- Commands:
dotnet run --project build/build.fsproj -- -t RunBenchmarks
cd tests/FSharp.Data.Benchmarks && ./run-benchmarks.sh
Build & Test Infrastructure ✅
- Build: FAKE-based build system (build/build.fs)
- CI: GitHub Actions for Windows/Ubuntu
- Commands: ./build.sh or dotnet run --project build/build.fsproj -- -t Build
- Test: dotnet run --project build/build.fsproj -- -t RunTests
Performance Analysis
1. JSON Processing Bottlenecks 🎯
Primary Target: src/FSharp.Data.Json.Core/JsonValue.fs
Identified Issues:
- String Building: Heavy use of StringBuilder in JSON serialization (WriteTo method)
- Parsing Algorithm: Recursive descent parser with potential stack overhead
- Memory Allocations:
  - Array allocations for Record properties and Array elements
  - String interpolation and concatenation
  - Buffer management in parser state
Current Benchmark Workloads (document sizes; baseline timings are still to be recorded):
- ParseSimpleJson: Small documents (~1KB)
- ParseGitHubJson: Medium documents (~75KB)
- ParseTwitterJson: Medium documents (~74KB)
- ParseWorldBankJson: Small-medium documents (~20KB)
2. Type Provider Performance 🔍
Type providers generate code at compile time, affecting:
- Design-time: IntelliSense responsiveness
- Runtime: Generated type instantiation
- Memory: Schema inference caching
3. CSV/XML/HTML Processing 📊
Similar patterns exist across:
- src/FSharp.Data.Csv.Core/: CSV parsing and inference
- src/FSharp.Data.Html.Core/: HTML parsing with CSS selectors
- src/FSharp.Data.Xml.Core/: XML processing and XSD inference
Performance Goals & Roadmap
Round 1: JSON Parsing Optimization (Quick Wins) 🚀
Target: 15-30% improvement in JSON parsing speed, 10-20% memory reduction
- JsonValue Parser Optimizations:
  - Replace StringBuilder with span-based approaches
  - Optimize number parsing with span methods
  - Implement object/array pooling for common sizes
  - Cache decoded strings in hot paths
- Memory Management:
  - Pre-size collections based on content hints
  - Reduce intermediate allocations
  - Use ReadOnlySpan<char> for tokenization
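As a minimal sketch of what span-based number parsing could look like, the function below scans a numeric token in place and parses it from a slice of the original string, so no intermediate substring is allocated. The name tryReadNumber is hypothetical and this is not the parser in JsonValue.fs. It also assumes a target where Double.TryParse accepts ReadOnlySpan<char> (.NET Core 2.1+ or .NET Standard 2.1); a .NET Standard 2.0 target would need a fallback path.

```fsharp
open System
open System.Globalization

/// Hypothetical helper: reads a JSON number starting at `start`.
/// Returns the parsed value and the index just past the number, or None.
let tryReadNumber (text: string) (start: int) : (float * int) option =
    let isNumberChar (c: char) =
        (c >= '0' && c <= '9') || c = '-' || c = '+' || c = '.' || c = 'e' || c = 'E'
    // Find the end of the candidate token without copying anything.
    let mutable i = start
    while i < text.Length && isNumberChar text.[i] do
        i <- i + 1
    if i = start then
        None
    else
        // AsSpan views the original string in place (no substring allocation).
        let slice = text.AsSpan(start, i - start)
        let mutable value = 0.0
        if Double.TryParse(slice, NumberStyles.Float, CultureInfo.InvariantCulture, &value) then
            Some(value, i)
        else
            None

// Usage: parse the first element of a small array literal.
match tryReadNumber "[12.5e1,3]" 1 with
| Some(value, next) -> printfn "value=%g next=%d" value next
| None -> printfn "not a number"
```

The scan is deliberately permissive and leaves validation to Double.TryParse; a real tokenizer would likely enforce the JSON number grammar itself.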
Round 2: Advanced JSON Performance (Medium Impact) ⚡
Target: 30-50% total improvement over baseline
- Parser Architecture:
  - Implement single-pass parsing with minimal backtracking
  - Add SIMD acceleration for string operations where available
  - Optimize UTF-8 vs UTF-16 handling
- Serialization Optimizations:
  - Buffer writer patterns for JSON output
  - Streaming serialization for large objects
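One way to picture streaming serialization is to write escaped output straight to a TextWriter instead of assembling it in an intermediate buffer first. The sketch below is an illustration under assumptions, not the existing WriteTo implementation. The name writeJsonString is hypothetical, and TextWriter.Write(ReadOnlySpan<char>) requires .NET Core 2.1+ or .NET Standard 2.1.

```fsharp
open System
open System.IO

/// Hypothetical helper: writes `s` as a JSON string literal directly to `writer`.
/// Unescaped runs are written as span slices, so no intermediate
/// string or StringBuilder is allocated for them.
let writeJsonString (writer: TextWriter) (s: string) =
    writer.Write('"')
    let mutable runStart = 0
    for i in 0 .. s.Length - 1 do
        let c = s.[i]
        if c = '"' || c = '\\' || c < ' ' then
            // Flush the pending unescaped run before the escape sequence.
            writer.Write(s.AsSpan(runStart, i - runStart))
            match c with
            | '"' -> writer.Write("\\\"")
            | '\\' -> writer.Write("\\\\")
            | '\n' -> writer.Write("\\n")
            | '\r' -> writer.Write("\\r")
            | '\t' -> writer.Write("\\t")
            | _ -> writer.Write(sprintf "\\u%04x" (int c))
            runStart <- i + 1
    // Flush whatever follows the last escaped character.
    writer.Write(s.AsSpan(runStart, s.Length - runStart))
    writer.Write('"')

// Usage: any TextWriter works, including a file or network stream writer.
let output = new StringWriter()
writeJsonString output "a\"b\n"
printfn "%s" (output.ToString())
```

Because the caller supplies the writer, large documents can be emitted incrementally without holding the full output in memory.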
Round 3: Ecosystem-Wide Optimizations (Long Term) 🎯
Target: Comprehensive performance improvements
- CSV Performance: Optimize delimiter detection and field parsing
- HTML Performance: Improve CSS selector performance and DOM traversal
- Type Provider Efficiency: Cache schema inference results
- HTTP Performance: Connection pooling and request optimization
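Caching schema inference results could, in its simplest form, be a thread-safe memoization wrapper keyed by the sample input. The sketch below is generic and makes no claim about how the type providers store inferred schemas. Both memoize and inferSchema are hypothetical names, and inferSchema is only a stand-in for an expensive step.

```fsharp
open System
open System.Collections.Concurrent

/// Wraps `f` with a thread-safe cache keyed by its argument.
/// Under contention GetOrAdd may invoke `f` more than once for the same key,
/// so `f` should be free of side effects.
let memoize (f: 'k -> 'v) : 'k -> 'v =
    let cache = ConcurrentDictionary<'k, 'v>()
    fun key -> cache.GetOrAdd(key, Func<'k, 'v>(f))

// Stand-in for an expensive inference step (hypothetical, not FSharp.Data's API).
let inferSchema (sample: string) : string =
    sprintf "schema for %d chars" sample.Length

let cachedInferSchema = memoize inferSchema

// The second call with the same sample is served from the cache.
printfn "%s" (cachedInferSchema "{ \"a\": 1 }")
printfn "%s" (cachedInferSchema "{ \"a\": 1 }")
```

A real cache would also need an eviction policy and a cheaper key than the full sample text, for example a hash of the sample plus the inference options.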
Technical Implementation Strategy
Benchmarking Workflow 📈
- Baseline Measurement: Run existing benchmarks to establish baseline
- Incremental Testing: Validate each optimization with A/B comparisons
- Regression Detection: Ensure no performance regressions in unchanged code paths
- Memory Profiling: Use dotMemory/PerfView for allocation analysis
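For quick A/B checks between full benchmark runs, a small harness built on Stopwatch and GC.GetAllocatedBytesForCurrentThread can give a rough signal. This is a sketch for smoke-testing an idea and not a substitute for the BenchmarkDotNet suite. The two variants are illustrative workloads, not code from the library.

```fsharp
open System
open System.Diagnostics
open System.Text

/// Runs `action` `iterations` times; returns elapsed milliseconds and
/// bytes allocated on the current thread during the timed loop.
let measure (iterations: int) (action: unit -> unit) : float * int64 =
    action () // warm-up call so JIT compilation is not timed
    let allocatedBefore = GC.GetAllocatedBytesForCurrentThread()
    let stopwatch = Stopwatch.StartNew()
    for _ in 1 .. iterations do
        action ()
    stopwatch.Stop()
    let allocated = GC.GetAllocatedBytesForCurrentThread() - allocatedBefore
    stopwatch.Elapsed.TotalMilliseconds, allocated

// Variant A: repeated string concatenation (illustrative workload only).
let concatVariant () =
    let mutable s = ""
    for i in 1 .. 200 do
        s <- s + string i
    ignore s

// Variant B: the same output built with a StringBuilder.
let builderVariant () =
    let sb = StringBuilder()
    for i in 1 .. 200 do
        sb.Append(i) |> ignore
    sb.ToString() |> ignore

let msA, bytesA = measure 1000 concatVariant
let msB, bytesB = measure 1000 builderVariant
printfn "A: %.2f ms, %d bytes" msA bytesA
printfn "B: %.2f ms, %d bytes" msB bytesB
```

Timings from a harness like this are noisy. The allocation figures tend to be more stable, which makes them the more useful number for a first pass.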
Development Environment Setup ⚙️
# Standard build
./build.sh
# Run benchmarks
dotnet run --project build/build.fsproj -- -t RunBenchmarks
# Quick performance validation
cd tests/FSharp.Data.Benchmarks
./run-benchmarks.sh quick
# Development iteration
./run-benchmarks.sh simple # Simple + Nested JSON only
Performance Validation Process 🔬
- Micro-benchmarks: BenchmarkDotNet for specific operations
- Macro-benchmarks: Real-world JSON document processing
- Stress Testing: Large document handling (>10MB JSON files)
- Memory Analysis: Allocation patterns and GC pressure
Success Metrics 📊
Primary KPIs:
- JSON Parse Speed: 20-40% improvement in ops/second
- Memory Usage: 15-25% reduction in allocations
- Latency: Lower p95 parsing times for medium documents
Secondary Metrics:
- Build Time: No regression in compilation speed
- Test Suite: All existing tests continue to pass
- API Compatibility: Zero breaking changes to public API
Risk Assessment & Mitigation 🛡️
High Risk:
- Breaking Changes: Maintain backward compatibility
- Correctness: Extensive testing of edge cases
- Platform Dependencies: Keep .NET Standard compatibility
Mitigation Strategy:
- Feature Flags: Allow fallback to original implementations
- Extensive Testing: Leverage existing comprehensive test suite
- Incremental Rollout: Small, verifiable changes
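A fallback switch could be as simple as an AppContext switch that routes calls to the original implementation. The sketch below assumes a hypothetical switch name, FSharp.Data.UseLegacyJsonParser, which does not exist in the library today. The two functions passed in stand for the original and optimized code paths.

```fsharp
open System

/// Hypothetical switch name, used only for illustration.
let legacySwitch = "FSharp.Data.UseLegacyJsonParser"

/// Returns a function that calls `original` when the switch is enabled
/// and `optimized` otherwise. The switch is read on every call, so it
/// can be flipped at runtime.
let withFallback (original: 'a -> 'b) (optimized: 'a -> 'b) : 'a -> 'b =
    fun input ->
        match AppContext.TryGetSwitch(legacySwitch) with
        | true, true -> original input
        | _ -> optimized input

// Usage with stand-in implementations.
let parse = withFallback (fun (s: string) -> "legacy:" + s) (fun s -> "fast:" + s)
printfn "%s" (parse "{}")
AppContext.SetSwitch(legacySwitch, true)
printfn "%s" (parse "{}")
```

Reading the switch per call keeps the sketch simple; a hot path would more likely read it once at startup.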
Next Steps 🚀
- Environment Setup: Validate benchmarking infrastructure
- Baseline Establishment: Run full benchmark suite and document results
- Low-Hanging Fruit: Start with JSON StringBuilder optimizations
- Iterative Improvement: Implement, measure, and validate each optimization
Resources & Documentation 📚
- Benchmark Results: Store in /tests/FSharp.Data.Benchmarks/BenchmarkDotNet.Artifacts/
- Performance Guide: Document optimization techniques for contributors
- Profiling Data: Use dotTrace/dotMemory for detailed analysis
AI-generated content by Daily Perf Improver may contain mistakes.