System Design & Data Flow Documentation
This document explains the internal architecture of NAAb Pivot, how components interact, and the data flow through the evolution pipeline.
- High-Level Overview
- Component Architecture
- Data Flow
- Module Descriptions
- Template System
- Plugin Architecture
- Caching & Incremental Builds
- Governance Integration
- Performance Considerations
NAAb Pivot follows a pipeline architecture with four main stages:
┌──────────┐   ┌─────────────┐   ┌───────────┐   ┌────────────┐
│ Analyze  │ → │ Synthesize  │ → │ Validate  │ → │ Benchmark  │
└──────────┘   └─────────────┘   └───────────┘   └────────────┘
     ↓                ↓                ↓               ↓
 AST Parse     Code Generation   Parity Check    Performance
 Complexity    Template Render   Statistical     Metrics
 Detection     Compilation       Validation      Tracking
- Composability: Each stage can run independently
- Idempotency: Running a stage twice on the same input produces the same result
- Incremental: Only recompile when source changes
- Extensibility: Plugin system for custom analyzers/synthesizers
- Safety: Governance system enforces security policies
naab-pivot/
├── Core CLI (pivot.naab)
│   └── Orchestrates pipeline stages
│
├── Analysis Engine (analyze.naab)
│   ├── Language Detection
│   ├── AST Parsing
│   ├── Complexity Analysis
│   └── Target Recommendation
│
├── Synthesis Engine (synthesize.naab)
│   ├── Template Engine
│   ├── Code Generation
│   ├── Compilation Manager
│   └── Vessel Cache
│
├── Validation Engine (validate.naab)
│   ├── Test Case Generator
│   ├── Execution Harness
│   ├── Statistical Analysis
│   └── Parity Engine
│
├── Benchmark Engine (benchmark.naab)
│   ├── Performance Profiling
│   ├── Metrics Collection
│   ├── Regression Detection
│   └── Report Generation
│
└── Support Modules
    ├── Config Manager
    ├── Plugin Loader
    ├── Dependency Analyzer
    ├── Hotspot Detector
    └── Report Generator
- NAAb Language: Polyglot interpreter (C++)
- AST Parsers: Python (ast), Ruby (Ripper), JS (acorn)
- Compilers: Go (go build), Rust (rustc), C++ (g++/clang++)
- Template Engine: String interpolation with profiles
- Governance Engine: govern.json enforcement
Input: slow_code.py
                         ↓
┌─────────────────────────────────────────────────┐
│  1. ANALYZE STAGE                               │
│                                                 │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Detect Lang  │ → │ Parse AST      │          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Extract Func │ → │ Calc Complexity│          │
│  └──────────────┘   └────────────────┘          │
│                             ↓                   │
│                     ┌──────────────┐            │
│                     │ Recommend    │            │
│                     │ Target Lang  │            │
│                     └──────────────┘            │
│                             ↓                   │
│  Output: blueprint.json                         │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│  2. SYNTHESIZE STAGE                            │
│                                                 │
│  Input: blueprint.json                          │
│         ↓                                       │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Load Profile │ → │ Select Template│          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Render Code  │ → │ Check Cache    │          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Compile (if  │ → │ Generate Binary│          │
│  │ needed)      │   │ (vessel)       │          │
│  └──────────────┘   └────────────────┘          │
│                             ↓                   │
│  Output: vessel binary + metadata               │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│  3. VALIDATE STAGE                              │
│                                                 │
│  Input: legacy code + vessel binary             │
│         ↓                                       │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Generate     │ → │ Run Legacy     │          │
│  │ Test Cases   │   │ Implementation │          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Run Vessel   │ → │ Compare Results│          │
│  │Implementation│   │                │          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Statistical  │ → │ Certify Parity │          │
│  │ Analysis     │   │ (99.99% conf)  │          │
│  └──────────────┘   └────────────────┘          │
│                             ↓                   │
│  Output: certification + statistics             │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│  4. BENCHMARK STAGE                             │
│                                                 │
│  Input: certified vessels                       │
│         ↓                                       │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Run          │ → │ Collect        │          │
│  │ Iterations   │   │ Timings        │          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Calculate    │ → │ Detect         │          │
│  │ Statistics   │   │ Regressions    │          │
│  └──────────────┘   └────────────────┘          │
│         ↓                   ↓                   │
│  ┌──────────────┐   ┌────────────────┐          │
│  │ Generate     │ → │ Export         │          │
│  │ Reports      │   │ (JSON/HTML/CSV)│          │
│  └──────────────┘   └────────────────┘          │
│                             ↓                   │
│  Output: benchmark report + historical data     │
└─────────────────────────────────────────────────┘
                         ↓
Final Output: Optimized vessel + certification + benchmarks
{
"status": "ANALYZED",
"source": "PYTHON",
"file_path": "/path/to/slow_code.py",
"timestamp": 1234567890,
"functions": [
{
"name": "heavy_computation",
"line_start": 10,
"line_count": 25,
"complexity": 12,
"has_loops": true,
"has_recursion": false,
"target": "GO",
"reason": "High loop complexity, good for goroutines",
"dependencies": ["math", "time"],
"hotspot_score": 85.3
}
]
}
{
"status": "SYNTHESIZED",
"profile": "balanced",
"vessels": [
{
"name": "heavy_computation",
"target": "GO",
"src": "vessels/heavy_computation_GO.go",
"bin": "vessels/heavy_computation_vessel",
"status": "COMPILED",
"compile_time_ms": 1234,
"binary_size_bytes": 1847296,
"checksum": "sha256:abc123..."
}
],
"cache_hits": 0,
"cache_misses": 1
}
{
"certified": true,
"confidence": 99.99,
"test_count": 1000,
"passed": 1000,
"failed": 0,
"performance": {
"legacy_ms": 2843.52,
"vessel_ms": 812.34,
"speedup": 3.50,
"latency_reduction": 71.43
},
"statistics": {
"mean_error": 0.000012,
"median_error": 0.000008,
"stddev": 0.000005,
"max_error": 0.000045,
"ks_statistic": 0.032,
"ks_p_value": 0.876
},
"timestamp": 1234567890
}
{
"benchmark_id": "bench_1234567890",
"vessel": "heavy_computation_vessel",
"iterations": 100,
"timings_ms": [812, 815, 810, ...],
"statistics": {
"mean": 812.34,
"median": 812.00,
"min": 810.00,
"max": 815.00,
"stddev": 1.42,
"p95": 814.50,
"p99": 815.00
},
"baseline": {
"mean": 2843.52,
"regression_threshold": 10.0,
"regression_detected": false
},
"environment": {
"os": "linux",
"arch": "x86_64",
"cpu": "Intel i7-9700K",
"cores": 8,
"ram_gb": 32
}
}
Purpose (Analysis Engine): Detect optimization opportunities
Workflow:
- Detect source language via file extension
- Parse source code into AST (language-specific)
- Extract function definitions
- Calculate cyclomatic complexity
- Recommend target language based on heuristics
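The first workflow step, detecting the language from the file extension, can be sketched as a simple lookup. This is a minimal Python illustration, not NAAb Pivot's implementation: `detect_language` and the `EXTENSION_TO_LANGUAGE` table are hypothetical names, and the `RUBY` and `JAVASCRIPT` tags are assumed by analogy with the `PYTHON` tag used in the blueprint.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping; the real analyzer may recognize more extensions.
EXTENSION_TO_LANGUAGE = {
    ".py": "PYTHON",
    ".rb": "RUBY",
    ".js": "JAVASCRIPT",
}


def detect_language(file_path: str) -> Optional[str]:
    """Return a language tag for the path, or None for unknown extensions."""
    suffix = Path(file_path).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)
```

Returning `None` for an unknown extension lets the caller decide whether to skip the file or report an error.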
Language-Specific Parsing:
# Python AST
<<python[source]
import ast
tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        # Extract function metadata
        print(node.name, node.lineno)
>>
# Ruby Ripper
<<ruby[source]
require 'ripper'
sexp = Ripper.sexp(source)
# Parse S-expressions
>>
# JavaScript Acorn
<<javascript[source]
const acorn = require('acorn');
const ast = acorn.parse(source, { ecmaVersion: 2020 });
// Walk AST
>>
Complexity Metrics:
- Loop count (for/while)
- Conditional branches (if/else)
- Recursion depth
- Line count
- Function call density
Target Language Heuristics:
- Go: High loop count, concurrency opportunities
- C++: Math-heavy, SIMD potential, template metaprogramming
- Rust: Safety-critical, crypto, ownership complexity
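These heuristics amount to a small decision function over the collected metrics. The sketch below is illustrative only: the metric keys (`safety_critical`, `math_call_count`, `loop_count`), the thresholds, and the rule order are all assumptions chosen for the example, not values taken from NAAb Pivot.

```python
def recommend_target(metrics: dict) -> tuple:
    """Map simple metrics to a (target, reason) pair; target may be None."""
    # All thresholds below are illustrative assumptions.
    if metrics.get("safety_critical", False):
        return "RUST", "Safety-critical code benefits from ownership checks"
    if metrics.get("math_call_count", 0) >= 5:
        return "CPP", "Math-heavy code with SIMD potential"
    if metrics.get("loop_count", 0) >= 2:
        return "GO", "High loop count, candidate for goroutines"
    return None, "No clear optimization target"
```

Returning the reason alongside the target matches the `target` and `reason` fields of the blueprint.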
Purpose (Synthesis Engine): Generate optimized vessel code
Workflow:
- Load optimization profile (ultra-safe → aggressive)
- Select appropriate template for target language
- Render code with variable substitution
- Check vessel cache (SHA-256 hash)
- Compile if needed (incremental)
- Store vessel binary + metadata
Template Rendering:
let template = file.read("templates/go_template.naab")
let code = template
code = string.replace(code, "${FUNCTION_NAME}", func_spec["name"])
code = string.replace(code, "${ITERATIONS}", "" + func_spec["complexity"])
code = string.replace(code, "${OPTIMIZATION_FLAGS}", profile["flags"])
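For comparison, the same `${NAME}` placeholder style is supported by Python's standard `string.Template`, which makes the substitution step easy to prototype outside NAAb. This is a sketch under assumptions: `GO_FUNCTION` is a shortened stand-in for the real template file, and `render` is a hypothetical helper.

```python
from string import Template

# Shortened stand-in for templates/go_template.naab (assumption).
GO_FUNCTION = Template(
    "func ${FUNCTION_NAME}(${FUNCTION_ARGS}) ${RETURN_TYPE} {\n"
    "\t${FUNCTION_BODY}\n"
    "}\n"
)


def render(template: Template, values: dict) -> str:
    """Fill every placeholder; raise KeyError if one has no value."""
    return template.substitute(values)


code = render(GO_FUNCTION, {
    "FUNCTION_NAME": "heavy_computation",
    "FUNCTION_ARGS": "n int",
    "RETURN_TYPE": "float64",
    "FUNCTION_BODY": "return float64(n) * 2.0",
})
```

One difference from chained string replacement is that `substitute` fails when a placeholder has no value, so an incomplete function spec cannot silently leave `${...}` in the generated source.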
Compilation:
if target == "GO" {
<<bash[src, bin, flags]
go build -o "$bin" -ldflags="$flags" "$src" 2>&1
>>
} else if target == "RUST" {
<<bash[src, bin, profile]
rustc -C opt-level=$profile -o "$bin" "$src" 2>&1
>>
} else if target == "CPP" {
<<bash[src, bin, flags]
g++ -O3 -march=native $flags -o "$bin" "$src" 2>&1
>>
}
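The dispatch above can also be expressed as a function that builds the compiler argument list, which keeps the command construction testable without invoking a compiler. In this Python sketch, `build_command` and `compile_vessel` are hypothetical names, and the separate `opt_level` parameter is an assumption standing in for the value the rustc branch interpolates from the profile.

```python
import shlex
import subprocess


def build_command(target, src, bin_path, flags="", opt_level="3"):
    """Return the compiler argv for a target, mirroring the commands above."""
    if target == "GO":
        return ["go", "build", "-o", bin_path, f"-ldflags={flags}", src]
    if target == "RUST":
        return ["rustc", "-C", f"opt-level={opt_level}", "-o", bin_path, src]
    if target == "CPP":
        # Extra flags are split shell-style so "-flto -DNDEBUG" becomes two args.
        return ["g++", "-O3", "-march=native", *shlex.split(flags),
                "-o", bin_path, src]
    raise ValueError(f"unsupported target: {target}")


def compile_vessel(target, src, bin_path, flags="", opt_level="3"):
    """Run the compiler; return (success, combined stdout and stderr)."""
    cmd = build_command(target, src, bin_path, flags, opt_level)
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode == 0, proc.stdout
```

Passing an argument list instead of a shell string avoids quoting problems when paths contain spaces, and merging stderr into stdout corresponds to the `2>&1` redirection in the shell versions.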
Purpose (Validation Engine): Certify parity between the legacy and vessel implementations through statistical testing
Workflow:
- Generate test cases (random, edge cases, regression)
- Execute legacy implementation N times
- Execute vessel implementation N times
- Compare results statistically
- Calculate confidence interval
- Issue certification or reject
Statistical Tests:
- Mean Absolute Error (MAE): Average deviation
- Relative Error: Percentage difference
- Kolmogorov-Smirnov Test: Distribution similarity
- Confidence Level: Certification requires 99.99% confidence
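The first three tests can be computed with a few lines of standard-library Python. The sketch below is one plausible reading of the list, not the validator's actual code: the function names are hypothetical, the relative error is taken as the maximum over all outputs, and only the Kolmogorov-Smirnov statistic is computed (a p-value would need a library routine such as SciPy's `ks_2samp`).

```python
from bisect import bisect_right


def mean_absolute_error(legacy, vessel):
    """Average absolute deviation between paired outputs."""
    if not legacy or len(legacy) != len(vessel):
        raise ValueError("need two non-empty samples of equal length")
    return sum(abs(a - b) for a, b in zip(legacy, vessel)) / len(legacy)


def max_relative_error(legacy, vessel, eps=1e-12):
    """Largest deviation relative to the legacy value (eps avoids 0-division)."""
    return max(abs(a - b) / max(abs(a), eps) for a, b in zip(legacy, vessel))


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    # The gap between two step functions can only change at a sample point.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / len(xs)
        cdf_y = bisect_right(ys, value) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap
```

A KS statistic near 0 means the two output distributions are nearly indistinguishable, which is the situation the sample `ks_statistic` of 0.032 describes.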
Test Case Generation:
fn generate_test_cases(func_spec) {
let cases = []
// Random cases
for i in 0..100 {
cases.push(random_input(func_spec["signature"]))
}
// Edge cases
cases.push(0)
cases.push(1)
cases.push(-1)
cases.push(max_int)
cases.push(min_int)
// Regression cases (from previous runs)
let regression = load_regression_cases(func_spec["name"])
cases = array.concat(cases, regression)
return cases
}
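The validation output also reports `speedup` and `latency_reduction` next to the raw timings. The sample values in the validation result are consistent with the usual definitions sketched below; whether NAAb Pivot computes them exactly this way is an assumption, and `performance_summary` is a hypothetical helper name.

```python
def performance_summary(legacy_ms: float, vessel_ms: float) -> dict:
    """Derive speedup and latency reduction from two mean timings."""
    speedup = legacy_ms / vessel_ms
    # Share of the legacy latency that the vessel removes, in percent.
    latency_reduction = (1.0 - vessel_ms / legacy_ms) * 100.0
    return {
        "legacy_ms": legacy_ms,
        "vessel_ms": vessel_ms,
        "speedup": round(speedup, 2),
        "latency_reduction": round(latency_reduction, 2),
    }


summary = performance_summary(2843.52, 812.34)
```

With the sample timings, this yields a speedup of 3.5 and a latency reduction of 71.43, matching the example output.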
Purpose (Benchmark Engine): Track performance over time
Workflow:
- Load benchmark specifications
- Run N iterations (configurable)
- Collect timing data
- Calculate statistics (mean, median, p95, p99)
- Compare to baseline (regression detection)
- Generate reports (JSON, HTML, CSV, SARIF)
Benchmark Execution:
fn run_single_benchmark(bench_spec) {
    let iterations = bench_spec["iterations"] || 100
    let timings = []
    // Warmup runs (untimed), kept separate so that all `iterations` runs are measured
    for i in 0..10 {
        run_task(bench_spec["task"])
    }
    // Timed execution
    for i in 0..iterations {
        let start = time.now()
        run_task(bench_spec["task"])
        let duration = time.now() - start
        timings.push(duration)
    }
    return compute_statistics(timings)
}
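The `compute_statistics` step and the baseline comparison are not shown above. A minimal Python sketch follows, assuming linear-interpolation percentiles and assuming that `regression_threshold` is a percentage of the baseline mean; both choices are illustrative rather than confirmed behavior.

```python
import statistics


def percentile(sorted_values, pct):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def compute_statistics(timings_ms):
    """Summarize timings with the fields used in the benchmark report."""
    ordered = sorted(timings_ms)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "stddev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


def regression_detected(current_mean, baseline_mean, threshold_pct=10.0):
    """True if the current mean is slower than baseline by more than the threshold."""
    return current_mean > baseline_mean * (1.0 + threshold_pct / 100.0)
```

Under this reading, the sample report's mean of 812.34 ms against a baseline of 2843.52 ms is well below the regression limit, so `regression_detected` is false there as well.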
templates/
├── go_template.naab # Go codegen
├── cpp_template.naab # C++ codegen
├── rust_template.naab # Rust codegen
└── [other languages]
// Available variables in templates:
${FUNCTION_NAME} // Original function name
${FUNCTION_ARGS} // Argument list
${FUNCTION_BODY} // Translated function body
${ITERATIONS} // Complexity-based iteration count
${OPTIMIZATION_FLAGS} // Profile-specific flags
${IMPORTS} // Required imports/includes
${NAMESPACE} // Package/namespace
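${TYPE_ANNOTATIONS} // Type hints
${RETURN_TYPE} // Return type of the generated function
${ARG_PARSING} // Code that parses command-line arguments
${CALL_ARGS} // Arguments passed to the function call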
package main
import (
"fmt"
"math"
"os"
"strconv"
"time"
)
func ${FUNCTION_NAME}(${FUNCTION_ARGS}) ${RETURN_TYPE} {
${FUNCTION_BODY}
}
func main() {
if len(os.Args) < 2 {
fmt.Println("READY")
return
}
// Parse arguments
${ARG_PARSING}
// Execute function
start := time.Now()
result := ${FUNCTION_NAME}(${CALL_ARGS})
elapsed := time.Since(start)
// Output results
fmt.Printf("Result: %v\n", result)
fmt.Printf("Time: %.2fms\n", float64(elapsed.Microseconds())/1000.0)
}
- Analyzers: Custom language support, specialized detection
- Synthesizers: Custom target languages, specialized code generation
- Validators: Custom testing strategies (fuzz, property-based)
// Plugin metadata (plugin_name.json)
{
"id": "ml-detector",
"type": "analyzer",
"version": "1.0.0",
"entry_point": "execute",
"supported_languages": ["python", "julia"]
}
// Plugin implementation (plugin_name.naab)
export fn execute(input_data) {
// Plugin logic here
return {
"status": "DETECTED",
"confidence": 0.95,
"metadata": {...}
}
}
// Load plugin
use plugin_loader
let plugin_id = plugin_loader.register_plugin("plugins/analyzers/ml_detector.naab", "analyzer")
// Execute plugin
let result = plugin_loader.execute_plugin(plugin_id, {
"source": source_code,
"language": "python"
})
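Before a plugin is registered, its metadata file can be checked for the fields shown above. The Python sketch below is an assumption about what such a check might look like: which fields are mandatory is a guess based on the example metadata, and `validate_plugin_metadata` is a hypothetical name rather than part of `plugin_loader`.

```python
import json

# Assumed to be mandatory, based on the example metadata above.
REQUIRED_FIELDS = ("id", "type", "version", "entry_point")
# The three plugin kinds named in this section.
KNOWN_TYPES = ("analyzer", "synthesizer", "validator")


def validate_plugin_metadata(raw_json: str) -> list:
    """Return a list of problems; an empty list means the metadata looks usable."""
    meta = json.loads(raw_json)
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in meta:
            errors.append(f"missing field: {field}")
    if "type" in meta and meta["type"] not in KNOWN_TYPES:
        errors.append(f"unknown plugin type: {meta['type']}")
    return errors
```

Collecting all problems instead of stopping at the first one gives plugin authors a complete report in a single pass.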
Purpose: Avoid recompilation when source hasn't changed
Cache Key: SHA-256 hash of:
- Generated source code
- Optimization profile
- Compiler version
- Target architecture
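A cache key over these four inputs can be derived with Python's `hashlib`, as sketched below. The function name `cache_key` and the length-prefixed encoding are assumptions made for the example; the document only states that the key is a SHA-256 hash of the listed inputs.

```python
import hashlib


def cache_key(generated_source: str, profile: str,
              compiler_version: str, target_arch: str) -> str:
    """Return a hex SHA-256 digest covering all four cache inputs."""
    digest = hashlib.sha256()
    for part in (generated_source, profile, compiler_version, target_arch):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Plain concatenation of the fields would let different input combinations collide on the same string, which is why each field is prefixed with its length here.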
Cache Structure:
.cache/
├── vessels/
│   ├── abc123def456.go      # Generated source
│   ├── abc123def456.bin     # Compiled binary
│   └── abc123def456.meta    # Metadata
└── index.json               # Cache index
Cache Lookup:
fn should_rebuild(src_path, bin_path, new_code, profile, compiler_version) {
    // Rebuild if the generated source or the binary is missing
    if file.exists(src_path) == false { return true }
    if file.exists(bin_path) == false { return true }
    // Rebuild if the generated code has changed
    let old_code = file.read(src_path)
    if old_code != new_code { return true }
    // Check cache index (profile and compiler_version are passed in explicitly)
    let hash = compute_hash(new_code + profile + compiler_version)
    if cache_has(hash) {
        io.write(" ✓ Using cached vessel\n")
        return false
    }
    return true
}
NAAb Pivot enforces govern.json policies:
{
"languages": {
"allowed": ["python", "cpp", "rust", "go", "bash"]
},
"capabilities": {
"network": {"enabled": false},
"filesystem": {"mode": "read"}
},
"code_quality": {
"no_secrets": {"level": "hard"},
"no_placeholders": {"level": "soft"}
}
}
- Analyzer: Checks if source language is allowed
- Synthesizer: Validates generated code against policies
- Validator: Ensures test execution within limits
- Benchmarker: Restricts resource usage
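The analyzer's language check can be illustrated against the policy structure shown above. This Python sketch assumes the policy has already been parsed from govern.json into a dictionary; `language_allowed` and `network_enabled` are hypothetical helper names, and the case-insensitive comparison is an assumption made because the blueprint uses upper-case tags while the policy lists lower-case names.

```python
def language_allowed(policy: dict, language: str) -> bool:
    """Check a language tag against policy["languages"]["allowed"]."""
    allowed = policy.get("languages", {}).get("allowed", [])
    return language.lower() in {name.lower() for name in allowed}


def network_enabled(policy: dict) -> bool:
    """Network access is treated as disabled unless the policy enables it."""
    network = policy.get("capabilities", {}).get("network", {})
    return bool(network.get("enabled", False))


policy = {
    "languages": {"allowed": ["python", "cpp", "rust", "go", "bash"]},
    "capabilities": {
        "network": {"enabled": False},
        "filesystem": {"mode": "read"},
    },
}
```

Defaulting to "not allowed" and "not enabled" when a key is missing keeps the check conservative if a policy file is incomplete.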
- Parallel Compilation: Compile multiple vessels concurrently
- Incremental Builds: Cache unchanged vessels
- Lazy Evaluation: Only compute when needed
- Memory Pooling: Reuse allocated memory
- Profile-Guided Optimization: Use profiling data to guide optimization
| Bottleneck | Solution |
|---|---|
| AST parsing | Cache parsed AST, use faster parsers (tree-sitter) |
| Compilation | Parallel compilation, ccache/sccache |
| Validation | Reduce test iterations, parallel execution |
| Benchmarking | Statistical sampling instead of exhaustive |
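Parallel compilation, listed as the remedy for the compilation bottleneck, can be sketched with a thread pool. Threads are sufficient in this sketch because the heavy work would happen in external compiler processes. The names `compile_all` and `compile_one` are hypothetical, and the caller supplies the function that actually builds a single vessel.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_all(vessels, compile_one, max_workers=4):
    """Compile vessels concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `vessels`.
        return list(pool.map(compile_one, vessels))
```

For example, `compile_all(vessel_specs, build_one)` returns one result per vessel spec, where `build_one` is whatever function invokes the compiler for a single vessel.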
- Distributed Compilation: Compile vessels on remote servers
- ML-Based Optimization: Train models to predict best target language
- GPU Code Generation: CUDA/OpenCL backend
- WebAssembly Support: Generate WASM modules
- Cloud Integration: AWS Lambda, Google Cloud Functions