Examples and Use Cases

This document provides practical examples and use cases for the Provability-Fabric framework, demonstrating how to implement various AI agent scenarios with formal verification.

Basic Examples

1. Simple Text Generation Agent

A basic example of creating a text generation agent with length constraints.

Specification (spec.yaml):

name: text-generator
version: 1.0.0
description: Simple text generation agent with length constraints

capabilities:
  - name: text_generation
    description: Generate text responses
    constraints:
      max_length: 1000
      content_filter: true
      no_harmful_content: true

proofs:
  - name: length_constraint
    type: lean
    file: proofs/length_constraint.lean
  - name: content_safety
    type: lean
    file: proofs/content_safety.lean

Lean Proof (proofs/length_constraint.lean):

import Mathlib.Data.String.Basic

def max_response_length : Nat := 1000

theorem response_length_bound (response : String) : 
  response.length ≤ max_response_length := by
  -- Proof that response length is bounded
  sorry

theorem content_filter_applied (input : String) (output : String) :
  content_filter input = output → 
  output.length ≤ max_response_length := by
  -- Proof that content filter respects length constraint
  sorry

Deployment:

# Initialize agent
pf init text-generator
cd text-generator

# Build proofs
lake build

# Verify specification
pf verify

# Deploy
pf deploy --environment production

2. Image Classification Agent

An image classification agent with accuracy guarantees.

Specification (spec.yaml):

name: image-classifier
version: 1.0.0
description: Image classification with accuracy guarantees

capabilities:
  - name: image_classification
    description: Classify images into predefined categories
    constraints:
      min_accuracy: 0.95
      max_latency_ms: 100
      supported_formats: ["jpg", "png", "webp"]
      max_image_size_mb: 10

proofs:
  - name: accuracy_guarantee
    type: marabou
    file: proofs/accuracy_verification.py
  - name: latency_bound
    type: lean
    file: proofs/latency_bound.lean

α-β-CROWN Verification (proofs/accuracy_verification.py):

import subprocess
import json
from pathlib import Path

def verify_accuracy_with_alpha_beta_crown():
    """Verify accuracy using α-β-CROWN adapter."""
    # Run α-β-CROWN verification
    result = subprocess.run([
        "python", "adapter.py",
        "models/classifier.pt",
        "properties/accuracy.json",
        "--out", "witness.json",
        "--gpu",
        "--timeout", "600"
    ], capture_output=True, text=True)
    
    if result.returncode == 0:
        with open("witness.json", "r") as f:
            witness = json.load(f)
        return witness["verification_result"] == "verified"
    return False

# Alternative Marabou Verification (`proofs/accuracy_verification_marabou.py`):
```python
import marabou
import numpy as np

def verify_accuracy_guarantee(model_path, test_data):
    """Verify that model meets accuracy requirements"""
    model = marabou.read_onnx(model_path)
    
    # Set up verification problem
    input_vars = model.inputVars[0]
    output_vars = model.outputVars[0]
    
    # Add accuracy constraints
    for i, (input_data, expected_label) in enumerate(test_data):
        # Set input constraints
        for j, val in enumerate(input_data):
            model.setLowerBound(input_vars[j], val)
            model.setUpperBound(input_vars[j], val)
        
        # Set output constraints for accuracy
        correct_output = output_vars[expected_label]
        for j, output_var in enumerate(output_vars):
            if j != expected_label:
                model.addConstraint(correct_output >= output_var)
    
    # Solve verification problem
    status, vals = model.solve()
    
    if status == "sat":
        print("Accuracy guarantee verification FAILED")
        return False
    else:
        print("Accuracy guarantee verification PASSED")
        return True

if __name__ == "__main__":
    # Load test data and verify
    test_data = load_test_data()
    verify_accuracy_guarantee("model.onnx", test_data)

Advanced Examples

3. Multi-Modal Agent with Safety Guarantees

A complex agent that handles text, images, and audio with comprehensive safety constraints.

Specification (spec.yaml):

name: multimodal-agent
version: 2.0.0
description: Multi-modal AI agent with safety guarantees

capabilities:
  - name: text_processing
    description: Process and generate text
    constraints:
      max_length: 2000
      content_filter: true
      no_pii: true
  
  - name: image_processing
    description: Process and generate images
    constraints:
      max_resolution: "2048x2048"
      no_nsfw: true
      watermark_required: true
  
  - name: audio_processing
    description: Process and generate audio
    constraints:
      max_duration_seconds: 300
      no_copyright: true
      quality_threshold: 0.8

safety_constraints:
  - name: no_harmful_content
    description: Prevent generation of harmful content
    type: global
    enforcement: runtime
  
  - name: privacy_protection
    description: Protect user privacy
    type: global
    enforcement: runtime

proofs:
  - name: safety_verification
    type: lean
    file: proofs/safety_verification.lean
  - name: privacy_guarantees
    type: lean
    file: proofs/privacy_guarantees.lean
  - name: content_filter_verification
    type: marabou
    file: proofs/content_filter.py

Safety Verification (proofs/safety_verification.lean):

import Mathlib.Data.String.Basic
import Mathlib.Data.List.Basic

-- Define safety predicates
def is_safe_content (content : String) : Prop :=
  ¬contains_harmful_terms content ∧
  ¬contains_pii content ∧
  content.length ≤ 2000

def is_safe_image (image : Image) : Prop :=
  image.resolution.width ≤ 2048 ∧
  image.resolution.height ≤ 2048 ∧
  ¬contains_nsfw_content image

def is_safe_audio (audio : Audio) : Prop :=
  audio.duration ≤ 300 ∧
  ¬contains_copyrighted_content audio

-- Main safety theorem
theorem multimodal_safety_guarantee 
  (text : String) (image : Image) (audio : Audio) :
  is_safe_content text ∧
  is_safe_image image ∧
  is_safe_audio audio →
  is_safe_multimodal_output text image audio := by
  -- Proof that safe inputs produce safe outputs
  sorry

-- Runtime enforcement theorem
theorem runtime_safety_enforcement
  (input : MultimodalInput) :
  safety_check input = true →
  is_safe_input input := by
  -- Proof that runtime checks ensure safety
  sorry

4. Financial Trading Agent

A financial trading agent with risk management and compliance guarantees.

Specification (spec.yaml):

name: trading-agent
version: 1.0.0
description: Financial trading agent with risk management

capabilities:
  - name: market_analysis
    description: Analyze market conditions
    constraints:
      data_sources: ["approved_exchanges"]
      update_frequency_minutes: 5
  
  - name: trade_execution
    description: Execute trades
    constraints:
      max_position_size_usd: 100000
      max_daily_loss_usd: 5000
      trading_hours: "09:30-16:00"
      approved_securities: ["stocks", "etfs"]

risk_management:
  - name: position_limits
    description: Enforce position size limits
    type: runtime
    enforcement: strict
  
  - name: loss_limits
    description: Enforce daily loss limits
    type: runtime
    enforcement: strict
  
  - name: compliance_checks
    description: Ensure regulatory compliance
    type: runtime
    enforcement: strict

proofs:
  - name: risk_management_verification
    type: lean
    file: proofs/risk_management.lean
  - name: compliance_verification
    type: lean
    file: proofs/compliance.lean

Risk Management Verification (proofs/risk_management.lean):

import Mathlib.Data.Real.Basic
import Mathlib.Data.Fin.Basic

-- Define financial constraints
def max_position_size : ℝ := 100000
def max_daily_loss : ℝ := 5000

-- Position tracking
structure Position where
  security : String
  size : ℝ
  entry_price : ℝ
  current_price : ℝ

-- Risk calculations
def position_value (pos : Position) : ℝ :=
  pos.size * pos.current_price

def position_pnl (pos : Position) : ℝ :=
  pos.size * (pos.current_price - pos.entry_price)

def total_portfolio_value (positions : List Position) : ℝ :=
  positions.foldl (fun acc pos => acc + position_value pos) 0

def total_portfolio_pnl (positions : List Position) : ℝ :=
  positions.foldl (fun acc pos => acc + position_pnl pos) 0

-- Risk management theorems
theorem position_size_limit
  (positions : List Position) :
  (∀ pos ∈ positions, position_value pos ≤ max_position_size) →
  total_portfolio_value positions ≤ max_position_size * positions.length := by
  -- Proof that individual position limits ensure portfolio limits
  sorry

theorem daily_loss_limit
  (positions : List Position) :
  total_portfolio_pnl positions ≥ -max_daily_loss := by
  -- Proof that daily loss limits are enforced
  sorry

-- Runtime enforcement
theorem runtime_risk_enforcement
  (new_position : Position) (existing_positions : List Position) :
  position_value new_position ≤ max_position_size →
  total_portfolio_pnl (new_position :: existing_positions) ≥ -max_daily_loss →
  is_safe_trade new_position existing_positions := by
  -- Proof that runtime checks ensure safe trades
  sorry

Integration Examples

5. Kubernetes Integration

Deploying and managing agents in Kubernetes with automatic sidecar injection.

Agent Deployment (deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: text-generator
  labels:
    app: text-generator
    provability-fabric.io/agent: "true"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: text-generator
  template:
    metadata:
      labels:
        app: text-generator
        provability-fabric.io/agent: "true"
      annotations:
        provability-fabric.io/sidecar: "true"
        provability-fabric.io/proof-bundle: "text-generator-v1.0.0"
    spec:
      containers:
      - name: text-generator
        image: text-generator:1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        env:
        - name: MAX_LENGTH
          value: "1000"
        - name: CONTENT_FILTER
          value: "true"
      
      # Sidecar is automatically injected by admission controller
      initContainers:
      - name: proof-verification
        image: provability-fabric/verifier:latest
        command: ["/bin/sh", "-c"]
        args:
        - |
          pf verify --bundle text-generator-v1.0.0
          if [ $? -eq 0 ]; then
            echo "Proof verification passed"
            exit 0
          else
            echo "Proof verification failed"
            exit 1
          fi

Service Configuration (service.yaml):

apiVersion: v1
kind: Service
metadata:
  name: text-generator-service
  labels:
    app: text-generator
spec:
  selector:
    app: text-generator
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

Ingress Configuration (ingress.yaml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: text-generator-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    provability-fabric.io/agent: "true"
spec:
  rules:
  - host: text-generator.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: text-generator-service
            port:
              number: 80

6. CI/CD Pipeline Integration

Automated testing and deployment with proof verification.

GitHub Actions Workflow (.github/workflows/deploy.yml):

name: Deploy Agent

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Lean
      uses: leanprover/lean4-action@v1
      with:
        lean-version: 'leanprover/lean4:nightly'
    
    - name: Build and verify proofs
      run: |
        lake build
        pf verify
    
    - name: Run tests
      run: |
        make test
        make test-integration
    
    - name: Run security scan
      run: |
        pf security-scan
    
    - name: Build agent image
      run: |
        docker build -t text-generator:${{ github.sha }} .
        docker tag text-generator:${{ github.sha }} text-generator:latest
    
    - name: Push to registry
      run: |
        echo ${{ secrets.REGISTRY_PASSWORD }} | docker login -u ${{ secrets.REGISTRY_USERNAME }} --password-stdin
        docker push text-generator:${{ github.sha }}
        docker push text-generator:latest

  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
    - uses: actions/checkout@v3
    
    - name: Deploy to staging
      run: |
        kubectl config use-context staging
        kubectl apply -f k8s/
        kubectl rollout status deployment/text-generator
    
    - name: Run staging tests
      run: |
        pf test --environment staging
    
    - name: Deploy to production
      run: |
        kubectl config use-context production
        kubectl apply -f k8s/
        kubectl rollout status deployment/text-generator
    
    - name: Verify production deployment
      run: |
        pf verify --environment production

Testing Examples

7. Property-Based Testing

Using Lean for property-based testing of agent behavior.

Test Specification (tests/agent_properties.lean):

import Mathlib.Data.String.Basic
import Mathlib.Data.List.Basic

-- Define agent behavior properties
def agent_response_property (input : String) (output : String) : Prop :=
  output.length ≤ 1000 ∧
  ¬contains_harmful_content output ∧
  ¬contains_pii output

-- Property-based test generator
def generate_test_inputs : List String :=
  ["Hello", "Test input", "Long input " * 100, "Input with PII: 123-45-6789"]

-- Test all properties
theorem test_agent_properties :
  ∀ input ∈ generate_test_inputs,
  let output := agent_process input
  agent_response_property input output := by
  -- Proof that all test inputs produce valid outputs
  sorry

-- Fuzz testing with random inputs
def random_input_generator : IO String := do
  let length := (← IO.rand 1 1000).toNat
  let chars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  let mut result := ""
  for i in [:length] do
    let idx := (← IO.rand 0 (chars.length - 1)).toNat
    result := result ++ chars[idx].toString
  return result

-- Property verification for random inputs
theorem random_input_property_verification :
  ∀ input : String,
  input.length ≤ 1000 →
  let output := agent_process input
  agent_response_property input output := by
  -- Proof that random inputs within bounds produce valid outputs
  sorry

8. Performance Testing

Testing agent performance under various load conditions.

Performance Test Script (tests/performance_test.py):

import asyncio
import time
import statistics
from typing import List, Dict
import aiohttp
import json

class PerformanceTester:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.headers = {"Authorization": f"Bearer {api_key}"}
    
    async def test_latency(self, num_requests: int) -> Dict[str, float]:
        """Test agent latency under load"""
        latencies = []
        
        async with aiohttp.ClientSession() as session:
            tasks = []
            for i in range(num_requests):
                task = self._make_request(session, i)
                tasks.append(task)
            
            start_time = time.time()
            results = await asyncio.gather(*tasks, return_exceptions=True)
            total_time = time.time() - start_time
            
            # Calculate latencies
            for result in results:
                if isinstance(result, dict) and 'latency' in result:
                    latencies.append(result['latency'])
            
            return {
                'total_requests': num_requests,
                'successful_requests': len(latencies),
                'total_time': total_time,
                'requests_per_second': num_requests / total_time,
                'mean_latency': statistics.mean(latencies),
                'median_latency': statistics.median(latencies),
                'p95_latency': statistics.quantiles(latencies, n=20)[18],
                'p99_latency': statistics.quantiles(latencies, n=100)[98],
                'min_latency': min(latencies),
                'max_latency': max(latencies)
            }
    
    async def test_throughput(self, duration_seconds: int) -> Dict[str, float]:
        """Test agent throughput over time"""
        start_time = time.time()
        request_count = 0
        latencies = []
        
        async with aiohttp.ClientSession() as session:
            while time.time() - start_time < duration_seconds:
                start = time.time()
                try:
                    async with session.post(
                        f"{self.base_url}/generate",
                        headers=self.headers,
                        json={"prompt": "Test prompt", "max_length": 100}
                    ) as response:
                        if response.status == 200:
                            request_count += 1
                            latency = time.time() - start
                            latencies.append(latency)
                except Exception as e:
                    print(f"Request failed: {e}")
                
                # Small delay to prevent overwhelming
                await asyncio.sleep(0.01)
        
        total_time = time.time() - start_time
        return {
            'duration_seconds': duration_seconds,
            'total_requests': request_count,
            'requests_per_second': request_count / total_time,
            'mean_latency': statistics.mean(latencies) if latencies else 0,
            'success_rate': request_count / (request_count + 1)  # Avoid division by zero
        }
    
    async def test_concurrent_users(self, num_users: int, requests_per_user: int) -> Dict[str, float]:
        """Test agent performance with concurrent users"""
        async def user_workflow(user_id: int):
            latencies = []
            async with aiohttp.ClientSession() as session:
                for i in range(requests_per_user):
                    start = time.time()
                    try:
                        async with session.post(
                            f"{self.base_url}/generate",
                            headers=self.headers,
                            json={"prompt": f"User {user_id} request {i}", "max_length": 100}
                        ) as response:
                            if response.status == 200:
                                latency = time.time() - start
                                latencies.append(latency)
                    except Exception as e:
                        print(f"User {user_id} request {i} failed: {e}")
            return latencies
        
        # Run concurrent user workflows
        start_time = time.time()
        user_results = await asyncio.gather(*[
            user_workflow(i) for i in range(num_users)
        ])
        total_time = time.time() - start_time
        
        # Aggregate results
        all_latencies = []
        for latencies in user_results:
            all_latencies.extend(latencies)
        
        total_requests = num_users * requests_per_user
        successful_requests = len(all_latencies)
        
        return {
            'concurrent_users': num_users,
            'requests_per_user': requests_per_user,
            'total_requests': total_requests,
            'successful_requests': successful_requests,
            'success_rate': successful_requests / total_requests,
            'total_time': total_time,
            'requests_per_second': successful_requests / total_time,
            'mean_latency': statistics.mean(all_latencies) if all_latencies else 0,
            'p95_latency': statistics.quantiles(all_latencies, n=20)[18] if all_latencies else 0
        }
    
    async def _make_request(self, session: aiohttp.ClientSession, request_id: int) -> Dict[str, float]:
        """Make a single request and measure latency"""
        start = time.time()
        try:
            async with session.post(
                f"{self.base_url}/generate",
                headers=self.headers,
                json={"prompt": f"Request {request_id}", "max_length": 100}
            ) as response:
                if response.status == 200:
                    latency = time.time() - start
                    return {'latency': latency, 'success': True}
                else:
                    return {'latency': 0, 'success': False}
        except Exception as e:
            return {'latency': 0, 'success': False, 'error': str(e)}

async def main():
    tester = PerformanceTester(
        base_url="https://api.example.com/v1/agents/text-generator",
        api_key="your-api-key"
    )
    
    print("Running performance tests...")
    
    # Test 1: Latency under load
    print("\n1. Testing latency under load (100 requests)...")
    latency_results = await tester.test_latency(100)
    print(json.dumps(latency_results, indent=2))
    
    # Test 2: Throughput over time
    print("\n2. Testing throughput over 60 seconds...")
    throughput_results = await tester.test_throughput(60)
    print(json.dumps(throughput_results, indent=2))
    
    # Test 3: Concurrent users
    print("\n3. Testing concurrent users (10 users, 20 requests each)...")
    concurrent_results = await tester.test_concurrent_users(10, 20)
    print(json.dumps(concurrent_results, indent=2))
    
    # Performance validation
    print("\n4. Performance validation...")
    if latency_results['p95_latency'] <= 100:  # 100ms P95
        print("PASSED: P95 latency within acceptable range")
    else:
        print("FAILED: P95 latency exceeds acceptable range")
    
    if throughput_results['requests_per_second'] >= 10:  # 10 RPS minimum
        print("PASSED: Throughput meets minimum requirements")
    else:
        print("FAILED: Throughput below minimum requirements")
    
    if concurrent_results['success_rate'] >= 0.95:  # 95% success rate
        print("PASSED: Success rate meets requirements")
    else:
        print("FAILED: Success rate below requirements")

if __name__ == "__main__":
    asyncio.run(main())

Conclusion

These examples demonstrate the flexibility and power of Provability-Fabric for creating AI agents with formal guarantees. Key takeaways:

Start Simple: Begin with basic constraints and gradually add complexity
Focus on Safety: Always include safety and compliance constraints
Test Thoroughly: Use property-based testing and performance testing
Automate Everything: Integrate with CI/CD pipelines for continuous verification
Monitor Runtime: Use sidecar injection for continuous monitoring

α-β-CROWN Adapter Examples

GPU-Accelerated Neural Network Verification

The α-β-CROWN adapter provides high-performance neural network verification with GPU acceleration. Here are practical examples of how to use it:

1. Robustness Verification

Verify that a neural network is robust against adversarial attacks:

Property File (properties/robustness.json):

{
  "type": "robustness",
  "input_bounds": {
    "lower": [0.0, 0.0, 0.0, 0.0],
    "upper": [1.0, 1.0, 1.0, 1.0]
  },
  "output_bounds": {
    "lower": [0.0],
    "upper": [1.0]
  },
  "epsilon": 0.1,
  "property": "adversarial_robustness",
  "description": "Verify robustness against L∞ perturbations with epsilon=0.1"
}

Verification Script:

#!/usr/bin/env python3
"""Robustness verification using α-β-CROWN adapter."""

import subprocess
import json
from pathlib import Path

def verify_robustness(model_path: str, property_path: str) -> bool:
    """Verify neural network robustness using α-β-CROWN."""
    
    # Run α-β-CROWN verification
    result = subprocess.run([
        "python", "adapter.py",
        model_path,
        property_path,
        "--out", "robustness_witness.json",
        "--gpu",
        "--timeout", "600"
    ], capture_output=True, text=True)
    
    if result.returncode == 0:
        with open("robustness_witness.json", "r") as f:
            witness = json.load(f)
        
        print(f"Verification result: {witness['verification_result']}")
        print(f"Execution time: {witness['execution_time']:.2f}s")
        print(f"GPU utilized: {witness['gpu_utilized']}")
        
        return witness["verification_result"] == "verified"
    else:
        print(f"Verification failed: {result.stderr}")
        return False

if __name__ == "__main__":
    success = verify_robustness("models/classifier.pt", "properties/robustness.json")
    print(f"Robustness verification: {'PASSED' if success else 'FAILED'}")

2. Safety Property Verification

Verify that a neural network satisfies safety constraints:

Property File (properties/safety.json):

{
  "type": "safety",
  "input_bounds": {
    "lower": [-1.0, -1.0, -1.0],
    "upper": [1.0, 1.0, 1.0]
  },
  "output_bounds": {
    "lower": [0.0],
    "upper": [0.5]
  },
  "property": "output_constraint",
  "description": "Verify that output remains within safe bounds [0.0, 0.5]"
}

Batch Verification:

#!/usr/bin/env python3
"""Batch verification of multiple properties using α-β-CROWN."""

import subprocess
import json
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor
import time

def verify_single_property(model_path: str, property_path: str, output_path: str) -> dict:
    """Verify a single property and return results."""
    start_time = time.time()
    
    result = subprocess.run([
        "python", "adapter.py",
        model_path,
        property_path,
        "--out", output_path,
        "--gpu",
        "--timeout", "300"
    ], capture_output=True, text=True)
    
    execution_time = time.time() - start_time
    
    if result.returncode == 0:
        with open(output_path, "r") as f:
            witness = json.load(f)
        return {
            "property": property_path,
            "success": True,
            "verification_result": witness["verification_result"],
            "execution_time": execution_time,
            "gpu_utilized": witness["gpu_utilized"]
        }
    else:
        return {
            "property": property_path,
            "success": False,
            "error": result.stderr,
            "execution_time": execution_time
        }

def batch_verify(model_path: str, properties_dir: str, output_dir: str, max_workers: int = 4):
    """Verify multiple properties in parallel."""
    
    properties_dir = Path(properties_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(exist_ok=True)
    
    property_files = list(properties_dir.glob("*.json"))
    
    print(f"Verifying {len(property_files)} properties with {max_workers} workers...")
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = []
        
        for prop_file in property_files:
            output_file = output_dir / f"{prop_file.stem}_witness.json"
            future = executor.submit(
                verify_single_property,
                model_path,
                str(prop_file),
                str(output_file)
            )
            futures.append(future)
        
        results = []
        for future in futures:
            results.append(future.result())
    
    # Summary
    successful = sum(1 for r in results if r["success"])
    total_time = sum(r["execution_time"] for r in results)
    
    print(f"\nBatch verification completed:")
    print(f"  Successful: {successful}/{len(results)}")
    print(f"  Total time: {total_time:.2f}s")
    print(f"  Average time: {total_time/len(results):.2f}s per property")
    
    return results

if __name__ == "__main__":
    results = batch_verify(
        "models/classifier.pt",
        "properties/",
        "witnesses/",
        max_workers=4
    )

3. Performance Benchmarking

Compare α-β-CROWN performance with other adapters:

Benchmark Script:

#!/usr/bin/env python3
"""Performance benchmark comparing α-β-CROWN with Marabou."""

import subprocess
import json
import time
from pathlib import Path
import matplotlib.pyplot as plt

def benchmark_adapter(adapter_name: str, model_path: str, property_path: str, 
                     iterations: int = 5) -> dict:
    """Benchmark a single adapter."""
    
    times = []
    successes = 0
    
    for i in range(iterations):
        start_time = time.time()
        
        if adapter_name == "alpha_beta_crown":
            result = subprocess.run([
                "python", "adapter.py",
                model_path, property_path,
                "--out", f"witness_{i}.json",
                "--gpu", "--timeout", "600"
            ], capture_output=True, text=True)
        elif adapter_name == "marabou":
            result = subprocess.run([
                "python", "marabou_adapter.py",
                model_path, property_path,
                "--out", f"witness_{i}.json"
            ], capture_output=True, text=True)
        
        execution_time = time.time() - start_time
        times.append(execution_time)
        
        if result.returncode == 0:
            successes += 1
    
    return {
        "adapter": adapter_name,
        "iterations": iterations,
        "successes": successes,
        "avg_time": sum(times) / len(times),
        "min_time": min(times),
        "max_time": max(times),
        "times": times
    }

def compare_adapters(model_path: str, property_path: str):
    """Compare α-β-CROWN and Marabou adapters."""
    
    print("Benchmarking α-β-CROWN adapter...")
    crown_results = benchmark_adapter("alpha_beta_crown", model_path, property_path)
    
    print("Benchmarking Marabou adapter...")
    marabou_results = benchmark_adapter("marabou", model_path, property_path)
    
    # Print results
    print(f"\nα-β-CROWN Results:")
    print(f"  Success rate: {crown_results['successes']}/{crown_results['iterations']}")
    print(f"  Average time: {crown_results['avg_time']:.2f}s")
    print(f"  Min time: {crown_results['min_time']:.2f}s")
    print(f"  Max time: {crown_results['max_time']:.2f}s")
    
    print(f"\nMarabou Results:")
    print(f"  Success rate: {marabou_results['successes']}/{marabou_results['iterations']}")
    print(f"  Average time: {marabou_results['avg_time']:.2f}s")
    print(f"  Min time: {marabou_results['min_time']:.2f}s")
    print(f"  Max time: {marabou_results['max_time']:.2f}s")
    
    # Calculate speedup
    if marabou_results['avg_time'] > 0:
        speedup = marabou_results['avg_time'] / crown_results['avg_time']
        print(f"\nα-β-CROWN speedup: {speedup:.2f}x faster than Marabou")
    
    # Plot results
    plt.figure(figsize=(10, 6))
    plt.boxplot([crown_results['times'], marabou_results['times']], 
                labels=['α-β-CROWN', 'Marabou'])
    plt.ylabel('Execution Time (seconds)')
    plt.title('Adapter Performance Comparison')
    plt.savefig('adapter_comparison.png')
    plt.show()

if __name__ == "__main__":
    compare_adapters("models/classifier.pt", "properties/robustness.json")

4. Integration with Specification Bundles

Use α-β-CROWN in a complete specification bundle:

Specification (spec.yaml):

name: robust-classifier
version: 1.0.0
description: Robust image classifier with adversarial guarantees

capabilities:
  - name: image_classification
    description: Classify images with robustness guarantees
    constraints:
      min_accuracy: 0.95
      adversarial_robustness: true
      epsilon: 0.1
      max_latency_ms: 200

verification:
  type: alpha_beta_crown
  model: models/robust_classifier.pt
  properties:
    - file: properties/robustness.json
      type: adversarial_robustness
    - file: properties/accuracy.json
      type: accuracy_guarantee
  gpu_enabled: true
  timeout: 600

proofs:
  - name: robustness_verification
    type: alpha_beta_crown
    file: proofs/robustness_verification.py
  - name: accuracy_guarantee
    type: lean
    file: proofs/accuracy_guarantee.lean

Verification Script (proofs/robustness_verification.py):

#!/usr/bin/env python3
"""Robustness verification for specification bundle."""

import subprocess
import json
import sys
from pathlib import Path

def verify_robustness_properties():
    """Verify all robustness properties in the specification."""
    
    model_path = "models/robust_classifier.pt"
    properties = [
        "properties/robustness.json",
        "properties/accuracy.json"
    ]
    
    results = []
    
    for prop_file in properties:
        print(f"Verifying {prop_file}...")
        
        result = subprocess.run([
            "python", "adapter.py",
            model_path,
            prop_file,
            "--out", f"witness_{Path(prop_file).stem}.json",
            "--gpu",
            "--timeout", "600"
        ], capture_output=True, text=True)
        
        if result.returncode == 0:
            with open(f"witness_{Path(prop_file).stem}.json", "r") as f:
                witness = json.load(f)
            
            results.append({
                "property": prop_file,
                "verified": witness["verification_result"] == "verified",
                "execution_time": witness["execution_time"]
            })
        else:
            results.append({
                "property": prop_file,
                "verified": False,
                "error": result.stderr
            })
    
    # Check if all properties are verified
    all_verified = all(r["verified"] for r in results)
    
    print(f"\nVerification Results:")
    for result in results:
        status = "✓ VERIFIED" if result["verified"] else "✗ FAILED"
        print(f"  {result['property']}: {status}")
        if "execution_time" in result:
            print(f"    Time: {result['execution_time']:.2f}s")
        if "error" in result:
            print(f"    Error: {result['error']}")
    
    return all_verified

if __name__ == "__main__":
    success = verify_robustness_properties()
    sys.exit(0 if success else 1)

Best Practices for α-β-CROWN

GPU Utilization: Always use --gpu flag when CUDA is available
Timeout Management: Set appropriate timeouts based on model complexity
Memory Management: Monitor GPU memory usage for large models
Batch Processing: Use parallel processing for multiple properties
Error Handling: Always check return codes and handle failures gracefully

For more examples and use cases, check the examples/ directory in the repository or explore the community-contributed examples on GitHub.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Examples and Use Cases

Basic Examples

1. Simple Text Generation Agent

2. Image Classification Agent

Advanced Examples

3. Multi-Modal Agent with Safety Guarantees

4. Financial Trading Agent

Integration Examples

5. Kubernetes Integration

6. CI/CD Pipeline Integration

Testing Examples

7. Property-Based Testing

8. Performance Testing

Conclusion

α-β-CROWN Adapter Examples

GPU-Accelerated Neural Network Verification

1. Robustness Verification

2. Safety Property Verification

3. Performance Benchmarking

4. Integration with Specification Bundles

Best Practices for α-β-CROWN

Uh oh!

FilesExpand file tree

examples.md

Latest commit

History

examples.md

File metadata and controls

Examples and Use Cases

Basic Examples

1. Simple Text Generation Agent

2. Image Classification Agent

Advanced Examples

3. Multi-Modal Agent with Safety Guarantees

4. Financial Trading Agent

Integration Examples

5. Kubernetes Integration

6. CI/CD Pipeline Integration

Testing Examples

7. Property-Based Testing

8. Performance Testing

Conclusion

α-β-CROWN Adapter Examples

GPU-Accelerated Neural Network Verification

1. Robustness Verification

2. Safety Property Verification

3. Performance Benchmarking

4. Integration with Specification Bundles

Best Practices for α-β-CROWN