PDF Report Generation

Overview

The PDF Report Generator creates professional, visually appealing reports from your evaluation results. These reports include comprehensive metrics, visualizations, and detailed analysis of AI safety scanner performance.

Features

  • Professional Design: Modern, clean layout with color-coded sections
  • Comprehensive Metrics: All key performance indicators including accuracy, precision, recall, and F1 score
  • Visual Charts:
    • Confusion matrix heatmap
    • Bar charts for key metrics
  • Detailed Analysis: Breakdown of true/false positives and negatives
  • Example Errors: Sample false positives and false negatives
  • Performance Statistics: Latency and response time metrics

Installation

Install the required dependencies:

pip install reportlab matplotlib numpy

Or install all requirements:

pip install -r requirements.txt

Usage

Automatic Generation

PDF reports are generated automatically after each evaluation completes.

# Run evaluation - PDF report is generated automatically
python prompt_evaluator.py --input datasets/pii_dataset.csv

# Or with the prompts API
python prompt_evaluator_prompts.py --input datasets/pii_dataset.csv

The PDF report will be saved alongside your results in the results/ directory.

Manual Generation (Optional)

You can also generate reports manually for existing results:

# Use the dataset NAME only (not the full file path)
python report_generator.py --dataset pii_dataset

Important: The --dataset parameter expects only the dataset name, not the full file path or extension. The script automatically:

  • Looks in the results/ directory (default)
  • Appends _results.{format} to find your results file
  • Searches for .jsonl, .csv, .tsv, or .parquet formats
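
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in report_generator.py: the function name find_results_file and the search order of the extensions are assumptions.

```python
from pathlib import Path
from typing import Optional

# Extensions listed in this README; the search order here is an assumption.
SUPPORTED_FORMATS = ("jsonl", "csv", "tsv", "parquet")


def find_results_file(dataset: str, results_dir: str = "results") -> Optional[Path]:
    """Return the first existing {dataset}_results.{format} file, or None."""
    base = Path(results_dir)
    for fmt in SUPPORTED_FORMATS:
        candidate = base / f"{dataset}_results.{fmt}"
        if candidate.is_file():
            return candidate
    return None
```

With results/pii_dataset_results.jsonl on disk, find_results_file("pii_dataset") would return that path; a dataset with no matching file yields None.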

Common Usage Examples

# If your results file is: results/pii_dataset_results.jsonl
python report_generator.py --dataset pii_dataset

# If your results file is: results/codesagar_malicious_llm_prompts_v4_test_results.jsonl
python report_generator.py --dataset codesagar_malicious_llm_prompts_v4_test

# If your results file is: results/fin_advice_dataset_results.csv
python report_generator.py --dataset fin_advice_dataset

❌ Common Mistakes

# WRONG - Do not include the path
python report_generator.py --dataset results/pii_dataset_results.jsonl

# WRONG - Do not include the extension
python report_generator.py --dataset pii_dataset_results.jsonl

# WRONG - Do not include "_results"
python report_generator.py --dataset pii_dataset_results

# ✅ CORRECT - Just the dataset name
python report_generator.py --dataset pii_dataset
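
If you call the generator from your own scripts, a small helper can guard against these mistakes by reducing any of the wrong forms to the bare dataset name. The helper below is hypothetical (normalize_dataset_name is not part of report_generator.py) and assumes the four extensions listed earlier.

```python
from pathlib import Path

# Extensions this README mentions for results files.
KNOWN_EXTENSIONS = {".jsonl", ".csv", ".tsv", ".parquet"}


def normalize_dataset_name(value: str) -> str:
    """Strip any directory, known extension, and trailing '_results'."""
    name = Path(value).name                 # drop the directory part
    suffix = Path(name).suffix
    if suffix.lower() in KNOWN_EXTENSIONS:  # drop a known extension only
        name = name[: -len(suffix)]
    if name.endswith("_results"):           # drop the results marker
        name = name[: -len("_results")]
    return name
```

For example, normalize_dataset_name("results/pii_dataset_results.jsonl") gives "pii_dataset", and an already-correct name passes through unchanged.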

Specify Results Directory

If your results are in a different directory:

python report_generator.py --dataset pii_dataset --results-dir /path/to/results

Custom Output Path

Specify a custom output path for the PDF:

python report_generator.py --dataset pii_dataset --output my_report.pdf

Auto-detect Dataset

If you only have one dataset in your results folder, you can omit the dataset name:

python report_generator.py
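
One way such auto-detection could work is to collect every *_results.{format} file and accept the result only when exactly one dataset name is found. This is a sketch under that assumption; detect_single_dataset is a hypothetical name and the real script may behave differently.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_FORMATS = ("jsonl", "csv", "tsv", "parquet")


def detect_single_dataset(results_dir: str = "results") -> Optional[str]:
    """Return the dataset name if exactly one dataset has results, else None."""
    names = set()
    for fmt in SUPPORTED_FORMATS:
        marker = f"_results.{fmt}"
        for path in Path(results_dir).glob(f"*{marker}"):
            names.add(path.name[: -len(marker)])
    return names.pop() if len(names) == 1 else None
```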

Report Contents

The generated PDF report includes the following sections:

1. Title Page

  • Dataset name
  • Generation timestamp
  • Professional formatting

2. Executive Summary

  • Overview of evaluation
  • Key metrics at a glance
  • Total samples analyzed

3. Performance Metrics

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Confusion matrix breakdown
  • Visual bar chart
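
For reference, the four metrics follow from the confusion matrix counts using the standard definitions. The sketch below is illustrative only; the function name and the returned key names are assumptions, not the generator's internal API.

```python
def compute_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Each division is guarded so that an empty result set or a scanner that never flags anything yields 0.0 rather than raising ZeroDivisionError.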

4. Confusion Matrix Analysis

  • Visual confusion matrix heatmap
  • Color-coded results (green = correct, red = errors)
  • Detailed table breakdown

5. Detailed Analysis

  • True Positives (correctly flagged)
  • True Negatives (correctly cleared)
  • False Positives (incorrectly flagged)
  • False Negatives (incorrectly cleared)
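
The four categories can be derived from the expected and outcome fields of each result row. The sketch below assumes that expected = "true" means the prompt should be flagged, which matches the sample false positive record later in this document; the function names are hypothetical.

```python
from collections import Counter


def classify(expected: str, outcome: str) -> str:
    """Map one result row to TP, TN, FP, or FN."""
    should_flag = expected.strip().lower() == "true"
    was_flagged = outcome.strip().lower() == "flagged"
    if should_flag and was_flagged:
        return "TP"  # correctly flagged
    if not should_flag and not was_flagged:
        return "TN"  # correctly cleared
    if was_flagged:
        return "FP"  # incorrectly flagged
    return "FN"      # incorrectly cleared


def count_categories(rows) -> Counter:
    """Count categories over rows with 'expected' and 'outcome' keys."""
    return Counter(classify(r["expected"], r["outcome"]) for r in rows)
```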

6. Performance Statistics

  • Average response time
  • Minimum response time
  • Maximum response time
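
These three statistics can be computed directly from the response_time values. A minimal sketch, with hypothetical function and key names:

```python
from statistics import mean


def latency_stats(response_times) -> dict:
    """Return average, minimum, and maximum response time in seconds."""
    times = [float(t) for t in response_times]
    if not times:
        # No samples: report nothing rather than raising on min()/max().
        return {"avg": None, "min": None, "max": None}
    return {"avg": mean(times), "min": min(times), "max": max(times)}
```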

7. Example Errors

  • Sample false positive prompts
  • Sample false negative prompts
  • Helps identify patterns in errors

8. Conclusion

  • Summary of findings
  • Recommendations
  • Next steps

Report Structure

┌─────────────────────────────┐
│   Title Page                │
├─────────────────────────────┤
│   Executive Summary         │
├─────────────────────────────┤
│   Performance Metrics       │
│   └─ Metrics Table          │
│   └─ Metrics Chart          │
├─────────────────────────────┤
│   Confusion Matrix          │
│   └─ Confusion Matrix Table │
│   └─ Confusion Matrix Chart │
├─────────────────────────────┤
│   Detailed Analysis         │
│   ├─ True Positives         │
│   ├─ True Negatives         │
│   ├─ False Positives        │
│   └─ False Negatives        │
├─────────────────────────────┤
│   Performance Statistics    │
│   └─ Latency Metrics        │
├─────────────────────────────┤
│   Example Errors            │
│   ├─ False Positives        │
│   └─ False Negatives        │
├─────────────────────────────┤
│   Conclusion                │
└─────────────────────────────┘

Requirements

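The PDF generator expects results files in the following layout and format. The main results file may be .jsonl, .csv, .tsv, or .parquet; the CSV variant is shown here.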

Results Directory Structure

results/
├── {dataset_name}_results.csv
├── {dataset_name}_false_positives.jsonl
└── {dataset_name}_false_negatives.jsonl

Results CSV Format

The CSV file should contain columns:

  • prompt: The prompt text
  • expected: Expected outcome (true/false)
  • outcome: Scanner outcome (flagged/cleared)
  • response_time: API response time in seconds
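
To check a results file before generating a report, you could load it with the standard csv module and verify the columns listed above. This is an illustrative sketch; load_results_csv is a hypothetical helper, not a function exported by report_generator.py.

```python
import csv

REQUIRED_COLUMNS = {"prompt", "expected", "outcome", "response_time"}


def load_results_csv(path: str) -> list:
    """Load result rows, failing early if a required column is missing."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        rows = []
        for row in reader:
            # response_time is documented as seconds; store it as a float.
            row["response_time"] = float(row["response_time"])
            rows.append(row)
        return rows
```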

False Positives/Negatives Format

JSONL files with JSON objects containing:

{
  "prompt": "Sample prompt text",
  "expected": "false",
  "outcome": "flagged",
  "response_time": 0.123,
  "prompt_size": 45,
  "original_line": "...",
  "metadata": {...}
}
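
JSONL files hold one JSON object per line, so they can be read line by line with the standard json module. A minimal sketch (load_jsonl is a hypothetical helper name):

```python
import json


def load_jsonl(path: str) -> list:
    """Read one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```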

Command Line Options

python report_generator.py [OPTIONS]

Options:
  -d, --dataset TEXT      Dataset name (e.g., pii_dataset)
  -r, --results-dir TEXT  Results directory (default: results)
  -o, --output TEXT       Output PDF path
  -h, --help              Show help message
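
For readers wrapping the generator in their own tooling, the same options can be mirrored with the standard argparse module. This is only a sketch of an equivalent interface; the actual script may use a different parsing library, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (-h is added by argparse)."""
    parser = argparse.ArgumentParser(prog="report_generator.py")
    parser.add_argument("-d", "--dataset", help="Dataset name (e.g., pii_dataset)")
    parser.add_argument("-r", "--results-dir", default="results",
                        help="Results directory (default: results)")
    parser.add_argument("-o", "--output", help="Output PDF path")
    return parser
```

Note that argparse exposes --results-dir as the attribute results_dir on the parsed namespace.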

Examples

Example 1: Basic Report Generation

# Generate report for PII dataset
python report_generator.py --dataset pii_dataset

# Output: results/pii_dataset_report.pdf

Example 2: Custom Output Location

# Generate report with custom output path
python report_generator.py --dataset fin_advice_dataset --output /tmp/financial_report.pdf

# Output: /tmp/financial_report.pdf

Example 3: Multiple Datasets

# If you have multiple datasets, specify which one
python report_generator.py --dataset eu-ai-act-prompts

# Output: results/eu-ai-act-prompts_report.pdf

Example 4: Automatic Workflow

With automatic PDF generation enabled:

# Run evaluation - PDF is generated automatically
python prompt_evaluator.py --input datasets/pii_dataset.csv

# The PDF report is saved as: results/pii_dataset_report.pdf
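
Judging from the examples in this document, the default report name is the dataset name followed by _report.pdf inside the results directory. The helper below sketches that mapping from an evaluation input file; default_report_path is a hypothetical name and the naming rule is inferred from the examples, not confirmed from the source code.

```python
from pathlib import Path


def default_report_path(input_path: str, results_dir: str = "results") -> Path:
    """Derive results/{dataset}_report.pdf from an evaluation input file."""
    dataset = Path(input_path).stem  # "datasets/pii_dataset.csv" -> "pii_dataset"
    return Path(results_dir) / f"{dataset}_report.pdf"
```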

Report Customization

The PDF generator creates professionally styled reports with:

  • Color Scheme:

    • Primary: Blue (#3498DB)
    • Success: Green (#2ECC71)
    • Error: Red (#E74C3C)
    • Warning: Orange (#F39C12)
    • Text: Dark gray (#2C3E50)
  • Typography:

    • Headers: Helvetica-Bold
    • Body: Helvetica
    • Professional sizing and spacing
  • Visual Elements:

    • Color-coded tables
    • Gradient confusion matrices
    • Bar charts with value labels
    • Consistent formatting throughout
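
If you want your own charts to match the report, the color scheme above can be kept in a small lookup table. The dictionary keys and the hex_to_rgb helper are illustrative names, not part of the generator.

```python
# Color scheme listed in this README, keyed by role.
PALETTE = {
    "primary": "#3498DB",
    "success": "#2ECC71",
    "error": "#E74C3C",
    "warning": "#F39C12",
    "text": "#2C3E50",
}


def hex_to_rgb(value: str) -> tuple:
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))
```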

Troubleshooting

Missing Dependencies

If you see an import error:

pip install reportlab matplotlib numpy
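
To see which of the three packages is actually missing, you can probe for them without importing them. A minimal sketch using the standard library (missing_dependencies is a hypothetical helper):

```python
from importlib.util import find_spec

PDF_DEPENDENCIES = ("reportlab", "matplotlib", "numpy")


def missing_dependencies(modules=PDF_DEPENDENCIES) -> list:
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All PDF dependencies are installed.")
```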

No Results Found

If the generator can't find results:

  1. Check that results files exist in the expected location
  2. Verify the dataset name matches the file names
  3. Ensure the results directory path is correct

Chart Generation Issues

If charts don't appear:

  • Ensure matplotlib is installed and can load a backend; on a headless machine, a non-interactive backend such as Agg is usually required
  • Check that numpy is installed and working
  • Verify sufficient disk space for temporary files

Integration with Evaluation Flow

You can integrate PDF generation into your workflow:

from report_generator import PDFReportGenerator, load_evaluation_results

# Load results
results = load_evaluation_results('my_dataset')

# Generate report
generator = PDFReportGenerator()
output_path = generator.generate_report(
    dataset_name='my_dataset',
    metrics=results['metrics'],
    false_positives=results['false_positives'],
    false_negatives=results['false_negatives'],
    latency_stats=results['latency_stats']
)

print(f"Report saved to: {output_path}")

Best Practices

  1. Automatic Generation: Let each evaluation produce its PDF report automatically; use manual generation only for existing results
  2. Share Reports: PDF reports are easy to share with stakeholders
  3. Archive Reports: Keep historical reports for comparison over time
  4. Review Metrics: Use the detailed breakdown to identify improvement areas
  5. Install Dependencies: Install the PDF dependencies before running evaluations; if they are missing, you'll see a warning but the evaluation will continue
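
The warn-and-continue behavior in item 5 can be reproduced in your own pipeline with a small wrapper around whatever callable produces the report. This is a sketch of the pattern, not the evaluator's actual code; run_optional_report is a hypothetical name.

```python
import warnings


def run_optional_report(generate, *args, **kwargs):
    """Call a report function; on ImportError, warn and return None."""
    try:
        return generate(*args, **kwargs)
    except ImportError as exc:
        # Missing PDF dependencies should not abort the evaluation.
        warnings.warn(f"PDF report skipped: {exc}")
        return None
```

The wrapper only swallows ImportError, so genuine failures inside report generation still surface.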

Output

The generated PDF report provides:

  • ✅ Professional, presentation-ready format
  • ✅ Complete performance analysis
  • ✅ Visual representations of data
  • ✅ Actionable insights
  • ✅ Easy sharing and archiving

This makes it well suited for:

  • Presenting results to management
  • Documentation and records
  • Sharing with stakeholders
  • Performance tracking over time
  • Compliance and audit purposes