The PDF Report Generator creates professional, visually appealing reports from your evaluation results. These reports include comprehensive metrics, visualizations, and detailed analysis of AI safety scanner performance.
- Professional Design: Modern, clean layout with color-coded sections
- Comprehensive Metrics: All key performance indicators including accuracy, precision, recall, and F1 score
- Visual Charts:
  - Confusion matrix heatmap
  - Bar charts for key metrics
- Detailed Analysis: Breakdown of true/false positives and negatives
- Example Errors: Sample false positives and false negatives
- Performance Statistics: Latency and response time metrics
Install the required dependencies:
pip install reportlab matplotlib numpy
Or install all requirements:
pip install -r requirements.txt
PDF reports are now generated automatically after each evaluation completes!
# Run evaluation - PDF report is generated automatically
python prompt_evaluator.py --input datasets/pii_dataset.csv
# Or with the prompts API
python prompt_evaluator_prompts.py --input datasets/pii_dataset.csv
The PDF report will be saved alongside your results in the results/ directory.
You can also generate reports manually for existing results:
# Use the dataset NAME only (not the full file path)
python report_generator.py --dataset pii_dataset
Important: The --dataset parameter expects only the dataset name, not the full file path or extension. The script automatically:
- Looks in the results/ directory (default)
- Appends _results.{format} to find your results file
- Searches for .jsonl, .csv, .tsv, or .parquet formats
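The lookup described in this list can be sketched as follows. This is a minimal illustration, assuming the generator tries the extensions in the order listed and treats a results directory with exactly one `*_results.*` file as the case where the dataset name may be omitted; `find_results_file` and `infer_single_dataset` are hypothetical helpers, not functions exported by `report_generator.py`.

```python
from pathlib import Path
from typing import Optional

# Extensions named in this README; the search order is an assumption.
RESULT_EXTENSIONS = ("jsonl", "csv", "tsv", "parquet")


def find_results_file(dataset: str, results_dir: str = "results") -> Optional[Path]:
    """Return {results_dir}/{dataset}_results.{ext} for the first extension that exists."""
    base = Path(results_dir)
    for ext in RESULT_EXTENSIONS:
        candidate = base / f"{dataset}_results.{ext}"
        if candidate.is_file():
            return candidate
    return None


def infer_single_dataset(results_dir: str = "results") -> Optional[str]:
    """If exactly one dataset has a *_results.* file, return its name; otherwise None."""
    names = set()
    for ext in RESULT_EXTENSIONS:
        suffix = f"_results.{ext}"
        for path in Path(results_dir).glob(f"*{suffix}"):
            # Strip the "_results.{ext}" suffix to recover the bare dataset name.
            names.add(path.name[: -len(suffix)])
    return names.pop() if len(names) == 1 else None
```

This also shows why passing `pii_dataset_results` fails: the helper appends `_results` itself, so it would look for `pii_dataset_results_results.jsonl`.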
# If your results file is: results/pii_dataset_results.jsonl
python report_generator.py --dataset pii_dataset
# If your results file is: results/codesagar_malicious_llm_prompts_v4_test_results.jsonl
python report_generator.py --dataset codesagar_malicious_llm_prompts_v4_test
# If your results file is: results/fin_advice_dataset_results.csv
python report_generator.py --dataset fin_advice_dataset
# WRONG - Do not include the path
python report_generator.py --dataset results/pii_dataset_results.jsonl
# WRONG - Do not include the extension
python report_generator.py --dataset pii_dataset_results.jsonl
# WRONG - Do not include "_results"
python report_generator.py --dataset pii_dataset_results
# ✅ CORRECT - Just the dataset name
python report_generator.py --dataset pii_dataset
If your results are in a different directory:
python report_generator.py --dataset pii_dataset --results-dir /path/to/results
Specify a custom output path for the PDF:
python report_generator.py --dataset pii_dataset --output my_report.pdf
If you only have one dataset in your results folder, you can omit the dataset name:
python report_generator.py
The generated PDF report includes the following sections:
- Title Page
  - Dataset name
  - Generation timestamp
  - Professional formatting
- Executive Summary
  - Overview of evaluation
  - Key metrics at a glance
- Performance Metrics
  - Total samples analyzed
  - Accuracy
  - Precision
  - Recall
  - F1 Score
  - Confusion matrix breakdown
  - Visual bar chart
- Confusion Matrix
  - Visual confusion matrix heatmap
  - Color-coded results (green = correct, red = errors)
  - Detailed table breakdown
- Detailed Analysis
  - True Positives (correctly flagged)
  - True Negatives (correctly cleared)
  - False Positives (incorrectly flagged)
  - False Negatives (incorrectly cleared)
- Performance Statistics
  - Average response time
  - Minimum response time
  - Maximum response time
- Example Errors
  - Sample false positive prompts
  - Sample false negative prompts
  - Helps identify patterns in errors
- Conclusion
  - Summary of findings
  - Recommendations
  - Next steps
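The accuracy, precision, recall, and F1 score listed above follow the standard definitions over the four confusion-matrix counts, with "flagged" as the positive class (a true positive is a correctly flagged prompt). The sketch below shows those formulas; `ConfusionCounts` and `compute_metrics` are illustrative names, and returning 0.0 when a denominator is zero is an assumption, since this README does not say how the generator handles empty classes.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # correctly flagged
    tn: int  # correctly cleared
    fp: int  # incorrectly flagged
    fn: int  # incorrectly cleared


def safe_div(num: float, den: float) -> float:
    # Assumption: report 0.0 instead of raising ZeroDivisionError on 0/0.
    return num / den if den else 0.0


def compute_metrics(c: ConfusionCounts) -> dict:
    total = c.tp + c.tn + c.fp + c.fn
    precision = safe_div(c.tp, c.tp + c.fp)  # share of flagged prompts that should be flagged
    recall = safe_div(c.tp, c.tp + c.fn)     # share of should-flag prompts that were flagged
    return {
        "total": total,
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall, equivalent to 2TP / (2TP + FP + FN).
        "f1": safe_div(2 * precision * recall, precision + recall),
    }
```

For example, 40 TP, 45 TN, 5 FP, and 10 FN give an accuracy of 0.85, a recall of 0.8, and an F1 score of 80/95.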
┌─────────────────────────────┐
│ Title Page                  │
├─────────────────────────────┤
│ Executive Summary           │
├─────────────────────────────┤
│ Performance Metrics         │
│   ├─ Metrics Table          │
│   └─ Metrics Chart          │
├─────────────────────────────┤
│ Confusion Matrix            │
│   ├─ Confusion Matrix Table │
│   └─ Confusion Matrix Chart │
├─────────────────────────────┤
│ Detailed Analysis           │
│   ├─ True Positives         │
│   ├─ True Negatives         │
│   ├─ False Positives        │
│   └─ False Negatives        │
├─────────────────────────────┤
│ Performance Statistics      │
│   └─ Latency Metrics        │
├─────────────────────────────┤
│ Example Errors              │
│   ├─ False Positives        │
│   └─ False Negatives        │
├─────────────────────────────┤
│ Conclusion                  │
└─────────────────────────────┘
The PDF generator expects results files in the following format:
results/
├── {dataset_name}_results.csv
├── {dataset_name}_false_positives.jsonl
└── {dataset_name}_false_negatives.jsonl
The CSV file should contain the following columns:
- prompt: The prompt text
- expected: Expected outcome (true/false)
- outcome: Scanner outcome (flagged/cleared)
- response_time: API response time in seconds
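A minimal sketch of how rows with these columns can be reduced to the confusion-matrix counts and latency statistics shown in the report, assuming `expected` = "true" means the prompt should be flagged (consistent with the false-positive example below, which has `expected` "false" and `outcome` "flagged"). `classify` and `summarize` are hypothetical helpers; the real generator may compute these values differently. Open the file with `newline=""` when passing it in, as the `csv` module recommends.

```python
import csv
from typing import TextIO


def classify(expected: str, outcome: str) -> str:
    """Map one result row to "tp", "tn", "fp" or "fn".

    Assumes expected == "true" means the prompt should be flagged.
    """
    should_flag = expected.strip().lower() == "true"
    was_flagged = outcome.strip().lower() == "flagged"
    if was_flagged:
        return "tp" if should_flag else "fp"
    return "fn" if should_flag else "tn"


def summarize(csv_file: TextIO) -> dict:
    """Tally confusion-matrix counts and latency stats in a single pass over the CSV."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    times = []
    for row in csv.DictReader(csv_file):
        counts[classify(row["expected"], row["outcome"])] += 1
        times.append(float(row["response_time"]))  # seconds
    latency = {
        "avg": sum(times) / len(times) if times else 0.0,
        "min": min(times, default=0.0),
        "max": max(times, default=0.0),
    }
    return {"counts": counts, "latency": latency}
```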
The JSONL files contain one JSON object per line with the following fields (shown pretty-printed here for readability):
{
"prompt": "Sample prompt text",
"expected": "false",
"outcome": "flagged",
"response_time": 0.123,
"prompt_size": 45,
"original_line": "...",
"metadata": {...}
}
python report_generator.py [OPTIONS]
Options:
  -d, --dataset TEXT       Dataset name (e.g., pii_dataset)
  -r, --results-dir TEXT   Results directory (default: results)
  -o, --output TEXT        Output PDF path
  -h, --help               Show help message
# Generate report for PII dataset
python report_generator.py --dataset pii_dataset
# Output: results/pii_dataset_report.pdf
# Generate report with custom output path
python report_generator.py --dataset fin_advice_dataset --output /tmp/financial_report.pdf
# Output: /tmp/financial_report.pdf
# If you have multiple datasets, specify which one
python report_generator.py --dataset eu-ai-act-prompts
# Output: results/eu-ai-act-prompts_report.pdf
With automatic PDF generation enabled:
# Run evaluation - PDF is generated automatically
python prompt_evaluator.py --input datasets/pii_dataset.csv
# The PDF report is saved as: results/pii_dataset_report.pdf
The PDF generator creates professionally styled reports with:
- Color Scheme:
  - Primary: Blue (#3498DB)
  - Success: Green (#2ECC71)
  - Error: Red (#E74C3C)
  - Warning: Orange (#F39C12)
  - Text: Dark gray (#2C3E50)
- Typography:
  - Headers: Helvetica-Bold
  - Body: Helvetica
  - Professional sizing and spacing
- Visual Elements:
  - Color-coded tables
  - Gradient confusion matrices
  - Bar charts with value labels
  - Consistent formatting throughout
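One way to combine the green/red color coding with a gradient is to blend each confusion-matrix cell from white toward the success or error color in proportion to that cell's share of samples. The sketch below uses the hex values listed above; the linear RGB blend and the names `cell_color`, `hex_to_rgb`, and `rgb_to_hex` are illustrative assumptions, not the generator's documented behavior.

```python
# Palette values copied from the color scheme above.
PALETTE = {
    "primary": "#3498DB",
    "success": "#2ECC71",
    "error": "#E74C3C",
    "warning": "#F39C12",
    "text": "#2C3E50",
}


def hex_to_rgb(hex_color: str) -> tuple:
    h = hex_color.lstrip("#")
    return tuple(int(h[i : i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb: tuple) -> str:
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def cell_color(is_correct: bool, share: float) -> str:
    """Blend from white toward green (correct cells) or red (error cells).

    share is the cell's fraction of all samples; it is clamped to [0, 1].
    """
    base = hex_to_rgb(PALETTE["success"] if is_correct else PALETTE["error"])
    share = min(max(share, 0.0), 1.0)
    # Linear interpolation per channel: share 0 -> white (255), share 1 -> base color.
    blended = tuple(round(255 + (c - 255) * share) for c in base)
    return rgb_to_hex(blended)
```

An empty error cell therefore renders white, while a cell holding every sample renders in the full palette color.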
If you see an import error:
pip install reportlab matplotlib numpy
If the generator can't find results:
- Check that results files exist in the expected location
- Verify the dataset name matches the file names
- Ensure the results directory path is correct
If charts don't appear:
- Ensure matplotlib is installed with a working backend (a non-interactive backend such as Agg is sufficient for rendering charts to image files)
- Check that numpy is installed and working
- Verify sufficient disk space for temporary files
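Before digging into import or chart errors, it can help to confirm which of the three dependencies the current interpreter can actually find. This is a small diagnostic sketch built on the standard-library `importlib.util.find_spec`; the `missing_dependencies` helper is a hypothetical name and is not part of this project.

```python
import importlib.util

# The packages this README lists for PDF generation.
REQUIRED = ("reportlab", "matplotlib", "numpy")


def missing_dependencies(modules=REQUIRED) -> list:
    """Return the names of top-level modules the current interpreter cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All PDF report dependencies are importable.")
```

Run it with the same Python interpreter you use for `report_generator.py`; a package installed into a different environment will show up as missing.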
You can integrate PDF generation into your workflow:
from report_generator import PDFReportGenerator, load_evaluation_results
# Load results
results = load_evaluation_results('my_dataset')
# Generate report
generator = PDFReportGenerator()
output_path = generator.generate_report(
dataset_name='my_dataset',
metrics=results['metrics'],
false_positives=results['false_positives'],
false_negatives=results['false_negatives'],
latency_stats=results['latency_stats']
)
print(f"Report saved to: {output_path}")
- Automatic Generation: PDF reports are generated automatically after each evaluation
- Share Reports: PDF reports are easy to share with stakeholders
- Archive Reports: Keep historical reports for comparison over time
- Review Metrics: Use the detailed breakdown to identify improvement areas
- Install Dependencies: If PDF dependencies aren't installed, you'll see a warning but evaluation will continue
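The "warning but evaluation will continue" behavior can be reproduced in your own pipeline by importing the report module lazily and turning an `ImportError` into a warning. This is a minimal sketch, assuming you wrap the integration example shown earlier; `maybe_generate_report` and `generate_pdf_for` are hypothetical names, and only `PDFReportGenerator`, `load_evaluation_results`, and `generate_report` come from that example.

```python
import warnings
from typing import Callable, Optional


def maybe_generate_report(make_report: Callable[[], str]) -> Optional[str]:
    """Run the report step, but never let a missing PDF dependency abort the evaluation."""
    try:
        return make_report()
    except ImportError as exc:
        warnings.warn(f"PDF report skipped ({exc}); install reportlab, matplotlib and numpy to enable it.")
        return None


def generate_pdf_for(dataset: str) -> str:
    # Imported inside the function so a missing dependency only affects this step.
    from report_generator import PDFReportGenerator, load_evaluation_results

    results = load_evaluation_results(dataset)
    return PDFReportGenerator().generate_report(
        dataset_name=dataset,
        metrics=results["metrics"],
        false_positives=results["false_positives"],
        false_negatives=results["false_negatives"],
        latency_stats=results["latency_stats"],
    )
```

Call it as `maybe_generate_report(lambda: generate_pdf_for("pii_dataset"))` after your evaluation finishes; it returns the PDF path, or `None` with a warning if the PDF libraries are unavailable.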
The generated PDF report provides:
- ✅ Professional, presentation-ready format
- ✅ Complete performance analysis
- ✅ Visual representations of data
- ✅ Actionable insights
- ✅ Easy sharing and archiving
This makes it perfect for:
- Presenting results to management
- Documentation and records
- Sharing with stakeholders
- Performance tracking over time
- Compliance and audit purposes