This document describes the enhanced evaluation reporting features available in IDP v0.4.9+ using sticker-eval v0.1.4.
The evaluation module now leverages sticker-eval v0.1.4's fine-grained field comparison feature (from GitHub Issue #48 and PR #51) to provide:
- Detailed nested object match information alongside aggregate scores
- Interactive controls to filter and explore evaluation results
- Field-by-field comparison details for arrays and complex objects
For complex attributes (nested objects, arrays), the evaluation now captures detailed field-by-field comparison information:
```json
{
  "name": "LineItems",
  "score": 0.88,  // Aggregate score
  "matched": false,
  "field_comparison_details": [
    {
      "expected_key": "LineItems[0].Description",
      "expected_value": "Service A",
      "actual_key": "LineItems[0].Description",
      "actual_value": "Service A",
      "match": true,
      "score": 1.0,
      "weighted_score": 2.0
    },
    {
      "expected_key": "LineItems[1].Description",
      "expected_value": "Service B",
      "actual_key": "LineItems[1].Description",
      "actual_value": "Service C",
      "match": false,
      "score": 0.75,
      "weighted_score": 1.5
    }
    // ... more comparisons
  ]
}
```

The markdown reports now include interactive HTML controls:
Filter the attribute table to show only rows where matches failed, providing a compact view highlighting problematic fields.
<button onclick="toggleUnmatchedOnly()">🔍 Show Only Unmatched</button>Expand or collapse all nested field comparison details at once.
<button onclick="expandAllDetails()">➕ Expand All Details</button>
<button onclick="collapseAllDetails()">➖ Collapse All Details</button>Each attribute with nested comparisons has an expandable section:
```html
<details>
<summary>🔍 View 6 Nested Field Comparisons</summary>
  <!-- Detailed comparison table -->
</details>
```

Aggregate scores for complex objects are clearly marked:
- Visual indicator: `<span class="aggregate-score">0.88</span>`
- Text annotation: `(aggregate)` appears next to the score
- Color coding: blue styling distinguishes aggregate from simple field scores
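For intuition, the 0.88 in the example above is consistent with a weighted mean of the nested field scores. The sketch below shows that plausible aggregation; the weights are an assumption for illustration, not necessarily sticker-eval's exact formula:

```python
# Hypothetical aggregation sketch: weighted mean of nested field scores.
# Field weights are assumed to be 2.0 here, matching the example where
# score 1.0 yields weighted_score 2.0 and score 0.75 yields 1.5.
details = [
    {"score": 1.0, "weight": 2.0},   # weighted_score = 2.0
    {"score": 0.75, "weight": 2.0},  # weighted_score = 1.5
]
total_weight = sum(d["weight"] for d in details)
aggregate = sum(d["score"] * d["weight"] for d in details) / total_weight
print(round(aggregate, 2))  # 0.88  (3.5 / 4.0 = 0.875)
```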
The JSON report (`results.json`) includes:
```json
{
  "document_id": "doc-123",
  "overall_metrics": { ... },
  "section_results": [
    {
      "section_id": "section-001",
      "document_class": "Invoice",
      "metrics": { ... },
      "attributes": [
        {
          "name": "AttributeName",
          "expected": "...",
          "actual": "...",
          "matched": true,
          "score": 0.95,
          "field_comparison_details": [  // NEW in v0.1.4
            { /* detailed comparison */ }
          ]
        }
      ]
    }
  ]
}
```

The markdown report (`report.md`) includes:
- Interactive Controls - Filter and navigation buttons
- Summary Section - High-level metrics with visual indicators
- Section Details - Per-section metrics and attributes
- Attribute Table - Enhanced with:
  - Row classes for filtering (`matched-row`, `unmatched-row`)
  - Aggregate score annotations
  - Expandable nested details for complex fields
- Evaluation Methods - Documentation of comparison methods
Typical usage from Python:

```python
from idp_common.evaluation.service import EvaluationService

# Initialize service
eval_service = EvaluationService(region="us-east-1", config=config)

# Evaluate document (field_comparisons automatically enabled)
result_doc = eval_service.evaluate_document(
    actual_document=actual_doc,
    expected_document=expected_doc,
    store_results=True,  # Generates both JSON and Markdown
)

# Access detailed comparisons programmatically
for section in result_doc.evaluation_result.section_results:
    for attr in section.attributes:
        if attr.field_comparison_details:
            print(f"Attribute: {attr.name}")
            print(f"Aggregate Score: {attr.score}")
            print(f"Nested Comparisons: {len(attr.field_comparison_details)}")
            for detail in attr.field_comparison_details:
                if not detail['match']:
                    print(f"  Mismatch: {detail['expected_key']}")
                    print(f"    Expected: {detail['expected_value']}")
                    print(f"    Actual: {detail['actual_value']}")
                    print(f"    Score: {detail['score']}")
```

GitHub's markdown renderer supports HTML, so the interactive controls will work when viewing the report in:
- Pull requests
- Issue comments
- Repository files
In VS Code, install a markdown extension that supports HTML:
- Markdown Preview Enhanced (recommended)
- Markdown All in One
Open the .md file directly in a browser:
```bash
open test_evaluation_report.md
```

In a Jupyter notebook, use `IPython.display.Markdown`:
```python
from IPython.display import Markdown, display

with open('evaluation/report.md', 'r') as f:
    display(Markdown(f.read()))
```

No additional configuration required! The enhancement automatically activates when using sticker-eval v0.1.4+.
The feature is enabled in `lib/idp_common_pkg/idp_common/evaluation/service.py`:
```python
# Compare using Stickler with field_comparisons enabled
stickler_result = expected_instance.compare_with(
    actual_instance,
    document_field_comparisons=True,  # Enables detailed comparison
)
```

- Quickly identify which specific nested fields are causing mismatches
- See exact values that differ within complex objects
- Understand Hungarian matching results for arrays (see the sketch after this list)
- Filter to show only unmatched rows
- Focus attention on fields requiring investigation
- Reduce cognitive load when reviewing large reports
- Aggregate scores provide high-level overview
- Nested details provide granular diagnostics
- Both perspectives available in single report
- JSON structure fully captures all comparison data
- Can be consumed by analytics tools
- Markdown provides human-readable interface
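The following is a minimal, self-contained sketch of what Hungarian matching means for array comparison. It illustrates the concept using `scipy` and `difflib`; it is not sticker-eval's actual implementation:

```python
from difflib import SequenceMatcher

import numpy as np
from scipy.optimize import linear_sum_assignment

expected = ["Service A", "Service B", "Service C"]
actual = ["Service B", "Service A", "Service D"]

# Similarity matrix: one row per expected item, one column per actual item.
sim = np.array([
    [SequenceMatcher(None, e, a).ratio() for a in actual]
    for e in expected
])

# The Hungarian algorithm minimizes total cost, so negate the similarities
# to find the pairing that maximizes total similarity instead.
rows, cols = linear_sum_assignment(-sim)
for i, j in zip(rows, cols):
    print(f"expected[{i}] paired with actual[{j}], similarity={sim[i, j]:.2f}")
```

Once items are paired this way, each pair can be compared field by field, which is presumably why entries like `LineItems[0].Description` line up correctly even when array order differs.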
`AttributeEvaluationResult` now includes:
```python
@dataclass
class AttributeEvaluationResult:
    # ... existing fields ...
    field_comparison_details: Optional[List[Dict[str, Any]]] = None
```

Each comparison in `field_comparison_details`:
```json
{
  "expected_key": "path.to.field",   // Dot/bracket notation
  "expected_value": "expected value",
  "actual_key": "path.to.field",
  "actual_value": "actual value",
  "match": true,                     // Boolean match result
  "score": 0.95,                     // Similarity score (0.0-1.0)
  "weighted_score": 1.9,             // score * field_weight
  "reason": "explanation"            // Human-readable reason
}
```

Field comparisons are grouped by root field name:
- `LineItems[0].Description` → grouped under `LineItems`
- `Address.City` → grouped under `Address`
- Simple fields have no grouping (single comparison or none)
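The same grouping can be reproduced from the keys alone. The helper below is hypothetical (not sticker-eval's internal code), but shows the root-extraction rule:

```python
import re
from collections import defaultdict

def root_field(key: str) -> str:
    """Return the root of a dot/bracket path,
    e.g. 'LineItems[0].Description' -> 'LineItems'."""
    match = re.match(r"[^.\[]+", key)
    return match.group(0) if match else key

def group_by_root(details):
    """Group comparison entries by the root of their expected_key."""
    groups = defaultdict(list)
    for detail in details:
        groups[root_field(detail["expected_key"])].append(detail)
    return dict(groups)

details = [
    {"expected_key": "LineItems[0].Description", "match": True},
    {"expected_key": "LineItems[1].Description", "match": False},
    {"expected_key": "Address.City", "match": True},
]
print(list(group_by_root(details)))  # ['LineItems', 'Address']
```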
The enhancement is fully backward compatible:
- ✅ Existing API unchanged
- ✅ JSON reports remain consumable by old code (new field is optional)
- ✅ Markdown reports viewable in any viewer (controls degrade gracefully)
- ✅ No configuration changes required
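As a minimal illustration of that compatibility, consumers that predate the new field never touch it, and newer code can read it defensively since the key is optional:

```python
# An attribute entry from an older report, with no field_comparison_details.
attr = {"name": "InvoiceDate", "matched": True, "score": 1.0}

# Old code reading only the original keys is unaffected; new code can
# opt in safely because the key may be absent or None.
details = attr.get("field_comparison_details") or []
print(f"{attr['name']}: {len(details)} nested comparisons")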
See `test_evaluation_enhancements.py` for complete working examples demonstrating:
- Nested object comparisons
- Array item comparisons
- Aggregate score calculations
- Interactive report generation
Run the test:
```bash
python test_evaluation_enhancements.py
```

This generates `test_evaluation_report.md` demonstrating all features.
Potential future improvements:
- Export to CSV with nested details flattened
- Comparison history tracking across runs
- Threshold recommendations based on field mismatch patterns
- Visual diff viewer for nested structures
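As a starting point for the CSV idea, the nested details flatten naturally given the `results.json` structure documented above. This is a sketch only; the input path and column selection are assumptions:

```python
import csv
import json

COLUMNS = ["section_id", "attribute", "expected_key", "expected_value",
           "actual_value", "match", "score", "weighted_score"]

with open("evaluation/results.json") as f:  # assumed output location
    report = json.load(f)

with open("evaluation_details.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for section in report["section_results"]:
        for attr in section["attributes"]:
            # One CSV row per nested comparison, tagged with its context.
            for detail in attr.get("field_comparison_details") or []:
                writer.writerow({
                    "section_id": section["section_id"],
                    "attribute": attr["name"],
                    **{key: detail.get(key) for key in COLUMNS[2:]},
                })
```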