Summary
Two issues in ClassificationService reduce classification accuracy and prevent downstream consumers from identifying uncertain matches for human review.
Both affect Pattern 2 (and the shared idp_common library used across patterns).
Issue 1: Bedrock classification confidence is hardcoded to 1.0
Bedrock models return meaningful confidence scores in their classification responses, but classify_page_bedrock discards them and hardcodes confidence=1.0:
lib/idp_common_pkg/idp_common/classification/service.py ~line 1337:
return PageClassification(
    page_id=page_id,
    classification=DocumentClassification(
        doc_type=doc_type,
        confidence=1.0,  # Default confidence
        metadata={...},
    ),
)
The same hardcoded value appears on several other paths in the same file:
| Path | Approx. line | Confidence value |
|---|---|---|
| Bedrock classification | ~1337 | 1.0 |
| SageMaker/UDOP | ~1433 | 1.0 |
| Holistic/packet classification (segment pages) | ~2601 | 1.0 |
| Regex shortcut | ~1170 | 1.0 |
| Max-pages fallback | ~326 | 1.0 |
| Single-class config | ~2500 | 1.0 |
| Section splitting disabled | ~2080 | 1.0 |
The Section model also defaults to 1.0:
# lib/idp_common_pkg/idp_common/models.py ~line 57
classification: str
confidence: float = 1.0
Impact: Consumers of the classification output (UIs, evaluation pipelines, downstream automation) cannot distinguish high-confidence from low-confidence classifications. This prevents flagging uncertain sections for human review, which is critical for document processing accuracy.
Suggestion: Parse and propagate the confidence value from Bedrock's response. For paths where the model genuinely doesn't return confidence (SageMaker, regex shortcuts), the 1.0 default is reasonable, but it should be distinguished from real model confidence (e.g., via a confidence_source metadata field).
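A minimal sketch of what the parsing step could look like. It assumes the model is prompted to reply with a JSON object containing a numeric "confidence" key; that key name, the helper name extract_confidence, and the "model"/"default" labels are illustrative assumptions and not part of the current service.py.

```python
import json
import math
from typing import Tuple

DEFAULT_CONFIDENCE = 1.0  # same fallback value the service hardcodes today


def extract_confidence(response_text: str) -> Tuple[float, str]:
    """Return (confidence, confidence_source) for a model reply.

    Hypothetical helper: assumes the reply is a JSON object with a numeric
    "confidence" key. Anything else falls back to the default and is
    labeled "default" so consumers can tell the two cases apart.
    """
    try:
        payload = json.loads(response_text)
    except (json.JSONDecodeError, TypeError):
        return DEFAULT_CONFIDENCE, "default"

    if not isinstance(payload, dict):
        return DEFAULT_CONFIDENCE, "default"

    raw = payload.get("confidence")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return DEFAULT_CONFIDENCE, "default"

    value = float(raw)
    if math.isnan(value):
        return DEFAULT_CONFIDENCE, "default"

    # Clamp to [0.0, 1.0] in case the model returns an out-of-range number.
    return min(1.0, max(0.0, value)), "model"
```

The returned pair could then feed DocumentClassification, with the second element stored under metadata["confidence_source"]. Paths that never call a model (regex shortcut, single-class config) would set their own source label alongside the 1.0 value.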
Issue 2: Classification prompt lacks schema field names
The class list injected into Bedrock prompts only includes type name + description:
lib/idp_common_pkg/idp_common/classification/service.py ~line 741:
def _format_classes_list(self) -> str:
    return "\n".join(
        [
            f"{doc_type.type_name} \t[ {doc_type.description} ]"
            for doc_type in self.document_types
        ]
    )
The holistic classification path (_format_classes_and_descriptions, ~line 2332) similarly uses only type + description in a markdown table.
Schema field names (e.g., property_address, appraised_value, inspection_date) from the JSON Schema properties are available in config.classes but are never included in the prompt. These field names are the strongest semantic signal for disambiguation: when Bedrock sees page text containing "Appraised Value: $450,000" and the class list includes fields like appraised_value, effective_date, borrower_name, the match becomes significantly more accurate.
Impact: When multiple document types have similar names or descriptions (e.g., "Appraisal Reports" vs "Inspection Reports"), the model lacks sufficient signal to disambiguate, resulting in misclassification.
Suggestion: Append a subset of schema field names (e.g., top 10-15) to each class entry in the prompt. This is a low-risk change since it only adds content to the prompt without changing its structure, and stays well within model context limits.
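A rough sketch of the proposed change, written as a standalone function so it runs on its own. The DocumentType dataclass, its properties attribute, the max_fields parameter, and the "(fields: ...)" suffix are hypothetical stand-ins; the real document type objects in idp_common may expose the schema differently. It takes the first N field names in declaration order, which is only one possible reading of "top 10-15".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentType:
    """Hypothetical stand-in for the service's document type object."""
    type_name: str
    description: str
    # Assumed to mirror the JSON Schema "properties" mapping of the class.
    properties: Dict[str, dict] = field(default_factory=dict)


def format_classes_list(
    document_types: List[DocumentType], max_fields: int = 12
) -> str:
    """Format one line per class, appending up to max_fields field names."""
    lines = []
    for doc_type in document_types:
        # Same prefix as the existing _format_classes_list output.
        line = f"{doc_type.type_name} \t[ {doc_type.description} ]"
        # Keep declaration order and cap the list to bound prompt size.
        field_names = list(doc_type.properties)[:max_fields]
        if field_names:
            line += f" \t(fields: {', '.join(field_names)})"
        lines.append(line)
    return "\n".join(lines)
```

Classes without schema properties keep exactly the current format, so existing prompts for such configs would be unchanged.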
Environment
- Version: v0.4.16
- File: lib/idp_common_pkg/idp_common/classification/service.py