Describe the bug
When I process the attached PDFs, most fields are incorrectly empty, and the single-page thumbnails are missing most of the filled-in data.
To Reproduce
Steps to reproduce the behavior:
- Deploy a fresh IDP.
- Click 'Discovery'. Upload VAF 21-22a_example_1.pdf.
- After discovery is finished, ensure that it created a new document class.
- Go to 'Upload Document(s)'. Upload VAF 21-22a_example_1.pdf. Use the default config. Click 'Upload'.
- Once the document status is COMPLETED, click on the document.
- Under 'Document Sections', click 'View Data'. Under 'Visual Editor', you'll see that most of the inputted data is missing from both the 'Document Pages' (the images) and 'Document Data' (the drop downs).
Expected behavior
- Thumbnail matches the original
- The extraction is mostly, if not 100%, correct
Screenshots
VAF 21-22a_example_1.pdf
VAF_21-22a_example_1.pdf_section1_prediction.json
AWS Region
us-east-1
Accelerator Processing Mode
Pipeline mode
Accelerator Version / Build
Which Version did you deploy? 0.5.2
Custom Stack Parameters
What non-default stack parameters did you configure when deploying? None
Custom Configurations
Ran a discovery job. Created a new class. See above
Output of the 'TroubleShoot' agent (if issue is a document processing failure)
Summary of Findings
The Critical Problem: While all processing stages completed successfully (Classification, Extraction, OCR, Summarization, etc.), the extraction is completely failing to extract key personal information - specifically the VeteranInformation, ClaimantInformation, and ServiceOrganizationInformation sections show 0.00 confidence scores across all fields.
Specific Issues Identified:
- Complete Data Extraction Failure on Key Sections:
  - VeteranInformation: 0/13 fields extracted (0% success rate)
  - ClaimantInformation: 0/5 fields extracted (0% success rate)
  - ServiceOrganizationInformation: 0/7 fields extracted (0% success rate)
  - All confidence scores are 0.00 across these critical sections
- Partial Success on Secondary Sections:
  - AuthorizationInformation: 3/3 fields extracted successfully (95-92% confidence)
  - ConditionsOfAppointment: 4/4 fields extracted successfully (99% confidence)
  - FeesAndPenalties: 2/2 fields extracted successfully (99% confidence)
- Output Image Issue: The fact that output images have "removed all inputted information" suggests that the extraction model is likely struggling with the document layout or image quality, potentially:
  - Document orientation issues
  - Low image quality or resolution
  - Text not being properly recognized by the OCR stage
  - Potential mismatch between document structure and the VA Form 21-22a schema
Recommended Next Steps:
- Verify the input PDF quality - Check if the document is clear, properly scanned, and in portrait orientation
- Review the OCR output - The OCR stage completed but may have failed to properly read the personal information fields
- Check document structure - Ensure the VAF form follows the standard VA Form 21-22a format
- Re-upload the document - Try processing the document again to see if this was a temporary extraction failure
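To put the agent's per-section numbers in one place, here is a small tally of the counts it reported. This is only a sketch: the counts are copied from the summary above, and the names `SECTION_COUNTS`, `overall_rate`, and `failed_sections` are made up for illustration and are not part of the accelerator.

```python
# Per-section counts copied from the TroubleShoot agent summary:
# section name -> (fields extracted, total fields)
SECTION_COUNTS = {
    "VeteranInformation": (0, 13),
    "ClaimantInformation": (0, 5),
    "ServiceOrganizationInformation": (0, 7),
    "AuthorizationInformation": (3, 3),
    "ConditionsOfAppointment": (4, 4),
    "FeesAndPenalties": (2, 2),
}


def overall_rate(counts):
    """Return (extracted, total, fraction) across all sections."""
    extracted = sum(e for e, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return extracted, total, extracted / total


def failed_sections(counts):
    """Return the names of sections where no field was extracted."""
    return [name for name, (e, _) in counts.items() if e == 0]


if __name__ == "__main__":
    extracted, total, rate = overall_rate(SECTION_COUNTS)
    print(f"{extracted}/{total} fields extracted ({rate:.0%})")
    print("Sections with nothing extracted:", ", ".join(failed_sections(SECTION_COUNTS)))
```

By these counts, 9 of the 34 fields were extracted, and every field that holds user-entered information came back empty.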
The PDF is normal quality. Everything is typed, so there are no text quality issues. The failure seems to be in the OCR stage; it extracts nothing. When I run the document in Textract directly, I see all of the text extracted properly.
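A standalone Textract check like the one described above could be scripted roughly as follows. This is a minimal sketch, not the accelerator's own OCR code: it assumes `boto3` is installed with valid AWS credentials, that the input is a single page (as far as I know, the synchronous `analyze_document` call does not accept multi-page PDFs), and the helper names `summarize_blocks` and `analyze_page` are hypothetical.

```python
from collections import Counter


def summarize_blocks(response):
    """Count what Textract returned in one AnalyzeDocument response."""
    blocks = response.get("Blocks", [])
    types = Counter(b["BlockType"] for b in blocks)
    # Form keys are KEY_VALUE_SET blocks whose EntityTypes include "KEY".
    form_keys = sum(
        1
        for b in blocks
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", [])
    )
    lines = [b.get("Text", "") for b in blocks if b["BlockType"] == "LINE"]
    return {
        "lines": types["LINE"],
        "words": types["WORD"],
        "form_keys": form_keys,
        "sample": lines[:5],  # first few lines, for a quick visual check
    }


def analyze_page(path, features=("TABLES", "FORMS", "SIGNATURES", "LAYOUT"),
                 region="us-east-1"):
    """Send one single-page file to Textract and summarize the result."""
    import boto3  # imported here so summarize_blocks works without AWS access

    client = boto3.client("textract", region_name=region)
    with open(path, "rb") as f:
        response = client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=list(features),
        )
    return summarize_blocks(response)
```

If `analyze_page` reports non-zero `lines` and `form_keys` for a page whose fields come back empty in the IDP output, that would suggest the text is lost somewhere before or around the accelerator's OCR step, not in Textract itself.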
FWIW, I found another bug with the 'Agent Companion Chat' during this conversation. If you leave the page while the agent is responding, and then you return to the page, it's frozen.
Link to DeepWiki answer
https://deepwiki.com/search/when-i-uploaded-vaf2122aexampl_b6924ee9-2536-4fb5-aadd-1bca24850784?mode=fast
I tried adding ["TABLES", "FORMS", "SIGNATURES", "LAYOUT"] to the Features, but it didn't help.