Go Parser Validator Audit#5834
Conversation
- Update FRA and Tribal schemas with validators - Add tests surrounding FRA and Tribal - Get rid of SerializeHeaderError in favor of generic SerializeParserError
- Remove extra field on m3 - fix tanf t7 schema - Update validator registry to not duplicate validators accross the same segment fields - Make required field validation easier to read
…lize sentinel errors
…n calendar quarter - Add record calendar quarter validator - Fix mis alignments in tribal validators
- Update incorrect python test and added comment justification
- Add FRA sections
- Uncomment FRA header tests
… fields - Add interface to decoder row to allow the row to dictate how to extract it's key fields for grouping and sorting - Update sortable to accept key fields instead of an extractor - Update accumulator to use new row interface - Clean out old static key fields config throughout
…ed record counts by running python version and updated python record counts too. - Add new sentinel error for decoder unknown - Add unit tests for server cover decoder unknown path
…/raft-tech/TANF-app into 5728-go-validator-audit
- Update file specs to use the correct cat4 validators - Add duplicate detection to s3 and s4 files - Rewrite requires_related_record & requires_related_record_with_int_value to execute per record - Remove unused compiled functions
- remove extra field from tests
- Disable keycloak sync for all other tests
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #5834 +/- ##
===========================================
+ Coverage 94.00% 94.01% +0.01%
===========================================
Files 538 538
Lines 24650 24819 +169
Branches 620 620
===========================================
+ Hits 23171 23334 +163
- Misses 1366 1372 +6
Partials 113 113
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
jtimpe
left a comment
There was a problem hiding this comment.
I tested most of the submissions by uploading them through the frontend to the python parser, then running them through the go parser in dry-run mode and comparing the outputs. Only noticed a few small inconsistencies
| expected_t3_record_count = 172525 | ||
| expected_t1_record_count = 96497 | ||
| expected_t2_record_count = 112622 | ||
| expected_t3_record_count = 172552 |
There was a problem hiding this comment.
These are the counts the python parser reported when I ran it. I don't remember the last time we actaully enabled this test. My best guess is that we introduced new record precheck or case consistency checks that changed the number.
| params: | ||
| record_type: T6 | ||
|
|
||
| validation_orchestrator: |
There was a problem hiding this comment.
does this need to be defined for every section file? they should be the same across any of the submissions, no?
There was a problem hiding this comment.
Indeed it does right now. We could setup a default definition for it to keep it DRYer. The reason I didn't do that initially is because we don't have a schema def/document for the complete shape of our yaml files yet. So everything we default and leave out is obscurred.
| # Education level validator - 0-16 or 98-99 | ||
| - id: education_level | ||
| expr: "(int(Value) >= 0 and int(Value) <= 16) or (int(Value) >= 98 and int(Value) <= 99)" | ||
| expr: "(int(trim(string(Value))) >= 0 and int(trim(string(Value))) <= 16) or (int(trim(string(Value))) >= 98 and int(trim(string(Value))) <= 99)" |
There was a problem hiding this comment.
maybe defining field type: integer could do some sort of pre-validation transformation so that int(trim(string(Value))) doesn't have to be done so many times?
There was a problem hiding this comment.
Yes, I really want to update all the schemas to treat actual integer values as integers at parse time instead of a string and then converting. I think this would save us a bunch of bandwidth, but it requires major migrations.
| @@ -73,12 +73,13 @@ accumulator: | |||
| # These byte positions are used BEFORE full parsing to quickly | |||
| # determine which case a record belongs to. | |||
There was a problem hiding this comment.
TANF section 2 files seem to have a bit of drift in the number of parser errors generated (638 for celery parser, 582 for go parser using the file ADS.E2J.FTP2.TS06)
There was a problem hiding this comment.
I will dig in and see what the extra errors are that the python parser is generating.
There was a problem hiding this comment.
@jtimpe I have found the reason why we have different number of errors between the two. The difference between the two sets came down to 2 different validators. The first is a difference between pythons category3.isOlderThan(18) and the go parser equivalent age calculation. The go parser actually uses the correct age calculation (the one used in the AGE_FIRST view calculation cc @mattcoleanderson ). Because of this, around half of the extra errors python generates that go doesn't is because they are actually valid from go's perspective/don't actually trigger the category 3 validation.
The second issue is that python generates the deprecated category four errors still. Go doesn't even implement them since they are going away.
|
|
||
| # Tribal TANF allows WORK_PART_STATUS 01-03, 05-09, 11-19, or 99. | ||
| - id: tribal_t2_family_affil_1_2_work_part_status | ||
| expr: "if GetInt('FAMILY_AFFILIATION') in [1, 2] { (GetInt('WORK_PART_STATUS') in 1..3) or (GetInt('WORK_PART_STATUS') in 5..9) or (GetInt('WORK_PART_STATUS') in 11..19) or GetString('WORK_PART_STATUS') == '99' } else { true }" |
There was a problem hiding this comment.
two things about this validator, i assume this applies to any validators of this style
thefield_nameanditem_numberdon't seem to be reported (at least in the dry run, the column comes back<nil>)- the python parser reports the given value for
FAMILY_AFFILIATION(Since Item 26 (Family Affiliation) is 1...)
apologies - please disregard my review comments for the time being. i need to re-verify everything as i believe i tested the wrong branch 🤦 |
|
@elipe17 i tested again and deleted the comments that were resolved. my only remaining blocker is the tanf s2 error count drift. otherwise looks great! |
Summary of Changes
Pull request closes Go Parser: Audit validators against Python parser for completeness #5728
How to Test
Deliverables
More details on how deliverables herein are assessed included here.
Deliverable 1: Accepted Features
Checklist of ACs:
Deliverable 2: Tested Code
CodeCov Reportcomment in PR)CodeCov Reportcomment in PR)Deliverable 3: Properly Styled Code
Deliverable 4: Accessible
iamjollyandttran-hubusing Accessibility Insights reveal any errors introduced in this PR?Deliverable 5: Deployed
Deliverable 6: Documented
Deliverable 7: Secure
Deliverable 8: User Research
Research product(s) clearly articulate(s):