Go Parser Validator Audit by elipe17 · Pull Request #5834 · raft-tech/TANF-app

elipe17 · 2026-05-06T02:46:09Z

Summary of Changes

Aligned remaining yaml schemas and file specs with python
Added go equivalent integrations tests for most of the remaining python parser integration tests
Added to and updated validators in validators.yaml where needed
Updated the grouping KeyFields config to allow for completely generic key field specification
Updated Decoder.Row to know how to extract it's key fields based on the config
Update tabular decoders to mirror python logic for reading the first row of an FRA datafile for a "header" check
Added new sentinel error for decoder unknown parser errors
Updated the sortable interface to leverage Decoder.Row interface instead of the thin adapter we had before (KeyFieldExtractor)
Updated accumulator/pipeline/orchestrators to adhere to new Decoder.Row interface
Fixed bug around rollbacks and parsing cancellation that was preventing sentinel errors from being written
Add FRA sections for celery server with a TODO caveat
Updated validation key to be schema path instead of record type since TANF and Tribal would have conflicted
Fixed bug where schemas with multiple segments could duplicate field validators across segment fields
Left some other TODOs that I will turn into follow on tickets
Pull request closes Go Parser: Audit validators against Python parser for completeness #5728

How to Test

cd tdrs-frontend && docker-compose up --build
cd tdrs-backend && docker-compose up --build

Submit pretty much any of the test files to the go parser and audit the results
Run the tests
Try and identify other gaps that I have missed that can be turned into follow on tickets

Deliverables

More details on how deliverables herein are assessed included here.

Deliverable 1: Accepted Features

Checklist of ACs:

Every preparsing_validator in Python schema_defs has a corresponding Go validator
Every field validator in Python schema_defs has a corresponding Go validator
Every postparsing_validator in Python schema_defs has a corresponding Go validator
Checklist produced documenting any gaps found
All identified gaps are either resolved or tracked as separate issues
Testing Checklist has been run and all tests pass
README is updated, if necessary

Deliverable 2: Tested Code

Are all areas of code introduced in this PR meaningfully tested?
- If this PR introduces backend code changes, are they meaningfully tested?
- If this PR introduces frontend code changes, are they meaningfully tested?
Are code coverage minimums met?
- Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
- Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

Are backend code style checks passing on CircleCI?
Are frontend code style checks passing on CircleCI?
Are code maintainability principles being followed?

Deliverable 4: Accessible

Does this PR complete the epic?
Are links included to any other gov-approved PRs associated with epic?
Does PR include documentation for Raft's a11y review?
Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

Does this PR provide background for why coding decisions were made?
If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
If this PR introduces dependencies, are their licenses documented?
Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

Does the OWASP Scan pass on CircleCI?
Do manual code review and manual testing detect any new security issues?
If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

the purpose of the research
methods used to conduct the research
who participated in the research
what was tested and how
impact of research on TDP
(if applicable) final design mockups produced for TDP development

- Update FRA and Tribal schemas with validators - Add tests surrounding FRA and Tribal - Get rid of SerializeHeaderError in favor of generic SerializeParserError

- Remove extra field on m3 - fix tanf t7 schema - Update validator registry to not duplicate validators accross the same segment fields - Make required field validation easier to read

…lize sentinel errors

…n calendar quarter - Add record calendar quarter validator - Fix mis alignments in tribal validators

- Update incorrect python test and added comment justification

- Add FRA sections

- Uncomment FRA header tests

… fields - Add interface to decoder row to allow the row to dictate how to extract it's key fields for grouping and sorting - Update sortable to accept key fields instead of an extractor - Update accumulator to use new row interface - Clean out old static key fields config throughout

…ed record counts by running python version and updated python record counts too. - Add new sentinel error for decoder unknown - Add unit tests for server cover decoder unknown path

…/raft-tech/TANF-app into 5728-go-validator-audit

- Update file specs to use the correct cat4 validators - Add duplicate detection to s3 and s4 files - Rewrite requires_related_record & requires_related_record_with_int_value to execute per record - Remove unused compiled functions

- remove extra field from tests

- Disable keycloak sync for all other tests

codecov · 2026-05-07T18:53:12Z

Codecov Report

❌ Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.01%. Comparing base (4c3f482) to head (23cf207).
⚠️ Report is 34 commits behind head on develop.

Files with missing lines	Patch %	Lines
.../tdpservice/parsers/test/test_parse_large_files.py	0.00%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #5834      +/-   ##
===========================================
+ Coverage    94.00%   94.01%   +0.01%     
===========================================
  Files          538      538              
  Lines        24650    24819     +169     
  Branches       620      620              
===========================================
+ Hits         23171    23334     +163     
- Misses        1366     1372       +6     
  Partials       113      113

Flag	Coverage Δ
dev-backend	`94.30% <81.25%> (+0.01%)`	⬆️
dev-frontend	`91.84% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
tdrs-backend/tdpservice/conftest.py	`94.47% <100.00%> (+0.08%)`	⬆️
tdrs-backend/tdpservice/core/test/test_admin.py	`100.00% <ø> (ø)`
tdrs-backend/tdpservice/parsers/test/conftest.py	`97.72% <ø> (ø)`
tdrs-backend/tdpservice/parsers/test/test_parse.py	`99.78% <100.00%> (-0.01%)`	⬇️
...ackend/tdpservice/users/test/test_keycloak_sync.py	`100.00% <100.00%> (ø)`
.../tdpservice/parsers/test/test_parse_large_files.py	`80.00% <0.00%> (ø)`

... and 4 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4740d5c...23cf207. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jtimpe

I tested most of the submissions by uploading them through the frontend to the python parser, then running them through the go parser in dry-run mode and comparing the outputs. Only noticed a few small inconsistencies

jtimpe · 2026-05-14T17:34:49Z

-        expected_t3_record_count = 172525
+        expected_t1_record_count = 96497
+        expected_t2_record_count = 112622
+        expected_t3_record_count = 172552


why did the record counts change?

These are the counts the python parser reported when I ran it. I don't remember the last time we actaully enabled this test. My best guess is that we introduced new record precheck or case consistency checks that changed the number.

jtimpe · 2026-05-14T17:49:25Z

+    params:
+      record_type: T6
+
+validation_orchestrator:


does this need to be defined for every section file? they should be the same across any of the submissions, no?

Indeed it does right now. We could setup a default definition for it to keep it DRYer. The reason I didn't do that initially is because we don't have a schema def/document for the complete shape of our yaml files yet. So everything we default and leave out is obscurred.

jtimpe · 2026-05-14T17:57:52Z

  # Education level validator - 0-16 or 98-99
  - id: education_level
-    expr: "(int(Value) >= 0 and int(Value) <= 16) or (int(Value) >= 98 and int(Value) <= 99)"
+    expr: "(int(trim(string(Value))) >= 0 and int(trim(string(Value))) <= 16) or (int(trim(string(Value))) >= 98 and int(trim(string(Value))) <= 99)"


maybe defining field type: integer could do some sort of pre-validation transformation so that int(trim(string(Value))) doesn't have to be done so many times?

Yes, I really want to update all the schemas to treat actual integer values as integers at parse time instead of a string and then converting. I think this would save us a bunch of bandwidth, but it requires major migrations.

jtimpe · 2026-05-15T12:32:18Z

@@ -73,12 +73,13 @@ accumulator:
  # These byte positions are used BEFORE full parsing to quickly
  # determine which case a record belongs to.


TANF section 2 files seem to have a bit of drift in the number of parser errors generated (638 for celery parser, 582 for go parser using the file ADS.E2J.FTP2.TS06)

I will dig in and see what the extra errors are that the python parser is generating.

@jtimpe I have found the reason why we have different number of errors between the two. The difference between the two sets came down to 2 different validators. The first is a difference between pythons category3.isOlderThan(18) and the go parser equivalent age calculation. The go parser actually uses the correct age calculation (the one used in the AGE_FIRST view calculation cc @mattcoleanderson ). Because of this, around half of the extra errors python generates that go doesn't is because they are actually valid from go's perspective/don't actually trigger the category 3 validation.

The second issue is that python generates the deprecated category four errors still. Go doesn't even implement them since they are going away.

jtimpe · 2026-05-15T12:43:47Z


+  # Tribal TANF allows WORK_PART_STATUS 01-03, 05-09, 11-19, or 99.
+  - id: tribal_t2_family_affil_1_2_work_part_status
+    expr: "if GetInt('FAMILY_AFFILIATION') in [1, 2] { (GetInt('WORK_PART_STATUS') in 1..3) or (GetInt('WORK_PART_STATUS') in 5..9) or (GetInt('WORK_PART_STATUS') in 11..19) or GetString('WORK_PART_STATUS') == '99' } else { true }"


two things about this validator, i assume this applies to any validators of this style

~~the field_name and item_number don't seem to be reported (at least in the dry run, the column comes back <nil>)~~

the python parser reports the given value for FAMILY_AFFILIATION (Since Item 26 (Family Affiliation) is 1...)

jtimpe · 2026-05-15T14:49:31Z

I tested most of the submissions by uploading them through the frontend to the python parser, then running them through the go parser in dry-run mode and comparing the outputs. Only noticed a few small inconsistencies

apologies - please disregard my review comments for the time being. i need to re-verify everything as i believe i tested the wrong branch 🤦

jtimpe · 2026-05-15T19:27:44Z

@elipe17 i tested again and deleted the comments that were resolved. my only remaining blocker is the tanf s2 error count drift. otherwise looks great!

elipe17 added 9 commits May 5, 2026 09:45

- Update FRA and Tribal filespecs with orchestrator info

fa786d9

- Update FRA and Tribal schemas with validators - Add tests surrounding FRA and Tribal - Get rid of SerializeHeaderError in favor of generic SerializeParserError

- Start commenting in more tests

58464de

- Remove extra field on m3 - fix tanf t7 schema - Update validator registry to not duplicate validators accross the same segment fields - Make required field validation easier to read

- Fix issue where a rollback cold throw a runtime error and not seria…

8a0cb31

…lize sentinel errors

- Update section 3 and 4 schemas to have correct quarter validation o…

bcf5a35

…n calendar quarter - Add record calendar quarter validator - Fix mis alignments in tribal validators

- Current passing go parser tests. All tribal.

9365016

- Update incorrect python test and added comment justification

- Add application octet for columnar files

c564483

- Add FRA sections

- Update columnar decoders to mirror the python FRA read first logic

66af61a

- Uncomment FRA header tests

- Add super large file tests to go parser run since it is fast. Verfi…

bf2f568

…ed record counts by running python version and updated python record counts too. - Add new sentinel error for decoder unknown - Add unit tests for server cover decoder unknown path

elipe17 self-assigned this May 6, 2026

elipe17 added backend dev raft review This issue is ready for raft review labels May 6, 2026

elipe17 requested review from jtimpe, mattcoleanderson and raftmsohani May 6, 2026 02:58

elipe17 added 8 commits May 5, 2026 21:59

Merge branch '5727-go-parser-rollback-handling' of https://github.com…

4b0e7d2

…/raft-tech/TANF-app into 5728-go-validator-audit

- Add duplicate tests to go parse integration tests

98854c0

- Update file specs to use the correct cat4 validators - Add duplicate detection to s3 and s4 files - Rewrite requires_related_record & requires_related_record_with_int_value to execute per record - Remove unused compiled functions

- linting

ee6ef74

- Make tests thread safe

40c9adf

- remove extra field from tests

- Add better handling for csv binary files

3382b5e

- Disable super big file on python side

a412377

- Remove print statements and "assert False"

6bd45a1

- Add consistent ordering clause

80bda54

- Disable keycloak sync for all other tests

Base automatically changed from 5727-go-parser-rollback-handling to develop May 13, 2026 04:47

raftmsohani approved these changes May 14, 2026

View reviewed changes

jtimpe reviewed May 15, 2026

View reviewed changes

elipe17 requested a review from jtimpe May 20, 2026 16:37

jtimpe approved these changes May 20, 2026

View reviewed changes

Merge branch 'develop' into 5728-go-validator-audit

23cf207

elipe17 merged commit abc9541 into develop May 21, 2026
14 checks passed

elipe17 deleted the 5728-go-validator-audit branch May 21, 2026 16:33

		@@ -73,12 +73,13 @@ accumulator:
		# These byte positions are used BEFORE full parsing to quickly
		# determine which case a record belongs to.

Conversation

elipe17 commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of Changes

How to Test

Deliverables

Uh oh!

codecov Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jtimpe left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jtimpe May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jtimpe commented May 15, 2026

Uh oh!

jtimpe commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

elipe17 commented May 6, 2026 •

edited

Loading

codecov Bot commented May 7, 2026 •

edited

Loading

jtimpe May 15, 2026 •

edited

Loading