5566 Document parsing/reparsing workflow by raftmsohani · Pull Request #5821 · raft-tech/TANF-app

raftmsohani · 2026-05-01T12:22:53Z

Summary of Changes

Provide a brief summary of changes
Pull request closes #5566

How to Test

List the steps to test the PR
These steps are generic, please adjust as necessary.

cd tdrs-frontend && docker-compose up --build
cd tdrs-backend && docker-compose up --build

Open http://localhost:3000/ and sign in.
Proceed with functional tests as described herein.
Test steps should be captured in the demo GIF(s) and/or screenshots below.

Demo GIF(s) and screenshots for testing procedure

Deliverables

More details on how the deliverables herein are assessed are included here.

Deliverable 1: Accepted Features

Checklist of ACs:

[insert ACs here]
lfrohlich and/or adpennington confirmed that ACs are met.

Deliverable 2: Tested Code

Are all areas of code introduced in this PR meaningfully tested?
- If this PR introduces backend code changes, are they meaningfully tested?
- If this PR introduces frontend code changes, are they meaningfully tested?
Are code coverage minimums met?
- Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
- Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

Are backend code style checks passing on CircleCI?
Are frontend code style checks passing on CircleCI?
Are code maintainability principles being followed?

Deliverable 4: Accessible

Does this PR complete the epic?
Are links included to any other gov-approved PRs associated with epic?
Does PR include documentation for Raft's a11y review?
Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

Does this PR provide background for why coding decisions were made?
If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
If this PR introduces dependencies, are their licenses documented?
Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

Does the OWASP Scan pass on CircleCI?
Do manual code review and manual testing detect any new security issues?
If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

the purpose of the research
methods used to conduct the research
who participated in the research
what was tested and how
impact of research on TDP
(if applicable) final design mockups produced for TDP development

codecov · 2026-05-01T16:35:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.98%. Comparing base (12a2679) to head (d91ef10).
⚠️ Report is 24 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #5821   +/-   ##
========================================
  Coverage    93.98%   93.98%           
========================================
  Files          536      536           
  Lines        24527    24527           
  Branches       620      620           
========================================
  Hits         23051    23051           
  Misses        1363     1363           
  Partials       113      113

Flag	Coverage Δ
dev-backend	`94.26% <ø> (ø)`
dev-frontend	`91.84% <ø> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17c8db8...d91ef10.

jtimpe

I only did a high-level pass here; I didn't verify all the details. That said, a couple of points of feedback:

- I don't find the charts to be particularly useful, personally. I'm not quite sure how to read them, and there's a lot of complexity to parse through. The text flows make more sense to me.
- There are a lot of implementation details represented that I feel are unnecessary. Class names and files might be unavoidable, but references to kwargs, try/except, function/method names, etc. seem too in-the-weeds.

The way it is currently written is about as cognitively demanding as reading the code itself. Plus, as we make changes to these implementation details, we have to meticulously update the documentation alongside or it will go out of date. I'd prefer the documentation to cover the structure, behavior rules, and the "why" behind the implementation, rather than cover the implementation with a lot of detail. Perhaps including some examples of how certain structures or validators get used could be helpful.

Open to conversation and opinions on this, those are just my initial thoughts.

jtimpe · 2026-05-13T11:55:39Z

+        │       └── DataFile.create_new_version(...) creates the DataFile with file=None (state = UPLOADED)
+        │
+        ├── transition_datafile → VIRUS_SCAN_STARTED
+        ├── ClamAVClient.scan_file(...)       ← synchronous, in-request scan


Doesn't ClamAV scanning happen before DataFileSerializer.create()? Or create() calls DataFile.create_new_version, which calls the scan, but a scan failure blocks DataFile creation. I'm not sure how that should be represented here in terms of calls, but both this and the diagram above indicate (to me, anyway) that the scan runs in parallel with, or after, model creation.

raftmsohani · 2026-05-14

We changed that behavior so that state can be stored. We first create the DataFile without storing any file. ClamAV then scans during the request. If the scan fails, the DataFile remains in the failed-scan state for lifecycle visibility, but the uploaded file is not stored and no parse task is queued.
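For readers following the thread, the ordering described in this reply can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in the PR: `DataFileRecord`, `handle_upload`, the `scan_is_clean` callback, and the `VIRUS_SCAN_FAILED` and `STORED` state names are all hypothetical stand-ins. Only `UPLOADED` and `VIRUS_SCAN_STARTED` appear in the diff above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class State(Enum):
    UPLOADED = "uploaded"
    VIRUS_SCAN_STARTED = "virus_scan_started"
    VIRUS_SCAN_FAILED = "virus_scan_failed"  # hypothetical state name
    STORED = "stored"                        # hypothetical state name


@dataclass
class DataFileRecord:
    """Stand-in for the DataFile model (assumption, not the real Django model)."""
    state: State = State.UPLOADED
    file: Optional[bytes] = None


def handle_upload(
    content: bytes,
    scan_is_clean: Callable[[bytes], bool],
    parse_queue: List[DataFileRecord],
) -> DataFileRecord:
    # 1. Create the record first, with no file attached, so state can be tracked.
    record = DataFileRecord()
    # 2. Scan synchronously, inside the request.
    record.state = State.VIRUS_SCAN_STARTED
    if not scan_is_clean(content):
        # 3a. Keep the record for lifecycle visibility; store nothing, queue nothing.
        record.state = State.VIRUS_SCAN_FAILED
        return record
    # 3b. Only a clean file is stored and handed to the parser.
    record.file = content
    record.state = State.STORED
    parse_queue.append(record)
    return record
```

Under these assumptions, a failed scan leaves a record with `file=None` and an empty parse queue, which is the behavior the reply describes.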

jtimpe · 2026-05-13T12:27:50Z

+```
+ParserFactory.get_instance(**kwargs)
+        │
+        ├── pops program_type, is_program_audit from kwargs


The note about kwargs seems too implementation-detail-heavy to me. If we ever change from kwargs to args or some other method of passing the data, we have to go update the documentation.

Co-authored-by: jtimpe <111305129+jtimpe@users.noreply.github.com>

raftmsohani · 2026-05-14T13:01:44Z

> I only did a high-level pass here, I didn't go verify all the details. That said, a couple points of feedback:
>
> I don't find the charts to be particularly useful, personally. I'm not quite sure how to read them, and there's a lot of complexity to parse through. The text flows make more sense to me.
>
> There's a lot of implementation details represented that I feel are unnecessary. Class names and files might be unavoidable, but references to kwargs, try/except, function/method names, etc. seem too in-the-weeds.
>
> The way it is currently written is about as cognitively demanding as reading the code itself. Plus, as we make changes to these implementation details, we have to meticulously update the documentation alongside or it will go out of date. I'd prefer the documentation to cover the structure, behavior rules, and the "why" behind the implementation, rather than cover the implementation with a lot of detail. Perhaps including some examples of how certain structures or validators get used could be helpful.
>
> Open to conversation and opinions on this, those are just my initial thoughts.

I kind of agree with your observation, Jan. The intention of this documentation for now was to capture the functionality before we make any changes to parsing/reparsing, but I agree we should treat the code as the detailed documentation and use this document as high-level documentation that helps the reader better understand the flow.

elipe17 · 2026-05-14T15:06:47Z

I think we should also use this as an opportunity to clean up some old documentation/diagrams. I'd like to propose removing clean-and-reparse.md, create-elastic-kibana.md, nexus-repo.md, and parsing-flow.md. It would also probably be advantageous to update/delete the resources in docs/Technical-Documentation/diagrams. What are you guys' thoughts? @jtimpe @mattcoleanderson @raftmsohani

jtimpe

I think this is coming along very nicely! I like the behavior-documentation approach much better than implementation details. I think this could use a bit more detail on:

- the fixed-width vs. columnar files/decoders (it's mentioned briefly in the FRA validation example, but that's all I see)
- how the task selects which parser to use based on the program/section (this might be hard to do without providing a lot of implementation detail, so feel free to adjust as you see fit)
- schema definitions and how validators are defined, as well as the order of operations for the different validation layers

Overall, looking very good!
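As one possible shape for the first two points, here is a small behavior-level sketch of fixed-width versus columnar decoding, with selection keyed on program and section. Everything in it is an assumption made for illustration: the field layout, the column order, the `DECODER_FOR` mapping, and the function names are hypothetical and are not taken from the TDP codebase.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical fixed-width layout: (field name, start offset, end offset).
FIXED_LAYOUT: List[Tuple[str, int, int]] = [
    ("record_type", 0, 2),
    ("report_month", 2, 8),
    ("case_number", 8, 19),
]

# Hypothetical column order for a columnar (CSV-style) file.
COLUMN_NAMES = ["report_month", "case_number"]


def decode_fixed_width(text: str) -> List[Row]:
    """Each line is one record; fields are found by character position."""
    return [
        {name: line[start:end].strip() for name, start, end in FIXED_LAYOUT}
        for line in text.splitlines()
        if line.strip()
    ]


def decode_columnar(text: str) -> List[Row]:
    """Each row is one record; fields are found by column order."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMN_NAMES, row)) for row in reader if row]


# Hypothetical mapping; the real selection rules live in the code.
DECODER_FOR: Dict[Tuple[str, int], Callable[[str], List[Row]]] = {
    ("TANF", 1): decode_fixed_width,
    ("FRA", 1): decode_columnar,
}


def decode(program_type: str, section: int, text: str) -> List[Row]:
    try:
        decoder = DECODER_FOR[(program_type, section)]
    except KeyError:
        raise ValueError(f"no decoder for {program_type!r} section {section}")
    return decoder(text)
```

The point of the sketch is that both decoders hand the same row shape to later validation, so the documentation could describe the decoders once and the validation layers separately.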

jtimpe · 2026-05-28T16:10:08Z

+4. It applies cross-record rules such as case consistency and duplicate handling.
+5. It computes the final `DataFileSummary.status`.
+6. It maps the parser outcome back onto `DataFile.state`.
+7. It generates an error report and stores it on the summary.


Suggested change

-7. It generates an error report and stores it on the summary.
+7. It generates an error report, performs aggregate calculations according to the file type, and stores them on the summary.
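To make steps 5 through 7 concrete, a rough sketch follows. It assumes a simplified `Summary` object, and the status values, the state names in `STATE_FOR_STATUS`, and the rule for choosing a status are all hypothetical. The real rules are whatever the parser and `DataFileSummary` implement.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from summary status to file state.
STATE_FOR_STATUS = {
    "ACCEPTED": "PARSED",
    "ACCEPTED_WITH_ERRORS": "PARSED_WITH_ERRORS",
    "REJECTED": "PARSE_FAILED",
}


@dataclass
class Summary:
    """Stand-in for DataFileSummary (assumption, not the real model)."""
    status: str = "PENDING"
    error_report: List[str] = field(default_factory=list)
    aggregates: Dict[str, object] = field(default_factory=dict)


def finalize(records: List[dict], errors: List[dict], summary: Summary) -> str:
    """Compute the status, store report and aggregates, return the file state."""
    rejected = {e["record_index"] for e in errors}
    # Step 5: compute the final status (illustrative rule only).
    if not records or len(rejected) == len(records):
        summary.status = "REJECTED"
    elif errors:
        summary.status = "ACCEPTED_WITH_ERRORS"
    else:
        summary.status = "ACCEPTED"
    # Step 7: store the error report and the aggregate calculations.
    summary.error_report = [e["message"] for e in errors]
    summary.aggregates = {
        "total": len(records),
        "rejected": len(rejected),
        "by_type": dict(Counter(r["record_type"] for r in records)),
    }
    # Step 6: map the outcome back onto the file state.
    return STATE_FOR_STATUS[summary.status]
```

In this sketch the aggregates are plain counts; the suggestion above notes that the real calculations vary by file type.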

elipe17 · 2026-05-28T19:39:37Z

> I think we should also use this as an opportunity to clean up some old documentation/diagrams. I'd like to propose removing clean-and-reparse.md, create-elastic-kibana.md, nexus-repo.md, and parsing-flow.md. It would also probably be advantageous to update/delete the resources in docs/Technical-Documentation/diagrams. What are you guys' thoughts? @jtimpe @mattcoleanderson @raftmsohani

Did we decide to do anything about this, @raftmsohani? What are your thoughts?

5566 Document parsing/reparsing workflow

c067b71

raftmsohani self-assigned this May 1, 2026

raftmsohani linked an issue May 1, 2026 that may be closed by this pull request

Document current parsing & reparsing flows #5566

Open

12 tasks

Merge branch 'develop' into 5566-document-current-parsing-reparsing-f…

54efb88

…lows

raftmsohani and others added 2 commits May 11, 2026 07:52

Merge branch 'develop' into 5566-document-current-parsing-reparsing-f…

099eb53

…lows

added diagrams

f9a43c6

raftmsohani requested review from elipe17, jtimpe and mattcoleanderson May 11, 2026 14:24

added syn for help

f9a8e46

jtimpe reviewed May 13, 2026

Update docs/Technical-Documentation/parsing-reparsing-architecture.md

f5a7ea3

Co-authored-by: jtimpe <111305129+jtimpe@users.noreply.github.com>

5566 Refactored parsing documentation

d91ef10

jtimpe reviewed May 28, 2026

raftmsohani wants to merge 7 commits into develop from 5566-document-current-parsing-reparsing-flows

3 participants
