5566 Document parsing/reparsing workflow#5821
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5821 +/- ##
========================================
Coverage 93.98% 93.98%
========================================
Files 536 536
Lines 24527 24527
Branches 620 620
========================================
Hits 23051 23051
Misses 1363 1363
Partials 113 113
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
jtimpe
left a comment
There was a problem hiding this comment.
I only did a high-level pass here, I didn't go verify all the details. That said, a couple points of feedback:
- I don't find the charts to be particularly useful, personally. I'm not quite sure how to read them, and there's a lot of complexity to parse through. The text flows make more sense to me.
- There's a lot of implementation details represented that I feel are unnecessary. Class names and files might be unavoidable, but references to
kwargs,try/except, function/method names, etc. seem too in-the-weeds.
The way it is currently written is about as cognitively demanding as reading the code itself. Plus, as we make changes to these implementation details, we have to meticulously update the documentation alongside or it will go out of date. I'd prefer the documentation to cover the structure, behavior rules, and the "why" behind the implementation, rather than cover the implementation with a lot of detail. Perhaps including some examples of how certain structures or validators get used could be helpful.
Open to conversation and opinions on this, those are just my initial thoughts.
| │ └── DataFile.create_new_version(...) creates the DataFile with file=None (state = UPLOADED) | ||
| │ | ||
| ├── transition_datafile → VIRUS_SCAN_STARTED | ||
| ├── ClamAVClient.scan_file(...) ← synchronous, in-request scan |
There was a problem hiding this comment.
doesn't ClamAV scanning happen before DataFileSerializer.create()? or the create calls DataFile.create_new_version which calls the scan, but a scan failure blocks DF creation. Not sure how that should be represented here in terms of calls - but both this and the diagram above indicate (to me, anyway) that it's parallel or happens after model creation.
There was a problem hiding this comment.
we have changed that behavior to ensure state can be stored. We first create the datafile but not storing any file. Then ClamAV scans during the request. If scan fails, the DataFile remains in failed scan state for lifecycle visibility, but the uploaded file is not stored and no parse task is queued.
| ``` | ||
| ParserFactory.get_instance(**kwargs) | ||
| │ | ||
| ├── pops program_type, is_program_audit from kwargs |
There was a problem hiding this comment.
the note about kwargs seems too implementation-detail-heavy to me. if we ever change from kwargs to args or some other method of passing the data, we have to go update the documentation
Co-authored-by: jtimpe <111305129+jtimpe@users.noreply.github.com>
I kind of agree with your observation Jan. The intention of this documentation for now was to document the functionality before we do any changes to parsing/reparsing, but I agree we should use the code as detail documentation and use this document as high level documentation for the user to understand better the flow. |
|
I think we should also use this as an opportunity to clean up some old documentation/diagrams. I'd like to propose removing |
jtimpe
left a comment
There was a problem hiding this comment.
I think this is coming along very nicely! I like the behavior-documentation approach much better than implementation details. I think this could use a bit more detail on
- the fixed-width vs columnar files/decoders (it's mentioned briefly in the FRA validation example but that's all i see)
- how the task selects which parser to use based on the program/section (might be hard to do without providing a lot of implementation detail, feel free to adjust how you see fit)
- schema definitions and how validators are defined, as well as the order or operations for the different validation layers
Overall, looking very good!
| 4. It applies cross-record rules such as case consistency and duplicate handling. | ||
| 5. It computes the final `DataFileSummary.status`. | ||
| 6. It maps the parser outcome back onto `DataFile.state`. | ||
| 7. It generates an error report and stores it on the summary. |
There was a problem hiding this comment.
| 7. It generates an error report and stores it on the summary. | |
| 7. It generates an error report and performs aggregate calculations according to the file type, stores them on the summary. |
Did we decide to do anything about this @raftmsohani ? What are your thoughts? |
Summary of Changes
Provide a brief summary of changes
Pull request closes #5566
How to Test
List the steps to test the PR
These steps are generic, please adjust as necessary.
Deliverables
More details on how deliverables herein are assessed included here.
Deliverable 1: Accepted Features
Checklist of ACs:
lfrohlichand/oradpenningtonconfirmed that ACs are met.Deliverable 2: Tested Code
CodeCov Reportcomment in PR)CodeCov Reportcomment in PR)Deliverable 3: Properly Styled Code
Deliverable 4: Accessible
iamjollyandttran-hubusing Accessibility Insights reveal any errors introduced in this PR?Deliverable 5: Deployed
Deliverable 6: Documented
Deliverable 7: Secure
Deliverable 8: User Research
Research product(s) clearly articulate(s):