Skip to content

5735 - go parser shadow tables#5832

Open
jtimpe wants to merge 39 commits into
developfrom
5735-go-parser-shadow-tables
Open

5735 - go parser shadow tables#5832
jtimpe wants to merge 39 commits into
developfrom
5735-go-parser-shadow-tables

Conversation

@jtimpe
Copy link
Copy Markdown

@jtimpe jtimpe commented May 5, 2026

Summary of Changes

Pull request closes #5735

  • Updates the go parser to use shadow tables (or any specified table prefix) when the GO_PARSER_SHADOW_MODE environment variable is true. This var defaults to true
  • When GO_PARSER_SHADOW_MODE=false, the go parser writes to the production parser tables
  • Creates a shadow datafile alongside the regular datafile when a file is uploaded (and GO_PARSER_SHADOW_MODE=true)
  • Queues the parse task for both the python parser and go parser whenever a file is uploaded
  • Codex added a preliminary status and record count update for the shadow datafile (and production df when the var is false)

How to Test

  1. Set GO_PARSER_SHADOW_MODE=True in your backend .env, then start the services
    cd tdrs-frontend && docker-compose up --build
    cd tdrs-backend && docker-compose up --build
    
  • Submit files across all program types and sections
  • Query the postgres db for the shadow tables
    psql -h localhost -U tdpuser -d tdrs_test
    select * from shadow_data_files_datafile;
    select * from shadow_parsers_datafilesummary;
    select COUNT(*) from shadow_parser_error;
    select COUNT(*) from shadow_search_indexes_{tanf_t1|tanf_t2|tribal_tanf_t1|ssp_m1|etc};
    
  • Compare the record counts and datafile/datafilesummary data to the production tables
    select * from data_files_datafile;
    select * from parsers_datafilesummary;
    select COUNT(*) from parser_error;
    select COUNT(*) from search_indexes_{tanf_t1|tanf_t2|tribal_tanf_t1|ssp_m1|etc};
    
  • Reparse submitted files. Re-check record counts and metadata.
  • Bring containers downs, change GO_PARSER_SHADOW_MODE to False.
  • Submit files across program types and sections. Check record counts and metadata. Nothing should be written to shadow tables, only to the production tables (doubled, for the python parser and go parser; docker compose stop celery to avoid this). *The go parser writes to the same datafile and datafilesummary row, this causes some issues in the frontend (no case_aggregates are written because it has not yet been implemented)*
  • Reparse submitted files. Re-check record counts and metadata.

Deliverables

More details on how deliverables herein are assessed included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • shadow_mode config option added to pipeline.yaml
  • When enabled, all table names prefixed with shadow_
  • Django migration creates shadow tables mirroring production schema
  • Go parser writes correctly to shadow tables
  • Shadow tables can be truncated independently of production tables
  • Integration tests updated to verify against shadow table models
  • Testing Checklist has been run and all tests pass
  • README is updated, if necessary
  • lfrohlich and/or adpennington confirmed that ACs are met.

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

@jtimpe jtimpe self-assigned this May 5, 2026
"user": models.ForeignKey(
"users.User",
on_delete=models.CASCADE,
related_name="+",
Copy link
Copy Markdown
Author

@jtimpe jtimpe May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related_name="+" disables the reverse relationship, so you cannot access user.shadow_data_files like you can user.data_files - can change this if necessary - Django docs

from django.db import models


def create_shadow_model(
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subclassing the "production" model (TANF_T1 for example) actually creates a model with a foreign key pointer/relationship to the parent model, rather than copying the model fields into the new model. this helper copies all the fields and relationships, but doesn't preserve the @property methods. Since we're not using the python properties on the go side, i think that's okay, but something to be aware of - the shadow models are more limited.

Comment thread tdrs-services/parser/internal/storage/writer/writer.go Outdated
Comment thread Taskfile.yml Outdated

dump-parser-schema:
desc: Dump selected parser table CREATE TABLE statements to schema-tmp.sql
desc: Dump selected parser table CREATE TABLE statements to parser schema.sql, using GO_PARSER_SHADOW_MODE to select shadow or production tables
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda torn on this change (along with the generated schema.sql, query.sql, query.sql.go, and models.go). we don't really use any of these, it's more of a gut-check that changes to the python models have been appropriately represented in the go schemas. i don't know if that should check the shadow tables or not. the task switches based on the GO_PARSER_SHADOW_MODE flag, but the generated files are all shadow-specific. if it's not useful, i can revert it

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • keep original schema.sql versioned
  • unversion models.go
  • change task to dump to schema.temp.sql for sqlc diff
  • generate models.go and models.temp.go for sqlc diff

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more helpful to validate against schema yml definition and storage/writer/tanf|ssp|tribal record serializers (still two steps) - generate serializer based on yml file?

define parsedrecord as an interface (look at decoders/decoder.go and decoder/csv.go), put serializer on validationresult telling each record how to serialize itself

separate ticket?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we write a ticket up to tighten up our django migration validation against the Go schemas yet?

@jtimpe jtimpe added the raft review This issue is ready for raft review label May 6, 2026
Comment thread tdrs-backend/tdpservice/scheduling/parser_task.py Outdated
Comment thread tdrs-backend/tdpservice/data_files/views.py
current_app.send_task(
GO_PARSER_TASK_NAME,
args=[data_file_id],
queue=GO_PARSER_QUEUE,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't handle reparses

Comment thread tdrs-services/parser/internal/server/celery/celery.go
Comment thread tdrs-services/parser/internal/server/celery/celery.go Outdated

var df DataFilesDatafile
var df ShadowDataFilesDatafile
err := pool.QueryRow(ctx, query, id).Scan(
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look into easier way to map the struct

@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 98.82353% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.02%. Comparing base (60b5d6c) to head (f4ba65a).
⚠️ Report is 7 commits behind head on develop.

Files with missing lines Patch % Lines
tdrs-backend/tdpservice/common/shadow_models.py 90.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5832      +/-   ##
===========================================
+ Coverage    93.98%   94.02%   +0.04%     
===========================================
  Files          538      543       +5     
  Lines        24618    24776     +158     
  Branches       620      620              
===========================================
+ Hits         23137    23296     +159     
+ Misses        1368     1367       -1     
  Partials       113      113              
Flag Coverage Δ
dev-backend 94.31% <98.82%> (+0.04%) ⬆️
dev-frontend 91.84% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...rvice/data_files/migrations/0027_shadowdatafile.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/data_files/models.py 83.51% <100.00%> (+0.63%) ⬆️
...drs-backend/tdpservice/data_files/test/test_api.py 98.76% <100.00%> (+0.02%) ⬆️
tdrs-backend/tdpservice/data_files/views.py 93.75% <100.00%> (+0.14%) ⬆️
...ns/0017_shadowdatafilesummary_shadowparsererror.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/parsers/models.py 88.04% <100.00%> (+0.40%) ⬆️
tdrs-backend/tdpservice/scheduling/parser_task.py 96.45% <100.00%> (+0.36%) ⬆️
...end/tdpservice/scheduling/test/test_parser_task.py 99.70% <100.00%> (+0.03%) ⬆️
...wprogramaudit_t1_shadowprogramaudit_t2_and_more.py 100.00% <100.00%> (ø)
...ckend/tdpservice/search_indexes/models/__init__.py 100.00% <100.00%> (ø)
... and 5 more

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a6d79b...f4ba65a. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jtimpe jtimpe requested a review from elipe17 May 7, 2026 20:37
Comment thread tdrs-services/parser/internal/db/datafile.go Outdated
Comment thread tdrs-services/parser/internal/db/datafile.go Outdated
Comment thread tdrs-services/parser/internal/server/celery/celery.go Outdated
Comment thread tdrs-services/parser/internal/server/celery/celery.go Outdated

summaryStatusAccepted = "Accepted"
summaryStatusAcceptedWithErrors = "Accepted with Errors"
summaryStatusPartiallyAccepted = "Partially Accepted"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go uses "Partially Accepted" while Django enum is "Partially Accepted with Errors", will this cause any problem?

Copy link
Copy Markdown
Author

@jtimpe jtimpe May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe "Partially Accepted with Errors" is the friendly/detail name, but the enum/db value is Partially Accepted. You can see it encoded in this way in the original migration

('Partially Accepted with Errors', 'Partially Accepted')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These statuses can largely be removed right? The go parser will only ever need to set a rejected status based on specific validation/exceptions right?

@jtimpe jtimpe requested review from elipe17 and raftmsohani May 14, 2026 13:31
if err != nil {
return fmt.Errorf("update %s result for datafile_id=%d: %w", tableName, datafileID, err)
}
totalCreatedInt4, err := int64ToInt4(totalCreated)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the conversion? Seems like we should just let the "total" counts be int32 to prevent the extra conversion steps. Do the compiled queries require the pgtype or can we give it the exact Go int32 and let the DB engine handle the conversion if needed?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is because sqlc generated the queries for a nullable integer column, which is pgtype.Int4. if the field was NOT NULL then it would generate an int32 and we wouldn't have to convert. there's a sqlc config option emit_pointers_for_null_types which might handle this, but i didn't test it out

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that is a nice override. Might save our code from any conversions in the future and push it off to the library or db engine. Will you write up an exploratory snack ticket for this?

Comment thread tdrs-services/parser/internal/db/datafile.go Outdated
@jtimpe jtimpe requested a review from elipe17 May 19, 2026 13:05
Copy link
Copy Markdown

@elipe17 elipe17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two requests for tickets, otherwise LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

raft review This issue is ready for raft review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Go Parser: Implement shadow table write mode

3 participants