Skip to content

#139 - Support checksums and conditional loading for a given CSV row#322

Merged
mseaton merged 2 commits into
mekomsolutions:mainfrom
mseaton:issue-139
May 5, 2026
Merged

#139 - Support checksums and conditional loading for a given CSV row#322
mseaton merged 2 commits into
mekomsolutions:mainfrom
mseaton:issue-139

Conversation

@mseaton

@mseaton mseaton commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces opt-in per-row checksum tracking for CSV loaders so that, when enabled, unchanged rows can be skipped and only changed rows are processed on subsequent loads.

Changes:

  • Add initializer.row.checksums.enabled configuration and expose it via InitializerConfig.
  • Implement row-level checksum computation and persistence (*.rows.checksum) in ConfigDirUtil, and use it in BaseCsvLoader to skip unchanged rows and persist hashes for successfully processed rows.
  • Add unit/integration tests to validate row checksum stability and basic sidecar-file behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
api/src/main/java/org/openmrs/module/initializer/api/loaders/BaseCsvLoader.java Skips unchanged CSV rows based on previously persisted row hashes; persists new row hashes after processing.
api/src/main/java/org/openmrs/module/initializer/api/ConfigDirUtil.java Adds row checksum computation plus read/write/delete support for per-row checksum sidecar files.
api/src/main/java/org/openmrs/module/initializer/InitializerConstants.java Adds the initializer.row.checksums.enabled property constant.
api/src/main/java/org/openmrs/module/initializer/InitializerConfig.java Loads and exposes the new row-checksums-enabled flag.
api/src/test/java/org/openmrs/module/initializer/api/ConfigDirUtilTest.java Adds tests for row checksum stability and disk round-tripping.
api/src/test/java/org/openmrs/module/initializer/api/LocationsLoaderRowChecksumIntegrationTest.java Adds an integration test around row-checksum sidecar creation and reading.
Comments suppressed due to low confidence (1)

api/src/main/java/org/openmrs/module/initializer/api/loaders/BaseCsvLoader.java:156

  • When per-row checksums are enabled and some/all rows are skipped, totalCount still reflects all CSV rows, but the loader may process/save fewer (or zero) rows in this run. This makes the success log message misleading (e.g., it can claim all totalCount entities were saved even if all rows were skipped). Track and log a separate processed/saved count (and optionally skipped count) when row checksums are enabled.
		if (isEmpty(result.getFailingLines())) {
			log.info(file.getName() + " ('" + getDomainName() + "' domain) was entirely successfully processed.");
			log.info(totalCount + " entities were saved.");
			return;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mseaton mseaton requested a review from ibacher April 29, 2026 18:58
@mseaton mseaton linked an issue Apr 29, 2026 that may be closed by this pull request

@mogoodrich mogoodrich left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @mseaton ! Though maybe (ask Claude to) update the README with how to turn this on? Is enabling an all-or-nothing thing, or can we enable it just for certain domain objects (probably fine if all or nothing, just wondering).

And when upgrading to this method, all iniz domains using the new method will need to be reloaded (again, not a blocker/issue but just noting).

@mseaton

mseaton commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

update the README with how to turn this on

Good point @mogoodrich - I have added a bullet point to the release notes and to the runtime properties README configuration.

And when upgrading to this method, all iniz domains using the new method will need to be reloaded (again, not a blocker/issue but just noting)

No. The first guard is still the file-level checksum. As long as a file hasn't changed, it won't need to be reloaded. If a file is changed and this is enabled, then it will reload everything in that file if no row-level checksums have previously been generated. Then, on subsequent changes it will be able to identify the specific changed rows. If one wanted to "preload" all of the row-level changes to guard against performance issues in the future, one could force a reload of all domains after enabling this at a predictable time so that any future updates and restarts would benefit from these improvements.

@mseaton

mseaton commented May 4, 2026

Copy link
Copy Markdown
Collaborator Author

I plan to merge this PR later today unless I hear otherwise @ibacher or others...

@mogoodrich

Copy link
Copy Markdown
Contributor

No. The first guard is still the file-level checksum. As long as a file hasn't changed, it won't need to be reloaded. If a file is changed and this is enabled, then it will reload everything in that file if no row-level checksums have previously been generated. Then, on subsequent changes it will be able to identify the specific changed rows. If one wanted to "preload" all of the row-level changes to guard against performance issues in the future, one could force a reload of all domains after enabling this at a predictable time so that any future updates and restarts would benefit from these improvements.

Sounds great.

@mogoodrich

Copy link
Copy Markdown
Contributor

I plan to merge this PR later today unless I hear otherwise @ibacher or others...

Fine with me

@mseaton mseaton merged commit ca0ef6f into mekomsolutions:main May 5, 2026
2 checks passed
@mseaton mseaton deleted the issue-139 branch May 5, 2026 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support checksums and conditional loading for a given CSV row

3 participants