#139 - Support checksums and conditional loading for a given CSV row by mseaton · Pull Request #322 · mekomsolutions/openmrs-module-initializer

mseaton · 2026-04-29T15:50:57Z

No description provided.

…given CSV row

Copilot

Pull request overview

This PR introduces opt-in per-row checksum tracking for CSV loaders so that, when enabled, unchanged rows can be skipped and only changed rows are processed on subsequent loads.

Changes:

Add initializer.row.checksums.enabled configuration and expose it via InitializerConfig.
Implement row-level checksum computation and persistence (*.rows.checksum) in ConfigDirUtil, and use it in BaseCsvLoader to skip unchanged rows and persist hashes for successfully processed rows.
Add unit/integration tests to validate row checksum stability and basic sidecar-file behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
api/src/main/java/org/openmrs/module/initializer/api/loaders/BaseCsvLoader.java	Skips unchanged CSV rows based on previously persisted row hashes; persists new row hashes after processing.
api/src/main/java/org/openmrs/module/initializer/api/ConfigDirUtil.java	Adds row checksum computation plus read/write/delete support for per-row checksum sidecar files.
api/src/main/java/org/openmrs/module/initializer/InitializerConstants.java	Adds the `initializer.row.checksums.enabled` property constant.
api/src/main/java/org/openmrs/module/initializer/InitializerConfig.java	Loads and exposes the new row-checksums-enabled flag.
api/src/test/java/org/openmrs/module/initializer/api/ConfigDirUtilTest.java	Adds tests for row checksum stability and disk round-tripping.
api/src/test/java/org/openmrs/module/initializer/api/LocationsLoaderRowChecksumIntegrationTest.java	Adds an integration test around row-checksum sidecar creation and reading.

Comments suppressed due to low confidence (1)

api/src/main/java/org/openmrs/module/initializer/api/loaders/BaseCsvLoader.java:156

When per-row checksums are enabled and some/all rows are skipped, totalCount still reflects all CSV rows, but the loader may process/save fewer (or zero) rows in this run. This makes the success log message misleading (e.g., it can claim all totalCount entities were saved even if all rows were skipped). Track and log a separate processed/saved count (and optionally skipped count) when row checksums are enabled.

		if (isEmpty(result.getFailingLines())) {
			log.info(file.getName() + " ('" + getDomainName() + "' domain) was entirely successfully processed.");
			log.info(totalCount + " entities were saved.");
			return;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mogoodrich

LGTM, thanks @mseaton ! Though maybe (ask Claude to) update the README with how to turn this on? Is enabling an all-or-nothing thing, or can we enable it just for certain domain objects (probably fine if all or nothing, just wondering).

And when upgrading to this method, all iniz domains using the new method will need to be reloaded (again, not a blocker/issue but just noting).

mseaton · 2026-04-29T22:43:47Z

update the README with how to turn this on

Good point @mogoodrich - I have added a bullet point to the release notes and to the runtime properties README configuration.

And when upgrading to this method, all iniz domains using the new method will need to be reloaded (again, not a blocker/issue but just noting)

No. The first guard is still the file-level checksum. As long as a file hasn't changed, it won't need to be reloaded. If a file is changed and this is enabled, then it will reload everything in that file if no row-level checksums have previously been generated. Then, on subsequent changes it will be able to identify the specific changed rows. If one wanted to "preload" all of the row-level changes to guard against performance issues in the future, one could force a reload of all domains after enabling this at a predictable time so that any future updates and restarts would benefit from these improvements.

mseaton · 2026-05-04T13:24:48Z

I plan to merge this PR later today unless I hear otherwise @ibacher or others...

mogoodrich · 2026-05-04T13:29:50Z

No. The first guard is still the file-level checksum. As long as a file hasn't changed, it won't need to be reloaded. If a file is changed and this is enabled, then it will reload everything in that file if no row-level checksums have previously been generated. Then, on subsequent changes it will be able to identify the specific changed rows. If one wanted to "preload" all of the row-level changes to guard against performance issues in the future, one could force a reload of all domains after enabling this at a predictable time so that any future updates and restarts would benefit from these improvements.

Sounds great.

mogoodrich · 2026-05-04T13:30:01Z

I plan to merge this PR later today unless I hear otherwise @ibacher or others...

Fine with me

mekomsolutions#139 - Support checksums and conditional loading for a …

4ee31cc

…given CSV row

mseaton requested a review from Copilot April 29, 2026 15:50

Copilot started reviewing on behalf of mseaton April 29, 2026 15:51 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Comment thread .../test/java/org/openmrs/module/initializer/api/LocationsLoaderRowChecksumIntegrationTest.java

Comment thread api/src/main/java/org/openmrs/module/initializer/api/loaders/BaseCsvLoader.java

mseaton requested a review from ibacher April 29, 2026 18:58

mseaton linked an issue Apr 29, 2026 that may be closed by this pull request

Support checksums and conditional loading for a given CSV row #139

Closed

mogoodrich approved these changes Apr 29, 2026

View reviewed changes

Update readme

1d6bd0f

mseaton merged commit ca0ef6f into mekomsolutions:main May 5, 2026
2 checks passed

mseaton deleted the issue-139 branch May 5, 2026 20:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

#139 - Support checksums and conditional loading for a given CSV row#322

#139 - Support checksums and conditional loading for a given CSV row#322
mseaton merged 2 commits into
mekomsolutions:mainfrom
mseaton:issue-139

mseaton commented Apr 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

mogoodrich left a comment

Uh oh!

mseaton commented Apr 29, 2026

Uh oh!

mseaton commented May 4, 2026

Uh oh!

mogoodrich commented May 4, 2026

Uh oh!

mogoodrich commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

mseaton commented Apr 29, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

mogoodrich left a comment

Choose a reason for hiding this comment

Uh oh!

mseaton commented Apr 29, 2026

Uh oh!

mseaton commented May 4, 2026

Uh oh!

mogoodrich commented May 4, 2026

Uh oh!

mogoodrich commented May 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants