Support checksums and conditional loading for a given CSV row #139

@mseaton

Currently, Iniz is able to determine whether a domain resource has changes based on computing checksums at the file level. This works well to reduce the startup time and ensure metadata isn't being re-saved unnecessarily each time Initializer runs.

However, once a file is determined to have changed, no finer-grained check determines which entities defined in that file have actually changed. Since most of these files are CSV files in which one row represents one entity, it should be possible to introduce a finer level of granularity and to maintain enough information to determine which specific rows in a domain CSV file need to be processed. By introducing this capability at the level of the Base CSV Processor, we may be able to support it for the majority of domains without much effort.

The idea would be to generate a checksum for each row in a given CSV file and use it to determine whether that row has changed. An additional file under configuration_checksums/domain, a properties file with one uuid=checksum entry per row, would suffice. A checksum would only be generated and verified if a Uuid has been specified on a given row. During processing of a given CSV line, the uuid would be retrieved. If it is not null, a checksum of the entire row would be generated and compared against the last saved checksum for that row. If the two differ, the row would be processed and the new checksum would be saved. Any changes to the header row would also need to be accounted for.
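To make the proposed flow concrete, here is a rough, language-neutral sketch in Python (Initializer itself is Java, so this is illustration only, not a proposed implementation). The function names, the `Uuid` column name, the choice of SHA-256, and the decision to fold the header into each row's checksum are all assumptions made for the example. It also assumes that rows without a uuid are always processed.

```python
import csv
import hashlib
import io


def row_checksum(header, row):
    """Hash the header together with the row, so a header change invalidates every row.

    Folding the header into the hash is one possible way to account for
    header changes; it is an assumption of this sketch.
    """
    digest = hashlib.sha256()
    for cell in list(header) + list(row):
        digest.update(cell.encode("utf-8"))
        digest.update(b"\x1f")  # separator keeps ("ab", "c") distinct from ("a", "bc")
    return digest.hexdigest()


def rows_to_process(csv_text, saved_checksums, uuid_column="Uuid"):
    """Return (rows that need processing, updated uuid -> checksum map)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    uuid_index = header.index(uuid_column) if uuid_column in header else None

    changed = []
    updated = dict(saved_checksums)
    for row in reader:
        if not row:
            continue  # skip blank lines
        has_uuid_cell = uuid_index is not None and uuid_index < len(row)
        uuid = row[uuid_index].strip() if has_uuid_cell else ""
        if not uuid:
            changed.append(row)  # no uuid: no checksum, always process (assumption)
            continue
        checksum = row_checksum(header, row)
        if saved_checksums.get(uuid) != checksum:
            changed.append(row)
            updated[uuid] = checksum
    return changed, updated


def to_properties(checksums):
    """Serialize the map as uuid=checksum lines, one per row."""
    return "\n".join(f"{uuid}={value}" for uuid, value in sorted(checksums.items()))
```

In a real processor the new checksum would presumably be saved only after the row has been processed successfully, so that a failed row is retried on the next run. The sketch returns the updated map instead of writing it, which leaves that decision to the caller.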

In particular, this would likely speed up processing of domains with a large amount of metadata (e.g. it could make it more practical to maintain a single concepts.csv for a dictionary), and it would allow better logging of exactly which metadata was changed in a given update.

Interested in others' thoughts about the usefulness and viability of this proposal: @mks-d / @Ruhanga / @rbuisson / @brandones / @mogoodrich
