Skip to content

Bug in result collector#60

Merged
branislav-jenco-4ss merged 15 commits into
mainfrom
bug_in_ResultCollector
Jan 28, 2026
Merged

Bug in result collector#60
branislav-jenco-4ss merged 15 commits into
mainfrom
bug_in_ResultCollector

Conversation

@heidi-holm-4ss
Copy link
Copy Markdown
Contributor

Added consistent encoding in the LocalFileHandler, and changed LocalFileHandler pull buffer handling

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the result collector related to encoding and buffer handling in the LocalFileHandler class. The changes ensure consistent UTF-8 encoding across file operations and properly manage the internal buffer state during pull operations.

Changes:

  • Added explicit UTF-8 encoding to file operations in LocalFileHandler and ResultCollector
  • Modified LocalFileHandler._pull() to properly reset buffer state before writing content
  • Added two new test cases to verify correct parsing behavior with the updated encoding handling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
fourinsight/engineroom/utils/_core.py Added encoding parameter to file operations and buffer management in LocalFileHandler._pull() and ResultCollector methods
tests/test_core.py Added two test cases (test_error_parsing and test_error_parsing2) to validate the encoding fix
tests/testdata/drio_sdk_usage_mod.csv New test data file with UTF-8 characters and edge cases
tests/testdata/drio_sdk_usage_mod2.csv Additional test data file with UTF-8 characters for validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_core.py Outdated
Comment thread tests/test_core.py Outdated
Comment thread fourinsight/engineroom/utils/_core.py Outdated
heidi-holm-4ss and others added 2 commits January 25, 2026 20:28
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread fourinsight/engineroom/utils/_core.py Outdated
@bjorn-einar-bjartnes-4ss
Copy link
Copy Markdown
Contributor

Note to self, check if this fix also applies to AzureBlobHandler

@bjorn-einar-bjartnes-4ss
Copy link
Copy Markdown
Contributor

Issue seems to be related to the stream truncating number of characters, but truncate is in bytes, not characters, so when we have UTF-8 with two bytes, we start truncating from the end and loose data
image

return f"LocalFileHandler {self._path.resolve()}"

def _pull(self):
return self.write(open(self._path, mode="r").read())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return that was removed, made the truncate function in the base class truncate differently.

@bjorn-einar-bjartnes-4ss
Copy link
Copy Markdown
Contributor

@bjorn-einar-bjartnes-4ss
Copy link
Copy Markdown
Contributor

The one failing test missing is due to not finding the file is meant to leave something untouched, I think. I guess we could check that the source exists before we truncate the stream or something.

Comment thread fourinsight/engineroom/utils/_core.py Outdated
else:
self.truncate(characters_written)
self.flush()
pos = self.tell()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just use tell to ask the stream where we are, and use that for truncation. I don't fully understand it but I think the flush is important so that the underlying buffer has been written to fully.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible the flush isn't necessary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better safe than sorry, I'll rather flush one more too often in this case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's probably unnecessary in case of files, but reading via streaming from Azure, there it might be important.

self.truncate(characters_written)
self.flush()
pos = self.tell()
self.truncate(pos)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, smart

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@branislav-jenco-4ss branislav-jenco-4ss merged commit 2a6bdf5 into main Jan 28, 2026
9 checks passed
@branislav-jenco-4ss branislav-jenco-4ss deleted the bug_in_ResultCollector branch January 28, 2026 08:01
Comment thread fourinsight/engineroom/utils/_core.py Outdated
content = f.read()
self.seek(0)
self.truncate(0)
self.write(content)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the return was lost

@heidi-holm-4ss
Copy link
Copy Markdown
Contributor Author

heidi-holm-4ss commented Jan 28, 2026 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants