fix: use utf-8 for local tracking text files #790

Open: Ghraven wants to merge 2 commits into apache:main from Ghraven:fix/local-backend-annotation-utf8

Ghraven commented May 26, 2026

Summary

Thanks for maintaining Burr. This PR makes the local tracking backend text-file reads and writes use explicit UTF-8 encoding.

What changed

  • Adds encoding="utf-8" to annotation JSONL reads and writes in LocalBackend
  • Adds encoding="utf-8" when reading children.jsonl
  • Leaves binary log/metadata reads unchanged

Before / after

Before, these JSONL text files used the platform default encoding, which can vary on Windows and other non-UTF-8 locale environments.

After, the local backend reads and writes these text files consistently as UTF-8.
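The before/after difference can be illustrated with a minimal, standard-library-only sketch. It does not use Burr's code: the file name `annotations.jsonl` and the sample record are stand-ins chosen for illustration, and `cp1252` is used as an assumed example of a non-UTF-8 platform default. `ensure_ascii=False` is set so that non-ASCII bytes actually reach the file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical record; any non-ASCII text exposes the problem.
record = {"note": "café 日本語"}
line = json.dumps(record, ensure_ascii=False) + "\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "annotations.jsonl"  # stand-in name, not Burr's real layout

    # Write with an explicit encoding, as the PR does.
    with open(path, "w", encoding="utf-8") as f:
        f.write(line)

    # Reading with the same explicit encoding round-trips the record.
    with open(path, encoding="utf-8") as f:
        utf8_result = json.loads(f.readline())

    # Simulate a reader whose platform default is cp1252 (an assumption
    # for illustration): the same bytes decode to different text.
    with open(path, encoding="cp1252", errors="replace") as f:
        legacy_text = f.readline()

print(utf8_result == record)  # the explicit UTF-8 read matches what was written
print(legacy_text == line)    # the cp1252 read does not
```

Pinning the encoding on both the write and the read side is what makes the result independent of the machine's locale.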

Verification

  • python -m py_compile burr/tracking/server/backend.py
  • git diff --check

github-actions bot added the area/tracking (Telemetry, tracing, OpenTelemetry) label May 26, 2026
andreahlert added the kind/bug (Something is broken) label May 26, 2026
Collaborator

andreahlert left a comment

Thanks for the fix. One spot in the same file looks missed: backend.py:524 reads graph.json text-mode without encoding="utf-8". Same class of bug, and client writes it as UTF-8 (client.py:420).

async with aiofiles.open(graph_file, encoding="utf-8") as f:

Could you include that line in this PR as well?

Author

Ghraven commented May 27, 2026

Thank you for catching that missed spot. I added the same explicit encoding="utf-8" to the graph.json read in LocalBackend and pushed the update. I verified it with python -m py_compile burr/tracking/server/backend.py and git diff --check.

Contributor

skrawcz commented May 28, 2026

This looks like the right fix, but since this is correcting platform-dependent file encoding behavior, I think it would be worth adding a regression test that writes and reads non-ASCII content through LocalBackend (annotations / graph / children) to prove UTF-8 handling end-to-end and prevent missed call sites. Could you add that please?
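One possible complement to such a regression test, sketched here as an assumption and not as something proposed in this thread, is Python's opt-in `EncodingWarning` (PEP 597, Python 3.10+). Running an interpreter with `-X warn_default_encoding` and turning that warning into an error makes any text-mode `open()` without an explicit `encoding` fail loudly, which can help surface missed call sites. The helper name `opens_without_warning`, the two snippets, and the target file name below are hypothetical. Whether calls routed through `aiofiles` are reported the same way is not verified here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Two tiny child programs: one relies on the platform default encoding,
# the other passes encoding="utf-8" explicitly.
BARE_OPEN = "import sys; open(sys.argv[1], 'w').close()"
EXPLICIT_OPEN = "import sys; open(sys.argv[1], 'w', encoding='utf-8').close()"


def opens_without_warning(snippet: str, target: Path) -> bool:
    """Run snippet in a child interpreter where EncodingWarning is an error.

    Returns True if the child exits cleanly, False if it failed
    (for example because an open() call omitted the encoding argument).
    """
    result = subprocess.run(
        [
            sys.executable,
            "-X", "warn_default_encoding",     # enable EncodingWarning (3.10+)
            "-W", "error::EncodingWarning",    # escalate it to an exception
            "-c", snippet,
            str(target),
        ],
        capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "children.jsonl"  # stand-in file name
        print("bare open passes:", opens_without_warning(BARE_OPEN, target))
        print("explicit open passes:", opens_without_warning(EXPLICIT_OPEN, target))
```

The same flags could in principle be applied to a test run as a whole, so that a newly added text-mode open without `encoding` shows up as a failure instead of depending on reviewers to spot it.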


3 participants