magentic-one-cli: pin utf-8 encoding when loading YAML config (refs #5566) #7722

Open
adv0r wants to merge 1 commit into microsoft:main from adv0r:tokenburn/autogen-5566-m1cli
adv0r commented May 20, 2026

Dear maintainer: this PR has a permanent home, with methodology and opt-out instructions, at tokens-for-good. A one-line "no thanks" triggers auto-close and blacklisting. Sorry for the notification this edit caused.


Why

Refs #5566.

#5566 reported `m1` crashing with
`UnicodeDecodeError: 'cp950' codec can't decode byte ...` on a non-UTF-8
default locale (Traditional Chinese Windows). The original call site
(`playwright_controller.py`) was fixed in #6094, but the reporter
flagged that there were "similar issues in the codebase while using
open function", and the issue was kept open to track the follow-up
sweep. This PR fixes the next two call sites on the same code path the
user actually hits when they type `m1 ...` on Windows.

What

`python/packages/magentic-one-cli/src/magentic_one_cli/_m1.py` loads its
YAML config in two places, both via `open(..., "r")` with no
`encoding=`:

| Line | Source | Used for |
|------|--------|----------|
| 100 | `DEFAULT_CONFIG_FILE` | default `~/.magentic_one_config.yaml` |
| 105 | `args.config[…]` | user-supplied `--config <path>` |

On non-UTF-8 locales (cp950, cp1252, …) any non-ASCII byte in the
config (comments in CJK, accented paths, …) would raise
UnicodeDecodeError before the CLI gets to build an agent.
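
The failure mode described above can be reproduced on any platform by passing the legacy encoding explicitly instead of relying on the locale default. The following is a minimal sketch, not code from the PR: `CONFIG_TEXT`, `read_with`, and `demo` are hypothetical names, and cp1252 stands in for the locale encoding because the UTF-8 encoding of "Á" contains byte 0x81, which cp1252 leaves undefined.

```python
import tempfile
from pathlib import Path

# Hypothetical config content. "Á" (U+00C1) is the bytes C3 81 in UTF-8,
# and 0x81 has no mapping in Python's cp1252 codec.
CONFIG_TEXT = "# Ánodo client settings\nclient: {}\n"


def read_with(encoding, path):
    """Read a text file with an explicit encoding, as open() would do
    implicitly with the locale default when encoding= is omitted."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    """Write a UTF-8 config, then read it back as UTF-8 and as cp1252."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.yaml"
        path.write_bytes(CONFIG_TEXT.encode("utf-8"))

        utf8_text = read_with("utf-8", path)
        try:
            read_with("cp1252", path)
            legacy_failed = False
        except UnicodeDecodeError:
            legacy_failed = True
        return utf8_text, legacy_failed
```

Calling `demo()` returns the original text for the UTF-8 read and reports that the cp1252 read raised `UnicodeDecodeError`. Not every non-ASCII byte raises under every legacy codec; some decode silently to the wrong characters, which is another reason to pin the encoding.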

This PR pins `encoding="utf-8"` on both, matching YAML 1.2's default
stream encoding.

- with open(DEFAULT_CONFIG_FILE, "r") as f:
+ with open(DEFAULT_CONFIG_FILE, "r", encoding="utf-8") as f:

- with open(args.config if isinstance(args.config, str) else args.config[0], "r") as f:
+ with open(args.config if isinstance(args.config, str) else args.config[0], "r", encoding="utf-8") as f:
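
Both patched calls share the same shape, so the pattern can be illustrated in isolation. This is only a sketch of the idea, not how `_m1.py` is structured: `resolve_config_path`, `read_config_text`, and `demo` are hypothetical helpers, and YAML parsing is left out so the example depends only on the standard library.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def resolve_config_path(config: Union[str, List[str]]) -> str:
    """Mirror the expression in the diff: accept a plain path or a list
    whose first element is the path."""
    return config if isinstance(config, str) else config[0]


def read_config_text(config: Union[str, List[str]]) -> str:
    """Read the config as UTF-8 regardless of the platform's locale default."""
    with open(resolve_config_path(config), "r", encoding="utf-8") as f:
        return f.read()


def demo() -> bool:
    """Round-trip a config containing a CJK comment through both input shapes."""
    text = "# 設定檔\nclient: {}\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.yaml"
        path.write_bytes(text.encode("utf-8"))
        as_str = read_config_text(str(path))
        as_list = read_config_text([str(path)])
        return as_str == text and as_list == text
```

Because the encoding is passed explicitly, the result does not depend on what the locale default happens to be on the machine running it.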

Scope

Intentionally only these two call sites:

  • Same package as the original crash (magentic-one-cli), same code
    path the bug report hits.
  • A blanket sweep across python/packages/ would touch >40 files
    including test fixtures and benchmark scenario scripts that read
    JSONL produced by the agents themselves — forcing UTF-8 there could
    in theory mask issues. Better to land focused, then iterate.
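
For the later iterations, candidate call sites could be listed mechanically before deciding which ones to change. The sketch below is one possible approach using the standard `ast` module; it is not part of this PR, `find_open_without_encoding` is a hypothetical name, and it only recognizes direct calls to a name spelled `open` (not `Path.open`, `io.open`, or aliases).

```python
import ast
from typing import List


def find_open_without_encoding(source: str) -> List[int]:
    """Return line numbers of text-mode open(...) calls with no encoding."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        if not (isinstance(node.func, ast.Name) and node.func.id == "open"):
            continue

        # Work out the mode if it is a literal, positional or keyword.
        mode = None
        if len(node.args) >= 2 and isinstance(node.args[1], ast.Constant):
            mode = node.args[1].value
        for kw in node.keywords:
            if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                mode = kw.value.value

        # Binary-mode opens take no encoding, so they are not candidates.
        if isinstance(mode, str) and "b" in mode:
            continue

        # encoding is the fourth positional parameter of open().
        has_encoding = len(node.args) >= 4 or any(
            kw.arg == "encoding" for kw in node.keywords
        )
        if not has_encoding:
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''\
with open(path, "r") as f:
    pass
with open(path, "r", encoding="utf-8") as f:
    pass
with open(path, "rb") as f:
    pass
'''
```

Running `find_open_without_encoding(SAMPLE)` flags only the first call. A list like this is a starting point for review rather than a set of automatic fixes, since some readers may depend on the locale default.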

Verification

  • AST parses cleanly.
  • Only two lines changed; no functional change for already-UTF-8-locale
    users.

AI-assisted via Cursor (Claude Opus 4.7). Personal token-burn
initiative by @adv0r to use up an expiring Cursor subscription budget on
small, useful upstream contributions.

Made with Cursor

Refs microsoft#5566.

The original report (microsoft#5566) was that `m1` crashed with
`UnicodeDecodeError: 'cp950' codec can't decode byte ...` when loading
`page_script.js` on a non-UTF-8 default locale (Traditional Chinese
Windows, cp950). That specific call site was fixed in microsoft#6094.

The reporter noted at the time: *"there will be some similar issues in
the codebase while using open function"*, and the issue stayed open
explicitly to track that follow-up. This PR fixes the next call sites
on the same code path the user actually hits when they type `m1 ...` on
Windows.

`magentic_one_cli/_m1.py` opens the YAML config file in two places:

- the default `~/.magentic_one_config.yaml` (line 100)
- the user-supplied `--config <path>` (line 105)

Both used `open(..., "r")` with no `encoding=`, so on a non-UTF-8
locale (cp950, cp1252, etc.) a config containing any non-ASCII byte
(comments in CJK, accented paths, …) would raise `UnicodeDecodeError`
before the CLI even got to construct an agent.

This change adds `encoding="utf-8"` to both `open()` calls. YAML 1.2
mandates UTF-8 as the default encoding for YAML streams, so pinning
UTF-8 on the reader matches what users are already writing.

Why only two call sites, and not a repo-wide sweep:
- Keeps the diff reviewable.
- Same package as the original crash (`magentic-one-cli`), same code
  path the bug report hits.
- A blanket sweep across `python/packages/` would touch >40 files
  (incl. test fixtures and benchmark scenario scripts that read JSONL
  produced by the agents themselves, where forcing UTF-8 could in
  theory mask issues). Better to land focused, then iterate.

No behaviour change for already-UTF-8-locale users.

AI-assisted via Cursor (Claude Opus 4.7). Personal token-burn
initiative by @adv0r to use up an expiring Cursor subscription budget
on small, useful upstream contributions.

Co-authored-by: Cursor <cursoragent@cursor.com>