Sanitize peer-supplied mDNS labels in discover CLI by bdraco · Pull Request #1689 · esphome/aioesphomeapi

bdraco · 2026-05-20T13:45:23Z

What does this implement/fix?

mDNS service properties (name, mac, version, platform, board) come from anything on the LAN that wants to broadcast _esphomelib._tcp.local.; the discover CLI was decoding them with bare data.decode() (which raises on non-UTF-8) and printing them straight to stdout with no control-character filter. A hostile broadcaster could clear or reflow the operator's terminal by stuffing ANSI escapes or newlines into the mac / version / platform / board field, or break the column-aligned table with an oversized value.

Route every peer-supplied label through safe_label_str (the same sanitizer the noise hello and protobuf hello paths already use), use a "replace" UTF-8 error handler so a malformed payload no longer raises out of the zeroconf callback, and apply per-column length caps that match the discover output FORMAT widths so a long value can't break alignment either.

To keep the sanitizer importable from both the discover CLI and the noise frame helper without dragging the framing module's transitive deps in, the caps and safe_label_str move to a dedicated aioesphomeapi/_sanitize.py (with _sanitize.pxd for cimport on the noise hot path). _frame_helper/base.py re-exports the names so existing from .base import MAX_NAME_LEN callers keep working unchanged.

Same defensive surface as #1664, #1660, #1656; mDNS is just a different peer-controlled channel feeding the same kind of output.

Testing: tests/test_discover.py covers the None / str / bytes paths, invalid UTF-8 (must not raise), ANSI / newline / null / tab stripping, length capping, and non-ASCII printable survival. Full suite green locally (1092 passed, 2 skipped); Cython build with REQUIRE_CYTHON=1 rebuilds cleanly.

This finishes the work originally opened as #1669; the CodSpeed benchmarks for the sanitize hot path were lifted to #1687 and the unrelated # type: ignore lint fix to #1688 so the change set here stays focused on the discover surface.

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Code quality improvements to existing code or addition of tests
Other

Related issue or feature (if applicable):

supersedes Sanitize peer-supplied mDNS labels in discover CLI #1669

Pull request in esphome (if applicable):

n/a, no protocol changes.

Checklist:

The code change is tested and works locally.
If api.proto was modified, a linked pull request has been made to esphome with the same changes.
Tests have been added to verify that the new code works (under tests/ folder).

mDNS service properties (name, mac, version, platform, board) come from anything on the LAN that wants to broadcast _esphomelib._tcp.local. The discover CLI was passing those bytes through `data.decode()` (raising on non-UTF-8 input) and printing them straight to stdout with no control-character filter. A hostile broadcaster could clear or reflow the operator's terminal by stuffing ANSI escapes / newlines into the mac or version field. Route the values through `safe_label_str` (the same sanitizer used on the noise hello + protobuf hello paths) and use a "replace" error handler on the UTF-8 decode so an invalid byte sequence no longer breaks the zeroconf callback. Per-column length caps match the discover output FORMAT widths so a long value can't break the table alignment either. To keep the sanitizer importable from both the discover CLI and the noise frame helper without dragging the framing module's transitive deps, the caps and `safe_label_str` move to a dedicated `aioesphomeapi/_sanitize.py` (with `_sanitize.pxd` for cdef cimports on the noise hot path). `_frame_helper/base.py` keeps the old import points working via re-export so existing callers stay unchanged. Tests added for the discover decode path (None, str, bytes, invalid UTF-8, control-char stripping, length cap, non-ASCII printable survival). Same defensive surface as PRs #1664 / #1660 / #1656, applied to the mDNS channel.

codecov · 2026-05-20T13:47:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1868e05) to head (5bce804).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main     #1689   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           25        26    +1     
  Lines         4159      4165    +6     
=========================================
+ Hits          4159      4165    +6

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codspeed-hq · 2026-05-20T13:47:46Z

Merging this PR will not alter performance

✅ 47 untouched benchmarks

_{Comparing discover-sanitize-mdns-labels (5bce804) with main (533d608)}

bluetoothbot · 2026-05-20T14:06:15Z

PR Review — Sanitize peer-supplied mDNS labels in discover CLI

Solid, focused hardening of the discover CLI. The same safe_label_str sanitizer + per-column caps that already protect the noise hello / handshake-reject paths now cover mDNS-supplied labels, the bare data.decode() is replaced with a replace-error-handler decode so a malformed payload no longer raises out of the zeroconf callback, and the _sanitize.py extraction is a clean dependency cut (no transitive framing deps pulled into the CLI). Tests cover the documented threat surface (UTF-8 failure, ANSI/newline/null/tab, length cap, unicode survival) and codspeed confirms no hot-path regression. Only suggestions — no blocking issues: (1) decode_bytes_or_unknown is now a sanitizer, the name no longer matches; (2) the per-column caps duplicate the FORMAT widths with no mechanical link; (3) the invalid-UTF-8 test only checks the return type rather than pinning the replacement-char contract; (4) _sanitize.py would benefit from a docstring and __all__. None of these block merge.

🟢 Suggestions

1. Function name no longer reflects behaviour (`aioesphomeapi/discover.py`, L16-22)

decode_bytes_or_unknown now does three things — decode, strip non-printables, length-cap — but the name still reads like a pure decoder. A reader importing it will be surprised that passing a clean ASCII string still mutates it (trailing whitespace, characters past limit, etc.). Consider renaming to something like decode_mdns_label_or_unknown or sanitize_property_or_unknown. Since it's a public helper (no underscore) the rename can keep a thin alias for one cycle if you're worried about external callers, but discover.py doesn't really have an external surface other than the CLI so a straight rename is probably fine.

def decode_bytes_or_unknown(data: str | bytes | None, limit: int = MAX_NAME_LEN) -> str:
    """Decode peer-supplied mDNS bytes, strip non-printables, length-cap."""

2. Per-column caps drift silently if FORMAT changes (`aioesphomeapi/discover.py`, L17-22)

_MAX_VERSION_LEN = 16 / _MAX_PLATFORM_LEN = 10 / _MAX_BOARD_LEN = 32 are hand-derived from the widths inside FORMAT. The comment says they 'match the FORMAT column widths' but nothing enforces this — if someone widens the Platform column in FORMAT from <10 to <12 they will not realise the cap also needs to move and the table will silently start to break alignment again.

Low-cost defensive options:

Parse the widths out of FORMAT once at module load: e.g. derive a tuple from re.findall(r'<(\d+)', FORMAT) and assert the order/length matches COLUMN_NAMES, then index into it.
Or just add a unit test that pins (MAX_NAME_LEN, _MAX_VERSION_LEN, _MAX_PLATFORM_LEN, _MAX_BOARD_LEN) against the widths in FORMAT so a future widener gets a red CI signal.

Not blocking — the change as-is is strictly better than the status quo.

_MAX_VERSION_LEN = 16
_MAX_PLATFORM_LEN = 10
_MAX_BOARD_LEN = 32

3. Invalid-UTF-8 test only checks the type, not the contract (`tests/test_discover.py`, L19-23)

test_decode_bytes_or_unknown_invalid_utf8_does_not_raise asserts only isinstance(result, str), which is already guaranteed by the return annotation. The test passes even if a future refactor decides to drop the replace handler and return UNKNOWN on decode failure — silently changing observable behaviour without a failing test.

Since b"\xff\xfe".decode("utf-8", "replace") yields "\ufffd\ufffd" and str.isprintable("\ufffd") is True, the contract you actually want to pin is something like result == "\ufffd\ufffd" (or len(result) == 2 and all(c == "\ufffd" for c in result)). That keeps the property test (no raise) while also pinning that bad bytes survive to the output rather than being silently swallowed.

result = decode_bytes_or_unknown(b"\xff\xfe")
assert isinstance(result, str)

Checklist

Summary

Solid, focused hardening of the discover CLI. The same safe_label_str sanitizer + per-column caps that already protect the noise hello / handshake-reject paths now cover mDNS-supplied labels, the bare data.decode() is replaced with a replace-error-handler decode so a malformed payload no longer raises out of the zeroconf callback, and the _sanitize.py extraction is a clean dependency cut (no transitive framing deps pulled into the CLI). Tests cover the documented threat surface (UTF-8 failure, ANSI/newline/null/tab, length cap, unicode survival) and codspeed confirms no hot-path regression. Only suggestions — no blocking issues: (1) decode_bytes_or_unknown is now a sanitizer, the name no longer matches; (2) the per-column caps duplicate the FORMAT widths with no mechanical link; (3) the invalid-UTF-8 test only checks the return type rather than pinning the replacement-char contract; (4) _sanitize.py would benefit from a docstring and __all__. None of these block merge.

Automated review by Kōanabf3c2a

Copilot

Pull request overview

This PR hardens aioesphomeapi-discover against hostile peer-supplied mDNS fields by ensuring all displayed labels are UTF-8 decoded safely, stripped of control characters, and length-capped to prevent terminal/log injection and table misalignment. It also extracts the shared sanitizer into a dedicated module so both the discover CLI and Cythonized noise path can reuse it.

Changes:

Add aioesphomeapi/_sanitize.py (+ .pxd) with safe_label_str and MAX_* caps; re-export from _frame_helper/base.py for backwards compatibility.
Sanitize all peer-controlled discover CLI fields (service name + mDNS properties) with UTF-8 errors="replace" and per-field length caps.
Add unit tests for decode_bytes_or_unknown covering invalid UTF-8, control stripping, and length capping.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`aioesphomeapi/discover.py`	Decode bytes with `errors="replace"` and sanitize/cap peer-controlled labels before printing.
`aioesphomeapi/_sanitize.py`	Introduce shared sanitizer + MAX_* constants in a lightweight module.
`aioesphomeapi/_sanitize.pxd`	Provide a Cython `cpdef` signature for `safe_label_str` for hot paths.
`aioesphomeapi/_frame_helper/base.py`	Import and re-export sanitizer/constants; keep `_MAX_*` cdef aliases.
`aioesphomeapi/_frame_helper/base.pxd`	Switch Cython import of `safe_label_str` to `.._sanitize`.
`aioesphomeapi/_frame_helper/noise.py`	Update imports to use the new sanitizer module.
`aioesphomeapi/_frame_helper/noise.pxd`	Switch Cython `safe_label_str` import to `.._sanitize`.
`setup.py`	Add `_sanitize.py` to the Cythonized extension list.
`tests/test_discover.py`	Add coverage for sanitizing/decoding behavior used by discover.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Derive per-column display caps from FORMAT widths via regex so the caps stay in lock-step with the table layout; an assertion at module load pins COLUMN_NAMES against the derived widths so a future FORMAT edit surfaces immediately. This also corrects the MAC cap that was set to MAX_MAC_LEN (16) while the FORMAT slot was only 12 chars wide, which would have let a 13-16 char MAC widen the column past its slot. - Rename `decode_bytes_or_unknown` to `decode_mdns_label_or_unknown` since the helper now decodes, strips non-printables and length-caps; the new name reflects all three behaviors. - Pin the invalid-UTF-8 contract by asserting the actual U+FFFD output instead of just `isinstance(result, str)`, so a future refactor that silently swaps the "replace" handler for UNKNOWN-or-empty trips a red test. - Add a `test_per_column_caps_match_format_widths` regression test that re-derives the widths from FORMAT and checks every cap, giving a second mechanical pin alongside the module-load assertion. - Add a module docstring and `__all__` to `_sanitize.py` so importers see the intent of the module and `from _sanitize import *` is bounded.

coderabbitai · 2026-05-20T14:56:53Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8b6a5bf-ecd6-4648-ac64-5a98fba29756

📥 Commits

Reviewing files that changed from the base of the PR and between 533d608 and 5bce804.

📒 Files selected for processing (9)

aioesphomeapi/_frame_helper/base.pxd
aioesphomeapi/_frame_helper/base.py
aioesphomeapi/_frame_helper/noise.pxd
aioesphomeapi/_frame_helper/noise.py
aioesphomeapi/_sanitize.pxd
aioesphomeapi/_sanitize.py
aioesphomeapi/discover.py
setup.py
tests/test_discover.py

Walkthrough

The PR centralizes mDNS label input sanitization by creating a shared _sanitize module, refactoring Cython exports to use it, and applying the sanitizer to discovery service fields to prevent terminal and log injection attacks from malformed peer labels.

Changes

Input Sanitization and Discovery Label Protection

Layer / File(s)	Summary
Sanitizer module and Cython declaration `aioesphomeapi/_sanitize.py`, `aioesphomeapi/_sanitize.pxd`	New `_sanitize` module exports `safe_label_str()` that removes non-printable characters and truncates strings, plus constants `MAX_NAME_LEN`, `MAX_MAC_LEN`, `MAX_EXPLANATION_LEN`. Cython declaration file exposes the function as `cpdef` with typed parameters.
Frame helper import refactoring `aioesphomeapi/_frame_helper/base.pxd`, `aioesphomeapi/_frame_helper/base.py`, `aioesphomeapi/_frame_helper/noise.pxd`, `aioesphomeapi/_frame_helper/noise.py`	`base.pxd` cimports `safe_label_str` from `_sanitize` instead of declaring locally; `base.py` imports constants and function from `_sanitize` and re-exports via `__all__`. `noise.pxd` and `noise.py` source utilities directly from `_sanitize`, with updated comments describing the pure-Python fallback behavior.
Discovery label sanitization and length enforcement `aioesphomeapi/discover.py`	Introduces `decode_mdns_label_or_unknown()` replacing `decode_bytes_or_unknown()`, which decodes bytes with UTF-8 replacement and applies `safe_label_str()` sanitization. Per-column display caps derived from `FORMAT` width are applied to service name prefix and mDNS properties (`mac`, `version`, `platform`, `board`) in `async_service_update()`.
Cythonization setup and sanitizer validation `setup.py`, `tests/test_discover.py`	`_sanitize.py` added to `TO_CYTHONIZE` list for optional Cython compilation. Comprehensive test suite validates `decode_mdns_label_or_unknown()` across input types (None, strings, bytes), UTF-8 handling, control character stripping, and limit enforcement; asserts per-column display caps match module constants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

esphome/aioesphomeapi#1664: Adds unit tests directly exercising the safe_label_str sanitization function and the mDNS discovery label handling that this PR centralizes.
esphome/aioesphomeapi#1656: Refactors Noise hello field sanitization using the same noise.py/pxd import wiring and shared MAX_* length caps that this PR consolidates.
esphome/aioesphomeapi#1660: Refactors Cython sanitizer plumbing for peer labels, specifically safe_label_str declarations and MAX constants in base.pxd/base.py that this PR centralizes into _sanitize.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch discover-sanitize-mdns-labels

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bdraco requested a review from Copilot May 20, 2026 14:03

Copilot started reviewing on behalf of bdraco May 20, 2026 14:04 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread aioesphomeapi/discover.py Outdated

bdraco marked this pull request as ready for review May 20, 2026 14:55

bdraco merged commit dcd9453 into main May 20, 2026
18 checks passed

bdraco deleted the discover-sanitize-mdns-labels branch May 20, 2026 14:55

bdraco mentioned this pull request May 20, 2026

Sanitize peer-supplied mDNS labels in discover CLI #1669

Closed

8 tasks

bdraco mentioned this pull request May 20, 2026

Sanitize peer-supplied mDNS labels in discover CLI esphome/device-builder#911

Merged

16 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sanitize peer-supplied mDNS labels in discover CLI#1689

Sanitize peer-supplied mDNS labels in discover CLI#1689
bdraco merged 2 commits into
mainfrom
discover-sanitize-mdns-labels

bdraco commented May 20, 2026

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented May 20, 2026 •

edited

Loading

Uh oh!

bluetoothbot commented May 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 20, 2026

Review failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

bdraco commented May 20, 2026

What does this implement/fix?

Types of changes

Checklist:

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codspeed-hq Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

bluetoothbot commented May 20, 2026

PR Review — Sanitize peer-supplied mDNS labels in discover CLI

🟢 Suggestions

Checklist

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 20, 2026

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 20, 2026 •

edited

Loading

codspeed-hq Bot commented May 20, 2026 •

edited

Loading