Skip to content

Sanitize peer-supplied mDNS labels in discover CLI#1689

Merged
bdraco merged 2 commits into
mainfrom
discover-sanitize-mdns-labels
May 20, 2026
Merged

Sanitize peer-supplied mDNS labels in discover CLI#1689
bdraco merged 2 commits into
mainfrom
discover-sanitize-mdns-labels

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented May 20, 2026

What does this implement/fix?

mDNS service properties (name, mac, version, platform, board) come from anything on the LAN that wants to broadcast _esphomelib._tcp.local.; the discover CLI was decoding them with bare data.decode() (which raises on non-UTF-8) and printing them straight to stdout with no control-character filter. A hostile broadcaster could clear or reflow the operator's terminal by stuffing ANSI escapes or newlines into the mac / version / platform / board field, or break the column-aligned table with an oversized value.

Route every peer-supplied label through safe_label_str (the same sanitizer the noise hello and protobuf hello paths already use), use a "replace" UTF-8 error handler so a malformed payload no longer raises out of the zeroconf callback, and apply per-column length caps that match the discover output FORMAT widths so a long value can't break alignment either.

To keep the sanitizer importable from both the discover CLI and the noise frame helper without dragging the framing module's transitive deps in, the caps and safe_label_str move to a dedicated aioesphomeapi/_sanitize.py (with _sanitize.pxd for cimport on the noise hot path). _frame_helper/base.py re-exports the names so existing from .base import MAX_NAME_LEN callers keep working unchanged.

Same defensive surface as #1664, #1660, #1656; mDNS is just a different peer-controlled channel feeding the same kind of output.

Testing: tests/test_discover.py covers the None / str / bytes paths, invalid UTF-8 (must not raise), ANSI / newline / null / tab stripping, length capping, and non-ASCII printable survival. Full suite green locally (1092 passed, 2 skipped); Cython build with REQUIRE_CYTHON=1 rebuilds cleanly.

This finishes the work originally opened as #1669; the CodSpeed benchmarks for the sanitize hot path were lifted to #1687 and the unrelated # type: ignore lint fix to #1688 so the change set here stays focused on the discover surface.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Code quality improvements to existing code or addition of tests
  • Other

Related issue or feature (if applicable):

Pull request in esphome (if applicable):

  • n/a, no protocol changes.

Checklist:

  • The code change is tested and works locally.
  • If api.proto was modified, a linked pull request has been made to esphome with the same changes.
  • Tests have been added to verify that the new code works (under tests/ folder).

mDNS service properties (name, mac, version, platform, board) come from
anything on the LAN that wants to broadcast _esphomelib._tcp.local.
The discover CLI was passing those bytes through `data.decode()` (raising
on non-UTF-8 input) and printing them straight to stdout with no
control-character filter. A hostile broadcaster could clear or reflow
the operator's terminal by stuffing ANSI escapes / newlines into the
mac or version field.

Route the values through `safe_label_str` (the same sanitizer used on
the noise hello + protobuf hello paths) and use a "replace" error
handler on the UTF-8 decode so an invalid byte sequence no longer
breaks the zeroconf callback. Per-column length caps match the
discover output FORMAT widths so a long value can't break the table
alignment either.

To keep the sanitizer importable from both the discover CLI and the
noise frame helper without dragging the framing module's transitive
deps, the caps and `safe_label_str` move to a dedicated
`aioesphomeapi/_sanitize.py` (with `_sanitize.pxd` for cdef cimports
on the noise hot path). `_frame_helper/base.py` keeps the old import
points working via re-export so existing callers stay unchanged.

Tests added for the discover decode path (None, str, bytes, invalid
UTF-8, control-char stripping, length cap, non-ASCII printable
survival). Same defensive surface as PRs #1664 / #1660 / #1656,
applied to the mDNS channel.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1868e05) to head (5bce804).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1689   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           25        26    +1     
  Lines         4159      4165    +6     
=========================================
+ Hits          4159      4165    +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 20, 2026

Merging this PR will not alter performance

✅ 47 untouched benchmarks


Comparing discover-sanitize-mdns-labels (5bce804) with main (533d608)

Open in CodSpeed

@bluetoothbot
Copy link
Copy Markdown
Contributor

PR Review — Sanitize peer-supplied mDNS labels in discover CLI

Solid, focused hardening of the discover CLI. The same safe_label_str sanitizer + per-column caps that already protect the noise hello / handshake-reject paths now cover mDNS-supplied labels, the bare data.decode() is replaced with a replace-error-handler decode so a malformed payload no longer raises out of the zeroconf callback, and the _sanitize.py extraction is a clean dependency cut (no transitive framing deps pulled into the CLI). Tests cover the documented threat surface (UTF-8 failure, ANSI/newline/null/tab, length cap, unicode survival) and codspeed confirms no hot-path regression. Only suggestions — no blocking issues: (1) decode_bytes_or_unknown is now a sanitizer, the name no longer matches; (2) the per-column caps duplicate the FORMAT widths with no mechanical link; (3) the invalid-UTF-8 test only checks the return type rather than pinning the replacement-char contract; (4) _sanitize.py would benefit from a docstring and __all__. None of these block merge.


🟢 Suggestions

1. Function name no longer reflects behaviour (`aioesphomeapi/discover.py`, L16-22)

decode_bytes_or_unknown now does three things — decode, strip non-printables, length-cap — but the name still reads like a pure decoder. A reader importing it will be surprised that passing a clean ASCII string still mutates it (trailing whitespace, characters past limit, etc.). Consider renaming to something like decode_mdns_label_or_unknown or sanitize_property_or_unknown. Since it's a public helper (no underscore) the rename can keep a thin alias for one cycle if you're worried about external callers, but discover.py doesn't really have an external surface other than the CLI so a straight rename is probably fine.

def decode_bytes_or_unknown(data: str | bytes | None, limit: int = MAX_NAME_LEN) -> str:
    """Decode peer-supplied mDNS bytes, strip non-printables, length-cap."""
2. Per-column caps drift silently if FORMAT changes (`aioesphomeapi/discover.py`, L17-22)

_MAX_VERSION_LEN = 16 / _MAX_PLATFORM_LEN = 10 / _MAX_BOARD_LEN = 32 are hand-derived from the widths inside FORMAT. The comment says they 'match the FORMAT column widths' but nothing enforces this — if someone widens the Platform column in FORMAT from <10 to <12 they will not realise the cap also needs to move and the table will silently start to break alignment again.

Low-cost defensive options:

  1. Parse the widths out of FORMAT once at module load: e.g. derive a tuple from re.findall(r'<(\d+)', FORMAT) and assert the order/length matches COLUMN_NAMES, then index into it.
  2. Or just add a unit test that pins (MAX_NAME_LEN, _MAX_VERSION_LEN, _MAX_PLATFORM_LEN, _MAX_BOARD_LEN) against the widths in FORMAT so a future widener gets a red CI signal.

Not blocking — the change as-is is strictly better than the status quo.

_MAX_VERSION_LEN = 16
_MAX_PLATFORM_LEN = 10
_MAX_BOARD_LEN = 32
3. Invalid-UTF-8 test only checks the type, not the contract (`tests/test_discover.py`, L19-23)

test_decode_bytes_or_unknown_invalid_utf8_does_not_raise asserts only isinstance(result, str), which is already guaranteed by the return annotation. The test passes even if a future refactor decides to drop the replace handler and return UNKNOWN on decode failure — silently changing observable behaviour without a failing test.

Since b"\xff\xfe".decode("utf-8", "replace") yields "\ufffd\ufffd" and str.isprintable("\ufffd") is True, the contract you actually want to pin is something like result == "\ufffd\ufffd" (or len(result) == 2 and all(c == "\ufffd" for c in result)). That keeps the property test (no raise) while also pinning that bad bytes survive to the output rather than being silently swallowed.

result = decode_bytes_or_unknown(b"\xff\xfe")
assert isinstance(result, str)

Checklist

  • Input validation at system boundaries (mDNS callback)
  • No bare except / silent error swallowing
  • No hardcoded secrets
  • Unsafe decode/deserialize paths handled
  • Untested branches covered
  • Edge cases (None / empty / boundary) covered — suggestion #3
  • Tests assert observable behaviour, not source
  • Naming reflects behaviour — suggestion #1
  • Coupled constants pinned with assertion or test — suggestion #2
  • Cython .pxd updated alongside cdef signature changes

Summary

Solid, focused hardening of the discover CLI. The same safe_label_str sanitizer + per-column caps that already protect the noise hello / handshake-reject paths now cover mDNS-supplied labels, the bare data.decode() is replaced with a replace-error-handler decode so a malformed payload no longer raises out of the zeroconf callback, and the _sanitize.py extraction is a clean dependency cut (no transitive framing deps pulled into the CLI). Tests cover the documented threat surface (UTF-8 failure, ANSI/newline/null/tab, length cap, unicode survival) and codspeed confirms no hot-path regression. Only suggestions — no blocking issues: (1) decode_bytes_or_unknown is now a sanitizer, the name no longer matches; (2) the per-column caps duplicate the FORMAT widths with no mechanical link; (3) the invalid-UTF-8 test only checks the return type rather than pinning the replacement-char contract; (4) _sanitize.py would benefit from a docstring and __all__. None of these block merge.


Automated review by Kōanabf3c2a

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens aioesphomeapi-discover against hostile peer-supplied mDNS fields by ensuring all displayed labels are UTF-8 decoded safely, stripped of control characters, and length-capped to prevent terminal/log injection and table misalignment. It also extracts the shared sanitizer into a dedicated module so both the discover CLI and Cythonized noise path can reuse it.

Changes:

  • Add aioesphomeapi/_sanitize.py (+ .pxd) with safe_label_str and MAX_* caps; re-export from _frame_helper/base.py for backwards compatibility.
  • Sanitize all peer-controlled discover CLI fields (service name + mDNS properties) with UTF-8 errors="replace" and per-field length caps.
  • Add unit tests for decode_bytes_or_unknown covering invalid UTF-8, control stripping, and length capping.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
aioesphomeapi/discover.py Decode bytes with errors="replace" and sanitize/cap peer-controlled labels before printing.
aioesphomeapi/_sanitize.py Introduce shared sanitizer + MAX_* constants in a lightweight module.
aioesphomeapi/_sanitize.pxd Provide a Cython cpdef signature for safe_label_str for hot paths.
aioesphomeapi/_frame_helper/base.py Import and re-export sanitizer/constants; keep _MAX_* cdef aliases.
aioesphomeapi/_frame_helper/base.pxd Switch Cython import of safe_label_str to .._sanitize.
aioesphomeapi/_frame_helper/noise.py Update imports to use the new sanitizer module.
aioesphomeapi/_frame_helper/noise.pxd Switch Cython safe_label_str import to .._sanitize.
setup.py Add _sanitize.py to the Cythonized extension list.
tests/test_discover.py Add coverage for sanitizing/decoding behavior used by discover.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread aioesphomeapi/discover.py Outdated
- Derive per-column display caps from FORMAT widths via regex so the caps
  stay in lock-step with the table layout; an assertion at module load
  pins COLUMN_NAMES against the derived widths so a future FORMAT edit
  surfaces immediately. This also corrects the MAC cap that was set to
  MAX_MAC_LEN (16) while the FORMAT slot was only 12 chars wide, which
  would have let a 13-16 char MAC widen the column past its slot.
- Rename `decode_bytes_or_unknown` to `decode_mdns_label_or_unknown`
  since the helper now decodes, strips non-printables and length-caps;
  the new name reflects all three behaviors.
- Pin the invalid-UTF-8 contract by asserting the actual U+FFFD output
  instead of just `isinstance(result, str)`, so a future refactor that
  silently swaps the "replace" handler for UNKNOWN-or-empty trips a red
  test.
- Add a `test_per_column_caps_match_format_widths` regression test that
  re-derives the widths from FORMAT and checks every cap, giving a
  second mechanical pin alongside the module-load assertion.
- Add a module docstring and `__all__` to `_sanitize.py` so importers
  see the intent of the module and `from _sanitize import *` is bounded.
@bdraco bdraco marked this pull request as ready for review May 20, 2026 14:55
@bdraco bdraco merged commit dcd9453 into main May 20, 2026
18 checks passed
@bdraco bdraco deleted the discover-sanitize-mdns-labels branch May 20, 2026 14:55
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8b6a5bf-ecd6-4648-ac64-5a98fba29756

📥 Commits

Reviewing files that changed from the base of the PR and between 533d608 and 5bce804.

📒 Files selected for processing (9)
  • aioesphomeapi/_frame_helper/base.pxd
  • aioesphomeapi/_frame_helper/base.py
  • aioesphomeapi/_frame_helper/noise.pxd
  • aioesphomeapi/_frame_helper/noise.py
  • aioesphomeapi/_sanitize.pxd
  • aioesphomeapi/_sanitize.py
  • aioesphomeapi/discover.py
  • setup.py
  • tests/test_discover.py

Walkthrough

The PR centralizes mDNS label input sanitization by creating a shared _sanitize module, refactoring Cython exports to use it, and applying the sanitizer to discovery service fields to prevent terminal and log injection attacks from malformed peer labels.

Changes

Input Sanitization and Discovery Label Protection

Layer / File(s) Summary
Sanitizer module and Cython declaration
aioesphomeapi/_sanitize.py, aioesphomeapi/_sanitize.pxd
New _sanitize module exports safe_label_str() that removes non-printable characters and truncates strings, plus constants MAX_NAME_LEN, MAX_MAC_LEN, MAX_EXPLANATION_LEN. Cython declaration file exposes the function as cpdef with typed parameters.
Frame helper import refactoring
aioesphomeapi/_frame_helper/base.pxd, aioesphomeapi/_frame_helper/base.py, aioesphomeapi/_frame_helper/noise.pxd, aioesphomeapi/_frame_helper/noise.py
base.pxd cimports safe_label_str from _sanitize instead of declaring locally; base.py imports constants and function from _sanitize and re-exports via __all__. noise.pxd and noise.py source utilities directly from _sanitize, with updated comments describing the pure-Python fallback behavior.
Discovery label sanitization and length enforcement
aioesphomeapi/discover.py
Introduces decode_mdns_label_or_unknown() replacing decode_bytes_or_unknown(), which decodes bytes with UTF-8 replacement and applies safe_label_str() sanitization. Per-column display caps derived from FORMAT width are applied to service name prefix and mDNS properties (mac, version, platform, board) in async_service_update().
Cythonization setup and sanitizer validation
setup.py, tests/test_discover.py
_sanitize.py added to TO_CYTHONIZE list for optional Cython compilation. Comprehensive test suite validates decode_mdns_label_or_unknown() across input types (None, strings, bytes), UTF-8 handling, control character stripping, and limit enforcement; asserts per-column display caps match module constants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • esphome/aioesphomeapi#1664: Adds unit tests directly exercising the safe_label_str sanitization function and the mDNS discovery label handling that this PR centralizes.
  • esphome/aioesphomeapi#1656: Refactors Noise hello field sanitization using the same noise.py/pxd import wiring and shared MAX_* length caps that this PR consolidates.
  • esphome/aioesphomeapi#1660: Refactors Cython sanitizer plumbing for peer labels, specifically safe_label_str declarations and MAX constants in base.pxd/base.py that this PR centralizes into _sanitize.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch discover-sanitize-mdns-labels

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants