Skip to content

feat(error): add ErrorClassification trait for triage-friendly Error<Kind> categories#1267

Closed
Greg Lamberson (glamberson) wants to merge 1 commit into
Devolutions:masterfrom
lamco-admin:feat/ironrdp-error-classification
Closed

feat(error): add ErrorClassification trait for triage-friendly Error<Kind> categories#1267
Greg Lamberson (glamberson) wants to merge 1 commit into
Devolutions:masterfrom
lamco-admin:feat/ironrdp-error-classification

Conversation

@glamberson
Copy link
Copy Markdown
Contributor

Summary

Introduces ironrdp_error::ErrorClassification, a trait that assigns each *ErrorKind variant a coarse [ErrorCategory] (one of Protocol, DataCorruption, InternalBug, Unknown). A convenience Error<Kind>::classify() method becomes available on Error<Kind> when Kind: ErrorClassification, forwarding to the inner kind.

Implements the trait for DecodeErrorKind and EncodeErrorKind in ironrdp-core, and re-exports ErrorCategory and ErrorClassification from ironrdp-core so consumers can pull them via the same crate they already use for DecodeError / EncodeError.

Closes out the third of the substantive ironrdp-error follow-ons from #1257 (after #1262 location capture, #1263 bail!/ensure! macros, and the position-aware metadata in #1266). Primary consumer is the structured-fuzzing oracle code in #1120.

Why this matters for fuzzing

Structured-fuzz oracles need to distinguish:

  • Expected outcome: the decoder correctly rejected a malformed input (Protocol, DataCorruption).
  • Bug signal: the decoder reached an internal invariant violation that should not happen for any input (InternalBug).

Today, an oracle can only differentiate by parsing the Display string of an error, which is fragile and tightly couples oracle code to formatting choices. After this PR, the oracle calls error.classify() and matches on an enum.

API

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[non_exhaustive]
pub enum ErrorCategory {
    Protocol,
    DataCorruption,
    InternalBug,
    Unknown,
}

pub trait ErrorClassification {
    fn classify(&self) -> ErrorCategory;
}

impl<Kind: ErrorClassification> Error<Kind> {
    pub fn classify(&self) -> ErrorCategory;
}

Initial impls in ironrdp-core

Kind Structured variants Other
DecodeErrorKind Protocol (peer-driven wire-format failures) Unknown
EncodeErrorKind InternalBug (we miscalculated something in our own serialisation path) Unknown

Other is intentionally Unknown because it carries a free-form description: &'static str and cannot be classified without inspecting that string at compile time.

DataCorruption is defined now but not used by DecodeErrorKind / EncodeErrorKind because those enums only carry low-level wire-format variants. It is reserved for the deeper-validation variants that other *ErrorKind enums in the workspace will adopt in follow-on PRs (e.g. session-layer integrity checks, license validation, capability negotiation mismatches).

Adoption pattern for other crates

The trait is intentionally not given a default impl returning Unknown. Per-crate adoption is opt-in: an *ErrorKind that wants classification implements the trait, and Error<Kind>::classify() becomes available on Error<Kind> accordingly. Kinds that have not been audited yet simply do not expose the convenience method. This forces classification work to be deliberate rather than silently degrading to Unknown for unaudited variants.

Scope

Strictly additive: no existing *ErrorKind enum signature changes, no existing constructor changes, no Display / Debug changes. The new trait method is purely opt-in for consumers who need categorisation.

This PR limits the per-crate impl work to ironrdp-core's two foundational kinds. Follow-on PRs may extend the impl to other workspace *ErrorKind types as their owners audit them.

Diff

  • crates/ironrdp-error/src/lib.rs (+73): ErrorCategory enum, ErrorClassification trait, Error<Kind>::classify() convenience impl block
  • crates/ironrdp-core/src/lib.rs (+2): re-exports ErrorCategory and ErrorClassification
  • crates/ironrdp-core/src/decode.rs (+19): ErrorClassification impl for DecodeErrorKind
  • crates/ironrdp-core/src/encode.rs (+20): ErrorClassification impl for EncodeErrorKind

Total: 4 files, +114, 0 deletions.

CI

cargo xtask check fmt | lints | locks | typos | tests all pass locally. ironrdp-core builds clean under --no-default-features (no_std) and --all-features.

Relationship to the other open error PRs

PR What it does Independent?
#1264 Removes the existing thiserror dependency from ironrdp-pdu by hand-rolling Display/Error impls Yes, independent
#1266 Adds optional byte-offset + symbolic field-context metadata to Error<Kind> Yes, independent
#1267 (this) Adds ErrorClassification trait + category enum + initial impls Yes, independent

All three are independent and can land in any order.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a lightweight, opt-in error triage mechanism by introducing a coarse ErrorCategory enum and an ErrorClassification trait in ironrdp-error, then wiring initial implementations for ironrdp-core’s DecodeErrorKind and EncodeErrorKind (and re-exporting the API from ironrdp-core).

Changes:

  • Introduces ironrdp_error::ErrorCategory and ironrdp_error::ErrorClassification, plus Error<Kind>::classify() when Kind: ErrorClassification.
  • Implements ErrorClassification for ironrdp_core::DecodeErrorKind and ironrdp_core::EncodeErrorKind.
  • Re-exports ErrorCategory and ErrorClassification from ironrdp-core to keep the common import path for consumers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
crates/ironrdp-error/src/lib.rs Defines ErrorCategory, ErrorClassification, and adds Error<Kind>::classify() forwarding to the inner kind.
crates/ironrdp-core/src/lib.rs Re-exports ErrorCategory/ErrorClassification for easier downstream consumption via ironrdp-core.
crates/ironrdp-core/src/decode.rs Classifies DecodeErrorKind structured variants as Protocol, with Other as Unknown.
crates/ironrdp-core/src/encode.rs Classifies EncodeErrorKind structured variants as InternalBug, with Other as Unknown.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/ironrdp-error/src/lib.rs Outdated
@glamberson Greg Lamberson (glamberson) force-pushed the feat/ironrdp-error-classification branch from d48f20e to c336d58 Compare May 13, 2026 10:25
…Kind> categories

Introduces ironrdp_error::{ErrorCategory, ErrorClassification} so consumers
can route errors by coarse category (Protocol, DataCorruption, InternalBug,
Unknown) without inspecting Display strings. Adds a convenience
Error<Kind>::classify() method that is available when Kind: ErrorClassification.

Implements the trait for the two foundational Kind types in ironrdp-core:
DecodeErrorKind classifies all structured variants as Protocol (peer-driven
wire-format failures) and Other as Unknown; EncodeErrorKind classifies all
structured variants as InternalBug (we miscalculated something in our own
serialisation path) and Other as Unknown. ironrdp-core re-exports the trait
and enum so consumers can pull them via the same crate they already use for
DecodeError/EncodeError type aliases.

Primary consumer is the structured-fuzzing oracle code in ironrdp-fuzzing
tracked under Devolutions#1120: categories let an oracle distinguish "decoder
correctly rejected malformed input" from "decoder hit an internal invariant
violation" without inspecting Display strings. The DataCorruption category
is defined now and reserved for the deeper-validation variants that other
*ErrorKind enums in the workspace will adopt in follow-on PRs.

Refs Devolutions#1257, Devolutions#1120.
@glamberson Greg Lamberson (glamberson) force-pushed the feat/ironrdp-error-classification branch from c336d58 to fab5f54 Compare May 14, 2026 11:34
Copy link
Copy Markdown
Member

@CBenoit Benoît Cortier (CBenoit) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful write-up, and the relationship-to-other-PRs table is genuinely helpful.

Before going further on the trait shape, I want to push back on the premise, because I think it leads to a different and ultimately more useful PR.

First, the InternalBug category encodes a pattern I'd rather discourage. My general stance is that bug-shaped conditions, internal invariant violations, "this can't happen," miscalculated buffer sizes on the encode path, should be debug_assert! / assert! / panic!, not Err variants. The rationale is:

  • The fuzzer detects panics natively. No oracle classification is needed: an Err is an expected rejection, a SIGABRT is a finding. That's the entire distinction we need for fuzzing.
  • debug_assert! costs nothing in release and turns every test + fuzz iteration into a free invariant verifier.
  • Returning Err for a bug is the worst-of-both-worlds middle: no fuzzer signal, no clean crash, and every caller now has to handle an error that, by definition, shouldn't happen.

Process resilience for long-running servers, the obvious objection, is better solved by a catch_unwind boundary at the per-connection scope than by typed error categories. That gives us both properties at once: assert! discipline in code, isolation in production.

Under this lens, the InternalBug category in this PR has no legitimate producer in a codebase that uses asserts properly, and shipping the category invites future code to create producers, which is the wrong direction.

My counter-proposal is: turn this into an audit pass.

I think the better long-term shape for IronRDP is:

  1. Audit every *ErrorKind variant. For each one whose producer is "we, ourselves, hit an internal contradiction," replace the producing call site with debug_assert! (or assert! where the invariant is load-bearing enough to keep in release).
  2. Keep a small number of well-placed input guards at API boundaries, particularly the FFI surface, that return Err rather than unwinding. Unwinding into C is unsafe and "panic = abort" consumers exist; we owe those callers a typed error. But these are deliberately narrow and live at the boundary, not scattered through the parsing code.

How does that sound?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've convinced me. Both the panic-as-bug-signal argument and the InternalBug-invites-bad-code argument are right, and the trait is the wrong shape for what we actually need.

I'll close this PR and open a tracking issue for the audit pass you described:

  1. Walk every *ErrorKind variant whose producer is "we hit an internal contradiction" and convert the producing call site to debug_assert! (or assert! where the invariant should hold in release).
  2. Once producers are gone, the variants themselves become removable.
  3. Keep narrow API-boundary input guards as the only Err-returning case for internal-bug-shaped conditions.

I'll likely break this into per-crate PRs to keep them reviewable, starting with ironrdp-pdu and ironrdp-core since those have the most variant surface.

On the catch_unwind boundary at per-connection scope: I'll surface that as a server-side follow-up separately, since it provides the long-running-process resilience property the typed-category approach was partially trying to satisfy. Different concern, worth its own tracking.

@glamberson
Copy link
Copy Markdown
Contributor Author

Closing in favor of the audit-pass direction. Tracking issue at #1274.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants