feat(error): add ErrorClassification trait for triage-friendly Error<Kind> categories#1267
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a lightweight, opt-in error triage mechanism by introducing a coarse ErrorCategory enum and an ErrorClassification trait in ironrdp-error, then wiring initial implementations for ironrdp-core’s DecodeErrorKind and EncodeErrorKind (and re-exporting the API from ironrdp-core).
Changes:
- Introduces
ironrdp_error::ErrorCategoryandironrdp_error::ErrorClassification, plusError<Kind>::classify()whenKind: ErrorClassification. - Implements
ErrorClassificationforironrdp_core::DecodeErrorKindandironrdp_core::EncodeErrorKind. - Re-exports
ErrorCategoryandErrorClassificationfromironrdp-coreto keep the common import path for consumers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/ironrdp-error/src/lib.rs | Defines ErrorCategory, ErrorClassification, and adds Error<Kind>::classify() forwarding to the inner kind. |
| crates/ironrdp-core/src/lib.rs | Re-exports ErrorCategory/ErrorClassification for easier downstream consumption via ironrdp-core. |
| crates/ironrdp-core/src/decode.rs | Classifies DecodeErrorKind structured variants as Protocol, with Other as Unknown. |
| crates/ironrdp-core/src/encode.rs | Classifies EncodeErrorKind structured variants as InternalBug, with Other as Unknown. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d48f20e to
c336d58
Compare
…Kind> categories
Introduces ironrdp_error::{ErrorCategory, ErrorClassification} so consumers
can route errors by coarse category (Protocol, DataCorruption, InternalBug,
Unknown) without inspecting Display strings. Adds a convenience
Error<Kind>::classify() method that is available when Kind: ErrorClassification.
Implements the trait for the two foundational Kind types in ironrdp-core:
DecodeErrorKind classifies all structured variants as Protocol (peer-driven
wire-format failures) and Other as Unknown; EncodeErrorKind classifies all
structured variants as InternalBug (we miscalculated something in our own
serialisation path) and Other as Unknown. ironrdp-core re-exports the trait
and enum so consumers can pull them via the same crate they already use for
DecodeError/EncodeError type aliases.
Primary consumer is the structured-fuzzing oracle code in ironrdp-fuzzing
tracked under Devolutions#1120: categories let an oracle distinguish "decoder
correctly rejected malformed input" from "decoder hit an internal invariant
violation" without inspecting Display strings. The DataCorruption category
is defined now and reserved for the deeper-validation variants that other
*ErrorKind enums in the workspace will adopt in follow-on PRs.
Refs Devolutions#1257, Devolutions#1120.
c336d58 to
fab5f54
Compare
There was a problem hiding this comment.
Thanks for the careful write-up, and the relationship-to-other-PRs table is genuinely helpful.
Before going further on the trait shape, I want to push back on the premise, because I think it leads to a different and ultimately more useful PR.
First, the InternalBug category encodes a pattern I'd rather discourage. My general stance is that bug-shaped conditions, internal invariant violations, "this can't happen," miscalculated buffer sizes on the encode path, should be debug_assert! / assert! / panic!, not Err variants. The rationale is:
- The fuzzer detects panics natively. No oracle classification is needed: an
Erris an expected rejection, a SIGABRT is a finding. That's the entire distinction we need for fuzzing. debug_assert!costs nothing in release and turns every test + fuzz iteration into a free invariant verifier.- Returning
Errfor a bug is the worst-of-both-worlds middle: no fuzzer signal, no clean crash, and every caller now has to handle an error that, by definition, shouldn't happen.
Process resilience for long-running servers, the obvious objection, is better solved by a catch_unwind boundary at the per-connection scope than by typed error categories. That gives us both properties at once: assert! discipline in code, isolation in production.
Under this lens, the InternalBug category in this PR has no legitimate producer in a codebase that uses asserts properly, and shipping the category invites future code to create producers, which is the wrong direction.
My counter-proposal is: turn this into an audit pass.
I think the better long-term shape for IronRDP is:
- Audit every
*ErrorKindvariant. For each one whose producer is "we, ourselves, hit an internal contradiction," replace the producing call site withdebug_assert!(orassert!where the invariant is load-bearing enough to keep in release). - Keep a small number of well-placed input guards at API boundaries, particularly the FFI surface, that return
Errrather than unwinding. Unwinding into C is unsafe and "panic = abort" consumers exist; we owe those callers a typed error. But these are deliberately narrow and live at the boundary, not scattered through the parsing code.
How does that sound?
Greg Lamberson (glamberson)
left a comment
There was a problem hiding this comment.
You've convinced me. Both the panic-as-bug-signal argument and the InternalBug-invites-bad-code argument are right, and the trait is the wrong shape for what we actually need.
I'll close this PR and open a tracking issue for the audit pass you described:
- Walk every
*ErrorKindvariant whose producer is "we hit an internal contradiction" and convert the producing call site todebug_assert!(orassert!where the invariant should hold in release). - Once producers are gone, the variants themselves become removable.
- Keep narrow API-boundary input guards as the only
Err-returning case for internal-bug-shaped conditions.
I'll likely break this into per-crate PRs to keep them reviewable, starting with ironrdp-pdu and ironrdp-core since those have the most variant surface.
On the catch_unwind boundary at per-connection scope: I'll surface that as a server-side follow-up separately, since it provides the long-running-process resilience property the typed-category approach was partially trying to satisfy. Different concern, worth its own tracking.
|
Closing in favor of the audit-pass direction. Tracking issue at #1274. |
Summary
Introduces
ironrdp_error::ErrorClassification, a trait that assigns each*ErrorKindvariant a coarse [ErrorCategory] (one ofProtocol,DataCorruption,InternalBug,Unknown). A convenienceError<Kind>::classify()method becomes available onError<Kind>whenKind: ErrorClassification, forwarding to the inner kind.Implements the trait for
DecodeErrorKindandEncodeErrorKindinironrdp-core, and re-exportsErrorCategoryandErrorClassificationfromironrdp-coreso consumers can pull them via the same crate they already use forDecodeError/EncodeError.Closes out the third of the substantive ironrdp-error follow-ons from #1257 (after #1262 location capture, #1263 bail!/ensure! macros, and the position-aware metadata in #1266). Primary consumer is the structured-fuzzing oracle code in #1120.
Why this matters for fuzzing
Structured-fuzz oracles need to distinguish:
Protocol,DataCorruption).InternalBug).Today, an oracle can only differentiate by parsing the
Displaystring of an error, which is fragile and tightly couples oracle code to formatting choices. After this PR, the oracle callserror.classify()and matches on an enum.API
Initial impls in ironrdp-core
OtherDecodeErrorKindProtocol(peer-driven wire-format failures)UnknownEncodeErrorKindInternalBug(we miscalculated something in our own serialisation path)UnknownOtheris intentionallyUnknownbecause it carries a free-formdescription: &'static strand cannot be classified without inspecting that string at compile time.DataCorruptionis defined now but not used byDecodeErrorKind/EncodeErrorKindbecause those enums only carry low-level wire-format variants. It is reserved for the deeper-validation variants that other*ErrorKindenums in the workspace will adopt in follow-on PRs (e.g. session-layer integrity checks, license validation, capability negotiation mismatches).Adoption pattern for other crates
The trait is intentionally not given a default impl returning
Unknown. Per-crate adoption is opt-in: an*ErrorKindthat wants classification implements the trait, andError<Kind>::classify()becomes available onError<Kind>accordingly. Kinds that have not been audited yet simply do not expose the convenience method. This forces classification work to be deliberate rather than silently degrading toUnknownfor unaudited variants.Scope
Strictly additive: no existing
*ErrorKindenum signature changes, no existing constructor changes, noDisplay/Debugchanges. The new trait method is purely opt-in for consumers who need categorisation.This PR limits the per-crate impl work to
ironrdp-core's two foundational kinds. Follow-on PRs may extend the impl to other workspace*ErrorKindtypes as their owners audit them.Diff
crates/ironrdp-error/src/lib.rs(+73):ErrorCategoryenum,ErrorClassificationtrait,Error<Kind>::classify()convenience impl blockcrates/ironrdp-core/src/lib.rs(+2): re-exportsErrorCategoryandErrorClassificationcrates/ironrdp-core/src/decode.rs(+19):ErrorClassificationimpl forDecodeErrorKindcrates/ironrdp-core/src/encode.rs(+20):ErrorClassificationimpl forEncodeErrorKindTotal: 4 files, +114, 0 deletions.
CI
cargo xtask check fmt | lints | locks | typos | testsall pass locally.ironrdp-corebuilds clean under--no-default-features(no_std) and--all-features.Relationship to the other open error PRs
thiserrordependency fromironrdp-pduby hand-rolling Display/Error implsError<Kind>ErrorClassificationtrait + category enum + initial implsAll three are independent and can land in any order.