This document describes the current protection, fault, diagnostic, and transmission model in
ST-LIB.
It is intentionally split into two parts:
- How to use
- Internal development
If you only want to integrate protections into an application, read the first part only.
The subsystem has three explicit runtime operations:
- Board::init()
- Board::ProtectionEngine::evaluate() or Board::evaluate_protections()
- Diagnostics::Hub::flush()
If the application uses an operational state machine nested under the global runtime, it also polls:
FaultController::check_transitions()
The global fault model is always the same:
- the framework owns a global runtime with two states: OPERATIONAL and FAULT
- internally, the only way to enter the global FAULT state is FaultController::request_fault(...)
- protection faults, PANIC(...), and FAULT(...) all end up there
- fault diagnostics are transmitted with urgent priority through Diagnostics
flowchart TD
A["Declare protection rules"] --> B["Declare Board policy and request objects"]
B --> C["Board::init()"]
C --> D["while (1)"]
D --> E["FaultController::check_transitions()"]
D --> F["Board::evaluate_protections()"]
D --> G["Diagnostics::Hub::flush()"]
The application integration contract is:
Board<FaultPolicyT, dev0, dev1, ...>
Where:
- FaultPolicyT is mandatory and is always the first template argument
- dev0, dev1, ... are board declarations, including hardware requests and protection requests
- the framework always owns the top-level runtime machine
- the application may optionally provide a nested operational machine and/or a FAULT entry callback
Protections are compile-time board requests. A protection request:
- has a stable name encoded in the type
- reads from one sample source object or sample variable
- owns a fixed set of rules declared before runtime
- is passed to Board<...> with the rest of the board request objects
Rules are created through the factories in Protections::Rules and passed to
Protections::protection<"name", source>(...).
Available rule factories:
- Rules::below(...)
- Rules::above(...)
- Rules::range(...)
- Rules::equals(...)
- Rules::not_equals(...)
- Rules::time_accumulation(...)
Rule factories return std::expected; Protections::protection(...) unwraps them while building
the compile-time declaration. Invalid declarations fail during build or constant evaluation instead
of creating a partial runtime registry.
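The following minimal sketch illustrates the idea of rejecting a bad rule during constant evaluation. It is not the ST-LIB implementation: `SketchState`, `BelowRule`, and `make_below` are hypothetical names, `std::optional` stands in for `std::expected` so the sketch also compiles on pre-C++23 toolchains, and both the comparison direction and the "warning threshold must not be lower than the fault threshold" check are assumptions inferred from the factory name.

```cpp
#include <optional>

// Hypothetical result states for this sketch only.
enum class SketchState { Ok, Warning, Fault };

// Hypothetical "below" rule: assumed to fault when the sample drops under
// fault_threshold and to warn when it drops under warning_threshold.
struct BelowRule {
    float fault_threshold;
    float warning_threshold;

    constexpr SketchState evaluate(float sample) const {
        if (sample < fault_threshold) return SketchState::Fault;
        if (sample < warning_threshold) return SketchState::Warning;
        return SketchState::Ok;
    }
};

// Factory that reports an invalid configuration instead of building a rule.
// std::optional is a stand-in for std::expected in this sketch.
constexpr std::optional<BelowRule> make_below(float fault_threshold,
                                              float warning_threshold) {
    if (warning_threshold < fault_threshold) return std::nullopt;
    return BelowRule{fault_threshold, warning_threshold};
}

// An invalid declaration is detected at compile time, before any runtime state exists.
static_assert(make_below(350.0f, 370.0f).has_value());
static_assert(!make_below(370.0f, 350.0f).has_value());
```

Because the factory is `constexpr`, a declaration that unwraps an empty result can be turned into a build error rather than a partially populated registry.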
Current rule signatures are:
Rules::below(fault_threshold)
Rules::below(fault_threshold, warning_threshold)
Rules::above(fault_threshold)
Rules::above(fault_threshold, warning_threshold)
Rules::range(low_fault, high_fault)
Rules::range(low_fault, high_fault, low_warning, high_warning)
Rules::equals(value)
Rules::not_equals(value)
Rules::time_accumulation(fault_threshold, window_seconds)
Rules::time_accumulation(fault_threshold, warning_threshold, window_seconds)
Rules::time_accumulation(...) has these semantics:
- it is intended for floating-point samples
- it evaluates abs(sample)
- it measures continuous active time, not an integral over samples
- it resets the accumulated active time when the triggering condition clears
- it uses Scheduler::get_global_tick(), so it does not depend on the while (1) iteration rate
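The semantics above can be sketched as a small accumulator that measures continuous active time from timestamps. This is an illustration under stated assumptions, not the library code: `ActiveTimeAccumulator` is a hypothetical type, the tick is passed in as microseconds instead of being read from `Scheduler::get_global_tick()`, and the condition is assumed to be active while `abs(sample)` exceeds the threshold.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical illustration type; not part of ST-LIB.
struct ActiveTimeAccumulator {
    float threshold;                    // magnitude above which the condition is active (assumed)
    std::uint64_t window_us;            // continuous active time that trips the rule
    bool active = false;                // whether the condition is currently active
    std::uint64_t active_since_us = 0;  // tick at which the current active period began

    // Returns true once abs(sample) has stayed above the threshold for at
    // least window_us of continuous time.
    bool update(float sample, std::uint64_t now_us) {
        if (std::fabs(sample) <= threshold) {
            active = false;             // condition cleared: accumulated time resets
            return false;
        }
        if (!active) {
            active = true;
            active_since_us = now_us;   // start a new continuous period
        }
        return (now_us - active_since_us) >= window_us;
    }
};
```

Because the elapsed time comes from timestamps rather than from a call counter, calling `update` more or less often changes only the detection latency, not the measured duration.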
Declare protections at namespace scope and pass them to Board.
The intended lifecycle is:
- compile-time declaration
- Board::init()
- evaluation and flushing in the runtime loop
There is no runtime registration phase and no mutable protection registry. Board derives a
board-specific ProtectionEngine type from the protection requests it receives, initializes it from
Board::init(), and then starts the global fault runtime.
#include "ST-LIB.hpp"
using namespace ST_LIB;
constexpr auto led = DigitalOutputDomain::DigitalOutput(PF13);
float bus_voltage = 0.0f;
inline constexpr auto bus_voltage_protection = Protections::protection<"bus_voltage", bus_voltage>(
Protections::Rules::below(350.0f, 370.0f),
Protections::Rules::time_accumulation(20.0f, 15.0f, 0.5f)
);
using MainBoard = Board<DefaultFaultPolicy, bus_voltage_protection, led>;
int main() {
    MainBoard::init();
    while (1) {
        MainBoard::evaluate_protections();
        Diagnostics::Hub::flush();
    }
}
Board::init() always installs and starts the global fault runtime.
That runtime has two states:
- OPERATIONAL
- FAULT
If the application does not use a functional state machine, nothing else is required.
Typical choices are:
- Board<DefaultFaultPolicy, ...> when no extra fault callback is needed
- Board<FaultPolicyNoMachine<on_fault_enter>, ...> when only FAULT entry actions are needed
- Board<FaultPolicy<app_machine, on_fault_enter>, ...> when both a nested operational machine and FAULT entry actions are needed
If the application does use a functional state machine, it can be nested inside OPERATIONAL
through a FaultPolicy.
Example:
enum class AppState : uint8_t { IDLE = 0, RUN = 1 };
static constexpr auto idle_state = make_state(AppState::IDLE);
static constexpr auto run_state = make_state(AppState::RUN);
static inline auto app_machine = make_state_machine(AppState::IDLE, idle_state, run_state);
static void on_fault_enter() {
    // disable power stage, set LEDs, open contactors, etc.
}
using MainBoard = Board<FaultPolicy<app_machine, on_fault_enter>, led>;
int main() {
    MainBoard::init();
    while (1) {
        FaultController::check_transitions();
        MainBoard::evaluate_protections();
        Diagnostics::Hub::flush();
    }
}
Important rules:
- the user state machine models operational behavior only
- the user does not program transitions to the global FAULT
- if a fatal condition must force the system into FAULT, user code should use PANIC(...) or FAULT(...)
- if a nested operational state machine is used, poll FaultController::check_transitions(), not the child machine directly
- Board takes the fault policy type as its first template argument
on_fault_enter semantics:
- it is an optional callback owned by the global fault runtime
- it runs when the global runtime enters FAULT
- it is the right place to perform application fault-entry actions such as disabling power stages, opening contactors, or setting status LEDs
- it does not replace the fault transition itself; it is an enter action attached to the global FAULT state
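The relationship between the global runtime, a nested operational machine, and the FAULT enter action can be sketched as follows. This is a simplified model, not the ST-LIB implementation: `SketchRuntime`, `GlobalState`, and the plain function-pointer callbacks are hypothetical, and the sketch assumes the transition happens immediately inside `request_fault()`.

```cpp
// Hypothetical two-state global runtime for illustration only.
enum class GlobalState { Operational, Fault };

class SketchRuntime {
public:
    using Callback = void (*)();

    SketchRuntime(Callback poll_child, Callback on_fault_enter)
        : poll_child_(poll_child), on_fault_enter_(on_fault_enter) {}

    // Single entry point into FAULT; the enter action runs once, on entry.
    void request_fault() {
        if (state_ == GlobalState::Fault) return;
        state_ = GlobalState::Fault;
        if (on_fault_enter_ != nullptr) on_fault_enter_();
    }

    // The nested operational machine is polled only while OPERATIONAL,
    // so the application never polls the child machine directly.
    void check_transitions() {
        if (state_ == GlobalState::Operational && poll_child_ != nullptr) {
            poll_child_();
        }
    }

    GlobalState state() const { return state_; }

private:
    Callback poll_child_;
    Callback on_fault_enter_;
    GlobalState state_ = GlobalState::Operational;
};
```

In this model the child machine has no transition to FAULT of its own; it simply stops being polled once the global runtime leaves OPERATIONAL.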
If the application needs neither a nested machine nor a FAULT entry action, use:
using MainBoard = Board<DefaultFaultPolicy, led>;
The runtime diagnostic façade is:
- PANIC(...)
- FAULT(...)
- WARNING(...)
- INFO(...)
Their semantics are:
- PANIC(...): fatal runtime/internal error, enters the global FAULT
- FAULT(...): fatal domain/application fault, enters the global FAULT
- WARNING(...): non-fatal diagnostic
- INFO(...): informational diagnostic
PANIC(...) and FAULT(...) both call the same global fault path underneath.
The difference is semantic classification of the cause and diagnostic category.
Internally, protections and fatal runtime reporters converge on:
FaultController::request_fault(cause);
This primitive is not part of the normal user-facing API.
In the current implementation it is an internal FaultController entry point, not a public
application hook.
User code should prefer FAULT(...) or PANIC(...) so the library captures consistent source
metadata and preserves the public runtime contract.
In practice:
- protections use FaultController::request_fault(...) internally
- PANIC(...) and FAULT(...) use that same path internally
- user application code should not call request_fault(...) directly
All external reporting goes through Diagnostics.
There is no separate fault-broadcast subsystem anymore.
The transmission model is:
- normal diagnostics are queued with NORMAL priority
- faults are published with URGENT priority
- Diagnostics::Hub::flush() always drains urgent records first
Default sinks are installed during Board::init():
- UART sink when UART printing is available
- TCP sink when STLIB_ETH is enabled
If a transport is not compiled in, it is simply not installed.
If you are migrating from the previous architecture:
- stop using ProtectionManager
- stop using the low/high protection split
- stop using Boundary/BoundaryInterface as the protection integration model
- stop depending on FaultRuntime
- stop treating STLIB::start(), STLIB::update(), STLIB_LOW::start(), or STLIB_HIGH::start() as the real bootstrap path
- move bootstrap to Board::init()
- declare Board<fault_policy, ...> explicitly
- move operational user behavior into FaultPolicy<app_machine, on_fault_enter> when needed
- stop programming transitions to the global FAULT
- replace legacy reporting paths with PANIC(...), FAULT(...), WARNING(...), and INFO(...)
The design is split into four concerns:
- protections: evaluate domain rules over samples
- faulting: control the global OPERATIONAL/FAULT runtime
- diagnostics: store and dispatch structured records
- transport: serialize and emit diagnostics
flowchart LR
A["ProtectionEngine"] --> B["FaultController::request_fault(...)"]
A --> C["Diagnostics::Hub"]
D["PANIC / FAULT"] --> B
E["WARNING / INFO"] --> C
B --> F["FaultCause"]
F --> G["FaultDiagnosticMapper"]
G --> C
C --> H["DiagnosticSink"]
H --> I["UART / TCP"]
The key boundaries are:
- protections do not know transport
- diagnostics do not own or evaluate protections
- FaultController does not own sinks
- transport does not change system state
Public API:
- Protections::protection<"name", source>(...)
- Board::ProtectionEngine
- Board::evaluate_protections()
- Protections::Rules::*
Internal model:
- one board-specific compile-time collection of protections
- no low/high frequency split in the domain model
- rule configuration returned through std::expected
- rule evaluation produces RuleState, RuleEdge, and RuleSnapshot
Supported rule kinds:
- BELOW
- ABOVE
- RANGE
- EQUALS
- NOT_EQUALS
- TIME_ACCUMULATION
TIME_ACCUMULATION uses Scheduler::get_global_tick() to measure real elapsed time.
It no longer assumes a fixed evaluation rate.
Board::ProtectionEngine::evaluate():
- walks every protection
- publishes non-fatal rule edges through Diagnostics
- requests the global fault when a rule reaches FAULT
- throttles repeated fault notifications with notify_delay_in_microseconds
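The throttling step in the list above can be pictured with a minimal sketch. `NotifyThrottle` and `should_notify` are hypothetical names, and the policy shown (first notification passes, later ones pass only after the delay has elapsed since the last one) is an assumption about how `notify_delay_in_microseconds` is applied, not a description of the actual engine.

```cpp
#include <cstdint>

// Hypothetical throttle for repeated notifications; illustration only.
struct NotifyThrottle {
    std::uint64_t notify_delay_in_microseconds;  // minimum spacing between notifications
    bool has_notified = false;                   // whether anything was emitted yet
    std::uint64_t last_notify_us = 0;            // tick of the last emitted notification

    // Returns true when a notification may be emitted at now_us.
    bool should_notify(std::uint64_t now_us) {
        if (has_notified &&
            (now_us - last_notify_us) < notify_delay_in_microseconds) {
            return false;                        // still inside the quiet period
        }
        has_notified = true;
        last_notify_us = now_us;
        return true;
    }
};
```

A throttle of this kind limits only the repeated notifications; it does not delay the fault request itself.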
FaultController owns the global runtime state machine.
The runtime machine is:
- always present
- always two-state: OPERATIONAL/FAULT
- optionally composed with a nested operational machine through FaultPolicy
Responsibilities of FaultController:
- own and start the global runtime
- latch the first FaultCause
- request transition to the global FAULT
- execute the user on_fault_enter callback through the FAULT state enter action
- publish the fault diagnostic with urgent priority
Important invariant:
- entering the global FAULT never depends on transport delivery succeeding
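A minimal sketch of the latch-first behaviour and the invariant above, under stated assumptions: `SketchFaultController`, `SketchCause`, and the function-pointer publisher are hypothetical, and the real controller's ordering of its internal steps is not confirmed here. What the sketch shows is that the state change does not consult the result of publishing.

```cpp
#include <optional>

// Hypothetical types for illustration only.
enum class RuntimeState { Operational, Fault };

struct SketchCause {
    int code;
};

class SketchFaultController {
public:
    // Returns false when delivery fails; the controller ignores the result.
    using Publisher = bool (*)(const SketchCause&);

    explicit SketchFaultController(Publisher publish) : publish_(publish) {}

    void request_fault(SketchCause cause) {
        const bool first = !latched_.has_value();
        if (first) latched_ = cause;       // only the first cause is kept
        state_ = RuntimeState::Fault;      // state change never waits on transport
        if (first && publish_ != nullptr) {
            (void)publish_(*latched_);     // delivery outcome is deliberately ignored
        }
    }

    RuntimeState state() const { return state_; }
    const std::optional<SketchCause>& latched_cause() const { return latched_; }

private:
    Publisher publish_;
    std::optional<SketchCause> latched_;
    RuntimeState state_ = RuntimeState::Operational;
};
```

Even with a publisher that always fails, the sketch ends in the fault state with the first cause latched.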
The fault path is valid during Board::init().
That is why Board::init() installs, as early as possible in the bootstrap path:
- default diagnostic sinks
- the global fault runtime
before clock/peripheral setup and before subsystem initialization that may trigger PANIC(...).
If a fatal request arrives before the global runtime has been started:
- the cause is latched
- the runtime is rebuilt so that it starts directly in FAULT
- the urgent fault diagnostic is still published through Diagnostics
If the diagnostic record is produced before any sink exists, it is still retained in local history. When the first sink is installed, the retained history is replayed into the pending queue so the record can still be delivered later.
This avoids losing early boot faults and other pre-transport diagnostics.
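The retain-then-replay behaviour can be sketched with two fixed buffers. This is a simplified model and not the real hub: `EarlyBootHub` is a hypothetical class, records are reduced to integer ids, the history is a linear buffer that stops accepting entries when full (the real one is described as a ring), and a single function-pointer sink is assumed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical hub that keeps early records until a sink appears.
class EarlyBootHub {
public:
    static constexpr std::size_t kCapacity = 8;
    using Sink = void (*)(int record_id);

    // Every record goes to history; it is queued for delivery only if a sink exists.
    void publish(int record_id) {
        append(history_, history_count_, record_id);
        if (sink_ != nullptr) append(pending_, pending_count_, record_id);
    }

    // Installing the first sink replays the retained history into the pending queue.
    void install_sink(Sink sink) {
        const bool first_sink = (sink_ == nullptr);
        sink_ = sink;
        if (first_sink) {
            for (std::size_t i = 0; i < history_count_; ++i) {
                append(pending_, pending_count_, history_[i]);
            }
        }
    }

    // Delivers all pending records; returns how many were delivered.
    std::size_t flush() {
        if (sink_ == nullptr) return 0;
        const std::size_t delivered = pending_count_;
        for (std::size_t i = 0; i < pending_count_; ++i) sink_(pending_[i]);
        pending_count_ = 0;
        return delivered;
    }

private:
    using Buffer = std::array<int, kCapacity>;

    // The sketch silently drops records once a buffer is full.
    static void append(Buffer& buffer, std::size_t& count, int record_id) {
        if (count < kCapacity) buffer[count++] = record_id;
    }

    Buffer history_{};
    Buffer pending_{};
    std::size_t history_count_ = 0;
    std::size_t pending_count_ = 0;
    Sink sink_ = nullptr;
};
```

A record published before any sink exists is not delivered by `flush()` at that point, but it is delivered on the first `flush()` after a sink is installed.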
FaultCause is not a DiagnosticRecord.
FaultCause is the control-plane object for fatal conditions.
It stores:
- fault kind
- stable origin
- runtime fault payload or protection fault payload
Diagnostics::DiagnosticRecord is the reporting-plane object.
Conversion between both is explicit:
- FaultController latches and operates on FaultCause
- FaultDiagnosticMapper converts FaultCause into a DiagnosticRecord
- Diagnostics::Hub only stores and delivers DiagnosticRecord
This keeps the global fault runtime independent from the storage and transport shape of diagnostics.
Diagnostics::Hub is a fixed-capacity internal event bus.
Main types:
- DiagnosticRecord
- RuntimeDiagnosticPayload
- ProtectionDiagnosticPayload
- DiagnosticSink
Supporting components:
- RecordFactory
- DiagnosticFormatter
- DiagnosticTimestampProvider
Memory policy:
- fixed sink storage
- fixed history ring
- fixed pending queue
- no heap in publish()
- no heap in flush()
Priority policy:
- NORMAL
- URGENT
flush_urgent() drains urgent records only.
flush() drains urgent first and then normal records.
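The priority policy can be sketched with two fixed-capacity queues and no heap allocation. The names `FixedQueue`, `SketchHub`, and `Priority` are hypothetical, records are reduced to integer ids, and the sink is any callable; the real hub's storage layout and overflow handling are not shown in this document.

```cpp
#include <array>
#include <cstddef>

enum class Priority { Normal, Urgent };

// Hypothetical fixed-capacity FIFO backed by std::array (no heap).
template <typename T, std::size_t N>
class FixedQueue {
public:
    bool push(const T& value) {
        if (count_ == N) return false;           // full: the record is rejected
        items_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    bool pop(T& out) {
        if (count_ == 0) return false;
        out = items_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

private:
    std::array<T, N> items_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Hypothetical hub with one queue per priority.
template <std::size_t N>
class SketchHub {
public:
    bool publish(int record_id, Priority priority) {
        return (priority == Priority::Urgent ? urgent_ : normal_).push(record_id);
    }

    // Delivers urgent records only; returns how many were delivered.
    template <typename Sink>
    std::size_t flush_urgent(Sink&& sink) {
        return drain(urgent_, sink);
    }

    // Delivers urgent records first, then normal records.
    template <typename Sink>
    std::size_t flush(Sink&& sink) {
        const std::size_t urgent = drain(urgent_, sink);
        return urgent + drain(normal_, sink);
    }

private:
    template <typename Sink>
    static std::size_t drain(FixedQueue<int, N>& queue, Sink& sink) {
        std::size_t delivered = 0;
        int record_id = 0;
        while (queue.pop(record_id)) {
            sink(record_id);
            ++delivered;
        }
        return delivered;
    }

    FixedQueue<int, N> urgent_;
    FixedQueue<int, N> normal_;
};
```

With this layout an urgent record published after several normal ones is still delivered first on the next `flush()`.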
The runtime reporters are intentionally thin façades.
PANIC(...), FAULT(...), WARNING(...), and INFO(...):
- capture source metadata with std::source_location
- format the runtime message into a fixed stack buffer
- publish a diagnostic or request a fault
They do not use shared mutable metadata anymore.
That keeps the reporting path reentrant and removes the old SetMetadata + Trigger split.
DiagnosticTimestampProvider does not start RTC services from the diagnostic hot path.
If RTC is already running and has valid time, records use RTC timestamp data. Otherwise, diagnostics fall back to uptime when available.
This avoids recursive or bootstrap-dependent fatal paths while timestamping diagnostics.
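The fallback order can be sketched as a pure selection function. `select_timestamp`, `SketchTimestamp`, and `TimestampSource` are hypothetical names, and the units (RTC seconds, uptime milliseconds) are assumptions made only for the sketch. The caller is expected to pass an empty optional when a clock is not running or has no valid value, so the function itself never starts any service.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical timestamp model for illustration only.
enum class TimestampSource { Rtc, Uptime, None };

struct SketchTimestamp {
    TimestampSource source;
    std::uint64_t value;
};

// Prefers RTC time when it is already available, then uptime, then nothing.
constexpr SketchTimestamp select_timestamp(std::optional<std::uint64_t> rtc_seconds,
                                           std::optional<std::uint64_t> uptime_ms) {
    if (rtc_seconds.has_value()) return {TimestampSource::Rtc, *rtc_seconds};
    if (uptime_ms.has_value()) return {TimestampSource::Uptime, *uptime_ms};
    return {TimestampSource::None, 0};
}
```

Keeping the selection free of side effects is what prevents timestamping from becoming a fatal path of its own during bootstrap.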
This subsystem uses a narrow set of modern C++ features (C++17 through C++23) where they provide direct value:
- std::expected for explicit rule configuration failure
- std::variant and std::visit for static rule composition
- concepts to constrain rule factories and sample sources
- std::source_location for runtime reporter metadata without mutable globals
- std::to_underlying for transport encoding
- std::byteswap and std::endian in the TCP diagnostic encoder
- std::span<std::byte> for fixed binary transport encoding
The intent is not to maximize feature usage. The intent is to improve:
- correctness
- API clarity
- determinism
- suitability for embedded firmware
The subsystem is expected to preserve these invariants:
- no heap in protection evaluation
- no heap in diagnostic publish/flush
- no shared mutable metadata in runtime reporters
- no transport logic inside protection rules
- no separate fault-broadcast path outside Diagnostics
- fixed-capacity storage for protections, sinks, history, and pending queue
- explicit lifecycle: declare at compile time, init, evaluate, flush