Skip to content

Simplify resilient wrapper: fail-fast → communication error → outage error#381

Merged
MaStr merged 6 commits into
mainfrom
claude/exciting-ptolemy-qxaxbn
Jun 15, 2026
Merged

Simplify resilient wrapper: fail-fast → communication error → outage error#381
MaStr merged 6 commits into
mainfrom
claude/exciting-ptolemy-qxaxbn

Conversation

@MaStr

@MaStr MaStr commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Summary

Refactor ResilientInverterWrapper to use a simpler, three-state failure model instead of caching values during outages. The wrapper now:

  1. Before initialization (first successful set_mode_*): propagate errors unchanged (fail-fast for config errors)
  2. After initialization: raise InverterCommunicationError on any failure, allowing the caller to skip the cycle and retry on the next scheduled run
  3. After tolerance window expires: escalate to InverterOutageError to terminate batcontrol

This eliminates the complexity of caching, backoff periods, and default values while maintaining the same resilience guarantees.

Key Changes

  • Removed caching logic: No more CachedValues dataclass or cache fallback during outages. Failures are signaled immediately via exceptions instead.
  • Removed retry backoff: The main loop's natural interval (15 minutes) serves as the backoff; no need for explicit backoff tracking.
  • Simplified state tracking: Replaced _initialization_complete, _first_failure_time, _last_failure_time, _consecutive_failures with just _initialized and _outage_start.
  • New exception hierarchy: Added InverterError base class and InverterCommunicationError for transient failures; InverterOutageError remains for permanent outages.
  • Thin proxy design: ResilientInverterWrapper now uses __getattr__ to delegate all attributes and methods to the wrapped inverter, only intercepting method calls to guard them.
  • Best-effort methods: refresh_api_values() and shutdown() never raise or start outage tracking—they log and continue.
  • Updated core.run(): Catches InverterCommunicationError to skip the control cycle; InverterOutageError propagates to terminate.
  • Background thread safety: Added @_tolerate_inverter_outage decorator for externally triggered actions (MQTT/evcc) to swallow transient failures.

Implementation Details

  • The wrapper intercepts method calls via __getattr__ and wraps them with _guard(), which classifies failures based on initialization state and outage duration.
  • Command methods (set_mode_*) mark initialization complete on first success.
  • Read methods (e.g., get_SOC()) no longer cache values; failures immediately raise InverterCommunicationError.
  • Tests reduced from 582 to 306 lines, focusing on the three-state model rather than caching behavior.
  • Documentation updated to reflect the new behavior and exception semantics.

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD

claude added 5 commits June 13, 2026 10:29
A read failure (e.g. refresh_api_values) starts a retry backoff window in
which the inverter is presumed unavailable. Command operations (set_mode_*)
were subject to the same backoff skip as reads, but have no cached value or
default to fall back on. A set_mode_* call issued during the backoff window
was therefore routed into _get_cached_or_default, which raised RuntimeError
and crashed the whole process.

Commands now degrade gracefully instead of raising:
- During backoff the command is discarded (returns None) rather than fired
  at a presumed-unavailable inverter; batcontrol recalculates and re-issues
  the mode on the next cycle.
- A command attempted outside backoff that fails also returns None instead
  of raising.

Fail-fast before initialization and InverterOutageError after the outage
tolerance window are preserved.

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
… _read/_command

The old implementation had a single _call_with_resilience method with 7
parameters plus a separate _get_cached_or_default, driven by flags like
is_command and cache_attr strings. Replaced with two clear primitives:

- _read(key, method, default): caches result, returns cached value during
  outages; the dict-based cache replaces the CachedValues dataclass.
- _command(name, method, *args): discards silently during backoff, returns
  None on transient failure instead of crashing.

The _on_failure helper now returns bool (True = re-raise) instead of mixing
returns and raises, making the control flow easier to follow.

Line count: 528 -> 278 (wrapper), 651 -> 380 (tests). Behaviour unchanged.

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
Replace the value-caching/backoff design with a much simpler concept driven
by the existing run() loop:

- The wrapper no longer caches reads or discards writes. After the first
  successful set_mode_* (initialization), any inverter failure raises
  InverterCommunicationError. core.run() catches it, skips the control cycle
  and retries on the next scheduled run - the loop interval is the natural
  backoff, and no decision is ever made on stale data.
- A failure that persists past outage_tolerance_minutes escalates to
  InverterOutageError, which propagates to __main__ and terminates (unchanged).
- Errors before initialization still propagate unchanged (fail-fast on config).
- Externally triggered control (api_set_mode/charge_rate/limit,
  set_discharge_blocked) runs on background threads and tolerates a transient
  outage via the _tolerate_inverter_outage decorator; termination stays driven
  by the main run() loop.

New InverterError base with InverterCommunicationError; refresh_api_values is
best-effort. Removes CachedValues, retry backoff and get_outage_status. The
retry_backoff_seconds config option is dropped.

resilient_wrapper.py: 528 -> 208 lines.

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
The wrapper re-declared every InverterInterface method (and forwarded a list
of attributes) just to delegate each call - pure boilerplate. The optional
methods it provided fallbacks for (get_designed_capacity, get_usable_capacity,
get_mqtt_inverter_topic, publish_inverter_discovery_messages) live in the
backend baseclass and are only ever called internally on the backend, never on
the wrapper, so the wrapper never needed them.

Replace the whole thing with a __getattr__ proxy: delegate every attribute and
method to the wrapped inverter and intercept only failing method calls. The
outage classification (fail-fast before init, InverterCommunicationError within
tolerance, InverterOutageError past it) is unchanged; shutdown and
refresh_api_values stay best-effort.

No longer subclasses InverterInterface (the ABC's abstractmethods are satisfied
by delegation, not concrete overrides). Existing wrapper tests pass unchanged.

resilient_wrapper.py: 208 -> 114 lines (528 at the start of this work).

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
Add the rationale that, without the wrapper, an inverter outage terminates
batcontrol and the container restart policy brings it straight back, turning a
multi-minute outage into a restart loop. Each cold start re-fetches the price
and solar forecasts, so repeated restarts can hit provider rate limits
(Awattar/Tibber, Forecast.Solar/SolarPrognose) - leaving batcontrol without
fresh data even after the inverter recovers.

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
Copilot AI review requested due to automatic review settings June 13, 2026 19:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the inverter resilience layer to a simpler three-state failure model (fail-fast pre-init, transient communication error post-init, outage error after tolerance) and updates core control flow, tests, and documentation accordingly.

Changes:

  • Replaces cached/backoff-based ResilientInverterWrapper with a thin __getattr__ proxy that normalizes failures into InverterCommunicationError / InverterOutageError.
  • Updates Batcontrol.run() to skip a control cycle on InverterCommunicationError, and adds a decorator to swallow inverter errors for externally triggered actions.
  • Updates tests and configuration documentation to match the new failure semantics and removal of retry backoff.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/batcontrol/test_core.py Adds coverage for skipping cycles on InverterCommunicationError and propagating InverterOutageError, plus external API tolerance.
tests/batcontrol/inverter/test_resilient_wrapper.py Reworks wrapper tests around the new three-state model (no caching/backoff).
src/batcontrol/inverter/resilient_wrapper.py Implements the simplified resilient proxy via __getattr__ + guarded method calls and outage tracking.
src/batcontrol/inverter/inverter.py Removes retry-backoff configuration wiring and updates wrapper construction/logging.
src/batcontrol/inverter/exceptions.py Introduces InverterError base class and InverterCommunicationError; keeps InverterOutageError.
src/batcontrol/inverter/init.py Exports the new exception classes from the package.
src/batcontrol/core.py Catches InverterCommunicationError in run() to skip cycles; adds _tolerate_inverter_outage for background-triggered actions.
docs/configuration/inverter-configuration.md Updates user documentation to describe skip-cycle behavior and removes retry-backoff docs.
config/batcontrol_config_dummy.yaml Removes retry_backoff_seconds and updates the resilient-wrapper comment to reflect skip-cycle behavior.

Comment thread src/batcontrol/inverter/resilient_wrapper.py Outdated
Comment thread src/batcontrol/inverter/resilient_wrapper.py
Comment thread src/batcontrol/core.py
Comment thread src/batcontrol/core.py Outdated
- Use bare `raise` in pre-init path to preserve original traceback
- Include original exception in transient-failure warning log
- Correct _tolerate_inverter_outage docstring: background calls advance
  the outage clock; termination only triggers from run()
- Fix PEP 257 leading space in run() docstring

https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread src/batcontrol/inverter/resilient_wrapper.py
@MaStr MaStr merged commit 8142967 into main Jun 15, 2026
14 checks passed
@MaStr MaStr deleted the claude/exciting-ptolemy-qxaxbn branch June 15, 2026 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants