Simplify resilient wrapper: fail-fast → communication error → outage error#381
Merged
Conversation
A read failure (e.g. refresh_api_values) starts a retry backoff window in which the inverter is presumed unavailable. Command operations (set_mode_*) were subject to the same backoff skip as reads, but have no cached value or default to fall back on. A set_mode_* call issued during the backoff window was therefore routed into _get_cached_or_default, which raised RuntimeError and crashed the whole process. Commands now degrade gracefully instead of raising: - During backoff the command is discarded (returns None) rather than fired at a presumed-unavailable inverter; batcontrol recalculates and re-issues the mode on the next cycle. - A command attempted outside backoff that fails also returns None instead of raising. Fail-fast before initialization and InverterOutageError after the outage tolerance window are preserved. https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
… _read/_command The old implementation had a single _call_with_resilience method with 7 parameters plus a separate _get_cached_or_default, driven by flags like is_command and cache_attr strings. Replaced with two clear primitives: - _read(key, method, default): caches result, returns cached value during outages; the dict-based cache replaces the CachedValues dataclass. - _command(name, method, *args): discards silently during backoff, returns None on transient failure instead of crashing. The _on_failure helper now returns bool (True = re-raise) instead of mixing returns and raises, making the control flow easier to follow. Line count: 528 -> 278 (wrapper), 651 -> 380 (tests). Behaviour unchanged. https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
Replace the value-caching/backoff design with a much simpler concept driven by the existing run() loop: - The wrapper no longer caches reads or discards writes. After the first successful set_mode_* (initialization), any inverter failure raises InverterCommunicationError. core.run() catches it, skips the control cycle and retries on the next scheduled run - the loop interval is the natural backoff, and no decision is ever made on stale data. - A failure that persists past outage_tolerance_minutes escalates to InverterOutageError, which propagates to __main__ and terminates (unchanged). - Errors before initialization still propagate unchanged (fail-fast on config). - Externally triggered control (api_set_mode/charge_rate/limit, set_discharge_blocked) runs on background threads and tolerates a transient outage via the _tolerate_inverter_outage decorator; termination stays driven by the main run() loop. New InverterError base with InverterCommunicationError; refresh_api_values is best-effort. Removes CachedValues, retry backoff and get_outage_status. The retry_backoff_seconds config option is dropped. resilient_wrapper.py: 528 -> 208 lines. https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
The wrapper re-declared every InverterInterface method (and forwarded a list of attributes) just to delegate each call - pure boilerplate. The optional methods it provided fallbacks for (get_designed_capacity, get_usable_capacity, get_mqtt_inverter_topic, publish_inverter_discovery_messages) live in the backend baseclass and are only ever called internally on the backend, never on the wrapper, so the wrapper never needed them. Replace the whole thing with a __getattr__ proxy: delegate every attribute and method to the wrapped inverter and intercept only failing method calls. The outage classification (fail-fast before init, InverterCommunicationError within tolerance, InverterOutageError past it) is unchanged; shutdown and refresh_api_values stay best-effort. No longer subclasses InverterInterface (the ABC's abstractmethods are satisfied by delegation, not concrete overrides). Existing wrapper tests pass unchanged. resilient_wrapper.py: 208 -> 114 lines (528 at the start of this work). https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
Add the rationale that, without the wrapper, an inverter outage terminates batcontrol and the container restart policy brings it straight back, turning a multi-minute outage into a restart loop. Each cold start re-fetches the price and solar forecasts, so repeated restarts can hit provider rate limits (Awattar/Tibber, Forecast.Solar/SolarPrognose) - leaving batcontrol without fresh data even after the inverter recovers. https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
Contributor
There was a problem hiding this comment.
Pull request overview
Refactors the inverter resilience layer to a simpler three-state failure model (fail-fast pre-init, transient communication error post-init, outage error after tolerance) and updates core control flow, tests, and documentation accordingly.
Changes:
- Replaces cached/backoff-based
ResilientInverterWrapperwith a thin__getattr__proxy that normalizes failures intoInverterCommunicationError/InverterOutageError. - Updates
Batcontrol.run()to skip a control cycle onInverterCommunicationError, and adds a decorator to swallow inverter errors for externally triggered actions. - Updates tests and configuration documentation to match the new failure semantics and removal of retry backoff.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/batcontrol/test_core.py | Adds coverage for skipping cycles on InverterCommunicationError and propagating InverterOutageError, plus external API tolerance. |
| tests/batcontrol/inverter/test_resilient_wrapper.py | Reworks wrapper tests around the new three-state model (no caching/backoff). |
| src/batcontrol/inverter/resilient_wrapper.py | Implements the simplified resilient proxy via __getattr__ + guarded method calls and outage tracking. |
| src/batcontrol/inverter/inverter.py | Removes retry-backoff configuration wiring and updates wrapper construction/logging. |
| src/batcontrol/inverter/exceptions.py | Introduces InverterError base class and InverterCommunicationError; keeps InverterOutageError. |
| src/batcontrol/inverter/init.py | Exports the new exception classes from the package. |
| src/batcontrol/core.py | Catches InverterCommunicationError in run() to skip cycles; adds _tolerate_inverter_outage for background-triggered actions. |
| docs/configuration/inverter-configuration.md | Updates user documentation to describe skip-cycle behavior and removes retry-backoff docs. |
| config/batcontrol_config_dummy.yaml | Removes retry_backoff_seconds and updates the resilient-wrapper comment to reflect skip-cycle behavior. |
- Use bare `raise` in pre-init path to preserve original traceback - Include original exception in transient-failure warning log - Correct _tolerate_inverter_outage docstring: background calls advance the outage clock; termination only triggers from run() - Fix PEP 257 leading space in run() docstring https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Refactor
ResilientInverterWrapperto use a simpler, three-state failure model instead of caching values during outages. The wrapper now:set_mode_*): propagate errors unchanged (fail-fast for config errors)InverterCommunicationErroron any failure, allowing the caller to skip the cycle and retry on the next scheduled runInverterOutageErrorto terminate batcontrolThis eliminates the complexity of caching, backoff periods, and default values while maintaining the same resilience guarantees.
Key Changes
CachedValuesdataclass or cache fallback during outages. Failures are signaled immediately via exceptions instead._initialization_complete,_first_failure_time,_last_failure_time,_consecutive_failureswith just_initializedand_outage_start.InverterErrorbase class andInverterCommunicationErrorfor transient failures;InverterOutageErrorremains for permanent outages.ResilientInverterWrappernow uses__getattr__to delegate all attributes and methods to the wrapped inverter, only intercepting method calls to guard them.refresh_api_values()andshutdown()never raise or start outage tracking—they log and continue.core.run(): CatchesInverterCommunicationErrorto skip the control cycle;InverterOutageErrorpropagates to terminate.@_tolerate_inverter_outagedecorator for externally triggered actions (MQTT/evcc) to swallow transient failures.Implementation Details
__getattr__and wraps them with_guard(), which classifies failures based on initialization state and outage duration.set_mode_*) mark initialization complete on first success.get_SOC()) no longer cache values; failures immediately raiseInverterCommunicationError.https://claude.ai/code/session_019TARxrY2uWGTDuxwyugtqD