coding-kitties
diff --git a/docs/design/pipeline-api.md b/docs/design/pipeline-api.md
Lines changed: 188 additions & 0 deletions
diff --git a/investing_algorithm_framework/domain/backtesting/backtest.py b/investing_algorithm_framework/domain/backtesting/backtest.py
Lines changed: 30 additions & 6 deletions
diff --git a/investing_algorithm_framework/domain/backtesting/backtest_metrics.py b/investing_algorithm_framework/domain/backtesting/backtest_metrics.py
Lines changed: 15 additions & 0 deletions
diff --git a/investing_algorithm_framework/domain/backtesting/backtest_summary_metrics.py b/investing_algorithm_framework/domain/backtesting/backtest_summary_metrics.py
Lines changed: 24 additions & 0 deletions
diff --git a/investing_algorithm_framework/domain/backtesting/combine_backtests.py b/investing_algorithm_framework/domain/backtesting/combine_backtests.py
Lines changed: 35 additions & 16 deletions
@@ -0,0 +1,188 @@
+# Pipeline API — Design Doc
+
+> Status: **DRAFT for review**.
+> Tracking issue: [#438](https://github.com/coding-kitties/investing-algorithm-framework/issues/438).
+> Phase issues: [#501](https://github.com/coding-kitties/investing-algorithm-framework/issues/501) (event), [#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502) (vector), [#503](https://github.com/coding-kitties/investing-algorithm-framework/issues/503) (live).
+
+## 1. Goals & non-goals
+
+### Goals
+
+1. Declarative cross-sectional factor / filter / classifier computation across an asset universe at each bar.
+2. Look-ahead-safe by construction: a factor evaluated at bar `t` sees only data with timestamp `≤ t`.
+3. Strict opt-in: strategies without `pipelines = [...]` see **zero** behavioural or performance change.
+4. Three execution backends (event backtest, vector backtest, live) sharing the same `Pipeline` definition.
+
+### Non-goals (v1)
+
+- Full Zipline-Pipeline parity (no classifier hierarchies, no winsorization, no OLS factors).
+- Live pipelines on sub-daily timeframes.
+- Universes outside the supported envelope (see §6).
+- Cross-market order routing (separate issue).
+
+## 2. Public API
+
+```python
+from investing_algorithm_framework import (
+    TradingStrategy, Pipeline, Returns, AverageDollarVolume, TimeUnit,
+)
+
+class MomentumScreener(Pipeline):
+    dollar_volume = AverageDollarVolume(window=30)
+    momentum = Returns(window=60)
+
+    universe = dollar_volume.top(100)
+    alpha = momentum.rank(mask=universe)
+
+
+class MyStrategy(TradingStrategy):
+    time_unit = TimeUnit.DAY
+    interval = 1
+    pipelines = [MomentumScreener]
+    universe = ["BTC/EUR", "ETH/EUR", ...]   # candidate symbols
+
+    def run_strategy(self, context, data):
+        out = data["MomentumScreener"]   # pl.DataFrame, one row per surviving symbol
+        ...
+```
+
+### Class attributes added to `TradingStrategy`
+
+| Attribute | Type | Default | Meaning |
+|---|---|---|---|
+| `pipelines` | `list[type[Pipeline]]` | `[]` | Pipelines to run before each `run_strategy` call. |
+| `universe` | `list[str] \| list[DataSource]` | `[]` | Candidate symbols. Folded into `data_sources` at app startup; pipelines filter down. |
+
+### `Pipeline` class
+
+- Class attributes that are `Factor` / `Filter` instances are introspected via `__init_subclass__`.
+- A class attribute named `universe` is treated as the **root mask**: if present, every other column is computed on the masked subset.
+- All other attributes become columns of the output frame.
+
+## 3. Panel shape
+
+The engine's internal representation is a **long-form Polars DataFrame**:
+
+```
+schema = {
+    "datetime": pl.Datetime,
+    "symbol":   pl.Utf8,
+    "open":     pl.Float32,
+    "high":     pl.Float32,
+    "low":      pl.Float32,
+    "close":    pl.Float32,
+    "volume":   pl.Float32,
+}
+```
+
+Long-form is chosen because:
+
+- Polars rolling/group-by is faster on long form than on wide.
+- Sparse symbols (delisted, late-listed) are natural — no NaN columns.
+- Cache files are smaller (no per-symbol column duplication).
+
+Per-bar pipeline output handed to the strategy is a **wide** frame keyed by symbol:
+
+```
+out_schema = {
+    "symbol":        pl.Utf8,
+    "<factor name>": pl.Float64,    # one column per Factor/Filter on the Pipeline
+    ...
+}
+```
+
+## 4. Engine API (internal)
+
+```python
+from datetime import datetime
+from typing import Protocol
+
+import polars as pl
+
+from investing_algorithm_framework import Pipeline
+
+
+class PipelineEngine(Protocol):
+    def evaluate_at(
+        self,
+        pipeline: type[Pipeline],
+        as_of: datetime,
+    ) -> pl.DataFrame:
+        """Event mode: return wide per-symbol frame for the given timestamp."""
+        ...
+
+    def evaluate_range(
+        self,
+        pipeline: type[Pipeline],
+        start: datetime,
+        end: datetime,
+    ) -> pl.DataFrame:
+        """Vector mode: return long frame keyed by (datetime, symbol)."""
+        ...
+```
+
+Two implementations:
+
+- `LazyPolarsPipelineEngine` (event + vector). Compiles Factor expressions into a single `pl.LazyFrame` plan; `collect()` only at the boundary.
+- `LiveBatchedPipelineEngine` (Phase 3). Adds async batched fetch + universe-refresh.
+
+## 5. Cache key (Phase 2)
+
+Cache lives under `<resource_dir>/pipeline_cache/`. Key:
+
+```
+hash(
+    universe_hash:   sha1(sorted(symbol_list)),
+    daterange:       (start.isoformat(), end.isoformat()),
+    timeframe:       e.g. "1d",
+    expr_hash:       sha1(canonical_repr(factor_expression_tree)),
+    schema_version:  int,           # bump on any cache-incompatible change
+)
+```
+
+Hits return the cached panel/factor frame without recomputation.
+Parameter sweeps over **non-pipeline** attributes (signal thresholds, position sizing) reuse the cache for free.
+
+## 6. Performance contract
+
+| Mode | Timeframe | Max universe | Tested in CI |
+|---|---|---|---|
+| Event BT | daily | 5,000 | ✅ |
+| Event BT | 4h / 1h | 1,000 / 500 | ✅ |
+| Event BT | < 1h | — | ❌ raises |
+| Vector BT | daily | 5,000 | ✅ |
+| Vector BT | 4h / 1h | 1,000 / 500 | ✅ |
+| Vector BT | < 1h | — | ❌ raises |
+| Live | daily | 50 | smoke only |
+| Live | < daily | — | ❌ raises |
+
+**Opt-in guarantee (CI-asserted):** vector backtest of the existing single-symbol example must run within ±10% of the pre-pipeline baseline wall-clock.
+
+## 7. Built-in factors (v1)
+
+| Factor | Formula |
+|---|---|
+| `Returns(window=N)` | `close.pct_change(N)` |
+| `AverageDollarVolume(window=N)` | `(close * volume).rolling_mean(N)` |
+| `SMA(window=N)` | `close.rolling_mean(N)` |
+| `RSI(window=N)` | standard Wilder RSI |
+| `Volatility(window=N)` | `log_returns.rolling_std(N) * sqrt(periods_per_year)` |
+
+All other factors mentioned in #438's original draft (`MACD`, `BollingerBands`, `EWMA`, `VWAP`, `MaxDrawdown`) are deferred. Users can subclass `CustomFactor`:
+
+```python
+class MACD(CustomFactor):
+    inputs = ["close"]
+    window = 26
+
+    def compute(self, close: pl.Series) -> pl.Series:
+        ...
+```
+
+## 8. Look-ahead safety
+
+Factors operate on a Polars `LazyFrame` filtered to `datetime <= as_of` *before* any rolling op. Rolling windows are right-aligned (closed on the right). Tests must assert that injecting a future bar does not change a past factor value.
+
+## 9. Open questions
+
+1. **Universe declaration ergonomics.** Do we accept a callable `universe = lambda ctx: top_500_by_market_cap()` or only a static list in v1? (Proposed: static list in v1, callable in v2.)
+2. **Pipeline scheduling.** Always run every bar, or honour a per-pipeline `time_unit`? (Proposed: same `time_unit` as the strategy in v1; per-pipeline scheduling in v2.)
+3. **Multiple pipelines on one strategy.** Independent (each gets its own cache key) or composable (one pipeline can reference another's column)? (Proposed: independent in v1.)
+4. **Float32 vs float64.** Default to float32 for memory; users opt into float64 per factor? (Proposed: yes, factor-level `dtype=` override.)
+
+## 10. Order of work
+
+1. ✅ This doc reviewed and merged.
+2. Phase 1 ([#501](https://github.com/coding-kitties/investing-algorithm-framework/issues/501)) — event backtest + 5 factors.
+3. Phase 2 ([#502](https://github.com/coding-kitties/investing-algorithm-framework/issues/502)) — vector + cache + benchmark.
+4. Phase 3 ([#503](https://github.com/coding-kitties/investing-algorithm-framework/issues/503)) — live, gated on async CCXT fetch.
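Review note on section 5 of the design doc: the cache key is written there as pseudocode. A minimal sketch of how the five components could be folded into one stable digest follows. The helper name `pipeline_cache_key`, the JSON serialisation, and the string used for the expression tree are assumptions made for illustration; the doc does not specify them.

```python
import hashlib
import json
from datetime import datetime


def _sha1(text: str) -> str:
    """Hex SHA-1 digest of a UTF-8 string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def pipeline_cache_key(symbols, start, end, timeframe, expr_repr, schema_version):
    """Hypothetical helper: combine the five key components into one digest."""
    payload = {
        # Sorting makes the key independent of the order symbols were declared in.
        "universe_hash": _sha1(json.dumps(sorted(symbols))),
        "daterange": [start.isoformat(), end.isoformat()],
        "timeframe": timeframe,
        # expr_repr stands in for canonical_repr(factor_expression_tree).
        "expr_hash": _sha1(expr_repr),
        "schema_version": schema_version,
    }
    # sort_keys keeps the serialisation independent of dict insertion order.
    return _sha1(json.dumps(payload, sort_keys=True))


start, end = datetime(2023, 1, 1), datetime(2023, 12, 31)
expr = "rank(Returns(window=60), mask=top(AverageDollarVolume(window=30), 100))"
key_a = pipeline_cache_key(["BTC/EUR", "ETH/EUR"], start, end, "1d", expr, 1)
key_b = pipeline_cache_key(["ETH/EUR", "BTC/EUR"], start, end, "1d", expr, 1)
key_c = pipeline_cache_key(["BTC/EUR", "ETH/EUR"], start, end, "1d", expr, 2)
```

Here `key_a` and `key_b` match because the symbol list is sorted before hashing, while bumping `schema_version` yields a different `key_c`. A content digest is used rather than Python's built-in `hash()` because string hashes are randomised per process by default, which would not suit a cache persisted under `<resource_dir>/pipeline_cache/`.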
@@ -418,6 +418,23 @@ def save(
                 os.makedirs(destination_run_path, exist_ok=True)
                 br.save(destination_run_path)
 
+        # Always rebuild the summary from the current set of backtest runs
+        # before writing it to disk. This guarantees that summary.json is
+        # self-consistent with the per-run metrics.json files (e.g.
+        # number_of_windows == number of runs, total_net_gain == sum of
+        # per-run total_net_gain, etc.) regardless of how the in-memory
+        # Backtest was constructed (single backtest, walk-forward,
+        # merge(), …). See issue #511.
+        if self.backtest_runs:
+            per_run_metrics = [
+                br.backtest_metrics for br in self.backtest_runs
+                if br.backtest_metrics is not None
+            ]
+            if per_run_metrics:
+                self.backtest_summary = generate_backtest_summary_metrics(
+                    per_run_metrics
+                )
+
         # Save combined backtest metrics if available
         if self.backtest_summary:
             summary_file = os.path.join(
@@ -517,15 +534,22 @@ def merge(self, other: 'Backtest') -> 'Backtest':
             Backtest: The merged Backtest instance.
         """
 
-        merged = Backtest()
+        merged = Backtest(algorithm_id=self.algorithm_id)
         merged.backtest_runs = self.backtest_runs + other.backtest_runs
 
-        summary = BacktestSummaryMetrics()
-
-        for bt_run in merged.get_all_backtest_metrics():
-            summary.add(bt_run)
+        # Rebuild the summary from the full set of merged backtest runs.
+        # `BacktestSummaryMetrics` is a plain dataclass and does not expose
+        # an `add()` method, so the previous incremental approach raised
+        # AttributeError and silently produced a stale / single-window
+        # summary. See issue #511.
+        merged_metrics = merged.get_all_backtest_metrics()
+        if merged_metrics:
+            merged.backtest_summary = generate_backtest_summary_metrics(
+                merged_metrics
+            )
+        else:
+            merged.backtest_summary = BacktestSummaryMetrics()
 
-        merged.backtest_summary = summary
         merged.backtest_permutation_tests = \
             self.backtest_permutation_tests + other.backtest_permutation_tests
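Review note on the `save()` and `merge()` hunks above: both now rebuild the summary from the full list of runs instead of updating it incrementally. The sketch below illustrates the invariant named in the comment (`number_of_windows` equals the number of runs, `total_net_gain` equals the per-run sum). `RunMetrics`, `Summary`, `summarise`, and `merge_runs` are hypothetical stand-ins, not the framework's classes or `generate_backtest_summary_metrics` itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RunMetrics:
    """Hypothetical stand-in for a per-run BacktestMetrics object."""
    total_net_gain: Optional[float]


@dataclass
class Summary:
    """Hypothetical stand-in for BacktestSummaryMetrics."""
    number_of_windows: int = 0
    total_net_gain: float = 0.0


def summarise(runs: List[RunMetrics]) -> Summary:
    """Rebuild the summary from scratch using every run."""
    gains = [r.total_net_gain for r in runs if r.total_net_gain is not None]
    return Summary(number_of_windows=len(runs), total_net_gain=sum(gains))


def merge_runs(
    left: List[RunMetrics], right: List[RunMetrics]
) -> Tuple[List[RunMetrics], Summary]:
    """Concatenate the runs, then derive the summary from the merged list."""
    merged = left + right
    return merged, (summarise(merged) if merged else Summary())


runs_a = [RunMetrics(100.0)]
runs_b = [RunMetrics(-40.0), RunMetrics(25.0)]
merged_runs, merged_summary = merge_runs(runs_a, runs_b)
```

Because the summary is always derived from `merged_runs`, it cannot drift from the per-run data regardless of how the two inputs were built.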
 
 
@@ -20,6 +20,21 @@ class BacktestMetrics:
     total return, annualized return, volatility, Sharpe ratio,
     and maximum drawdown.
 
+    .. note:: Field semantics & known duplicates (issue #511)
+
+        - ``total_loss`` is the **gross loss magnitude** (a non-negative
+          number equal to ``sum(abs(net_gain))`` over losing trades).
+          ``total_loss_percentage`` is ``total_loss /
+          initial_unallocated`` (decimal, non-negative). These fields
+          no longer mirror ``total_net_gain`` /
+          ``total_net_gain_percentage``; see B1 in issue #511.
+
+        - ``total_growth`` / ``total_growth_percentage`` are
+          numerically equivalent to ``total_net_gain`` /
+          ``total_net_gain_percentage`` for the standard
+          mark-to-market portfolio used in backtests. They are kept as
+          legacy aliases. See B3 in issue #511.
+
     Attributes:
         backtest_date_range_name (str): The name of the date range
             used for the backtest.
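Review note on the per-run `total_loss` semantics documented above: a small worked example may help readers distinguish gross loss from net gain. The function names and the trade values are made up for illustration; only the two formulas come from the docstring.

```python
from typing import List


def gross_loss(trade_net_gains: List[float]) -> float:
    """Non-negative loss magnitude: sum(abs(net_gain)) over losing trades."""
    return sum(abs(g) for g in trade_net_gains if g < 0)


def loss_percentage(trade_net_gains: List[float], initial_unallocated: float) -> float:
    """Gross loss as a decimal fraction of the initial unallocated capital."""
    return gross_loss(trade_net_gains) / initial_unallocated


# Illustrative trades: two winners and two losers.
trades = [120.0, -30.0, -20.0, 45.0]
total_loss = gross_loss(trades)                       # 30 + 20
total_loss_percentage = loss_percentage(trades, 1000.0)
total_net_gain = sum(trades)                          # winners minus losers
```

With these numbers `total_loss` is 50.0 and `total_loss_percentage` is 0.05, while `total_net_gain` is 115.0, so the loss fields clearly no longer mirror the net-gain fields.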
 
@@ -13,6 +13,30 @@ class BacktestSummaryMetrics:
     Represents the summarized results of a backtest,
     focusing on key headline performance and risk metrics.
 
+    .. note:: Field semantics & known duplicates (issue #511)
+
+        - ``total_loss`` / ``total_loss_percentage`` are gross-loss
+          based: ``total_loss`` is ``sum(per-run gross_loss)`` (a
+          non-negative magnitude in account currency) and
+          ``total_loss_percentage`` is ``total_loss /
+          sum(initial_unallocated)`` (decimal). They no longer mix
+          with net-return semantics. See B1/B2 in issue #511.
+
+        - ``total_growth`` / ``total_growth_percentage`` are
+          numerically equivalent to ``total_net_gain`` /
+          ``total_net_gain_percentage`` for closed-position backtests
+          because both are derived from the same start/end portfolio
+          values. They are kept for backwards compatibility but should
+          be considered legacy aliases. See B3 in issue #511.
+
+        - ``average_net_gain``, ``average_loss``, ``average_growth``
+          (and their ``*_percentage`` counterparts) are time-weighted
+          means **across windows**. For a single-window backtest they
+          collapse to the corresponding ``total_*`` value by
+          definition (weighted mean of one element equals that
+          element). See B4 in issue #511. For per-trade averages use
+          ``average_trade_*`` instead.
+
     Attributes:
         total_net_gain (float): Total net gain from the backtest.
         total_net_gain_percentage (float): Total net gain percentage
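Review note on the `average_*` fields described above: the docstring says they are time-weighted means across windows but does not state the weights. The sketch below assumes each window is weighted by its length in days; that choice, and the function name, are assumptions for illustration.

```python
from typing import List, Optional


def time_weighted_mean(
    values: List[Optional[float]], durations_days: List[float]
) -> Optional[float]:
    """Mean of per-window values, weighted by window length (assumed: days)."""
    pairs = [
        (v, d) for v, d in zip(values, durations_days)
        if v is not None and d > 0
    ]
    total_duration = sum(d for _, d in pairs)
    if total_duration == 0:
        return None
    return sum(v * d for v, d in pairs) / total_duration


# Single window: the weighted mean collapses to the value itself.
single = time_weighted_mean([120.0], [90])

# Two windows: the 90-day window counts three times as much as the 30-day one.
two_windows = time_weighted_mean([100.0, 40.0], [30, 90])
```

The single-window case returns the input unchanged, which is the collapse to the `total_*` value that the note describes.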
 
@@ -39,28 +39,32 @@ def _compound_percentage_returns(percentages):
     Compound percentage returns across multiple periods.
 
     For example, if period 1 has 10% return and period 2 has 5% return,
-    the compounded return is: (1 + 0.10) * (1 + 0.05) - 1 = 15.5%
-    NOT simply 10% + 5% = 15%
+    the compounded return is: (1 + 0.10) * (1 + 0.05) - 1 = 0.155 = 15.5%
+    NOT simply 0.10 + 0.05 = 0.15.
+
+    The framework consistently represents percentages as **decimals**
+    (e.g. ``0.10`` for 10%), so this helper expects decimal inputs and
+    returns a decimal. See issue #511 (B5) — earlier versions assumed
+    whole-number percentages, which silently produced results off by a
+    factor of ~100 once multi-window aggregation was exercised.
 
     Args:
-        percentages (List[float | None]): List of percentage returns
-            (as whole numbers, e.g., 10 for 10%).
+        percentages (List[float | None]): List of period returns expressed
+            as decimals (e.g. ``0.10`` for 10%).
 
     Returns:
-        float | None: The compounded percentage return, or None if no
-            valid percentages.
+        float | None: The compounded return as a decimal, or ``None`` if
+            no valid percentages.
     """
     valid_percentages = [p for p in percentages if p is not None]
     if not valid_percentages:
         return None
 
-    # Convert percentages to decimals, compound, then convert back
     compounded = 1.0
     for pct in valid_percentages:
-        compounded *= (1 + pct / 100)
+        compounded *= (1 + pct)
 
-    # Convert back to percentage
-    return (compounded - 1) * 100
+    return compounded - 1
 
 
 def combine_backtests(backtests):
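Review note on the `_compound_percentage_returns` change above: the sketch below reproduces the new decimal behaviour next to the previous whole-number behaviour, using the 10% and 5% example from the docstring. The two function names are illustrative copies, not the framework's helper.

```python
from typing import List, Optional


def compound_decimal(returns: List[Optional[float]]) -> Optional[float]:
    """New behaviour: inputs and output are decimals (0.10 means 10%)."""
    valid = [r for r in returns if r is not None]
    if not valid:
        return None
    compounded = 1.0
    for r in valid:
        compounded *= 1 + r
    return compounded - 1


def compound_whole_number(returns: List[Optional[float]]) -> Optional[float]:
    """Previous behaviour: inputs and output are whole numbers (10 means 10%)."""
    valid = [r for r in returns if r is not None]
    if not valid:
        return None
    compounded = 1.0
    for r in valid:
        compounded *= 1 + r / 100
    return (compounded - 1) * 100


decimal_result = compound_decimal([0.10, None, 0.05])   # about 0.155, i.e. 15.5%
legacy_result = compound_whole_number([0.10, 0.05])     # about 0.15005, read as 0.15%
```

Feeding decimal inputs to the previous version yields roughly 0.15 interpreted as a percentage, which is about a hundredth of the intended 15.5%.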
@@ -173,9 +177,13 @@ def generate_backtest_summary_metrics(
         b.total_net_gain for b in valid_metrics
         if b.total_net_gain is not None
     )
+    # B1/B2 fix (issue #511): per-run ``total_loss`` is now the gross
+    # loss magnitude, so the aggregate is simply the sum of per-run
+    # ``total_loss`` (equivalent to ``sum(gross_loss)``). Both per-run
+    # and aggregate use the same unit (positive currency).
     total_loss = sum(
-        b.gross_loss for b in valid_metrics
-        if b.gross_loss is not None
+        b.total_loss for b in valid_metrics
+        if b.total_loss is not None
     )
     total_growth = sum(
         b.total_growth for b in valid_metrics
@@ -184,13 +192,24 @@ def generate_backtest_summary_metrics(
 
     # === PERCENTAGE RETURNS (compounded, not summed) ===
     # Compound returns: (1 + r1) * (1 + r2) * ... - 1
-    # For percentages stored as whole numbers (e.g., 10 for 10%)
+    # All percentages are stored as decimals (e.g. 0.10 for 10%).
     total_net_gain_percentage = _compound_percentage_returns(
         [b.total_net_gain_percentage for b in valid_metrics]
     )
-    total_loss_percentage = _compound_percentage_returns(
-        [b.total_loss_percentage for b in valid_metrics]
-    )
+    # ``total_loss`` is a non-multiplicative magnitude (it does not
+    # compound across windows). Express the aggregate as the sum of
+    # gross losses divided by the sum of initial capital across
+    # windows, which keeps the unit (decimal fraction) consistent
+    # with the per-run definition. See issue #511 (B2).
+    total_initial_value = 0.0
+    for b in valid_metrics:
+        iv = getattr(b, "initial_unallocated", None)
+        if isinstance(iv, (int, float)) and iv > 0:
+            total_initial_value += iv
+    if total_initial_value > 0 and total_loss is not None:
+        total_loss_percentage = total_loss / total_initial_value
+    else:
+        total_loss_percentage = None
     total_growth_percentage = _compound_percentage_returns(
         [b.total_growth_percentage for b in valid_metrics]
     )
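Review note on the aggregate `total_loss_percentage` above: a short example of the pooled ratio, contrasted with what compounding the per-run loss percentages would have produced. The function name and the input tuples are assumptions for illustration; the filtering mirrors the hunk.

```python
from typing import List, Optional, Tuple


def aggregate_loss_percentage(
    runs: List[Tuple[Optional[float], Optional[float]]]
) -> Optional[float]:
    """runs holds (total_loss, initial_unallocated) pairs, one per window."""
    total_loss = sum(loss for loss, _ in runs if loss is not None)
    total_initial = sum(
        iv for _, iv in runs
        if isinstance(iv, (int, float)) and iv > 0
    )
    if total_initial > 0:
        return total_loss / total_initial
    return None


# Two windows, each starting with 1000 in capital.
runs = [(50.0, 1000.0), (150.0, 1000.0)]
pooled = aggregate_loss_percentage(runs)          # 200 / 2000

# What compounding the per-run loss fractions (0.05 and 0.15) would give.
compounded = (1 + 0.05) * (1 + 0.15) - 1
```

The pooled value is 0.10, whereas compounding gives about 0.2075, a figure that does not correspond to any loss actually incurred relative to capital.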