feat(api): per-bank provider cost attribution via OpenAI user field (#1965)
* feat(api): per-bank provider cost attribution via OpenAI user field
Lets operators attribute Hindsight's provider spend per bank.
- Add a `_current_bank_id` engine ContextVar (mirroring the existing
`_current_schema` pattern) bound in recall_async, retain_async,
retain_batch_async, and execute_task, with a `get_current_bank_id()`
accessor. Bindings use a token + finally reset.
- Add `HINDSIGHT_API_LLM_SEND_BANK_AS_USER` (bool, default off). When on,
outbound OpenAI-compatible LLM and embedding calls are tagged with
`user=<bank_id>` so downstream cost gateways (OpenRouter usage
accounting, LiteLLM, Helicone) can key spend per bank. Injection is
centralized per call_params construction site and never overrides a
`user` the caller already set.
- Propagate the bank ContextVar into the embedding executor thread:
generate_embeddings_batch now copies the current context before the
run_in_executor offload (run_in_executor does not inherit contextvars),
preserving the existing exception wrapping and 1:1 length validation.
- Make the OpenRouter reranker base URL configurable via
`HINDSIGHT_API_RERANKER_OPENROUTER_BASE_URL` (default unchanged:
https://openrouter.ai/api/v1/rerank) so rerank can route through a
metering gateway. The URL is a credential field (not bank-configurable).
Tests cover ContextVar set/reset including on exception, user injection
gated on flag + bank presence + no caller override (chat and tool-calling
paths plus embeddings), real-executor context propagation, and the
configurable rerank base URL.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(engine): bind the bank ContextVar via decorator, not inline try/finally
The inline token/try/finally wrappers re-indented the entire bodies of
execute_task, retain_batch_async, and recall_async: ~1,130 lines of
indentation-only churn in the diff for a ~40-line feature.
Replace the four inline bindings with a @_bind_bank_id decorator that
binds _current_bank_id from the method's bank_id argument (or a key in
a dict argument, for execute_task's task_dict) with the same token +
finally-reset semantics. Method bodies return to their original
indentation, shrinking the memory_engine.py diff to +51/-1.
Behavior is unchanged and now directly unit-tested: the decorator gets
its own tests for positional/keyword binding, dict-key extraction,
reset-on-exception, and non-string fallback to None.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(api): dedupe bank-attribution helper into shared module
Collapse the two identical _apply_bank_attribution copies (embeddings + OpenAI-compatible
LLM) into engine/bank_attribution.apply_bank_attribution. Add a docs note that the bank id
is transmitted to the provider as the end-user identifier, and de-pad the new config rows.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>