Feature hasn't been suggested before.
Describe the enhancement you want to request
Related issues: #5148 #14451 #21240
The problem:
Yesterday I spent ~$100 in two OpenCode sessions with Claude Opus via OpenRouter.
Looking at the usage logs, every single request was sending 660,000–689,000 input
tokens — the entire chat history and codebase context re-sent from scratch each time.
My actual new message was maybe 100 tokens. The rest was stale history.
What I want:
A way to intercept the messages array before it gets sent to the LLM provider,
so I can strip old history and inject only relevant context retrieved from
tools like MemPalace or SocratiCode instead.
The cleanest solution would be a plugin hook that fires before the provider
call — something like provider.request.before — where the plugin receives
the full messages array and returns a modified (pruned) version.
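To make the request concrete, here is a rough sketch of the pruning step such a hook could run. Everything in it is an assumption for illustration: the `Message` shape, the `pruneMessages` name, and the idea that the context string has already been fetched from a tool like MemPalace or SocratiCode. It is not OpenCode's actual plugin API.

```typescript
// Hypothetical message shape; OpenCode's real internal types may differ.
type Role = "system" | "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical body of a provider.request.before hook: keep the system
// prompt(s), inject externally retrieved context, and keep only the last
// `keepLast` non-system messages instead of the full history.
function pruneMessages(
  messages: Message[],
  retrievedContext: string,
  keepLast: number = 4,
): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = keepLast > 0 ? rest.slice(-keepLast) : [];
  const contextMessage: Message = {
    role: "system",
    content: `Relevant context:\n${retrievedContext}`,
  };
  return [...system, contextMessage, ...recent];
}
```

A real hook would probably be async so it can query the retrieval tool with the latest user message first, and would need to avoid cutting a tool call away from its result.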
As a workaround I could run a local OpenAI-compatible proxy that sits between
OpenCode and the real API (by swapping baseURL in opencode.json), but a native
plugin hook would be much cleaner.
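For comparison, the body-rewriting part of the proxy workaround might look like the sketch below. It assumes the request body follows the OpenAI-compatible chat format (a top-level `messages` array); `rewriteChatRequest` and `keepLast` are made-up names, and the HTTP listening and forwarding code is left out.

```typescript
// Minimal view of an OpenAI-compatible chat request body.
// Fields other than `messages` (model, stream, tools, ...) pass through untouched.
interface ChatMessage {
  role: string;
  content: unknown;
  [key: string]: unknown;
}

interface ChatRequest {
  messages: ChatMessage[];
  [key: string]: unknown;
}

// Hypothetical helper the proxy would call on each request body
// before forwarding it to the real API.
function rewriteChatRequest(rawBody: string, keepLast: number = 6): string {
  const body = JSON.parse(rawBody) as ChatRequest;
  // Not a chat request: forward unchanged.
  if (!Array.isArray(body.messages)) return rawBody;

  const system = body.messages.filter((m) => m.role === "system");
  const rest = body.messages.filter((m) => m.role !== "system");
  let recent = keepLast > 0 ? rest.slice(-keepLast) : [];

  // A "tool" message is only valid after the assistant message that issued
  // the call, so drop tool results orphaned by the cut.
  while (recent.length > 0 && recent[0].role === "tool") {
    recent = recent.slice(1);
  }

  return JSON.stringify({ ...body, messages: [...system, ...recent] });
}
```

Even this small version has to know about message-ordering rules, and some providers may also expect the first non-system message to come from the user. That is part of why a hook inside OpenCode, with access to the session structure, seems cleaner than a proxy working on raw JSON.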
Why this matters:
With MemPalace and SocratiCode already installed, the relevant context for any
given question is usually under 10,000 tokens. There's no reason to send 600,000+
tokens every time. Right now OpenCode has no built-in way to intercept and replace
that context before it hits the API.
Is a pre-request hook feasible in the current plugin architecture?