ctx-opt

Intelligent context window optimization middleware for LLM applications. Drop it in front of any chat-completion call: it takes a messages[] array, trims or compresses it to fit a token budget, and returns the optimized array plus metadata describing exactly what it did.

Framework-agnostic — works with the OpenAI SDK, Anthropic SDK, or anything else that consumes a chat messages[] array.
Strict TypeScript types. ESM and CJS builds. Node 18+ and browser-friendly.
No required peer dependencies — bring your own LLM client.

Install

npm install ctx-opt

Quick start

import { ContextOptimizer } from 'ctx-opt';

const optimizer = new ContextOptimizer({
  maxTokens: 8_000,
  strategy: 'sliding-window',
  slidingWindow: { size: 12 },
});

const { messages, meta } = await optimizer.optimize(history);

// Now pass `messages` to your LLM client.
console.log(`saved ${meta.saved} tokens (output is ${(meta.compressionRatio * 100).toFixed(1)}% of the input size)`);

One-line drop-in for OpenAI or Anthropic

If you're already using one of the major SDKs, wrap your client and stop thinking about it:

// OpenAI SDK
import OpenAI from 'openai';
import { withOptimizer } from 'ctx-opt/openai';

const ai = withOptimizer(new OpenAI(), {
  maxTokens: 8_000,
  strategy: 'sliding-window',
});

// chat.completions.create now auto-trims `messages` to fit the budget.
const res = await ai.chat.completions.create({
  model: 'gpt-4o',
  messages: longHistory,
});

console.log(ai.lastMeta); // { saved, compressionRatio, ... }

// Anthropic SDK
import Anthropic from '@anthropic-ai/sdk';
import { withOptimizer } from 'ctx-opt/anthropic';

const ai = withOptimizer(new Anthropic(), {
  maxTokens: 8_000,
  strategy: 'sliding-window',
  model: 'claude-haiku-4-5-20251001',
});

const res = await ai.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 1024,
  system: 'you are concise',
  messages: longHistory,
});

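// Vercel AI SDK (the `ai` package). The `openai` model provider comes from
// `@ai-sdk/openai`, which is installed separately.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { withOptimizer } from 'ctx-opt/ai-sdk';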

const trimmedGenerate = withOptimizer(generateText, {
  maxTokens: 8_000,
  strategy: 'sliding-window',
});

const { text } = await trimmedGenerate({
  model: openai('gpt-4o'),
  messages: longHistory,
});

openai, @anthropic-ai/sdk, and ai are optional peer deps — install only the one(s) you use. See examples/ for OpenAI, Anthropic, summarizer-with-real-LLM, and LangChain.js integrations.

Strategies

Strategy	Speed	Quality	Needs LLM call?	When to use
`sliding-window`	Fastest	Good	No	Default. Cheap, predictable, lossy at the tails.
`summarizer`	Slow	Best	Yes	Long sessions where older context still matters.
`relevance`	Medium	High	Yes (scorer)	Heterogeneous histories where some turns clearly aren't relevant.
`hybrid`	Slow	Best	Yes (both)	Production: relevance-filter first, then summarize the rest if still over budget.
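
The two-stage order in the `hybrid` row can be pictured with a small sketch. This is not ctx-opt's implementation: `Turn`, `HybridDeps`, and `hybridSketch` are hypothetical names, the scorer is synchronous for brevity, and placing the summary in an assistant turn is an assumption.

```typescript
// Hypothetical shapes for this sketch; ctx-opt's real types may differ.
interface Turn {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface HybridDeps {
  maxTokens: number;
  minScore: number;
  recentWindow: number;
  count: (turns: Turn[]) => number;              // token counter
  score: (turns: Turn[]) => number[];            // one score per turn
  summarize: (turns: Turn[]) => Promise<string>; // your LLM call
}

async function hybridSketch(turns: Turn[], deps: HybridDeps): Promise<Turn[]> {
  if (deps.count(turns) <= deps.maxTokens) return turns;

  // Stage 1: drop low-scoring turns, but always keep system and recent turns.
  const scores = deps.score(turns);
  const recentFrom = turns.length - deps.recentWindow;
  const filtered = turns.filter(
    (t, i) => t.role === 'system' || i >= recentFrom || (scores[i] ?? 0) >= deps.minScore,
  );
  if (deps.count(filtered) <= deps.maxTokens) return filtered;

  // Stage 2: still over budget, so fold the older survivors into one summary.
  const system = filtered.filter((t) => t.role === 'system');
  const body = filtered.filter((t) => t.role !== 'system');
  const older = body.slice(0, Math.max(0, body.length - deps.recentWindow));
  if (older.length === 0) return filtered; // nothing left to summarize
  const recent = body.slice(older.length);
  const summaryTurn: Turn = {
    role: 'assistant',
    content: `Summary of earlier turns: ${await deps.summarize(older)}`,
  };
  return [...system, summaryTurn, ...recent];
}
```

The summarizer is only called when filtering alone does not reach the budget, which is why the table lists `hybrid` as needing both calls but not necessarily on every request.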

All strategies preserve the system prompt by default and never split a tool-use / tool-result pair across the boundary. Multi-modal content (images, files, audio) is preserved verbatim through optimization via the passthrough content-block type; the messages reach the LLM unmodified even if ctx-opt trims around them.
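
To make the two guarantees concrete, here is a minimal sliding-window sketch. It is an illustration under assumptions, not the library's code: `ChatMessage` and `slidingWindow` are hypothetical, a tool result is modeled as a `tool` role turn that directly follows its tool-use turn, and the sketch resolves a split pair by dropping the orphaned result (extending the window backwards would be an equally valid choice).

```typescript
// Hypothetical message shape for this sketch; ctx-opt's real types may differ.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

/**
 * Keep every system message plus the last `size` non-system messages.
 * If the cut lands between a tool-use turn and its result, move the cut
 * forward so the window never starts with an orphaned tool result.
 */
function slidingWindow(messages: ChatMessage[], size: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');

  let start = Math.max(0, rest.length - size);
  while (start < rest.length && rest[start]?.role === 'tool') {
    start += 1;
  }
  return [...system, ...rest.slice(start)];
}

const demoHistory: ChatMessage[] = [
  { role: 'system', content: 'be concise' },
  { role: 'user', content: 'weather in Oslo?' },
  { role: 'assistant', content: '[tool-use: getWeather]' },
  { role: 'tool', content: '22C' },
  { role: 'assistant', content: 'It is 22C in Oslo.' },
  { role: 'user', content: 'thanks' },
];

// A window of 3 would start at the tool result, so the cut moves one turn later.
const trimmed = slidingWindow(demoHistory, 3);
console.log(trimmed.map((m) => m.role)); // [ 'system', 'assistant', 'user' ]
```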

Benchmarks

Reproducible workload: a 60-turn synthetic support conversation (9.8k input tokens), budget = 30% of input (2.9k tokens). Cost basis: gpt-4o ($2.50 per 1M input tokens). Run npm run bench to regenerate.

Strategy	Output tokens	Saved	$ saved/call	$ saved/1k calls	Compression	Time
`sliding-window`	717	9,147	$0.02287	$22.87	92.7%	10ms
`summarizer`	628	9,236	$0.02309	$23.09	93.6%	23ms
`relevance`	2,644	7,220	$0.01805	$18.05	73.2%	18ms
`hybrid`	2,644	7,220	$0.01805	$18.05	73.2%	21ms

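Times are wall-clock on a single laptop with a mock LLM and scorer, so they isolate ctx-opt's own overhead from network latency. In production, the LLM-backed strategies will be dominated by the model round-trip, not by ctx-opt. The Compression column is the share of input tokens removed (saved / input), i.e. 1 - meta.compressionRatio.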
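
The dollar columns follow directly from the token counts and the quoted gpt-4o input price. As a worked check of the `sliding-window` row (the constant names are made up for this sketch, and the input size is taken as 717 + 9,147 from that row):

```typescript
const USD_PER_MILLION_INPUT = 2.5; // gpt-4o input price quoted above
const benchInputTokens = 9_864;    // 717 output + 9,147 saved
const benchOutputTokens = 717;

const benchSavedTokens = benchInputTokens - benchOutputTokens;
const benchSavedUsdPerCall = (benchSavedTokens * USD_PER_MILLION_INPUT) / 1_000_000;
const benchSavedUsdPer1kCalls = benchSavedUsdPerCall * 1_000;

console.log(benchSavedTokens);        // 9147
console.log(benchSavedUsdPerCall);    // about 0.02287 USD per call
console.log(benchSavedUsdPer1kCalls); // about 22.87 USD per 1k calls
```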

Try it in your browser

Open the interactive playground to paste a chat history and watch all four strategies trim it side-by-side, with live token and dollar savings.

cd playground && npm install && npm run dev

API

`new ContextOptimizer(config)`

interface OptimizerConfig {
  maxTokens: number;                    // hard token budget for the output
  strategy: 'sliding-window' | 'summarizer' | 'relevance' | 'hybrid';
  model?: string;                       // for token-counting accuracy (default: 'gpt-4o')
  preserveSystem?: boolean;             // default: true
  recentWindow?: number;                // never-drop window size (default: 6)

  slidingWindow?: { size: number };

  summarizer?: {
    llmCall: SummarizerLLMFn;           // your LLM call — see "Plugging in your LLM"
    maxSummaryTokens?: number;          // default: 400
    triggerThreshold?: number;          // 0..1, default: 0.85
    recentWindow?: number;              // per-strategy override
    onError?: 'fall-back' | 'throw' | ((err: unknown) => void); // default: 'fall-back'
  };

  relevance?: {
    scorer: RelevanceScorerFn;          // your scorer — returns one score per message
    minScore?: number;                  // default: 0.2
    recentWindow?: number;              // per-strategy override
  };

  pricing?: Record<string, { inputUsdPerMillion: number }>;     // override built-in pricing table
}

Built-in relevance scorers

Don't want to write your own scorer? Two are shipped under ctx-opt/scorers:

import { ContextOptimizer } from 'ctx-opt';
import { bm25Scorer, createEmbeddingScorer } from 'ctx-opt/scorers';

// Pure-JS BM25 keyword scoring. Zero deps, zero network calls.
new ContextOptimizer({
  maxTokens: 8_000,
  strategy: 'relevance',
  relevance: { scorer: bm25Scorer(), minScore: 0.05 },
});

// Embedding-based scoring. Bring your own embed function.
import OpenAI from 'openai';
const openai = new OpenAI();
new ContextOptimizer({
  maxTokens: 8_000,
  strategy: 'relevance',
  relevance: {
    scorer: createEmbeddingScorer({
      embed: async (texts) => {
        const res = await openai.embeddings.create({
          model: 'text-embedding-3-small',
          input: texts,
        });
        return res.data.map((d) => d.embedding);
      },
    }),
    minScore: 0.3,
  },
});

The embedding scorer caches per-message vectors in-process so subsequent optimize() calls only embed new messages.
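
The caching behaviour can be sketched as a thin wrapper around any `embed` function. This is a hypothetical helper, not the scorer's actual code; in particular, keying the cache by message text is an assumption.

```typescript
// Same shape as the `embed` option shown above: texts in, one vector per text out.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Hypothetical helper: wraps an embed function with an in-process cache.
function createCachedEmbedder(embed: EmbedFn): EmbedFn {
  const cache = new Map<string, number[]>(); // keyed by message text

  return async (texts) => {
    // Only texts that have never been seen are sent to the embedding backend.
    const missing = Array.from(new Set(texts.filter((t) => !cache.has(t))));
    if (missing.length > 0) {
      const vectors = await embed(missing);
      missing.forEach((text, i) => {
        const vector = vectors[i];
        if (vector) cache.set(text, vector);
      });
    }
    return texts.map((t) => cache.get(t) ?? []);
  };
}
```

With a growing chat history, each call then pays only for the turns added since the previous call. The cache in this sketch is unbounded; a long-lived process would likely want an eviction policy.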

Cost tracking

When model is set to a known model, meta includes the dollar cost of the optimized input and the dollars saved versus the unoptimized array:

const { meta } = await optimizer.optimize(history);
console.log(`saved $${meta.savedUsd?.toFixed(4)} this call`);

Built-in pricing covers GPT-4o family, GPT-4 Turbo, GPT-3.5, o1/o3, the Claude 3.5 / 4.x family, and Gemini 1.5 / 2.0. Override or add your own via the pricing config option.
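
As a rough picture of how a pricing override could combine with the built-in table, assuming a plain object merge in which your entries win: `PricingTable`, `builtInSketch`, and `inputCostUsdSketch` are hypothetical names, `my-finetune` is a made-up model, and the only price used is the gpt-4o figure quoted in the Benchmarks section.

```typescript
// Mirrors the shape of the `pricing` config option.
type PricingTable = Record<string, { inputUsdPerMillion: number }>;

// Stand-in for the built-in table; only the gpt-4o price quoted in this README.
const builtInSketch: PricingTable = {
  'gpt-4o': { inputUsdPerMillion: 2.5 },
};

// Returns undefined for unknown models, matching the documented meta behaviour.
function inputCostUsdSketch(
  tokens: number,
  model: string,
  overrides: PricingTable = {},
): number | undefined {
  const table: PricingTable = { ...builtInSketch, ...overrides };
  if (!Object.prototype.hasOwnProperty.call(table, model)) return undefined;
  const entry = table[model];
  return entry ? (tokens * entry.inputUsdPerMillion) / 1_000_000 : undefined;
}

console.log(inputCostUsdSketch(1_000_000, 'gpt-4o'));      // 2.5
console.log(inputCostUsdSketch(1_000_000, 'my-finetune')); // undefined
console.log(
  inputCostUsdSketch(500_000, 'my-finetune', { 'my-finetune': { inputUsdPerMillion: 1 } }),
); // 0.5
```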

`optimize(messages, input?)`

Returns { messages, meta }. If the input is already within budget, the array is returned unchanged but meta is still populated.

interface OptimizeInput {
  task?: string;                        // current user goal — used by relevance strategy
  forceStrategy?: StrategyName;         // override config strategy for this call
}

`countTokens(messages)`

Token count for a messages array, including per-message overhead.

`withinBudget(messages)`

Returns true if countTokens(messages) <= maxTokens.

`updateConfig(patch)`

Apply a partial update to the config without creating a new instance.

Plugging in your LLM

The summarizer and relevance strategies need you to provide the actual model call. This keeps ctx-opt zero-dependency on any specific SDK.

Anthropic example

import Anthropic from '@anthropic-ai/sdk';
import type { SummarizerLLMFn } from 'ctx-opt';

const client = new Anthropic();

const llmCall: SummarizerLLMFn = async (messages, instruction) => {
  const res = await client.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 400,
    system: instruction,
    messages: messages.map((m) => ({
      role: m.role === 'assistant' ? 'assistant' : 'user',
      content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content),
    })),
  });
  const block = res.content[0];
  return block && block.type === 'text' ? block.text : '';
};

OpenAI example

import OpenAI from 'openai';
import type { SummarizerLLMFn } from 'ctx-opt';

const client = new OpenAI();

const llmCall: SummarizerLLMFn = async (messages, instruction) => {
  const res = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: instruction },
      ...messages.map((m) => ({
        role: m.role === 'assistant' ? 'assistant' : 'user',
        content: typeof m.content === 'string' ? m.content : JSON.stringify(m.content),
      })) as any,
    ],
  });
  return res.choices[0]?.message?.content ?? '';
};
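
For unit tests and local benchmarks you may prefer a deterministic stand-in over a real model call. The sketch below assumes the callback shape used in the two examples above, `(messages, instruction) => Promise<string>`; `StubMessage`, `StubSummarizer`, `stubLlmCall`, and `failingLlmCall` are hypothetical names, so check them against the exported `SummarizerLLMFn` type before relying on this.

```typescript
// Assumed to mirror SummarizerLLMFn as used in the examples above.
interface StubMessage {
  role: string;
  content: unknown;
}
type StubSummarizer = (messages: StubMessage[], instruction: string) => Promise<string>;

// Deterministic summarizer: no network, same output for the same input.
const stubLlmCall: StubSummarizer = async (messages) => {
  const first = messages[0];
  const preview = first && typeof first.content === 'string' ? first.content.slice(0, 40) : '';
  return `[stub summary of ${messages.length} messages] ${preview}`.trim();
};

// Always rejects, to exercise the summarizer's error handling.
const failingLlmCall: StubSummarizer = async () => {
  throw new Error('simulated LLM outage');
};
```

Passing `failingLlmCall` as `summarizer.llmCall` is one way to check how your application behaves under each `onError` setting.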

The metadata object

Every call to optimize() returns a meta describing what happened:

Field	Meaning
`inputTokens`	Token count of the input `messages[]`.
`outputTokens`	Token count after optimization.
`saved`	`inputTokens - outputTokens`.
`compressionRatio`	`outputTokens / inputTokens`.
`strategyUsed`	Which strategy actually ran (useful when `forceStrategy` is set).
`messagesDropped`	Number of messages removed from the array.
`messagesSummarized`	Number of messages that were folded into a summary.
`withinBudget`	`true` if `outputTokens <= maxTokens`.
`inputCostUsd`	Dollar cost of the optimized input. Undefined if model pricing is unknown.
`savedUsd`	Dollars saved on input cost vs the unoptimized array. Undefined if model pricing is unknown.
`fellBackTo`	Set when the requested strategy couldn't run cleanly and fell back (e.g. summarizer's llmCall threw → falls back to `sliding-window`).
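
The arithmetic fields relate to each other as in the sketch below, which uses the `sliding-window` benchmark row and the rounded 2.9k budget as sample values. `MetaSketch` and `buildMetaSketch` are hypothetical, and the zero-input guard is an assumption rather than documented behaviour. Note that a lower `compressionRatio` means more was removed.

```typescript
// Subset of the meta fields from the table above.
interface MetaSketch {
  inputTokens: number;
  outputTokens: number;
  saved: number;
  compressionRatio: number;
  withinBudget: boolean;
}

function buildMetaSketch(inputTokens: number, outputTokens: number, maxTokens: number): MetaSketch {
  return {
    inputTokens,
    outputTokens,
    saved: inputTokens - outputTokens,
    // Assumption: report 1 (no compression) for an empty input instead of dividing by zero.
    compressionRatio: inputTokens === 0 ? 1 : outputTokens / inputTokens,
    withinBudget: outputTokens <= maxTokens,
  };
}

const sampleMeta = buildMetaSketch(9_864, 717, 2_900);
console.log(sampleMeta.saved);                       // 9147
console.log(sampleMeta.compressionRatio.toFixed(3)); // "0.073"
console.log(sampleMeta.withinBudget);                // true
```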

Token counting accuracy

ctx-opt uses js-tiktoken for token counts.

OpenAI models — model-specific encoding when known, falling back to cl100k_base.
Anthropic models — cl100k_base is the closest publicly available approximation. Counts will be within a few percent of the official tokenizer.
A per-message overhead of 4 tokens is added to each message to approximate the role and formatting tokens (per OpenAI's chat-completion cookbook formula).
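
The counting rule above amounts to "encoded length plus 4, summed over messages". In the sketch below the encoder is injected so the example runs without js-tiktoken; `countTokensSketch` and `wordEncode` are hypothetical, and the word-splitting encoder is only a stand-in for a real tokenizer.

```typescript
type EncodeFn = (text: string) => number[];

const PER_MESSAGE_OVERHEAD = 4; // role and formatting tokens, per the rule above

function countTokensSketch(
  messages: { role: string; content: string }[],
  encode: EncodeFn,
): number {
  let total = 0;
  for (const message of messages) {
    total += encode(message.content).length + PER_MESSAGE_OVERHEAD;
  }
  return total;
}

// Stand-in encoder: one "token" per whitespace-separated word.
const wordEncode: EncodeFn = (text) =>
  text.split(/\s+/).filter(Boolean).map((_, i) => i);

const sampleCount = countTokensSketch(
  [
    { role: 'user', content: 'hello there' },
    { role: 'assistant', content: 'hi' },
  ],
  wordEncode,
);
console.log(sampleCount); // 11 = (2 + 4) + (1 + 4)
```

Because the overhead is charged per message, a history of many short turns costs noticeably more than its text alone suggests.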

For exact Anthropic counts, use the built-in helper that delegates to Anthropic's messages.countTokens endpoint:

import Anthropic from '@anthropic-ai/sdk';
import { countMessageTokensWithAnthropic } from 'ctx-opt';

const client = new Anthropic();
const tokens = await countMessageTokensWithAnthropic(
  client,
  messages,
  'claude-haiku-4-5-20251001'
);

Status: pre-1.0

ctx-opt is at 0.x. The core API surface (ContextOptimizer, strategies, meta shape, SDK adapters) is settling but breaking changes are still on the table until 1.0. Each minor version (0.4 -> 0.5) may contain breaking changes; patch versions (0.5.0 -> 0.5.1) will not. The CHANGELOG calls out anything breaking explicitly.

Pin to a minor version in production:

"dependencies": { "ctx-opt": "~0.5.0" }

Browser bundle

The core works in the browser, but the underlying js-tiktoken encoding tables add ~2 MB to your bundle. That's fine for an internal tool or a server-rendered app but not for a tightly performance-budgeted client. For client-side use, consider running optimization on the server and streaming the result down.

Changelog

See CHANGELOG.md for the release history.

License

MIT
