PromptOpsKit supports opt-in compression before provider request generation. Compression never runs unless a prompt or placeholder asks for it.
Use TheTokenCompany when you want backend compression for the rendered # Prompt template:
compression:
thetokencompany:
enabled: true
model: bear-2
aggressiveness: 0.2At render time, pass your API key:
const result = await kit.renderPrompt({
path: 'support/reply',
provider: 'openai',
variables,
theTokenCompany: {
apiKey: process.env.THETOKENCOMPANY_API_KEY,
},
});If credentials are unavailable or the TheTokenCompany call fails, PromptOpsKit leaves the rendered prompt template unchanged, returns a POK057 warning in warnings, and reports zero token savings with matching input/output token counts. Library rendering does not log this fallback to the console.
Use the local heuristic compressor when you want no backend calls:
compression:
heuristic:
enabled: true
mode: conservative
min_tokens: 120
max_sentences: 8
target_reduction: 0.45
query_variable: user_question
preserve_neighbors: true
fail_on_low_confidence: trueThe local compressor deduplicates adjacent repeated sentences, scores sentences against query, query_variable, or system instructions, and keeps exact source sentences within the configured budget.
The default mode: conservative favors meaning retention over maximum savings. It skips compression when there are no usable query terms or no sentence matches the query, preserves neighboring sentences when max_sentences allows, avoids token-level truncation of selected prose, and leaves structured blocks such as Markdown tables, lists, and fenced code unchanged. Set mode: balanced or fail_on_low_confidence: false only when best-effort extraction is acceptable.
Because heuristic compression is lossy, static validation warns when it is enabled for the full prompt template (POK055) or used inside system instructions (POK056). Prefer per-placeholder compression for large context values so task instructions, output constraints, and safety conditions remain intact.
When the input is a complete JSON object or array, enable TOON preprocessing:
compression:
heuristic:
enabled: true
json_to_toon: trueThis converts JSON to TOON and preserves structured output instead of sentence-selecting through it. If the input is not a complete valid JSON object or array, PromptOpsKit leaves it unchanged and returns a POK031 warning instead of text-compressing the broken JSON.
For one placeholder only:
Payload:
{{ json_payload | toon }}Code should not be text-compressed. Use code compaction instead:
compression:
code:
enabled: true
remove_comments: true
trim_indentation: true
collapse_blank_lines: trueFor one placeholder only:
Source:
{{ source_code | compact }}Code compaction preserves code order and tokens while removing comments, common indentation, trailing whitespace, and blank lines by default. It does not perform AST minification.
If comments are semantically important, such as compiler pragmas, @ts-ignore, or generated-code markers, set remove_comments: false. When compression.code.enabled: true, PromptOpsKit skips TheTokenCompany prompt-template compression and returns a POK033 warning so code is not sent through text compression.
Context inputs can opt in through schema:
context:
inputs:
- name: account_context
compression:
heuristic:
enabled: true
mode: conservative
query_variable: user_question
- name: source_code
compression: codeOr at the placeholder call site:
Context: {{ account_context | compress }}
Payload: {{ json_payload | toon }}
Source: {{ source_code | compact }}Placeholder modifiers are single-token shortcuts. Use context.inputs[].compression when a placeholder needs options such as query_variable, mode, preserve_neighbors, fail_on_low_confidence, or remove_comments: false.
Render results include detailed per-step records plus a lightweight aggregate:
const result = await kit.renderPrompt({ /* ... */ });
console.log(result.compressionSummary?.tokensSaved);
console.log(result.compression);compressionSummary is an operation-level aggregate across compression and compaction steps. If a placeholder is compressed and the whole prompt is also compressed, the same source text can contribute to more than one step. Use compression for the per-step breakdown.
compressionSummary contains:
{
steps: number;
inputTokens: number;
outputTokens: number;
tokensSaved: number;
reductionRatio: number;
}The same summary is also attached to result.request.compressionSummary when a provider request is produced.
The local token-compression approach is based on Jason Kneen's open-thetokenco.
TOON preprocessing uses a local encode-only implementation inspired by the MIT-licensed TOON project by Johann Schopplich, without adding @toon-format/toon as a runtime dependency.