1 change: 1 addition & 0 deletions README.md
@@ -136,6 +136,7 @@ Supported values for `warnings.contextSize` are `auto`, `off`, `result-only`, `c
- **Folder defaults** — `defaults.md` inheritance for shared provider, model, metadata, and system instructions
- **Overrides** — Environment and tier-based overrides (base → env → tier → runtime)
- **4 provider adapters** — OpenAI, Anthropic, Gemini, OpenRouter — body-only output
- **Provider-aware input caching controls** — optional `cache` front matter maps to OpenAI prompt cache hints, Anthropic `cache_control`, and Gemini `cachedContent`
- **Validation** — Zod schema validation, Levenshtein-based "did you mean?" for typos, variable usage checks
- **Context hardening** — structured regexes with flags, `/pattern/i` convenience syntax, and built-in `non_empty` / `reject_secrets` validators
- **Optional short-circuit messages** — validators can return a structured `returnMessage` instead of throwing when configured

27 changes: 27 additions & 0 deletions docs/prompt-format.md
@@ -54,6 +54,7 @@ Supported default fields:

- `provider` (front matter) — default provider for the folder
- `model` (front matter) — default model for the folder
- `cache` (front matter) — default provider-specific caching hints
- `metadata` (front matter) — merged with prompt-local metadata
- `# System instructions` (body section) — used when the prompt has none

@@ -75,6 +76,10 @@ prompts/
---
provider: openai
model: gpt-5.4
cache:
  openai:
    prompt_cache_key: support-v1
    retention: in_memory
metadata:
  owner: platform
  review_required: true
@@ -101,10 +106,32 @@ Use support tone and escalation policy.
`prompts/support/reply.md` (no local `metadata.owner` and no local system section) will use:
- `provider: openai` (inherited from root defaults)
- `model: gpt-5.4` (inherited from root defaults)
- `cache.openai.prompt_cache_key: support-v1` (inherited from root defaults)
- `metadata.owner: support` (nearest override)
- `metadata.review_required: true` (inherited from parent defaults)
- system instructions from `support/defaults.md`

## Caching configuration

Use the optional `cache` front matter block to pass vendor-specific caching hints:

```yaml
cache:
  openai:
    prompt_cache_key: support-v2
    retention: 24h
  anthropic:
    mode: automatic
    ttl: 5m
  gemini:
    cached_content: cachedContents/1234567890
```

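- `openai.prompt_cache_key` and `openai.retention` map to OpenAI's `prompt_cache_key` and `prompt_cache_retention` request fields.
- `anthropic.mode: automatic` (the default) sets a top-level `cache_control`; `explicit` instead attaches block-level `cache_control` to the sections and tools selected by the `cache_system_instructions`, `cache_tools`, and `cache_prompt_template` flags.
- `gemini.cached_content` (or `google.cached_content`) maps to `cachedContent` for requests that reuse a previously created Gemini cache. If both blocks are present, the `gemini` block takes precedence, and validation warns when the two `cached_content` values differ.
- You can safely include multiple provider blocks in the same prompt. Each adapter reads only its own block (`openai`, `anthropic`, or `gemini`/`google`) and ignores the others.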
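As a rough illustration of the per-provider selection described above, the TypeScript sketch below picks the single `cache` block an adapter would read. The types and the `selectCacheBlock` helper are hypothetical (they are not exported by this package); the rule that a `gemini` block wins over the `google` alias mirrors the Gemini adapter changed in this PR.

```typescript
// Hypothetical local types mirroring the documented `cache` front matter.
interface OpenAICache { prompt_cache_key?: string; retention?: 'in_memory' | '24h' }
interface AnthropicCache {
  mode?: 'automatic' | 'explicit';
  type?: 'ephemeral';
  ttl?: '5m' | '1h';
  cache_system_instructions?: boolean;
  cache_tools?: boolean;
  cache_prompt_template?: boolean;
}
interface GeminiCache { cached_content?: string }
interface CacheConfig {
  openai?: OpenAICache;
  anthropic?: AnthropicCache;
  gemini?: GeminiCache;
  google?: GeminiCache;
}

type RenderProvider = 'openai' | 'anthropic' | 'gemini' | 'google';
type ProviderCache = OpenAICache | AnthropicCache | GeminiCache;

// Hypothetical helper: return only the block the target adapter reads.
function selectCacheBlock(cache: CacheConfig | undefined, provider: RenderProvider): ProviderCache | undefined {
  if (!cache) return undefined;
  switch (provider) {
    case 'openai':
      return cache.openai;
    case 'anthropic':
      return cache.anthropic;
    case 'gemini':
    case 'google':
      // A `gemini` block wins over the `google` alias, as in the Gemini adapter.
      return cache.gemini ?? cache.google;
  }
}

const cache: CacheConfig = {
  openai: { prompt_cache_key: 'support-v2', retention: '24h' },
  anthropic: { mode: 'automatic', ttl: '5m' },
  gemini: { cached_content: 'cachedContents/1234567890' },
};
console.log(selectCacheBlock(cache, 'anthropic')); // { mode: 'automatic', ttl: '5m' }
```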

## Sections

The Markdown body is split on **H1 headings** into named sections. Three section names are recognized (case-insensitive):

12 changes: 12 additions & 0 deletions docs/providers.md
@@ -36,6 +36,7 @@ const { request } = result;
```

The provider passed to `renderPrompt` determines which adapter shapes the body. The `provider` field in front matter is informational — the render-time provider controls output.
When a prompt includes multiple cache blocks (for example, `cache.openai` and `cache.anthropic`), each adapter ignores the blocks meant for other providers, so cross-provider settings never leak into the wrong payload.

## Direct adapter imports

@@ -208,10 +209,17 @@ Field mapping:
| `reasoning.effort` | `reasoning_effort` |
| `response.format: json` | `response_format: { type: "json_object" }` |
| `response.stream` | `stream` |
| `cache.openai.prompt_cache_key` | `prompt_cache_key` |
| `cache.openai.retention` | `prompt_cache_retention` |

Warnings:
- `reasoning.budget_tokens` is ignored (OpenAI uses `reasoning_effort` instead)

Caching notes:
- OpenAI applies prompt caching automatically to eligible requests; these fields only tune it.
- `cache.openai.prompt_cache_key` helps route requests that share a long prefix to the same cache, improving hit rates.
- `cache.openai.retention` can be `in_memory` (default) or `24h`.
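A minimal sketch of the field mapping above, assuming a plain-object request body. `applyOpenAICache` is a hypothetical standalone helper; the adapter in this PR performs the same two assignments inline and leaves both fields off the body when they are unset.

```typescript
// Hypothetical local type mirroring the documented `cache.openai` block.
interface OpenAICache {
  prompt_cache_key?: string;
  retention?: 'in_memory' | '24h';
}

// Copy the documented cache fields onto an OpenAI request body.
// Unset fields are omitted entirely rather than sent as null/undefined.
function applyOpenAICache(body: Record<string, unknown>, cache?: OpenAICache): Record<string, unknown> {
  const out: Record<string, unknown> = { ...body };
  if (cache?.prompt_cache_key) out.prompt_cache_key = cache.prompt_cache_key;
  if (cache?.retention) out.prompt_cache_retention = cache.retention;
  return out;
}

const body = applyOpenAICache(
  { model: 'gpt-5.4', messages: [] },
  { prompt_cache_key: 'support-v1', retention: '24h' },
);
console.log(body.prompt_cache_key, body.prompt_cache_retention); // support-v1 24h
```

Because unset fields are omitted, a prompt without `cache.openai.retention` falls back to OpenAI's default `in_memory` retention.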

## Anthropic

Body shape: [Messages API](https://docs.anthropic.com/en/api/messages)
@@ -233,6 +241,9 @@ Key differences from OpenAI:
- `max_tokens` is **required** — defaults to `4096` if `sampling.max_output_tokens` is not set.
- `sampling.stop` maps to `stop_sequences`.
- `reasoning.budget_tokens` maps to `thinking: { type: "enabled", budget_tokens }`.
- `cache.anthropic.mode: automatic` maps to top-level `cache_control`.
- `cache.anthropic.mode: explicit` applies `cache_control` at block level for selected sections/tools.
- `cache.anthropic.ttl` supports `5m` (default) or `1h`.
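To make the automatic/explicit split concrete, the sketch below computes where `cache_control` would be attached, following the defaults visible in the Anthropic adapter diff: `mode` falls back to `automatic`, `type` to `ephemeral`, and `ttl` is sent only when set. In explicit mode the system block is cached unless `cache_system_instructions` is `false`, while tools and the prompt template are opt-in. `planAnthropicCache` and `CachePlan` are hypothetical names used only for illustration.

```typescript
// Hypothetical local types mirroring the documented `cache.anthropic` fields.
interface AnthropicCache {
  mode?: 'automatic' | 'explicit';
  type?: 'ephemeral';
  ttl?: '5m' | '1h';
  cache_system_instructions?: boolean;
  cache_tools?: boolean;
  cache_prompt_template?: boolean;
}

type CacheControl = { type: 'ephemeral'; ttl?: '5m' | '1h' };

interface CachePlan {
  control?: CacheControl;  // the cache_control object to attach, if any
  topLevel: boolean;       // automatic mode: set body.cache_control
  system: boolean;         // explicit mode: cache the system text block
  tools: boolean;          // explicit mode: add cache_control to each tool
  promptTemplate: boolean; // explicit mode: cache the prompt-template user block
}

function planAnthropicCache(cfg?: AnthropicCache): CachePlan {
  const none: CachePlan = { topLevel: false, system: false, tools: false, promptTemplate: false };
  // No `cache.anthropic` block: nothing is cached.
  if (!cfg) return none;

  const control: CacheControl = {
    type: cfg.type ?? 'ephemeral',
    ...(cfg.ttl ? { ttl: cfg.ttl } : {}),
  };

  if ((cfg.mode ?? 'automatic') === 'automatic') {
    return { ...none, control, topLevel: true };
  }

  return {
    control,
    topLevel: false,
    system: cfg.cache_system_instructions !== false,     // opt-out
    tools: cfg.cache_tools === true,                     // opt-in
    promptTemplate: cfg.cache_prompt_template === true,  // opt-in
  };
}

const plan = planAnthropicCache({ mode: 'explicit', ttl: '1h', cache_tools: true });
console.log(plan.system, plan.tools, plan.promptTemplate); // true true false
```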

Warnings:
- `frequency_penalty` and `presence_penalty` are not supported — ignored with a warning.
@@ -266,6 +277,7 @@ Key differences:
- `top_p` maps to `topP`, `max_output_tokens` maps to `maxOutputTokens`, `stop` maps to `stopSequences`.
- `response.format: json` maps to `generationConfig.responseMimeType: "application/json"`.
- `reasoning.effort` maps to `thinkingConfig.thinkingBudget` (high=8192, medium=4096, low=1024).
- `cache.gemini.cached_content` (or `cache.google.cached_content`) maps to top-level `cachedContent`.

Warnings:
- `frequency_penalty` and `presence_penalty` are not supported — ignored with a warning.

35 changes: 34 additions & 1 deletion docs/schema.md
@@ -15,6 +15,7 @@ Prompt files use YAML front matter. This page documents every supported field.
| `reasoning` | `object` | No | Reasoning/thinking configuration |
| `sampling` | `object` | No | Sampling parameters |
| `response` | `object` | No | Response format and streaming |
| `cache` | `object` | No | Provider-specific prompt/context caching options |
| `tools` | `array` | No | Tool references (strings or inline definitions) |
| `mcp` | `object` | No | MCP server references |
| `context` | `object` | No | Declare expected variables and history settings |
@@ -31,6 +32,7 @@ Prompt files use YAML front matter. This page documents every supported field.
|-------|------|-------------|
| `provider` | `enum` | Default provider (`openai`, `anthropic`, `google`, `gemini`, `openrouter`, `any`) |
| `model` | `string` | Default model identifier |
| `cache` | `object` | Same as prompt-level `cache` block |
| `metadata` | `object` | Same as the prompt `metadata` block (`owner`, `tags`, `review_required`, `stable`) |
| `# System instructions` | section | System instructions inherited by prompts in this folder |

@@ -114,6 +116,37 @@ Inline tool definition fields:
| `description` | `string` | No | Tool description |
| `input_schema` | `object` | No | JSON Schema for tool input |

## `cache`

```yaml
cache:
  openai:
    prompt_cache_key: support-v1
    retention: in_memory        # in_memory | 24h
  anthropic:
    mode: automatic             # automatic | explicit
    ttl: 5m                     # 5m | 1h
    # The cache_* flags below only take effect when mode is explicit.
    cache_system_instructions: true
    cache_tools: true
    cache_prompt_template: false
  gemini:
    cached_content: cachedContents/1234567890
```

| Field | Type | Description |
|-------|------|-------------|
| `openai.prompt_cache_key` | `string` | Optional routing key to improve cache-hit locality on shared prefixes |
| `openai.retention` | `'in_memory' \| '24h'` | Prompt cache retention policy |
| `anthropic.mode` | `'automatic' \| 'explicit'` | Automatic top-level caching or explicit block-level cache breakpoints (defaults to `automatic`) |
| `anthropic.type` | `'ephemeral'` | Cache type (currently only `ephemeral`, which is also the default) |
| `anthropic.ttl` | `'5m' \| '1h'` | Anthropic cache duration; when unset, no `ttl` is sent and Anthropic's 5-minute default applies |
| `anthropic.cache_system_instructions` | `boolean` | In explicit mode, cache the system instructions block (defaults to `true`) |
| `anthropic.cache_tools` | `boolean` | In explicit mode, cache tool declarations (defaults to `false`) |
| `anthropic.cache_prompt_template` | `boolean` | In explicit mode, cache the prompt-template user block (defaults to `false`) |
| `gemini.cached_content` / `google.cached_content` | `string` | Previously created Gemini cache resource name used as `cachedContent` |

You can define multiple provider cache blocks in one prompt; each adapter reads only its own cache settings.

## `mcp`

```yaml
@@ -190,7 +223,7 @@ tiers:
model: gpt-5.4
```

Each environment/tier key maps to an overrides object. Overridable fields: `model`, `fallback_models`, `reasoning`, `sampling`, `response`, `tools`. See [Overrides](./overrides.md).
Each environment/tier key maps to an overrides object. Overridable fields: `model`, `fallback_models`, `reasoning`, `sampling`, `response`, `cache`, `tools`. See [Overrides](./overrides.md).

## `metadata`


5 changes: 5 additions & 0 deletions src/cli/commands/init.ts
@@ -22,6 +22,11 @@ context:
- app_context
includes:
  - ./shared/tone.md
cache:
  openai:
    # Keep this stable across requests that share a long static prefix.
    prompt_cache_key: hello-v1
    retention: in_memory
reasoning:
  effort: high
environments:

45 changes: 41 additions & 4 deletions src/providers/anthropic.ts
@@ -46,6 +46,15 @@ export const anthropicAdapter: ProviderAdapter = withPromptInputSupport({
});

const messages: Array<Record<string, unknown>> = [];
const anthropicCacheConfig = resolvedAsset.cache?.anthropic;
const cacheType = anthropicCacheConfig?.type ?? 'ephemeral';
const cacheControl = anthropicCacheConfig
? {
type: cacheType,
...(anthropicCacheConfig.ttl ? { ttl: anthropicCacheConfig.ttl } : {}),
}
: undefined;
const cacheMode = anthropicCacheConfig?.mode ?? 'automatic';

// History
if (runtime.history) {
@@ -56,7 +65,14 @@ export const anthropicAdapter: ProviderAdapter = withPromptInputSupport({

// User message (prompt template)
if (sections.prompt_template) {
messages.push({ role: 'user', content: sections.prompt_template });
if (cacheControl && cacheMode === 'explicit' && anthropicCacheConfig?.cache_prompt_template) {
messages.push({
role: 'user',
content: [{ type: 'text', text: sections.prompt_template, cache_control: cacheControl }],
});
} else {
messages.push({ role: 'user', content: sections.prompt_template });
}
}

const body: Record<string, unknown> = {
@@ -66,7 +82,11 @@ export const anthropicAdapter: ProviderAdapter = withPromptInputSupport({

// System goes as top-level field in Anthropic
if (sections.system_instructions) {
body.system = sections.system_instructions;
if (cacheControl && cacheMode === 'explicit' && anthropicCacheConfig?.cache_system_instructions !== false) {
body.system = [{ type: 'text', text: sections.system_instructions, cache_control: cacheControl }];
} else {
body.system = sections.system_instructions;
}
}

// Sampling params
@@ -93,18 +113,35 @@ export const anthropicAdapter: ProviderAdapter = withPromptInputSupport({
body.stream = resolvedAsset.response.stream;
}

if (cacheControl && cacheMode === 'automatic') {
body.cache_control = cacheControl;
}

// Tools
if (resolvedAsset.tools && resolvedAsset.tools.length > 0) {
body.tools = resolvedAsset.tools.map((tool) => {
if (typeof tool === 'string') {
const def = runtime.toolRegistry?.[tool];
if (def) return def;
return { name: tool };
if (def) {
if (cacheControl && cacheMode === 'explicit' && anthropicCacheConfig?.cache_tools) {
return { ...(def as Record<string, unknown>), cache_control: cacheControl };
}
return def;
}
return {
name: tool,
...(cacheControl && cacheMode === 'explicit' && anthropicCacheConfig?.cache_tools
? { cache_control: cacheControl }
: {}),
};
}
return {
name: tool.name,
description: tool.description,
input_schema: tool.input_schema ?? { type: 'object', properties: {} },
...(cacheControl && cacheMode === 'explicit' && anthropicCacheConfig?.cache_tools
? { cache_control: cacheControl }
: {}),
};
});
}

10 changes: 10 additions & 0 deletions src/providers/gemini.ts
@@ -20,6 +20,8 @@ export const geminiAdapter: ProviderAdapter = withPromptInputSupport({
const resolvedAsset = resolveAssetForProvider(asset, runtime);
const errors: string[] = [];
const warnings: string[] = [];
const geminiCache = resolvedAsset.cache?.gemini?.cached_content;
const googleCache = resolvedAsset.cache?.google?.cached_content;

if (!resolvedAsset.model) {
errors.push('Gemini adapter requires a model to be specified.');
@@ -31,6 +33,9 @@ export const geminiAdapter: ProviderAdapter = withPromptInputSupport({
if (resolvedAsset.sampling?.presence_penalty !== undefined) {
warnings.push('Gemini does not support presence_penalty. It will be ignored.');
}
if (geminiCache && googleCache && geminiCache !== googleCache) {
warnings.push('Both cache.gemini.cached_content and cache.google.cached_content are set. Gemini uses cache.gemini.cached_content.');
}

return { valid: errors.length === 0, errors, warnings };
},
@@ -65,6 +70,7 @@ export const geminiAdapter: ProviderAdapter = withPromptInputSupport({
const body: Record<string, unknown> = {
contents,
};
const geminiCacheConfig = resolvedAsset.cache?.gemini ?? resolvedAsset.cache?.google;

// System instruction
if (sections.system_instructions) {
@@ -96,6 +102,10 @@ export const geminiAdapter: ProviderAdapter = withPromptInputSupport({
body.generationConfig = generationConfig;
}

if (geminiCacheConfig?.cached_content) {
body.cachedContent = geminiCacheConfig.cached_content;
}

// Tools
if (resolvedAsset.tools && resolvedAsset.tools.length > 0) {
const functionDeclarations = resolvedAsset.tools.map((tool) => {

8 changes: 8 additions & 0 deletions src/providers/openai.ts
@@ -62,6 +62,7 @@ export const openaiAdapter: ProviderAdapter = withPromptInputSupport({
model: resolvedAsset.model,
messages,
};
const openaiCacheConfig = resolvedAsset.cache?.openai;

// Sampling params
if (resolvedAsset.sampling?.temperature !== undefined) body.temperature = resolvedAsset.sampling.temperature;
@@ -86,6 +87,13 @@ export const openaiAdapter: ProviderAdapter = withPromptInputSupport({
body.stream = resolvedAsset.response.stream;
}

if (openaiCacheConfig?.prompt_cache_key) {
body.prompt_cache_key = openaiCacheConfig.prompt_cache_key;
}
if (openaiCacheConfig?.retention) {
body.prompt_cache_retention = openaiCacheConfig.retention;
}

// Tools
if (resolvedAsset.tools && resolvedAsset.tools.length > 0) {
body.tools = resolvedAsset.tools.map((tool) => {

4 changes: 4 additions & 0 deletions src/schema/index.ts
@@ -4,6 +4,10 @@ export {
ReasoningSchema,
SamplingSchema,
ResponseSchema,
CacheSchema,
OpenAICacheSchema,
AnthropicCacheSchema,
GeminiCacheSchema,
ContextSchema,
ContextInputDefinitionSchema,
ContextInputDefinitionObjectSchema,

30 changes: 30 additions & 0 deletions src/schema/schema.ts
@@ -49,6 +49,33 @@ export const ResponseSchema = z.object({
stream: z.boolean().optional(),
});

// --- Cache controls ---

export const OpenAICacheSchema = z.object({
  prompt_cache_key: z.string().min(1).optional(),
  retention: z.enum(['in_memory', '24h']).optional(),
});

export const AnthropicCacheSchema = z.object({
  mode: z.enum(['automatic', 'explicit']).optional(),
  type: z.literal('ephemeral').optional(),
  ttl: z.enum(['5m', '1h']).optional(),
  cache_system_instructions: z.boolean().optional(),
  cache_tools: z.boolean().optional(),
  cache_prompt_template: z.boolean().optional(),
});

export const GeminiCacheSchema = z.object({
  cached_content: z.string().min(1).optional(),
});

export const CacheSchema = z.object({
  openai: OpenAICacheSchema.optional(),
  anthropic: AnthropicCacheSchema.optional(),
  gemini: GeminiCacheSchema.optional(),
  google: GeminiCacheSchema.optional(),
});

// --- Context ---

export const HistorySchema = z.object({
@@ -118,6 +145,7 @@ export const PromptAssetOverridesSchema = z.object({
reasoning: ReasoningSchema.optional(),
sampling: SamplingSchema.optional(),
response: ResponseSchema.optional(),
cache: CacheSchema.optional(),

**Review comment (P1):** Merge `cache` in `applyOverrides` before rendering

Adding `cache` to the override schema here makes `environments`/`tiers`/runtime cache blocks validate, but `mergeOverride` in `src/overrides/apply-overrides.ts` never applies `override.cache`. As a result, `resolveAssetForProvider` silently drops every cache override, so cache behavior cannot actually vary by environment or tier, even though this commit documents `cache` as overridable.
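One possible shape for the fix, sketched under assumptions: each provider block in an override is merged field by field onto the inherited block (so overriding `openai.retention` keeps the base `openai.prompt_cache_key`), and layers apply in the base, env, tier, runtime order described in the README. `mergeCache` and `resolveCache` are hypothetical helpers, not the existing `mergeOverride` API.

```typescript
// Hypothetical shapes: one optional block per provider key, each a flat
// object of scalar fields (matching the documented `cache` schema).
type ProviderKey = 'openai' | 'anthropic' | 'gemini' | 'google';
type CacheBlock = Record<string, string | boolean | undefined>;
type CacheConfig = Partial<Record<ProviderKey, CacheBlock>>;

const PROVIDER_KEYS: ProviderKey[] = ['openai', 'anthropic', 'gemini', 'google'];

// Merge one override layer onto the accumulated cache config.
function mergeCache(base?: CacheConfig, override?: CacheConfig): CacheConfig | undefined {
  if (!override) return base;
  if (!base) return override;
  const merged: CacheConfig = { ...base };
  for (const key of PROVIDER_KEYS) {
    const o = override[key];
    if (!o) continue; // provider not mentioned in this layer: keep the inherited block
    const block: CacheBlock = { ...(base[key] ?? {}) };
    for (const field of Object.keys(o)) {
      // An explicitly undefined field does not clear the inherited value.
      if (o[field] !== undefined) block[field] = o[field];
    }
    merged[key] = block;
  }
  return merged;
}

// Apply layers in order (base, env, tier, runtime); later layers win per field.
function resolveCache(...layers: Array<CacheConfig | undefined>): CacheConfig | undefined {
  return layers.reduce<CacheConfig | undefined>((acc, layer) => mergeCache(acc, layer), undefined);
}

const resolved = resolveCache(
  { openai: { prompt_cache_key: 'support-v1', retention: 'in_memory' } }, // prompt front matter
  { openai: { retention: '24h' } },                                       // e.g. an environment override
  undefined,                                                              // no tier override
);
console.log(resolved?.openai); // { prompt_cache_key: 'support-v1', retention: '24h' }
```

Field-level merging is a design choice; replacing each provider block wholesale would also be defensible, but it would force every override to repeat fields such as `prompt_cache_key`.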


tools: z.array(ToolRefSchema).optional(),
});

@@ -143,6 +171,7 @@ export const SectionsSchema = z.object({
export const PromptDefaultsSchema = z.object({
provider: z.enum(['openai', 'anthropic', 'google', 'gemini', 'openrouter', 'any']).optional(),
model: z.string().optional(),
cache: CacheSchema.optional(),

**Review comment (P1):** Propagate defaults `cache` into loaded prompt assets

This makes `defaults.md` accept a `cache` block, but the defaults application in `src/parser/loader.ts` does not merge or apply `defaults.cache` (the merge and short-circuit checks still handle only provider/model/metadata/system). As a result, cache settings in folder defaults are parsed but never inherited by prompt files, so the new defaults examples and the init template's cache config are no-ops.
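A hedged sketch of one way `loader.ts` could inherit `cache`, assuming the loader already has the `defaults.md` cache blocks ordered from the prompts root down to the prompt's folder. Here each provider block is taken whole from the nearest layer that defines it (the prompt's own front matter being nearest), so an `anthropic.mode: explicit` from one folder is never mixed with `cache_*` flags from another. `inheritCache` is a hypothetical helper, and this block-level rule is an assumption; the loader could instead use the field-level merging it already applies to `metadata`.

```typescript
type ProviderKey = 'openai' | 'anthropic' | 'gemini' | 'google';
type CacheConfig = Partial<Record<ProviderKey, Record<string, unknown>>>;

// `defaultsChain`: cache blocks from defaults.md files, ordered root first and
// nearest folder last (undefined where a defaults.md has no `cache`).
function inheritCache(
  defaultsChain: Array<CacheConfig | undefined>,
  promptCache?: CacheConfig,
): CacheConfig | undefined {
  let result: CacheConfig | undefined = undefined;
  for (const layer of [...defaultsChain, promptCache]) {
    if (!layer) continue;
    // Provider blocks present in a nearer layer replace the inherited block
    // wholesale; providers the nearer layer does not mention are inherited.
    result = { ...(result ?? {}), ...layer };
  }
  return result;
}

const inherited = inheritCache(
  [
    { openai: { prompt_cache_key: 'support-v1', retention: 'in_memory' } }, // prompts/defaults.md
    undefined,                                                              // prompts/support/defaults.md (no cache)
  ],
  { anthropic: { mode: 'automatic' } },                                     // prompts/support/reply.md
);
console.log(Object.keys(inherited ?? {})); // [ 'openai', 'anthropic' ]
```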


metadata: MetadataSchema.optional(),
sections: z.object({
system_instructions: z.string().optional(),
@@ -165,6 +194,7 @@ export const PromptAssetSchema = z.object({
reasoning: ReasoningSchema.optional(),
sampling: SamplingSchema.optional(),
response: ResponseSchema.optional(),
cache: CacheSchema.optional(),

tools: z.array(ToolRefSchema).optional(),
mcp: MCPSchema.optional(),

2 changes: 1 addition & 1 deletion src/validation/validate.ts
@@ -26,7 +26,7 @@ export interface PromptValidationResult {
const KNOWN_FRONT_MATTER_KEYS = new Set([
'id', 'schema_version', 'description', 'provider', 'model', 'fallback_models',
'reasoning', 'sampling', 'response', 'tools', 'mcp', 'context', 'includes',
'environments', 'tiers', 'metadata',
'environments', 'tiers', 'metadata', 'cache',
]);

/**