Commit bb839f9

FEATURE: Added LLMAsAService provider
1 parent cd41125 commit bb839f9

26 files changed

Lines changed: 877 additions & 63 deletions

README.md

Lines changed: 39 additions & 10 deletions
@@ -34,7 +34,7 @@ To one reviewable asset:
 Core capabilities:
 
 - **Markdown prompt assets** — capture prompt text, model config, tool bindings, context rules, and metadata together.
-- **Provider-ready output** — render request bodies for OpenAI Chat, OpenAI Responses, Anthropic, Gemini, and OpenRouter while your app owns transport.
+- **Provider-ready output** — render request bodies for OpenAI Chat, OpenAI Responses, Anthropic, Gemini, OpenRouter, and LLMAsAService while your app owns transport.
 - **Input hardening** — define required values, size limits, allow/deny patterns, and secret rejection close to the prompt template.
 - **Reusable composition** — share tone, policy, and safety instructions with `includes`, and apply folder-level standards with `defaults.md`.
 - **Environment and tier overrides** — keep dev/prod and plan-specific behavior in one prompt source with explicit, reviewable overrides.
@@ -148,7 +148,7 @@ Supported values for `warnings.contextSize` are `auto`, `off`, `result-only`, `c
 - **Composition** — `includes` to share system instructions across prompts, with circular detection
 - **Folder defaults** — `defaults.md` inheritance for shared provider, model, metadata, and system instructions
 - **Overrides** — Environment and tier-based overrides (base → env → tier → runtime)
-- **5 provider adapters** — OpenAI (Chat), OpenAI (Responses), Anthropic, Gemini, OpenRouter — body-only output
+- **6 provider adapters** — OpenAI (Chat), OpenAI (Responses), Anthropic, Gemini, OpenRouter, LLMAsAService
 - **Provider-aware input caching controls** — optional `cache` front matter maps to OpenAI prompt cache hints, Anthropic `cache_control`, and Gemini `cachedContent`
 - **Vendor escape hatch** — optional `raw.<provider>` blocks shallow-merge unmodeled request-body fields into the final provider payload
 - **Validation** — Zod schema validation, Levenshtein-based "did you mean?" for typos, variable usage checks
@@ -203,6 +203,24 @@ result = await kit.renderPrompt({
   variables: { name: 'World', app_context: 'Welcome screen' },
 });
 if (!result.request) throw new Error(result.returnMessage ?? 'Prompt rendering failed.');
+
+// LLMAsAService — OpenAI-compatible gateway with project and customer metadata
+result = await kit.renderPrompt({
+  path: 'hello',
+  provider: 'llmasaservice',
+  runtime: {
+    provider_options: {
+      llmasaservice: {
+        project_id: process.env.LLM_GATEWAY_PROJECT_ID,
+        customer: { customer_id: 'cust_123', customer_name: 'Acme' },
+      },
+    },
+  },
+  variables: { name: 'World', app_context: 'Welcome screen' },
+});
+if (!result.request) throw new Error(result.returnMessage ?? 'Prompt rendering failed.');
+// result.request.body → { model, messages, customer, ... }
+// result.request.headers → { 'x-project-id': '...' }
 ```
 
 Provider adapters are also available as direct imports:
@@ -213,6 +231,7 @@ import { openaiResponsesAdapter } from 'promptopskit/openai-responses';
 import { anthropicAdapter } from 'promptopskit/anthropic';
 import { geminiAdapter } from 'promptopskit/gemini';
 import { openrouterAdapter } from 'promptopskit/openrouter';
+import { llmasaserviceAdapter } from 'promptopskit/llmasaservice';
 ```
 
 Direct adapter rendering also accepts `environment` and `tier` selectors. This is useful for compiled JSON/ESM assets in browser, edge, or worker code:
@@ -242,9 +261,9 @@ In browser or client-side code, keep provider credentials on the server. Use the
 
 ### Provider-specific fields and raw passthrough
 
-Use normalized fields first (`sampling`, `response`, `cache`, `tools`) so prompts stay portable. `response.schema` is the neutral JSON Schema path; adapters emit it as OpenAI/OpenRouter `response_format`, OpenAI Responses `text.format`, Anthropic `output_config.format`, and Gemini `generationConfig.responseJsonSchema`.
+Use normalized fields first (`sampling`, `response`, `cache`, `tools`) so prompts stay portable. `response.schema` is the neutral JSON Schema path; adapters emit it as OpenAI/OpenRouter/LLMAsAService `response_format`, OpenAI Responses `text.format`, Anthropic `output_config.format`, and Gemini `generationConfig.responseJsonSchema`.
 
-Use `provider_options` when PromptOpsKit has a known provider-specific mapping, such as Anthropic `top_k`, Gemini's native `response_schema`, or OpenRouter routing fields.
+Use `provider_options` when PromptOpsKit has a known provider-specific mapping, such as Anthropic `top_k`, Gemini's native `response_schema`, OpenRouter routing fields, or LLMAsAService gateway routing/customer metadata.
 
 ```yaml
 response:
@@ -261,8 +280,16 @@ provider_options:
     provider:
       order: ["anthropic", "openai"]
     transforms: ["middle-out"]
+  llmasaservice:
+    project_id: "llm-project-id"
+    # Optional default; usually pass the real customer at render time.
+    customer:
+      customer_id: "cust_123"
+      customer_name: "Acme"
 ```
 
+For LLMAsAService, `provider_options.llmasaservice.customer` is intended to be render-time attribution for the current account/user. A prompt can keep a default, but production calls should normally override it through `runtime.provider_options.llmasaservice.customer`.
+
 When a provider adds a body field PromptOpsKit does not model yet, use `raw`:
 
 ```yaml
@@ -278,6 +305,8 @@ raw:
   openrouter:
     usage:
       include: true
+  llmasaservice:
+    conversationId: "conv_123"
 ```
 
 Each adapter reads only its matching raw block and shallow-merges it into the generated request body after normalized mappings. This is intentionally an escape hatch; prefer first-class fields when they exist.
@@ -336,7 +365,7 @@ Use PromptOpsKit when you want:
 
 ## Optional UsageTap Tracking
 
-PromptOpsKit can also help you track provider calls with UsageTap.com while keeping the core render API body-only.
+PromptOpsKit can also help you track provider calls with UsageTap.com while keeping the core render API transport-light.
 
 ```typescript
 import { createPromptOpsKit } from 'promptopskit';
@@ -400,7 +429,7 @@ const tracked = await runOpenAIWithUsageTap(usageTap, {
 
 Notes:
 - `entitlementMode` defaults to `'off'`. Set it to `'apply'` only when you want UsageTap allowances to mutate a cloned provider request.
-- `runOpenRouterWithUsageTap`, `runAnthropicWithUsageTap`, and `runGeminiWithUsageTap` follow the same pattern.
+- `runOpenRouterWithUsageTap`, `runLLMAsAServiceWithUsageTap`, `runAnthropicWithUsageTap`, and `runGeminiWithUsageTap` follow the same pattern.
 - `extractOpenAIUsage`, `extractAnthropicUsage`, and `extractGeminiUsage` are public if you want to manage UsageTap lifecycle yourself.
 
 For explicit lifecycle control, use `beginUsageTapCall`, `endUsageTapCall`, or `withUsageTapCall` from `promptopskit/usagetap`. Full documentation: [docs/usagetap.md](./docs/usagetap.md).
@@ -593,7 +622,7 @@ Renders a prompt for a specific provider. Returns `{ resolved, request?, returnM
 |--------|------|-------------|
 | `path` | `string` | Prompt path (no extension), e.g. `'support/reply'` |
 | `source` | `string` | Inline prompt source (alternative to path) |
-| `provider` | `string` | `'openai'`, `'openai-responses'`, `'anthropic'`, `'gemini'`, `'openrouter'` |
+| `provider` | `string` | `'openai'`, `'openai-responses'`, `'anthropic'`, `'gemini'`, `'openrouter'`, `'llmasaservice'` |
 | `variables` | `Record<string, string>` | Template variables |
 | `onContextOverflow` | `(info) => string` | Optional callback to transform oversized context values before rendering |
 | `onHistoryCompaction` | `(info) => string \| { role, content }` | Optional callback to compact overflow history when `context.history.max_items` is exceeded |
@@ -622,16 +651,16 @@ Prompt files use YAML front matter with these fields:
 |-------|------|-------------|
 | `id` | `string` | Unique prompt identifier (required) |
 | `schema_version` | `number` | Schema version, currently `1` |
-| `provider` | `string` | `openai`, `openai-responses`, `anthropic`, `gemini` (or `google`), `openrouter`, `any` |
+| `provider` | `string` | `openai`, `openai-responses`, `anthropic`, `gemini` (or `google`), `openrouter`, `llmasaservice`, `any` |
 | `model` | `string` | Model name |
 | `fallback_models` | `string[]` | Fallback model list |
 | `reasoning` | `object` | `{ effort, budget_tokens }` |
 | `sampling` | `object` | `{ temperature, top_p, frequency_penalty, presence_penalty, stop, max_output_tokens }` |
 | `response` | `object` | `{ format, stream, schema, schema_name, schema_description, schema_strict }` |
 | `cache` | `object` | Provider-specific cache controls (`openai`, `anthropic`, `gemini`/`google`) |
 | `tools` | `array` | Tool references (string names or inline definitions) |
-| `provider_options` | `object` | Provider-specific non-portable options (`anthropic`, `gemini`, `openrouter`) |
-| `raw` | `object` | Provider-scoped request-body passthrough (`openai`, `openai-responses`, `anthropic`, `gemini`/`google`, `openrouter`) |
+| `provider_options` | `object` | Provider-specific non-portable options (`anthropic`, `gemini`, `openrouter`, `llmasaservice`) |
+| `raw` | `object` | Provider-scoped request-body passthrough (`openai`, `openai-responses`, `anthropic`, `gemini`/`google`, `openrouter`, `llmasaservice`) |
 | `mcp` | `object` | MCP server references |
 | `context` | `object` | `{ inputs, history }` — declare expected variables, with optional per-input `max_size`, `trim`, structured or literal `allow_regex`/`deny_regex`, built-in `non_empty` / `reject_secrets` validators, and `history.max_items` compaction |
 | `includes` | `string[]` | Paths to included prompt files |
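
The README example above stops at `result.request` because the kit stays transport-light and the app sends the request itself. Below is a minimal transport sketch continuing from that `result`; the `/chat/completions` path, the `LLM_GATEWAY_*` environment variable names, and the bearer auth header are assumptions for illustration, not part of this commit:

```typescript
// Sketch only: send the rendered LLMAsAService request yourself (endpoint path and auth are assumed).
const { body, baseURL, headers } = result.request; // headers include 'x-project-id'

const httpResponse = await fetch(`${baseURL ?? process.env.LLM_GATEWAY_BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    authorization: `Bearer ${process.env.LLM_GATEWAY_API_KEY}`, // assumed credential; use whatever the gateway expects
    ...headers, // gateway metadata such as 'x-project-id'
  },
  body: JSON.stringify(body),
});
const completion = await httpResponse.json();
```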

SKILL.md

Lines changed: 64 additions & 8 deletions
@@ -8,7 +8,7 @@ description: Guidance for creating and editing promptopskit prompt files, defaul
 This project uses **promptopskit** to manage LLM prompts as code.
 Prompts live in markdown files with YAML front matter, are validated against
 a schema, and render into provider-specific request bodies (OpenAI, Anthropic,
-Gemini, OpenRouter, and OpenAI Responses). Follow these instructions when creating or editing prompts.
+Gemini, OpenRouter, LLMAsAService, and OpenAI Responses). Follow these instructions when creating or editing prompts.
 
 ---
 
@@ -119,8 +119,8 @@ into a provider request body:
 - "generate the Anthropic call for the prompt [name]"
 - "render/generate/build/create/produce the request/body/call/payload/messages
   for prompt [name] with provider [provider]"
-- "turn [name] into a provider request for openai/anthropic/google/gemini/openrouter"
-- "wire up prompt [name] to OpenAI/Anthropic/Gemini/OpenRouter"
+- "turn [name] into a provider request for openai/anthropic/google/gemini/openrouter/llmasaservice"
+- "wire up prompt [name] to OpenAI/Anthropic/Gemini/OpenRouter/LLMAsAService"
 - "give me code to call [provider] with prompt [name]"
 
 Provider aliases:
@@ -132,14 +132,15 @@ Provider aliases:
 | `anthropic`, `claude` | `anthropic` |
 | `google`, `gemini` | `gemini` |
 | `openrouter` | `openrouter` |
+| `llmasaservice`, `llmasaservice.io`, `llm gateway` | `llmasaservice` |
 
 Behavior:
 
 1. Generate code unless the user explicitly asks for only the raw rendered JSON.
 2. Prefer `createPromptOpsKit().renderPrompt()` for server-side app code that loads
    prompt source or compiled JSON by path.
 3. Prefer provider adapters (`openaiAdapter`, `anthropicAdapter`,
-   `geminiAdapter`, `openrouterAdapter`) when the user asks for provider-specific
+   `geminiAdapter`, `openrouterAdapter`, `llmasaserviceAdapter`) when the user asks for provider-specific
    integration code or already has a compiled asset.
 4. Include `variables` for every declared prompt input, using realistic placeholder
    values or function parameters.
@@ -256,6 +257,51 @@ if (!result.request) throw new Error('Prompt rendering did not produce an OpenRo
 const completion = await client.chat.completions.create(result.request.body as any);
 ```
 
+LLMAsAService example:
+
+```typescript
+import OpenAI from 'openai';
+import {
+  createLLMAsAServiceOpenAIConfig,
+  llmasaserviceAdapter,
+} from 'promptopskit/llmasaservice';
+
+const client = new OpenAI(createLLMAsAServiceOpenAIConfig({
+  baseURL: process.env.LLM_GATEWAY_BASE_URL,
+  projectId: process.env.LLM_GATEWAY_PROJECT_ID,
+}));
+
+const result = await llmasaserviceAdapter.renderPrompt(
+  {
+    path: 'support/triage-summary',
+  },
+  {
+    runtime: {
+      provider_options: {
+        llmasaservice: {
+          project_id: process.env.LLM_GATEWAY_PROJECT_ID,
+          customer: {
+            customer_id: customer.id,
+            customer_name: customer.name,
+            customer_user_id: user.id,
+            customer_user_email: user.email,
+          },
+        },
+      },
+    },
+    variables: { ticket: ticketText },
+    strict: true,
+  },
+);
+
+if (result.returnMessage) return result.returnMessage;
+if (!('body' in result)) {
+  throw new Error('Prompt rendering did not produce an LLMAsAService request.');
+}
+
+const completion = await client.chat.completions.create(result.body as any);
+```
+
 If the user asks for "just the body", render with `kit.renderPrompt()` and show
 or return `result.request.body`, not the whole render result.
 
@@ -330,15 +376,15 @@ the fields required by that specific file:
 | `id` | string | **yes** | Unique identifier for the prompt |
 | `schema_version` | number | yes | Always `1` |
 | `description` | string | no | Human-readable description |
-| `provider` | enum | no | `openai`, `openai-responses`, `anthropic`, `google`, `gemini`, `openrouter`, or `any` |
+| `provider` | enum | no | `openai`, `openai-responses`, `anthropic`, `google`, `gemini`, `openrouter`, `llmasaservice`, or `any` |
 | `model` | string | no | Model identifier (e.g. `gpt-5.4`, `claude-sonnet-4-20250514`) |
 | `fallback_models` | string[] | no | Ordered fallback model list |
 | `reasoning` | object | no | `{ effort: low|medium|high, budget_tokens: number }` |
 | `sampling` | object | no | `{ temperature, top_p, frequency_penalty, presence_penalty, stop, max_output_tokens }` |
 | `response` | object | no | `{ format: text|json|markdown, stream: boolean, schema?: object, schema_name?: string, schema_description?: string, schema_strict?: boolean }` |
 | `cache` | object | no | Provider-specific cache controls (`openai`, `anthropic`, `gemini`/`google`) |
 | `tools` | array | no | Tool names (strings) or inline definitions with `{ name, description, input_schema }` |
-| `provider_options` | object | no | Provider-specific advanced options (`anthropic`, `gemini`, `openrouter`) |
+| `provider_options` | object | no | Provider-specific advanced options (`anthropic`, `gemini`, `openrouter`, `llmasaservice`) |
 | `raw` | object | no | Provider-scoped request-body passthrough for unmodeled vendor fields |
 | `mcp` | object | no | `{ servers: [string | { name, config }] }` |
 | `context.inputs` | `Array<string | { name, max_size?, trim?, allow_regex?, deny_regex?, non_empty?, reject_secrets? }>` | no | Declared variable names used in templates, with optional size budgets and runtime hardening controls |
@@ -548,7 +594,7 @@ Prefer portable fields first:
 - Use `cache` for provider cache hints
 - Use `tools` for tool definitions
 
-Treat `response.schema` as the provider-neutral JSON Schema contract. The adapters emit it through provider-specific request fields: OpenAI/OpenRouter `response_format`, OpenAI Responses `text.format`, Anthropic `output_config.format`, and Gemini `generationConfig.responseJsonSchema`.
+Treat `response.schema` as the provider-neutral JSON Schema contract. The adapters emit it through provider-specific request fields: OpenAI/OpenRouter/LLMAsAService `response_format`, OpenAI Responses `text.format`, Anthropic `output_config.format`, and Gemini `generationConfig.responseJsonSchema`.
 
 Use `provider_options` for known non-portable mappings:
 
@@ -573,8 +619,16 @@ provider_options:
     provider:
       order: ["anthropic", "openai"]
     transforms: ["middle-out"]
+  llmasaservice:
+    project_id: llm-project-id
+    # Optional default; usually pass the real customer at render time.
+    customer:
+      customer_id: cust_123
+      customer_name: Acme
 ```
 
+For LLMAsAService, prefer putting the current customer/account/user attribution in `runtime.provider_options.llmasaservice.customer` during rendering. Static prompt metadata may include a default, but runtime values should override it for real requests.
+
 Use `raw` only when a vendor request-body field is important and PromptOpsKit does not model it yet:
 
 ```yaml
@@ -590,9 +644,11 @@ raw:
   openrouter:
     usage:
       include: true
+  llmasaservice:
+    conversationId: conv_123
 ```
 
-Raw blocks are provider-scoped (`openai`, `openai-responses`/`openai_responses`, `anthropic`, `gemini`/`google`, `openrouter`) and are shallow-merged into the final request body after normalized fields. When adding `raw`, include a short note in `# Notes` explaining why a first-class field is not being used.
+Raw blocks are provider-scoped (`openai`, `openai-responses`/`openai_responses`, `anthropic`, `gemini`/`google`, `openrouter`, `llmasaservice`) and are shallow-merged into the final request body after normalized fields. When adding `raw`, include a short note in `# Notes` explaining why a first-class field is not being used.
 
 ---
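
For orientation, here is roughly what a prompt that sets both `provider_options.llmasaservice` and `raw.llmasaservice` turns into once the adapter runs; the model name, message contents, and exact field set are illustrative assumptions assembled from the examples in this commit, not captured library output:

```typescript
// Illustrative shape only, assembled from the README/SKILL examples above.
const exampleRequest = {
  provider: 'llmasaservice',
  model: 'gpt-5.4', // whatever the prompt front matter declares
  headers: { 'x-project-id': 'llm-project-id' }, // from provider_options.llmasaservice.project_id
  body: {
    model: 'gpt-5.4',
    messages: [
      { role: 'system', content: '...rendered system instructions...' },
      { role: 'user', content: '...rendered user template...' },
    ],
    // Runtime customer attribution overrides the static front matter default.
    customer: { customer_id: 'cust_123', customer_name: 'Acme' },
    // raw.llmasaservice is shallow-merged last, after normalized fields.
    conversationId: 'conv_123',
  },
};
```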

docs/api-reference.md

Lines changed: 4 additions & 2 deletions
@@ -65,7 +65,7 @@ const result = await kit.renderPrompt({
 |--------|------|-------------|
 | `path` | `string` | Prompt path (no extension), e.g. `'support/reply'` |
 | `source` | `string` | Inline prompt source (alternative to `path`) |
-| `provider` | `string` | `'openai'`, `'openai-responses'`, `'anthropic'`, `'gemini'`, `'openrouter'` (required) |
+| `provider` | `string` | `'openai'`, `'openai-responses'`, `'anthropic'`, `'gemini'`, `'openrouter'`, `'llmasaservice'` (required) |
 | `variables` | `Record<string, string>` | Template variables |
 | `onContextOverflow` | `(info) => string` | Optional callback to transform an oversized context value before rendering |
 | `onHistoryCompaction` | `(info) => string \| { role, content }` | Optional callback used when `context.history.max_items` compacts overflow history |
@@ -83,7 +83,7 @@ Either `path` or `source` must be provided.
 ```typescript
 interface RenderResult {
   resolved: ResolvedPromptAsset; // Fully resolved asset
-  request?: ProviderRequest; // { body, provider, model } when rendering continues
+  request?: ProviderRequest; // { body, provider, model, baseURL?, headers? } when rendering continues
   returnMessage?: string; // Short-circuit message from context validation when configured
   warnings: string[]; // Non-fatal provider and render-time warnings
 }
@@ -250,6 +250,8 @@ const request = adapter.render(resolvedAsset, {
 });
 ```
 
+Supported adapter names are `openai`, `openai-responses`, `anthropic`, `gemini`/`google`, `openrouter`, and `llmasaservice`.
+
 `RuntimeRenderOptions` for direct adapter rendering supports `environment`, `tier`, `runtime`, `variables`, `onContextOverflow`, `history`, `onHistoryCompaction`, `toolRegistry`, `strict`, and `openaiResponses`.
 
 Runtime overrides can include the same overridable front matter fields as `environments` and `tiers`, including `raw` provider passthrough blocks. Raw blocks are merged into provider request bodies after normalized fields and provider-specific options.
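
A sketch of the direct-adapter path for the new provider, using the `render(resolvedAsset, options)` call shown above; the compiled-asset import path, the selector values, and the variable names are assumptions for illustration:

```typescript
// Sketch only: drive the llmasaservice adapter directly with a compiled prompt asset.
import { llmasaserviceAdapter } from 'promptopskit/llmasaservice';
import helloAsset from './prompts/compiled/hello.json'; // hypothetical compiled asset path

const request = llmasaserviceAdapter.render(helloAsset as any, {
  environment: 'prod', // environment/tier selectors, as with the other adapters
  tier: 'pro',
  variables: { name: 'World', app_context: 'Welcome screen' },
  strict: true,
});
// request.body is the provider request body; request.headers (when present) carries gateway metadata such as 'x-project-id'.
```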
