| Feature | Namespace | Key types / call |
|---|---|---|
| User profiles | beta | client.beta.userProfiles.create(...) / .retrieve(id) / .list(). Pass the returned profile id on client.beta.messages.create. Requires a beta header — check the SDK's beta-headers reference for the current flag. |
npm install @anthropic-ai/sdkReading local files (ESM):
__dirnameand__filenameare undefined in ES modules — using either throwsReferenceError: __dirname is not definedat runtime. For cwd-relative reads, pass the bare relative path (fs.readFileSync("./sample.png")). For script-relative paths, derive the directory fromimport.meta.url:const here = path.dirname(fileURLToPath(import.meta.url)). Never writepath.join(__dirname, …)in an ESM.tsfile.
import Anthropic from "@anthropic-ai/sdk";
// Default — resolves credentials from the environment:
// ANTHROPIC_API_KEY, or ANTHROPIC_AUTH_TOKEN, or an `ant auth login` profile.
// Prefer this for local dev; don't hardcode a key.
const client = new Anthropic();
// Explicit API key (only when you must inject a specific key)
const client = new Anthropic({ apiKey: "your-api-key" });const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: [{ role: "user", content: "What is the capital of France?" }],
});
// response.content is ContentBlock[] — a discriminated union. Narrow by .type
// before accessing .text (TypeScript will error on content[0].text without this).
for (const block of response.content) {
if (block.type === "text") {
console.log(block.text);
}
}const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
system:
"You are a helpful coding assistant. Always provide examples in Python.",
messages: [{ role: "user", content: "How do I read a JSON file?" }],
});For operator instructions that arrive mid-conversation (mode switches, injected state), append {role: "system", ...} to messages instead of editing top-level system — this preserves the cached prefix and carries operator authority. Must follow a user message (or an assistant message ending in server-tool use), and must be either the last entry in messages or be followed by an assistant turn; cannot be messages[0]. Unsupported models return a 400 (role 'system' is not supported on this model). See shared/prompt-caching.md for when to use this vs. top-level system.
// No beta header needed — use regular client.messages.create.
const response = await client.messages.create({
model: MODEL_ID, // must support mid-conversation system messages
max_tokens: 16000,
system: [
{ type: "text", text: STABLE_SYSTEM, cache_control: { type: "ephemeral" } },
],
messages: [
...history,
{ role: "user", content: userMessage },
{ role: "system", content: "Terse mode enabled — keep responses under 40 words." },
],
});const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: [
{
role: "user",
content: [
{
type: "image",
source: { type: "url", url: "https://example.com/image.png" },
},
{ type: "text", text: "Describe this image" },
],
},
],
});import fs from "fs";
const imageData = fs.readFileSync("image.png").toString("base64");
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: [
{
role: "user",
content: [
{
type: "image",
source: { type: "base64", media_type: "image/png", data: imageData },
},
{ type: "text", text: "What's in this image?" },
],
},
],
});Caching is a prefix match — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read shared/prompt-caching.md.
Use top-level cache_control to automatically cache the last cacheable block in the request:
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
cache_control: { type: "ephemeral" }, // auto-caches the last cacheable block
system: "You are an expert on this large document...",
messages: [{ role: "user", content: "Summarize the key points" }],
});For fine-grained control, add cache_control to specific content blocks:
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
system: [
{
type: "text",
text: "You are an expert on this large document...",
cache_control: { type: "ephemeral" }, // default TTL is 5 minutes
},
],
messages: [{ role: "user", content: "Summarize the key points" }],
});
// With explicit TTL (time-to-live)
const response2 = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
system: [
{
type: "text",
text: "You are an expert on this large document...",
cache_control: { type: "ephemeral", ttl: "1h" }, // 1 hour TTL
},
],
messages: [{ role: "user", content: "Summarize the key points" }],
});console.log(response.usage.cache_creation_input_tokens); // tokens written to cache (~1.25x cost)
console.log(response.usage.cache_read_input_tokens); // tokens served from cache (~0.1x cost)
console.log(response.usage.input_tokens); // uncached tokens (full cost)If cache_read_input_tokens is zero across repeated identical-prefix requests, a silent invalidator is at work — Date.now() or a UUID in the system prompt, non-deterministic key ordering, or a varying tool set. See shared/prompt-caching.md for the full audit table.
Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6: Use adaptive thinking.
budget_tokensis removed on Fable 5, Opus 4.8, and 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6. Older models: Usethinking: {type: "enabled", budget_tokens: N}(must be <max_tokens, min 1024).
// Fable 5 / Opus 4.8 / 4.7 / 4.6: adaptive thinking (recommended)
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
thinking: { type: "adaptive", display: "summarized" }, // display opt-in: default is omitted (empty thinking text) on Fable 5 / Mythos 5 / Opus 4.8 / 4.7
output_config: { effort: "high" }, // low | medium | high | max
messages: [
{ role: "user", content: "Solve this math problem step by step..." },
],
});
for (const block of response.content) {
if (block.type === "thinking") {
console.log("Thinking:", block.thinking);
} else if (block.type === "text") {
console.log("Response:", block.text);
}
}Use the SDK's typed exception classes — never check error messages with string matching:
import Anthropic from "@anthropic-ai/sdk";
try {
const response = await client.messages.create({...});
} catch (error) {
if (error instanceof Anthropic.BadRequestError) {
console.error("Bad request:", error.message);
} else if (error instanceof Anthropic.AuthenticationError) {
console.error("Invalid API key");
} else if (error instanceof Anthropic.RateLimitError) {
console.error("Rate limited - retry later");
} else if (error instanceof Anthropic.APIError) {
console.error(`API error ${error.status}:`, error.message);
}
}All classes extend Anthropic.APIError with a typed status field. Check from most specific to least specific. See shared/error-codes.md for the full error code reference.
The API is stateless — send the full conversation history each time. Use Anthropic.MessageParam[] to type the messages array:
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: "My name is Alice." },
{ role: "assistant", content: "Hello Alice! Nice to meet you." },
{ role: "user", content: "What's my name?" },
];
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages: messages,
});Rules:
- Consecutive same-role messages are allowed — the API combines them into a single turn
- First message must be
user - Use SDK types (
Anthropic.MessageParam,Anthropic.Message,Anthropic.Tool, etc.) for all API data structures — don't redefine equivalent interfaces
Beta, Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6. When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a
compactionblock; you must pass it back on subsequent requests — appendresponse.content, not just the text.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const messages: Anthropic.Beta.BetaMessageParam[] = [];
async function chat(userMessage: string): Promise<string> {
messages.push({ role: "user", content: userMessage });
const response = await client.beta.messages.create({
betas: ["compact-2026-01-12"],
model: "{{OPUS_ID}}",
max_tokens: 16000,
messages,
context_management: {
edits: [{ type: "compact_20260112" }],
},
});
// Append full content — compaction blocks must be preserved
messages.push({ role: "assistant", content: response.content });
const textBlock = response.content.find(
(b): b is Anthropic.Beta.BetaTextBlock => b.type === "text",
);
return textBlock?.text ?? "";
}
// Compaction triggers automatically when context grows large
console.log(await chat("Help me build a Python web scraper"));
console.log(await chat("Add support for JavaScript-rendered pages"));
console.log(await chat("Now add rate limiting and error handling"));The stop_reason field in the response indicates why the model stopped generating:
| Value | Meaning |
|---|---|
end_turn |
Claude finished its response naturally |
max_tokens |
Hit the max_tokens limit — increase it or use streaming |
stop_sequence |
Hit a custom stop sequence |
tool_use |
Claude wants to call a tool — execute it and continue |
pause_turn |
Model paused and can be resumed (agentic flows) |
refusal |
Claude refused for safety reasons — check stop_details |
When stop_reason is "refusal", the response includes a stop_details object with structured information about the refusal:
if (response.stop_reason === "refusal" && response.stop_details) {
console.log(`Category: ${response.stop_details.category}`); // e.g. "cyber", "bio", "reasoning_extraction", "frontier_llm", or null — see docs for the full set
console.log(`Explanation: ${response.stop_details.explanation}`);
}Fallbacks are opt-in: without them a refused request simply stops. Include the server-side fallbacks parameter in {{FABLE_ID}} code by default — on a policy decline the API re-runs the same request on the fallback model inside the same call. A decline before any output isn't billed (a mid-stream decline bills the streamed partial); the rescue bills at the fallback model's own rates, with cache repricing applied automatically.
const response = await client.beta.messages.create({
model: "{{FABLE_ID}}",
max_tokens: 16000,
betas: ["server-side-fallback-2026-06-01"],
fallbacks: [{ model: "{{OPUS_ID}}" }],
messages: [{ role: "user", content: "..." }],
});
// Switch points: one fallback block per model that ran and declined this turn
for (const block of response.content) {
if (block.type === "fallback") {
console.log(`${block.from.model} declined; ${block.to.model} continued`);
}
}
// Served-by signal — covers sticky turns, which carry no fallback block.
// Pair with stop_reason: the fallback model can itself refuse.
const fallbackRan = (response.usage.iterations ?? []).some(
(entry) => entry.type === "fallback_message",
);
if (fallbackRan && response.stop_reason !== "refusal") {
console.log(`Served by ${response.model}`);
}A stop_reason: "refusal" on the final response means the whole chain refused. The header must be exactly server-side-fallback-2026-06-01; the parameter is rejected on the Batches API and unavailable on Amazon Bedrock, Vertex AI, and Microsoft Foundry — register the client-side betaRefusalFallbackMiddleware on the client there instead. Full semantics (sticky routing, billing, streaming, echoing fallback turns back): shared/model-migration.md → Migrating to {{FABLE_NAME}} → refusal stop reason.
// Automatic caching (simplest — caches the last cacheable block)
const response = await client.messages.create({
model: "{{OPUS_ID}}",
max_tokens: 16000,
cache_control: { type: "ephemeral" },
system: largeDocumentText, // e.g., 50KB of context
messages: [{ role: "user", content: "Summarize the key points" }],
});
// First request: full cost
// Subsequent requests: ~90% cheaper for cached portionconst countResponse = await client.messages.countTokens({
model: "{{OPUS_ID}}",
messages: messages,
system: system,
});
const estimatedInputCost = countResponse.input_tokens * 0.000005; // $5/1M tokens
console.log(`Estimated input cost: $${estimatedInputCost.toFixed(4)}`);