| layout | default |
|---|---|
| title | Chapter 6: ChatML and Tool Call Accounting |
| nav_order | 6 |
| parent | tiktoken Tutorial |
Welcome to Chapter 6: ChatML and Tool Call Accounting. This part of tiktoken Tutorial: OpenAI Token Encoding & Optimization first builds a mental model of where chat and tool tokens come from, then moves into concrete implementation details and practical production tradeoffs.
Accurate token accounting for chat and tools is essential for reliability and cost predictability.
Teams often count only user-visible text and miss:
- role/message wrapper overhead
- tool schema tokens
- serialized tool arguments/results
- retry-induced duplicate token spend
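The overlooked items above can be approximated before a request is sent. The following is a minimal sketch, assuming that tool schemas are counted as compact JSON text; how the API actually renders tools for the model is not specified here, so treat the result as an estimate to calibrate against reported usage. The helper names `count_text`, `count_json`, and `hidden_token_breakdown` are hypothetical, and `encoding` is any object with a tiktoken-style `encode` method.

```python
import json


def count_text(encoding, text):
    """Count tokens in a string; None or empty content counts as zero."""
    return len(encoding.encode(text or ""))


def count_json(encoding, payload):
    """Approximate the token cost of a JSON payload such as a tool schema.

    Assumption: compact JSON is a reasonable proxy for how the payload is
    rendered to the model; the real format may differ.
    """
    return count_text(encoding, json.dumps(payload, separators=(",", ":")))


def hidden_token_breakdown(encoding, messages, tools, per_message_overhead=0, retries=0):
    """Split an estimate into the components that are easy to overlook."""
    visible = sum(count_text(encoding, m.get("content")) for m in messages)
    wrappers = per_message_overhead * len(messages)
    schemas = sum(count_json(encoding, t) for t in tools)
    single_attempt = visible + wrappers + schemas
    return {
        "visible_text": visible,
        "wrapper_overhead": wrappers,
        "tool_schemas": schemas,
        # Each retry resends the full prompt, so it is paid for again.
        "retry_duplicates": single_attempt * retries,
        "total": single_attempt * (1 + retries),
    }
```

Reporting the components separately, rather than one total, makes it easier to see which hidden cost dominates a given request.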
A practical accounting pattern:
- tokenize each message with the exact target encoding
- add the fixed wrapper overhead expected by your request format
- account for tool payloads separately
- include a response-token guard band for retries/replans
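```python
def estimate_chat_tokens(messages, encoding, fixed_overhead=0):
    # fixed_overhead covers wrapper tokens added by the request format
    total = fixed_overhead
    for m in messages:
        # content may be missing or None (e.g. assistant tool-call messages)
        total += len(encoding.encode(m.get("content") or ""))
    return total
```

For tool flows, create separate counters for: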
- tool call request payload
- tool response payload
- assistant synthesis after tool result
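The three counters above can be kept in a small record so that each phase is reported separately. This is a minimal sketch, assuming messages shaped like the OpenAI chat format (assistant messages with a `tool_calls` list whose entries carry `function.name` and `function.arguments`, and `role: "tool"` messages for results). The `ToolFlowCounters` class and `count_tool_flow` helper are hypothetical names, not part of tiktoken.

```python
from dataclasses import dataclass


@dataclass
class ToolFlowCounters:
    tool_call_request: int = 0    # function names + serialized arguments
    tool_response: int = 0        # content of role == "tool" messages
    assistant_synthesis: int = 0  # assistant text written after a tool result

    @property
    def total(self):
        return self.tool_call_request + self.tool_response + self.assistant_synthesis


def count_tool_flow(messages, encoding):
    """Attribute tokens in a tool-using conversation to the three counters.

    Assumption: wrapper overhead is not included here and should be added
    separately by the caller.
    """
    counters = ToolFlowCounters()
    seen_tool_result = False
    for m in messages:
        role = m.get("role")
        content = m.get("content") or ""
        if role == "assistant":
            for call in m.get("tool_calls") or []:
                fn = call.get("function", {})
                counters.tool_call_request += len(encoding.encode(fn.get("name") or ""))
                counters.tool_call_request += len(encoding.encode(fn.get("arguments") or ""))
            if seen_tool_result and content:
                counters.assistant_synthesis += len(encoding.encode(content))
        elif role == "tool":
            counters.tool_response += len(encoding.encode(content))
            seen_tool_result = True
    return counters
```

User and system text is deliberately left out of these counters; it belongs to the ordinary message estimate, so the two can be summed without double counting.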
A preflight workflow:
- preflight estimate before the API call
- reject or compress if over budget
- log estimate vs actual for calibration
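The workflow above can be sketched as a single gate in front of the API call. This is an illustration under stated assumptions: `context_limit` and `response_reserve` are values you supply for your model, compression is reduced to dropping the oldest non-system messages, and the names `estimate`, `preflight`, and `log_calibration` are hypothetical.

```python
import logging

logger = logging.getLogger("token_budget")


def estimate(messages, encoding, per_message_overhead=0):
    """Content tokens plus a caller-supplied per-message wrapper overhead."""
    return sum(
        len(encoding.encode(m.get("content") or "")) + per_message_overhead
        for m in messages
    )


def preflight(messages, encoding, context_limit, response_reserve, per_message_overhead=0):
    """Return (messages_to_send, estimate) or raise ValueError if over budget.

    Compression strategy (an assumption for this sketch): drop the oldest
    non-system message until the prompt fits.
    """
    budget = context_limit - response_reserve
    kept = list(messages)
    while estimate(kept, encoding, per_message_overhead) > budget:
        droppable = [i for i, m in enumerate(kept) if m.get("role") != "system"]
        if len(droppable) <= 1:
            # Nothing older than the latest message is left to drop: reject.
            raise ValueError("prompt exceeds token budget even after compression")
        del kept[droppable[0]]
    return kept, estimate(kept, encoding, per_message_overhead)


def log_calibration(estimated, actual_prompt_tokens):
    """Log the estimate against the prompt token count reported by the API."""
    error = actual_prompt_tokens - estimated
    logger.info("estimated=%d actual=%d error=%d", estimated, actual_prompt_tokens, error)
    return error
```

The `actual_prompt_tokens` value would typically come from the usage data returned with the response. A persistent non-zero error is a signal to adjust `per_message_overhead` rather than a bug in the tokenizer.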
You can now estimate chat/tool token usage with fewer hidden-cost surprises.
Next: Chapter 7: Multilingual Tokenization
Most teams struggle here because the hard part is not writing more code. It is deciding what `total` includes, what shape `messages` must have, and which `encoding` applies, so that behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about chat and tool token accounting as an operating subsystem with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around `fixed_overhead`, `estimate_chat_tokens`, and `encode` as your checklist when adapting these patterns to your own repository.
Under the hood, chat and tool token accounting usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for `total`.
- Input normalization: shape incoming data so `messages` receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through `encoding`.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Use the following upstream sources to verify implementation details while reading this chapter:
- tiktoken repository (github.com). Why it matters: authoritative reference on the tiktoken implementation.

Suggested trace strategy:
- search upstream code for `total` and `messages` to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production
- Previous Chapter: Chapter 5: Optimization Strategies
- Next Chapter: Chapter 7: Multilingual Tokenization
The `encoding_for_model` function in `tiktoken/model.py` selects the encoding that all of this chapter's accounting depends on:

```python
def encoding_for_model(model_name: str) -> Encoding:
    """Returns the encoding used by a model.

    Raises a KeyError if the model name is not recognised.
    """
    return get_encoding(encoding_name_for_model(model_name))
```

This function is important because token counts are only meaningful when they are computed with the encoding the target model actually uses.
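Because `encoding_for_model` raises `KeyError` for unrecognised model names, callers that accept arbitrary model strings often wrap it. The following is a minimal sketch, assuming that falling back to a named encoding is acceptable for your use case. `resolve_encoding` is a hypothetical helper, and the `cl100k_base` default is an illustrative choice that may not match a new model's real tokenizer, so counts produced through the fallback should be treated as estimates.

```python
def resolve_encoding(model_name, fallback_name="cl100k_base"):
    """Return (encoding, exact); exact is False when the fallback was used."""
    import tiktoken  # imported lazily so the helper can be defined without it

    try:
        return tiktoken.encoding_for_model(model_name), True
    except KeyError:
        # Unknown model name: fall back to a named encoding (an assumption).
        return tiktoken.get_encoding(fallback_name), False
```

Returning the `exact` flag lets the caller widen its safety margin, or log a warning, whenever the estimate is based on a guessed encoding.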
The `download_artifacts` function in `scripts/wheel_download.py` is repository release tooling rather than part of the tokenization path:

```python
from pathlib import Path

import requests


def download_artifacts(token, owner, repo, run_id, output_dir):
    headers = {"Authorization": f"token {token}", "Accept": "application/vnd.github.v3+json"}

    # Get list of artifacts
    artifacts_url = f"https://api.github.com/repos/{owner}/{repo}/actions/runs/{run_id}/artifacts"
    response = requests.get(artifacts_url, headers=headers)
    response.raise_for_status()
    artifacts = response.json()["artifacts"]

    if not artifacts:
        print(f"No artifacts found for run ID: {run_id}")
        return

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    print(f"Found {len(artifacts)} artifacts")
    for artifact in artifacts:
        name = artifact["name"]
        download_url = artifact["archive_download_url"]

        print(f"Downloading {name}...")
        response = requests.get(download_url, headers=headers, stream=True)
        response.raise_for_status()

        temp_zip = output_dir / f"{name}.zip"
        with open(temp_zip, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
```

This function downloads the build artifacts of a GitHub Actions run. It does not take part in token counting, so it is useful here only as orientation in the upstream repository.
The `redact_file` function in `scripts/redact.py` is likewise repository maintenance tooling:

```python
import re
from pathlib import Path


def redact_file(path: Path, dry_run: bool) -> None:
    if not path.exists() or path.is_dir():
        return

    text = path.read_text()
    if not text:
        return

    # A file whose first line mentions "redact" is deleted outright
    first_line = text.splitlines()[0]
    if "redact" in first_line:
        if not dry_run:
            path.unlink()
        print(f"Deleted {path}")
        return

    pattern = "|".join(
        r" *" + re.escape(x)
        for x in [
            "# ===== redact-beg =====\n",
            "# ===== redact-end =====\n",
            "<!--- redact-beg -->\n",
            "<!--- redact-end -->\n",
        ]
    )

    if re.search(pattern, text):
        # Keep every other segment, i.e. the text outside beg/end marker pairs
        redacted_text = "".join(re.split(pattern, text)[::2])
        if not dry_run:
            path.write_text(redacted_text)
        print(f"Redacted {path}")
```

This function deletes files whose first line contains `redact` and removes the regions enclosed by redact-beg/redact-end markers from other files. It is not involved in token accounting.
```mermaid
flowchart TD
    A[encoding_for_model]
    B[download_artifacts]
    C[redact_file]
    D[redact]
    E[main]
    A --> B
    B --> C
    C --> D
    D --> E
```