---
layout: default
title: "Chapter 5: Optimization Strategies"
nav_order: 5
parent: tiktoken Tutorial
---
Welcome to Chapter 5: Optimization Strategies. In this part of tiktoken Tutorial: OpenAI Token Encoding & Optimization, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
This chapter focuses on performance and operational optimization for token-heavy systems.
A centralized counting utility keeps every service measuring tokens the same way:

```python
import tiktoken

ENC = tiktoken.encoding_for_model("gpt-4.1-mini")

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))
```

Caching avoids re-encoding repeated templates:

```python
from functools import lru_cache

@lru_cache(maxsize=20000)
def cached_count(text: str) -> int:
    return len(ENC.encode(text))
```

For batch workloads, count many inputs in one pass:

```python
def count_many(texts):
    return [len(ENC.encode(t)) for t in texts]
```

- Add tests for max prompt token budget.
- Fail builds when prompt templates exceed limits.
- Track token deltas for prompt changes.
- Fixed encoding strategy per model.
- Centralized counting utility in shared library.
- Caching for repeated templates.
- Alerting for sudden token-cost spikes.
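A CI guard for prompt token budgets might look like the sketch below. The names `MAX_PROMPT_TOKENS` and `check_budget`, the budget value, and the offline fallback heuristic are all illustrative, not part of tiktoken:

```python
MAX_PROMPT_TOKENS = 2000  # illustrative budget; tune to your model's context window

def make_counter():
    """Prefer a real tiktoken counter; fall back to a rough proxy if the
    encoding data is unavailable (e.g. offline CI)."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return lambda s: len(enc.encode(s))
    except Exception:
        return lambda s: max(1, len(s) // 4)  # ~4 chars/token heuristic

def check_budget(template: str, counter=None) -> int:
    """Return the token count, raising if the template exceeds the budget."""
    counter = counter or make_counter()
    n = counter(template)
    if n > MAX_PROMPT_TOKENS:
        raise ValueError(f"template uses {n} tokens, budget is {MAX_PROMPT_TOKENS}")
    return n
```

Running `check_budget` over every prompt template at build time fails the pipeline on a budget violation, and logging the returned count per template gives you token deltas for prompt changes.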
You now have a complete tiktoken workflow from basics to production optimization.
Next: Chapter 6: ChatML and Tool Call Accounting
Related:
Most teams struggle here because the hard part is not writing more code, but drawing clear boundaries around text handling, encoding, and tiktoken usage so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 5: Optimization Strategies as an operating subsystem inside tiktoken Tutorial: OpenAI Token Encoding & Optimization, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around `lru_cache`, batch counting, and `encoding_for_model` as your checklist when adapting these patterns to your own repository.
Under the hood, Chapter 5: Optimization Strategies usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for text handling.
- Input normalization: shape incoming data so `encode` receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through tiktoken.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
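The stages above can be sketched as a schematic pipeline. Every function name and limit here is illustrative; the point is that each stage is a separate, testable step with explicit failure conditions:

```python
def bootstrap() -> dict:
    # context bootstrap: runtime config and prerequisites
    return {"max_tokens": 100}

def normalize(raw: str) -> str:
    # input normalization: collapse whitespace so downstream contracts are stable
    return " ".join(raw.split())

def execute(text: str, config: dict) -> list:
    # core execution: whitespace split stands in for a real tokenizer here
    return text.split()

def enforce_limits(tokens: list, config: dict) -> None:
    # policy and safety checks: fail loudly at the boundary
    if len(tokens) > config["max_tokens"]:
        raise ValueError(f"{len(tokens)} tokens exceeds {config['max_tokens']}")

def compose_output(tokens: list) -> dict:
    # output composition: canonical result payload
    return {"token_count": len(tokens), "tokens": tokens}

def emit_metrics(result: dict) -> None:
    # operational telemetry
    print(f"token_count={result['token_count']}")

def run_pipeline(raw_input: str) -> dict:
    config = bootstrap()
    text = normalize(raw_input)
    tokens = execute(text, config)
    enforce_limits(tokens, config)
    result = compose_output(tokens)
    emit_metrics(result)
    return result
```

Debugging then reduces to checking which stage's success condition failed first.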
Use the following upstream sources to verify implementation details while reading this chapter:
- tiktoken repository
Why it matters: the authoritative reference on tokenizer implementation details (github.com/openai/tiktoken).
Suggested trace strategy:
- search upstream code for `text` and `encode` to map concrete implementation paths
- compare docs claims against actual runtime/config code before reusing patterns in production
- Tutorial Index
- Previous Chapter: Chapter 4: Educational Module
- Next Chapter: Chapter 6: ChatML and Tool Call Accounting
- Main Catalog
- A-Z Tutorial Directory
The CoreBPE struct in src/lib.rs handles a key part of this chapter's functionality:

```rust
#[cfg_attr(feature = "python", pyclass(frozen))]
#[derive(Clone)]
pub struct CoreBPE {
    encoder: HashMap<Vec<u8>, Rank>,
    special_tokens_encoder: HashMap<String, Rank>,
    decoder: HashMap<Rank, Vec<u8>>,
    special_tokens_decoder: HashMap<Rank, Vec<u8>>,
    regex_tls: Vec<Regex>,
    special_regex_tls: Vec<Regex>,
    sorted_token_bytes: Vec<Vec<u8>>,
}

impl CoreBPE {
    fn _get_tl_regex(&self) -> &Regex {
        // See performance notes above for what this is about
        // It's also a little janky, please make a better version of it!
        // However, it's nice that this doesn't leak memory to short-lived threads
        &self.regex_tls[hash_current_thread() % MAX_NUM_THREADS]
    }

    fn _get_tl_special_regex(&self) -> &Regex {
        &self.special_regex_tls[hash_current_thread() % MAX_NUM_THREADS]
    }

    /// Decodes tokens into a list of bytes.
    ///
    /// The bytes are not guaranteed to be a valid utf-8 string.
    fn decode_bytes(&self, tokens: &[Rank]) -> Result<Vec<u8>, DecodeKeyError> {
        let mut ret = Vec::with_capacity(tokens.len() * 2);
        for &token in tokens {
            let token_bytes = match self.decoder.get(&token) {
                Some(bytes) => bytes,
                None => self
                    .special_tokens_decoder
                    .get(&token)
                    .ok_or(DecodeKeyError { token })?,
            };
            ret.extend(token_bytes);
        }
        Ok(ret)
    }
}
```

This interface is important because it defines how tiktoken implements the patterns covered in this chapter.
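The thread-local regex trick in `_get_tl_regex` can be mimicked in Python to show the indexing pattern. This is a sketch: the pool size and the regex are illustrative, and Python's `re` module may dedupe identical compiled patterns, whereas in Rust each slot is a distinct `Regex` with its own scratch space:

```python
import re
import threading

MAX_NUM_THREADS = 128  # illustrative pool size; tiktoken fixes its own constant

# One pool slot per possible index; threads pick a slot by hashed thread id.
_REGEX_POOL = [re.compile(r"\S+") for _ in range(MAX_NUM_THREADS)]

def get_thread_local_regex():
    # No per-thread allocation, nothing leaked by short-lived threads;
    # the tradeoff is that unrelated threads can land on the same slot.
    return _REGEX_POOL[hash(threading.get_ident()) % MAX_NUM_THREADS]
```

Within one thread the same slot is always returned, so any per-pattern caches stay warm without a true thread-local store.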
The Encoding class in tiktoken/core.py handles a key part of this chapter's functionality:

```python
class Encoding:
    def __init__(
        self,
        name: str,
        *,
        pat_str: str,
        mergeable_ranks: dict[bytes, int],
        special_tokens: dict[str, int],
        explicit_n_vocab: int | None = None,
    ):
        """Creates an Encoding object.

        See openai_public.py for examples of how to construct an Encoding object.

        Args:
            name: The name of the encoding. It should be clear from the name of the encoding
                what behaviour to expect, in particular, encodings with different special tokens
                should have different names.
            pat_str: A regex pattern string that is used to split the input text.
            mergeable_ranks: A dictionary mapping mergeable token bytes to their ranks. The ranks
                must correspond to merge priority.
            special_tokens: A dictionary mapping special token strings to their token values.
            explicit_n_vocab: The number of tokens in the vocabulary. If provided, it is checked
                that the number of mergeable tokens and special tokens is equal to this number.
        """
        self.name = name

        self._pat_str = pat_str
        self._mergeable_ranks = mergeable_ranks
        self._special_tokens = special_tokens
        # ...
```

This class is important because it defines how tiktoken implements the patterns covered in this chapter.
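To make the `mergeable_ranks` contract concrete ("ranks must correspond to merge priority"), here is a toy byte-pair merge loop. The ranks and the tiny vocabulary are invented for illustration; tiktoken's real merge logic lives in Rust:

```python
def bpe_encode(token_bytes: bytes, ranks: dict) -> list:
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest rank."""
    parts = [bytes([b]) for b in token_bytes]
    while len(parts) > 1:
        best = None  # (index, rank) of the best mergeable adjacent pair
        for i in range(len(parts) - 1):
            r = ranks.get(parts[i] + parts[i + 1])
            if r is not None and (best is None or r < best[1]):
                best = (i, r)
        if best is None:
            break  # nothing left to merge
        i = best[0]
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return [ranks[p] for p in parts]

# Invented vocabulary: lower rank = higher merge priority.
ranks = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}
print(bpe_encode(b"abc", ranks))  # → [4], after merging a+b, then ab+c
```

Because lower rank means earlier merge, reordering the ranks changes which token sequence a given byte string produces, which is why the docstring insists the ranks correspond to merge priority.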
The special-token guard at the end of Encoding.encode in tiktoken/core.py (which calls raise_disallowed_special_token) handles a key part of this chapter's functionality:

```python
        disallowed_special = frozenset(disallowed_special)
        if match := _special_token_regex(disallowed_special).search(text):
            raise_disallowed_special_token(match.group())

        try:
            return self._core_bpe.encode(text, allowed_special)
        except UnicodeEncodeError:
            # BPE operates on bytes, but the regex operates on unicode. If we pass a str that is
            # invalid UTF-8 to Rust, it will rightfully complain. Here we do a quick and dirty
            # fixup for any surrogate pairs that may have sneaked their way into the text.
            # Technically, this introduces a place where encode + decode doesn't roundtrip a Python
            # string, but given that this is input we want to support, maybe that's okay.
            # Also we use errors="replace" to handle weird things like lone surrogates.
            text = text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")
            return self._core_bpe.encode(text, allowed_special)
```
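The UTF-16 fixup in that except branch can be tried in isolation (the sample string is illustrative):

```python
# A lone UTF-16 surrogate is valid inside a Python str but cannot be
# encoded as UTF-8, which is what triggers the except branch above.
text = "ok \ud800 end"
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    # surrogatepass lets the lone surrogate through the UTF-16 encoder;
    # errors="replace" then swaps the unpaired unit for U+FFFD on decode.
    text = text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")
print(text)  # → "ok \ufffd end"
```

The result is a clean, encodable string at the cost of the exact roundtrip, which is the tradeoff the upstream comment acknowledges.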
The numpy-returning variant, encode_to_numpy, applies the same special-token policy:

```python
    def encode_to_numpy(
        self,
        text: str,
        *,
        allowed_special: Literal["all"] | AbstractSet[str] = set(),  # noqa: B006
        disallowed_special: Literal["all"] | Collection[str] = "all",
    ) -> npt.NDArray[np.uint32]:
        """Encodes a string into tokens, returning a numpy array.

        Avoids the overhead of copying the token buffer into a Python list.
        """
        if allowed_special == "all":
            allowed_special = self.special_tokens_set
        if disallowed_special == "all":
            disallowed_special = self.special_tokens_set - allowed_special
        if disallowed_special:
            # ...
```

This function is important because it defines how tiktoken implements the patterns covered in this chapter.
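The disallowed-special scan relies on a regex built from the special-token strings. A minimal sketch of that helper follows; the real one is a private function in tiktoken/core.py, so this is an illustration of the technique, not its exact code:

```python
import re

def special_token_regex(tokens):
    """Build an alternation regex matching any of the given literal strings."""
    # re.escape is essential: special tokens like <|endoftext|> contain
    # regex metacharacters. sorted() makes the pattern deterministic.
    return re.compile("|".join(re.escape(t) for t in sorted(tokens)))

disallowed = frozenset({"<|endoftext|>", "<|fim_prefix|>"})
match = special_token_regex(disallowed).search("hello <|endoftext|>")
```

A non-None match is what triggers the raise_disallowed_special_token error path shown earlier. (An empty token set would produce a match-anything pattern; the sketch does not guard against that.)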
The module-level imports in tiktoken/core.py set up the dependencies used throughout this chapter:

```python
from __future__ import annotations

import functools
from concurrent.futures import ThreadPoolExecutor
from typing import TYPE_CHECKING, AbstractSet, Collection, Literal, NoReturn, Sequence

from tiktoken import _tiktoken

if TYPE_CHECKING:
    import re

    import numpy as np
    import numpy.typing as npt
```
```mermaid
flowchart TD
    A[CoreBPE]
    B[Encoding]
    C[raise_disallowed_special_token]
    D[encoding_name_for_model]
    A --> B
    B --> C
    C --> D
```