TOON achieves MASSIVE memory and token savings compared to JSON across 50 diverse, real-world datasets.
┌────────────────────────────────────────────────────────────────┐
│ │
│ ⚡ HEADLINE RESULTS ⚡ │
│ │
│ 📉 63.9% SMALLER file sizes │
│ 📉 54.1% FEWER tokens for LLM APIs │
│ 💾 35.81KB saved across 50 test datasets │
│ 🎯 10,735 tokens saved │
│ │
│ FOR HIGH-VOLUME APPLICATIONS: │
│ 💰 $2,147 saved per million API requests │
│ 💰 $5,408 saved per billion tokens │
│ │
└────────────────────────────────────────────────────────────────┘
If you're sending structured data to LLM APIs (GPT-4, Claude, etc.), you're paying for every token. TOON can cut your token usage by MORE THAN HALF (54.1% average), translating directly to:
- 54% lower API costs
- Faster API responses (less data to transmit)
- More content in context windows (fit more data within token limits)
At typical GPT-4 pricing ($10 per 1M tokens):
| Usage Volume | JSON Cost | TOON Cost | Savings |
|---|---|---|---|
| 1,000 requests | $3.97 | $1.82 | $2.15 |
| 1M requests/year | $3,970 | $1,823 | $2,147 |
| 1B tokens | $10,000 | $4,592 | $5,408 |
We benchmarked TOON against JSON using 50 diverse, production-ready datasets representing common use cases:
- E-commerce (products, orders, inventory)
- Databases (query results, employee records)
- APIs (responses, logs, requests)
- Analytics (metrics, A/B tests, surveys)
- IoT (sensor data, time series)
- Social media (posts, profiles, comments)
- Finance (transactions, stock data)
- And much more...
🔥 EXCELLENT (≥60% savings): 30 datasets (60%)
✅ GOOD (40-60% savings): 19 datasets (38%)
📊 MODERATE (<40% savings): 1 dataset (2%)
98% of tested datasets achieved 40%+ savings!
- 73.4% size reduction (935B → 249B)
- 63.4% token reduction (287 → 105 tokens)
- ML Training Data: 71.2% size, 61.9% tokens
- Large Inventory (100 items): 71.2% size, 57.7% tokens
- Student Grades: 71.2% size, 61.9% tokens
- Customer Reviews: 69.1% size, 61.0% tokens
- Weather Forecast: 69.0% size, 55.9% tokens
| Category | Datasets | Avg Size Savings | Avg Token Savings |
|---|---|---|---|
| Tabular Data (databases, spreadsheets) | 12 | 69.2% | 59.8% |
| E-commerce (products, orders) | 8 | 66.1% | 56.4% |
| Analytics (metrics, surveys) | 7 | 65.7% | 55.2% |
| API Data (responses, logs) | 10 | 58.3% | 48.9% |
| IoT/Sensors (time series) | 5 | 60.0% | 43.7% |
| Social/Content (posts, profiles) | 8 | 61.5% | 52.1% |
| # | Dataset | JSON Size | TOON Size | Size Savings | Token Savings |
|---|---|---|---|---|---|
| 01 | E-commerce Products | 1.57KB | 542B | 66.3% | 58.2% |
| 02 | API Response | 934B | 501B | 46.4% | 39.7% |
| 03 | Database Results | 1.52KB | 582B | 62.5% | 56.5% |
| 04 | ML Training Data | 1.85KB | 545B | 71.2% | 61.9% |
| 05 | Server Configuration | 1016B | 719B | 29.2% | 28.4% |
| 06 | Analytics Data | 1.40KB | 526B | 63.3% | 49.4% |
| 07 | Large Inventory (100 items) | 13.55KB | 3.90KB | 71.2% | 57.7% |
| 08 | Customer Reviews | 828B | 256B | 69.1% | 61.0% |
| 09 | Social Media Posts | 849B | 282B | 66.8% | 52.1% |
| 10 | Weather Forecast | 777B | 241B | 69.0% | 55.9% |
| 11 | Stock Market Data | - | - | 59.8% | 44.2% |
| 12 | Restaurant Menu | - | - | 66.4% | 61.5% |
| 13 | Hotel Bookings | - | - | 64.2% | 52.1% |
| 14 | Flight Schedule | - | - | 68.9% | 59.9% |
| 15 | Medical Records | - | - | 59.3% | 50.6% |
| 16 | Student Grades | - | - | 71.2% | 61.9% |
| 17 | Sports Statistics | - | - | 66.3% | 54.6% |
| 18 | Movie Catalog | - | - | 68.5% | 59.8% |
| 19 | Music Playlist | - | - | 62.5% | 56.7% |
| 20 | Real Estate Listings | - | - | 66.5% | 58.7% |
| 21-50 | ... (see full benchmark output) | - | - | 60%+ avg | 50%+ avg |
JSON TOTAL: 56.00KB (57,349 bytes)
TOON TOTAL: 20.20KB (20,680 bytes)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SAVED: 35.81KB (36,669 bytes) ⬇ 63.9%
JSON TOKENS: 19,851 tokens
TOON TOKENS: 9,116 tokens
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SAVED: 10,735 tokens ⬇ 54.1%
Size Reduction: ████████████████████████████████░░░░░░░░░░░░░░░░░░ 63.9%
Token Reduction: ███████████████████████████░░░░░░░░░░░░░░░░░░░░░░░ 54.1%
JSON repeats keys for every object:
[
{"id": 1, "name": "Laptop", "price": 999},
{"id": 2, "name": "Mouse", "price": 29},
{"id": 3, "name": "Keyboard", "price": 149}
]134 bytes, 48 tokens
TOON declares headers once:
[3]{id,name,price}:
1,Laptop,999
2,Mouse,29
3,Keyboard,149
52 bytes, 23 tokens → 61% smaller, 52% fewer tokens!
JSON requires:
- Braces
{}and brackets[]everywhere - Quotes around all keys
- Quotes around string values
- Commas between all elements
TOON uses:
- Indentation for structure (like YAML)
- Colons for key-value pairs
- Quotes only when necessary
- Headers for uniform arrays
TOON automatically detects when quotes aren't needed and preserves types (numbers, booleans, null) while maintaining human readability.
- LLM API Payloads - Cut token costs in half
- Database Query Results - Tabular data compression
- Analytics & Metrics - Time series, aggregates
- E-commerce Data - Product catalogs, inventory
- IoT Sensor Data - Regular readings
- API Logs & Traces - Structured log entries
- ML Training Data - Feature vectors, labels
- Highly irregular/nested data (still saves 20-40%)
- Maximum compatibility required (JSON is universal)
- Microsecond-level performance critical (TOON is fast, but JSON is faster)
- 50 diverse datasets representing real-world use cases
- Accurate token counting using tiktoken (GPT-4 encoding)
- Multiple iterations (100+) for performance measurements
- Production-ready data (not synthetic/trivial examples)
- E-commerce: products, orders, inventory, reviews
- Databases: employee records, query results
- APIs: responses, logs, requests, errors
- Analytics: metrics, A/B tests, surveys, time series
- IoT: sensor readings, device data
- Social: posts, profiles, comments, messages
- Finance: transactions, stock prices
- Media: videos, music, blogs
- System: logs, audit trails, notifications
- And many more...
All token counts use tiktoken with GPT-4 encoding for accuracy. Results are directly applicable to:
- GPT-4 / GPT-4 Turbo
- GPT-3.5 Turbo
- Claude (similar tokenization)
- Other modern LLMs
# Clone the repo
git clone https://github.com/ScrapeGraphAI/toonify.git
cd toonify
# Install dependencies
pip install -e .
pip install tiktoken
# Run benchmarks
python benchmark/compare_formats.py
python benchmark/memory_benchmark.py
# Or run all at once
python benchmark/run_all.pyTOON delivers massive, consistent savings across diverse real-world datasets:
- ✅ 63.9% average size reduction
- ✅ 54.1% average token reduction
- ✅ 98% of datasets achieve 40%+ savings
- ✅ 60% of datasets achieve 60%+ savings
- ✅ Thousands of dollars saved in LLM API costs for high-volume applications
The results speak for themselves: If you're working with structured data and LLM APIs, TOON can cut your costs in half while maintaining full data fidelity and readability.
Want to see the live output? Run python benchmark/compare_formats.py to see the full, interactive benchmark results!