Commit 6e7c67e

Revise ZON format comparisons and update features
Removed outdated comparison table and examples for ZON format. Updated key features and benchmarks sections to reflect current data.
1 parent fcf763d commit 6e7c67e

1 file changed

Lines changed: 134 additions & 141 deletions

README.md

@@ -45,154 +45,13 @@ pip install zon-format
45 45

46 46
## Why ZON?
47 47

48-
### Yes, we actually ran the numbers (Dec 2025, fresh data)
49-
| Model | Dataset | ZON tokens | TOON | JSON | ZON vs TOON | ZON vs JSON |
50-
|---------------------|--------------------------|------------|--------|--------|-------------|-------------|
51-
| GPT-5-nano | Unified | **19,995** | 20,988 | 28,041 | **-5.0%** | **-28.6%** |
52-
| GPT-4o (o200k) | 50-level nested | **147,267**|225,510|285,131| **-34.7%** | **-48.3%** |
53-
| Claude 3.5 Sonnet | Mixed agent data | **149,281**|197,463|274,149| **-24.4%** | **-45.5%** |
54-
| Llama 3.1 405B | Everything | **234,623**|315,608|407,488| **-25.7%** | **-42.4%** |
55-
56 48
AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. **LLM tokens still cost money** – and standard JSON is verbose and token-expensive:
57 49

58 50
> "Dropped ZON into my LangChain agent loop and my monthly bill dropped $400 overnight"
59 51
> — every Python dev who tried it this week
60 52
61 53
**ZON is the only format that wins (or ties for first) on every single LLM.**
62 54

63-
```json
64-
{
65-
"context": {
66-
"task": "Our favorite hikes together",
67-
"location": "Boulder",
68-
"season": "spring_2025"
69-
},
70-
"friends": ["ana", "luis", "sam"],
71-
"hikes": [
72-
{
73-
"id": 1,
74-
"name": "Blue Lake Trail",
75-
"distanceKm": 7.5,
76-
"elevationGain": 320,
77-
"companion": "ana",
78-
"wasSunny": true
79-
},
80-
{
81-
"id": 2,
82-
"name": "Ridge Overlook",
83-
"distanceKm": 9.2,
84-
"elevationGain": 540,
85-
"companion": "luis",
86-
"wasSunny": false
87-
},
88-
{
89-
"id": 3,
90-
"name": "Wildflower Loop",
91-
"distanceKm": 5.1,
92-
"elevationGain": 180,
93-
"companion": "sam",
94-
"wasSunny": true
95-
}
96-
]
97-
}
98-
```
99-
100-
<details>
101-
<summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
102-
103-
```yaml
104-
context:
105-
task: Our favorite hikes together
106-
location: Boulder
107-
season: spring_2025
108-
friends[3]: ana,luis,sam
109-
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
110-
1,Blue Lake Trail,7.5,320,ana,true
111-
2,Ridge Overlook,9.2,540,luis,false
112-
3,Wildflower Loop,5.1,180,sam,true
113-
```
114-
115-
</details>
116-
117-
ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
118-
119-
```
120-
context.task:Our favorite hikes together
121-
context.location:Boulder
122-
context.season:spring_2025
123-
friends:ana,luis,sam
124-
hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
125-
ana,7.5,320,1,Blue Lake Trail,T
126-
luis,9.2,540,2,Ridge Overlook,F
127-
sam,5.1,180,3,Wildflower Loop,T
128-
```
129-
130-
### 🛡️ Validation + 📉 Compression
131-
132-
Building reliable LLM apps requires two things:
133-
1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
134-
2. **Efficiency:** You need to compress inputs to save money.
135-
136-
ZON is the only library that gives you **both in one package**.
137-
138-
| Feature | Traditional Validation (e.g. Pydantic) | ZON |
139-
| :--- | :--- | :--- |
140-
| **Type Safety** | ✅ Yes | ✅ Yes |
141-
| **Runtime Validation** | ✅ Yes | ✅ Yes |
142-
| **Input Compression** | ❌ No |**Yes (Saves ~50%)** |
143-
| **Prompt Generation** | ❌ Plugins needed |**Built-in** |
144-
| **Bundle Size** | ~Large |**~5kb** |
145-
146-
**The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
147-
148-
---
149-
150-
## Key Features
151-
152-
- 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
153-
154-
### 3. Smart Flattening (Dot Notation)
155-
ZON automatically flattens top-level nested objects to reduce indentation.
156-
**JSON:**
157-
```json
158-
{
159-
"config": {
160-
"database": {
161-
"host": "localhost"
162-
}
163-
}
164-
}
165-
```
166-
**ZON:**
167-
```
168-
config.database{host:localhost}
169-
```
170-
171-
### 4. Colon-less Structure
172-
For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
173-
**JSON:**
174-
```json
175-
{
176-
"user": {
177-
"name": "Alice",
178-
"roles": ["admin", "dev"]
179-
}
180-
}
181-
```
182-
**ZON:**
183-
```
184-
user{name:Alice,roles[admin,dev]}
185-
```
186-
(Note: `user{...}` instead of `user:{...}`)
187-
- 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
188-
- 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
189-
- 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
190-
- 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
191-
- 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
192-
- 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
193-
- 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
194-
- **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
195-
196 55
---
197 56

198 57
## Benchmarks
@@ -394,6 +253,140 @@ Llama 3 (Meta):
394 253

395 254
---
396 255

256+
```json
257+
{
258+
"context": {
259+
"task": "Our favorite hikes together",
260+
"location": "Boulder",
261+
"season": "spring_2025"
262+
},
263+
"friends": ["ana", "luis", "sam"],
264+
"hikes": [
265+
{
266+
"id": 1,
267+
"name": "Blue Lake Trail",
268+
"distanceKm": 7.5,
269+
"elevationGain": 320,
270+
"companion": "ana",
271+
"wasSunny": true
272+
},
273+
{
274+
"id": 2,
275+
"name": "Ridge Overlook",
276+
"distanceKm": 9.2,
277+
"elevationGain": 540,
278+
"companion": "luis",
279+
"wasSunny": false
280+
},
281+
{
282+
"id": 3,
283+
"name": "Wildflower Loop",
284+
"distanceKm": 5.1,
285+
"elevationGain": 180,
286+
"companion": "sam",
287+
"wasSunny": true
288+
}
289+
]
290+
}
291+
```
292+
293+
<details>
294+
<summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
295+
296+
```yaml
297+
context:
298+
task: Our favorite hikes together
299+
location: Boulder
300+
season: spring_2025
301+
friends[3]: ana,luis,sam
302+
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
303+
1,Blue Lake Trail,7.5,320,ana,true
304+
2,Ridge Overlook,9.2,540,luis,false
305+
3,Wildflower Loop,5.1,180,sam,true
306+
```
307+
308+
</details>
309+
310+
ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
311+
312+
```
313+
context.task:Our favorite hikes together
314+
context.location:Boulder
315+
context.season:spring_2025
316+
friends:ana,luis,sam
317+
hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
318+
ana,7.5,320,1,Blue Lake Trail,T
319+
luis,9.2,540,2,Ridge Overlook,F
320+
sam,5.1,180,3,Wildflower Loop,T
321+
```
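
The mapping from the JSON sample to the ZON sample above can be sketched in a few lines of Python. This is an illustration only, not the `zon-format` implementation or its API: the function names `encode_value` and `encode_sketch` are hypothetical, and the rules (dot notation for one level of nesting, alphabetically sorted columns, `T`/`F` for booleans) are inferred from the example itself. Quoting and escaping of commas or colons are deliberately left out.

```python
def encode_value(value):
    """Render a primitive as in the example above (T/F for booleans)."""
    if value is True:
        return "T"
    if value is False:
        return "F"
    return str(value)


def encode_sketch(doc):
    """Encode a shallow dict into ZON-like lines (hypothetical helper)."""
    lines = []
    for key, value in doc.items():
        if isinstance(value, dict):
            # Dot notation: one line per leaf of a top-level nested object.
            for sub_key, sub_value in value.items():
                lines.append(f"{key}.{sub_key}:{encode_value(sub_value)}")
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Assumes every row has the same keys: emit a header with the
            # row count and sorted column names, then one line per row.
            columns = sorted(value[0])
            lines.append(f"{key}:@({len(value)}):{','.join(columns)}")
            for row in value:
                lines.append(",".join(encode_value(row[c]) for c in columns))
        elif isinstance(value, list):
            lines.append(f"{key}:{','.join(encode_value(v) for v in value)}")
        else:
            lines.append(f"{key}:{encode_value(value)}")
    return "\n".join(lines)


hikes_doc = {
    "context": {"task": "Our favorite hikes together", "location": "Boulder", "season": "spring_2025"},
    "friends": ["ana", "luis", "sam"],
    "hikes": [
        {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5, "elevationGain": 320, "companion": "ana", "wasSunny": True},
        {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2, "elevationGain": 540, "companion": "luis", "wasSunny": False},
        {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1, "elevationGain": 180, "companion": "sam", "wasSunny": True},
    ],
}

print(encode_sketch(hikes_doc))
```

The field names appear once in the header instead of once per row, which is where the token savings over JSON come from in this sample.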
322+
323+
### 🛡️ Validation + 📉 Compression
324+
325+
Building reliable LLM apps requires two things:
326+
1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
327+
2. **Efficiency:** You need to compress inputs to save money.
328+
329+
ZON is the only library that gives you **both in one package**.
330+
331+
| Feature | Traditional Validation (e.g. Pydantic) | ZON |
332+
| :--- | :--- | :--- |
333+
| **Type Safety** | ✅ Yes | ✅ Yes |
334+
| **Runtime Validation** | ✅ Yes | ✅ Yes |
335+
| **Input Compression** | ❌ No |**Yes (Saves ~50%)** |
336+
| **Prompt Generation** | ❌ Plugins needed |**Built-in** |
337+
| **Bundle Size** | ~Large |**~5kb** |
338+
339+
**The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
340+
341+
---
342+
343+
## Key Features
344+
345+
- 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
346+
347+
### 3. Smart Flattening (Dot Notation)
348+
ZON automatically flattens top-level nested objects to reduce indentation.
349+
**JSON:**
350+
```json
351+
{
352+
"config": {
353+
"database": {
354+
"host": "localhost"
355+
}
356+
}
357+
}
358+
```
359+
**ZON:**
360+
```
361+
config.database{host:localhost}
362+
```
363+
364+
### 4. Colon-less Structure
365+
For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
366+
**JSON:**
367+
```json
368+
{
369+
"user": {
370+
"name": "Alice",
371+
"roles": ["admin", "dev"]
372+
}
373+
}
374+
```
375+
**ZON:**
376+
```
377+
user{name:Alice,roles[admin,dev]}
378+
```
379+
(Note: `user{...}` instead of `user:{...}`)
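
One way to picture the colon-less rule is a small recursive encoder: a key is followed by `:` only when its value is a primitive, and by the opening bracket directly when the value is an object or array. The sketch below is an assumption based on the single example above, not the library's code; `encode_inline` and `encode_pair` are hypothetical names, and escaping is ignored.

```python
def encode_inline(value):
    """Render a value inline: {...} for objects, [...] for arrays."""
    if isinstance(value, dict):
        return "{" + ",".join(encode_pair(k, v) for k, v in value.items()) + "}"
    if isinstance(value, list):
        return "[" + ",".join(encode_inline(v) for v in value) + "]"
    return str(value)


def encode_pair(key, value):
    """Join key and value, dropping the colon before nested structures."""
    if isinstance(value, (dict, list)):
        return f"{key}{encode_inline(value)}"
    return f"{key}:{encode_inline(value)}"


print(encode_pair("user", {"name": "Alice", "roles": ["admin", "dev"]}))
```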
380+
- 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
381+
- 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
382+
- 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
383+
- 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
384+
- 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
385+
- 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
386+
- 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
387+
- **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
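
The "Canonical Numbers" rule in the list above (no scientific notation, NaN/Infinity mapped to null) could look roughly like the following. This is a sketch under assumptions, not the library's behavior: `canonical_number` is a hypothetical name, and dropping a trailing `.0` from whole-valued floats is a guess based on the `1000000, not 1e6` example.

```python
import math
from decimal import Decimal


def canonical_number(x):
    """Format a number without exponent notation; NaN/Infinity become null."""
    if isinstance(x, bool):
        raise TypeError("booleans are not numbers here")
    if isinstance(x, int):
        return str(x)
    if math.isnan(x) or math.isinf(x):
        return "null"
    # repr() gives the shortest round-tripping digits; Decimal expands
    # any exponent into plain positional notation.
    text = format(Decimal(repr(x)), "f")
    if "." in text:
        # Assumption: trailing zeros after the decimal point are dropped.
        text = text.rstrip("0").rstrip(".")
    return text


print(canonical_number(1e6), canonical_number(float("nan")))
```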
388+
389+
397 390
## Security & Data Types
398 391

399 392
### Eval-Safe Design
