@@ -45,154 +45,13 @@ pip install zon-format
45 45
46 46 ## Why ZON?
47 47
48- ### Yes, we actually ran the numbers (Dec 2025, fresh data)
49- | Model | Dataset | ZON tokens | TOON | JSON | ZON vs TOON | ZON vs JSON |
50- | --- | --- | --- | --- | --- | --- | --- |
51- | GPT-5-nano | Unified | **19,995** | 20,988 | 28,041 | **-5.0%** | **-28.6%** |
52- | GPT-4o (o200k) | 50-level nested | **147,267** | 225,510 | 285,131 | **-34.7%** | **-48.3%** |
53- | Claude 3.5 Sonnet | Mixed agent data | **149,281** | 197,463 | 274,149 | **-24.4%** | **-45.5%** |
54- | Llama 3.1 405B | Everything | **234,623** | 315,608 | 407,488 | **-25.7%** | **-42.4%** |
55-
56 48 AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. **LLM tokens still cost money** – and standard JSON is verbose and token-expensive:
57 49
58 50 > "Dropped ZON into my LangChain agent loop and my monthly bill dropped $400 overnight"
59 51 > – every Python dev who tried it this week
60 52
61 53 **ZON is the only format that wins (or ties for first) on every single LLM.**
62 54
63- ```json
64- {
65-   "context": {
66-     "task": "Our favorite hikes together",
67-     "location": "Boulder",
68-     "season": "spring_2025"
69-   },
70-   "friends": ["ana", "luis", "sam"],
71-   "hikes": [
72-     {
73-       "id": 1,
74-       "name": "Blue Lake Trail",
75-       "distanceKm": 7.5,
76-       "elevationGain": 320,
77-       "companion": "ana",
78-       "wasSunny": true
79-     },
80-     {
81-       "id": 2,
82-       "name": "Ridge Overlook",
83-       "distanceKm": 9.2,
84-       "elevationGain": 540,
85-       "companion": "luis",
86-       "wasSunny": false
87-     },
88-     {
89-       "id": 3,
90-       "name": "Wildflower Loop",
91-       "distanceKm": 5.1,
92-       "elevationGain": 180,
93-       "companion": "sam",
94-       "wasSunny": true
95-     }
96-   ]
97- }
98- ```
99-
100- <details>
101- <summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
102-
103- ```yaml
104- context:
105-   task: Our favorite hikes together
106-   location: Boulder
107-   season: spring_2025
108- friends[3]: ana,luis,sam
109- hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
110-   1,Blue Lake Trail,7.5,320,ana,true
111-   2,Ridge Overlook,9.2,540,luis,false
112-   3,Wildflower Loop,5.1,180,sam,true
113- ```
114-
115- </details>
116-
117- ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
118-
119- ```
120- context.task:Our favorite hikes together
121- context.location:Boulder
122- context.season:spring_2025
123- friends:ana,luis,sam
124- hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
125- ana,7.5,320,1,Blue Lake Trail,T
126- luis,9.2,540,2,Ridge Overlook,F
127- sam,5.1,180,3,Wildflower Loop,T
128- ```
129-
130- ### 🛡️ Validation + 📉 Compression
131-
132- Building reliable LLM apps requires two things:
133- 1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
134- 2. **Efficiency:** You need to compress inputs to save money.
135-
136- ZON is the only library that gives you **both in one package**.
137-
138- | Feature | Traditional Validation (e.g. Pydantic) | ZON |
139- | :--- | :--- | :--- |
140- | **Type Safety** | ✅ Yes | ✅ Yes |
141- | **Runtime Validation** | ✅ Yes | ✅ Yes |
142- | **Input Compression** | ❌ No | ✅ **Yes (Saves ~50%)** |
143- | **Prompt Generation** | ❌ Plugins needed | ✅ **Built-in** |
144- | **Bundle Size** | ~Large | ⚡ **~5kb** |
145-
146- **The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
147-
148- ---
149-
150- ## Key Features
151-
152- - 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
153-
154- ### 3. Smart Flattening (Dot Notation)
155- ZON automatically flattens top-level nested objects to reduce indentation.
156- **JSON:**
157- ```json
158- {
159-   "config": {
160-     "database": {
161-       "host": "localhost"
162-     }
163-   }
164- }
165- ```
166- **ZON:**
167- ```
168- config.database{host:localhost}
169- ```
170-
171- ### 4. Colon-less Structure
172- For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
173- **JSON:**
174- ```json
175- {
176-   "user": {
177-     "name": "Alice",
178-     "roles": ["admin", "dev"]
179-   }
180- }
181- ```
182- **ZON:**
183- ```
184- user{name:Alice,roles[admin,dev]}
185- ```
186- (Note: `user{...}` instead of `user:{...}`)
187- - 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
188- - 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
189- - 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
190- - 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
191- - 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
192- - 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
193- - 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
194- - ✅ **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
195-
196 55 ---
197 56
198 57 ## Benchmarks
@@ -394,6 +253,140 @@ Llama 3 (Meta):
394 253
395 254 ---
396 255
256+ ```json
257+ {
258+   "context": {
259+     "task": "Our favorite hikes together",
260+     "location": "Boulder",
261+     "season": "spring_2025"
262+   },
263+   "friends": ["ana", "luis", "sam"],
264+   "hikes": [
265+     {
266+       "id": 1,
267+       "name": "Blue Lake Trail",
268+       "distanceKm": 7.5,
269+       "elevationGain": 320,
270+       "companion": "ana",
271+       "wasSunny": true
272+     },
273+     {
274+       "id": 2,
275+       "name": "Ridge Overlook",
276+       "distanceKm": 9.2,
277+       "elevationGain": 540,
278+       "companion": "luis",
279+       "wasSunny": false
280+     },
281+     {
282+       "id": 3,
283+       "name": "Wildflower Loop",
284+       "distanceKm": 5.1,
285+       "elevationGain": 180,
286+       "companion": "sam",
287+       "wasSunny": true
288+     }
289+   ]
290+ }
291+ ```
292+
293+ <details>
294+ <summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
295+
296+ ```yaml
297+ context:
298+   task: Our favorite hikes together
299+   location: Boulder
300+   season: spring_2025
301+ friends[3]: ana,luis,sam
302+ hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
303+   1,Blue Lake Trail,7.5,320,ana,true
304+   2,Ridge Overlook,9.2,540,luis,false
305+   3,Wildflower Loop,5.1,180,sam,true
306+ ```
307+
308+ </details>
309+
310+ ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
311+
312+ ```
313+ context.task:Our favorite hikes together
314+ context.location:Boulder
315+ context.season:spring_2025
316+ friends:ana,luis,sam
317+ hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
318+ ana,7.5,320,1,Blue Lake Trail,T
319+ luis,9.2,540,2,Ridge Overlook,F
320+ sam,5.1,180,3,Wildflower Loop,T
321+ ```
322+
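The table layout in the ZON example can be reproduced with a few lines of plain Python. This is only a rough sketch: `encode_table` is a hypothetical helper, not part of the `zon-format` API. It assumes every row has the same keys, sorts the columns alphabetically as the example does, writes booleans as `T`/`F`, and does not handle quoting or escaping of values that contain commas.

```python
def encode_table(key, rows):
    """Encode a uniform list of dicts as a ZON-style table (illustrative only)."""
    # Sorted column names, matching the alphabetical header in the example.
    columns = sorted(rows[0])
    if any(sorted(row) != columns for row in rows):
        raise ValueError("all rows must share the same keys")

    def cell(value):
        # Booleans are abbreviated to T/F; everything else uses str().
        if isinstance(value, bool):
            return "T" if value else "F"
        return str(value)

    # Header declares the row count and the column list once.
    header = f"{key}:@({len(rows)}):{','.join(columns)}"
    lines = [",".join(cell(row[c]) for c in columns) for row in rows]
    return "\n".join([header, *lines])


hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5,
     "elevationGain": 320, "companion": "ana", "wasSunny": True},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2,
     "elevationGain": 540, "companion": "luis", "wasSunny": False},
    {"id": 3, "name": "Wildflower Loop", "distanceKm": 5.1,
     "elevationGain": 180, "companion": "sam", "wasSunny": True},
]
print(encode_table("hikes", hikes))
```

Declaring the field names once in the header is where the saving over JSON comes from: each additional row costs only its values and separators.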
323+ ### 🛡️ Validation + 📉 Compression
324+
325+ Building reliable LLM apps requires two things:
326+ 1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
327+ 2. **Efficiency:** You need to compress inputs to save money.
328+
329+ ZON is the only library that gives you **both in one package**.
330+
331+ | Feature | Traditional Validation (e.g. Pydantic) | ZON |
332+ | :--- | :--- | :--- |
333+ | **Type Safety** | ✅ Yes | ✅ Yes |
334+ | **Runtime Validation** | ✅ Yes | ✅ Yes |
335+ | **Input Compression** | ❌ No | ✅ **Yes (Saves ~50%)** |
336+ | **Prompt Generation** | ❌ Plugins needed | ✅ **Built-in** |
337+ | **Bundle Size** | ~Large | ⚡ **~5kb** |
338+
339+ **The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
340+
341+ ---
342+
343+ ## Key Features
344+
345+ - 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
346+
347+ ### 3. Smart Flattening (Dot Notation)
348+ ZON automatically flattens top-level nested objects to reduce indentation.
349+ **JSON:**
350+ ```json
351+ {
352+   "config": {
353+     "database": {
354+       "host": "localhost"
355+     }
356+   }
357+ }
358+ ```
359+ **ZON:**
360+ ```
361+ config.database{host:localhost}
362+ ```
363+
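The idea behind dot notation can be sketched as a generic dict flattener. This is not the library's implementation: `flatten` is a hypothetical helper that flattens every level down to the leaves, whereas the ZON output shown here keeps the innermost object inline as `{host:localhost}`. The rules ZON uses to decide where flattening stops are not reproduced.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict keyed by dotted paths (illustrative only)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            # Leaves (and empty objects) are stored under their full path.
            flat[path] = value
    return flat


doc = {
    "context": {
        "task": "Our favorite hikes together",
        "location": "Boulder",
        "season": "spring_2025",
    }
}
for path, value in flatten(doc).items():
    print(f"{path}:{value}")
```

Applied to the `context` object from the hikes example, this yields the `context.task`, `context.location` and `context.season` lines seen in the ZON encoding of that document.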
364+ ### 4. Colon-less Structure
365+ For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
366+ **JSON:**
367+ ```json
368+ {
369+   "user": {
370+     "name": "Alice",
371+     "roles": ["admin", "dev"]
372+   }
373+ }
374+ ```
375+ **ZON:**
376+ ```
377+ user{name:Alice,roles[admin,dev]}
378+ ```
379+ (Note: `user{...}` instead of `user:{...}`)
380+ - 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
381+ - 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
382+ - 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
383+ - 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
384+ - 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
385+ - 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
386+ - 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
387+ - ✅ **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
388+
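The "Canonical Numbers" rule in the list above (no scientific notation, NaN/Infinity mapped to `null`) can be illustrated with the standard library alone. This is a rough sketch under stated assumptions: `canonical_number` is a hypothetical helper, not the `zon-format` API, and the library's exact handling of edge cases such as negative zero or very large floats may differ.

```python
import math
from decimal import Decimal


def canonical_number(value):
    """Render an int or float without scientific notation (illustrative only)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"not a number: {value!r}")
    if isinstance(value, int):
        return str(value)
    # Non-finite floats have no JSON representation, so they become null.
    if math.isnan(value) or math.isinf(value):
        return "null"
    # Decimal(repr(x)) keeps the shortest round-trip digits; "f" forces
    # fixed-point output, so no exponent ever appears.
    text = format(Decimal(repr(value)), "f")
    if "." in text:
        # Drop a redundant fractional part such as the ".0" in "1000000.0".
        text = text.rstrip("0").rstrip(".")
    return text


for sample in (1e6, 7.5, 1.5e-7, float("nan"), float("inf")):
    print(canonical_number(sample))
```

Going through `Decimal` rather than `str()` matters because Python switches to exponent form for very small and very large floats.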
389+
397 390 ## Security & Data Types
398 391
399 392 ### Eval-Safe Design