Adding 5 tests that were failing on me: [ wtq-00080, wtq-00085, wtq-00113, wtq-00162, wtq-00168]#9
Conversation
📝 WalkthroughWalkthroughAdds five benchmark JSON datasets, new unit tests (dynamic roundtrip, boolean-key, delta float), and updates core parsing/encoding: introduces Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (4)
zon-format/benchmarks/data/wtq-00113.json (1)
39-41: Normalize missing-value representation for benchmark stability.Using
"-"here while other missing fields use empty strings introduces schema inconsistency and can skew encode/decode retrieval behavior. Prefer one sentinel (""ornull) across all records.Proposed normalization (empty strings)
- "Driver": "-", - "Co-driver": "-", - "Car": "-", + "Driver": "", + "Co-driver": "", + "Car": "",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zon-format/benchmarks/data/wtq-00113.json` around lines 39 - 41, The JSON record uses "-" as a missing-value sentinel for the fields "Driver", "Co-driver", and "Car", which breaks schema consistency; update those three values from "-" to the normalized sentinel (use empty string "") so they match other records' missing-value representation and avoid encoding/decoding inconsistencies.zon-format/benchmarks/data/wtq-00162.json (1)
15-215: Normalize distance formatting inEventvalues for benchmark stability.
Eventmixes"5000 m"and"10,000 m". For retrieval-accuracy fixtures, this can create avoidable mismatches in tokenization and exact-match assertions. Use one canonical format across all rows (e.g., always"5,000 m"/"10,000 m"or always uncommaed).♻️ Suggested normalization example
- "Event": "5000 m", + "Event": "5,000 m",Apply consistently to every
5000 mentry in this file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zon-format/benchmarks/data/wtq-00162.json` around lines 15 - 215, The Event field mixes distance formats (e.g., "5000 m" vs "10,000 m") which hurts benchmark stability; pick a single canonical format (suggestion: remove thousands separators so use "5000 m" and "10000 m", or alternatively use comma style "5,000 m" and "10,000 m") and apply it consistently to every "5000 m" and "10,000 m" entry in this JSON (search for the "Event" keys and replace all occurrences so all rows use the chosen canonical form).zon-format/benchmarks/data/wtq-00085.json (1)
4-4: StandardizeDateandFormatfield values for consistency.This benchmark fixture mixes date granularities (YYYY, Month YYYY, Month DD YYYY) and format casing/tokenization patterns. If the evaluator normalizes or compares these values directly, heterogeneity could introduce spurious variance in benchmark results. Consider standardizing these fields to a single format (e.g., YYYY-MM-DD for dates, consistent casing for formats) or documenting that the evaluator handles this variation intentionally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zon-format/benchmarks/data/wtq-00085.json` at line 4, Normalize the "Date" and "Format" fields in this JSON fixture: replace the human-readable "Date" value ("April 19, 2004") with a consistent ISO-like representation (e.g., "2004-04-19" or "2004" depending on required granularity used across fixtures) and standardize the "Format" token/casing to match the project convention (e.g., all-lowercase or TitleCase) so all benchmark fixtures use the same date format and format-token style; update the entry keys "Date" and "Format" in wtq-00085.json accordingly and ensure it follows the same pattern used by other fixtures.zon-format/benchmarks/data/wtq-00168.json (1)
7-7: Replace non-breaking spaces inEventvalues with normal spacesMany
Eventstrings use NBSP (\u00A0) instead of ASCII spaces (e.g.,20 km walk). This can cause subtle encode/decode mismatches and brittle string comparisons in tests.Consider normalizing all
Eventvalues to plain spaces (20 km walk,50 km walk) across benchmark files for deterministic matching.Also applies to: 15-15, 31-31, 39-39, 47-47, 55-55, 63-63, 71-71, 79-79, 87-87, 95-95, 103-103, 111-111, 119-119, 127-127, 135-135, 143-143, 151-151, 159-159, 167-167, 175-175, 183-183, 191-191, 199-199
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zon-format/benchmarks/data/wtq-00168.json` at line 7, Many benchmark JSON objects use a non-breaking space (\u00A0) inside the "Event" value (e.g., "20\u00A0km walk"), which breaks deterministic string matching — search all files for the "Event" key and replace any NBSP characters in its value with a normal ASCII space so values read like "20 km walk"; update every occurrence (e.g., "50\u00A0km walk" etc.), ensure files are saved with UTF-8 encoding, and run the repository-wide search/replace or a small script to normalize all "Event" values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zon-format/benchmarks/data/wtq-00080.json`:
- Line 21: The file mixes empty strings ("") and hyphens ("-") for missing
values—normalize all missing/unavailable fields in this JSON to use JSON null
instead; specifically replace occurrences of the "Notes" field and other
empty/hyphen values (e.g., the entries referenced at lines 28, 35, 39-41, 49,
56-57, 63-64, 70-71, 84-85, 91-92, 105-106, 112-113, 119-120, 126-127) so their
values are null, ensuring consistent encode/decode behavior across functions
that read these records.
- Line 98: Update the JSON entry where the "Notes" field currently contains the
typo "Price granted posthumously": change the string value to "Prize granted
posthumously" by editing the "Notes" key in the affected record so the wording
is correct.
In `@zon-format/benchmarks/data/wtq-00113.json`:
- Line 98: The "Notes" field contains a typo: replace the string "Price granted
posthumously" with "Prize granted posthumously" in the JSON entry so retrievals
and assertions match expected wording; locate the entry with the "Notes" key in
wtq-00113.json and update its value accordingly.
In `@zon-format/benchmarks/data/wtq-00162.json`:
- Around line 78-80: The Position field currently uses an en dash sentinel
("Position": "–") alongside "Notes": "DNF"; replace the en dash with a canonical
machine-friendly value (e.g., change "Position" to "DNF") so the record reads
"Position": "DNF", or alternatively add normalization logic in consumers to map
the en dash "–" to "DNF"; update any code that reads the Position key to expect
the canonical "DNF" value and/or to normalize the "–" sentinel to "DNF".
In `@zon-format/benchmarks/data/wtq-00168.json`:
- Line 16: Normalize the inconsistent time strings in the Notes fields to
include hours so the format matches HH:MM:SS (and preserve fractional seconds if
present); specifically replace "1:23.47" with "00:01:23.47" and similarly change
"3:45.57" and "3:42.08" to "00:03:45.57" and "00:03:42.08". Update the Notes
values (the JSON "Notes" keys) at the affected entries to use the normalized
HH:MM:SS.xx form to keep the benchmark parser assumptions consistent.
- Around line 67-80: Two consecutive array entries share the same key tuple
(Year, Competition, Venue, Event) but differ (Position "–", Notes "DQ" vs
Position "5th", Notes "3:45:57") and the Event contains a non‑breaking space;
normalize Event by replacing the NBSP with a normal space and dedupe by
collapsing entries with the same (Year, Competition, Venue, Event) after
normalization, keeping the canonical record (prefer the entry with a concrete
Position/result — e.g. Position "5th" with Notes "3:45:57") or otherwise
disambiguate per source data; ensure the fields Year, Competition, Venue, Event,
Position, and Notes are used for the duplicate-detection and update the array to
remove the DQ duplicate.
---
Nitpick comments:
In `@zon-format/benchmarks/data/wtq-00085.json`:
- Line 4: Normalize the "Date" and "Format" fields in this JSON fixture: replace
the human-readable "Date" value ("April 19, 2004") with a consistent ISO-like
representation (e.g., "2004-04-19" or "2004" depending on required granularity
used across fixtures) and standardize the "Format" token/casing to match the
project convention (e.g., all-lowercase or TitleCase) so all benchmark fixtures
use the same date format and format-token style; update the entry keys "Date"
and "Format" in wtq-00085.json accordingly and ensure it follows the same
pattern used by other fixtures.
In `@zon-format/benchmarks/data/wtq-00113.json`:
- Around line 39-41: The JSON record uses "-" as a missing-value sentinel for
the fields "Driver", "Co-driver", and "Car", which breaks schema consistency;
update those three values from "-" to the normalized sentinel (use empty string
"") so they match other records' missing-value representation and avoid
encoding/decoding inconsistencies.
In `@zon-format/benchmarks/data/wtq-00162.json`:
- Around line 15-215: The Event field mixes distance formats (e.g., "5000 m" vs
"10,000 m") which hurts benchmark stability; pick a single canonical format
(suggestion: remove thousands separators so use "5000 m" and "10000 m", or
alternatively use comma style "5,000 m" and "10,000 m") and apply it
consistently to every "5000 m" and "10,000 m" entry in this JSON (search for the
"Event" keys and replace all occurrences so all rows use the chosen canonical
form).
In `@zon-format/benchmarks/data/wtq-00168.json`:
- Line 7: Many benchmark JSON objects use a non-breaking space (\u00A0) inside
the "Event" value (e.g., "20\u00A0km walk"), which breaks deterministic string
matching — search all files for the "Event" key and replace any NBSP characters
in its value with a normal ASCII space so values read like "20 km walk"; update
every occurrence (e.g., "50\u00A0km walk" etc.), ensure files are saved with
UTF-8 encoding, and run the repository-wide search/replace or a small script to
normalize all "Event" values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 431a006a-3848-4c12-a301-281f13e2a516
📒 Files selected for processing (5)
zon-format/benchmarks/data/wtq-00080.jsonzon-format/benchmarks/data/wtq-00085.jsonzon-format/benchmarks/data/wtq-00113.jsonzon-format/benchmarks/data/wtq-00162.jsonzon-format/benchmarks/data/wtq-00168.json
| "Driver": "Charles Muhangi", | ||
| "Co-driver": "Steven Byaruhanga", | ||
| "Car": "Subaru Impreza", | ||
| "Notes": "" |
There was a problem hiding this comment.
Use one missing-value representation across the dataset.
This file mixes "" and "-" for unavailable values, which can make encode/decode normalization and retrieval assertions brittle.
Proposed normalization patch (use JSON nulls for missing fields)
- "Notes": ""
+ "Notes": null- "Driver": "-",
- "Co-driver": "-",
- "Car": "-",
+ "Driver": null,
+ "Co-driver": null,
+ "Car": null,Also applies to: 28-28, 35-35, 39-41, 49-49, 56-57, 63-64, 70-71, 84-85, 91-92, 105-106, 112-113, 119-120, 126-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/benchmarks/data/wtq-00080.json` at line 21, The file mixes empty
strings ("") and hyphens ("-") for missing values—normalize all
missing/unavailable fields in this JSON to use JSON null instead; specifically
replace occurrences of the "Notes" field and other empty/hyphen values (e.g.,
the entries referenced at lines 28, 35, 39-41, 49, 56-57, 63-64, 70-71, 84-85,
91-92, 105-106, 112-113, 119-120, 126-127) so their values are null, ensuring
consistent encode/decode behavior across functions that read these records.
| "Driver": "Riyaz Kurji", | ||
| "Co-driver": "Sayed Kadri", | ||
| "Car": "Subaru Impreza N8", | ||
| "Notes": "Price granted posthumously" |
There was a problem hiding this comment.
Fix likely typo in Notes text (Price → Prize).
This looks like a spelling error and can reduce benchmark retrieval quality if tests compare expected phrasing.
Proposed text fix
- "Notes": "Price granted posthumously"
+ "Notes": "Prize granted posthumously"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "Notes": "Price granted posthumously" | |
| "Notes": "Prize granted posthumously" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/benchmarks/data/wtq-00080.json` at line 98, Update the JSON entry
where the "Notes" field currently contains the typo "Price granted
posthumously": change the string value to "Prize granted posthumously" by
editing the "Notes" key in the affected record so the wording is correct.
| "Driver": "Riyaz Kurji", | ||
| "Co-driver": "Sayed Kadri", | ||
| "Car": "Subaru Impreza N8", | ||
| "Notes": "Price granted posthumously" |
There was a problem hiding this comment.
Possible typo in Notes text (Price vs Prize).
Line 98 reads “Price granted posthumously”, which likely should be “Prize granted posthumously”. This can affect retrieval assertions that depend on exact wording.
Proposed fix
- "Notes": "Price granted posthumously"
+ "Notes": "Prize granted posthumously"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "Notes": "Price granted posthumously" | |
| "Notes": "Prize granted posthumously" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/benchmarks/data/wtq-00113.json` at line 98, The "Notes" field
contains a typo: replace the string "Price granted posthumously" with "Prize
granted posthumously" in the JSON entry so retrievals and assertions match
expected wording; locate the entry with the "Notes" key in wtq-00113.json and
update its value accordingly.
| "Position": "–", | ||
| "Event": "5000 m", | ||
| "Notes": "DNF" |
There was a problem hiding this comment.
Use a canonical non-finish representation in Position.
"Position": "–" with "Notes": "DNF" is human-readable, but the en dash sentinel is easy to mishandle in strict parsers/normalizers. Prefer a canonical machine-friendly value (for example, "Position": "DNF"), or ensure fixture consumers explicitly normalize this symbol.
🛠️ Suggested canonical value
- "Position": "–",
- "Notes": "DNF"
+ "Position": "DNF",
+ "Notes": ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/benchmarks/data/wtq-00162.json` around lines 78 - 80, The Position
field currently uses an en dash sentinel ("Position": "–") alongside "Notes":
"DNF"; replace the en dash with a canonical machine-friendly value (e.g., change
"Position" to "DNF") so the record reads "Position": "DNF", or alternatively add
normalization logic in consumers to map the en dash "–" to "DNF"; update any
code that reads the Position key to expect the canonical "DNF" value and/or to
normalize the "–" sentinel to "DNF".
| "Venue": "Split, Yugoslavia", | ||
| "Position": "4th", | ||
| "Event": "20 km walk", | ||
| "Notes": "1:23.47" |
There was a problem hiding this comment.
Normalize time notation in Notes
Line 16, Line 88, and Line 168 use mixed delimiter style (1:23.47, 3:45.57, 3:42.08) while most rows use HH:MM:SS. This inconsistency can break parser assumptions in benchmark tooling.
Proposed patch
- "Notes": "1:23.47"
+ "Notes": "1:23:47"
...
- "Notes": "3:45.57"
+ "Notes": "3:45:57"
...
- "Notes": "3:42.08"
+ "Notes": "3:42:08"Also applies to: 88-88, 168-168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/benchmarks/data/wtq-00168.json` at line 16, Normalize the
inconsistent time strings in the Notes fields to include hours so the format
matches HH:MM:SS (and preserve fractional seconds if present); specifically
replace "1:23.47" with "00:01:23.47" and similarly change "3:45.57" and
"3:42.08" to "00:03:45.57" and "00:03:42.08". Update the Notes values (the JSON
"Notes" keys) at the affected entries to use the normalized HH:MM:SS.xx form to
keep the benchmark parser assumptions consistent.
| "Year": "1994", | ||
| "Competition": "European Championships", | ||
| "Venue": "Helsinki, Finland", | ||
| "Position": "–", | ||
| "Event": "50 km walk", | ||
| "Notes": "DQ" | ||
| }, | ||
| { | ||
| "Year": "1994", | ||
| "Competition": "European Championships", | ||
| "Venue": "Helsinki, Finland", | ||
| "Position": "5th", | ||
| "Event": "50 km walk", | ||
| "Notes": "3:45:57" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import json
from collections import defaultdict
path = "zon-format/benchmarks/data/wtq-00168.json"
rows = json.load(open(path, encoding="utf-8"))
groups = defaultdict(list)
for i, r in enumerate(rows, start=1):
k = (r.get("Year"), r.get("Competition"), r.get("Venue"), r.get("Event"))
groups[k].append((i, r.get("Position"), r.get("Notes")))
for k, vals in groups.items():
if len(vals) > 1:
print("DUPLICATE KEY:", k)
for v in vals:
print(" line~", v[0], "Position=", v[1], "Notes=", v[2])
PYRepository: ZON-Format/ZON
Length of output: 217
Conflicting duplicate record for the same event key
Array indices 9 and 10 contain two entries with the same (Year, Competition, Venue, Event) but different outcomes: position – with DQ vs position 5th with result 3:45:57. This ambiguity invalidates benchmark ground truth for retrieval assertions.
The Event field contains a non-breaking space (50\xa0km walk), which may contribute to the duplication. Verify the source dataset and retain only the canonical record, or disambiguate the Event values.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/benchmarks/data/wtq-00168.json` around lines 67 - 80, Two
consecutive array entries share the same key tuple (Year, Competition, Venue,
Event) but differ (Position "–", Notes "DQ" vs Position "5th", Notes "3:45:57")
and the Event contains a non‑breaking space; normalize Event by replacing the
NBSP with a normal space and dedupe by collapsing entries with the same (Year,
Competition, Venue, Event) after normalization, keeping the canonical record
(prefer the entry with a concrete Position/result — e.g. Position "5th" with
Notes "3:45:57") or otherwise disambiguate per source data; ensure the fields
Year, Competition, Venue, Event, Position, and Notes are used for the
duplicate-detection and update the array to remove the DQ duplicate.
ronibhakta1
left a comment
There was a problem hiding this comment.
Lets keep the tests file to a different folder here to make sure we can keep a track and just make it plugin which we can plug or unplug it from the test scripts
…my tests failing +1 existing one.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zon-format/tests/unit/test_roundtrip_benchmarks.py`:
- Around line 16-21: get_json_files currently returns an empty list when no
benchmark JSONs match, which lets the test class silently skip all dynamic
tests; change get_json_files so after building the sorted list it asserts or
raises a clear error (e.g., raise RuntimeError or use pytest.fail) when the list
is empty, with a message like "no benchmark JSON files found in
BENCHMARKS_DATA_DIR"; apply the same fail-fast check to the other file-discovery
helper in this module (the sibling function that returns benchmark files used
for dynamic tests) so discovery errors cause the test run to fail immediately.
- Around line 31-32: The test opens benchmark fixture files without specifying
encoding, causing failures on non-UTF-8 locales; update the file-opening call in
test_roundtrip_benchmarks.py (the with open(filepath) block that sets original =
json.load(f)) to open the file explicitly as UTF-8 by passing encoding="utf-8"
so the JSON fixtures with non-ASCII bytes (e.g. github-repos.json,
wtq-00162.json, wtq-00168.json) are read consistently across platforms.
- Around line 37-42: The test currently compares JSON strings (orig_json,
dec_json) which masks key type changes; change the assertion to compare the
decoded Python structures directly by replacing the json.dumps-based comparison
with self.assertEqual(original, decoded, f"Roundtrip failed for
{filepath.name}") so the test checks structural and type equality (use
assertDictEqual if you want a clearer dict-specific diff); update/remove the
orig_json/dec_json variables accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df3563a5-d0aa-464e-912a-dddd994304cc
📒 Files selected for processing (1)
zon-format/tests/unit/test_roundtrip_benchmarks.py
| orig_json = json.dumps(original, sort_keys=True) | ||
| dec_json = json.dumps(decoded, sort_keys=True) | ||
|
|
||
| self.assertEqual( | ||
| orig_json, dec_json, | ||
| f"Roundtrip failed for {filepath.name}" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Demonstrate that json.dumps can hide mapping key type changes.
python - <<'PY'
import json
original = {"1": "value"}
decoded = {1: "value"}
assert original != decoded
assert json.dumps(original, sort_keys=True) == json.dumps(decoded, sort_keys=True)
print("json.dumps comparison masks string-vs-int key changes")
PYRepository: ZON-Format/ZON
Length of output: 111
🏁 Script executed:
cat -n zon-format/tests/unit/test_roundtrip_benchmarks.py | head -60Repository: ZON-Format/ZON
Length of output: 1733
🏁 Script executed:
python3 <<'PY'
# Test direct comparison of complex Python objects loaded from JSON
import json
json_str = '{"1": "value", "nested": {"2": "data"}, "list": [1, 2, 3]}'
original = json.loads(json_str)
# Simulate a zon decoder that accidentally converts string keys to int
decoded_buggy = {1: "value", "nested": {2: "data"}, "list": [1, 2, 3]}
# Test 1: Direct comparison catches the bug
print("Direct comparison catches key type change:", original != decoded_buggy)
# Test 2: JSON string comparison hides the bug
orig_json = json.dumps(original, sort_keys=True)
dec_json = json.dumps(decoded_buggy, sort_keys=True)
print("JSON comparison hides key type change:", orig_json == dec_json)
# Test 3: Edge case - NaN handling
import math
obj1 = {"value": float('nan')}
obj2 = {"value": float('nan')}
print("NaN comparison (might be False):", obj1 == obj2)
PYRepository: ZON-Format/ZON
Length of output: 662
🏁 Script executed:
python3 <<'PY'
import json
# Test what actually happens with json.dumps() on mixed key types
original = {"1": "value"}
decoded = {1: "value"}
# Without sort_keys
try:
orig_json = json.dumps(original)
dec_json = json.dumps(decoded)
print("Without sort_keys:")
print(f" original: {orig_json}")
print(f" decoded: {dec_json}")
print(f" Equal: {orig_json == dec_json}")
except TypeError as e:
print(f"Error without sort_keys: {e}")
# With sort_keys (as in the actual code)
try:
orig_json = json.dumps(original, sort_keys=True)
dec_json = json.dumps(decoded, sort_keys=True)
print("\nWith sort_keys:")
print(f" original: {orig_json}")
print(f" decoded: {dec_json}")
print(f" Equal: {orig_json == dec_json}")
except TypeError as e:
print(f"Error with sort_keys: {e}")
PYRepository: ZON-Format/ZON
Length of output: 229
🏁 Script executed:
# Let's check the actual benchmark data to see if any have numeric-looking keys
ls -la zon-format/benchmarks/data/*.json | head -20Repository: ZON-Format/ZON
Length of output: 1784
🏁 Script executed:
# Check a smaller JSON file to see the structure
head -50 zon-format/benchmarks/data/hikes.json | python3 -m json.tool 2>/dev/null | head -50Repository: ZON-Format/ZON
Length of output: 906
🏁 Script executed:
# Let's understand what zon codec does - check if it's supposed to preserve string keys
cat -n zon-format/zon/__init__.py | head -40Repository: ZON-Format/ZON
Length of output: 116
🏁 Script executed:
find zon-format -name "*.py" -path "*/zon/*" -type f | head -20Repository: ZON-Format/ZON
Length of output: 786
🏁 Script executed:
cat -n zon-format/src/zon/__init__.py | head -50Repository: ZON-Format/ZON
Length of output: 1870
Compare decoded structures directly, not their JSON strings.
json.dumps() coerces object keys to strings, so this can miss non-lossless key-type changes such as {1: "value"} (integer key) becoming {"1": "value"} (string key) in the decoded output. Since the test docstring claims to validate "lossless" encoding, the current comparison fails to catch this type of data corruption.
Proposed fix
- orig_json = json.dumps(original, sort_keys=True)
- dec_json = json.dumps(decoded, sort_keys=True)
-
self.assertEqual(
- orig_json, dec_json,
+ original, decoded,
f"Roundtrip failed for {filepath.name}"
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| orig_json = json.dumps(original, sort_keys=True) | |
| dec_json = json.dumps(decoded, sort_keys=True) | |
| self.assertEqual( | |
| orig_json, dec_json, | |
| f"Roundtrip failed for {filepath.name}" | |
| self.assertEqual( | |
| original, decoded, | |
| f"Roundtrip failed for {filepath.name}" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/tests/unit/test_roundtrip_benchmarks.py` around lines 37 - 42, The
test currently compares JSON strings (orig_json, dec_json) which masks key type
changes; change the assertion to compare the decoded Python structures directly
by replacing the json.dumps-based comparison with self.assertEqual(original,
decoded, f"Roundtrip failed for {filepath.name}") so the test checks structural
and type equality (use assertDictEqual if you want a clearer dict-specific
diff); update/remove the orig_json/dec_json variables accordingly.
There was a problem hiding this comment.
{1: "value"} isn't valid json -- we are specifically loading in json content and encoding+decoding it.
the json.load at line 32 will fail with the example
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 5 (char 6)
There was a problem hiding this comment.
@P6rguVyrst, you're right — JSON only permits string keys, so json.load() will never produce a Python dict with integer keys. My example of {1: "value"} was a Python-dict construct that couldn't originate from a JSON file, making that specific concern invalid in this context.
Given that original is always produced by json.load() and zon.decode() is expected to return JSON-compatible structures, comparing via json.dumps(sort_keys=True) is a reasonable approach here. I'll dismiss this comment.
✏️ Learnings added
Learnt from: P6rguVyrst
Repo: ZON-Format/ZON PR: 9
File: zon-format/tests/unit/test_roundtrip_benchmarks.py:37-42
Timestamp: 2026-04-20T19:10:00.422Z
Learning: In `zon-format/tests/unit/test_roundtrip_benchmarks.py`, comparing roundtrip results via `json.dumps(sort_keys=True)` is valid because all input data comes from `json.load()`, which always produces string-keyed dicts (JSON only allows string keys). The concern about integer-key coercion (`{1: "value"}`) does not apply to JSON-sourced data.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
Thanks for getting back so quickly, I didn't have the energy mid-week to follow up, sorry. Added a test for testing round-trip accuracy-- test_roundtrip_benchmarks tests for all JSON files in How to run the tests that validate that #10 is fixing genuine bugs with the format. Currently ZON format does not offer "lossless round-trips". I'm surpprised that the CI isn't failing. Should probably add a workflow to CI that runs the tests before-- I can do that, but not in this PR, as that's a separate concern. |
Hey @P6rguVyrst Thanks for finding's. I only had round trip check's. Lossless once might have got missed out on the egde cases. The edge cases for roundtrip are like Spider Multiverse We can try fixing and adding the github test's with your data sets. Just tell me which one should we merge first |
I was thinking of adding the failing tests first, and then the PR that's fixing them. I don't mind if you scrap my PRs, and patch the code yourself-- just wanted to use ZON, but not have to maintain my own copy of it for accuracy 😅 The PR history for the actual patch in my for got a bit messy, so I'd probably do a squash-and-merge for at least the 3 bug fixes-- there's a bit back and forth there of reverting a PR I accidentally merged to my main. |
Hey I would love to have code fixes from your end. So I will start with the PR which adds the data set's and then you can try to add a fix to that. |
|
Hey @P6rguVyrst Now you can see the test being in action. Would you like to look into the logs above. |
I mean my main branch will fix the failing tests. |
That can be a better approach. |
The encoder now quotes keys that match boolean/null keywords (t, f, true,
false, null, none, nil) to prevent the decoder from misinterpreting them.
The decoder now uses a new parse_key() function for dictionary keys that
preserves them as strings, unlike parse_value() which converts keywords.
This fixes a critical round-trip bug where {"f": 1} would decode as
{False: 1}.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Values in dictionary compression headers containing commas, colons, or other special characters were not being quoted, causing data corruption on decode (values truncated at delimiter). Now uses _format_value() for consistent quoting of dictionary values. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Delta encoding violated spec §2.3's MUST round-trip requirement for float columns: prev + (cur - prev) in IEEE-754 does not recover the original double's bit pattern for arbitrary values, and round(diff, 10) at the encode step compounded the loss. Benchmark data exposed this as e.g. 1865.43 decoding to 1865.4299999999994. Restrict SparseMode.DELTA eligibility to int-only columns. Float columns now fall through to standard value encoding, which round-trips exactly via Python's shortest-round-trip str(float). Int delta encoding (the common case: IDs, counts, timestamps) is unchanged. Regression test covers multiple precision regimes — benchmark values, math.pi/math.e, 0.1+0.2, extreme exponents, negatives — so a future round-to-N workaround cannot sneak through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
zon-format/src/zon/core/encoder.py (1)
176-189: Delta-eligibility check correctly excludesbool/float/None.Because
boolis a subclass ofint, usingisinstance(val, int) and not isinstance(val, bool)is the right idiom here — a plainisinstance(val, int)check would have acceptedTrue/Falseas delta-eligible and still been wrong. Restricting DELTA to pure int columns is the correct remedy for the IEEE-754 round-trip failure surfaced by the newtest_float_column_roundtrip_is_losslesstest.One optional refinement: a column that is int-valued except for occasional
None(sparse) will now fall back to non-delta encoding even where delta is still lossless. If that's a noticeable size regression in the benchmarks, you could treatNoneas "skip this sample" rather than disqualifying. Not required for correctness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zon-format/src/zon/core/encoder.py` around lines 176 - 189, The is_int_only check should explicitly accept only pure ints and exclude bools; replace the loop condition with the clearer idiom using isinstance(val, int) and not isinstance(val, bool) (i.e., set is_int_only = False when not (isinstance(val, int) and not isinstance(val, bool))) so columns containing bool/float/None are excluded from selecting SparseMode.DELTA (refer to variables/functions: is_int_only, values, SparseMode.DELTA).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zon-format/src/zon/core/utils.py`:
- Around line 49-58: The code branch where trimmed.startswith('"') catches
JSONDecodeError but only strips quotes when trimmed.endswith('"') can return a
string with a dangling leading quote; update the error path so that when a
leading quote is present but JSON decoding fails you do not return trimmed
verbatim — either raise a descriptive parsing exception or strip the leading
(and trailing if present) quote before returning so the caller doesn't get a
leaked '"' character. Locate the block that tries json.loads(trimmed) in
zon/core/utils.py (the code handling trimmed and JSONDecodeError) and make it
mimic the existing parse_value behavior: on decode failure, remove enclosing
quotes (or raise a parsing error) rather than returning the original trimmed
string unchanged.
In `@zon-format/tests/unit/test_delta.py`:
- Around line 135-144: The loop using zip(data, decoded) can silently truncate
if decoded is shorter; either assert lengths before comparing or pass
strict=True to zip so mismatched lengths fail the test—update the test in
test_delta.py to compare len(decoded) == len(data) or change the loop to for
original, got in zip(data, decoded, strict=True) and keep the existing
repr-based float equality check using original['v'] and got['v'] to surface any
truncation regressions.
---
Nitpick comments:
In `@zon-format/src/zon/core/encoder.py`:
- Around line 176-189: The is_int_only check should explicitly accept only pure
ints and exclude bools; replace the loop condition with the clearer idiom using
isinstance(val, int) and not isinstance(val, bool) (i.e., set is_int_only =
False when not (isinstance(val, int) and not isinstance(val, bool))) so columns
containing bool/float/None are excluded from selecting SparseMode.DELTA (refer
to variables/functions: is_int_only, values, SparseMode.DELTA).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21ae1361-714a-4563-af39-833e74bf2277
📒 Files selected for processing (5)
zon-format/src/zon/core/decoder.pyzon-format/src/zon/core/encoder.pyzon-format/src/zon/core/utils.pyzon-format/tests/unit/test_boolean_keys.pyzon-format/tests/unit/test_delta.py
| if trimmed.startswith('"'): | ||
| try: | ||
| return json.loads(trimmed) | ||
| except json.JSONDecodeError: | ||
| if trimmed.endswith('"'): | ||
| inner = trimmed[1:-1] | ||
| json_str = inner.replace('""', '\\"') | ||
| return json.loads(f'"{json_str}"') | ||
|
|
||
| return trimmed |
There was a problem hiding this comment.
Malformed quoted keys can leak the leading quote into the returned string.
If trimmed starts with " but the initial json.loads raises and trimmed does not end with " (e.g., an unterminated quoted key reaching this point), the endswith branch is skipped and the function returns trimmed verbatim — including the leading quote character. This is unlikely in practice since the tokenizer typically won't yield such fragments, but the current code silently produces a string key containing " rather than signalling the malformed input.
Consider either raising on that path or unconditionally stripping the enclosing quotes before falling through. Behaviour otherwise matches the existing parse_value pattern, so this is minor.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/src/zon/core/utils.py` around lines 49 - 58, The code branch where
trimmed.startswith('"') catches JSONDecodeError but only strips quotes when
trimmed.endswith('"') can return a string with a dangling leading quote; update
the error path so that when a leading quote is present but JSON decoding fails
you do not return trimmed verbatim — either raise a descriptive parsing
exception or strip the leading (and trailing if present) quote before returning
so the caller doesn't get a leaked '"' character. Locate the block that tries
json.loads(trimmed) in zon/core/utils.py (the code handling trimmed and
JSONDecodeError) and make it mimic the existing parse_value behavior: on decode
failure, remove enclosing quotes (or raise a parsing error) rather than
returning the original trimmed string unchanged.
There was a problem hiding this comment.
The encoder produces "f" and "true" (properly closed quotes). The malformed "foo case requires:
- Hand-crafted invalid ZON, OR
- Bug in encoder/tokenizer
The bug is not reachable with valid ZON.
Should the decoder be strict (reject malformed input) or permissive (best-effort)?
That's a design choice, not a correctness bug.
There was a problem hiding this comment.
@P6rguVyrst, that's a fair and correct rebuttal. You're right — since the encoder always emits properly closed quoted strings, the unterminated-quote path is not reachable through normal encoder output, and triggering it would require deliberately crafted or externally corrupted ZON. My concern was labeled minor precisely because of this low likelihood.
On the strict vs. permissive question: both are valid positions.
- Strict (
raise ValueError("Malformed quoted key: ...")) makes the decoder a spec validator and surfaces upstream bugs early. - Permissive (current behavior, best-effort strip) maximises interoperability with hand-authored or third-party ZON.
Since this is explicitly a design choice and the current behavior is consistent with the existing parse_value pattern in the same file, I won't push further. The behavior is acceptable as-is.
| decoded = self.decoder.decode(self.encoder.encode(data)) | ||
|
|
||
| for original, got in zip(data, decoded): | ||
| # repr(float) is the shortest string that round-trips to the same | ||
| # double, so repr equality is equivalent to bit equality. | ||
| self.assertEqual( | ||
| repr(original['v']), | ||
| repr(got['v']), | ||
| f"float roundtrip lost precision: {original['v']!r} -> {got['v']!r}", | ||
| ) |
There was a problem hiding this comment.
Pass strict=True to zip so truncation can't silently pass the test.
If a future regression caused decoder.decode(...) to return fewer rows than data, zip would silently iterate over the shorter prefix and the test would still pass. Adding strict=True (or a prior assertEqual(len(decoded), len(data))) makes that failure mode visible. Ruff also flags this as B905.
🔧 Proposed tweak
- for original, got in zip(data, decoded):
+ self.assertEqual(len(decoded), len(data))
+ for original, got in zip(data, decoded, strict=True):🧰 Tools
🪛 Ruff (0.15.10)
[warning] 137-137: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@zon-format/tests/unit/test_delta.py` around lines 135 - 144, The loop using
zip(data, decoded) can silently truncate if decoded is shorter; either assert
lengths before comparing or pass strict=True to zip so mismatched lengths fail
the test—update the test in test_delta.py to compare len(decoded) == len(data)
or change the loop to for original, got in zip(data, decoded, strict=True) and
keep the existing repr-based float equality check using original['v'] and
got['v'] to surface any truncation regressions.
There was a problem hiding this comment.
The finding doesn't apply meaningfully to this test's purpose.
If a future regression caused decoder.decode(...) to return fewer rows than data
Row dropping isn't a realistic failure mode. If it were to happen, the benchmark roundtrip tests would fail immediately.
There was a problem hiding this comment.
@P6rguVyrst, fair point — the float-precision test is narrowly scoped, and the benchmark roundtrip suite already acts as a broader safety net for structural regressions like row dropping. I'll defer to your judgment here.
(╯°□°)╯
🧠 Learnings used
Learnt from: P6rguVyrst
Repo: ZON-Format/ZON PR: 9
File: zon-format/tests/unit/test_roundtrip_benchmarks.py:37-42
Timestamp: 2026-04-20T19:10:05.092Z
Learning: In `zon-format/tests/unit/test_roundtrip_benchmarks.py`, comparing roundtrip results via `json.dumps(sort_keys=True)` is valid because all input data comes from `json.load()`, which always produces string-keyed dicts (JSON only allows string keys). The concern about integer-key coercion (`{1: "value"}`) does not apply to JSON-sourced data.
It's all in one PR now, however I don't see the workflow test runs triggering. |
|
@P6rguVyrst Before we merge. do you wanna do a token test where we use token tester model to get the idea how much is getting consumed there @coderabbitai please tag the right files we have in the code base |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 10786 🏁 Script executed: Length of output: 24275
🔢 Token-related files
|
Yes, I kept it to be triggered mnually |
Looks like at least with this dataset the tests now pass? :) There was one rounding error even with the original dataset-- not sure if it's fixed all the acuracy bugs, but after fixing 3 distinct issues, it does seem more accurate than before. Let me know if I should address fix else within this scope-- did address few CodeRabbit nitpicks yesterday.. |
Yeah the test pass this time thanks for that. Speaking about code rabbit finding if the issues are minor let's fix and finish it off. Speaking about if any of the tests or datasets fails we have to figure it out cause it can lead to more issues on user's accuracy. |
|
Should I do anything else? I addressed CodeRabbit comments where appropriate, not going to commnet on test data which is valid json input. |
Yeah, looks good to me. |
I don't have the permission to merge, nor should I 🫠 |
Yeah sorry I will do that rn. I got busy yesterday to work on something. |
|
@P6rguVyrst I have deployed the package. You can use it seamlessly |

Summary
Three round-trip bugs fixed, discovered via accuracy testing with WTQ benchmark datasets.
Bug Fixes
1. Boolean-like dictionary keys converted to booleans
Problem:
{"f": 1}decoded as{False: 1}.Fix: Encoder now quotes keys matching boolean/null keywords (t, f, true, false, null, none, nil). Decoder uses a new
parse_key()function that preserves keys as strings.2. Dictionary header values not quoted
Problem: Values containing commas, colons, or special characters were truncated at delimiters.
Fix: Uses
_format_value()for consistent quoting of dictionary compression header values.3. Float columns corrupted by delta encoding
Problem: Delta encoding violated spec §2.3's round-trip requirement. IEEE-754 arithmetic
prev + (cur - prev)doesn't recover original bit patterns, andround(diff, 10)compounded the loss.Example:
1865.43decoded as1865.4299999999994Fix: Restrict
SparseMode.DELTAto int-only columns. Float columns use standard value encoding which round-trips exactly via Python's shortest-round-tripstr(float).Regression tests cover: benchmark values, math.pi/math.e, 0.1+0.2, extreme exponents, negatives.
Testing Context
Added 5 WTQ datasets that were failing accuracy tests with
encode()/decode().Model used:
llama3.1:8b-instruct-q4_K_M(localhost via ollama)Limitation: No AZURE_OPENAI_* env vars available to run against the zon repo accuracy suite.
Test data sources: