[April 26] - Hardened parsers #2544
Sebastian Thiel (Byron)
announced in
Progress Update
Really? Can't justify doing anything purely by hand? You aren't even a software engineer at this point. The fall of another good project. Legitimately thought it was a cool project for a while. |
What a month! And one of the few where the report is late, and the first one where it's 3 days late! But at least I have a good reason for it: security!
In the course of just a couple of weeks a bunch of advisories were reported, and I wouldn't be surprised if this is no coincidence, with one of them openly being associated with Anthropic.
Nothing happened for a few weeks until, 2 days before the report and the associated release, I decided to tackle them all, and opened a little security rabbit hole in the process.
Gitoxide, now hardened
Of course, by now I can't justify doing anything purely by hand anymore so I unleashed Codex to sort out all the fixes.
After starting the review of these fixes, one advisory in particular stood out: doctored or malicious input causes panics, OOM or long runtimes in gix-pack. Isn't that something that a fuzzer would easily have found, too?
So I went in and instructed it to add fuzzers to everything that even remotely looked like a parser.
Two days of review and a couple of discarded or severely simplified fuzz setups later, there were a lot more fixes where that one advisory came from.
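To make the failure classes above concrete (panics, OOM), here is a minimal sketch of what hardening a parser against doctored input can look like. It is not taken from gix-pack: the record format (a 4-byte big-endian length followed by the payload), the names parse_record and ParseError, and the MAX_ENTRY_LEN limit are all assumptions made up for illustration.

```rust
/// Hypothetical upper bound for one record; a real limit depends on the format.
const MAX_ENTRY_LEN: usize = 1 << 20;

#[derive(Debug, PartialEq)]
enum ParseError {
    /// The input ended before the header or the declared payload was complete.
    Truncated,
    /// The header declares more bytes than we are willing to handle.
    TooLarge { declared: usize },
}

/// Parse one record of an assumed format: 4-byte big-endian length, then payload.
/// Returns the payload and the remaining input.
fn parse_record(input: &[u8]) -> Result<(&[u8], &[u8]), ParseError> {
    // `get` instead of `input[..4]`: slicing would panic on short input.
    let header: [u8; 4] = input
        .get(..4)
        .and_then(|h| h.try_into().ok())
        .ok_or(ParseError::Truncated)?;
    let declared = u32::from_be_bytes(header) as usize;

    // Reject attacker-controlled sizes *before* anything is sized by them,
    // e.g. before a `Vec::with_capacity(declared)` further down the line.
    if declared > MAX_ENTRY_LEN {
        return Err(ParseError::TooLarge { declared });
    }

    // Bounds-checked again: the header may claim more than is actually there.
    let payload = input.get(4..4 + declared).ok_or(ParseError::Truncated)?;
    Ok((payload, &input[4 + declared..]))
}
```

A fuzzer tends to find the two unchecked variants of this code quickly, a direct slice index and an allocation sized by the header, which is why they are the ones called out in the comments.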
The result of this effort is hardening for all even moderately complex parsers, along with all advisories fixed, making v0.82 of gix the most secure release yet.
Better Trust-check on Windows
Git uses file ownership to determine if it's allowed to open a repository, bailing completely if it's untrusted and not in safe.directory. gitoxide does it differently and merely avoids using untrusted configuration, while reducing its tolerance towards allocations that are controlled by a potential attacker (see gitoxide.objects.allocLimitIfReducedTrust).
For a long time, the implementation of said trust check was quite lacking on Windows, predominantly due to my lack of skill (and pain tolerance) with the Windows API. Thankfully, LLMs feel no pain and, motivated by a bug report, Codex could quite easily whip up a fix. A new Windows VM of mine that it could work on and validate fixes against definitely helped. And so did Eliah Kagan, of course, who quickly found a fatal flaw in the first iteration that led to using a Windows VM in the first place.
Being able to run Codex on Windows definitely helps me to feel less averse to spinning it up. After all, I reduce my involvement to a couple of prompts.
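The difference between Git's all-or-nothing behaviour and the reduced-trust approach described above can be sketched as a small policy function. This is an illustration only: the Trust and Policy types, the uid-based check and the 64 MiB limit are hypothetical stand-ins, not gitoxide's actual API or defaults, and a Windows implementation would have to compare security identifiers rather than uids.

```rust
/// How much we trust a repository location (hypothetical stand-in type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trust {
    /// Owned by someone else: open it, but be careful.
    Reduced,
    /// Owned by the current user.
    Full,
}

/// What a repository at a given trust level is allowed to influence.
#[derive(Debug, PartialEq, Eq)]
struct Policy {
    /// Whether configuration stored inside the repository is honoured.
    use_repo_config: bool,
    /// Upper bound for allocations sized by repository data, if any.
    alloc_limit: Option<usize>,
}

/// Assumed limit for illustration; not the real default.
const REDUCED_TRUST_ALLOC_LIMIT: usize = 64 * 1024 * 1024;

/// Ownership-based trust: only paths owned by the current user are fully trusted.
fn trust_from_ownership(owner_uid: u32, current_uid: u32) -> Trust {
    if owner_uid == current_uid {
        Trust::Full
    } else {
        Trust::Reduced
    }
}

/// Instead of refusing to open an untrusted repository, restrict what it may do.
fn policy_for(trust: Trust) -> Policy {
    match trust {
        Trust::Full => Policy { use_repo_config: true, alloc_limit: None },
        Trust::Reduced => Policy {
            use_repo_config: false,
            alloc_limit: Some(REDUCED_TRUST_ALLOC_LIMIT),
        },
    }
}
```

The design choice this is meant to show: an untrusted repository stays readable, and only the attacker-controlled inputs (its configuration, and sizes derived from its data) are constrained.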
As a side-effect of this work, we now also have gix free trust to evaluate the trust level of any system path, greatly helping with debugging curious cases.
Diff and merge
As a close second major topic this month we must talk about blob diffing and the related merging in gitoxide. Thus far, we were dependent on the venerable imara-diff crate, and it just seemed to work. However, as gitoxide seems to be heavily fuzzed, Google OSS Fuzz found an imara-diff issue causing runaway computation with doctored input.
The apparent problem was quickly identified by Codex and fixed with just a one-line change. But getting that change released turned out to be a problem as I couldn't get hold of the crate owner at all.
So I quickly decided to just fork it into gix-imara-diff, and get the fix this way. And I think it's for the better long-term as well, as I now feel much more confident in being able to maintain the crate, at least as long as I don't run out of tokens :D. I will also finally be able to add all the tests I always wanted, in part to better establish how close it is to the Git baseline.
Merge
For merges to work, two diffs are performed, base -> ours and base -> theirs. It turns out that Myers, the default, can create hunk configurations that look odd enough to cause conflicts even though there wouldn't have to be any. And of course, Git doesn't suffer from the problem, which was first reproduced by Mattias Granlund from GitButler (a company I am affiliated with).
It turns out that the solution was easy: simply apply Git-style hunk postprocessing to the diffs the merge needs, by default. This yields higher-quality and more standard diffs, along with reduced chances for inexplicable merge conflicts.
And my feeling is that this is just the beginning and line-based diffing can now get more conformant and faster as well.
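As an illustration of what hunk postprocessing can mean, here is a minimal sketch of one well-known step: sliding an inserted hunk down while an equivalent diff exists. It is an assumption that this resembles what the merge now applies; Git's real postprocessing also handles removals, merges adjacent hunks and applies heuristics on top. The function name slide_down is made up for this example.

```rust
/// Slide an inserted hunk down as far as possible without changing its meaning.
///
/// `lines` is the *new* version of the file, and `start..end` is the range of
/// lines the diff reported as inserted. If the first inserted line equals the
/// line right after the hunk, then `start + 1..end + 1` describes the very
/// same change, so the diff algorithm's choice between the two was arbitrary.
fn slide_down(lines: &[&str], mut start: usize, mut end: usize) -> (usize, usize) {
    while start < end && end < lines.len() && lines[start] == lines[end] {
        start += 1;
        end += 1;
    }
    (start, end)
}
```

For example, with the new file being the four lines "fn a() {", "}", "fn b() {", "}", a diff may report lines 1..3 as inserted (the closing brace of a plus the opening line of b). Sliding down turns that into 2..4, which is all of b. Normalizing both base -> ours and base -> theirs this way makes it more likely that equivalent changes end up with matching hunk boundaries.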
PS: For best performance even in pathological cases, one should use the histogram diff (git config --global diff.algorithm histogram), which is now used in the fuzz-tests for gix-merge as otherwise the fuzzer would start tricking Myers into large runtimes with small input.
thiserror -> gix-error
Last month this section accidentally disappeared, apparently because I didn't push that topic forward for a whole month. Initially it was planned as my late-night review topic so auto-generated transitions could be reviewed one by one. It turned out to be hard to review in practice: Codex struggled with it, so the review was subtle and tricky to get right, leaving me with the feeling that I would have been faster doing it by hand.
Maybe GPT 5.5 will make a difference and reinvigorate this topic that is very close to my heart. It will be so good to not have thiserror in gix anymore as it will allow for a much quicker implementation of new topics, while leaving excellent error chains nonetheless.
Multi-Line Commit-Trailers
Finally gix is able to parse commit-trailers properly as these are now supported with all their intricacies (there are surprisingly many). The most useful of these improvements might be the multi-line support for trailers. The previous implementation might have only seen one line even though there were many, and the new one will correctly concatenate them into one line.
Community
SHA-256 Support
Thanks to the continuous effort of Christoph Rüßler more crates now correctly deal with SHA-256 hashes.
This includes the TreeRefIter in gix-object and SHA-256 tests run in gix-diff-tests, and many more that I can't recollect by heart.
Thank you!
Faster Trees
Thanks to datdenkikniet the ubiquitous tree parsing got much faster, which should be noticeable in all applications that perform a lot of tree-diffs or have to figure out 'blames', or traverse a lot of trees for other reasons.
There was also a subtle bug that prevented paths that start with a space from being parsed anymore, but that was never released fortunately.
More conforming ref-prefixes
Thanks to Fintan Halpenny from Radicle, gitoxide now more correctly handles ref-prefixes when fetching, and from all I can tell they are now implemented just like in Git.
Thank you!
Gix in Cargo
There is nothing new here, but let's keep the horizon active:
Cheers
Sebastian
PS: The latest timesheets can be found here (2026).