Skip to content

od: fix handling of non-ascii chars#8268

Merged
RenjiSann merged 1 commit into
uutils:mainfrom
tgrez:main
Jun 27, 2025
Merged

od: fix handling of non-ascii chars#8268
RenjiSann merged 1 commit into
uutils:mainfrom
tgrez:main

Conversation

@tgrez

@tgrez tgrez commented Jun 25, 2025

Copy link
Copy Markdown
Contributor

Fixes #8202

I'm not sure if this fix is complete, because I couldn't reproduce the original behavior of uutils impl with GNU od, so I don't know in which situations the previous implementation was valid (I assume there is/was a reason for implementing it the way it was implemented).

Regardless of the above, this fixes the reported issue. It still lacks localization support, but it works correctly for LC_ALL=C as well as LC_ALL=en_US.UTF-8.

@tgrez tgrez force-pushed the main branch 3 times, most recently from 50ecff0 to 94feb2d Compare June 25, 2025 15:59
@github-actions

Copy link
Copy Markdown

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@RenjiSann

Copy link
Copy Markdown
Collaborator

It still lacks localization support

I'm not familiar with od, do you have an example of its behavior changing depending on the locale ?

@tgrez

tgrez commented Jun 26, 2025

Copy link
Copy Markdown
Contributor Author

It still lacks localization support

I'm not familiar with od, do you have an example of its behavior changing depending on the locale ?

I'd say the difference could come from different encodings, eg:

$ echo -n 'ą' | iconv -f UTF-8 -t ISO-8859-2 | od -t c
0000000 261
0000001
$ echo -n 'ą' | od -t c
0000000 304 205
0000002

@tgrez

tgrez commented Jun 26, 2025

Copy link
Copy Markdown
Contributor Author

And also:

$ echo -n 'ą' | LC_ALL=pl_PL.ISO-8859-2 od -t c
0000000   ? 205
0000002
$ echo -n 'ą' | LC_ALL=C od -t c
0000000 304 205
0000002

@RenjiSann

Copy link
Copy Markdown
Collaborator

I'd say the difference could come from different encodings, eg:

$ echo -n 'ą' | iconv -f UTF-8 -t ISO-8859-2 | od -t c
0000000 261
0000001
$ echo -n 'ą' | od -t c
0000000 304 205
0000002

Unless I'm mistaken, in this example, it's the input given to od that changes, so I don't think it changes anything for od in itself.

@RenjiSann

Copy link
Copy Markdown
Collaborator

I think this fix is good though, I'll happily merge it once the typo is addressed ^^'

@tgrez

tgrez commented Jun 26, 2025

Copy link
Copy Markdown
Contributor Author

Unless I'm mistaken, in this example, it's the input given to od that changes, so I don't think it changes anything for od in itself.

Ah, yes, you're correct.

I'll happily merge it once the typo is addressed ^^'

Sorry for the confusion, which typo? :)

@RenjiSann

RenjiSann commented Jun 26, 2025

Copy link
Copy Markdown
Collaborator

Sorry for the confusion, which typo? :)

C.f. my comment #8268 (comment)

@tgrez

tgrez commented Jun 26, 2025

Copy link
Copy Markdown
Contributor Author

Happy to adjust anything, including typos.

Sorry, but I can't see this comment, after I click the link I'm redirected only to this PR, adding some screenshots:

Screenshot 2025-06-26 at 22 16 27 Screenshot 2025-06-26 at 22 16 59

Comment thread src/uu/od/src/prn_char.rs Outdated

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// // ASCII byte (0xxxxxxx)
// ASCII byte (0xxxxxxx)

@RenjiSann

Copy link
Copy Markdown
Collaborator

That's absolutely my bad, I didn't see this was still pending
Sorry, thanks for noticing 😅

@tgrez

tgrez commented Jun 27, 2025

Copy link
Copy Markdown
Contributor Author

That's absolutely my bad, I didn't see this was still pending Sorry, thanks for noticing 😅

No worries! And thank you for the review!

Fixed the typo, rebased on the latest main and retested.

@github-actions

Copy link
Copy Markdown

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@RenjiSann RenjiSann merged commit dcc5492 into uutils:main Jun 27, 2025
76 checks passed
@RenjiSann

Copy link
Copy Markdown
Collaborator

Thank you for your contribution ! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

od incorrectly handles non ascii chars

2 participants