Skip to content

optimize put_back_char#3317

Open
Skgland wants to merge 6 commits into
mthom:masterfrom
Skgland:optimize-put_back_char
Open

optimize put_back_char#3317
Skgland wants to merge 6 commits into
mthom:masterfrom
Skgland:optimize-put_back_char

Conversation

@Skgland
Copy link
Copy Markdown
Contributor

@Skgland Skgland commented Apr 26, 2026

Always encode the char directly into the buffer.
Only shift the buffer content if there isn't enough room in the front.

I noticed in #3314 that we were copying the buffer content into a temp buffer just to copy it back.
Always doing two copies seamed unnecessary.
This now does zero copies if there is enough space at the front and otherwise inserts the missing amount of bytes at the front copying the existing content once.
So this changes from always two copies to at most one1.

Footnotes

  1. the two copies were to a separate allocation so non-overlapping, the new copy is most definitely an overlapping copy.

josd and others added 2 commits April 26, 2026 13:42
Always encode the char directly into the buffer.
Only shift the buffer content if there isn't enough room in the front.
@Skgland Skgland force-pushed the optimize-put_back_char branch from 98d3664 to af54fb2 Compare April 26, 2026 12:14
@Skgland
Copy link
Copy Markdown
Contributor Author

Skgland commented Apr 26, 2026

Rebased on top of #3316 to prevent conflicts

Comment thread src/parser/char_reader.rs Outdated
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: see if these tests are still slow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unexpected results for running the following test script

./test.sh
#!/usr/bin/bash

set -x

commit=$(git describe)

cargo clean --workspace
cargo +nightly miri test --no-run -- --ignored lorem_ipsum

run()
{
time cargo +nightly miri test -- --ignored lorem_ipsum
}

run &> "${commit}.log"
commit real user sys
8dffd72 (master) 0m39.583s 0m39.757s 0m0.553s
b2dad66 (PR 3316) 17m26.012s 17m24.759s 0m0.804s
af54fb2 (this PR) 6m56.416s 6m56.457s 0m0.570s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit f8ddd78 improves the performance to a point better than before 8dffd72

commit real user sys
f8ddd78 0m27.786s 0m26.686s 0m0.616s

Copy link
Copy Markdown
Contributor Author

@Skgland Skgland Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant to write was

Commit f8ddd78 improves the performance to a point better than before b2dad66 (PR 3316)
i.e better than 8dffd72 (master)

Comment thread src/parser/char_reader.rs

// Raise the error. If we still have data in
// the buffer, it will be returned on the next
// loop.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this comment is a lie.
Even before my change there is nothing that consumes the invalid bytes from the buffer.

The test I added has to manually consume the invalid bytes

https://github.com/mthom/scryer-prolog/pull/3317/changes#diff-2a865146e51bf3d68c6f674a0abdcf0da9c8a2c25b0d04fa040375443bdb2c86R402

https://github.com/mthom/scryer-prolog/pull/3317/changes#diff-2a865146e51bf3d68c6f674a0abdcf0da9c8a2c25b0d04fa040375443bdb2c86R421

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment can be false. It would be a lie if it is deliberately false, which I think we can safely exclude. Very likely it is an error due to a wrong assumption or mental model of the situation.

Copy link
Copy Markdown
Contributor Author

@Skgland Skgland Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, lie might have been a bit strong of a word.
Could also be that the code was changed but the comment was overlooked and not updated in tandem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: Is the current behavior, the comment or neither correct?

The comment reads to me as if peek_char should consume the invalid bytes.
Currently implemented is that neither peek_char nor read_char consume the invalid bytes when they return the error.
It might be reasonable for only read_char to consume the invalid bytes when it returns the error similar to how it consumes the valid bytes when it returns a char.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mthom you wrote the comment in question, could you shine some light on what the expected behavior is. Currently it appears to me that the implementation and the comment disagree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants