Improved pratt ergonomics, added Mini ML example by zesterer · Pull Request #685 · zesterer/chumsky

zesterer · 2024-10-21T19:40:32Z

API-wise, this only really includes one breaking change (bar trivial technicalities relating to semi-internal systems): it removes the overloading that pratt operators previously had. The result is that the compiler is much more likely to infer the correct types in user code without annotation.

TheOnlyTails · 2024-10-22T20:33:16Z

If we can try adding the features from #600 and #682 together in this PR, that would be amazing (I love the changes so far). My language has ~40 constructs that would constitute a Pratt "operator", so it would be really appreciated.

zesterer · 2024-10-22T21:23:39Z

I can give it a go tomorrow :)

zesterer · 2024-10-23T19:13:53Z

If we can try adding the features from #600 and #682 together in this PR, that would be amazing

TheOnlyTails · 2024-10-24T06:54:20Z

I am in genuine awe, this is amazing!

TheOnlyTails · 2024-10-24T16:52:19Z

Just to be sure, the correct way to use this is by using a Vec of boxed operators instead of a tuple of operators, right?

zesterer · 2024-10-25T10:04:08Z

Just to be sure, the correct way to use this is by using a Vec of boxed operators instead of a tuple of operators, right?

You can do both. A tuple still works as before (and will be faster, since the compiler can much more aggressively inline things). But in the case where you simply can't construct a tuple (for example, if you parser gets built up while the program is running) then Vec will work too.

choice already works in a similar way.

zesterer · 2024-10-25T21:49:08Z

@TheOnlyTails I've found a solution to your other problem too. pratt::Operator is now implemented recursively for tuples, so you can nest them inside one-another to an arbitrary depth, allowing for an infinite number of operators in a single pratt combinator (like choice).

mlgiraud · 2024-11-04T10:37:30Z

I like this change, but it seems to have a negative performance impact on parsing times compared to having all the operators in a single larger tuple. For my use case this results in an overall increase of the compile time (parsing takes up most of the time at the moment) of around 100-200ms, from around 1.3 seconds to around 1.5 seconds. The profiler doesn't really give me that much useful information of where the time is spent in release mode. I think i might have to try to profile it in valgrind, but it's currently not worth the effort. Just FYI that this seems to be a bit slower.

zesterer · 2024-11-04T11:03:01Z

I like this change, but it seems to have a negative performance impact on parsing times compared to having all the operators in a single larger tuple. For my use case this results in an overall increase of the compile time (parsing takes up most of the time at the moment) of around 100-200ms, from around 1.3 seconds to around 1.5 seconds. The profiler doesn't really give me that much useful information of where the time is spent in release mode. I think i might have to try to profile it in valgrind, but it's currently not worth the effort. Just FYI that this seems to be a bit slower.

Thanks for letting me know. Just to clarify, are you referring to the compilation time of your compiler itself, or the time rustc requires to compile your compiler?

Additionally, have you made any changes to your code such as boxing operators or using Vec as the operator group?

mlgiraud · 2024-11-04T12:18:10Z

EDIT:
And yes, i am referring to the actual runtime of my parser, and not rustc's runtime. I have not checked if the chagne has a major impact on rustc's runtime.

I had to refactor the code a bit because the spanned function was removed in favor of map_span, but this shouldn't really make that much of a difference, right?. The other change is only that i now no longer provide a single large tuple of pratt parsers, but a tuple of tuples, where each subtuple is grouped by e.g. arithmetic, prefix, or comparison operators. I am not using vectors for the pratt parser.

A small excerpt:

    let operation = atom.clone().pratt((
        (
            prefix(34, unary_operator(UOp::IntNegate), fold_unary_operation),
            prefix(33, unary_operator(UOp::BoolNegate), fold_unary_operation),
            prefix(32, unary_operator(UOp::BitNegate), fold_unary_operation),
            prefix(31, unary_operator(UOp::FloatNegate), fold_unary_operation),
        ),
        (
            infix(left(30), binary_operator(BOp::IntMul), fold_binary_operation),
            infix(left(29), binary_operator(BOp::IntDiv), fold_binary_operation),
            infix(left(28), binary_operator(BOp::IntSDiv), fold_binary_operation),
            infix(left(27), binary_operator(BOp::IntRemainder), fold_binary_operation),
            infix(left(26), binary_operator(BOp::IntSRemainder), fold_binary_operation),
            infix(left(25), binary_operator(BOp::FloatDiv), fold_binary_operation),
            infix(left(24), binary_operator(BOp::FloatMul), fold_binary_operation),
            infix(left(23), binary_operator(BOp::IntAdd), fold_binary_operation),
            infix(left(22), binary_operator(BOp::IntSubtract), fold_binary_operation),
            infix(left(21), binary_operator(BOp::FloatAdd), fold_binary_operation),
            infix(left(20), binary_operator(BOp::FloatSubtract), fold_binary_operation),
        ),

Before, all pratt parsers were simply in a single tuple.

zesterer · 2024-11-04T17:14:29Z

but this shouldn't really make that much of a difference, right?

Correct, it shouldn't. I'd be quite surprised if the codegen was any different at all, actually.

The other change is only that i now no longer provide a single large tuple of pratt parsers, but a tuple of tuples

That might have had an impact. Chumsky will try each operator in turn to see if it matches, so if more likely operators have now been shuffled closer to the bottom of the list, this could have a small impact on performance. How much that actually matters in practice depends on how complex your parser is and what inputs you're feeding it, of course.

One option is that I could enable more aggressive inlining. If I make a branch for this, would you be happy to point your project at it and give it a go?

mlgiraud · 2024-11-05T09:40:19Z

Sure, i can try out anything, just let me know what branch to use. I will at some point also make my sources public. If the issue still persists, then you could just take a look there. The Order of the operators is still the same, except that they are now grouped in tuples by category, so this shouldn't make a difference, right?

zesterer · 2024-11-05T14:08:41Z

except that they are now grouped in tuples by category, so this shouldn't make a difference, right?

I would hope not, provided the inliner is doing its job.

I've pushed a branch called inline-pratt. You can point to it like so:

[dependencies]
chumsky = { git = "https://github.com/zesterer/chumsky.git", branch = "inline-pratt" }

Let me know how it goes!

Edit: thinking about it more, I think there's a possibility it could be due to rustc poorly optimising moves... I will do some investigation on this too.

mlgiraud · 2024-11-05T14:38:50Z

Im trying out right now. Will update with the results. I don't think rust optimizes moves that poorly. I tried passing my AST by reference for example, but it turned out that actually moving the whole tree and destructuring it had better performance than walking through the references. I need to investigate that further though. However, this is currently not my main priority, since the compile time is not that critical to my application.

EDIT:
The inlining doesn't seem to make any difference. Maybe it's not due to the updated pratt parsing, but some other changes that were made between the last alpha release and the current master. I can try to bisect, or do you maybe have an idea what change could have caused this?

EDIT2:
I measured again, and it does seem to be slightly faster, but only around 10-20 ms, so still slower than on the last alpha.

zesterer changed the title ~~Began working on Mini ML example~~ Added Mini ML example Oct 21, 2024

zesterer force-pushed the mini-ml-example branch from 0ca60ec to e2d12ad Compare October 22, 2024 12:27

zesterer changed the title ~~Added Mini ML example~~ Improved pratt ergonomics, added Mini ML example Oct 22, 2024

zesterer and others added 8 commits October 23, 2024 17:14

Began working on Mini ML example

6dfce6a

Added function parsing

9475a0a

Improved pratt parser type inference by removing fold overloading

01c1aa2

fmt

73fb95d

Added required feature

1b45d32

Elided lifetimes

466e3c7

Added type solver to mini_ml

a38a935

Added better error reporting

4fb64df

zesterer force-pushed the mini-ml-example branch from a54c493 to 4fb64df Compare October 23, 2024 16:14

Added pratt operator boxing and vector pratt operator impl

79384b7

Appease the clippy gods

568846c

Recursively implement pratt::Operator

c7b6d51

Added docs to mini_ml

5a01a44

zesterer merged commit 9d2b187 into main Oct 25, 2024

zesterer mentioned this pull request Oct 25, 2024

Added support for pratt parsers with up to 676 operators #682

Closed

Uh oh!

Conversation

zesterer commented Oct 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TheOnlyTails commented Oct 22, 2024

Uh oh!

zesterer commented Oct 22, 2024

Uh oh!

zesterer commented Oct 23, 2024

Uh oh!

TheOnlyTails commented Oct 24, 2024

Uh oh!

TheOnlyTails commented Oct 24, 2024

Uh oh!

zesterer commented Oct 25, 2024

Uh oh!

zesterer commented Oct 25, 2024

Uh oh!

mlgiraud commented Nov 4, 2024

Uh oh!

zesterer commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mlgiraud commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zesterer commented Nov 4, 2024

Uh oh!

mlgiraud commented Nov 5, 2024

Uh oh!

zesterer commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mlgiraud commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zesterer commented Oct 21, 2024 •

edited

Loading

zesterer commented Nov 4, 2024 •

edited

Loading

mlgiraud commented Nov 4, 2024 •

edited

Loading

zesterer commented Nov 5, 2024 •

edited

Loading

mlgiraud commented Nov 5, 2024 •

edited

Loading