recursive-leak: fix leak by using middle-man by Zij-IT · Pull Request #494 · zesterer/chumsky

Zij-IT · 2023-07-28T16:16:28Z

This introduces a middle man to prevent both memory leaks (#486) in recursive parsers, and prevents the user from foot-gunning themselves by calling the wrong clone (referring to craftspider's weak_clone) suggestion.

zesterer

Neat trick!

zesterer · 2023-07-31T11:02:09Z

+    pub fn define<P: Parser<'a, I, O, E> + Clone + MaybeSync + 'a + 'b>(
+        parser: P,
+        declaration: Recursive<Indirect<'a, 'b, I, O, E>, Declared>,
+    ) -> Self {


Should this function be on Recursive<..., Declared> and return a Recursive<..., Defined>?

Yeah, you could do that too. For type inference it doesn’t matter, but it could make more sense logically.

I'd say so, yes.

Completed with 4572bb5

CraftSpider · 2023-08-03T20:31:46Z

While this method seems nice, it also makes more complicated recursive patterns harder - you can't have a single context struct on which you both use and define the recursive parser, since the two things are separate types. While you may be able to re-architect code in most cases to avoid this, I'm curious if that's an intended limitation?

Zij-IT · 2023-08-07T09:08:11Z

While this method seems nice, it also makes more complicated recursive patterns harder - you can't have a single context struct on which you both use and define the recursive parser, since the two things are separate types. While you may be able to re-architect code in most cases to avoid this, I'm curious if that's an intended limitation?

No, it's not intended. We could wrap them both in an enum (somewhat like Cow), and have an arm for both Defined and Declared, which would mean that the user would only have to deal with Declared and Defined. The one thing I don't like about the enum is that it opens the door for the user to attempt to use define on a parser that is already been defined, which isn't possible with the way it's currently structured.

Zij-IT · 2023-08-07T09:15:07Z

Given that those checks fail for reasons outside of this PR, how would you like me to proceed?

zesterer · 2023-08-07T15:49:22Z

You should be able to rebase to fix the failing CI now :)

Zij-IT · 2023-08-17T13:43:22Z

Finally got around to it. My bad for the mess that is the commit history in this pull request 😅 One would think that after using it for four+ years that I would be able to keep it somewhat clean

zesterer · 2023-09-22T20:00:51Z

I totally forgot about this PR, I'll take a final look at it tomorrow.

zesterer · 2023-09-24T11:51:17Z

Was the issue mentioned by CraftSpider above resolved?

faassen · 2023-10-10T17:35:23Z

Is there any way I could help on this issue?

zesterer · 2023-10-10T17:45:16Z

I think it probably just needs a rebase and then it's ready to go. That said, I am a bit concerned about the issue @CraftSpider mentioned above: it will probably break the way some people are using Recursive. That said, arguably such uses are just wrong and should be discouraged? Interested in thoughts on this.

CraftSpider · 2023-10-10T18:03:43Z

I think that you can probably change it to always store one or the other in the state, but I haven't really gotten around to trying it out with this change. I'm not too worried personally.

zesterer · 2023-10-10T18:15:20Z

Right, in that case it's probably fine. I would appreciate the commit history being cleaned up a bit here before merging too (otherwise I think I'll squash).

faassen · 2023-10-10T18:25:14Z

I need to look at my use of recursive tomorrow to see whether whether I can understand the limitation and whether it hurts my usage, I don't get it yet without staring at the code.

zesterer · 2023-10-10T18:33:37Z

I believe it's specifically for the case of Recursive::declare/Recursive::define. Doing my_recursive.define(...) previously had the signature fn(self: &mut Recursive, impl Parser) (i.e: it was an in-place mutation) but now has the signature fn(self: Recursive<A>, impl Parser) -> Recursive<B> (i.e: it consumes self and transforms it to a new type, thereby turning it into a strong reference, as is the case internally with the recursive function).

Zij-IT · 2024-01-28T20:12:06Z

So, my apologies for dropping off of the face of this earth for so long. If I get this rebased in the next coming days, would it be good to be merged, or is there anything else that it is needing?

zesterer · 2024-01-29T14:39:33Z

No worries, there's no time constraint here :) I think this change is probably the way to go, on reflection. It's unfortunate that it breaks certain uses of Recursive::declare/define but those modes of use are inherently leaky in a way that seems like it would require generalised garbage collection to disentangle. So, yes, I'd be happy to see this merged.

Zij-IT · 2024-01-30T15:21:20Z

So... I'll be honest. I tried to rebase this thing for like an hour :') Merging took all of two seconds, so obviously I suck at rebasing.

Edit: It seems to fail with an issue that is unrelated to this PR. Should I fix it up anyway and push the change?

zesterer · 2024-01-31T08:46:30Z

Maybe it would be better to just take the diff from main, then go back to main and apply that diff on top, creating a new commit entirely? Then you can force-push that to this branch.

Don't worry about unrelated CI stuff, I'll address that elsewhere :)

Zij-IT · 2024-01-31T08:48:03Z

Sounds good! I’ll take care of that after I get outta work!

Flygrounder · 2024-02-16T16:35:59Z

@zesterer Could we have a new alpha release once this gets merged?

Flygrounder · 2024-02-29T14:19:41Z

Maybe it would be better to just take the diff from main, then go back to main and apply that diff on top, creating a new commit entirely? Then you can force-push that to this branch.

Don't worry about unrelated CI stuff, I'll address that elsewhere :)

Wanted to ask, isn't it the same as using "Squash and merge" option when merging on GitHub? It seems like @Zij-IT got busy and it's a bit sad that this PR is essentially ready, but not merged. We want to use this library in production in our company, but this memory leak is a problem in our use case (long-running process that can parse thousands of inputs).

zesterer · 2024-02-29T15:39:35Z

I believe the behaviour is the same, yes: with the exception that the commit message will be a concatenation of every intermediate commit, which is undesirable. I would do it myself, but I don't want to be taking credit for @Zij-IT's work in the commit log.

Zij-IT · 2024-02-29T19:41:27Z

My apologies to everyone that has been waiting for me to make the change. I just did what was recommended by @zesterer, so this should be good to roll.

The test that fails fails because what was a runtime error became a compiler error in the tests case. There isn't AFAIK a way to test that something fails to compile, so I can't write an equivalent test. As such, I just deleted the test :/

wackbyte · 2024-02-29T22:10:17Z

You can test something that fails to compile by writing a doctest:

/// ```compile_fail
/// // ...
/// ```
fn test_name() {}

A specific error code can be provided, separated by a comma: compile_fail,E9999.

There might be some more setup required, but this is a method I've used in the past.

This has the drawback of not being able to specify an expected error (other than its code, and not all errors have codes). A better solution would likely use trybuild.

(rustdoc documentation)

zesterer · 2024-02-29T22:38:39Z

+    pub fn define<P: Parser<'a, I, O, E> + Clone + MaybeSync + 'a + 'b>(
+        parser: P,
+        declaration: Self,
+    ) -> Recursive<Indirect<'a, 'b, I, O, E>, Defined> {


What's the reason for not having the declaration argument be self?

Honestly nothing in particular. I'll swap it over to having it be a method!

Edit: Actually, there is a small factor for why. If it's a method that consumes self, than you end up having to do something like:

let mut chain = Recursive::declare(); // Note the clone vvvvvvv let result = chain.clone().define(just::<_, _, extra::Err<Simple<char>>>('+') .then(chain.clone()) .map(|(c, chain)| Chain::Link(c, Box::new(chain))) .or_not() .map(|chain| chain.unwrap_or(Chain::End))); // As opposed to: let result = Recursive::define(just::<_, _, extra::Err<Simple<char>>>('+') .then(chain.clone()) .map(|(c, chain)| Chain::Link(c, Box::new(chain))) .or_not() .map(|chain| chain.unwrap_or(Chain::End)), chain);

@zesterer Which one do you prefer? I'll do it whichever way you'd prefer. I personally like the symmetry between Recursive::declare and Recursive::define and the second way makes it more obvious that Recursive::define consumes the parser.

Hmm, I think I prefer the first, personally. The extra parameter way off at the end of the definition feels very isolated and random if you're not already sure what it is. Also, putting it at the front makes clear what the target of the definition is.

Zij-IT · 2024-05-04T22:03:30Z

I'm pretty sure GitHub decided the 7 commits before mine are different because the hashes are different... I try and do the right thing with a rebase I manage to create clutter :D

Anyhow, this should be the final commit for the PR to be merged, right?

zesterer · 2024-05-10T08:56:20Z

I'll try to find time to take another pass of this today. Hopefully, yep!

Zij-IT · 2025-10-25T13:14:02Z

So, I don't believe this solution will work in the way that I was hoping. It was intended to help situations like seen here.

However, due to the issue described jkylling in #664. I found a test case in the wild by Sail's SQL Parser. If you apply this patch to the lastest version of Chumsky, and run the tests for sail-sql-analyzer, it will fail due to a parser "not being defined before it's used". The definition of the failing parser is here.

I haven't tested if the solution in #664 can be extended to support $N$ recursive parsers as seen in the sail parser without. If so, it would be great as it prevents the foot-gun of holding onto a parser for too short a time, which this solution clearly does not.

Here is the patch for the Sail parser: patch

After toying around with ideas for a bit, I believe similarly to jkylling that it is probably worth making Recursive::declared private, and offering wrapped ways to work around it, so that the user doesn't footgun themselves. I have a similar take to the solution offered in #664, but this solution involves another trait.

Here is an example of how this function can be used:

recursive_n(|(expression, query, data_type, table_with_joins)| {
    let expression = Expr::parser(
        (expression.clone(), query.clone(), data_type.clone()),
        options,
    );

    let query = Query::parser(
        (query.clone(), expression.clone(), table_with_joins.clone()),
        options,
    );

    let data_type = DataType::parser(data_type.clone(), options);

    let table_with_joins = TableWithJoins::parser(
        (query.clone(), expression.clone(), table_with_joins.clone()),
        options,
    );

    (expression, query, data_type, table_with_joins)
})

This solution allows the user to define many queries, with the leftmost-parser of the result of recursive_n being used as the parser, and the rest being considered the dependencies. It would also be possible to define a intermediate struct like Recurse { parser: expression, dependencies: (query, data_type, table_with_joins) } so that it is explicit which parser is actually going to be run as the "primary".

I was able to using sail-sql-analyzer as a dependency in a crate and call parse_expression which constructs the above parser. Miri doesn't report a leak, and I can't think of how one would be done. Here are both patches:

Sail - fix.patch

Chumsky - recursive_fix.patch

@zesterer - What are your thoughts?

zesterer · 2025-10-27T15:03:05Z

After toying around with ideas for a bit, I believe similarly to jkylling that it is probably worth making Recursive::declared private, and offering wrapped ways to work around it, so that the user doesn't footgun themselves. I have a similar take to the solution offered in #664, but this solution involves another trait.

I agree with your assessment. Unfortunately I don't have the background in formal methods to prove it, but I'm beginning to suspect that solving this problem in the general case requires cycle-aware garbage collection, which is not something we have the power to apply here.

Here is an example of how this function can be used:

I do quite like this solution! My initial instinct was to use a macro, but perhaps this is entirely achievable with the type system and a generic function alone. I'd love to see what this ends up looking like.

For the sake of completeness, my macro idea looked roughly like:

parsers! {
    atom = ... ;

    // Parsers are internally declared as `Recursive`s and can reference on-another
    expr: Expr = atom.or(block);

    block = expr
        .separated_by(just(';'))
        .delimited_by(just('{'), just('}'));
}

I suspect that your generic function design is likely to be more robust though.

Is this something you were interested in pushing to the PR stage?

Zij-IT · 2025-10-27T20:24:24Z

My initial instinct was to use a macro, but perhaps this is entirely achievable with the type system and a generic function alone. I'd love to see what this ends up looking like.

I had thought about a macro, but then Recursive::declare and Recursive::define would have to remain in the public API, and I don't know if that is something that is necessarily wanted. I suppose it would be possible to #[doc(hidden)] both of them. I abandoned the idea though because I had no interest in breaking my brain trying to write such a macro 😆

Is this something you were interested in pushing to the PR stage?

Yeah, I honestly don't know why I didn't just write a PR for it instead of uploading the patches 😅

Given that this solution doesn't solve the leak without introducing another problem, I am going to go ahead and close this in favor of PR #894.

zesterer · 2025-10-27T23:20:46Z

I had thought about a macro, but then Recursive::declare and Recursive::define would have to remain in the public API, and I don't know if that is something that is necessarily wanted. I suppose it would be possible to #[doc(hidden)] both of them.

A while back I polled some others in the Rust community, and the consensus seems to be that most people consider doc(hidden) to be sufficient to pull an item out of the semver & soundness guarantees of a crate, so that's the policy I'm adopting from now on (it makes my life easier!).

zesterer reviewed Jul 31, 2023

View reviewed changes

Zij-IT mentioned this pull request Sep 22, 2023

How to reuse parsers without recreating them? #501

Closed

zesterer mentioned this pull request Oct 10, 2023

Tracking: Stable 1.0 #543

Open

35 tasks

zesterer mentioned this pull request Jan 8, 2024

when use tokio::spawn i meet problem anyone can help #574

Closed

zesterer force-pushed the main branch from b03bcf8 to b8fc4eb Compare February 29, 2024 10:51

Zij-IT force-pushed the fixes/recursive-leak branch from b1a9004 to 32c97f9 Compare February 29, 2024 18:57

recursive-leak: do what was suggested by @zesterer

30a48ca

Zij-IT force-pushed the fixes/recursive-leak branch from 32c97f9 to 30a48ca Compare February 29, 2024 19:13

recursive-leak: remove failing test because it's a compiler error now

0018352

recursive-leak: fix docs

66dff49

zesterer reviewed Feb 29, 2024

View reviewed changes

zesterer and others added 8 commits May 4, 2024 23:52

Fixed Foo example

5e02609

⚡ Add nom to cbor benchmark

b368a71

Revert "Revert MapExtra lifetime commit (and re-move to input module)"

2fff378

Removed unnecessary feature

067b8e4

Remove unnecessary lifetimes

662ed4c

Bumped version

cbec7c0

Allow leading underscores in idents

cf3b66e

recursive-leak: make item a member as desired in PR

843d1da

zesterer mentioned this pull request Sep 1, 2024

Add recursive_fold and recursive_2 to prevent memory leaks #664

Open

zesterer force-pushed the main branch 5 times, most recently from 38bb88c to 02a1373 Compare May 26, 2025 14:11

zesterer mentioned this pull request Oct 27, 2025

Discussion: A useful pattern for reducing boilerplate in parser functions #884

Open

Zij-IT mentioned this pull request Oct 27, 2025

Fix #486: Add another recursive parser #894

Open

Zij-IT closed this Oct 27, 2025

Uh oh!

Conversation

Zij-IT commented Jul 28, 2023

Uh oh!

zesterer left a comment

Choose a reason for hiding this comment

Uh oh!

zesterer Jul 31, 2023

Choose a reason for hiding this comment

Uh oh!

Zij-IT Jul 31, 2023

Choose a reason for hiding this comment

Uh oh!

zesterer Aug 1, 2023

Choose a reason for hiding this comment

Uh oh!

Zij-IT Aug 7, 2023

Choose a reason for hiding this comment

Uh oh!

CraftSpider commented Aug 3, 2023

Uh oh!

Zij-IT commented Aug 7, 2023

Uh oh!

Zij-IT commented Aug 7, 2023

Uh oh!

zesterer commented Aug 7, 2023

Uh oh!

Zij-IT commented Aug 17, 2023

Uh oh!

zesterer commented Sep 22, 2023

Uh oh!

zesterer commented Sep 24, 2023

Uh oh!

faassen commented Oct 10, 2023

Uh oh!

zesterer commented Oct 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CraftSpider commented Oct 10, 2023

Uh oh!

zesterer commented Oct 10, 2023

Uh oh!

faassen commented Oct 10, 2023

Uh oh!

zesterer commented Oct 10, 2023

Uh oh!

Zij-IT commented Jan 28, 2024

Uh oh!

zesterer commented Jan 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Zij-IT commented Jan 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zesterer commented Jan 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Zij-IT commented Jan 31, 2024

Uh oh!

Flygrounder commented Feb 16, 2024

Uh oh!

Flygrounder commented Feb 29, 2024

Uh oh!

zesterer commented Feb 29, 2024

Uh oh!

Zij-IT commented Feb 29, 2024

Uh oh!

wackbyte commented Feb 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zesterer Feb 29, 2024

Choose a reason for hiding this comment

Uh oh!

Zij-IT Mar 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Zij-IT Mar 1, 2024

Choose a reason for hiding this comment

Uh oh!

zesterer commented Oct 10, 2023 •

edited

Loading

zesterer commented Jan 29, 2024 •

edited

Loading

Zij-IT commented Jan 30, 2024 •

edited

Loading

zesterer commented Jan 31, 2024 •

edited

Loading

wackbyte commented Feb 29, 2024 •

edited

Loading

Zij-IT Mar 1, 2024 •

edited

Loading

Zij-IT commented Oct 25, 2025 •

edited

Loading