recursive-leak: fix leak by using middle-man#494
Conversation
| pub fn define<P: Parser<'a, I, O, E> + Clone + MaybeSync + 'a + 'b>( | ||
| parser: P, | ||
| declaration: Recursive<Indirect<'a, 'b, I, O, E>, Declared>, | ||
| ) -> Self { |
There was a problem hiding this comment.
Should this function be on Recursive<..., Declared> and return a Recursive<..., Defined>?
There was a problem hiding this comment.
Yeah, you could do that too. For type inference it doesn’t matter, but it could make more sense logically.
|
While this method seems nice, it also makes more complicated recursive patterns harder - you can't have a single context struct on which you both use and define the recursive parser, since the two things are separate types. While you may be able to re-architect code in most cases to avoid this, I'm curious if that's an intended limitation? |
No, it's not intended. We could wrap them both in an enum (somewhat like |
|
Given that those checks fail for reasons outside of this PR, how would you like me to proceed? |
|
You should be able to rebase to fix the failing CI now :) |
|
Finally got around to it. My bad for the mess that is the commit history in this pull request 😅 One would think that after using it for four+ years that I would be able to keep it somewhat clean |
|
I totally forgot about this PR, I'll take a final look at it tomorrow. |
|
Was the issue mentioned by CraftSpider above resolved? |
|
Is there any way I could help on this issue? |
|
I think it probably just needs a rebase and then it's ready to go. That said, I am a bit concerned about the issue @CraftSpider mentioned above: it will probably break the way some people are using |
|
I think that you can probably change it to always store one or the other in the state, but I haven't really gotten around to trying it out with this change. I'm not too worried personally. |
|
Right, in that case it's probably fine. I would appreciate the commit history being cleaned up a bit here before merging too (otherwise I think I'll squash). |
|
I need to look at my use of recursive tomorrow to see whether whether I can understand the limitation and whether it hurts my usage, I don't get it yet without staring at the code. |
|
I believe it's specifically for the case of |
|
So, my apologies for dropping off of the face of this earth for so long. If I get this rebased in the next coming days, would it be good to be merged, or is there anything else that it is needing? |
|
No worries, there's no time constraint here :) I think this change is probably the way to go, on reflection. It's unfortunate that it breaks certain uses of |
|
So... I'll be honest. I tried to rebase this thing for like an hour :') Merging took all of two seconds, so obviously I suck at rebasing. Edit: It seems to fail with an issue that is unrelated to this PR. Should I fix it up anyway and push the change? |
|
Maybe it would be better to just take the diff from Don't worry about unrelated CI stuff, I'll address that elsewhere :) |
|
Sounds good! I’ll take care of that after I get outta work! |
|
@zesterer Could we have a new alpha release once this gets merged? |
Wanted to ask, isn't it the same as using "Squash and merge" option when merging on GitHub? It seems like @Zij-IT got busy and it's a bit sad that this PR is essentially ready, but not merged. We want to use this library in production in our company, but this memory leak is a problem in our use case (long-running process that can parse thousands of inputs). |
|
I believe the behaviour is the same, yes: with the exception that the commit message will be a concatenation of every intermediate commit, which is undesirable. I would do it myself, but I don't want to be taking credit for @Zij-IT's work in the commit log. |
b1a9004 to
32c97f9
Compare
32c97f9 to
30a48ca
Compare
|
My apologies to everyone that has been waiting for me to make the change. I just did what was recommended by @zesterer, so this should be good to roll. The test that fails fails because what was a runtime error became a compiler error in the tests case. There isn't AFAIK a way to test that something fails to compile, so I can't write an equivalent test. As such, I just deleted the test :/ |
|
You can test something that fails to compile by writing a doctest: /// ```compile_fail
/// // ...
/// ```
fn test_name() {}A specific error code can be provided, separated by a comma: There might be some more setup required, but this is a method I've used in the past. This has the drawback of not being able to specify an expected error (other than its code, and not all errors have codes). A better solution would likely use |
| pub fn define<P: Parser<'a, I, O, E> + Clone + MaybeSync + 'a + 'b>( | ||
| parser: P, | ||
| declaration: Self, | ||
| ) -> Recursive<Indirect<'a, 'b, I, O, E>, Defined> { |
There was a problem hiding this comment.
What's the reason for not having the declaration argument be self?
There was a problem hiding this comment.
Honestly nothing in particular. I'll swap it over to having it be a method!
Edit: Actually, there is a small factor for why. If it's a method that consumes self, than you end up having to do something like:
let mut chain = Recursive::declare();
// Note the clone vvvvvvv
let result = chain.clone().define(just::<_, _, extra::Err<Simple<char>>>('+')
.then(chain.clone())
.map(|(c, chain)| Chain::Link(c, Box::new(chain)))
.or_not()
.map(|chain| chain.unwrap_or(Chain::End)));
// As opposed to:
let result = Recursive::define(just::<_, _, extra::Err<Simple<char>>>('+')
.then(chain.clone())
.map(|(c, chain)| Chain::Link(c, Box::new(chain)))
.or_not()
.map(|chain| chain.unwrap_or(Chain::End)), chain);There was a problem hiding this comment.
@zesterer Which one do you prefer? I'll do it whichever way you'd prefer. I personally like the symmetry between Recursive::declare and Recursive::define and the second way makes it more obvious that Recursive::define consumes the parser.
There was a problem hiding this comment.
Hmm, I think I prefer the first, personally. The extra parameter way off at the end of the definition feels very isolated and random if you're not already sure what it is. Also, putting it at the front makes clear what the target of the definition is.
|
I'm pretty sure GitHub decided the 7 commits before mine are different because the hashes are different... I try and do the right thing with a rebase I manage to create clutter :D Anyhow, this should be the final commit for the PR to be merged, right? |
|
I'll try to find time to take another pass of this today. Hopefully, yep! |
38bb88c to
02a1373
Compare
|
So, I don't believe this solution will work in the way that I was hoping. It was intended to help situations like seen here. However, due to the issue described jkylling in #664. I found a test case in the wild by Sail's SQL Parser. If you apply this patch to the lastest version of Chumsky, and run the tests for sail-sql-analyzer, it will fail due to a parser "not being defined before it's used". The definition of the failing parser is here. I haven't tested if the solution in #664 can be extended to support Here is the patch for the Sail parser: patch After toying around with ideas for a bit, I believe similarly to jkylling that it is probably worth making Here is an example of how this function can be used: recursive_n(|(expression, query, data_type, table_with_joins)| {
let expression = Expr::parser(
(expression.clone(), query.clone(), data_type.clone()),
options,
);
let query = Query::parser(
(query.clone(), expression.clone(), table_with_joins.clone()),
options,
);
let data_type = DataType::parser(data_type.clone(), options);
let table_with_joins = TableWithJoins::parser(
(query.clone(), expression.clone(), table_with_joins.clone()),
options,
);
(expression, query, data_type, table_with_joins)
})This solution allows the user to define many queries, with the leftmost-parser of the result of I was able to using Sail - fix.patch Chumsky - recursive_fix.patch @zesterer - What are your thoughts? |
I agree with your assessment. Unfortunately I don't have the background in formal methods to prove it, but I'm beginning to suspect that solving this problem in the general case requires cycle-aware garbage collection, which is not something we have the power to apply here.
I do quite like this solution! My initial instinct was to use a macro, but perhaps this is entirely achievable with the type system and a generic function alone. I'd love to see what this ends up looking like. For the sake of completeness, my macro idea looked roughly like: parsers! {
atom = ... ;
// Parsers are internally declared as `Recursive`s and can reference on-another
expr: Expr = atom.or(block);
block = expr
.separated_by(just(';'))
.delimited_by(just('{'), just('}'));
}I suspect that your generic function design is likely to be more robust though. Is this something you were interested in pushing to the PR stage? |
I had thought about a macro, but then
Yeah, I honestly don't know why I didn't just write a PR for it instead of uploading the patches 😅 Given that this solution doesn't solve the leak without introducing another problem, I am going to go ahead and close this in favor of PR #894. |
A while back I polled some others in the Rust community, and the consensus seems to be that most people consider |
This introduces a middle man to prevent both memory leaks (#486) in recursive parsers, and prevents the user from foot-gunning themselves by calling the wrong
clone(referring to craftspider'sweak_clone) suggestion.