Skip to content

Migrate LexFlags parsing to GrmtoolsSectionParser#553

Closed
ratmice wants to merge 1 commit into
softdevteam:masterfrom
ratmice:lrlex_header_lexflags
Closed

Migrate LexFlags parsing to GrmtoolsSectionParser#553
ratmice wants to merge 1 commit into
softdevteam:masterfrom
ratmice:lrlex_header_lexflags

Conversation

@ratmice
Copy link
Copy Markdown
Collaborator

@ratmice ratmice commented Apr 14, 2025

So here is what I've managed to come up with for migrating the LexFlags parsing code in the %grmtools section to use the GrmtoolsSectionParser, with this patch we parse the %grmtools section twice.

  1. The first in CTParserBuilder, to pull out LexerKind,
  2. Then again in one of the LRNonStreamingLexerDef constructors to get and merge the LexFlags.
  3. As a consequence by the time LexParser has run Header has been lowered to a LexFlags,
    and the whole %grmtools section has been stripped off, so we don't need to parse it a third time to find the beginning
    of the lex file's contents.

We do all the merging in step 2, partially because LexFlags lacks a notion of MergeBehavior and that is the point where we still have enough information to merge without introducing multiple parameters like forcing and default.
This also kindly isolates the LexBuildHeaderError (for lack of any good name) out of LexParser.
This allows us to keep around the new_with_lex_flags constructor but simplified a little bit with fewer arguments.

One very important thing to note is that this changes the return type of the trait method from_str.
A second thing to note is that I noticed that from_rules doesn't have any method to change the LexFlags, it is still relying on the defaults.
A third thing to note is that this may not produce the nicest of errors, but I think we should be able to improve them without breaking semver now.

Comment thread lrlex/src/lib/ctbuilder.rs Outdated
.join("\n")
})?;
let lexerdef =
LRNonStreamingLexerDef::<LexerTypesT>::new_with_header(&lex_src, header)
Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get to the trailhead before the defect in this patch came to me,
We aren't using mark_used on the header, but I'm not even sure we can, passing it by value.
merge_from isn't preserving mark_used, it's kind of assuming that all of that will happen after merging.

It isn't exactly clear to me whether we can pass this to LRNonStreamingLexerDef not as an owned value, but as a &mut ref.

Edit: I'll likely (and feel free to) just close this, but there is a bit here in this patch which seems unlikely to change

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, there are basically two approaches that I can think of,

  1. the first is passing &mut Header in to LexParser, then doing the merge inside LexParser
    That is basically how we did YaccParser, but with LexParser there is the trait involved.
    We may need to add an alternate merge_into which uses the merge behavior of the data source, rather than
    the data's destination like merge_from.
    Given the traits in LexerDef traits I'm not absolutely certain we can pull this one off without modifying the trait, because of the additional lifetime, however I haven't yet tried.
  2. the second is essentially doing the merging outside of LexParser and passing LexFlags in.
    There is a second code path entirely for reading the header from a string without merging.
    Then the difference is the first code path uses Header<Location> while the second uses Header<Span>.
    This approach is nice in that it keeps Location outside of LexParser, and could do the same for
    YaccGrammar if we pass YaccKind in.

I think the second approach is more in line/semver compatible with prior releases but is perhaps a bit tricky to get right with it's generic types the patch in 189bbfc was a first step in this direction, but It got tricky fairly quickly after that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't totally understood the details, but on the face of it (2) does sound like the better route to me. Is this PR the right place to do it or ...?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR is the right place for it, it actually ends up undoing a lot of what is in this PR.

I do think (1) can go in this PR, it is basically the same approach taken here, I think maybe a good idea is to try the first approach, finishing it off in this PR if only to give us a fallback option that we can hold off on merging. Just in case something goes awry with (2) which I also think seems preferable but also has some complexity trade-offs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first approach is implemented in this PR in 7468581 this didn't require trait modifications at all, and was actually a relatively minor change to the patch.

I'll try and start work on a new branch implementing second approach tomorrow.

Comment thread lrlex/src/lib/ctbuilder.rs Outdated
.join("\n")
})?;
let lexerdef =
LRNonStreamingLexerDef::<LexerTypesT>::new_with_header(&lex_src, header)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't totally understood the details, but on the face of it (2) does sound like the better route to me. Is this PR the right place to do it or ...?

}
}
LexBuildHeaderError::Header(e) => {
format!("{}", e)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question: can we not give line number information for the header?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even that would be awkward, e.g. YaccGrammarError assumes spans exist (the least SpansKind assumes there is one span), we could add a SpansKind::None, and YaccGrammarErrorKind::Header(HeaderErrorKind), It feels to me somewhat like throwing the baby out with the bath water though, and e.g. we still have to make changes of similar scope to adding Location. In that sense Location::CommandLine is somewhat type equivalent to not having a span.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 16, 2025

Might we be ready for squashing with that latest commit?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 16, 2025

Sure, but I still want to hold off/try that other approach, because it would seem like if we can keep Location out of these interior errors I can only imagine it'll be much cleaner.

@ratmice ratmice force-pushed the lrlex_header_lexflags branch from 7468581 to 190497c Compare April 16, 2025 09:15
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 19, 2025

Think we can close this one, and will start over migrating LexFlags on top of #554

@ratmice ratmice closed this Apr 19, 2025
@ratmice ratmice deleted the lrlex_header_lexflags branch April 24, 2025 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants