Skip to content

Lrpar section parser ctbuilder take2#549

Merged
ltratt merged 5 commits into
softdevteam:masterfrom
ratmice:lrpar_section_parser_ctbuilder_take2
Apr 11, 2025
Merged

Lrpar section parser ctbuilder take2#549
ltratt merged 5 commits into
softdevteam:masterfrom
ratmice:lrpar_section_parser_ctbuilder_take2

Conversation

@ratmice
Copy link
Copy Markdown
Collaborator

@ratmice ratmice commented Apr 7, 2025

Here is a sequence of patches, which comprise my attempts to clean up #547
This is still a bit of a doozy, but I'm not sure how to split it up in any nicer way.
I hope the commit messages, and inline docs help a little in providing the motivation for each patch.

I have left out the RecoveryKind additions for a subsequent patch.

I'm going to keep this as a draft for now, I at least want to go through and do some manual testing,
and trying to figure out how terribly the errors this patch produces have regressed.
I also think I'm probably leaving in some YaccGrammarErrorKind variants that clippy is not discovering are not used
that should probably be removed.

…arserError.

This adds a type `Location`, for non-span specific error
locations such as `CommandLine`. In preparation for converting
command-line arguments into `Header`, and passing a header built
from the command line into the `YaccParser`.

This tries to minimize the impact to the `YaccParser` code by
keeping most of the internals of `YaccParser` still returning
`YaccGrammarError`, and converting these errors at the fringes.
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 7, 2025

I also wanted to ask you to please feel free to have adverse or allergic reactions to this code,

I will admit that it is starting to grow on me... primarily because reasonable people will only ever need to pass in Header::new() if all their things are in %grmtools{...} or never see it if they only use nimbleparse or a ctbuilder.

Comment thread cfgrammar/src/lib/header.rs Outdated
}
}

impl From<YaccKind> for Value {
Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it occurred to me that we could also add try_from, I suppose it might simplify update_yacckind a little, but I think that is the only place we convert to YaccKind, so I'm not sure if it really needs to be a trait?

impl TryFrom<Value> for YaccKind {
...
}

Edit: I think that'll be a relevant approach though for RecoveryKind, where we'll want to avoid duplicating the try_from in nimbleparse and CTParserBuilder.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge with From is that it's infallible. At the moment it looks OK but it is a strong future commitment. In that sense TryFrom gives more wriggle room in the future. So it partly depends how confident we are about the API as-is.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I had kind off assumed that because e.g. nimbleparse converts strings to YaccKind we'd always be able to convert or there would be some applicable string version of a YaccKind.

The most complex situation I can think of arises where we have a non-string parameter #541 comes to mind,
where I proposed to add a RecoveryKind::CPCTPlusChannel(mut channel), and one cannot come up with a textual version for passing a channel, however map this value to the normal RecoveryKind::CPCTPlus via text.

It didn't occur to me we could have one with no mapping to text whatsoever, so I'll have to think about that.
My thoughts for the situation with "CPCTPlus" is that it's string intrepretation couold be mapped via the program to be interpreted either as RecoveryKind::CPCTPlus or as a CPCTPlusChannel(...) with a channel constructed by the program.

That is to say I can at least imagine having a surjective mapping, but I hadn't thought of cases where we don't have a mapping.

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to TryFrom<YaccKind> for Value in 553ce4d

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously trying to change the other direction (TryFrom<Value> for YaccKind like my initial comment), runs into the thing that we try to avoid using the query API (We can't use the query API because we already have the value), so we must duplicate all the error handling for each variant of Value.

Edit: OTOH it does mean we can just get rid of the query API and users can call the following themselves...
Which is not nearly as bad as reimplementing it both in CTParserBuilder and in nimbleparse, since both places share the TryFrom impl.

let x = header.get("yacckind");
x.map(|_| header.mark_used("yacckind"));
YaccKind::try_from(x);

I think I'll try that since the query mechanism has been a point of confusion

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched from the query mechanism to TryFrom<Value> for YaccKind in 57285ca

Seems like an improvement to me even if each TryFrom implementation is a bit longer than with query,
I think the simplicty of getting rid of the Header wrapper and making it an alias alone is worth it.

So I believe both of our comments regarding this should be fixed.

match self {
Self::Used => 1 << 0,
Self::Required => 1 << 1,
Self::MergeBehavior(mb) => (1 << 2) | ((mb as u16) << 8),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth noting that MergeBehavior(_) and it's inner element are both using more bits than necessary, I felt it kept the logic uniform to just give each behavior a 1 bit, and since these are opaque, and not pressed for space, it seemed like we could change it at any time...

I don't know if it is worth a comment, if we were pressed for space we could:

  • Drop the 1 << 2 entirely for MergeBehavior(_).
  • Just make MutuallyExclusive always a zero bit.

Comment thread cfgrammar/src/lib/header.rs Outdated
}
}

pub fn merge_from(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to document these functions since they're part of the public interface.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I partially did this in c9756c8

Then a did a pass over f671bd4 since the first commit references MarkMap, but merge_from was already documented there.

Comment thread cfgrammar/src/lib/header.rs Outdated
query_mask: u16,
) -> Option<Result<(&Location, &Value), HeaderError>> {
self.contents_mut().mark_used(&key.to_owned());
if let Some((key_loc, value)) = self.contents.get(key) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if clippy will complain about this not being self.contents.get(key).map(...)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't for me, I don't know why it wouldn't though like maybe option is a special case?

I do kind of dislike Option::map and Result::map, because I can't tell you how many times i've gotten mixed up by it when I really intended to map over Some(iterator).

Comment thread cfgrammar/src/lib/header.rs Outdated
/// Repr vaues bit representation should not overlap `SettingQuery`,
/// The first 8 bits is reserved for `ValueQuery`.
#[repr(u16)]
pub enum ValueQuery {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to expose the maskiness of this to users? The enum seems fine (users never need to know what the integer discriminant values are) but when we get matches_query_mask(u16) in the public API I do worry a bit about the cognitive load.

A dumb question is: do users ever need to query more than one thing at a time?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than one thing like more than one key, or more than one mask?

It isn't set up to query more than one key at a time (I never needed it, I don't think we ever do?).
It is used to query more than one value mask value for yacckind which can be a unitary YaccKind::Grmtools or a constructor YaccKind::Original(_), this is used in the last patch here if that link works.

        if let Some(result) = self.header.query(
            "yacckind",
            SettingQuery::Unitary as u16 | SettingQuery::Constructor as u16,
        )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, then I really have misunderstood this!

If we do go down this route (and I can be convinced -- I just haven't understood it yet!) I think we would want to find a means that avoids as. I have become allergic to as because it has surprising effects on some, but not other, types.

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of any way to avoid as (I will look, maybe i've missed a way being added, but last time I looked it was the only way to convert from an enum to it's #[repr(...)])

At the very least we can wrap it in a function like fn repr(self) -> u16 though.

Oh: I actually already added one query_bits(), so these as conversions can just be replaced by calls to that.
query_bits() isn't really a great name though

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the simplest thing I could think of in af89bd9 and just get rid of the bit masks from the query mechanism entirely.

Edit: Guess I still need docs though. added in c9756c8

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 8, 2025

I will admit that I'm struggling to pick up the high-level gist of where this is going. I don't think I've fully understood ValueQuery, for example: perhaps you can give me a layman's explanation at a high level?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 8, 2025

There is kind of two purposes to the query function (I already mentioned one but I'll repeat it here).

  • In the case of YaccKind we want a value that is either a unitary like YaccKind::Grmtools or a constructor like YaccKind::Original, but we don't care about bool or num values like case_insensitive or size_limit.

The query mechanism thus checks if the value mask contains the query_bits() of the actual repr that was requested,
and if not it produces an error.

In that call to query for YaccKind, we only check the variants which were queried, and the rest are handled by this
_ => unreachable!("Handled all settings queried for"), fallthrough branch. So this whole query mechanism is there to allow this error conversion in just one place so the main code can stick to the happy path.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 8, 2025

Dumb question: will users tend to think of these things as query bits? Is there a higher level query API that might make more sense to them?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 8, 2025

Maybe I can just take the u16 usage out of the public API, make a way to pass in a Iterator<Item=ValueQuery>, that then internally does all the bit masking and abstracts it away.

Edit: This may require some superficial changes to the types, so that ValueQuery::Setting contains a SettingQuery but that is no problem, Spoke too soon, would require unsafe pointer casting... anyhow it's neither here nor there.
Somehow we should be able to hide the bit manipulation like in MarkMap where it is there, but non-public.

Edit: I tried to simplify this in af89bd9

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 8, 2025

I will admit that I'm struggling to pick up the high-level gist of where this is going. I don't think I've fully understood ValueQuery, for example: perhaps you can give me a layman's explanation at a high level?

I think there are 2 major things this is trying to do and both of those have a kind of common theme.

Problems:

  • We have values across both crates and within crates via visibility restrictions that are isolated from one another.
    Examples: (cfgrammar, 'YaccKind'), (lrpar, 'RecoveryKind'), and in lrlex there is LexFlags from LRNonStreamingLexer and LexKind from CTLexeBuilder.
  • We have multiple forcing/default behaviors across these, with YaccResolver having it's own for YaccKind and LexFlags having it's own. But none for RecoveryKind, or LexerKind.

The experiment herein kind of takes the approach of using Header which is limited to core types, to pass these values around in a data type which contains only core:: types, to their respective crates where they can be converted into their respective concrete types like RecoveryKind, YaccKind, etc.

It does this by passing the core:: type restricted Header structure to places like YaccParser where it can then produce the YaccKind. Instead of resolving these beforehand in CTParserBuilder and nimbleparse separately then passing YaccKind in after the fact. So that command line options etc work as though they were built from a synthesized %grmtools section.

The forcing/defaulting mechanism then gets built by a notion of merge on this synthesized header and the one that was actually parsed from the %grmtools section. So now that allows us to just automatically get forcing/defaulting for all the all the values that can be put in a header.

That in itself introduces some problems, primarily dealing with non-parsed spanless locations, so part of this is solving problems introduced by the mechanisms herein.

Edit:
ValueQuery is really just a small convenience we could consider it a version of T::try_from(self.contents.get(key))
except we don't know the value of T, just it's core:: type boolean, numeric, looks like a Foo::Unitary or looks like a
Foo::Contstructor(bar). not for a specific Foo or anything, but
Most values just want one like bool, even recoverykind both recoverykind values are Unitary in RecoveryKind::None and RecoveryKind::CPCTPlus, YacKind::Original(_) and YaccKind::Grmtools are the ones where a flag can either be unitary or constructor.

So ValueQuery is really just a way to say, get only bools for "case_insensitive", or only unitary or constructor for YaccKind, or only unitary for RecoveryKind`, and reduce the number of cases that each of these type conversions need to handle.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 8, 2025

Dumb question: will users tend to think of these things as query bits? Is there a higher level query API that might make more sense to them?

Hopefully I've covered the query bits part already by getting rid of them, but there is still the nagging question of "Will users tend to think of these things?"

So far, I've primarily been running under the assumption that users rarely, if ever need to see this part of the API,
it is mostly pub due to the cross-crate nature of the data involved, users probably rarely use it because they mostly put stuff into the header, but rarely take anything out of the header.

I guess I can think of use cases similar to metadata that people might want to put stuff in the header to consume.
There currently isn't a way to leverage this from ctbuilder, which actively complain if a key goes unused.
At some point if we want to allow this, we could add function similar to lrpar_config that takes a callback/closure,

.header_config(|header| {
    // do something with the header.
})

This could be useful, but likely needs some extensions, like a Value::String variant,

%grmtools {
   valid_test_inputs: "../tests/*.input",
   invalid_test_inputs: "../tests/invalid/*.input",
}

then people could write something to the effect of

cpbuilder.header_config(|&mut header| {
     for input in glob(header.query("valid_test_inputs", [ValueQuery::String])) {
         ...
         header.contents_mut().mark_used("valid_test_inputs");
     }
})

But currently lacking that kind of an API the header value is mostly constructed inside the build() method,
And even if the user calls GrmtoolsSectionParser on the source file directly, the builder will complain unless
the field is one which the builder specifically knows about and uses like yacckind.

So, there is potential we could expose it more to users, but it really isn't going to be of much use outside of programs like nimbleparse, nimbleparse_lsp, and the CT*Builder

Edit: it isn't clear what timing to associate with header_config, before parse so Header can setup merge behaviors to perform forcing, or after so values can be read from the grammar, it's likely both would be needed. Perhaps it would be best to consider a pair of functions header_setup(|&mut header| { ... }) and process_header(|&mut header| {...})

But it is only after we decide we want these kinds of extensions, that the query mechanism becomes something users will have much reason to invoke.

Entry::Occupied(mut o) => {
o.mark_used();
let (_, val) = o.get();
self.yacc_kind = Some(YaccKind::try_from(val)?);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two things to say about this:

  1. this uses use crate::markmap::Entry here, because this source is already using hash_map::Entry.
  2. This took some finagling to get the borrow checker to work, because mark_used requires a mutable borrow.
    and we want to run it after get which requires a immutable borrow. Thus we use the Entry API which allows
    us to delay the immutable borrow until we actually need the value, while knowing it exists.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 10, 2025

Right, I get the high-level motivation now: sorry I was a bit dense! So to some extent we can mark this as an unstable, internal API that just so happen to be pub. I can live with that: we can e.g. mark all the relevant parts as #[doc(hidden)] to make this clear to anyone perusing the code.

I guess I can think of use cases similar to metadata that people might want to put stuff in the header to consume.
There currently isn't a way to leverage this from ctbuilder, which actively complain if a key goes unused.
At some point if we want to allow this, we could add function similar to lrpar_config that takes a callback/closure,

This does sound intriguing, in the sense that it would reduce the quantity of "internal but pub" stuff I think?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 10, 2025

Right, I get the high-level motivation now: sorry I was a bit dense! So to some extent we can mark this as an unstable, internal API that just so happen to be pub. I can live with that: we can e.g. mark all the relevant parts as #[doc(hidden)] to make this clear to anyone perusing the code.

I really should have spent time making a table of contents to give the overall big picture too, sorry about that.
I think the only difficulty to making this #[doc(hidden)] is that Header is going to show up as a parameter to documented functions.

I wonder if we can #[doc(hidden)] every function but fn new()?
I'll give that a try in the morning.

I guess I can think of use cases similar to metadata that people might want to put stuff in the header to consume.
There currently isn't a way to leverage this from ctbuilder, which actively complain if a key goes unused.
At some point if we want to allow this, we could add function similar to lrpar_config that takes a callback/closure,

This does sound intriguing, in the sense that it would reduce the quantity of "internal but pub" stuff I think?

I suppose it would, in the sense of perhaps giving people a reason to use it, and building library integrations for things like testing?

I personally am intrigued by it for nimbleparse_lsp (which I have been neglecting 😢) for the purposes of getting rid of the .toml file, it only contains a few necessary pieces of info (optional location of tests as I mentioned above) and the file extension to associate with the parser.

In that sense it is intriguing to me because it could allow me to get rid of the entire nimbleparse.toml file.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 10, 2025

I wonder if we can #[doc(hidden)] every function but fn new()?
I'll give that a try in the morning.

👍

I personally am intrigued by it for nimbleparse_lsp (which I have been neglecting 😢) for the purposes of getting rid of the .toml file, it only contains a few necessary pieces of info (optional location of tests as I mentioned above) and the file extension to associate with the parser.

In that sense it is intriguing to me because it could allow me to get rid of the entire nimbleparse.toml file.

If it's the more general and more-difficult-to-get-wrong API, it does sound like it might be worth giving it a shot!

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 10, 2025

There wasn't a super simple way to document just new, since there is no attribute to negate a hidden structure,
thus I had to remove the top-level docs on struct Header for now, and scatter #[doc(hidden)] about.
It would be nice if we could do something like the following, but afaict there isn't any way currently.

#[doc(hidden)] 
struct Header {...}

impl Header {
  /// ...
  #[doc(hidden = false)]
  #[doc(visible)]
  pub fn new()

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 10, 2025

This is definitely better than nothing! Is the plan to look at the "callback" API in this or another PR?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 10, 2025

I think in another PR, I'd probably want to try and do it along side an example that uses it,
and this PR seems plenty large enough already. I could develop it on-top of this PR but in
another branch though, in case we stumble upon issues on the way?

There is also migration of LexFlags, LexerKind, and RecoveryKind to this API to do either on top of this PR or as separate ones, I had been planning on doing those as separate PRs too but maybe it makes sense to do that work on top of this one, both to make certain we're happy with it, and that it comes out of those without any major changes?

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 10, 2025

I think we can get this one in and iterate further from there. I think this is then ready to squash?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 10, 2025

I think so too, will work on squashing.

ratmice added 4 commits April 10, 2025 12:50
MarkMap is a key, value map with an API similar to the typical
`std::collections` map API. The typical map API is extended in
two ways, a mechanism for marking keys with conditions. Such as
whether they a key is used and required. Additionally it has an
API for merging mark maps, and controlling merge behavior on a
per key basis.
@ratmice ratmice force-pushed the lrpar_section_parser_ctbuilder_take2 branch from 27d79f9 to c5d7b00 Compare April 10, 2025 20:37
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 10, 2025

I had a bit of trouble trying to get a nice commit history,
managed to pull out a few nicely isolated commits,
but the last was still way larger than I'd like.

@ratmice ratmice marked this pull request as ready for review April 10, 2025 20:42
@ratmice ratmice mentioned this pull request Apr 11, 2025
@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 11, 2025

Thanks!

@ltratt ltratt added this pull request to the merge queue Apr 11, 2025
Merged via the queue into softdevteam:master with commit 28f59b5 Apr 11, 2025
1 check passed
@ratmice ratmice deleted the lrpar_section_parser_ctbuilder_take2 branch April 12, 2025 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants