This repository was archived by the owner on Apr 2, 2026. It is now read-only.

enable using a tokenized input as a Stream input #513

Draft
cosmicexplorer wants to merge 3 commits into zesterer:main from cosmicexplorer:enable-select-ref-tokenization

@cosmicexplorer

Problem

I'm parsing a string into tokens with chumsky, and I would like to also use chumsky to parse those tokens into something else. While select! { ... } is intended to enable this, it assumes that the stream of tokens is produced externally to chumsky, as in the logos example:

chumsky/examples/logos.rs

Lines 130 to 145 in 56762fe

let token_iter = Token::lexer(SRC)
    .spanned()
    // Convert logos errors into tokens. We want parsing to be recoverable and not fail at the lexing stage, so
    // we have a dedicated `Token::Error` variant that represents a token error that was previously encountered
    .map(|(tok, span)| match tok {
        // Turn the `Range<usize>` spans logos gives us into chumsky's `SimpleSpan` via `Into`, because it's easier
        // to work with
        Ok(tok) => (tok, span.into()),
        Err(()) => (Token::Error, span.into()),
    });
// Turn the token iterator into a stream that chumsky can use for things like backtracking
let token_stream = Stream::from_iter(token_iter)
    // Tell chumsky to split the (Token, SimpleSpan) stream into its parts so that it can handle the spans for us
    // This involves giving chumsky an 'end of input' span: we just use a zero-width span at the end of the string
    .spanned((SRC.len()..SRC.len()).into());

Solution

  • Expose .parse_iter() outside of #[cfg(test)] and use it to construct a Stream instance.
  • Expose .stream(input) as a public method of IterParser to generate a stream of transformed input.
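
To make the intended shape concrete, here is a minimal std-only sketch of the two-stage pattern: one stage lazily turns text into spanned tokens, and a second stage consumes that token iterator. It deliberately does not use chumsky; `Token`, `lex`, and `parse_sum` are hypothetical names invented for illustration, and the lexer assumes ASCII input.

```rust
use std::ops::Range;

/// Hypothetical token type for illustration; not part of chumsky.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Plus,
    /// Lexing failures become tokens so the second stage can report them.
    Error,
}

/// Stage 1: lazily lex `src` into `(Token, span)` pairs (ASCII input assumed).
fn lex(src: &str) -> impl Iterator<Item = (Token, Range<usize>)> + '_ {
    let bytes = src.as_bytes();
    let mut pos = 0;
    std::iter::from_fn(move || {
        // Skip whitespace between tokens.
        while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
            pos += 1;
        }
        if pos >= bytes.len() {
            return None;
        }
        let start = pos;
        let tok = if bytes[pos].is_ascii_digit() {
            while pos < bytes.len() && bytes[pos].is_ascii_digit() {
                pos += 1;
            }
            // Only digits were consumed, so this fails only on overflow.
            src[start..pos]
                .parse::<i64>()
                .map(Token::Num)
                .unwrap_or(Token::Error)
        } else if bytes[pos] == b'+' {
            pos += 1;
            Token::Plus
        } else {
            pos += 1;
            Token::Error
        };
        Some((tok, start..pos))
    })
}

/// Stage 2: parse `Num (Plus Num)*` from the token stream and sum the numbers.
/// On failure, returns the span of the offending token, or a zero-width
/// end-of-input span if the stream ended too early.
fn parse_sum(
    mut tokens: impl Iterator<Item = (Token, Range<usize>)>,
) -> Result<i64, Range<usize>> {
    let mut total = 0;
    let mut last_end = 0;
    loop {
        // Expect a number.
        match tokens.next() {
            Some((Token::Num(n), span)) => {
                total += n;
                last_end = span.end;
            }
            Some((_, span)) => return Err(span),
            None => return Err(last_end..last_end),
        }
        // Expect either `+` or the end of input.
        match tokens.next() {
            Some((Token::Plus, span)) => last_end = span.end,
            Some((_, span)) => return Err(span),
            None => return Ok(total),
        }
    }
}
```

Calling `parse_sum(lex("1 + 2 + 39"))` runs both stages without an intermediate `Vec`. The proposal in this PR is for the first stage to be a chumsky parser rather than a hand-written iterator.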

@zesterer
Owner

One thing I worry about is that the API seems to imply that the parser gets turned into a Stream, when in reality it's used to parse elements, which are collected into a vector, and then those elements are used as a Stream. #399 discusses the former use-case and what problems we've run up against when trying to do this.

Did you have an example of the sort of patterns that this enables?

@zesterer
Owner

Edit: It seems I misread the implementation earlier; I see it is turning the parser directly into a stream. As mentioned, #399 discusses some of these issues. In particular, ParseIter currently just swallows parser errors, pretending they don't exist.
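
To illustrate why swallowed errors matter for a streaming design, here is a small std-only sketch contrasting an iterator pipeline that silently stops at the first failure with one that surfaces it. This is not ParseIter's actual implementation, which is not shown in this thread; `parse_items`, `swallowing`, and `surfacing` are hypothetical names.

```rust
/// Hypothetical item-by-item parser: each comma-separated piece either parses
/// as a `u32` or produces an error message.
fn parse_items(src: &str) -> impl Iterator<Item = Result<u32, String>> + '_ {
    src.split(',').map(|s| {
        s.trim()
            .parse::<u32>()
            .map_err(|e| format!("{:?}: {}", s.trim(), e))
    })
}

/// Swallows errors: the stream simply ends at the first failure, so a
/// downstream consumer cannot tell "bad input" apart from "end of input".
fn swallowing(src: &str) -> Vec<u32> {
    parse_items(src).map_while(Result::ok).collect()
}

/// Surfaces errors: the first failure is returned to the caller.
fn surfacing(src: &str) -> Result<Vec<u32>, String> {
    parse_items(src).collect()
}
```

With the input `"1, 2, x, 4"`, `swallowing` yields only the first two items and gives no hint that anything went wrong, while `surfacing` returns an `Err`. A stream built on the first behaviour would hand a truncated token sequence to the next parsing stage.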

@cosmicexplorer
Author

Ah, I see #399 (mentioned directly above parse_iter()) covers exactly this issue, not a different one. I'll see if I can page into that.

@cosmicexplorer marked this pull request as draft on September 1, 2023 23:11
@cosmicexplorer
Author

Converted this into a draft as this is really just the easier part of #399.
