Implement packrat parsing#75
Conversation
|
hey @poteat - thanks for putting this together so quickly! I'm on a little family retreat for a few days, so I'll take a proper look when I get back next week 👍 |
|
@francisrstokes I'd love to pick this back up at some point! |
|
@francisrstokes and @poteat , this pull has been pending for a while. Can this be merged with the base? |
|
My apologies for not getting back to this PR in a more timely manner (to put it lightly!). I would also love to see the performance benefits provided by this approach make their way into the repo. I do have concerns about the cache though - exactly what you have mentioned @poteat - that it essentially disallows concurrent usage of the library. To solve this, we need a mechanism where we can have multiple, contextual caches, or at a minimum, the ability to opt-in and out of caching. Instead of caching at construction time, we might want to have this process take place at |
|
I fear I do not understand the idea well enough to have an opinion on a user provided cache object. |
Packrat Parsing Implementation
As discussed in #74, this pull request implements packrat parsing using a internally specified set of memoization functions in
./src/cache.ts.1. Checklist
2. Discussion
Packrat parsing [1] is a technique for enabling linear-time backtracking in top-down parsers. A common issue with recursive descent parsing is exponential time blowup, whereby the time it takes to parse a string scales exponentially with respect to input length.
Our implementation here using a memoization wrapper around the incremental state update function for each parser. The full, global cache state is cleared on every run.
[1] Packrat Parsing: a Practical Linear-Time Algorithm with Backtracking (PDF). Bryan Ford, 2002, MIT
2.1. Race Condition Risk
If two instances of
runare running in parallel, this could lead to a poisoned cache, and incorrect parse results. Since the parser does not operate on the basis of Promises, it is unclear to me whether this is an important risk.2.2. Incremental Parsing
This implementation resets the global cache on every
runandforkcall. This is necessary to avoid errors in parsing - because the cache is parameterized only onstate, and nottarget, the cache can become poisoned unless it is refreshed on every individual run.On our fork, I think we will attempt to parameterize the state update cache with
target, so that the cache may persist across multiple runs, thereby implementing efficient incremental parsing. A configurable LRU cache will be utilized to prevent arbitrary memory usage.I have avoided implementing this more advanced form in this pull request, as it is a lot more experimental, opinionated, and risky. As yet, I am not convinced it's even possible to do easily - if
targetalways corresponds to do full input, then parameterizing the cache ontargetwon't implement incremental parsing.3. Suggested release notes
Versioning is left up to maintainer discretion, but for convenience here is my suggested changelog.
4. Questions to resolve:
cache.ts? I did not want to unnecessarily bloat the pull request, but if we want them I can write some.