WIP: Lax parser #8

Open: kasbah wants to merge 2 commits into antlr from lax-parser

kasbah (Member) commented Jan 23, 2018

No description provided.

kasbah changed the title from "Lax parser" to "WIP: Lax parser" on Jan 23, 2018
kasbah force-pushed the lax-parser branch 2 times, most recently from fb67c09 to edfa743, on January 23, 2018 20:00
kasbah (Member, Author) commented Jan 23, 2018

So I found out I can create an UNKNOWN lexer rule that should match everything that hasn't been previously defined.

ignored: UNKNOWN* EOF;
 
UNKNOWN: .+?;

And that kind of works:

$ node bin/electro-grammar.js "akjdkadj 10 asdjkdj  ohm xaksjdkjd"
line 1:0 mismatched input 'a' expecting NUMBER
line 1:9 extraneous input '10' expecting {<EOF>, UNKNOWN}
line 1:21 extraneous input 'ohm' expecting {<EOF>, UNKNOWN}
{ component: { resistance: 10, type: 'resistor' },
  ignored: 'akjdkadj  asdjkdj   xaksjdkjd' }

For the time being I simplified the grammar to handle only resistors, while I try to figure out why it ignores the k in 10k.

$ node bin/electro-grammar.js "10k xaksjdkjd"
line 1:2 mismatched input 'k' expecting {OHM, RPREFIX}
line 1:0 extraneous input '10' expecting {<EOF>, UNKNOWN}
{ component: { resistance: 10, type: 'resistor' },
  ignored: 'k xaksjdkjd' }

kasbah (Member, Author) commented Jan 23, 2018

We probably should take advantage of mode(M):

mode (M)
After matching this token, switch the lexer to mode M . The next
time the lexer tries to match a token, it will look only at rules in mode M .
M can be a mode name from the same grammar or an integer literal. See
grammar Strings earlier.
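
To make the quoted behaviour concrete, here is a minimal Python sketch of a modal lexer. It is only a simulation of the idea, not ANTLR code or its API: the mode names DEFAULT and UNIT, the regex patterns, the WS rule and the helper `tokenize_modal` are all assumptions made up for illustration. The point is that after a NUMBER the lexer looks only at the rules of the UNIT mode, so a `k` there can become an RPREFIX, while a `k` elsewhere stays UNKNOWN.

```python
import re

# Hypothetical modes: mode name -> list of (token name, regex, mode to switch to or None).
# Token names echo the thread; patterns and mode names are assumptions.
MODES = {
    "DEFAULT": [
        ("NUMBER", r"[0-9]+", "UNIT"),      # after a number, look for a unit
        ("WS", r"\s+", None),
        ("UNKNOWN", r".", None),            # catch-all, one character at a time
    ],
    "UNIT": [
        ("RPREFIX", r"[kKmM]", "DEFAULT"),
        ("OHM", r"ohm", "DEFAULT"),
        ("WS", r"\s+", None),
        ("UNKNOWN", r".", "DEFAULT"),       # anything else ends the unit context
    ],
}


def tokenize_modal(text):
    """Tokenize text, consulting only the rules of the current mode."""
    mode, pos, tokens = "DEFAULT", 0, []
    while pos < len(text):
        best = None  # (name, lexeme, next_mode)
        for name, pattern, next_mode in MODES[mode]:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), next_mode)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, lexeme, next_mode = best
        if name != "WS":                    # skip whitespace tokens
            tokens.append((name, lexeme))
        if next_mode is not None:
            mode = next_mode
        pos += len(lexeme)
    return tokens


print(tokenize_modal("k 10k"))
print(tokenize_modal("10 ohm"))
```

In this sketch the first `k` of `"k 10k"` is lexed as UNKNOWN and the second as RPREFIX, purely because of the mode the lexer is in when it reaches each one.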

dvc94ch (Collaborator) commented Jan 23, 2018

That looks like a lexer ambiguity. Try this on the Java backend: echo "1k abc" | grun ElectroGrammar resistor -diagnostics -tree -tokens to find out which token k is lexed as (since it's not RPREFIX).

kasbah (Member, Author) commented Jan 23, 2018

Hmm, it seems to match everything to UNKNOWN so it looks like I was mistaken about how this would work. I wonder how come it kinda half works at all.

kasbah (Member, Author) commented Jan 24, 2018

Stack Overflow answers (1 and 2) seem to suggest that the last lexer rule will have the lowest priority. But that's not what I am seeing.

EDIT: Seems the imports re-order things or otherwise mess up the priority. :/ Tracking here: antlr/antlr4#2209
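
The priority behaviour discussed here can be sketched in a few lines of Python. This is a simulation under stated assumptions, not ANTLR itself: ANTLR 4 lexers prefer the longest match and break ties in favour of the rule defined first, and a non-greedy `.+?` at the end of a lexer rule should, as far as I understand, consume a single character. The regex patterns, the WS rule and the helper `tokenize` are hypothetical; only the token names come from the error messages above.

```python
import re

def tokenize(text, rules):
    """Pick tokens the way an ANTLR-style lexer would:
    longest match wins, ties go to the rule listed first."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, name, lexeme)
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer matches replace the current best, so on a
            # tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        length, name, lexeme = best
        if name != "WS":  # skip whitespace tokens
            tokens.append((name, lexeme))
        pos += length
    return tokens


# Assumed patterns, for illustration only.
NUMBER = ("NUMBER", r"[0-9]+")
RPREFIX = ("RPREFIX", r"[kKmM]")
OHM = ("OHM", r"ohm")
WS = ("WS", r"\s+")
UNKNOWN = ("UNKNOWN", r".")  # stands in for `UNKNOWN: .+?;`

catch_all_last = [NUMBER, RPREFIX, OHM, WS, UNKNOWN]
catch_all_first = [UNKNOWN, NUMBER, RPREFIX, OHM, WS]

print(tokenize("10k", catch_all_last))
print(tokenize("10k", catch_all_first))
```

With the catch-all last, `k` ties between RPREFIX and UNKNOWN and RPREFIX wins. With the catch-all ahead of the other rules, which is what a re-ordering by imports could amount to, `10` is still a NUMBER because it is the longer match, but `k` becomes UNKNOWN. That lines up with the `mismatched input 'k' expecting {OHM, RPREFIX}` error earlier in the thread.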

dvc94ch closed this on Jan 28, 2018
kasbah reopened this on Jan 28, 2018