Saturday, 1 June 2013

Is this a viable approach to resolving multiple matches in a lexer?

Is this a viable approach to resolving multiple matches in a lexer?

I'm writing a lexer in JavaScript. It's pretty typical - rules are specified with regular expressions and produce a token.
I am unsure of the best way to handle when multiple rules are matched. The existing lexers I've looked at handle this by picking the rule with the longest match.
It also seems like a viable strategy, however, to simply use the first rule that matches. This is my current strategy.
Here is an example:
Input:
:=
Rules:
: -> COLON
= -> EQUALS
:= -> ASSIGN
The longest-match rule would return ASSIGN, where the first-match rule would return COLON and then EQUALS.
Obviously this is not desirable, so under my implementation I would just reorder the rules to put the ASSIGN rule first as follows:
:= -> ASSIGN
: -> COLON
= -> EQUALS
Is this a viable approach to just use the first matching rule? What are the advantages and disadvantages of each approach?

No comments:

Post a Comment