Is there an easy way to tokenize a string without a full-blown lexer?
I'm looking to implement the Shunting-yard algorithm, but I need help figuring out the best way to split the string into tokens.
If you've noticed, the first step in the algorithm is to "read a token". This is not entirely trivial. Tokens can consist of numbers, operators, and parentheses.
If you do something like:
(5 + 1)
A simple string.split() will give me an array of tokens {"(", "5", "+", "1", ")"}.
However, it gets more complicated if you have numbers with multiple digits, for example:
((2048 * 124) + 42)
Now a naive string.split() won't do the trick, because multi-digit numbers get broken apart.
I know I could write a lexer, but is there a way to do this without writing a full-blown one?
I am implementing this in JavaScript and I would like to avoid going the lexer route if possible. I will be using the "*", "+", "-" and "/" operators along with integers.
You can use a global regex match, as described at http://mikesamuel.blogspot.com/2009/05/efficient-parsing-in-javascript.html
Basically, you create one regex that describes a single token
/[0-9]+|false|true|\(|\)/g
and put 'g' at the end to match globally, then you pass it to the input string's match method
var tokens = inputString.match(myRegex);
which returns an array of all the matched tokens.
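For the expressions in the question (integers, the four operators, and parentheses), a minimal sketch of this approach might look like the following. The function name tokenize and the exact pattern are illustrative assumptions, not something taken from the linked article.

```javascript
// Hypothetical helper: splits an arithmetic expression into tokens.
// Assumes the input contains only non-negative integers, + - * /, and parentheses.
function tokenize(input) {
  // \d+ matches a whole multi-digit integer as one token;
  // the character class matches a single operator or parenthesis.
  var tokenPattern = /\d+|[-+*\/()]/g;
  // With the g flag, match() returns every match, or null if there are none.
  return input.match(tokenPattern) || [];
}

var tokens = tokenize("((2048 * 124) + 42)");
// tokens is ["(", "(", "2048", "*", "124", ")", "+", "42", ")"]
```

Note that this silently skips anything the pattern does not describe (whitespace, but also stray letters or symbols), so you may want to validate the input separately if that matters for your parser.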