ArnoldC lexer in Ruby
I am trying to write a simple lexer for ArnoldC ( https://github.com/lhartikk/ArnoldC ) in Ruby.
I want the method to be defined as:
LISTEN TO ME VERY CAREFULLY myMethod
TALK TO THE HAND "Hello World!"
HASTA LA VISTA, BABY
I have this code:
class ArnoldLexer
  KEYWORDS = ["LISTEN TO ME VERY CAREFULLY", "TALK TO THE HAND", "HASTA LA VISTA, BABY"]

  def tokenize(code)
    # Clean up the code by removing the trailing line break
    code.chomp!
    # Current character position
    i = 0
    # Collection of all parsed tokens in the form [:TOKEN_TYPE, value]
    tokens = []
    # Implement a very simple scanner:
    # scan one character at a time until there is something to parse.
    while i < code.size
      chunk = code[i..-1]
      # Match standard tokens.
      if identifier = chunk[/\A([A-Z\s\,]*)/, 1]
        # Keywords are special identifiers tagged with their own name;
        # 'if' will result in an [:IF, "if"] token.
        if KEYWORDS.include?(identifier)
          tokens << [identifier.upcase.to_sym, identifier]
          # Skip what was just parsed.
          i += identifier.size
        end
      elsif identifier = chunk[/\A([a-z]*)/, 1]
        tokens << [:IDENTIFIER, identifier]
        i += identifier.size
      # Match class names and constants starting with a capital letter.
      elsif constant = chunk[/\A([A-Z]\w*)/, 1]
        tokens << [:CONSTANT, constant]
        i += constant.size
      elsif newline = chunk[/\A\n/, 1]
        tokens << [:NEWLINE, "\n"]
      elsif number = chunk[/\A([0-9]+)/, 1]
        tokens << [:NUMBER, number.to_i]
        i += number.size
      elsif string = chunk[/\A"(.*?)"/, 1]
        tokens << [:STRING, string]
        i += string.size + 2
      end
    end
    tokens
  end
end
When the above short program is passed to ArnoldLexer.new.tokenize(program), it should output the following tokens:
[[:"LISTEN TO ME VERY CAREFULLY", "LISTEN TO ME VERY CAREFULLY"], [:IDENTIFIER, "myMethod"], [:"TALK TO THE HAND", "TALK TO THE HAND"], [:STRING, "Hello World!"], [:"HASTA LA VISTA, BABY", "HASTA LA VISTA, BABY"]]
However, it only produces tokens when the input is a single uppercase keyword with no parameter. What do I need to change to make it work? Should I use a hash that maps keywords like "LISTEN TO ME VERY CAREFULLY" to "def" when generating tokens?
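For reference, here is one way the scanner could be restructured so that multi-word keywords are checked first with start_with? and the position always advances (this is only a sketch of the idea, not necessarily the intended fix; the class name SimpleArnoldLexer is my own):

```ruby
# Sketch: try each multi-word keyword at the current position before
# falling back to single-token regexes, and always advance i so the
# loop cannot stall on an empty match.
class SimpleArnoldLexer
  KEYWORDS = [
    "LISTEN TO ME VERY CAREFULLY",
    "TALK TO THE HAND",
    "HASTA LA VISTA, BABY"
  ]

  def tokenize(code)
    i = 0
    tokens = []
    while i < code.size
      chunk = code[i..-1]
      if keyword = KEYWORDS.find { |k| chunk.start_with?(k) }
        # Multi-word keywords are matched literally, before anything else.
        tokens << [keyword.to_sym, keyword]
        i += keyword.size
      elsif identifier = chunk[/\A[a-zA-Z]\w*/]
        tokens << [:IDENTIFIER, identifier]
        i += identifier.size
      elsif string = chunk[/\A"(.*?)"/, 1]
        tokens << [:STRING, string]
        i += string.size + 2 # skip the quotes as well
      elsif number = chunk[/\A\d+/]
        tokens << [:NUMBER, number.to_i]
        i += number.size
      else
        i += 1 # skip whitespace, newlines and anything unmatched
      end
    end
    tokens
  end
end
```

The key difference is that the keyword test happens before the identifier regexes, and every branch (including the fallback) moves i forward, so a chunk that matches nothing can no longer leave the loop stuck.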
Thanks.