ArnoldC lexer in Ruby

I am trying to write a simple lexer for ArnoldC ( https://github.com/lhartikk/ArnoldC ) in Ruby.

I want to be able to tokenize a method definition such as:

LISTEN TO ME VERY CAREFULLY myMethod TALK TO THE HAND "Hello World!" HASTA LA VISTA, BABY

I have this code:

class ArnoldLexer

KEYWORDS = ["LISTEN TO ME VERY CAREFULLY", "TALK TO THE HAND", "HASTA LA VISTA, BABY"]

def tokenize(code)
    # Cleanup code by removing extra line breaks
    code.chomp!

    # Current character position
    i = 0

    # Collection of all parsed tokens in the form [:TOKEN_TYPE, value]
    tokens = []

    # Implement a very simple scanner.
    # Scan one character at a time until there is something to parse.
    while i < code.size
        chunk = code[i..-1]

        # Matching standard tokens.
        if identifier = chunk[/\A([A-Z\s\,]*)/, 1]

            # Keywords are special identifiers tagged with their own name, 
            # 'if' will result in an [:IF, "if"] token.
            if KEYWORDS.include?(identifier)
                tokens << [identifier.upcase.to_sym, identifier]

            # Skip what was just parsed.
            i += identifier.size
            end

        elsif identifier = chunk[/\A([a-z]*)/, 1]
            tokens << [:IDENTIFIER, identifier]
            i += identifier.size

            # Matching class names and constants starting with a capital letter.
        elsif constant = chunk[/\A([A-Z]\w*)/, 1]
            tokens << [:CONSTANT, constant]
            i += constant.size

        elsif newline = chunk[/\A\n/, 1]
            tokens << [:NEWLINE, "\n"]
        elsif number = chunk[/\A([0-9]+)/, 1]
            tokens << [:NUMBER, number.to_i]
            i += number.size

        elsif string = chunk[/\A"(.*?)"/, 1]
            tokens << [:STRING, string]
            i += string.size + 2
        end
    end
    tokens
end

      

end

When the short program above is passed to ArnoldLexer.new.tokenize(program), it should output the following tokens:

[[:"LISTEN TO ME VERY CAREFULLY", "LISTEN TO ME VERY CAREFULLY"], [:IDENTIFIER, "myMethod"], [:"TALK TO THE HAND", "TALK TO THE HAND"], [:STRING, "Hello World!"], [:"HASTA LA VISTA, BABY", "HASTA LA VISTA, BABY"]]

However, it only produces tokens when the input is a single uppercase keyword with no parameter. What do I need to change to make it work? Should I use a hash that maps keywords such as "LISTEN TO ME VERY CAREFULLY" to "def" in order to generate the tokens?

Thanks.
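One likely cause, for reference: the pattern /\A([A-Z\s\,]*)/ uses *, so it also matches the empty string, and an empty string is truthy in Ruby. The first branch is therefore always taken, and because i only advances inside the KEYWORDS.include? check, any chunk that is not exactly a keyword (for example "LISTEN TO ME VERY CAREFULLY " captured with its trailing space) leaves i unchanged. Below is a minimal sketch of one way around this, not a definitive implementation. It assumes Ruby's standard StringScanner, skips spaces and tabs between tokens, and tries the keyword phrases before anything else. The KEYWORD_PATTERN constant and the error raised on unknown input are illustrative additions that are not in the code above.

```ruby
require "strscan"

class ArnoldLexer
  KEYWORDS = ["LISTEN TO ME VERY CAREFULLY", "TALK TO THE HAND", "HASTA LA VISTA, BABY"].freeze

  # Hypothetical helper: one regex for all keyword phrases, longest first,
  # followed by a check that the phrase is not the prefix of a longer word.
  KEYWORD_PATTERN = /#{Regexp.union(KEYWORDS.sort_by { |k| -k.size })}(?!\w)/

  def tokenize(code)
    # chomp (non-mutating) drops one trailing line break without touching the caller's string
    scanner = StringScanner.new(code.chomp)
    tokens = []

    until scanner.eos?
      if scanner.scan(/[ \t]+/)
        # Spaces and tabs only separate tokens; nothing is emitted for them.
      elsif scanner.scan(/\n/)
        tokens << [:NEWLINE, "\n"]
      elsif (keyword = scanner.scan(KEYWORD_PATTERN))
        # Keywords are tagged with their own text as the token type.
        tokens << [keyword.to_sym, keyword]
      elsif (number = scanner.scan(/[0-9]+/))
        tokens << [:NUMBER, number.to_i]
      elsif scanner.scan(/"(.*?)"/)
        # scanner[1] is the text between the quotes
        tokens << [:STRING, scanner[1]]
      elsif (identifier = scanner.scan(/[a-z]\w*/))
        tokens << [:IDENTIFIER, identifier]
      elsif (constant = scanner.scan(/[A-Z]\w*/))
        tokens << [:CONSTANT, constant]
      else
        # Fail loudly instead of looping forever on input no rule can consume.
        raise ArgumentError,
              "Unexpected character #{scanner.peek(1).inspect} at position #{scanner.pos}"
      end
    end

    tokens
  end
end

program = 'LISTEN TO ME VERY CAREFULLY myMethod TALK TO THE HAND "Hello World!" HASTA LA VISTA, BABY'
p ArnoldLexer.new.tokenize(program)
```

Every branch either consumes at least one character or raises, so the loop cannot stall. A hash mapping each phrase to a shorter symbol such as :DEF would also work, but it is not required to get the tokens listed above.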
