Tokenizing a string into a list of nested arrays with Python

Following this document, I am writing a Brainfuck interpreter. My implementation involves converting a string such as:

',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'

      

into a list of instructions like this:

[',', '>', ',', '<', ['>', ['-', '>', '+', '>', '+', '<', '<'], '>', '>', ['-', '<', '<', '+', '>', '>'], '<', '<', '<', '-'], '>', '>', '.']

      

or, leaving out the symbols:

[ ... [...] ... [...] ... ]

      

At the moment I'm solving this recursively, using a deque and popleft() to iterate over the string one character at a time, but it seems to me that I should be splitting it into subarrays right away.

How would you solve this problem in a Pythonic way?

(I am staying away from regex for speed reasons.)

+3




3 answers


This is not really a "Pythonic way", but I found a solution to the problem using recursion and generators:

s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'

def brainfuck2list(brainfuck):
  while brainfuck:               # stop when the list is empty
    e = brainfuck.pop(0)         # consume the next character
    if e not in ("[","]"):
      yield e
    elif e == "[":
      yield list(brainfuck2list(brainfuck))   # nested loop body; shares the same list
    else:
      break                      # "]" closes the current loop

list(brainfuck2list(list(s)))

      



You get the following output:

[
  ',', '>', ',', '<', 
  [
    '>', 
    [
      '-', '>', '+', '>', '+', '<', '<'
    ]
    , '>', '>', 
    [
      '-','<', '<', '+', '>', '>'
    ], 
    '<', '<', '<', '-'
  ]
  , '>', '>', '.'
]
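The same recursive idea can also be written over an iterator instead of a list, which avoids the cost of `pop(0)` (each call shifts all remaining elements). This is a minimal sketch and not the code from the answer above; the name `nest` is made up for illustration.

```python
def nest(chars):
    """Build a nested list from an iterator of Brainfuck characters."""
    result = []
    for ch in chars:
        if ch == "[":
            # Recurse on the same iterator; it keeps advancing inside the call.
            result.append(nest(chars))
        elif ch == "]":
            # End of the current loop body: hand it back to the caller.
            return result
        else:
            result.append(ch)
    return result


s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
tokens = nest(iter(s))
```

Passing `iter(s)` matters here: every recursion level must consume from one shared iterator, so a plain string would not work.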

      

+1




For the curious, here is my working solution using recursion:



from collections import deque

def tokenize(code):
  # code is a deque of characters; it is consumed while being tokenized
  instructions = deque()

  while len(code) > 0:
    if code[0] == "[":
      code.popleft()

      group = deque()
      r = 0  # nesting depth relative to the loop just opened

      while r > -1 and len(code) > 0:
        if code[0] == '[':
          group.append(code.popleft())
          r += 1

        elif code[0] == ']':
          if r == 0:
            code.popleft()  # closing bracket of this loop: drop it

          else:
            group.append(code.popleft())

          r -= 1

        else:
          group.append(code.popleft())

      # the collected loop body is tokenized recursively
      instructions.append(tokenize(group))

    else:
      instructions.append(code.popleft())

  return instructions
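For comparison, a non-recursive variant can keep an explicit stack of the lists currently being filled and walk the string once. This is only a sketch, under the assumption that unbalanced brackets should be rejected with an error; `tokenize_with_stack` is a hypothetical name and not part of the solution above.

```python
def tokenize_with_stack(code):
    """Turn a Brainfuck string into nested lists in a single pass."""
    root = []
    stack = [root]              # stack[-1] is the list currently being filled
    for ch in code:
        if ch == "[":
            group = []
            stack[-1].append(group)   # attach the new loop body to its parent
            stack.append(group)       # and descend into it
        elif ch == "]":
            if len(stack) == 1:
                raise ValueError("unmatched ']'")
            stack.pop()               # climb back to the enclosing list
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unmatched '['")
    return root


s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
instructions = tokenize_with_stack(s)
```

Each character is handled exactly once, and deep nesting cannot hit the recursion limit.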

      

+1




I'm almost there:

s=',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'

x = eval('["' + s.replace('[','",["').replace(']','"],"') + '"]')

      

This gives:

[',>,<', ['>', ['->+>+<<'], '>>', ['-<<+>>'], '<<<-'], '>>.']

      

This is not exactly what you wanted, but you can iterate over strings as well.

Use ast.literal_eval if you are concerned about the safety of eval.
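As a sketch of the `ast.literal_eval` suggestion, the same string replacement can be fed to `literal_eval`, which only accepts Python literals and does not execute arbitrary code. It assumes the program text contains no double quotes or backslashes, since those would break the generated literal; the variable name `literal` is made up for illustration.

```python
import ast

s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'

# Turn each "[" and "]" into list syntax, leaving runs of commands as strings.
literal = '["' + s.replace('[', '",["').replace(']', '"],"') + '"]'

# literal_eval parses the text as a Python literal without running code.
x = ast.literal_eval(literal)
```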

Update: With a regex, I got it to work completely:

import re
s=',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
everything = re.compile(r"([^\]\[])")
y = eval('[' + everything.sub(r'"\1",',s).replace(",]","]").replace(']"','],"') + ']')

      

This gives:

[',', '>', ',', '<', ['>', ['-', '>', '+', '>', '+', '<', '<'], '>', '>', ['-', '<', '<', '+', '>', '>'], '<', '<', '<', '-'], '>', '>', '.']

      

0

