Tokenizing a string into a list of nested arrays with Python
Following this document, I am writing a Brainfuck interpreter. My implementation needs to turn a string such as:
',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
into a list of instructions like this:
[',', '>', ',', '<', ['>', ['-', '>', '+', '>', '+', '<', '<'], '>', '>', ['-', '<', '<', '+', '>', '>'], '<', '<', '<', '-'], '>', '>', '.']
or, leaving out the symbols:
[ ... [...] ... [...] ... ]
At the moment I'm solving this recursively, using a deque and popleft() to walk over the string one character at a time, but it seems to me that I should be able to split it into subarrays directly.
How would you solve this problem in a Pythonic way?
(I would rather avoid regex, for speed reasons.)
This is not really a "Pythonic way", but I found a solution to the problem using recursion and generators:
s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'

def brainfuck2list(brainfuck):
    while brainfuck:  # stop once the list is empty
        e = brainfuck.pop(0)
        if e not in ("[", "]"):
            yield e
        elif e == "[":
            # recurse: the nested generator consumes up to the matching "]"
            yield list(brainfuck2list(brainfuck))
        else:
            break

list(brainfuck2list(list(s)))
You get the following output:
[',', '>', ',', '<', ['>', ['-', '>', '+', '>', '+', '<', '<'], '>', '>', ['-', '<', '<', '+', '>', '>'], '<', '<', '<', '-'], '>', '>', '.']
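For comparison, here is a minimal sketch of a non-recursive variant that keeps an explicit stack of the lists currently being filled, so it reads the string once and never calls pop(0). The name parse_brainfuck and the choice to raise ValueError on unbalanced brackets are assumptions made for this illustration, not part of the answer above.

```python
def parse_brainfuck(source):
    """Turn Brainfuck source into nested lists, one sub-list per [...] loop."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for ch in source:
        if ch == "[":
            loop = []
            stack[-1].append(loop)  # attach the new loop to its parent
            stack.append(loop)      # and start filling it
        elif ch == "]":
            if len(stack) == 1:
                raise ValueError("unmatched ']'")
            stack.pop()             # go back to the enclosing list
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unmatched '['")
    return root


s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
print(parse_brainfuck(s))
```

Because every character is visited exactly once and appended in place, this should stay linear in the length of the program, whereas pop(0) on a list shifts all remaining elements on each call.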
For the curious, here is my working solution using recursion:
from collections import deque

def tokenize(code):
    # code is a deque of characters; it is consumed while being tokenized
    instructions = deque()
    while len(code) > 0:
        if code[0] == "[":
            code.popleft()  # drop the opening bracket
            group = deque()
            r = 0  # nesting depth inside this group
            while r > -1 and len(code) > 0:
                if code[0] == '[':
                    group.append(code.popleft())
                    r += 1
                elif code[0] == ']':
                    if r == 0:
                        code.popleft()  # matching closing bracket: discard it
                    else:
                        group.append(code.popleft())
                    r -= 1
                else:
                    group.append(code.popleft())
            instructions.append(tokenize(group))
        else:
            instructions.append(code.popleft())
    return instructions
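Note that this function returns nested deque objects rather than the nested lists shown in the question. If plain lists are needed, a small recursive helper can convert the result. The sketch below uses a hypothetical helper name, to_lists, and a hand-built deque so that it runs on its own.

```python
from collections import deque


def to_lists(tokens):
    """Recursively convert nested deques into plain nested lists."""
    return [to_lists(t) if isinstance(t, deque) else t for t in tokens]


# A small hand-built example of the nested-deque shape described above.
nested = deque([',', deque(['-', deque(['+'])]), '.'])
print(to_lists(nested))
```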
I'm almost there:
s=',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
x = eval('["' + s.replace('[','",["').replace(']','"],"') + '"]')
This gives:
[',>,<', ['>', ['->+>+<<'], '>>', ['-<<+>>'], '<<<-'], '>>.']
This is not exactly what you wanted, but you can iterate over the strings character by character as well.
Use ast.literal_eval instead if you are concerned about the safety of eval.
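As a rough sketch of both suggestions, the snippet below builds the same literal, parses it with ast.literal_eval instead of eval, and then expands each string chunk into single-character entries. The helper name expand is made up for this example, and the approach assumes the input contains only Brainfuck commands (a double quote or backslash in the source would break the generated literal).

```python
import ast

s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'

# Same string rewriting as above, but parsed without executing arbitrary code.
literal = '["' + s.replace('[', '",["').replace(']', '"],"') + '"]'
chunks = ast.literal_eval(literal)


def expand(node):
    """Split every string chunk into one list entry per character."""
    result = []
    for item in node:
        if isinstance(item, list):
            result.append(expand(item))
        else:
            result.extend(item)  # a string contributes one entry per character
    return result


tokens = expand(chunks)
print(tokens)
```

Empty strings, which the rewriting produces between adjacent brackets, simply contribute nothing when expanded.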
Update: with a regex, I got the complete result:
import re
s=',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
everything = re.compile(r"([^\]\[])")  # raw string, so \] and \[ reach the regex engine unchanged
y = eval('[' + everything.sub(r'"\1",',s).replace(",]","]").replace(']"','],"') + ']')
It will be:
[',', '>', ',', '<', ['>', ['-', '>', '+', '>', '+', '<', '<'], '>', '>', ['-', '<', '<', '+', '>', '>'], '<', '<', '<', '-'], '>', '>', '.']
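One caveat: the chained replace calls assume that every ']' is followed by a command or by another ']'. Two adjacent loops such as '[+][-]' end up as ["+"]["-"], which Python reads as indexing a list with a string. The sketch below is one possible variant that avoids this by putting a comma after every closing bracket (Python accepts trailing commas in list literals) and uses ast.literal_eval rather than eval. The function name tokenize_with_regex is an assumption for this example.

```python
import ast
import re


def tokenize_with_regex(source):
    """Build a Python list literal from Brainfuck source and parse it safely."""
    # Quote every non-bracket character with repr() and follow it with a comma.
    quoted = re.sub(r"[^\[\]]", lambda m: repr(m.group()) + ",", source)
    # Put a comma after every closing bracket as well; trailing commas are
    # legal in list literals, so no further cleanup is needed.
    literal = "[" + quoted.replace("]", "],") + "]"
    return ast.literal_eval(literal)


s = ',>,<[>[->+>+<<]>>[-<<+>>]<<<-]>>.'
print(tokenize_with_regex(s))
print(tokenize_with_regex('[+][-]'))
```

Using repr() for the quoting also keeps characters such as quotes or backslashes from breaking the generated literal, should the source contain comments.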