Python - re.split: extra empty strings at the start and end

I am trying to take a string of ints and/or floats and turn it into a list of floats. The string contains square brackets, which should be ignored. I am using re.split, but if my string starts and ends with a bracket, I get extra empty strings at the start and end of the result. Why is this?

Code:

import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))


Output:

['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']



5 answers


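If you use re.split, then a delimiter at the beginning or end of the string produces an empty string at the beginning or end of the resulting list. re.split returns the pieces of text between matches, and the piece before a match at position 0 (or after a match that reaches the end of the string) is empty.

If you don't want this, use re.findall with a regex that matches the runs of non-delimiter characters instead.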

Example:

import re

a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))




Output:

['', '1', '2', '3', '4', '']
['1', '2', '3', '4']


As other answers point out, this may not be the best solution to this particular problem, but it is a general answer to the problem described in the question title, which I also had to solve when I found this question through Google.
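To see where the empty strings come from, here is a minimal sketch that rebuilds re.split() from re.finditer(). The helper split_by_hand is hypothetical (it is not part of the re module) and assumes a pattern with no capturing groups that cannot match the empty string, since re.split would also return captured groups.

```python
import re

def split_by_hand(pattern, text):
    """Hypothetical helper: mimic re.split() for a group-free pattern.

    The pieces are the text before the first match, the text between
    consecutive matches, and the text after the last match. A match that
    touches the start or end of the string leaves an empty outer piece.
    """
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[last_end:m.start()])  # text up to this delimiter
        last_end = m.end()
    pieces.append(text[last_end:])  # text after the final delimiter
    return pieces

a = '[1 2 3 4]'
print(split_by_hand(r'[^\d]+', a))                            # ['', '1', '2', '3', '4', '']
print(split_by_hand(r'[^\d]+', a) == re.split(r'[^\d]+', a))  # True
```

The '[' at index 0 is the first match, so the piece before it is text[0:0], which is ''. The same happens at the end with ']'.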



For a more Pythonic approach, you can use a list comprehension with str.isdigit() to keep only the digit characters (this works here only because every number is a single digit):

>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']
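A quick check with a hypothetical input containing a multi-digit number and a decimal: iterating over a string visits single characters, so the comprehension breaks numbers apart. The token-level alternative below (replacing brackets with spaces, then calling str.split() with no arguments) is a swapped-in technique, not part of the answer above; it relies on str.split() splitting on runs of whitespace and never returning empty strings.

```python
z = "[10 2.5 3][4 5]"

# Character by character: '10' becomes '1', '0' and the '.' of 2.5 is dropped.
print([c for c in z if c.isdigit()])  # ['1', '0', '2', '5', '3', '4', '5']

# Token level: turn the brackets into spaces, then split on whitespace runs.
# str.split() with no arguments ignores leading/trailing whitespace.
tokens = z.replace('[', ' ').replace(']', ' ').split()
print(tokens)  # ['10', '2.5', '3', '4', '5']
```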


As for your code: first, you need to split on spaces or brackets, which can be done with [\[\] ]+. Then, to get rid of the empty strings caused by the leading and trailing brackets, you can strip the brackets from the ends of your string first:



>>> y =  "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y =  "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']
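Putting the strip-then-split idea together with the float conversion the question asks for, a minimal sketch could look like this. parse_floats is a hypothetical helper; it assumes the only separators are square brackets and spaces (tabs or newlines would need to be added to the character class).

```python
import re

def parse_floats(line):
    """Hypothetical helper: bracketed numbers -> list of floats.

    Stripping brackets and spaces from both ends first means re.split()
    sees no leading or trailing delimiter, so it yields no empty strings.
    """
    stripped = line.strip('[] ')
    if not stripped:
        # re.split() on '' returns [''], which float() would reject.
        return []
    return [float(tok) for tok in re.split(r'[\[\] ]+', stripped)]

print(parse_floats("[1 2 3 4][2 3 4 5]"))  # [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
print(parse_floats("1 2.5 3][4 5"))        # [1.0, 2.5, 3.0, 4.0, 5.0]
```

Because '.' is not in the delimiter class, decimals such as 2.5 survive the split unchanged.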


You can also pass the result to filter() with bool as the predicate, which drops the empty strings:

>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']
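One detail worth knowing if you run this on Python 3: filter() returns a lazy, single-use iterator instead of a list (Python 2's filter returned a list for list input). A short sketch, with a list comprehension as an equivalent that always produces a list:

```python
import re

y = "[1 2 3 4][2 3 4 5]"
parts = re.split(r'[\[\] ]+', y)

lazy = filter(bool, parts)
print(isinstance(lazy, list))  # False on Python 3
print(list(lazy))              # ['1', '2', '3', '4', '2', '3', '4', '5']
print(list(lazy))              # [] because the iterator is already exhausted

# Empty strings are falsy, so the condition skips them.
print([p for p in parts if p])  # ['1', '2', '3', '4', '2', '3', '4', '5']
```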



import re
s = "[1 2 3 4][2 3 4 5]"  # s rather than str, to avoid shadowing the built-in
print(re.findall(r'\d+', s))
s = "1 2 3 4][2 3 4 5"
print(re.findall(r'\d+', s))
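\d+ only matches runs of digits, so it cannot keep the floats the question mentions. A sketch with an assumed number format (unsigned integers or decimals such as 10 or 10.75; signs, leading-dot decimals like .5 and exponents are not handled) on a hypothetical input:

```python
import re

s = "[1 2.5 3][4 10.75]"

# \d+ treats the decimal point as a separator, so 2.5 becomes two tokens.
print(re.findall(r'\d+', s))  # ['1', '2', '5', '3', '4', '10', '75']

# Optional fractional part; (?:...) is non-capturing, so findall still
# returns the whole match for each number.
FLOAT_RE = re.compile(r'\d+(?:\.\d+)?')
print(FLOAT_RE.findall(s))                      # ['1', '2.5', '3', '4', '10.75']
print([float(t) for t in FLOAT_RE.findall(s)])  # [1.0, 2.5, 3.0, 4.0, 10.75]
```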



You can use a regex to grab the content you want instead of splitting the string. You can use this regex:

(\d+)


Python code:

import re
p = re.compile(r'(\d+)')
test_str = "[1 2 3 4][2 3 4 5]"

print(p.findall(test_str))  # ['1', '2', '3', '4', '2', '3', '4', '5']
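A side note on the parentheses in (\d+): re.findall returns whole matches when the pattern has no groups, the group's text when it has exactly one, and a tuple of groups per match when it has several. A short check on a hypothetical input with a decimal:

```python
import re

s = "[1 2.5 3]"

# Exactly one capturing group: findall returns that group's text, which
# here equals the whole match, so the parentheses are optional.
print(re.findall(r'(\d+)', s) == re.findall(r'\d+', s))  # True

# Two capturing groups: one tuple per match, with '' for an optional
# group that did not take part in the match.
print(re.findall(r'(\d+)(\.\d+)?', s))  # [('1', ''), ('2', '.5'), ('3', '')]
```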



You can simply use filter() with None as the predicate to drop the empty strings:

import re
x = "[1 2 3 4][2 3 4 5]"

print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
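Since the question's pattern [^\d.]+ treats everything except digits and '.' as a delimiter, decimals stay intact and the filtered tokens can go straight to float(). A sketch on a hypothetical input; note that a stray '.' (as in "1 . 2") would survive as its own token and float('.') raises ValueError.

```python
import re

x = "[1 2.5 3][4 10.75]"

# filter(None, ...) drops the '' produced by the leading '[' and trailing ']'.
tokens = filter(None, re.split(r'[^\d.]+', x))
values = [float(t) for t in tokens]
print(values)  # [1.0, 2.5, 3.0, 4.0, 10.75]
```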
