Python - re.split: extra empty strings at the start and end
I am trying to take a string of ints and/or floats and create a list of floats. The string contains square brackets, which should be ignored. I am using re.split, but if my string starts and ends with a bracket, I get extra empty strings at the start and end of the result. Why is this?
code:
import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))
Output:
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']
If you use re.split, then a delimiter at the beginning or end of the string produces an empty string at the beginning or end of the resulting list.
If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.
Example:
import re
a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))
Output:
['', '1', '2', '3', '4', '']
['1', '2', '3', '4']
As others have pointed out in the answers, this may not be a perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.
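Since the question ultimately wants floats, the findall approach can be extended roughly as follows. This is only a sketch: the pattern and the name parse_floats are my own assumptions, not from the question, and it only handles plain unsigned numbers such as 3 or 2.5.

```python
import re

# Assumed pattern: an unsigned integer or decimal such as "3" or "2.5".
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_floats(text):
    """Return every number found in text as a float, ignoring brackets and spaces."""
    return [float(tok) for tok in NUMBER.findall(text)]

print(parse_floats("[1 2 3 4][2 3 4 5]"))
print(parse_floats("[1.5 2][3 4.25]"))
```

Because findall returns only the matches, there are no empty strings to clean up afterwards.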
For a more Pythonic way, you can simply use a list comprehension with a str.isdigit() check on each character (this works here because every number is a single digit):
>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']
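For multi-digit numbers or floats, a character-by-character check splits the digits apart. A possible regex-free alternative, sketched here with a made-up input string, is to turn the brackets into spaces and call str.split() with no arguments, which never yields empty strings:

```python
s = "[12 3.5][4 5]"  # hypothetical input with a multi-digit int and a float

# Character-level filtering loses the grouping of digits and drops the dot.
chars = [c for c in s if c.isdigit()]

# Replace brackets with spaces, then split on any run of whitespace.
tokens = s.replace('[', ' ').replace(']', ' ').split()

print(chars)
print(tokens)
```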
As for your code: first, you need to split on spaces or brackets, which can be done with [\[\] ]+. To get rid of the empty strings caused by the leading and trailing brackets, you can strip the brackets from your string first:
>>> y = "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y = "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']
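You can also pass your result through filter, using bool as the predicate, to drop the empty strings (in Python 3, wrap it in list(), since filter returns an iterator there):
>>> list(filter(bool,re.split(r'[\[\] ]+',y)))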
['1', '2', '3', '4', '2', '3', '4', '5']
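A list comprehension does the same job and behaves identically in Python 2 and 3. As a minimal sketch (the variable names are illustrative), it can drop the empty strings and then convert what is left to floats, which is what the question was after:

```python
import re

y = "[1 2 3 4][2 3 4 5]"

# Split on runs of brackets/spaces and skip the empty strings at both ends.
tokens = [tok for tok in re.split(r'[\[\] ]+', y) if tok]

# Convert the remaining tokens to floats.
floats = [float(tok) for tok in tokens]

print(tokens)
print(floats)
```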