Python - re.split: extra empty strings at the start and end of the result

Question

I am trying to take a string of ints and/or floats and create a list of floats. The string contains square brackets, which should be ignored. I am using re.split, but if the string starts and ends with a bracket, I get extra empty strings at the start and end of the result. Why is this?

Code:

import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d.]+')  # delimiter: any run of characters other than digits and dots
print(p.split(x))
print(p.split(y))

Output:

['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']

+3

python regex

user4794127 · June 18, 2015 at 19:50

5 answers

Florian winter · Answer 1 · 2017-01-25T11:54:36+0000

If you use re.split, a delimiter at the beginning or end of the string produces an empty string at the beginning or end of the resulting list.

If you don't want this, use re.findall with a regex that matches every sequence that does NOT contain delimiters.

Example:

import re

a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))

Output:

['', '1', '2', '3', '4', '']
['1', '2', '3', '4']

As others have pointed out in the answers, this may not be a perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.
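To see why the empty strings appear, it can help to count the delimiter matches. The sketch below reuses the pattern from the question; the variable names are only illustrative. For a pattern like this one (no capture groups, never matching an empty string), re.split returns exactly one more piece than there are delimiter matches, so a delimiter at either edge of the string leaves an empty piece on its outer side.

```python
import re

delimiter = re.compile(r'[^\d.]+')
text = "[1 2 3 4][2 3 4 5]"

matches = delimiter.findall(text)  # every delimiter run in the string
pieces = delimiter.split(text)     # the text before, between and after those runs

# One more piece than delimiters: the text before the first "[" and
# after the last "]" is empty, hence the '' at both ends.
assert len(pieces) == len(matches) + 1

print(matches)
print(pieces)
```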

Kasramvd · Answer 2 · 2015-06-18T19:58:06+0000

For a more Pythonic approach, you can use a list comprehension with str.isdigit() to keep only the digit characters (note that this works character by character, so it only suits single-digit numbers):

>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']

As for your code: first, you need to split on spaces or brackets, which can be done with [\[\] ]. To get rid of the empty strings caused by the leading and trailing brackets, you can strip them from the string first:

>>> y =  "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y =  "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']

You can also pass the result through filter with bool as the predicate, which drops the empty strings:

>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']
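One difference between the two approaches may be worth illustrating: strip('[]') only removes brackets at the very edges of the string, so other leading or trailing delimiter characters (a space, for instance) still produce empty strings, whereas filtering the result does not depend on what sits at the edges. This is a minimal sketch; split_numbers is a hypothetical helper name, not something from the answer above.

```python
import re

def split_numbers(text):
    """Hypothetical helper: split on brackets and spaces, then drop empty strings."""
    tokens = re.split(r'[\[\] ]+', text)
    return [t for t in tokens if t]

padded = " [1 2 3 4][2 3 4 5] "

# strip('[]') removes nothing here because the outermost characters are spaces,
# so the split still yields '' at both ends.
print(re.split(r'[\[\] ]+', padded.strip('[]')))

# Filtering the tokens afterwards gives only the numbers.
print(split_numbers(padded))
```

The list comprehension behaves the same in Python 2 and Python 3, unlike a bare filter call, which returns an iterator in Python 3.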

taesu · Answer 3 · 2015-06-18T19:59:00+0000

import re
s = "[1 2 3 4][2 3 4 5]"  # named s rather than str, which would shadow the built-in
print(re.findall(r'\d+', s))
s = "1 2 3 4][2 3 4 5"
print(re.findall(r'\d+', s))

Federico Piazza · Answer 4 · 2015-06-18T19:59:08+0000

You can use regex to grab the content you want instead of splitting the string. You can use this regex:

(\d+)

Python code:

import re
p = re.compile(r'(\d+)')  # the ur'' prefix is a syntax error in Python 3
test_str = "[1 2 3 4][2 3 4 5]"

print(re.findall(p, test_str))
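Since the question asks for a list of floats, the same findall idea can be extended to numbers with a fractional part. The following is a sketch under an assumed number format (digits, optionally followed by a dot and more digits); it does not handle signs, exponents, or forms like ".5", and parse_floats is a hypothetical helper name.

```python
import re

# Assumed format: digits with an optional fractional part, e.g. "3" or "2.5".
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_floats(text):
    """Hypothetical helper: return every number found in text as a float."""
    return [float(token) for token in NUMBER.findall(text)]

print(parse_floats("[1 2.5 3 4][2 3 4.75 5]"))
# => [1.0, 2.5, 3.0, 4.0, 2.0, 3.0, 4.75, 5.0]
```

Because findall only returns what matches, there are no empty strings to strip or filter, whatever characters the string starts or ends with.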

anubhava · Answer 5 · 2015-06-18T20:02:28+0000

You can simply use filter to drop the empty strings:

import re

x = "[1 2 3 4][2 3 4 5]"

# list(...) is needed in Python 3, where filter returns an iterator
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
