Empty space character after re.split

Question

Here's a line from a .txt file that I'm reading and assigning to x:

import re

x = 'Wild_lions live mostly in "Africa"'
result = re.split('[^a-zA-Z0-9]+', x)

I end up with:

['Wild', 'lions', 'live', 'mostly', 'in', 'Africa', ''] # (there is an empty string as the last element)

Why is there an empty string at the end? I understand that I can just call result.remove('') to get rid of it, but for large files I think that would be rather inefficient.

python split regex expression

dppham1 Apr 17 17 at 8:21

3 answers

m0nhawk · Answer 1 · 2017-04-17T08:26:56+0000

You don't need such a complex regex to split on; a simpler one will do:

result = re.split(r'\s+', x)
result
# ['Wild_lions', 'live', 'mostly', 'in', '"Africa"']

\s+ matches one or more whitespace characters (tabs, spaces, newlines, etc.).

If you only want to match alphabetic characters, it is better to use re.compile with findall.

myre = re.compile('[a-zA-Z]+')
myre.findall(x)
# ['Wild', 'lions', 'live', 'mostly', 'in', 'Africa']
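
Since the question mentions large files, the same pattern could be compiled once and reused for every line. This is only a minimal sketch: the helper name words_per_line and the sample lines are made up for illustration, and any iterable of strings (such as an open file object) is assumed to work in place of the list.

```python
import re

# Compiled once and reused for every line.
WORD = re.compile(r'[a-zA-Z]+')

def words_per_line(lines):
    """Yield the list of alphabetic words found in each line."""
    for line in lines:
        yield WORD.findall(line)

# Hypothetical sample standing in for the lines of the .txt file.
sample = ['Wild_lions live mostly in "Africa"\n',
          'Tigers, however, live in Asia.\n']
parsed = list(words_per_line(sample))
print(parsed)
```

Because findall collects matches instead of splitting on delimiters, a delimiter at the start or end of a line never produces an empty string.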

Cony · Answer 2 · 2017-04-17T08:34:59+0000

Try this:

x = "Wild_lions live mostly in 'Africa'"
result = re.split(r'[\s_]+', x)

You'll get:

['Wild', 'lions', 'live', 'mostly', 'in', "'Africa'"]

snakecharmerb · Answer 3 · 2017-04-17T08:59:12+0000

The pattern [^a-zA-Z0-9]+ splits the supplied string on any character, or sequence of characters, that is not an ASCII digit or letter.

The last character in the example line (the closing quote) matches the split pattern. re.split adds the substrings before and after each match (up to the next match or the end of the string) to its output. In this case, the substring after the final match is empty, hence the empty string at the end of the reported output.
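
The behavior described here can be reproduced with the line from the question. This is a small illustrative sketch; the variable names parts and words are not from the original post.

```python
import re

x = 'Wild_lions live mostly in "Africa"'
pattern = r'[^a-zA-Z0-9]+'

parts = re.split(pattern, x)
# The closing quote is the final match and nothing follows it,
# so the piece after that match is the empty string.
print(parts)

# Dropping empty pieces in one pass also covers a delimiter
# at the start of the line.
words = [p for p in parts if p]
print(words)
```

The list comprehension makes a single pass over the pieces, so it stays cheap even when applied to every line of a large file.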

The other answers have provided workarounds to get the behavior you want, so I won't repeat them in this answer.
