What pattern should I use to split between two characters?
Consider the string s:
s = ';hello@;earth@;hello@;mars@'
I want a pattern pat such that the following call returns the list shown below it:
re.split(pat, s)
[';hello@', ';earth@', ';hello@', ';mars@']
I want the ; and @ to remain in the resulting strings; I only want to split between them.
I thought I could use lookahead and lookbehind:
re.split('(?<=@)(?=;)', s)
However, this resulted in an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-392-27c8b02c2477> in <module>()
----> 1 re.split('(?<=@)(?=;)', s)
//anaconda/envs/3.6/lib/python3.6/re.py in split(pattern, string, maxsplit, flags)
210 and the remainder of the string is returned as the final element
211 of the list."""
--> 212 return _compile(pattern, flags).split(string, maxsplit)
213
214 def findall(pattern, string, flags=0):
ValueError: split() requires a non-empty pattern match.
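The error message is quite explicit: re.split() requires a non-empty pattern match.
Note that in this version of Python (3.6), split never splits a string on an empty match. Support for splitting on patterns that can match an empty string was added in Python 3.7.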
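As a minimal sketch of that version difference, assuming Python 3.7 or newer is available, the lookaround pattern from the question can be passed to re.split directly. On 3.6 the same call raises the ValueError shown in the traceback.

```python
import re

s = ';hello@;earth@;hello@;mars@'

# Assumes Python 3.7+, where re.split() accepts zero-width matches.
# The pattern matches the empty position preceded by '@' and followed by ';'.
parts = re.split(r'(?<=@)(?=;)', s)
print(parts)  # [';hello@', ';earth@', ';hello@', ';mars@']
```

Because the lookbehind and lookahead consume no characters, both delimiters stay in the resulting pieces.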
Instead of splitting, you can match the pieces themselves:
re.findall(r';\w+@', s)
or
re.findall(r';[^@]+@', s)
See regex demo
re.findall finds all non-overlapping matches of the pattern.
The pattern ;[^@]+@ matches a ;, followed by one or more characters other than @, and then a @, so both the ; and the @ are kept in the returned items.
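The following sketch runs both findall patterns on the string from the question and then on a second, hypothetical string t (not from the original post) to show where they differ: \w+ only accepts word characters between ; and @, while [^@]+ accepts anything except @.

```python
import re

s = ';hello@;earth@;hello@;mars@'

# Both patterns give the same result on the original string.
word_items = re.findall(r';\w+@', s)
any_items = re.findall(r';[^@]+@', s)
print(word_items)  # [';hello@', ';earth@', ';hello@', ';mars@']
print(any_items)   # [';hello@', ';earth@', ';hello@', ';mars@']

# Hypothetical input with a space inside one item.
t = ';hello world@;mars@'
strict = re.findall(r';\w+@', t)    # the space is not a word character
loose = re.findall(r';[^@]+@', t)   # the space is allowed
print(strict)  # [';mars@']
print(loose)   # [';hello world@', ';mars@']
```

If the items may contain spaces or punctuation, the [^@]+ variant is probably the safer choice.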
The re module does not allow splitting on an empty match. You can use the third-party regex module with this pattern instead:
regex.split(r'(?V1)(?<=@)(?=;)', s)
The modifier (?V1) switches to the new behavior.
To get the same result with re, you can use re.findall with this pattern:
re.findall(r'(?:;|^)[^@]*@*', s)
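A short sketch of how this re-only pattern behaves, using just the standard library. The second string t is a hypothetical input, chosen to show that the ^ alternative and the optional @* also pick up a first piece without a leading ; and a last piece without a trailing @.

```python
import re

pat = r'(?:;|^)[^@]*@*'

s = ';hello@;earth@;hello@;mars@'
items = re.findall(pat, s)
print(items)  # [';hello@', ';earth@', ';hello@', ';mars@']

# Hypothetical input: no leading ';' and no trailing '@'.
t = 'hello@;earth'
edge_items = re.findall(pat, t)
print(edge_items)  # ['hello@', ';earth']
```

On the string from the question this returns the same list the lookaround split is meant to produce.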