What pattern should I use to split between characters?

Consider the string s:

s = ';hello@;earth@;hello@;mars@'

      

I want a pattern pat so that the following call returns the list shown:

re.split(pat, s)

[';hello@', ';earth@', ';hello@', ';mars@']

      

I want the ; and @ to remain in the resulting strings, and I know that I want to split between them.

I thought I could use lookahead and lookbehind:

re.split('(?<=@)(?=;)', s)

      

However, this resulted in an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-392-27c8b02c2477> in <module>()
----> 1 re.split('(?<=@)(?=;)', s)

//anaconda/envs/3.6/lib/python3.6/re.py in split(pattern, string, maxsplit, flags)
    210     and the remainder of the string is returned as the final element
    211     of the list."""
--> 212     return _compile(pattern, flags).split(string, maxsplit)
    213 
    214 def findall(pattern, string, flags=0):

ValueError: split() requires a non-empty pattern match.

      



2 answers


The error message is quite clear: re.split() requires a non-empty pattern match.

Note that re.split in this Python version (3.6, per the traceback) will never split a string on an empty match.

Instead, you can match the items themselves:

re.findall(r';\w+@', s)

      

or



re.findall(r';[^@]+@', s)

      


re.findall will find all non-overlapping matches of the pattern.

The pattern ;[^@]+@ matches a ;, followed by one or more characters other than @, and then a @, so both ; and @ are returned within the items.



The re module (in Python 3.6, the version shown in the question) does not allow splitting on an empty match. You can use the regex module with this pattern instead:

regex.split(r'(?V1)(?<=@)(?=;)', s)

      

The (?V1) modifier switches to the new behavior.




To get the same result with re, you can use re.findall with this pattern:

re.findall(r'(?:;|^)[^@]*@*', s)
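The restriction on empty matches is tied to the Python version. The sketch below assumes Python 3.7 or later, where re.split accepts patterns that can match an empty string, and falls back to the findall pattern above on older interpreters. The helper name split_between is made up for this example, and the two branches are only expected to agree on inputs shaped like s.

```python
import re
import sys

s = ';hello@;earth@;hello@;mars@'

def split_between(text):
    """Split text at every position that follows '@' and precedes ';'."""
    if sys.version_info >= (3, 7):
        # Assumption: Python 3.7+ allows re.split on zero-width matches
        return re.split(r'(?<=@)(?=;)', text)
    # Older versions: collect the pieces with findall instead
    return re.findall(r'(?:;|^)[^@]*@*', text)

result = split_between(s)
print(result)
```

On a recent interpreter this means the lookbehind/lookahead pattern from the question can be used as originally written.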

      
