How can I split words in Python while considering hyphenated words?

In Python, re.split("\W+", "fat-free milk") outputs ['fat', 'free', 'milk'].

How do I create ['fat-free', 'milk'] from re.split()?

I understand that hyphens are not alphanumeric characters, but I'm not sure how to incorporate this fact into the regex. I have tried re.split("[(^\-)\W]+", "fat-free milk"), to no avail.

+3




4 answers


re.split(r"[^-\w]+", "fat-free milk")
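
As a rough sketch of what this pattern does (the sample strings and variable names below are illustrative, not from the original answer): the negated class treats anything that is neither a hyphen nor a word character as a separator. One thing to watch for is that a separator at the end of the string leaves an empty string in the result.

```python
import re

text = "fat-free milk"

# [^-\w] matches any character that is neither a hyphen nor a word character,
# so hyphens stay inside tokens while spaces and punctuation act as separators.
tokens = re.split(r"[^-\w]+", text)
print(tokens)  # ['fat-free', 'milk']

# Caveat: a separator at the very end leaves an empty string in the result.
with_period = re.split(r"[^-\w]+", "fat-free milk.")
print(with_period)  # ['fat-free', 'milk', '']

# Dropping empty strings afterwards is one way to clean this up.
cleaned = [t for t in with_period if t]
print(cleaned)  # ['fat-free', 'milk']
```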

      



+8




No regex needed:

>>> "fat-free milk".split()
['fat-free', 'milk']

      



If you want to split on any non-word character that is not a hyphen, you can use a negated character class (like John's), or a negative lookahead, which can be a little more flexible:

>>> re.split(r'(?:(?!-)\W)+', "fat-free milk. with cream")
['fat-free', 'milk', 'with', 'cream']
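
To illustrate the flexibility, here is a minimal sketch that protects more than one character. The choice of the apostrophe as a second protected character and the sample sentence are assumptions made for the example only.

```python
import re

# (?![-']) refuses to treat a hyphen or an apostrophe as a separator;
# every other non-word character (\W) still splits the string.
pattern = r"(?:(?![-'])\W)+"

words = re.split(pattern, "fat-free milk. don't shake")
print(words)  # ['fat-free', 'milk', "don't", 'shake']
```

Adding another protected character only means extending the set inside the lookahead, while the rest of the pattern stays the same.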

      

+7




>>> a = "fat-free milk fat-full cream"
>>> b = a.split(' ')
>>> print(b)
['fat-free', 'milk', 'fat-full', 'cream']
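
Note that splitting on a literal space behaves differently from a bare split() once the input has repeated spaces, and neither one strips punctuation. A small sketch with a made-up input string:

```python
text = "fat-free  milk."  # note the two spaces and the trailing period

# split(' ') treats every single space as a separator, so consecutive
# spaces produce an empty string.
by_space = text.split(' ')
print(by_space)  # ['fat-free', '', 'milk.']

# split() with no argument collapses runs of whitespace, but punctuation
# still stays attached to the neighbouring word.
by_whitespace = text.split()
print(by_whitespace)  # ['fat-free', 'milk.']
```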

      

+2




We can also split on a single space with re.split():

re.split(" ", "fat-free milk")

0

