How can I split words in Python while considering hyphenated words?
In Python, re.split(r"\W+", "fat-free milk") outputs ['fat', 'free', 'milk'].
How do I get ['fat-free', 'milk'] from re.split()?
I understand that hyphens are not alphanumeric characters, but I'm not sure how to incorporate this fact into the regex. I have tried re.split(r"[(^\-)\W]+", "fat-free milk") to no avail.
4 answers
No regex needed:
>>> "fat-free milk".split()
['fat-free', 'milk']
If you want to split on any non-word character that is not a hyphen, you can use a negated character class (like John's), or a negative lookahead, which can be a little more flexible:
>>> re.split(r'(?:(?!-)\W)+', "fat-free milk. with cream")
['fat-free', 'milk', 'with', 'cream']
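For comparison, here is a minimal sketch that puts the approaches above side by side, plus a re.findall variant that collects tokens instead of splitting. The variable names and the second sample string are illustrative assumptions, not part of the original answer.

```python
import re

text = "fat-free milk. with cream"

# Negated character class: split on runs of characters that are
# neither word characters nor hyphens.
by_class = re.split(r"[^\w\-]+", text)

# Negative lookahead: split on runs of non-word characters,
# refusing to consume a hyphen.
by_lookahead = re.split(r"(?:(?!-)\W)+", text)

# Alternative: match the tokens themselves rather than the separators.
by_findall = re.findall(r"[\w\-]+", text)

print(by_class)      # ['fat-free', 'milk', 'with', 'cream']
print(by_lookahead)  # ['fat-free', 'milk', 'with', 'cream']
print(by_findall)    # ['fat-free', 'milk', 'with', 'cream']

# When the string ends with a separator, re.split leaves a trailing
# empty string, while re.findall does not.
trailing_split = re.split(r"[^\w\-]+", "fat-free milk.")
trailing_findall = re.findall(r"[\w\-]+", "fat-free milk.")
print(trailing_split)    # ['fat-free', 'milk', '']
print(trailing_findall)  # ['fat-free', 'milk']
```

If the input may start or end with punctuation, the findall form is probably the simpler choice, since it avoids having to filter out empty strings afterwards.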