Problem matching a string with re in Python
I am doing an exercise in Python and I am stuck on a part where I have to find dates in a string using re.
My only problem is that when the day is "1st", the day group comes back as an empty string. What am I doing wrong?
import re
text = "article 1st May 1988; another article 2 June 1992, some new article 25 October 2001; "
result = re.findall(r'(\d*) ([A-Z]\w+) (\d+)',text)
print(result)
Output
[('', 'May', '1988'), ('2', 'June', '1992'), ('25', 'October', '2001')]
Thanks for the help.
You can require at least one digit (\d+ instead of just \d*) and add an optional group of ordinal suffixes:
import re
text = "article 1st May 1988; another article 2 June 1992, some new article 25 October 2001; "
result = re.findall(r'(\d+(?:st|nd|rd|th)?) ([A-Z]\w+) (\d+)',text)
print(result)
# [('1st', 'May', '1988'), ('2', 'June', '1992'), ('25', 'October', '2001')]
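\d* matches zero or more digits, and your pattern requires a space right after them. In '1st', however, the digit is followed by 's', so no match can start at the '1'. The regex engine moves on and matches at the space after '1st', where \d* matches zero digits, which is why the first group is empty.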
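To see where the match actually begins, here is a minimal sketch that runs the pattern from the question with re.finditer and records each match's start offset. The shortened text and the variable names are only for illustration.

```python
import re

text = "article 1st May 1988; another article 2 June 1992"
pattern = re.compile(r'(\d*) ([A-Z]\w+) (\d+)')

# For every match, keep the start offset, the full matched text,
# and the first capture group (the day).
matches = [(m.start(), m.group(0), m.group(1)) for m in pattern.finditer(text)]

for start, whole, day in matches:
    print(start, repr(whole), repr(day))
```

The first match starts at the space after "1st", not at the "1", so its day group is the empty string. The second match starts at the "2" as expected.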
It is doubtful whether \d* is the right choice here at all. You probably want one or more digits. Or better, limit it to one or two digits (e.g. \d{1,2}), optionally followed by "st", "nd", "rd", or "th".
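A minimal sketch of that suggestion, under a few assumptions that go beyond the question: the year always has four digits, the month is a capitalised word of plain letters, and only the numeric part of the day is wanted (the suffix is matched but not captured). Adjust the pattern if your data differs.

```python
import re

text = "article 1st May 1988; another article 2 June 1992, some new article 25 October 2001; "

# \b stops the day from matching the tail of a longer number.
# The ordinal suffix is optional, so plain "2" and "25" still match.
# Assumption: four-digit years and capitalised month names.
pattern = re.compile(r'\b(\d{1,2})(?:st|nd|rd|th)?\s+([A-Z][a-z]+)\s+(\d{4})\b')

# Convert day and year to integers; the month stays a string.
dates = [(int(day), month, int(year)) for day, month, year in pattern.findall(text)]
print(dates)
```

If you want to keep the suffix in the result, as in the other answer, move the closing parenthesis of the first group so it encloses the (?:st|nd|rd|th)? part and drop the int() conversion for the day.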