Problems selecting a string with re in Python

I am doing an exercise in Python and I am stuck at the part where I have to find dates in a string using re.

My only problem is that when the day is "1st", the day group comes back as an empty string. What am I doing wrong?

import re
text = "article 1st May 1988; another article 2 June 1992, some new article 25 October 2001; "

result = re.findall(r'(\d*) ([A-Z]\w+) (\d+)',text)
print(result)

      

Output

[('', 'May', '1988'), ('2', 'June', '1992'), ('25', 'October', '2001')]

      

Thanks for the help.



2 answers


You can require at least one digit (\d+ instead of just \d*) and add an optional group of ordinal suffixes:



import re
text = "article 1st May 1988; another article 2 June 1992, some new article 25 October 2001; "

result = re.findall(r'(\d+(?:st|nd|rd|th)?) ([A-Z]\w+) (\d+)',text)
print(result)
# [('1st', 'May', '1988'), ('2', 'June', '1992'), ('25', 'October', '2001')]

      



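\d* matches zero or more digits, and your pattern requires the next character to be a space. In '1st', however, the digit is followed by 's', so the match can only start at the space after '1st', where \d* matches the empty string.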
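To see where the empty string comes from, it can help to look at where the first match actually starts. The following is a small sketch using the pattern from the question on a shortened copy of the text; the variable names are only illustrative.

```python
import re

text = "article 1st May 1988; another article 2 June 1992"
pattern = re.compile(r'(\d*) ([A-Z]\w+) (\d+)')

first = pattern.search(text)

# The match cannot start at "1", because "1" is followed by "s", not a space.
# It starts at the space after "1st", where \d* matches zero digits.
print(repr(first.group(0)))  # ' May 1988'
print(first.start())         # 11, the index of the space after "1st"
print(repr(first.group(1)))  # ''
```

Because the regex engine is free to begin a match at any position, an all-optional first group like (\d*) will happily match nothing.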



It is doubtful whether \d* is the right choice here. You probably want one or more digits. Better still, limit it to one or two digits (e.g. \d{1,2}), optionally followed by "st", "nd", "rd", or "th".
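A minimal sketch of that suggestion, applied to the text from the question. The leading \b is an extra assumption not mentioned above; it keeps the pattern from picking up the last two digits of a longer number.

```python
import re

text = "article 1st May 1988; another article 2 June 1992, some new article 25 October 2001; "

# One or two digits, an optional ordinal suffix, a capitalized word, then the year.
# \b (an assumption added here) makes sure the day starts at a word boundary.
pattern = r'\b(\d{1,2}(?:st|nd|rd|th)?) ([A-Z]\w+) (\d+)'

result = re.findall(pattern, text)
print(result)
# [('1st', 'May', '1988'), ('2', 'June', '1992'), ('25', 'October', '2001')]
```

Note that this still accepts any capitalized word as the month and any run of digits as the year; tightening those parts is a separate decision.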
