Python: checking whether a string matches the Wikipedia link format

If I have a string named link, how can I check whether it follows the format of a Wikipedia URL? To clarify, Wikipedia URLs (in this case) always start with en.wikipedia.org/wiki/. After /wiki/ they can contain any character (including # signs and apostrophes), and spaces are denoted by underscores. They could also have a word in parentheses, like: en.wikipedia.org/wiki/Sesame_Street(Elmo's_World). For example, if the string link looked like "en.wikipedia.org/wiki/Sesame_Street(Elmo's_World", it would not match because the closing parenthesis is missing. Thanks!



1 answer


I think that something like this might do what you want:

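import re

link = 'en.wikipedia.org/wiki/Sesame_street(Elmo\'s_world)'
# Strip the prefix; if nothing changed, the prefix did not match.
sub = re.sub(r'^.{2}\.wikipedia\.org/wiki/(.*)', r'\1', link)
if sub != link:
    if '(' in sub:
        # An opening parenthesis requires a closing one somewhere.
        if ')' in sub:
            print('ok')
        else:
            print('not ok')
    else:
        print('ok')
else:
    print('not ok')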

      



Note that it only checks whether a ")" is present when a "(" is present. If a parenthesis is opened twice and closed only once, it will still match, but maybe this gives you something to build on. (By the way, it will match other language editions too, unless you restrict it to "en" by changing .{2} to en.)
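If unbalanced parentheses should be rejected as well, one possible approach is to count nesting depth instead of only testing for the presence of ")". The following is a minimal sketch, not part of the original answer: the function name is_wikipedia_link is hypothetical, it assumes only the "en" prefix should match, and it assumes whitespace is never allowed because spaces are written as underscores.

```python
import re

# Hypothetical helper (not from the original answer).
# Assumption: only the English prefix "en.wikipedia.org/wiki/" is accepted.
WIKI_PATTERN = re.compile(r'en\.wikipedia\.org/wiki/(.+)')


def is_wikipedia_link(link):
    """Return True if link has the expected prefix and balanced parentheses."""
    match = WIKI_PATTERN.fullmatch(link)
    if match is None:
        return False
    title = match.group(1)
    # Assumption: spaces are denoted by underscores, so whitespace is invalid.
    if any(ch.isspace() for ch in title):
        return False
    depth = 0
    for ch in title:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # A ")" appeared before its matching "(".
                return False
    # Every "(" must have been closed.
    return depth == 0


print(is_wikipedia_link("en.wikipedia.org/wiki/Sesame_Street(Elmo's_World)"))
print(is_wikipedia_link("en.wikipedia.org/wiki/Sesame_Street(Elmo's_World"))
```

The first call should report a match and the second should not, since its parenthesis is never closed. To accept any two-letter language code, as the regex in the answer does, the "en" in the pattern could be replaced with .{2}.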
