I need regEx to match common urls
3 answers
adding that RegEx as a wiki answer:
[\w+-]+://([a-zA-Z0-9]+\.)+[[a-zA-Z0-9]+](/[%\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?
option 2 (Re CMS)
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
But this is weak for any common sense so stripped down as to make it more restrictive and distinguish other things.
proto :// name : pass @ server :port /path ? args
^([^:/?#]+)://(([^/?#@:]+(:[^/?#@:]+)?@)?[^/?#@:]+(:[0-9]+)?)(/[^?#]*)(\?([^#]*))?
+1
source to share
I came to this from a slightly different direction. I wanted to emulate gchats' ability to match something.co.uk
and tie it. So I went with a regex that searches .
without any next period or space on either side and then grabs everything around until it hits spaces. This matches the period at the end of the URI, but I'll take that later. As such, it might be an option if you prefer false positives over the absence of any potentials.
url_re = re.compile(r"""
[^\s] # not whitespace
[a-zA-Z0-9:/\-]+ # the protocol and domain name
\.(?!\.) # A literal '.' not followed by another
[\w\-\./\?=&%~#]+ # country and path components
[^\s] # not whitespace""", re.VERBOSE)
url_re.findall('http://thereisnothing.com/a/path adn some text www.google.com/?=query#%20 https://somewhere.com other-countries.co.nz. ellipsis... is also a great place to buy. But try text-hello.com ftp://something.com')
['http://thereisnothing.com/a/path',
'www.google.com/?=query#%20',
'https://somewhere.com',
'other-countries.co.nz.',
'text-hello.com',
'ftp://something.com']
0
source to share