How to identify links in text?
How can I identify links in a text, bearing in mind that they can take different forms:
hfajlhfjkdsflkdsja.onion
http://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
I thought of using a regex, but (.*?.onion) returns the whole paragraph in which the link appears.
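Here is a minimal sketch of why that attempt over-matches. It uses a made-up one-line sample text, which is an assumption and not from the question. In (.*?.onion) the unescaped . matches any character, spaces included, so the lazy .*? starts at the beginning of the line and runs until it reaches "onion". Restricting the run to non-whitespace with \S and escaping the dot keeps the match to the link itself.

```python
import re

# Hypothetical sample text; surrounding words would be swallowed the same way.
text = "Visit hfajlhfjkdsflkdsja.onion for details"

# '.' matches any character except newline (spaces included), and the dot before
# "onion" is unescaped, so the match starts at the beginning of the line.
loose = re.findall(r'(.*?.onion)', text)

# '\S' excludes whitespace and '\.' requires a literal dot.
tight = re.findall(r'\S+\.onion', text)

print(loose)  # ['Visit hfajlhfjkdsflkdsja.onion']
print(tight)  # ['hfajlhfjkdsflkdsja.onion']
```

Because . does not match a newline by default, the loose pattern grabs everything from the start of the line (or paragraph) up to the link.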
user3191569
3 answers
This will do it: (?:https?://)?(?:www\.)?(\S*?\.onion)\b
(Added non-capturing groups - credit: @WiktorStribiżew)
Demo:
import re

s = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com
https://stackoverflow.com'''

# re.IGNORECASE also accepts e.g. "HTTP://" or ".ONION"
for m in re.finditer(r'(?:https?://)?(?:www\.)?(\S*?\.onion)\b', s, re.IGNORECASE):
    print(m.group(0))  # group(0) is the whole match, including any scheme and "www."
Output
hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
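If only the host name is wanted, the capturing group can be read too. The sketch below writes the pattern with re.VERBOSE so each part carries a comment, and spells the optional prefix as (?:www\.)?, dot escaped and included, so that group 1 does not start with a stray dot. The helper name onion_hosts is made up for illustration.

```python
import re

# Variant of the answer's pattern, laid out with re.VERBOSE
# (whitespace and # comments inside the pattern are ignored).
ONION_RE = re.compile(r'''
    (?:https?://)?   # optional scheme: http:// or https://
    (?:www\.)?       # optional "www." prefix, dot included
    (\S*?\.onion)    # group 1: shortest run of non-space characters ending in ".onion"
    \b               # word boundary right after "onion"
''', re.IGNORECASE | re.VERBOSE)


def onion_hosts(text):
    """Hypothetical helper: return (full match, host) pairs for each .onion link."""
    return [(m.group(0), m.group(1)) for m in ONION_RE.finditer(text)]


s = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com'''

for full, host in onion_hosts(s):
    print(full, '->', host)
```

Every printed host is hfajlhfjkdsflkdsja.onion, and the Google URL produces no match.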
coldspeed
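Non-regex approach (checks a single candidate string, such as one word of the text):
url = 'http://hfajlhfjkdsflkdsja.onion'
parts = url.split('.onion')
# exactly one '.onion' in the string, with nothing after it
if len(parts) == 2 and len(parts[1]) == 0:
    print('onion link:', url)  # do something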
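The check above works on one string. A minimal sketch of applying it to a whole text, assuming links are separated by whitespace: split the text into words, strip common trailing punctuation (an addition, so that "x.onion," still counts), and keep the words that pass the same split test. find_onion_tokens is a hypothetical name.

```python
def find_onion_tokens(text):
    """Hypothetical helper: return the words of text that end in a single '.onion'."""
    found = []
    for token in text.split():             # split on any run of whitespace
        token = token.rstrip('.,;:!?)')     # drop trailing punctuation such as "x.onion,"
        parts = token.split('.onion')
        # exactly one '.onion' in the word, with nothing after it
        if len(parts) == 2 and parts[1] == '':
            found.append(token)
    return found


text = ('see hfajlhfjkdsflkdsja.onion, or http://www.hfajlhfjkdsflkdsja.onion. '
        'Not https://www.google.com')
print(find_onion_tokens(text))
```

Unlike the regex answers, this comparison is case-sensitive, so a word ending in ".ONION" is not picked up.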
SeaMonkey
Fast and easy:
([^\s]+\.onion)
Matches a run of non-whitespace characters ending in ".onion", i.e. everything from the preceding whitespace (or the start of the text) up to ".onion".
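A quick check of this pattern on a made-up sample line. Because nothing after \.onion marks the end of the link, it also matches the start of a longer word such as xyz.onionbar, and leading punctuation such as "(" stays attached. Adding \b after onion, as the first answer does, rules out the first case.

```python
import re

PATTERN = re.compile(r'([^\s]+\.onion)')   # [^\s] is equivalent to \S

# Made-up sample text covering a few edge cases.
text = 'Links: hfajlhfjkdsflkdsja.onion http://www.hfajlhfjkdsflkdsja.onion (abc.onion) xyz.onionbar'

matches = PATTERN.findall(text)
print(matches)

# Same pattern with a word boundary after "onion": "xyz.onionbar" no longer matches.
bounded = re.findall(r'([^\s]+\.onion\b)', text)
print(bounded)
```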
Bernhard