How to identify links in text?

Question

How can I identify links in text, bearing in mind that they can take different forms:

hfajlhfjkdsflkdsja.onion
http://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion

I am thinking of using a regex, but (.*?.onion) returns the whole paragraph that contains the link instead of just the link itself.
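A quick way to see the problem, using a made-up one-line sample: although .*? is lazy, a regex match still begins at the earliest position where it can succeed, which here is the start of the line, not the start of the link. The unescaped . before "onion" also matches any character, not only a literal dot.

```python
import re

def naive_matches(s):
    # Pattern from the question: '.*?' begins matching at the start of the
    # text, so everything before the link ends up in the match too.
    return re.findall(r'(.*?.onion)', s)

# Hypothetical sample sentence with a link in the middle.
print(naive_matches('Visit hfajlhfjkdsflkdsja.onion today'))
# ['Visit hfajlhfjkdsflkdsja.onion']
```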

python url regex tor

user3191569 June 16 17 at 12:13

3 answers

Non-regex approach:

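url = 'http://hfajlhfjkdsflkdsja.onion'
# Splitting on '.onion' leaves an empty last piece only if the string ends with it
split = url.split('.onion')
if len(split) == 2 and len(split[1]) == 0:
    # url ends with '.onion' and contains it exactly once; do something with it
    pass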
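The check above tests a single string. A minimal sketch of applying it to running text, assuming links are separated from surrounding words by whitespace (the helper name onion_words is hypothetical):

```python
def onion_words(text):
    """Return whitespace-separated tokens that end in '.onion' (sketch)."""
    found = []
    for word in text.split():
        # Same test as above: exactly one '.onion', at the very end of the token.
        parts = word.split('.onion')
        if len(parts) == 2 and parts[1] == '':
            found.append(word)
    return found

print(onion_words('hfaj.onion http://www.hfaj.onion https://www.google.com'))
# ['hfaj.onion', 'http://www.hfaj.onion']
```

Punctuation attached to a link (for example a trailing comma) makes the token fail the check, so this only works on cleanly separated links.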

SeaMonkey June 16 17 at 12:26

Fast and easy:

([^\s]+\.onion)

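Matches each run of non-whitespace characters that ends in ".onion", so a match starts right after the whitespace preceding the link rather than at the start of the line.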
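A small check of this pattern against the three forms from the question, using re.findall (the helper name find_onion_links is only for illustration):

```python
import re

ONION_RE = re.compile(r'([^\s]+\.onion)')

def find_onion_links(text):
    # Each match is a run of non-whitespace characters ending in '.onion'.
    return ONION_RE.findall(text)

sample = '''hfajlhfjkdsflkdsja.onion
http://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion'''
print(find_onion_links(sample))
# ['hfajlhfjkdsflkdsja.onion', 'http://hfajlhfjkdsflkdsja.onion', 'http://www.hfajlhfjkdsflkdsja.onion']
```

Because the pattern has no trailing word boundary, a token such as abc.onionxyz still yields abc.onion.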

Bernhard June 16 17 at 13:15

coldspeed · Accepted Answer · 2017-06-16T12:22:35+0000

This will do it: (?:https?://)?(?:www)?(\S*?\.onion)\b

(Added non-capturing groups, credit: @WiktorStribiżew)

Demo:

import re

s = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com
https://stackoverflow.com'''

# group(0) is the whole match, including any scheme and "www" prefix
for m in re.finditer(r'(?:https?://)?(?:www)?(\S*?\.onion)\b', s, re.M | re.IGNORECASE):
    print(m.group(0))

Output

hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
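If you need just the hostname of each link rather than the full match, one option (a sketch, not part of the answer above; the helper onion_hosts is hypothetical) is to pass each match to urllib.parse.urlsplit. urlsplit only fills in the hostname when a scheme is present, so bare links get a placeholder http:// first.

```python
import re
from urllib.parse import urlsplit

# Pattern from the answer above.
ONION_RE = re.compile(r'(?:https?://)?(?:www)?(\S*?\.onion)\b', re.IGNORECASE)

def onion_hosts(text):
    """Return the hostname of every .onion link found in text (sketch)."""
    hosts = []
    for m in ONION_RE.finditer(text):
        link = m.group(0)
        # Without a scheme, urlsplit puts the whole string in .path,
        # so add a placeholder scheme to bare links.
        if '://' not in link:
            link = 'http://' + link
        hosts.append(urlsplit(link).hostname)
    return hosts

print(onion_hosts('hfajlhfjkdsflkdsja.onion\nhttp://www.hfajlhfjkdsflkdsja.onion'))
# ['hfajlhfjkdsflkdsja.onion', 'www.hfajlhfjkdsflkdsja.onion']
```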
