Match only URLs starting with 'www.' or 'http(s)://' and nothing else

I am using a regular expression on my blog site to turn URLs into clickable links, which works great. The pattern looks like this:

/(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)/

      

So what's the problem?

Recently I found that this pattern also matches filenames, so when a user submits a filename in a comment, the system turns it into a link. You can see the effect here:

[Screenshot: a filename in a comment rendered as a link]

And what am I trying to achieve?

I want to match each of these URL formats except the last example (see image below), so that neither mysite.com nor filename.php is highlighted.

[Screenshot: example URLs, with the last one left unhighlighted]


Inputs that must match:

+--------------------------+------------------------------------------------------+
|         Example          |                     Explanation                      |
+--------------------------+------------------------------------------------------+
| http(s)://www.mysite.com | because it starts with http(s):// and has URL format |
| www.mysite.com           | because it starts with www. and has URL format       |
+--------------------------+------------------------------------------------------+

      

Inputs that must not match:

+-------------------+--------------------------------------------------+
|      Example      |                    Explanation                   |
+-------------------+--------------------------------------------------+
| mysite.com        | because it doesn't start with http(s):// or www. |
|                   | even though it has URL format                    |
| http(s)://mytext  | because it doesn't have URL format               |
| http://localhost/ | because it doesn't have URL format               |
+-------------------+--------------------------------------------------+

      

What does the URL format look like?

In this case, we can specify the URL format using this pattern:

([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)

      

Examples:

google.com, google.co.uk, accounts.google.com, google.com/somepath/ ...
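To make the "URL format" concrete, here is a minimal Python sketch that checks a few strings against the pattern above (with balanced parentheses). The helper name has_url_format and the use of Python's re module are my own assumptions for illustration; the question does not say which language the blog uses.

```python
import re

# URL-format pattern from the question (parentheses balanced).
URL_FORMAT = re.compile(
    r"([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)"
)


def has_url_format(text: str) -> bool:
    """Hypothetical helper: True if the whole string fits the URL format."""
    return URL_FORMAT.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in [
        "google.com",
        "google.co.uk",
        "accounts.google.com",
        "google.com/somepath/",
        "filename.php",
        "localhost",
    ]:
        print(sample, has_url_format(sample))
```

Note that filename.php also fits this format, which is why the format alone cannot tell a URL from a filename and a required prefix is needed.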

      

Conclusion

I tried adding www\. to this pattern, but then nothing matched. So how can I change this regex to match only URLs that start with "www." or "http(s)://"?

Thanks in advance.



1 answer


This regex is definitely not perfect, but it will do what you want:

(http[s]?:\/\/|www\.|ftp:\/\/){1,2}([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)

      



It can be tricked into matching non-URLs, but it should not be easy to abuse. Making it more accurate would significantly increase its complexity.
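As a rough check, the sketch below runs this regex against the inputs from the question. It is written in Python purely for illustration, and it assumes the dot after www is escaped (www\.) so that only a literal dot is accepted. The names find_urls and linkify, and the choice of http:// as the default scheme for bare www. links, are hypothetical and not part of the question or the answer.

```python
import html
import re

# The answer's regex, assuming the dot after "www" is escaped.
PREFIXED_URL = re.compile(
    r"(http[s]?:\/\/|www\.|ftp:\/\/){1,2}"
    r"([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)"
)


def find_urls(text: str) -> list[str]:
    """Return every substring the regex accepts as a URL."""
    return [m.group(0) for m in PREFIXED_URL.finditer(text)]


def linkify(text: str) -> str:
    """Wrap each matched URL in an anchor tag (hypothetical helper)."""
    def to_anchor(m: re.Match) -> str:
        url = m.group(0)
        # Assumption: links without a scheme get "http://" prepended.
        href = url if "://" in url else "http://" + url
        return f'<a href="{html.escape(href)}">{html.escape(url)}</a>'

    return PREFIXED_URL.sub(to_anchor, text)


if __name__ == "__main__":
    comment = (
        "See http://www.mysite.com and www.mysite.com, "
        "but not mysite.com, filename.php, http://mytext or http://localhost/"
    )
    print(find_urls(comment))
    print(linkify("see www.mysite.com and filename.php"))
```

One thing to watch: the path part of the pattern accepts dots, so a sentence-ending period directly after a URL would be captured as part of the link.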
