JavaScript Regex: Matching Text After a Pattern

I have a form textarea that will contain paragraphs of text with embedded URLs. I would like to parse the string, generating HTML links from the URLs and using the text that follows each URL as the link text, i.e. turn

possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present

      

into

<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>

      

This SO article, "JS: Find URLs in Text, Make Links", is relevant to what I am trying to do, but it just puts the URL itself in the text of the anchor element.

I am successfully matching the URL with

var urlRE= new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

      

but I'm not sure how to match the text that comes after it.

I came across this post, "Regex - Matching Text AFTER certain characters", which seems to be applicable. I tried wrapping the RE in /(?<=my url pattern here).+/ but I get an error stating that there is an invalid group, which makes the whole RE invalid.

This J-Law post mentions that

Variable-length lookbehinds aren't allowed

Is this what I am trying to do?

Since I am already matching the url, I feel like I can easily do some substring math to get the results I want.
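For reference, a minimal sketch of that substring approach, assuming the URL pattern above and only the first URL in the string. The helper name splitAtUrl is made up for illustration:

```javascript
// URL pattern from the question, unchanged.
var urlRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

// Hypothetical helper: returns the first URL and the text after it,
// or null when the string contains no URL.
function splitAtUrl(text) {
  var match = urlRE.exec(text);
  if (match === null) {
    return null;
  }
  var url = match[0];
  // match.index is where the URL starts, so the remainder begins
  // at match.index + url.length.
  var rest = text.substring(match.index + url.length);
  return { url: url, rest: rest.trim() };
}

var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
var parts = splitAtUrl(s);
console.log(parts.url);
console.log(parts.rest);
```

This avoids lookbehind entirely, since the position of the match is already available on the result of exec.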

I'm just using this as an opportunity to learn more about regex.

Thanks.



1 answer


Just add another capture group to capture everything after the URL, and make your inner groups non-capturing. Something like:

    // Group 1 wraps the whole URL pattern; group 2 captures the text after it.
    var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

    var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

    var match = urlRE.exec(s);
    alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);

    // match contains:
    // ["http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present",
    // "http://www.somewebsite.com/some/path/somepage.html",
    // " descriptive text which may or may not be present"]
      

I wrapped your whole regex in parentheses () to form the first capturing group, and inside that I made all your existing groups non-capturing with ?:. You don't have to make them non-capturing, but it simplifies the output. Then I added another group (.*) to capture everything else up to the end of the line $.
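To see what the ?: markers change, here is a small comparison run against the sample string from the question. The variable names are only for illustration; both patterns are the same URL regex, once with the inner groups capturing and once with them non-capturing:

```javascript
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

// Inner groups left as capturing groups: every pair of parentheses adds an entry.
var capturingRE = new RegExp("(([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+)(.*)$");

// Inner groups marked with ?: so only the URL and the trailing text are captured.
var nonCapturingRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

var withInner = capturingRE.exec(s);
var withoutInner = nonCapturingRE.exec(s);

console.log(withInner.length);    // 8: full match plus seven groups
console.log(withoutInner.length); // 3: full match, URL, trailing text

// Optional groups that did not take part in the match come back as undefined.
console.log(withInner[3]);        // undefined (no user:password@ part)
```

With the capturing version the trailing text ends up at index 7 instead of index 2, which is easy to get wrong if the pattern changes later.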

After calling .exec, if there is a match, the full match will be in [0], the URL part will be in [1], and the rest of the text will be in [2]. This is why we used non-capturing groups: otherwise you would get a bunch of other captures that might or might not be useful.
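From there, building the anchor element is mostly string concatenation. A rough sketch, assuming a single-line string with one URL in it; linkify and escapeHtml are hypothetical helper names, and falling back to the URL as the link text when nothing follows it is my own assumption about the desired behaviour:

```javascript
var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: turn "text URL description" into "text <a href=URL>description</a>".
function linkify(text) {
  var match = urlRE.exec(text);
  if (match === null) {
    return escapeHtml(text); // no URL found, nothing to link
  }
  var url = match[1];
  // Assumption: use the URL itself as link text when no description follows.
  var label = match[2].trim() || url;
  var before = text.substring(0, match.index);
  return escapeHtml(before) + '<a href="' + escapeHtml(url) + '">' + escapeHtml(label) + "</a>";
}

var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
console.log(linkify(s));
```

Because (.*) does not cross line breaks and the pattern has no g flag, this only handles the first URL on a single line; text with several URLs would need to be split up first.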
