JavaScript Regex: Matching Text After a Pattern
I have a text field in a form containing paragraphs of text with embedded URLs. I would like to parse the string, generating HTML links from the URLs and using the text that follows each URL as the link text, i.e. turning
possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present
into
<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
This SO question, JS: Find URLs in Text, Make Links, is relevant to what I am trying to do, but it just puts the URL itself in the text of the anchor element.
I am successfully matching the URL with
var urlRE= new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");
but I'm not sure how to capture the text that comes after the match.
I came across this post, Regex - Matching Text AFTER certain characters, which seems applicable. I tried wrapping the RE in a lookbehind, /(?<=my url pattern here).+/, but I get an error stating that the group is invalid, so the RE is rejected.
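For reference, here is a minimal sketch of the lookbehind approach, assuming an engine that implements ES2018 lookbehind assertions (for example a current Node.js). Where JavaScript supports lookbehind at all, the lookbehind may contain a variable-length pattern; an engine without lookbehind support rejects the (?<= syntax outright, which is the likely source of an "invalid group" error. The URL part is deliberately simplified to https?:\/\/\S+ as an assumption for brevity, not the question's full pattern.

```javascript
// Sketch: capture the text after a URL using a lookbehind assertion.
// Assumes ES2018 lookbehind support; older engines reject "(?<=" as an invalid group.
// The URL pattern is simplified (http or https, then non-space characters).
var afterUrlRE = /(?<=https?:\/\/\S+) (.*)$/;

var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
var m = afterUrlRE.exec(s);

// Group 1 holds everything after the space that follows the URL.
console.log(m ? m[1] : null);
// descriptive text which may or may not be present
```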
This J-Law post mentions that variable-length lookbehinds aren't allowed.
Is that what I am running into?
Since I am already matching the URL, I feel like I could easily do some substring math to get the result I want;
I'm mostly using this as an opportunity to learn more about regex.
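The "substring math" idea can be sketched with the question's own regex unchanged: exec reports where the match starts (match.index), so everything after index + match length is the trailing text. This assumes a single URL per string.

```javascript
// Sketch: reuse the question's URL regex as-is, then slice off what follows the URL.
var urlRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

var match = urlRE.exec(s);
if (match) {
  var url = match[0];                  // the whole matched URL
  var end = match.index + url.length;  // position just past the URL
  var rest = s.slice(end).trim();      // trailing text, surrounding spaces removed
  console.log(url);
  console.log(rest);
}
```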
Thanks!
Just add another capture group to capture everything at the end, and make your inner groups non-capturing. Something like:
var urlRE= new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
var match = urlRE.exec(s);
alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);
// match contains:
// ["http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present",
// "http://www.somewebsite.com/some/path/somepage.html",
// " descriptive text which may or may not be present"]
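To get from those captures to the anchor tag the question asks for, one possible sketch is below. toLink is a hypothetical helper name; it assumes one URL per string, keeps any text before the URL, falls back to the URL itself when nothing follows it, and does no HTML escaping.

```javascript
// Sketch: build "<a href=URL>trailing text</a>" from the answer's regex captures.
// toLink is a hypothetical helper; assumes at most one URL per string, no HTML escaping.
function toLink(s) {
  var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");
  var match = urlRE.exec(s);
  if (!match) {
    return s; // no URL found: leave the text unchanged
  }
  var url = match[1];
  var text = match[2].trim() || url; // use the URL as link text if nothing follows it
  // Keep whatever came before the URL, then append the anchor.
  return s.slice(0, match.index) + '<a href="' + url + '">' + text + '</a>';
}

console.log(toLink("possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"));
```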
I wrapped your whole regex in parentheses, (), to form the first capturing group, and inside that I made all of your existing groups non-capturing with ?:. You don't have to make them non-capturing, but it simplifies the output. Then I added another group, (.*), to capture everything else up to the end of the line ($).
After calling .exec, if there is a match, the whole match will be in [0], the URL part in [1], and the rest of the text in [2]. This is why we used non-capturing groups: otherwise you would get a bunch of other captures that may or may not be useful.
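A quick sketch of the difference: the first regex below keeps the question's original capturing groups (wrapped and followed by (.*) in the same way), the second is the answer's regex. With capturing groups the trailing text lands in [7] instead of [2], and a repeated group such as ([^ ])+ only remembers its last iteration.

```javascript
// Sketch comparing capture counts with and without ?: on the inner groups.
var capturing = new RegExp("(([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+)(.*)$");
var nonCapturing = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");
var text = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

var a = capturing.exec(text);
var b = nonCapturing.exec(text);

console.log(a.length);      // 8: full match plus 7 groups
console.log(b.length);      // 3: full match plus 2 groups
console.log(a[7] === b[2]); // true: the trailing text just moves from [7] to [2]
console.log(a[6]);          // "l": ([^ ])+ keeps only its last repetition
```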