Link Twitter mentions only when they are not already links
I know this has already been done to death. I have already found many topics on this subject and a lot of advice. However, suppose I have the following two lines:
@testaccount
<a href="http://twitter.com/testaccount">@testaccount</a>
Obviously I don't want to convert the second one to a link, as it already is one. I managed to match the first one without also matching email addresses (thanks to a few questions already here).
Here's the pattern I already have:
/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)/
This converts the former correctly, but the latter also matches and would obviously end up as a "double link".
So I figured out that I should be using something like (?!<\/a>). However, this only drops the last t from testaccount.
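The behaviour described above can be reproduced with a short script. This is only a sketch: the pattern is a simplified version of the one in the question with the negative lookahead appended, and `$subject` is a made-up sample string. Because the quantifiers can backtrack, the engine gives up one character so that the lookahead no longer sees `</a>`, which is why only the final t is lost.

```php
<?php
// Simplified form of the question's pattern, with the negative lookahead appended.
$pattern = '/@([A-Za-z]+[A-Za-z0-9_]+)(?!<\/a>)/';

// Hypothetical sample input: a mention that is already wrapped in a link.
$subject = '<a href="http://twitter.com/testaccount">@testaccount</a>';

// The greedy match "@testaccount" is followed by "</a>", so the lookahead fails.
// The engine then backtracks by one character; "@testaccoun" is followed by "t",
// which satisfies the lookahead, so that shorter match is returned.
preg_match($pattern, $subject, $m);

echo $m[0], PHP_EOL; // prints "@testaccoun"
```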
Basically, I need to find a way to ignore the entire match, not just remove one character. Is it possible?
The language I am using is PHP.
Thanks.
You can use the backtracking control verbs (*SKIP) and (*FAIL) for this.
~<a[^<]*</a>(*SKIP)(*F)|@(\w+)~
The idea is to skip any content inside <a ..> tags. On the left side of the alternation operator we match the subpattern we don't want, then force it to fail, which stops the regex engine from retrying anywhere inside that substring.
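As a rough illustration of how the pattern above might be used with preg_replace, here is a minimal sketch. The sample string and the replacement template are assumptions made for the example, not part of the original answer.

```php
<?php
// Hypothetical sample input: one bare mention and one that is already linked.
$html = '@testaccount <a href="http://twitter.com/testaccount">@testaccount</a>';

// Left branch: consume a whole <a ...>...</a> element, then (*SKIP)(*F) rejects it
// and tells the engine not to retry from any position inside it.
// Right branch: a bare @mention, with the account name captured in group 1.
$pattern = '~<a[^<]*</a>(*SKIP)(*F)|@(\w+)~';

// Assumed replacement template: wrap the mention in a link to the account.
$result = preg_replace($pattern, '<a href="http://twitter.com/$1">@$1</a>', $html);

echo $result, PHP_EOL;
```

Only the bare mention is wrapped; the existing link is left untouched. Two limitations are worth keeping in mind: `[^<]*` stops at the first `<`, so a link whose text contains nested tags (for example `<b>`) is not skipped, and the right-hand branch on its own does not exclude email addresses the way the pattern in the question does.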
Regex bad. Parsing good.
$dom = new DOMDocument();
// wrap the fragment in a <div> so that it has a single root element
$dom->loadHTML("<div>".$your_html_source_here."</div>",
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// select text nodes that contain an @ and are not inside an <a> element
$nodes = $xpath->query("//text()[contains(.,'@')][not(ancestor::a)]");
foreach($nodes as $node) {
    // each of these nodes contains at least one @ to be processed
    // note that text inside <a> tags is excluded by the XPath query above
    preg_match_all("/(?:^|(?<=\s))@\w+/",$node->nodeValue,$matches,
        PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);
    // work backwards - it's easier, because earlier offsets stay valid
    foreach(array_reverse($matches[0]) as $match) {
        list($text,$offset) = $match;
        // preg offsets are byte offsets, but splitText() counts characters
        $start = mb_strlen(substr($node->nodeValue,0,$offset),"UTF-8");
        $node->splitText($start+mb_strlen($text,"UTF-8"));
        $middle = $node->splitText($start);
        // now wrap the text in a link:
        $link = $dom->createElement('a');
        $link->setAttribute("href","http://twitter.com/".substr($text,1));
        $node->parentNode->insertBefore($link,$middle);
        $link->appendChild($middle);
    }
}
// output: strip the wrapper <div> again
$result = substr(trim($dom->saveHTML()),strlen("<div>"),-strlen("</div>"));
(Note: adding <div> around the content ensures there is a root element; without it, parsing will cause problems.)