Link Twitter mentions only when there is no link yet

Question

I know this has been done to death, and I have already found many threads and a lot of advice on the subject. However, suppose I have the following lines:

@testaccount
<a href="http://twitter.com/testaccount">@testaccount</a>

Obviously I don't want to convert the second one, because it is already a link. Thanks to a few existing questions here, I managed to match the first one without also matching email addresses.

Here's the pattern I have so far:

/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)/

This converts the first line correctly, but the second one obviously ends up as a link nested inside a link.
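
To make the problem concrete, here is a minimal sketch (not part of the original question) that runs the pattern above through preg_replace on the two sample lines. The replacement string, which turns @name into a link to http://twitter.com/name, is an assumption modelled on the second sample line.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)/';
// Assumed replacement: link the mention to its profile, as in the sample.
$replacement = '<a href="http://twitter.com/$1">@$1</a>';

$link   = '<a href="http://twitter.com/testaccount">@testaccount</a>';
$input  = "@testaccount\n" . $link;
$output = preg_replace($pattern, $replacement, $input);

// The '>' before the second @ satisfies the lookbehind, so the mention that is
// already a link gets wrapped in a second <a>, nested inside the first one.
echo $output, "\n";
```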

So I figured out that I should be using something like the negative lookahead (?!<\/a>). However, this only removes the final t of testaccount from the match.

Basically, I need a way to ignore the entire match, not just drop one character. Is that possible?
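
A small sketch of why the lookahead only trims one character: when (?!<\/a>) fails right after testaccount, the engine backtracks, gives up the final t, and the lookahead then succeeds in front of t</a>. Making the quantifier possessive (++) forbids that backtracking, so the linked mention is rejected as a whole. The possessive variant is an illustration that does not come from any of the answers, and like the original lookahead it only detects a </a> that immediately follows the name.

```php
<?php
$linked = '<a href="http://twitter.com/testaccount">@testaccount</a>';

// The question's pattern with (?!<\/a>) appended.
$backtracking = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)(?!<\/a>)/';
preg_match($backtracking, $linked, $m);
// The engine hands back the final "t" so that the lookahead can succeed.
echo $m[0], "\n"; // @testaccoun

// Same set of usernames (a letter followed by 1+ word characters), but ++
// never gives characters back, so a failed lookahead rejects the whole mention.
$possessive = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z][A-Za-z0-9_]++)(?!<\/a>)/';
$found = preg_match($possessive, $linked);
echo $found, "\n"; // 0
```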

The language I am using is PHP.

Thanks

php regex

CircularRecursion, Oct 23 '14 at 16:19

3 answers

You can use the backtracking control verbs (*SKIP) and (*FAIL) effectively here.

~<a[^<]*</a>(*SKIP)(*F)|@(\w+)~

The idea is to skip any content between <a ...> and </a> tags. On the left side of the alternation operator we match the subpattern we don't want and force it to fail; (*SKIP) then tells the regex engine not to retry that substring but to resume the search after it.

hwnd, Oct 23 '14 at 16:30

Regex bad. Parsing good.

$dom = new DOMDocument();
$dom->loadHTML("<div>".$your_html_source_here."</div>",
                                      LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//text()[contains(.,'@')][not(ancestor::a)]");
foreach($nodes as $node) {
    // each of these nodes contains at least one @ to be processed
    // note that children of <a> tags are automatically ignored
    preg_match_all("/(?:^|(?<=\s))@\w+/",$node->nodeValue,$matches,
                                           PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);
    // work backwards - it's easier, because splitting near the end of the node
    // leaves the offsets of earlier matches unchanged
    foreach(array_reverse($matches[0]) as $match) {
        list($text,$byteOffset) = $match;
        // PREG_OFFSET_CAPTURE reports byte offsets, but splitText() counts characters
        $offset = mb_strlen(substr($node->nodeValue,0,$byteOffset),'UTF-8');
        $node->splitText($offset+mb_strlen($text,'UTF-8'));
        $middle = $node->splitText($offset);
        // now wrap the text in a link:
        $link = $dom->createElement('a');
        $link->setAttribute("href","http://twitter.com/".substr($text,1));
        $node->parentNode->insertBefore($link,$middle);
        $link->appendChild($middle);
    }
}
// output
$result = substr(trim($dom->saveHTML()),strlen("<div>"),-strlen("</div>"));

(Note: wrapping the content in a <div> ensures there is a single root element; otherwise the parsed result can come out wrong.)

Niet the dark absol, Oct 23 '14 at 16:55

Avinash Raj · Accepted Answer · 2014-10-23T16:24:54+0000

You need to add .*? before <\/a> inside the negative lookahead. That way it won't match @ mentions that are already inside a link.

(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z0-9_]+)(?!.*?<\/a>)
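
A sketch of the accepted pattern with preg_replace (the replacement string is an assumption). Because . does not match a newline by default, the lookahead only examines the rest of the current line, so a bare mention that shares a line with any later </a> is left unlinked as well.

```php
<?php
// Pattern from the accepted answer.
$pattern = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z0-9_]+)(?!.*?<\/a>)/';
// Assumed replacement string.
$replacement = '<a href="http://twitter.com/$1">@$1</a>';

$link = '<a href="http://twitter.com/testaccount">@testaccount</a>';

// Sample from the question: only the bare mention on the first line is linked.
$out1 = preg_replace($pattern, $replacement, "@testaccount\n" . $link);
echo $out1, "\n";

// Limitation: "." stops at a newline, so the lookahead scans the rest of the
// line; a bare mention followed by any </a> on the same line is skipped too.
$sameLine = '@other ' . $link;
$out2 = preg_replace($pattern, $replacement, $sameLine);
echo $out2, "\n"; // unchanged
```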
