Remove HTML element from parsed HTML document provided

Question

I have parsed an HTML document using PHP Simple HTML DOM Parser. The parsed document has a ul tag with some li tags in it. One of these li tags contains one of those dreaded AddThis buttons that I want to remove.

To make it worse, the list item has no class or ID, and it is not always in the same position in the list. So there is no easy way (correct me if I'm wrong) to remove it using a parser.

What I want to do is search for the string "addthis.com" in all li elements and remove any element that contains that string.

<ul>
    <li>Foobar</li>
    <li>addthis.com</li><!-- How do I remove this? -->
    <li>Foobar</li>
</ul>

FYI: This is purely a hobby project in my quest to learn PHP, not a case of stealing content for profit.

All suggestions are welcome!

+3

string substring html php html-parsing

Gabriel Smoljár 11 Mar 12 at 11:44


3 answers

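One option is to remove the item with jQuery in the browser, after the page has loaded. Note that this runs client-side, so the element is still present in the HTML your PHP script outputs. Something like this:

$('li').each(function() {
    // Match any li whose markup contains the string, not only an exact match
    if ($(this).html().indexOf("addthis.com") !== -1) {
        $(this).remove();
    }
});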

+1

Hans Wassink 11 Mar 12 at 11:55


This solution uses the DOMDocument class and DOMNode::removeChild():

$str="<ul><li>Foobar</li><li>addthis.com</li><li>Foobar</li></ul>";
$remove='addthis.com';
$doc = new DOMDocument();
$doc->loadHTML($str);
$elements = $doc->getElementsByTagName('li');
$domElemsToRemove = array();
foreach ($elements as $element) {
  $pos = strpos($element->textContent, $remove); // or similar $element->nodeValue
  if ($pos !== false) {
    $domElemsToRemove[] = $element;
  }
}
foreach( $domElemsToRemove as $domElement ){
  $domElement->parentNode->removeChild($domElement);
}
$str = $doc->saveHTML(); // contains <ul><li>Foobar</li><li>Foobar</li></ul>, wrapped in the doctype/html/body that loadHTML() adds
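
The code above compares against textContent, which only sees the visible text of each li. A real AddThis button usually carries the string in an attribute such as href, so a variation that checks the serialized markup of each li may be closer to what the question asks for. The following is only a sketch under that assumption: removeListItemsContaining() is a made-up helper name, and the sample markup (including the bookmark.php URL) is invented for illustration.

```php
<?php
// Sketch: remove every <li> whose serialized markup mentions a given string.
// removeListItemsContaining() is a hypothetical helper, not part of any library.
function removeListItemsContaining(string $html, string $needle): string
{
    $doc = new DOMDocument();
    // Keep libxml quiet about imperfect real-world HTML while loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array so removals do not disturb iteration.
    $items = iterator_to_array($doc->getElementsByTagName('li'));
    foreach ($items as $li) {
        // saveHTML($node) serializes the element including its tags and attributes.
        if (strpos($doc->saveHTML($li), $needle) !== false && $li->parentNode !== null) {
            $li->parentNode->removeChild($li);
        }
    }

    return $doc->saveHTML();
}

// Invented sample input: the string appears only inside an href attribute.
$input = '<ul><li>Foobar</li>'
       . '<li><a class="addthis_button" href="http://www.addthis.com/bookmark.php">Share</a></li>'
       . '<li>Foobar</li></ul>';

echo removeListItemsContaining($input, 'addthis.com');
```

Passing a node to saveHTML() requires PHP 5.3.6 or later, and the type declarations in the sketch need PHP 7. As with the code above, the returned string includes the doctype and html/body wrappers that loadHTML() adds.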

0

Stano 21 Jul at 8:54 am


Adam · Accepted Answer · 2012-03-11T12:02:13+0000

I could not find a method that removes nodes explicitly, but an element can be removed by setting its outertext to an empty string.

$html = new simple_html_dom();
$html->load(file_get_contents("test.html"), false, false); // preserve formatting

foreach($html->find('ul li') as $element) {
  if (count($element->find('a.addthis_button')) > 0) {
    $element->outertext="";
  }
}

echo $html;
