Simple HTML DOM: echo only the text of an anchor

Summary of my code:

foreach($html->find('a') as $element) {

...and I use the following for the inner text:

$element->innertext

      

Is there any way to display only the text of an anchor, with Simple HTML DOM or otherwise? I'm trying to crawl about 10k links, but in some cases, when the <a> tag contains other markup, it also prints div code, image code, etc.

If the <a> tag is standard (simple), for example:

<a href="http://www.test.com">Anchor Text</a>

      

So in this case $element->innertext will be "Anchor Text".

But if the cases are:

1    <a href="http://www.test.com"><div id=whatever>Anchor Text</div></a>

      

or

2    <a href="http://www.test.com"><img src="whatever" /></a>

      

then $element->innertext will be:

Result1 <div id=whatever>Anchor Text</div>
Result2 <img src="whatever" />

      

Is there any way to print ONLY the text, or should I write my own custom conditions for each case (div, img, etc.)?

+3




3 answers


It's as easy as strip_tags($element->innertext);



The result will be an empty string if the anchor is an image.
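As a rough sketch of how this behaves on the three cases from the question, the snippet below uses plain strings as stand-ins for $element->innertext (the array and variable names are made up for illustration). It also skips anchors whose text comes out empty, which is one possible way to handle image-only links.

```php
<?php
// Stand-ins for $element->innertext in the three cases from the question.
$innerTexts = [
    'Anchor Text',
    '<div id=whatever>Anchor Text</div>',
    '<img src="whatever" />',
];

$texts = [];
foreach ($innerTexts as $inner) {
    // strip_tags() removes the markup; trim() drops surrounding whitespace.
    $texts[] = trim(strip_tags($inner));
}

foreach ($texts as $text) {
    // Skip anchors that contain no text at all (e.g. image-only links).
    if ($text !== '') {
        echo $text, PHP_EOL;
    }
}
```

Whether to skip empty results or fall back to something else (such as the image's alt attribute) depends on what the crawl needs.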

+3




Use the plaintext property:



     strip_tags($element->plaintext)
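Since the question also asks about options other than Simple HTML DOM, here is a minimal sketch of the same idea with PHP's bundled DOM extension, where textContent plays a role similar to plaintext. This is a swapped-in technique, not part of the answer above; the sample markup and the $anchorTexts variable are assumptions for illustration, and the dom extension is assumed to be enabled.

```php
<?php
// Hypothetical sample markup covering the three cases from the question.
$html = '<a href="http://www.test.com">Anchor Text</a>'
      . '<a href="http://www.test.com"><div id="whatever">Anchor Text</div></a>'
      . '<a href="http://www.test.com"><img src="whatever" /></a>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$anchorTexts = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // textContent is the concatenated text of all descendants, without tags.
    $anchorTexts[] = trim($a->textContent);
}

print_r($anchorTexts);
```

As with strip_tags, an image-only anchor yields an empty string here, so the caller still has to decide what to do with those links.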

      

+2




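// Convert non-ASCII characters in the anchor's inner HTML to HTML entities.
$mbHtml = mb_convert_encoding($element->innertext, 'HTML-ENTITIES', 'utf-8');
// Insert a space before these opening tags, presumably so words from adjacent blocks stay separated if the tags are stripped later.
$mbHtml = mb_eregi_replace('<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)', ' <\\1', $mbHtml );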
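The snippet above stops before any tags are actually removed. One way the idea could be completed is sketched below, assuming the goal is to keep words from adjacent blocks apart once the markup is stripped. The anchorText() helper is hypothetical, and it uses preg_replace rather than mb_eregi_replace (with an added \b so that, for example, <li does not also match <link), so it is a variation on the answer rather than the answer's own code.

```php
<?php
// Hypothetical helper: pad selected opening tags with a space,
// strip all tags, then tidy up the whitespace.
function anchorText(string $innerHtml): string
{
    // Insert a space before selected opening tags so adjacent words stay separated.
    $padded = preg_replace(
        '/<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)\b/i',
        ' <$1',
        $innerHtml
    );
    // Remove the markup, then collapse runs of whitespace into single spaces.
    $text = strip_tags($padded);
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo anchorText('<div>Anchor</div><div>Text</div>'), PHP_EOL;
```

Without the padding step, strip_tags alone would join the two div contents into a single word, which is the case this approach seems intended to avoid.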

      

0

