Is there a better way than DOM parsing when there are no unique identifiers?

I was still messing around with PHP Simple HTML DOM and came across a tricky scenario. There are no unique tags or attributes I can use to pinpoint what I want, just tons of <a> tags. The only structure is that they are grouped between comments.

If I do

foreach($html->find('comment a') as $a){
    $articles[] = array($a->href,$a->innertext);
}

      

I get tons of results. Is there a way to specify that I only need the <a> tags between the first and second comments, the third and fourth, and so on? Or is DOM parsing not the best tool for a situation like this, where the HTML is just a mess?

Excerpt:

<! FIRST COLUMN STARTS HERE>
<center><table CELLPADDING="3" WIDTH="100%"><tr>
<td ALIGN="LEFT" VALIGN="TOP" WIDTH="30%"><tt><b>
<A HREF="http://foo.bar">Text text text...</A><BR><BR>

      

Thanks.



1 answer


It is possible, but it is better to rely on the order of the tags rather than on their nesting.

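// Group 0 collects any links that appear before the first comment.
$articles = array(array());
// Select comments and links together so they come back in document order.
foreach($html->find('comment, a') as $a) {
    if ($a->nodetype == HDOM_TYPE_COMMENT) {
        // Each comment starts a new, initially empty group.
        $articles[] = array();
    } else {
        // Append the link to the most recently started group.
        $articles[count($articles) - 1][] = array($a->href,$a->innertext);
    }
}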

      

This (rather crude) code creates an array of arrays: one for any <a> tags before the first comment and one for each comment. Each inner array contains zero or more articles, depending on how many links appear between its comment and the next.
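To make the resulting shape concrete, here is a minimal sketch of the same grouping logic run on a hand-written list of nodes instead of a parsed page. It does not use Simple HTML DOM at all. The function name `groupLinksByMarker`, the `'marker'`/`'link'` node types, and the sample URLs are made up for illustration.

```php
<?php
// Hypothetical helper: groups links by the marker (comment) that precedes them.
// Each node is a plain array standing in for a parsed node, in document order.
function groupLinksByMarker(array $nodes): array
{
    $groups = array(array()); // group 0 holds links before the first marker
    foreach ($nodes as $node) {
        if ($node['type'] === 'marker') {
            $groups[] = array(); // a marker opens a new, initially empty group
        } else {
            // append the link to the most recently opened group
            $groups[count($groups) - 1][] = array($node['href'], $node['text']);
        }
    }
    return $groups;
}

// Made-up sample data, not taken from the real page.
$nodes = array(
    array('type' => 'link', 'href' => 'http://example.com/0', 'text' => 'Before'),
    array('type' => 'marker'),
    array('type' => 'link', 'href' => 'http://example.com/1', 'text' => 'One'),
    array('type' => 'link', 'href' => 'http://example.com/2', 'text' => 'Two'),
    array('type' => 'marker'),
);

$groups = groupLinksByMarker($nodes);
```

With this sample, `$groups` has three entries: one link before the first marker, two links after the first marker, and an empty array after the second marker.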



By the way, this code will not work with the snippet you provided, because the comments there start with <! and end with > instead of <!-- and --> respectively. I am assuming the comments appear correctly in the actual HTML markup.

EDIT: OK, so the "comments" really do appear in the page as they do in the snippet. In that case, Simple HTML DOM appears to classify any other tag starting with <! as "unknown". So if you add that type to the code above, you get the arrays you are after:

$articles = array(array());
foreach($html->find('comment, unknown, a') as $a) {
    if (in_array($a->nodetype, array(HDOM_TYPE_COMMENT, HDOM_TYPE_UNKNOWN))) {
        $articles[] = array();
    } else {
        $articles[count($articles) - 1][] = array($a->href,$a->innertext);
    }
}
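
The question asks specifically for the links between the first and second comments, the third and fourth, and so on. Assuming the markers really do come in start/end pairs (the excerpt only shows a start marker, so this is a guess), those are the groups at odd indexes of `$articles`. A rough sketch, with a hypothetical helper name and made-up sample data:

```php
<?php
// Hypothetical helper: keep only the groups opened by the 1st, 3rd, 5th... marker.
// Group k holds the links that follow the k-th marker, so odd k means
// "between marker k and marker k + 1".
function groupsBetweenMarkerPairs(array $groups): array
{
    $selected = array();
    foreach ($groups as $index => $group) {
        if ($index % 2 === 1) {
            $selected[] = $group;
        }
    }
    return $selected;
}

// Made-up data in the same shape as $articles above.
$articles = array(
    array(),                                    // before the first marker
    array(array('http://example.com/a', 'A')),  // between markers 1 and 2
    array(array('http://example.com/x', 'X')),  // between markers 2 and 3
    array(array('http://example.com/b', 'B')),  // between markers 3 and 4
    array(),                                    // after marker 4
);

$wanted = groupsBetweenMarkerPairs($articles);
```

Here `$wanted` keeps the "A" and "B" groups and drops the rest. If the page only has start markers, skip this step and use `$articles` as is.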

      
