Create array from content of <div> tags in PHP

I have the content of a web page assigned to a variable $html

Here's some sample content of $html:

<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>

      

How, using PHP, can I create an array from it that holds the contents of each <div class="content"></div> element, so that (for the example above):

echo $array[0] . "\n" . $array[1]; //etc

      

outputs

something here
more stuff

      

+2




5 answers


Assuming this is just a simplified case in the OP, and the real situation is trickier, you'll want to use XPath.

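If it is really tricky, you can use DOMDocument (with DOMXPath), but here is a simple example using SimpleXML. Note that SimpleXML needs well-formed XML with a single root element, so the fragment is wrapped in one first:

$xml = new SimpleXMLElement('<root>' . $html . '</root>');

$result = $xml->xpath('//div[@class="content"]');

// each() was removed in PHP 8, so iterate with foreach instead
foreach ($result as $node) {
    echo $node, "\n";
}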

      

Since you explicitly asked about creating an array for this, you can use:

$res_Arr = array();
foreach ($result as $node) {
    // cast to string so the array holds text rather than SimpleXMLElement objects
    $res_Arr[] = (string) $node;
}

and $res_Arr will be an array with the content you are looking for.

See http://php.net/manual/en/simplexmlelement.xpath.php for PHP SimpleXML XPath info and http://www.w3.org/TR/xpath for the XPath spec.
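For the trickier case mentioned above, here is a minimal sketch of the DOMDocument plus DOMXPath route. Unlike SimpleXML, loadHTML() tolerates markup that is not well-formed XML. The sample input is the one from the question; the class-matching XPath expression (which also accepts divs carrying extra classes) and the variable names are my own assumptions, not something the question requires.

```php
<?php
// Sample input taken from the question.
$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them;
// real-world pages are rarely perfectly valid.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "content" as one of possibly several space-separated class names.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';

$array = [];
foreach ($xpath->query($query) as $div) {
    $array[] = $div->textContent; // text of the div, with any inner tags stripped
}

echo $array[0] . "\n" . $array[1] . "\n";
```

If only an exact class="content" should match, the simpler query //div[@class="content"] from the SimpleXML example works here too.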

+3




PHP has several ways to process HTML, including DOMDocument and SimpleXML. See Parsing HTML with PHP and DOM. Here's an example:

$dom = new DOMDocument;
// set parser options before loading so they can take effect
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
  $class = $div->getAttribute('class');
  if ($class == 'content') {
    echo $div->nodeValue . "\n";
  }
}

      

Technically, a class attribute can hold multiple space-separated classes, so you can use:



$classes = explode(' ', $class);
if (in_array('content', $classes)) {
  ...
}

      

The SimpleXML / XPath approach is more concise, but if you don't want to go the XPath path (and learn another technology at least enough to accomplish such tasks), then the above is a programmatic alternative.
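Putting the two snippets above together, a rough sketch of a helper that returns the array the question asks for might look like this. The function name divContentsByClass and the extra sample divs are hypothetical, added only for illustration; splitting the class list with preg_split() instead of explode() is my own choice so that runs of whitespace are handled.

```php
<?php
/**
 * Hypothetical helper: returns the text of every <div> whose
 * class list contains $className.
 */
function divContentsByClass(string $html, string $className): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parse warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $contents = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Split the class attribute on any run of whitespace.
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $contents[] = $div->nodeValue;
        }
    }
    return $contents;
}

// Sample input based on the question, plus two assumed variations.
$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content featured">more stuff</div>'
      . '<div class="sidebar">ignored</div>';

$array = divContentsByClass($html, 'content');
echo implode("\n", $array), "\n";
```

The div with class="content featured" is included and the sidebar div is skipped, which is the behaviour the multiple-class check is meant to give.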

+1




You probably need to use preg_match_all():

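$matches = array();
// No U modifier: combined with U, .*? turns greedy and swallows everything
// from the first <div to the last </div>. [^>]* keeps the match inside one tag.
preg_match_all('`<div[^>]*class="content"[^>]*>(.*?)</div>`is',$html,$matches,PREG_SET_ORDER);
foreach($matches as $m){
  // $m[1] represents the content in <div class="content">
}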
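As a small sketch of how a regex can yield the array directly: with the default PREG_PATTERN_ORDER flag, preg_match_all() groups results by capture group, so the first group is already the list of contents. The pattern below is my own variant, and it assumes the target divs contain no nested <div> elements; with nesting, the lazy match stops at the first inner </div>.

```php
<?php
// Sample input taken from the question.
$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

// [^>]* stays inside the opening tag; (.*?) lazily captures up to the
// first closing </div>. "i" = case-insensitive, "s" = dot matches newlines.
$pattern = '`<div[^>]*\bclass="content"[^>]*>(.*?)</div>`is';

// With the default PREG_PATTERN_ORDER, $matches[1] holds every capture
// of group 1, one entry per matched div.
preg_match_all($pattern, $html, $matches);
$array = $matches[1];

echo $array[0] . "\n" . $array[1] . "\n";
```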

      

0




There is not much you can do without string manipulation functions or regular expressions. You can load your HTML as XML using the DOM library and use that to navigate to your div, but this can get cumbersome if you're not careful or if the structure is complex.

http://ca3.php.net/manual/en/book.dom.php

0




Looks like Kalem13 beat me, but I agree. You can use the DOMDocument class. I haven't used it personally, but I think it will work for you. First, you instantiate the DOMDocument object, then you load the $html variable using the loadHTML() method. Then you can use the getElementsByTagName() method to get the divs.

0

