Create array from content of <div> tags in php

Question


I have the content of a web page assigned to a variable, $html.

Here's some sample content of $html:

<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>

Using PHP, how can I create an array from it that holds the contents of each <div class="content"></div> element, so that (for the example above):

echo $array[0] . "\n" . $array[1]; //etc

outputs

something here
more stuff

+2

arrays php parsing html-parsing

827 Oct 20 '09 at 4:27


5 answers

PHP has several ways to process HTML, including DOMDocument and SimpleXML. See Parsing HTML with PHP and DOM. Here's an example:

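$dom = new DOMDocument;
// Set parser options before loading so they can take effect.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
  $class = $div->getAttribute('class');
  // Print the text of each div whose class attribute is exactly "content".
  if ($class == 'content') {
    echo $div->nodeValue . "\n";
  }
}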

Technically, a class attribute can hold multiple space-separated classes, so you can use:

$classes = explode(' ', $class);
if (in_array('content', $classes)) {
  ...
}

The SimpleXML / XPath approach is more concise, but if you don't want to go the XPath path (and learn another technology at least enough to accomplish such tasks), then the above is a programmatic alternative.
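Since the question asks for an array rather than echoed output, the two snippets above can be combined into a minimal sketch like the following. The helper name divContentsByClass is hypothetical (it is not from the answer), and the libxml_use_internal_errors() calls are an added assumption to keep imperfect real-world HTML from flooding the output with warnings.

```php
<?php
// Sample input from the question.
$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

// Hypothetical helper: returns the text of every <div> whose class list contains $className.
function divContentsByClass(string $html, string $className): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $contents = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Split on any run of whitespace so multi-class attributes are handled.
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $contents[] = $div->nodeValue;
        }
    }
    return $contents;
}

$array = divContentsByClass($html, 'content');
echo $array[0] . "\n" . $array[1] . "\n";
```

For the sample input, $array[0] and $array[1] hold the text of the two content divs, which is what the question's echo line expects.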

+1

cletus Oct 20 At 4:47 am


You probably need to use preg_match_all():

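$matches = array();
// (.*?) is lazy, so each match stops at the first </div>. Do not add the "U"
// modifier: it inverts greediness and would make (.*?) greedy.
// [^>]* keeps the attribute groups inside the opening tag.
preg_match_all('`<div([^>]*)class="content"([^>]*)>(.*?)</div>`is', $html, $matches, PREG_SET_ORDER);
foreach($matches as $m){
  // $m[3] holds the content inside <div class="content">
}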
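To get the flat array the question asks for, one option is to capture only the inner content and read it from preg_match_all()'s default result layout. This is a sketch under assumptions: the pattern below is a simplified variant of the one above, it only matches a class attribute that is exactly "content" in double quotes, and like any regex approach it will break on nested divs.

```php
<?php
// Sample input from the question.
$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

// Group 1 is the text between the opening and closing tags.
// "i" ignores case and "s" lets "." match newlines inside the div.
preg_match_all('`<div[^>]*\bclass="content"[^>]*>(.*?)</div>`is', $html, $matches);

// With the default PREG_PATTERN_ORDER flag, $matches[1] is a flat list
// of every group-1 capture, in document order.
$array = $matches[1];
echo implode("\n", $array), "\n";
```

Reading $matches[1] avoids the loop entirely, at the cost of discarding the other parts of each match.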

0

mauris Oct 20 '09 at 4:30


There is not much you can do without using string manipulation functions or regular expressions. You can load your HTML as XML using the DOM library and use that to navigate to your divs, but this can get cumbersome if you're not careful or if the structure is complex.

http://ca3.php.net/manual/en/book.dom.php

0

Laurent bourgault-roy Oct 20 At 4:36 am


Looks like Kalem13 beat me, but I agree. You can use the DOMDocument class. I haven't used it personally, but I think it will work for you. First, you instantiate the DOMDocument object, then you load the $html variable using the loadHTML() function. Then you can use the getElementsByTagName() function.

0

Abinadi Oct 20 At 4:38 am


Jonathan Fingland · Accepted Answer · 2009-10-20T04:38:58+0000

Assuming the example in the question is a simplified case and the real situation is trickier, you'll want to use XPath.

If it is really tricky, you can use DOMDocument (with DOMXPath), but here is a simple example using SimpleXML:

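// $html must be well-formed XML with a single root element, or the constructor throws.
$xml = new SimpleXMLElement($html);

$result = $xml->xpath('//div[@class="content"]');

// Print the text of each matching div (each() no longer exists in PHP 8).
foreach ($result as $node) {
    echo $node,"\n";
}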

Since you explicitly asked about creating an array for this, you can use:

$res_Arr = array();
foreach ($result as $node) {
    // Cast to string to store the text rather than the SimpleXMLElement object.
    $res_Arr[] = (string) $node;
}

and $res_Arr will be an array with the content you are looking for.

See http://php.net/manual/en/simplexmlelement.xpath.php for PHP SimpleXML XPath info and http://www.w3.org/TR/xpath for the XPath specs.
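SimpleXML only accepts well-formed XML with a single root element, which the sample in the question (and most real web pages) is not. The DOMDocument with DOMXPath route mentioned in this answer tolerates ordinary HTML, and could look roughly like this. It is a sketch, not part of the original answer: the token-matching XPath expression and the libxml error handling are added assumptions.

```php
<?php
// Sample input from the question.
$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// Pad the class list with spaces so "content" matches only as a whole
// class name, not as a substring of something like "contents".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';

$res_Arr = [];
foreach ($xpath->query($query) as $node) {
    $res_Arr[] = $node->textContent;
}
echo implode("\n", $res_Arr), "\n";
```

The simpler //div[@class="content"] expression also works here, but it only matches when "content" is the sole class on the element.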
