Create an array from the content of <div> tags in PHP
I have the content of a web page assigned to a variable, $html.
Here's some sample content of $html:
<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>
How, using PHP, can I create an array from it that holds the contents of each <div class="content"></div> element, so that (for the example above):
echo $array[0] . "\n" . $array[1]; //etc
outputs
something here
more stuff
Assuming this is just a simplified case in the OP, and the real situation is trickier, you'll want to use XPath.
If it is really tricky then you can use DOMDocument (with DOMXPath), but here is a simple example using SimpleXML:
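// SimpleXMLElement requires well-formed XML with a single root element.
$xml = new SimpleXMLElement($html);
$result = $xml->xpath('//div[@class="content"]');
// each() was removed in PHP 8; foreach iterates the result directly.
foreach ($result as $node) {
    echo $node, "\n";
}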
Since you explicitly asked about creating an array, you can use:
$res_Arr = array();
foreach ($result as $node) {
    // Cast to string so the array holds text rather than SimpleXMLElement objects.
    $res_Arr[] = (string) $node;
}
and $res_Arr will be an array with the content you are looking for.
See http://php.net/manual/en/simplexmlelement.xpath.php for PHP SimpleXML XPath info and http://www.w3.org/TR/xpath for the XPath spec.
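For the trickier case mentioned above (real-world HTML that is not well-formed XML), the DOMDocument plus DOMXPath route might look roughly like this. It is a minimal sketch, not a definitive implementation: the sample $html is taken from the question, the libxml_use_internal_errors() call is my addition to keep parser warnings quiet, and the XPath expression is one common way to match a class token when the attribute lists several classes.

```php
<?php
// Sample input from the question; real pages will be messier.
$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content">more stuff</div>';

$dom = new DOMDocument();
// Assumption: silence the warnings loadHTML() raises for imperfect markup.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match any div whose class list contains the token "content".
$nodes = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]'
);

// Collect the text of each matching div into a plain array of strings.
$array = [];
foreach ($nodes as $node) {
    $array[] = $node->textContent;
}

echo $array[0] . "\n" . $array[1] . "\n";
```

Unlike SimpleXMLElement, loadHTML() tolerates fragments and unclosed tags, which is why this variant tends to be the safer choice for scraped pages.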
PHP has several ways to process HTML, including DOMDocument and SimpleXML. See Parsing HTML with PHP and DOM. Here's an example:
$dom = new DOMDocument;
// Parser options such as preserveWhiteSpace must be set before loading to have any effect.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
    $class = $div->getAttribute('class');
    if ($class == 'content') {
        echo $div->nodeValue . "\n";
    }
}
Technically, a class attribute can hold multiple classes, so you can use:
$classes = explode(' ', $class);
if (in_array('content', $classes)) {
    ...
}
The SimpleXML / XPath approach is more concise, but if you don't want to go the XPath path (and learn another technology at least enough to accomplish such tasks), then the above is a programmatic alternative.
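To get the array the question asks for, the loop and the class check can be combined into one function. This is only a sketch under a few assumptions: the helper name divContentsByClass is hypothetical (not from the original answer), the class attribute is split on any whitespace with preg_split() rather than on single spaces, and libxml warnings are suppressed because real pages rarely parse cleanly.

```php
<?php
/**
 * Hypothetical helper: collect the text of every <div> whose class
 * attribute contains $className as a whole token.
 */
function divContentsByClass(string $html, string $className): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // assumption: keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $contents = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Split on any run of whitespace, not just single spaces.
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $contents[] = $div->nodeValue;
        }
    }
    return $contents;
}

// Sample based on the question, with extra classes added for illustration.
$html = '<div class="content wide">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="contents">not a match</div>'
      . '<div class="content">more stuff</div>';

$array = divContentsByClass($html, 'content');
echo implode("\n", $array) . "\n";
```

Matching whole tokens means class="contents" is skipped while class="content wide" is kept, which a plain string comparison or a substring search would get wrong.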
You probably need to use preg_match_all():
$matches = array();
// No "U" flag here: it would invert the lazy ".*?" quantifiers and make them greedy.
preg_match_all('`<div(.*?)class="content"(.*?)>(.*?)</div>`is', $html, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    // $m[3] is the content inside <div class="content">
}
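As a rough sketch of how the matches could be collected into the array from the question: the pattern here uses lazy quantifiers without the U modifier (which would invert them to greedy), and the sample $html is copied from the question. Treat it as an illustration for simple, flat markup only; a regex like this breaks on nested divs or on attributes written with single quotes.

```php
<?php
// Sample input from the question, one element per line.
$html = '<div class="content">something here</div>' . "\n"
      . '<span>something random thrown in <strong>here</strong></span>' . "\n"
      . '<div class="content">more stuff</div>';

$matches = array();
// "i" = case-insensitive, "s" = let "." match newlines inside a div.
preg_match_all(
    '`<div(.*?)class="content"(.*?)>(.*?)</div>`is',
    $html,
    $matches,
    PREG_SET_ORDER
);

$array = array();
foreach ($matches as $m) {
    // Group 3 is the text between the opening and closing div tags.
    $array[] = $m[3];
}

echo $array[0] . "\n" . $array[1] . "\n";
```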
Looks like Kalem13 beat me, but I agree. You can use the DOMDocument class. I haven't used it personally, but I think it will work for you. First, you instantiate a DOMDocument object, then you load the $html variable using the loadHTML() function. Then you can use the getElementsByTagName() function.