PHP fatal error: cannot use object of type simple_html_dom as array
I am working on a web scraping application using simple_html_dom. I need to extract all images in a web page. The possibilities are listed below:
- Images in <img> tags
- Images referenced from CSS inside a <style> tag on the page
- Images referenced by inline styling on a <div> or on a different tag
I can download all of these images using the following code.
function download_images($html, $page_url, $local_url) {
    foreach ($html->find('img') as $element) {
        $img_url = $element->src;
        $img_url = rel2abs($img_url, $page_url);
        $parts = parse_url($img_url);
        $img_path = $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'] . $img_path;
        download_file($img_url, $GLOBALS['website_local_root'] . $img_path);
        $element->src = $url_to_be_change;
    }
    $css_inline = $html->find("style");
    $matches = array();
    preg_match_all("/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $img_url = trim($match[1], "\"'");
        $img_url = rel2abs($img_url, $page_url);
        $parts = parse_url($img_url);
        $img_path = $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'] . $img_path;
        download_file($img_url, $GLOBALS['website_local_root'] . $img_path);
        $html = str_replace($img_url, $url_to_be_change, $html);
    }
    return $html;
}
$html = download_images($html, $page_url, $dir); // working fine
$html = str_get_html($html);
$html->save($dir . "/" . $ff);
Please note that I also modify the HTML after the images are downloaded.
The download works fine, but when I try to save the HTML I get the following error:
Fatal error: Cannot use object of type simple_html_dom as array in /var/www/html/app/framework/cache/includes/simple_html_dom.php on line 1167
Important: it works fine if I don't use str_replace or the second loop.
Guess #1
I see a possible error here:
$html = str_get_html($html);
It looks like you are passing an object to str_get_html(), which expects a string as its argument. Let's fix it like this:
$html = str_get_html($html->plaintext);
We can only guess what the $html variable holds by the time it reaches this piece of code.
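One way to take the guessing out is to normalize the value to a string before parsing it again. The following is only a minimal sketch, not the library's own code: FakeDom is a hypothetical stand-in for a simple_html_dom object (it assumes nothing more than a save() method that returns the markup when called without a path), and to_html_string() is a helper name made up for this example.

```php
<?php
// Hypothetical stand-in for a parsed DOM object; NOT the real simple_html_dom class.
class FakeDom
{
    private $markup;

    public function __construct($markup)
    {
        $this->markup = $markup;
    }

    // Assumption: save() without a file path simply returns the markup.
    public function save()
    {
        return $this->markup;
    }
}

// Hypothetical helper: always hand back a plain string, whatever came in.
function to_html_string($value)
{
    if (is_string($value)) {
        return $value;
    }
    if (is_object($value) && method_exists($value, 'save')) {
        return $value->save();
    }
    throw new InvalidArgumentException('Expected a string or a DOM object');
}

$from_object = to_html_string(new FakeDom('<p>hi</p>'));
$from_string = to_html_string('<p>hi</p>');
```

With a helper like this, the caller no longer needs to know whether download_images() returned an object or a string before handing the result to str_get_html().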
Guess #2
Or maybe we just need to use a different variable inside the download_images function so that your code is correct in both cases:
function download_images($html, $page_url, $local_url) {
    foreach ($html->find('img') as $element) {
        $img_url = $element->src;
        $img_url = rel2abs($img_url, $page_url);
        $parts = parse_url($img_url);
        $img_path = $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'] . $img_path;
        download_file($img_url, $GLOBALS['website_local_root'] . $img_path);
        $element->src = $url_to_be_change;
    }
    // preg_match_all() needs a string, so join the text of every <style> element.
    $css_inline = "";
    foreach ($html->find("style") as $style) {
        $css_inline .= $style->innertext;
    }
    // Start from the current markup as a string, so a string is returned even with no matches.
    $result_html = $html->save();
    $matches = array();
    preg_match_all("/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $img_url = trim($match[1], "\"'");
        $img_url = rel2abs($img_url, $page_url);
        $parts = parse_url($img_url);
        $img_path = $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'] . $img_path;
        download_file($img_url, $GLOBALS['website_local_root'] . $img_path);
        // Replace in $result_html so that earlier replacements are kept.
        $result_html = str_replace($img_url, $url_to_be_change, $result_html);
    }
    return $result_html;
}
$html = download_images($html, $page_url, $dir); // working fine
$html = str_get_html($html);
$html->save($dir . "/" . $ff);
Explanation: in the original code, if there is no match (the $matches array is empty), we never enter the second loop, so the $html variable still holds the same value it had at the beginning of the function, which is an object rather than a string. This is a common mistake when reusing one variable in code that needs two different variables.
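The two-variable idea can be sketched without the scraping library. The example below is only an illustration under stated assumptions: rewrite_css_urls() is a hypothetical helper, the sample markup and the /local prefix are made up, and the rel2abs() and download_file() steps from the question are left out. The input stays in $html, the replacements accumulate in a separate $result string, and a string is returned whether or not the pattern matches.

```php
<?php
// Hypothetical helper: prefix every url(...) target found in $html with $local_root.
function rewrite_css_urls($html, $local_root)
{
    $result = $html; // separate variable that is always a string
    $matches = array();
    preg_match_all('/url\((.*?)\)/', $html, $matches, PREG_SET_ORDER);

    // Collect the URLs, stripping optional quotes and spaces around them.
    $urls = array();
    foreach ($matches as $match) {
        $urls[] = trim($match[1], "\"' ");
    }

    // Replace each distinct URL once, always working on $result
    // so that earlier replacements are not thrown away.
    foreach (array_unique($urls) as $img_url) {
        $result = str_replace($img_url, $local_root . $img_url, $result);
    }

    return $result;
}

$page = '<style>a{background:url("/img/a.png")} b{background:url(/img/b.png)}</style>';
$rewritten = rewrite_css_urls($page, '/local');
$untouched = rewrite_css_urls('<p>no css here</p>', '/local');
```

When nothing matches, $untouched is simply the input string again, so the caller can pass the result to a string parser in both cases.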
I had this error and solved it by using (in my case) return $html->save(); at the end of the function. I cannot explain why two instances with different variable names, scoped to different functions, caused this error. I guess this is how the "simple html dom" class works.
Just to be clear, try calling $html->save() before doing anything else afterwards.
I hope this information helps someone :)