How to get metadata from HTML file using PHP?
I am trying to build a feature on my website where users can submit links, like on Digg. I have some code that fetches the HTML source of a URL that a user submits to my site and stores it in a TXT file. Then I want to grab the content of the tag
<meta name="description" content="GRAB THIS">
assuming this tag exists. Sometimes it works, but other times it doesn't, even though the source of that particular web page contains the meta tag exactly as I specified in my code. It seems to go wrong when the "GRAB THIS" content contains HTML entities (&amp;, etc.). Please let me know if you have any ideas on how to get this to work. Here is my code:
$html_data = file_get_contents( $path_to_txt_file_that_contains_html );
preg_match( '#<meta name="description" content="(.+?)">#si', $html_data, $tor );
$tor = str_replace ( '<meta name="description" content="' , "", $tor[0] );
$tor = str_replace ( '">', "", $tor );
Sometimes $tor still contains
<meta name="description" content="CONTENT"
but without the closing >, so my code breaks as soon as I put this into the MySQL database. Any ideas on what I am doing wrong? Thanks in advance for any help!
It's actually very simple.
PHP offers its own built-in solution: http://php.net/manual/en/function.get-meta-tags.php
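A minimal sketch of how get_meta_tags() could be used here. The sample HTML and the temporary file are made up for illustration and stand in for the TXT file from the question; the function reads a file path or URL and returns an array keyed by the lowercased name attribute.

```php
<?php
// Hypothetical sample page, written to a temporary file so the example
// is self-contained. In the question this would be the saved TXT file.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $path,
    '<html><head><title>Demo</title>'
    . '<meta name="description" content="GRAB THIS">'
    . '<meta name="keywords" content="php, meta">'
    . '</head><body><p>Hello</p></body></html>'
);

// get_meta_tags() takes a file path or URL and returns name => content pairs.
$tags = get_meta_tags($path);

// The tag may be missing, so check before reading it.
$description = isset($tags['description']) ? $tags['description'] : null;

unlink($path);
echo $description, "\n";
```

Since the result is a plain array, a missing description tag shows up as a missing key rather than a half-matched string.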
Most people will tell you to use DOMDocument to parse HTML. While I agree in most situations, sometimes it's just easier to use a regex. Since you are already using a regex in your question, here is a regex solution.
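$html_data = file_get_contents( $path_to_txt_file_that_contains_html );
// Capture group 1 holds the value of the content attribute.
if ( preg_match( '#<meta name="description".*content="([^"]+)">#siU', $html_data, $tor ) ) {
    $tor = $tor[1];
} else {
    $tor = ''; // no matching description tag found
}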
This is untested, but should work fine in your situation.
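For comparison, here is a rough sketch of the DOMDocument approach mentioned above. It assumes the dom extension is enabled; the helper name extract_meta_description and the sample HTML are made up for illustration. The parser does not care about attribute order or a self-closing />, and it decodes entities such as &amp; in the attribute value.

```php
<?php
// Hypothetical helper: returns the content of the description meta tag,
// or null if the page has no such tag.
function extract_meta_description($html)
{
    // Real-world markup is often sloppy; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Compare case-insensitively: name="Description" also matches.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Attributes in reverse order, an entity in the value, self-closing tag.
$html = '<html><head><meta content="Tom &amp; Jerry" name="description" /></head><body></body></html>';
echo extract_meta_description($html), "\n";
```

Because the returned value is already entity-decoded, it still has to be escaped or bound as a query parameter before it goes into the database.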