How to get metadata from HTML file using PHP?

I am trying to build a feature on my website that lets users submit links, as on Digg. I have some code that fetches the HTML source of a URL a user submits and stores it in a TXT file. Then I want to grab the content of the tag

<meta name="description" content="GRAB THIS">


assuming this tag exists. Sometimes it works, but other times it fails even though the page's source contains the meta tag exactly as I specified it in my code. It seems to go wrong when the "GRAB THIS" content contains HTML entities (&amp;, etc.). Please let me know if you have any ideas on how to get this to work. Here is my code:

$html_data = file_get_contents( $path_to_txt_file_that_contains_html );
preg_match( '#<meta name="description" content="(.+?)">#si', $html_data, $tor );
$tor = str_replace ( '<meta name="description" content="' , "", $tor[0] );
$tor = str_replace ( '">', "", $tor );


Sometimes $tor still contains

<meta name="description" content="CONTENT"


but without the closing >, so my code breaks as soon as I insert this into a MySQL database. Any ideas on what I am doing wrong? Thanks in advance for any help!



2 answers

It's actually very simple.

PHP offers its own built-in solution:
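
One built-in function that fits this description is get_meta_tags(), which reads a file (or URL) and returns its meta tags as an array keyed by the lowercased name attribute. Below is a minimal sketch, assuming the HTML has already been saved to disk as in the question. The helper name read_meta_description is made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the content of
// <meta name="description" ...> from a saved HTML file, or null if absent.
function read_meta_description(string $path): ?string
{
    // get_meta_tags() parses the file's <meta> tags and stops at </head>.
    $tags = get_meta_tags($path);
    if ($tags === false) {
        return null; // the file could not be read
    }
    // Keys are the lowercased name attributes, so "description" is the key here.
    return $tags['description'] ?? null;
}
```

With the question's variable, the call would be read_meta_description( $path_to_txt_file_that_contains_html ). Because only the content value is returned, there is no leftover tag markup to strip with str_replace.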



Most people will tell you to use DOMDocument to parse HTML. While I agree in most situations, sometimes it's just easier to use a regex. Since you are already using a regex in your question, here is a regex solution.

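$html_data = file_get_contents( $path_to_txt_file_that_contains_html );
// [^>]* tolerates extra attributes or a trailing " /"; group 1 holds only the content value.
$pattern = '#<meta\s+name="description"[^>]*\scontent="([^"]*)"[^>]*>#si';
// Use the captured group instead of stripping the tag with str_replace; fall back to '' when nothing matches.
$tor = preg_match( $pattern, $html_data, $matches ) ? $matches[1] : '';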


This is untested, but should work fine in your situation.
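
For comparison, the DOMDocument route mentioned above could look like the sketch below. It assumes the DOM extension is available, and the helper name meta_description_from_html is hypothetical. One practical difference from the regex is that getAttribute() returns the attribute value with HTML entities already decoded, which is relevant to the entity problem described in the question.

```php
<?php
// Hypothetical helper: returns the description meta content from an HTML
// string, or null if the tag is missing. Not taken from the original answer.
function meta_description_from_html(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Compare case-insensitively so name="Description" also matches.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}
```

It would be called with the string the question already loads, for example meta_description_from_html( $html_data ). Since the returned text is decoded, it still needs proper escaping or a prepared statement before it goes into a database query.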


