How to get metadata from HTML file using PHP?

Question

I am trying to build a feature on my website where users can submit links, as on Digg. I have some code that grabs the HTML source from a URL that a user submits to my site and stores it in a TXT file. Then I want to grab the content of the tag

<meta name="description" content="GRAB THIS">

assuming this tag exists. Sometimes it works, but other times it doesn't, even though the source of that particular web page contains the meta tag exactly as I specified in my code. I noticed that it seems to go wrong when the "GRAB THIS" content contains HTML entities (&amp;, etc.). Please let me know if you have any ideas on how to get this to work. Here is my code:

$html_data = file_get_contents( $path_to_txt_file_that_contains_html );
preg_match( '#<meta name="description" content="(.+?)">#si', $html_data, $tor );
$tor = str_replace ( '<meta name="description" content="' , "", $tor[0] );
$tor = str_replace ( '">', "", $tor );

Sometimes $tor still contains

<meta name="description" content="CONTENT"

but without the closing >, so my code breaks as soon as I insert this into the MySQL database. Any ideas on what I am doing wrong? Thanks in advance for any help!

+3

html php preg-match file-get-contents str-replace

John Anderson 18 March 12 at 5:17 am

2 answers

Most people will tell you to use DOMDocument to parse HTML. While I agree in most situations, sometimes it's just easier to use a regex. Since you are already using a regex in your question, here is a regex solution.

$html_data = file_get_contents( $path_to_txt_file_that_contains_html );
preg_match( '#<meta name="description".*content="([^"]+)">#siU', $html_data, $tor);
$tor = $tor[1];

This is untested, but should work fine in your situation.
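For comparison, the DOMDocument route mentioned above might look roughly like the following. This is only a sketch: the helper name extract_meta_description is hypothetical (it is not from the thread), and it assumes PHP 7.1 or later with the DOM extension enabled. One possible advantage is that the parser decodes HTML entities in attribute values and does not care whether the tag ends in > or />.

```php
<?php
// Hypothetical helper: returns the content of <meta name="description">,
// or null when the page has no such tag.
function extract_meta_description(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    // Collect libxml warnings internally instead of emitting them;
    // real-world pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Compare the name case-insensitively: name="Description" is common.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            return $meta->getAttribute('content');
        }
    }

    return null;
}

// Sample input is made up for illustration.
$html = '<html><head><meta name="description" content="Tom &amp; Jerry" /></head><body></body></html>';
var_dump(extract_meta_description($html));
```

In the asker's setup the input would be the string returned by file_get_contents(). Pages that are not ASCII and do not declare a charset may need extra handling, since loadHTML() does not assume UTF-8 by default.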

+1

Rob 18 March 12 at 5:43

Daniel · Accepted Answer · 2012-03-18T05:38:58+0000

It's actually very simple.

PHP offers its own built-in solution: http://php.net/manual/en/function.get-meta-tags.php
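A minimal sketch of how get_meta_tags() could be used here. The temporary file and its sample HTML are made up so the example runs on its own; in the asker's case the path would be the TXT file that already holds the downloaded source.

```php
<?php
// Write a small sample page to a temporary file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $path,
    '<html><head>'
    . '<meta name="description" content="GRAB THIS">'
    . '<meta name="keywords" content="php, html">'
    . '</head><body></body></html>'
);

// get_meta_tags() accepts a file path or URL and returns an array keyed by
// the lowercased name attribute, with the content attribute as the value.
$tags = get_meta_tags($path);

// The tag may be missing, so fall back to null rather than assuming it exists.
$description = $tags['description'] ?? null;

unlink($path);

var_dump($description);
```

The function stops parsing at the closing </head> tag, so meta tags placed in the body would not be picked up.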
