How do I read the head section of a web page?

On my web page, when a user enters a URL in a text box, I want to get some information about that page, such as title or link information.

Is there a way to do this, either on the client (JavaScript) or on the server (PHP)? If so, how?

+2




4 answers


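On the server:

This requires simpledom (the PHP Simple HTML DOM Parser, which provides file_get_html()).

    // Adjust the path to wherever the library file lives.
    include("simpledom.php");
    $html = file_get_html('http://www.google.com/');
    // find() with an index returns one element; without it, an array.
    echo $html->find('head', 0)->outertext; // prints <head>...</head>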
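Since the question asks for the title and link information specifically, here is a minimal sketch that uses PHP's built-in DOM extension instead of simpledom. The helper name page_summary() is made up for illustration, and the sketch assumes the HTML has already been fetched into a string.

```php
<?php
// Sketch only: page_summary() is a hypothetical helper, not part of any library.
function page_summary(string $html): array
{
    if ($html === '') {
        // DOMDocument::loadHTML() rejects empty input on some PHP versions.
        return ['title' => null, 'links' => []];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <title> element, if there is one.
    $titles = $doc->getElementsByTagName('title');
    $title = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;

    // Every href found on an <a> element, in document order.
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }

    return ['title' => $title, 'links' => $links];
}

// Usage (makes a network request, so it is left commented out):
// $html = file_get_contents('http://www.google.com/');
// if ($html !== false) { print_r(page_summary($html)); }
```

The links come back exactly as written in the page, so relative URLs would still need to be resolved against the page address.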

      

+2




You cannot do this via JavaScript unless the page is on your own domain, because browsers restrict cross-domain requests (the same-origin policy).



But you can use PHP: fetch the page with file_get_contents(), parse the contents of the <head> tag with a simple parser, and then return the result to an Ajax request.
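A rough sketch of that server side might look like the following. The endpoint layout (a script called with a `url` query parameter), the helper name head_as_json(), and the JSON shape are all assumptions made for illustration; the parser here is PHP's built-in DOM extension.

```php
<?php
// Sketch only: head_as_json() and the ?url=... parameter are hypothetical.
function head_as_json(string $html): string
{
    $head = null;
    if ($html !== '') {
        $doc = new DOMDocument();
        // Suppress warnings about invalid markup in third-party pages.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $node = $doc->getElementsByTagName('head')->item(0);
        if ($node !== null) {
            $head = $doc->saveHTML($node); // serialized <head>...</head>
        }
    }
    return json_encode(['head' => $head]);
}

// Only act when the Ajax request actually sent a url parameter.
if (isset($_GET['url'])) {
    $url = filter_var($_GET['url'], FILTER_VALIDATE_URL);
    $scheme = $url ? strtolower((string) parse_url($url, PHP_URL_SCHEME)) : '';
    header('Content-Type: application/json');
    if (!in_array($scheme, ['http', 'https'], true)) {
        // Reject anything that is not a plain web address.
        http_response_code(400);
        echo json_encode(['error' => 'invalid url']);
    } else {
        $html = @file_get_contents($url);
        echo head_as_json($html === false ? '' : $html);
    }
}
```

Restricting the scheme to http and https keeps the script from being pointed at local files; a public deployment would likely need further limits on which hosts may be fetched.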

+3




In PHP you can open URLs like files, for example:

    $f = fopen("http://www.site/page.htm", "r");
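To go from the open handle to the page contents, something like the sketch below could be used. read_stream() is a made-up helper name, and opening http:// addresses this way assumes the allow_url_fopen setting is enabled.

```php
<?php
// Sketch only: read_stream() is a hypothetical helper.
// Works for local paths too; remote URLs need allow_url_fopen enabled.
function read_stream(string $location): ?string
{
    $f = @fopen($location, 'r');
    if ($f === false) {
        return null; // could not open the file or URL
    }
    // Read everything that remains on the stream, then release the handle.
    $contents = stream_get_contents($f);
    fclose($f);
    return $contents === false ? null : $contents;
}

// Usage (makes a network request, so it is left commented out):
// $html = read_stream('http://www.site/page.htm');
```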

      

If you really want to work with an actual DOM, use a DOM parser or another module.

Edit: you can probably ignore the fopen() suggestion above. For some reason I thought you were only asking how to read sites that you have complete control over.

0




So, you have a text input field on your page, and when your user enters a link you want information about it?

This can be helpful:

http://www.bin-co.com/php/scripts/load/

According to that page, it will return something like this:

Array
(
    [headers] => Array
        (
            [Date] => Mon, 18 Jun 2007 13:56:22 GMT
            [Server] => Apache/2.0.54 (Unix) PHP/4.4.7 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.4.2
            [X-Powered-By] => PHP/5.2.2
            [Expires] => Thu, 19 Nov 1981 08:52:00 GMT
            [Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0
            [Pragma] => no-cache
            [Set-Cookie] => PHPSESSID=85g9n1i320ao08kp5tmmneohm1; path=/
            [Last-Modified] => Tue, 30 Nov 1999 00:00:00 GMT
            [Vary] => Accept-Encoding
            [Transfer-Encoding] => chunked
            [Content-Type] => text/xml
        )
    [body] => ... Contents of the Page ...
    [info] => Array
        (
            [url] => http://www.bin-co.com/rss.xml.php?section=2
            [content_type] => text/xml
            [http_code] => 200
            [header_size] => 501
            [request_size] => 146
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 1.113792
            [namelookup_time] => 0.180019
            [connect_time] => 0.467973
            [pretransfer_time] => 0.468035
            [size_upload] => 0
            [size_download] => 2274
            [speed_download] => 2041
            [speed_upload] => 0
            [download_content_length] => 0
            [upload_content_length] => 0
            [starttransfer_time] => 0.826031
            [redirect_time] => 0
        )
)
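The [info] keys above match what PHP's curl_getinfo() returns, so a structure like this can be approximated with the cURL extension. The sketch below is not the code of the linked script; fetch_page() and parse_header_block() are hypothetical names, and redirects are not followed.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not the linked load() script.

// Turn a raw header block into a name => value array.
// Repeated headers (e.g. Set-Cookie) overwrite each other in this simple version.
function parse_header_block(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n/', trim($raw)) as $line) {
        // The status line ("HTTP/1.1 200 OK") has no colon and is skipped.
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[trim($name)] = trim($value);
    }
    return $headers;
}

// Fetch a URL and return headers, body and transfer info, or null on failure.
function fetch_page(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in the result
    $response = curl_exec($ch);
    if ($response === false) {
        return null;
    }
    $info = curl_getinfo($ch);
    $headerSize = $info['header_size']; // byte length of the header block
    return [
        'headers' => parse_header_block(substr($response, 0, $headerSize)),
        'body'    => substr($response, $headerSize),
        'info'    => $info,
    ];
}

// Usage (makes a network request, so it is left commented out):
// print_r(fetch_page('http://www.bin-co.com/rss.xml.php?section=2'));
```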

0

