Use HttpWebRequest to get a remote page's title

I have a web service that acts as an interface between a website farm and some analytics software. Part of the analytics tracking requires collecting the page title. Instead of passing it from the web page to the web service, I would like to use HttpWebRequest to call the page.

I have some code that fetches the whole page and parses the HTML to grab the title tag, but I don't want to load the whole page just to get information that is in the head.

I started with

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("url");  
request.Method = "HEAD";

      



4 answers


Great idea, but a HEAD request returns only the HTTP response headers. It does not include the title element, which is part of the HTML document in the HTTP message body.
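To make the distinction concrete, here is a minimal sketch of what a HEAD request does give you: the status line and headers such as Content-Type, but no markup. The URL and the helper name LooksLikeHtml are placeholders of mine, not something from the question.

```csharp
using System;
using System.Net;

class HeadDemo
{
    // Hypothetical helper: decides from the Content-Type header alone whether
    // a follow-up GET could contain a <title> element at all.
    public static bool LooksLikeHtml(string contentType)
    {
        return contentType != null &&
               contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Placeholder URL; substitute the page you are tracking.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/");
        request.Method = "HEAD";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Only metadata is available here; the response carries no body to parse.
            Console.WriteLine("Status: {0}", (int)response.StatusCode);
            Console.WriteLine("Content-Type: {0}", response.ContentType);
            Console.WriteLine("Worth a GET for the title? {0}",
                LooksLikeHtml(response.ContentType));
        }
    }
}
```

So HEAD can at most tell you whether a page is worth fetching; the title itself still requires a GET of at least the start of the body.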





Try the following:



using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string page = @"http://stackoverflow.com/";
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(page);

            // Dispose the response and reader so the connection is released
            // as soon as the title has been found.
            using (WebResponse resp = req.GetResponse())
            using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
            {
                // Accumulate the chunks: a title that straddles two 256-character
                // reads would never match if each chunk were tested on its own.
                StringBuilder seen = new StringBuilder();
                char[] buf = new char[256];
                int count;
                while ((count = sr.Read(buf, 0, buf.Length)) > 0)
                {
                    seen.Append(buf, 0, count);
                    Match match = Regex.Match(seen.ToString(),
                        @"<title[^>]*>(.*?)</title>",
                        RegexOptions.IgnoreCase | RegexOptions.Singleline);
                    if (match.Success)
                    {
                        Console.WriteLine(match.Groups[1].Value.Trim());
                        break; // stop reading; the rest of the page is not needed
                    }
                }
            }
        }
    }
}

      



If you don't want to request the entire page, you can request it in chunks. The HTTP specification defines a request header called Range. You would use it like this:

Range: bytes=0-100

You can then scan the returned content for the title. If it's not there, ask for Range: bytes=101-200, and so on until you get what you need.

Obviously, the web server needs to support range requests, so this could be hit or miss.
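As a rough sketch of the chunked approach, HttpWebRequest.AddRange can set the Range header for you. The URL, the chunk size, the cut-off, and the helper name ExtractTitle below are all assumptions of mine for illustration; nothing here is specific to any particular server.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

class RangeTitleDemo
{
    // Hypothetical helper: returns the title text if a complete
    // <title>...</title> pair is present in the markup, otherwise null.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        const string url = "http://example.com/"; // placeholder URL
        const int chunkSize = 512;                // assumed chunk size in bytes
        const int maxBytes = 8192;                // assumed cut-off before giving up
        StringBuilder seen = new StringBuilder();

        try
        {
            for (int from = 0; from < maxBytes; from += chunkSize)
            {
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
                req.AddRange(from, from + chunkSize - 1); // Range: bytes=from-to

                using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
                using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
                {
                    // Caveat: decoding each chunk separately can garble a
                    // multi-byte character that is split across two ranges.
                    seen.Append(sr.ReadToEnd());

                    string title = ExtractTitle(seen.ToString());
                    if (title != null)
                    {
                        Console.WriteLine(title);
                        return;
                    }

                    // 200 instead of 206 means the server ignored the Range
                    // header and sent the whole page, so asking again is pointless.
                    if (resp.StatusCode != HttpStatusCode.PartialContent)
                        break;
                }
            }
        }
        catch (WebException ex)
        {
            // For example, a range past the end of the document is rejected
            // by the server and surfaces here.
            Console.WriteLine("Request failed: {0}", ex.Message);
        }
        Console.WriteLine("No title found.");
    }
}
```

Each chunk costs a full round trip, so for pages whose head is small, reading a single GET response stream until the closing title tag appears is likely cheaper than many small range requests.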



So I would need something like this:

// URL is assumed to be a string holding the address of the page.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (Stream st = resp.GetResponseStream())
using (StreamReader sr = new StreamReader(st))
{
    // Note: ReadToEnd downloads the entire page.
    string buffer = sr.ReadToEnd();
    const string openTag = "<title>";
    int startPos = buffer.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
    int endPos = startPos < 0
        ? -1
        : buffer.IndexOf("</title>", startPos, StringComparison.OrdinalIgnoreCase);

    Console.WriteLine("Response code from {0}: {1}", URL, resp.StatusCode);
    if (endPos >= 0)
    {
        // Skip past the opening tag so only the title text is extracted.
        startPos += openTag.Length;
        string title = buffer.Substring(startPos, endPos - startPos);
        Console.WriteLine("Page title: {0}", title);
    }
}

      
