Use HttpWebRequest to get a remote page's title
I have a web service that acts as an interface between a website farm and some analytics software. Part of analytics tracking requires collecting the page title. Instead of passing it from the web page to the web service, I would like to use HttpWebRequest to call the page.
I have some code that fetches the whole page and parses the HTML to grab the title tag, but I don't want to load the whole page just to get information that sits in the head.
I started with
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("url");
request.Method = "HEAD";
Great idea, but a HEAD request only returns the HTTP response headers. These do not include the title element, which is part of the HTML document carried in the HTTP message body.
Try the following:
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string page = @"http://stackoverflow.com/";
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(page);
            // Dispose the response and reader so the connection is released.
            using (WebResponse resp = req.GetResponse())
            using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
            {
                char[] buf = new char[256];
                // Accumulate the chunks so a title split across two reads still matches.
                StringBuilder seen = new StringBuilder();
                int count;
                while ((count = sr.Read(buf, 0, buf.Length)) > 0)
                {
                    seen.Append(buf, 0, count);
                    Match match = Regex.Match(seen.ToString(), @"<title[^>]*>([^<]+)</title>", RegexOptions.IgnoreCase);
                    if (match.Success)
                    {
                        Console.WriteLine(match.Groups[1].Value.Trim());
                        break; // stop reading once the title has been found
                    }
                }
            }
        }
    }
}
If you don't want to fetch the entire page, you can fetch it in chunks. The HTTP specification defines a request header called Range. You would use it like this:
Range: bytes=0-100
You can scan the returned content for the title. If it's not there, ask for Range: bytes=101-200 and so on until you get what you need.
Obviously, the web server needs to support range requests, so this could be hit or miss.
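The chunked approach could be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (RangeTitleSketch, FetchRange, FindTitle) and the chunkSize and maxBytes parameters are made up for this example, the page is assumed to be UTF-8, and a server that ignores the Range header is treated as an error rather than handled gracefully.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class RangeTitleSketch
{
    // Hypothetical helper: asks the server for bytes [from, to] of the page.
    public static byte[] FetchRange(string url, int from, int to)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.AddRange(from, to); // sends "Range: bytes=from-to"
        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                // 206 means the range was honored; 200 means the server ignored it.
                if (resp.StatusCode != HttpStatusCode.PartialContent)
                    throw new NotSupportedException("Server ignored the Range header.");
                using (Stream body = resp.GetResponseStream())
                using (MemoryStream copy = new MemoryStream())
                {
                    body.CopyTo(copy);
                    return copy.ToArray();
                }
            }
        }
        catch (WebException ex)
        {
            // 416: the requested range starts past the end of the document.
            HttpWebResponse r = ex.Response as HttpWebResponse;
            if (r != null && r.StatusCode == HttpStatusCode.RequestedRangeNotSatisfiable)
                return new byte[0];
            throw;
        }
    }

    // Requests successive chunks until a complete <title> element has arrived.
    // Returns null if no title shows up within maxBytes.
    public static string FindTitle(Func<int, int, byte[]> fetchRange, int chunkSize, int maxBytes)
    {
        MemoryStream seen = new MemoryStream();
        for (int from = 0; from < maxBytes; from += chunkSize)
        {
            byte[] chunk = fetchRange(from, from + chunkSize - 1);
            if (chunk.Length == 0) break; // nothing left to read
            seen.Write(chunk, 0, chunk.Length);
            // Decode everything received so far (UTF-8 assumed) so that a
            // title split across two chunks still matches.
            string text = Encoding.UTF8.GetString(seen.ToArray());
            Match m = Regex.Match(text, @"<title[^>]*>([^<]*)</title>", RegexOptions.IgnoreCase);
            if (m.Success) return m.Groups[1].Value.Trim();
        }
        return null;
    }
}
```

It might be called as RangeTitleSketch.FindTitle((from, to) => RangeTitleSketch.FetchRange("http://stackoverflow.com/", from, to), 512, 16384). Passing the fetch step in as a delegate keeps the loop independent of the network, so it can be tried against an in-memory page first. Keep in mind that every chunk is a separate HTTP request, so very small chunks may cost more than simply reading the start of a single response stream.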
So I would need something like this:
// URL holds the address of the page to fetch.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    string buffer = sr.ReadToEnd();
    string title = null;
    // IndexOf returns -1 when the tag is missing, so check before slicing.
    int startPos = buffer.IndexOf("<title>", StringComparison.OrdinalIgnoreCase);
    if (startPos >= 0)
    {
        startPos += "<title>".Length;
        int endPos = buffer.IndexOf("</title>", startPos, StringComparison.OrdinalIgnoreCase);
        if (endPos >= 0)
            title = buffer.Substring(startPos, endPos - startPos);
    }
    Console.WriteLine("Response code from {0}: {1}", URL, resp.StatusCode);
    Console.WriteLine("Page title: {0}", title ?? "(none found)");
}