Using HttpWebRequest to get a remote page's title
I have a web service that acts as an interface between a website farm and some analytics software. Part of the analytics tracking requires collecting the page title. Instead of passing it from the web page to the web service, I would like to use HttpWebRequest to call the page.
I have some code that fetches the whole page and parses the HTML to grab the title tag, but I don't want to load the whole page just to get information that sits in the head element.
I started with
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("url");
request.Method = "HEAD";
Try the following:
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string page = @"http://stackoverflow.com/";
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(page);

            // Dispose the response and reader so the connection is released.
            using (WebResponse resp = req.GetResponse())
            using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
            {
                char[] buf = new char[256];
                // Accumulate what has been read so a title split across two chunks is still matched.
                StringBuilder seen = new StringBuilder();
                int count;
                while ((count = reader.Read(buf, 0, buf.Length)) > 0)
                {
                    seen.Append(buf, 0, count);
                    Match match = Regex.Match(seen.ToString(), @"<title[^>]*>([^<]+)</title>", RegexOptions.IgnoreCase);
                    if (match.Success)
                    {
                        Console.WriteLine(match.Groups[1].Value.Trim());
                        break; // Stop reading once the title is found.
                    }
                }
            }
        }
    }
}
If you don't want to fetch the entire page, you can request it in chunks. The HTTP specification defines a request header called Range, which you would use like this:
Range: bytes=0-100
Scan the returned content for the title. If it's not there, ask for Range: bytes=101-200, and so on until you get what you need.
Obviously, the web server needs to support range requests, so this could be hit or miss.
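A rough sketch of the chunked approach in C#, in case it helps. The class name RangeTitleFetcher, the helper FindTitle, and the chunkSize and maxBytes parameters are all made up for illustration. It assumes the page is UTF-8 and that no multi-byte character straddles a chunk boundary. HttpWebRequest.AddRange sets the Range header for you.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class RangeTitleFetcher
{
    // Hypothetical helper: returns the title text if a complete
    // <title>...</title> element is present, otherwise null.
    public static string FindTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    // Requests the page in chunkSize-byte ranges until a title is found,
    // the server stops answering with 206, or maxBytes have been requested.
    public static string FetchTitle(string url, int chunkSize, int maxBytes)
    {
        StringBuilder seen = new StringBuilder();
        for (int from = 0; from < maxBytes; from += chunkSize)
        {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.AddRange(from, from + chunkSize - 1); // sends "Range: bytes=from-to"

            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            using (StreamReader reader = new StreamReader(resp.GetResponseStream(), Encoding.UTF8))
            {
                seen.Append(reader.ReadToEnd());
                string title = FindTitle(seen.ToString());

                // 206 (PartialContent) means the range was honoured. A plain 200
                // means the server ignored it and sent the whole page, so there
                // is nothing more to request either way.
                if (title != null || resp.StatusCode != HttpStatusCode.PartialContent)
                    return title;
            }
        }
        return null;
    }
}
```

Calling RangeTitleFetcher.FetchTitle(url, 512, 8192) would give up after 8 KB. Note that asking for a range past the end of the document usually produces a 416 response, which GetResponse reports as a WebException; this sketch does not catch it.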
So I would need something like this:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    string buffer = sr.ReadToEnd();
    // Locate the title element; IndexOf returns -1 when a tag is missing.
    int startPos = buffer.IndexOf("<title>", StringComparison.OrdinalIgnoreCase);
    int endPos = startPos < 0 ? -1 : buffer.IndexOf("</title>", startPos, StringComparison.OrdinalIgnoreCase);
    Console.WriteLine("Response code from {0}: {1}", URL, resp.StatusCode);
    if (endPos >= 0)
    {
        startPos += "<title>".Length;
        Console.WriteLine("Page title: {0}", buffer.Substring(startPos, endPos - startPos));
    }
}