Why is my WebClient returning a 404 error most of the time, but not always?

I want to fetch information about a Microsoft Update in my program. However, the server returns a 404 error about 80% of the time. I boiled the problematic code down to this console app:

using System;
using System.Net;

namespace WebBug
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                try
                {
                    WebClient client = new WebClient();
                    Console.WriteLine(client.DownloadString("https://support.microsoft.com/api/content/kb/3068708"));
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                Console.ReadKey();
            }
        }
    }
}


When I run the code, I have to go through the loop several times before I get the actual content:

The remote server returned an error: (404) Not Found.
The remote server returned an error: (404) Not Found.
The remote server returned an error: (404) Not Found.
<div kb-title title = "Update for customer service and diagnostic telemetry [...]

I can open and force-refresh the link (Ctrl + F5) in my browser as often as I want, and it loads fine every time.

The problem occurs on two different machines with two different internet connections.
I also tested this using the Html Agility Pack, with the same result.
The problem does not occur with other websites. (The root https://support.microsoft.com works fine 100% of the time.)

Why am I getting this strange result?

1 answer


Cookies. This is due to cookies.

When I started looking into this issue, I noticed that the first time I opened the site in a fresh browser I got a 404, but after refreshing (sometimes once, sometimes several times), the site kept working.

That's when I fired up Chrome's Incognito mode and the Developer Tools.

There was nothing suspicious on the Network tab: just a redirect to the https version if you loaded the http one.

But I noticed that the cookies changed. This is what I see the first time I load the page:

(screenshot: cookies set on the first page load)

and here is the page after one (or several) refreshes:

(screenshot: additional cookies after refreshing)

Notice how a few more cookies got added? The site must be trying to read those, not finding them, and blocking the request. It could be a bot-prevention measure or just bad programming, I'm not sure.

Anyway, here's how to get your code working. This example uses HttpWebRequest/Response instead of WebClient.

string url = "https://support.microsoft.com/api/content/kb/3068708";

//this holds all the cookies we need to add
//notice the values match the ones in the screenshot above
CookieContainer cookieJar = new CookieContainer();
cookieJar.Add(new Cookie("SMCsiteDir", "ltr", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("SMCsiteLang", "en-US", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("smc_f", "upr", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("smcexpsessionticket", "100", "/", ".microsoft.com"));
cookieJar.Add(new Cookie("smcexpticket", "100", "/", ".microsoft.com"));
cookieJar.Add(new Cookie("smcflighting", "wwp", "/", ".microsoft.com"));

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
//attach the cookie container
request.CookieContainer = cookieJar;

//and now go to the internet, fetching back the contents
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
    string site = sr.ReadToEnd();
}


If you remove the line request.CookieContainer = cookieJar; it fails with a 404, which reproduces your problem.
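If you'd rather keep using WebClient like in your original code, a common workaround is to subclass it and attach a CookieContainer in an overridden GetWebRequest. This is a sketch under the same assumptions as above (same cookie names and values); CookieAwareWebClient is just an illustrative name, not a built-in class:

    using System;
    using System.Net;

    //a WebClient that carries a CookieContainer on every request it makes
    class CookieAwareWebClient : WebClient
    {
        public CookieContainer CookieJar = new CookieContainer();

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);
            HttpWebRequest httpRequest = request as HttpWebRequest;
            if (httpRequest != null)
            {
                //attach our cookie container to the underlying HttpWebRequest
                httpRequest.CookieContainer = CookieJar;
            }
            return request;
        }
    }

    //usage: seed the jar with the cookies from the screenshots, then download
    CookieAwareWebClient client = new CookieAwareWebClient();
    client.CookieJar.Add(new Cookie("SMCsiteDir", "ltr", "/", ".support.microsoft.com"));
    client.CookieJar.Add(new Cookie("SMCsiteLang", "en-US", "/", ".support.microsoft.com"));
    client.CookieJar.Add(new Cookie("smc_f", "upr", "/", ".support.microsoft.com"));
    client.CookieJar.Add(new Cookie("smcexpsessionticket", "100", "/", ".microsoft.com"));
    client.CookieJar.Add(new Cookie("smcexpticket", "100", "/", ".microsoft.com"));
    client.CookieJar.Add(new Cookie("smcflighting", "wwp", "/", ".microsoft.com"));
    string site = client.DownloadString("https://support.microsoft.com/api/content/kb/3068708");

WebClient normally manages no cookies at all, which is exactly why your loop only succeeded when the server happened not to check for them.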

Most of the sample code is adapted from this post and this post.
