How to read content created with ajax using webclient?

I am downloading a web page using WebClient:

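using System;
using System.IO;
using System.Net;
using System.Text;
using System.Windows.Forms;

private WebClient client; // field on the form; eUrl is a TextBox on the same form

public void download()
{
    client = new WebClient();
    client.DownloadStringCompleted += client_DownloadStringCompleted;
    client.Encoding = Encoding.UTF8;
    client.DownloadStringAsync(new Uri(eUrl.Text));
}

void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    if (e.Error != null) // e.Result throws if the download failed
    {
        MessageBox.Show(e.Error.Message);
        return;
    }
    SaveFileDialog sd = new SaveFileDialog();
    if (sd.ShowDialog() == DialogResult.OK)
    {
        // using closes the file even if Write throws
        using (StreamWriter writer = new StreamWriter(sd.FileName, false, Encoding.Unicode))
        {
            writer.Write(e.Result);
        }
    }
}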

      

This works fine, but the saved file does not include content that the page loads with AJAX, such as this:

<div class="center-box-body" id="boxnews" style="width:768px;height:1167px; ">
    loading ....    </div>

<script language="javascript">
    ajax_function('boxnews',"ajax/category/personal_notes/",'');
    </script>

      

The ajax_function function loads this data from the server on the client side, in the browser.

How can I download the complete HTML, including the content loaded this way?

3 answers


To do this, you need a JavaScript runtime hosted inside a full-blown web browser. Unfortunately, WebClient cannot do this.

Your only option is to automate the WebBrowser control. You will need to navigate it to the URL, wait for both the main page and any AJAX content to load (including triggering the load if it requires a user action), and then dump the entire DOM.
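A minimal sketch of that WebBrowser approach for a Windows Forms app. The class name AjaxPageGrabber, the 500 ms polling interval and the 20-attempt limit are illustrative assumptions; the boxnews id and the "loading" placeholder text come from the question's markup. WebBrowser.DocumentText returns the page source as it was downloaded rather than the live DOM, so the sketch reads OuterHtml of the html element to get the markup after the scripts have run.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: loads a page in a WebBrowser control, waits for the
// AJAX placeholder to be replaced, then hands back the live DOM as HTML.
// Must be created and used on a Windows Forms (STA) UI thread.
public class AjaxPageGrabber
{
    private readonly WebBrowser browser = new WebBrowser();
    private readonly Timer pollTimer = new Timer { Interval = 500 }; // System.Windows.Forms.Timer
    private Action<string> onDone;
    private int attempts;

    public AjaxPageGrabber()
    {
        browser.ScriptErrorsSuppressed = true;
        browser.DocumentCompleted += (s, e) =>
        {
            // DocumentCompleted can fire once per frame; only react to the top-level page
            if (e.Url != browser.Url) return;
            attempts = 0;
            pollTimer.Start();
        };
        pollTimer.Tick += (s, e) => CheckForAjaxContent();
    }

    public void Load(string url, Action<string> callback)
    {
        onDone = callback;
        browser.Navigate(url);
    }

    private void CheckForAjaxContent()
    {
        HtmlElement box = browser.Document.GetElementById("boxnews");
        attempts++;
        // Assumption: the AJAX content has arrived once the "loading" placeholder is gone
        bool loaded = box != null && !IsPlaceholder(box.InnerText);
        if (!loaded && attempts < 20) return; // after ~10 s, give up and return what is there
        pollTimer.Stop();
        // OuterHtml of <html> reflects the current DOM, including script changes
        HtmlElement root = browser.Document.GetElementsByTagName("html")[0];
        onDone(root.OuterHtml);
    }

    public static bool IsPlaceholder(string text)
    {
        return text == null || text.Trim().StartsWith("loading", StringComparison.OrdinalIgnoreCase);
    }
}
```

Keep the grabber in a field on the form (for example `grabber = new AjaxPageGrabber(); grabber.Load(eUrl.Text, html => File.WriteAllText(path, html));`) so it stays alive while the page loads.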



If you're only scraping a particular site, you're probably better off requesting the AJAX URL yourself (mimicking all the required parameters) rather than fetching the web page that calls it.
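A sketch of that approach for the page in the question, assuming ajax_function issues a plain GET to its second argument, resolved relative to the page URL. That is a guess: ajax_function is the site's own function, so check what it really sends (method, parameters, headers) in the browser's developer tools before relying on this. AjaxUrlFetcher is a hypothetical name, and the X-Requested-With header is included only because some servers check for it.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class AjaxUrlFetcher
{
    // Matches calls like: ajax_function('boxnews',"ajax/category/personal_notes/",'');
    // capturing the target element id and the URL argument.
    private static readonly Regex AjaxCall = new Regex(
        @"ajax_function\(\s*['""](?<id>[^'""]*)['""]\s*,\s*['""](?<url>[^'""]*)['""]");

    // Returns the absolute AJAX URL used to fill elementId, or null if none is found.
    public static Uri FindAjaxUrl(string html, Uri pageUri, string elementId)
    {
        foreach (Match m in AjaxCall.Matches(html))
        {
            if (m.Groups["id"].Value == elementId)
                // Assumption: the URL is relative to the page that contains the script
                return new Uri(pageUri, m.Groups["url"].Value);
        }
        return null;
    }

    // Assumption: a plain GET returns the same fragment the script would insert.
    public static string DownloadFragment(Uri ajaxUri)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            // Assumption: some servers only answer AJAX-style requests carrying this header
            client.Headers.Add("X-Requested-With", "XMLHttpRequest");
            return client.DownloadString(ajaxUri);
        }
    }
}
```

The HTML that WebClient already downloads in the question is enough input for FindAjaxUrl, so no browser is involved.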

I think you will need to use WebBrowser for this, since the JavaScript on the page has to run before the content you want exists. Depending on your application, this may or may not be possible for you; keep in mind that it means working with Windows Forms.
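Windows Forms controls such as WebBrowser must run on a single-threaded apartment (STA) thread with a message loop. Outside a WinForms app (a console program, a service, or a server), one way to get that is to start such a thread yourself, as in this sketch. HeadlessBrowser and its parameters are hypothetical, and for brevity it waits a fixed ajaxDelayMs after DocumentCompleted instead of checking that the AJAX content has actually arrived; that assumption will not hold for slow responses.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class HeadlessBrowser
{
    // Hypothetical helper: starts an STA thread with its own message loop,
    // loads the page, and returns the DOM after a fixed delay.
    // ajaxDelayMs must be greater than zero.
    public static string GetRenderedHtml(string url, int ajaxDelayMs, TimeSpan timeout)
    {
        string html = null;
        var thread = new Thread(() =>
        {
            var browser = new WebBrowser { ScriptErrorsSuppressed = true };
            var delay = new System.Windows.Forms.Timer { Interval = ajaxDelayMs };
            delay.Tick += (s, e) =>
            {
                delay.Stop();
                html = browser.Document.GetElementsByTagName("html")[0].OuterHtml;
                Application.ExitThread(); // ends Application.Run on this thread
            };
            browser.DocumentCompleted += (s, e) =>
            {
                // Assumption: ajaxDelayMs is long enough for the AJAX requests to finish
                if (e.Url == browser.Url) delay.Start();
            };
            browser.Navigate(url);
            Application.Run(); // pumps messages so navigation and scripts can proceed
            delay.Dispose();
            browser.Dispose();
        });
        thread.SetApartmentState(ApartmentState.STA); // WebBrowser is an ActiveX control
        thread.IsBackground = true;
        thread.Start();
        if (!thread.Join(timeout))
            throw new TimeoutException("Page did not finish loading in time.");
        return html;
    }
}
```

Each call creates its own thread and browser, which matters if several users could trigger this at the same time.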

When you visit a page in a browser, it:

1. loads the document from the requested URL

2. loads everything referenced by img, link, script, etc. tags (anything that links to an external file)

3. executes JavaScript where applicable.

The WebClient class only performs step 1. It wraps a single HTTP request and response. It has no script engine and, as far as I know, does not look for image tags etc. that link to other files or issue further requests to retrieve those files.
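To make step 2 concrete, this sketch lists the src/href URLs of img, script and link tags, which a browser would go on to request and WebClient never does. It is a rough regular-expression illustration (ResourceLister is a hypothetical name), not an HTML parser: it ignores CSS url(...) references, srcset, and anything added later by script.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ResourceLister
{
    // Finds src/href attributes on <img>, <script> and <link> tags.
    private static readonly Regex Attr = new Regex(
        @"<(?:img|script|link)\b[^>]*?\b(?:src|href)\s*=\s*['""](?<url>[^'""]+)['""]",
        RegexOptions.IgnoreCase);

    // Returns the absolute URLs a browser would fetch in step 2, in document order.
    public static List<Uri> FindReferencedResources(string html, Uri pageUri)
    {
        var result = new List<Uri>();
        foreach (Match m in Attr.Matches(html))
            result.Add(new Uri(pageUri, m.Groups["url"].Value)); // resolve relative URLs
        return result;
    }
}
```

Inline scripts such as the ajax_function call in the question have no src attribute, so they are not listed; running them is step 3.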

If you want the page after it has been modified by the script call and the AJAX handler, you will need a class with all the capabilities of a web browser, which pretty much means a web browser that you can somehow automate on the server side. The WebBrowser control can do this, but I believe it is intended for WinForms. I shudder to think about the security issues here, or the load on the server if several users use this object at the same time.

Better to ask yourself why you are doing this. If the data you are really interested in arrives via AJAX (possibly from a web service), why not skip the WebClient step and go straight to the source?
