Optimizing fire-and-forget updates using async/await and tasks

I have about 5 million items to update. I'm not really interested in the response (it would be nice so I can log it, but I don't want it if it costs me time). That said, is this code optimized to run as fast as possible? With 5 million items, do I risk getting cancelled-task or timeout errors? I am currently getting about 1 or 2 responses per second.

var tasks = items.Select(async item =>
{
    await Update(CreateUrl(item));
}).ToList();

if (tasks.Any())
{
    await Task.WhenAll(tasks);
}                

private async Task<HttpResponseMessage> Update(string url)
{
    var client = new HttpClient();
    var response = await client.GetAsync(url).ConfigureAwait(false);
    //log response.
    return response;
}

      

UPDATE: I am indeed getting TaskCanceledExceptions. Is my system running out of threads? What could I do to avoid this?

+2




3 answers


Your code starts all the tasks at the same time, which may not be what you want. There are no threads involved, because async I/O operations need no thread while they are in flight, but there is a limit on the number of parallel connections.

There may be better tools for this, but if you want to use async/await, one option is Stephen Toub's ForEachAsync, as described in this article. It lets you control the number of concurrent operations, so you don't exhaust your connection limit.

This is from the article:



using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Extensions
{
    // Drains one partition sequentially; the partitions themselves run in parallel.
    private static async Task ExecuteInPartition<T>(IEnumerator<T> partition, Func<T, Task> body)
    {
        using (partition)
            while (partition.MoveNext())
                await body(partition.Current);
    }

    // Runs `body` over `source` with at most `dop` concurrent operations.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select ExecuteInPartition(partition, body));
    }
}

      

Usage:

public async Task UpdateAll()
{
    // Allow up to 100 concurrent updates.
    await items.ForEachAsync(100, item => Update(CreateUrl(item)));
}
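If pulling in a partitioner feels heavyweight, the same throttling idea can be sketched with SemaphoreSlim instead. This is an illustrative alternative, not from Toub's article; the name ForEachThrottledAsync is made up here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleExample
{
    // Runs `body` for every item, but never more than `dop` at a time.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        using (var gate = new SemaphoreSlim(dop))
        {
            var tasks = source.Select(async item =>
            {
                await gate.WaitAsync();
                try { await body(item); }
                finally { gate.Release(); }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

Unlike the partitioner version, this starts one task per item up front and only gates the body, so for 5 million items the partitioner approach is lighter on memory.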

      

+3




Much better would be to use a TPL Dataflow ActionBlock with MaxDegreeOfParallelism and a single HttpClient:

static readonly HttpClient _client = new HttpClient();

Task UpdateAll(IEnumerable<Item> items)
{
    var block = new ActionBlock<Item>(
        item => UpdateAsync(CreateUrl(item)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1000 });

    foreach (var item in items)
    {
        block.Post(item);
    }

    block.Complete();
    return block.Completion;
}

async Task UpdateAsync(string url)
{
    var response = await _client.GetAsync(url).ConfigureAwait(false);
    Console.WriteLine(response.StatusCode);
}

      



  • A single HttpClient can be used for multiple concurrent requests, so it is much better to create and dispose just one instance instead of 5 million.
  • Running so many requests at the same time causes many problems: the machine's network stack, the target website, timeouts, and so on. ActionBlock caps that concurrency at MaxDegreeOfParallelism (which you should test and optimize for your particular case). Note that the TPL may choose a lower degree of parallelism if it deems it appropriate.
  • When the only await is on a single async call at the end of an async method or lambda expression, it is better for performance to remove the redundant async-await and just return the task (i.e. return block.Completion;).
  • Complete notifies the ActionBlock that it will not accept any more items, but it still finishes processing the items already posted. When that is done, the Completion task completes, so you can await it.
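The async-elision point in the third bullet can be shown in isolation. A hedged sketch, where DoWorkAsync and the class name are stand-ins invented for the example:

```csharp
using System.Threading.Tasks;

public static class ElisionExample
{
    // Hypothetical underlying asynchronous operation.
    static Task<int> DoWorkAsync() => Task.FromResult(42);

    // Redundant: the compiler builds an extra async state machine
    // just to re-await a task that is already awaitable.
    public static async Task<int> WrappedAsync() => await DoWorkAsync();

    // Better: forward the task directly; same behavior, less overhead.
    public static Task<int> Forwarded() => DoWorkAsync();
}
```

Both methods behave identically to callers; the second simply skips the intermediate state machine, which is the same trick as returning block.Completion directly in UpdateAll above.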
+2




I suspect that you are suffering from outbound connection throttling, which prevents a large number of simultaneous connections to the same domain. The answers in this extensive Q&A might give you some avenues to explore:

What limits the number of concurrent connections that an ASP.NET application can make to a web service?
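If that per-domain limit turns out to be the bottleneck, it can be raised in code before any requests go out. A minimal sketch assuming a classic .NET Framework HttpClient/HttpWebRequest client; the value 100 is an arbitrary example, not a recommendation:

```csharp
using System.Net;

class ConnectionLimitSetup
{
    static void Main()
    {
        // The default for non-ASP.NET clients is only 2 connections per host.
        // Raise it once, at startup, before issuing any requests.
        ServicePointManager.DefaultConnectionLimit = 100;
    }
}
```

The same setting is available declaratively via `<connectionManagement>` in app.config, which avoids recompiling to tune it.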

As far as the structure of the code goes, I would personally try a dynamic connection pool. You know you can't actually make 5 million connections at once, so attempting to simply won't work; you might as well run with a sensible, configurable limit of (for example) 20 connections and use them as a pool. You can then tune the number up or down.

Alternatively, you could investigate HTTP pipelining (which I haven't used myself), which is designed specifically for the kind of work you're doing (pushing out HTTP requests): http://en.wikipedia.org/wiki/HTTP_pipelining

0








