Optimizing for fire & forget using async/await and tasks
I have about 5 million items to update. I don't really care about the response (a response would be nice so I can log it, but I don't want one if it costs me time). Having said that, is this code optimized to run as fast as possible? With 5 million items, do I run the risk of getting task-canceled or timeout errors? I get about 1 or 2 responses back every second.
var tasks = items.Select(async item =>
{
    await Update(CreateUrl(item));
}).ToList();
if (tasks.Any())
{
    await Task.WhenAll(tasks);
}
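private async Task<HttpResponseMessage> Update(string url)
{
    var client = new HttpClient();
    // HttpClient.SendAsync takes an HttpRequestMessage, not a string; GET is assumed here.
    var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Get, url)).ConfigureAwait(false);
    //log response.
    return response;
}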
UPDATE: I am indeed getting TaskCanceledExceptions. Did my system run out of threads? What could I have done to avoid this?
You are starting all the tasks at the same time, which may not be what you want. No threads are involved, because with async operations there is no thread, but there may be limits on the number of concurrent connections.
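To make the connection limit concrete, here is a minimal sketch of where it can be set. The class name `ConnectionLimits` and the value 100 are made up for illustration. As far as I know, on .NET Framework the per-host limit is `ServicePointManager.DefaultConnectionLimit` (which defaults to 2 for client applications), while on .NET Core and later it is set on the handler, as shown here.

```csharp
using System.Net.Http;

public static class ConnectionLimits
{
    // Hypothetical value for illustration; tune it for your server and network.
    public const int MaxConnections = 100;

    // Builds one HttpClient, meant to be shared by every request,
    // that opens at most MaxConnections connections per server.
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler
        {
            MaxConnectionsPerServer = MaxConnections
        };
        return new HttpClient(handler);
    }
}
```

Requests beyond the limit queue up inside the handler, so with a short `HttpClient.Timeout` they can still end in a TaskCanceledException if far too many are started at once.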
There may be better tools for this, but if you want to use async/await, one option is to use Stephen Toub's ForEachAsync, as described in this article. It lets you control how many operations run concurrently, so you don't exceed your connection limit.
This is from the article:
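using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Extensions
{
    // Processes one partition sequentially, awaiting each item before taking the next.
    public static async Task ExecuteInPartition<T>(IEnumerator<T> partition, Func<T, Task> body)
    {
        using (partition)
            while (partition.MoveNext())
                await body(partition.Current);
    }

    // Splits the source into dop partitions and processes them concurrently.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select ExecuteInPartition(partition, body));
    }
}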
Usage:
public async Task UpdateAll()
{
    // Allow for 100 concurrent Updates
    await items.ForEachAsync(100, async t => await Update(CreateUrl(t)));
}
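To check that this really caps concurrency, here is a small self-contained sketch that swaps the HTTP call for `Task.Delay` and records the highest number of bodies in flight. The names `ThrottleExtensions`, `DrainAsync` and `ThrottleDemo` are made up for the demo; the helper has the same shape as the one quoted above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ThrottleExtensions
{
    // Same shape as the ForEachAsync helper quoted above.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select DrainAsync(partition, body));
    }

    // Works through one partition, one item at a time.
    private static async Task DrainAsync<T>(IEnumerator<T> partition, Func<T, Task> body)
    {
        using (partition)
            while (partition.MoveNext())
                await body(partition.Current);
    }
}

public static class ThrottleDemo
{
    // Returns (items processed, highest number of bodies in flight at once).
    public static async Task<(int Processed, int Peak)> RunAsync(int itemCount, int dop)
    {
        int inFlight = 0, peak = 0, processed = 0;
        object gate = new object();

        await Enumerable.Range(0, itemCount).ForEachAsync(dop, async _ =>
        {
            lock (gate) { inFlight++; if (inFlight > peak) peak = inFlight; }
            await Task.Delay(10); // stand-in for the HTTP call
            lock (gate) { inFlight--; processed++; }
        });

        return (processed, peak);
    }
}
```

Calling `await ThrottleDemo.RunAsync(20, 4)` should report all 20 items processed with a peak no higher than 4, because each partition awaits its current item before taking the next one.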
Much better would be to use TPL Dataflow's ActionBlock with MaxDegreeOfParallelism and a single HttpClient:
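// The single HttpClient shared by every request.
static readonly HttpClient _client = new HttpClient();

Task UpdateAll(IEnumerable<Item> items)
{
    var block = new ActionBlock<Item>(
        item => UpdateAsync(CreateUrl(item)),
        new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 1000});
    foreach (var item in items)
    {
        block.Post(item);
    }
    block.Complete();
    return block.Completion;
}

async Task UpdateAsync(string url)
{
    // HttpClient.SendAsync takes an HttpRequestMessage, not a string; GET is assumed here.
    var response = await _client.SendAsync(new HttpRequestMessage(HttpMethod.Get, url)).ConfigureAwait(false);
    Console.WriteLine(response.StatusCode);
}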
- A single HttpClient can be used for multiple requests concurrently, so it is much better to create and dispose just one instance instead of 5 million.
- There are many problems with firing so many requests at the same time: the machine's network stack, the target website, timeouts, and so on. ActionBlock caps that number with MaxDegreeOfParallelism (which you should test and optimize for your particular case). It is important to note that TPL may choose a lower number if it deems it appropriate.
- When you have a single async call at the end of an async method or lambda expression, it is better for performance to remove the redundant async-await and just return the task (i.e. return block.Completion;).
- Complete notifies the ActionBlock that it should not accept any more items, but the block still finishes processing the items it already has. When that is done, the Completion task completes, so you can await it.
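The Complete / Completion behaviour and the cap on parallelism can be tried out without a network. This is a minimal sketch with `Task.Delay` standing in for the HTTP call; `ConcurrencyCounter` and `ActionBlockDemo` are names made up for the demo. It assumes the System.Threading.Tasks.Dataflow package is available, which on .NET Framework means adding it from NuGet.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Tracks how many items are in flight and the highest value seen.
public sealed class ConcurrencyCounter
{
    private readonly object _gate = new object();
    private int _inFlight;

    public int Peak { get; private set; }
    public int Processed { get; private set; }

    public void Enter() { lock (_gate) { _inFlight++; if (_inFlight > Peak) Peak = _inFlight; } }
    public void Exit() { lock (_gate) { _inFlight--; Processed++; } }
}

public static class ActionBlockDemo
{
    // No async/await here: the method only hands back the block's Completion task.
    public static Task ProcessAll(IEnumerable<int> items, int maxParallel, ConcurrencyCounter counter)
    {
        var block = new ActionBlock<int>(async item =>
        {
            counter.Enter();
            await Task.Delay(10); // stand-in for the HTTP call
            counter.Exit();
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        foreach (var item in items)
            block.Post(item);

        block.Complete();        // no more input; already queued items still run
        return block.Completion; // completes once every queued item is done
    }
}
```

After `await ActionBlockDemo.ProcessAll(Enumerable.Range(0, 20), 4, counter)`, `counter.Processed` should be 20 and `counter.Peak` should not exceed 4.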
I suspect you are running into outbound connection throttling, which prevents a large number of simultaneous connections to the same domain. The answers given in this extensive Q+A might give you some avenues to explore.
As far as the structure of the code goes, I would personally try to use a dynamic pool of connections. You know you can't actually open 5 million connections at the same time, so attempting it will just fail. You may as well work with a reasonable, configurable limit of (for example) 20 connections and use them as a pool. That way you can tune the number up or down.
Alternatively, you could investigate HTTP pipelining (which I haven't used myself), which is aimed specifically at the job you're doing (batching up HTTP requests). http://en.wikipedia.org/wiki/HTTP_pipelining
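One possible way to sketch the fixed-limit pool idea is a SemaphoreSlim that hands out a configurable number of slots. This is not a real connection pool, only a throttle in front of whatever does the request, and `ThrottledRunner` is a name made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class ThrottledRunner
{
    private readonly SemaphoreSlim _slots;

    // maxConcurrent is the tunable limit, e.g. 20 as suggested above.
    public ThrottledRunner(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent);
    }

    // Starts work for each item, but never more than maxConcurrent at once.
    public async Task RunAsync<T>(IEnumerable<T> items, Func<T, Task> work)
    {
        var running = new List<Task>();
        foreach (var item in items)
        {
            await _slots.WaitAsync(); // wait for a free slot
            running.Add(RunOneAsync(item, work));
        }
        await Task.WhenAll(running);
    }

    // Runs one item and always gives the slot back, even if work throws.
    private async Task RunOneAsync<T>(T item, Func<T, Task> work)
    {
        try { await work(item); }
        finally { _slots.Release(); }
    }
}
```

Usage would look like `await new ThrottledRunner(20).RunAsync(items, item => Update(CreateUrl(item)));`. Note that this sketch keeps every task in a list, which for 5 million items costs memory; processing the items in batches would avoid that.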