How do I use Parallel.ForEach with thread-local state?
Problem: I've seen two implementations of Parallel.ForEach() in an article, both downloading URLs with WebClient. The author assumed that in the first example, given an array of 100 URLs, 100 WebClients would be started and most of them would time out. So he proposed a second implementation that uses thread-local state, and he stated that "as many WebClient() objects will be spawned as we need."
Question: How does the second example ensure there are no timeouts? Or, in other words, how does the second example account for the local connection limit? Will clients be reusable or something else?
Source:
// First example: a new WebClient is created for every URL
Parallel.ForEach(urls,
    (url, loopstate, index) =>
    {
        WebClient webclient = new WebClient();
        webclient.DownloadFile(url, filenames[index]);
    });
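// Second example: the WebClient is created by localInit and passed along as local state
Parallel.ForEach(urls,
    () => new WebClient(),
    (url, loopstate, index, webclient) =>
    {
        webclient.DownloadFile(url, filenames[index]);
        return webclient; // the body must return the local state for the next iteration
    },
    (webclient) => { });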
Note: Spawning WebClients on multiple threads is for demonstration purposes only. I know this would be more efficient with asynchronous operations.
The article I took the code from (I simplified it a bit): When should I use Parallel.ForEach? When should I use PLINQ? See the chapter "Thread-Local State".
"In other words, how does the second example account for the local connection limit? Will clients be reusable or something else?"
Instead of creating a WebClient object for each iteration, the second example creates one WebClient per thread (strictly speaking, the localInit delegate runs once for each task that takes part in the loop). This means that if Parallel.ForEach uses 4 threads, it will create 4 instances and reuse those objects between iterations. Each client can then reuse the connection it has already opened, whereas a new instance per iteration would have to wait for the other clients to close theirs.
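The reuse described above can be observed without any network access. The following is only a rough sketch: FakeClient is a hypothetical stand-in for WebClient, and LocalStateDemo and CountClients are names made up for this illustration. It counts how often localInit runs for a given number of iterations; the exact count depends on the machine and the scheduler, but it is normally far smaller than the number of items.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LocalStateDemo
{
    // Hypothetical stand-in for WebClient so the sketch runs offline.
    public sealed class FakeClient
    {
        public int Uses;
    }

    // Returns how many FakeClient instances localInit created for itemCount iterations.
    public static int CountClients(int itemCount)
    {
        int created = 0;
        int[] items = Enumerable.Range(0, itemCount).ToArray();

        Parallel.ForEach(
            items,
            // localInit: runs once per task taking part in the loop
            () => { Interlocked.Increment(ref created); return new FakeClient(); },
            (item, loopState, index, client) =>
            {
                client.Uses++;   // safe: a client is only touched by the task that owns it
                return client;   // hand the same instance to the next iteration
            },
            // localFinally: nothing to clean up in this sketch
            client => { });

        return created;
    }

    public static void Main()
    {
        Console.WriteLine("Clients created for 1000 items: " + CountClients(1000));
    }
}
```

With a real WebClient, localFinally would be the natural place to dispose each instance.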
In the end, all clients compete for the same pool of connections, which is capped by ServicePointManager.DefaultConnectionLimit. The fewer connections you have open at once, the more time each request has to complete. You can also address this by raising the connection limit, which is 2 by default.
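As a minimal sketch of raising that limit, the snippet below sets ServicePointManager.DefaultConnectionLimit once at startup. ConnectionLimitDemo, RaiseLimit and the value 16 are illustrative assumptions, not recommendations. The setting is documented for .NET Framework; on newer .NET versions ServicePointManager is marked obsolete and may have little or no effect, so treat this as applying to the WebClient scenario discussed here.

```csharp
using System;
using System.Net;

public static class ConnectionLimitDemo
{
    // Sets the process-wide default connection limit and returns the value now in effect.
    // Call this before the first request is made, since existing ServicePoint
    // objects keep the limit they were created with.
    public static int RaiseLimit(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }

    public static void Main()
    {
        // 16 is an arbitrary example value.
        Console.WriteLine("Connection limit is now " + RaiseLimit(16));
    }
}
```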
Generally speaking, there is no need to use multiple threads to execute concurrent I/O requests. Parallelism doesn't really help there, because the threads spend their time blocked waiting on the network.
By using thread-local state, we now have one WebClient per thread rather than one client per iteration.
The author's idea is that we now have fewer WebClient instances floating around and consuming resources. This argument is bogus, because WebClient instances that are not currently making a call do not hold any resources. Dispose does nothing on WebClient. Wrap it in a using block and you're done.
You should use PLINQ here, because Parallel tends to spawn an unbounded number of threads. With I/O, you need to manage the degree of parallelism (DOP) yourself. Only with PLINQ can you set the exact DOP. The TPL cannot know how many concurrent requests your network can support.
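A rough sketch of fixing the DOP with PLINQ follows. To keep it runnable offline, Thread.Sleep stands in for a blocking download; DopDemo and MaxObservedConcurrency are hypothetical names for this illustration. It records the highest number of work items seen running at once, which should not exceed the value passed to WithDegreeOfParallelism.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class DopDemo
{
    // Runs itemCount simulated requests with the given DOP and
    // returns the peak number that were in flight at the same time.
    public static int MaxObservedConcurrency(int itemCount, int dop)
    {
        int current = 0;
        int max = 0;

        Enumerable.Range(0, itemCount)
            .AsParallel()
            .WithDegreeOfParallelism(dop)
            .ForAll(_ =>
            {
                int now = Interlocked.Increment(ref current);

                // Record a new peak with a compare-and-swap retry loop.
                int seen;
                do { seen = Volatile.Read(ref max); }
                while (now > seen && Interlocked.CompareExchange(ref max, now, seen) != seen);

                Thread.Sleep(10); // stand-in for a blocking call such as DownloadFile
                Interlocked.Decrement(ref current);
            });

        return max;
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrency: " + MaxObservedConcurrency(40, 4));
    }
}
```

For comparison, Parallel.ForEach accepts a ParallelOptions argument whose MaxDegreeOfParallelism property sets an upper bound on concurrency rather than an exact value.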