Limiting Threads in Multithreaded Network Crawlers
I have access to an application written in C# (.NET) that connects to numerous external systems (mostly via raw HTTP requests, partly via web services and XML requests, still over HTTP) and updates some data in them.
There can be a lot of work in the queue at any given time, and the naive approach to increasing throughput is to increase the number of threads. The logic was that since most of the time is spent waiting for network responses, more threads should mean more network responses in flight at the same time. CPU and memory do not seem to be hitting their limits.
Still, at around 300 threads everything runs slower than with fewer threads.
I am wondering whether this is an operating system limit (Windows Server 2012 R2), a .NET limit (4.5), or something else. How can I determine where the bottleneck is? (As I said, CPU and RAM do not seem to be the problem.)
I know that the external systems can get overloaded and drag down overall performance, but let's assume that is not significant here.
The most likely problems are:
- As noted in the comment by AgentFire, the ServicePointManager.DefaultConnectionLimit property limits the number of concurrent connections your application can have to the same domain. The default is 2. If you are trying to get a lot of data from one server in multiple requests, you will be limited. You can increase this value if you need to (see the snippet just after this list). Keep in mind, however, that the server may see your multiple connections as an attempted denial-of-service attack and block or throttle you.
- It looks like something in the .NET HTTP stack is effectively single threaded; I suspect it is DNS resolution. I found I could sustain only about 15 to 20 requests per second of throughput even when using multiple threads. It depends, of course, on the size of the documents you are downloading and the responsiveness of the servers you are talking to, but my experience with a crawler was that with the naive method (one request per thread using HttpWebRequest), I ended up averaging 15 to 20 requests per second.
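A minimal sketch of the fix for point #1 (the property is the real one mentioned above; the value 100 is just an illustrative choice):

```csharp
using System.Net;

// Allow up to 100 concurrent connections per host (the default is 2).
// Set this once at application startup, before any requests go out.
ServicePointManager.DefaultConnectionLimit = 100;
```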
Regarding point #2: I suspect DNS because I ran a separate test in which I did DNS resolution on the domains I was working with, and my average was 50 to 60 milliseconds per request. Most came back very quickly, but some took several seconds. Also, my throughput increased significantly when I put a caching DNS server on my LAN.
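If you want to check whether DNS is the slow part for your own targets, a small sketch like this can time each lookup (Dns.GetHostAddresses is the standard synchronous resolver; the domain list is a placeholder for your own):

```csharp
using System;
using System.Diagnostics;
using System.Net;

class DnsTimer
{
    static void Main()
    {
        string[] domains = { "example.com", "example.org" }; // replace with your own targets

        foreach (string domain in domains)
        {
            var sw = Stopwatch.StartNew();
            IPAddress[] addresses = Dns.GetHostAddresses(domain);
            sw.Stop();
            Console.WriteLine("{0}: {1} ms ({2} addresses)",
                domain, sw.ElapsedMilliseconds, addresses.Length);
        }
    }
}
```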
CPU and memory won't be your limiting factors. In addition to DNS resolution, you also have to consider your connection speed: if you have 10 concurrent connections, each downloading from a server at 1 Mbps, you will saturate a 10 Mbps Internet connection. Look at how much network bandwidth you are actually using.
These are the biggest bottlenecks in my experience. You should investigate each one to see whether one or more of them is the reason for your poor performance.
Thread creation costs some CPU and RAM: creating 300 threads reserves at least 1 MB of stack space per thread, plus some other overhead.
You should use a thread pool for this. The threads in the pool have already been created and are waiting for work.
When you spend most of the time waiting for network responses, you can instead use asynchronous I/O, which does not require many threads at all.
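A minimal sketch of that approach on .NET 4.5, assuming HttpClient with a SemaphoreSlim to cap the number of requests in flight (the limit of 50 and the method names are illustrative choices):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class AsyncFetcher
{
    // Cap the number of requests in flight; no thread is blocked while waiting.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(50);
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> FetchAsync(string url)
    {
        await Gate.WaitAsync();
        try
        {
            return await Client.GetStringAsync(url);
        }
        finally
        {
            Gate.Release();
        }
    }

    // Kick off all downloads and await them together.
    public static Task<string[]> FetchAllAsync(IEnumerable<string> urls)
    {
        return Task.WhenAll(urls.Select(FetchAsync));
    }
}
```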
When you have enough RAM, 300 threads are not a problem at all. Most people instinctively ridicule such an architecture, but those same people have never tried it themselves; it works fine. OS-induced performance issues only start to hit in the 10,000+ thread range on my Win7 system, where the OS gets jerky to use.
I would prefer async I/O in your case because your thread count is quite large. However, synchronous I/O is probably not your actual problem.
> How can I determine where the bottleneck is?
Check all the possible bottlenecks. Neither the CPU nor the RAM is one. Check network usage. Are you hitting the disk? Are you sure your external services are not overloaded? They may have concurrency limits of their own.
I'm guessing you've already raised the .NET connection limits?! Find out how many requests are actually running at the same time. I would do this by (a code-based alternative follows this list):
- Suspending the debugger and checking how many threads are actually inside an HTTP request at that moment.
- Looking at the number of open TCP connections (Process Explorer or TcpView.exe).
- Using Fiddler and seeing how many requests appear to be active at the same time.
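If you'd rather measure it in code than in the debugger, a hypothetical helper like this (the class and names are mine) can log the actual number of requests in flight:

```csharp
using System;
using System.Threading;

static class ConcurrencyProbe
{
    private static int _inFlight;

    // Wrap each request: using (ConcurrencyProbe.Enter()) { /* issue request */ }
    public static IDisposable Enter()
    {
        int now = Interlocked.Increment(ref _inFlight);
        Console.WriteLine("Requests in flight: " + now);
        return new Exit();
    }

    private sealed class Exit : IDisposable
    {
        public void Dispose() { Interlocked.Decrement(ref _inFlight); }
    }
}
```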
You are looking at the problem from the wrong point of view. Although on Windows the practical limit on the number of threads you can create for a parallel task is quite high, as you saw, increasing the thread count may improve performance at first, but it then degrades for the following reasons:
- A thread is an expensive resource to create and to keep alive.
- The more threads there are, the more switching between processor cores is needed to serve them (assuming default priority and processor affinity), so more time is wasted on context switches instead of processing the queued requests.
The better way is to use the ThreadPool, which the Parallel APIs also use. It is designed to optimize CPU core usage and maximize throughput: the number of parallel threads will generally stay close to the number of cores/CPUs, and after finishing one request a thread simply picks up the next. The advantage is that all CPU cores are used at their optimum level while all requests are still processed in parallel.
The ideal option for you would be the Parallel API, which handles all this complexity internally; but if you don't want to go that way, the broad rule of thumb is:
number of threads = number of CPU cores * 2
I read this in a definitive guide (I can't find the link right now); you can try multipliers of 1.5, 2, 2.5, or 3. This will definitely improve performance, but the problem remains of ensuring that each thread gets a unique/free processor core, which is exactly the magic the Parallel API takes care of in order to balance the load for maximum performance.
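Putting the rule of thumb and the Parallel API together, a sketch might look like this (ProcessItem and the string work items are placeholders for your queue and per-request logic):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class QueueProcessor
{
    // Placeholder for your per-request logic.
    static void ProcessItem(string item) { /* issue the HTTP request here */ }

    public static void ProcessQueue(IEnumerable<string> workItems)
    {
        var options = new ParallelOptions
        {
            // Rule of thumb from above: cores * 2; tune the multiplier (1.5-3).
            MaxDegreeOfParallelism = Environment.ProcessorCount * 2
        };

        Parallel.ForEach(workItems, options, ProcessItem);
    }
}
```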
To add to this, as suggested above, you can use the async/await keywords (.NET 4.5) together with Task to issue the requests asynchronously and still keep your interface responsive. However, async does not always mean fast; it can even be slower, so for raw speed look at the TPL.