Debugging network performance with a minimal Winsock2 application
I have a very basic Winsock2 TCP client - complete listing below - that just blasts out a bunch of bytes. However, it is very slow over the network; the data just trickles out.
Here's what I tried and found (both Windows PCs are on the same local network):
- Running this app from one computer to another is slow - it takes ~50 seconds to send 8MB.
- Two different servers - netcat and a custom-written one (as simple as the client below) - gave the same results.
- taskmgr shows that both the CPU and the network are mostly unused.
- Running this app with the server on the same computer is fast - it takes ~1-2 seconds to send 8MB.
- Another client, netcat, works just fine - it takes ~7s to send 20MB of data. (I used the nc that ships with Cygwin.)
- Changing the buffer size (1 * 4096, 16 * 4096, and 128 * 4096) didn't make a big difference.
- Running pretty much the same code on Linux boxes on a different LAN worked fine.
- Adding a few print statements around the call to send shows that we spend most of our time blocking there.
- On the server side, the receives come back in blocks of <= 4K (no matter what buffer size the sender uses). However, this also happens with other clients such as netcat, which runs at full speed.
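To compare the timings in the list above directly, they can be converted into average throughput. This is only a minimal sketch: `mbit_per_second` is a hypothetical helper name, and 1 Mbit is taken to be 1,000,000 bits.

```cpp
#include <cassert>
#include <cstdint>

// Average throughput in megabits per second (1 Mbit = 1,000,000 bits).
// Hypothetical helper, not part of the client listing.
double mbit_per_second(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);  // a zero or negative duration makes no sense here
    return static_cast<double>(bytes) * 8.0 / seconds / 1e6;
}
```

With the numbers above, `mbit_per_second(8388608, 50.0)` comes out at roughly 1.3 Mbit/s for this client, while the netcat run is roughly 24 Mbit/s if "20MB" is assumed to mean 20 * 1024 * 1024 bytes.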
Any ideas? Thanks in advance for any advice.
    #include <winsock2.h>
    #include <cstring>
    #include <iostream>
    using namespace std;

    enum { bytecount = 8388608 };  // total number of bytes to send (8 MB)
    enum { bufsz = 16*4096 };      // bytes handed to each send() call

    int main(int argc, TCHAR* argv[])
    {
        WSADATA wsaData;
        WSAStartup(MAKEWORD(2,2), &wsaData);

        // Address of the receiving machine.
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_port = htons(9898);
        sa.sin_addr.s_addr = inet_addr("157.54.144.70");
        if (sa.sin_addr.s_addr == INADDR_NONE) {
            cerr << "inet_addr: " << WSAGetLastError() << endl;
            return 1;
        }

        // Fill the send buffer with a repeating byte pattern.
        char *blob = new char[bufsz];
        for (int i = 0; i < bufsz; ++i) blob[i] = (char) i;

        SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
        if (s == INVALID_SOCKET) {
            cerr << "socket: " << WSAGetLastError() << endl;
            return 1;
        }
        int res = connect(s, reinterpret_cast<sockaddr*>(&sa), sizeof sa);
        if (res != 0) {
            cerr << "connect: " << WSAGetLastError() << endl;
            return 1;
        }

        // Keep sending the same buffer until bytecount bytes have gone out.
        int sent;
        for (int j = 0; j < bytecount; j += sent) {
            sent = send(s, blob, bufsz, 0);
            if (sent < 0) {
                cerr << "send: " << WSAGetLastError() << endl;
                return 1;
            }
        }
        closesocket(s);
        return 0;
    }
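The send loop in the listing always asks for a full `bufsz` bytes, so a partial send could push the total past `bytecount`. Below is a hedged sketch of a stricter loop. `send_all` and the `send_fn(ptr, len)` callback are hypothetical names; the callback stands in for a call such as `send(s, ptr, len, 0)` so that the loop can be tried without a socket.

```cpp
#include <algorithm>
#include <cassert>

// Sends exactly `total` bytes by repeatedly handing a prefix of `buf` to send_fn.
// send_fn(ptr, len) is a hypothetical stand-in for send(): it returns the number
// of bytes accepted, or a value <= 0 on failure.
// Returns `total` on success and -1 on failure.
template <typename SendFn>
long long send_all(SendFn send_fn, const char* buf, int bufsz, long long total) {
    assert(buf != nullptr && bufsz > 0 && total >= 0);
    long long done = 0;
    while (done < total) {
        // Never ask for more than what is still missing.
        const long long remaining = total - done;
        const int chunk = static_cast<int>(std::min<long long>(remaining, bufsz));
        const int sent = send_fn(buf, chunk);
        if (sent <= 0) return -1;  // error, or no progress at all
        done += sent;
    }
    return done;
}
```

Like the original loop, this restarts from the beginning of the buffer on every call, which is fine for a throughput test where the payload does not matter but would be wrong for real data.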
I looked at the packets using Microsoft Network Monitor (netmon) with its nice TCP Analyzer, and it turns out that lots of packets are being lost and have to be retransmitted - hence the slow speeds, caused by retransmission timeouts (RTOs).
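To get a feel for why a handful of RTOs can dominate a transfer: under the standard backoff rule (RFC 6298) the retransmission timer doubles each time the same segment times out again. The sketch below adds up the resulting stall. `rto_stall_ms` is a hypothetical helper, any initial RTO value passed to it is an assumption rather than something measured in these traces, and upper caps on the timer are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Total time (ms) spent waiting when one segment times out `timeouts` times in a
// row and the timer doubles after every timeout (exponential backoff, no cap).
std::uint64_t rto_stall_ms(std::uint64_t initial_rto_ms, unsigned timeouts) {
    assert(timeouts < 32);  // keep the doubling well away from overflow
    std::uint64_t total = 0;
    std::uint64_t rto = initial_rto_ms;
    for (unsigned i = 0; i < timeouts; ++i) {
        total += rto;  // wait for the current timer to expire
        rto *= 2;      // back off before the next retransmission
    }
    return total;
}
```

For example, with an assumed initial RTO of 300 ms, three timeouts in a row on one segment cost `rto_stall_ms(300, 3)` = 2100 ms, during which almost nothing is sent.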
A colleague helped me debug this:
Well, from this trace on the receiver side, it definitely looks like some packets are not reaching the receiver. I can also see what look like malformed packets (partial TCP headers, etc.) in these traces.
Even in the "good" trace (receiver view for the netcat client) I see some malformed packets (wrong TCP data length, etc.). However, the errors are not as frequent as in the other trace.
Given that these computers are on the same subnet, there is no router between them that could be dropping packets. That leaves the two network cards, the Ethernet cables, and the Ethernet switches. You can try to isolate the bad machine by adding a third machine to the mix and running the same test with the new machine replacing first the sender and then the receiver. Use a different physical port for the third machine. If one of the original machines has a switch between it and the floor jack, try taking that switch out of the equation. You can also try an Ethernet crossover cable between the two original computers (or a different Ethernet switch to which you connect both machines directly) and see if the problem goes away.
Since the problem seems to depend on the contents of the packets, I doubt it is caused by the cables. Given that the sender has an NVIDIA nForce Ethernet chipset and the receiver has Broadcom Ethernet, my money is on the sender being the culprit. If this does look like a problem with a specific network adapter, try disabling the adapter's special features, such as Checksum Offload or Large Send Offload.
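As background for the Checksum Offload suggestion: what gets offloaded is the 16-bit one's-complement Internet checksum (RFC 1071). With offload enabled, a capture taken on the sending machine may show checksums that the NIC has not filled in yet, so such traces need to be read with care. The function below is a minimal sketch of the sum itself; `internet_checksum` is a hypothetical name, and a real TCP checksum also covers a pseudo-header, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One's-complement Internet checksum (RFC 1071) over a byte buffer.
// Intended for packet-sized inputs (up to 64 KB), so the 32-bit sum cannot overflow.
std::uint16_t internet_checksum(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t sum = 0;
    std::size_t i = 0;
    // Add the data as big-endian 16-bit words.
    for (; i + 1 < len; i += 2)
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    // An odd trailing byte is padded with a zero byte on the right.
    if (i < len)
        sum += static_cast<std::uint32_t>(data[i]) << 8;
    // Fold the carries back into the low 16 bits.
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}
```

Running the same function over data that already contains a correct checksum yields 0, which is how a receiver checks it.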
I tried using a third box as the sender (identical to the original sender, a Shuttle XPC with the nForce chipset) and it worked fine - TCP Analyzer showed very smooth TCP sessions. This suggests that the problem is actually due to a NIC/driver fault on the original sender box or a bad Ethernet cable.
Here's what you can do to get a better picture.
- Check how much time is spent inside the "connect" and "send" API calls, so you can see whether the connect call is the problem. You can do this with a profiler, but if your application is that slow, you should be able to see it while debugging.
- Try running Wireshark (or Ethereal) to dump your network traffic so you can see whether TCP packets are being transmitted with some delay. If the replies come back quickly, the problem lies in your system. If you find delays, it is a routing/network issue.
- Run "route print" to check how your computer routes traffic to the destination computer (157.54.144.70). You will be able to see whether a gateway is in use and check the priority (metric) of the different routes.
- Try sending small chunks (I mean changing "bufsz" to 1024). Is there any correlation between performance and buffer size?
- Check whether you have antivirus software or a firewall installed, and be sure to turn it off. You can also try running the same application in Safe Mode with Networking.
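The timing and buffer-size checks in the list above can be combined into one small harness. This is only a sketch under assumptions: `sweep_buffer_sizes` and `Sample` are hypothetical names, and `transfer(bufsz)` stands for whatever code sends the test data using the given buffer size (for instance the connect-and-send part of the client).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// One measurement: the buffer size used and the wall-clock time the transfer took.
struct Sample {
    std::size_t bufsz;
    double seconds;
};

// Runs transfer(bufsz) once per buffer size and records how long each run takes.
template <typename TransferFn>
std::vector<Sample> sweep_buffer_sizes(const std::vector<std::size_t>& sizes,
                                       TransferFn transfer) {
    std::vector<Sample> out;
    for (std::size_t bufsz : sizes) {
        assert(bufsz > 0);
        const auto start = std::chrono::steady_clock::now();
        transfer(bufsz);  // hypothetical: performs the whole send with this buffer size
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        out.push_back({bufsz, elapsed.count()});
    }
    return out;
}
```

If the recorded times barely change between, say, 1024 and 65536 bytes, the buffer size is unlikely to be the bottleneck.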
The app looks fine, and you said it works fine on Linux. I don't know if it will help, but I would: 1) compare the MTU values of the Windows and Linux systems; 2) check the TCP receive buffer size on Windows and on Linux; 3) check whether the network card speeds of both systems are the same.
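For the MTU comparison, it can help to translate an MTU into the TCP payload that fits in one packet. The sketch below assumes IPv4 and TCP headers of 20 bytes each with no options. `ipv4_tcp_mss` and `segments_needed` are hypothetical helper names, and the usual Ethernet MTU of 1500 is used only as an example value.

```cpp
#include <cassert>
#include <cstdint>

// Largest TCP payload per packet for a given MTU, assuming a 20-byte IPv4 header
// and a 20-byte TCP header (no IP or TCP options).
std::uint32_t ipv4_tcp_mss(std::uint32_t mtu) {
    assert(mtu > 40);
    return mtu - 40;
}

// Number of full-or-partial segments needed to carry `bytes` of payload.
std::uint64_t segments_needed(std::uint64_t bytes, std::uint32_t mss) {
    assert(mss > 0);
    return (bytes + mss - 1) / mss;  // round up
}
```

With an MTU of 1500 the payload per segment is 1460 bytes, so the 8388608-byte test transfer needs at least `segments_needed(8388608, 1460)` = 5746 segments; a smaller MTU on one system raises that count.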