Speed up TCP loopback connection
I am trying to send some bytes to a third-party application (running on the same server) over a TCP loopback connection, using the following code:
struct sockaddr_in serv_addr;
struct hostent *server;

int sockfd = socket(PF_INET, SOCK_STREAM, 0);
server = gethostbyname(host_address);

bzero((char *) &serv_addr, sizeof (serv_addr));
serv_addr.sin_family = AF_INET;
bcopy((char *) server->h_addr, (char *) &serv_addr.sin_addr.s_addr, server->h_length);

/**** Port No. Set ****/
serv_addr.sin_port = htons(portno);

// Enable TCP keep-alive on the socket.
int sockKeepAliveOption = 1;
int al = setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE,
                    (void *) &sockKeepAliveOption, sizeof (sockKeepAliveOption));
if (al == -1) {
    std::cout << "Setsockopt error: SO_KEEPALIVE -- unable to enable TCP keep-alive." << std::endl;
} else {
    std::cout << "SO_KEEPALIVE set with SOL_SOCKET." << std::endl;
}
I am sending 400 bytes at a time, 100 times per second, with the following call:
int n = send(sockfd, sendB, 400, ONLOAD_MSG_WARM);
My problem is that I am getting high jitter: the minimum latency is 3 µs, the average 7 µs, and the maximum 19 µs. How can I optimize this?
Thanks.
Edit 08/28/2014:
Let me add a few more details. I am also receiving data from the same port on a different thread, but only after sending. I also pin each thread to a core with the code below, and all CPUs except core 0 are isolated from the scheduler.
thread1 = new std::thread(myfunction, input1, input2);
pthread_t thread_hnd = thread1->native_handle();

cpu_set_t cpuset;
CPU_ZERO(&cpuset);    // clear the set before adding a CPU
CPU_SET(5, &cpuset);  // pin this thread to core 5
s = pthread_setaffinity_np(thread_hnd, sizeof (cpu_set_t), &cpuset);
I get good numbers (3 or 4 µs) when I send continuously every 1 ms, but if I send less frequently (say, every 1–5 seconds) I see around 20 µs for a while, though the average is about 7 µs.
Can listening and sending on the same port from different threads create jitter?
2nd Edit 08/28/2014:
Here is my processor's C-state report. The cores never enter C3. Core 2 [7] runs the thread from which I send the data through the loopback.
Cpu speed from cpuinfo 3499.00Mhz
True Frequency (without accounting Turbo) 3499 MHz
Socket [0] - [physical cores=6, logical cores=6, max online cores ever=6]
CPU Multiplier 35x || Bus clock frequency (BCLK) 99.97 MHz
TURBO ENABLED on 6 Cores, Hyper Threading OFF
Max Frequency without considering Turbo 3598.97 MHz (99.97 x [36])
Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is 38x/37x/36x/36x/36x/36x
Real Current Frequency 3600.17 MHz (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp
Core 1 [0]: 3600.17 (36.01x) 1.08 98.9 0 0 41
Core 2 [1]: 3595.44 (35.96x) 1.07 98.9 0 0 46
Core 3 [2]: 3595.28 (35.96x) 1 99.1 0 0 40
Core 4 [3]: 3599.01 (36.00x) 1 99.9 0 0 46
Core 5 [4]: 3599.51 (36.01x) 0 100 0 0 50
Core 6 [5]: 3598.97 (36.00x) 100 0 0 0 56
Socket [1] - [physical cores=6, logical cores=6, max online cores ever=6]
CPU Multiplier 35x || Bus clock frequency (BCLK) 99.97 MHz
TURBO ENABLED on 6 Cores, Hyper Threading OFF
Max Frequency without considering Turbo 3598.97 MHz (99.97 x [36])
Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is 38x/37x/36x/36x/36x/36x
Real Current Frequency 3600.12 MHz (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp
Core 1 [6]: 3598.97 (36.00x) 100 0 0 0 56
Core 2 [7]: 3598.51 (36.00x) 1.12 98.8 0 0 49
Core 3 [8]: 3599.98 (36.01x) 1.94 98 0 0 45
Core 4 [9]: 3598.97 (36.00x) 100 0 0 0 56
Core 5 [10]: 3599.48 (36.01x) 1 99.9 0 0 48
Core 6 [11]: 3600.12 (36.01x) 3.44 96.5 0 0 45
C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver)
C3 = Cores running with PLL turned off and core cache turned off
C6 = Everything in C3 + core state saved to last level cache
Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
First of all, there are ways to possibly speed this up, but they won't necessarily reduce the jitter. Most speed optimizations rely on asynchronous socket handling, and they mostly help when receiving data, less so when sending.
What can help is setting the TCP_NODELAY option. It ensures that packets are sent as quickly as possible by disabling Nagle's algorithm. Essentially, Nagle tries to coalesce multiple small writes into a single packet to maximize throughput at the expense of latency and jitter.
Also, remember that measuring at such a low resolution is tricky at best. Double-check your timer resolution (clock_getres) and be aware that any system interrupt or process-scheduling event can affect the timing. Your actual jitter may be better than you measure.
Can you try sched_setaffinity(2) on the network thread? If your code is single-threaded, it is easier to use the taskset(1) wrapper around it.
Moreover, it would be better to boot Linux with the isolcpus parameter so that other, irrelevant processes don't disturb your experiment.
Update regarding C-states
Is it possible your CPU is sleeping too deeply (>= C3)?
This tool can be useful for monitoring C-states:
You might want to change the kernel parameter intel_idle.max_cstate, or something similar depending on your processor and kernel version.
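A config sketch for limiting C-states, assuming the intel_idle driver and a GRUB-based boot setup (paths and parameter names vary by distribution and kernel version):

```shell
# Check the current limit (path assumes the intel_idle driver is loaded):
cat /sys/module/intel_idle/parameters/max_cstate

# To cap C-states at boot, add the parameter to the kernel command line,
# e.g. in /etc/default/grub, then regenerate the grub config and reboot:
#   GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=1"
```

Capping deep sleep states trades idle power for lower wake-up latency, which matches the symptom here: jitter appears only when sends are infrequent enough for cores to go idle.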