Speed up TCP loopback connection

I am trying to send some bytes to a third-party application (running on the same server) over a TCP loopback connection, using the following code.

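```cpp
#include <netdb.h>        // gethostbyname, struct hostent
#include <netinet/in.h>   // sockaddr_in, htons
#include <sys/socket.h>   // socket, setsockopt, connect
#include <unistd.h>       // close
#include <cstring>        // memset, memcpy (instead of legacy bzero/bcopy)
#include <iostream>

// Opens a TCP connection to host_address:portno with SO_KEEPALIVE set.
// Returns the socket descriptor, or -1 on failure.
int open_connection(const char *host_address, int portno) {
    int sockfd = socket(PF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) return -1;
    struct hostent *server = gethostbyname(host_address);
    if (server == nullptr) { close(sockfd); return -1; }  // host lookup failed

    struct sockaddr_in serv_addr;
    std::memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    std::memcpy(&serv_addr.sin_addr.s_addr, server->h_addr, server->h_length);

    /**** Port No. Set   ****/
    serv_addr.sin_port = htons(portno);
    int sockKeepAliveOption = 1;
    int al = setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &sockKeepAliveOption, sizeof(sockKeepAliveOption));
    if (al == -1) {
        std::cout << "setsockopt error: SO_KEEPALIVE -- unable to set keep-alive on TCP connection." << std::endl;
    } else {
        std::cout << "SO_KEEPALIVE set with SOL_SOCKET." << std::endl;
    }

    // connect() is not shown in the original snippet; added so the function is usable.
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) == -1) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```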

I send 400 bytes at a time, 100 times per second, using the following call:

```cpp
int n = send(sockfd, sendB, 400, ONLOAD_MSG_WARM);
```

My problem is that I am getting high jitter: a minimum latency of 3 us, an average of 7 us and a maximum of 19 us. How can I reduce it?

Thanks

Edit 08/28/2014.

Let me add a few more details. I am also receiving data on the same port in a different thread (after sending). I pin each thread to its own core with the following code, and all CPUs except core 0 are isolated from the scheduler.

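```cpp
#include <pthread.h>   // pthread_setaffinity_np (GNU extension)
#include <thread>

thread1 = new std::thread(myfunction, input1, input2);
pthread_t thread_hnd = thread1->native_handle();
cpu_set_t cpuset;
CPU_ZERO(&cpuset);    // clear the mask first; CPU_SET only adds one bit
CPU_SET(5, &cpuset);  // pin this thread to core 5
int s = pthread_setaffinity_np(thread_hnd, sizeof(cpu_set_t), &cpuset);
```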

I get good numbers (3 or 4 us) when I send continuously every 1 ms, but if I send less often (say every 1-5 seconds), I get about 20 us for a while; the average is still about 7 us.

Can receiving and sending on the same port from different threads cause jitter?

2nd Edit 08/28/2014.

Here is my processor state; the cores never enter C3. Core 2 [7] runs the thread from which I send the data over loopback.

```
Cpu speed from cpuinfo 3499.00Mhz
True Frequency (without accounting Turbo) 3499 MHz

Socket [0] - [physical cores=6, logical cores=6, max online cores ever=6]
  CPU Multiplier 35x || Bus clock frequency (BCLK) 99.97 MHz
  TURBO ENABLED on 6 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 3598.97 MHz (99.97 x [36])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is  38x/37x/36x/36x/36x/36x
  Real Current Frequency 3600.17 MHz (Max of below)
    Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp
    Core 1 [0]:       3600.17 (36.01x)      1.08    98.9       0       0    41
    Core 2 [1]:       3595.44 (35.96x)      1.07    98.9       0       0    46
    Core 3 [2]:       3595.28 (35.96x)         1    99.1       0       0    40
    Core 4 [3]:       3599.01 (36.00x)         1    99.9       0       0    46
    Core 5 [4]:       3599.51 (36.01x)         0     100       0       0    50
    Core 6 [5]:       3598.97 (36.00x)       100       0       0       0    56

Socket [1] - [physical cores=6, logical cores=6, max online cores ever=6]
  CPU Multiplier 35x || Bus clock frequency (BCLK) 99.97 MHz
  TURBO ENABLED on 6 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 3598.97 MHz (99.97 x [36])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is  38x/37x/36x/36x/36x/36x
  Real Current Frequency 3600.12 MHz (Max of below)
    Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp
    Core 1 [6]:       3598.97 (36.00x)       100       0       0       0    56
    Core 2 [7]:       3598.51 (36.00x)      1.12    98.8       0       0    49
    Core 3 [8]:       3599.98 (36.01x)      1.94      98       0       0    45
    Core 4 [9]:       3598.97 (36.00x)       100       0       0       0    56
    Core 5 [10]:      3599.48 (36.01x)         1    99.9       0       0    48
    Core 6 [11]:      3600.12 (36.01x)      3.44    96.5       0       0    45

C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver)
C3 = Cores running with PLL turned off and core cache turned off
C6 = Everything in C3 + core state saved to last level cache
Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
```



2 answers


First of all, there are ways that may speed this up, but they won't necessarily reduce the jitter. Most speed optimizations also rely on asynchronous sockets and help mostly when receiving data, less when sending.

What can help is setting the TCP_NODELAY option. This ensures that packets are sent as soon as possible by disabling the Nagle algorithm. Essentially, Nagle tries to coalesce several small writes into a single segment to maximize throughput at the expense of latency/jitter.



Also, remember that timing at such a fine resolution is tricky at best. Double-check the timer resolution (clock_getres) and be aware that any system interrupt or process scheduling can affect your measurements. Your actual jitter may be lower than what you measure.
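To check the resolution and time individual calls, something like the following sketch could be used. It assumes Linux/POSIX clocks; `clock_resolution_ns` and `now_ns` are hypothetical helper names. `CLOCK_MONOTONIC` is chosen because it is not affected by wall-clock adjustments.

```cpp
#include <time.h>    // clock_getres, clock_gettime, CLOCK_MONOTONIC
#include <cstdint>

// Hypothetical helper: resolution of the given clock in nanoseconds, or -1 on error.
int64_t clock_resolution_ns(clockid_t clk) {
    timespec res{};
    if (clock_getres(clk, &res) != 0) return -1;
    return static_cast<int64_t>(res.tv_sec) * 1000000000LL + res.tv_nsec;
}

// Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds.
int64_t now_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

For example, take `int64_t t0 = now_ns();` before `send()` and compute `now_ns() - t0` after it. Timing two back-to-back `now_ns()` calls gives a rough idea of how much the measurement itself costs, which matters when the numbers being compared are only a few microseconds apart.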



Can you try pinning the network thread to a CPU with sched_setaffinity(2)? If your code is single-threaded, it is easier to wrap it with taskset(1).
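A minimal sketch of pinning the calling thread with sched_setaffinity(2), assuming Linux with glibc (g++ defines `_GNU_SOURCE` by default, which the `CPU_*` macros and `sched_setaffinity` need). `pin_to_cpu` and `first_allowed_cpu` are hypothetical helpers; passing 0 as the pid applies the mask to the calling thread only, so this would be called at the start of the sending thread's function.

```cpp
#include <sched.h>   // sched_setaffinity, sched_getaffinity, cpu_set_t, CPU_* macros

// Hypothetical helper: restrict the calling thread to a single CPU.
// Returns 0 on success, -1 on failure (errno is set).
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  // pid 0 = calling thread
}

// Hypothetical helper: first CPU in the calling thread's current mask, or -1.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}
```

From the shell, `taskset -c 7 ./your_program` applies the same kind of mask to a whole single-threaded program (the program name is a placeholder).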

Moreover, it would be better to boot Linux with the isolcpus parameter so that other, unrelated processes don't disturb your experiment.
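isolcpus is set on the kernel command line (for example `isolcpus=1-11` to leave only core 0 for general scheduling). A small sketch to confirm the setting actually took effect after a reboot, assuming a kernel new enough to expose `/sys/devices/system/cpu/isolated`; `isolated_cpus` is a hypothetical helper name.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the CPU list the kernel reports as isolated
// (e.g. "1-11"), an empty string if none are isolated, or "unavailable"
// if the sysfs file cannot be read (older kernels do not provide it).
std::string isolated_cpus() {
    std::ifstream f("/sys/devices/system/cpu/isolated");
    if (!f) return "unavailable";
    std::string line;
    std::getline(f, line);
    return line;
}
```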

Update: C-states

Is it possible that your CPU is sleeping too deeply (>= C3)?



This tool can be useful for monitoring C-states:

You might want to change the kernel parameter intel_idle.max_cstate, or something similar, depending on your processor and kernel version.
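As a runtime alternative to a boot parameter (a different technique from intel_idle.max_cstate, named here plainly), Linux's PM QoS interface lets a process hold a CPU wake-up latency limit by writing a signed 32-bit value, in microseconds, to `/dev/cpu_dma_latency` and keeping the file open. The sketch assumes root access; `request_cpu_latency` is a hypothetical helper. The question's table shows core [7] halted in C1 about 98.8% of the time, so a limit like this is one way to test whether idle-state exits contribute to the jitter.

```cpp
#include <fcntl.h>    // open, O_RDWR
#include <unistd.h>   // write, close, ssize_t
#include <cstdint>

// Hypothetical helper: ask the kernel to keep CPU wake-up latency at or below
// max_latency_us microseconds (0 is the most aggressive setting).
// The request stays active only while the returned descriptor is open.
// Returns the descriptor on success, -1 on failure (usually missing permissions).
int request_cpu_latency(int32_t max_latency_us) {
    int fd = open("/dev/cpu_dma_latency", O_RDWR);
    if (fd < 0) return -1;
    if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
        static_cast<ssize_t>(sizeof(max_latency_us))) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Keep the returned descriptor open for as long as the low-latency behaviour is needed; closing it (or the process exiting) drops the request.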
