How to make a TCP connection pool in C/C++

I am developing a distributed client/server system in C++ in which many clients send requests to many servers over TCP; each server spawns a thread to process a request and send back its response. In my use case only a limited number of clients will access the servers, and I need very high performance. The messages exchanged between client and server are all small but very frequent, so establishing a connection and tearing it down after each use is expensive. I therefore want to cache connections: once a connection is created, it is stored in the cache for future use (assuming the number of clients does not exceed the cache size).

My questions are:

  • I noticed someone said that connection pooling is a client-side technique. If the pool is used only on the client side, then the first time a client connects to the server and sends data, that connection triggers accept() on the server side, which returns a socket for receiving from the client. When the client later wants to reuse an existing (cached) connection, it doesn't establish a new connection; it just sends data. The problem is: if there is no new connection, who calls accept() on the server side and spawns the thread?
  • If pooling also needs to be implemented on the server side, how can I know where a request came from? I can only get the client's address from accept(), but accept() creates a new socket for the request, so there would be no point in using a cached connection.

Any answer and suggestion would be appreciated. Or can anyone give me an example of connection pooling or connection caching?



1 answer

Please note that I intend to leave this answer incomplete for now. It reflects a similar project that I devoted a great deal of effort to in the past, but on which I lost progress due to unrelated circumstances. Watch this space for future updates!

I noticed that someone said that pooling is a client-side method ... if there is no connect, who will call accept() on the server side and spawn the thread?

First, pooling is not just a client-side method; it is a connection management technique. It applies to both kinds of peers ("server" and "client").

Second, you don't need to call accept to start a thread. Programs can start threads for whatever reason they like ... They can even start threads that start more threads, in a massively parallel cascade of thread-creating threads. (edit: we call this a "fork")

Finally, an efficient thread-pooling implementation does not start a thread for every client. Each thread usually takes between 512 KB and 4 MB (counting stack space and other contextual information), so if you have 10,000 clients each taking up that much, that is a lot of wasted memory.
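To make the first point concrete: accept runs once per connection, not once per request. When a client reuses a cached connection, the server simply reads the next request from the socket it accepted earlier. Below is a minimal single-process sketch of that, assuming a POSIX system with loopback networking; the function name demo_reuse_connection and the 4-byte request format are made up for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: one connect(), one accept(), several requests.
 * Returns the number of requests the server side read, or -1 on setup error. */
int demo_reuse_connection(int request_count) {
    int handled = -1;
    int conn = -1, peer = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(listener, (struct sockaddr *) &addr, sizeof addr) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *) &addr, &len) < 0)
        goto done;

    /* Client side: connect once; this is the socket a pool would cache. */
    conn = socket(AF_INET, SOCK_STREAM, 0);
    if (conn < 0 || connect(conn, (struct sockaddr *) &addr, sizeof addr) < 0)
        goto done;

    /* Server side: accept once for that connection. */
    peer = accept(listener, NULL, NULL);
    if (peer < 0) goto done;

    handled = 0;
    for (int i = 0; i < request_count; i++) {
        char request[4] = "req", received[4];
        /* Client reuses the cached socket: no new connect(), so no new accept(). */
        if (send(conn, request, sizeof request, 0) != (ssize_t) sizeof request) break;
        /* Server reads the next request from the socket it already accepted. */
        if (recv(peer, received, sizeof received, MSG_WAITALL) != (ssize_t) sizeof received) break;
        handled++;
    }

done:
    if (peer >= 0) close(peer);
    if (conn >= 0) close(conn);
    close(listener);
    return handled;
}
```

Calling demo_reuse_connection(3) sends three requests over a single connection; the only accept call happens before the loop.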

I want to do this, but just don't know how to do it in the case of multithreading.

You shouldn't be using multithreading here ... At least until you have a solution that uses a single thread and decide it's not fast enough. You do not have this information at the moment; you are just guessing and guessing does not guarantee optimization.

At the turn of the century, there were FTP servers that solved the C10K problem; they could handle 10,000 clients at any given time, whether browsing, downloading, or sitting idle as users tend to do on FTP servers. They solved this problem not with threads, but with non-blocking and/or asynchronous sockets and/or calls.

To clarify, these servers were handling thousands of connections in a single thread! One typical way is to use select, but I don't particularly like this method because it requires a rather ugly series of loops. I prefer to use ioctlsocket on Windows and fcntl on POSIX OSes to set the file descriptor to non-blocking mode, e.g.:

#ifdef WIN32
ioctlsocket(fd, FIONBIO, (u_long[]){1});
#else
fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
#endif


At this point recv and read will not block when operating on fd; if there is no data available, they return an error value immediately rather than waiting for data to arrive. This means that you can loop over multiple sockets.
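The behaviour described above can be checked with a small POSIX-only sketch. The helper names set_nonblocking and empty_recv_would_block are hypothetical, and a socketpair stands in for a real TCP connection so the example needs no network setup.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): put fd into non-blocking mode.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if recv() on an empty non-blocking socket fails immediately
 * with EAGAIN/EWOULDBLOCK, 0 if it behaves otherwise, -1 on setup failure. */
int empty_recv_would_block(void) {
    int pair[2];
    char byte;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) == -1) return -1;
    if (set_nonblocking(pair[0]) == -1) {
        close(pair[0]);
        close(pair[1]);
        return -1;
    }
    /* Nothing has been written to pair[1], so there is no data to read. */
    ssize_t n = recv(pair[0], &byte, 1, 0);
    int would_block = n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
    close(pair[0]);
    close(pair[1]);
    return would_block;
}
```

The key point is that "no data yet" is reported as an error with errno set to EAGAIN or EWOULDBLOCK, which is different from a return value of 0 (the peer closed the connection).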

If connection pooling also needs to be implemented on the server side, how can I know where the request came from?

Keep the client fd alongside its struct sockaddr_storage, and any other state information that you need to keep about clients, in a struct that you declare however you see fit. If that ends up at 4 KB (that's a pretty big struct, usually about as big as they need to get), then 10,000 of them would take up roughly 40,000 KB (~40 MB). Even mobile phones today shouldn't have any problem with that. Consider how you might adapt the following code to your needs:

#include <errno.h>
#include <string.h>
#ifdef WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
#endif

struct client {
    struct sockaddr_storage addr;
    socklen_t addr_len;
    int fd;
    /* Other stateful information */
};

#define BUFFER_SIZE 4096
#define CLIENT_COUNT 10000

int main(void) {
    int server;
    static struct client client[CLIENT_COUNT]; /* static: zero-initialised, kept off the stack */
    size_t client_count = 0;
    /* XXX: Perform usual socket/bind/listen */
#   ifdef WIN32
    ioctlsocket(server, FIONBIO, (u_long[]){1});
#   else
    fcntl(server, F_SETFL, fcntl(server, F_GETFL, 0) | O_NONBLOCK);
#   endif

    for (;;) {
        /* Accept connection if possible */
        if (client_count < sizeof client / sizeof *client) {
            struct sockaddr_storage addr = { 0 };
            socklen_t addr_len = sizeof addr;
            int fd = accept(server, (struct sockaddr *) &addr, &addr_len);
            if (fd != -1) {
#               ifdef WIN32
                ioctlsocket(fd, FIONBIO, (u_long[]){1});
#               else
                fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
#               endif
                client[client_count++] = (struct client) { .addr = addr
                                                         , .addr_len = addr_len
                                                         , .fd = fd };
            }
        }

        /* Loop through clients */
        char buffer[BUFFER_SIZE];
        for (size_t index = 0; index < client_count; index++) {
            ssize_t bytes_recvd = recv(client[index].fd, buffer, sizeof buffer, 0);
            /* "Would block" only means no data yet; it does not mean closed */
#           ifdef WIN32
            int closed = bytes_recvd == 0
                      || (bytes_recvd < 0 && WSAGetLastError() != WSAEWOULDBLOCK);
#           else
            int closed = bytes_recvd == 0
                      || (bytes_recvd < 0 && errno != EAGAIN && errno != EWOULDBLOCK);
#           endif
            if (closed) {
#               ifdef WIN32
                closesocket(client[index].fd);
#               else
                close(client[index].fd);
#               endif
                /* Shift the remaining entries down over the removed one */
                memmove(client + index, client + index + 1,
                        (client_count - index - 1) * sizeof *client);
                client_count--;
                index--; /* Revisit this slot; unsigned wrap at 0 is undone by index++ */
                continue;
            }
            if (bytes_recvd < 0) {
                continue; /* No data available on this socket right now */
            }
            /* XXX: Process buffer[0..bytes_recvd-1] */
        }

        sleep(0); /* This is necessary to pass control back to the kernel,
                   * so it can queue more data for us to process */
    }
}

Suppose you want to pool connections on the client side instead. The code will look very similar, except that, obviously, there will be no need for the accept-related code. Assuming you have an array of client entries that you want to connect, you can use non-blocking connect calls to make all the connections at the same time, like this:

size_t index = 0, in_progress = 0;
for (;;) {
    /* fd == 0 marks an unused slot; this relies on a zero-initialised array */
    if (client[index].fd == 0) {
        client[index].fd = socket(/* TODO */);
#       ifdef WIN32
        ioctlsocket(client[index].fd, FIONBIO, (u_long[]){1});
#       else
        fcntl(client[index].fd, F_SETFL, fcntl(client[index].fd, F_GETFL, 0) | O_NONBLOCK);
#       endif
    }
    /* Count the connections that are still being established */
#   ifdef WIN32
    in_progress += connect(client[index].fd, (struct sockaddr *) &client[index].addr, client[index].addr_len) < 0
                && (WSAGetLastError() == WSAEALREADY
                ||  WSAGetLastError() == WSAEWOULDBLOCK
                ||  WSAGetLastError() == WSAEINVAL);
#   else
    in_progress += connect(client[index].fd, (struct sockaddr *) &client[index].addr, client[index].addr_len) < 0
                && (errno == EALREADY
                ||  errno == EINPROGRESS);
#   endif
    if (++index < sizeof client / sizeof *client) {
        continue; /* More clients to visit in this pass */
    }
    index = 0;
    if (in_progress == 0) {
        break; /* Every connection has either completed or failed */
    }
    in_progress = 0;
}
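Once the connections exist, the "pool" itself is little more than bookkeeping over which sockets are currently handed out. The sketch below is one possible shape for that bookkeeping, not part of the code above: the names conn_pool, pool_init, pool_acquire and pool_release are hypothetical, the pool size is arbitrary, and the code assumes a single thread (no locking).

```c
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical capacity, chosen for illustration */

struct pooled_conn {
    int fd;     /* an already-connected socket, or -1 for an empty slot */
    int in_use; /* non-zero while a caller holds this connection */
};

struct conn_pool {
    struct pooled_conn slot[POOL_SIZE];
};

/* Fill the pool with up to POOL_SIZE already-connected descriptors. */
void pool_init(struct conn_pool *pool, const int *fds, size_t count) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool->slot[i].fd = i < count ? fds[i] : -1;
        pool->slot[i].in_use = 0;
    }
}

/* Hand out an idle connection; returns its fd, or -1 if none is free. */
int pool_acquire(struct conn_pool *pool) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool->slot[i].fd != -1 && !pool->slot[i].in_use) {
            pool->slot[i].in_use = 1;
            return pool->slot[i].fd;
        }
    }
    return -1;
}

/* Return a connection to the pool so that it can be reused. */
void pool_release(struct conn_pool *pool, int fd) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool->slot[i].fd == fd) {
            pool->slot[i].in_use = 0;
            return;
        }
    }
}
```

A caller would acquire a descriptor, send its request and read the response on it, then release it; no connect or close happens per request.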


As far as optimization goes, given that this should be able to handle 10,000 clients with perhaps a few minor tweaks, you shouldn't need multiple threads.

However, by associating a mutex with each client and calling pthread_mutex_trylock before each non-blocking socket operation, the above loops can be adapted to run concurrently on multiple threads while processing the same group of sockets. This provides a working model for any platform that offers POSIX threads, be it Windows (through a pthreads port), BSD, or Linux, but it is not optimal. To achieve optimality, we must move into an asynchronous world that varies from system to system.

It might pay to codify the "non-blocking socket" abstraction mentioned earlier, since the two asynchronous mechanisms differ significantly in their interfaces. As with everything else, unfortunately, we have to write abstractions so that our Windows-compatible code remains legible on POSIX-compliant systems. As a bonus, this will allow us to combine server processing (i.e. accept and everything that follows) with client processing (i.e. connect and everything that follows), so our server loop can double as a client (or vice versa).
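Returning to the pthread_mutex_trylock idea from earlier, here is a minimal sketch of what "one mutex per client" might look like. The struct locked_client, its handled counter, and the function try_service are hypothetical names used only for illustration; the actual socket work is left as a placeholder.

```c
#include <pthread.h>
#include <stddef.h>

struct locked_client {
    pthread_mutex_t lock;  /* guards this client's socket and state */
    int fd;
    unsigned long handled; /* hypothetical counter, for illustration only */
};

/* Try to service one client without ever waiting for the lock.
 * Returns 1 if this thread serviced the client, 0 if the client was busy. */
int try_service(struct locked_client *c) {
    if (pthread_mutex_trylock(&c->lock) != 0) {
        return 0; /* Another holder has it: skip and move on to the next client */
    }
    /* XXX: the non-blocking recv/send on c->fd would go here */
    c->handled++;
    pthread_mutex_unlock(&c->lock);
    return 1;
}
```

Each worker thread can run the same loop over the whole client array and call try_service on every entry; because trylock never waits, a thread that finds a client busy simply moves on instead of stalling.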


