What is the best way to implement an echo server with asynchronous I/O and IOCP?

As we all know, an echo server is a server that reads from a socket and writes that data to another socket.

Since Windows I/O completion ports give you several ways to do things, I was wondering which is the best (most efficient) way to implement an echo server. I'm sure someone here has tested the approaches I describe and can give his/her input.

My classes are Stream, which abstracts a socket, named pipe, or whatever, and IoRequest, which abstracts both the OVERLAPPED structure and the memory buffer for the I/O (suitable for both reading and writing, of course). This way, when I allocate an IoRequest, I allocate the memory for the data buffer and the OVERLAPPED structure in one shot, so I call malloc() only once. In addition, I implement fancy and useful things in the IoRequest object, such as an atomic reference count.
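The single-allocation layout described above could look roughly like the following. This is only a sketch under assumptions: OverlappedStandIn is a placeholder for the real Win32 OVERLAPPED so the code builds on any platform, and io_request_create, io_request_add_ref, and io_request_release are hypothetical names, not the asker's actual API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for the Win32 OVERLAPPED structure so the sketch builds anywhere.
// On Windows this member would be a real OVERLAPPED from <windows.h>.
struct OverlappedStandIn {
    void* reserved[4];
};

// Hypothetical layout: a fixed header followed by `capacity` bytes of payload,
// all inside the same malloc() block.
struct IoRequest {
    OverlappedStandIn overlapped;  // first member, so it shares the request's address
    std::atomic<int>  refs;        // atomic reference count
    std::size_t       capacity;    // payload size in bytes

    explicit IoRequest(std::size_t cap) : overlapped{}, refs(1), capacity(cap) {}

    // The payload starts right after the header.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// One malloc() covers both the header and the data buffer.
IoRequest* io_request_create(std::size_t capacity) {
    void* raw = std::malloc(sizeof(IoRequest) + capacity);
    if (raw == nullptr) return nullptr;
    return new (raw) IoRequest(capacity);
}

void io_request_add_ref(IoRequest* req) {
    assert(req != nullptr);
    req->refs.fetch_add(1, std::memory_order_relaxed);
}

// Drops one reference; returns true if this call freed the request.
bool io_request_release(IoRequest* req) {
    assert(req != nullptr);
    if (req->refs.fetch_sub(1, std::memory_order_acq_rel) != 1) return false;
    req->~IoRequest();
    std::free(req);
    return true;
}
```

Keeping the OVERLAPPED-like member first means the pointer handed back by a completion can be converted back to the owning request without extra bookkeeping.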

That said, let's explore the ways to build the best echo server:

-------------------------------------------- Method A --------------------------------------------

1) The reader socket completes its read, the IOCP callback returns, and you have an IoRequest that has just completed, along with its memory buffer.

2) Copy the buffer just received from the "reader" IoRequest to the "writer" IoRequest (this involves a memcpy() or similar).

3) Start a new read with ReadFile() on the reader, using the same IoRequest that was used for the previous read.

4) Start a new write with WriteFile() on the writer.
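Step 2 of Method A might be sketched as follows. The IoRequest here is a deliberately simplified, hypothetical stand-in (a buffer plus a byte count), and copy_to_writer is an illustrative name; the real ReadFile() and WriteFile() calls are only indicated in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified, hypothetical IoRequest: just a buffer and the valid byte count.
struct IoRequest {
    std::vector<char> buffer;
    std::size_t       length = 0;
    explicit IoRequest(std::size_t capacity) : buffer(capacity) {}
};

// Step 2 of Method A: copy the completed read into the writer's own request.
// Returns false if the writer's buffer is too small for the data.
bool copy_to_writer(const IoRequest& reader, std::size_t bytes_read, IoRequest& writer) {
    assert(bytes_read <= reader.buffer.size());
    if (bytes_read > writer.buffer.size()) return false;
    if (bytes_read > 0) {
        std::memcpy(writer.buffer.data(), reader.buffer.data(), bytes_read);
    }
    writer.length = bytes_read;
    // Steps 3 and 4 would follow here: ReadFile() on the reader with `reader`,
    // then WriteFile() on the writer with `writer`.
    return true;
}
```

One caveat of this approach: the writer's request must not still have a write pending when its buffer is overwritten, so a slow writer needs either back-pressure or more than one request.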

-------------------------------------------- Method B --------------------------------------------

1) The reader socket completes its read, the IOCP callback returns, and you have an IoRequest that has just completed, along with its memory buffer.

2) Instead of copying the data, pass the IoRequest itself to the "writer" for writing, with no memcpy() involved.

3) The reader now needs a new IoRequest to continue reading: either allocate a new one or hand it one that was allocated earlier, perhaps one whose write has just completed, before the new write is issued.
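The hand-off in Method B can be sketched as a pointer swap. Stream and IoRequest below are simplified, hypothetical stand-ins, hand_off is an illustrative name, and the sketch assumes the writer's previous write has already completed, which is exactly the synchronization point real code would have to guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified, hypothetical IoRequest: just a buffer and the valid byte count.
struct IoRequest {
    std::vector<char> buffer;
    std::size_t       length = 0;
    explicit IoRequest(std::size_t capacity) : buffer(capacity) {}
};

// Hypothetical Stream: owns at most one idle IoRequest at a time.
struct Stream {
    std::unique_ptr<IoRequest> request;
};

// Steps 2 and 3 of Method B: the filled request moves to the writer, and the
// reader takes the writer's idle request (or a fresh one) for its next read.
// No payload bytes are copied; only the owning pointers change hands.
void hand_off(Stream& reader, std::size_t bytes_read, Stream& writer, std::size_t capacity) {
    assert(reader.request != nullptr);
    reader.request->length = bytes_read;
    std::swap(reader.request, writer.request);
    if (reader.request == nullptr) {
        reader.request = std::make_unique<IoRequest>(capacity);
    }
    reader.request->length = 0;  // ready to be filled by the next read
}
```

After the first exchange the two streams keep recycling the same pair of requests, so the steady state needs no further allocation.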


So, in the first case, every Stream object has its own IoRequest, the data is copied with memcpy() or similar functions, and everything works fine. In the second case, the 2 Stream objects pass IoRequest objects to each other without copying the data, but it's a bit more complex: you have to manage the "swapping" of IoRequest objects between the 2 Stream objects, with the possible downside of synchronization problems (what about completions that happen on different threads?).

My questions:

Q1) Is avoiding the data copy really worth it? Copying between 2 buffers with memcpy() or similar is very fast, partly because the CPU cache is exploited for it. Also consider that with the first method I can echo from one "reader" socket to multiple "writer" sockets, but with the second I can't, since I would have to create N new IoRequest objects, one for each of the N writers, because each WriteFile() call needs its own OVERLAPPED structure.

Q2) My guess is that when I start N new writes on N different sockets with WriteFile(), I have to provide N different OVERLAPPED structures AND N different buffers to read the data from. Or can I issue N WriteFile() calls with N different OVERLAPPED structures, all taking the data from the same buffer, for the N sockets?


1 answer


Is avoiding the data copy really worth it?

It depends on how much you are copying. For 10 bytes, not so much. For 10 MB, then yes, avoiding the copy is worth it!

In this case, since you already have an object that contains the rx data and an OVERLAPPED block, it seems somewhat pointless to copy it: just reissue it to WSASend() or whatever.

but with the second one I can't do that

      

You can, but you need to separate the IORequest class from the Buffer class. The Buffer holds the data, an atomic int reference count, and any other control information shared by all calls; the IORequest holds an OVERLAPPED block, a pointer to the Buffer, and any other control information for each individual call. This control information can include the atomic int reference count for the buffer object.



The IORequest is the class that is used for each send call. Since it contains only a pointer to the buffer, there is no need to copy the data, so it stays reasonably small and is O(1) with respect to the size of the data.

When the tx completions come in, the handler threads get the IORequest, dereference the buffer, and decrement the atomic int in it towards zero. The thread that manages to hit 0 knows that the buffer object is no longer needed and can delete it (or, more likely, in a high-performance server, return it to a pool for later reuse).
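The Buffer/IORequest split and the countdown on tx completion could be sketched like this. It is a minimal illustration under assumptions: OverlappedStandIn replaces the real OVERLAPPED so the code builds anywhere, fan_out and on_tx_complete are hypothetical names, and the actual WriteFile() or WSASend() calls are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the Win32 OVERLAPPED structure (assumption, for portability).
struct OverlappedStandIn {
    void* reserved[4];
};

// Shared by all sends: the payload plus one atomic reference count.
struct Buffer {
    std::string      data;
    std::atomic<int> refs{0};
};

// One per send call: its own OVERLAPPED block and a pointer to the shared data.
struct IORequest {
    OverlappedStandIn overlapped{};
    Buffer*           buffer = nullptr;
};

// Builds one IORequest per writer, all pointing at the same Buffer.
// The returned vector must not be resized while operations are pending,
// because each OVERLAPPED block has to stay at a fixed address.
std::vector<IORequest> fan_out(Buffer& buf, std::size_t writers) {
    assert(writers > 0);
    buf.refs.store(static_cast<int>(writers));
    std::vector<IORequest> requests(writers);
    for (IORequest& req : requests) req.buffer = &buf;
    return requests;
}

// Called once per tx completion, possibly from different handler threads.
// Returns true only for the caller that dropped the last reference, which
// may then delete the Buffer or return it to a pool.
bool on_tx_complete(IORequest& req) {
    assert(req.buffer != nullptr);
    Buffer* buf = req.buffer;
    req.buffer = nullptr;
    return buf->refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

Because fetch_sub returns the previous value atomically, exactly one completion observes 1 and becomes responsible for the Buffer, whichever thread it runs on.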

Or can I issue N WriteFile() calls with N different OVERLAPPED structures, all taking the data from the same buffer, for the N sockets?

Yes, you can. See above.

Re. threading: sure, if your "control data" can be reached from multiple completion-handler threads, then yes, you may want to protect it with a critical section, but an atomic int should do for the buffer reference count.
