Java 8 continuous reading has a race condition?
This problem has been bugging me for a while now.
In a production application I'm working on, I'm using a SocketChannel in non-blocking mode to communicate with embedded devices. I am now getting sporadically corrupted data. On some PCs this doesn't happen, but it does happen on mine. And when I change the program too much, the problem goes away.
So many things could have an influence: timing, network interface hardware, Win7, Java version, company firewall, ...
Reading data comes down to this code:
byteBuffer.compact();
socketChannel.read(byteBuffer); // <<< problem here ?
byteBuffer.flip();
if( byteBuffer.hasRemaining() ){
handleData( byteBuffer );
}
This is done on the same thread as the writing, when the selector wakes up and OP_READ is set.
This code is the only place where byteBuffer is referenced. socketChannel is otherwise only used from the same thread, for writing.
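For reference, here is a minimal, self-contained sketch of how such a compact()/read()/flip() cycle behaves. It is not the production code: the class name ReadCycleSketch, the helper names, and the handler that consumes whole 4-byte ints are assumptions made for illustration, and an in-memory channel stands in for the socket. It shows that unconsumed bytes survive into the next pass and that the buffer has to start empty in read mode (limit 0) for the first compact() to work.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

final class ReadCycleSketch {
    /** One compact/read/flip pass; returns whatever read() returned. */
    static int readCycle(ReadableByteChannel ch, ByteBuffer buf, List<Integer> out)
            throws IOException {
        buf.compact();                  // move unconsumed bytes to the front, switch to write mode
        int n = ch.read(buf);           // -1 = end of stream, 0 = nothing available, > 0 = bytes read
        buf.flip();                     // switch back to read mode
        while (buf.remaining() >= 4) {  // hypothetical handler: consume whole ints only
            out.add(buf.getInt());
        }
        return n;
    }

    /** A buffer that starts empty in read mode, which the first compact() relies on. */
    static ByteBuffer newEmptyReadBuffer(int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        buf.limit(0);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];     // two ints (1 and 2) followed by 2 spare bytes
        data[3] = 1;
        data[7] = 2;
        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data));
        ByteBuffer buf = newEmptyReadBuffer(6);   // deliberately small to force a split int
        List<Integer> out = new ArrayList<>();
        while (readCycle(ch, buf, out) >= 0) {
            // keep cycling until end of stream
        }
        System.out.println(out + ", leftover bytes: " + buf.remaining());
    }
}
```

The 6-byte buffer is chosen only so that the second int is split across two reads; a real buffer would be much larger.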
I instrumented the code so that I can print the contents of the last few read() calls when an error occurs. At the same time, I am analyzing the network traffic in Wireshark. I have also added many assertions to check the integrity of the byte buffer.
The captured stream looks good in Wireshark. No DUP-ACK or anything else suspicious. The last read() calls match the data in Wireshark exactly.
In Wireshark, I see many small TCP frames, each carrying 90 bytes of payload, arriving at intervals of about 10 ms. Normally the Java thread also reads the data every 10 ms, as soon as it has arrived.
When the problem occurs, the Java thread is lagging a bit: the read happens after 300 ms and returns about 3000 bytes, which is plausible. But the data is corrupted.
The data looks as if it had been copied into the buffer while, at the same time, newly received data was overwriting the earlier data.
Now I don't know how to proceed. I cannot create a small example as this rarely happens and I do not know the exact condition that is needed.
Can someone give me a hint?
How can I prove whether or not this is a bug in the Java library?
What other conditions might be important to look at?
Thank you, Frank
29-June-2015:
I have now managed to create an example that reproduces the problem.
There is one Sender program and one Receiver program.
The sender uses blocking I/O: it first waits for a connection and then sends 90-byte blocks every 2 ms. The first 4 bytes are a running counter, the rest is not set. The sender uses setTcpNoDelay(true).
The receiver uses non-blocking I/O. It first connects to the sender, then reads from the channel whenever the selection key is ready for it. Sometimes the read loop executes Thread.sleep(300).
If they run on the same PC via loopback, this works for me all the time. If I put the Sender on another computer, directly connected via LAN, it triggers the error. Checked with Wireshark, the traffic and the data sent look good.
To run it, first start the Sender on one PC, then (after editing the host address) start the Receiver.
While it is running, it prints a line approximately every 2 seconds. If it fails, it prints information about the last 5 read() calls.
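The block format and the check on the receiving side can be sketched roughly as follows. This is an illustration under assumptions, not the actual Sender/Receiver source: the class name CounterCheckSketch and the methods makeBlock and consume are hypothetical, and the counter is assumed to be a big-endian int starting at 0.

```java
import java.nio.ByteBuffer;

final class CounterCheckSketch {
    static final int BLOCK_SIZE = 90;   // block size taken from the description above

    private int expected = 0;           // next counter value the receiver expects

    /** Builds one block: a 4-byte big-endian counter, the remaining bytes left at zero. */
    static byte[] makeBlock(int counter) {
        ByteBuffer b = ByteBuffer.allocate(BLOCK_SIZE);
        b.putInt(counter);
        return b.array();
    }

    /**
     * Consumes every complete block from a buffer that is in read mode and
     * returns how many were consumed. A partial block stays in the buffer for
     * the next pass. Throws if a counter is out of sequence.
     */
    int consume(ByteBuffer buf) {
        int blocks = 0;
        while (buf.remaining() >= BLOCK_SIZE) {
            int start = buf.position();
            int counter = buf.getInt(start);      // absolute get, position unchanged
            if (counter != expected) {
                throw new IllegalStateException("expected counter " + expected
                        + " but got " + counter + " at buffer offset " + start);
            }
            buf.position(start + BLOCK_SIZE);     // skip the rest of the block
            expected++;
            blocks++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        CounterCheckSketch check = new CounterCheckSketch();
        ByteBuffer buf = ByteBuffer.allocate(1024);
        buf.put(makeBlock(0)).put(makeBlock(1)).put(makeBlock(2), 0, 45); // 2.5 blocks
        buf.flip();
        System.out.println("blocks consumed: " + check.consume(buf)
                + ", bytes left over: " + buf.remaining());
    }
}
```

Because blocks can be split across read() calls, the check only looks at complete blocks and leaves any tail in the buffer, which fits a compact()/read()/flip() cycle.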
What I found to be triggers:
- The sender is configured with setTcpNoDelay(true)
- The receiver sometimes does a Thread.sleep(300) before executing read()
Thank you, Frank
It ended up being a driver problem, or so it seems.
I was using a "D-Link E-DUB100 Rev A" USB to Ethernet adapter.
Because Wireshark showed the correct data, I thought I had ruled out the hardware as a possible cause of the failure.
But I have now tried a "D-Link E-DUB100 Rev C1" and the problem went away.
So my guess is that this is a problem in the drivers supplied by D-Link for Rev A, and that Rev C1 may be using a system driver that doesn't have this problem.
Thanks for taking the time to read my question.
buf.order(ByteOrder.BIG_ENDIAN);
This is the default. Delete this.
buf.clear();
The buffer is already empty because you just allocated it. Delete this.
buf.limit(0);
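The position is already zero after clear() and after the initial allocation, but the limit is not: it equals the capacity. So this call does change something: it leaves the buffer empty in read mode, which is what a loop that starts with compact() expects. Keep it only if that is what you intend.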
while( true ) {
There should be a call to select() here.
Iterator<SelectionKey> it = selector.selectedKeys().iterator();
// ...
if( key == keyData && key.isConnectable() ) {
ch.finishConnect();
This method can return false. You are not handling this case.
// ...
if( key == keyData && key.isReadable() ) {
// ...
readPos += ch.read(buf);
Completely wrong. You completely ignore the case where read() returns -1, which means the peer has closed the connection. In that case you must close the channel. Adding -1 to readPos also corrupts your position count.
// without this Thread.sleep, it would not trigger the error
So? Hasn't the penny dropped? Remove the sleep. It's completely pointless. select() will block until data arrives. It doesn't need your help. This sleep is literally a waste of time.
if( rnd.nextInt(20) == 0 ) {
Thread.sleep(300);
}
Delete this.
selector.select();
It should be at the top of the loop, not at the bottom.
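Putting these points together, the loop could be shaped roughly like the sketch below. It is not your Receiver: the class name SelectLoopSketch and the methods readUntilClosed and startPeer are made up for illustration, and startPeer is only a throwaway loopback peer so that the example runs on its own. What it shows is select() at the top of the loop, the result of finishConnect() being checked, and the channel being closed when read() returns -1.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class SelectLoopSketch {
    /** Connects to addr, reads until the peer closes, and returns the total bytes read. */
    static long readUntilClosed(InetSocketAddress addr) throws IOException {
        long total = 0;
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try (Selector selector = Selector.open();
             SocketChannel ch = SocketChannel.open()) {
            ch.configureBlocking(false);
            SelectionKey key = ch.register(selector, SelectionKey.OP_CONNECT);
            if (ch.connect(addr)) {
                key.interestOps(SelectionKey.OP_READ);   // connected immediately
            }
            while (ch.isOpen()) {
                selector.select();                       // block here, at the top of the loop
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey k = it.next();
                    it.remove();
                    if (!k.isValid()) {
                        continue;
                    }
                    if (k.isConnectable()) {
                        if (ch.finishConnect()) {        // may return false: keep waiting
                            k.interestOps(SelectionKey.OP_READ);
                        }
                    } else if (k.isReadable()) {
                        buf.clear();                     // this sketch discards the data
                        int n = ch.read(buf);
                        if (n < 0) {
                            ch.close();                  // peer closed the connection
                        } else {
                            total += n;
                        }
                    }
                }
            }
        }
        return total;
    }

    /** Hypothetical loopback peer for the demo: sends `bytes` zero bytes, then closes. */
    static InetSocketAddress startPeer(int bytes) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
        Thread t = new Thread(() -> {
            try (ServerSocketChannel s = server; SocketChannel c = s.accept()) {
                ByteBuffer out = ByteBuffer.allocate(bytes);
                while (out.hasRemaining()) {
                    c.write(out);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
        return addr;
    }

    public static void main(String[] args) throws IOException {
        long n = readUntilClosed(startPeer(90 * 100));
        System.out.println("read " + n + " bytes before the peer closed");
    }
}
```

There is no sleep anywhere in the loop: select() does all the waiting.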