How to recover Zookeeper from java.io.EOFException after server crash?
How do I recover the following error after a server crash? Zookeeper won't start and the following message appears in the log multiple times.
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
2017-05-27 01:02:08,072 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=amd64
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:os.version=3.10.0-514.16.1.el7.x86_64
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.name=zookeeper
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/opt/zookeeper
2017-05-27 01:02:08,073 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@829] - tickTime set to 2000
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@838] - minSessionTimeout set to -1
2017-05-27 01:02:08,074 [myid:] - INFO [main:ZooKeeperServer@847] - maxSessionTimeout set to -1
2017-05-27 01:02:08,080 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-05-27 01:02:08,385 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,400 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,403 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,404 [myid:] - ERROR [main:Util@239] - Last transaction was partial.
2017-05-27 01:02:08,404 [myid:] - ERROR [main:ZooKeeperServerMain@64] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:585)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:604)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:570)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:283)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:410)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:119)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:87)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
thanks IPVP
source to share
It looks like you are facing a known Apache ZooKeeper bug. There are several different Apache JIRA issues related to this: ZOOKEEPER-1621 and ZOOKEEPER-2332 . See comments on these questions if you are interested in a root cause analysis and some possible suggested fixes.
Unfortunately, there is currently no version of Apache ZooKeeper that contains a fix for the bug. There are several possible solutions to this problem:
- Build your own local ZooKeeper build with one of the fixes attached to the related JIRA issues. Please note that these patches are not yet accepted by the ZooKeeper community, so use them at your own risk.
- Delete the damaged log file. The root cause of the problem is that the log file from the previous ZooKeeper startup was written with an incomplete header. Since the header is at the beginning of the file, and the header itself is incomplete, we can assume that there is no transaction data in the log file after this point. Therefore, it should be safely deleted without data loss.
- If it's easier, you can just reformat this ZooKeeper cluster. This might be a suitable solution if all data in your ZooKeeper installation is ephemeral and does not require long-term storage.
source to share
The solution for me was to find the log file last (which was 0 bytes long)
You will find it in the catalog version-2
ls -l -r --sort=time
-rw-r--r-- 1 chris chris 67108880 Jan 24 10:37 log.23c6a70
-rw-r--r-- 1 chris chris 0 Jan 24 10:37 log.23d3fb4
I tried first to delete the snapshot and the last 2 logfiles, which also work, but then you will have a version that is "slightly" older.
-rw-r--r-- 1 chris chris 3685904 Jan 24 00:56 snapshot.23c6a6e
Perhaps you need to delete the last snapshot file and the last log file together, and a file of length 0 would be safe.
by the way. The log file and snapshot have the same HEX pattern which must match
Magazine. 23c6a 70
snapshot. 23c6a 6e
They must be consistent and consistent, and you must fix this problem.
source to share