AWS - EC2 - MongoDB Replica Time Synchronization Issue - NTP - Replication Lag
We are seeing what looks like clock drift in our MongoDB replica set running on AWS. It seems to have started recently, after we added more data to the dataset; before that we only noticed the problem when the system was under heavy load. The following error is now logged sporadically in mongod.log even when the system is not under load.
To test this, we isolated a set of machines with the same dataset and took our web application out of the picture, but the error still occurs:
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target because current sync target's most recent OpTime is Dec 12 13:32:42:c which is more than 30 seconds behind member mongo1:27017 whose most recent OpTime is 1418391230
From the above, the timestamps show that one member of the MongoDB replica set is about a minute behind. The worst we have seen is 12 minutes out of sync.
This error, in turn, causes replication lag, and the Mongo Monitoring Service notifies us about it, although it corrects itself.
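The gap reported in the log line above can be checked by hand. A minimal sketch in Python, assuming the first OpTime (Dec 12 13:32:42) is in UTC like the log timestamp and that the second value (1418391230) is Unix epoch seconds; both interpretations are assumptions, since the log line does not state them:

```python
from datetime import datetime, timezone

# Values copied from the log line. Treating the first as UTC and the second
# as Unix epoch seconds is an assumption, not something the log states.
sync_target_optime = datetime(2014, 12, 12, 13, 32, 42, tzinfo=timezone.utc)
other_member_optime = datetime.fromtimestamp(1418391230, tz=timezone.utc)

# How far the current sync target trails the other member, in seconds.
lag_seconds = (other_member_optime - sync_target_optime).total_seconds()

print("other member OpTime:", other_member_optime.isoformat())
print("lag in seconds:", lag_seconds)
```

Under those assumptions the difference is a little over a minute, which matches the "more than 30 seconds behind" wording in the message.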
The setup is 3 x r3.xlarge AWS Linux instances, 1 in each Availability Zone (EU-West-1A). The machines have been configured using Mongo's recommended settings, with the RAID array and the CloudFormation scripts provided by Mongo. The data is about 4 GB.
We believe the issue is due to NTP synchronization. By default, on the Amazon Linux AMI the ntpd service is configured to use a pool of Amazon NTP servers hosted at www.pool.ntp.org.
To try to rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync with. The problem still happened, so we changed the minpoll and maxpoll settings for the ntpd service on the mongo machines to sync with the NTP server every 16 seconds, but the error still occurs.
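For anyone checking the 16-second figure: ntpd takes minpoll and maxpoll as exponents of two, in seconds, rather than as raw second counts. A small Python sketch of the conversion follows; the helper name `poll_interval_seconds` is made up for illustration, and the defaults listed (6 and 10) are the stock ntpd values as far as I know:

```python
def poll_interval_seconds(exponent: int) -> int:
    """Convert an ntpd minpoll/maxpoll exponent to a poll interval in seconds."""
    return 2 ** exponent

# 4 is the exponent that gives the 16-second interval described above;
# 6 and 10 are assumed to be the ntpd defaults for minpoll and maxpoll.
settings = [
    ("16-second polling", 4),
    ("default minpoll", 6),
    ("default maxpoll", 10),
]

for label, exponent in settings:
    print(f"{label}: exponent {exponent} -> {poll_interval_seconds(exponent)} s")
```

The poll column in ntpq -p output reports the interval currently in effect, in seconds, so it can be used to confirm that a minpoll/maxpoll change has actually taken hold.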
We also increased the size of the MongoDB oplog to see whether that made a difference, but it did not.
Has anyone else run into this type of problem? Is there something we are missing?
Regards,
Colin.
ps -ef | grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.31.14.137   91.*.*.*         3 u  557 1024  377    1.121   -0.264   0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
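To make the peer line above easier to read, here is a rough Python sketch that splits it into named fields. It assumes numeric output (ntpq -n) with the standard ten columns, and relies on ntpq reporting delay, offset, and jitter in milliseconds and reach as an octal bitmask of the last eight polls. The names `Peer` and `parse_peer_line` are made up for this example.

```python
from typing import NamedTuple

# Tally characters ntpq may put in front of the remote address.
# Assumes numeric addresses (ntpq -n), which never start with these.
TALLY_CODES = "*+-#.xo"

class Peer(NamedTuple):
    tally: str        # "*" marks the peer currently selected for sync
    remote: str
    refid: str
    stratum: int
    poll_s: int       # poll interval currently in effect, seconds
    polls_ok: int     # how many of the last 8 polls got a reply
    delay_ms: float
    offset_ms: float
    jitter_ms: float

def parse_peer_line(line: str) -> Peer:
    remote, refid, st, _type, _when, poll, reach, delay, offset, jitter = line.split()
    tally = remote[0] if remote[0] in TALLY_CODES else ""
    # reach is octal; each set bit is one successful poll out of the last 8.
    polls_ok = bin(int(reach, 8)).count("1")
    return Peer(tally, remote[len(tally):], refid, int(st), int(poll),
                polls_ok, float(delay), float(offset), float(jitter))

peer = parse_peer_line("*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161")
print(peer)
print("offset in seconds:", peer.offset_ms / 1000)
```

Running the same parse against the ntpq output from all three members would make it easy to compare their offsets and poll intervals side by side.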