Recognizing a stale member of a MongoDB replica set

If a mongo node has been offline for too long and the oplog rolls over before it comes back, it can get stuck in a stale state and require manual intervention. How can I detect this state from the replica set status document? Will the node be put into state 3, which is also used by nodes in maintenance mode and presumably by nodes that can still catch up? If so, how can I tell the difference?

From http://docs.mongodb.org/manual/reference/replica-status/ :

Number State
0      Starting up, phase 1 (parsing configuration)
1      Primary
2      Secondary
3      Recovering (initial syncing, post-rollback, stale members)
4      Fatal error
5      Starting up, phase 2 (forking threads)
6      Unknown state (the set has never connected to the member)
7      Arbiter
8      Down
9      Rollback
10     Removed

      



1 answer


It will be in state 3, RECOVERING. To recognize a stale member, look for the errmsg field. When the secondary is stale, it will have this errmsg:

"errmsg" : "error RS102 too stale to catch up"

      

In terms of full output, it will look something like this:



 rs.status()
{
        "set" : "testReplSet",
        "date" : ISODate("2013-01-29T01:39:38Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "hostname:31000",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 507,
                        "optime" : Timestamp(1359423456000, 893),
                        "optimeDate" : ISODate("2013-01-29T01:37:36Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "hostname:31001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 483,
                        "optime" : Timestamp(1359423456000, 893),
                        "optimeDate" : ISODate("2013-01-29T01:37:36Z"),
                        "lastHeartbeat" : ISODate("2013-01-29T01:39:37Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 2,
                        "name" : "hostname:31002",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 4,
                        "optime" : Timestamp(1359423087000, 1),
                        "optimeDate" : ISODate("2013-01-29T01:31:27Z"),
                        "lastHeartbeat" : ISODate("2013-01-29T01:39:38Z"),
                        "pingMs" : 0,
                        "errmsg" : "error RS102 too stale to catch up"
                }
        ],
        "ok" : 1
}

      

And finally, a shell one-liner that prints the error only if it exists:

rs.status().members.forEach(function printError(rsmember){if (rsmember.errmsg){print(rsmember.errmsg)}})
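If you want the stale members themselves rather than just the printed message, the same check can be written as a filter. This is only a sketch: `findStaleMembers` and `sampleStatus` are hypothetical names, the sample is a trimmed copy of the output above, and matching on the substring "too stale" is an assumption based on the errmsg shown here, which may be worded differently in other MongoDB versions.

```javascript
// Hypothetical helper: returns the members of a replica set status document
// that are in state 3 (RECOVERING) and whose errmsg reports staleness.
function findStaleMembers(status) {
  return (status.members || []).filter(function (member) {
    return Number(member.state) === 3 &&
      typeof member.errmsg === "string" &&
      // Assumption: the message contains "too stale", as in the output above.
      member.errmsg.indexOf("too stale") !== -1;
  });
}

// Trimmed-down sample modeled on the rs.status() output shown above.
var sampleStatus = {
  set: "testReplSet",
  members: [
    { _id: 0, name: "hostname:31000", state: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "hostname:31001", state: 2, stateStr: "SECONDARY" },
    { _id: 2, name: "hostname:31002", state: 3, stateStr: "RECOVERING",
      errmsg: "error RS102 too stale to catch up" }
  ]
};

// Collect the host names of the stale members.
var staleNames = findStaleMembers(sampleStatus).map(function (m) {
  return m.name;
});
```

In the shell you would pass the live document instead of the sample, for example `findStaleMembers(rs.status())`. A member in state 3 without such an errmsg is in RECOVERING for some other reason (for example initial sync or maintenance mode) and is not reported by this filter.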

      
