How to safely ignore duplicate key errors using insert_many
I need to ignore duplicate inserts when using insert_many with pymongo, where the duplicates are detected by a unique index. I saw this question asked on Stack Overflow, but I didn't find a helpful answer.
Here is my code snippet:
try:
    results = mongo_connection[db][collection].insert_many(documents, ordered=False, bypass_document_validation=True)
except pymongo.errors.BulkWriteError as e:
    logger.error(e)
I would like insert_many to ignore duplicates and not throw an exception (which fills my error logs). Alternatively, is there a separate exception handler I could use so that I can just ignore the errors? I am missing "w = 0" ...
Thanks
You can deal with this by inspecting the BulkWriteError that is raised. It is an exception object with several properties; the interesting part is its details attribute:
import pymongo
from bson.json_util import dumps
from pymongo import MongoClient

client = MongoClient()
db = client.test
collection = db.duptest

docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}]

try:
    result = collection.insert_many(docs, ordered=False)
except pymongo.errors.BulkWriteError as e:
    # e.details is a dict; 'writeErrors' has one entry per failed document
    print(e.details['writeErrors'])
On the first run, this prints the list of errors in e.details['writeErrors']:
[
    {
        'index': 1,
        'code': 11000,
        'errmsg': 'E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }',
        'op': {'_id': 1}
    }
]
On the second run, you see three errors because all of the documents already exist:
[
    {
        "index": 0,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
    },
    {
        "index": 1,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
        "op": {"_id": 1}
    },
    {
        "index": 2,
        "code": 11000,
        "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",
        "op": {"_id": 2}
    }
]
So all you need to do is filter out the entries with "code": 11000 and only "panic" when something else is left:
# inside the except block: keep only errors that are not duplicate key errors
panic = [x for x in e.details['writeErrors'] if x['code'] != 11000]
if len(panic) > 0:
    print("really panic")
This lets you ignore duplicate key errors while still noticing anything that is actually a problem.
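The filtering step above could be wrapped up roughly as follows. This is only a sketch: split_write_errors and insert_ignoring_duplicates are hypothetical helper names, not part of pymongo, and the sample dict is made up to mirror the shape of e.details shown earlier (code 121 stands in for some non-duplicate error). It also assumes you want to re-raise when write concern errors are present.

```python
DUPLICATE_KEY = 11000  # server error code behind "E11000 duplicate key error"


def split_write_errors(details):
    """Split details['writeErrors'] into (duplicates, others). Hypothetical helper."""
    errors = details.get('writeErrors', [])
    duplicates = [err for err in errors if err.get('code') == DUPLICATE_KEY]
    others = [err for err in errors if err.get('code') != DUPLICATE_KEY]
    return duplicates, others


def insert_ignoring_duplicates(collection, docs):
    """Unordered insert that skips duplicates and returns the number inserted.

    Hypothetical helper: re-raises if anything other than a duplicate key
    error was reported.
    """
    # Imported here so split_write_errors can be tried without pymongo installed.
    from pymongo.errors import BulkWriteError

    try:
        result = collection.insert_many(docs, ordered=False)
        return len(result.inserted_ids)
    except BulkWriteError as e:
        duplicates, others = split_write_errors(e.details)
        if others or e.details.get('writeConcernErrors'):
            raise  # something other than a duplicate key went wrong
        return e.details.get('nInserted', 0)


# Made-up details dict in the same shape as the output shown above.
sample = {
    'writeErrors': [
        {'index': 1, 'code': 11000, 'errmsg': 'E11000 duplicate key error', 'op': {'_id': 1}},
        {'index': 2, 'code': 121, 'errmsg': 'Document failed validation', 'op': {'_id': 2}},
    ],
    'nInserted': 1,
}
duplicates, others = split_write_errors(sample)
print(len(duplicates), len(others))
```

With a real collection you would call insert_ignoring_duplicates(collection, docs) in place of the bare insert_many, so that only the non-duplicate failures reach your error log.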