How to safely ignore duplicate key errors when using insert_many

I need to ignore duplicate inserts when using insert_many with pymongo, where the duplicates come from a unique index. I saw this question asked on Stack Overflow, but I didn't find a helpful answer.

Here is my code snippet:

try:
    # ordered=False continues past individual failures instead of stopping
    results = mongo_connection[db][collection].insert_many(
        documents, ordered=False, bypass_document_validation=True
    )
except pymongo.errors.BulkWriteError as e:
    logger.error(e)


I would like insert_many to ignore duplicates and not throw an exception (which fills my error logs). Alternatively, is there a separate exception handler I could use so that I can just ignore the duplicate errors? Am I missing something like "w = 0"?
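For reference, here is a minimal sketch of what I mean by the "w = 0" idea (the collection name is a placeholder): with an unacknowledged write concern the driver never reports write errors, so duplicates are silently dropped, but so is every other failure, which is why I'm not sure it is safe.

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient()
# w=0 means unacknowledged writes: the driver does not wait for a server
# response, so duplicate key errors are never raised -- and neither is
# any other write error, so real problems can go unnoticed.
unacked = client.test.get_collection("duptest", write_concern=WriteConcern(w=0))
unacked.insert_many([{'_id': 1}, {'_id': 1}], ordered=False)  # raises nothing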

Thanks.





1 answer


You can deal with this by checking the errors raised with the BulkWriteError. It is actually an "object" with several properties, and the interesting parts are in details:

import pymongo
from bson.json_util import dumps
from pymongo import MongoClient

client = MongoClient()
db = client.test
collection = db.duptest

# Two documents share _id 1, so the second one violates the unique _id index
docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}]

try:
    result = collection.insert_many(docs, ordered=False)
except pymongo.errors.BulkWriteError as e:
    print(dumps(e.details['writeErrors'], indent=2))


On the first run, this gives a list of errors in e.details['writeErrors']:

[
  {
    "index": 1,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
    "op": {"_id": 1}
  }
]


On the second run, you see three errors, because all of the elements already existed:



[
  {
    "index": 0,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
    "op": {"_id": 1}
  },
  {
    "index": 1,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
    "op": {"_id": 1}
  },
  {
    "index": 2,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",
    "op": {"_id": 2}
  }
]


So all you need to do is filter that array for entries with "code": 11000, and only "panic" when something else is in there:

# Keep only the write errors that are NOT duplicate key errors
panic = [error for error in e.details['writeErrors'] if error['code'] != 11000]

if len(panic) > 0:
    print("really panic")


This gives you the ability to ignore duplicate key errors while still paying attention to anything that is actually a problem.
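Putting it all together, here is a minimal sketch of a helper that applies this pattern; insert_many_ignore_duplicates is just an illustrative name, not part of pymongo:

import pymongo
from pymongo import MongoClient

DUPLICATE_KEY_ERROR = 11000  # MongoDB server error code for duplicate keys

def insert_many_ignore_duplicates(collection, documents):
    # Sketch: swallow duplicate key errors from an unordered bulk insert
    # and re-raise anything else.
    try:
        collection.insert_many(documents, ordered=False)
    except pymongo.errors.BulkWriteError as e:
        fatal = [err for err in e.details['writeErrors']
                 if err['code'] != DUPLICATE_KEY_ERROR]
        if fatal:
            raise  # really panic: something other than a duplicate failed

collection = MongoClient().test.duptest
insert_many_ignore_duplicates(collection, [{'_id': 1}, {'_id': 1}, {'_id': 2}])

Because ordered=False keeps the bulk write going past individual failures, the non-duplicate documents still make it into the collection.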




