Create a MongoDB document with subdocuments atomically?
Hopefully I'm just missing something obvious, but here is my situation in a scraper scenario.
I want to be able to scrape across multiple machines and cores. For each site I scrape I have several Front pages (for example, for a Stack Overflow site I would have the fronts stackoverflow.com/questions/tagged/javascript and stackoverflow.com/questions/tagged/nodejs).
An Article can appear on every Front. When I discover an article, I want to create the Article if its url is unknown. If the url is known, I want to add an entry for the Front to article.discover if that Front is not recorded there yet, and otherwise insert my FrontDiscovery into the matching Front.
Here are my schemas:
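const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

// One sighting of an article on a front page.
const FrontDiscovery = new Schema({
  _id:  { type: ObjectId, auto: true },
  date: { type: Date, default: Date.now },
  dims: { type: Object, default: null },
  pos:  { type: Object, default: null }
});

// A front page the article was seen on, with all of its sightings.
const Front = new Schema({
  _id:   { type: ObjectId, auto: true },
  url:   { type: String }, // front page URL
  found: [FrontDiscovery]
});

const Article = new Schema({
  _id:      { type: ObjectId, auto: true },
  url:      { type: String, index: { unique: true } },
  site:     { type: String },
  discover: [Front]
});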
The problem is that I think I will end up running into race conditions, for example when two jobs running in parallel find the same (previously unknown) article and both create a new one. Yes, I have a unique index and could handle the conflict that way, which seems workable imho.
But let's go further: when, for whatever reason, two of my workers are scraping the same front at the same time, both notice that there is no Front entry for it yet, and both create a new one containing their FrontDiscovery, I would end up with two entries for the same Front.
What are your strategies for getting around this situation? findByIdAndUpdate with upsert: true for each document separately? If so, how can I only push something onto the embedded document array without overwriting everything else, while still creating the defaults if the document did not exist yet?
Thanks for any help in pointing me in the right direction! I really hope I'm just overlooking something simple.
An update with upsert=true can be used to perform an atomic "insert or update" ( http://docs.mongodb.org/manual/core/update/#update-operations-with-the-upsert-flag ).
For example, if we wanted a document with a specific url to be inserted into the Front collection exactly once, we could run something like:
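db.Front.update(
  // Match on the url; if nothing matches, the upsert inserts a new document.
  {url: 'http://example.com'},
  {$set: {
    url: 'http://example.com',
    found: true
  }},
  // Without this option the call only updates existing documents.
  {upsert: true}
)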
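To address the part of the question about appending to the embedded array without overwriting the rest of the document, here is a minimal sketch that combines `$push` with `$setOnInsert` in one upsert. The helper name `buildFrontUpsert` and the `createdAt` field are hypothetical and only serve the illustration; the sketch also assumes a unique index on `url`, since two concurrent upserts on the same filter can otherwise both insert.

```javascript
// Hypothetical helper: builds the arguments for one atomic upsert that
// appends a discovery to the Front document identified by its url.
function buildFrontUpsert(frontUrl, discovery) {
  const filter = { url: frontUrl };
  const update = {
    // $push appends to `found` and leaves every other field untouched.
    $push: { found: discovery },
    // $setOnInsert is applied only when the upsert creates the document.
    // `createdAt` is an illustrative field, not part of the original schema.
    $setOnInsert: { createdAt: discovery.date },
  };
  const options = { upsert: true };
  return { filter, update, options };
}

const { filter, update, options } = buildFrontUpsert(
  'http://example.com',
  { date: new Date(0), dims: null, pos: null }
);
console.log(JSON.stringify({ filter, update, options }));
// With a driver collection or Mongoose model this would be passed as:
//   await fronts.updateOne(filter, update, options);
```

On insert, the equality condition from the filter (`url`) is copied into the new document, so it does not need to be repeated in the update. With a unique index in place, a worker that loses the race gets a duplicate key error and can simply retry the same call, which will then match the existing document and push into it.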
Operations on a single document in MongoDB are always atomic. If you are doing updates that span multiple documents, atomicity is not guaranteed. In such cases, ask yourself: do I really need these operations to be atomic? If the answer is no, you can probably tolerate or work around temporarily inconsistent data. If the answer is yes and you want to stick with MongoDB, check out the Two Phase Commit design pattern.
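In the schema from the question, the fronts and their discoveries are embedded in the Article, so each write touches a single document and single-document atomicity may be enough. The following is a rough sketch of one way to do it, not a definitive implementation: `recordDiscovery` is a hypothetical helper, `articles` is assumed to be a driver collection (or Mongoose model) whose `updateOne` result exposes `matchedCount`, and the unique index on `url` is assumed to exist as in the Article schema.

```javascript
// Sketch: record one discovery inside a single Article document.
// Every write is one updateOne call, so each step is atomic on its own.
async function recordDiscovery(articles, articleUrl, site, frontUrl, discovery) {
  for (let attempt = 0; attempt < 3; attempt++) {
    // Step 1: the front is already embedded, so append to its `found` array.
    // The positional `$` refers to the array element matched by the filter.
    const hit = await articles.updateOne(
      { url: articleUrl, 'discover.url': frontUrl },
      { $push: { 'discover.$.found': discovery } }
    );
    if (hit.matchedCount > 0) return 'appended';

    try {
      // Step 2: the front is not embedded yet. Add it, and create the
      // article if it does not exist (site is only set on insert).
      await articles.updateOne(
        { url: articleUrl, 'discover.url': { $ne: frontUrl } },
        {
          $push: { discover: { url: frontUrl, found: [discovery] } },
          $setOnInsert: { site },
        },
        { upsert: true }
      );
      return 'front-added';
    } catch (err) {
      // Code 11000 is a duplicate key error: another worker created the
      // article or added the front in between. Retry from step 1.
      if (err.code !== 11000) throw err;
    }
  }
  throw new Error('gave up after repeated conflicts');
}
```

The `$ne` condition in step 2 is what prevents two entries for the same Front: if a competing worker has already added it, the filter no longer matches, the upsert attempts an insert, the unique index on `url` rejects it, and the retry lands in step 1.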