Create a MongoDB document with subdocuments atomically?

I hope I'm just missing something obvious, but here is my situation: I'm writing a scraper.

I want to be able to scrape with multiple machines and cores. The site has different pages, `Front`s, that I scrape (e.g. for Stack Overflow I would have the fronts stackoverflow.com/questions/tagged/javascript and stackoverflow.com/questions/tagged/nodejs).

An `Article` can appear on any `Front`. When I find an article, I want to create the `Article` if its URL is unknown. If it is known, I want to record the `Front` in `article.discover` if that `Front` is not listed there yet, and otherwise push my `FrontDiscovery` into the matching `Front`.
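To pin down the three cases, here is a minimal in-memory sketch of the intended logic. It is plain JavaScript with no database, so it offers no protection against the race conditions this question is about. The function name `recordDiscovery` and the plain-object shapes are hypothetical, modeled on the `Article` / `Front` / `FrontDiscovery` schemas; it also assumes that a newly added `Front` should receive the current `FrontDiscovery` right away.

```javascript
// Hypothetical in-memory model of the intended behaviour: no database and
// no concurrency control, just the three cases described in the question.
// `articles` maps article URL -> { url, site, discover: [ { url, found: [] } ] }.
function recordDiscovery(articles, { articleUrl, site, frontUrl, discovery }) {
  let article = articles.get(articleUrl);
  if (!article) {
    // Case 1: unknown article URL -> create the Article.
    article = { url: articleUrl, site, discover: [] };
    articles.set(articleUrl, article);
  }

  let front = article.discover.find((f) => f.url === frontUrl);
  if (!front) {
    // Case 2: this Front is not recorded on the article yet -> add it.
    front = { url: frontUrl, found: [] };
    article.discover.push(front);
  }

  // Case 3 (and the tail of cases 1 and 2): append the FrontDiscovery.
  front.found.push(discovery);
  return article;
}
```

With several workers, the read-check-write steps in this function are exactly where two processes can interleave, which is why the database version has to fold the checks into the update itself.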

Here are my schemas:

const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const FrontDiscovery = new Schema({
    _id         :{ type: ObjectId, auto: true },
    date        :{ type: Date, default: Date.now },
    dims        :{ type: Object, default: null },
    pos         :{ type: Object, default: null }
});

const Front = new Schema({
    _id         :{ type: ObjectId, auto: true },
    url         :{ type: String }, // URL of the front page
    found       :[ FrontDiscovery ]
});

const Article = new Schema({
    _id         :{ type: ObjectId, auto: true },
    url         :{ type: String, index: { unique: true } },
    site        :{ type: String },
    discover    :[ Front ]
});

The problem I keep thinking about is that this will run into race conditions: two jobs running in parallel find the same (previously unknown) article and both try to create it. Yes, I have a unique index and could handle it that way, but that feels very clumsy to me.
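For reference, "handling it via the unique index" can be fairly compact. When two workers run the same upsert concurrently, the unique index on `Article.url` makes the loser fail with MongoDB's duplicate key error (code 11000); retrying then finds the existing document and turns the upsert into a plain update. A minimal sketch, assuming the operation passed in is an idempotent upsert keyed on `url`; the names `retryOnDuplicateKey` and `DUPLICATE_KEY_ERROR` are hypothetical. Newer MongoDB servers retry some of these upserts internally, but a small client-side retry is a cheap safety net.

```javascript
// MongoDB reports a unique-index violation with error code 11000.
const DUPLICATE_KEY_ERROR = 11000;

// Runs `operation` (an async function, e.g. an upsert keyed on a unique
// field) and retries it when it fails with a duplicate key error.
// Any other error, or running out of attempts, is rethrown.
async function retryOnDuplicateKey(operation, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const isDuplicate =
        err !== null && typeof err === 'object' && err.code === DUPLICATE_KEY_ERROR;
      if (!isDuplicate || attempt >= maxAttempts) throw err;
      // Another worker inserted the document first; run the upsert again,
      // which will now match the existing document instead of inserting.
    }
  }
}

// Usage sketch (assumes `articles` is a Node.js driver Collection):
//   await retryOnDuplicateKey(() =>
//     articles.updateOne({ url }, { $setOnInsert: { site } }, { upsert: true }));
```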

But let's go further: if, for whatever reason, my two workers scrape the same front at the same time, both notice that there is no `Front` entry for it yet, and both create a new one with their `FrontDiscovery`, I end up with two entries for the same `Front`.

What are your strategies for getting around this? `findByIdAndUpdate` with `upsert: true` for each document separately? If so, how can I push something into the embedded document array without overwriting everything else, while still getting the defaults applied if the document didn't exist yet?

Thanks for any help pointing me in the right direction!

node.js parallel-processing mongodb mongoose database-schema




1 answer


An update with upsert: true can be used to perform an atomic "insert or update" (http://docs.mongodb.org/manual/core/update/#update-operations-with-the-upsert-flag).

For example, if we wanted a document with a specific url to be inserted into the Front collection exactly once, we could run something like:



db.Front.update(
    { url: 'http://example.com' },
    { $set: { url: 'http://example.com', found: true } },
    { upsert: true }   // insert the document if no match exists
)
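The same insert-once idea applied to the `Article` collection from the question, as a small Node.js sketch. The helper name `articleUpsert` is hypothetical. `$setOnInsert` writes its fields only when the upsert actually inserts, so an existing article is left untouched, which covers the "create defaults without overwriting everything else" part of the question. If you go through a Mongoose model instead of the raw driver, Mongoose's `setDefaultsOnInsert` option is what applies schema defaults on upsert.

```javascript
// Builds the arguments for an atomic "create the Article if its URL is
// unknown" step. The url is copied from the equality filter into the
// inserted document; $setOnInsert fields are written only on insert, so
// an existing article (and its discover array) is not modified.
function articleUpsert(articleUrl, site) {
  return {
    filter: { url: articleUrl },
    update: { $setOnInsert: { site, discover: [] } },
    options: { upsert: true },
  };
}

// Usage sketch with the Node.js driver (`articles` is a Collection):
//   const { filter, update, options } = articleUpsert(url, site);
//   await articles.updateOne(filter, update, options);
```

Keep the `$push` into `discover` in a separate update: combining `$setOnInsert: { discover: [] }` and a `$push` on `discover` in one update is rejected by MongoDB as a conflict on the same path.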

      

Operations on a single document in MongoDB are always atomic. If your updates span multiple documents, atomicity is not guaranteed (unless you use the multi-document transactions available on replica sets since MongoDB 4.0). In such cases, ask yourself whether the operations really need to be atomic. If not, you will likely find a way to work around potentially inconsistent data. If they do and you want to stick with MongoDB, check out the Two Phase Commit design pattern.
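Because each `Front` lives inside its article document, the duplicate-`Front` problem can be handled with single-document updates by moving the "is this Front already there?" check into the update filter. A sketch, assuming the Node.js MongoDB driver (`Collection.updateOne` resolving to a result with `matchedCount`) and that the article itself has already been upserted; `recordFront` and its return values are hypothetical. The plain driver does not apply Mongoose defaults or generate `_id`s for pushed subdocuments, so set those yourself if you need them.

```javascript
// Records a FrontDiscovery on an existing article without creating duplicate
// Front entries. Each updateOne touches a single document, so each step is
// atomic on its own; the loop covers the case where another worker adds the
// same Front between our two steps.
async function recordFront(collection, articleUrl, frontUrl, discovery, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Step 1: the Front already exists -> push into its `found` array.
    // The positional `$` targets the discover element matched by 'discover.url'.
    const pushed = await collection.updateOne(
      { url: articleUrl, 'discover.url': frontUrl },
      { $push: { 'discover.$.found': discovery } }
    );
    if (pushed.matchedCount === 1) return 'appended';

    // Step 2: the Front is missing -> add it (with this discovery) only if it
    // is still missing. The $ne condition is checked atomically with the
    // $push, so two workers cannot both add the same Front.
    const added = await collection.updateOne(
      { url: articleUrl, 'discover.url': { $ne: frontUrl } },
      { $push: { discover: { url: frontUrl, found: [discovery] } } }
    );
    if (added.matchedCount === 1) return 'created';

    // Neither matched: another worker added the Front in between (or the
    // article does not exist). Try again.
  }
  throw new Error(`could not record front ${frontUrl} on article ${articleUrl}`);
}
```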
