Promises and upserting to the database in bulk

I am currently parsing a list of js objects that are added to the db one by one, roughly like this with Node.js:

return promise.map(list, item =>
    parseItem(item)
        .then(upsertSingleItemToDB)
).then(() => console.log('all finished!'));

      

The problem is that when the list gets very large (~3000 elements), parsing all the elements in parallel is too memory-heavy. It was very easy to add a concurrency limit with the promise library (when/guard) and avoid running out of memory that way.
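
Roughly, that concurrency limit looks like the sketch below (assuming the when library; the limit of 10 is just an example value):

const when = require('when');
const guard = require('when/guard');

// Allow at most 10 parse-and-upsert operations to be in flight at any time.
const limitedUpsert = guard(guard.n(10), item =>
    parseItem(item).then(upsertSingleItemToDB)
);

return when.map(list, limitedUpsert)
    .then(() => console.log('all finished!'));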

But I would also like to optimize the db upserts, since mongodb offers a bulkWrite function. Since parsing and bulk writing all the elements at once is not possible, I will need to split the original list of objects into smaller sets that are parsed with promises in parallel, and then the array of results of each set will be passed to a promisified bulkWrite call. This will be repeated for the rest of the sets of list items.

I find it hard to wrap my head around how I can structure the smaller sets of promises so that I only do one set of parseSomeItems-BulkUpsertThem at a time (something like Promise.all([set1Bulk, set2Bulk]), where set1Bulk is itself an array of parallel parser promises?). Any pseudocode help would be appreciated (but I use the when library, if it matters).





2 answers


It might look something like this, using mongoose and the underlying nodejs-mongodb-driver:



const saveParsedItems = items => ItemCollection.collection.bulkWrite( // accessing the underlying driver
  items.map(item => ({
    updateOne: {
      filter: { id: item.id }, // or any compound key that makes your items unique for upsertion
      upsert: true,
      update: { $set: item } // should be a key:value formatted object
    }
  }))
);


const parseAndSaveItems = (items, offset = 0, limit = 3000) => { // the algorithm for retrieving items in batches can be anything you want, basically
  const itemSet = items.slice(offset, offset + limit); // slice takes start and end indexes, not a count

  return Promise.all(
    itemSet.map(parseItem) // parsing all the items of this batch first
  )
    .then(saveParsedItems)
    .then(() => {
      const newOffset = offset + limit;
      if (items.length > newOffset) {
        return parseAndSaveItems(items, newOffset, limit); // recurse into the next batch
      }

      return true;
    });
};

return parseAndSaveItems(yourItems);
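
If parsing 3000 items in one batch is still too heavy, the same function can simply be called with a smaller limit (500 below is an arbitrary example):

// Process the list in batches of 500 parsed items per bulkWrite call.
return parseAndSaveItems(yourItems, 0, 500);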
      







The first answer looks complete. However, here are some thoughts that come to mind.

As a hack, you could call a timeout function in the write operation's callback before the next write operation is performed. This gives the processor and memory a break between calls. Even if you only add one millisecond between calls, that adds just 3 seconds in total if you have 3000 record objects.
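
A rough sketch of that idea (the 1 ms delay and the writeItem helper are placeholders for your own values and single-item write):

// Chain the writes one after another, waiting briefly after each one so the
// CPU and memory get a short break between operations.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

const writeSequentially = items =>
    items.reduce(
        (chain, item) => chain.then(() => writeItem(item)).then(() => delay(1)),
        Promise.resolve()
    );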



Or, you can segment your insertObjects array and send each segment to its own bulk write, as sketched below.
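
A minimal sketch of that segmenting approach (the chunk size of 500 is arbitrary, and saveParsedItems refers to the bulk writer from the first answer):

// Split the array into fixed-size chunks and hand each chunk to the bulk
// writer one after the other.
const chunk = (arr, size) =>
    arr.reduce((chunks, _, i) =>
        i % size === 0 ? chunks.concat([arr.slice(i, i + size)]) : chunks, []);

const bulkWriteInChunks = (insertObjects, size = 500) =>
    chunk(insertObjects, size).reduce(
        (chain, batch) => chain.then(() => saveParsedItems(batch)),
        Promise.resolve()
    );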









