Node.js parallel execution

I am trying to learn parallel execution in Node.js. I wrote the example code below. However, the output is sequential: first 0..99 is printed, then 100..199.

I understand that this is because Node.js is essentially single-threaded, so each for loop holds the thread until it finishes.

What I am trying to figure out is in which cases a construct like flow.parallel is useful. Any I/O or database request will be asynchronous anyway in Node.js, so why would we need flow.parallel?

var flow = require('nimble');

flow.parallel([
    function a(callback) {
        for (var i = 0; i < 100; ++i) {
            console.log(i);
        }
        callback();
    },
    function b(callback) {
        for (var i = 100; i < 200; ++i) {
            console.log(i);
        }
        callback();
    }
]);

      



2 answers


Most of the time, when you use a parallel flow like this, you won't be printing a bunch of numbers in a for loop (which happens to block execution). When you register your functions, they are registered in the same order in which you defined them in the array passed to parallel. In the above case, function a comes first and function b second. Hence a() is called first, and b() at some later point. Since for loops are blocking and Node runs your code on a single thread, the entire for loop inside a() must finish and a() must return before Node regains control, and only then can b(), which has been waiting its turn, run in the same way.

Why is a parallel flow control construct needed? By design, you shouldn't be doing blocking operations in Node (as your example does). a() consumes the entire thread, then b() consumes the entire thread, before anything else can happen.

a()  b()
 |
 |
 |
 |
RET
     |
     |
     |
     |
    RET
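To make the diagram concrete, here is a minimal sketch that records the order of work rather than printing it. The function name runBlockingDemo and the shortened loops are illustrative assumptions, not part of the original code. It also queues a zero-delay timer to show that nothing else gets a turn while the synchronous loops run.

```javascript
// Records the order in which work happens instead of printing 200 lines.
function runBlockingDemo() {
  var log = [];

  function a() {
    for (var i = 0; i < 3; ++i) log.push('a' + i); // synchronous loop: holds the thread
  }
  function b() {
    for (var i = 0; i < 3; ++i) log.push('b' + i);
  }

  // Even a zero-delay timer cannot fire until the current call stack is empty.
  setTimeout(function () { log.push('timer'); }, 0);

  a();
  b();
  return log;
}

var log = runBlockingDemo();
console.log(log.join(' ')); // a0 a1 a2 b0 b1 b2  (the timer has not run yet)
```

The timer callback only runs after the script has returned control to the event loop, which is the same reason b() cannot start until a() has returned.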

      

Now say that you are building a web application where a user can register and upload an image at the same time. Your user registration code might look like this:

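var path = require('path');
var flow = require('nimble');
// img and db stand in for your own image-processing and database modules.

var newUser = {
  username: 'bob',
  password: '...',
  email: 'bob@example.com',
  picture: '20140806-210743.jpg'
};

var file = path.join(img.IMG_STORE_DIR, newUser.picture);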

flow.parallel([
  function processImage(callback) {
    img.process(function (err) {
      if (err) return callback(err); 

      img.save(file, function (err) {
        return callback(err); // err should be falsey if everything was good
      })
    });
  },
  function dbInsert(callback) {
    db.doQuery('insert', newUser, function (err, id) {
      return callback(err);
    });
  }
], function () {
  // send the results to the user now to let them know they are all registered! 
});

      

The inner functions here do not block, and both involve processing or network I/O. They are, however, fairly independent of each other: neither needs the other to finish before it can start. Inside the functions whose code we can't see, they make further asynchronous calls with callbacks, each one handing Node another piece of work to queue. Node works through that queue as results arrive, so the callbacks of the two tasks end up interleaved.



Hopefully something like this is happening now:

a = processImage
b = dbInsert
a()  b()
 |
      |
 |
      |
 |   
      |
 |
RET   |
     RET
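The interleaving in the diagram can be reproduced with a small sketch. Here setImmediate stands in for real I/O callbacks, and countAsync and runInterleavedDemo are hypothetical helper names introduced only for illustration.

```javascript
// Each task does its work in small asynchronous steps, so the event loop
// can switch between the two tasks after every step.
function countAsync(label, n, log, done) {
  var i = 0;
  function step() {
    if (i === n) return done();
    log.push(label + i);
    i += 1;
    setImmediate(step); // yield to the event loop before the next step
  }
  step();
}

function runInterleavedDemo(done) {
  var log = [];
  var pending = 2;
  function finished() {
    pending -= 1;
    if (pending === 0) done(log); // both tasks have called back
  }
  countAsync('a', 3, log, finished);
  countAsync('b', 3, log, finished);
}

runInterleavedDemo(function (log) {
  console.log(log.join(' ')); // a0 b0 a1 b1 a2 b2
});
```

Unlike the blocking for loops, neither task holds the thread for long, so their steps alternate.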

      

If we ran them sequentially, i.e. waited for the image to be fully processed before doing the database insert, there would be a lot of waiting. If the I/O load on your system is high, Node would just sit twiddling its thumbs while waiting on the OS. By contrast, running them in parallel lets the faster operations make progress while the slower ones are still pending.
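A rough way to see the difference is to time both approaches with simulated I/O. In this sketch fakeIo is a hypothetical stand-in built on setTimeout, and the 100 ms and 50 ms delays are arbitrary; real timings will vary.

```javascript
// fakeIo simulates an I/O operation that takes about `ms` milliseconds.
function fakeIo(ms, callback) {
  setTimeout(function () { callback(null); }, ms);
}

// Start the second operation only after the first has finished.
function sequential(done) {
  var start = Date.now();
  fakeIo(100, function () {
    fakeIo(50, function () { done(Date.now() - start); });
  });
}

// Start both operations at once and wait for both callbacks.
function parallel(done) {
  var start = Date.now();
  var pending = 2;
  function finished() {
    pending -= 1;
    if (pending === 0) done(Date.now() - start);
  }
  fakeIo(100, finished);
  fakeIo(50, finished);
}

// Sequential takes roughly the sum of the delays, parallel roughly the longest one.
sequential(function (ms) { console.log('sequential: about ' + ms + ' ms'); });
parallel(function (ms) { console.log('parallel: about ' + ms + ' ms'); });
```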

If Node does it on its own, why do we really need it? The key is in the second argument you omitted.

flow.parallel([a, b], function () {
  // both functions have now returned and called back.
});

      

Now you can tell when both tasks are done. Node does not do this for you by default, so it can be quite useful.



flow.parallel gives you reusable logic for determining when all concurrent operations have completed. Yes, if you just did db.query('one');db.query('two');db.query('three'); they would all run in parallel by virtue of being asynchronous, but you would have to write boilerplate code to keep track of when they were all done and whether any of them hit an error. That is the part that flow.parallel (or its equivalent in any flow control library) does for you.
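As an illustration of that boilerplate, here is a simplified sketch of the bookkeeping such a helper performs. It is not nimble's actual implementation; parallelSketch and fakeQuery are hypothetical names, and fakeQuery merely simulates an asynchronous query that succeeds.

```javascript
// Runs every task, then calls done once: with the first error,
// or with null after all tasks have called back.
function parallelSketch(tasks, done) {
  var pending = tasks.length;
  var failed = false;
  if (pending === 0) return done(null);

  tasks.forEach(function (task) {
    task(function (err) {
      if (failed) return;            // an error was already reported
      if (err) {
        failed = true;
        return done(err);            // report the first error only
      }
      pending -= 1;
      if (pending === 0) done(null); // every task has called back
    });
  });
}

// Hypothetical stand-in for db.query: succeeds asynchronously.
function fakeQuery(name, callback) {
  setImmediate(function () { callback(null); });
}

parallelSketch([
  function (cb) { fakeQuery('one', cb); },
  function (cb) { fakeQuery('two', cb); },
  function (cb) { fakeQuery('three', cb); }
], function (err) {
  console.log(err ? 'failed: ' + err.message : 'all three queries finished');
});
```

Without a helper like this, the pending counter and the error flag would have to be repeated at every place where several asynchronous calls are started together.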


