Memory leak when iterating over a mongo data stream using NodeJS

I am streaming data from a mongodb collection, doing some calculations on each document, and then saving it back to mongo. The process runs normally through the first 50k records or so, after which it grinds to a near halt. The first 50k records save at 2-3k records per second; after that the rate drops to closer to 2 per second.

var total = 0, written = 0, timedout = false;

var stream = Schema.find().stream();
stream.on('data', function (doc) {
  pauseStream(this); // throttle when too many saves are outstanding
  total++;
  OtherSchema.find().exec(function (err, others) {
    doc.total = others.data + doc.data;
    doc.save(function (err) {
      written++;
    });
  });
});

function pauseStream(stream) {
  // Pause while more than 50 saves are outstanding,
  // then re-check every 100ms until the backlog drains.
  if ((total > (written + 50)) && !timedout) {
    timedout = true;
    stream.pause();
    setTimeout(function () {
      timedout = false;
      pauseStream(stream);
    }, 100);
  } else {
    stream.resume();
  }
}

      

I am trying to limit the flow to at most 50 outstanding updates at a time. I have changed that number up and down with no change in where it hangs. What am I doing wrong? It looks like a memory leak. When I use memwatch, the stats at the 50,000 mark are as follows:

{ num_full_gc: 2368,
  num_inc_gc: 55680,
  heap_compactions: 2368,
  usage_trend: 4177.7,
  estimated_base: 89033445,
  current_base: 121087440,
  min: 15957344,
  max: 366396904 }
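(For reference, these stats come from memwatch's 'stats' event, which the node-memwatch module emits after every full GC; I capture them with something like the following.)

var memwatch = require('memwatch');

// memwatch emits a 'stats' event after each full garbage collection
memwatch.on('stats', function (stats) {
  console.log(stats);
});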

      

+3




2 answers


Try setting batchSize instead of pausing the stream.



var stream = Schema.find().batchSize(50).stream();
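batchSize controls how many documents the underlying MongoDB cursor fetches per round trip, so the driver should hold roughly 50 documents in memory at a time instead of reading far ahead of your processing loop.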

+1




I suspect there may be a leak in your code: you are using doc inside the nested callback closures. Try telling the GC that doc is no longer needed after you save it:



OtherSchema.find().exec(function (err, others) {
  doc.total = others.data + doc.data;
  doc.save(function (err) {
    written++;
  });
  doc = null; // tell the GC that doc can be freed
});
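The reasoning here is that the nested callbacks close over the scope that holds doc, so each streamed document can stay reachable through that closure chain; clearing the local reference once the save has been issued lets V8 reclaim it sooner.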

      

0








