MongoDB-Hadoop integration for faster data processing

MongoDB can be integrated with Hadoop to speed up data processing, but in the course of this integration (MongoDB -> Hadoop) data is transferred from MongoDB to Hadoop. My questions are:

1. Isn't the cost of transferring data from MongoDB to Hadoop higher than the cost of processing the data in MongoDB itself?

2. Is the data transfer (MongoDB -> Hadoop) a one-time job? If so, how will later MongoDB updates be reflected in Hadoop?



1 answer


To comply with the "single source of truth" principle, you should try not to copy data and not keep a redundant copy of it in HDFS.

With the Mongo-Hadoop connector, you can query MongoDB directly instead of a local copy in HDFS. The disadvantage, of course, is that your production database gets more load. An alternative is to query against your MongoDB BSON dumps.
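For concreteness, here is a rough sketch of what such a job could look like. The class names MongoInputFormat / BSONFileInputFormat and the mongo.input.uri key are from the mongo-hadoop connector as I remember it (check the docs of your connector version); the database and collection names, the "status" field, host names and output path are made up for illustration, so treat this as a sketch rather than a drop-in solution.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoInputFormat;

public class OrderStatusCount {

    // MongoInputFormat hands each document to the mapper as a BSONObject.
    public static class StatusMapper
            extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(Object id, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            Object status = doc.get("status"); // hypothetical document field
            ctx.write(new Text(status == null ? "unknown" : status.toString()), ONE);
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option A: read straight from the live MongoDB cluster
        // (this puts extra read load on the production database).
        conf.set("mongo.input.uri", "mongodb://mongo-host:27017/shop.orders");

        Job job = Job.getInstance(conf, "order-status-count");
        job.setJarByClass(OrderStatusCount.class);
        job.setInputFormatClass(MongoInputFormat.class);

        // Option B: point the job at BSON dumps (mongodump output) stored in
        // HDFS instead, so the production database is not touched at all:
        //   job.setInputFormatClass(com.mongodb.hadoop.BSONFileInputFormat.class);
        //   com.mongodb.hadoop.BSONFileInputFormat.addInputPath(
        //       job, new Path("/dumps/shop/orders.bson"));

        job.setMapperClass(StatusMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/out/order-status-count"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}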

To your questions:



To 1: If the Hadoop nodes are "close" to the Mongo nodes, there is not too much overhead. When you use Hadoop's MapReduce, you can also use tools like Hive, Pig, ... that you cannot use with MongoDB's map-reduce. And it lets you scale the "compute power" on demand without touching your database (all Hadoop nodes will be used; in MongoDB you need to take care of the shard key).

To 2: You do it over and over again. (Unless you are using a capped collection and have set up a stream to handle it, but I guess you are not.)
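One way to make those repeated runs cheaper, sketched below, is to restrict each run to recently changed documents via the connector's query setting. The mongo.input.query key is from the mongo-hadoop connector as I remember it; the updatedAt field and the extended-JSON $date syntax are assumptions about your schema and connector version, so adapt as needed.

import org.apache.hadoop.conf.Configuration;

// Sketch of an incremental MongoDB -> Hadoop run: instead of copying the
// whole collection every time, each run only selects documents modified
// since the previous successful run.
public class IncrementalMongoInput {

    public static Configuration forRun(String lastRunIsoDate) {
        Configuration conf = new Configuration();
        conf.set("mongo.input.uri", "mongodb://mongo-host:27017/shop.orders");
        // "updatedAt" is a hypothetical modification timestamp your
        // documents would need to carry for this to work.
        conf.set("mongo.input.query",
                 "{\"updatedAt\": {\"$gte\": {\"$date\": \"" + lastRunIsoDate + "\"}}}");
        return conf;
    }

    public static void main(String[] args) {
        // A scheduled job would pass in the timestamp of its last successful run.
        Configuration conf = forRun("2014-06-01T00:00:00Z");
        System.out.println(conf.get("mongo.input.query"));
    }
}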

You should read about the Lambda Architecture in the Big Data book (http://www.manning.com/marz/). It does a great job of explaining why you would pair something like MongoDB with Hadoop.
