Hi I am new to Hadoop and NoSQL technologies. I started learning with world-count program by reading file stored in HDFS and and processing it. Now I want to use Hadoop with MongoDB. Started program from here .
Now here is confusion with me that it stores mongodb data on my local file system, and read data from local file system to HDFS in map/reduce and again write it to mongodb local file system. When I studied HBase, we can configure it to store it's data on HDFS, and hadoop can directly process it on HDFS(map/reduce). How to configure mongodb to store it's data on HDFS.
I think it is better approach to store data in HDFS for fast processing. Not in the local file system. Am I right? Please clear my concept if I am going in wrong direction.
MongoDB isn't built to work on top of HDFS and it's not really necessary since Mongo already has its own approach for scaling horizontally and working with data stored across multiple machines.
A better approach if you need to work with MongoDB and Hadoop is to use MongoDB as the source of your data but process everything in Hadoop (which will use HDFS for any temporary storage). Once your done processing the data you can write it back to MongoDB, S3, or wherever you want.
I wrote a blog post that goes into a little more details about how you can work with Mongo and Hadoop here: http://blog.mortardata.com/post/43080668046/mongodb-hadoop-why-how