I have just started a free trial of Google Cloud Platform. In order to run MapReduce tasks with DataStore, the docs say to run
./bdutil --upload_files "samples/*" run_command ./test-mr-datastore.sh
But I couldn't find this file locally, and there seems to be a good reason for that: this way of running MapReduce jobs appears to be deprecated (see this on GitHub). Is that true, and is there an alternative way to create MapReduce tasks from the local command line without requiring BigQuery?
NOTE: the Google team removed the DataStore connector from bdutil v1.3.0 (2015-05-27) onward, so you may need to use an older version, or use GCS or BigQuery as a proxy to access your data in DataStore. I'll try to cover as much as I can, but bdutil involves a lot more detail than can be documented in this answer; I hope this gives you enough to start:

1. Set up the Google Cloud SDK - detail
2. Download and extract the bdutil source code that still contains the DataStore connector (a release before v1.3.0).
3. Create a bdutil custom environment-variable file. Refer to the bdutil configuration documentation on creating a correct configuration file, since you need to specify the project, number of workers, GCS bucket, machine type, etc.
4. Deploy your Hadoop instances (full documentation) using datastore_env.sh
5. Connect to the Hadoop master node
6. On the master node you can now run your MapReduce job, which will have access to DataStore as well.
7. Tear down your Hadoop cluster
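As a sketch of step 3, a minimal custom env file might look like the following. The variable names follow bdutil's own `bdutil_env.sh` defaults; the project ID, bucket name, zone, and file name are placeholders, not values from the original question:

```shell
# my_datastore_env.sh -- hypothetical custom bdutil overrides (all values are placeholders)
PROJECT=my-gcp-project          # your GCP project ID
CONFIGBUCKET=my-bdutil-bucket   # GCS bucket bdutil uses for staging/config
NUM_WORKERS=2                   # number of Hadoop worker VMs
GCE_ZONE=us-central1-a          # Compute Engine zone for the VMs
GCE_MACHINE_TYPE=n1-standard-2  # machine type for each VM
```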
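Putting the steps together, the end-to-end flow from a local shell looks roughly like this. This is a sketch, not a definitive recipe: it assumes an older bdutil release that still ships `datastore_env.sh`, and `my_datastore_env.sh` is a hypothetical custom env file of the kind described in step 3:

```shell
# Step 1: authenticate the Cloud SDK (one-time)
gcloud auth login
gcloud config set project my-gcp-project   # placeholder project ID

# Step 4: deploy the cluster, layering the DataStore connector env on top
# of your custom settings
./bdutil -e my_datastore_env.sh -e datastore_env.sh deploy

# Step 5: SSH into the Hadoop master node
./bdutil -e my_datastore_env.sh shell

# Step 6 (run on the master): submit your MapReduce job, e.g.:
#   hadoop jar your-job.jar your.MainClass <args>

# Step 7: tear the cluster down when finished to stop billing
./bdutil -e my_datastore_env.sh delete
```

Layering multiple `-e` files is how bdutil composes configuration, so keeping your own overrides separate from `datastore_env.sh` makes it easy to reuse them with other connectors.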