I have deployed a Hadoop cluster on Google Compute Engine. I then run a machine learning algorithm (Cloudera's Oryx) on the master node of the Hadoop cluster. The output of this algorithm is exposed through an HTTP REST API, so I need to access it either from a web browser or via REST commands. However, I cannot resolve the address for the output of the master node, which takes the form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091.
I have allowed HTTP traffic and opened ports 80 and 8091 on the network, but I still cannot resolve the address given. Note that this HTTP address is NOT the IP address of the master node instance.
I have followed along with examples for accessing the IP addresses of compute instances. However, I cannot find examples of accessing a single node of a Hadoop cluster on GCE using an address of the form http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091. Any help would be appreciated. Thank you.
The reason you're seeing this is that a "HOSTNAME.c.PROJECT.internal" name is only resolvable from within the same GCE network as the instance; these domain names are not globally visible. So, if you were to SSH into your master node first, and then try to

curl http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091

then you should successfully retrieve the contents, whereas trying to access it from your personal browser will just fail to resolve that hostname into any IP address.

So unfortunately, the quickest way for you to retrieve those contents is indeed to use the external IP address of your GCE instance. If you've already opened port 8091 on the network, simply use gcutil getinstance CLUSTER_NAME-m and look for the entry specifying the external IP address; then plug that in as your URL: http://[external ip address]:8091.

If you turned up the cluster using bdutil, a more involved but nicer way to access your cluster is by running the bdutil socksproxy command. This opens a dynamic-port-forwarding SSH tunnel to your master node as a SOCKS5 proxy. You can then configure your browser to use localhost:1080 as its proxy server, make sure remote DNS resolution is enabled, and visit the normal http://CLUSTER_NAME-m.c.PROJECT_NAME.internal:8091 URL in your browser.
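
If you want to script the same decision rather than do it by hand, here is a minimal Python sketch. It checks whether the internal hostname resolves from the machine the script runs on and falls back to the external IP when it does not. The function names (can_resolve, service_url, pick_host, fetch_status) are hypothetical helpers made up for this illustration, and the external IP is something you would have to look up yourself; only the port 8091 and the hostname pattern come from the question.

```python
import socket
import urllib.request


def can_resolve(hostname: str) -> bool:
    """Return True if this machine can resolve hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver has no answer (e.g. a *.internal name
        # looked up from outside the GCE network).
        # UnicodeError: the name is malformed (e.g. an empty label).
        return False


def service_url(host: str, port: int = 8091) -> str:
    """Build the base URL of the REST endpoint on the given host."""
    return f"http://{host}:{port}"


def pick_host(internal_name: str, external_ip: str) -> str:
    """Prefer the internal name when it resolves, else use the external IP."""
    return internal_name if can_resolve(internal_name) else external_ip


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Issue a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

From your own workstation, pick_host("CLUSTER_NAME-m.c.PROJECT_NAME.internal", "[external ip address]") should return the external IP, because the internal name will not resolve there; run on the master node itself, it should return the internal name. Pass the result to service_url and then to fetch_status to confirm that port 8091 is actually reachable.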