Curl connection in H2O 3.11.4.8 using Apache Hadoop 2.7.3


I have installed HDP 2.6 on a computer cluster with only 2 nodes. Each node has:

  • a 2-core processor
  • 8 GB of RAM
  • a 40 GB hard disk

[screenshot of the cluster resources]

I also installed Apache Hadoop 2.7.3, so I can run H2O 3.11.4.8 on YARN. However, an error occurs when I try to build a Deep Learning model on a 500 MB dataset with R. This is the error:

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix,  : 
  Unexpected CURL error: Failed connect to 172.16.0.14:54321; Connection refused
Calls: h2o.deeplearning ... tryCatchOne -> doTryCatch -> .h2o.doSafeGET -> .h2o.doSafeREST
In addition: Warning message:
In .verify_dataxy(training_frame, x, y, autoencoder) :
  removing response variable from the explanatory variables
Execution halted
Error in .h2o.__checkConnectionHealth() : 
  H2O connection has been severed. Cannot connect to instance at http://172.16.0.14:54321/
Failed connect to 172.16.0.14:54321; Connection refused
Calls: <Anonymous> -> .h2o.__remoteSend -> .h2o.__checkConnectionHealth

Before using R, I also tried Python and got a similar error. It says there is a problem with the requests package, because it cannot make a new connection to H2O. This is the error when I use the Python API:

H2OConnectionError: Unexpected HTTP error: HTTPConnectionPool(host='localhost', port=54321): Max retries exceeded with url: /3/Jobs/$0301ac10000e32d4ffffffff$_91d77c50e0aff3019565b9b6dddc4c69 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xc21fa10>: Failed to establish a new connection: [Errno 111] Connection refused',))
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/h2o/utils/debugging.py", line 95, in _except_hook
    _handle_soft_error(exc_type, exc_value, exc_tb)
  File "/usr/lib/python2.7/site-packages/h2o/utils/debugging.py", line 225, in _handle_soft_error
    args_str = _get_args_str(func, highlight=highlight)
  File "/usr/lib/python2.7/site-packages/h2o/utils/debugging.py", line 316, in _get_args_str
    s = str(inspect.signature(func))[1:-1]
AttributeError: 'module' object has no attribute 'signature'

Original exception was:
Traceback (most recent call last):
  File "hadoop-sed.py", line 18, in <module>
    y="I_TORNADO_LOGICAL", training_frame=training, validation_frame=validation)
  File "/usr/lib/python2.7/site-packages/h2o/estimators/estimator_base.py", line 204, in train
    model.poll()
  File "/usr/lib/python2.7/site-packages/h2o/job.py", line 54, in poll
    pb.execute(self._refresh_job_status)
  File "/usr/lib/python2.7/site-packages/h2o/utils/progressbar.py", line 160, in execute
    res = progress_fn()  # may raise StopIteration
  File "/usr/lib/python2.7/site-packages/h2o/job.py", line 89, in _refresh_job_status
    jobs = h2o.api("GET /3/Jobs/%s" % self.job_key)
  File "/usr/lib/python2.7/site-packages/h2o/h2o.py", line 99, in api
    return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
  File "/usr/lib/python2.7/site-packages/h2o/backend/connection.py", line 410, in request
    raise H2OConnectionError("Unexpected HTTP error: %s" % e)
h2o.exceptions.H2OConnectionError: Unexpected HTTP error: HTTPConnectionPool(host='localhost', port=54321): Max retries exceeded with url: /3/Jobs/$0301ac10000e32d4ffffffff$_91d77c50e0aff3019565b9b6dddc4c69 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xc21fa10>: Failed to establish a new connection: [Errno 111] Connection refused',))
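
For context, this is roughly what my script does. The sketch below is simplified: the HDFS path and predictor columns are placeholders, and only y="I_TORNADO_LOGICAL" and the training/validation frames actually appear in the traceback above.

# Simplified sketch of the failing workflow. The HDFS path is a placeholder;
# only y="I_TORNADO_LOGICAL" and the frame names come from the traceback.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.connect(ip="172.16.0.14", port=54321)  # H2O launched on YARN via h2odriver

data = h2o.import_file("hdfs:///path/to/dataset.csv")  # ~500 MB dataset
training, validation = data.split_frame(ratios=[0.8])

model = H2ODeepLearningEstimator(epochs=10)
# The connection error is raised while train() polls GET /3/Jobs/<job_key>:
model.train(y="I_TORNADO_LOGICAL", training_frame=training,
            validation_frame=validation)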

From this error, I tried to figure out why this happens. I found several important pieces of information:

  1. The H2O documentation's Hadoop section (sorry I cannot post a link, my reputation is too low) says H2O should run with 6 GB of RAM. Based on the screenshot I provided above, RAM is not the problem.

  2. The H2O community question "H2O Memory Requirements" says the RAM should be about 4x the dataset size. My dataset is 500 MB, so roughly 2 GB should be enough, which my nodes easily meet.

From this information, I conclude that my cluster should be able to process the dataset without a problem, so the issue should not be the hardware.

I also got a better clue from a question similar to mine:

  1. The H2O community question "Error in .h2o.doSafeREST: Could not resolve host: localhost" says, in point 2 of its answer, that this happens because "H2O is still serving previous request(s) and this request could not go through".

I think the R and Python clients use curl and the requests package, respectively, to talk to the H2O REST API. Because there are too many requests, the H2O server cannot handle them and gives me this error.
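
One thing I can do to check whether the server is dead or just overloaded is to hit the REST API directly. This is only a diagnostic sketch I came up with; /3/Cloud is the cluster-status endpoint, and the IP is the one from the R error above:

import requests

# Probe the H2O REST API directly; /3/Cloud reports cluster status.
try:
    r = requests.get("http://172.16.0.14:54321/3/Cloud", timeout=5)
    print("status=%s cloud_healthy=%s"
          % (r.status_code, r.json().get("cloud_healthy")))
except requests.exceptions.ConnectionError as err:
    # The same "Connection refused" the R and Python clients report:
    print("H2O is not listening: %s" % err)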

I also tried to slow down the requests, but I don't know how to do it properly. Do you have a better solution to this problem?
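
The only workaround I could think of is a crude retry-with-pause wrapper around the polling request, sketched below, but I don't think it fixes the root cause (the URL would be the /3/Jobs endpoint from the traceback; I have left the job key out):

import time
import requests

# Naive attempt to "slow down" the polling: retry with a pause instead of
# hammering the server. The job key is omitted here on purpose.
def get_with_retry(url, attempts=5, pause=10):
    for i in range(attempts):
        try:
            return requests.get(url, timeout=5)
        except requests.exceptions.ConnectionError:
            print("attempt %d failed, sleeping %d s" % (i + 1, pause))
            time.sleep(pause)
    raise RuntimeError("H2O did not respond after %d attempts" % attempts)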

Thanks a lot

P.S. I also get this problem in Sparkling Water 1.6.11 and 2.1.8 on YARN. Both suddenly stop working when trying to build a Deep Learning model with the same dataset.

The container shown by yarn application -list is killed without my interference. I don't know why, but I think it is the same problem.
