What are the benefits of Sparkling Water over the H2O machine learning library?

I understand that Sparkling Water is H2O running in a Spark environment, so it can use the Spark engine (and all of Spark's distributed structures) to distribute computation. But in terms of performance, what are the benefits, given that H2O is already a distributed and scalable machine learning library?

Also, is the standalone version of H2O really capable of managing distributed processing over a cluster of computers?

Best answer

The main benefit of using Sparkling Water over regular H2O is that it fits nicely into an existing Spark pipeline. If you are not already using Spark, then it's best just to use the regular H2O library. H2O is already distributed, so adding Spark to the equation does not provide any additional value in terms of distributed computing.
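To make the "fits into an existing Spark pipeline" point concrete, here is a minimal sketch using PySparkling, the Python API for Sparkling Water. It assumes a recent Sparkling Water release, where `H2OContext.getOrCreate()` takes no arguments and `asSparkFrame` is available; older releases use slightly different signatures. The function name, the `ntrees` value, and the column handling are illustrative choices, not something from the question.

```python
def train_gbm_on_spark_frame(spark_df, label_col):
    """Train an H2O GBM on data that already lives in a Spark DataFrame."""
    # Imported lazily: both packages need a working Spark + Sparkling Water setup.
    from pysparkling import H2OContext
    from h2o.estimators import H2OGradientBoostingEstimator

    # Start (or reuse) the H2O cluster attached to the active Spark session.
    hc = H2OContext.getOrCreate()

    # Convert the Spark DataFrame into an H2OFrame.
    h2o_frame = hc.asH2OFrame(spark_df)

    # Use every column except the label as a feature (illustrative choice).
    features = [c for c in h2o_frame.columns if c != label_col]
    model = H2OGradientBoostingEstimator(ntrees=50)
    model.train(x=features, y=label_col, training_frame=h2o_frame)

    # Convert the predictions back to a Spark DataFrame for later pipeline stages.
    return hc.asSparkFrame(model.predict(h2o_frame))
```

The training itself is still done by H2O. What Sparkling Water adds here is the conversion in both directions, so the Spark code before and after the model does not have to change.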

H2O has a lot of the same components that Spark does, such as distributed data frames and shared, in-memory computation. So yes, H2O is capable of managing distributed processing over a multi-core or multi-node cluster of computers. That's exactly what it was designed to do.
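If you want to check this on your own setup, the sketch below connects to an H2O cluster that is already running, without Spark, and reports how many nodes it has. The host and port are placeholders (54321 is H2O's default port), and the function name is made up for illustration.

```python
def describe_h2o_cluster(ip="localhost", port=54321):
    """Connect to a running standalone H2O cluster and return its node count."""
    # Imported lazily so the module loads even where h2o is not installed.
    import h2o

    # Attach to an existing cluster; this does not start a new one.
    h2o.connect(ip=ip, port=port)

    cluster = h2o.cluster()
    # Print a summary of the cluster (version, node count, memory, and so on).
    cluster.show_status()

    # Number of H2O nodes that have formed this cluster.
    return cluster.cloud_size
```

On a multi-node deployment the returned value should be greater than 1, and any frame you load or model you train through that connection is handled by all of those nodes.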