Apache Spark deployment: standalone vs. multiple VMs


I have one machine on which to deploy Spark, Hadoop, and Tachyon. Will Spark operations that read from HDFS/Tachyon be faster on one node with all the cores and RAM, or on a number of VM nodes that divide the resources evenly? RAM is < 200 GB.

"Performance and Scalability of Broadcast in Spark" is quite old, but it suggests that the increased network traffic could be a strong negative for the multiple-VM option in the single node vs. VMs problem.


There is 1 best solution below


It's probably better to run multiple worker instances. While there is an increase in network overhead, JVM performance with a really large heap isn't great, largely because garbage collection pauses tend to get longer as the heap grows.
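
To make that concrete, here is a rough sketch of how one machine could be split into several standalone workers without using VMs. It assumes Spark's standalone mode, where `SPARK_WORKER_INSTANCES`, `SPARK_WORKER_CORES`, and `SPARK_WORKER_MEMORY` can be set in `conf/spark-env.sh`. The helper names `plan_workers` and `render_spark_env`, the 32 GB per-worker cap, the 16 GB reserved for the OS, HDFS, and Tachyon, and the 32-core / 192 GB machine are all assumptions for illustration, not numbers from the question.

```python
import math


def plan_workers(total_cores, total_mem_gb, max_worker_gb=32, reserved_gb=16):
    """Split one machine into several workers with bounded memory each.

    max_worker_gb and reserved_gb are illustrative assumptions; tune them
    for the actual workload and for whatever else runs on the box.
    """
    usable_gb = total_mem_gb - reserved_gb
    if usable_gb <= 0 or total_cores < 1:
        raise ValueError("not enough memory or cores left for Spark workers")

    # Enough workers that none is handed more than max_worker_gb,
    # but never more workers than there are cores.
    instances = min(total_cores, max(1, math.ceil(usable_gb / max_worker_gb)))

    return {
        "SPARK_WORKER_INSTANCES": instances,
        "SPARK_WORKER_CORES": total_cores // instances,
        # Memory each worker may hand out to executors.
        "SPARK_WORKER_MEMORY": f"{usable_gb // instances}g",
    }


def render_spark_env(plan):
    """Format the plan as export lines suitable for conf/spark-env.sh."""
    return "\n".join(f"export {key}={value}" for key, value in plan.items())


if __name__ == "__main__":
    # Hypothetical machine: 32 cores, 192 GB of RAM.
    print(render_spark_env(plan_workers(32, 192)))
```

Because all workers sit on the same host, traffic between them goes over loopback rather than a virtual network, so this layout may keep the smaller heaps while avoiding some of the overhead that full VMs would add. It is worth benchmarking against a single large worker on your own jobs before settling on it.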