SpatialHadoop: no scaling with multiple computing nodes


I am using SpatialHadoop to store and index a dataset with 87 million points. I then apply various range queries.

I tested on 3 different cluster configurations: 1, 2, and 4 nodes. Unfortunately, the runtime does not decrease as the number of nodes grows.

Any ideas why there is no horizontal-scaling effect?

Answer by aseldawy:

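How big is your file in megabytes? Even with 87 million points, it may still be small enough that Hadoop creates only one or two splits from it. Each split is processed by a single map task, so if there are only one or two splits, adding more nodes cannot speed up the job.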

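If this is the case, you can try reducing the block size in your HDFS configuration (the `dfs.blocksize` property in `hdfs-site.xml`) so that the file is split into more blocks, and therefore more splits. Note that a file's block size is fixed when the file is written, so you need to re-upload (and re-index) the file after changing the setting.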
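To see why the block size matters, here is a minimal sketch assuming the default Hadoop `FileInputFormat` split logic applies. SpatialHadoop reads spatially indexed files through its own input formats, so its partition count may be decided differently; treat this as an approximation. The split size is `max(minSize, min(maxSize, blockSize))`, and the last split may be up to 10% larger than that instead of producing a tiny extra split. The function names and the 200 MB file size are hypothetical, chosen only for illustration, and are not measured from the dataset in the question.

```python
# Slack factor used by Hadoop's FileInputFormat: the last split may be up to
# 10% larger than the split size instead of creating a tiny extra split.
SPLIT_SLOP = 1.1

MB = 1024 * 1024


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size as FileInputFormat computes it: max(min, min(max, block))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size):
    """Approximate number of input splits (i.e. map tasks) for one file."""
    size = split_size(block_size)
    remaining = file_size
    splits = 0
    # Carve off full-size splits while more than 1.1 splits' worth remains.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left (up to 1.1 * size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    file_size = 200 * MB  # hypothetical file size
    for block in (128 * MB, 64 * MB, 16 * MB):
        print(f"{block // MB} MB blocks -> {count_splits(file_size, block)} splits")
    # prints:
    # 128 MB blocks -> 2 splits
    # 64 MB blocks -> 4 splits
    # 16 MB blocks -> 13 splits
```

In this hypothetical, a 200 MB file with 128 MB blocks yields only 2 map tasks, so on a 4-node cluster at most two nodes would be doing map work at the same time; shrinking the block size raises the number of splits and lets more nodes participate.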

Another possibility is that you are running virtual nodes on the same physical machine. In that case, the nodes share the same CPUs, memory, and disks, so you do not get a real distributed environment.