Spark on RAPIDS single node

I'm trying to run TPC-DS with RAPIDS on a single node on EMR using this guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html But I'm getting results that are worse than on CPU. That makes me think that maybe I'm not doing it right, or maybe RAPIDS does not work well on a single node.

I also tried to measure on Databricks using this guide: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html And it gets stuck without executing the query.

Is it possible that RAPIDS has low performance on a single node? If so, what is the recommended cluster size?

Note: The CPU instance type is “r5d.xlarge” (16 vCPU, 128 GB memory, 10 Gbps network). The GPU instance is “g3.4xlarge” (16 vCPU, 122 GB memory, 10 Gbps network). The times were 670 sec on RAPIDS vs. 60 sec on x86. I used Spark version 3.1.0 (EMR 6.4.0).

There is 1 best solution below

On AWS, you can use G4dn, P3, P4, and G5 instances: https://rapids.ai/cloud#aws. The GPU in your selected instance type, g3.4xlarge (G3 instances use NVIDIA Tesla M60 GPUs), is incompatible with RAPIDS, so the queries may be falling back to CPU.
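
One way to check for CPU fallback is to look at the physical plan: operators that the RAPIDS Accelerator has replaced show up with a `Gpu` prefix (for example `GpuProject`), and setting `spark.rapids.sql.explain` to `NOT_ON_GPU` makes the plugin log the operators it left on the CPU. The following is only a rough sketch, not the setup from either guide. The helper names `gpu_operators`, `runs_on_gpu`, and `plan_of` are hypothetical, and `plan_of` assumes PySpark is installed and `df` is an existing DataFrame.

```python
import re

# Settings that enable the RAPIDS Accelerator and log operators kept on the CPU.
# Pass them with --conf on spark-submit or through SparkSession.builder.config.
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
    "spark.rapids.sql.explain": "NOT_ON_GPU",
}


def gpu_operators(plan_text):
    """Return the sorted, distinct Gpu-prefixed operator names in a plan string."""
    return sorted(set(re.findall(r"\bGpu[A-Z]\w*", plan_text)))


def runs_on_gpu(plan_text):
    """True if at least one operator in the plan was placed on the GPU."""
    return bool(gpu_operators(plan_text))


def plan_of(df):
    """Hypothetical helper: fetch the executed physical plan of a PySpark DataFrame."""
    # Assumption: goes through the JVM handle that PySpark exposes as df._jdf.
    return df._jdf.queryExecution().executedPlan().toString()
```

If `runs_on_gpu(plan_of(df))` is false for a TPC-DS query, the whole query ran on the CPU and the plugin was only adding overhead, which would be consistent with the slowdown described in the question.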